Full Code of crs4/pydoop for AI

Repository: crs4/pydoop
Branch: develop
Commit: c346870c27b2
Files: 370
Total size: 1.6 MB

Directory structure:
pydoop/

├── .dir-locals.el
├── .dockerignore
├── .gitignore
├── .travis/
│   ├── check_script_template.py
│   ├── cmd/
│   │   └── hadoop_localfs.sh
│   ├── run_checks
│   └── start_container
├── .travis.yml
├── AUTHORS
├── Dockerfile
├── Dockerfile.client
├── Dockerfile.docs
├── LICENSE
├── MANIFEST.in
├── README.md
├── VERSION
├── dev_tools/
│   ├── build_deprecation_tables
│   ├── bump_copyright_year
│   ├── docker/
│   │   ├── client_side_tests/
│   │   │   ├── apache_2.6.0/
│   │   │   │   ├── initialize.sh
│   │   │   │   └── local_client_setup.sh
│   │   │   └── hdp_2.2.0.0/
│   │   │       ├── initialize.sh
│   │   │       └── local_client_setup.sh
│   │   ├── cluster.rst
│   │   ├── clusters/
│   │   │   └── apache_2.6.0/
│   │   │       ├── docker-compose.yml
│   │   │       └── images/
│   │   │           ├── base/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       ├── generate_conf_files.py
│   │   │           │       ├── zk_set.py
│   │   │           │       └── zk_wait.py
│   │   │           ├── bootstrap/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       ├── bootstrap.py
│   │   │           │       └── create_hdfs_dirs.sh
│   │   │           ├── datanode/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       └── start_datanode.sh
│   │   │           ├── historyserver/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       └── start_historyserver.sh
│   │   │           ├── namenode/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       └── start_namenode.sh
│   │   │           ├── nodemanager/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       └── start_nodemanager.sh
│   │   │           ├── resourcemanager/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       └── start_resourcemanager.sh
│   │   │           └── zookeeper/
│   │   │               ├── Dockerfile
│   │   │               └── scripts/
│   │   │                   └── start_namenode.sh
│   │   ├── images/
│   │   │   ├── base/
│   │   │   │   └── Dockerfile
│   │   │   └── client/
│   │   │       └── Dockerfile
│   │   └── scripts/
│   │       ├── build_base_images.sh
│   │       ├── build_cluster_images.sh
│   │       ├── share_etc_hosts.py
│   │       ├── start_client.sh
│   │       └── start_cluster.sh
│   ├── docker_build
│   ├── dump_app_params
│   ├── edit_conf
│   ├── git_export
│   ├── import_src
│   ├── mapred_pipes
│   ├── unpack_debian
│   └── update_docs
├── docs/
│   ├── Makefile
│   ├── _build/
│   │   └── .gitignore
│   ├── _templates/
│   │   └── layout.html
│   ├── api_docs/
│   │   ├── hadut.rst
│   │   ├── hdfs_api.rst
│   │   ├── index.rst
│   │   └── mr_api.rst
│   ├── conf.py
│   ├── examples/
│   │   ├── avro.rst
│   │   ├── index.rst
│   │   ├── input_format.rst
│   │   ├── intro.rst
│   │   └── sequence_file.rst
│   ├── how_to_cite.rst
│   ├── index.rst
│   ├── installation.rst
│   ├── news/
│   │   ├── archive.rst
│   │   ├── index.rst
│   │   └── latest.rst
│   ├── pydoop_script.rst
│   ├── pydoop_script_options.rst
│   ├── pydoop_submit_options.rst
│   ├── running_pydoop_applications.rst
│   ├── self_contained.rst
│   └── tutorial/
│       ├── hdfs_api.rst
│       ├── index.rst
│       ├── mapred_api.rst
│       └── pydoop_script.rst
├── examples/
│   ├── README
│   ├── avro/
│   │   ├── build.sh
│   │   ├── config.sh
│   │   ├── data/
│   │   │   └── mini_aligned_seqs.gz.parquet
│   │   ├── pom.xml
│   │   ├── py/
│   │   │   ├── avro_base.py
│   │   │   ├── avro_container_dump_results.py
│   │   │   ├── avro_key_in.py
│   │   │   ├── avro_key_in_out.py
│   │   │   ├── avro_key_value_in.py
│   │   │   ├── avro_key_value_in_out.py
│   │   │   ├── avro_parquet_dump_results.py
│   │   │   ├── avro_pyrw.py
│   │   │   ├── avro_value_in.py
│   │   │   ├── avro_value_in_out.py
│   │   │   ├── check_cc.py
│   │   │   ├── check_results.py
│   │   │   ├── color_count.py
│   │   │   ├── create_input.py
│   │   │   ├── gen_data.py
│   │   │   ├── generate_avro_users.py
│   │   │   ├── kmer_count.py
│   │   │   ├── show_kmer_count.py
│   │   │   └── write_avro.py
│   │   ├── run
│   │   ├── run_avro_container_in
│   │   ├── run_avro_container_in_out
│   │   ├── run_avro_parquet_in
│   │   ├── run_avro_parquet_in_out
│   │   ├── run_avro_pyrw
│   │   ├── run_color_count
│   │   ├── run_kmer_count
│   │   ├── schemas/
│   │   │   ├── alignment_record.avsc
│   │   │   ├── alignment_record_proj.avsc
│   │   │   ├── pet.avsc
│   │   │   ├── stats.avsc
│   │   │   └── user.avsc
│   │   ├── src/
│   │   │   └── main/
│   │   │       └── java/
│   │   │           └── it/
│   │   │               └── crs4/
│   │   │                   └── pydoop/
│   │   │                       ├── WriteKV.java
│   │   │                       └── WriteParquet.java
│   │   └── write_avro_kv
│   ├── c++/
│   │   ├── HadoopPipes.cc
│   │   ├── Makefile
│   │   ├── README.txt
│   │   ├── SerialUtils.cc
│   │   ├── StringUtils.cc
│   │   ├── include/
│   │   │   └── hadoop/
│   │   │       ├── Pipes.hh
│   │   │       ├── SerialUtils.hh
│   │   │       ├── StringUtils.hh
│   │   │       └── TemplateFactory.hh
│   │   └── wordcount.cc
│   ├── config.sh
│   ├── hdfs/
│   │   ├── common.py
│   │   ├── repl_session.py
│   │   ├── run
│   │   ├── treegen.py
│   │   └── treewalk.py
│   ├── input/
│   │   ├── alice_1.txt
│   │   └── alice_2.txt
│   ├── input_format/
│   │   ├── check_results.py
│   │   ├── it/
│   │   │   └── crs4/
│   │   │       └── pydoop/
│   │   │           ├── mapred/
│   │   │           │   └── TextInputFormat.java
│   │   │           └── mapreduce/
│   │   │               └── TextInputFormat.java
│   │   └── run
│   ├── pydoop_script/
│   │   ├── check.py
│   │   ├── data/
│   │   │   ├── base_histogram_input/
│   │   │   │   ├── example_1.sam
│   │   │   │   └── example_2.sam
│   │   │   ├── stop_words.txt
│   │   │   └── transpose_input/
│   │   │       └── matrix.txt
│   │   ├── run
│   │   ├── run_script.sh
│   │   └── scripts/
│   │       ├── base_histogram.py
│   │       ├── caseswitch.py
│   │       ├── grep.py
│   │       ├── lowercase.py
│   │       ├── transpose.py
│   │       ├── wc_combiner.py
│   │       ├── wordcount.py
│   │       └── wordcount_sw.py
│   ├── pydoop_submit/
│   │   ├── check.py
│   │   ├── data/
│   │   │   ├── cols_1.txt
│   │   │   └── cols_2.txt
│   │   ├── mr/
│   │   │   ├── map_only_java_writer.py
│   │   │   ├── map_only_python_writer.py
│   │   │   ├── nosep.py
│   │   │   ├── wordcount_full.py
│   │   │   └── wordcount_minimal.py
│   │   ├── run
│   │   └── run_submit.sh
│   ├── run_all
│   ├── self_contained/
│   │   ├── check_results.py
│   │   ├── run
│   │   └── vowelcount/
│   │       ├── __init__.py
│   │       ├── lib/
│   │       │   └── __init__.py
│   │       └── mr/
│   │           ├── __init__.py
│   │           ├── main.py
│   │           ├── mapper.py
│   │           └── reducer.py
│   └── sequence_file/
│       ├── bin/
│       │   ├── filter.py
│       │   └── wordcount.py
│       ├── check.py
│       └── run
├── int_test/
│   ├── config.sh
│   ├── mapred_submitter/
│   │   ├── check.py
│   │   ├── genwords.py
│   │   ├── input/
│   │   │   ├── map_only/
│   │   │   │   ├── f1.txt
│   │   │   │   └── f2.txt
│   │   │   ├── map_reduce/
│   │   │   │   ├── f1.txt
│   │   │   │   └── f2.txt
│   │   │   └── map_reduce_long/
│   │   │       └── f.txt
│   │   ├── mr/
│   │   │   ├── map_only_java_writer.py
│   │   │   ├── map_only_python_writer.py
│   │   │   ├── map_reduce_combiner.py
│   │   │   ├── map_reduce_java_rw.py
│   │   │   ├── map_reduce_java_rw_pstats.py
│   │   │   ├── map_reduce_python_partitioner.py
│   │   │   ├── map_reduce_python_reader.py
│   │   │   ├── map_reduce_python_writer.py
│   │   │   ├── map_reduce_raw_io.py
│   │   │   ├── map_reduce_slow_java_rw.py
│   │   │   └── map_reduce_slow_python_rw.py
│   │   ├── run
│   │   ├── run_app.sh
│   │   └── run_perf.sh
│   ├── opaque_split/
│   │   ├── check.py
│   │   ├── gen_splits.py
│   │   ├── mrapp.py
│   │   └── run
│   ├── progress/
│   │   ├── mrapp.py
│   │   └── run
│   └── run_all
├── lib/
│   └── avro-mapred-1.7.7-hadoop2.jar
├── logo/
│   └── ubuntu-font-family.tar.bz2
├── notice_template.txt
├── pydoop/
│   ├── __init__.py
│   ├── app/
│   │   ├── __init__.py
│   │   ├── argparse_types.py
│   │   ├── main.py
│   │   ├── script.py
│   │   ├── script_template.py
│   │   └── submit.py
│   ├── avrolib.py
│   ├── hadoop_utils.py
│   ├── hadut.py
│   ├── hdfs/
│   │   ├── __init__.py
│   │   ├── common.py
│   │   ├── core/
│   │   │   └── __init__.py
│   │   ├── file.py
│   │   ├── fs.py
│   │   └── path.py
│   ├── jc.py
│   ├── mapreduce/
│   │   ├── __init__.py
│   │   ├── api.py
│   │   ├── binary_protocol.py
│   │   ├── connections.py
│   │   └── pipes.py
│   ├── test_support.py
│   ├── test_utils.py
│   └── utils/
│       ├── __init__.py
│       ├── conversion_tables.py
│       ├── jvm.py
│       ├── misc.py
│       └── py3compat.py
├── pydoop.properties
├── requirements.txt
├── setup.cfg
├── setup.py
├── src/
│   ├── Py_macros.h
│   ├── buf_macros.h
│   ├── it/
│   │   └── crs4/
│   │       └── pydoop/
│   │           ├── NoSeparatorTextOutputFormat.java
│   │           └── mapreduce/
│   │               └── pipes/
│   │                   ├── Application.java
│   │                   ├── BinaryProtocol.java
│   │                   ├── DownwardProtocol.java
│   │                   ├── DummyRecordReader.java
│   │                   ├── OpaqueSplit.java
│   │                   ├── OutputHandler.java
│   │                   ├── PipesMapper.java
│   │                   ├── PipesNonJavaInputFormat.java
│   │                   ├── PipesNonJavaOutputFormat.java
│   │                   ├── PipesPartitioner.java
│   │                   ├── PipesReducer.java
│   │                   ├── PydoopAvroBridgeKeyReader.java
│   │                   ├── PydoopAvroBridgeKeyValueReader.java
│   │                   ├── PydoopAvroBridgeKeyValueWriter.java
│   │                   ├── PydoopAvroBridgeKeyWriter.java
│   │                   ├── PydoopAvroBridgeReaderBase.java
│   │                   ├── PydoopAvroBridgeValueReader.java
│   │                   ├── PydoopAvroBridgeValueWriter.java
│   │                   ├── PydoopAvroBridgeWriterBase.java
│   │                   ├── PydoopAvroInputBridgeBase.java
│   │                   ├── PydoopAvroInputKeyBridge.java
│   │                   ├── PydoopAvroInputKeyValueBridge.java
│   │                   ├── PydoopAvroInputValueBridge.java
│   │                   ├── PydoopAvroKeyInputFormat.java
│   │                   ├── PydoopAvroKeyOutputFormat.java
│   │                   ├── PydoopAvroKeyRecordReader.java
│   │                   ├── PydoopAvroKeyRecordWriter.java
│   │                   ├── PydoopAvroKeyValueInputFormat.java
│   │                   ├── PydoopAvroKeyValueOutputFormat.java
│   │                   ├── PydoopAvroKeyValueRecordReader.java
│   │                   ├── PydoopAvroKeyValueRecordWriter.java
│   │                   ├── PydoopAvroOutputBridgeBase.java
│   │                   ├── PydoopAvroOutputFormatBase.java
│   │                   ├── PydoopAvroOutputKeyBridge.java
│   │                   ├── PydoopAvroOutputKeyValueBridge.java
│   │                   ├── PydoopAvroOutputValueBridge.java
│   │                   ├── PydoopAvroRecordReaderBase.java
│   │                   ├── PydoopAvroRecordWriterBase.java
│   │                   ├── PydoopAvroValueInputFormat.java
│   │                   ├── PydoopAvroValueOutputFormat.java
│   │                   ├── PydoopAvroValueRecordReader.java
│   │                   ├── PydoopAvroValueRecordWriter.java
│   │                   ├── Submitter.java
│   │                   ├── TaskLog.java
│   │                   ├── TaskLogAppender.java
│   │                   └── UpwardProtocol.java
│   ├── libhdfs/
│   │   ├── common/
│   │   │   ├── htable.c
│   │   │   └── htable.h
│   │   ├── config.h
│   │   ├── exception.c
│   │   ├── exception.h
│   │   ├── hdfs.c
│   │   ├── include/
│   │   │   └── hdfs/
│   │   │       └── hdfs.h
│   │   ├── jni_helper.c
│   │   ├── jni_helper.h
│   │   └── os/
│   │       ├── mutexes.h
│   │       ├── posix/
│   │       │   ├── mutexes.c
│   │       │   ├── platform.h
│   │       │   ├── thread.c
│   │       │   └── thread_local_storage.c
│   │       ├── thread.h
│   │       ├── thread_local_storage.h
│   │       └── windows/
│   │           ├── inttypes.h
│   │           ├── mutexes.c
│   │           ├── platform.h
│   │           ├── thread.c
│   │           ├── thread_local_storage.c
│   │           └── unistd.h
│   ├── native_core_hdfs/
│   │   ├── hdfs_file.cc
│   │   ├── hdfs_file.h
│   │   ├── hdfs_fs.cc
│   │   ├── hdfs_fs.h
│   │   └── hdfs_module.cc
│   ├── py3k_compat.h
│   └── sercore/
│       ├── HadoopUtils/
│       │   ├── SerialUtils.cc
│       │   └── SerialUtils.hh
│       ├── hu_extras.cpp
│       ├── hu_extras.h
│       ├── sercore.cpp
│       ├── streams.cpp
│       └── streams.h
└── test/
    ├── __init__.py
    ├── all_tests.py
    ├── app/
    │   ├── __init__.py
    │   ├── all_tests.py
    │   └── test_submit.py
    ├── avro/
    │   ├── all_tests.py
    │   ├── common.py
    │   ├── test_io.py
    │   └── user.avsc
    ├── common/
    │   ├── __init__.py
    │   ├── all_tests.py
    │   ├── test_hadoop_utils.py
    │   ├── test_hadut.py
    │   ├── test_pydoop.py
    │   └── test_test_support.py
    ├── hdfs/
    │   ├── __init__.py
    │   ├── all_tests.py
    │   ├── common_hdfs_tests.py
    │   ├── test_common.py
    │   ├── test_core.py
    │   ├── test_hdfs.py
    │   ├── test_hdfs_fs.py
    │   ├── test_local_fs.py
    │   ├── test_path.py
    │   └── try_hdfs.py
    ├── mapreduce/
    │   ├── __init__.py
    │   ├── all_tests.py
    │   ├── it/
    │   │   └── crs4/
    │   │       └── pydoop/
    │   │           └── mapreduce/
    │   │               └── pipes/
    │   │                   └── OpaqueRoundtrip.java
    │   ├── m_task.cmd
    │   ├── r_task.cmd
    │   ├── test_connections.py
    │   └── test_opaque.py
    └── sercore/
        ├── all_tests.py
        ├── test_deser.py
        └── test_streams.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .dir-locals.el
================================================
;;; Directory Local Variables
;;; See Info node `(emacs) Directory Variables' for more information.

((python-mode
  (flycheck-flake8rc . "setup.cfg")))


================================================
FILE: .dockerignore
================================================
.*
Dockerfile*
docker


================================================
FILE: .gitignore
================================================
*.pyc
*~
build
docs/_static/favicon.ico
docs/_static/logo.png
pydoop/config.py
pydoop/version.py
src/hadoop*/libhdfs/config.h
src/hdfs/hdfs.xcodeproj
src/hdfs/hdfs/*

dist

examples/**/*.class
examples/**/*.jar

test/timings/dataset

pydoop.egg-info

.DS_Store
.idea
*.xcodeproj


================================================
FILE: .travis/check_script_template.py
================================================
"""\
Perform full substitution on the Pydoop script template and check
it with flake8.

Any options (i.e., arguments starting with at least a dash) are passed
through to flake8.
"""

import sys
import os
import tempfile

from flake8.main.cli import main as flake8_main


THIS_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(THIS_DIR, os.pardir, "pydoop", "app"))
from script_template import DRIVER_TEMPLATE


def main(argv):
    code = DRIVER_TEMPLATE.substitute(
        module="module",
        map_fn="map_fn",
        reduce_fn="reduce_fn",
        combine_fn="combine_fn",
        combiner_wp="None",
    )
    fd = None
    try:
        fd, fn = tempfile.mkstemp(suffix=".py", text=True)
        os.write(fd, code.encode("utf-8"))
    finally:
        if fd is not None:
            os.close(fd)
    flake8_argv = [fn] + [_ for _ in argv if _.startswith("-")]
    try:
        flake8_main(flake8_argv)
    finally:
        os.remove(fn)


if __name__ == "__main__":
    argv = sys.argv[1:]
    if set(argv).intersection(["-h", "--help"]):
        print(__doc__)
    else:
        main(argv)


================================================
FILE: .travis/cmd/hadoop_localfs.sh
================================================
#!/bin/bash

set -euo pipefail
[ -n "${DEBUG:-}" ] && set -x

function onshutdown {
    mr-jobhistory-daemon.sh stop historyserver
    yarn-daemon.sh stop nodemanager
    yarn-daemon.sh stop resourcemanager
}

trap onshutdown SIGTERM
trap onshutdown SIGINT

conf_dir=$(dirname $(dirname $(command -v hadoop)))/etc/hadoop
cat >"${conf_dir}"/core-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>
EOF
cat >"${conf_dir}"/hdfs-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>
EOF

yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
mr-jobhistory-daemon.sh start historyserver

tail -f /dev/null

onshutdown


================================================
FILE: .travis/run_checks
================================================
#!/bin/bash

set -euo pipefail
[ -n "${DEBUG:-}" ] && set -x

docker exec pydoop bash -c 'cd test && ${PYTHON} all_tests.py'
docker exec pydoop bash -c 'cd test/avro && ${PYTHON} all_tests.py'
docker exec -e DEBUG="${DEBUG:-}" pydoop bash -c 'cd int_test && ./run_all'
docker exec -e DEBUG="${DEBUG:-}" pydoop bash -c 'cd examples && ./run_all'
docker exec -e DEBUG="${DEBUG:-}" pydoop bash -c 'cd examples/avro && ./run'


================================================
FILE: .travis/start_container
================================================
#!/bin/bash

set -euo pipefail
[ -n "${DEBUG:-}" ] && set -x
this="${BASH_SOURCE-$0}"
this_dir=$(cd -P -- "$(dirname -- "${this}")" && pwd -P)
img=crs4/pydoop:${HADOOP_VERSION}-${TRAVIS_PYTHON_VERSION}

pushd "${this_dir}"
cmd_dir=$(readlink -e "cmd")
pushd ..
docker build . \
  --build-arg hadoop_version=${HADOOP_VERSION} \
  --build-arg python_version=${TRAVIS_PYTHON_VERSION} \
  -t ${img}
if [ -n "${LOCAL_FS:-}" ]; then
    docker run --rm --name pydoop -v "${cmd_dir}":/cmd:ro -d ${img} \
      /cmd/hadoop_localfs.sh
else
    docker run --rm --name pydoop -d ${img}
    docker exec pydoop bash -c 'until datanode_cid; do sleep 0.1; done'
fi
popd
popd


================================================
FILE: .travis.yml
================================================
language: python

cache: pip

matrix:
  include:
  - python: "2.7"
    env: HADOOP_VERSION=3.2.0
  - python: "3.6"
    env: HADOOP_VERSION=2.9.2
  - python: "3.6"
    env: HADOOP_VERSION=3.2.0
  - python: "3.6"
    env: HADOOP_VERSION=3.2.0 LOCAL_FS=true
  - python: "3.7"
    env: HADOOP_VERSION=3.2.0
    dist: xenial

sudo: required

services: docker

before_install: pip install flake8

# skip installation, requirements are handled in the Docker image
install: true

before_script:
  - flake8 -v .
  - python .travis/check_script_template.py -v
  - docker build -t crs4/pydoop-docs -f Dockerfile.docs .

script:
 - ./.travis/start_container
 - ./.travis/run_checks
 - docker stop pydoop

deploy:
  provider: pypi
  user: "${CI_USER}"
  password: "${CI_PASS}"
  on:
    python: "3.7"
    repo: crs4/pydoop
    tags: true


================================================
FILE: AUTHORS
================================================
Pydoop is developed and maintained by:
 * Simone Leo <simone.leo@crs4.it>
 * Gianluigi Zanetti <gianluigi.zanetti@crs4.it>
 * Luca Pireddu <luca.pireddu@crs4.it>
 * Francesco Cabras <francesco.cabras@crs4.it>
 * Mauro Del Rio <mauro@crs4.it>
 * Marco Enrico Piras <kikkomep@crs4.it>

Other contributors:
 * Cosmin Cătănoaie
 * Liam Slusser
 * Jeremy G. Kahn
 * Simon Li


================================================
FILE: Dockerfile
================================================
ARG hadoop_version=3.2.0
ARG python_version=3.6

FROM crs4/pydoop-base:${hadoop_version}-${python_version}

COPY . /build/pydoop
WORKDIR /build/pydoop

RUN ${PYTHON} -m pip install --no-cache-dir --upgrade -r requirements.txt \
    && ${PYTHON} setup.py sdist \
    && ${PYTHON} -m pip install --pre dist/pydoop-$(cat VERSION).tar.gz


================================================
FILE: Dockerfile.client
================================================
ARG hadoop_version=3.2.0
ARG python_version=3.6

FROM crs4/pydoop-client-base:${hadoop_version}-${python_version}

COPY . /build/pydoop
WORKDIR /build/pydoop

RUN ${PYTHON} -m pip install --no-cache-dir --upgrade -r requirements.txt \
    && ${PYTHON} setup.py build \
    && ${PYTHON} setup.py install --skip-build \
    && ${PYTHON} setup.py clean


================================================
FILE: Dockerfile.docs
================================================
FROM crs4/pydoop-docs-base

COPY . /build/pydoop
WORKDIR /build/pydoop

RUN ${PYTHON} -m pip install --no-cache-dir --upgrade -r requirements.txt \
    && ${PYTHON} setup.py build \
    && ${PYTHON} setup.py install --skip-build \
    && ${PYTHON} setup.py clean \
    && inkscape -z -D -f logo/logo.svg -e logo.png -w 800 2>/dev/null \
    && convert -resize 200x logo.png docs/_static/logo.png \
    && inkscape -z -D -f logo/favicon.svg -e 256.png -w 256 -h 256 2>/dev/null \
    && for i in 16 32 64 128; do \
        convert 256.png -resize ${i}x${i} ${i}.png; \
    done \
    && convert 16.png 32.png 64.png 128.png docs/_static/favicon.ico \
    && for a in script submit; do \
        ${PYTHON} dev_tools/dump_app_params --app ${a} -o docs/pydoop_${a}_options.rst; \
    done \
    && make SPHINXOPTS="-W" -C docs html


================================================
FILE: LICENSE
================================================

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: MANIFEST.in
================================================
include AUTHORS LICENSE VERSION README.md pydoop.properties requirements.txt

recursive-include src *
recursive-include test *
recursive-include examples *
recursive-include docs *
recursive-include lib *


================================================
FILE: README.md
================================================
[![Build Status](https://travis-ci.org/crs4/pydoop.png)](https://travis-ci.org/crs4/pydoop)

Pydoop is a Python MapReduce and HDFS API for
[Hadoop](http://hadoop.apache.org/).

Copyright 2009-2026 [CRS4](http://www.crs4.it/).

To get started, take a look at [the docs](http://crs4.github.io/pydoop/).


================================================
FILE: VERSION
================================================
2.0.0


================================================
FILE: dev_tools/build_deprecation_tables
================================================
#!/usr/bin/env python

"""\
A utility to generate mrv1 to mrv2 conversion tables.

Usage::

  bash$ build_deprecation_tables /opt/hadoop-2.4.1-src ./pydoop/utils/conversion_tables.py
"""

import os
import re
import sys

DEFAULT_DEPRECATED_PROPERTIES_APT_VM_FNAME = \
    "hadoop-common-project/hadoop-common/src/site/apt/DeprecatedProperties.apt.vm"


def extract_tables(apt_vm_fname):
    """Return the deprecated-to-new-property table and its inverse as two dicts."""
    with open(apt_vm_fname) as f:
        # table rows in the .apt.vm source start with a single '|'
        lines = [x for x in f.readlines() if re.match(r'^\|[^|]', x)]
    # split each row on '|', strip the fields and drop the empty leading one
    pairs = [p for p in [[x.strip() for x in l.split('|')][1:] for l in lines]
             if not p[1].startswith('NONE')]
    return dict(pairs), dict((y, x) for (x, y) in pairs)


def main(argv):
    src_root = argv[0]
    module_path = argv[1]
    fname = os.path.join(src_root, DEFAULT_DEPRECATED_PROPERTIES_APT_VM_FNAME)
    mrv1_to_mrv2, mrv2_to_mrv1 = extract_tables(fname)
    with open(module_path, 'w') as f:
        f.write('mrv1_to_mrv2=%r\n' % mrv1_to_mrv2)
        f.write('mrv2_to_mrv1=%r\n' % mrv2_to_mrv1)


if __name__ == "__main__":
    main(sys.argv[1:])


================================================
FILE: dev_tools/bump_copyright_year
================================================
#!/usr/bin/env python

"""\
Set copyright end year across the distribution.
"""

import sys
import os
import re
import argparse
import datetime


THIS_YEAR = datetime.date.today().year
THIS_DIR = os.path.dirname(os.path.abspath(__file__))
PATTERN = re.compile(r"(?<=opyright 2009-)\d+")


def find_files(root_dir):
    for d, subdirs, fnames in os.walk(root_dir, topdown=True):
        for fn in fnames:
            yield os.path.join(d, fn)
        subdirs[:] = [_ for _ in subdirs if _ != ".git"]


def bump_end_year(root_dir, year):
    year = "%d" % year
    for fn in find_files(root_dir):
        if fn == os.path.abspath(__file__):
            continue
        print("processing %r" % (fn,))
        with open(fn, "r") as f:
            try:
                content = f.read()
            except UnicodeDecodeError:
                continue
        with open(fn, "w") as f:
            f.write(re.sub(PATTERN, year, content))


def make_parser():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("-y", type=int, metavar="YYYY", default=THIS_YEAR,
                        help="copyright end year (default = current)")
    return parser


def main(argv):
    parser = make_parser()
    args = parser.parse_args(argv[1:])
    repo_root = os.path.dirname(THIS_DIR)
    bump_end_year(repo_root, args.y)


if __name__ == "__main__":
    main(sys.argv)


================================================
FILE: dev_tools/docker/client_side_tests/apache_2.6.0/initialize.sh
================================================
#!/bin/bash

port=$1
client_id=$2
rm_container_id=$3
DOCKER_HOST_IP=${4:-localhost}
#----------------------------------
client_name=`docker exec ${client_id} hostname`

#----- Upload hadoop to the client container
hdp_ver=hadoop-2.6.0
hdp_tgz=${hdp_ver}.tar.gz
if [[ ! -f ${hdp_tgz} ]]
then
	hdp_url=http://mirror.nohup.it/apache/hadoop/common/${hdp_ver}/${hdp_tgz}
	wget ${hdp_url} -O ${hdp_tgz}
fi

# copy the hadoop*.tar.gz
scp -P${port} ${hdp_tgz} root@${DOCKER_HOST_IP}:/opt/

# copy the installer script
scp -P${port} local_client_setup.sh root@${DOCKER_HOST_IP}:.

# exec and remove the installer script
ssh -p${port} root@${DOCKER_HOST_IP} './local_client_setup.sh && rm local_client_setup.sh'

# copy the hadoop configuration from the resourcemanager container to the client container
echo "Copying hadoop config from the resourcemanager container..."
for c in core-site.xml mapred-site.xml yarn-site.xml
do
    from=/opt/hadoop/etc/hadoop/${c}
    to=/opt/hadoop/etc/hadoop/${c}
    docker exec -it ${rm_container_id} scp ${from} ${client_name}:${to}
done



================================================
FILE: dev_tools/docker/client_side_tests/apache_2.6.0/local_client_setup.sh
================================================
#!/bin/bash

#-----------
# This script should be run in the client container.


pushd /opt

#----- Hadoop setup
hdp_ver=hadoop-2.6.0
hdp_tgz=${hdp_ver}.tar.gz
tar xzf ${hdp_tgz}
ln -s ./${hdp_ver} hadoop
cat <<EOF  > /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
</configuration>
EOF

cat <<EOF  > /opt/hadoop/etc/hadoop/yarn-site.xml
<configuration>
	<property>
	  <name>yarn.resourcemanager.hostname</name>
	  <value>resourcemanager</value>
	</property>
</configuration>
EOF
export HADOOP_HOME=/opt/hadoop
export PATH=${HADOOP_HOME}/bin:${PATH}
popd

#------------------
# Pydoop setup
git_url=https://github.com/crs4/pydoop.git

cat <<EOF > /home/aen/prepare_pydoop.sh
export HADOOP_HOME=/opt/hadoop
git clone ${git_url}
cd pydoop
python setup.py build
EOF

cat <<EOF > /home/aen/run_tests.sh
export HADOOP_HOME=/opt/hadoop
export PATH=\${HADOOP_HOME}/bin:\${PATH}
cd pydoop/test
python all_tests.py
EOF

cat <<EOF > /home/aen/run_examples.sh
export HADOOP_HOME=/opt/hadoop
export PATH=\${HADOOP_HOME}/bin:\${PATH}
cd pydoop/examples
./run_all
EOF

cat <<EOF > /home/aen/run_test_jar.sh
export HADOOP_HOME=/opt/hadoop
export PATH=\${HADOOP_HOME}/bin:\${PATH}
hdfs dfs -put run_test_jar.sh
yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount run_test_jar.sh foobar
EOF


#----------------------------------------------------
# Fix bad sw versions and missing things
apt-get install -y zip 
pip install setuptools --upgrade

#su - aen -c '/bin/bash ./prepare_pydoop.sh'
#su - aen -c '/bin/bash ./run_test_jar.sh'
#su - aen -c '/bin/bash ./run_tests.sh'
#su - aen -c '/bin/bash ./run_examples.sh'


================================================
FILE: dev_tools/docker/client_side_tests/hdp_2.2.0.0/initialize.sh
================================================
#!/bin/bash

port=$1
client_id=$2
rm_container_id=$3
DOCKER_HOST_IP=${4:-localhost}
#----------------------------------
client_name=`docker exec ${client_id} hostname`

#----------------------------------
scp -P${port} local_client_setup.sh root@${DOCKER_HOST_IP}:.

# exec and remove the installer script
ssh -p${port} root@${DOCKER_HOST_IP} './local_client_setup.sh && rm local_client_setup.sh'

# copy the hadoop configuration from the resourcemanager container to the client container
echo "Copying hadoop config from the resourcemanager container..."
for c in core-site.xml mapred-site.xml yarn-site.xml
do
    from=/opt/hadoop/etc/hadoop/${c}
    to=/etc/hadoop/conf/${c}
    docker exec -it ${rm_container_id} scp ${from} ${client_name}:${to}
done



================================================
FILE: dev_tools/docker/client_side_tests/hdp_2.2.0.0/local_client_setup.sh
================================================
#!/bin/bash

# This script should be run in the client container, see initialize.sh

#-----------
function log() {
    echo "$1"
}


function install_hdp2_ubuntu_packages() {
    local VERSION="${1}"
    local HRTWRKS_REPO=http://public-repo-1.hortonworks.com/HDP/ubuntu12/2.x
    local HDP_LIST=${HRTWRKS_REPO}/GA/${VERSION}/hdp.list

    log "Adding repository"
    wget -nv ${HDP_LIST} -O /etc/apt/sources.list.d/hdp.list
    gpg --keyserver pgp.mit.edu --recv-keys B9733A7A07513CAD && gpg -a --export 07513CAD | apt-key add -
    apt-get update
    apt-get install -y hadoop hadoop-hdfs libhdfs0 \
                       hadoop-yarn hadoop-mapreduce hadoop-client \
                       openssl libsnappy1 libsnappy-dev
}


#----- Hadoop setup
hdp_ver=2.2.0.0
install_hdp2_ubuntu_packages ${hdp_ver}

export HADOOP_HOME=/usr/hdp/current/hadoop-client
export PATH=${HADOOP_HOME}/bin:${PATH}

#------------------
# Pydoop setup
git_url=https://github.com/crs4/pydoop.git

cat <<EOF > /home/aen/prepare_pydoop.sh
git clone ${git_url}
cd pydoop
python setup.py build
EOF

cat <<EOF > /home/aen/run_tests.sh
cd pydoop/test
python all_tests.py
EOF

cat <<EOF > /home/aen/run_examples.sh
cd pydoop/examples
./run_all
EOF

cat <<EOF > /home/aen/run_test_jar.sh
hdfs dfs -put run_test_jar.sh
yarn jar /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount run_test_jar.sh foobar
EOF


#----------------------------------------------------
# Fix bad sw versions and missing things
apt-get install -y zip 
pip install setuptools --upgrade

#su - aen -c '/bin/bash ./prepare_pydoop.sh'
#cd /home/aen/pydoop
#python setup.py install
#cd
#su - aen -c '/bin/bash ./run_test_jar.sh'
#su - aen -c '/bin/bash ./run_tests.sh'
#su - aen -c '/bin/bash ./run_examples.sh'


================================================
FILE: dev_tools/docker/cluster.rst
================================================
Testing pydoop using a Docker Cluster
=====================================

The purpose of the pydoop docker cluster is to provide a full, standard hadoop
cluster that can be used for testing purposes. This is a "real" multi-node
cluster, not a single-node pseudo-cluster.

The supported testing strategy is the following:

 #. choose and start an appropriate docker cluster;
 #. log in to the 'client' node provided by the cluster;
 #. install the targeted hadoop version on the client node -- it should be
    compatible with the cluster (compatibility at the protocol level should
    be enough);
 #. install the pydoop version under test on the client node;
 #. run pydoop tests and examples.


Docker cluster
--------------

Build a cluster
;;;;;;;;;;;;;;;

Cluster configurations are defined in subdirectories of the ``clusters``
directory, e.g., ``clusters/apache_2.6.0``.

Do the following to build all the cluster-independent images::

  $ cd clusters
  $ ../scripts/build_base_images.sh

Next, build all the cluster-dependent images::

  $ ../scripts/build_cluster_images.sh apache_2.6.0

where we have used ``apache_2.6.0`` as an example.


Run a cluster
;;;;;;;;;;;;;

To start a cluster, do the following::

  $ ../scripts/start_cluster.sh apache_2.6.0
  No stopped containers
  Creating apache260_zookeeper_1...
  Creating apache260_bootstrap_1...
  Creating apache260_client_1...
  Creating apache260_namenode_1...
  Creating apache260_datanode_1...
  Creating apache260_historyserver_1...
  Creating apache260_resourcemanager_1...
  Creating apache260_nodemanager_1...

The script attempts to clean up leftovers from previous runs. Thus, if this is
not the first time you run it, it will ask for permission to remove the old
containers::

  $ ../scripts/start_cluster.sh apache_2.6.0
  Stopping apache260_nodemanager_1...
  Stopping apache260_resourcemanager_1...
  Stopping apache260_historyserver_1...
  Stopping apache260_datanode_1...
  Stopping apache260_namenode_1...
  Stopping apache260_client_1...
  Stopping apache260_zookeeper_1...
  Going to remove apache260_nodemanager_1, apache260_resourcemanager_1, apache260_historyserver_1, apache260_client_1, apache260_datanode_1, apache260_namenode_1, apache260_bootstrap_1, apache260_zookeeper_1
  Are you sure? [yN] y
  Removing apache260_zookeeper_1...
  Removing apache260_bootstrap_1...
  Removing apache260_client_1...
  Removing apache260_namenode_1...
  Removing apache260_datanode_1...
  Removing apache260_historyserver_1...
  Removing apache260_resourcemanager_1...
  Removing apache260_nodemanager_1...
  Moved logs to logs.backup.12522
  Moved local to local.backup.12522
  Creating apache260_zookeeper_1...
  Creating apache260_bootstrap_1...
  Creating apache260_client_1...
  Creating apache260_namenode_1...
  Creating apache260_datanode_1...
  Creating apache260_historyserver_1...
  Creating apache260_resourcemanager_1...
  Creating apache260_nodemanager_1...


To check how the cluster is doing, look at the logs of the bootstrap node::

  $ cd apache_2.6.0
  $ docker-compose logs bootstrap
  Attaching to apache260_bootstrap_1
  bootstrap_1 | INFO:root:Starting bootstrap.
  bootstrap_1 | INFO:root:Waiting for /etc/hosts to update on bootstrap
  bootstrap_1 | INFO:root:Waiting for /etc/hosts to update on bootstrap
  bootstrap_1 | ....
  bootstrap_1 | INFO:root:Waiting for /etc/hosts to update on bootstrap
  bootstrap_1 | INFO:kazoo.client:Connecting to zookeeper:2181
  bootstrap_1 | INFO:kazoo.client:Zookeeper connection established, state: CONNECTED
  bootstrap_1 | INFO:root:Booting namenode
  bootstrap_1 | INFO:root:	done.
  bootstrap_1 | INFO:root:Booting datanode
  bootstrap_1 | INFO:root:	done.
  bootstrap_1 | Creating /mr-history/tmp
  bootstrap_1 | Creating /mr-history/done
  bootstrap_1 | Setting ownership (mapred:hadoop) and permissions for /mr-history
  bootstrap_1 | INFO:root:Booting resourcemanager
  bootstrap_1 | INFO:root:	done.
  bootstrap_1 | INFO:root:Booting nodemanager
  bootstrap_1 | INFO:root:	done.
  bootstrap_1 | INFO:root:Booting historyserver
  bootstrap_1 | INFO:root:	done.
  bootstrap_1 | INFO:root:Done with bootstrap.
  apache260_bootstrap_1 exited with code 0

Then check:

  #. the namenode, ``http://localhost:50070``, which should be up and
     reporting a datanode;
  #. the resourcemanager, ``http://localhost:8088``, which should be up and
     reporting a nodemanager;
  #. the historyserver, ``http://localhost:19888``.
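
The same check can also be scripted. The following is only a minimal sketch
(the ports are the ones exposed by this cluster's docker-compose.yml)::

  import time
  try:
      from urllib.request import urlopen  # Python 3
  except ImportError:
      from urllib2 import urlopen  # Python 2

  # Web UI ports: namenode, resourcemanager, historyserver.
  for port in (50070, 8088, 19888):
      url = "http://localhost:%d" % port
      while True:
          try:
              urlopen(url).close()
          except IOError:
              print("waiting for %s ..." % url)
              time.sleep(1)
          else:
              print("%s is up" % url)
              break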


How to use a docker cluster
---------------------------

These are the basic steps.

Change directory to ``client_side_tests``, choose a specific distribution, say
``apache_2.6.0`` and ``cd`` to that directory.

Run the following command::

  $ ../../scripts/start_client.sh [<PORT>]

The script will create a new docker container with a cluster client node that
responds to ssh connections on port ``PORT`` (default: 3333).  The
``start_client.sh`` script will execute the bash script ``initialize.sh``
(see the provided client side tests for examples) to install the appropriate
hadoop distribution, the needed software and a set of utility scripts on the
client container.

.. note::

  You will probably have to answer 'yes' twice to ssh's host-key prompts.


Log in to the client, install pydoop and run the tests::

  $ ssh -p 3333 root@localhost
    Linux minas-morgul 3.18.7-gentoo #1 SMP Mon Feb 23 17:39:58 PST 2015 x86_64
    
    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.
    root@client:~# su - aen -c "bash -x prepare_pydoop.sh"
    root@client:~# cd /home/aen/pydoop/
    root@client:~# python setup.py install
    root@client:~# cd
    root@client:~# su - aen -c "bash -x run_tests.sh"
    root@client:~# su - aen -c "bash -x run_examples.sh"    

Details
-------

Bootstrap strategy
;;;;;;;;;;;;;;;;;;

The main synchronization issues are:

 #. All hosts should be able to resolve logical names to IPs; e.g., the
    namenode wants to resolve datanodes' IPs to their logical names.

 #. Part of the inter-service communication is handled via shared hdfs
    directories, which must be accessible with the appropriate permissions as
    a pre-condition for the services to start up.


The bootstrap strategy is as follows.

 #. An external mechanism -- here, the script
    ``../scripts/share_etc_hosts.py``, though it should really be integrated
    into docker-compose -- guarantees that each node has ``/etc/hosts``
    entries for all nodes in the group.  The mechanism needs to be external,
    since it must talk to the docker server to make sure all the involved
    nodes are covered.

 #. We have a zookeeper node that is guaranteed to be started before any
    other service by having all other nodes linked to it in the
    docker-compose.yml file.

 #. We have an auxiliary service, bootstrap, that is in charge of
    orchestrating the system bootstrap.

 #. The expected bootstrap workflow is as follows.

    a. docker-compose starts;
    b. all services (except zookeeper and bootstrap) wait until
       ``zookeeper:/<servicename>`` is set to ``boot``;
    c. bootstrap then does the following:

       1. waits until its ``/etc/hosts`` has been changed;
       2. sets ``/{namenode,datanode}`` to ``boot``;
       3. waits until namenode sets ``/namenode`` to ``up``;
       4. creates the needed hdfs dirs with appropriate permissions;
       5. sets ``/{resourcemanager,nodemanager,historyserver}`` to ``boot``;
       6. dies gracefully.
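
This coordination relies on the ``zk_set.py`` and ``zk_wait.py`` helpers
copied into the base image.  As a rough illustration of the idea -- this is a
hedged sketch built on the kazoo client, not the actual contents of those
scripts; the helper names and the polling interval are assumptions::

  import time
  from kazoo.client import KazooClient

  ZK_HOSTS = "zookeeper:2181"


  def zk_set(path, value):
      # Set a znode to the given value, creating it if needed.
      zk = KazooClient(hosts=ZK_HOSTS)
      zk.start()
      try:
          if zk.exists(path) is None:
              zk.create(path, value.encode("utf-8"))
          else:
              zk.set(path, value.encode("utf-8"))
      finally:
          zk.stop()


  def zk_wait(path, value):
      # Block until the znode at ``path`` holds ``value``.
      zk = KazooClient(hosts=ZK_HOSTS)
      zk.start()
      try:
          while True:
              if zk.exists(path) is not None:
                  data, _ = zk.get(path)
                  if data.decode("utf-8") == value:
                      return
              time.sleep(1)
      finally:
          zk.stop()

With helpers like these, a service such as the namenode would call
``zk_wait("/namenode", "boot")`` before starting and
``zk_set("/namenode", "up")`` once it is ready, matching the workflow above.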



================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/docker-compose.yml
================================================
zookeeper:
  image: crs4_pydoop/apache_2.6.0_zookeeper:latest
  name: zookeeper
  hostname: zookeeper
  ports:
    - "2181:2181"

bootstrap:
  image: crs4_pydoop/apache_2.6.0_bootstrap:latest
  name: bootstrap
  hostname: bootstrap
  links:
    - zookeeper
    
namenode:
  image: crs4_pydoop/apache_2.6.0_namenode:latest
  name: namenode
  hostname: namenode
  volumes:
    - ./logs:/tmp/logs
  links:
    - zookeeper
  ports:
    - "9000:9000"
    - "50070:50070"

datanode:
  image: crs4_pydoop/apache_2.6.0_datanode:latest
  name: datanode
  hostname: datanode
  volumes_from:
    - namenode
  links:
    - zookeeper
  ports:
    - "50020:50020"        
    
resourcemanager:
  image: crs4_pydoop/apache_2.6.0_resourcemanager:latest
  name: resourcemanager
  hostname: resourcemanager
  volumes_from:
    - namenode
  links:
    - zookeeper
  ports:
    - "8088:8088"
    - "8021:8021"    
    - "8031:8031"
    - "8033:8033"    

historyserver:
  image: crs4_pydoop/apache_2.6.0_historyserver:latest
  name: historyserver
  hostname: historyserver
  volumes_from:
    - namenode
  links:
    - zookeeper
  ports:
    - "10020:10020"
    - "19888:19888"

nodemanager:
  image: crs4_pydoop/apache_2.6.0_nodemanager:latest
  name: nodemanager
  hostname: nodemanager
  links:
    - zookeeper
  ports:
    - "8042:8042"
  volumes_from:
    - namenode
    - client
    
client:
  image: crs4_pydoop/client:latest
  name: client
  hostname: client
  ports:
    - "2222:22"
  volumes:
    - ./local:/usr/local


================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/base/Dockerfile
================================================
#----------------------------------------------------
FROM crs4_pydoop/base:latest

# ------------------------------------------------------------------
# Get zookeeper
     
ENV zoo_ver zookeeper-3.4.6
ENV zoo_tgz ${zoo_ver}.tar.gz
ENV zoo_site http://mirror.nohup.it/apache/zookeeper
ENV zoo_tgz_site ${zoo_site}/${zoo_ver}

RUN wget ${zoo_tgz_site}/${zoo_tgz} -O ${zoo_tgz} && \
    mkdir -p /opt && tar -C /opt -xzf ${zoo_tgz} && rm -f ${zoo_tgz} && \
    ln -s /opt/${zoo_ver} /opt/zookeeper

ENV ZOO_DATA_DIR      /data/zookeeper/data
ENV ZOO_CLIENT_PORT   2181

EXPOSE ${ZOO_CLIENT_PORT}

RUN mkdir -p ${ZOO_DATA_DIR}
RUN echo "tickTime=2000"                  > /opt/zookeeper/conf/zoo.cfg && \
    echo "dataDir ${ZOO_DATA_DIR}"       >> /opt/zookeeper/conf/zoo.cfg && \
    echo "clientPort ${ZOO_CLIENT_PORT}" >> /opt/zookeeper/conf/zoo.cfg && \
    echo 1 > ${ZOO_DATA_DIR}/myid

# Note that we are forcing the installation into dist-packages,
# so that it will be possible to share kazoo and externally mount /usr/local later.
RUN pip install kazoo -t /usr/lib/python2.7/dist-packages
COPY scripts/zk_wait.py /tmp/
COPY scripts/zk_set.py /tmp/
# -----------------------------------------------------------------
# Get hadoop

ENV hdp_ver hadoop-2.6.0
ENV hdp_tgz ${hdp_ver}.tar.gz
ENV hdp_site http://mirror.nohup.it/apache/hadoop/common
ENV hdp_tgz_site ${hdp_site}/${hdp_ver}

RUN wget ${hdp_tgz_site}/${hdp_tgz} -O ${hdp_tgz} && \
    mkdir -p /opt && tar -C /opt -xzf ${hdp_tgz} && rm -f ${hdp_tgz} && \
    ln -s /opt/${hdp_ver} /opt/hadoop

# ------------------------------------------------------------------
# User:Group	   Daemons
# hdfs:hadoop	   NameNode, Secondary NameNode, JournalNode, DataNode
# yarn:hadoop	   ResourceManager, NodeManager
# mapred:hadoop	 MapReduce JobHistory Server

ENV HADOOP_GROUP hadoop
ENV HDFS_USER hdfs
ENV YARN_USER yarn
ENV MAPRED_USER mapred

ENV HDP_DATA_ROOT /data/hadoop
ENV LOG_DIR_ROOT /tmp/logs
ENV HADOOP_TMP_DIR /tmp

ENV HADOOP_CONF_DIR  /opt/hadoop/etc/hadoop

ENV DFS_NAME_DIR ${HDP_DATA_ROOT}/hdfs/nn
ENV DFS_DATA_DIR ${HDP_DATA_ROOT}/hdfs/dn
ENV DFS_CHECKPOINT_DIR   ${HDP_DATA_ROOT}/hdfs/snn
ENV HDFS_LOG_DIR ${LOG_DIR_ROOT}/hdfs
ENV HDFS_PID_DIR ${HDP_DATA_ROOT}/pid/hdfs

ENV YARN_LOCAL_DIR ${HDP_DATA_ROOT}/yarn
ENV YARN_LOG_DIR ${LOG_DIR_ROOT}/yarn
ENV YARN_LOCAL_LOG_DIR ${YARN_LOCAL_DIR}/userlogs
ENV YARN_PID_DIR ${HDP_DATA_ROOT}/pid/yarn

ENV YARN_REMOTE_APP_LOG_DIR   /app-logs

ENV MAPRED_LOG_DIR   ${LOG_DIR_ROOT}/mapred
ENV MAPRED_PID_DIR   ${HDP_DATA_ROOT}/pid/mapred

ENV MAPRED_JH_ROOT_DIR /mr-history
ENV MAPRED_JH_INTERMEDIATE_DONE_DIR ${MAPRED_JH_ROOT_DIR}/tmp
ENV MAPRED_JH_DONE_DIR ${MAPRED_JH_ROOT_DIR}/done

#----------------------------------------------------------

# Create groups and users
RUN groupadd ${HADOOP_GROUP} && \
    useradd -g ${HADOOP_GROUP} ${HDFS_USER} && \
    useradd -g ${HADOOP_GROUP} ${YARN_USER} && \
    useradd -g ${HADOOP_GROUP} ${MAPRED_USER}

# Create DATA_DIR_ROOT
RUN mkdir -p ${HDP_DATA_ROOT} && \
    chmod -R 755 ${HDP_DATA_ROOT}

# Create LOG_DIR_ROOT
RUN mkdir -p ${LOG_DIR_ROOT} && \
    chmod -R 1777 ${LOG_DIR_ROOT}
	
RUN mkdir -p ${HADOOP_CONF_DIR}
	
### HDFS DIRs ###########################################################

# DataNode
RUN mkdir -p ${DFS_DATA_DIR} && \
    chown -R ${HDFS_USER}:${HADOOP_GROUP} ${DFS_DATA_DIR} && \
    chmod -R 750 ${DFS_DATA_DIR}

# NameNode
RUN mkdir -p ${DFS_NAME_DIR} && \
    chown -R ${HDFS_USER}:${HADOOP_GROUP} ${DFS_NAME_DIR} && \
    chmod -R 755 ${DFS_NAME_DIR}
	
# HDFS log dir
RUN mkdir -p ${HDFS_LOG_DIR} && \
    chown -R ${HDFS_USER}:${HADOOP_GROUP} ${HDFS_LOG_DIR} && \
    chmod -R 750 ${HDFS_LOG_DIR}
	
# HDFS pid dir	
RUN mkdir -p ${HDFS_PID_DIR} && \
    chown -R ${HDFS_USER}:${HADOOP_GROUP} ${HDFS_PID_DIR} && \
    chmod -R 750 ${HDFS_PID_DIR}

# Secondary NameNode checkpoint dir
RUN mkdir -p ${DFS_CHECKPOINT_DIR} && \
    chown -R ${HDFS_USER}:${HADOOP_GROUP} ${DFS_CHECKPOINT_DIR} && \
    chmod -R 755 ${DFS_CHECKPOINT_DIR}
 
 
### YARN DIRs ########################################################### 
	
# YARN_LOCAL_DIR
RUN mkdir -p ${YARN_LOCAL_DIR} && \
    chown -R ${YARN_USER}:${HADOOP_GROUP} ${YARN_LOCAL_DIR} && \
    chmod -R 755 ${YARN_LOCAL_DIR}

# YARN log dir
RUN mkdir -p ${YARN_LOG_DIR} && \
    chown -R ${YARN_USER}:${HADOOP_GROUP} ${YARN_LOG_DIR} && \
    chmod -R 755 ${YARN_LOG_DIR}

# YARN_LOCAL_LOG_DIR
RUN mkdir -p ${YARN_LOCAL_LOG_DIR} && \
    chown -R ${YARN_USER}:${HADOOP_GROUP} ${YARN_LOCAL_LOG_DIR} && \
    chmod -R 755 ${YARN_LOCAL_LOG_DIR}

# YARN pid dir
RUN mkdir -p $YARN_PID_DIR && \
    chown -R $YARN_USER:$HADOOP_GROUP $YARN_PID_DIR && \
    chmod -R 755 $YARN_PID_DIR
	
	
### MAPRED DIRs ##########################################################	

# MAPRED log dir
RUN mkdir -p $MAPRED_LOG_DIR && \
    chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_LOG_DIR && \
    chmod -R 755 $MAPRED_LOG_DIR
	
# MAPRED pid dir	
RUN mkdir -p $MAPRED_PID_DIR && \
    chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_PID_DIR && \
    chmod -R 755 $MAPRED_PID_DIR

# YARN remote application log dir
RUN mkdir -p ${YARN_REMOTE_APP_LOG_DIR} && \
    chown -R ${YARN_USER}:${HADOOP_GROUP} ${YARN_REMOTE_APP_LOG_DIR} && \
    chmod -R 777 ${YARN_REMOTE_APP_LOG_DIR}


COPY scripts/generate_conf_files.py /tmp/
RUN python2.7 /tmp/generate_conf_files.py ${HADOOP_CONF_DIR}

ENV HADOOP_HOME /opt/hadoop
ENV PATH ${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH



================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/base/scripts/generate_conf_files.py
================================================
import sys
import os
import xml.etree.cElementTree as ET


def add_property(conf, name, value):
    prop = ET.SubElement(conf, 'property')
    ET.SubElement(prop, 'name').text = name
    ET.SubElement(prop, 'value').text = value


def write_xml(root, fname):
    tree = ET.ElementTree(root)
    with open(fname, 'w') as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>')
        tree.write(f)


def generate_xml_conf_file(fname, props):
    root = ET.Element("configuration")
    for name, value in props:
        add_property(root, name, value)
    write_xml(root, fname)


def generate_core_site(fname):
    hostname = 'namenode'
    generate_xml_conf_file(fname, (
        ('fs.defaultFS', 'hdfs://%s:8020' % hostname),
        ('hadoop.tmp.dir', 'file://' + os.environ['HADOOP_TMP_DIR'])
    ))


def generate_hdfs_site(fname):
    generate_xml_conf_file(fname, (
        ('dfs.replication', '1'),
        ('dfs.namenode.name.dir', 'file://' + os.environ['DFS_NAME_DIR']),
        ('dfs.datanode.data.dir', 'file://' + os.environ['DFS_DATA_DIR']),
        ('dfs.namenode.checkpoint.dir', os.environ['DFS_CHECKPOINT_DIR']),
        ('dfs.namenode.checkpoint.edits.dir',
            os.environ['DFS_CHECKPOINT_DIR']),
    ))


def generate_yarn_site(fname):
    generate_xml_conf_file(fname, (
        ('yarn.resourcemanager.hostname', 'resourcemanager'),
        ('yarn.nodemanager.hostname', 'nodemanager'),
        ('yarn.nodemanager.aux-services', 'mapreduce_shuffle'),
        ('yarn.nodemanager.aux-services.mapreduce.shuffle.class',
            'org.apache.hadoop.mapred.ShuffleHandler'),
        # seconds to delay before deleting application
        # localized logs and files. > 0 if debugging.
        ('yarn.nodemanager.delete.debug-delay-sec', '600'),
        ('yarn.nodemanager.log-dirs',
            'file://' + os.environ['YARN_LOCAL_LOG_DIR']),
        ('yarn.log.dir', os.environ['YARN_LOG_DIR']),
        ('yarn.nodemanager.remote-app-log-dir',
            os.environ['YARN_REMOTE_APP_LOG_DIR']),
        ('yarn.log-aggregation-enable', 'true'),
        # ('yarn.log-aggregation.retain-seconds', '360000'),
        # ('yarn.log-aggregation.retain-check-interval-seconds', '360'),
        # ('yarn.log.server.url', 'http://historyserver:19888'),
    ))


def generate_mapred_site(fname):
    generate_xml_conf_file(fname, (
        ('mapreduce.framework.name', 'yarn'),

        # MRv1
        ('mapreduce.jobtracker.address', 'resourcemanager:8021'),
        ('mapreduce.jobtracker.http.address', 'resourcemanager:50030'),
        ('mapreduce.tasktracker.http.address', 'nodemanager:50060'),

        # History Server
        ('mapreduce.jobhistory.address', 'historyserver:10020'),
        ('mapreduce.jobhistory.webapp.address', 'historyserver:19888'),
        ('mapreduce.jobhistory.intermediate-done-dir',
            os.environ['MAPRED_JH_INTERMEDIATE_DONE_DIR']),
        ('mapreduce.jobhistory.done-dir', os.environ['MAPRED_JH_DONE_DIR']),
    ))


def generate_capacity_scheduler(fname):
    generate_xml_conf_file(fname, (
        ('yarn.scheduler.capacity.resource-calculator',
         'org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator'),
        ('yarn.scheduler.capacity.root.queues', 'default'),
        ('yarn.scheduler.capacity.root.default.capacity', '100'),
        ('yarn.scheduler.capacity.root.default.user-limit-factor', '1'),
        ('yarn.scheduler.capacity.root.default.maximum-capacity', '100'),
        ('yarn.scheduler.capacity.root.default.state', 'RUNNING'),
        ('yarn.scheduler.capacity.root.default.acl_submit_applications', '*'),
        ('yarn.scheduler.capacity.root.default.acl_administer_queue', '*'),
        ('yarn.scheduler.capacity.node-locality-delay', '40')))


def main(argv):
    target_dir = argv[1]
    generate_core_site(os.path.join(target_dir, 'core-site.xml'))
    generate_hdfs_site(os.path.join(target_dir, 'hdfs-site.xml'))
    generate_yarn_site(os.path.join(target_dir, 'yarn-site.xml'))
    generate_mapred_site(os.path.join(target_dir, 'mapred-site.xml'))
    generate_capacity_scheduler(os.path.join(target_dir,
                                             'capacity-scheduler.xml'))


main(sys.argv)


================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/base/scripts/zk_set.py
================================================
import sys
import os
from kazoo.client import KazooClient

import logging
logging.basicConfig()

logger = logging.getLogger()
logger.setLevel(logging.INFO)


# KazooClient's second positional argument is the session timeout, not
# the port, so pass a host:port string
kz = KazooClient(hosts='zookeeper:%d' % int(os.environ['ZOO_CLIENT_PORT']))

path = '/' + sys.argv[1]
value = sys.argv[2]

kz.start()
kz.set(path, value)
kz.stop()


================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/base/scripts/zk_wait.py
================================================
import sys
import os
import time
from kazoo.client import KazooClient

import logging
logging.basicConfig()

logger = logging.getLogger()
logger.setLevel(logging.INFO)

host = 'zookeeper'
port = int(os.environ['ZOO_CLIENT_PORT'])
logger.info('Starting on %s:%d', host, port)

# KazooClient's second positional argument is the session timeout, not
# the port, so pass a host:port string
kz = KazooClient(hosts='%s:%d' % (host, port))

path = '/' + sys.argv[1]
logger.info('Path is %s', path)

done = False
while not done:
    kz.start(timeout=15)
    done = kz.exists(path)
    kz.stop()
    if not done:
        time.sleep(10)
logger.info('Found %s', path)


================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/bootstrap/Dockerfile
================================================
#----------------------------------------------------
FROM crs4_pydoop/apache_2.6.0_base:latest


COPY scripts/bootstrap.py /tmp/
COPY scripts/create_hdfs_dirs.sh /tmp/

CMD ["/usr/bin/python", "/tmp/bootstrap.py"]
    


================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/bootstrap/scripts/bootstrap.py
================================================
from kazoo.client import KazooClient
import os
import time
import logging
import platform

logging.basicConfig()

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def etc_updated():
    hostname = platform.node()
    logger.info('Waiting for /etc/hosts to update on %s', hostname)
    if not hostname:
        raise RuntimeError('hostname is undefined')
    with open('/etc/hosts') as f:
        # exact token match: a hostname that is a substring of another
        # linked host's name must not count as a hit
        return sum(hostname in x.split() for x in f) > 1
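

# Boot protocol: for each service, boot_node() creates the znode
# /<nodename> with initial value 'boot'; the service's start script is
# blocked in zk_wait.py until that znode appears, starts its daemon and
# then writes 'up' to the znode via zk_set.py, which ends the polling
# loop below.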


def boot_node(kz, nodename):
    logger.info('Booting %s', nodename)
    path = '/' + nodename
    kz.create(path, 'boot')
    while kz.get(path)[0] != 'up':
        time.sleep(2)
    logger.info('\tdone.')


def main():
    logger.info('Starting bootstrap.')
    zookeeper_host = 'zookeeper'
    zookeeper_port = int(os.environ['ZOO_CLIENT_PORT'])
    while not etc_updated():
        time.sleep(1)
    kz = KazooClient(hosts='%s:%d' % (zookeeper_host, zookeeper_port))
    kz.start()
    boot_node(kz, 'namenode')
    boot_node(kz, 'datanode')
    os.system('bash /tmp/create_hdfs_dirs.sh')
    boot_node(kz, 'resourcemanager')
    boot_node(kz, 'nodemanager')
    boot_node(kz, 'historyserver')
    logger.info('Done with bootstrap.')


main()


================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/bootstrap/scripts/create_hdfs_dirs.sh
================================================
#!/bin/bash

export HADOOP_LOG_DIR=${HDFS_LOG_DIR}
export HADOOP_PID_DIR=${HDFS_PID_DIR}

HADOOP_BIN=${HADOOP_HOME}/bin


# su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -mkdir -p ${YARN_REMOTE_APP_LOG_DIR}"
# su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chown -R ${YARN_USER}:${HADOOP_GROUP} ${YARN_REMOTE_APP_LOG_DIR}"
# su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chmod -R ${YARN_REMOTE_APP_LOG_DIR}"

#for d in ${MAPRED_JH_DONE_DIR} ${MAPRED_JH_INTERMEDIATE_DONE_DIR}
# do
#     su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -mkdir -p ${d}"
#     su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chown -R ${MAPRED_USER}:${HADOOP_GROUP} ${d}"
#     su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chmod -R 777 ${d}"
# done

su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -mkdir -p /tmp"
su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chmod -R 1777 /tmp"

echo "Creating /tmp/hadoop-yarn (owner ${MAPRED_USER}:${HADOOP_GROUP})"
su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -mkdir -p /tmp/hadoop-yarn/staging"
#su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -mkdir -p /tmp/hadoop-yarn/staging/history/tmp"
su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chown -R ${MAPRED_USER}:${HADOOP_GROUP} /tmp/hadoop-yarn"
su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chmod -R 1777 /tmp/hadoop-yarn"


echo "Creating ${MAPRED_JH_INTERMEDIATE_DONE_DIR}"
su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -mkdir -p ${MAPRED_JH_INTERMEDIATE_DONE_DIR}"
echo "Creating ${MAPRED_JH_DONE_DIR}"
su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -mkdir -p ${MAPRED_JH_DONE_DIR}"
echo "Setting ownership (${MAPRED_USER}:${HADOOP_GROUP}) and permissions for ${MAPRED_JH_ROOT_DIR}"
su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chown -R ${MAPRED_USER}:${HADOOP_GROUP} ${MAPRED_JH_ROOT_DIR}"
su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chmod -R 1777 ${MAPRED_JH_ROOT_DIR}"


su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -mkdir -p /user/${UNPRIV_USER}"
su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chown ${UNPRIV_USER} /user/${UNPRIV_USER}"

su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -mkdir -p /user/${MAPRED_USER}"
su ${HDFS_USER} -c "${HADOOP_BIN}/hdfs dfs -chown ${MAPRED_USER} /user/${MAPRED_USER}"
su ${MAPRED_USER} -c "${HADOOP_BIN}/hdfs dfs -mkdir /user/${MAPRED_USER}/logs"


================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/datanode/Dockerfile
================================================
#----------------------------------------------------
FROM crs4_pydoop/apache_2.6.0_base:latest

# DataNode IPC port
EXPOSE  50020

COPY scripts/start_datanode.sh /tmp/

CMD ["/bin/bash", "/tmp/start_datanode.sh"]



================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/datanode/scripts/start_datanode.sh
================================================
#!/bin/bash

#--- standard daemon management: log and PID dirs
export HADOOP_LOG_DIR=${HDFS_LOG_DIR}
export HADOOP_PID_DIR=${HDFS_PID_DIR}

python /tmp/zk_wait.py datanode

su - ${HDFS_USER} -p -c "${HADOOP_HOME}/sbin/hadoop-daemon.sh --config ${HADOOP_CONF_DIR} start datanode"

# FIXME: we should actually check that the datanode is up ...
python /tmp/zk_set.py datanode up

echo "Log is  ${HDFS_LOG_DIR}/*datanode-${HOSTNAME}.out"

tail -f ${HDFS_LOG_DIR}/*datanode-${HOSTNAME}.out





================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/historyserver/Dockerfile
================================================
#----------------------------------------------------
FROM crs4_pydoop/apache_2.6.0_base:latest

# JobHistory Server RPC and web UI ports (see mapred-site.xml)
EXPOSE 10020 19888

COPY scripts/start_historyserver.sh /tmp/

CMD ["/bin/bash", "/tmp/start_historyserver.sh"]



================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/historyserver/scripts/start_historyserver.sh
================================================
#!/bin/bash

python /tmp/zk_wait.py historyserver

# we should actually check that the historyserver is up ...
python /tmp/zk_set.py historyserver up

export HADOOP_JHS_LOGGER=DEBUG,JSA

su ${MAPRED_USER} -c "${HADOOP_HOME}/bin/mapred --config ${HADOOP_CONF_DIR} historyserver > /tmp/logs/historyserver.out 2>&1"




================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/namenode/Dockerfile
================================================
#----------------------------------------------------
FROM crs4_pydoop/apache_2.6.0_base:latest

# HDFS web UI and NameNode RPC (fs.defaultFS) ports
EXPOSE  50070 8020

COPY scripts/start_namenode.sh /tmp/

CMD ["/bin/bash", "/tmp/start_namenode.sh"]
    


================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/namenode/scripts/start_namenode.sh
================================================
#!/bin/bash

#--- standard daemon management: log and PID dirs
export HADOOP_LOG_DIR=${HDFS_LOG_DIR}
export HADOOP_PID_DIR=${HDFS_PID_DIR}

python /tmp/zk_wait.py namenode

su -l ${HDFS_USER} -c "${HADOOP_HOME}/bin/hdfs --config ${HADOOP_CONF_DIR} namenode -format"

su -l -p ${HDFS_USER} -c "${HADOOP_HOME}/sbin/hadoop-daemon.sh --config ${HADOOP_CONF_DIR} start namenode"

# we should actually check that the namenode is up ...
python /tmp/zk_set.py namenode up

echo "log is ${HDFS_LOG_DIR}/*namenode-${HOSTNAME}.out"

tail -f ${HDFS_LOG_DIR}/*namenode-${HOSTNAME}.out


================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/nodemanager/Dockerfile
================================================
#----------------------------------------------------
FROM crs4_pydoop/apache_2.6.0_base:latest

# NodeManager web UI port
EXPOSE 8042

COPY scripts/start_nodemanager.sh /tmp/

CMD ["/bin/bash", "/tmp/start_nodemanager.sh"]



================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/nodemanager/scripts/start_nodemanager.sh
================================================
#!/bin/bash

export YARN_LOG_DIR=${YARN_LOG_DIR}
export HADOOP_PID_DIR=${HDFS_PID_DIR}

python /tmp/zk_wait.py nodemanager

# YARN_OPTS="$YARN_OPTS -Dhadoop.log.dir=$YARN_LOG_DIR"
# YARN_OPTS="$YARN_OPTS -Dyarn.log.dir=$YARN_LOG_DIR"
# YARN_OPTS="$YARN_OPTS -Dhadoop.log.file=$YARN_LOGFILE"
# YARN_OPTS="$YARN_OPTS -Dyarn.log.file=$YARN_LOGFILE"
# YARN_OPTS="$YARN_OPTS -Dyarn.home.dir=$YARN_COMMON_HOME"
# YARN_OPTS="$YARN_OPTS -Dyarn.id.str=$YARN_IDENT_STRING"
# YARN_OPTS="$YARN_OPTS -Dhadoop.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
# YARN_OPTS="$YARN_OPTS -Dyarn.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"


su - ${YARN_USER} -p -c "${HADOOP_HOME}/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager"

# we should actually check that the nodemanager is up ...
python /tmp/zk_set.py nodemanager up

echo "log is ${YARN_LOG_DIR}/*nodemanager-${HOSTNAME}.out"

tail -f ${YARN_LOG_DIR}/*nodemanager-${HOSTNAME}.out



================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/resourcemanager/Dockerfile
================================================
#----------------------------------------------------
FROM crs4_pydoop/apache_2.6.0_base:latest

# ResourceManager web UI and RPC ports (8021: MRv1 JobTracker)
EXPOSE 8088 8021 8031 8032 8033

COPY scripts/start_resourcemanager.sh /tmp/

CMD ["/bin/bash", "/tmp/start_resourcemanager.sh"]



================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/resourcemanager/scripts/start_resourcemanager.sh
================================================
#!/bin/bash

export YARN_LOG_DIR=${YARN_LOG_DIR}
export HADOOP_PID_DIR=${HDFS_PID_DIR}
export YARN_OPTS=''

export HADOOP_MAPRED_LOG_DIR=${YARN_LOG_DIR}

# YARN_OPTS="$YARN_OPTS -Dhadoop.log.dir=$YARN_LOG_DIR"
# YARN_OPTS="$YARN_OPTS -Dyarn.log.dir=$YARN_LOG_DIR"
# YARN_OPTS="$YARN_OPTS -Dhadoop.log.file=$YARN_LOGFILE"
# YARN_OPTS="$YARN_OPTS -Dyarn.log.file=$YARN_LOGFILE"
# YARN_OPTS="$YARN_OPTS -Dyarn.home.dir=$YARN_COMMON_HOME"
# YARN_OPTS="$YARN_OPTS -Dyarn.id.str=$YARN_IDENT_STRING"
# YARN_OPTS="$YARN_OPTS -Dhadoop.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
# YARN_OPTS="$YARN_OPTS -Dyarn.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"


python /tmp/zk_wait.py resourcemanager

su - ${YARN_USER} -p -c "${HADOOP_HOME}/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager"

# su - ${MAPRED_USER} -p -c "${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh --config ${HADOOP_CONF_DIR} start historyserver"

# we should actually check that the resourcemanager is up ...
python /tmp/zk_set.py resourcemanager up

echo "log is ${YARN_LOG_DIR}/*resourcemanager-${HOSTNAME}.out"

tail -f ${YARN_LOG_DIR}/*resourcemanager-${HOSTNAME}.out



================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/zookeeper/Dockerfile
================================================
#----------------------------------------------------
FROM crs4_pydoop/apache_2.6.0_base:latest

EXPOSE 2181

CMD ["/opt/zookeeper/bin/zkServer.sh", "start-foreground"]
    


================================================
FILE: dev_tools/docker/clusters/apache_2.6.0/images/zookeeper/scripts/start_namenode.sh
================================================
#!/bin/bash

#--- standard daemon management: log and PID dirs
export HADOOP_LOG_DIR=${HDFS_LOG_DIR}
export HADOOP_PID_DIR=${HDFS_PID_DIR}

python /tmp/zk_wait.py namenode

su ${HDFS_USER} -c "${HADOOP_HOME}/bin/hdfs --config ${HADOOP_CONF_DIR} namenode -format"

# we should actually check that the namenode is up ...
python /tmp/zk_set.py namenode up

su ${HDFS_USER} -c "${HADOOP_HOME}/bin/hdfs --config ${HADOOP_CONF_DIR} namenode"




================================================
FILE: dev_tools/docker/images/base/Dockerfile
================================================
#----------------------------------------------------
#
# A basic Java machine: Oracle Java 8, basic services, IPv6 disabled
#----------------------------------------------------
FROM debian:latest

#----------------------------------------------------
# Install java and basic services
RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee /etc/apt/sources.list.d/webupd8team-java.list && \
    echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list && \
    apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886 && \
    apt-get update && \
    echo yes | apt-get install -y --force-yes oracle-java8-installer && \
    apt-get install -y \
    apt-utils \
    openssh-server \
    python \
    python-pip \
    wget

ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
RUN echo "export JAVA_HOME=${JAVA_HOME}" >> /etc/profile.d/java.sh

#----------------------------------------------------
# disable ipv6
RUN echo "net.ipv6.conf.all.disable_ipv6=1"     >> /etc/sysctl.conf && \
    echo "net.ipv6.conf.default.disable_ipv6=1" >> /etc/sysctl.conf && \
    echo "net.ipv6.conf.lo.disable_ipv6=1"      >>  /etc/sysctl.conf

#----------------------------------------------------
# add default unprivileged user (Alfred E. Neuman, "What? Me worry?")
ENV UNPRIV_USER aen
RUN useradd -m ${UNPRIV_USER} -s /bin/bash && \
    echo "${UNPRIV_USER}:hadoop" | chpasswd
    
RUN mkdir -p /root/.ssh && \
    ssh-keygen -t dsa -P '' -f /root/.ssh/id_dsa && \
    cat /root/.ssh/id_dsa.pub >> /root/.ssh/authorized_keys

================================================
FILE: dev_tools/docker/images/client/Dockerfile
================================================
#----------------------------------------------------
FROM crs4_pydoop/base:latest

#----------------------------------
# Install development tools
# No apt-get update here: we must stay in line with the base image
RUN apt-get install -y git build-essential python-dev

#----------------------------------
# Enable sshd
RUN mkdir /var/run/sshd
RUN echo 'root:hadoop' | chpasswd
RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config

# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd

ENV NOTVISIBLE "in users profile"
RUN echo "export VISIBLE=now" >> /etc/profile

EXPOSE 22

#-----------------------------------
CMD ["/usr/sbin/sshd", "-D"]

================================================
FILE: dev_tools/docker/scripts/build_base_images.sh
================================================
#!/bin/bash

current_path=$(cd $(dirname ${BASH_SOURCE}); pwd; cd - >/dev/null)
images_path="${current_path}/../images"

echo "Building crs4_pydoop/base image (path: ${images_path}/base)"
docker build -t crs4_pydoop/base	${images_path}/base

echo "Building crs4_pydoop/client image (path: ${images_path}/client)"
docker build -t crs4_pydoop/client ${images_path}/client


================================================
FILE: dev_tools/docker/scripts/build_cluster_images.sh
================================================
#!/bin/bash

TAG=${1}

CL_DIR=${TAG}/images
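
# E.g., with TAG=apache_2.6.0 this builds crs4_pydoop/apache_2.6.0_base,
# crs4_pydoop/apache_2.6.0_namenode, etc. -- one image per subdirectory
# of ${TAG}/images that contains a Dockerfile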

for d in ${CL_DIR}/*
do
    if [ -d ${d} -a -e ${d}/Dockerfile ]; then
        base=${d##${CL_DIR}/}
        docker build -t crs4_pydoop/${TAG}_${base} ${d}
    fi
done
         
exit

# docker build -t crs4_pydoop/${TAG}_base     ${CL_DIR}/base
# docker build -t crs4_pydoop/${TAG}_zookeeper ${CL_DIR}/zookeeper
# docker build -t crs4_pydoop/${TAG}_namenode ${CL_DIR}/namenode
# docker build -t crs4_pydoop/${TAG}_datanode ${CL_DIR}/datanode
# docker build -t crs4_pydoop/${TAG}_resourcemanager ${CL_DIR}/resourcemanager
# docker build -t crs4_pydoop/${TAG}_nodemanager ${CL_DIR}/nodemanager
# docker build -t crs4_pydoop/${TAG}_historyserver ${CL_DIR}/historyserver
# docker build -t crs4_pydoop/${TAG}_bootstrap     ${CL_DIR}/bootstrap




================================================
FILE: dev_tools/docker/scripts/share_etc_hosts.py
================================================
import os
import sys
import ssl
import logging
from docker import tls
from docker import Client


logging.basicConfig()

logger = logging.getLogger('share_etc_hosts')
logger.setLevel(logging.DEBUG)


class App(object):
    def __init__(self, compose_group_name):
        self.client = docker_client()
        self.containers = self._get_containers(compose_group_name)

    def _get_containers(self, compose_group_name):
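        # docker-compose (v1) names containers like /<project>_<service>_<n>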
        head = '/%s_' % compose_group_name
        cs = [c for c in self.client.containers()
              if c['Names'][0].startswith(head)]
        return cs

    def _get_hosts(self):
        hosts = {}
        for c in self.containers:
            d = self.client.inspect_container(c['Id'])
            hosts[c['Id']] = (d['NetworkSettings']['IPAddress'],
                              d['Config']['Hostname'])
        return hosts

    def share_etc_hosts(self):
        hosts = self._get_hosts()
        host_table = str('\n'.join(['%s\t%s' % h for h in hosts.itervalues()]))
        logger.debug('Host table is:\n%s', host_table)
        cmd = '/bin/bash -c "echo -e %r >> /etc/hosts"' % host_table
        for k in hosts:
            logger.debug('Updating %s', k)
            print(self.client.execute(k, cmd))


def docker_client():
    """
    Returns a docker-py client configured using environment variables
    according to the same logic as the official Docker client.
    """
    cert_path = os.environ.get('DOCKER_CERT_PATH', '')
    if cert_path == '':
        cert_path = os.path.join(os.environ.get('HOME', ''), '.docker')

    base_url = os.environ.get('DOCKER_HOST')
    tls_config = None

    if os.environ.get('DOCKER_TLS_VERIFY', '') != '':
        parts = base_url.split('://', 1)
        base_url = '%s://%s' % ('https', parts[1])

        client_cert = (os.path.join(cert_path, 'cert.pem'),
                       os.path.join(cert_path, 'key.pem'))
        ca_cert = os.path.join(cert_path, 'ca.pem')

        tls_config = tls.TLSConfig(
            ssl_version=ssl.PROTOCOL_TLSv1,
            verify=True,
            assert_hostname=False,
            client_cert=client_cert,
            ca_cert=ca_cert,
        )

    timeout = int(os.environ.get('DOCKER_CLIENT_TIMEOUT', 60))
    return Client(
        base_url=base_url, tls=tls_config, version='1.15', timeout=timeout
    )


def main(argv):
    tag = argv[1].replace('.', '').replace('_', '')
    logger.info('Tag is:%s', tag)
    app = App(tag)
    app.share_etc_hosts()


main(sys.argv)


================================================
FILE: dev_tools/docker/scripts/start_client.sh
================================================
#!/bin/bash

#-------------------------------------------
#
# Insert a new client in a running cluster
#
# Usage:
#        $ cd client_side_tests/<client>
#        $ ../../scripts/start_client.sh <PORT>
#
real_path=`readlink -f ${BASH_SOURCE[0]}`
script_dir=`dirname ${real_path}`
share_hosts_bin="python ${script_dir}/share_etc_hosts.py"

client_dir=`basename $PWD`
port=${1:-3333}

if [[ -z "${DOCKER_HOST_IP}" ]]
then 
	echo "No explicit DOCKER_HOST_IP in your env: localhost is assumed"
	DOCKER_HOST_IP=localhost
fi

# We assume that there is only one service with that name
cluster_tag=$(docker ps | grep resourcemanager | \
                     awk '{print $NF}'| sed -e 's/_.*$//')
client_name=${cluster_tag}_client_${client_dir}
docker run -d --name ${client_name} -p ${port}:22 crs4_pydoop/client:latest
${share_hosts_bin} ${cluster_tag}

rm_id=$(docker ps | grep resourcemanager | awk '{print $1}')
client_id=$(docker ps | grep ${client_name} | awk '{print $1}')

(cat ${HOME}/.ssh/id_dsa.pub | docker exec -i ${client_id} tee -a /root/.ssh/authorized_keys) > /dev/null

if [ -x ./initialize.sh ]; then
    ./initialize.sh ${port} ${client_id} ${rm_id}  ${DOCKER_HOST_IP}
fi




================================================
FILE: dev_tools/docker/scripts/start_cluster.sh
================================================
#!/bin/bash

cluster_name=$1
script_dir=$(cd $(dirname ${BASH_SOURCE}); pwd; cd - >/dev/null)
share_hosts_bin="python ${script_dir}/share_etc_hosts.py"
cluster_path="${script_dir}/../clusters/${cluster_name}"

tag=`echo ${cluster_name} | tr -d '._/'`

cd ${cluster_path}

docker-compose stop
docker-compose rm

for x in logs local
do
    if [ -d ${x} ]; then
        backup=${x}.backup.$$
        mv ${x} ${backup}
        echo "Moved ${x} to ${backup}"
    fi
    mkdir ${x}
    chmod 1777 ${x}
done

docker-compose up -d
${share_hosts_bin} ${tag}


================================================
FILE: dev_tools/docker_build
================================================
#!/usr/bin/env bash

set -euo pipefail
this="${BASH_SOURCE-$0}"
this_dir=$(cd -P -- "$(dirname -- "${this}")" && pwd -P)

pushd "${this_dir}/.."
docker build --build-arg HADOOP_MAJOR_VERSION=2 -t crs4/pydoop-hadoop2 .
docker build -t crs4/pydoop .
docker build -t crs4/pydoop-docs -f Dockerfile.docs .
popd


================================================
FILE: dev_tools/dump_app_params
================================================
#!/usr/bin/env python

"""
Dump app options in rst table format.
"""

import sys
import argparse

import pydoop.app.main


AUTOGEN_NOTICE = """\
..
  Auto-generated by %(prog)s. DO NOT EDIT!
  To update, run:
    %(prog)s --app %(app)s -o %(out_fn)s

"""


def set_option_attrs(actions):
    for a in actions:
        opts = a.option_strings
        assert len(opts) > 0
        try:
            a.short_opt, a.long_opt = opts
        except ValueError:
            o = opts[0]
            assert o.startswith('-')
            if o.startswith('--'):
                a.short_opt, a.long_opt = None, o
            else:
                a.short_opt, a.long_opt = o, None


def get_col_widths(actions):
    lengths = {}
    for a in actions:
        for n in 'short_opt', 'long_opt', 'help':
            attr = getattr(a, n)
            lengths.setdefault(n, []).append(0 if attr is None else len(attr))
    widths = dict((k, max(v)) for k, v in lengths.items())
    # add 4 for ``backticks``
    for n in 'short_opt', 'long_opt':
        widths[n] += 4
    return widths


class Formatter(object):
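    """
    Render the actions as an RST grid table; schematically::

      +----------+--------------+--------------+
      | Short    | Long         | Meaning      |
      +==========+==============+==============+
      | ``-o``   | ``--out-fn`` | output file  |
      +----------+--------------+--------------+
    """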

    NAMES = 'short_opt', 'long_opt', 'help'

    def __init__(self, actions):
        self.col_widths = get_col_widths(actions)
        self.actions = actions

    def format_line(self, fields):
        ln = [f.ljust(self.col_widths[n]) for f, n in zip(fields, self.NAMES)]
        return '| %s |' % ' | '.join(ln)

    def format_action(self, action):
        ln = []
        for n in 'short_opt', 'long_opt':
            opt = getattr(action, n)
            ln.append('``%s``' % opt if opt else '')
        ln.append(getattr(action, 'help'))
        return self.format_line(ln)

    def hline(self, filler='-'):
        ln = []
        for n in self.NAMES:
            ln.append(filler * self.col_widths[n])
        return '+{0}{1}{0}+'.format(
            filler, '{0}+{0}'.format(filler).join(ln)
        )

    def header_lines(self):
        lines = [self.hline()]
        lines.append(self.format_line(['Short', 'Long', 'Meaning']))
        lines.append(self.hline(filler='='))
        return lines

    def dump_table(self, outf, exclude_h=True):
        for ln in self.header_lines():
            outf.write(ln + '\n')
        for a in self.actions:
            if exclude_h and a.short_opt == '-h':
                continue
            outf.write(self.format_action(a) + '\n')
            outf.write(self.hline() + '\n')


def make_parser():
    parser = argparse.ArgumentParser(description='dump pydoop app help')
    parser.add_argument('-o', '--out-fn', metavar='FILE', help='output file')
    parser.add_argument('--app', metavar='PYDOOP_APP_NAME', default='script')
    return parser


def main():
    parser = make_parser()
    args = parser.parse_args()
    outf = None
    pydoop_parser = pydoop.app.main.make_parser()
    subp = pydoop_parser._pydoop_docs_helper[args.app]
    act_map = dict((_.title, _._group_actions) for _ in subp._action_groups)
    actions = act_map['optional arguments']
    set_option_attrs(actions)
    fmt = Formatter(actions)
    try:
        outf = open(args.out_fn, 'w') if args.out_fn else sys.stdout
        outf.write(AUTOGEN_NOTICE % {
            'prog': sys.argv[0],
            'app': args.app,
            'out_fn': args.out_fn
        })
        fmt.dump_table(outf)
    finally:
        if outf and outf is not sys.stdout:
            outf.close()


if __name__ == '__main__':
    main()


================================================
FILE: dev_tools/edit_conf
================================================
#!/usr/bin/env python

"""\
A utility to edit hadoop configuration files.

Usage::

  $ edit_conf conf/yarn-site.xml tmp.xml \
       yarn.nodemanager.resource.cpu-vcores 2 \
       yarn.nodemanager.resource.memory-mb 1024
"""

from lxml import etree as ET
import sys


def doc_to_dict(doc):
    props = {}
    root = doc.getroot()
    for p in root.findall('property'):
        props[p.find('name').text] = p.find('value').text
    return props


def dict_to_doc(props):
    doc = ET.ElementTree(ET.fromstring('<configuration/>'))
    root = doc.getroot()
    pi = ET.ProcessingInstruction(
        'xml-stylesheet',
        'type="text/xsl" href="configuration.xsl"')
    root.addprevious(pi)
    for k in props:
        p = ET.SubElement(root, "property")
        name = ET.SubElement(p, "name")
        val = ET.SubElement(p, "value")
        name.text, val.text = k, props[k]
    return doc


def main(argv):
    assert len(argv) >= 2 and not (len(argv) & 0x01)
    conf_input = argv[0]
    conf_output = argv[1]
    doc = ET.parse(conf_input)
    props = doc_to_dict(doc)
    ai = iter(argv[2:])
    for k, v in zip(ai, ai):
        props[k] = v
    ndoc = dict_to_doc(props)
    with open(conf_output, 'wb') as f:
        f.write(ET.tostring(
            ndoc,
            encoding="utf-8",
            xml_declaration=True,
            pretty_print=True
        ))


if __name__ == "__main__":
    main(sys.argv[1:])


================================================
FILE: dev_tools/git_export
================================================
#!/usr/bin/env python

"""
Export git working copy including uncommitted changes
"""

import sys
import os
import argparse
import shutil
import subprocess as sp


THIS_DIR = os.path.dirname(os.path.abspath(__file__))
PARENT_DIR = os.path.dirname(THIS_DIR)
DEFAULT_EXPORT_DIR = os.path.join(PARENT_DIR, "git_export")


def get_sources():
    cmd = "git ls-files --full-name %s" % PARENT_DIR
    return sp.check_output(cmd, shell=True).splitlines()


def export(sources, export_root):
    if os.path.isdir(export_root):
        shutil.rmtree(export_root)
    os.makedirs(export_root)
    for fn in sources:
        d, bn = os.path.split(fn)
        if bn.startswith(".git"):
            print "skipping", fn
            continue
        d = os.path.join(export_root, d)
        if not os.path.isdir(d):
            os.makedirs(d)
        in_path = os.path.join(PARENT_DIR, fn)
        if os.path.islink(in_path):
            in_path = os.path.realpath(in_path)
            out_path = os.path.join(d, bn)
            if os.path.isdir(in_path):
                shutil.copytree(in_path, out_path, symlinks=True)
            else:
                shutil.copy(in_path, out_path)
        else:
            shutil.copy(in_path, d)


def make_parser():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("-o", "--output-dir", metavar="DIR",
                        help="output directory", default=DEFAULT_EXPORT_DIR)
    return parser


def main(argv):
    parser = make_parser()
    args = parser.parse_args(argv[1:])
    sources = get_sources()
    export(sources, args.output_dir)


if __name__ == "__main__":
    main(sys.argv)


================================================
FILE: dev_tools/import_src
================================================
#!/usr/bin/env python

"""
Import Hadoop pipes/utils source code.

NOTE: starting from cdh4.3, there is a single Hadoop tarball with both
mr2 and mr1 code. The latter is located in:
${HADOOP_HOME}/src/hadoop-mapreduce1-project/. To fetch the code for
mrv1, run import_src ${HADOOP_HOME}/src/hadoop-mapreduce1-project; to
fetch the code for mrv2, run import_src ${HADOOP_HOME} --skip-dir
hadoop-mapreduce1-project.
"""

import sys, os, argparse, warnings, shutil


WANTED = {  # basename: relative location
  "StringUtils.cc": "utils/impl",
  "SerialUtils.cc": "utils/impl",
  "StringUtils.hh": "utils/api/hadoop",
  "SerialUtils.hh": "utils/api/hadoop",
  "HadoopPipes.cc": "pipes/impl",
  "Pipes.hh": "pipes/api/hadoop",
  "TemplateFactory.hh": "pipes/api/hadoop",
  #--- libhdfs, all versions ---
  "hdfs.h": "libhdfs",
  "hdfs.c": "libhdfs",
  # --- libhdfs, old versions ---
  "hdfsJniHelper.h": "libhdfs",
  "hdfsJniHelper.c": "libhdfs",
  # --- libhdfs, recent versions ---
  "jni_helper.h": "libhdfs",
  "jni_helper.c": "libhdfs",
  "native_mini_dfs.h": "libhdfs",
  "native_mini_dfs.c": "libhdfs",
  "exception.h": "libhdfs",
  "exception.c": "libhdfs",
  # --- java pipes ---
  "Application.java": "org/apache/hadoop/mapred/pipes",
  "BinaryProtocol.java": "org/apache/hadoop/mapred/pipes",
  "DownwardProtocol.java": "org/apache/hadoop/mapred/pipes",
  "OutputHandler.java": "org/apache/hadoop/mapred/pipes",
  "PipesMapRunner.java": "org/apache/hadoop/mapred/pipes",
  "PipesNonJavaInputFormat.java": "org/apache/hadoop/mapred/pipes",
  "PipesPartitioner.java": "org/apache/hadoop/mapred/pipes",
  "PipesReducer.java": "org/apache/hadoop/mapred/pipes",
  "Submitter.java": "org/apache/hadoop/mapred/pipes",
  "UpwardProtocol.java": "org/apache/hadoop/mapred/pipes",
  "LocalJobRunner.java": "org/apache/hadoop/mapred",
  }


def get_sources(root_dir, skip=None):
  sources = {}
  for d, _, basenames in os.walk(root_dir):
    if skip in d.split(os.sep):
      continue
    for bn in basenames:
      if bn in WANTED:
        if d.endswith(WANTED[bn]):
          sources[bn] = os.path.join(d, bn)
  missing = set(WANTED) - set(sources)
  if missing:
    warnings.warn("not found: %r" % (sorted(missing),))
  return sources


def make_parser():
  parser = argparse.ArgumentParser(description=__doc__)
  parser.add_argument('hadoop_home', metavar="HADOOP_HOME")
  parser.add_argument("-o", "--output-dir", metavar="DIR",
                      help="output directory")
  parser.add_argument("-s", "--skip-dir", metavar="DIR",
                      help="skip directories with this basename")
  return parser


def main(argv):
  parser = make_parser()
  args = parser.parse_args(argv[1:])
  if not args.output_dir:
    this_dir = os.path.dirname(os.path.abspath(__file__))
    parent_dir = os.path.dirname(this_dir)
    args.output_dir = os.path.join(
      parent_dir, "src", os.path.basename(args.hadoop_home.rstrip("/"))
      )
  if args.skip_dir:
    args.skip_dir = os.path.basename(args.skip_dir)
  sources = get_sources(args.hadoop_home, skip=args.skip_dir)
  for bn, p in sources.iteritems():
    out_dir = os.path.join(args.output_dir, WANTED[bn])
    try:
      os.makedirs(out_dir)
    except OSError:
      pass
    shutil.copy(p, out_dir)
    print "%s -> %s" % (p, out_dir) 


if __name__ == "__main__":
  main(sys.argv)


================================================
FILE: dev_tools/mapred_pipes
================================================
#!/usr/bin/env bash

# Set up the layout needed to build the "mapred" version of pipes

set -euo pipefail
this="${BASH_SOURCE-$0}"
this_dir=$(cd -P -- "$(dirname -- "${this}")" && pwd -P)

if [ $# -lt 1 ]; then
    echo "Usage: $0 HADOOP_SRC"
    exit 1
fi
if [ ! -d "${1}"/hadoop-mapreduce-project ]; then
    echo "ERROR: \"$1\" does not look like a Hadoop source dir"
    exit 1
fi
hadoop_src=${1}

pushd "${this_dir}/.."
mapred_pipes_dir=src/it/crs4/pydoop/mapred/pipes
rm -rf "${mapred_pipes_dir}"
mkdir -p "${mapred_pipes_dir}"
cp -rf "${hadoop_src}"/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/pipes/* "${mapred_pipes_dir}"/
sed -i 's/package org\.apache\.hadoop/package it\.crs4\.pydoop/g' "${mapred_pipes_dir}"/*

# not exactly future-proof: relies on setup.py containing the literal
# string "self.java_files = "
sed_cmd="s|self\.java_files = |self\.java_files = glob.glob(\"${mapred_pipes_dir}/*.java\") + |"
sed -i "${sed_cmd}" setup.py
popd


================================================
FILE: dev_tools/unpack_debian
================================================
#!/usr/bin/env python

"""
Unpack debian packages -- a quick shortcut for debug purposes.
"""

import sys, os, argparse, shutil, subprocess as sp


THIS_DIR = os.path.dirname(os.path.abspath(__file__))
PARENT_DIR = os.path.dirname(THIS_DIR)
DEFAULT_FROM_DIR = os.path.join(PARENT_DIR, "sandbox")
DEFAULT_TO_DIR = os.path.join(PARENT_DIR, "temp")


def get_pkg_map(from_dir):
  pkg_map = {}
  for fn in os.listdir(from_dir):
    if fn.endswith(".deb"):
      tag = fn.split("_", 1)[0]
      pkg_map[tag] = os.path.abspath(os.path.join(from_dir, fn))
  return pkg_map


def unpack(pkg_map, to_dir):
  if os.path.isdir(to_dir):
    shutil.rmtree(to_dir)
  os.makedirs(to_dir)
  for tag, fn in pkg_map.iteritems():
    d = os.path.join(to_dir, tag)
    os.makedirs(d)
    old_wd = os.getcwd()
    os.chdir(d)
    print "unpacking %s to %s" % (fn, d)
    sp.check_call("ar x %s" % fn, shell=True)
    sp.check_call("tar xf data.tar.gz", shell=True)
    sp.check_call("tar xf control.tar.gz", shell=True)
    os.chdir(old_wd)


def make_parser():
  parser = argparse.ArgumentParser(description=__doc__)
  parser.add_argument("-i", "--input-dir", metavar="DIR",
                      help="input directory", default=DEFAULT_FROM_DIR)
  parser.add_argument("-o", "--output-dir", metavar="DIR",
                      help="output directory", default=DEFAULT_TO_DIR)
  return parser


def main(argv):
  parser = make_parser()
  args = parser.parse_args(argv[1:])
  pkg_map = get_pkg_map(args.input_dir)
  unpack(pkg_map, args.output_dir)


if __name__ == "__main__":
  main(sys.argv)


================================================
FILE: dev_tools/update_docs
================================================
#!/bin/bash

set -eu

die() {
    echo "$1" 1>&2
    exit 1
}

DOCS_PREFIX="docs/_build/html"
REPO="https://github.com/crs4/pydoop.git"

[ -f "setup.py" ] || die "ERROR: run from the main repo dir"

git subtree pull --prefix="${DOCS_PREFIX}" "${REPO}" gh-pages --squash
make docs
git add "${DOCS_PREFIX}"
git commit -a -m "updated gh-pages"
git subtree push --prefix="${DOCS_PREFIX}" "${REPO}" gh-pages --squash


================================================
FILE: docs/Makefile
================================================
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
PAPER         =
BUILDDIR      = _build

# Internal variables.
PAPEROPT_a4     = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml pickle json htmlhelp qthelp latex changes linkcheck doctest

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html      to make standalone HTML files"
	@echo "  dirhtml   to make HTML files named index.html in directories"
	@echo "  pickle    to make pickle files"
	@echo "  json      to make JSON files"
	@echo "  htmlhelp  to make HTML files and a HTML help project"
	@echo "  qthelp    to make HTML files and a qthelp project"
	@echo "  latex     to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  changes   to make an overview of all changed/added/deprecated items"
	@echo "  linkcheck to check all external links for integrity"
	@echo "  doctest   to run all doctests embedded in the documentation (if enabled)"

clean:
	-rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Pydoop.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Pydoop.qhc"

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make all-pdf' or \`make all-ps' in that directory to" \
	      "run these through (pdf)latex."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."


================================================
FILE: docs/_build/.gitignore
================================================
*
!.gitignore
!html


================================================
FILE: docs/_templates/layout.html
================================================
{% extends "!layout.html" %}


{%- macro mysidebar() %}
      {%- if not embedded %}{% if not theme_nosidebar|tobool %}
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
          {%- block sidebarlogo %}
          {%- if logo %}
            <p class="logo"><a href="{{ pathto(master_doc) }}">
              <img class="logo" src="{{ pathto('_static/' + logo, 1) }}" alt="Logo"/>
            </a></p>
          {%- endif %}
          {%- endblock %}
          {%- block sidebartoc %}
          {%- if display_toc %}
            <h3><a href="{{ pathto(master_doc) }}">{{ _('Table Of Contents') }}</a></h3>
            {{ toc }}
          {%- endif %}
          {%- endblock %}
          {%- block sidebarrel %}
          {%- if prev %}
            <h4>{{ _('Previous topic') }}</h4>
            <p class="topless"><a href="{{ prev.link|e }}"
                                  title="{{ _('previous chapter') }}">{{ prev.title }}</a></p>
          {%- endif %}
          {%- if next %}
            <h4>{{ _('Next topic') }}</h4>
            <p class="topless"><a href="{{ next.link|e }}"
                                  title="{{ _('next chapter') }}">{{ next.title }}</a></p>
          {%- endif %}
          {%- endblock %}
          {%- block sidebarsourcelink %}
          {%- endblock %}

					<h4>Get Pydoop</h4>
					<ul>
						<li> <a href="https://pypi.python.org/pypi/pydoop">Download page</a> </li>
						<li> <a href="{{ pathto('installation') }}"> Installation Instructions </a> </li>
					</ul>

					<h4>Contributors</h4>
					<p class="topless">
					Pydoop is developed by:
					<a href="http://www.crs4.it">
						<img src="{{ pathto("_static/crs4.png", 1) }}" alt="CRS4" width="200" height="60" />
					</a>
					</p>
          {%- if customsidebar %}
          {% include customsidebar %}
          {%- endif %}
          {%- block sidebarsearch %}
          {%- if pagename != "search" %}
          <div id="searchbox" style="display: none">
            <h3>{{ _('Quick search') }}</h3>
              <form class="search" action="{{ pathto('search') }}" method="get">
                <input type="text" name="q" size="18" />
                <input type="submit" value="{{ _('Go') }}" />
                <input type="hidden" name="check_keywords" value="yes" />
                <input type="hidden" name="area" value="default" />
              </form>
              <p class="searchtip" style="font-size: 90%">
              {{ _('Enter search terms or a module, class or function name.') }}
              </p>
          </div>
          <script type="text/javascript">$('#searchbox').show(0);</script>
          {%- endif %}
          {%- endblock %}
        </div>
      </div>
      {%- endif %}{% endif %}
{%- endmacro %}


{% block rootrellink %}
	<li><a href="{{ pathto('index') }}">Home</a>|&nbsp;</li>
	<li><a href="{{ pathto('installation') }}">Installation</a>|&nbsp;</li>
	<li><a href="https://github.com/crs4/pydoop/issues">Support</a>|&nbsp;</li>
	<li><a href="https://github.com/crs4/pydoop">Git Repo</a>|&nbsp;</li>
	<li><a href="https://crs4.github.io/pydoop/_pydoop1">Pydoop 1</a></li>
{% endblock %}

{# put the sidebar before the body #}
{% block sidebar1 %}
{{ mysidebar() }}
{% endblock %}
{% block sidebar2 %}{% endblock %}


================================================
FILE: docs/api_docs/hadut.rst
================================================
.. _hadut:

:mod:`pydoop.hadut` --- Hadoop shell interaction
================================================

.. automodule:: pydoop.hadut
   :members:


================================================
FILE: docs/api_docs/hdfs_api.rst
================================================
.. _hdfs-api:

:mod:`pydoop.hdfs` --- HDFS API
===============================

.. automodule:: pydoop.hdfs
   :members:

.. automodule:: pydoop.hdfs.path
   :members:

.. automodule:: pydoop.hdfs.fs
   :members:

.. automodule:: pydoop.hdfs.file
   :members: FileIO

.. autoclass:: pydoop.hdfs.file.local_file


================================================
FILE: docs/api_docs/index.rst
================================================
.. _api-docs:

API Docs
========

.. toctree::

   mr_api
   hdfs_api
   hadut


================================================
FILE: docs/api_docs/mr_api.rst
================================================
.. _mr_api:

:mod:`pydoop.mapreduce.api` --- MapReduce API
=============================================

.. automodule:: pydoop.mapreduce.api
   :members:

.. autofunction:: pydoop.mapreduce.pipes.run_task


================================================
FILE: docs/conf.py
================================================
# -*- coding: utf-8 -*-
#
# Pydoop documentation build configuration file, created by
# sphinx-quickstart on Sun Jun 20 17:06:55 2010.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

import datetime

FIRST_RELEASE_YEAR = 2009
CURRENT_YEAR = datetime.datetime.now().year

# No need to hack the path, we install before building docs
# sys.path[1:1] = [ os.path.abspath('../pydoop') ]

# -- General configuration ----------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.doctest',
    'sphinx.ext.imgmath',
    'sphinx.ext.ifconfig',
    'sphinx.ext.intersphinx'
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix of source filenames.
source_suffix = '.rst'

# The encoding of source files.
# source_encoding = 'utf-8'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = u'Pydoop'
copyright = u'%d-%d, CRS4' % (FIRST_RELEASE_YEAR, CURRENT_YEAR)

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#

# The short X.Y version.
with open("../VERSION") as f:
    version_string = f.read().strip()
version = ".".join(version_string.split(".", 2)[:2])
# The full version, including alpha/beta/rc tags.
release = version_string
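# E.g., a VERSION file containing "2.0.0a3" yields version = "2.0" and
# release = "2.0.0a3".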

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'

# Avoid doc-not-included-in-toctree warning
exclude_patterns = [
    'pydoop_script_options.rst',  # included with ..include::
    'pydoop_submit_options.rst',  # included with ..include::
]

# List of directories, relative to source directory, that shouldn't be searched
# for source files.
exclude_trees = ['_build']

# The reST default role (used for this markup: `text`) to use for all
# documents.
# default_role = None

# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True

# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True

# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
# show_authors = False

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []


# -- Options for HTML output --------------------------------------------------

# The theme to use for HTML and HTML Help pages.  Major themes that come with
# Sphinx are currently 'default' and 'sphinxdoc'.
html_theme = 'sphinxdoc'

# Theme options are theme-specific and customize the look and feel of a theme
# further.  For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}

# Add any paths that contain custom themes here, relative to this directory.
# html_theme_path = []

# The name for this set of Sphinx documents.  If None, it defaults to
# "<project> v<release> documentation".
# html_title = None

# A shorter title for the navigation bar.  Default is the same as html_title.
# html_short_title = None

# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
html_logo = "_static/logo.png"

# The name of an image file (within the static path) to use as favicon of the
# docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
html_favicon = "_static/favicon.ico"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# html_last_updated_fmt = '%b %d, %Y'

# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True

# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}

# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}

# If false, no module index is generated.
# html_use_modindex = True

# If false, no index is generated.
# html_use_index = True

# If true, the index is split into individual pages for each letter.
# html_split_index = False

# If true, links to the reST sources are added to the pages.
# html_show_sourcelink = True

# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it.  The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''

# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = ''

# Output file base name for HTML help builder.
htmlhelp_basename = 'Pydoopdoc'


# -- Options for LaTeX output -------------------------------------------------

# The paper size ('letter' or 'a4').
# latex_paper_size = 'letter'

# The font size ('10pt', '11pt' or '12pt').
# latex_font_size = '10pt'

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass
# [howto/manual]).
latex_documents = [
    ('index', 'Pydoop.tex', u'Pydoop Documentation',
     u'Simone Leo, Gianluigi Zanetti', 'manual'),
]

# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None

# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False

# Additional stuff for the LaTeX preamble.
# latex_preamble = ''

# Documents to append as an appendix to all manuals.
# latex_appendices = []

# If false, no module index is generated.
# latex_use_modindex = True

# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'python': ('http://docs.python.org/2.7', None)}


================================================
FILE: docs/examples/avro.rst
================================================
.. _avro_io:

Avro I/O
========

Pydoop transparently supports reading and writing `Avro
<http://avro.apache.org>`_ records in MapReduce applications. This must be
enabled by setting appropriate options in ``pydoop submit`` (see below).

The following program implements a (slightly
modified) version of the color count example from the Avro docs:

.. literalinclude:: ../../examples/avro/py/color_count.py
   :language: python
   :start-after: DOCS_INCLUDE_START

The application counts the per-office occurrence of favorite colors in
a dataset of user records with the following structure:

.. literalinclude:: ../../examples/avro/schemas/user.avsc
   :language: javascript

User records are read from an Avro container stored on HDFS, and
results are written to another Avro container with the following
schema:

.. literalinclude:: ../../examples/avro/schemas/stats.avsc
   :language: javascript

Pydoop transparently serializes and/or deserializes Avro data as
needed, allowing you to work directly with Python dictionaries.  To
get this behavior, enable Avro I/O and specify the output schema as follows:

.. code-block:: bash

  export STATS_SCHEMA=$(cat stats.avsc)
  pydoop submit \
    -D pydoop.mapreduce.avro.value.output.schema="${STATS_SCHEMA}" \
    --avro-input v --avro-output v \
    --upload-file-to-cache color_count.py \
    color_count input output

The ``--avro-input v`` and ``--avro-output v`` flags specify that we
want to work with Avro records on MapReduce values; the other possible
choices are ``"k"``, where records are exchanged over keys, and
``"kv"``, which assumes that the top-level record structure has two
fields named ``"key"`` and ``"value"`` and passes the former on keys
and the latter on values.

Note that we did not have to specify any input schema: in this case,
Avro automatically falls back to the *writer schema*, i.e., the one
that was used to write the container file.

The ``examples/avro`` directory contains examples for all I/O modes.
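
To give an idea of what such an application looks like with the
current API, here is a minimal sketch of a possible color count
implementation.  It is **not** the shipped example: the field names
(``"office"``, ``"favorite_color"``) and the layout of the emitted
value are assumptions based on the schemas described above; see
``color_count.py`` for the authoritative version.

.. code-block:: python

  # Minimal sketch, not the shipped example: field names are assumed
  # from the user schema described above.
  import pydoop.mapreduce.api as api
  import pydoop.mapreduce.pipes as pipes

  class ColorMapper(api.Mapper):

      def map(self, context):
          user = context.value  # the Avro record, deserialized to a dict
          context.emit(user["office"], user["favorite_color"])

  class ColorReducer(api.Reducer):

      def reduce(self, context):
          counts = {}
          for color in context.values:
              counts[color] = counts.get(color, 0) + 1
          # with --avro-output v, the emitted value is serialized
          # according to the configured output schema
          context.emit(context.key, counts)

  def __main__():
      pipes.run_task(pipes.Factory(ColorMapper, reducer_class=ColorReducer))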


Avro-Parquet I/O
----------------

The above example focuses on `Avro containers
<http://avro.apache.org/docs/1.7.6/spec.html#Object+Container+Files>`_.
However, Pydoop supports any input/output format that exchanges Avro
records.  In particular, it can be used to read from and write to
Avro-Parquet files, i.e., `Parquet
<http://parquet.incubator.apache.org>`_ files that use the Avro object
model.

.. note::

  Make sure you have Parquet version 1.6 or later to avoid running
  into `object reuse problems
  <https://issues.apache.org/jira/browse/PARQUET-62>`_.  More
  generally, the record writer must be aware of the fact that records
  passed to its ``write`` method are mutable and can be reused by the
  caller.

The following application reproduces the k-mer count example from the
`ADAM <https://github.com/bigdatagenomics/adam>`_ docs:

.. literalinclude:: ../../examples/avro/py/kmer_count.py
   :language: python
   :start-after: DOCS_INCLUDE_START

To run the above program, execute ``pydoop submit`` as follows:

.. code-block:: bash

  export PROJECTION=$(cat projection.avsc)
  pydoop submit \
    -D parquet.avro.projection="${PROJECTION}" \
    --upload-file-to-cache kmer_count.py \
    --input-format parquet.avro.AvroParquetInputFormat \
    --avro-input v --libjars "path/to/the/parquet/jar" \
    kmer_count input output

Since we are using an external input format (Avro container input and
output formats are integrated into the Java Pydoop code), we have to
specify the corresponding class via ``--input-format`` and its jar
with ``--libjars``.  The optional Parquet projection makes it possible
to extract only selected fields from the input data.  Note that, in
this case, reading input records from values is not just one option
among several: it is how ``AvroParquetInputFormat`` works.

More Avro-Parquet examples are available under ``examples/avro``.


Running the examples
--------------------

To run the Avro examples, you must install the Avro Python package
(available from the Avro web site); the ``avro`` jar is already
included in Hadoop, and the ``avro-mapred`` jar ships with Pydoop.
Part of the examples code (e.g., input generation) is written in Java.
Compilation and packaging into a jar is handled by the bash runners,
but `Maven <https://maven.apache.org/>`_ needs to be installed on the
client machine.


================================================
FILE: docs/examples/index.rst
================================================
.. _examples:

Examples
========

.. toctree::
   :maxdepth: 2

   intro
   sequence_file
   input_format
   avro


================================================
FILE: docs/examples/input_format.rst
================================================
.. _input_format_example:

Writing a Custom InputFormat
============================

You can use a custom Java ``InputFormat`` together with a Python
:class:`~pydoop.mapreduce.api.RecordReader`: the Java ``RecordReader``
supplied by the ``InputFormat`` will be overridden by the Python one.

Consider the following simple modification of Hadoop's built-in
``TextInputFormat``:

.. literalinclude:: ../../examples/input_format/it/crs4/pydoop/mapreduce/TextInputFormat.java
   :language: java
   :start-after: DOCS_INCLUDE_START

With respect to the default one, this ``InputFormat`` adds a
configurable boolean parameter (``pydoop.input.issplitable``) that, if
set to ``false``, makes input files non-splittable (i.e., you can't
get more input splits than the number of input files).

For details on how to compile the above code into a jar and use it
with Pydoop, see ``examples/input_format``\ .
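
For instance, assuming the class has been packaged into a jar (the jar
path and the application names below are placeholders), a submission
might look like this:

.. code-block:: bash

  pydoop submit \
    --input-format it.crs4.pydoop.mapreduce.TextInputFormat \
    --libjars path/to/your_input_format.jar \
    -D pydoop.input.issplitable=false \
    --upload-file-to-cache myapp.py \
    myapp input output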


================================================
FILE: docs/examples/intro.rst
================================================
Introduction
============

Pydoop includes several usage examples: you can find them in the
``examples`` subdirectory of the distribution root.


Python Dependencies
-------------------

If you've installed Pydoop or other Python packages needed by your
application in a non-standard location (e.g.,
``/opt/lib/python3.6/site-packages``), the Python code that runs within
Hadoop tasks might not be able to find them. Note that, depending on
your Hadoop version and configuration, map and reduce tasks might run
as a different user than the one who launched the job. If you can't
install packages globally, Pydoop offers the option of shipping them
automatically upon job submission; see the section on
:ref:`installation-free usage<self_contained>`.


Input Data
----------

Most examples, by default, take their input from a free version of
Lewis Carroll's "Alice's Adventures in Wonderland" available at
`Project Gutenberg <http://www.gutenberg.org>`_ (see the
``examples/input`` sub-directory).


================================================
FILE: docs/examples/sequence_file.rst
================================================
Using the Hadoop SequenceFile Format
====================================

Although many MapReduce applications deal with text files, there are
many cases where processing binary data is required. In this case, you
basically have two options:

#. write appropriate :class:`~pydoop.mapreduce.api.RecordReader` /
   :class:`~pydoop.mapreduce.api.RecordWriter` classes for the binary format
   you need to process
#. convert your data to Hadoop's standard ``SequenceFile`` format.

To write sequence files with Pydoop, set the output format and the
compression type as follows::

  pydoop submit \
  --output-format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat \
  -D mapreduce.output.fileoutputformat.compress.type=NONE|RECORD|BLOCK [...]

To read sequence files, set the input format as follows::

  pydoop submit \
  --input-format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat


Example Application: Filter Wordcount Results
---------------------------------------------

``SequenceFile`` is mostly useful for handling complex objects like
C-style structs or images. To keep our example as simple as possible,
we considered a situation where a MapReduce task needs to emit the raw
bytes of an integer value.

We wrote a trivial application that reads input from a previous
:ref:`word count <word_count>` run and filters out words whose count
falls below a configurable threshold. Of course, the filter could have
been applied directly in the wordcount reducer: the job has been
artificially split into two runs to provide a ``SequenceFile`` read /
write example.

Suppose you know in advance that most counts will be large, but not so
large that they cannot fit in a 32-bit integer: since the decimal
representation could require as much as 10 bytes, you decide to save
space by having the wordcount reducer emit the raw four bytes of the
integer instead:

.. literalinclude:: ../../examples/sequence_file/bin/wordcount.py
   :language: python
   :pyobject: WordCountReducer

Since newline characters can appear in the serialized values, you
cannot use the standard text format where each line contains a
tab-separated key-value pair. The problem can be solved by using
``SequenceFileOutputFormat`` for wordcount and
``SequenceFileInputFormat`` for the filtering application.
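
For instance, a count of 10 packed as a big-endian 32-bit integer (the
byte order here is just for illustration) ends with byte ``0x0a``,
i.e., a newline:

.. code-block:: python

  >>> import struct
  >>> struct.pack(">i", 10)
  b'\x00\x00\x00\n'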

The full source code for the example is available under
``examples/sequence_file``\ .


================================================
FILE: docs/how_to_cite.rst
================================================
How to Cite
===========

Pydoop is developed and maintained by researchers at `CRS4
<http://www.crs4.it>`_ -- Distributed Computing group.  If you use
Pydoop as part of your research work, please cite `the HPDC 2010 paper
<https://doi.org/10.1145/1851476.1851594>`_.

**Plain text**::

  S. Leo and G. Zanetti.  Pydoop: a Python MapReduce and HDFS API for
  Hadoop.  In Proceedings of the 19th ACM International Symposium on
  High Performance Distributed Computing, 819-825, 2010.

**BibTeX**::

  @inproceedings{Leo:2010:PPM:1851476.1851594,
   author = {Leo, Simone and Zanetti, Gianluigi},
   title = {{Pydoop: a Python MapReduce and HDFS API for Hadoop}},
   booktitle = {{Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing}},
   series = {HPDC '10},
   year = {2010},
   isbn = {978-1-60558-942-8},
   location = {Chicago, Illinois},
   pages = {819--825},
   numpages = {7},
   url = {http://doi.acm.org/10.1145/1851476.1851594},
   doi = {10.1145/1851476.1851594},
   acmid = {1851594},
   publisher = {ACM},
   address = {New York, NY, USA},
  }


================================================
FILE: docs/index.rst
================================================
.. Pydoop documentation master file, created by
   sphinx-quickstart on Sun Jun 20 17:06:55 2010.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

**Pydoop** is a Python interface to `Hadoop
<http://hadoop.apache.org>`_ that allows you to write MapReduce
applications in pure Python:

.. literalinclude:: ../examples/pydoop_submit/mr/wordcount_minimal.py
   :language: python
   :pyobject: Mapper

.. literalinclude:: ../examples/pydoop_submit/mr/wordcount_minimal.py
   :language: python
   :pyobject: Reducer

Feature highlights:

* a rich :ref:`HDFS API <hdfs_api_tutorial>`;

* a :ref:`MapReduce API <api_tutorial>` that lets you write pure
  Python record readers / writers, partitioners and combiners;

* transparent :ref:`Avro (de)serialization <avro_io>`.

Pydoop enables MapReduce programming via a pure (except for a
performance-critical serialization section) Python client for Hadoop
Pipes, and HDFS access through an extension module based on `libhdfs
<https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/LibHdfs.html>`_.

To get started, read the :ref:`tutorial <tutorial>`.  Full docs,
including :ref:`installation instructions <installation>`, are listed
below.


Contents
========

.. toctree::
   :maxdepth: 2

   news/index
   tutorial/index
   installation
   pydoop_script
   running_pydoop_applications
   api_docs/index
   examples/index
   self_contained
   how_to_cite


Indices and Tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


================================================
FILE: docs/installation.rst
================================================
.. _installation:

Installation
============

Prerequisites
-------------

We regularly test Pydoop on Ubuntu only, but it should also work on other
Linux distros and (possibly with some tweaking) on macOS. Other platforms are
**not** supported. Additional requirements:

* `Python <http://www.python.org>`_ 2 or 3, including header files (e.g.,
  ``apt-get install python-dev``, ``yum install python-devel``);

* `setuptools <https://pypi.python.org/pypi/setuptools>`_ >= 3.3;

* Hadoop >=2. We run regular CI tests with recent versions of
  `Apache Hadoop <http://hadoop.apache.org/releases.html>`_ 2.x and 3.x,
  but we expect Pydoop to also work with other Hadoop distributions. In
  particular, we have tested it on `Amazon EMR <https://aws.amazon.com/emr>`_
  (see :ref:`emr`).

These are both build time and run time requirements. At build time you will
also need a C++ compiler (e.g., ``apt-get install build-essential``, ``yum
install gcc gcc-c++``) and a JDK (a JRE is not sufficient).

**Optional:**

* `Avro <https://avro.apache.org/>`_ Python implementation to enable
  :ref:`avro_io` (run time only). Note that the pip packages for Python 2 and 3
  are named differently (respectively ``avro`` and ``avro-python3``).


Environment Setup
-----------------

To compile the HDFS extension module, Pydoop needs the path to the JDK
installation. You can specify this via ``JAVA_HOME``. For instance::

  export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

Note that Pydoop is interested in the **JDK** home (where ``include/jni.h``
can be found), not the JRE home. Depending on your Java distribution and
version, these can be different directories (usually the former being the
latter's parent). If ``JAVA_HOME`` is not found in the environment, Pydoop
will try to locate the JDK via Java system properties.

Pydoop also includes some Java components, and it needs Hadoop libraries to be
in the ``CLASSPATH`` in order to build them. This is done by calling ``hadoop
classpath``, so make sure that the ``hadoop`` executable is in the
``PATH``. For instance, if Hadoop was installed by unpacking the tarball into
``/opt/hadoop``::

  export PATH="/opt/hadoop/bin:/opt/hadoop/sbin:${PATH}"

The Hadoop class path is also needed at run time by the HDFS extension. Again,
since Pydoop picks it up from ``hadoop classpath``, ensure that ``hadoop`` is
in the ``PATH``, as shown above. ``pydoop submit`` must also be able to call
the ``hadoop`` executable.

Additionally, Pydoop needs to read part of the Hadoop configuration to adapt
to specific scenarios. If ``HADOOP_CONF_DIR`` is in the environment, Pydoop
will try to read the configuration from the corresponding location. As a
fallback, Pydoop will also try ``${HADOOP_HOME}/etc/hadoop`` (in the above
example, ``HADOOP_HOME`` would be ``/opt/hadoop``). If ``HADOOP_HOME`` is not
defined, Pydoop will try to guess it from the ``hadoop`` executable (again,
this will have to be in the ``PATH``).
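
For instance, sticking with the tarball installation example above::

  export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop"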


Building and Installing
-----------------------

Install prerequisites::

  pip install --upgrade pip
  pip install --upgrade -r requirements.txt

Install Pydoop via pip::

  pip install pydoop

To install a pre-release (e.g., alpha, beta) add ``--pre``::

  pip install --pre pydoop

You can also install the latest development version from GitHub::

  git clone https://github.com/crs4/pydoop.git
  cd pydoop
  python setup.py build
  python setup.py install --skip-build

If possible, you should install Pydoop on all cluster nodes. Alternatively, it
can be distributed, together with your MapReduce applications, via the Hadoop
distributed cache (see :doc:`self_contained`).
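
As a quick sanity check, make sure the package can be imported and see
where it was picked up from::

  python -c "import pydoop; print(pydoop.__file__)"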


Troubleshooting
---------------

#. ``libjvm.so`` not found: try the following::

    export LD_LIBRARY_PATH="${JAVA_HOME}/jre/lib/amd64/server:${LD_LIBRARY_PATH}"

#. non-standard include/lib directories: the setup script looks for
   includes and libraries in standard places -- read ``setup.py`` for
   details. If some of the requirements are stored in different
   locations, you need to add them to the search path. Example::

    python setup.py build_ext -L/my/lib/path -I/my/include/path -R/my/lib/path
    python setup.py build
    python setup.py install --skip-build

   Alternatively, you can write a small ``setup.cfg`` file for distutils:

   .. code-block:: cfg

    [build_ext]
    include_dirs=/my/include/path
    library_dirs=/my/lib/path
    rpath=%(library_dirs)s

   and then run ``python setup.py install``.

   Finally, you can achieve the same result by manipulating the
   environment.  This is particularly useful in the case of automatic
   download and install with pip::

    export CPATH="/my/include/path:${CPATH}"
    export LD_LIBRARY_PATH="/my/lib/path:${LD_LIBRARY_PATH}"
    pip install pydoop


Testing your Installation
-------------------------

After Pydoop has been successfully installed, you might want to run unit
tests and/or examples to verify that everything works fine. Here is a short
list of things that can go wrong and how to fix them. For full details on
running tests and examples, see ``.travis.yml``.

#. Incomplete configuration: make sure that Pydoop is able to find the
   ``hadoop`` executable and configuration directory (check the above section
   on environment setup).

#. Cluster not ready: wait until all Hadoop daemons are up and HDFS exits from
   safe mode (``hadoop dfsadmin -safemode wait``).

#. HDFS tests may fail if your NameNode's hostname and port are
   non-standard. In this case, set the ``HDFS_HOST`` and ``HDFS_PORT``
   environment variables accordingly.

#. Some HDFS tests may fail if not run by the cluster superuser, in
   particular ``capacity``, ``chown`` and ``used``.  To get superuser
   privileges, you can either start the cluster with your own user account or
   set the ``dfs.permissions.superusergroup`` Hadoop property to one of your
   unix groups (type ``groups`` at the command prompt to get the list of
   groups for your current user), then restart the HDFS daemons.


.. _emr:

Using Pydoop on Amazon EMR
--------------------------

You can configure your EMR cluster to automatically install Pydoop on
all nodes via `Bootstrap Actions
<https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html>`_. The
main difficulty is that Pydoop relies on Hadoop being installed and
configured, even at compile time, so the bootstrap script needs to
wait until EMR has finished setting it up:

.. code-block:: bash

  #!/bin/bash
  PYDOOP_INSTALL_SCRIPT=$(cat <<EOF
  #!/bin/bash
  NM_PID=/var/run/hadoop-yarn/yarn-yarn-nodemanager.pid
  RM_PID=/var/run/hadoop-yarn/yarn-yarn-resourcemanager.pid
  while [ ! -f \${RM_PID} ] && [ ! -f \${NM_PID} ]; do
    sleep 2
  done
  export JAVA_HOME=/etc/alternatives/java_sdk
  sudo -E pip install pydoop
  EOF
  )
  echo "${PYDOOP_INSTALL_SCRIPT}" | tee -a /tmp/pydoop_install.sh
  chmod u+x /tmp/pydoop_install.sh
  /tmp/pydoop_install.sh >/tmp/pydoop_install.out 2>/tmp/pydoop_install.err &

The bootstrap script creates the actual installation script and calls
it; the latter, in turn, waits for either the resource manager or the
node manager to be up (i.e., for YARN to be up whether we are on
the master or on a slave) before installing Pydoop. If you want to use
Python 3, install version 3.6 with yum:

.. code-block:: bash

  #!/bin/bash
  sudo yum -y install python36-devel python36-pip
  sudo alternatives --set python /usr/bin/python3.6
  PYDOOP_INSTALL_SCRIPT=$(cat <<EOF
  ...

The above instructions have been tested on ``emr-5.12.0``.


Trying Pydoop without installing it
-----------------------------------

You can try Pydoop on a `Docker <https://www.docker.com/>`_ container. The
Dockerfile is in the distribution root directory::

  docker build -t pydoop .
  docker run --name pydoop -d pydoop

This spins up a single-node, `pseudo-distributed
<https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation>`_
Hadoop cluster with `HDFS
<https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Introduction>`_,
`YARN
<https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html>`_
and a Job History server. Before attempting to use the container, wait a few
seconds until all daemons are up and running.

You may want to expose some ports to the host, such as the ones used by the
web interfaces. For instance::

  docker run --name pydoop -p 8088:8088 -p 9870:9870 -p 19888:19888 -d pydoop

Refer to the Hadoop docs for a complete list of ports used by the various
services.


================================================
FILE: docs/news/archive.rst
================================================
News Archive
------------


New in 1.2.0
^^^^^^^^^^^^

 * Added support for Hadoop 2.7.2.
 * Dropped support for Python 2.6. Maintaining 2.6 compatibility would
   require adding another dimension to the Travis matrix, vastly
   increasing the build time and ultimately slowing down the
   development. Since the default Python version in all major
   distributions is 2.7, the added effort would gain us little.
 * Bug fixes.


New in 1.1.0
^^^^^^^^^^^^

 * Added support for `HDP <http://hortonworks.com/hdp/>`_ 2.2.
 * `Pyavroc <https://github.com/Byhiras/pyavroc>`_ is now
   automatically loaded if installed, enabling much faster (30-40x)
   Avro (de)serialization.
 * Added Timer objects to help debug performance issues.
 * ``NoSeparatorTextOutputFormat`` is now available for all MR
   versions.
 * Added Avro support to the Hadoop Simulator.
 * Bug fixes and performance improvements.


New in 1.0.0
^^^^^^^^^^^^

 * Pydoop now features a brand new, more pythonic :ref:`MapReduce API <mr_api>`
 * Added built-in `Avro <http://avro.apache.org>`_ support (for now,
   only with Hadoop 2).  By setting a few flags in the submitter and
   selecting ``AvroContext`` as your application's context class, you
   can read and write Avro data, transparently manipulating records as
   Python dictionaries.  See the :ref:`avro_io` docs for further details.
 * The new :ref:`pydoop submit <running_apps>` tool drastically
   simplifies job submission, in particular when running applications
   without installing Pydoop and other dependencies on the cluster
   nodes (see :ref:`self_contained`).
 * Added support for testing Pydoop programs in a simulated Hadoop framework
 * Added support (experimental) for MapReduce V2 input/output formats (see
   :ref:`input_format_example`)
 * The :mod:`~pydoop.hdfs.path` module offers many new functions that
   serve as the HDFS-aware counterparts of those in :mod:`os.path`
 * The pipes backend (except for the performance-critical
   serialization section) has been reimplemented in pure Python
 * An alternative (optional) JPype HDFS backend is available
   (currently slower than the one based on libhdfs)
 * Added support for CDH5 and Apache Hadoop 2.4.1, 2.5.2 and 2.6.0
 * Removed support for CDH3 and Apache Hadoop 0.20.2
 * Installation has been greatly simplified: now Pydoop does not
   require any external library to build its native extensions


New in 0.12.0
^^^^^^^^^^^^^

 * YARN is now fully supported
 * Added support for CDH 4.4.0 and CDH 4.5.0


New in 0.11.1
^^^^^^^^^^^^^

 * Added support for hadoop 2.2.0
 * Added support for hadoop 1.2.1

   
New in 0.10.0
^^^^^^^^^^^^^

 * Added support for CDH 4.3.0

 * Added a :meth:`~pydoop.hdfs.fs.hdfs.walk` method to hdfs instances
   (works similarly to :func:`os.walk` from Python's standard library)

 * The Hadoop version parser is now more flexible.  It should be able
   to parse version strings for all CDH releases, including older ones
   (note that most of them are **not** supported)

 * Pydoop script can now handle modules whose file name has no extension

 * Fixed "unable to load native-hadoop library" problem (thanks to
   Liam Slusser)


New in 0.9.0
^^^^^^^^^^^^

* Added explicit support for:

  * Apache Hadoop 1.1.2
  * CDH 4.2.0

* Added support for Cloudera from-parcels layout (as installed by
  Cloudera Manager)

* Added :func:`pydoop.hdfs.move`

* Record writers can now be used in map-only jobs


New in 0.8.1
^^^^^^^^^^^^

* Fixed a problem that was breaking installation from PyPI via pip install


New in 0.8.0
^^^^^^^^^^^^

* Added support for Apple OS X Mountain Lion
* Added support for Hadoop 1.1.1
* Patches now include a fix for `HDFS-829
  <https://issues.apache.org/jira/browse/HDFS-829>`_
* Restructured docs

  * A separate tutorial section collects and expands introductory material


New in 0.7.0
^^^^^^^^^^^^

* Added Debian package


New in 0.7.0-rc3
^^^^^^^^^^^^^^^^

* Fixed a bug in the hdfs instance caching method


New in 0.7.0-rc2
^^^^^^^^^^^^^^^^

* Support for HDFS append open mode

  * fails if your Hadoop version and/or configuration does not support
    HDFS append


New in 0.7.0-rc1
^^^^^^^^^^^^^^^^

* Works with CDH4, with the following limitations:

  * support for MapReduce v1 only
  * CDH4 must be installed from dist-specific packages (no tarball)

* Tested with the latest releases of other Hadoop versions

  * Apache Hadoop 0.20.2, 1.0.4
  * CDH 3u5, 4.1.2

* Simpler build process

  * the source code we need is now included, rather than searched for
    at compile time

* Pydoop scripts can now accept user-defined configuration parameters

  * New examples show how to use the new feature

* New wrapper object makes it easier to interact with the JobConf
* New hdfs.path functions: isdir, isfile, kind
* HDFS: support for string description of permission modes in chmod
* Several bug fixes


New in 0.6.6
^^^^^^^^^^^^

Fixed a bug that was causing the pipes runner to incorrectly preprocess
command line options.


New in 0.6.4
^^^^^^^^^^^^

Fixed several bugs triggered by using a local fs as the default fs for
Hadoop.  This happens when you set a ``file:`` path as the value of
``fs.defaultFS`` in core-site.xml.  For instance:

.. code-block:: xml

  <property>
    <name>fs.defaultFS</name>
    <value>file:///var/hadoop/data</value>
  </property>


New in 0.6.0
^^^^^^^^^^^^

* The HDFS API features new high-level tools for easier manipulation
  of files and directories. See the :ref:`API docs <hdfs-api>` for
  more info
* Examples have been thoroughly revised in order to make them easier
  to understand and run
* Several bugs were fixed; we also introduced a few optimizations,
  most notably the automatic caching of HDFS instances


New in 0.5.0
^^^^^^^^^^^^

* Pydoop now works with Hadoop 1.0
* Multiple versions of Hadoop can now be supported by the same
  installation of Pydoop.
* We have added a :ref:`command line tool <pydoop_script_tutorial>` to
  make it trivially simple to write shorts scripts for simple
  problems.
* In order to work out-of-the-box, Pydoop now requires Python 2.7.
  Python 2.6 can be used provided that you install a few additional
  modules (see the :ref:`installation <installation>` page for
  details).
* We have dropped support for the 0.21 branch of Hadoop, which has
  been marked as unstable and unsupported by Hadoop developers.


================================================
FILE: docs/news/index.rst
================================================
.. _news:

News
====

.. toctree::
   :maxdepth: 1

   latest
   archive


================================================
FILE: docs/news/latest.rst
================================================
New in 2.0.0
------------

Pydoop 2.0.0 adds Python 3 and Hadoop 3 support, and features a complete
overhaul of the ``mapreduce`` subpackage, which is now easier to use and more
efficient. As with any major software release, Pydoop 2 also makes some
backwards-incompatible changes, mainly by dropping old, seldom-used
features. Finally, it includes several bug fixes and performance
improvements. Here is a more detailed list of changes:

 * Python 3 support.
 * Hadoop 3 support.
 * The ``sercore`` extension, together with most of the ``pydoop.mapreduce``
   subpackage, has been rewritten from scratch. Now it's simpler and slightly
   faster (much faster when using a combiner).
 * ``JobConf`` is now fully compatible with ``dict``.
 * ``pydoop submit`` now works when the default file system is local.
 * Compilation of avro-parquet-based examples is now much faster.
 * Many utilities for guessing Hadoop environment details have been either
   removed or drastically simplified (affects ``hadoop_utils`` and related
   package-level functions). Pydoop now assumes that the ``hadoop`` command is
   in the ``PATH``, and uses only that information to try fallback values when
   ``HADOOP_HOME`` and/or ``HADOOP_CONF_DIR`` are not defined.
 * The ``hadut`` module has been stripped down to contain little more than
   what's required by ``pydoop submit``. In particular, ``PipesRunner`` is
   gone. Running applications with ``mapred pipes`` still works, but with
   caveats (e.g., `it does not work on the local fs
   <https://issues.apache.org/jira/browse/MAPREDUCE-4000>`_, and controlling
   remote task environment is not trivial).
 * The ``hdfs`` module no longer provides a default value for ``LIBHDFS_OPTS``.
 * The Hadoop simulator has been dropped.
 * `Support for opaque binary input splits <https://github.com/crs4/pydoop/pull/302>`_.
 * `Dropped support for Hadoop 1 <https://github.com/crs4/pydoop/pull/237>`_.
 * `Dropped old MapReduce API <https://github.com/crs4/pydoop/pull/255>`_.
 * `Dropped JPype HDFS backend <https://github.com/crs4/pydoop/pull/238>`_.
 * Bug fixes and performance improvements.


================================================
FILE: docs/pydoop_script.rst
================================================
.. _pydoop_script_guide:

Pydoop Script User Guide
========================

Pydoop Script is the easiest way to write simple MapReduce programs
for Hadoop.  With Pydoop Script, you only need to write map and/or
reduce functions and the system will take care of the rest.

For a full explanation please see the :ref:`tutorial <pydoop_script_tutorial>`.


Command Line Tool
-----------------

In the simplest case, Pydoop Script is invoked as::

  pydoop script MODULE INPUT OUTPUT

where ``MODULE`` is the file (on your local file system) containing
your map and reduce functions, in Python, while ``INPUT`` and
``OUTPUT`` are, respectively, the HDFS paths of your input data and
your job's output directory.

Options are shown in the following table.

.. include:: pydoop_script_options.rst


Example: Word Count with Stop Words
+++++++++++++++++++++++++++++++++++

Here is the word count example modified to ignore stop words from a
file that is distributed to all the nodes via the Hadoop distributed
cache:

.. literalinclude:: ../examples/pydoop_script/scripts/wordcount_sw.py
   :language: python
   :start-after: DOCS_INCLUDE_START

To execute the above script, save it to a ``wc.py`` file and run::

  pydoop script wc.py hdfs_input hdfs_output --upload-file-to-cache stop_words.txt

where ``stop_words.txt`` is a text file that contains the stop words,
one per line.

While this script works, it has the obvious weakness of loading the
stop words list even when executing the reducer (since the list is
loaded as soon as we import the module).  If this is a concern, we
could trigger the loading from the ``mapper`` function, or write a
:ref:`full Pydoop application <api_tutorial>`, which would give us all
the control we need to load the list only when required.

Writing your Map and Reduce Functions
-------------------------------------

In this section we assume you'll be using the default ``TextInputFormat``
and ``TextOutputFormat``.

Mapper
++++++

The ``mapper`` function in your module will be called for each record
in your input data.  It receives the following parameters:

#. key: the byte offset with respect to the current input file. In most cases,
   you can ignore it;
#. value: the line of text to be processed;
#. writer object: a Python object to write output and count values (see below);
#. optionally, a job conf object from which to fetch configuration
   property values (see `Accessing Parameters`_ below).

Combiner
++++++++

The ``combiner`` function will be called for each unique key produced
by your map function.  It receives the following parameters:

#. key: the key produced by your map function
#. values iterable: iterate over this parameter to see all the values emitted
   for the current key
#. writer object: a writer object identical to the one given to the
   map function
#. optionally, a job conf object, identical to the one given to the
   map function.

The key-value pairs emitted by your combiner will be piped to the reducer.

Reducer
+++++++

The ``reducer`` function will be called for each unique key produced
by your map function.  It receives the following parameters:

#. key: the key produced by your map function;
#. values iterable: iterate over this parameter to traverse all the
   values emitted for the current key;
#. writer object: this is identical to the one given to the map function;
#. optionally, a job conf object, identical to the one given to the
   map function.

Each key-value pair emitted by your reducer will be joined with the
key-value separator specified via the ``--kv-separator`` option
(a tab character by default).


Writer Object
+++++++++++++

The writer object given as the third parameter to both the ``mapper``
and ``reducer`` functions has the following methods:

* ``emit(k, v)``: pass a ``(k, v)`` key-value pair to the framework;
* ``count(what, how_many)``: add ``how_many`` to the counter named
  ``what``.  If the counter doesn't already exist, it will be created
  dynamically;
* ``status(msg)``: update the task status to ``msg``;
* ``progress()``: mark your task as having made progress without changing
  the status message.

The latter two methods are useful for keeping your task alive in cases
where the amount of computation to be done for a single record might
exceed Hadoop's timeout interval (Hadoop kills a task if it neither reads an
input, writes an output, nor updates its status for a configurable amount
of time, set to 10 minutes by default).
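
Putting it all together, here is a minimal word count module sketch
(with the default text formats, the values reaching the reducer are
strings, hence the ``int`` conversion; the counter name is arbitrary):

.. code-block:: python

  def mapper(key, value, writer):
      # value is a line of input text
      for word in value.split():
          writer.emit(word, 1)

  def reducer(key, values, writer):
      writer.count("reduced words", 1)
      writer.emit(key, sum(int(v) for v in values))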


Accessing Parameters
++++++++++++++++++++

Pydoop Script lets you access the values of your job configuration
properties through a dict-like :class:`~pydoop.mapreduce.api.JobConf`
object, which gets passed as the fourth (optional) parameter to your
functions.
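
For instance, a mapper could filter words by a minimum length read
from a hypothetical ``myapp.threshold`` property, set at submission
time with ``-D myapp.threshold=5``:

.. code-block:: python

  def mapper(key, value, writer, conf):
      # "myapp.threshold" is a made-up property name
      threshold = int(conf.get("myapp.threshold", 0))
      for word in value.split():
          if len(word) >= threshold:
              writer.emit(word, 1)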


Naming your Functions
+++++++++++++++++++++

If you'd like to give your map and reduce functions names different
from ``mapper`` and ``reducer``, you may do so, but you must tell the
script tool.  Use the ``--map-fn`` and ``--reduce-fn`` command line
arguments to select your customized names.  Combiner functions can only
be assigned by explicitly setting the ``--combine-fn`` flag.


Map-only Jobs
+++++++++++++

You may have a program that doesn't use a reduce function.  Specify
``--num-reducers 0`` on the command line and your map output will be
written directly to your final output: it goes straight to the output
formatter, with keys and values joined by the key-value separator.
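
For instance (``mymod.py`` is a placeholder module name)::

  pydoop script mymod.py hdfs_input hdfs_output --num-reducers 0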


================================================
FILE: docs/pydoop_script_options.rst
================================================
..
  Auto-generated by dev_tools/dump_app_params. DO NOT EDIT!
  To update, run:
    dev_tools/dump_app_params --app script -o docs/pydoop_script_options.rst

+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Short  | Long                          | Meaning                                                                                                                                                  |
+========+===============================+==========================================================================================================================================================+
|        | ``--num-reducers``            | Number of reduce tasks. Specify 0 to only perform map phase                                                                                              |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--no-override-home``        | Don't set the script's HOME directory to the $HOME in your environment.  Hadoop will set it to the value of the 'mapreduce.admin.user.home.dir' property |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--no-override-env``         | Use the default PATH, LD_LIBRARY_PATH and PYTHONPATH, instead of copying them from the submitting client node                                            |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--no-override-ld-path``     | Use the default LD_LIBRARY_PATH instead of copying it from the submitting client node                                                                    |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--no-override-pypath``      | Use the default PYTHONPATH instead of copying it from the submitting client node                                                                         |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--no-override-path``        | Use the default PATH instead of copying it from the submitting client node                                                                               |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--set-env``                 | Set environment variables for the tasks. If a variable is set to '', it will not be overridden by Pydoop.                                                |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-D`` | ``--job-conf``                | Set a Hadoop property, e.g., -D mapreduce.job.priority=high                                                                                              |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--python-zip``              | Additional python zip file                                                                                                                               |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--upload-file-to-cache``    | Upload and add this file to the distributed cache.                                                                                                       |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--upload-archive-to-cache`` | Upload and add this archive file to the distributed cache.                                                                                               |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--log-level``               | Logging level                                                                                                                                            |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--job-name``                | name of the job                                                                                                                                          |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--python-program``          | python executable that should be used by the wrapper                                                                                                     |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--pretend``                 | Do not actually submit a job, print the generated config settings and the command line that would be invoked                                             |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--hadoop-conf``             | Hadoop configuration file                                                                                                                                |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--input-format``            | java classname of InputFormat                                                                                                                            |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-m`` | ``--map-fn``                  | name of map function within module                                                                                                                       |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-r`` | ``--reduce-fn``               | name of reduce function within module                                                                                                                    |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-c`` | ``--combine-fn``              | name of combine function within module                                                                                                                   |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--combiner-fn``             | --combine-fn alias for backwards compatibility                                                                                                           |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-t`` | ``--kv-separator``            | output key-value separator                                                                                                                               |
+--------+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+


================================================
FILE: docs/pydoop_submit_options.rst
================================================
..
  Auto-generated by dev_tools/dump_app_params. DO NOT EDIT!
  To update, run:
    dev_tools/dump_app_params --app submit -o docs/pydoop_submit_options.rst

+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Short  | Long                                   | Meaning                                                                                                                                                  |
+========+========================================+==========================================================================================================================================================+
|        | ``--num-reducers``                     | Number of reduce tasks. Specify 0 to only perform map phase                                                                                              |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--no-override-home``                 | Don't set the script's HOME directory to the $HOME in your environment.  Hadoop will set it to the value of the 'mapreduce.admin.user.home.dir' property |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--no-override-env``                  | Use the default PATH, LD_LIBRARY_PATH and PYTHONPATH, instead of copying them from the submitting client node                                            |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--no-override-ld-path``              | Use the default LD_LIBRARY_PATH instead of copying it from the submitting client node                                                                    |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--no-override-pypath``               | Use the default PYTHONPATH instead of copying it from the submitting client node                                                                         |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--no-override-path``                 | Use the default PATH instead of copying it from the submitting client node                                                                               |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--set-env``                          | Set environment variables for the tasks. If a variable is set to '', it will not be overridden by Pydoop.                                                |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-D`` | ``--job-conf``                         | Set a Hadoop property, e.g., -D mapreduce.job.priority=high                                                                                              |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--python-zip``                       | Additional python zip file                                                                                                                               |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--upload-file-to-cache``             | Upload and add this file to the distributed cache.                                                                                                       |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--upload-archive-to-cache``          | Upload and add this archive file to the distributed cache.                                                                                               |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--log-level``                        | Logging level                                                                                                                                            |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--job-name``                         | name of the job                                                                                                                                          |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--python-program``                   | python executable that should be used by the wrapper                                                                                                     |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--pretend``                          | Do not actually submit a job, print the generated config settings and the command line that would be invoked                                             |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--hadoop-conf``                      | Hadoop configuration file                                                                                                                                |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--input-format``                     | java classname of InputFormat                                                                                                                            |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--disable-property-name-conversion`` | Do not adapt property names to the hadoop version used.                                                                                                  |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--do-not-use-java-record-reader``    | Disable java RecordReader                                                                                                                                |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--do-not-use-java-record-writer``    | Disable java RecordWriter                                                                                                                                |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--output-format``                    | java classname of OutputFormat                                                                                                                           |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--libjars``                          | Additional comma-separated list of jar files                                                                                                             |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--cache-file``                       | Add this HDFS file to the distributed cache as a file.                                                                                                   |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--cache-archive``                    | Add this HDFS archive file to the distributed cacheas an archive.                                                                                        |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--entry-point``                      | Explicitly execute MODULE.ENTRY_POINT() in the launcher script.                                                                                          |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--avro-input``                       | Avro input mode (key, value or both)                                                                                                                     |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--avro-output``                      | Avro output mode (key, value or both)                                                                                                                    |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--pstats-dir``                       | Profile each task and store stats in this dir                                                                                                            |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--pstats-fmt``                       | pstats filename pattern (expert use only)                                                                                                                |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|        | ``--keep-wd``                          | Don't remove the work dir                                                                                                                                |
+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+


================================================
FILE: docs/running_pydoop_applications.rst
================================================
.. _running_apps:

Pydoop Submit User Guide
========================

Pydoop applications are run via the ``pydoop submit`` command.  To
start, you will need a working Hadoop cluster.  If you don't have one
available, you can bring up a single-node Hadoop cluster on your
machine -- see `the Hadoop web site <http://hadoop.apache.org>`_ for
instructions. Alternatively, the source directory contains a
Dockerfile that can be used to build an image with Hadoop and Pydoop
installed and (minimally) configured. Check out ``.travis.yml`` for
usage hints.

If your application is contained in a single (local) file named
``wc.py``, with an entry point called ``__main__`` (see
:ref:`api_tutorial`), you can run it as follows::

  pydoop submit --upload-file-to-cache wc.py wc input output

where ``input`` (file or directory) and ``output`` (directory) are
HDFS paths.  Note that the ``output`` directory will not be
overwritten: instead, an error will be generated if it already exists
when you launch the program.

If your entry point has a different name, specify it via ``--entry-point``.
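
For instance, if the entry point is a function named ``main`` (a
hypothetical name, used here only for illustration), the invocation
becomes::

  pydoop submit --upload-file-to-cache wc.py --entry-point main wc input output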

The following table shows command line options for ``pydoop submit``:

.. include:: pydoop_submit_options.rst


Setting the Environment for your Program
----------------------------------------

When working on a shared cluster where you don't have root access, you
might have a lot of software installed in non-standard locations, such
as your home directory. Since non-interactive ssh connections do not
usually preserve your environment, you might lose some essential
settings, such as ``LD_LIBRARY_PATH``\ .

For this reason, by default ``pydoop submit`` copies some environment
variables from the submitting node to the driver script that runs each task
on Hadoop.  If this behavior is not desired, you can disable it via the
``--no-override-env`` command line option.
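
For instance, to run the ``wc.py`` example above without copying any
environment variables from the submitting node::

  pydoop submit --no-override-env --upload-file-to-cache wc.py wc input output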


================================================
FILE: docs/self_contained.rst
================================================
.. _self_contained:

Installation-free Usage
=======================

This example shows how to use the Hadoop Distributed Cache (DC) to
distribute Python packages, possibly including Pydoop itself, to all
cluster nodes at job launch time. This is useful in all cases where
installing packages on each node is not feasible (e.g., lack of a shared mount
point). Of course, Hadoop itself must be already installed and
properly configured in all cluster nodes before you can run this.

Source code for this example is available under ``examples/self_contained``\ .


Example Application: Count Vowels
---------------------------------

The example MapReduce application, ``vowelcount``, is rather trivial: it counts
the occurrence of each vowel in the input text. Since the point here
is to show how a structured package can be distributed and imported,
the implementation is deliberately verbose.

.. literalinclude:: ../examples/self_contained/vowelcount/lib/__init__.py
   :language: python
   :start-after: DOCS_INCLUDE_START

.. literalinclude:: ../examples/self_contained/vowelcount/mr/mapper.py
   :language: python
   :pyobject: Mapper

.. literalinclude:: ../examples/self_contained/vowelcount/mr/reducer.py
   :language: python
   :pyobject: Reducer


How it Works
------------

The DC supports automatic distribution of files and archives across
the cluster at job launch time.  This feature can be used to dispatch
Python packages to all nodes, eliminating the need to install
dependencies for your application, including Pydoop itself::

  pydoop submit --upload-archive-to-cache vowelcount.tgz \
                --upload-archive-to-cache pydoop.tgz [...]

The ``pydoop.tgz`` and ``vowelcount.tgz`` archives will be copied to
all slave nodes and unpacked; in addition, ``pydoop`` and
``vowelcount`` symlinks will be created in the current working
directory of each task before it is executed.  If you include in each
archive the *contents* of the corresponding package, they will be
available for import::

  cd examples/self_contained/vowelcount
  tar cfz ../vowelcount.tgz .

The archive must be in one of the formats supported by Hadoop: zip, tar or tgz.
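
For ``pydoop.tgz``, one way to build the archive is to pack the
*contents* of an installed Pydoop package (a sketch, assuming Pydoop is
importable on the client; the exact path depends on your installation)::

  pydoop_dir=$(python -c "import os, pydoop; print(os.path.dirname(pydoop.__file__))")
  tar -czf pydoop.tgz -C "${pydoop_dir}" .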

.. note::

  Pydoop submit automatically builds the name of the symlink that
  points to the unpacked archive by stripping the last extension.
  Thus, ``foo.tar.gz`` will not work as expected, since the link will
  be called ``foo.tar``. Always use the ``.tgz`` extension in this
  case.

The example is designed to work with Pydoop and vowelcount *not*
installed on the slave nodes (you do need Pydoop on the client machine
used to run the example, however).


================================================
FILE: docs/tutorial/hdfs_api.rst
================================================
.. _hdfs_api_tutorial:

The HDFS API
============

The :ref:`HDFS API <hdfs-api>` allows you to connect to an HDFS
installation, read and write files and get information on files,
directories and global file system properties:

.. literalinclude:: ../../examples/hdfs/repl_session.py
   :language: python
   :start-after: DOCS_INCLUDE_START
   :end-before: DOCS_INCLUDE_END


Low-level API
-------------

The high-level API showcased above can be inefficient when performing
multiple operations on the same HDFS instance: under the hood, each
function opens a separate connection to the HDFS server and closes it
before returning. The
following example shows how to build statistics of HDFS usage by block
size by directly instantiating an ``hdfs`` object, which represents an
open connection to an HDFS instance. Full source code for the example,
including a script that can be used to generate an HDFS directory tree,
is located under ``examples/hdfs`` in the Pydoop distribution.

.. literalinclude:: ../../examples/hdfs/treewalk.py
   :language: python
   :start-after: DOCS_INCLUDE_START
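
In case you are reading this outside the built docs, the gist of
``treewalk.py`` is something like the following sketch (``/user`` is
just a placeholder path):

.. code-block:: python

  from collections import Counter

  import pydoop.hdfs as hdfs

  stats = Counter()
  fs = hdfs.hdfs()  # one connection, reused for all operations
  try:
      for info in fs.walk("/user"):
          if info["kind"] == "file":
              stats[info["block_size"]] += info["size"]
  finally:
      fs.close()
  for block_size, tot_size in sorted(stats.items()):
      print("%d\t%d" % (block_size, tot_size))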

For more information, see the :ref:`HDFS API reference <hdfs-api>`.


================================================
FILE: docs/tutorial/index.rst
================================================
.. _tutorial:

Tutorial
========

.. toctree::
   :maxdepth: 2

   pydoop_script
   hdfs_api
   mapred_api


================================================
FILE: docs/tutorial/mapred_api.rst
================================================
.. _api_tutorial:

Writing Full-Featured Applications
==================================

While :ref:`Pydoop Script <pydoop_script_tutorial>` allows you to solve
many problems with minimal programming effort, some tasks require a
broader set of features. If your data is not simple text with one record
per line, for instance, you may need to write a record reader; if
you need to change the way intermediate keys are assigned to reducers,
you have to write your own partitioner.  These components are
accessible via the Pydoop MapReduce API.

The rest of this section serves as an introduction to MapReduce
programming with Pydoop; the :ref:`API reference <mr_api>` has
all the details.


Mappers and Reducers
--------------------

The Pydoop API is object-oriented: the application developer writes a
:class:`~pydoop.mapreduce.api.Mapper` class, whose core job is
performed by the :meth:`~pydoop.mapreduce.api.Mapper.map` method, and
a :class:`~pydoop.mapreduce.api.Reducer` class that processes data via
the :meth:`~pydoop.mapreduce.api.Reducer.reduce` method.  The
following snippet shows how to write the mapper and reducer for
*wordcount*, an application that counts the occurrence of each word in a
text data set:

.. literalinclude:: ../../examples/pydoop_submit/mr/wordcount_minimal.py
   :language: python
   :start-after: DOCS_INCLUDE_START

The mapper is instantiated by the MapReduce framework, which, for each
input record, calls the ``map`` method, passing it a ``context`` object.
The context serves as a communication interface between the framework
and the application: in the ``map`` method, it is used to get the current
key (not used in the above example) and value, and to emit (send back
to the framework) intermediate key-value pairs.  The reducer works in
a similar way, the main difference being the fact that the ``reduce``
method gets a set of values for each key.  The context has several
other functions that we will explore later.
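
In case you are reading this outside the built docs, the included
example boils down to something like the following sketch:

.. code-block:: python

  import pydoop.mapreduce.api as api
  import pydoop.mapreduce.pipes as pipes


  class Mapper(api.Mapper):

      def map(self, context):
          # context.value holds the current input record (a text line)
          for word in context.value.split():
              context.emit(word, 1)


  class Reducer(api.Reducer):

      def reduce(self, context):
          # context.values iterates over all values emitted for context.key
          context.emit(context.key, sum(context.values))


  def __main__():
      pipes.run_task(pipes.Factory(Mapper, reducer_class=Reducer))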

To run the above program, save it to a ``wc.py`` file and execute::

  pydoop submit --upload-file-to-cache wc.py wc input output

where ``input`` is the HDFS input (file or directory).

See the section on :ref:`running Pydoop programs<running_apps>` for
more details.  Source code for the word count example is located under
``examples/pydoop_submit/mr`` in the Pydoop distribution.


Counters and Status Updates
---------------------------

Hadoop features application-wide counters that can be set and
incremented by developers.  Status updates are arbitrary text messages
sent to the framework: these are especially useful in cases where the
computation associated with a single input record can take a
considerable amount of time, since Hadoop kills tasks that read no
input, write no output and do not update the status within a
configurable amount of time (ten minutes by default).

The following snippet shows how to modify the above example to use
counters and status updates:

.. literalinclude:: ../../examples/pydoop_submit/mr/wordcount_full.py
   :language: python
   :pyobject: Mapper

.. literalinclude:: ../../examples/pydoop_submit/mr/wordcount_full.py
   :language: python
   :pyobject: Reducer

Counter values and status updates show up in Hadoop's web interface.
In addition, the final values of all counters are listed in the
command line output of the job (note that the list also includes Hadoop's
default counters).
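
For reference, the mapper-side changes amount to something like the
following sketch (the counter group and name are illustrative):

.. code-block:: python

  class Mapper(api.Mapper):

      def __init__(self, context):
          super(Mapper, self).__init__(context)
          context.set_status("initializing mapper")
          self.input_words = context.get_counter("WORDCOUNT", "INPUT_WORDS")

      def map(self, context):
          words = context.value.split()
          for word in words:
              context.emit(word, 1)
          context.increment_counter(self.input_words, len(words))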


Record Readers and Writers
--------------------------

By default, Hadoop assumes you want to process plain text and splits
input data into text lines.  If you need to process binary data, or
your text data is structured into records that span multiple lines,
you need to write your own :class:`~pydoop.mapreduce.api.RecordReader`.
The **record reader** operates at the HDFS file level: its job is to read
data from the file and feed it as a stream of key-value pairs
(records) to the mapper. To interact with HDFS files, we need to import the
``hdfs`` submodule:

.. code-block:: python

  import pydoop.hdfs as hdfs

The following example shows how to write a record reader that mimics
Hadoop's default ``LineRecordReader``, where keys are byte offsets
with respect to the whole file and values are text lines:

.. literalinclude:: ../../examples/pydoop_submit/mr/wordcount_full.py
   :language: python
   :pyobject: Reader

From the context, the record reader gets the following information on
the byte chunk assigned to the current task, or **input split**:

* the name of the file it belongs to;
* its offset with respect to the beginning of the file;
* its length.

This allows the reader to open the file, seek to the correct offset and
read until the end of the split is reached.  The framework gets the record
stream by means of repeated calls to the
:meth:`~pydoop.mapreduce.api.RecordReader.next` method.  The
:meth:`~pydoop.mapreduce.api.RecordReader.get_progress` method is
called by the framework to get the fraction of the input split that's
already been processed.  The ``close`` method (present in all
components except for the partitioner) is called by the framework once
it has finished retrieving the records: this is the right place to
perform cleanup tasks such as closing open handles.
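
The core of such a reader, sketched (error handling and logging
omitted):

.. code-block:: python

  class Reader(api.RecordReader):

      def __init__(self, context):
          super(Reader, self).__init__(context)
          self.isplit = context.input_split
          self.bytes_read = 0
          self.file = hdfs.open(self.isplit.filename)
          self.file.seek(self.isplit.offset)
          if self.isplit.offset > 0:
              # the reader of the previous split handles the record
              # that spans the split boundary
              self.bytes_read += len(self.file.readline())

      def close(self):
          self.file.close()
          self.file.fs.close()

      def next(self):
          if self.bytes_read > self.isplit.length:
              raise StopIteration
          key = self.isplit.offset + self.bytes_read
          record = self.file.readline()
          if not record:  # end of file
              raise StopIteration
          self.bytes_read += len(record)
          return key, record

      def get_progress(self):
          return min(self.bytes_read / float(self.isplit.length), 1.0)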

To use the reader, pass the class object to the factory with
``record_reader_class=Reader`` and, when running the program with
``pydoop submit``, set the ``--do-not-use-java-record-reader`` flag.

The **record writer** writes key/value pairs to output files. The default
behavior is to write one tab-separated key/value pair per line; if you
want to do something different, you have to write a custom
:class:`~pydoop.mapreduce.api.RecordWriter`:

.. literalinclude:: ../../examples/pydoop_submit/mr/wordcount_full.py
   :language: python
   :pyobject: Writer

The above example, which simply reproduces the default behavior, also
shows how to get job configuration parameters: the one starting with
``mapreduce`` is a standard Hadoop parameter, while ``pydoop.hdfs.user``
is a custom parameter defined by the application developer.
Configuration properties are passed as ``-D <key>=<value>`` (e.g.,
``-D mapreduce.output.textoutputformat.separator='|'``) to the submitter.
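
A corresponding writer sketch (simplified; except for
``pydoop.hdfs.user``, the property names below are standard Hadoop
ones):

.. code-block:: python

  class Writer(api.RecordWriter):

      def __init__(self, context):
          super(Writer, self).__init__(context)
          jc = context.job_conf
          part = jc.get_int("mapreduce.task.partition")
          out_dir = jc["mapreduce.task.output.dir"]
          self.sep = jc.get(
              "mapreduce.output.textoutputformat.separator", "\t"
          )
          self.file = hdfs.open(
              "%s/part-%05d" % (out_dir, part), "wt",
              user=jc.get("pydoop.hdfs.user", None)
          )

      def close(self):
          self.file.close()
          self.file.fs.close()

      def emit(self, key, value):
          self.file.write("%s%s%s\n" % (key, self.sep, value))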

To use the writer, pass the class object to the factory with
``record_writer_class=Writer`` and, when running the program with
``pydoop submit``, set the ``--do-not-use-java-record-writer`` flag.


Partitioners and Combiners
--------------------------

The :class:`~pydoop.mapreduce.api.Partitioner` assigns intermediate keys to
reducers. If you do *not* explicitly set a partitioner via the factory,
partitioning will be done on the Java side. By default, Hadoop uses
`HashPartitioner
<https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/mapreduce/lib/partition/HashPartitioner.html>`_,
which selects the reducer on the basis of a hash function of the key.

To write a custom partitioner in Python, subclass
:class:`~pydoop.mapreduce.api.Partitioner`, overriding the
:meth:`~pydoop.mapreduce.api.Partitioner.partition` method. The framework will
call this method with the current key and the total number of reducers ``N``
as the arguments, and expect the chosen reducer ID --- in the ``[0, ...,
N-1]`` range --- as the return value.

The following example shows how to write a partitioner that simply mimics the
default ``HashPartitioner`` behavior:

.. literalinclude:: ../../examples/pydoop_submit/mr/wordcount_full.py
   :language: python
   :pyobject: Partitioner
   :prepend: from hashlib import md5

The combiner is functionally identical to a reducer, but it runs
locally, on the key-value stream output by a single mapper.  Although
nothing prevents the combiner from processing values differently from
the reducer, the two are typically the same class, provided that the
reduce function is associative and commutative: performing local
aggregation before the shuffle helps cut down network traffic.

Local aggregation is implemented by caching intermediate key/value pairs in a
dictionary. As in standard (Java) Hadoop, the cache size is controlled by
``mapreduce.task.io.sort.mb`` and defaults to 100 MB. Pydoop uses
:func:`sys.getsizeof` to determine key/value size, which takes into account
Python object overhead. This can be quite substantial (e.g.,
``sys.getsizeof(b"foo") == 36``) and must be accounted for when fine-tuning
the cache size.
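
For instance:

.. code-block:: python

  import sys

  # the payload is 3 bytes, but the Python object is much larger
  print(len(b"foo"), sys.getsizeof(b"foo"))  # 3 36 (64-bit CPython)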

.. important:: Due to the caching, when using a combiner there are
  limitations on the types that can be used for intermediate keys and
  values. First of all, keys must be `hashable
  <https://docs.python.org/3/glossary.html>`_. In addition, values
  belonging to a mutable type should not change after having been
  emitted by the mapper. For instance, the following (however contrived)
  example would not work as expected:

  .. code-block:: python

    intermediate_value = {}

    class Mapper(api.Mapper):
      def map(self, ctx):
         intermediate_value.clear()
         intermediate_value[ctx.key] = ctx.value
         ctx.emit("foo", intermediate_value)

  For these reasons, it is recommended to use immutable types for both keys
  and values when the job includes a combiner.
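
  A safe variant of the above example emits a fresh, immutable snapshot
  for each record, instead of reusing a shared mutable object:

  .. code-block:: python

    class Mapper(api.Mapper):
      def map(self, ctx):
         # tuples are immutable: cached values cannot change after emit
         ctx.emit("foo", (ctx.key, ctx.value))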

Custom partitioner and combiner classes must be declared to the factory as
done above for record readers and writers. To recap, if we need to use all of
the above components, we need to instantiate the factory as:

.. literalinclude:: ../../examples/pydoop_submit/mr/wordcount_full.py
   :language: python
   :start-after: DOCS_INCLUDE_START
   :end-before: DOCS_INCLUDE_END


Profiling Your Application
--------------------------

Python has built-in support for application `profiling
<https://docs.python.org/3/library/profile.html>`_. Profiling a standalone
program is relatively straightforward: run it through ``cProfile``, store
stats in a file and use ``pstats`` to read and interpret them. A MapReduce
job, however, spawns multiple map and reduce tasks, so we need a way to
collect all stats. Pydoop supports this via a ``pstats_dir`` argument to
``run_task``:

.. code-block:: python

  pipes.run_task(factory, pstats_dir="pstats")

With the above call, Pydoop will run each MapReduce task with ``cProfile``,
and store resulting pstats files in the ``"pstats"`` directory on HDFS.
You can also enable profiling in the ``pydoop submit`` command line:

.. code-block:: bash

  pydoop submit --pstats-dir HDFS_DIR [...]

If the pstats directory is specified both ways, the one from ``run_task``
takes precedence.
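
Once the job has finished, the stats files can be copied locally (e.g.,
with ``hdfs dfs -get``) and inspected with the standard :mod:`pstats`
module:

.. code-block:: python

  import pstats

  # "task.pstats" is a local copy of one of the collected files
  p = pstats.Stats("task.pstats")
  p.sort_stats("cumulative").print_stats(10)  # show the top 10 entries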

Another way to do time measurements is via counters. The ``utils.misc`` module
provides a ``Timer`` object for this purpose:

.. code-block:: python

  from pydoop.utils.misc import Timer

  class Mapper(api.Mapper):

      def __init__(self, context):
          super(Mapper, self).__init__(context)
          self.timer = Timer(context)

      def map(self, context):
          with self.timer.time_block("tokenize"):
              words = context.value.split()
          for w in words:
              context.emit(w, 1)

With the above code, the total time spent executing
``context.value.split()`` (in ms) will be automatically accumulated in
a ``TIME_TOKENIZE`` counter under the ``Timer`` counter group.

Since profiling and timers can substantially slow down the Hadoop job, they
should only be used for performance debugging.


================================================
FILE: docs/tutorial/pydoop_script.rst
================================================
.. _pydoop_script_tutorial:

Easy Hadoop Scripting with Pydoop Script
========================================

Pydoop Script is the easiest way to write simple MapReduce programs
for Hadoop.  With Pydoop Script, your code focuses on the core of the
MapReduce model: the mapper and reducer functions.


Writing and Running Scripts
---------------------------

Write a ``script.py`` Python module that contains the mapper and
reducer functions:

.. code-block:: python

  def mapper(input_key, input_value, writer):
      # your computation here
      writer.emit(intermediate_key, intermediate_value)

  def reducer(intermediate_key, value_iterator, writer):
      # your computation here
      writer.emit(output_key, output_value)

The program can be run as follows::

  pydoop script script.py hdfs_input hdfs_output


Examples
--------

The following examples show how to use Pydoop Script for common
problems.  More examples can be found in the
``examples/pydoop_script`` subdirectory of Pydoop's source
distribution root.  The :ref:`Pydoop Script Guide
<pydoop_script_guide>` contains more detailed information on writing
and running programs.


.. _word_count:

Word Count
++++++++++

Count the occurrence of each word in a set of text files.

.. literalinclude:: ../../examples/pydoop_script/scripts/wordcount.py
   :language: python
   :start-after: DOCS_INCLUDE_START
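
In case you are reading this outside the built docs, the script is
essentially (a sketch; the included ``wordcount.py`` is authoritative):

.. code-block:: python

  def mapper(_, value, writer):
      for word in value.split():
          writer.emit(word, 1)


  def reducer(word, icounts, writer):
      # intermediate values reach the reducer as strings, hence int()
      writer.emit(word, sum(map(int, icounts)))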

A few more lines allow us to set a combiner for local aggregation:

.. literalinclude:: ../../examples/pydoop_script/scripts/wc_combiner.py
   :language: python
   :start-after: DOCS_INCLUDE_START

Run the example with::

  pydoop script -c combiner wordcount.py hdfs_input hdfs_output

Note that we need to explicitly set the ``-c`` flag to activate the
combiner.  By default, no combiner is called.

One thing to remember is that the current Hadoop Pipes architecture
runs the combiner inside the executable launched by ``pipes``, so it
does not update the combiner counters of the general Hadoop
framework.  Thus, if you run the above script, you'll get a value of 0
for "Combine input/output records" in the "Map-Reduce Framework"
group, but the "combiner calls" counter should be updated correctly.


Map-only Jobs and Output Separators
+++++++++++++++++++++++++++++++++++

Suppose we want to convert all input text to lower case. All we need
to do is read each input line, convert it to lower case and emit it
(for instance, as the output value). Since there is no aggregation
involved, we don't need a reducer:

.. literalinclude:: ../../examples/pydoop_script/scripts/lowercase.py
   :language: python
   :start-after: DOCS_INCLUDE_START

The only problem with the above code is that, by default, each output
key-value pair is written as tab-separated, which would lead to each
output line having a leading tab character that's not found in the
original input (note that we'd get a *trailing* tab if we emitted each
record as the output key instead). We can turn off the reduce phase
and get an empty separator for output key-value pairs by submitting
the job with the following options::

  pydoop script --num-reducers 0 -t '' lowercase.py hdfs_input hdfs_output


Custom Parameters
+++++++++++++++++

Suppose we want to select all lines containing a substring to be given
at run time (distributed grep). As in the previous example, we can do
this with a map-only job (read each input line and emit it if it
contains the substring), but we need a way for the user of our
application to specify the substring to be matched. This can be done by
adding a fourth argument to the mapper function:

.. literalinclude:: ../../examples/pydoop_script/scripts/grep.py
   :language: python
   :start-after: DOCS_INCLUDE_START

In this case, Pydoop Script passes the Hadoop job configuration to the
``mapper`` function as a dictionary via the fourth argument. Moreover,
just like Hadoop tools (e.g., ``hadoop pipes``), Pydoop Script allows
you to set additional configuration parameters via ``-D key=value``. To
search for "hello", for instance, we can run the application as::

  pydoop script --num-reducers 0 -t '' -D grep-expression=hello \
    grep.py hdfs_input hdfs_output
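
The mapper itself looks roughly like this (a sketch; the included
``grep.py`` is authoritative):

.. code-block:: python

  def mapper(_, value, writer, conf):
      # conf maps job configuration keys to their values
      if conf["grep-expression"] in value:
          writer.emit("", value)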


Applicability
-------------

Pydoop Script makes it easy to solve simple problems.  It makes it
feasible to write quick (even throw-away) scripts that perform basic
manipulations or analyses on your data, especially if it's text-based.
If you can express your algorithm as two functions that are either
stateless or keep simple state in module variables, then you can
consider using Pydoop Script.
If, on the other hand, you need more sophisticated processing, consider
using the :ref:`full Pydoop API <api_tutorial>`.


================================================
FILE: examples/README
================================================
This directory contains several Pydoop usage examples. Documentation
is in the "examples" subsection of the Pydoop html docs (look for the
"docs" subdirectory in the distribution root).


================================================
FILE: examples/avro/build.sh
================================================
#!/usr/bin/env bash

set -euo pipefail
[ -n "${DEBUG:-}" ] && set -x
this="${BASH_SOURCE-$0}"
this_dir=$(cd -P -- "$(dirname -- "${this}")" && pwd -P)
. "${this_dir}/../config.sh"
. "${this_dir}/config.sh"

pushd "${this_dir}"
gen_classpath
cp="$(<"${CP_PATH}"):$(${HADOOP} classpath)"
mkdir -p "${CLASS_DIR}"
javac -cp "${cp}" -d "${CLASS_DIR}" src/main/java/it/crs4/pydoop/*
jar -cf "${JAR_PATH}" -C "${CLASS_DIR}" ./it
popd


================================================
FILE: examples/avro/config.sh
================================================
[ -n "${PYDOOP_AVRO_EXAMPLES:-}" ] && return || readonly PYDOOP_AVRO_EXAMPLES=1

TARGET="target"
export CLASS_DIR="${TARGET}/classes"
export CP_PATH="${TARGET}/cp.txt"
export JAR_PATH="${TARGET}/pydoop-avro-examples.jar"

gen_classpath() {
    [ -f "${CP_PATH}" ] && return 0
    mkdir -p "${TARGET}"
    mvn dependency:resolve
    mvn dependency:build-classpath -D mdep.outputFile="${CP_PATH}"
    echo -n ':'$(readlink -e ../../lib)/'*' >> "${CP_PATH}"
}

export -f gen_classpath


================================================
FILE: examples/avro/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>

<!--
  BEGIN_COPYRIGHT

  Copyright 2009-2026 CRS4.

  Licensed under the Apache License, Version 2.0 (the "License"); you may not
  use this file except in compliance with the License. You may obtain a copy
  of the License at

  http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  License for the specific language governing permissions and limitations
  under the License.

  END_COPYRIGHT
-->

<project xmlns="http://maven.apache.org/POM/4.0.0"
	 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>it.crs4.pydoop</groupId>
  <artifactId>pydoop-avro-examples</artifactId>
  <packaging>jar</packaging>
  <version>2.0a2</version>
  <name>Pydoop Avro Examples</name>
  <url>https://crs4.github.io/pydoop/</url>

  <properties>
    <parquet.version>1.7.0</parquet.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-common</artifactId>
      <version>${parquet.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-column</artifactId>
      <version>${parquet.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>${parquet.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-avro</artifactId>
      <version>${parquet.version}</version>
    </dependency>
  </dependencies>

</project>


================================================
FILE: examples/avro/py/avro_base.py
================================================
#!/usr/bin/env python

# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

import sys
import abc
from collections import Counter

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pp


class ColorPickBase(api.Mapper):

    @abc.abstractmethod
    def get_user(self, ctx):
        """
        Get the user record.  This is just to avoid writing near identical
        examples for the various key/value cases.  In a real application,
        carrying records over keys or values would be a design decision,
        so you would simply do, e.g., ``user = self.value``.
        """

    def map(self, ctx):
        user = self.get_user(ctx)
        color = user['favorite_color']
        if color is not None:
            ctx.emit(user['office'], Counter({color: 1}))


class AvroKeyColorPick(ColorPickBase):

    def get_user(self, ctx):
        return ctx.key


class AvroValueColorPick(ColorPickBase):

    def get_user(self, ctx):
        return ctx.value


class AvroKeyValueColorPick(ColorPickBase):

    def get_user(self, ctx):
        return ctx.key

    def map(self, ctx):
        sys.stdout.write("value (unused): %r\n" % (ctx.value,))
        super(AvroKeyValueColorPick, self).map(ctx)


class ColorCountBase(api.Reducer):

    def reduce(self, ctx):
        s = sum(ctx.values, Counter())
        self.emit(s, ctx)

    @abc.abstractmethod
    def emit(self, s, ctx):
        """
        Emit the sum to the ctx.  As in the base mapper, this is just to
        avoid writing near identical examples.
        """


class NoAvroColorCount(ColorCountBase):

    def emit(self, s, ctx):
        ctx.emit(ctx.key, "%r" % s)


class AvroKeyColorCount(ColorCountBase):

    def emit(self, s, ctx):
        ctx.emit({'office': ctx.key, 'counts': s}, ctx.key)


class AvroValueColorCount(ColorCountBase):

    def emit(self, s, ctx):
        ctx.emit(ctx.key, {'office': ctx.key, 'counts': s})


class AvroKeyValueColorCount(ColorCountBase):

    def emit(self, s, ctx):
        record = {'office': ctx.key, 'counts': s}
        ctx.emit(record, record)  # FIXME: do something fancier


def run_task(mapper_class, reducer_class=NoAvroColorCount):
    pp.run_task(pp.Factory(mapper_class, reducer_class=reducer_class))


================================================
FILE: examples/avro/py/avro_container_dump_results.py
================================================
# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

import sys

from avro.io import DatumReader
from avro.datafile import DataFileReader


def main(fn, out_fn, avro_mode=''):
    with open(out_fn, 'w') as fo:
        with open(fn, 'rb') as f:
            reader = DataFileReader(f, DatumReader())
            for r in reader:
                if avro_mode.upper() == 'KV':
                    r = r['key']

                fo.write('%s\t%r\n' % (r['office'], r['counts']))
    print('wrote', out_fn)


if __name__ == '__main__':
    main(*sys.argv[1:])


================================================
FILE: examples/avro/py/avro_key_in.py
================================================
#!/usr/bin/env python

# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

from avro_base import AvroKeyColorPick, run_task


def __main__():
    run_task(AvroKeyColorPick)


if __name__ == '__main__':
    __main__()


================================================
FILE: examples/avro/py/avro_key_in_out.py
================================================
#!/usr/bin/env python

# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

from avro_base import AvroKeyColorPick, AvroKeyColorCount, run_task


def __main__():
    run_task(AvroKeyColorPick, AvroKeyColorCount)


if __name__ == '__main__':
    __main__()


================================================
FILE: examples/avro/py/avro_key_value_in.py
================================================
#!/usr/bin/env python

# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

from avro_base import AvroKeyValueColorPick, run_task


def __main__():
    run_task(AvroKeyValueColorPick)


if __name__ == '__main__':
    __main__()


================================================
FILE: examples/avro/py/avro_key_value_in_out.py
================================================
#!/usr/bin/env python

# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

from avro_base import AvroKeyValueColorPick, AvroKeyValueColorCount, run_task


def __main__():
    run_task(AvroKeyValueColorPick, AvroKeyValueColorCount)


if __name__ == '__main__':
    __main__()


================================================
FILE: examples/avro/py/avro_parquet_dump_results.py
================================================
#!/usr/bin/env python

# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pp


class Mapper(api.Mapper):

    def map(self, ctx):
        cc_stat = ctx.value
        ctx.emit(cc_stat['office'], repr(cc_stat['counts']))


def __main__():
    pp.run_task(pp.Factory(Mapper))


================================================
FILE: examples/avro/py/avro_pyrw.py
================================================
#!/usr/bin/env python

# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

"""
Avro color count with Python record reader/writer.
"""

from collections import Counter

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pp
from pydoop.avrolib import AvroReader, AvroWriter, parse


class UserReader(AvroReader):
    pass


class ColorWriter(AvroWriter):

    # read the schema once, at class definition time
    with open("stats.avsc") as f:
        schema = parse(f.read())

    def emit(self, key, value):
        self.writer.append({'office': key, 'counts': value})


class ColorPick(api.Mapper):

    def map(self, ctx):
        user = ctx.value
        color = user['favorite_color']
        if color is not None:
            ctx.emit(user['office'], Counter({color: 1}))


class ColorCount(api.Reducer):

    def reduce(self, ctx):
        s = sum(ctx.values, Counter())
        ctx.emit(ctx.key, s)


pp.run_task(pp.Factory(
    mapper_class=ColorPick,
    reducer_class=ColorCount,
    record_reader_class=UserReader,
    record_writer_class=ColorWriter
), private_encoding=True)


================================================
FILE: examples/avro/py/avro_value_in.py
================================================
#!/usr/bin/env python

# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

from avro_base import AvroValueColorPick, run_task


def __main__():
    run_task(AvroValueColorPick)


if __name__ == '__main__':
    __main__()


================================================
FILE: examples/avro/py/avro_value_in_out.py
================================================
#!/usr/bin/env python

# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

from avro_base import AvroValueColorPick, AvroValueColorCount, run_task


def __main__():
    run_task(AvroValueColorPick, AvroValueColorCount)


if __name__ == '__main__':
    __main__()


================================================
FILE: examples/avro/py/check_cc.py
================================================
# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

import sys
import os
import errno
from collections import Counter

from avro.io import DatumReader
from avro.datafile import DataFileReader
from pydoop.utils.py3compat import iteritems


def iter_fnames(path):
    try:
        contents = os.listdir(path)
    except OSError as e:
        if e.errno != errno.ENOTDIR:
            raise
        yield path
    else:
        for name in contents:
            yield os.path.join(path, name)


def main(in_, out_):

    expected = {}
    for in_fn in iter_fnames(in_):
        with open(in_fn, 'rb') as f:
            reader = DataFileReader(f, DatumReader())
            for r in reader:
                expected.setdefault(
                    r["office"], Counter()
                )[r["favorite_color"]] += 1

    computed = {}
    for out_fn in iter_fnames(out_):
        with open(out_fn) as f:
            for l in f:
                p = l.strip().split('\t')
                computed[p[0]] = eval(p[1])

    if set(computed) != set(expected):
        sys.exit("ERROR: computed keys != expected keys: %r != %r" % (
            sorted(computed), sorted(expected)))
    for k, v in iteritems(expected):
        if computed[k] != v:
            sys.exit("ERROR: %r: %r != %r" % (k, computed[k], dict(v)))
    print('All is ok!')


if __name__ == '__main__':
    main(sys.argv[1], sys.argv[2])


================================================
FILE: examples/avro/py/check_results.py
================================================
# BEGIN_COPYRIGHT
#
# Copyright 2009-2026 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# END_COPYRIGHT

import sys
import os
import errno
from collections import Counter

from pydoop.utils.py3compat import iteritems


def iter_lines(path):
    try:
        fnames = [os.path.join(path, name) for name in os.listdir(path)]
    except OSError as e:
        if e.errno != errno.ENOTDIR:
            raise
        fnames = [path]
    for fname in fnames:
        with open(fname) as f:
            for line in f:
                yield line


def main(exp, res):

    expected = {}
    for l in iter_lines(exp):
        p = l.strip().split(';')
        expected.setdefault(p[1], Counter())[p[2]] += 1

    computed = {}
    for l in iter_lines(res):
        p = l.strip().split('\t')
        computed[p[0]] = eval(p[1])

    if set(computed) != set(expected):
        sys.exit("ERROR: computed keys != expected keys: %r != %r" % (
            sorted(computed), sorted(expected)))
    for k, v in iteritems(expected):
        if computed[k] != v:
            sys.exit("ERROR: %r: %r != %r" % (k, computed[k], dict(v)))
    print('All is ok!')


if __name__ == '__main__':
    main(sys.argv[1], sys.argv[2])

Download .txt
gitextract_2qljhz4z/

├── .dir-locals.el
├── .dockerignore
├── .gitignore
├── .travis/
│   ├── check_script_template.py
│   ├── cmd/
│   │   └── hadoop_localfs.sh
│   ├── run_checks
│   └── start_container
├── .travis.yml
├── AUTHORS
├── Dockerfile
├── Dockerfile.client
├── Dockerfile.docs
├── LICENSE
├── MANIFEST.in
├── README.md
├── VERSION
├── dev_tools/
│   ├── build_deprecation_tables
│   ├── bump_copyright_year
│   ├── docker/
│   │   ├── client_side_tests/
│   │   │   ├── apache_2.6.0/
│   │   │   │   ├── initialize.sh
│   │   │   │   └── local_client_setup.sh
│   │   │   └── hdp_2.2.0.0/
│   │   │       ├── initialize.sh
│   │   │       └── local_client_setup.sh
│   │   ├── cluster.rst
│   │   ├── clusters/
│   │   │   └── apache_2.6.0/
│   │   │       ├── docker-compose.yml
│   │   │       └── images/
│   │   │           ├── base/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       ├── generate_conf_files.py
│   │   │           │       ├── zk_set.py
│   │   │           │       └── zk_wait.py
│   │   │           ├── bootstrap/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       ├── bootstrap.py
│   │   │           │       └── create_hdfs_dirs.sh
│   │   │           ├── datanode/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       └── start_datanode.sh
│   │   │           ├── historyserver/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       └── start_historyserver.sh
│   │   │           ├── namenode/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       └── start_namenode.sh
│   │   │           ├── nodemanager/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       └── start_nodemanager.sh
│   │   │           ├── resourcemanager/
│   │   │           │   ├── Dockerfile
│   │   │           │   └── scripts/
│   │   │           │       └── start_resourcemanager.sh
│   │   │           └── zookeeper/
│   │   │               ├── Dockerfile
│   │   │               └── scripts/
│   │   │                   └── start_namenode.sh
│   │   ├── images/
│   │   │   ├── base/
│   │   │   │   └── Dockerfile
│   │   │   └── client/
│   │   │       └── Dockerfile
│   │   └── scripts/
│   │       ├── build_base_images.sh
│   │       ├── build_cluster_images.sh
│   │       ├── share_etc_hosts.py
│   │       ├── start_client.sh
│   │       └── start_cluster.sh
│   ├── docker_build
│   ├── dump_app_params
│   ├── edit_conf
│   ├── git_export
│   ├── import_src
│   ├── mapred_pipes
│   ├── unpack_debian
│   └── update_docs
├── docs/
│   ├── Makefile
│   ├── _build/
│   │   └── .gitignore
│   ├── _templates/
│   │   └── layout.html
│   ├── api_docs/
│   │   ├── hadut.rst
│   │   ├── hdfs_api.rst
│   │   ├── index.rst
│   │   └── mr_api.rst
│   ├── conf.py
│   ├── examples/
│   │   ├── avro.rst
│   │   ├── index.rst
│   │   ├── input_format.rst
│   │   ├── intro.rst
│   │   └── sequence_file.rst
│   ├── how_to_cite.rst
│   ├── index.rst
│   ├── installation.rst
│   ├── news/
│   │   ├── archive.rst
│   │   ├── index.rst
│   │   └── latest.rst
│   ├── pydoop_script.rst
│   ├── pydoop_script_options.rst
│   ├── pydoop_submit_options.rst
│   ├── running_pydoop_applications.rst
│   ├── self_contained.rst
│   └── tutorial/
│       ├── hdfs_api.rst
│       ├── index.rst
│       ├── mapred_api.rst
│       └── pydoop_script.rst
├── examples/
│   ├── README
│   ├── avro/
│   │   ├── build.sh
│   │   ├── config.sh
│   │   ├── data/
│   │   │   └── mini_aligned_seqs.gz.parquet
│   │   ├── pom.xml
│   │   ├── py/
│   │   │   ├── avro_base.py
│   │   │   ├── avro_container_dump_results.py
│   │   │   ├── avro_key_in.py
│   │   │   ├── avro_key_in_out.py
│   │   │   ├── avro_key_value_in.py
│   │   │   ├── avro_key_value_in_out.py
│   │   │   ├── avro_parquet_dump_results.py
│   │   │   ├── avro_pyrw.py
│   │   │   ├── avro_value_in.py
│   │   │   ├── avro_value_in_out.py
│   │   │   ├── check_cc.py
│   │   │   ├── check_results.py
│   │   │   ├── color_count.py
│   │   │   ├── create_input.py
│   │   │   ├── gen_data.py
│   │   │   ├── generate_avro_users.py
│   │   │   ├── kmer_count.py
│   │   │   ├── show_kmer_count.py
│   │   │   └── write_avro.py
│   │   ├── run
│   │   ├── run_avro_container_in
│   │   ├── run_avro_container_in_out
│   │   ├── run_avro_parquet_in
│   │   ├── run_avro_parquet_in_out
│   │   ├── run_avro_pyrw
│   │   ├── run_color_count
│   │   ├── run_kmer_count
│   │   ├── schemas/
│   │   │   ├── alignment_record.avsc
│   │   │   ├── alignment_record_proj.avsc
│   │   │   ├── pet.avsc
│   │   │   ├── stats.avsc
│   │   │   └── user.avsc
│   │   ├── src/
│   │   │   └── main/
│   │   │       └── java/
│   │   │           └── it/
│   │   │               └── crs4/
│   │   │                   └── pydoop/
│   │   │                       ├── WriteKV.java
│   │   │                       └── WriteParquet.java
│   │   └── write_avro_kv
│   ├── c++/
│   │   ├── HadoopPipes.cc
│   │   ├── Makefile
│   │   ├── README.txt
│   │   ├── SerialUtils.cc
│   │   ├── StringUtils.cc
│   │   ├── include/
│   │   │   └── hadoop/
│   │   │       ├── Pipes.hh
│   │   │       ├── SerialUtils.hh
│   │   │       ├── StringUtils.hh
│   │   │       └── TemplateFactory.hh
│   │   └── wordcount.cc
│   ├── config.sh
│   ├── hdfs/
│   │   ├── common.py
│   │   ├── repl_session.py
│   │   ├── run
│   │   ├── treegen.py
│   │   └── treewalk.py
│   ├── input/
│   │   ├── alice_1.txt
│   │   └── alice_2.txt
│   ├── input_format/
│   │   ├── check_results.py
│   │   ├── it/
│   │   │   └── crs4/
│   │   │       └── pydoop/
│   │   │           ├── mapred/
│   │   │           │   └── TextInputFormat.java
│   │   │           └── mapreduce/
│   │   │               └── TextInputFormat.java
│   │   └── run
│   ├── pydoop_script/
│   │   ├── check.py
│   │   ├── data/
│   │   │   ├── base_histogram_input/
│   │   │   │   ├── example_1.sam
│   │   │   │   └── example_2.sam
│   │   │   ├── stop_words.txt
│   │   │   └── transpose_input/
│   │   │       └── matrix.txt
│   │   ├── run
│   │   ├── run_script.sh
│   │   └── scripts/
│   │       ├── base_histogram.py
│   │       ├── caseswitch.py
│   │       ├── grep.py
│   │       ├── lowercase.py
│   │       ├── transpose.py
│   │       ├── wc_combiner.py
│   │       ├── wordcount.py
│   │       └── wordcount_sw.py
│   ├── pydoop_submit/
│   │   ├── check.py
│   │   ├── data/
│   │   │   ├── cols_1.txt
│   │   │   └── cols_2.txt
│   │   ├── mr/
│   │   │   ├── map_only_java_writer.py
│   │   │   ├── map_only_python_writer.py
│   │   │   ├── nosep.py
│   │   │   ├── wordcount_full.py
│   │   │   └── wordcount_minimal.py
│   │   ├── run
│   │   └── run_submit.sh
│   ├── run_all
│   ├── self_contained/
│   │   ├── check_results.py
│   │   ├── run
│   │   └── vowelcount/
│   │       ├── __init__.py
│   │       ├── lib/
│   │       │   └── __init__.py
│   │       └── mr/
│   │           ├── __init__.py
│   │           ├── main.py
│   │           ├── mapper.py
│   │           └── reducer.py
│   └── sequence_file/
│       ├── bin/
│       │   ├── filter.py
│       │   └── wordcount.py
│       ├── check.py
│       └── run
├── int_test/
│   ├── config.sh
│   ├── mapred_submitter/
│   │   ├── check.py
│   │   ├── genwords.py
│   │   ├── input/
│   │   │   ├── map_only/
│   │   │   │   ├── f1.txt
│   │   │   │   └── f2.txt
│   │   │   ├── map_reduce/
│   │   │   │   ├── f1.txt
│   │   │   │   └── f2.txt
│   │   │   └── map_reduce_long/
│   │   │       └── f.txt
│   │   ├── mr/
│   │   │   ├── map_only_java_writer.py
│   │   │   ├── map_only_python_writer.py
│   │   │   ├── map_reduce_combiner.py
│   │   │   ├── map_reduce_java_rw.py
│   │   │   ├── map_reduce_java_rw_pstats.py
│   │   │   ├── map_reduce_python_partitioner.py
│   │   │   ├── map_reduce_python_reader.py
│   │   │   ├── map_reduce_python_writer.py
│   │   │   ├── map_reduce_raw_io.py
│   │   │   ├── map_reduce_slow_java_rw.py
│   │   │   └── map_reduce_slow_python_rw.py
│   │   ├── run
│   │   ├── run_app.sh
│   │   └── run_perf.sh
│   ├── opaque_split/
│   │   ├── check.py
│   │   ├── gen_splits.py
│   │   ├── mrapp.py
│   │   └── run
│   ├── progress/
│   │   ├── mrapp.py
│   │   └── run
│   └── run_all
├── lib/
│   └── avro-mapred-1.7.7-hadoop2.jar
├── logo/
│   └── ubuntu-font-family.tar.bz2
├── notice_template.txt
├── pydoop/
│   ├── __init__.py
│   ├── app/
│   │   ├── __init__.py
│   │   ├── argparse_types.py
│   │   ├── main.py
│   │   ├── script.py
│   │   ├── script_template.py
│   │   └── submit.py
│   ├── avrolib.py
│   ├── hadoop_utils.py
│   ├── hadut.py
│   ├── hdfs/
│   │   ├── __init__.py
│   │   ├── common.py
│   │   ├── core/
│   │   │   └── __init__.py
│   │   ├── file.py
│   │   ├── fs.py
│   │   └── path.py
│   ├── jc.py
│   ├── mapreduce/
│   │   ├── __init__.py
│   │   ├── api.py
│   │   ├── binary_protocol.py
│   │   ├── connections.py
│   │   └── pipes.py
│   ├── test_support.py
│   ├── test_utils.py
│   └── utils/
│       ├── __init__.py
│       ├── conversion_tables.py
│       ├── jvm.py
│       ├── misc.py
│       └── py3compat.py
├── pydoop.properties
├── requirements.txt
├── setup.cfg
├── setup.py
├── src/
│   ├── Py_macros.h
│   ├── buf_macros.h
│   ├── it/
│   │   └── crs4/
│   │       └── pydoop/
│   │           ├── NoSeparatorTextOutputFormat.java
│   │           └── mapreduce/
│   │               └── pipes/
│   │                   ├── Application.java
│   │                   ├── BinaryProtocol.java
│   │                   ├── DownwardProtocol.java
│   │                   ├── DummyRecordReader.java
│   │                   ├── OpaqueSplit.java
│   │                   ├── OutputHandler.java
│   │                   ├── PipesMapper.java
│   │                   ├── PipesNonJavaInputFormat.java
│   │                   ├── PipesNonJavaOutputFormat.java
│   │                   ├── PipesPartitioner.java
│   │                   ├── PipesReducer.java
│   │                   ├── PydoopAvroBridgeKeyReader.java
│   │                   ├── PydoopAvroBridgeKeyValueReader.java
│   │                   ├── PydoopAvroBridgeKeyValueWriter.java
│   │                   ├── PydoopAvroBridgeKeyWriter.java
│   │                   ├── PydoopAvroBridgeReaderBase.java
│   │                   ├── PydoopAvroBridgeValueReader.java
│   │                   ├── PydoopAvroBridgeValueWriter.java
│   │                   ├── PydoopAvroBridgeWriterBase.java
│   │                   ├── PydoopAvroInputBridgeBase.java
│   │                   ├── PydoopAvroInputKeyBridge.java
│   │                   ├── PydoopAvroInputKeyValueBridge.java
│   │                   ├── PydoopAvroInputValueBridge.java
│   │                   ├── PydoopAvroKeyInputFormat.java
│   │                   ├── PydoopAvroKeyOutputFormat.java
│   │                   ├── PydoopAvroKeyRecordReader.java
│   │                   ├── PydoopAvroKeyRecordWriter.java
│   │                   ├── PydoopAvroKeyValueInputFormat.java
│   │                   ├── PydoopAvroKeyValueOutputFormat.java
│   │                   ├── PydoopAvroKeyValueRecordReader.java
│   │                   ├── PydoopAvroKeyValueRecordWriter.java
│   │                   ├── PydoopAvroOutputBridgeBase.java
│   │                   ├── PydoopAvroOutputFormatBase.java
│   │                   ├── PydoopAvroOutputKeyBridge.java
│   │                   ├── PydoopAvroOutputKeyValueBridge.java
│   │                   ├── PydoopAvroOutputValueBridge.java
│   │                   ├── PydoopAvroRecordReaderBase.java
│   │                   ├── PydoopAvroRecordWriterBase.java
│   │                   ├── PydoopAvroValueInputFormat.java
│   │                   ├── PydoopAvroValueOutputFormat.java
│   │                   ├── PydoopAvroValueRecordReader.java
│   │                   ├── PydoopAvroValueRecordWriter.java
│   │                   ├── Submitter.java
│   │                   ├── TaskLog.java
│   │                   ├── TaskLogAppender.java
│   │                   └── UpwardProtocol.java
│   ├── libhdfs/
│   │   ├── common/
│   │   │   ├── htable.c
│   │   │   └── htable.h
│   │   ├── config.h
│   │   ├── exception.c
│   │   ├── exception.h
│   │   ├── hdfs.c
│   │   ├── include/
│   │   │   └── hdfs/
│   │   │       └── hdfs.h
│   │   ├── jni_helper.c
│   │   ├── jni_helper.h
│   │   └── os/
│   │       ├── mutexes.h
│   │       ├── posix/
│   │       │   ├── mutexes.c
│   │       │   ├── platform.h
│   │       │   ├── thread.c
│   │       │   └── thread_local_storage.c
│   │       ├── thread.h
│   │       ├── thread_local_storage.h
│   │       └── windows/
│   │           ├── inttypes.h
│   │           ├── mutexes.c
│   │           ├── platform.h
│   │           ├── thread.c
│   │           ├── thread_local_storage.c
│   │           └── unistd.h
│   ├── native_core_hdfs/
│   │   ├── hdfs_file.cc
│   │   ├── hdfs_file.h
│   │   ├── hdfs_fs.cc
│   │   ├── hdfs_fs.h
│   │   └── hdfs_module.cc
│   ├── py3k_compat.h
│   └── sercore/
│       ├── HadoopUtils/
│       │   ├── SerialUtils.cc
│       │   └── SerialUtils.hh
│       ├── hu_extras.cpp
│       ├── hu_extras.h
│       ├── sercore.cpp
│       ├── streams.cpp
│       └── streams.h
└── test/
    ├── __init__.py
    ├── all_tests.py
    ├── app/
    │   ├── __init__.py
    │   ├── all_tests.py
    │   └── test_submit.py
    ├── avro/
    │   ├── all_tests.py
    │   ├── common.py
    │   ├── test_io.py
    │   └── user.avsc
    ├── common/
    │   ├── __init__.py
    │   ├── all_tests.py
    │   ├── test_hadoop_utils.py
    │   ├── test_hadut.py
    │   ├── test_pydoop.py
    │   └── test_test_support.py
    ├── hdfs/
    │   ├── __init__.py
    │   ├── all_tests.py
    │   ├── common_hdfs_tests.py
    │   ├── test_common.py
    │   ├── test_core.py
    │   ├── test_hdfs.py
    │   ├── test_hdfs_fs.py
    │   ├── test_local_fs.py
    │   ├── test_path.py
    │   └── try_hdfs.py
    ├── mapreduce/
    │   ├── __init__.py
    │   ├── all_tests.py
    │   ├── it/
    │   │   └── crs4/
    │   │       └── pydoop/
    │   │           └── mapreduce/
    │   │               └── pipes/
    │   │                   └── OpaqueRoundtrip.java
    │   ├── m_task.cmd
    │   ├── r_task.cmd
    │   ├── test_connections.py
    │   └── test_opaque.py
    └── sercore/
        ├── all_tests.py
        ├── test_deser.py
        └── test_streams.py
SYMBOL INDEX (1874 symbols across 206 files)

FILE: .travis/check_script_template.py
  function main (line 21) | def main(argv):

FILE: dev_tools/docker/clusters/apache_2.6.0/images/base/scripts/generate_conf_files.py
  function add_property (line 6) | def add_property(conf, name, value):
  function write_xml (line 12) | def write_xml(root, fname):
  function generate_xml_conf_file (line 20) | def generate_xml_conf_file(fname, props):
  function generate_core_site (line 27) | def generate_core_site(fname):
  function generate_hdfs_site (line 35) | def generate_hdfs_site(fname):
  function generate_yarn_site (line 46) | def generate_yarn_site(fname):
  function generate_mapred_site (line 68) | def generate_mapred_site(fname):
  function generate_capacity_scheduler (line 86) | def generate_capacity_scheduler(fname):
  function main (line 100) | def main(argv):

FILE: dev_tools/docker/clusters/apache_2.6.0/images/bootstrap/scripts/bootstrap.py
  function etc_updated (line 15) | def etc_updated():
  function boot_node (line 25) | def boot_node(kz, nodename):
  function main (line 34) | def main():

FILE: dev_tools/docker/scripts/share_etc_hosts.py
  class App (line 15) | class App(object):
    method __init__ (line 16) | def __init__(self, compose_group_name):
    method _get_containers (line 20) | def _get_containers(self, compose_group_name):
    method _get_hosts (line 26) | def _get_hosts(self):
    method share_etc_hosts (line 34) | def share_etc_hosts(self):
  function docker_client (line 44) | def docker_client():
  function main (line 78) | def main(argv):

FILE: examples/avro/py/avro_base.py
  class ColorPickBase (line 29) | class ColorPickBase(api.Mapper):
    method get_user (line 32) | def get_user(self, ctx):
    method map (line 40) | def map(self, ctx):
  class AvroKeyColorPick (line 47) | class AvroKeyColorPick(ColorPickBase):
    method get_user (line 49) | def get_user(self, ctx):
  class AvroValueColorPick (line 53) | class AvroValueColorPick(ColorPickBase):
    method get_user (line 55) | def get_user(self, ctx):
  class AvroKeyValueColorPick (line 59) | class AvroKeyValueColorPick(ColorPickBase):
    method get_user (line 61) | def get_user(self, ctx):
    method map (line 64) | def map(self, ctx):
  class ColorCountBase (line 69) | class ColorCountBase(api.Reducer):
    method reduce (line 71) | def reduce(self, ctx):
    method emit (line 76) | def emit(self, s, ctx):
  class NoAvroColorCount (line 83) | class NoAvroColorCount(ColorCountBase):
    method emit (line 85) | def emit(self, s, ctx):
  class AvroKeyColorCount (line 89) | class AvroKeyColorCount(ColorCountBase):
    method emit (line 91) | def emit(self, s, ctx):
  class AvroValueColorCount (line 95) | class AvroValueColorCount(ColorCountBase):
    method emit (line 97) | def emit(self, s, ctx):
  class AvroKeyValueColorCount (line 101) | class AvroKeyValueColorCount(ColorCountBase):
    method emit (line 103) | def emit(self, s, ctx):
  function run_task (line 108) | def run_task(mapper_class, reducer_class=NoAvroColorCount):
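
A minimal sketch of the "value" variant above, assuming Avro value input mode at submit time and the classic user schema (cf. test/avro/user.avsc in the tree above); the favorite_color field name is an assumption, not taken from the indexed code:

  import pydoop.mapreduce.api as api

  class ColorPick(api.Mapper):
      def map(self, ctx):
          user = ctx.value                    # Avro record, deserialized to a dict
          color = user.get("favorite_color")  # assumed field name
          if color is not None:
              ctx.emit(color, 1)              # one count per color occurrence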

FILE: examples/avro/py/avro_container_dump_results.py
  function main (line 25) | def main(fn, out_fn, avro_mode=''):

FILE: examples/avro/py/avro_key_in.py
  function __main__ (line 24) | def __main__():

FILE: examples/avro/py/avro_key_in_out.py
  function __main__ (line 24) | def __main__():

FILE: examples/avro/py/avro_key_value_in.py
  function __main__ (line 24) | def __main__():

FILE: examples/avro/py/avro_key_value_in_out.py
  function __main__ (line 24) | def __main__():

FILE: examples/avro/py/avro_parquet_dump_results.py
  class Mapper (line 25) | class Mapper(api.Mapper):
    method map (line 27) | def map(self, ctx):
  function __main__ (line 32) | def __main__():

FILE: examples/avro/py/avro_pyrw.py
  class UserReader (line 32) | class UserReader(AvroReader):
  class ColorWriter (line 36) | class ColorWriter(AvroWriter):
    method emit (line 40) | def emit(self, key, value):
  class ColorPick (line 44) | class ColorPick(api.Mapper):
    method map (line 46) | def map(self, ctx):
  class ColorCount (line 53) | class ColorCount(api.Reducer):
    method reduce (line 55) | def reduce(self, ctx):

FILE: examples/avro/py/avro_value_in.py
  function __main__ (line 24) | def __main__():

FILE: examples/avro/py/avro_value_in_out.py
  function __main__ (line 24) | def __main__():

FILE: examples/avro/py/check_cc.py
  function iter_fnames (line 29) | def iter_fnames(path):
  function main (line 40) | def main(in_, out_):

FILE: examples/avro/py/check_results.py
  function iter_lines (line 27) | def iter_lines(path):
  function main (line 39) | def main(exp, res):

FILE: examples/avro/py/color_count.py
  class Mapper (line 28) | class Mapper(api.Mapper):
    method map (line 30) | def map(self, ctx):
  class Reducer (line 37) | class Reducer(api.Reducer):
    method reduce (line 39) | def reduce(self, ctx):
  function __main__ (line 44) | def __main__():

FILE: examples/avro/py/create_input.py
  function create_input (line 27) | def create_input(n, stream):
  function main (line 36) | def main(n, filename):

FILE: examples/avro/py/gen_data.py
  class Mapper (line 27) | class Mapper(api.Mapper):
    method map (line 29) | def map(self, ctx):
  function __main__ (line 35) | def __main__():

FILE: examples/avro/py/generate_avro_users.py
  function main (line 37) | def main(argv):

FILE: examples/avro/py/kmer_count.py
  function window (line 28) | def window(s, width):
  class Mapper (line 33) | class Mapper(api.Mapper):
    method map (line 35) | def map(self, ctx):
  class Reducer (line 41) | class Reducer(api.Reducer):
    method reduce (line 43) | def reduce(self, ctx):
  function __main__ (line 47) | def __main__():

FILE: examples/avro/py/show_kmer_count.py
  function main (line 26) | def main(argv):

FILE: examples/avro/py/write_avro.py
  function main (line 30) | def main(schema_fn, csv_fn, avro_fn):

FILE: examples/avro/src/main/java/it/crs4/pydoop/WriteKV.java
  class WriteKV (line 41) | class WriteKV {
    method buildUser (line 45) | private static GenericRecord buildUser(
    method buildPet (line 54) | private static GenericRecord buildPet(
    method createFile (line 62) | private static <T> File createFile(File file, Schema schema, T... reco...
    method createInputFile (line 74) | private static File createInputFile(
    method main (line 106) | public static void main(String[] args) throws Exception {

FILE: examples/avro/src/main/java/it/crs4/pydoop/WriteParquet.java
  class WriteParquet (line 53) | public class WriteParquet extends Configured implements Tool {
    method getSchema (line 60) | private static Schema getSchema(Configuration conf)
    class WriteUserMap (line 70) | public static class WriteUserMap
      method setup (line 75) | @Override
      method map (line 81) | @Override
    method run (line 94) | public int run(String[] args) throws Exception {
    method main (line 129) | public static void main(String[] args) throws Exception {

FILE: examples/c++/HadoopPipes.cc
  type HadoopPipes (line 48) | namespace HadoopPipes {
    class JobConfImpl (line 50) | class JobConfImpl: public JobConf {
      method set (line 54) | void set(const string& key, const string& value) {
      method hasKey (line 58) | virtual bool hasKey(const string& key) const {
      method get (line 62) | virtual const string& get(const string& key) const {
      method getInt (line 70) | virtual int getInt(const string& key) const {
      method getFloat (line 75) | virtual float getFloat(const string& key) const {
      method getBoolean (line 80) | virtual bool getBoolean(const string&key) const {
    class DownwardProtocol (line 86) | class DownwardProtocol {
    class UpwardProtocol (line 101) | class UpwardProtocol {
    class Protocol (line 116) | class Protocol {
    class TextUpwardProtocol (line 123) | class TextUpwardProtocol: public UpwardProtocol {
      method writeBuffer (line 129) | void writeBuffer(const string& buffer) {
      method TextUpwardProtocol (line 134) | TextUpwardProtocol(FILE* _stream): stream(_stream) {}
      method output (line 136) | virtual void output(const string& key, const string& value) {
      method partitionedOutput (line 144) | virtual void partitionedOutput(int reduce, const string& key,
      method status (line 154) | virtual void status(const string& message) {
      method progress (line 159) | virtual void progress(float progress) {
      method registerCounter (line 164) | virtual void registerCounter(int id, const string& group,
      method incrementCounter (line 171) | virtual void incrementCounter(const TaskContext::Counter* counter,
      method done (line 177) | virtual void done() {
    class TextProtocol (line 182) | class TextProtocol: public Protocol {
      method readUpto (line 190) | int readUpto(string& buffer, const char* limit) {
      method TextProtocol (line 205) | TextProtocol(FILE* down, DownwardProtocol* _handler, FILE* up) {
      method getUplink (line 211) | UpwardProtocol* getUplink() {
      method nextEvent (line 215) | virtual void nextEvent() {
    type MESSAGE_TYPE (line 297) | enum MESSAGE_TYPE {START_MESSAGE, SET_JOB_CONF, SET_INPUT_TYPES, RUN_MAP,
    class BinaryUpwardProtocol (line 303) | class BinaryUpwardProtocol: public UpwardProtocol {
      method BinaryUpwardProtocol (line 307) | BinaryUpwardProtocol(FILE* _stream) {
      method authenticate (line 312) | virtual void authenticate(const string &responseDigest) {
      method output (line 318) | virtual void output(const string& key, const string& value) {
      method partitionedOutput (line 324) | virtual void partitionedOutput(int reduce, const string& key,
      method status (line 332) | virtual void status(const string& message) {
      method progress (line 337) | virtual void progress(float progress) {
      method done (line 343) | virtual void done() {
      method registerCounter (line 347) | virtual void registerCounter(int id, const string& group,
      method incrementCounter (line 355) | virtual void incrementCounter(const TaskContext::Counter* counter,
    class BinaryProtocol (line 367) | class BinaryProtocol: public Protocol {
      method getPassword (line 376) | void getPassword(string &password) {
      method verifyDigestAndRespond (line 396) | void verifyDigestAndRespond(string& digest, string& challenge) {
      method verifyDigest (line 413) | bool verifyDigest(string &password, string& digest, string& challeng...
      method createDigest (line 422) | string createDigest(string &password, string& msg) {
      method BinaryProtocol (line 462) | BinaryProtocol(FILE* down, DownwardProtocol* _handler, FILE* up) {
      method getUplink (line 471) | UpwardProtocol* getUplink() {
      method nextEvent (line 475) | virtual void nextEvent() {
    class CombineContext (line 575) | class CombineContext: public ReduceContext {
      method CombineContext (line 589) | CombineContext(ReduceContext* _baseContext,
      method getJobConf (line 604) | virtual const JobConf* getJobConf() {
      method emit (line 616) | virtual void emit(const std::string& key, const std::string& value) {
      method progress (line 625) | virtual void progress() {
      method setStatus (line 629) | virtual void setStatus(const std::string& status) {
      method nextKey (line 633) | bool nextKey() {
      method nextValue (line 648) | virtual bool nextValue() {
      method getCounter (line 657) | virtual Counter* getCounter(const std::string& group,
      method incrementCounter (line 662) | virtual void incrementCounter(const Counter* counter, uint64_t amoun...
    class CombineRunner (line 671) | class CombineRunner: public RecordWriter {
      method CombineRunner (line 682) | CombineRunner(int64_t _spillSize, ReduceContext* _baseContext,
      method emit (line 694) | virtual void emit(const std::string& key,
      method close (line 703) | virtual void close() {
      method spillAll (line 708) | void spillAll() {
    class TaskContextImpl (line 719) | class TaskContextImpl: public MapContext, public ReduceContext,
      method TaskContextImpl (line 751) | TaskContextImpl(const Factory& _factory) {
      method setProtocol (line 774) | void setProtocol(Protocol* _protocol, UpwardProtocol* _uplink) {
      method start (line 780) | virtual void start(int protocol) {
      method setJobConf (line 787) | virtual void setJobConf(vector<string> values) {
      method setInputTypes (line 797) | virtual void setInputTypes(string keyType, string valueType) {
      method runMap (line 802) | virtual void runMap(string _inputSplit, int _numReduces, bool pipedI...
      method mapItem (line 828) | virtual void mapItem(const string& _key, const string& _value) {
      method runReduce (line 834) | virtual void runReduce(int reduce, bool pipedOutput) {
      method reduceKey (line 843) | virtual void reduceKey(const string& _key) {
      method reduceValue (line 848) | virtual void reduceValue(const string& _value) {
      method isDone (line 853) | virtual bool isDone() {
      method close (line 860) | virtual void close() {
      method abort (line 866) | virtual void abort() {
      method waitForTask (line 870) | void waitForTask() {
      method nextKey (line 876) | bool nextKey() {
      method nextValue (line 906) | virtual bool nextValue() {
      method getJobConf (line 919) | virtual JobConf* getJobConf() {
      method getInputKey (line 927) | virtual const string& getInputKey() {
      method getInputValue (line 936) | virtual const string& getInputValue() {
      method progress (line 944) | virtual void progress() {
      method setStatus (line 961) | virtual void setStatus(const string& status) {
      method getInputKeyClass (line 970) | virtual const string& getInputKeyClass() {
      method getInputValueClass (line 977) | virtual const string& getInputValueClass() {
      method emit (line 988) | virtual void emit(const string& key, const string& value) {
      method getCounter (line 1003) | virtual Counter* getCounter(const std::string& group,
      method incrementCounter (line 1014) | virtual void incrementCounter(const Counter* counter, uint64_t amoun...
      method closeAll (line 1018) | void closeAll() {
    function runTask (line 1104) | bool runTask(const Factory& factory) {

FILE: examples/c++/SerialUtils.cc
  type HadoopUtils (line 29) | namespace HadoopUtils {
    function serializeInt (line 178) | void serializeInt(int32_t t, OutStream& stream) {
    function serializeLong (line 182) | void serializeLong(int64_t t, OutStream& stream)
    function deserializeInt (line 213) | int32_t deserializeInt(InStream& stream) {
    function deserializeLong (line 217) | int64_t deserializeLong(InStream& stream)
    function serializeFloat (line 246) | void serializeFloat(float t, OutStream& stream)
    function deserializeFloat (line 255) | float deserializeFloat(InStream& stream)
    function deserializeFloat (line 262) | void deserializeFloat(float& t, InStream& stream)
    function serializeString (line 271) | void serializeString(const std::string& t, OutStream& stream)
    function deserializeString (line 279) | void deserializeString(std::string& t, InStream& stream)

FILE: examples/c++/StringUtils.cc
  type HadoopUtils (line 32) | namespace HadoopUtils {
    function toString (line 34) | string toString(int32_t x) {
    function toInt (line 40) | int toInt(const string& val) {
    function toFloat (line 49) | float toFloat(const string& val) {
    function toBool (line 58) | bool toBool(const string& val) {
    function getCurrentMillis (line 72) | uint64_t getCurrentMillis() {
    function splitString (line 80) | vector<string> splitString(const std::string& str,
    function quoteString (line 97) | string quoteString(const string& str,
    function unquoteString (line 129) | string unquoteString(const string& str) {

FILE: examples/c++/include/hadoop/Pipes.hh
  class ReduceContext (line 141) | class ReduceContext: public TaskContext {
  class Closable (line 149) | class Closable {
    method close (line 151) | virtual void close() {}
  class Mapper (line 158) | class Mapper: public Closable {
  class Reducer (line 166) | class Reducer: public Closable {
  class Partitioner (line 174) | class Partitioner {
  class RecordReader (line 184) | class RecordReader: public Closable {
  class RecordWriter (line 198) | class RecordWriter: public Closable {
  class Factory (line 207) | class Factory {
    method createCombiner (line 216) | virtual Reducer* createCombiner(MapContext& context) const {
    method createPartitioner (line 225) | virtual Partitioner* createPartitioner(MapContext& context) const {
    method createRecordReader (line 234) | virtual RecordReader* createRecordReader(MapContext& context) const {
    method createRecordWriter (line 243) | virtual RecordWriter* createRecordWriter(ReduceContext& context) const {

FILE: examples/c++/include/hadoop/SerialUtils.hh
  type HadoopUtils (line 24) | namespace HadoopUtils {
    class Error (line 29) | class Error {
    class InStream (line 68) | class InStream {
    class OutStream (line 83) | class OutStream {
    class FileInStream (line 102) | class FileInStream : public InStream {
    class FileOutStream (line 125) | class FileOutStream: public OutStream {
    class StringInStream (line 151) | class StringInStream: public InStream {

FILE: examples/c++/include/hadoop/StringUtils.hh
  type HadoopUtils (line 25) | namespace HadoopUtils {

FILE: examples/c++/include/hadoop/TemplateFactory.hh
  type HadoopPipes (line 21) | namespace HadoopPipes {
    class TemplateFactory2 (line 24) | class TemplateFactory2: public Factory {
      method createMapper (line 26) | Mapper* createMapper(MapContext& context) const {
      method createReducer (line 29) | Reducer* createReducer(ReduceContext& context) const {
    class TemplateFactory3 (line 35) | class TemplateFactory3: public TemplateFactory2<mapper,reducer> {
      method createPartitioner (line 37) | Partitioner* createPartitioner(MapContext& context) const {
    class TemplateFactory3<mapper, reducer, void> (line 43) | class TemplateFactory3<mapper, reducer, void>
    class TemplateFactory4 (line 48) | class TemplateFactory4
      method createCombiner (line 51) | Reducer* createCombiner(MapContext& context) const {
    class TemplateFactory4<mapper,reducer,partitioner,void> (line 57) | class TemplateFactory4<mapper,reducer,partitioner,void>
    class TemplateFactory5 (line 63) | class TemplateFactory5
      method createRecordReader (line 66) | RecordReader* createRecordReader(MapContext& context) const {
    class TemplateFactory5<mapper,reducer,partitioner,combiner,void> (line 72) | class TemplateFactory5<mapper,reducer,partitioner,combiner,void>
    class TemplateFactory (line 79) | class TemplateFactory
      method createRecordWriter (line 82) | RecordWriter* createRecordWriter(ReduceContext& context) const {
    class TemplateFactory<mapper, reducer, partitioner, combiner, recordReader, void> (line 89) | class TemplateFactory<mapper, reducer, partitioner, combiner, recordRe...

FILE: examples/c++/wordcount.cc
  function deserializeLongWritable (line 31) | int64_t deserializeLongWritable(std::string s) {
  class Mapper (line 43) | class Mapper: public HadoopPipes::Mapper {
    method Mapper (line 46) | Mapper(HadoopPipes::TaskContext &context) { }
    method map (line 48) | void map(HadoopPipes::MapContext &context) {
  class Reducer (line 61) | class Reducer: public HadoopPipes::Reducer {
    method Reducer (line 64) | Reducer(HadoopPipes::TaskContext &context) { }
    method reduce (line 66) | void reduce(HadoopPipes::ReduceContext &context) {
  function main (line 76) | int main(int argc, char *argv[]) {

FILE: examples/hdfs/common.py
  function isdir (line 25) | def isdir(fs, d):

FILE: examples/hdfs/repl_session.py
  function clean (line 46) | def clean():

FILE: examples/hdfs/treegen.py
  function treegen (line 34) | def treegen(fs, root, depth, span):
  function main (line 52) | def main(argv):

FILE: examples/hdfs/treewalk.py
  function usage_by_bs (line 27) | def usage_by_bs(fs, root):

FILE: examples/input_format/check_results.py
  function get_res (line 28) | def get_res(output_dir):
  function check (line 33) | def check(measured_res, expected_res):
  function main (line 41) | def main(argv):

FILE: examples/input_format/it/crs4/pydoop/mapred/TextInputFormat.java
  class TextInputFormat (line 29) | public class TextInputFormat extends FileInputFormat<LongWritable, Text>
    method configure (line 34) | public void configure(JobConf conf) {
    method isSplitable (line 38) | protected boolean isSplitable(FileSystem fs, Path file) {
    method getRecordReader (line 42) | public RecordReader<LongWritable, Text> getRecordReader(

FILE: examples/input_format/it/crs4/pydoop/mapreduce/TextInputFormat.java
  class TextInputFormat (line 33) | public class TextInputFormat extends FileInputFormat<LongWritable, Text> {
    method createRecordReader (line 35) | @Override
    method isSplitable (line 41) | @Override

FILE: examples/pydoop_script/check.py
  function check_base_histogram (line 43) | def check_base_histogram(mr_out_dir):
  function check_caseswitch (line 58) | def check_caseswitch(mr_out_dir, switch="upper"):
  function check_grep (line 67) | def check_grep(mr_out_dir):
  function check_lowercase (line 79) | def check_lowercase(mr_out_dir):
  function check_transpose (line 83) | def check_transpose(mr_out_dir):
  function check_wordcount (line 104) | def check_wordcount(mr_out_dir, stop_words=None):
  function check_wordcount_sw (line 111) | def check_wordcount_sw(mr_out_dir):
  function make_parser (line 120) | def make_parser():
  function main (line 128) | def main(argv):

FILE: examples/pydoop_script/scripts/base_histogram.py
  function mapper (line 27) | def mapper(_, samrecord, writer):
  function reducer (line 34) | def reducer(key, ivalue, writer):

FILE: examples/pydoop_script/scripts/caseswitch.py
  function mapper (line 28) | def mapper(_, record, writer, conf):

FILE: examples/pydoop_script/scripts/grep.py
  function mapper (line 31) | def mapper(_, text, writer, conf):

FILE: examples/pydoop_script/scripts/lowercase.py
  function mapper (line 27) | def mapper(_, record, writer):

FILE: examples/pydoop_script/scripts/transpose.py
  function mapper (line 60) | def mapper(key, value, writer):
  function reducer (line 70) | def reducer(key, ivalue, writer):

FILE: examples/pydoop_script/scripts/wc_combiner.py
  function mapper (line 24) | def mapper(_, text, writer):
  function reducer (line 29) | def reducer(word, icounts, writer):
  function combiner (line 34) | def combiner(word, icounts, writer):

FILE: examples/pydoop_script/scripts/wordcount.py
  function mapper (line 25) | def mapper(_, text, writer):
  function reducer (line 30) | def reducer(word, icounts, writer):
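
These two signatures are the whole pydoop script contract: a module-level mapper(key, value, writer) and reducer(key, values_iterator, writer), run via the pydoop script subcommand. A self-contained sketch matching the signatures indexed above (the int() conversion assumes counts reach the reducer as strings under the default text protocol):

  # wordcount.py -- run with: pydoop script wordcount.py <input> <output>
  def mapper(_, text, writer):
      for word in text.split():
          writer.emit(word, 1)

  def reducer(word, icounts, writer):
      writer.emit(word, sum(map(int, icounts)))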

FILE: examples/pydoop_script/scripts/wordcount_sw.py
  function mapper (line 33) | def mapper(_, value, writer):
  function reducer (line 41) | def reducer(word, icounts, writer):

FILE: examples/pydoop_submit/check.py
  function check_wordcount_minimal (line 41) | def check_wordcount_minimal(mr_out_dir):
  function check_nosep (line 51) | def check_nosep(mr_out_dir):
  function check_map_only_python_writer (line 65) | def check_map_only_python_writer(mr_out_dir):

FILE: examples/pydoop_submit/mr/map_only_java_writer.py
  class Mapper (line 27) | class Mapper(api.Mapper):
    method __init__ (line 29) | def __init__(self, context):
    method map (line 32) | def map(self, context):
  function __main__ (line 36) | def __main__():

FILE: examples/pydoop_submit/mr/map_only_python_writer.py
  class Mapper (line 32) | class Mapper(api.Mapper):
    method __init__ (line 34) | def __init__(self, context):
    method map (line 37) | def map(self, context):
  class Writer (line 41) | class Writer(api.RecordWriter):
    method __init__ (line 43) | def __init__(self, context):
    method close (line 53) | def close(self):
    method emit (line 57) | def emit(self, key, value):
  function __main__ (line 61) | def __main__():

FILE: examples/pydoop_submit/mr/nosep.py
  class Mapper (line 25) | class Mapper(api.Mapper):
    method map (line 27) | def map(self, ctx):
  function __main__ (line 32) | def __main__():

FILE: examples/pydoop_submit/mr/wordcount_full.py
  class Mapper (line 34) | class Mapper(api.Mapper):
    method __init__ (line 36) | def __init__(self, context):
    method map (line 41) | def map(self, context):
  class Reducer (line 48) | class Reducer(api.Reducer):
    method __init__ (line 50) | def __init__(self, context):
    method reduce (line 55) | def reduce(self, context):
  class Reader (line 60) | class Reader(api.RecordReader):
    method __init__ (line 65) | def __init__(self, context):
    method close (line 81) | def close(self):
    method next (line 86) | def next(self):
    method get_progress (line 96) | def get_progress(self):
  class Writer (line 100) | class Writer(api.RecordWriter):
    method __init__ (line 102) | def __init__(self, context):
    method close (line 112) | def close(self):
    method emit (line 117) | def emit(self, key, value):
  class Partitioner (line 121) | class Partitioner(api.Partitioner):
    method __init__ (line 123) | def __init__(self, context):
    method partition (line 127) | def partition(self, key, n_reduces):
  function main (line 145) | def main():

FILE: examples/pydoop_submit/mr/wordcount_minimal.py
  class Mapper (line 30) | class Mapper(api.Mapper):
    method map (line 32) | def map(self, context):
  class Reducer (line 37) | class Reducer(api.Reducer):
    method reduce (line 39) | def reduce(self, context):
  function main (line 46) | def main():

FILE: examples/self_contained/check_results.py
  function compute_vc (line 31) | def compute_vc(input_dir):
  function get_res (line 41) | def get_res(output_dir):
  function check (line 45) | def check(measured_res, expected_res):
  function main (line 53) | def main(argv):

FILE: examples/self_contained/vowelcount/lib/__init__.py
  function is_vowel (line 23) | def is_vowel(c):

FILE: examples/self_contained/vowelcount/mr/main.py
  function main (line 24) | def main():

FILE: examples/self_contained/vowelcount/mr/mapper.py
  class Mapper (line 23) | class Mapper(api.Mapper):
    method map (line 25) | def map(self, context):

FILE: examples/self_contained/vowelcount/mr/reducer.py
  class Reducer (line 22) | class Reducer(api.Reducer):
    method reduce (line 24) | def reduce(self, context):

FILE: examples/sequence_file/bin/filter.py
  class FilterMapper (line 31) | class FilterMapper(Mapper):
    method __init__ (line 36) | def __init__(self, context):
    method map (line 41) | def map(self, context):
  function __main__ (line 48) | def __main__():

FILE: examples/sequence_file/bin/wordcount.py
  class WordCountMapper (line 27) | class WordCountMapper(Mapper):
    method map (line 29) | def map(self, context):
  class WordCountReducer (line 34) | class WordCountReducer(Reducer):
    method reduce (line 36) | def reduce(self, context):
  function __main__ (line 41) | def __main__():

FILE: examples/sequence_file/check.py
  function main (line 27) | def main(args):

FILE: int_test/mapred_submitter/check.py
  function get_lines (line 30) | def get_lines(dir_path):
  function check_output (line 41) | def check_output(items, exp_items):
  function check_counters (line 53) | def check_counters(counter, exp_counter):
  function word_count (line 57) | def word_count(lines):
  function check_map_only (line 61) | def check_map_only(in_dir, out_dir):
  function check_map_reduce (line 67) | def check_map_reduce(in_dir, out_dir):
  function check_pstats (line 74) | def check_pstats(pstats_dir):

FILE: int_test/mapred_submitter/genwords.py
  function genfile (line 38) | def genfile(path, size):

FILE: int_test/mapred_submitter/mr/map_only_java_writer.py
  class Mapper (line 25) | class Mapper(api.Mapper):
    method map (line 27) | def map(self, context):
  function __main__ (line 31) | def __main__():

FILE: int_test/mapred_submitter/mr/map_only_python_writer.py
  class Mapper (line 28) | class Mapper(api.Mapper):
    method map (line 30) | def map(self, context):
  class Writer (line 34) | class Writer(api.RecordWriter):
    method __init__ (line 36) | def __init__(self, context):
    method close (line 42) | def close(self):
    method emit (line 45) | def emit(self, key, value):
  function __main__ (line 49) | def __main__():

FILE: int_test/mapred_submitter/mr/map_reduce_combiner.py
  class Mapper (line 25) | class Mapper(api.Mapper):
    method map (line 27) | def map(self, context):
  class Reducer (line 32) | class Reducer(api.Reducer):
    method reduce (line 34) | def reduce(self, context):
  function __main__ (line 38) | def __main__():

FILE: int_test/mapred_submitter/mr/map_reduce_java_rw.py
  class Mapper (line 25) | class Mapper(api.Mapper):
    method map (line 27) | def map(self, context):
  class Reducer (line 32) | class Reducer(api.Reducer):
    method reduce (line 34) | def reduce(self, context):
  function __main__ (line 38) | def __main__():

FILE: int_test/mapred_submitter/mr/map_reduce_java_rw_pstats.py
  class Mapper (line 25) | class Mapper(api.Mapper):
    method map (line 27) | def map(self, context):
  class Reducer (line 32) | class Reducer(api.Reducer):
    method reduce (line 34) | def reduce(self, context):
  function __main__ (line 38) | def __main__():

FILE: int_test/mapred_submitter/mr/map_reduce_python_partitioner.py
  class Mapper (line 27) | class Mapper(api.Mapper):
    method map (line 29) | def map(self, context):
  class Reducer (line 34) | class Reducer(api.Reducer):
    method reduce (line 36) | def reduce(self, context):
  class Partitioner (line 40) | class Partitioner(api.Partitioner):
    method partition (line 42) | def partition(self, key, n_reduces):
  function __main__ (line 46) | def __main__():

FILE: int_test/mapred_submitter/mr/map_reduce_python_reader.py
  class Mapper (line 28) | class Mapper(api.Mapper):
    method map (line 30) | def map(self, context):
  class Reducer (line 35) | class Reducer(api.Reducer):
    method reduce (line 37) | def reduce(self, context):
  class Reader (line 41) | class Reader(api.RecordReader):
    method __init__ (line 43) | def __init__(self, context):
    method close (line 53) | def close(self):
    method next (line 56) | def next(self):
    method get_progress (line 66) | def get_progress(self):
  function __main__ (line 70) | def __main__():

FILE: int_test/mapred_submitter/mr/map_reduce_python_writer.py
  class Mapper (line 28) | class Mapper(api.Mapper):
    method map (line 30) | def map(self, context):
  class Reducer (line 35) | class Reducer(api.Reducer):
    method reduce (line 37) | def reduce(self, context):
  class Writer (line 41) | class Writer(api.RecordWriter):
    method __init__ (line 43) | def __init__(self, context):
    method close (line 49) | def close(self):
    method emit (line 52) | def emit(self, key, value):
  function __main__ (line 56) | def __main__():

FILE: int_test/mapred_submitter/mr/map_reduce_raw_io.py
  class Mapper (line 25) | class Mapper(api.Mapper):
    method map (line 29) | def map(self, context):
  class Reducer (line 36) | class Reducer(api.Reducer):
    method reduce (line 38) | def reduce(self, context):
  function __main__ (line 43) | def __main__():

FILE: int_test/mapred_submitter/mr/map_reduce_slow_java_rw.py
  class Mapper (line 28) | class Mapper(api.Mapper):
    method __init__ (line 30) | def __init__(self, context):
    method map (line 34) | def map(self, context):
    method close (line 40) | def close(self):
  class Reducer (line 44) | class Reducer(api.Reducer):
    method __init__ (line 46) | def __init__(self, context):
    method reduce (line 50) | def reduce(self, context):
    method close (line 55) | def close(self):
  function __main__ (line 59) | def __main__():

FILE: int_test/mapred_submitter/mr/map_reduce_slow_python_rw.py
  class Mapper (line 33) | class Mapper(api.Mapper):
    method __init__ (line 35) | def __init__(self, context):
    method map (line 39) | def map(self, context):
    method close (line 45) | def close(self):
  class Reducer (line 49) | class Reducer(api.Reducer):
    method __init__ (line 51) | def __init__(self, context):
    method reduce (line 55) | def reduce(self, context):
    method close (line 60) | def close(self):
  class Reader (line 64) | class Reader(api.RecordReader):
    method __init__ (line 66) | def __init__(self, context):
    method close (line 76) | def close(self):
    method next (line 79) | def next(self):
    method get_progress (line 89) | def get_progress(self):
  class Writer (line 93) | class Writer(api.RecordWriter):
    method __init__ (line 95) | def __init__(self, context):
    method close (line 101) | def close(self):
    method emit (line 104) | def emit(self, key, value):
  function __main__ (line 108) | def __main__():

FILE: int_test/opaque_split/check.py
  function check_output (line 29) | def check_output(mr_out_dir):

FILE: int_test/opaque_split/gen_splits.py
  function gen_ranges (line 32) | def gen_ranges():

FILE: int_test/opaque_split/mrapp.py
  class Reader (line 44) | class Reader(api.RecordReader):
    method __init__ (line 46) | def __init__(self, context):
    method next (line 53) | def next(self):
    method get_progress (line 57) | def get_progress(self):
  class Mapper (line 62) | class Mapper(api.Mapper):
    method map (line 64) | def map(self, context):
  function __main__ (line 68) | def __main__():

FILE: int_test/progress/mrapp.py
  class Mapper (line 28) | class Mapper(api.Mapper):
    method map (line 30) | def map(self, context):
  class Writer (line 36) | class Writer(api.RecordWriter):
    method __init__ (line 38) | def __init__(self, context):
    method close (line 46) | def close(self):
    method emit (line 49) | def emit(self, key, value):
  function __main__ (line 56) | def __main__():

FILE: pydoop/__init__.py
  function reset (line 62) | def reset():
  function hadoop_home (line 66) | def hadoop_home():
  function hadoop_conf (line 70) | def hadoop_conf():
  function hadoop_params (line 74) | def hadoop_params():
  function hadoop_classpath (line 78) | def hadoop_classpath():
  function package_dir (line 82) | def package_dir():
  function jar_name (line 92) | def jar_name(hadoop_vinfo=None):
  function jar_path (line 96) | def jar_path(hadoop_vinfo=None):
  function complete_mod_name (line 104) | def complete_mod_name(module, hadoop_vinfo=None):
  function import_version_specific_module (line 108) | def import_version_specific_module(name):
  class AddSectionWrapper (line 119) | class AddSectionWrapper(object):
    method __init__ (line 123) | def __init__(self, f):
    method __iter__ (line 127) | def __iter__(self):
    method __next__ (line 130) | def __next__(self):
    method readline (line 136) | def readline(self):
  function read_properties (line 146) | def read_properties(fname):
  class LocalModeNotSupported (line 159) | class LocalModeNotSupported(RuntimeError):
    method __init__ (line 160) | def __init__(self):
  function check_local_mode (line 165) | def check_local_mode():

FILE: pydoop/app/argparse_types.py
  function kv_pair (line 23) | def kv_pair(s):
  class UpdateMap (line 31) | class UpdateMap(argparse.Action):
    method __init__ (line 42) | def __init__(self, option_strings, dest, **kwargs):
    method __call__ (line 47) | def __call__(self, parser, namespace, values, option_string=None):
  function a_file_that_can_be_read (line 53) | def a_file_that_can_be_read(x):
  function a_hdfs_file (line 59) | def a_hdfs_file(x):
  function a_comma_separated_list (line 64) | def a_comma_separated_list(x):

FILE: pydoop/app/main.py
  class PatchedArgumentParser (line 39) | class PatchedArgumentParser(argparse.ArgumentParser):
    method _read_args_from_files (line 45) | def _read_args_from_files(self, arg_strings):
  function make_parser (line 54) | def make_parser():
  function main (line 72) | def main(argv=None):

FILE: pydoop/app/script.py
  class PydoopScript (line 45) | class PydoopScript(object):
    method __init__ (line 47) | def __init__(self, args, unknown_args):
    method generate_driver (line 53) | def generate_driver(mr_module, args):
    method convert_args (line 64) | def convert_args(self, args, unknown_args):
    method run (line 112) | def run(self):
    method clean (line 122) | def clean(self):
  function run (line 126) | def run(args, unknown_args=None):
  function add_parser_arguments (line 135) | def add_parser_arguments(parser):
  function add_parser (line 151) | def add_parser(subparsers):

FILE: pydoop/app/submit.py
  class PydoopSubmitter (line 54) | class PydoopSubmitter(object):
    method __init__ (line 60) | def __init__(self):
    method __cache_archive_link (line 80) | def __cache_archive_link(archive_name):
    method __set_files_to_cache_helper (line 84) | def __set_files_to_cache_helper(self, prop, upload_and_cache, cache):
    method __set_files_to_cache (line 103) | def __set_files_to_cache(self, args):
    method __set_archives_to_cache (line 110) | def __set_archives_to_cache(self, args):
    method _env_arg_to_dict (line 120) | def _env_arg_to_dict(set_env_list):
    method set_args (line 132) | def set_args(self, args, unknown_args=None):
    method __warn_user_if_wd_maybe_unreadable (line 163) | def __warn_user_if_wd_maybe_unreadable(self, abs_remote_path):
    method _generate_pipes_code (line 201) | def _generate_pipes_code(self):
    method __validate (line 272) | def __validate(self):
    method __clean_wd (line 282) | def __clean_wd(self):
    method __setup_remote_paths (line 292) | def __setup_remote_paths(self):
    method run (line 324) | def run(self):
    method fake_run_class (line 384) | def fake_run_class(self, *args, **kwargs):
  function run (line 391) | def run(args, unknown_args=None):
  function add_parser_common_arguments (line 400) | def add_parser_common_arguments(parser):
  function add_parser_arguments (line 481) | def add_parser_arguments(parser):
  function add_parser (line 550) | def add_parser(subparsers):

FILE: pydoop/avrolib.py
  class Deserializer (line 39) | class Deserializer(object):
    method __init__ (line 41) | def __init__(self, schema_str):
    method deserialize (line 45) | def deserialize(self, rec_bytes):
  class Serializer (line 49) | class Serializer(object):
    method __init__ (line 51) | def __init__(self, schema_str):
    method serialize (line 55) | def serialize(self, record):
  class SeekableDataFileReader (line 73) | class SeekableDataFileReader(DataFileReader):
    method align_after (line 77) | def align_after(self, offset):
  class AvroReader (line 102) | class AvroReader(RecordReader):
    method __init__ (line 108) | def __init__(self, ctx):
    method next (line 117) | def next(self):
    method get_progress (line 124) | def get_progress(self):
  class AvroWriter (line 135) | class AvroWriter(RecordWriter):
    method __init__ (line 139) | def __init__(self, context):
    method close (line 148) | def close(self):
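
Serializer and Deserializer wrap an Avro schema string and convert single records to and from bytes, per the signatures indexed above. A round-trip sketch (the schema and record are illustrative, not taken from the repo):

  from pydoop.avrolib import Serializer, Deserializer

  SCHEMA = ('{"type": "record", "name": "User", '
            '"fields": [{"name": "name", "type": "string"}]}')
  rec_bytes = Serializer(SCHEMA).serialize({"name": "gina"})  # record -> bytes
  record = Deserializer(SCHEMA).deserialize(rec_bytes)        # bytes -> record
  assert record == {"name": "gina"}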

FILE: pydoop/hadoop_utils.py
  class HadoopXMLError (line 35) | class HadoopXMLError(Exception):
  function extract_text (line 39) | def extract_text(node):
  function parse_hadoop_conf_file (line 45) | def parse_hadoop_conf_file(fn):
  class PathFinder (line 67) | class PathFinder(object):
    method __init__ (line 71) | def __init__(self):
    method reset (line 78) | def reset(self):
    method hadoop_home (line 82) | def hadoop_home(self):
    method hadoop_conf (line 95) | def hadoop_conf(self):
    method hadoop_params (line 110) | def hadoop_params(self):
    method hadoop_classpath (line 123) | def hadoop_classpath(self):
    method __get_is_local (line 134) | def __get_is_local(self):
    method is_local (line 144) | def is_local(self):

FILE: pydoop/hadut.py
  function _pop_generic_args (line 43) | def _pop_generic_args(args):
  function _merge_csv_args (line 59) | def _merge_csv_args(args):
  function _construct_property_args (line 79) | def _construct_property_args(prop_dict):
  class RunCmdError (line 84) | class RunCmdError(RuntimeError):
    method __init__ (line 89) | def __init__(self, returncode, cmd, output=None):
    method __str__ (line 94) | def __str__(self):
  function run_tool_cmd (line 105) | def run_tool_cmd(tool, cmd, args=None, properties=None, hadoop_conf_dir=...
  function run_cmd (line 155) | def run_cmd(cmd, args=None, properties=None, hadoop_home=None,
  function run_class (line 167) | def run_class(class_name, args=None, properties=None, classpath=None,
  function iter_mr_out_files (line 209) | def iter_mr_out_files(mr_out_dir):
  function collect_output (line 215) | def collect_output(mr_out_dir, out_file=None):
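
hadut shells out to the Hadoop command line: run_cmd and run_class build and execute `hadoop ...` invocations, raising RunCmdError on a non-zero exit code. A hedged sketch, assuming (like run_cmd) that the captured output is returned; the class name, arguments and property are illustrative:

  import pydoop.hadut as hadut

  # Roughly equivalent to: hadoop org.apache.hadoop.fs.FsShell -ls /user
  out = hadut.run_class(
      "org.apache.hadoop.fs.FsShell",
      args=["-ls", "/user"],
      properties={"mapreduce.framework.name": "local"},
  )
  print(out)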

FILE: pydoop/hdfs/__init__.py
  function init (line 83) | def init():
  function reset (line 92) | def reset():
  function open (line 101) | def open(hdfs_path, mode="r", buff_size=0, replication=0, blocksize=0,
  function dump (line 116) | def dump(data, hdfs_path, **kwargs):
  function load (line 133) | def load(hdfs_path, **kwargs):
  function _cp_file (line 149) | def _cp_file(src_fs, src_path, dest_fs, dest_path, **kwargs):
  function cp (line 164) | def cp(src_hdfs_path, dest_hdfs_path, **kwargs):
  function put (line 220) | def put(src_path, dest_hdfs_path, **kwargs):
  function get (line 232) | def get(src_hdfs_path, dest_path, **kwargs):
  function mkdir (line 244) | def mkdir(hdfs_path, user=None):
  function rm (line 255) | def rm(hdfs_path, recursive=True, user=None):
  function rmr (line 270) | def rmr(hdfs_path, user=None):
  function lsl (line 274) | def lsl(hdfs_path, user=None, recursive=False):
  function ls (line 300) | def ls(hdfs_path, user=None, recursive=False):
  function chmod (line 311) | def chmod(hdfs_path, mode, user=None):
  function move (line 327) | def move(src, dest, user=None):
  function chown (line 343) | def chown(hdfs_path, user=None, group=None, hdfs_user=None):
  function rename (line 354) | def rename(from_path, to_path, user=None):
  function renames (line 368) | def renames(from_path, to_path, user=None):
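
Together these functions form a shell-like, whole-file API over HDFS (and local paths). A short session, assuming a reachable filesystem and that load/open accept the usual text modes:

  import pydoop.hdfs as hdfs

  hdfs.mkdir("demo_dir")
  hdfs.dump("hello, pydoop\n", "demo_dir/greeting.txt")  # write a whole file
  text = hdfs.load("demo_dir/greeting.txt", mode="rt")   # read it back
  with hdfs.open("demo_dir/greeting.txt", "rt") as f:    # or stream it
      for line in f:
          print(line.rstrip())
  hdfs.rmr("demo_dir")                                   # recursive remove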

FILE: pydoop/hdfs/common.py
  function parse_mode (line 45) | def parse_mode(mode):
  function encode_path (line 60) | def encode_path(path):
  function decode_path (line 63) | def decode_path(path):
  function encode_host (line 66) | def encode_host(host):
  function decode_host (line 69) | def decode_host(host):
  function encode_path (line 72) | def encode_path(path):
  function decode_path (line 77) | def decode_path(path):
  function encode_host (line 82) | def encode_host(host):
  function decode_host (line 87) | def decode_host(host):
  function get_groups (line 93) | def get_groups(user=DEFAULT_USER):

FILE: pydoop/hdfs/core/__init__.py
  function init (line 26) | def init():
  function core_hdfs_fs (line 38) | def core_hdfs_fs(host, port, user):

FILE: pydoop/hdfs/file.py
  function _complain_ifclosed (line 31) | def _complain_ifclosed(closed):
  class FileIO (line 36) | class FileIO(object):
    method __init__ (line 47) | def __init__(self, raw_hdfs_file, fs, mode, encoding=None, errors=None):
    method __enter__ (line 76) | def __enter__(self):
    method __exit__ (line 79) | def __exit__(self, exc_type, exc_value, traceback):
    method fs (line 83) | def fs(self):
    method name (line 90) | def name(self):
    method size (line 97) | def size(self):
    method writable (line 104) | def writable(self):
    method readline (line 107) | def readline(self):
    method next (line 122) | def next(self):
    method __next__ (line 129) | def __next__(self):
    method __iter__ (line 140) | def __iter__(self):
    method available (line 143) | def available(self):
    method close (line 154) | def close(self):
    method pread (line 165) | def pread(self, position, length):
    method read (line 188) | def read(self, length=-1):
    method seek (line 218) | def seek(self, position, whence=os.SEEK_SET):
    method tell (line 232) | def tell(self):
    method write (line 242) | def write(self, data):
    method flush (line 258) | def flush(self):
  class hdfs_file (line 266) | class hdfs_file(FileIO):
    method pread_chunk (line 268) | def pread_chunk(self, position, chunk):
    method read_chunk (line 286) | def read_chunk(self, chunk):
  class local_file (line 301) | class local_file(io.FileIO):
    method __init__ (line 309) | def __init__(self, fs, name, mode):
    method __make_parents (line 320) | def __make_parents(fs, name):
    method fs (line 329) | def fs(self):
    method size (line 333) | def size(self):
    method available (line 336) | def available(self):
    method close (line 340) | def close(self):
    method seek (line 347) | def seek(self, position, whence=os.SEEK_SET):
    method __seek_and_read (line 352) | def __seek_and_read(self, position, length=None, buf=None):
    method pread (line 366) | def pread(self, position, length):
    method pread_chunk (line 369) | def pread_chunk(self, position, chunk):
    method read_chunk (line 372) | def read_chunk(self, chunk):
  class TextIOWrapper (line 377) | class TextIOWrapper(io.TextIOWrapper):
    method __getattr__ (line 379) | def __getattr__(self, name):
    method pread (line 390) | def pread(self, position, length):

FILE: pydoop/hdfs/fs.py
  class _FSStatus (line 45) | class _FSStatus(object):
    method __init__ (line 47) | def __init__(self, fs, host, port, user, refcount=1):
    method __repr__ (line 54) | def __repr__(self):
  function _complain_ifclosed (line 58) | def _complain_ifclosed(closed):
  function _get_ip (line 63) | def _get_ip(host, default=None):
  function _get_connection_info (line 71) | def _get_connection_info(host, port, user):
  function _default_fs (line 92) | def _default_fs():
  function default_is_local (line 98) | def default_is_local():
  class hdfs (line 108) | class hdfs(object):
    method __canonize_hpu (line 134) | def __canonize_hpu(self, hpu):
    method __lookup (line 141) | def __lookup(self, hpu):
    method __eq__ (line 146) | def __eq__(self, other):
    method __init__ (line 153) | def __init__(self, host="default", port=0, user=None, groups=None):
    method __enter__ (line 187) | def __enter__(self):
    method __exit__ (line 190) | def __exit__(self, exc_type, exc_value, traceback):
    method fs (line 194) | def fs(self):
    method refcount (line 198) | def refcount(self):
    method host (line 202) | def host(self):
    method port (line 209) | def port(self):
    method user (line 216) | def user(self):
    method close (line 222) | def close(self):
    method closed (line 234) | def closed(self):
    method open_file (line 237) | def open_file(self, path,
    method capacity (line 284) | def capacity(self):
    method copy (line 296) | def copy(self, from_path, to_hdfs, to_path):
    method create_directory (line 313) | def create_directory(self, path):
    method default_block_size (line 325) | def default_block_size(self):
    method delete (line 335) | def delete(self, path, recursive=True):
    method exists (line 350) | def exists(self, path):
    method get_hosts (line 362) | def get_hosts(self, path, start, length):
    method get_path_info (line 380) | def get_path_info(self, path):
    method list_directory (line 407) | def list_directory(self, path):
    method move (line 420) | def move(self, from_path, to_hdfs, to_path):
    method rename (line 437) | def rename(self, from_path, to_path):
    method set_replication (line 450) | def set_replication(self, path, replication):
    method set_working_directory (line 463) | def set_working_directory(self, path):
    method used (line 475) | def used(self):
    method working_directory (line 485) | def working_directory(self):
    method chown (line 496) | def chown(self, path, user='', group=''):
    method __get_umask (line 512) | def __get_umask():
    method __compute_mode_from_string (line 517) | def __compute_mode_from_string(self, path, mode_string):
    method chmod (line 580) | def chmod(self, path, mode):
    method utime (line 597) | def utime(self, path, mtime, atime):
    method walk (line 612) | def walk(self, top):
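
The hdfs class is the connection-level API underneath the module-level helpers; instances are cached and refcounted per (host, port, user) via _FSStatus. A minimal sketch (the dict keys in list_directory entries are assumed from get_path_info, not confirmed by this index):

  from pydoop.hdfs.fs import hdfs

  with hdfs(host="default", port=0) as fs:  # default namenode from Hadoop conf
      print(fs.working_directory())
      for info in fs.list_directory("/"):
          print(info["name"], info["kind"], info["size"])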

FILE: pydoop/hdfs/path.py
  class StatResult (line 35) | class StatResult(object):
    method __init__ (line 49) | def __init__(self, path_info):
    method __repr__ (line 72) | def __repr__(self):
  class _HdfsPathSplitter (line 81) | class _HdfsPathSplitter(object):
    method raise_bad_path (line 86) | def raise_bad_path(cls, hdfs_path, why=None):
    method parse (line 92) | def parse(cls, hdfs_path):
    method unparse (line 118) | def unparse(cls, scheme, netloc, path):
    method split_netloc (line 132) | def split_netloc(cls, netloc):
    method split (line 150) | def split(cls, hdfs_path, user):
  function parse (line 173) | def parse(hdfs_path):
  function unparse (line 185) | def unparse(scheme, netloc, path):
  function split (line 192) | def split(hdfs_path, user=None):
  function join (line 208) | def join(*parts):
  function abspath (line 242) | def abspath(hdfs_path, user=None, local=False):
  function splitpath (line 281) | def splitpath(hdfs_path):
  function basename (line 289) | def basename(hdfs_path):
  function dirname (line 296) | def dirname(hdfs_path):
  function exists (line 304) | def exists(hdfs_path, user=None):
  function lstat (line 316) | def lstat(hdfs_path, user=None):
  function lexists (line 320) | def lexists(hdfs_path, user=None):
  function kind (line 325) | def kind(path, user=None):
  function isdir (line 341) | def isdir(path, user=None):
  function isfile (line 348) | def isfile(path, user=None):
  function expanduser (line 355) | def expanduser(path):
  function expandvars (line 371) | def expandvars(path):
  function _update_stat (line 378) | def _update_stat(st, path_):
  function stat (line 389) | def stat(path, user=None):
  function getatime (line 403) | def getatime(path, user=None):
  function getmtime (line 410) | def getmtime(path, user=None):
  function getctime (line 417) | def getctime(path, user=None):
  function getsize (line 424) | def getsize(path, user=None):
  function isfull (line 431) | def isfull(path):
  function isabs (line 441) | def isabs(path):
  function islink (line 452) | def islink(path, user=None):
  function ismount (line 464) | def ismount(path):
  function normcase (line 476) | def normcase(path):
  function normpath (line 480) | def normpath(path):
  function realpath (line 488) | def realpath(path):
  function samefile (line 500) | def samefile(path1, path2, user=None):
  function splitdrive (line 509) | def splitdrive(path):
  function splitext (line 513) | def splitext(path):
  function access (line 520) | def access(path, mode, user=None):
  function utime (line 544) | def utime(hdfs_path, times=None, user=None):
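
pydoop.hdfs.path deliberately mirrors os.path over full hdfs:// URIs, so most entries above behave like their standard-library namesakes. For instance (host and paths illustrative):

  import pydoop.hdfs.path as hpath

  p = hpath.join("hdfs://nn:8020/user/u", "data", "part-r-00000")
  print(hpath.dirname(p))     # hdfs://nn:8020/user/u/data
  print(hpath.basename(p))    # part-r-00000
  print(hpath.splitext(p)[1]) # '' (no extension)
  print(hpath.isdir("hdfs://nn:8020/user/u"))  # needs a live HDFS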

FILE: pydoop/jc.py
  function jc_wrapper (line 24) | def jc_wrapper(obj):

FILE: pydoop/mapreduce/api.py
  class JobConf (line 35) | class JobConf(dict):
    method get_int (line 48) | def get_int(self, key, default=None):
    method get_float (line 55) | def get_float(self, key, default=None):
    method get_bool (line 62) | def get_bool(self, key, default=None):
    method get_json (line 83) | def get_json(self, key, default=None):
  class InputSplit (line 88) | class InputSplit(object):
  class FileSplit (line 98) | class FileSplit(InputSplit,
  class OpaqueSplit (line 106) | class OpaqueSplit(InputSplit, namedtuple("OpaqueSplit", "payload")):
  class Context (line 124) | class Context(ABC):
    method input_split (line 139) | def input_split(self):
    method get_input_split (line 152) | def get_input_split(self, raw=False):
    method job_conf (line 156) | def job_conf(self):
    method get_job_conf (line 163) | def get_job_conf(self):
    method key (line 167) | def key(self):
    method get_input_key (line 174) | def get_input_key(self):
    method value (line 178) | def value(self):
    method get_input_value (line 185) | def get_input_value(self):
    method values (line 189) | def values(self):
    method get_input_values (line 196) | def get_input_values(self):
    method emit (line 200) | def emit(self, key, value):
    method progress (line 207) | def progress(self):
    method set_status (line 211) | def set_status(self, status):
    method get_counter (line 221) | def get_counter(self, group, name):
    method increment_counter (line 235) | def increment_counter(self, counter, amount):
  class Closable (line 242) | class Closable(object):
    method close (line 244) | def close(self):
  class Component (line 253) | class Component(ABC):
    method __init__ (line 255) | def __init__(self, context):
  class Mapper (line 259) | class Mapper(Component, Closable):
    method map (line 265) | def map(self, context):
  class Reducer (line 279) | class Reducer(Component, Closable):
    method reduce (line 286) | def reduce(self, context):
  class Combiner (line 299) | class Combiner(Reducer):
  class Partitioner (line 315) | class Partitioner(Component):
    method partition (line 326) | def partition(self, key, num_of_reduces):
  class RecordReader (line 342) | class RecordReader(Component, Closable):
    method __iter__ (line 347) | def __iter__(self):
    method next (line 351) | def next(self):
    method __next__ (line 364) | def __next__(self):
    method get_progress (line 368) | def get_progress(self):
  class RecordWriter (line 379) | class RecordWriter(Component, Closable):
    method emit (line 385) | def emit(self, key, value):
  class Factory (line 397) | class Factory(ABC):
    method create_mapper (line 414) | def create_mapper(self, context):
    method create_reducer (line 417) | def create_reducer(self, context):
    method create_combiner (line 420) | def create_combiner(self, context):
    method create_partitioner (line 428) | def create_partitioner(self, context):
    method create_record_reader (line 437) | def create_record_reader(self, context):
    method create_record_writer (line 446) | def create_record_writer(self, context):
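
api defines the abstract MapReduce components; an application subclasses Mapper and Reducer and wires them into an entry point. A minimal wordcount in the style of the examples indexed above (run_task and the concrete Factory come from pydoop.mapreduce.pipes and do not appear in the index entries shown, so treat those names as assumptions):

  import pydoop.mapreduce.api as api
  import pydoop.mapreduce.pipes as pipes

  class Mapper(api.Mapper):
      def map(self, context):
          for word in context.value.split():
              context.emit(word, 1)

  class Reducer(api.Reducer):
      def reduce(self, context):
          context.emit(context.key, sum(context.values))

  def __main__():  # entry point picked up by the generated pipes driver
      pipes.run_task(pipes.Factory(Mapper, Reducer))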

FILE: pydoop/mapreduce/binary_protocol.py
  function get_password (line 85) | def get_password():
  function _get_LongWritable (line 97) | def _get_LongWritable(downlink):
  function _get_Text (line 102) | def _get_Text(downlink):
  function _get_avro_key (line 112) | def _get_avro_key(downlink):
  function _get_avro_value (line 117) | def _get_avro_value(downlink):
  function _get_pickled (line 122) | def _get_pickled(downlink):
  class Downlink (line 126) | class Downlink(object):
    method __init__ (line 183) | def __init__(self, istream, context, **kwargs):
    method close (line 193) | def close(self):
    method read_job_conf (line 196) | def read_job_conf(self):
    method verify_digest (line 203) | def verify_digest(self, digest, challenge):
    method setup_record_writer (line 209) | def setup_record_writer(self, piped_output):
    method get_k (line 216) | def get_k(self):
    method get_v (line 219) | def get_v(self):
    method setup_avro_deser (line 222) | def setup_avro_deser(self):
    method setup_deser (line 240) | def setup_deser(self, key_type, value_type):
    method __next__ (line 250) | def __next__(self):
    method __iter__ (line 343) | def __iter__(self):
    method next (line 347) | def next(self):
  class Uplink (line 351) | class Uplink(object):
    method __init__ (line 356) | def __init__(self, stream):
    method flush (line 359) | def flush(self):
    method close (line 362) | def close(self):
    method authenticate (line 367) | def authenticate(self, response_digest):
    method output (line 370) | def output(self, k, v):
    method partitioned_output (line 373) | def partitioned_output(self, part, k, v):
    method status (line 376) | def status(self, msg):
    method progress (line 379) | def progress(self, p):
    method done (line 382) | def done(self):
    method register_counter (line 385) | def register_counter(self, id, group, name):
    method increment_counter (line 388) | def increment_counter(self, id, amount):
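
  Example (hedged sketch): driving the Uplink half of the protocol with the
  methods listed above. How `uplink` is obtained, and the counter id scheme,
  are hypothetical here; in the real flow the pipes TaskContext issues these
  calls.

    def emit_all(uplink, pairs):
        # register a counter once (the id is chosen by the caller), then
        # stream key/value pairs upward with periodic progress updates
        uplink.register_counter(0, "WORDCOUNT", "OUTPUT_PAIRS")
        n = len(pairs)
        for i, (k, v) in enumerate(pairs):
            uplink.output(k, v)
            uplink.increment_counter(0, 1)
            uplink.progress(float(i + 1) / n)
        uplink.status("emitted %d pairs" % n)
        uplink.flush()
        uplink.done()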

FILE: pydoop/mapreduce/connections.py
  class Connection (line 38) | class Connection(object):
    method __init__ (line 56) | def __init__(self, context, istream, ostream, **kwargs):
    method close (line 60) | def close(self):
    method __enter__ (line 64) | def __enter__(self):
    method __exit__ (line 67) | def __exit__(self, *args):
  class NetworkConnection (line 71) | class NetworkConnection(Connection):
    method __init__ (line 73) | def __init__(self, context, host, port, **kwargs):
    method close (line 82) | def close(self):
  class FileConnection (line 87) | class FileConnection(Connection):
    method __init__ (line 89) | def __init__(self, context, in_fn, out_fn, **kwargs):
  function get_connection (line 97) | def get_connection(context, **kwargs):
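
  Example (hedged sketch): Connection defines __enter__/__exit__, so whichever
  concrete class get_connection returns (NetworkConnection or FileConnection,
  presumably chosen from the runtime environment and/or kwargs) can be used
  as a context manager; `context` stands in for a pipes TaskContext.

    from pydoop.mapreduce.connections import get_connection

    def with_connection(context, **kwargs):
        # stream/socket cleanup is delegated to Connection.__exit__
        with get_connection(context, **kwargs) as conn:
            return type(conn).__name__  # e.g. "NetworkConnection"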

FILE: pydoop/mapreduce/pipes.py
  function create_digest (line 60) | def create_digest(key, msg):
  function read_int_writable (line 67) | def read_int_writable(f):
  function write_int_writable (line 72) | def write_int_writable(n, f):
  function read_bytes_writable (line 76) | def read_bytes_writable(f):
  function write_bytes_writable (line 84) | def write_bytes_writable(s, f):
  class FileSplit (line 90) | class FileSplit(api.FileSplit):
    method frombuffer (line 93) | def frombuffer(cls, buf):
  class OpaqueSplit (line 98) | class OpaqueSplit(api.OpaqueSplit):
    method frombuffer (line 101) | def frombuffer(cls, buf):
    method read (line 105) | def read(cls, f):
    method write (line 108) | def write(self, f):
  function write_opaque_splits (line 112) | def write_opaque_splits(splits, f):
  function read_opaque_splits (line 118) | def read_opaque_splits(f):
  class TaskContext (line 123) | class TaskContext(api.Context):
    method __init__ (line 129) | def __init__(self, factory, **kwargs):
    method get_input_split (line 159) | def get_input_split(self, raw=False):
    method get_job_conf (line 169) | def get_job_conf(self):
    method get_input_key (line 172) | def get_input_key(self):
    method get_input_value (line 175) | def get_input_value(self):
    method get_input_values (line 178) | def get_input_values(self):
    method create_combiner (line 181) | def create_combiner(self):
    method create_mapper (line 190) | def create_mapper(self):
    method create_partitioner (line 194) | def create_partitioner(self):
    method create_record_reader (line 198) | def create_record_reader(self):
    method create_record_writer (line 202) | def create_record_writer(self):
    method create_reducer (line 206) | def create_reducer(self):
    method progress (line 210) | def progress(self):
    method set_status (line 227) | def set_status(self, status):
    method get_counter (line 231) | def get_counter(self, group, name):
    method increment_counter (line 238) | def increment_counter(self, counter, amount):
    method __spill_counters (line 244) | def __spill_counters(self):
    method _authenticate (line 250) | def _authenticate(self, password, digest, challenge):
    method _setup_avro_ser (line 257) | def _setup_avro_ser(self):
    method __maybe_serialize (line 273) | def __maybe_serialize(self, key, value):
    method emit (line 286) | def emit(self, key, value):
    method __actual_emit (line 309) | def __actual_emit(self, key, value):
    method __spill_all (line 320) | def __spill_all(self):
    method close (line 330) | def close(self):
    method get_output_dir (line 352) | def get_output_dir(self):
    method get_work_path (line 355) | def get_work_path(self):
    method get_task_partition (line 361) | def get_task_partition(self):
    method get_default_work_file (line 364) | def get_default_work_file(self, extension=""):
  class Factory (line 374) | class Factory(api.Factory):
    method __init__ (line 376) | def __init__(self, mapper_class,
    method create_mapper (line 389) | def create_mapper(self, context):
    method create_reducer (line 392) | def create_reducer(self, context):
    method create_combiner (line 395) | def create_combiner(self, context):
    method create_partitioner (line 398) | def create_partitioner(self, context):
    method create_record_reader (line 401) | def create_record_reader(self, context):
    method create_record_writer (line 404) | def create_record_writer(self, context):
  function _run (line 408) | def _run(context, **kwargs):
  function run_task (line 414) | def run_task(factory, **kwargs):
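
  Example: the minimal word-count program from the Pydoop docs, wired through
  Factory and run_task; the reducer_class keyword is assumed from the
  truncated Factory signature above. `pydoop submit` ships a script like this
  to the cluster, and each task process invokes its __main__.

    import pydoop.mapreduce.api as api
    import pydoop.mapreduce.pipes as pipes

    class Mapper(api.Mapper):

        def map(self, context):
            for word in context.value.split():
                context.emit(word, 1)

    class Reducer(api.Reducer):

        def reduce(self, context):
            context.emit(context.key, sum(context.values))

    def __main__():
        # run_task connects to the Java side and drives the protocol loop
        pipes.run_task(pipes.Factory(Mapper, reducer_class=Reducer))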

FILE: pydoop/test_support.py
  function __inject_pos (line 33) | def __inject_pos(code, start=0):
  function inject_code (line 45) | def inject_code(new_code, target_code):
  function add_sys_path (line 59) | def add_sys_path(target_code):
  function set_python_cmd (line 67) | def set_python_cmd(code, python_cmd=sys.executable):
  function adapt_script (line 77) | def adapt_script(code, python_cmd=sys.executable):
  function parse_mr_output (line 81) | def parse_mr_output(output, vtype=str):
  function compare_counts (line 97) | def compare_counts(c1, c2):
  class LocalWordCount (line 111) | class LocalWordCount(object):
    method __init__ (line 113) | def __init__(self, input_path, min_occurrence=0, stop_words=None):
    method expected_output (line 120) | def expected_output(self):
    method run (line 125) | def run(self):
    method _wordcount_file (line 138) | def _wordcount_file(self, wc, fn, path=None):
    method check (line 145) | def check(self, output):
  function get_wd_prefix (line 155) | def get_wd_prefix(base="pydoop_"):
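
  Example (hedged sketch): checking word-count output with the helpers above.
  The tab-separated output format and the check() return value are
  assumptions; "input_dir" is a placeholder path.

    import pydoop.test_support as pts

    lwc = pts.LocalWordCount("input_dir", min_occurrence=1)
    # mr_output would normally be collected from the job's part-* files
    mr_output = "hello\t2\nworld\t1\n"
    counts = pts.parse_mr_output(mr_output, vtype=int)
    print(lwc.check(mr_output))  # compare against locally computed counts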

FILE: pydoop/test_utils.py
  function _get_special_chr (line 49) | def _get_special_chr():
  class FSTree (line 83) | class FSTree(object):
    method __init__ (line 99) | def __init__(self, name, kind=1):
    method add (line 106) | def add(self, name, kind=1):
    method walk (line 111) | def walk(self):
  function make_wd (line 119) | def make_wd(fs, prefix="pydoop_test_"):
  function make_random_data (line 128) | def make_random_data(size=_RANDOM_DATA_SIZE, printable=True):
  function get_bytes_per_checksum (line 134) | def get_bytes_per_checksum():
  function silent_call (line 141) | def silent_call(func, *args, **kwargs):
  function get_module (line 155) | def get_module(name, path=None):
  function compile_java (line 165) | def compile_java(java_file, classpath, opts=None):
  function run_java (line 184) | def run_java(jclass, classpath, args, wd):
  function get_java_output_stream (line 192) | def get_java_output_stream(jclass, classpath, args, wd):
  class WDTestCase (line 199) | class WDTestCase(unittest.TestCase):
    method setUp (line 201) | def setUp(self):
    method tearDown (line 204) | def tearDown(self):
    method _mkfn (line 207) | def _mkfn(self, basename):
    method _mkf (line 210) | def _mkf(self, basename, mode='w'):
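
  Example (hedged sketch): WDTestCase creates a per-test working directory in
  setUp and removes it in tearDown; _mkf is assumed to return an open file
  object living inside that directory.

    import unittest
    import pydoop.test_utils as utils

    class RoundTripTC(utils.WDTestCase):

        def test_write(self):
            data = utils.make_random_data(size=128)  # printable by default
            with self._mkf("payload.txt", mode="w") as f:
                f.write(data)

    if __name__ == "__main__":
        unittest.main()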

FILE: pydoop/utils/jvm.py
  function get_java_home (line 37) | def get_java_home():
  function load_jvm_lib (line 74) | def load_jvm_lib(java_home=None):
  function get_include_dirs (line 85) | def get_include_dirs():
  function get_libraries (line 101) | def get_libraries():
  function get_macros (line 114) | def get_macros():
  function get_jvm_lib_path_and_name (line 125) | def get_jvm_lib_path_and_name(java_home=None):
  function check_jni_header (line 139) | def check_jni_header(include_dirs=None):
  function find_file (line 150) | def find_file(path, to_find):
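
  Example: introspecting the local JVM setup with the functions above (this
  is what setup.py relies on to locate the JNI headers and libjvm).

    from pydoop.utils import jvm

    java_home = jvm.get_java_home()  # presumably JAVA_HOME or autodetection
    print(java_home)
    print(jvm.get_include_dirs())    # where jni.h and friends should live
    print(jvm.get_jvm_lib_path_and_name(java_home))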

FILE: pydoop/utils/misc.py
  class NullHandler (line 31) | class NullHandler(logging.Handler):
    method emit (line 32) | def emit(self, record):
  class NullLogger (line 36) | class NullLogger(logging.Logger):
    method __init__ (line 37) | def __init__(self):
  function make_random_str (line 43) | def make_random_str(prefix="pydoop_", postfix=''):
  class Timer (line 47) | class Timer(object):
    method __init__ (line 49) | def __init__(self, ctx, counter_group=None):
    method _gen_counter_name (line 55) | def _gen_counter_name(self, event):
    method _get_time_counter (line 58) | def _get_time_counter(self, name):
    method start (line 66) | def start(self, s):
    method stop (line 69) | def stop(self, s):
    method time_block (line 73) | def time_block(self, event_name):
    class TimingBlock (line 76) | class TimingBlock(object):
      method __init__ (line 78) | def __init__(self, timer, event_name):
      method __enter__ (line 82) | def __enter__(self):
      method __exit__ (line 86) | def __exit__(self, exception_type, exception_val, exception_tb):
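
  Example (hedged sketch): Timer appears to charge elapsed time for named
  events to Hadoop counters through the task context (inferred from
  _get_time_counter above); time_block returns the TimingBlock context
  manager, so start/stop bracket the with-body.

    from pydoop.utils.misc import Timer

    def timed_map(context, records):
        timer = Timer(context, counter_group="TIMINGS")
        with timer.time_block("map_call"):
            for _ in records:
                pass  # per-record work goes here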

FILE: pydoop/utils/py3compat.py
  class Py2ABC (line 43) | class Py2ABC(object):
  function __identity (line 47) | def __identity(x):
  function __chr (line 51) | def __chr(x):
  function __iteritems_2 (line 55) | def __iteritems_2(x):
  function __iteritems_3 (line 59) | def __iteritems_3(x):
  function __parser_read_2 (line 63) | def __parser_read_2(parser, f):
  function __parser_read_3 (line 67) | def __parser_read_3(parser, f):

FILE: setup.py
  function rm_rf (line 90) | def rm_rf(path, dry_run=False):
  function mtime (line 108) | def mtime(fn):
  function must_generate (line 112) | def must_generate(target, prerequisites):
  function get_version_string (line 119) | def get_version_string():
  function write_config (line 127) | def write_config(filename="pydoop/config.py"):
  function write_version (line 137) | def write_version(filename="pydoop/version.py"):
  class JavaLib (line 179) | class JavaLib(object):
    method __init__ (line 181) | def __init__(self):
  class JavaBuilder (line 194) | class JavaBuilder(object):
    method __init__ (line 196) | def __init__(self, build_temp, build_lib):
    method run (line 201) | def run(self):
    method __build_java_lib (line 205) | def __build_java_lib(self, jlib):
  class BuildPydoopExt (line 251) | class BuildPydoopExt(build_ext):
    method __have_better_tls (line 253) | def __have_better_tls(self):
    method __finalize_hdfs (line 269) | def __finalize_hdfs(self, ext):
    method build_extension (line 292) | def build_extension(self, ext):
  class BuildPydoop (line 298) | class BuildPydoop(build):
    method build_java (line 300) | def build_java(self):
    method create_tmp (line 304) | def create_tmp(self):
    method clean_up (line 310) | def clean_up(self):
    method run (line 313) | def run(self):

FILE: src/it/crs4/pydoop/NoSeparatorTextOutputFormat.java
  class NoSeparatorTextOutputFormat (line 41) | public class NoSeparatorTextOutputFormat extends TextOutputFormat<Text, ...
    method getRecordWriter (line 43) | public RecordWriter<Text, Text>

FILE: src/it/crs4/pydoop/mapreduce/pipes/Application.java
  class Application (line 74) | class Application<K1 extends Writable, V1 extends Writable,
    method Application (line 90) | Application(TaskInputOutputContext<K1,V1,K2,V2> context,
    method getSecurityChallenge (line 163) | private String getSecurityChallenge() {
    method writePasswordToLocalFile (line 174) | private void writePasswordToLocalFile(String localPasswordFile,
    method getDownlink (line 190) | DownwardProtocol<K1, V1> getDownlink() {
    method waitForAuthentication (line 199) | void waitForAuthentication() throws IOException,
    method waitForFinish (line 211) | boolean waitForFinish() throws Throwable {
    method abort (line 221) | void abort(Throwable t) throws IOException {
    method cleanup (line 243) | void cleanup() throws IOException {
    method runClient (line 260) | static Process runClient(List<String> command,
    method createDigest (line 270) | public static String createDigest(byte[] password, String data)

FILE: src/it/crs4/pydoop/mapreduce/pipes/BinaryProtocol.java
  class BinaryProtocol (line 51) | class BinaryProtocol<K1 extends Writable, V1 extends Writable,
    type MessageType (line 71) | private static enum MessageType { START(0),
      method MessageType (line 91) | MessageType(int code) {
    class UplinkReaderThread (line 96) | private static class UplinkReaderThread<K2 extends WritableComparable,
      method UplinkReaderThread (line 106) | public UplinkReaderThread(InputStream stream,
      method closeConnection (line 116) | public void closeConnection() throws IOException {
      method run (line 120) | public void run() {
      method readObject (line 174) | private void readObject(Writable obj) throws IOException {
    class TeeOutputStream (line 197) | private static class TeeOutputStream extends FilterOutputStream {
      method TeeOutputStream (line 199) | TeeOutputStream(String filename, OutputStream base) throws IOExcepti...
      method write (line 203) | public void write(byte b[], int off, int len) throws IOException {
      method write (line 208) | public void write(int b) throws IOException {
      method flush (line 213) | public void flush() throws IOException {
      method close (line 218) | public void close() throws IOException {
    method BinaryProtocol (line 236) | public BinaryProtocol(Socket sock,
    method close (line 259) | public void close() throws IOException, InterruptedException {
    method authenticate (line 267) | public void authenticate(String digest, String challenge)
    method start (line 276) | public void start() throws IOException {
    method setJobConf (line 282) | public void setJobConf(Configuration conf) throws IOException {
    method setInputTypes (line 295) | public void setInputTypes(String keyType,
    method runMap (line 302) | public void runMap(InputSplit split, int numReduces,
    method mapItem (line 313) | public void mapItem(Writable key,
    method runReduce (line 320) | public void runReduce(int reduce, boolean pipedOutput) throws IOExcept...
    method reduceKey (line 326) | public void reduceKey(Writable key) throws IOException {
    method reduceValue (line 331) | public void reduceValue(Writable value) throws IOException {
    method endOfInput (line 336) | public void endOfInput() throws IOException {
    method abort (line 341) | public void abort() throws IOException {
    method flush (line 346) | public void flush() throws IOException {
    method writeObject (line 357) | private void writeObject(Writable obj) throws IOException {

FILE: src/it/crs4/pydoop/mapreduce/pipes/DownwardProtocol.java
  type DownwardProtocol (line 35) | interface DownwardProtocol<K extends Writable, V extends Writable> {
    method authenticate (line 40) | void authenticate(String digest, String challenge) throws IOException;
    method start (line 46) | void start() throws IOException;
    method setJobConf (line 53) | void setJobConf(Configuration conf) throws IOException;
    method setInputTypes (line 61) | void setInputTypes(String keyType, String valueType) throws IOException;
    method runMap (line 70) | void runMap(InputSplit split, int numReduces,
    method mapItem (line 79) | void mapItem(K key, V value) throws IOException;
    method runReduce (line 87) | void runReduce(int reduce, boolean pipedOutput) throws IOException;
    method reduceKey (line 94) | void reduceKey(K key) throws IOException;
    method reduceValue (line 101) | void reduceValue(V value) throws IOException;
    method endOfInput (line 108) | void endOfInput() throws IOException;
    method abort (line 114) | void abort() throws IOException;
    method flush (line 119) | void flush() throws IOException;
    method close (line 124) | void close() throws IOException, InterruptedException;

FILE: src/it/crs4/pydoop/mapreduce/pipes/DummyRecordReader.java
  class DummyRecordReader (line 27) | public abstract class DummyRecordReader
    method next (line 30) | public abstract  boolean next(FloatWritable key, NullWritable value)

FILE: src/it/crs4/pydoop/mapreduce/pipes/OpaqueSplit.java
  class OpaqueSplit (line 33) | class OpaqueSplit extends InputSplit implements Writable {
    method OpaqueSplit (line 37) | public OpaqueSplit() {
    method OpaqueSplit (line 41) | public OpaqueSplit(byte[] payload) {
    method getPayload (line 45) | public BytesWritable getPayload() {
    method getLength (line 49) | @Override
    method toString (line 54) | @Override
    method getLocations (line 59) | @Override
    method getLocationInfo (line 64) | @Override
    method write (line 71) | @Override
    method readFields (line 76) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/OutputHandler.java
  class OutputHandler (line 36) | class OutputHandler<K extends WritableComparable, V extends Writable>
    method OutputHandler (line 57) | public OutputHandler(TaskInputOutputContext context,
    method output (line 68) | @Override
    method partitionedOutput (line 76) | @Override
    method status (line 86) | @Override
    method progress (line 96) | @Override
    method done (line 109) | @Override
    method getProgress (line 121) | public float getProgress() {
    method failed (line 128) | public void failed(Throwable e) {
    method waitForFinish (line 140) | public synchronized boolean waitForFinish() throws Throwable {
    method registerCounter (line 150) | @Override
    method incrementCounter (line 156) | @Override
    method authenticate (line 166) | public synchronized boolean authenticate(String digest) throws IOExcep...
    method waitForAuthentication (line 184) | synchronized void waitForAuthentication()

FILE: src/it/crs4/pydoop/mapreduce/pipes/PipesMapper.java
  class PipesMapper (line 40) | class PipesMapper<K1 extends Writable, V1 extends Writable,
    method setup (line 50) | @Override
    method cleanup (line 60) | @Override
    method run (line 68) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PipesNonJavaInputFormat.java
  class PipesNonJavaInputFormat (line 51) | class PipesNonJavaInputFormat
    method getSplits (line 54) | public List<InputSplit> getSplits(JobContext context)
    method getOpaqueSplits (line 69) | private List<InputSplit> getOpaqueSplits(Configuration conf, String uri)
    method createRecordReader (line 92) | @Override
    class PipesDummyRecordReader (line 109) | static class PipesDummyRecordReader extends DummyRecordReader {
      method PipesDummyRecordReader (line 113) | public PipesDummyRecordReader() {}
      method PipesDummyRecordReader (line 115) | public PipesDummyRecordReader(InputSplit split, TaskAttemptContext c...
      method initialize (line 120) | @Override
      method close (line 124) | public synchronized void close() throws IOException {}
      method getProgress (line 126) | @Override
      method nextKeyValue (line 131) | @Override
      method getCurrentKey (line 136) | @Override
      method getCurrentValue (line 142) | @Override
      method next (line 148) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PipesNonJavaOutputFormat.java
  class PipesNonJavaOutputFormat (line 29) | public class PipesNonJavaOutputFormat<K, V> extends FileOutputFormat<K, ...
    method getRecordWriter (line 31) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PipesPartitioner.java
  class PipesPartitioner (line 33) | class PipesPartitioner<K extends WritableComparable, V extends Writable>
    method setConf (line 42) | public void setConf(Configuration conf) {
    method getConf (line 48) | public Configuration getConf() {
    method setNextPartition (line 56) | static void setNextPartition(int newValue) {
    method getPartition (line 67) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PipesReducer.java
  class PipesReducer (line 38) | class PipesReducer<K2 extends WritableComparable, V2 extends Writable,
    method setup (line 48) | @Override
    method reduce (line 58) | @Override
    method startApplication (line 70) | @SuppressWarnings("unchecked")
    method cleanup (line 88) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroBridgeKeyReader.java
  class PydoopAvroBridgeKeyReader (line 37) | public class PydoopAvroBridgeKeyReader
    method PydoopAvroBridgeKeyReader (line 42) | public PydoopAvroBridgeKeyReader(
    method getInRecords (line 48) | protected List<IndexedRecord> getInRecords()
    method initialize (line 54) | public void initialize(InputSplit split, TaskAttemptContext context)
    method getCurrentKey (line 64) | @Override
    method getCurrentValue (line 71) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroBridgeKeyValueReader.java
  class PydoopAvroBridgeKeyValueReader (line 36) | public class PydoopAvroBridgeKeyValueReader
    method PydoopAvroBridgeKeyValueReader (line 41) | public PydoopAvroBridgeKeyValueReader(
    method getInRecords (line 48) | protected List<IndexedRecord> getInRecords()
    method initialize (line 55) | public void initialize(InputSplit split, TaskAttemptContext context)
    method getCurrentKey (line 67) | @Override
    method getCurrentValue (line 74) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroBridgeKeyValueWriter.java
  class PydoopAvroBridgeKeyValueWriter (line 33) | public class PydoopAvroBridgeKeyValueWriter
    method PydoopAvroBridgeKeyValueWriter (line 36) | public PydoopAvroBridgeKeyValueWriter(
    method write (line 43) | public void write(Text key, Text value)

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroBridgeKeyWriter.java
  class PydoopAvroBridgeKeyWriter (line 34) | public class PydoopAvroBridgeKeyWriter extends PydoopAvroBridgeWriterBase {
    method PydoopAvroBridgeKeyWriter (line 36) | public PydoopAvroBridgeKeyWriter(
    method write (line 43) | public void write(Text key, Text ignore)

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroBridgeReaderBase.java
  class PydoopAvroBridgeReaderBase (line 42) | public abstract class PydoopAvroBridgeReaderBase<K, V>
    method getInRecords (line 71) | protected abstract List<IndexedRecord> getInRecords()
    method initialize (line 74) | public void initialize(InputSplit split, TaskAttemptContext context)
    method nextKeyValue (line 105) | public synchronized boolean nextKeyValue()
    method getProgress (line 147) | public float getProgress() throws IOException,  InterruptedException {
    method close (line 151) | public synchronized void close() throws IOException {

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroBridgeValueReader.java
  class PydoopAvroBridgeValueReader (line 37) | public class PydoopAvroBridgeValueReader
    method PydoopAvroBridgeValueReader (line 42) | public PydoopAvroBridgeValueReader(
    method getInRecords (line 48) | protected List<IndexedRecord> getInRecords()
    method initialize (line 54) | public void initialize(InputSplit split, TaskAttemptContext context)
    method getCurrentKey (line 64) | @Override
    method getCurrentValue (line 70) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroBridgeValueWriter.java
  class PydoopAvroBridgeValueWriter (line 34) | public class PydoopAvroBridgeValueWriter extends PydoopAvroBridgeWriterB...
    method PydoopAvroBridgeValueWriter (line 36) | public PydoopAvroBridgeValueWriter(
    method write (line 43) | public void write(Text ignore, Text value)

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroBridgeWriterBase.java
  class PydoopAvroBridgeWriterBase (line 46) | public abstract class PydoopAvroBridgeWriterBase
    method PydoopAvroBridgeWriterBase (line 64) | public PydoopAvroBridgeWriterBase(TaskAttemptContext context, AvroIO m...
    method getOutRecords (line 91) | protected List<GenericRecord> getOutRecords(List<Text> inRecords)
    method write (line 104) | protected void write(List<GenericRecord> outRecords)
    method close (line 126) | public void close(TaskAttemptContext context)

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroInputBridgeBase.java
  class PydoopAvroInputBridgeBase (line 32) | public abstract class PydoopAvroInputBridgeBase<K, V>
    method getActualFormat (line 38) | protected InputFormat getActualFormat(Configuration conf) {
    method getSplits (line 49) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroInputKeyBridge.java
  class PydoopAvroInputKeyBridge (line 31) | public class PydoopAvroInputKeyBridge
    method PydoopAvroInputKeyBridge (line 34) | public PydoopAvroInputKeyBridge() {
    method createRecordReader (line 38) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroInputKeyValueBridge.java
  class PydoopAvroInputKeyValueBridge (line 30) | public class PydoopAvroInputKeyValueBridge
    method PydoopAvroInputKeyValueBridge (line 33) | public PydoopAvroInputKeyValueBridge() {
    method createRecordReader (line 37) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroInputValueBridge.java
  class PydoopAvroInputValueBridge (line 31) | public class PydoopAvroInputValueBridge
    method PydoopAvroInputValueBridge (line 34) | public PydoopAvroInputValueBridge() {
    method createRecordReader (line 38) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroKeyInputFormat.java
  class PydoopAvroKeyInputFormat (line 32) | public class PydoopAvroKeyInputFormat
    method createRecordReader (line 35) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroKeyOutputFormat.java
  class PydoopAvroKeyOutputFormat (line 30) | public class PydoopAvroKeyOutputFormat
    method getRecordWriter (line 33) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroKeyRecordReader.java
  class PydoopAvroKeyRecordReader (line 31) | public class PydoopAvroKeyRecordReader
    method PydoopAvroKeyRecordReader (line 37) | public PydoopAvroKeyRecordReader(Schema readerSchema) {
    method getCurrentKey (line 41) | @Override
    method getCurrentValue (line 47) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroKeyRecordWriter.java
  class PydoopAvroKeyRecordWriter (line 31) | public class PydoopAvroKeyRecordWriter
    method PydoopAvroKeyRecordWriter (line 34) | public PydoopAvroKeyRecordWriter(Schema writerSchema,
    method write (line 40) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroKeyValueInputFormat.java
  class PydoopAvroKeyValueInputFormat (line 31) | public class PydoopAvroKeyValueInputFormat
    method createRecordReader (line 34) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroKeyValueOutputFormat.java
  class PydoopAvroKeyValueOutputFormat (line 31) | public class PydoopAvroKeyValueOutputFormat
    method getRecordWriter (line 34) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroKeyValueRecordReader.java
  class PydoopAvroKeyValueRecordReader (line 27) | public class PydoopAvroKeyValueRecordReader
    method PydoopAvroKeyValueRecordReader (line 30) | public PydoopAvroKeyValueRecordReader(Schema readerSchema) {
    method getCurrentKey (line 34) | @Override
    method getCurrentValue (line 40) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroKeyValueRecordWriter.java
  class PydoopAvroKeyValueRecordWriter (line 31) | public class PydoopAvroKeyValueRecordWriter
    method PydoopAvroKeyValueRecordWriter (line 36) | public PydoopAvroKeyValueRecordWriter(Schema writerSchema,
    method write (line 43) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroOutputBridgeBase.java
  class PydoopAvroOutputBridgeBase (line 33) | public abstract class PydoopAvroOutputBridgeBase
    method getActualFormat (line 39) | protected OutputFormat getActualFormat(Configuration conf) {
    method checkOutputSpecs (line 50) | @Override
    method getOutputCommitter (line 57) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroOutputFormatBase.java
  class PydoopAvroOutputFormatBase (line 31) | public abstract class PydoopAvroOutputFormatBase<K, V>
    method getOutputSchema (line 34) | protected static Schema getOutputSchema(

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroOutputKeyBridge.java
  class PydoopAvroOutputKeyBridge (line 30) | public class PydoopAvroOutputKeyBridge extends PydoopAvroOutputBridgeBase {
    method PydoopAvroOutputKeyBridge (line 32) | public PydoopAvroOutputKeyBridge() {
    method getRecordWriter (line 36) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroOutputKeyValueBridge.java
  class PydoopAvroOutputKeyValueBridge (line 30) | public class PydoopAvroOutputKeyValueBridge
    method PydoopAvroOutputKeyValueBridge (line 33) | public PydoopAvroOutputKeyValueBridge() {
    method getRecordWriter (line 37) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroOutputValueBridge.java
  class PydoopAvroOutputValueBridge (line 30) | public class PydoopAvroOutputValueBridge extends PydoopAvroOutputBridgeB...
    method PydoopAvroOutputValueBridge (line 32) | public PydoopAvroOutputValueBridge() {
    method getRecordWriter (line 36) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroRecordReaderBase.java
  class PydoopAvroRecordReaderBase (line 41) | public abstract class PydoopAvroRecordReaderBase<K, V>
    method PydoopAvroRecordReaderBase (line 53) | protected PydoopAvroRecordReaderBase(Schema readerSchema) {
    method initialize (line 58) | @Override
    method nextKeyValue (line 77) | @Override
    method getProgress (line 87) | @Override
    method close (line 100) | @Override
    method getCurrentRecord (line 111) | protected GenericRecord getCurrentRecord() {
    method createSeekableInput (line 115) | protected SeekableInput createSeekableInput(Configuration conf, Path p...

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroRecordWriterBase.java
  class PydoopAvroRecordWriterBase (line 34) | public abstract class PydoopAvroRecordWriterBase<K, V>
    method PydoopAvroRecordWriterBase (line 39) | protected PydoopAvroRecordWriterBase(Schema writerSchema,
    method close (line 48) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroValueInputFormat.java
  class PydoopAvroValueInputFormat (line 32) | public class PydoopAvroValueInputFormat
    method createRecordReader (line 35) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroValueOutputFormat.java
  class PydoopAvroValueOutputFormat (line 30) | public class PydoopAvroValueOutputFormat
    method getRecordWriter (line 33) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroValueRecordReader.java
  class PydoopAvroValueRecordReader (line 31) | public class PydoopAvroValueRecordReader
    method PydoopAvroValueRecordReader (line 37) | public PydoopAvroValueRecordReader(Schema readerSchema) {
    method getCurrentKey (line 41) | @Override
    method getCurrentValue (line 47) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/PydoopAvroValueRecordWriter.java
  class PydoopAvroValueRecordWriter (line 31) | public class PydoopAvroValueRecordWriter
    method PydoopAvroValueRecordWriter (line 34) | public PydoopAvroValueRecordWriter(Schema writerSchema,
    method write (line 40) | @Override

FILE: src/it/crs4/pydoop/mapreduce/pipes/Submitter.java
  class CommandLineParser (line 66) | class CommandLineParser {
    method CommandLineParser (line 69) | CommandLineParser() {
    method addOption (line 85) | void addOption(String longName, boolean required, String description,
    method addArgument (line 93) | void addArgument(String name, boolean required, String description) {
    method parse (line 100) | CommandLine parse(Configuration conf, String[] args)
    method printUsage (line 108) | void printUsage() {
  class Submitter (line 131) | public class Submitter extends Configured implements Tool {
    type AvroIO (line 133) | public static enum AvroIO {
    method getPydoopProperties (line 162) | public static Properties getPydoopProperties() {
    method Submitter (line 176) | public Submitter() {
    method isLocalFS (line 181) | public static boolean isLocalFS(Configuration conf) throws IOException {
    method getExecutable (line 190) | public static String getExecutable(Configuration conf) {
    method setExecutable (line 200) | public static void setExecutable(Configuration conf, String executable) {
    method setIsJavaRecordReader (line 209) | public static void setIsJavaRecordReader(Configuration conf, boolean v...
    method getIsJavaRecordReader (line 218) | public static boolean getIsJavaRecordReader(Configuration conf) {
    method setIsJavaMapper (line 227) | public static void setIsJavaMapper(Configuration conf, boolean value) {
    method getIsJavaMapper (line 236) | public static boolean getIsJavaMapper(Configuration conf) {
    method setIsJavaReducer (line 245) | public static void setIsJavaReducer(Configuration conf, boolean value) {
    method getIsJavaReducer (line 254) | public static boolean getIsJavaReducer(Configuration conf) {
    method setIsJavaRecordWriter (line 263) | public static void setIsJavaRecordWriter(Configuration conf, boolean v...
    method getIsJavaRecordWriter (line 272) | public static boolean getIsJavaRecordWriter(Configuration conf) {
    method setIfUnset (line 283) | private static void setIfUnset(Configuration conf, String key, String ...
    method setJavaPartitioner (line 294) | static void setJavaPartitioner(Configuration conf, Class cls) {
    method getJavaPartitioner (line 303) | static Class<? extends Partitioner> getJavaPartitioner(Configuration c...
    method getClass (line 308) | private static <InterfaceType>
    method getKeepCommandFile (line 327) | public static boolean getKeepCommandFile(Configuration conf) {
    method setKeepCommandFile (line 336) | public static void setKeepCommandFile(Configuration conf, boolean keep) {
    method setupPipesJob (line 340) | private static void setupPipesJob(Job job)
    method run (line 453) | public int run(String[] args) throws Exception {
    method main (line 548) | public static void main(String[] args) throws Exception {

FILE: src/it/crs4/pydoop/mapreduce/pipes/TaskLog.java
  class TaskLog (line 71) | @InterfaceAudience.Private
    method getYarnAppContainerLogDir (line 84) | private static String getYarnAppContainerLogDir(){
    method getMRv2LogDir (line 93) | public static String getMRv2LogDir() {
    method getTaskLogFile (line 97) | public static File getTaskLogFile(TaskAttemptID taskid, boolean isClea...
    method getRealTaskLogFileLocation (line 106) | static File getRealTaskLogFileLocation(TaskAttemptID taskid,
    class LogFileDetail (line 117) | private static class LogFileDetail {
    method getLogFileDetail (line 124) | private static LogFileDetail getLogFileDetail(TaskAttemptID taskid,
    method getTmpIndexFile (line 177) | private static File getTmpIndexFile(TaskAttemptID taskid, boolean isCl...
    method getIndexFile (line 181) | static File getIndexFile(TaskAttemptID taskid, boolean isCleanup) {
    method obtainLogDirOwner (line 189) | static String obtainLogDirOwner(TaskAttemptID taskid) throws IOExcepti...
    method getBaseLogDir (line 197) | static String getBaseLogDir() {
    method getAttemptDir (line 201) | static File getAttemptDir(TaskAttemptID taskid, boolean isCleanup) {
    method writeToIndexFile (line 209) | private static synchronized
    method resetPrevLengths (line 255) | private static void resetPrevLengths(String logLocation) {
    method syncLogs (line 262) | @SuppressWarnings("unchecked")
    method syncLogsShutdown (line 287) | public static synchronized void syncLogsShutdown(
    method syncLogs (line 303) | @SuppressWarnings("unchecked")
    method flushAppenders (line 322) | @SuppressWarnings("unchecked")
    method createLogSyncer (line 338) | public static ScheduledExecutorService createLogSyncer() {
    type LogName (line 369) | @InterfaceAudience.Private
      method LogName (line 388) | private LogName(String prefix) {
      method toString (line 392) | @Override
    class Reader (line 398) | public static class Reader extends InputStream {
      method Reader (line 414) | public Reader(TaskAttemptID taskid, LogName kind,
      method read (line 446) | @Override
      method read (line 456) | @Override
      method available (line 466) | @Override
      method close (line 471) | @Override
    method getTaskLogLength (line 485) | public static long getTaskLogLength(Configuration conf) {
    method captureOutAndError (line 503) | public static List<String> captureOutAndError(List<String> setup,
    method buildCommandLine (line 530) | static String buildCommandLine(List<String> setup, List<String> cmd,
    method buildDebugScriptCommandLine (line 593) | static String buildDebugScriptCommandLine(List<String> cmd, String deb...
    method addCommand (line 624) | public static String addCommand(List<String> cmd, boolean isExecutable)
    method getUserLogDir (line 649) | static File getUserLogDir() {
    method getJobDir (line 665) | public static File getJobDir(JobID jobid) {

FILE: src/it/crs4/pydoop/mapreduce/pipes/TaskLogAppender.java
  class TaskLogAppender (line 38) | @InterfaceStability.Unstable
    method activateOptions (line 51) | @Override
    method setOptionsFromSystemProperties (line 70) | private synchronized void setOptionsFromSystemProperties() {
    method append (line 86) | @Override
    method flush (line 100) | @Override
    method close (line 107) | @Override
    method getTaskId (line 121) | public synchronized String getTaskId() {
    method setTaskId (line 125) | public synchronized void setTaskId(String taskId) {
    method getTotalLogFileSize (line 131) | public synchronized long getTotalLogFileSize() {
    method setTotalLogFileSize (line 135) | public synchronized void setTotalLogFileSize(long logSize) {
    method setIsCleanup (line 145) | public synchronized void setIsCleanup(boolean isCleanup) {
    method getIsCleanup (line 154) | public synchronized boolean getIsCleanup() {

FILE: src/it/crs4/pydoop/mapreduce/pipes/UpwardProtocol.java
  type UpwardProtocol (line 29) | interface UpwardProtocol<K extends WritableComparable, V extends Writabl...
    method output (line 36) | void output(K key, V value) throws IOException, InterruptedException;
    method partitionedOutput (line 46) | void partitionedOutput(int reduce, K key,
    method status (line 54) | void status(String msg) throws IOException, InterruptedException;
    method progress (line 61) | void progress(float progress) throws IOException, InterruptedException;
    method done (line 68) | void done() throws IOException, InterruptedException;
    method failed (line 74) | void failed(Throwable e);
    method registerCounter (line 82) | void registerCounter(int id, String group, String name) throws IOExcep...
    method incrementCounter (line 90) | void incrementCounter(int id, long amount) throws IOException;
    method authenticate (line 99) | boolean authenticate(String digest) throws IOException;

FILE: src/libhdfs/common/htable.c
  type htable_pair (line 27) | struct htable_pair {
  type htable (line 35) | struct htable {
  function htable_insert_internal (line 54) | static void htable_insert_internal(struct htable_pair *nelem,
  function htable_realloc (line 74) | static int htable_realloc(struct htable *htable, uint32_t new_capacity)
  function round_up_to_power_of_2 (line 97) | static uint32_t round_up_to_power_of_2(uint32_t i)
  type htable (line 112) | struct htable
  type htable (line 115) | struct htable
  function htable_visit (line 135) | void htable_visit(struct htable *htable, visitor_fn_t fun, void *ctx)
  function htable_free (line 147) | void htable_free(struct htable *htable)
  function htable_put (line 155) | int htable_put(struct htable *htable, void *key, void *val)
  function htable_get_internal (line 185) | static int htable_get_internal(const struct htable *htable,
  type htable (line 213) | struct htable
  function htable_pop (line 223) | void htable_pop(struct htable *htable, const void *key,
  function htable_used (line 260) | uint32_t htable_used(const struct htable *htable)
  function htable_capacity (line 265) | uint32_t htable_capacity(const struct htable *htable)
  function ht_hash_string (line 270) | uint32_t ht_hash_string(const void *str, uint32_t max)
  function ht_compare_string (line 282) | int ht_compare_string(const void *a, const void *b)

FILE: src/libhdfs/common/htable.h
  type htable (line 28) | struct htable
  type htable (line 59) | struct htable
  type htable (line 71) | struct htable
  type htable (line 81) | struct htable
  type htable (line 96) | struct htable
  type htable (line 106) | struct htable
  type htable (line 118) | struct htable
  type htable (line 128) | struct htable
  type htable (line 137) | struct htable

FILE: src/libhdfs/exception.c
  type ExceptionInfo (line 30) | struct ExceptionInfo {
  type ExceptionInfo (line 36) | struct ExceptionInfo
  function getExceptionInfo (line 94) | void getExceptionInfo(const char *excName, int noPrintFlags,
  function printExceptionAndFreeV (line 113) | int printExceptionAndFreeV(JNIEnv *env, jthrowable exc, int noPrintFlags,
  function printExceptionAndFree (line 173) | int printExceptionAndFree(JNIEnv *env, jthrowable exc, int noPrintFlags,
  function printPendingExceptionAndFree (line 185) | int printPendingExceptionAndFree(JNIEnv *env, int noPrintFlags,
  function getPendingExceptionAndClear (line 208) | jthrowable getPendingExceptionAndClear(JNIEnv *env)
  function newRuntimeError (line 217) | jthrowable newRuntimeError(JNIEnv *env, const char *fmt, ...)

FILE: src/libhdfs/hdfs.c
  type hdfsStreamType (line 66) | enum hdfsStreamType
  type hdfsFile_internal (line 76) | struct hdfsFile_internal {
  type hdfsExtendedFileInfo (line 87) | struct hdfsExtendedFileInfo {
  function hdfsFileIsOpenForRead (line 91) | int hdfsFileIsOpenForRead(hdfsFile file)
  function hdfsFileGetReadStatistics (line 96) | int hdfsFileGetReadStatistics(hdfsFile file,
  function hdfsReadStatisticsGetRemoteBytesRead (line 181) | int64_t hdfsReadStatisticsGetRemoteBytesRead(
  function hdfsFileClearReadStatistics (line 187) | int hdfsFileClearReadStatistics(hdfsFile file)
  function hdfsFileFreeReadStatistics (line 218) | void hdfsFileFreeReadStatistics(struct hdfsReadStatistics *stats)
  function hdfsFileIsOpenForWrite (line 223) | int hdfsFileIsOpenForWrite(hdfsFile file)
  function hdfsFileUsesDirectRead (line 228) | int hdfsFileUsesDirectRead(hdfsFile file)
  function hdfsFileDisableDirectRead (line 233) | void hdfsFileDisableDirectRead(hdfsFile file)
  function hdfsDisableDomainSocketSecurity (line 238) | int hdfsDisableDomainSocketSecurity(void)
  type hdfsJniEnv (line 261) | typedef struct
  function constructNewObjectOfPath (line 273) | static jthrowable constructNewObjectOfPath(JNIEnv *env, const char *path,
  function hadoopConfGetStr (line 294) | static jthrowable hadoopConfGetStr(JNIEnv *env, jobject jConfiguration,
  function hdfsConfGetStr (line 317) | int hdfsConfGetStr(const char *key, char **val)
  function hdfsConfStrFree (line 349) | void hdfsConfStrFree(char *val)
  function hadoopConfGetInt (line 354) | static jthrowable hadoopConfGetInt(JNIEnv *env, jobject jConfiguration,
  function hdfsConfGetInt (line 374) | int hdfsConfGetInt(const char *key, int32_t *val)
  type hdfsBuilderConfOpt (line 406) | struct hdfsBuilderConfOpt {
  type hdfsBuilder (line 412) | struct hdfsBuilder {
  type hdfsBuilder (line 421) | struct hdfsBuilder
  type hdfsBuilder (line 423) | struct hdfsBuilder
  function hdfsBuilderConfSetStr (line 431) | int hdfsBuilderConfSetStr(struct hdfsBuilder *bld, const char *key,
  function hdfsFreeBuilder (line 447) | void hdfsFreeBuilder(struct hdfsBuilder *bld)
  function hdfsBuilderSetForceNewInstance (line 460) | void hdfsBuilderSetForceNewInstance(struct hdfsBuilder *bld)
  function hdfsBuilderSetNameNode (line 465) | void hdfsBuilderSetNameNode(struct hdfsBuilder *bld, const char *nn)
  function hdfsBuilderSetNameNodePort (line 470) | void hdfsBuilderSetNameNodePort(struct hdfsBuilder *bld, tPort port)
  function hdfsBuilderSetUserName (line 475) | void hdfsBuilderSetUserName(struct hdfsBuilder *bld, const char *userName)
  function hdfsBuilderSetKerbTicketCachePath (line 480) | void hdfsBuilderSetKerbTicketCachePath(struct hdfsBuilder *bld,
  function hdfsConnect (line 486) | hdfsFS hdfsConnect(const char *host, tPort port)
  function hdfsConnectNewInstance (line 497) | hdfsFS hdfsConnectNewInstance(const char *host, tPort port)
  function hdfsConnectAsUser (line 508) | hdfsFS hdfsConnectAsUser(const char *host, tPort port, const char *user)
  function hdfsConnectAsUserNewInstance (line 520) | hdfsFS hdfsConnectAsUserNewInstance(const char *host, tPort port,
  function calcEffectiveURI (line 549) | static int calcEffectiveURI(struct hdfsBuilder *bld, char ** uri)
  type hdfsBuilder (line 589) | struct hdfsBuilder
  function hdfsBuilderConnect (line 599) | hdfsFS hdfsBuilderConnect(struct hdfsBuilder *bld)
  function hdfsDisconnect (line 774) | int hdfsDisconnect(hdfsFS fs)
  function getDefaultBlockSize (line 825) | static jthrowable getDefaultBlockSize(JNIEnv *env, jobject jFS,
  function hdfsOpenFile (line 839) | hdfsFile hdfsOpenFile(hdfsFS fs, const char *path, int flags,
  function hdfsTruncateFile (line 1029) | int hdfsTruncateFile(hdfsFS fs, const char* path, tOffset newlength)
  function hdfsUnbufferFile (line 1066) | int hdfsUnbufferFile(hdfsFile file)
  function hdfsCloseFile (line 1094) | int hdfsCloseFile(hdfsFS fs, hdfsFile file)
  function hdfsExists (line 1145) | int hdfsExists(hdfsFS fs, const char *path)
  function readPrepare (line 1186) | static int readPrepare(JNIEnv* env, hdfsFS fs, hdfsFile f,
  function hdfsRead (line 1207) | tSize hdfsRead(hdfsFS fs, hdfsFile f, void* buffer, tSize length)
  function readDirect (line 1286) | tSize readDirect(hdfsFS fs, hdfsFile f, void* buffer, tSize length)
  function hdfsPread (line 1327) | tSize hdfsPread(hdfsFS fs, hdfsFile f, tOffset position,
  function hdfsWrite (line 1395) | tSize hdfsWrite(hdfsFS fs, hdfsFile f, const void* buffer, tSize length)
  function hdfsSeek (line 1466) | int hdfsSeek(hdfsFS fs, hdfsFile f, tOffset desiredPos)
  function hdfsTell (line 1501) | tOffset hdfsTell(hdfsFS fs, hdfsFile f)
  function hdfsFlush (line 1540) | int hdfsFlush(hdfsFS fs, hdfsFile f)
  function hdfsHFlush (line 1569) | int hdfsHFlush(hdfsFS fs, hdfsFile f)
  function hdfsHSync (line 1598) | int hdfsHSync(hdfsFS fs, hdfsFile f)
  function hdfsAvailable (line 1627) | int hdfsAvailable(hdfsFS fs, hdfsFile f)
  function hdfsCopyImpl (line 1661) | static int hdfsCopyImpl(hdfsFS srcFS, const char *src, hdfsFS dstFS,
  function hdfsCopy (line 1737) | int hdfsCopy(hdfsFS srcFS, const char *src, hdfsFS dstFS, const char *dst)
  function hdfsMove (line 1742) | int hdfsMove(hdfsFS srcFS, const char *src, hdfsFS dstFS, const char *dst)
  function hdfsDelete (line 1747) | int hdfsDelete(hdfsFS fs, const char *path, int recursive)
  function hdfsRename (line 1792) | int hdfsRename(hdfsFS fs, const char *oldPath, const char *newPath)
  function hdfsSetWorkingDirectory (line 1929) | int hdfsSetWorkingDirectory(hdfsFS fs, const char *path)
  function hdfsCreateDirectory (line 1970) | int hdfsCreateDirectory(hdfsFS fs, const char *path)
  function hdfsSetReplication (line 2021) | int hdfsSetReplication(hdfsFS fs, const char *path, int16_t replication)
  function hdfsChown (line 2067) | int hdfsChown(hdfsFS fs, const char *path, const char *owner, const char...
  function hdfsChmod (line 2136) | int hdfsChmod(hdfsFS fs, const char *path, short mode)
  function hdfsUtime (line 2196) | int hdfsUtime(hdfsFS fs, const char *path, tTime mtime, tTime atime)
  type hadoopRzOptions (line 2246) | struct hadoopRzOptions
  type hadoopRzOptions (line 2254) | struct hadoopRzOptions
  type hadoopRzOptions (line 2256) | struct hadoopRzOptions
  type hadoopRzOptions (line 2265) | struct hadoopRzOptions
  function hadoopRzOptionsClearCached (line 2273) | static void hadoopRzOptionsClearCached(JNIEnv *env,
  function hadoopRzOptionsSetSkipChecksum (line 2283) | int hadoopRzOptionsSetSkipChecksum(
  function hadoopRzOptionsSetByteBufferPool (line 2297) | int hadoopRzOptionsSetByteBufferPool(
  function hadoopRzOptionsFree (line 2331) | void hadoopRzOptionsFree(struct hadoopRzOptions *opts)
  type hadoopRzBuffer (line 2346) | struct hadoopRzBuffer
  function hadoopRzOptionsGetEnumSet (line 2354) | static jthrowable hadoopRzOptionsGetEnumSet(JNIEnv *env,
  function hadoopReadZeroExtractBuffer (line 2405) | static int hadoopReadZeroExtractBuffer(JNIEnv *env,
  function translateZCRException (line 2493) | static int translateZCRException(JNIEnv *env, jthrowable exc)
  type hadoopRzBuffer (line 2518) | struct hadoopRzBuffer
  type hadoopRzOptions (line 2519) | struct hadoopRzOptions
  type hadoopRzBuffer (line 2525) | struct hadoopRzBuffer
  type hadoopRzBuffer (line 2538) | struct hadoopRzBuffer
  function hadoopRzBufferLength (line 2591) | int32_t hadoopRzBufferLength(const struct hadoopRzBuffer *buffer)
  type hadoopRzBuffer (line 2596) | struct hadoopRzBuffer
  function hadoopRzBufferFree (line 2601) | void hadoopRzBufferFree(hdfsFile file, struct hadoopRzBuffer *buffer)
  function hdfsFreeHosts (line 2789) | void hdfsFreeHosts(char ***blockHosts)
  function hdfsGetDefaultBlockSize (line 2802) | tOffset hdfsGetDefaultBlockSize(hdfsFS fs)
  function hdfsGetDefaultBlockSizeAtPath (line 2830) | tOffset hdfsGetDefaultBlockSizeAtPath(hdfsFS fs, const char *path)
  function hdfsGetCapacity (line 2864) | tOffset hdfsGetCapacity(hdfsFS fs)
  function hdfsGetUsed (line 2904) | tOffset hdfsGetUsed(hdfsFS fs)
  function getExtendedFileInfoOffset (line 2961) | static size_t getExtendedFileInfoOffset(const char *str)
  type hdfsExtendedFileInfo (line 2967) | struct hdfsExtendedFileInfo
  type hdfsExtendedFileInfo (line 2970) | struct hdfsExtendedFileInfo
  function jthrowable (line 2974) | static jthrowable
  function jthrowable (line 3124) | static jthrowable
  function hdfsListDirectory (line 3167) | hdfsFileInfo* hdfsListDirectory(hdfsFS fs, const char *path, int *numEnt...
  function hdfsGetPathInfo (line 3263) | hdfsFileInfo *hdfsGetPathInfo(hdfsFS fs, const char *path)
  function hdfsFreeFileInfoEntry (line 3307) | static void hdfsFreeFileInfoEntry(hdfsFileInfo *hdfsFileInfo)
  function hdfsFreeFileInfo (line 3315) | void hdfsFreeFileInfo(hdfsFileInfo *hdfsFileInfo, int numEntries)
  function hdfsFileIsEncrypted (line 3327) | int hdfsFileIsEncrypted(hdfsFileInfo *fileInfo)

FILE: src/libhdfs/include/hdfs/hdfs.h
  type hdfsBuilder (line 74) | struct hdfsBuilder
  type tSize (line 75) | typedef int32_t   tSize;
  type tTime (line 76) | typedef time_t    tTime;
  type tOffset (line 77) | typedef int64_t   tOffset;
  type tPort (line 78) | typedef uint16_t  tPort;
  type tObjectKind (line 79) | typedef enum tObjectKind {
  type hdfs_internal (line 88) | struct hdfs_internal
  type hdfs_internal (line 89) | struct hdfs_internal
  type hdfsFile_internal (line 91) | struct hdfsFile_internal
  type hdfsFile_internal (line 92) | struct hdfsFile_internal
  type hadoopRzOptions (line 94) | struct hadoopRzOptions
  type hadoopRzBuffer (line 96) | struct hadoopRzBuffer
  type hdfsReadStatistics (line 116) | struct hdfsReadStatistics {
  type hdfsReadStatistics (line 138) | struct hdfsReadStatistics
  type hdfsReadStatistics (line 147) | struct hdfsReadStatistics
  type hdfsReadStatistics (line 169) | struct hdfsReadStatistics
  type hdfsBuilder (line 234) | struct hdfsBuilder
  type hdfsBuilder (line 251) | struct hdfsBuilder
  type hdfsBuilder (line 275) | struct hdfsBuilder
  type hdfsBuilder (line 284) | struct hdfsBuilder
  type hdfsBuilder (line 293) | struct hdfsBuilder
  type hdfsBuilder (line 304) | struct hdfsBuilder
  type hdfsBuilder (line 316) | struct hdfsBuilder
  type hdfsBuilder (line 330) | struct hdfsBuilder
  type hdfsFileInfo (line 654) | typedef struct  {
  type hadoopRzOptions (line 843) | struct hadoopRzOptions
  type hadoopRzOptions (line 861) | struct hadoopRzOptions
  type hadoopRzOptions (line 870) | struct hadoopRzOptions
  type hadoopRzOptions (line 897) | struct hadoopRzOptions
  type hadoopRzBuffer (line 906) | struct hadoopRzBuffer
  type hadoopRzBuffer (line 919) | struct hadoopRzBuffer
  type hadoopRzBuffer (line 929) | struct hadoopRzBuffer

FILE: src/libhdfs/jni_helper.c
  type htable (line 30) | struct htable
  function destroyLocalReference (line 57) | void destroyLocalReference(JNIEnv *env, jobject jObject)
  function validateMethodType (line 63) | static jthrowable validateMethodType(JNIEnv *env, MethType methType)
  function newJavaStr (line 72) | jthrowable newJavaStr(JNIEnv *env, const char *str, jstring *out)
  function newCStr (line 92) | jthrowable newCStr(JNIEnv *env, jstring jstr, char **out)
  function invokeMethod (line 109) | jthrowable invokeMethod(JNIEnv *env, jvalue *retval, MethType methType,
  function constructNewObjectOfClass (line 203) | jthrowable constructNewObjectOfClass(JNIEnv *env, jobject *out, const ch...
  function methodIdFromClass (line 229) | jthrowable methodIdFromClass(const char *className, const char *methName,
  function globalClassReference (line 258) | jthrowable globalClassReference(const char *className, JNIEnv *env, jcla...
  function classNameOfObject (line 306) | jthrowable classNameOfObject(jobject jobj, JNIEnv *env, char **name)
  function getGlobalJNIEnv (line 369) | static JNIEnv* getGlobalJNIEnv(void)
  function getJNIEnv (line 499) | JNIEnv* getJNIEnv(void)
  function javaObjectIsOfClass (line 526) | int javaObjectIsOfClass(JNIEnv *env, jobject obj, const char *name)
  function hadoopConfSetStr (line 542) | jthrowable hadoopConfSetStr(JNIEnv *env, jobject jConfiguration,
  function fetchEnumInstance (line 566) | jthrowable fetchEnumInstance(JNIEnv *env, const char *className,

FILE: src/libhdfs/jni_helper.h
  type MethType (line 33) | typedef enum {

FILE: src/libhdfs/os/posix/mutexes.c
  function mutexLock (line 27) | int mutexLock(mutex *m) {
  function mutexUnlock (line 36) | int mutexUnlock(mutex *m) {

FILE: src/libhdfs/os/posix/platform.h
  type mutex (line 31) | typedef pthread_mutex_t mutex;
  type threadId (line 32) | typedef pthread_t threadId;

FILE: src/libhdfs/os/posix/thread.c
  function threadCreate (line 37) | int threadCreate(thread *t) {
  function threadJoin (line 46) | int threadJoin(const thread *t) {

FILE: src/libhdfs/os/posix/thread_local_storage.c
  function hdfsThreadDestructor (line 37) | static void hdfsThreadDestructor(void *v)
  function threadLocalStorageGet (line 53) | int threadLocalStorageGet(JNIEnv **env)
  function threadLocalStorageSet (line 70) | int threadLocalStorageSet(JNIEnv *env)

FILE: src/libhdfs/os/thread.h
  type thread (line 32) | typedef struct {

FILE: src/libhdfs/os/windows/mutexes.c
  function initializeMutexes (line 36) | static void __cdecl initializeMutexes(void) {
  function mutexLock (line 44) | int mutexLock(mutex *m) {
  function mutexUnlock (line 49) | int mutexUnlock(mutex *m) {

FILE: src/libhdfs/os/windows/platform.h
  type mutex (line 79) | typedef CRITICAL_SECTION mutex;
  type threadId (line 84) | typedef HANDLE threadId;

FILE: src/libhdfs/os/windows/thread.c
  function runThread (line 31) | static DWORD WINAPI runThread(LPVOID toRun) {
  function threadCreate (line 37) | int threadCreate(thread *t) {
  function threadJoin (line 50) | int threadJoin(const thread *t) {

FILE: src/libhdfs/os/windows/thread_local_storage.c
  function detachCurrentThreadFromJvm (line 32) | static void detachCurrentThreadFromJvm()
  function tlsCallback (line 68) | static void NTAPI tlsCallback(PVOID h, DWORD reason, PVOID pv)
  function threadLocalStorageGet (line 125) | int threadLocalStorageGet(JNIEnv **env)
  function threadLocalStorageSet (line 161) | int threadLocalStorageSet(JNIEnv *env)

FILE: src/native_core_hdfs/hdfs_file.cc
  function FileClass_new (line 26) | PyObject* FileClass_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
  function FileClass_dealloc (line 52) | void FileClass_dealloc(FileInfo* self)
  function FileClass_init (line 59) | int FileClass_init(FileInfo *self, PyObject *args, PyObject *kwds)
  function FileClass_init_internal (line 85) | int FileClass_init_internal(FileInfo *self, hdfsFS fs, hdfsFile file)
  function FileClass_close (line 94) | PyObject* FileClass_close(FileInfo* self){
  function FileClass_getclosed (line 105) | PyObject* FileClass_getclosed(FileInfo* self, void* closure) {
  function FileClass_getbuff_size (line 110) | PyObject* FileClass_getbuff_size(FileInfo* self, void* closure) {
  function FileClass_getname (line 115) | PyObject* FileClass_getname(FileInfo* self, void* closure) {
  function FileClass_getmode (line 121) | PyObject* FileClass_getmode(FileInfo* self, void* closure) {
  function FileClass_readable (line 127) | PyObject* FileClass_readable(FileInfo* self) {
  function FileClass_writable (line 132) | PyObject* FileClass_writable(FileInfo* self) {
  function FileClass_seekable (line 137) | PyObject* FileClass_seekable(FileInfo* self) {
  function FileClass_available (line 142) | PyObject* FileClass_available(FileInfo *self){
  function _ensure_open_for_reading (line 150) | static int _ensure_open_for_reading(FileInfo* self) {
  function _read_into_pybuf (line 159) | static Py_ssize_t _read_into_pybuf(FileInfo *self, char* buf, Py_ssize_t...
  function _read_new_pybuf (line 179) | static PyObject* _read_new_pybuf(FileInfo* self, Py_ssize_t nbytes) {
  function _pread_into_pybuf (line 217) | static Py_ssize_t _pread_into_pybuf(FileInfo *self, char* buffer, Py_ssi...
  function _pread_new_pybuf (line 246) | static PyObject* _pread_new_pybuf(FileInfo* self, Py_ssize_t pos, Py_ssi...
  function FileClass_read (line 276) | PyObject* FileClass_read(FileInfo *self, PyObject *args, PyObject *kwds){
  function FileClass_read_chunk (line 299) | PyObject* FileClass_read_chunk(FileInfo *self, PyObject *args, PyObject ...
  function FileClass_pread (line 319) | PyObject* FileClass_pread(FileInfo *self, PyObject *args, PyObject *kwds){
  function FileClass_pread_chunk (line 346) | PyObject* FileClass_pread_chunk(FileInfo *self, PyObject *args, PyObject...
  function FileClass_seek (line 375) | PyObject* FileClass_seek(FileInfo *self, PyObject *args, PyObject *kwds) {
  function FileClass_tell (line 416) | PyObject* FileClass_tell(FileInfo *self, PyObject *args, PyObject *kwds){
  function FileClass_write (line 429) | PyObject* FileClass_write(FileInfo* self, PyObject *args, PyObject *kwds) {
  function FileClass_flush (line 458) | PyObject* FileClass_flush(FileInfo *self){

FILE: src/native_core_hdfs/hdfs_file.h
  type FileInfo (line 38) | typedef struct {

FILE: src/native_core_hdfs/hdfs_fs.cc
  function FsClass_new (line 33) | PyObject* FsClass_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
  function FsClass_dealloc (line 50) | void FsClass_dealloc(FsInfo* self)
  function FsClass_init (line 56) | int FsClass_init(FsInfo *self, PyObject *args, PyObject *kwds)
  function FsClass_close (line 95) | PyObject* FsClass_close(FsInfo* self)
  function FsClass_get_working_directory (line 102) | PyObject* FsClass_get_working_directory(FsInfo* self) {
  function FsClass_get_path_info (line 123) | PyObject* FsClass_get_path_info(FsInfo* self, PyObject *args, PyObject *...
  function FsClass_get_hosts (line 167) | PyObject* FsClass_get_hosts(FsInfo* self, PyObject *args, PyObject *kwds) {
  function FsClass_get_default_block_size (line 228) | PyObject* FsClass_get_default_block_size(FsInfo* self) {
  function FsClass_get_used (line 233) | PyObject* FsClass_get_used(FsInfo* self) {
  function FsClass_set_replication (line 238) | PyObject* FsClass_set_replication(FsInfo* self, PyObject* args, PyObject...
  function FsClass_set_working_directory (line 264) | PyObject* FsClass_set_working_directory(FsInfo* self, PyObject* args, Py...
  function FsClass_open_file (line 289) | PyObject* FsClass_open_file(FsInfo* self, PyObject *args, PyObject *kwds)
  function FsClass_get_capacity (line 377) | PyObject *FsClass_get_capacity(FsInfo *self) {
  function FsClass_copy (line 403) | PyObject* FsClass_copy(FsInfo* self, PyObject *args, PyObject *kwds)
  function FsClass_exists (line 433) | PyObject *FsClass_exists(FsInfo *self, PyObject *args, PyObject *kwds) {
  function FsClass_create_directory (line 464) | PyObject *FsClass_create_directory(FsInfo *self, PyObject *args, PyObjec...
  function setPathInfo (line 496) | static int setPathInfo(PyObject* dict, hdfsFileInfo* fileInfo) {
  function FsClass_list_directory (line 549) | PyObject *FsClass_list_directory(FsInfo *self, PyObject *args, PyObject ...
  function FsClass_move (line 628) | PyObject *FsClass_move(FsInfo *self, PyObject *args, PyObject *kwds) {
  function FsClass_rename (line 659) | PyObject *FsClass_rename(FsInfo *self, PyObject *args, PyObject *kwds) {
  function FsClass_delete (line 686) | PyObject *FsClass_delete(FsInfo *self, PyObject *args, PyObject *kwds) {
  function FsClass_chmod (line 714) | PyObject *FsClass_chmod(FsInfo *self, PyObject *args, PyObject *kwds) {
  function FsClass_chown (line 754) | PyObject *FsClass_chown(FsInfo *self, PyObject *args, PyObject *kwds) {
  function FsClass_utime (line 796) | PyObject *FsClass_utime(FsInfo *self, PyObject *args, PyObject *kwds) {

FILE: src/native_core_hdfs/hdfs_fs.h
  type FsInfo (line 42) | typedef struct {

FILE: src/native_core_hdfs/hdfs_module.cc
  type PyModuleDef (line 216) | struct PyModuleDef
  function PyMODINIT_FUNC (line 232) | PyMODINIT_FUNC
  function PyMODINIT_FUNC (line 255) | PyMODINIT_FUNC
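
The FsClass/FileClass extension types above are the native backend of the
pydoop.hdfs package: FsClass_open_file returns a FileInfo that behaves like
a Python file object (read/write/seek/tell/flush), and hdfs_module.cc
registers both types. A minimal usage sketch through the public Python
wrapper, assuming a reachable HDFS (or the local-filesystem fallback) and
default configuration:

    import pydoop.hdfs as hdfs

    # dump goes through FsClass_open_file plus FileClass_write
    hdfs.dump("hello, world", "test/hello.txt")

    # the file-like interface maps onto the FileClass_* methods
    with hdfs.open("test/hello.txt", "rt") as f:
        print(f.read())  # -> hello, world

Path queries such as existence and stat checks similarly go through
FsClass_exists and FsClass_get_path_info.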

FILE: src/sercore/HadoopUtils/SerialUtils.cc
  type HadoopUtils (line 28) | namespace HadoopUtils {
    function serializeInt (line 177) | void serializeInt(int32_t t, OutStream& stream) {
    function serializeLong (line 181) | void serializeLong(int64_t t, OutStream& stream)
    function deserializeInt (line 212) | int32_t deserializeInt(InStream& stream) {
    function deserializeLong (line 216) | int64_t deserializeLong(InStream& stream)
    function serializeFloat (line 245) | void serializeFloat(float t, OutStream& stream)
    function deserializeFloat (line 254) | float deserializeFloat(InStream& stream)
    function deserializeFloat (line 261) | void deserializeFloat(float& t, InStream& stream)
    function serializeString (line 270) | void serializeString(const std::string& t, OutStream& stream)
    function deserializeString (line 278) | void deserializeString(std::string& t, InStream& stream)

FILE: src/sercore/HadoopUtils/SerialUtils.hh
  type HadoopUtils (line 24) | namespace HadoopUtils {
    class Error (line 29) | class Error {
    class InStream (line 68) | class InStream {
    class OutStream (line 83) | class OutStream {
    class FileInStream (line 102) | class FileInStream : public InStream {
    class FileOutStream (line 125) | class FileOutStream: public OutStream {
    class StringInStream (line 151) | class StringInStream: public InStream {

FILE: src/sercore/hu_extras.cpp
  function deserializeLongWritable (line 24) | int64_t deserializeLongWritable(HadoopUtils::InStream& stream) {
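
serializeInt/serializeLong and their deserialize counterparts implement
Hadoop's variable-length integer encoding (the WritableUtils
writeVLong/readVLong format, exercised by test_vint and test_vlong in
test/sercore/test_streams.py); deserializeLongWritable in hu_extras.cpp
instead reads the fixed eight-byte big-endian form used by LongWritable.
For reference, a self-contained Python sketch of the variable-length
format:

    def write_vlong(buf, i):
        # values in [-112, 127] fit in a single byte
        if -112 <= i <= 127:
            buf.append(i & 0xff)
            return
        length = -112
        if i < 0:
            i = ~i          # store the complement, flag via the prefix
            length = -120
        tmp = i
        while tmp != 0:     # count payload bytes
            tmp >>= 8
            length -= 1
        buf.append(length & 0xff)
        n = -(length + 120) if length < -120 else -(length + 112)
        for idx in range(n, 0, -1):  # big-endian payload
            buf.append((i >> ((idx - 1) * 8)) & 0xff)

    def read_vlong(data, pos=0):
        first = data[pos] - 256 if data[pos] > 127 else data[pos]
        pos += 1
        if first >= -112:
            return first, pos
        negative = first < -120
        n = -(first + 120) if negative else -(first + 112)
        value = 0
        for _ in range(n):
            value = (value << 8) | data[pos]
            pos += 1
        return (~value if negative else value), pos

    buf = bytearray()
    for v in (0, 127, 128, -113, 1 << 40):
        write_vlong(buf, v)
    pos = 0
    for v in (0, 127, 128, -113, 1 << 40):
        got, pos = read_vlong(buf, pos)
        assert got == v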

FILE: src/sercore/sercore.cpp
  function PyObject (line 36) | static PyObject *
  type PyModuleDef (line 97) | struct PyModuleDef
  function initsercore (line 115) | initsercore(void) {

FILE: src/sercore/streams.cpp
  function FILE (line 52) | FILE *
  function FileInStream_init (line 77) | static int
  function PyObject (line 108) | static PyObject *
  function PyObject (line 129) | static PyObject *
  function PyObject (line 137) | static PyObject *
  function PyObject (line 143) | static PyObject *
  function PyObject (line 169) | static PyObject *
  function PyObject (line 187) | static PyObject *
  function PyObject (line 205) | static PyObject *
  function _FileInStream_read_cppstring (line 223) | std::string
  function PyObject (line 240) | static PyObject *
  function PyObject (line 253) | static PyObject *
  function PyObject (line 266) | static PyObject *
  function PyObject (line 310) | static PyObject *
  function PyObject (line 340) | static PyObject *
  function FileOutStream_init (line 427) | static int
  function PyObject (line 458) | static PyObject *
  function PyObject (line 479) | static PyObject *
  function PyObject (line 487) | static PyObject *
  function PyObject (line 493) | static PyObject *
  function PyObject (line 519) | static PyObject *
  function PyObject (line 540) | static PyObject *
  function PyObject (line 561) | static PyObject *
  function PyObject (line 582) | static PyObject*
  function PyObject (line 597) | static PyObject *
  function PyObject (line 628) | static PyObject *
  function PyObject (line 656) | static PyObject *
  function PyObject (line 710) | static PyObject *
  function PyObject (line 769) | static PyObject *
  function PyObject (line 788) | static PyObject *

FILE: src/sercore/streams.h
  type FileInStreamObj (line 27) | typedef struct {
  type FileOutStreamObj (line 34) | typedef struct {

FILE: test/all_tests.py
  function suite (line 32) | def suite():

FILE: test/app/all_tests.py
  function suite (line 28) | def suite(path=None):

FILE: test/app/test_submit.py
  function nop (line 31) | def nop(x=None):
  class Args (line 35) | class Args(object):
    method __init__ (line 36) | def __init__(self, **kwargs):
    method __getattr__ (line 40) | def __getattr__(self, _):
  class TestAppSubmit (line 47) | class TestAppSubmit(unittest.TestCase):
    method setUp (line 49) | def setUp(self):
    method _gen_default_args (line 53) | def _gen_default_args():
    method test_help (line 67) | def test_help(self):
    method _check_args (line 90) | def _check_args(self, args, args_kv):
    method test_conf_file (line 102) | def test_conf_file(self):
    method test_empty_param (line 131) | def test_empty_param(self):
    method test_generate_pipes_code_env (line 141) | def test_generate_pipes_code_env(self):
    method test_generate_pipes_code_no_override_ld_path (line 157) | def test_generate_pipes_code_no_override_ld_path(self):
    method test_generate_pipes_code_no_override_path (line 171) | def test_generate_pipes_code_no_override_path(self):
    method test_generate_pipes_code_no_override_pythonpath (line 180) | def test_generate_pipes_code_no_override_pythonpath(self):
    method test_generate_pipes_code_with_set_env (line 189) | def test_generate_pipes_code_with_set_env(self):
    method test_generate_code_no_env_override (line 204) | def test_generate_code_no_env_override(self):
    method test_generate_code_no_env_override_with_set_env (line 216) | def test_generate_code_no_env_override_with_set_env(self):
    method test_env_arg_to_dict (line 230) | def test_env_arg_to_dict(self):
    method test_bad_upload_files (line 237) | def test_bad_upload_files(self):
    method test_pretend (line 242) | def test_pretend(self):
  function suite (line 255) | def suite():

FILE: test/avro/all_tests.py
  function suite (line 28) | def suite(path=None):

FILE: test/avro/common.py
  class AvroSerializer (line 24) | class AvroSerializer(object):
    method __init__ (line 26) | def __init__(self, schema):
    method serialize (line 30) | def serialize(self, record):
  function avro_user_record (line 37) | def avro_user_record(i):
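
AvroSerializer wraps in-memory Avro binary encoding for the tests. A sketch
of what its two methods plausibly amount to, assuming the avro package's
DatumWriter/BinaryEncoder API (the actual implementation in common.py may
differ in detail):

    import io

    import avro.io
    import avro.schema

    class AvroSerializerSketch(object):

        def __init__(self, schema_str):
            # some avro-python3 releases spell this avro.schema.Parse
            schema = avro.schema.parse(schema_str)
            self._writer = avro.io.DatumWriter(schema)

        def serialize(self, record):
            # encode one record to its raw Avro binary form
            f = io.BytesIO()
            self._writer.write(record, avro.io.BinaryEncoder(f))
            return f.getvalue()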

FILE: test/avro/test_io.py
  class TestAvroIO (line 40) | class TestAvroIO(WDTestCase):
    method setUp (line 42) | def setUp(self):
    method write_avro_file (line 47) | def write_avro_file(self, rec_creator, n_samples, sync_interval):
    method test_seekable (line 56) | def test_seekable(self):
    method test_avro_reader (line 93) | def test_avro_reader(self):
    method test_avro_writer (line 121) | def test_avro_writer(self):
  function suite (line 146) | def suite():

FILE: test/common/all_tests.py
  function suite (line 32) | def suite(path=None):
  function main (line 39) | def main():

FILE: test/common/test_hadoop_utils.py
  class TestHadoopUtils (line 31) | class TestHadoopUtils(unittest.TestCase):
    method setUp (line 33) | def setUp(self):
    method tearDown (line 41) | def tearDown(self):
    method test_get_hadoop_params (line 46) | def test_get_hadoop_params(self):
    method __check_params (line 60) | def __check_params(self, xml_content=None, expected=None):
  function suite (line 73) | def suite():

FILE: test/common/test_hadut.py
  function pair_set (line 31) | def pair_set(seq):
  class TestHadut (line 35) | class TestHadut(unittest.TestCase):
    method assertEqualPairSet (line 42) | def assertEqualPairSet(self, seq1, seq2):
    method test_pop_generic_args (line 45) | def test_pop_generic_args(self):
    method test_merge_csv_args (line 60) | def test_merge_csv_args(self):
    method test_cmd (line 78) | def test_cmd(self):
    method test_run_class (line 82) | def test_run_class(self):
  function suite (line 89) | def suite():

FILE: test/common/test_pydoop.py
  class TestPydoop (line 32) | class TestPydoop(unittest.TestCase):
    method setUp (line 34) | def setUp(self):
    method tearDown (line 41) | def tearDown(self):
    method test_home (line 50) | def test_home(self):
    method test_conf (line 59) | def test_conf(self):
    method test_pydoop_jar_path (line 67) | def test_pydoop_jar_path(self):
  function suite (line 76) | def suite():

FILE: test/common/test_test_support.py
  class TestTestSupport (line 40) | class TestTestSupport(unittest.TestCase):
    method test_inject_code (line 42) | def test_inject_code(self):
    method test_set_python_cmd (line 51) | def test_set_python_cmd(self):
  function suite (line 57) | def suite():

FILE: test/hdfs/all_tests.py
  function suite (line 32) | def suite(path=None):

FILE: test/hdfs/common_hdfs_tests.py
  class TestCommon (line 33) | class TestCommon(unittest.TestCase):
    method __init__ (line 35) | def __init__(self, target, hdfs_host='', hdfs_port=0):
    method setUp (line 40) | def setUp(self):
    method tearDown (line 44) | def tearDown(self):
    method _make_random_path (line 48) | def _make_random_path(self, where=None, add_uni=True):
    method _make_random_dir (line 55) | def _make_random_dir(self, where=None, add_uni=True):
    method _make_random_file (line 62) | def _make_random_file(self, where=None, content=None, **kwargs):
    method failUnlessRaisesExternal (line 77) | def failUnlessRaisesExternal(self, excClass, callableObj, *args, **kwa...
    method assertEqualPathInfo (line 84) | def assertEqualPathInfo(self, info1, info2, tolerance=10):
    method open_close (line 101) | def open_close(self):
    method delete (line 112) | def delete(self):
    method copy (line 126) | def copy(self):
    method move (line 141) | def move(self):
    method chmod (line 151) | def chmod(self):
    method __set_and_check_perm (line 162) | def __set_and_check_perm(self, path, new_mode, expected_mode):
    method chmod_w_string (line 167) | def chmod_w_string(self):
    method file_attrs (line 192) | def file_attrs(self):
    method flush (line 212) | def flush(self):
    method available (line 218) | def available(self):
    method get_path_info (line 224) | def get_path_info(self):
    method read (line 239) | def read(self):
    method __read_chunk (line 254) | def __read_chunk(self, chunk_factory):
    method read_chunk (line 266) | def read_chunk(self):
    method write (line 272) | def write(self):
    method append (line 290) | def append(self):
    method tell (line 306) | def tell(self):
    method pread (line 313) | def pread(self):
    method pread_chunk (line 330) | def pread_chunk(self):
    method copy_on_self (line 341) | def copy_on_self(self):
    method rename (line 349) | def rename(self):
    method change_dir (line 358) | def change_dir(self):
    method list_directory (line 367) | def list_directory(self):
    method __check_readline (line 383) | def __check_readline(self, get_lines):
    method readline (line 400) | def readline(self):
    method readline_big (line 411) | def readline_big(self):
    method readline_and_read (line 421) | def readline_and_read(self):
    method iter_lines (line 431) | def iter_lines(self):
    method seek (line 448) | def seek(self):
    method block_boundary (line 471) | def block_boundary(self):
    method walk (line 497) | def walk(self):
    method exists (line 538) | def exists(self):
    method text_io (line 547) | def text_io(self):
    method __check_path_info (line 568) | def __check_path_info(self, info, **expected_values):
  function common_tests (line 578) | def common_tests():

FILE: test/hdfs/test_common.py
  class TestMode (line 24) | class TestMode(unittest.TestCase):
    method runTest (line 26) | def runTest(self):
  function suite (line 40) | def suite():

FILE: test/hdfs/test_core.py
  class TestCore (line 26) | class TestCore(unittest.TestCase):
    method test_default (line 28) | def test_default(self):
  function suite (line 43) | def suite():

FILE: test/hdfs/test_hdfs.py
  class TestHDFS (line 33) | class TestHDFS(unittest.TestCase):
    method setUp (line 35) | def setUp(self):
    method tearDown (line 55) | def tearDown(self):
    method open (line 63) | def open(self):
    method dump (line 72) | def dump(self):
    method __ls (line 80) | def __ls(self, ls_func, path_transform):
    method lsl (line 98) | def lsl(self):
    method ls (line 101) | def ls(self):
    method mkdir (line 104) | def mkdir(self):
    method load (line 113) | def load(self):
    method __make_tree (line 119) | def __make_tree(self, wd, root="d1", create=True):
    method __cp_file (line 139) | def __cp_file(self, wd):
    method __cp_dir (line 153) | def __cp_dir(self, wd):
    method __cp_recursive (line 164) | def __cp_recursive(self, wd):
    method cp (line 187) | def cp(self):
    method put (line 193) | def put(self):
    method get (line 203) | def get(self):
    method rm (line 212) | def rm(self):
    method chmod (line 218) | def chmod(self):
    method move (line 224) | def move(self):
    method chown (line 234) | def chown(self):
    method rename (line 249) | def rename(self):
    method renames (line 261) | def renames(self):
    method capacity (line 270) | def capacity(self):
    method get_hosts (line 279) | def get_hosts(self):
    method thread_allow (line 292) | def thread_allow(self):
  function suite (line 371) | def suite():
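
TestHDFS.thread_allow (line 292) checks that blocking HDFS calls release
the GIL so other Python threads can make progress. A sketch of the kind of
concurrent access this enables (the paths are hypothetical, and the
GIL-release behaviour is the property under test, not a guarantee asserted
here):

    import threading

    import pydoop.hdfs as hdfs

    def read_all(path, results, i):
        # each worker uses its own file handle; blocking reads can
        # overlap because the extension releases the GIL around them
        with hdfs.open(path, "rb") as f:
            results[i] = f.read()

    paths = ["test/part-%d" % i for i in range(4)]
    results = [None] * len(paths)
    workers = [threading.Thread(target=read_all, args=(p, results, i))
               for i, p in enumerate(paths)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()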

FILE: test/hdfs/test_hdfs_fs.py
  function get_explicit_hp (line 34) | def get_explicit_hp():
  class TestConnection (line 41) | class TestConnection(unittest.TestCase):
    method setUp (line 43) | def setUp(self):
    method connect (line 57) | def connect(self):
    method cache (line 64) | def cache(self):
  class TestHDFS (line 78) | class TestHDFS(TestCommon):
    method __init__ (line 80) | def __init__(self, target):
    method capacity (line 83) | def capacity(self):
    method default_block_size (line 87) | def default_block_size(self):
    method used (line 91) | def used(self):
    method chown (line 95) | def chown(self):
    method utime (line 111) | def utime(self):
    method block_size (line 132) | def block_size(self):
    method replication (line 138) | def replication(self):
    method set_replication (line 143) | def set_replication(self):
    method readline_block_boundary (line 151) | def readline_block_boundary(self):
    method get_hosts (line 186) | def get_hosts(self):
  function suite (line 199) | def suite():

FILE: test/hdfs/test_local_fs.py
  class TestConnection (line 28) | class TestConnection(unittest.TestCase):
    method runTest (line 30) | def runTest(self):
  class TestLocalFS (line 42) | class TestLocalFS(TestCommon):
    method __init__ (line 44) | def __init__(self, target):
  function suite (line 48) | def suite():

FILE: test/hdfs/test_path.py
  function uni_last (line 30) | def uni_last(tup):
  class TestSplit (line 34) | class TestSplit(unittest.TestCase):
    method good (line 36) | def good(self):
    method good_with_user (line 67) | def good_with_user(self):
    method bad (line 84) | def bad(self):
    method splitext (line 100) | def splitext(self):
  class TestUnparse (line 108) | class TestUnparse(unittest.TestCase):
    method good (line 110) | def good(self):
    method bad (line 121) | def bad(self):
  class TestJoin (line 125) | class TestJoin(unittest.TestCase):
    method __check_join (line 127) | def __check_join(self, cases):
    method simple (line 131) | def simple(self):
    method slashes (line 137) | def slashes(self):
    method absolute (line 143) | def absolute(self):
    method full (line 151) | def full(self):
    method unicode_ (line 158) | def unicode_(self):
  class TestAbspath (line 164) | class TestAbspath(unittest.TestCase):
    method setUp (line 166) | def setUp(self):
    method without_user (line 175) | def without_user(self):
    method with_user (line 186) | def with_user(self):
    method forced_local (line 195) | def forced_local(self):
    method already_absolute (line 200) | def already_absolute(self):
  class TestSplitBasenameDirname (line 210) | class TestSplitBasenameDirname(unittest.TestCase):
    method runTest (line 212) | def runTest(self):
  class TestExists (line 230) | class TestExists(unittest.TestCase):
    method good (line 232) | def good(self):
  class TestKind (line 241) | class TestKind(unittest.TestCase):
    method setUp (line 243) | def setUp(self):
    method test_kind (line 247) | def test_kind(self):
    method test_isfile (line 262) | def test_isfile(self):
    method test_isdir (line 277) | def test_isdir(self):
  class TestExpand (line 293) | class TestExpand(unittest.TestCase):
    method expanduser (line 295) | def expanduser(self):
    method expanduser_no_expansion (line 307) | def expanduser_no_expansion(self):
    method expandvars (line 313) | def expandvars(self):
  class TestStat (line 324) | class TestStat(unittest.TestCase):
    method stat (line 336) | def stat(self):
    method stat_on_local (line 356) | def stat_on_local(self):
    method stat_on_dir (line 386) | def stat_on_dir(self):
    method __check_extra_args (line 402) | def __check_extra_args(self, stat_res, path_info):
    method __check_wrapper_funcs (line 408) | def __check_wrapper_funcs(self, path):
  class TestIsSomething (line 414) | class TestIsSomething(unittest.TestCase):
    method full_and_abs (line 416) | def full_and_abs(self):
    method islink (line 425) | def islink(self):
    method ismount (line 434) | def ismount(self):
  class TestNorm (line 438) | class TestNorm(unittest.TestCase):
    method normpath (line 440) | def normpath(self):
  class TestReal (line 448) | class TestReal(unittest.TestCase):
    method realpath (line 450) | def realpath(self):
  class TestSame (line 460) | class TestSame(unittest.TestCase):
    method samefile_link (line 462) | def samefile_link(self):
    method samefile_rel (line 470) | def samefile_rel(self):
    method samefile_norm (line 476) | def samefile_norm(self):
    method samefile_user (line 480) | def samefile_user(self):
  class TestAccess (line 485) | class TestAccess(unittest.TestCase):
    method setUp (line 487) | def setUp(self):
    method tearDown (line 491) | def tearDown(self):
    method __test (line 495) | def __test(self, offset, user=None):
    method test_owner (line 501) | def test_owner(self):
    method test_other (line 504) | def test_other(self):
  class TestUtime (line 508) | class TestUtime(unittest.TestCase):
    method runTest (line 510) | def runTest(self):
  class TestCallFromHdfs (line 523) | class TestCallFromHdfs(unittest.TestCase):
    method setUp (line 525) | def setUp(self):
    method tearDown (line 529) | def tearDown(self):
    method test_stat (line 532) | def test_stat(self):
    method test_access (line 539) | def test_access(self):
    method test_utime (line 544) | def test_utime(self):
  function suite (line 551) | def suite():

FILE: test/hdfs/try_hdfs.py
  function dump_status (line 36) | def dump_status(fs):
  function main (line 43) | def main(argv=sys.argv[1:]):

FILE: test/mapreduce/all_tests.py
  function suite (line 29) | def suite(path=None):

FILE: test/mapreduce/it/crs4/pydoop/mapreduce/pipes/OpaqueRoundtrip.java
  class OpaqueRoundtrip (line 45) | public class OpaqueRoundtrip {
    method main (line 47) | public static void main(String[] args)

FILE: test/mapreduce/test_connections.py
  class Mapper (line 33) | class Mapper(api.Mapper):
    method map (line 35) | def map(self, context):
  class Reducer (line 39) | class Reducer(api.Reducer):
    method reduce (line 41) | def reduce(self, context):
  class UplinkDumpReader (line 46) | class UplinkDumpReader(object):
    method __init__ (line 48) | def __init__(self, stream):
    method close (line 51) | def close(self):
    method __next__ (line 54) | def __next__(self):
    method __iter__ (line 75) | def __iter__(self):
    method next (line 79) | def next(self):
  class TestFileConnection (line 83) | class TestFileConnection(WDTestCase):
    method test_map (line 85) | def test_map(self):
    method test_reduce (line 89) | def test_reduce(self):
    method __run_test (line 93) | def __run_test(self, name, factory, **kwargs):
  function suite (line 107) | def suite():

FILE: test/mapreduce/test_opaque.py
  class TestOpaqueSplit (line 37) | class TestOpaqueSplit(unittest.TestCase):
    method setUp (line 39) | def setUp(self):
    method tearDown (line 43) | def tearDown(self):
    method _make_random_path (line 47) | def _make_random_path(self, where=None):
    method _generate_opaque_splits (line 50) | def _generate_opaque_splits(self, n):
    method _test_opaque (line 53) | def _test_opaque(self, o, no):
    method _test_opaques (line 56) | def _test_opaques(self, opaques, nopaques):
    method _run_java (line 61) | def _run_java(self, in_uri, out_uri, wd):
    method _do_java_roundtrip (line 72) | def _do_java_roundtrip(self, splits, wd='/tmp'):
    method test_opaque (line 82) | def test_opaque(self):
    method test_write_read_opaque_splits (line 94) | def test_write_read_opaque_splits(self):
    method test_opaque_java_round_trip (line 105) | def test_opaque_java_round_trip(self):
  function suite (line 115) | def suite():

FILE: test/sercore/all_tests.py
  function suite (line 29) | def suite(path=None):

FILE: test/sercore/test_deser.py
  class TestFileSplit (line 28) | class TestFileSplit(unittest.TestCase):
    method setUp (line 30) | def setUp(self):
    method test_standard (line 43) | def test_standard(self):
    method test_errors (line 50) | def test_errors(self):
  function suite (line 60) | def suite():

FILE: test/sercore/test_streams.py
  class TestFileInStream (line 38) | class TestFileInStream(unittest.TestCase):
    method setUp (line 40) | def setUp(self):
    method test_from_path (line 44) | def test_from_path(self):
    method test_from_file (line 48) | def test_from_file(self):
    method test_errors (line 53) | def test_errors(self):
    method __check_stream (line 61) | def __check_stream(self, s):
  class TestFileOutStream (line 67) | class TestFileOutStream(unittest.TestCase):
    method setUp (line 69) | def setUp(self):
    method tearDown (line 74) | def tearDown(self):
    method test_from_path (line 77) | def test_from_path(self):
    method test_from_file (line 82) | def test_from_file(self):
    method test_errors (line 88) | def test_errors(self):
    method __fill_stream (line 92) | def __fill_stream(self, s):
    method __check_stream (line 98) | def __check_stream(self):
  class TestSerDe (line 103) | class TestSerDe(unittest.TestCase):
    method setUp (line 112) | def setUp(self):
    method tearDown (line 116) | def tearDown(self):
    method test_vint (line 119) | def test_vint(self):
    method test_vlong (line 125) | def test_vlong(self):
    method test_float (line 131) | def test_float(self):
    method test_string_as_string (line 137) | def test_string_as_string(self):
    method test_string_as_bytes (line 143) | def test_string_as_bytes(self):
    method test_bytes_as_string (line 149) | def test_bytes_as_string(self):
    method test_bytes_as_bytes (line 155) | def test_bytes_as_bytes(self):
    method test_output (line 161) | def test_output(self):
    method test_multi_no_tuple (line 178) | def test_multi_no_tuple(self):
    method test_multi_read_tuple (line 182) | def test_multi_read_tuple(self):
    method test_multi_write_tuple (line 186) | def test_multi_write_tuple(self):
    method test_multi_rw_tuple (line 190) | def test_multi_rw_tuple(self):
    method __fill_stream_multi (line 194) | def __fill_stream_multi(self):
    method __fill_stream_tuple (line 202) | def __fill_stream_tuple(self):
    method __check_stream_multi (line 206) | def __check_stream_multi(self):
    method __check_stream_tuple (line 214) | def __check_stream_tuple(self):
    method test_errors (line 224) | def test_errors(self):
    method test_string_keep_zeros (line 246) | def test_string_keep_zeros(self):
    method test_string_allow_bytes (line 254) | def test_string_allow_bytes(self):
  class TestCheckClosed (line 261) | class TestCheckClosed(unittest.TestCase):
    method test_instream (line 263) | def test_instream(self):
    method test_outstream (line 277) | def test_outstream(self):
    method test_double_close (line 295) | def test_double_close(self):
    method __check (line 313) | def __check(self, ops):
  class TestHadoopTypes (line 318) | class TestHadoopTypes(unittest.TestCase):
    method setUp (line 320) | def setUp(self):
    method test_long_writable (line 324) | def test_long_writable(self):
  function suite (line 352) | def suite():
Condensed preview — 370 files, each showing path, character count, and a content snippet (full structured content: 1,816K chars).
[
  {
    "path": ".dir-locals.el",
    "chars": 153,
    "preview": ";;; Directory Local Variables\n;;; See Info node `(emacs) Directory Variables' for more information.\n\n((python-mode\n  (fl"
  },
  {
    "path": ".dockerignore",
    "chars": 22,
    "preview": ".*\nDockerfile*\ndocker\n"
  },
  {
    "path": ".gitignore",
    "chars": 279,
    "preview": "*.pyc\n*~\nbuild\ndocs/_static/favicon.ico\ndocs/_static/logo.png\npydoop/config.py\npydoop/version.py\nsrc/hadoop*/libhdfs/con"
  },
  {
    "path": ".travis/check_script_template.py",
    "chars": 1127,
    "preview": "\"\"\"\\\nPerform full substitution on the Pydoop script template and check\nit with flake8.\n\nAny options (i.e., arguments sta"
  },
  {
    "path": ".travis/cmd/hadoop_localfs.sh",
    "chars": 817,
    "preview": "#!/bin/bash\n\nset -euo pipefail\n[ -n \"${DEBUG:-}\" ] && set -x\n\nfunction onshutdown {\n    mr-jobhistory-daemon.sh stop his"
  },
  {
    "path": ".travis/run_checks",
    "chars": 422,
    "preview": "#!/bin/bash\n\nset -euo pipefail\n[ -n \"${DEBUG:-}\" ] && set -x\n\ndocker exec pydoop bash -c 'cd test && ${PYTHON} all_tests"
  },
  {
    "path": ".travis/start_container",
    "chars": 660,
    "preview": "#!/bin/bash\n\nset -euo pipefail\n[ -n \"${DEBUG:-}\" ] && set -x\nthis=\"${BASH_SOURCE-$0}\"\nthis_dir=$(cd -P -- \"$(dirname -- "
  },
  {
    "path": ".travis.yml",
    "chars": 825,
    "preview": "language: python\n\ncache: pip\n\nmatrix:\n  include:\n  - python: \"2.7\"\n    env: HADOOP_VERSION=3.2.0\n  - python: \"3.6\"\n    e"
  },
  {
    "path": "AUTHORS",
    "chars": 370,
    "preview": "Pydoop is developed and maintained by:\n * Simone Leo <simone.leo@crs4.it>\n * Gianluigi Zanetti <gianluigi.zanetti@crs4.i"
  },
  {
    "path": "Dockerfile",
    "chars": 334,
    "preview": "ARG hadoop_version=3.2.0\nARG python_version=3.6\n\nFROM crs4/pydoop-base:${hadoop_version}-${python_version}\n\nCOPY . /buil"
  },
  {
    "path": "Dockerfile.client",
    "chars": 350,
    "preview": "ARG hadoop_version=3.2.0\nARG python_version=3.6\n\nFROM crs4/pydoop-client-base:${hadoop_version}-${python_version}\n\nCOPY "
  },
  {
    "path": "Dockerfile.docs",
    "chars": 828,
    "preview": "FROM crs4/pydoop-docs-base\n\nCOPY . /build/pydoop\nWORKDIR /build/pydoop\n\nRUN ${PYTHON} -m pip install --no-cache-dir --up"
  },
  {
    "path": "LICENSE",
    "chars": 11358,
    "preview": "\n                                 Apache License\n                           Version 2.0, January 2004\n                  "
  },
  {
    "path": "MANIFEST.in",
    "chars": 205,
    "preview": "include AUTHORS LICENSE VERSION README.md pydoop.properties requirements.txt\n\nrecursive-include src *\nrecursive-include "
  },
  {
    "path": "README.md",
    "chars": 301,
    "preview": "[![Build Status](https://travis-ci.org/crs4/pydoop.png)](https://travis-ci.org/crs4/pydoop)\n\nPydoop is a Python MapReduc"
  },
  {
    "path": "VERSION",
    "chars": 6,
    "preview": "2.0.0\n"
  },
  {
    "path": "dev_tools/build_deprecation_tables",
    "chars": 1180,
    "preview": "#!/usr/bin/env python\n\n\"\"\"\nAn utility to generate mrv1 to mrv2 conversion tables.\n\nUsage::\n\n  bash$ build_deprecation_ta"
  },
  {
    "path": "dev_tools/bump_copyright_year",
    "chars": 1387,
    "preview": "#!/usr/bin/env python\n\n\"\"\"\\\nSet copyright end year across the distribution.\n\"\"\"\n\nimport sys\nimport os\nimport re\nimport a"
  },
  {
    "path": "dev_tools/docker/client_side_tests/apache_2.6.0/initialize.sh",
    "chars": 1067,
    "preview": "#!/bin/bash\n\nport=$1\nclient_id=$2\nrm_container_id=$3\nDOCKER_HOST_IP=${4:-localhost}\n#----------------------------------\n"
  },
  {
    "path": "dev_tools/docker/client_side_tests/apache_2.6.0/local_client_setup.sh",
    "chars": 1724,
    "preview": "#!/bin/bash\n\n#-----------\n# This script should be run in the client container.\n\n\npushd /opt\n\n#----- Hadoop setup\nhdp_ver"
  },
  {
    "path": "dev_tools/docker/client_side_tests/hdp_2.2.0.0/initialize.sh",
    "chars": 756,
    "preview": "#!/bin/bash\n\nport=$1\nclient_id=$2\nrm_container_id=$3\nDOCKER_HOST_IP=${4:-localhost}\n#----------------------------------\n"
  },
  {
    "path": "dev_tools/docker/client_side_tests/hdp_2.2.0.0/local_client_setup.sh",
    "chars": 1780,
    "preview": "#!/bin/bash\n\n# This script should be run in the client container, see initialize.sh\n\n#-----------\nfunction log() {\n    e"
  },
  {
    "path": "dev_tools/docker/cluster.rst",
    "chars": 7680,
    "preview": "Testing pydoop using a Docker Cluster\n=====================================\n\nThe purpose of the pydoop docker cluster is"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/docker-compose.yml",
    "chars": 1508,
    "preview": "zookeeper:\n  image: crs4_pydoop/apache_2.6.0_zookeeper:latest\n  name: zookeeper\n  hostname: zookeeper\n  ports:\n    - \"21"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/base/Dockerfile",
    "chars": 5406,
    "preview": "#----------------------------------------------------\nFROM crs4_pydoop/base:latest\n\n# ----------------------------------"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/base/scripts/generate_conf_files.py",
    "chars": 4292,
    "preview": "import sys\nimport os\nimport xml.etree.cElementTree as ET\n\n\ndef add_property(conf, name, value):\n    prop = ET.SubElement"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/base/scripts/zk_set.py",
    "chars": 312,
    "preview": "import sys\nimport os\nfrom kazoo.client import KazooClient\n\nimport logging\nlogging.basicConfig()\n\nlogger = logging.getLog"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/base/scripts/zk_wait.py",
    "chars": 478,
    "preview": "import sys\nimport os\nimport time\nfrom kazoo.client import KazooClient\n\nimport logging\nlogging.basicConfig()\n\nlogger = lo"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/bootstrap/Dockerfile",
    "chars": 220,
    "preview": "#----------------------------------------------------\nFROM crs4_pydoop/apache_2.6.0_base:latest\n\n\nCOPY scripts/bootstrap"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/bootstrap/scripts/bootstrap.py",
    "chars": 1316,
    "preview": "from kazoo.client import KazooClient\nimport os\nimport time\nimport logging\nimport platform\n\nlogging.basicConfig()\n\nlogger"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/bootstrap/scripts/create_hdfs_dirs.sh",
    "chars": 2206,
    "preview": "#!/bin/bash\n\nexport HADOOP_LOG_DIR=${HDFS_LOG_DIR}\nexport HADOOP_PID_DIR=${HDFS_PID_DIR}\n\nHADOOP_BIN=${HADOOP_HOME}/bin\n"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/datanode/Dockerfile",
    "chars": 197,
    "preview": "#----------------------------------------------------\nFROM crs4_pydoop/apache_2.6.0_base:latest\n\n#\nEXPOSE  50020\n\nCOPY s"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/datanode/scripts/start_datanode.sh",
    "chars": 412,
    "preview": "#!/bin/bash\n\n#--- manage_deamon stardard\nexport HADOOP_LOG_DIR=${HDFS_LOG_DIR}\nexport HADOOP_PID_DIR=${HDFS_PID_DIR}\n\npy"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/historyserver/Dockerfile",
    "chars": 212,
    "preview": "#----------------------------------------------------\nFROM crs4_pydoop/apache_2.6.0_base:latest\n\n#\nEXPOSE 10020 19888\n\nC"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/historyserver/scripts/start_historyserver.sh",
    "chars": 312,
    "preview": "#!/bin/bash\n\npython /tmp/zk_wait.py historyserver\n\n# we should actually check that the nodemanager is up ...\npython /tmp"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/namenode/Dockerfile",
    "chars": 239,
    "preview": "#----------------------------------------------------\nFROM crs4_pydoop/apache_2.6.0_base:latest\n\n# HDFS WebUI and HDFS d"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/namenode/scripts/start_namenode.sh",
    "chars": 550,
    "preview": "#!/bin/bash\n\n#--- manage_deamon stardard\nexport HADOOP_LOG_DIR=${HDFS_LOG_DIR}\nexport HADOOP_PID_DIR=${HDFS_PID_DIR}\n\npy"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/nodemanager/Dockerfile",
    "chars": 201,
    "preview": "#----------------------------------------------------\nFROM crs4_pydoop/apache_2.6.0_base:latest\n\n#\nEXPOSE 8042\n\nCOPY scr"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/nodemanager/scripts/start_nodemanager.sh",
    "chars": 937,
    "preview": "#!/bin/bash\n\nexport YARN_LOG_DIR=${YARN_LOG_DIR}\nexport HADOOP_PID_DIR=${HDFS_PID_DIR}\n\npython /tmp/zk_wait.py nodemanag"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/resourcemanager/Dockerfile",
    "chars": 229,
    "preview": "#----------------------------------------------------\nFROM crs4_pydoop/apache_2.6.0_base:latest\n\n#\nEXPOSE 8088 8021 8031"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/resourcemanager/scripts/start_resourcemanager.sh",
    "chars": 1150,
    "preview": "#!/bin/bash\n\nexport YARN_LOG_DIR=${YARN_LOG_DIR}\nexport HADOOP_PID_DIR=${HDFS_PID_DIR}\nexport YARN_OPTS=''\n\nexport HADOO"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/zookeeper/Dockerfile",
    "chars": 174,
    "preview": "#----------------------------------------------------\nFROM crs4_pydoop/apache_2.6.0_base:latest\n\nEXPOSE 2181\n\nCMD [\"/opt"
  },
  {
    "path": "dev_tools/docker/clusters/apache_2.6.0/images/zookeeper/scripts/start_namenode.sh",
    "chars": 416,
    "preview": "#!/bin/bash\n\n#--- manage_deamon stardard\nexport HADOOP_LOG_DIR=${HDFS_LOG_DIR}\nexport HADOOP_PID_DIR=${HDFS_PID_DIR}\n\npy"
  },
  {
    "path": "dev_tools/docker/images/base/Dockerfile",
    "chars": 1618,
    "preview": "#----------------------------------------------------\n#\n# A basic java machine with java, basic services and iv6 disable"
  },
  {
    "path": "dev_tools/docker/images/client/Dockerfile",
    "chars": 760,
    "preview": "#----------------------------------------------------\nFROM crs4_pydoop/base:latest\n\n#----------------------------------\n"
  },
  {
    "path": "dev_tools/docker/scripts/build_base_images.sh",
    "chars": 370,
    "preview": "#!/bin/bash\n\ncurrent_path=$(cd $(dirname ${BASH_SOURCE}); pwd; cd - >/dev/null)\nimages_path=\"${current_path}/../images\"\n"
  },
  {
    "path": "dev_tools/docker/scripts/build_cluster_images.sh",
    "chars": 786,
    "preview": "#!/bin/bash\n\nTAG=${1}\n\nCL_DIR=${TAG}/images\n\nfor d in ${CL_DIR}/*\ndo\n    if [ -d ${d} -a -e ${d}/Dockerfile ]; then\n    "
  },
  {
    "path": "dev_tools/docker/scripts/share_etc_hosts.py",
    "chars": 2497,
    "preview": "import os\nimport sys\nimport ssl\nimport logging\nfrom docker import tls\nfrom docker import Client\n\n\nlogging.basicConfig()\n"
  },
  {
    "path": "dev_tools/docker/scripts/start_client.sh",
    "chars": 1187,
    "preview": "#!/bin/bash\n\n#-------------------------------------------\n#\n# Insert a new client in a running cluster\n#\n# Usage:\n#     "
  },
  {
    "path": "dev_tools/docker/scripts/start_cluster.sh",
    "chars": 549,
    "preview": "#!/bin/bash\n\ncluster_name=$1\nscript_dir=$(cd $(dirname ${BASH_SOURCE}); pwd; cd - >/dev/null)\nshare_hosts_bin=\"python ${"
  },
  {
    "path": "dev_tools/docker_build",
    "chars": 307,
    "preview": "#!/usr/bin/env bash\n\nset -euo pipefail\nthis=\"${BASH_SOURCE-$0}\"\nthis_dir=$(cd -P -- \"$(dirname -- \"${this}\")\" && pwd -P)"
  },
  {
    "path": "dev_tools/dump_app_params",
    "chars": 3415,
    "preview": "#!/usr/bin/env python\n\n\"\"\"\nDump app options in rst table format.\n\"\"\"\n\nimport sys\nimport argparse\n\nimport pydoop.app.main"
  },
  {
    "path": "dev_tools/edit_conf",
    "chars": 1425,
    "preview": "#!/usr/bin/env python\n\n\"\"\"\\\nA utility to edit hadoop configuration files.\n\nUsage::\n\n  $ edit_conf conf/yarn-site.xml tmp"
  },
  {
    "path": "dev_tools/git_export",
    "chars": 1654,
    "preview": "#!/usr/bin/env python\n\n\"\"\"\nExport git working copy including uncommitted changes\n\"\"\"\n\nimport sys\nimport os\nimport argpar"
  },
  {
    "path": "dev_tools/import_src",
    "chars": 3342,
    "preview": "#!/usr/bin/env python\n\n\"\"\"\nImport Hadoop pipes/utils source code.\n\nNOTE: starting from cdh4.3, there is a single Hadoop "
  },
  {
    "path": "dev_tools/mapred_pipes",
    "chars": 954,
    "preview": "#!/usr/bin/env bash\n\n# Set up the layout needed to build the \"mapred\" version of pipes\n\nset -euo pipefail\nthis=\"${BASH_S"
  },
  {
    "path": "dev_tools/unpack_debian",
    "chars": 1574,
    "preview": "#!/usr/bin/env python\n\n\"\"\"\nUnpack debian packages -- a quick shortcut for debug purposes.\n\"\"\"\n\nimport sys, os, argparse,"
  },
  {
    "path": "dev_tools/update_docs",
    "chars": 412,
    "preview": "#!/bin/bash\n\nset -eu\n\ndie() {\n    echo \"$1\" 1>&2\n    exit 1\n}\n\nDOCS_PREFIX=\"docs/_build/html\"\nREPO=\"https://github.com/c"
  },
  {
    "path": "docs/Makefile",
    "chars": 3126,
    "preview": "# Makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line.\nSPHINXOPTS    =\nSPHINXBUILD "
  },
  {
    "path": "docs/_build/.gitignore",
    "chars": 20,
    "preview": "*\n!.gitignore\n!html\n"
  },
  {
    "path": "docs/_templates/layout.html",
    "chars": 3293,
    "preview": "{% extends \"!layout.html\" %}\n\n\n{%- macro mysidebar() %}\n      {%- if not embedded %}{% if not theme_nosidebar|tobool %}\n"
  },
  {
    "path": "docs/api_docs/hadut.rst",
    "chars": 153,
    "preview": ".. _hadut:\n\n:mod:`pydoop.hadut` --- Hadoop shell interaction\n================================================\n\n.. automo"
  },
  {
    "path": "docs/api_docs/hdfs_api.rst",
    "chars": 311,
    "preview": ".. _hdfs-api:\n\n:mod:`pydoop.hdfs` --- HDFS API\n===============================\n\n.. automodule:: pydoop.hdfs\n   :members:"
  },
  {
    "path": "docs/api_docs/index.rst",
    "chars": 79,
    "preview": ".. _api-docs:\n\nAPI Docs\n========\n\n.. toctree::\n\n   mr_api\n   hdfs_api\n   hadut\n"
  },
  {
    "path": "docs/api_docs/mr_api.rst",
    "chars": 207,
    "preview": ".. _mr_api:\n\n:mod:`pydoop.mapreduce.api` --- MapReduce API\n=============================================\n\n.. automodule:"
  },
  {
    "path": "docs/conf.py",
    "chars": 6822,
    "preview": "# -*- coding: utf-8 -*-\n#\n# Pydoop documentation build configuration file, created by\n# sphinx-quickstart on Sun Jun 20 "
  },
  {
    "path": "docs/examples/avro.rst",
    "chars": 4326,
    "preview": ".. _avro_io:\n\nAvro I/O\n========\n\nPydoop transparently supports reading and writing `Avro\n<http://avro.apache.org>`_ reco"
  },
  {
    "path": "docs/examples/index.rst",
    "chars": 114,
    "preview": ".. _examples:\n\nExamples\n========\n\n.. toctree::\n   :maxdepth: 2\n\n   intro\n   sequence_file\n   input_format\n   avro\n"
  },
  {
    "path": "docs/examples/input_format.rst",
    "chars": 889,
    "preview": ".. _input_format_example:\n\nWriting a Custom InputFormat\n============================\n\nYou can use a custom Java ``InputF"
  },
  {
    "path": "docs/examples/intro.rst",
    "chars": 990,
    "preview": "Introduction\n============\n\nPydoop includes several usage examples: you can find them in the\n\"examples\" subdirectory of t"
  },
  {
    "path": "docs/examples/sequence_file.rst",
    "chars": 2393,
    "preview": "Using the Hadoop SequenceFile Format\n====================================\n\nAlthough many MapReduce applications deal wit"
  },
  {
    "path": "docs/how_to_cite.rst",
    "chars": 1098,
    "preview": "How to Cite\n===========\n\nPydoop is developed and maintained by researchers at `CRS4\n<http://www.crs4.it>`_ -- Distribute"
  },
  {
    "path": "docs/index.rst",
    "chars": 1576,
    "preview": ".. Pydoop documentation master file, created by\n   sphinx-quickstart on Sun Jun 20 17:06:55 2010.\n   You can adapt this "
  },
  {
    "path": "docs/installation.rst",
    "chars": 8598,
    "preview": ".. _installation:\n\nInstallation\n============\n\nPrerequisites\n-------------\n\nWe regularly test Pydoop on Ubuntu only, but "
  },
  {
    "path": "docs/news/archive.rst",
    "chars": 6372,
    "preview": "News Archive\n------------\n\n\nNew in 1.2.0\n^^^^^^^^^^^^\n\n * Added support for Hadoop 2.7.2.\n * Dropped support for Python "
  },
  {
    "path": "docs/news/index.rst",
    "chars": 73,
    "preview": ".. _news:\n\nNews\n====\n\n.. toctree::\n   :maxdepth: 1\n\n   latest\n   archive\n"
  },
  {
    "path": "docs/news/latest.rst",
    "chars": 2118,
    "preview": "New in 2.0.0\n------------\n\nPydoop 2.0.0 adds Python 3 and Hadoop 3 support, and features a complete\noverhaul of the ``ma"
  },
  {
    "path": "docs/pydoop_script.rst",
    "chars": 5492,
    "preview": ".. _pydoop_script_guide:\n\nPydoop Script User Guide\n========================\n\nPydoop Script is the easiest way to write s"
  },
  {
    "path": "docs/pydoop_script_options.rst",
    "chars": 9465,
    "preview": "..\n  Auto-generated by dev_tools/dump_app_params. DO NOT EDIT!\n  To update, run:\n    dev_tools/dump_app_params --app scr"
  },
  {
    "path": "docs/pydoop_submit_options.rst",
    "chars": 13200,
    "preview": "..\n  Auto-generated by dev_tools/dump_app_params. DO NOT EDIT!\n  To update, run:\n    dev_tools/dump_app_params --app sub"
  },
  {
    "path": "docs/running_pydoop_applications.rst",
    "chars": 1858,
    "preview": ".. _running_apps:\n\nPydoop Submit User Guide\n========================\n\nPydoop applications are run via the ``pydoop submi"
  },
  {
    "path": "docs/self_contained.rst",
    "chars": 2619,
    "preview": ".. _self_contained:\n\nInstallation-free Usage\n=======================\n\nThis example shows how to use the Hadoop Distribut"
  },
  {
    "path": "docs/tutorial/hdfs_api.rst",
    "chars": 1191,
    "preview": ".. _hdfs_api_tutorial:\n\nThe HDFS API\n============\n\nThe :ref:`HDFS API <hdfs-api>` allows you to connect to an HDFS\ninsta"
  },
  {
    "path": "docs/tutorial/index.rst",
    "chars": 107,
    "preview": ".. _tutorial:\n\nTutorial\n========\n\n.. toctree::\n   :maxdepth: 2\n\n   pydoop_script\n   hdfs_api\n   mapred_api\n"
  },
  {
    "path": "docs/tutorial/mapred_api.rst",
    "chars": 11307,
    "preview": ".. _api_tutorial:\n\nWriting Full-Featured Applications\n==================================\n\nWhile :ref:`Pydoop Script <pyd"
  },
  {
    "path": "docs/tutorial/pydoop_script.rst",
    "chars": 4698,
    "preview": ".. _pydoop_script_tutorial:\n\nEasy Hadoop Scripting with Pydoop Script\n========================================\n\nPydoop S"
  },
  {
    "path": "examples/README",
    "chars": 186,
    "preview": "This directory contains several Pydoop usage examples. Documentation\nis in the \"examples\" subsection of the Pydoop html "
  },
  {
    "path": "examples/avro/build.sh",
    "chars": 427,
    "preview": "#!/usr/bin/env bash\n\nset -euo pipefail\n[ -n \"${DEBUG:-}\" ] && set -x\nthis=\"${BASH_SOURCE-$0}\"\nthis_dir=$(cd -P -- \"$(dir"
  },
  {
    "path": "examples/avro/config.sh",
    "chars": 482,
    "preview": "[ -n \"${PYDOOP_AVRO_EXAMPLES:-}\" ] && return || readonly PYDOOP_AVRO_EXAMPLES=1\n\nTARGET=\"target\"\nexport CLASS_DIR=\"${TAR"
  },
  {
    "path": "examples/avro/pom.xml",
    "chars": 1916,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\n<!--\n  BEGIN_COPYRIGHT\n\n  Copyright 2009-2026 CRS4.\n\n  Licensed under the Apache"
  },
  {
    "path": "examples/avro/py/avro_base.py",
    "chars": 2796,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/avro_container_dump_results.py",
    "chars": 1109,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/avro/py/avro_key_in.py",
    "chars": 774,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/avro_key_in_out.py",
    "chars": 812,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/avro_key_value_in.py",
    "chars": 784,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/avro_key_value_in_out.py",
    "chars": 832,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/avro_parquet_dump_results.py",
    "chars": 899,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/avro_pyrw.py",
    "chars": 1587,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/avro_value_in.py",
    "chars": 778,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/avro_value_in_out.py",
    "chars": 820,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/check_cc.py",
    "chars": 1937,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/avro/py/check_results.py",
    "chars": 1677,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/avro/py/color_count.py",
    "chars": 1191,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/create_input.py",
    "chars": 1183,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/avro/py/gen_data.py",
    "chars": 963,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/generate_avro_users.py",
    "chars": 1721,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/avro/py/kmer_count.py",
    "chars": 1166,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/avro/py/show_kmer_count.py",
    "chars": 1053,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/avro/py/write_avro.py",
    "chars": 1525,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/avro/run",
    "chars": 661,
    "preview": "#!/usr/bin/env bash\n\nset -euo pipefail\n[ -n \"${DEBUG:-}\" ] && set -x\nthis=\"${BASH_SOURCE-$0}\"\nthis_dir=$(cd -P -- \"$(dir"
  },
  {
    "path": "examples/avro/run_avro_container_in",
    "chars": 2467,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/avro/run_avro_container_in_out",
    "chars": 3362,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/avro/run_avro_parquet_in",
    "chars": 2426,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/avro/run_avro_parquet_in_out",
    "chars": 3365,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/avro/run_avro_pyrw",
    "chars": 2223,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/avro/run_color_count",
    "chars": 2081,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/avro/run_kmer_count",
    "chars": 1962,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/avro/schemas/alignment_record.avsc",
    "chars": 13136,
    "preview": "{\n    \"type\": \"record\",\n    \"name\": \"AlignmentRecord\",\n    \"fields\": [\n        {\n            \"default\": null,\n          "
  },
  {
    "path": "examples/avro/schemas/alignment_record_proj.avsc",
    "chars": 650,
    "preview": "{\n    \"type\": \"record\",\n    \"name\": \"AlignmentRecord\",\n    \"fields\": [\n        {\n            \"default\": null,\n          "
  },
  {
    "path": "examples/avro/schemas/pet.avsc",
    "chars": 184,
    "preview": "{\n    \"namespace\": \"example.avro\",\n    \"type\": \"record\",\n    \"name\": \"Pet\",\n    \"fields\": [\n        {\"name\": \"name\", \"ty"
  },
  {
    "path": "examples/avro/schemas/stats.avsc",
    "chars": 196,
    "preview": "{\n \"namespace\": \"example.avro\",\n \"type\": \"record\",\n \"name\": \"Stats\",\n \"fields\": [\n     {\"name\": \"office\", \"type\": \"strin"
  },
  {
    "path": "examples/avro/schemas/user.avsc",
    "chars": 289,
    "preview": "{\n \"namespace\": \"example.avro\",\n \"type\": \"record\",\n \"name\": \"User\",\n \"fields\": [\n     {\"name\": \"office\", \"type\": \"string"
  },
  {
    "path": "examples/avro/src/main/java/it/crs4/pydoop/WriteKV.java",
    "chars": 3918,
    "preview": "/** BEGIN_COPYRIGHT\n *\n * Copyright 2009-2026 CRS4.\n *\n * Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "examples/avro/src/main/java/it/crs4/pydoop/WriteParquet.java",
    "chars": 4310,
    "preview": "/* BEGIN_COPYRIGHT\n *\n * Copyright 2009-2026 CRS4.\n *\n * Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "examples/avro/write_avro_kv",
    "chars": 396,
    "preview": "#!/bin/bash\n\n# args: KEY_SCHEMA_FILE, VALUE_SCHEMA_FILE, CSV_IN_FILE AVRO_OUT_FILE\n\nset -euo pipefail\n[ -n \"${DEBUG:-}\" "
  },
  {
    "path": "examples/c++/HadoopPipes.cc",
    "chars": 35640,
    "preview": "/**\n * Licensed to the Apache Software Foundation (ASF) under one\n * or more contributor license agreements.  See the NO"
  },
  {
    "path": "examples/c++/Makefile",
    "chars": 228,
    "preview": "# yum install openssl-devel\n\nCXXFLAGS := -pthread -g -pipe -Iinclude\nLDFLAGS := -pthread\nLDLIBS := -lcrypto\n\nall: wordco"
  },
  {
    "path": "examples/c++/README.txt",
    "chars": 535,
    "preview": "C++ word count implementation, mostly for comparison purposes. Not run\ntogether with other examples and/or tests by defa"
  },
  {
    "path": "examples/c++/SerialUtils.cc",
    "chars": 6700,
    "preview": "/**\n * Licensed to the Apache Software Foundation (ASF) under one\n * or more contributor license agreements.  See the NO"
  },
  {
    "path": "examples/c++/StringUtils.cc",
    "chars": 5178,
    "preview": "/**\n * Licensed to the Apache Software Foundation (ASF) under one\n * or more contributor license agreements.  See the NO"
  },
  {
    "path": "examples/c++/include/hadoop/Pipes.hh",
    "chars": 6330,
    "preview": "/**\n * Licensed to the Apache Software Foundation (ASF) under one\n * or more contributor license agreements.  See the NO"
  },
  {
    "path": "examples/c++/include/hadoop/SerialUtils.hh",
    "chars": 4567,
    "preview": "/**\n * Licensed to the Apache Software Foundation (ASF) under one\n * or more contributor license agreements.  See the NO"
  },
  {
    "path": "examples/c++/include/hadoop/StringUtils.hh",
    "chars": 2441,
    "preview": "/**\n * Licensed to the Apache Software Foundation (ASF) under one\n * or more contributor license agreements.  See the NO"
  },
  {
    "path": "examples/c++/include/hadoop/TemplateFactory.hh",
    "chars": 3319,
    "preview": "/**\n * Licensed to the Apache Software Foundation (ASF) under one\n * or more contributor license agreements.  See the NO"
  },
  {
    "path": "examples/c++/wordcount.cc",
    "chars": 2034,
    "preview": "// BEGIN_COPYRIGHT\n//\n// Copyright 2009-2026 CRS4.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "examples/config.sh",
    "chars": 585,
    "preview": "[ -n \"${PYDOOP_EXAMPLES:-}\" ] && return || readonly PYDOOP_EXAMPLES=1\n\ndie() {\n    echo $1 1>&2\n    exit 1\n}\n\nexport USE"
  },
  {
    "path": "examples/hdfs/common.py",
    "chars": 832,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/hdfs/repl_session.py",
    "chars": 1337,
    "preview": "\"\"\"\\\n# DOCS_INCLUDE_START\n>>> import pydoop.hdfs as hdfs\n>>> hdfs.mkdir('test')\n>>> hdfs.dump('hello, world', 'test/hell"
  },
  {
    "path": "examples/hdfs/run",
    "chars": 1420,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n# \n# Copyright 2009-2026 CRS4.\n# \n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/hdfs/treegen.py",
    "chars": 2029,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/hdfs/treewalk.py",
    "chars": 1283,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/input/alice_1.txt",
    "chars": 83766,
    "preview": "Project Gutenberg's Alice's Adventures in Wonderland, by Lewis Carroll\r\n\r\nThis eBook is for the use of anyone anywhere a"
  },
  {
    "path": "examples/input/alice_2.txt",
    "chars": 83763,
    "preview": "go on. 'And so these three little sisters--they were learning to draw,\r\nyou know--'\r\n\r\n'What did they draw?' said Alice,"
  },
  {
    "path": "examples/input_format/check_results.py",
    "chars": 1436,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/input_format/it/crs4/pydoop/mapred/TextInputFormat.java",
    "chars": 1546,
    "preview": "// BEGIN_COPYRIGHT\n// \n// Copyright 2009-2026 CRS4.\n// \n// Licensed under the Apache License, Version 2.0 (the \"License\""
  },
  {
    "path": "examples/input_format/it/crs4/pydoop/mapreduce/TextInputFormat.java",
    "chars": 1565,
    "preview": "// BEGIN_COPYRIGHT\n// \n// Copyright 2009-2026 CRS4.\n// \n// Licensed under the Apache License, Version 2.0 (the \"License\""
  },
  {
    "path": "examples/input_format/run",
    "chars": 1862,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/pydoop_script/check.py",
    "chars": 4279,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/pydoop_script/data/base_histogram_input/example_1.sam",
    "chars": 154244,
    "preview": "foo_0/1\t81\tchr6\t3558357\t37\t91M\t*\t0\t0\tAGCTTCTTTGACTCTCGAATTTTAGCACTAGAAGAAATAGTGAGGATTATATATTTCAGAAGTTCTCACCCAGGATATCAGAA"
  },
  {
    "path": "examples/pydoop_script/data/base_histogram_input/example_2.sam",
    "chars": 154459,
    "preview": "foo_499/2\t163\tchr17\t38910967\t60\t91M\t=\t38911344\t468\tTATTGAACCAGGCAGGGGAACCTGGGCCCCTGAACTCTGTCTCTTTATACTGCATTTTGAAAGCAGCAC"
  },
  {
    "path": "examples/pydoop_script/data/stop_words.txt",
    "chars": 14,
    "preview": "one\ntwo\nthree\n"
  },
  {
    "path": "examples/pydoop_script/data/transpose_input/matrix.txt",
    "chars": 60,
    "preview": "a00\ta01\ta02\na10\ta11\ta12\na20\ta21\ta22\na30\ta31\ta32\na40\ta41\ta42\n"
  },
  {
    "path": "examples/pydoop_script/run",
    "chars": 341,
    "preview": "#!/usr/bin/env bash\n\nset -euo pipefail\n[ -n \"${DEBUG:-}\" ] && set -x\nthis=\"${BASH_SOURCE-$0}\"\nthis_dir=$(cd -P -- \"$(dir"
  },
  {
    "path": "examples/pydoop_script/run_script.sh",
    "chars": 2586,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/pydoop_script/scripts/base_histogram.py",
    "chars": 978,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/pydoop_script/scripts/caseswitch.py",
    "chars": 1200,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/pydoop_script/scripts/grep.py",
    "chars": 1071,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/pydoop_script/scripts/lowercase.py",
    "chars": 803,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/pydoop_script/scripts/transpose.py",
    "chars": 2183,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/pydoop_script/scripts/wc_combiner.py",
    "chars": 943,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/pydoop_script/scripts/wordcount.py",
    "chars": 854,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/pydoop_script/scripts/wordcount_sw.py",
    "chars": 1155,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/pydoop_submit/check.py",
    "chars": 3037,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/pydoop_submit/data/cols_1.txt",
    "chars": 20,
    "preview": "foo1\tbar1\nfoo2\tbar2\n"
  },
  {
    "path": "examples/pydoop_submit/data/cols_2.txt",
    "chars": 20,
    "preview": "foo3\tbar3\nfoo4\tbar4\n"
  },
  {
    "path": "examples/pydoop_submit/mr/map_only_java_writer.py",
    "chars": 1049,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/pydoop_submit/mr/map_only_python_writer.py",
    "chars": 1859,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/pydoop_submit/mr/nosep.py",
    "chars": 931,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/pydoop_submit/mr/wordcount_full.py",
    "chars": 4413,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/pydoop_submit/mr/wordcount_minimal.py",
    "chars": 1254,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/pydoop_submit/run",
    "chars": 404,
    "preview": "#!/usr/bin/env bash\n\nset -euo pipefail\n[ -n \"${DEBUG:-}\" ] && set -x\nthis=\"${BASH_SOURCE-$0}\"\nthis_dir=$(cd -P -- \"$(dir"
  },
  {
    "path": "examples/pydoop_submit/run_submit.sh",
    "chars": 2637,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/run_all",
    "chars": 1447,
    "preview": "#!/bin/bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "examples/self_contained/check_results.py",
    "chars": 1707,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/self_contained/run",
    "chars": 1773,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "examples/self_contained/vowelcount/__init__.py",
    "chars": 850,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/self_contained/vowelcount/lib/__init__.py",
    "chars": 705,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/self_contained/vowelcount/mr/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "examples/self_contained/vowelcount/mr/main.py",
    "chars": 804,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/self_contained/vowelcount/mr/mapper.py",
    "chars": 842,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/self_contained/vowelcount/mr/reducer.py",
    "chars": 777,
    "preview": "# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you "
  },
  {
    "path": "examples/sequence_file/bin/filter.py",
    "chars": 1507,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/sequence_file/bin/wordcount.py",
    "chars": 1174,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/sequence_file/check.py",
    "chars": 1406,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "examples/sequence_file/run",
    "chars": 2702,
    "preview": "#!/usr/bin/env bash\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "int_test/config.sh",
    "chars": 587,
    "preview": "[ -n \"${PYDOOP_INT_TESTS:-}\" ] && return || readonly PYDOOP_INT_TESTS=1\n\ndie() {\n    echo $1 1>&2\n    exit 1\n}\n\nexport U"
  },
  {
    "path": "int_test/mapred_submitter/check.py",
    "chars": 3460,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "int_test/mapred_submitter/genwords.py",
    "chars": 1870,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "int_test/mapred_submitter/input/map_only/f1.txt",
    "chars": 12,
    "preview": "line1\nline2\n"
  },
  {
    "path": "int_test/mapred_submitter/input/map_only/f2.txt",
    "chars": 12,
    "preview": "line3\nline4\n"
  },
  {
    "path": "int_test/mapred_submitter/input/map_reduce/f1.txt",
    "chars": 56,
    "preview": "the quick brown fox\nhad a meeting with\nthe lazy red FӦX\n"
  },
  {
    "path": "int_test/mapred_submitter/input/map_reduce/f2.txt",
    "chars": 56,
    "preview": "the young black FӦX\nhad breakfast with\nthe old pink fox\n"
  },
  {
    "path": "int_test/mapred_submitter/input/map_reduce_long/f.txt",
    "chars": 260,
    "preview": "we need more\nthan ten\nlines\nbecause\nwe are\nsetting the\ntimeout to\nten seconds\nand the map\nand reduce\nfunctions\nsleep for"
  },
  {
    "path": "int_test/mapred_submitter/mr/map_only_java_writer.py",
    "chars": 924,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "int_test/mapred_submitter/mr/map_only_python_writer.py",
    "chars": 1437,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "int_test/mapred_submitter/mr/map_reduce_combiner.py",
    "chars": 1133,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "int_test/mapred_submitter/mr/map_reduce_java_rw.py",
    "chars": 1078,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "int_test/mapred_submitter/mr/map_reduce_java_rw_pstats.py",
    "chars": 1121,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "int_test/mapred_submitter/mr/map_reduce_python_partitioner.py",
    "chars": 1301,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  },
  {
    "path": "int_test/mapred_submitter/mr/map_reduce_python_reader.py",
    "chars": 2105,
    "preview": "#!/usr/bin/env python\n\n# BEGIN_COPYRIGHT\n#\n# Copyright 2009-2026 CRS4.\n#\n# Licensed under the Apache License, Version 2."
  }
]

// ... and 170 more files (download for full content)

About this extraction

This page contains the full source code of the crs4/pydoop GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 370 files (1.6 MB), approximately 531.1k tokens, and a symbol index with 1874 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
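
Because the file listing above is a plain JSON array of {path, chars, preview} entries, it is easy to consume programmatically. Below is a minimal Python sketch, assuming the array has been saved to a file named manifest.json (a hypothetical name, not part of the extraction itself); it totals the listed sizes and filters the Pydoop example scripts by path prefix.

    # Minimal sketch: parse the GitExtract file manifest shown above.
    # Assumes the JSON array was saved as "manifest.json" (hypothetical name);
    # each entry carries "path", "chars", and a truncated "preview".
    import json

    with open("manifest.json") as f:
        entries = json.load(f)

    # Total size in characters across all listed files.
    total_chars = sum(e["chars"] for e in entries)

    # Select only the Pydoop example files, i.e. everything under examples/.
    examples = [e["path"] for e in entries if e["path"].startswith("examples/")]

    print(f"{len(entries)} files, {total_chars} chars; "
          f"{len(examples)} under examples/")

Note that the previews are truncated at a fixed length, so they are useful for identifying a file's type and license header, not for recovering its full contents; for that, download the complete .txt export.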

Extracted by GitExtract, a free GitHub-repo-to-text converter for AI, built by Nikandr Surkov.
