[
  {
    "path": ".coveragerc",
    "content": "[report]\nshow_missing = True\n"
  },
  {
    "path": ".gitignore",
    "content": ".DS_Store\n*.pyc\n"
  },
  {
    "path": ".travis.yml",
    "content": "sudo: false\nlanguage: python\npython:\n  - '2.7'\n  - '3.4'\n  - '3.5'\n  - '3.6'\n  - '3.7'\ninstall:\n  - pip install -e .[dev]\nbefore_script:\n  - flake8 .\nscript:\n  - nosetests\n  - coverage report --fail-under=100\nafter_success:\n    - coveralls\nnotifications: \n    email: false\n\naddons:\n  apt:\n    packages:\n      - graphviz\n"
  },
  {
    "path": "LICENSE",
    "content": "Copyright (c) 2015, Robert deCarvalho\r\nAll rights reserved.\r\n\r\nRedistribution and use in source and binary forms, with or without\r\nmodification, are permitted provided that the following conditions are met:\r\n\r\n1. Redistributions of source code must retain the above copyright notice, this\r\n   list of conditions and the following disclaimer. \r\n2. Redistributions in binary form must reproduce the above copyright notice,\r\n   this list of conditions and the following disclaimer in the documentation\r\n   and/or other materials provided with the distribution.\r\n\r\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND\r\nANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED\r\nWARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\r\nDISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR\r\nANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES\r\n(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;\r\nLOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND\r\nON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\r\n(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS\r\nSOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\r\n\r\nThe views and conclusions contained in the software and documentation are those\r\nof the authors and should not be interpreted as representing official policies, \r\neither expressed or implied, of the FreeBSD Project.\r\n"
  },
  {
    "path": "README.md",
    "content": "Update (2/23/2021)\n===\nIt looks like this README is slowly turning into a reference of all the projects in this space that I think are better than consecution.\nHere is [metaflow](https://github.com/Netflix/metaflow), an offering from Netflix.\n\n\nUpdate (9/21/2020)\n===\nAnother library that I believe to be better than consecution is the [pypeln](https://cgarciae.github.io/pypeln/) project.  The way it allows for a different number of workers on each node of a pipeline is quite nice.  Additionally the ability to control whether each node is run using threads, processes, async, or sync is really useful.\n\n\nUpdate (5/1/2020)\n===\nSince writing this, the excellent [streamz](https://streamz.readthedocs.io/en/latest/) package has been created.  Streamz\nis the project I wish had existed back when I wrote this.  It is a much more capable implementation of the of the core \nideas of consecution, and plays nicely with [dask](https://dask.org/) to achieve scale.  I have started using streamz in my work in place of consecution.\n\nConsecution\n===\n[![Build Status](https://travis-ci.org/robdmc/consecution.svg?branch=develop)](https://travis-ci.org/robdmc/consecution)\n[![Coverage Status](https://coveralls.io/repos/github/robdmc/consecution/badge.svg?branch=develop)](https://coveralls.io/github/robdmc/consecution?branch=add_docs)\n\nIntroduction\n---\nConsecution is:\n  * An easy-to-use pipeline abstraction inspired by <a href=\"http://storm.apache.org/releases/current/Tutorial.html\"> Apache Storm Topologies</a>\n  * Designed to simplify building ETL pipelines that are robust and easy to test\n  * A system for wiring together simple processing nodes to form a DAG, which is fed with a python iterable\n  * Built using synchronous, single-threaded execution strategies designed to run efficiently on a single core\n  * Implemented in pure-python with optional requirements that are needed only for graph visualization\n  * Written with 100% test 
coverage\n\nConsecution makes it easy to build systems like this.\n\n![Output Image](/images/etl_example.png?raw=true \"ETL Example\")\n\n\nInstallation\n---\nConsecution is a pure-python package that is simply installed with pip.  The only optional\nrequirement is the\n<a href=\"http://www.graphviz.org/\">Graphviz</a> system package, which is needed only if you want to create\ngraphical representations of your pipeline.\n\n<pre><code><strong>[~]$ pip install consecution</strong></code></pre>\n\nDocker\n---\nIf you would like to try out consecution on docker, check out consecution from github and navigate to the\n`docker/` subdirectory.  From there, run the following.\n\n* Build the consecution image: `docker_build.sh`\n* Start a container: `docker_run.sh`\n* Once in the container, run the example: `python simple_example.py`\n\n\nQuick Start\n---\nWhat follows is a quick tour of consecution.  See the <a\nhref=\"http://consecution.readthedocs.io/en/latest/\">API documentation</a> for\nmore detailed information.\n\n### Nodes\nConsecution works by wiring together nodes.  You create nodes by inheriting from the\n`consecution.Node` class.  Every node must define a `.process()` method.  This method\ncontains whatever logic you want for processing single items as they pass through your\npipeline.  Here is an example of a node that simply logs items passing through it.\n```python\nfrom consecution import Node\n\nclass LogNode(Node):\n    def process(self, item):\n        # any logic you want for processing a single item\n        print('{: >15} processing {}'.format(self.name, item))\n\n        # send item downstream\n        self.push(item)\n```\n### Pipelines\nNow let's create a pipeline that wires together a series of these logging nodes.\nWe do this by employing the pipe symbol in much the same way that you pipe data\nbetween programs in Unix.  
Note that you must name nodes when you instantiate\nthem.\n```python\nfrom consecution import Node, Pipeline\n\n# This is the same node class we defined above\nclass LogNode(Node):\n    def process(self, item):\n        print('{: >10} processing {}'.format(self.name, item))\n        self.push(item)\n\n# Connect nodes with pipe symbols to create a pipeline for consuming any iterable.\npipe = Pipeline(\n    LogNode('extract') | LogNode('transform') | LogNode('load')\n)\n```\nAt this point, we can visualize the pipeline to verify that the topology is\nwhat we expect it to be.  If you have Graphviz installed, you can now simply type\none of the following to see the pipeline visualized.\n```python\n# Create a pipeline.png file in your working directory\npipe.plot()\n\n# Interactively display the pipeline visualization in an IPython notebook\n# by simply making the final expression in a cell evaluate to a pipeline.\npipe\n```\nThe plot command should produce the following visualization.\n\n![Output Image](/images/etl1.png?raw=true \"Three Node ETL Example\")\n\nIf you don't have Graphviz installed, you can print the pipeline\nobject to get a text-based visualization.\n```python\nprint(pipe)\n```\nThis represents your pipeline as a series of pipe statements showing\nhow data is piped between nodes.\n```\nPipeline\n--------------------------------------------------------------------\n  extract | transform\ntransform | load\n--------------------------------------------------------------------\n```\n\n\nWe can now process an iterable with our pipeline by running\n```python\npipe.consume(range(3))\n```\nwhich will print the following to the console.\n```\n   extract processing 0\n transform processing 0\n      load processing 0\n   extract processing 1\n transform processing 1\n      load processing 1\n   extract processing 2\n transform processing 2\n      load processing 2\n```\n\n### Broadcasting\nPiping the output of a single node into a list of nodes will cause the single\nnode to 
broadcast its pushed items to every node in the list.  So, again, using\nour logging node, we could construct a pipeline like this:\n```python\nfrom consecution import Node, Pipeline\n\nclass LogNode(Node):\n    def process(self, item):\n        print('{: >13} processing {}'.format(self.name, item))\n        self.push(item)\n\n# pipe to a list of nodes to broadcast items\npipe = Pipeline(\n    LogNode('extract')\n    | LogNode('transform')\n    | [LogNode('load_redis'), LogNode('load_postgres'), LogNode('load_mongo')]\n)\npipe.plot()\npipe.consume(range(2))\n```\nThe plot command produces this visualization\n\n![Output Image](/images/broadcast.png?raw=true \"Broadcast Example\")\n\nand consuming `range(2)` produces this output\n```\n      extract processing 0\n    transform processing 0\n   load_redis processing 0\nload_postgres processing 0\n   load_mongo processing 0\n      extract processing 1\n    transform processing 1\n   load_redis processing 1\nload_postgres processing 1\n   load_mongo processing 1\n```\n\n### Routing\nIf you pipe to a list that contains multiple nodes and a single callable, then\nconsecution will interpret the callable as a routing function that accepts a\nsingle item as its only argument and returns the name of one of the nodes in the\nlist.  
The routing function will direct the flow of items as illustrated below.\n```python\nfrom consecution import Node, Pipeline\n\nclass LogNode(Node):\n    def process(self, item):\n        print('{: >15} processing {}'.format(self.name, item))\n        self.push(item)\n\ndef parity(item):\n    if item % 2 == 0:\n        return 'transform_even'\n    else:\n        return 'transform_odd'\n\n# pipe to a list containing a callable to achieve routing behaviour\npipe = Pipeline(\n    LogNode('extract')\n    | [LogNode('transform_even'), LogNode('transform_odd'), parity]\n)\npipe.plot()\npipe.consume(range(4))\n```\nThe plot command produces the following pipeline\n\n![Output Image](/images/routing.png?raw=true \"Routing Example\")\n\nand consuming `range(4)` produces this output\n```\n        extract processing 0\n transform_even processing 0\n        extract processing 1\n  transform_odd processing 1\n        extract processing 2\n transform_even processing 2\n        extract processing 3\n  transform_odd processing 3\n```\n\n\n### Merging\nUp to this point, we have the ability to create processing trees where nodes\ncan either broadcast to or route between their downstream nodes.  We can,\nhowever, do more than this and create DAGs (directed acyclic graphs).  
Piping\nfrom a list back to a single node will merge the output of all nodes in the\nlist into the single downstream node like this.\n```python\nfrom consecution import Node, Pipeline\n\nclass LogNode(Node):\n    def process(self, item):\n        print('{: >15} processing {}'.format(self.name, item))\n        self.push(item)\n\ndef parity(item):\n    if item % 2 == 0:\n        return 'transform_even'\n    else:\n        return 'transform_odd'\n\n# piping from a list back to a single node merges items into the downstream node\npipe = Pipeline(\n    LogNode('extract')\n    | [LogNode('transform_even'), LogNode('transform_odd'), parity]\n    | LogNode('load')\n)\npipe.plot()\npipe.consume(range(4))\n```\nThe plot command produces the following pipeline\n\n![Output Image](/images/dag.png?raw=true \"DAG Example\")\n\nand consuming `range(4)` produces this output\n```\n        extract processing 0\n transform_even processing 0\n           load processing 0\n        extract processing 1\n  transform_odd processing 1\n           load processing 1\n        extract processing 2\n transform_even processing 2\n           load processing 2\n        extract processing 3\n  transform_odd processing 3\n           load processing 3\n```\n\n### Managing Local State\nNodes are classes, and as such, you have the freedom to create any attribute you\nwant on a node.  You can actually define two additional methods on your nodes to\nset up and tear down node-local state.  It is important to note the order of\nexecution here.  All nodes in a pipeline will execute their `.begin()` methods\nin pipeline-order before any items are processed.  Each node will enter its\n`.end()` method only after it has processed all items, and after all parent\nnodes have finished their respective `.end()` methods.  
Below, we've modified\nour LogNode to keep a running sum of all items that pass through it and end by\nprinting their sum.\n```python\nfrom consecution import Node, Pipeline\n\nclass LogNode(Node):\n    def begin(self):\n        self.sum = 0\n        print('{}.begin()'.format(self.name))\n\n    def process(self, item):\n        print('{: >15} processing {}'.format(self.name, item))\n        self.sum += item\n        self.push(item)\n\n    def end(self):\n        print('sum = {:d} in {}.end()'.format(self.sum, self.name))\n\n# Same routing function as in the merge example above\ndef parity(item):\n    if item % 2 == 0:\n        return 'transform_even'\n    else:\n        return 'transform_odd'\n\n# Identical pipeline to merge example above, but with modified LogNode\npipe = Pipeline(\n    LogNode('extract')\n    | [LogNode('transform_even'), LogNode('transform_odd'), parity]\n    | LogNode('load')\n)\npipe.consume(range(4))\n```\n\nConsuming `range(4)` produces the following output\n```\nextract.begin()\ntransform_even.begin()\ntransform_odd.begin()\nload.begin()\n        extract processing 0\n transform_even processing 0\n           load processing 0\n        extract processing 1\n  transform_odd processing 1\n           load processing 1\n        extract processing 2\n transform_even processing 2\n           load processing 2\n        extract processing 3\n  transform_odd processing 3\n           load processing 3\nsum = 6 in extract.end()\nsum = 2 in transform_even.end()\nsum = 4 in transform_odd.end()\nsum = 6 in load.end()\n```\n\n\n### Managing Global State\nEvery node object has a `.global_state` attribute that is shared globally across\nall nodes in the pipeline.  The attribute is also available on the Pipeline\nobject itself.  The GlobalState object is a simple mutable python object whose\nattributes can be mutated by any node.  It also remains accessible on the\nPipeline object after all nodes have completed.  
Below is a simple example of\nmutating and accessing global state.\n\n```python\nfrom consecution import Node, Pipeline, GlobalState\n\nclass LogNode(Node):\n    def process(self, item):\n        self.global_state.messages.append(\n            '{: >9} processing {}'.format(self.name, item)\n        )\n        self.push(item)\n\n# create a global state object with a messages attribute\nglobal_state = GlobalState(messages=[])\n\n# Assign the predefined global_state to the pipeline\npipe = Pipeline(\n    LogNode('extract') | LogNode('transform') | LogNode('load'),\n    global_state=global_state\n)\npipe.consume(range(3))\n\n# print the content of the global state message list\nfor msg in pipe.global_state.messages:\n    print(msg)\n```\n\nPrinting the contents of the messages list produces\n```\n  extract processing 0\ntransform processing 0\n     load processing 0\n  extract processing 1\ntransform processing 1\n     load processing 1\n  extract processing 2\ntransform processing 2\n     load processing 2\n```\n\n## Common Patterns\nThis section shows examples of how to implement some common patterns in\nconsecution.\n\n### Map\nMapping with nodes is very simple. 
Just push an altered item downstream.\n```python\nfrom consecution import Node, Pipeline\n\nclass Mapper(Node):\n    def process(self, item):\n        self.push(2 * item)\n\nclass LogNode(Node):\n    def process(self, item):\n        print('{: >9} processing {}'.format(self.name, item))\n        self.push(item)\n\npipe = Pipeline(\n    LogNode('extractor') | Mapper('mapper') | LogNode('loader')\n)\n\npipe.consume(range(3))\n```\nThis will produce an output of\n```\nextractor processing 0\n   loader processing 0\nextractor processing 1\n   loader processing 2\nextractor processing 2\n   loader processing 4\n```\n\n### Reduce\nReducing, or folding, is easily implemented by using the `.begin()`\nand `.end()` methods to handle accumulated values.\n```python\nfrom consecution import Node, Pipeline\n\nclass Reducer(Node):\n    def begin(self):\n        self.result = 0\n\n    def process(self, item):\n        self.result += item\n\n    def end(self):\n        self.push(self.result)\n\nclass LogNode(Node):\n    def process(self, item):\n        print('{: >9} processing {}'.format(self.name, item))\n        self.push(item)\n\npipe = Pipeline(\n    LogNode('extractor') | Reducer('reducer') | LogNode('loader')\n)\n\npipe.consume(range(3))\n```\nThis will produce an output of\n```\nextractor processing 0\nextractor processing 1\nextractor processing 2\n   loader processing 3\n```\n\n### Filter\nFiltering is as simple as placing the push statement behind a conditional. 
All\nitems that don't pass the conditional are not pushed downstream and are thus\nsilently dropped.\n```python\nfrom consecution import Node, Pipeline\n\nclass Filter(Node):\n    def process(self, item):\n        if item > 3:\n            self.push(item)\n\nclass LogNode(Node):\n    def process(self, item):\n        print('{: >9} processing {}'.format(self.name, item))\n        self.push(item)\n\npipe = Pipeline(\n    LogNode('extractor') | Filter('filter') | LogNode('loader')\n)\n\npipe.consume(range(6))\n```\nThis produces an output of\n```\nextractor processing 0\nextractor processing 1\nextractor processing 2\nextractor processing 3\nextractor processing 4\n   loader processing 4\nextractor processing 5\n   loader processing 5\n```\n\n### Group By\nConsecution provides a specialized class you can inherit from to perform\ngrouping operations.  GroupBy nodes must define two methods: `.key(item)` and\n`.process(batch)`.  The `.key` method should return a key from an item that is used\nto identify groups.  Any time that key changes, a new group is initiated.  Like\nPython's `itertools.groupby`, you will usually want the GroupByNode to process\nsorted items.  
The `.process` method functions exactly like the `.process`\nmethod on regular nodes, except that instead of being called with items,\nconsecution will call it with a batch of items contained in a list.\n```python\nfrom consecution import Node, GroupByNode, Pipeline\n\nclass LogNode(Node):\n    def process(self, item):\n        print('{} processing {}'.format(self.name, item))\n        self.push(item)\n\nclass Batcher(GroupByNode):\n    def key(self, item):\n        return item // 4\n\n    def process(self, batch):\n        sum_val = sum(batch)\n        self.push(sum_val)\n\npipe = Pipeline(\n    Batcher('batcher') | LogNode('logger')\n)\n\npipe.consume(range(16))\n```\nThis produces an output of\n```\nlogger processing 6\nlogger processing 22\nlogger processing 38\nlogger processing 54\n```\n\n### Plugin-Style Composition\nConsecution forces you to think about problems in terms of how small processing\nunits are connected.  This separation between logic and connectivity can be\nexploited to create flexible and reusable solutions.  Basically, you specify the\nconnectivity you want to use in solving your problem, and then plug in the\nprocessing units later.  
Breaking the problem up in this way allows you to swap\nout processing units to achieve different objectives with the same pipeline.\n\n```python\nfrom consecution import Node, Pipeline\n\n# This function defines a pipeline that can use swappable processing nodes.\n# We don't worry about how we are going to do logging or aggregating.\n# We just focus on how the nodes are connected.\ndef pipeline_factory(log_node, agg_node):\n    pipe = Pipeline(\n        log_node('extractor') | agg_node('aggregator') | log_node('result_logger')\n    )\n    return pipe\n\n\n# Now we define a node for left-justified logging\nclass LeftLogNode(Node):\n    def process(self, item):\n        print('{: <15} processing {}'.format(self.name, item))\n        self.push(item)\n\n# And one for right-justified logging\nclass RightLogNode(Node):\n    def process(self, item):\n        print('{: >15} processing {}'.format(self.name, item))\n        self.push(item)\n\n# We can aggregate by summing\nclass SumNode(Node):\n    def begin(self):\n        self.result = 0\n\n    def process(self, item):\n        self.result += item\n\n    def end(self):\n        self.push(self.result)\n\n# Or we can aggregate by multiplying\nclass ProdNode(Node):\n    def begin(self):\n        self.result = 1\n\n    def process(self, item):\n        self.result *= item\n\n    def end(self):\n        self.push(self.result)\n\n\n# Now we plug in nodes to create a pipeline that left-prints sums\nsum_pipeline = pipeline_factory(log_node=LeftLogNode, agg_node=SumNode)\n\n# And a different pipeline that right-prints products\nprod_pipeline = pipeline_factory(log_node=RightLogNode, agg_node=ProdNode)\n\nprint('aggregate with sum, left justified\\n' + '-' * 40)\nsum_pipeline.consume(range(1, 5))\n\nprint('\\naggregate with product, right justified\\n' + '-' * 40)\nprod_pipeline.consume(range(1, 5))\n```\nThis produces the following output\n```\naggregate with sum, left justified\n----------------------------------------\nextractor      
processing 1\nextractor      processing 2\nextractor      processing 3\nextractor      processing 4\nresult_logger  processing 10\n\naggregate with product, right justified\n----------------------------------------\n      extractor processing 1\n      extractor processing 2\n      extractor processing 3\n      extractor processing 4\n  result_logger processing 24\n```\n\n# Aggregation Example\nWe end with a full-blown example of using a pipeline to aggregate data from a\ncsv file.  The data is contained in\n<a href=\"https://raw.githubusercontent.com/robdmc/consecution/master/sample_data.csv\">\na csv file</a> that looks like this.\n\ngender |age |spent\n---    |--- |---\nmale   |11  |39.39\nfemale |10  |34.72\nfemale |15  |40.02\nmale   |19  |26.27\nmale   |13  |21.22\nfemale |40  |23.17\nfemale |52  |33.42\nmale   |33  |39.52\nfemale |16  |28.65\nmale   |60  |26.74\n\n\nAlthough there are much simpler ways of solving this problem (e.g. with <a\nhref=\"https://github.com/robdmc/consecution/blob/master/pandashells.md\">\nPandashells</a>),\nwe deliberately construct a complex topology just to illustrate how to achieve\ncomplexity when it is actually needed.\n\nThe diagram below was produced from the code beneath it.  A quick glance at the\ndiagram makes it obvious how the data is being routed through the system.  
The\ncode is heavily commented to explain features of the consecution toolkit.\n\n![Output Image](/images/gender_age.png?raw=true \"Gender Age Pipeline\")\n\n```python\nfrom __future__ import print_function\nfrom collections import namedtuple\nfrom pprint import pprint\nimport csv\nfrom consecution import Node, Pipeline, GlobalState\n\n# Named tuples are nice immutable containers \n# for passing data between nodes\nPerson = namedtuple('Person', 'gender age spent')\n\n# Create a pipeline that aggregates by gender and age\n# In creating the pipeline we focus on connectivity and don't\n# worry about defining node behavior.\ndef pipe_factory(Extractor, Agg, gender_router, age_router):\n    # Consecution provides a generic GlobalState class.  Any object can be used\n    # as the global_state in a pipeline, but the GlobalState object provides a\n    # nice abstraction where attributes can be accessed either by dot notation\n    # (e.g. global_state.my_attribute) or by dictionary notation (e.g.\n    # global_state['my_attribute'].  Furthermore, GlobalState objects can be\n    # instantiated with initialized attributes using key-word arguments as shown\n    # here.\n    global_state = GlobalState(segment_totals={})\n\n    # Notice, we haven't even defined the behavior of these nodes yet.  
They\n    # will be defined later and are, for now, just passed into the factory\n    # function as arguments while we focus on getting the topology right.\n    pipe = Pipeline(\n        Extractor('make_person') |\n        [\n            gender_router,\n            (Agg('male') | [age_router, Agg('male_child'), Agg('male_adult')]),\n            (Agg('female') | [age_router, Agg('female_child'), Agg('female_adult')]),\n        ],\n        global_state=global_state\n    )\n\n    # Nodes can be created outside of a pipeline definition\n    adult = Agg('adult')\n    child = Agg('child')\n    total = Agg('total')\n\n    # Sometimes the topology you want to create cannot easily be expressed\n    # using the pipeline abstraction for wiring nodes together.  You can\n    # drop down to a lower level of abstraction by explicitly wiring nodes\n    # together using the .add_downstream() method.\n    adult.add_downstream(total)\n    child.add_downstream(total)\n\n    # Once a pipeline has been created, you can access individual nodes\n    # with dictionary-like indexing on the pipeline.\n    pipe['male_child'].add_downstream(child)\n    pipe['female_child'].add_downstream(child)\n    pipe['male_adult'].add_downstream(adult)\n    pipe['female_adult'].add_downstream(adult)\n\n    return pipe\n\n# Now that we have the topology of our pipeline defined, we can think about the\n# logic that needs to go into each node.  We start by defining a node that takes\n# a row from a csv file and transforms it into a namedtuple.\nclass MakePerson(Node):\n    def process(self, item):\n        item['age'] = int(item['age'])\n        item['spent'] = float(item['spent'])\n        self.push(Person(**item))\n\n# We now define a node to perform our aggregations.  Mutable global state comes\n# with a lot of baggage and should be used with care.  
This node illustrates\n# how to use global state to put all aggregations in a central location that\n# remains accessible when the pipeline finishes processing.\nclass Sum(Node):\n    def begin(self):\n        # initialize the node-local sum to zero\n        self.total = 0\n\n    def process(self, item):\n        # increment the node-local total and push the item downstream\n        self.total += item.spent\n        self.push(item)\n\n    def end(self):\n        # when the pipeline is done, update global state with the sum\n        self.global_state.segment_totals[self.name] = round(self.total, 2)\n\n\n# This function routes tuples based on their associated gender\ndef by_gender(item):\n    return '{}'.format(item.gender)\n\n# This function routes tuples based on whether the purchaser was an adult or\n# child\ndef by_age(item):\n    if item.age >= 18:\n        return '{}_adult'.format(item.gender)\n    else:\n        return '{}_child'.format(item.gender)\n\n# Here we plug our node definitions into our topology to create a fully-defined\n# pipeline.\npipe = pipe_factory(MakePerson, Sum, by_gender, by_age)\n\n# We can now visualize the pipeline.\npipe.plot()\n\n# Now we feed our pipeline with rows from the csv file\nwith open('sample_data.csv') as f:\n    pipe.consume(csv.DictReader(f))\n\n# The global_state is also available as an attribute on the pipeline, allowing\n# us to access it when the pipeline is finished.  This is a good way to \"return\"\n# an object from a pipeline.  
Here we simply print the result.\nprint()\npprint(pipe.global_state.segment_totals)\n```\n\nAnd this is the result of running the pipeline with the sample csv file.\n```\n{'adult': 149.12,\n 'child': 164.0,\n 'female': 159.98,\n 'female_adult': 56.59,\n 'female_child': 103.39,\n 'male': 153.14,\n 'male_adult': 92.53,\n 'male_child': 60.61,\n 'total': 313.12}\n```\n\nAs illustrated in the <a\nhref=\"https://github.com/robdmc/consecution/blob/master/pandashells.md\">\nPandashells</a> example, this aggregation is actually much simpler to\nimplement in Pandas.  However, there are a couple of important caveats.\n\nThe Pandas solution must load the entire csv file into memory at once.  If you\nlook at the pipeline solution, you will notice that each node simply increments\nits local sum and passes the data downstream.  At no point is the data\ncompletely loaded into memory.  Although the Pandas code runs much faster due to\nthe highly optimized vectorized math it employs, the pipeline solution can\nprocess arbitrarily large csv files with a very small memory footprint.\n\nPerhaps the most exciting aspect of consecution is its ability to create\nrepeatable and testable data analysis pipelines.  Passing Pandas DataFrames\nthrough a consecution pipeline makes it very easy to encapsulate any analysis\ninto a well-defined, repeatable process where each node manipulates a DataFrame\nin its prescribed way. 
Adopting this structure in analysis projects will\nundoubtedly ease the transition from analysis/research into production.\n\n___\nProjects by [robdmc](https://www.linkedin.com/in/robdecarvalho).\n* [Pandashells](https://github.com/robdmc/pandashells) Pandas at the bash command line\n* [Consecution](https://github.com/robdmc/consecution) Pipeline abstraction for Python\n* [Behold](https://github.com/robdmc/behold) Helping debug large Python projects\n* [Crontabs](https://github.com/robdmc/crontabs) Simple scheduling library for Python scripts\n* [Switchenv](https://github.com/robdmc/switchenv) Manager for bash environments\n* [Gistfinder](https://github.com/robdmc/gistfinder) Fuzzy-search your gists\n"
  },
  {
    "path": "consecution/__init__.py",
    "content": "# flake8: noqa\nfrom consecution.nodes import Node, GroupByNode\nfrom consecution.pipeline import Pipeline, GlobalState\nfrom consecution.utils import Clock\n\n__version__ = '0.2.0'\n\n\n"
  },
  {
    "path": "consecution/nodes.py",
    "content": "import sys\nfrom collections import Counter, deque, OrderedDict\nimport traceback\nfrom consecution.utils import Clock\n\n\nclass Node(object):\n    \"\"\"\n    :type name: str\n    :param str: The name of this node.  Must be unique within a pipeline.\n\n    :type kwargs:  keyword args\n    :param kwargs: Any additional keyword args are assigned as attributes\n                   on the node.\n\n    You create nodes by inheriting from this class.  You will be required to\n    implement a `.process()` on your class.  You can call the `.push()` method\n    from anywhere in your class implementation except from within the\n    `.begin()` method.\n\n    Note that although this documentation refers to \"the `.push` method\",\n    `push` is actually  a callable attribute assigned when nodes are placed\n    into pipelines.\n\n    Its signature is `.push(item)`, where `item` can be anything you want pushed\n    to nodes connected to the downstream side of the node.\n\n    \"\"\"\n    def __init__(self, name, **kwargs):\n        # assign any user-defined attributes\n        for k, v in kwargs.items():\n            setattr(self, k, v)\n        self.name = name\n        self._upstream_nodes = []\n        self._downstream_nodes = []\n\n        self._num_top_down_calls = 0\n\n        # node network can be visualized with pydot.  These hold args and kwargs\n        # that will be used to add and connect this node in the graph visualization\n        self._pydot_node_kwargs = dict(name=self.name, shape='rectangle')\n        self._pydot_edge_kwarg_list = []\n\n        self._router = None\n\n        # this will be one of three values: None, 'input', 'output'\n        self._logging = None\n\n        # add a clock to allow for timing\n        self.clock = Clock()\n\n    def __str__(self):\n        return 'N({})'.format(self.name)\n\n    def __repr__(self):\n        return self.__str__()\n\n    def __hash__(self):\n        \"\"\"\n        define __hash__ method. 
Dicts and sets will use this as the key.\n        \"\"\"\n        return id(self)\n\n    def __eq__(self, other):\n        return self.__hash__() == other.__hash__()\n\n    def __lt__(self, other):\n        \"\"\"\n        Needed to be able to sort nodes by name.\n        \"\"\"\n        return self.name < other.name\n\n    def __getitem__(self, key):\n        msg = (\n            '\\n\\nYou cannot call __getitem__ on nodes.  You tried to call\\n'\n            '{self} [{key}]\\n'\n            'which doesn\\'t make sense.  You probably meant\\n'\n            '{self} | [{key}]\\n'\n        ).format(self=self, key=key)\n        raise ValueError(msg)\n\n    def _get_flattened_list(self, obj):\n        if isinstance(obj, Node):\n            return [obj]\n\n        elif hasattr(obj, '__iter__'):\n            nodes = []\n            for el in obj:\n                if isinstance(el, Node):\n                    nodes.append(el)\n                elif hasattr(el, '__iter__'):\n                    nodes.extend(self._get_flattened_list(el))\n            return nodes\n        else:\n            msg = (\n                'Don\\'t know what to do with {}.  
It\\'s not a node, and it\\'s '\n                'not iterable.'\n            ).format(repr(obj))\n            raise ValueError(msg)\n\n    def _get_exposed_slots(self, obj, pointing):\n        nodes = set()\n        for node in self._get_flattened_list(obj):\n            if pointing == 'left':\n                nodes = nodes.union(node.initial_node_set)\n            elif pointing == 'right':\n                nodes = nodes.union(node.terminal_node_set)\n            else:\n                raise ValueError('pointing must be \"left\" or \"right\"')\n        return nodes\n\n    def _connect_lefts_to_rights(self, lefts, rights, router=None):\n        slots_from_left = self._get_exposed_slots(lefts, pointing='right')\n        slots_from_right = self._get_exposed_slots(rights, pointing='left')\n        for left in slots_from_left:\n            router_node = None\n            if router:\n                router_name = '{}.{}'.format(\n                    left.name, self._get_object_name(router))\n                end_point_map = {n.name: n for n in slots_from_right}\n                router_node = _RouterNode(\n                    router_name, end_point_map, router)\n                left.add_downstream(router_node)\n            for right in slots_from_right:\n                if router_node:\n                    router_node.add_downstream(right)\n                else:\n                    left.add_downstream(right)\n\n    def _get_object_name(self, obj):\n        class_name = obj.__class__.__name__\n        if class_name == 'function':\n            return obj.__name__\n        else:\n            return class_name\n\n    def _get_router(self, obj):\n        router = None\n        if hasattr(obj, '__iter__'):\n            routers = [el for el in obj if hasattr(el, '__call__')]\n            router = routers[0] if routers else None\n        return router\n\n    def __or__(self, other):\n        router = self._get_router(other)\n        self._connect_lefts_to_rights(self, other, 
router)\n        return self\n\n    def __ror__(self, other):\n        self._connect_lefts_to_rights(other, self)\n        return self\n\n    @property\n    def top_node(self):\n        \"\"\"\n        This attribute always holds the top-most node in the node graph.\n        Consecution only allows one top node.\n        \"\"\"\n        root_nodes = self.root_nodes\n        if len(root_nodes) > 1:\n            msg = 'You must remove one of the following input nodes {}'.format(\n                root_nodes)\n            raise ValueError(msg)\n        else:\n            return root_nodes.pop()\n\n    @property\n    def terminal_node_set(self):\n        \"\"\"\n        This attribute holds a set of all bottom nodes in the node graph.\n        \"\"\"\n        return {\n            node for node in self.depth_first_walk('down')\n            if len(node._downstream_nodes) == 0\n        }\n\n    @property\n    def initial_node_set(self):\n        \"\"\"\n        When piecing together fragments of a graph, you can temporarily have\n        connected nodes with multiple \"top-nodes.\"  This method returns this\n        set of nodes.  
Note that consecution can only make pipelines from\n        graphs having a single top node.\n        \"\"\"\n        return {\n            node for node in self.depth_first_walk('up')\n            if len(node._upstream_nodes) == 0\n        }\n\n    @property\n    def root_nodes(self):\n        \"\"\"\n        This attribute holds a list of all nodes that do not have any upstream\n        nodes attached.\n        \"\"\"\n        return [\n            node for node in self.all_nodes\n            if len(node._upstream_nodes) == 0\n        ]\n\n    @property\n    def all_nodes(self):\n        \"\"\"\n        This attribute contains a set of all nodes in the graph.\n        \"\"\"\n        return self.depth_first_walk('both')\n\n    def log(self, what):\n        \"\"\"\n        Calling this method on a node will turn on its logging feature.  This\n        means that the node will print logged items to the console.  You can\n        choose whether to log the inputs or outputs of a node.\n\n        :type what: str\n        :param what: One of 'input' or 'output' indicating whether you want to\n                     log the input or output of this node.\n        \"\"\"\n        allowed = ['input', 'output']\n        if what not in allowed:\n            raise ValueError(\n                '\\'what\\' argument must be in {}'.format(allowed)\n            )\n        self._logging = what\n\n    def _get_downstream_reps(self):\n        if self._downstream_nodes:\n            downstreams = sorted([n.name for n in self._downstream_nodes])\n\n            if len(downstreams) == 1:\n                downstreams = downstreams[0]\n\n            template = '{{: >{}s}} | {{}}\\n'.format(\n                self.pipeline._longest_node_name_len_)\n\n            self.pipeline._node_repr += template.format(\n                self.name, downstreams).replace('\\'', '')\n\n    def top_down_make_repr(self):\n        \"\"\"\n        You should never need to use 
this method.  It iterates through the node\n        graph in top-down order making a repr string for each node.\n        \"\"\"\n        if not hasattr(self, 'pipeline'):\n            raise ValueError(\n                'top_down_make_repr can only be called for nodes in a pipeline')\n\n        self.pipeline._longest_node_name_len_ = max(\n            len(n.name) for n in self.all_nodes)\n        self.pipeline._node_repr = ''\n        self.top_node.top_down_call('_get_downstream_reps')\n\n    def top_down_call(self, method_name):\n        \"\"\"\n        This utility method traverses the graph in top-down order and invokes\n        the named method on every node it encounters. It is used internally\n        to make sure the `.begin()` and `.end()` methods are not called before\n        their upstream counterparts.\n\n        :type method_name: str\n        :param method_name: The name of the method you would like to call in\n                            top-down order.\n        \"\"\"\n        # record the number of upstreams this node has\n        num_upstreams = len(self._upstream_nodes)\n\n        # if this node isn't pulling from multiple upstreams, it's ready\n        # to recurse to downstreams\n        if num_upstreams <= 1:\n            ready_for_downstreams = True\n        # this node isn't ready to recurse to downstreams until the current\n        # call would mean the last required call.\n        elif self._num_top_down_calls == num_upstreams - 1:\n            ready_for_downstreams = True\n        else:\n            ready_for_downstreams = False\n\n        # if ready to recurse, then call the method on self and recurse\n        # downwards.\n        if ready_for_downstreams:\n            getattr(self, method_name)()\n            for downstream in self._downstream_nodes:\n                downstream.top_down_call(method_name)\n            self._num_top_down_calls = 0\n        else:\n            self._num_top_down_calls += 1\n\n    def depth_first_walk(self, 
direction='both', as_ordered_list=False):\n        \"\"\"\n        This method walks the graph of connected nodes in depth-first\n        order.  It uses a stack to emulate recursion. See a good explanation at\n        https://jeremykun.com/2013/01/22/depth-and-breadth-first-search/\n\n        :type direction: str\n        :param direction: one of 'up', 'down' or 'both' specifying the direction\n                          to walk.\n        :type as_ordered_list: Bool\n        :param as_ordered_list: If set to true, returns the walked nodes as\n                                an ordered list instead of an unordered set.\n\n        :rtype: list or set\n        :return: An iterable of the discovered nodes.\n        \"\"\"\n        return self.walk(\n            direction=direction, how='depth_first',\n            as_ordered_list=as_ordered_list)\n\n    def breadth_first_walk(self, direction='both', as_ordered_list=False):\n        \"\"\"\n        This method walks the graph of connected nodes in breadth-first\n        order.  It uses a queue to emulate recursion. See a good explanation at\n        https://jeremykun.com/2013/01/22/depth-and-breadth-first-search/\n\n        :type direction: str\n        :param direction: one of 'up', 'down' or 'both' specifying the direction\n                          to walk.\n        :type as_ordered_list: Bool\n        :param as_ordered_list: If set to true, returns the walked nodes as\n                                an ordered list instead of an unordered set.\n\n        :rtype: list or set\n        :return: An iterable of the discovered nodes.\n        \"\"\"\n        return self.walk(\n            direction=direction, how='breadth_first',\n            as_ordered_list=as_ordered_list)\n\n    def walk(\n            self, direction='both', how='breadth_first', as_ordered_list=False):\n\n        \"\"\"\n        This is the core algorithm for walking a graph in specified order.  
It\n        is used by the `breadth_first_walk` and `depth_first_walk` methods.\n\n        :type how: str\n        :param how: one of 'breadth_first' or 'depth_first'\n\n        :type direction: str\n        :param direction: one of 'up', 'down' or 'both' specifying the direction\n                          to walk.\n        :type as_ordered_list: Bool\n        :param as_ordered_list: If set to true, returns the walked nodes as\n                                an ordered list instead of an unordered set.\n\n        :rtype: list or set\n        :return: An iterable of the discovered nodes.\n        \"\"\"\n        if how not in {'depth_first', 'breadth_first'}:\n            raise ValueError(\n                '\\'how\\' argument must be one of '\n                '[\\'depth_first\\', \\'breadth_first\\']'\n            )\n        # What I really want is an ordered set, which doesn't exist.  So I'm\n        # using the keys of an ordered dict to get the functionality I want.\n        # I have no need for the values in this dict, only the keys.\n        visited_nodes = OrderedDict()\n\n        # holds nodes that still need to be explored\n        queue = deque([self])\n\n        # while I still have nodes that need exploring\n        while len(queue) > 0:\n            # get the next node to explore\n            node = queue.pop()\n\n            # if I've already seen this node, nothing to do, so go to next\n            if node in visited_nodes:\n                continue\n\n            # Make sure I don't visit this node\n            # again.  
I'm using an ordered dict to mimic an ordered set.\n            # I have no need for the value, so set it to None\n            visited_nodes[node] = None\n\n            neighbor_dict = {\n                'up': node._upstream_nodes,\n                'down': node._downstream_nodes,\n                'both': node._upstream_nodes + node._downstream_nodes,\n            }\n            if direction not in neighbor_dict:\n                raise ValueError(\n                    'direction must be \\'up\\', \\'down\\' or \\'both\\'')\n            neighbors = neighbor_dict[direction]\n\n            # search all neighbors to this node for unvisited nodes\n            for neighbor in neighbors:\n                # if you find an unvisited node, add it to nodes needing visit\n                if neighbor not in visited_nodes:\n                    if how == 'breadth_first':\n                        queue.appendleft(neighbor)\n                    else:\n                        queue.append(neighbor)\n\n        # should have hit all nodes in the graph at this point\n        if as_ordered_list:\n            return list(visited_nodes.keys())\n        else:\n            return set(visited_nodes.keys())\n\n    def _check_for_dups(self):\n        counter = Counter()\n        for node in self.all_nodes:\n            counter.update({node.name: 1})\n        dups = [name for (name, count) in counter.items() if count > 1]\n        if dups:\n            msg = (\n                '\\n\\nNode names must be unique.  Duplicates {} found.'\n            ).format(list(dups))\n            raise ValueError(msg)\n        return\n\n    def _check_for_cycles(self):\n        self_and_upstreams = self.depth_first_walk('up')\n        downstreams = self.depth_first_walk('down') - {self}\n        common_nodes = self_and_upstreams.intersection(downstreams)\n        if common_nodes:\n            raise ValueError('\\n\\nYour graph is not acyclic.  
It has loops.')\n\n    def _validate_node(self, other):\n        # only nodes allowed to be connected\n        if not isinstance(other, Node):\n            raise ValueError('Trying to connect a non-node type')\n\n    def add_downstream(self, other):\n        \"\"\"\n        You will probably use this method quite a bit.  It is used to manually\n        attach a downstream node.\n\n        :type other: consecution.Node\n        :param other: An instance of the node you want to attach\n        \"\"\"\n        self._validate_node(other)\n        self._downstream_nodes.append(other)\n        other._upstream_nodes.append(self)\n\n        self._check_for_dups()\n        if self.name == other.name:\n            raise ValueError('{} can\\'t be downstream to itself'.format(self))\n        self._check_for_cycles()\n\n        self._pydot_edge_kwarg_list.append(\n            dict(tail_name=self.name, head_name=other.name))\n\n    def remove_downstream(self, other):\n        \"\"\"\n        This method removes the given node from being attached as a downstream\n        node.\n\n        :type other: consecution.Node\n        :param other: An instance of the node you want to remove\n        \"\"\"\n        # remove self from the other's upstreams\n        other._upstream_nodes = [\n            n for n in other._upstream_nodes if n.name != self.name]\n\n        # remove other from self's downstream nodes\n        self._downstream_nodes = [\n            n for n in self._downstream_nodes if n.name != other.name]\n\n        # remove this connection from the pydot kwargs list\n        new_kwargs_list = []\n        for kwargs in self._pydot_edge_kwarg_list:\n            if kwargs['head_name'] == other.name:\n                continue\n            new_kwargs_list.append(kwargs)\n        self._pydot_edge_kwarg_list = new_kwargs_list\n\n    def _build_pydot_graph(self):\n        \"\"\"\n        This private method builds a pydot graph\n        \"\"\"\n        # define kwargs lists for 
creating the visualization (these are closure vars for function below)\n        node_kwargs_list, edge_kwargs_list = [], []\n\n        # define a function to map over all nodes to aggregate viz kwargs\n        def collect_kwargs(node):\n            node_kwargs_list.append(node._pydot_node_kwargs)\n            edge_kwargs_list.extend(node._pydot_edge_kwarg_list)\n\n        for node in self.all_nodes:\n            collect_kwargs(node)\n\n        # doing import inside method so that graphviz dependency is optional\n        from graphviz import Digraph\n\n        # create a graphviz graph\n        graph = Digraph(comment='pipeline')\n\n        # create graphviz nodes for every node connected to this one\n        for node_kwargs in node_kwargs_list:\n            graph.node(**node_kwargs)\n\n        # create graphviz edges between all nodes connected to this one\n        for edge_kwargs in edge_kwargs_list:\n            graph.edge(**edge_kwargs)\n\n        return graph\n\n    def plot(\n            self, file_name='pipeline', kind='png'):\n        \"\"\"\n        This method draws a visualization of your processing graph.  You must\n        have graphviz installed on your system for it to work properly.  
(See\n        install instructions.)\n\n        If you are running consecution in a Jupyter notebook, you can display\n        an inline visualization of a pipeline by simply making the pipeline the\n        final expression in a cell.\n\n        :type file_name: str\n        :param file_name: The name of the image file to generate\n\n        :type kind: str\n        :param kind: The kind of file to generate (png, pdf)\n        \"\"\"\n        graph = self._build_pydot_graph()\n\n        # define allowed formats for saving the graph visualization\n        ALLOWED_KINDS = {'pdf', 'png'}\n        if kind not in ALLOWED_KINDS:\n            raise ValueError(\n                'Only the following kinds are supported: {}'.format(\n                    ALLOWED_KINDS))\n\n        # set the output format\n        graph.format = kind\n\n        file_name = file_name.replace('.{}'.format(kind), '')\n\n        # write the output file\n        try:\n            graph.render(file_name)\n        except RuntimeError:\n            sys.stderr.write(\n                '\\n\\n'\n                '=========================================================\\n'\n                'Problem executing GraphViz.  
Make sure you have it\\n'\n                'properly installed.\\n'\n                'http://www.graphviz.org/\\n'\n                'If you are on a mac, you should be able to install it with\\n'\n                'brew install graphviz.\\n\\n'\n                'If you are on ubuntu, you can install it with\\n'\n                'apt-get install graphviz\\n'\n                '=========================================================\\n'\n                '\\n\\n'\n            )\n            raise\n\n    def process(self, item):\n        \"\"\"\n        :type item: object\n        :param item: The item this node should process\n\n        You must override this method with your own logic.\n        \"\"\"\n        raise NotImplementedError(\n            (\n                'Error in node named {}\\n'\n                'You must define a .process(self, item) method on all nodes'\n            ).format(repr(self.name))\n        )\n\n    def reset(self):\n        \"\"\"\n        User can override this to do whatever logic they want.\n        \"\"\"\n\n    def _logged_process(self, item):\n        if self._logging == 'input':\n            self._write_log(item)\n        self.process(item)\n\n    def _begin(self):\n        try:\n            self.begin()\n        except AttributeError:\n            e = sys.exc_info()[1]\n            tb = sys.exc_info()[2]\n            (\n                code_file, line_no, method_name, line_txt\n            ) = traceback.extract_tb(tb)[-1]\n            msg = str(e) + (\n                '\\n\\nError in .begin() method of \\'{}\\' node.\\n'\n                'Are you trying to call .push() from inside the\\n'\n                '.begin() method?  
That is not allowed.\\n\\n'\n                'file: {}, line {}\\n--> {}\\n\\n'\n            ).format(self.name, code_file, line_no, line_txt)\n            traceback.print_exc()\n            raise AttributeError(msg)\n\n    def begin(self):\n        pass\n\n    def end(self):\n        pass\n\n    def _write_log(self, item):\n        sys.stdout.write(\n            'node_log,{},{},{}\\n'.format(self._logging, self.name, item))\n\n    def _push(self, item):\n        \"\"\"\n        This is the default pusher.  It pushes to all downstreams.\n        \"\"\"\n        if self._logging == 'output':\n            self._write_log(item)\n\n        # The _process attribute will be set to the appropriate callable\n        # when initializing the pipeline.  I do this because I want the\n        # chaining to be as efficient as possible.  If logging is not set,\n        # I don't want to have to hit that logic every push, so I just\n        # invoke a callable attribute at each process that has been set\n        # to the appropriate callable.\n        for downstream in self._downstream_nodes:\n            downstream._process(item)\n\n\nclass _RouterNode(Node):\n    \"\"\"\n    This node will route to downstreams.  The router function needs to\n    return the name of the destination node.\n    \"\"\"\n    def __init__(self, name, end_point_map, route_callable):\n        super(_RouterNode, self).__init__(name)\n        self._end_point_map = end_point_map\n        self._pydot_node_kwargs = dict(name=self.name, shape='oval')\n        self._route_callable = route_callable\n\n    def process(self, item):\n        \"\"\"\n        Route the item to the single downstream node whose name is returned\n        by the router callable.\n        \"\"\"\n        node = self._end_point_map.get(self._route_callable(item), None)\n        if node is None:\n            raise ValueError(\n                (\n                    '\\n\\nRouter node {} encountered bad route path {}.  
Valid '\n                    'route paths are {}.'\n                ).format(\n                    self.name,\n                    repr(self._route_callable(item)),\n                    [n.name for n in self._downstream_nodes]\n                )\n            )\n\n        node._process(item)\n\n\nclass GroupByNode(Node):\n    def __init__(self, *args, **kwargs):\n        super(GroupByNode, self).__init__(*args, **kwargs)\n        self._batch_ = []\n        self._previous_key = '__no_previous_key__'\n\n    def key(self, item):\n        \"\"\"\n        You must define this method.\n\n        :type item: object\n        :param item: The item you are processing\n\n        :rtype: hashable object\n        :return: a hashable object that serves as a key for the grouping process\n        \"\"\"\n        raise NotImplementedError(\n            'you must define a .key(self, item) method on all '\n            'GroupBy nodes.'\n        )\n\n    def process(self, batch):\n        \"\"\"\n        You must define this method.\n\n        :type batch: iterable\n        :param batch: A batch of items having the same key\n        \"\"\"\n        raise NotImplementedError(\n            'You must define a .process(self, batch) method on all GroupBy '\n            'nodes.'\n        )\n\n    def _process_item(self, item):\n        key = self.key(item)\n        if key != self._previous_key:\n            self._previous_key = key\n            if len(self._batch_) > 0:\n                self.process(self._batch_)\n            self._batch_ = [item]\n        else:\n            self._batch_.append(item)\n\n    def _end(self):\n        self.process(self._batch_)\n        self._batch_ = []\n\n    def __getattribute__(self, name):\n        \"\"\"\n        This should trap for the end() method calls and install\n        pre hook.\n        \"\"\"\n        if name == 'end':\n            def wrapper():\n                self._end()\n                return super(GroupByNode, 
self).__getattribute__(name)()\n            return wrapper\n        else:\n            return super(GroupByNode, self).__getattribute__(name)\n"
  },
  {
    "path": "consecution/pipeline.py",
    "content": "import sys\nfrom consecution.nodes import GroupByNode\n\n\nclass GlobalState(object):\n    \"\"\"\n    GlobalState is a simple container class that sets its attributes from\n    constructor kwargs.  It supports both object and dictionary access to its\n    attributes.  So, for example, all of the following statements are supported.\n\n    .. code-block:: python\n\n       from consecution import GlobalState\n\n       global_state = GlobalState(a=1, b=2)\n       global_state['c'] = 2\n       a = global_state['a']\n\n    An object of this class will be created as the default ``.global_state``\n    attribute on a Pipeline if you do not explicitely provide a global_state\n    argument to the constructor.\n    \"\"\"\n    # I'm using unconventional \"_item_self_\" name here to avoid\n    # conflicts when kwargs actually contain a \"self\" arg.\n\n    def __init__(_item_self, **kwargs):\n        for key, val in kwargs.items():\n            _item_self[key] = val\n\n    def __str__(_item_self):\n        quoted_keys = [\n            '\\'{}\\''.format(k) for k in sorted(vars(_item_self).keys())]\n        att_string = ', '.join(quoted_keys)\n        return 'GlobalState({})'.format(att_string)\n\n    def __repr__(_item_self):\n        return _item_self.__str__()\n\n    def __setitem__(_item_self, key, value):\n        setattr(_item_self, key, value)\n\n    def __getitem__(_item_self, key):\n        return getattr(_item_self, key)\n\n\nclass Pipeline(object):\n    \"\"\"\n    :type node: Node\n    :param node: Any node in a connected graph\n\n    :type global_state:  object\n    :param global_state: Any python object you want to use for holding global\n                         state.\n\n    Once Nodes have been wired together, they must be placed in a pipeline in\n    order to process data.  
If you would like to perform pipeline-level set-up and\n    tear-down logic, you can subclass from Pipeline and override the\n    ``.begin()`` and ``.end()`` methods.\n    \"\"\"\n    def __init__(self, node, global_state=None):\n        # get a reference to the top node of the connected nodes supplied.\n        self.top_node = node.top_node\n\n        # set the pipeline global state\n        if global_state:\n            self.global_state = global_state\n        else:\n            self.global_state = GlobalState()\n\n        # initialize an empty lookup for nodes\n        self._node_lookup = {}\n\n        # initialize the pipeline\n        self.initialize()\n\n    def initialize(self, with_push=False):\n        # define a flag to determine if the pipeline is \"running\" or not.\n        # it will only be True between when the .begin() method is run and\n        # when the .end() method is run.\n        self._is_running = False\n        self._needs_log_header = False\n\n        # initialize each node\n        for node in self.top_node.all_nodes:\n            self.initialize_node(node, with_push)\n\n        # build the pipeline repr by cycling through all the nodes\n        self.top_node.top_down_make_repr()\n\n        # print a logging header if any node is logging\n        if self._needs_log_header:\n            sys.stdout.write('node_log,what,node_name,item\\n')\n\n    def initialize_node(self, node, with_push=False):\n        # give node reference to pipeline attributes\n        node.pipeline = self\n        node.global_state = self.global_state\n\n        # make node available for lookup\n        self._node_lookup[node.name] = node\n\n        # set the _process callable to be either logged or unlogged\n        # TODO: might want to change this logic so that groupby nodes\n        # can be logged\n        if isinstance(node, GroupByNode):\n            node._process = node._process_item\n        elif node._logging is None:\n            node._process = node.process\n        
else:\n            self._needs_log_header = True\n            node._process = node._logged_process\n\n        # for single downstreams with no logging, we can short-circuit all\n        # logic and directly wire up the downstream process() callable as\n        # the push callable on this node\n        short_it = len(node._downstream_nodes) == 1\n        short_it = short_it and node._downstream_nodes[0]._logging is None\n        short_it = short_it and not isinstance(\n            node._downstream_nodes[0], GroupByNode)\n\n        # only initialize push if requested\n        if with_push:\n            if short_it and node._logging is None:\n                node.push = node._downstream_nodes[0].process\n\n            # logged or multiple downstreams require logic, so no short circuit\n            else:\n                node.push = node._push\n\n    def __getitem__(self, name):\n        node = self._node_lookup.get(name, None)\n        if node is None:\n            raise KeyError('No node named \\'{}\\''.format(name))\n        return node\n\n    def __setitem__(self, name_to_replace, replacement_node):\n        # make sure the replacement node has the proper name\n        if name_to_replace != replacement_node.name:\n            raise ValueError(\n                'Replacement node must have the same name.'\n            )\n\n        # this will automatically raise an error if the name doesn't exist\n        node_to_replace = self[name_to_replace]\n\n        removals = []\n        additions = []\n\n        for upstream in node_to_replace._upstream_nodes:\n            removals.append((upstream, node_to_replace))\n            additions.append((upstream, replacement_node))\n            # handle special case of upstream being a routing node\n            if hasattr(upstream, '_end_point_map'):\n                upstream._end_point_map[name_to_replace] = replacement_node\n\n        for downstream in node_to_replace._downstream_nodes:\n            removals.append((node_to_replace, 
downstream))\n            additions.append((replacement_node, downstream))\n\n        for upstream, downstream in removals:\n            upstream.remove_downstream(downstream)\n\n        for upstream, downstream in additions:\n            upstream.add_downstream(downstream)\n\n        # initialize the replacement node within the pipeline\n        self.initialize_node(replacement_node)\n\n        # if the top node was replaced then make sure the pipeline knows\n        # about it\n        if replacement_node.name == self.top_node.name:\n            self.top_node = replacement_node\n\n    def __getattribute__(self, name):\n        \"\"\"\n        This should trap for the begin(), end() and reset() method calls and\n        install pre/post hooks for when they are called either on the\n        pipeline class or on any class derived from it.\n        \"\"\"\n        if name == 'begin':\n            def wrapper():\n                super(Pipeline, self).__getattribute__(name)()\n                self._begin()\n            return wrapper\n        elif name == 'end':\n            def wrapper():\n                self._end()\n                return super(Pipeline, self).__getattribute__(name)()\n            return wrapper\n        elif name == 'reset':\n            def wrapper():\n                self._reset()\n                return super(Pipeline, self).__getattribute__(name)()\n            return wrapper\n        else:\n            return super(Pipeline, self).__getattribute__(name)\n\n    def begin(self):\n        \"\"\"\n        Override this method to execute any logic you want to perform before\n        setting up nodes.  The ``.begin()`` method of all nodes will be called.\n        \"\"\"\n\n    def end(self):\n        \"\"\"\n        Override this method to execute any logic you want to perform after\n        all nodes are done processing data. 
The ``.end()`` method of all nodes\n        will be called.\n        \"\"\"\n\n    def reset(self):\n        \"\"\"\n        Override this with any logic you'd like to perform for resetting the\n        pipeline. The ``.reset()`` method of all nodes will be called.\n        \"\"\"\n\n    def _reset(self):\n        self.top_node.top_down_call('reset')\n\n    def _begin(self):\n        self.top_node.top_down_call('_begin')\n        self.initialize(with_push=True)\n        self._is_running = True\n\n    def _end(self):\n        self.top_node.top_down_call('end')\n        self._is_running = False\n\n    def push(self, item):\n        \"\"\"\n        You can manually push items to your pipeline using this method.\n\n        :type item: object\n        :param item: Any object you would like the pipeline to process\n        \"\"\"\n        if not self._is_running:\n            self.begin()\n        self.top_node._process(item)\n\n    def consume(self, iterable):\n        \"\"\"\n        The pipeline will process each item in the iterable.\n\n        :type iterable: A Python Iterable\n        :param iterable: An iterable of objects you would like to process\n        \"\"\"\n        self.begin()\n        for item in iterable:\n            self.top_node._process(item)\n        return self.end()\n\n    def plot(self, file_name='pipeline', kind='png'):\n        \"\"\"\n        Call this method to produce a visualization of your pipeline.  The\n        Graphviz library will be used to generate the image file.  
Note that\n        pipelines are automatically visualized in IPython notebook when they are\n        evaluated as the last expression in a cell.\n\n        :type file_name: str\n        :param file_name: The name of the image file to save\n\n        :type kind: str\n        :param kind: The type of image file to produce (png, pdf)\n        \"\"\"\n        self.top_node.plot(file_name, kind)\n        return self\n\n    def __str__(self):\n        return (\n            '\\nPipeline\\n'\n            '----------------------------------'\n            '----------------------------------\\n{}'\n            '----------------------------------'\n            '----------------------------------\\n'\n        ).format(self._node_repr)\n\n    def __repr__(self):\n        return self.__str__()\n\n    # No good way to test this unless you know dot is installed.\n    def _repr_svg_(self):  # pragma: no cover\n        return self.top_node._build_pydot_graph()._repr_svg_()\n"
  },
  {
    "path": "consecution/tests/__init__.py",
    "content": ""
  },
  {
    "path": "consecution/tests/nodes_tests.py",
    "content": "import os\nfrom collections import namedtuple\nimport shutil\nimport tempfile\nfrom unittest import TestCase\nimport subprocess\n\nfrom mock import patch\n\nfrom consecution.nodes import Node\n\n\ndef dot_installed():\n    p = subprocess.Popen(\n        ['bash', '-c', 'which dot'], stdout=subprocess.PIPE)\n    p.wait()\n    result = p.stdout.read().decode(\"utf-8\")\n    return 'dot' in result\n\n\nclass FakeDigraph(object):  # pragma: no cover\n    def __init__(self, *args, **kwargs):\n        pass\n\n    def node(self, *args, **kwargs):\n        pass\n\n    def edge(self, *args, **kwargs):\n        pass\n\n    def render(self, *args, **kwargs):\n        raise RuntimeError('fake runtime error')\n\n\nclass NodeUnitTests(TestCase):\n    def test_bad_logging_args(self):\n        n = Node('a')\n        with self.assertRaises(ValueError):\n            n.log('bad')\n\n    def test_bad_top_down_make_repr_call(self):\n        n = Node('a')\n        with self.assertRaises(ValueError):\n            n.top_down_make_repr()\n\n    def test_args_as_atts(self):\n        n = Node('my_node', silly_attribute='silly')\n        self.assertEqual(n.silly_attribute, 'silly')\n\n    def test_comparisons(self):\n        a = Node('a')\n        b = Node('b')\n\n        self.assertTrue(a == a)\n        self.assertFalse(a == b)\n\n        self.assertTrue(a < b)\n        self.assertFalse(b < a)\n\n    def test_bad_flattening(self):\n        a = Node('a')\n        with self.assertRaises(ValueError):\n            a | 7\n\n    @patch(\n        'consecution.nodes.Node._build_pydot_graph', lambda a: FakeDigraph())\n    def test_graphviz_not_installed(self):\n        a = Node('a')\n        b = Node('b')\n        p = a | b\n        with self.assertRaises(RuntimeError):\n            p.plot()\n\n    def test_no_getitem(self):\n        a = Node('a')\n        with self.assertRaises(ValueError):\n            a['b']\n\n    def test_bad_slot_name(self):\n        a = Node('a')\n        b = 
Node('b')\n        with self.assertRaises(ValueError):\n            a._get_exposed_slots(b, 'bad_arg')\n\n\nclass ExplicitWiringTests(TestCase):\n    def setUp(self):\n        self.temp_dir = tempfile.mkdtemp()\n\n    def tearDown(self):\n        shutil.rmtree(self.temp_dir)\n\n    def do_wiring(self):\n        self.do_explicit_wiring()\n\n    def do_explicit_wiring(self):\n        # define nodes\n        a = Node('a')\n        b = Node('b')\n        c = Node('c')\n        d = Node('d')\n        e = Node('e')\n        f = Node('f')\n        g = Node('g')\n        h = Node('h')\n        i = Node('i')\n        j = Node('j')\n        k = Node('k')\n        l = Node('l') # noqa.  okay to use l as var here\n        m = Node('m')\n        n = Node('n')\n\n        # save a list of all nodes\n        self.node_list = [a, b, c, d, e, f, g, h, i, j, k, l, m, n]\n        self.top_node = a\n\n        # wire up the nodes\n        a.add_downstream(b)\n        a.add_downstream(c)\n\n        c.add_downstream(d)\n        c.add_downstream(e)\n\n        e.add_downstream(f)\n        e.add_downstream(g)\n        e.add_downstream(h)\n        e.add_downstream(i)\n\n        f.add_downstream(j)\n        g.add_downstream(j)\n        h.add_downstream(j)\n        i.add_downstream(j)\n\n        d.add_downstream(k)\n        j.add_downstream(k)\n\n        b.add_downstream(l)\n        k.add_downstream(l)\n\n        l.add_downstream(m)\n        l.add_downstream(n)\n\n        # same network in graph notation\n        # a | [\n        #    b,\n        #    c | [\n        #            d,\n        #            e | [f, g, h, i] | j\n        #    ] | k\n        # ] | l | [m, n]\n\n    def do_graph_wiring(self):\n        # define nodes\n        a = Node('a')\n        b = Node('b')\n        c = Node('c')\n        d = Node('d')\n        e = Node('e')\n        f = Node('f')\n        g = Node('g')\n        h = Node('h')\n        i = Node('i')\n        j = Node('j')\n        k = Node('k')\n        l 
= Node('l') # noqa.  okay to use l as var here\n        m = Node('m')\n        n = Node('n')\n\n        # save a list of all nodes\n        self.node_list = [a, b, c, d, e, f, g, h, i, j, k, l, m, n]\n        self.top_node = a\n\n        a | [  # noqa\n               b,\n               c | [\n                       d,\n                       e | [f, g, h, i] | j\n                   ] | k\n            ] | l | [m, n]\n\n    def test_connections(self):\n        Conns = namedtuple('Conns', 'node upstreams downstreams')\n        self.do_wiring()\n        n = {\n            node.name: Conns(\n                node.name,\n                {u.name for u in node._upstream_nodes},\n                {d.name for d in node._downstream_nodes}\n            )\n            for node in self.node_list\n        }\n        self.assertEqual(n['a'].upstreams, set())\n        self.assertEqual(n['a'].downstreams, {'b', 'c'})\n\n        self.assertEqual(n['b'].upstreams, {'a'})\n        self.assertEqual(n['b'].downstreams, {'l'})\n\n        self.assertEqual(n['c'].upstreams, {'a'})\n        self.assertEqual(n['c'].downstreams, {'d', 'e'})\n\n        self.assertEqual(n['e'].upstreams, {'c'})\n        self.assertEqual(n['e'].downstreams, {'f', 'g', 'h', 'i'})\n\n        self.assertEqual(n['f'].upstreams, {'e'})\n        self.assertEqual(n['f'].downstreams, {'j'})\n\n        self.assertEqual(n['g'].upstreams, {'e'})\n        self.assertEqual(n['g'].downstreams, {'j'})\n\n        self.assertEqual(n['h'].upstreams, {'e'})\n        self.assertEqual(n['h'].downstreams, {'j'})\n\n        self.assertEqual(n['i'].upstreams, {'e'})\n        self.assertEqual(n['i'].downstreams, {'j'})\n\n        self.assertEqual(n['d'].upstreams, {'c'})\n        self.assertEqual(n['d'].downstreams, {'k'})\n\n        self.assertEqual(n['j'].upstreams, {'f', 'g', 'h', 'i'})\n        self.assertEqual(n['j'].downstreams, {'k'})\n\n        self.assertEqual(n['k'].upstreams, {'j', 'd'})\n        
self.assertEqual(n['k'].downstreams, {'l'})\n\n        self.assertEqual(n['l'].upstreams, {'k', 'b'})\n        self.assertEqual(n['l'].downstreams, {'m', 'n'})\n\n    def test_all_nodes(self):\n        self.do_wiring()\n        expected_set = set(self.node_list)\n        all_nodes_set = [\n            set(node.all_nodes) for node in self.node_list\n        ]\n        self.assertTrue(all(\n            [expected_set == found_set for found_set in all_nodes_set]))\n\n    def test_top_node(self):\n        self.do_wiring()\n        top_node_set = {node.top_node for node in self.node_list}\n        self.assertEqual(top_node_set, {self.top_node})\n\n    def test_duplicate_node(self):\n        self.do_wiring()\n\n        # this test is funky in that it has assertions in a loop,\n        # but I wanted to be sure duplicates are detected everywhere\n        for name in [n.name for n in self.top_node.all_nodes]:\n            dup = Node(name)\n            with self.assertRaises(ValueError):\n                self.top_node.add_downstream(dup)\n\n    def test_acyclic(self):\n        self.do_wiring()\n\n        # this test is funky in that it has assertions in a loop,\n        # but I wanted to be sure cycles are detected everywhere\n        for node in self.top_node.all_nodes:\n            with self.assertRaises(ValueError):\n                node.add_downstream(self.top_node)\n\n    def test_multi_root(self):\n        self.do_wiring()\n        other_root = Node('dual_root')\n        other_root.add_downstream(self.top_node._downstream_nodes[0])\n\n        with self.assertRaises(ValueError):\n            other_root.top_node\n\n    def test_non_node_connect(self):\n        node = Node('a')\n        other = 'not a node'\n        with self.assertRaises(ValueError):\n            node.add_downstream(other)\n\n    def test_write(self):\n        # don't run coverage on this because travis won't be tested\n        # with both dot installed and not installed.\n        if dot_installed():  # pragma: no 
cover\n            self.do_wiring()\n            out_file = os.path.join(self.temp_dir, 'out.png')\n            self.top_node.plot(out_file)\n            # copy the output file to /tmp in case you want to look at it\n            os.system('cp {} /tmp'.format(out_file))\n\n    def test_write_bad_kind(self):\n        self.do_wiring()\n        with self.assertRaises(ValueError):\n            self.top_node.plot(kind='bad')\n\n    def test_bad_search_direction(self):\n        self.do_wiring()\n        with self.assertRaises(ValueError):\n            self.top_node.breadth_first_walk(direction='bad')\n\n    def test_bad_search_method(self):\n        self.do_wiring()\n        with self.assertRaises(ValueError):\n            self.top_node.walk(how='bad')\n\n\nclass DSLWiringTests(ExplicitWiringTests):\n    def do_wiring(self):\n        self.do_graph_wiring()\n\n\nclass TopDownCallTests(TestCase):\n    def test_call_order_okay(self):\n\n        # a toy class that holds a class variable\n        # tracking what order objects get called in\n        class MyNode(Node):\n            call_list = []\n\n            def end(self):\n                self.__class__.call_list.append(self)\n\n        a = MyNode('a')\n        b = MyNode('b')\n        c = MyNode('c')\n        d = MyNode('d')\n        e = MyNode('e')\n        f = MyNode('f')\n        g = MyNode('g')\n\n        a | [\n            b | c,\n            d | e | f\n        ] | g\n        a.top_node.top_down_call('end')\n\n        # make a dictionary with the order in which nodes\n        # were called\n        call_number = {\n            node: ind for (ind, node) in enumerate(a.__class__.call_list)}\n\n        # make sure ordering of one branch is right\n        self.assertTrue(call_number[a] < call_number[b])\n        self.assertTrue(call_number[b] < call_number[c])\n        self.assertTrue(call_number[c] < call_number[g])\n\n        # make sure ordering of other branch is okay\n        self.assertTrue(call_number[a] < call_number[d])\n 
       self.assertTrue(call_number[d] < call_number[e])\n        self.assertTrue(call_number[e] < call_number[f])\n        self.assertTrue(call_number[f] < call_number[g])\n\n\nclass BreadthFirstSearchTests(TestCase):\n    def test_top_down_order(self):\n        a = Node('a')\n        b = Node('b')\n        c = Node('c')\n        d = Node('d')\n        e = Node('e')\n        f = Node('f')\n        h = Node('h')\n        i = Node('i')\n\n        def silly_router(item):  # pragma: no cover\n            return 0\n\n        a | [b, c] | [d, e, f, silly_router] | [h, i]\n        nodes = a.top_node.breadth_first_walk(\n            direction='down', as_ordered_list=True)\n        level5 = {nodes.pop() for nn in range(2)}\n        level4 = {nodes.pop() for nn in range(3)}\n        level3 = {nodes.pop() for nn in range(2)}\n        level2 = {nodes.pop() for nn in range(2)}\n        level1 = {nodes.pop() for nn in range(1)}\n\n        self.assertEqual(level1, {a})\n        self.assertEqual(level2, {b, c})\n        self.assertEqual(len(level3), 2)\n        self.assertEqual(level4, {d, e, f})\n        self.assertEqual(level5, {h, i})\n\n    def test_bottom_up_order(self):\n        a = Node('a')\n        b = Node('b')\n        c = Node('c')\n        d = Node('d')\n        e = Node('e')\n        f = Node('f')\n        h = Node('h')\n\n        def silly_router(item):  # pragma: no cover\n            return 0\n\n        a | [b, c] | [d, e, f, silly_router] | h\n        nodes = h.breadth_first_walk(direction='up', as_ordered_list=True)\n        nodes = nodes[::-1]\n        level5 = {nodes.pop() for nn in range(1)}\n        level4 = {nodes.pop() for nn in range(3)}\n        level3 = {nodes.pop() for nn in range(2)}\n        level2 = {nodes.pop() for nn in range(2)}\n        level1 = {nodes.pop() for nn in range(1)}\n\n        self.assertEqual(level1, {a})\n        self.assertEqual(level2, {b, c})\n        self.assertEqual(len(level3), 2)\n        self.assertEqual(level4, {d, e, 
f})\n        self.assertEqual(level5, {h})\n\n\nclass PrintingTests(TestCase):\n    def setUp(self):\n        # define nodes\n        a = Node('a')\n        b = Node('b')\n        c = Node('c')\n        d = Node('d')\n        e = Node('e')\n        f = Node('f')\n        g = Node('g')\n        h = Node('h')\n        i = Node('i')\n        j = Node('j')\n        k = Node('k')\n        l = Node('l') # noqa okay to use l here\n        m = Node('m')\n        n = Node('n')\n\n        class DummyPipeline(object):\n            pass\n\n        pipeline = DummyPipeline()\n\n        # save a list of all nodes\n        self.node_list = [a, b, c, d, e, f, g, h, i, j, k, l, m, n]\n        self.top_node = a\n\n        def my_router(item):  # pragma: no cover\n            return 'm'\n\n        # wire up nodes using dsl\n        a | [\n               b,  # noqa\n               c | [\n                       d,\n                       e | [f, g, h, i] | j\n                   ] | k\n            ] | l | [m, n, my_router]\n\n        for node in self.top_node.all_nodes:\n            node.pipeline = pipeline\n\n    def test_nothing(self):\n        self.top_node.top_down_make_repr()\n        lines = sorted([\n            line.strip()\n            for line in self.top_node.pipeline._node_repr.split('\\n')\n            if line.strip()\n        ])\n        expected_lines = sorted([\n            'a | [b, c]',\n            'b | l',\n            'c | [d, e]',\n            'd | k',\n            'e | [f, g, h, i]',\n            'f | j',\n            'g | j',\n            'h | j',\n            'i | j',\n            'j | k',\n            'k | l',\n            'l | l.my_router',\n            'l.my_router | [m, n]',\n        ])\n        self.assertEqual(lines, expected_lines)\n\n\nclass RoutingTests(TestCase):\n    def test_nothing(self):\n        a = Node('a')\n        b = Node('b')\n        c = Node('c')\n        d = Node('d')\n        e = Node('e')\n\n        def silly_router(item):  # pragma: no 
cover\n            return 0\n\n        class ClassRouter(object):  # pragma: no cover\n            def __call__(self, arg):\n                return arg\n\n        a | [b, c, ClassRouter()] | [d, e, silly_router]\n"
  },
  {
    "path": "consecution/tests/pipeline_tests.py",
    "content": "from __future__ import print_function\nfrom collections import namedtuple, Counter\nfrom unittest import TestCase\nfrom consecution.nodes import Node, GroupByNode\nfrom consecution.pipeline import Pipeline, GlobalState\nfrom consecution.tests.testing_helpers import print_catcher\n\nItem = namedtuple('Item', 'value parent source')\n\n\nclass Item(object):  # pragma: no cover (just a testing helper)\n    def __init__(self, value, parent, source):\n        self.value = value\n        self.parent = parent\n        self.source = source\n\n    def build_source_list(self, source_list=None):\n        source_list = [] if source_list is None else source_list\n        source_list.append(self.source)\n        if self.parent:\n            self.parent.build_source_list(source_list)\n        return source_list\n\n    def get_path_string(self):\n        return '|'.join([str(self.value)] + self.build_source_list()[::-1])\n\n    def __str__(self):\n        return self.get_path_string()\n\n    def __repr__(self):\n        return self.get_path_string()\n\n\nclass TestNode(Node):\n    def process(self, item):\n        self.push(\n            Item(value=item.value, parent=item, source=self.name)\n        )\n\n\nclass ResultNode(Node):\n    def process(self, item):\n        self.global_state.final_items.append(item)\n\n\nclass BadNode(Node):\n    def begin(self):\n        self.push(1)\n\n    def process(self, item):  # pragma: no cover  this should never get hit.\n        self.push(item)\n\n\ndef item_generator():\n    for ind in range(1, 3):\n        yield Item(\n            value=ind,\n            parent=None,\n            source='generator'\n        )\n\n\nclass TestBase(TestCase):\n    def setUp(self):\n        a = TestNode('a')\n        b = TestNode('b')\n        c = TestNode('c')\n        d = TestNode('d')\n        even = TestNode('even')\n        odd = TestNode('odd')\n        g = TestNode('g')\n\n        def even_odd(item):\n            return ['even', 
'odd'][item.value % 2]\n\n        a | b | [c, d] | [even, odd, even_odd] | g\n\n        self.pipeline = Pipeline(a, global_state=GlobalState(final_items=[]))\n\n\nclass GlobalStateUnitTests(TestCase):\n    def test_kwargs_passed(self):\n        g = GlobalState(custom_name='custom')\n        p = Pipeline(TestNode('a'), global_state=g)\n        self.assertTrue(p.global_state.custom_name == 'custom')\n        self.assertTrue(p.global_state['custom_name'] == 'custom')\n\n    def test_printing(self):\n        g = GlobalState(custom_name='custom')\n        with print_catcher() as catcher1:\n            print(g)\n\n        with print_catcher() as catcher2:\n            print(repr(g))\n\n        self.assertTrue(\n            'GlobalState(\\'custom_name\\')' in catcher1.txt)\n        self.assertTrue(\n            'GlobalState(\\'custom_name\\')' in catcher2.txt)\n\n\nclass OrOpTests(TestCase):\n    def test_ror(self):\n        a = Node('a')\n        b = Node('b')\n        c = Node('c')\n        d = Node('d')\n\n        p = Pipeline(a | ([b, c] | d))\n        with print_catcher() as catcher:\n            print(p)\n        self.assertTrue('a | [b, c]' in catcher.txt)\n        self.assertTrue('c | d' in catcher.txt)\n        self.assertTrue('b | d' in catcher.txt)\n\n\nclass ManualFeedTests(TestCase):\n    def test_manual_feed(self):\n\n        class N(Node):\n            def begin(self):\n                self.global_state.out_list = []\n\n            def process(self, item):\n                self.global_state.out_list.append(item)\n\n        pipeline = Pipeline(TestNode('a') | N('b'))\n        pushed_list = []\n        for item in item_generator():\n            pushed_list.append(item)\n            pipeline.push(item)\n        pipeline.end()\n        self.assertEqual(len(pipeline.global_state.out_list), 2)\n\n\nclass PipelineUnitTests(TestCase):\n    def test_push_in_begin(self):\n        pipeline = Pipeline(BadNode('a') | TestNode('b'))\n        with 
self.assertRaises(AttributeError):\n            pipeline.begin()\n\n    def test_no_process(self):\n        class N(Node):\n            pass\n\n        pipe = Pipeline(N('a') | N('b'))\n        with self.assertRaises(NotImplementedError):\n            pipe.consume(range(3))\n\n    def test_bad_route(self):\n        def bad_router(item):\n            return 'bad'\n\n        class N(Node):\n            def process(self, item):\n                self.push(item)\n\n        pipeline = Pipeline(N('a') | [N('b'), N('c'), bad_router])\n\n        with self.assertRaises(ValueError):\n            pipeline.consume(range(3))\n\n    def test_bad_node_lookup(self):\n        pipeline = Pipeline(TestNode('a') | TestNode('b'))\n\n        with self.assertRaises(KeyError):\n            pipeline['c']\n\n    def test_bad_replacement_name(self):\n        pipeline = Pipeline(TestNode('a') | TestNode('b'))\n        with self.assertRaises(ValueError):\n            pipeline['b'] = TestNode('c')\n\n    def test_flattened_list(self):\n        pipeline = Pipeline(\n            TestNode('a') | [[Node('b'), Node('c')]])\n\n        with print_catcher() as catcher:\n            print(pipeline)\n\n        self.assertTrue('a | [b, c]' in catcher.txt)\n\n    def test_logging(self):\n        pipeline = Pipeline(TestNode('a') | TestNode('b'))\n        pipeline['a'].log('output')\n        pipeline['b'].log('input')\n        with print_catcher() as catcher:\n            pipeline.consume(item_generator())\n        text = \"\"\"\n            node_log,what,node_name,item\n            node_log,output,a,1|generator|a\n            node_log,input,b,1|generator|a\n            node_log,output,a,2|generator|a\n            node_log,input,b,2|generator|a\n        \"\"\"\n        for line in text.split('\\n'):\n            self.assertTrue(line.strip() in catcher.txt)\n\n    def test_reset(self):\n        class N(Node):\n            def begin(self):\n                self.was_reset = False\n\n            def 
process(self, item):\n                self.push(item)\n\n            def reset(self):\n                self.was_reset = True\n\n        pipe = Pipeline(N('a') | N('b'))\n        pipe.consume(range(3))\n        self.assertFalse(pipe['a'].was_reset)\n        self.assertFalse(pipe['b'].was_reset)\n\n        pipe.reset()\n\n        self.assertTrue(pipe['a'].was_reset)\n        self.assertTrue(pipe['b'].was_reset)\n\n\nclass LoggingTests(TestBase):\n    def test_logging(self):\n        self.pipeline['g'].log('input')\n\n        with print_catcher() as printer:\n            self.pipeline.consume(item_generator())\n\n        counter = Counter()\n        for line in printer.lines():\n            even_odd = line.split('|')[-1]\n            counter.update({even_odd: 1})\n        self.assertEqual(counter['even'], 2)\n        self.assertEqual(counter['odd'], 2)\n\n\nclass ReplacementTests(TestBase):\n    def test_replace_first(self):\n        class Replacement(Node):\n            def process(self, item):\n                self.push(\n                    Item(value=10 * item.value, parent=item, source=self.name)\n                )\n\n        self.pipeline['a'] = Replacement('a')\n        self.pipeline['a'].log('output')\n\n        with print_catcher() as printer:\n            self.pipeline.consume(item_generator())\n        self.assertEqual(printer.txt.count('10'), 1)\n        self.assertEqual(printer.txt.count('20'), 1)\n\n    def test_replace_even(self):\n        class Replacement(Node):\n            def process(self, item):\n                self.push(\n                    Item(value=10 * item.value, parent=item, source=self.name)\n                )\n\n        self.pipeline['even'] = Replacement('even')\n        self.pipeline['g'].log('output')\n\n        with print_catcher() as printer:\n            self.pipeline.consume(item_generator())\n        self.assertEqual(printer.txt.count('1'), 2)\n        self.assertEqual(printer.txt.count('20'), 2)\n\n    def 
test_replace_no_router(self):\n        a = TestNode('a')\n        b = TestNode('b')\n        pipe = Pipeline(a | b)\n        pipe['b'] = TestNode('b')\n        with print_catcher() as catcher:\n            print(pipe)\n        self.assertTrue('a | b' in catcher.txt)\n\n\nclass ConsumingTests(TestBase):\n    def test_even_odd(self):\n        self.pipeline['g'].add_downstream(\n            ResultNode('result_node')\n        )\n\n        self.pipeline.consume(item_generator())\n\n        expected_path_set = set([\n            '1|generator|a|b|c|odd|g',\n            '1|generator|a|b|d|odd|g',\n            '2|generator|a|b|c|even|g',\n            '2|generator|a|b|d|even|g',\n        ])\n        path_set = set(\n            item.get_path_string() for item in\n            self.pipeline.global_state.final_items\n        )\n        self.assertEqual(expected_path_set, path_set)\n\n\nclass ConstructingTests(TestBase):\n    def test_printing(self):\n        lines = repr(self.pipeline).split('\\n')\n        self.assertEqual(len(lines), 13)\n\n    def test_plotting(self):\n        # don't want to force a mock dependency, so make a simple mock here\n        args_kwargs = []\n\n        def return_calls(*args, **kwargs):\n            args_kwargs.append(args)\n            args_kwargs.append(kwargs)\n\n        # assign my mock to the top node plot function\n        self.pipeline.top_node.plot = return_calls\n\n        # call pipeline plot\n        self.pipeline.plot()\n\n        # make sure top node plot was properly called\n        self.assertEqual(args_kwargs[0], ('pipeline', 'png'))\n        self.assertEqual(args_kwargs[1], {})\n\n\nclass Batch(GroupByNode):\n    def begin(self):\n        self.global_state.batches = []\n\n    def key(self, item):\n        return item // 3\n\n    def process(self, batch):\n        self.global_state.batches.append(batch)\n\n\nclass GroupByTests(TestCase):\n    def test_batching(self):\n        pipe = Pipeline(Batch('a'))\n        
pipe.consume(range(9))\n        self.assertEqual(\n            pipe.global_state.batches,\n            [[0, 1, 2], [3, 4, 5], [6, 7, 8]]\n        )\n\n    def test_undefined_key(self):\n        class B(GroupByNode):\n            def process(self, item):  # pragma: no cover\n                pass\n\n        pipe = Pipeline(B('a'))\n\n        with self.assertRaises(NotImplementedError):\n            pipe.consume(range(9))\n\n    def test_undefined_process(self):\n        class B(GroupByNode):\n            def key(self, item):\n                pass\n\n        pipe = Pipeline(B('a'))\n\n        with self.assertRaises(NotImplementedError):\n            pipe.consume(range(9))\n"
  },
  {
    "path": "consecution/tests/testing_helpers.py",
    "content": "import sys\nfrom contextlib import contextmanager\n\n\n# These don't need to covered.  They are just tesing utilities\n@contextmanager\ndef print_catcher(buff='stdout'):  # pragma: no cover\n    if buff == 'stdout':\n        sys.stdout = Printer()\n        yield sys.stdout\n        sys.stdout = sys.__stdout__\n    elif buff == 'stderr':\n        sys.stderr = Printer()\n        yield sys.stderr\n        sys.stderr = sys.__stderr__\n    else:  # pragma: no cover  This is just to help testing. No need to cover.\n        raise ValueError('buff must be either \\'stdout\\' or \\'stderr\\'')\n\n\nclass Printer(object):  # pragma: no cover\n    def __init__(self):\n        self.txt = \"\"\n\n    def write(self, txt):\n        self.txt += txt\n\n    def lines(self):\n        for line in self.txt.split('\\n'):\n            yield line.strip()\n"
  },
  {
    "path": "consecution/tests/utils_tests.py",
    "content": "from __future__ import print_function\n\nfrom unittest import TestCase\nfrom consecution.utils import Clock\nimport time\nfrom consecution.tests.testing_helpers import print_catcher\n\n\nclass ClockTests(TestCase):\n    def test_bad_start(self):\n        clock = Clock()\n        with self.assertRaises(ValueError):\n            clock.start()\n\n    def test_printing(self):\n        clock = Clock()\n        with clock.running('a', 'b', 'c'):\n            with clock.paused('a'):\n                time.sleep(.1)\n                with clock.paused('b'):\n                    time.sleep(.1)\n\n        with print_catcher() as printer:\n            print(repr(clock))\n\n        names = []\n        for ind, line in enumerate(printer.txt.split('\\n')):\n            if line:\n                if ind > 0:\n                    names.append(line.split()[-1])\n\n        self.assertEqual(names, ['c', 'b', 'a'])\n\n    def test_get_time_of_running(self):\n        clock = Clock()\n        with clock.running('a'):\n            time.sleep(.1)\n            delta1 = int(10 * clock.get_time())\n            time.sleep(.1)\n        delta2 = int(10 * clock.get_time())\n        self.assertEqual(delta1, 1)\n        self.assertEqual(delta2, 2)\n\n    def test_pausing(self):\n        clock = Clock()\n\n        with clock.running('a', 'b', 'c'):\n            time.sleep(.1)\n            with clock.paused('b', 'c'):\n                time.sleep(.1)\n\n        self.assertEqual(int(10 * clock.get_time('a')), 2)\n        self.assertEqual(int(10 * clock.get_time('b')), 1)\n        self.assertEqual(int(10 * clock.get_time('c')), 1)\n        self.assertEqual(\n            {int(10 * v) for v in clock.get_time().values()},\n            {1, 2}\n        )\n\n    def test_stop_all(self):\n        clock = Clock()\n        clock.start('a', 'b')\n        time.sleep(.1)\n        clock.stop()\n        self.assertEqual(int(10 * clock.get_time('a')), 1)\n        self.assertEqual(int(10 * 
clock.get_time('b')), 1)\n\n    def test_reset_all(self):\n        clock = Clock()\n        clock.start('a', 'b')\n        time.sleep(.1)\n        clock.stop('b')\n        self.assertEqual(len(clock.delta), 1)\n        clock.reset()\n        self.assertEqual(len(clock.get_time()), 0)\n\n    def test_double_calls(self):\n        clock = Clock()\n        clock.start('a')\n        clock.start('a')\n        time.sleep(.1)\n        clock.stop('a')\n        clock.stop('a')\n        self.assertEqual(int(round(10 * clock.get_time())), 1)\n        clock.reset('a')\n        clock.reset('a')\n        clock.reset('b')\n        clock.reset('b')\n        self.assertEqual(clock.get_time(), {})\n\n    def test_get_time_delta_only(self):\n        clock = Clock()\n        clock.start('a')\n        clock.stop('a')\n        self.assertEqual(clock.get_time('f'), {})\n"
  },
  {
    "path": "consecution/utils.py",
    "content": "from collections import Counter\nfrom contextlib import contextmanager\nimport datetime\n\n\nclass Clock(object):\n    def __init__(self):\n        # see the reset method for instance attributes\n        self.delta = Counter()\n        self.active_start_times = dict()\n\n    @contextmanager\n    def running(self, *names):\n        self.start(*names)\n        yield\n        self.stop(*names)\n\n    @contextmanager\n    def paused(self, *names):\n        self.stop(*names)\n        yield\n        self.start(*names)\n\n    def start(self, *names):\n        if not names:\n            raise ValueError('You must provide at least one name to start')\n\n        for name in names:\n            if name not in self.active_start_times:\n                self.active_start_times[name] = datetime.datetime.now()\n\n    def stop(self, *names):\n        ending = datetime.datetime.now()\n        if not names:\n            names = list(self.active_start_times.keys())\n        for name in names:\n            if name in self.active_start_times:\n                starting = self.active_start_times.pop(name)\n                self.delta.update({name: (ending - starting).total_seconds()})\n\n    def reset(self, *names):\n        if not names:\n            names = list(self.active_start_times.keys())\n            names.extend(list(self.delta.keys()))\n        for name in names:\n            if name in self.delta:\n                self.delta.pop(name)\n            if name in self.active_start_times:\n                self.active_start_times.pop(name)\n\n    def get_time(self, *names):\n        ending = datetime.datetime.now()\n        if not names:\n            names = list(self.delta.keys())\n            names.extend(list(self.active_start_times.keys()))\n\n        delta = Counter()\n        for name in names:\n            if name in self.delta:\n                delta.update({name: self.delta[name]})\n            elif name in self.active_start_times:\n                
delta.update(\n                    {\n                        name: (\n                            ending - self.active_start_times[name]\n                        ).total_seconds()\n                    }\n                )\n        if len(delta) == 1:\n            return delta[list(delta.keys())[0]]\n        else:\n            return dict(delta)\n\n    def __str__(self):\n        records = sorted(self.delta.items(), key=lambda t: t[1], reverse=True)\n        records = [('%0.6f' % r[1], r[0]) for r in records]\n\n        out_list = ['{: <15s}{}'.format('seconds', 'name')]\n\n        for rec in records:\n            out_list.append('{: <15s}{}'.format(*rec))\n\n        return '\\n'.join(out_list)\n\n    def __repr__(self):\n        return self.__str__()\n"
  },
  {
    "path": "docker/Dockerfile",
    "content": "FROM ubuntu:xenial\n\n# root is the home directory\nWORKDIR /root\n\nADD simple_example.py /root/simple_example.py\n\n# set up the system tools including conda\nRUN \\\n    rm /bin/sh && ln -s /bin/bash /bin/sh && \\\n    apt-get update && \\\n    apt-get install -y vim && \\\n    apt-get install -y git  && \\\n    apt-get install -y wget && \\\n    apt-get install -y curl && \\\n    apt-get install -y graphviz && \\\n    apt-get install -y python-dev\n\nRUN \\\n    curl -sS https://bootstrap.pypa.io/get-pip.py | python\n\nRUN \\\n    pip install git+https://github.com/robdmc/consecution.git\n"
  },
  {
    "path": "docker/docker_build.sh",
    "content": "#! /usr/bin/env bash\n\ndocker build . -t consecution\n"
  },
  {
    "path": "docker/docker_run.sh",
    "content": "#! /usr/bin/env bash\n\ndocker run -it  --rm -v  $(pwd):/root/shared consecution /bin/bash\n"
  },
  {
    "path": "docker/simple_example.py",
    "content": "#! /usr/bin/env python\n\n# TODO: make the consecution install in the docker file read from pip\nfrom __future__ import print_function\n\nfrom consecution import Node, Pipeline\n\n\nclass N(Node):\n    def process(self, item):\n        print(item, self.name)\n        self.push(item)\n\n\np = Pipeline(\n    N('a') | [N('b'), N('c')] | N('d')\n)\np.plot()\n\np.consume(range(5))\n"
  },
  {
    "path": "docs/Makefile",
    "content": "# Makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line.\nSPHINXOPTS    =\nSPHINXBUILD   = sphinx-build\nPAPER         =\nBUILDDIR      = _build\n\n# User-friendly check for sphinx-build\nifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)\n$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)\nendif\n\n# Internal variables.\nPAPEROPT_a4     = -D latex_paper_size=a4\nPAPEROPT_letter = -D latex_paper_size=letter\nALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .\n# the i18n builder cannot share the environment and doctrees with the others\nI18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .\n\n.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp epub latex latexpdf text man changes linkcheck doctest gettext\n\nhelp:\n\t@echo \"Please use \\`make <target>' where <target> is one of\"\n\t@echo \"  html       to make standalone HTML files\"\n\t@echo \"  dirhtml    to make HTML files named index.html in directories\"\n\t@echo \"  singlehtml to make a single large HTML file\"\n\t@echo \"  pickle     to make pickle files\"\n\t@echo \"  json       to make JSON files\"\n\t@echo \"  htmlhelp   to make HTML files and a HTML help project\"\n\t@echo \"  epub       to make an epub\"\n\t@echo \"  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter\"\n\t@echo \"  latexpdf   to make LaTeX files and run them through pdflatex\"\n\t@echo \"  latexpdfja to make LaTeX files and run them through platex/dvipdfmx\"\n\t@echo \"  text       to make text files\"\n\t@echo \"  man        to make manual pages\"\n\t@echo \"  texinfo    to make Texinfo files\"\n\t@echo \"  info       to 
make Texinfo files and run them through makeinfo\"\n\t@echo \"  gettext    to make PO message catalogs\"\n\t@echo \"  changes    to make an overview of all changed/added/deprecated items\"\n\t@echo \"  xml        to make Docutils-native XML files\"\n\t@echo \"  pseudoxml  to make pseudoxml-XML files for display purposes\"\n\t@echo \"  linkcheck  to check all external links for integrity\"\n\t@echo \"  doctest    to run all doctests embedded in the documentation (if enabled)\"\n\nclean:\n\trm -rf $(BUILDDIR)/*\n\nhtml:\n\t$(SPHINXBUILD) -W -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html\n\t@echo\n\t@echo \"Build finished. The HTML pages are in $(BUILDDIR)/html.\"\n\ndirhtml:\n\t$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml\n\t@echo\n\t@echo \"Build finished. The HTML pages are in $(BUILDDIR)/dirhtml.\"\n\nsinglehtml:\n\t$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml\n\t@echo\n\t@echo \"Build finished. The HTML page is in $(BUILDDIR)/singlehtml.\"\n\npickle:\n\t$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle\n\t@echo\n\t@echo \"Build finished; now you can process the pickle files.\"\n\njson:\n\t$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json\n\t@echo\n\t@echo \"Build finished; now you can process the JSON files.\"\n\nhtmlhelp:\n\t$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp\n\t@echo\n\t@echo \"Build finished; now you can run HTML Help Workshop with the\" \\\n\t      \".hhp project file in $(BUILDDIR)/htmlhelp.\"\n\nepub:\n\t$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub\n\t@echo\n\t@echo \"Build finished. 
The epub file is in $(BUILDDIR)/epub.\"\n\nlatex:\n\t$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex\n\t@echo\n\t@echo \"Build finished; the LaTeX files are in $(BUILDDIR)/latex.\"\n\t@echo \"Run \\`make' in that directory to run these through (pdf)latex\" \\\n\t      \"(use \\`make latexpdf' here to do that automatically).\"\n\nlatexpdf:\n\t$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex\n\t@echo \"Running LaTeX files through pdflatex...\"\n\t$(MAKE) -C $(BUILDDIR)/latex all-pdf\n\t@echo \"pdflatex finished; the PDF files are in $(BUILDDIR)/latex.\"\n\nlatexpdfja:\n\t$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex\n\t@echo \"Running LaTeX files through platex and dvipdfmx...\"\n\t$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja\n\t@echo \"pdflatex finished; the PDF files are in $(BUILDDIR)/latex.\"\n\ntext:\n\t$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text\n\t@echo\n\t@echo \"Build finished. The text files are in $(BUILDDIR)/text.\"\n\nman:\n\t$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man\n\t@echo\n\t@echo \"Build finished. The manual pages are in $(BUILDDIR)/man.\"\n\ntexinfo:\n\t$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo\n\t@echo\n\t@echo \"Build finished. The Texinfo files are in $(BUILDDIR)/texinfo.\"\n\t@echo \"Run \\`make' in that directory to run these through makeinfo\" \\\n\t      \"(use \\`make info' here to do that automatically).\"\n\ninfo:\n\t$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo\n\t@echo \"Running Texinfo files through makeinfo...\"\n\tmake -C $(BUILDDIR)/texinfo info\n\t@echo \"makeinfo finished; the Info files are in $(BUILDDIR)/texinfo.\"\n\ngettext:\n\t$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale\n\t@echo\n\t@echo \"Build finished. 
The message catalogs are in $(BUILDDIR)/locale.\"\n\nchanges:\n\t$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes\n\t@echo\n\t@echo \"The overview file is in $(BUILDDIR)/changes.\"\n\nlinkcheck:\n\t$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck\n\t@echo\n\t@echo \"Link check complete; look for any errors in the above output \" \\\n\t      \"or in $(BUILDDIR)/linkcheck/output.txt.\"\n\ndoctest:\n\t$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest\n\t@echo \"Testing of doctests in the sources finished, look at the \" \\\n\t      \"results in $(BUILDDIR)/doctest/output.txt.\"\n\nxml:\n\t$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml\n\t@echo\n\t@echo \"Build finished. The XML files are in $(BUILDDIR)/xml.\"\n\npseudoxml:\n\t$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml\n\t@echo\n\t@echo \"Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml.\"\n"
  },
  {
    "path": "docs/conf.py",
    "content": "# -*- coding: utf-8 -*-\n#\nimport inspect\nimport os\nimport re\n\n\ndef get_version():\n    \"\"\"Obtain the packge version from a python file e.g. pkg/__init__.py\n    See <https://packaging.python.org/en/latest/single_source_version.html>.\n    \"\"\"\n    file_dir = os.path.realpath(os.path.dirname(__file__))\n    with open(\n            os.path.join(file_dir, '..', 'consecution', '__init__.py')) as f:\n        txt = f.read()\n    version_match = re.search(\n        r\"\"\"^__version__ = ['\"]([^'\"]*)['\"]\"\"\", txt, re.M)\n    if version_match:\n        return version_match.group(1)\n    raise RuntimeError(\"Unable to find version string.\")\n\n\n# If extensions (or modules to document with autodoc) are in another directory,\n# add these directories to sys.path here. If the directory is relative to the\n# documentation root, use os.path.abspath to make it absolute, like shown here.\n#sys.path.insert(0, os.path.abspath('.'))\n\n# -- General configuration ------------------------------------------------\n\nextensions = [\n    'sphinx.ext.autodoc',\n    'sphinx.ext.intersphinx',\n    #'sphinx.ext.viewcode',\n]\n\n# Add any paths that contain templates here, relative to this directory.\ntemplates_path = ['_templates']\n\n# The suffix of source filenames.\nsource_suffix = '.rst'\n\n# The master toctree document.\nmaster_doc = 'toc'\n\n# General information about the project.\nproject = 'consecution'\ncopyright = '2017, Rob deCarvalho'\n\n# The short X.Y version.\nversion = get_version()\n# The full version, including alpha/beta/rc tags.\nrelease = version\n\nexclude_patterns = ['_build']\n\n# The name of the Pygments (syntax highlighting) style to use.\npygments_style = 'sphinx'\n\nintersphinx_mapping = {\n    'python': ('http://docs.python.org/3.4', None),\n    'django': ('http://django.readthedocs.org/en/latest/', None),\n    #'celery': ('http://celery.readthedocs.org/en/latest/', None),\n}\n\n# -- Options for HTML output 
----------------------------------------------\n\nhtml_theme = 'default'\n#html_theme_path = []\n\non_rtd = os.environ.get('READTHEDOCS', None) == 'True'\nif not on_rtd:  # only import and set the theme if we're building docs locally\n    import sphinx_rtd_theme\n    html_theme = 'sphinx_rtd_theme'\n    html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]\n\n# Add any paths that contain custom static files (such as style sheets) here,\n# relative to this directory. They are copied after the builtin static files,\n# so a file named \"default.css\" will overwrite the builtin \"default.css\".\n# html_static_path = ['_static']\n\n# Custom sidebar templates, maps document names to template names.\n#html_sidebars = {}\n\n# Additional templates that should be rendered to pages, maps page names to\n# template names.\n#html_additional_pages = {}\n\n# If true, \"Created using Sphinx\" is shown in the HTML footer. Default is True.\nhtml_show_sphinx = False\n\n# If true, \"(C) Copyright ...\" is shown in the HTML footer. Default is True.\nhtml_show_copyright = True\n\n# Output file base name for HTML help builder.\nhtmlhelp_basename = 'consecutiondoc'\n\n\n# -- Options for LaTeX output ---------------------------------------------\n\nlatex_elements = {\n# The paper size ('letterpaper' or 'a4paper').\n#'papersize': 'letterpaper',\n\n# The font size ('10pt', '11pt' or '12pt').\n#'pointsize': '10pt',\n\n# Additional stuff for the LaTeX preamble.\n#'preamble': '',\n}\n\n# Grouping the document tree into LaTeX files. List of tuples\n# (source start file, target name, title,\n#  author, documentclass [howto, manual, or own class]).\nlatex_documents = [\n  ('index', 'consecution.tex', 'consecution Documentation',\n   'Rob deCarvalho', 'manual'),\n]\n\n# -- Options for manual page output ---------------------------------------\n\n# One entry per manual page. 
List of tuples\n# (source start file, name, description, authors, manual section).\nman_pages = [\n    ('index', 'consecution', 'consecution Documentation',\n     ['Rob deCarvalho'], 1)\n]\n\n# -- Options for Texinfo output -------------------------------------------\n\n# Grouping the document tree into Texinfo files. List of tuples\n# (source start file, target name, title, author,\n#  dir menu entry, description, category)\ntexinfo_documents = [\n  ('index', 'consecution', 'consecution Documentation',\n   'Rob deCarvalho', 'consecution', 'A short description',\n   'Miscellaneous'),\n]\n\n\ndef process_django_model_docstring(app, what, name, obj, options, lines):\n    \"\"\"\n    Does special processing for django model docstrings, making docs for\n    fields in the model.\n    \"\"\"\n    # These cause import errors if left outside the function\n    from django.db import models\n    from django.utils.html import strip_tags\n    try:\n        from django.utils.encoding import force_unicode\n    except ImportError:\n        # Newer Django versions renamed force_unicode to force_text\n        from django.utils.encoding import force_text as force_unicode\n\n    # Only look at objects that inherit from Django's base model class\n    if inspect.isclass(obj) and issubclass(obj, models.Model):\n        # Grab the field list from the meta class\n        fields = obj._meta.fields\n\n        for field in fields:\n            # Decode and strip any html out of the field's help text\n            help_text = strip_tags(force_unicode(field.help_text))\n\n            # Decode and capitalize the verbose name, for use if there isn't\n            # any help text\n            verbose_name = force_unicode(field.verbose_name).capitalize()\n\n            if help_text:\n                # Add the model field to the end of the docstring as a param\n                # using the help text as the description\n                lines.append(':param %s: %s' % (field.attname, help_text))\n            else:\n                # Add the model field to the end of the docstring as a param\n                # using the verbose name as the description\n                lines.append(':param %s: %s' % (field.attname, verbose_name))\n\n            # Add the field's type to the 
docstring\n            lines.append(':type %s: %s' % (field.attname, type(field).__name__))\n\n    # Return the extended docstring\n    return lines\n\n\ndef setup(app):\n    # Register the docstring processor with sphinx\n    app.connect('autodoc-process-docstring', process_django_model_docstring)\n"
  },
  {
    "path": "docs/index.rst",
    "content": "\nOverview\n=============================\nConsecution is:\n  * An easy-to-use pipeline abstraction inspired by\n    `Apache Storm Topologies <http://storm.apache.org/releases/current/Tutorial.html>`_.\n  * Designed to simplify building ETL pipelines that are robust and easy to test\n  * A system for wiring together simple processing nodes to form a DAG, which is fed with a python iterable\n  * Built using synchronous, single-threaded execution strategies designed to run efficiently on a single core\n  * Implemented in pure-python with optional requirements that are needed only for graph visualization\n  * Written with 100% test coverage\n\nSee the \n`Github project page <https://github.com/robdmc/consecution>`_.\nfor examples of how to use `consecution`.\n\n"
  },
  {
    "path": "docs/ref/consecution.rst",
    "content": ".. _ref-consecution:\n\nAPI documentation\n==================\n\nNode\n----\nNodes are the fundamental processing unit in consecution.  A node is created by\ninheriting from the `consecution.Node` class.  You are free to declare as many\nattributes and methods on a node class as you wish.  You should not override the\nconstructor unless you really know what you're doing.  Instead, any\ninitialization you wish to perform can be carried out in the `.begin()` method.\nIn the descriptions below, it is assumed that the nodes being discussed have\nbeen wired together into a pipeline and are ready to consume items.\n\nSee the \n`Github README\n<https://github.com/robdmc/consecution/blob/master/README.md>`_\nfor examples  of how to wire nodes into pipelines.\n\nReserved Method Names\n~~~~~~~~~~~~~~~~~~~~~\nThe following Node methods are not intended to be overridden, so you should not\ndefine methods with these names in your node implementations unless you really\nknow what you are doing.\n\n*  `top_node`\n*  `initial_node_set`\n*  `terminal_node_set`\n*  `root_nodes`\n*  `all_nodes`\n*  `log`\n*  `top_down_make_repr`\n*  `top_down_call`\n*  `depth_first_search`\n*  `breadth_first_search`\n*  `search`\n*  `add_downstream`\n*  `remove_downstream`\n*  `plot`\n\nThere are also a number of private method names you should avoid.  These can be\nidentified by looking at the `source code \n<https://github.com/robdmc/consecution/blob/master/consecution/nodes.py>`_\n\n\nExamples\n~~~~~~~~\n\nHere is the simplest possible node you could construct:\n\n.. code-block:: python\n\n    from consecution import Node\n\n    class MyNode(Node):\n        def process(self, item):\n            self.push(item)\n\nAll nodes acquire a `.push()` method when they are wired into a pipeline.  
You\ncan call this method anywhere in your class except in the `.begin()` method.\nThe `.push(item)` method will take its argument and send it to the `.process()`\nmethods of the nodes that are immediately downstream in your pipeline graph.\n\nHere is an example node defining all methods you can override.  The\nfunctionality of each method is explained in the code comments.\n\n.. code-block:: python\n\n    from consecution import Node\n\n    class MyNode(Node):\n        def begin(self):\n            # This sets up whatever state you want to exist before the\n            # node begins processing any data.  You can think of it as an\n            # init method that runs just before the node starts processing.\n            # In this example, we initialize a simple counter\n            self.counter = 0\n\n        def process(self, item):\n            # This is the method that defines the processing you want to perform\n            # on every item the node processes.  You can place whatever logic\n            # you want here, including calls to the .push() method.\n            # In this example, we update the counter and push the item\n            # downstream.\n            self.counter += 1\n            self.push(item)\n\n        def end(self):\n            # This method is called right after all items are processed.\n            # This happens when the iterator being consumed by the pipeline\n            # is exhausted.  At that point the .end() methods of all nodes\n            # in the pipeline are called.  This is a good place for you to\n            # push any summary information downstream.\n            # In this example we push the final value of our counter\n            self.push(self.counter)\n\n        def reset(self):\n            # A pipeline can be reused and reset back to its initial condition.\n            # It does this by calling the .reset() method of all its member\n            # nodes.  
You can place whatever code you want here to reset your\n            # node to its initial state.\n            # In this example, we simply reset the counter.\n            self.counter = 0\n\nNode API Documentation\n~~~~~~~~~~~~~~~~~~~~~~\n\n.. autoclass:: consecution.nodes.Node\n    :members:\n\nGroupBy Node\n~~~~~~~~~~~~~~~~~~~~~~\nConsecution provides a special Node class specifically designed to do grouping.\nIt works in much the same way as Python's built-in\n``itertools.groupby`` function.  It expects to process items in key-sorted\norder.  In addition to the ``.process()`` method required of all nodes, you must\nalso define a ``.key()`` method that will extract a key from each item being\nprocessed.  See the Github project page for an example of using ``GroupByNode``.\n\n.. autoclass:: consecution.nodes.GroupByNode\n    :members:\n\n\n\nManually Connecting Nodes\n-------------------------\nThe Node base class is equipped with an ``.add_downstream(other_node)`` method.\nThis method provides detailed control over how nodes are wired together. It\nsimply adds ``other_node`` as a downstream relation.\n\nHere is an example of creating a pipeline with one top node that broadcasts\nitems to two downstream nodes, and then collects their results into a single\noutput node.\n\n.. 
code-block:: python\n\n    from __future__ import print_function\n\n    from consecution import Pipeline, Node\n\n    class SimpleNode(Node):\n        def process(self, item):\n            print('{} processing {}'.format(self.name, item))\n            self.push(item)\n\n    top = SimpleNode('top')\n    left = SimpleNode('left')\n    right = SimpleNode('right')\n    output = SimpleNode('output')\n\n    top.add_downstream(left)\n    top.add_downstream(right)\n\n    left.add_downstream(output)\n    right.add_downstream(output)\n\n    pipe = Pipeline(top)\n\n    pipe.consume(range(2))\n\n\nNode Connection Mini-language\n-----------------------------\nConsecution provides a concise domain-specific language (DSL) for creating\ndirected acyclic graphs.  This is the preferred method for connecting nodes into\na pipeline.  However, you may occasionally find that your desired topology is not\neasy to express in the DSL.  For these situations, consecution provides a\nlower-level escape hatch that allows you to manually connect two nodes\ntogether.  These two levels of abstraction provide a very powerful interface for\nconstructing complex pipelines.\n\nThe DSL is inspired by the unix syntax for chaining together the inputs and\noutputs of different programs at the bash prompt.  You use the pipe symbol ``|``\nto connect nodes together.  These pipe operators will always return one of the\nnode objects in your connected topology. Below is an example of creating a\nsimple linear pipeline.\n\n.. 
code-block:: python\n\n    from __future__ import print_function\n\n    from consecution import Pipeline, Node\n\n    class SimpleNode(Node):\n        def process(self, item):\n            print('{} processing {}'.format(self.name, item))\n            self.push(item)\n\n    left = SimpleNode('left')\n    middle = SimpleNode('middle')\n    right = SimpleNode('right')\n\n    # wire nodes together with bash-like pipe operator\n    node_object = left | middle | right\n\n    # You can now pass the node object into a pipeline constructor\n    pipe = Pipeline(node_object)\n    pipe.consume(range(2))\n\nIn order to create a directed acyclic graph (DAG) you need four basic\nconstructs:\n\n* Send data from one node to a single other node\n* Broadcast data from one node to a set of other nodes\n* Route data from one node to one of a set of other nodes\n* Gather output from several nodes into one node\n\nThe DSL provides mechanisms for each of these constructs, and we will look at\neach in turn.\n\nSend data from single node to single node\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nUse simple bash-like pipe syntax to send data from a single node to another\nnode.\n\n.. code-block:: python\n\n    # Send data from one node to a single other node using bash-like piping.\n    node1 | node2\n\n\n\nBroadcast data from single node to multiple nodes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nBroadcasting is accomplished by piping to a list of nodes.  In the following\nexample, ``node1`` will send each item it pushes to ``node2``, ``node3``, and\n``node4``.\n\n.. code-block:: python\n\n    # Broadcast to a set of nodes by piping to a list\n    node1 | [node2, node3, node4]\n\nRouting from one node to one of multiple nodes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nRouting is accomplished by piping to a list that contains a single callable and\nany number of nodes.  The following example will send even numbers to\n``even_node`` and odd numbers to ``odd_node``.\n\n.. 
code-block:: python\n\n    from consecution import Node\n\n    # Define a node class\n    class N(Node):\n        def process(self, item):\n            self.push(item)\n\n    # Define a routing function.  It takes a single argument, the item\n    # being pushed.  It should return a string with the name of the node\n    # to which that item should be routed.\n    def route_func(item):\n        if item % 2 == 0:\n            return 'even_node'\n        else:\n            return 'odd_node'\n\n    # Pipe to a list of nodes and a callable to achieve routing\n    N('top_node') | [N('even_node'), N('odd_node'), route_func]\n\n\nGather output from multiple nodes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nGathering output from a set of nodes is as simple as piping a list of nodes (and\npossibly a route function) to a single node.  In this example, the outputs of\n``node2``, ``node3``, and ``node4`` will all be sent to ``node5``.\n\n.. code-block:: python\n\n    # Gather the outputs of several nodes by piping their list to a single node\n    node1 | [node2, node3, node4] | node5\n\n\nPipeline\n-----------------\nOnce nodes are wired together, they need to be encapsulated into a pipeline\nbefore they can operate on data.  This is done by passing any node in the\nnetwork as the argument to the ``Pipeline`` constructor.  On construction, the\npipeline will ensure you have a valid processing graph and will execute\ninitialization code to ensure that the nodes are efficiently connected.\nImmediately after construction, the pipeline is ready to consume data.\n\nConsuming Iterables\n~~~~~~~~~~~~~~~~~~~\nWhen the ``.consume(iterable)`` method is called, the following sequence of\nevents occurs in exactly this order.\n\n#. The ``.begin()`` method on the pipeline object is called.  You can override\n   this method to perform any task you'd like.\n\n#. The ``.begin()`` methods of all nodes in the network are called.  They are\n   called in top-down order.  
This means that the ``.begin()`` method of\n   a node is guaranteed not to be called until the ``.begin()`` methods of all\n   its ancestors have been called.\n\n#. Items are read from the iterable argument supplied to the ``.consume()``\n   method.  These are fed through the topology of the processing graph one by\n   one.  Each item is completely processed by the graph before the next one is\n   lifted off the iterable.\n\n#. The ``.end()`` methods of all nodes are called in top-down order.\n\n#. The ``.end()`` method of the pipeline is called.\n\n\nManually feeding a Pipeline\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\nIn addition to consuming iterables, you can manually feed pipelines using the\n``.push()`` method on the pipeline itself.  When you are finished pushing items,\nyou can manually call the ``.end()`` method.  Here is an example.\n\n.. code-block:: python\n\n    from __future__ import print_function\n\n    from consecution import Node, Pipeline\n\n    class N(Node):\n        def process(self, item):\n            print(item)\n            self.push(item)\n\n    pipe = Pipeline(N('first') | N('second'))\n    for nn in range(2):\n        pipe.push(nn)\n    pipe.end()\n\n\nPipeline API Documentation\n~~~~~~~~~~~~~~~~~~~~~~~~~~\nPipelines support dictionary-like access to their nodes.  Here are examples.\n\n.. code-block:: python\n\n    from consecution import Node, Pipeline\n\n    # Define a node\n    class N(Node):\n        def process(self, item):\n            self.push(item)\n\n    # Create a pipeline with two nodes\n    pipe = Pipeline(N('first') | N('second'))\n\n    # Get reference to a node with dictionary syntax\n    first = pipe['first']\n\n    # Replace a node with dictionary-like syntax\n    pipe['first'] = N('first')\n\n\n.. autoclass:: consecution.pipeline.Pipeline\n    :members:\n\n\nGlobalState\n-----------------\nThe ``GlobalState`` class is a simple python class that supports both\ndictionary-like and object-like attribute access.  
An object of this class will\nbe used as the default ``global_state`` attribute of a pipeline if you don't\nexplicitly provide one in the constructor.\n\n.. autoclass:: consecution.pipeline.GlobalState\n    :members:\n\n"
  },
  {
    "path": "docs/toc.rst",
    "content": "Table of Contents\n=================\n\n.. toctree::\n   :maxdepth: 2\n\n   index\n   ref/consecution\n"
  },
  {
    "path": "pandashells.md",
    "content": "Pandashells One-liner Example\n===\n\n<a href=\"https://github.com/robdmc/pandashells\">Pandashells</a> lets you use <a\nhref=\"http://pandas.pydata.org/\">Pandas</a> from the bash command line.  It\nallows you to combine unix command-line tools (awk, grep, sed, etc.) with the\npower of Pandas Dataframes and Matplotlib visualization.\n\nHere is a one-liner that performs the exact same aggregation demonstrated by the\nexample consecution pipeline.\n\n```bash\ncat sample_data.csv | \\\np.df 'df[\"group\"] = [\"adult\" if a>=18 else \"child\" for a in df.age]' | \\\np.df 'df.pivot_table(index=\"group\", columns=\"gender\", values=\"spent\", margins=True, aggfunc=sum).fillna(0)' \\\n-o table index\n```\n"
  },
  {
    "path": "publish.py",
    "content": "import subprocess\n\nsubprocess.call('pip install wheel'.split())\nsubprocess.call('python setup.py clean --all'.split())\nsubprocess.call('python setup.py sdist'.split())\n# subprocess.call('pip wheel --no-index --no-deps --wheel-dir dist dist/*.tar.gz'.split())\nsubprocess.call('python setup.py register sdist bdist_wheel upload'.split())\n"
  },
  {
    "path": "sample_data.csv",
    "content": "gender,age,spent\nmale,11,39.39\nfemale,10,34.72\nfemale,15,40.02\nmale,19,26.27\nmale,13,21.22\nfemale,40,23.17\nfemale,52,33.42\nmale,33,39.52\nfemale,16,28.65\nmale,60,26.74\n"
  },
  {
    "path": "setup.cfg",
    "content": "[nosetests]\nnocapture=1\nverbosity=1\nwith-coverage=1\ncover-branches=1\n#cover-min-percentage=100\ncover-package=consecution\n\n[coverage:report]\nshow_missing=True\nfail_under=100\nexclude_lines =\n    # Have to re-enable the standard pragma\n    pragma: no cover\n\n    # Don't complain if tests don't hit defensive assertion code:\n    raise NotImplementedError\n\n[coverage:run]\nomit =\n    consecution/version.py\n    consecution/__init__.py\n\n\n[flake8]\nmax-line-length = 120\nexclude = docs,env,*.egg\nmax-complexity = 10\nignore = E402\n\n[build_sphinx]\nsource-dir = docs/\nbuild-dir  = docs/_build\nall_files  = 1\n\n[upload_sphinx]\nupload-dir = docs/_build/html\n\n[bdist_wheel]\nuniversal = 1\n"
  },
  {
    "path": "setup.py",
    "content": "#!/usr/bin/env python\n\nimport io\nimport os\nimport re\nfrom setuptools import setup, find_packages\n\nfile_dir = os.path.dirname(__file__)\n\n\ndef read(path, encoding='utf-8'):\n    path = os.path.join(os.path.dirname(__file__), path)\n    with io.open(path, encoding=encoding) as fp:\n        return fp.read()\n\n\ndef version(path):\n    \"\"\"Obtain the packge version from a python file e.g. pkg/__init__.py\n    See <https://packaging.python.org/en/latest/single_source_version.html>.\n    \"\"\"\n    version_file = read(path)\n    version_match = re.search(r\"\"\"^__version__ = ['\"]([^'\"]*)['\"]\"\"\",\n                              version_file, re.M)\n    if version_match:\n        return version_match.group(1)\n    raise RuntimeError(\"Unable to find version string.\")\n\n\nLONG_DESCRIPTION = \"\"\"\nConsecution is an easy-to-use pipeline abstraction inspired by\nApache Storm topologies.\n\"\"\"\n\nsetup(\n    name='consecution',\n    version=version(os.path.join(file_dir, 'consecution', '__init__.py')),\n    author='Rob deCarvalho',\n    author_email='unlisted',\n    description=('Pipeline Abstraction Library'),\n    license='BSD',\n    keywords=('pipeline apache storm DAG graph topology ETL'),\n    url='https://github.com/robdmc/consecution',\n    packages=find_packages(),\n    long_description=LONG_DESCRIPTION,\n    classifiers=[\n        'Environment :: Console',\n        'Intended Audience :: Developers',\n        'Programming Language :: Python',\n        'Programming Language :: Python :: 2',\n        'Programming Language :: Python :: 3',\n        'Programming Language :: Python :: 2.7',\n        'Programming Language :: Python :: 3.5',\n        'Topic :: Scientific/Engineering',\n    ],\n    extras_require={'dev': ['nose', 'coverage', 'mock', 'flake8', 'coveralls']},\n    install_requires=['graphviz']\n)\n"
  }
]