Repository: robdmc/consecution
Branch: develop
Commit: c23b4ea20fb7
Files: 29
Total size: 119.8 KB

Directory structure:
gitextract_eotr679u/
├── .coveragerc
├── .gitignore
├── .travis.yml
├── LICENSE
├── README.md
├── consecution/
│   ├── .coverage
│   ├── __init__.py
│   ├── nodes.py
│   ├── pipeline.py
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── nodes_tests.py
│   │   ├── pipeline_tests.py
│   │   ├── testing_helpers.py
│   │   └── utils_tests.py
│   └── utils.py
├── docker/
│   ├── Dockerfile
│   ├── docker_build.sh
│   ├── docker_run.sh
│   └── simple_example.py
├── docs/
│   ├── Makefile
│   ├── conf.py
│   ├── index.rst
│   ├── ref/
│   │   └── consecution.rst
│   └── toc.rst
├── pandashells.md
├── publish.py
├── sample_data.csv
├── setup.cfg
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .coveragerc
================================================
[report]
show_missing = True

================================================
FILE: .gitignore
================================================
.DS_Store
*.pyc

================================================
FILE: .travis.yml
================================================
sudo: false
language: python
python:
- '2.7'
- '3.4'
- '3.5'
- '3.6'
- '3.7'
install:
- pip install -e .[dev]
before_script:
- flake8 .
script:
- nosetests
- coverage report --fail-under=100
after_success:
- coveralls
notifications:
  email: false
addons:
  apt:
    packages:
    - graphviz

================================================
FILE: LICENSE
================================================
Copyright (c) 2015, Robert deCarvalho
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those
of the authors and should not be interpreted as representing official policies,
either expressed or implied, of the FreeBSD Project.

================================================
FILE: README.md
================================================
Update (2/23/2021)
===
It looks like this README is slowly turning into a reference of all the projects in this space that I think are better than consecution. Here is [metaflow](https://github.com/Netflix/metaflow), an offering from Netflix.

Update (9/21/2020)
===
Another library that I believe to be better than consecution is the [pypeln](https://cgarciae.github.io/pypeln/) project. The way it allows for a different number of workers on each node of a pipeline is quite nice. Additionally, the ability to control whether each node is run using threads, processes, async, or sync is really useful.
Update (5/1/2020)
===
Since writing this, the excellent [streamz](https://streamz.readthedocs.io/en/latest/) package has been created. Streamz is the project I wish had existed back when I wrote this. It is a much more capable implementation of the core ideas of consecution, and plays nicely with [dask](https://dask.org/) to achieve scale. I have started using streamz in my work in place of consecution.

Consecution
===
[](https://travis-ci.org/robdmc/consecution)
[](https://coveralls.io/github/robdmc/consecution?branch=add_docs)

Introduction
---
Consecution is:

* An easy-to-use pipeline abstraction inspired by Apache Storm Topologies
* Designed to simplify building ETL pipelines that are robust and easy to test
* A system for wiring together simple processing nodes to form a DAG, which is fed with a python iterable
* Built using synchronous, single-threaded execution strategies designed to run efficiently on a single core
* Implemented in pure python with optional requirements that are needed only for graph visualization
* Written with 100% test coverage

Consecution makes it easy to build systems like this.



Installation
---
Consecution is a pure-python package that is simply installed with pip. The only non-essential requirement is the Graphviz system package, which is needed only if you want to create graphical representations of your pipeline.

```
[~]$ pip install consecution
```
Docker
---
If you would like to try out consecution on docker, check out consecution from github and navigate to the
`docker/` subdirectory. From there, run the following.
* Build the consecution image: `docker_build.sh`
* Start a container: `docker_run.sh`
* Once in the container, run the example: `python simple_example.py`
Quick Start
---
What follows is a quick tour of consecution. See the API documentation for
more detailed information.
### Nodes
Consecution works by wiring together nodes. You create nodes by inheriting from the
`consecution.Node` class. Every node must define a `.process()` method. This method
contains whatever logic you want for processing single items as they pass through your
pipeline. Here is an example of a node that simply logs items passing through it.
```python
from consecution import Node

class LogNode(Node):
    def process(self, item):
        # any logic you want for processing single item
        print('{: >15} processing {}'.format(self.name, item))
        # send item downstream
        self.push(item)
```
### Pipelines
Now let's create a pipeline that wires together a series of these logging nodes.
We do this by employing the pipe symbol in much the same way that you pipe data
between programs in unix. Note that you must name nodes when you instantiate
them.
```python
from consecution import Node, Pipeline

# This is the same node class we defined above
class LogNode(Node):
    def process(self, item):
        print('{} processing {}'.format(self.name, item))
        self.push(item)

# Connect nodes with pipe symbols to create a pipeline for consuming any iterable.
pipe = Pipeline(
    LogNode('extract') | LogNode('transform') | LogNode('load')
)
```
At this point, we can visualize the pipeline to verify that the topology is
what we expect it to be. If you have Graphviz installed, you can now simply
run one of the following to see the pipeline visualized.
```python
# Create a pipeline.png file in your working directory
pipe.plot()
# Interactively display the pipeline visualization in an IPython notebook
# by simply making the final expression in a cell evaluate to a pipeline.
pipe
```
The plot command should produce the following visualization.

If you don't have Graphviz installed, you can print the pipeline
object to get a text-based visualization.
```python
print(pipe)
```
This represents your pipeline as a series of pipe statements showing
how data is piped between nodes.
```
Pipeline
--------------------------------------------------------------------
extract | transform
transform | load
--------------------------------------------------------------------
```
We can now process an iterable with our pipeline by running
```python
pipe.consume(range(3))
```
which will print the following to the console.
```
extract processing 0
transform processing 0
load processing 0
extract processing 1
transform processing 1
load processing 1
extract processing 2
transform processing 2
load processing 2
```
### Broadcasting
Piping the output of a single node into a list of nodes causes the single
node to broadcast its pushed items to every node in the list. So, again using
our logging node, we could construct a pipeline like this:
```python
from consecution import Node, Pipeline

class LogNode(Node):
    def process(self, item):
        print('{} processing {}'.format(self.name, item))
        self.push(item)

# pipe to a list of nodes to broadcast items
pipe = Pipeline(
    LogNode('extract')
    | LogNode('transform')
    | [LogNode('load_redis'), LogNode('load_postgres'), LogNode('load_mongo')]
)
pipe.plot()
pipe.consume(range(2))
```
The plot command produces this visualization

and consuming `range(2)` produces this output
```
extract processing 0
transform processing 0
load_redis processing 0
load_postgres processing 0
load_mongo processing 0
extract processing 1
transform processing 1
load_redis processing 1
load_postgres processing 1
load_mongo processing 1
```
### Routing
If you pipe to a list that contains multiple nodes and a single callable, then
consecution will interpret the callable as a routing function that accepts a
single item as its only argument and returns the name of one of the nodes in the
list. The routing function will direct the flow of items as illustrated below.
```python
from consecution import Node, Pipeline

class LogNode(Node):
    def process(self, item):
        print('{: >15} processing {}'.format(self.name, item))
        self.push(item)

def parity(item):
    if item % 2 == 0:
        return 'transform_even'
    else:
        return 'transform_odd'

# pipe to a list containing a callable to achieve routing behaviour
pipe = Pipeline(
    LogNode('extract')
    | [LogNode('transform_even'), LogNode('transform_odd'), parity]
)
pipe.plot()
pipe.consume(range(4))
```
The plot command produces the following pipeline

and consuming `range(4)` produces this output
```
extract processing 0
transform_even processing 0
extract processing 1
transform_odd processing 1
extract processing 2
transform_even processing 2
extract processing 3
transform_odd processing 3
```
### Merging
Up to this point, we have the ability to create processing trees whose nodes
can either broadcast to or route between their downstream nodes. We can,
however, do more than this and create DAGs (Directed Acyclic Graphs). Piping
from a list back to a single node merges the output of all nodes in the
list into the single downstream node, like this.
```python
from consecution import Node, Pipeline

class LogNode(Node):
    def process(self, item):
        print('{: >15} processing {}'.format(self.name, item))
        self.push(item)

def parity(item):
    if item % 2 == 0:
        return 'transform_even'
    else:
        return 'transform_odd'

# piping from a list back to a single node merges items into downstream node
pipe = Pipeline(
    LogNode('extract')
    | [LogNode('transform_even'), LogNode('transform_odd'), parity]
    | LogNode('load')
)
pipe.plot()
pipe.consume(range(4))
```
The plot command produces the following pipeline

and consuming `range(4)` produces this output
```
extract processing 0
transform_even processing 0
load processing 0
extract processing 1
transform_odd processing 1
load processing 1
extract processing 2
transform_even processing 2
load processing 2
extract processing 3
transform_odd processing 3
load processing 3
```
### Managing Local State
Nodes are classes, and as such, you have the freedom to create any attribute
you want on a node. You can also define two optional methods, `.begin()` and
`.end()`, to set up and tear down node-local state. It is important to note
the order of execution here. All nodes in a pipeline execute their `.begin()`
methods in pipeline order before any items are processed. Each node enters its
`.end()` method only after it has processed all items, and after all parent
nodes have finished their respective `.end()` methods. Below, we've modified
our LogNode to keep a running sum of all items that pass through it and to
print that sum when it ends.
```python
from consecution import Node, Pipeline

class LogNode(Node):
    def begin(self):
        self.sum = 0
        print('{}.begin()'.format(self.name))

    def process(self, item):
        print('{: >15} processing {}'.format(self.name, item))
        self.sum += item
        self.push(item)

    def end(self):
        print('sum = {:d} in {}.end()'.format(self.sum, self.name))

# Identical pipeline to the merge example above (using the parity router
# defined there), but with the modified LogNode
pipe = Pipeline(
    LogNode('extract')
    | [LogNode('transform_even'), LogNode('transform_odd'), parity]
    | LogNode('load')
)
pipe.consume(range(4))
```
Consuming `range(4)` produces the following output
```
extract.begin()
transform_even.begin()
transform_odd.begin()
load.begin()
extract processing 0
transform_even processing 0
load processing 0
extract processing 1
transform_odd processing 1
load processing 1
extract processing 2
transform_even processing 2
load processing 2
extract processing 3
transform_odd processing 3
load processing 3
sum = 6 in extract.end()
sum = 2 in transform_even.end()
sum = 4 in transform_odd.end()
sum = 6 in load.end()
```
### Managing Global State
Every node object has a `.global_state` attribute that is shared globally across
all nodes in the pipeline. The attribute is also available on the Pipeline
object itself. The GlobalState object is a simple mutable python object whose
attributes can be mutated by any node. It also remains accesible on the
Pipeline object after all nodes have completed. Below is a simple example of
mutating and accessing global state.
```python
from consecution import Node, Pipeline, GlobalState

class LogNode(Node):
    def process(self, item):
        self.global_state.messages.append(
            '{: >15} processing {}'.format(self.name, item)
        )
        self.push(item)

# create a global state object with a messages attribute
global_state = GlobalState(messages=[])

# Assign the predefined global_state to the pipeline
pipe = Pipeline(
    LogNode('extract') | LogNode('transform') | LogNode('load'),
    global_state=global_state
)
pipe.consume(range(3))

# print the content of the global state message list
for msg in pipe.global_state.messages:
    print(msg)
```
Printing the contents of the messages list produces
```
extract processing 0
transform processing 0
load processing 0
extract processing 1
transform processing 1
load processing 1
extract processing 2
transform processing 2
load processing 2
```
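The GlobalState object is essentially a mutable namespace. As a rough illustration of how such an object behaves (this is a sketch, not consecution's actual implementation), here is a minimal class whose attributes can be seeded with keyword arguments and then read or written with either dot or dictionary notation:

```python
class SimpleGlobalState(object):
    """Illustrative stand-in for consecution's GlobalState (not the real one)."""

    def __init__(self, **kwargs):
        # seed the namespace with any keyword arguments
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __getitem__(self, key):
        # allow dictionary-style reads: state['messages']
        return getattr(self, key)

    def __setitem__(self, key, value):
        # allow dictionary-style writes: state['count'] = 3
        setattr(self, key, value)


state = SimpleGlobalState(messages=[])
state.messages.append('hello')   # dot notation
state['count'] = 3               # dictionary notation, same namespace
```

Because both notations resolve to the same attributes, any node holding a reference to the object sees every mutation.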
## Common Patterns
This section shows examples of how to implement some common patterns in
consecution.
### Map
Mapping with nodes is very simple. Just push an altered item downstream.
```python
from consecution import Node, Pipeline

class Mapper(Node):
    def process(self, item):
        self.push(2 * item)

class LogNode(Node):
    def process(self, item):
        print('{: >15} processing {}'.format(self.name, item))
        self.push(item)

pipe = Pipeline(
    LogNode('extractor') | Mapper('mapper') | LogNode('loader')
)
pipe.consume(range(3))
```
This will produce an output of
```
extractor processing 0
loader processing 0
extractor processing 1
loader processing 2
extractor processing 2
loader processing 4
```
### Reduce
Reducing, or folding, is easily implemented by using the `.begin()`
and `.end()` methods to handle accumulated values.
```python
from consecution import Node, Pipeline

class Reducer(Node):
    def begin(self):
        self.result = 0

    def process(self, item):
        self.result += item

    def end(self):
        self.push(self.result)

class LogNode(Node):
    def process(self, item):
        print('{: >15} processing {}'.format(self.name, item))
        self.push(item)

pipe = Pipeline(
    LogNode('extractor') | Reducer('reducer') | LogNode('loader')
)
pipe.consume(range(3))
```
This will produce an output of
```
extractor processing 0
extractor processing 1
extractor processing 2
loader processing 3
```
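This node-based fold is the pipeline analogue of an ordinary reduction over the iterable. For comparison, the same sum over `range(3)` with the standard library:

```python
from functools import reduce
import operator

# fold range(3) into a single value starting from 0, just as
# Reducer.begin()/process()/end() accumulate a running sum
result = reduce(operator.add, range(3), 0)
print(result)  # 3
```

The `begin`/`end` pair plays the role of the initializer and the final emit, while `process` plays the role of the folding function.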
### Filter
Filtering is as simple as placing the push statement behind a conditional.
Items that fail the conditional are not pushed downstream and are thus
silently dropped.
```python
from consecution import Node, Pipeline

class Filter(Node):
    def process(self, item):
        if item > 3:
            self.push(item)

class LogNode(Node):
    def process(self, item):
        print('{: >15} processing {}'.format(self.name, item))
        self.push(item)

pipe = Pipeline(
    LogNode('extractor') | Filter('filter') | LogNode('loader')
)
pipe.consume(range(6))
```
This produces an output of
```
extractor processing 0
extractor processing 1
extractor processing 2
extractor processing 3
extractor processing 4
loader processing 4
extractor processing 5
loader processing 5
```
### Group By
Consecution provides a specialized class you can inherit from to perform
grouping operations. GroupBy nodes must define two methods: `.key(item)` and
`.process(batch)`. The `.key` method should return a key from an item that is used
to identify groups. Any time that key changes, a new group is initiated. Like
Python's `itertools.groupby`, you will usually want the GroupByNode to process
sorted items. The `.process` method functions exactly like the `.process`
method on regular nodes, except that instead of being called with items,
consecution will call it with a batch of items contained in a list.
```python
from consecution import Node, GroupByNode, Pipeline

class LogNode(Node):
    def process(self, item):
        print('{: >15} processing {}'.format(self.name, item))
        self.push(item)

class Batcher(GroupByNode):
    def key(self, item):
        return item // 4

    def process(self, batch):
        self.push(sum(batch))

pipe = Pipeline(
    Batcher('batcher') | LogNode('logger')
)
pipe.consume(range(16))
```
This produces an output of
```
logger processing 6
logger processing 22
logger processing 38
logger processing 54
```
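The batching behavior mirrors `itertools.groupby`, which likewise starts a new group every time the key value changes on consecutive items; this is why sorted input usually matters. A standard-library sketch of the same batch sums:

```python
from itertools import groupby

# group consecutive items of range(16) by item // 4 and sum each batch,
# mimicking what Batcher pushes downstream to the logger
batch_sums = [
    sum(batch) for _, batch in groupby(range(16), key=lambda item: item // 4)
]
print(batch_sums)  # [6, 22, 38, 54]
```

If the input were unsorted, the same key could appear in several separate runs and would produce several smaller batches instead of one.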
### Plugin-Style Composition
Consecution forces you to think about problems in terms of how small processing
units are connected. This separation between logic and connectivity can be
exploited to create flexible and reusable solutions. Basically, you specify the
connectivity you want to use in solving your problem, and then plug in the
processing units later. Breaking the problem up in this way allows you to swap
out processing units to achieve different objectives with the same pipeline.
```python
from consecution import Node, Pipeline

# This function defines a pipeline that can use swappable processing nodes.
# We don't worry about how we are going to do logging or aggregating.
# We just focus on how the nodes are connected.
def pipeline_factory(log_node, agg_node):
    pipe = Pipeline(
        log_node('extractor') | agg_node('aggregator') | log_node('result_logger')
    )
    return pipe

# Now we define a node for left-justified logging
class LeftLogNode(Node):
    def process(self, item):
        print('{: <15} processing {}'.format(self.name, item))
        self.push(item)

# And one for right-justified logging
class RightLogNode(Node):
    def process(self, item):
        print('{: >15} processing {}'.format(self.name, item))
        self.push(item)

# We can aggregate by summing
class SumNode(Node):
    def begin(self):
        self.result = 0

    def process(self, item):
        self.result += item

    def end(self):
        self.push(self.result)

# Or we can aggregate by multiplying
class ProdNode(Node):
    def begin(self):
        self.result = 1

    def process(self, item):
        self.result *= item

    def end(self):
        self.push(self.result)

# Now we plug in nodes to create a pipeline that left-prints sums
sum_pipeline = pipeline_factory(log_node=LeftLogNode, agg_node=SumNode)

# And a different pipeline that right-prints products
prod_pipeline = pipeline_factory(log_node=RightLogNode, agg_node=ProdNode)

print('aggregate with sum, left justified\n' + '-' * 40)
sum_pipeline.consume(range(1, 5))

print('\naggregate with product, right justified\n' + '-' * 40)
prod_pipeline.consume(range(1, 5))
```
This produces the following output
```
aggregate with sum, left justified
----------------------------------------
extractor processing 1
extractor processing 2
extractor processing 3
extractor processing 4
result_logger processing 10
aggregate with product, right justified
----------------------------------------
extractor processing 1
extractor processing 2
extractor processing 3
extractor processing 4
result_logger processing 24
```
# Aggregation Example
We end with a full-blown example of using a pipeline to aggregate data from a
csv file. The data looks like this.
gender |age |spent
--- |--- |---
male |11 |39.39
female |10 |34.72
female |15 |40.02
male |19 |26.27
male |13 |21.22
female |40 |23.17
female |52 |33.42
male |33 |39.52
female |16 |28.65
male |60 |26.74
Although there are much simpler ways of solving this problem (e.g. with
Pandashells), we deliberately construct a complex topology just to illustrate
how to achieve complexity when it is actually needed.
The diagram below was produced from the code beneath it. A quick glance at the
diagram makes it obvious how the data is being routed through the system. The
code is heavily commented to explain features of the consecution toolkit.

```python
from __future__ import print_function
from collections import namedtuple
from pprint import pprint
import csv

from consecution import Node, Pipeline, GlobalState

# Named tuples are nice immutable containers
# for passing data between nodes
Person = namedtuple('Person', 'gender age spent')

# Create a pipeline that aggregates by gender and age.
# In creating the pipeline we focus on connectivity and don't
# worry about defining node behavior.
def pipe_factory(Extractor, Agg, gender_router, age_router):
    # Consecution provides a generic GlobalState class. Any object can be used
    # as the global_state in a pipeline, but the GlobalState object provides a
    # nice abstraction where attributes can be accessed either by dot notation
    # (e.g. global_state.my_attribute) or by dictionary notation (e.g.
    # global_state['my_attribute']). Furthermore, GlobalState objects can be
    # instantiated with initialized attributes using keyword arguments as
    # shown here.
    global_state = GlobalState(segment_totals={})

    # Notice, we haven't even defined the behavior of these nodes yet. They
    # will be defined later and are, for now, just passed into the factory
    # function as arguments while we focus on getting the topology right.
    pipe = Pipeline(
        Extractor('make_person') |
        [
            gender_router,
            (Agg('male') | [age_router, Agg('male_child'), Agg('male_adult')]),
            (Agg('female') | [age_router, Agg('female_child'), Agg('female_adult')]),
        ],
        global_state=global_state
    )

    # Nodes can be created outside of a pipeline definition
    adult = Agg('adult')
    child = Agg('child')
    total = Agg('total')

    # Sometimes the topology you want to create cannot easily be expressed
    # using the pipeline abstraction for wiring nodes together. You can
    # drop down to a lower level of abstraction by explicitly wiring nodes
    # together using the .add_downstream() method.
    adult.add_downstream(total)
    child.add_downstream(total)

    # Once a pipeline has been created, you can access individual nodes
    # with dictionary-like indexing on the pipeline.
    pipe['male_child'].add_downstream(child)
    pipe['female_child'].add_downstream(child)
    pipe['male_adult'].add_downstream(adult)
    pipe['female_adult'].add_downstream(adult)
    return pipe

# Now that we have the topology of our pipeline defined, we can think about the
# logic that needs to go into each node. We start by defining a node that takes
# a row from a csv file and transforms it into a namedtuple.
class MakePerson(Node):
    def process(self, item):
        item['age'] = int(item['age'])
        item['spent'] = float(item['spent'])
        self.push(Person(**item))

# We now define a node to perform our aggregations. Mutable global state comes
# with a lot of baggage and should be used with care. This node illustrates
# how to use global state to put all aggregations in a central location that
# remains accessible when the pipeline finishes processing.
class Sum(Node):
    def begin(self):
        # initialize the node-local sum to zero
        self.total = 0

    def process(self, item):
        # increment the node-local total and push the item downstream
        self.total += item.spent
        self.push(item)

    def end(self):
        # when the pipeline is done, update global state with the sum
        self.global_state.segment_totals[self.name] = round(self.total, 2)

# This function routes tuples based on their associated gender
def by_gender(item):
    return '{}'.format(item.gender)

# This function routes tuples based on whether the purchaser was an adult or
# a child
def by_age(item):
    if item.age >= 18:
        return '{}_adult'.format(item.gender)
    else:
        return '{}_child'.format(item.gender)

# Here we plug our node definitions into our topology to create a fully-defined
# pipeline.
pipe = pipe_factory(MakePerson, Sum, by_gender, by_age)

# We can now visualize the pipeline.
pipe.plot()

# Now we feed our pipeline with rows from the csv file
with open('sample_data.csv') as f:
    pipe.consume(csv.DictReader(f))

# The global_state is also available as an attribute on the pipeline, allowing
# us to access it when the pipeline is finished. This is a good way to "return"
# an object from a pipeline. Here we simply print the result.
print()
pprint(pipe.global_state.segment_totals)
```
And this is the result of running the pipeline with the sample csv file.
```
{'adult': 149.12,
'child': 164.0,
'female': 159.98,
'female_adult': 56.59,
'female_child': 103.39,
'male': 153.14,
'male_adult': 92.53,
'male_child': 60.61,
'total': 313.12}
```
As illustrated in the
Pandashells example, this aggregation is actually much simpler to
implement in Pandas. However, there is an important caveat.
The Pandas solution must load the entire csv file into memory at once. If you
look at the pipeline solution, you will notice that each node simply increments
its local sum and passes the data downstream. At no point is the data
completely loaded into memory. Although the Pandas code runs much faster due to
the highly optimized vectorized math it employs, the pipeline solution can
process arbitrarily large csv files with a very small memory footprint.
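The constant-memory point can be seen in miniature with nothing but the standard library: a streaming reader keeps only the running totals, never the whole file. Here is a minimal sketch, using an in-memory stand-in for the csv file (the column names match the sample data above; the two-row contents are made up for illustration):

```python
import csv
import io

# a tiny in-memory stand-in for a file like sample_data.csv
csv_text = 'gender,age,spent\nmale,11,39.39\nfemale,10,34.72\n'

totals = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    # only the per-gender running sums are retained in memory;
    # each row is discarded as soon as it has been folded in
    totals[row['gender']] = totals.get(row['gender'], 0.0) + float(row['spent'])

print(totals)
```

Replacing `io.StringIO(csv_text)` with an open file handle gives the same constant-memory behavior over arbitrarily large files, which is exactly what each `Sum` node in the pipeline does.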
Perhaps the most exciting aspect of consecution is its ability to create
repeatable and testable data analysis pipelines. Passing Pandas Dataframes
through a consecution pipeline makes it very easy to encapsulate any analysis
into a well-defined, repeatable process where each node manipulates a dataframe
in its prescribed way. Adopting this structure in analysis projects will
undoubtedly ease the transition from analysis/research into production.
___
Projects by [robdmc](https://www.linkedin.com/in/robdecarvalho).
* [Pandashells](https://github.com/robdmc/pandashells) Pandas at the bash command line
* [Consecution](https://github.com/robdmc/consecution) Pipeline abstraction for Python
* [Behold](https://github.com/robdmc/behold) Helping debug large Python projects
* [Crontabs](https://github.com/robdmc/crontabs) Simple scheduling library for Python scripts
* [Switchenv](https://github.com/robdmc/switchenv) Manager for bash environments
* [Gistfinder](https://github.com/robdmc/gistfinder) Fuzzy-search your gists
================================================
FILE: consecution/__init__.py
================================================
# flake8: noqa
from consecution.nodes import Node, GroupByNode
from consecution.pipeline import Pipeline, GlobalState
from consecution.utils import Clock
__version__ = '0.2.0'
================================================
FILE: consecution/nodes.py
================================================
import sys
from collections import Counter, deque, OrderedDict
import traceback
from consecution.utils import Clock
class Node(object):
"""
:type name: str
:param str: The name of this node. Must be unique within a pipeline.
:type kwargs: keyword args
:param kwargs: Any additional keyword args are assigned as attributes
on the node.
You create nodes by inheriting from this class. You will be required to
implement a `.process()` on your class. You can call the `.push()` method
from anywhere in your class implementation except from within the
`.begin()` method.
Note that although this documentation refers to "the `.push` method",
`push` is actually a callable attribute assigned when nodes are placed
into pipelines.
Its signature is `.push(item)`, where `item` can be anything you want pushed
to nodes connected to the downstream side of the node.
"""
def __init__(self, name, **kwargs):
# assign any user-defined attributes
for k, v in kwargs.items():
setattr(self, k, v)
self.name = name
self._upstream_nodes = []
self._downstream_nodes = []
self._num_top_down_calls = 0
# node network can be visualized with pydot. These hold args and kwargs
# that will be used to add and connect this node in the graph visualization
self._pydot_node_kwargs = dict(name=self.name, shape='rectangle')
self._pydot_edge_kwarg_list = []
self._router = None
# this will be one of three values: None, 'input', 'output'
self._logging = None
# add a clock to allow for timing
self.clock = Clock()
def __str__(self):
return 'N({})'.format(self.name)
def __repr__(self):
return self.__str__()
def __hash__(self):
"""
define __hash__ method. dicts and sets will use this as key
"""
return id(self)
def __eq__(self, other):
return self.__hash__() == other.__hash__()
def __lt__(self, other):
"""
I need this to be able to sort by name
"""
return self.name < other.name
def __getitem__(self, key):
msg = (
'\n\nYou cannot call __getitem__ on nodes. You tried to call\n'
'{self} [{key}]\n'
'which doesn\'t make sense. You probably meant\n'
'{self} | [{key}]\n'
).format(self=self, key=key)
raise ValueError(msg)
def _get_flattened_list(self, obj):
if isinstance(obj, Node):
return [obj]
elif hasattr(obj, '__iter__'):
nodes = []
for el in obj:
if isinstance(el, Node):
nodes.append(el)
elif hasattr(el, '__iter__'):
nodes.extend(self._get_flattened_list(el))
return nodes
else:
msg = (
'Don\'t know what to do with {}. It\'s not a node, and it\'s '
'not iterable.'
).format(repr(obj))
raise ValueError(msg)
def _get_exposed_slots(self, obj, pointing):
nodes = set()
for node in self._get_flattened_list(obj):
if pointing == 'left':
nodes = nodes.union(node.initial_node_set)
elif pointing == 'right':
nodes = nodes.union(node.terminal_node_set)
else:
raise ValueError('pointing must be "left" or "right"')
return nodes
def _connect_lefts_to_rights(self, lefts, rights, router=None):
slots_from_left = self._get_exposed_slots(lefts, pointing='right')
slots_from_right = self._get_exposed_slots(rights, pointing='left')
for left in slots_from_left:
router_node = None
if router:
router_name = '{}.{}'.format(
left.name, self._get_object_name(router))
end_point_map = {n.name: n for n in slots_from_right}
router_node = _RouterNode(
router_name, end_point_map, router)
left.add_downstream(router_node)
for right in slots_from_right:
if router_node:
router_node.add_downstream(right)
else:
left.add_downstream(right)
def _get_object_name(self, obj):
class_name = obj.__class__.__name__
if class_name == 'function':
return obj.__name__
else:
return class_name
def _get_router(self, obj):
router = None
if hasattr(obj, '__iter__'):
routers = [el for el in obj if hasattr(el, '__call__')]
router = routers[0] if routers else None
return router
def __or__(self, other):
router = self._get_router(other)
self._connect_lefts_to_rights(self, other, router)
return self
def __ror__(self, other):
self._connect_lefts_to_rights(other, self)
return self
@property
def top_node(self):
"""
This attribute always holds the top-most node in the node graph.
Consecution only allows one top node.
"""
root_nodes = self.root_nodes
if len(root_nodes) > 1:
msg = 'You must remove one of the following input nodes {}'.format(
root_nodes)
raise ValueError(msg)
else:
return root_nodes.pop()
@property
def terminal_node_set(self):
"""
This attribute holds a set of all bottom nodes in the node graph.
"""
return {
node for node in self.depth_first_walk('down')
if len(node._downstream_nodes) == 0
}
@property
def initial_node_set(self):
"""
When piecing together fragments of a graph, you can temporarily have
connected nodes with multiple "top-nodes." This method returns this
set of nodes. Node that consecution can only make pipelines from
graphs having a single top node.
"""
return {
node for node in self.depth_first_walk('up')
if len(node._upstream_nodes) == 0
}
@property
def root_nodes(self):
"""
This attribute holds a list of all nodes that do not have any upstream
nodes attached.
"""
return [
node for node in self.all_nodes
if len(node._upstream_nodes) == 0
]
@property
def all_nodes(self):
"""
This attribute contains a set of all nodes in the graph.
"""
return self.depth_first_walk('both')
def log(self, what):
"""
Calling this method on a node will turn on its logging feature. This
means that the node will print logged items to the console. You can
choose whether to log the inputs or outputs of a node.
:type what: str
:param what: One of 'input' or 'output' indicating whether you want to
log the input or output of this node.
"""
allowed = ['input', 'output']
if what not in allowed:
raise ValueError(
'\'what\' argument must be in {}'.format(allowed)
)
self._logging = what
def _get_downstream_reps(self):
if self._downstream_nodes:
downstreams = sorted([n.name for n in self._downstream_nodes])
if len(downstreams) == 1:
downstreams = downstreams[0]
template = '{{: >{}s}} | {{}}\n'.format(
self.pipeline._longest_node_name_len_)
self.pipeline._node_repr += template.format(
self.name, downstreams).replace('\'', '')
def top_down_make_repr(self):
"""
You should never need to use this method. It iterates through the node
graph in top-down order making a repr string for each node.
"""
if not hasattr(self, 'pipeline'):
raise ValueError(
'top_down_make_repr can only be called for nodes in a pipeline')
self.pipeline._longest_node_name_len_ = max(
len(n.name) for n in self.all_nodes)
self.pipeline._node_repr = ''
self.top_node.top_down_call('_get_downstream_reps')
def top_down_call(self, method_name):
"""
This utility method traverses the graph in top-down order and invokes
the named method on every node it encounters. It is used internally
to make sure the `.begin()` and `.end()` methods are not called before
their upstream counterparts.
:type method_name: str
:param method_name: The name of the method you would like to call in
top-down order.
"""
# record the number of upstreams this node has
num_upstreams = len(self._upstream_nodes)
# if this node isn't pulling from multiple upstreams, it's ready
# to recurse to downstreams
if num_upstreams <= 1:
ready_for_downstreams = True
# this node isn't ready to recurse to downstreams until the current
# call would mean the last required call.
elif self._num_top_down_calls == num_upstreams - 1:
ready_for_downstreams = True
else:
ready_for_downstreams = False
# if ready to recurse, then call the method on self and recurse
# downwards.
if ready_for_downstreams:
getattr(self, method_name)()
for downstream in self._downstream_nodes:
downstream.top_down_call(method_name)
self._num_top_down_calls = 0
else:
self._num_top_down_calls += 1
def depth_first_walk(self, direction='both', as_ordered_list=False):
"""
This method walks the graph of connected nodes in depth-first
order. It uses a stack to emulate recursion. See good explanation at
https://jeremykun.com/2013/01/22/depth-and-breadth-first-search/
:type direction: str
:param direction: one of 'up', 'down' or 'both' specifying the direction
to walk.
:type as_ordered_list: bool
:param as_ordered_list: If set to true, returns the walked nodes as
an ordered list instead of an unordered set.
:rtype: list or set
:return: An iterable of the discovered nodes.
"""
return self.walk(
direction=direction, how='depth_first',
as_ordered_list=as_ordered_list)
def breadth_first_walk(self, direction='both', as_ordered_list=False):
"""
This method walks the graph of connected nodes in breadth-first
order. It uses a queue to emulate recursion. See good explanation at
https://jeremykun.com/2013/01/22/depth-and-breadth-first-search/
:type direction: str
:param direction: one of 'up', 'down' or 'both' specifying the direction
to walk.
:type as_ordered_list: bool
:param as_ordered_list: If set to true, returns the walked nodes as
an ordered list instead of an unordered set.
:rtype: list or set
:return: An iterable of the discovered nodes.
"""
return self.walk(
direction=direction, how='breadth_first',
as_ordered_list=as_ordered_list)
def walk(
self, direction='both', how='breadth_first', as_ordered_list=False):
"""
This is the core algorithm for walking a graph in specified order. It
is used by the `breadth_first_walk` and `depth_first_walk` methods.
:type how: str
:param how: one of 'breadth_first' or 'depth_first'
:type direction: str
:param direction: one of 'up', 'down' or 'both' specifying the direction
to walk.
:type as_ordered_list: bool
:param as_ordered_list: If set to true, returns the walked nodes as
an ordered list instead of an unordered set.
:rtype: list or set
:return: An iterable of the discovered nodes.
"""
if how not in {'depth_first', 'breadth_first'}:
raise ValueError(
'\'how\' argument must be one of '
'[\'depth_first\', \'breadth_first\']'
)
# What I really want is an ordered set, which doesn't exist. So I'm
# using the keys of an ordered dict to get the functionality I want.
# I have no need for the values in this dict, only the keys.
visited_nodes = OrderedDict()
# holds nodes that still need to be explored
queue = deque([self])
# while I still have nodes that need exploring
while len(queue) > 0:
# get the next node to explore
node = queue.pop()
# if I've already seen this node, nothing to do, so go to next
if node in visited_nodes:
continue
# Make sure I don't visit this node again.
# I'm using an ordered dict to mimic an ordered set.
# I have no need for the value, so set it to None
visited_nodes[node] = None
neighbor_dict = {
'up': node._upstream_nodes,
'down': node._downstream_nodes,
'both': node._upstream_nodes + node._downstream_nodes,
}
if direction not in neighbor_dict:
raise ValueError(
'direction must be \'up\', \'down\' or \'both\'')
neighbors = neighbor_dict[direction]
# search all neighbors of this node for unvisited nodes
for node in neighbors:
# if you find unvisited node, add it to nodes needing visit
if node not in visited_nodes:
if how == 'breadth_first':
queue.appendleft(node)
else:
queue.append(node)
# should have hit all nodes in the graph at this point
if as_ordered_list:
return list(visited_nodes.keys())
else:
return set(visited_nodes.keys())
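The walk logic above can be mirrored in a short standalone sketch. Nothing below is consecution API; the function name and the plain-dict adjacency representation are illustrative stand-ins for the node graph:

```python
from collections import OrderedDict, deque

def walk(start, neighbors, how='breadth_first'):
    # OrderedDict keys stand in for an ordered set, as in Node.walk
    visited = OrderedDict()
    queue = deque([start])
    while queue:
        node = queue.pop()
        if node in visited:
            continue
        visited[node] = None
        for other in neighbors.get(node, []):
            if other not in visited:
                # appendleft turns the stack into a queue, which is
                # the only difference between breadth-first and
                # depth-first order in this algorithm
                if how == 'breadth_first':
                    queue.appendleft(other)
                else:
                    queue.append(other)
    return list(visited)
```

On a diamond graph `a | [b, c] | d`, breadth-first visits both middle nodes before `d`, while depth-first follows one branch all the way down first.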
def _check_for_dups(self):
counter = Counter()
for node in self.all_nodes:
counter.update({node.name: 1})
dups = [name for (name, count) in counter.items() if count > 1]
if dups:
msg = (
'\n\nNode names must be unique. Duplicates {} found.'
).format(list(dups))
raise ValueError(msg)
return
def _check_for_cycles(self):
self_and_upstreams = self.depth_first_walk('up')
downstreams = self.depth_first_walk('down') - {self}
common_nodes = self_and_upstreams.intersection(downstreams)
if common_nodes:
raise ValueError('\n\nYour graph is not acyclic. It has loops.')
def _validate_node(self, other):
# only nodes allowed to be connected
if not isinstance(other, Node):
raise ValueError('Trying to connect a non-node type')
def add_downstream(self, other):
"""
You will probably use this method quite a bit. It is used to manually
attach a downstream node.
:type other: consecution.Node
:param other: An instance of the node you want to attach
"""
self._validate_node(other)
self._downstream_nodes.append(other)
other._upstream_nodes.append(self)
self._check_for_dups()
if self.name == other.name:
raise ValueError('{} can\'t be downstream to itself'.format(self))
self._check_for_cycles()
self._pydot_edge_kwarg_list.append(
dict(tail_name=self.name, head_name=other.name))
def remove_downstream(self, other):
"""
This method removes the given node from being attached as a downstream
node.
:type other: consecution.Node
:param other: An instance of the node you want to remove
"""
# remove self from the other's upstreams
other._upstream_nodes = [
n for n in other._upstream_nodes if n.name != self.name]
# remove other from self's downstream nodes
self._downstream_nodes = [
n for n in self._downstream_nodes if n.name != other.name]
# remove this connection from the pydot kwargs list
new_kwargs_list = []
for kwargs in self._pydot_edge_kwarg_list:
if kwargs['head_name'] == other.name:
continue
new_kwargs_list.append(kwargs)
self._pydot_edge_kwarg_list = new_kwargs_list
def _build_pydot_graph(self):
"""
This private method builds a graphviz graph for visualization
"""
# define kwargs lists for creating the visualization (these are closure vars for function below)
node_kwargs_list, edge_kwargs_list = [], []
# define a function to map over all nodes to aggregate viz kwargs
def collect_kwargs(node):
node_kwargs_list.append(node._pydot_node_kwargs)
edge_kwargs_list.extend(node._pydot_edge_kwarg_list)
for node in self.all_nodes:
collect_kwargs(node)
# doing import inside method so that graphviz dependency is optional
from graphviz import Digraph
# create a pydot graph
graph = Digraph(comment='pipeline')
# create pydot nodes for every node connected to this one
for node_kwargs in node_kwargs_list:
graph.node(**node_kwargs)
# create pydot edges between all nodes connected to this one
for edge_kwargs in edge_kwargs_list:
graph.edge(**edge_kwargs)
return graph
def plot(
self, file_name='pipeline', kind='png'):
"""
This method draws a visualization of your processing graph. You must
have graphviz installed on your system for it to work properly. (See
install instructions.)
If you are running consecution in a Jupyter notebook, you can display
an inline visualization of a pipeline by simply making the pipeline be
the final expression in a cell.
:type file_name: str
:param file_name: The name of the image file to generate
:type kind: str
:param kind: The kind of file to generate (png, pdf)
"""
graph = self._build_pydot_graph()
# define allowed formats for saving the graph visualization
ALLOWED_KINDS = {'pdf', 'png'}
if kind not in ALLOWED_KINDS:
raise ValueError('Only the following kinds are supported: {}'.format(ALLOWED_KINDS))
# set the output format
graph.format = kind
file_name = file_name.replace('.{}'.format(kind), '')
# write the output file
try:
graph.render(file_name)
except RuntimeError:
sys.stderr.write(
'\n\n'
'=========================================================\n'
'Problem executing GraphViz. Make sure you have it\n'
'properly installed.\n'
'http://www.graphviz.org/\n'
'If you are on a mac, you should be able to install it with\n'
'brew install graphviz.\n\n'
'If you are on ubuntu, you can install it with\n'
'apt-get install graphviz\n'
'=========================================================\n'
'\n\n'
)
raise
def process(self, item):
"""
:type item: object
:param item: The item this node should process
You must override this method with your own logic.
"""
raise NotImplementedError(
(
'Error in node named {}\n'
'You must define a .process(self, item) method on all nodes'
).format(repr(self.name))
)
def reset(self):
"""
Users can override this method with whatever reset logic they need.
"""
def _logged_process(self, item):
if self._logging == 'input':
self._write_log(item)
self.process(item)
def _begin(self):
try:
self.begin()
except AttributeError:
e = sys.exc_info()[1]
tb = sys.exc_info()[2]
(
code_file, line_no, method_name, line_txt
) = traceback.extract_tb(tb)[-1]
msg = str(e) + (
'\n\nError in .begin() method of \'{}\' node.\n'
'Are you trying to call .push() from inside the\n'
'.begin() method? That is not allowed.\n\n'
'file: {}, line {}\n--> {}\n\n'
).format(self.name, code_file, line_no, line_txt)
traceback.print_exc()
raise AttributeError(msg)
def begin(self):
pass
def end(self):
pass
def _write_log(self, item):
sys.stdout.write('node_log,{},{},{}\n'.format(self._logging, self.name, item))
def _push(self, item):
"""
This is the default pusher. It pushes to all downstreams.
"""
if self._logging == 'output':
self._write_log(item)
# The _process attribute will be set to the appropriate callable
# when initializing the pipeline. I do this because I want the
# chaining to be as efficient as possible. If logging is not set,
# I don't want to have to hit that logic every push, so I just
# invoke a callable attribute at each process that has been set
# to the appropriate callable.
for downstream in self._downstream_nodes:
downstream._process(item)
class _RouterNode(Node):
"""
This node will route to downstreams. The router function needs to
return the name of the destination node.
"""
def __init__(self, name, end_point_map, route_callable):
super(_RouterNode, self).__init__(name)
self._end_point_map = end_point_map
self._pydot_node_kwargs = dict(name=self.name, shape='oval')
self._route_callable = route_callable
def process(self, item):
"""
Routes the item to the single downstream node whose name is
returned by the router callable.
"""
node = self._end_point_map.get(self._route_callable(item), None)
if node is None:
raise ValueError(
(
'\n\nRouter node {} encountered bad route path {}. Valid '
'route paths are {}.'
).format(
self.name,
repr(self._route_callable(item)),
[n.name for n in self._downstream_nodes]
)
)
node._process(item)
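The routing behavior above reduces to a name lookup: the router callable names the destination, and an unknown name is an error. A standalone sketch (the `dispatch` function and `handlers` map are hypothetical, not consecution API):

```python
def dispatch(item, route_callable, end_point_map):
    # look up the destination named by the router callable
    target = end_point_map.get(route_callable(item))
    if target is None:
        # mirror _RouterNode's error for a bad route path
        raise ValueError(
            'bad route path {!r}; valid paths are {}'.format(
                route_callable(item), sorted(end_point_map)))
    return target

def parity(n):
    # a toy router: names the destination based on the item
    return 'odd' if n % 2 else 'even'

# a toy end-point map from route names to handlers
handlers = {'even': 'even-handler', 'odd': 'odd-handler'}
```

In the real pipeline the map values are Node instances and the dispatched node's `_process` is called with the item.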
class GroupByNode(Node):
def __init__(self, *args, **kwargs):
super(GroupByNode, self).__init__(*args, **kwargs)
self._batch_ = []
self._previous_key = '__no_previous_key__'
def key(self, item):
"""
You must define this method.
:type item: object
:param item: The item you are processing
:rtype: hashable object
:return: a hashable object that serves as a key for the grouping process
"""
raise NotImplementedError(
'you must define a .key(self, item) method on all '
'GroupBy nodes.'
)
def process(self, batch):
"""
You must define this method.
:type batch: iterable
:param batch: A batch of items having the same key
"""
raise NotImplementedError(
'You must define a .process(self, batch) method on all GroupBy '
'nodes.'
)
def _process_item(self, item):
key = self.key(item)
if key != self._previous_key:
self._previous_key = key
if len(self._batch_) > 0:
self.process(self._batch_)
self._batch_ = [item]
else:
self._batch_.append(item)
def _end(self):
self.process(self._batch_)
self._batch_ = []
def __getattribute__(self, name):
"""
This traps calls to the end() method and installs a pre-hook
that flushes the final batch before end() runs.
"""
if name == 'end':
def wrapper():
self._end()
return super(GroupByNode, self).__getattribute__(name)()
return wrapper
else:
return super(GroupByNode, self).__getattribute__(name)
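GroupByNode's batching assumes items arrive with equal keys adjacent: a batch is emitted whenever the key changes, and the final batch is flushed at end of stream. That contiguous grouping can be sketched as a standalone generator (illustrative only, not consecution API):

```python
def group_contiguous(items, key):
    # collect runs of adjacent items sharing the same key, mirroring
    # GroupByNode._process_item and _end
    batch = []
    previous_key = object()  # sentinel meaning "no previous key"
    for item in items:
        k = key(item)
        if k != previous_key and batch:
            yield batch  # key changed: emit the finished batch
            batch = []
        previous_key = k
        batch.append(item)
    if batch:
        yield batch  # flush the final batch at end of stream
```

Note that, as with GroupByNode, equal keys that are not adjacent produce separate batches; sort the stream by key upstream if you need one batch per key.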
================================================
FILE: consecution/pipeline.py
================================================
import sys
from consecution.nodes import GroupByNode
class GlobalState(object):
"""
GlobalState is a simple container class that sets its attributes from
constructor kwargs. It supports both object and dictionary access to its
attributes. So, for example, all of the following statements are supported.
.. code-block:: python
from consecution import GlobalState
global_state = GlobalState(a=1, b=2)
global_state['c'] = 2
a = global_state['a']
An object of this class will be created as the default ``.global_state``
attribute on a Pipeline if you do not explicitly provide a global_state
argument to the constructor.
"""
# I'm using unconventional "_item_self_" name here to avoid
# conflicts when kwargs actually contain a "self" arg.
def __init__(_item_self, **kwargs):
for key, val in kwargs.items():
_item_self[key] = val
def __str__(_item_self):
quoted_keys = [
'\'{}\''.format(k) for k in sorted(vars(_item_self).keys())]
att_string = ', '.join(quoted_keys)
return 'GlobalState({})'.format(att_string)
def __repr__(_item_self):
return _item_self.__str__()
def __setitem__(_item_self, key, value):
setattr(_item_self, key, value)
def __getitem__(_item_self, key):
return getattr(_item_self, key)
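The dual attribute/dictionary access described in the docstring comes down to delegating item access to `setattr`/`getattr`. A minimal standalone sketch of the same pattern (the `State` class here is a stand-in, not consecution API):

```python
class State(object):
    # minimal stand-in for GlobalState's dual access style
    def __init__(self, **kwargs):
        for key, val in kwargs.items():
            self[key] = val
    def __setitem__(self, key, value):
        # dict-style writes become attribute writes
        setattr(self, key, value)
    def __getitem__(self, key):
        # dict-style reads become attribute reads
        return getattr(self, key)
```

Because both styles hit the same instance `__dict__`, a value written one way is readable the other way.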
class Pipeline(object):
"""
:type node: Node
:param node: Any node in a connected graph
:type global_state: object
:param global_state: Any python object you want to use for holding global
state.
Once Nodes have been wired together, they must be placed in a pipeline in
order to process data. If you would like to perform pipeline-level set up and
tear-down logic, you can subclass from Pipeline and override the
``.begin()`` and ``end()`` methods.
"""
def __init__(self, node, global_state=None):
# get a reference to the top node of the connected nodes supplied.
self.top_node = node.top_node
# set the pipeline global state
if global_state:
self.global_state = global_state
else:
self.global_state = GlobalState()
# initialize an empty lookup for nodes
self._node_lookup = {}
# initialize the pipeline
self.initialize()
def initialize(self, with_push=False):
# define a flag to determine if the pipeline is "running" or not
# it will only be true between when the .begin() is run and the
# .end() method is run.
self._is_running = False
self._needs_log_header = False
# initialize each node
for node in self.top_node.all_nodes:
self.initialize_node(node, with_push)
# build the pipeline repr by cycling through all the nodes
self.top_node.top_down_make_repr()
# print a logging header if any node is logging
if self._needs_log_header:
sys.stdout.write('node_log,what,node_name,item\n')
def initialize_node(self, node, with_push=False):
# give node reference to pipeline attributes
node.pipeline = self
node.global_state = self.global_state
# make node available for lookup
self._node_lookup[node.name] = node
# set the _process callable to be either logged or unlogged
# TODO: might want to change this logic so that groupby nodes
# can be logged
if isinstance(node, GroupByNode):
node._process = node._process_item
elif node._logging is None:
node._process = node.process
else:
self._needs_log_header = True
node._process = node._logged_process
# for single downstreams with no logging, can short-circuit all logic
# and directly wire up the downstream process() callable as the
# push callable on this node
short_it = len(node._downstream_nodes) == 1
short_it = short_it and node._downstream_nodes[0]._logging is None
short_it = short_it and not isinstance(
node._downstream_nodes[0], GroupByNode)
# only initialize push if requested
if with_push:
if short_it and node._logging is None:
node.push = node._downstream_nodes[0].process
# logged or multiple downstreams require logic, so no short circuit
else:
node.push = node._push
def __getitem__(self, name):
node = self._node_lookup.get(name, None)
if node is None:
raise KeyError('No node named \'{}\''.format(name))
return node
def __setitem__(self, name_to_replace, replacement_node):
# make sure replacement node has proper name
if name_to_replace != replacement_node.name:
raise ValueError(
'Replacement node must have the same name.'
)
# this will automatically raise error if the name doesn't exist
node_to_replace = self[name_to_replace]
removals = []
additions = []
for upstream in node_to_replace._upstream_nodes:
removals.append((upstream, node_to_replace))
additions.append((upstream, replacement_node))
# handle special case of upstream being a routing node
if hasattr(upstream, '_end_point_map'):
upstream._end_point_map[name_to_replace] = replacement_node
for downstream in node_to_replace._downstream_nodes:
removals.append((node_to_replace, downstream))
additions.append((replacement_node, downstream))
for upstream, downstream in removals:
upstream.remove_downstream(downstream)
for upstream, downstream in additions:
upstream.add_downstream(downstream)
# initialize the replacement node within the pipeline
self.initialize_node(replacement_node)
# if top node was replaced then make sure pipeline knows about it
if replacement_node.name == self.top_node.name:
self.top_node = replacement_node
def __getattribute__(self, name):
"""
This should trap for the begin() and end() method calls and install
pre/post hooks for when they are called either on the pipeline
class or on any class derived from it.
"""
if name == 'begin':
def wrapper():
super(Pipeline, self).__getattribute__(name)()
self._begin()
return wrapper
elif name == 'end':
def wrapper():
self._end()
return super(Pipeline, self).__getattribute__(name)()
return wrapper
elif name == 'reset':
def wrapper():
self._reset()
return super(Pipeline, self).__getattribute__(name)()
return wrapper
else:
return super(Pipeline, self).__getattribute__(name)
def begin(self):
"""
Override this method to execute any logic you want to perform before
setting up nodes. The ``.begin()`` method of all nodes will be called.
"""
def end(self):
"""
Override this method to execute any logic you want to perform after
all nodes are done processing data. The ``.end()`` method of all nodes
will be called.
"""
def reset(self):
"""
Override this with any logic you'd like to perform for resetting the
pipeline. The ``.reset()`` method of all nodes will be called.
"""
def _reset(self):
self.top_node.top_down_call('reset')
def _begin(self):
self.top_node.top_down_call('_begin')
self.initialize(with_push=True)
self._is_running = True
def _end(self):
self.top_node.top_down_call('end')
self._is_running = False
def push(self, item):
"""
You can manually push items to your pipeline using this method.
:type item: object
:param item: Any object you would like the pipeline to process
"""
if not self._is_running:
self.begin()
self.top_node._process(item)
def consume(self, iterable):
"""
The pipeline will process each item in the iterable.
:type iterable: A Python Iterable
:param iterable: An iterable of objects you would like to process
"""
self.begin()
for item in iterable:
self.top_node._process(item)
return self.end()
def plot(self, file_name='pipeline', kind='png'):
"""
Call this method to produce a visualization of your pipeline. The
Graphviz library will be used to generate the image file. Note that
pipelines are automatically visualized in IPython notebook when they are
evaluated as the last expression in a cell.
:type file_name: str
:param file_name: The name of the image file to save
:type kind: str
:param kind: The type of image file to produce (png, pdf)
"""
self.top_node.plot(file_name, kind)
return self
def __str__(self):
return (
'\nPipeline\n'
'----------------------------------'
'----------------------------------\n{}'
'----------------------------------'
'----------------------------------\n'
).format(self._node_repr)
def __repr__(self):
return self.__str__()
# No good way to test this unless you know dot is installed.
def _repr_svg_(self): # pragma: no cover
return self.top_node._build_pydot_graph()._repr_svg_()
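The `consume()` lifecycle above (begin once, process each item through the top node, end once) can be modeled with a toy class; nothing below is consecution API, it just isolates the call ordering:

```python
class MiniPipeline(object):
    # toy model of Pipeline.consume: begin once, process each item, end once
    def __init__(self):
        self.events = []
    def begin(self):
        self.events.append('begin')
    def process(self, item):
        # stands in for top_node._process(item)
        self.events.append(item)
    def end(self):
        self.events.append('end')
        return self.events
    def consume(self, iterable):
        self.begin()
        for item in iterable:
            self.process(item)
        return self.end()
```

The real Pipeline additionally tracks `_is_running` so that manual `push()` calls trigger `begin()` lazily on the first item.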
================================================
FILE: consecution/tests/__init__.py
================================================
================================================
FILE: consecution/tests/nodes_tests.py
================================================
import os
from collections import namedtuple
import shutil
import tempfile
from unittest import TestCase
import subprocess
from mock import patch
from consecution.nodes import Node
def dot_installed():
p = subprocess.Popen(
['bash', '-c', 'which dot'], stdout=subprocess.PIPE)
p.wait()
result = p.stdout.read().decode("utf-8")
return 'dot' in result
class FakeDigraph(object): # pragma: no cover
def __init__(self, *args, **kwargs):
pass
def node(self, *args, **kwargs):
pass
def edge(self, *args, **kwargs):
pass
def render(self, *args, **kwargs):
raise RuntimeError('fake runtime error')
class NodeUnitTests(TestCase):
def test_bad_logging_args(self):
n = Node('a')
with self.assertRaises(ValueError):
n.log('bad')
def test_bad_top_down_make_repr_call(self):
n = Node('a')
with self.assertRaises(ValueError):
n.top_down_make_repr()
def test_args_as_atts(self):
n = Node('my_node', silly_attribute='silly')
self.assertEqual(n.silly_attribute, 'silly')
def test_comparisons(self):
a = Node('a')
b = Node('b')
self.assertTrue(a == a)
self.assertFalse(a == b)
self.assertTrue(a < b)
self.assertFalse(b < a)
def test_bad_flattening(self):
a = Node('a')
with self.assertRaises(ValueError):
a | 7
@patch(
'consecution.nodes.Node._build_pydot_graph', lambda a: FakeDigraph())
def test_graphviz_not_installed(self):
a = Node('a')
b = Node('b')
p = a | b
with self.assertRaises(RuntimeError):
p.plot()
def test_no_getitem(self):
a = Node('a')
with self.assertRaises(ValueError):
a['b']
def test_bad_slot_name(self):
a = Node('a')
b = Node('b')
with self.assertRaises(ValueError):
a._get_exposed_slots(b, 'bad_arg')
class ExplicitWiringTests(TestCase):
def setUp(self):
self.temp_dir = tempfile.mkdtemp()
def tearDown(self):
shutil.rmtree(self.temp_dir)
def do_wiring(self):
self.do_explicit_wiring()
def do_explicit_wiring(self):
# define nodes
a = Node('a')
b = Node('b')
c = Node('c')
d = Node('d')
e = Node('e')
f = Node('f')
g = Node('g')
h = Node('h')
i = Node('i')
j = Node('j')
k = Node('k')
l = Node('l') # noqa. okay to use l as var here
m = Node('m')
n = Node('n')
# save a list of all nodes
self.node_list = [a, b, c, d, e, f, g, h, i, j, k, l, m, n]
self.top_node = a
# wire up the nodes
a.add_downstream(b)
a.add_downstream(c)
c.add_downstream(d)
c.add_downstream(e)
e.add_downstream(f)
e.add_downstream(g)
e.add_downstream(h)
e.add_downstream(i)
f.add_downstream(j)
g.add_downstream(j)
h.add_downstream(j)
i.add_downstream(j)
d.add_downstream(k)
j.add_downstream(k)
b.add_downstream(l)
k.add_downstream(l)
l.add_downstream(m)
l.add_downstream(n)
# same network in graph notation
# a | [
# b,
# c | [
# d,
# e | [f, g, h, i, my_router] | j
# ] | k
# ] | l [m, n]
def do_graph_wiring(self):
# define nodes
a = Node('a')
b = Node('b')
c = Node('c')
d = Node('d')
e = Node('e')
f = Node('f')
g = Node('g')
h = Node('h')
i = Node('i')
j = Node('j')
k = Node('k')
l = Node('l') # noqa. okay to use l as var here
m = Node('m')
n = Node('n')
# save a list of all nodes
self.node_list = [a, b, c, d, e, f, g, h, i, j, k, l, m, n]
self.top_node = a
a | [ # noqa
b,
c | [
d,
e | [f, g, h, i] | j
] | k
] | l | [m, n]
def test_connections(self):
Conns = namedtuple('Conns', 'node upstreams downstreams')
self.do_wiring()
n = {
node.name: Conns(
node.name,
{u.name for u in node._upstream_nodes},
{d.name for d in node._downstream_nodes}
)
for node in self.node_list
}
self.assertEqual(n['a'].upstreams, set())
self.assertEqual(n['a'].downstreams, {'b', 'c'})
self.assertEqual(n['b'].upstreams, {'a'})
self.assertEqual(n['b'].downstreams, {'l'})
self.assertEqual(n['c'].upstreams, {'a'})
self.assertEqual(n['c'].downstreams, {'d', 'e'})
self.assertEqual(n['e'].upstreams, {'c'})
self.assertEqual(n['e'].downstreams, {'f', 'g', 'h', 'i'})
self.assertEqual(n['f'].upstreams, {'e'})
self.assertEqual(n['f'].downstreams, {'j'})
self.assertEqual(n['g'].upstreams, {'e'})
self.assertEqual(n['g'].downstreams, {'j'})
self.assertEqual(n['h'].upstreams, {'e'})
self.assertEqual(n['h'].downstreams, {'j'})
self.assertEqual(n['i'].upstreams, {'e'})
self.assertEqual(n['i'].downstreams, {'j'})
self.assertEqual(n['d'].upstreams, {'c'})
self.assertEqual(n['d'].downstreams, {'k'})
self.assertEqual(n['j'].upstreams, {'f', 'g', 'h', 'i'})
self.assertEqual(n['j'].downstreams, {'k'})
self.assertEqual(n['k'].upstreams, {'j', 'd'})
self.assertEqual(n['k'].downstreams, {'l'})
self.assertEqual(n['l'].upstreams, {'k', 'b'})
self.assertEqual(n['l'].downstreams, {'m', 'n'})
def test_all_nodes(self):
self.do_wiring()
expected_set = set(self.node_list)
all_nodes_set = [
set(node.all_nodes) for node in self.node_list
]
self.assertTrue(all(
[expected_set == found_set for found_set in all_nodes_set]))
def test_top_node(self):
self.do_wiring()
top_node_set = {node.top_node for node in self.node_list}
self.assertEqual(top_node_set, {self.top_node})
def test_duplicate_node(self):
self.do_wiring()
# this test is funky in that it has assertion in a loop,
# but I wanted to be sure dups are detected everywhere
for name in [n.name for n in self.top_node.all_nodes]:
dup = Node(name)
with self.assertRaises(ValueError):
self.top_node.add_downstream(dup)
def test_acyclic(self):
self.do_wiring()
# this test is funky in that it has assertion in a loop,
# but I wanted to be sure cycles are detected everywhere
for node in self.top_node.all_nodes:
with self.assertRaises(ValueError):
node.add_downstream(self.top_node)
def test_multi_root(self):
self.do_wiring()
other_root = Node('dual_root')
other_root.add_downstream(self.top_node._downstream_nodes[0])
with self.assertRaises(ValueError):
other_root.top_node
def test_non_node_connect(self):
node = Node('a')
other = 'not a node'
with self.assertRaises(ValueError):
node.add_downstream(other)
def test_write(self):
# don't run coverage on this because won't test travis with
# both dot installed and not installed.
if dot_installed(): # pragma: no cover
self.do_wiring()
out_file = os.path.join(self.temp_dir, 'out.png')
self.top_node.plot(out_file)
# uncomment the next line if you want to look at the graph
# os.system('cp {} /tmp'.format(out_file))
def test_write_bad_kind(self):
self.do_wiring()
with self.assertRaises(ValueError):
self.top_node.plot(kind='bad')
def test_bad_search_direction(self):
self.do_wiring()
with self.assertRaises(ValueError):
self.top_node.breadth_first_walk(direction='bad')
def test_bad_search_method(self):
self.do_wiring()
with self.assertRaises(ValueError):
self.top_node.walk(how='bad')
class DSLWiringTests(ExplicitWiringTests):
def do_wiring(self):
self.do_graph_wiring()
class TopDownCallTests(TestCase):
def test_call_order_okay(self):
# a toy class that holds a class variable
# tracking what order objects get called in
class MyNode(Node):
call_list = []
def end(self):
self.__class__.call_list.append(self)
a = MyNode('a')
b = MyNode('b')
c = MyNode('c')
d = MyNode('d')
e = MyNode('e')
f = MyNode('f')
g = MyNode('g')
a | [
b | c,
d | e | f
] | g
a.top_node.top_down_call('end')
# make a dictionary with order in which nodes
# were called
call_number = {
node: ind for (ind, node) in enumerate(a.__class__.call_list)}
# make sure ording of one branch is right
self.assertTrue(call_number[a] < call_number[b])
self.assertTrue(call_number[b] < call_number[c])
self.assertTrue(call_number[c] < call_number[g])
# make sure ordering of other branch is okay
self.assertTrue(call_number[a] < call_number[d])
self.assertTrue(call_number[d] < call_number[e])
self.assertTrue(call_number[e] < call_number[f])
self.assertTrue(call_number[f] < call_number[g])
class BreadthFirstSearchTests(TestCase):
def test_top_down_order(self):
a = Node('a')
b = Node('b')
c = Node('c')
d = Node('d')
e = Node('e')
f = Node('f')
h = Node('h')
i = Node('i')
def silly_router(item): # pragma: no cover
return 0
a | [b, c] | [d, e, f, silly_router] | [h, i]
nodes = a.top_node.breadth_first_walk(
direction='down', as_ordered_list=True)
level5 = {nodes.pop() for nn in range(2)}
level4 = {nodes.pop() for nn in range(3)}
level3 = {nodes.pop() for nn in range(2)}
level2 = {nodes.pop() for nn in range(2)}
level1 = {nodes.pop() for nn in range(1)}
self.assertEqual(level1, {a})
self.assertEqual(level2, {b, c})
self.assertEqual(len(level3), 2)
self.assertEqual(level4, {d, e, f})
self.assertEqual(level5, {h, i})
def test_bottom_up_order(self):
a = Node('a')
b = Node('b')
c = Node('c')
d = Node('d')
e = Node('e')
f = Node('f')
h = Node('h')
def silly_router(item): # pragma: no cover
return 0
a | [b, c] | [d, e, f, silly_router] | h
nodes = h.breadth_first_walk(direction='up', as_ordered_list=True)
nodes = nodes[::-1]
level5 = {nodes.pop() for nn in range(1)}
level4 = {nodes.pop() for nn in range(3)}
level3 = {nodes.pop() for nn in range(2)}
level2 = {nodes.pop() for nn in range(2)}
level1 = {nodes.pop() for nn in range(1)}
self.assertEqual(level1, {a})
self.assertEqual(level2, {b, c})
self.assertEqual(len(level3), 2)
self.assertEqual(level4, {d, e, f})
self.assertEqual(level5, {h})
class PrintingTests(TestCase):
def setUp(self):
# define nodes
a = Node('a')
b = Node('b')
c = Node('c')
d = Node('d')
e = Node('e')
f = Node('f')
g = Node('g')
h = Node('h')
i = Node('i')
j = Node('j')
k = Node('k')
l = Node('l') # noqa okay to use l here
m = Node('m')
n = Node('n')
class DummyPipeline(object):
pass
pipeline = DummyPipeline()
# save a list of all nodes
self.node_list = [a, b, c, d, e, f, g, h, i, j, k, l, m, n]
self.top_node = a
def my_router(item): # pragma: no cover
return 'm'
# wire up nodes using dsl
a | [
b, # noqa
c | [
d,
e | [f, g, h, i] | j
] | k
] | l | [m, n, my_router]
for node in self.top_node.all_nodes:
node.pipeline = pipeline
def test_nothing(self):
self.top_node.top_down_make_repr()
lines = sorted([
line.strip()
for line in self.top_node.pipeline._node_repr.split('\n')
if line.strip()
])
expected_lines = sorted([
'a | [b, c]',
'b | l',
'c | [d, e]',
'd | k',
'e | [f, g, h, i]',
'f | j',
'g | j',
'h | j',
'i | j',
'j | k',
'k | l',
'l | l.my_router',
'l.my_router | [m, n]',
])
self.assertEqual(lines, expected_lines)
class RoutingTests(TestCase):
def test_nothing(self):
a = Node('a')
b = Node('b')
c = Node('c')
d = Node('d')
e = Node('e')
def silly_router(item): # pragma: no cover
return 0
class ClassRouter(object): # pragma: no cover
def __call__(self, arg):
return arg
a | [b, c, ClassRouter()] | [d, e, silly_router]
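The BreadthFirstSearchTests earlier in this file check one ordering property: every parent is visited before any of its children, level by level. A minimal standalone sketch of that ordering over a plain adjacency dict (illustrative names only, not consecution's node API):

```python
from collections import deque

# Standalone sketch of breadth-first (level-by-level) ordering over a small
# DAG given as a plain adjacency dict: every parent appears in the result
# before any of its children.
def breadth_first_order(graph, start):
    order = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

# diamond graph: a fans out to b and c, which both feed d
order = breadth_first_order({'a': ['b', 'c'], 'b': ['d'], 'c': ['d']}, 'a')
```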
================================================
FILE: consecution/tests/pipeline_tests.py
================================================
from __future__ import print_function
from collections import Counter
from unittest import TestCase
from consecution.nodes import Node, GroupByNode
from consecution.pipeline import Pipeline, GlobalState
from consecution.tests.testing_helpers import print_catcher
class Item(object): # pragma: no cover (just a testing helper)
def __init__(self, value, parent, source):
self.value = value
self.parent = parent
self.source = source
def build_source_list(self, source_list=None):
source_list = [] if source_list is None else source_list
source_list.append(self.source)
if self.parent:
self.parent.build_source_list(source_list)
return source_list
def get_path_string(self):
return '|'.join([str(self.value)] + self.build_source_list()[::-1])
def __str__(self):
return self.get_path_string()
def __repr__(self):
return self.get_path_string()
class TestNode(Node):
def process(self, item):
self.push(
Item(value=item.value, parent=item, source=self.name)
)
class ResultNode(Node):
def process(self, item):
self.global_state.final_items.append(item)
class BadNode(Node):
def begin(self):
self.push(1)
def process(self, item):  # pragma: no cover (this should never get hit)
self.push(item)
def item_generator():
for ind in range(1, 3):
yield Item(
value=ind,
parent=None,
source='generator'
)
class TestBase(TestCase):
def setUp(self):
a = TestNode('a')
b = TestNode('b')
c = TestNode('c')
d = TestNode('d')
even = TestNode('even')
odd = TestNode('odd')
g = TestNode('g')
def even_odd(item):
return ['even', 'odd'][item.value % 2]
a | b | [c, d] | [even, odd, even_odd] | g
self.pipeline = Pipeline(a, global_state=GlobalState(final_items=[]))
class GlobalStateUnitTests(TestCase):
def test_kwargs_passed(self):
g = GlobalState(custom_name='custom')
p = Pipeline(TestNode('a'), global_state=g)
self.assertTrue(p.global_state.custom_name == 'custom')
self.assertTrue(p.global_state['custom_name'] == 'custom')
def test_printing(self):
g = GlobalState(custom_name='custom')
with print_catcher() as catcher1:
print(g)
with print_catcher() as catcher2:
print(repr(g))
self.assertTrue(
'GlobalState(\'custom_name\')' in catcher1.txt)
self.assertTrue(
'GlobalState(\'custom_name\')' in catcher2.txt)
class OrOpTests(TestCase):
def test_ror(self):
a = Node('a')
b = Node('b')
c = Node('c')
d = Node('d')
p = Pipeline(a | ([b, c] | d))
with print_catcher() as catcher:
print(p)
self.assertTrue('a | [b, c]' in catcher.txt)
self.assertTrue('c | d' in catcher.txt)
self.assertTrue('b | d' in catcher.txt)
class ManualFeedTests(TestCase):
def test_manual_feed(self):
class N(Node):
def begin(self):
self.global_state.out_list = []
def process(self, item):
self.global_state.out_list.append(item)
pipeline = Pipeline(TestNode('a') | N('b'))
pushed_list = []
for item in item_generator():
pushed_list.append(item)
pipeline.push(item)
pipeline.end()
self.assertEqual(len(pipeline.global_state.out_list), 2)
class PipelineUnitTests(TestCase):
def test_push_in_begin(self):
pipeline = Pipeline(BadNode('a') | TestNode('b'))
with self.assertRaises(AttributeError):
pipeline.begin()
def test_no_process(self):
class N(Node):
pass
pipe = Pipeline(N('a') | N('b'))
with self.assertRaises(NotImplementedError):
pipe.consume(range(3))
def test_bad_route(self):
def bad_router(item):
return 'bad'
class N(Node):
def process(self, item):
self.push(item)
pipeline = Pipeline(N('a') | [N('b'), N('c'), bad_router])
with self.assertRaises(ValueError):
pipeline.consume(range(3))
def test_bad_node_lookup(self):
pipeline = Pipeline(TestNode('a') | TestNode('b'))
with self.assertRaises(KeyError):
pipeline['c']
def test_bad_replacement_name(self):
pipeline = Pipeline(TestNode('a') | TestNode('b'))
with self.assertRaises(ValueError):
pipeline['b'] = TestNode('c')
def test_flattened_list(self):
pipeline = Pipeline(
TestNode('a') | [[Node('b'), Node('c')]])
with print_catcher() as catcher:
print(pipeline)
self.assertTrue('a | [b, c]' in catcher.txt)
def test_logging(self):
pipeline = Pipeline(TestNode('a') | TestNode('b'))
pipeline['a'].log('output')
pipeline['b'].log('input')
with print_catcher() as catcher:
pipeline.consume(item_generator())
text = """
node_log,what,node_name,item
node_log,output,a,1|generator|a
node_log,input,b,1|generator|a
node_log,output,a,2|generator|a
node_log,input,b,2|generator|a
"""
for line in text.split('\n'):
self.assertTrue(line.strip() in catcher.txt)
def test_reset(self):
class N(Node):
def begin(self):
self.was_reset = False
def process(self, item):
self.push(item)
def reset(self):
self.was_reset = True
pipe = Pipeline(N('a') | N('b'))
pipe.consume(range(3))
self.assertFalse(pipe['a'].was_reset)
self.assertFalse(pipe['b'].was_reset)
pipe.reset()
self.assertTrue(pipe['a'].was_reset)
self.assertTrue(pipe['b'].was_reset)
class LoggingTests(TestBase):
def test_logging(self):
self.pipeline['g'].log('input')
with print_catcher() as printer:
self.pipeline.consume(item_generator())
counter = Counter()
for line in printer.lines():
even_odd = line.split('|')[-1]
counter.update({even_odd: 1})
self.assertEqual(counter['even'], 2)
self.assertEqual(counter['odd'], 2)
class ReplacementTests(TestBase):
def test_replace_first(self):
class Replacement(Node):
def process(self, item):
self.push(
Item(value=10 * item.value, parent=item, source=self.name)
)
self.pipeline['a'] = Replacement('a')
self.pipeline['a'].log('output')
with print_catcher() as printer:
self.pipeline.consume(item_generator())
self.assertEqual(printer.txt.count('10'), 1)
self.assertEqual(printer.txt.count('20'), 1)
def test_replace_even(self):
class Replacement(Node):
def process(self, item):
self.push(
Item(value=10 * item.value, parent=item, source=self.name)
)
self.pipeline['even'] = Replacement('even')
self.pipeline['g'].log('output')
with print_catcher() as printer:
self.pipeline.consume(item_generator())
self.assertEqual(printer.txt.count('1'), 2)
self.assertEqual(printer.txt.count('20'), 2)
def test_replace_no_router(self):
a = TestNode('a')
b = TestNode('b')
pipe = Pipeline(a | b)
pipe['b'] = TestNode('b')
with print_catcher() as catcher:
print(pipe)
self.assertTrue('a | b' in catcher.txt)
class ConsumingTests(TestBase):
def test_even_odd(self):
self.pipeline['g'].add_downstream(
ResultNode('result_node')
)
self.pipeline.consume(item_generator())
expected_path_set = set([
'1|generator|a|b|c|odd|g',
'1|generator|a|b|d|odd|g',
'2|generator|a|b|c|even|g',
'2|generator|a|b|d|even|g',
])
path_set = set(
item.get_path_string() for item in
self.pipeline.global_state.final_items
)
self.assertEqual(expected_path_set, path_set)
class ConstructingTests(TestBase):
def test_printing(self):
lines = repr(self.pipeline).split('\n')
self.assertEqual(len(lines), 13)
def test_plotting(self):
# don't want to force a mock dependency, so make a simple mock here
args_kwargs = []
def return_calls(*args, **kwargs):
args_kwargs.append(args)
args_kwargs.append(kwargs)
# assign my mock to the top node plot function
self.pipeline.top_node.plot = return_calls
# call pipeline plot
self.pipeline.plot()
# make sure top node plot was properly called
self.assertEqual(args_kwargs[0], ('pipeline', 'png'))
self.assertEqual(args_kwargs[1], {})
class Batch(GroupByNode):
def begin(self):
self.global_state.batches = []
def key(self, item):
return item // 3
def process(self, batch):
self.global_state.batches.append(batch)
class GroupByTests(TestCase):
def test_batching(self):
pipe = Pipeline(Batch('a'))
pipe.consume(range(9))
self.assertEqual(
pipe.global_state.batches,
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]
)
def test_undefined_key(self):
class B(GroupByNode):
def process(self, item): # pragma: no cover
pass
pipe = Pipeline(B('a'))
with self.assertRaises(NotImplementedError):
pipe.consume(range(9))
def test_undefined_process(self):
class B(GroupByNode):
def key(self, item):
pass
pipe = Pipeline(B('a'))
with self.assertRaises(NotImplementedError):
pipe.consume(range(9))
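The GroupByTests above rely on one behavior: consecutive items sharing a key value are collected into a single batch before `process` is called. A standalone sketch of that grouping pattern (`batch_consecutive` is a hypothetical helper, not part of consecution):

```python
import itertools

# Group consecutive items that share the same key value into batches,
# mirroring the grouping behavior the Batch test expects.
def batch_consecutive(items, key):
    return [list(group) for _, group in itertools.groupby(items, key=key)]

# same key function as the Batch test above: integer division by 3
batches = batch_consecutive(range(9), key=lambda item: item // 3)
```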
================================================
FILE: consecution/tests/testing_helpers.py
================================================
import sys
from contextlib import contextmanager
# These don't need to be covered. They are just testing utilities.
@contextmanager
def print_catcher(buff='stdout'):  # pragma: no cover
    if buff == 'stdout':
        sys.stdout = Printer()
        try:
            yield sys.stdout
        finally:
            # restore stdout even if the with-body raises
            sys.stdout = sys.__stdout__
    elif buff == 'stderr':
        sys.stderr = Printer()
        try:
            yield sys.stderr
        finally:
            sys.stderr = sys.__stderr__
    else:  # pragma: no cover (testing helper; no need to cover)
        raise ValueError('buff must be either \'stdout\' or \'stderr\'')
class Printer(object): # pragma: no cover
def __init__(self):
self.txt = ""
def write(self, txt):
self.txt += txt
def lines(self):
for line in self.txt.split('\n'):
yield line.strip()
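`print_catcher` captures output by swapping `sys.stdout` for a `Printer` instance. The standard library offers an equivalent capture pattern; shown here only as an alternative sketch for comparison, not what the helpers above use:

```python
import io
from contextlib import redirect_stdout

# Capture printed output into an in-memory buffer using only the
# standard library.
buf = io.StringIO()
with redirect_stdout(buf):
    print('hello')
captured = buf.getvalue()
```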
================================================
FILE: consecution/tests/utils_tests.py
================================================
from __future__ import print_function
from unittest import TestCase
from consecution.utils import Clock
import time
from consecution.tests.testing_helpers import print_catcher
class ClockTests(TestCase):
def test_bad_start(self):
clock = Clock()
with self.assertRaises(ValueError):
clock.start()
def test_printing(self):
clock = Clock()
with clock.running('a', 'b', 'c'):
with clock.paused('a'):
time.sleep(.1)
with clock.paused('b'):
time.sleep(.1)
with print_catcher() as printer:
print(repr(clock))
names = []
for ind, line in enumerate(printer.txt.split('\n')):
if line:
if ind > 0:
names.append(line.split()[-1])
self.assertEqual(names, ['c', 'b', 'a'])
def test_get_time_of_running(self):
clock = Clock()
with clock.running('a'):
time.sleep(.1)
delta1 = int(10 * clock.get_time())
time.sleep(.1)
delta2 = int(10 * clock.get_time())
self.assertEqual(delta1, 1)
self.assertEqual(delta2, 2)
def test_pausing(self):
clock = Clock()
with clock.running('a', 'b', 'c'):
time.sleep(.1)
with clock.paused('b', 'c'):
time.sleep(.1)
self.assertEqual(int(10 * clock.get_time('a')), 2)
self.assertEqual(int(10 * clock.get_time('b')), 1)
self.assertEqual(int(10 * clock.get_time('c')), 1)
self.assertEqual(
{int(10 * v) for v in clock.get_time().values()},
{1, 2}
)
def test_stop_all(self):
clock = Clock()
clock.start('a', 'b')
time.sleep(.1)
clock.stop()
self.assertEqual(int(10 * clock.get_time('a')), 1)
self.assertEqual(int(10 * clock.get_time('b')), 1)
def test_reset_all(self):
clock = Clock()
clock.start('a', 'b')
time.sleep(.1)
clock.stop('b')
self.assertEqual(len(clock.delta), 1)
clock.reset()
self.assertEqual(len(clock.get_time()), 0)
def test_double_calls(self):
clock = Clock()
clock.start('a')
clock.start('a')
time.sleep(.1)
clock.stop('a')
clock.stop('a')
self.assertEqual(int(round(10 * clock.get_time())), 1)
clock.reset('a')
clock.reset('a')
clock.reset('b')
clock.reset('b')
self.assertEqual(clock.get_time(), {})
def test_get_time_delta_only(self):
clock = Clock()
clock.start('a')
clock.stop('a')
self.assertEqual(clock.get_time('f'), {})
================================================
FILE: consecution/utils.py
================================================
from collections import Counter
from contextlib import contextmanager
import datetime
class Clock(object):
def __init__(self):
# accumulated seconds for each named timer
self.delta = Counter()
# start timestamps for timers that are currently running
self.active_start_times = dict()
@contextmanager
def running(self, *names):
self.start(*names)
yield
self.stop(*names)
@contextmanager
def paused(self, *names):
self.stop(*names)
yield
self.start(*names)
def start(self, *names):
if not names:
raise ValueError('You must provide at least one name to start')
for name in names:
if name not in self.active_start_times:
self.active_start_times[name] = datetime.datetime.now()
def stop(self, *names):
ending = datetime.datetime.now()
if not names:
names = list(self.active_start_times.keys())
for name in names:
if name in self.active_start_times:
starting = self.active_start_times.pop(name)
self.delta.update({name: (ending - starting).total_seconds()})
def reset(self, *names):
if not names:
names = list(self.active_start_times.keys())
names.extend(list(self.delta.keys()))
for name in names:
if name in self.delta:
self.delta.pop(name)
if name in self.active_start_times:
self.active_start_times.pop(name)
def get_time(self, *names):
ending = datetime.datetime.now()
if not names:
names = list(self.delta.keys())
names.extend(list(self.active_start_times.keys()))
delta = Counter()
for name in names:
if name in self.delta:
delta.update({name: self.delta[name]})
elif name in self.active_start_times:
delta.update(
{
name: (
ending - self.active_start_times[name]
).total_seconds()
}
)
if len(delta) == 1:
return delta[list(delta.keys())[0]]
else:
return dict(delta)
def __str__(self):
records = sorted(self.delta.items(), key=lambda t: t[1], reverse=True)
records = [('%0.6f' % r[1], r[0]) for r in records]
out_list = ['{: <15s}{}'.format('seconds', 'name')]
for rec in records:
out_list.append('{: <15s}{}'.format(*rec))
return '\n'.join(out_list)
def __repr__(self):
return self.__str__()
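Clock's core pattern is accumulate-on-stop: each named timer adds its elapsed seconds to a running total when it stops, so repeated start/stop cycles accumulate. A condensed standalone sketch of that pattern (`MiniClock` is illustrative only, not part of consecution):

```python
import time

# Minimal accumulate-on-stop timer: elapsed time is added to a running
# total each time a named timer stops.
class MiniClock(object):
    def __init__(self):
        self.delta = {}    # accumulated seconds per timer name
        self.active = {}   # start timestamps of running timers

    def start(self, name):
        # starting an already-running timer is a no-op, as in Clock
        self.active.setdefault(name, time.monotonic())

    def stop(self, name):
        if name in self.active:
            started = self.active.pop(name)
            self.delta[name] = (
                self.delta.get(name, 0.0) + time.monotonic() - started
            )

clock = MiniClock()
clock.start('a')
clock.stop('a')
```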
================================================
FILE: docker/Dockerfile
================================================
FROM ubuntu:xenial
# root is the home directory
WORKDIR /root
ADD simple_example.py /root/simple_example.py
# set up the system tools including conda
RUN \
rm /bin/sh && ln -s /bin/bash /bin/sh && \
apt-get update && \
apt-get install -y vim && \
apt-get install -y git && \
apt-get install -y wget && \
apt-get install -y curl && \
apt-get install -y graphviz && \
apt-get install -y python-dev
RUN \
curl -sS https://bootstrap.pypa.io/get-pip.py | python
RUN \
pip install git+https://github.com/robdmc/consecution.git
================================================
FILE: docker/docker_build.sh
================================================
#! /usr/bin/env bash
docker build . -t consecution
================================================
FILE: docker/docker_run.sh
================================================
#! /usr/bin/env bash
docker run -it --rm -v $(pwd):/root/shared consecution /bin/bash
================================================
FILE: docker/simple_example.py
================================================
#! /usr/bin/env python
# TODO: make the consecution install in the docker file read from pip
from __future__ import print_function
from consecution import Node, Pipeline
class N(Node):
def process(self, item):
print(item, self.name)
self.push(item)
p = Pipeline(
N('a') | [N('b'), N('c')] | N('d')
)
p.plot()
p.consume(range(5))
================================================
FILE: docs/Makefile
================================================
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp epub latex latexpdf text man changes linkcheck doctest gettext
help:
@echo "Please use \`make