Repository: benjaminmgross/visualize-wealth
Branch: master
Commit: 76f3f0fd815a
Files: 22
Total size: 187.7 KB

Directory structure:
gitextract_xj6nrv43/

├── .gitignore
├── .travis.yml
├── README.md
├── requirements.txt
├── run_tests
├── setup.py
├── test_data/
│   ├── estimating when splits have occurred.xlsx
│   ├── panel from weight file test.xlsx
│   ├── test_analyze.xlsx
│   ├── test_ret_calcs.xlsx
│   ├── test_splits.xlsx
│   ├── transaction-costs.xlsx
│   └── ~$panel from weight file test.xlsx
├── test_module/
│   ├── __init__.py
│   ├── test_analyze.py
│   ├── test_construct_portfolio.py
│   └── test_utils.py
└── visualize_wealth/
    ├── __init__.py
    ├── analyze.py
    ├── classify.py
    ├── construct_portfolio.py
    └── utils.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.DS_Store
*~
build/
*.pyc
*.dropbox
*.egg-info/
dist/
docs/
.coverage


================================================
FILE: .travis.yml
================================================
language: python
python:
  - 2.7
# command to install dependencies
before_install:
  - wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
  - chmod +x miniconda.sh
  - ./miniconda.sh -b
  - export PATH=/home/travis/miniconda/bin:$PATH
  - conda update --yes conda

install:
  - conda install --yes python=$TRAVIS_PYTHON_VERSION atlas numpy scipy pytest 
  - conda install --yes python=$TRAVIS_PYTHON_VERSION matplotlib nose dateutil 
  - conda install --yes python=$TRAVIS_PYTHON_VERSION pandas statsmodels pytables xlrd

#  - python setup.py install
#  - pip install -r preamble.txt
#  - pip install -r requirements.txt
#  - pip install -r denouement.txt

# command to run tests
script: 
  - py.test ./test_module/test_analyze.py -v
  - py.test ./test_module/test_utils.py -v
  - py.test ./test_module/test_construct_portfolio.py -v

# the body of this script was found by @dan-blanchard at https://gist.github.com/dan-blanchard/7045057


================================================
FILE: README.md
================================================
#`visualize_wealth` README.md [![Build Status](https://travis-ci.org/benjaminmgross/visualize-wealth.svg?branch=master)](https://travis-ci.org/benjaminmgross/visualize-wealth)

A library built in Python to construct, backtest, analyze, and evaluate portfolios and their benchmarks, with comprehensive documentation and manual calculations to illustrate all underlying methodologies and statistics.

##License

This program is free software and is distributed under the
[GNU General Public License version 3](http://www.gnu.org/licenses/quick-guide-gplv3.html) ("GNU GPL v3")

&copy; Benjamin M. Gross 2013

**NOTE:** Because so much of the underlying technology I'm continuing to build has become the building blocks 
for [my financial technology startup](http://www.visualizewealth.com), I've forked this repo (as of 5.2015) and made new 
changes private. I might continue to push some of the bigger changes to this repo to keep it open source, but 
we'll see.

##Dependencies

- `numpy` & `scipy`: The building blocks of everything quant
- `pandas`: extensively used (`numpy` and `scipy` obviously, but
  `pandas` depends on those)
- `tables`: for HDFStore price extraction
- `urllib2`: for Yahoo! API calls to append price `DataFrame`s with
Dividends

For a full list of dependencies, see the `requirements.txt` file in 
the root folder.


##Installation

To install the `visualize_wealth` modules onto your computer, go into
your desired folder of choice (say `Downloads`), and:

1. Clone the repository

	    $ cd ~/Downloads
	    $ git clone https://github.com/benjaminmgross/wealth-viz

2. `cd` into the `wealth-viz` directory

        $ cd wealth-viz

3. Install the package

        $ python setup.py install

4. Check your install.  From anywhere on your machine, you should be able to open
   `IPython` and import the library, for example:

	    $ cd ~/
	    $ ipython

        IPython 1.1.0 -- An enhanced Interactive Python.
        ?         -> Introduction and overview of IPython's features.
        %quickref -> Quick reference.
        help      -> Python's own help system.
        object?   -> Details about 'object', use 'object??' for extra details.
	
        In [1]: import visualize_wealth

**"Ligget Se!"**

##Documentation

The `README.md` file has fairly good examples, but I've gone to great lengths to autogenerate documentation for the code using [Sphinx](http://sphinx-doc.org/).  Therefore, aside from the docstrings, when you `git clone` the repository, use these instructions to generate the auto-documentation:

    1. `cd /path-to-wealth-viz/`
    2. `sphinx-build -b html ./docs/source/ ./docs/build/`
   
Now that the autogenerated documentation is complete, you can `cd` into:

	$ cd visualize_wealth/docs/build/

and find full `.html` browseable code documentation (that's pretty beautiful... if I do say so my damn self) with live links, function explanations (that also have live links to their respective definition on the web), etc.

Also I've created an Excel spreadsheet that illustrates almost all of the `analyze.py` portfolio statistic calculations.  That spreadsheet can be found in:

	visualize-wealth > test_data > test_analyze.xlsx

In fact, the unit testing for the `analyze.py` portfolio statistics tests the python calculations against this same excel spreadsheet, so you can really get into the guts of how these things are calculated.
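
To make the idea of testing against a spreadsheet concrete, here is a minimal sketch of the pattern, not the repository's actual test code. The three prices and the "manual" column are made up for illustration; in the real tests the manual values come from the workbook. It assumes a pandas version that provides `pandas.testing`.

```python
import numpy as np
import pandas as pd

# Hypothetical price series (illustrative values, not taken from the workbook).
prices = pd.Series([100.0, 110.0, 99.0])

# Stand-in for a spreadsheet column computed by hand, e.g. =LN(B3/B2).
manual_log_returns = pd.Series([np.nan, np.log(110.0 / 100.0), np.log(99.0 / 110.0)])

# The "library" calculation: log of each price over the previous price.
computed_log_returns = np.log(prices / prices.shift(1))

# Raises an AssertionError if the two columns disagree.
pd.testing.assert_series_equal(manual_log_returns, computed_log_returns)
```

If the assertion passes silently, the code and the hand calculation agree, which is the same contract the spreadsheet-backed tests rely on.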


##[Portfolio Construction Examples](portfolio-construction-examples)

Portfolios can (generally) be constructed in one of three ways:

1. The Blotter Method
2. Weight Allocation Method
3. Initial Allocation with specific Rebalancing Period Method

### 1. [The Blotter Method](blotter-method-examples)

**The blotter method:** In finance, a spreadsheet of "buys/sells", "Prices", "Dates" etc. is called a "trade blotter."  This also would be the easiest way for an investor to actually analyze the past performance of her portfolio, because trade confirmations provide this exact data.
   
This method is most effectively achieved by providing an Excel / `.csv` file with the following format:

| Date   |Buy / Sell| Price |Ticker|
|:-------|:---------|:------|:-----|
|9/4/2001| 50       | 123.45| EFA  |
|5/5/2003| 65       | 107.71| EEM	|
|6/6/2003|-15       | 118.85| EEM 	|

where "Buys" can be distinguished from "Sells" because buys are positive (+) and sells are negative (-).
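
As a rough illustration of that sign convention, the sketch below builds the blotter from the table above with plain `pandas` and derives a running share count per ticker. The column names (`Buy/Sell`, `cum_shares`, `contr_withdrawal`) are assumptions chosen to echo the README, and the sketch ignores splits and dividends, which the library handles separately.

```python
import pandas as pd

# Hypothetical blotter mirroring the table above; column names are assumptions.
blotter = pd.DataFrame(
    {
        "Buy/Sell": [50, 65, -15],
        "Price": [123.45, 107.71, 118.85],
        "Ticker": ["EFA", "EEM", "EEM"],
    },
    index=pd.to_datetime(["2001-09-04", "2003-05-05", "2003-06-06"]),
)
blotter.index.name = "Date"

# Buys are positive and sells are negative, so a per-ticker running sum
# gives the number of shares held after each trade.
blotter["cum_shares"] = blotter.groupby("Ticker")["Buy/Sell"].cumsum()

# Cash contributed (+) or withdrawn (-) by each trade.
blotter["contr_withdrawal"] = blotter["Buy/Sell"] * blotter["Price"]

print(blotter)
```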

For example, let's say I wanted to generate a random portfolio containing the following tickers and respective asset classes, using the `generate_random_portfolio_blotter` function:

|Ticker  | Description              | Asset Class        | Price Start|
|:-------|:-------------------------|:-------------------|:-----------|
| IWB    | iShares Russell 1000     | US Equity          | 5/19/2000  |
| IWR    | iShares Russell Midcap   | US Equity          | 8/27/2001  |
| IWM    | iShares Russell 2000     | US Equity          | 5/26/2000  |
| EFA    | iShares EAFE             | Foreign Dev Equity | 8/27/2001  |
| EEM    | iShares EAFE EM          | Foreign EM Equity  | 4/15/2003  |
| TIP    | iShares TIPS             | Fixed Income       | 12/5/2003  |
| TLT    | iShares LT Treasuries    | Fixed Income       | 7/31/2002  |
| IEF    | iShares MT Treasuries    | Fixed Income       | 7/31/2002  |
| SHY    | iShares ST Treasuries    | Fixed Income       | 7/31/2002  |
| LQD    | iShares Inv Grade        | Fixed Income       | 7/31/2002  |
| IYR    | iShares Real Estate      | Alternative        | 6/19/2000  |
| GLD    | iShares Gold Index       | Alternative        | 11/18/2004 |
| GSG    | iShares Commodities      | Alternative        | 7/21/2006  |

I could construct a portfolio of random trades (i.e. the "blotter method"), say 20 trades for each asset, by executing the following:
	
	        #import the modules
	In [5]: import visualize_wealth.construct_portfolio as vwcp

	In [6]: ticks = ['IWB','IWR','IWM','EFA','EEM','TIP','TLT','IEF',
	                 'SHY','LQD','IYR','GLD','GSG']		
	In [7]: num_trades = 20
	
	        #construct the random trade blotter
	In [8]: blotter = vwcp.generate_random_portfolio_blotter(ticks, num_trades)
	
	        #construct the portfolio panel
	In [9]: port_panel = vwcp.panel_from_blotter(blotter)
	
Now I have a `pandas.Panel`. Before we construct the cumulative portfolio values, let's examine the dimensions of the panel. These are generally the same for all construction methods, although the columns of the `minor_axis` differ because each method calls for different optimized calculations:

	#tickers are `panel.items`
	In [10]: port_panel.items
	Out[10]: Index([u'EEM', u'EFA', u'GLD', u'GSG', u'IEF', u'IWB', u'IWM', u'IWR', 
				u'IYR', u'LQD', u'SHY', u'TIP', u'TLT'], dtype=object)

	#dates are along the `panel.major_axis`
	In [12]: port_panel.major_axis
	Out[12]: 
	<class 'pandas.tseries.index.DatetimeIndex'>
	[2000-07-06 00:00:00, ..., 2013-10-30 00:00:00]
	Length: 3351, Freq: None, Timezone: None

	#price data, cumulative investment, dividends, and split ratios are `panel.minor_axis`
	In [13]: port_panel.minor_axis
	Out[13]: Index([u'Open', u'High', u'Low', u'Close', u'Volume', u'Adj Close',
		u'Dividends',u'Splits', u'contr_withdrawal', u'cum_investment', 
		u'cum_shares'], dtype=object)

There is a lot of information to be gleaned from this data object, but the most common goal is to convert this `pandas.Panel` into a portfolio `pandas.DataFrame` with columns `['Open', 'Close']`, so it can be compared against other assets or combinations of assets.  In this case, use `pfp_from_blotter` (the prefix `pfp` stands for "portfolio from panel", followed by the portfolio construction method [i.e. blotter, weights, or initial allocation], which in this case was "the blotter method").
	
		#construct_the portfolio series
		In [14]: port_df = vwcp.pfp_from_blotter(port_panel, 1000.)
	
		In [117]: port_df.head()
		Out[117]: 
        	          Close         Open
		Date                                
		2000-07-06  1000.000000   988.744754
		2000-07-07  1006.295307  1000.190767
		2000-07-10  1012.876765  1005.723006
		2000-07-11  1011.636780  1011.064479
		2000-07-12  1031.953453  1016.978253

###2. [The Weight Allocation Method](weight-allocation-method-examples)

A commonplace way to test portfolio management strategies using a
group of underlying assets is to construct aggregate portfolio
performance, given a specified weighting allocation to specific assets
on specified dates.  Specifically, those (oftentimes) percentage
allocations represent a recommended allocation at some point in time,
based on some "view" derived from either the output of a model or some qualitative
analysis.  Therefore, having an engine that is capable of taking in a weighting file (say, a `.csv`) with the following format:

|Date    | Ticker 1  | Ticker 2  | Ticker 3 | Ticker 4 |
|:-------|:---------:|:---------:|:--------:|:--------:|
|1/1/2002| 5%        | 20%       | 30%      | 45%      |
|6/3/2003| 40%       | 10%       | 40%      | 10%      |
|7/8/2003| 25%       | 25%       | 25%      | 25%      |

and turning the above allocation file into a cumulative portfolio
value that can then be analyzed and compared (both in isolation and
relative to specified benchmarks) is highly valuable in the process of
portfolio strategy creation.

A quick example of a weighting allocation file can be found in the
Excel file `test_data/panel from weight file test.xlsx`,
where the tab `rebal_weights` represents one of these specific
weighting files.
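
For readers without the workbook handy, a weighting file of this shape can be sketched directly in `pandas`. The ticker names are the placeholders from the table above, and expressing weights as fractions rather than percentages is an assumption of this sketch. It also checks that every rebalance date is fully invested:

```python
import pandas as pd

# Hypothetical weight file matching the table above (weights as fractions).
weight_df = pd.DataFrame(
    {
        "Ticker 1": [0.05, 0.40, 0.25],
        "Ticker 2": [0.20, 0.10, 0.25],
        "Ticker 3": [0.30, 0.40, 0.25],
        "Ticker 4": [0.45, 0.10, 0.25],
    },
    index=pd.to_datetime(["2002-01-01", "2003-06-03", "2003-07-08"]),
)
weight_df.index.name = "Date"

# Each row is one rebalance date; its weights should sum to 100%.
row_sums = weight_df.sum(axis=1)
fully_invested = (row_sums - 1.0).abs() < 1e-9

print(row_sums)
print(fully_invested.all())
```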

To construct a portfolio using the **Weighting Allocation Method**,
a process such as the following would be carried out.

	#import the library
	import visualize_wealth.construct_portfolio as vwcp

If we didn't have the prices already, there's a function for that

	#fetch the prices and put them into a pandas.Panel
    price_panel = vwcp.fetch_data_for_weight_allocation_method(weight_df)

	#construct the panel that will go into the portfolio constructor

	 port_panel = vwcp.panel_from_weight_file(weight_df, price_panel,
	     start_value = 1000.)

Construct the `pandas.DataFrame` for the portfolio, starting at
`start_value` of 1000 with columns `['Open', 'Close']`

	portfolio = vwcp.pfp_from_weight_file(port_panel)

Now a portfolio with `index` of daily values and columns
`['Open', 'Close']` has been created upon which analytics and
performance analysis can be done.
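
As one example of such analysis, a drawdown series can be sketched from a `Close` column in a few lines. The prices below are made up, and the library's own `analyze.drawdown` may use a different convention; this is only the common "distance from the running peak" definition.

```python
import pandas as pd

# Hypothetical portfolio closes (illustrative values only).
close = pd.Series([1000.0, 1050.0, 990.0, 1020.0, 1100.0, 1045.0])

# Highest value seen so far at each point in time.
running_peak = close.cummax()

# Fractional decline from the running peak (0 at new highs, negative otherwise).
drawdown = close / running_peak - 1.0

# The worst peak-to-trough decline over the whole window.
max_drawdown = drawdown.min()

print(drawdown)
print(max_drawdown)
```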

### 3. [The Initial Allocation & Rebalancing Method](initial-allocation-method-examples)

The standard method of portfolio construction that pervades in many
circles to this day is static allocation with a given interval of
rebalancing. For instance, if I wanted to implement Oppenheimer's
[The New 60/40](https://www.oppenheimerfunds.com/digitalAssets/Discover-the-New-60-40-43f7f642-e0aa-40d9-a3fc-00f31be5a4fa.pdf)
static portfolio, rebalancing on a yearly interval, my weighting
scheme would be as follows:

| Ticker | Name                     | Asset Class        | Allocation |
|:-------|:-------------------------|:-------------------|:-----------|
| IWB    | iShares Russell 1000     | US Equity          |        15% |
| IWR    | iShares Russell Midcap   | US Equity          |       7.5% |
| IWM    | iShares Russell 2000     | US Equity          |       7.5% |
| SCZ    | iShares EAFE Small Cap   | Foreign Dev Equity |       7.5% |
| EFA    | iShares EAFE             | Foreign Dev Equity |      12.5% |
| EEM    | iShares EAFE EM          | Foreign EM Equity  |        10% |
| TIP    | iShares TIPS             | Fixed Income       |         5% |
| TLT    | iShares LT Treasuries    | Fixed Income       |       2.5% |
| IEF    | iShares MT Treasuries    | Fixed Income       |       2.5% |
| SHY    | iShares ST Treasuries    | Fixed Income       |         5% |
| HYG    | iShares High Yield       | Fixed Income       |       2.5% |
| LQD    | iShares Inv Grade        | Fixed Income       |       2.5% |
| PCY    | PowerShares EM Sovereign | Fixed Income       |         2% |
| BWX    | SPDR intl Treasuries     | Fixed Income       |         2% |
| MBB    | iShares MBS              | Fixed Income       |         1% |
| PFF    | iShares Preferred Equity | Alternative        |       2.5% |
| IYR    | iShares Real Estate      | Alternative        |         5% |
| GLD    | iShares Gold Index       | Alternative        |       2.5% |
| GSG    | iShares Commodities      | Alternative        |         5% |

To implement such a weighting scheme, we can use the same workbook,
`test_data/panel from weight file test.xlsx`, and the
tab `static_allocation`.  Note there is only a single row of
weights, as this will be the "static allocation" to be rebalanced to
at some given interval.
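
The mechanics of "rebalancing to a static allocation" can be sketched with a hypothetical two-asset 60/40 target. The tickers and period returns below are made up for illustration, and this is not the library's implementation: weights drift as prices move, and a rebalance trades each asset back to its target share of the current portfolio value.

```python
import pandas as pd

# Hypothetical static target allocation and starting value.
target = pd.Series({"IWB": 0.60, "TLT": 0.40})
start_value = 1000.0

# Dollars held in each asset right after the initial allocation.
holdings = target * start_value

# Made-up returns over one rebalancing interval.
period_return = pd.Series({"IWB": 0.10, "TLT": -0.05})

# Holdings drift with returns, so the weights move away from the target.
drifted = holdings * (1.0 + period_return)
drifted_weights = drifted / drifted.sum()

# Rebalancing resets each asset to its target share of the current value.
rebalanced = target * drifted.sum()
trades = rebalanced - drifted  # positive = buy, negative = sell

print(drifted_weights)
print(trades)
```

The trades net to zero because a rebalance only moves money between assets; it adds nothing and withdraws nothing.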

    #import pandas and the construct_portfolio library
    import pandas
    import visualize_wealth.construct_portfolio as vwcp

Let's use the `static_allocation` tab provided in the `panel from weight
file test.xlsx` workbook

    f = pandas.ExcelFile('test_data/panel from weight file test.xlsx')
	static_alloc = f.parse('static_allocation', index_col = 0,
	    header_col = 0)

Again, assuming we don't have the prices and need to download them, use
the `fetch_data_for_initial_allocation_method` function

    price_panel = vwcp.fetch_data_for_initial_allocation_method(static_alloc)

Construct the `panel` for the portfolio while determining the desired
rebalance frequency

    panel = vwcp.panel_from_initial_weights(weight_series = static_alloc,
        price_panel = price_panel, rebal_frequency = 'quarterly')


Construct the final portfolio with columns `['Open', 'Close']`

    portfolio = vwcp.pfp_from_weight_file(panel)

Take a look at the portfolio series:

    In [10]: portfolio.head()
	Out[10]:

	            Close        Open
	Date
	2007-12-12  1000.000000  1007.885932
	2007-12-13   991.329125   990.717915
	2007-12-14   978.157960   983.057829
	2007-12-17   961.705069   969.797167
	2007-12-18   969.794966   972.365687
  

================================================
FILE: requirements.txt
================================================

chardet>=1.0.1
cython>=0.21.1
h5py>=2.3.1
ipdb>=0.8
ipython>=3.0.0
matplotlib>=1.4.2
numpy>=1.9.1
numexpr>=2.4
pandas>=0.14.1
py>=1.4.26
pytest>=2.6.4
pytest-cov>=1.8.1
scipy>=0.14.0
tables>=3.1.1
xlrd>=0.9.3

================================================
FILE: run_tests
================================================
#!/bin/bash

declare -a fList=(
test_analyze.py
test_construct_portfolio.py
test_utils.py
)

for nm in "${fList[@]}"
do
  echo testing "$nm"
  py.test ./test_module/"$nm" -v
done


================================================
FILE: setup.py
================================================
#!/usr/bin/env python
# encoding: utf-8

from setuptools import setup

setup(name='visualize_wealth',
      version='0.1',
      description='Portfolio Construction and Analysis',
      author='Benjamin M. Gross',
      author_email='benjaminMgross@gmail.com',
      url='https://github.com/benjaminmgross/wealth-viz',
      packages=['visualize_wealth'])


================================================
FILE: test_module/__init__.py
================================================


================================================
FILE: test_module/test_analyze.py
================================================
#!/usr/bin/env python
# encoding: utf-8

"""
.. module:: visualize_wealth.test_module.test_analyze.py

.. moduleauthor:: Benjamin M. Gross <benjaminMgross@gmail.com>

"""

import pytest
import pandas
from pandas.util import testing
import visualize_wealth.analyze as analyze

@pytest.fixture
def test_file():
    return pandas.ExcelFile('./test_data/test_analyze.xlsx')

@pytest.fixture
def man_calcs(test_file):
    return test_file.parse('calcs', index_col = 0)

@pytest.fixture
def stat_calcs(test_file):
    return test_file.parse('results', index_col = 0)

@pytest.fixture
def prices(test_file):
    tmp = test_file.parse('calcs', index_col = 0)
    return tmp[['S&P 500', 'VGTSX']]

def test_active_return(prices, stat_calcs):
    man_ar = stat_calcs.loc['active_return', 'VGTSX']

    testing.assert_almost_equal(man_ar, analyze.active_return(
                                series = prices['VGTSX'],
                                benchmark = prices['S&P 500'],
                                freq = 'daily')
    )

def test_active_returns(man_calcs, prices):
    active_returns = analyze.active_returns(series = prices['VGTSX'], 
                                            benchmark = prices['S&P 500'])

    testing.assert_series_equal(man_calcs['Active Return'], active_returns)

def test_log_returns(man_calcs, prices):
    testing.assert_series_equal(man_calcs['S&P 500 Log Ret'],
                                analyze.log_returns(prices['S&P 500'])
    )

def test_linear_returns(man_calcs, prices):
    testing.assert_series_equal(man_calcs['S&P 500 Lin Ret'],
                                analyze.linear_returns(prices['S&P 500'])
    )

def test_drawdown(man_calcs, prices):
    testing.assert_series_equal(man_calcs['VGTSX Drawdown'],
                                analyze.drawdown(prices['VGTSX'])
    )

def test_r2(man_calcs, prices):
    log_rets = analyze.log_returns(prices).dropna()
    pandas_rsq = pandas.ols(x = log_rets['S&P 500'], 
                            y = log_rets['VGTSX']).r2

    analyze_rsq = analyze.r2(benchmark = log_rets['S&P 500'], 
                             series = log_rets['VGTSX'])

    testing.assert_almost_equal(pandas_rsq, analyze_rsq)

def test_r2_adj(man_calcs, prices):
    log_rets = analyze.log_returns(prices).dropna()
    pandas_rsq = pandas.ols(x = log_rets['S&P 500'], 
                            y = log_rets['VGTSX']).r2_adj

    analyze_rsq = analyze.r2_adj(benchmark = log_rets['S&P 500'], 
                             series = log_rets['VGTSX'])

    testing.assert_almost_equal(pandas_rsq, analyze_rsq)

def test_cumulative_turnover(test_file, stat_calcs):
    alloc_df = test_file.parse('alloc_df', index_col = 0)
    cols = alloc_df.columns[alloc_df.columns!='Daily TO']
    alloc_df = alloc_df[cols].dropna()
    asset_wt_df = test_file.parse('asset_wt_df', index_col = 0)
    testing.assert_almost_equal(analyze.cumulative_turnover(alloc_df, asset_wt_df), 
                                stat_calcs.loc['cumulative_turnover', 'S&P 500']
    )

def test_mctr(test_file):
    mctr_prices = test_file.parse('mctr', index_col = 0)
    mctr_manual = test_file.parse('mctr_results', index_col = 0)
    cols = ['BSV','VBK','VBR','VOE','VOT']
    mctr = analyze.mctr(mctr_prices[cols], mctr_prices['Portfolio'])
    testing.assert_series_equal(mctr, mctr_manual.loc['mctr', cols])

def test_risk_contribution(test_file):
    mctr_prices = test_file.parse('mctr', index_col = 0)
    mctr_manual = test_file.parse('mctr_results', index_col = 0)
    cols = ['BSV','VBK','VBR','VOE','VOT']
    mctr = analyze.mctr(mctr_prices[cols], mctr_prices['Portfolio'])
    weights = pandas.Series( [.2, .2, .2, .2, .2], index = cols, name = 'risk_contribution')
    
    testing.assert_series_equal(analyze.risk_contribution(mctr, weights), 
                             mctr_manual.loc['risk_contribution', :]
    )

def test_risk_contribution_as_proportion(test_file):
    mctr_prices = test_file.parse('mctr', index_col = 0)
    mctr_manual = test_file.parse('mctr_results', index_col = 0)
    cols = ['BSV','VBK','VBR','VOE','VOT']
    mctr = analyze.mctr(mctr_prices[cols], mctr_prices['Portfolio'])
    weights = pandas.Series( [.2, .2, .2, .2, .2], index = cols, name = 'risk_contribution')
    
    testing.assert_series_equal(
        analyze.risk_contribution_as_proportion(mctr, weights),
        mctr_manual.loc['risk_contribution_as_proportion']
    )

def test_alpha(prices, stat_calcs):
    man_alpha = stat_calcs.loc['alpha', 'VGTSX']

    testing.assert_almost_equal(man_alpha, analyze.alpha(series = prices['VGTSX'],
                                                         benchmark = prices['S&P 500'])
    )

def test_annualized_return(prices, stat_calcs):
    man_ar = stat_calcs.loc['annualized_return', 'VGTSX']
    
    testing.assert_almost_equal(
        man_ar, analyze.annualized_return(series = prices['VGTSX'], freq = 'daily')
    )

def test_annualized_vol(prices, stat_calcs):
    man_ar = stat_calcs.loc['annualized_vol', 'VGTSX']
    
    testing.assert_almost_equal(
        man_ar, analyze.annualized_vol(series = prices['VGTSX'], freq = 'daily')
    )

def test_appraisal_ratio(prices, stat_calcs):
    man_ar = stat_calcs.loc['appraisal_ratio', 'VGTSX']

    testing.assert_almost_equal(man_ar, analyze.appraisal_ratio(
                                series = prices['VGTSX'],
                                benchmark = prices['S&P 500'],
                                freq = 'daily',
                                rfr = 0.0)
    )

def test_beta(prices, stat_calcs):
    man_beta = stat_calcs.loc['beta', 'VGTSX']

    testing.assert_almost_equal(man_beta, analyze.beta(series = prices['VGTSX'],
                                                       benchmark = prices['S&P 500'])
    )

def test_cvar_cf(prices, stat_calcs):
    man_cvar_cf = stat_calcs.loc['cvar_cf', 'VGTSX']

    testing.assert_almost_equal(
        man_cvar_cf, analyze.cvar_cf(series = prices['VGTSX'], p = 0.01)
    )

def test_cvar_norm(prices, stat_calcs):
    man_cvar_norm = stat_calcs.loc['cvar_norm', 'VGTSX']

    testing.assert_almost_equal(
        man_cvar_norm, analyze.cvar_norm(series = prices['VGTSX'], p = 0.01)
    )

def test_downcapture(prices, stat_calcs):
    man_dc = stat_calcs.loc['downcapture', 'VGTSX']

    testing.assert_almost_equal(
        man_dc, analyze.downcapture(series = prices['VGTSX'], 
                                    benchmark = prices['S&P 500'])
    )

def test_downside_deviation(prices, stat_calcs):
    man_dd = stat_calcs.loc['downside_deviation', 'VGTSX']

    testing.assert_almost_equal(
        man_dd, analyze.downside_deviation(series = prices['VGTSX'])
    )

def test_geometric_difference():
    a, b = 1. , 1.
    assert analyze.geometric_difference(a, b) == 0.
    a, b = pandas.Series({'a': 1.}), pandas.Series({'a': 1.})
    assert analyze.geometric_difference(a, b).values == 0.

def test_idiosyncratic_as_proportion(prices, stat_calcs):
    man_iap = stat_calcs.loc['idiosyncratic_as_proportion', 'VGTSX']

    testing.assert_almost_equal(
        man_iap, analyze.idiosyncratic_as_proportion(
            series = prices['VGTSX'], benchmark = prices['S&P 500'])
    )

def test_idiosyncratic_risk(prices, stat_calcs):
    man_ir = stat_calcs.loc['idiosyncratic_risk', 'VGTSX']

    testing.assert_almost_equal(
        man_ir, analyze.idiosyncratic_risk(
            series = prices['VGTSX'], benchmark = prices['S&P 500'])
    )

def test_information_ratio(prices, stat_calcs):
    man_ir = stat_calcs.loc['information_ratio', 'VGTSX']

    testing.assert_almost_equal(man_ir, analyze.information_ratio(
                                series = prices['VGTSX'],
                                benchmark = prices['S&P 500'],
                                freq = 'daily')
    )

def test_jensens_alpha(prices, stat_calcs):
    man_ja = stat_calcs.loc['jensens_alpha', 'VGTSX']

    testing.assert_almost_equal(
        man_ja, analyze.jensens_alpha(
            series = prices['VGTSX'], benchmark = prices['S&P 500'])
    )

def test_max_drawdown(prices, stat_calcs):    
    man_md = stat_calcs.loc['max_drawdown', 'VGTSX']

    testing.assert_almost_equal(
        man_md, analyze.max_drawdown(series = prices['VGTSX'])
    )

def test_mean_absolute_tracking_error(prices, stat_calcs):    
    man_mate = stat_calcs.loc['mean_absolute_tracking_error', 'VGTSX']

    testing.assert_almost_equal(
        man_mate, analyze.mean_absolute_tracking_error(
            series = prices['VGTSX'], benchmark = prices['S&P 500'])
    )

def test_median_downcapture(prices, stat_calcs):
    man_md = stat_calcs.loc['median_downcapture', 'VGTSX']

    testing.assert_almost_equal(
        man_md, analyze.median_downcapture(
            series = prices['VGTSX'], benchmark = prices['S&P 500'])
    )

def test_median_upcapture(prices, stat_calcs):
    man_uc = stat_calcs.loc['median_upcapture', 'VGTSX']

    testing.assert_almost_equal(
        man_uc, analyze.median_upcapture(
            series = prices['VGTSX'], benchmark = prices['S&P 500'])
    )

def test_risk_adjusted_excess_return(prices, stat_calcs):
    man_raer = stat_calcs.loc['risk_adjusted_excess_return', 'VGTSX']

    testing.assert_almost_equal(
        man_raer, analyze.risk_adjusted_excess_return(
            series = prices['VGTSX'], benchmark = prices['S&P 500'],
            rfr = 0.0, freq = 'daily')
    )

def test_adj_sharpe_ratio(prices, stat_calcs):
    man_asr = stat_calcs.loc['adj_sharpe_ratio', 'VGTSX']

    testing.assert_almost_equal(
        man_asr, analyze.adj_sharpe_ratio(
            series = prices['VGTSX'], 
            rfr = 0.0, 
            freq = 'daily')
    )

def test_sharpe_ratio(prices, stat_calcs):
    man_sr = stat_calcs.loc['sharpe_ratio', 'VGTSX']

    testing.assert_almost_equal(man_sr, analyze.sharpe_ratio(
            series = prices['VGTSX'], 
            rfr = 0.0, 
            freq = 'daily')
    )

def test_sortino_ratio(prices, stat_calcs):
    man_sr = stat_calcs.loc['sortino_ratio', 'VGTSX']

    testing.assert_almost_equal(man_sr, analyze.sortino_ratio(
            series = prices['VGTSX'], 
            rfr = 0.0, 
            freq = 'daily')
    )

def test_systematic_as_proportion(prices, stat_calcs):
    man_sap = stat_calcs.loc['systematic_as_proportion', 'VGTSX']

    testing.assert_almost_equal(
        man_sap, analyze.systematic_as_proportion(
            series = prices['VGTSX'], benchmark = prices['S&P 500'])
    )

def test_systematic_risk(prices, stat_calcs):
    man_sr = stat_calcs.loc['systematic_risk', 'VGTSX']

    testing.assert_almost_equal(
        man_sr, analyze.systematic_risk(
            series = prices['VGTSX'], benchmark = prices['S&P 500'])
    )

def test_tracking_error(prices, stat_calcs):
    man_te = stat_calcs.loc['tracking_error', 'VGTSX']

    testing.assert_almost_equal(
        man_te, analyze.tracking_error(
            series = prices['VGTSX'], benchmark = prices['S&P 500'])
    )

def test_ulcer_index(prices, stat_calcs):
    man_ui = stat_calcs.loc['ulcer_index', 'VGTSX']

    testing.assert_almost_equal(
        man_ui, analyze.ulcer_index(series = prices['VGTSX'])
    )

def test_upcapture(prices, stat_calcs):
    man_uc = stat_calcs.loc['upcapture', 'VGTSX']

    testing.assert_almost_equal(
        man_uc, analyze.upcapture(
            series = prices['VGTSX'], benchmark = prices['S&P 500'])
    )

def test_upside_deviation(prices, stat_calcs):
    man_ud = stat_calcs.loc['upside_deviation', 'VGTSX']

    testing.assert_almost_equal(
        man_ud, analyze.upside_deviation(
            series = prices['VGTSX'], 
            freq = 'daily')
    )


================================================
FILE: test_module/test_construct_portfolio.py
================================================

#!/usr/bin/env python
# encoding: utf-8

"""
.. module:: visualize_wealth.test_module.test_construct_portfolio.py

.. moduleauthor:: Benjamin M. Gross <benjaminMgross@gmail.com>

"""
import os
import pytest
import numpy
import pandas
import tempfile
import datetime
from pandas.util import testing
from visualize_wealth import construct_portfolio as cp

@pytest.fixture
def test_file():
    f = './test_data/panel from weight file test.xlsx'
    return pandas.ExcelFile(f)

@pytest.fixture
def tc_file():
    f = './test_data/transaction-costs.xlsx'
    return pandas.ExcelFile(f)

@pytest.fixture
def rebal_weights(test_file):
    return test_file.parse('rebal_weights', index_col = 0)

@pytest.fixture
def panel(test_file, rebal_weights):
    tickers = ['EEM', 'EFA', 'IYR', 'IWV', 'IEF', 'IYR', 'SHY']
    d = {}
    for ticker in tickers:
        d[ticker] = test_file.parse(ticker, index_col = 0)

    return cp.panel_from_weight_file(rebal_weights, 
                                  pandas.Panel(d), 
                                  1000.
    )

@pytest.fixture
def manual_index(panel, test_file):
    man_calc = test_file.parse('index_result',
                            index_col = 0
    )
    return man_calc

@pytest.fixture
def manual_tc_bps(tc_file):
    man_tcosts = tc_file.parse('tc_bps', index_col = 0)
    man_tcosts = man_tcosts.fillna(0.0)
    return man_tcosts

@pytest.fixture
def manual_tc_cps(tc_file):
    man_tcosts = tc_file.parse('tc_cps', index_col = 0)
    man_tcosts = man_tcosts.fillna(0.0)
    return man_tcosts

@pytest.fixture
def manual_mngmt_fee(tc_file):
    return tc_file.parse('mgmt_fee', index_col = 0)

def test_mngmt_fee(panel, tc_file, manual_mngmt_fee):
    index = cp.pfp_from_weight_file(panel)
    
    vw_mfee = cp.mngmt_fee(price_series = index['Close'],
                           bps_cost = 100.,
                           frequency = 'daily'
    )
    
    testing.assert_series_equal(manual_mngmt_fee['daily_index'],
                                vw_mfee
    )

def test_pfp(panel, manual_index):
    #import ipdb; ipdb.set_trace()
    lib_calc = cp.pfp_from_weight_file(panel)

    # hack because names weren't matching up
    mn_series = manual_index['Close']
    lb_series = lib_calc['Close']
    mn_series.index.name = lb_series.index.name

    testing.assert_series_equal(mn_series, 
                                lb_series
    )
    return lib_calc

def test_tc_bps(rebal_weights, panel, manual_tc_bps):
    vw_tcosts = cp.tc_bps(weight_df = rebal_weights, 
                          share_panel = panel,
                          bps = 10.,
    )
    cols = ['EEM', 'EFA', 'IEF', 'IWV', 'IYR', 'SHY']
    testing.assert_frame_equal(manual_tc_bps[cols], vw_tcosts)

def test_net_bps(rebal_weights, panel, manual_tc_bps, manual_index):
    
    index = test_pfp(panel, manual_index)
    index = index['Close']

    vw_tcosts = cp.tc_bps(weight_df = rebal_weights, 
                          share_panel = panel,
                          bps = 10.,
    )

    net_tcs = cp.net_tcs(tc_df = vw_tcosts, 
                         price_index = index
    )

    testing.assert_series_equal(manual_tc_bps['adj_index'],
                                net_tcs
    )

def test_net_cps(rebal_weights, panel, manual_tc_cps, manual_index):
    index = test_pfp(panel, manual_index)
    index = index['Close']

    vw_tcosts = cp.tc_cps(weight_df = rebal_weights, 
                          share_panel = panel,
                          cps = 10.,
    )

    net_tcs = cp.net_tcs(tc_df = vw_tcosts, 
                         price_index = index
    )

    testing.assert_series_equal(manual_tc_cps['adj_index'],
                                net_tcs
    )

def test_tc_cps(rebal_weights, panel, manual_tc_cps):
    cols = ['EEM', 'EFA', 'IEF', 'IWV', 'IYR', 'SHY']
    vw_tcosts = cp.tc_cps(weight_df = rebal_weights, 
                          share_panel = panel,
                          cps = 10.,
    )

    testing.assert_frame_equal(manual_tc_cps[cols], vw_tcosts)

def test_funs():
    """
    >>> import pandas.util.testing as put
    >>> xl_file = pandas.ExcelFile('../tests/test_splits.xlsx')
    >>> blotter = xl_file.parse('blotter', index_col = 0)
    >>> cols = ['Close', 'Adj Close', 'Dividends']
    >>> price_df = xl_file.parse('calc_sheet', index_col = 0)
    >>> price_df = price_df[cols]
    >>> split_frame = calculate_splits(price_df)

    >>> shares_owned = blotter_and_price_df_to_cum_shares(blotter, 
    ...     split_frame)
    >>> test_vals = xl_file.parse(
    ...     'share_balance', index_col = 0)['cum_shares']
    >>> put.assert_almost_equal(shares_owned['cum_shares'].dropna(), 
    ...     test_vals)
    True
    >>> f = '../tests/panel from weight file test.xlsx'
    >>> xl_file = pandas.ExcelFile(f)
    >>> weight_df = xl_file.parse('rebal_weights', index_col = 0)
    >>> tickers = ['EEM', 'EFA', 'IYR', 'IWV', 'IEF', 'IYR', 'SHY']
    >>> d = {}
    >>> for ticker in tickers:
    ...     d[ticker] = xl_file.parse(ticker, index_col = 0)
    >>> panel = panel_from_weight_file(weight_df, pandas.Panel(d), 
    ...     1000.)
    >>> portfolio = pfp_from_weight_file(panel)
    >>> manual_calcs = xl_file.parse('index_result', index_col = 0)
    >>> put.assert_series_equal(manual_calcs['Close'], 
    ...     portfolio['Close'])
    """
    return None

================================================
FILE: test_module/test_utils.py
================================================
#!/usr/bin/env python
# encoding: utf-8

"""
.. module:: visualize_wealth.test_module.test_utils.py

.. moduleauthor:: Benjamin M. Gross <benjaminMgross@gmail.com>

"""
import os
import pytest
import numpy
import pandas
import tempfile
import datetime

from pandas.util.testing import (assert_frame_equal,
                                 assert_series_equal,
                                 assert_index_equal,
                                 assert_almost_equal
)

import visualize_wealth.utils as utils


@pytest.fixture
def populate_store():
    name = './test_data/tmp.h5'
    store = pandas.HDFStore(name, mode = 'w')

    #ten business days of data, starting two weeks before today
    delta = datetime.timedelta(14)
    today = datetime.date.today()
    index = pandas.DatetimeIndex(start = today - delta,
                                 freq = 'b',
                                 periods = 10
    )

    store.put('TICK', pandas.Series(numpy.ones(len(index), ),
                                    index = index,
                                    name = 'Close')
    )

    store.put('TOCK', pandas.Series(numpy.ones(len(index), ),
                                    index = index,
                                    name = 'Close')
    )
    store.close()
    return {'name': name, 'index': index}

@pytest.fixture
def populate_updated():
    name = './test_data/tmp.h5'
    store = pandas.HDFStore(name, mode = 'w')

    #two weeks of data before today, delete one week, then update
    delta = datetime.timedelta(14)
    today = datetime.date.today()
    index = pandas.DatetimeIndex(start = today - delta,
                                 freq = 'b',
                                 periods = 10
    )

    store.put('TICK', pandas.Series(numpy.ones(len(index), ),
                                    index = index,
                                    name = 'Close')
    )

    store.put('TOCK', pandas.Series(numpy.ones(len(index), ),
                                    index = index,
                                    name = 'Close')
    )

    #truncate the index for updating
    ind = index[:5]
    n = len(ind)

    #store the Master IND3X
    store.put('IND3X', pandas.Series(ind, 
                                     index = ind)
    )

    cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
    cash = pandas.DataFrame(numpy.ones([n, len(cols)]),
                            index = ind,
                            columns = cols
    )

    #store the CA5H
    store.put('CA5H', cash)
    store.close()
    return {'name': name, 'index': index}


def test_create_store_master_index(populate_store):
    index = populate_store['index']
    index = pandas.Series(index, index = index)

    utils.create_store_master_index(populate_store['name'])
    store = pandas.HDFStore(populate_store['name'], mode = 'r+')
    assert_series_equal(store.get('IND3X'), index)
    store.close()
    os.remove(populate_store['name'])


def test_union_store_indexes(populate_store):
    store = pandas.HDFStore(populate_store['name'], mode = 'r+')
    index = populate_store['index']
    union = utils.union_store_indexes(store)
    assert_index_equal(index, union)
    store.close()
    os.remove(populate_store['name'])


def test_create_store_cash(populate_store):
    index = populate_store['index']
    utils.create_store_cash(populate_store['name'])
    store = pandas.HDFStore(populate_store['name'], mode = 'r+')
    cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
    n = len(index)

    cash = pandas.DataFrame(numpy.ones([n, len(cols)]),
                            index = index,
                            columns = cols
    )

    assert_frame_equal(store.get('CA5H'), cash)
    store.close()
    os.remove(populate_store['name'])


def test_update_store_master_and_cash(populate_updated):
    index = populate_updated['index']
    index = pandas.Series(index, index = index)

    utils.update_store_master_index(populate_updated['name'])
    utils.update_store_cash(populate_updated['name'])

    store = pandas.HDFStore(populate_updated['name'], mode = 'r+')
    assert_series_equal(store.get('IND3X'), index)

    cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
    n = len(index)
    cash = pandas.DataFrame(numpy.ones([n, len(cols)]),
                            index = index,
                            columns = cols
    )

    assert_frame_equal(store.get('CA5H'), cash)
    store.close()
    os.remove(populate_updated['name'])


def test_rets_to_price():
    dts = ['1/1/2000', '1/2/2000', '1/3/2000']

    index = pandas.DatetimeIndex(
                pandas.Timestamp(dt) for dt in dts
    )

    series = pandas.Series([numpy.nan, 0., 0.], 
                           index = index
    )

    log = utils.rets_to_price(
            series, 
            ret_typ = 'log', 
            start_value = 100.
    )

    lin = utils.rets_to_price(
            series, 
            ret_typ = 'linear', 
            start_value = 100.
    )
    
    man = pandas.Series([100., 100., 100.], 
                        index = index
    )

    assert_series_equal(log, man)
    assert_series_equal(lin, man)

    df = pandas.DataFrame({'a': series, 'b': series})
    log = utils.rets_to_price(
            df, 
            ret_typ = 'log', 
            start_value = 100.
    )
    
    lin = utils.rets_to_price(
            df, 
            ret_typ = 'linear', 
            start_value = 100.
    )
    
    man = pandas.DataFrame({'a': man, 'b': man})

    assert_frame_equal(log, man)
    assert_frame_equal(lin, man)

    with pytest.raises(TypeError):
        utils.rets_to_price(pandas.Panel(), 
                            ret_typ = 'log', 
                            start_value = 100.
        )

#@pytest.mark.newtest
def test_strip_vals():
    l = [' TLT', ' HYY ', 'IEF ']
    strpd = utils.strip_vals(l)
    res = ['TLT', 'HYY', 'IEF']
    assert strpd == res

@pytest.mark.newtest
def test_zipped_time_chunks():
    pts = pandas.Timestamp

    index = pandas.DatetimeIndex(
                start = '06/01/2000',
                freq = 'D',
                periods = 100
    )
    res = [('06-01-2000', '06-30-2000'), 
           ('07-01-2000', '07-31-2000'), 
           ('08-01-2000', '08-31-2000')]

    mc = list(((pts(x), pts(y)) for x, y in res))
    lc = utils.zipped_time_chunks(
            index = index,
            interval = 'monthly',
            incl_T = False
    )
    assert mc == lc

    res = [('06-01-2000', '06-30-2000'), 
           ('07-01-2000', '07-31-2000'), 
           ('08-01-2000', '08-31-2000'),
           ('09-01-2000', '09-08-2000')]

    mc = list(((pts(x), pts(y)) for x, y in res))
    lc = utils.zipped_time_chunks(
            index = index,
            interval = 'monthly',
            incl_T = True
    )
    assert mc == lc

    res = [('06-01-2000', '06-30-2000')]
    mc = list(((pts(x), pts(y)) for x, y in res))
    lc = utils.zipped_time_chunks(
            index = index,
            interval = 'quarterly',
            incl_T = False
    )
    assert mc == lc

    res = [('06-01-2000', '06-30-2000'),
           ('07-01-2000', '09-08-2000')]
    mc = list(((pts(x), pts(y)) for x, y in res))
    lc = utils.zipped_time_chunks(
            index = index,
            interval = 'quarterly',
            incl_T = True
    )
    assert mc == lc

    mc = []
    lc = utils.zipped_time_chunks(
            index = index,
            interval = 'yearly',
            incl_T = False
    )
    assert mc == lc

    res = [('06-01-2000', '09-08-2000')]
    mc = list(((pts(x), pts(y)) for x, y in res))
    lc = utils.zipped_time_chunks(
            index = index,
            interval = 'yearly',
            incl_T = True
    )
    assert mc == lc

"""
def test_update_store_cash(populate_updated):
    index = populate_updated['index']

    utils.update_store_cash(populate_updated['name'])
    store = pandas.HDFStore(populate_updated['name'], mode = 'r+')
    cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
    n = len(index)
    cash = pandas.DataFrame(numpy.ones([n, len(cols)]),
                            index = index,
                            columns = cols
    )

    assert_frame_equal(store.get('CA5H'), cash)
    store.close()
    os.remove(populate_updated['name'])
"""

================================================
FILE: visualize_wealth/__init__.py
================================================
#!/usr/bin/env python
# encoding: utf-8
"""
.. module:: __init__.py
   :synopsis: initialization file for ``visualize_wealth``

.. moduleauthor:: Benjamin M. Gross <benjaminMgross@gmail.com>

"""
import visualize_wealth.construct_portfolio
import visualize_wealth.utils
import visualize_wealth.classify
import visualize_wealth.analyze


================================================
FILE: visualize_wealth/analyze.py
================================================
#!/usr/bin/env python
# encoding: utf-8
"""
.. module:: visualize_wealth.analyze.py

.. moduleauthor:: Benjamin M. Gross <benjaminMgross@gmail.com>

"""
import collections
import numpy
import pandas
import scipy.optimize
import scipy.stats
from .utils import zipped_time_chunks

def active_return(series, benchmark, freq = 'daily'):
    """
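    Active return is the geometric difference between the annualized 
    returns of the portfolio and the benchmark

    :ARGS:

        series: ``pandas.Series`` of prices of the portfolio

        benchmark: ``pandas.Series`` or ``pandas.DataFrame`` of prices 
        of the benchmark(s)

        freq: ``str`` of either ``daily, monthly, quarterly, or yearly``
        indicating the frequency of the data ``default=`` daily

    :RETURNS: 

        ``float`` of the active return, or a ``pandas.Series`` of active 
        returns (one per column) when ``benchmark`` is a 
        ``pandas.DataFrame``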

    .. note:: Compound Linear Returns

        Linear returns are not simply subtracted, but rather the 
        compound difference is taken such that

        .. math::

            r_a = \\frac{1 + r_p}{1 + r_b} - 1
    """
    def _active_return(series, benchmark, freq = freq):
        port_ret = annualized_return(series, freq = freq)
        bench_ret = annualized_return(benchmark, freq = freq)

        return geometric_difference(port_ret, bench_ret)

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _active_return(series, x, freq = freq))
    else:
        return _active_return(series, benchmark, freq = freq)

def active_returns(series, benchmark):
    """
    Active returns are defined as the period-by-period compound 
    difference between the linear returns of the portfolio and the 
    benchmark

    :ARGS:

        series: ``pandas.Series`` of prices of the portfolio

        benchmark: ``pandas.Series`` of prices of the benchmark

    :RETURNS: 

        ``pandas.Series`` of active returns

    .. note:: Compound Linear Returns

        Linear returns are not simply subtracted, but rather the 
        compound difference is taken such that

        .. math::

            r_a = \\frac{1 + r_p}{1 + r_b} - 1
    """
    def _active_returns(series, benchmark):
        return (1 + linear_returns(series)).div(
            1 + linear_returns(benchmark)) - 1 

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _active_returns(series, x))
    else:
        return _active_returns(series, benchmark)
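
# Illustrative sketch, not part of the original module: the compound
# difference from the notes above, written for two plain floats. The name
# ``_sketch_compound_difference`` is hypothetical and nothing else in this
# module calls it.
def _sketch_compound_difference(r_p, r_b):
    # r_a = (1 + r_p) / (1 + r_b) - 1
    return (1. + r_p) / (1. + r_b) - 1.

# e.g. a 10% portfolio return against a 5% benchmark return gives roughly
# 4.76%, slightly less than the 5% arithmetic difference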


def alpha(series, benchmark, freq = 'daily', rfr = 0.0):
    """
    Alpha is defined as the excess return of an asset over and above 
    its expected return, where the expected return is derived from the 
    asset's sensitivity to a given benchmark and the return of that 
    benchmark.

    series: :class:`pandas.Series` or `pandas.DataFrame` of asset prices

    benchmark: :class:`pandas.Series` or `pandas.DataFrame` of prices

    freq: :class:`string` either ['daily', 'monthly', 'quarterly', or 
    'yearly'] indicating the frequency of the data. Default, 'daily'

    rfr: :class:`float` of the risk free rate

    .. math::

        \\alpha \\triangleq (R_p - r_f) - \\beta \\cdot ( R_b - r_f ) 
        
        \\textrm{where},

            R_p &= \\textrm{Portfolio Annualized Return} \\\\
            R_b &= \\textrm{Benchmark Annualized Return} \\\\
            r_f &= \\textrm{Risk Free Rate} \\\\
            \\beta &= \\textrm{Portfolio Sensitivity to the Benchmark}
    """
    def _alpha(series, benchmark, freq = 'daily', rfr = rfr):
        R_p = annualized_return(series, freq = freq)
        R_b = annualized_return(benchmark, freq = freq)
        b = beta(series, benchmark)
        return  R_p - rfr - b * (R_b - rfr)

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _alpha(
            series, x, freq = freq, rfr = rfr))
    else:
        return _alpha(series, benchmark, freq = freq, rfr = rfr)


def annualized_return(series, freq = 'daily'):
    """
    Returns the annualized linear return of a series, i.e. the linear 
    compounding rate that would have been necessary, given the initial 
    investment, to arrive at the final value

    :ARGS:
    
        series: ``pandas.Series`` of prices
        
        freq: ``str`` of either ``daily, monthly, quarterly, or yearly``
        indicating the frequency of the data ``default=`` daily

    :RETURNS:
    
        ``float``: of the annualized linear return

    .. code:: python

        import visualize_wealth.analyze as vwa

        linear_return = vwa.annualized_return(price_series, 
            freq = 'monthly')
    
    """
    def _annualized_return(series, freq = 'daily'):

        fac = _interval_to_factor(freq)
        T = len(series) - 1.
        yr_frac = (series.index[-1] - series.index[0]).days / 365.
        if yr_frac > 1.:
            return numpy.exp(numpy.log(series[-1]/series[0]) * fac / T) - 1.
        else:
            return numpy.exp(numpy.log(series[-1]/series[0]) ) - 1.

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _annualized_return(x, freq = freq))
    else:
        return _annualized_return(series, freq = freq)
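
# Illustrative sketch, not part of the original module: the annualization
# arithmetic used above, for plain floats. ``_sketch_annualized_return`` and
# its ``periods_per_year`` argument are hypothetical; 252 is an assumed
# trading-day count, whereas the module takes its factor from
# ``_interval_to_factor``. Unlike the function above, this sketch always
# annualizes, even when the data spans less than one year.
def _sketch_annualized_return(start, end, n_periods, periods_per_year = 252.):
    import math
    # scale the total log return to one year, then convert back to linear
    return math.exp(math.log(end / start) * periods_per_year / n_periods) - 1.

# e.g. a price that doubles over 504 daily periods (two assumed years)
# annualizes to sqrt(2) - 1, or about 41.4%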

def annualized_vol(series, freq = 'daily'):
    """
    Returns the annualized volatility of the log changes of the price 
    series, by calculating the volatility of the series, and then 
    applying the square root of time rule

    :ARGS:
    
        series: ``pandas.Series`` of prices

        freq: ``str`` of either ``daily, monthly, quarterly, or yearly``    
        indicating the frequency of the data ``default=`` daily

    :RETURNS:
    
        float: of the annualized volatility

    .. note:: Applying the Square root of time rule


        .. math::

            \\sigma = \\sigma_t \\cdot \\sqrt{k},\\: \\textrm{where},

            k &= \\textrm{Factor of annualization} \\\\
            \\sigma_t &= \\textrm{volatility of period log returns}
        
    .. code::

        import visualize_wealth.analyze as vwa

        ann_vol = vwa.annualized_vol(price_series, 
            freq = 'monthly')
    """
    def _annualized_vol(series, freq = 'daily'):
        fac = _interval_to_factor(freq)
        series_rets = log_returns(series)
        return series_rets.std()*numpy.sqrt(fac)

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _annualized_vol(x, freq = freq))
    else:
        return _annualized_vol(series, freq = freq)
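
# Illustrative sketch, not part of the original module: the square root of
# time rule from the note above, for a plain list of prices. The name
# ``_sketch_annualized_vol`` and the default of 252 periods per year are
# assumptions made for illustration only.
def _sketch_annualized_vol(prices, periods_per_year = 252.):
    import math
    # period log returns
    rets = [math.log(b / a) for a, b in zip(prices[:-1], prices[1:])]
    mean = sum(rets) / len(rets)
    # sample variance (n - 1 in the denominator)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1.)
    # sigma = sigma_t * sqrt(k)
    return math.sqrt(var) * math.sqrt(periods_per_year)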

def appraisal_ratio(series, benchmark, freq = 'daily', rfr = 0.):
    """
    A measure of the risk-adjusted return of a financial security or 
    portfolio that is equal to the alpha divided by the standard error 
    (idiosyncratic risk) of the portfolio relative to the benchmark

    series: :class:`pandas.Series` or `pandas.DataFrame` of asset prices

    benchmark: :class:`pandas.Series` or `pandas.DataFrame` of prices

    freq: :class:`string` either ['daily', 'monthly', 'quarterly', or 
    'yearly'] indicating the frequency of the data. Default, 'daily'

    rfr: :class:`float` of the risk free rate

    .. math:: 

        \\textrm{AR} \\triangleq \\frac{\\alpha}{\\epsilon}

        \\textrm{where},

            \\alpha &= \\textrm{the risk adjusted excess return} \\\\
            \\epsilon &= \\textrm{standard error, or idiosyncratic risk}

    """

    def _appraisal_ratio(series, benchmark, freq = freq, rfr = rfr):
        a = alpha(series, benchmark, freq = freq, rfr = rfr)
        e = idiosyncratic_risk(series, benchmark, freq = freq)
        return a / e

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _appraisal_ratio(series, x, 
                                                          freq = freq,
                                                          rfr = rfr)
    )

    else:
        return _appraisal_ratio(series, benchmark, freq = freq, rfr = rfr)


def attribution_weights(series, factor_df):
    """
    Given a price series and explanatory factors factor_df, determine
    the weights of attribution to each factor or asset

    :ARGS:

        series: :class:`pandas.Series` of asset prices to explain given
        the factors or sub_classes in factor_df

        factor_df: :class:`pandas.DataFrame` of the prices of the
        factors or sub_classes to which the asset prices can be
        attributed

    :RETURNS:

        a :class:`pandas.Series` of asset factor weights (normalized
        to sum to one) which best explain the series.  The optimizer's
        result is returned as is, without checking whether it
        converged
    """
    def obj_fun(weights):
        tol = 1.e-5
        est = factor_df.apply(lambda x: numpy.multiply(weights, x),
                              axis = 1).sum(axis = 1)
        n = len(series)
        
        #when a variable is "excluded" reduce p for higher adj-r2
        p = len(weights[weights > tol])
        rsq = r2(series = series, benchmark = est)
        adj_rsq = 1 - (1 - rsq)*(n - 1)/(n - p - 1)
        return -1.*adj_rsq

    #linear returns
    series = linear_returns(series).dropna()

    #if isinstance(series, pandas.DataFrame) & len(series.columns == 1):
        #it's an n x 1 dataframe with a valid result
        #series = series[series.columns[0]]

    factor_df = linear_returns(factor_df).dropna()
    guess = numpy.random.rand(factor_df.shape[1])
    guess = pandas.Series(guess/guess.sum(), index = factor_df.columns)
    bounds = [(0., 1.) for i in numpy.arange(len(guess))]

    opt_fun = scipy.optimize.minimize(fun = obj_fun, 
                                      x0 = guess,
                                      bounds = bounds
    )
    opt_wts = pandas.Series(opt_fun.x, index = guess.index)
    opt_wts = opt_wts.div(opt_wts.sum())
    return opt_wts

def attribution_weights_by_interval(series, factor_df, interval):
    """
    Given a price series and explanatory factors factor_df, determine
    the weights of attribution to each factor or asset over differently 
    spaced time intervals

    :ARGS:

        series: :class:`pandas.Series` of asset prices to explain given
        the factors or sub_classes in factor_df

        factor_df: :class:`pandas.DataFrame` of the prices of the
        factors or sub_classes to which the asset prices can be
        attributed

        interval: ``str`` of the length of each time chunk, passed to 
        :func:`zipped_time_chunks`, e.g. ``monthly``, ``quarterly``, or 
        ``yearly``

    :RETURNS:

        a :class:`pandas.DataFrame` of asset factor weights (summing 
        to one), with one row per interval, indexed by the start date 
        of the interval

    """
    chunks = zipped_time_chunks(series.index, interval)
    wt_dict = {}
    for beg, fin in chunks:
        wt_dict[beg] = attribution_weights(series[beg: fin], 
                                           factor_df.loc[beg: fin, :]
        )

    return pandas.DataFrame(wt_dict).transpose()
    
def beta(series, benchmark):
    """
    Returns the sensitivity of one price series to a chosen benchmark

    :ARGS:

        series: ``pandas.Series`` of prices

        benchmark: :class:`Series` or :class:`DataFrame` of prices 
        of a benchmark to calculate the sensitivity against

    :RETURNS:

        float: the sensitivity of the series to the benchmark

    .. note:: Calculating Beta

        
        .. math::

           \\beta \\triangleq \\frac{\\sigma_{s, b}}{\\sigma^2_{b}},
           \\: \\textrm{where},

           \\sigma^2_{b} &= \\textrm{Variance of the Benchmark} \\\\
           \\sigma_{s, b} &= \\textrm{Covariance of the Series and Benchmark}
    
    """
    def _beta(series, benchmark):
        series_rets = log_returns(series)
        bench_rets = log_returns(benchmark)
        return numpy.divide(bench_rets.cov(series_rets), 
                            bench_rets.var())

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _beta(series, x))
    else:
        return _beta(series, benchmark)
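
# Illustrative sketch, not part of the original module: the beta formula
# from the note above (covariance over benchmark variance), for two plain
# lists of period returns rather than prices. ``_sketch_beta`` is a
# hypothetical name; the module's own version first converts prices to log
# returns with ``log_returns``.
def _sketch_beta(series_rets, bench_rets):
    n = float(len(bench_rets))
    s_mean = sum(series_rets) / n
    b_mean = sum(bench_rets) / n
    # sample covariance of series and benchmark
    cov = sum((s - s_mean) * (b - b_mean)
              for s, b in zip(series_rets, bench_rets)) / (n - 1.)
    # sample variance of the benchmark
    var = sum((b - b_mean) ** 2 for b in bench_rets) / (n - 1.)
    return cov / var

# a series whose returns are always twice the benchmark's has a beta of 2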

def beta_ew(series, benchmark, theta = 0.94):
    """
    Returns the exponentially weighted sensitivity of one return 
    series to a chosen benchmark

    :ARGS:

        series: :class:`Series` of prices

        benchmark: :class:`Series` of a benchmark to calculate the 
        sensitivity

        theta: :class:`float` of the exponential smoothing constant, 
        defaults to 0.94, MSCI Barra's ew constant

    :RETURNS:

        :class:`pandas.Series` of the exponentially weighted 
        sensitivity of the series to the benchmark

    """
    span = (1. + theta) / (1. - theta)
    series_rets = log_returns(series)
    bench_rets = log_returns(benchmark)
    
    cov = pandas.ewmcov(series_rets, 
                        bench_rets,
                        span = span,
                        min_periods = span
    )
    
    var = pandas.ewmvar(bench_rets, 
                        span = span, 
                        min_periods = span
    )
    
    return cov.div(var)


def consecutive(int_series):
    """
    Fast, array-logic (no for loops) method to determine the number of
    consecutive ones given a `pandas.Series` of integers. Derived from 
    `Stack Overflow
    <http://stackoverflow.com/questions/18196811/cumsum-reset-at-nan>`_

    :ARGS:

        int_series: :class:`pandas.Series` of integers as 0s or 1s

    :RETURNS:

        :class:`pandas.Series` of the running count of consecutive 
        ones. Note that ``int_series`` is modified in place
    """
    n = int_series == 0
    a = ~n
    c = a.cumsum()
    index = c[n].index
    d = pandas.Series(numpy.diff(numpy.hstack(( [0.], c[n] ))) , 
                      index =index)
    int_series[n] = -d
    return int_series.cumsum()
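
# Illustrative sketch, not part of the original module: a plain-loop
# reference for what the vectorized function above computes, namely a
# running count of consecutive ones that resets at every zero. The name
# ``_sketch_consecutive`` is hypothetical; for 0/1 input it is expected to
# agree with ``consecutive``, only slower and without touching its input.
def _sketch_consecutive(flags):
    out, run = [], 0
    for flag in flags:
        # extend the current run on a one, reset it on a zero
        run = run + 1 if flag else 0
        out.append(run)
    return out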

def consecutive_downtick_performance(series, n_ticks = 3):
    """
    Returns a two column :class:`pandas.DataFrame` with columns 
    ``['num_downticks', series.name]`` that shows the cumulative 
    performance (in log returns) and the ``num_downticks`` number of 
    days the downtick lasted

    :ARGS:

        series: :class:`pandas.Series` of asset prices

        n_ticks: ``int``, runs of at least this many consecutive 
        downticks are included, default 3

    :RETURNS:

        :class:`pandas.DataFrame` of ``['num_downticks', series.name]``.
        Performance (in the column named after the series) is in log 
        returns and ``num_downticks`` is the number of consecutive 
        downticks for which the performance was generated
    """
    def _consecutive_downtick_performance(series, n_ticks):
        dnticks = consecutive_downticks(series, n_ticks = n_ticks)
        series_dn = series[dnticks.index]
        st, fin = dnticks == 0, (dnticks == 0).shift(-1).fillna(True)
        n_per = dnticks[fin]
        series_rets = numpy.log(numpy.divide(series_dn[fin], 
                                             series_dn[st]))

        return pandas.DataFrame({'num_downticks':n_per,
                                 series.name: series_rets})
    
    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _consecutive_downtick_performance(
            x, n_ticks = n_ticks))
    else:
        return _consecutive_downtick_performance(series = series,
            n_ticks = n_ticks)
    
def consecutive_downtick_relative_performance(series, benchmark, n_ticks = 3):
    """
    Returns a :class:`pandas.DataFrame` with columns 
    ``[benchmark.name, series.name, 'outperformance', 'num_downticks']`` 
    that shows the cumulative outperformance (in log returns) and the 
    ``num_downticks`` number of days the downtick lasted. Downticks 
    are identified on the ``benchmark``

    :ARGS:

        series: :class:`pandas.Series` of asset prices

        benchmark: :class:`pandas.Series` of prices to compare 
        ``series`` against

        n_ticks: ``int``, runs of at least this many consecutive 
        downticks are included, default 3

    :RETURNS:

        :class:`pandas.DataFrame` of ``[benchmark.name, series.name, 
        'outperformance', 'num_downticks']``. Outperformance is in log 
        returns and ``num_downticks`` is the number of consecutive 
        downticks for which the outperformance was generated
    """
    def _consecutive_downtick_relative_performance(series, benchmark, n_ticks):
        dnticks = consecutive_downticks(benchmark, n_ticks = n_ticks)
        series_dn = series[dnticks.index]
        bench_dn = benchmark[dnticks.index]
        st, fin = dnticks == 0, (dnticks == 0).shift(-1).fillna(True)
        n_per = dnticks[fin]
        series_rets = numpy.log(numpy.divide(series_dn[fin], 
                                             series_dn[st]))
        bench_rets = numpy.log(numpy.divide(bench_dn[fin], bench_dn[st]))
        return pandas.DataFrame({'outperformance':series_rets.subtract(
            bench_rets), 'num_downticks':n_per, series.name: series_rets,
            benchmark.name: bench_rets}, columns = [benchmark.name,
            series.name, 'outperformance', 'num_downticks'] )
    
    if isinstance(benchmark, pandas.DataFrame):
        return map(lambda x: _consecutive_downtick_relative_performance(
               series = series, benchmark = benchmark[x],n_ticks = n_ticks),
               benchmark.columns)
    else:
        return _consecutive_downtick_relative_performance(series = series,
               benchmark = benchmark, n_ticks = n_ticks)

def consecutive_downticks(series, n_ticks = 3):
    """
    Using :func:`consecutive`, returns a :class:`pandas.Series` 
    of the consecutive downticks in the series, for runs of at least 
    ``n_ticks`` downticks

    :ARGS:

        series: :class:`pandas.Series` of the asset prices

    :RETURNS:

        :class:`pandas.Series` of the consecutive downticks of the series
    """
    w = consecutive( (series < series.shift(1)).astype(int) )
    agg_ind = w[w > n_ticks - 1].index.union_many(
              map(lambda x: w[w.shift(-x) == n_ticks].index,
              numpy.arange(n_ticks + 1) ))

    return w[agg_ind]

def consecutive_uptick_relative_performance(series, benchmark, n_ticks = 3):
    """
    Returns a :class:`pandas.DataFrame` with columns 
    ``[benchmark.name, series.name, 'outperformance', 'num_upticks']`` 
    that shows the cumulative outperformance (in log returns) and the 
    ``num_upticks`` number of days the uptick lasted. Upticks are 
    identified on the ``benchmark``

    :ARGS:

        series: :class:`pandas.Series` of asset prices

        benchmark: :class:`pandas.Series` of prices to compare 
        ``series`` against

        n_ticks: ``int``, runs of at least this many consecutive 
        upticks are included, default 3

    :RETURNS:

        :class:`pandas.DataFrame` of ``[benchmark.name, series.name, 
        'outperformance', 'num_upticks']``. Outperformance is in log 
        returns and ``num_upticks`` is the number of consecutive 
        upticks for which the outperformance was generated
    """
    def _consecutive_uptick_relative_performance(series, benchmark, n_ticks):
        upticks = consecutive_upticks(benchmark, n_ticks = n_ticks)
        series_up  = series[upticks.index]
        bench_up = benchmark[upticks.index]
        st, fin = upticks == 0, (upticks == 0).shift(-1).fillna(True)
        n_per = upticks[fin]
        series_rets = numpy.log(numpy.divide(series_up[fin], 
                                             series_up[st]))
        bench_rets = numpy.log(numpy.divide(bench_up[fin], bench_up[st]))
        return pandas.DataFrame({'outperformance':series_rets.subtract(
            bench_rets), 'num_upticks':n_per, series.name: series_rets,
            benchmark.name: bench_rets}, columns = [benchmark.name,
            series.name, 'outperformance', 'num_upticks'] )

    if isinstance(benchmark, pandas.DataFrame):
        return map(lambda x: _consecutive_uptick_relative_performance(
               series = series, benchmark = benchmark[x], n_ticks = n_ticks),
               benchmark.columns)
    else:
        return _consecutive_uptick_relative_performance(
               series = series, benchmark = benchmark, n_ticks = n_ticks)

def consecutive_uptick_performance(series, n_ticks = 3):
    """
    Returns a two column :class:`pandas.DataFrame` with columns 
    ``['num_upticks', series.name]`` that shows the cumulative 
    performance (in log returns) and the ``num_upticks`` number of 
    days the uptick lasted

    :ARGS:

        series: :class:`pandas.Series` of asset prices

        n_ticks: ``int``, runs of at least this many consecutive 
        upticks are included, default 3

    :RETURNS:

        :class:`pandas.DataFrame` of ``['num_upticks', series.name]``.
        Performance (in the column named after the series) is in log 
        returns and ``num_upticks`` is the number of consecutive 
        upticks for which the performance was generated
    """
    def _consecutive_uptick_performance(series, n_ticks):
        upticks = consecutive_upticks(series, n_ticks = n_ticks)
        series_up  = series[upticks.index]
        st, fin = upticks == 0, (upticks == 0).shift(-1).fillna(True)
        n_per = upticks[fin]
        series_rets = numpy.log(numpy.divide(series_up[fin], 
                                             series_up[st]))
        return pandas.DataFrame({'num_upticks':n_per,
            series.name: series_rets} )

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _consecutive_uptick_performance(x,
            n_ticks = n_ticks))

    else:
        return _consecutive_uptick_performance(
               series = series, n_ticks = n_ticks)

def consecutive_upticks(series, n_ticks = 3):
    """
    Using :func:`consecutive`, returns a :class:`pandas.Series` of the
    running uptick counts for every run in ``series`` that reaches at
    least ``n_ticks`` consecutive upticks

    :ARGS:

        series: :class:`pandas.Series` of the asset prices

        n_ticks: :class:`int` of the minimum run length, defaults to 3

    :RETURNS:

        :class:`pandas.Series` of the consecutive upticks of the series
    """
    w = consecutive( (series > series.shift(1)).astype(int) )
    agg_ind = w[w > n_ticks - 1].index.union_many(
              map(lambda x: w[w.shift(-x) == n_ticks].index,
              numpy.arange(n_ticks + 1) ))

    return w[agg_ind]

def cumulative_turnover(alloc_df, asset_wt_df):
    """
    Provided an allocation frame (i.e. the weights to which the portfolio
    was rebalanced), and the historical asset weights, return the
    cumulative turnover, where turnover is defined below.  The first
    period of ``alloc_df`` is excluded, as that represents the initial
    investment

    :ARGS:

        alloc_df: :class:`pandas.DataFrame` of the weighting allocation
        that was provided to construct the portfolio

        asset_wt_df: :class:`pandas.DataFrame` of the actual historical
        weights of each asset

    :RETURNS:

        :class:`float` of the cumulative turnover

    .. note:: Calculating Turnover

        Let :math:`\\tau_j =` single period turnover for period
        :math:`j`, and assets :math:`i = 1, 2, \\ldots, n`, each whose
        portfolio weight in period :math:`j` is represented by
        :math:`\\omega_{i,j}`.

        Then the single period :math:`j` turnover for all assets
        :math:`1, \\ldots, n` can be calculated as:

        .. math::

            \\tau_j = \\frac{\\sum_{i=1}^n|\\omega_{i,j} -
            \\omega_{i,j+1}|}{2}

    """
    # the rebalance dates (after the initial investment) drive turnover
    ind = alloc_df.index[1:]
    try:
        return 0.5*asset_wt_df.loc[ind, :].sub(
            asset_wt_df.shift(-1).loc[ind, :]).abs().sum(axis = 1).sum()

    # alloc_df may contain rebalance dates earlier than the first date
    # in asset_wt_df, so trim it to the dates that have asset weights
    except KeyError:
        loc = alloc_df.index.searchsorted(asset_wt_df.index[0])
        tmp = alloc_df.iloc[loc:, :]
        ind = tmp.index[1:]
        return 0.5*asset_wt_df.loc[ind, :].sub(
            asset_wt_df.shift(-1).loc[ind, :]).abs().sum(axis = 1).sum()
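
# Illustrative sketch, not part of the original module: the single period
# turnover formula from ``cumulative_turnover`` restated in pure Python.
# The helper name ``_example_single_period_turnover`` and its list inputs
# are hypothetical; the library function works on DataFrames of weights.
def _example_single_period_turnover(wts_before, wts_after):
    """
    Half the sum of absolute weight changes across assets for one period
    """
    assert len(wts_before) == len(wts_after), "weights must align"
    return 0.5 * sum(abs(a - b) for a, b in zip(wts_before, wts_after))

# Hypothetical usage: rebalancing from 60/40 to 50/50 turns over roughly
# 10% of the portfolio
# _example_single_period_turnover([.6, .4], [.5, .5])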

def cvar_cf(series, p = .01):
    """
    CVaR (Expected Shortfall), using the `Cornish Fisher Approximation 
    <http://en.wikipedia.org/wiki/Cornish%E2%80%93Fisher_expansion>`_

    :ARGS:

        series: :class:`pandas.Series` or :class:`pandas.DataFrame` 
        of the asset prices

        p: :class:`float` of the desired percentile, defaults to .01 
        or the 1% CVaR

    :RETURNS:

        :class:`float` or :class:`pandas.Series` of the CVaR
    
    """
    def _cvar_cf(series, p):
        ppf = scipy.stats.norm.ppf
        pdf = scipy.stats.norm.pdf
        series_rets = log_returns(series)
        mu, sigma = series_rets.mean(), series_rets.std()
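        # pandas' kurtosis() already returns excess (Fisher) kurtosis,
        # so no further subtraction of 3 is needed
        skew, kurt = series_rets.skew(), series_rets.kurtosis()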
        
        f = lambda x, skew, kurt: x + skew/6.*(x**2 - 1) + kurt/24.* x * (
            x**2 - 3.) - skew**2/36. * x * (2. * x**2  - 5.)

        loss = f(x = 1/p*(pdf(ppf(p))), skew = skew, 
                 kurt = kurt) * sigma - mu
        return  numpy.exp(loss) - 1.

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _cvar_cf(x, p = p))
    else:
        return _cvar_cf(series, p = p)

def cvar_contrib(wts, prices, alpha = .10, n_sims = 100000):
    """
    Calculate each asset's contribution to CVaR based on its
    exponentially weighted volatility and correlation to other assets,
    and its current portfolio weight

    :ARGS:

        wts: :class:`Series` of current weights

        prices: :class:`DataFrame` of prices

        alpha: :class:`float` of the cvar parameter

        n_sims: :class:`int` the number of simulations

    :RETURNS:

        :class:`pandas.Series` of proportional contribution

    .. note:: alternative parameters

        currently the span (for exponentially weighted stats)
        and phi (for the degrees of freedom of the t-distribution)
        are not changeable for the function
    """
    
    def _spectral_fun(alpha, n_sims):
        
        th = numpy.ceil(alpha*n_sims) # threshold
        th = int(th) 
        spc = pandas.Series(
                  numpy.zeros([n_sims,])
        )
        
        spc[:th] = 1
        return spc/spc.sum()

    m, n = prices.shape

    rets = log_returns(prices)
    cov = pandas.ewmcov(rets, span = 21., min_periods = 21)
    zs = pandas.Series(numpy.zeros(n,), index = prices.columns)

    sims = mvt_rnd(mu = zs, 
                   covm = cov.iloc[-1, :, :],
                   phi = 3,
                   n_sim = n_sims
    )

    psi = sims.dot(wts)
    spec = _spectral_fun(alpha = alpha, 
                         n_sims = n_sims
    )

    srtd = psi.copy()
    ind = psi.argsort()
    srtd.sort()

    # pandas multiplies using indexes, so remove index 
    #cvar = srtd[ind].dot(spec.values)

    d = {}

    for asset in wts.index:
        d[asset] = sims.loc[ind, asset].dot(spec.values)

    acvar = pandas.Series(d)
    tmp = acvar.mul(wts)
    return tmp/tmp.sum()

def cvar_norm(series, p = .01):
    """
    CVaR (Conditional Value at Risk), fitting the normal distribution
    to the historical time series of log returns

    :ARGS:

        series: :class:`pandas.Series` or :class:`pandas.DataFrame` of 
        the asset prices
        
        p: :class:`float` of the desired percentile, defaults to .01 
        or the 1% CVaR

    :RETURNS:

        :class:`float` or :class:`pandas.Series` of the CVaR
    """
    def _cvar_norm(series, p):
        pdf = scipy.stats.norm.pdf
        series_rets = log_returns(series)
        mu, sigma = series_rets.mean(), series_rets.std()
        var = lambda alpha: scipy.stats.distributions.norm.ppf(1 - alpha)
        return numpy.exp(sigma/p * pdf(var(p)) - mu) - 1.

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _cvar_norm(x, p = p))
    else:
        return _cvar_norm(series, p = p)

def cvar_np(series, p = .01):
    """
    Non-parametric CVaR or Expected Shortfall, solely based on the 
    mean of historical values

    :ARGS:

        series: :class:`pandas.Series` or :class:`pandas.DataFrame` of 
        the asset prices

        p: :class:`float` of the desired percentile, defaults to .01 or 
        the 1% CVaR

    :RETURNS:

        :class:`float` or :class:`pandas.Series` of the CVaR
    
    """
    def _cvar_mu_np(series, p):
        # drop the leading NaN so it cannot distort the percentile
        series_rets = linear_returns(series).dropna()
        var = numpy.percentile(series_rets, p*100.)
        return  -series_rets[series_rets <= var].mean()

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _cvar_mu_np(x, p = p))
    else:
        return _cvar_mu_np(series, p = p)
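
# Illustrative sketch, not part of the original module: a pure-Python
# take on non-parametric CVaR.  It assumes the tail is the worst
# ceil(p * n) observations, which can differ slightly from the
# interpolated ``numpy.percentile`` cutoff used by ``cvar_np``.  The
# helper name ``_example_cvar_np`` is hypothetical.
def _example_cvar_np(rets, p = .01):
    """
    Negative mean of the worst ``p`` proportion of linear returns
    """
    import math
    srtd = sorted(rets)
    # always keep at least one observation in the tail
    k = max(1, int(math.ceil(p * len(srtd))))
    tail = srtd[:k]
    return -sum(tail) / float(len(tail))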

def downcapture(series, benchmark):
    """
    Returns the proportion of ``series``'s cumulative returns to
    ``benchmark``'s cumulative returns, over the periods in which the
    benchmark's returns were negative

    :ARGS:

        series: ``pandas.Series`` of prices

        benchmark: ``pandas.Series`` of prices to compare ``series``
        against

    :RETURNS:

        ``float`` of the downcapture of cumulative negative returns

    .. seealso:: :func:`median_downcapture`

    """
    def _downcapture(series, benchmark):
        series_rets = log_returns(series)
        bench_rets = log_returns(benchmark)
        index = bench_rets < 0.
        return series_rets[index].mean() / bench_rets[index].mean()

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _downcapture(series, x))
    else:
        return _downcapture(series, benchmark)

def downside_deviation(series, freq = 'daily'):
    """
    Returns the volatility of the returns that are less than zero

    :ARGS:

        series: ``pandas.Series`` of prices

        freq: ``str`` of frequency, either ``daily, monthly, quarterly, 
        or yearly``

    :RETURNS:

        float: of the downside standard deviation

    """
    def _downside_deviation(series, freq = 'daily'):
        fac = _interval_to_factor(freq)
        series_rets = log_returns(series)
        index = series_rets < 0.    
        return series_rets[index].std()*numpy.sqrt(fac)

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _downside_deviation(x, freq = freq))
    else:
        return _downside_deviation(series, freq = freq)

def drawdown(series):
    """
    Returns a :class:`pandas.Series` or :class:`pandas.DataFrame` 
    (same as input) of the drawdown, i.e. distance from rolling 
    cumulative maximum.  Values are negative specifically to be used
    in plots

    :ARGS:
    
        series: :class:`pandas.Series` or :class:`pandas.DataFrame` of
        prices

    :RETURNS:
    
        same type as input

    .. code::

        import visualize_wealth.analyze as vwa

        drawdown = vwa.drawdown(price_df)
    """

    def _drawdown(series):
        dd = (series/series.cummax() - 1.)
        dd[0] = numpy.nan
        return dd

    if isinstance(series, pandas.DataFrame):
        return series.apply(_drawdown)
    else:
        return _drawdown(series)

def ew_vol(series, theta = 0.94, freq = 'daily'):
    """
    Returns the exponentially weighted, annualized standard deviation

    :ARGS:

        series: :class:`Series` or :class:`DataFrame` of prices

        theta: :class:`float` coefficient of decay, default BARRA's
        value of .94, which equates to a span of roughly 32 days

        freq: :class:`string` of either ['daily', 'monthly', 'quarterly',
        'yearly']

    :RETURNS:

        same type as input, of the annualized exponentially weighted
        volatility
    """
    span = (1. + theta)/(1 - theta)

    log_rets = log_returns(series)

    fac = _interval_to_factor(freq)

    ew_vol = pandas.ewmstd(log_rets,
                           span = span,
                           min_periods = span
    )

    return ew_vol*numpy.sqrt(fac)
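
# Illustrative sketch, not part of the original module: how the decay
# coefficient ``theta`` maps to the ``span`` used in ``ew_vol``.  It
# assumes the pandas span convention alpha = 2 / (span + 1), under which
# the smoothing weight alpha works out to 1 - theta.  The helper name
# ``_example_theta_to_span`` is hypothetical.
def _example_theta_to_span(theta = 0.94):
    """
    Return the ``(span, alpha)`` pair implied by ``theta``
    """
    span = (1. + theta) / (1. - theta)
    alpha = 2. / (span + 1.)
    return span, alpha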
        

def geometric_difference(a, b):
    """
    Returns the geometric difference of returns, as defined by the
    formula at the end of this docstring

    :ARGS:

        a: :class:`pandas.Series` or :class:`float`

        b: :class:`pandas.Series` or :class:`float`

    :RETURNS:

        same class as inputs

    .. math::

        \\textrm{GD} = \\frac{(1 + a )}{(1 + b)}  - 1 \\\\
    """
    if isinstance(a, pandas.Series):
        msg = "index must be equal for pandas.Series"
        assert a.index.equals(b.index), msg
        return (1. + a).divide(1. + b) - 1.
    else:
        return (1. + a) / (1. + b) - 1.

def idiosyncratic_as_proportion(series, benchmark, freq = 'daily'):
    """
    Returns the idiosyncratic risk as a proportion of total variance

    :ARGS:

        series: ``pandas.Series`` of prices

        benchmark: ``pandas.Series`` to compare ``series`` against

        freq: ``str`` of frequency, either ``daily, monthly, quarterly,
        or yearly``

    :RETURNS:

        ``float`` between (0, 1) representing the proportion of total
        variance attributable to idiosyncratic risk
        
    """
    def _idiosyncratic_as_proportion(series, benchmark, freq = 'daily'):
        fac = _interval_to_factor(freq)
        series_rets = log_returns(series)
        return idiosyncratic_risk(series, benchmark, freq)**2 / (
            annualized_vol(series, freq)**2)

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _idiosyncratic_as_proportion(
            series, x, freq))
    else:
        return _idiosyncratic_as_proportion(series, benchmark, freq)

def idiosyncratic_risk(series, benchmark, freq = 'daily'):
    """
    Returns the idiosyncratic risk, i.e. unexplained variation between 
    a price series and a chosen benchmark 

    :ARGS:

       series: ``pandas.Series`` of prices

       benchmark: ``pandas.Series`` to compare ``series`` against

       freq: ``str`` of frequency, either ``daily, monthly, quarterly, 
       or yearly``

    :RETURNS:

        float: the idiosyncratic volatility (not variance)


    .. note:: Additivity of an asset's Variance

        An asset's variance can be broken down into systematic risk, 
        i.e. that proportion of risk that can be attributed to some 
        benchmark or risk factor and idiosyncratic risk, or the 
        unexplained variation between the series and the chosen 
        benchmark / factor.  

        Therefore, using the additivity of variances, we can calculate 
        idiosyncratic risk as follows:

        .. math::

            \\sigma^2_{\\textrm{total}} &= \\sigma^2_{\\beta} +
            \\sigma^2_{\\epsilon} + \\sigma^2_{\\epsilon, \\beta},
            \\: \\textrm{where} \\\\

            \\sigma^2_{\\beta} &= \\textrm{variance attributable to
            systematic risk} \\\\
            \\sigma^2_{\\epsilon} &= \\textrm{idiosyncratic risk} \\\\
            \\sigma^2_{\\epsilon, \\beta} &= \\textrm{covariance
            between idiosyncratic and systematic risk, which by
            definition} = 0 \\\\

            \\Rightarrow \\sigma_{\\epsilon} &=
            \\sqrt{\\sigma^2_{\\textrm{total}} - \\sigma^2_{\\beta}}

    """
    def _idiosyncratic_risk(series, benchmark, freq = 'daily'):
        fac = _interval_to_factor(freq)
        series_rets =log_returns(series)
        bench_rets = log_returns(benchmark)
        series_vol = annualized_vol(series, freq)
        benchmark_vol = annualized_vol(benchmark, freq)
        return numpy.sqrt(series_vol**2 - beta(series, benchmark)**2 * (
            benchmark_vol ** 2))

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _idiosyncratic_risk(
            series, x, freq = freq))
    else:
        return _idiosyncratic_risk(series, benchmark, freq)

def information_ratio(series, benchmark, freq = 'daily'):
    """
    A measure of the risk-adjusted return of a financial security or
    portfolio, equal to the active return divided by the tracking error
    between the portfolio and the benchmark.  Mean Absolute Tracking
    Error ("MATE") is used here in place of tracking error, see
    :func:`mean_absolute_tracking_error` for the benefits of MATE over
    TE

    :ARGS:

        series: :class:`pandas.Series` of asset prices

        benchmark: :class:`pandas.Series` or :class:`pandas.DataFrame`
        of prices

        freq: :class:`string` either ['daily', 'monthly', 'quarterly',
        or 'yearly'] indicating the frequency of the data. Default,
        'daily'

    :RETURNS:

        :class:`float` or :class:`pandas.Series` of the information
        ratio

    .. note:: Calculating Information Ratio

        .. math::

            \\textrm{IR} \\triangleq \\frac{\\alpha}{\\omega}

        where,

        .. math::

            \\alpha &= \\textrm{active return} \\\\
            \\omega &= \\textrm{tracking error (MATE)}

    """
    def _information_ratio(series, benchmark, freq = freq):
        ar = active_return(series, benchmark, freq = freq)
        mate = mean_absolute_tracking_error(series, benchmark, freq = freq)
        return ar / mate

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _information_ratio(series, x, freq = freq))
    else:
        return _information_ratio(series, benchmark, freq = freq)

def jensens_alpha(series, benchmark, rfr = 0., freq = 'daily'):
    """
    Returns the `Jensen's Alpha 
    <http://en.wikipedia.org/wiki/Jensen's_alpha>`_ or the excess 
    return based on the systematic risk of the ``series`` relative to
    the ``benchmark``

    :ARGS:

        series: ``pandas.Series`` of prices 
        
        benchmark: ``pandas.Series`` of the prices to compare 
        ``series`` against

        rfr: ``float`` of the risk free rate

        freq: ``str`` of frequency, either daily, monthly, quarterly, 
        or yearly

    :RETURNS:

        ``float`` representing the Jensen's Alpha

    .. note:: Calculating Jensen's Alpha

        .. math::

            \\alpha_{\\textrm{Jensen}} = r_p - \\left[r_f + \\beta
            \\cdot (r_b - r_f)\\right]

        Where,

        .. math::

            r_p &= \\textrm{annualized linear return of the portfolio}
            \\\\
            r_f &= \\textrm{risk free rate} \\\\
            \\beta &= \\frac{\\sigma_{s, b}}{\\sigma^2_{b}} \\\\
            r_b &= \\textrm{annualized linear return of the benchmark}

    """
    def _jensens_alpha(series, benchmark, rfr = 0., freq = 'daily'):
        fac = _interval_to_factor(freq)
        series_ret = annualized_return(series, freq)
        bench_ret = annualized_return(benchmark, freq)
        return series_ret - (rfr + beta(series, benchmark)*(
            bench_ret - rfr))

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _jensens_alpha(
            series, x, rfr = rfr, freq = freq))
    else:
        return _jensens_alpha(series, benchmark, rfr = rfr, freq = freq)
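
# Illustrative sketch, not part of the original module: the arithmetic of
# Jensen's Alpha on already annualized inputs.  The helper name
# ``_example_jensens_alpha`` and its scalar inputs are hypothetical;
# ``jensens_alpha`` itself derives the returns and beta from prices.
def _example_jensens_alpha(port_ret, bench_ret, beta, rfr = 0.):
    """
    Portfolio return in excess of the return predicted by its beta
    """
    return port_ret - (rfr + beta * (bench_ret - rfr))

# Hypothetical usage: a 12% portfolio return against a 10% benchmark
# return, with a beta of 1.1 and a 2% risk free rate
# _example_jensens_alpha(.12, .10, 1.1, rfr = .02)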

def linear_returns(series):
    """
    Returns a series of linear returns given a series of prices
    
    :ARGS:

        series: ``pandas.Series`` of prices

    :RETURNS:

        series: ``pandas.Series`` of linear returns

    .. note:: Calculating Linear Returns

        .. math::

            R_t = \\frac{P_t}{P_{t-1}} - 1
             
    """
    def _linear_returns(series):
        return series.div(series.shift(1)) - 1
    if isinstance(series, pandas.DataFrame):
        return series.apply(_linear_returns)
    else:
        return _linear_returns(series)    

def log_returns(series):
    """
    Returns a series of log returns given a series of prices

    :ARGS:

        series: ``pandas.Series`` of prices

    :RETURNS:

        series: ``pandas.Series`` of log returns 

    .. note:: Calculating Log Returns

        .. math::

            R_t = \\log(\\frac{P_t}{P_{t-1}})
         
    """
    def _log_returns(series):
        return series.apply(numpy.log).diff()
    if isinstance(series, pandas.DataFrame):
        return series.apply(_log_returns)
    else:
        return _log_returns(series)
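
# Illustrative sketch, not part of the original module: linear and log
# returns computed in pure Python, showing that log returns sum to the
# log of the total price relative.  The helper name
# ``_example_linear_and_log_returns`` is hypothetical.
def _example_linear_and_log_returns(prices):
    """
    Return ``(linear, logs)`` lists of period returns for ``prices``
    """
    import math
    pairs = zip(prices[:-1], prices[1:])
    linear = [p1 / float(p0) - 1. for p0, p1 in pairs]
    logs = [math.log(1. + r) for r in linear]
    return linear, logs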

def log_returns_chol_adj(frame, theta = 0.94):
    """
    Create volatility adjusted historical returns that preserve the
    covariance structure while providing scaled-appropriate returns
    to calculate tail-risk measures

    :ARGS:

        frame: :class:`DataFrame` of prices

        theta: :class:`float` of the decay parameter to use for the
        exponential smoothing

    :RETURNS:

        :class:`DataFrame` of vol adjusted log returns preserving the
        covariance structure

    .. note:: Calculation explanation

        The calculation comes from Duffie & Pan (1997), where the
        Cholesky matrix of the covariance matrix is used in place of
        the square root of the variance in the volatility adjustment,
        as described in `Value at Risk Models <http://goo.gl/BZHsjR>`_
        by Carol Alexander

        Where,

        .. math:: 

            \\tilde{\\mathbb{x}_t} = \\mathbb{Q}_T\\mathbb{Q}^{-1}_t
            \\mathbb{x}_t, \\; t = 1, 2, ..., T \\\\ \\\\
        
        Where

        .. math:: 

            \\tilde{\\mathbb{x}_t} &= \\textrm{ the stock returns } 
            \\textrm{adjusted to have constant covariance } \\\\
            \\mathbb{Q}_t &=
            \\textrm{ the Cholesky matrix of the covariance matrix } \\\\ 
            \\mathbb{x}_t &= \\textrm{ the unadjusted stock returns}
    """
    #define the truncated functions
    dot = numpy.dot
    inv = numpy.linalg.inv
    chol = numpy.linalg.cholesky

    span = (1. + theta)/(1 - theta)

    log_rets = log_returns(frame)

    ew_cov = pandas.ewmcov(log_rets, 
                           span = span,
                           min_periods = span
    )
    q_T = chol(ew_cov.iloc[-1, :, :])
    d = {}
    for row in ew_cov.dropna().items:
        q_t = chol(ew_cov.loc[row, :, :])
        d[row] = pandas.Series(dot(log_rets.xs(row),
                               dot(q_T, inv(q_t))),
                               index = log_rets.columns
        )

    new_logs = pandas.DataFrame(d).transpose()
    return new_logs.reindex(log_rets.index)


def log_returns_vol_adj(series, theta = 0.94, freq = 'daily'):
    """
    Returns the volatility scaled log returns

    :ARGS:

        series: :class:`Series` or :class:`DataFrame` of prices

        theta: :class:`float` of the decay parameter to use for the
        exponential smoothing for volatility

        freq: :class:`string` from ['daily', 'monthly', 
        'quarterly', 'yearly']

    :RETURNS:

        volatility scaled log returns of the same dtype provided

    .. note:: Calculating Vol Adjustment Factor

       .. math::

           \\tilde{r}_{t,T} = \\frac{\\sigma_T}{\\sigma_t}\\cdot r_{t}

       This methodology is most common in using scaled historical 
       returns to calculate VaR and CVaR

    """
    def _log_ret_vol_adj(series, theta, freq):
        log_rets = log_returns(series)
        vol = ew_vol(series, theta = theta, freq = freq)
        scale = vol[-1] / vol
        return log_rets.mul(scale)

    if isinstance(series, pandas.DataFrame):
        return series.apply(
            lambda x: _log_ret_vol_adj(x, theta = theta, freq = freq)
        )
    else:
        return _log_ret_vol_adj(series, theta = theta, freq = freq)

def max_drawdown(series):
    """
    Returns the maximum drawdown, or the maximum peak to trough linear 
    distance, as a positive drawdown value

    :ARGS:
    
        series: ``pandas.Series`` of prices

    :RETURNS:
    
        float: the maximum drawdown of the period, expressed as a 
        positive number

    .. code::

        import visualize_wealth.analyze as vwa

        max_dd = vwa.max_drawdown(price_series)
    """
    def _max_drawdown(series):
        return numpy.max(1 - series/series.cummax())
    if isinstance(series, pandas.DataFrame):
        return series.apply(_max_drawdown)
    else:
        return _max_drawdown(series)
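
# Illustrative sketch, not part of the original module: the running-peak
# logic behind ``max_drawdown`` written as a plain loop.  The helper name
# ``_example_max_drawdown`` and its list input are hypothetical.
def _example_max_drawdown(prices):
    """
    Largest peak to trough decline of ``prices``, as a positive number
    """
    peak = prices[0]
    max_dd = 0.
    for px in prices:
        # track the rolling cumulative maximum
        peak = max(peak, px)
        max_dd = max(max_dd, 1. - px / float(peak))
    return max_dd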

def mctr(asset_df, portfolio_series):
    """
    Return a :class:`pandas.Series` of the marginal contribution to
    risk ("mctr") for each of the assets that construct
    ``portfolio_series``

    :ARGS:

        asset_df: :class:`pandas.DataFrame` of asset prices

        portfolio_series: :class:`pandas.Series` of the portfolio value
        that is constructed by ``asset_df``

    :RETURNS:

        a :class:`pandas.Series` of each of the asset's marginal 
        contribution to risk

    .. note:: Calculating Marginal Contribution to Risk

        If we define :math:`MCTR_i` to be the Marginal Contribution to
        Risk for asset :math:`i`, then,

        .. math::

            MCTR_i = \\sigma_i \\cdot \\rho_{i, P}

        Where,

        .. math::

            \\sigma_i &= \\textrm{volatility of asset } i, \\\\
            \\rho_{i, P} &= \\textrm{correlation of asset } i
            \\textrm{ with the Portfolio}

    .. note:: Reference for Further Reading

        MSCI Barra did an extensive (and easy to read) white paper 
        entitled `Risk Contribution <http://bit.ly/1eGmxJG>`_ that 
        explicitly details the risk exposure calculation.
    """
    asset_rets = log_returns(asset_df)
    port_rets = log_returns(portfolio_series)
    return asset_rets.corrwith(port_rets).mul(asset_rets.std())

def mean_absolute_tracking_error(series, benchmark, freq = 'daily'):
    """
    Returns Carol Alexander's calculation for Mean Absolute Tracking 
    Error ("MATE").


    :ARGS:

        series: ``pandas.Series`` of prices

        benchmark: ``pandas.Series`` to compare ``series`` against

        freq: ``str`` of frequency, either ``daily, monthly, quarterly, 
        or yearly`` 


    :RETURNS:

        ``float`` of the mean absolute tracking error
        
    .. note:: Why Mean Absolute Tracking Error

        One of the downfalls of
        `Tracking Error <http://en.wikipedia.org/wiki/Tracking_error>`_
        ("TE") is that price series that diverge at a constant rate
        **may** have low TE.  MATE addresses this issue by adding the
        squared mean of the active returns:

        .. math::

            \\textrm{MATE} = \\sqrt{\\frac{(T-1)}{T}\\cdot \\tau^2 +
            \\bar{R}^2} \\: \\textrm{where}

            \\tau &= \\textrm{Tracking Error} \\\\
            \\bar{R} &= \\textrm{mean of the active returns}

    """
    def _mean_absolute_tracking_error(series, benchmark, freq = 'daily'):
        active_rets = active_returns(series = series, 
                                     benchmark = benchmark)
        N = active_rets.shape[0]
        return numpy.sqrt((N - 1)/float(N) * tracking_error(
            series, benchmark, freq)**2 + active_rets.mean()**2)

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _mean_absolute_tracking_error(
            series, x, freq = freq))
    else:
        return _mean_absolute_tracking_error(series, benchmark, 
                                             freq = freq)
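
# Illustrative sketch, not part of the original module: MATE on a plain
# list of active returns, without the annualization that
# ``tracking_error`` may apply.  Algebraically the expression reduces to
# the root mean square of the active returns, so a constant gap between
# portfolio and benchmark (zero TE) still registers.  The helper name
# ``_example_mate`` is hypothetical.
def _example_mate(active_rets):
    """
    Mean absolute tracking error of a list of active returns
    """
    import math
    n = len(active_rets)
    mean = sum(active_rets) / float(n)
    # sample variance of the active returns, i.e. tracking error squared
    te_sq = sum((r - mean)**2 for r in active_rets) / float(n - 1)
    return math.sqrt((n - 1) / float(n) * te_sq + mean**2)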

def median_downcapture(series, benchmark):
    """
    Returns the median downcapture of a ``series`` of prices against a 
    ``benchmark`` prices, given that the ``benchmark`` achieved negative 
    returns in a given period

    :ARGS:

        series: ``pandas.Series`` of prices

        benchmark: ``pandas.Series`` of prices to compare ``series`` 
        against

    :RETURNS:

        ``float`` of the median downcapture

    .. warning:: About Downcapture
        
        Downcapture can be a difficult statistic to validate.  As
        downcapture is :math:`\\frac{\\sum{r_{s|r_b < 0}}}
        {\\sum{r_{b|r_b < 0}}}` (or the ratio of medians, in this
        case), dividing by small numbers can distort the overall value
        of this statistic.  Therefore, it's good to do a "sanity check"
        between ``median_downcapture`` and ``downcapture``
    
    """
    def _median_downcapture(series, benchmark):
        series_rets = log_returns(series)
        bench_rets = log_returns(benchmark)
        index = bench_rets < 0.
        return series_rets[index].median() / bench_rets[index].median()

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _median_downcapture(series, x))
    else:
        return _median_downcapture(series, benchmark)

def median_upcapture(series, benchmark):
    """
    Returns the median upcapture of a ``series`` of prices against a 
    ``benchmark`` prices, given that the ``benchmark`` achieved 
    positive returns in a given period

    :ARGS:

        series: ``pandas.Series`` of prices

        benchmark: ``pandas.Series`` of prices to compare ``series`` 
        against

    :RETURNS:

        float: of the median upcapture 

    .. warning:: About Upcapture

        Upcapture can be a difficult statistic to validate.  As
        upcapture is :math:`\\frac{\\sum{r_{s|r_b > 0}}}
        {\\sum{r_{b|r_b > 0}}}` (or the ratio of medians, in this
        case), dividing by small numbers can distort the overall value
        of this statistic.  Therefore, it's good to do a "sanity check"
        between ``median_upcapture`` and ``upcapture``
        
    """
    def _median_upcapture(series, benchmark):
        series_rets = log_returns(series)
        bench_rets = log_returns(benchmark)
        index = bench_rets > 0.
        return series_rets[index].median() / bench_rets[index].median()

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _median_upcapture(series, x))
    else:
        return _median_upcapture(series, benchmark)

def mvt_rnd(mu, covm, phi, n_sim):
    """
    Create ``n_sim`` simulations of a multivariate t distribution
    with ``phi`` degrees of freedom, mean ``mu``, and covariance
    structure ``covm``
    
    :ARGS:

        mu: :class:`Series` of average returns

        covm: :class:`DataFrame` of the assets covariance matrix

        phi: :class:`float` of t-distribution degrees of freedom

        n_sim: :class:`int` of the number of simulations to make

    :RETURNS:

        :class:`DataFrame` of simulations with shape
        ``(n_sim, len(mu))``

    .. note::

        Transformation taken from `Kenny Chowdary's website
        <http://bit.ly/1Kh8gku>`_

    """
    d = len(covm)

    g = numpy.tile(numpy.random.gamma(phi/2., 2./phi, n_sim), (d, 1)).T
    Z = numpy.random.multivariate_normal(numpy.zeros(d), covm, n_sim)
    ret = mu.values + Z/numpy.sqrt(g)

    return pandas.DataFrame(ret, columns = mu.index)

def period_returns(series, freq = 'daily', interval = 'quarterly'):
    """
    Return the disjoint periodic returns of series at interval, given the 
    time frequency of the data in series is freq.

    :ARGS:

        series: :class:`pandas.Series` of prices

        freq: :class:`string` in ['daily', 'monthly', 'quarterly', 'yearly'] of 
        the frequency of the data

        interval: :class:`string` of the periodicity of the interval you wish to
        return, in ['monthly', 'quarterly', 'yearly']

    :RETURNS:

        :class:`pandas.Series`
    """
    def _period_returns(series, freq, interval):
        fmat = {'monthly': lambda x: '{0}-{1}'.format(x.month, x.year), 
                'quarterly': lambda x: 'q{0}-{1}'.format(x.quarter, x.year),
                'yearly': lambda x: x.year
                }

        chunks = zipped_time_chunks(series.index, interval)
        dt_l = []
        d = {}
        for beg, fin in chunks:
            key = fmat[interval](beg)
            d[key] = annualized_return(series[beg:fin], freq = freq)
            dt_l.append(key)

        return pandas.Series(d, index = dt_l)

    if isinstance(series, pandas.Series):
        return _period_returns(series = series, freq = freq, interval = interval)
    else:
        return series.apply(lambda x: _period_returns(series = x,
                                                      freq = freq,
                                                      interval = interval)
        )

def period_volatility(series, freq = 'daily', interval = 'quarterly'):
    """
    Return the disjoint periodic volatility of series at interval, given the 
    time frequency of the data in series is freq.

    :ARGS:

        series: :class:`pandas.Series` of prices

        freq: :class:`string` in ['daily', 'monthly', 'quarterly', 'yearly'] of 
        the frequency of the data

        interval: :class:`string` of the periodicity of the interval you wish to
        return, in ['monthly', 'quarterly', 'yearly']

    :RETURNS:

        :class:`pandas.Series`
    """
    def _period_volatility(series, freq, interval):
        fmat = {'monthly': lambda x: '{0}-{1}'.format(x.month, x.year), 
                'quarterly': lambda x: 'q{0}-{1}'.format(x.quarter, x.year),
                'yearly': lambda x: x.year
                }

        chunks = zipped_time_chunks(series.index, interval)

        dt_l = []
        d = {}
        for beg, fin in chunks:
            key = fmat[interval](beg)
            d[key] = annualized_vol(series[beg:fin], freq = freq)
            dt_l.append(key)

        return pandas.Series(d, index = dt_l)

    if isinstance(series, pandas.Series):
        return _period_volatility(series = series, freq = freq, interval = interval)
    else:
        return series.apply(lambda x: _period_volatility(series = x,
                                                         freq = freq,
                                                         interval = interval)
        )

def r2(series, benchmark):
    """
    Returns the R-Squared or `Coefficient of Determination
    <http://en.wikipedia.org/wiki/Coefficient_of_determination>`_ 
    for a univariate regression (does not adjust for more independent 
    variables)
    
    .. seealso:: :meth:`r2_adj`

    :ARGS:

        series: :class:`pandas.Series` of log returns

        benchmark: :class:`pandas.Series` of log returns to regress 
        ``series`` against

    :RETURNS:

        float: of the coefficient of determination
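
    .. note:: Calculating R-Squared

        As a sketch of the calculation performed by the helper 
        ``_r_squared``, let :math:`\\hat{y}_i` be the fitted value 
        from the least squares regression (with an intercept) of 
        ``series`` on ``benchmark``, then

        .. math::

            R^2 = 1 - \\frac{\\sum_i (y_i - \\hat{y}_i)^2}
            {\\sum_i (y_i - \\bar{y})^2}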
    """
    def _r_squared(x, y):
        X = pandas.DataFrame({'ones': 1., 'xs': x})
        beta = numpy.linalg.inv(X.transpose().dot(X)).dot(
            X.transpose().dot(y) )
        y_est = beta[0] + beta[1]*x
        ss_res = ((y_est - y)**2).sum()
        ss_tot = ((y - y.mean())**2).sum()
        return 1 - ss_res/ss_tot


    if isinstance(benchmark, pandas.DataFrame):
        #remove the numpy.nan's if they're there
        if (benchmark.iloc[0, :].isnull().all()) & (numpy.isnan(series.iloc[0])):
            benchmark = benchmark.dropna()
            series = series.dropna()
        return benchmark.apply(lambda x: _r_squared(x = x, y = series))
    else:
        if (numpy.isnan(benchmark.iloc[0])) & (numpy.isnan(series.iloc[0])):
            benchmark = benchmark.dropna()
            series = series.dropna()
        return _r_squared(y = series, x = benchmark)


def r2_adj(series, benchmark):
    """
    The Adjusted R-Squared that incorporates the number of 
    independent variates, using the `formula found on Wikipedia
    <http://en.wikipedia.org/wiki/Coefficient_of_determination#Adjusted_R2>`_

    :ARGS:

        series: :class:`pandas.Series` of asset returns

        benchmark: :class:`pandas.Series` or :class:`pandas.DataFrame` 
        of benchmark returns to explain the returns of the ``series``; 
        each column of a :class:`pandas.DataFrame` is regressed 
        separately, so the number of independent variates is one

    :RETURNS:

        :class:`float` of the adjusted r-squared (or a 
        :class:`pandas.Series` with one value per benchmark column)
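
    .. note:: Calculating Adjusted R-Squared

        As a sketch of the adjustment applied to :meth:`r2`, with 
        :math:`n` observations and :math:`p = 1` independent variates:

        .. math::

            \\bar{R}^2 = 1 - (1 - R^2) \\cdot \\frac{n - 1}{n - p - 1}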
    """
    n = len(series)
    p = 1
    return 1 - (1 - r2(series, benchmark))*(n - 1)/(n - p - 1)  

def r2_mv_adj(x, y):
    """
    Returns the adjusted R-Squared for multivariate regression
    """
    n = len(y)
    p = x.shape[1]
    return 1 - (1 - r2_mv(x, y))*(n - 1)/(n - p - 1)

def r2_mv(x, y):   
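    """
    Multivariate r-squared for the regression of ``y`` (a 
    :class:`pandas.Series`) on the columns of ``x`` (a 
    :class:`pandas.DataFrame`), including an intercept
    """
    #index the intercept column like x so that the columns align
    ones = pandas.Series(1., index = x.index, name = 'ones')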
    d = x.to_dict()
    d['ones'] = ones
    cols = ['ones']
    cols.extend(x.columns)
    X = pandas.DataFrame(d, columns = cols)
    beta = numpy.linalg.inv(X.transpose().dot(X)).dot(
        X.transpose().dot(y) )
    y_est = beta[0] + x.dot(beta[1:])
    ss_res = ((y_est - y)**2).sum()
    ss_tot = ((y - y.mean())**2).sum()
    return 1 - ss_res/ss_tot

def risk_adjusted_excess_return(series, benchmark, rfr = 0., 
                                freq = 'daily'):
    """
    Returns the MMRAP or the `Modigliani Risk Adjusted Performance 
    <http://en.wikipedia.org/wiki/Modigliani_risk-adjusted_performance>`_ 
    that calculates the excess return from the `Capital Allocation Line 
    <http://en.wikipedia.org/wiki/Capital_allocation_line>`_, at the 
    same level of risk (or volatility)
        
    :ARGS:

        series: ``pandas.Series`` of prices

        benchmark: ``pandas.Series`` from which to compare ``series``

        rfr: ``float`` of the risk free rate

        freq: ``str`` of frequency, either ``daily, monthly, quarterly, 
        or yearly``

    :RETURNS:

        ``float`` of the risk adjusted excess performance

    .. note:: Calculating Risk Adjusted Excess Returns

       .. math::
    
          raer = r_p - \\left(\\textrm{SR}_b \\cdot \\sigma_p + 
          r_f\\right)

       Where,

       .. math::

          r_p &= \\textrm{annualized linear return} 
          \\\\
          \\textrm{SR}_b &= \\textrm{Sharpe Ratio of the benchmark} 
          \\\\
          \\sigma_p &= \\textrm{volatility of the portfolio}
          \\\\
          r_f &= \\textrm{Risk free rate}
    
    """
    def _risk_adjusted_excess_return(series, benchmark, rfr = 0., 
                                     freq = 'daily'):
        benchmark_sharpe = sharpe_ratio(benchmark, rfr, freq)
        annualized_ret = annualized_return(series, freq)
        series_vol = annualized_vol(series, freq)
        return annualized_ret - series_vol * benchmark_sharpe - rfr

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _risk_adjusted_excess_return(
            series, x, rfr = rfr, freq = freq))
    else:
        return _risk_adjusted_excess_return(series, benchmark, 
                                            rfr = rfr, freq = freq)

def risk_contribution(mctr_series, weight_series):
    """
    Returns the risk contribution for each asset, given the marginal 
    contribution to risk ("mctr") and the ``weight_series`` of asset 
    weights

    :ARGS:

        mctr_series: :class:`pandas.Series` of the marginal risk 
        contribution 

        weight_series: :class:`pandas.Series` of weights of each asset

    :RETURNS:

        :class:`pandas.Series` of the risk contribution of each asset

    .. note:: Calculating Risk Contribution

        If :math:`RC_i` is the Risk Contribution of asset :math:`i`, and 
        :math:`\\omega_i` is the weight of asset :math:`i`, then

        .. math::

            RC_i = mctr_i \\cdot \\omega_i
        
    
    .. seealso:: :meth:`mctr` for Marginal Contribution to Risk ("mctr") 
        as well as the `Risk Contribution <http://bit.ly/1eGmxJG>`_ 
        paper from MSCI Barra
    
    """
    return mctr_series.mul(weight_series)


def risk_contribution_as_proportion(mctr_series, weight_series):
    """
    Returns the proportion of the risk contribution for each asset, given 
    the marginal contribution to risk ("mctr") and the ``weight_series`` 
    of asset weights

    :ARGS:

        mctr_series: :class:`pandas.Series` of the marginal risk 
        contribution 

        weight_series: :class:`pandas.Series` of weights of each asset

    :RETURNS:

        :class:`pandas.Series` of the proportional risk contribution 
        of each asset
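
    .. note:: Calculating Proportional Risk Contribution

        As a sketch, if :math:`RC_i = mctr_i \\cdot \\omega_i` is the 
        Risk Contribution of asset :math:`i` (see 
        :meth:`risk_contribution`), then the proportional risk 
        contribution, written here as :math:`PRC_i`, is

        .. math::

            PRC_i = \\frac{RC_i}{\\sum_j RC_j}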

    
    .. seealso:: :meth:`mctr` for Marginal Contribution to Risk ("mctr") 
        as well as the `Risk Contribution <http://bit.ly/1eGmxJG>`_ 
        paper from MSCI Barra
    
    """
    rc = mctr_series.mul(weight_series)
    return rc/rc.sum()
 
def rolling_ui(series, window = 21):
    """   
    Returns the rolling ulcer index of a series, calculated over the 
    ``window`` observations preceding each date
    
    :ARGS:
    
        series: ``pandas.Series`` of prices

        window: ``int`` of the size of the rolling window
        
    :RETURNS:
    
        ``pandas.Series``: of the rolling ulcer index

    .. code::

        import visualize_wealth.analyze as vwa

        ui = vwa.rolling_ui(price_series, window = 252)

    """
    def _rolling_ui(series, window = 21):
        rui = pandas.Series(numpy.tile(numpy.nan, [len(series),]), 
                            index = series.index, name = 'rolling UI')
        j = 0
        for i in numpy.arange(window, len(series)):
            rui[i] = ulcer_index(series[j:i])
            j += 1
        return rui

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _rolling_ui(x, window = window))
    else:
        return _rolling_ui(series, window = window)

def adj_sharpe_ratio(series, rfr = 0., freq = 'daily'):
    """
    Returns the `Adjusted Sharpe Ratio <http://en.wikipedia.org/wiki/Sharpe_ratio>`_ 
    of an asset, taking into account the skew and kurtosis of the returns
    
    :ARGS:

        series: ``pandas.Series`` of prices

        rfr: ``float`` of the risk free rate

        freq: ``str`` of frequency, either ``daily, monthly, quarterly, or 
        yearly``

    :RETURNS:

        ``float`` of the Adjusted Sharpe Ratio

    .. note:: Calculating the Adjusted Sharpe Ratio

        .. math::

            \\textrm{SR}_{adj} &= \\textrm{SR} \\cdot \\left(1 + 
            \\frac{S}{6} \\cdot \\textrm{SR} - \\frac{K - 3}{24} \\cdot 
            \\textrm{SR}^2\\right) \\: \\textrm{where},

            \\textrm{SR} &= \\textrm{Sharpe Ratio of the series} \\\\
            S &= \\textrm{skew of the log returns} \\\\
            K &= \\textrm{kurtosis of the log returns}

    """
    def _adj_sharpe_ratio(series, rfr = 0., freq = 'daily'):
        sr = sharpe_ratio(series, rfr = rfr, freq = freq)
        skew = log_returns(series).skew()
        #pandas' kurt() already returns the excess kurtosis (K - 3)
        excess_kurt = log_returns(series).kurt()
        return sr * (1 + skew/6. * sr - excess_kurt/24. * sr**2)

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _adj_sharpe_ratio(x, rfr = rfr, freq = freq))
    else:
        return _adj_sharpe_ratio(series, rfr = rfr, freq = freq)

def sharpe_ratio(series, rfr = 0., freq = 'daily'):
    """
    Returns the `Sharpe Ratio <http://en.wikipedia.org/wiki/Sharpe_ratio>`_ 
    of an asset, given a price series, a risk free rate of ``rfr``, and 
    the frequency ``freq`` of the time series
    
    :ARGS:

        series: ``pandas.Series`` of prices

        rfr: ``float`` of the risk free rate

        freq: ``str`` of frequency, either ``daily, monthly, quarterly, or 
        yearly``

    :RETURNS:

        ``float`` of the Sharpe Ratio

    .. note:: Calculating Sharpe 

        .. math::

            \\textrm{SR} = \\frac{(R_p - r_f)}{\\sigma} \\: 
            \\textrm{where},

            R_p &= \\textrm{series annualized return} \\\\
            r_f &= \\textrm{Risk free rate} \\\\
            \\sigma &= \\textrm{Portfolio annualized volatility}

    """
    def _sharpe_ratio(series, rfr = 0., freq = 'daily'):
        return (annualized_return(series, freq) - rfr)/annualized_vol(
            series, freq)

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _sharpe_ratio(x, rfr = rfr, freq = freq))
    else:
        return _sharpe_ratio(series, rfr = rfr, freq = freq)


def sortino_ratio(series, freq = 'daily', rfr = 0.0):
    """
    Returns the `Sortino Ratio 
    <http://en.wikipedia.org/wiki/Sortino_ratio>`_, or excess returns 
    per unit downside volatility

    :ARGS:
    
        series: ``pandas.Series`` of prices
    
        freq: ``str`` of either ``daily, monthly, quarterly, or yearly``    
        indicating the frequency of the data ``default=`` daily

        rfr: ``float`` of the risk free rate

    :RETURNS:
    
        float of the Sortino Ratio

    .. note:: Calculating the Sortino Ratio

        There are several calculation methodologies for the Sortino 
        Ratio, this method using downside volatility, where
        
        .. math::

            \\textrm{Sortino Ratio} = \\frac{(R-r_f)}
            {\\sigma_\\textrm{downside}}
    
    .. code:: 

        import visualize_wealth.analyze as vwa

        sortino_ratio = vwa.sortino_ratio(price_series, 
            freq = 'monthly')
        
    """
    def _sortino_ratio(series, freq = 'daily', rfr = 0.0):
        #excess return over the risk free rate per unit downside vol
        return (annualized_return(series, freq = freq) - rfr
                )/downside_deviation(series, freq = freq)

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _sortino_ratio(x, freq = freq, 
                                                     rfr = rfr))
    else:
        return _sortino_ratio(series, freq = freq, rfr = rfr)

def systematic_as_proportion(series, benchmark, freq = 'daily'):
    """
    Returns the systematic risk as proportion of total volatility

    :ARGS:

        series: ``pandas.Series`` of prices

        benchmark: ``pandas.Series`` to compare ``series`` against

        freq: ``str`` of frequency, either ``daily, monthly, quarterly, 
        or yearly``

    :RETURNS:

        ``float`` between (0, 1) representing the proportion of  volatility
        represented by systematic risk
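
    .. note:: Calculating Systematic Risk as a Proportion

        As a sketch of the calculation, where :math:`\\sigma_{\\beta}` 
        is the systematic risk (see :meth:`systematic_risk`) and 
        :math:`\\sigma_s` is the annualized volatility of ``series``:

        .. math::

            \\frac{\\sigma^2_{\\beta}}{\\sigma^2_s}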

    """
    def _systematic_as_proportion(series, benchmark, freq = 'daily'):
        return systematic_risk(series, benchmark, freq) **2 / (
            annualized_vol(series, freq)**2)

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _systematic_as_proportion(
            series, x, freq))
    else:
        return _systematic_as_proportion(series, benchmark, freq)


def systematic_risk(series, benchmark, freq = 'daily'):
    """
    Returns the systematic risk, or the volatility that is directly 
    attributable to the benchmark

    :ARGS:

        series: ``pandas.Series`` of prices

        benchmark: ``pandas.Series`` to compare ``series`` against

        freq: ``str`` of frequency, either ``daily, monthly, quarterly, or 
        yearly``

    :RETURNS:

        ``float`` of the systematic volatility (not variance)

    .. note::  Calculating Systematic Risk

        .. math::
            \\sigma_b &= \\textrm{Volatility of the Benchmark} \\\\
            \\sigma^2_{\\beta} &= \\textrm{Systematic Risk} \\\\
            \\beta &= \\frac{\\sigma^2_{s, b}}{\\sigma^2_{b}} \\: 
            \\textrm{then,}

            \\sigma^2_{\\beta} &= \\beta^2 \\cdot \\sigma^2_{b} \\\\
            \\Rightarrow \\sigma_{\\beta} &= \\beta \\cdot \\sigma_{b}
    """
    def _systematic_risk(series, benchmark, freq = 'daily'):
        #annualize the benchmark volatility at the frequency of the data
        benchmark_vol = annualized_vol(benchmark, freq)
        return benchmark_vol * beta(series, benchmark)

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _systematic_risk(series, x, freq))
    else:
        return _systematic_risk(series, benchmark, freq)

def tracking_error(series, benchmark, freq = 'daily'):
    """
    Returns a ``float`` of the `Tracking Error 
    <http://en.wikipedia.org/wiki/Tracking_error>`_ or standard 
    deviation of the active returns
      
    :ARGS:

        series: ``pandas.Series`` of prices

        benchmark: ``pandas.Series`` to compare ``series`` against

        freq: ``str`` of frequency, either ``daily, monthly, quarterly, 
        or yearly`` 


    :RETURNS:

        ``float`` of the tracking error

    .. note:: Calculating Tracking Error

        Let :math:`r_{a_i}` be the "Active Return" for period :math:`i`, 
        i.e. the compound linear difference between the series return 
        :math:`r_{s_i}` and the benchmark return :math:`r_{b_i}`,

        .. math::

            r_{a_i} = \\frac{(1+r_{s_i})}{(1+r_{b_i})}-1

        Then,

        .. math:: 

            \\textrm{TE} &= \\sigma_a \\cdot \\sqrt{k} \\\\
            k &= \\textrm{Annualization factor}

    """
    def _tracking_error(series, benchmark, freq = 'daily'):
        fac = _interval_to_factor(freq)
        series_rets = linear_returns(series)
        bench_rets = linear_returns(benchmark)
        return ((1 + series_rets).div(
            1 + bench_rets) - 1).std()*numpy.sqrt(fac)

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _tracking_error(
            series, x, freq = freq))
    else:
        return _tracking_error(series, benchmark, freq = freq)

    
def ulcer_index(series):
    """
    Returns the ulcer index of the series, which is defined as the 
    square root of the mean squared drawdown (instead of the squared 
    deviations from the mean).  Further explanation can be found at 
    `Tango Tools <http://www.tangotools.com/ui/ui.htm>`_
    
    :ARGS:
    
        series: ``pandas.Series`` of prices

    :RETURNS:
    
        :float: the ulcer index of the period, expressed as a 
        positive number
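
    .. note:: Calculating the Ulcer Index

        As a sketch of the calculation, let :math:`P_i` be the price 
        in period :math:`i` of :math:`n` periods and :math:`D_i` the 
        drawdown from the running maximum price, then

        .. math::

            D_i &= 1 - \\frac{P_i}{\\max_{j \\leq i} P_j} \\\\
            \\textrm{UI} &= \\sqrt{\\frac{\\sum_i D_i^2}{n - 1}}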

    .. code::

        import visualize_wealth.analyze as vwa

        ui = vwa.ulcer_index(price_series)

    """
    def _ulcer_index(series):
        dd = 1. - series/series.cummax()
        ssdd = numpy.sum(dd**2)
        return numpy.sqrt(numpy.divide(ssdd, series.shape[0] - 1))
    if isinstance(series, pandas.DataFrame):
        return series.apply(_ulcer_index)
    else:
        return _ulcer_index(series)


def upcapture(series, benchmark):
    """
    Returns the proportion of ``series``'s cumulative returns to 
    ``benchmark``'s cumulative returns, over the periods in which 
    ``benchmark``'s returns were positive

    :ARGS:

        series: :class:`pandas.Series` of prices

        benchmark: :class:`pandas.Series` of prices to compare ``series`` 
        against

    :RETURNS:

        float: of the upcapture of cumulative positive returns
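
    .. note:: Calculating Upcapture

        As a sketch of the calculation, let :math:`r_{s_i}` and 
        :math:`r_{b_i}` be the log returns of ``series`` and 
        ``benchmark`` in period :math:`i`, and :math:`U` the set of 
        periods where :math:`r_{b_i} > 0`, then

        .. math::

            \\textrm{upcapture} = \\frac{\\textrm{mean}_{i \\in U}
            (r_{s_i})}{\\textrm{mean}_{i \\in U}(r_{b_i})}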

    .. seealso:: :py:data:`median_upcapture(series, benchmark)`
    
    """
    def _upcapture(series, benchmark):
        series_rets = log_returns(series)
        bench_rets = log_returns(benchmark)
        index = bench_rets > 0.
        return series_rets[index].mean() / bench_rets[index].mean()

    if isinstance(benchmark, pandas.DataFrame):
        return benchmark.apply(lambda x: _upcapture(series, x))
    else:
        return _upcapture(series, benchmark)

def upside_deviation(series, freq = 'daily'):
    """
    Returns the volatility of the returns that are greater than zero

    :ARGS:

        series: :class:`pandas.Series` of prices

        freq: ``str`` of frequency, either ``daily, monthly, quarterly, or 
        yearly``

    :RETURNS:

        ``float`` of the upside standard deviation
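
    .. note:: Calculating Upside Deviation

        As a sketch of the calculation, where :math:`r_i` are the log 
        returns of ``series`` and :math:`k` is the annualization 
        factor for ``freq``:

        .. math::

            \\sigma_\\textrm{upside} = \\sigma(r_i \\mid r_i > 0) 
            \\cdot \\sqrt{k}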
    """
    def _upside_deviation(series, freq = 'daily'):
        fac = _interval_to_factor(freq)
        series_rets = log_returns(series)
        index = series_rets > 0.
        return series_rets[index].std()*numpy.sqrt(fac)

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _upside_deviation(x, freq = freq))
    else:
        return _upside_deviation(series, freq)

def var_cf(series, p = .01):
    """
    VaR (Value at Risk), using the `Cornish Fisher Approximation
    <http://en.wikipedia.org/wiki/Cornish%E2%80%93Fisher_expansion>`_.

    :ARGS:

        series: :class:`pandas.Series` or :class:`pandas.DataFrame` 
        of prices

        p: :class:`float` of the :math:`\\alpha` percentile

    :RETURNS:

        :class:`float` or :class:`pandas.Series` of the VaR, where skew 
        and kurtosis are used to adjust the tail density estimation 
        (using the Cornish Fisher Approximation)
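
    .. note:: Calculating the Cornish Fisher VaR

        As a sketch of the calculation, let :math:`\\mu`, 
        :math:`\\sigma`, :math:`S`, and :math:`K` be the mean, 
        standard deviation, skew, and excess kurtosis of the log 
        returns, and :math:`z = \\Phi^{-1}(1 - \\alpha)`, then

        .. math::

            V &= z + (1 - z^2) \\cdot \\frac{S}{6} + (5z - 2z^3) 
            \\cdot \\frac{S^2}{36} + (z^3 - 3z) \\cdot \\frac{K}{24} 
            \\\\
            \\textrm{VaR}_\\alpha &= e^{\\sigma \\cdot V - \\mu} - 1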
    
    """
    series_rets = log_returns(series)
    mu, sigma = series_rets.mean(), series_rets.std()
    #pandas' kurtosis() already returns the excess kurtosis
    skew, kurt = series_rets.skew(), series_rets.kurtosis()
    v = lambda alpha: scipy.stats.distributions.norm.ppf(1 - alpha)
    V = v(p)+(1-v(p)**2)*skew/6+(5*v(p)-2*v(p)**3)*skew**2/36 + (
        v(p)**3-3*v(p))*kurt/24
    return numpy.exp(sigma * V - mu) - 1

def var_norm(series, p = .01):
    """
    Value at Risk ("VaR") of the :math:`p = \\alpha` quantile defines 
    the loss such that there is an :math:`\\alpha` probability of 
    a loss greater than or equal to :math:`\\textrm{VaR}_\\alpha`. 
    :meth:`var_norm` fits a normal distribution to the log returns of 
    the series, and then estimates the :math:`\\textrm{VaR}_\\alpha`

    :ARGS:

        series: :class:`pandas.Series` or :class:`pandas.DataFrame` 
        of prices

        p: :class:`float` of the :math:`\\alpha` quantile for which to 
        estimate VaR

    :RETURNS:

        :class:`float` or :class:`pandas.Series` of VaR

    .. note:: Derivation of Value at Risk

        Let :math:`Y \\sim N(\\mu, \\sigma^2)`, we choose 
        :math:`y_\\alpha` such that 
        :math:`\\mathbb{P}(Y < y_\\alpha) = \\alpha`. 

        Then,

        .. math::

            \\mathbb{P}(Y < y_\\alpha) &= \\alpha \\\\
            \\Rightarrow \\mathbb{P}(\\frac{Y - \\mu}{\\sigma} < 
            \\frac{y_\\alpha - \\mu}{\\sigma}) &= \\alpha 
            \\\\
            \\Rightarrow \\mathbb{P}(Z < \\frac{y_\\alpha - 
            \\mu}{\\sigma}) &= \\alpha
            \\\\
            \\Rightarrow \\Phi(\\frac{y_\\alpha - \\mu}{\\sigma} ) 
            &= \\alpha, 

        where :math:`\\Phi(.)` is the standard normal cdf operator.
        Then using the inverse of the function :math:`\\Phi`, 
        we have:

        .. math::

            \\Phi^{-1}( \\Phi(\\frac{y_\\alpha - \\mu}{\\sigma} ) ) 
            &= \\Phi^{-1}(\\alpha) 
            \\\\
            \\Rightarrow \\Phi^{-1}(\\alpha)\\cdot\\sigma + \\mu 
            &= y_\\alpha 

        But :math:`y_\\alpha` is negative and VaR is always 
        positive, so,

        .. math:: 

            VaR_\\alpha = -y_\\alpha &= -\\Phi^{-1}
            (\\alpha)\\cdot\\sigma - \\mu
            \\\\
            &= \\Phi^{-1}(1 - \\alpha)\\cdot\\sigma - \\mu

    .. seealso:: :meth:`var_cf`, :meth:`var_np`
             
    """
    def _var_norm(series, p):
        series_rets = log_returns(series)
        mu, sigma = series_rets.mean(), series_rets.std()
        v = lambda alpha: scipy.stats.distributions.norm.ppf(1 - alpha)
        return numpy.exp(sigma * v(p) - mu) - 1

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _var_norm(x, p = p))
    else:
        return _var_norm(series, p = p)

def var_np(series, p = .01):
    """    
    Return the non-parametric VaR (Value at Risk) for a given quantile 
    ``p``, i.e. the loss that is exceeded in a single period with 
    probability ``p``, estimated from the empirical distribution of 
    the linear returns.

    :ARGS:
    
        series: ``pandas.Series`` of prices

        p: ``float`` of the quantile at which to calculate VaR
        
    :RETURNS:
    
        float of the Value at Risk for the quantile ``p``

    .. code::

        import visualize_wealth.analyze as vwa

        var = vwa.var_np(price_series, p = 0.1)
    
    """
    def _var_np(series, p = .01):
        
        series_rets = linear_returns(series)
        #loss is always reported as positive
        return -1 * (numpy.percentile(series_rets, p*100.))

    if isinstance(series, pandas.DataFrame):
        return series.apply(lambda x: _var_np(x, p = p))
    else:
        return _var_np(series, p = p)

def _interval_to_factor(interval):
    factor_dict = {'daily': 252, 'monthly': 12, 'quarterly': 4, 
                   'yearly': 1}
    return factor_dict[interval] 
    
def _bool_interval_index(pandas_index, interval = 'monthly'):
    """
    creates weekly, monthly, quarterly, or yearly intervals by creating a
    boolean index to be passed via DataFrame.ix[bool_index, :]
    """
    weekly = lambda x: x.weekofyear[1:] != x.weekofyear[:-1]
    monthly = lambda x: x.month[1:] != x.month[:-1]
    yearly = lambda x: x.year[1:] != x.year[:-1]
    ldom = lambda x: x.month[1:] != x.month[:-1]
    fdom = lambda x: numpy.append(False, x.month[1:]!=x.month[:-1])
    qt = lambda x: numpy.append(False, x.quarter[1:]!=x.quarter[:-1])
    time_dict = {'weekly':weekly, 'monthly': monthly, 'quarterly': qt, 
                 'yearly': yearly, 'ldom':ldom, 'fdom':fdom}

    return time_dict[interval](pandas_index)


================================================
FILE: visualize_wealth/classify.py
================================================
#!/usr/bin/env python
# encoding: utf-8
"""
.. module:: visualize_wealth.classify.py

Created by Benjamin M. Gross

"""

import argparse
import datetime
import numpy
import pandas
import os

def classify_series_with_store(series, trained_series, store_path,
                               calc_meth = 'x-inv-x', n = None):
    """
    Determine the asset class of price series from an existing
    HDFStore with prices

    :ARGS:

        series: :class:`pandas.Series` or `pandas.DataFrame` of the
        price series to determine the asset class of

        trained_series: :class:`pandas.Series` of tickers
        and their respective asset classes

        store_path: :class:`string` of the location of the HDFStore
        to find asset prices

        calc_meth: :class:`string` of either ['x-inv-x', 'inv-x', 'exp-x']
        to determine which calculation method is used
                
        n: :class:`integer` of the number of highest r-squared assets
        to include when classifying a new asset

    :RETURNS:

        :class:`string` of the asset class that has been estimated
        based on the method provided
    """
    from .utils import index_intersect
    from .analyze import log_returns, r2_adj

    if series.name in trained_series.index:
        return trained_series[series.name]
    else:
        try:
            store = pandas.HDFStore(path = store_path, mode = 'r')
        except IOError:
            print store_path + " is not a valid path to HDFStore"
            return
        rsq_d = {}
        ys = log_returns(series)

        for key in store.keys():
            key = key.strip('/')
            p = store.get(key)
            xs = log_returns(p['Adj Close'])
            ind = index_intersect(xs, ys)
            rsq_d[key] = r2_adj(benchmark = ys[ind], series = xs[ind])
        rsq_df = pandas.Series(rsq_d)
        store.close()
        if not n:
            n = len(trained_series.unique()) + 1
        
    return __weighting_method_agg_fun(series = rsq_df,
                                      trained_series = trained_series,
                                      n = n, calc_meth = calc_meth)

def classify_series_with_online(series, trained_series,
                                calc_meth = 'x-inv-x', n = None):
    """
    Determine the asset class of price series using prices retrieved
    online for the tickers in ``trained_series``

    :ARGS:

        series: :class:`pandas.Series` or `pandas.DataFrame` of the
        price series to determine the asset class of

        trained_series: :class:`pandas.Series` of tickers
        and their respective asset classes

        calc_meth: :class:`string` of either ['x-inv-x', 'inv-x', 'exp-x']
        to determine which calculation method is used
                
        n: :class:`integer` of the number of highest r-squared assets
        to include when classifying a new asset

    :RETURNS:

        :class:`string` of the asset class that has been estimated
        based on the method provided
    """
    from .utils import tickers_to_dict, index_intersect
    from .analyze import log_returns, r2_adj

    if series.name in trained_series.index:
        return trained_series[series.name]
    else:
        price_dict = tickers_to_dict(trained_series.index)
        rsq_d = {}
        ys = log_returns(series)
        
        for key in price_dict.keys():
            p = price_dict[key]
            xs = log_returns(p['Adj Close'])
            ind = index_intersect(xs, ys)
            rsq_d[key] = r2_adj(benchmark = xs[ind], series = ys[ind])
        rsq_df = pandas.Series(rsq_d)

        if not n:
            n = len(trained_series.unique()) + 1
        
    return __weighting_method_agg_fun(series = rsq_df,
                                      trained_series = trained_series,
                                      n = n, calc_meth = calc_meth)

def knn_exp_weighted(series, trained_series, n = None):
    """
    Training data is a m x n matrix with 'training_tickers' as columns
    and rows of r-squared for different tickers and asset_class is a
    n x 1 result of the asset class

    :ARGS:

        series: :class:`pandas.Series` or :class:`pandas.DataFrame` of
        r-squared values

        trained_series: :class:`pandas.Series` of the columns
        and their respective asset classes

        n: :class:`integer` of the number of highest r-squared assets
        to include when classifying a new asset

    :RETURNS:

        :class:`string` of the asset class that has been estimated
        based on the n closest neighbors
   """
    if not n:
        n = len(trained_series.unique()) + 1

    return __weighting_method_agg_fun(series, trained_series, n,
                                      calc_meth = 'exp-x')

def knn_inverse_weighted(series, trained_series, n = None):
    """
    Training data is a m x n matrix with 'training_tickers' as columns
    and rows of r-squared for different tickers and asset_class is a
    n x 1 result of the asset class

    :ARGS:

        series: :class:`pandas.Series` or :class:`pandas.DataFrame` of
        r-squared values

        trained_series: :class:`pandas.Series` of the columns
        and their respective asset classes

        n: :class:`integer` of the number of highest r-squared assets
        to include when classifying a new asset

    :RETURNS:

        :class:`string` of the asset class that has been estimated
        based on the n closest neighbors
   """
    if not n:
        n = len(trained_series.unique()) + 1

    return __weighting_method_agg_fun(series, trained_series, n,
                                      calc_meth = 'inv-x')

def knn_wt_inv_weighted(series, trained_series, n = None):
    """
    Training data is a m x n matrix with 'training_tickers' as columns
    and rows of r-squared for different tickers and asset_class is a
    n x 1 result of the asset class

    :ARGS:

        series: :class:`pandas.Series` or :class:`pandas.DataFrame` of
        r-squared values

        trained_series: :class:`pandas.Series` of the columns
        and their respective asset classes

        n: :class:`integer` of the number of highest r-squared assets
        to include when classifying a new asset

    :RETURNS:

        :class:`string` of the asset class that has been estimated
        based on the n closest neighbors
   """
    if not n:
        n = len(trained_series.unique()) + 1

    return __weighting_method_agg_fun(series, trained_series, n,
                                      calc_meth = 'x-inv-x')


def __weighting_method_agg_fun(series, trained_series, n, calc_meth):
    """
    Aggregation function for the different calculation methods to determine
    the asset class based on a Series or DataFrame of r-squared values

    :ARGS:

        series: :class:`pandas.Series` or :class:`pandas.DataFrame` of
        r-squared values

        trained_series: :class:`pandas.Series` of the columns
        and their respective asset classes

        n: :class:`integer` of the number of highest r-squared assets
        to include when classifying a new asset

        calc_meth: :class:`string` of either ['x-inv-x', 'inv-x', 'exp-x']
        to determine which calculation method is used

    :RETURNS:

        :class:`string` of the asset class that has been estimated
        based on the n closest neighbors, or a :class:`pandas.Series`
        in the case when a :class:`DataFrame` has been provided instead
        of a :class:`Series`
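
    .. note:: Weighting methods

        As a sketch of the calculation, each r-squared value 
        :math:`x` is mapped to a weight :math:`w`, the ``n`` largest 
        weights are summed by asset class, and the asset class with 
        the largest sum is returned:

        .. math::

            \\textrm{x-inv-x}: \\: w &= \\frac{x}{1 - x} \\\\
            \\textrm{inv-x}: \\: w &= \\frac{1}{1 - x} \\\\
            \\textrm{exp-x}: \\: w &= e^{x}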

    """
    def weighting_method_agg_fun(series, trained_series, n, calc_meth):
        weight_map = {'x-inv-x': lambda x: x.div(1. - x),
                      'inv-x': lambda x: 1./(1. - x),
                      'exp-x': lambda x: numpy.exp(x)
                      }

        key_map = trained_series[series.index]
        series = series.rename(index = key_map)
        wts = weight_map[calc_meth](series)
        wts = wts.sort(ascending = False, inplace = False)
        grp = wts[:n].groupby(wts[:n].index).sum()
        return grp.argmax()

    if isinstance(series, pandas.DataFrame):
        return series.apply(
            lambda x: weighting_method_agg_fun(x,
            trained_series, n, calc_meth), axis = 1)
    else:
        return weighting_method_agg_fun(series, trained_series, n, calc_meth)


================================================
FILE: visualize_wealth/construct_portfolio.py
================================================
#!/usr/bin/env python
# encoding: utf-8
"""
.. module:: visualize_wealth.construct_portfolio.py
   :synopsis: Engine to construct portfolios using three general methodologies

.. moduleauthor:: Benjamin M. Gross <benjaminMgross@gmail.com>
"""
import argparse
import logging
import pandas
import numpy
import pandas.io.data
import datetime
import urllib2

from .utils import (_open_store, 
					tradeplus_tchunks, 
					zipped_time_chunks
)

def format_blotter(blotter_file):
	"""
	Pass in either a location of a blotter file (in ``.csv`` format) or 
	blotter :class:`pandas.DataFrame` with all positive values and 
	return a :class:`pandas.DataFrame` where Sell values are then 
	negative values
	
	:ARGS:
	
		blotter_file: :class:`pandas.DataFrame` with at least index 
		(dates of Buy / Sell) columns = ['Buy/Sell', 'Shares'] or a 
		string of the file location to such a formatted file

	:RETURNS:

		blotter: of type :class:`pandas.DataFrame` where sell values 
		have been made negative

	"""
	if isinstance(blotter_file, str):
		blot = pandas.DataFrame.from_csv(blotter_file)
	elif isinstance(blotter_file, pandas.DataFrame):
		blot = blotter_file.copy()
	#map to ascii
	blot['Buy/Sell'] = map(lambda x: x.encode('ascii', 'ignore'), 
						   blot['Buy/Sell'])
	#remove whitespaces
	blot['Buy/Sell'] = map(str.strip, blot['Buy/Sell'])

	#if the Sell values are not negative, make them negative
	if ((blot['Buy/Sell'] == 'Sell') & (blot['Shares'] > 0.)).any():
		idx = (blot['Buy/Sell'] == 'Sell') & (blot['Shares'] > 0.)
		sub = blot[idx]
		sub['Shares'] = -1.*sub['Shares']
		blot.update(sub)

	return blot

def append_price_frame_with_dividends(ticker, start_date, end_date=None):
	"""
	Given a ticker, start_date, & end_date, return a 
	:class:`pandas.DataFrame` with a 'Dividends' column appended to it

	:ARGS:

		ticker: :class:`str` of ticker

		start_date: :class:`datetime.datetime` or string of format 
		"mm/dd/yyyy"

		end_date: a :class:`datetime.datetime` or string of format 
		"mm/dd/yyyy"

	:RETURNS:
	
		price_df: a :class:`pandas.DataFrame` with columns ['Close', 
		'Adj Close', 'Dividends']

	.. code:: python

		import visualize_wealth.construct_portfolio as vwcp
		frame_with_divs = vwcp.append_price_frame_with_dividends('EEM', 
			'01/01/2000', '01/01/2013')

	.. warning:: Requires Internet Connectivity

		Because the function calls the `Yahoo! API 
		<http://www.finance.yahoo.com>`_ internet connectivity is 
		required for the function to work properly
	"""
	reader = pandas.io.data.DataReader

	if isinstance(start_date, str):
		start_date = datetime.datetime.strptime(start_date, "%m/%d/%Y")

	if end_date is None:
		end = datetime.datetime.today()
	elif isinstance(end_date, str):
		end = datetime.datetime.strptime(end_date, "%m/%d/%Y")
	else:
		end = end_date

	#construct the dividend data series
	b_str = 'http://ichart.finance.yahoo.com/table.csv?s='

	if end_date is None:
		end_date = datetime.datetime.today()

	a = '&a=' + str(start_date.month)
	b = '&b=' + str(start_date.day)
	c = '&c=' + str(start_date.year)
	d = '&d=' + str(end.month)
	e = '&e=' + str(end.day)
	f = '&f=' + str(end.year)
	tail = '&g=v&ignore=.csv'
	url = b_str + ticker + a + b + c + d + e + f + tail
	socket = urllib2.urlopen(url)
	div_df = pandas.io.parsers.read_csv(socket, index_col = 0)
	
	price_df = reader(ticker, data_source = 'yahoo', 
					  start = start_date, end = end_date)

	return price_df.join(div_df).fillna(0.0)

def calculate_splits(price_df, tol = .1):
	"""
	Given a ``price_df`` of the format 
	:meth:`append_price_frame_with_dividends`, return a 
	:class:`pandas.DataFrame` with a split factor column named 'Splits'
	
	:ARGS:
	
		price_df: a :class:`pandas.DataFrame` with columns ['Close', 
		'Adj Close', 'Dividends']

		tol: :class:`float` of the tolerance to determine whether a 
		split has occurred

	:RETURNS:
	
		price: :class:`pandas.DataFrame` with columns ['Close', 
		'Adj Close', 'Dividends', 'Splits']

	.. code::
	
		price_df_with_divs_and_split_ratios = vwcp.calculate_splits(
			price_df_with_divs, tol = 0.1)

	.. note:: Calculating Splits

		This function specifically looks at the ratios of close to 
		adjusted close to determine whether a split has occurred. To 
		see the manual calculations of this function, see 
		``test_data/estimating when splits have occurred.xlsx``
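
	As an illustration of the ratio test in the note, here is a 
	minimal sketch on made-up prices with no dividends. It is not this 
	function's exact code path (the dividend adjustment and the 
	reverse-split rounding are left out), and the sample values are 
	assumptions.

```python
import pandas

# Hypothetical data: a 2-for-1 split takes effect on the third day and
# no dividends are paid, so only the split moves Adj Close / Close.
idx = pandas.date_range('2013-01-01', periods=4, freq='D')
close = pandas.Series([100., 102., 51., 52.], index=idx)
adj_close = pandas.Series([50., 51., 51., 52.], index=idx)

# Ratio of adjusted to unadjusted close, and its day-over-day change
eps = adj_close.div(close)
spl_mul = eps.div(eps.shift(1))

# A change further from 1 than the tolerance is treated as a split;
# forward splits are rounded to the nearest whole ratio
tol = 0.1
did_split = (spl_mul - 1).abs() > tol
splits = spl_mul[did_split].round(0)
```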

	"""
	div_mul = 1 - price_df['Dividends'].shift(-1).div(price_df['Close'])
	rev_cp = div_mul[::-1].cumprod()[::-1]
	rev_cp[-1] = 1.0
	est_adj = price_df['Adj Close'].div(rev_cp)
	eps = est_adj.div(price_df['Close'])
	spl_mul = eps.div(eps.shift(1))
	did_split = numpy.abs(spl_mul - 1) > tol
	splits = spl_mul[did_split]
	for date in splits.index:
		if splits[date] > 1.0:
			splits[date] = numpy.round(splits[date], 0)
		elif splits[date] < 1.0:
			splits[date] = 1./numpy.round(1./splits[date], 0)
	splits.name = 'Splits'
	return price_df.join(splits)

def blotter_and_price_df_to_cum_shares(blotter_df, price_df):
	"""
	Given a blotter :class:`pandas.DataFrame` of dates, purchases (+/-),
	and price :class:`pandas.DataFrame` with Close, Adj Close, Dividends, 
	& Splits, calculate the cumulative share balance for the position
	
	:ARGS:
	
		blotter_df: a  :class:`pandas.DataFrame` where index is 
		buy/sell dates

		price_df: a :class:`pandas.DataFrame` with columns ['Close', 
		'Adj Close', 'Dividends', 'Splits']

	:RETURNS:                          

		:class:`pandas.DataFrame` containing contributions, withdrawals, 
		price values

	.. code:: python

		agg_stats_for_single_asset = vwcp.blotter_and_price_df_to_cum_shares(
			single_asset_blotter, split_adj_price_frame)

	.. note:: Calculating Position Value

		The sole reason you can't take the number of trades for a 
		given asset, apply a :meth:`cumsum`, and then multiply by 
		'Close' for a given day is because of splits.  Therefore, once 
		this function has run, taking the cumulative shares and then 
		multiplying by close **is** an appropriate way to determine 
		aggregate position value for any given day
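
	The interaction between trades and splits can be sketched as 
	follows. The data and the day-by-day loop are illustrative 
	assumptions, not the chunked implementation used by this function.

```python
import pandas

# Hypothetical blotter and split factors on a shared daily index:
# buy 10 shares, a 2-for-1 split on day three, then sell 5 shares.
idx = pandas.date_range('2013-01-01', periods=5, freq='D')
trades = pandas.Series([10., 0., 0., -5., 0.], index=idx)
splits = pandas.Series([1., 1., 2., 1., 1.], index=idx)

# Carry the share balance forward, scaling it by any split factor
# before adding that day's trade
shares = 0.
balance = []
for date in idx:
    shares = shares * splits.loc[date] + trades.loc[date]
    balance.append(shares)

cum_shares = pandas.Series(balance, index=idx, name='cum_shares')
```

	A plain ``trades.cumsum()`` would report 5 shares on the last 
	day, while the split-aware balance reports 15.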

	"""
	blotter_df = blotter_df.sort_index()
	#make sure all dates in the blotter file are also in the price file
	#consider, if those dates aren't in price frame, assign the 
	#"closest date" value
	
	msg = "Buy/Sell Dates not in Price File"
	assert blotter_df.index.isin(price_df.index).all(), msg

	#now cumsum the buy/sell chunks and mul by splits for total shares
	bs_series = pandas.Series()
	start_dts = blotter_df.index
	end_dts = blotter_df.index[1:].append(
		pandas.DatetimeIndex([price_df.index[-1]]))

	dt_chunks = zip(start_dts, end_dts)
	end = 0.

	for i, chunk in enumerate(dt_chunks):
		#print str(i) + ' of ' + str(len(dt_chunks)) + ' total'
		tmp = price_df[chunk[0]:chunk[1]][:-1]
		if chunk[1] == price_df.index[-1]:
			tmp = price_df[chunk[0]:chunk[1]]
		splits = tmp[pandas.notnull(tmp['Splits'])]
		vals = numpy.append(blotter_df['Buy/Sell'][chunk[0]] + end,
							splits['Splits'].values)
		dts = pandas.DatetimeIndex([chunk[0]]).append(
			splits['Splits'].index)
		tmp_series = pandas.Series(vals, index = dts)
		tmp_series = tmp_series.cumprod()
		tmp_series = tmp_series[tmp.index].ffill()
		bs_series = bs_series.append(tmp_series)
		end = bs_series[-1]

	bs_series.name = 'cum_shares'

	#construct the contributions, withdrawals, & cumulative investment

	#if a trade is missing a price, assign the 'Close'  of that day
	no_price = blotter_df['Price'][pandas.isnull(blotter_df['Price'])]
	blotter_df.ix[no_price.index, 'Price'] = price_df.ix[no_price.index, 
														 'Close']

	contr = blotter_df['Buy/Sell'].mul(blotter_df['Price'])
	cum_inv = contr.cumsum()
	contr = contr[price_df.index].fillna(0.0)
	cum_inv = cum_inv[price_df.index].ffill()
	res = pandas.DataFrame({'cum_shares':bs_series, 
		'contr_withdrawal':contr, 'cum_investment':cum_inv})
	
	return price_df.join(res)
		
def construct_random_trades(split_df, num_trades):
	"""
	Create random trades on random trade dates, but never allow 
	shares to go negative
	
	:ARGS:
	
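		split_df: :class:`pandas.DataFrame` that has 'High', 'Low', 
		'Close', 'Dividends', 'Splits'

		num_trades: :class:`int`, the maximum number of trades to 
		generate (duplicate trade dates are dropped)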

	:RETURNS:

		blotter_frame: :class:`pandas.DataFrame` a blotter with random 
		trades, num_trades

	.. note:: Why Create Random Trades?

		One disappointing aspect of any type of financial software is 
		the fact that you **need** to have a portfolio to view what 
		the software does (which never seemed like an appropriate 
		"necessary" condition to me).  Therefore, I've created 
		comprehensive ability to create random trades for single assets,
		as well as random portfolios of assets, to avoid the 
		"unnecessary condition" of having a portfolio to understand 
		how to analyze one.
	"""
	ind = numpy.sort(numpy.random.randint(0, len(split_df), 
										  size = num_trades))
	#This unique makes sure there aren't double trade day entries 
	#which breaks the function blotter_and_price_df_to_cum_shares
	ind = numpy.unique(ind)
	dates = split_df.index[ind]

	#construct random execution prices
	prices = []
	for date in dates:
		u_lim = split_df.loc[date, 'High']
		l_lim = split_df.loc[date, 'Low']
		prices.append(numpy.random.rand()*(u_lim - l_lim) + l_lim)
		
	trades = numpy.random.randint(-100, 100, size = len(ind))
	trades = numpy.round(trades, -1)

	while numpy.any(trades.cumsum() < 0):
		trades[numpy.argmin(trades)] *= -1.    

	return pandas.DataFrame({'Buy/Sell':trades, 'Price':prices}, 
							index = dates)

def blotter_to_cum_shares(blotter_series, ticker, start_date, 
						  end_date, tol):
	"""
	Aggregation function for :meth:`append_price_frame_with_dividends`, 
	:meth:`calculate_splits`, and 
	:meth:`blotter_and_price_df_to_cum_shares`. Only blotter, ticker, 
	start_date, & end_date are needed.

	:ARGS:

		blotter_series: a  :class:`pandas.Series` with index of dates 
		and values of quantity

		ticker: :class:`str` the ticker for which the buys and sells 
		occur

		start_date: a :class:`string` or :class:`datetime.datetime`

		end_date: :class:`string` or :class:`datetime.datetime`

		tol: :class:`float`  the tolerance to find the split dates 
		(.1 recommended)
	
	:RETURNS:

		 :class:`pandas.DataFrame` containing contributions, 
		 withdrawals, price values

	.. warning:: Requires Internet Connectivity

		Because the function calls the `Yahoo! API 
		<http://www.finance.yahoo.com>`_ internet connectivity is 
		required for the function to work properly
	
	"""

	price_df = append_price_frame_with_dividends(ticker, start_date, 
												 end_date)

	split_df = calculate_splits(price_df, tol = tol)
	return blotter_and_price_df_to_cum_shares(blotter_series, split_df)

def generate_random_asset_path(ticker, start_date, num_trades):
	"""
	Allows the user to input a ticker, start date, and num_trades to 
	generate a :class:`pandas.DataFrame` with columns 'Open', 'Close', 
	'contr_withdrawal', 'cum_shares' (i.e. bypasses the need for a price 
	:class:`pandas.DataFrame` to generate an asset path, as is required 
	in :meth:`construct_random_trades`)

	:ARGS:

		ticker: :class:`string` of the ticker to generate the path

		start_date: :class:`string` of format 'mm/dd/yyyy' or 
		:class:`datetime`

		num_trades: :class:`int` of the number of trades to generate

	:RETURNS:

		:class:`pandas.DataFrame` with the additional columns 
		'cum_shares', 'contr_withdrawal', 'Splits', 'Dividends'
		
	.. warning:: Requires Internet Connectivity

		Because the function calls the `Yahoo! API 
		<http://www.finance.yahoo.com>`_ internet connectivity is 
		required for the function to work properly
	"""
	if isinstance(start_date, str):
		start_date = datetime.datetime.strptime(start_date, "%m/%d/%Y")
	end_date = datetime.datetime.today()
	prices = append_price_frame_with_dividends(ticker, start_date)
	blotter = construct_random_trades(prices, num_trades)
	#blotter.to_csv('../tests/' + ticker + '.csv')
	return blotter_to_cum_shares(blotter_series = blotter, 
		ticker = ticker,  start_date = start_date, 
		end_date = end_date, tol = .1)

def generate_random_portfolio_blotter(tickers, num_trades):
	"""
	:meth:`generate_random_asset_path`, for multiple assets, given a 
	list of tickers and a number of trades (to be used for all tickers). 
	Execution prices are drawn at random between the 'Low' and 'High' 
	of that ticker on each trade date

	:ARGS:
	
		tickers: a :class:`list` with the tickers to be used

		num_trades: :class:`integer`, the number of trades to randomly 
		generate for each ticker

	:RETURNS:

		:class:`pandas.DataFrame` with columns 'Ticker', 'Buy/Sell' 
		(+ for buys, - for sells) and 'Price'

	.. warning:: Requires Internet Connectivity

		Because the function calls the `Yahoo! API 
		<http://www.finance.yahoo.com>`_ internet connectivity is 
		required for the function to work properly
	
	"""
	blot_d = {}
	price_d = {}
	for ticker in tickers:
		tmp = append_price_frame_with_dividends(
			ticker, start_date = datetime.datetime(1990, 1, 1))
		price_d[ticker] = calculate_splits(tmp)
		blot_d[ticker] = construct_random_trades(price_d[ticker], 
												 num_trades)
	ind = []
	agg_d = {'Ticker':[],  'Buy/Sell':[], 'Price':[]}

	for ticker in tickers:
		for date in blot_d[ticker].index:
			ind.append(date)
			agg_d['Ticker'].append(ticker)
			agg_d['Buy/Sell'].append(
				blot_d[ticker].loc[date, 'Buy/Sell'])
			agg_d['Price'].append(
				blot_d[ticker].loc[date, 'Price'])

	return pandas.DataFrame(agg_d, index = ind)

def panel_from_blotter(blotter_df):
	"""
	The aggregation function to construct a portfolio given a blotter 
	of tickers, trades, and number of shares.  

	:ARGS:

		blotter_df: a :class:`pandas.DataFrame` with columns 
		['Ticker', 'Buy/Sell', 'Price'], where the 'Buy/Sell' column 
		is the quantity of shares, (+) for buy, (-) for sell

	:RETURNS:
	
		:class:`pandas.Panel` with dimensions [tickers, dates, 
		price data]

	.. note:: What to Do with your Panel

		The :class:`pandas.Panel` returned by this function has all of 
		the necessary information to do some fairly exhaustive 
		analysis.  Cumulative investment, portfolio value (simply the 
		``cum_shares``*``close`` for all assets), closes, opens, etc.  
		You've got a world of information about "your portfolio" with
		this object... get diggin!
	"""
	tickers = pandas.unique(blotter_df['Ticker'])
	start_date = blotter_df.sort_index().index[0]
	end_date = datetime.datetime.today()
	val_d = {}
	for ticker in tickers:
		blotter_series = blotter_df[blotter_df['Ticker'] == ticker]
		blotter_series = blotter_series.sort_index()
		val_d[ticker] = blotter_to_cum_shares(blotter_series, ticker,
			start_date, end_date, tol = .1)

	return pandas.Panel(val_d)

def fetch_data_from_store_weight_alloc_method(weight_df, store_path):
	"""
	To speed up calculation time and allow for off-line functionality,
	provide a :class:`pandas.DataFrame` weight_df and point the function
	to an HDFStore

	:ARGS:
	
		weight_df: a :class:`pandas.DataFrame` with dates as index and 
		tickers as columns

		store_path: :class:`string` of the location to an HDFStore

	:RETURNS:
	
		:class:`pandas.Panel` where:

			* :meth:`panel.items` are tickers

			* :meth:`panel.major_axis` dates

			* :meth:`panel.minor_axis` price information, specifically: 
			  ['Open', 'Close', 'Adj Close']

	"""

	store = _open_store(store_path)
	beg_port = weight_df.index.min()

	d = {}
	for ticker in weight_df.columns:
		try:
			d[ticker] = store.get(ticker)
		except KeyError as key:
			logging.exception("store.get({0}) ticker failed".format(ticker))

	panel = pandas.Panel(d)

	#Check to make sure the earliest "full data date" is  b/f first trade
	#first_price = max(map(lambda x: panel.loc[x, :,
	#    'Adj Close'].dropna().index.min(), panel.items))

	#print the number of consecutive nans
	#for ticker in weight_df.columns:
	#    print ticker + " " + str(vwa.consecutive(panel.loc[ticker,
	#        first_price:, 'Adj Close'].isnull().astype(int)).max())

	store.close()
	return panel.ffill()
		
def fetch_data_for_weight_allocation_method(weight_df):
	"""
	To be used with `The Weight Allocation Method 
	<./readme.html#the-weight-allocation-method>`_. Given a weight_df
	with index of allocation dates and columns of percentage
	allocations, fetch the data using Yahoo!'s API and return a panel 
	of dimensions [tickers, dates, price data], where ``price_data`` 
	has columns ``['Open', 'Close', 'Adj Close']``.

	:ARGS:
	
		weight_df: a :class:`pandas.DataFrame` with dates as index and 
		tickers as columns

	:RETURNS:
	
		:class:`pandas.Panel` where:

			* :meth:`panel.items` are tickers

			* :meth:`panel.major_axis` dates

			* :meth:`panel.minor_axis` price information, specifically: 
			  ['Open', 'Close', 'Adj Close']

	.. warning:: Requires Internet Connectivity

		Because the function calls the `Yahoo! API 
		<http://www.finance.yahoo.com>`_ internet connectivity is 
		required for the function to work properly
	"""
	reader = pandas.io.data.DataReader
	beg_port = weight_df.index.min()

	d = {}
	for ticker in weight_df.columns:
		try:
			d[ticker] = reader(ticker, 'yahoo', start = beg_port)
		except Exception:
			logging.exception("reader({0}) ticker failed".format(ticker))

	#pull the data from Yahoo!
	panel = pandas.Panel(d)

	#Check to make sure the earliest "full data date" is  b/f first trade
	#first_price = max(map(lambda x: panel.loc[x, :,
	#    'Adj Close'].dropna().index.min(), panel.items))

	#print the number of consecutive nans
	#for ticker in weight_df.columns:
	#    print ticker + " " + str(vwa.consecutive(panel.loc[ticker,
	#        first_price:, 'Adj Close'].isnull().astype(int)).max())

	return panel.ffill()

def fetch_data_from_store_initial_alloc_method(
			   initial_weights, store_path, start_date = '01/01/2000'):
	"""
	To speed up calculation time and allow for off-line functionality,
	provide a :class:`pandas.Series` initial_weights and point the 
	function to an HDFStore

	:ARGS:
	
		initial_weights: a :class:`pandas.Series` with tickers as index 
		and weights as values

		store_path: :class:`string` of the location to an HDFStore

		start_date: :class:`string` of format 'mm/dd/yyyy' (currently 
		unused by this function)

	:RETURNS:
	
		:class:`pandas.Panel` where:

			* :meth:`panel.items` are tickers

			* :meth:`panel.major_axis` dates

			* :meth:`panel.minor_axis` price information, specifically: 
			  ['Open', 'Close', 'Adj Close']

	"""
	msg = "Not all tickers in HDFStore"
	store = pandas.HDFStore(store_path)
	#assert vwu.check_store_for_tickers(initial_weights.index, store), msg
	#beg_port = datetime.sdat

	d = {}
	for ticker in initial_weights.index:
		try:
			d[ticker] = store.get(ticker)
		except KeyError as key:
			logging.exception("store.get({0}) ticker failed".format(ticker))

	store.close()
	panel = pandas.Panel(d)

	#Check to make sure the earliest "full data date" is  b/f first trade
	#first_price = max(map(lambda x: panel.loc[x, :,
	#    'Adj Close'].dropna().index.min(), panel.items))

	#print the number of consecutive nans
	#for ticker in initial_weights.index:
	#    print ticker + " " + str(vwa.consecutive(panel.loc[ticker,
	#        first_price:, 'Adj Close'].isnull().astype(int)).max())

	return panel.ffill()

def fetch_data_for_initial_allocation_method(initial_weights, 
											 start_date = '01/01/2000'):
	"""
	To be used with `The Initial Allocation Method 
	<./readme.html#the-initial-allocation-rebalancing-method>`_ 
	Given initial_weights :class:`pandas.Series` with index of tickers 
	and values of initial allocation percentages, fetch the data using 
	Yahoo!'s API and return a panel of dimensions [tickers, dates, 
	price data], where ``price_data`` has columns ``['Open',  'Close',
	'Adj Close'].``

	:ARGS:
 
		initial_weights: :class:`pandas.Series` with tickers as index 
		and weights as values

		start_date: :class:`string` of format 'mm/dd/yyyy', the first 
		date for which to fetch prices

	:RETURNS:

		:class:`pandas.Panel` where:

			* :meth:`panel.items` are tickers

			* :meth:`panel.major_axis` dates

			* :meth:`panel.minor_axis` price information, specifically: 
			  ['Open', 'Close', 'Adj Close']
	"""
	reader = pandas.io.data.DataReader
	d_0 = datetime.datetime.strptime(start_date, "%m/%d/%Y")

	d = {}
	for ticker in initial_weights.index:
		try:
			d[ticker] = reader(ticker, 'yahoo', start  = d_0)
		except Exception:
			logging.exception("reader({0}) ticker failed".format(ticker))
	
	panel = pandas.Panel(d)

	#Check to make sure the earliest "full data date" is bf first trade
	#first_price = max(map(lambda x: panel.loc[x, :,
	#    'Adj Close'].dropna().index.min(), panel.items))

	#print the number of consecutive nans
	#for ticker in initial_weights.index:
	#    print ticker + " " + str(vwa.consecutive(panel.loc[ticker,
	#        first_price: , 'Adj Close'].isnull().astype(int)).max())

	return panel.ffill()


def panel_from_weight_file(weight_df, price_panel, start_value):
	"""
	Returns a :class:`pandas.Panel` with the intermediate calculation
	steps of n0, c0_ac, and adj_q to calculate a portfolio's adjusted
	price path when provided a pandas.DataFrame of weight allocations and a 
	starting value of the index

	:ARGS:
	
		weight_df of :class:`pandas.DataFrame` of a weight allocation 
		with tickers for columns, index of dates and weight allocations 
		to each of the tickers
 
		price_panel of :class:`pandas.Panel` with dimensions [tickers, 
		index, price data]

		start_value of :class:`float`, the value at which to start the 
		index

	:RETURNS:
	
		:class:`pandas.Panel` with dimensions (tickers, dates, 
		price data)

	"""
	#cols correspond 'value_calcs!' in "panel from weight file test.xlsx"
	cols = ['ac_c', 'c0_ac0', 'n0', 'Adj_Q']

	#create the intervals spanning the trade dates

	index = price_panel.major_axis
	w_ind = weight_df.index
	time_chunks = tradeplus_tchunks(weight_index = w_ind,
									price_index = index
	)

	p_val = start_value
	l = []
	f_dt = w_ind[0]

	#for beg, fin in zip(int_beg, int_fin):

	for beg, fin in time_chunks:

		close = price_panel.loc[:, beg:fin, 'Close']
		opn = price_panel.loc[:, beg:fin, 'Open']
		adj = price_panel.loc[:, beg:fin, 'Adj Close']

		n = len(close)
		cl_f = price_panel.loc[:, f_dt, 'Close']
		ac_f = price_panel.loc[:, f_dt, 'Adj Close']

		c0_ac0 = cl_f.div(ac_f)
		n0 = p_val*weight_df.xs(f_dt).div(cl_f)

		ac_c = adj.div(close)

		c0_ac0 = pandas.DataFrame(numpy.tile(c0_ac0, [n, 1]),
								  index = close.index,
								  columns = c0_ac0.index
		)

		n0 = pandas.DataFrame(numpy.tile(n0, [n, 1]),
							  index = close.index,
							  columns = n0.index
		)

		adj_q = c0_ac0.mul(ac_c).mul(n0)
		p_val = adj_q.xs(fin).mul(close.xs(fin)).sum()
		vac = adj_q.mul(close)
		vao = adj_q.mul(opn)
	   
		panel = pandas.Panel.from_dict({'ac_c': ac_c, 
										'c0_ac0': c0_ac0,
										'n0': n0, 
										'Adj_Q': adj_q,
										'Value at Close': vac,
										'Value at Open': vao}
		)

		#set items and minor appropriately for pfp constructors
		panel = panel.transpose(2, 1, 0)
		l.append(panel)
		f_dt = fin
	
	agg = pandas.concat(l, axis = 1)
	return pandas.concat([agg, price_panel], 
						 join = 'inner', 
						 axis = 2
	)

def mngmt_fee(price_series, bps_cost, frequency):
	"""
	Deduct a management fee of ``bps_cost`` basis points per year 
	from ``price_series``, charged every ``frequency``

	:ARGS:

		price_series: :class:`DataFrame` of 'Open', 'Close' 
		or :class:`pandas.Series` of 'Close'

		bps_cost: :class:`float` of the annual management fee
		in bps

		frequency: :class:`string` of the frequency to 
		charge the management fee in ['yearly', 'quarterly',
		'monthly', 'weekly', 'daily']

	:RETURNS:

		same type as ``price_series``, net of fees

	.. note:: Only 'daily' is implemented

		For any other frequency the function currently returns 
		``None``
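
	The daily case can be sketched on a flat, made-up price series, 
	where the only change in value comes from the fee. The sample 
	values (and the 252 bps fee, chosen so that one day costs exactly 
	1 bp) are assumptions for illustration.

```python
import numpy
import pandas

# Hypothetical flat price series
idx = pandas.date_range('2013-01-01', periods=3, freq='D')
prices = pandas.Series([100., 100., 100.], index=idx)

# Annual fee in bps spread over 252 trading days
bps_cost = 252.
per_fee = bps_cost / 10000. / 252.

# Daily log returns (first day set to zero) plus the log of the fee,
# accumulated and applied to the starting price
log_ret = numpy.log(prices.div(prices.shift(1))).fillna(0.)
fee = numpy.log(1. - per_fee)
net = prices.iloc[0] * numpy.exp((log_ret + fee).cumsum())
```

	Because the fee is charged on the first day as well, the net 
	series starts at ``100 * (1 - per_fee)`` rather than at 100.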
	"""
	def time_dist(date, interval):
		"""
		Return the proportion of time left to 
		the end of the interval, from the current
		date
		"""

		return None

	ln = lambda x, y: x.div(y).apply(numpy.log)

	fac = {'daily': 252.,
		   'weekly': 52.,
		   'monthly': 12.,
		   'quarterly': 4.,
		   'yearly': 1.
		   }
	
	per_fee = bps_cost/10000./fac[frequency]

	if frequency == 'daily':
		p_ln = ln(x = price_series, 
			      y = price_series.shift(1)
		)

		p_ln[0] = 0.
		
		fee = numpy.log(1. - per_fee)

		# charge the daily fee on the first day
		ret_p = price_series[0]

		cum_ret = (p_ln + fee).cumsum()
		return ret_p*numpy.exp(cum_ret)

	else:
		tcs = zipped_time_chunks(price_series.index,
								 frequency
		)

		p_o, p_e = tcs[0][0], tcs[0][1]
		rem_t = (p_e - p_o).days
		return None


	# determine the first fee
	
	# extract the first fee

	# create the log changes

	# create the fee costs

	# sum them

	# re-create the price series

	# return None

def _tc_helper(weight_df, share_panel, tau, meth):
	"""
	Helper function for the tc_* functions

	Estimate the cumulative rolling transaction costs by ticker using
	the cents per share method of calculation. Can
	be used to directly subtract against tickers / asset classes to 
	determine the asset and asset class impact of transaction costs.

	:ARGS: 

		weight_df: :class:`pandas.DataFrame` weight allocation

		share_panel: :class:`pandas.Panel` with dimensions 
		(tickers, dates, price/share data)

		tau: :class:`float` of the cost per share or basis points

		meth: :class:`string` in ['bps', 'cps']

	:RETURNS:

		:class:`pandas.DataFrame` of the transaction cost incurred
		by each ticker on each date (zero on non-trade dates)
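
	The two costing methods differ only in what the rate is applied 
	to. A minimal sketch for a single rebalance of one ticker, using 
	made-up share counts and a made-up price:

```python
# Hypothetical position before and after a rebalance, and its price
shares_prev, shares, price = 100., 150., 20.

# Both methods start from the absolute number of shares traded
share_diff = abs(shares - shares_prev)

# Cents per share: rate is in cents, result is in currency units
cps = 10.
cps_cost = share_diff * cps / 100.

# Basis points: rate is applied to the traded value (shares * price)
bps = 10.
bps_cost = share_diff * price * bps / 10000.
```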
	"""
	def cps_cost(**kwargs):
		shares = kwargs['shares']
		shares_prev = kwargs['shares_prev']
		tau = kwargs['tau']/100.

		share_diff = abs(shares - shares_prev)
		return share_diff * tau

	def bps_cost(**kwargs):
		shares = kwargs['shares']
		shares_prev = kwargs['shares_prev']
		prices = kwargs['prices']
		tau = kwargs['tau']/10000.

		share_diff = abs(shares - shares_prev)
		return share_diff.mul(prices) * tau
	
	meth_d = {'cps': cps_cost, 
	          'bps': bps_cost
	          }

	adj_q = share_panel.loc[:, :, 'Adj_Q']
	price = share_panel.loc[:, :, 'Close']
	
	tchunks = tradeplus_tchunks(weight_index = weight_df.index,
								price_index = share_panel.major_axis
	)

	#slight finagle to get the tradeplus to be what we need
	sper, fper = zip(*tchunks)
	sper = sper[1:]
	fper = fper[:-1]

	t_o = weight_df.index[0]

	d = {t_o: meth_d[meth](**{'shares': adj_q.loc[t_o, :],
						      'shares_prev': 0.,
						      'prices': price.loc[t_o, :],
						      'tau': tau}
						     )
	}

	for beg, fin in zip(fper, sper):
		d[fin] = meth_d[meth](**{'shares': adj_q.loc[fin, :],
							     'shares_prev': adj_q.loc[beg, :],
							     'tau':tau,
							     'prices': price.loc[fin, :]}
		)

	tcost = pandas.DataFrame(d).transpose()
	cumcost = tcost.reindex(share_panel.major_axis)
	return cumcost.fillna(0.)

def tc_cps(weight_df, share_panel, cps = 10.):
	"""
	Estimate the cumulative rolling transaction costs by ticker using
	the cents per share method of calculation. Can
	be used to directly subtract against tickers / asset classes to 
	determine the asset and asset class impact of transaction costs.

	:ARGS: 

		weight_df: :class:`pandas.DataFrame` weight allocation

		share_panel: :class:`pandas.Panel` with dimensions 
		(tickers, dates, price/share data)

		cps: :class:`float` of the transaction cost in cents per share

	:RETURNS:

		:class:`pandas.DataFrame` of the cumulative transaction
		cost for each ticker
	"""
	return _tc_helper(weight_df = weight_df,
					  share_panel = share_panel,
					  meth = 'cps',
					  tau = cps
	)

def tc_bps(weight_df, share_panel, bps = 10.):
	"""
	Estimate the cumulative rolling transaction costs by ticker as
	basis points of the total value of the transaction. Can
	be used to directly subtract against tickers / asset classes to 
	determine the asset and asset class impact of transaction costs.

	:ARGS: 

		weight_df: :class:`pandas.DataFrame` weight allocation

		share_panel: :class:`pandas.Panel` with dimensions 
		(tickers, dates, price/share data)

		bps: :class:`float` of the transaction cost per trade, in 
		basis points of the traded value

	:RETURNS:

		:class:`pandas.DataFrame` of the cumulative transaction
		cost for each ticker
	"""
	return _tc_helper(weight_df = weight_df,
					  share_panel = share_panel,
					  meth = 'bps',
					  tau = bps
	)

def net_tcs(tc_df, price_index):
	"""
	Incorporate transaction costs calculated using tc_cps or
	tc_bps into the value of an index (i.e. return the index
	value had transaction costs been accounted for using the 
	given method).

	:ARGS:

		tc_df: :class:`pandas.DataFrame` of transaction costs using
		either tc_cps or tc_bps

		price_index: :class:`pandas.Series` on which the 
		transaction costs were calculated

	:RETURNS:

		:class:`pandas.Series` of the adjusted index value
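
	The log-return arithmetic can be sketched on a three-day, made-up 
	index with costs already summed across tickers. The numbers are 
	assumptions chosen to keep the result easy to check by hand.

```python
import numpy
import pandas

# Hypothetical index values and total transaction cost per day
idx = pandas.date_range('2013-01-01', periods=3, freq='D')
price_index = pandas.Series([1000., 1010., 1020.], index=idx)
tc_sum = pandas.Series([1., 0., 2.], index=idx)

# Cost as a log return relative to the index, plus the index's own
# log return; the first day is handled through the starting value
tc_ln = numpy.log(1. - tc_sum.div(price_index))
p_ln = numpy.log(price_index.div(price_index.shift(1)))
ln_sum = tc_ln.add(p_ln)
ln_sum.iloc[0] = 0.

p_o = price_index.iloc[0] - tc_sum.iloc[0]
net = p_o * numpy.exp(ln_sum.cumsum())
```

	Each day's cost compounds forward: the 1 unit paid on the first 
	day lowers every later value, not just the first.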

	"""
	#log returns are so ugly
	ln = lambda x, y: x.div(y).apply(numpy.log)
	
	tc_sum = tc_df.sum(axis = 1)
	tc_ln = numpy.log(1. - tc_sum.div(price_index))
	
	p_ln = ln(x = price_index, 
			  y = price_index.shift(1)
	)

	ln_sum = tc_ln.add(p_ln)
	ln_sum[0] = 0.
	p_o = price_index[0] - tc_sum[0]
	return p_o * numpy.exp(ln_sum.cumsum())
		
def weight_df_from_initial_weights(weight_series, price_panel,
	rebal_frequency, start_value = 1000., start_date = None):
	"""
	Returns a :class:`pandas.DataFrame` of weights that are used 
	to construct the portfolio.  Useful in determining tactical 
	over / under weightings relative to other portfolios

	:ARGS:
	
		weight_series of :class:`pandas.Series` of a weight allocation 
		with an index of tickers, and a name of the initial allocation

		price_panel of type :class:`pandas.Panel` with dimensions 
		[tickers, index, price data]

		start_value: of type :class:`float` of the value to start the 
		index

		rebal_frequency: :class:`string` of 'weekly', 'monthly', 
		'quarterly', 'yearly'

	:RETURNS:
	
		weight_df: of type :class:`pandas.DataFrame` of the rebalance
		weights and dates
	"""
	
	return initial_weight_help_fn(weight_series, price_panel, 
		rebal_frequency, start_value, start_date, ret_val = 'weights')


def panel_from_initial_weights(weight_series, price_panel, 
	rebal_frequency, start_value = 1000, start_date = None):
	"""
	Returns a :class:`pandas.Panel` of the portfolio construction 
	calculations when provided a pandas.Series of initial weight 
	allocations, the date of those initial weight allocations 
	(series.name), a starting value of the index, and a rebalance 
	frequency (this is the classical "static construction" 
	methodology, rebalancing at some specified interval)

	:ARGS:
	
		weight_series of :class:`pandas.Series` of a weight allocation 
		with an index of tickers, and a name of the initial allocation

		price_panel of type :class:`pandas.Panel` with dimensions 
		[tickers, index, price data]

		start_value: of type :class:`float` of the value to start the 
		index

		rebal_frequency: :class:`string` of 'weekly', 'monthly', 
		'quarterly', 'yearly'

	:RETURNS:
	
		:class:`pandas.Panel` with dimensions (tickers, dates, 
		price data), as returned by :meth:`panel_from_weight_file`
	"""
	return initial_weight_help_fn(weight_series, price_panel, 
		rebal_frequency, start_value, start_date, ret_val = 'panel')

def initial_weight_help_fn(weight_series, price_panel,
	rebal_frequency, start_value = 1000., start_date = None, ret_val = 'panel'):
	
	#determine the first valid date and make it the start_date
	first_valid = numpy.max(price_panel.loc[:, :, 'Close'].apply(
			pandas.Series.first_valid_index))
	
	if start_date is None:
		d_0 = first_valid
		index = price_panel.loc[:, d_0:, :].major_axis

	else:
		#make sure the start_date begins after all assets are valid
		if isinstance(start_date, str):
			start_date = datetime.datetime.strptime(start_date, 
													"%m/%d/%Y")
		assert start_date > first_valid, (
			"first_valid index doesn't occur until after start_date")
		index = price_panel.loc[:, start_date:, :].major_axis

	#the weight_series must be a type series, but sometimes can be a 
	#``pandas.DataFrame`` with len(columns) = 1
	msg = "Initial Allocation is not Series"
	if isinstance(weight_series, pandas.DataFrame):
		assert len(weight_series.columns) == 1, msg
		weight_series = weight_series[weight_series.columns[0]]
			
	
	interval_dict = {'weekly':lambda x: x[:-1].week != x[1:].week, 
					 'monthly': lambda x: x[:-1].month != x[1:].month,
					 'quarterly':lambda x: x[:-1].quarter != x[1:].quarter,
					 'yearly':lambda x: x[:-1].year != x[1:].year}

	#create a boolean array of rebalancing dates
	ind = numpy.append(True, interval_dict[rebal_frequency](index))
	weight_df = pandas.DataFrame(numpy.tile(weight_series.values, 
		[len(index[ind]), 1]), index = index[ind], 
		columns = weight_series.index)
	
	if ret_val == 'panel':
		return panel_from_weight_file(weight_df, price_panel, start_value)
	else:
		return weight_df


def pfp_from_weight_file(panel_from_weight_file):
	"""
	pfp stands for "Portfolio from Panel", so this takes the final 
	``pandas.Panel`` that is created in the portfolio construction 
	process when weight file is given and generates a portfolio path 
	of 'Open' and 'Close'

	:ARGS:

		panel_from_weight_file: a :class:`pandas.Panel` that was 
		generated using ``panel_from_weight_file``

	:RETURNS:

		portfolio prices in a :class:`pandas.DataFrame` with columns 
		['Open', 'Close']

	.. note:: The Holy Grail of the Portfolio Path

		The portfolio path is what goes into all of the :mod:`analyze` 
		functions.  So once the `pfp_from_`... has been created, you've 
		got all of the necessary bits to begin calculating performance 
		metrics on a portfolio
	"""
	adj_q = panel_from_weight_file.loc[:, :, 'Adj_Q']
	close = panel_from_weight_file.loc[:, :, 'Close']
	opn = panel_from_weight_file.loc[:, :, 'Open']

	ind_close = adj_q.mul(close).sum(axis = 1)
	ind_open = adj_q.mul(opn).sum(axis = 1)
	port_df = pandas.DataFrame({'Open': ind_open, 
								'Close': ind_close}
	)

	return port_df

def pfp_from_blotter(panel_from_blotter, start_value = 1000.):
	"""
	pfp stands for "Portfolio from Panel", so this takes the final
	:class:`pandas.Panel` that is created in the portfolio construction 
	process when a blotter is given and generates a portfolio path of 
	'Open' and 'Close'

	:ARGS:

		panel_from_blotter: a :class:`pandas.Panel` that was generated 
		using ``panel_from_blotter``

		start_value: :class:`float` of the starting value, default=1000

	:RETURNS:

		portfolio prices in a :class:`pandas.DataFrame` with columns 
		['Open', 'Close']

	.. note:: The Holy Grail of the Portfolio Path

		The portfolio path is what goes into all of the :mod:`analyze` 
		functions.  So once the `pfp_from_`... has been created, 
		you've got all of the necessary bits to begin calculating 
		performance metrics on your portfolio!

	.. note:: Another way to think of Portfolio Path

		This "Portfolio Path" is really nothing more than a series of 
		prices that, should you have made the trades given in the 
		blotter, would have been the experience of someone 
		investing `start_value` in your strategy when your strategy 
		first begins, up until today.
	"""

	panel = panel_from_blotter.copy()
	index = panel.major_axis
	price_df = pandas.DataFrame(numpy.zeros([len(index), 2]), 
		index = index, columns = ['Close', 'Open'])

	price_df.loc[index[0], 'Close'] = start_value
	
	#first determine the log returns for the series
	cl_to_cl_end_val = panel.ix[:, :, 'cum_shares'].mul(
		panel.ix[:, :, 'Close']).add(panel.ix[:, :, 'cum_shares'].mul(
		panel.ix[:, :, 'Dividends'])).sub(
		panel.ix[:, :, 'contr_withdrawal']).sum(axis = 1)

	cl_to_cl_beg_val = panel.ix[:, :, 'cum_shares'].mul(
		panel.ix[:, :, 'Close']).add(panel.ix[:, :, 'cum_shares'].mul(
		panel.ix[:, :, 'Dividends'])).sum(axis = 1).shift(1)

	op_to_cl_end_val = panel.ix[:, :, 'cum_shares'].mul(
		panel.ix[:, :, 'Close']).add(panel.ix[:, :, 'cum_shares'].mul(
		panel.ix[:, :, 'Dividends'])).sum(axis = 1)

	op_to_cl_beg_val = panel.ix[:, :, 'cum_shares'].mul(
		panel.ix[:, :, 'Open']).sum(axis = 1)

	cl_to_cl = cl_to_cl_end_val.div(cl_to_cl_beg_val).apply(numpy.log)
	op_to_cl = op_to_cl_end_val.div(op_to_cl_beg_val).apply(numpy.log)
	price_df.loc[index[1]:, 'Close'] = start_value*numpy.exp(
		cl_to_cl[1:].cumsum())
	price_df['Open'] = price_df['Close'].div(numpy.exp(op_to_cl))
	
	return price_df


if __name__ == '__main__':

	usage = sys.argv[0] + " file_loc"
	description = "description"
	parser = argparse.ArgumentParser(
		description = description, usage = usage)
	parser.add_argument('arg_1', nargs = 1, type = str, help = 'help_1')
	parser.add_argument('arg_2', nargs = 1, type = int, help = 'help_2')
	args = parser.parse_args()


================================================
FILE: visualize_wealth/utils.py
================================================
#!/usr/bin/env python
# encoding: utf-8
"""
.. module:: visualize_wealth.utils

.. moduleauthor:: Benjamin M. Gross <benjaminMgross@gmail.com>

"""
import datetime
import logging
import pandas
import numpy
import os

def append_dfs(prv_df, nxt_df):
    """
    Return a single, sorted :class:`DataFrame` where prv_df
    is "stacked" with nxt_df and any overlapping dates 
    are removed
    """
    ind_a, ind_b = prv_df.index, nxt_df.index
    apnd = nxt_df.loc[~ind_b.isin(ind_a), :]
    return prv_df.append(apnd)

def exchange_acs_for_ticker(weight_df, ticker_class_dict, date, asset_class, ticker, weight):
    """
    It's common to wonder, what would happen if I took all tickers within a 
    given asset class, zeroed them out, and used some other ticker beginning 
    at some date.  

    :ARGS:

        weight_df: :class:`DataFrame` of the weight allocation frame

        ticker_class_dict: :class:`dictionary` of the tickers and the asset 
        classes of each ticker

        date: :class:`string` of the date to zero out the existing tickers
        within an asset class and add ``ticker``

        asset_class: :class:`string` of the 'asset_class' to exchange all 
        tickers for 'ticker'

        ticker: :class:`string` the ticker to add to the weight_df

        weight: :class:`float` of the weight to assign to ``ticker``

    :RETURNS:

        :class:`DataFrame` of the rebalancing weights, with ``ticker`` assigned
        ``weight`` beginning on ``date`` (or the first trade date before it)

    """
    
    d = ticker_class_dict
    #copy so the caller's frame is not modified in place
    weight_df = weight_df.copy()
    ind = weight_df.index

    #if the date is exact, use it, otherwise pick the previous one
    date = pandas.Timestamp(date)
    if date in ind:
        dt = date
    else:
        dt = ind[ind.searchsorted(date) - 1]

    #get the tickers with the given asset class
    l = [key for key, value in d.items() if value == asset_class]

    weight_df.loc[dt: , l] = 0.
    s = weight_df.sum(axis = 1)
    weight_df = weight_df.apply(lambda x: x.div(s))

    return ticker_and_weight_into_weight_df(weight_df, ticker, weight, dt)

def ticker_and_weight_into_weight_df(weight_df, ticker, weight, date):
    """
    A helper function to insert a ticker, and its respective weight into a 
    :class:`DataFrame` ``weight_df`` given a dynamic allocation strategy or
    a :class:`Series` given a static allocation strategy

    :ARGS:

        weight_df: :class:`pandas.DataFrame` to be used as a weight allocation
        to construct a portfolio

        ticker: :class:`string` to insert into the weight_df

        weight: :class:`float` of the weight to assign the ticker

        date: :class:`string`, :class:`datetime` or :class:`Timestamp` to first
        allocate ``weight`` to ``ticker``, going forward.

    :RETURNS:

        :class:`pandas.DataFrame` where the weight_df weights have been 
        proportionally re-distributed on or after ``date``

    """
    ret_df = weight_df.copy()
    ret_df[date:] = ret_df*(1. - weight)
    ret_df[ticker] = 0.
    ret_df.loc[date: , ticker] = weight
    return ret_df

def epoch_to_datetime(pandas_obj):
    """
    Convert string epochs to `pandas.DatetimeIndex`

    :ARGS:

        either a :class:`DataFrame` or :class:`Series` where index can be 
        converted to datetimes

    :RETURNS:

        same as input type, but with index converted into Timestamps
    """
    pandas_obj.index = pandas.to_datetime( pandas_obj.index.astype('int64'), 
                                           unit = 'ms'
    )
    return pandas_obj


def append_store_prices(ticker_list, store_path, start = '01/01/1990'):
    """
    Given an existing store located at ``store_path``, check to make 
    sure the tickers in ``ticker_list`` are not already in the data
    set, and then insert the tickers into the store.

    :ARGS:

        ticker_list: :class:`list` of tickers to add to the
        :class:`pandas.HDFStore`

        store_path: :class:`string` of the path to the
        :class:`pandas.HDFStore`

        start: :class:`string` of the date to begin the price data

    :RETURNS:

        :class:`NoneType` but appends to the store and logs the
        successes and failures
    """
    store = _open_store(store_path)
    store_keys = map(lambda x: x.strip('/'), store.keys())
    not_in_store = numpy.setdiff1d(ticker_list, store_keys )
    new_prices = tickers_to_dict(not_in_store, start = start)

    #attempt to add the new values to the store
    for val in new_prices.keys():
        try:
            store.put(val, new_prices[val])
            logging.info("{0} has been stored".format(val))
        except Exception:
            logging.warning("{0} didn't store".format(val))

    store.close()
    return None

def check_store_for_tickers(ticker_list, store):
    """
    Determine which, if any, of the :class:`list` `ticker_list` are
    inside of the HDFStore.  Returns True if all tickers are located
    in the store, otherwise False (provides a "check" to see if
    other functions can be run)

    :ARGS:

        ticker_list: iterable of tickers to be found in ``store``

        store: an open :class:`HDFStore` instance

    :RETURNS:

        :class:`bool` True if all tickers are found in the store and
        False if not all the tickers are found in the HDFStore

    """
    if isinstance(ticker_list, pandas.Index):
        #pandas.Index is not sortable, so must tolist() it
        ticker_list = ticker_list.tolist()

    store_keys = map(lambda x: x.strip('/'), store.keys())
    not_in_store = numpy.setdiff1d(ticker_list, store_keys)

    #if len(not_in_store) == 0, all tickers are present
    if not len(not_in_store):
        #print "All tickers in store"
        ret_val = True
    else:
        for ticker in not_in_store:
            print "store does not contain " + ticker
        ret_val = False
    return ret_val

def check_store_path_for_tickers(ticker_list, store_path):
    """
    Determine which, if any, of the :class:`list` `ticker_list` are
    inside of the HDFStore.  Returns True if all tickers are located
    in the store, otherwise False (provides a "check" to see if
    other functions can be run)

    :ARGS:

        ticker_list: iterable of tickers to be found in the store located
        at :class:`string` store_path

        store_path: :class:`string` of the location to the HDFStore

    :RETURNS:

        :class:`bool` True if all tickers are found in the store and
        False if not all the tickers are found in the HDFStore
    """
    store = _open_store(store_path)

    if isinstance(ticker_list, pandas.Index):
        #pandas.Index is not sortable, so must tolist() it
        ticker_list = ticker_list.tolist()

    store_keys = map(lambda x: x.strip('/'), store.keys())
    not_in_store = numpy.setdiff1d(ticker_list, store_keys)
    store.close()

    #if len(not_in_store) == 0, all tickers are present
    if not len(not_in_store):
        print "All tickers in store"
        ret_val = True
    else:
        for ticker in not_in_store:
            print "store does not contain " + ticker
        ret_val = False
    return ret_val

def check_trade_price_start(weight_df, price_df):
    """
    Check to ensure that initial weights / trade dates are after
    the first available price for the same ticker

    :ARGS:

        weight_df: :class:`pandas.DataFrame` of the weights to 
        rebalance the portfolio

        price_df: :class:`pandas.DataFrame` of the prices for each
        of the tickers

    :RETURNS:

        :class:`pandas.Series` of boolean values for each ticker
        where True indicates the first allocation takes place 
        after the first price (as desired) and False the converse
    """
    #make sure all of the weight_df tickers are in price_df
    intrsct = set(weight_df.columns).intersection(set(price_df.columns))

    if set(weight_df.columns) != intrsct:

        raise KeyError("Not all tickers in weight_df are in price_df")
            

    ret_d = {}
    for ticker in weight_df.columns:
        #idxmax on a boolean Series returns the label of the first True
        first_alloc = (weight_df[ticker] > 0).idxmax()
        first_price = price_df[ticker].first_valid_index()
        ret_d[ticker] = first_alloc >= first_price

    return pandas.Series(ret_d)

def create_data_store(ticker_list, store_path):
    """
    Creates the ETF store to run the training of the logistic 
    classification tree

    :ARGS:
    
        ticker_list: iterable of tickers

        store_path: :class:`str` of path to ``HDFStore``
    """
    #check to make sure the store doesn't already exist
    if os.path.isfile(store_path):
        print "File " + store_path + " already exists"
        return
    
    store = pandas.HDFStore(store_path, 'w')
    success = 0
    for ticker in ticker_list:
        try:
            tmp = tickers_to_dict(ticker, 'yahoo', start = '01/01/2000')
            store.put(ticker, tmp)
            print ticker + " added to store"
            success += 1
        except Exception:
            print "unable to add " + ticker + " to store"
    store.close()

    if success == 0: #none of it worked, delete the store
        print "Creation Failed"
        os.remove(store_path)
    print 
    return None

def first_price_date_get_prices(ticker_list):
    """
    Given a list of tickers, pull down prices and return the first valid price 
    date for each ticker in the list

    :ARGS:

        ticker_list: :class:`string` or :class:`list` of tickers

    :RETURNS:

        :class:`string` of 'dd-mm-yyyy' or :class:`list` of said strings
    """

    #pull down the data into a DataFrame
    df = tickers_to_frame(ticker_list)
    return first_price_date_from_prices(df)

def first_price_date_from_prices(frame):
    """
    Given a :class:`pandas.DataFrame` of prices, return the first date that a 
    price exists for each of the tickers

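    :ARGS:

        frame: :class:`pandas.Series` or :class:`pandas.DataFrame` of prices

    :RETURNS:

        :class:`pandas.Timestamp` of the first valid price date for a
        ``Series``, or a :class:`pandas.Series` of such dates (one per
        ticker) for a ``DataFrame``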
    """

    fvi = pandas.Series.first_valid_index
    if isinstance(frame, pandas.Series):
        #a Series has no ``fvi`` attribute, so call the method directly
        return frame.first_valid_index()
    else:
        return frame.apply(fvi, axis = 0)

def first_valid_date(prices):
    """
    Helper function to determine the first valid date from a set of 
    different prices.  Can take either a :class:`dict` of 
    :class:`pandas.DataFrame`s where each key is a ticker's 'Open', 
    'High', 'Low', 'Close', 'Adj Close' or a single 
    :class:`pandas.DataFrame` where each column is a different ticker

    :ARGS:

        prices: either :class:`dictionary` or :class:`pandas.DataFrame`

    :RETURNS:

        :class:`pandas.Timestamp` 
   """
    iter_dict = { pandas.DataFrame: lambda x: x.columns,
                  dict: lambda x: x.keys() } 
    try:
        each_first = map(lambda x: prices[x].first_valid_index(),
                         iter_dict[ type(prices) ](prices) )
        return max(each_first)
    except KeyError:
        print "prices must be a DataFrame or dictionary"
        return

def gen_gbm_price_series(num_years, N, price_0, vol, drift):
    """
    Return a price series generated using GBM
    
    :ARGS:

        num_years: number of years (if 20 trading days, then 20/252)

        N: number of total periods
    
        price_0: starting price for the security

        vol: the volatility of the security
    
        drift: the expected return of the security
    
    :RETURNS:

        :class:`pandas.Series` of length N of the simulated price series
    
    """
    dt = num_years/float(N)
    e1 = (drift - 0.5*vol**2)*dt
    e2 = (vol*numpy.sqrt(dt))
    cum_shocks = numpy.cumsum(numpy.random.randn(N,))
    cum_drift = numpy.arange(1, N + 1)
    
    return pandas.Series(numpy.append(
        price_0, price_0*numpy.exp(cum_drift*e1 + cum_shocks*e2)[:-1]))
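
# A minimal, hedged sketch (not part of the original module) of the same
# discretization used by ``gen_gbm_price_series`` above, assuming the standard
# GBM log-return form:
#     log(P_t / P_{t-1}) = (drift - vol**2 / 2)*dt + vol*sqrt(dt)*Z_t
# with Z_t ~ N(0, 1).  The name ``_gbm_path_sketch`` and its ``seed`` argument
# are hypothetical and exist only so that the simulated path is reproducible.
def _gbm_path_sketch(num_years, N, price_0, vol, drift, seed = 0):
    import numpy
    import pandas

    rng = numpy.random.RandomState(seed)
    dt = num_years/float(N)
    #per-period log returns: deterministic drift plus a scaled normal shock
    log_rets = (drift - 0.5*vol**2)*dt + vol*numpy.sqrt(dt)*rng.randn(N - 1)
    #prepend a zero return so the path starts exactly at price_0
    log_path = numpy.append(0., numpy.cumsum(log_rets))
    return pandas.Series(price_0*numpy.exp(log_path))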

def index_intersect(arr_a, arr_b):
    """
    Return the intersection of two :class:`pandas` objects, either a
    :class:`pandas.Series` or a :class:`pandas.DataFrame`

    :ARGS:

        arr_a: :class:`pandas.DataFrame` or :class:`pandas.Series`
        arr_b: :class:`pandas.DataFrame` or :class:`pandas.Series`

    :RETURNS:

        :class:`pandas.DatetimeIndex` of the intersection of the two 
        :class:`pandas` objects
    """
    arr_a = arr_a.sort_index()
    arr_a = arr_a.dropna()
    arr_b = arr_b.sort_index()
    arr_b = arr_b.dropna()
    if not arr_a.index.equals(arr_b.index):
        return arr_a.index & arr_b.index
    else:
        return arr_a.index

def index_multi_union(frame_list):
    """
    Returns the index union of multiple 
    :class:`pandas.DataFrame`'s or :class:`pandas.Series`

    :ARGS:

        frame_list: :class:`list` containing either ``DataFrame``'s or
        ``Series``
    
    :RETURNS:

        :class:`pandas.DatetimeIndex` of the objects' union
    """
    #check to make sure all objects are Series or DataFrames


    return reduce(lambda x, y: x | y, 
                  map(lambda x: x.dropna().index, 
                      frame_list)
    )

def index_multi_intersect(frame_list):
    """
    Returns the index intersection of multiple 
    :class:`pandas.DataFrame`'s or :class:`pandas.Series`

    :ARGS:

        frame_list: :class:`list` containing either ``DataFrame``'s or
        ``Series``
    
    :RETURNS:

        :class:`pandas.DatetimeIndex` of the objects' intersection
    """

    return reduce(lambda x, y: x & y, 
                  map(lambda x: x.dropna().index, 
                      frame_list) 
    )
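
# A hedged sketch (not part of the original module) of the same fold written
# with the explicit ``Index.intersection`` method and ``functools.reduce``,
# assuming an environment where ``reduce`` is not a builtin and ``&`` on
# indexes is not a set operation.  The name ``_index_intersect_sketch`` is
# hypothetical.
def _index_intersect_sketch(frame_list):
    from functools import reduce

    #drop nulls first so only dates with data in every object survive
    indexes = [frame.dropna().index for frame in frame_list]
    return reduce(lambda x, y: x.intersection(y), indexes)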

def join_on_index(df_list, index):
    """
    pandas doesn't currently have the ability to :meth:`concat` several
    :class:`DataFrame`'s on a provided :class:`DatetimeIndex`.  
    This is a quick function to provide that functionality

    :ARGS:

        df_list: :class:`list` of :class:`DataFrame`'s

        index: :class:`Index` on which to join all of the DataFrames
    """
    return pandas.concat( 
                          map( lambda x: x.reindex(index), df_list), 
                          axis = 1
    )

def normalized_price(price_df):
    """
    Return the normalized price of a :class:`pandas.Series` or 
    :class:`pandas.DataFrame`

    :ARGS:

        price_df: :class:`pandas.Series` or :class:`pandas.DataFrame`

    :RETURNS:
        
        same as the input
    """
    null_d = {pandas.DataFrame: lambda x: pandas.isnull(x).any().any(),
              pandas.Series: lambda x: pandas.isnull(x).any()
              }

    calc_d = {pandas.DataFrame: lambda x: x.div(x.iloc[0, :]),
              pandas.Series: lambda x: x.div(x[0])
              }

    typ = type(price_df)
    if null_d[typ](price_df):
        raise ValueError("cannot contain null values")

    return calc_d[typ](price_df)

def rets_to_price(rets, ret_typ = 'log', start_value = 100.):
    """
    Take a :class:`Series` or :class:`DataFrame` of returns ``rets``, 
    of type ``ret_typ``, and convert them into prices

    :ARGS:

        rets: :class:`Series` or :class:`DataFrame` of returns

        ret_typ: :class:`string` of the return type, 
        either ['log', 'linear']

        start_value: :class:`float` of the initial price, default=100

    :RETURNS:

        same as provided type
    """
    def _rets_to_price(rets, ret_typ, start_value):

        typ_d = {'log': lambda x: start_value * numpy.exp(x.cumsum()),
                 'linear': lambda x: start_value * (1. + x).cumprod()
                 }

        fv = rets.first_valid_index()
        fd = rets.index[0]

        if fv == fd:    # no nulls at the beginning
            p = typ_d[ret_typ](rets)
            p = normalized_price(p) * start_value

        else:
            cp = rets.copy()   # copy to prepend with 0.
            loc = cp.index.get_loc(fv)
            fd = cp.index[loc - 1]
            cp[fd] = 0.
            p = typ_d[ret_typ](cp[fd:])
        return p

    if isinstance(rets, pandas.Series):
        return _rets_to_price(rets = rets, 
                              ret_typ = ret_typ,
                              start_value = start_value
        )
    elif isinstance(rets, pandas.DataFrame):
        return rets.apply(
            lambda x: _rets_to_price(rets = x,
                                     ret_typ = ret_typ,
                                     start_value = start_value 
            ), axis = 0
        )

    else:
        raise TypeError("rets must be Series or DataFrame")


def perturbate_asset(frame, key, eps):
    """
    Perturb an asset within a weight allocation frame in the amount eps

    :ARGS:

        frame: :class:`pandas.DataFrame` of a weight_allocation frame

        key: :class:`string` of the asset to perturb

        eps: :class:`float` of the amount to perturb in relative terms

    :RETURNS:

        :class:`pandas.DataFrame` of the perturbed weight_df
    """
    from .analyze import linear_returns

    pert_series = pandas.Series(numpy.zeros_like(frame[key]), 
                          index = frame.index
    )
    
    lin_ret = linear_returns(frame[key])
    lin_ret = lin_ret.mul(1. + eps)
    pert_series[0] = p_o = frame[key][0]
    pert_series[1:] = p_o * (1. + lin_ret[1:])
    ret_frame = frame.copy()
    ret_frame[key] = pert_series
    return ret_frame


def setup_trained_hdfstore(trained_data, store_path):
    """
    The ``HDFStore`` doesn't work properly when it's compiled by different
    versions, so the appropriate thing to do is to setup the trained data
    locally (and not store the ``.h5`` file on GitHub).

    :ARGS:

        trained_data: :class:`pandas.Series` with tickers in the index and
        asset  classes for values 

        store_path: :class:`str` of where to create the ``HDFStore``
    """
    
    create_data_store(trained_data.index, store_path)
    return None

def tickers_to_dict(ticker_list, api = 'yahoo', start = '01/01/1990'):
    """
    Utility function to return ticker data where the input is either a 
    ticker, or a list of tickers.

    :ARGS:

        ticker_list: :class:`list` in the case of multiple tickers or 
        :class:`str` in the case of one ticker

        api: :class:`string` identifying which api to call the data 
        from.  Either 'yahoo' or 'google'

        start: :class:`string` of the desired start date
                
    :RETURNS:

        :class:`dictionary` of (ticker, price_df) mappings or a
        :class:`pandas.DataFrame` when the ``ticker_list`` is 
        :class:`str`
    """
    if isinstance(ticker_list, (str, unicode)):
        return __get_data(ticker_list, api = api, start = start)
    else:
        d = {}
        for ticker in ticker_list:
            d[ticker] = __get_data(ticker, api = api, start = start)
    return d

def tickers_to_frame(ticker_list, api = 'yahoo', start = '01/01/1990', 
                     join_col = 'Adj Close'):
    """
    Utility function to return ticker data where the input is either a 
    ticker, or a list of tickers.

    :ARGS:

        ticker_list: :class:`list` in the case of multiple tickers or 
        :class:`str` in the case of one ticker

        api: :class:`string` identifying which api to call the data 
        from.  Either 'yahoo' or 'google'

        start: :class:`string` of the desired start date

        join_col: :class:`string` to aggregate the 
        :class:`pandas.DataFrame`
                
    :RETURNS:

        :class:`pandas.DataFrame` of ``join_col`` prices with one column
        per ticker, or a :class:`pandas.Series` when the ``ticker_list`` 
        is :class:`str`
    """
    
    if isinstance(ticker_list, (str, unicode)):
        return __get_data(ticker_list, api = api, start = start)[join_col]
    else:
        d = {}
        for ticker in ticker_list:

            tmp = __get_data(ticker, 
                             api = api,
                             start = start
            )

            d[ticker] = tmp[join_col]

    return pandas.DataFrame(d)

def ticks_to_frame_from_store(ticker_list, store_path,  join_col = 'Adj Close'):
    """
    Utility function to return ticker data where the input is either a 
    ticker, or a list of tickers.

    :ARGS:

        ticker_list: :class:`list` in the case of multiple tickers or 
        :class:`str` in the case of one ticker

        store_path: :class:`str` of the path to the store

        join_col: :class:`string` to aggregate the :class:`pandas.DataFrame`
                
    :RETURNS:

        :class:`pandas.DataFrame` of ``join_col`` prices with one column
        per ticker, or a :class:`pandas.Series` when the ``ticker_list`` 
        is :class:`str`
    """
    store = _open_store(store_path)

    if isinstance(ticker_list, (str, unicode)):
        ret_series = store[ticker_list][join_col]
        store.close()
        return ret_series
    else:
        d = {}
        for ticker in ticker_list:
            d[ticker] = store[ticker][join_col]
        store.close()
        price_df = pandas.DataFrame(d)
        d_o = first_valid_date(price_df)
        price_df = price_df.loc[d_o:, :]

    return price_df

def create_store_master_index(store_path):
    """
    Add a master index, key = 'IND3X', to HDFStore located at store_path

    :ARGS:

        store_path: :class:`string` the location of the ``HDFStore`` file

    :RETURNS:

        :class:`NoneType` but updates the ``HDF5`` file

    """
    store = _open_store(store_path)

    keys = store.keys()

    if '/IND3X' in keys:
        print "u'IND3X' already exists in HDFStore at {0}".format(store_path)

        store.close()
        return
    else:
        union = union_store_indexes(store)
        store.put('IND3X', pandas.Series(union, index = union))
        store.close()

def union_store_indexes(store):
    """
    Return the union of all Indexes within a store located inside store

    :ARGS:

        store: :class:`HDFStore`

    :RETURNS:

        :class:`pandas.DatetimeIndex` of the union of all indexes within
        the store

    """
    key_iter = iter(store.keys())
    ind = store.get(next(key_iter)).index
    union = ind.copy()

    for key in key_iter:
        union = union | store.get(key).index
    return union

def create_store_cash(store_path):
    """
    Create a cash price, key = u'CA5H' in an HDFStore located at store_path

    :ARGS:

        store_path: :class:`string` the location of the ``HDFStore`` file

    :RETURNS:

        :class:`NoneType` but updates the ``HDF5`` file, and prints to 
        screen which values would not update

    """
    store = _open_store(store_path)
    keys = store.keys()
    if '/CA5H' in keys:
        logging.info("CA5H prices already exist")
        store.close()
        return

    if '/IND3X' not in keys:
        m_index = union_store_indexes(store)
    else:
        m_index = store.get('IND3X')

    cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
    n_dates, n_cols = len(m_index), len(cols)

    df = pandas.DataFrame(numpy.ones([n_dates, n_cols]), 
                          index = m_index,
                          columns = cols
    )
    store.put('CA5H', df)
    store.close()
    return

def update_store_master_index(store_path):
    """
    Intelligently update the store 'IND3X', this can only be done
    after the prices at the store path have been updated
    """
    store = _open_store(store_path)

    try:
        stored_data = store.get('IND3X')
    except KeyError:
        logging.exception("store doesn't contain IND3X")
        store.close()
        raise

    last_stored_date = stored_data.dropna().index.max()
    today = datetime.datetime.date(datetime.datetime.today())
    if last_stored_date < pandas.Timestamp(today):

        union_ind = union_store_indexes(store)
        tmp = pandas.Series(union_ind, index = union_ind)

        #need to drop duplicates because there's 1 row of overlap
        tmp = stored_data.append(tmp)
        tmp.drop_duplicates(inplace = True)
        store.put('IND3X', tmp)

    store.close()
    return None

def update_store_cash(store_path):
    """
    Intelligently update the values of CA5H based on existing keys in the 
    store, and existing columns of the CA5H values

    :ARGS:

        store_path: :class:`string` the location of the ``HDFStore`` file

    :RETURNS:

        :class:`NoneType` but updates the ``HDF5`` file, and prints to 
        screen which values would not update

    """
    store = _open_store(store_path)
    td = datetime.datetime.today()

    try:
        master_ind = store.get('IND3X')
        cash = store.get('CA5H')
    except KeyError:
        print "store doesn't contain {0} and / or {1}".format(
            'CA5H', 'IND3X')
        store.close()
        raise

    last_cash_dt = cash.dropna().index.max()
    today = datetime.datetime.date(td)
    if last_cash_dt < pandas.Timestamp(today):
        try:
            n = len(master_ind)
            cols = ['Open', 'High', 'Low', 
                    'Close', 'Volume', 'Adj Close']

            cash = pandas.DataFrame(
                        numpy.ones([n, len(cols)]),
                        index = master_ind,
                        columns = cols
            )
            store.put('CA5H', cash)
        except Exception:
            print "Error updating cash"

    store.close()
    return None

def strip_vals(keys):
    """
    Return a stripped value for each key in keys

    :ARGS:

        keys: :class:`list` of string values (usually tickers)

    :RETURNS:

        :class:`list` of the keys with surrounding whitespace stripped
    """
    return [x.strip() for x in keys]

def update_store_prices(store_path, store_keys = None):
    """
    Update to the most recent prices for all keys of an existing store, 
    located at ``store_path``.

    :ARGS:

        store_path: :class:`string` the location of the ``HDFStore`` file

        store_keys: :class:`list` of keys to update

    :RETURNS:

        :class:`NoneType` but updates the ``HDF5`` file, and prints to 
        screen which values would not update

    .. note::

        If special keys exist (like CA5H or IND3X), then keys can be 
        passed to update to ensure that the store does not try to update
        those keys

    """
    def _cleaned_keys(keys):
        """
        Remove the CA5H and IND3X keys from the list 
        if they are present
        """
        blk_lst = ['IND3X', 'CA5H', '/IND3X', '/CA5H']

        for key in blk_lst:
            try:
                keys.remove(key)
                print "{0} removed".format(key)
            except ValueError:
                print "{0} not in keys".format(key)

        return keys

    reader = pandas.io.data.DataReader
    strftime = datetime.datetime.strftime
    today_str = strftime(datetime.datetime.today(), format = '%m/%d/%Y')
    
    store = _open_store(store_path)

    if not store_keys:
        store_keys = store.keys()

    store_keys = _cleaned_keys(store_keys)
    for key in store_keys:
        stored_data = store.get(key)
        last_stored_date = stored_data.dropna().index.max()
        today = datetime.datetime.date(datetime.datetime.today())
        if last_stored_date < pandas.Timestamp(today):
            try:
                tmp = reader(key.strip('/'), 'yahoo', start = strftime(
                    last_stored_date, format = '%m/%d/%Y'))

                #need to drop duplicates because there's 1 row of overlap
                tmp = stored_data.append(tmp)
                tmp["index"] = tmp.index
                tmp.drop_duplicates(cols = "index", inplace = True)
                tmp = tmp[tmp.columns[tmp.columns != "index"]]
                store.put(key, tmp)
            except Exception:
                print "could not update {0}".format(key)
                logging.exception("could not update {0}".format(key))

    store.close()
    return None


def zipped_time_chunks(index, interval, incl_T = False):
    """
    Given an ``interval``, return a zipped list of (first day, last day) 
    tuples, one per period, containing only full periods

    .. note:: 

        The function assumes indexes are of 'daily_frequency'
    
    :ARGS:
    
        index: :class:`pandas.DatetimeIndex`

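        interval: :class:`string` either 'weekly', 'monthly', 
        'quarterly', or 'yearly'

        incl_T: :class:`bool`, when True the final (possibly partial)
        period ending on the last date of ``index`` is included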
    """

    time_d = {'weekly': lambda x: x.week,
              'monthly': lambda x: x.month, 
              'quarterly':lambda x:x.quarter,
              'yearly':lambda x: x.year}

    prv = time_d[interval](index[:-1])
    nxt = time_d[interval](index[1:])
    ind = prv != nxt   # True where the next date starts a new period

    if ind[0]:   # index started on the last day of period
        index = index.copy()[1:]   # remove first elem
        prv = time_d[interval](index[:-1])
        nxt = time_d[interval](index[1:])
        ind = prv != nxt

    if incl_T:
        if not ind[-1]:   # doesn't already end on True
            ind = numpy.append(ind, True)

    # the masks may be one element shorter than index, so trim index to
    # the mask length before boolean indexing
    ldop = index[:len(ind)][ind]   # last day of period
    f_ind = numpy.append(True, ind[:-1])
    fdop = index[:len(f_ind)][f_ind]   # first day of period
    return zip(fdop, ldop)

def tradeplus_tchunks(weight_index, price_index):
    """
    Return zipped time intervals of trade signal + 1 and the next trade
    signal

    :ARGS:

        weight_index: :class:`pandas.DatetimeIndex` of the weight allocation
        frame of generated signals

        price_index: :class:`pandas.DatetimeIndex` for all the price data

    :RETURNS:

        zipped :class:`list` of (int_beg, int_fin) :class:`tuple`, where
        int_beg is the first weight signal date for the first interval and
        the t + 1 date after the weight signal for every later interval,
        and int_fin is the next weight signal (or the last date in the
        price_index)

    .. note:: 

        consecutive, non-overlapping intervals are commonly used for 
        things such as optimizing share calculation algorithms, transaction
        cost calculation, etc.
    """
    locs = list(price_index.get_loc(key) + 1 for key in weight_index)
    do = pandas.DatetimeIndex([weight_index[0]])
    int_beg = price_index[locs[1:]]
    int_beg = do.append(int_beg)

    int_fin = weight_index[1:]
    dT = pandas.DatetimeIndex([price_index[-1]])
    int_fin = int_fin.append(dT)
    return zip(int_beg, int_fin)

def _open_store(store_path):
    """
    Open the HDFStore located at store_path in 'r+' mode; an IOError is
    logged and re-raised if the store cannot be opened

    :ARGS:

        store_path: :class:`string` where the store is located

    :RETURNS:

        :class:`HDFStore` instance
    """
    try:
        store = pandas.HDFStore(path = store_path, mode = 'r+')
        return store
    except IOError:
        logging.exception(
            "{0} is not a valid path to an HDFStore Object".format(store_path)
        )
        raise
    

def __get_data(ticker, api, start):
    """
    Helper function to get Yahoo! Data with exceptions built in and 
    a message that reports failure for the given tickers

    :ARGS:
        
        ticker: either a :class:`string` of a ticker or a :class:`list`
        of tickers

        api: :class:`string` the api from which to get the data, 
        'yahoo' or 'google'

        start: :class:`string` the start date of the data series

    :RETURNS:

        the data returned by ``pandas.io.data.DataReader``, or None if
        the request failed
    """
    reader = pandas.io.data.DataReader
    try:
        data = reader(ticker, api, start = start)
        return data
    except Exception:
        # ticker may be a list, so format it rather than concatenate
        print "failed for {0}".format(ticker)
        return None