Repository: benjaminmgross/visualize-wealth Branch: master Commit: 76f3f0fd815a Files: 22 Total size: 187.7 KB Directory structure: gitextract_xj6nrv43/ ├── .gitignore ├── .travis.yml ├── README.md ├── requirements.txt ├── run_tests ├── setup.py ├── test_data/ │ ├── estimating when splits have occurred.xlsx │ ├── panel from weight file test.xlsx │ ├── test_analyze.xlsx │ ├── test_ret_calcs.xlsx │ ├── test_splits.xlsx │ ├── transaction-costs.xlsx │ └── ~$panel from weight file test.xlsx ├── test_module/ │ ├── __init__.py │ ├── test_analyze.py │ ├── test_construct_portfolio.py │ └── test_utils.py └── visualize_wealth/ ├── __init__.py ├── analyze.py ├── classify.py ├── construct_portfolio.py └── utils.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ .DS_Store *~ build/ *.pyc *.dropbox *.egg-info/ dist/ docs/ .coverage ================================================ FILE: .travis.yml ================================================ language: python python: - 2.7 # command to install dependencies before_install: - wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh - chmod +x miniconda.sh - ./miniconda.sh -b - export PATH=/home/travis/miniconda/bin:$PATH - conda update --yes conda install: - conda install --yes python=$TRAVIS_PYTHON_VERSION atlas numpy scipy pytest - conda install --yes python=$TRAVIS_PYTHON_VERSION matplotlib nose dateutil - conda install --yes python=$TRAVIS_PYTHON_VERSION pandas statsmodels pytables xlrd # - python setup.py install # - pip install -r preamble.txt # - pip install -r requirements.txt # - pip install -r denouement.txt # command to run tests script: - py.test ./test_module/test_analyze.py -v - py.test ./test_module/test_utils.py -v - py.test ./test_module/test_construct_portfolio.py -v # the body of this script was found 
by @dan-blanchard at https://gist.github.com/dan-blanchard/7045057 ================================================ FILE: README.md ================================================ #`visualize_wealth` README.md [![Build Status](https://travis-ci.org/benjaminmgross/visualize-wealth.svg?branch=master)](https://travis-ci.org/benjaminmgross/visualize-wealth) A library built in Python to construct, backtest, analyze, and evaluate portfolios and their benchmarks, with comprehensive documentation and manual calculations to illustrate all underlying methodologies and statistics. ##License This program is free software and is distributed under the [GNU General Public License version 3](http://www.gnu.org/licenses/quick-guide-gplv3.html) ("GNU GPL v3") © Benjamin M. Gross 2013 **NOTE:** Because so much of the underlying technology I'm continuing to build has become the building blocks for [my financial technology startup](http://www.visualizewealth.com), I've forked this repo (as of 5.2015) and made new changes private. I might continue to push some of the bigger changes to this repo to keep it open source, but we'll see. ##Dependencies - `numpy` & `scipy`: The building blocks of everything quant - `pandas`: extensively used (`numpy` and `scipy` obviously, but `pandas` depends on those) - `tables`: for HDFStore price extraction - `urllib2`: for Yahoo! API calls to append price `DataFrame`s with Dividends For a full list of dependencies, see the `requirements.txt` file in the root folder. ##Installation To install the `visualize_wealth` modules onto your computer, go into your folder of choice (say `Downloads`), and: 1. Clone the repository $ cd ~/Downloads $ git clone https://github.com/benjaminmgross/wealth-viz 2. `cd` into the `wealth-viz` directory $ cd wealth-viz 3. Install the package $ python setup.py install 4. Check your install.
From anywhere on your machine, you should be able to open `ipython` and import the library, for example: $ cd ~/ $ ipython IPython 1.1.0 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object', use 'object??' for extra details. In [1]: import visualize_wealth **"Ligget Se!"** ##Documentation The `README.md` file has fairly good examples, but I've gone to great lengths to autogenerate documentation for the code using [Sphinx](http://sphinx-doc.org/). Therefore, aside from the docstrings, when you `git clone` the repository, use these instructions to generate the auto-documentation: 1. `cd /path-to-wealth-viz/` 2. `sphinx-build -b html ./docs/source/ ./docs/build/` Now that the autogenerated documentation is complete, you can `cd` into: $ cd visualize_wealth/docs/build/ and find full `.html` browseable code documentation (that's pretty beautiful... if I do say so my damn self) with live links, function explanations (that also have live links to their respective definition on the web), etc. Also I've created an Excel spreadsheet that illustrates almost all of the `analyze.py` portfolio statistic calculations. That spreadsheet can be found in: test_data > test_analyze.xlsx In fact, the unit testing for the `analyze.py` portfolio statistics tests the Python calculations against this same Excel spreadsheet, so you can really get into the guts of how these things are calculated. ##[Portfolio Construction Examples](portfolio-construction-examples) Portfolios can (generally) be constructed in one of three ways: 1. The Blotter Method 2. Weight Allocation Method 3. Initial Allocation with specific Rebalancing Period Method ### 1. [The Blotter Method](blotter-method-examples) **The blotter method:** In finance, a spreadsheet of "buys/sells", "Prices", "Dates" etc. is called a "trade blotter."
This also would be the easiest way for an investor to actually analyze the past performance of her portfolio, because trade confirmations provide this exact data. This method is most effectively achieved by providing an Excel / `.csv` file with the following format: | Date |Buy / Sell| Price |Ticker| |:-------|:---------|:------|:-----| |9/4/2001| 50 | 123.45| EFA | |5/5/2003| 65 | 107.71| EEM | |6/6/2003|-15 | 118.85| EEM | where "Buys" can be distinguished from "Sells" because buys are positive (+) and sells are negative (-). For example, let's say I wanted to generate a random portfolio containing the following tickers and respective asset classes, using the `generate_random_portfolio_blotter` method |Ticker | Description | Asset Class | Price Start| |:-------|:-------------------------|:-------------------|:-----------| | IWB | iShares Russell 1000 | US Equity | 5/19/2000 | | IWR | iShares Russell Midcap | US Equity | 8/27/2001 | | IWM | iShares Russell 2000 | US Equity | 5/26/2000 | | EFA | iShares EAFE | Foreign Dev Equity | 8/27/2001 | | EEM | iShares EAFE EM | Foreign EM Equity | 4/15/2003 | | TIP | iShares TIPS | Fixed Income | 12/5/2003 | | TLT | iShares LT Treasuries | Fixed Income | 7/31/2002 | | IEF | iShares MT Treasuries | Fixed Income | 7/31/2002 | | SHY | iShares ST Treasuries | Fixed Income | 7/31/2002 | | LQD | iShares Inv Grade | Fixed Income | 7/31/2002 | | IYR | iShares Real Estate | Alternative | 6/19/2000 | | GLD | iShares Gold Index | Alternative | 11/18/2004 | | GSG | iShares Commodities | Alternative | 7/21/2006 | I could construct a portfolio of random trades (i.e. 
the "blotter method"), say 20 trades for each asset, by executing the following: #import the modules In [5]: import visualize_wealth.construct_portfolio as vwcp In [6]: ticks = ['IWB','IWR','IWM','EFA','EEM','TIP','TLT','IEF', 'SHY','LQD','IYR','GLD','GSG'] In [7]: num_trades = 20 #construct the random trade blotter In [8]: blotter = vwcp.generate_random_portfolio_blotter(ticks, num_trades) #construct the portfolio panel In [9]: port_panel = vwcp.panel_from_blotter(blotter) Now I have a `pandas.Panel`. Before we construct the cumulative portfolio values, let's examine the dimensions of the panel (which are generally the same for all construction methods, although the columns of the `minor_axis` differ because the methods call for different optimized calculations): #tickers are `panel.items` In [10]: port_panel.items Out[10]: Index([u'EEM', u'EFA', u'GLD', u'GSG', u'IEF', u'IWB', u'IWM', u'IWR', u'IYR', u'LQD', u'SHY', u'TIP', u'TLT'], dtype=object) #dates are along the `panel.major_axis` In [12]: port_panel.major_axis Out[12]: [2000-07-06 00:00:00, ..., 2013-10-30 00:00:00] Length: 3351, Freq: None, Timezone: None #price data, cumulative investment, dividends, and split ratios are `panel.minor_axis` In [13]: port_panel.minor_axis Out[13]: Index([u'Open', u'High', u'Low', u'Close', u'Volume', u'Adj Close', u'Dividends',u'Splits', u'contr_withdrawal', u'cum_investment', u'cum_shares'], dtype=object) There is a lot of information to be gleaned from this data object, but the most common goal would be to convert this `pandas.Panel` to a Portfolio `pandas.DataFrame` with columns `['Open', 'Close']`, so it can be compared against other assets or combinations of assets. In this case, use `pfp_from_blotter` (which stands for "portfolio_from_panel" + portfolio construction method [i.e. blotter, weights, or initial allocation], which in this case was "the blotter method").
#construct the portfolio series In [14]: port_df = vwcp.pfp_from_blotter(port_panel, 1000.) In [15]: port_df.head() Out[15]: Close Open Date 2000-07-06 1000.000000 988.744754 2000-07-07 1006.295307 1000.190767 2000-07-10 1012.876765 1005.723006 2000-07-11 1011.636780 1011.064479 2000-07-12 1031.953453 1016.978253 ### 2. [The Weight Allocation Method](weight-allocation-method-examples) A commonplace way to test portfolio management strategies using a group of underlying assets is to construct aggregate portfolio performance, given a specified weighting allocation to specific assets on specified dates. Specifically, those (oftentimes) percentage allocations represent a recommended allocation at some point in time, based on some "view" derived from either the output of a model or some qualitative analysis. Therefore, having an engine that is capable of taking in a weighting file (say, a `.csv`) with the following format: |Date | Ticker 1 | Ticker 2 | Ticker 3 | Ticker 4 | |:-------|:---------:|:---------:|:--------:|:--------:| |1/1/2002| 5% | 20% | 30% | 45% | |6/3/2003| 40% | 10% | 40% | 10% | |7/8/2003| 25% | 25% | 25% | 25% | and turning the above allocation file into a cumulative portfolio value that can then be analyzed and compared (both in isolation and relative to specified benchmarks) is highly valuable in the process of portfolio strategy creation. A quick example of a weighting allocation file can be found in the Excel file `test_data/panel from weight file test.xlsx`, where the tab `rebal_weights` represents one of these specific weighting files. To construct a portfolio using the **Weighting Allocation Method**, a process such as the following would be carried out.
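The calls in this section take a `weight_df` as input, which is never shown being built. As a first step, here is a minimal sketch of reading a weighting file laid out like the table above, using only the standard library. The inline CSV content and the helper name `parse_weight_file` are hypothetical and not part of `visualize_wealth`; the percent-string format is an assumption taken from the table.

```python
import csv
import io
from datetime import datetime

# Inline stand-in for a weighting file laid out like the table above
# (hypothetical content; a real file would be opened from disk instead)
WEIGHT_CSV = """Date,Ticker 1,Ticker 2,Ticker 3,Ticker 4
1/1/2002,5%,20%,30%,45%
6/3/2003,40%,10%,40%,10%
7/8/2003,25%,25%,25%,25%
"""


def parse_weight_file(handle):
    """Return {date: {ticker: fraction}} from a CSV of percentage weights."""
    weights = {}
    for row in csv.DictReader(handle):
        date = datetime.strptime(row.pop("Date"), "%m/%d/%Y").date()
        # Convert strings such as "5%" into fractions such as 0.05
        fractions = {tick: float(pct.rstrip("%")) / 100.0
                     for tick, pct in row.items()}
        # Every rebalance date should be fully allocated
        if abs(sum(fractions.values()) - 1.0) > 1e-9:
            raise ValueError("weights on %s do not sum to 100%%" % date)
        weights[date] = fractions
    return weights


weights = parse_weight_file(io.StringIO(WEIGHT_CSV))
```

A mapping like this could then be turned into a frame with `pandas.DataFrame.from_dict(weights, orient='index')`; whether the library expects fractions or percentages, and whether the index needs converting with `pandas.to_datetime`, is not confirmed here.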
#import the library import visualize_wealth.construct_portfolio as vwcp If we didn't have the prices already, there's a function for that #fetch the prices and put them into a pandas.Panel price_panel = vwcp.fetch_data_for_weight_allocation_method(weight_df) #construct the panel that will go into the portfolio constructor port_panel = vwcp.panel_from_weight_file(weight_df, price_panel, start_value = 1000.) Construct the `pandas.DataFrame` for the portfolio, starting at `start_value` of 1000 with columns `['Open', 'Close']` portfolio = vwcp.pfp_from_weight_file(port_panel) Now a portfolio with `index` of daily values and columns `['Open', 'Close']` has been created upon which analytics and performance analysis can be done. ### 3. [The Initial Allocation & Rebalancing Method](initial-allocation-method-examples) The standard method of portfolio construction that persists in many circles to this day is static allocation with a given interval of rebalancing. For instance, if I wanted to implement Oppenheimer's [The New 60/40](https://www.oppenheimerfunds.com/digitalAssets/Discover-the-New-60-40-43f7f642-e0aa-40d9-a3fc-00f31be5a4fa.pdf) static portfolio, rebalancing on a yearly interval, my weighting scheme would be as follows: | Ticker | Name | Asset Class | Allocation | |:-------|:-------------------------|:-------------------|:-----------| | IWB | iShares Russell 1000 | US Equity | 15% | | IWR | iShares Russell Midcap | US Equity | 7.5% | | IWM | iShares Russell 2000 | US Equity | 7.5% | | SCZ | iShares EAFE Small Cap | Foreign Dev Equity | 7.5% | | EFA | iShares EAFE | Foreign Dev Equity | 12.5% | | EEM | iShares EAFE EM | Foreign EM Equity | 10% | | TIP | iShares TIPS | Fixed Income | 5% | | TLT | iShares LT Treasuries | Fixed Income | 2.5% | | IEF | iShares MT Treasuries | Fixed Income | 2.5% | | SHY | iShares ST Treasuries | Fixed Income | 5% | | HYG | iShares High Yield | Fixed Income | 2.5% | | LQD | iShares Inv Grade | Fixed Income | 2.5% | | PCY | PowerShares
EM Sovereign | Fixed Income | 2% | | BWX | SPDR intl Treasuries | Fixed Income | 2% | | MBB | iShares MBS | Fixed Income | 1% | | PFF | iShares Preferred Equity | Alternative | 2.5% | | IYR | iShares Real Estate | Alternative | 5% | | GLD | iShares Gold Index | Alternative | 2.5% | | GSG | iShares Commodities | Alternative | 5% | To implement such a weighting scheme, we can use the same worksheet `test_data/panel from weight file test.xlsx`, and the tab `static_allocation`. Note there is only a single row of weights, as this will be the "static allocation" to be rebalanced to at some given interval. #import pandas and the construct_portfolio library import pandas import visualize_wealth.construct_portfolio as vwcp Let's use the `static_allocation` provided in the `panel from weight file test.xlsx` workbook f = pandas.ExcelFile('test_data/panel from weight file test.xlsx') static_alloc = f.parse('static_allocation', index_col = 0, header_col = 0) Again, assuming we don't have the prices and need to download them, use `fetch_data_for_initial_allocation_method` price_panel = vwcp.fetch_data_for_initial_allocation_method(static_alloc) Construct the `panel` for the portfolio while determining the desired rebalance frequency panel = vwcp.panel_from_initial_weights(weight_series = static_alloc, price_panel = price_panel, rebal_frequency = 'quarterly') Construct the final portfolio with columns `['Open', 'Close']` portfolio = vwcp.pfp_from_weight_file(panel) Take a look at the portfolio series: In [10]: portfolio.head() Out[10]: Close Open Date 2007-12-12 1000.000000 1007.885932 2007-12-13 991.329125 990.717915 2007-12-14 978.157960 983.057829 2007-12-17 961.705069 969.797167 2007-12-18 969.794966 972.365687 ================================================ FILE: requirements.txt ================================================ chardet>=1.0.1 cython>=0.21.1 h5py>=2.3.1 ipdb>=0.8 ipython>=3.0.0 matplotlib>=1.4.2 numpy>=1.9.1 numexpr>=2.4 pandas>=0.14.1 py>=1.4.26 pytest>=2.6.4
pytest-cov>=1.8.1 scipy>=0.14.0 tables>=3.1.1 xlrd>=0.9.3 ================================================ FILE: run_tests ================================================ #!/bin/bash declare -a fList=( test_analyze.py test_construct_portfolio.py test_utils.py ) for nm in "${fList[@]}" do echo testing "$nm" py.test ./test_module/"$nm" -v done ================================================ FILE: setup.py ================================================ #!/usr/bin/env python # encoding: utf-8 from setuptools import setup setup(name='visualize_wealth', version='0.1', description='Portfolio Construction and Analysis', author='Benjamin M. Gross', author_email='benjaminMgross@gmail.com', url='https://github.com/benjaminmgross/wealth-viz', packages=['visualize_wealth']) ================================================ FILE: test_module/__init__.py ================================================ ================================================ FILE: test_module/test_analyze.py ================================================ #!/usr/bin/env python # encoding: utf-8 """ .. module:: visualize_wealth.test_module.test_analyze.py .. moduleauthor:: Benjamin M. 
Gross """ import pytest import pandas from pandas.util import testing import visualize_wealth.analyze as analyze @pytest.fixture def test_file(): return pandas.ExcelFile('./test_data/test_analyze.xlsx') @pytest.fixture def man_calcs(test_file): return test_file.parse('calcs', index_col = 0) @pytest.fixture def stat_calcs(test_file): return test_file.parse('results', index_col = 0) @pytest.fixture def prices(test_file): tmp = test_file.parse('calcs', index_col = 0) return tmp[['S&P 500', 'VGTSX']] def test_active_return(prices, stat_calcs): man_ar = stat_calcs.loc['active_return', 'VGTSX'] testing.assert_almost_equal(man_ar, analyze.active_return( series = prices['VGTSX'], benchmark = prices['S&P 500'], freq = 'daily') ) def test_active_returns(man_calcs, prices): active_returns = analyze.active_returns(series = prices['VGTSX'], benchmark = prices['S&P 500']) testing.assert_series_equal(man_calcs['Active Return'], active_returns) def test_log_returns(man_calcs, prices): testing.assert_series_equal(man_calcs['S&P 500 Log Ret'], analyze.log_returns(prices['S&P 500']) ) def test_linear_returns(man_calcs, prices): testing.assert_series_equal(man_calcs['S&P 500 Lin Ret'], analyze.linear_returns(prices['S&P 500']) ) def test_drawdown(man_calcs, prices): testing.assert_series_equal(man_calcs['VGTSX Drawdown'], analyze.drawdown(prices['VGTSX']) ) def test_r2(man_calcs, prices): log_rets = analyze.log_returns(prices).dropna() pandas_rsq = pandas.ols(x = log_rets['S&P 500'], y = log_rets['VGTSX']).r2 analyze_rsq = analyze.r2(benchmark = log_rets['S&P 500'], series = log_rets['VGTSX']) testing.assert_almost_equal(pandas_rsq, analyze_rsq) def test_r2_adj(man_calcs, prices): log_rets = analyze.log_returns(prices).dropna() pandas_rsq = pandas.ols(x = log_rets['S&P 500'], y = log_rets['VGTSX']).r2_adj analyze_rsq = analyze.r2_adj(benchmark = log_rets['S&P 500'], series = log_rets['VGTSX']) testing.assert_almost_equal(pandas_rsq, analyze_rsq) def test_cumulative_turnover(test_file, 
stat_calcs): alloc_df = test_file.parse('alloc_df', index_col = 0) cols = alloc_df.columns[alloc_df.columns!='Daily TO'] alloc_df = alloc_df[cols].dropna() asset_wt_df = test_file.parse('asset_wt_df', index_col = 0) testing.assert_almost_equal(analyze.cumulative_turnover(alloc_df, asset_wt_df), stat_calcs.loc['cumulative_turnover', 'S&P 500'] ) def test_mctr(test_file): mctr_prices = test_file.parse('mctr', index_col = 0) mctr_manual = test_file.parse('mctr_results', index_col = 0) cols = ['BSV','VBK','VBR','VOE','VOT'] mctr = analyze.mctr(mctr_prices[cols], mctr_prices['Portfolio']) testing.assert_series_equal(mctr, mctr_manual.loc['mctr', cols]) def test_risk_contribution(test_file): mctr_prices = test_file.parse('mctr', index_col = 0) mctr_manual = test_file.parse('mctr_results', index_col = 0) cols = ['BSV','VBK','VBR','VOE','VOT'] mctr = analyze.mctr(mctr_prices[cols], mctr_prices['Portfolio']) weights = pandas.Series( [.2, .2, .2, .2, .2], index = cols, name = 'risk_contribution') testing.assert_series_equal(analyze.risk_contribution(mctr, weights), mctr_manual.loc['risk_contribution', :] ) def test_risk_contribution_as_proportion(test_file): mctr_prices = test_file.parse('mctr', index_col = 0) mctr_manual = test_file.parse('mctr_results', index_col = 0) cols = ['BSV','VBK','VBR','VOE','VOT'] mctr = analyze.mctr(mctr_prices[cols], mctr_prices['Portfolio']) weights = pandas.Series( [.2, .2, .2, .2, .2], index = cols, name = 'risk_contribution') testing.assert_series_equal( analyze.risk_contribution_as_proportion(mctr, weights), mctr_manual.loc['risk_contribution_as_proportion'] ) def test_alpha(prices, stat_calcs): man_alpha = stat_calcs.loc['alpha', 'VGTSX'] testing.assert_almost_equal(man_alpha, analyze.alpha(series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_annualized_return(prices, stat_calcs): man_ar = stat_calcs.loc['annualized_return', 'VGTSX'] testing.assert_almost_equal( man_ar, analyze.annualized_return(series = prices['VGTSX'], 
freq = 'daily') ) def test_annualized_vol(prices, stat_calcs): man_ar = stat_calcs.loc['annualized_vol', 'VGTSX'] testing.assert_almost_equal( man_ar, analyze.annualized_vol(series = prices['VGTSX'], freq = 'daily') ) def test_appraisal_ratio(prices, stat_calcs): man_ar = stat_calcs.loc['appraisal_ratio', 'VGTSX'] testing.assert_almost_equal(man_ar, analyze.appraisal_ratio( series = prices['VGTSX'], benchmark = prices['S&P 500'], freq = 'daily', rfr = 0.0) ) def test_beta(prices, stat_calcs): man_beta = stat_calcs.loc['beta', 'VGTSX'] testing.assert_almost_equal(man_beta, analyze.beta(series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_cvar_cf(prices, stat_calcs): man_cvar_cf = stat_calcs.loc['cvar_cf', 'VGTSX'] testing.assert_almost_equal( man_cvar_cf, analyze.cvar_cf(series = prices['VGTSX'], p = 0.01) ) def test_cvar_norm(prices, stat_calcs): man_cvar_norm = stat_calcs.loc['cvar_norm', 'VGTSX'] testing.assert_almost_equal( man_cvar_norm, analyze.cvar_norm(series = prices['VGTSX'], p = 0.01) ) def test_downcapture(prices, stat_calcs): man_dc = stat_calcs.loc['downcapture', 'VGTSX'] testing.assert_almost_equal( man_dc, analyze.downcapture(series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_downside_deviation(prices, stat_calcs): man_dd = stat_calcs.loc['downside_deviation', 'VGTSX'] testing.assert_almost_equal( man_dd, analyze.downside_deviation(series = prices['VGTSX']) ) def test_geometric_difference(): a, b = 1. , 1. assert analyze.geometric_difference(a, b) == 0. a, b = pandas.Series({'a': 1.}), pandas.Series({'a': 1.}) assert analyze.geometric_difference(a, b).values == 0. 
def test_idiosyncratic_as_proportion(prices, stat_calcs): man_iap = stat_calcs.loc['idiosyncratic_as_proportion', 'VGTSX'] testing.assert_almost_equal( man_iap, analyze.idiosyncratic_as_proportion( series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_idiosyncratic_risk(prices, stat_calcs): man_ir = stat_calcs.loc['idiosyncratic_risk', 'VGTSX'] testing.assert_almost_equal( man_ir, analyze.idiosyncratic_risk( series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_information_ratio(prices, stat_calcs): man_ir = stat_calcs.loc['information_ratio', 'VGTSX'] testing.assert_almost_equal(man_ir, analyze.information_ratio( series = prices['VGTSX'], benchmark = prices['S&P 500'], freq = 'daily') ) def test_jensens_alpha(prices, stat_calcs): man_ja = stat_calcs.loc['jensens_alpha', 'VGTSX'] testing.assert_almost_equal( man_ja, analyze.jensens_alpha( series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_max_drawdown(prices, stat_calcs): man_md = stat_calcs.loc['max_drawdown', 'VGTSX'] testing.assert_almost_equal( man_md, analyze.max_drawdown(series = prices['VGTSX']) ) def test_mean_absolute_tracking_error(prices, stat_calcs): man_mate = stat_calcs.loc['mean_absolute_tracking_error', 'VGTSX'] testing.assert_almost_equal( man_mate, analyze.mean_absolute_tracking_error( series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_median_downcapture(prices, stat_calcs): man_md = stat_calcs.loc['median_downcapture', 'VGTSX'] testing.assert_almost_equal( man_md, analyze.median_downcapture( series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_median_upcapture(prices, stat_calcs): man_uc = stat_calcs.loc['median_upcapture', 'VGTSX'] testing.assert_almost_equal( man_uc, analyze.median_upcapture( series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_risk_adjusted_excess_return(prices, stat_calcs): man_raer = stat_calcs.loc['risk_adjusted_excess_return', 'VGTSX'] testing.assert_almost_equal( man_raer, 
analyze.risk_adjusted_excess_return( series = prices['VGTSX'], benchmark = prices['S&P 500'], rfr = 0.0, freq = 'daily') ) def test_adj_sharpe_ratio(prices, stat_calcs): man_asr = stat_calcs.loc['adj_sharpe_ratio', 'VGTSX'] testing.assert_almost_equal( man_asr, analyze.adj_sharpe_ratio( series = prices['VGTSX'], rfr = 0.0, freq = 'daily') ) def test_sharpe_ratio(prices, stat_calcs): man_sr = stat_calcs.loc['sharpe_ratio', 'VGTSX'] testing.assert_almost_equal(man_sr, analyze.sharpe_ratio( series = prices['VGTSX'], rfr = 0.0, freq = 'daily') ) def test_sortino_ratio(prices, stat_calcs): man_sr = stat_calcs.loc['sortino_ratio', 'VGTSX'] testing.assert_almost_equal(man_sr, analyze.sortino_ratio( series = prices['VGTSX'], rfr = 0.0, freq = 'daily') ) def test_systematic_as_proportion(prices, stat_calcs): man_sap = stat_calcs.loc['systematic_as_proportion', 'VGTSX'] testing.assert_almost_equal( man_sap, analyze.systematic_as_proportion( series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_systematic_risk(prices, stat_calcs): man_sr = stat_calcs.loc['systematic_risk', 'VGTSX'] testing.assert_almost_equal( man_sr, analyze.systematic_risk( series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_tracking_error(prices, stat_calcs): man_te = stat_calcs.loc['tracking_error', 'VGTSX'] testing.assert_almost_equal( man_te, analyze.tracking_error( series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_ulcer_index(prices, stat_calcs): man_ui = stat_calcs.loc['ulcer_index', 'VGTSX'] testing.assert_almost_equal( man_ui, analyze.ulcer_index(series = prices['VGTSX']) ) def test_upcapture(prices, stat_calcs): man_uc = stat_calcs.loc['upcapture', 'VGTSX'] testing.assert_almost_equal( man_uc, analyze.upcapture( series = prices['VGTSX'], benchmark = prices['S&P 500']) ) def test_upside_deviation(prices, stat_calcs): man_ud = stat_calcs.loc['upside_deviation', 'VGTSX'] testing.assert_almost_equal( man_ud, analyze.upside_deviation( series = 
prices['VGTSX'], freq = 'daily') ) ================================================ FILE: test_module/test_construct_portfolio.py ================================================ #!/usr/bin/env python # encoding: utf-8 """ .. module:: visualize_wealth.test_module.test_construct_portfolio.py .. moduleauthor:: Benjamin M. Gross """ import os import pytest import numpy import pandas import tempfile import datetime from pandas.util import testing from visualize_wealth import construct_portfolio as cp @pytest.fixture def test_file(): f = './test_data/panel from weight file test.xlsx' return pandas.ExcelFile(f) @pytest.fixture def tc_file(): f = './test_data/transaction-costs.xlsx' return pandas.ExcelFile(f) @pytest.fixture def rebal_weights(test_file): return test_file.parse('rebal_weights', index_col = 0) @pytest.fixture def panel(test_file, rebal_weights): tickers = ['EEM', 'EFA', 'IYR', 'IWV', 'IEF', 'IYR', 'SHY'] d = {} for ticker in tickers: d[ticker] = test_file.parse(ticker, index_col = 0) return cp.panel_from_weight_file(rebal_weights, pandas.Panel(d), 1000. 
) @pytest.fixture def manual_index(panel, test_file): man_calc = test_file.parse('index_result', index_col = 0 ) return man_calc @pytest.fixture def manual_tc_bps(tc_file): man_tcosts = tc_file.parse('tc_bps', index_col = 0) man_tcosts = man_tcosts.fillna(0.0) return man_tcosts @pytest.fixture def manual_tc_cps(tc_file): man_tcosts = tc_file.parse('tc_cps', index_col = 0) man_tcosts = man_tcosts.fillna(0.0) return man_tcosts @pytest.fixture def manual_mngmt_fee(tc_file): return tc_file.parse('mgmt_fee', index_col = 0) def test_mngmt_fee(panel, tc_file, manual_mngmt_fee): index = cp.pfp_from_weight_file(panel) vw_mfee = cp.mngmt_fee(price_series = index['Close'], bps_cost = 100., frequency = 'daily' ) testing.assert_series_equal(manual_mngmt_fee['daily_index'], vw_mfee ) def test_pfp(panel, manual_index): #import ipdb; ipdb.set_trace() lib_calc = cp.pfp_from_weight_file(panel) # hack because names weren't matching up mn_series = manual_index['Close'] lb_series = lib_calc['Close'] mn_series.index.name = lb_series.index.name testing.assert_series_equal(mn_series, lb_series ) return lib_calc def test_tc_bps(rebal_weights, panel, manual_tc_bps): vw_tcosts = cp.tc_bps(weight_df = rebal_weights, share_panel = panel, bps = 10., ) cols = ['EEM', 'EFA', 'IEF', 'IWV', 'IYR', 'SHY'] testing.assert_frame_equal(manual_tc_bps[cols], vw_tcosts) def test_net_bps(rebal_weights, panel, manual_tc_bps, manual_index): index = test_pfp(panel, manual_index) index = index['Close'] vw_tcosts = cp.tc_bps(weight_df = rebal_weights, share_panel = panel, bps = 10., ) net_tcs = cp.net_tcs(tc_df = vw_tcosts, price_index = index ) testing.assert_series_equal(manual_tc_bps['adj_index'], net_tcs ) def test_net_cps(rebal_weights, panel, manual_tc_cps, manual_index): index = test_pfp(panel, manual_index) index = index['Close'] vw_tcosts = cp.tc_cps(weight_df = rebal_weights, share_panel = panel, cps = 10., ) net_tcs = cp.net_tcs(tc_df = vw_tcosts, price_index = index ) 
testing.assert_series_equal(manual_tc_cps['adj_index'], net_tcs ) def test_tc_cps(rebal_weights, panel, manual_tc_cps): cols = ['EEM', 'EFA', 'IEF', 'IWV', 'IYR', 'SHY'] vw_tcosts = cp.tc_cps(weight_df = rebal_weights, share_panel = panel, cps = 10., ) testing.assert_frame_equal(manual_tc_cps[cols], vw_tcosts) def test_funs(): """ >>> import pandas.util.testing as put >>> xl_file = pandas.ExcelFile('../tests/test_splits.xlsx') >>> blotter = xl_file.parse('blotter', index_col = 0) >>> cols = ['Close', 'Adj Close', 'Dividends'] >>> price_df = xl_file.parse('calc_sheet', index_col = 0) >>> price_df = price_df[cols] >>> split_frame = calculate_splits(price_df) >>> shares_owned = blotter_and_price_df_to_cum_shares(blotter, ... split_frame) >>> test_vals = xl_file.parse( ... 'share_balance', index_col = 0)['cum_shares'] >>> put.assert_almost_equal(shares_owned['cum_shares'].dropna(), ... test_vals) True >>> f = '../tests/panel from weight file test.xlsx' >>> xl_file = pandas.ExcelFile(f) >>> weight_df = xl_file.parse('rebal_weights', index_col = 0) >>> tickers = ['EEM', 'EFA', 'IYR', 'IWV', 'IEF', 'IYR', 'SHY'] >>> d = {} >>> for ticker in tickers: ... d[ticker] = xl_file.parse(ticker, index_col = 0) >>> panel = panel_from_weight_file(weight_df, pandas.Panel(d), ... 1000.) >>> portfolio = pfp_from_weight_file(panel) >>> manual_calcs = xl_file.parse('index_result', index_col = 0) >>> put.assert_series_equal(manual_calcs['Close'], ... portfolio['Close']) """ return None ================================================ FILE: test_module/test_utils.py ================================================ #!/usr/bin/env python # encoding: utf-8 """ .. module:: visualize_wealth.test_module.test_utils.py .. moduleauthor:: Benjamin M. 
Gross """ import os import pytest import numpy import pandas import tempfile import datetime from pandas.util.testing import (assert_frame_equal, assert_series_equal, assert_index_equal, assert_almost_equal ) import visualize_wealth.utils as utils @pytest.fixture def populate_store(): name = './test_data/tmp.h5' store = pandas.HDFStore(name, mode = 'w') #two weeks of data before today, delete one week, then update delta = datetime.timedelta(14) today = datetime.datetime.date(datetime.datetime.today()) index = pandas.DatetimeIndex(start = today - delta, freq = 'b', periods = 10 ) store.put('TICK', pandas.Series(numpy.ones(len(index), ), index = index, name = 'Close') ) store.put('TOCK', pandas.Series(numpy.ones(len(index), ), index = index, name = 'Close') ) store.close() return {'name': name, 'index': index} @pytest.fixture def populate_updated(): name = './test_data/tmp.h5' store = pandas.HDFStore(name, mode = 'w') #two weeks of data before today, delete one week, then update delta = datetime.timedelta(14) today = datetime.datetime.date(datetime.datetime.today()) index = pandas.DatetimeIndex(start = today - delta, freq = 'b', periods = 10 ) store.put('TICK', pandas.Series(numpy.ones(len(index), ), index = index, name = 'Close') ) store.put('TOCK', pandas.Series(numpy.ones(len(index), ), index = index, name = 'Close') ) #truncate the index for updating ind = index[:5] n = len(ind) #store the Master IND3X store.put('IND3X', pandas.Series(ind, index = ind) ) cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'] cash = pandas.DataFrame(numpy.ones([n, len(cols)]), index = ind, columns = cols ) #store the CA5H store.put('CA5H', cash) store.close() return {'name': name, 'index': index} def test_create_store_master_index(populate_store): index = populate_store['index'] index = pandas.Series(index, index = index) utils.create_store_master_index(populate_store['name']) store = pandas.HDFStore(populate_store['name'], mode = 'r+') 
assert_series_equal(store.get('IND3X'), index) store.close() os.remove(populate_store['name']) def test_union_store_indexes(populate_store): store = pandas.HDFStore(populate_store['name'], mode = 'r+') index = populate_store['index'] union = utils.union_store_indexes(store) assert_index_equal(index, union) store.close() os.remove(populate_store['name']) def test_create_store_cash(populate_store): index = populate_store['index'] utils.create_store_cash(populate_store['name']) store = pandas.HDFStore(populate_store['name'], mode = 'r+') cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'] n = len(index) cash = pandas.DataFrame(numpy.ones([n, len(cols)]), index = index, columns = cols ) assert_frame_equal(store.get('CA5H'), cash) store.close() os.remove(populate_store['name']) def test_update_store_master_and_cash(populate_updated): index = populate_updated['index'] index = pandas.Series(index, index = index) utils.update_store_master_index(populate_updated['name']) utils.update_store_cash(populate_updated['name']) store = pandas.HDFStore(populate_updated['name'], mode = 'r+') assert_series_equal(store.get('IND3X'), index) cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'] n = len(index) cash = pandas.DataFrame(numpy.ones([n, len(cols)]), index = index, columns = cols ) assert_frame_equal(store.get('CA5H'), cash) store.close() os.remove(populate_updated['name']) def test_rets_to_price(): dts = ['1/1/2000', '1/2/2000', '1/3/2000'] index = pandas.DatetimeIndex( pandas.Timestamp(dt) for dt in dts ) series = pandas.Series([numpy.nan, 0., 0.], index = index ) log = utils.rets_to_price( series, ret_typ = 'log', start_value = 100. ) lin = utils.rets_to_price( series, ret_typ = 'linear', start_value = 100. ) man = pandas.Series([100., 100., 100.], index = index ) assert_series_equal(log, man) assert_series_equal(lin, man) df = pandas.DataFrame({'a': series, 'b': series}) log = utils.rets_to_price( df, ret_typ = 'log', start_value = 100. 
) lin = utils.rets_to_price( df, ret_typ = 'linear', start_value = 100. ) man = pandas.DataFrame({'a': man, 'b': man}) assert_frame_equal(log, man) assert_frame_equal(lin, man) with pytest.raises(TypeError): utils.rets_to_price(pandas.Panel(), ret_typ = 'log', start_value = 100. ) #@pytest.mark.newtest def test_strip_vals(): l = [' TLT', ' HYY ', 'IEF '] strpd = utils.strip_vals(l) res = ['TLT', 'HYY', 'IEF'] assert strpd == res @pytest.mark.newtest def test_zipped_time_chunks(): pts = pandas.Timestamp index = pandas.DatetimeIndex( start = '06/01/2000', freq = 'D', periods = 100 ) res = [('06-01-2000', '06-30-2000'), ('07-01-2000', '07-31-2000'), ('08-01-2000', '08-31-2000')] mc = list(((pts(x), pts(y)) for x, y in res)) lc = utils.zipped_time_chunks( index = index, interval = 'monthly', incl_T = False ) assert mc == lc res = [('06-01-2000', '06-30-2000'), ('07-01-2000', '07-31-2000'), ('08-01-2000', '08-31-2000'), ('09-01-2000', '09-08-2000')] mc = list(((pts(x), pts(y)) for x, y in res)) lc = utils.zipped_time_chunks( index = index, interval = 'monthly', incl_T = True ) assert mc == lc res = [('06-01-2000', '06-30-2000')] mc = list(((pts(x), pts(y)) for x, y in res)) lc = utils.zipped_time_chunks( index = index, interval = 'quarterly', incl_T = False ) assert mc == lc res = [('06-01-2000', '06-30-2000')] mc = list(((pts(x), pts(y)) for x, y in res)) lc = utils.zipped_time_chunks( index = index, interval = 'quarterly', incl_T = False ) assert mc == lc res = [('06-01-2000', '06-30-2000'), ('07-01-2000', '09-08-2000')] mc = list(((pts(x), pts(y)) for x, y in res)) lc = utils.zipped_time_chunks( index = index, interval = 'quarterly', incl_T = True ) assert mc == lc mc = [] lc = utils.zipped_time_chunks( index = index, interval = 'yearly', incl_T = False ) assert mc == lc res = [('06-01-2000', '09-08-2000')] mc = list(((pts(x), pts(y)) for x, y in res)) lc = utils.zipped_time_chunks( index = index, interval = 'yearly', incl_T = True ) assert mc == lc """ def 
test_update_store_cash(populate_updated): index = populate_updated['index'] utils.update_store_cash(populate_updated['name']) store = pandas.HDFStore(populate_updated['name'], mode = 'r+') cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'] n = len(index) cash = pandas.DataFrame(numpy.ones([n, len(cols)]), index = index, columns = cols ) assert_frame_equal(store.get('CA5H'), cash) store.close() os.remove(populate_updated['name']) """ ================================================ FILE: visualize_wealth/__init__.py ================================================ #!/usr/bin/env python # encoding: utf-8 """ .. module:: __init__.py :synopsis: initialization file for ``visualize_wealth`` .. moduleauthor:: Benjamin M. Gross """ import visualize_wealth.construct_portfolio import visualize_wealth.utils import visualize_wealth.classify import visualize_wealth.analyze ================================================ FILE: visualize_wealth/analyze.py ================================================ #!/usr/bin/env python # encoding: utf-8 """ .. module:: visualize_wealth.analyze.py .. moduleauthor:: Benjamin M. Gross """ import collections import numpy import pandas import scipy.stats from .utils import zipped_time_chunks def active_return(series, benchmark, freq = 'daily'): """ Active returns is the geometric difference between annualized returns :ARGS: series: ``pandas.Series`` of prices of the portfolio benchmark: ``pandas.Series`` of prices of the benchmark :RETURNS: ``pandas.Series`` of active returns .. note:: Compound Linear Returns Linear returns are not simply subtracted, but rather the compound difference is taken such that .. 
math:: r_a = \\frac{1 + r_p}{1 + r_b} - 1 """ def _active_return(series, benchmark, freq = freq): port_ret = annualized_return(series, freq = freq) bench_ret = annualized_return(benchmark, freq = freq) return geometric_difference(port_ret, bench_ret) if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _active_return(series, x, freq = freq)) else: return _active_return(series, benchmark, freq = freq) def active_returns(series, benchmark): """ Active returns is defined as the compound difference between linear returns :ARGS: series: ``pandas.Series`` of prices of the portfolio benchmark: ``pandas.Series`` of prices of the benchmark :RETURNS: ``pandas.Series`` of active returns .. note:: Compound Linear Returns Linear returns are not simply subtracted, but rather the compound difference is taken such that .. math:: r_a = \\frac{1 + r_p}{1 + r_b} - 1 """ def _active_returns(series, benchmark): return (1 + linear_returns(series)).div( 1 + linear_returns(benchmark)) - 1 if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _active_returns(series, x)) else: return _active_returns(series, benchmark) def alpha(series, benchmark, freq = 'daily', rfr = 0.0): """ Alpha is defined as excess return, over and above its expected return, derived from an asset's sensitivity to a given benchmark, and the return of that benchmark. series: :class:`pandas.Series` or `pandas.DataFrame` of asset prices benchmark: :class:`pandas.Series` of prices freq: :class:`string` either ['daily', 'monthly', 'quarterly', 'yearly'] indicating the frequency of the data. Default, 'daily' rfr: :class:`float` of the risk free rate .. 
math:: \\alpha \\triangleq (R_p - r_f) - \\beta \\cdot (R_b - r_f) \\textrm{where}, R_p &= \\textrm{Portfolio Annualized Return} \\\\ R_b &= \\textrm{Benchmark Annualized Return} \\\\ r_f &= \\textrm{Risk Free Rate} \\\\ \\beta &= \\textrm{Portfolio Sensitivity to the Benchmark} """ def _alpha(series, benchmark, freq = 'daily', rfr = rfr): R_p = annualized_return(series, freq = freq) R_b = annualized_return(benchmark, freq = freq) b = beta(series, benchmark) return R_p - rfr - b * (R_b - rfr) if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _alpha( series, x, freq = freq, rfr = rfr)) else: return _alpha(series, benchmark, freq = freq, rfr = rfr) def annualized_return(series, freq = 'daily'): """ Returns the annualized linear return of a series, i.e. the linear compounding rate that would have been necessary, given the initial investment, to arrive at the final value :ARGS: series: ``pandas.Series`` of prices freq: ``str`` of either ``daily, monthly, quarterly, or yearly`` indicating the frequency of the data ``default=`` daily :RETURNS: ``float``: of the annualized linear return .. code:: python import visualize_wealth.analyze as vwp linear_return = vwp.annualized_return(price_series, freq = 'monthly') """ def _annualized_return(series, freq = 'daily'): fac = _interval_to_factor(freq) T = len(series) - 1. yr_frac = (series.index[-1] - series.index[0]).days / 365. if yr_frac > 1.: return numpy.exp(numpy.log(series[-1]/series[0]) * fac / T) - 1. else: return numpy.exp(numpy.log(series[-1]/series[0]) ) - 1. 
if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _annualized_return(x, freq = freq)) else: return _annualized_return(series, freq = freq) def annualized_vol(series, freq = 'daily'): """ Returns the annualized volatility of the log changes of the price series, by calculating the volatility of the series, and then applying the square root of time rule :ARGS: series: ``pandas.Series`` of prices freq: ``str`` of either ``daily, monthly, quarterly, or yearly`` indicating the frequency of the data ``default=`` daily :RETURNS: float: of the annualized volatility .. note:: Applying the Square root of time rule .. math:: \\sigma = \\sigma_t \\cdot \\sqrt{k},\\: \\textrm{where}, k &= \\textrm{Factor of annualization} \\\\ \\sigma_t &= \\textrm{volatility of period log returns} .. code:: import visualize_wealth.analyze as vwp ann_vol = vwp.annualized_vol(price_series, freq = 'monthly') """ def _annualized_vol(series, freq = 'daily'): fac = _interval_to_factor(freq) series_rets = log_returns(series) return series_rets.std()*numpy.sqrt(fac) if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _annualized_vol(x, freq = freq)) else: return _annualized_vol(series, freq = freq) def appraisal_ratio(series, benchmark, freq = 'daily', rfr = 0.): """ A measure of the risk-adjusted return of a financial security or portfolio that is equal to the alpha, divided by the standard error between the portfolio and the benchmark series: :class:`pandas.Series` or `pandas.DataFrame` of asset prices benchmark: :class:`pandas.Series` of prices freq: :class:`string` either ['daily', 'monthly', 'quarterly', 'yearly'] indicating the frequency of the data. Default, 'daily' rfr: :class:`float` of the risk free rate .. 
math:: \\textrm{AR} \\triangleq \\frac{\\alpha}{\\epsilon} \\\\ \\textrm{where,} \\\\ \\alpha &= \\alpha \\textrm{, the risk adjusted excess return} \\\\ \\epsilon &= \\textrm{standard error, or idiosyncratic risk} \\\\ """ def _appraisal_ratio(series, benchmark, freq = freq, rfr = rfr): a = alpha(series, benchmark, freq = freq, rfr = rfr) e = idiosyncratic_risk(series, benchmark, freq = freq) return a / e if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _appraisal_ratio(series, x, freq = freq, rfr = rfr) ) else: return _appraisal_ratio(series, benchmark, freq = freq, rfr = rfr) def attribution_weights(series, factor_df): """ Given a price series and explanatory factors factor_df, determine the weights of attribution to each factor or asset :ARGS: series: :class:`pandas.Series` of asset prices to explain given the factors or sub_classes in factor_df factor_df: :class:`pandas.DataFrame` of the prices of the factors or sub_classes to which the asset prices can be attributed :RETURNS: given an optimal solution, a :class:`pandas.Series` of asset factor weights (summing to one) which best explain the series. If an optimal solution is not found, None type is returned (with accompanying message) """ def obj_fun(weights): tol = 1.e-5 est = factor_df.apply(lambda x: numpy.multiply(weights, x), axis = 1).sum(axis = 1) n = len(series) #when a variable is "excluded" reduce p for higher adj-r2 p = len(weights[weights > tol]) rsq = r2(series = series, benchmark = est) adj_rsq = 1 - (1 - rsq)*(n - 1)/(n - p - 1) return -1.*adj_rsq #linear returns series = linear_returns(series).dropna() #if isinstance(series, pandas.DataFrame) & len(series.columns == 1): #it's an n x 1 dataframe with a valid result #series = series[series.columns[0]] factor_df = linear_returns(factor_df).dropna() guess = numpy.random.rand(factor_df.shape[1]) guess = pandas.Series(guess/guess.sum(), index = factor_df.columns) bounds = [(0., 1.) 
for i in numpy.arange(len(guess))] opt_fun = scipy.optimize.minimize(fun = obj_fun, x0 = guess, bounds = bounds ) opt_wts = pandas.Series(opt_fun.x, index = guess.index) opt_wts = opt_wts.div(opt_wts.sum()) return opt_wts def attribution_weights_by_interval(series, factor_df, interval): """ Given a price series and explanatory factors factor_df, determine the weights of attribution to each factor or asset over differently spaced time intervals :ARGS: series: :class:`pandas.Series` of asset prices to explain given the factors or sub_classes in factor_df factor_df: :class:`pandas.DataFrame` of the prices of the factors or sub_classes to to which the asset prices can be attributed interval: interval of the amount of time :RETURNS: given an optimal solution, a :class:`pandas.DataFrame` of asset factor weights (summing to one) for each interval. If an optimal solution is not found, None type is returned (with accompanying message) """ chunks = zipped_time_chunks(series.index, interval) wt_dict = {} for beg, fin in chunks: wt_dict[beg] = attribution_weights(series[beg: fin], factor_df.loc[beg: fin, :] ) return pandas.DataFrame(wt_dict).transpose() def beta(series, benchmark): """ Returns the sensitivity of one price series to a chosen benchmark: :ARGS: series: ``pandas.Series`` of prices benchmark: :class:`Series` or :class:`DataFrame` of prices of a benchmark to calculate the sensitivity against :RETURNS: float: the sensitivity of the series to the benchmark .. note:: Calculating Beta .. 
math:: \\beta \\triangleq \\frac{\\sigma_{s, b}}{\\sigma^2_{b}}, \\: \\textrm{where}, \\sigma^2_{b} &= \\textrm{Variance of the Benchmark} \\\\ \\sigma_{s, b} &= \\textrm{Covariance of the Series and Benchmark} """ def _beta(series, benchmark): series_rets = log_returns(series) bench_rets = log_returns(benchmark) return numpy.divide(bench_rets.cov(series_rets), bench_rets.var()) if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _beta(series, x)) else: return _beta(series, benchmark) def beta_ew(series, benchmark, theta = 0.94): """ Returns the exponentially weighted sensitivity of one return series to a chosen benchmark :ARGS: series: :class:`Series` of prices benchmark: :class:`Series` of a benchmark to calculate the sensitivity theta: :class:`float` of the exponential smoothing constant default to 0.94 MSCI Barra's ew constant :RETURNS: float: the sensitivity of the series to the benchmark """ span = (1. + theta) / (1. - theta) series_rets = log_returns(series) bench_rets = log_returns(benchmark) cov = pandas.ewmcov(series_rets, bench_rets, span = span, min_periods = span ) var = pandas.ewmvar(bench_rets, span = span, min_periods = span ) return cov.div(var) def consecutive(int_series): """ Array logic (no for loops) and fast method to determine the number of consecutive ones given a `pandas.Series` of integers Derived from `Stack Overflow `_ :ARGS: int_series: :class:`pandas.Series` of integers as 0s or 1s :RETURNS: :class:`pandas.Series` of the consecutive ones """ n = int_series == 0 a = ~n c = a.cumsum() index = c[n].index d = pandas.Series(numpy.diff(numpy.hstack(( [0.], c[n] ))) , index =index) int_series[n] = -d return int_series.cumsum() def consecutive_downtick_performance(series, n_ticks = 3): """ Returns a two column :class:`pandas.DataFrame` with columns `['performance','num_downticks']` that shows the cumulative performance (in log returns) and `num_downticks`, the number of days the downtick lasted :ARGS: series:
:class:`pandas.Series` of asset prices :RETURNS: :class:`pandas.DataFrame` of ``['performance','num_downticks']``. Performance is in log returns and `num_downticks` the number of consecutive downticks for which the performance was generated """ def _consecutive_downtick_performance(series, n_ticks): dnticks = consecutive_downticks(series, n_ticks = n_ticks) series_dn = series[dnticks.index] st, fin = dnticks == 0, (dnticks == 0).shift(-1).fillna(True) n_per = dnticks[fin] series_rets = numpy.log(numpy.divide(series_dn[fin], series_dn[st])) return pandas.DataFrame({'num_downticks':n_per, series.name: series_rets}) if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _consecutive_downtick_performance( x, n_ticks = n_ticks)) else: return _consecutive_downtick_performance(series = series, n_ticks = n_ticks) def consecutive_downtick_relative_performance(series, benchmark, n_ticks = 3): """ Returns a two column :class:`pandas.DataFrame` with columns `['outperformance','num_downticks']` that shows the cumulative outperformance (in log returns) and `num_downticks`, the number of days the downtick lasted :ARGS: series: :class:`pandas.Series` of asset prices benchmark: :class:`pandas.Series` of prices to compare ``series`` against :RETURNS: :class:`pandas.DataFrame` of ``['outperformance','num_downticks']``. 
Outperformance is in log returns and `num_downticks` the number of consecutive downticks for which the outperformance was generated """ def _consecutive_downtick_relative_performance(series, benchmark, n_ticks): dnticks = consecutive_downticks(benchmark, n_ticks = n_ticks) series_dn = series[dnticks.index] bench_dn = benchmark[dnticks.index] st, fin = dnticks == 0, (dnticks == 0).shift(-1).fillna(True) n_per = dnticks[fin] series_rets = numpy.log(numpy.divide(series_dn[fin], series_dn[st])) bench_rets = numpy.log(numpy.divide(bench_dn[fin], bench_dn[st])) return pandas.DataFrame({'outperformance':series_rets.subtract( bench_rets), 'num_downticks':n_per, series.name: series_rets, benchmark.name: bench_rets}, columns = [benchmark.name, series.name, 'outperformance', 'num_downticks'] ) if isinstance(benchmark, pandas.DataFrame): return map(lambda x: _consecutive_downtick_relative_performance( series = series, benchmark = benchmark[x],n_ticks = n_ticks), benchmark.columns) else: return _consecutive_downtick_relative_performance(series = series, benchmark = benchmark, n_ticks = n_ticks) def consecutive_downticks(series, n_ticks = 3): """ Using the :func:`num_consecutive`, returns a :class:`pandas.Series` of the consecutive downticks in the series greater than three downticks :ARGS: series: :class:`pandas.Series` of the asset prices :RETURNS: :class:`pandas.Series` of the consecutive downticks of the series """ w = consecutive( (series < series.shift(1)).astype(int) ) agg_ind = w[w > n_ticks - 1].index.union_many( map(lambda x: w[w.shift(-x) == n_ticks].index, numpy.arange(n_ticks + 1) )) return w[agg_ind] def consecutive_uptick_relative_performance(series, benchmark, n_ticks = 3): """ Returns a two column :class:`pandas.DataFrame` with columns ``['outperformance', 'num_upticks']`` that shows the cumulative outperformance (in log returns) and the ``num_upticks`` number of days the uptick lasted :ARGS: series: :class:`pandas.Series` of asset prices benchmark: 
:class:`pandas.Series` of prices to compare ``series`` against :RETURNS: :class:`pandas.DataFrame` of ``['outperformance', 'num_upticks']``. Outperformance is in log returns and num_upticks the number of consecutive upticks for which the outperformance was generated """ def _consecutive_uptick_relative_performance(series, benchmark, n_ticks): upticks = consecutive_upticks(benchmark, n_ticks = n_ticks) series_up = series[upticks.index] bench_up = benchmark[upticks.index] st, fin = upticks == 0, (upticks == 0).shift(-1).fillna(True) n_per = upticks[fin] series_rets = numpy.log(numpy.divide(series_up[fin], series_up[st])) bench_rets = numpy.log(numpy.divide(bench_up[fin], bench_up[st])) return pandas.DataFrame({'outperformance':series_rets.subtract( bench_rets), 'num_upticks':n_per, series.name: series_rets, benchmark.name: bench_rets}, columns = [benchmark.name, series.name, 'outperformance', 'num_upticks'] ) if isinstance(benchmark, pandas.DataFrame): return map(lambda x: _consecutive_uptick_relative_performance( series = series, benchmark = benchmark[x], n_ticks = n_ticks), benchmark.columns) else: return _consecutive_uptick_relative_performance( series = series, benchmark = benchmark, n_ticks = n_ticks) def consecutive_uptick_performance(series, n_ticks = 3): """ Returns a two column :class:`pandas.DataFrame` with columns ``['performance', 'num_upticks']`` that shows the cumulative performance (in log returns) and the ``num_upticks`` number of days the uptick lasted :ARGS: series: :class:`pandas.Series` of asset prices :RETURNS: :class:`pandas.DataFrame` of ``['outperformance', 'num_upticks']``. 
Outperformance is in log returns and num_upticks the number of consecutive upticks for which the outperformance was generated """ def _consecutive_uptick_performance(series, n_ticks): upticks = consecutive_upticks(series, n_ticks = n_ticks) series_up = series[upticks.index] st, fin = upticks == 0, (upticks == 0).shift(-1).fillna(True) n_per = upticks[fin] series_rets = numpy.log(numpy.divide(series_up[fin], series_up[st])) return pandas.DataFrame({'num_upticks':n_per, series.name: series_rets} ) if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _consecutive_uptick_performance(x, n_ticks = n_ticks)) else: return _consecutive_uptick_performance( series = series, n_ticks = n_ticks) def consecutive_upticks(series, n_ticks = 3): """ Using :func:`consecutive`, returns a :class:`pandas.Series` of the consecutive upticks in the series for runs of at least ``n_ticks`` consecutive upticks :ARGS: series: :class:`pandas.Series` of the asset prices :RETURNS: :class:`pandas.Series` of the consecutive upticks of the series """ w = consecutive( (series > series.shift(1)).astype(int) ) agg_ind = w[w > n_ticks - 1].index.union_many( map(lambda x: w[w.shift(-x) == n_ticks].index, numpy.arange(n_ticks + 1) )) return w[agg_ind] def cumulative_turnover(alloc_df, asset_wt_df): """ Provided an allocation frame (i.e. the weights to which the portfolio was rebalanced), and the historical asset weights, return the cumulative turnover, where turnover is defined below. The first period of the ``alloc_df`` is excluded as that represents the initial investment :ARGS: alloc_df: :class:`pandas.DataFrame` of the weighting allocation that was provided to construct the portfolio asset_wt_df: :class:`pandas.DataFrame` of the actual historical weights of each asset :RETURNS: cumulative turnover .. 
note:: Calculating Turnover Let :math:`\\tau_j` be the single-period turnover for period :math:`j`, over assets :math:`i = 1, 2, ..., n`, where :math:`\\omega_{i,j}` is the portfolio weight of asset :math:`i` in period :math:`j`. Then the single period :math:`j` turnover for all assets :math:`1,..,n` can be calculated as: .. math:: \\tau_j = \\frac{\\sum_{i=1}^n|\\omega_{i,j} - \\omega_{i,j+1}|}{2} """ #the dates when the portfolio are the cause of turnover ind = alloc_df.index[1:] try: return 0.5*asset_wt_df.loc[ind, :].sub( asset_wt_df.shift(-1).loc[ind, :]).abs().sum(axis = 1).sum() #the rebalance might have dates past the earliest price except KeyError: loc = alloc_df.index.searchsorted(asset_wt_df.index[0]) tmp = alloc_df.iloc[loc:, :] ind = tmp.index[1:] return 0.5*asset_wt_df.loc[ind, :].sub( asset_wt_df.shift(-1).loc[ind, :]).abs().sum(axis = 1).sum() def cvar_cf(series, p = .01): """ CVaR (Expected Shortfall), using the `Cornish Fisher Approximation `_ :ARGS: series: :class:`pandas.Series` or :class:`pandas.DataFrame` of the asset prices p: :class:`float` of the desired percentile, defaults to .01 or the 1% CVaR :RETURNS: :class:`float` or :class:`pandas.Series` of the CVaR """ def _cvar_cf(series, p): ppf = scipy.stats.norm.ppf pdf = scipy.stats.norm.pdf series_rets = log_returns(series) mu, sigma = series_rets.mean(), series_rets.std() skew, kurt = series_rets.skew(), series_rets.kurtosis() - 3. f = lambda x, skew, kurt: x + skew/6.*(x**2 - 1) + kurt/24.* x * ( x**2 - 3.) - skew**2/36. * x * (2. * x**2 - 5.) loss = f(x = 1/p*(pdf(ppf(p))), skew = skew, kurt = kurt) * sigma - mu return numpy.exp(loss) - 1. 
if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _cvar_cf(x, p = p)) else: return _cvar_cf(series, p = p) def cvar_contrib(wts, prices, alpha = .10, n_sims = 100000): """ Calculate each asset's contribution to CVaR based on its ew volatility and correlation to other assets, and its current portfolio weight :ARGS: wts: :class:`Series` of current weights prices: :class:`DataFrame` of prices alpha: :class:`float` of the cvar parameter n_sims: :class:`int` the number of simulations :RETURNS: :class:`pandas.Series` of proportional contribution .. note:: alternative parameters currently the span (for exponentially weighted stats) and phi (for the degrees of freedom of the t-distribution) are not changeable for the function """ def _spectral_fun(alpha, n_sims): th = numpy.ceil(alpha*n_sims) # threshold th = int(th) spc = pandas.Series( numpy.zeros([n_sims,]) ) spc[:th] = 1 return spc/spc.sum() m, n = prices.shape rets = log_returns(prices) cov = pandas.ewmcov(rets, span = 21., min_periods = 21) zs = pandas.Series(numpy.zeros(n,), index = prices.columns) sims = mvt_rnd(mu = zs, covm = cov.iloc[-1, :, :], phi = 3, n_sim = n_sims ) psi = sims.dot(wts) spec = _spectral_fun(alpha = alpha, n_sims = n_sims ) srtd = psi.copy() ind = psi.argsort() srtd.sort() # pandas multiplies using indexes, so remove index #cvar = srtd[ind].dot(spec.values) d = {} for asset in wts.index: d[asset] = sims.loc[ind, asset].dot(spec.values) acvar = pandas.Series(d) tmp = acvar.mul(wts) return tmp/tmp.sum() def cvar_norm(series, p = .01): """ CVaR (Conditional Value at Risk), fitting the normal distribution to the historical time series :ARGS: series: :class:`pandas.Series` or :class:`pandas.DataFrame` of the asset prices p: :class:`float` of the desired percentile, defaults to .01 or the 1% CVaR :RETURNS: :class:`float` or :class:`pandas.Series` of the CVaR """ def _cvar_norm(series, p): pdf = scipy.stats.norm.pdf series_rets = log_returns(series) mu, sigma = 
series_rets.mean(), series_rets.std() var = lambda alpha: scipy.stats.distributions.norm.ppf(1 - alpha) return numpy.exp(sigma/p * pdf(var(p)) - mu) - 1. if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _cvar_norm(x, p = p)) else: return _cvar_norm(series, p = p) def cvar_np(series, p): """ Non-parametric CVaR or Expected Shortfall, solely based on the mean of historical values :ARGS: series: :class:`pandas.Series` or :class:`pandas.DataFrame` of the asset prices p: :class:`float` of the desired percentile, e.g. .01 for the 1% CVaR (required, no default) :RETURNS: :class:`float` or :class:`pandas.Series` of the CVaR """ def _cvar_mu_np(series, p): series_rets = linear_returns(series) var = numpy.percentile(series_rets, p*100.) return -series_rets[series_rets <= var].mean() if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _cvar_mu_np(x, p = p)) else: return _cvar_mu_np(series, p = p) def downcapture(series, benchmark): """ Returns the proportion of ``series``'s cumulative negative returns to ``benchmark``'s cumulative returns, given benchmark's returns were negative in that period :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` of prices to compare ``series`` against :RETURNS: ``float`` of the downcapture of cumulative negative returns .. seealso:: :py:data:`median_downcapture(series, benchmark)` """ def _downcapture(series, benchmark): series_rets = log_returns(series) bench_rets = log_returns(benchmark) index = bench_rets < 0. 
return series_rets[index].mean() / bench_rets[index].mean() if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _downcapture(series, x)) else: return _downcapture(series, benchmark) def downside_deviation(series, freq = 'daily'): """ Returns the volatility of the returns that are less than zero :ARGS: series: ``pandas.Series`` of prices freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: float: of the downside standard deviation """ def _downside_deviation(series, freq = 'daily'): fac = _interval_to_factor(freq) series_rets = log_returns(series) index = series_rets < 0. return series_rets[index].std()*numpy.sqrt(fac) if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _downside_deviation(x, freq = freq)) else: return _downside_deviation(series, freq = freq) def drawdown(series): """ Returns a :class:`pandas.Series` or :class:`pandas.DataFrame` (same as input) of the drawdown, i.e. distance from rolling cumulative maximum. Values are negative specifically to be used in plots :ARGS: series: :class:`pandas.Series` or :class:`pandas.DataFrame` of prices :RETURNS: same type as input .. code:: drawdown = vwp.drawdown(price_df) """ def _drawdown(series): dd = (series/series.cummax() - 1.) dd[0] = numpy.nan return dd if isinstance(series, pandas.DataFrame): return series.apply(_drawdown) else: return _drawdown(series) def ew_vol(series, theta = 0.94, freq = 'daily'): """ Returns the exponentially weighted, annualized standard deviation :ARGS: series: :class:`Series` or :class:`DataFrame` of prices theta: coefficient of decay, default BARRA's value of .94 which roughly equates to a span of 33 days freq: :class:`string` of either ['daily', 'monthly', 'quarterly', 'yearly'] """ span = (1. 
+ theta)/(1 - theta) log_rets = log_returns(series) fac = _interval_to_factor(freq) ew_vol = pandas.ewmstd(log_rets, span = span, min_periods = span ) return ew_vol*numpy.sqrt(fac) def geometric_difference(a, b): """ Returns the geometric difference of returns where :ARGS: a: :class:`pandas.Series` or :class:`float` b: :class:`pandas.Series` or :class:`float` :RETURNS: same class as inputs .. math:: \\textrm{GD} = \\frac{(1 + a )}{(1 + b)} - 1 \\\\ """ if isinstance(a, pandas.Series): msg = "index must be equal for pandas.Series" assert a.index.equals(b.index), msg return (1. + a).divide(1. + b) - 1. else: return (1. + a) / (1. + b) - 1. def idiosyncratic_as_proportion(series, benchmark, freq = 'daily'): """ Returns the idiosyncratic risk as a proportion of total variance :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` to compare ``series`` against freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: ``float`` between (0, 1) representing the proportion of total variance represented by idiosyncratic risk """ def _idiosyncratic_as_proportion(series, benchmark, freq = 'daily'): fac = _interval_to_factor(freq) series_rets = log_returns(series) return idiosyncratic_risk(series, benchmark, freq)**2 / ( annualized_vol(series, freq)**2) if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _idiosyncratic_as_proportion( series, x, freq)) else: return _idiosyncratic_as_proportion(series, benchmark, freq) def idiosyncratic_risk(series, benchmark, freq = 'daily'): """ Returns the idiosyncratic risk, i.e. unexplained variation between a price series and a chosen benchmark :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` to compare ``series`` against freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: float: the idiosyncratic volatility (not variance) .. 
note:: Additivity of an asset's Variance An asset's variance can be broken down into systematic risk, i.e. that proportion of risk that can be attributed to some benchmark or risk factor and idiosyncratic risk, or the unexplained variation between the series and the chosen benchmark / factor. Therefore, using the additivity of variances, we can calculate idiosyncratic risk as follows: .. math:: \\sigma^2_{\\textrm{total}} = \\sigma^2_{\\beta} + \\sigma^2_{\\epsilon} + \\sigma^2_{\\epsilon, \\beta}, \\: \\textrm{where}, \\sigma^2_{\\beta} &= \\textrm{variance attributable to systematic risk} \\\\ \\sigma^2_{\\epsilon} &= \\textrm{idiosyncratic risk} \\\\ \\sigma^2_{\\epsilon, \\beta} &= \\textrm{covariance between idiosyncratic and systematic risk, which by definition} = 0 \\\\ \\Rightarrow \\sigma_{\\epsilon} = \\sqrt{\\sigma^2_{\\textrm{total}} - \\sigma^2_{\\beta}} """ def _idiosyncratic_risk(series, benchmark, freq = 'daily'): fac = _interval_to_factor(freq) series_rets = log_returns(series) bench_rets = log_returns(benchmark) series_vol = annualized_vol(series, freq) benchmark_vol = annualized_vol(benchmark, freq) return numpy.sqrt(series_vol**2 - beta(series, benchmark)**2 * ( benchmark_vol ** 2)) if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _idiosyncratic_risk( series, x, freq = freq)) else: return _idiosyncratic_risk(series, benchmark, freq) def information_ratio(series, benchmark, freq = 'daily'): """ A measure of the risk-adjusted return of a financial security or portfolio that is equal to the active return divided by the tracking error between the portfolio and the benchmark (MATE is used here, see the benefits of MATE over TE) series: :class:`pandas.Series` or `pandas.DataFrame` of asset prices benchmark: :class:`pandas.Series` of prices freq: :class:`string` either ['daily', 'monthly', 'quarterly', 'yearly'] indicating the frequency of the data. Default, 'daily' .. 
note:: Calculating Information Ratio .. math:: \\textrm{IR} \\triangleq \\frac{\\alpha}{\\omega} \\\\ where, .. math:: \\alpha &= \\textrm{active return} \\\\ \\omega &= \\textrm{mean absolute tracking error (MATE)} \\\\ """ def _information_ratio(series, benchmark, freq = freq): ar = active_return(series, benchmark, freq = freq) mate = mean_absolute_tracking_error(series, benchmark, freq = freq) return ar / mate if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _information_ratio(series, x, freq = freq)) else: return _information_ratio(series, benchmark, freq = freq) def jensens_alpha(series, benchmark, rfr = 0., freq = 'daily'): """ Returns the `Jensen's Alpha `_ or the excess return based on the systematic risk of the ``series`` relative to the ``benchmark`` :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` of the prices to compare ``series`` against rfr: ``float`` of the risk free rate freq: ``str`` of frequency, either daily, monthly, quarterly, or yearly :RETURNS: ``float`` representing the Jensen's Alpha .. note:: Calculating Jensen's Alpha .. math:: \\alpha_{\\textrm{Jensen}} = r_p - \\left(r_f + \\beta \\cdot (r_b - r_f)\\right) Where, ..
math:: r_p &= \\textrm{annualized linear return of the portfolio} \\\\ r_f &= \\textrm{risk free rate} \\\\ \\beta &= \\frac{\\sigma_{s, b}}{\\sigma^2_{b}} \\\\ r_b &= \\textrm{annualized linear return of the benchmark} """ def _jensens_alpha(series, benchmark, rfr = 0., freq = 'daily'): fac = _interval_to_factor(freq) series_ret = annualized_return(series, freq) bench_ret = annualized_return(benchmark, freq) return series_ret - (rfr + beta(series, benchmark)*( bench_ret - rfr)) if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _jensens_alpha( series, x, rfr = rfr, freq = freq)) else: return _jensens_alpha(series, benchmark, rfr = rfr, freq = freq) def linear_returns(series): """ Returns a series of linear returns given a series of prices :ARGS: series: ``pandas.Series`` of prices :RETURNS: series: ``pandas.Series`` of linear returns .. note:: Calculating Linear Returns .. math:: R_t = \\frac{P_t}{P_{t-1}} - 1 """ def _linear_returns(series): return series.div(series.shift(1)) - 1 if isinstance(series, pandas.DataFrame): return series.apply(_linear_returns) else: return _linear_returns(series) def log_returns(series): """ Returns a series of log returns given a series of prices :ARGS: series: ``pandas.Series`` of prices :RETURNS: series: ``pandas.Series`` of log returns .. note:: Calculating Log Returns .. math:: R_t = \\log(\\frac{P_t}{P_{t-1}}) """ def _log_returns(series): return series.apply(numpy.log).diff() if isinstance(series, pandas.DataFrame): return series.apply(_log_returns) else: return _log_returns(series) def log_returns_chol_adj(frame, theta = 0.94): """ Create volatility adjusted historical returns that preserve the covariance structure while providing appropriately scaled returns to calculate tail-risk measures :ARGS: frame: :class:`DataFrame` of prices theta: :class:`float` of the decay parameter to use for the exponential smoothing :RETURNS: :class:`DataFrame` of vol adjusted log returns preserving the covariance structure ..
note:: Calculation explanation The calculation comes from the Duffie & Pan 1997, where the Cholesky matrix of covariance matrix is used in place of the square root of the variance, in the volatility adjustment, and can be seen in `Value at Risk Models `_, by Carol Alexander Where, .. math:: \\tilde{\\mathbb{x}_t} = \\mathbb{Q}_T\\mathbb{Q}^{-1}_t \\mathbb{x}_t, \\; t = 1, 2, ..., T \\\\ \\\\ Where .. math:: \\tilde{\\mathbb{x}_t} &= \\textrm{ the stock returns } \\textrm{adjusted to have constant covariance } \\\\ \\mathbb{Q}_t &= \\textrm{ the Cholesky matrix of the covariance matrix } \\\\ \\mathbb{x}_t &= \\textrm{ the unadjusted stock returns} """ #define the truncated functions dot = numpy.dot inv = numpy.linalg.inv chol = numpy.linalg.cholesky span = (1. + theta)/(1 - theta) log_rets = log_returns(frame) ew_cov = pandas.ewmcov(log_rets, span = span, min_periods = span ) q_T = chol(ew_cov.iloc[-1, :, :]) d = {} for row in ew_cov.dropna().items: q_t = chol(ew_cov.loc[row, :, :]) d[row] = pandas.Series(dot(log_rets.xs(row), dot(q_T, inv(q_t))), index = log_rets.columns ) new_logs = pandas.DataFrame(d).transpose() return new_logs.reindex(log_rets.index) def log_returns_vol_adj(series, theta = 0.94, freq = 'daily'): """ Returns the volatility scaled log returns :ARGS: series: :class:`Series` or :class:`DataFrame` of prices theta: :class:`float` of the decay parameter to use for the exponential smoothing for volatility freq: :class:`string` from ['daily', 'monthly', 'quarterly', 'yearly'] :RETURNS: volatility scaled log returns of the same dtype provided .. note:: Calculating Vol Adjustment Factor .. 
math:: \\tilde{r}_{t,T} = \\frac{\\sigma_T}{\\sigma_t}\\cdot r_{t} This methodology is most commonly used to scale historical returns when calculating VaR and CVaR """ def _log_ret_vol_adj(series, theta, freq): log_rets = log_returns(series) vol = ew_vol(series, theta = theta, freq = freq) scale = vol.iloc[-1] / vol return log_rets.mul(scale) if isinstance(series, pandas.DataFrame): return series.apply( lambda x: _log_ret_vol_adj(x, theta = theta, freq = freq) ) else: return _log_ret_vol_adj(series, theta = theta, freq = freq) def max_drawdown(series): """ Returns the maximum drawdown, or the maximum peak to trough linear distance, as a positive drawdown value :ARGS: series: ``pandas.Series`` of prices :RETURNS: float: the maximum drawdown of the period, expressed as a positive number .. code:: import visualize_wealth.analyze as vwa max_dd = vwa.max_drawdown(price_series) """ def _max_drawdown(series): return numpy.max(1 - series/series.cummax()) if isinstance(series, pandas.DataFrame): return series.apply(_max_drawdown) else: return _max_drawdown(series) def mctr(asset_df, portfolio_series): """ Return a :class:`pandas.Series` of the marginal contribution to risk ("mctr") for each of the assets that construct ``portfolio_series`` :ARGS: asset_df: :class:`pandas.DataFrame` of asset prices portfolio_series: :class:`pandas.Series` of the portfolio value that is constructed by ``asset_df`` :RETURNS: a :class:`pandas.Series` of each of the asset's marginal contribution to risk .. note:: Calculating Marginal Contribution to Risk If we define, :math:`MCTR_i` to be the Marginal Contribution to Risk for asset :math:`i`, then, .. math:: MCTR_i &= \\sigma_i \\cdot \\rho_{i, P} \\\\ Where, .. math:: \\sigma_i &= \\textrm{volatility of asset } i, \\\\ \\rho_{i, P} &= \\textrm{correlation of asset } i \\textrm{ with the Portfolio} ..
note:: Reference for Further Reading MSCI Barra published an extensive (and easy to read) white paper entitled `Risk Contribution `_ that explicitly details the risk exposure calculation. """ asset_rets = log_returns(asset_df) port_rets = log_returns(portfolio_series) return asset_rets.corrwith(port_rets).mul(asset_rets.std()) def mean_absolute_tracking_error(series, benchmark, freq = 'daily'): """ Returns Carol Alexander's calculation for Mean Absolute Tracking Error ("MATE"). :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` to compare ``series`` against freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: ``float`` of the mean absolute tracking error .. note:: Why Mean Absolute Tracking Error One of the downfalls of `Tracking Error `_ ("TE") is that price series that diverge at a constant rate **may** have low TE. MATE addresses this issue. .. math:: \\textrm{MATE} = \\sqrt{\\frac{(T-1)}{T}\\cdot \\tau^2 + \\bar{R}^2} \\: \\textrm{where} \\tau &= \\textrm{Tracking Error} \\\\ \\bar{R} &= \\textrm{mean of the active returns} """ def _mean_absolute_tracking_error(series, benchmark, freq = 'daily'): active_rets = active_returns(series = series, benchmark = benchmark) N = active_rets.shape[0] return numpy.sqrt((N - 1)/float(N) * tracking_error( series, benchmark, freq)**2 + active_rets.mean()**2) if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _mean_absolute_tracking_error( series, x, freq = freq)) else: return _mean_absolute_tracking_error(series, benchmark, freq = freq) def median_downcapture(series, benchmark): """ Returns the median downcapture of a ``series`` of prices against a ``benchmark`` of prices, given that the ``benchmark`` achieved negative returns in a given period :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` of prices to compare ``series`` against :RETURNS: ``float`` of the median downcapture ..
warning:: About Downcapture Downcapture can be a difficult statistic to validate. As downcapture is :math:`\\frac{\\sum{r_{\\textrm{series}}}} {\\sum{r_{b|r_b < 0}}}` or the ratio of the median values (in this case), dividing by small numbers can have an outsized effect on the overall value of this statistic. Therefore, it's good to do a "sanity check" between the median and the mean-based downcapture values """ def _median_downcapture(series, benchmark): series_rets = log_returns(series) bench_rets = log_returns(benchmark) index = bench_rets < 0. return series_rets[index].median() / bench_rets[index].median() if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _median_downcapture(series, x)) else: return _median_downcapture(series, benchmark) def median_upcapture(series, benchmark): """ Returns the median upcapture of a ``series`` of prices against a ``benchmark`` of prices, given that the ``benchmark`` achieved positive returns in a given period :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` of prices to compare ``series`` against :RETURNS: float: of the median upcapture .. warning:: About Upcapture Upcapture can be a difficult statistic to validate. As upcapture is :math:`\\frac{\\sum{r_{\\textrm{series}}}} {\\sum{r_{b|r_b > 0}}}` or the ratio of the median values (in this case), dividing by small numbers can have an outsized effect on the overall value of this statistic. Therefore, it's good to do a "sanity check" between ``median_upcapture`` and ``upcapture`` """ def _median_upcapture(series, benchmark): series_rets = log_returns(series) bench_rets = log_returns(benchmark) index = bench_rets > 0.
return series_rets[index].median() / bench_rets[index].median() if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _median_upcapture(series, x)) else: return _median_upcapture(series, benchmark) def mvt_rnd(mu, covm, phi, n_sim): """ Create ``n_sim`` simulations of a multivariate t distribution with ``phi`` degrees of freedom, mean ``mu``, and covariance structure ``covm`` :ARGS: mu: :class:`Series` of average returns covm: :class:`DataFrame` of the assets' covariance matrix phi: :class:`float` of t-distribution degrees of freedom n_sim: :class:`int` of the number of simulations to make :RETURNS: :class:`DataFrame` of simulations with shape (n_sim, len(mu)) .. note:: Transformation taken from `Kenny Chowdary's website `_ """ d = len(covm) g = numpy.tile(numpy.random.gamma(phi/2., 2./phi, n_sim), (d, 1)).T Z = numpy.random.multivariate_normal(numpy.zeros(d), covm, n_sim) ret = mu.values + Z/numpy.sqrt(g) return pandas.DataFrame(ret, columns = mu.index) def period_returns(series, freq = 'daily', interval = 'quarterly'): """ Return the disjoint periodic returns of ``series`` at ``interval``, given that the frequency of the data in ``series`` is ``freq``.
:ARGS: series: :class:`pandas.Series` of prices freq: :class:`string` in ['daily', 'monthly', 'quarterly', 'yearly'] of the frequency of the data interval: :class:`string` of the periodicity of the interval you wish to return, in ['monthly', 'quarterly', 'yearly'] :RETURNS: :class:`pandas.Series` """ def _period_returns(series, freq, interval): fmat = {'monthly': lambda x: '{0}-{1}'.format(x.month, x.year), 'quarterly': lambda x: 'q{0}-{1}'.format(x.quarter, x.year), 'yearly': lambda x: x.year } chunks = zipped_time_chunks(series.index, interval) dt_l = [] d = {} for beg, fin in chunks: key = fmat[interval](beg) d[key] = annualized_return(series[beg:fin], freq = freq) dt_l.append(key) return pandas.Series(d, index = dt_l) if isinstance(series, pandas.Series): return _period_returns(series = series, freq = freq, interval = interval) else: return series.apply(lambda x: _period_returns(series = x, freq = freq, interval = interval) ) def period_volatility(series, freq = 'daily', interval = 'quarterly'): """ Return the disjoint periodic volatility of series at interval, given the time frequency of the data in series is freq. 
:ARGS: series: :class:`pandas.Series` of prices freq: :class:`string` in ['daily', 'monthly', 'quarterly', 'yearly'] of the frequency of the data interval: :class:`string` of the periodicity of the interval you wish to return, in ['monthly', 'quarterly', 'yearly'] :RETURNS: :class:`pandas.Series` """ def _period_volatility(series, freq, interval): fmat = {'monthly': lambda x: '{0}-{1}'.format(x.month, x.year), 'quarterly': lambda x: 'q{0}-{1}'.format(x.quarter, x.year), 'yearly': lambda x: x.year } chunks = zipped_time_chunks(series.index, interval) dt_l = [] d = {} for beg, fin in chunks: key = fmat[interval](beg) d[key] = annualized_vol(series[beg:fin], freq = freq) dt_l.append(key) return pandas.Series(d, index = dt_l) if isinstance(series, pandas.Series): return _period_volatility(series = series, freq = freq, interval = interval) else: return series.apply(lambda x: _period_volatility(series = x, freq = freq, interval = interval) ) def r2(series, benchmark): """ Returns the R-Squared or `Coefficient of Determination `_ for a univariate regression (does not adjust for more independent variables) .. 
seealso:: :meth:`r2_adj` :ARGS: series: :class:`pandas.Series` of log returns benchmark: :class:`pandas.Series` of log returns to regress ``series`` against :RETURNS: float: of the coefficient of determination """ def _r_squared(x, y): X = pandas.DataFrame({'ones': 1., 'xs': x}) beta = numpy.linalg.inv(X.transpose().dot(X)).dot( X.transpose().dot(y) ) y_est = beta[0] + beta[1]*x ss_res = ((y_est - y)**2).sum() ss_tot = ((y - y.mean())**2).sum() return 1 - ss_res/ss_tot if isinstance(benchmark, pandas.DataFrame): #remove the numpy.nan's if they're there if (benchmark.iloc[0, :].isnull().all()) & (numpy.isnan(series.iloc[0])): benchmark = benchmark.dropna() series = series.dropna() return benchmark.apply(lambda x: _r_squared(x = x, y = series)) else: if (numpy.isnan(benchmark.iloc[0])) & (numpy.isnan(series.iloc[0])): benchmark = benchmark.dropna() series = series.dropna() return _r_squared(y = series, x = benchmark) def r2_adj(series, benchmark): """ The Adjusted R-Squared that incorporates the number of independent variates using the `formula found on Wikipedia `_ :ARGS: series: :class:`pandas.Series` of asset returns benchmark: :class:`pandas.DataFrame` of benchmark returns to explain the returns of the ``series`` :RETURNS: :class:`float` of the adjusted r-squared """ n = len(series) p = 1 return 1 - (1 - r2(series, benchmark))*(n - 1)/(n - p - 1) def r2_mv_adj(x, y): """ Returns the adjusted R-Squared for multivariate regression """ n = len(y) p = x.shape[1] return 1 - (1 - r2_mv(x, y))*(n - 1)/(n - p - 1) def r2_mv(x, y): """ Multivariate r-squared """ ones = pandas.Series(numpy.ones(len(y)), name = 'ones') d = x.to_dict() d['ones'] = ones cols = ['ones'] cols.extend(x.columns) X = pandas.DataFrame(d, columns = cols) beta = numpy.linalg.inv(X.transpose().dot(X)).dot( X.transpose().dot(y) ) y_est = beta[0] + x.dot(beta[1:]) ss_res = ((y_est - y)**2).sum() ss_tot = ((y -
y.mean())**2).sum() return 1 - ss_res/ss_tot def risk_adjusted_excess_return(series, benchmark, rfr = 0., freq = 'daily'): """ Returns the MMRAP or the `Modigliani Risk Adjusted Performance `_ that calculates the excess return from the `Capital Allocation Line `_, at the same level of risk (or volatility) :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` against which to compare ``series`` rfr: ``float`` of the risk free rate freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: ``float`` of the risk adjusted excess performance .. note:: Calculating Risk Adjusted Excess Returns .. math:: raer = r_p - \\left(\\textrm{SR}_b \\cdot \\sigma_p + r_f\\right) Where, .. math:: r_p &= \\textrm{annualized linear return} \\\\ \\textrm{SR}_b &= \\textrm{Sharpe Ratio of the benchmark} \\\\ \\sigma_p &= \\textrm{volatility of the portfolio} \\\\ r_f &= \\textrm{Risk free rate} """ def _risk_adjusted_excess_return(series, benchmark, rfr = 0., freq = 'daily'): benchmark_sharpe = sharpe_ratio(benchmark, rfr, freq) annualized_ret = annualized_return(series, freq) series_vol = annualized_vol(series, freq) return annualized_ret - series_vol * benchmark_sharpe - rfr if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _risk_adjusted_excess_return( series, x, rfr = rfr, freq = freq)) else: return _risk_adjusted_excess_return(series, benchmark, rfr = rfr, freq = freq) def risk_contribution(mctr_series, weight_series): """ Returns the risk contribution for each asset, given the marginal contribution to risk ("mctr") and the ``weight_series`` of asset weights :ARGS: mctr_series: :class:`pandas.Series` of the marginal risk contribution weight_series: :class:`pandas.Series` of weights of each asset :RETURNS: :class:`pandas.Series` of the risk contribution of each asset ..
note:: Calculating Risk Contribution If :math:`RC_i` is the Risk Contribution of asset :math:`i`, and :math:`\\omega_i` is the weight of asset :math:`i`, then .. math:: RC_i = mctr_i \\cdot \\omega_i .. seealso:: :meth:`mctr` for Marginal Contribution to Risk ("mctr") as well as the `Risk Contribution `_ paper from MSCI Barra """ return mctr_series.mul(weight_series) def risk_contribution_as_proportion(mctr_series, weight_series): """ Returns the proportion of the risk contribution for each asset, given the marginal contribution to risk ("mctr") and the ``weight_series`` of asset weights :ARGS: mctr_series: :class:`pandas.Series` of the marginal risk contribution weight_series: :class:`pandas.Series` of weights of each asset :RETURNS: :class:`pandas.Series` of the proportional risk contribution of each asset .. seealso:: :meth:`mctr` for Marginal Contribution to Risk ("mctr") as well as the `Risk Contribution `_ paper from MSCI Barra """ rc = mctr_series.mul(weight_series) return rc/rc.sum() def rolling_ui(series, window = 21): """ Returns the rolling ulcer index over a series for a given ``window``. :ARGS: series: ``pandas.Series`` of prices window: ``int`` of the size of the rolling window :RETURNS: ``pandas.Series``: of the rolling ulcer index .. code:: import visualize_wealth.analyze as vwa ui = vwa.rolling_ui(price_series, window = 252) """ def _rolling_ui(series, window = 21): rui = pandas.Series(numpy.tile(numpy.nan, [len(series),]), index = series.index, name = 'rolling UI') j = 0 for i in numpy.arange(window, len(series)): rui.iloc[i] = ulcer_index(series.iloc[j:i]) j += 1 return rui if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _rolling_ui(x, window = window)) else: return _rolling_ui(series, window = window) def adj_sharpe_ratio(series, rfr = 0., freq = 'daily'): """ Returns the `Adjusted Sharpe Ratio `_ of an asset, taking into account the kurtosis and skew of the returns.
:ARGS: series: ``pandas.Series`` of prices rfr: ``float`` of the risk free rate freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: ``float`` of the Adjusted Sharpe Ratio .. note:: Calculating Adjusted Sharpe .. math:: \\textrm{SR}_{\\textrm{adj}} = \\textrm{SR} \\cdot (1 + \\frac{S}{6}\\cdot \\textrm{SR} - \\frac{K - 3}{24} \\cdot \\textrm{SR}^2) \\textrm{where}, \\textrm{SR} &= \\textrm{Sharpe Ratio of the series} \\\\ S &= \\textrm{skewness of the log returns} \\\\ K &= \\textrm{kurtosis of the log returns, so that } K - 3 \\textrm{ is the excess kurtosis} """ def _adj_sharpe_ratio(series, rfr = 0., freq = 'daily'): sr = sharpe_ratio(series, rfr = rfr, freq = freq) skew = log_returns(series).skew() #pandas' kurt() already returns the excess kurtosis, i.e. K - 3 ex_kurt = log_returns(series).kurt() return sr * (1 + skew/6. * sr - ex_kurt/24. * sr**2) if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _adj_sharpe_ratio(x, rfr = rfr, freq = freq)) else: return _adj_sharpe_ratio(series, rfr = rfr, freq = freq) def sharpe_ratio(series, rfr = 0., freq = 'daily'): """ Returns the `Sharpe Ratio `_ of an asset, given a price series, risk free rate of ``rfr``, and ``freq`` of the time series :ARGS: series: ``pandas.Series`` of prices rfr: ``float`` of the risk free rate freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: ``float`` of the Sharpe Ratio .. note:: Calculating Sharpe ..
math:: \\textrm{SR} = \\frac{(R_p - r_f)}{\\sigma} \\: \\textrm{where}, R_p &= \\textrm{series annualized return} \\\\ r_f &= \\textrm{Risk free rate} \\\\ \\sigma &= \\textrm{Portfolio annualized volatility} """ def _sharpe_ratio(series, rfr = 0., freq = 'daily'): return (annualized_return(series, freq) - rfr)/annualized_vol( series, freq) if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _sharpe_ratio(x, rfr = rfr, freq = freq)) else: return _sharpe_ratio(series, rfr = rfr, freq = freq) def sortino_ratio(series, freq = 'daily', rfr = 0.0): """ Returns the `Sortino Ratio `_, or excess returns per unit downside volatility :ARGS: series: ``pandas.Series`` of prices freq: ``str`` of either ``daily, monthly, quarterly, or yearly`` indicating the frequency of the data ``default=`` daily :RETURNS: float of the Sortino Ratio .. note:: Calculating the Sortino Ratio There are several calculation methodologies for the Sortino Ratio, this method using downside volatility, where .. math:: \\textrm{Sortino Ratio} = \\frac{(R-r_f)} {\\sigma_\\textrm{downside}} .. 
code:: import visualize_wealth.analyze as vwa sortino_ratio = vwa.sortino_ratio(price_series, freq = 'monthly') """ def _sortino_ratio(series, freq = 'daily', rfr = 0.0): return (annualized_return(series, freq = freq) - rfr)/downside_deviation( series, freq = freq) if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _sortino_ratio(x, freq = freq, rfr = rfr)) else: return _sortino_ratio(series, freq = freq, rfr = rfr) def systematic_as_proportion(series, benchmark, freq = 'daily'): """ Returns the systematic risk as a proportion of total volatility :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` to compare ``series`` against freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: ``float`` between (0, 1) representing the proportion of volatility represented by systematic risk """ def _systematic_as_proportion(series, benchmark, freq = 'daily'): fac = _interval_to_factor(freq) return systematic_risk(series, benchmark, freq) **2 / ( annualized_vol(series, freq)**2) if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _systematic_as_proportion( series, x, freq)) else: return _systematic_as_proportion(series, benchmark, freq) def systematic_risk(series, benchmark, freq = 'daily'): """ Returns the systematic risk, or the volatility that is directly attributable to the benchmark :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` to compare ``series`` against freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: ``float`` of the systematic volatility (not variance) .. note:: Calculating Systematic Risk ..
math:: \\sigma_b &= \\textrm{Volatility of the Benchmark} \\\\ \\sigma^2_{\\beta} &= \\textrm{Systematic Risk} \\\\ \\beta &= \\frac{\\sigma_{s, b}}{\\sigma^2_{b}} \\\\ \\textrm{then,} \\: \\sigma^2_{\\beta} &= \\beta^2 \\cdot \\sigma^2_{b} \\\\ \\Rightarrow \\sigma_{\\beta} &= \\beta \\cdot \\sigma_{b} """ def _systematic_risk(series, benchmark, freq = 'daily'): bench_rets = log_returns(benchmark) benchmark_vol = annualized_vol(benchmark, freq) return benchmark_vol * beta(series, benchmark) if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _systematic_risk(series, x, freq)) else: return _systematic_risk(series, benchmark, freq) def tracking_error(series, benchmark, freq = 'daily'): """ Returns a ``float`` of the `Tracking Error `_ or standard deviation of the active returns :ARGS: series: ``pandas.Series`` of prices benchmark: ``pandas.Series`` to compare ``series`` against freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: ``float`` of the tracking error .. note:: Calculating Tracking Error Let :math:`r_{a_i}` be the "Active Return" for period :math:`i`, i.e. the compound linear difference between :math:`r_s` and :math:`r_b`, .. math:: r_{a_i} = \\frac{(1+r_{s_i})}{(1+r_{b_i})}-1 Then, .. math:: \\textrm{TE} &= \\sigma_a \\cdot \\sqrt{k} \\\\ k &= \\textrm{Annualization factor} """ def _tracking_error(series, benchmark, freq = 'daily'): fac = _interval_to_factor(freq) series_rets = linear_returns(series) bench_rets = linear_returns(benchmark) return ((1 + series_rets).div( 1 + bench_rets) - 1).std()*numpy.sqrt(fac) if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _tracking_error( series, x, freq = freq)) else: return _tracking_error(series, benchmark, freq = freq) def ulcer_index(series): """ Returns the ulcer index of the series, which is defined as the square root of the mean squared drawdown (instead of the squared deviations from the mean).
Further explanation can be found at `Tanger Tools `_ :ARGS: series: ``pandas.Series`` of prices :RETURNS: float: the ulcer index of the period, expressed as a positive number .. code:: import visualize_wealth.analyze as vwa ui = vwa.ulcer_index(price_series) """ def _ulcer_index(series): dd = 1. - series/series.cummax() ssdd = numpy.sum(dd**2) return numpy.sqrt(numpy.divide(ssdd, series.shape[0] - 1)) if isinstance(series, pandas.DataFrame): return series.apply(_ulcer_index) else: return _ulcer_index(series) def upcapture(series, benchmark): """ Returns the proportion of ``series``'s cumulative returns to ``benchmark``'s cumulative returns, over the periods in which the benchmark's returns were positive :ARGS: series: :class:`pandas.Series` of prices benchmark: :class:`pandas.Series` of prices to compare ``series`` against :RETURNS: float: of the upcapture of cumulative positive returns .. seealso:: :meth:`median_upcapture` """ def _upcapture(series, benchmark): series_rets = log_returns(series) bench_rets = log_returns(benchmark) index = bench_rets > 0. return series_rets[index].mean() / bench_rets[index].mean() if isinstance(benchmark, pandas.DataFrame): return benchmark.apply(lambda x: _upcapture(series, x)) else: return _upcapture(series, benchmark) def upside_deviation(series, freq = 'daily'): """ Returns the volatility of the returns that are greater than zero :ARGS: series: :class:`pandas.Series` of prices freq: ``str`` of frequency, either ``daily, monthly, quarterly, or yearly`` :RETURNS: ``float`` of the upside standard deviation """ def _upside_deviation(series, freq = 'daily'): fac = _interval_to_factor(freq) series_rets = log_returns(series) index = series_rets > 0.
return series_rets[index].std()*numpy.sqrt(fac) if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _upside_deviation(x, freq = freq)) else: return _upside_deviation(series, freq) def var_cf(series, p = .01): """ VaR (Value at Risk), using the `Cornish Fisher Approximation `_. :ARGS: series: :class:`pandas.Series` or :class:`pandas.DataFrame` of prices p: :class:`float` of the :math:`\\alpha` quantile :RETURNS: :class:`float` or :class:`pandas.Series` of the VaR, where skew and kurtosis are used to adjust the tail density estimation (using the Cornish Fisher Approximation) """ series_rets = log_returns(series) mu, sigma = series_rets.mean(), series_rets.std() #pandas' kurtosis() already returns the excess kurtosis skew, kurt = series_rets.skew(), series_rets.kurtosis() v = lambda alpha: scipy.stats.distributions.norm.ppf(1 - alpha) V = v(p)+(1-v(p)**2)*skew/6+(5*v(p)-2*v(p)**3)*skew**2/36 + ( v(p)**3-3*v(p))*kurt/24 return numpy.exp(sigma * V - mu) - 1 def var_norm(series, p = .01): """ Value at Risk ("VaR") of the :math:`p = \\alpha` quantile defines the loss such that there is a probability :math:`\\alpha` of a loss greater than or equal to :math:`\\textrm{VaR}_\\alpha`. :meth:`var_norm` fits a normal distribution to the log returns of the series, and then estimates the :math:`\\textrm{VaR}_\\alpha` :ARGS: series: :class:`pandas.Series` or :class:`pandas.DataFrame` of prices p: :class:`float` of the :math:`\\alpha` quantile for which to estimate VaR :RETURNS: :class:`float` or :class:`pandas.Series` of VaR .. note:: Derivation of Value at Risk Let :math:`Y \\sim N(\\mu, \\sigma^2)`, we choose :math:`y_\\alpha` such that :math:`\\mathbb{P}(Y < y_\\alpha) = \\alpha`. Then, ..
math:: \\mathbb{P}(Y < y_\\alpha) &= \\alpha \\\\ \\Rightarrow \\mathbb{P}(\\frac{Y - \\mu}{\\sigma} < \\frac{y_\\alpha - \\mu}{\\sigma}) &= \\alpha \\\\ \\Rightarrow \\mathbb{P}(Z < \\frac{y_\\alpha - \\mu}{\\sigma}) &= \\alpha \\\\ \\Rightarrow \\Phi(\\frac{y_\\alpha - \\mu}{\\sigma} ) &= \\alpha, where :math:`\\Phi(.)` is the standard normal cdf operator. Then using the inverse of the function :math:`\\Phi`, we have: .. math:: \\Phi^{-1}( \\Phi(\\frac{y_\\alpha - \\mu}{\\sigma} ) ) &= \\Phi^{-1}(\\alpha) \\\\ \\Rightarrow \\Phi^{-1}(\\alpha)\\cdot\\sigma + \\mu &= y_\\alpha But :math:`y_\\alpha` is negative and VaR is always positive, so, .. math:: VaR_\\alpha = -y_\\alpha &= -\\Phi^{-1} (\\alpha)\\cdot\\sigma - \\mu \\\\ &= \\Phi^{-1}(1 - \\alpha)\\cdot\\sigma - \\mu \\\\ .. seealso:: :meth:`var_cf` :meth:`var_np` """ def _var_norm(series, p): series_rets = log_returns(series) mu, sigma = series_rets.mean(), series_rets.std() v = lambda alpha: scipy.stats.distributions.norm.ppf(1 - alpha) return numpy.exp(sigma * v(p) - mu) - 1 if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _var_norm(x, p = p)) else: return _var_norm(series, p = p) def var_np(series, p = .01): """ Return the non-parametric estimate of VaR for a given quantile ``p``, i.e. the single-period loss that is exceeded with a probability of less than ``p``. :ARGS: series: ``pandas.Series`` of prices p: ``float`` of the quantile at which to calculate VaR :RETURNS: float of the Value at Risk given the quantile ``p`` ..
code:: import visualize_wealth.analyze as vwa var = vwa.var_np(price_series, p = 0.1) """ def _var_np(series, p = .01): #drop the leading numpy.nan so the percentile is well defined series_rets = linear_returns(series).dropna() #loss is always reported as positive return -1 * (numpy.percentile(series_rets, p*100.)) if isinstance(series, pandas.DataFrame): return series.apply(lambda x: _var_np(x, p = p)) else: return _var_np(series, p = p) def _interval_to_factor(interval): factor_dict = {'daily': 252, 'monthly': 12, 'quarterly': 4, 'yearly': 1} return factor_dict[interval] def _bool_interval_index(pandas_index, interval = 'monthly'): """ Creates weekly, monthly, quarterly, or yearly intervals by creating a boolean index to be passed via DataFrame.ix[bool_index, :] """ weekly = lambda x: x.weekofyear[1:] != x.weekofyear[:-1] monthly = lambda x: x.month[1:] != x.month[:-1] yearly = lambda x: x.year[1:] != x.year[:-1] ldom = lambda x: x.month[1:] != x.month[:-1] fdom = lambda x: numpy.append(False, x.month[1:]!=x.month[:-1]) qt = lambda x: numpy.append(False, x.quarter[1:]!=x.quarter[:-1]) time_dict = {'weekly':weekly, 'monthly': monthly, 'quarterly': qt, 'yearly': yearly, 'ldom':ldom, 'fdom':fdom} return time_dict[interval](pandas_index) ================================================ FILE: visualize_wealth/classify.py ================================================ #!/usr/bin/env python # encoding: utf-8 """ .. module:: visualize_wealth.classify.py Created by Benjamin M.
Gross """ import argparse import datetime import numpy import pandas import os def classify_series_with_store(series, trained_series, store_path, calc_meth = 'x-inv-x', n = None): """ Determine the asset class of price series from an existing HDFStore with prices :ARGS: series: :class:`pandas.Series` or `pandas.DataFrame` of the price series to determine the asset class of trained_series: :class:`pandas.Series` of tickers and their respective asset classes store_path: :class:`string` of the location of the HDFStore to find asset prices calc_meth: :class:`string` of either ['x-inv-x', 'inv-x', 'exp-x'] to determine which calculation method is used n: :class:`integer` of the number of highest r-squared assets to include when classifying a new asset :RETURNS: :class:`string` of the tickers that have been estimated based on the method provided """ from .utils import index_intersect from .analyze import log_returns, r2_adj if series.name in trained_series.index: return trained_series[series.name] else: try: store = pandas.HDFStore(path = store_path, mode = 'r') except IOError: print store_path + " is not a valid path to HDFStore" return rsq_d = {} ys = log_returns(series) for key in store.keys(): key = key.strip('/') p = store.get(key) xs = log_returns(p['Adj Close']) ind = index_intersect(xs, ys) rsq_d[key] = r2_adj(benchmark = ys[ind], series = xs[ind]) rsq_df = pandas.Series(rsq_d) store.close() if not n: n = len(trained_series.unique()) + 1 return __weighting_method_agg_fun(series = rsq_df, trained_series = trained_series, n = n, calc_meth = calc_meth) def classify_series_with_online(series, trained_series, calc_meth = 'x-inv-x', n = None): """ Determine the asset class of price series from an existing HDFStore with prices :ARGS: series: :class:`pandas.Series` or `pandas.DataFrame` of the price series to determine the asset class of trained_series: :class:`pandas.Series` of tickers and their respective asset classes calc_meth: :class:`string` of either ['x-inv-x', 
'inv-x', 'exp-x'] to determine which calculation method is used n: :class:`integer` of the number of highest r-squared assets to include when classifying a new asset :RETURNS: :class:`string` of the tickers that have been estimated based on the method provided """ from .utils import tickers_to_dict, index_intersect from .analyze import log_returns, r2_adj if series.name in trained_series.index: return trained_series[series.name] else: price_dict = tickers_to_dict(trained_series.index) rsq_d = {} ys = log_returns(series) for key in price_dict.keys(): p = price_dict[key] xs = log_returns(p['Adj Close']) ind = index_intersect(xs, ys) rsq_d[key] = r2_adj(benchmark = xs[ind], series = ys[ind]) rsq_df = pandas.Series(rsq_d) if not n: n = len(trained_series.unique()) + 1 return __weighting_method_agg_fun(series = rsq_df, trained_series = trained_series, n = n, calc_meth = calc_meth) def knn_exp_weighted(series, trained_series, n = None): """ Training data is a m x n matrix with 'training_tickers' as columns and rows of r-squared for different tickers and asset_class is a n x 1 result of the asset class :ARGS: series: :class:`pandas.Series` or :class:`pandas.DataFrame` of r-squared values trained_series: :class:`pandas.Series` of the columns and their respective asset classes n: :class:`integer` of the number of highest r-squared assets to include when classifying a new asset :RETURNS: :class:`string` of the tickers that have been estimated based on the n closest neighbors """ if not n: n = len(trained_series.unique()) + 1 return __weighting_method_agg_fun(series, trained_series, n, calc_meth = 'exp-x') def knn_inverse_weighted(series, trained_series, n = None): """ Training data is a m x n matrix with 'training_tickers' as columns and rows of r-squared for different tickers and asset_class is a n x 1 result of the asset class :ARGS: series: :class:`pandas.Series` or :class:`pandas.DataFrame` of r-squared values trained_series: :class:`pandas.Series` of the columns and 
their respective asset classes n: :class:`integer` of the number of highest r-squared assets to include when classifying a new asset :RETURNS: :class:`string` of the tickers that have been estimated based on the n closest neighbors """ if not n: n = len(trained_series.unique()) + 1 return __weighting_method_agg_fun(series, trained_series, n, calc_meth = 'inv-x') def knn_wt_inv_weighted(series, trained_series, n = None): """ Training data is an m x n matrix with 'training_tickers' as columns and rows of r-squared for different tickers and asset_class is an n x 1 result of the asset class :ARGS: series: :class:`pandas.Series` or :class:`pandas.DataFrame` of r-squared values trained_series: :class:`pandas.Series` of the columns and their respective asset classes n: :class:`integer` of the number of highest r-squared assets to include when classifying a new asset :RETURNS: :class:`string` of the tickers that have been estimated based on the n closest neighbors """ if not n: n = len(trained_series.unique()) + 1 return __weighting_method_agg_fun(series, trained_series, n, calc_meth = 'x-inv-x') def __weighting_method_agg_fun(series, trained_series, n, calc_meth): """ Generator function for the different calculation methods to determine the asset class based on a Series or DataFrame of r-squared values :ARGS: series: :class:`pandas.Series` or :class:`pandas.DataFrame` of r-squared values trained_series: :class:`pandas.Series` of the columns and their respective asset classes n: :class:`integer` of the number of highest r-squared assets to include when classifying a new asset calc_meth: :class:`string` of either ['x-inv-x', 'inv-x', 'exp-x'] to determine which calculation method is used :RETURNS: :class:`string` of the asset class that has been estimated based on the n closest neighbors, or 'series' in the case when a :class:`DataFrame` has been provided instead of a :class:`Series` """ def weighting_method_agg_fun(series, trained_series, n, calc_meth): weight_map = {'x-inv-x': 
lambda x: x.div(1. - x), 'inv-x': lambda x: 1./(1. - x), 'exp-x': lambda x: numpy.exp(x) } key_map = trained_series[series.index] series = series.rename(index = key_map) wts = weight_map[calc_meth](series) wts = wts.sort(ascending = False, inplace = False) grp = wts[:n].groupby(wts[:n].index).sum() return grp.argmax() if isinstance(series, pandas.DataFrame): return series.apply( lambda x: weighting_method_agg_fun(x, trained_series, n, calc_meth), axis = 1) else: return weighting_method_agg_fun(series, trained_series, n, calc_meth) ================================================ FILE: visualize_wealth/construct_portfolio.py ================================================ #!/usr/bin/env python # encoding: utf-8 """ .. module:: visualize_wealth.construct_portfolio.py :synopsis: Engine to construct portfolios using three general methodologies .. moduleauthor:: Benjamin M. Gross """ import argparse import logging import pandas import numpy import pandas.io.data import datetime import urllib2 from .utils import (_open_store, tradeplus_tchunks, zipped_time_chunks ) def format_blotter(blotter_file): """ Pass in either a location of a blotter file (in ``.csv`` format) or blotter :class:`pandas.DataFrame` with all positive values and return a :class:`pandas.DataFrame` where Sell values are then negative values :ARGS: blotter_file: :class:`pandas.DataFrame` with at least index (dates of Buy / Sell) columns = ['Buy/Sell', 'Shares'] or a string of the file location to such a formatted file :RETURNS: blotter: of type :class:`pandas.DataFrame` where sell values have been made negative """ if isinstance(blotter_file, str): blot = pandas.DataFrame.from_csv(blotter_file) elif isinstance(blotter_file, pandas.DataFrame): blot = blotter_file.copy() #map to ascii, dropping any characters that cannot be encoded blot['Buy/Sell'] = map(lambda x: x.encode('ascii', 'ignore'), blot['Buy/Sell']) #remove whitespaces blot['Buy/Sell'] = map(str.strip, blot['Buy/Sell']) #if the Sell values are not negative, make them negative if 
((blot['Buy/Sell'] == 'Sell') & (blot['Shares'] > 0.)).any(): idx = (blot['Buy/Sell'] == 'Sell') & (blot['Shares'] > 0.) sub = blot[idx] sub['Shares'] = -1.*sub['Shares'] blot.update(sub) return blot def append_price_frame_with_dividends(ticker, start_date, end_date=None): """ Given a ticker, start_date, & end_date, return a :class:`pandas.DataFrame` with a 'Dividends' column appended to it :ARGS: ticker: :class:`str` of ticker start_date: :class:`datetime.datetime` or string of format "mm/dd/yyyy" end_date: a :class:`datetime.datetime` or string of format "mm/dd/yyyy" :RETURNS: price_df: a :class:`pandas.DataFrame` with columns ['Close', 'Adj Close', 'Dividends'] .. code:: python import visualize_wealth.construct_portfolio as vwcp frame_with_divs = vwcp.append_price_frame_with_dividends('EEM', '01/01/2000', '01/01/2013') .. warning:: Requires Internet Connectivity Because the function calls the `Yahoo! API `_ internet connectivity is required for the function to work properly """ reader = pandas.io.data.DataReader if isinstance(start_date, str): start_date = datetime.datetime.strptime(start_date, "%m/%d/%Y") if end_date is None: end = datetime.datetime.today() elif isinstance(end_date, str): end = datetime.datetime.strptime(end_date, "%m/%d/%Y") else: end = end_date #construct the dividend data series b_str = 'http://ichart.finance.yahoo.com/table.csv?s=' if end_date is None: end_date = datetime.datetime.today() a = '&a=' + str(start_date.month) b = '&b=' + str(start_date.day) c = '&c=' + str(start_date.year) d = '&d=' + str(end.month) e = '&e=' + str(end.day) f = '&f=' + str(end.year) tail = '&g=v&ignore=.csv' url = b_str + ticker + a + b + c + d + e + f + tail socket = urllib2.urlopen(url) div_df = pandas.io.parsers.read_csv(socket, index_col = 0) price_df = reader(ticker, data_source = 'yahoo', start = start_date, end = end_date) return price_df.join(div_df).fillna(0.0) def calculate_splits(price_df, tol = .1): """ Given a ``price_df`` of the format 
:meth:`append_price_frame_with_dividends`, return a :class:`pandas.DataFrame` with a split factor column named 'Splits' :ARGS: price_df: a :class:`pandas.DataFrame` with columns ['Close', 'Adj Close', 'Dividends'] tol: :class:`float` of the tolerance to determine whether a split has occurred :RETURNS: price: :class:`pandas.DataFrame` with columns ['Close', 'Adj Close', 'Dividends', 'Splits'] .. code:: price_df_with_divs_and_split_ratios = vwcp.calculate_splits( price_df_with_divs, tol = 0.1) .. note:: Calculating Splits This function specifically looks at the ratios of close to adjusted close to determine whether a split has occurred. To see the manual calculations of this function, see ``test_data/estimating when splits have occurred.xlsx`` """ div_mul = 1 - price_df['Dividends'].shift(-1).div(price_df['Close']) rev_cp = div_mul[::-1].cumprod()[::-1] rev_cp[-1] = 1.0 est_adj = price_df['Adj Close'].div(rev_cp) eps = est_adj.div(price_df['Close']) spl_mul = eps.div(eps.shift(1)) did_split = numpy.abs(spl_mul - 1) > tol splits = spl_mul[did_split] for date in splits.index: if splits[date] > 1.0: splits[date] = numpy.round(splits[date], 0) elif splits[date] < 1.0: splits[date] = 1./numpy.round(1./splits[date], 0) splits.name = 'Splits' return price_df.join(splits) def blotter_and_price_df_to_cum_shares(blotter_df, price_df): """ Given a blotter :class:`pandas.DataFrame` of dates, purchases (+/-), and price :class:`pandas.DataFrame` with 'Close', 'Adj Close', 'Dividends', & 'Splits', calculate the cumulative share balance for the position :ARGS: blotter_df: a :class:`pandas.DataFrame` where index is buy/sell dates price_df: a :class:`pandas.DataFrame` with columns ['Close', 'Adj Close', 'Dividends', 'Splits'] :RETURNS: :class:`pandas.DataFrame` containing contributions, withdrawals, price values .. code:: python agg_stats_for_single_asset = vwcp.blotter_and_price_df_to_cum_shares( single_asset_blotter, split_adj_price_frame) .. 
note:: Calculating Position Value The sole reason you can't take the number of trades for a given asset, apply a :meth:`cumsum`, and then multiply by 'Close' for a given day is because of splits. Therefore, once this function has run, taking the cumulative shares and then multiplying by close **is** an appropriate way to determine aggregate position value for any given day """ blotter_df = blotter_df.sort_index() #make sure all dates in the blotter file are also in the price file #consider, if those dates aren't in price frame, assign the #"closest date" value msg = "Buy/Sell Dates not in Price File" assert blotter_df.index.isin(price_df.index).all(), msg #now cumsum the buy/sell chunks and mul by splits for total shares bs_series = pandas.Series() start_dts = blotter_df.index end_dts = blotter_df.index[1:].append( pandas.DatetimeIndex([price_df.index[-1]])) dt_chunks = zip(start_dts, end_dts) end = 0. for i, chunk in enumerate(dt_chunks): #print str(i) + ' of ' + str(len(dt_chunks)) + ' total' tmp = price_df[chunk[0]:chunk[1]][:-1] if chunk[1] == price_df.index[-1]: tmp = price_df[chunk[0]:chunk[1]] splits = tmp[pandas.notnull(tmp['Splits'])] vals = numpy.append(blotter_df['Buy/Sell'][chunk[0]] + end, splits['Splits'].values) dts = pandas.DatetimeIndex([chunk[0]]).append( splits['Splits'].index) tmp_series = pandas.Series(vals, index = dts) tmp_series = tmp_series.cumprod() tmp_series = tmp_series[tmp.index].ffill() bs_series = bs_series.append(tmp_series) end = bs_series[-1] bs_series.name = 'cum_shares' #construct the contributions, withdrawals, & cumulative investment #if a trade is missing a price, assign the 'Close' of that day no_price = blotter_df['Price'][pandas.isnull(blotter_df['Price'])] blotter_df.ix[no_price.index, 'Price'] = price_df.ix[no_price.index, 'Close'] contr = blotter_df['Buy/Sell'].mul(blotter_df['Price']) cum_inv = contr.cumsum() contr = contr[price_df.index].fillna(0.0) cum_inv = cum_inv[price_df.index].ffill() res = 
pandas.DataFrame({'cum_shares':bs_series, 'contr_withdrawal':contr, 'cum_investment':cum_inv}) return price_df.join(res) def construct_random_trades(split_df, num_trades): """ Create random trades on random trade dates, but never allow shares to go negative :ARGS: split_df: :class:`pandas.DataFrame` that has 'Close', 'High', 'Low', 'Dividends', 'Splits' num_trades: :class:`int` of the number of random trades to generate :RETURNS: blotter_frame: :class:`pandas.DataFrame` a blotter with at most ``num_trades`` random trades (duplicate trade dates are dropped) .. note:: Why Create Random Trades? One disappointing aspect of any type of financial software is the fact that you **need** to have a portfolio to view what the software does (which never seemed like an appropriate "necessary" condition to me). Therefore, I've created comprehensive ability to create random trades for single assets, as well as random portfolios of assets, to avoid the "unnecessary condition" of having a portfolio to understand how to analyze one. """ ind = numpy.sort(numpy.random.randint(0, len(split_df), size = num_trades)) #This unique makes sure there aren't double trade day entries #which breaks the function blotter_and_price_df_to_cum_shares ind = numpy.unique(ind) dates = split_df.index[ind] #construct random execution prices prices = [] for date in dates: u_lim = split_df.loc[date, 'High'] l_lim = split_df.loc[date, 'Low'] prices.append(numpy.random.rand()*(u_lim - l_lim + 1) + l_lim) trades = numpy.random.randint(-100, 100, size = len(ind)) trades = numpy.round(trades, -1) while numpy.any(trades.cumsum() < 0): trades[numpy.argmin(trades)] *= -1. return pandas.DataFrame({'Buy/Sell':trades, 'Price':prices}, index = dates) def blotter_to_cum_shares(blotter_series, ticker, start_date, end_date, tol): """ Aggregation function for :meth:`append_price_frame_with_dividends`, :meth:`calculate_splits`, and :meth:`blotter_and_price_df_to_cum_shares`. Only blotter, ticker, start_date, & end_date are needed. 
:ARGS: blotter_series: a :class:`pandas.Series` with index of dates and values of quantity ticker: :class:`str` the ticker for which the buys and sells occur start_date: a :class:`string` or :class:`datetime.datetime` end_date: :class:`string` or :class:`datetime.datetime` tol: :class:`float` the tolerance to find the split dates (.1 recommended) :RETURNS: :class:`pandas.DataFrame` containing contributions, withdrawals, price values .. warning:: Requires Internet Connectivity Because the function calls the `Yahoo! API `_ internet connectivity is required for the function to work properly """ price_df = append_price_frame_with_dividends(ticker, start_date, end_date) #pass the caller's tolerance through to the split detection split_df = calculate_splits(price_df, tol = tol) return blotter_and_price_df_to_cum_shares(blotter_series, split_df) def generate_random_asset_path(ticker, start_date, num_trades): """ Allows the user to input a ticker, start date, and num_trades to generate a :class:`pandas.DataFrame` with columns 'Open', 'Close', 'cum_withdrawals', 'cum_shares' (i.e. bypasses the need for a price :class:`pandas.DataFrame` to generate an asset path, as is required in :meth:`construct_random_trades`) :ARGS: ticker: :class:`string` of the ticker to generate the path start_date: :class:`string` of format 'mm/dd/yyyy' or :class:`datetime` num_trades: :class:`int` of the number of trades to generate :RETURNS: :class:`pandas.DataFrame` with the additional columns 'cum_shares', 'contr_withdrawal', 'Splits', 'Dividends' .. warning:: Requires Internet Connectivity Because the function calls the `Yahoo! 
API `_ internet connectivity is required for the function to work properly """ if isinstance(start_date, str): start_date = datetime.datetime.strptime(start_date, "%m/%d/%Y") end_date = datetime.datetime.today() prices = append_price_frame_with_dividends(ticker, start_date) blotter = construct_random_trades(prices, num_trades) #blotter.to_csv('../tests/' + ticker + '.csv') return blotter_to_cum_shares(blotter_series = blotter, ticker = ticker, start_date = start_date, end_date = end_date, tol = .1) def generate_random_portfolio_blotter(tickers, num_trades): """ :meth:`generate_random_asset_path`, for multiple assets, given a list of tickers and a number of trades (to be used for all tickers). Execution prices are generated at random by :meth:`construct_random_trades` from the 'Low' and 'High' of that ticker in the price DataFrame that is collected :ARGS: tickers: a :class:`list` with the tickers to be used num_trades: :class:`integer`, the number of trades to randomly generate for each ticker :RETURNS: :class:`pandas.DataFrame` with columns 'Ticker', 'Buy/Sell' (+ for buys, - for sells) and 'Price' .. warning:: Requires Internet Connectivity Because the function calls the `Yahoo! API `_ internet connectivity is required for the function to work properly """ blot_d = {} price_d = {} for ticker in tickers: tmp = append_price_frame_with_dividends( ticker, start_date = datetime.datetime(1990, 1, 1)) price_d[ticker] = calculate_splits(tmp) blot_d[ticker] = construct_random_trades(price_d[ticker], num_trades) ind = [] agg_d = {'Ticker':[], 'Buy/Sell':[], 'Price':[]} for ticker in tickers: for date in blot_d[ticker].index: ind.append(date) agg_d['Ticker'].append(ticker) agg_d['Buy/Sell'].append( blot_d[ticker].loc[date, 'Buy/Sell']) agg_d['Price'].append( blot_d[ticker].loc[date, 'Price']) return pandas.DataFrame(agg_d, index = ind) def panel_from_blotter(blotter_df): """ The aggregation function to construct a portfolio given a blotter of tickers, trades, and number of shares. 
:ARGS: blotter_df: a :class:`pandas.DataFrame` with columns ['Ticker', 'Buy/Sell', 'Price'], where the 'Buy/Sell' column is the quantity of shares, (+) for buy, (-) for sell :RETURNS: :class:`pandas.Panel` with dimensions [tickers, dates, price data] .. note:: What to Do with your Panel The :class:`pandas.Panel` returned by this function has all of the necessary information to do some fairly exhaustive analysis. Cumulative investment, portfolio value (simply the ``cum_shares``*``close`` for all assets), closes, opens, etc. You've got a world of information about "your portfolio" with this object... get diggin! """ tickers = pandas.unique(blotter_df['Ticker']) start_date = blotter_df.sort_index().index[0] end_date = datetime.datetime.today() val_d = {} for ticker in tickers: blotter_series = blotter_df[blotter_df['Ticker'] == ticker] #sort_index(inplace = True) returns None, so sort into a new frame blotter_series = blotter_series.sort_index() val_d[ticker] = blotter_to_cum_shares(blotter_series, ticker, start_date, end_date, tol = .1) return pandas.Panel(val_d) def fetch_data_from_store_weight_alloc_method(weight_df, store_path): """ To speed up calculation time and allow for off-line functionality, provide a :class:`pandas.DataFrame` weight_df and point the function to an HDFStore :ARGS: weight_df: a :class:`pandas.DataFrame` with dates as index and tickers as columns store_path: :class:`string` of the location to an HDFStore :RETURNS: :class:`pandas.Panel` where: * :meth:`panel.items` are tickers * :meth:`panel.major_axis` dates * :meth:`panel.minor_axis` price information, specifically: ['Open', 'Close', 'Adj Close'] """ store = _open_store(store_path) beg_port = weight_df.index.min() d = {} for ticker in weight_df.columns: try: d[ticker] = store.get(ticker) except KeyError as key: logging.exception("store.get({0}) ticker failed".format(ticker)) panel = pandas.Panel(d) #Check to make sure the earliest "full data date" is b/f first trade #first_price = max(map(lambda x: panel.loc[x, :, # 'Adj 
Close'].dropna().index.min(), panel.items)) #print the number of consecutive nans #for ticker in weight_df.columns: # print ticker + " " + str(vwa.consecutive(panel.loc[ticker, # first_price:, 'Adj Close'].isnull().astype(int)).max()) store.close() return panel.ffill() def fetch_data_for_weight_allocation_method(weight_df): """ To be used with `The Weight Allocation Method <./readme.html#the-weight-allocation-method>`_ Given a weight_df with index of allocation dates and columns of percentage allocations, fetch the data using Yahoo!'s API and return a panel of dimensions [tickers, dates, price data], where ``price_data`` has columns ``['Open', 'Close', 'Adj Close']``. :ARGS: weight_df: a :class:`pandas.DataFrame` with dates as index and tickers as columns :RETURNS: :class:`pandas.Panel` where: * :meth:`panel.items` are tickers * :meth:`panel.major_axis` dates * :meth:`panel.minor_axis` price information, specifically: ['Open', 'Close', 'Adj Close'] .. warning:: Requires Internet Connectivity Because the function calls the `Yahoo! API `_ internet connectivity is required for the function to work properly """ reader = pandas.io.data.DataReader beg_port = weight_df.index.min() d = {} for ticker in weight_df.columns: try: d[ticker] = reader(ticker, 'yahoo', start = beg_port) except: print "didn't work for "+ticker+"!" #pull the data from Yahoo! 
panel = pandas.Panel(d) #Check to make sure the earliest "full data date" is b/f first trade #first_price = max(map(lambda x: panel.loc[x, :, # 'Adj Close'].dropna().index.min(), panel.items)) #print the number of consecutive nans #for ticker in weight_df.columns: # print ticker + " " + str(vwa.consecutive(panel.loc[ticker, # first_price:, 'Adj Close'].isnull().astype(int)).max()) return panel.ffill() def fetch_data_from_store_initial_alloc_method( initial_weights, store_path, start_date = '01/01/2000'): """ To speed up calculation time and allow for off-line functionality, provide a :class:`pandas.Series` initial_weights and point the function to an HDFStore :ARGS: initial_weights: a :class:`pandas.Series` with tickers as index and weights as values store_path: :class:`string` of the location to an HDFStore start_date: :class:`string` of format 'mm/dd/yyyy' (currently unused by this function) :RETURNS: :class:`pandas.Panel` where: * :meth:`panel.items` are tickers * :meth:`panel.major_axis` dates * :meth:`panel.minor_axis` price information, specifically: ['Open', 'Close', 'Adj Close'] """ msg = "Not all tickers in HDFStore" store = pandas.HDFStore(store_path) #assert vwu.check_store_for_tickers(initial_weights.index, store), msg #beg_port = datetime.sdat d = {} for ticker in initial_weights.index: try: d[ticker] = store.get(ticker) except KeyError as key: logging.exception("store.get({0}) ticker failed".format(ticker)) store.close() panel = pandas.Panel(d) #Check to make sure the earliest "full data date" is b/f first trade #first_price = max(map(lambda x: panel.loc[x, :, # 'Adj Close'].dropna().index.min(), panel.items)) #print the number of consecutive nans #for ticker in initial_weights.index: # print ticker + " " + str(vwa.consecutive(panel.loc[ticker, # first_price:, 'Adj Close'].isnull().astype(int)).max()) return panel.ffill() def fetch_data_for_initial_allocation_method(initial_weights, start_date = '01/01/2000'): """ To be used with `The Initial Allocation Method <./readme.html#the-initial-allocation-rebalancing-method>`_ Given initial_weights 
:class:`pandas.Series` with index of tickers and values of initial allocation percentages, fetch the data using Yahoo!'s API and return a panel of dimensions [tickers, dates, price data], where ``price_data`` has columns ``['Open', 'Close', 'Adj Close']``. :ARGS: initial_weights: :class:`pandas.Series` with tickers as index and weights as values start_date: :class:`string` of format 'mm/dd/yyyy' from which to fetch prices :RETURNS: :class:`pandas.Panel` where: * :meth:`panel.items` are tickers * :meth:`panel.major_axis` dates * :meth:`panel.minor_axis` price information, specifically: ['Open', 'Close', 'Adj Close'] """ reader = pandas.io.data.DataReader d_0 = datetime.datetime.strptime(start_date, "%m/%d/%Y") d = {} for ticker in initial_weights.index: try: d[ticker] = reader(ticker, 'yahoo', start = d_0) except: print "Didn't work for " + ticker + "!" panel = pandas.Panel(d) #Check to make sure the earliest "full data date" is b/f first trade #first_price = max(map(lambda x: panel.loc[x, :, # 'Adj Close'].dropna().index.min(), panel.items)) #print the number of consecutive nans #for ticker in initial_weights.index: # print ticker + " " + str(vwa.consecutive(panel.loc[ticker, # first_price: , 'Adj Close'].isnull().astype(int)).max()) return panel.ffill() def panel_from_weight_file(weight_df, price_panel, start_value): """ Returns a :class:`pandas.Panel` with the intermediate calculation steps of n0, c0_ac0, and adj_q to calculate a portfolio's adjusted price path when provided a pandas.DataFrame of weight allocations and a starting value of the index :ARGS: weight_df of :class:`pandas.DataFrame` of a weight allocation with tickers for columns, index of dates and weight allocations to each of the tickers price_panel of :class:`pandas.Panel` with dimensions [tickers, index, price data] start_value of :class:`float` of the value to start the index :RETURNS: :class:`pandas.Panel` with dimensions (tickers, dates, price data) """ #cols correspond to 'value_calcs!' 
in "panel from weight file test.xlsx" cols = ['ac_c', 'c0_ac0', 'n0', 'Adj_Q'] #create the intervals spanning the trade dates index = price_panel.major_axis w_ind = weight_df.index time_chunks = tradeplus_tchunks(weight_index = w_ind, price_index = index ) p_val = start_value l = [] f_dt = w_ind[0] #for beg, fin in zip(int_beg, int_fin): for beg, fin in time_chunks: close = price_panel.loc[:, beg:fin, 'Close'] opn = price_panel.loc[:, beg:fin, 'Open'] adj = price_panel.loc[:, beg:fin, 'Adj Close'] n = len(close) cl_f = price_panel.loc[:, f_dt, 'Close'] ac_f = price_panel.loc[:, f_dt, 'Adj Close'] c0_ac0 = cl_f.div(ac_f) n0 = p_val*weight_df.xs(f_dt).div(cl_f) ac_c = adj.div(close) c0_ac0 = pandas.DataFrame(numpy.tile(c0_ac0, [n, 1]), index = close.index, columns = c0_ac0.index ) n0 = pandas.DataFrame(numpy.tile(n0, [n, 1]), index = close.index, columns = n0.index ) adj_q = c0_ac0.mul(ac_c).mul(n0) p_val = adj_q.xs(fin).mul(close.xs(fin)).sum() vac = adj_q.mul(close) vao = adj_q.mul(opn) panel = pandas.Panel.from_dict({'ac_c': ac_c, 'c0_ac0': c0_ac0, 'n0': n0, 'Adj_Q': adj_q, 'Value at Close': vac, 'Value at Open': vao} ) #set items and minor appropriately for pfp constructors panel = panel.transpose(2, 1, 0) l.append(panel) f_dt = fin agg = pandas.concat(l, axis = 1) return pandas.concat([agg, price_panel], join = 'inner', axis = 2 ) def mngmt_fee(price_series, bps_cost, frequency): """ Extract management fees from repr(price_series) of repr(bps_cost) every repr(frequency) :ARGS: price_series: :class:`DataFrame` of 'Open', 'Close' or :class:`pandas.Series` of 'Close' bps_cost: :class:`float` of the management fee in bps frequency: :class:`string` of the frequency to charge the management fee in ['yearly', 'quarterly', 'monthly', 'daily'] :RETURNS: same as repr(price_series) """ def time_dist(date, interval): """ Return the proportion of time left to the end of the interval, from the current date """ return None ln = lambda x, y: x.div(y).apply(numpy.log) fac = 
{'daily': 252., 'weekly': 52., 'monthly': 12., 'quarterly': 4., 'yearly': 1. } per_fee = bps_cost/10000./fac[frequency] if frequency == 'daily': p_ln = ln(x = price_series, y = price_series.shift(1) ) p_ln[0] = 0. fee = numpy.log(1. - per_fee) # charge the daily fee on the first day ret_p = price_series[0] cum_ret = (p_ln + fee).cumsum() return ret_p*numpy.exp(cum_ret) else: tcs = zipped_time_chunks(price_series.index, frequency ) p_o, p_e = tcs[0][0], tcs[0][1] rem_t = (p_e - p_o).days return None # determine the first fee # extract the first fee # create the log changes # create the fee costs # sum them # re-create the price series # return None def _tc_helper(weight_df, share_panel, tau, meth): """ Helper function for the tc_* functions Estimate the cumulative rolling transaction costs by ticker using either the cents per share or the basis points method of calculation. Can be used to directly subtract against tickers / asset classes to determine the asset and asset class impact of transaction costs. :ARGS: weight_df: :class:`pandas.DataFrame` weight allocation share_panel: :class:`pandas.Panel` with dimensions (tickers, dates, price/share data) tau: :class:`float` of the cost in cents per share or in basis points meth: :class:`string` in ['bps', 'cps'] :RETURNS: :class:`pandas.DataFrame` of the cumulative transaction cost for each ticker """ def cps_cost(**kwargs): shares = kwargs['shares'] shares_prev = kwargs['shares_prev'] tau = kwargs['tau']/100. share_diff = abs(shares - shares_prev) return share_diff * tau def bps_cost(**kwargs): shares = kwargs['shares'] shares_prev = kwargs['shares_prev'] prices = kwargs['prices'] tau = kwargs['tau']/10000. 
share_diff = abs(shares - shares_prev) return share_diff.mul(prices) * tau meth_d = {'cps': cps_cost, 'bps': bps_cost } adj_q = share_panel.loc[:, :, 'Adj_Q'] price = share_panel.loc[:, :, 'Close'] tchunks = tradeplus_tchunks(weight_index = weight_df.index, price_index = share_panel.major_axis ) #slight finagle to get the tradeplus to be what we need sper, fper = zip(*tchunks) sper = sper[1:] fper = fper[:-1] t_o = weight_df.index[0] d = {t_o: meth_d[meth](**{'shares': adj_q.loc[t_o, :], 'shares_prev': 0., 'prices': price.loc[t_o, :], 'tau': tau} ) } for beg, fin in zip(fper, sper): d[fin] = meth_d[meth](**{'shares': adj_q.loc[fin, :], 'shares_prev': adj_q.loc[beg, :], 'tau':tau, 'prices': price.loc[fin, :]} ) tcost = pandas.DataFrame(d).transpose() cumcost = tcost.reindex(share_panel.major_axis) return cumcost.fillna(0.) def tc_cps(weight_df, share_panel, cps = 10.): """ Estimate the cumulative rolling transaction costs by ticker using the cents per share method of calculation. Can be used to directly subtract against tickers / asset classes to determine the asset and asset class impact of transaction costs. :ARGS: weight_df: :class:`pandas.DataFrame` weight allocation share_panel: :class:`pandas.Panel` with dimensions (tickers, dates, price/share data) cps: :class:`float` of the transaction cost in cents per share :RETURNS: :class:`pandas.DataFrame` of the cumulative transaction cost for each ticker """ return _tc_helper(weight_df = weight_df, share_panel = share_panel, meth = 'cps', tau = cps ) def tc_bps(weight_df, share_panel, bps = 10.): """ Estimate the cumulative rolling transaction costs by ticker as basis points of the total value of the transaction. Can be used to directly subtract against tickers / asset classes to determine the asset and asset class impact of transaction costs. 
:ARGS: weight_df: :class:`pandas.DataFrame` weight allocation share_panel: :class:`pandas.Panel` with dimensions (tickers, dates, price/share data) bps: :class:`float` of the transaction cost per trade, in basis points :RETURNS: :class:`pandas.DataFrame` of the cumulative transaction cost for each ticker """ return _tc_helper(weight_df = weight_df, share_panel = share_panel, meth = 'bps', tau = bps ) def net_tcs(tc_df, price_index): """ Incorporate transaction costs calculated using tc_cps or tc_bps into the value of an index (i.e. return the index value had transaction costs been accounted for using the given method). :ARGS: tc_df: :class:`pandas.DataFrame` of transaction costs using either tc_cps or tc_bps price_index: :class:`pandas.Series` on which the transaction costs were calculated :RETURNS: :class:`pandas.Series` of the adjusted index value """ #log returns are so ugly ln = lambda x, y: x.div(y).apply(numpy.log) tc_sum = tc_df.sum(axis = 1) tc_ln = numpy.log(1. - tc_sum.div(price_index)) p_ln = ln(x = price_index, y = price_index.shift(1) ) ln_sum = tc_ln.add(p_ln) ln_sum[0] = 0. p_o = price_index[0] - tc_sum[0] return p_o * numpy.exp(ln_sum.cumsum()) def weight_df_from_initial_weights(weight_series, price_panel, rebal_frequency, start_value = 1000., start_date = None): """ Returns a :class:`pandas.DataFrame` of weights that are used to construct the portfolio. 
Useful in determining tactical over / under weightings relative to other portfolios :ARGS: weight_series of :class:`pandas.Series` of a weight allocation with an index of tickers, and a name of the initial allocation price_panel of type :class:`pandas.Panel` with dimensions [tickers, index, price data] start_value: of type :class:`float` of the value to start the index rebal_frequency: :class:`string` of 'weekly', 'monthly', 'quarterly', 'yearly' :RETURNS: weight_df: of type :class:`pandas.DataFrame` of the rebalance weights and dates """ return initial_weight_help_fn(weight_series, price_panel, rebal_frequency, start_value, start_date, ret_val = 'weights') def panel_from_initial_weights(weight_series, price_panel, rebal_frequency, start_value = 1000, start_date = None): """ Returns the :class:`pandas.Panel` generated by ``panel_from_weight_file`` when provided a pandas.Series of initial weight allocations, the date of those initial weight allocations (series.name), a starting value of the index, and a rebalance frequency (this is the classical "static" construction methodology, rebalancing at some specified interval) :ARGS: weight_series of :class:`pandas.Series` of a weight allocation with an index of tickers, and a name of the initial allocation price_panel of type :class:`pandas.Panel` with dimensions [tickers, index, price data] start_value: of type :class:`float` of the value to start the index rebal_frequency: :class:`string` of 'weekly', 'monthly', 'quarterly', 'yearly' :RETURNS: the output of ``panel_from_weight_file`` for the statically rebalanced weights """ return initial_weight_help_fn(weight_series, price_panel, rebal_frequency, start_value, start_date, ret_val = 'panel') def initial_weight_help_fn(weight_series, price_panel, rebal_frequency, start_value = 1000., start_date = None, ret_val = 'panel'): #determine the first valid date and make it the start_date first_valid = numpy.max(price_panel.loc[:, :, 'Close'].apply( pandas.Series.first_valid_index)) if start_date is
None: d_0 = first_valid index = price_panel.loc[:, d_0:, :].major_axis else: #make sure the start_date begins after all assets are valid if isinstance(start_date, str): start_date = datetime.datetime.strptime(start_date, "%m/%d/%Y") assert start_date > first_valid, ( "first_valid index doesn't occur until after start_date") index = price_panel.loc[:, start_date:, :].major_axis #the weight_series must be a type series, but sometimes can be a #``pandas.DataFrame`` with len(columns) = 1 msg = "Initial Allocation is not Series" if isinstance(weight_series, pandas.DataFrame): assert len(weight_series.columns) == 1, msg weight_series = weight_series[weight_series.columns[0]] interval_dict = {'weekly':lambda x: x[:-1].week != x[1:].week, 'monthly': lambda x: x[:-1].month != x[1:].month, 'quarterly':lambda x: x[:-1].quarter != x[1:].quarter, 'yearly':lambda x: x[:-1].year != x[1:].year} #create a boolean array of rebalancing dates ind = numpy.append(True, interval_dict[rebal_frequency](index)) weight_df = pandas.DataFrame(numpy.tile(weight_series.values, [len(index[ind]), 1]), index = index[ind], columns = weight_series.index) if ret_val == 'panel': return panel_from_weight_file(weight_df, price_panel, start_value) else: return weight_df def pfp_from_weight_file(panel_from_weight_file): """ pfp stands for "Portfolio from Panel", so this takes the final ``pandas.Panel`` that is created in the portfolio construction process when weight file is given and generates a portfolio path of 'Open' and 'Close' :ARGS: panel_from_weight_file: a :class:`pandas.Panel` that was generated using ``panel_from_weight_file`` :RETURNS: portfolio prices in a :class:`pandas.DataFrame` with columns ['Open', 'Close'] .. note:: The Holy Grail of the Portfolio Path The portfolio path is what goes into all of the :mod:`analyze` functions. So once the `pfp_from_`...
has been created, you've got all of the necessary bits to begin calculating performance metrics on a portfolio """ adj_q = panel_from_weight_file.loc[:, :, 'Adj_Q'] close = panel_from_weight_file.loc[:, :, 'Close'] opn = panel_from_weight_file.loc[:, :, 'Open'] ind_close = adj_q.mul(close).sum(axis = 1) ind_open = adj_q.mul(opn).sum(axis = 1) port_df = pandas.DataFrame({'Open': ind_open, 'Close': ind_close} ) return port_df def pfp_from_blotter(panel_from_blotter, start_value = 1000.): """ pfp stands for "Portfolio from Panel", so this takes the final :class:`pandas.Panel` that is created in the portfolio construction process when a blotter is given and generates a portfolio path of 'Open' and 'Close' :ARGS: panel_from_blotter: a :class:`pandas.Panel` that was generated using :ref:`panel_from_weight_file` start_value: :class:`float` of the starting value, default=1000 :RETURNS: portfolio prices in a :class:`pandas.DataFrame` with columns ['Open', 'Close'] .. note:: The Holy Grail of the Portfolio Path The portfolio path is what goes into all of the :mod:`analyze` functions. So once the `pfp_from_`... has been created, you've got all of the necessary bits to begin calculating performance metrics on your portfolio! .. note:: Another way to think of Portfolio Path This "Portfolio Path" is really nothing more than a series of prices that, should you have made the trades given in the blotter, would have been the experience of someone investing `start_value` in your strategy when your strategy first begins, up until today.
""" panel = panel_from_blotter.copy() index = panel.major_axis price_df = pandas.DataFrame(numpy.zeros([len(index), 2]), index = index, columns = ['Close', 'Open']) price_df.loc[index[0], 'Close'] = start_value #first determine the log returns for the series cl_to_cl_end_val = panel.ix[:, :, 'cum_shares'].mul( panel.ix[:, :, 'Close']).add(panel.ix[:, :, 'cum_shares'].mul( panel.ix[:, :, 'Dividends'])).sub( panel.ix[:, :, 'contr_withdrawal']).sum(axis = 1) cl_to_cl_beg_val = panel.ix[:, :, 'cum_shares'].mul( panel.ix[:, :, 'Close']).add(panel.ix[:, :, 'cum_shares'].mul( panel.ix[:, :, 'Dividends'])).sum(axis = 1).shift(1) op_to_cl_end_val = panel.ix[:, :, 'cum_shares'].mul( panel.ix[:, :, 'Close']).add(panel.ix[:, :, 'cum_shares'].mul( panel.ix[:, :, 'Dividends'])).sum(axis = 1) op_to_cl_beg_val = panel.ix[:, :, 'cum_shares'].mul( panel.ix[:, :, 'Open']).sum(axis = 1) cl_to_cl = cl_to_cl_end_val.div(cl_to_cl_beg_val).apply(numpy.log) op_to_cl = op_to_cl_end_val.div(op_to_cl_beg_val).apply(numpy.log) price_df.loc[index[1]:, 'Close'] = start_value*numpy.exp( cl_to_cl[1:].cumsum()) price_df['Open'] = price_df['Close'].div(numpy.exp(op_to_cl)) return price_df if __name__ == '__main__': usage = sys.argv[0] + "file_loc" description = "description" parser = argparse.ArgumentParser( description = description, usage = usage) parser.add_argument('arg_1', nargs = 1, type = str, help = 'help_1') parser.add_argument('arg_2', nargs = 1, type = int, help = 'help_2') args = parser.parse_args() ================================================ FILE: visualize_wealth/utils.py ================================================ #!/usr/bin/env python # encoding: utf-8 """ .. module:: visualize_wealth.utils.py .. moduleauthor:: Benjamin M. 
Gross """ import datetime import logging import pandas import numpy import os def append_dfs(prv_df, nxt_df): """ Return a single, sorted :class:`DataFrame` where prv_df is "stacked" with nxt_df and any overlapping dates are removed """ ind_a, ind_b = prv_df.index, nxt_df.index apnd = nxt_df.loc[~ind_b.isin(ind_a), :] return prv_df.append(apnd) def exchange_acs_for_ticker(weight_df, ticker_class_dict, date, asset_class, ticker, weight): """ It's common to wonder, what would happen if I took all tickers within a given asset class, zeroed them out, and used some other ticker beginning at some date. :ARGS: weight_df: :class:`DataFrame` of the weight allocation frame ticker_class_dict: :class:`dictionary` of the tickers and the asset classes of each ticker date: :class:`string` of the date to zero out the existing tickers within an asset class and add ``ticker`` asset_class: :class:`string` of the 'asset_class' to exchange all tickers for 'ticker' ticker: :class:`string` the ticker to add to the weight_df weight: :class:`float` of the weight to assign to ``ticker`` :RETURNS: :class:`DataFrame` of the :class:`PortfolioObject`'s rebal_weights, with ticker representing weight, beginning on date (or the first trade before) """ d = ticker_class_dict ind = weight_df.index #if the date is exact, use it, otherwise pick the previous one if ind[ind.searchsorted(date)] != pandas.Timestamp(date): dt = ind[ind.searchsorted(date) - 1] else: dt = pandas.Timestamp(date) #get the tickers with the given asset class l = [] for key, value in d.iteritems(): if value == asset_class: l.append(key) weight_df.loc[dt: , l] = 0.
s = weight_df.sum(axis = 1) weight_df = weight_df.apply(lambda x: x.div(s)) return ticker_and_weight_into_weight_df(weight_df, ticker, weight, dt) def ticker_and_weight_into_weight_df(weight_df, ticker, weight, date): """ A helper function to insert a ticker, and its respective weight into a :class:`DataFrame` ``weight_df`` given a dynamic allocation strategy or a :class:`Series` given a static allocation strategy :ARGS: weight_df: :class:`pandas.DataFrame` to be used as a weight allocation to construct a portfolio ticker: :class:`string` to insert into the weight_df weight: :class:`float` of the weight to assign the ticker date: :class:`string`, :class:`datetime` or :class:`Timestamp` to first allocate ``weight`` to ``ticker``, going forward. :RETURNS: :class:`pandas.DataFrame` where the weight_df weights have been proportionally re-distributed on or after ``date`` """ ret_df = weight_df.copy() ret_df[date:] = ret_df*(1. - weight) ret_df[ticker] = 0. ret_df.loc[date: , ticker] = weight return ret_df def epoch_to_datetime(pandas_obj): """ Convert string epochs to `pandas.DatetimeIndex` :ARGS: either a :class:`DataFrame` or :class:`Series` where index can be converted to datetimes :RETURNS: same as input type, but with index converted into Timestamps """ pandas_obj.index = pandas.to_datetime( pandas_obj.index.astype('int64'), unit = 'ms' ) return pandas_obj def append_store_prices(ticker_list, store_path, start = '01/01/1990'): """ Given an existing store located at ``path``, check to make sure the tickers in ``ticker_list`` are not already in the data set, and then insert the tickers into the store.
:ARGS: ticker_list: :class:`list` of tickers to add to the :class:`pandas.HDFStore` store_path: :class:`string` of the path to the :class:`pandas.HDFStore` start: :class:`string` of the date to begin the price data :RETURNS: :class:`NoneType` but appends the store and logs the successes and failures """ store = _open_store(store_path) store_keys = map(lambda x: x.strip('/'), store.keys()) not_in_store = numpy.setdiff1d(ticker_list, store_keys ) new_prices = tickers_to_dict(not_in_store, start = start) #attempt to add the new values to the store for val in new_prices.keys(): try: store.put(val, new_prices[val]) logging.log(20, "{0} has been stored".format( val)) except: logging.warning("{0} didn't store".format(val)) store.close() return None def check_store_for_tickers(ticker_list, store): """ Determine which, if any of the :class:`list` `ticker_list` are inside of the HDFStore. If all tickers are located in the store returns True, otherwise returns False (provides a "check" to see if other functions can be run) :ARGS: ticker_list: iterable of tickers to be found in the store store: the open :class:`HDFStore` to check :RETURNS: :class:`bool` True if all tickers are found in the store and False if not all the tickers are found in the HDFStore """ if isinstance(ticker_list, pandas.Index): #pandas.Index is not sortable, so it must be converted with tolist() ticker_list = ticker_list.tolist() store_keys = map(lambda x: x.strip('/'), store.keys()) not_in_store = numpy.setdiff1d(ticker_list, store_keys) #if len(not_in_store) == 0, all tickers are present if not len(not_in_store): #print "All tickers in store" ret_val = True else: for ticker in not_in_store: print "store does not contain " + ticker ret_val = False return ret_val def check_store_path_for_tickers(ticker_list, store_path): """ Determine which, if any of the :class:`list` `ticker_list` are inside of the HDFStore.
If all tickers are located in the store returns True, otherwise returns False (provides a "check" to see if other functions can be run) :ARGS: ticker_list: iterable of tickers to be found in the store located at :class:`string` store_path store_path: :class:`string` of the location to the HDFStore :RETURNS: :class:`bool` True if all tickers are found in the store and False if not all the tickers are found in the HDFStore """ store = _open_store(store_path) if isinstance(ticker_list, pandas.Index): #pandas.Index is not sortable, so it must be converted with tolist() ticker_list = ticker_list.tolist() store_keys = map(lambda x: x.strip('/'), store.keys()) not_in_store = numpy.setdiff1d(ticker_list, store_keys) store.close() #if len(not_in_store) == 0, all tickers are present if not len(not_in_store): print "All tickers in store" ret_val = True else: for ticker in not_in_store: print "store does not contain " + ticker ret_val = False return ret_val def check_trade_price_start(weight_df, price_df): """ Check to ensure that initial weights / trade dates are after the first available price for the same ticker :ARGS: weight_df: :class:`pandas.DataFrame` of the weights to rebalance the portfolio price_df: :class:`pandas.DataFrame` of the prices for each of the tickers :RETURNS: :class:`pandas.Series` of boolean values for each ticker where True indicates the first allocation takes place after the first price (as desired) and False the converse """ #make sure all of the weight_df tickers are in price_df intrsct = set(weight_df.columns).intersection(set(price_df.columns)) if set(weight_df.columns) != intrsct: raise KeyError, "Not all tickers in weight_df are in price_df" ret_d = {} for ticker in weight_df.columns: first_alloc = (weight_df[ticker] > 0).argmin() first_price = price_df[ticker].notnull().argmin() ret_d[ticker] = first_alloc >= first_price return pandas.Series(ret_d) def create_data_store(ticker_list, store_path): """ Creates the ETF store to run the training of the logistic
classification tree :ARGS: ticker_list: iterable of tickers store_path: :class:`str` of path to ``HDFStore`` """ #check to make sure the store doesn't already exist if os.path.isfile(store_path): print "File " + store_path + " already exists" return store = pandas.HDFStore(store_path, 'w') success = 0 for ticker in ticker_list: try: tmp = tickers_to_dict(ticker, 'yahoo', start = '01/01/2000') store.put(ticker, tmp) print ticker + " added to store" success += 1 except: print "unable to add " + ticker + " to store" store.close() if success == 0: #none of it worked, delete the store print "Creation Failed" os.remove(store_path) print return None def first_price_date_get_prices(ticker_list): """ Given a list of tickers, pull down prices and return the first valid price date for each ticker in the list :ARGS: ticker_list: :class:`string` or :class:`list` of tickers :RETURNS: :class:`string` of 'dd-mm-yyyy' or :class:`list` of said strings """ #pull down the data into a DataFrame df = tickers_to_frame(ticker_list) return first_price_date_from_prices(df) def first_price_date_from_prices(frame): """ Given a :class:`pandas.DataFrame` of prices, return the first date that a price exists for each of the tickers :ARGS: frame: :class:`pandas.Series` or :class:`pandas.DataFrame` of prices :RETURNS: :class:`string` of 'dd-mm-yyyy' or :class:`list` of said strings """ fvi = pandas.Series.first_valid_index if isinstance(frame, pandas.Series): return frame.first_valid_index() else: return frame.apply(fvi, axis = 0) def first_valid_date(prices): """ Helper function to determine the first valid date from a set of different prices Can take either a :class:`dict` of :class:`pandas.DataFrame`s where each key is a ticker's 'Open', 'High', 'Low', 'Close', 'Adj Close' or a single :class:`pandas.DataFrame` where each column is a different ticker :ARGS: prices: either :class:`dictionary` or :class:`pandas.DataFrame` :RETURNS: :class:`pandas.Timestamp` """ iter_dict = { pandas.DataFrame: lambda x: x.columns, dict: lambda x:
x.keys() } try: each_first = map(lambda x: prices[x].first_valid_index(), iter_dict[ type(prices) ](prices) ) return max(each_first) except KeyError: print "prices must be a DataFrame or dictionary" return def gen_gbm_price_series(num_years, N, price_0, vol, drift): """ Return a price series generated using GBM :ARGS: num_years: number of years (if 20 trading days, then 20/252) N: number of total periods price_0: starting price for the security vol: the volatility of the security drift: the expected return of the security :RETURNS: :class:`pandas.Series` of length N of the simulated price series """ dt = num_years/float(N) e1 = (drift - 0.5*vol**2)*dt e2 = (vol*numpy.sqrt(dt)) cum_shocks = numpy.cumsum(numpy.random.randn(N,)) cum_drift = numpy.arange(1, N + 1) return pandas.Series(numpy.append( price_0, price_0*numpy.exp(cum_drift*e1 + cum_shocks*e2)[:-1])) def index_intersect(arr_a, arr_b): """ Return the intersection of two :class:`pandas` objects, either a :class:`pandas.Series` or a :class:`pandas.DataFrame` :ARGS: arr_a: :class:`pandas.DataFrame` or :class:`pandas.Series` arr_b: :class:`pandas.DataFrame` or :class:`pandas.Series` :RETURNS: :class:`pandas.DatetimeIndex` of the intersection of the two :class:`pandas` objects """ arr_a = arr_a.sort_index() arr_a = arr_a.dropna() arr_b = arr_b.sort_index() arr_b = arr_b.dropna() if arr_a.index.equals(arr_b.index) == False: return arr_a.index & arr_b.index else: return arr_a.index def index_multi_union(frame_list): """ Returns the index union of multiple :class:`pandas.DataFrame`'s or :class:`pandas.Series` :ARGS: frame_list: :class:`list` containing either ``DataFrame``'s or ``Series`` :RETURNS: :class:`pandas.DatetimeIndex` of the objects' union """ #check to make sure all objects are Series or DataFrames return reduce(lambda x, y: x | y, map(lambda x: x.dropna().index, frame_list) ) def index_multi_intersect(frame_list): """ Returns the index intersection of multiple :class:`pandas.DataFrame`'s or
:class:`pandas.Series` :ARGS: frame_list: :class:`list` containing either ``DataFrame``'s or ``Series`` :RETURNS: :class:`pandas.DatetimeIndex` of the objects' intersection """ return reduce(lambda x, y: x & y, map(lambda x: x.dropna().index, frame_list) ) def join_on_index(df_list, index): """ pandas doesn't currently have the ability to :meth:`concat` several :class:`DataFrame`'s on a provided :class:`DatetimeIndex`. This is a quick function to provide that functionality :ARGS: df_list: :class:`list` of :class:`DataFrame`'s index: :class:`Index` on which to join all of the DataFrames """ return pandas.concat( map( lambda x: x.reindex(index), df_list), axis = 1 ) def normalized_price(price_df): """ Return the normalized price of a :class:`pandas.Series` or :class:`pandas.DataFrame` :ARGS: price_df: :class:`pandas.Series` or :class:`pandas.DataFrame` :RETURNS: same as the input """ null_d = {pandas.DataFrame: lambda x: pandas.isnull(x).any().any(), pandas.Series: lambda x: pandas.isnull(x).any() } calc_d = {pandas.DataFrame: lambda x: x.div(x.iloc[0, :]), pandas.Series: lambda x: x.div(x[0]) } typ = type(price_df) if null_d[typ](price_df): raise ValueError, "cannot contain null values" return calc_d[typ](price_df) def rets_to_price(rets, ret_typ = 'log', start_value = 100.): """ Take a series of repr(rets), of type repr(ret_typ) and convert them into prices :ARGS: rets: :class:`Series` or :class:`DataFrame` of returns ret_typ: :class:`string` of the return type, either ['log', 'linear'] :RETURNS: same as provided type """ def _rets_to_price(rets, ret_typ, start_value): typ_d = {'log': lambda x: start_value * numpy.exp(x.cumsum()), 'linear': lambda x: start_value * (1. + x).cumprod() } fv = rets.first_valid_index() fd = rets.index[0] if fv == fd: # no nulls at the beginning p = typ_d[ret_typ](rets) p = normalized_price(p) * start_value else: cp = rets.copy() # copy to prepend with 0. loc = cp.index.get_loc(fv) fd = cp.index[loc - 1] cp[fd] = 0.
p = typ_d[ret_typ](cp[fd:]) return p if isinstance(rets, pandas.Series): return _rets_to_price(rets = rets, ret_typ = ret_typ, start_value = start_value ) elif isinstance(rets, pandas.DataFrame): return rets.apply( lambda x: _rets_to_price(rets = x, ret_typ = ret_typ, start_value = start_value ), axis = 0 ) else: raise TypeError, "rets must be Series or DataFrame" def perturbate_asset(frame, key, eps): """ Perturbate an asset within a weight allocation frame in the amount eps :ARGS: frame :class:`pandas.DataFrame` of a weight_allocation frame key: :class:`string` of the asset to perturbate_asset eps: :class:`float` of the amount to perturbate in relative terms :RETURNS: :class:`pandas.DataFrame` of the perturbed weight_df """ from .analyze import linear_returns pert_series = pandas.Series(numpy.zeros_like(frame[key]), index = frame.index ) lin_ret = linear_returns(frame[key]) lin_ret = lin_ret.mul(1. + eps) pert_series[0] = p_o = frame[key][0] pert_series[1:] = p_o * (1. + lin_ret[1:]) ret_frame = frame.copy() ret_frame[key] = pert_series return ret_frame def setup_trained_hdfstore(trained_data, store_path): """ The ``HDFStore`` doesn't work properly when it's compiled by different versions, so the appropriate thing to do is to setup the trained data locally (and not store the ``.h5`` file on GitHub). :ARGS: trained_data: :class:`pandas.Series` with tickers in the index and asset classes for values store_path: :class:`str` of where to create the ``HDFStore`` """ create_data_store(trained_data.index, store_path) return None def tickers_to_dict(ticker_list, api = 'yahoo', start = '01/01/1990'): """ Utility function to return ticker data where the input is either a ticker, or a list of tickers. :ARGS: ticker_list: :class:`list` in the case of multiple tickers or :class:`str` in the case of one ticker api: :class:`string` identifying which api to call the data from. 
Either 'yahoo' or 'google' start: :class:`string` of the desired start date :RETURNS: :class:`dictionary` of (ticker, price_df) mappings or a :class:`pandas.DataFrame` when the ``ticker_list`` is :class:`str` """ if isinstance(ticker_list, (str, unicode)): return __get_data(ticker_list, api = api, start = start) else: d = {} for ticker in ticker_list: d[ticker] = __get_data(ticker, api = api, start = start) return d def tickers_to_frame(ticker_list, api = 'yahoo', start = '01/01/1990', join_col = 'Adj Close'): """ Utility function to return ticker data where the input is either a ticker, or a list of tickers. :ARGS: ticker_list: :class:`list` in the case of multiple tickers or :class:`str` in the case of one ticker api: :class:`string` identifying which api to call the data from. Either 'yahoo' or 'google' start: :class:`string` of the desired start date join_col: :class:`string` to aggregate the :class:`pandas.DataFrame` :RETURNS: :class:`pandas.DataFrame` of (ticker, price_df) mappings or a :class:`pandas.DataFrame` when the ``ticker_list`` is :class:`str` """ if isinstance(ticker_list, (str, unicode)): return __get_data(ticker_list, api = api, start = start)[join_col] else: d = {} for ticker in ticker_list: tmp = __get_data(ticker, api = api, start = start ) d[ticker] = tmp[join_col] return pandas.DataFrame(d) def ticks_to_frame_from_store(ticker_list, store_path, join_col = 'Adj Close'): """ Utility function to return ticker data where the input is either a ticker, or a list of tickers. 
:ARGS: ticker_list: :class:`list` in the case of multiple tickers or :class:`str` in the case of one ticker store_path: :class:`str` of the path to the store join_col: :class:`string` to aggregate the :class:`pandas.DataFrame` :RETURNS: :class:`pandas.DataFrame` of (ticker, price_df) mappings or a :class:`pandas.DataFrame` when the ``ticker_list`` is :class:`str` """ store = _open_store(store_path) if isinstance(ticker_list, (str, unicode)): ret_series = store[ticker_list][join_col] store.close() return ret_series else: d = {} for ticker in ticker_list: d[ticker] = store[ticker][join_col] store.close() price_df = pandas.DataFrame(d) d_o = first_valid_date(price_df) price_df = price_df.loc[d_o:, :] return price_df def create_store_master_index(store_path): """ Add a master index, key = 'IND3X', to HDFStore located at store_path :ARGS: store_path: :class:`string` the location of the ``HDFStore`` file :RETURNS: :class:`NoneType` but updates the ``HDF5`` file """ store = _open_store(store_path) keys = store.keys() if '/IND3X' in keys: print "u'IND3X' already exists in HDFStore at {0}".format(store_path) store.close() return else: union = union_store_indexes(store) store.put('IND3X', pandas.Series(union, index = union)) store.close() def union_store_indexes(store): """ Return the union of all Indexes within a store located inside store :ARGS: store: :class:`HDFStore` :RETURNS: :class:`pandas.DatetimeIndex` of the union of all indexes within the store """ key_iter = (key for key in store.keys()) ind = store.get(key_iter.next()).index union = ind.copy() for key in key_iter: union = union | store.get(key).index return union def create_store_cash(store_path): """ Create a cash price, key = u'CA5H' in an HDFStore located at store_path :ARGS: store_path: :class:`string` the location of the ``HDFStore`` file :RETURNS: :class:`NoneType` but updates the ``HDF5`` file, and prints to screen which values would not update """ store = _open_store(store_path) keys = store.keys() if 
'/CA5H' in keys: logging.log(1, "CA5H prices already exists") store.close() return if '/IND3X' not in keys: m_index = union_store_indexes(store) else: m_index = store.get('IND3X') cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'] n_dates, n_cols = len(m_index), len(cols) df = pandas.DataFrame(numpy.ones([n_dates, n_cols]), index = m_index, columns = cols ) store.put('CA5H', df) store.close() return def update_store_master_index(store_path): """ Intelligently update the store 'IND3X', this can only be done after the prices at the store path have been updated """ store = _open_store(store_path) try: stored_data = store.get('IND3X') except KeyError: logging.exception("store doesn't contain IND3X") store.close() raise last_stored_date = stored_data.dropna().index.max() today = datetime.datetime.date(datetime.datetime.today()) if last_stored_date < pandas.Timestamp(today): union_ind = union_store_indexes(store) tmp = pandas.Series(union_ind, index = union_ind) #need to drop duplicates because there's 1 row of overlap tmp = stored_data.append(tmp) tmp.drop_duplicates(inplace = True) store.put('IND3X', tmp) store.close() return None def update_store_cash(store_path): """ Intelligently update the values of CA5H based on existing keys in the store, and existing columns of the CA5H values :ARGS: store_path: :class:`string` the location of the ``HDFStore`` file :RETURNS: :class:`NoneType` but updates the ``HDF5`` file, and prints to screen which values would not update """ store = _open_store(store_path) td = datetime.datetime.today() try: master_ind = store.get('IND3X') cash = store.get('CA5H') except KeyError: print "store doesn't contain {0} and / or {1}".format( 'CA5H', 'IND3X') store.close() raise last_cash_dt = cash.dropna().index.max() today = datetime.datetime.date(td) if last_cash_dt < pandas.Timestamp(today): try: n = len(master_ind) cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'] cash = pandas.DataFrame( numpy.ones([n, len(cols)]), index 
= master_ind, columns = cols ) store.put('CA5H', cash) except: print "Error updating cash" store.close() return None def strip_vals(keys): """ Return a stripped value for each key in keys :ARGS: keys: :class:`list` of string values (usually tickers) :RETURNS: same as input class with whitespace stripped out """ return list((x.strip() for x in keys)) def update_store_prices(store_path, store_keys = None): """ Update to the most recent prices for all keys of an existing store, located at ``store_path``. :ARGS: store_path: :class:`string` the location of the ``HDFStore`` file store_keys: :class:`list` of keys to update :RETURNS: :class:`NoneType` but updates the ``HDF5`` file, and prints to screen which values would not update .. note:: If special keys exist (like, CASH, or INDEX), then keys can be passed to update to ensure that the store does not try to update those keys """ def _cleaned_keys(keys): """ Remove the CA5H and IND3X keys from the list if they are present """ blk_lst = ['IND3X', 'CA5H', '/IND3X', '/CA5H'] for key in blk_lst: try: keys.remove(key) print "{0} removed".format(key) except: print "{0} not in keys".format(key) return keys reader = pandas.io.data.DataReader strftime = datetime.datetime.strftime today_str = strftime(datetime.datetime.today(), format = '%m/%d/%Y') store = _open_store(store_path) if not store_keys: store_keys = store.keys() store_keys = _cleaned_keys(store_keys) for key in store_keys: stored_data = store.get(key) last_stored_date = stored_data.dropna().index.max() today = datetime.datetime.date(datetime.datetime.today()) if last_stored_date < pandas.Timestamp(today): try: tmp = reader(key.strip('/'), 'yahoo', start = strftime( last_stored_date, format = '%m/%d/%Y')) #need to drop duplicates because there's 1 row of overlap tmp = stored_data.append(tmp) tmp["index"] = tmp.index tmp.drop_duplicates(cols = "index", inplace = True) tmp = tmp[tmp.columns[tmp.columns != "index"]] store.put(key, tmp) except: print "could not update 
{0}".format(key) logging.exception("could not update {0}".format(key)) store.close() return None def zipped_time_chunks(index, interval, incl_T = False): """ Given different period intervals, return a zipped list of tuples of length 'period_interval', containing only full periods .. note:: The function assumes indexes are of 'daily_frequency' :ARGS: index: :class:`pandas.DatetimeIndex` interval: :class:`string` either 'weekly', 'monthly', 'quarterly', or 'yearly' incl_T: :class:`bool`, if True also close the final period on the last date of the index """ time_d = {'weekly': lambda x: x.week, 'monthly': lambda x: x.month, 'quarterly':lambda x:x.quarter, 'yearly':lambda x: x.year} prv = time_d[interval](index[:-1]) nxt = time_d[interval](index[1:]) ind = prv != nxt if ind[0]: # index started on the last day of period index = index.copy()[1:] # remove first elem prv = time_d[interval](index[:-1]) nxt = time_d[interval](index[1:]) ind = prv != nxt if incl_T: if not ind[-1]: # doesn't already end on True ind = numpy.append(ind, True) ldop = index[ind] # last day of period f_ind = numpy.append(True, ind[:-1]) fdop = index[f_ind] # first day of period return zip(fdop, ldop) def tradeplus_tchunks(weight_index, price_index): """ Return zipped time intervals of trade signal and trade signal + 1 :ARGS: weight_index: :class:`pandas.DatetimeIndex` of the weight allocation frame of generated signals price_index: :class:`pandas.DatetimeIndex` for all the price data :RETURNS: :class:`tuple` of int_beg, the t + 1 date after the weight signal and int_fin, the next weight signal (or last date in the price_index) .. note:: having consecutive, non-overlapping intervals is commonly used for things such as optimizing share calculation algorithms, transaction cost calculation, etc.
""" locs = list(price_index.get_loc(key) + 1 for key in weight_index) do = pandas.DatetimeIndex([weight_index[0]]) int_beg = price_index[locs[1:]] int_beg = do.append(int_beg) int_fin = weight_index[1:] dT = pandas.DatetimeIndex([price_index[-1]]) int_fin = int_fin.append(dT) return zip(int_beg, int_fin) def _open_store(store_path): """ open an HDFStore located at store_path with the appropriate error handling :ARGS: store_path: :class:`string` where the store is located :RETURNS: :class:`HDFStore` instance """ try: store = pandas.HDFStore(path = store_path, mode = 'r+') return store except IOError: logging.exception( "{0} is not a valid path to an HDFStore Object".format(store_path) ) raise def __get_data(ticker, api, start): """ Helper function to get Yahoo! Data with exceptions built in and messages that confirm success for given tickers ARGS: ticker: either a :class:`string` of a ticker or a :class:`list` of tickers api: :class:`string` the api from which to get the data, 'yahoo'or 'google' start: :class:`string` the start date to start the data series """ reader = pandas.io.data.DataReader try: data = reader(ticker, api, start = start) return data except: print "failed for " + ticker return