Repository: eliangcs/pystock-crawler Branch: master Commit: 8b803c8944f3 Files: 30 Total size: 216.6 KB Directory structure: gitextract_mp6yf35w/ ├── .gitignore ├── .travis.yml ├── LICENSE ├── MANIFEST.in ├── README.rst ├── bin/ │ └── pystock-crawler ├── pystock_crawler/ │ ├── __init__.py │ ├── exporters.py │ ├── items.py │ ├── loaders.py │ ├── settings.py │ ├── spiders/ │ │ ├── __init__.py │ │ ├── edgar.py │ │ ├── nasdaq.py │ │ └── yahoo.py │ ├── tests/ │ │ ├── __init__.py │ │ ├── base.py │ │ ├── test_cmdline.py │ │ ├── test_loaders.py │ │ ├── test_spiders_edgar.py │ │ ├── test_spiders_nasdaq.py │ │ ├── test_spiders_yahoo.py │ │ └── test_utils.py │ ├── throttle.py │ └── utils.py ├── pytest.ini ├── requirements-test.txt ├── requirements.txt ├── scrapy.cfg └── setup.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ *.csv *.log *.pyc .coverage .scrapy/ .~* build/ dist/ pystock_crawler.egg-info/ pystock_crawler/tests/sample_data/ ================================================ FILE: .travis.yml ================================================ language: python python: - 2.7 branches: only: - master install: - pip install -r requirements.txt - pip install -r requirements-test.txt script: - py.test after_success: - pip install python-coveralls - coveralls ================================================ FILE: LICENSE ================================================ The MIT License (MIT) Copyright (c) 2013 Chang-Hung Liang Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is 
furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: MANIFEST.in ================================================ include README.rst LICENSE requirements.txt ================================================ FILE: README.rst ================================================ pystock-crawler =============== .. image:: https://badge.fury.io/py/pystock-crawler.png :target: http://badge.fury.io/py/pystock-crawler .. image:: https://travis-ci.org/eliangcs/pystock-crawler.png?branch=master :target: https://travis-ci.org/eliangcs/pystock-crawler .. image:: https://coveralls.io/repos/eliangcs/pystock-crawler/badge.png?branch=master :target: https://coveralls.io/r/eliangcs/pystock-crawler ``pystock-crawler`` is a utility for crawling historical data of US stocks, including: * Ticker symbols listed in NYSE, NASDAQ or AMEX from `NASDAQ.com`_ * Daily prices from `Yahoo Finance`_ * Fundamentals from 10-Q and 10-K filings (XBRL) on `SEC EDGAR`_ Example Output -------------- NYSE ticker symbols:: DDD 3D Systems Corporation MMM 3M Company WBAI 500.com Limited ... Apple's daily prices:: symbol,date,open,high,low,close,volume,adj_close AAPL,2014-04-28,572.80,595.75,572.55,594.09,23890900,594.09 AAPL,2014-04-25,564.53,571.99,563.96,571.94,13922800,571.94 AAPL,2014-04-24,568.21,570.00,560.73,567.77,27092600,567.77 ... 
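
The price output is plain CSV, so any CSV reader can consume it. The following is a minimal sketch, assuming Python 3 and the column order shown in the header above; ``load_prices`` and ``SAMPLE`` are hypothetical names used for illustration and are not part of ``pystock-crawler``:

```python
import csv
import io

# Two rows copied from the example output above
SAMPLE = """symbol,date,open,high,low,close,volume,adj_close
AAPL,2014-04-28,572.80,595.75,572.55,594.09,23890900,594.09
AAPL,2014-04-25,564.53,571.99,563.96,571.94,13922800,571.94
"""


def load_prices(text):
    """Parse price CSV text into a list of dicts with numeric fields (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Convert the price columns to floats and the volume to an integer
        for key in ('open', 'high', 'low', 'close', 'adj_close'):
            row[key] = float(row[key])
        row['volume'] = int(row['volume'])
        rows.append(row)
    return rows


prices = load_prices(SAMPLE)
```

Each element of ``prices`` is a dict keyed by the header names; ``symbol`` and ``date`` stay as strings.
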
Google's fundamentals:: symbol,end_date,amend,period_focus,fiscal_year,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin GOOG,2009-06-30,False,Q2,2009,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0 GOOG,2009-09-30,False,Q3,2009,10-Q,5944851000.0,2073718000.0,1638975000.0,5.18,5.13,0.0,37702845000.0,26353544000.0,2321774000.0,12087115000.0,33721753000.0,6584667000.0,-3245963000.0,74851000.0 GOOG,2009-12-31,False,FY,2009,10-K,23650563000.0,8312186000.0,6520448000.0,20.62,20.41,0.0,40496778000.0,29166958000.0,2747467000.0,10197588000.0,36004224000.0,9316198000.0,-8019205000.0,233412000.0 ... Installation ------------ Prerequisites: * Python 2.7 ``pystock-crawler`` is based on Scrapy_, so you will also need to install prerequisites such as lxml_ and libffi_ for Scrapy and its dependencies. On Ubuntu, for example, you can install them like this:: sudo apt-get update sudo apt-get install -y gcc python-dev libffi-dev libssl-dev libxml2-dev libxslt1-dev build-essential See `Scrapy's installation guide`_ for more details. 
After installing prerequisites, you can then install ``pystock-crawler`` with
``pip``::

    (sudo) pip install pystock-crawler


Quickstart
----------

**Example 1.** Fetch Google's and Yahoo's daily prices ordered by date::

    pystock-crawler prices GOOG,YHOO -o out.csv --sort

**Example 2.** Fetch daily prices of all companies listed in
``./symbols.txt``::

    pystock-crawler prices ./symbols.txt -o out.csv

**Example 3.** Fetch Facebook's fundamentals during 2013::

    pystock-crawler reports FB -o out.csv -s 20130101 -e 20131231

**Example 4.** Fetch fundamentals of all companies in ``./nyse.txt`` and direct
the log to ``./crawling.log``::

    pystock-crawler reports ./nyse.txt -o out.csv -l ./crawling.log

**Example 5.** Fetch all ticker symbols in NYSE, NASDAQ and AMEX::

    pystock-crawler symbols NYSE,NASDAQ,AMEX -o out.txt


Usage
-----

Type ``pystock-crawler -h`` to see command help::

    Usage:
      pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR]
                              [--sort]
      pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                             [-l LOGFILE] [-w WORKING_DIR] [--sort]
      pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                              [-l LOGFILE] [-w WORKING_DIR] [-b BATCH_SIZE] [--sort]
      pystock-crawler (-h | --help)
      pystock-crawler (-v | --version)

    Options:
      -h --help       Show this screen
      -o OUTPUT       Output file
      -s YYYYMMDD     Start date [default: ]
      -e YYYYMMDD     End date [default: ]
      -l LOGFILE      Log output [default: ]
      -w WORKING_DIR  Working directory [default: .]
      -b BATCH_SIZE   Batch size [default: 500]
      --sort          Sort the result

There are three commands available:

* ``pystock-crawler symbols`` grabs ticker symbol lists
* ``pystock-crawler prices`` grabs daily prices
* ``pystock-crawler reports`` grabs fundamentals

``<exchanges>`` is a comma-separated string that specifies the stock exchanges
you want to include. Currently, NYSE, NASDAQ and AMEX are supported.

The output file of ``pystock-crawler symbols`` can be used as the ``<symbols>``
argument of the ``pystock-crawler prices`` and ``pystock-crawler reports``
commands.
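
To illustrate how a symbol listing relates to the inline form of the symbols argument, here is a minimal sketch. It assumes the listing has one ticker per line followed by the company name (as in the example output), and that blank lines and lines starting with ``#`` are skipped, which mirrors what ``count_symbols`` in ``bin/pystock-crawler`` does. ``tickers_from_listing`` is a hypothetical helper, not the crawler's own parser:

```python
def tickers_from_listing(text):
    """Return the ticker symbols found in a symbol listing (hypothetical helper)."""
    tickers = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith('#'):
            continue
        # Assumption: the ticker is the first whitespace-delimited token,
        # and everything after it (the company name) is ignored
        tickers.append(line.split()[0])
    return tickers


listing = (
    "DDD\t3D Systems Corporation\n"
    "MMM\t3M Company\n"
    "# a comment line\n"
    "WBAI\t500.com Limited\n"
)

# Comma-separated form, usable as an inline symbols argument
inline_symbols = ','.join(tickers_from_listing(listing))
```

In practice there is no need to do this conversion by hand, since the commands accept the listing file directly; the sketch only shows that both forms carry the same tickers.
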
``<symbols>`` can be an inline string separated with commas or a text file
that lists symbols line by line. For example, the inline string can be
something like ``AAPL,GOOG,FB``. And the text file may look like this::

    # This line is comment
    AAPL    Put anything you want here
    GOOG    Since the text here is ignored
    FB

Use ``-o`` to specify the output file. For the ``pystock-crawler symbols``
command, the output format is a simple text file. For ``pystock-crawler prices``
and ``pystock-crawler reports``, the output format is CSV.

``-l`` is where the crawling logs go. If not specified, the logs go to stdout.

By default, the crawler uses the current directory as the working directory.
If you don't want to use the current directory, you can specify another one
with the ``-w`` option. The crawler keeps an HTTP cache in a directory named
``.scrapy`` under the working directory. The cache saves time by avoiding
repeated downloads of the same web pages. However, the cache can grow quite
large. If you don't need it, just delete the ``.scrapy`` directory after you
have finished crawling.

The ``-b`` option is only available for the ``pystock-crawler reports``
command. It allows you to split a large symbol list into smaller batches. This
is actually a workaround for an unresolved bug (#2). Normally you don't have to
specify this option. The default value (500) works just fine.

The rows in the output file are in an arbitrary order by default. Use the
``--sort`` option to sort them by symbols and dates. But if you have a large
output file, don't use ``--sort`` because it will be slow and eat a lot of
memory.


Developer Guide
---------------

Installing Dependencies
~~~~~~~~~~~~~~~~~~~~~~~
::

    pip install -r requirements.txt


Running Test
~~~~~~~~~~~~

Install test requirements::

    pip install -r requirements-test.txt

Then run the test::

    py.test

This will download the test data (a lot of XML/XBRL files) from `SEC EDGAR`_
on the fly, so it will take some time and disk space. The test data is saved to
``pystock_crawler/tests/sample_data`` directory.
It can be reused the next time you run the test. If you don't need it, just
delete the ``sample_data`` directory.


.. _libffi: https://sourceware.org/libffi/
.. _lxml: http://lxml.de/
.. _NASDAQ.com: http://www.nasdaq.com/
.. _Scrapy: http://scrapy.org/
.. _Scrapy's installation guide: http://doc.scrapy.org/en/latest/intro/install.html
.. _SEC EDGAR: http://www.sec.gov/edgar/searchedgar/companysearch.html
.. _virtualenv: http://www.virtualenv.org/
.. _virtualenvwrapper: http://virtualenvwrapper.readthedocs.org/
.. _Yahoo Finance: http://finance.yahoo.com/

================================================
FILE: bin/pystock-crawler
================================================
#!/usr/bin/env python
'''
Usage:
  pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR]
                          [--sort]
  pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                         [-l LOGFILE] [-w WORKING_DIR] [--sort]
  pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                          [-l LOGFILE] [-w WORKING_DIR] [-b BATCH_SIZE] [--sort]
  pystock-crawler (-h | --help)
  pystock-crawler (-v | --version)

Options:
  -h --help       Show this screen
  -o OUTPUT       Output file
  -s YYYYMMDD     Start date [default: ]
  -e YYYYMMDD     End date [default: ]
  -l LOGFILE      Log output [default: ]
  -w WORKING_DIR  Working directory [default: .]
  -b BATCH_SIZE   Batch size [default: 500]
  --sort          Sort the result

'''
import codecs
import math
import os
import sys
import uuid

from contextlib import contextmanager
from docopt import docopt
from scrapy import log

try:
    import pystock_crawler
except ImportError:
    # For development environment
    sys.path.append(os.getcwd())
    import pystock_crawler


def random_string(length=5):
    # Honor the `length` argument instead of always returning 5 characters
    return uuid.uuid4().get_hex()[0:length]


@contextmanager
def tmp_scrapy_cfg():
    content = '''# pystock_crawler scrapy.cfg
[settings]
default = pystock_crawler.settings

[deploy]
#url = http://localhost:6800/
project = pystock_crawler
'''
    filename = os.path.abspath('./scrapy.cfg')
    filename_bak = os.path.abspath('./scrapy-%s.cfg' % random_string())
    if os.path.exists(filename):
        # Keep any existing scrapy.cfg out of the way while crawling
        log.msg(u'Renaming %s -> %s' % (filename, filename_bak))
        os.rename(filename, filename_bak)
    assert not os.path.exists(filename)

    log.msg(u'Creating temporary config: %s' % filename)
    with open(filename, 'w') as f:
        f.write(content)

    yield

    if os.path.exists(filename):
        log.msg(u'Deleting %s' % filename)
        os.remove(filename)
    if os.path.exists(filename_bak):
        # Restore the original scrapy.cfg
        log.msg(u'Renaming %s -> %s' % (filename_bak, filename))
        os.rename(filename_bak, filename)


def run_scrapy_command(cmd):
    log.msg('Command: %s' % cmd)

    with tmp_scrapy_cfg():
        os.system(cmd)


def count_symbols(symbols):
    if os.path.exists(symbols):
        # If `symbols` is a file
        with open(symbols) as f:
            count = 0
            for line in f:
                line = line.rstrip()
                if line and not line.startswith('#'):
                    count += 1
            return count

    # If `symbols` is a comma-separated string
    return len(symbols.split(','))


def merge_files(target, sources, ignore_header=False):
    log.msg(u'Merging files to %s' % target)
    with codecs.open(target, 'w', 'utf-8') as out:
        for i, source in enumerate(sources):
            with codecs.open(source, 'r', 'utf-8') as f:
                if ignore_header and i > 0:
                    try:
                        f.next()  # Ignore CSV header
                    except StopIteration:
                        # Empty file: skip it but keep merging the remaining
                        # sources (a `break` here would drop them)
                        continue
                out.write(f.read())

    # Delete source files
    for filename in sources:
        log.msg(u'Deleting %s' % filename)
        os.remove(filename)


def crawl_symbols(exchanges, output, log_file):
    command = 'scrapy crawl nasdaq -a exchanges="%s" -t symbollist' % exchanges
    if output:
        command += ' -o "%s"' % output
    if log_file:
        command += ' -s LOG_FILE="%s"' % log_file

    run_scrapy_command(command)


def crawl(spider, symbols, start_date, end_date, output, log_file, batch_size):
    command = 'scrapy crawl %s -a symbols="%s" -t csv' % (spider, symbols)
    if start_date:
        command += ' -a startdate=%s' % start_date
    if end_date:
        command += ' -a enddate=%s' % end_date
    if log_file:
        command += ' -s LOG_FILE="%s"' % log_file

    if spider == 'edgar':
        # When crawling edgar filings, run the scrapy command batch by batch to
        # work around issue #2
        num_symbols = count_symbols(symbols)
        num_batches = int(math.ceil(num_symbols / float(batch_size)))

        # Store sub-files so we can merge them later
        output_files = []

        for i in xrange(num_batches):
            start = i * batch_size
            batch_cmd = command + ' -a limit=%d,%d' % (start, batch_size)
            if output:
                filename = '%s.%d' % (output, i + 1)
                batch_cmd += ' -o "%s"' % filename
                output_files.append(filename)

            run_scrapy_command(batch_cmd)

        merge_files(output, output_files, ignore_header=True)
    else:
        if output:
            command += ' -o "%s"' % output
        run_scrapy_command(command)


def sort_symbols(filename):
    log.msg(u'Sorting: %s' % filename)

    with codecs.open(filename, 'r', 'utf-8') as f:
        lines = [line for line in f]

    lines = sorted(lines)

    with codecs.open(filename, 'w', 'utf-8') as f:
        f.writelines(lines)

    log.msg(u'Sorted: %s' % filename)


def sort_csv(filename):
    log.msg(u'Sorting: %s' % filename)

    with codecs.open(filename, 'r', 'utf-8') as f:
        try:
            headers = f.next()
        except StopIteration:
            log.msg(u'No need to sort empty file: %s' % filename)
            return
        lines = [line for line in f]

    def line_cmp(line1, line2):
        # Compare two CSV lines column by column
        a = line1.split(',')
        b = line2.split(',')
        length = min(len(a), len(b))
        # Stop at the shorter line so that equal lines (or a line that is a
        # prefix of the other) do not index past the end
        for i in xrange(length):
            result = cmp(a[i], b[i])
            if result:
                return result
        return 0

    lines = sorted(lines, cmp=line_cmp)

    with codecs.open(filename, 'w', 'utf-8') as f:
        f.write(headers)
        f.writelines(lines)

    log.msg(u'Sorted: %s' % filename)


def print_version():
    print 'pystock-crawler %s' % pystock_crawler.__version__


def main():
    args = docopt(__doc__)

    symbols = args.get('<symbols>')
    start_date = args.get('-s')
    end_date = args.get('-e')
    output = args.get('-o')
    log_file = args.get('-l')
    batch_size = args.get('-b')
    sorting = args.get('--sort')
    working_dir = args.get('-w')

    if args['prices']:
        spider = 'yahoo'
    elif args['reports']:
        spider = 'edgar'
    else:
        spider = None

    # Resolve paths before changing the working directory
    if symbols and os.path.exists(symbols):
        symbols = os.path.abspath(symbols)
    if output:
        output = os.path.abspath(output)
    if log_file:
        log_file = os.path.abspath(log_file)

    try:
        batch_size = int(batch_size)
        if batch_size <= 0:
            raise ValueError
    except ValueError:
        raise ValueError("BATCH_SIZE must be a positive integer, input is '%s'" % batch_size)

    try:
        os.chdir(working_dir)
    except OSError as err:
        sys.stderr.write('%s\n' % err)
        return

    if spider:
        log.start(logfile=log_file)
        crawl(spider, symbols, start_date, end_date, output, log_file, batch_size)
        if sorting and output:
            sort_csv(output)
    elif args['symbols']:
        log.start(logfile=log_file)
        exchanges = args.get('<exchanges>')
        crawl_symbols(exchanges, output, log_file)
        if sorting and output:
            sort_symbols(output)
    elif args['-v'] or args['--version']:
        print_version()


if __name__ == '__main__':
    main()

================================================
FILE: pystock_crawler/__init__.py
================================================
__version__ = '0.8.2'

================================================
FILE: pystock_crawler/exporters.py
================================================
from scrapy.conf import settings
from scrapy.contrib.exporter import BaseItemExporter, CsvItemExporter


class CsvItemExporter2(CsvItemExporter):
    '''
    The standard CsvItemExporter class does not pass the kwargs through to the
    CSV writer, resulting in EXPORT_FIELDS and EXPORT_ENCODING being ignored
    (EXPORT_EMPTY is not used by CSV).

    http://stackoverflow.com/questions/6943778/python-scrapy-how-to-get-csvitemexporter-to-write-columns-in-a-specific-order
    '''

    def __init__(self, *args, **kwargs):
        kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None
        kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')

        super(CsvItemExporter2, self).__init__(*args, **kwargs)

    def _write_headers_and_set_fields_to_export(self, item):
        # HACK: Override this private method to filter fields that are in
        # fields_to_export but not in item
        if self.include_headers_line:
            item_fields = item.fields.keys()
            if self.fields_to_export:
                self.fields_to_export = filter(lambda a: a in item_fields, self.fields_to_export)
            else:
                self.fields_to_export = item_fields
            self.csv_writer.writerow(self.fields_to_export)


class SymbolListExporter(BaseItemExporter):

    def __init__(self, file, **kwargs):
        self._configure(kwargs, dont_fail=True)
        self.file = file

    def export_item(self, item):
        # One line per symbol: ticker, a tab, then the company name
        self.file.write('%s\t%s\n' % (item['symbol'], item['name']))

================================================
FILE: pystock_crawler/items.py
================================================
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

from scrapy.item import Item, Field


class ReportItem(Item):

    # Trading symbol
    symbol = Field()

    # Whether this doc is an amendment to a previously filed doc
    amend = Field()

    # Quarterly (10-Q) or annual (10-K) report
    doc_type = Field()

    # Q1, Q2, Q3, or FY for annual report
    period_focus = Field()

    fiscal_year = Field()
    end_date = Field()
    revenues = Field()
    op_income = Field()
    net_income = Field()
    eps_basic = Field()
    eps_diluted = Field()
    dividend = Field()

    # Balance sheet items
    assets = Field()
    cur_assets = Field()
    cur_liab = Field()
    equity = Field()
    cash = Field()

    # Cash flow from operating, investing, and financing
    cash_flow_op = Field()
    cash_flow_inv = Field()
    cash_flow_fin = Field()


class PriceItem(Item):

    # Trading symbol
    symbol = Field()

    # YYYY-MM-DD
    date = Field()

    open = Field()
    close = Field()
    high = Field()
    low = Field()
    adj_close = Field()
    volume = Field()


class SymbolItem(Item):

    symbol = Field()
    name = Field()

================================================
FILE: pystock_crawler/loaders.py
================================================
import re

from datetime import datetime, timedelta
from scrapy import log
from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import Compose, MapCompose, TakeFirst
from scrapy.utils.misc import arg_to_iter
from scrapy.utils.python import flatten

from pystock_crawler.items import ReportItem


DATE_FORMAT = '%Y-%m-%d'

MAX_PER_SHARE_VALUE = 1000.0

# If number of characters of response body exceeds this value,
# remove some useless text defined by RE_XML_GARBAGE to reduce memory usage
THRESHOLD_TO_CLEAN = 20000000

# Used to get rid of "LONG STRING..."
RE_XML_GARBAGE = re.compile(r'>([^<]{100,})<')


class IntermediateValue(object):
    '''
    Intermediate data that serves as output of input processors, i.e., input
    of output processors. "Intermediate" is shortened to "imd" in later naming.
''' def __init__(self, local_name, value, text, context, node=None, start_date=None, end_date=None, instant=None): self.local_name = local_name self.value = value self.text = text self.context = context self.node = node self.start_date = start_date self.end_date = end_date self.instant = instant def __cmp__(self, other): if self.value < other.value: return -1 elif self.value > other.value: return 1 return 0 def __repr__(self): context_id = None if self.context: context_id = self.context.xpath('@id')[0].extract() return '(%s, %s, %s)' % (self.local_name, self.value, context_id) def is_member(self): return is_member(self.context) class ExtractText(object): def __call__(self, value): if hasattr(value, 'select'): try: return value.xpath('./text()')[0].extract() except IndexError: return '' return unicode(value) class MatchEndDate(object): def __init__(self, data_type=str, ignore_date_range=False): self.data_type = data_type self.ignore_date_range = ignore_date_range def __call__(self, value, loader_context): if not hasattr(value, 'select'): return IntermediateValue('', 0.0, '0', None) doc_end_date_str = loader_context['end_date'] doc_type = loader_context['doc_type'] selector = loader_context['selector'] context_id = value.xpath('@contextRef')[0].extract() try: context = selector.xpath('//*[@id="%s"]' % context_id)[0] except IndexError: try: url = loader_context['response'].url except KeyError: url = None log.msg(u'Cannot find context: %s in %s' % (context_id, url), log.WARNING) return None date = instant = start_date = end_date = None try: instant = context.xpath('.//*[local-name()="instant"]/text()')[0].extract().strip() except (IndexError, ValueError): try: end_date_str = context.xpath('.//*[local-name()="endDate"]/text()')[0].extract().strip() end_date = datetime.strptime(end_date_str, DATE_FORMAT) start_date_str = context.xpath('.//*[local-name()="startDate"]/text()')[0].extract().strip() start_date = datetime.strptime(start_date_str, DATE_FORMAT) if 
self.ignore_date_range or date_range_matches_doc_type(doc_type, start_date, end_date): date = end_date except (IndexError, ValueError): pass else: try: instant = datetime.strptime(instant, DATE_FORMAT) except ValueError: pass else: date = instant if date: doc_end_date = datetime.strptime(doc_end_date_str, DATE_FORMAT) delta_days = (doc_end_date - date).days if abs(delta_days) < 30: try: text = value.xpath('./text()')[0].extract() val = self.data_type(text) except (IndexError, ValueError): pass else: local_name = value.xpath('local-name()')[0].extract() return IntermediateValue( local_name, val, text, context, value, start_date=start_date, end_date=end_date, instant=instant) return None class ImdSumMembersOr(object): def __init__(self, second_func=None): self.second_func = second_func def __call__(self, imd_values): members = [] non_members = [] for imd_value in imd_values: if imd_value.is_member(): members.append(imd_value) else: non_members.append(imd_value) if members and len(members) == len(imd_values): return imd_sum(members) if imd_values: return self.second_func(non_members) return None def date_range_matches_doc_type(doc_type, start_date, end_date): delta_days = (end_date - start_date).days return ((doc_type == '10-Q' and delta_days < 120 and delta_days > 60) or (doc_type == '10-K' and delta_days < 380 and delta_days > 350)) def get_amend(values): if values: return values[0] return False def get_symbol(values): if values: symbols = map(lambda s: s.strip(), values[0].split(',')) return '/'.join(symbols) return False def imd_max(imd_values): if imd_values: imd_value = max(imd_values) return imd_value.value return None def imd_min(imd_values): if imd_values: imd_value = min(imd_values) return imd_value.value return None def imd_sum(imd_values): return sum([v.value for v in imd_values]) def imd_get_revenues(imd_values): interest_elems = filter(lambda v: 'interest' in v.local_name.lower(), imd_values) if len(interest_elems) == len(imd_values): # HACK: An 
exceptional case for BBT # Revenues = InterestIncome + NoninterestIncome return imd_sum(imd_values) return imd_max(imd_values) def imd_get_net_income(imd_values): return imd_min(imd_values) def imd_get_op_income(imd_values): imd_values = filter(lambda v: memberness(v.context) < 2, imd_values) return imd_min(imd_values) def imd_get_cash_flow(imd_values, loader_context): if len(imd_values) == 1: return imd_values[0].value doc_type = loader_context['doc_type'] within_date_range = [] for imd_value in imd_values: if imd_value.start_date and imd_value.end_date: if date_range_matches_doc_type(doc_type, imd_value.start_date, imd_value.end_date): within_date_range.append(imd_value) if within_date_range: return imd_max(within_date_range) return imd_max(imd_values) def imd_get_per_share_value(imd_values): if not imd_values: return None v = imd_values[0] value = v.value if abs(value) > MAX_PER_SHARE_VALUE: try: decimals = int(v.node.xpath('@decimals')[0].extract()) except (AttributeError, IndexError, ValueError): return None else: # HACK: some of LTD's reports have unreasonablely large per share value, such as # 320000 EPS (and it should be 0.32), so use decimals attribute to scale it down, # note that this is NOT a correct way to interpret decimals attribute value *= pow(10, decimals - 2) return value if abs(value) <= MAX_PER_SHARE_VALUE else None def imd_get_equity(imd_values): if not imd_values: return None values = filter(lambda v: v.local_name == 'StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest', imd_values) if values: return values[0].value values = filter(lambda v: v.local_name == 'StockholdersEquity', imd_values) if values: return values[0].value return imd_values[0].value def imd_filter_member(imd_values): if imd_values: with_memberness = [(v, memberness(v.context)) for v in imd_values] with_memberness = sorted(with_memberness, cmp=lambda a, b: a[1] - b[1]) m0 = with_memberness[0][1] non_members = [] for v in with_memberness: if v[1] == m0: 
non_members.append(v[0]) return non_members return imd_values def imd_mult(imd_values): for v in imd_values: try: node_id = v.node.xpath('@id')[0].extract().lower() except (AttributeError, IndexError): pass else: # HACK: some of LUV's reports have unreasonablely small numbers such as # 4136 in revenues which should be 4136 millions, this hack uses id attribute # to determine if it should be scaled up if 'inmillions' in node_id and abs(v.value) < 100000.0: v.value *= 1000000.0 elif 'inthousands' in node_id and abs(v.value) < 100000000.0: v.value *= 1000.0 return imd_values def memberness(context): '''The likelihood that the context is a "member".''' if context: texts = context.xpath('.//*[local-name()="explicitMember"]/text()').extract() text = str(texts).lower() if len(texts) > 1: return 2 elif 'country' in text: return 2 elif 'member' not in text: return 0 elif 'successor' in text: # 'SuccessorMember' is a rare case that shouldn't be treated as member return 1 elif 'parent' in text: return 2 return 3 def is_member(context): if context: texts = context.xpath('.//*[local-name()="explicitMember"]/text()').extract() text = str(texts).lower() # 'SuccessorMember' is a rare case that shouldn't be treated as member if 'member' not in text or 'successor' in text or 'parent' in text: return False return True def str_to_bool(value): if hasattr(value, 'lower'): value = value.lower() return bool(value) and value != 'false' and value != '0' return bool(value) def find_namespace(xxs, name): name_re = name.replace('-', '\-') if not name_re.startswith('xmlns'): name_re = 'xmlns:' + name_re return xxs.re('%s=\"([^\"]+)\"' % name_re)[0] def register_namespace(xxs, name): ns = find_namespace(xxs, name) xxs.register_namespace(name, ns) def register_namespaces(xxs): names = ('xmlns', 'xbrli', 'dei', 'us-gaap') for name in names: try: register_namespace(xxs, name) except IndexError: pass class XmlXPathItemLoader(ItemLoader): def __init__(self, *args, **kwargs): super(XmlXPathItemLoader, 
self).__init__(*args, **kwargs) register_namespaces(self.selector) def add_xpath(self, field_name, xpath, *processors, **kw): values = self._get_values(xpath, **kw) self.add_value(field_name, values, *processors, **kw) return len(self._values[field_name]) def add_xpaths(self, name, paths): for path in paths: match_count = self.add_xpath(name, path) if match_count > 0: return match_count return 0 def _get_values(self, xpaths, **kw): xpaths = arg_to_iter(xpaths) return flatten([self.selector.xpath(xpath) for xpath in xpaths]) class ReportItemLoader(XmlXPathItemLoader): default_item_class = ReportItem default_output_processor = TakeFirst() symbol_in = MapCompose(ExtractText(), unicode.upper) symbol_out = Compose(get_symbol) amend_in = MapCompose(ExtractText(), str_to_bool) amend_out = Compose(get_amend) period_focus_in = MapCompose(ExtractText(), unicode.upper) period_focus_out = TakeFirst() revenues_in = MapCompose(MatchEndDate(float)) revenues_out = Compose(imd_filter_member, imd_mult, ImdSumMembersOr(imd_get_revenues)) net_income_in = MapCompose(MatchEndDate(float)) net_income_out = Compose(imd_filter_member, imd_mult, imd_get_net_income) op_income_in = MapCompose(MatchEndDate(float)) op_income_out = Compose(imd_filter_member, imd_mult, imd_get_op_income) eps_basic_in = MapCompose(MatchEndDate(float)) eps_basic_out = Compose(ImdSumMembersOr(imd_get_per_share_value), lambda x: x if x < MAX_PER_SHARE_VALUE else None) eps_diluted_in = MapCompose(MatchEndDate(float)) eps_diluted_out = Compose(ImdSumMembersOr(imd_get_per_share_value), lambda x: x if x < MAX_PER_SHARE_VALUE else None) dividend_in = MapCompose(MatchEndDate(float)) dividend_out = Compose(imd_get_per_share_value, lambda x: x if x < MAX_PER_SHARE_VALUE and x > 0.0 else 0.0) assets_in = MapCompose(MatchEndDate(float)) assets_out = Compose(imd_filter_member, imd_mult, imd_max) cur_assets_in = MapCompose(MatchEndDate(float)) cur_assets_out = Compose(imd_filter_member, imd_mult, imd_max) cur_liab_in = 
MapCompose(MatchEndDate(float)) cur_liab_out = Compose(imd_filter_member, imd_mult, imd_max) equity_in = MapCompose(MatchEndDate(float)) equity_out = Compose(imd_filter_member, imd_mult, imd_get_equity) cash_in = MapCompose(MatchEndDate(float)) cash_out = Compose(imd_filter_member, imd_mult, imd_max) cash_flow_op_in = MapCompose(MatchEndDate(float, True)) cash_flow_op_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow) cash_flow_inv_in = MapCompose(MatchEndDate(float, True)) cash_flow_inv_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow) cash_flow_fin_in = MapCompose(MatchEndDate(float, True)) cash_flow_fin_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow) def __init__(self, *args, **kwargs): response = kwargs.get('response') if len(response.body) > THRESHOLD_TO_CLEAN: # Remove some useless text to reduce memory usage body, __ = RE_XML_GARBAGE.subn(lambda m: '><', response.body) response = response.replace(body=body) kwargs['response'] = response super(ReportItemLoader, self).__init__(*args, **kwargs) symbol = self._get_symbol() end_date = self._get_doc_end_date() fiscal_year = self._get_doc_fiscal_year() doc_type = self._get_doc_type() # ignore document that is not 10-Q or 10-K if not (doc_type and doc_type.split('/')[0] in ('10-Q', '10-K')): return # some documents set their amendment flag in DocumentType, e.g., '10-Q/A', # instead of setting it in AmendmentFlag amend = None if doc_type.endswith('/A'): amend = True doc_type = doc_type[0:-2] self.context.update({ 'end_date': end_date, 'doc_type': doc_type }) self.add_xpath('symbol', '//dei:TradingSymbol') self.add_value('symbol', symbol) if amend: self.add_value('amend', True) else: self.add_xpath('amend', '//dei:AmendmentFlag') if doc_type == '10-K': period_focus = 'FY' else: period_focus = self._get_period_focus(end_date) if not fiscal_year and period_focus: fiscal_year = self._guess_fiscal_year(end_date, period_focus) self.add_value('period_focus', period_focus) 
self.add_value('fiscal_year', fiscal_year) self.add_value('end_date', end_date) self.add_value('doc_type', doc_type) self.add_xpaths('revenues', [ '//us-gaap:SalesRevenueNet', '//us-gaap:Revenues', '//us-gaap:SalesRevenueGoodsNet', '//us-gaap:SalesRevenueServicesNet', '//us-gaap:RealEstateRevenueNet', '//*[local-name()="NetRevenuesIncludingNetInterestIncome"]', '//*[contains(local-name(), "TotalRevenues") and contains(local-name(), "After")]', '//*[contains(local-name(), "TotalRevenues")]', '//*[local-name()="InterestAndDividendIncomeOperating" or local-name()="NoninterestIncome"]', '//*[contains(local-name(), "Revenue")]' ]) self.add_xpath('revenues', '//us-gaap:FinancialServicesRevenue') self.add_xpaths('net_income', [ '//*[contains(local-name(), "NetLossIncome") and contains(local-name(), "Corporation")]', '//*[local-name()="NetIncomeLossAvailableToCommonStockholdersBasic" or local-name()="NetIncomeLoss"]', '//us-gaap:ProfitLoss', '//us-gaap:IncomeLossFromContinuingOperations', '//*[contains(local-name(), "IncomeLossFromContinuingOperations") and not(contains(local-name(), "Per"))]', '//*[contains(local-name(), "NetIncomeLoss")]', '//*[starts-with(local-name(), "NetIncomeAttributableTo")]' ]) self.add_xpaths('op_income', [ '//us-gaap:OperatingIncomeLoss' ]) self.add_xpaths('eps_basic', [ '//us-gaap:EarningsPerShareBasic', '//us-gaap:IncomeLossFromContinuingOperationsPerBasicShare', '//us-gaap:IncomeLossFromContinuingOperationsPerBasicAndDilutedShare', '//*[contains(local-name(), "NetIncomeLoss") and contains(local-name(), "Per") and contains(local-name(), "Common")]', '//*[contains(local-name(), "Earnings") and contains(local-name(), "Per") and contains(local-name(), "Basic")]', '//*[local-name()="IncomePerShareFromContinuingOperationsAvailableToCompanyStockholdersBasicAndDiluted"]', '//*[contains(local-name(), "NetLossPerShare")]', '//*[contains(local-name(), "NetIncome") and contains(local-name(), "Per") and contains(local-name(), "Basic")]', 
'//*[local-name()="BasicEarningsAttributableToStockholdersPerCommonShare"]', '//*[local-name()="Earningspersharebasicanddiluted"]', '//*[contains(local-name(), "PerCommonShareBasicAndDiluted")]', '//*[local-name()="NetIncomeLossAttributableToCommonStockholdersBasicAndDiluted"]', '//us-gaap:NetIncomeLossAvailableToCommonStockholdersBasic', '//*[local-name()="NetIncomeLossEPS"]', '//*[local-name()="NetLoss"]' ]) self.add_xpaths('eps_diluted', [ '//us-gaap:EarningsPerShareDiluted', '//us-gaap:IncomeLossFromContinuingOperationsPerDilutedShare', '//us-gaap:IncomeLossFromContinuingOperationsPerBasicAndDilutedShare', '//*[contains(local-name(), "Earnings") and contains(local-name(), "Per") and contains(local-name(), "Diluted")]', '//*[local-name()="IncomePerShareFromContinuingOperationsAvailableToCompanyStockholdersBasicAndDiluted"]', '//*[contains(local-name(), "NetLossPerShare")]', '//*[contains(local-name(), "NetIncome") and contains(local-name(), "Per") and contains(local-name(), "Diluted")]', '//*[local-name()="DilutedEarningsAttributableToStockholdersPerCommonShare"]', '//us-gaap:NetIncomeLossAvailableToCommonStockholdersDiluted', '//*[contains(local-name(), "PerCommonShareBasicAndDiluted")]', '//*[local-name()="NetIncomeLossAttributableToCommonStockholdersBasicAndDiluted"]', '//us-gaap:EarningsPerShareBasic', '//*[local-name()="NetIncomeLossEPS"]', '//*[local-name()="NetLoss"]' ]) self.add_xpaths('dividend', [ '//us-gaap:CommonStockDividendsPerShareDeclared', '//us-gaap:CommonStockDividendsPerShareCashPaid' ]) # if dividend isn't found in doc, assume it's 0 self.add_value('dividend', 0.0) self.add_xpaths('assets', [ '//us-gaap:Assets', '//us-gaap:AssetsNet', '//us-gaap:LiabilitiesAndStockholdersEquity' ]) self.add_xpaths('cur_assets', [ '//us-gaap:AssetsCurrent' ]) self.add_xpaths('cur_liab', [ '//us-gaap:LiabilitiesCurrent' ]) self.add_xpaths('equity', [ '//*[local-name()="StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest" or 
local-name()="StockholdersEquity"]', '//*[local-name()="TotalCommonShareholdersEquity"]', '//*[local-name()="CommonShareholdersEquity"]', '//*[local-name()="CommonStockEquity"]', '//*[local-name()="TotalEquity"]', '//us-gaap:RetainedEarningsAccumulatedDeficit', '//*[contains(local-name(), "MembersEquityIncludingPortionAttributableToNoncontrollingInterest")]', '//us-gaap:CapitalizationLongtermDebtAndEquity', '//*[local-name()="TotalCapitalization"]' ]) self.add_xpaths('cash', [ '//us-gaap:CashCashEquivalentsAndFederalFundsSold', '//us-gaap:CashAndDueFromBanks', '//us-gaap:CashAndCashEquivalentsAtCarryingValue', '//us-gaap:Cash', '//*[local-name()="CashAndCashEquivalents"]', '//*[contains(local-name(), "CarryingValueOfCashAndCashEquivalents")]', '//*[contains(local-name(), "CashCashEquivalents")]', '//*[contains(local-name(), "CashAndCashEquivalents")]' ]) self.add_xpaths('cash_flow_op', [ '//us-gaap:NetCashProvidedByUsedInOperatingActivities', '//us-gaap:NetCashProvidedByUsedInOperatingActivitiesContinuingOperations' ]) self.add_xpaths('cash_flow_inv', [ '//us-gaap:NetCashProvidedByUsedInInvestingActivities', '//us-gaap:NetCashProvidedByUsedInInvestingActivitiesContinuingOperations' ]) self.add_xpaths('cash_flow_fin', [ '//us-gaap:NetCashProvidedByUsedInFinancingActivities', '//us-gaap:NetCashProvidedByUsedInFinancingActivitiesContinuingOperations' ]) def _get_symbol(self): try: filename = self.context['response'].url.split('/')[-1] return filename.split('-')[0].upper() except IndexError: return None def _get_doc_fiscal_year(self): try: fiscal_year = self.selector.xpath('//dei:DocumentFiscalYearFocus/text()')[0].extract() return int(fiscal_year) except (IndexError, ValueError): return None def _guess_fiscal_year(self, end_date, period_focus): # Guess fiscal_year based on document end_date and period_focus date = datetime.strptime(end_date, DATE_FORMAT) month_ranges = { 'Q1': (2, 3, 4), 'Q2': (5, 6, 7), 'Q3': (8, 9, 10), 'FY': (11, 12, 1) } month_range = 
month_ranges.get(period_focus, ()) # Case 1: release Q1 around March, Q2 around June, ... # This is what most companies do (an unknown period_focus yields an empty range) if date.month in month_range: if period_focus == 'FY' and date.month == 1: return date.year - 1 return date.year # How many days left before 10-K's release? days_left_table = { 'Q1': 270, 'Q2': 180, 'Q3': 90, 'FY': 0 } days_left = days_left_table.get(period_focus) # Other cases: assume the end_date year of the FY report equals # its fiscal_year if days_left is not None: fy_date = date + timedelta(days=days_left) return fy_date.year return None def _get_doc_end_date(self): # the document end date could come from URL or document content # we need to guess which one is correct url_date_str = self.context['response'].url.split('-')[-1].split('.')[0] url_date = datetime.strptime(url_date_str, '%Y%m%d') url_date_str = url_date.strftime(DATE_FORMAT) try: doc_date_str = self.selector.xpath('//dei:DocumentPeriodEndDate/text()')[0].extract() doc_date = datetime.strptime(doc_date_str, DATE_FORMAT) except (IndexError, ValueError): return url_date.strftime(DATE_FORMAT) context_date_strs = set(self.selector.xpath('//*[local-name()="context"]//*[local-name()="endDate"]/text()').extract()) date = url_date if doc_date_str in context_date_strs: date = doc_date return date.strftime(DATE_FORMAT) def _get_doc_type(self): try: return self.selector.xpath('//dei:DocumentType/text()')[0].extract().upper() except (IndexError, ValueError): return None def _get_period_focus(self, doc_end_date): try: return self.selector.xpath('//dei:DocumentFiscalPeriodFocus/text()')[0].extract().strip().upper() except IndexError: pass try: doc_yr = doc_end_date.split('-')[0] yr_end_date = self.selector.xpath('//dei:CurrentFiscalYearEndDate/text()')[0].extract() yr_end_date = yr_end_date.replace('--', doc_yr + '-') except IndexError: return None doc_end_date = datetime.strptime(doc_end_date, '%Y-%m-%d') yr_end_date = datetime.strptime(yr_end_date, '%Y-%m-%d') delta_days = (yr_end_date - 
doc_end_date).days if delta_days > -45 and delta_days < 45: return 'FY' elif (delta_days <= -45 and delta_days > -135) or delta_days > 225: return 'Q1' elif (delta_days <= -135 and delta_days > -225) or (delta_days > 135 and delta_days <= 225): return 'Q2' elif delta_days <= -225 or (delta_days > 45 and delta_days <= 135): return 'Q3' return 'FY' ================================================ FILE: pystock_crawler/settings.py ================================================ # Scrapy settings for pystock-crawler project # # For simplicity, this file contains only the most important settings by # default. All the other settings are documented here: # # http://doc.scrapy.org/en/latest/topics/settings.html # BOT_NAME = 'pystock-crawler' EXPORT_FIELDS = ( # Price columns 'symbol', 'date', 'open', 'high', 'low', 'close', 'volume', 'adj_close', # Report columns 'end_date', 'amend', 'period_focus', 'fiscal_year', 'doc_type', 'revenues', 'op_income', 'net_income', 'eps_basic', 'eps_diluted', 'dividend', 'assets', 'cur_assets', 'cur_liab', 'cash', 'equity', 'cash_flow_op', 'cash_flow_inv', 'cash_flow_fin', ) FEED_EXPORTERS = { 'csv': 'pystock_crawler.exporters.CsvItemExporter2', 'symbollist': 'pystock_crawler.exporters.SymbolListExporter' } HTTPCACHE_ENABLED = True HTTPCACHE_POLICY = 'scrapy.contrib.httpcache.RFC2616Policy' HTTPCACHE_STORAGE = 'scrapy.contrib.httpcache.LeveldbCacheStorage' LOG_LEVEL = 'INFO' NEWSPIDER_MODULE = 'pystock_crawler.spiders' SPIDER_MODULES = ['pystock_crawler.spiders'] # Crawl responsibly by identifying yourself (and your website) on the user-agent #USER_AGENT = 'pystock-crawler (+http://www.yourdomain.com)' CONCURRENT_REQUESTS_PER_DOMAIN = 8 COOKIES_ENABLED = False #AUTOTHROTTLE_ENABLED = True RETRY_TIMES = 4 EXTENSIONS = { 'scrapy.contrib.throttle.AutoThrottle': None, 'pystock_crawler.throttle.PassiveThrottle': 0 } PASSIVETHROTTLE_ENABLED = True #PASSIVETHROTTLE_DEBUG = True DEPTH_STATS_VERBOSE = True 
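# Illustrative aside (not part of the project source): EXPORT_FIELDS above
# puts price columns and report columns in one ordered tuple. The sketch
# below shows one way such a tuple could fix CSV column order while dropping
# columns that no item uses. It is an assumption-laden sketch, not the
# project's CsvItemExporter2: the function name export_csv and the shortened
# field tuple are hypothetical, and it uses only the Python 3 standard
# library, whereas the project itself targets Python 2.7.

```python
import csv
import io

# Hypothetical, shortened stand-in for EXPORT_FIELDS (illustration only).
EXPORT_FIELDS = ('symbol', 'date', 'open', 'close', 'end_date', 'revenues')


def export_csv(items, fields=EXPORT_FIELDS):
    """Return CSV text for a list of dict items.

    Only columns present in at least one item are written, in the order
    given by `fields`; keys not listed in `fields` are ignored.
    """
    present = set()
    for item in items:
        present.update(item)
    columns = [name for name in fields if name in present]

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns,
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for item in items:
        writer.writerow(item)  # missing keys are written as empty cells
    return buf.getvalue()


if __name__ == '__main__':
    reports = [{'symbol': 'KO', 'end_date': '2013-03-29', 'revenues': 1.0}]
    print(export_csv(reports))
```

# With report-only items, the price columns other than symbol drop out of
# the header, which matches the header that test_cmdline.py expects for the
# reports command.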
================================================ FILE: pystock_crawler/spiders/__init__.py ================================================ # This package will contain the spiders of your Scrapy project # # Please refer to the documentation for information on how to create and manage # your spiders. ================================================ FILE: pystock_crawler/spiders/edgar.py ================================================ import os from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor from scrapy.contrib.spiders import CrawlSpider, Rule from pystock_crawler import utils from pystock_crawler.loaders import ReportItemLoader class URLGenerator(object): def __init__(self, symbols, start_date='', end_date='', start=0, count=None): end = start + count if count is not None else None self.symbols = symbols[start:end] self.start_date = start_date self.end_date = end_date def __iter__(self): url = 'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=%s&type=10-&dateb=%s&datea=%s&owner=exclude&count=300' for symbol in self.symbols: yield (url % (symbol, self.end_date, self.start_date)) class EdgarSpider(CrawlSpider): name = 'edgar' allowed_domains = ['sec.gov'] rules = ( Rule(SgmlLinkExtractor(allow=('/Archives/edgar/data/[^\"]+\-index\.htm',))), Rule(SgmlLinkExtractor(allow=('/Archives/edgar/data/[^\"]+/[A-Za-z]+\-\d{8}\.xml',)), callback='parse_10qk'), ) def __init__(self, **kwargs): super(EdgarSpider, self).__init__(**kwargs) symbols_arg = kwargs.get('symbols') start_date = kwargs.get('startdate', '') end_date = kwargs.get('enddate', '') limit_arg = kwargs.get('limit', '') utils.check_date_arg(start_date, 'startdate') utils.check_date_arg(end_date, 'enddate') start, count = utils.parse_limit_arg(limit_arg) if symbols_arg: if os.path.exists(symbols_arg): # get symbols from a text file symbols = utils.load_symbols(symbols_arg) else: # inline symbols in command symbols = symbols_arg.split(',') self.start_urls = URLGenerator(symbols, 
start_date, end_date, start, count) else: self.start_urls = [] def parse_10qk(self, response): '''Parse 10-Q or 10-K XML report.''' loader = ReportItemLoader(response=response) item = loader.load_item() if 'doc_type' in item: doc_type = item['doc_type'] if doc_type in ('10-Q', '10-K'): return item return None ================================================ FILE: pystock_crawler/spiders/nasdaq.py ================================================ import cStringIO import re from scrapy.spider import Spider from pystock_crawler.items import SymbolItem RE_SYMBOL = re.compile(r'^[A-Z]+$') def generate_urls(exchanges): for exchange in exchanges: yield 'http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=%s&render=download' % exchange class NasdaqSpider(Spider): name = 'nasdaq' allowed_domains = ['www.nasdaq.com'] def __init__(self, **kwargs): super(NasdaqSpider, self).__init__(**kwargs) exchanges = kwargs.get('exchanges', '').split(',') self.start_urls = generate_urls(exchanges) def parse(self, response): file_like = cStringIO.StringIO(response.body) try: # Ignore first row file_like.next() for line in file_like: tokens = line.split(',') symbol = tokens[0].strip('"') if RE_SYMBOL.match(symbol): name = tokens[1].strip('"') yield SymbolItem(symbol=symbol, name=name) finally: file_like.close() ================================================ FILE: pystock_crawler/spiders/yahoo.py ================================================ import cStringIO import os import re from datetime import datetime from scrapy.spider import Spider from pystock_crawler import utils from pystock_crawler.items import PriceItem def parse_date(date_str): if date_str: date = datetime.strptime(date_str, '%Y%m%d') return date.year, date.month - 1, date.day return '', '', '' def make_url(symbol, start_date=None, end_date=None): url = ('http://ichart.finance.yahoo.com/table.csv?' 
's=%(symbol)s&d=%(end_month)s&e=%(end_day)s&f=%(end_year)s&g=d&' 'a=%(start_month)s&b=%(start_day)s&c=%(start_year)s&ignore=.csv') start_date = parse_date(start_date) end_date = parse_date(end_date) return url % { 'symbol': symbol, 'start_year': start_date[0], 'start_month': start_date[1], 'start_day': start_date[2], 'end_year': end_date[0], 'end_month': end_date[1], 'end_day': end_date[2] } def generate_urls(symbols, start_date=None, end_date=None): for symbol in symbols: yield make_url(symbol, start_date, end_date) class YahooSpider(Spider): name = 'yahoo' allowed_domains = ['finance.yahoo.com'] def __init__(self, **kwargs): super(YahooSpider, self).__init__(**kwargs) symbols_arg = kwargs.get('symbols') start_date = kwargs.get('startdate', '') end_date = kwargs.get('enddate', '') utils.check_date_arg(start_date, 'startdate') utils.check_date_arg(end_date, 'enddate') if symbols_arg: if os.path.exists(symbols_arg): # get symbols from a text file symbols = utils.load_symbols(symbols_arg) else: # inline symbols in command symbols = symbols_arg.split(',') self.start_urls = generate_urls(symbols, start_date, end_date) else: self.start_urls = [] def parse(self, response): symbol = self._get_symbol_from_url(response.url) file_like = cStringIO.StringIO(response.body) try: rows = utils.parse_csv(file_like) for row in rows: item = PriceItem(symbol=symbol) for k, v in row.iteritems(): item[k.replace(' ', '_').lower()] = v yield item finally: file_like.close() def _get_symbol_from_url(self, url): match = re.search(r'[\?&]s=([^&]*)', url) if match: return match.group(1) return '' ================================================ FILE: pystock_crawler/tests/__init__.py ================================================ ================================================ FILE: pystock_crawler/tests/base.py ================================================ import os import unittest # Stores temporary test data SAMPLE_DATA_DIR = os.path.join(os.path.abspath(os.path.dirname(__file__)), 
'sample_data') class TestCaseBase(unittest.TestCase): ''' Provides utility functions for test cases. ''' def assert_none_or_almost_equal(self, value, expected_value): if expected_value is None: self.assertIsNone(value) else: self.assertAlmostEqual(value, expected_value) def assert_item(self, item, expected): self.assertEqual(item.get('symbol'), expected.get('symbol')) self.assertEqual(item.get('name'), expected.get('name')) self.assertEqual(item.get('amend'), expected.get('amend')) self.assertEqual(item.get('doc_type'), expected.get('doc_type')) self.assertEqual(item.get('period_focus'), expected.get('period_focus')) self.assertEqual(item.get('fiscal_year'), expected.get('fiscal_year')) self.assertEqual(item.get('end_date'), expected.get('end_date')) self.assert_none_or_almost_equal(item.get('revenues'), expected.get('revenues')) self.assert_none_or_almost_equal(item.get('net_income'), expected.get('net_income')) self.assert_none_or_almost_equal(item.get('eps_basic'), expected.get('eps_basic')) self.assert_none_or_almost_equal(item.get('eps_diluted'), expected.get('eps_diluted')) self.assertAlmostEqual(item.get('dividend'), expected.get('dividend')) self.assert_none_or_almost_equal(item.get('assets'), expected.get('assets')) self.assert_none_or_almost_equal(item.get('equity'), expected.get('equity')) self.assert_none_or_almost_equal(item.get('cash'), expected.get('cash')) self.assert_none_or_almost_equal(item.get('op_income'), expected.get('op_income')) self.assert_none_or_almost_equal(item.get('cur_assets'), expected.get('cur_assets')) self.assert_none_or_almost_equal(item.get('cur_liab'), expected.get('cur_liab')) self.assert_none_or_almost_equal(item.get('cash_flow_op'), expected.get('cash_flow_op')) self.assert_none_or_almost_equal(item.get('cash_flow_inv'), expected.get('cash_flow_inv')) self.assert_none_or_almost_equal(item.get('cash_flow_fin'), expected.get('cash_flow_fin')) def _create_sample_data_dir(): if not os.path.exists(SAMPLE_DATA_DIR): try: 
os.makedirs(SAMPLE_DATA_DIR) except OSError: pass assert os.path.exists(SAMPLE_DATA_DIR) _create_sample_data_dir() ================================================ FILE: pystock_crawler/tests/test_cmdline.py ================================================ import os import shutil import unittest import pystock_crawler from envoy import run TEST_DIR = './test_data' # Scrapy runs in another process whose working directory may differ from that of # the process running the test, so we have to explicitly set PYTHONPATH to # the absolute path of the current working directory for the Scrapy process to be # able to locate the pystock_crawler module. os.environ['PYTHONPATH'] = os.getcwd() class PrintTest(unittest.TestCase): def test_no_args(self): r = run('./bin/pystock-crawler') self.assertIn('Usage:', r.std_err) def test_print_help(self): r = run('./bin/pystock-crawler -h') self.assertIn('Usage:', r.std_out) r2 = run('./bin/pystock-crawler --help') self.assertEqual(r.std_out, r2.std_out) def test_print_version(self): r = run('./bin/pystock-crawler -v') self.assertEqual(r.std_out, 'pystock-crawler %s\n' % pystock_crawler.__version__) r2 = run('./bin/pystock-crawler --version') self.assertEqual(r.std_out, r2.std_out) class CrawlTest(unittest.TestCase): '''Base class for crawl test cases.''' def setUp(self): if os.path.isdir(TEST_DIR): shutil.rmtree(TEST_DIR) os.mkdir(TEST_DIR) self.args = { 'output': os.path.join(TEST_DIR, '%s.out' % self.filename), 'log_file': os.path.join(TEST_DIR, '%s.log' % self.filename), 'working_dir': TEST_DIR } def tearDown(self): shutil.rmtree(TEST_DIR) def assert_cache(self): # Check if cache is there cache_dir = os.path.join(TEST_DIR, '.scrapy', 'httpcache', '%s.leveldb' % self.spider) self.assertTrue(os.path.isdir(cache_dir)) def assert_log(self): # Check if log file is there log_path = self.args['log_file'] self.assertTrue(os.path.isfile(log_path)) def get_output_content(self): output_path = self.args['output'] 
self.assertTrue(os.path.isfile(output_path)) with open(output_path) as f: content = f.read() return content class CrawlSymbolsTest(CrawlTest): filename = 'symbols' spider = 'nasdaq' def assert_nyse_output(self): # Check if some common NYSE symbols are in output content = self.get_output_content() self.assertIn('JPM', content) self.assertIn('KO', content) self.assertIn('WMT', content) # NASDAQ symbols shouldn't be in output self.assertNotIn('AAPL', content) self.assertNotIn('GOOG', content) self.assertNotIn('YHOO', content) def assert_nyse_and_nasdaq_output(self): # Check if some common NYSE symbols are in output content = self.get_output_content() self.assertIn('JPM', content) self.assertIn('KO', content) self.assertIn('WMT', content) # Check if some common NASDAQ symbols are in output self.assertIn('AAPL', content) self.assertIn('GOOG', content) self.assertIn('YHOO', content) def test_crawl_nyse(self): r = run('./bin/pystock-crawler symbols NYSE -o %(output)s -l %(log_file)s -w %(working_dir)s' % self.args) self.assertEqual(r.status_code, 0) self.assert_nyse_output() self.assert_log() self.assert_cache() def test_crawl_nyse_and_nasdaq(self): r = run('./bin/pystock-crawler symbols NYSE,NASDAQ -o %(output)s -l %(log_file)s -w %(working_dir)s --sort' % self.args) self.assertEqual(r.status_code, 0) self.assert_nyse_and_nasdaq_output() self.assert_log() self.assert_cache() class CrawlPricesTest(CrawlTest): filename = 'prices' spider = 'yahoo' def test_crawl_inline_symbols(self): r = run('./bin/pystock-crawler prices GOOG,IBM -o %(output)s -l %(log_file)s -w %(working_dir)s' % self.args) self.assertEqual(r.status_code, 0) content = self.get_output_content() self.assertIn('GOOG', content) self.assertIn('IBM', content) self.assert_log() self.assert_cache() def test_crawl_symbol_file(self): # Create a sample symbol file symbol_file = os.path.join(TEST_DIR, 'symbols.txt') with open(symbol_file, 'w') as f: f.write('WMT\nJPM') self.args['symbol_file'] = symbol_file r = 
run('./bin/pystock-crawler prices %(symbol_file)s -o %(output)s -l %(log_file)s -w %(working_dir)s --sort' % self.args) self.assertEqual(r.status_code, 0) content = self.get_output_content() self.assertIn('WMT', content) self.assertIn('JPM', content) self.assert_log() self.assert_cache() class CrawlReportsTest(CrawlTest): filename = 'reports' spider = 'edgar' def test_crawl_inline_symbols(self): r = run('./bin/pystock-crawler reports KO,MCD -o %(output)s -l %(log_file)s -w %(working_dir)s ' '-s 20130401 -e 20130531' % self.args) self.assertEqual(r.status_code, 0) content = self.get_output_content() self.assertIn('KO', content) self.assertIn('MCD', content) self.assert_log() self.assert_cache() def test_crawl_symbol_file(self): # Create a sample symbol file symbol_file = os.path.join(TEST_DIR, 'symbols.txt') with open(symbol_file, 'w') as f: f.write('KO\nMCD') self.args['symbol_file'] = symbol_file r = run('./bin/pystock-crawler reports %(symbol_file)s -o %(output)s -l %(log_file)s -w %(working_dir)s ' '-s 20130401 -e 20130531 --sort' % self.args) self.assertEqual(r.status_code, 0) content = self.get_output_content() self.assertIn('KO', content) self.assertIn('MCD', content) self.assert_log() self.assert_cache() # Check CSV header expected_header = [ 'symbol', 'end_date', 'amend', 'period_focus', 'fiscal_year', 'doc_type', 'revenues', 'op_income', 'net_income', 'eps_basic', 'eps_diluted', 'dividend', 'assets', 'cur_assets', 'cur_liab', 'cash', 'equity', 'cash_flow_op', 'cash_flow_inv', 'cash_flow_fin' ] head_line = content.split('\n')[0].rstrip() self.assertEqual(head_line.split(','), expected_header) def test_merge_empty_results(self): # Ridiculous date range (1800/1/1) -> empty result r = run('./bin/pystock-crawler reports KO,MCD -o %(output)s -l %(log_file)s -w %(working_dir)s ' '-s 18000101 -e 18000101 -b 1' % self.args) self.assertEqual(r.status_code, 0) content = self.get_output_content() self.assertFalse(content) # Make sure subfiles are deleted filename = 
self.args['output'] self.assertFalse(os.path.exists(os.path.join('%s.1' % filename))) self.assertFalse(os.path.exists(os.path.join('%s.2' % filename))) ================================================ FILE: pystock_crawler/tests/test_loaders.py ================================================ import os import requests import urlparse from scrapy.http.response.xml import XmlResponse from pystock_crawler.loaders import ReportItemLoader from pystock_crawler.tests.base import SAMPLE_DATA_DIR, TestCaseBase def create_response(file_path): with open(file_path) as f: body = f.read() return XmlResponse('file://%s' % file_path.replace('\\', '/'), body=body) def download(url, local_path): if not os.path.exists(local_path): dir_path = os.path.dirname(local_path) if not os.path.exists(dir_path): try: os.makedirs(dir_path) except OSError: pass assert os.path.exists(dir_path) with open(local_path, 'wb') as f: r = requests.get(url, stream=True) for chunk in r.iter_content(chunk_size=4096): f.write(chunk) def parse_xml(url): url_path = urlparse.urlparse(url).path local_path = os.path.join(SAMPLE_DATA_DIR, url_path[1:]) download(url, local_path) response = create_response(local_path) loader = ReportItemLoader(response=response) return loader.load_item() class ReportItemLoaderTest(TestCaseBase): def test_a_20110131(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1090872/000110465911013291/a-20110131.xml') self.assert_item(item, { 'symbol': 'A', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2011, 'end_date': '2011-01-31', 'revenues': 1519000000, 'op_income': 211000000, 'net_income': 193000000, 'eps_basic': 0.56, 'eps_diluted': 0.54, 'dividend': 0.0, 'assets': 8044000000, 'cur_assets': 4598000000, 'cur_liab': 1406000000, 'equity': 3339000000, 'cash': 2638000000, 'cash_flow_op': 120000000, 'cash_flow_inv': 1500000000, 'cash_flow_fin': -1634000000 }) def test_aa_20120630(self): item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/4281/000119312512317135/aa-20120630.xml') self.assert_item(item, { 'symbol': 'AA', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2012, 'end_date': '2012-06-30', 'revenues': 5963000000, 'op_income': None, # Missing value 'net_income': -2000000, 'eps_basic': None, # EPS is 0 actually, but got no data in XML 'eps_diluted': None, 'dividend': 0.03, 'assets': 39498000000, 'cur_assets': 7767000000, 'cur_liab': 6151000000, 'equity': 16914000000, 'cash': 1712000000, 'cash_flow_op': 301000000, 'cash_flow_inv': -704000000, 'cash_flow_fin': 196000000 }) def test_aapl_20100626(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312510162840/aapl-20100626.xml') self.assert_item(item, { 'symbol': 'AAPL', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2010, 'end_date': '2010-06-26', 'revenues': 15700000000, 'op_income': 4234000000, 'net_income': 3253000000, 'eps_basic': 3.57, 'eps_diluted': 3.51, 'dividend': 0.0, 'assets': 64725000000, 'cur_assets': 36033000000, 'cur_liab': 15612000000, 'equity': 43111000000, 'cash': 9705000000, 'cash_flow_op': 12912000000, 'cash_flow_inv': -9471000000, 'cash_flow_fin': 1001000000 }) def test_aapl_20110326(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312511104388/aapl-20110326.xml') self.assert_item(item, { 'symbol': 'AAPL', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2011, 'end_date': '2011-03-26', 'revenues': 24667000000, 'net_income': 5987000000, 'op_income': 7874000000, 'eps_basic': 6.49, 'eps_diluted': 6.40, 'dividend': 0.0, 'assets': 94904000000, 'cur_assets': 46997000000, 'cur_liab': 24327000000, 'equity': 61477000000, 'cash': 15978000000, 'cash_flow_op': 15992000000, 'cash_flow_inv': -12251000000, 'cash_flow_fin': 976000000 }) def test_aapl_20120929(self): item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312512444068/aapl-20120929.xml') self.assert_item(item, { 'symbol': 'AAPL', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2012, 'end_date': '2012-09-29', 'revenues': 156508000000, 'op_income': 55241000000, 'net_income': 41733000000, 'eps_basic': 44.64, 'eps_diluted': 44.15, 'dividend': 2.65, 'assets': 176064000000, 'cur_assets': 57653000000, 'cur_liab': 38542000000, 'equity': 118210000000, 'cash': 10746000000, 'cash_flow_op': 50856000000, 'cash_flow_inv': -48227000000, 'cash_flow_fin': -1698000000 }) def test_aes_20100331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/874761/000119312510111183/aes-20100331.xml') self.assert_item(item, { 'symbol': 'AES', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2010, 'end_date': '2010-03-31', 'revenues': 4112000000, 'op_income': None, # Missing value 'net_income': 187000000, 'eps_basic': 0.27, 'eps_diluted': 0.27, 'dividend': 0.0, 'assets': 41882000000, 'cur_assets': 10460000000, 'cur_liab': 6894000000, 'equity': 10536000000, 'cash': 3392000000, 'cash_flow_op': 684000000, 'cash_flow_inv': -595000000, 'cash_flow_fin': 1515000000 }) def test_adbe_20060914(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/796343/000110465906066129/adbe-20060914.xml') # Old document is not supported self.assertFalse(item) def test_adbe_20090227(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/796343/000079634309000021/adbe-20090227.xml') self.assert_item(item, { 'symbol': 'ADBE', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2009, 'end_date': '2009-02-27', 'revenues': 786390000, 'op_income': 207916000, 'net_income': 156435000, 'eps_basic': 0.3, 'eps_diluted': 0.3, 'dividend': 0.0, 'assets': 5887596000, 'cur_assets': 2868991000, 'cur_liab': 636865000, 'equity': 4611160000, 'cash': 1148925000, 'cash_flow_op': 365743000, 'cash_flow_inv': -131562000, 
'cash_flow_fin': 28675000 }) def test_agn_20101231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/850693/000119312511050632/agn-20101231.xml') self.assert_item(item, { 'symbol': 'AGN', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2010, 'end_date': '2010-12-31', 'revenues': 4919400000, 'op_income': 258600000, 'net_income': 600000, 'eps_basic': 0.0, 'eps_diluted': 0.0, 'dividend': 0.2, 'assets': 8308100000, 'cur_assets': 3993700000, 'cur_liab': 1528400000, 'equity': 4781100000, 'cash': 1991200000, 'cash_flow_op': 463900000, 'cash_flow_inv': -977200000, 'cash_flow_fin': 563000000 }) def test_aig_20130630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/5272/000104746913008075/aig-20130630.xml') self.assert_item(item, { 'symbol': 'AIG', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2013, 'end_date': '2013-06-30', 'revenues': 17315000000, 'net_income': 2731000000, 'op_income': None, 'eps_basic': 1.85, 'eps_diluted': 1.84, 'dividend': 0.0, 'assets': 537438000000, 'cur_assets': None, 'cur_liab': None, 'equity': 98155000000, 'cash': 1762000000, 'cash_flow_op': 1674000000, 'cash_flow_inv': 6071000000, 'cash_flow_fin': -7055000000 }) def test_aiv_20110630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/922864/000095012311070591/aiv-20110630.xml') self.assert_item(item, { 'symbol': 'AIV', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2011, 'end_date': '2011-06-30', 'revenues': 281035000, 'op_income': 49791000, 'net_income': -33177000, 'eps_basic': -0.28, 'eps_diluted': -0.28, 'dividend': 0.12, 'assets': 7164972000, 'cur_assets': None, 'cur_liab': None, 'equity': 1241336000, 'cash': 85324000, 'cash_flow_op': 95208000, 'cash_flow_inv': -33538000, 'cash_flow_fin': -87671000 }) def test_all_20130331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/899051/000110465913035969/all-20130331.xml') self.assert_item(item, { 'symbol': 'ALL', 
'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2013, 'end_date': '2013-03-31', 'revenues': 8463000000, 'op_income': None, 'net_income': 709000000, 'eps_basic': 1.49, 'eps_diluted': 1.47, 'dividend': 0.25, 'assets': 126612000000, 'cur_assets': None, 'cur_liab': None, 'equity': 20619000000, 'cash': 820000000, 'cash_flow_op': 740000000, 'cash_flow_inv': 136000000, 'cash_flow_fin': -862000000 }) def test_apa_20120930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/6769/000119312512457830/apa-20120930.xml') self.assert_item(item, { 'symbol': 'APA', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2012, 'end_date': '2012-09-30', 'revenues': 4179000000, 'op_income': None, 'net_income': 161000000, 'eps_basic': 0.41, 'eps_diluted': 0.41, 'dividend': 0.17, 'assets': 58810000000, 'cur_assets': 5044000000, 'cur_liab': 5390000000, 'equity': 30714000000, 'cash': 318000000, 'cash_flow_op': 6422000000, 'cash_flow_inv': -10560000000, 'cash_flow_fin': 4161000000 }) def test_axp_20100930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000095012310100214/axp-20100930.xml') self.assert_item(item, { 'symbol': 'AXP', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2010, 'end_date': '2010-09-30', 'revenues': 6660000000, 'op_income': 1640000000, 'net_income': 1093000000, 'eps_basic': 0.91, 'eps_diluted': 0.9, 'dividend': 0.18, 'assets': 146056000000, 'cur_assets': None, 'cur_liab': None, 'equity': 15920000000, 'cash': 21341000000, 'cash_flow_op': 7227000000, 'cash_flow_inv': 5298000000, 'cash_flow_fin': -7885000000 }) def test_axp_20120630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312512332179/axp-20120630.xml') self.assert_item(item, { 'symbol': 'AXP', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2012, 'end_date': '2012-06-30', 'revenues': 7504000000, 'op_income': None, 'net_income': 1339000000, 'eps_basic': 1.16, 
            'eps_diluted': 1.15,
            'dividend': 0.2,
            'assets': 148128000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 19267000000,
            'cash': 22072000000,
            'cash_flow_op': 6742000000,
            'cash_flow_inv': -1771000000,
            'cash_flow_fin': -7786000000
        })

    def test_axp_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312513070554/axp-20121231.xml')
        self.assert_item(item, {
            'symbol': 'AXP',
            'amend': True,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 29592000000,
            'op_income': None,
            'net_income': 4482000000,
            'eps_basic': 3.91,
            'eps_diluted': 3.89,
            'dividend': 0.8,
            'assets': 153140000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 18886000000,
            'cash': 22250000000,
            'cash_flow_op': 7082000000,
            'cash_flow_inv': -6545000000,
            'cash_flow_fin': -3268000000
        })

    def test_axp_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312513180601/axp-20130331.xml')
        self.assert_item(item, {
            'symbol': 'AXP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 7384000000,
            'op_income': None,
            'net_income': 1280000000,
            'eps_basic': 1.15,
            'eps_diluted': 1.15,
            'dividend': 0.2,
            'assets': 156855000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 19290000000,
            'cash': 27964000000,
            'cash_flow_op': 7547000000,
            'cash_flow_inv': 32000000,
            'cash_flow_fin': -1830000000
        })

    def test_ba_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000119312510024406/ba-20091231.xml')
        self.assert_item(item, {
            'symbol': 'BA',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 68281000000,
            'op_income': 2096000000,
            'net_income': 1312000000,
            'eps_basic': 1.86,
            'eps_diluted': 1.84,
            'dividend': 1.68,
            'assets': 62053000000,
            'cur_assets': 35275000000,
            'cur_liab': 32883000000,
            'equity': 2225000000,
            'cash': 9215000000,
            'cash_flow_op': 5603000000,
            'cash_flow_inv': -3794000000,
            'cash_flow_fin': 4094000000
        })

    def test_ba_20110930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000119312511281613/ba-20110930.xml')
        self.assert_item(item, {
            'symbol': 'BA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 17727000000,
            'op_income': 1714000000,
            'net_income': 1098000000,
            'eps_basic': 1.47,
            'eps_diluted': 1.46,
            'dividend': 0.42,
            'assets': 74163000000,
            'cur_assets': 46347000000,
            'cur_liab': 37593000000,
            'equity': 6061000000,
            'cash': 5954000000,
            'cash_flow_op': 1092000000,
            'cash_flow_inv': 856000000,
            'cash_flow_fin': -1354000000
        })

    def test_ba_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000001292713000023/ba-20130331.xml')
        self.assert_item(item, {
            'symbol': 'BA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 18893000000,
            'op_income': 1528000000,
            'net_income': 1106000000,
            'eps_basic': 1.45,
            'eps_diluted': 1.44,
            'dividend': 0.49,
            'assets': 90447000000,
            'cur_assets': 59490000000,
            'cur_liab': 45666000000,
            'equity': 7560000000,
            'cash': 8335000000,
            'cash_flow_op': 524000000,
            'cash_flow_inv': -814000000,
            'cash_flow_fin': -1705000000
        })

    def test_bbt_20110930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/92230/000119312511304459/bbt-20110930.xml')
        self.assert_item(item, {
            'symbol': 'BBT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 2440000000,
            'op_income': None,
            'net_income': 366000000,
            'eps_basic': 0.52,
            'eps_diluted': 0.52,
            'dividend': 0.16,
            'assets': 167677000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 17541000000,
            'cash': 1312000000,
            'cash_flow_op': 4348000000,
            'cash_flow_inv': -10838000000,
            'cash_flow_fin': 8509000000
        })

    def test_bk_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1390777/000119312510112944/bk-20100331.xml')
        self.assert_item(item, {
            'symbol': 'BK',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 883000000,
            'op_income': None,
            'net_income': 559000000,
            'eps_basic': 0.46,
            'eps_diluted': 0.46,
            'dividend': 0.09,
            'assets': 220551000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 30455000000,
            'cash': 3307000000,
            'cash_flow_op': 1191000000,
            'cash_flow_inv': 512000000,
            'cash_flow_fin': -2126000000
        })

    def test_blk_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1364742/000119312513326890/blk-20130630.xml')
        self.assert_item(item, {
            'symbol': 'BLK',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 2482000000,
            'op_income': 849000000,
            'net_income': 729000000,
            'eps_basic': 4.27,
            'eps_diluted': 4.19,
            'dividend': 1.68,
            'assets': 193745000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 25755000000,
            'cash': 3668000000,
            'cash_flow_op': 1330000000,
            'cash_flow_inv': 10000000,
            'cash_flow_fin': -2193000000
        })

    def test_c_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/831001/000104746909007400/c-20090630.xml')
        self.assert_item(item, {
            'symbol': 'C',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 29969000000,
            'net_income': 4279000000,
            'op_income': None,
            'eps_basic': 0.49,
            'eps_diluted': 0.49,
            'dividend': 0.0,
            'assets': 1848533000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 154168000000,
            'cash': 26915000000,
            'cash_flow_op': -20737000000,
            'cash_flow_inv': 16457000000,
            'cash_flow_fin': 959000000
        })

    def test_cbs_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746910004823/cbs-20100331.xml')
        self.assert_item(item, {
            'symbol': 'CBS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 3530900000,
            'op_income': 153400000,
            'net_income': -26200000,
            'eps_basic': -0.04,
            'eps_diluted': -0.04,
            'dividend': 0.05,
            'assets': 26756100000,
            'cur_assets': 5705200000,
            'cur_liab': 4712300000,
            'equity': 9046100000,
            'cash': 872700000,
            'cash_flow_op': 700700000,
            'cash_flow_inv': -73600000,
            'cash_flow_fin': -471100000
        })

    def test_cbs_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746912001373/cbs-20111231.xml')
        self.assert_item(item, {
            'symbol': 'CBS',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 14245000000,
            'op_income': 2529000000,
            'net_income': 1305000000,
            'eps_basic': 1.97,
            'eps_diluted': 1.92,
            'dividend': 0.35,
            'assets': 26197000000,
            'cur_assets': 5543000000,
            'cur_liab': 3933000000,
            'equity': 9908000000,
            'cash': 660000000,
            'cash_flow_op': 1749000000,
            'cash_flow_inv': -389000000,
            'cash_flow_fin': -1180000000
        })

    def test_cbs_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746913007929/cbs-20130630.xml')
        self.assert_item(item, {
            'symbol': 'CBS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 3699000000,
            'op_income': 838000000,
            'net_income': 472000000,
            'eps_basic': 0.78,
            'eps_diluted': 0.76,
            'dividend': 0.12,
            'assets': 25693000000,
            'cur_assets': 4770000000,
            'cur_liab': 3825000000,
            'equity': 9601000000,
            'cash': 282000000,
            'cash_flow_op': 1051000000,
            'cash_flow_inv': -230000000,
            'cash_flow_fin': -1247000000
        })

    def test_cce_20101001(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1491675/000119312510239952/cce-20101001.xml')
        self.assert_item(item, {
            'symbol': 'CCE',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-10-01',
            'revenues': 1681000000,
            'op_income': 244000000,
            'net_income': 208000000,
            'eps_basic': 0.61,
            'eps_diluted': 0.61,
            'dividend': 0.0,
            'assets': 8457000000,
            'cur_assets': 3145000000,
            'cur_liab': 2154000000,
            'equity': 3277000000,
            'cash': 476000000,
            'cash_flow_op': 620000000,
            'cash_flow_inv': -705000000,
            'cash_flow_fin': 178000000
        })

    def test_cce_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1491675/000119312511033197/cce-20101231.xml')
        self.assert_item(item, {
            'symbol': 'CCE',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 6714000000,
            'op_income': 810000000,
            'net_income': 624000000,
            'eps_basic': 1.84,
            'eps_diluted': 1.83,
            'dividend': 0.12,
            'assets': 8596000000,
            'cur_assets': 2230000000,
            'cur_liab': 1942000000,
            'equity': 3143000000,
            'cash': 321000000,
            'cash_flow_op': 825000000,
            'cash_flow_inv': -739000000,
            'cash_flow_fin': -144000000
        })

    def test_cci_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1051470/000119312510031419/cci-20091231.xml')
        self.assert_item(item, {
            'symbol': 'CCI',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 1685407000,
            'op_income': 433991000,
            'net_income': -135138000,
            'eps_basic': -0.47,
            'eps_diluted': -0.47,
            'dividend': 0.0,
            'assets': 10956606000,
            'cur_assets': 1196033000,
            'cur_liab': 754105000,
            'equity': 2936085000,
            'cash': 766146000,
            'cash_flow_op': 571256000,
            'cash_flow_inv': -172145000,
            'cash_flow_fin': 214396000
        })

    def test_ccmm_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1091667/000109166711000103/ccmm-20110630.xml')
        self.assert_item(item, {
            'symbol': 'CCMM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 1791000000,
            'op_income': 270000000,
            'net_income': -107000000,
            'eps_basic': -0.98,
            'eps_diluted': -0.98,
            'dividend': 0.0,
            'assets': None,
            'cur_assets': None,  # Seems the source filing got the wrong context date on balance sheet
            'cur_liab': None,
            'equity': None,
            'cash': 194000000,
            'cash_flow_op': 907000000,
            'cash_flow_inv': -694000000,
            'cash_flow_fin': -51000000
        })

    def test_chtr_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1091667/000109166712000026/chtr-20111231.xml')
        self.assert_item(item, {
            'symbol': 'CHTR',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 7204000000,
            'op_income': 1041000000,
            'net_income': -369000000,
            'eps_basic': -3.39,
            'eps_diluted': -3.39,
            'dividend': 0.0,
            'assets': 15605000000,
            'cur_assets': 370000000,
            'cur_liab': 1153000000,
            'equity': 409000000,
            'cash': 2000000,
            'cash_flow_op': 1737000000,
            'cash_flow_inv': -1367000000,
            'cash_flow_fin': -373000000
        })

    def test_ci_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701221/000110465913036475/ci-20130331.xml')
        self.assert_item(item, {
            'symbol': 'CI',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 8183000000,
            'op_income': None,
            'net_income': 57000000,
            'eps_basic': 0.2,
            'eps_diluted': 0.2,
            'dividend': 0.04,
            'assets': 54939000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 9660000000,
            'cash': 3306000000,
            'cash_flow_op': -805000000,
            'cash_flow_inv': 962000000,
            'cash_flow_fin': 185000000
        })

    def test_cit_20100630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1171825/000089109210003376/cit-20100331.xml')
        self.assert_item(item, {
            'symbol': 'CIT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-30',
            'revenues': 669500000,
            'op_income': None,
            'net_income': 142100000,
            'eps_basic': 0.71,
            'eps_diluted': 0.71,
            'dividend': 0.0,
            'assets': 54916800000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 8633900000,
            'cash': 1060700000,
            'cash_flow_op': 178100000,
            'cash_flow_inv': 7122800000,
            'cash_flow_fin': -6218700000
        })

    def test_csc_20120928(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/23082/000002308212000073/csc-20120928.xml')
        self.assert_item(item, {
            'symbol': 'CSC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2012-09-28',
            'revenues': 3854000000,
            'op_income': 298000000,
            'net_income': 130000000,
            'eps_basic': 0.84,
            'eps_diluted': 0.83,
            'dividend': 0.2,
            'assets': 11649000000,
            'cur_assets': 5468000000,
            'cur_liab': 4015000000,
            'equity': 2885000000,
            'cash': 1850000000,
            'cash_flow_op': 665000000,
            'cash_flow_inv': -366000000,
            'cash_flow_fin': 469000000
        })

    def test_disca_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1437107/000095012309029613/disca-20090630.xml')
        self.assert_item(item, {
            'symbol': 'DISCA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 881000000,
            'op_income': 486000000,
            'net_income': 183000000,
            'eps_basic': 0.43,
            'eps_diluted': 0.43,
            'dividend': 0.0,
            'assets': 10696000000,
            'cur_assets': 1331000000,
            'cur_liab': 1227000000,
            'equity': 5918000000,
            'cash': 339000000,
            'cash_flow_op': 320000000,
            'cash_flow_inv': 288000000,
            'cash_flow_fin': -371000000
        })

    def test_disca_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1437107/000095012309056946/disca-20090930.xml')
        self.assert_item(item, {
            'symbol': 'DISCA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 854000000,
            'op_income': 215000000,
            'net_income': 95000000,
            'eps_basic': 0.22,
            'eps_diluted': 0.22,
            'dividend': 0.0,
            'assets': 10741000000,
            'cur_assets': 1417000000,
            'cur_liab': 762000000,
            'equity': 6042000000,
            'cash': 401000000,
            'cash_flow_op': 358000000,
            'cash_flow_inv': 279000000,
            'cash_flow_fin': -343000000
        })

    def test_dltr_20130504(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/935703/000093570313000029/dltr-20130504.xml')
        self.assert_item(item, {
            'symbol': 'DLTR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-05-04',
            'revenues': 1865800000,
            'op_income': 216600000,
            'net_income': 133500000,
            'eps_basic': 0.6,
            'eps_diluted': 0.59,
            'dividend': 0.0,
            'assets': 2811800000,
            'cur_assets': 1489800000,
            'cur_liab': 663000000,
            'equity': 1739700000,
            'cash': 383300000,
            'cash_flow_op': 129300000,
            'cash_flow_inv': -88200000,
            'cash_flow_fin': -57400000
        })

    def test_dtv_20110331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1465112/000104746911004655/dtv-20110331.xml')
        self.assert_item(item, {
            'symbol': 'DTV',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2011-03-31',
            'revenues': 6319000000,
            'op_income': 1155000000,
            'net_income': 674000000,
            'eps_basic': 0.85,
            'eps_diluted': 0.85,
            'dividend': 0.0,
            'assets': 20593000000,
            'cur_assets': 6938000000,
            'cur_liab': 4125000000,
            'equity': -902000000,
            'cash': 4295000000,
            'cash_flow_op': 1309000000,
            'cash_flow_inv': -544000000,
            'cash_flow_fin': 2028000000
        })

    def test_ebay_20100630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065088/000119312510164115/ebay-20100630.xml')
        self.assert_item(item, {
            'symbol': 'EBAY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-30',
            'revenues': 2215379000,
            'op_income': 484565000,
            'net_income': 412192000,
            'eps_basic': 0.31,
            'eps_diluted': 0.31,
            'dividend': 0.0,
            'assets': 18747584000,
            'cur_assets': 8675313000,
            'cur_liab': 3564261000,
            'equity': 14169291000,
            'cash': 4037442000,
            'cash_flow_op': 1144641000,
            'cash_flow_inv': -835635000,
            'cash_flow_fin': 50363000
        })

    def test_ebay_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065088/000106508813000058/ebay-20130331.xml')
        self.assert_item(item, {
            'symbol': 'EBAY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 3748000000,
            'op_income': 800000000,
            'net_income': 677000000,
            'eps_basic': 0.52,
            'eps_diluted': 0.51,
            'dividend': 0.0,
            'assets': 38000000000,
            'cur_assets': 22336000000,
            'cur_liab': 11720000000,
            'equity': 21112000000,
            'cash': 6530000000,
            'cash_flow_op': 937000000,
            'cash_flow_inv': -719000000,
            'cash_flow_fin': -411000000
        })

    def test_ecl_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/31462/000110465912072308/ecl-20120930.xml')
        self.assert_item(item, {
            'symbol': 'ECL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 3023300000,
            'op_income': 401200000,
            'net_income': 238000000,
            'eps_basic': 0.81,
            'eps_diluted': 0.8,
            'dividend': 0.2,
            'assets': 16722800000,
            'cur_assets': 4072900000,
            'cur_liab': 2818700000,
            'equity': 6026200000,
            'cash': 324000000,
            'cash_flow_op': 720800000,
            'cash_flow_inv': -414900000,
            'cash_flow_fin': -1815800000
        })

    def test_ed_20130930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/23632/000119312513425393/ed-20130930.xml')
        self.assert_item(item, {
            'symbol': 'ED',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2013,
            'end_date': '2013-09-30',
            'revenues': 3484000000,
            'op_income': 855000000,
            'net_income': 464000000,
            'eps_basic': 1.58,
            'eps_diluted': 1.58,
            'dividend': 0.615,
            'assets': 41964000000,
            'cur_assets': 3704000000,
            'cur_liab': 4373000000,
            'equity': 12166000000,
            'cash': 74000000,
            'cash_flow_op': 1238000000,
            'cash_flow_inv': -1895000000,
            'cash_flow_fin': 337000000
        })

    def test_eqt_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/33213/000110465911009751/eqt-20101231.xml')
        self.assert_item(item, {
            'symbol': 'EQT',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 1322708000,
            'op_income': 470479000,
            'net_income': 227700000,
            'eps_basic': 1.58,
            'eps_diluted': 1.57,
            'dividend': 0.88,
            'assets': 7098438000,
            'cur_assets': 827940000,
            'cur_liab': 596984000,
            'equity': 3078696000,
            'cash': 0.0,
            'cash_flow_op': 789740000,
            'cash_flow_inv': -1239429000,
            'cash_flow_fin': 449689000
        })

    def test_etr_20121231(self):
        # Large file test (121 MB)
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/7323/000006598413000050/etr-20121231.xml')
        self.assert_item(item, {
            'symbol': 'ETR',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 10302079000,
            'op_income': 1301181000,
            'net_income': 846673000,
            'eps_basic': 4.77,
            'eps_diluted': 4.76,
            'dividend': 3.32,
            'assets': 43202502000,
            'cur_assets': 3683126000,
            'cur_liab': 4106321000,
            'equity': 9291089000,
            'cash': 532569000,
            'cash_flow_op': 2940285000,
            'cash_flow_inv': -3639797000,
            'cash_flow_fin': 538151000
        })

    def test_exc_20100930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/22606/000119312510234590/exc-20100930.xml')
        self.assert_item(item, {
            'symbol': 'EXC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-09-30',
            'revenues': 5291000000,
            'op_income': 1366000000,
            'net_income': 845000000,
            'eps_basic': 1.28,
            'eps_diluted': 1.27,
            'dividend': 0.53,
            'assets': 50948000000,
            'cur_assets': 6760000000,
            'cur_liab': 3967000000,
            'equity': 13955000000,
            'cash': 2735000000,
            'cash_flow_op': 4112000000,
            'cash_flow_inv': -2037000000,
            'cash_flow_fin': -1350000000
        })

    def test_fast_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/815556/000119312509154691/fast-20090630.xml')
        self.assert_item(item, {
            'symbol': 'FAST',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 474894000,
            'op_income': 69938000,
            'net_income': 43538000,
            'eps_basic': 0.29,
            'eps_diluted': 0.29,
            'dividend': 0.0,
            'assets': 1328684000,
            'cur_assets': 988997000,
            'cur_liab': 127950000,
            'equity': 1186845000,
            'cash': 173667000,
            'cash_flow_op': 167552000,
            'cash_flow_inv': -28942000,
            'cash_flow_fin': -51986000
        })

    def test_fast_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/815556/000119312509212481/fast-20090930.xml')
        self.assert_item(item, {
            'symbol': 'FAST',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 489339000,
            'op_income': 76410000,
            'net_income': 47589000,
            'eps_basic': 0.32,
            'eps_diluted': 0.32,
            'dividend': 0.0,
            'assets': 1337764000,
            'cur_assets': 998090000,
            'cur_liab': 138744000,
            'equity': 1185140000,
            'cash': 193744000,
            'cash_flow_op': 253184000,
            'cash_flow_inv': -41031000,
            'cash_flow_fin': -106943000
        })

    def test_fb_20120630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1326801/000119312512325997/fb-20120630.xml')
        self.assert_item(item, {
            'symbol': 'FB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2012,
            'end_date': '2012-06-30',
            'revenues': 1184000000,
            'op_income': -743000000,
            'net_income': -157000000,
            'eps_basic': -0.08,
            'eps_diluted': -0.08,
            'dividend': 0.0,
            'assets': 14928000000,
            'cur_assets': 11967000000,
            'cur_liab': 1034000000,
            'equity': 13309000000,
            'cash': 2098000000,
            'cash_flow_op': 683000000,
            'cash_flow_inv': -7170000000,
            'cash_flow_fin': 7090000000
        })

    def test_fb_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1326801/000132680113000003/fb-20121231.xml')
        self.assert_item(item, {
            'symbol': 'FB',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 5089000000,
            'op_income': 538000000,
            'net_income': 32000000,
            'eps_basic': 0.02,
            'eps_diluted': 0.01,
            'dividend': 0.0,
            'assets': 15103000000,
            'cur_assets': 11267000000,
            'cur_liab': 1052000000,
            'equity': 11755000000,
            'cash': 2384000000,
            'cash_flow_op': 1612000000,
            'cash_flow_inv': -7024000000,
            'cash_flow_fin': 6283000000
        })

    def test_fll_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/891482/000118811213000562/fll-20121231.xml')
        self.assert_item(item, {
            'symbol': 'FLL',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 128760000,
            'op_income': 49638000,
            'net_income': 27834000,
            'eps_basic': 1.49,
            'eps_diluted': None,
            'dividend': 0.0,
            'assets': 162725000,
            'cur_assets': 32339000,
            'cur_liab': 15332000,
            'equity': 81133000,
            'cash': 20603000,
            'cash_flow_op': -4301000,
            'cash_flow_inv': 45271000,
            'cash_flow_fin': -35074000
        })

    def test_flr_20080930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1124198/000110465908068715/flr-20080930.xml')
        self.assert_item(item, {
            'symbol': 'FLR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2008,
            'end_date': '2008-09-30',
            'revenues': 5673818000,
            'op_income': None,
            'net_income': 183099000,
            'eps_basic': 1.03,
            'eps_diluted': 1.01,
            'dividend': 0.125,
            'assets': 6605120000,
            'cur_assets': 4808393000,
            'cur_liab': 3228638000,
            'equity': 2741002000,
            'cash': 1514943000,
            'cash_flow_op': 855198000,
            'cash_flow_inv': -295445000,
            'cash_flow_fin': -202011000
        })

    def test_fmc_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/37785/000119312509165435/fmc-20090630.xml')
        self.assert_item(item, {
            'symbol': 'FMC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 700300000,
            'op_income': 97200000,
            'net_income': 69300000,
            'eps_basic': 0.95,
            'eps_diluted': 0.94,
            'dividend': 0.0,
            'assets': 3028500000,
            'cur_assets': 1423700000,
            'cur_liab': 717200000,
            'equity': 1101200000,
            'cash': 67000000,
            'cash_flow_op': 173900000,
            'cash_flow_inv': -106500000,
            'cash_flow_fin': -33100000
        })

    def test_fpl_20100331(self):
        # FPL was later changed to NEE
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/37634/000075330810000051/fpl-20100331.xml')
        self.assert_item(item, {
            'symbol': 'FPL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 3622000000,
            'op_income': 939000000,
            'net_income': 556000000,
            'eps_basic': 1.36,
            'eps_diluted': 1.36,
            'dividend': 0.5,
            'assets': 50942000000,
            'cur_assets': 5557000000,
            'cur_liab': 7782000000,
            'equity': 13336000000,
            'cash': 1215000000,
            'cash_flow_op': 896000000,
            'cash_flow_inv': -1361000000,
            'cash_flow_fin': 1442000000
        })

    def test_ftr_20110930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/20520/000002052011000066/ftr-20110930.xml')
        self.assert_item(item, {
            'symbol': 'FTR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 1290939000,
            'op_income': 180291000,
            'net_income': 19481000,
            'eps_basic': 0.02,
            'eps_diluted': 0.02,
            'dividend': 0.0,
            'assets': 17493767000,
            'cur_assets': 969746000,
            'cur_liab': 1168142000,
            'equity': 4776588000,
            'cash': 205817000,
            'cash_flow_op': 1272654000,
            'cash_flow_inv': -676974000,
            'cash_flow_fin': -641126000
        })

    def test_ge_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/40545/000004054513000036/ge-20121231.xml')
        self.assert_item(item, {
            'symbol': 'GE',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 147359000000,
            'op_income': 22887000000,
            'net_income': 13641000000,
            'eps_basic': 1.29,
            'eps_diluted': 1.29,
            'dividend': 0.7,
            'assets': 685328000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 128470000000,
            'cash': 77356000000,
            'cash_flow_op': 31331000000,
            'cash_flow_inv': 11302000000,
            'cash_flow_fin': -51074000000
        })

    def test_gis_20121125(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/40704/000119312512508388/gis-20121125.xml')
        self.assert_item(item, {
            'symbol': 'GIS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2012-11-25',
            'revenues': 4881800000,
            'op_income': 829000000,
            'net_income': 541600000,
            'eps_basic': 0.84,
            'eps_diluted': 0.82,
            'dividend': 0.33,
            'assets': 22952900000,
            'cur_assets': 4565500000,
            'cur_liab': 5736400000,
            'equity': 7440000000,
            'cash': 734900000,
            'cash_flow_op': 1317100000,
            'cash_flow_inv': -1103200000,
            'cash_flow_fin': 33700000
        })

    def test_gmcr_20110625(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/909954/000119312511214253/gmcr-20110630.xml')
        self.assert_item(item, {
            'symbol': 'GMCR',
            'amend': False,  # it's actually amended, but not marked in XML
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-06-25',
            'revenues': 717210000,
            'op_income': 119310000,
            'net_income': 56348000,
            'eps_basic': 0.38,
            'eps_diluted': 0.37,
            'dividend': 0.0,
            'assets': 2874422000,
            'cur_assets': 844998000,
            'cur_liab': 395706000,
            'equity': 1816646000,
            'cash': 76138000,
            'cash_flow_op': 174708000,
            'cash_flow_inv': -1082070000,
            'cash_flow_fin': 986183000
        })

    def test_goog_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312509222384/goog-20090930.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 5944851000,
            'op_income': 2073718000,
            'net_income': 1638975000,
            'eps_basic': 5.18,
            'eps_diluted': 5.13,
            'dividend': 0.0,
            'assets': 37702845000,
            'cur_assets': 26353544000,
            'cur_liab': 2321774000,
            'equity': 33721753000,
            'cash': 12087115000,
            'cash_flow_op': 6584667000,
            'cash_flow_inv': -3245963000,
            'cash_flow_fin': 74851000
        })

    def test_goog_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312512440217/goog-20120930.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 14101000000,
            'op_income': 2736000000,
            'net_income': 2176000000,
            'eps_basic': 6.64,
            'eps_diluted': 6.53,
            'dividend': 0.0,
            'assets': 89730000000,
            'cur_assets': 56821000000,
            'cur_liab': 14434000000,
            'equity': 68028000000,
            'cash': 16260000000,
            'cash_flow_op': 11950000000,
            'cash_flow_inv': -7542000000,
            'cash_flow_fin': 1921000000
        })

    def test_goog_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312513028362/goog-20121231.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 50175000000,
            'op_income': 12760000000,
            'net_income': 10737000000,
            'eps_basic': 32.81,
            'eps_diluted': 32.31,
            'dividend': 0.0,
            'assets': 93798000000,
            'cur_assets': 60454000000,
            'cur_liab': 14337000000,
            'equity': 71715000000,
            'cash': 14778000000,
            'cash_flow_op': 16619000000,
            'cash_flow_inv': -13056000000,
            'cash_flow_fin': 1229000000
        })

    def test_goog_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000128877613000055/goog-20130630.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 14105000000,
            'op_income': 3123000000,
            'net_income': 3228000000,
            'eps_basic': 9.71,
            'eps_diluted': 9.54,
            'dividend': 0.0,
            'assets': 101182000000,
            'cur_assets': 66861000000,
            'cur_liab': 15329000000,
            'equity': 78852000000,
            'cash': 16164000000,
            'cash_flow_op': 8338000000,
            'cash_flow_inv': -6244000000,
            'cash_flow_fin': -622000000
        })

    def test_goog_20140630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000128877614000065/goog-20140630.xml')
        self.assert_item(item, {
            'symbol': 'GOOG/GOOGL',  # Two symbols, see issue #6
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2014,
            'end_date': '2014-06-30',
            'revenues': 15955000000,
            'op_income': 4258000000,
            'net_income': 3422000000,
            'eps_basic': 5.07,
            'eps_diluted': 4.99,
            'dividend': 0.0,
            'assets': 121608000000,
            'cur_assets': 77905000000,
            'cur_liab': 17097000000,
            'equity': 95749000000,
            'cash': 19620000000,
            'cash_flow_op': 10018000000,
            'cash_flow_inv': -8487000000,
            'cash_flow_fin': -640000000
        })

    def test_gs_20090626(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/886982/000095012309029919/gs-20090626.xml')
        self.assert_item(item, {
            'symbol': 'GS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-26',
            'revenues': 13761000000,
            'op_income': None,
            'net_income': 2718000000,
            'eps_basic': 5.27,
            'eps_diluted': 4.93,
            'dividend': 0.35,
            'assets': 889544000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 62813000000,
            'cash': 22177000000,
            'cash_flow_op': 16020000000,
            'cash_flow_inv': -772000000,
            'cash_flow_fin': -6876000000
        })

    def test_hon_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/773840/000093041312002323/hon-20120331.xml')
        self.assert_item(item, {
            'symbol': 'HON',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 9307000000,
            'op_income': None,
            'net_income': 823000000,
            'eps_basic': 1.06,
            'eps_diluted': 1.04,
            'dividend': 0.3725,
            'assets': 40370000000,
            'cur_assets': 16553000000,
            'cur_liab': 12666000000,
            'equity': 11842000000,
            'cash': 3988000000,
            'cash_flow_op': 196000000,
            'cash_flow_inv': -122000000,
            'cash_flow_fin': 169000000
        })

    def test_hrb_20090731(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000095012309041361/hrb-20090731.xml')
        self.assert_item(item, {
            'symbol': 'HRB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2009-07-31',
            'revenues': 275505000,
            'op_income': -214162000,
            'net_income': -133634000,
            'eps_basic': -0.4,
            'eps_diluted': -0.4,
            'dividend': 0.15,
            'assets': 4545762000,
            'cur_assets': 1828146000,
            'cur_liab': 1823126000,
            'equity': 1190714000,
            'cash': 1006303000,
            'cash_flow_op': -454577000,
            'cash_flow_inv': 15360000,
            'cash_flow_fin': -216206000
        })

    def test_hrb_20091031(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000095012309069608/hrb-20091031.xml')
        self.assert_item(item, {
            'symbol': 'HRB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2009-10-31',
            'revenues': 326081000,
            'op_income': -214553000,
            'net_income': -128587000,
            'eps_basic': -0.38,
            'eps_diluted': -0.38,
            'dividend': 0.15,
            'assets': 4967359000,
            'cur_assets': 2300986000,
            'cur_liab': 2382867000,
            'equity': 1071097000,
            'cash': 1432243000,
            'cash_flow_op': -786152000,
            'cash_flow_inv': 43280000,
            'cash_flow_fin': 511231000
        })

    def test_hrb_20130731(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000157484213000013/hrb-20130731.xml')
        self.assert_item(item, {
            'symbol': 'HRB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2014,
            'end_date': '2013-07-31',
            'revenues': 127195000,
            'op_income': -179555000,
            'net_income': -115187000,
            'eps_basic': -0.42,
            'eps_diluted': -0.42,
            'dividend': 0.20,
            'assets': 3762888000,
            'cur_assets': 1704932000,
            'cur_liab': 1450484000,
            'equity': 1105315000,
            'cash': 1163876000,
            'cash_flow_op': -318742000,
            'cash_flow_inv': -29090000,
            'cash_flow_fin': -229255000
        })

    def test_ihc_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701869/000070186912000029/ihc-20120331.xml')
        self.assert_item(item, {
            'symbol': 'IHC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 102156000,
            'op_income': 6416000,
            'net_income': 3922000,
            'eps_basic': 0.22,
            'eps_diluted': 0.22,
            'dividend': 0.0,
            'assets': 1364411000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 280250000,
            'cash': 9286000,
            'cash_flow_op': -138843000,
            'cash_flow_inv': 130710000,
            'cash_flow_fin': -808000
        })

    def test_intc_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/50863/000119312512075534/intc-20111231.xml')
        self.assert_item(item, {
            'symbol': 'INTC',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 53999000000,
            'op_income': 17477000000,
            'net_income': 12942000000,
            'eps_basic': 2.46,
            'eps_diluted': 2.39,
            'dividend': 0.7824,
            'assets': 71119000000,
            'cur_assets': 25872000000,
            'cur_liab': 12028000000,
            'equity': 45911000000,
            'cash': 5065000000,
            'cash_flow_op': 20963000000,
            'cash_flow_inv': -10301000000,
            'cash_flow_fin': -11100000000
        })

    def test_intu_20101031(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/896878/000095012310111135/intu-20101031.xml')
        self.assert_item(item, {
            'symbol': 'INTU',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2010-10-31',
            'revenues': 532000000,
            'op_income': -104000000,
            'net_income': -70000000,
            'eps_basic': -0.22,
            'eps_diluted': -0.22,
            'dividend': 0.0,
            'assets': 4943000000,
            'cur_assets': 2010000000,
            'cur_liab': 1136000000,
            'equity': 2615000000,
            'cash': 112000000,
            'cash_flow_op': -211000000,
            'cash_flow_inv': 285000000,
            'cash_flow_fin': -177000000
        })

    def test_jnj_20120101(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000119312512075565/jnj-20120101.xml')
        self.assert_item(item, {
            'symbol': 'JNJ',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2012-01-01',
            'revenues': 65030000000,
            'op_income': 13765000000,
            'net_income': 9672000000,
            'eps_basic': 3.54,
            'eps_diluted': 3.49,
            'dividend': 2.25,
            'assets': 113644000000,
            'cur_assets': 54316000000,
            'cur_liab': 22811000000,
            'equity': 57080000000,
            'cash': 24542000000,
            'cash_flow_op': 14298000000,
            'cash_flow_inv': -4612000000,
            'cash_flow_fin': -4452000000
        })

    def test_jnj_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000020040612000140/jnj-20120930.xml')
        self.assert_item(item, {
            'symbol': 'JNJ',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 17052000000,
            'op_income': 3825000000,
            'net_income': 2968000000,
            'eps_basic': 1.08,
            'eps_diluted': 1.05,
            'dividend': 0.61,
            'assets': 118951000000,
            'cur_assets': 44791000000,
            'cur_liab': 23935000000,
            'equity': 63761000000,
            'cash': 15486000000,
            'cash_flow_op': 12020000000,
            'cash_flow_inv': -2007000000,
            'cash_flow_fin': -19091000000
        })

    def test_jnj_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000020040613000091/jnj-20130630.xml')
        self.assert_item(item, {
            'symbol': 'JNJ',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 17877000000,
            'op_income': 5020000000,
            'net_income': 3833000000,
            'eps_basic': 1.36,
            'eps_diluted': 1.33,
            'dividend': 0.66,
            'assets': 124325000000,
            'cur_assets': 51273000000,
            'cur_liab': 23767000000,
            'equity': 69665000000,
            'cash': 17307000000,
            'cash_flow_op': 7328000000,
            'cash_flow_inv': -1972000000,
            'cash_flow_fin': -2754000000
        })

    def test_jpm_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000095012309032832/jpm-20090630.xml')
        self.assert_item(item, {
            'symbol': 'JPM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 25623000000,
            'op_income': None,
            'net_income': 1072000000,
            'eps_basic': 0.28,
            'eps_diluted': 0.28,
            'dividend': 0.05,
            'assets': 2026642000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 154766000000,
            'cash': 25133000000,
            'cash_flow_op': 103259000000,
            'cash_flow_inv': 34430000000,
            'cash_flow_fin': -139413000000
        })

    def test_jpm_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000001961712000163/jpm-20111231.xml')
        self.assert_item(item, {
            'symbol': 'JPM',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 97234000000,
            'op_income': None,
            'net_income': 17568000000,
            'eps_basic': 4.50,
            'eps_diluted': 4.48,
            'dividend': 1.0,
            'assets': 2265792000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 183573000000,
            'cash': 59602000000,
            'cash_flow_op': 95932000000,
            'cash_flow_inv': -170752000000,
            'cash_flow_fin': 107706000000
        })

    def test_jpm_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000001961713000300/jpm-20130331.xml')
        self.assert_item(item, {
            'symbol': 'JPM',
            'amend': False,
            'doc_type':
'10-Q', 'period_focus': 'Q1', 'fiscal_year': 2013, 'end_date': '2013-03-31', 'revenues': 25122000000, 'op_income': None, 'net_income': 6131000000, 'eps_basic': 1.61, 'eps_diluted': 1.59, 'dividend': 0.30, 'assets': 2389349000000, 'cur_assets': None, 'cur_liab': None, 'equity': 207086000000, 'cash': 45524000000, 'cash_flow_op': 19964000000, 'cash_flow_inv': -55455000000, 'cash_flow_fin': 28180000000 }) def test_ko_20100402(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000104746910004416/ko-20100402.xml') self.assert_item(item, { 'symbol': 'KO', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2010, 'end_date': '2010-04-02', 'revenues': 7525000000, 'op_income': 2183000000, 'net_income': 1614000000, 'eps_basic': 0.70, 'eps_diluted': 0.69, 'dividend': 0.44, 'assets': 47403000000, 'cur_assets': 17208000000, 'cur_liab': 13583000000, 'equity': 25157000000, 'cash': 5684000000, 'cash_flow_op': 1326000000, 'cash_flow_inv': -1368000000, 'cash_flow_fin': -1043000000 }) def test_ko_20101231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000104746911001506/ko-20101231.xml') self.assert_item(item, { 'symbol': 'KO', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2010, 'end_date': '2010-12-31', 'revenues': 35119000000, 'op_income': 8449000000, 'net_income': 11809000000, 'eps_basic': 5.12, 'eps_diluted': 5.06, 'dividend': 1.76, 'assets': 72921000000, 'cur_assets': 21579000000, 'cur_liab': 18508000000, 'equity': 31317000000, 'cash': 8517000000, 'cash_flow_op': 9532000000, 'cash_flow_inv': -4405000000, 'cash_flow_fin': -3465000000 }) def test_ko_20120928(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000002134412000051/ko-20120928.xml') self.assert_item(item, { 'symbol': 'KO', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2012, 'end_date': '2012-09-28', 'revenues': 12340000000, 'op_income': 2793000000, 'net_income': 2311000000, 'eps_basic': 
0.51, 'eps_diluted': 0.50, 'dividend': 0.255, 'assets': 86654000000, 'cur_assets': 29712000000, 'cur_liab': 27008000000, 'equity': 33590000000, 'cash': 9615000000, 'cash_flow_op': 7840000000, 'cash_flow_inv': -10399000000, 'cash_flow_fin': -399000000 }) def test_krft_20120930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1545158/000119312512495570/krft-20120930.xml') self.assert_item(item, { 'symbol': 'KRFT', 'amend': True, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2012, 'end_date': '2012-09-30', 'revenues': 4606000000, 'op_income': 762000000, 'net_income': 470000000, 'eps_basic': 0.79, 'eps_diluted': 0.79, 'dividend': 0.0, 'assets': 22284000000, 'cur_assets': 3905000000, 'cur_liab': 2569000000, 'equity': 7458000000, 'cash': 244000000, 'cash_flow_op': 2067000000, 'cash_flow_inv': -279000000, 'cash_flow_fin': -1548000000 }) def test_l_20100331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/60086/000119312510105707/l-20100331.xml') self.assert_item(item, { 'symbol': 'L', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2010, 'end_date': '2010-03-31', 'revenues': 3713000000, 'op_income': None, 'net_income': 420000000, 'eps_basic': 0.99, 'eps_diluted': 0.99, 'dividend': 0.0625, 'assets': 75855000000, 'cur_assets': None, 'cur_liab': None, 'equity': 21993000000, 'cash': 135000000, 'cash_flow_op': 294000000, 'cash_flow_inv': -411000000, 'cash_flow_fin': 64000000 }) def test_l_20100930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/60086/000119312510245478/l-20100930.xml') self.assert_item(item, { 'symbol': 'L', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2010, 'end_date': '2010-09-30', 'revenues': 3701000000, 'op_income': None, 'net_income': 36000000, 'eps_basic': 0.09, 'eps_diluted': 0.09, 'dividend': 0.0625, 'assets': 76821000000, 'cur_assets': None, 'cur_liab': None, 'equity': 23499000000, 'cash': 132000000, 'cash_flow_op': 895000000, 'cash_flow_inv': 
-426000000, 'cash_flow_fin': -527000000 }) def test_lbtya_20100331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1316631/000119312510111069/lbtya-20100331.xml') self.assert_item(item, { 'symbol': 'LBTYA', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2010, 'end_date': '2010-03-31', 'revenues': 2178900000, 'op_income': 303600000, 'net_income': 736600000, 'eps_basic': 2.75, 'eps_diluted': 2.75, 'dividend': 0.0, 'assets': 33083500000, 'cur_assets': 5524900000, 'cur_liab': 4107000000, 'equity': 4066000000, 'cash': 4184200000, 'cash_flow_op': 803300000, 'cash_flow_inv': 45400000, 'cash_flow_fin': 170700000 }) def test_lcapa_20110930(self): # This symbol was changed to STRZA item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793411000006/lcapa-20110930.xml') self.assert_item(item, { 'symbol': 'LCAPA', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2011, 'end_date': '2011-09-30', 'revenues': 540000000, 'op_income': 111000000, 'net_income': -42000000, 'eps_basic': -0.07, 'eps_diluted': -0.12, 'dividend': 0.0, 'assets': 8915000000, 'cur_assets': 3767000000, 'cur_liab': 3012000000, 'equity': 5078000000, 'cash': 1937000000, 'cash_flow_op': 316000000, 'cash_flow_inv': -205000000, 'cash_flow_fin': -264000000 }) def test_linta_20120331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1355096/000135509612000008/linta-20120331.xml') self.assert_item(item, { 'symbol': 'LINTA', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2012, 'end_date': '2012-03-31', 'revenues': 2314000000, 'op_income': 258000000, 'net_income': 91000000, 'eps_basic': 0.16, 'eps_diluted': 0.16, 'dividend': 0.0, 'assets': 17144000000, 'cur_assets': 2764000000, 'cur_liab': 3486000000, 'equity': 6505000000, 'cash': 794000000, 'cash_flow_op': 330000000, 'cash_flow_inv': -91000000, 'cash_flow_fin': -284000000 }) def test_lll_20100625(self): item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/1039101/000095012310071159/lll-20100625.xml') self.assert_item(item, { 'symbol': 'LLL', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2010, 'end_date': '2010-06-25', 'revenues': -3966000000, # a doc's error, should be 3966M 'op_income': -442000000, # a doc's error, should be 442M 'net_income': -228000000, # a doc's error, should be 227M 'eps_basic': 1.97, 'eps_diluted': 1.95, 'dividend': 0.4, 'assets': 15689000000, 'cur_assets': 5494000000, 'cur_liab': 3730000000, 'equity': 6926000000, 'cash': 1023000000, 'cash_flow_op': 589000000, 'cash_flow_inv': -688000000, 'cash_flow_fin': 132000000 }) def test_lltc_20110102(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/791907/000079190711000016/lltc-20110102.xml') self.assert_item(item, { 'symbol': 'LLTC', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2011, 'end_date': '2011-01-02', 'revenues': 383621000, 'op_income': 201059000, 'net_income': 143743000, 'eps_basic': 0.62, 'eps_diluted': 0.62, 'dividend': 0.23, 'assets': 1446186000, 'cur_assets': 1069958000, 'cur_liab': 199210000, 'equity': 278793000, 'cash': 203308000, 'cash_flow_op': 342333000, 'cash_flow_inv': 39771000, 'cash_flow_fin': -474650000 }) def test_lltc_20111002(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/791907/000079190711000080/lltc-20111007.xml') self.assert_item(item, { 'symbol': 'LLTC', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2012, 'end_date': '2011-10-02', 'revenues': 329920000, 'op_income': 157566000, 'net_income': 108401000, 'eps_basic': 0.47, 'eps_diluted': 0.47, 'dividend': 0.24, 'assets': 1659341000, 'cur_assets': 1268413000, 'cur_liab': 169006000, 'equity': 543199000, 'cash': 163414000, 'cash_flow_op': 149860000, 'cash_flow_inv': -171884000, 'cash_flow_fin': -85085000 }) def test_lly_20100930(self): item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/59478/000095012310097867/lly-20100930.xml') self.assert_item(item, { 'symbol': 'LLY', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2010, 'end_date': '2010-09-30', 'revenues': 5654800000, 'op_income': None, 'net_income': 1302900000, 'eps_basic': 1.18, 'eps_diluted': 1.18, 'dividend': 0.49, 'assets': 29904300000, 'cur_assets': 14184300000, 'cur_liab': 6097400000, 'equity': 12405500000, 'cash': 5908800000, 'cash_flow_op': 4628700000, 'cash_flow_inv': -1595300000, 'cash_flow_fin': -1472300000 }) def test_lmca_20120331(self): # This symbol was changed to STRZA item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793412000012/lmca-20120331.xml') self.assert_item(item, { 'symbol': 'LMCA', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2012, 'end_date': '2012-03-31', 'revenues': 440000000, 'op_income': 89000000, 'net_income': 137000000, 'eps_basic': 1.13, 'eps_diluted': 1.10, 'dividend': 0.0, 'assets': 7122000000, 'cur_assets': 3380000000, 'cur_liab': 547000000, 'equity': 5321000000, 'cash': 1915000000, 'cash_flow_op': 94000000, 'cash_flow_inv': 581000000, 'cash_flow_fin': -830000000 }) def test_lnc_20120930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/59558/000005955812000143/lnc-20120930.xml') self.assert_item(item, { 'symbol': 'LNC', 'amend': False, # mistake in doc, should be True 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2012, 'end_date': '2012-09-30', 'revenues': None, # missing in doc, should be 2954000000 'op_income': None, 'net_income': 402000000, 'eps_basic': 1.45, 'eps_diluted': 1.41, 'dividend': 0.0, 'assets': 215458000000, 'cur_assets': None, 'cur_liab': None, 'equity': 15237000000, 'cash': 4373000000, 'cash_flow_op': 666000000, 'cash_flow_inv': -2067000000, 'cash_flow_fin': 1264000000 }) def test_ltd_20111029(self): # This symbol was changed to LB item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/701985/000144530511003514/ltd-20111029.xml') self.assert_item(item, { 'symbol': 'LTD', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2011, 'end_date': '2011-10-29', 'revenues': 2174000000, 'op_income': 186000000, 'net_income': 94000000, 'eps_basic': 0.32, 'eps_diluted': 0.31, 'dividend': 0.2, 'assets': 6517000000, 'cur_assets': 2616000000, 'cur_liab': 1504000000, 'equity': 521000000, 'cash': 498000000, 'cash_flow_op': 94000000, 'cash_flow_inv': -239000000, 'cash_flow_fin': -489000000 }) def test_ltd_20130803(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/701985/000070198513000032/ltd-20130803.xml') self.assert_item(item, { 'symbol': 'LTD', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2013, 'end_date': '2013-08-03', 'revenues': 2516000000, 'op_income': 358000000, 'net_income': 178000000, 'eps_basic': 0.62, 'eps_diluted': 0.61, 'dividend': 0.3, 'assets': 6072000000, 'cur_assets': 2098000000, 'cur_liab': 1485000000, 'equity': -861000000, 'cash': 551000000, 'cash_flow_op': 354000000, 'cash_flow_inv': -381000000, 'cash_flow_fin': -194000000 }) def test_luv_20110630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/92380/000009238011000070/luv-20110630.xml') self.assert_item(item, { 'symbol': 'LUV', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2011, 'end_date': '2011-06-30', 'revenues': 4136000000, 'op_income': 207000000, 'net_income': 161000000, 'eps_basic': 0.21, 'eps_diluted': 0.21, 'dividend': 0.0045, 'assets': 18945000000, 'cur_assets': 5421000000, 'cur_liab': 5318000000, 'equity': 7202000000, 'cash': 1595000000, 'cash_flow_op': 237000000, 'cash_flow_inv': -589000000, 'cash_flow_fin': -92000000 }) def test_mchp_20120630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/827054/000082705412000230/mchp-20120630.xml') self.assert_item(item, { 'symbol': 'MCHP', 'amend': False, 'doc_type': '10-Q', 
'period_focus': 'Q1', 'fiscal_year': 2013, 'end_date': '2012-06-30', 'revenues': 352134000, 'op_income': 96333000, 'net_income': 78710000, 'eps_basic': 0.41, 'eps_diluted': 0.39, 'dividend': 0.35, 'assets': 3144840000, 'cur_assets': 2229298000, 'cur_liab': 249989000, 'equity': 2017990000, 'cash': 779848000, 'cash_flow_op': 128971000, 'cash_flow_inv': 77890000, 'cash_flow_fin': -62768000 }) def test_mdlz_20130930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1103982/000119312513431957/mdlz-20130930.xml') self.assert_item(item, { 'symbol': 'MDLZ', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2013, 'end_date': '2013-09-30', 'revenues': 8472000000, 'op_income': 1262000000, 'net_income': 1024000000, 'eps_basic': 0.58, 'eps_diluted': 0.57, 'dividend': 0.14, 'assets': 74859000000, 'cur_assets': 15463000000, 'cur_liab': 15269000000, 'equity': 32492000000, 'cash': 3692000000, 'cash_flow_op': 1198000000, 'cash_flow_inv': -1015000000, 'cash_flow_fin': -881000000 }) def test_mmm_20091231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465910007295/mmm-20091231.xml') self.assert_item(item, { 'symbol': 'MMM', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2009, 'end_date': '2009-12-31', 'revenues': 23123000000, 'op_income': 4814000000, 'net_income': 3193000000, 'eps_basic': 4.56, 'eps_diluted': 4.52, 'dividend': 2.04, 'assets': 27250000000, 'cur_assets': 10795000000, 'cur_liab': 4897000000, 'equity': 13302000000, 'cash': 3040000000, 'cash_flow_op': 4941000000, 'cash_flow_inv': -1732000000, 'cash_flow_fin': -2014000000 }) def test_mmm_20120331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465912032441/mmm-20120331.xml') self.assert_item(item, { 'symbol': 'MMM', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2012, 'end_date': '2012-03-31', 'revenues': 7486000000, 'op_income': 1634000000, 'net_income': 1125000000, 'eps_basic': 
1.61, 'eps_diluted': 1.59, 'dividend': 0.59, 'assets': 32015000000, 'cur_assets': 12853000000, 'cur_liab': 5408000000, 'equity': 16619000000, 'cash': 2332000000, 'cash_flow_op': 828000000, 'cash_flow_inv': -43000000, 'cash_flow_fin': -722000000 }) def test_mmm_20130630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465913058961/mmm-20130630.xml') self.assert_item(item, { 'symbol': 'MMM', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2013, 'end_date': '2013-06-30', 'revenues': 7752000000, 'op_income': 1702000000, 'net_income': 1197000000, 'eps_basic': 1.74, 'eps_diluted': 1.71, 'dividend': 0.635, 'assets': 34130000000, 'cur_assets': 13983000000, 'cur_liab': 6335000000, 'equity': 18319000000, 'cash': 2942000000, 'cash_flow_op': 2673000000, 'cash_flow_inv': -740000000, 'cash_flow_fin': -1727000000 }) def test_mnst_20130630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/865752/000110465913062263/mnst-20130630.xml') self.assert_item(item, { 'symbol': 'MNST', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2013, 'end_date': '2013-06-30', 'revenues': 630934000, 'op_income': 179427000, 'net_income': 106873000, 'eps_basic': 0.64, 'eps_diluted': 0.62, 'dividend': 0.0, 'assets': 1317842000, 'cur_assets': 1093822000, 'cur_liab': 346174000, 'equity': 856021000, 'cash': 283839000, 'cash_flow_op': 99720000, 'cash_flow_inv': -70580000, 'cash_flow_fin': 30981000 }) def test_msft_20110630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312511200680/msft-20110630.xml') self.assert_item(item, { 'symbol': 'MSFT', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2011, 'end_date': '2011-06-30', 'revenues': 69943000000, 'op_income': 27161000000, 'net_income': 23150000000, 'eps_basic': 2.73, 'eps_diluted': 2.69, 'dividend': 0.64, 'assets': 108704000000, 'cur_assets': 74918000000, 'cur_liab': 28774000000, 'equity': 57083000000, 'cash': 
9610000000, 'cash_flow_op': 26994000000, 'cash_flow_inv': -14616000000, 'cash_flow_fin': -8376000000 }) def test_msft_20111231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312512026864/msft-20111231.xml') self.assert_item(item, { 'symbol': 'MSFT', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2012, 'end_date': '2011-12-31', 'revenues': 20885000000, 'op_income': 7994000000, 'net_income': 6624000000, 'eps_basic': 0.79, 'eps_diluted': 0.78, 'dividend': 0.20, 'assets': 112243000000, 'cur_assets': 72513000000, 'cur_liab': 25373000000, 'equity': 64121000000, 'cash': 10610000000, 'cash_flow_op': 5862000000, 'cash_flow_inv': -5568000000, 'cash_flow_fin': -2513000000 }) def test_msft_20130331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312513160748/msft-20130331.xml') self.assert_item(item, { 'symbol': 'MSFT', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2013, 'end_date': '2013-03-31', 'revenues': 20489000000, 'op_income': 7612000000, 'net_income': 6055000000, 'eps_basic': 0.72, 'eps_diluted': 0.72, 'dividend': 0.23, 'assets': 134105000000, 'cur_assets': 93524000000, 'cur_liab': 31929000000, 'equity': 76688000000, 'cash': 5240000000, 'cash_flow_op': 9666000000, 'cash_flow_inv': -7660000000, 'cash_flow_fin': -2744000000 }) def test_mu_20121129(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/723125/000072312513000007/mu-20121129.xml') self.assert_item(item, { 'symbol': 'MU', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2013, 'end_date': '2012-11-29', 'revenues': 1834000000, 'op_income': -157000000, 'net_income': -275000000, 'eps_basic': -0.27, 'eps_diluted': -0.27, 'dividend': 0.0, 'assets': 14067000000, 'cur_assets': 5315000000, 'cur_liab': 2138000000, 'equity': 8186000000, 'cash': 2102000000, 'cash_flow_op': 236000000, 'cash_flow_inv': -639000000, 'cash_flow_fin': 46000000 }) def test_mxim_20110326(self): item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/743316/000144530511000751/mxim-20110422.xml') self.assert_item(item, { 'symbol': 'MXIM', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2011, 'end_date': '2011-03-26', 'revenues': 606775000, 'op_income': 163995000, 'net_income': 136276000, 'eps_basic': 0.46, 'eps_diluted': 0.45, 'dividend': 0.21, 'assets': 3452417000, 'cur_assets': 1676593000, 'cur_liab': 391153000, 'equity': 2465040000, 'cash': 868923000, 'cash_flow_op': 615180000, 'cash_flow_inv': -224755000, 'cash_flow_fin': -348014000 }) def test_nflx_20120930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065280/000106528012000020/nflx-20120930.xml') self.assert_item(item, { 'symbol': 'NFLX', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2012, 'end_date': '2012-09-30', 'revenues': 905089000, 'op_income': 16135000, 'net_income': 7675000, 'eps_basic': 0.14, 'eps_diluted': 0.13, 'dividend': 0.0, 'assets': 3808833000, 'cur_assets': 2225018000, 'cur_liab': 1598223000, 'equity': 716840000, 'cash': 370298000, 'cash_flow_op': 150000, 'cash_flow_inv': -33524000, 'cash_flow_fin': -158000 }) def test_nvda_20130127(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1045810/000104581013000008/nvda-20130127.xml') self.assert_item(item, { 'symbol': 'NVDA', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2013, 'end_date': '2013-01-27', 'revenues': 4280159000, 'op_income': 648239000, 'net_income': 562536000, 'eps_basic': 0.91, 'eps_diluted': 0.9, 'dividend': 0.075, 'assets': 6412245000, 'cur_assets': 4775258000, 'cur_liab': 976223000, 'equity': 4827703000, 'cash': 732786000, 'cash_flow_op': 824172000, 'cash_flow_inv': -743992000, 'cash_flow_fin': -15270000 }) def test_nws_20090930(self): # This symbol was changed to FOX item = parse_xml('http://www.sec.gov/Archives/edgar/data/1308161/000119312509224062/nws-20090930.xml') self.assert_item(item, { 'symbol': 'NWS', 'amend': 
False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2010, 'end_date': '2009-09-30', 'revenues': 7199000000, 'op_income': 1042000000, 'net_income': 571000000, 'eps_basic': 0.22, 'eps_diluted': 0.22, 'dividend': 0.06, 'assets': 55316000000, 'cur_assets': 17425000000, 'cur_liab': 10990000000, 'equity': 24479000000, 'cash': 7832000000, 'cash_flow_op': 680000000, 'cash_flow_inv': -362000000, 'cash_flow_fin': 942000000 }) def test_omx_20110924(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312511286448/omx-20110924.xml') self.assert_item(item, { 'symbol': 'OMX', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2011, 'end_date': '2011-09-24', 'revenues': 1774767000, 'op_income': 41296000, 'net_income': 21518000, 'eps_basic': 0.25, 'eps_diluted': 0.25, 'dividend': 0.0, 'assets': 4002981000, 'cur_assets': 1950996000, 'cur_liab': 998377000, 'equity': 657636000, 'cash': 485426000, 'cash_flow_op': 78743000, 'cash_flow_inv': -41380000, 'cash_flow_fin': -11280000 }) def test_omx_20111231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312512077611/omx-20111231.xml') self.assert_item(item, { 'symbol': 'OMX', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2011, 'end_date': '2011-12-31', 'revenues': 7121167000, 'op_income': 86486000, 'net_income': 32771000, 'eps_basic': 0.38, 'eps_diluted': 0.38, 'dividend': 0.0, 'assets': 4069275000, 'cur_assets': 1938974000, 'cur_liab': 1013301000, 'equity': 568993000, 'cash': 427111000, 'cash_flow_op': 53679000, 'cash_flow_inv': -69373000, 'cash_flow_fin': -17952000 }) def test_omx_20121229(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312513073972/omx-20121229.xml') self.assert_item(item, { 'symbol': 'OMX', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2012, 'end_date': '2012-12-29', 'revenues': 6920384000, 'op_income': 24278000, 'net_income': 414694000, 'eps_basic': 
4.79, 'eps_diluted': 4.74, 'dividend': 0.0, 'assets': 3784315000, 'cur_assets': 1983884000, 'cur_liab': 1056641000, 'equity': 1034373000, 'cash': 495056000, 'cash_flow_op': 185201000, 'cash_flow_inv': -85244000, 'cash_flow_fin': -34836000 }) def test_orly_20130331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/898173/000089817313000028/orly-20130331.xml') self.assert_item(item, { 'symbol': 'ORLY', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2013, 'end_date': '2013-03-31', 'revenues': 1585009000, 'op_income': 251084000, 'net_income': 154329000, 'eps_basic': 1.38, 'eps_diluted': 1.36, 'dividend': 0.0, 'assets': 5789541000, 'cur_assets': 2741188000, 'cur_liab': 2349022000, 'equity': 2072525000, 'cash': 205410000, 'cash_flow_op': 226344000, 'cash_flow_inv': -72100000, 'cash_flow_fin': -196962000 }) def test_pay_20110430(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1312073/000119312511161119/pay-20110430.xml') self.assert_item(item, { 'symbol': 'PAY', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2011, 'end_date': '2011-04-30', 'revenues': 292446000, 'op_income': 37338000, 'net_income': 25200000, 'eps_basic': 0.29, 'eps_diluted': 0.27, 'dividend': 0.0, 'assets': 1252289000, 'cur_assets': 935395000, 'cur_liab': 303590000, 'equity': 332172000, 'cash': 531542000, 'cash_flow_op': 68831000, 'cash_flow_inv': -20049000, 'cash_flow_fin': 34676000 }) def test_pcar_20100331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/75362/000119312510108284/pcar-20100331.xml') self.assert_item(item, { 'symbol': 'PCAR', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2010, 'end_date': '2010-03-31', 'revenues': 2230700000, 'op_income': None, 'net_income': 68300000, 'eps_basic': 0.19, 'eps_diluted': 0.19, 'dividend': 0.09, 'assets': 13990000000, 'cur_assets': 3396400000, 'cur_liab': 1425900000, 'equity': 5092600000, 'cash': 1854700000, 'cash_flow_op': 285400000, 
'cash_flow_inv': 40500000, 'cash_flow_fin': -350800000 }) def test_pcg_20091231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1004980/000100498010000015/pcg-20091231.xml') self.assert_item(item, { 'symbol': 'PCG', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2009, 'end_date': '2009-12-31', 'revenues': 13399000000, 'op_income': 2299000000, 'net_income': 1220000000, 'eps_basic': 3.25, 'eps_diluted': 3.2, 'dividend': 1.68, 'assets': 42945000000, 'cur_assets': 5657000000, 'cur_liab': 6813000000, 'equity': 10585000000, 'cash': 527000000, 'cash_flow_op': 3039000000, 'cash_flow_inv': -3336000000, 'cash_flow_fin': 605000000 }) def test_plt_20130630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/914025/000091402513000049/plt-20130630.xml') self.assert_item(item, { 'symbol': 'PLT', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2014, 'end_date': '2013-06-30', 'revenues': 202818000, 'op_income': 35949000, 'net_income': 26953000, 'eps_basic': 0.63, 'eps_diluted': 0.62, 'dividend': 0.1, 'assets': 780520000, 'cur_assets': 568272000, 'cur_liab': 90121000, 'equity': 673569000, 'cash': 256343000, 'cash_flow_op': 34140000, 'cash_flow_inv': -4120000, 'cash_flow_fin': -2424000 }) def test_qep_20110630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1108827/000119312511202252/qep-20110630.xml') self.assert_item(item, { 'symbol': 'QEP', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2011, 'end_date': '2011-06-30', 'revenues': 784100000, 'op_income': 168900000, 'net_income': 92800000, 'eps_basic': 0.52, 'eps_diluted': 0.52, 'dividend': 0.02, 'assets': 7075000000, 'cur_assets': 655600000, 'cur_liab': 582900000, 'equity': 3184400000, 'cash': None, 'cash_flow_op': 628600000, 'cash_flow_inv': -660200000, 'cash_flow_fin': 31600000 }) def test_qep_20120930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1108827/000110882712000006/qep-20120930.xml') 
self.assert_item(item, { 'symbol': 'QEP', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2012, 'end_date': '2012-09-30', 'revenues': 542400000, 'op_income': -12600000, 'net_income': -3100000, 'eps_basic': -0.02, 'eps_diluted': -0.02, 'dividend': 0.02, 'assets': 8996100000, 'cur_assets': 619800000, 'cur_liab': 616700000, 'equity': 3377000000, 'cash': 0.0, 'cash_flow_op': 972000000, 'cash_flow_inv': -2435700000, 'cash_flow_fin': 1463700000 }) def test_regn_20100630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/872589/000120677410001689/regn-20100630.xml') self.assert_item(item, { 'symbol': 'REGN', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2010, 'end_date': '2010-06-30', 'revenues': 115886000, 'op_income': -23724000, 'net_income': -25474000, 'eps_basic': -0.31, 'eps_diluted': -0.31, 'dividend': 0.0, 'assets': 790641000, 'cur_assets': 417750000, 'cur_liab': 119571000, 'equity': 371216000, 'cash': 112000000, 'cash_flow_op': -22626000, 'cash_flow_inv': -131383000, 'cash_flow_fin': 58934000 }) def test_sbac_20110331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1034054/000119312511130220/sbac-20110331.xml') self.assert_item(item, { 'symbol': 'SBAC', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2011, 'end_date': '2011-03-31', 'revenues': 167749000, 'op_income': 23899000, 'net_income': -34251000, 'eps_basic': -0.3, 'eps_diluted': -0.3, 'dividend': 0.0, 'assets': 3466258000, 'cur_assets': 173387000, 'cur_liab': 120247000, 'equity': 213078000, 'cash': 95104000, 'cash_flow_op': 53197000, 'cash_flow_inv': -108748000, 'cash_flow_fin': 86401000 }) def test_shld_20101030(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1310067/000119312510263486/shld-20101030.xml') self.assert_item(item, { 'symbol': 'SHLD', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2010, 'end_date': '2010-10-30', 'revenues': 9678000000, 'op_income': 
-292000000, 'net_income': -218000000, 'eps_basic': -1.98, 'eps_diluted': -1.98, 'dividend': 0.0, 'assets': 26045000000, 'cur_assets': 13123000000, 'cur_liab': 10682000000, 'equity': 8378000000, 'cash': 790000000, 'cash_flow_op': -1172000000, 'cash_flow_inv': -296000000, 'cash_flow_fin': 532000000 }) def test_sial_20101231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/90185/000119312511028579/sial-20101231.xml') self.assert_item(item, { 'symbol': 'SIAL', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2010, 'end_date': '2010-12-31', 'revenues': 2271000000, 'op_income': 551000000, 'net_income': 384000000, 'eps_basic': 3.17, 'eps_diluted': 3.12, 'dividend': 0.0, 'assets': 3014000000, 'cur_assets': 1602000000, 'cur_liab': 530000000, 'equity': 1976000000, 'cash': 569000000, 'cash_flow_op': 523000000, 'cash_flow_inv': -182000000, 'cash_flow_fin': -161000000 }) def test_siri_20100630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/908937/000095012310074081/siri-20100630.xml') self.assert_item(item, { 'symbol': 'SIRI', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2010, 'end_date': '2010-06-30', 'revenues': 699761000, 'op_income': 125634000, 'net_income': 15272000, 'eps_basic': 0.0, 'eps_diluted': 0.0, 'dividend': 0.0, 'assets': 7200932000, 'cur_assets': 760172000, 'cur_liab': 2041871000, 'equity': 180428000, 'cash': 258854000, 'cash_flow_op': 140987000, 'cash_flow_inv': -159859000, 'cash_flow_fin': -105763000 }) def test_siri_20120331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/908937/000090893712000003/siri-20120331.xml') self.assert_item(item, { 'symbol': 'SIRI', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2012, 'end_date': '2012-03-31', 'revenues': 804722000, 'op_income': 199238000, 'net_income': 107774000, 'eps_basic': 0.03, 'eps_diluted': 0.02, 'dividend': 0.0, 'assets': 7501724000, 'cur_assets': 1337094000, 'cur_liab': 2236580000, 
'equity': 849579000, 'cash': 746576000, 'cash_flow_op': 39948000, 'cash_flow_inv': -25187000, 'cash_flow_fin': -42175000 }) def test_spex_20130331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/12239/000141588913001019/spex-20130331.xml') self.assert_item(item, { 'symbol': 'SPEX', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2013, 'end_date': '2013-03-31', 'revenues': 5761, 'op_income': -910547, 'net_income': -3696570, 'eps_basic': -5.35, 'eps_diluted': None, 'dividend': 0.0, 'assets': 3572989, 'cur_assets': 3535555, 'cur_liab': 453858, 'equity': 2857993, 'cash': 3448526, 'cash_flow_op': -1049711, 'cash_flow_inv': None, 'cash_flow_fin': None }) def test_strza_20121231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793413000015/strza-20121231.xml') self.assert_item(item, { 'symbol': 'STRZA', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2012, 'end_date': '2012-12-31', 'revenues': 1630696000, 'op_income': 405404000, 'net_income': 254484000, 'eps_basic': None, 'eps_diluted': None, 'dividend': 0.0, 'assets': 2176050000, 'cur_assets': 1376911000, 'cur_liab': 330451000, 'equity': 1302144000, 'cash': 749774000, 'cash_flow_op': 292077000, 'cash_flow_inv': -16214000, 'cash_flow_fin': -626101000 }) def test_stx_20120928(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1137789/000110465912072744/stx-20120928.xml') self.assert_item(item, { 'symbol': 'STX', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2013, 'end_date': '2012-09-28', 'revenues': 3732000000, 'op_income': 624000000, 'net_income': 582000000, 'eps_basic': 1.48, 'eps_diluted': 1.42, 'dividend': 0.32, 'assets': 9522000000, 'cur_assets': 5749000000, 'cur_liab': 2753000000, 'equity': 3535000000, 'cash': 1894000000, 'cash_flow_op': 1132000000, 'cash_flow_inv': -265000000, 'cash_flow_fin': -681000000 }) def test_stx_20121228(self): # 'stx-20120928' is misnamed item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/1137789/000110465913005497/stx-20120928.xml') self.assert_item(item, { 'symbol': 'STX', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2013, 'end_date': '2012-12-28', 'revenues': 3668000000, 'op_income': 555000000, 'net_income': 492000000, 'eps_basic': 1.33, 'eps_diluted': 1.3, 'dividend': 0.7, 'assets': 8742000000, 'cur_assets': 5017000000, 'cur_liab': 2643000000, 'equity': 2925000000, 'cash': 1383000000, 'cash_flow_op': 1976000000, 'cash_flow_inv': -453000000, 'cash_flow_fin': -1849000000 }) def test_symc_20130628(self): # 'symc-20140628.xml' is misnamed item = parse_xml('http://www.sec.gov/Archives/edgar/data/849399/000119312513312695/symc-20140628.xml') self.assert_item(item, { 'symbol': 'SYMC', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2014, 'end_date': '2013-06-28', 'revenues': 1709000000, 'op_income': 224000000, 'net_income': 157000000, 'eps_basic': 0.23, 'eps_diluted': 0.22, 'dividend': 0.15, 'assets': 13151000000, 'cur_assets': 5179000000, 'cur_liab': 4205000000, 'equity': 5497000000, 'cash': 3749000000, 'cash_flow_op': 312000000, 'cash_flow_inv': -29000000, 'cash_flow_fin': -1192000000 }) def test_tgt_20130803(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/27419/000110465913066569/tgt-20130803.xml') self.assert_item(item, { 'symbol': 'TGT', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2013, 'end_date': '2013-08-03', 'revenues': 17117000000, 'op_income': 1161000000, 'net_income': 611000000, 'eps_basic': 0.96, 'eps_diluted': 0.95, 'dividend': 0.43, 'assets': 44162000000, 'cur_assets': 11403000000, 'cur_liab': 12616000000, 'equity': 16020000000, 'cash': 1018000000, 'cash_flow_op': 4109000000, 'cash_flow_inv': 1269000000, 'cash_flow_fin': -5148000000 }) def test_trv_20100331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/86312/000110465910021504/trv-20100331.xml') self.assert_item(item, { 
'symbol': 'TRV', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2010, 'end_date': '2010-03-31', 'revenues': 6119000000, 'op_income': None, 'net_income': 647000000, 'eps_basic': 1.26, 'eps_diluted': 1.25, 'dividend': 0.0, 'assets': 108696000000, 'cur_assets': None, 'cur_liab': None, 'equity': 26671000000, 'cash': 251000000, 'cash_flow_op': 531000000, 'cash_flow_inv': 952000000, 'cash_flow_fin': -1486000000 }) def test_tsla_20110630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312511221497/tsla-20110630.xml') self.assert_item(item, { 'symbol': 'TSLA', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2011, 'end_date': '2011-06-30', 'revenues': 58171000, 'op_income': -58739000, 'net_income': -58903000, 'eps_basic': -0.60, 'eps_diluted': -0.60, 'dividend': 0.0, 'assets': 646155000, 'cur_assets': 417758000, 'cur_liab': 138736000, 'equity': 348452000, 'cash': 319380000, 'cash_flow_op': -65785000, 'cash_flow_inv': -13011000, 'cash_flow_fin': 298618000 }) def test_tsla_20111231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312512137560/tsla-20111231.xml') self.assert_item(item, { 'symbol': 'TSLA', 'amend': True, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2011, 'end_date': '2011-12-31', 'revenues': 204242000, 'op_income': -251488000, 'net_income': -254411000, 'eps_basic': -2.53, 'eps_diluted': -2.53, 'dividend': 0.0, 'assets': 713448000, 'cur_assets': 372838000, 'cur_liab': 191339000, 'equity': 224045000, 'cash': 255266000, 'cash_flow_op': -114364000, 'cash_flow_inv': -175928000, 'cash_flow_fin': 446000000 }) def test_tsla_20130630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312513327916/tsla-20130630.xml') self.assert_item(item, { 'symbol': 'TSLA', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2013, 'end_date': '2013-06-30', 'revenues': 405139000, 'op_income': -11792000, 'net_income': 
-30502000, 'eps_basic': -0.26, 'eps_diluted': -0.26, 'dividend': 0.0, 'assets': 1887844000, 'cur_assets': 1129542000, 'cur_liab': 486545000, 'equity': 629426000, 'cash': 746057000, 'cash_flow_op': 25886000, 'cash_flow_inv': -82410000, 'cash_flow_fin': 600691000 }) def test_utmd_20111231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/706698/000109690612002585/utmd-20111231.xml') self.assert_item(item, { 'symbol': 'UTMD', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2011, 'end_date': '2011-12-31', 'revenues': 37860000, 'op_income': 11842000, 'net_income': 7414000, 'eps_basic': 2.04, 'eps_diluted': 2.03, 'dividend': 0.0, 'assets': 76389000, 'cur_assets': 17016000, 'cur_liab': 9631000, 'equity': 40757000, 'cash': 6534000, 'cash_flow_op': 11365000, 'cash_flow_inv': -26685000, 'cash_flow_fin': 18078000 }) def test_vel_pe_20130930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/103682/000119312513427104/d-20130930.xml') self.assert_item(item, { 'symbol': 'VEL - PE', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2013, 'end_date': '2013-09-30', 'revenues': 3432000000, 'op_income': 1034000000, 'net_income': 569000000, 'eps_basic': 0.98, 'eps_diluted': 0.98, 'dividend': 0.5625, 'assets': 48488000000, 'cur_assets': 5210000000, 'cur_liab': 6453000000, 'equity': 11242000000, 'cash': 287000000, 'cash_flow_op': 2950000000, 'cash_flow_inv': -2348000000, 'cash_flow_fin': -563000000 }) def test_via_20090930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312509221448/via-20090930.xml') self.assert_item(item, { 'symbol': 'VIA', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2009, 'end_date': '2009-09-30', 'revenues': 3317000000, 'op_income': 784000000, 'net_income': 463000000, 'eps_basic': 0.76, 'eps_diluted': 0.76, 'dividend': 0.0, 'assets': 21307000000, 'cur_assets': 3605000000, 'cur_liab': 3707000000, 'equity': 8044000000, 'cash': 
249000000, 'cash_flow_op': 732000000, 'cash_flow_inv': -117000000, 'cash_flow_fin': -1169000000 }) def test_via_20091231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312510028165/via-20091231.xml') self.assert_item(item, { 'symbol': 'VIA', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2009, 'end_date': '2009-12-31', 'revenues': 13619000000, 'op_income': 2904000000, 'net_income': 1611000000, 'eps_basic': 2.65, 'eps_diluted': 2.65, 'dividend': 0.0, 'assets': 21900000000, 'cur_assets': 4430000000, 'cur_liab': 3751000000, 'equity': 8677000000, 'cash': 298000000, 'cash_flow_op': 1151000000, 'cash_flow_inv': -274000000, 'cash_flow_fin': -1388000000 }) def test_via_20120630(self): # 'via-20120401.xml' is misnamed item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312512333732/via-20120401.xml') self.assert_item(item, { 'symbol': 'VIA', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2012, 'end_date': '2012-06-30', 'revenues': 3241000000, 'op_income': 903000000, 'net_income': 534000000, 'eps_basic': 1.02, 'eps_diluted': 1.01, 'dividend': 0.275, 'assets': 21958000000, 'cur_assets': 4511000000, 'cur_liab': 3716000000, 'equity': 7473000000, 'cash': 774000000, 'cash_flow_op': 1736000000, 'cash_flow_inv': -212000000, 'cash_flow_fin': -1750000000 }) def test_vno_20090630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/899689/000089968909000034/vno-20090630.xml') self.assert_item(item, { 'symbol': 'VNO', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'FY', # mismarked in doc, actually should be Q2 'fiscal_year': 2009, 'end_date': '2009-06-30', 'revenues': 678385000, 'op_income': 221139000, 'net_income': -51904000, 'eps_basic': -0.3, 'eps_diluted': -0.3, 'dividend': 0.95, 'assets': 21831857000, 'cur_assets': None, 'cur_liab': None, 'equity': 7122175000, 'cash': 2068498000, 'cash_flow_op': 379439000, 'cash_flow_inv': -219310000, 'cash_flow_fin': 381516000 }) 
def test_vno_20111231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/899689/000089968912000004/vno-20111231.xml') self.assert_item(item, { 'symbol': 'VNO', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2011, 'end_date': '2011-12-31', 'revenues': 2915665000, 'op_income': 856153000, 'net_income': 601771000, 'eps_basic': 3.26, 'eps_diluted': 3.23, 'dividend': 0.0, 'assets': 20446487000, 'cur_assets': None, 'cur_liab': None, 'equity': 7508447000, 'cash': 606553000, 'cash_flow_op': 702499000, 'cash_flow_inv': -164761000, 'cash_flow_fin': -621974000 }) def test_vrsk_20120930(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1442145/000119312512441544/vrsk-20120930.xml') self.assert_item(item, { 'symbol': 'VRSK', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2012, 'end_date': '2012-09-30', 'revenues': 398863000, 'op_income': 155251000, 'net_income': 82911000, 'eps_basic': 0.5, 'eps_diluted': 0.48, 'dividend': 0.0, 'assets': 2303433000, 'cur_assets': 361337000, 'cur_liab': 668257000, 'equity': 142048000, 'cash': 97770000, 'cash_flow_op': 320997000, 'cash_flow_inv': -838704000, 'cash_flow_fin': 424004000 }) def test_wat_20120929(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/1000697/000119312512448069/wat-20120929.xml') self.assert_item(item, { 'symbol': 'WAT', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q3', 'fiscal_year': 2012, 'end_date': '2012-09-29', 'revenues': 449952000, 'op_income': 121745000, 'net_income': 99109000, 'eps_basic': 1.13, 'eps_diluted': 1.12, 'dividend': 0.0, 'assets': 2997140000, 'cur_assets': 2137498000, 'cur_liab': 767562000, 'equity': 1329879000, 'cash': 356293000, 'cash_flow_op': 317627000, 'cash_flow_inv': -298851000, 'cash_flow_fin': -53396000 }) def test_wec_20130331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/783325/000010781513000080/wec-20130331.xml') self.assert_item(item, { 'symbol': 'WEC', 'amend': False, 
'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2013, 'end_date': '2013-03-31', 'revenues': 1275200000, 'op_income': 321000000, 'net_income': 176600000, 'eps_basic': 0.77, 'eps_diluted': 0.76, 'dividend': 0.34, 'assets': 14295300000, 'cur_assets': 1313800000, 'cur_liab': 1278100000, 'equity': 8675000000, 'cash': 24700000, 'cash_flow_op': 330300000, 'cash_flow_inv': -145300000, 'cash_flow_fin': -195900000 }) def test_wec_20130630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/783325/000010781513000112/wec-20130630.xml') self.assert_item(item, { 'symbol': 'WEC', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2013, 'end_date': '2013-06-30', 'revenues': 1012300000, 'op_income': 229500000, 'net_income': 119000000, 'eps_basic': 0.52, 'eps_diluted': 0.52, 'dividend': 0.34, 'assets': 14317000000, 'cur_assets': 1271100000, 'cur_liab': 1280700000, 'equity': 8609000000, 'cash': 21000000, 'cash_flow_op': 681500000, 'cash_flow_inv': -336600000, 'cash_flow_fin': -359500000 }) def test_wfm_20120115(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/865436/000144530512000434/wfm-20120115.xml') self.assert_item(item, { 'symbol': 'WFM', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2012, 'end_date': '2012-01-15', 'revenues': 3390940000, 'op_income': 190338000, 'net_income': 118327000, 'eps_basic': 0.66, 'eps_diluted': 0.65, 'dividend': 0.14, 'assets': 4528241000, 'cur_assets': 1677087000, 'cur_liab': 896972000, 'equity': 3182747000, 'cash': 529954000, 'cash_flow_op': 260896000, 'cash_flow_inv': -6963000, 'cash_flow_fin': 63562000 }) def test_xel_20100331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/72903/000110465910024080/xel-20100331.xml') self.assert_item(item, { 'symbol': 'XEL', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2010, 'end_date': '2010-03-31', 'revenues': 2807462000, 'op_income': 403665000, 'net_income': 166058000, 'eps_basic': 
0.36, 'eps_diluted': 0.36, 'dividend': 0.25, 'assets': 25334501000, 'cur_assets': 2344294000, 'cur_liab': 2759838000, 'equity': 7355871000, 'cash': 79504000, 'cash_flow_op': 555539000, 'cash_flow_inv': -460112000, 'cash_flow_fin': -121731000 }) def test_xel_20101231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/72903/000114036111012444/xel-20101231.xml') self.assert_item(item, { 'symbol': 'XEL', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2010, 'end_date': '2010-12-31', 'revenues': 10310947000, 'op_income': 1619969000, 'net_income': 751593000, 'eps_basic': 1.63, 'eps_diluted': 1.62, 'dividend': 1.0, 'assets': 27387690000, 'cur_assets': 2732643000, 'cur_liab': 2536533000, 'equity': 8083519000, 'cash': 108437000, 'cash_flow_op': 1893942000, 'cash_flow_inv': -2806724000, 'cash_flow_fin': 905571000 }) def test_xom_20110331(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000119312511127973/xom-20110331.xml') self.assert_item(item, { 'symbol': 'XOM', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q1', 'fiscal_year': 2011, 'end_date': '2011-03-31', 'revenues': 114004000000, 'op_income': None, 'net_income': 10650000000, 'eps_basic': 2.14, 'eps_diluted': 2.14, 'dividend': 0.44, 'assets': 319533000000, 'cur_assets': 72022000000, 'cur_liab': 73576000000, 'equity': 157531000000, 'cash': 12833000000, 'cash_flow_op': 16856000000, 'cash_flow_inv': -5353000000, 'cash_flow_fin': -6749000000 }) def test_xom_20111231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000119312512078102/xom-20111231.xml') self.assert_item(item, { 'symbol': 'XOM', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2011, 'end_date': '2011-12-31', 'revenues': 467029000000, 'op_income': None, 'net_income': 41060000000, 'eps_basic': 8.43, 'eps_diluted': 8.42, 'dividend': 1.85, 'assets': 331052000000, 'cur_assets': 72963000000, 'cur_liab': 77505000000, 'equity': 160744000000, 'cash': 
12664000000, 'cash_flow_op': 55345000000, 'cash_flow_inv': -22165000000, 'cash_flow_fin': -28256000000 }) def test_xom_20130630(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000003408813000035/xom-20130630.xml') self.assert_item(item, { 'symbol': 'XOM', 'amend': False, 'doc_type': '10-Q', 'period_focus': 'Q2', 'fiscal_year': 2013, 'end_date': '2013-06-30', 'revenues': 106469000000, 'op_income': None, 'net_income': 6860000000, 'eps_basic': 1.55, 'eps_diluted': 1.55, 'dividend': 0.63, 'assets': 341615000000, 'cur_assets': 62844000000, 'cur_liab': 72688000000, 'equity': 171588000000, 'cash': 4609000000, 'cash_flow_op': 21275000000, 'cash_flow_inv': -18547000000, 'cash_flow_fin': -7409000000 }) def test_xray_20091231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/818479/000114420410009164/xray-20091231.xml') self.assert_item(item, { 'symbol': 'XRAY', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2009, 'end_date': '2009-12-31', 'revenues': 2159916000, 'op_income': 381187000, 'net_income': 274258000, 'eps_basic': 1.85, 'eps_diluted': 1.83, 'dividend': 0.2, 'assets': 3087932000, 'cur_assets': 1217796000, 'cur_liab': 444556000, 'equity': 1906958000, 'cash': 450348000, 'cash_flow_op': 362489000, 'cash_flow_inv': -53399000, 'cash_flow_fin': -71420000 }) def test_xrx_20091231(self): item = parse_xml('http://www.sec.gov/Archives/edgar/data/108772/000119312510043079/xrx-20091231.xml') self.assert_item(item, { 'symbol': 'XRX', 'amend': False, 'doc_type': '10-K', 'period_focus': 'FY', 'fiscal_year': 2009, 'end_date': '2009-12-31', 'revenues': 15179000000, 'op_income': None, 'net_income': 485000000, 'eps_basic': 0.56, 'eps_diluted': 0.55, 'dividend': 0.0, 'assets': 24032000000, 'cur_assets': 9731000000, 'cur_liab': 4461000000, 'equity': 7191000000, 'cash': 3799000000, 'cash_flow_op': 2208000000, 'cash_flow_inv': -343000000, 'cash_flow_fin': 692000000 }) def test_zmh_20090630(self): item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/1136869/000095012309035693/zmh-20090630.xml')
        self.assert_item(item, {
            'symbol': 'ZMH',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 1019900000,
            'op_income': 296499999.99999988,
            'net_income': 210099999.99999988,  # Weird number, but it's actually in the filing
            'eps_basic': 0.98,
            'eps_diluted': 0.98,
            'dividend': 0.0,
            'assets': 7462100000.000001,
            'cur_assets': 2328700000.0000005,
            'cur_liab': 669200000,
            'equity': 5805600000,
            'cash': 277500000,
            'cash_flow_op': 379700000.00000018,
            'cash_flow_inv': -174300000.00000003,
            'cash_flow_fin': -142000000.00000003
        })

================================================
FILE: pystock_crawler/tests/test_spiders_edgar.py
================================================
import os
import tempfile

from scrapy.http import HtmlResponse, XmlResponse

from pystock_crawler.spiders.edgar import EdgarSpider, URLGenerator
from pystock_crawler.tests.base import TestCaseBase


def make_url(symbol, start_date='', end_date=''):
    '''A URL that lists all 10-Q and 10-K filings of a company.'''
    return 'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=%s&type=10-&dateb=%s&datea=%s&owner=exclude&count=300' \
        % (symbol, end_date, start_date)


def make_link_html(href, text=u'Link'):
    return u'<a href="%s">%s</a>' % (href, text)


class URLGeneratorTest(TestCaseBase):

    def test_no_dates(self):
        urls = URLGenerator(('FB', 'GOOG'))
        self.assertEqual(list(urls), [
            make_url('FB'), make_url('GOOG')
        ])

    def test_with_start_date(self):
        urls = URLGenerator(('AAPL', 'AMZN', 'GLD'), start_date='20120215')
        self.assertEqual(list(urls), [
            make_url('AAPL', start_date='20120215'),
            make_url('AMZN', start_date='20120215'),
            make_url('GLD', start_date='20120215')
        ])

    def test_with_end_date(self):
        urls = URLGenerator(('TSLA', 'USO', 'MMM'), end_date='20110530')
        self.assertEqual(list(urls), [
            make_url('TSLA', end_date='20110530'),
            make_url('USO', end_date='20110530'),
            make_url('MMM', end_date='20110530')
        ])

    def test_with_start_and_end_dates(self):
        urls = URLGenerator(('DDD', 'AXP', 'KO'), start_date='20111230', end_date='20121230')
        self.assertEqual(list(urls), [
            make_url('DDD', '20111230', '20121230'),
            make_url('AXP', '20111230', '20121230'),
            make_url('KO', '20111230', '20121230')
        ])


class EdgarSpiderTest(TestCaseBase):

    def test_empty_creation(self):
        spider = EdgarSpider()
        self.assertEqual(spider.start_urls, [])

    def test_symbol_file(self):
        # create a mock file of a list of symbols
        f = tempfile.NamedTemporaryFile('w', delete=False)
        f.write('# Comment\nGOOG\nADBE\nLNKD\n#comment\nJPM\n')
        f.close()

        spider = EdgarSpider(symbols=f.name)
        urls = list(spider.start_urls)

        self.assertEqual(urls, [
            make_url('GOOG'), make_url('ADBE'),
            make_url('LNKD'), make_url('JPM')
        ])

        os.remove(f.name)

    def test_invalid_dates(self):
        with self.assertRaises(ValueError):
            EdgarSpider(startdate='12345678')

        with self.assertRaises(ValueError):
            EdgarSpider(enddate='12345678')

    def test_symbol_file_and_dates(self):
        # create a mock file of a list of symbols
        f = tempfile.NamedTemporaryFile('w', delete=False)
        f.write('# Comment\nT\nCBS\nWMT\n')
        f.close()

        spider = EdgarSpider(symbols=f.name, startdate='20110101', enddate='20130630')
        urls = list(spider.start_urls)

        self.assertEqual(urls, [
            make_url('T', '20110101', '20130630'),
            make_url('CBS', '20110101', '20130630'),
            make_url('WMT', '20110101', '20130630')
        ])

        os.remove(f.name)

    def test_parse_company_filing_page(self):
        '''
        Parse the page that lists all filings of a company.
        Example:
        http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001288776&type=10-&dateb=&owner=exclude&count=40

        '''
        spider = EdgarSpider()
        spider._follow_links = True  # HACK

        body = '''
            Useless Link Link Link Useless Link Link Link Useless Link Link Useless Link
        '''

        response = HtmlResponse('http://sec.gov/mock', body=body)
        requests = spider.parse(response)
        urls = [r.url for r in requests]

        self.assertEqual(urls, [
            'http://sec.gov/Archives/edgar/data/abc-index.htm',
            'http://sec.gov/Archives/edgar/data/123-index.htm',
            'http://sec.gov/Archives/edgar/data/123/abc-index.htm',
            'http://sec.gov/Archives/edgar/data/123/456/abc123-index.htm',
            'http://sec.gov/Archives/edgar/data/123/456/789/HELLO-index.htm'
        ])

    def test_parse_quarter_or_annual_page(self):
        '''
        Parse the page that lists filings of a quarter or a year of a company.

        Example:
        http://www.sec.gov/Archives/edgar/data/1288776/000128877613000055/0001288776-13-000055-index.htm

        '''
        spider = EdgarSpider()
        spider._follow_links = True  # HACK

        body = '''
            Useless Link Link Useless Link Link Useless Link Useless Link Useless Link Link
        '''

        response = HtmlResponse('http://sec.gov/mock', body=body)
        requests = spider.parse(response)
        urls = [r.url for r in requests]

        self.assertEqual(urls, [
            'http://sec.gov/Archives/edgar/data/123/abc-20130630.xml',
            'http://sec.gov/Archives/edgar/data/456/789/hello-20130630.xml'
        ])

    def test_parse_xml_report(self):
        '''Parse XML 10-Q or 10-K report.'''
        spider = EdgarSpider()
        spider._follow_links = True  # HACK

        body = '''
            2013-03-31 2013-06-28 false 10-Q Q2 2013-06-28 2013 100 200 0.2 0.19 0.07 1600 300 150
        '''

        response = XmlResponse('http://sec.gov/Archives/edgar/data/123/abc-20130720.xml', body=body)
        item = spider.parse_10qk(response)

        self.assert_item(item, {
            'symbol': 'ABC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-28',
            'revenues': 100.0,
            'net_income': 200.0,
            'eps_basic': 0.2,
            'eps_diluted': 0.19,
            'dividend': 0.07,
            'assets': 1600.0,
            'equity': 300.0,
            'cash': 150.0
        })

================================================
FILE: pystock_crawler/tests/test_spiders_nasdaq.py
================================================
from scrapy.http import TextResponse

from pystock_crawler.spiders.nasdaq import NasdaqSpider
from pystock_crawler.tests.base import TestCaseBase


class NasdaqSpiderTest(TestCaseBase):

    def test_parse(self):
        spider = NasdaqSpider()

        body = ('"Symbol","Name","Doesnt Matter",\n'
                '"DDD","3D Systems Corporation","50.5",\n'
                '"VNO","Vornado Realty Trust","103.5",\n'
                '"VNO^G","Vornado Realty Trust","25.21",\n'
                '"WBS","Webster Financial Corporation","29.71",\n'
                '"WBS/WS","Webster Financial Corporation","13.07",\n'
                '"AAA-A","Some Fake Company","1234.0",')
        response = TextResponse('http://www.nasdaq.com/dummy_url', body=body)
        items = list(spider.parse(response))

        self.assertEqual(len(items), 3)
        self.assert_item(items[0], {
            'symbol': 'DDD',
            'name': '3D Systems Corporation'
        })
        self.assert_item(items[1], {
            'symbol': 'VNO',
            'name': 'Vornado Realty Trust'
        })
        self.assert_item(items[2], {
            'symbol': 'WBS',
            'name': 'Webster Financial Corporation'
        })

================================================
FILE: pystock_crawler/tests/test_spiders_yahoo.py
================================================
import os
import tempfile

from scrapy.http import TextResponse

from pystock_crawler.spiders.yahoo import make_url, YahooSpider
from pystock_crawler.tests.base import TestCaseBase


class MakeURLTest(TestCaseBase):

    def test_no_dates(self):
        self.assertEqual(make_url('YHOO'), (
            'http://ichart.finance.yahoo.com/table.csv?'
            's=YHOO&d=&e=&f=&g=d&a=&b=&c=&ignore=.csv'
        ))

    def test_only_start_date(self):
        self.assertEqual(make_url('GOOG', start_date='20131122'), (
            'http://ichart.finance.yahoo.com/table.csv?'
            's=GOOG&d=&e=&f=&g=d&a=10&b=22&c=2013&ignore=.csv'
        ))

    def test_only_end_date(self):
        self.assertEqual(make_url('AAPL', end_date='20131122'), (
            'http://ichart.finance.yahoo.com/table.csv?'
            's=AAPL&d=10&e=22&f=2013&g=d&a=&b=&c=&ignore=.csv'
        ))

    def test_start_and_end_dates(self):
        self.assertEqual(make_url('TSLA', start_date='20120305', end_date='20131122'), (
            'http://ichart.finance.yahoo.com/table.csv?'
            's=TSLA&d=10&e=22&f=2013&g=d&a=2&b=5&c=2012&ignore=.csv'
        ))


class YahooSpiderTest(TestCaseBase):

    def test_empty_creation(self):
        spider = YahooSpider()
        self.assertEqual(list(spider.start_urls), [])

    def test_inline_symbols(self):
        spider = YahooSpider(symbols='C')
        self.assertEqual(list(spider.start_urls), [make_url('C')])

        spider = YahooSpider(symbols='KO,DIS,ATVI')
        self.assertEqual(list(spider.start_urls), [
            make_url(symbol) for symbol in ('KO', 'DIS', 'ATVI')
        ])

    def test_symbol_file(self):
        try:
            # Create a mock file of a list of symbols
            with tempfile.NamedTemporaryFile('w', delete=False) as f:
                f.write('# Comment\nGOOG\tGoogle Inc.\nAAPL\nFB Facebook.com\n#comment\nAMZN\n')

            spider = YahooSpider(symbols=f.name)
            self.assertEqual(list(spider.start_urls), [
                make_url(symbol) for symbol in ('GOOG', 'AAPL', 'FB', 'AMZN')
            ])
        finally:
            os.remove(f.name)

    def test_illegal_dates(self):
        with self.assertRaises(ValueError):
            YahooSpider(startdate='12345678')

        with self.assertRaises(ValueError):
            YahooSpider(enddate='12345678')

    def test_parse(self):
        spider = YahooSpider()

        body = ('Date,Open,High,Low,Close,Volume,Adj Close\n'
                '2013-11-22,121.58,122.75,117.93,121.38,11096700,121.38\n'
                '2013-09-06,168.57,169.70,165.15,166.97,8619700,166.97\n'
                '2013-06-26,103.80,105.87,102.66,105.72,6602600,105.72\n')
        response = TextResponse(make_url('YHOO'), body=body)
        items = list(spider.parse(response))

        self.assertEqual(len(items), 3)
        self.assert_item(items[0], {
            'symbol': 'YHOO',
            'date': '2013-11-22',
            'open': 121.58,
            'high': 122.75,
            'low': 117.93,
            'close': 121.38,
            'volume': 11096700,
            'adj_close': 121.38
        })
        self.assert_item(items[1], {
            'symbol': 'YHOO',
            'date': '2013-09-06',
            'open': 168.57,
            'high': 169.70,
            'low': 165.15,
            'close': 166.97,
            'volume': 8619700,
            'adj_close': 166.97
        })
        self.assert_item(items[2], {
            'symbol': 'YHOO',
            'date': '2013-06-26',
            'open': 103.80,
            'high': 105.87,
            'low': 102.66,
            'close': 105.72,
            'volume': 6602600,
            'adj_close': 105.72
        })

================================================
FILE: pystock_crawler/tests/test_utils.py
================================================
import cStringIO
import os

from pystock_crawler import utils
from pystock_crawler.tests.base import SAMPLE_DATA_DIR, TestCaseBase


class UtilsTest(TestCaseBase):

    def test_check_date_arg(self):
        utils.check_date_arg('19830305')
        utils.check_date_arg('19851122')
        utils.check_date_arg('19980720')
        utils.check_date_arg('20140212')

        # OK to pass an empty argument
        utils.check_date_arg('')

        with self.assertRaises(ValueError):
            utils.check_date_arg('1234')

        with self.assertRaises(ValueError):
            utils.check_date_arg('2014111')

        with self.assertRaises(ValueError):
            utils.check_date_arg('20141301')

        with self.assertRaises(ValueError):
            utils.check_date_arg('20140132')

    def test_parse_limit_arg(self):
        self.assertEqual(utils.parse_limit_arg(''), (0, None))
        self.assertEqual(utils.parse_limit_arg('11,22'), (11, 22))

        with self.assertRaises(ValueError):
            utils.parse_limit_arg('11,22,33')

        with self.assertRaises(ValueError):
            utils.parse_limit_arg('abc')

    def test_load_symbols(self):
        try:
            filename = os.path.join(SAMPLE_DATA_DIR, 'test_symbols.txt')
            with open(filename, 'w') as f:
                f.write('AAPL Apple Inc.\nGOOG\tGoogle Inc.\n# Comment\nFB\nTWTR\nAMZN\nSPY\n\nYHOO\n# The end\n')

            symbols = list(utils.load_symbols(filename))
            self.assertEqual(symbols, ['AAPL', 'GOOG', 'FB', 'TWTR', 'AMZN', 'SPY', 'YHOO'])
        finally:
            os.remove(filename)

    def test_parse_csv(self):
        f = cStringIO.StringIO('name,age\nAvon,30\nOmar,29\nJoe,45\n')
        items = list(utils.parse_csv(f))
        self.assertEqual(items, [
            { 'name': 'Avon', 'age': '30' },
            { 'name': 'Omar', 'age': '29' },
            { 'name': 'Joe', 'age': '45' }
        ])

================================================
FILE: pystock_crawler/throttle.py
================================================
import logging

from scrapy.exceptions import NotConfigured
from scrapy import signals


class PassiveThrottle(object):
    '''
    Scrapy's AutoThrottle adds too much download delay on edgar spider, making it
    too slow.

    PassiveThrottle takes a more "passive" approach. It adds download delay only
    if there is an error response.

    '''

    def __init__(self, crawler):
        self.crawler = crawler
        if not crawler.settings.getbool('PASSIVETHROTTLE_ENABLED'):
            raise NotConfigured

        self.debug = crawler.settings.getbool("PASSIVETHROTTLE_DEBUG")
        self.stats = crawler.stats
        crawler.signals.connect(self._spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(self._response_downloaded, signal=signals.response_downloaded)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def _spider_opened(self, spider):
        self.mindelay = self._min_delay(spider)
        self.maxdelay = self._max_delay(spider)
        self.retry_http_codes = self._retry_http_codes()
        self.stats.set_value('delay_count', 0)

    def _min_delay(self, spider):
        s = self.crawler.settings
        return getattr(spider, 'download_delay', 0.0) or \
            s.getfloat('DOWNLOAD_DELAY')

    def _max_delay(self, spider):
        return self.crawler.settings.getfloat('PASSIVETHROTTLE_MAX_DELAY', 60.0)

    def _retry_http_codes(self):
        return self.crawler.settings.getlist('RETRY_HTTP_CODES', [])

    def _response_downloaded(self, response, request, spider):
        key, slot = self._get_slot(request, spider)
        if slot is None:
            return

        olddelay = slot.delay
        self._adjust_delay(slot, response)
        if self.debug:
            diff = slot.delay - olddelay
            conc = len(slot.transferring)
            msg = "slot: %s | conc:%2d | delay:%5d ms (%+d)" % \
                  (key, conc, slot.delay * 1000, diff * 1000)
            spider.log(msg, level=logging.INFO)

    def _get_slot(self, request, spider):
        key = request.meta.get('download_slot')
        return key, self.crawler.engine.downloader.slots.get(key)

    def _adjust_delay(self, slot, response):
        """Define delay adjustment policy"""
        if response.status in self.retry_http_codes:
            new_delay = max(slot.delay, 1) * 4
            new_delay = max(new_delay, self.mindelay)
            new_delay = min(new_delay, self.maxdelay)
            slot.delay = new_delay
            self.stats.inc_value('delay_count')
        elif response.status == 200:
            new_delay = max(slot.delay / 2, self.mindelay)
            if new_delay < 0.01:
                new_delay = 0
            slot.delay = new_delay

================================================
FILE: pystock_crawler/utils.py
================================================
import csv

from datetime import datetime


def check_date_arg(value, arg_name=None):
    if value:
        try:
            if len(value) != 8:
                raise ValueError
            datetime.strptime(value, '%Y%m%d')
        except ValueError:
            raise ValueError("Option '%s' must be in YYYYMMDD format, input is '%s'" % (arg_name, value))


def parse_limit_arg(value):
    if value:
        tokens = value.split(',')
        try:
            if len(tokens) != 2:
                raise ValueError
            return int(tokens[0]), int(tokens[1])
        except ValueError:
            raise ValueError("Option 'limit' must be in START,COUNT format, input is '%s'" % value)
    return 0, None


def load_symbols(file_path):
    symbols = []
    with open(file_path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                symbol = line.split()[0]
                symbols.append(symbol)
    return symbols


def parse_csv(file_like):
    reader = csv.reader(file_like)
    headers = reader.next()
    for row in reader:
        item = {}
        for i, value in enumerate(row):
            header = headers[i]
            item[header] = value
        yield item

================================================
FILE: pytest.ini
================================================
[pytest]
addopts = --cov-report term-missing --cov pystock_crawler --cov bin pystock_crawler/tests/

================================================
FILE: requirements-test.txt
================================================
envoy
pytest
pytest-cov
requests

================================================
FILE: requirements.txt
================================================
docopt==0.6.2
leveldb==0.193
Scrapy==0.24.4
service-identity==1.0.0

================================================
FILE: scrapy.cfg
================================================
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/en/latest/topics/scrapyd.html

[settings]
default = pystock_crawler.settings

[deploy]
#url = http://localhost:6800/
project = pystock_crawler

================================================
FILE: setup.py
================================================
try:
    from setuptools import setup
except ImportError:
    from distutils.core import setup

import codecs
import os
import re


here = os.path.abspath(os.path.dirname(__file__))


# Read the version number from a source file.
# Why read it, and not import?
# see https://groups.google.com/d/topic/pypa-dev/0PkjVpcxTzQ/discussion
def find_version(*file_paths):
    # Open in Latin-1 so that we avoid encoding errors.
    # Use codecs.open for Python 2 compatibility
    with codecs.open(os.path.join(here, *file_paths), 'r', 'latin1') as f:
        version_file = f.read()

    # The version line must have the form
    # __version__ = 'ver'
    version_match = re.search(r"^__version__ = ['\"]([^'\"]*)['\"]",
                              version_file, re.M)
    if version_match:
        return version_match.group(1)
    raise RuntimeError('Unable to find version string')


def read_description(filename):
    with codecs.open(filename, encoding='utf-8') as f:
        return f.read()


def parse_requirements(filename):
    with open(filename) as f:
        content = f.read()
    return filter(lambda x: x and not x.startswith('#'), content.splitlines())


setup(
    name='pystock-crawler',
    version=find_version('pystock_crawler', '__init__.py'),
    url='https://github.com/eliangcs/pystock-crawler',
    description='Crawl and parse stock historical data',
    long_description=read_description('README.rst'),
    author='Chang-Hung Liang',
    author_email='eliang.cs@gmail.com',
    license='MIT',
    packages=['pystock_crawler', 'pystock_crawler.spiders'],
    scripts=['bin/pystock-crawler'],
    install_requires=parse_requirements('requirements.txt'),
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Environment :: Console',
        'Intended Audience :: Developers',
        'Intended Audience :: Financial and Insurance Industry',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2.7',
        'Topic :: Internet :: WWW/HTTP',
        'Topic :: Office/Business :: Financial :: Investment',
        'Topic :: Software Development :: Libraries :: Python Modules'
    ]
)
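
Note on `parse_requirements` in setup.py above: it returns the result of `filter`, which is a list on Python 2.7 (the only version the classifiers declare) but a lazy iterator on Python 3. The following is a minimal sketch, not part of the repository, of a variant that returns a list on either version. It also strips surrounding whitespace from each line, which the original helper does not do; the temporary file and its contents are illustrative assumptions, not the project's requirements.txt.

```python
import os
import tempfile


def parse_requirements(filename):
    """Return the non-empty, non-comment lines of a requirements file as a list."""
    with open(filename) as f:
        # Assumption: stripping whitespace is acceptable; the original keeps lines as-is.
        lines = [line.strip() for line in f]
    return [line for line in lines if line and not line.startswith('#')]


# Usage: write a small, made-up requirements file and parse it back.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('# pinned deps\ndocopt==0.6.2\n\nScrapy==0.24.4\n')
try:
    requirements = parse_requirements(tmp.name)
finally:
    os.remove(tmp.name)

print(requirements)
```

Returning a concrete list keeps the value reusable (it can be iterated more than once or inspected with `len`), which may matter if the helper is called outside of `setup()`.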