Repository: eliangcs/pystock-crawler
Branch: master
Commit: 8b803c8944f3
Files: 30
Total size: 216.6 KB

Directory structure:
gitextract_mp6yf35w/

├── .gitignore
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── README.rst
├── bin/
│   └── pystock-crawler
├── pystock_crawler/
│   ├── __init__.py
│   ├── exporters.py
│   ├── items.py
│   ├── loaders.py
│   ├── settings.py
│   ├── spiders/
│   │   ├── __init__.py
│   │   ├── edgar.py
│   │   ├── nasdaq.py
│   │   └── yahoo.py
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── test_cmdline.py
│   │   ├── test_loaders.py
│   │   ├── test_spiders_edgar.py
│   │   ├── test_spiders_nasdaq.py
│   │   ├── test_spiders_yahoo.py
│   │   └── test_utils.py
│   ├── throttle.py
│   └── utils.py
├── pytest.ini
├── requirements-test.txt
├── requirements.txt
├── scrapy.cfg
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*.csv
*.log
*.pyc
.coverage
.scrapy/
.~*
build/
dist/
pystock_crawler.egg-info/
pystock_crawler/tests/sample_data/


================================================
FILE: .travis.yml
================================================
language: python
python:
  - 2.7
branches:
  only:
    - master
install:
  - pip install -r requirements.txt
  - pip install -r requirements-test.txt
script:
  - py.test
after_success:
  - pip install python-coveralls
  - coveralls


================================================
FILE: LICENSE
================================================
The MIT License (MIT)

Copyright (c) 2013 Chang-Hung Liang

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


================================================
FILE: MANIFEST.in
================================================
include README.rst LICENSE requirements.txt

================================================
FILE: README.rst
================================================
pystock-crawler
===============

.. image:: https://badge.fury.io/py/pystock-crawler.png
    :target: http://badge.fury.io/py/pystock-crawler

.. image:: https://travis-ci.org/eliangcs/pystock-crawler.png?branch=master
    :target: https://travis-ci.org/eliangcs/pystock-crawler

.. image:: https://coveralls.io/repos/eliangcs/pystock-crawler/badge.png?branch=master
    :target: https://coveralls.io/r/eliangcs/pystock-crawler

``pystock-crawler`` is a utility for crawling historical data of US stocks,
including:

* Ticker symbols listed in NYSE, NASDAQ or AMEX from `NASDAQ.com`_
* Daily prices from `Yahoo Finance`_
* Fundamentals from 10-Q and 10-K filings (XBRL) on `SEC EDGAR`_


Example Output
--------------

NYSE ticker symbols::

    DDD   3D Systems Corporation
    MMM   3M Company
    WBAI  500.com Limited
    ...

Apple's daily prices::

    symbol,date,open,high,low,close,volume,adj_close
    AAPL,2014-04-28,572.80,595.75,572.55,594.09,23890900,594.09
    AAPL,2014-04-25,564.53,571.99,563.96,571.94,13922800,571.94
    AAPL,2014-04-24,568.21,570.00,560.73,567.77,27092600,567.77
    ...

Google's fundamentals::

    symbol,end_date,amend,period_focus,fiscal_year,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin
    GOOG,2009-06-30,False,Q2,2009,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0
    GOOG,2009-09-30,False,Q3,2009,10-Q,5944851000.0,2073718000.0,1638975000.0,5.18,5.13,0.0,37702845000.0,26353544000.0,2321774000.0,12087115000.0,33721753000.0,6584667000.0,-3245963000.0,74851000.0
    GOOG,2009-12-31,False,FY,2009,10-K,23650563000.0,8312186000.0,6520448000.0,20.62,20.41,0.0,40496778000.0,29166958000.0,2747467000.0,10197588000.0,36004224000.0,9316198000.0,-8019205000.0,233412000.0
    ...


Installation
------------

Prerequisites:

* Python 2.7

``pystock-crawler`` is based on Scrapy_, so you will also need to install
prerequisites such as lxml_ and libffi_ for Scrapy and its dependencies. On
Ubuntu, for example, you can install them like this::

    sudo apt-get update
    sudo apt-get install -y gcc python-dev libffi-dev libssl-dev libxml2-dev libxslt1-dev build-essential

See `Scrapy's installation guide`_ for more details.

After installing prerequisites, you can then install ``pystock-crawler`` with
``pip``::

    (sudo) pip install pystock-crawler


Quickstart
----------

**Example 1.** Fetch Google's and Yahoo's daily prices ordered by date::

    pystock-crawler prices GOOG,YHOO -o out.csv --sort

**Example 2.** Fetch daily prices of all companies listed in
``./symbols.txt``::

    pystock-crawler prices ./symbols.txt -o out.csv

**Example 3.** Fetch Facebook's fundamentals during 2013::

    pystock-crawler reports FB -o out.csv -s 20130101 -e 20131231

**Example 4.** Fetch fundamentals of all companies in ``./nyse.txt`` and direct
the log to ``./crawling.log``::

    pystock-crawler reports ./nyse.txt -o out.csv -l ./crawling.log

**Example 5.** Fetch all ticker symbols in NYSE, NASDAQ and AMEX::

    pystock-crawler symbols NYSE,NASDAQ,AMEX -o out.txt


Usage
-----

Type ``pystock-crawler -h`` to see command help::

    Usage:
      pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR]
                                          [--sort]
      pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                                       [-l LOGFILE] [-w WORKING_DIR] [--sort]
      pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                                        [-l LOGFILE] [-w WORKING_DIR]
                                        [-b BATCH_SIZE] [--sort]
      pystock-crawler (-h | --help)
      pystock-crawler (-v | --version)

    Options:
      -h --help       Show this screen
      -o OUTPUT       Output file
      -s YYYYMMDD     Start date [default: ]
      -e YYYYMMDD     End date [default: ]
      -l LOGFILE      Log output [default: ]
      -w WORKING_DIR  Working directory [default: .]
      -b BATCH_SIZE   Batch size [default: 500]
      --sort          Sort the result

There are three commands available:

* ``pystock-crawler symbols`` grabs ticker symbol lists
* ``pystock-crawler prices`` grabs daily prices
* ``pystock-crawler reports`` grabs fundamentals

``<exchanges>`` is a comma-separated string that specifies the stock exchanges
you want to include. Current, NYSE, NASDAQ and AMEX are supported.

The output file of ``pystock-crawler symbols`` can be used for ``<symbols>``
argument in ``pystock-crawler prices`` and ``pystock-crawler reports``
commands.

``<symbols>`` can be an inline string separated with commas or a text file
that lists symbols line by line. For example, the inline string can be
something like ``AAPL,GOOG,FB``. And the text file may look like this::

    # This line is comment
    AAPL    Put anything you want here
    GOOG    Since the text here is ignored
    FB

Use ``-o`` to specify the output file. For ``pystock-crawler symbols``
command, the output format is a simple text file. For
``pystock-crawler prices`` and ``pystock-crawler reports`` the output format
is CSV.

``-l`` is where the crawling logs go to. If not specified, the logs go to
stdout.

By default, the crawler uses the current directory as the working directory.
If you don't want to use the current directoy, you can specify it with ``-w``
option. The crawler keeps HTTP cache in a directory named ``.scrapy`` under
the working directory. The cache can save your time by avoid downloading the
same web pages. However, the cache can be quite huge. If you don't need it,
just delete the ``.scrapy`` directory after you've done crawling.

``-b`` option is only available to ``pystock-crawler reports`` command. It
allows you to split a large symbol list into smaller batches. This is actually
a workaround for an unresolved bug (#2). Normally you don't have to specify
this option. Default value (500) works just fine.

The rows in the output file are in an arbitrary order by default. Use
``--sort`` option to sort them by symbols and dates. But if you have a large
output file, don't use --sort because it will be slow and eat a lot of memory.


Developer Guide
---------------

Installing Dependencies
~~~~~~~~~~~~~~~~~~~~~~~
::

    pip install -r requirements.txt


Running Test
~~~~~~~~~~~~

Install test requirements::

    pip install -r requirements-test.txt

Then run the test::

    py.test

This will download the test data (a lot of XML/XBRL files) from from
`SEC EDGAR`_ on the fly, so it will take some time and disk space. The test
data is saved to ``pystock_crawler/tests/sample_data`` directory. It can be
reused on the next time you run the test. If you don't need them, just delete
the ``sample_data`` directory.


.. _libffi: https://sourceware.org/libffi/
.. _lxml: http://lxml.de/
.. _NASDAQ.com: http://www.nasdaq.com/
.. _Scrapy: http://scrapy.org/
.. _Scrapy's installation guide: http://doc.scrapy.org/en/latest/intro/install.html
.. _SEC EDGAR: http://www.sec.gov/edgar/searchedgar/companysearch.html
.. _virtualenv: http://www.virtualenv.org/
.. _virtualenvwrapper: http://virtualenvwrapper.readthedocs.org/
.. _Yahoo Finance: http://finance.yahoo.com/


================================================
FILE: bin/pystock-crawler
================================================
#!/usr/bin/env python
'''
Usage:
  pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR]
                                      [--sort]
  pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                                   [-l LOGFILE] [-w WORKING_DIR] [--sort]
  pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                                    [-l LOGFILE] [-w WORKING_DIR]
                                    [-b BATCH_SIZE] [--sort]
  pystock-crawler (-h | --help)
  pystock-crawler (-v | --version)

Options:
  -h --help       Show this screen
  -o OUTPUT       Output file
  -s YYYYMMDD     Start date [default: ]
  -e YYYYMMDD     End date [default: ]
  -l LOGFILE      Log output [default: ]
  -w WORKING_DIR  Working directory [default: .]
  -b BATCH_SIZE   Batch size [default: 500]
  --sort          Sort the result

'''
import codecs
import math
import os
import sys
import uuid

from contextlib import contextmanager
from docopt import docopt
from scrapy import log

try:
    import pystock_crawler
except ImportError:
    # For development environment
    sys.path.append(os.getcwd())
    import pystock_crawler


def random_string(length=5):
    return uuid.uuid4().get_hex()[0:5]


@contextmanager
def tmp_scrapy_cfg():
    content = '''# pystock_crawler scrapy.cfg
[settings]
default = pystock_crawler.settings

[deploy]
#url = http://localhost:6800/
project = pystock_crawler
'''
    filename = os.path.abspath('./scrapy.cfg')
    filename_bak = os.path.abspath('./scrapy-%s.cfg' % random_string())
    if os.path.exists(filename):
        log.msg(u'Renaming %s -> %s' % (filename, filename_bak))
        os.rename(filename, filename_bak)
    assert not os.path.exists(filename)
    log.msg(u'Creating temporary config: %s' % filename)
    with open(filename, 'w') as f:
        f.write(content)

    yield

    if os.path.exists(filename):
        log.msg(u'Deleting %s' % filename)
        os.remove(filename)
    if os.path.exists(filename_bak):
        log.msg(u'Renaming %s -> %s' % (filename_bak, filename))
        os.rename(filename_bak, filename)


def run_scrapy_command(cmd):
    log.msg('Command: %s' % cmd)
    with tmp_scrapy_cfg():
        os.system(cmd)


def count_symbols(symbols):
    if os.path.exists(symbols):
        # If `symbols` is a file
        with open(symbols) as f:
            count = 0
            for line in f:
                line = line.rstrip()
                if line and not line.startswith('#'):
                    count += 1
        return count

    # If `symbols` is a comma-separated string
    return len(symbols.split(','))


def merge_files(target, sources, ignore_header=False):
    log.msg(u'Merging files to %s' % target)
    with codecs.open(target, 'w', 'utf-8') as out:
        for i, source in enumerate(sources):
            with codecs.open(source, 'r', 'utf-8') as f:
                if ignore_header and i > 0:
                    try:
                        f.next()  # Ignore CSV header
                    except StopIteration:
                        break  # Empty file
                out.write(f.read())

    # Delete source files
    for filename in sources:
        log.msg(u'Deleting %s' % filename)
        os.remove(filename)


def crawl_symbols(exchanges, output, log_file):
    command = 'scrapy crawl nasdaq -a exchanges="%s" -t symbollist' % exchanges

    if output:
        command += ' -o "%s"' % output
    if log_file:
        command += ' -s LOG_FILE="%s"' % log_file

    run_scrapy_command(command)


def crawl(spider, symbols, start_date, end_date, output, log_file, batch_size):
    command = 'scrapy crawl %s -a symbols="%s" -t csv' % (spider, symbols)

    if start_date:
        command += ' -a startdate=%s' % start_date
    if end_date:
        command += ' -a enddate=%s' % end_date
    if log_file:
        command += ' -s LOG_FILE="%s"' % log_file

    if spider == 'edgar':
        # When crawling edgar filings, run the scrapy command batch by batch to
        # work around issue #2
        num_symbols = count_symbols(symbols)
        num_batches = int(math.ceil(num_symbols / float(batch_size)))

        # Store sub-files so we can merge them later
        output_files = []

        for i in xrange(num_batches):
            start = i * batch_size
            batch_cmd = command + ' -a limit=%d,%d' % (start, batch_size)
            if output:
                filename = '%s.%d' % (output, i + 1)
                batch_cmd += ' -o "%s"' % filename
                output_files.append(filename)

            run_scrapy_command(batch_cmd)

        merge_files(output, output_files, ignore_header=True)
    else:
        if output:
            command += ' -o "%s"' % output
        run_scrapy_command(command)


def sort_symbols(filename):
    log.msg(u'Sorting: %s' % filename)

    with codecs.open(filename, 'r', 'utf-8') as f:
        lines = [line for line in f]

    lines = sorted(lines)

    with codecs.open(filename, 'w', 'utf-8') as f:
        f.writelines(lines)

    log.msg(u'Sorted: %s' % filename)


def sort_csv(filename):
    log.msg(u'Sorting: %s' % filename)

    with codecs.open(filename, 'r', 'utf-8') as f:
        try:
            headers = f.next()
        except StopIteration:
            log.msg(u'No need to sort empty file: %s' % filename)
            return
        lines = [line for line in f]

    def line_cmp(line1, line2):
        a = line1.split(',')
        b = line2.split(',')
        length = min(len(a), len(b))
        i = 0
        while 1:
            result = cmp(a[i], b[i])
            if result or i >= length:
                return result
            i += 1

    lines = sorted(lines, cmp=line_cmp)

    with codecs.open(filename, 'w', 'utf-8') as f:
        f.write(headers)
        f.writelines(lines)

    log.msg(u'Sorted: %s' % filename)


def print_version():
    print 'pystock-crawler %s' % pystock_crawler.__version__


def main():
    args = docopt(__doc__)

    symbols = args.get('<symbols>')
    start_date = args.get('-s')
    end_date = args.get('-e')
    output = args.get('-o')
    log_file = args.get('-l')
    batch_size = args.get('-b')
    sorting = args.get('--sort')
    working_dir = args.get('-w')

    if args['prices']:
        spider = 'yahoo'
    elif args['reports']:
        spider = 'edgar'
    else:
        spider = None

    if symbols and os.path.exists(symbols):
        symbols = os.path.abspath(symbols)
    if output:
        output = os.path.abspath(output)
    if log_file:
        log_file = os.path.abspath(log_file)

    try:
        batch_size = int(batch_size)
        if batch_size <= 0:
            raise ValueError
    except ValueError:
        raise ValueError("BATCH_SIZE must be a positive integer, input is '%s'" % batch_size)

    try:
        os.chdir(working_dir)
    except OSError as err:
        sys.stderr.write('%s\n' % err)
        return

    if spider:
        log.start(logfile=log_file)
        crawl(spider, symbols, start_date, end_date, output, log_file, batch_size)
        if sorting and output:
            sort_csv(output)
    elif args['symbols']:
        log.start(logfile=log_file)
        exchanges = args.get('<exchanges>')
        crawl_symbols(exchanges, output, log_file)
        if sorting and output:
            sort_symbols(output)
    elif args['-v'] or args['--version']:
        print_version()


if __name__ == '__main__':
    main()


================================================
FILE: pystock_crawler/__init__.py
================================================
__version__ = '0.8.2'


================================================
FILE: pystock_crawler/exporters.py
================================================
from scrapy.conf import settings
from scrapy.contrib.exporter import BaseItemExporter, CsvItemExporter


class CsvItemExporter2(CsvItemExporter):
    '''
    The standard CsvItemExporter class does not pass the kwargs through to the
    CSV writer, resulting in EXPORT_FIELDS and EXPORT_ENCODING being ignored
    (EXPORT_EMPTY is not used by CSV).

    http://stackoverflow.com/questions/6943778/python-scrapy-how-to-get-csvitemexporter-to-write-columns-in-a-specific-order

    '''
    def __init__(self, *args, **kwargs):
        kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None
        kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')

        super(CsvItemExporter2, self).__init__(*args, **kwargs)

    def _write_headers_and_set_fields_to_export(self, item):
        # HACK: Override this private method to filter fields that are in
        # fields_to_export but not in item
        if self.include_headers_line:
            item_fields = item.fields.keys()
            if self.fields_to_export:
                self.fields_to_export = filter(lambda a: a in item_fields, self.fields_to_export)
            else:
                self.fields_to_export = item_fields
            self.csv_writer.writerow(self.fields_to_export)


class SymbolListExporter(BaseItemExporter):

    def __init__(self, file, **kwargs):
        self._configure(kwargs, dont_fail=True)
        self.file = file

    def export_item(self, item):
        self.file.write('%s\t%s\n' % (item['symbol'], item['name']))


================================================
FILE: pystock_crawler/items.py
================================================
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

from scrapy.item import Item, Field


class ReportItem(Item):
    # Trading symbol
    symbol = Field()

    # If this doc is an amendment to previously filed doc
    amend = Field()

    # Quarterly (10-Q) or annual (10-K) report
    doc_type = Field()

    # Q1, Q2, Q3, or FY for annual report
    period_focus = Field()

    fiscal_year = Field()
    end_date = Field()

    revenues = Field()
    op_income = Field()
    net_income = Field()

    eps_basic = Field()
    eps_diluted = Field()

    dividend = Field()

    # Balance sheet stuffs
    assets = Field()
    cur_assets = Field()
    cur_liab = Field()
    equity = Field()
    cash = Field()

    # Cash flow from operating, investing, and financing
    cash_flow_op = Field()
    cash_flow_inv = Field()
    cash_flow_fin = Field()


class PriceItem(Item):
    # Trading symbol
    symbol = Field()

    # YYYY-MM-DD
    date = Field()

    open = Field()
    close = Field()
    high = Field()
    low = Field()
    adj_close = Field()
    volume = Field()


class SymbolItem(Item):
    symbol = Field()
    name = Field()


================================================
FILE: pystock_crawler/loaders.py
================================================
import re

from datetime import datetime, timedelta
from scrapy import log
from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import Compose, MapCompose, TakeFirst
from scrapy.utils.misc import arg_to_iter
from scrapy.utils.python import flatten

from pystock_crawler.items import ReportItem


DATE_FORMAT = '%Y-%m-%d'

MAX_PER_SHARE_VALUE = 1000.0

# If number of characters of response body exceeds this value,
# remove some useless text defined by RE_XML_GARBAGE to reduce memory usage
THRESHOLD_TO_CLEAN = 20000000

# Used to get rid of "<tag>LONG STRING...</tag>"
RE_XML_GARBAGE = re.compile(r'>([^<]{100,})<')


class IntermediateValue(object):
    '''
    Intermediate data that serves as output of input processors, i.e., input
    of output processors. "Intermediate" is shorten as "imd" in later naming.

    '''
    def __init__(self, local_name, value, text, context, node=None, start_date=None,
                 end_date=None, instant=None):
        self.local_name = local_name
        self.value = value
        self.text = text
        self.context = context
        self.node = node
        self.start_date = start_date
        self.end_date = end_date
        self.instant = instant

    def __cmp__(self, other):
        if self.value < other.value:
            return -1
        elif self.value > other.value:
            return 1
        return 0

    def __repr__(self):
        context_id = None
        if self.context:
            context_id = self.context.xpath('@id')[0].extract()
        return '(%s, %s, %s)' % (self.local_name, self.value, context_id)

    def is_member(self):
        return is_member(self.context)


class ExtractText(object):

    def __call__(self, value):
        if hasattr(value, 'select'):
            try:
                return value.xpath('./text()')[0].extract()
            except IndexError:
                return ''
        return unicode(value)


class MatchEndDate(object):

    def __init__(self, data_type=str, ignore_date_range=False):
        self.data_type = data_type
        self.ignore_date_range = ignore_date_range

    def __call__(self, value, loader_context):
        if not hasattr(value, 'select'):
            return IntermediateValue('', 0.0, '0', None)

        doc_end_date_str = loader_context['end_date']
        doc_type = loader_context['doc_type']
        selector = loader_context['selector']

        context_id = value.xpath('@contextRef')[0].extract()
        try:
            context = selector.xpath('//*[@id="%s"]' % context_id)[0]
        except IndexError:
            try:
                url = loader_context['response'].url
            except KeyError:
                url = None
            log.msg(u'Cannot find context: %s in %s' % (context_id, url), log.WARNING)
            return None

        date = instant = start_date = end_date = None
        try:
            instant = context.xpath('.//*[local-name()="instant"]/text()')[0].extract().strip()
        except (IndexError, ValueError):
            try:
                end_date_str = context.xpath('.//*[local-name()="endDate"]/text()')[0].extract().strip()
                end_date = datetime.strptime(end_date_str, DATE_FORMAT)

                start_date_str = context.xpath('.//*[local-name()="startDate"]/text()')[0].extract().strip()
                start_date = datetime.strptime(start_date_str, DATE_FORMAT)

                if self.ignore_date_range or date_range_matches_doc_type(doc_type, start_date, end_date):
                    date = end_date
            except (IndexError, ValueError):
                pass
        else:
            try:
                instant = datetime.strptime(instant, DATE_FORMAT)
            except ValueError:
                pass
            else:
                date = instant

        if date:
            doc_end_date = datetime.strptime(doc_end_date_str, DATE_FORMAT)
            delta_days = (doc_end_date - date).days
            if abs(delta_days) < 30:
                try:
                    text = value.xpath('./text()')[0].extract()
                    val = self.data_type(text)
                except (IndexError, ValueError):
                    pass
                else:
                    local_name = value.xpath('local-name()')[0].extract()
                    return IntermediateValue(
                        local_name, val, text, context, value,
                        start_date=start_date, end_date=end_date, instant=instant)

        return None


class ImdSumMembersOr(object):

    def __init__(self, second_func=None):
        self.second_func = second_func

    def __call__(self, imd_values):
        members = []
        non_members = []
        for imd_value in imd_values:
            if imd_value.is_member():
                members.append(imd_value)
            else:
                non_members.append(imd_value)

        if members and len(members) == len(imd_values):
            return imd_sum(members)

        if imd_values:
            return self.second_func(non_members)
        return None


def date_range_matches_doc_type(doc_type, start_date, end_date):
    delta_days = (end_date - start_date).days
    return ((doc_type == '10-Q' and delta_days < 120 and delta_days > 60) or
            (doc_type == '10-K' and delta_days < 380 and delta_days > 350))


def get_amend(values):
    if values:
        return values[0]
    return False


def get_symbol(values):
    if values:
        symbols = map(lambda s: s.strip(), values[0].split(','))
        return '/'.join(symbols)
    return False


def imd_max(imd_values):
    if imd_values:
        imd_value = max(imd_values)
        return imd_value.value
    return None


def imd_min(imd_values):
    if imd_values:
        imd_value = min(imd_values)
        return imd_value.value
    return None


def imd_sum(imd_values):
    return sum([v.value for v in imd_values])


def imd_get_revenues(imd_values):
    interest_elems = filter(lambda v: 'interest' in v.local_name.lower(), imd_values)
    if len(interest_elems) == len(imd_values):
        # HACK: An exceptional case for BBT
        # Revenues = InterestIncome + NoninterestIncome
        return imd_sum(imd_values)

    return imd_max(imd_values)


def imd_get_net_income(imd_values):
    return imd_min(imd_values)


def imd_get_op_income(imd_values):
    imd_values = filter(lambda v: memberness(v.context) < 2, imd_values)
    return imd_min(imd_values)


def imd_get_cash_flow(imd_values, loader_context):
    if len(imd_values) == 1:
        return imd_values[0].value

    doc_type = loader_context['doc_type']

    within_date_range = []
    for imd_value in imd_values:
        if imd_value.start_date and imd_value.end_date:
            if date_range_matches_doc_type(doc_type, imd_value.start_date, imd_value.end_date):
                within_date_range.append(imd_value)

    if within_date_range:
        return imd_max(within_date_range)

    return imd_max(imd_values)


def imd_get_per_share_value(imd_values):
    if not imd_values:
        return None

    v = imd_values[0]
    value = v.value
    if abs(value) > MAX_PER_SHARE_VALUE:
        try:
            decimals = int(v.node.xpath('@decimals')[0].extract())
        except (AttributeError, IndexError, ValueError):
            return None
        else:
            # HACK: some of LTD's reports have unreasonablely large per share value, such as
            # 320000 EPS (and it should be 0.32), so use decimals attribute to scale it down,
            # note that this is NOT a correct way to interpret decimals attribute
            value *= pow(10, decimals - 2)
    return value if abs(value) <= MAX_PER_SHARE_VALUE else None


def imd_get_equity(imd_values):
    if not imd_values:
        return None

    values = filter(lambda v: v.local_name == 'StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest', imd_values)
    if values:
        return values[0].value

    values = filter(lambda v: v.local_name == 'StockholdersEquity', imd_values)
    if values:
        return values[0].value

    return imd_values[0].value


def imd_filter_member(imd_values):
    if imd_values:
        with_memberness = [(v, memberness(v.context)) for v in imd_values]
        with_memberness = sorted(with_memberness, cmp=lambda a, b: a[1] - b[1])

        m0 = with_memberness[0][1]
        non_members = []

        for v in with_memberness:
            if v[1] == m0:
                non_members.append(v[0])

        return non_members

    return imd_values


def imd_mult(imd_values):
    for v in imd_values:
        try:
            node_id = v.node.xpath('@id')[0].extract().lower()
        except (AttributeError, IndexError):
            pass
        else:
            # HACK: some of LUV's reports have unreasonablely small numbers such as
            # 4136 in revenues which should be 4136 millions, this hack uses id attribute
            # to determine if it should be scaled up
            if 'inmillions' in node_id and abs(v.value) < 100000.0:
                v.value *= 1000000.0
            elif 'inthousands' in node_id and abs(v.value) < 100000000.0:
                v.value *= 1000.0
    return imd_values


def memberness(context):
    '''The likelihood that the context is a "member".'''
    if context:
        texts = context.xpath('.//*[local-name()="explicitMember"]/text()').extract()
        text = str(texts).lower()

        if len(texts) > 1:
            return 2
        elif 'country' in text:
            return 2
        elif 'member' not in text:
            return 0
        elif 'successor' in text:
            # 'SuccessorMember' is a rare case that shouldn't be treated as member
            return 1
        elif 'parent' in text:
            return 2
    return 3


def is_member(context):
    if context:
        texts = context.xpath('.//*[local-name()="explicitMember"]/text()').extract()
        text = str(texts).lower()

        # 'SuccessorMember' is a rare case that shouldn't be treated as member
        if 'member' not in text or 'successor' in text or 'parent' in text:
            return False
    return True


def str_to_bool(value):
    if hasattr(value, 'lower'):
        value = value.lower()
        return bool(value) and value != 'false' and value != '0'
    return bool(value)


def find_namespace(xxs, name):
    name_re = name.replace('-', '\-')
    if not name_re.startswith('xmlns'):
        name_re = 'xmlns:' + name_re
    return xxs.re('%s=\"([^\"]+)\"' % name_re)[0]


def register_namespace(xxs, name):
    ns = find_namespace(xxs, name)
    xxs.register_namespace(name, ns)


def register_namespaces(xxs):
    names = ('xmlns', 'xbrli', 'dei', 'us-gaap')
    for name in names:
        try:
            register_namespace(xxs, name)
        except IndexError:
            pass


class XmlXPathItemLoader(ItemLoader):

    def __init__(self, *args, **kwargs):
        super(XmlXPathItemLoader, self).__init__(*args, **kwargs)
        register_namespaces(self.selector)

    def add_xpath(self, field_name, xpath, *processors, **kw):
        values = self._get_values(xpath, **kw)
        self.add_value(field_name, values, *processors, **kw)
        return len(self._values[field_name])

    def add_xpaths(self, name, paths):
        for path in paths:
            match_count = self.add_xpath(name, path)
            if match_count > 0:
                return match_count

        return 0

    def _get_values(self, xpaths, **kw):
        xpaths = arg_to_iter(xpaths)
        return flatten([self.selector.xpath(xpath) for xpath in xpaths])


class ReportItemLoader(XmlXPathItemLoader):

    default_item_class = ReportItem
    default_output_processor = TakeFirst()

    symbol_in = MapCompose(ExtractText(), unicode.upper)
    symbol_out = Compose(get_symbol)

    amend_in = MapCompose(ExtractText(), str_to_bool)
    amend_out = Compose(get_amend)

    period_focus_in = MapCompose(ExtractText(), unicode.upper)
    period_focus_out = TakeFirst()

    revenues_in = MapCompose(MatchEndDate(float))
    revenues_out = Compose(imd_filter_member, imd_mult, ImdSumMembersOr(imd_get_revenues))

    net_income_in = MapCompose(MatchEndDate(float))
    net_income_out = Compose(imd_filter_member, imd_mult, imd_get_net_income)

    op_income_in = MapCompose(MatchEndDate(float))
    op_income_out = Compose(imd_filter_member, imd_mult, imd_get_op_income)

    eps_basic_in = MapCompose(MatchEndDate(float))
    eps_basic_out = Compose(ImdSumMembersOr(imd_get_per_share_value), lambda x: x if x < MAX_PER_SHARE_VALUE else None)

    eps_diluted_in = MapCompose(MatchEndDate(float))
    eps_diluted_out = Compose(ImdSumMembersOr(imd_get_per_share_value), lambda x: x if x < MAX_PER_SHARE_VALUE else None)

    dividend_in = MapCompose(MatchEndDate(float))
    dividend_out = Compose(imd_get_per_share_value, lambda x: x if x < MAX_PER_SHARE_VALUE and x > 0.0 else 0.0)

    assets_in = MapCompose(MatchEndDate(float))
    assets_out = Compose(imd_filter_member, imd_mult, imd_max)

    cur_assets_in = MapCompose(MatchEndDate(float))
    cur_assets_out = Compose(imd_filter_member, imd_mult, imd_max)

    cur_liab_in = MapCompose(MatchEndDate(float))
    cur_liab_out = Compose(imd_filter_member, imd_mult, imd_max)

    equity_in = MapCompose(MatchEndDate(float))
    equity_out = Compose(imd_filter_member, imd_mult, imd_get_equity)

    cash_in = MapCompose(MatchEndDate(float))
    cash_out = Compose(imd_filter_member, imd_mult, imd_max)

    cash_flow_op_in = MapCompose(MatchEndDate(float, True))
    cash_flow_op_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow)

    cash_flow_inv_in = MapCompose(MatchEndDate(float, True))
    cash_flow_inv_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow)

    cash_flow_fin_in = MapCompose(MatchEndDate(float, True))
    cash_flow_fin_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow)

    def __init__(self, *args, **kwargs):
        response = kwargs.get('response')
        if len(response.body) > THRESHOLD_TO_CLEAN:
            # Remove some useless text to reduce memory usage
            body, __ = RE_XML_GARBAGE.subn(lambda m: '><', response.body)
            response = response.replace(body=body)
            kwargs['response'] = response

        super(ReportItemLoader, self).__init__(*args, **kwargs)

        symbol = self._get_symbol()
        end_date = self._get_doc_end_date()
        fiscal_year = self._get_doc_fiscal_year()
        doc_type = self._get_doc_type()

        # ignore document that is not 10-Q or 10-K
        if not (doc_type and doc_type.split('/')[0] in ('10-Q', '10-K')):
            return

        # some documents set their amendment flag in DocumentType, e.g., '10-Q/A',
        # instead of setting it in AmendmentFlag
        amend = None
        if doc_type.endswith('/A'):
            amend = True
            doc_type = doc_type[0:-2]

        self.context.update({
            'end_date': end_date,
            'doc_type': doc_type
        })

        self.add_xpath('symbol', '//dei:TradingSymbol')
        self.add_value('symbol', symbol)

        if amend:
            self.add_value('amend', True)
        else:
            self.add_xpath('amend', '//dei:AmendmentFlag')

        if doc_type == '10-K':
            period_focus = 'FY'
        else:
            period_focus = self._get_period_focus(end_date)

        if not fiscal_year and period_focus:
            fiscal_year = self._guess_fiscal_year(end_date, period_focus)

        self.add_value('period_focus', period_focus)
        self.add_value('fiscal_year', fiscal_year)
        self.add_value('end_date', end_date)
        self.add_value('doc_type', doc_type)

        self.add_xpaths('revenues', [
            '//us-gaap:SalesRevenueNet',
            '//us-gaap:Revenues',
            '//us-gaap:SalesRevenueGoodsNet',
            '//us-gaap:SalesRevenueServicesNet',
            '//us-gaap:RealEstateRevenueNet',
            '//*[local-name()="NetRevenuesIncludingNetInterestIncome"]',
            '//*[contains(local-name(), "TotalRevenues") and contains(local-name(), "After")]',
            '//*[contains(local-name(), "TotalRevenues")]',
            '//*[local-name()="InterestAndDividendIncomeOperating" or local-name()="NoninterestIncome"]',
            '//*[contains(local-name(), "Revenue")]'
        ])
        self.add_xpath('revenues', '//us-gaap:FinancialServicesRevenue')

        self.add_xpaths('net_income', [
            '//*[contains(local-name(), "NetLossIncome") and contains(local-name(), "Corporation")]',
            '//*[local-name()="NetIncomeLossAvailableToCommonStockholdersBasic" or local-name()="NetIncomeLoss"]',
            '//us-gaap:ProfitLoss',
            '//us-gaap:IncomeLossFromContinuingOperations',
            '//*[contains(local-name(), "IncomeLossFromContinuingOperations") and not(contains(local-name(), "Per"))]',
            '//*[contains(local-name(), "NetIncomeLoss")]',
            '//*[starts-with(local-name(), "NetIncomeAttributableTo")]'
        ])

        self.add_xpaths('op_income', [
            '//us-gaap:OperatingIncomeLoss'
        ])

        self.add_xpaths('eps_basic', [
            '//us-gaap:EarningsPerShareBasic',
            '//us-gaap:IncomeLossFromContinuingOperationsPerBasicShare',
            '//us-gaap:IncomeLossFromContinuingOperationsPerBasicAndDilutedShare',
            '//*[contains(local-name(), "NetIncomeLoss") and contains(local-name(), "Per") and contains(local-name(), "Common")]',
            '//*[contains(local-name(), "Earnings") and contains(local-name(), "Per") and contains(local-name(), "Basic")]',
            '//*[local-name()="IncomePerShareFromContinuingOperationsAvailableToCompanyStockholdersBasicAndDiluted"]',
            '//*[contains(local-name(), "NetLossPerShare")]',
            '//*[contains(local-name(), "NetIncome") and contains(local-name(), "Per") and contains(local-name(), "Basic")]',
            '//*[local-name()="BasicEarningsAttributableToStockholdersPerCommonShare"]',
            '//*[local-name()="Earningspersharebasicanddiluted"]',
            '//*[contains(local-name(), "PerCommonShareBasicAndDiluted")]',
            '//*[local-name()="NetIncomeLossAttributableToCommonStockholdersBasicAndDiluted"]',
            '//us-gaap:NetIncomeLossAvailableToCommonStockholdersBasic',
            '//*[local-name()="NetIncomeLossEPS"]',
            '//*[local-name()="NetLoss"]'
        ])

        self.add_xpaths('eps_diluted', [
            '//us-gaap:EarningsPerShareDiluted',
            '//us-gaap:IncomeLossFromContinuingOperationsPerDilutedShare',
            '//us-gaap:IncomeLossFromContinuingOperationsPerBasicAndDilutedShare',
            '//*[contains(local-name(), "Earnings") and contains(local-name(), "Per") and contains(local-name(), "Diluted")]',
            '//*[local-name()="IncomePerShareFromContinuingOperationsAvailableToCompanyStockholdersBasicAndDiluted"]',
            '//*[contains(local-name(), "NetLossPerShare")]',
            '//*[contains(local-name(), "NetIncome") and contains(local-name(), "Per") and contains(local-name(), "Diluted")]',
            '//*[local-name()="DilutedEarningsAttributableToStockholdersPerCommonShare"]',
            '//us-gaap:NetIncomeLossAvailableToCommonStockholdersDiluted',
            '//*[contains(local-name(), "PerCommonShareBasicAndDiluted")]',
            '//*[local-name()="NetIncomeLossAttributableToCommonStockholdersBasicAndDiluted"]',
            '//us-gaap:EarningsPerShareBasic',
            '//*[local-name()="NetIncomeLossEPS"]',
            '//*[local-name()="NetLoss"]'
        ])

        self.add_xpaths('dividend', [
            '//us-gaap:CommonStockDividendsPerShareDeclared',
            '//us-gaap:CommonStockDividendsPerShareCashPaid'
        ])

        # if dividend isn't found in doc, assume it's 0
        self.add_value('dividend', 0.0)

        self.add_xpaths('assets', [
            '//us-gaap:Assets',
            '//us-gaap:AssetsNet',
            '//us-gaap:LiabilitiesAndStockholdersEquity'
        ])

        self.add_xpaths('cur_assets', [
            '//us-gaap:AssetsCurrent'
        ])

        self.add_xpaths('cur_liab', [
            '//us-gaap:LiabilitiesCurrent'
        ])

        self.add_xpaths('equity', [
            '//*[local-name()="StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest" or local-name()="StockholdersEquity"]',
            '//*[local-name()="TotalCommonShareholdersEquity"]',
            '//*[local-name()="CommonShareholdersEquity"]',
            '//*[local-name()="CommonStockEquity"]',
            '//*[local-name()="TotalEquity"]',
            '//us-gaap:RetainedEarningsAccumulatedDeficit',
            '//*[contains(local-name(), "MembersEquityIncludingPortionAttributableToNoncontrollingInterest")]',
            '//us-gaap:CapitalizationLongtermDebtAndEquity',
            '//*[local-name()="TotalCapitalization"]'
        ])

        self.add_xpaths('cash', [
            '//us-gaap:CashCashEquivalentsAndFederalFundsSold',
            '//us-gaap:CashAndDueFromBanks',
            '//us-gaap:CashAndCashEquivalentsAtCarryingValue',
            '//us-gaap:Cash',
            '//*[local-name()="CashAndCashEquivalents"]',
            '//*[contains(local-name(), "CarryingValueOfCashAndCashEquivalents")]',
            '//*[contains(local-name(), "CashCashEquivalents")]',
            '//*[contains(local-name(), "CashAndCashEquivalents")]'
        ])

        self.add_xpaths('cash_flow_op', [
            '//us-gaap:NetCashProvidedByUsedInOperatingActivities',
            '//us-gaap:NetCashProvidedByUsedInOperatingActivitiesContinuingOperations'
        ])

        self.add_xpaths('cash_flow_inv', [
            '//us-gaap:NetCashProvidedByUsedInInvestingActivities',
            '//us-gaap:NetCashProvidedByUsedInInvestingActivitiesContinuingOperations'
        ])

        self.add_xpaths('cash_flow_fin', [
            '//us-gaap:NetCashProvidedByUsedInFinancingActivities',
            '//us-gaap:NetCashProvidedByUsedInFinancingActivitiesContinuingOperations'
        ])

    def _get_symbol(self):
        try:
            filename = self.context['response'].url.split('/')[-1]
            return filename.split('-')[0].upper()
        except IndexError:
            return None

    def _get_doc_fiscal_year(self):
        try:
            fiscal_year = self.selector.xpath('//dei:DocumentFiscalYearFocus/text()')[0].extract()
            return int(fiscal_year)
        except (IndexError, ValueError):
            return None

    def _guess_fiscal_year(self, end_date, period_focus):
        # Guess fiscal_year based on document end_date and period_focus
        date = datetime.strptime(end_date, DATE_FORMAT)
        month_ranges = {
            'Q1': (2, 3, 4),
            'Q2': (5, 6, 7),
            'Q3': (8, 9, 10),
            'FY': (11, 12, 1)
        }
        month_range = month_ranges.get(period_focus)

        # Case 1: release Q1 around March, Q2 around June, ...
        # This is what most companies do
        if date.month in month_range:
            if period_focus == 'FY' and date.month == 1:
                return date.year - 1
            return date.year

        # How many days left before 10-K's release?
        days_left_table = {
            'Q1': 270,
            'Q2': 180,
            'Q3': 90,
            'FY': 0
        }
        days_left = days_left_table.get(period_focus)

        # Other cases, assume end_date.year of its FY report equals to
        # its fiscal_year
        if days_left is not None:
            fy_date = date + timedelta(days=days_left)
            return fy_date.year

        return None

    def _get_doc_end_date(self):
        # the document end date could come from URL or document content
        # we need to guess which one is correct
        url_date_str = self.context['response'].url.split('-')[-1].split('.')[0]
        url_date = datetime.strptime(url_date_str, '%Y%m%d')
        url_date_str = url_date.strftime(DATE_FORMAT)

        try:
            doc_date_str = self.selector.xpath('//dei:DocumentPeriodEndDate/text()')[0].extract()
            doc_date = datetime.strptime(doc_date_str, DATE_FORMAT)
        except (IndexError, ValueError):
            return url_date.strftime(DATE_FORMAT)

        context_date_strs = set(self.selector.xpath('//*[local-name()="context"]//*[local-name()="endDate"]/text()').extract())

        date = url_date
        if doc_date_str in context_date_strs:
            date = doc_date

        return date.strftime(DATE_FORMAT)

    def _get_doc_type(self):
        try:
            return self.selector.xpath('//dei:DocumentType/text()')[0].extract().upper()
        except (IndexError, ValueError):
            return None

    def _get_period_focus(self, doc_end_date):
        try:
            return self.selector.xpath('//dei:DocumentFiscalPeriodFocus/text()')[0].extract().strip().upper()
        except IndexError:
            pass

        try:
            doc_yr = doc_end_date.split('-')[0]
            yr_end_date = self.selector.xpath('//dei:CurrentFiscalYearEndDate/text()')[0].extract()
            yr_end_date = yr_end_date.replace('--', doc_yr + '-')
        except IndexError:
            return None

        doc_end_date = datetime.strptime(doc_end_date, '%Y-%m-%d')
        yr_end_date = datetime.strptime(yr_end_date, '%Y-%m-%d')
        delta_days = (yr_end_date - doc_end_date).days

        if delta_days > -45 and delta_days < 45:
            return 'FY'
        elif (delta_days <= -45 and delta_days > -135) or delta_days > 225:
            return 'Q1'
        elif (delta_days <= -135 and delta_days > -225) or (delta_days > 135 and delta_days <= 225):
            return 'Q2'
        elif delta_days <= -225 or (delta_days > 45 and delta_days <= 135):
            return 'Q3'

        return 'FY'


================================================
FILE: pystock_crawler/settings.py
================================================
# Scrapy settings for pystock-crawler project
#
# For simplicity, this file contains only the most important settings by
# default. All the other settings are documented here:
#
#     http://doc.scrapy.org/en/latest/topics/settings.html
#

BOT_NAME = 'pystock-crawler'

EXPORT_FIELDS = (
    # Price columns
    'symbol', 'date', 'open', 'high', 'low', 'close', 'volume', 'adj_close',

    # Report columns
    'end_date', 'amend', 'period_focus', 'fiscal_year', 'doc_type', 'revenues', 'op_income', 'net_income',
    'eps_basic', 'eps_diluted', 'dividend', 'assets', 'cur_assets', 'cur_liab', 'cash', 'equity',
    'cash_flow_op', 'cash_flow_inv', 'cash_flow_fin',
)

FEED_EXPORTERS = {
    'csv': 'pystock_crawler.exporters.CsvItemExporter2',
    'symbollist': 'pystock_crawler.exporters.SymbolListExporter'
}

HTTPCACHE_ENABLED = True

HTTPCACHE_POLICY = 'scrapy.contrib.httpcache.RFC2616Policy'

HTTPCACHE_STORAGE = 'scrapy.contrib.httpcache.LeveldbCacheStorage'

LOG_LEVEL = 'INFO'

NEWSPIDER_MODULE = 'pystock_crawler.spiders'

SPIDER_MODULES = ['pystock_crawler.spiders']

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'pystock-crawler (+http://www.yourdomain.com)'

CONCURRENT_REQUESTS_PER_DOMAIN = 8

COOKIES_ENABLED = False

#AUTOTHROTTLE_ENABLED = True

RETRY_TIMES = 4

EXTENSIONS = {
    'scrapy.contrib.throttle.AutoThrottle': None,
    'pystock_crawler.throttle.PassiveThrottle': 0
}

PASSIVETHROTTLE_ENABLED = True
#PASSIVETHROTTLE_DEBUG = True

DEPTH_STATS_VERBOSE = True


================================================
FILE: pystock_crawler/spiders/__init__.py
================================================
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.


================================================
FILE: pystock_crawler/spiders/edgar.py
================================================
import os

from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule

from pystock_crawler import utils
from pystock_crawler.loaders import ReportItemLoader


class URLGenerator(object):

    def __init__(self, symbols, start_date='', end_date='', start=0, count=None):
        end = start + count if count is not None else None
        self.symbols = symbols[start:end]
        self.start_date = start_date
        self.end_date = end_date

    def __iter__(self):
        url = 'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=%s&type=10-&dateb=%s&datea=%s&owner=exclude&count=300'
        for symbol in self.symbols:
            yield (url % (symbol, self.end_date, self.start_date))


class EdgarSpider(CrawlSpider):

    name = 'edgar'
    allowed_domains = ['sec.gov']

    rules = (
        Rule(SgmlLinkExtractor(allow=('/Archives/edgar/data/[^\"]+\-index\.htm',))),
        Rule(SgmlLinkExtractor(allow=('/Archives/edgar/data/[^\"]+/[A-Za-z]+\-\d{8}\.xml',)), callback='parse_10qk'),
    )

    def __init__(self, **kwargs):
        super(EdgarSpider, self).__init__(**kwargs)

        symbols_arg = kwargs.get('symbols')
        start_date = kwargs.get('startdate', '')
        end_date = kwargs.get('enddate', '')
        limit_arg = kwargs.get('limit', '')

        utils.check_date_arg(start_date, 'startdate')
        utils.check_date_arg(end_date, 'enddate')
        start, count = utils.parse_limit_arg(limit_arg)

        if symbols_arg:
            if os.path.exists(symbols_arg):
                # get symbols from a text file
                symbols = utils.load_symbols(symbols_arg)
            else:
                # inline symbols in command
                symbols = symbols_arg.split(',')
            self.start_urls = URLGenerator(symbols, start_date, end_date, start, count)
        else:
            self.start_urls = []

    def parse_10qk(self, response):
        '''Parse 10-Q or 10-K XML report.'''
        loader = ReportItemLoader(response=response)
        item = loader.load_item()

        if 'doc_type' in item:
            doc_type = item['doc_type']
            if doc_type in ('10-Q', '10-K'):
                return item

        return None


================================================
FILE: pystock_crawler/spiders/nasdaq.py
================================================
import cStringIO
import re

from scrapy.spider import Spider

from pystock_crawler.items import SymbolItem


RE_SYMBOL = re.compile(r'^[A-Z]+$')


def generate_urls(exchanges):
    for exchange in exchanges:
        yield 'http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=%s&render=download' % exchange


class NasdaqSpider(Spider):

    name = 'nasdaq'
    allowed_domains = ['www.nasdaq.com']

    def __init__(self, **kwargs):
        super(NasdaqSpider, self).__init__(**kwargs)

        exchanges = kwargs.get('exchanges', '').split(',')
        self.start_urls = generate_urls(exchanges)

    def parse(self, response):
        try:
            file_like = cStringIO.StringIO(response.body)

            # Ignore first row
            file_like.next()

            for line in file_like:
                tokens = line.split(',')
                symbol = tokens[0].strip('"')
                if RE_SYMBOL.match(symbol):
                    name = tokens[1].strip('"')
                    yield SymbolItem(symbol=symbol, name=name)
        finally:
            file_like.close()


================================================
FILE: pystock_crawler/spiders/yahoo.py
================================================
import cStringIO
import os
import re

from datetime import datetime
from scrapy.spider import Spider

from pystock_crawler import utils
from pystock_crawler.items import PriceItem


def parse_date(date_str):
    if date_str:
        date = datetime.strptime(date_str, '%Y%m%d')
        return date.year, date.month - 1, date.day
    return '', '', ''


def make_url(symbol, start_date=None, end_date=None):
    url = ('http://ichart.finance.yahoo.com/table.csv?'
           's=%(symbol)s&d=%(end_month)s&e=%(end_day)s&f=%(end_year)s&g=d&'
           'a=%(start_month)s&b=%(start_day)s&c=%(start_year)s&ignore=.csv')

    start_date = parse_date(start_date)
    end_date = parse_date(end_date)

    return url % {
        'symbol': symbol,
        'start_year': start_date[0],
        'start_month': start_date[1],
        'start_day': start_date[2],
        'end_year': end_date[0],
        'end_month': end_date[1],
        'end_day': end_date[2]
    }


def generate_urls(symbols, start_date=None, end_date=None):
    for symbol in symbols:
        yield make_url(symbol, start_date, end_date)


class YahooSpider(Spider):

    name = 'yahoo'
    allowed_domains = ['finance.yahoo.com']

    def __init__(self, **kwargs):
        super(YahooSpider, self).__init__(**kwargs)

        symbols_arg = kwargs.get('symbols')
        start_date = kwargs.get('startdate', '')
        end_date = kwargs.get('enddate', '')

        utils.check_date_arg(start_date, 'startdate')
        utils.check_date_arg(end_date, 'enddate')

        if symbols_arg:
            if os.path.exists(symbols_arg):
                # get symbols from a text file
                symbols = utils.load_symbols(symbols_arg)
            else:
                # inline symbols in command
                symbols = symbols_arg.split(',')
            self.start_urls = generate_urls(symbols, start_date, end_date)
        else:
            self.start_urls = []

    def parse(self, response):
        symbol = self._get_symbol_from_url(response.url)
        try:
            file_like = cStringIO.StringIO(response.body)
            rows = utils.parse_csv(file_like)
            for row in rows:
                item = PriceItem(symbol=symbol)
                for k, v in row.iteritems():
                    item[k.replace(' ', '_').lower()] = v
                yield item
        finally:
            file_like.close()

    def _get_symbol_from_url(self, url):
        match = re.search(r'[\?&]s=([^&]*)', url)
        if match:
            return match.group(1)
        return ''


================================================
FILE: pystock_crawler/tests/__init__.py
================================================


================================================
FILE: pystock_crawler/tests/base.py
================================================
import os
import unittest


# Stores temporary test data
SAMPLE_DATA_DIR = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'sample_data')


class TestCaseBase(unittest.TestCase):
    '''
    Provides utility functions for test cases.

    '''
    def assert_none_or_almost_equal(self, value, expected_value):
        if expected_value is None:
            self.assertIsNone(value)
        else:
            self.assertAlmostEqual(value, expected_value)

    def assert_item(self, item, expected):
        self.assertEqual(item.get('symbol'), expected.get('symbol'))
        self.assertEqual(item.get('name'), expected.get('name'))
        self.assertEqual(item.get('amend'), expected.get('amend'))
        self.assertEqual(item.get('doc_type'), expected.get('doc_type'))
        self.assertEqual(item.get('period_focus'), expected.get('period_focus'))
        self.assertEqual(item.get('fiscal_year'), expected.get('fiscal_year'))
        self.assertEqual(item.get('end_date'), expected.get('end_date'))
        self.assert_none_or_almost_equal(item.get('revenues'), expected.get('revenues'))
        self.assert_none_or_almost_equal(item.get('net_income'), expected.get('net_income'))
        self.assert_none_or_almost_equal(item.get('eps_basic'), expected.get('eps_basic'))
        self.assert_none_or_almost_equal(item.get('eps_diluted'), expected.get('eps_diluted'))
        self.assertAlmostEqual(item.get('dividend'), expected.get('dividend'))
        self.assert_none_or_almost_equal(item.get('assets'), expected.get('assets'))
        self.assert_none_or_almost_equal(item.get('equity'), expected.get('equity'))
        self.assert_none_or_almost_equal(item.get('cash'), expected.get('cash'))
        self.assert_none_or_almost_equal(item.get('op_income'), expected.get('op_income'))
        self.assert_none_or_almost_equal(item.get('cur_assets'), expected.get('cur_assets'))
        self.assert_none_or_almost_equal(item.get('cur_liab'), expected.get('cur_liab'))
        self.assert_none_or_almost_equal(item.get('cash_flow_op'), expected.get('cash_flow_op'))
        self.assert_none_or_almost_equal(item.get('cash_flow_inv'), expected.get('cash_flow_inv'))
        self.assert_none_or_almost_equal(item.get('cash_flow_fin'), expected.get('cash_flow_fin'))


def _create_sample_data_dir():
    if not os.path.exists(SAMPLE_DATA_DIR):
        try:
            os.makedirs(SAMPLE_DATA_DIR)
        except OSError:
            pass

    assert os.path.exists(SAMPLE_DATA_DIR)

_create_sample_data_dir()


================================================
FILE: pystock_crawler/tests/test_cmdline.py
================================================
import os
import shutil
import unittest

import pystock_crawler

from envoy import run


TEST_DIR = './test_data'


# Scrapy runs on another process where working directory may be different with
# the process running the test. So we have to explicitly set PYTHONPATH to
# the absolute path of the current working directory for Scrapy process to be
# able to locate pystock_crawler module.
os.environ['PYTHONPATH'] = os.getcwd()


class PrintTest(unittest.TestCase):

    def test_no_args(self):
        r = run('./bin/pystock-crawler')
        self.assertIn('Usage:', r.std_err)

    def test_print_help(self):
        r = run('./bin/pystock-crawler -h')
        self.assertIn('Usage:', r.std_out)

        r2 = run('./bin/pystock-crawler --help')
        self.assertEqual(r.std_out, r2.std_out)

    def test_print_version(self):
        r = run('./bin/pystock-crawler -v')
        self.assertEqual(r.std_out, 'pystock-crawler %s\n' % pystock_crawler.__version__)

        r2 = run('./bin/pystock-crawler --version')
        self.assertEqual(r.std_out, r2.std_out)


class CrawlTest(unittest.TestCase):
    '''Base class for crawl test cases.'''
    def setUp(self):
        if os.path.isdir(TEST_DIR):
            shutil.rmtree(TEST_DIR)
        os.mkdir(TEST_DIR)

        self.args = {
            'output': os.path.join(TEST_DIR, '%s.out' % self.filename),
            'log_file': os.path.join(TEST_DIR, '%s.log' % self.filename),
            'working_dir': TEST_DIR
        }

    def tearDown(self):
        shutil.rmtree(TEST_DIR)

    def assert_cache(self):
        # Check if cache is there
        cache_dir = os.path.join(TEST_DIR, '.scrapy', 'httpcache', '%s.leveldb' % self.spider)
        self.assertTrue(os.path.isdir(cache_dir))

    def assert_log(self):
        # Check if log file is there
        log_path = self.args['log_file']
        self.assertTrue(os.path.isfile(log_path))

    def get_output_content(self):
        output_path = self.args['output']
        self.assertTrue(os.path.isfile(output_path))

        with open(output_path) as f:
            content = f.read()
        return content


class CrawlSymbolsTest(CrawlTest):

    filename = 'symbols'
    spider = 'nasdaq'

    def assert_nyse_output(self):
        # Check if some common NYSE symbols are in output
        content = self.get_output_content()
        self.assertIn('JPM', content)
        self.assertIn('KO', content)
        self.assertIn('WMT', content)

        # NASDAQ symbols shouldn't be
        self.assertNotIn('AAPL', content)
        self.assertNotIn('GOOG', content)
        self.assertNotIn('YHOO', content)

    def assert_nyse_and_nasdaq_output(self):
        # Check if some common NYSE symbols are in output
        content = self.get_output_content()
        self.assertIn('JPM', content)
        self.assertIn('KO', content)
        self.assertIn('WMT', content)

        # Check if some common NASDAQ symbols are in output
        self.assertIn('AAPL', content)
        self.assertIn('GOOG', content)
        self.assertIn('YHOO', content)

    def test_crawl_nyse(self):
        r = run('./bin/pystock-crawler symbols NYSE -o %(output)s -l %(log_file)s -w %(working_dir)s' % self.args)
        self.assertEqual(r.status_code, 0)
        self.assert_nyse_output()
        self.assert_log()
        self.assert_cache()

    def test_crawl_nyse_and_nasdaq(self):
        r = run('./bin/pystock-crawler symbols NYSE,NASDAQ -o %(output)s -l %(log_file)s -w %(working_dir)s --sort' % self.args)
        self.assertEqual(r.status_code, 0)
        self.assert_nyse_and_nasdaq_output()
        self.assert_log()
        self.assert_cache()


class CrawlPricesTest(CrawlTest):

    filename = 'prices'
    spider = 'yahoo'

    def test_crawl_inline_symbols(self):
        r = run('./bin/pystock-crawler prices GOOG,IBM -o %(output)s -l %(log_file)s -w %(working_dir)s' % self.args)
        self.assertEqual(r.status_code, 0)

        content = self.get_output_content()
        self.assertIn('GOOG', content)
        self.assertIn('IBM', content)
        self.assert_log()
        self.assert_cache()

    def test_crawl_symbol_file(self):
        # Create a sample symbol file
        symbol_file = os.path.join(TEST_DIR, 'symbols.txt')
        with open(symbol_file, 'w') as f:
            f.write('WMT\nJPM')
        self.args['symbol_file'] = symbol_file

        r = run('./bin/pystock-crawler prices %(symbol_file)s -o %(output)s -l %(log_file)s -w %(working_dir)s --sort' % self.args)
        self.assertEqual(r.status_code, 0)

        content = self.get_output_content()
        self.assertIn('WMT', content)
        self.assertIn('JPM', content)
        self.assert_log()
        self.assert_cache()


class CrawlReportsTest(CrawlTest):

    filename = 'reports'
    spider = 'edgar'

    def test_crawl_inline_symbols(self):
        r = run('./bin/pystock-crawler reports KO,MCD -o %(output)s -l %(log_file)s -w %(working_dir)s '
                '-s 20130401 -e 20130531' % self.args)
        self.assertEqual(r.status_code, 0)

        content = self.get_output_content()
        self.assertIn('KO', content)
        self.assertIn('MCD', content)
        self.assert_log()
        self.assert_cache()

    def test_crawl_symbol_file(self):
        # Create a sample symbol file
        symbol_file = os.path.join(TEST_DIR, 'symbols.txt')
        with open(symbol_file, 'w') as f:
            f.write('KO\nMCD')
        self.args['symbol_file'] = symbol_file

        r = run('./bin/pystock-crawler reports %(symbol_file)s -o %(output)s -l %(log_file)s -w %(working_dir)s '
                '-s 20130401 -e 20130531 --sort' % self.args)
        self.assertEqual(r.status_code, 0)

        content = self.get_output_content()
        self.assertIn('KO', content)
        self.assertIn('MCD', content)
        self.assert_log()
        self.assert_cache()

        # Check CSV header
        expected_header = [
            'symbol', 'end_date', 'amend', 'period_focus', 'fiscal_year', 'doc_type',
            'revenues', 'op_income', 'net_income', 'eps_basic', 'eps_diluted', 'dividend',
            'assets', 'cur_assets', 'cur_liab', 'cash', 'equity', 'cash_flow_op',
            'cash_flow_inv', 'cash_flow_fin'
        ]
        head_line = content.split('\n')[0].rstrip()
        self.assertEqual(head_line.split(','), expected_header)

    def test_merge_empty_results(self):
        # Ridiculous date range (1800/1/1) -> empty result
        r = run('./bin/pystock-crawler reports KO,MCD -o %(output)s -l %(log_file)s -w %(working_dir)s '
                '-s 18000101 -e 18000101 -b 1' % self.args)
        self.assertEqual(r.status_code, 0)

        content = self.get_output_content()
        self.assertFalse(content)

        # Make sure subfiles are deleted
        filename = self.args['output']
        self.assertFalse(os.path.exists(os.path.join('%s.1' % filename)))
        self.assertFalse(os.path.exists(os.path.join('%s.2' % filename)))


================================================
FILE: pystock_crawler/tests/test_loaders.py
================================================
import os
import requests
import urlparse

from scrapy.http.response.xml import XmlResponse

from pystock_crawler.loaders import ReportItemLoader
from pystock_crawler.tests.base import SAMPLE_DATA_DIR, TestCaseBase


def create_response(file_path):
    with open(file_path) as f:
        body = f.read()
    return XmlResponse('file://%s' % file_path.replace('\\', '/'), body=body)


def download(url, local_path):
    if not os.path.exists(local_path):
        dir_path = os.path.dirname(local_path)
        if not os.path.exists(dir_path):
            try:
                os.makedirs(dir_path)
            except OSError:
                pass

        assert os.path.exists(dir_path)

        with open(local_path, 'wb') as f:
            r = requests.get(url, stream=True)
            for chunk in r.iter_content(chunk_size=4096):
                f.write(chunk)


def parse_xml(url):
    url_path = urlparse.urlparse(url).path
    local_path = os.path.join(SAMPLE_DATA_DIR, url_path[1:])
    download(url, local_path)
    response = create_response(local_path)
    loader = ReportItemLoader(response=response)
    return loader.load_item()


class ReportItemLoaderTest(TestCaseBase):

    def test_a_20110131(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1090872/000110465911013291/a-20110131.xml')
        self.assert_item(item, {
            'symbol': 'A',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2011-01-31',
            'revenues': 1519000000,
            'op_income': 211000000,
            'net_income': 193000000,
            'eps_basic': 0.56,
            'eps_diluted': 0.54,
            'dividend': 0.0,
            'assets': 8044000000,
            'cur_assets': 4598000000,
            'cur_liab': 1406000000,
            'equity': 3339000000,
            'cash': 2638000000,
            'cash_flow_op': 120000000,
            'cash_flow_inv': 1500000000,
            'cash_flow_fin': -1634000000
        })

    def test_aa_20120630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4281/000119312512317135/aa-20120630.xml')
        self.assert_item(item, {
            'symbol': 'AA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2012,
            'end_date': '2012-06-30',
            'revenues': 5963000000,
            'op_income': None,  # Missing value
            'net_income': -2000000,
            'eps_basic': None,  # EPS is 0 actually, but got no data in XML
            'eps_diluted': None,
            'dividend': 0.03,
            'assets': 39498000000,
            'cur_assets': 7767000000,
            'cur_liab': 6151000000,
            'equity': 16914000000,
            'cash': 1712000000,
            'cash_flow_op': 301000000,
            'cash_flow_inv': -704000000,
            'cash_flow_fin': 196000000
        })

    def test_aapl_20100626(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312510162840/aapl-20100626.xml')
        self.assert_item(item, {
            'symbol': 'AAPL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-06-26',
            'revenues': 15700000000,
            'op_income': 4234000000,
            'net_income': 3253000000,
            'eps_basic': 3.57,
            'eps_diluted': 3.51,
            'dividend': 0.0,
            'assets': 64725000000,
            'cur_assets': 36033000000,
            'cur_liab': 15612000000,
            'equity': 43111000000,
            'cash': 9705000000,
            'cash_flow_op': 12912000000,
            'cash_flow_inv': -9471000000,
            'cash_flow_fin': 1001000000
        })

    def test_aapl_20110326(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312511104388/aapl-20110326.xml')
        self.assert_item(item, {
            'symbol': 'AAPL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-03-26',
            'revenues': 24667000000,
            'net_income': 5987000000,
            'op_income': 7874000000,
            'eps_basic': 6.49,
            'eps_diluted': 6.40,
            'dividend': 0.0,
            'assets': 94904000000,
            'cur_assets': 46997000000,
            'cur_liab': 24327000000,
            'equity': 61477000000,
            'cash': 15978000000,
            'cash_flow_op': 15992000000,
            'cash_flow_inv': -12251000000,
            'cash_flow_fin': 976000000
        })

    def test_aapl_20120929(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312512444068/aapl-20120929.xml')
        self.assert_item(item, {
            'symbol': 'AAPL',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-09-29',
            'revenues': 156508000000,
            'op_income': 55241000000,
            'net_income': 41733000000,
            'eps_basic': 44.64,
            'eps_diluted': 44.15,
            'dividend': 2.65,
            'assets': 176064000000,
            'cur_assets': 57653000000,
            'cur_liab': 38542000000,
            'equity': 118210000000,
            'cash': 10746000000,
            'cash_flow_op': 50856000000,
            'cash_flow_inv': -48227000000,
            'cash_flow_fin': -1698000000
        })

    def test_aes_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/874761/000119312510111183/aes-20100331.xml')
        self.assert_item(item, {
            'symbol': 'AES',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 4112000000,
            'op_income': None,  # Missing value
            'net_income': 187000000,
            'eps_basic': 0.27,
            'eps_diluted': 0.27,
            'dividend': 0.0,
            'assets': 41882000000,
            'cur_assets': 10460000000,
            'cur_liab': 6894000000,
            'equity': 10536000000,
            'cash': 3392000000,
            'cash_flow_op': 684000000,
            'cash_flow_inv': -595000000,
            'cash_flow_fin': 1515000000
        })

    def test_adbe_20060914(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/796343/000110465906066129/adbe-20060914.xml')

        # Old document is not supported
        self.assertFalse(item)

    def test_adbe_20090227(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/796343/000079634309000021/adbe-20090227.xml')
        self.assert_item(item, {
            'symbol': 'ADBE',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2009,
            'end_date': '2009-02-27',
            'revenues': 786390000,
            'op_income': 207916000,
            'net_income': 156435000,
            'eps_basic': 0.3,
            'eps_diluted': 0.3,
            'dividend': 0.0,
            'assets': 5887596000,
            'cur_assets': 2868991000,
            'cur_liab': 636865000,
            'equity': 4611160000,
            'cash': 1148925000,
            'cash_flow_op': 365743000,
            'cash_flow_inv': -131562000,
            'cash_flow_fin': 28675000
        })

    def test_agn_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/850693/000119312511050632/agn-20101231.xml')
        self.assert_item(item, {
            'symbol': 'AGN',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 4919400000,
            'op_income': 258600000,
            'net_income': 600000,
            'eps_basic': 0.0,
            'eps_diluted': 0.0,
            'dividend': 0.2,
            'assets': 8308100000,
            'cur_assets': 3993700000,
            'cur_liab': 1528400000,
            'equity': 4781100000,
            'cash': 1991200000,
            'cash_flow_op': 463900000,
            'cash_flow_inv': -977200000,
            'cash_flow_fin': 563000000
        })

    def test_aig_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/5272/000104746913008075/aig-20130630.xml')
        self.assert_item(item, {
            'symbol': 'AIG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 17315000000,
            'net_income': 2731000000,
            'op_income': None,
            'eps_basic': 1.85,
            'eps_diluted': 1.84,
            'dividend': 0.0,
            'assets': 537438000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 98155000000,
            'cash': 1762000000,
            'cash_flow_op': 1674000000,
            'cash_flow_inv': 6071000000,
            'cash_flow_fin': -7055000000
        })

    def test_aiv_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/922864/000095012311070591/aiv-20110630.xml')
        self.assert_item(item, {
            'symbol': 'AIV',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 281035000,
            'op_income': 49791000,
            'net_income': -33177000,
            'eps_basic': -0.28,
            'eps_diluted': -0.28,
            'dividend': 0.12,
            'assets': 7164972000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 1241336000,
            'cash': 85324000,
            'cash_flow_op': 95208000,
            'cash_flow_inv': -33538000,
            'cash_flow_fin': -87671000
        })

    def test_all_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/899051/000110465913035969/all-20130331.xml')
        self.assert_item(item, {
            'symbol': 'ALL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 8463000000,
            'op_income': None,
            'net_income': 709000000,
            'eps_basic': 1.49,
            'eps_diluted': 1.47,
            'dividend': 0.25,
            'assets': 126612000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 20619000000,
            'cash': 820000000,
            'cash_flow_op': 740000000,
            'cash_flow_inv': 136000000,
            'cash_flow_fin': -862000000
        })

    def test_apa_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/6769/000119312512457830/apa-20120930.xml')
        self.assert_item(item, {
            'symbol': 'APA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 4179000000,
            'op_income': None,
            'net_income': 161000000,
            'eps_basic': 0.41,
            'eps_diluted': 0.41,
            'dividend': 0.17,
            'assets': 58810000000,
            'cur_assets': 5044000000,
            'cur_liab': 5390000000,
            'equity': 30714000000,
            'cash': 318000000,
            'cash_flow_op': 6422000000,
            'cash_flow_inv': -10560000000,
            'cash_flow_fin': 4161000000
        })

    def test_axp_20100930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000095012310100214/axp-20100930.xml')
        self.assert_item(item, {
            'symbol': 'AXP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-09-30',
            'revenues': 6660000000,
            'op_income': 1640000000,
            'net_income': 1093000000,
            'eps_basic': 0.91,
            'eps_diluted': 0.9,
            'dividend': 0.18,
            'assets': 146056000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 15920000000,
            'cash': 21341000000,
            'cash_flow_op': 7227000000,
            'cash_flow_inv': 5298000000,
            'cash_flow_fin': -7885000000
        })

    def test_axp_20120630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312512332179/axp-20120630.xml')
        self.assert_item(item, {
            'symbol': 'AXP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2012,
            'end_date': '2012-06-30',
            'revenues': 7504000000,
            'op_income': None,
            'net_income': 1339000000,
            'eps_basic': 1.16,
            'eps_diluted': 1.15,
            'dividend': 0.2,
            'assets': 148128000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 19267000000,
            'cash': 22072000000,
            'cash_flow_op': 6742000000,
            'cash_flow_inv': -1771000000,
            'cash_flow_fin': -7786000000
        })

    def test_axp_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312513070554/axp-20121231.xml')
        self.assert_item(item, {
            'symbol': 'AXP',
            'amend': True,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 29592000000,
            'op_income': None,
            'net_income': 4482000000,
            'eps_basic': 3.91,
            'eps_diluted': 3.89,
            'dividend': 0.8,
            'assets': 153140000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 18886000000,
            'cash': 22250000000,
            'cash_flow_op': 7082000000,
            'cash_flow_inv': -6545000000,
            'cash_flow_fin': -3268000000
        })

    def test_axp_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312513180601/axp-20130331.xml')
        self.assert_item(item, {
            'symbol': 'AXP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 7384000000,
            'op_income': None,
            'net_income': 1280000000,
            'eps_basic': 1.15,
            'eps_diluted': 1.15,
            'dividend': 0.2,
            'assets': 156855000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 19290000000,
            'cash': 27964000000,
            'cash_flow_op': 7547000000,
            'cash_flow_inv': 32000000,
            'cash_flow_fin': -1830000000
        })

    def test_ba_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000119312510024406/ba-20091231.xml')
        self.assert_item(item, {
            'symbol': 'BA',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 68281000000,
            'op_income': 2096000000,
            'net_income': 1312000000,
            'eps_basic': 1.86,
            'eps_diluted': 1.84,
            'dividend': 1.68,
            'assets': 62053000000,
            'cur_assets': 35275000000,
            'cur_liab': 32883000000,
            'equity': 2225000000,
            'cash': 9215000000,
            'cash_flow_op': 5603000000,
            'cash_flow_inv': -3794000000,
            'cash_flow_fin': 4094000000
        })

    def test_ba_20110930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000119312511281613/ba-20110930.xml')
        self.assert_item(item, {
            'symbol': 'BA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 17727000000,
            'op_income': 1714000000,
            'net_income': 1098000000,
            'eps_basic': 1.47,
            'eps_diluted': 1.46,
            'dividend': 0.42,
            'assets': 74163000000,
            'cur_assets': 46347000000,
            'cur_liab': 37593000000,
            'equity': 6061000000,
            'cash': 5954000000,
            'cash_flow_op': 1092000000,
            'cash_flow_inv': 856000000,
            'cash_flow_fin': -1354000000
        })

    def test_ba_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000001292713000023/ba-20130331.xml')
        self.assert_item(item, {
            'symbol': 'BA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 18893000000,
            'op_income': 1528000000,
            'net_income': 1106000000,
            'eps_basic': 1.45,
            'eps_diluted': 1.44,
            'dividend': 0.49,
            'assets': 90447000000,
            'cur_assets': 59490000000,
            'cur_liab': 45666000000,
            'equity': 7560000000,
            'cash': 8335000000,
            'cash_flow_op': 524000000,
            'cash_flow_inv': -814000000,
            'cash_flow_fin': -1705000000
        })

    def test_bbt_20110930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/92230/000119312511304459/bbt-20110930.xml')
        self.assert_item(item, {
            'symbol': 'BBT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 2440000000,
            'op_income': None,
            'net_income': 366000000,
            'eps_basic': 0.52,
            'eps_diluted': 0.52,
            'dividend': 0.16,
            'assets': 167677000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 17541000000,
            'cash': 1312000000,
            'cash_flow_op': 4348000000,
            'cash_flow_inv': -10838000000,
            'cash_flow_fin': 8509000000
        })

    def test_bk_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1390777/000119312510112944/bk-20100331.xml')
        self.assert_item(item, {
            'symbol': 'BK',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 883000000,
            'op_income': None,
            'net_income': 559000000,
            'eps_basic': 0.46,
            'eps_diluted': 0.46,
            'dividend': 0.09,
            'assets': 220551000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 30455000000,
            'cash': 3307000000,
            'cash_flow_op': 1191000000,
            'cash_flow_inv': 512000000,
            'cash_flow_fin': -2126000000
        })

    def test_blk_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1364742/000119312513326890/blk-20130630.xml')
        self.assert_item(item, {
            'symbol': 'BLK',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 2482000000,
            'op_income': 849000000,
            'net_income': 729000000,
            'eps_basic': 4.27,
            'eps_diluted': 4.19,
            'dividend': 1.68,
            'assets': 193745000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 25755000000,
            'cash': 3668000000,
            'cash_flow_op': 1330000000,
            'cash_flow_inv': 10000000,
            'cash_flow_fin': -2193000000
        })

    def test_c_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/831001/000104746909007400/c-20090630.xml')
        self.assert_item(item, {
            'symbol': 'C',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 29969000000,
            'net_income': 4279000000,
            'op_income': None,
            'eps_basic': 0.49,
            'eps_diluted': 0.49,
            'dividend': 0.0,
            'assets': 1848533000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 154168000000,
            'cash': 26915000000,
            'cash_flow_op': -20737000000,
            'cash_flow_inv': 16457000000,
            'cash_flow_fin': 959000000
        })

    def test_cbs_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746910004823/cbs-20100331.xml')
        self.assert_item(item, {
            'symbol': 'CBS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 3530900000,
            'op_income': 153400000,
            'net_income': -26200000,
            'eps_basic': -0.04,
            'eps_diluted': -0.04,
            'dividend': 0.05,
            'assets': 26756100000,
            'cur_assets': 5705200000,
            'cur_liab': 4712300000,
            'equity': 9046100000,
            'cash': 872700000,
            'cash_flow_op': 700700000,
            'cash_flow_inv': -73600000,
            'cash_flow_fin': -471100000
        })

    def test_cbs_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746912001373/cbs-20111231.xml')
        self.assert_item(item, {
            'symbol': 'CBS',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 14245000000,
            'op_income': 2529000000,
            'net_income': 1305000000,
            'eps_basic': 1.97,
            'eps_diluted': 1.92,
            'dividend': 0.35,
            'assets': 26197000000,
            'cur_assets': 5543000000,
            'cur_liab': 3933000000,
            'equity': 9908000000,
            'cash': 660000000,
            'cash_flow_op': 1749000000,
            'cash_flow_inv': -389000000,
            'cash_flow_fin': -1180000000
        })

    def test_cbs_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746913007929/cbs-20130630.xml')
        self.assert_item(item, {
            'symbol': 'CBS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 3699000000,
            'op_income': 838000000,
            'net_income': 472000000,
            'eps_basic': 0.78,
            'eps_diluted': 0.76,
            'dividend': 0.12,
            'assets': 25693000000,
            'cur_assets': 4770000000,
            'cur_liab': 3825000000,
            'equity': 9601000000,
            'cash': 282000000,
            'cash_flow_op': 1051000000,
            'cash_flow_inv': -230000000,
            'cash_flow_fin': -1247000000
        })

    def test_cce_20101001(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1491675/000119312510239952/cce-20101001.xml')
        self.assert_item(item, {
            'symbol': 'CCE',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-10-01',
            'revenues': 1681000000,
            'op_income': 244000000,
            'net_income': 208000000,
            'eps_basic': 0.61,
            'eps_diluted': 0.61,
            'dividend': 0.0,
            'assets': 8457000000,
            'cur_assets': 3145000000,
            'cur_liab': 2154000000,
            'equity': 3277000000,
            'cash': 476000000,
            'cash_flow_op': 620000000,
            'cash_flow_inv': -705000000,
            'cash_flow_fin': 178000000
        })

    def test_cce_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1491675/000119312511033197/cce-20101231.xml')
        self.assert_item(item, {
            'symbol': 'CCE',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 6714000000,
            'op_income': 810000000,
            'net_income': 624000000,
            'eps_basic': 1.84,
            'eps_diluted': 1.83,
            'dividend': 0.12,
            'assets': 8596000000,
            'cur_assets': 2230000000,
            'cur_liab': 1942000000,
            'equity': 3143000000,
            'cash': 321000000,
            'cash_flow_op': 825000000,
            'cash_flow_inv': -739000000,
            'cash_flow_fin': -144000000
        })

    def test_cci_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1051470/000119312510031419/cci-20091231.xml')
        self.assert_item(item, {
            'symbol': 'CCI',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 1685407000,
            'op_income': 433991000,
            'net_income': -135138000,
            'eps_basic': -0.47,
            'eps_diluted': -0.47,
            'dividend': 0.0,
            'assets': 10956606000,
            'cur_assets': 1196033000,
            'cur_liab': 754105000,
            'equity': 2936085000,
            'cash': 766146000,
            'cash_flow_op': 571256000,
            'cash_flow_inv': -172145000,
            'cash_flow_fin': 214396000
        })

    def test_ccmm_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1091667/000109166711000103/ccmm-20110630.xml')
        self.assert_item(item, {
            'symbol': 'CCMM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 1791000000,
            'op_income': 270000000,
            'net_income': -107000000,
            'eps_basic': -0.98,
            'eps_diluted': -0.98,
            'dividend': 0.0,
            'assets': None,
            'cur_assets': None,  # Seems the source filing got the wrong context date on balance sheet
            'cur_liab': None,
            'equity': None,
            'cash': 194000000,
            'cash_flow_op': 907000000,
            'cash_flow_inv': -694000000,
            'cash_flow_fin': -51000000
        })

    def test_chtr_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1091667/000109166712000026/chtr-20111231.xml')
        self.assert_item(item, {
            'symbol': 'CHTR',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 7204000000,
            'op_income': 1041000000,
            'net_income': -369000000,
            'eps_basic': -3.39,
            'eps_diluted': -3.39,
            'dividend': 0.0,
            'assets': 15605000000,
            'cur_assets': 370000000,
            'cur_liab': 1153000000,
            'equity': 409000000,
            'cash': 2000000,
            'cash_flow_op': 1737000000,
            'cash_flow_inv': -1367000000,
            'cash_flow_fin': -373000000
        })

    def test_ci_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701221/000110465913036475/ci-20130331.xml')
        self.assert_item(item, {
            'symbol': 'CI',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 8183000000,
            'op_income': None,
            'net_income': 57000000,
            'eps_basic': 0.2,
            'eps_diluted': 0.2,
            'dividend': 0.04,
            'assets': 54939000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 9660000000,
            'cash': 3306000000,
            'cash_flow_op': -805000000,
            'cash_flow_inv': 962000000,
            'cash_flow_fin': 185000000
        })

    def test_cit_20100630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1171825/000089109210003376/cit-20100331.xml')
        self.assert_item(item, {
            'symbol': 'CIT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-30',
            'revenues': 669500000,
            'op_income': None,
            'net_income': 142100000,
            'eps_basic': 0.71,
            'eps_diluted': 0.71,
            'dividend': 0.0,
            'assets': 54916800000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 8633900000,
            'cash': 1060700000,
            'cash_flow_op': 178100000,
            'cash_flow_inv': 7122800000,
            'cash_flow_fin': -6218700000
        })

    def test_csc_20120928(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/23082/000002308212000073/csc-20120928.xml')
        self.assert_item(item, {
            'symbol': 'CSC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2012-09-28',
            'revenues': 3854000000,
            'op_income': 298000000,
            'net_income': 130000000,
            'eps_basic': 0.84,
            'eps_diluted': 0.83,
            'dividend': 0.2,
            'assets': 11649000000,
            'cur_assets': 5468000000,
            'cur_liab': 4015000000,
            'equity': 2885000000,
            'cash': 1850000000,
            'cash_flow_op': 665000000,
            'cash_flow_inv': -366000000,
            'cash_flow_fin': 469000000
        })

    def test_disca_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1437107/000095012309029613/disca-20090630.xml')
        self.assert_item(item, {
            'symbol': 'DISCA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 881000000,
            'op_income': 486000000,
            'net_income': 183000000,
            'eps_basic': 0.43,
            'eps_diluted': 0.43,
            'dividend': 0.0,
            'assets': 10696000000,
            'cur_assets': 1331000000,
            'cur_liab': 1227000000,
            'equity': 5918000000,
            'cash': 339000000,
            'cash_flow_op': 320000000,
            'cash_flow_inv': 288000000,
            'cash_flow_fin': -371000000
        })

    def test_disca_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1437107/000095012309056946/disca-20090930.xml')
        self.assert_item(item, {
            'symbol': 'DISCA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 854000000,
            'op_income': 215000000,
            'net_income': 95000000,
            'eps_basic': 0.22,
            'eps_diluted': 0.22,
            'dividend': 0.0,
            'assets': 10741000000,
            'cur_assets': 1417000000,
            'cur_liab': 762000000,
            'equity': 6042000000,
            'cash': 401000000,
            'cash_flow_op': 358000000,
            'cash_flow_inv': 279000000,
            'cash_flow_fin': -343000000
        })

    def test_dltr_20130504(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/935703/000093570313000029/dltr-20130504.xml')
        self.assert_item(item, {
            'symbol': 'DLTR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-05-04',
            'revenues': 1865800000,
            'op_income': 216600000,
            'net_income': 133500000,
            'eps_basic': 0.6,
            'eps_diluted': 0.59,
            'dividend': 0.0,
            'assets': 2811800000,
            'cur_assets': 1489800000,
            'cur_liab': 663000000,
            'equity': 1739700000,
            'cash': 383300000,
            'cash_flow_op': 129300000,
            'cash_flow_inv': -88200000,
            'cash_flow_fin': -57400000
        })

    def test_dtv_20110331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1465112/000104746911004655/dtv-20110331.xml')
        self.assert_item(item, {
            'symbol': 'DTV',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2011-03-31',
            'revenues': 6319000000,
            'op_income': 1155000000,
            'net_income': 674000000,
            'eps_basic': 0.85,
            'eps_diluted': 0.85,
            'dividend': 0.0,
            'assets': 20593000000,
            'cur_assets': 6938000000,
            'cur_liab': 4125000000,
            'equity': -902000000,
            'cash': 4295000000,
            'cash_flow_op': 1309000000,
            'cash_flow_inv': -544000000,
            'cash_flow_fin': 2028000000
        })

    def test_ebay_20100630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065088/000119312510164115/ebay-20100630.xml')
        self.assert_item(item, {
            'symbol': 'EBAY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-30',
            'revenues': 2215379000,
            'op_income': 484565000,
            'net_income': 412192000,
            'eps_basic': 0.31,
            'eps_diluted': 0.31,
            'dividend': 0.0,
            'assets': 18747584000,
            'cur_assets': 8675313000,
            'cur_liab': 3564261000,
            'equity': 14169291000,
            'cash': 4037442000,
            'cash_flow_op': 1144641000,
            'cash_flow_inv': -835635000,
            'cash_flow_fin': 50363000
        })

    def test_ebay_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065088/000106508813000058/ebay-20130331.xml')
        self.assert_item(item, {
            'symbol': 'EBAY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 3748000000,
            'op_income': 800000000,
            'net_income': 677000000,
            'eps_basic': 0.52,
            'eps_diluted': 0.51,
            'dividend': 0.0,
            'assets': 38000000000,
            'cur_assets': 22336000000,
            'cur_liab': 11720000000,
            'equity': 21112000000,
            'cash': 6530000000,
            'cash_flow_op': 937000000,
            'cash_flow_inv': -719000000,
            'cash_flow_fin': -411000000
        })

    def test_ecl_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/31462/000110465912072308/ecl-20120930.xml')
        self.assert_item(item, {
            'symbol': 'ECL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 3023300000,
            'op_income': 401200000,
            'net_income': 238000000,
            'eps_basic': 0.81,
            'eps_diluted': 0.8,
            'dividend': 0.2,
            'assets': 16722800000,
            'cur_assets': 4072900000,
            'cur_liab': 2818700000,
            'equity': 6026200000,
            'cash': 324000000,
            'cash_flow_op': 720800000,
            'cash_flow_inv': -414900000,
            'cash_flow_fin': -1815800000
        })

    def test_ed_20130930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/23632/000119312513425393/ed-20130930.xml')
        self.assert_item(item, {
            'symbol': 'ED',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2013,
            'end_date': '2013-09-30',
            'revenues': 3484000000,
            'op_income': 855000000,
            'net_income': 464000000,
            'eps_basic': 1.58,
            'eps_diluted': 1.58,
            'dividend': 0.615,
            'assets': 41964000000,
            'cur_assets': 3704000000,
            'cur_liab': 4373000000,
            'equity': 12166000000,
            'cash': 74000000,
            'cash_flow_op': 1238000000,
            'cash_flow_inv': -1895000000,
            'cash_flow_fin': 337000000
        })

    def test_eqt_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/33213/000110465911009751/eqt-20101231.xml')
        self.assert_item(item, {
            'symbol': 'EQT',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 1322708000,
            'op_income': 470479000,
            'net_income': 227700000,
            'eps_basic': 1.58,
            'eps_diluted': 1.57,
            'dividend': 0.88,
            'assets': 7098438000,
            'cur_assets': 827940000,
            'cur_liab': 596984000,
            'equity': 3078696000,
            'cash': 0.0,
            'cash_flow_op': 789740000,
            'cash_flow_inv': -1239429000,
            'cash_flow_fin': 449689000
        })

    def test_etr_20121231(self):
        # Large file test (121 MB)
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/7323/000006598413000050/etr-20121231.xml')
        self.assert_item(item, {
            'symbol': 'ETR',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 10302079000,
            'op_income': 1301181000,
            'net_income': 846673000,
            'eps_basic': 4.77,
            'eps_diluted': 4.76,
            'dividend': 3.32,
            'assets': 43202502000,
            'cur_assets': 3683126000,
            'cur_liab': 4106321000,
            'equity': 9291089000,
            'cash': 532569000,
            'cash_flow_op': 2940285000,
            'cash_flow_inv': -3639797000,
            'cash_flow_fin': 538151000
        })

    def test_exc_20100930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/22606/000119312510234590/exc-20100930.xml')
        self.assert_item(item, {
            'symbol': 'EXC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-09-30',
            'revenues': 5291000000,
            'op_income': 1366000000,
            'net_income': 845000000,
            'eps_basic': 1.28,
            'eps_diluted': 1.27,
            'dividend': 0.53,
            'assets': 50948000000,
            'cur_assets': 6760000000,
            'cur_liab': 3967000000,
            'equity': 13955000000,
            'cash': 2735000000,
            'cash_flow_op': 4112000000,
            'cash_flow_inv': -2037000000,
            'cash_flow_fin': -1350000000
        })

    def test_fast_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/815556/000119312509154691/fast-20090630.xml')
        self.assert_item(item, {
            'symbol': 'FAST',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 474894000,
            'op_income': 69938000,
            'net_income': 43538000,
            'eps_basic': 0.29,
            'eps_diluted': 0.29,
            'dividend': 0.0,
            'assets': 1328684000,
            'cur_assets': 988997000,
            'cur_liab': 127950000,
            'equity': 1186845000,
            'cash': 173667000,
            'cash_flow_op': 167552000,
            'cash_flow_inv': -28942000,
            'cash_flow_fin': -51986000
        })

    def test_fast_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/815556/000119312509212481/fast-20090930.xml')
        self.assert_item(item, {
            'symbol': 'FAST',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 489339000,
            'op_income': 76410000,
            'net_income': 47589000,
            'eps_basic': 0.32,
            'eps_diluted': 0.32,
            'dividend': 0.0,
            'assets': 1337764000,
            'cur_assets': 998090000,
            'cur_liab': 138744000,
            'equity': 1185140000,
            'cash': 193744000,
            'cash_flow_op': 253184000,
            'cash_flow_inv': -41031000,
            'cash_flow_fin': -106943000
        })

    def test_fb_20120630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1326801/000119312512325997/fb-20120630.xml')
        self.assert_item(item, {
            'symbol': 'FB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2012,
            'end_date': '2012-06-30',
            'revenues': 1184000000,
            'op_income': -743000000,
            'net_income': -157000000,
            'eps_basic': -0.08,
            'eps_diluted': -0.08,
            'dividend': 0.0,
            'assets': 14928000000,
            'cur_assets': 11967000000,
            'cur_liab': 1034000000,
            'equity': 13309000000,
            'cash': 2098000000,
            'cash_flow_op': 683000000,
            'cash_flow_inv': -7170000000,
            'cash_flow_fin': 7090000000
        })

    def test_fb_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1326801/000132680113000003/fb-20121231.xml')
        self.assert_item(item, {
            'symbol': 'FB',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 5089000000,
            'op_income': 538000000,
            'net_income': 32000000,
            'eps_basic': 0.02,
            'eps_diluted': 0.01,
            'dividend': 0.0,
            'assets': 15103000000,
            'cur_assets': 11267000000,
            'cur_liab': 1052000000,
            'equity': 11755000000,
            'cash': 2384000000,
            'cash_flow_op': 1612000000,
            'cash_flow_inv': -7024000000,
            'cash_flow_fin': 6283000000
        })

    def test_fll_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/891482/000118811213000562/fll-20121231.xml')
        self.assert_item(item, {
            'symbol': 'FLL',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 128760000,
            'op_income': 49638000,
            'net_income': 27834000,
            'eps_basic': 1.49,
            'eps_diluted': None,
            'dividend': 0.0,
            'assets': 162725000,
            'cur_assets': 32339000,
            'cur_liab': 15332000,
            'equity': 81133000,
            'cash': 20603000,
            'cash_flow_op': -4301000,
            'cash_flow_inv': 45271000,
            'cash_flow_fin': -35074000
        })

    def test_flr_20080930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1124198/000110465908068715/flr-20080930.xml')
        self.assert_item(item, {
            'symbol': 'FLR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2008,
            'end_date': '2008-09-30',
            'revenues': 5673818000,
            'op_income': None,
            'net_income': 183099000,
            'eps_basic': 1.03,
            'eps_diluted': 1.01,
            'dividend': 0.125,
            'assets': 6605120000,
            'cur_assets': 4808393000,
            'cur_liab': 3228638000,
            'equity': 2741002000,
            'cash': 1514943000,
            'cash_flow_op': 855198000,
            'cash_flow_inv': -295445000,
            'cash_flow_fin': -202011000
        })

    def test_fmc_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/37785/000119312509165435/fmc-20090630.xml')
        self.assert_item(item, {
            'symbol': 'FMC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 700300000,
            'op_income': 97200000,
            'net_income': 69300000,
            'eps_basic': 0.95,
            'eps_diluted': 0.94,
            'dividend': 0.0,
            'assets': 3028500000,
            'cur_assets': 1423700000,
            'cur_liab': 717200000,
            'equity': 1101200000,
            'cash': 67000000,
            'cash_flow_op': 173900000,
            'cash_flow_inv': -106500000,
            'cash_flow_fin': -33100000
        })

    def test_fpl_20100331(self):
        # FPL was later changed to NEE
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/37634/000075330810000051/fpl-20100331.xml')
        self.assert_item(item, {
            'symbol': 'FPL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 3622000000,
            'op_income': 939000000,
            'net_income': 556000000,
            'eps_basic': 1.36,
            'eps_diluted': 1.36,
            'dividend': 0.5,
            'assets': 50942000000,
            'cur_assets': 5557000000,
            'cur_liab': 7782000000,
            'equity': 13336000000,
            'cash': 1215000000,
            'cash_flow_op': 896000000,
            'cash_flow_inv': -1361000000,
            'cash_flow_fin': 1442000000
        })

    def test_ftr_20110930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/20520/000002052011000066/ftr-20110930.xml')
        self.assert_item(item, {
            'symbol': 'FTR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 1290939000,
            'op_income': 180291000,
            'net_income': 19481000,
            'eps_basic': 0.02,
            'eps_diluted': 0.02,
            'dividend': 0.0,
            'assets': 17493767000,
            'cur_assets': 969746000,
            'cur_liab': 1168142000,
            'equity': 4776588000,
            'cash': 205817000,
            'cash_flow_op': 1272654000,
            'cash_flow_inv': -676974000,
            'cash_flow_fin': -641126000
        })

    def test_ge_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/40545/000004054513000036/ge-20121231.xml')
        self.assert_item(item, {
            'symbol': 'GE',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 147359000000,
            'op_income': 22887000000,
            'net_income': 13641000000,
            'eps_basic': 1.29,
            'eps_diluted': 1.29,
            'dividend': 0.7,
            'assets': 685328000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 128470000000,
            'cash': 77356000000,
            'cash_flow_op': 31331000000,
            'cash_flow_inv': 11302000000,
            'cash_flow_fin': -51074000000
        })

    def test_gis_20121125(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/40704/000119312512508388/gis-20121125.xml')
        self.assert_item(item, {
            'symbol': 'GIS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2012-11-25',
            'revenues': 4881800000,
            'op_income': 829000000,
            'net_income': 541600000,
            'eps_basic': 0.84,
            'eps_diluted': 0.82,
            'dividend': 0.33,
            'assets': 22952900000,
            'cur_assets': 4565500000,
            'cur_liab': 5736400000,
            'equity': 7440000000,
            'cash': 734900000,
            'cash_flow_op': 1317100000,
            'cash_flow_inv': -1103200000,
            'cash_flow_fin': 33700000
        })

    def test_gmcr_20110625(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/909954/000119312511214253/gmcr-20110630.xml')
        self.assert_item(item, {
            'symbol': 'GMCR',
            'amend': False,  # it's actually amended, but not marked in XML
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-06-25',
            'revenues': 717210000,
            'op_income': 119310000,
            'net_income': 56348000,
            'eps_basic': 0.38,
            'eps_diluted': 0.37,
            'dividend': 0.0,
            'assets': 2874422000,
            'cur_assets': 844998000,
            'cur_liab': 395706000,
            'equity': 1816646000,
            'cash': 76138000,
            'cash_flow_op': 174708000,
            'cash_flow_inv': -1082070000,
            'cash_flow_fin': 986183000
        })

    def test_goog_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312509222384/goog-20090930.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 5944851000,
            'op_income': 2073718000,
            'net_income': 1638975000,
            'eps_basic': 5.18,
            'eps_diluted': 5.13,
            'dividend': 0.0,
            'assets': 37702845000,
            'cur_assets': 26353544000,
            'cur_liab': 2321774000,
            'equity': 33721753000,
            'cash': 12087115000,
            'cash_flow_op': 6584667000,
            'cash_flow_inv': -3245963000,
            'cash_flow_fin': 74851000
        })

    def test_goog_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312512440217/goog-20120930.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 14101000000,
            'op_income': 2736000000,
            'net_income': 2176000000,
            'eps_basic': 6.64,
            'eps_diluted': 6.53,
            'dividend': 0.0,
            'assets': 89730000000,
            'cur_assets': 56821000000,
            'cur_liab': 14434000000,
            'equity': 68028000000,
            'cash': 16260000000,
            'cash_flow_op': 11950000000,
            'cash_flow_inv': -7542000000,
            'cash_flow_fin': 1921000000
        })

    def test_goog_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312513028362/goog-20121231.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 50175000000,
            'op_income': 12760000000,
            'net_income': 10737000000,
            'eps_basic': 32.81,
            'eps_diluted': 32.31,
            'dividend': 0.0,
            'assets': 93798000000,
            'cur_assets': 60454000000,
            'cur_liab': 14337000000,
            'equity': 71715000000,
            'cash': 14778000000,
            'cash_flow_op': 16619000000,
            'cash_flow_inv': -13056000000,
            'cash_flow_fin': 1229000000
        })

    def test_goog_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000128877613000055/goog-20130630.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 14105000000,
            'op_income': 3123000000,
            'net_income': 3228000000,
            'eps_basic': 9.71,
            'eps_diluted': 9.54,
            'dividend': 0.0,
            'assets': 101182000000,
            'cur_assets': 66861000000,
            'cur_liab': 15329000000,
            'equity': 78852000000,
            'cash': 16164000000,
            'cash_flow_op': 8338000000,
            'cash_flow_inv': -6244000000,
            'cash_flow_fin': -622000000
        })

    def test_goog_20140630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000128877614000065/goog-20140630.xml')
        self.assert_item(item, {
            'symbol': 'GOOG/GOOGL',  # Two symbols, see issue #6
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2014,
            'end_date': '2014-06-30',
            'revenues': 15955000000,
            'op_income': 4258000000,
            'net_income': 3422000000,
            'eps_basic': 5.07,
            'eps_diluted': 4.99,
            'dividend': 0.0,
            'assets': 121608000000,
            'cur_assets': 77905000000,
            'cur_liab': 17097000000,
            'equity': 95749000000,
            'cash': 19620000000,
            'cash_flow_op': 10018000000,
            'cash_flow_inv': -8487000000,
            'cash_flow_fin': -640000000
        })

    def test_gs_20090626(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/886982/000095012309029919/gs-20090626.xml')
        self.assert_item(item, {
            'symbol': 'GS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-26',
            'revenues': 13761000000,
            'op_income': None,
            'net_income': 2718000000,
            'eps_basic': 5.27,
            'eps_diluted': 4.93,
            'dividend': 0.35,
            'assets': 889544000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 62813000000,
            'cash': 22177000000,
            'cash_flow_op': 16020000000,
            'cash_flow_inv': -772000000,
            'cash_flow_fin': -6876000000
        })

    def test_hon_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/773840/000093041312002323/hon-20120331.xml')
        self.assert_item(item, {
            'symbol': 'HON',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 9307000000,
            'op_income': None,
            'net_income': 823000000,
            'eps_basic': 1.06,
            'eps_diluted': 1.04,
            'dividend': 0.3725,
            'assets': 40370000000,
            'cur_assets': 16553000000,
            'cur_liab': 12666000000,
            'equity': 11842000000,
            'cash': 3988000000,
            'cash_flow_op': 196000000,
            'cash_flow_inv': -122000000,
            'cash_flow_fin': 169000000
        })

    def test_hrb_20090731(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000095012309041361/hrb-20090731.xml')
        self.assert_item(item, {
            'symbol': 'HRB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2009-07-31',
            'revenues': 275505000,
            'op_income': -214162000,
            'net_income': -133634000,
            'eps_basic': -0.4,
            'eps_diluted': -0.4,
            'dividend': 0.15,
            'assets': 4545762000,
            'cur_assets': 1828146000,
            'cur_liab': 1823126000,
            'equity': 1190714000,
            'cash': 1006303000,
            'cash_flow_op': -454577000,
            'cash_flow_inv': 15360000,
            'cash_flow_fin': -216206000
        })

    def test_hrb_20091031(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000095012309069608/hrb-20091031.xml')
        self.assert_item(item, {
            'symbol': 'HRB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2009-10-31',
            'revenues': 326081000,
            'op_income': -214553000,
            'net_income': -128587000,
            'eps_basic': -0.38,
            'eps_diluted': -0.38,
            'dividend': 0.15,
            'assets': 4967359000,
            'cur_assets': 2300986000,
            'cur_liab': 2382867000,
            'equity': 1071097000,
            'cash': 1432243000,
            'cash_flow_op': -786152000,
            'cash_flow_inv': 43280000,
            'cash_flow_fin': 511231000
        })

    def test_hrb_20130731(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000157484213000013/hrb-20130731.xml')
        self.assert_item(item, {
            'symbol': 'HRB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2014,
            'end_date': '2013-07-31',
            'revenues': 127195000,
            'op_income': -179555000,
            'net_income': -115187000,
            'eps_basic': -0.42,
            'eps_diluted': -0.42,
            'dividend': 0.20,
            'assets': 3762888000,
            'cur_assets': 1704932000,
            'cur_liab': 1450484000,
            'equity': 1105315000,
            'cash': 1163876000,
            'cash_flow_op': -318742000,
            'cash_flow_inv': -29090000,
            'cash_flow_fin': -229255000
        })

    def test_ihc_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701869/000070186912000029/ihc-20120331.xml')
        self.assert_item(item, {
            'symbol': 'IHC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 102156000,
            'op_income': 6416000,
            'net_income': 3922000,
            'eps_basic': 0.22,
            'eps_diluted': 0.22,
            'dividend': 0.0,
            'assets': 1364411000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 280250000,
            'cash': 9286000,
            'cash_flow_op': -138843000,
            'cash_flow_inv': 130710000,
            'cash_flow_fin': -808000
        })

    def test_intc_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/50863/000119312512075534/intc-20111231.xml')
        self.assert_item(item, {
            'symbol': 'INTC',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 53999000000,
            'op_income': 17477000000,
            'net_income': 12942000000,
            'eps_basic': 2.46,
            'eps_diluted': 2.39,
            'dividend': 0.7824,
            'assets': 71119000000,
            'cur_assets': 25872000000,
            'cur_liab': 12028000000,
            'equity': 45911000000,
            'cash': 5065000000,
            'cash_flow_op': 20963000000,
            'cash_flow_inv': -10301000000,
            'cash_flow_fin': -11100000000
        })

    def test_intu_20101031(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/896878/000095012310111135/intu-20101031.xml')
        self.assert_item(item, {
            'symbol': 'INTU',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2010-10-31',
            'revenues': 532000000,
            'op_income': -104000000,
            'net_income': -70000000,
            'eps_basic': -0.22,
            'eps_diluted': -0.22,
            'dividend': 0.0,
            'assets': 4943000000,
            'cur_assets': 2010000000,
            'cur_liab': 1136000000,
            'equity': 2615000000,
            'cash': 112000000,
            'cash_flow_op': -211000000,
            'cash_flow_inv': 285000000,
            'cash_flow_fin': -177000000
        })

    def test_jnj_20120101(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000119312512075565/jnj-20120101.xml')
        self.assert_item(item, {
            'symbol': 'JNJ',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2012-01-01',
            'revenues': 65030000000,
            'op_income': 13765000000,
            'net_income': 9672000000,
            'eps_basic': 3.54,
            'eps_diluted': 3.49,
            'dividend': 2.25,
            'assets': 113644000000,
            'cur_assets': 54316000000,
            'cur_liab': 22811000000,
            'equity': 57080000000,
            'cash': 24542000000,
            'cash_flow_op': 14298000000,
            'cash_flow_inv': -4612000000,
            'cash_flow_fin': -4452000000
        })

    def test_jnj_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000020040612000140/jnj-20120930.xml')
        self.assert_item(item, {
            'symbol': 'JNJ',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 17052000000,
            'op_income': 3825000000,
            'net_income': 2968000000,
            'eps_basic': 1.08,
            'eps_diluted': 1.05,
            'dividend': 0.61,
            'assets': 118951000000,
            'cur_assets': 44791000000,
            'cur_liab': 23935000000,
            'equity': 63761000000,
            'cash': 15486000000,
            'cash_flow_op': 12020000000,
            'cash_flow_inv': -2007000000,
            'cash_flow_fin': -19091000000
        })

    def test_jnj_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000020040613000091/jnj-20130630.xml')
        self.assert_item(item, {
            'symbol': 'JNJ',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 17877000000,
            'op_income': 5020000000,
            'net_income': 3833000000,
            'eps_basic': 1.36,
            'eps_diluted': 1.33,
            'dividend': 0.66,
            'assets': 124325000000,
            'cur_assets': 51273000000,
            'cur_liab': 23767000000,
            'equity': 69665000000,
            'cash': 17307000000,
            'cash_flow_op': 7328000000,
            'cash_flow_inv': -1972000000,
            'cash_flow_fin': -2754000000
        })

    def test_jpm_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000095012309032832/jpm-20090630.xml')
        self.assert_item(item, {
            'symbol': 'JPM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 25623000000,
            'op_income': None,
            'net_income': 1072000000,
            'eps_basic': 0.28,
            'eps_diluted': 0.28,
            'dividend': 0.05,
            'assets': 2026642000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 154766000000,
            'cash': 25133000000,
            'cash_flow_op': 103259000000,
            'cash_flow_inv': 34430000000,
            'cash_flow_fin': -139413000000
        })

    def test_jpm_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000001961712000163/jpm-20111231.xml')
        self.assert_item(item, {
            'symbol': 'JPM',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 97234000000,
            'op_income': None,
            'net_income': 17568000000,
            'eps_basic': 4.50,
            'eps_diluted': 4.48,
            'dividend': 1.0,
            'assets': 2265792000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 183573000000,
            'cash': 59602000000,
            'cash_flow_op': 95932000000,
            'cash_flow_inv': -170752000000,
            'cash_flow_fin': 107706000000
        })

    def test_jpm_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000001961713000300/jpm-20130331.xml')
        self.assert_item(item, {
            'symbol': 'JPM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 25122000000,
            'op_income': None,
            'net_income': 6131000000,
            'eps_basic': 1.61,
            'eps_diluted': 1.59,
            'dividend': 0.30,
            'assets': 2389349000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 207086000000,
            'cash': 45524000000,
            'cash_flow_op': 19964000000,
            'cash_flow_inv': -55455000000,
            'cash_flow_fin': 28180000000
        })

    def test_ko_20100402(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000104746910004416/ko-20100402.xml')
        self.assert_item(item, {
            'symbol': 'KO',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-04-02',
            'revenues': 7525000000,
            'op_income': 2183000000,
            'net_income': 1614000000,
            'eps_basic': 0.70,
            'eps_diluted': 0.69,
            'dividend': 0.44,
            'assets': 47403000000,
            'cur_assets': 17208000000,
            'cur_liab': 13583000000,
            'equity': 25157000000,
            'cash': 5684000000,
            'cash_flow_op': 1326000000,
            'cash_flow_inv': -1368000000,
            'cash_flow_fin': -1043000000
        })

    def test_ko_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000104746911001506/ko-20101231.xml')
        self.assert_item(item, {
            'symbol': 'KO',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 35119000000,
            'op_income': 8449000000,
            'net_income': 11809000000,
            'eps_basic': 5.12,
            'eps_diluted': 5.06,
            'dividend': 1.76,
            'assets': 72921000000,
            'cur_assets': 21579000000,
            'cur_liab': 18508000000,
            'equity': 31317000000,
            'cash': 8517000000,
            'cash_flow_op': 9532000000,
            'cash_flow_inv': -4405000000,
            'cash_flow_fin': -3465000000
        })

    def test_ko_20120928(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000002134412000051/ko-20120928.xml')
        self.assert_item(item, {
            'symbol': 'KO',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-28',
            'revenues': 12340000000,
            'op_income': 2793000000,
            'net_income': 2311000000,
            'eps_basic': 0.51,
            'eps_diluted': 0.50,
            'dividend': 0.255,
            'assets': 86654000000,
            'cur_assets': 29712000000,
            'cur_liab': 27008000000,
            'equity': 33590000000,
            'cash': 9615000000,
            'cash_flow_op': 7840000000,
            'cash_flow_inv': -10399000000,
            'cash_flow_fin': -399000000
        })

    def test_krft_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1545158/000119312512495570/krft-20120930.xml')
        self.assert_item(item, {
            'symbol': 'KRFT',
            'amend': True,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 4606000000,
            'op_income': 762000000,
            'net_income': 470000000,
            'eps_basic': 0.79,
            'eps_diluted': 0.79,
            'dividend': 0.0,
            'assets': 22284000000,
            'cur_assets': 3905000000,
            'cur_liab': 2569000000,
            'equity': 7458000000,
            'cash': 244000000,
            'cash_flow_op': 2067000000,
            'cash_flow_inv': -279000000,
            'cash_flow_fin': -1548000000
        })

    def test_l_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/60086/000119312510105707/l-20100331.xml')
        self.assert_item(item, {
            'symbol': 'L',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 3713000000,
            'op_income': None,
            'net_income': 420000000,
            'eps_basic': 0.99,
            'eps_diluted': 0.99,
            'dividend': 0.0625,
            'assets': 75855000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 21993000000,
            'cash': 135000000,
            'cash_flow_op': 294000000,
            'cash_flow_inv': -411000000,
            'cash_flow_fin': 64000000
        })

    def test_l_20100930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/60086/000119312510245478/l-20100930.xml')
        self.assert_item(item, {
            'symbol': 'L',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-09-30',
            'revenues': 3701000000,
            'op_income': None,
            'net_income': 36000000,
            'eps_basic': 0.09,
            'eps_diluted': 0.09,
            'dividend': 0.0625,
            'assets': 76821000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 23499000000,
            'cash': 132000000,
            'cash_flow_op': 895000000,
            'cash_flow_inv': -426000000,
            'cash_flow_fin': -527000000
        })

    def test_lbtya_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1316631/000119312510111069/lbtya-20100331.xml')
        self.assert_item(item, {
            'symbol': 'LBTYA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 2178900000,
            'op_income': 303600000,
            'net_income': 736600000,
            'eps_basic': 2.75,
            'eps_diluted': 2.75,
            'dividend': 0.0,
            'assets': 33083500000,
            'cur_assets': 5524900000,
            'cur_liab': 4107000000,
            'equity': 4066000000,
            'cash': 4184200000,
            'cash_flow_op': 803300000,
            'cash_flow_inv': 45400000,
            'cash_flow_fin': 170700000
        })

    def test_lcapa_20110930(self):
        # This symbol was changed to STRZA
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793411000006/lcapa-20110930.xml')
        self.assert_item(item, {
            'symbol': 'LCAPA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 540000000,
            'op_income': 111000000,
            'net_income': -42000000,
            'eps_basic': -0.07,
            'eps_diluted': -0.12,
            'dividend': 0.0,
            'assets': 8915000000,
            'cur_assets': 3767000000,
            'cur_liab': 3012000000,
            'equity': 5078000000,
            'cash': 1937000000,
            'cash_flow_op': 316000000,
            'cash_flow_inv': -205000000,
            'cash_flow_fin': -264000000
        })

    def test_linta_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1355096/000135509612000008/linta-20120331.xml')
        self.assert_item(item, {
            'symbol': 'LINTA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 2314000000,
            'op_income': 258000000,
            'net_income': 91000000,
            'eps_basic': 0.16,
            'eps_diluted': 0.16,
            'dividend': 0.0,
            'assets': 17144000000,
            'cur_assets': 2764000000,
            'cur_liab': 3486000000,
            'equity': 6505000000,
            'cash': 794000000,
            'cash_flow_op': 330000000,
            'cash_flow_inv': -91000000,
            'cash_flow_fin': -284000000
        })

    def test_lll_20100625(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1039101/000095012310071159/lll-20100625.xml')
        self.assert_item(item, {
            'symbol': 'LLL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-25',
            'revenues': -3966000000,  # a doc's error, should be 3966M
            'op_income': -442000000,  # a doc's error, should be 442M
            'net_income': -228000000,  # a doc's error, should be 227M
            'eps_basic': 1.97,
            'eps_diluted': 1.95,
            'dividend': 0.4,
            'assets': 15689000000,
            'cur_assets': 5494000000,
            'cur_liab': 3730000000,
            'equity': 6926000000,
            'cash': 1023000000,
            'cash_flow_op': 589000000,
            'cash_flow_inv': -688000000,
            'cash_flow_fin': 132000000
        })

    def test_lltc_20110102(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/791907/000079190711000016/lltc-20110102.xml')
        self.assert_item(item, {
            'symbol': 'LLTC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-01-02',
            'revenues': 383621000,
            'op_income': 201059000,
            'net_income': 143743000,
            'eps_basic': 0.62,
            'eps_diluted': 0.62,
            'dividend': 0.23,
            'assets': 1446186000,
            'cur_assets': 1069958000,
            'cur_liab': 199210000,
            'equity': 278793000,
            'cash': 203308000,
            'cash_flow_op': 342333000,
            'cash_flow_inv': 39771000,
            'cash_flow_fin': -474650000
        })

    def test_lltc_20111002(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/791907/000079190711000080/lltc-20111007.xml')
        self.assert_item(item, {
            'symbol': 'LLTC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2011-10-02',
            'revenues': 329920000,
            'op_income': 157566000,
            'net_income': 108401000,
            'eps_basic': 0.47,
            'eps_diluted': 0.47,
            'dividend': 0.24,
            'assets': 1659341000,
            'cur_assets': 1268413000,
            'cur_liab': 169006000,
            'equity': 543199000,
            'cash': 163414000,
            'cash_flow_op': 149860000,
            'cash_flow_inv': -171884000,
            'cash_flow_fin': -85085000
        })

    def test_lly_20100930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/59478/000095012310097867/lly-20100930.xml')
        self.assert_item(item, {
            'symbol': 'LLY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-09-30',
            'revenues': 5654800000,
            'op_income': None,
            'net_income': 1302900000,
            'eps_basic': 1.18,
            'eps_diluted': 1.18,
            'dividend': 0.49,
            'assets': 29904300000,
            'cur_assets': 14184300000,
            'cur_liab': 6097400000,
            'equity': 12405500000,
            'cash': 5908800000,
            'cash_flow_op': 4628700000,
            'cash_flow_inv': -1595300000,
            'cash_flow_fin': -1472300000
        })

    def test_lmca_20120331(self):
        # This symbol was changed to STRZA
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793412000012/lmca-20120331.xml')
        self.assert_item(item, {
            'symbol': 'LMCA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 440000000,
            'op_income': 89000000,
            'net_income': 137000000,
            'eps_basic': 1.13,
            'eps_diluted': 1.10,
            'dividend': 0.0,
            'assets': 7122000000,
            'cur_assets': 3380000000,
            'cur_liab': 547000000,
            'equity': 5321000000,
            'cash': 1915000000,
            'cash_flow_op': 94000000,
            'cash_flow_inv': 581000000,
            'cash_flow_fin': -830000000
        })

    def test_lnc_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/59558/000005955812000143/lnc-20120930.xml')
        self.assert_item(item, {
            'symbol': 'LNC',
            'amend': False,  # mistake in doc, should be True
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': None,  # missing in doc, should be 2954000000
            'op_income': None,
            'net_income': 402000000,
            'eps_basic': 1.45,
            'eps_diluted': 1.41,
            'dividend': 0.0,
            'assets': 215458000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 15237000000,
            'cash': 4373000000,
            'cash_flow_op': 666000000,
            'cash_flow_inv': -2067000000,
            'cash_flow_fin': 1264000000
        })

    def test_ltd_20111029(self):
        # This symbol was changed to LB
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701985/000144530511003514/ltd-20111029.xml')
        self.assert_item(item, {
            'symbol': 'LTD',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-10-29',
            'revenues': 2174000000,
            'op_income': 186000000,
            'net_income': 94000000,
            'eps_basic': 0.32,
            'eps_diluted': 0.31,
            'dividend': 0.2,
            'assets': 6517000000,
            'cur_assets': 2616000000,
            'cur_liab': 1504000000,
            'equity': 521000000,
            'cash': 498000000,
            'cash_flow_op': 94000000,
            'cash_flow_inv': -239000000,
            'cash_flow_fin': -489000000
        })

    def test_ltd_20130803(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701985/000070198513000032/ltd-20130803.xml')
        self.assert_item(item, {
            'symbol': 'LTD',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-08-03',
            'revenues': 2516000000,
            'op_income': 358000000,
            'net_income': 178000000,
            'eps_basic': 0.62,
            'eps_diluted': 0.61,
            'dividend': 0.3,
            'assets': 6072000000,
            'cur_assets': 2098000000,
            'cur_liab': 1485000000,
            'equity': -861000000,
            'cash': 551000000,
            'cash_flow_op': 354000000,
            'cash_flow_inv': -381000000,
            'cash_flow_fin': -194000000
        })

    def test_luv_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/92380/000009238011000070/luv-20110630.xml')
        self.assert_item(item, {
            'symbol': 'LUV',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 4136000000,
            'op_income': 207000000,
            'net_income': 161000000,
            'eps_basic': 0.21,
            'eps_diluted': 0.21,
            'dividend': 0.0045,
            'assets': 18945000000,
            'cur_assets': 5421000000,
            'cur_liab': 5318000000,
            'equity': 7202000000,
            'cash': 1595000000,
            'cash_flow_op': 237000000,
            'cash_flow_inv': -589000000,
            'cash_flow_fin': -92000000
        })

    def test_mchp_20120630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/827054/000082705412000230/mchp-20120630.xml')
        self.assert_item(item, {
            'symbol': 'MCHP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2012-06-30',
            'revenues': 352134000,
            'op_income': 96333000,
            'net_income': 78710000,
            'eps_basic': 0.41,
            'eps_diluted': 0.39,
            'dividend': 0.35,
            'assets': 3144840000,
            'cur_assets': 2229298000,
            'cur_liab': 249989000,
            'equity': 2017990000,
            'cash': 779848000,
            'cash_flow_op': 128971000,
            'cash_flow_inv': 77890000,
            'cash_flow_fin': -62768000
        })

    def test_mdlz_20130930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1103982/000119312513431957/mdlz-20130930.xml')
        self.assert_item(item, {
            'symbol': 'MDLZ',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2013,
            'end_date': '2013-09-30',
            'revenues': 8472000000,
            'op_income': 1262000000,
            'net_income': 1024000000,
            'eps_basic': 0.58,
            'eps_diluted': 0.57,
            'dividend': 0.14,
            'assets': 74859000000,
            'cur_assets': 15463000000,
            'cur_liab': 15269000000,
            'equity': 32492000000,
            'cash': 3692000000,
            'cash_flow_op': 1198000000,
            'cash_flow_inv': -1015000000,
            'cash_flow_fin': -881000000
        })

    def test_mmm_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465910007295/mmm-20091231.xml')
        self.assert_item(item, {
            'symbol': 'MMM',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 23123000000,
            'op_income': 4814000000,
            'net_income': 3193000000,
            'eps_basic': 4.56,
            'eps_diluted': 4.52,
            'dividend': 2.04,
            'assets': 27250000000,
            'cur_assets': 10795000000,
            'cur_liab': 4897000000,
            'equity': 13302000000,
            'cash': 3040000000,
            'cash_flow_op': 4941000000,
            'cash_flow_inv': -1732000000,
            'cash_flow_fin': -2014000000
        })

    def test_mmm_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465912032441/mmm-20120331.xml')
        self.assert_item(item, {
            'symbol': 'MMM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 7486000000,
            'op_income': 1634000000,
            'net_income': 1125000000,
            'eps_basic': 1.61,
            'eps_diluted': 1.59,
            'dividend': 0.59,
            'assets': 32015000000,
            'cur_assets': 12853000000,
            'cur_liab': 5408000000,
            'equity': 16619000000,
            'cash': 2332000000,
            'cash_flow_op': 828000000,
            'cash_flow_inv': -43000000,
            'cash_flow_fin': -722000000
        })

    def test_mmm_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465913058961/mmm-20130630.xml')
        self.assert_item(item, {
            'symbol': 'MMM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 7752000000,
            'op_income': 1702000000,
            'net_income': 1197000000,
            'eps_basic': 1.74,
            'eps_diluted': 1.71,
            'dividend': 0.635,
            'assets': 34130000000,
            'cur_assets': 13983000000,
            'cur_liab': 6335000000,
            'equity': 18319000000,
            'cash': 2942000000,
            'cash_flow_op': 2673000000,
            'cash_flow_inv': -740000000,
            'cash_flow_fin': -1727000000
        })

    def test_mnst_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/865752/000110465913062263/mnst-20130630.xml')
        self.assert_item(item, {
            'symbol': 'MNST',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 630934000,
            'op_income': 179427000,
            'net_income': 106873000,
            'eps_basic': 0.64,
            'eps_diluted': 0.62,
            'dividend': 0.0,
            'assets': 1317842000,
            'cur_assets': 1093822000,
            'cur_liab': 346174000,
            'equity': 856021000,
            'cash': 283839000,
            'cash_flow_op': 99720000,
            'cash_flow_inv': -70580000,
            'cash_flow_fin': 30981000
        })

    def test_msft_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312511200680/msft-20110630.xml')
        self.assert_item(item, {
            'symbol': 'MSFT',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 69943000000,
            'op_income': 27161000000,
            'net_income': 23150000000,
            'eps_basic': 2.73,
            'eps_diluted': 2.69,
            'dividend': 0.64,
            'assets': 108704000000,
            'cur_assets': 74918000000,
            'cur_liab': 28774000000,
            'equity': 57083000000,
            'cash': 9610000000,
            'cash_flow_op': 26994000000,
            'cash_flow_inv': -14616000000,
            'cash_flow_fin': -8376000000
        })

    def test_msft_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312512026864/msft-20111231.xml')
        self.assert_item(item, {
            'symbol': 'MSFT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2012,
            'end_date': '2011-12-31',
            'revenues': 20885000000,
            'op_income': 7994000000,
            'net_income': 6624000000,
            'eps_basic': 0.79,
            'eps_diluted': 0.78,
            'dividend': 0.20,
            'assets': 112243000000,
            'cur_assets': 72513000000,
            'cur_liab': 25373000000,
            'equity': 64121000000,
            'cash': 10610000000,
            'cash_flow_op': 5862000000,
            'cash_flow_inv': -5568000000,
            'cash_flow_fin': -2513000000
        })

    def test_msft_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312513160748/msft-20130331.xml')
        self.assert_item(item, {
            'symbol': 'MSFT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 20489000000,
            'op_income': 7612000000,
            'net_income': 6055000000,
            'eps_basic': 0.72,
            'eps_diluted': 0.72,
            'dividend': 0.23,
            'assets': 134105000000,
            'cur_assets': 93524000000,
            'cur_liab': 31929000000,
            'equity': 76688000000,
            'cash': 5240000000,
            'cash_flow_op': 9666000000,
            'cash_flow_inv': -7660000000,
            'cash_flow_fin': -2744000000
        })

    def test_mu_20121129(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/723125/000072312513000007/mu-20121129.xml')
        self.assert_item(item, {
            'symbol': 'MU',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2012-11-29',
            'revenues': 1834000000,
            'op_income': -157000000,
            'net_income': -275000000,
            'eps_basic': -0.27,
            'eps_diluted': -0.27,
            'dividend': 0.0,
            'assets': 14067000000,
            'cur_assets': 5315000000,
            'cur_liab': 2138000000,
            'equity': 8186000000,
            'cash': 2102000000,
            'cash_flow_op': 236000000,
            'cash_flow_inv': -639000000,
            'cash_flow_fin': 46000000
        })

    def test_mxim_20110326(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/743316/000144530511000751/mxim-20110422.xml')
        self.assert_item(item, {
            'symbol': 'MXIM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-03-26',
            'revenues': 606775000,
            'op_income': 163995000,
            'net_income': 136276000,
            'eps_basic': 0.46,
            'eps_diluted': 0.45,
            'dividend': 0.21,
            'assets': 3452417000,
            'cur_assets': 1676593000,
            'cur_liab': 391153000,
            'equity': 2465040000,
            'cash': 868923000,
            'cash_flow_op': 615180000,
            'cash_flow_inv': -224755000,
            'cash_flow_fin': -348014000
        })

    def test_nflx_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065280/000106528012000020/nflx-20120930.xml')
        self.assert_item(item, {
            'symbol': 'NFLX',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 905089000,
            'op_income': 16135000,
            'net_income': 7675000,
            'eps_basic': 0.14,
            'eps_diluted': 0.13,
            'dividend': 0.0,
            'assets': 3808833000,
            'cur_assets': 2225018000,
            'cur_liab': 1598223000,
            'equity': 716840000,
            'cash': 370298000,
            'cash_flow_op': 150000,
            'cash_flow_inv': -33524000,
            'cash_flow_fin': -158000
        })

    def test_nvda_20130127(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1045810/000104581013000008/nvda-20130127.xml')
        self.assert_item(item, {
            'symbol': 'NVDA',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2013,
            'end_date': '2013-01-27',
            'revenues': 4280159000,
            'op_income': 648239000,
            'net_income': 562536000,
            'eps_basic': 0.91,
            'eps_diluted': 0.9,
            'dividend': 0.075,
            'assets': 6412245000,
            'cur_assets': 4775258000,
            'cur_liab': 976223000,
            'equity': 4827703000,
            'cash': 732786000,
            'cash_flow_op': 824172000,
            'cash_flow_inv': -743992000,
            'cash_flow_fin': -15270000
        })

    def test_nws_20090930(self):
        # This symbol was changed to FOX
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1308161/000119312509224062/nws-20090930.xml')
        self.assert_item(item, {
            'symbol': 'NWS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2009-09-30',
            'revenues': 7199000000,
            'op_income': 1042000000,
            'net_income': 571000000,
            'eps_basic': 0.22,
            'eps_diluted': 0.22,
            'dividend': 0.06,
            'assets': 55316000000,
            'cur_assets': 17425000000,
            'cur_liab': 10990000000,
            'equity': 24479000000,
            'cash': 7832000000,
            'cash_flow_op': 680000000,
            'cash_flow_inv': -362000000,
            'cash_flow_fin': 942000000
        })

    def test_omx_20110924(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312511286448/omx-20110924.xml')
        self.assert_item(item, {
            'symbol': 'OMX',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-24',
            'revenues': 1774767000,
            'op_income': 41296000,
            'net_income': 21518000,
            'eps_basic': 0.25,
            'eps_diluted': 0.25,
            'dividend': 0.0,
            'assets': 4002981000,
            'cur_assets': 1950996000,
            'cur_liab': 998377000,
            'equity': 657636000,
            'cash': 485426000,
            'cash_flow_op': 78743000,
            'cash_flow_inv': -41380000,
            'cash_flow_fin': -11280000
        })

    def test_omx_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312512077611/omx-20111231.xml')
        self.assert_item(item, {
            'symbol': 'OMX',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 7121167000,
            'op_income': 86486000,
            'net_income': 32771000,
            'eps_basic': 0.38,
            'eps_diluted': 0.38,
            'dividend': 0.0,
            'assets': 4069275000,
            'cur_assets': 1938974000,
            'cur_liab': 1013301000,
            'equity': 568993000,
            'cash': 427111000,
            'cash_flow_op': 53679000,
            'cash_flow_inv': -69373000,
            'cash_flow_fin': -17952000
        })

    def test_omx_20121229(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312513073972/omx-20121229.xml')
        self.assert_item(item, {
            'symbol': 'OMX',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-29',
            'revenues': 6920384000,
            'op_income': 24278000,
            'net_income': 414694000,
            'eps_basic': 4.79,
            'eps_diluted': 4.74,
            'dividend': 0.0,
            'assets': 3784315000,
            'cur_assets': 1983884000,
            'cur_liab': 1056641000,
            'equity': 1034373000,
            'cash': 495056000,
            'cash_flow_op': 185201000,
            'cash_flow_inv': -85244000,
            'cash_flow_fin': -34836000
        })

    def test_orly_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/898173/000089817313000028/orly-20130331.xml')
        self.assert_item(item, {
            'symbol': 'ORLY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 1585009000,
            'op_income': 251084000,
            'net_income': 154329000,
            'eps_basic': 1.38,
            'eps_diluted': 1.36,
            'dividend': 0.0,
            'assets': 5789541000,
            'cur_assets': 2741188000,
            'cur_liab': 2349022000,
            'equity': 2072525000,
            'cash': 205410000,
            'cash_flow_op': 226344000,
            'cash_flow_inv': -72100000,
            'cash_flow_fin': -196962000
        })

    def test_pay_20110430(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1312073/000119312511161119/pay-20110430.xml')
        self.assert_item(item, {
            'symbol': 'PAY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-04-30',
            'revenues': 292446000,
            'op_income': 37338000,
            'net_income': 25200000,
            'eps_basic': 0.29,
            'eps_diluted': 0.27,
            'dividend': 0.0,
            'assets': 1252289000,
            'cur_assets': 935395000,
            'cur_liab': 303590000,
            'equity': 332172000,
            'cash': 531542000,
            'cash_flow_op': 68831000,
            'cash_flow_inv': -20049000,
            'cash_flow_fin': 34676000
        })

    def test_pcar_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/75362/000119312510108284/pcar-20100331.xml')
        self.assert_item(item, {
            'symbol': 'PCAR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 2230700000,
            'op_income': None,
            'net_income': 68300000,
            'eps_basic': 0.19,
            'eps_diluted': 0.19,
            'dividend': 0.09,
            'assets': 13990000000,
            'cur_assets': 3396400000,
            'cur_liab': 1425900000,
            'equity': 5092600000,
            'cash': 1854700000,
            'cash_flow_op': 285400000,
            'cash_flow_inv': 40500000,
            'cash_flow_fin': -350800000
        })

    def test_pcg_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1004980/000100498010000015/pcg-20091231.xml')
        self.assert_item(item, {
            'symbol': 'PCG',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 13399000000,
            'op_income': 2299000000,
            'net_income': 1220000000,
            'eps_basic': 3.25,
            'eps_diluted': 3.2,
            'dividend': 1.68,
            'assets': 42945000000,
            'cur_assets': 5657000000,
            'cur_liab': 6813000000,
            'equity': 10585000000,
            'cash': 527000000,
            'cash_flow_op': 3039000000,
            'cash_flow_inv': -3336000000,
            'cash_flow_fin': 605000000
        })

    def test_plt_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/914025/000091402513000049/plt-20130630.xml')
        self.assert_item(item, {
            'symbol': 'PLT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2014,
            'end_date': '2013-06-30',
            'revenues': 202818000,
            'op_income': 35949000,
            'net_income': 26953000,
            'eps_basic': 0.63,
            'eps_diluted': 0.62,
            'dividend': 0.1,
            'assets': 780520000,
            'cur_assets': 568272000,
            'cur_liab': 90121000,
            'equity': 673569000,
            'cash': 256343000,
            'cash_flow_op': 34140000,
            'cash_flow_inv': -4120000,
            'cash_flow_fin': -2424000
        })

    def test_qep_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1108827/000119312511202252/qep-20110630.xml')
        self.assert_item(item, {
            'symbol': 'QEP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 784100000,
            'op_income': 168900000,
            'net_income': 92800000,
            'eps_basic': 0.52,
            'eps_diluted': 0.52,
            'dividend': 0.02,
            'assets': 7075000000,
            'cur_assets': 655600000,
            'cur_liab': 582900000,
            'equity': 3184400000,
            'cash': None,
            'cash_flow_op': 628600000,
            'cash_flow_inv': -660200000,
            'cash_flow_fin': 31600000
        })

    def test_qep_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1108827/000110882712000006/qep-20120930.xml')
        self.assert_item(item, {
            'symbol': 'QEP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 542400000,
            'op_income': -12600000,
            'net_income': -3100000,
            'eps_basic': -0.02,
            'eps_diluted': -0.02,
            'dividend': 0.02,
            'assets': 8996100000,
            'cur_assets': 619800000,
            'cur_liab': 616700000,
            'equity': 3377000000,
            'cash': 0.0,
            'cash_flow_op': 972000000,
            'cash_flow_inv': -2435700000,
            'cash_flow_fin': 1463700000
        })

    def test_regn_20100630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/872589/000120677410001689/regn-20100630.xml')
        self.assert_item(item, {
            'symbol': 'REGN',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-30',
            'revenues': 115886000,
            'op_income': -23724000,
            'net_income': -25474000,
            'eps_basic': -0.31,
            'eps_diluted': -0.31,
            'dividend': 0.0,
            'assets': 790641000,
            'cur_assets': 417750000,
            'cur_liab': 119571000,
            'equity': 371216000,
            'cash': 112000000,
            'cash_flow_op': -22626000,
            'cash_flow_inv': -131383000,
            'cash_flow_fin': 58934000
        })

    def test_sbac_20110331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1034054/000119312511130220/sbac-20110331.xml')
        self.assert_item(item, {
            'symbol': 'SBAC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2011-03-31',
            'revenues': 167749000,
            'op_income': 23899000,
            'net_income': -34251000,
            'eps_basic': -0.3,
            'eps_diluted': -0.3,
            'dividend': 0.0,
            'assets': 3466258000,
            'cur_assets': 173387000,
            'cur_liab': 120247000,
            'equity': 213078000,
            'cash': 95104000,
            'cash_flow_op': 53197000,
            'cash_flow_inv': -108748000,
            'cash_flow_fin': 86401000
        })

    def test_shld_20101030(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1310067/000119312510263486/shld-20101030.xml')
        self.assert_item(item, {
            'symbol': 'SHLD',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-10-30',
            'revenues': 9678000000,
            'op_income': -292000000,
            'net_income': -218000000,
            'eps_basic': -1.98,
            'eps_diluted': -1.98,
            'dividend': 0.0,
            'assets': 26045000000,
            'cur_assets': 13123000000,
            'cur_liab': 10682000000,
            'equity': 8378000000,
            'cash': 790000000,
            'cash_flow_op': -1172000000,
            'cash_flow_inv': -296000000,
            'cash_flow_fin': 532000000
        })

    def test_sial_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/90185/000119312511028579/sial-20101231.xml')
        self.assert_item(item, {
            'symbol': 'SIAL',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 2271000000,
            'op_income': 551000000,
            'net_income': 384000000,
            'eps_basic': 3.17,
            'eps_diluted': 3.12,
            'dividend': 0.0,
            'assets': 3014000000,
            'cur_assets': 1602000000,
            'cur_liab': 530000000,
            'equity': 1976000000,
            'cash': 569000000,
            'cash_flow_op': 523000000,
            'cash_flow_inv': -182000000,
            'cash_flow_fin': -161000000
        })

    def test_siri_20100630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/908937/000095012310074081/siri-20100630.xml')
        self.assert_item(item, {
            'symbol': 'SIRI',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-30',
            'revenues': 699761000,
            'op_income': 125634000,
            'net_income': 15272000,
            'eps_basic': 0.0,
            'eps_diluted': 0.0,
            'dividend': 0.0,
            'assets': 7200932000,
            'cur_assets': 760172000,
            'cur_liab': 2041871000,
            'equity': 180428000,
            'cash': 258854000,
            'cash_flow_op': 140987000,
            'cash_flow_inv': -159859000,
            'cash_flow_fin': -105763000
        })

    def test_siri_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/908937/000090893712000003/siri-20120331.xml')
        self.assert_item(item, {
            'symbol': 'SIRI',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 804722000,
            'op_income': 199238000,
            'net_income': 107774000,
            'eps_basic': 0.03,
            'eps_diluted': 0.02,
            'dividend': 0.0,
            'assets': 7501724000,
            'cur_assets': 1337094000,
            'cur_liab': 2236580000,
            'equity': 849579000,
            'cash': 746576000,
            'cash_flow_op': 39948000,
            'cash_flow_inv': -25187000,
            'cash_flow_fin': -42175000
        })

    def test_spex_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12239/000141588913001019/spex-20130331.xml')
        self.assert_item(item, {
            'symbol': 'SPEX',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 5761,
            'op_income': -910547,
            'net_income': -3696570,
            'eps_basic': -5.35,
            'eps_diluted': None,
            'dividend': 0.0,
            'assets': 3572989,
            'cur_assets': 3535555,
            'cur_liab': 453858,
            'equity': 2857993,
            'cash': 3448526,
            'cash_flow_op': -1049711,
            'cash_flow_inv': None,
            'cash_flow_fin': None
        })

    def test_strza_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793413000015/strza-20121231.xml')
        self.assert_item(item, {
            'symbol': 'STRZA',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 1630696000,
            'op_income': 405404000,
            'net_income': 254484000,
            'eps_basic': None,
            'eps_diluted': None,
            'dividend': 0.0,
            'assets': 2176050000,
            'cur_assets': 1376911000,
            'cur_liab': 330451000,
            'equity': 1302144000,
            'cash': 749774000,
            'cash_flow_op': 292077000,
            'cash_flow_inv': -16214000,
            'cash_flow_fin': -626101000
        })

    def test_stx_20120928(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1137789/000110465912072744/stx-20120928.xml')
        self.assert_item(item, {
            'symbol': 'STX',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2012-09-28',
            'revenues': 3732000000,
            'op_income': 624000000,
            'net_income': 582000000,
            'eps_basic': 1.48,
            'eps_diluted': 1.42,
            'dividend': 0.32,
            'assets': 9522000000,
            'cur_assets': 5749000000,
            'cur_liab': 2753000000,
            'equity': 3535000000,
            'cash': 1894000000,
            'cash_flow_op': 1132000000,
            'cash_flow_inv': -265000000,
            'cash_flow_fin': -681000000
        })

    def test_stx_20121228(self):
        # 'stx-20120928' is misnamed
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1137789/000110465913005497/stx-20120928.xml')
        self.assert_item(item, {
            'symbol': 'STX',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2012-12-28',
            'revenues': 3668000000,
            'op_income': 555000000,
            'net_income': 492000000,
            'eps_basic': 1.33,
            'eps_diluted': 1.3,
            'dividend': 0.7,
            'assets': 8742000000,
            'cur_assets': 5017000000,
            'cur_liab': 2643000000,
            'equity': 2925000000,
            'cash': 1383000000,
            'cash_flow_op': 1976000000,
            'cash_flow_inv': -453000000,
            'cash_flow_fin': -1849000000
        })

    def test_symc_20130628(self):
        # 'symc-20140628.xml' is misnamed
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/849399/000119312513312695/symc-20140628.xml')
        self.assert_item(item, {
            'symbol': 'SYMC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2014,
            'end_date': '2013-06-28',
            'revenues': 1709000000,
            'op_income': 224000000,
            'net_income': 157000000,
            'eps_basic': 0.23,
            'eps_diluted': 0.22,
            'dividend': 0.15,
            'assets': 13151000000,
            'cur_assets': 5179000000,
            'cur_liab': 4205000000,
            'equity': 5497000000,
            'cash': 3749000000,
            'cash_flow_op': 312000000,
            'cash_flow_inv': -29000000,
            'cash_flow_fin': -1192000000
        })

    def test_tgt_20130803(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/27419/000110465913066569/tgt-20130803.xml')
        self.assert_item(item, {
            'symbol': 'TGT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-08-03',
            'revenues': 17117000000,
            'op_income': 1161000000,
            'net_income': 611000000,
            'eps_basic': 0.96,
            'eps_diluted': 0.95,
            'dividend': 0.43,
            'assets': 44162000000,
            'cur_assets': 11403000000,
            'cur_liab': 12616000000,
            'equity': 16020000000,
            'cash': 1018000000,
            'cash_flow_op': 4109000000,
            'cash_flow_inv': 1269000000,
            'cash_flow_fin': -5148000000
        })

    def test_trv_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/86312/000110465910021504/trv-20100331.xml')
        self.assert_item(item, {
            'symbol': 'TRV',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 6119000000,
            'op_income': None,
            'net_income': 647000000,
            'eps_basic': 1.26,
            'eps_diluted': 1.25,
            'dividend': 0.0,
            'assets': 108696000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 26671000000,
            'cash': 251000000,
            'cash_flow_op': 531000000,
            'cash_flow_inv': 952000000,
            'cash_flow_fin': -1486000000
        })

    def test_tsla_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312511221497/tsla-20110630.xml')
        self.assert_item(item, {
            'symbol': 'TSLA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 58171000,
            'op_income': -58739000,
            'net_income': -58903000,
            'eps_basic': -0.60,
            'eps_diluted': -0.60,
            'dividend': 0.0,
            'assets': 646155000,
            'cur_assets': 417758000,
            'cur_liab': 138736000,
            'equity': 348452000,
            'cash': 319380000,
            'cash_flow_op': -65785000,
            'cash_flow_inv': -13011000,
            'cash_flow_fin': 298618000
        })

    def test_tsla_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312512137560/tsla-20111231.xml')
        self.assert_item(item, {
            'symbol': 'TSLA',
            'amend': True,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 204242000,
            'op_income': -251488000,
            'net_income': -254411000,
            'eps_basic': -2.53,
            'eps_diluted': -2.53,
            'dividend': 0.0,
            'assets': 713448000,
            'cur_assets': 372838000,
            'cur_liab': 191339000,
            'equity': 224045000,
            'cash': 255266000,
            'cash_flow_op': -114364000,
            'cash_flow_inv': -175928000,
            'cash_flow_fin': 446000000
        })

    def test_tsla_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312513327916/tsla-20130630.xml')
        self.assert_item(item, {
            'symbol': 'TSLA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 405139000,
            'op_income': -11792000,
            'net_income': -30502000,
            'eps_basic': -0.26,
            'eps_diluted': -0.26,
            'dividend': 0.0,
            'assets': 1887844000,
            'cur_assets': 1129542000,
            'cur_liab': 486545000,
            'equity': 629426000,
            'cash': 746057000,
            'cash_flow_op': 25886000,
            'cash_flow_inv': -82410000,
            'cash_flow_fin': 600691000
        })

    def test_utmd_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/706698/000109690612002585/utmd-20111231.xml')
        self.assert_item(item, {
            'symbol': 'UTMD',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 37860000,
            'op_income': 11842000,
            'net_income': 7414000,
            'eps_basic': 2.04,
            'eps_diluted': 2.03,
            'dividend': 0.0,
            'assets': 76389000,
            'cur_assets': 17016000,
            'cur_liab': 9631000,
            'equity': 40757000,
            'cash': 6534000,
            'cash_flow_op': 11365000,
            'cash_flow_inv': -26685000,
            'cash_flow_fin': 18078000
        })

    def test_vel_pe_20130930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/103682/000119312513427104/d-20130930.xml')
        self.assert_item(item, {
            'symbol': 'VEL - PE',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2013,
            'end_date': '2013-09-30',
            'revenues': 3432000000,
            'op_income': 1034000000,
            'net_income': 569000000,
            'eps_basic': 0.98,
            'eps_diluted': 0.98,
            'dividend': 0.5625,
            'assets': 48488000000,
            'cur_assets': 5210000000,
            'cur_liab': 6453000000,
            'equity': 11242000000,
            'cash': 287000000,
            'cash_flow_op': 2950000000,
            'cash_flow_inv': -2348000000,
            'cash_flow_fin': -563000000
        })

    def test_via_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312509221448/via-20090930.xml')
        self.assert_item(item, {
            'symbol': 'VIA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 3317000000,
            'op_income': 784000000,
            'net_income': 463000000,
            'eps_basic': 0.76,
            'eps_diluted': 0.76,
            'dividend': 0.0,
            'assets': 21307000000,
            'cur_assets': 3605000000,
            'cur_liab': 3707000000,
            'equity': 8044000000,
            'cash': 249000000,
            'cash_flow_op': 732000000,
            'cash_flow_inv': -117000000,
            'cash_flow_fin': -1169000000
        })

    def test_via_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312510028165/via-20091231.xml')
        self.assert_item(item, {
            'symbol': 'VIA',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 13619000000,
            'op_income': 2904000000,
            'net_income': 1611000000,
            'eps_basic': 2.65,
            'eps_diluted': 2.65,
            'dividend': 0.0,
            'assets': 21900000000,
            'cur_assets': 4430000000,
            'cur_liab': 3751000000,
            'equity': 8677000000,
            'cash': 298000000,
            'cash_flow_op': 1151000000,
            'cash_flow_inv': -274000000,
            'cash_flow_fin': -1388000000
        })

    def test_via_20120630(self):
        # 'via-20120401.xml' is misnamed
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312512333732/via-20120401.xml')
        self.assert_item(item, {
            'symbol': 'VIA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-06-30',
            'revenues': 3241000000,
            'op_income': 903000000,
            'net_income': 534000000,
            'eps_basic': 1.02,
            'eps_diluted': 1.01,
            'dividend': 0.275,
            'assets': 21958000000,
            'cur_assets': 4511000000,
            'cur_liab': 3716000000,
            'equity': 7473000000,
            'cash': 774000000,
            'cash_flow_op': 1736000000,
            'cash_flow_inv': -212000000,
            'cash_flow_fin': -1750000000
        })

    def test_vno_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/899689/000089968909000034/vno-20090630.xml')
        self.assert_item(item, {
            'symbol': 'VNO',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'FY',  # mismarked in doc, actually should be Q2
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 678385000,
            'op_income': 221139000,
            'net_income': -51904000,
            'eps_basic': -0.3,
            'eps_diluted': -0.3,
            'dividend': 0.95,
            'assets': 21831857000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 7122175000,
            'cash': 2068498000,
            'cash_flow_op': 379439000,
            'cash_flow_inv': -219310000,
            'cash_flow_fin': 381516000
        })

    def test_vno_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/899689/000089968912000004/vno-20111231.xml')
        self.assert_item(item, {
            'symbol': 'VNO',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 2915665000,
            'op_income': 856153000,
            'net_income': 601771000,
            'eps_basic': 3.26,
            'eps_diluted': 3.23,
            'dividend': 0.0,
            'assets': 20446487000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 7508447000,
            'cash': 606553000,
            'cash_flow_op': 702499000,
            'cash_flow_inv': -164761000,
            'cash_flow_fin': -621974000
        })

    def test_vrsk_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1442145/000119312512441544/vrsk-20120930.xml')
        self.assert_item(item, {
            'symbol': 'VRSK',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 398863000,
            'op_income': 155251000,
            'net_income': 82911000,
            'eps_basic': 0.5,
            'eps_diluted': 0.48,
            'dividend': 0.0,
            'assets': 2303433000,
            'cur_assets': 361337000,
            'cur_liab': 668257000,
            'equity': 142048000,
            'cash': 97770000,
            'cash_flow_op': 320997000,
            'cash_flow_inv': -838704000,
            'cash_flow_fin': 424004000
        })

    def test_wat_20120929(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1000697/000119312512448069/wat-20120929.xml')
        self.assert_item(item, {
            'symbol': 'WAT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-29',
            'revenues': 449952000,
            'op_income': 121745000,
            'net_income': 99109000,
            'eps_basic': 1.13,
            'eps_diluted': 1.12,
            'dividend': 0.0,
            'assets': 2997140000,
            'cur_assets': 2137498000,
            'cur_liab': 767562000,
            'equity': 1329879000,
            'cash': 356293000,
            'cash_flow_op': 317627000,
            'cash_flow_inv': -298851000,
            'cash_flow_fin': -53396000
        })

    def test_wec_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/783325/000010781513000080/wec-20130331.xml')
        self.assert_item(item, {
            'symbol': 'WEC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 1275200000,
            'op_income': 321000000,
            'net_income': 176600000,
            'eps_basic': 0.77,
            'eps_diluted': 0.76,
            'dividend': 0.34,
            'assets': 14295300000,
            'cur_assets': 1313800000,
            'cur_liab': 1278100000,
            'equity': 8675000000,
            'cash': 24700000,
            'cash_flow_op': 330300000,
            'cash_flow_inv': -145300000,
            'cash_flow_fin': -195900000
        })

    def test_wec_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/783325/000010781513000112/wec-20130630.xml')
        self.assert_item(item, {
            'symbol': 'WEC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 1012300000,
            'op_income': 229500000,
            'net_income': 119000000,
            'eps_basic': 0.52,
            'eps_diluted': 0.52,
            'dividend': 0.34,
            'assets': 14317000000,
            'cur_assets': 1271100000,
            'cur_liab': 1280700000,
            'equity': 8609000000,
            'cash': 21000000,
            'cash_flow_op': 681500000,
            'cash_flow_inv': -336600000,
            'cash_flow_fin': -359500000
        })

    def test_wfm_20120115(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/865436/000144530512000434/wfm-20120115.xml')
        self.assert_item(item, {
            'symbol': 'WFM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-01-15',
            'revenues': 3390940000,
            'op_income': 190338000,
            'net_income': 118327000,
            'eps_basic': 0.66,
            'eps_diluted': 0.65,
            'dividend': 0.14,
            'assets': 4528241000,
            'cur_assets': 1677087000,
            'cur_liab': 896972000,
            'equity': 3182747000,
            'cash': 529954000,
            'cash_flow_op': 260896000,
            'cash_flow_inv': -6963000,
            'cash_flow_fin': 63562000
        })

    def test_xel_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/72903/000110465910024080/xel-20100331.xml')
        self.assert_item(item, {
            'symbol': 'XEL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 2807462000,
            'op_income': 403665000,
            'net_income': 166058000,
            'eps_basic': 0.36,
            'eps_diluted': 0.36,
            'dividend': 0.25,
            'assets': 25334501000,
            'cur_assets': 2344294000,
            'cur_liab': 2759838000,
            'equity': 7355871000,
            'cash': 79504000,
            'cash_flow_op': 555539000,
            'cash_flow_inv': -460112000,
            'cash_flow_fin': -121731000
        })

    def test_xel_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/72903/000114036111012444/xel-20101231.xml')
        self.assert_item(item, {
            'symbol': 'XEL',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 10310947000,
            'op_income': 1619969000,
            'net_income': 751593000,
            'eps_basic': 1.63,
            'eps_diluted': 1.62,
            'dividend': 1.0,
            'assets': 27387690000,
            'cur_assets': 2732643000,
            'cur_liab': 2536533000,
            'equity': 8083519000,
            'cash': 108437000,
            'cash_flow_op': 1893942000,
            'cash_flow_inv': -2806724000,
            'cash_flow_fin': 905571000
        })

    def test_xom_20110331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000119312511127973/xom-20110331.xml')
        self.assert_item(item, {
            'symbol': 'XOM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2011-03-31',
            'revenues': 114004000000,
            'op_income': None,
            'net_income': 10650000000,
            'eps_basic': 2.14,
            'eps_diluted': 2.14,
            'dividend': 0.44,
            'assets': 319533000000,
            'cur_assets': 72022000000,
            'cur_liab': 73576000000,
            'equity': 157531000000,
            'cash': 12833000000,
            'cash_flow_op': 16856000000,
            'cash_flow_inv': -5353000000,
            'cash_flow_fin': -6749000000
        })

    def test_xom_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000119312512078102/xom-20111231.xml')
        self.assert_item(item, {
            'symbol': 'XOM',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 467029000000,
            'op_income': None,
            'net_income': 41060000000,
            'eps_basic': 8.43,
            'eps_diluted': 8.42,
            'dividend': 1.85,
            'assets': 331052000000,
            'cur_assets': 72963000000,
            'cur_liab': 77505000000,
            'equity': 160744000000,
            'cash': 12664000000,
            'cash_flow_op': 55345000000,
            'cash_flow_inv': -22165000000,
            'cash_flow_fin': -28256000000
        })

    def test_xom_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000003408813000035/xom-20130630.xml')
        self.assert_item(item, {
            'symbol': 'XOM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 106469000000,
            'op_income': None,
            'net_income': 6860000000,
            'eps_basic': 1.55,
            'eps_diluted': 1.55,
            'dividend': 0.63,
            'assets': 341615000000,
            'cur_assets': 62844000000,
            'cur_liab': 72688000000,
            'equity': 171588000000,
            'cash': 4609000000,
            'cash_flow_op': 21275000000,
            'cash_flow_inv': -18547000000,
            'cash_flow_fin': -7409000000
        })

    def test_xray_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/818479/000114420410009164/xray-20091231.xml')
        self.assert_item(item, {
            'symbol': 'XRAY',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 2159916000,
            'op_income': 381187000,
            'net_income': 274258000,
            'eps_basic': 1.85,
            'eps_diluted': 1.83,
            'dividend': 0.2,
            'assets': 3087932000,
            'cur_assets': 1217796000,
            'cur_liab': 444556000,
            'equity': 1906958000,
            'cash': 450348000,
            'cash_flow_op': 362489000,
            'cash_flow_inv': -53399000,
            'cash_flow_fin': -71420000
        })

    def test_xrx_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/108772/000119312510043079/xrx-20091231.xml')
        self.assert_item(item, {
            'symbol': 'XRX',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 15179000000,
            'op_income': None,
            'net_income': 485000000,
            'eps_basic': 0.56,
            'eps_diluted': 0.55,
            'dividend': 0.0,
            'assets': 24032000000,
            'cur_assets': 9731000000,
            'cur_liab': 4461000000,
            'equity': 7191000000,
            'cash': 3799000000,
            'cash_flow_op': 2208000000,
            'cash_flow_inv': -343000000,
            'cash_flow_fin': 692000000
        })

    def test_zmh_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1136869/000095012309035693/zmh-20090630.xml')
        self.assert_item(item, {
            'symbol': 'ZMH',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 1019900000,
            'op_income': 296499999.99999988,
            'net_income': 210099999.99999988,  # Wired number, but it's actually in the filing
            'eps_basic': 0.98,
            'eps_diluted': 0.98,
            'dividend': 0.0,
            'assets': 7462100000.000001,
            'cur_assets': 2328700000.0000005,
            'cur_liab': 669200000,
            'equity': 5805600000,
            'cash': 277500000,
            'cash_flow_op': 379700000.00000018,
            'cash_flow_inv': -174300000.00000003,
            'cash_flow_fin': -142000000.00000003
        })


================================================
FILE: pystock_crawler/tests/test_spiders_edgar.py
================================================
import os
import tempfile

from scrapy.http import HtmlResponse, XmlResponse

from pystock_crawler.spiders.edgar import EdgarSpider, URLGenerator
from pystock_crawler.tests.base import TestCaseBase


def make_url(symbol, start_date='', end_date=''):
    '''A URL that lists all 10-Q and 10-K filings of a company.'''
    return 'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=%s&type=10-&dateb=%s&datea=%s&owner=exclude&count=300' \
           % (symbol, end_date, start_date)


def make_link_html(href, text=u'Link'):
    return u'<a href="%s">%s</a>' % (href, text)


class URLGeneratorTest(TestCaseBase):

    def test_no_dates(self):
        urls = URLGenerator(('FB', 'GOOG'))
        self.assertEqual(list(urls), [
            make_url('FB'), make_url('GOOG')
        ])

    def test_with_start_date(self):
        urls = URLGenerator(('AAPL', 'AMZN', 'GLD'), start_date='20120215')
        self.assertEqual(list(urls), [
            make_url('AAPL', start_date='20120215'),
            make_url('AMZN', start_date='20120215'),
            make_url('GLD', start_date='20120215')
        ])

    def test_with_end_date(self):
        urls = URLGenerator(('TSLA', 'USO', 'MMM'), end_date='20110530')
        self.assertEqual(list(urls), [
            make_url('TSLA', end_date='20110530'),
            make_url('USO', end_date='20110530'),
            make_url('MMM', end_date='20110530')
        ])

    def test_with_start_and_end_dates(self):
        urls = URLGenerator(('DDD', 'AXP', 'KO'), start_date='20111230', end_date='20121230')
        self.assertEqual(list(urls), [
            make_url('DDD', '20111230', '20121230'),
            make_url('AXP', '20111230', '20121230'),
            make_url('KO', '20111230', '20121230')
        ])


class EdgarSpiderTest(TestCaseBase):

    def test_empty_creation(self):
        spider = EdgarSpider()
        self.assertEqual(spider.start_urls, [])

    def test_symbol_file(self):
        # create a mock file of a list of symbols
        f = tempfile.NamedTemporaryFile('w', delete=False)
        f.write('# Comment\nGOOG\nADBE\nLNKD\n#comment\nJPM\n')
        f.close()

        spider = EdgarSpider(symbols=f.name)
        urls = list(spider.start_urls)

        self.assertEqual(urls, [
            make_url('GOOG'), make_url('ADBE'),
            make_url('LNKD'), make_url('JPM')
        ])

        os.remove(f.name)

    def test_invalid_dates(self):
        with self.assertRaises(ValueError):
            EdgarSpider(startdate='12345678')

        with self.assertRaises(ValueError):
            EdgarSpider(enddate='12345678')

    def test_symbol_file_and_dates(self):
        # create a mock file of a list of symbols
        f = tempfile.NamedTemporaryFile('w', delete=False)
        f.write('# Comment\nT\nCBS\nWMT\n')
        f.close()

        spider = EdgarSpider(symbols=f.name, startdate='20110101', enddate='20130630')
        urls = list(spider.start_urls)

        self.assertEqual(urls, [
            make_url('T', '20110101', '20130630'),
            make_url('CBS', '20110101', '20130630'),
            make_url('WMT', '20110101', '20130630')
        ])

        os.remove(f.name)

    def test_parse_company_filing_page(self):
        '''
        Parse the page that lists all filings of a company.

        Example:
        http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001288776&type=10-&dateb=&owner=exclude&count=40

        '''
        spider = EdgarSpider()
        spider._follow_links = True  # HACK

        body = '''
            <html><body>
            <a href="http://example.com/">Useless Link</a>
            <a href="/Archives/edgar/data/abc-index.htm">Link</a>
            <a href="/Archives/edgar/data/123-index.htm">Link</a>
            <a href="/Archives/edgar/data/123.htm">Useless Link</a>
            <a href="/Archives/edgar/data/123/abc-index.htm">Link</a>
            <a href="/Archives/edgar/data/123/456/abc123-index.htm">Link</a>
            <a href="/Archives/edgar/123/abc-index.htm">Uselss Link</a>
            <a href="/Archives/edgar/data/123/456/789/HELLO-index.htm">Link</a>
            <a href="/Archives/hello-index.html">Useless Link</a>
            </body></html>
        '''

        response = HtmlResponse('http://sec.gov/mock', body=body)
        requests = spider.parse(response)
        urls = [r.url for r in requests]

        self.assertEqual(urls, [
            'http://sec.gov/Archives/edgar/data/abc-index.htm',
            'http://sec.gov/Archives/edgar/data/123-index.htm',
            'http://sec.gov/Archives/edgar/data/123/abc-index.htm',
            'http://sec.gov/Archives/edgar/data/123/456/abc123-index.htm',
            'http://sec.gov/Archives/edgar/data/123/456/789/HELLO-index.htm'
        ])

    def test_parse_quarter_or_annual_page(self):
        '''
        Parse the page that lists filings of a quater or a year of a company.

        Example:
        http://www.sec.gov/Archives/edgar/data/1288776/000128877613000055/0001288776-13-000055-index.htm

        '''
        spider = EdgarSpider()
        spider._follow_links = True  # HACK

        body = '''
            <html><body>
            <a href="http://example.com">Useless Link</a>
            <a href="/Archives/edgar/data/123/abc-20130630.xml">Link</a>
            <a href="/Archives/edgar/123/456/abc123-20130630.xml">Useless Link</a>
            <a href="/Archives/edgar/data/456/789/hello-20130630.xml">Link</a>
            <a href="/Archives/edgar/123/456/hello-20130630.xml">Useless Link</a>
            <a href="/Archives/data/123/456/hello-20130630.xml">Useless Link</a>
            <a href="/Archives/edgar/data/123/456/hello-201306300.xml">Useless Link</a>
            <a href="/Archives/edgar/data/123/456/xyz-20130630.html">Link</a>
            </body></html>
        '''

        response = HtmlResponse('http://sec.gov/mock', body=body)
        requests = spider.parse(response)
        urls = [r.url for r in requests]

        self.assertEqual(urls, [
            'http://sec.gov/Archives/edgar/data/123/abc-20130630.xml',
            'http://sec.gov/Archives/edgar/data/456/789/hello-20130630.xml'
        ])

    def test_parse_xml_report(self):
        '''Parse XML 10-Q or 10-K report.'''
        spider = EdgarSpider()
        spider._follow_links = True  # HACK

        body = '''
            <?xml version="1.0">
            <xbrl xmlns="http://www.xbrl.org/2003/instance"
                  xmlns:xbrli="http://www.xbrl.org/2003/instance"
                  xmlns:dei="http://xbrl.sec.gov/dei/2011-01-31"
                  xmlns:us-gaap="http://fasb.org/us-gaap/2011-01-31">

              <context id="c1">
                <startDate>2013-03-31</startDate>
                <endDate>2013-06-28</endDate>
              </context>

              <dei:AmendmentFlag contextRef="c1">false</dei:AmendmentFlag>
              <dei:DocumentType contextRef="c1">10-Q</dei:DocumentType>
              <dei:DocumentFiscalPeriodFocus contextRef="c1">Q2</dei:DocumentFiscalPeriodFocus>
              <dei:DocumentPeriodEndDate contextRef="c1">2013-06-28</dei:DocumentPeriodEndDate>
              <dei:DocumentFiscalYearFocus>2013</dei>

              <us-gaap:Revenues contextRef="c1">100</us-gaap:Revenues>
              <us-gaap:NetIncomeLoss contextRef="c1">200</us-gaap:NetIncomeLoss>
              <us-gaap:EarningsPerShareBasic contextRef="c1">0.2</us-gaap:EarningsPerShareBasic>
              <us-gaap:EarningsPerShareDiluted contextRef="c1">0.19</us-gaap:EarningsPerShareDiluted>
              <us-gaap:CommonStockDividendsPerShareDeclared contextRef="c1">0.07</us-gaap:CommonStockDividendsPerShareDeclared>

              <us-gaap:Assets contextRef="c1">1600</us-gaap:Assets>
              <us-gaap:StockholdersEquity contextRef="c1">300</us-gaap:StockholdersEquity>
              <us-gaap:CashAndCashEquivalentsAtCarryingValue contextRef="c1">150</us-gaap:CashAndCashEquivalentsAtCarryingValue>
            </xbrl>
        '''

        response = XmlResponse('http://sec.gov/Archives/edgar/data/123/abc-20130720.xml', body=body)
        item = spider.parse_10qk(response)

        self.assert_item(item, {
            'symbol': 'ABC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-28',
            'revenues': 100.0,
            'net_income': 200.0,
            'eps_basic': 0.2,
            'eps_diluted': 0.19,
            'dividend': 0.07,
            'assets': 1600.0,
            'equity': 300.0,
            'cash': 150.0
        })


================================================
FILE: pystock_crawler/tests/test_spiders_nasdaq.py
================================================
from scrapy.http import TextResponse

from pystock_crawler.spiders.nasdaq import NasdaqSpider
from pystock_crawler.tests.base import TestCaseBase


class NasdaqSpiderTest(TestCaseBase):

    def test_parse(self):
        spider = NasdaqSpider()

        body = ('"Symbol","Name","Doesnt Matter",\n'
                '"DDD","3D Systems Corporation","50.5",\n'
                '"VNO","Vornado Realty Trust","103.5",\n'
                '"VNO^G","Vornado Realty Trust","25.21",\n'
                '"WBS","Webster Financial Corporation","29.71",\n'
                '"WBS/WS","Webster Financial Corporation","13.07",\n'
                '"AAA-A","Some Fake Company","1234.0",')
        response = TextResponse('http://www.nasdaq.com/dummy_url', body=body)
        items = list(spider.parse(response))

        self.assertEqual(len(items), 3)
        self.assert_item(items[0], {
            'symbol': 'DDD',
            'name': '3D Systems Corporation'
        })
        self.assert_item(items[1], {
            'symbol': 'VNO',
            'name': 'Vornado Realty Trust'
        })
        self.assert_item(items[2], {
            'symbol': 'WBS',
            'name': 'Webster Financial Corporation'
        })


================================================
FILE: pystock_crawler/tests/test_spiders_yahoo.py
================================================
import os
import tempfile

from scrapy.http import TextResponse

from pystock_crawler.spiders.yahoo import make_url, YahooSpider
from pystock_crawler.tests.base import TestCaseBase


class MakeURLTest(TestCaseBase):

    def test_no_dates(self):
        self.assertEqual(make_url('YHOO'), (
            'http://ichart.finance.yahoo.com/table.csv?'
            's=YHOO&d=&e=&f=&g=d&a=&b=&c=&ignore=.csv'
        ))

    def test_only_start_date(self):
        self.assertEqual(make_url('GOOG', start_date='20131122'), (
            'http://ichart.finance.yahoo.com/table.csv?'
            's=GOOG&d=&e=&f=&g=d&a=10&b=22&c=2013&ignore=.csv'
        ))

    def test_only_end_date(self):
        self.assertEqual(make_url('AAPL', end_date='20131122'), (
            'http://ichart.finance.yahoo.com/table.csv?'
            's=AAPL&d=10&e=22&f=2013&g=d&a=&b=&c=&ignore=.csv'
        ))

    def test_start_and_end_dates(self):
        self.assertEqual(make_url('TSLA', start_date='20120305', end_date='20131122'), (
            'http://ichart.finance.yahoo.com/table.csv?'
            's=TSLA&d=10&e=22&f=2013&g=d&a=2&b=5&c=2012&ignore=.csv'
        ))


class YahooSpiderTest(TestCaseBase):

    def test_empty_creation(self):
        spider = YahooSpider()
        self.assertEqual(list(spider.start_urls), [])

    def test_inline_symbols(self):
        spider = YahooSpider(symbols='C')
        self.assertEqual(list(spider.start_urls), [make_url('C')])

        spider = YahooSpider(symbols='KO,DIS,ATVI')
        self.assertEqual(list(spider.start_urls), [
            make_url(symbol) for symbol in ('KO', 'DIS', 'ATVI')
        ])

    def test_symbol_file(self):
        try:
            # Create a mock file of a list of symbols
            with tempfile.NamedTemporaryFile('w', delete=False) as f:
                f.write('# Comment\nGOOG\tGoogle Inc.\nAAPL\nFB  Facebook.com\n#comment\nAMZN\n')

            spider = YahooSpider(symbols=f.name)
            self.assertEqual(list(spider.start_urls), [
                make_url(symbol) for symbol in ('GOOG', 'AAPL', 'FB', 'AMZN')
            ])
        finally:
            os.remove(f.name)

    def test_illegal_dates(self):
        with self.assertRaises(ValueError):
            YahooSpider(startdate='12345678')

        with self.assertRaises(ValueError):
            YahooSpider(enddate='12345678')

    def test_parse(self):
        spider = YahooSpider()

        body = ('Date,Open,High,Low,Close,Volume,Adj Close\n'
                '2013-11-22,121.58,122.75,117.93,121.38,11096700,121.38\n'
                '2013-09-06,168.57,169.70,165.15,166.97,8619700,166.97\n'
                '2013-06-26,103.80,105.87,102.66,105.72,6602600,105.72\n')
        response = TextResponse(make_url('YHOO'), body=body)
        items = list(spider.parse(response))

        self.assertEqual(len(items), 3)
        self.assert_item(items[0], {
            'symbol': 'YHOO',
            'date': '2013-11-22',
            'open': 121.58,
            'high': 122.75,
            'low': 117.93,
            'close': 121.38,
            'volume': 11096700,
            'adj_close': 121.38
        })
        self.assert_item(items[1], {
            'symbol': 'YHOO',
            'date': '2013-09-06',
            'open': 168.57,
            'high': 169.70,
            'low': 165.15,
            'close': 166.97,
            'volume': 8619700,
            'adj_close': 166.97
        })
        self.assert_item(items[2], {
            'symbol': 'YHOO',
            'date': '2013-06-26',
            'open': 103.80,
            'high': 105.87,
            'low': 102.66,
            'close': 105.72,
            'volume': 6602600,
            'adj_close': 105.72
        })


================================================
FILE: pystock_crawler/tests/test_utils.py
================================================
import cStringIO
import os

from pystock_crawler import utils
from pystock_crawler.tests.base import SAMPLE_DATA_DIR, TestCaseBase


class UtilsTest(TestCaseBase):

    def test_check_date_arg(self):
        utils.check_date_arg('19830305')
        utils.check_date_arg('19851122')
        utils.check_date_arg('19980720')
        utils.check_date_arg('20140212')

        # OK to pass an empty argument
        utils.check_date_arg('')

        with self.assertRaises(ValueError):
            utils.check_date_arg('1234')

        with self.assertRaises(ValueError):
            utils.check_date_arg('2014111')

        with self.assertRaises(ValueError):
            utils.check_date_arg('20141301')

        with self.assertRaises(ValueError):
            utils.check_date_arg('20140132')

    def test_parse_limit_arg(self):
        self.assertEqual(utils.parse_limit_arg(''), (0, None))
        self.assertEqual(utils.parse_limit_arg('11,22'), (11, 22))

        with self.assertRaises(ValueError):
            utils.parse_limit_arg('11,22,33')

        with self.assertRaises(ValueError):
            utils.parse_limit_arg('abc')

    def test_load_symbols(self):
        try:
            filename = os.path.join(SAMPLE_DATA_DIR, 'test_symbols.txt')
            with open(filename, 'w') as f:
                f.write('AAPL Apple Inc.\nGOOG\tGoogle Inc.\n# Comment\nFB\nTWTR\nAMZN\nSPY\n\nYHOO\n# The end\n')

            symbols = list(utils.load_symbols(filename))
            self.assertEqual(symbols, ['AAPL', 'GOOG', 'FB', 'TWTR', 'AMZN', 'SPY', 'YHOO'])
        finally:
            os.remove(filename)

    def test_parse_csv(self):
        f = cStringIO.StringIO('name,age\nAvon,30\nOmar,29\nJoe,45\n')
        items = list(utils.parse_csv(f))
        self.assertEqual(items, [
            { 'name': 'Avon', 'age': '30' },
            { 'name': 'Omar', 'age': '29' },
            { 'name': 'Joe', 'age': '45' }
        ])


================================================
FILE: pystock_crawler/throttle.py
================================================
import logging

from scrapy.exceptions import NotConfigured
from scrapy import signals


class PassiveThrottle(object):
    '''
    Scrapy's AutoThrottle adds too much download delay on edgar spider, making
    it too slow.

    PassiveThrottle takes a more "passive" approach. It adds download delay
    only if there is an error response.

    '''
    def __init__(self, crawler):
        self.crawler = crawler
        if not crawler.settings.getbool('PASSIVETHROTTLE_ENABLED'):
            raise NotConfigured

        self.debug = crawler.settings.getbool("PASSIVETHROTTLE_DEBUG")
        self.stats = crawler.stats
        crawler.signals.connect(self._spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(self._response_downloaded, signal=signals.response_downloaded)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def _spider_opened(self, spider):
        self.mindelay = self._min_delay(spider)
        self.maxdelay = self._max_delay(spider)
        self.retry_http_codes = self._retry_http_codes()

        self.stats.set_value('delay_count', 0)

    def _min_delay(self, spider):
        s = self.crawler.settings
        return getattr(spider, 'download_delay', 0.0) or \
            s.getfloat('DOWNLOAD_DELAY')

    def _max_delay(self, spider):
        return self.crawler.settings.getfloat('PASSIVETHROTTLE_MAX_DELAY', 60.0)

    def _retry_http_codes(self):
        return self.crawler.settings.getlist('RETRY_HTTP_CODES', [])

    def _response_downloaded(self, response, request, spider):
        key, slot = self._get_slot(request, spider)
        if slot is None:
            return

        olddelay = slot.delay
        self._adjust_delay(slot, response)
        if self.debug:
            diff = slot.delay - olddelay
            conc = len(slot.transferring)
            msg = "slot: %s | conc:%2d | delay:%5d ms (%+d)" % \
                  (key, conc, slot.delay * 1000, diff * 1000)
            spider.log(msg, level=logging.INFO)

    def _get_slot(self, request, spider):
        key = request.meta.get('download_slot')
        return key, self.crawler.engine.downloader.slots.get(key)

    def _adjust_delay(self, slot, response):
        """Define delay adjustment policy"""
        if response.status in self.retry_http_codes:
            new_delay = max(slot.delay, 1) * 4
            new_delay = max(new_delay, self.mindelay)
            new_delay = min(new_delay, self.maxdelay)
            slot.delay = new_delay
            self.stats.inc_value('delay_count')
        elif response.status == 200:
            new_delay = max(slot.delay / 2, self.mindelay)
            if new_delay < 0.01:
                new_delay = 0
            slot.delay = new_delay


================================================
FILE: pystock_crawler/utils.py
================================================
import csv

from datetime import datetime


def check_date_arg(value, arg_name=None):
    if value:
        try:
            if len(value) != 8:
                raise ValueError
            datetime.strptime(value, '%Y%m%d')
        except ValueError:
            raise ValueError("Option '%s' must be in YYYYMMDD format, input is '%s'" % (arg_name, value))


def parse_limit_arg(value):
    if value:
        tokens = value.split(',')
        try:
            if len(tokens) != 2:
                raise ValueError
            return int(tokens[0]), int(tokens[1])
        except ValueError:
            raise ValueError("Option 'limit' must be in START,COUNT format, input is '%s'" % value)
    return 0, None


def load_symbols(file_path):
    symbols = []
    with open(file_path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                symbol = line.split()[0]
                symbols.append(symbol)
    return symbols


def parse_csv(file_like):
    reader = csv.reader(file_like)
    headers = reader.next()
    for row in reader:
        item = {}
        for i, value in enumerate(row):
            header = headers[i]
            item[header] = value
        yield item


================================================
FILE: pytest.ini
================================================
[pytest]
addopts = --cov-report term-missing --cov pystock_crawler --cov bin pystock_crawler/tests/


================================================
FILE: requirements-test.txt
================================================
envoy
pytest
pytest-cov
requests


================================================
FILE: requirements.txt
================================================
docopt==0.6.2
leveldb==0.193
Scrapy==0.24.4
service-identity==1.0.0


================================================
FILE: scrapy.cfg
================================================
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/en/latest/topics/scrapyd.html

[settings]
default = pystock_crawler.settings

[deploy]
#url = http://localhost:6800/
project = pystock_crawler


================================================
FILE: setup.py
================================================
try:
    from setuptools import setup
except ImportError:
    from distutils.core import setup

import codecs
import os
import re


here = os.path.abspath(os.path.dirname(__file__))


# Read the version number from a source file.
# Why read it, and not import?
# see https://groups.google.com/d/topic/pypa-dev/0PkjVpcxTzQ/discussion
def find_version(*file_paths):
    # Open in Latin-1 so that we avoid encoding errors.
    # Use codecs.open for Python 2 compatibility
    with codecs.open(os.path.join(here, *file_paths), 'r', 'latin1') as f:
        version_file = f.read()

    # The version line must have the form
    # __version__ = 'ver'
    version_match = re.search(r"^__version__ = ['\"]([^'\"]*)['\"]", version_file, re.M)
    if version_match:
        return version_match.group(1)
    raise RuntimeError('Unable to find version string')


def read_description(filename):
    with codecs.open(filename, encoding='utf-8') as f:
        return f.read()


def parse_requirements(filename):
    with open(filename) as f:
        content = f.read()
    return filter(lambda x: x and not x.startswith('#'), content.splitlines())


setup(
    name='pystock-crawler',
    version=find_version('pystock_crawler', '__init__.py'),
    url='https://github.com/eliangcs/pystock-crawler',
    description='Crawl and parse stock historical data',
    long_description=read_description('README.rst'),
    author='Chang-Hung Liang',
    author_email='eliang.cs@gmail.com',
    license='MIT',
    packages=['pystock_crawler', 'pystock_crawler.spiders'],
    scripts=['bin/pystock-crawler'],
    install_requires=parse_requirements('requirements.txt'),
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Environment :: Console',
        'Intended Audience :: Developers',
        'Intended Audience :: Financial and Insurance Industry',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2.7',
        'Topic :: Internet :: WWW/HTTP',
        'Topic :: Office/Business :: Financial :: Investment',
        'Topic :: Software Development :: Libraries :: Python Modules'
    ]
)