Repository: eliangcs/pystock-crawler
Branch: master
Commit: 8b803c8944f3
Files: 30
Total size: 216.6 KB

Directory structure:
gitextract_mp6yf35w/

├── .gitignore
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── README.rst
├── bin/
│   └── pystock-crawler
├── pystock_crawler/
│   ├── __init__.py
│   ├── exporters.py
│   ├── items.py
│   ├── loaders.py
│   ├── settings.py
│   ├── spiders/
│   │   ├── __init__.py
│   │   ├── edgar.py
│   │   ├── nasdaq.py
│   │   └── yahoo.py
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── test_cmdline.py
│   │   ├── test_loaders.py
│   │   ├── test_spiders_edgar.py
│   │   ├── test_spiders_nasdaq.py
│   │   ├── test_spiders_yahoo.py
│   │   └── test_utils.py
│   ├── throttle.py
│   └── utils.py
├── pytest.ini
├── requirements-test.txt
├── requirements.txt
├── scrapy.cfg
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*.csv
*.log
*.pyc
.coverage
.scrapy/
.~*
build/
dist/
pystock_crawler.egg-info/
pystock_crawler/tests/sample_data/


================================================
FILE: .travis.yml
================================================
language: python
python:
  - 2.7
branches:
  only:
    - master
install:
  - pip install -r requirements.txt
  - pip install -r requirements-test.txt
script:
  - py.test
after_success:
  - pip install python-coveralls
  - coveralls


================================================
FILE: LICENSE
================================================
The MIT License (MIT)

Copyright (c) 2013 Chang-Hung Liang

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


================================================
FILE: MANIFEST.in
================================================
include README.rst LICENSE requirements.txt

================================================
FILE: README.rst
================================================
pystock-crawler
===============

.. image:: https://badge.fury.io/py/pystock-crawler.png
    :target: http://badge.fury.io/py/pystock-crawler

.. image:: https://travis-ci.org/eliangcs/pystock-crawler.png?branch=master
    :target: https://travis-ci.org/eliangcs/pystock-crawler

.. image:: https://coveralls.io/repos/eliangcs/pystock-crawler/badge.png?branch=master
    :target: https://coveralls.io/r/eliangcs/pystock-crawler

``pystock-crawler`` is a utility for crawling historical data of US stocks,
including:

* Ticker symbols listed in NYSE, NASDAQ or AMEX from `NASDAQ.com`_
* Daily prices from `Yahoo Finance`_
* Fundamentals from 10-Q and 10-K filings (XBRL) on `SEC EDGAR`_


Example Output
--------------

NYSE ticker symbols::

    DDD   3D Systems Corporation
    MMM   3M Company
    WBAI  500.com Limited
    ...

Apple's daily prices::

    symbol,date,open,high,low,close,volume,adj_close
    AAPL,2014-04-28,572.80,595.75,572.55,594.09,23890900,594.09
    AAPL,2014-04-25,564.53,571.99,563.96,571.94,13922800,571.94
    AAPL,2014-04-24,568.21,570.00,560.73,567.77,27092600,567.77
    ...

Google's fundamentals::

    symbol,end_date,amend,period_focus,fiscal_year,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin
    GOOG,2009-06-30,False,Q2,2009,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0
    GOOG,2009-09-30,False,Q3,2009,10-Q,5944851000.0,2073718000.0,1638975000.0,5.18,5.13,0.0,37702845000.0,26353544000.0,2321774000.0,12087115000.0,33721753000.0,6584667000.0,-3245963000.0,74851000.0
    GOOG,2009-12-31,False,FY,2009,10-K,23650563000.0,8312186000.0,6520448000.0,20.62,20.41,0.0,40496778000.0,29166958000.0,2747467000.0,10197588000.0,36004224000.0,9316198000.0,-8019205000.0,233412000.0
    ...


Installation
------------

Prerequisites:

* Python 2.7

``pystock-crawler`` is based on Scrapy_, so you will also need to install
prerequisites such as lxml_ and libffi_ for Scrapy and its dependencies. On
Ubuntu, for example, you can install them like this::

    sudo apt-get update
    sudo apt-get install -y gcc python-dev libffi-dev libssl-dev libxml2-dev libxslt1-dev build-essential

See `Scrapy's installation guide`_ for more details.

After installing prerequisites, you can then install ``pystock-crawler`` with
``pip``::

    (sudo) pip install pystock-crawler


Quickstart
----------

**Example 1.** Fetch Google's and Yahoo's daily prices ordered by date::

    pystock-crawler prices GOOG,YHOO -o out.csv --sort

**Example 2.** Fetch daily prices of all companies listed in
``./symbols.txt``::

    pystock-crawler prices ./symbols.txt -o out.csv

**Example 3.** Fetch Facebook's fundamentals during 2013::

    pystock-crawler reports FB -o out.csv -s 20130101 -e 20131231

**Example 4.** Fetch fundamentals of all companies in ``./nyse.txt`` and direct
the log to ``./crawling.log``::

    pystock-crawler reports ./nyse.txt -o out.csv -l ./crawling.log

**Example 5.** Fetch all ticker symbols in NYSE, NASDAQ and AMEX::

    pystock-crawler symbols NYSE,NASDAQ,AMEX -o out.txt


Usage
-----

Type ``pystock-crawler -h`` to see command help::

    Usage:
      pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR]
                                          [--sort]
      pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                                       [-l LOGFILE] [-w WORKING_DIR] [--sort]
      pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                                        [-l LOGFILE] [-w WORKING_DIR]
                                        [-b BATCH_SIZE] [--sort]
      pystock-crawler (-h | --help)
      pystock-crawler (-v | --version)

    Options:
      -h --help       Show this screen
      -o OUTPUT       Output file
      -s YYYYMMDD     Start date [default: ]
      -e YYYYMMDD     End date [default: ]
      -l LOGFILE      Log output [default: ]
      -w WORKING_DIR  Working directory [default: .]
      -b BATCH_SIZE   Batch size [default: 500]
      --sort          Sort the result

There are three commands available:

* ``pystock-crawler symbols`` grabs ticker symbol lists
* ``pystock-crawler prices`` grabs daily prices
* ``pystock-crawler reports`` grabs fundamentals

``<exchanges>`` is a comma-separated string that specifies the stock exchanges
you want to include. Currently, NYSE, NASDAQ and AMEX are supported.

The output file of ``pystock-crawler symbols`` can be used as the ``<symbols>``
argument of the ``pystock-crawler prices`` and ``pystock-crawler reports``
commands.

``<symbols>`` can be an inline string separated with commas or a text file
that lists symbols line by line. For example, the inline string can be
something like ``AAPL,GOOG,FB``. And the text file may look like this::

    # This line is comment
    AAPL    Put anything you want here
    GOOG    Since the text here is ignored
    FB
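
As a rough illustration of the file format above, the following sketch reads
such a list in Python. The ``parse_symbols`` helper is hypothetical and is not
the crawler's own parser; it assumes that the symbol is the first
whitespace-separated token on each line and that lines starting with ``#`` are
comments:

```python
def parse_symbols(lines):
    """Return ticker symbols from an iterable of text lines.

    Hypothetical helper for illustration only: blank lines and lines
    starting with '#' are skipped, and everything after the first
    whitespace-separated token is ignored.
    """
    symbols = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        symbols.append(line.split()[0])
    return symbols


# Same content as the example file shown above
sample = [
    '# This line is a comment',
    'AAPL    Put anything you want here',
    'GOOG    Since the text here is ignored',
    'FB',
]
print(parse_symbols(sample))
```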

Use ``-o`` to specify the output file. For the ``pystock-crawler symbols``
command, the output is a plain text file. For ``pystock-crawler prices`` and
``pystock-crawler reports``, the output format is CSV.

``-l`` specifies the file that the crawling logs are written to. If it is not
specified, the logs go to stdout.

By default, the crawler uses the current directory as the working directory.
If you don't want to use the current directory, you can specify another one
with the ``-w`` option. The crawler keeps an HTTP cache in a directory named
``.scrapy`` under the working directory. The cache saves time by avoiding
repeated downloads of the same web pages. However, the cache can grow quite
large. If you don't need it, just delete the ``.scrapy`` directory after you
have finished crawling.

The ``-b`` option is only available for the ``pystock-crawler reports``
command. It splits a large symbol list into smaller batches. This is a
workaround for an unresolved bug (#2). Normally you don't have to specify
this option; the default value (500) works fine.
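
The batching can be pictured with the sketch below, which mirrors the
arithmetic in ``bin/pystock-crawler``: the number of batches is the symbol
count divided by the batch size, rounded up, and each batch is passed to the
spider as a ``limit=<start>,<batch_size>`` argument. The ``batch_limits`` name
is made up for this example:

```python
import math


def batch_limits(num_symbols, batch_size=500):
    """Return the (start, count) pairs used to crawl symbols in batches.

    Hypothetical helper: it only reproduces the arithmetic and does not
    run any crawl.
    """
    if batch_size <= 0:
        raise ValueError('batch_size must be a positive integer')
    # Round up so that a partial final batch is still crawled
    num_batches = int(math.ceil(num_symbols / float(batch_size)))
    return [(i * batch_size, batch_size) for i in range(num_batches)]


# 1200 symbols with the default batch size need three batches
print(batch_limits(1200))
```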

The rows in the output file are in an arbitrary order by default. Use the
``--sort`` option to sort them by symbol and date. If the output file is
large, avoid ``--sort``: it is slow and uses a lot of memory.
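
To see what sorting involves, here is a minimal sketch of ordering CSV rows
field by field while keeping the header row first. It holds every row in
memory, which is why sorting a large file is costly. The ``sort_csv_lines``
function is illustrative, not the crawler's implementation, and the sample
rows are made up:

```python
def sort_csv_lines(lines):
    """Sort CSV lines by their comma-separated fields, keeping the header first.

    Illustrative only: fields are compared as plain strings, left to right,
    so rows are ordered by symbol and then by date when those are the first
    two columns.
    """
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [header] + sorted(rows, key=lambda row: row.rstrip('\n').split(','))


# Made-up sample rows, deliberately out of order
lines = [
    'symbol,date,close\n',
    'BBB,2014-01-02,2.00\n',
    'AAA,2014-01-03,1.10\n',
    'AAA,2014-01-02,1.00\n',
]
print(''.join(sort_csv_lines(lines)))
```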


Developer Guide
---------------

Installing Dependencies
~~~~~~~~~~~~~~~~~~~~~~~
::

    pip install -r requirements.txt


Running Tests
~~~~~~~~~~~~~

Install test requirements::

    pip install -r requirements-test.txt

Then run the test::

    py.test

This will download the test data (a lot of XML/XBRL files) from
`SEC EDGAR`_ on the fly, so it will take some time and disk space. The test
data is saved to the ``pystock_crawler/tests/sample_data`` directory and is
reused the next time you run the tests. If you don't need it, just delete
the ``sample_data`` directory.


.. _libffi: https://sourceware.org/libffi/
.. _lxml: http://lxml.de/
.. _NASDAQ.com: http://www.nasdaq.com/
.. _Scrapy: http://scrapy.org/
.. _Scrapy's installation guide: http://doc.scrapy.org/en/latest/intro/install.html
.. _SEC EDGAR: http://www.sec.gov/edgar/searchedgar/companysearch.html
.. _virtualenv: http://www.virtualenv.org/
.. _virtualenvwrapper: http://virtualenvwrapper.readthedocs.org/
.. _Yahoo Finance: http://finance.yahoo.com/


================================================
FILE: bin/pystock-crawler
================================================
#!/usr/bin/env python
'''
Usage:
  pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR]
                                      [--sort]
  pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                                   [-l LOGFILE] [-w WORKING_DIR] [--sort]
  pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                                    [-l LOGFILE] [-w WORKING_DIR]
                                    [-b BATCH_SIZE] [--sort]
  pystock-crawler (-h | --help)
  pystock-crawler (-v | --version)

Options:
  -h --help       Show this screen
  -o OUTPUT       Output file
  -s YYYYMMDD     Start date [default: ]
  -e YYYYMMDD     End date [default: ]
  -l LOGFILE      Log output [default: ]
  -w WORKING_DIR  Working directory [default: .]
  -b BATCH_SIZE   Batch size [default: 500]
  --sort          Sort the result

'''
import codecs
import math
import os
import sys
import uuid

from contextlib import contextmanager
from docopt import docopt
from scrapy import log

try:
    import pystock_crawler
except ImportError:
    # For development environment
    sys.path.append(os.getcwd())
    import pystock_crawler


def random_string(length=5):
    # Honor the `length` argument instead of a hard-coded slice
    return uuid.uuid4().get_hex()[:length]


@contextmanager
def tmp_scrapy_cfg():
    content = '''# pystock_crawler scrapy.cfg
[settings]
default = pystock_crawler.settings

[deploy]
#url = http://localhost:6800/
project = pystock_crawler
'''
    filename = os.path.abspath('./scrapy.cfg')
    filename_bak = os.path.abspath('./scrapy-%s.cfg' % random_string())
    if os.path.exists(filename):
        log.msg(u'Renaming %s -> %s' % (filename, filename_bak))
        os.rename(filename, filename_bak)
    assert not os.path.exists(filename)
    log.msg(u'Creating temporary config: %s' % filename)
    with open(filename, 'w') as f:
        f.write(content)

    try:
        yield
    finally:
        # Restore the original state even if the wrapped command raises
        if os.path.exists(filename):
            log.msg(u'Deleting %s' % filename)
            os.remove(filename)
        if os.path.exists(filename_bak):
            log.msg(u'Renaming %s -> %s' % (filename_bak, filename))
            os.rename(filename_bak, filename)


def run_scrapy_command(cmd):
    log.msg('Command: %s' % cmd)
    with tmp_scrapy_cfg():
        os.system(cmd)


def count_symbols(symbols):
    if os.path.exists(symbols):
        # If `symbols` is a file
        with open(symbols) as f:
            count = 0
            for line in f:
                line = line.rstrip()
                if line and not line.startswith('#'):
                    count += 1
        return count

    # If `symbols` is a comma-separated string
    return len(symbols.split(','))


def merge_files(target, sources, ignore_header=False):
    log.msg(u'Merging files to %s' % target)
    with codecs.open(target, 'w', 'utf-8') as out:
        for i, source in enumerate(sources):
            with codecs.open(source, 'r', 'utf-8') as f:
                if ignore_header and i > 0:
                    try:
                        f.next()  # Ignore CSV header
                    except StopIteration:
                        continue  # Empty file; keep merging the rest
                out.write(f.read())

    # Delete source files
    for filename in sources:
        log.msg(u'Deleting %s' % filename)
        os.remove(filename)


def crawl_symbols(exchanges, output, log_file):
    command = 'scrapy crawl nasdaq -a exchanges="%s" -t symbollist' % exchanges

    if output:
        command += ' -o "%s"' % output
    if log_file:
        command += ' -s LOG_FILE="%s"' % log_file

    run_scrapy_command(command)


def crawl(spider, symbols, start_date, end_date, output, log_file, batch_size):
    command = 'scrapy crawl %s -a symbols="%s" -t csv' % (spider, symbols)

    if start_date:
        command += ' -a startdate=%s' % start_date
    if end_date:
        command += ' -a enddate=%s' % end_date
    if log_file:
        command += ' -s LOG_FILE="%s"' % log_file

    if spider == 'edgar':
        # When crawling edgar filings, run the scrapy command batch by batch to
        # work around issue #2
        num_symbols = count_symbols(symbols)
        num_batches = int(math.ceil(num_symbols / float(batch_size)))

        # Store sub-files so we can merge them later
        output_files = []

        for i in xrange(num_batches):
            start = i * batch_size
            batch_cmd = command + ' -a limit=%d,%d' % (start, batch_size)
            if output:
                filename = '%s.%d' % (output, i + 1)
                batch_cmd += ' -o "%s"' % filename
                output_files.append(filename)

            run_scrapy_command(batch_cmd)

        merge_files(output, output_files, ignore_header=True)
    else:
        if output:
            command += ' -o "%s"' % output
        run_scrapy_command(command)


def sort_symbols(filename):
    log.msg(u'Sorting: %s' % filename)

    with codecs.open(filename, 'r', 'utf-8') as f:
        lines = [line for line in f]

    lines = sorted(lines)

    with codecs.open(filename, 'w', 'utf-8') as f:
        f.writelines(lines)

    log.msg(u'Sorted: %s' % filename)


def sort_csv(filename):
    log.msg(u'Sorting: %s' % filename)

    with codecs.open(filename, 'r', 'utf-8') as f:
        try:
            headers = f.next()
        except StopIteration:
            log.msg(u'No need to sort empty file: %s' % filename)
            return
        lines = [line for line in f]

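    def line_cmp(line1, line2):
        a = line1.split(',')
        b = line2.split(',')
        # Compare field by field; fall back to the field count when one row
        # is a prefix of the other, so identical rows don't raise IndexError
        for x, y in zip(a, b):
            result = cmp(x, y)
            if result:
                return result
        return cmp(len(a), len(b))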

    lines = sorted(lines, cmp=line_cmp)

    with codecs.open(filename, 'w', 'utf-8') as f:
        f.write(headers)
        f.writelines(lines)

    log.msg(u'Sorted: %s' % filename)


def print_version():
    print 'pystock-crawler %s' % pystock_crawler.__version__


def main():
    args = docopt(__doc__)

    symbols = args.get('<symbols>')
    start_date = args.get('-s')
    end_date = args.get('-e')
    output = args.get('-o')
    log_file = args.get('-l')
    batch_size = args.get('-b')
    sorting = args.get('--sort')
    working_dir = args.get('-w')

    if args['prices']:
        spider = 'yahoo'
    elif args['reports']:
        spider = 'edgar'
    else:
        spider = None

    if symbols and os.path.exists(symbols):
        symbols = os.path.abspath(symbols)
    if output:
        output = os.path.abspath(output)
    if log_file:
        log_file = os.path.abspath(log_file)

    try:
        batch_size = int(batch_size)
        if batch_size <= 0:
            raise ValueError
    except ValueError:
        raise ValueError("BATCH_SIZE must be a positive integer, input is '%s'" % batch_size)

    try:
        os.chdir(working_dir)
    except OSError as err:
        sys.stderr.write('%s\n' % err)
        return

    if spider:
        log.start(logfile=log_file)
        crawl(spider, symbols, start_date, end_date, output, log_file, batch_size)
        if sorting and output:
            sort_csv(output)
    elif args['symbols']:
        log.start(logfile=log_file)
        exchanges = args.get('<exchanges>')
        crawl_symbols(exchanges, output, log_file)
        if sorting and output:
            sort_symbols(output)
    elif args['-v'] or args['--version']:
        print_version()


if __name__ == '__main__':
    main()


================================================
FILE: pystock_crawler/__init__.py
================================================
__version__ = '0.8.2'


================================================
FILE: pystock_crawler/exporters.py
================================================
from scrapy.conf import settings
from scrapy.contrib.exporter import BaseItemExporter, CsvItemExporter


class CsvItemExporter2(CsvItemExporter):
    '''
    The standard CsvItemExporter class does not pass the kwargs through to the
    CSV writer, resulting in EXPORT_FIELDS and EXPORT_ENCODING being ignored
    (EXPORT_EMPTY is not used by CSV).

    http://stackoverflow.com/questions/6943778/python-scrapy-how-to-get-csvitemexporter-to-write-columns-in-a-specific-order

    '''
    def __init__(self, *args, **kwargs):
        kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None
        kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')

        super(CsvItemExporter2, self).__init__(*args, **kwargs)

    def _write_headers_and_set_fields_to_export(self, item):
        # HACK: Override this private method to filter fields that are in
        # fields_to_export but not in item
        if self.include_headers_line:
            item_fields = item.fields.keys()
            if self.fields_to_export:
                self.fields_to_export = filter(lambda a: a in item_fields, self.fields_to_export)
            else:
                self.fields_to_export = item_fields
            self.csv_writer.writerow(self.fields_to_export)


class SymbolListExporter(BaseItemExporter):

    def __init__(self, file, **kwargs):
        self._configure(kwargs, dont_fail=True)
        self.file = file

    def export_item(self, item):
        self.file.write('%s\t%s\n' % (item['symbol'], item['name']))


================================================
FILE: pystock_crawler/items.py
================================================
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

from scrapy.item import Item, Field


class ReportItem(Item):
    # Trading symbol
    symbol = Field()

    # If this doc is an amendment to previously filed doc
    amend = Field()

    # Quarterly (10-Q) or annual (10-K) report
    doc_type = Field()

    # Q1, Q2, Q3, or FY for annual report
    period_focus = Field()

    fiscal_year = Field()
    end_date = Field()

    revenues = Field()
    op_income = Field()
    net_income = Field()

    eps_basic = Field()
    eps_diluted = Field()

    dividend = Field()

    # Balance sheet items
    assets = Field()
    cur_assets = Field()
    cur_liab = Field()
    equity = Field()
    cash = Field()

    # Cash flow from operating, investing, and financing
    cash_flow_op = Field()
    cash_flow_inv = Field()
    cash_flow_fin = Field()


class PriceItem(Item):
    # Trading symbol
    symbol = Field()

    # YYYY-MM-DD
    date = Field()

    open = Field()
    close = Field()
    high = Field()
    low = Field()
    adj_close = Field()
    volume = Field()


class SymbolItem(Item):
    symbol = Field()
    name = Field()


================================================
FILE: pystock_crawler/loaders.py
================================================
import re

from datetime import datetime, timedelta
from scrapy import log
from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import Compose, MapCompose, TakeFirst
from scrapy.utils.misc import arg_to_iter
from scrapy.utils.python import flatten

from pystock_crawler.items import ReportItem


DATE_FORMAT = '%Y-%m-%d'

MAX_PER_SHARE_VALUE = 1000.0

# If number of characters of response body exceeds this value,
# remove some useless text defined by RE_XML_GARBAGE to reduce memory usage
THRESHOLD_TO_CLEAN = 20000000

# Used to get rid of "<tag>LONG STRING...</tag>"
RE_XML_GARBAGE = re.compile(r'>([^<]{100,})<')


class IntermediateValue(object):
    '''
    Intermediate data that serves as the output of input processors, i.e., the
    input of output processors. "Intermediate" is shortened to "imd" in later
    naming.

    '''
    def __init__(self, local_name, value, text, context, node=None, start_date=None,
                 end_date=None, instant=None):
        self.local_name = local_name
        self.value = value
        self.text = text
        self.context = context
        self.node = node
        self.start_date = start_date
        self.end_date = end_date
        self.instant = instant

    def __cmp__(self, other):
        if self.value < other.value:
            return -1
        elif self.value > other.value:
            return 1
        return 0

    def __repr__(self):
        context_id = None
        if self.context:
            context_id = self.context.xpath('@id')[0].extract()
        return '(%s, %s, %s)' % (self.local_name, self.value, context_id)

    def is_member(self):
        return is_member(self.context)


class ExtractText(object):

    def __call__(self, value):
        if hasattr(value, 'select'):
            try:
                return value.xpath('./text()')[0].extract()
            except IndexError:
                return ''
        return unicode(value)


class MatchEndDate(object):

    def __init__(self, data_type=str, ignore_date_range=False):
        self.data_type = data_type
        self.ignore_date_range = ignore_date_range

    def __call__(self, value, loader_context):
        if not hasattr(value, 'select'):
            return IntermediateValue('', 0.0, '0', None)

        doc_end_date_str = loader_context['end_date']
        doc_type = loader_context['doc_type']
        selector = loader_context['selector']

        context_id = value.xpath('@contextRef')[0].extract()
        try:
            context = selector.xpath('//*[@id="%s"]' % context_id)[0]
        except IndexError:
            try:
                url = loader_context['response'].url
            except KeyError:
                url = None
            log.msg(u'Cannot find context: %s in %s' % (context_id, url), log.WARNING)
            return None

        date = instant = start_date = end_date = None
        try:
            instant = context.xpath('.//*[local-name()="instant"]/text()')[0].extract().strip()
        except (IndexError, ValueError):
            try:
                end_date_str = context.xpath('.//*[local-name()="endDate"]/text()')[0].extract().strip()
                end_date = datetime.strptime(end_date_str, DATE_FORMAT)

                start_date_str = context.xpath('.//*[local-name()="startDate"]/text()')[0].extract().strip()
                start_date = datetime.strptime(start_date_str, DATE_FORMAT)

                if self.ignore_date_range or date_range_matches_doc_type(doc_type, start_date, end_date):
                    date = end_date
            except (IndexError, ValueError):
                pass
        else:
            try:
                instant = datetime.strptime(instant, DATE_FORMAT)
            except ValueError:
                pass
            else:
                date = instant

        if date:
            doc_end_date = datetime.strptime(doc_end_date_str, DATE_FORMAT)
            delta_days = (doc_end_date - date).days
            if abs(delta_days) < 30:
                try:
                    text = value.xpath('./text()')[0].extract()
                    val = self.data_type(text)
                except (IndexError, ValueError):
                    pass
                else:
                    local_name = value.xpath('local-name()')[0].extract()
                    return IntermediateValue(
                        local_name, val, text, context, value,
                        start_date=start_date, end_date=end_date, instant=instant)

        return None


class ImdSumMembersOr(object):

    def __init__(self, second_func=None):
        self.second_func = second_func

    def __call__(self, imd_values):
        members = []
        non_members = []
        for imd_value in imd_values:
            if imd_value.is_member():
                members.append(imd_value)
            else:
                non_members.append(imd_value)

        if members and len(members) == len(imd_values):
            return imd_sum(members)

        if imd_values:
            return self.second_func(non_members)
        return None


def date_range_matches_doc_type(doc_type, start_date, end_date):
    delta_days = (end_date - start_date).days
    return ((doc_type == '10-Q' and delta_days < 120 and delta_days > 60) or
            (doc_type == '10-K' and delta_days < 380 and delta_days > 350))


def get_amend(values):
    if values:
        return values[0]
    return False


def get_symbol(values):
    if values:
        symbols = map(lambda s: s.strip(), values[0].split(','))
        return '/'.join(symbols)
    return False


def imd_max(imd_values):
    if imd_values:
        imd_value = max(imd_values)
        return imd_value.value
    return None


def imd_min(imd_values):
    if imd_values:
        imd_value = min(imd_values)
        return imd_value.value
    return None


def imd_sum(imd_values):
    return sum([v.value for v in imd_values])


def imd_get_revenues(imd_values):
    interest_elems = filter(lambda v: 'interest' in v.local_name.lower(), imd_values)
    if len(interest_elems) == len(imd_values):
        # HACK: An exceptional case for BBT
        # Revenues = InterestIncome + NoninterestIncome
        return imd_sum(imd_values)

    return imd_max(imd_values)


def imd_get_net_income(imd_values):
    return imd_min(imd_values)


def imd_get_op_income(imd_values):
    imd_values = filter(lambda v: memberness(v.context) < 2, imd_values)
    return imd_min(imd_values)


def imd_get_cash_flow(imd_values, loader_context):
    if len(imd_values) == 1:
        return imd_values[0].value

    doc_type = loader_context['doc_type']

    within_date_range = []
    for imd_value in imd_values:
        if imd_value.start_date and imd_value.end_date:
            if date_range_matches_doc_type(doc_type, imd_value.start_date, imd_value.end_date):
                within_date_range.append(imd_value)

    if within_date_range:
        return imd_max(within_date_range)

    return imd_max(imd_values)


def imd_get_per_share_value(imd_values):
    if not imd_values:
        return None

    v = imd_values[0]
    value = v.value
    if abs(value) > MAX_PER_SHARE_VALUE:
        try:
            decimals = int(v.node.xpath('@decimals')[0].extract())
        except (AttributeError, IndexError, ValueError):
            return None
        else:
            # HACK: some of LTD's reports have unreasonably large per-share values, such as
            # 320000 EPS (and it should be 0.32), so use the decimals attribute to scale it
            # down; note that this is NOT a correct way to interpret the decimals attribute
            value *= pow(10, decimals - 2)
    return value if abs(value) <= MAX_PER_SHARE_VALUE else None


def imd_get_equity(imd_values):
    if not imd_values:
        return None

    values = filter(lambda v: v.local_name == 'StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest', imd_values)
    if values:
        return values[0].value

    values = filter(lambda v: v.local_name == 'StockholdersEquity', imd_values)
    if values:
        return values[0].value

    return imd_values[0].value


def imd_filter_member(imd_values):
    if imd_values:
        with_memberness = [(v, memberness(v.context)) for v in imd_values]
        with_memberness = sorted(with_memberness, key=lambda pair: pair[1])

        m0 = with_memberness[0][1]
        non_members = []

        for v in with_memberness:
            if v[1] == m0:
                non_members.append(v[0])

        return non_members

    return imd_values


def imd_mult(imd_values):
    for v in imd_values:
        try:
            node_id = v.node.xpath('@id')[0].extract().lower()
        except (AttributeError, IndexError):
            pass
        else:
            # HACK: some of LUV's reports have unreasonably small numbers, such as
            # 4136 in revenues which should be 4136 million; this hack uses the id
            # attribute to determine whether the value should be scaled up
            if 'inmillions' in node_id and abs(v.value) < 100000.0:
                v.value *= 1000000.0
            elif 'inthousands' in node_id and abs(v.value) < 100000000.0:
                v.value *= 1000.0
    return imd_values


def memberness(context):
    '''The likelihood that the context is a "member".'''
    if context:
        texts = context.xpath('.//*[local-name()="explicitMember"]/text()').extract()
        text = str(texts).lower()

        if len(texts) > 1:
            return 2
        elif 'country' in text:
            return 2
        elif 'member' not in text:
            return 0
        elif 'successor' in text:
            # 'SuccessorMember' is a rare case that shouldn't be treated as member
            return 1
        elif 'parent' in text:
            return 2
    return 3


def is_member(context):
    if context:
        texts = context.xpath('.//*[local-name()="explicitMember"]/text()').extract()
        text = str(texts).lower()

        # 'SuccessorMember' is a rare case that shouldn't be treated as member
        if 'member' not in text or 'successor' in text or 'parent' in text:
            return False
    return True


def str_to_bool(value):
    if hasattr(value, 'lower'):
        value = value.lower()
        return bool(value) and value != 'false' and value != '0'
    return bool(value)


def find_namespace(xxs, name):
    # Raw strings avoid relying on unrecognized escape sequences such as '\-'
    name_re = name.replace('-', r'\-')
    if not name_re.startswith('xmlns'):
        name_re = 'xmlns:' + name_re
    return xxs.re(r'%s="([^"]+)"' % name_re)[0]


def register_namespace(xxs, name):
    ns = find_namespace(xxs, name)
    xxs.register_namespace(name, ns)


def register_namespaces(xxs):
    names = ('xmlns', 'xbrli', 'dei', 'us-gaap')
    for name in names:
        try:
            register_namespace(xxs, name)
        except IndexError:
            pass


class XmlXPathItemLoader(ItemLoader):

    def __init__(self, *args, **kwargs):
        super(XmlXPathItemLoader, self).__init__(*args, **kwargs)
        register_namespaces(self.selector)

    def add_xpath(self, field_name, xpath, *processors, **kw):
        values = self._get_values(xpath, **kw)
        self.add_value(field_name, values, *processors, **kw)
        return len(self._values[field_name])

    def add_xpaths(self, name, paths):
        for path in paths:
            match_count = self.add_xpath(name, path)
            if match_count > 0:
                return match_count

        return 0

    def _get_values(self, xpaths, **kw):
        xpaths = arg_to_iter(xpaths)
        return flatten([self.selector.xpath(xpath) for xpath in xpaths])


class ReportItemLoader(XmlXPathItemLoader):

    default_item_class = ReportItem
    default_output_processor = TakeFirst()

    symbol_in = MapCompose(ExtractText(), unicode.upper)
    symbol_out = Compose(get_symbol)

    amend_in = MapCompose(ExtractText(), str_to_bool)
    amend_out = Compose(get_amend)

    period_focus_in = MapCompose(ExtractText(), unicode.upper)
    period_focus_out = TakeFirst()

    revenues_in = MapCompose(MatchEndDate(float))
    revenues_out = Compose(imd_filter_member, imd_mult, ImdSumMembersOr(imd_get_revenues))

    net_income_in = MapCompose(MatchEndDate(float))
    net_income_out = Compose(imd_filter_member, imd_mult, imd_get_net_income)

    op_income_in = MapCompose(MatchEndDate(float))
    op_income_out = Compose(imd_filter_member, imd_mult, imd_get_op_income)

    eps_basic_in = MapCompose(MatchEndDate(float))
    eps_basic_out = Compose(ImdSumMembersOr(imd_get_per_share_value), lambda x: x if x < MAX_PER_SHARE_VALUE else None)

    eps_diluted_in = MapCompose(MatchEndDate(float))
    eps_diluted_out = Compose(ImdSumMembersOr(imd_get_per_share_value), lambda x: x if x < MAX_PER_SHARE_VALUE else None)

    dividend_in = MapCompose(MatchEndDate(float))
    dividend_out = Compose(imd_get_per_share_value, lambda x: x if 0.0 < x < MAX_PER_SHARE_VALUE else 0.0)

    assets_in = MapCompose(MatchEndDate(float))
    assets_out = Compose(imd_filter_member, imd_mult, imd_max)

    cur_assets_in = MapCompose(MatchEndDate(float))
    cur_assets_out = Compose(imd_filter_member, imd_mult, imd_max)

    cur_liab_in = MapCompose(MatchEndDate(float))
    cur_liab_out = Compose(imd_filter_member, imd_mult, imd_max)

    equity_in = MapCompose(MatchEndDate(float))
    equity_out = Compose(imd_filter_member, imd_mult, imd_get_equity)

    cash_in = MapCompose(MatchEndDate(float))
    cash_out = Compose(imd_filter_member, imd_mult, imd_max)

    cash_flow_op_in = MapCompose(MatchEndDate(float, True))
    cash_flow_op_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow)

    cash_flow_inv_in = MapCompose(MatchEndDate(float, True))
    cash_flow_inv_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow)

    cash_flow_fin_in = MapCompose(MatchEndDate(float, True))
    cash_flow_fin_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow)

    def __init__(self, *args, **kwargs):
        response = kwargs.get('response')
        if len(response.body) > THRESHOLD_TO_CLEAN:
            # Remove some useless text to reduce memory usage
            body = RE_XML_GARBAGE.sub('><', response.body)
            response = response.replace(body=body)
            kwargs['response'] = response

        super(ReportItemLoader, self).__init__(*args, **kwargs)

        symbol = self._get_symbol()
        end_date = self._get_doc_end_date()
        fiscal_year = self._get_doc_fiscal_year()
        doc_type = self._get_doc_type()

        # Ignore documents that are not 10-Q or 10-K
        if not (doc_type and doc_type.split('/')[0] in ('10-Q', '10-K')):
            return

        # some documents set their amendment flag in DocumentType, e.g., '10-Q/A',
        # instead of setting it in AmendmentFlag
        amend = None
        if doc_type.endswith('/A'):
            amend = True
            doc_type = doc_type[0:-2]

        self.context.update({
            'end_date': end_date,
            'doc_type': doc_type
        })

        self.add_xpath('symbol', '//dei:TradingSymbol')
        self.add_value('symbol', symbol)

        if amend:
            self.add_value('amend', True)
        else:
            self.add_xpath('amend', '//dei:AmendmentFlag')

        if doc_type == '10-K':
            period_focus = 'FY'
        else:
            period_focus = self._get_period_focus(end_date)

        if not fiscal_year and period_focus:
            fiscal_year = self._guess_fiscal_year(end_date, period_focus)

        self.add_value('period_focus', period_focus)
        self.add_value('fiscal_year', fiscal_year)
        self.add_value('end_date', end_date)
        self.add_value('doc_type', doc_type)

        self.add_xpaths('revenues', [
            '//us-gaap:SalesRevenueNet',
            '//us-gaap:Revenues',
            '//us-gaap:SalesRevenueGoodsNet',
            '//us-gaap:SalesRevenueServicesNet',
            '//us-gaap:RealEstateRevenueNet',
            '//*[local-name()="NetRevenuesIncludingNetInterestIncome"]',
            '//*[contains(local-name(), "TotalRevenues") and contains(local-name(), "After")]',
            '//*[contains(local-name(), "TotalRevenues")]',
            '//*[local-name()="InterestAndDividendIncomeOperating" or local-name()="NoninterestIncome"]',
            '//*[contains(local-name(), "Revenue")]'
        ])
        self.add_xpath('revenues', '//us-gaap:FinancialServicesRevenue')

        self.add_xpaths('net_income', [
            '//*[contains(local-name(), "NetLossIncome") and contains(local-name(), "Corporation")]',
            '//*[local-name()="NetIncomeLossAvailableToCommonStockholdersBasic" or local-name()="NetIncomeLoss"]',
            '//us-gaap:ProfitLoss',
            '//us-gaap:IncomeLossFromContinuingOperations',
            '//*[contains(local-name(), "IncomeLossFromContinuingOperations") and not(contains(local-name(), "Per"))]',
            '//*[contains(local-name(), "NetIncomeLoss")]',
            '//*[starts-with(local-name(), "NetIncomeAttributableTo")]'
        ])

        self.add_xpaths('op_income', [
            '//us-gaap:OperatingIncomeLoss'
        ])

        self.add_xpaths('eps_basic', [
            '//us-gaap:EarningsPerShareBasic',
            '//us-gaap:IncomeLossFromContinuingOperationsPerBasicShare',
            '//us-gaap:IncomeLossFromContinuingOperationsPerBasicAndDilutedShare',
            '//*[contains(local-name(), "NetIncomeLoss") and contains(local-name(), "Per") and contains(local-name(), "Common")]',
            '//*[contains(local-name(), "Earnings") and contains(local-name(), "Per") and contains(local-name(), "Basic")]',
            '//*[local-name()="IncomePerShareFromContinuingOperationsAvailableToCompanyStockholdersBasicAndDiluted"]',
            '//*[contains(local-name(), "NetLossPerShare")]',
            '//*[contains(local-name(), "NetIncome") and contains(local-name(), "Per") and contains(local-name(), "Basic")]',
            '//*[local-name()="BasicEarningsAttributableToStockholdersPerCommonShare"]',
            '//*[local-name()="Earningspersharebasicanddiluted"]',
            '//*[contains(local-name(), "PerCommonShareBasicAndDiluted")]',
            '//*[local-name()="NetIncomeLossAttributableToCommonStockholdersBasicAndDiluted"]',
            '//us-gaap:NetIncomeLossAvailableToCommonStockholdersBasic',
            '//*[local-name()="NetIncomeLossEPS"]',
            '//*[local-name()="NetLoss"]'
        ])

        self.add_xpaths('eps_diluted', [
            '//us-gaap:EarningsPerShareDiluted',
            '//us-gaap:IncomeLossFromContinuingOperationsPerDilutedShare',
            '//us-gaap:IncomeLossFromContinuingOperationsPerBasicAndDilutedShare',
            '//*[contains(local-name(), "Earnings") and contains(local-name(), "Per") and contains(local-name(), "Diluted")]',
            '//*[local-name()="IncomePerShareFromContinuingOperationsAvailableToCompanyStockholdersBasicAndDiluted"]',
            '//*[contains(local-name(), "NetLossPerShare")]',
            '//*[contains(local-name(), "NetIncome") and contains(local-name(), "Per") and contains(local-name(), "Diluted")]',
            '//*[local-name()="DilutedEarningsAttributableToStockholdersPerCommonShare"]',
            '//us-gaap:NetIncomeLossAvailableToCommonStockholdersDiluted',
            '//*[contains(local-name(), "PerCommonShareBasicAndDiluted")]',
            '//*[local-name()="NetIncomeLossAttributableToCommonStockholdersBasicAndDiluted"]',
            '//us-gaap:EarningsPerShareBasic',
            '//*[local-name()="NetIncomeLossEPS"]',
            '//*[local-name()="NetLoss"]'
        ])

        self.add_xpaths('dividend', [
            '//us-gaap:CommonStockDividendsPerShareDeclared',
            '//us-gaap:CommonStockDividendsPerShareCashPaid'
        ])

        # if dividend isn't found in doc, assume it's 0
        self.add_value('dividend', 0.0)

        self.add_xpaths('assets', [
            '//us-gaap:Assets',
            '//us-gaap:AssetsNet',
            '//us-gaap:LiabilitiesAndStockholdersEquity'
        ])

        self.add_xpaths('cur_assets', [
            '//us-gaap:AssetsCurrent'
        ])

        self.add_xpaths('cur_liab', [
            '//us-gaap:LiabilitiesCurrent'
        ])

        self.add_xpaths('equity', [
            '//*[local-name()="StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest" or local-name()="StockholdersEquity"]',
            '//*[local-name()="TotalCommonShareholdersEquity"]',
            '//*[local-name()="CommonShareholdersEquity"]',
            '//*[local-name()="CommonStockEquity"]',
            '//*[local-name()="TotalEquity"]',
            '//us-gaap:RetainedEarningsAccumulatedDeficit',
            '//*[contains(local-name(), "MembersEquityIncludingPortionAttributableToNoncontrollingInterest")]',
            '//us-gaap:CapitalizationLongtermDebtAndEquity',
            '//*[local-name()="TotalCapitalization"]'
        ])

        self.add_xpaths('cash', [
            '//us-gaap:CashCashEquivalentsAndFederalFundsSold',
            '//us-gaap:CashAndDueFromBanks',
            '//us-gaap:CashAndCashEquivalentsAtCarryingValue',
            '//us-gaap:Cash',
            '//*[local-name()="CashAndCashEquivalents"]',
            '//*[contains(local-name(), "CarryingValueOfCashAndCashEquivalents")]',
            '//*[contains(local-name(), "CashCashEquivalents")]',
            '//*[contains(local-name(), "CashAndCashEquivalents")]'
        ])

        self.add_xpaths('cash_flow_op', [
            '//us-gaap:NetCashProvidedByUsedInOperatingActivities',
            '//us-gaap:NetCashProvidedByUsedInOperatingActivitiesContinuingOperations'
        ])

        self.add_xpaths('cash_flow_inv', [
            '//us-gaap:NetCashProvidedByUsedInInvestingActivities',
            '//us-gaap:NetCashProvidedByUsedInInvestingActivitiesContinuingOperations'
        ])

        self.add_xpaths('cash_flow_fin', [
            '//us-gaap:NetCashProvidedByUsedInFinancingActivities',
            '//us-gaap:NetCashProvidedByUsedInFinancingActivitiesContinuingOperations'
        ])

    def _get_symbol(self):
        try:
            filename = self.context['response'].url.split('/')[-1]
            return filename.split('-')[0].upper()
        except IndexError:
            return None

    def _get_doc_fiscal_year(self):
        try:
            fiscal_year = self.selector.xpath('//dei:DocumentFiscalYearFocus/text()')[0].extract()
            return int(fiscal_year)
        except (IndexError, ValueError):
            return None

    def _guess_fiscal_year(self, end_date, period_focus):
        # Guess fiscal_year based on document end_date and period_focus
        date = datetime.strptime(end_date, DATE_FORMAT)
        month_ranges = {
            'Q1': (2, 3, 4),
            'Q2': (5, 6, 7),
            'Q3': (8, 9, 10),
            'FY': (11, 12, 1)
        }
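        # Default to an empty tuple so an unknown period_focus doesn't raise
        # TypeError in the membership test below
        month_range = month_ranges.get(period_focus, ())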

        # Case 1: release Q1 around March, Q2 around June, ...
        # This is what most companies do
        if date.month in month_range:
            if period_focus == 'FY' and date.month == 1:
                return date.year - 1
            return date.year

        # How many days left before 10-K's release?
        days_left_table = {
            'Q1': 270,
            'Q2': 180,
            'Q3': 90,
            'FY': 0
        }
        days_left = days_left_table.get(period_focus)

        # Other cases: assume the year of the FY report's end date equals
        # the fiscal year
        if days_left is not None:
            fy_date = date + timedelta(days=days_left)
            return fy_date.year

        return None

    def _get_doc_end_date(self):
        # the document end date could come from URL or document content
        # we need to guess which one is correct
        url_date_str = self.context['response'].url.split('-')[-1].split('.')[0]
        url_date = datetime.strptime(url_date_str, '%Y%m%d')
        url_date_str = url_date.strftime(DATE_FORMAT)

        try:
            doc_date_str = self.selector.xpath('//dei:DocumentPeriodEndDate/text()')[0].extract()
            doc_date = datetime.strptime(doc_date_str, DATE_FORMAT)
        except (IndexError, ValueError):
            return url_date_str

        context_date_strs = set(self.selector.xpath('//*[local-name()="context"]//*[local-name()="endDate"]/text()').extract())

        date = url_date
        if doc_date_str in context_date_strs:
            date = doc_date

        return date.strftime(DATE_FORMAT)

    def _get_doc_type(self):
        try:
            return self.selector.xpath('//dei:DocumentType/text()')[0].extract().upper()
        except (IndexError, ValueError):
            return None

    def _get_period_focus(self, doc_end_date):
        try:
            return self.selector.xpath('//dei:DocumentFiscalPeriodFocus/text()')[0].extract().strip().upper()
        except IndexError:
            pass

        try:
            doc_yr = doc_end_date.split('-')[0]
            yr_end_date = self.selector.xpath('//dei:CurrentFiscalYearEndDate/text()')[0].extract()
            yr_end_date = yr_end_date.replace('--', doc_yr + '-')
        except IndexError:
            return None

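        try:
            doc_end_date = datetime.strptime(doc_end_date, '%Y-%m-%d')
            yr_end_date = datetime.strptime(yr_end_date, '%Y-%m-%d')
        except ValueError:
            # Malformed or impossible date (e.g., '--02-29' in a non-leap year)
            return None
        delta_days = (yr_end_date - doc_end_date).days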

        if -45 < delta_days < 45:
            return 'FY'
        elif -135 < delta_days <= -45 or delta_days > 225:
            return 'Q1'
        elif -225 < delta_days <= -135 or 135 < delta_days <= 225:
            return 'Q2'
        elif delta_days <= -225 or 45 < delta_days <= 135:
            return 'Q3'

        return 'FY'


================================================
FILE: pystock_crawler/settings.py
================================================
# Scrapy settings for pystock-crawler project
#
# For simplicity, this file contains only the most important settings by
# default. All the other settings are documented here:
#
#     http://doc.scrapy.org/en/latest/topics/settings.html
#

BOT_NAME = 'pystock-crawler'

EXPORT_FIELDS = (
    # Price columns
    'symbol', 'date', 'open', 'high', 'low', 'close', 'volume', 'adj_close',

    # Report columns
    'end_date', 'amend', 'period_focus', 'fiscal_year', 'doc_type', 'revenues', 'op_income', 'net_income',
    'eps_basic', 'eps_diluted', 'dividend', 'assets', 'cur_assets', 'cur_liab', 'cash', 'equity',
    'cash_flow_op', 'cash_flow_inv', 'cash_flow_fin',
)

FEED_EXPORTERS = {
    'csv': 'pystock_crawler.exporters.CsvItemExporter2',
    'symbollist': 'pystock_crawler.exporters.SymbolListExporter'
}

HTTPCACHE_ENABLED = True

HTTPCACHE_POLICY = 'scrapy.contrib.httpcache.RFC2616Policy'

HTTPCACHE_STORAGE = 'scrapy.contrib.httpcache.LeveldbCacheStorage'

LOG_LEVEL = 'INFO'

NEWSPIDER_MODULE = 'pystock_crawler.spiders'

SPIDER_MODULES = ['pystock_crawler.spiders']

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'pystock-crawler (+http://www.yourdomain.com)'

CONCURRENT_REQUESTS_PER_DOMAIN = 8

COOKIES_ENABLED = False

#AUTOTHROTTLE_ENABLED = True

RETRY_TIMES = 4

EXTENSIONS = {
    'scrapy.contrib.throttle.AutoThrottle': None,
    'pystock_crawler.throttle.PassiveThrottle': 0
}

PASSIVETHROTTLE_ENABLED = True
#PASSIVETHROTTLE_DEBUG = True

DEPTH_STATS_VERBOSE = True


================================================
FILE: pystock_crawler/spiders/__init__.py
================================================
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.


================================================
FILE: pystock_crawler/spiders/edgar.py
================================================
import os

from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule

from pystock_crawler import utils
from pystock_crawler.loaders import ReportItemLoader


class URLGenerator(object):

    def __init__(self, symbols, start_date='', end_date='', start=0, count=None):
        end = start + count if count is not None else None
        self.symbols = symbols[start:end]
        self.start_date = start_date
        self.end_date = end_date

    def __iter__(self):
        url = 'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=%s&type=10-&dateb=%s&datea=%s&owner=exclude&count=300'
        for symbol in self.symbols:
            yield (url % (symbol, self.end_date, self.start_date))


class EdgarSpider(CrawlSpider):

    name = 'edgar'
    allowed_domains = ['sec.gov']

    rules = (
        # Follow filing index pages, then parse the XBRL instance documents they link to
        Rule(SgmlLinkExtractor(allow=(r'/Archives/edgar/data/[^"]+-index\.htm',))),
        Rule(SgmlLinkExtractor(allow=(r'/Archives/edgar/data/[^"]+/[A-Za-z]+-\d{8}\.xml',)), callback='parse_10qk'),
    )

    def __init__(self, **kwargs):
        super(EdgarSpider, self).__init__(**kwargs)

        symbols_arg = kwargs.get('symbols')
        start_date = kwargs.get('startdate', '')
        end_date = kwargs.get('enddate', '')
        limit_arg = kwargs.get('limit', '')

        utils.check_date_arg(start_date, 'startdate')
        utils.check_date_arg(end_date, 'enddate')
        start, count = utils.parse_limit_arg(limit_arg)

        if symbols_arg:
            if os.path.exists(symbols_arg):
                # get symbols from a text file
                symbols = utils.load_symbols(symbols_arg)
            else:
                # inline symbols in command
                symbols = symbols_arg.split(',')
            self.start_urls = URLGenerator(symbols, start_date, end_date, start, count)
        else:
            self.start_urls = []

    def parse_10qk(self, response):
        '''Parse 10-Q or 10-K XML report.'''
        loader = ReportItemLoader(response=response)
        item = loader.load_item()

        if 'doc_type' in item:
            doc_type = item['doc_type']
            if doc_type in ('10-Q', '10-K'):
                return item

        return None


================================================
FILE: pystock_crawler/spiders/nasdaq.py
================================================
import cStringIO
import re

from scrapy.spider import Spider

from pystock_crawler.items import SymbolItem


RE_SYMBOL = re.compile(r'^[A-Z]+$')


def generate_urls(exchanges):
    for exchange in exchanges:
        yield 'http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=%s&render=download' % exchange


class NasdaqSpider(Spider):

    name = 'nasdaq'
    allowed_domains = ['www.nasdaq.com']

    def __init__(self, **kwargs):
        super(NasdaqSpider, self).__init__(**kwargs)

        exchanges = kwargs.get('exchanges', '').split(',')
        self.start_urls = generate_urls(exchanges)

    def parse(self, response):
        # Create the buffer outside the try block so that the finally clause
        # never refers to an unbound name
        file_like = cStringIO.StringIO(response.body)
        try:
            # Ignore first row
            file_like.next()

            for line in file_like:
                tokens = line.split(',')
                symbol = tokens[0].strip('"')
                if RE_SYMBOL.match(symbol):
                    name = tokens[1].strip('"')
                    yield SymbolItem(symbol=symbol, name=name)
        finally:
            file_like.close()


================================================
FILE: pystock_crawler/spiders/yahoo.py
================================================
import cStringIO
import os
import re

from datetime import datetime
from scrapy.spider import Spider

from pystock_crawler import utils
from pystock_crawler.items import PriceItem


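def parse_date(date_str):
    '''Convert 'YYYYMMDD' into a (year, month, day) tuple for the URL query.'''
    if date_str:
        date = datetime.strptime(date_str, '%Y%m%d')
        # The month is zero-based in the Yahoo query string (January is 0)
        return date.year, date.month - 1, date.day
    return '', '', ''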


def make_url(symbol, start_date=None, end_date=None):
    url = ('http://ichart.finance.yahoo.com/table.csv?'
           's=%(symbol)s&d=%(end_month)s&e=%(end_day)s&f=%(end_year)s&g=d&'
           'a=%(start_month)s&b=%(start_day)s&c=%(start_year)s&ignore=.csv')

    start_date = parse_date(start_date)
    end_date = parse_date(end_date)

    return url % {
        'symbol': symbol,
        'start_year': start_date[0],
        'start_month': start_date[1],
        'start_day': start_date[2],
        'end_year': end_date[0],
        'end_month': end_date[1],
        'end_day': end_date[2]
    }


def generate_urls(symbols, start_date=None, end_date=None):
    for symbol in symbols:
        yield make_url(symbol, start_date, end_date)


class YahooSpider(Spider):

    name = 'yahoo'
    allowed_domains = ['finance.yahoo.com']

    def __init__(self, **kwargs):
        super(YahooSpider, self).__init__(**kwargs)

        symbols_arg = kwargs.get('symbols')
        start_date = kwargs.get('startdate', '')
        end_date = kwargs.get('enddate', '')

        utils.check_date_arg(start_date, 'startdate')
        utils.check_date_arg(end_date, 'enddate')

        if symbols_arg:
            if os.path.exists(symbols_arg):
                # get symbols from a text file
                symbols = utils.load_symbols(symbols_arg)
            else:
                # inline symbols in command
                symbols = symbols_arg.split(',')
            self.start_urls = generate_urls(symbols, start_date, end_date)
        else:
            self.start_urls = []

    def parse(self, response):
        symbol = self._get_symbol_from_url(response.url)
        # Create the buffer outside the try block so that the finally clause
        # never refers to an unbound name
        file_like = cStringIO.StringIO(response.body)
        try:
            rows = utils.parse_csv(file_like)
            for row in rows:
                item = PriceItem(symbol=symbol)
                for k, v in row.iteritems():
                    item[k.replace(' ', '_').lower()] = v
                yield item
        finally:
            file_like.close()

    def _get_symbol_from_url(self, url):
        match = re.search(r'[\?&]s=([^&]*)', url)
        if match:
            return match.group(1)
        return ''


================================================
FILE: pystock_crawler/tests/__init__.py
================================================


================================================
FILE: pystock_crawler/tests/base.py
================================================
import os
import unittest


# Stores temporary test data
SAMPLE_DATA_DIR = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'sample_data')


class TestCaseBase(unittest.TestCase):
    '''
    Provides utility functions for test cases.

    '''
    def assert_none_or_almost_equal(self, value, expected_value):
        if expected_value is None:
            self.assertIsNone(value)
        else:
            self.assertAlmostEqual(value, expected_value)

    def assert_item(self, item, expected):
        self.assertEqual(item.get('symbol'), expected.get('symbol'))
        self.assertEqual(item.get('name'), expected.get('name'))
        self.assertEqual(item.get('amend'), expected.get('amend'))
        self.assertEqual(item.get('doc_type'), expected.get('doc_type'))
        self.assertEqual(item.get('period_focus'), expected.get('period_focus'))
        self.assertEqual(item.get('fiscal_year'), expected.get('fiscal_year'))
        self.assertEqual(item.get('end_date'), expected.get('end_date'))
        self.assert_none_or_almost_equal(item.get('revenues'), expected.get('revenues'))
        self.assert_none_or_almost_equal(item.get('net_income'), expected.get('net_income'))
        self.assert_none_or_almost_equal(item.get('eps_basic'), expected.get('eps_basic'))
        self.assert_none_or_almost_equal(item.get('eps_diluted'), expected.get('eps_diluted'))
        self.assertAlmostEqual(item.get('dividend'), expected.get('dividend'))
        self.assert_none_or_almost_equal(item.get('assets'), expected.get('assets'))
        self.assert_none_or_almost_equal(item.get('equity'), expected.get('equity'))
        self.assert_none_or_almost_equal(item.get('cash'), expected.get('cash'))
        self.assert_none_or_almost_equal(item.get('op_income'), expected.get('op_income'))
        self.assert_none_or_almost_equal(item.get('cur_assets'), expected.get('cur_assets'))
        self.assert_none_or_almost_equal(item.get('cur_liab'), expected.get('cur_liab'))
        self.assert_none_or_almost_equal(item.get('cash_flow_op'), expected.get('cash_flow_op'))
        self.assert_none_or_almost_equal(item.get('cash_flow_inv'), expected.get('cash_flow_inv'))
        self.assert_none_or_almost_equal(item.get('cash_flow_fin'), expected.get('cash_flow_fin'))


def _create_sample_data_dir():
    if not os.path.exists(SAMPLE_DATA_DIR):
        try:
            os.makedirs(SAMPLE_DATA_DIR)
        except OSError:
            pass

    assert os.path.exists(SAMPLE_DATA_DIR)

_create_sample_data_dir()


================================================
FILE: pystock_crawler/tests/test_cmdline.py
================================================
import os
import shutil
import unittest

import pystock_crawler

from envoy import run


TEST_DIR = './test_data'


# Scrapy runs in another process whose working directory may differ from that
# of the process running the tests, so we explicitly set PYTHONPATH to the
# absolute path of the current working directory so that the Scrapy process
# can locate the pystock_crawler module.
os.environ['PYTHONPATH'] = os.getcwd()


class PrintTest(unittest.TestCase):

    def test_no_args(self):
        r = run('./bin/pystock-crawler')
        self.assertIn('Usage:', r.std_err)

    def test_print_help(self):
        r = run('./bin/pystock-crawler -h')
        self.assertIn('Usage:', r.std_out)

        r2 = run('./bin/pystock-crawler --help')
        self.assertEqual(r.std_out, r2.std_out)

    def test_print_version(self):
        r = run('./bin/pystock-crawler -v')
        self.assertEqual(r.std_out, 'pystock-crawler %s\n' % pystock_crawler.__version__)

        r2 = run('./bin/pystock-crawler --version')
        self.assertEqual(r.std_out, r2.std_out)


class CrawlTest(unittest.TestCase):
    '''Base class for crawl test cases.'''
    def setUp(self):
        if os.path.isdir(TEST_DIR):
            shutil.rmtree(TEST_DIR)
        os.mkdir(TEST_DIR)

        self.args = {
            'output': os.path.join(TEST_DIR, '%s.out' % self.filename),
            'log_file': os.path.join(TEST_DIR, '%s.log' % self.filename),
            'working_dir': TEST_DIR
        }

    def tearDown(self):
        shutil.rmtree(TEST_DIR)

    def assert_cache(self):
        # Check if cache is there
        cache_dir = os.path.join(TEST_DIR, '.scrapy', 'httpcache', '%s.leveldb' % self.spider)
        self.assertTrue(os.path.isdir(cache_dir))

    def assert_log(self):
        # Check if log file is there
        log_path = self.args['log_file']
        self.assertTrue(os.path.isfile(log_path))

    def get_output_content(self):
        output_path = self.args['output']
        self.assertTrue(os.path.isfile(output_path))

        with open(output_path) as f:
            content = f.read()
        return content


class CrawlSymbolsTest(CrawlTest):

    filename = 'symbols'
    spider = 'nasdaq'

    def assert_nyse_output(self):
        # Check if some common NYSE symbols are in output
        content = self.get_output_content()
        self.assertIn('JPM', content)
        self.assertIn('KO', content)
        self.assertIn('WMT', content)

        # NASDAQ symbols shouldn't be
        self.assertNotIn('AAPL', content)
        self.assertNotIn('GOOG', content)
        self.assertNotIn('YHOO', content)

    def assert_nyse_and_nasdaq_output(self):
        # Check if some common NYSE symbols are in output
        content = self.get_output_content()
        self.assertIn('JPM', content)
        self.assertIn('KO', content)
        self.assertIn('WMT', content)

        # Check if some common NASDAQ symbols are in output
        self.assertIn('AAPL', content)
        self.assertIn('GOOG', content)
        self.assertIn('YHOO', content)

    def test_crawl_nyse(self):
        r = run('./bin/pystock-crawler symbols NYSE -o %(output)s -l %(log_file)s -w %(working_dir)s' % self.args)
        self.assertEqual(r.status_code, 0)
        self.assert_nyse_output()
        self.assert_log()
        self.assert_cache()

    def test_crawl_nyse_and_nasdaq(self):
        r = run('./bin/pystock-crawler symbols NYSE,NASDAQ -o %(output)s -l %(log_file)s -w %(working_dir)s --sort' % self.args)
        self.assertEqual(r.status_code, 0)
        self.assert_nyse_and_nasdaq_output()
        self.assert_log()
        self.assert_cache()


class CrawlPricesTest(CrawlTest):

    filename = 'prices'
    spider = 'yahoo'

    def test_crawl_inline_symbols(self):
        r = run('./bin/pystock-crawler prices GOOG,IBM -o %(output)s -l %(log_file)s -w %(working_dir)s' % self.args)
        self.assertEqual(r.status_code, 0)

        content = self.get_output_content()
        self.assertIn('GOOG', content)
        self.assertIn('IBM', content)
        self.assert_log()
        self.assert_cache()

    def test_crawl_symbol_file(self):
        # Create a sample symbol file
        symbol_file = os.path.join(TEST_DIR, 'symbols.txt')
        with open(symbol_file, 'w') as f:
            f.write('WMT\nJPM')
        self.args['symbol_file'] = symbol_file

        r = run('./bin/pystock-crawler prices %(symbol_file)s -o %(output)s -l %(log_file)s -w %(working_dir)s --sort' % self.args)
        self.assertEqual(r.status_code, 0)

        content = self.get_output_content()
        self.assertIn('WMT', content)
        self.assertIn('JPM', content)
        self.assert_log()
        self.assert_cache()


class CrawlReportsTest(CrawlTest):

    filename = 'reports'
    spider = 'edgar'

    def test_crawl_inline_symbols(self):
        r = run('./bin/pystock-crawler reports KO,MCD -o %(output)s -l %(log_file)s -w %(working_dir)s '
                '-s 20130401 -e 20130531' % self.args)
        self.assertEqual(r.status_code, 0)

        content = self.get_output_content()
        self.assertIn('KO', content)
        self.assertIn('MCD', content)
        self.assert_log()
        self.assert_cache()

    def test_crawl_symbol_file(self):
        # Create a sample symbol file
        symbol_file = os.path.join(TEST_DIR, 'symbols.txt')
        with open(symbol_file, 'w') as f:
            f.write('KO\nMCD')
        self.args['symbol_file'] = symbol_file

        r = run('./bin/pystock-crawler reports %(symbol_file)s -o %(output)s -l %(log_file)s -w %(working_dir)s '
                '-s 20130401 -e 20130531 --sort' % self.args)
        self.assertEqual(r.status_code, 0)

        content = self.get_output_content()
        self.assertIn('KO', content)
        self.assertIn('MCD', content)
        self.assert_log()
        self.assert_cache()

        # Check CSV header
        expected_header = [
            'symbol', 'end_date', 'amend', 'period_focus', 'fiscal_year', 'doc_type',
            'revenues', 'op_income', 'net_income', 'eps_basic', 'eps_diluted', 'dividend',
            'assets', 'cur_assets', 'cur_liab', 'cash', 'equity', 'cash_flow_op',
            'cash_flow_inv', 'cash_flow_fin'
        ]
        head_line = content.split('\n')[0].rstrip()
        self.assertEqual(head_line.split(','), expected_header)

    def test_merge_empty_results(self):
        # Ridiculous date range (1800/1/1) -> empty result
        r = run('./bin/pystock-crawler reports KO,MCD -o %(output)s -l %(log_file)s -w %(working_dir)s '
                '-s 18000101 -e 18000101 -b 1' % self.args)
        self.assertEqual(r.status_code, 0)

        content = self.get_output_content()
        self.assertFalse(content)

        # Make sure subfiles are deleted
        filename = self.args['output']
        self.assertFalse(os.path.exists('%s.1' % filename))
        self.assertFalse(os.path.exists('%s.2' % filename))


================================================
FILE: pystock_crawler/tests/test_loaders.py
================================================
import os
import requests
import urlparse

from scrapy.http.response.xml import XmlResponse

from pystock_crawler.loaders import ReportItemLoader
from pystock_crawler.tests.base import SAMPLE_DATA_DIR, TestCaseBase


def create_response(file_path):
    # Read as bytes so the body is not altered by newline translation on Windows
    with open(file_path, 'rb') as f:
        body = f.read()
    return XmlResponse('file://%s' % file_path.replace('\\', '/'), body=body)


def download(url, local_path):
    if not os.path.exists(local_path):
        dir_path = os.path.dirname(local_path)
        if not os.path.exists(dir_path):
            try:
                os.makedirs(dir_path)
            except OSError:
                # Directory may already exist (e.g. created concurrently); verified below
                pass

        assert os.path.exists(dir_path)

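        # Send the request before opening the file, so a failed request
        # does not leave an empty file that later looks like a cached copy
        r = requests.get(url, stream=True)
        r.raise_for_status()
        with open(local_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=4096):
                f.write(chunk)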


def parse_xml(url):
    # Mirror the URL path under SAMPLE_DATA_DIR so each filing is downloaded only once
    url_path = urlparse.urlparse(url).path
    local_path = os.path.join(SAMPLE_DATA_DIR, url_path[1:])
    download(url, local_path)
    response = create_response(local_path)
    loader = ReportItemLoader(response=response)
    return loader.load_item()


class ReportItemLoaderTest(TestCaseBase):

    def test_a_20110131(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1090872/000110465911013291/a-20110131.xml')
        self.assert_item(item, {
            'symbol': 'A',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2011-01-31',
            'revenues': 1519000000,
            'op_income': 211000000,
            'net_income': 193000000,
            'eps_basic': 0.56,
            'eps_diluted': 0.54,
            'dividend': 0.0,
            'assets': 8044000000,
            'cur_assets': 4598000000,
            'cur_liab': 1406000000,
            'equity': 3339000000,
            'cash': 2638000000,
            'cash_flow_op': 120000000,
            'cash_flow_inv': 1500000000,
            'cash_flow_fin': -1634000000
        })

    def test_aa_20120630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4281/000119312512317135/aa-20120630.xml')
        self.assert_item(item, {
            'symbol': 'AA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2012,
            'end_date': '2012-06-30',
            'revenues': 5963000000,
            'op_income': None,  # Missing value
            'net_income': -2000000,
            'eps_basic': None,  # Actual EPS is 0, but the XML has no EPS data
            'eps_diluted': None,
            'dividend': 0.03,
            'assets': 39498000000,
            'cur_assets': 7767000000,
            'cur_liab': 6151000000,
            'equity': 16914000000,
            'cash': 1712000000,
            'cash_flow_op': 301000000,
            'cash_flow_inv': -704000000,
            'cash_flow_fin': 196000000
        })

    def test_aapl_20100626(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312510162840/aapl-20100626.xml')
        self.assert_item(item, {
            'symbol': 'AAPL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-06-26',
            'revenues': 15700000000,
            'op_income': 4234000000,
            'net_income': 3253000000,
            'eps_basic': 3.57,
            'eps_diluted': 3.51,
            'dividend': 0.0,
            'assets': 64725000000,
            'cur_assets': 36033000000,
            'cur_liab': 15612000000,
            'equity': 43111000000,
            'cash': 9705000000,
            'cash_flow_op': 12912000000,
            'cash_flow_inv': -9471000000,
            'cash_flow_fin': 1001000000
        })

    def test_aapl_20110326(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312511104388/aapl-20110326.xml')
        self.assert_item(item, {
            'symbol': 'AAPL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-03-26',
            'revenues': 24667000000,
            'net_income': 5987000000,
            'op_income': 7874000000,
            'eps_basic': 6.49,
            'eps_diluted': 6.40,
            'dividend': 0.0,
            'assets': 94904000000,
            'cur_assets': 46997000000,
            'cur_liab': 24327000000,
            'equity': 61477000000,
            'cash': 15978000000,
            'cash_flow_op': 15992000000,
            'cash_flow_inv': -12251000000,
            'cash_flow_fin': 976000000
        })

    def test_aapl_20120929(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312512444068/aapl-20120929.xml')
        self.assert_item(item, {
            'symbol': 'AAPL',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-09-29',
            'revenues': 156508000000,
            'op_income': 55241000000,
            'net_income': 41733000000,
            'eps_basic': 44.64,
            'eps_diluted': 44.15,
            'dividend': 2.65,
            'assets': 176064000000,
            'cur_assets': 57653000000,
            'cur_liab': 38542000000,
            'equity': 118210000000,
            'cash': 10746000000,
            'cash_flow_op': 50856000000,
            'cash_flow_inv': -48227000000,
            'cash_flow_fin': -1698000000
        })

    def test_aes_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/874761/000119312510111183/aes-20100331.xml')
        self.assert_item(item, {
            'symbol': 'AES',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 4112000000,
            'op_income': None,  # Missing value
            'net_income': 187000000,
            'eps_basic': 0.27,
            'eps_diluted': 0.27,
            'dividend': 0.0,
            'assets': 41882000000,
            'cur_assets': 10460000000,
            'cur_liab': 6894000000,
            'equity': 10536000000,
            'cash': 3392000000,
            'cash_flow_op': 684000000,
            'cash_flow_inv': -595000000,
            'cash_flow_fin': 1515000000
        })

    def test_adbe_20060914(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/796343/000110465906066129/adbe-20060914.xml')

        # Old document is not supported
        self.assertFalse(item)

    def test_adbe_20090227(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/796343/000079634309000021/adbe-20090227.xml')
        self.assert_item(item, {
            'symbol': 'ADBE',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2009,
            'end_date': '2009-02-27',
            'revenues': 786390000,
            'op_income': 207916000,
            'net_income': 156435000,
            'eps_basic': 0.3,
            'eps_diluted': 0.3,
            'dividend': 0.0,
            'assets': 5887596000,
            'cur_assets': 2868991000,
            'cur_liab': 636865000,
            'equity': 4611160000,
            'cash': 1148925000,
            'cash_flow_op': 365743000,
            'cash_flow_inv': -131562000,
            'cash_flow_fin': 28675000
        })

    def test_agn_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/850693/000119312511050632/agn-20101231.xml')
        self.assert_item(item, {
            'symbol': 'AGN',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 4919400000,
            'op_income': 258600000,
            'net_income': 600000,
            'eps_basic': 0.0,
            'eps_diluted': 0.0,
            'dividend': 0.2,
            'assets': 8308100000,
            'cur_assets': 3993700000,
            'cur_liab': 1528400000,
            'equity': 4781100000,
            'cash': 1991200000,
            'cash_flow_op': 463900000,
            'cash_flow_inv': -977200000,
            'cash_flow_fin': 563000000
        })

    def test_aig_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/5272/000104746913008075/aig-20130630.xml')
        self.assert_item(item, {
            'symbol': 'AIG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 17315000000,
            'net_income': 2731000000,
            'op_income': None,
            'eps_basic': 1.85,
            'eps_diluted': 1.84,
            'dividend': 0.0,
            'assets': 537438000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 98155000000,
            'cash': 1762000000,
            'cash_flow_op': 1674000000,
            'cash_flow_inv': 6071000000,
            'cash_flow_fin': -7055000000
        })

    def test_aiv_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/922864/000095012311070591/aiv-20110630.xml')
        self.assert_item(item, {
            'symbol': 'AIV',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 281035000,
            'op_income': 49791000,
            'net_income': -33177000,
            'eps_basic': -0.28,
            'eps_diluted': -0.28,
            'dividend': 0.12,
            'assets': 7164972000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 1241336000,
            'cash': 85324000,
            'cash_flow_op': 95208000,
            'cash_flow_inv': -33538000,
            'cash_flow_fin': -87671000
        })

    def test_all_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/899051/000110465913035969/all-20130331.xml')
        self.assert_item(item, {
            'symbol': 'ALL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 8463000000,
            'op_income': None,
            'net_income': 709000000,
            'eps_basic': 1.49,
            'eps_diluted': 1.47,
            'dividend': 0.25,
            'assets': 126612000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 20619000000,
            'cash': 820000000,
            'cash_flow_op': 740000000,
            'cash_flow_inv': 136000000,
            'cash_flow_fin': -862000000
        })

    def test_apa_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/6769/000119312512457830/apa-20120930.xml')
        self.assert_item(item, {
            'symbol': 'APA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 4179000000,
            'op_income': None,
            'net_income': 161000000,
            'eps_basic': 0.41,
            'eps_diluted': 0.41,
            'dividend': 0.17,
            'assets': 58810000000,
            'cur_assets': 5044000000,
            'cur_liab': 5390000000,
            'equity': 30714000000,
            'cash': 318000000,
            'cash_flow_op': 6422000000,
            'cash_flow_inv': -10560000000,
            'cash_flow_fin': 4161000000
        })

    def test_axp_20100930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000095012310100214/axp-20100930.xml')
        self.assert_item(item, {
            'symbol': 'AXP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-09-30',
            'revenues': 6660000000,
            'op_income': 1640000000,
            'net_income': 1093000000,
            'eps_basic': 0.91,
            'eps_diluted': 0.9,
            'dividend': 0.18,
            'assets': 146056000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 15920000000,
            'cash': 21341000000,
            'cash_flow_op': 7227000000,
            'cash_flow_inv': 5298000000,
            'cash_flow_fin': -7885000000
        })

    def test_axp_20120630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312512332179/axp-20120630.xml')
        self.assert_item(item, {
            'symbol': 'AXP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2012,
            'end_date': '2012-06-30',
            'revenues': 7504000000,
            'op_income': None,
            'net_income': 1339000000,
            'eps_basic': 1.16,
            'eps_diluted': 1.15,
            'dividend': 0.2,
            'assets': 148128000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 19267000000,
            'cash': 22072000000,
            'cash_flow_op': 6742000000,
            'cash_flow_inv': -1771000000,
            'cash_flow_fin': -7786000000
        })

    def test_axp_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312513070554/axp-20121231.xml')
        self.assert_item(item, {
            'symbol': 'AXP',
            'amend': True,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 29592000000,
            'op_income': None,
            'net_income': 4482000000,
            'eps_basic': 3.91,
            'eps_diluted': 3.89,
            'dividend': 0.8,
            'assets': 153140000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 18886000000,
            'cash': 22250000000,
            'cash_flow_op': 7082000000,
            'cash_flow_inv': -6545000000,
            'cash_flow_fin': -3268000000
        })

    def test_axp_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312513180601/axp-20130331.xml')
        self.assert_item(item, {
            'symbol': 'AXP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 7384000000,
            'op_income': None,
            'net_income': 1280000000,
            'eps_basic': 1.15,
            'eps_diluted': 1.15,
            'dividend': 0.2,
            'assets': 156855000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 19290000000,
            'cash': 27964000000,
            'cash_flow_op': 7547000000,
            'cash_flow_inv': 32000000,
            'cash_flow_fin': -1830000000
        })

    def test_ba_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000119312510024406/ba-20091231.xml')
        self.assert_item(item, {
            'symbol': 'BA',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 68281000000,
            'op_income': 2096000000,
            'net_income': 1312000000,
            'eps_basic': 1.86,
            'eps_diluted': 1.84,
            'dividend': 1.68,
            'assets': 62053000000,
            'cur_assets': 35275000000,
            'cur_liab': 32883000000,
            'equity': 2225000000,
            'cash': 9215000000,
            'cash_flow_op': 5603000000,
            'cash_flow_inv': -3794000000,
            'cash_flow_fin': 4094000000
        })

    def test_ba_20110930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000119312511281613/ba-20110930.xml')
        self.assert_item(item, {
            'symbol': 'BA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 17727000000,
            'op_income': 1714000000,
            'net_income': 1098000000,
            'eps_basic': 1.47,
            'eps_diluted': 1.46,
            'dividend': 0.42,
            'assets': 74163000000,
            'cur_assets': 46347000000,
            'cur_liab': 37593000000,
            'equity': 6061000000,
            'cash': 5954000000,
            'cash_flow_op': 1092000000,
            'cash_flow_inv': 856000000,
            'cash_flow_fin': -1354000000
        })

    def test_ba_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000001292713000023/ba-20130331.xml')
        self.assert_item(item, {
            'symbol': 'BA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 18893000000,
            'op_income': 1528000000,
            'net_income': 1106000000,
            'eps_basic': 1.45,
            'eps_diluted': 1.44,
            'dividend': 0.49,
            'assets': 90447000000,
            'cur_assets': 59490000000,
            'cur_liab': 45666000000,
            'equity': 7560000000,
            'cash': 8335000000,
            'cash_flow_op': 524000000,
            'cash_flow_inv': -814000000,
            'cash_flow_fin': -1705000000
        })

    def test_bbt_20110930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/92230/000119312511304459/bbt-20110930.xml')
        self.assert_item(item, {
            'symbol': 'BBT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 2440000000,
            'op_income': None,
            'net_income': 366000000,
            'eps_basic': 0.52,
            'eps_diluted': 0.52,
            'dividend': 0.16,
            'assets': 167677000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 17541000000,
            'cash': 1312000000,
            'cash_flow_op': 4348000000,
            'cash_flow_inv': -10838000000,
            'cash_flow_fin': 8509000000
        })

    def test_bk_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1390777/000119312510112944/bk-20100331.xml')
        self.assert_item(item, {
            'symbol': 'BK',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 883000000,
            'op_income': None,
            'net_income': 559000000,
            'eps_basic': 0.46,
            'eps_diluted': 0.46,
            'dividend': 0.09,
            'assets': 220551000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 30455000000,
            'cash': 3307000000,
            'cash_flow_op': 1191000000,
            'cash_flow_inv': 512000000,
            'cash_flow_fin': -2126000000
        })

    def test_blk_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1364742/000119312513326890/blk-20130630.xml')
        self.assert_item(item, {
            'symbol': 'BLK',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 2482000000,
            'op_income': 849000000,
            'net_income': 729000000,
            'eps_basic': 4.27,
            'eps_diluted': 4.19,
            'dividend': 1.68,
            'assets': 193745000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 25755000000,
            'cash': 3668000000,
            'cash_flow_op': 1330000000,
            'cash_flow_inv': 10000000,
            'cash_flow_fin': -2193000000
        })

    def test_c_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/831001/000104746909007400/c-20090630.xml')
        self.assert_item(item, {
            'symbol': 'C',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 29969000000,
            'net_income': 4279000000,
            'op_income': None,
            'eps_basic': 0.49,
            'eps_diluted': 0.49,
            'dividend': 0.0,
            'assets': 1848533000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 154168000000,
            'cash': 26915000000,
            'cash_flow_op': -20737000000,
            'cash_flow_inv': 16457000000,
            'cash_flow_fin': 959000000
        })

    def test_cbs_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746910004823/cbs-20100331.xml')
        self.assert_item(item, {
            'symbol': 'CBS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 3530900000,
            'op_income': 153400000,
            'net_income': -26200000,
            'eps_basic': -0.04,
            'eps_diluted': -0.04,
            'dividend': 0.05,
            'assets': 26756100000,
            'cur_assets': 5705200000,
            'cur_liab': 4712300000,
            'equity': 9046100000,
            'cash': 872700000,
            'cash_flow_op': 700700000,
            'cash_flow_inv': -73600000,
            'cash_flow_fin': -471100000
        })

    def test_cbs_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746912001373/cbs-20111231.xml')
        self.assert_item(item, {
            'symbol': 'CBS',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 14245000000,
            'op_income': 2529000000,
            'net_income': 1305000000,
            'eps_basic': 1.97,
            'eps_diluted': 1.92,
            'dividend': 0.35,
            'assets': 26197000000,
            'cur_assets': 5543000000,
            'cur_liab': 3933000000,
            'equity': 9908000000,
            'cash': 660000000,
            'cash_flow_op': 1749000000,
            'cash_flow_inv': -389000000,
            'cash_flow_fin': -1180000000
        })

    def test_cbs_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746913007929/cbs-20130630.xml')
        self.assert_item(item, {
            'symbol': 'CBS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 3699000000,
            'op_income': 838000000,
            'net_income': 472000000,
            'eps_basic': 0.78,
            'eps_diluted': 0.76,
            'dividend': 0.12,
            'assets': 25693000000,
            'cur_assets': 4770000000,
            'cur_liab': 3825000000,
            'equity': 9601000000,
            'cash': 282000000,
            'cash_flow_op': 1051000000,
            'cash_flow_inv': -230000000,
            'cash_flow_fin': -1247000000
        })

    def test_cce_20101001(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1491675/000119312510239952/cce-20101001.xml')
        self.assert_item(item, {
            'symbol': 'CCE',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-10-01',
            'revenues': 1681000000,
            'op_income': 244000000,
            'net_income': 208000000,
            'eps_basic': 0.61,
            'eps_diluted': 0.61,
            'dividend': 0.0,
            'assets': 8457000000,
            'cur_assets': 3145000000,
            'cur_liab': 2154000000,
            'equity': 3277000000,
            'cash': 476000000,
            'cash_flow_op': 620000000,
            'cash_flow_inv': -705000000,
            'cash_flow_fin': 178000000
        })

    def test_cce_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1491675/000119312511033197/cce-20101231.xml')
        self.assert_item(item, {
            'symbol': 'CCE',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 6714000000,
            'op_income': 810000000,
            'net_income': 624000000,
            'eps_basic': 1.84,
            'eps_diluted': 1.83,
            'dividend': 0.12,
            'assets': 8596000000,
            'cur_assets': 2230000000,
            'cur_liab': 1942000000,
            'equity': 3143000000,
            'cash': 321000000,
            'cash_flow_op': 825000000,
            'cash_flow_inv': -739000000,
            'cash_flow_fin': -144000000
        })

    def test_cci_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1051470/000119312510031419/cci-20091231.xml')
        self.assert_item(item, {
            'symbol': 'CCI',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 1685407000,
            'op_income': 433991000,
            'net_income': -135138000,
            'eps_basic': -0.47,
            'eps_diluted': -0.47,
            'dividend': 0.0,
            'assets': 10956606000,
            'cur_assets': 1196033000,
            'cur_liab': 754105000,
            'equity': 2936085000,
            'cash': 766146000,
            'cash_flow_op': 571256000,
            'cash_flow_inv': -172145000,
            'cash_flow_fin': 214396000
        })

    def test_ccmm_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1091667/000109166711000103/ccmm-20110630.xml')
        self.assert_item(item, {
            'symbol': 'CCMM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 1791000000,
            'op_income': 270000000,
            'net_income': -107000000,
            'eps_basic': -0.98,
            'eps_diluted': -0.98,
            'dividend': 0.0,
            'assets': None,
            'cur_assets': None,  # The source filing appears to use the wrong context date on its balance sheet
            'cur_liab': None,
            'equity': None,
            'cash': 194000000,
            'cash_flow_op': 907000000,
            'cash_flow_inv': -694000000,
            'cash_flow_fin': -51000000
        })

    def test_chtr_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1091667/000109166712000026/chtr-20111231.xml')
        self.assert_item(item, {
            'symbol': 'CHTR',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 7204000000,
            'op_income': 1041000000,
            'net_income': -369000000,
            'eps_basic': -3.39,
            'eps_diluted': -3.39,
            'dividend': 0.0,
            'assets': 15605000000,
            'cur_assets': 370000000,
            'cur_liab': 1153000000,
            'equity': 409000000,
            'cash': 2000000,
            'cash_flow_op': 1737000000,
            'cash_flow_inv': -1367000000,
            'cash_flow_fin': -373000000
        })

    def test_ci_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701221/000110465913036475/ci-20130331.xml')
        self.assert_item(item, {
            'symbol': 'CI',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 8183000000,
            'op_income': None,
            'net_income': 57000000,
            'eps_basic': 0.2,
            'eps_diluted': 0.2,
            'dividend': 0.04,
            'assets': 54939000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 9660000000,
            'cash': 3306000000,
            'cash_flow_op': -805000000,
            'cash_flow_inv': 962000000,
            'cash_flow_fin': 185000000
        })

    def test_cit_20100630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1171825/000089109210003376/cit-20100331.xml')
        self.assert_item(item, {
            'symbol': 'CIT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-30',
            'revenues': 669500000,
            'op_income': None,
            'net_income': 142100000,
            'eps_basic': 0.71,
            'eps_diluted': 0.71,
            'dividend': 0.0,
            'assets': 54916800000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 8633900000,
            'cash': 1060700000,
            'cash_flow_op': 178100000,
            'cash_flow_inv': 7122800000,
            'cash_flow_fin': -6218700000
        })

    def test_csc_20120928(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/23082/000002308212000073/csc-20120928.xml')
        self.assert_item(item, {
            'symbol': 'CSC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2012-09-28',
            'revenues': 3854000000,
            'op_income': 298000000,
            'net_income': 130000000,
            'eps_basic': 0.84,
            'eps_diluted': 0.83,
            'dividend': 0.2,
            'assets': 11649000000,
            'cur_assets': 5468000000,
            'cur_liab': 4015000000,
            'equity': 2885000000,
            'cash': 1850000000,
            'cash_flow_op': 665000000,
            'cash_flow_inv': -366000000,
            'cash_flow_fin': 469000000
        })

    def test_disca_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1437107/000095012309029613/disca-20090630.xml')
        self.assert_item(item, {
            'symbol': 'DISCA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 881000000,
            'op_income': 486000000,
            'net_income': 183000000,
            'eps_basic': 0.43,
            'eps_diluted': 0.43,
            'dividend': 0.0,
            'assets': 10696000000,
            'cur_assets': 1331000000,
            'cur_liab': 1227000000,
            'equity': 5918000000,
            'cash': 339000000,
            'cash_flow_op': 320000000,
            'cash_flow_inv': 288000000,
            'cash_flow_fin': -371000000
        })

    def test_disca_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1437107/000095012309056946/disca-20090930.xml')
        self.assert_item(item, {
            'symbol': 'DISCA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 854000000,
            'op_income': 215000000,
            'net_income': 95000000,
            'eps_basic': 0.22,
            'eps_diluted': 0.22,
            'dividend': 0.0,
            'assets': 10741000000,
            'cur_assets': 1417000000,
            'cur_liab': 762000000,
            'equity': 6042000000,
            'cash': 401000000,
            'cash_flow_op': 358000000,
            'cash_flow_inv': 279000000,
            'cash_flow_fin': -343000000
        })

    def test_dltr_20130504(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/935703/000093570313000029/dltr-20130504.xml')
        self.assert_item(item, {
            'symbol': 'DLTR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-05-04',
            'revenues': 1865800000,
            'op_income': 216600000,
            'net_income': 133500000,
            'eps_basic': 0.6,
            'eps_diluted': 0.59,
            'dividend': 0.0,
            'assets': 2811800000,
            'cur_assets': 1489800000,
            'cur_liab': 663000000,
            'equity': 1739700000,
            'cash': 383300000,
            'cash_flow_op': 129300000,
            'cash_flow_inv': -88200000,
            'cash_flow_fin': -57400000
        })

    def test_dtv_20110331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1465112/000104746911004655/dtv-20110331.xml')
        self.assert_item(item, {
            'symbol': 'DTV',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2011-03-31',
            'revenues': 6319000000,
            'op_income': 1155000000,
            'net_income': 674000000,
            'eps_basic': 0.85,
            'eps_diluted': 0.85,
            'dividend': 0.0,
            'assets': 20593000000,
            'cur_assets': 6938000000,
            'cur_liab': 4125000000,
            'equity': -902000000,
            'cash': 4295000000,
            'cash_flow_op': 1309000000,
            'cash_flow_inv': -544000000,
            'cash_flow_fin': 2028000000
        })

    def test_ebay_20100630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065088/000119312510164115/ebay-20100630.xml')
        self.assert_item(item, {
            'symbol': 'EBAY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-30',
            'revenues': 2215379000,
            'op_income': 484565000,
            'net_income': 412192000,
            'eps_basic': 0.31,
            'eps_diluted': 0.31,
            'dividend': 0.0,
            'assets': 18747584000,
            'cur_assets': 8675313000,
            'cur_liab': 3564261000,
            'equity': 14169291000,
            'cash': 4037442000,
            'cash_flow_op': 1144641000,
            'cash_flow_inv': -835635000,
            'cash_flow_fin': 50363000
        })

    def test_ebay_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065088/000106508813000058/ebay-20130331.xml')
        self.assert_item(item, {
            'symbol': 'EBAY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 3748000000,
            'op_income': 800000000,
            'net_income': 677000000,
            'eps_basic': 0.52,
            'eps_diluted': 0.51,
            'dividend': 0.0,
            'assets': 38000000000,
            'cur_assets': 22336000000,
            'cur_liab': 11720000000,
            'equity': 21112000000,
            'cash': 6530000000,
            'cash_flow_op': 937000000,
            'cash_flow_inv': -719000000,
            'cash_flow_fin': -411000000
        })

    def test_ecl_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/31462/000110465912072308/ecl-20120930.xml')
        self.assert_item(item, {
            'symbol': 'ECL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 3023300000,
            'op_income': 401200000,
            'net_income': 238000000,
            'eps_basic': 0.81,
            'eps_diluted': 0.8,
            'dividend': 0.2,
            'assets': 16722800000,
            'cur_assets': 4072900000,
            'cur_liab': 2818700000,
            'equity': 6026200000,
            'cash': 324000000,
            'cash_flow_op': 720800000,
            'cash_flow_inv': -414900000,
            'cash_flow_fin': -1815800000
        })

    def test_ed_20130930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/23632/000119312513425393/ed-20130930.xml')
        self.assert_item(item, {
            'symbol': 'ED',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2013,
            'end_date': '2013-09-30',
            'revenues': 3484000000,
            'op_income': 855000000,
            'net_income': 464000000,
            'eps_basic': 1.58,
            'eps_diluted': 1.58,
            'dividend': 0.615,
            'assets': 41964000000,
            'cur_assets': 3704000000,
            'cur_liab': 4373000000,
            'equity': 12166000000,
            'cash': 74000000,
            'cash_flow_op': 1238000000,
            'cash_flow_inv': -1895000000,
            'cash_flow_fin': 337000000
        })

    def test_eqt_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/33213/000110465911009751/eqt-20101231.xml')
        self.assert_item(item, {
            'symbol': 'EQT',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 1322708000,
            'op_income': 470479000,
            'net_income': 227700000,
            'eps_basic': 1.58,
            'eps_diluted': 1.57,
            'dividend': 0.88,
            'assets': 7098438000,
            'cur_assets': 827940000,
            'cur_liab': 596984000,
            'equity': 3078696000,
            'cash': 0.0,
            'cash_flow_op': 789740000,
            'cash_flow_inv': -1239429000,
            'cash_flow_fin': 449689000
        })

    def test_etr_20121231(self):
        # Large file test (121 MB)
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/7323/000006598413000050/etr-20121231.xml')
        self.assert_item(item, {
            'symbol': 'ETR',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 10302079000,
            'op_income': 1301181000,
            'net_income': 846673000,
            'eps_basic': 4.77,
            'eps_diluted': 4.76,
            'dividend': 3.32,
            'assets': 43202502000,
            'cur_assets': 3683126000,
            'cur_liab': 4106321000,
            'equity': 9291089000,
            'cash': 532569000,
            'cash_flow_op': 2940285000,
            'cash_flow_inv': -3639797000,
            'cash_flow_fin': 538151000
        })

    def test_exc_20100930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/22606/000119312510234590/exc-20100930.xml')
        self.assert_item(item, {
            'symbol': 'EXC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-09-30',
            'revenues': 5291000000,
            'op_income': 1366000000,
            'net_income': 845000000,
            'eps_basic': 1.28,
            'eps_diluted': 1.27,
            'dividend': 0.53,
            'assets': 50948000000,
            'cur_assets': 6760000000,
            'cur_liab': 3967000000,
            'equity': 13955000000,
            'cash': 2735000000,
            'cash_flow_op': 4112000000,
            'cash_flow_inv': -2037000000,
            'cash_flow_fin': -1350000000
        })

    def test_fast_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/815556/000119312509154691/fast-20090630.xml')
        self.assert_item(item, {
            'symbol': 'FAST',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 474894000,
            'op_income': 69938000,
            'net_income': 43538000,
            'eps_basic': 0.29,
            'eps_diluted': 0.29,
            'dividend': 0.0,
            'assets': 1328684000,
            'cur_assets': 988997000,
            'cur_liab': 127950000,
            'equity': 1186845000,
            'cash': 173667000,
            'cash_flow_op': 167552000,
            'cash_flow_inv': -28942000,
            'cash_flow_fin': -51986000
        })

    def test_fast_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/815556/000119312509212481/fast-20090930.xml')
        self.assert_item(item, {
            'symbol': 'FAST',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 489339000,
            'op_income': 76410000,
            'net_income': 47589000,
            'eps_basic': 0.32,
            'eps_diluted': 0.32,
            'dividend': 0.0,
            'assets': 1337764000,
            'cur_assets': 998090000,
            'cur_liab': 138744000,
            'equity': 1185140000,
            'cash': 193744000,
            'cash_flow_op': 253184000,
            'cash_flow_inv': -41031000,
            'cash_flow_fin': -106943000
        })

    def test_fb_20120630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1326801/000119312512325997/fb-20120630.xml')
        self.assert_item(item, {
            'symbol': 'FB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2012,
            'end_date': '2012-06-30',
            'revenues': 1184000000,
            'op_income': -743000000,
            'net_income': -157000000,
            'eps_basic': -0.08,
            'eps_diluted': -0.08,
            'dividend': 0.0,
            'assets': 14928000000,
            'cur_assets': 11967000000,
            'cur_liab': 1034000000,
            'equity': 13309000000,
            'cash': 2098000000,
            'cash_flow_op': 683000000,
            'cash_flow_inv': -7170000000,
            'cash_flow_fin': 7090000000
        })

    def test_fb_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1326801/000132680113000003/fb-20121231.xml')
        self.assert_item(item, {
            'symbol': 'FB',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 5089000000,
            'op_income': 538000000,
            'net_income': 32000000,
            'eps_basic': 0.02,
            'eps_diluted': 0.01,
            'dividend': 0.0,
            'assets': 15103000000,
            'cur_assets': 11267000000,
            'cur_liab': 1052000000,
            'equity': 11755000000,
            'cash': 2384000000,
            'cash_flow_op': 1612000000,
            'cash_flow_inv': -7024000000,
            'cash_flow_fin': 6283000000
        })

    def test_fll_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/891482/000118811213000562/fll-20121231.xml')
        self.assert_item(item, {
            'symbol': 'FLL',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 128760000,
            'op_income': 49638000,
            'net_income': 27834000,
            'eps_basic': 1.49,
            'eps_diluted': None,
            'dividend': 0.0,
            'assets': 162725000,
            'cur_assets': 32339000,
            'cur_liab': 15332000,
            'equity': 81133000,
            'cash': 20603000,
            'cash_flow_op': -4301000,
            'cash_flow_inv': 45271000,
            'cash_flow_fin': -35074000
        })

    def test_flr_20080930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1124198/000110465908068715/flr-20080930.xml')
        self.assert_item(item, {
            'symbol': 'FLR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2008,
            'end_date': '2008-09-30',
            'revenues': 5673818000,
            'op_income': None,
            'net_income': 183099000,
            'eps_basic': 1.03,
            'eps_diluted': 1.01,
            'dividend': 0.125,
            'assets': 6605120000,
            'cur_assets': 4808393000,
            'cur_liab': 3228638000,
            'equity': 2741002000,
            'cash': 1514943000,
            'cash_flow_op': 855198000,
            'cash_flow_inv': -295445000,
            'cash_flow_fin': -202011000
        })

    def test_fmc_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/37785/000119312509165435/fmc-20090630.xml')
        self.assert_item(item, {
            'symbol': 'FMC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 700300000,
            'op_income': 97200000,
            'net_income': 69300000,
            'eps_basic': 0.95,
            'eps_diluted': 0.94,
            'dividend': 0.0,
            'assets': 3028500000,
            'cur_assets': 1423700000,
            'cur_liab': 717200000,
            'equity': 1101200000,
            'cash': 67000000,
            'cash_flow_op': 173900000,
            'cash_flow_inv': -106500000,
            'cash_flow_fin': -33100000
        })

    def test_fpl_20100331(self):
        # FPL's ticker symbol was later changed to NEE
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/37634/000075330810000051/fpl-20100331.xml')
        self.assert_item(item, {
            'symbol': 'FPL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 3622000000,
            'op_income': 939000000,
            'net_income': 556000000,
            'eps_basic': 1.36,
            'eps_diluted': 1.36,
            'dividend': 0.5,
            'assets': 50942000000,
            'cur_assets': 5557000000,
            'cur_liab': 7782000000,
            'equity': 13336000000,
            'cash': 1215000000,
            'cash_flow_op': 896000000,
            'cash_flow_inv': -1361000000,
            'cash_flow_fin': 1442000000
        })

    def test_ftr_20110930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/20520/000002052011000066/ftr-20110930.xml')
        self.assert_item(item, {
            'symbol': 'FTR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 1290939000,
            'op_income': 180291000,
            'net_income': 19481000,
            'eps_basic': 0.02,
            'eps_diluted': 0.02,
            'dividend': 0.0,
            'assets': 17493767000,
            'cur_assets': 969746000,
            'cur_liab': 1168142000,
            'equity': 4776588000,
            'cash': 205817000,
            'cash_flow_op': 1272654000,
            'cash_flow_inv': -676974000,
            'cash_flow_fin': -641126000
        })

    def test_ge_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/40545/000004054513000036/ge-20121231.xml')
        self.assert_item(item, {
            'symbol': 'GE',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 147359000000,
            'op_income': 22887000000,
            'net_income': 13641000000,
            'eps_basic': 1.29,
            'eps_diluted': 1.29,
            'dividend': 0.7,
            'assets': 685328000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 128470000000,
            'cash': 77356000000,
            'cash_flow_op': 31331000000,
            'cash_flow_inv': 11302000000,
            'cash_flow_fin': -51074000000
        })

    def test_gis_20121125(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/40704/000119312512508388/gis-20121125.xml')
        self.assert_item(item, {
            'symbol': 'GIS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2012-11-25',
            'revenues': 4881800000,
            'op_income': 829000000,
            'net_income': 541600000,
            'eps_basic': 0.84,
            'eps_diluted': 0.82,
            'dividend': 0.33,
            'assets': 22952900000,
            'cur_assets': 4565500000,
            'cur_liab': 5736400000,
            'equity': 7440000000,
            'cash': 734900000,
            'cash_flow_op': 1317100000,
            'cash_flow_inv': -1103200000,
            'cash_flow_fin': 33700000
        })

    def test_gmcr_20110625(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/909954/000119312511214253/gmcr-20110630.xml')
        self.assert_item(item, {
            'symbol': 'GMCR',
            'amend': False,  # The filing is actually amended, but the XML does not mark it as such
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-06-25',
            'revenues': 717210000,
            'op_income': 119310000,
            'net_income': 56348000,
            'eps_basic': 0.38,
            'eps_diluted': 0.37,
            'dividend': 0.0,
            'assets': 2874422000,
            'cur_assets': 844998000,
            'cur_liab': 395706000,
            'equity': 1816646000,
            'cash': 76138000,
            'cash_flow_op': 174708000,
            'cash_flow_inv': -1082070000,
            'cash_flow_fin': 986183000
        })

    def test_goog_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312509222384/goog-20090930.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 5944851000,
            'op_income': 2073718000,
            'net_income': 1638975000,
            'eps_basic': 5.18,
            'eps_diluted': 5.13,
            'dividend': 0.0,
            'assets': 37702845000,
            'cur_assets': 26353544000,
            'cur_liab': 2321774000,
            'equity': 33721753000,
            'cash': 12087115000,
            'cash_flow_op': 6584667000,
            'cash_flow_inv': -3245963000,
            'cash_flow_fin': 74851000
        })

    def test_goog_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312512440217/goog-20120930.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 14101000000,
            'op_income': 2736000000,
            'net_income': 2176000000,
            'eps_basic': 6.64,
            'eps_diluted': 6.53,
            'dividend': 0.0,
            'assets': 89730000000,
            'cur_assets': 56821000000,
            'cur_liab': 14434000000,
            'equity': 68028000000,
            'cash': 16260000000,
            'cash_flow_op': 11950000000,
            'cash_flow_inv': -7542000000,
            'cash_flow_fin': 1921000000
        })

    def test_goog_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312513028362/goog-20121231.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 50175000000,
            'op_income': 12760000000,
            'net_income': 10737000000,
            'eps_basic': 32.81,
            'eps_diluted': 32.31,
            'dividend': 0.0,
            'assets': 93798000000,
            'cur_assets': 60454000000,
            'cur_liab': 14337000000,
            'equity': 71715000000,
            'cash': 14778000000,
            'cash_flow_op': 16619000000,
            'cash_flow_inv': -13056000000,
            'cash_flow_fin': 1229000000
        })

    def test_goog_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000128877613000055/goog-20130630.xml')
        self.assert_item(item, {
            'symbol': 'GOOG',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 14105000000,
            'op_income': 3123000000,
            'net_income': 3228000000,
            'eps_basic': 9.71,
            'eps_diluted': 9.54,
            'dividend': 0.0,
            'assets': 101182000000,
            'cur_assets': 66861000000,
            'cur_liab': 15329000000,
            'equity': 78852000000,
            'cash': 16164000000,
            'cash_flow_op': 8338000000,
            'cash_flow_inv': -6244000000,
            'cash_flow_fin': -622000000
        })

    def test_goog_20140630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000128877614000065/goog-20140630.xml')
        self.assert_item(item, {
            'symbol': 'GOOG/GOOGL',  # Two symbols, see issue #6
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2014,
            'end_date': '2014-06-30',
            'revenues': 15955000000,
            'op_income': 4258000000,
            'net_income': 3422000000,
            'eps_basic': 5.07,
            'eps_diluted': 4.99,
            'dividend': 0.0,
            'assets': 121608000000,
            'cur_assets': 77905000000,
            'cur_liab': 17097000000,
            'equity': 95749000000,
            'cash': 19620000000,
            'cash_flow_op': 10018000000,
            'cash_flow_inv': -8487000000,
            'cash_flow_fin': -640000000
        })

    def test_gs_20090626(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/886982/000095012309029919/gs-20090626.xml')
        self.assert_item(item, {
            'symbol': 'GS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-26',
            'revenues': 13761000000,
            'op_income': None,
            'net_income': 2718000000,
            'eps_basic': 5.27,
            'eps_diluted': 4.93,
            'dividend': 0.35,
            'assets': 889544000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 62813000000,
            'cash': 22177000000,
            'cash_flow_op': 16020000000,
            'cash_flow_inv': -772000000,
            'cash_flow_fin': -6876000000
        })

    def test_hon_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/773840/000093041312002323/hon-20120331.xml')
        self.assert_item(item, {
            'symbol': 'HON',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 9307000000,
            'op_income': None,
            'net_income': 823000000,
            'eps_basic': 1.06,
            'eps_diluted': 1.04,
            'dividend': 0.3725,
            'assets': 40370000000,
            'cur_assets': 16553000000,
            'cur_liab': 12666000000,
            'equity': 11842000000,
            'cash': 3988000000,
            'cash_flow_op': 196000000,
            'cash_flow_inv': -122000000,
            'cash_flow_fin': 169000000
        })

    def test_hrb_20090731(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000095012309041361/hrb-20090731.xml')
        self.assert_item(item, {
            'symbol': 'HRB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2009-07-31',
            'revenues': 275505000,
            'op_income': -214162000,
            'net_income': -133634000,
            'eps_basic': -0.4,
            'eps_diluted': -0.4,
            'dividend': 0.15,
            'assets': 4545762000,
            'cur_assets': 1828146000,
            'cur_liab': 1823126000,
            'equity': 1190714000,
            'cash': 1006303000,
            'cash_flow_op': -454577000,
            'cash_flow_inv': 15360000,
            'cash_flow_fin': -216206000
        })

    def test_hrb_20091031(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000095012309069608/hrb-20091031.xml')
        self.assert_item(item, {
            'symbol': 'HRB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2009-10-31',
            'revenues': 326081000,
            'op_income': -214553000,
            'net_income': -128587000,
            'eps_basic': -0.38,
            'eps_diluted': -0.38,
            'dividend': 0.15,
            'assets': 4967359000,
            'cur_assets': 2300986000,
            'cur_liab': 2382867000,
            'equity': 1071097000,
            'cash': 1432243000,
            'cash_flow_op': -786152000,
            'cash_flow_inv': 43280000,
            'cash_flow_fin': 511231000
        })

    def test_hrb_20130731(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000157484213000013/hrb-20130731.xml')
        self.assert_item(item, {
            'symbol': 'HRB',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2014,
            'end_date': '2013-07-31',
            'revenues': 127195000,
            'op_income': -179555000,
            'net_income': -115187000,
            'eps_basic': -0.42,
            'eps_diluted': -0.42,
            'dividend': 0.2,
            'assets': 3762888000,
            'cur_assets': 1704932000,
            'cur_liab': 1450484000,
            'equity': 1105315000,
            'cash': 1163876000,
            'cash_flow_op': -318742000,
            'cash_flow_inv': -29090000,
            'cash_flow_fin': -229255000
        })

    def test_ihc_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701869/000070186912000029/ihc-20120331.xml')
        self.assert_item(item, {
            'symbol': 'IHC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 102156000,
            'op_income': 6416000,
            'net_income': 3922000,
            'eps_basic': 0.22,
            'eps_diluted': 0.22,
            'dividend': 0.0,
            'assets': 1364411000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 280250000,
            'cash': 9286000,
            'cash_flow_op': -138843000,
            'cash_flow_inv': 130710000,
            'cash_flow_fin': -808000
        })

    def test_intc_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/50863/000119312512075534/intc-20111231.xml')
        self.assert_item(item, {
            'symbol': 'INTC',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 53999000000,
            'op_income': 17477000000,
            'net_income': 12942000000,
            'eps_basic': 2.46,
            'eps_diluted': 2.39,
            'dividend': 0.7824,
            'assets': 71119000000,
            'cur_assets': 25872000000,
            'cur_liab': 12028000000,
            'equity': 45911000000,
            'cash': 5065000000,
            'cash_flow_op': 20963000000,
            'cash_flow_inv': -10301000000,
            'cash_flow_fin': -11100000000
        })

    def test_intu_20101031(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/896878/000095012310111135/intu-20101031.xml')
        self.assert_item(item, {
            'symbol': 'INTU',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2010-10-31',
            'revenues': 532000000,
            'op_income': -104000000,
            'net_income': -70000000,
            'eps_basic': -0.22,
            'eps_diluted': -0.22,
            'dividend': 0.0,
            'assets': 4943000000,
            'cur_assets': 2010000000,
            'cur_liab': 1136000000,
            'equity': 2615000000,
            'cash': 112000000,
            'cash_flow_op': -211000000,
            'cash_flow_inv': 285000000,
            'cash_flow_fin': -177000000
        })

    def test_jnj_20120101(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000119312512075565/jnj-20120101.xml')
        self.assert_item(item, {
            'symbol': 'JNJ',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2012-01-01',
            'revenues': 65030000000,
            'op_income': 13765000000,
            'net_income': 9672000000,
            'eps_basic': 3.54,
            'eps_diluted': 3.49,
            'dividend': 2.25,
            'assets': 113644000000,
            'cur_assets': 54316000000,
            'cur_liab': 22811000000,
            'equity': 57080000000,
            'cash': 24542000000,
            'cash_flow_op': 14298000000,
            'cash_flow_inv': -4612000000,
            'cash_flow_fin': -4452000000
        })

    def test_jnj_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000020040612000140/jnj-20120930.xml')
        self.assert_item(item, {
            'symbol': 'JNJ',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 17052000000,
            'op_income': 3825000000,
            'net_income': 2968000000,
            'eps_basic': 1.08,
            'eps_diluted': 1.05,
            'dividend': 0.61,
            'assets': 118951000000,
            'cur_assets': 44791000000,
            'cur_liab': 23935000000,
            'equity': 63761000000,
            'cash': 15486000000,
            'cash_flow_op': 12020000000,
            'cash_flow_inv': -2007000000,
            'cash_flow_fin': -19091000000
        })

    def test_jnj_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000020040613000091/jnj-20130630.xml')
        self.assert_item(item, {
            'symbol': 'JNJ',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 17877000000,
            'op_income': 5020000000,
            'net_income': 3833000000,
            'eps_basic': 1.36,
            'eps_diluted': 1.33,
            'dividend': 0.66,
            'assets': 124325000000,
            'cur_assets': 51273000000,
            'cur_liab': 23767000000,
            'equity': 69665000000,
            'cash': 17307000000,
            'cash_flow_op': 7328000000,
            'cash_flow_inv': -1972000000,
            'cash_flow_fin': -2754000000
        })

    def test_jpm_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000095012309032832/jpm-20090630.xml')
        self.assert_item(item, {
            'symbol': 'JPM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 25623000000,
            'op_income': None,
            'net_income': 1072000000,
            'eps_basic': 0.28,
            'eps_diluted': 0.28,
            'dividend': 0.05,
            'assets': 2026642000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 154766000000,
            'cash': 25133000000,
            'cash_flow_op': 103259000000,
            'cash_flow_inv': 34430000000,
            'cash_flow_fin': -139413000000
        })

    def test_jpm_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000001961712000163/jpm-20111231.xml')
        self.assert_item(item, {
            'symbol': 'JPM',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 97234000000,
            'op_income': None,
            'net_income': 17568000000,
            'eps_basic': 4.50,
            'eps_diluted': 4.48,
            'dividend': 1.0,
            'assets': 2265792000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 183573000000,
            'cash': 59602000000,
            'cash_flow_op': 95932000000,
            'cash_flow_inv': -170752000000,
            'cash_flow_fin': 107706000000
        })

    def test_jpm_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000001961713000300/jpm-20130331.xml')
        self.assert_item(item, {
            'symbol': 'JPM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 25122000000,
            'op_income': None,
            'net_income': 6131000000,
            'eps_basic': 1.61,
            'eps_diluted': 1.59,
            'dividend': 0.30,
            'assets': 2389349000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 207086000000,
            'cash': 45524000000,
            'cash_flow_op': 19964000000,
            'cash_flow_inv': -55455000000,
            'cash_flow_fin': 28180000000
        })

    def test_ko_20100402(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000104746910004416/ko-20100402.xml')
        self.assert_item(item, {
            'symbol': 'KO',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-04-02',
            'revenues': 7525000000,
            'op_income': 2183000000,
            'net_income': 1614000000,
            'eps_basic': 0.70,
            'eps_diluted': 0.69,
            'dividend': 0.44,
            'assets': 47403000000,
            'cur_assets': 17208000000,
            'cur_liab': 13583000000,
            'equity': 25157000000,
            'cash': 5684000000,
            'cash_flow_op': 1326000000,
            'cash_flow_inv': -1368000000,
            'cash_flow_fin': -1043000000
        })

    def test_ko_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000104746911001506/ko-20101231.xml')
        self.assert_item(item, {
            'symbol': 'KO',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 35119000000,
            'op_income': 8449000000,
            'net_income': 11809000000,
            'eps_basic': 5.12,
            'eps_diluted': 5.06,
            'dividend': 1.76,
            'assets': 72921000000,
            'cur_assets': 21579000000,
            'cur_liab': 18508000000,
            'equity': 31317000000,
            'cash': 8517000000,
            'cash_flow_op': 9532000000,
            'cash_flow_inv': -4405000000,
            'cash_flow_fin': -3465000000
        })

    def test_ko_20120928(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000002134412000051/ko-20120928.xml')
        self.assert_item(item, {
            'symbol': 'KO',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-28',
            'revenues': 12340000000,
            'op_income': 2793000000,
            'net_income': 2311000000,
            'eps_basic': 0.51,
            'eps_diluted': 0.50,
            'dividend': 0.255,
            'assets': 86654000000,
            'cur_assets': 29712000000,
            'cur_liab': 27008000000,
            'equity': 33590000000,
            'cash': 9615000000,
            'cash_flow_op': 7840000000,
            'cash_flow_inv': -10399000000,
            'cash_flow_fin': -399000000
        })

    def test_krft_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1545158/000119312512495570/krft-20120930.xml')
        self.assert_item(item, {
            'symbol': 'KRFT',
            'amend': True,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 4606000000,
            'op_income': 762000000,
            'net_income': 470000000,
            'eps_basic': 0.79,
            'eps_diluted': 0.79,
            'dividend': 0.0,
            'assets': 22284000000,
            'cur_assets': 3905000000,
            'cur_liab': 2569000000,
            'equity': 7458000000,
            'cash': 244000000,
            'cash_flow_op': 2067000000,
            'cash_flow_inv': -279000000,
            'cash_flow_fin': -1548000000
        })

    def test_l_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/60086/000119312510105707/l-20100331.xml')
        self.assert_item(item, {
            'symbol': 'L',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 3713000000,
            'op_income': None,
            'net_income': 420000000,
            'eps_basic': 0.99,
            'eps_diluted': 0.99,
            'dividend': 0.0625,
            'assets': 75855000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 21993000000,
            'cash': 135000000,
            'cash_flow_op': 294000000,
            'cash_flow_inv': -411000000,
            'cash_flow_fin': 64000000
        })

    def test_l_20100930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/60086/000119312510245478/l-20100930.xml')
        self.assert_item(item, {
            'symbol': 'L',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-09-30',
            'revenues': 3701000000,
            'op_income': None,
            'net_income': 36000000,
            'eps_basic': 0.09,
            'eps_diluted': 0.09,
            'dividend': 0.0625,
            'assets': 76821000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 23499000000,
            'cash': 132000000,
            'cash_flow_op': 895000000,
            'cash_flow_inv': -426000000,
            'cash_flow_fin': -527000000
        })

    def test_lbtya_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1316631/000119312510111069/lbtya-20100331.xml')
        self.assert_item(item, {
            'symbol': 'LBTYA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 2178900000,
            'op_income': 303600000,
            'net_income': 736600000,
            'eps_basic': 2.75,
            'eps_diluted': 2.75,
            'dividend': 0.0,
            'assets': 33083500000,
            'cur_assets': 5524900000,
            'cur_liab': 4107000000,
            'equity': 4066000000,
            'cash': 4184200000,
            'cash_flow_op': 803300000,
            'cash_flow_inv': 45400000,
            'cash_flow_fin': 170700000
        })

    def test_lcapa_20110930(self):
        # This symbol was changed to STRZA
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793411000006/lcapa-20110930.xml')
        self.assert_item(item, {
            'symbol': 'LCAPA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-30',
            'revenues': 540000000,
            'op_income': 111000000,
            'net_income': -42000000,
            'eps_basic': -0.07,
            'eps_diluted': -0.12,
            'dividend': 0.0,
            'assets': 8915000000,
            'cur_assets': 3767000000,
            'cur_liab': 3012000000,
            'equity': 5078000000,
            'cash': 1937000000,
            'cash_flow_op': 316000000,
            'cash_flow_inv': -205000000,
            'cash_flow_fin': -264000000
        })

    def test_linta_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1355096/000135509612000008/linta-20120331.xml')
        self.assert_item(item, {
            'symbol': 'LINTA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 2314000000,
            'op_income': 258000000,
            'net_income': 91000000,
            'eps_basic': 0.16,
            'eps_diluted': 0.16,
            'dividend': 0.0,
            'assets': 17144000000,
            'cur_assets': 2764000000,
            'cur_liab': 3486000000,
            'equity': 6505000000,
            'cash': 794000000,
            'cash_flow_op': 330000000,
            'cash_flow_inv': -91000000,
            'cash_flow_fin': -284000000
        })

    def test_lll_20100625(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1039101/000095012310071159/lll-20100625.xml')
        self.assert_item(item, {
            'symbol': 'LLL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-25',
            'revenues': -3966000000,  # error in the source document, should be 3966M
            'op_income': -442000000,  # error in the source document, should be 442M
            'net_income': -228000000,  # error in the source document, should be 227M
            'eps_basic': 1.97,
            'eps_diluted': 1.95,
            'dividend': 0.4,
            'assets': 15689000000,
            'cur_assets': 5494000000,
            'cur_liab': 3730000000,
            'equity': 6926000000,
            'cash': 1023000000,
            'cash_flow_op': 589000000,
            'cash_flow_inv': -688000000,
            'cash_flow_fin': 132000000
        })

    def test_lltc_20110102(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/791907/000079190711000016/lltc-20110102.xml')
        self.assert_item(item, {
            'symbol': 'LLTC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-01-02',
            'revenues': 383621000,
            'op_income': 201059000,
            'net_income': 143743000,
            'eps_basic': 0.62,
            'eps_diluted': 0.62,
            'dividend': 0.23,
            'assets': 1446186000,
            'cur_assets': 1069958000,
            'cur_liab': 199210000,
            'equity': 278793000,
            'cash': 203308000,
            'cash_flow_op': 342333000,
            'cash_flow_inv': 39771000,
            'cash_flow_fin': -474650000
        })

    def test_lltc_20111002(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/791907/000079190711000080/lltc-20111007.xml')
        self.assert_item(item, {
            'symbol': 'LLTC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2011-10-02',
            'revenues': 329920000,
            'op_income': 157566000,
            'net_income': 108401000,
            'eps_basic': 0.47,
            'eps_diluted': 0.47,
            'dividend': 0.24,
            'assets': 1659341000,
            'cur_assets': 1268413000,
            'cur_liab': 169006000,
            'equity': 543199000,
            'cash': 163414000,
            'cash_flow_op': 149860000,
            'cash_flow_inv': -171884000,
            'cash_flow_fin': -85085000
        })

    def test_lly_20100930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/59478/000095012310097867/lly-20100930.xml')
        self.assert_item(item, {
            'symbol': 'LLY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-09-30',
            'revenues': 5654800000,
            'op_income': None,
            'net_income': 1302900000,
            'eps_basic': 1.18,
            'eps_diluted': 1.18,
            'dividend': 0.49,
            'assets': 29904300000,
            'cur_assets': 14184300000,
            'cur_liab': 6097400000,
            'equity': 12405500000,
            'cash': 5908800000,
            'cash_flow_op': 4628700000,
            'cash_flow_inv': -1595300000,
            'cash_flow_fin': -1472300000
        })

    def test_lmca_20120331(self):
        # This symbol was changed to STRZA
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793412000012/lmca-20120331.xml')
        self.assert_item(item, {
            'symbol': 'LMCA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 440000000,
            'op_income': 89000000,
            'net_income': 137000000,
            'eps_basic': 1.13,
            'eps_diluted': 1.10,
            'dividend': 0.0,
            'assets': 7122000000,
            'cur_assets': 3380000000,
            'cur_liab': 547000000,
            'equity': 5321000000,
            'cash': 1915000000,
            'cash_flow_op': 94000000,
            'cash_flow_inv': 581000000,
            'cash_flow_fin': -830000000
        })

    def test_lnc_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/59558/000005955812000143/lnc-20120930.xml')
        self.assert_item(item, {
            'symbol': 'LNC',
            'amend': False,  # error in the source document, should be True
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': None,  # missing from the source document, should be 2954000000
            'op_income': None,
            'net_income': 402000000,
            'eps_basic': 1.45,
            'eps_diluted': 1.41,
            'dividend': 0.0,
            'assets': 215458000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 15237000000,
            'cash': 4373000000,
            'cash_flow_op': 666000000,
            'cash_flow_inv': -2067000000,
            'cash_flow_fin': 1264000000
        })

    def test_ltd_20111029(self):
        # This symbol was changed to LB
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701985/000144530511003514/ltd-20111029.xml')
        self.assert_item(item, {
            'symbol': 'LTD',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-10-29',
            'revenues': 2174000000,
            'op_income': 186000000,
            'net_income': 94000000,
            'eps_basic': 0.32,
            'eps_diluted': 0.31,
            'dividend': 0.2,
            'assets': 6517000000,
            'cur_assets': 2616000000,
            'cur_liab': 1504000000,
            'equity': 521000000,
            'cash': 498000000,
            'cash_flow_op': 94000000,
            'cash_flow_inv': -239000000,
            'cash_flow_fin': -489000000
        })

    def test_ltd_20130803(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701985/000070198513000032/ltd-20130803.xml')
        self.assert_item(item, {
            'symbol': 'LTD',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-08-03',
            'revenues': 2516000000,
            'op_income': 358000000,
            'net_income': 178000000,
            'eps_basic': 0.62,
            'eps_diluted': 0.61,
            'dividend': 0.3,
            'assets': 6072000000,
            'cur_assets': 2098000000,
            'cur_liab': 1485000000,
            'equity': -861000000,
            'cash': 551000000,
            'cash_flow_op': 354000000,
            'cash_flow_inv': -381000000,
            'cash_flow_fin': -194000000
        })

    def test_luv_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/92380/000009238011000070/luv-20110630.xml')
        self.assert_item(item, {
            'symbol': 'LUV',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 4136000000,
            'op_income': 207000000,
            'net_income': 161000000,
            'eps_basic': 0.21,
            'eps_diluted': 0.21,
            'dividend': 0.0045,
            'assets': 18945000000,
            'cur_assets': 5421000000,
            'cur_liab': 5318000000,
            'equity': 7202000000,
            'cash': 1595000000,
            'cash_flow_op': 237000000,
            'cash_flow_inv': -589000000,
            'cash_flow_fin': -92000000
        })

    def test_mchp_20120630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/827054/000082705412000230/mchp-20120630.xml')
        self.assert_item(item, {
            'symbol': 'MCHP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2012-06-30',
            'revenues': 352134000,
            'op_income': 96333000,
            'net_income': 78710000,
            'eps_basic': 0.41,
            'eps_diluted': 0.39,
            'dividend': 0.35,
            'assets': 3144840000,
            'cur_assets': 2229298000,
            'cur_liab': 249989000,
            'equity': 2017990000,
            'cash': 779848000,
            'cash_flow_op': 128971000,
            'cash_flow_inv': 77890000,
            'cash_flow_fin': -62768000
        })

    def test_mdlz_20130930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1103982/000119312513431957/mdlz-20130930.xml')
        self.assert_item(item, {
            'symbol': 'MDLZ',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2013,
            'end_date': '2013-09-30',
            'revenues': 8472000000,
            'op_income': 1262000000,
            'net_income': 1024000000,
            'eps_basic': 0.58,
            'eps_diluted': 0.57,
            'dividend': 0.14,
            'assets': 74859000000,
            'cur_assets': 15463000000,
            'cur_liab': 15269000000,
            'equity': 32492000000,
            'cash': 3692000000,
            'cash_flow_op': 1198000000,
            'cash_flow_inv': -1015000000,
            'cash_flow_fin': -881000000
        })

    def test_mmm_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465910007295/mmm-20091231.xml')
        self.assert_item(item, {
            'symbol': 'MMM',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 23123000000,
            'op_income': 4814000000,
            'net_income': 3193000000,
            'eps_basic': 4.56,
            'eps_diluted': 4.52,
            'dividend': 2.04,
            'assets': 27250000000,
            'cur_assets': 10795000000,
            'cur_liab': 4897000000,
            'equity': 13302000000,
            'cash': 3040000000,
            'cash_flow_op': 4941000000,
            'cash_flow_inv': -1732000000,
            'cash_flow_fin': -2014000000
        })

    def test_mmm_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465912032441/mmm-20120331.xml')
        self.assert_item(item, {
            'symbol': 'MMM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 7486000000,
            'op_income': 1634000000,
            'net_income': 1125000000,
            'eps_basic': 1.61,
            'eps_diluted': 1.59,
            'dividend': 0.59,
            'assets': 32015000000,
            'cur_assets': 12853000000,
            'cur_liab': 5408000000,
            'equity': 16619000000,
            'cash': 2332000000,
            'cash_flow_op': 828000000,
            'cash_flow_inv': -43000000,
            'cash_flow_fin': -722000000
        })

    def test_mmm_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465913058961/mmm-20130630.xml')
        self.assert_item(item, {
            'symbol': 'MMM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 7752000000,
            'op_income': 1702000000,
            'net_income': 1197000000,
            'eps_basic': 1.74,
            'eps_diluted': 1.71,
            'dividend': 0.635,
            'assets': 34130000000,
            'cur_assets': 13983000000,
            'cur_liab': 6335000000,
            'equity': 18319000000,
            'cash': 2942000000,
            'cash_flow_op': 2673000000,
            'cash_flow_inv': -740000000,
            'cash_flow_fin': -1727000000
        })

    def test_mnst_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/865752/000110465913062263/mnst-20130630.xml')
        self.assert_item(item, {
            'symbol': 'MNST',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 630934000,
            'op_income': 179427000,
            'net_income': 106873000,
            'eps_basic': 0.64,
            'eps_diluted': 0.62,
            'dividend': 0.0,
            'assets': 1317842000,
            'cur_assets': 1093822000,
            'cur_liab': 346174000,
            'equity': 856021000,
            'cash': 283839000,
            'cash_flow_op': 99720000,
            'cash_flow_inv': -70580000,
            'cash_flow_fin': 30981000
        })

    def test_msft_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312511200680/msft-20110630.xml')
        self.assert_item(item, {
            'symbol': 'MSFT',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 69943000000,
            'op_income': 27161000000,
            'net_income': 23150000000,
            'eps_basic': 2.73,
            'eps_diluted': 2.69,
            'dividend': 0.64,
            'assets': 108704000000,
            'cur_assets': 74918000000,
            'cur_liab': 28774000000,
            'equity': 57083000000,
            'cash': 9610000000,
            'cash_flow_op': 26994000000,
            'cash_flow_inv': -14616000000,
            'cash_flow_fin': -8376000000
        })

    def test_msft_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312512026864/msft-20111231.xml')
        self.assert_item(item, {
            'symbol': 'MSFT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2012,
            'end_date': '2011-12-31',
            'revenues': 20885000000,
            'op_income': 7994000000,
            'net_income': 6624000000,
            'eps_basic': 0.79,
            'eps_diluted': 0.78,
            'dividend': 0.20,
            'assets': 112243000000,
            'cur_assets': 72513000000,
            'cur_liab': 25373000000,
            'equity': 64121000000,
            'cash': 10610000000,
            'cash_flow_op': 5862000000,
            'cash_flow_inv': -5568000000,
            'cash_flow_fin': -2513000000
        })

    def test_msft_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312513160748/msft-20130331.xml')
        self.assert_item(item, {
            'symbol': 'MSFT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 20489000000,
            'op_income': 7612000000,
            'net_income': 6055000000,
            'eps_basic': 0.72,
            'eps_diluted': 0.72,
            'dividend': 0.23,
            'assets': 134105000000,
            'cur_assets': 93524000000,
            'cur_liab': 31929000000,
            'equity': 76688000000,
            'cash': 5240000000,
            'cash_flow_op': 9666000000,
            'cash_flow_inv': -7660000000,
            'cash_flow_fin': -2744000000
        })

    def test_mu_20121129(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/723125/000072312513000007/mu-20121129.xml')
        self.assert_item(item, {
            'symbol': 'MU',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2012-11-29',
            'revenues': 1834000000,
            'op_income': -157000000,
            'net_income': -275000000,
            'eps_basic': -0.27,
            'eps_diluted': -0.27,
            'dividend': 0.0,
            'assets': 14067000000,
            'cur_assets': 5315000000,
            'cur_liab': 2138000000,
            'equity': 8186000000,
            'cash': 2102000000,
            'cash_flow_op': 236000000,
            'cash_flow_inv': -639000000,
            'cash_flow_fin': 46000000
        })

    def test_mxim_20110326(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/743316/000144530511000751/mxim-20110422.xml')
        self.assert_item(item, {
            'symbol': 'MXIM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-03-26',
            'revenues': 606775000,
            'op_income': 163995000,
            'net_income': 136276000,
            'eps_basic': 0.46,
            'eps_diluted': 0.45,
            'dividend': 0.21,
            'assets': 3452417000,
            'cur_assets': 1676593000,
            'cur_liab': 391153000,
            'equity': 2465040000,
            'cash': 868923000,
            'cash_flow_op': 615180000,
            'cash_flow_inv': -224755000,
            'cash_flow_fin': -348014000
        })

    def test_nflx_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065280/000106528012000020/nflx-20120930.xml')
        self.assert_item(item, {
            'symbol': 'NFLX',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 905089000,
            'op_income': 16135000,
            'net_income': 7675000,
            'eps_basic': 0.14,
            'eps_diluted': 0.13,
            'dividend': 0.0,
            'assets': 3808833000,
            'cur_assets': 2225018000,
            'cur_liab': 1598223000,
            'equity': 716840000,
            'cash': 370298000,
            'cash_flow_op': 150000,
            'cash_flow_inv': -33524000,
            'cash_flow_fin': -158000
        })

    def test_nvda_20130127(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1045810/000104581013000008/nvda-20130127.xml')
        self.assert_item(item, {
            'symbol': 'NVDA',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2013,
            'end_date': '2013-01-27',
            'revenues': 4280159000,
            'op_income': 648239000,
            'net_income': 562536000,
            'eps_basic': 0.91,
            'eps_diluted': 0.9,
            'dividend': 0.075,
            'assets': 6412245000,
            'cur_assets': 4775258000,
            'cur_liab': 976223000,
            'equity': 4827703000,
            'cash': 732786000,
            'cash_flow_op': 824172000,
            'cash_flow_inv': -743992000,
            'cash_flow_fin': -15270000
        })

    def test_nws_20090930(self):
        # The ticker symbol was later changed to FOX
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1308161/000119312509224062/nws-20090930.xml')
        self.assert_item(item, {
            'symbol': 'NWS',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2009-09-30',
            'revenues': 7199000000,
            'op_income': 1042000000,
            'net_income': 571000000,
            'eps_basic': 0.22,
            'eps_diluted': 0.22,
            'dividend': 0.06,
            'assets': 55316000000,
            'cur_assets': 17425000000,
            'cur_liab': 10990000000,
            'equity': 24479000000,
            'cash': 7832000000,
            'cash_flow_op': 680000000,
            'cash_flow_inv': -362000000,
            'cash_flow_fin': 942000000
        })

    def test_omx_20110924(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312511286448/omx-20110924.xml')
        self.assert_item(item, {
            'symbol': 'OMX',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2011,
            'end_date': '2011-09-24',
            'revenues': 1774767000,
            'op_income': 41296000,
            'net_income': 21518000,
            'eps_basic': 0.25,
            'eps_diluted': 0.25,
            'dividend': 0.0,
            'assets': 4002981000,
            'cur_assets': 1950996000,
            'cur_liab': 998377000,
            'equity': 657636000,
            'cash': 485426000,
            'cash_flow_op': 78743000,
            'cash_flow_inv': -41380000,
            'cash_flow_fin': -11280000
        })

    def test_omx_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312512077611/omx-20111231.xml')
        self.assert_item(item, {
            'symbol': 'OMX',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 7121167000,
            'op_income': 86486000,
            'net_income': 32771000,
            'eps_basic': 0.38,
            'eps_diluted': 0.38,
            'dividend': 0.0,
            'assets': 4069275000,
            'cur_assets': 1938974000,
            'cur_liab': 1013301000,
            'equity': 568993000,
            'cash': 427111000,
            'cash_flow_op': 53679000,
            'cash_flow_inv': -69373000,
            'cash_flow_fin': -17952000
        })

    def test_omx_20121229(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312513073972/omx-20121229.xml')
        self.assert_item(item, {
            'symbol': 'OMX',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-29',
            'revenues': 6920384000,
            'op_income': 24278000,
            'net_income': 414694000,
            'eps_basic': 4.79,
            'eps_diluted': 4.74,
            'dividend': 0.0,
            'assets': 3784315000,
            'cur_assets': 1983884000,
            'cur_liab': 1056641000,
            'equity': 1034373000,
            'cash': 495056000,
            'cash_flow_op': 185201000,
            'cash_flow_inv': -85244000,
            'cash_flow_fin': -34836000
        })

    def test_orly_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/898173/000089817313000028/orly-20130331.xml')
        self.assert_item(item, {
            'symbol': 'ORLY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 1585009000,
            'op_income': 251084000,
            'net_income': 154329000,
            'eps_basic': 1.38,
            'eps_diluted': 1.36,
            'dividend': 0.0,
            'assets': 5789541000,
            'cur_assets': 2741188000,
            'cur_liab': 2349022000,
            'equity': 2072525000,
            'cash': 205410000,
            'cash_flow_op': 226344000,
            'cash_flow_inv': -72100000,
            'cash_flow_fin': -196962000
        })

    def test_pay_20110430(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1312073/000119312511161119/pay-20110430.xml')
        self.assert_item(item, {
            'symbol': 'PAY',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-04-30',
            'revenues': 292446000,
            'op_income': 37338000,
            'net_income': 25200000,
            'eps_basic': 0.29,
            'eps_diluted': 0.27,
            'dividend': 0.0,
            'assets': 1252289000,
            'cur_assets': 935395000,
            'cur_liab': 303590000,
            'equity': 332172000,
            'cash': 531542000,
            'cash_flow_op': 68831000,
            'cash_flow_inv': -20049000,
            'cash_flow_fin': 34676000
        })

    def test_pcar_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/75362/000119312510108284/pcar-20100331.xml')
        self.assert_item(item, {
            'symbol': 'PCAR',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 2230700000,
            'op_income': None,
            'net_income': 68300000,
            'eps_basic': 0.19,
            'eps_diluted': 0.19,
            'dividend': 0.09,
            'assets': 13990000000,
            'cur_assets': 3396400000,
            'cur_liab': 1425900000,
            'equity': 5092600000,
            'cash': 1854700000,
            'cash_flow_op': 285400000,
            'cash_flow_inv': 40500000,
            'cash_flow_fin': -350800000
        })

    def test_pcg_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1004980/000100498010000015/pcg-20091231.xml')
        self.assert_item(item, {
            'symbol': 'PCG',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 13399000000,
            'op_income': 2299000000,
            'net_income': 1220000000,
            'eps_basic': 3.25,
            'eps_diluted': 3.2,
            'dividend': 1.68,
            'assets': 42945000000,
            'cur_assets': 5657000000,
            'cur_liab': 6813000000,
            'equity': 10585000000,
            'cash': 527000000,
            'cash_flow_op': 3039000000,
            'cash_flow_inv': -3336000000,
            'cash_flow_fin': 605000000
        })

    def test_plt_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/914025/000091402513000049/plt-20130630.xml')
        self.assert_item(item, {
            'symbol': 'PLT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2014,
            'end_date': '2013-06-30',
            'revenues': 202818000,
            'op_income': 35949000,
            'net_income': 26953000,
            'eps_basic': 0.63,
            'eps_diluted': 0.62,
            'dividend': 0.1,
            'assets': 780520000,
            'cur_assets': 568272000,
            'cur_liab': 90121000,
            'equity': 673569000,
            'cash': 256343000,
            'cash_flow_op': 34140000,
            'cash_flow_inv': -4120000,
            'cash_flow_fin': -2424000
        })

    def test_qep_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1108827/000119312511202252/qep-20110630.xml')
        self.assert_item(item, {
            'symbol': 'QEP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 784100000,
            'op_income': 168900000,
            'net_income': 92800000,
            'eps_basic': 0.52,
            'eps_diluted': 0.52,
            'dividend': 0.02,
            'assets': 7075000000,
            'cur_assets': 655600000,
            'cur_liab': 582900000,
            'equity': 3184400000,
            'cash': None,
            'cash_flow_op': 628600000,
            'cash_flow_inv': -660200000,
            'cash_flow_fin': 31600000
        })

    def test_qep_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1108827/000110882712000006/qep-20120930.xml')
        self.assert_item(item, {
            'symbol': 'QEP',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 542400000,
            'op_income': -12600000,
            'net_income': -3100000,
            'eps_basic': -0.02,
            'eps_diluted': -0.02,
            'dividend': 0.02,
            'assets': 8996100000,
            'cur_assets': 619800000,
            'cur_liab': 616700000,
            'equity': 3377000000,
            'cash': 0.0,
            'cash_flow_op': 972000000,
            'cash_flow_inv': -2435700000,
            'cash_flow_fin': 1463700000
        })

    def test_regn_20100630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/872589/000120677410001689/regn-20100630.xml')
        self.assert_item(item, {
            'symbol': 'REGN',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-30',
            'revenues': 115886000,
            'op_income': -23724000,
            'net_income': -25474000,
            'eps_basic': -0.31,
            'eps_diluted': -0.31,
            'dividend': 0.0,
            'assets': 790641000,
            'cur_assets': 417750000,
            'cur_liab': 119571000,
            'equity': 371216000,
            'cash': 112000000,
            'cash_flow_op': -22626000,
            'cash_flow_inv': -131383000,
            'cash_flow_fin': 58934000
        })

    def test_sbac_20110331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1034054/000119312511130220/sbac-20110331.xml')
        self.assert_item(item, {
            'symbol': 'SBAC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2011-03-31',
            'revenues': 167749000,
            'op_income': 23899000,
            'net_income': -34251000,
            'eps_basic': -0.3,
            'eps_diluted': -0.3,
            'dividend': 0.0,
            'assets': 3466258000,
            'cur_assets': 173387000,
            'cur_liab': 120247000,
            'equity': 213078000,
            'cash': 95104000,
            'cash_flow_op': 53197000,
            'cash_flow_inv': -108748000,
            'cash_flow_fin': 86401000
        })

    def test_shld_20101030(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1310067/000119312510263486/shld-20101030.xml')
        self.assert_item(item, {
            'symbol': 'SHLD',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2010,
            'end_date': '2010-10-30',
            'revenues': 9678000000,
            'op_income': -292000000,
            'net_income': -218000000,
            'eps_basic': -1.98,
            'eps_diluted': -1.98,
            'dividend': 0.0,
            'assets': 26045000000,
            'cur_assets': 13123000000,
            'cur_liab': 10682000000,
            'equity': 8378000000,
            'cash': 790000000,
            'cash_flow_op': -1172000000,
            'cash_flow_inv': -296000000,
            'cash_flow_fin': 532000000
        })

    def test_sial_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/90185/000119312511028579/sial-20101231.xml')
        self.assert_item(item, {
            'symbol': 'SIAL',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 2271000000,
            'op_income': 551000000,
            'net_income': 384000000,
            'eps_basic': 3.17,
            'eps_diluted': 3.12,
            'dividend': 0.0,
            'assets': 3014000000,
            'cur_assets': 1602000000,
            'cur_liab': 530000000,
            'equity': 1976000000,
            'cash': 569000000,
            'cash_flow_op': 523000000,
            'cash_flow_inv': -182000000,
            'cash_flow_fin': -161000000
        })

    def test_siri_20100630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/908937/000095012310074081/siri-20100630.xml')
        self.assert_item(item, {
            'symbol': 'SIRI',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2010,
            'end_date': '2010-06-30',
            'revenues': 699761000,
            'op_income': 125634000,
            'net_income': 15272000,
            'eps_basic': 0.0,
            'eps_diluted': 0.0,
            'dividend': 0.0,
            'assets': 7200932000,
            'cur_assets': 760172000,
            'cur_liab': 2041871000,
            'equity': 180428000,
            'cash': 258854000,
            'cash_flow_op': 140987000,
            'cash_flow_inv': -159859000,
            'cash_flow_fin': -105763000
        })

    def test_siri_20120331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/908937/000090893712000003/siri-20120331.xml')
        self.assert_item(item, {
            'symbol': 'SIRI',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-03-31',
            'revenues': 804722000,
            'op_income': 199238000,
            'net_income': 107774000,
            'eps_basic': 0.03,
            'eps_diluted': 0.02,
            'dividend': 0.0,
            'assets': 7501724000,
            'cur_assets': 1337094000,
            'cur_liab': 2236580000,
            'equity': 849579000,
            'cash': 746576000,
            'cash_flow_op': 39948000,
            'cash_flow_inv': -25187000,
            'cash_flow_fin': -42175000
        })

    def test_spex_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12239/000141588913001019/spex-20130331.xml')
        self.assert_item(item, {
            'symbol': 'SPEX',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 5761,
            'op_income': -910547,
            'net_income': -3696570,
            'eps_basic': -5.35,
            'eps_diluted': None,
            'dividend': 0.0,
            'assets': 3572989,
            'cur_assets': 3535555,
            'cur_liab': 453858,
            'equity': 2857993,
            'cash': 3448526,
            'cash_flow_op': -1049711,
            'cash_flow_inv': None,
            'cash_flow_fin': None
        })

    def test_strza_20121231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793413000015/strza-20121231.xml')
        self.assert_item(item, {
            'symbol': 'STRZA',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2012,
            'end_date': '2012-12-31',
            'revenues': 1630696000,
            'op_income': 405404000,
            'net_income': 254484000,
            'eps_basic': None,
            'eps_diluted': None,
            'dividend': 0.0,
            'assets': 2176050000,
            'cur_assets': 1376911000,
            'cur_liab': 330451000,
            'equity': 1302144000,
            'cash': 749774000,
            'cash_flow_op': 292077000,
            'cash_flow_inv': -16214000,
            'cash_flow_fin': -626101000
        })

    def test_stx_20120928(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1137789/000110465912072744/stx-20120928.xml')
        self.assert_item(item, {
            'symbol': 'STX',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2012-09-28',
            'revenues': 3732000000,
            'op_income': 624000000,
            'net_income': 582000000,
            'eps_basic': 1.48,
            'eps_diluted': 1.42,
            'dividend': 0.32,
            'assets': 9522000000,
            'cur_assets': 5749000000,
            'cur_liab': 2753000000,
            'equity': 3535000000,
            'cash': 1894000000,
            'cash_flow_op': 1132000000,
            'cash_flow_inv': -265000000,
            'cash_flow_fin': -681000000
        })

    def test_stx_20121228(self):
        # 'stx-20120928.xml' is misnamed
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1137789/000110465913005497/stx-20120928.xml')
        self.assert_item(item, {
            'symbol': 'STX',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2012-12-28',
            'revenues': 3668000000,
            'op_income': 555000000,
            'net_income': 492000000,
            'eps_basic': 1.33,
            'eps_diluted': 1.3,
            'dividend': 0.7,
            'assets': 8742000000,
            'cur_assets': 5017000000,
            'cur_liab': 2643000000,
            'equity': 2925000000,
            'cash': 1383000000,
            'cash_flow_op': 1976000000,
            'cash_flow_inv': -453000000,
            'cash_flow_fin': -1849000000
        })

    def test_symc_20130628(self):
        # 'symc-20140628.xml' is misnamed
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/849399/000119312513312695/symc-20140628.xml')
        self.assert_item(item, {
            'symbol': 'SYMC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2014,
            'end_date': '2013-06-28',
            'revenues': 1709000000,
            'op_income': 224000000,
            'net_income': 157000000,
            'eps_basic': 0.23,
            'eps_diluted': 0.22,
            'dividend': 0.15,
            'assets': 13151000000,
            'cur_assets': 5179000000,
            'cur_liab': 4205000000,
            'equity': 5497000000,
            'cash': 3749000000,
            'cash_flow_op': 312000000,
            'cash_flow_inv': -29000000,
            'cash_flow_fin': -1192000000
        })

    def test_tgt_20130803(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/27419/000110465913066569/tgt-20130803.xml')
        self.assert_item(item, {
            'symbol': 'TGT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-08-03',
            'revenues': 17117000000,
            'op_income': 1161000000,
            'net_income': 611000000,
            'eps_basic': 0.96,
            'eps_diluted': 0.95,
            'dividend': 0.43,
            'assets': 44162000000,
            'cur_assets': 11403000000,
            'cur_liab': 12616000000,
            'equity': 16020000000,
            'cash': 1018000000,
            'cash_flow_op': 4109000000,
            'cash_flow_inv': 1269000000,
            'cash_flow_fin': -5148000000
        })

    def test_trv_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/86312/000110465910021504/trv-20100331.xml')
        self.assert_item(item, {
            'symbol': 'TRV',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 6119000000,
            'op_income': None,
            'net_income': 647000000,
            'eps_basic': 1.26,
            'eps_diluted': 1.25,
            'dividend': 0.0,
            'assets': 108696000000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 26671000000,
            'cash': 251000000,
            'cash_flow_op': 531000000,
            'cash_flow_inv': 952000000,
            'cash_flow_fin': -1486000000
        })

    def test_tsla_20110630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312511221497/tsla-20110630.xml')
        self.assert_item(item, {
            'symbol': 'TSLA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2011,
            'end_date': '2011-06-30',
            'revenues': 58171000,
            'op_income': -58739000,
            'net_income': -58903000,
            'eps_basic': -0.6,
            'eps_diluted': -0.6,
            'dividend': 0.0,
            'assets': 646155000,
            'cur_assets': 417758000,
            'cur_liab': 138736000,
            'equity': 348452000,
            'cash': 319380000,
            'cash_flow_op': -65785000,
            'cash_flow_inv': -13011000,
            'cash_flow_fin': 298618000
        })

    def test_tsla_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312512137560/tsla-20111231.xml')
        self.assert_item(item, {
            'symbol': 'TSLA',
            'amend': True,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 204242000,
            'op_income': -251488000,
            'net_income': -254411000,
            'eps_basic': -2.53,
            'eps_diluted': -2.53,
            'dividend': 0.0,
            'assets': 713448000,
            'cur_assets': 372838000,
            'cur_liab': 191339000,
            'equity': 224045000,
            'cash': 255266000,
            'cash_flow_op': -114364000,
            'cash_flow_inv': -175928000,
            'cash_flow_fin': 446000000
        })

    def test_tsla_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312513327916/tsla-20130630.xml')
        self.assert_item(item, {
            'symbol': 'TSLA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 405139000,
            'op_income': -11792000,
            'net_income': -30502000,
            'eps_basic': -0.26,
            'eps_diluted': -0.26,
            'dividend': 0.0,
            'assets': 1887844000,
            'cur_assets': 1129542000,
            'cur_liab': 486545000,
            'equity': 629426000,
            'cash': 746057000,
            'cash_flow_op': 25886000,
            'cash_flow_inv': -82410000,
            'cash_flow_fin': 600691000
        })

    def test_utmd_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/706698/000109690612002585/utmd-20111231.xml')
        self.assert_item(item, {
            'symbol': 'UTMD',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 37860000,
            'op_income': 11842000,
            'net_income': 7414000,
            'eps_basic': 2.04,
            'eps_diluted': 2.03,
            'dividend': 0.0,
            'assets': 76389000,
            'cur_assets': 17016000,
            'cur_liab': 9631000,
            'equity': 40757000,
            'cash': 6534000,
            'cash_flow_op': 11365000,
            'cash_flow_inv': -26685000,
            'cash_flow_fin': 18078000
        })

    def test_vel_pe_20130930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/103682/000119312513427104/d-20130930.xml')
        self.assert_item(item, {
            'symbol': 'VEL - PE',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2013,
            'end_date': '2013-09-30',
            'revenues': 3432000000,
            'op_income': 1034000000,
            'net_income': 569000000,
            'eps_basic': 0.98,
            'eps_diluted': 0.98,
            'dividend': 0.5625,
            'assets': 48488000000,
            'cur_assets': 5210000000,
            'cur_liab': 6453000000,
            'equity': 11242000000,
            'cash': 287000000,
            'cash_flow_op': 2950000000,
            'cash_flow_inv': -2348000000,
            'cash_flow_fin': -563000000
        })

    def test_via_20090930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312509221448/via-20090930.xml')
        self.assert_item(item, {
            'symbol': 'VIA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2009,
            'end_date': '2009-09-30',
            'revenues': 3317000000,
            'op_income': 784000000,
            'net_income': 463000000,
            'eps_basic': 0.76,
            'eps_diluted': 0.76,
            'dividend': 0.0,
            'assets': 21307000000,
            'cur_assets': 3605000000,
            'cur_liab': 3707000000,
            'equity': 8044000000,
            'cash': 249000000,
            'cash_flow_op': 732000000,
            'cash_flow_inv': -117000000,
            'cash_flow_fin': -1169000000
        })

    def test_via_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312510028165/via-20091231.xml')
        self.assert_item(item, {
            'symbol': 'VIA',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 13619000000,
            'op_income': 2904000000,
            'net_income': 1611000000,
            'eps_basic': 2.65,
            'eps_diluted': 2.65,
            'dividend': 0.0,
            'assets': 21900000000,
            'cur_assets': 4430000000,
            'cur_liab': 3751000000,
            'equity': 8677000000,
            'cash': 298000000,
            'cash_flow_op': 1151000000,
            'cash_flow_inv': -274000000,
            'cash_flow_fin': -1388000000
        })

    def test_via_20120630(self):
        # 'via-20120401.xml' is misnamed
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312512333732/via-20120401.xml')
        self.assert_item(item, {
            'symbol': 'VIA',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-06-30',
            'revenues': 3241000000,
            'op_income': 903000000,
            'net_income': 534000000,
            'eps_basic': 1.02,
            'eps_diluted': 1.01,
            'dividend': 0.275,
            'assets': 21958000000,
            'cur_assets': 4511000000,
            'cur_liab': 3716000000,
            'equity': 7473000000,
            'cash': 774000000,
            'cash_flow_op': 1736000000,
            'cash_flow_inv': -212000000,
            'cash_flow_fin': -1750000000
        })

    def test_vno_20090630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/899689/000089968909000034/vno-20090630.xml')
        self.assert_item(item, {
            'symbol': 'VNO',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'FY',  # mismarked in the filing; should be Q2
            'fiscal_year': 2009,
            'end_date': '2009-06-30',
            'revenues': 678385000,
            'op_income': 221139000,
            'net_income': -51904000,
            'eps_basic': -0.3,
            'eps_diluted': -0.3,
            'dividend': 0.95,
            'assets': 21831857000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 7122175000,
            'cash': 2068498000,
            'cash_flow_op': 379439000,
            'cash_flow_inv': -219310000,
            'cash_flow_fin': 381516000
        })

    def test_vno_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/899689/000089968912000004/vno-20111231.xml')
        self.assert_item(item, {
            'symbol': 'VNO',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 2915665000,
            'op_income': 856153000,
            'net_income': 601771000,
            'eps_basic': 3.26,
            'eps_diluted': 3.23,
            'dividend': 0.0,
            'assets': 20446487000,
            'cur_assets': None,
            'cur_liab': None,
            'equity': 7508447000,
            'cash': 606553000,
            'cash_flow_op': 702499000,
            'cash_flow_inv': -164761000,
            'cash_flow_fin': -621974000
        })

    def test_vrsk_20120930(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1442145/000119312512441544/vrsk-20120930.xml')
        self.assert_item(item, {
            'symbol': 'VRSK',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-30',
            'revenues': 398863000,
            'op_income': 155251000,
            'net_income': 82911000,
            'eps_basic': 0.5,
            'eps_diluted': 0.48,
            'dividend': 0.0,
            'assets': 2303433000,
            'cur_assets': 361337000,
            'cur_liab': 668257000,
            'equity': 142048000,
            'cash': 97770000,
            'cash_flow_op': 320997000,
            'cash_flow_inv': -838704000,
            'cash_flow_fin': 424004000
        })

    def test_wat_20120929(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1000697/000119312512448069/wat-20120929.xml')
        self.assert_item(item, {
            'symbol': 'WAT',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q3',
            'fiscal_year': 2012,
            'end_date': '2012-09-29',
            'revenues': 449952000,
            'op_income': 121745000,
            'net_income': 99109000,
            'eps_basic': 1.13,
            'eps_diluted': 1.12,
            'dividend': 0.0,
            'assets': 2997140000,
            'cur_assets': 2137498000,
            'cur_liab': 767562000,
            'equity': 1329879000,
            'cash': 356293000,
            'cash_flow_op': 317627000,
            'cash_flow_inv': -298851000,
            'cash_flow_fin': -53396000
        })

    def test_wec_20130331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/783325/000010781513000080/wec-20130331.xml')
        self.assert_item(item, {
            'symbol': 'WEC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2013,
            'end_date': '2013-03-31',
            'revenues': 1275200000,
            'op_income': 321000000,
            'net_income': 176600000,
            'eps_basic': 0.77,
            'eps_diluted': 0.76,
            'dividend': 0.34,
            'assets': 14295300000,
            'cur_assets': 1313800000,
            'cur_liab': 1278100000,
            'equity': 8675000000,
            'cash': 24700000,
            'cash_flow_op': 330300000,
            'cash_flow_inv': -145300000,
            'cash_flow_fin': -195900000
        })

    def test_wec_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/783325/000010781513000112/wec-20130630.xml')
        self.assert_item(item, {
            'symbol': 'WEC',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 1012300000,
            'op_income': 229500000,
            'net_income': 119000000,
            'eps_basic': 0.52,
            'eps_diluted': 0.52,
            'dividend': 0.34,
            'assets': 14317000000,
            'cur_assets': 1271100000,
            'cur_liab': 1280700000,
            'equity': 8609000000,
            'cash': 21000000,
            'cash_flow_op': 681500000,
            'cash_flow_inv': -336600000,
            'cash_flow_fin': -359500000
        })

    def test_wfm_20120115(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/865436/000144530512000434/wfm-20120115.xml')
        self.assert_item(item, {
            'symbol': 'WFM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2012,
            'end_date': '2012-01-15',
            'revenues': 3390940000,
            'op_income': 190338000,
            'net_income': 118327000,
            'eps_basic': 0.66,
            'eps_diluted': 0.65,
            'dividend': 0.14,
            'assets': 4528241000,
            'cur_assets': 1677087000,
            'cur_liab': 896972000,
            'equity': 3182747000,
            'cash': 529954000,
            'cash_flow_op': 260896000,
            'cash_flow_inv': -6963000,
            'cash_flow_fin': 63562000
        })

    def test_xel_20100331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/72903/000110465910024080/xel-20100331.xml')
        self.assert_item(item, {
            'symbol': 'XEL',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2010,
            'end_date': '2010-03-31',
            'revenues': 2807462000,
            'op_income': 403665000,
            'net_income': 166058000,
            'eps_basic': 0.36,
            'eps_diluted': 0.36,
            'dividend': 0.25,
            'assets': 25334501000,
            'cur_assets': 2344294000,
            'cur_liab': 2759838000,
            'equity': 7355871000,
            'cash': 79504000,
            'cash_flow_op': 555539000,
            'cash_flow_inv': -460112000,
            'cash_flow_fin': -121731000
        })

    def test_xel_20101231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/72903/000114036111012444/xel-20101231.xml')
        self.assert_item(item, {
            'symbol': 'XEL',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2010,
            'end_date': '2010-12-31',
            'revenues': 10310947000,
            'op_income': 1619969000,
            'net_income': 751593000,
            'eps_basic': 1.63,
            'eps_diluted': 1.62,
            'dividend': 1.0,
            'assets': 27387690000,
            'cur_assets': 2732643000,
            'cur_liab': 2536533000,
            'equity': 8083519000,
            'cash': 108437000,
            'cash_flow_op': 1893942000,
            'cash_flow_inv': -2806724000,
            'cash_flow_fin': 905571000
        })

    def test_xom_20110331(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000119312511127973/xom-20110331.xml')
        self.assert_item(item, {
            'symbol': 'XOM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q1',
            'fiscal_year': 2011,
            'end_date': '2011-03-31',
            'revenues': 114004000000,
            'op_income': None,
            'net_income': 10650000000,
            'eps_basic': 2.14,
            'eps_diluted': 2.14,
            'dividend': 0.44,
            'assets': 319533000000,
            'cur_assets': 72022000000,
            'cur_liab': 73576000000,
            'equity': 157531000000,
            'cash': 12833000000,
            'cash_flow_op': 16856000000,
            'cash_flow_inv': -5353000000,
            'cash_flow_fin': -6749000000
        })

    def test_xom_20111231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000119312512078102/xom-20111231.xml')
        self.assert_item(item, {
            'symbol': 'XOM',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2011,
            'end_date': '2011-12-31',
            'revenues': 467029000000,
            'op_income': None,
            'net_income': 41060000000,
            'eps_basic': 8.43,
            'eps_diluted': 8.42,
            'dividend': 1.85,
            'assets': 331052000000,
            'cur_assets': 72963000000,
            'cur_liab': 77505000000,
            'equity': 160744000000,
            'cash': 12664000000,
            'cash_flow_op': 55345000000,
            'cash_flow_inv': -22165000000,
            'cash_flow_fin': -28256000000
        })

    def test_xom_20130630(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000003408813000035/xom-20130630.xml')
        self.assert_item(item, {
            'symbol': 'XOM',
            'amend': False,
            'doc_type': '10-Q',
            'period_focus': 'Q2',
            'fiscal_year': 2013,
            'end_date': '2013-06-30',
            'revenues': 106469000000,
            'op_income': None,
            'net_income': 6860000000,
            'eps_basic': 1.55,
            'eps_diluted': 1.55,
            'dividend': 0.63,
            'assets': 341615000000,
            'cur_assets': 62844000000,
            'cur_liab': 72688000000,
            'equity': 171588000000,
            'cash': 4609000000,
            'cash_flow_op': 21275000000,
            'cash_flow_inv': -18547000000,
            'cash_flow_fin': -7409000000
        })

    def test_xray_20091231(self):
        item = parse_xml('http://www.sec.gov/Archives/edgar/data/818479/000114420410009164/xray-20091231.xml')
        self.assert_item(item, {
            'symbol': 'XRAY',
            'amend': False,
            'doc_type': '10-K',
            'period_focus': 'FY',
            'fiscal_year': 2009,
            'end_date': '2009-12-31',
            'revenues': 2159916000,
            'op_income': 381187000,
            'net_income': 274258000,
            'eps_basic': 1.85,
            'eps_diluted': 1.83,
            'dividend': 0.2,
            'assets': 3087932000,
            'cur_assets': 1217796000,

SYMBOL INDEX (307 symbols across 16 files)

FILE: pystock_crawler/exporters.py
  class CsvItemExporter2 (line 5) | class CsvItemExporter2(CsvItemExporter):
    method __init__ (line 14) | def __init__(self, *args, **kwargs):
    method _write_headers_and_set_fields_to_export (line 20) | def _write_headers_and_set_fields_to_export(self, item):
  class SymbolListExporter (line 32) | class SymbolListExporter(BaseItemExporter):
    method __init__ (line 34) | def __init__(self, file, **kwargs):
    method export_item (line 38) | def export_item(self, item):

FILE: pystock_crawler/items.py
  class ReportItem (line 9) | class ReportItem(Item):
  class PriceItem (line 47) | class PriceItem(Item):
  class SymbolItem (line 62) | class SymbolItem(Item):

FILE: pystock_crawler/loaders.py
  class IntermediateValue (line 25) | class IntermediateValue(object):
    method __init__ (line 31) | def __init__(self, local_name, value, text, context, node=None, start_...
    method __cmp__ (line 42) | def __cmp__(self, other):
    method __repr__ (line 49) | def __repr__(self):
    method is_member (line 55) | def is_member(self):
  class ExtractText (line 59) | class ExtractText(object):
    method __call__ (line 61) | def __call__(self, value):
  class MatchEndDate (line 70) | class MatchEndDate(object):
    method __init__ (line 72) | def __init__(self, data_type=str, ignore_date_range=False):
    method __call__ (line 76) | def __call__(self, value, loader_context):
  class ImdSumMembersOr (line 136) | class ImdSumMembersOr(object):
    method __init__ (line 138) | def __init__(self, second_func=None):
    method __call__ (line 141) | def __call__(self, imd_values):
  function date_range_matches_doc_type (line 158) | def date_range_matches_doc_type(doc_type, start_date, end_date):
  function get_amend (line 164) | def get_amend(values):
  function get_symbol (line 170) | def get_symbol(values):
  function imd_max (line 177) | def imd_max(imd_values):
  function imd_min (line 184) | def imd_min(imd_values):
  function imd_sum (line 191) | def imd_sum(imd_values):
  function imd_get_revenues (line 195) | def imd_get_revenues(imd_values):
  function imd_get_net_income (line 205) | def imd_get_net_income(imd_values):
  function imd_get_op_income (line 209) | def imd_get_op_income(imd_values):
  function imd_get_cash_flow (line 214) | def imd_get_cash_flow(imd_values, loader_context):
  function imd_get_per_share_value (line 232) | def imd_get_per_share_value(imd_values):
  function imd_get_equity (line 251) | def imd_get_equity(imd_values):
  function imd_filter_member (line 266) | def imd_filter_member(imd_values):
  function imd_mult (line 283) | def imd_mult(imd_values):
  function memberness (line 300) | def memberness(context):
  function is_member (line 320) | def is_member(context):
  function str_to_bool (line 331) | def str_to_bool(value):
  function find_namespace (line 338) | def find_namespace(xxs, name):
  function register_namespace (line 345) | def register_namespace(xxs, name):
  function register_namespaces (line 350) | def register_namespaces(xxs):
  class XmlXPathItemLoader (line 359) | class XmlXPathItemLoader(ItemLoader):
    method __init__ (line 361) | def __init__(self, *args, **kwargs):
    method add_xpath (line 365) | def add_xpath(self, field_name, xpath, *processors, **kw):
    method add_xpaths (line 370) | def add_xpaths(self, name, paths):
    method _get_values (line 378) | def _get_values(self, xpaths, **kw):
  class ReportItemLoader (line 383) | class ReportItemLoader(XmlXPathItemLoader):
    method __init__ (line 439) | def __init__(self, *args, **kwargs):
    method _get_symbol (line 614) | def _get_symbol(self):
    method _get_doc_fiscal_year (line 621) | def _get_doc_fiscal_year(self):
    method _guess_fiscal_year (line 628) | def _guess_fiscal_year(self, end_date, period_focus):
    method _get_doc_end_date (line 663) | def _get_doc_end_date(self):
    method _get_doc_type (line 684) | def _get_doc_type(self):
    method _get_period_focus (line 690) | def _get_period_focus(self, doc_end_date):

FILE: pystock_crawler/spiders/edgar.py
  class URLGenerator (line 10) | class URLGenerator(object):
    method __init__ (line 12) | def __init__(self, symbols, start_date='', end_date='', start=0, count...
    method __iter__ (line 18) | def __iter__(self):
  class EdgarSpider (line 24) | class EdgarSpider(CrawlSpider):
    method __init__ (line 34) | def __init__(self, **kwargs):
    method parse_10qk (line 57) | def parse_10qk(self, response):

FILE: pystock_crawler/spiders/nasdaq.py
  function generate_urls (line 12) | def generate_urls(exchanges):
  class NasdaqSpider (line 17) | class NasdaqSpider(Spider):
    method __init__ (line 22) | def __init__(self, **kwargs):
    method parse (line 28) | def parse(self, response):

FILE: pystock_crawler/spiders/yahoo.py
  function parse_date (line 12) | def parse_date(date_str):
  function make_url (line 19) | def make_url(symbol, start_date=None, end_date=None):
  function generate_urls (line 38) | def generate_urls(symbols, start_date=None, end_date=None):
  class YahooSpider (line 43) | class YahooSpider(Spider):
    method __init__ (line 48) | def __init__(self, **kwargs):
    method parse (line 69) | def parse(self, response):
    method _get_symbol_from_url (line 82) | def _get_symbol_from_url(self, url):

FILE: pystock_crawler/tests/base.py
  class TestCaseBase (line 9) | class TestCaseBase(unittest.TestCase):
    method assert_none_or_almost_equal (line 14) | def assert_none_or_almost_equal(self, value, expected_value):
    method assert_item (line 20) | def assert_item(self, item, expected):
  function _create_sample_data_dir (line 44) | def _create_sample_data_dir():

FILE: pystock_crawler/tests/test_cmdline.py
  class PrintTest (line 20) | class PrintTest(unittest.TestCase):
    method test_no_args (line 22) | def test_no_args(self):
    method test_print_help (line 26) | def test_print_help(self):
    method test_print_version (line 33) | def test_print_version(self):
  class CrawlTest (line 41) | class CrawlTest(unittest.TestCase):
    method setUp (line 43) | def setUp(self):
    method tearDown (line 54) | def tearDown(self):
    method assert_cache (line 57) | def assert_cache(self):
    method assert_log (line 62) | def assert_log(self):
    method get_output_content (line 67) | def get_output_content(self):
  class CrawlSymbolsTest (line 76) | class CrawlSymbolsTest(CrawlTest):
    method assert_nyse_output (line 81) | def assert_nyse_output(self):
    method assert_nyse_and_nasdaq_output (line 93) | def assert_nyse_and_nasdaq_output(self):
    method test_crawl_nyse (line 105) | def test_crawl_nyse(self):
    method test_crawl_nyse_and_nasdaq (line 112) | def test_crawl_nyse_and_nasdaq(self):
  class CrawlPricesTest (line 120) | class CrawlPricesTest(CrawlTest):
    method test_crawl_inline_symbols (line 125) | def test_crawl_inline_symbols(self):
    method test_crawl_symbol_file (line 135) | def test_crawl_symbol_file(self):
  class CrawlReportsTest (line 152) | class CrawlReportsTest(CrawlTest):
    method test_crawl_inline_symbols (line 157) | def test_crawl_inline_symbols(self):
    method test_crawl_symbol_file (line 168) | def test_crawl_symbol_file(self):
    method test_merge_empty_results (line 195) | def test_merge_empty_results(self):

FILE: pystock_crawler/tests/test_loaders.py
  function create_response (line 11) | def create_response(file_path):
  function download (line 17) | def download(url, local_path):
  function parse_xml (line 34) | def parse_xml(url):
  class ReportItemLoaderTest (line 43) | class ReportItemLoaderTest(TestCaseBase):
    method test_a_20110131 (line 45) | def test_a_20110131(self):
    method test_aa_20120630 (line 70) | def test_aa_20120630(self):
    method test_aapl_20100626 (line 95) | def test_aapl_20100626(self):
    method test_aapl_20110326 (line 120) | def test_aapl_20110326(self):
    method test_aapl_20120929 (line 145) | def test_aapl_20120929(self):
    method test_aes_20100331 (line 170) | def test_aes_20100331(self):
    method test_adbe_20060914 (line 195) | def test_adbe_20060914(self):
    method test_adbe_20090227 (line 201) | def test_adbe_20090227(self):
    method test_agn_20101231 (line 226) | def test_agn_20101231(self):
    method test_aig_20130630 (line 251) | def test_aig_20130630(self):
    method test_aiv_20110630 (line 276) | def test_aiv_20110630(self):
    method test_all_20130331 (line 301) | def test_all_20130331(self):
    method test_apa_20120930 (line 326) | def test_apa_20120930(self):
    method test_axp_20100930 (line 351) | def test_axp_20100930(self):
    method test_axp_20120630 (line 376) | def test_axp_20120630(self):
    method test_axp_20121231 (line 401) | def test_axp_20121231(self):
    method test_axp_20130331 (line 426) | def test_axp_20130331(self):
    method test_ba_20091231 (line 451) | def test_ba_20091231(self):
    method test_ba_20110930 (line 476) | def test_ba_20110930(self):
    method test_ba_20130331 (line 501) | def test_ba_20130331(self):
    method test_bbt_20110930 (line 526) | def test_bbt_20110930(self):
    method test_bk_20100331 (line 551) | def test_bk_20100331(self):
    method test_blk_20130630 (line 576) | def test_blk_20130630(self):
    method test_c_20090630 (line 601) | def test_c_20090630(self):
    method test_cbs_20100331 (line 626) | def test_cbs_20100331(self):
    method test_cbs_20111231 (line 651) | def test_cbs_20111231(self):
    method test_cbs_20130630 (line 676) | def test_cbs_20130630(self):
    method test_cce_20101001 (line 701) | def test_cce_20101001(self):
    method test_cce_20101231 (line 726) | def test_cce_20101231(self):
    method test_cci_20091231 (line 751) | def test_cci_20091231(self):
    method test_ccmm_20110630 (line 776) | def test_ccmm_20110630(self):
    method test_chtr_20111231 (line 801) | def test_chtr_20111231(self):
    method test_ci_20130331 (line 826) | def test_ci_20130331(self):
    method test_cit_20100630 (line 851) | def test_cit_20100630(self):
    method test_csc_20120928 (line 876) | def test_csc_20120928(self):
    method test_disca_20090630 (line 901) | def test_disca_20090630(self):
    method test_disca_20090930 (line 926) | def test_disca_20090930(self):
    method test_dltr_20130504 (line 951) | def test_dltr_20130504(self):
    method test_dtv_20110331 (line 976) | def test_dtv_20110331(self):
    method test_ebay_20100630 (line 1001) | def test_ebay_20100630(self):
    method test_ebay_20130331 (line 1026) | def test_ebay_20130331(self):
    method test_ecl_20120930 (line 1051) | def test_ecl_20120930(self):
    method test_ed_20130930 (line 1076) | def test_ed_20130930(self):
    method test_eqt_20101231 (line 1101) | def test_eqt_20101231(self):
    method test_etr_20121231 (line 1126) | def test_etr_20121231(self):
    method test_exc_20100930 (line 1152) | def test_exc_20100930(self):
    method test_fast_20090630 (line 1177) | def test_fast_20090630(self):
    method test_fast_20090930 (line 1202) | def test_fast_20090930(self):
    method test_fb_20120630 (line 1227) | def test_fb_20120630(self):
    method test_fb_20121231 (line 1252) | def test_fb_20121231(self):
    method test_fll_20121231 (line 1277) | def test_fll_20121231(self):
    method test_flr_20080930 (line 1302) | def test_flr_20080930(self):
    method test_fmc_20090630 (line 1327) | def test_fmc_20090630(self):
    method test_fpl_20100331 (line 1352) | def test_fpl_20100331(self):
    method test_ftr_20110930 (line 1378) | def test_ftr_20110930(self):
    method test_ge_20121231 (line 1403) | def test_ge_20121231(self):
    method test_gis_20121125 (line 1428) | def test_gis_20121125(self):
    method test_gmcr_20110625 (line 1453) | def test_gmcr_20110625(self):
    method test_goog_20090930 (line 1478) | def test_goog_20090930(self):
    method test_goog_20120930 (line 1503) | def test_goog_20120930(self):
    method test_goog_20121231 (line 1528) | def test_goog_20121231(self):
    method test_goog_20130630 (line 1553) | def test_goog_20130630(self):
    method test_goog_20140630 (line 1578) | def test_goog_20140630(self):
    method test_gs_20090626 (line 1603) | def test_gs_20090626(self):
    method test_hon_20120331 (line 1628) | def test_hon_20120331(self):
    method test_hrb_20090731 (line 1653) | def test_hrb_20090731(self):
    method test_hrb_20091031 (line 1678) | def test_hrb_20091031(self):
    method test_hrb_20130731 (line 1703) | def test_hrb_20130731(self):
    method test_ihc_20120331 (line 1728) | def test_ihc_20120331(self):
    method test_intc_20111231 (line 1753) | def test_intc_20111231(self):
    method test_intu_20101031 (line 1778) | def test_intu_20101031(self):
    method test_jnj_20120101 (line 1803) | def test_jnj_20120101(self):
    method test_jnj_20120930 (line 1828) | def test_jnj_20120930(self):
    method test_jnj_20130630 (line 1853) | def test_jnj_20130630(self):
    method test_jpm_20090630 (line 1878) | def test_jpm_20090630(self):
    method test_jpm_20111231 (line 1903) | def test_jpm_20111231(self):
    method test_jpm_20130331 (line 1928) | def test_jpm_20130331(self):
    method test_ko_20100402 (line 1953) | def test_ko_20100402(self):
    method test_ko_20101231 (line 1978) | def test_ko_20101231(self):
    method test_ko_20120928 (line 2003) | def test_ko_20120928(self):
    method test_krft_20120930 (line 2028) | def test_krft_20120930(self):
    method test_l_20100331 (line 2053) | def test_l_20100331(self):
    method test_l_20100930 (line 2078) | def test_l_20100930(self):
    method test_lbtya_20100331 (line 2103) | def test_lbtya_20100331(self):
    method test_lcapa_20110930 (line 2128) | def test_lcapa_20110930(self):
    method test_linta_20120331 (line 2154) | def test_linta_20120331(self):
    method test_lll_20100625 (line 2179) | def test_lll_20100625(self):
    method test_lltc_20110102 (line 2204) | def test_lltc_20110102(self):
    method test_lltc_20111002 (line 2229) | def test_lltc_20111002(self):
    method test_lly_20100930 (line 2254) | def test_lly_20100930(self):
    method test_lmca_20120331 (line 2279) | def test_lmca_20120331(self):
    method test_lnc_20120930 (line 2305) | def test_lnc_20120930(self):
    method test_ltd_20111029 (line 2330) | def test_ltd_20111029(self):
    method test_ltd_20130803 (line 2356) | def test_ltd_20130803(self):
    method test_luv_20110630 (line 2381) | def test_luv_20110630(self):
    method test_mchp_20120630 (line 2406) | def test_mchp_20120630(self):
    method test_mdlz_20130930 (line 2431) | def test_mdlz_20130930(self):
    method test_mmm_20091231 (line 2456) | def test_mmm_20091231(self):
    method test_mmm_20120331 (line 2481) | def test_mmm_20120331(self):
    method test_mmm_20130630 (line 2506) | def test_mmm_20130630(self):
    method test_mnst_20130630 (line 2531) | def test_mnst_20130630(self):
    method test_msft_20110630 (line 2556) | def test_msft_20110630(self):
    method test_msft_20111231 (line 2581) | def test_msft_20111231(self):
    method test_msft_20130331 (line 2606) | def test_msft_20130331(self):
    method test_mu_20121129 (line 2631) | def test_mu_20121129(self):
    method test_mxim_20110326 (line 2656) | def test_mxim_20110326(self):
    method test_nflx_20120930 (line 2681) | def test_nflx_20120930(self):
    method test_nvda_20130127 (line 2706) | def test_nvda_20130127(self):
    method test_nws_20090930 (line 2731) | def test_nws_20090930(self):
    method test_omx_20110924 (line 2757) | def test_omx_20110924(self):
    method test_omx_20111231 (line 2782) | def test_omx_20111231(self):
    method test_omx_20121229 (line 2807) | def test_omx_20121229(self):
    method test_orly_20130331 (line 2832) | def test_orly_20130331(self):
    method test_pay_20110430 (line 2857) | def test_pay_20110430(self):
    method test_pcar_20100331 (line 2882) | def test_pcar_20100331(self):
    method test_pcg_20091231 (line 2907) | def test_pcg_20091231(self):
    method test_plt_20130630 (line 2932) | def test_plt_20130630(self):
    method test_qep_20110630 (line 2957) | def test_qep_20110630(self):
    method test_qep_20120930 (line 2982) | def test_qep_20120930(self):
    method test_regn_20100630 (line 3007) | def test_regn_20100630(self):
    method test_sbac_20110331 (line 3032) | def test_sbac_20110331(self):
    method test_shld_20101030 (line 3057) | def test_shld_20101030(self):
    method test_sial_20101231 (line 3082) | def test_sial_20101231(self):
    method test_siri_20100630 (line 3107) | def test_siri_20100630(self):
    method test_siri_20120331 (line 3132) | def test_siri_20120331(self):
    method test_spex_20130331 (line 3157) | def test_spex_20130331(self):
    method test_strza_20121231 (line 3182) | def test_strza_20121231(self):
    method test_stx_20120928 (line 3207) | def test_stx_20120928(self):
    method test_stx_20121228 (line 3232) | def test_stx_20121228(self):
    method test_symc_20130628 (line 3258) | def test_symc_20130628(self):
    method test_tgt_20130803 (line 3284) | def test_tgt_20130803(self):
    method test_trv_20100331 (line 3309) | def test_trv_20100331(self):
    method test_tsla_20110630 (line 3334) | def test_tsla_20110630(self):
    method test_tsla_20111231 (line 3359) | def test_tsla_20111231(self):
    method test_tsla_20130630 (line 3384) | def test_tsla_20130630(self):
    method test_utmd_20111231 (line 3409) | def test_utmd_20111231(self):
    method test_vel_pe_20130930 (line 3434) | def test_vel_pe_20130930(self):
    method test_via_20090930 (line 3459) | def test_via_20090930(self):
    method test_via_20091231 (line 3484) | def test_via_20091231(self):
    method test_via_20120630 (line 3509) | def test_via_20120630(self):
    method test_vno_20090630 (line 3535) | def test_vno_20090630(self):
    method test_vno_20111231 (line 3560) | def test_vno_20111231(self):
    method test_vrsk_20120930 (line 3585) | def test_vrsk_20120930(self):
    method test_wat_20120929 (line 3610) | def test_wat_20120929(self):
    method test_wec_20130331 (line 3635) | def test_wec_20130331(self):
    method test_wec_20130630 (line 3660) | def test_wec_20130630(self):
    method test_wfm_20120115 (line 3685) | def test_wfm_20120115(self):
    method test_xel_20100331 (line 3710) | def test_xel_20100331(self):
    method test_xel_20101231 (line 3735) | def test_xel_20101231(self):
    method test_xom_20110331 (line 3760) | def test_xom_20110331(self):
    method test_xom_20111231 (line 3785) | def test_xom_20111231(self):
    method test_xom_20130630 (line 3810) | def test_xom_20130630(self):
    method test_xray_20091231 (line 3835) | def test_xray_20091231(self):
    method test_xrx_20091231 (line 3860) | def test_xrx_20091231(self):
    method test_zmh_20090630 (line 3885) | def test_zmh_20090630(self):

FILE: pystock_crawler/tests/test_spiders_edgar.py
  function make_url (line 10) | def make_url(symbol, start_date='', end_date=''):
  function make_link_html (line 16) | def make_link_html(href, text=u'Link'):
  class URLGeneratorTest (line 20) | class URLGeneratorTest(TestCaseBase):
    method test_no_dates (line 22) | def test_no_dates(self):
    method test_with_start_date (line 28) | def test_with_start_date(self):
    method test_with_end_date (line 36) | def test_with_end_date(self):
    method test_with_start_and_end_dates (line 44) | def test_with_start_and_end_dates(self):
  class EdgarSpiderTest (line 53) | class EdgarSpiderTest(TestCaseBase):
    method test_empty_creation (line 55) | def test_empty_creation(self):
    method test_symbol_file (line 59) | def test_symbol_file(self):
    method test_invalid_dates (line 75) | def test_invalid_dates(self):
    method test_symbol_file_and_dates (line 82) | def test_symbol_file_and_dates(self):
    method test_parse_company_filing_page (line 99) | def test_parse_company_filing_page(self):
    method test_parse_quarter_or_annual_page (line 136) | def test_parse_quarter_or_annual_page(self):
    method test_parse_xml_report (line 169) | def test_parse_xml_report(self):

FILE: pystock_crawler/tests/test_spiders_nasdaq.py
  class NasdaqSpiderTest (line 7) | class NasdaqSpiderTest(TestCaseBase):
    method test_parse (line 9) | def test_parse(self):

FILE: pystock_crawler/tests/test_spiders_yahoo.py
  class MakeURLTest (line 10) | class MakeURLTest(TestCaseBase):
    method test_no_dates (line 12) | def test_no_dates(self):
    method test_only_start_date (line 18) | def test_only_start_date(self):
    method test_only_end_date (line 24) | def test_only_end_date(self):
    method test_start_and_end_dates (line 30) | def test_start_and_end_dates(self):
  class YahooSpiderTest (line 37) | class YahooSpiderTest(TestCaseBase):
    method test_empty_creation (line 39) | def test_empty_creation(self):
    method test_inline_symbols (line 43) | def test_inline_symbols(self):
    method test_symbol_file (line 52) | def test_symbol_file(self):
    method test_illegal_dates (line 65) | def test_illegal_dates(self):
    method test_parse (line 72) | def test_parse(self):

FILE: pystock_crawler/tests/test_utils.py
  class UtilsTest (line 8) | class UtilsTest(TestCaseBase):
    method test_check_date_arg (line 10) | def test_check_date_arg(self):
    method test_parse_limit_arg (line 31) | def test_parse_limit_arg(self):
    method test_load_symbols (line 41) | def test_load_symbols(self):
    method test_parse_csv (line 52) | def test_parse_csv(self):

FILE: pystock_crawler/throttle.py
  class PassiveThrottle (line 7) | class PassiveThrottle(object):
    method __init__ (line 16) | def __init__(self, crawler):
    method from_crawler (line 27) | def from_crawler(cls, crawler):
    method _spider_opened (line 30) | def _spider_opened(self, spider):
    method _min_delay (line 37) | def _min_delay(self, spider):
    method _max_delay (line 42) | def _max_delay(self, spider):
    method _retry_http_codes (line 45) | def _retry_http_codes(self):
    method _response_downloaded (line 48) | def _response_downloaded(self, response, request, spider):
    method _get_slot (line 62) | def _get_slot(self, request, spider):
    method _adjust_delay (line 66) | def _adjust_delay(self, slot, response):

FILE: pystock_crawler/utils.py
  function check_date_arg (line 6) | def check_date_arg(value, arg_name=None):
  function parse_limit_arg (line 16) | def parse_limit_arg(value):
  function load_symbols (line 28) | def load_symbols(file_path):
  function parse_csv (line 39) | def parse_csv(file_like):

FILE: setup.py
  function find_version (line 17) | def find_version(*file_paths):
  function read_description (line 31) | def read_description(filename):
  function parse_requirements (line 36) | def parse_requirements(filename):

Condensed preview: 30 files, each showing path, character count, and a content snippet.

[
  {
    "path": ".gitignore",
    "chars": 115,
    "preview": "*.csv\n*.log\n*.pyc\n.coverage\n.scrapy/\n.~*\nbuild/\ndist/\npystock_crawler.egg-info/\npystock_crawler/tests/sample_data/\n"
  },
  {
    "path": ".travis.yml",
    "chars": 232,
    "preview": "language: python\npython:\n  - 2.7\nbranches:\n  only:\n    - master\ninstall:\n  - pip install -r requirements.txt\n  - pip ins"
  },
  {
    "path": "LICENSE",
    "chars": 1083,
    "preview": "The MIT License (MIT)\n\nCopyright (c) 2013 Chang-Hung Liang\n\nPermission is hereby granted, free of charge, to any person "
  },
  {
    "path": "MANIFEST.in",
    "chars": 43,
    "preview": "include README.rst LICENSE requirements.txt"
  },
  {
    "path": "README.rst",
    "chars": 7382,
    "preview": "pystock-crawler\n===============\n\n.. image:: https://badge.fury.io/py/pystock-crawler.png\n    :target: http://badge.fury."
  },
  {
    "path": "bin/pystock-crawler",
    "chars": 7472,
    "preview": "#!/usr/bin/env python\n'''\nUsage:\n  pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR]\n       "
  },
  {
    "path": "pystock_crawler/__init__.py",
    "chars": 22,
    "preview": "__version__ = '0.8.2'\n"
  },
  {
    "path": "pystock_crawler/exporters.py",
    "chars": 1530,
    "preview": "from scrapy.conf import settings\nfrom scrapy.contrib.exporter import BaseItemExporter, CsvItemExporter\n\n\nclass CsvItemEx"
  },
  {
    "path": "pystock_crawler/items.py",
    "chars": 1219,
    "preview": "# Define here the models for your scraped items\n#\n# See documentation in:\n# http://doc.scrapy.org/en/latest/topics/items"
  },
  {
    "path": "pystock_crawler/loaders.py",
    "chars": 25910,
    "preview": "import re\n\nfrom datetime import datetime, timedelta\nfrom scrapy import log\nfrom scrapy.contrib.loader import ItemLoader\n"
  },
  {
    "path": "pystock_crawler/settings.py",
    "chars": 1538,
    "preview": "# Scrapy settings for pystock-crawler project\n#\n# For simplicity, this file contains only the most important settings by"
  },
  {
    "path": "pystock_crawler/spiders/__init__.py",
    "chars": 161,
    "preview": "# This package will contain the spiders of your Scrapy project\n#\n# Please refer to the documentation for information on "
  },
  {
    "path": "pystock_crawler/spiders/edgar.py",
    "chars": 2259,
    "preview": "import os\n\nfrom scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor\nfrom scrapy.contrib.spiders import CrawlSpid"
  },
  {
    "path": "pystock_crawler/spiders/nasdaq.py",
    "chars": 1102,
    "preview": "import cStringIO\nimport re\n\nfrom scrapy.spider import Spider\n\nfrom pystock_crawler.items import SymbolItem\n\n\nRE_SYMBOL ="
  },
  {
    "path": "pystock_crawler/spiders/yahoo.py",
    "chars": 2549,
    "preview": "import cStringIO\nimport os\nimport re\n\nfrom datetime import datetime\nfrom scrapy.spider import Spider\n\nfrom pystock_crawl"
  },
  {
    "path": "pystock_crawler/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "pystock_crawler/tests/base.py",
    "chars": 2522,
    "preview": "import os\nimport unittest\n\n\n# Stores temporary test data\nSAMPLE_DATA_DIR = os.path.join(os.path.abspath(os.path.dirname("
  },
  {
    "path": "pystock_crawler/tests/test_cmdline.py",
    "chars": 6981,
    "preview": "import os\nimport shutil\nimport unittest\n\nimport pystock_crawler\n\nfrom envoy import run\n\n\nTEST_DIR = './test_data'\n\n\n# Sc"
  },
  {
    "path": "pystock_crawler/tests/test_loaders.py",
    "chars": 137461,
    "preview": "import os\nimport requests\nimport urlparse\n\nfrom scrapy.http.response.xml import XmlResponse\n\nfrom pystock_crawler.loader"
  },
  {
    "path": "pystock_crawler/tests/test_spiders_edgar.py",
    "chars": 8621,
    "preview": "import os\nimport tempfile\n\nfrom scrapy.http import HtmlResponse, XmlResponse\n\nfrom pystock_crawler.spiders.edgar import "
  },
  {
    "path": "pystock_crawler/tests/test_spiders_nasdaq.py",
    "chars": 1205,
    "preview": "from scrapy.http import TextResponse\n\nfrom pystock_crawler.spiders.nasdaq import NasdaqSpider\nfrom pystock_crawler.tests"
  },
  {
    "path": "pystock_crawler/tests/test_spiders_yahoo.py",
    "chars": 3717,
    "preview": "import os\nimport tempfile\n\nfrom scrapy.http import TextResponse\n\nfrom pystock_crawler.spiders.yahoo import make_url, Yah"
  },
  {
    "path": "pystock_crawler/tests/test_utils.py",
    "chars": 1935,
    "preview": "import cStringIO\nimport os\n\nfrom pystock_crawler import utils\nfrom pystock_crawler.tests.base import SAMPLE_DATA_DIR, Te"
  },
  {
    "path": "pystock_crawler/throttle.py",
    "chars": 2757,
    "preview": "import logging\n\nfrom scrapy.exceptions import NotConfigured\nfrom scrapy import signals\n\n\nclass PassiveThrottle(object):\n"
  },
  {
    "path": "pystock_crawler/utils.py",
    "chars": 1250,
    "preview": "import csv\n\nfrom datetime import datetime\n\n\ndef check_date_arg(value, arg_name=None):\n    if value:\n        try:\n       "
  },
  {
    "path": "pytest.ini",
    "chars": 100,
    "preview": "[pytest]\naddopts = --cov-report term-missing --cov pystock_crawler --cov bin pystock_crawler/tests/\n"
  },
  {
    "path": "requirements-test.txt",
    "chars": 33,
    "preview": "envoy\npytest\npytest-cov\nrequests\n"
  },
  {
    "path": "requirements.txt",
    "chars": 68,
    "preview": "docopt==0.6.2\nleveldb==0.193\nScrapy==0.24.4\nservice-identity==1.0.0\n"
  },
  {
    "path": "scrapy.cfg",
    "chars": 272,
    "preview": "# Automatically created by: scrapy startproject\n#\n# For more information about the [deploy] section see:\n# http://doc.sc"
  },
  {
    "path": "setup.py",
    "chars": 2219,
    "preview": "try:\n    from setuptools import setup\nexcept ImportError:\n    from distutils.core import setup\n\nimport codecs\nimport os\n"
  }
]
