[
  {
    "path": ".gitignore",
    "content": "*.csv\n*.log\n*.pyc\n.coverage\n.scrapy/\n.~*\nbuild/\ndist/\npystock_crawler.egg-info/\npystock_crawler/tests/sample_data/\n"
  },
  {
    "path": ".travis.yml",
    "content": "language: python\npython:\n  - 2.7\nbranches:\n  only:\n    - master\ninstall:\n  - pip install -r requirements.txt\n  - pip install -r requirements-test.txt\nscript:\n  - py.test\nafter_success:\n  - pip install python-coveralls\n  - coveralls\n"
  },
  {
    "path": "LICENSE",
    "content": "The MIT License (MIT)\n\nCopyright (c) 2013 Chang-Hung Liang\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of\nthis software and associated documentation files (the \"Software\"), to deal in\nthe Software without restriction, including without limitation the rights to\nuse, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of\nthe Software, and to permit persons to whom the Software is furnished to do so,\nsubject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS\nFOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR\nCOPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER\nIN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN\nCONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n"
  },
  {
    "path": "MANIFEST.in",
    "content": "include README.rst LICENSE requirements.txt"
  },
  {
    "path": "README.rst",
    "content": "pystock-crawler\n===============\n\n.. image:: https://badge.fury.io/py/pystock-crawler.png\n    :target: http://badge.fury.io/py/pystock-crawler\n\n.. image:: https://travis-ci.org/eliangcs/pystock-crawler.png?branch=master\n    :target: https://travis-ci.org/eliangcs/pystock-crawler\n\n.. image:: https://coveralls.io/repos/eliangcs/pystock-crawler/badge.png?branch=master\n    :target: https://coveralls.io/r/eliangcs/pystock-crawler\n\n``pystock-crawler`` is a utility for crawling historical data of US stocks,\nincluding:\n\n* Ticker symbols listed in NYSE, NASDAQ or AMEX from `NASDAQ.com`_\n* Daily prices from `Yahoo Finance`_\n* Fundamentals from 10-Q and 10-K filings (XBRL) on `SEC EDGAR`_\n\n\nExample Output\n--------------\n\nNYSE ticker symbols::\n\n    DDD   3D Systems Corporation\n    MMM   3M Company\n    WBAI  500.com Limited\n    ...\n\nApple's daily prices::\n\n    symbol,date,open,high,low,close,volume,adj_close\n    AAPL,2014-04-28,572.80,595.75,572.55,594.09,23890900,594.09\n    AAPL,2014-04-25,564.53,571.99,563.96,571.94,13922800,571.94\n    AAPL,2014-04-24,568.21,570.00,560.73,567.77,27092600,567.77\n    ...\n\nGoogle's fundamentals::\n\n    symbol,end_date,amend,period_focus,fiscal_year,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin\n    GOOG,2009-06-30,False,Q2,2009,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0\n    GOOG,2009-09-30,False,Q3,2009,10-Q,5944851000.0,2073718000.0,1638975000.0,5.18,5.13,0.0,37702845000.0,26353544000.0,2321774000.0,12087115000.0,33721753000.0,6584667000.0,-3245963000.0,74851000.0\n    GOOG,2009-12-31,False,FY,2009,10-K,23650563000.0,8312186000.0,6520448000.0,20.62,20.41,0.0,40496778000.0,29166958000.0,2747467000.0,10197588000.0,36004224000.0,9316198000.0,-8019205000.0,233412000.0\n    
...\n\n\nInstallation\n------------\n\nPrerequisites:\n\n* Python 2.7\n\n``pystock-crawler`` is based on Scrapy_, so you will also need to install\nprerequisites such as lxml_ and libffi_ for Scrapy and its dependencies. On\nUbuntu, for example, you can install them like this::\n\n    sudo apt-get update\n    sudo apt-get install -y gcc python-dev libffi-dev libssl-dev libxml2-dev libxslt1-dev build-essential\n\nSee `Scrapy's installation guide`_ for more details.\n\nAfter installing prerequisites, you can then install ``pystock-crawler`` with\n``pip``::\n\n    (sudo) pip install pystock-crawler\n\n\nQuickstart\n----------\n\n**Example 1.** Fetch Google's and Yahoo's daily prices ordered by date::\n\n    pystock-crawler prices GOOG,YHOO -o out.csv --sort\n\n**Example 2.** Fetch daily prices of all companies listed in\n``./symbols.txt``::\n\n    pystock-crawler prices ./symbols.txt -o out.csv\n\n**Example 3.** Fetch Facebook's fundamentals during 2013::\n\n    pystock-crawler reports FB -o out.csv -s 20130101 -e 20131231\n\n**Example 4.** Fetch fundamentals of all companies in ``./nyse.txt`` and direct\nthe log to ``./crawling.log``::\n\n    pystock-crawler reports ./nyse.txt -o out.csv -l ./crawling.log\n\n**Example 5.** Fetch all ticker symbols in NYSE, NASDAQ and AMEX::\n\n    pystock-crawler symbols NYSE,NASDAQ,AMEX -o out.txt\n\n\nUsage\n-----\n\nType ``pystock-crawler -h`` to see command help::\n\n    Usage:\n      pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR]\n                                          [--sort]\n      pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]\n                                       [-l LOGFILE] [-w WORKING_DIR] [--sort]\n      pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]\n                                        [-l LOGFILE] [-w WORKING_DIR]\n                                        [-b BATCH_SIZE] [--sort]\n      pystock-crawler (-h | --help)\n      
pystock-crawler (-v | --version)\n\n    Options:\n      -h --help       Show this screen\n      -o OUTPUT       Output file\n      -s YYYYMMDD     Start date [default: ]\n      -e YYYYMMDD     End date [default: ]\n      -l LOGFILE      Log output [default: ]\n      -w WORKING_DIR  Working directory [default: .]\n      -b BATCH_SIZE   Batch size [default: 500]\n      --sort          Sort the result\n\nThere are three commands available:\n\n* ``pystock-crawler symbols`` grabs ticker symbol lists\n* ``pystock-crawler prices`` grabs daily prices\n* ``pystock-crawler reports`` grabs fundamentals\n\n``<exchanges>`` is a comma-separated string that specifies the stock exchanges\nyou want to include. Currently, NYSE, NASDAQ and AMEX are supported.\n\nThe output file of ``pystock-crawler symbols`` can be used as the\n``<symbols>`` argument of the ``pystock-crawler prices`` and\n``pystock-crawler reports`` commands.\n\n``<symbols>`` can be an inline string separated with commas or a text file\nthat lists symbols line by line. For example, the inline string can be\nsomething like ``AAPL,GOOG,FB``. The text file may look like this::\n\n    # This line is a comment\n    AAPL    Put anything you want here\n    GOOG    Since the text here is ignored\n    FB\n\nUse ``-o`` to specify the output file. For the ``pystock-crawler symbols``\ncommand, the output format is a simple text file. For\n``pystock-crawler prices`` and ``pystock-crawler reports`` the output format\nis CSV.\n\n``-l`` specifies where the crawling logs go. If not specified, the logs go to\nstdout.\n\nBy default, the crawler uses the current directory as the working directory.\nIf you don't want to use the current directory, specify another one with the\n``-w`` option. The crawler keeps an HTTP cache in a directory named ``.scrapy``\nunder the working directory. The cache saves time by avoiding repeated\ndownloads of the same web pages. However, the cache can grow quite large. 
If you don't need it,\njust delete the ``.scrapy`` directory after you have finished crawling.\n\nThe ``-b`` option is only available for the ``pystock-crawler reports``\ncommand. It allows you to split a large symbol list into smaller batches. This\nis a workaround for an unresolved bug (#2). Normally you don't have to specify\nthis option. The default value (500) works fine.\n\nThe rows in the output file are in an arbitrary order by default. Use the\n``--sort`` option to sort them by symbol and date. If you have a large output\nfile, avoid ``--sort`` because it is slow and uses a lot of memory.\n\n\nDeveloper Guide\n---------------\n\nInstalling Dependencies\n~~~~~~~~~~~~~~~~~~~~~~~\n::\n\n    pip install -r requirements.txt\n\n\nRunning Tests\n~~~~~~~~~~~~~\n\nInstall test requirements::\n\n    pip install -r requirements-test.txt\n\nThen run the tests::\n\n    py.test\n\nThis will download the test data (a lot of XML/XBRL files) from\n`SEC EDGAR`_ on the fly, so it will take some time and disk space. The test\ndata is saved to the ``pystock_crawler/tests/sample_data`` directory and is\nreused the next time you run the tests. If you don't need it, just delete\nthe ``sample_data`` directory.\n\n\n.. _libffi: https://sourceware.org/libffi/\n.. _lxml: http://lxml.de/\n.. _NASDAQ.com: http://www.nasdaq.com/\n.. _Scrapy: http://scrapy.org/\n.. _Scrapy's installation guide: http://doc.scrapy.org/en/latest/intro/install.html\n.. _SEC EDGAR: http://www.sec.gov/edgar/searchedgar/companysearch.html\n.. _virtualenv: http://www.virtualenv.org/\n.. _virtualenvwrapper: http://virtualenvwrapper.readthedocs.org/\n.. _Yahoo Finance: http://finance.yahoo.com/\n"
  },
  {
    "path": "bin/pystock-crawler",
    "content": "#!/usr/bin/env python\n'''\nUsage:\n  pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR]\n                                      [--sort]\n  pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]\n                                   [-l LOGFILE] [-w WORKING_DIR] [--sort]\n  pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]\n                                    [-l LOGFILE] [-w WORKING_DIR]\n                                    [-b BATCH_SIZE] [--sort]\n  pystock-crawler (-h | --help)\n  pystock-crawler (-v | --version)\n\nOptions:\n  -h --help       Show this screen\n  -o OUTPUT       Output file\n  -s YYYYMMDD     Start date [default: ]\n  -e YYYYMMDD     End date [default: ]\n  -l LOGFILE      Log output [default: ]\n  -w WORKING_DIR  Working directory [default: .]\n  -b BATCH_SIZE   Batch size [default: 500]\n  --sort          Sort the result\n\n'''\nimport codecs\nimport math\nimport os\nimport sys\nimport uuid\n\nfrom contextlib import contextmanager\nfrom docopt import docopt\nfrom scrapy import log\n\ntry:\n    import pystock_crawler\nexcept ImportError:\n    # For development environment\n    sys.path.append(os.getcwd())\n    import pystock_crawler\n\n\ndef random_string(length=5):\n    return uuid.uuid4().get_hex()[0:5]\n\n\n@contextmanager\ndef tmp_scrapy_cfg():\n    content = '''# pystock_crawler scrapy.cfg\n[settings]\ndefault = pystock_crawler.settings\n\n[deploy]\n#url = http://localhost:6800/\nproject = pystock_crawler\n'''\n    filename = os.path.abspath('./scrapy.cfg')\n    filename_bak = os.path.abspath('./scrapy-%s.cfg' % random_string())\n    if os.path.exists(filename):\n        log.msg(u'Renaming %s -> %s' % (filename, filename_bak))\n        os.rename(filename, filename_bak)\n    assert not os.path.exists(filename)\n    log.msg(u'Creating temporary config: %s' % filename)\n    with open(filename, 'w') as f:\n        f.write(content)\n\n    yield\n\n    if 
os.path.exists(filename):\n        log.msg(u'Deleting %s' % filename)\n        os.remove(filename)\n    if os.path.exists(filename_bak):\n        log.msg(u'Renaming %s -> %s' % (filename_bak, filename))\n        os.rename(filename_bak, filename)\n\n\ndef run_scrapy_command(cmd):\n    log.msg('Command: %s' % cmd)\n    with tmp_scrapy_cfg():\n        os.system(cmd)\n\n\ndef count_symbols(symbols):\n    if os.path.exists(symbols):\n        # If `symbols` is a file\n        with open(symbols) as f:\n            count = 0\n            for line in f:\n                line = line.rstrip()\n                if line and not line.startswith('#'):\n                    count += 1\n        return count\n\n    # If `symbols` is a comma-separated string\n    return len(symbols.split(','))\n\n\ndef merge_files(target, sources, ignore_header=False):\n    log.msg(u'Merging files to %s' % target)\n    with codecs.open(target, 'w', 'utf-8') as out:\n        for i, source in enumerate(sources):\n            with codecs.open(source, 'r', 'utf-8') as f:\n                if ignore_header and i > 0:\n                    try:\n                        f.next()  # Ignore CSV header\n                    except StopIteration:\n                        # Empty file: skip it but keep merging the remaining\n                        # sources (a `break` here would drop later batches)\n                        continue\n                out.write(f.read())\n\n    # Delete source files\n    for filename in sources:\n        log.msg(u'Deleting %s' % filename)\n        os.remove(filename)\n\n\ndef crawl_symbols(exchanges, output, log_file):\n    command = 'scrapy crawl nasdaq -a exchanges=\"%s\" -t symbollist' % exchanges\n\n    if output:\n        command += ' -o \"%s\"' % output\n    if log_file:\n        command += ' -s LOG_FILE=\"%s\"' % log_file\n\n    run_scrapy_command(command)\n\n\ndef crawl(spider, symbols, start_date, end_date, output, log_file, batch_size):\n    command = 'scrapy crawl %s -a symbols=\"%s\" -t csv' % (spider, symbols)\n\n    if start_date:\n        command += ' -a startdate=%s' % start_date\n    if end_date:\n        
command += ' -a enddate=%s' % end_date\n    if log_file:\n        command += ' -s LOG_FILE=\"%s\"' % log_file\n\n    if spider == 'edgar':\n        # When crawling edgar filings, run the scrapy command batch by batch to\n        # work around issue #2\n        num_symbols = count_symbols(symbols)\n        num_batches = int(math.ceil(num_symbols / float(batch_size)))\n\n        # Store sub-files so we can merge them later\n        output_files = []\n\n        for i in xrange(num_batches):\n            start = i * batch_size\n            batch_cmd = command + ' -a limit=%d,%d' % (start, batch_size)\n            if output:\n                filename = '%s.%d' % (output, i + 1)\n                batch_cmd += ' -o \"%s\"' % filename\n                output_files.append(filename)\n\n            run_scrapy_command(batch_cmd)\n\n        merge_files(output, output_files, ignore_header=True)\n    else:\n        if output:\n            command += ' -o \"%s\"' % output\n        run_scrapy_command(command)\n\n\ndef sort_symbols(filename):\n    log.msg(u'Sorting: %s' % filename)\n\n    with codecs.open(filename, 'r', 'utf-8') as f:\n        lines = [line for line in f]\n\n    lines = sorted(lines)\n\n    with codecs.open(filename, 'w', 'utf-8') as f:\n        f.writelines(lines)\n\n    log.msg(u'Sorted: %s' % filename)\n\n\ndef sort_csv(filename):\n    log.msg(u'Sorting: %s' % filename)\n\n    with codecs.open(filename, 'r', 'utf-8') as f:\n        try:\n            headers = f.next()\n        except StopIteration:\n            log.msg(u'No need to sort empty file: %s' % filename)\n            return\n        lines = [line for line in f]\n\n    def line_cmp(line1, line2):\n        a = line1.split(',')\n        b = line2.split(',')\n        # Compare the shared fields in order; fall back to the field count so\n        # rows with identical shared fields never index past the shorter row\n        for x, y in zip(a, b):\n            result = cmp(x, y)\n            if result:\n                return result\n        return cmp(len(a), len(b))\n\n    lines = sorted(lines, cmp=line_cmp)\n\n    with 
codecs.open(filename, 'w', 'utf-8') as f:\n        f.write(headers)\n        f.writelines(lines)\n\n    log.msg(u'Sorted: %s' % filename)\n\n\ndef print_version():\n    print 'pystock-crawler %s' % pystock_crawler.__version__\n\n\ndef main():\n    args = docopt(__doc__)\n\n    symbols = args.get('<symbols>')\n    start_date = args.get('-s')\n    end_date = args.get('-e')\n    output = args.get('-o')\n    log_file = args.get('-l')\n    batch_size = args.get('-b')\n    sorting = args.get('--sort')\n    working_dir = args.get('-w')\n\n    if args['prices']:\n        spider = 'yahoo'\n    elif args['reports']:\n        spider = 'edgar'\n    else:\n        spider = None\n\n    if symbols and os.path.exists(symbols):\n        symbols = os.path.abspath(symbols)\n    if output:\n        output = os.path.abspath(output)\n    if log_file:\n        log_file = os.path.abspath(log_file)\n\n    try:\n        batch_size = int(batch_size)\n        if batch_size <= 0:\n            raise ValueError\n    except ValueError:\n        raise ValueError(\"BATCH_SIZE must be a positive integer, input is '%s'\" % batch_size)\n\n    try:\n        os.chdir(working_dir)\n    except OSError as err:\n        sys.stderr.write('%s\\n' % err)\n        return\n\n    if spider:\n        log.start(logfile=log_file)\n        crawl(spider, symbols, start_date, end_date, output, log_file, batch_size)\n        if sorting and output:\n            sort_csv(output)\n    elif args['symbols']:\n        log.start(logfile=log_file)\n        exchanges = args.get('<exchanges>')\n        crawl_symbols(exchanges, output, log_file)\n        if sorting and output:\n            sort_symbols(output)\n    elif args['-v'] or args['--version']:\n        print_version()\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "pystock_crawler/__init__.py",
    "content": "__version__ = '0.8.2'\n"
  },
  {
    "path": "pystock_crawler/exporters.py",
    "content": "from scrapy.conf import settings\nfrom scrapy.contrib.exporter import BaseItemExporter, CsvItemExporter\n\n\nclass CsvItemExporter2(CsvItemExporter):\n    '''\n    The standard CsvItemExporter class does not pass the kwargs through to the\n    CSV writer, resulting in EXPORT_FIELDS and EXPORT_ENCODING being ignored\n    (EXPORT_EMPTY is not used by CSV).\n\n    http://stackoverflow.com/questions/6943778/python-scrapy-how-to-get-csvitemexporter-to-write-columns-in-a-specific-order\n\n    '''\n    def __init__(self, *args, **kwargs):\n        kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None\n        kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')\n\n        super(CsvItemExporter2, self).__init__(*args, **kwargs)\n\n    def _write_headers_and_set_fields_to_export(self, item):\n        # HACK: Override this private method to filter fields that are in\n        # fields_to_export but not in item\n        if self.include_headers_line:\n            item_fields = item.fields.keys()\n            if self.fields_to_export:\n                self.fields_to_export = filter(lambda a: a in item_fields, self.fields_to_export)\n            else:\n                self.fields_to_export = item_fields\n            self.csv_writer.writerow(self.fields_to_export)\n\n\nclass SymbolListExporter(BaseItemExporter):\n\n    def __init__(self, file, **kwargs):\n        self._configure(kwargs, dont_fail=True)\n        self.file = file\n\n    def export_item(self, item):\n        self.file.write('%s\\t%s\\n' % (item['symbol'], item['name']))\n"
  },
  {
    "path": "pystock_crawler/items.py",
    "content": "# Define here the models for your scraped items\n#\n# See documentation in:\n# http://doc.scrapy.org/en/latest/topics/items.html\n\nfrom scrapy.item import Item, Field\n\n\nclass ReportItem(Item):\n    # Trading symbol\n    symbol = Field()\n\n    # If this doc is an amendment to previously filed doc\n    amend = Field()\n\n    # Quarterly (10-Q) or annual (10-K) report\n    doc_type = Field()\n\n    # Q1, Q2, Q3, or FY for annual report\n    period_focus = Field()\n\n    fiscal_year = Field()\n    end_date = Field()\n\n    revenues = Field()\n    op_income = Field()\n    net_income = Field()\n\n    eps_basic = Field()\n    eps_diluted = Field()\n\n    dividend = Field()\n\n    # Balance sheet stuffs\n    assets = Field()\n    cur_assets = Field()\n    cur_liab = Field()\n    equity = Field()\n    cash = Field()\n\n    # Cash flow from operating, investing, and financing\n    cash_flow_op = Field()\n    cash_flow_inv = Field()\n    cash_flow_fin = Field()\n\n\nclass PriceItem(Item):\n    # Trading symbol\n    symbol = Field()\n\n    # YYYY-MM-DD\n    date = Field()\n\n    open = Field()\n    close = Field()\n    high = Field()\n    low = Field()\n    adj_close = Field()\n    volume = Field()\n\n\nclass SymbolItem(Item):\n    symbol = Field()\n    name = Field()\n"
  },
  {
    "path": "pystock_crawler/loaders.py",
    "content": "import re\n\nfrom datetime import datetime, timedelta\nfrom scrapy import log\nfrom scrapy.contrib.loader import ItemLoader\nfrom scrapy.contrib.loader.processor import Compose, MapCompose, TakeFirst\nfrom scrapy.utils.misc import arg_to_iter\nfrom scrapy.utils.python import flatten\n\nfrom pystock_crawler.items import ReportItem\n\n\nDATE_FORMAT = '%Y-%m-%d'\n\nMAX_PER_SHARE_VALUE = 1000.0\n\n# If number of characters of response body exceeds this value,\n# remove some useless text defined by RE_XML_GARBAGE to reduce memory usage\nTHRESHOLD_TO_CLEAN = 20000000\n\n# Used to get rid of \"<tag>LONG STRING...</tag>\"\nRE_XML_GARBAGE = re.compile(r'>([^<]{100,})<')\n\n\nclass IntermediateValue(object):\n    '''\n    Intermediate data that serves as output of input processors, i.e., input\n    of output processors. \"Intermediate\" is shorten as \"imd\" in later naming.\n\n    '''\n    def __init__(self, local_name, value, text, context, node=None, start_date=None,\n                 end_date=None, instant=None):\n        self.local_name = local_name\n        self.value = value\n        self.text = text\n        self.context = context\n        self.node = node\n        self.start_date = start_date\n        self.end_date = end_date\n        self.instant = instant\n\n    def __cmp__(self, other):\n        if self.value < other.value:\n            return -1\n        elif self.value > other.value:\n            return 1\n        return 0\n\n    def __repr__(self):\n        context_id = None\n        if self.context:\n            context_id = self.context.xpath('@id')[0].extract()\n        return '(%s, %s, %s)' % (self.local_name, self.value, context_id)\n\n    def is_member(self):\n        return is_member(self.context)\n\n\nclass ExtractText(object):\n\n    def __call__(self, value):\n        if hasattr(value, 'select'):\n            try:\n                return value.xpath('./text()')[0].extract()\n            except IndexError:\n                return ''\n  
      return unicode(value)\n\n\nclass MatchEndDate(object):\n\n    def __init__(self, data_type=str, ignore_date_range=False):\n        self.data_type = data_type\n        self.ignore_date_range = ignore_date_range\n\n    def __call__(self, value, loader_context):\n        if not hasattr(value, 'select'):\n            return IntermediateValue('', 0.0, '0', None)\n\n        doc_end_date_str = loader_context['end_date']\n        doc_type = loader_context['doc_type']\n        selector = loader_context['selector']\n\n        context_id = value.xpath('@contextRef')[0].extract()\n        try:\n            context = selector.xpath('//*[@id=\"%s\"]' % context_id)[0]\n        except IndexError:\n            try:\n                url = loader_context['response'].url\n            except KeyError:\n                url = None\n            log.msg(u'Cannot find context: %s in %s' % (context_id, url), log.WARNING)\n            return None\n\n        date = instant = start_date = end_date = None\n        try:\n            instant = context.xpath('.//*[local-name()=\"instant\"]/text()')[0].extract().strip()\n        except (IndexError, ValueError):\n            try:\n                end_date_str = context.xpath('.//*[local-name()=\"endDate\"]/text()')[0].extract().strip()\n                end_date = datetime.strptime(end_date_str, DATE_FORMAT)\n\n                start_date_str = context.xpath('.//*[local-name()=\"startDate\"]/text()')[0].extract().strip()\n                start_date = datetime.strptime(start_date_str, DATE_FORMAT)\n\n                if self.ignore_date_range or date_range_matches_doc_type(doc_type, start_date, end_date):\n                    date = end_date\n            except (IndexError, ValueError):\n                pass\n        else:\n            try:\n                instant = datetime.strptime(instant, DATE_FORMAT)\n            except ValueError:\n                pass\n            else:\n                date = instant\n\n        if date:\n            
doc_end_date = datetime.strptime(doc_end_date_str, DATE_FORMAT)\n            delta_days = (doc_end_date - date).days\n            if abs(delta_days) < 30:\n                try:\n                    text = value.xpath('./text()')[0].extract()\n                    val = self.data_type(text)\n                except (IndexError, ValueError):\n                    pass\n                else:\n                    local_name = value.xpath('local-name()')[0].extract()\n                    return IntermediateValue(\n                        local_name, val, text, context, value,\n                        start_date=start_date, end_date=end_date, instant=instant)\n\n        return None\n\n\nclass ImdSumMembersOr(object):\n\n    def __init__(self, second_func=None):\n        self.second_func = second_func\n\n    def __call__(self, imd_values):\n        members = []\n        non_members = []\n        for imd_value in imd_values:\n            if imd_value.is_member():\n                members.append(imd_value)\n            else:\n                non_members.append(imd_value)\n\n        if members and len(members) == len(imd_values):\n            return imd_sum(members)\n\n        if imd_values:\n            return self.second_func(non_members)\n        return None\n\n\ndef date_range_matches_doc_type(doc_type, start_date, end_date):\n    delta_days = (end_date - start_date).days\n    return ((doc_type == '10-Q' and delta_days < 120 and delta_days > 60) or\n            (doc_type == '10-K' and delta_days < 380 and delta_days > 350))\n\n\ndef get_amend(values):\n    if values:\n        return values[0]\n    return False\n\n\ndef get_symbol(values):\n    if values:\n        symbols = map(lambda s: s.strip(), values[0].split(','))\n        return '/'.join(symbols)\n    return False\n\n\ndef imd_max(imd_values):\n    if imd_values:\n        imd_value = max(imd_values)\n        return imd_value.value\n    return None\n\n\ndef imd_min(imd_values):\n    if imd_values:\n        imd_value = 
min(imd_values)\n        return imd_value.value\n    return None\n\n\ndef imd_sum(imd_values):\n    return sum([v.value for v in imd_values])\n\n\ndef imd_get_revenues(imd_values):\n    interest_elems = filter(lambda v: 'interest' in v.local_name.lower(), imd_values)\n    if len(interest_elems) == len(imd_values):\n        # HACK: An exceptional case for BBT\n        # Revenues = InterestIncome + NoninterestIncome\n        return imd_sum(imd_values)\n\n    return imd_max(imd_values)\n\n\ndef imd_get_net_income(imd_values):\n    return imd_min(imd_values)\n\n\ndef imd_get_op_income(imd_values):\n    imd_values = filter(lambda v: memberness(v.context) < 2, imd_values)\n    return imd_min(imd_values)\n\n\ndef imd_get_cash_flow(imd_values, loader_context):\n    if len(imd_values) == 1:\n        return imd_values[0].value\n\n    doc_type = loader_context['doc_type']\n\n    within_date_range = []\n    for imd_value in imd_values:\n        if imd_value.start_date and imd_value.end_date:\n            if date_range_matches_doc_type(doc_type, imd_value.start_date, imd_value.end_date):\n                within_date_range.append(imd_value)\n\n    if within_date_range:\n        return imd_max(within_date_range)\n\n    return imd_max(imd_values)\n\n\ndef imd_get_per_share_value(imd_values):\n    if not imd_values:\n        return None\n\n    v = imd_values[0]\n    value = v.value\n    if abs(value) > MAX_PER_SHARE_VALUE:\n        try:\n            decimals = int(v.node.xpath('@decimals')[0].extract())\n        except (AttributeError, IndexError, ValueError):\n            return None\n        else:\n            # HACK: some of LTD's reports have unreasonablely large per share value, such as\n            # 320000 EPS (and it should be 0.32), so use decimals attribute to scale it down,\n            # note that this is NOT a correct way to interpret decimals attribute\n            value *= pow(10, decimals - 2)\n    return value if abs(value) <= MAX_PER_SHARE_VALUE else None\n\n\ndef 
imd_get_equity(imd_values):\n    if not imd_values:\n        return None\n\n    values = filter(lambda v: v.local_name == 'StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest', imd_values)\n    if values:\n        return values[0].value\n\n    values = filter(lambda v: v.local_name == 'StockholdersEquity', imd_values)\n    if values:\n        return values[0].value\n\n    return imd_values[0].value\n\n\ndef imd_filter_member(imd_values):\n    if imd_values:\n        with_memberness = [(v, memberness(v.context)) for v in imd_values]\n        with_memberness = sorted(with_memberness, cmp=lambda a, b: a[1] - b[1])\n\n        m0 = with_memberness[0][1]\n        non_members = []\n\n        for v in with_memberness:\n            if v[1] == m0:\n                non_members.append(v[0])\n\n        return non_members\n\n    return imd_values\n\n\ndef imd_mult(imd_values):\n    for v in imd_values:\n        try:\n            node_id = v.node.xpath('@id')[0].extract().lower()\n        except (AttributeError, IndexError):\n            pass\n        else:\n            # HACK: some of LUV's reports have unreasonablely small numbers such as\n            # 4136 in revenues which should be 4136 millions, this hack uses id attribute\n            # to determine if it should be scaled up\n            if 'inmillions' in node_id and abs(v.value) < 100000.0:\n                v.value *= 1000000.0\n            elif 'inthousands' in node_id and abs(v.value) < 100000000.0:\n                v.value *= 1000.0\n    return imd_values\n\n\ndef memberness(context):\n    '''The likelihood that the context is a \"member\".'''\n    if context:\n        texts = context.xpath('.//*[local-name()=\"explicitMember\"]/text()').extract()\n        text = str(texts).lower()\n\n        if len(texts) > 1:\n            return 2\n        elif 'country' in text:\n            return 2\n        elif 'member' not in text:\n            return 0\n        elif 'successor' in text:\n            # 
'SuccessorMember' is a rare case that shouldn't be treated as member\n            return 1\n        elif 'parent' in text:\n            return 2\n    return 3\n\n\ndef is_member(context):\n    if context:\n        texts = context.xpath('.//*[local-name()=\"explicitMember\"]/text()').extract()\n        text = str(texts).lower()\n\n        # 'SuccessorMember' is a rare case that shouldn't be treated as member\n        if 'member' not in text or 'successor' in text or 'parent' in text:\n            return False\n    return True\n\n\ndef str_to_bool(value):\n    if hasattr(value, 'lower'):\n        value = value.lower()\n        return bool(value) and value != 'false' and value != '0'\n    return bool(value)\n\n\ndef find_namespace(xxs, name):\n    name_re = name.replace('-', '\\-')\n    if not name_re.startswith('xmlns'):\n        name_re = 'xmlns:' + name_re\n    return xxs.re('%s=\\\"([^\\\"]+)\\\"' % name_re)[0]\n\n\ndef register_namespace(xxs, name):\n    ns = find_namespace(xxs, name)\n    xxs.register_namespace(name, ns)\n\n\ndef register_namespaces(xxs):\n    names = ('xmlns', 'xbrli', 'dei', 'us-gaap')\n    for name in names:\n        try:\n            register_namespace(xxs, name)\n        except IndexError:\n            pass\n\n\nclass XmlXPathItemLoader(ItemLoader):\n\n    def __init__(self, *args, **kwargs):\n        super(XmlXPathItemLoader, self).__init__(*args, **kwargs)\n        register_namespaces(self.selector)\n\n    def add_xpath(self, field_name, xpath, *processors, **kw):\n        values = self._get_values(xpath, **kw)\n        self.add_value(field_name, values, *processors, **kw)\n        return len(self._values[field_name])\n\n    def add_xpaths(self, name, paths):\n        for path in paths:\n            match_count = self.add_xpath(name, path)\n            if match_count > 0:\n                return match_count\n\n        return 0\n\n    def _get_values(self, xpaths, **kw):\n        xpaths = arg_to_iter(xpaths)\n        return 
flatten([self.selector.xpath(xpath) for xpath in xpaths])\n\n\nclass ReportItemLoader(XmlXPathItemLoader):\n\n    default_item_class = ReportItem\n    default_output_processor = TakeFirst()\n\n    symbol_in = MapCompose(ExtractText(), unicode.upper)\n    symbol_out = Compose(get_symbol)\n\n    amend_in = MapCompose(ExtractText(), str_to_bool)\n    amend_out = Compose(get_amend)\n\n    period_focus_in = MapCompose(ExtractText(), unicode.upper)\n    period_focus_out = TakeFirst()\n\n    revenues_in = MapCompose(MatchEndDate(float))\n    revenues_out = Compose(imd_filter_member, imd_mult, ImdSumMembersOr(imd_get_revenues))\n\n    net_income_in = MapCompose(MatchEndDate(float))\n    net_income_out = Compose(imd_filter_member, imd_mult, imd_get_net_income)\n\n    op_income_in = MapCompose(MatchEndDate(float))\n    op_income_out = Compose(imd_filter_member, imd_mult, imd_get_op_income)\n\n    eps_basic_in = MapCompose(MatchEndDate(float))\n    eps_basic_out = Compose(ImdSumMembersOr(imd_get_per_share_value), lambda x: x if x < MAX_PER_SHARE_VALUE else None)\n\n    eps_diluted_in = MapCompose(MatchEndDate(float))\n    eps_diluted_out = Compose(ImdSumMembersOr(imd_get_per_share_value), lambda x: x if x < MAX_PER_SHARE_VALUE else None)\n\n    dividend_in = MapCompose(MatchEndDate(float))\n    dividend_out = Compose(imd_get_per_share_value, lambda x: x if x < MAX_PER_SHARE_VALUE and x > 0.0 else 0.0)\n\n    assets_in = MapCompose(MatchEndDate(float))\n    assets_out = Compose(imd_filter_member, imd_mult, imd_max)\n\n    cur_assets_in = MapCompose(MatchEndDate(float))\n    cur_assets_out = Compose(imd_filter_member, imd_mult, imd_max)\n\n    cur_liab_in = MapCompose(MatchEndDate(float))\n    cur_liab_out = Compose(imd_filter_member, imd_mult, imd_max)\n\n    equity_in = MapCompose(MatchEndDate(float))\n    equity_out = Compose(imd_filter_member, imd_mult, imd_get_equity)\n\n    cash_in = MapCompose(MatchEndDate(float))\n    cash_out = Compose(imd_filter_member, imd_mult, 
imd_max)\n\n    cash_flow_op_in = MapCompose(MatchEndDate(float, True))\n    cash_flow_op_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow)\n\n    cash_flow_inv_in = MapCompose(MatchEndDate(float, True))\n    cash_flow_inv_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow)\n\n    cash_flow_fin_in = MapCompose(MatchEndDate(float, True))\n    cash_flow_fin_out = Compose(imd_filter_member, imd_mult, imd_get_cash_flow)\n\n    def __init__(self, *args, **kwargs):\n        response = kwargs.get('response')\n        if len(response.body) > THRESHOLD_TO_CLEAN:\n            # Remove some useless text to reduce memory usage\n            body, __ = RE_XML_GARBAGE.subn(lambda m: '><', response.body)\n            response = response.replace(body=body)\n            kwargs['response'] = response\n\n        super(ReportItemLoader, self).__init__(*args, **kwargs)\n\n        symbol = self._get_symbol()\n        end_date = self._get_doc_end_date()\n        fiscal_year = self._get_doc_fiscal_year()\n        doc_type = self._get_doc_type()\n\n        # ignore document that is not 10-Q or 10-K\n        if not (doc_type and doc_type.split('/')[0] in ('10-Q', '10-K')):\n            return\n\n        # some documents set their amendment flag in DocumentType, e.g., '10-Q/A',\n        # instead of setting it in AmendmentFlag\n        amend = None\n        if doc_type.endswith('/A'):\n            amend = True\n            doc_type = doc_type[0:-2]\n\n        self.context.update({\n            'end_date': end_date,\n            'doc_type': doc_type\n        })\n\n        self.add_xpath('symbol', '//dei:TradingSymbol')\n        self.add_value('symbol', symbol)\n\n        if amend:\n            self.add_value('amend', True)\n        else:\n            self.add_xpath('amend', '//dei:AmendmentFlag')\n\n        if doc_type == '10-K':\n            period_focus = 'FY'\n        else:\n            period_focus = self._get_period_focus(end_date)\n\n        if not fiscal_year and 
period_focus:\n            fiscal_year = self._guess_fiscal_year(end_date, period_focus)\n\n        self.add_value('period_focus', period_focus)\n        self.add_value('fiscal_year', fiscal_year)\n        self.add_value('end_date', end_date)\n        self.add_value('doc_type', doc_type)\n\n        self.add_xpaths('revenues', [\n            '//us-gaap:SalesRevenueNet',\n            '//us-gaap:Revenues',\n            '//us-gaap:SalesRevenueGoodsNet',\n            '//us-gaap:SalesRevenueServicesNet',\n            '//us-gaap:RealEstateRevenueNet',\n            '//*[local-name()=\"NetRevenuesIncludingNetInterestIncome\"]',\n            '//*[contains(local-name(), \"TotalRevenues\") and contains(local-name(), \"After\")]',\n            '//*[contains(local-name(), \"TotalRevenues\")]',\n            '//*[local-name()=\"InterestAndDividendIncomeOperating\" or local-name()=\"NoninterestIncome\"]',\n            '//*[contains(local-name(), \"Revenue\")]'\n        ])\n        self.add_xpath('revenues', '//us-gaap:FinancialServicesRevenue')\n\n        self.add_xpaths('net_income', [\n            '//*[contains(local-name(), \"NetLossIncome\") and contains(local-name(), \"Corporation\")]',\n            '//*[local-name()=\"NetIncomeLossAvailableToCommonStockholdersBasic\" or local-name()=\"NetIncomeLoss\"]',\n            '//us-gaap:ProfitLoss',\n            '//us-gaap:IncomeLossFromContinuingOperations',\n            '//*[contains(local-name(), \"IncomeLossFromContinuingOperations\") and not(contains(local-name(), \"Per\"))]',\n            '//*[contains(local-name(), \"NetIncomeLoss\")]',\n            '//*[starts-with(local-name(), \"NetIncomeAttributableTo\")]'\n        ])\n\n        self.add_xpaths('op_income', [\n            '//us-gaap:OperatingIncomeLoss'\n        ])\n\n        self.add_xpaths('eps_basic', [\n            '//us-gaap:EarningsPerShareBasic',\n            '//us-gaap:IncomeLossFromContinuingOperationsPerBasicShare',\n            
'//us-gaap:IncomeLossFromContinuingOperationsPerBasicAndDilutedShare',\n            '//*[contains(local-name(), \"NetIncomeLoss\") and contains(local-name(), \"Per\") and contains(local-name(), \"Common\")]',\n            '//*[contains(local-name(), \"Earnings\") and contains(local-name(), \"Per\") and contains(local-name(), \"Basic\")]',\n            '//*[local-name()=\"IncomePerShareFromContinuingOperationsAvailableToCompanyStockholdersBasicAndDiluted\"]',\n            '//*[contains(local-name(), \"NetLossPerShare\")]',\n            '//*[contains(local-name(), \"NetIncome\") and contains(local-name(), \"Per\") and contains(local-name(), \"Basic\")]',\n            '//*[local-name()=\"BasicEarningsAttributableToStockholdersPerCommonShare\"]',\n            '//*[local-name()=\"Earningspersharebasicanddiluted\"]',\n            '//*[contains(local-name(), \"PerCommonShareBasicAndDiluted\")]',\n            '//*[local-name()=\"NetIncomeLossAttributableToCommonStockholdersBasicAndDiluted\"]',\n            '//us-gaap:NetIncomeLossAvailableToCommonStockholdersBasic',\n            '//*[local-name()=\"NetIncomeLossEPS\"]',\n            '//*[local-name()=\"NetLoss\"]'\n        ])\n\n        self.add_xpaths('eps_diluted', [\n            '//us-gaap:EarningsPerShareDiluted',\n            '//us-gaap:IncomeLossFromContinuingOperationsPerDilutedShare',\n            '//us-gaap:IncomeLossFromContinuingOperationsPerBasicAndDilutedShare',\n            '//*[contains(local-name(), \"Earnings\") and contains(local-name(), \"Per\") and contains(local-name(), \"Diluted\")]',\n            '//*[local-name()=\"IncomePerShareFromContinuingOperationsAvailableToCompanyStockholdersBasicAndDiluted\"]',\n            '//*[contains(local-name(), \"NetLossPerShare\")]',\n            '//*[contains(local-name(), \"NetIncome\") and contains(local-name(), \"Per\") and contains(local-name(), \"Diluted\")]',\n            '//*[local-name()=\"DilutedEarningsAttributableToStockholdersPerCommonShare\"]',\n        
    '//us-gaap:NetIncomeLossAvailableToCommonStockholdersDiluted',\n            '//*[contains(local-name(), \"PerCommonShareBasicAndDiluted\")]',\n            '//*[local-name()=\"NetIncomeLossAttributableToCommonStockholdersBasicAndDiluted\"]',\n            '//us-gaap:EarningsPerShareBasic',\n            '//*[local-name()=\"NetIncomeLossEPS\"]',\n            '//*[local-name()=\"NetLoss\"]'\n        ])\n\n        self.add_xpaths('dividend', [\n            '//us-gaap:CommonStockDividendsPerShareDeclared',\n            '//us-gaap:CommonStockDividendsPerShareCashPaid'\n        ])\n\n        # if dividend isn't found in doc, assume it's 0\n        self.add_value('dividend', 0.0)\n\n        self.add_xpaths('assets', [\n            '//us-gaap:Assets',\n            '//us-gaap:AssetsNet',\n            '//us-gaap:LiabilitiesAndStockholdersEquity'\n        ])\n\n        self.add_xpaths('cur_assets', [\n            '//us-gaap:AssetsCurrent'\n        ])\n\n        self.add_xpaths('cur_liab', [\n            '//us-gaap:LiabilitiesCurrent'\n        ])\n\n        self.add_xpaths('equity', [\n            '//*[local-name()=\"StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest\" or local-name()=\"StockholdersEquity\"]',\n            '//*[local-name()=\"TotalCommonShareholdersEquity\"]',\n            '//*[local-name()=\"CommonShareholdersEquity\"]',\n            '//*[local-name()=\"CommonStockEquity\"]',\n            '//*[local-name()=\"TotalEquity\"]',\n            '//us-gaap:RetainedEarningsAccumulatedDeficit',\n            '//*[contains(local-name(), \"MembersEquityIncludingPortionAttributableToNoncontrollingInterest\")]',\n            '//us-gaap:CapitalizationLongtermDebtAndEquity',\n            '//*[local-name()=\"TotalCapitalization\"]'\n        ])\n\n        self.add_xpaths('cash', [\n            '//us-gaap:CashCashEquivalentsAndFederalFundsSold',\n            '//us-gaap:CashAndDueFromBanks',\n            '//us-gaap:CashAndCashEquivalentsAtCarryingValue',\n    
        '//us-gaap:Cash',\n            '//*[local-name()=\"CashAndCashEquivalents\"]',\n            '//*[contains(local-name(), \"CarryingValueOfCashAndCashEquivalents\")]',\n            '//*[contains(local-name(), \"CashCashEquivalents\")]',\n            '//*[contains(local-name(), \"CashAndCashEquivalents\")]'\n        ])\n\n        self.add_xpaths('cash_flow_op', [\n            '//us-gaap:NetCashProvidedByUsedInOperatingActivities',\n            '//us-gaap:NetCashProvidedByUsedInOperatingActivitiesContinuingOperations'\n        ])\n\n        self.add_xpaths('cash_flow_inv', [\n            '//us-gaap:NetCashProvidedByUsedInInvestingActivities',\n            '//us-gaap:NetCashProvidedByUsedInInvestingActivitiesContinuingOperations'\n        ])\n\n        self.add_xpaths('cash_flow_fin', [\n            '//us-gaap:NetCashProvidedByUsedInFinancingActivities',\n            '//us-gaap:NetCashProvidedByUsedInFinancingActivitiesContinuingOperations'\n        ])\n\n    def _get_symbol(self):\n        try:\n            filename = self.context['response'].url.split('/')[-1]\n            return filename.split('-')[0].upper()\n        except IndexError:\n            return None\n\n    def _get_doc_fiscal_year(self):\n        try:\n            fiscal_year = self.selector.xpath('//dei:DocumentFiscalYearFocus/text()')[0].extract()\n            return int(fiscal_year)\n        except (IndexError, ValueError):\n            return None\n\n    def _guess_fiscal_year(self, end_date, period_focus):\n        # Guess fiscal_year based on document end_date and period_focus\n        date = datetime.strptime(end_date, DATE_FORMAT)\n        month_ranges = {\n            'Q1': (2, 3, 4),\n            'Q2': (5, 6, 7),\n            'Q3': (8, 9, 10),\n            'FY': (11, 12, 1)\n        }\n        month_range = month_ranges.get(period_focus)\n\n        # Case 1: release Q1 around March, Q2 around June, ...\n        # This is what most companies do\n        if date.month in month_range:\n     
       if period_focus == 'FY' and date.month == 1:\n                return date.year - 1\n            return date.year\n\n        # How many days left before 10-K's release?\n        days_left_table = {\n            'Q1': 270,\n            'Q2': 180,\n            'Q3': 90,\n            'FY': 0\n        }\n        days_left = days_left_table.get(period_focus)\n\n        # Other cases, assume end_date.year of its FY report equals to\n        # its fiscal_year\n        if days_left is not None:\n            fy_date = date + timedelta(days=days_left)\n            return fy_date.year\n\n        return None\n\n    def _get_doc_end_date(self):\n        # the document end date could come from URL or document content\n        # we need to guess which one is correct\n        url_date_str = self.context['response'].url.split('-')[-1].split('.')[0]\n        url_date = datetime.strptime(url_date_str, '%Y%m%d')\n        url_date_str = url_date.strftime(DATE_FORMAT)\n\n        try:\n            doc_date_str = self.selector.xpath('//dei:DocumentPeriodEndDate/text()')[0].extract()\n            doc_date = datetime.strptime(doc_date_str, DATE_FORMAT)\n        except (IndexError, ValueError):\n            return url_date.strftime(DATE_FORMAT)\n\n        context_date_strs = set(self.selector.xpath('//*[local-name()=\"context\"]//*[local-name()=\"endDate\"]/text()').extract())\n\n        date = url_date\n        if doc_date_str in context_date_strs:\n            date = doc_date\n\n        return date.strftime(DATE_FORMAT)\n\n    def _get_doc_type(self):\n        try:\n            return self.selector.xpath('//dei:DocumentType/text()')[0].extract().upper()\n        except (IndexError, ValueError):\n            return None\n\n    def _get_period_focus(self, doc_end_date):\n        try:\n            return self.selector.xpath('//dei:DocumentFiscalPeriodFocus/text()')[0].extract().strip().upper()\n        except IndexError:\n            pass\n\n        try:\n            doc_yr = 
doc_end_date.split('-')[0]\n            yr_end_date = self.selector.xpath('//dei:CurrentFiscalYearEndDate/text()')[0].extract()\n            yr_end_date = yr_end_date.replace('--', doc_yr + '-')\n        except IndexError:\n            return None\n\n        doc_end_date = datetime.strptime(doc_end_date, '%Y-%m-%d')\n        yr_end_date = datetime.strptime(yr_end_date, '%Y-%m-%d')\n        delta_days = (yr_end_date - doc_end_date).days\n\n        if delta_days > -45 and delta_days < 45:\n            return 'FY'\n        elif (delta_days <= -45 and delta_days > -135) or delta_days > 225:\n            return 'Q1'\n        elif (delta_days <= -135 and delta_days > -225) or (delta_days > 135 and delta_days <= 225):\n            return 'Q2'\n        elif delta_days <= -225 or (delta_days > 45 and delta_days <= 135):\n            return 'Q3'\n\n        return 'FY'\n"
  },
  {
    "path": "pystock_crawler/settings.py",
    "content": "# Scrapy settings for pystock-crawler project\n#\n# For simplicity, this file contains only the most important settings by\n# default. All the other settings are documented here:\n#\n#     http://doc.scrapy.org/en/latest/topics/settings.html\n#\n\nBOT_NAME = 'pystock-crawler'\n\nEXPORT_FIELDS = (\n    # Price columns\n    'symbol', 'date', 'open', 'high', 'low', 'close', 'volume', 'adj_close',\n\n    # Report columns\n    'end_date', 'amend', 'period_focus', 'fiscal_year', 'doc_type', 'revenues', 'op_income', 'net_income',\n    'eps_basic', 'eps_diluted', 'dividend', 'assets', 'cur_assets', 'cur_liab', 'cash', 'equity',\n    'cash_flow_op', 'cash_flow_inv', 'cash_flow_fin',\n)\n\nFEED_EXPORTERS = {\n    'csv': 'pystock_crawler.exporters.CsvItemExporter2',\n    'symbollist': 'pystock_crawler.exporters.SymbolListExporter'\n}\n\nHTTPCACHE_ENABLED = True\n\nHTTPCACHE_POLICY = 'scrapy.contrib.httpcache.RFC2616Policy'\n\nHTTPCACHE_STORAGE = 'scrapy.contrib.httpcache.LeveldbCacheStorage'\n\nLOG_LEVEL = 'INFO'\n\nNEWSPIDER_MODULE = 'pystock_crawler.spiders'\n\nSPIDER_MODULES = ['pystock_crawler.spiders']\n\n# Crawl responsibly by identifying yourself (and your website) on the user-agent\n#USER_AGENT = 'pystock-crawler (+http://www.yourdomain.com)'\n\nCONCURRENT_REQUESTS_PER_DOMAIN = 8\n\nCOOKIES_ENABLED = False\n\n#AUTOTHROTTLE_ENABLED = True\n\nRETRY_TIMES = 4\n\nEXTENSIONS = {\n    'scrapy.contrib.throttle.AutoThrottle': None,\n    'pystock_crawler.throttle.PassiveThrottle': 0\n}\n\nPASSIVETHROTTLE_ENABLED = True\n#PASSIVETHROTTLE_DEBUG = True\n\nDEPTH_STATS_VERBOSE = True\n"
  },
  {
    "path": "pystock_crawler/spiders/__init__.py",
    "content": "# This package will contain the spiders of your Scrapy project\n#\n# Please refer to the documentation for information on how to create and manage\n# your spiders.\n"
  },
  {
    "path": "pystock_crawler/spiders/edgar.py",
    "content": "import os\n\nfrom scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor\nfrom scrapy.contrib.spiders import CrawlSpider, Rule\n\nfrom pystock_crawler import utils\nfrom pystock_crawler.loaders import ReportItemLoader\n\n\nclass URLGenerator(object):\n\n    def __init__(self, symbols, start_date='', end_date='', start=0, count=None):\n        end = start + count if count is not None else None\n        self.symbols = symbols[start:end]\n        self.start_date = start_date\n        self.end_date = end_date\n\n    def __iter__(self):\n        url = 'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=%s&type=10-&dateb=%s&datea=%s&owner=exclude&count=300'\n        for symbol in self.symbols:\n            yield (url % (symbol, self.end_date, self.start_date))\n\n\nclass EdgarSpider(CrawlSpider):\n\n    name = 'edgar'\n    allowed_domains = ['sec.gov']\n\n    rules = (\n        Rule(SgmlLinkExtractor(allow=('/Archives/edgar/data/[^\\\"]+\\-index\\.htm',))),\n        Rule(SgmlLinkExtractor(allow=('/Archives/edgar/data/[^\\\"]+/[A-Za-z]+\\-\\d{8}\\.xml',)), callback='parse_10qk'),\n    )\n\n    def __init__(self, **kwargs):\n        super(EdgarSpider, self).__init__(**kwargs)\n\n        symbols_arg = kwargs.get('symbols')\n        start_date = kwargs.get('startdate', '')\n        end_date = kwargs.get('enddate', '')\n        limit_arg = kwargs.get('limit', '')\n\n        utils.check_date_arg(start_date, 'startdate')\n        utils.check_date_arg(end_date, 'enddate')\n        start, count = utils.parse_limit_arg(limit_arg)\n\n        if symbols_arg:\n            if os.path.exists(symbols_arg):\n                # get symbols from a text file\n                symbols = utils.load_symbols(symbols_arg)\n            else:\n                # inline symbols in command\n                symbols = symbols_arg.split(',')\n            self.start_urls = URLGenerator(symbols, start_date, end_date, start, count)\n        else:\n            self.start_urls = 
[]\n\n    def parse_10qk(self, response):\n        '''Parse 10-Q or 10-K XML report.'''\n        loader = ReportItemLoader(response=response)\n        item = loader.load_item()\n\n        if 'doc_type' in item:\n            doc_type = item['doc_type']\n            if doc_type in ('10-Q', '10-K'):\n                return item\n\n        return None\n"
  },
  {
    "path": "pystock_crawler/spiders/nasdaq.py",
    "content": "import cStringIO\nimport re\n\nfrom scrapy.spider import Spider\n\nfrom pystock_crawler.items import SymbolItem\n\n\nRE_SYMBOL = re.compile(r'^[A-Z]+$')\n\n\ndef generate_urls(exchanges):\n    for exchange in exchanges:\n        yield 'http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=%s&render=download' % exchange\n\n\nclass NasdaqSpider(Spider):\n\n    name = 'nasdaq'\n    allowed_domains = ['www.nasdaq.com']\n\n    def __init__(self, **kwargs):\n        super(NasdaqSpider, self).__init__(**kwargs)\n\n        exchanges = kwargs.get('exchanges', '').split(',')\n        self.start_urls = generate_urls(exchanges)\n\n    def parse(self, response):\n        try:\n            file_like = cStringIO.StringIO(response.body)\n\n            # Ignore first row\n            file_like.next()\n\n            for line in file_like:\n                tokens = line.split(',')\n                symbol = tokens[0].strip('\"')\n                if RE_SYMBOL.match(symbol):\n                    name = tokens[1].strip('\"')\n                    yield SymbolItem(symbol=symbol, name=name)\n        finally:\n            file_like.close()\n"
  },
  {
    "path": "pystock_crawler/spiders/yahoo.py",
    "content": "import cStringIO\nimport os\nimport re\n\nfrom datetime import datetime\nfrom scrapy.spider import Spider\n\nfrom pystock_crawler import utils\nfrom pystock_crawler.items import PriceItem\n\n\ndef parse_date(date_str):\n    if date_str:\n        date = datetime.strptime(date_str, '%Y%m%d')\n        return date.year, date.month - 1, date.day\n    return '', '', ''\n\n\ndef make_url(symbol, start_date=None, end_date=None):\n    url = ('http://ichart.finance.yahoo.com/table.csv?'\n           's=%(symbol)s&d=%(end_month)s&e=%(end_day)s&f=%(end_year)s&g=d&'\n           'a=%(start_month)s&b=%(start_day)s&c=%(start_year)s&ignore=.csv')\n\n    start_date = parse_date(start_date)\n    end_date = parse_date(end_date)\n\n    return url % {\n        'symbol': symbol,\n        'start_year': start_date[0],\n        'start_month': start_date[1],\n        'start_day': start_date[2],\n        'end_year': end_date[0],\n        'end_month': end_date[1],\n        'end_day': end_date[2]\n    }\n\n\ndef generate_urls(symbols, start_date=None, end_date=None):\n    for symbol in symbols:\n        yield make_url(symbol, start_date, end_date)\n\n\nclass YahooSpider(Spider):\n\n    name = 'yahoo'\n    allowed_domains = ['finance.yahoo.com']\n\n    def __init__(self, **kwargs):\n        super(YahooSpider, self).__init__(**kwargs)\n\n        symbols_arg = kwargs.get('symbols')\n        start_date = kwargs.get('startdate', '')\n        end_date = kwargs.get('enddate', '')\n\n        utils.check_date_arg(start_date, 'startdate')\n        utils.check_date_arg(end_date, 'enddate')\n\n        if symbols_arg:\n            if os.path.exists(symbols_arg):\n                # get symbols from a text file\n                symbols = utils.load_symbols(symbols_arg)\n            else:\n                # inline symbols in command\n                symbols = symbols_arg.split(',')\n            self.start_urls = generate_urls(symbols, start_date, end_date)\n        else:\n            
self.start_urls = []\n\n    def parse(self, response):\n        symbol = self._get_symbol_from_url(response.url)\n        try:\n            file_like = cStringIO.StringIO(response.body)\n            rows = utils.parse_csv(file_like)\n            for row in rows:\n                item = PriceItem(symbol=symbol)\n                for k, v in row.iteritems():\n                    item[k.replace(' ', '_').lower()] = v\n                yield item\n        finally:\n            file_like.close()\n\n    def _get_symbol_from_url(self, url):\n        match = re.search(r'[\\?&]s=([^&]*)', url)\n        if match:\n            return match.group(1)\n        return ''\n"
  },
  {
    "path": "pystock_crawler/tests/__init__.py",
    "content": ""
  },
  {
    "path": "pystock_crawler/tests/base.py",
    "content": "import os\nimport unittest\n\n\n# Stores temporary test data\nSAMPLE_DATA_DIR = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'sample_data')\n\n\nclass TestCaseBase(unittest.TestCase):\n    '''\n    Provides utility functions for test cases.\n\n    '''\n    def assert_none_or_almost_equal(self, value, expected_value):\n        if expected_value is None:\n            self.assertIsNone(value)\n        else:\n            self.assertAlmostEqual(value, expected_value)\n\n    def assert_item(self, item, expected):\n        self.assertEqual(item.get('symbol'), expected.get('symbol'))\n        self.assertEqual(item.get('name'), expected.get('name'))\n        self.assertEqual(item.get('amend'), expected.get('amend'))\n        self.assertEqual(item.get('doc_type'), expected.get('doc_type'))\n        self.assertEqual(item.get('period_focus'), expected.get('period_focus'))\n        self.assertEqual(item.get('fiscal_year'), expected.get('fiscal_year'))\n        self.assertEqual(item.get('end_date'), expected.get('end_date'))\n        self.assert_none_or_almost_equal(item.get('revenues'), expected.get('revenues'))\n        self.assert_none_or_almost_equal(item.get('net_income'), expected.get('net_income'))\n        self.assert_none_or_almost_equal(item.get('eps_basic'), expected.get('eps_basic'))\n        self.assert_none_or_almost_equal(item.get('eps_diluted'), expected.get('eps_diluted'))\n        self.assertAlmostEqual(item.get('dividend'), expected.get('dividend'))\n        self.assert_none_or_almost_equal(item.get('assets'), expected.get('assets'))\n        self.assert_none_or_almost_equal(item.get('equity'), expected.get('equity'))\n        self.assert_none_or_almost_equal(item.get('cash'), expected.get('cash'))\n        self.assert_none_or_almost_equal(item.get('op_income'), expected.get('op_income'))\n        self.assert_none_or_almost_equal(item.get('cur_assets'), expected.get('cur_assets'))\n        
self.assert_none_or_almost_equal(item.get('cur_liab'), expected.get('cur_liab'))\n        self.assert_none_or_almost_equal(item.get('cash_flow_op'), expected.get('cash_flow_op'))\n        self.assert_none_or_almost_equal(item.get('cash_flow_inv'), expected.get('cash_flow_inv'))\n        self.assert_none_or_almost_equal(item.get('cash_flow_fin'), expected.get('cash_flow_fin'))\n\n\ndef _create_sample_data_dir():\n    if not os.path.exists(SAMPLE_DATA_DIR):\n        try:\n            os.makedirs(SAMPLE_DATA_DIR)\n        except OSError:\n            pass\n\n    assert os.path.exists(SAMPLE_DATA_DIR)\n\n_create_sample_data_dir()\n"
  },
  {
    "path": "pystock_crawler/tests/test_cmdline.py",
    "content": "import os\nimport shutil\nimport unittest\n\nimport pystock_crawler\n\nfrom envoy import run\n\n\nTEST_DIR = './test_data'\n\n\n# Scrapy runs in a separate process whose working directory may differ from\n# that of the process running the test, so we explicitly set PYTHONPATH to the\n# absolute path of the current working directory so that the Scrapy process\n# can locate the pystock_crawler module.\nos.environ['PYTHONPATH'] = os.getcwd()\n\n\nclass PrintTest(unittest.TestCase):\n\n    def test_no_args(self):\n        r = run('./bin/pystock-crawler')\n        self.assertIn('Usage:', r.std_err)\n\n    def test_print_help(self):\n        r = run('./bin/pystock-crawler -h')\n        self.assertIn('Usage:', r.std_out)\n\n        r2 = run('./bin/pystock-crawler --help')\n        self.assertEqual(r.std_out, r2.std_out)\n\n    def test_print_version(self):\n        r = run('./bin/pystock-crawler -v')\n        self.assertEqual(r.std_out, 'pystock-crawler %s\\n' % pystock_crawler.__version__)\n\n        r2 = run('./bin/pystock-crawler --version')\n        self.assertEqual(r.std_out, r2.std_out)\n\n\nclass CrawlTest(unittest.TestCase):\n    '''Base class for crawl test cases.'''\n    def setUp(self):\n        if os.path.isdir(TEST_DIR):\n            shutil.rmtree(TEST_DIR)\n        os.mkdir(TEST_DIR)\n\n        self.args = {\n            'output': os.path.join(TEST_DIR, '%s.out' % self.filename),\n            'log_file': os.path.join(TEST_DIR, '%s.log' % self.filename),\n            'working_dir': TEST_DIR\n        }\n\n    def tearDown(self):\n        shutil.rmtree(TEST_DIR)\n\n    def assert_cache(self):\n        # Check if cache is there\n        cache_dir = os.path.join(TEST_DIR, '.scrapy', 'httpcache', '%s.leveldb' % self.spider)\n        self.assertTrue(os.path.isdir(cache_dir))\n\n    def assert_log(self):\n        # Check if log file is there\n        log_path = self.args['log_file']\n        self.assertTrue(os.path.isfile(log_path))\n\n    def 
get_output_content(self):\n        output_path = self.args['output']\n        self.assertTrue(os.path.isfile(output_path))\n\n        with open(output_path) as f:\n            content = f.read()\n        return content\n\n\nclass CrawlSymbolsTest(CrawlTest):\n\n    filename = 'symbols'\n    spider = 'nasdaq'\n\n    def assert_nyse_output(self):\n        # Check if some common NYSE symbols are in output\n        content = self.get_output_content()\n        self.assertIn('JPM', content)\n        self.assertIn('KO', content)\n        self.assertIn('WMT', content)\n\n        # NASDAQ symbols shouldn't be\n        self.assertNotIn('AAPL', content)\n        self.assertNotIn('GOOG', content)\n        self.assertNotIn('YHOO', content)\n\n    def assert_nyse_and_nasdaq_output(self):\n        # Check if some common NYSE symbols are in output\n        content = self.get_output_content()\n        self.assertIn('JPM', content)\n        self.assertIn('KO', content)\n        self.assertIn('WMT', content)\n\n        # Check if some common NASDAQ symbols are in output\n        self.assertIn('AAPL', content)\n        self.assertIn('GOOG', content)\n        self.assertIn('YHOO', content)\n\n    def test_crawl_nyse(self):\n        r = run('./bin/pystock-crawler symbols NYSE -o %(output)s -l %(log_file)s -w %(working_dir)s' % self.args)\n        self.assertEqual(r.status_code, 0)\n        self.assert_nyse_output()\n        self.assert_log()\n        self.assert_cache()\n\n    def test_crawl_nyse_and_nasdaq(self):\n        r = run('./bin/pystock-crawler symbols NYSE,NASDAQ -o %(output)s -l %(log_file)s -w %(working_dir)s --sort' % self.args)\n        self.assertEqual(r.status_code, 0)\n        self.assert_nyse_and_nasdaq_output()\n        self.assert_log()\n        self.assert_cache()\n\n\nclass CrawlPricesTest(CrawlTest):\n\n    filename = 'prices'\n    spider = 'yahoo'\n\n    def test_crawl_inline_symbols(self):\n        r = run('./bin/pystock-crawler prices GOOG,IBM -o %(output)s -l 
%(log_file)s -w %(working_dir)s' % self.args)\n        self.assertEqual(r.status_code, 0)\n\n        content = self.get_output_content()\n        self.assertIn('GOOG', content)\n        self.assertIn('IBM', content)\n        self.assert_log()\n        self.assert_cache()\n\n    def test_crawl_symbol_file(self):\n        # Create a sample symbol file\n        symbol_file = os.path.join(TEST_DIR, 'symbols.txt')\n        with open(symbol_file, 'w') as f:\n            f.write('WMT\\nJPM')\n        self.args['symbol_file'] = symbol_file\n\n        r = run('./bin/pystock-crawler prices %(symbol_file)s -o %(output)s -l %(log_file)s -w %(working_dir)s --sort' % self.args)\n        self.assertEqual(r.status_code, 0)\n\n        content = self.get_output_content()\n        self.assertIn('WMT', content)\n        self.assertIn('JPM', content)\n        self.assert_log()\n        self.assert_cache()\n\n\nclass CrawlReportsTest(CrawlTest):\n\n    filename = 'reports'\n    spider = 'edgar'\n\n    def test_crawl_inline_symbols(self):\n        r = run('./bin/pystock-crawler reports KO,MCD -o %(output)s -l %(log_file)s -w %(working_dir)s '\n                '-s 20130401 -e 20130531' % self.args)\n        self.assertEqual(r.status_code, 0)\n\n        content = self.get_output_content()\n        self.assertIn('KO', content)\n        self.assertIn('MCD', content)\n        self.assert_log()\n        self.assert_cache()\n\n    def test_crawl_symbol_file(self):\n        # Create a sample symbol file\n        symbol_file = os.path.join(TEST_DIR, 'symbols.txt')\n        with open(symbol_file, 'w') as f:\n            f.write('KO\\nMCD')\n        self.args['symbol_file'] = symbol_file\n\n        r = run('./bin/pystock-crawler reports %(symbol_file)s -o %(output)s -l %(log_file)s -w %(working_dir)s '\n                '-s 20130401 -e 20130531 --sort' % self.args)\n        self.assertEqual(r.status_code, 0)\n\n        content = self.get_output_content()\n        self.assertIn('KO', content)\n       
 self.assertIn('MCD', content)\n        self.assert_log()\n        self.assert_cache()\n\n        # Check CSV header\n        expected_header = [\n            'symbol', 'end_date', 'amend', 'period_focus', 'fiscal_year', 'doc_type',\n            'revenues', 'op_income', 'net_income', 'eps_basic', 'eps_diluted', 'dividend',\n            'assets', 'cur_assets', 'cur_liab', 'cash', 'equity', 'cash_flow_op',\n            'cash_flow_inv', 'cash_flow_fin'\n        ]\n        head_line = content.split('\\n')[0].rstrip()\n        self.assertEqual(head_line.split(','), expected_header)\n\n    def test_merge_empty_results(self):\n        # Ridiculous date range (1800/1/1) -> empty result\n        r = run('./bin/pystock-crawler reports KO,MCD -o %(output)s -l %(log_file)s -w %(working_dir)s '\n                '-s 18000101 -e 18000101 -b 1' % self.args)\n        self.assertEqual(r.status_code, 0)\n\n        content = self.get_output_content()\n        self.assertFalse(content)\n\n        # Make sure subfiles are deleted\n        filename = self.args['output']\n        self.assertFalse(os.path.exists(os.path.join('%s.1' % filename)))\n        self.assertFalse(os.path.exists(os.path.join('%s.2' % filename)))\n"
  },
  {
    "path": "pystock_crawler/tests/test_loaders.py",
    "content": "import os\nimport requests\nimport urlparse\n\nfrom scrapy.http.response.xml import XmlResponse\n\nfrom pystock_crawler.loaders import ReportItemLoader\nfrom pystock_crawler.tests.base import SAMPLE_DATA_DIR, TestCaseBase\n\n\ndef create_response(file_path):\n    with open(file_path) as f:\n        body = f.read()\n    return XmlResponse('file://%s' % file_path.replace('\\\\', '/'), body=body)\n\n\ndef download(url, local_path):\n    if not os.path.exists(local_path):\n        dir_path = os.path.dirname(local_path)\n        if not os.path.exists(dir_path):\n            try:\n                os.makedirs(dir_path)\n            except OSError:\n                pass\n\n        assert os.path.exists(dir_path)\n\n        # Fetch first and fail on HTTP errors, so an error page or an empty\n        # file is never cached as sample data\n        r = requests.get(url, stream=True)\n        r.raise_for_status()\n        with open(local_path, 'wb') as f:\n            for chunk in r.iter_content(chunk_size=4096):\n                f.write(chunk)\n\n\ndef parse_xml(url):\n    url_path = urlparse.urlparse(url).path\n    local_path = os.path.join(SAMPLE_DATA_DIR, url_path[1:])\n    download(url, local_path)\n    response = create_response(local_path)\n    loader = ReportItemLoader(response=response)\n    return loader.load_item()\n\n\nclass ReportItemLoaderTest(TestCaseBase):\n\n    def test_a_20110131(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1090872/000110465911013291/a-20110131.xml')\n        self.assert_item(item, {\n            'symbol': 'A',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2011,\n            'end_date': '2011-01-31',\n            'revenues': 1519000000,\n            'op_income': 211000000,\n            'net_income': 193000000,\n            'eps_basic': 0.56,\n            'eps_diluted': 0.54,\n            'dividend': 0.0,\n            'assets': 8044000000,\n            'cur_assets': 4598000000,\n            'cur_liab': 1406000000,\n            'equity': 3339000000,\n            'cash': 
2638000000,\n            'cash_flow_op': 120000000,\n            'cash_flow_inv': 1500000000,\n            'cash_flow_fin': -1634000000\n        })\n\n    def test_aa_20120630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4281/000119312512317135/aa-20120630.xml')\n        self.assert_item(item, {\n            'symbol': 'AA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2012,\n            'end_date': '2012-06-30',\n            'revenues': 5963000000,\n            'op_income': None,  # Missing value\n            'net_income': -2000000,\n            'eps_basic': None,  # EPS is 0 actually, but got no data in XML\n            'eps_diluted': None,\n            'dividend': 0.03,\n            'assets': 39498000000,\n            'cur_assets': 7767000000,\n            'cur_liab': 6151000000,\n            'equity': 16914000000,\n            'cash': 1712000000,\n            'cash_flow_op': 301000000,\n            'cash_flow_inv': -704000000,\n            'cash_flow_fin': 196000000\n        })\n\n    def test_aapl_20100626(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312510162840/aapl-20100626.xml')\n        self.assert_item(item, {\n            'symbol': 'AAPL',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2010,\n            'end_date': '2010-06-26',\n            'revenues': 15700000000,\n            'op_income': 4234000000,\n            'net_income': 3253000000,\n            'eps_basic': 3.57,\n            'eps_diluted': 3.51,\n            'dividend': 0.0,\n            'assets': 64725000000,\n            'cur_assets': 36033000000,\n            'cur_liab': 15612000000,\n            'equity': 43111000000,\n            'cash': 9705000000,\n            'cash_flow_op': 12912000000,\n            'cash_flow_inv': -9471000000,\n            'cash_flow_fin': 
1001000000\n        })\n\n    def test_aapl_20110326(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312511104388/aapl-20110326.xml')\n        self.assert_item(item, {\n            'symbol': 'AAPL',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2011,\n            'end_date': '2011-03-26',\n            'revenues': 24667000000,\n            'net_income': 5987000000,\n            'op_income': 7874000000,\n            'eps_basic': 6.49,\n            'eps_diluted': 6.40,\n            'dividend': 0.0,\n            'assets': 94904000000,\n            'cur_assets': 46997000000,\n            'cur_liab': 24327000000,\n            'equity': 61477000000,\n            'cash': 15978000000,\n            'cash_flow_op': 15992000000,\n            'cash_flow_inv': -12251000000,\n            'cash_flow_fin': 976000000\n        })\n\n    def test_aapl_20120929(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/320193/000119312512444068/aapl-20120929.xml')\n        self.assert_item(item, {\n            'symbol': 'AAPL',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-29',\n            'revenues': 156508000000,\n            'op_income': 55241000000,\n            'net_income': 41733000000,\n            'eps_basic': 44.64,\n            'eps_diluted': 44.15,\n            'dividend': 2.65,\n            'assets': 176064000000,\n            'cur_assets': 57653000000,\n            'cur_liab': 38542000000,\n            'equity': 118210000000,\n            'cash': 10746000000,\n            'cash_flow_op': 50856000000,\n            'cash_flow_inv': -48227000000,\n            'cash_flow_fin': -1698000000\n        })\n\n    def test_aes_20100331(self):\n        item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/874761/000119312510111183/aes-20100331.xml')\n        self.assert_item(item, {\n            'symbol': 'AES',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2010-03-31',\n            'revenues': 4112000000,\n            'op_income': None,  # Missing value\n            'net_income': 187000000,\n            'eps_basic': 0.27,\n            'eps_diluted': 0.27,\n            'dividend': 0.0,\n            'assets': 41882000000,\n            'cur_assets': 10460000000,\n            'cur_liab': 6894000000,\n            'equity': 10536000000,\n            'cash': 3392000000,\n            'cash_flow_op': 684000000,\n            'cash_flow_inv': -595000000,\n            'cash_flow_fin': 1515000000\n        })\n\n    def test_adbe_20060914(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/796343/000110465906066129/adbe-20060914.xml')\n\n        # Old document is not supported\n        self.assertFalse(item)\n\n    def test_adbe_20090227(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/796343/000079634309000021/adbe-20090227.xml')\n        self.assert_item(item, {\n            'symbol': 'ADBE',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2009,\n            'end_date': '2009-02-27',\n            'revenues': 786390000,\n            'op_income': 207916000,\n            'net_income': 156435000,\n            'eps_basic': 0.3,\n            'eps_diluted': 0.3,\n            'dividend': 0.0,\n            'assets': 5887596000,\n            'cur_assets': 2868991000,\n            'cur_liab': 636865000,\n            'equity': 4611160000,\n            'cash': 1148925000,\n            'cash_flow_op': 365743000,\n            'cash_flow_inv': -131562000,\n            'cash_flow_fin': 28675000\n        })\n\n    
def test_agn_20101231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/850693/000119312511050632/agn-20101231.xml')\n        self.assert_item(item, {\n            'symbol': 'AGN',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2010,\n            'end_date': '2010-12-31',\n            'revenues': 4919400000,\n            'op_income': 258600000,\n            'net_income': 600000,\n            'eps_basic': 0.0,\n            'eps_diluted': 0.0,\n            'dividend': 0.2,\n            'assets': 8308100000,\n            'cur_assets': 3993700000,\n            'cur_liab': 1528400000,\n            'equity': 4781100000,\n            'cash': 1991200000,\n            'cash_flow_op': 463900000,\n            'cash_flow_inv': -977200000,\n            'cash_flow_fin': 563000000\n        })\n\n    def test_aig_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/5272/000104746913008075/aig-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'AIG',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-30',\n            'revenues': 17315000000,\n            'net_income': 2731000000,\n            'op_income': None,\n            'eps_basic': 1.85,\n            'eps_diluted': 1.84,\n            'dividend': 0.0,\n            'assets': 537438000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 98155000000,\n            'cash': 1762000000,\n            'cash_flow_op': 1674000000,\n            'cash_flow_inv': 6071000000,\n            'cash_flow_fin': -7055000000\n        })\n\n    def test_aiv_20110630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/922864/000095012311070591/aiv-20110630.xml')\n        self.assert_item(item, {\n            'symbol': 'AIV',\n       
     'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2011,\n            'end_date': '2011-06-30',\n            'revenues': 281035000,\n            'op_income': 49791000,\n            'net_income': -33177000,\n            'eps_basic': -0.28,\n            'eps_diluted': -0.28,\n            'dividend': 0.12,\n            'assets': 7164972000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 1241336000,\n            'cash': 85324000,\n            'cash_flow_op': 95208000,\n            'cash_flow_inv': -33538000,\n            'cash_flow_fin': -87671000\n        })\n\n    def test_all_20130331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/899051/000110465913035969/all-20130331.xml')\n        self.assert_item(item, {\n            'symbol': 'ALL',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2013-03-31',\n            'revenues': 8463000000,\n            'op_income': None,\n            'net_income': 709000000,\n            'eps_basic': 1.49,\n            'eps_diluted': 1.47,\n            'dividend': 0.25,\n            'assets': 126612000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 20619000000,\n            'cash': 820000000,\n            'cash_flow_op': 740000000,\n            'cash_flow_inv': 136000000,\n            'cash_flow_fin': -862000000\n        })\n\n    def test_apa_20120930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/6769/000119312512457830/apa-20120930.xml')\n        self.assert_item(item, {\n            'symbol': 'APA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-30',\n            'revenues': 4179000000,\n            'op_income': 
None,\n            'net_income': 161000000,\n            'eps_basic': 0.41,\n            'eps_diluted': 0.41,\n            'dividend': 0.17,\n            'assets': 58810000000,\n            'cur_assets': 5044000000,\n            'cur_liab': 5390000000,\n            'equity': 30714000000,\n            'cash': 318000000,\n            'cash_flow_op': 6422000000,\n            'cash_flow_inv': -10560000000,\n            'cash_flow_fin': 4161000000\n        })\n\n    def test_axp_20100930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000095012310100214/axp-20100930.xml')\n        self.assert_item(item, {\n            'symbol': 'AXP',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2010,\n            'end_date': '2010-09-30',\n            'revenues': 6660000000,\n            'op_income': 1640000000,\n            'net_income': 1093000000,\n            'eps_basic': 0.91,\n            'eps_diluted': 0.9,\n            'dividend': 0.18,\n            'assets': 146056000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 15920000000,\n            'cash': 21341000000,\n            'cash_flow_op': 7227000000,\n            'cash_flow_inv': 5298000000,\n            'cash_flow_fin': -7885000000\n        })\n\n    def test_axp_20120630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312512332179/axp-20120630.xml')\n        self.assert_item(item, {\n            'symbol': 'AXP',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2012,\n            'end_date': '2012-06-30',\n            'revenues': 7504000000,\n            'op_income': None,\n            'net_income': 1339000000,\n            'eps_basic': 1.16,\n            'eps_diluted': 1.15,\n            'dividend': 0.2,\n            'assets': 148128000000,\n            
'cur_assets': None,\n            'cur_liab': None,\n            'equity': 19267000000,\n            'cash': 22072000000,\n            'cash_flow_op': 6742000000,\n            'cash_flow_inv': -1771000000,\n            'cash_flow_fin': -7786000000\n        })\n\n    def test_axp_20121231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312513070554/axp-20121231.xml')\n        self.assert_item(item, {\n            'symbol': 'AXP',\n            'amend': True,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2012,\n            'end_date': '2012-12-31',\n            'revenues': 29592000000,\n            'op_income': None,\n            'net_income': 4482000000,\n            'eps_basic': 3.91,\n            'eps_diluted': 3.89,\n            'dividend': 0.8,\n            'assets': 153140000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 18886000000,\n            'cash': 22250000000,\n            'cash_flow_op': 7082000000,\n            'cash_flow_inv': -6545000000,\n            'cash_flow_fin': -3268000000\n        })\n\n    def test_axp_20130331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/4962/000119312513180601/axp-20130331.xml')\n        self.assert_item(item, {\n            'symbol': 'AXP',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2013-03-31',\n            'revenues': 7384000000,\n            'op_income': None,\n            'net_income': 1280000000,\n            'eps_basic': 1.15,\n            'eps_diluted': 1.15,\n            'dividend': 0.2,\n            'assets': 156855000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 19290000000,\n            'cash': 27964000000,\n            'cash_flow_op': 7547000000,\n            'cash_flow_inv': 32000000,\n            
'cash_flow_fin': -1830000000\n        })\n\n    def test_ba_20091231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000119312510024406/ba-20091231.xml')\n        self.assert_item(item, {\n            'symbol': 'BA',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2009,\n            'end_date': '2009-12-31',\n            'revenues': 68281000000,\n            'op_income': 2096000000,\n            'net_income': 1312000000,\n            'eps_basic': 1.86,\n            'eps_diluted': 1.84,\n            'dividend': 1.68,\n            'assets': 62053000000,\n            'cur_assets': 35275000000,\n            'cur_liab': 32883000000,\n            'equity': 2225000000,\n            'cash': 9215000000,\n            'cash_flow_op': 5603000000,\n            'cash_flow_inv': -3794000000,\n            'cash_flow_fin': 4094000000\n        })\n\n    def test_ba_20110930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000119312511281613/ba-20110930.xml')\n        self.assert_item(item, {\n            'symbol': 'BA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2011,\n            'end_date': '2011-09-30',\n            'revenues': 17727000000,\n            'op_income': 1714000000,\n            'net_income': 1098000000,\n            'eps_basic': 1.47,\n            'eps_diluted': 1.46,\n            'dividend': 0.42,\n            'assets': 74163000000,\n            'cur_assets': 46347000000,\n            'cur_liab': 37593000000,\n            'equity': 6061000000,\n            'cash': 5954000000,\n            'cash_flow_op': 1092000000,\n            'cash_flow_inv': 856000000,\n            'cash_flow_fin': -1354000000\n        })\n\n    def test_ba_20130331(self):\n        item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/12927/000001292713000023/ba-20130331.xml')\n        self.assert_item(item, {\n            'symbol': 'BA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2013-03-31',\n            'revenues': 18893000000,\n            'op_income': 1528000000,\n            'net_income': 1106000000,\n            'eps_basic': 1.45,\n            'eps_diluted': 1.44,\n            'dividend': 0.49,\n            'assets': 90447000000,\n            'cur_assets': 59490000000,\n            'cur_liab': 45666000000,\n            'equity': 7560000000,\n            'cash': 8335000000,\n            'cash_flow_op': 524000000,\n            'cash_flow_inv': -814000000,\n            'cash_flow_fin': -1705000000\n        })\n\n    def test_bbt_20110930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/92230/000119312511304459/bbt-20110930.xml')\n        self.assert_item(item, {\n            'symbol': 'BBT',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2011,\n            'end_date': '2011-09-30',\n            'revenues': 2440000000,\n            'op_income': None,\n            'net_income': 366000000,\n            'eps_basic': 0.52,\n            'eps_diluted': 0.52,\n            'dividend': 0.16,\n            'assets': 167677000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 17541000000,\n            'cash': 1312000000,\n            'cash_flow_op': 4348000000,\n            'cash_flow_inv': -10838000000,\n            'cash_flow_fin': 8509000000\n        })\n\n    def test_bk_20100331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1390777/000119312510112944/bk-20100331.xml')\n        self.assert_item(item, {\n            'symbol': 'BK',\n            'amend': False,\n            
'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2010-03-31',\n            'revenues': 883000000,\n            'op_income': None,\n            'net_income': 559000000,\n            'eps_basic': 0.46,\n            'eps_diluted': 0.46,\n            'dividend': 0.09,\n            'assets': 220551000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 30455000000,\n            'cash': 3307000000,\n            'cash_flow_op': 1191000000,\n            'cash_flow_inv': 512000000,\n            'cash_flow_fin': -2126000000\n        })\n\n    def test_blk_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1364742/000119312513326890/blk-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'BLK',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-30',\n            'revenues': 2482000000,\n            'op_income': 849000000,\n            'net_income': 729000000,\n            'eps_basic': 4.27,\n            'eps_diluted': 4.19,\n            'dividend': 1.68,\n            'assets': 193745000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 25755000000,\n            'cash': 3668000000,\n            'cash_flow_op': 1330000000,\n            'cash_flow_inv': 10000000,\n            'cash_flow_fin': -2193000000\n        })\n\n    def test_c_20090630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/831001/000104746909007400/c-20090630.xml')\n        self.assert_item(item, {\n            'symbol': 'C',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2009,\n            'end_date': '2009-06-30',\n            'revenues': 29969000000,\n            'net_income': 4279000000,\n            
'op_income': None,\n            'eps_basic': 0.49,\n            'eps_diluted': 0.49,\n            'dividend': 0.0,\n            'assets': 1848533000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 154168000000,\n            'cash': 26915000000,\n            'cash_flow_op': -20737000000,\n            'cash_flow_inv': 16457000000,\n            'cash_flow_fin': 959000000\n        })\n\n    def test_cbs_20100331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746910004823/cbs-20100331.xml')\n        self.assert_item(item, {\n            'symbol': 'CBS',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2010-03-31',\n            'revenues': 3530900000,\n            'op_income': 153400000,\n            'net_income': -26200000,\n            'eps_basic': -0.04,\n            'eps_diluted': -0.04,\n            'dividend': 0.05,\n            'assets': 26756100000,\n            'cur_assets': 5705200000,\n            'cur_liab': 4712300000,\n            'equity': 9046100000,\n            'cash': 872700000,\n            'cash_flow_op': 700700000,\n            'cash_flow_inv': -73600000,\n            'cash_flow_fin': -471100000\n        })\n\n    def test_cbs_20111231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746912001373/cbs-20111231.xml')\n        self.assert_item(item, {\n            'symbol': 'CBS',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2011-12-31',\n            'revenues': 14245000000,\n            'op_income': 2529000000,\n            'net_income': 1305000000,\n            'eps_basic': 1.97,\n            'eps_diluted': 1.92,\n            'dividend': 0.35,\n            'assets': 26197000000,\n            'cur_assets': 
5543000000,\n            'cur_liab': 3933000000,\n            'equity': 9908000000,\n            'cash': 660000000,\n            'cash_flow_op': 1749000000,\n            'cash_flow_inv': -389000000,\n            'cash_flow_fin': -1180000000\n        })\n\n    def test_cbs_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/813828/000104746913007929/cbs-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'CBS',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-30',\n            'revenues': 3699000000,\n            'op_income': 838000000,\n            'net_income': 472000000,\n            'eps_basic': 0.78,\n            'eps_diluted': 0.76,\n            'dividend': 0.12,\n            'assets': 25693000000,\n            'cur_assets': 4770000000,\n            'cur_liab': 3825000000,\n            'equity': 9601000000,\n            'cash': 282000000,\n            'cash_flow_op': 1051000000,\n            'cash_flow_inv': -230000000,\n            'cash_flow_fin': -1247000000\n        })\n\n    def test_cce_20101001(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1491675/000119312510239952/cce-20101001.xml')\n        self.assert_item(item, {\n            'symbol': 'CCE',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2010,\n            'end_date': '2010-10-01',\n            'revenues': 1681000000,\n            'op_income': 244000000,\n            'net_income': 208000000,\n            'eps_basic': 0.61,\n            'eps_diluted': 0.61,\n            'dividend': 0.0,\n            'assets': 8457000000,\n            'cur_assets': 3145000000,\n            'cur_liab': 2154000000,\n            'equity': 3277000000,\n            'cash': 476000000,\n            'cash_flow_op': 620000000,\n            'cash_flow_inv': 
-705000000,\n            'cash_flow_fin': 178000000\n        })\n\n    def test_cce_20101231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1491675/000119312511033197/cce-20101231.xml')\n        self.assert_item(item, {\n            'symbol': 'CCE',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2010,\n            'end_date': '2010-12-31',\n            'revenues': 6714000000,\n            'op_income': 810000000,\n            'net_income': 624000000,\n            'eps_basic': 1.84,\n            'eps_diluted': 1.83,\n            'dividend': 0.12,\n            'assets': 8596000000,\n            'cur_assets': 2230000000,\n            'cur_liab': 1942000000,\n            'equity': 3143000000,\n            'cash': 321000000,\n            'cash_flow_op': 825000000,\n            'cash_flow_inv': -739000000,\n            'cash_flow_fin': -144000000\n        })\n\n    def test_cci_20091231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1051470/000119312510031419/cci-20091231.xml')\n        self.assert_item(item, {\n            'symbol': 'CCI',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2009,\n            'end_date': '2009-12-31',\n            'revenues': 1685407000,\n            'op_income': 433991000,\n            'net_income': -135138000,\n            'eps_basic': -0.47,\n            'eps_diluted': -0.47,\n            'dividend': 0.0,\n            'assets': 10956606000,\n            'cur_assets': 1196033000,\n            'cur_liab': 754105000,\n            'equity': 2936085000,\n            'cash': 766146000,\n            'cash_flow_op': 571256000,\n            'cash_flow_inv': -172145000,\n            'cash_flow_fin': 214396000\n        })\n\n    def test_ccmm_20110630(self):\n        item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/1091667/000109166711000103/ccmm-20110630.xml')\n        self.assert_item(item, {\n            'symbol': 'CCMM',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2011,\n            'end_date': '2011-06-30',\n            'revenues': 1791000000,\n            'op_income': 270000000,\n            'net_income': -107000000,\n            'eps_basic': -0.98,\n            'eps_diluted': -0.98,\n            'dividend': 0.0,\n            'assets': None,\n            'cur_assets': None,  # Seems the source filing got the wrong context date on balance sheet\n            'cur_liab': None,\n            'equity': None,\n            'cash': 194000000,\n            'cash_flow_op': 907000000,\n            'cash_flow_inv': -694000000,\n            'cash_flow_fin': -51000000\n        })\n\n    def test_chtr_20111231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1091667/000109166712000026/chtr-20111231.xml')\n        self.assert_item(item, {\n            'symbol': 'CHTR',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2011-12-31',\n            'revenues': 7204000000,\n            'op_income': 1041000000,\n            'net_income': -369000000,\n            'eps_basic': -3.39,\n            'eps_diluted': -3.39,\n            'dividend': 0.0,\n            'assets': 15605000000,\n            'cur_assets': 370000000,\n            'cur_liab': 1153000000,\n            'equity': 409000000,\n            'cash': 2000000,\n            'cash_flow_op': 1737000000,\n            'cash_flow_inv': -1367000000,\n            'cash_flow_fin': -373000000\n        })\n\n    def test_ci_20130331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701221/000110465913036475/ci-20130331.xml')\n        self.assert_item(item, {\n         
   'symbol': 'CI',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2013-03-31',\n            'revenues': 8183000000,\n            'op_income': None,\n            'net_income': 57000000,\n            'eps_basic': 0.2,\n            'eps_diluted': 0.2,\n            'dividend': 0.04,\n            'assets': 54939000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 9660000000,\n            'cash': 3306000000,\n            'cash_flow_op': -805000000,\n            'cash_flow_inv': 962000000,\n            'cash_flow_fin': 185000000\n        })\n\n    def test_cit_20100630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1171825/000089109210003376/cit-20100331.xml')\n        self.assert_item(item, {\n            'symbol': 'CIT',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2010,\n            'end_date': '2010-06-30',\n            'revenues': 669500000,\n            'op_income': None,\n            'net_income': 142100000,\n            'eps_basic': 0.71,\n            'eps_diluted': 0.71,\n            'dividend': 0.0,\n            'assets': 54916800000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 8633900000,\n            'cash': 1060700000,\n            'cash_flow_op': 178100000,\n            'cash_flow_inv': 7122800000,\n            'cash_flow_fin': -6218700000\n        })\n\n    def test_csc_20120928(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/23082/000002308212000073/csc-20120928.xml')\n        self.assert_item(item, {\n            'symbol': 'CSC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2012-09-28',\n            'revenues': 3854000000,\n  
          'op_income': 298000000,\n            'net_income': 130000000,\n            'eps_basic': 0.84,\n            'eps_diluted': 0.83,\n            'dividend': 0.2,\n            'assets': 11649000000,\n            'cur_assets': 5468000000,\n            'cur_liab': 4015000000,\n            'equity': 2885000000,\n            'cash': 1850000000,\n            'cash_flow_op': 665000000,\n            'cash_flow_inv': -366000000,\n            'cash_flow_fin': 469000000\n        })\n\n    def test_disca_20090630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1437107/000095012309029613/disca-20090630.xml')\n        self.assert_item(item, {\n            'symbol': 'DISCA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2009,\n            'end_date': '2009-06-30',\n            'revenues': 881000000,\n            'op_income': 486000000,\n            'net_income': 183000000,\n            'eps_basic': 0.43,\n            'eps_diluted': 0.43,\n            'dividend': 0.0,\n            'assets': 10696000000,\n            'cur_assets': 1331000000,\n            'cur_liab': 1227000000,\n            'equity': 5918000000,\n            'cash': 339000000,\n            'cash_flow_op': 320000000,\n            'cash_flow_inv': 288000000,\n            'cash_flow_fin': -371000000\n        })\n\n    def test_disca_20090930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1437107/000095012309056946/disca-20090930.xml')\n        self.assert_item(item, {\n            'symbol': 'DISCA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2009,\n            'end_date': '2009-09-30',\n            'revenues': 854000000,\n            'op_income': 215000000,\n            'net_income': 95000000,\n            'eps_basic': 0.22,\n            'eps_diluted': 0.22,\n            'dividend': 0.0,\n            
'assets': 10741000000,\n            'cur_assets': 1417000000,\n            'cur_liab': 762000000,\n            'equity': 6042000000,\n            'cash': 401000000,\n            'cash_flow_op': 358000000,\n            'cash_flow_inv': 279000000,\n            'cash_flow_fin': -343000000\n        })\n\n    def test_dltr_20130504(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/935703/000093570313000029/dltr-20130504.xml')\n        self.assert_item(item, {\n            'symbol': 'DLTR',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2013-05-04',\n            'revenues': 1865800000,\n            'op_income': 216600000,\n            'net_income': 133500000,\n            'eps_basic': 0.6,\n            'eps_diluted': 0.59,\n            'dividend': 0.0,\n            'assets': 2811800000,\n            'cur_assets': 1489800000,\n            'cur_liab': 663000000,\n            'equity': 1739700000,\n            'cash': 383300000,\n            'cash_flow_op': 129300000,\n            'cash_flow_inv': -88200000,\n            'cash_flow_fin': -57400000\n        })\n\n    def test_dtv_20110331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1465112/000104746911004655/dtv-20110331.xml')\n        self.assert_item(item, {\n            'symbol': 'DTV',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2011,\n            'end_date': '2011-03-31',\n            'revenues': 6319000000,\n            'op_income': 1155000000,\n            'net_income': 674000000,\n            'eps_basic': 0.85,\n            'eps_diluted': 0.85,\n            'dividend': 0.0,\n            'assets': 20593000000,\n            'cur_assets': 6938000000,\n            'cur_liab': 4125000000,\n            'equity': -902000000,\n            'cash': 4295000000,\n            'cash_flow_op': 
1309000000,\n            'cash_flow_inv': -544000000,\n            'cash_flow_fin': 2028000000\n        })\n\n    def test_ebay_20100630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065088/000119312510164115/ebay-20100630.xml')\n        self.assert_item(item, {\n            'symbol': 'EBAY',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2010,\n            'end_date': '2010-06-30',\n            'revenues': 2215379000,\n            'op_income': 484565000,\n            'net_income': 412192000,\n            'eps_basic': 0.31,\n            'eps_diluted': 0.31,\n            'dividend': 0.0,\n            'assets': 18747584000,\n            'cur_assets': 8675313000,\n            'cur_liab': 3564261000,\n            'equity': 14169291000,\n            'cash': 4037442000,\n            'cash_flow_op': 1144641000,\n            'cash_flow_inv': -835635000,\n            'cash_flow_fin': 50363000\n        })\n\n    def test_ebay_20130331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065088/000106508813000058/ebay-20130331.xml')\n        self.assert_item(item, {\n            'symbol': 'EBAY',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2013-03-31',\n            'revenues': 3748000000,\n            'op_income': 800000000,\n            'net_income': 677000000,\n            'eps_basic': 0.52,\n            'eps_diluted': 0.51,\n            'dividend': 0.0,\n            'assets': 38000000000,\n            'cur_assets': 22336000000,\n            'cur_liab': 11720000000,\n            'equity': 21112000000,\n            'cash': 6530000000,\n            'cash_flow_op': 937000000,\n            'cash_flow_inv': -719000000,\n            'cash_flow_fin': -411000000\n        })\n\n    def test_ecl_20120930(self):\n        item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/31462/000110465912072308/ecl-20120930.xml')\n        self.assert_item(item, {\n            'symbol': 'ECL',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-30',\n            'revenues': 3023300000,\n            'op_income': 401200000,\n            'net_income': 238000000,\n            'eps_basic': 0.81,\n            'eps_diluted': 0.8,\n            'dividend': 0.2,\n            'assets': 16722800000,\n            'cur_assets': 4072900000,\n            'cur_liab': 2818700000,\n            'equity': 6026200000,\n            'cash': 324000000,\n            'cash_flow_op': 720800000,\n            'cash_flow_inv': -414900000,\n            'cash_flow_fin': -1815800000\n        })\n\n    def test_ed_20130930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/23632/000119312513425393/ed-20130930.xml')\n        self.assert_item(item, {\n            'symbol': 'ED',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2013,\n            'end_date': '2013-09-30',\n            'revenues': 3484000000,\n            'op_income': 855000000,\n            'net_income': 464000000,\n            'eps_basic': 1.58,\n            'eps_diluted': 1.58,\n            'dividend': 0.615,\n            'assets': 41964000000,\n            'cur_assets': 3704000000,\n            'cur_liab': 4373000000,\n            'equity': 12166000000,\n            'cash': 74000000,\n            'cash_flow_op': 1238000000,\n            'cash_flow_inv': -1895000000,\n            'cash_flow_fin': 337000000\n        })\n\n    def test_eqt_20101231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/33213/000110465911009751/eqt-20101231.xml')\n        self.assert_item(item, {\n            'symbol': 'EQT',\n            'amend': False,\n        
    'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2010,\n            'end_date': '2010-12-31',\n            'revenues': 1322708000,\n            'op_income': 470479000,\n            'net_income': 227700000,\n            'eps_basic': 1.58,\n            'eps_diluted': 1.57,\n            'dividend': 0.88,\n            'assets': 7098438000,\n            'cur_assets': 827940000,\n            'cur_liab': 596984000,\n            'equity': 3078696000,\n            'cash': 0.0,\n            'cash_flow_op': 789740000,\n            'cash_flow_inv': -1239429000,\n            'cash_flow_fin': 449689000\n        })\n\n    def test_etr_20121231(self):\n        # Large file test (121 MB)\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/7323/000006598413000050/etr-20121231.xml')\n        self.assert_item(item, {\n            'symbol': 'ETR',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2012,\n            'end_date': '2012-12-31',\n            'revenues': 10302079000,\n            'op_income': 1301181000,\n            'net_income': 846673000,\n            'eps_basic': 4.77,\n            'eps_diluted': 4.76,\n            'dividend': 3.32,\n            'assets': 43202502000,\n            'cur_assets': 3683126000,\n            'cur_liab': 4106321000,\n            'equity': 9291089000,\n            'cash': 532569000,\n            'cash_flow_op': 2940285000,\n            'cash_flow_inv': -3639797000,\n            'cash_flow_fin': 538151000\n        })\n\n    def test_exc_20100930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/22606/000119312510234590/exc-20100930.xml')\n        self.assert_item(item, {\n            'symbol': 'EXC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2010,\n            'end_date': '2010-09-30',\n            'revenues': 
5291000000,\n            'op_income': 1366000000,\n            'net_income': 845000000,\n            'eps_basic': 1.28,\n            'eps_diluted': 1.27,\n            'dividend': 0.53,\n            'assets': 50948000000,\n            'cur_assets': 6760000000,\n            'cur_liab': 3967000000,\n            'equity': 13955000000,\n            'cash': 2735000000,\n            'cash_flow_op': 4112000000,\n            'cash_flow_inv': -2037000000,\n            'cash_flow_fin': -1350000000\n        })\n\n    def test_fast_20090630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/815556/000119312509154691/fast-20090630.xml')\n        self.assert_item(item, {\n            'symbol': 'FAST',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2009,\n            'end_date': '2009-06-30',\n            'revenues': 474894000,\n            'op_income': 69938000,\n            'net_income': 43538000,\n            'eps_basic': 0.29,\n            'eps_diluted': 0.29,\n            'dividend': 0.0,\n            'assets': 1328684000,\n            'cur_assets': 988997000,\n            'cur_liab': 127950000,\n            'equity': 1186845000,\n            'cash': 173667000,\n            'cash_flow_op': 167552000,\n            'cash_flow_inv': -28942000,\n            'cash_flow_fin': -51986000\n        })\n\n    def test_fast_20090930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/815556/000119312509212481/fast-20090930.xml')\n        self.assert_item(item, {\n            'symbol': 'FAST',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2009,\n            'end_date': '2009-09-30',\n            'revenues': 489339000,\n            'op_income': 76410000,\n            'net_income': 47589000,\n            'eps_basic': 0.32,\n            'eps_diluted': 0.32,\n            'dividend': 0.0,\n        
    'assets': 1337764000,\n            'cur_assets': 998090000,\n            'cur_liab': 138744000,\n            'equity': 1185140000,\n            'cash': 193744000,\n            'cash_flow_op': 253184000,\n            'cash_flow_inv': -41031000,\n            'cash_flow_fin': -106943000\n        })\n\n    def test_fb_20120630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1326801/000119312512325997/fb-20120630.xml')\n        self.assert_item(item, {\n            'symbol': 'FB',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2012,\n            'end_date': '2012-06-30',\n            'revenues': 1184000000,\n            'op_income': -743000000,\n            'net_income': -157000000,\n            'eps_basic': -0.08,\n            'eps_diluted': -0.08,\n            'dividend': 0.0,\n            'assets': 14928000000,\n            'cur_assets': 11967000000,\n            'cur_liab': 1034000000,\n            'equity': 13309000000,\n            'cash': 2098000000,\n            'cash_flow_op': 683000000,\n            'cash_flow_inv': -7170000000,\n            'cash_flow_fin': 7090000000\n        })\n\n    def test_fb_20121231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1326801/000132680113000003/fb-20121231.xml')\n        self.assert_item(item, {\n            'symbol': 'FB',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2012,\n            'end_date': '2012-12-31',\n            'revenues': 5089000000,\n            'op_income': 538000000,\n            'net_income': 32000000,\n            'eps_basic': 0.02,\n            'eps_diluted': 0.01,\n            'dividend': 0.0,\n            'assets': 15103000000,\n            'cur_assets': 11267000000,\n            'cur_liab': 1052000000,\n            'equity': 11755000000,\n            'cash': 2384000000,\n            
'cash_flow_op': 1612000000,\n            'cash_flow_inv': -7024000000,\n            'cash_flow_fin': 6283000000\n        })\n\n    def test_fll_20121231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/891482/000118811213000562/fll-20121231.xml')\n        self.assert_item(item, {\n            'symbol': 'FLL',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2012,\n            'end_date': '2012-12-31',\n            'revenues': 128760000,\n            'op_income': 49638000,\n            'net_income': 27834000,\n            'eps_basic': 1.49,\n            'eps_diluted': None,\n            'dividend': 0.0,\n            'assets': 162725000,\n            'cur_assets': 32339000,\n            'cur_liab': 15332000,\n            'equity': 81133000,\n            'cash': 20603000,\n            'cash_flow_op': -4301000,\n            'cash_flow_inv': 45271000,\n            'cash_flow_fin': -35074000\n        })\n\n    def test_flr_20080930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1124198/000110465908068715/flr-20080930.xml')\n        self.assert_item(item, {\n            'symbol': 'FLR',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2008,\n            'end_date': '2008-09-30',\n            'revenues': 5673818000,\n            'op_income': None,\n            'net_income': 183099000,\n            'eps_basic': 1.03,\n            'eps_diluted': 1.01,\n            'dividend': 0.125,\n            'assets': 6605120000,\n            'cur_assets': 4808393000,\n            'cur_liab': 3228638000,\n            'equity': 2741002000,\n            'cash': 1514943000,\n            'cash_flow_op': 855198000,\n            'cash_flow_inv': -295445000,\n            'cash_flow_fin': -202011000\n        })\n\n    def test_fmc_20090630(self):\n        item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/37785/000119312509165435/fmc-20090630.xml')\n        self.assert_item(item, {\n            'symbol': 'FMC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2009,\n            'end_date': '2009-06-30',\n            'revenues': 700300000,\n            'op_income': 97200000,\n            'net_income': 69300000,\n            'eps_basic': 0.95,\n            'eps_diluted': 0.94,\n            'dividend': 0.0,\n            'assets': 3028500000,\n            'cur_assets': 1423700000,\n            'cur_liab': 717200000,\n            'equity': 1101200000,\n            'cash': 67000000,\n            'cash_flow_op': 173900000,\n            'cash_flow_inv': -106500000,\n            'cash_flow_fin': -33100000\n        })\n\n    def test_fpl_20100331(self):\n        # FPL was later changed to NEE\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/37634/000075330810000051/fpl-20100331.xml')\n        self.assert_item(item, {\n            'symbol': 'FPL',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2010-03-31',\n            'revenues': 3622000000,\n            'op_income': 939000000,\n            'net_income': 556000000,\n            'eps_basic': 1.36,\n            'eps_diluted': 1.36,\n            'dividend': 0.5,\n            'assets': 50942000000,\n            'cur_assets': 5557000000,\n            'cur_liab': 7782000000,\n            'equity': 13336000000,\n            'cash': 1215000000,\n            'cash_flow_op': 896000000,\n            'cash_flow_inv': -1361000000,\n            'cash_flow_fin': 1442000000\n        })\n\n    def test_ftr_20110930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/20520/000002052011000066/ftr-20110930.xml')\n        self.assert_item(item, {\n            'symbol': 'FTR',\n 
           'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2011,\n            'end_date': '2011-09-30',\n            'revenues': 1290939000,\n            'op_income': 180291000,\n            'net_income': 19481000,\n            'eps_basic': 0.02,\n            'eps_diluted': 0.02,\n            'dividend': 0.0,\n            'assets': 17493767000,\n            'cur_assets': 969746000,\n            'cur_liab': 1168142000,\n            'equity': 4776588000,\n            'cash': 205817000,\n            'cash_flow_op': 1272654000,\n            'cash_flow_inv': -676974000,\n            'cash_flow_fin': -641126000\n        })\n\n    def test_ge_20121231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/40545/000004054513000036/ge-20121231.xml')\n        self.assert_item(item, {\n            'symbol': 'GE',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2012,\n            'end_date': '2012-12-31',\n            'revenues': 147359000000,\n            'op_income': 22887000000,\n            'net_income': 13641000000,\n            'eps_basic': 1.29,\n            'eps_diluted': 1.29,\n            'dividend': 0.7,\n            'assets': 685328000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 128470000000,\n            'cash': 77356000000,\n            'cash_flow_op': 31331000000,\n            'cash_flow_inv': 11302000000,\n            'cash_flow_fin': -51074000000\n        })\n\n    def test_gis_20121125(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/40704/000119312512508388/gis-20121125.xml')\n        self.assert_item(item, {\n            'symbol': 'GIS',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2012-11-25',\n            'revenues': 
4881800000,\n            'op_income': 829000000,\n            'net_income': 541600000,\n            'eps_basic': 0.84,\n            'eps_diluted': 0.82,\n            'dividend': 0.33,\n            'assets': 22952900000,\n            'cur_assets': 4565500000,\n            'cur_liab': 5736400000,\n            'equity': 7440000000,\n            'cash': 734900000,\n            'cash_flow_op': 1317100000,\n            'cash_flow_inv': -1103200000,\n            'cash_flow_fin': 33700000\n        })\n\n    def test_gmcr_20110625(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/909954/000119312511214253/gmcr-20110630.xml')\n        self.assert_item(item, {\n            'symbol': 'GMCR',\n            'amend': False,  # it's actually amended, but not marked in XML\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2011,\n            'end_date': '2011-06-25',\n            'revenues': 717210000,\n            'op_income': 119310000,\n            'net_income': 56348000,\n            'eps_basic': 0.38,\n            'eps_diluted': 0.37,\n            'dividend': 0.0,\n            'assets': 2874422000,\n            'cur_assets': 844998000,\n            'cur_liab': 395706000,\n            'equity': 1816646000,\n            'cash': 76138000,\n            'cash_flow_op': 174708000,\n            'cash_flow_inv': -1082070000,\n            'cash_flow_fin': 986183000\n        })\n\n    def test_goog_20090930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312509222384/goog-20090930.xml')\n        self.assert_item(item, {\n            'symbol': 'GOOG',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2009,\n            'end_date': '2009-09-30',\n            'revenues': 5944851000,\n            'op_income': 2073718000,\n            'net_income': 1638975000,\n            'eps_basic': 5.18,\n            
'eps_diluted': 5.13,\n            'dividend': 0.0,\n            'assets': 37702845000,\n            'cur_assets': 26353544000,\n            'cur_liab': 2321774000,\n            'equity': 33721753000,\n            'cash': 12087115000,\n            'cash_flow_op': 6584667000,\n            'cash_flow_inv': -3245963000,\n            'cash_flow_fin': 74851000\n        })\n\n    def test_goog_20120930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312512440217/goog-20120930.xml')\n        self.assert_item(item, {\n            'symbol': 'GOOG',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-30',\n            'revenues': 14101000000,\n            'op_income': 2736000000,\n            'net_income': 2176000000,\n            'eps_basic': 6.64,\n            'eps_diluted': 6.53,\n            'dividend': 0.0,\n            'assets': 89730000000,\n            'cur_assets': 56821000000,\n            'cur_liab': 14434000000,\n            'equity': 68028000000,\n            'cash': 16260000000,\n            'cash_flow_op': 11950000000,\n            'cash_flow_inv': -7542000000,\n            'cash_flow_fin': 1921000000\n        })\n\n    def test_goog_20121231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000119312513028362/goog-20121231.xml')\n        self.assert_item(item, {\n            'symbol': 'GOOG',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2012,\n            'end_date': '2012-12-31',\n            'revenues': 50175000000,\n            'op_income': 12760000000,\n            'net_income': 10737000000,\n            'eps_basic': 32.81,\n            'eps_diluted': 32.31,\n            'dividend': 0.0,\n            'assets': 93798000000,\n            'cur_assets': 60454000000,\n            'cur_liab': 
14337000000,\n            'equity': 71715000000,\n            'cash': 14778000000,\n            'cash_flow_op': 16619000000,\n            'cash_flow_inv': -13056000000,\n            'cash_flow_fin': 1229000000\n        })\n\n    def test_goog_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000128877613000055/goog-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'GOOG',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-30',\n            'revenues': 14105000000,\n            'op_income': 3123000000,\n            'net_income': 3228000000,\n            'eps_basic': 9.71,\n            'eps_diluted': 9.54,\n            'dividend': 0.0,\n            'assets': 101182000000,\n            'cur_assets': 66861000000,\n            'cur_liab': 15329000000,\n            'equity': 78852000000,\n            'cash': 16164000000,\n            'cash_flow_op': 8338000000,\n            'cash_flow_inv': -6244000000,\n            'cash_flow_fin': -622000000\n        })\n\n    def test_goog_20140630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1288776/000128877614000065/goog-20140630.xml')\n        self.assert_item(item, {\n            'symbol': 'GOOG/GOOGL',  # Two symbols, see issue #6\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2014,\n            'end_date': '2014-06-30',\n            'revenues': 15955000000,\n            'op_income': 4258000000,\n            'net_income': 3422000000,\n            'eps_basic': 5.07,\n            'eps_diluted': 4.99,\n            'dividend': 0.0,\n            'assets': 121608000000,\n            'cur_assets': 77905000000,\n            'cur_liab': 17097000000,\n            'equity': 95749000000,\n            'cash': 19620000000,\n            'cash_flow_op': 10018000000,\n 
           'cash_flow_inv': -8487000000,\n            'cash_flow_fin': -640000000\n        })\n\n    def test_gs_20090626(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/886982/000095012309029919/gs-20090626.xml')\n        self.assert_item(item, {\n            'symbol': 'GS',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2009,\n            'end_date': '2009-06-26',\n            'revenues': 13761000000,\n            'op_income': None,\n            'net_income': 2718000000,\n            'eps_basic': 5.27,\n            'eps_diluted': 4.93,\n            'dividend': 0.35,\n            'assets': 889544000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 62813000000,\n            'cash': 22177000000,\n            'cash_flow_op': 16020000000,\n            'cash_flow_inv': -772000000,\n            'cash_flow_fin': -6876000000\n        })\n\n    def test_hon_20120331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/773840/000093041312002323/hon-20120331.xml')\n        self.assert_item(item, {\n            'symbol': 'HON',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2012,\n            'end_date': '2012-03-31',\n            'revenues': 9307000000,\n            'op_income': None,\n            'net_income': 823000000,\n            'eps_basic': 1.06,\n            'eps_diluted': 1.04,\n            'dividend': 0.3725,\n            'assets': 40370000000,\n            'cur_assets': 16553000000,\n            'cur_liab': 12666000000,\n            'equity': 11842000000,\n            'cash': 3988000000,\n            'cash_flow_op': 196000000,\n            'cash_flow_inv': -122000000,\n            'cash_flow_fin': 169000000\n        })\n\n    def test_hrb_20090731(self):\n        item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000095012309041361/hrb-20090731.xml')\n        self.assert_item(item, {\n            'symbol': 'HRB',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2009-07-31',\n            'revenues': 275505000,\n            'op_income': -214162000,\n            'net_income': -133634000,\n            'eps_basic': -0.4,\n            'eps_diluted': -0.4,\n            'dividend': 0.15,\n            'assets': 4545762000,\n            'cur_assets': 1828146000,\n            'cur_liab': 1823126000,\n            'equity': 1190714000,\n            'cash': 1006303000,\n            'cash_flow_op': -454577000,\n            'cash_flow_inv': 15360000,\n            'cash_flow_fin': -216206000\n        })\n\n    def test_hrb_20091031(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000095012309069608/hrb-20091031.xml')\n        self.assert_item(item, {\n            'symbol': 'HRB',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2010,\n            'end_date': '2009-10-31',\n            'revenues': 326081000,\n            'op_income': -214553000,\n            'net_income': -128587000,\n            'eps_basic': -0.38,\n            'eps_diluted': -0.38,\n            'dividend': 0.15,\n            'assets': 4967359000,\n            'cur_assets': 2300986000,\n            'cur_liab': 2382867000,\n            'equity': 1071097000,\n            'cash': 1432243000,\n            'cash_flow_op': -786152000,\n            'cash_flow_inv': 43280000,\n            'cash_flow_fin': 511231000\n        })\n\n    def test_hrb_20130731(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12659/000157484213000013/hrb-20130731.xml')\n        self.assert_item(item, {\n            'symbol': 'HRB',\n            'amend': False,\n     
       'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2014,\n            'end_date': '2013-07-31',\n            'revenues': 127195000,\n            'op_income': -179555000,\n            'net_income': -115187000,\n            'eps_basic': -0.42,\n            'eps_diluted': -0.42,\n            'dividend': 0.20,\n            'assets': 3762888000,\n            'cur_assets': 1704932000,\n            'cur_liab': 1450484000,\n            'equity': 1105315000,\n            'cash': 1163876000,\n            'cash_flow_op': -318742000,\n            'cash_flow_inv': -29090000,\n            'cash_flow_fin': -229255000\n        })\n\n    def test_ihc_20120331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701869/000070186912000029/ihc-20120331.xml')\n        self.assert_item(item, {\n            'symbol': 'IHC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2012,\n            'end_date': '2012-03-31',\n            'revenues': 102156000,\n            'op_income': 6416000,\n            'net_income': 3922000,\n            'eps_basic': 0.22,\n            'eps_diluted': 0.22,\n            'dividend': 0.0,\n            'assets': 1364411000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 280250000,\n            'cash': 9286000,\n            'cash_flow_op': -138843000,\n            'cash_flow_inv': 130710000,\n            'cash_flow_fin': -808000\n        })\n\n    def test_intc_20111231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/50863/000119312512075534/intc-20111231.xml')\n        self.assert_item(item, {\n            'symbol': 'INTC',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2011-12-31',\n            'revenues': 53999000000,\n            'op_income': 
17477000000,\n            'net_income': 12942000000,\n            'eps_basic': 2.46,\n            'eps_diluted': 2.39,\n            'dividend': 0.7824,\n            'assets': 71119000000,\n            'cur_assets': 25872000000,\n            'cur_liab': 12028000000,\n            'equity': 45911000000,\n            'cash': 5065000000,\n            'cash_flow_op': 20963000000,\n            'cash_flow_inv': -10301000000,\n            'cash_flow_fin': -11100000000\n        })\n\n    def test_intu_20101031(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/896878/000095012310111135/intu-20101031.xml')\n        self.assert_item(item, {\n            'symbol': 'INTU',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2011,\n            'end_date': '2010-10-31',\n            'revenues': 532000000,\n            'op_income': -104000000,\n            'net_income': -70000000,\n            'eps_basic': -0.22,\n            'eps_diluted': -0.22,\n            'dividend': 0.0,\n            'assets': 4943000000,\n            'cur_assets': 2010000000,\n            'cur_liab': 1136000000,\n            'equity': 2615000000,\n            'cash': 112000000,\n            'cash_flow_op': -211000000,\n            'cash_flow_inv': 285000000,\n            'cash_flow_fin': -177000000\n        })\n\n    def test_jnj_20120101(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000119312512075565/jnj-20120101.xml')\n        self.assert_item(item, {\n            'symbol': 'JNJ',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2012-01-01',\n            'revenues': 65030000000,\n            'op_income': 13765000000,\n            'net_income': 9672000000,\n            'eps_basic': 3.54,\n            'eps_diluted': 3.49,\n            'dividend': 2.25,\n            'assets': 
113644000000,\n            'cur_assets': 54316000000,\n            'cur_liab': 22811000000,\n            'equity': 57080000000,\n            'cash': 24542000000,\n            'cash_flow_op': 14298000000,\n            'cash_flow_inv': -4612000000,\n            'cash_flow_fin': -4452000000\n        })\n\n    def test_jnj_20120930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000020040612000140/jnj-20120930.xml')\n        self.assert_item(item, {\n            'symbol': 'JNJ',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-30',\n            'revenues': 17052000000,\n            'op_income': 3825000000,\n            'net_income': 2968000000,\n            'eps_basic': 1.08,\n            'eps_diluted': 1.05,\n            'dividend': 0.61,\n            'assets': 118951000000,\n            'cur_assets': 44791000000,\n            'cur_liab': 23935000000,\n            'equity': 63761000000,\n            'cash': 15486000000,\n            'cash_flow_op': 12020000000,\n            'cash_flow_inv': -2007000000,\n            'cash_flow_fin': -19091000000\n        })\n\n    def test_jnj_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/200406/000020040613000091/jnj-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'JNJ',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-30',\n            'revenues': 17877000000,\n            'op_income': 5020000000,\n            'net_income': 3833000000,\n            'eps_basic': 1.36,\n            'eps_diluted': 1.33,\n            'dividend': 0.66,\n            'assets': 124325000000,\n            'cur_assets': 51273000000,\n            'cur_liab': 23767000000,\n            'equity': 69665000000,\n            'cash': 17307000000,\n  
          'cash_flow_op': 7328000000,\n            'cash_flow_inv': -1972000000,\n            'cash_flow_fin': -2754000000\n        })\n\n    def test_jpm_20090630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000095012309032832/jpm-20090630.xml')\n        self.assert_item(item, {\n            'symbol': 'JPM',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2009,\n            'end_date': '2009-06-30',\n            'revenues': 25623000000,\n            'op_income': None,\n            'net_income': 1072000000,\n            'eps_basic': 0.28,\n            'eps_diluted': 0.28,\n            'dividend': 0.05,\n            'assets': 2026642000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 154766000000,\n            'cash': 25133000000,\n            'cash_flow_op': 103259000000,\n            'cash_flow_inv': 34430000000,\n            'cash_flow_fin': -139413000000\n        })\n\n    def test_jpm_20111231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000001961712000163/jpm-20111231.xml')\n        self.assert_item(item, {\n            'symbol': 'JPM',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2011-12-31',\n            'revenues': 97234000000,\n            'op_income': None,\n            'net_income': 17568000000,\n            'eps_basic': 4.50,\n            'eps_diluted': 4.48,\n            'dividend': 1.0,\n            'assets': 2265792000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 183573000000,\n            'cash': 59602000000,\n            'cash_flow_op': 95932000000,\n            'cash_flow_inv': -170752000000,\n            'cash_flow_fin': 107706000000\n        })\n\n    def test_jpm_20130331(self):\n        item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/19617/000001961713000300/jpm-20130331.xml')\n        self.assert_item(item, {\n            'symbol': 'JPM',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2013-03-31',\n            'revenues': 25122000000,\n            'op_income': None,\n            'net_income': 6131000000,\n            'eps_basic': 1.61,\n            'eps_diluted': 1.59,\n            'dividend': 0.30,\n            'assets': 2389349000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 207086000000,\n            'cash': 45524000000,\n            'cash_flow_op': 19964000000,\n            'cash_flow_inv': -55455000000,\n            'cash_flow_fin': 28180000000\n        })\n\n    def test_ko_20100402(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000104746910004416/ko-20100402.xml')\n        self.assert_item(item, {\n            'symbol': 'KO',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2010-04-02',\n            'revenues': 7525000000,\n            'op_income': 2183000000,\n            'net_income': 1614000000,\n            'eps_basic': 0.70,\n            'eps_diluted': 0.69,\n            'dividend': 0.44,\n            'assets': 47403000000,\n            'cur_assets': 17208000000,\n            'cur_liab': 13583000000,\n            'equity': 25157000000,\n            'cash': 5684000000,\n            'cash_flow_op': 1326000000,\n            'cash_flow_inv': -1368000000,\n            'cash_flow_fin': -1043000000\n        })\n\n    def test_ko_20101231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000104746911001506/ko-20101231.xml')\n        self.assert_item(item, {\n            'symbol': 'KO',\n            'amend': False,\n       
     'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2010,\n            'end_date': '2010-12-31',\n            'revenues': 35119000000,\n            'op_income': 8449000000,\n            'net_income': 11809000000,\n            'eps_basic': 5.12,\n            'eps_diluted': 5.06,\n            'dividend': 1.76,\n            'assets': 72921000000,\n            'cur_assets': 21579000000,\n            'cur_liab': 18508000000,\n            'equity': 31317000000,\n            'cash': 8517000000,\n            'cash_flow_op': 9532000000,\n            'cash_flow_inv': -4405000000,\n            'cash_flow_fin': -3465000000\n        })\n\n    def test_ko_20120928(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/21344/000002134412000051/ko-20120928.xml')\n        self.assert_item(item, {\n            'symbol': 'KO',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-28',\n            'revenues': 12340000000,\n            'op_income': 2793000000,\n            'net_income': 2311000000,\n            'eps_basic': 0.51,\n            'eps_diluted': 0.50,\n            'dividend': 0.255,\n            'assets': 86654000000,\n            'cur_assets': 29712000000,\n            'cur_liab': 27008000000,\n            'equity': 33590000000,\n            'cash': 9615000000,\n            'cash_flow_op': 7840000000,\n            'cash_flow_inv': -10399000000,\n            'cash_flow_fin': -399000000\n        })\n\n    def test_krft_20120930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1545158/000119312512495570/krft-20120930.xml')\n        self.assert_item(item, {\n            'symbol': 'KRFT',\n            'amend': True,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-30',\n            'revenues': 
4606000000,\n            'op_income': 762000000,\n            'net_income': 470000000,\n            'eps_basic': 0.79,\n            'eps_diluted': 0.79,\n            'dividend': 0.0,\n            'assets': 22284000000,\n            'cur_assets': 3905000000,\n            'cur_liab': 2569000000,\n            'equity': 7458000000,\n            'cash': 244000000,\n            'cash_flow_op': 2067000000,\n            'cash_flow_inv': -279000000,\n            'cash_flow_fin': -1548000000\n        })\n\n    def test_l_20100331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/60086/000119312510105707/l-20100331.xml')\n        self.assert_item(item, {\n            'symbol': 'L',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2010-03-31',\n            'revenues': 3713000000,\n            'op_income': None,\n            'net_income': 420000000,\n            'eps_basic': 0.99,\n            'eps_diluted': 0.99,\n            'dividend': 0.0625,\n            'assets': 75855000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 21993000000,\n            'cash': 135000000,\n            'cash_flow_op': 294000000,\n            'cash_flow_inv': -411000000,\n            'cash_flow_fin': 64000000\n        })\n\n    def test_l_20100930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/60086/000119312510245478/l-20100930.xml')\n        self.assert_item(item, {\n            'symbol': 'L',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2010,\n            'end_date': '2010-09-30',\n            'revenues': 3701000000,\n            'op_income': None,\n            'net_income': 36000000,\n            'eps_basic': 0.09,\n            'eps_diluted': 0.09,\n            'dividend': 0.0625,\n            'assets': 76821000000,\n    
        'cur_assets': None,\n            'cur_liab': None,\n            'equity': 23499000000,\n            'cash': 132000000,\n            'cash_flow_op': 895000000,\n            'cash_flow_inv': -426000000,\n            'cash_flow_fin': -527000000\n        })\n\n    def test_lbtya_20100331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1316631/000119312510111069/lbtya-20100331.xml')\n        self.assert_item(item, {\n            'symbol': 'LBTYA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2010-03-31',\n            'revenues': 2178900000,\n            'op_income': 303600000,\n            'net_income': 736600000,\n            'eps_basic': 2.75,\n            'eps_diluted': 2.75,\n            'dividend': 0.0,\n            'assets': 33083500000,\n            'cur_assets': 5524900000,\n            'cur_liab': 4107000000,\n            'equity': 4066000000,\n            'cash': 4184200000,\n            'cash_flow_op': 803300000,\n            'cash_flow_inv': 45400000,\n            'cash_flow_fin': 170700000\n        })\n\n    def test_lcapa_20110930(self):\n        # This symbol was changed to STRZA\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793411000006/lcapa-20110930.xml')\n        self.assert_item(item, {\n            'symbol': 'LCAPA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2011,\n            'end_date': '2011-09-30',\n            'revenues': 540000000,\n            'op_income': 111000000,\n            'net_income': -42000000,\n            'eps_basic': -0.07,\n            'eps_diluted': -0.12,\n            'dividend': 0.0,\n            'assets': 8915000000,\n            'cur_assets': 3767000000,\n            'cur_liab': 3012000000,\n            'equity': 5078000000,\n            'cash': 1937000000,\n         
   'cash_flow_op': 316000000,\n            'cash_flow_inv': -205000000,\n            'cash_flow_fin': -264000000\n        })\n\n    def test_linta_20120331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1355096/000135509612000008/linta-20120331.xml')\n        self.assert_item(item, {\n            'symbol': 'LINTA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2012,\n            'end_date': '2012-03-31',\n            'revenues': 2314000000,\n            'op_income': 258000000,\n            'net_income': 91000000,\n            'eps_basic': 0.16,\n            'eps_diluted': 0.16,\n            'dividend': 0.0,\n            'assets': 17144000000,\n            'cur_assets': 2764000000,\n            'cur_liab': 3486000000,\n            'equity': 6505000000,\n            'cash': 794000000,\n            'cash_flow_op': 330000000,\n            'cash_flow_inv': -91000000,\n            'cash_flow_fin': -284000000\n        })\n\n    def test_lll_20100625(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1039101/000095012310071159/lll-20100625.xml')\n        self.assert_item(item, {\n            'symbol': 'LLL',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2010,\n            'end_date': '2010-06-25',\n            'revenues': -3966000000,  # error in the filing; should be 3966M\n            'op_income': -442000000,  # error in the filing; should be 442M\n            'net_income': -228000000,  # error in the filing; should be 227M\n            'eps_basic': 1.97,\n            'eps_diluted': 1.95,\n            'dividend': 0.4,\n            'assets': 15689000000,\n            'cur_assets': 5494000000,\n            'cur_liab': 3730000000,\n            'equity': 6926000000,\n            'cash': 1023000000,\n            'cash_flow_op': 589000000,\n            'cash_flow_inv': -688000000,\n           
 'cash_flow_fin': 132000000\n        })\n\n    def test_lltc_20110102(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/791907/000079190711000016/lltc-20110102.xml')\n        self.assert_item(item, {\n            'symbol': 'LLTC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2011,\n            'end_date': '2011-01-02',\n            'revenues': 383621000,\n            'op_income': 201059000,\n            'net_income': 143743000,\n            'eps_basic': 0.62,\n            'eps_diluted': 0.62,\n            'dividend': 0.23,\n            'assets': 1446186000,\n            'cur_assets': 1069958000,\n            'cur_liab': 199210000,\n            'equity': 278793000,\n            'cash': 203308000,\n            'cash_flow_op': 342333000,\n            'cash_flow_inv': 39771000,\n            'cash_flow_fin': -474650000\n        })\n\n    def test_lltc_20111002(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/791907/000079190711000080/lltc-20111007.xml')\n        self.assert_item(item, {\n            'symbol': 'LLTC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2012,\n            'end_date': '2011-10-02',\n            'revenues': 329920000,\n            'op_income': 157566000,\n            'net_income': 108401000,\n            'eps_basic': 0.47,\n            'eps_diluted': 0.47,\n            'dividend': 0.24,\n            'assets': 1659341000,\n            'cur_assets': 1268413000,\n            'cur_liab': 169006000,\n            'equity': 543199000,\n            'cash': 163414000,\n            'cash_flow_op': 149860000,\n            'cash_flow_inv': -171884000,\n            'cash_flow_fin': -85085000\n        })\n\n    def test_lly_20100930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/59478/000095012310097867/lly-20100930.xml')\n        
self.assert_item(item, {\n            'symbol': 'LLY',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2010,\n            'end_date': '2010-09-30',\n            'revenues': 5654800000,\n            'op_income': None,\n            'net_income': 1302900000,\n            'eps_basic': 1.18,\n            'eps_diluted': 1.18,\n            'dividend': 0.49,\n            'assets': 29904300000,\n            'cur_assets': 14184300000,\n            'cur_liab': 6097400000,\n            'equity': 12405500000,\n            'cash': 5908800000,\n            'cash_flow_op': 4628700000,\n            'cash_flow_inv': -1595300000,\n            'cash_flow_fin': -1472300000\n        })\n\n    def test_lmca_20120331(self):\n        # This symbol was changed to STRZA\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793412000012/lmca-20120331.xml')\n        self.assert_item(item, {\n            'symbol': 'LMCA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2012,\n            'end_date': '2012-03-31',\n            'revenues': 440000000,\n            'op_income': 89000000,\n            'net_income': 137000000,\n            'eps_basic': 1.13,\n            'eps_diluted': 1.10,\n            'dividend': 0.0,\n            'assets': 7122000000,\n            'cur_assets': 3380000000,\n            'cur_liab': 547000000,\n            'equity': 5321000000,\n            'cash': 1915000000,\n            'cash_flow_op': 94000000,\n            'cash_flow_inv': 581000000,\n            'cash_flow_fin': -830000000\n        })\n\n    def test_lnc_20120930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/59558/000005955812000143/lnc-20120930.xml')\n        self.assert_item(item, {\n            'symbol': 'LNC',\n            'amend': False,  # mistake in doc, should be True\n            'doc_type': 
'10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-30',\n            'revenues': None,  # missing in doc, should be 2954000000\n            'op_income': None,\n            'net_income': 402000000,\n            'eps_basic': 1.45,\n            'eps_diluted': 1.41,\n            'dividend': 0.0,\n            'assets': 215458000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 15237000000,\n            'cash': 4373000000,\n            'cash_flow_op': 666000000,\n            'cash_flow_inv': -2067000000,\n            'cash_flow_fin': 1264000000\n        })\n\n    def test_ltd_20111029(self):\n        # This symbol was changed to LB\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701985/000144530511003514/ltd-20111029.xml')\n        self.assert_item(item, {\n            'symbol': 'LTD',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2011,\n            'end_date': '2011-10-29',\n            'revenues': 2174000000,\n            'op_income': 186000000,\n            'net_income': 94000000,\n            'eps_basic': 0.32,\n            'eps_diluted': 0.31,\n            'dividend': 0.2,\n            'assets': 6517000000,\n            'cur_assets': 2616000000,\n            'cur_liab': 1504000000,\n            'equity': 521000000,\n            'cash': 498000000,\n            'cash_flow_op': 94000000,\n            'cash_flow_inv': -239000000,\n            'cash_flow_fin': -489000000\n        })\n\n    def test_ltd_20130803(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/701985/000070198513000032/ltd-20130803.xml')\n        self.assert_item(item, {\n            'symbol': 'LTD',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-08-03',\n            
'revenues': 2516000000,\n            'op_income': 358000000,\n            'net_income': 178000000,\n            'eps_basic': 0.62,\n            'eps_diluted': 0.61,\n            'dividend': 0.3,\n            'assets': 6072000000,\n            'cur_assets': 2098000000,\n            'cur_liab': 1485000000,\n            'equity': -861000000,\n            'cash': 551000000,\n            'cash_flow_op': 354000000,\n            'cash_flow_inv': -381000000,\n            'cash_flow_fin': -194000000\n        })\n\n    def test_luv_20110630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/92380/000009238011000070/luv-20110630.xml')\n        self.assert_item(item, {\n            'symbol': 'LUV',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2011,\n            'end_date': '2011-06-30',\n            'revenues': 4136000000,\n            'op_income': 207000000,\n            'net_income': 161000000,\n            'eps_basic': 0.21,\n            'eps_diluted': 0.21,\n            'dividend': 0.0045,\n            'assets': 18945000000,\n            'cur_assets': 5421000000,\n            'cur_liab': 5318000000,\n            'equity': 7202000000,\n            'cash': 1595000000,\n            'cash_flow_op': 237000000,\n            'cash_flow_inv': -589000000,\n            'cash_flow_fin': -92000000\n        })\n\n    def test_mchp_20120630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/827054/000082705412000230/mchp-20120630.xml')\n        self.assert_item(item, {\n            'symbol': 'MCHP',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2012-06-30',\n            'revenues': 352134000,\n            'op_income': 96333000,\n            'net_income': 78710000,\n            'eps_basic': 0.41,\n            'eps_diluted': 0.39,\n            'dividend': 
0.35,\n            'assets': 3144840000,\n            'cur_assets': 2229298000,\n            'cur_liab': 249989000,\n            'equity': 2017990000,\n            'cash': 779848000,\n            'cash_flow_op': 128971000,\n            'cash_flow_inv': 77890000,\n            'cash_flow_fin': -62768000\n        })\n\n    def test_mdlz_20130930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1103982/000119312513431957/mdlz-20130930.xml')\n        self.assert_item(item, {\n            'symbol': 'MDLZ',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2013,\n            'end_date': '2013-09-30',\n            'revenues': 8472000000,\n            'op_income': 1262000000,\n            'net_income': 1024000000,\n            'eps_basic': 0.58,\n            'eps_diluted': 0.57,\n            'dividend': 0.14,\n            'assets': 74859000000,\n            'cur_assets': 15463000000,\n            'cur_liab': 15269000000,\n            'equity': 32492000000,\n            'cash': 3692000000,\n            'cash_flow_op': 1198000000,\n            'cash_flow_inv': -1015000000,\n            'cash_flow_fin': -881000000\n        })\n\n    def test_mmm_20091231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465910007295/mmm-20091231.xml')\n        self.assert_item(item, {\n            'symbol': 'MMM',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2009,\n            'end_date': '2009-12-31',\n            'revenues': 23123000000,\n            'op_income': 4814000000,\n            'net_income': 3193000000,\n            'eps_basic': 4.56,\n            'eps_diluted': 4.52,\n            'dividend': 2.04,\n            'assets': 27250000000,\n            'cur_assets': 10795000000,\n            'cur_liab': 4897000000,\n            'equity': 13302000000,\n            'cash': 
3040000000,\n            'cash_flow_op': 4941000000,\n            'cash_flow_inv': -1732000000,\n            'cash_flow_fin': -2014000000\n        })\n\n    def test_mmm_20120331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465912032441/mmm-20120331.xml')\n        self.assert_item(item, {\n            'symbol': 'MMM',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2012,\n            'end_date': '2012-03-31',\n            'revenues': 7486000000,\n            'op_income': 1634000000,\n            'net_income': 1125000000,\n            'eps_basic': 1.61,\n            'eps_diluted': 1.59,\n            'dividend': 0.59,\n            'assets': 32015000000,\n            'cur_assets': 12853000000,\n            'cur_liab': 5408000000,\n            'equity': 16619000000,\n            'cash': 2332000000,\n            'cash_flow_op': 828000000,\n            'cash_flow_inv': -43000000,\n            'cash_flow_fin': -722000000\n        })\n\n    def test_mmm_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/66740/000110465913058961/mmm-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'MMM',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-30',\n            'revenues': 7752000000,\n            'op_income': 1702000000,\n            'net_income': 1197000000,\n            'eps_basic': 1.74,\n            'eps_diluted': 1.71,\n            'dividend': 0.635,\n            'assets': 34130000000,\n            'cur_assets': 13983000000,\n            'cur_liab': 6335000000,\n            'equity': 18319000000,\n            'cash': 2942000000,\n            'cash_flow_op': 2673000000,\n            'cash_flow_inv': -740000000,\n            'cash_flow_fin': -1727000000\n        })\n\n    def 
test_mnst_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/865752/000110465913062263/mnst-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'MNST',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-30',\n            'revenues': 630934000,\n            'op_income': 179427000,\n            'net_income': 106873000,\n            'eps_basic': 0.64,\n            'eps_diluted': 0.62,\n            'dividend': 0.0,\n            'assets': 1317842000,\n            'cur_assets': 1093822000,\n            'cur_liab': 346174000,\n            'equity': 856021000,\n            'cash': 283839000,\n            'cash_flow_op': 99720000,\n            'cash_flow_inv': -70580000,\n            'cash_flow_fin': 30981000\n        })\n\n    def test_msft_20110630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312511200680/msft-20110630.xml')\n        self.assert_item(item, {\n            'symbol': 'MSFT',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2011-06-30',\n            'revenues': 69943000000,\n            'op_income': 27161000000,\n            'net_income': 23150000000,\n            'eps_basic': 2.73,\n            'eps_diluted': 2.69,\n            'dividend': 0.64,\n            'assets': 108704000000,\n            'cur_assets': 74918000000,\n            'cur_liab': 28774000000,\n            'equity': 57083000000,\n            'cash': 9610000000,\n            'cash_flow_op': 26994000000,\n            'cash_flow_inv': -14616000000,\n            'cash_flow_fin': -8376000000\n        })\n\n    def test_msft_20111231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312512026864/msft-20111231.xml')\n        self.assert_item(item, {\n       
     'symbol': 'MSFT',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2012,\n            'end_date': '2011-12-31',\n            'revenues': 20885000000,\n            'op_income': 7994000000,\n            'net_income': 6624000000,\n            'eps_basic': 0.79,\n            'eps_diluted': 0.78,\n            'dividend': 0.20,\n            'assets': 112243000000,\n            'cur_assets': 72513000000,\n            'cur_liab': 25373000000,\n            'equity': 64121000000,\n            'cash': 10610000000,\n            'cash_flow_op': 5862000000,\n            'cash_flow_inv': -5568000000,\n            'cash_flow_fin': -2513000000\n        })\n\n    def test_msft_20130331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/789019/000119312513160748/msft-20130331.xml')\n        self.assert_item(item, {\n            'symbol': 'MSFT',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2013,\n            'end_date': '2013-03-31',\n            'revenues': 20489000000,\n            'op_income': 7612000000,\n            'net_income': 6055000000,\n            'eps_basic': 0.72,\n            'eps_diluted': 0.72,\n            'dividend': 0.23,\n            'assets': 134105000000,\n            'cur_assets': 93524000000,\n            'cur_liab': 31929000000,\n            'equity': 76688000000,\n            'cash': 5240000000,\n            'cash_flow_op': 9666000000,\n            'cash_flow_inv': -7660000000,\n            'cash_flow_fin': -2744000000\n        })\n\n    def test_mu_20121129(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/723125/000072312513000007/mu-20121129.xml')\n        self.assert_item(item, {\n            'symbol': 'MU',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            
'end_date': '2012-11-29',\n            'revenues': 1834000000,\n            'op_income': -157000000,\n            'net_income': -275000000,\n            'eps_basic': -0.27,\n            'eps_diluted': -0.27,\n            'dividend': 0.0,\n            'assets': 14067000000,\n            'cur_assets': 5315000000,\n            'cur_liab': 2138000000,\n            'equity': 8186000000,\n            'cash': 2102000000,\n            'cash_flow_op': 236000000,\n            'cash_flow_inv': -639000000,\n            'cash_flow_fin': 46000000\n        })\n\n    def test_mxim_20110326(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/743316/000144530511000751/mxim-20110422.xml')\n        self.assert_item(item, {\n            'symbol': 'MXIM',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2011,\n            'end_date': '2011-03-26',\n            'revenues': 606775000,\n            'op_income': 163995000,\n            'net_income': 136276000,\n            'eps_basic': 0.46,\n            'eps_diluted': 0.45,\n            'dividend': 0.21,\n            'assets': 3452417000,\n            'cur_assets': 1676593000,\n            'cur_liab': 391153000,\n            'equity': 2465040000,\n            'cash': 868923000,\n            'cash_flow_op': 615180000,\n            'cash_flow_inv': -224755000,\n            'cash_flow_fin': -348014000\n        })\n\n    def test_nflx_20120930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1065280/000106528012000020/nflx-20120930.xml')\n        self.assert_item(item, {\n            'symbol': 'NFLX',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-30',\n            'revenues': 905089000,\n            'op_income': 16135000,\n            'net_income': 7675000,\n            'eps_basic': 0.14,\n            
'eps_diluted': 0.13,\n            'dividend': 0.0,\n            'assets': 3808833000,\n            'cur_assets': 2225018000,\n            'cur_liab': 1598223000,\n            'equity': 716840000,\n            'cash': 370298000,\n            'cash_flow_op': 150000,\n            'cash_flow_inv': -33524000,\n            'cash_flow_fin': -158000\n        })\n\n    def test_nvda_20130127(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1045810/000104581013000008/nvda-20130127.xml')\n        self.assert_item(item, {\n            'symbol': 'NVDA',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2013,\n            'end_date': '2013-01-27',\n            'revenues': 4280159000,\n            'op_income': 648239000,\n            'net_income': 562536000,\n            'eps_basic': 0.91,\n            'eps_diluted': 0.9,\n            'dividend': 0.075,\n            'assets': 6412245000,\n            'cur_assets': 4775258000,\n            'cur_liab': 976223000,\n            'equity': 4827703000,\n            'cash': 732786000,\n            'cash_flow_op': 824172000,\n            'cash_flow_inv': -743992000,\n            'cash_flow_fin': -15270000\n        })\n\n    def test_nws_20090930(self):\n        # This symbol was changed to FOX\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1308161/000119312509224062/nws-20090930.xml')\n        self.assert_item(item, {\n            'symbol': 'NWS',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2009-09-30',\n            'revenues': 7199000000,\n            'op_income': 1042000000,\n            'net_income': 571000000,\n            'eps_basic': 0.22,\n            'eps_diluted': 0.22,\n            'dividend': 0.06,\n            'assets': 55316000000,\n            'cur_assets': 17425000000,\n            'cur_liab': 
10990000000,\n            'equity': 24479000000,\n            'cash': 7832000000,\n            'cash_flow_op': 680000000,\n            'cash_flow_inv': -362000000,\n            'cash_flow_fin': 942000000\n        })\n\n    def test_omx_20110924(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312511286448/omx-20110924.xml')\n        self.assert_item(item, {\n            'symbol': 'OMX',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2011,\n            'end_date': '2011-09-24',\n            'revenues': 1774767000,\n            'op_income': 41296000,\n            'net_income': 21518000,\n            'eps_basic': 0.25,\n            'eps_diluted': 0.25,\n            'dividend': 0.0,\n            'assets': 4002981000,\n            'cur_assets': 1950996000,\n            'cur_liab': 998377000,\n            'equity': 657636000,\n            'cash': 485426000,\n            'cash_flow_op': 78743000,\n            'cash_flow_inv': -41380000,\n            'cash_flow_fin': -11280000\n        })\n\n    def test_omx_20111231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312512077611/omx-20111231.xml')\n        self.assert_item(item, {\n            'symbol': 'OMX',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2011-12-31',\n            'revenues': 7121167000,\n            'op_income': 86486000,\n            'net_income': 32771000,\n            'eps_basic': 0.38,\n            'eps_diluted': 0.38,\n            'dividend': 0.0,\n            'assets': 4069275000,\n            'cur_assets': 1938974000,\n            'cur_liab': 1013301000,\n            'equity': 568993000,\n            'cash': 427111000,\n            'cash_flow_op': 53679000,\n            'cash_flow_inv': -69373000,\n            'cash_flow_fin': -17952000\n      
  })\n\n    def test_omx_20121229(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12978/000119312513073972/omx-20121229.xml')\n        self.assert_item(item, {\n            'symbol': 'OMX',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2012,\n            'end_date': '2012-12-29',\n            'revenues': 6920384000,\n            'op_income': 24278000,\n            'net_income': 414694000,\n            'eps_basic': 4.79,\n            'eps_diluted': 4.74,\n            'dividend': 0.0,\n            'assets': 3784315000,\n            'cur_assets': 1983884000,\n            'cur_liab': 1056641000,\n            'equity': 1034373000,\n            'cash': 495056000,\n            'cash_flow_op': 185201000,\n            'cash_flow_inv': -85244000,\n            'cash_flow_fin': -34836000\n        })\n\n    def test_orly_20130331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/898173/000089817313000028/orly-20130331.xml')\n        self.assert_item(item, {\n            'symbol': 'ORLY',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2013-03-31',\n            'revenues': 1585009000,\n            'op_income': 251084000,\n            'net_income': 154329000,\n            'eps_basic': 1.38,\n            'eps_diluted': 1.36,\n            'dividend': 0.0,\n            'assets': 5789541000,\n            'cur_assets': 2741188000,\n            'cur_liab': 2349022000,\n            'equity': 2072525000,\n            'cash': 205410000,\n            'cash_flow_op': 226344000,\n            'cash_flow_inv': -72100000,\n            'cash_flow_fin': -196962000\n        })\n\n    def test_pay_20110430(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1312073/000119312511161119/pay-20110430.xml')\n        self.assert_item(item, {\n          
  'symbol': 'PAY',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2011,\n            'end_date': '2011-04-30',\n            'revenues': 292446000,\n            'op_income': 37338000,\n            'net_income': 25200000,\n            'eps_basic': 0.29,\n            'eps_diluted': 0.27,\n            'dividend': 0.0,\n            'assets': 1252289000,\n            'cur_assets': 935395000,\n            'cur_liab': 303590000,\n            'equity': 332172000,\n            'cash': 531542000,\n            'cash_flow_op': 68831000,\n            'cash_flow_inv': -20049000,\n            'cash_flow_fin': 34676000\n        })\n\n    def test_pcar_20100331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/75362/000119312510108284/pcar-20100331.xml')\n        self.assert_item(item, {\n            'symbol': 'PCAR',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2010-03-31',\n            'revenues': 2230700000,\n            'op_income': None,\n            'net_income': 68300000,\n            'eps_basic': 0.19,\n            'eps_diluted': 0.19,\n            'dividend': 0.09,\n            'assets': 13990000000,\n            'cur_assets': 3396400000,\n            'cur_liab': 1425900000,\n            'equity': 5092600000,\n            'cash': 1854700000,\n            'cash_flow_op': 285400000,\n            'cash_flow_inv': 40500000,\n            'cash_flow_fin': -350800000\n        })\n\n    def test_pcg_20091231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1004980/000100498010000015/pcg-20091231.xml')\n        self.assert_item(item, {\n            'symbol': 'PCG',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2009,\n            'end_date': '2009-12-31',\n            
'revenues': 13399000000,\n            'op_income': 2299000000,\n            'net_income': 1220000000,\n            'eps_basic': 3.25,\n            'eps_diluted': 3.2,\n            'dividend': 1.68,\n            'assets': 42945000000,\n            'cur_assets': 5657000000,\n            'cur_liab': 6813000000,\n            'equity': 10585000000,\n            'cash': 527000000,\n            'cash_flow_op': 3039000000,\n            'cash_flow_inv': -3336000000,\n            'cash_flow_fin': 605000000\n        })\n\n    def test_plt_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/914025/000091402513000049/plt-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'PLT',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2014,\n            'end_date': '2013-06-30',\n            'revenues': 202818000,\n            'op_income': 35949000,\n            'net_income': 26953000,\n            'eps_basic': 0.63,\n            'eps_diluted': 0.62,\n            'dividend': 0.1,\n            'assets': 780520000,\n            'cur_assets': 568272000,\n            'cur_liab': 90121000,\n            'equity': 673569000,\n            'cash': 256343000,\n            'cash_flow_op': 34140000,\n            'cash_flow_inv': -4120000,\n            'cash_flow_fin': -2424000\n        })\n\n    def test_qep_20110630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1108827/000119312511202252/qep-20110630.xml')\n        self.assert_item(item, {\n            'symbol': 'QEP',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2011,\n            'end_date': '2011-06-30',\n            'revenues': 784100000,\n            'op_income': 168900000,\n            'net_income': 92800000,\n            'eps_basic': 0.52,\n            'eps_diluted': 0.52,\n            'dividend': 0.02,\n       
     'assets': 7075000000,\n            'cur_assets': 655600000,\n            'cur_liab': 582900000,\n            'equity': 3184400000,\n            'cash': None,\n            'cash_flow_op': 628600000,\n            'cash_flow_inv': -660200000,\n            'cash_flow_fin': 31600000\n        })\n\n    def test_qep_20120930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1108827/000110882712000006/qep-20120930.xml')\n        self.assert_item(item, {\n            'symbol': 'QEP',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-30',\n            'revenues': 542400000,\n            'op_income': -12600000,\n            'net_income': -3100000,\n            'eps_basic': -0.02,\n            'eps_diluted': -0.02,\n            'dividend': 0.02,\n            'assets': 8996100000,\n            'cur_assets': 619800000,\n            'cur_liab': 616700000,\n            'equity': 3377000000,\n            'cash': 0.0,\n            'cash_flow_op': 972000000,\n            'cash_flow_inv': -2435700000,\n            'cash_flow_fin': 1463700000\n        })\n\n    def test_regn_20100630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/872589/000120677410001689/regn-20100630.xml')\n        self.assert_item(item, {\n            'symbol': 'REGN',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2010,\n            'end_date': '2010-06-30',\n            'revenues': 115886000,\n            'op_income': -23724000,\n            'net_income': -25474000,\n            'eps_basic': -0.31,\n            'eps_diluted': -0.31,\n            'dividend': 0.0,\n            'assets': 790641000,\n            'cur_assets': 417750000,\n            'cur_liab': 119571000,\n            'equity': 371216000,\n            'cash': 112000000,\n            'cash_flow_op': 
-22626000,\n            'cash_flow_inv': -131383000,\n            'cash_flow_fin': 58934000\n        })\n\n    def test_sbac_20110331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1034054/000119312511130220/sbac-20110331.xml')\n        self.assert_item(item, {\n            'symbol': 'SBAC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2011,\n            'end_date': '2011-03-31',\n            'revenues': 167749000,\n            'op_income': 23899000,\n            'net_income': -34251000,\n            'eps_basic': -0.3,\n            'eps_diluted': -0.3,\n            'dividend': 0.0,\n            'assets': 3466258000,\n            'cur_assets': 173387000,\n            'cur_liab': 120247000,\n            'equity': 213078000,\n            'cash': 95104000,\n            'cash_flow_op': 53197000,\n            'cash_flow_inv': -108748000,\n            'cash_flow_fin': 86401000\n        })\n\n    def test_shld_20101030(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1310067/000119312510263486/shld-20101030.xml')\n        self.assert_item(item, {\n            'symbol': 'SHLD',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2010,\n            'end_date': '2010-10-30',\n            'revenues': 9678000000,\n            'op_income': -292000000,\n            'net_income': -218000000,\n            'eps_basic': -1.98,\n            'eps_diluted': -1.98,\n            'dividend': 0.0,\n            'assets': 26045000000,\n            'cur_assets': 13123000000,\n            'cur_liab': 10682000000,\n            'equity': 8378000000,\n            'cash': 790000000,\n            'cash_flow_op': -1172000000,\n            'cash_flow_inv': -296000000,\n            'cash_flow_fin': 532000000\n        })\n\n    def test_sial_20101231(self):\n        item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/90185/000119312511028579/sial-20101231.xml')\n        self.assert_item(item, {\n            'symbol': 'SIAL',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2010,\n            'end_date': '2010-12-31',\n            'revenues': 2271000000,\n            'op_income': 551000000,\n            'net_income': 384000000,\n            'eps_basic': 3.17,\n            'eps_diluted': 3.12,\n            'dividend': 0.0,\n            'assets': 3014000000,\n            'cur_assets': 1602000000,\n            'cur_liab': 530000000,\n            'equity': 1976000000,\n            'cash': 569000000,\n            'cash_flow_op': 523000000,\n            'cash_flow_inv': -182000000,\n            'cash_flow_fin': -161000000\n        })\n\n    def test_siri_20100630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/908937/000095012310074081/siri-20100630.xml')\n        self.assert_item(item, {\n            'symbol': 'SIRI',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2010,\n            'end_date': '2010-06-30',\n            'revenues': 699761000,\n            'op_income': 125634000,\n            'net_income': 15272000,\n            'eps_basic': 0.0,\n            'eps_diluted': 0.0,\n            'dividend': 0.0,\n            'assets': 7200932000,\n            'cur_assets': 760172000,\n            'cur_liab': 2041871000,\n            'equity': 180428000,\n            'cash': 258854000,\n            'cash_flow_op': 140987000,\n            'cash_flow_inv': -159859000,\n            'cash_flow_fin': -105763000\n        })\n\n    def test_siri_20120331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/908937/000090893712000003/siri-20120331.xml')\n        self.assert_item(item, {\n            'symbol': 'SIRI',\n            'amend': False,\n       
     'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2012,\n            'end_date': '2012-03-31',\n            'revenues': 804722000,\n            'op_income': 199238000,\n            'net_income': 107774000,\n            'eps_basic': 0.03,\n            'eps_diluted': 0.02,\n            'dividend': 0.0,\n            'assets': 7501724000,\n            'cur_assets': 1337094000,\n            'cur_liab': 2236580000,\n            'equity': 849579000,\n            'cash': 746576000,\n            'cash_flow_op': 39948000,\n            'cash_flow_inv': -25187000,\n            'cash_flow_fin': -42175000\n        })\n\n    def test_spex_20130331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/12239/000141588913001019/spex-20130331.xml')\n        self.assert_item(item, {\n            'symbol': 'SPEX',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2013-03-31',\n            'revenues': 5761,\n            'op_income': -910547,\n            'net_income': -3696570,\n            'eps_basic': -5.35,\n            'eps_diluted': None,\n            'dividend': 0.0,\n            'assets': 3572989,\n            'cur_assets': 3535555,\n            'cur_liab': 453858,\n            'equity': 2857993,\n            'cash': 3448526,\n            'cash_flow_op': -1049711,\n            'cash_flow_inv': None,\n            'cash_flow_fin': None\n        })\n\n    def test_strza_20121231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1507934/000150793413000015/strza-20121231.xml')\n        self.assert_item(item, {\n            'symbol': 'STRZA',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2012,\n            'end_date': '2012-12-31',\n            'revenues': 1630696000,\n            'op_income': 405404000,\n            
'net_income': 254484000,\n            'eps_basic': None,\n            'eps_diluted': None,\n            'dividend': 0.0,\n            'assets': 2176050000,\n            'cur_assets': 1376911000,\n            'cur_liab': 330451000,\n            'equity': 1302144000,\n            'cash': 749774000,\n            'cash_flow_op': 292077000,\n            'cash_flow_inv': -16214000,\n            'cash_flow_fin': -626101000\n        })\n\n    def test_stx_20120928(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1137789/000110465912072744/stx-20120928.xml')\n        self.assert_item(item, {\n            'symbol': 'STX',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2012-09-28',\n            'revenues': 3732000000,\n            'op_income': 624000000,\n            'net_income': 582000000,\n            'eps_basic': 1.48,\n            'eps_diluted': 1.42,\n            'dividend': 0.32,\n            'assets': 9522000000,\n            'cur_assets': 5749000000,\n            'cur_liab': 2753000000,\n            'equity': 3535000000,\n            'cash': 1894000000,\n            'cash_flow_op': 1132000000,\n            'cash_flow_inv': -265000000,\n            'cash_flow_fin': -681000000\n        })\n\n    def test_stx_20121228(self):\n        # 'stx-20120928.xml' is misnamed; the filing covers the period ending 2012-12-28\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1137789/000110465913005497/stx-20120928.xml')\n        self.assert_item(item, {\n            'symbol': 'STX',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2012-12-28',\n            'revenues': 3668000000,\n            'op_income': 555000000,\n            'net_income': 492000000,\n            'eps_basic': 1.33,\n            'eps_diluted': 1.3,\n            'dividend': 0.7,\n            'assets': 
8742000000,\n            'cur_assets': 5017000000,\n            'cur_liab': 2643000000,\n            'equity': 2925000000,\n            'cash': 1383000000,\n            'cash_flow_op': 1976000000,\n            'cash_flow_inv': -453000000,\n            'cash_flow_fin': -1849000000\n        })\n\n    def test_symc_20130628(self):\n        # 'symc-20140628.xml' is misnamed\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/849399/000119312513312695/symc-20140628.xml')\n        self.assert_item(item, {\n            'symbol': 'SYMC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2014,\n            'end_date': '2013-06-28',\n            'revenues': 1709000000,\n            'op_income': 224000000,\n            'net_income': 157000000,\n            'eps_basic': 0.23,\n            'eps_diluted': 0.22,\n            'dividend': 0.15,\n            'assets': 13151000000,\n            'cur_assets': 5179000000,\n            'cur_liab': 4205000000,\n            'equity': 5497000000,\n            'cash': 3749000000,\n            'cash_flow_op': 312000000,\n            'cash_flow_inv': -29000000,\n            'cash_flow_fin': -1192000000\n        })\n\n    def test_tgt_20130803(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/27419/000110465913066569/tgt-20130803.xml')\n        self.assert_item(item, {\n            'symbol': 'TGT',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-08-03',\n            'revenues': 17117000000,\n            'op_income': 1161000000,\n            'net_income': 611000000,\n            'eps_basic': 0.96,\n            'eps_diluted': 0.95,\n            'dividend': 0.43,\n            'assets': 44162000000,\n            'cur_assets': 11403000000,\n            'cur_liab': 12616000000,\n            'equity': 16020000000,\n            
'cash': 1018000000,\n            'cash_flow_op': 4109000000,\n            'cash_flow_inv': 1269000000,\n            'cash_flow_fin': -5148000000\n        })\n\n    def test_trv_20100331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/86312/000110465910021504/trv-20100331.xml')\n        self.assert_item(item, {\n            'symbol': 'TRV',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2010-03-31',\n            'revenues': 6119000000,\n            'op_income': None,\n            'net_income': 647000000,\n            'eps_basic': 1.26,\n            'eps_diluted': 1.25,\n            'dividend': 0.0,\n            'assets': 108696000000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 26671000000,\n            'cash': 251000000,\n            'cash_flow_op': 531000000,\n            'cash_flow_inv': 952000000,\n            'cash_flow_fin': -1486000000\n        })\n\n    def test_tsla_20110630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312511221497/tsla-20110630.xml')\n        self.assert_item(item, {\n            'symbol': 'TSLA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2011,\n            'end_date': '2011-06-30',\n            'revenues': 58171000,\n            'op_income': -58739000,\n            'net_income': -58903000,\n            'eps_basic': -0.60,\n            'eps_diluted': -0.60,\n            'dividend': 0.0,\n            'assets': 646155000,\n            'cur_assets': 417758000,\n            'cur_liab': 138736000,\n            'equity': 348452000,\n            'cash': 319380000,\n            'cash_flow_op': -65785000,\n            'cash_flow_inv': -13011000,\n            'cash_flow_fin': 298618000\n        })\n\n    def test_tsla_20111231(self):\n        item = 
parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312512137560/tsla-20111231.xml')\n        self.assert_item(item, {\n            'symbol': 'TSLA',\n            'amend': True,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2011-12-31',\n            'revenues': 204242000,\n            'op_income': -251488000,\n            'net_income': -254411000,\n            'eps_basic': -2.53,\n            'eps_diluted': -2.53,\n            'dividend': 0.0,\n            'assets': 713448000,\n            'cur_assets': 372838000,\n            'cur_liab': 191339000,\n            'equity': 224045000,\n            'cash': 255266000,\n            'cash_flow_op': -114364000,\n            'cash_flow_inv': -175928000,\n            'cash_flow_fin': 446000000\n        })\n\n    def test_tsla_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1318605/000119312513327916/tsla-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'TSLA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-30',\n            'revenues': 405139000,\n            'op_income': -11792000,\n            'net_income': -30502000,\n            'eps_basic': -0.26,\n            'eps_diluted': -0.26,\n            'dividend': 0.0,\n            'assets': 1887844000,\n            'cur_assets': 1129542000,\n            'cur_liab': 486545000,\n            'equity': 629426000,\n            'cash': 746057000,\n            'cash_flow_op': 25886000,\n            'cash_flow_inv': -82410000,\n            'cash_flow_fin': 600691000\n        })\n\n    def test_utmd_20111231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/706698/000109690612002585/utmd-20111231.xml')\n        self.assert_item(item, {\n            'symbol': 'UTMD',\n            'amend': False,\n   
         'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2011-12-31',\n            'revenues': 37860000,\n            'op_income': 11842000,\n            'net_income': 7414000,\n            'eps_basic': 2.04,\n            'eps_diluted': 2.03,\n            'dividend': 0.0,\n            'assets': 76389000,\n            'cur_assets': 17016000,\n            'cur_liab': 9631000,\n            'equity': 40757000,\n            'cash': 6534000,\n            'cash_flow_op': 11365000,\n            'cash_flow_inv': -26685000,\n            'cash_flow_fin': 18078000\n        })\n\n    def test_vel_pe_20130930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/103682/000119312513427104/d-20130930.xml')\n        self.assert_item(item, {\n            'symbol': 'VEL - PE',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2013,\n            'end_date': '2013-09-30',\n            'revenues': 3432000000,\n            'op_income': 1034000000,\n            'net_income': 569000000,\n            'eps_basic': 0.98,\n            'eps_diluted': 0.98,\n            'dividend': 0.5625,\n            'assets': 48488000000,\n            'cur_assets': 5210000000,\n            'cur_liab': 6453000000,\n            'equity': 11242000000,\n            'cash': 287000000,\n            'cash_flow_op': 2950000000,\n            'cash_flow_inv': -2348000000,\n            'cash_flow_fin': -563000000\n        })\n\n    def test_via_20090930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312509221448/via-20090930.xml')\n        self.assert_item(item, {\n            'symbol': 'VIA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2009,\n            'end_date': '2009-09-30',\n            'revenues': 3317000000,\n            'op_income': 
784000000,\n            'net_income': 463000000,\n            'eps_basic': 0.76,\n            'eps_diluted': 0.76,\n            'dividend': 0.0,\n            'assets': 21307000000,\n            'cur_assets': 3605000000,\n            'cur_liab': 3707000000,\n            'equity': 8044000000,\n            'cash': 249000000,\n            'cash_flow_op': 732000000,\n            'cash_flow_inv': -117000000,\n            'cash_flow_fin': -1169000000\n        })\n\n    def test_via_20091231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312510028165/via-20091231.xml')\n        self.assert_item(item, {\n            'symbol': 'VIA',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2009,\n            'end_date': '2009-12-31',\n            'revenues': 13619000000,\n            'op_income': 2904000000,\n            'net_income': 1611000000,\n            'eps_basic': 2.65,\n            'eps_diluted': 2.65,\n            'dividend': 0.0,\n            'assets': 21900000000,\n            'cur_assets': 4430000000,\n            'cur_liab': 3751000000,\n            'equity': 8677000000,\n            'cash': 298000000,\n            'cash_flow_op': 1151000000,\n            'cash_flow_inv': -274000000,\n            'cash_flow_fin': -1388000000\n        })\n\n    def test_via_20120630(self):\n        # 'via-20120401.xml' is misnamed\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1339947/000119312512333732/via-20120401.xml')\n        self.assert_item(item, {\n            'symbol': 'VIA',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-06-30',\n            'revenues': 3241000000,\n            'op_income': 903000000,\n            'net_income': 534000000,\n            'eps_basic': 1.02,\n            'eps_diluted': 1.01,\n            'dividend': 
0.275,\n            'assets': 21958000000,\n            'cur_assets': 4511000000,\n            'cur_liab': 3716000000,\n            'equity': 7473000000,\n            'cash': 774000000,\n            'cash_flow_op': 1736000000,\n            'cash_flow_inv': -212000000,\n            'cash_flow_fin': -1750000000\n        })\n\n    def test_vno_20090630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/899689/000089968909000034/vno-20090630.xml')\n        self.assert_item(item, {\n            'symbol': 'VNO',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'FY',  # mismarked in doc, actually should be Q2\n            'fiscal_year': 2009,\n            'end_date': '2009-06-30',\n            'revenues': 678385000,\n            'op_income': 221139000,\n            'net_income': -51904000,\n            'eps_basic': -0.3,\n            'eps_diluted': -0.3,\n            'dividend': 0.95,\n            'assets': 21831857000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 7122175000,\n            'cash': 2068498000,\n            'cash_flow_op': 379439000,\n            'cash_flow_inv': -219310000,\n            'cash_flow_fin': 381516000\n        })\n\n    def test_vno_20111231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/899689/000089968912000004/vno-20111231.xml')\n        self.assert_item(item, {\n            'symbol': 'VNO',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2011-12-31',\n            'revenues': 2915665000,\n            'op_income': 856153000,\n            'net_income': 601771000,\n            'eps_basic': 3.26,\n            'eps_diluted': 3.23,\n            'dividend': 0.0,\n            'assets': 20446487000,\n            'cur_assets': None,\n            'cur_liab': None,\n            'equity': 7508447000,\n            
'cash': 606553000,\n            'cash_flow_op': 702499000,\n            'cash_flow_inv': -164761000,\n            'cash_flow_fin': -621974000\n        })\n\n    def test_vrsk_20120930(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1442145/000119312512441544/vrsk-20120930.xml')\n        self.assert_item(item, {\n            'symbol': 'VRSK',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-30',\n            'revenues': 398863000,\n            'op_income': 155251000,\n            'net_income': 82911000,\n            'eps_basic': 0.5,\n            'eps_diluted': 0.48,\n            'dividend': 0.0,\n            'assets': 2303433000,\n            'cur_assets': 361337000,\n            'cur_liab': 668257000,\n            'equity': 142048000,\n            'cash': 97770000,\n            'cash_flow_op': 320997000,\n            'cash_flow_inv': -838704000,\n            'cash_flow_fin': 424004000\n        })\n\n    def test_wat_20120929(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1000697/000119312512448069/wat-20120929.xml')\n        self.assert_item(item, {\n            'symbol': 'WAT',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q3',\n            'fiscal_year': 2012,\n            'end_date': '2012-09-29',\n            'revenues': 449952000,\n            'op_income': 121745000,\n            'net_income': 99109000,\n            'eps_basic': 1.13,\n            'eps_diluted': 1.12,\n            'dividend': 0.0,\n            'assets': 2997140000,\n            'cur_assets': 2137498000,\n            'cur_liab': 767562000,\n            'equity': 1329879000,\n            'cash': 356293000,\n            'cash_flow_op': 317627000,\n            'cash_flow_inv': -298851000,\n            'cash_flow_fin': -53396000\n        })\n\n    def test_wec_20130331(self):\n        
item = parse_xml('http://www.sec.gov/Archives/edgar/data/783325/000010781513000080/wec-20130331.xml')\n        self.assert_item(item, {\n            'symbol': 'WEC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2013,\n            'end_date': '2013-03-31',\n            'revenues': 1275200000,\n            'op_income': 321000000,\n            'net_income': 176600000,\n            'eps_basic': 0.77,\n            'eps_diluted': 0.76,\n            'dividend': 0.34,\n            'assets': 14295300000,\n            'cur_assets': 1313800000,\n            'cur_liab': 1278100000,\n            'equity': 8675000000,\n            'cash': 24700000,\n            'cash_flow_op': 330300000,\n            'cash_flow_inv': -145300000,\n            'cash_flow_fin': -195900000\n        })\n\n    def test_wec_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/783325/000010781513000112/wec-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'WEC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-30',\n            'revenues': 1012300000,\n            'op_income': 229500000,\n            'net_income': 119000000,\n            'eps_basic': 0.52,\n            'eps_diluted': 0.52,\n            'dividend': 0.34,\n            'assets': 14317000000,\n            'cur_assets': 1271100000,\n            'cur_liab': 1280700000,\n            'equity': 8609000000,\n            'cash': 21000000,\n            'cash_flow_op': 681500000,\n            'cash_flow_inv': -336600000,\n            'cash_flow_fin': -359500000\n        })\n\n    def test_wfm_20120115(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/865436/000144530512000434/wfm-20120115.xml')\n        self.assert_item(item, {\n            'symbol': 'WFM',\n            'amend': 
False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2012,\n            'end_date': '2012-01-15',\n            'revenues': 3390940000,\n            'op_income': 190338000,\n            'net_income': 118327000,\n            'eps_basic': 0.66,\n            'eps_diluted': 0.65,\n            'dividend': 0.14,\n            'assets': 4528241000,\n            'cur_assets': 1677087000,\n            'cur_liab': 896972000,\n            'equity': 3182747000,\n            'cash': 529954000,\n            'cash_flow_op': 260896000,\n            'cash_flow_inv': -6963000,\n            'cash_flow_fin': 63562000\n        })\n\n    def test_xel_20100331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/72903/000110465910024080/xel-20100331.xml')\n        self.assert_item(item, {\n            'symbol': 'XEL',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2010,\n            'end_date': '2010-03-31',\n            'revenues': 2807462000,\n            'op_income': 403665000,\n            'net_income': 166058000,\n            'eps_basic': 0.36,\n            'eps_diluted': 0.36,\n            'dividend': 0.25,\n            'assets': 25334501000,\n            'cur_assets': 2344294000,\n            'cur_liab': 2759838000,\n            'equity': 7355871000,\n            'cash': 79504000,\n            'cash_flow_op': 555539000,\n            'cash_flow_inv': -460112000,\n            'cash_flow_fin': -121731000\n        })\n\n    def test_xel_20101231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/72903/000114036111012444/xel-20101231.xml')\n        self.assert_item(item, {\n            'symbol': 'XEL',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2010,\n            'end_date': '2010-12-31',\n            'revenues': 10310947000,\n            
'op_income': 1619969000,\n            'net_income': 751593000,\n            'eps_basic': 1.63,\n            'eps_diluted': 1.62,\n            'dividend': 1.0,\n            'assets': 27387690000,\n            'cur_assets': 2732643000,\n            'cur_liab': 2536533000,\n            'equity': 8083519000,\n            'cash': 108437000,\n            'cash_flow_op': 1893942000,\n            'cash_flow_inv': -2806724000,\n            'cash_flow_fin': 905571000\n        })\n\n    def test_xom_20110331(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000119312511127973/xom-20110331.xml')\n        self.assert_item(item, {\n            'symbol': 'XOM',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q1',\n            'fiscal_year': 2011,\n            'end_date': '2011-03-31',\n            'revenues': 114004000000,\n            'op_income': None,\n            'net_income': 10650000000,\n            'eps_basic': 2.14,\n            'eps_diluted': 2.14,\n            'dividend': 0.44,\n            'assets': 319533000000,\n            'cur_assets': 72022000000,\n            'cur_liab': 73576000000,\n            'equity': 157531000000,\n            'cash': 12833000000,\n            'cash_flow_op': 16856000000,\n            'cash_flow_inv': -5353000000,\n            'cash_flow_fin': -6749000000\n        })\n\n    def test_xom_20111231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000119312512078102/xom-20111231.xml')\n        self.assert_item(item, {\n            'symbol': 'XOM',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2011,\n            'end_date': '2011-12-31',\n            'revenues': 467029000000,\n            'op_income': None,\n            'net_income': 41060000000,\n            'eps_basic': 8.43,\n            'eps_diluted': 8.42,\n            'dividend': 1.85,\n            'assets': 
331052000000,\n            'cur_assets': 72963000000,\n            'cur_liab': 77505000000,\n            'equity': 160744000000,\n            'cash': 12664000000,\n            'cash_flow_op': 55345000000,\n            'cash_flow_inv': -22165000000,\n            'cash_flow_fin': -28256000000\n        })\n\n    def test_xom_20130630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/34088/000003408813000035/xom-20130630.xml')\n        self.assert_item(item, {\n            'symbol': 'XOM',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-30',\n            'revenues': 106469000000,\n            'op_income': None,\n            'net_income': 6860000000,\n            'eps_basic': 1.55,\n            'eps_diluted': 1.55,\n            'dividend': 0.63,\n            'assets': 341615000000,\n            'cur_assets': 62844000000,\n            'cur_liab': 72688000000,\n            'equity': 171588000000,\n            'cash': 4609000000,\n            'cash_flow_op': 21275000000,\n            'cash_flow_inv': -18547000000,\n            'cash_flow_fin': -7409000000\n        })\n\n    def test_xray_20091231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/818479/000114420410009164/xray-20091231.xml')\n        self.assert_item(item, {\n            'symbol': 'XRAY',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2009,\n            'end_date': '2009-12-31',\n            'revenues': 2159916000,\n            'op_income': 381187000,\n            'net_income': 274258000,\n            'eps_basic': 1.85,\n            'eps_diluted': 1.83,\n            'dividend': 0.2,\n            'assets': 3087932000,\n            'cur_assets': 1217796000,\n            'cur_liab': 444556000,\n            'equity': 1906958000,\n            'cash': 450348000,\n            
'cash_flow_op': 362489000,\n            'cash_flow_inv': -53399000,\n            'cash_flow_fin': -71420000\n        })\n\n    def test_xrx_20091231(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/108772/000119312510043079/xrx-20091231.xml')\n        self.assert_item(item, {\n            'symbol': 'XRX',\n            'amend': False,\n            'doc_type': '10-K',\n            'period_focus': 'FY',\n            'fiscal_year': 2009,\n            'end_date': '2009-12-31',\n            'revenues': 15179000000,\n            'op_income': None,\n            'net_income': 485000000,\n            'eps_basic': 0.56,\n            'eps_diluted': 0.55,\n            'dividend': 0.0,\n            'assets': 24032000000,\n            'cur_assets': 9731000000,\n            'cur_liab': 4461000000,\n            'equity': 7191000000,\n            'cash': 3799000000,\n            'cash_flow_op': 2208000000,\n            'cash_flow_inv': -343000000,\n            'cash_flow_fin': 692000000\n        })\n\n    def test_zmh_20090630(self):\n        item = parse_xml('http://www.sec.gov/Archives/edgar/data/1136869/000095012309035693/zmh-20090630.xml')\n        self.assert_item(item, {\n            'symbol': 'ZMH',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2009,\n            'end_date': '2009-06-30',\n            'revenues': 1019900000,\n            'op_income': 296499999.99999988,\n            'net_income': 210099999.99999988,  # Weird number, but it's actually in the filing\n            'eps_basic': 0.98,\n            'eps_diluted': 0.98,\n            'dividend': 0.0,\n            'assets': 7462100000.000001,\n            'cur_assets': 2328700000.0000005,\n            'cur_liab': 669200000,\n            'equity': 5805600000,\n            'cash': 277500000,\n            'cash_flow_op': 379700000.00000018,\n            'cash_flow_inv': -174300000.00000003,\n            'cash_flow_fin': 
-142000000.00000003\n        })\n"
  },
  {
    "path": "pystock_crawler/tests/test_spiders_edgar.py",
    "content": "import os\nimport tempfile\n\nfrom scrapy.http import HtmlResponse, XmlResponse\n\nfrom pystock_crawler.spiders.edgar import EdgarSpider, URLGenerator\nfrom pystock_crawler.tests.base import TestCaseBase\n\n\ndef make_url(symbol, start_date='', end_date=''):\n    '''A URL that lists all 10-Q and 10-K filings of a company.'''\n    return 'http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=%s&type=10-&dateb=%s&datea=%s&owner=exclude&count=300' \\\n           % (symbol, end_date, start_date)\n\n\ndef make_link_html(href, text=u'Link'):\n    return u'<a href=\"%s\">%s</a>' % (href, text)\n\n\nclass URLGeneratorTest(TestCaseBase):\n\n    def test_no_dates(self):\n        urls = URLGenerator(('FB', 'GOOG'))\n        self.assertEqual(list(urls), [\n            make_url('FB'), make_url('GOOG')\n        ])\n\n    def test_with_start_date(self):\n        urls = URLGenerator(('AAPL', 'AMZN', 'GLD'), start_date='20120215')\n        self.assertEqual(list(urls), [\n            make_url('AAPL', start_date='20120215'),\n            make_url('AMZN', start_date='20120215'),\n            make_url('GLD', start_date='20120215')\n        ])\n\n    def test_with_end_date(self):\n        urls = URLGenerator(('TSLA', 'USO', 'MMM'), end_date='20110530')\n        self.assertEqual(list(urls), [\n            make_url('TSLA', end_date='20110530'),\n            make_url('USO', end_date='20110530'),\n            make_url('MMM', end_date='20110530')\n        ])\n\n    def test_with_start_and_end_dates(self):\n        urls = URLGenerator(('DDD', 'AXP', 'KO'), start_date='20111230', end_date='20121230')\n        self.assertEqual(list(urls), [\n            make_url('DDD', '20111230', '20121230'),\n            make_url('AXP', '20111230', '20121230'),\n            make_url('KO', '20111230', '20121230')\n        ])\n\n\nclass EdgarSpiderTest(TestCaseBase):\n\n    def test_empty_creation(self):\n        spider = EdgarSpider()\n        self.assertEqual(spider.start_urls, [])\n\n 
   def test_symbol_file(self):\n        # create a mock file of a list of symbols\n        f = tempfile.NamedTemporaryFile('w', delete=False)\n        f.write('# Comment\\nGOOG\\nADBE\\nLNKD\\n#comment\\nJPM\\n')\n        f.close()\n\n        spider = EdgarSpider(symbols=f.name)\n        urls = list(spider.start_urls)\n\n        self.assertEqual(urls, [\n            make_url('GOOG'), make_url('ADBE'),\n            make_url('LNKD'), make_url('JPM')\n        ])\n\n        os.remove(f.name)\n\n    def test_invalid_dates(self):\n        with self.assertRaises(ValueError):\n            EdgarSpider(startdate='12345678')\n\n        with self.assertRaises(ValueError):\n            EdgarSpider(enddate='12345678')\n\n    def test_symbol_file_and_dates(self):\n        # create a mock file of a list of symbols\n        f = tempfile.NamedTemporaryFile('w', delete=False)\n        f.write('# Comment\\nT\\nCBS\\nWMT\\n')\n        f.close()\n\n        spider = EdgarSpider(symbols=f.name, startdate='20110101', enddate='20130630')\n        urls = list(spider.start_urls)\n\n        self.assertEqual(urls, [\n            make_url('T', '20110101', '20130630'),\n            make_url('CBS', '20110101', '20130630'),\n            make_url('WMT', '20110101', '20130630')\n        ])\n\n        os.remove(f.name)\n\n    def test_parse_company_filing_page(self):\n        '''\n        Parse the page that lists all filings of a company.\n\n        Example:\n        http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001288776&type=10-&dateb=&owner=exclude&count=40\n\n        '''\n        spider = EdgarSpider()\n        spider._follow_links = True  # HACK\n\n        body = '''\n            <html><body>\n            <a href=\"http://example.com/\">Useless Link</a>\n            <a href=\"/Archives/edgar/data/abc-index.htm\">Link</a>\n            <a href=\"/Archives/edgar/data/123-index.htm\">Link</a>\n            <a href=\"/Archives/edgar/data/123.htm\">Useless Link</a>\n            <a 
href=\"/Archives/edgar/data/123/abc-index.htm\">Link</a>\n            <a href=\"/Archives/edgar/data/123/456/abc123-index.htm\">Link</a>\n            <a href=\"/Archives/edgar/123/abc-index.htm\">Useless Link</a>\n            <a href=\"/Archives/edgar/data/123/456/789/HELLO-index.htm\">Link</a>\n            <a href=\"/Archives/hello-index.html\">Useless Link</a>\n            </body></html>\n        '''\n\n        response = HtmlResponse('http://sec.gov/mock', body=body)\n        requests = spider.parse(response)\n        urls = [r.url for r in requests]\n\n        self.assertEqual(urls, [\n            'http://sec.gov/Archives/edgar/data/abc-index.htm',\n            'http://sec.gov/Archives/edgar/data/123-index.htm',\n            'http://sec.gov/Archives/edgar/data/123/abc-index.htm',\n            'http://sec.gov/Archives/edgar/data/123/456/abc123-index.htm',\n            'http://sec.gov/Archives/edgar/data/123/456/789/HELLO-index.htm'\n        ])\n\n    def test_parse_quarter_or_annual_page(self):\n        '''\n        Parse the page that lists filings of a quarter or a year of a company.\n\n        Example:\n        http://www.sec.gov/Archives/edgar/data/1288776/000128877613000055/0001288776-13-000055-index.htm\n\n        '''\n        spider = EdgarSpider()\n        spider._follow_links = True  # HACK\n\n        body = '''\n            <html><body>\n            <a href=\"http://example.com\">Useless Link</a>\n            <a href=\"/Archives/edgar/data/123/abc-20130630.xml\">Link</a>\n            <a href=\"/Archives/edgar/123/456/abc123-20130630.xml\">Useless Link</a>\n            <a href=\"/Archives/edgar/data/456/789/hello-20130630.xml\">Link</a>\n            <a href=\"/Archives/edgar/123/456/hello-20130630.xml\">Useless Link</a>\n            <a href=\"/Archives/data/123/456/hello-20130630.xml\">Useless Link</a>\n            <a href=\"/Archives/edgar/data/123/456/hello-201306300.xml\">Useless Link</a>\n            <a 
href=\"/Archives/edgar/data/123/456/xyz-20130630.html\">Link</a>\n            </body></html>\n        '''\n\n        response = HtmlResponse('http://sec.gov/mock', body=body)\n        requests = spider.parse(response)\n        urls = [r.url for r in requests]\n\n        self.assertEqual(urls, [\n            'http://sec.gov/Archives/edgar/data/123/abc-20130630.xml',\n            'http://sec.gov/Archives/edgar/data/456/789/hello-20130630.xml'\n        ])\n\n    def test_parse_xml_report(self):\n        '''Parse XML 10-Q or 10-K report.'''\n        spider = EdgarSpider()\n        spider._follow_links = True  # HACK\n\n        body = '''\n            <?xml version=\"1.0\">\n            <xbrl xmlns=\"http://www.xbrl.org/2003/instance\"\n                  xmlns:xbrli=\"http://www.xbrl.org/2003/instance\"\n                  xmlns:dei=\"http://xbrl.sec.gov/dei/2011-01-31\"\n                  xmlns:us-gaap=\"http://fasb.org/us-gaap/2011-01-31\">\n\n              <context id=\"c1\">\n                <startDate>2013-03-31</startDate>\n                <endDate>2013-06-28</endDate>\n              </context>\n\n              <dei:AmendmentFlag contextRef=\"c1\">false</dei:AmendmentFlag>\n              <dei:DocumentType contextRef=\"c1\">10-Q</dei:DocumentType>\n              <dei:DocumentFiscalPeriodFocus contextRef=\"c1\">Q2</dei:DocumentFiscalPeriodFocus>\n              <dei:DocumentPeriodEndDate contextRef=\"c1\">2013-06-28</dei:DocumentPeriodEndDate>\n              <dei:DocumentFiscalYearFocus>2013</dei>\n\n              <us-gaap:Revenues contextRef=\"c1\">100</us-gaap:Revenues>\n              <us-gaap:NetIncomeLoss contextRef=\"c1\">200</us-gaap:NetIncomeLoss>\n              <us-gaap:EarningsPerShareBasic contextRef=\"c1\">0.2</us-gaap:EarningsPerShareBasic>\n              <us-gaap:EarningsPerShareDiluted contextRef=\"c1\">0.19</us-gaap:EarningsPerShareDiluted>\n              <us-gaap:CommonStockDividendsPerShareDeclared 
contextRef=\"c1\">0.07</us-gaap:CommonStockDividendsPerShareDeclared>\n\n              <us-gaap:Assets contextRef=\"c1\">1600</us-gaap:Assets>\n              <us-gaap:StockholdersEquity contextRef=\"c1\">300</us-gaap:StockholdersEquity>\n              <us-gaap:CashAndCashEquivalentsAtCarryingValue contextRef=\"c1\">150</us-gaap:CashAndCashEquivalentsAtCarryingValue>\n            </xbrl>\n        '''\n\n        response = XmlResponse('http://sec.gov/Archives/edgar/data/123/abc-20130720.xml', body=body)\n        item = spider.parse_10qk(response)\n\n        self.assert_item(item, {\n            'symbol': 'ABC',\n            'amend': False,\n            'doc_type': '10-Q',\n            'period_focus': 'Q2',\n            'fiscal_year': 2013,\n            'end_date': '2013-06-28',\n            'revenues': 100.0,\n            'net_income': 200.0,\n            'eps_basic': 0.2,\n            'eps_diluted': 0.19,\n            'dividend': 0.07,\n            'assets': 1600.0,\n            'equity': 300.0,\n            'cash': 150.0\n        })\n"
  },
  {
    "path": "pystock_crawler/tests/test_spiders_nasdaq.py",
    "content": "from scrapy.http import TextResponse\n\nfrom pystock_crawler.spiders.nasdaq import NasdaqSpider\nfrom pystock_crawler.tests.base import TestCaseBase\n\n\nclass NasdaqSpiderTest(TestCaseBase):\n\n    def test_parse(self):\n        spider = NasdaqSpider()\n\n        body = ('\"Symbol\",\"Name\",\"Doesnt Matter\",\\n'\n                '\"DDD\",\"3D Systems Corporation\",\"50.5\",\\n'\n                '\"VNO\",\"Vornado Realty Trust\",\"103.5\",\\n'\n                '\"VNO^G\",\"Vornado Realty Trust\",\"25.21\",\\n'\n                '\"WBS\",\"Webster Financial Corporation\",\"29.71\",\\n'\n                '\"WBS/WS\",\"Webster Financial Corporation\",\"13.07\",\\n'\n                '\"AAA-A\",\"Some Fake Company\",\"1234.0\",')\n        response = TextResponse('http://www.nasdaq.com/dummy_url', body=body)\n        items = list(spider.parse(response))\n\n        self.assertEqual(len(items), 3)\n        self.assert_item(items[0], {\n            'symbol': 'DDD',\n            'name': '3D Systems Corporation'\n        })\n        self.assert_item(items[1], {\n            'symbol': 'VNO',\n            'name': 'Vornado Realty Trust'\n        })\n        self.assert_item(items[2], {\n            'symbol': 'WBS',\n            'name': 'Webster Financial Corporation'\n        })\n"
  },
  {
    "path": "pystock_crawler/tests/test_spiders_yahoo.py",
    "content": "import os\nimport tempfile\n\nfrom scrapy.http import TextResponse\n\nfrom pystock_crawler.spiders.yahoo import make_url, YahooSpider\nfrom pystock_crawler.tests.base import TestCaseBase\n\n\nclass MakeURLTest(TestCaseBase):\n\n    def test_no_dates(self):\n        self.assertEqual(make_url('YHOO'), (\n            'http://ichart.finance.yahoo.com/table.csv?'\n            's=YHOO&d=&e=&f=&g=d&a=&b=&c=&ignore=.csv'\n        ))\n\n    def test_only_start_date(self):\n        self.assertEqual(make_url('GOOG', start_date='20131122'), (\n            'http://ichart.finance.yahoo.com/table.csv?'\n            's=GOOG&d=&e=&f=&g=d&a=10&b=22&c=2013&ignore=.csv'\n        ))\n\n    def test_only_end_date(self):\n        self.assertEqual(make_url('AAPL', end_date='20131122'), (\n            'http://ichart.finance.yahoo.com/table.csv?'\n            's=AAPL&d=10&e=22&f=2013&g=d&a=&b=&c=&ignore=.csv'\n        ))\n\n    def test_start_and_end_dates(self):\n        self.assertEqual(make_url('TSLA', start_date='20120305', end_date='20131122'), (\n            'http://ichart.finance.yahoo.com/table.csv?'\n            's=TSLA&d=10&e=22&f=2013&g=d&a=2&b=5&c=2012&ignore=.csv'\n        ))\n\n\nclass YahooSpiderTest(TestCaseBase):\n\n    def test_empty_creation(self):\n        spider = YahooSpider()\n        self.assertEqual(list(spider.start_urls), [])\n\n    def test_inline_symbols(self):\n        spider = YahooSpider(symbols='C')\n        self.assertEqual(list(spider.start_urls), [make_url('C')])\n\n        spider = YahooSpider(symbols='KO,DIS,ATVI')\n        self.assertEqual(list(spider.start_urls), [\n            make_url(symbol) for symbol in ('KO', 'DIS', 'ATVI')\n        ])\n\n    def test_symbol_file(self):\n        try:\n            # Create a mock file of a list of symbols\n            with tempfile.NamedTemporaryFile('w', delete=False) as f:\n                f.write('# Comment\\nGOOG\\tGoogle Inc.\\nAAPL\\nFB  Facebook.com\\n#comment\\nAMZN\\n')\n\n            
spider = YahooSpider(symbols=f.name)\n            self.assertEqual(list(spider.start_urls), [\n                make_url(symbol) for symbol in ('GOOG', 'AAPL', 'FB', 'AMZN')\n            ])\n        finally:\n            os.remove(f.name)\n\n    def test_illegal_dates(self):\n        with self.assertRaises(ValueError):\n            YahooSpider(startdate='12345678')\n\n        with self.assertRaises(ValueError):\n            YahooSpider(enddate='12345678')\n\n    def test_parse(self):\n        spider = YahooSpider()\n\n        body = ('Date,Open,High,Low,Close,Volume,Adj Close\\n'\n                '2013-11-22,121.58,122.75,117.93,121.38,11096700,121.38\\n'\n                '2013-09-06,168.57,169.70,165.15,166.97,8619700,166.97\\n'\n                '2013-06-26,103.80,105.87,102.66,105.72,6602600,105.72\\n')\n        response = TextResponse(make_url('YHOO'), body=body)\n        items = list(spider.parse(response))\n\n        self.assertEqual(len(items), 3)\n        self.assert_item(items[0], {\n            'symbol': 'YHOO',\n            'date': '2013-11-22',\n            'open': 121.58,\n            'high': 122.75,\n            'low': 117.93,\n            'close': 121.38,\n            'volume': 11096700,\n            'adj_close': 121.38\n        })\n        self.assert_item(items[1], {\n            'symbol': 'YHOO',\n            'date': '2013-09-06',\n            'open': 168.57,\n            'high': 169.70,\n            'low': 165.15,\n            'close': 166.97,\n            'volume': 8619700,\n            'adj_close': 166.97\n        })\n        self.assert_item(items[2], {\n            'symbol': 'YHOO',\n            'date': '2013-06-26',\n            'open': 103.80,\n            'high': 105.87,\n            'low': 102.66,\n            'close': 105.72,\n            'volume': 6602600,\n            'adj_close': 105.72\n        })\n"
  },
  {
    "path": "pystock_crawler/tests/test_utils.py",
    "content": "import cStringIO\nimport os\n\nfrom pystock_crawler import utils\nfrom pystock_crawler.tests.base import SAMPLE_DATA_DIR, TestCaseBase\n\n\nclass UtilsTest(TestCaseBase):\n\n    def test_check_date_arg(self):\n        utils.check_date_arg('19830305')\n        utils.check_date_arg('19851122')\n        utils.check_date_arg('19980720')\n        utils.check_date_arg('20140212')\n\n        # OK to pass an empty argument\n        utils.check_date_arg('')\n\n        with self.assertRaises(ValueError):\n            utils.check_date_arg('1234')\n\n        with self.assertRaises(ValueError):\n            utils.check_date_arg('2014111')\n\n        with self.assertRaises(ValueError):\n            utils.check_date_arg('20141301')\n\n        with self.assertRaises(ValueError):\n            utils.check_date_arg('20140132')\n\n    def test_parse_limit_arg(self):\n        self.assertEqual(utils.parse_limit_arg(''), (0, None))\n        self.assertEqual(utils.parse_limit_arg('11,22'), (11, 22))\n\n        with self.assertRaises(ValueError):\n            utils.parse_limit_arg('11,22,33')\n\n        with self.assertRaises(ValueError):\n            utils.parse_limit_arg('abc')\n\n    def test_load_symbols(self):\n        try:\n            filename = os.path.join(SAMPLE_DATA_DIR, 'test_symbols.txt')\n            with open(filename, 'w') as f:\n                f.write('AAPL Apple Inc.\\nGOOG\\tGoogle Inc.\\n# Comment\\nFB\\nTWTR\\nAMZN\\nSPY\\n\\nYHOO\\n# The end\\n')\n\n            symbols = list(utils.load_symbols(filename))\n            self.assertEqual(symbols, ['AAPL', 'GOOG', 'FB', 'TWTR', 'AMZN', 'SPY', 'YHOO'])\n        finally:\n            os.remove(filename)\n\n    def test_parse_csv(self):\n        f = cStringIO.StringIO('name,age\\nAvon,30\\nOmar,29\\nJoe,45\\n')\n        items = list(utils.parse_csv(f))\n        self.assertEqual(items, [\n            { 'name': 'Avon', 'age': '30' },\n            { 'name': 'Omar', 'age': '29' },\n            { 'name': 'Joe', 
'age': '45' }\n        ])\n"
  },
  {
    "path": "pystock_crawler/throttle.py",
    "content": "import logging\n\nfrom scrapy.exceptions import NotConfigured\nfrom scrapy import signals\n\n\nclass PassiveThrottle(object):\n    '''\n    Scrapy's AutoThrottle adds too much download delay on edgar spider, making\n    it too slow.\n\n    PassiveThrottle takes a more \"passive\" approach. It adds download delay\n    only if there is an error response.\n\n    '''\n    def __init__(self, crawler):\n        self.crawler = crawler\n        if not crawler.settings.getbool('PASSIVETHROTTLE_ENABLED'):\n            raise NotConfigured\n\n        self.debug = crawler.settings.getbool(\"PASSIVETHROTTLE_DEBUG\")\n        self.stats = crawler.stats\n        crawler.signals.connect(self._spider_opened, signal=signals.spider_opened)\n        crawler.signals.connect(self._response_downloaded, signal=signals.response_downloaded)\n\n    @classmethod\n    def from_crawler(cls, crawler):\n        return cls(crawler)\n\n    def _spider_opened(self, spider):\n        self.mindelay = self._min_delay(spider)\n        self.maxdelay = self._max_delay(spider)\n        self.retry_http_codes = self._retry_http_codes()\n\n        self.stats.set_value('delay_count', 0)\n\n    def _min_delay(self, spider):\n        s = self.crawler.settings\n        return getattr(spider, 'download_delay', 0.0) or \\\n            s.getfloat('DOWNLOAD_DELAY')\n\n    def _max_delay(self, spider):\n        return self.crawler.settings.getfloat('PASSIVETHROTTLE_MAX_DELAY', 60.0)\n\n    def _retry_http_codes(self):\n        return self.crawler.settings.getlist('RETRY_HTTP_CODES', [])\n\n    def _response_downloaded(self, response, request, spider):\n        key, slot = self._get_slot(request, spider)\n        if slot is None:\n            return\n\n        olddelay = slot.delay\n        self._adjust_delay(slot, response)\n        if self.debug:\n            diff = slot.delay - olddelay\n            conc = len(slot.transferring)\n            msg = \"slot: %s | conc:%2d | delay:%5d ms (%+d)\" % \\\n    
              (key, conc, slot.delay * 1000, diff * 1000)\n            spider.log(msg, level=logging.INFO)\n\n    def _get_slot(self, request, spider):\n        key = request.meta.get('download_slot')\n        return key, self.crawler.engine.downloader.slots.get(key)\n\n    def _adjust_delay(self, slot, response):\n        \"\"\"Define delay adjustment policy\"\"\"\n        if response.status in self.retry_http_codes:\n            new_delay = max(slot.delay, 1) * 4\n            new_delay = max(new_delay, self.mindelay)\n            new_delay = min(new_delay, self.maxdelay)\n            slot.delay = new_delay\n            self.stats.inc_value('delay_count')\n        elif response.status == 200:\n            new_delay = max(slot.delay / 2, self.mindelay)\n            if new_delay < 0.01:\n                new_delay = 0\n            slot.delay = new_delay\n"
  },
  {
    "path": "pystock_crawler/utils.py",
    "content": "import csv\n\nfrom datetime import datetime\n\n\ndef check_date_arg(value, arg_name=None):\n    if value:\n        try:\n            if len(value) != 8:\n                raise ValueError\n            datetime.strptime(value, '%Y%m%d')\n        except ValueError:\n            raise ValueError(\"Option '%s' must be in YYYYMMDD format, input is '%s'\" % (arg_name, value))\n\n\ndef parse_limit_arg(value):\n    if value:\n        tokens = value.split(',')\n        try:\n            if len(tokens) != 2:\n                raise ValueError\n            return int(tokens[0]), int(tokens[1])\n        except ValueError:\n            raise ValueError(\"Option 'limit' must be in START,COUNT format, input is '%s'\" % value)\n    return 0, None\n\n\ndef load_symbols(file_path):\n    symbols = []\n    with open(file_path) as f:\n        for line in f:\n            line = line.strip()\n            if line and not line.startswith('#'):\n                symbol = line.split()[0]\n                symbols.append(symbol)\n    return symbols\n\n\ndef parse_csv(file_like):\n    reader = csv.reader(file_like)\n    headers = reader.next()\n    for row in reader:\n        item = {}\n        for i, value in enumerate(row):\n            header = headers[i]\n            item[header] = value\n        yield item\n"
  },
  {
    "path": "pytest.ini",
    "content": "[pytest]\naddopts = --cov-report term-missing --cov pystock_crawler --cov bin pystock_crawler/tests/\n"
  },
  {
    "path": "requirements-test.txt",
    "content": "envoy\npytest\npytest-cov\nrequests\n"
  },
  {
    "path": "requirements.txt",
    "content": "docopt==0.6.2\nleveldb==0.193\nScrapy==0.24.4\nservice-identity==1.0.0\n"
  },
  {
    "path": "scrapy.cfg",
    "content": "# Automatically created by: scrapy startproject\n#\n# For more information about the [deploy] section see:\n# http://doc.scrapy.org/en/latest/topics/scrapyd.html\n\n[settings]\ndefault = pystock_crawler.settings\n\n[deploy]\n#url = http://localhost:6800/\nproject = pystock_crawler\n"
  },
  {
    "path": "setup.py",
    "content": "try:\n    from setuptools import setup\nexcept ImportError:\n    from distutils.core import setup\n\nimport codecs\nimport os\nimport re\n\n\nhere = os.path.abspath(os.path.dirname(__file__))\n\n\n# Read the version number from a source file.\n# Why read it, and not import?\n# see https://groups.google.com/d/topic/pypa-dev/0PkjVpcxTzQ/discussion\ndef find_version(*file_paths):\n    # Open in Latin-1 so that we avoid encoding errors.\n    # Use codecs.open for Python 2 compatibility\n    with codecs.open(os.path.join(here, *file_paths), 'r', 'latin1') as f:\n        version_file = f.read()\n\n    # The version line must have the form\n    # __version__ = 'ver'\n    version_match = re.search(r\"^__version__ = ['\\\"]([^'\\\"]*)['\\\"]\", version_file, re.M)\n    if version_match:\n        return version_match.group(1)\n    raise RuntimeError('Unable to find version string')\n\n\ndef read_description(filename):\n    with codecs.open(filename, encoding='utf-8') as f:\n        return f.read()\n\n\ndef parse_requirements(filename):\n    with open(filename) as f:\n        content = f.read()\n    return filter(lambda x: x and not x.startswith('#'), content.splitlines())\n\n\nsetup(\n    name='pystock-crawler',\n    version=find_version('pystock_crawler', '__init__.py'),\n    url='https://github.com/eliangcs/pystock-crawler',\n    description='Crawl and parse stock historical data',\n    long_description=read_description('README.rst'),\n    author='Chang-Hung Liang',\n    author_email='eliang.cs@gmail.com',\n    license='MIT',\n    packages=['pystock_crawler', 'pystock_crawler.spiders'],\n    scripts=['bin/pystock-crawler'],\n    install_requires=parse_requirements('requirements.txt'),\n    classifiers=[\n        'Development Status :: 3 - Alpha',\n        'Environment :: Console',\n        'Intended Audience :: Developers',\n        'Intended Audience :: Financial and Insurance Industry',\n        'License :: OSI Approved :: MIT License',\n        
'Operating System :: OS Independent',\n        'Programming Language :: Python',\n        'Programming Language :: Python :: 2.7',\n        'Topic :: Internet :: WWW/HTTP',\n        'Topic :: Office/Business :: Financial :: Investment',\n        'Topic :: Software Development :: Libraries :: Python Modules'\n    ]\n)\n"
  }
]