Repository: kljensen/async-flask-sqlalchemy-example
Branch: master
Commit: 345562b0b602
Files: 6
Total size: 11.5 KB

Directory structure:
gitextract_qzvrx_0j/
├── .gitignore
├── README.md
├── client.py
├── config.py
├── requirements.txt
└── server.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*.pyc

================================================
FILE: README.md
================================================
Async, non-blocking Flask & SQLAlchemy example
==============================================

> [!WARNING]
> This code is really old at this point. Use it for edification but not production!

## Overview

This code shows how to use the following menagerie of components together in a
completely non-blocking manner:

* [Flask](http://flask.pocoo.org/), for the web application framework;
* [SQLAlchemy](http://www.sqlalchemy.org/), for the object relational mapper
  (via [Flask-SQLAlchemy](https://github.com/mitsuhiko/flask-sqlalchemy));
* [Postgresql](http://www.postgresql.org/), for the database;
* [Psycopg2](http://initd.org/psycopg/), for the SQLAlchemy-Postgresql adapter;
* [Gunicorn](http://gunicorn.org/), for the WSGI server; and,
* [Gevent](http://www.gevent.org/), for the networking library.

The file `server.py` defines a small Flask application that has two routes: one
that triggers a `time.sleep(5)` in Python and one that triggers a `pg_sleep(5)`
in Postgres. Both of these sleeps are normally blocking operations. By running
the server using the Gevent worker for Gunicorn, we can make the Python sleep
non-blocking. By configuring Psycopg2's co-routine support (via
[psycogreen](https://bitbucket.org/dvarrazzo/psycogreen)) we can make the
Postgres sleep non-blocking.
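In miniature, the bootstrap order that `server.py` uses looks like the sketch below (a configuration sketch, assuming `gevent` and `psycogreen` are installed). The ordering matters: the gevent patch must run before any sockets or threads are created, and the psycopg2 patch must run before SQLAlchemy opens its first connection.

```python
# Sketch of the "green" bootstrap order. Run this before Flask or
# SQLAlchemy create any connections. Assumes gevent and psycogreen
# are installed (see requirements.txt).
from gevent import monkey
monkey.patch_all()  # makes time.sleep() and socket I/O cooperative

from psycogreen.gevent import patch_psycopg
patch_psycopg()     # registers a gevent wait callback inside psycopg2
```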
## Installation

Clone the repo:

    git clone https://github.com/kljensen/async-flask-sqlalchemy-example.git

Install the requirements:

    pip install -r requirements.txt

Make sure you've got the required database:

    createdb fsppgg_test

Create the required tables in this database:

    python ./server.py -c

## Running the code

You can test three situations with this code:

* Gunicorn blocking with SQLAlchemy/Psycopg2 blocking;
* Gunicorn non-blocking with SQLAlchemy/Psycopg2 blocking; and,
* Gunicorn non-blocking with SQLAlchemy/Psycopg2 non-blocking.

### Gunicorn blocking with SQLAlchemy blocking

Run the server (which is the Flask application) like

    gunicorn server:app

Then, in a separate shell, run the client like

    python ./client.py

You should see output like

    Sending 5 requests for http://localhost:8000/sleep/python/...
        @  5.05s got response [200]
        @ 10.05s got response [200]
        @ 15.07s got response [200]
        @ 20.07s got response [200]
        @ 25.08s got response [200]
        = 25.09s TOTAL
    Sending 5 requests for http://localhost:8000/sleep/postgres/...
        @  5.02s got response [200]
        @ 10.02s got response [200]
        @ 15.03s got response [200]
        @ 20.04s got response [200]
        @ 25.05s got response [200]
        = 25.05s TOTAL
    ------------------------------------------
    SUM TOTAL = 50.15s

### Gunicorn non-blocking with SQLAlchemy blocking

Run the server like

    gunicorn server:app -k gevent

and run the client again. You should see output like

    Sending 5 requests for http://localhost:8000/sleep/python/...
        @  5.05s got response [200]
        @  5.06s got response [200]
        @  5.06s got response [200]
        @  5.06s got response [200]
        @  5.07s got response [200]
        =  5.08s TOTAL
    Sending 5 requests for http://localhost:8000/sleep/postgres/...
        @  5.01s got response [200]
        @ 10.02s got response [200]
        @ 15.04s got response [200]
        @ 20.05s got response [200]
        @ 25.06s got response [200]
        = 25.06s TOTAL
    ------------------------------------------
    SUM TOTAL = 30.14s

### Gunicorn non-blocking with SQLAlchemy non-blocking

Run the server like

    PSYCOGREEN=true gunicorn server:app -k gevent

and run the client again. You should see output like

    Sending 5 requests for http://localhost:8000/sleep/python/...
        @  5.03s got response [200]
        @  5.03s got response [200]
        @  5.03s got response [200]
        @  5.04s got response [200]
        @  5.03s got response [200]
        =  5.04s TOTAL
    Sending 5 requests for http://localhost:8000/sleep/postgres/...
        @  5.02s got response [200]
        @  5.03s got response [200]
        @  5.03s got response [200]
        @  5.03s got response [200]
        @  5.03s got response [200]
        =  5.03s TOTAL
    ------------------------------------------
    SUM TOTAL = 10.07s

## Warnings (I lied, it actually does block)

If you increase the number of requests made in `client.py`, you'll notice that
SQLAlchemy/Psycopg2 start to block again. Try, e.g.

    python ./client.py 100

when running the server in fully non-blocking mode. You'll notice the
`/sleep/postgres/` responses come back in sets of 15. (Well, probably 15; you
could have your environment configured differently than I do.) This is because
SQLAlchemy uses
[connection pooling](http://docs.sqlalchemy.org/en/latest/core/pooling.html)
and, by default, the
[QueuePool](http://docs.sqlalchemy.org/en/latest/core/pooling.html#sqlalchemy.pool.QueuePool),
which limits the number of connections to some configuration parameter
`pool_size` plus a possible "burst" of `max_overflow`. (If you're using the
[Flask-SQLAlchemy](https://github.com/mitsuhiko/flask-sqlalchemy) extension,
`pool_size` is set by your Flask app's configuration variable
`SQLALCHEMY_POOL_SIZE`. It is 5 by default. `max_overflow` is 10 by default and
cannot be specified by a Flask configuration variable; you need to set it on
the pool yourself.)
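The arithmetic behind those "sets of 15" can be illustrated with a toy model. This is only an illustration of the `pool_size + max_overflow` limit, not SQLAlchemy's actual `QueuePool` implementation (which blocks the caller rather than returning `None`).

```python
class ToyQueuePool(object):
    """Toy model of QueuePool's limit: at most pool_size + max_overflow
    connections may be checked out at once. Illustration only."""

    def __init__(self, pool_size=5, max_overflow=10):
        self.pool_size = pool_size
        self.max_overflow = max_overflow
        self.checked_out = 0

    def connect(self):
        # The real pool would block here; we return None instead.
        if self.checked_out >= self.pool_size + self.max_overflow:
            return None
        self.checked_out += 1
        return object()


# With the Flask-SQLAlchemy defaults (pool_size=5, max_overflow=10),
# only 15 of 100 concurrent requests get a connection immediately.
pool = ToyQueuePool(pool_size=5, max_overflow=10)
conns = [pool.connect() for _ in range(100)]
granted = sum(1 for c in conns if c is not None)
print(granted)  # 15 -- the rest would wait for a connection
```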
Once you get over `pool_size + max_overflow` needed connections, the
SQLAlchemy operations will block. You can get around this by disabling pooling
via
[SQLAlchemy's NullPool](http://docs.sqlalchemy.org/en/latest/core/pooling.html#sqlalchemy.pool.NullPool);
however, you probably don't want to do that for two reasons.

1. Postgresql has a configuration parameter `max_connections` that, drumroll,
   limits the number of connections. If `pool_size + max_overflow` exceeds
   `max_connections`, any new connection requests will be declined by your
   Postgresql instance. Each unique connection will cause Postgresql to use a
   non-trivial amount of RAM. Therefore, unless you have a ton of RAM, you
   should keep `max_connections` at some reasonable value.
2. If you used the `NullPool`, you'd create a new TCP connection every time
   you use SQLAlchemy to talk to the database. Thus, you'd incur the overhead
   associated with the TCP handshake, etc.

So, in effect, the concurrency for Postgresql operations is always limited by
`max_connections` and how much RAM you have.

## Results

Stuff gets faster, shizzle works fine. Your mileage may vary in production.

## License (MIT)

Copyright (c) 2013 Kyle L. Jensen (kljensen@gmail.com)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: client.py
================================================
import sys
import gevent
import time
from gevent import monkey
monkey.patch_all()
import urllib2


def fetch_url(url):
    """
    Fetch a URL and return the total amount of time required.
    """
    t0 = time.time()
    try:
        resp = urllib2.urlopen(url)
        resp_code = resp.code
    except urllib2.HTTPError, e:
        resp_code = e.code
    t1 = time.time()
    print("\t@ %5.2fs got response [%d]" % (t1 - t0, resp_code))
    return t1 - t0


def time_fetch_urls(url, num_jobs):
    """
    Fetch a URL `num_jobs` times in parallel and return the
    total amount of time required.
    """
    print("Sending %d requests for %s..." % (num_jobs, url))
    t0 = time.time()
    jobs = [gevent.spawn(fetch_url, url) for i in range(num_jobs)]
    gevent.joinall(jobs)
    t1 = time.time()
    print("\t= %5.2fs TOTAL" % (t1 - t0))
    return t1 - t0


if __name__ == '__main__':
    try:
        num_requests = int(sys.argv[1])
    except IndexError:
        num_requests = 5

    # Fetch the URL that blocks with a `time.sleep`
    t0 = time_fetch_urls("http://localhost:8000/sleep/python/", num_requests)

    # Fetch the URL that blocks with a `pg_sleep`
    t1 = time_fetch_urls("http://localhost:8000/sleep/postgres/", num_requests)

    print("------------------------------------------")
    print("SUM TOTAL = %.2fs" % (t0 + t1))


================================================
FILE: config.py
================================================
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://localhost/fsppgg_test'
SQLALCHEMY_ECHO = False
SECRET_KEY = '\xfb\x12\xdf\xa1@i\xd6>V\xc0\xbb\x8fp\x16#Z\x0b\x81\xeb\x16'
DEBUG = True


================================================
FILE: requirements.txt
================================================
Flask-SQLAlchemy==0.16
psycopg2==2.4.6
psycogreen==1.0
gevent==0.13.8
gunicorn==0.17.2


================================================
FILE: server.py
================================================
import sys
import os
import time
from flask import Flask, jsonify
from flask.ext.sqlalchemy import SQLAlchemy

# Optionally, set up psycopg2 & SQLAlchemy to be greenlet-friendly.
# Note: psycogreen does not really monkey patch psycopg2 in the
# manner that gevent monkey patches socket.
#
if "PSYCOGREEN" in os.environ:

    # Do our monkey patching
    #
    from gevent.monkey import patch_all
    patch_all()
    from psycogreen.gevent import patch_psycopg
    patch_psycopg()
    using_gevent = True
else:
    using_gevent = False

# Create our Flask app
#
app = Flask(__name__)
app.config.from_pyfile('config.py')

# Create our Flask-SQLAlchemy instance
#
db = SQLAlchemy(app)

if using_gevent:
    # Assuming that gevent monkey patched the builtin
    # threading library, we're likely good to use
    # SQLAlchemy's QueuePool, which is the default
    # pool class. However, we need to make it use
    # threadlocal connections
    #
    db.engine.pool._use_threadlocal = True


class Todo(db.Model):
    """
    Small example model just to show you that SQLAlchemy
    is doing everything it should be doing.
    """
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(60))
    done = db.Column(db.Boolean)
    priority = db.Column(db.Integer)

    def as_dict(self):
        """
        Return an individual Todo as a dictionary.
        """
        return {
            'id': self.id,
            'title': self.title,
            'done': self.done,
            'priority': self.priority
        }

    @classmethod
    def jsonify_all(cls):
        """
        Returns all Todo instances in a JSON Flask response.
        """
        return jsonify(todos=[todo.as_dict() for todo in cls.query.all()])


@app.route('/sleep/postgres/')
def sleep_postgres():
    """
    This handler asks Postgres to sleep for 5s and will
    block for 5s unless psycopg2 is set up (above) to be
    gevent-friendly.
    """
    db.session.execute('SELECT pg_sleep(5)')
    return Todo.jsonify_all()


@app.route('/sleep/python/')
def sleep_python():
    """
    This handler sleeps for 5s and will block for 5s
    unless gunicorn is using the gevent worker class.
    """
    time.sleep(5)
    return Todo.jsonify_all()


# Create the tables and populate them with some dummy data
#
def create_data():
    """
    A helper function to create our tables and some
    Todo objects.
    """
    db.create_all()
    todos = []
    for i in range(50):
        todo = Todo(
            title="Slave for the man {0}".format(i),
            done=(i % 2 == 0),
            priority=(i % 5)
        )
        todos.append(todo)
    db.session.add_all(todos)
    db.session.commit()


if __name__ == '__main__':
    if '-c' in sys.argv:
        create_data()
    else:
        app.run()