Repository: kljensen/async-flask-sqlalchemy-example
Branch: master
Commit: 345562b0b602
Files: 6
Total size: 11.5 KB

Directory structure:
gitextract_qzvrx_0j/
├── .gitignore
├── README.md
├── client.py
├── config.py
├── requirements.txt
└── server.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*.pyc

================================================
FILE: README.md
================================================
Async, non-blocking Flask & SQLAlchemy example
==============================================

> [!WARNING]
> This code is really old at this point. Use it for edification but not production!

## Overview

This code shows how to use the following menagerie of components together in a
completely non-blocking manner:

* [Flask](http://flask.pocoo.org/), for the web application framework;
* [SQLAlchemy](http://www.sqlalchemy.org/), for the object relational mapper
  (via [Flask-SQLAlchemy](https://github.com/mitsuhiko/flask-sqlalchemy));
* [Postgresql](http://www.postgresql.org/), for the database;
* [Psycopg2](http://initd.org/psycopg/), for the SQLAlchemy-Postgresql adapter;
* [Gunicorn](http://gunicorn.org/), for the WSGI server; and,
* [Gevent](http://www.gevent.org/), for the networking library.

The file `server.py` defines a small Flask application that has two routes: one
that triggers a `time.sleep(5)` in Python and one that triggers a `pg_sleep(5)`
in Postgres. Both of these sleeps are normally blocking operations. By running
the server using the Gevent worker for Gunicorn, we can make the Python sleep
non-blocking. By configuring Psycopg2's co-routine support (via
[psycogreen](https://bitbucket.org/dvarrazzo/psycogreen)) we can make the
Postgres sleep non-blocking.
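In miniature, the bootstrap order that `server.py` uses looks like the sketch below (a configuration sketch, assuming `gevent` and `psycogreen` are installed). The ordering matters: the gevent patch must run before any sockets or threads are created, and the psycopg2 patch must run before SQLAlchemy opens its first connection.

```python
# Sketch of the "green" bootstrap order. Run this before Flask or
# SQLAlchemy create any connections. Assumes gevent and psycogreen
# are installed (see requirements.txt).
from gevent import monkey
monkey.patch_all()  # makes time.sleep() and socket I/O cooperative

from psycogreen.gevent import patch_psycopg
patch_psycopg()     # registers a gevent wait callback inside psycopg2
```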
## Installation

Clone the repo:

    git clone https://github.com/kljensen/async-flask-sqlalchemy-example.git

Install the requirements:

    pip install -r requirements.txt

Make sure you've got the required database:

    createdb fsppgg_test

Create the required tables in this database:

    python ./server.py -c

## Running the code

You can test three situations with this code:

* Gunicorn blocking with SQLAlchemy/Psycopg2 blocking;
* Gunicorn non-blocking with SQLAlchemy/Psycopg2 blocking; and,
* Gunicorn non-blocking with SQLAlchemy/Psycopg2 non-blocking.

### Gunicorn blocking with SQLAlchemy blocking

Run the server (which is the Flask application) like

    gunicorn server:app

Then, in a separate shell, run the client like

    python ./client.py

You should see output like

    Sending 5 requests for http://localhost:8000/sleep/python/...
        @  5.05s got response [200]
        @ 10.05s got response [200]
        @ 15.07s got response [200]
        @ 20.07s got response [200]
        @ 25.08s got response [200]
        = 25.09s TOTAL
    Sending 5 requests for http://localhost:8000/sleep/postgres/...
        @  5.02s got response [200]
        @ 10.02s got response [200]
        @ 15.03s got response [200]
        @ 20.04s got response [200]
        @ 25.05s got response [200]
        = 25.05s TOTAL
    ------------------------------------------
    SUM TOTAL = 50.15s

### Gunicorn non-blocking with SQLAlchemy blocking

Run the server like

    gunicorn server:app -k gevent

and run the client again. You should see output like

    Sending 5 requests for http://localhost:8000/sleep/python/...
        @  5.05s got response [200]
        @  5.06s got response [200]
        @  5.06s got response [200]
        @  5.06s got response [200]
        @  5.07s got response [200]
        =  5.08s TOTAL
    Sending 5 requests for http://localhost:8000/sleep/postgres/...
        @  5.01s got response [200]
        @ 10.02s got response [200]
        @ 15.04s got response [200]
        @ 20.05s got response [200]
        @ 25.06s got response [200]
        = 25.06s TOTAL
    ------------------------------------------
    SUM TOTAL = 30.14s

### Gunicorn non-blocking with SQLAlchemy non-blocking

Run the server like

    PSYCOGREEN=true gunicorn server:app -k gevent

and run the client again. You should see output like

    Sending 5 requests for http://localhost:8000/sleep/python/...
        @  5.03s got response [200]
        @  5.03s got response [200]
        @  5.03s got response [200]
        @  5.04s got response [200]
        @  5.03s got response [200]
        =  5.04s TOTAL
    Sending 5 requests for http://localhost:8000/sleep/postgres/...
        @  5.02s got response [200]
        @  5.03s got response [200]
        @  5.03s got response [200]
        @  5.03s got response [200]
        @  5.03s got response [200]
        =  5.03s TOTAL
    ------------------------------------------
    SUM TOTAL = 10.07s

## Warnings (I lied, it actually does block)

If you increase the number of requests made in `client.py`, you'll notice that
SQLAlchemy/Psycopg2 start to block again. Try, e.g.

    python ./client.py 100

when running the server in fully non-blocking mode. You'll notice the
`/sleep/postgres/` responses come back in sets of 15. (Well, probably 15; you
could have your environment configured differently than I do.) This is because
SQLAlchemy uses
[connection pooling](http://docs.sqlalchemy.org/en/latest/core/pooling.html)
and, by default, the
[QueuePool](http://docs.sqlalchemy.org/en/latest/core/pooling.html#sqlalchemy.pool.QueuePool),
which limits the number of connections to some configuration parameter
`pool_size` plus a possible "burst" of `max_overflow`. (If you're using the
[Flask-SQLAlchemy](https://github.com/mitsuhiko/flask-sqlalchemy) extension,
`pool_size` is set by your Flask app's configuration variable
`SQLALCHEMY_POOL_SIZE`. It is 5 by default. `max_overflow` is 10 by default and
cannot be specified by a Flask configuration variable; you need to set it on
the pool yourself.)
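The arithmetic behind those "sets of 15" can be illustrated with a toy model. This is only an illustration of the `pool_size + max_overflow` limit, not SQLAlchemy's actual `QueuePool` implementation (which blocks the caller rather than returning `None`).

```python
class ToyQueuePool(object):
    """Toy model of QueuePool's limit: at most pool_size + max_overflow
    connections may be checked out at once. Illustration only."""

    def __init__(self, pool_size=5, max_overflow=10):
        self.pool_size = pool_size
        self.max_overflow = max_overflow
        self.checked_out = 0

    def connect(self):
        # The real pool would block here; we return None instead.
        if self.checked_out >= self.pool_size + self.max_overflow:
            return None
        self.checked_out += 1
        return object()


# With the Flask-SQLAlchemy defaults (pool_size=5, max_overflow=10),
# only 15 of 100 concurrent requests get a connection immediately.
pool = ToyQueuePool(pool_size=5, max_overflow=10)
conns = [pool.connect() for _ in range(100)]
granted = sum(1 for c in conns if c is not None)
print(granted)  # 15 -- the rest would wait for a connection
```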
Once you get over `pool_size + max_overflow` needed connections, the
SQLAlchemy operations will block. You can get around this by disabling pooling
via
[SQLAlchemy's NullPool](http://docs.sqlalchemy.org/en/latest/core/pooling.html#sqlalchemy.pool.NullPool);
however, you probably don't want to do that for two reasons.

1. Postgresql has a configuration parameter `max_connections` that, drumroll,
   limits the number of connections. If `pool_size + max_overflow` exceeds
   `max_connections`, any new connection requests will be declined by your
   Postgresql instance. Each unique connection will cause Postgresql to use a
   non-trivial amount of RAM. Therefore, unless you have a ton of RAM, you
   should keep `max_connections` at some reasonable value.
2. If you used the `NullPool`, you'd create a new TCP connection every time
   you use SQLAlchemy to talk to the database. Thus, you'd incur the overhead
   associated with the TCP handshake, etc.

So, in effect, the concurrency for Postgresql operations is always limited by
`max_connections` and how much RAM you have.

## Results

Stuff gets faster, shizzle works fine. Your mileage may vary in production.

## License (MIT)

Copyright (c) 2013 Kyle L. Jensen (kljensen@gmail.com)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: client.py
================================================
import sys
import gevent
import time
from gevent import monkey
monkey.patch_all()
import urllib2


def fetch_url(url):
    """
    Fetch a URL and return the total amount of time required.
    """
    t0 = time.time()
    try:
        resp = urllib2.urlopen(url)
        resp_code = resp.code
    except urllib2.HTTPError, e:
        resp_code = e.code
    t1 = time.time()
    print("\t@ %5.2fs got response [%d]" % (t1 - t0, resp_code))
    return t1 - t0


def time_fetch_urls(url, num_jobs):
    """
    Fetch a URL `num_jobs` times in parallel and return the
    total amount of time required.
    """
    print("Sending %d requests for %s..." % (num_jobs, url))
    t0 = time.time()
    jobs = [gevent.spawn(fetch_url, url) for i in range(num_jobs)]
    gevent.joinall(jobs)
    t1 = time.time()
    print("\t= %5.2fs TOTAL" % (t1 - t0))
    return t1 - t0


if __name__ == '__main__':
    try:
        num_requests = int(sys.argv[1])
    except IndexError:
        num_requests = 5

    # Fetch the URL that blocks with a `time.sleep`
    t0 = time_fetch_urls("http://localhost:8000/sleep/python/", num_requests)

    # Fetch the URL that blocks with a `pg_sleep`
    t1 = time_fetch_urls("http://localhost:8000/sleep/postgres/", num_requests)

    print("------------------------------------------")
    print("SUM TOTAL = %.2fs" % (t0 + t1))


================================================
FILE: config.py
================================================
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://localhost/fsppgg_test'
SQLALCHEMY_ECHO = False
SECRET_KEY = '\xfb\x12\xdf\xa1@i\xd6>V\xc0\xbb\x8fp\x16#Z\x0b\x81\xeb\x16'
DEBUG = True


================================================
FILE: requirements.txt
================================================
Flask-SQLAlchemy==0.16
psycopg2==2.4.6
psycogreen==1.0
gevent==0.13.8
gunicorn==0.17.2


================================================
FILE: server.py
================================================
import sys
import os
import time
from flask import Flask, jsonify
from flask.ext.sqlalchemy import SQLAlchemy

# Optionally, set up psycopg2 & SQLAlchemy to be greenlet-friendly.
# Note: psycogreen does not really monkey patch psycopg2 in the
# manner that gevent monkey patches socket.
#
if "PSYCOGREEN" in os.environ:

    # Do our monkey patching
    #
    from gevent.monkey import patch_all
    patch_all()
    from psycogreen.gevent import patch_psycopg
    patch_psycopg()
    using_gevent = True
else:
    using_gevent = False

# Create our Flask app
#
app = Flask(__name__)
app.config.from_pyfile('config.py')

# Create our Flask-SQLAlchemy instance
#
db = SQLAlchemy(app)

if using_gevent:
    # Assuming that gevent monkey patched the builtin
    # threading library, we're likely good to use
    # SQLAlchemy's QueuePool, which is the default
    # pool class. However, we need to make it use
    # threadlocal connections
    #
    db.engine.pool._use_threadlocal = True


class Todo(db.Model):
    """
    Small example model just to show you that SQLAlchemy
    is doing everything it should be doing.
    """
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(60))
    done = db.Column(db.Boolean)
    priority = db.Column(db.Integer)

    def as_dict(self):
        """
        Return an individual Todo as a dictionary.
        """
        return {
            'id': self.id,
            'title': self.title,
            'done': self.done,
            'priority': self.priority
        }

    @classmethod
    def jsonify_all(cls):
        """
        Returns all Todo instances in a JSON Flask response.
        """
        return jsonify(todos=[todo.as_dict() for todo in cls.query.all()])


@app.route('/sleep/postgres/')
def sleep_postgres():
    """
    This handler asks Postgres to sleep for 5s and will
    block for 5s unless psycopg2 is set up (above) to be
    gevent-friendly.
    """
    db.session.execute('SELECT pg_sleep(5)')
    return Todo.jsonify_all()


@app.route('/sleep/python/')
def sleep_python():
    """
    This handler sleeps for 5s and will block for 5s
    unless gunicorn is using the gevent worker class.
    """
    time.sleep(5)
    return Todo.jsonify_all()


# Create the tables and populate them with some dummy data
#
def create_data():
    """
    A helper function to create our tables and some
    Todo objects.
    """
    db.create_all()
    todos = []
    for i in range(50):
        todo = Todo(
            title="Slave for the man {0}".format(i),
            done=(i % 2 == 0),
            priority=(i % 5)
        )
        todos.append(todo)
    db.session.add_all(todos)
    db.session.commit()


if __name__ == '__main__':
    if '-c' in sys.argv:
        create_data()
    else:
        app.run()