[
  {
    "path": ".gitignore",
    "content": ".DS_Store\nnode_modules\nbower\n*.pyc\nstatic/img/bank/**/*.jpg\n.idea\nbank.db\n"
  },
  {
    "path": "README.md",
    "content": "# Overview\nVery simply, this project demonstrates how to match an image against a bank of pre-existing images. It contains a simple front end and an image bank. The Python implementation of the image bank can be easily adapted for other applications.\n\nThe image comparisons use [SURF: Speeded Up Robust Features](http://www.vision.ee.ethz.ch/~surf/eccv06.pdf), which is **scale, orientation, and to some degree affine invariant**.\n\nA common problem in managing large numbers of images is detecting *slight* duplicates. Using a library like OpenCV, which is widely available across platforms and languages, is a great way to detect these duplicates.\n\n![scale orientation invariant](http://i.imgur.com/nFASitk.gif)\n\n\n# Animated description\n\n![animation](http://i.cubeupload.com/8nVjdO.gif)\n\n\n# How it works\nTo add an image to the bank:\n- Compute SURF descriptors for the image\n- Append the descriptors to a \"mega matrix\" of pre-existing ones, making note of their position.\n\nTo look up an image:\n- Compute SURF descriptors for the image\n- Perform a knn search in the \"mega matrix\" for the SURF descriptors found above\n- For each match, if the two descriptors are within a certain distance threshold, increment a similarity value for that candidate image by 1. 
This creates an arbitrary similarity index.\n- Return the top results\n\n\nThe server is implemented using [flask](http://flask.pocoo.org/) and the front end uses [react](http://facebook.github.io/react/).\n\n\n# Install:\n## OSX\nYou need to install `opencv` and `imagemagick` first (todo: add links). `sqlite3` ships with Python's standard library, so it needs no separate install.\n```sh\npip install numpy\npip install flask\npip install wand\nnpm install\n```\n\n# Development:\ncompile front end\n`webpack`\n\nwatch for changes on front end\n`webpack --watch`\n\nrun server:\n`python server.py`\n\nwatch for changes on server:\nset `app.debug = True` in `server.py`\n**note: this is on by default**\n\n# Optimization:\n- The implementation is poorly optimized. There is a rudimentary attempt to distribute the \"mega matrix\" to take advantage of multiple cores. At any sort of scale, you probably want to look into some sort of distributed nearest neighbor search.\n\n- By default the server persists the bank data in `bank.db`, a simple SQLite database of pickled Python objects. This is merely for convenience between server restarts. While it is running, the server keeps everything in local memory.\n\n# Related projects:\n- [isk-daemon](https://github.com/ricardocabral/iskdaemon)\n\n# Notes:\n\n- Tested with around 200k images without issues.\n\n- This is only tested on OS X Mavericks. It shouldn't have any problems on Linux. It is completely untested on Windows.\n\n- [A sample dataset](http://www.vision.caltech.edu/Image_Datasets/Caltech256/). Untar it and POST every image to the server's `/bank` endpoint (the URL below assumes Flask's default address): `find <MY_DATASET_DIR> -name \"*.<IMAGE_EXTENSION>\" -exec curl -i -F file=@{} http://localhost:5000/bank \\;`\n\n\n# LICENSE\n**mineye** source code is released under the **MIT License**.\n\nThe **SURF and SIFT algorithms implemented by OpenCV are patented**. You will have to switch out the feature detector for something else.\n"
  },
  {
    "path": "img.py",
    "content": "import cv2\nimport numpy\nimport sqlite3\nimport pickle\nfrom datetime import datetime\n\n\n#max number of images in each matrix, for parallel processing\nDESC_MAX_LEN = 100000\n#sqlite db for persistence\nBANK_FILENAME = 'bank.db'\n\n'''\nnote the licensing issues with using SURF/SIFT, alternatives are FREAK, BRISK for\nfeature detection\n'''\ndef get_surf_des(filename):\n    f = cv2.imread(filename)\n    #hessian threshold 800; extended=False gives 64-element descriptors, not 128\n    surf = cv2.SURF(800, extended=False)\n    kp, des = surf.detectAndCompute(f, None)\n    return kp, des\n\ndef get_conn():\n    return sqlite3.connect(BANK_FILENAME)\n\nclass _img:\n    def __init__(self):\n        #imap records, per image, the range of rows its descriptors occupy\n        self.imap = []\n        #r is the total number of descriptor rows added so far\n        self.r = 0\n        self.descs = []\n        #algorithm=1 is FLANN's kd-tree index\n        index_params = dict(algorithm=1,trees=4)\n        self.flann = cv2.FlannBasedMatcher(index_params,dict())\n\n    def add_image(self, filename, des=None):\n        #use an identity check: == on a numpy array compares element-wise\n        if des is None:\n            kv, des = get_surf_des(filename)\n        self.imap.append({\n            'index_start' : self.r,\n            'index_end' : self.r + des.shape[0] - 1,\n            'file_name' : filename\n        })\n        self.r += des.shape[0]\n        #doing a vstack on every add is really slow, so just maintain a list and\n        #concatenate it into a numpy ndarray at match time. 
an optimization\n        #would be to do a numpy.vstack((self.descs, numpy.array(des))) where self.descs\n        #is a numpy.array\n        self.descs.append(des)\n\n    def match(self, filename, limit=20):\n        kp, to_match = get_surf_des(filename)\n        #stack the per-image descriptor arrays into one matrix\n        img_db = numpy.vstack(self.descs)\n        #this should be reversed, need to update distance calculation\n        matches = self.flann.knnMatch(img_db, to_match, k=4)\n        sim = dict()\n        for img in self.imap:\n            sim[img['file_name']] = 0\n        for i in xrange(0, len(matches)):\n            match = matches[i]\n            #ratio test: the best neighbor must be clearly closer than the second best\n            if match[0].distance < (.6 * match[1].distance):\n                #row i of img_db belongs to the image whose row range contains i\n                for img in self.imap:\n                    if img['index_start'] <= i and img['index_end'] >= i:\n                        sim[img['file_name']] += 1\n        return sim\n\n    def __len__(self):\n        return len(self.descs)\n\nclass img:\n    def __init__(self):\n        self.ims = [_img()]\n        self.count = 0\n\n    def get_count(self):\n        return self.count\n\n    def add_image(self, filename, des=None):\n        self.count += 1\n        self.ims[-1].add_image(filename, des=des)\n        #start a new chunk once the current one is full\n        if len(self.ims[-1]) > DESC_MAX_LEN:\n            self.ims.append(_img())\n\n    def match(self, filename, limit=20):\n        import multiprocessing.dummy\n        #thread pool, one task per chunk\n        p = multiprocessing.dummy.Pool(10)\n\n        def f(instance):\n            return instance.match(filename, limit=limit)\n\n        res = p.map(f, self.ims)\n        #release the worker threads once all chunks have been searched\n        p.close()\n        sim = dict((k,v) for d in res for (k,v) in d.items())\n        sorted_sim = sorted(sim.items(), key=lambda x:x[1], reverse=True)[0:limit]\n        sorted_sim = [{'image' : x[0], 'similarity' : x[1]} for x in sorted_sim]\n        #drop weak candidates\n        sorted_sim = filter(lambda x:x['similarity'] > 5, sorted_sim)\n        return sorted_sim\n\nclass persisted_img(img):\n    def __init__(self):\n        #optimization, should additionally wrap img once more instead, so it works 
without persistence\n        img.__init__(self)\n        with get_conn() as conn:\n            c = conn.cursor()\n            c.execute('''CREATE TABLE IF NOT EXISTS descs\n                        (filename, des,kp)\n                        ''')\n            conn.commit()\n            c.execute(\n                '''\n                SELECT filename,des\n                FROM descs\n            ''')\n            #reload every stored descriptor into memory\n            while True:\n                row = c.fetchone()\n                if not row:\n                    break\n                filename = row[0]\n                des = pickle.loads(str(row[1]))\n                print 'img.__init__: loading descriptor for file %s from db' % (filename)\n                #identity check: == on a numpy array compares element-wise\n                if des is None:\n                    print 'img.__init__: error loading descriptor for %s from db' % (filename)\n                    continue\n                self.add_image(filename, des=des)\n\n    def add_image(self, filename, des=None):\n        #des is None only for new images; descriptors loaded from the db are not re-inserted\n        if des is None:\n            kv, des = get_surf_des(filename)\n            with get_conn() as conn:\n                c = conn.cursor()\n                data = sqlite3.Binary(pickle.dumps(des, pickle.HIGHEST_PROTOCOL))\n                c.execute('''\n                    INSERT INTO descs(filename, des) VALUES (?,?)\n                    ''',\n                    [filename, data]\n                )\n                print 'INSERT  %s to db' % (filename)\n                conn.commit()\n        img.add_image(self, filename, des=des)\n\n\n"
  },
  {
    "path": "package.json",
    "content": "{\n  \"name\": \"similarity\",\n  \"version\": \"0.0.0\",\n  \"description\": \"\",\n  \"main\": \"webpack.config.js\",\n  \"dependencies\": {\n    \"jsx-loader\": \"~0.9.0\",\n    \"css-loader\": \"~0.6.12\",\n    \"less-loader\": \"~0.7.2\",\n    \"less\": \"~1.7.0\",\n    \"style-loader\": \"~0.6.3\",\n    \"envify\": \"~1.2.1\",\n    \"react\": \"~0.10.0\",\n    \"superagent\": \"~0.17.0\",\n    \"lodash\": \"~2.4.1\",\n    \"jquery\": \"~2.1.0\"\n  },\n  \"devDependencies\": {},\n  \"scripts\": {\n    \"test\": \"echo \\\"Error: no test specified\\\" && exit 1\"\n  },\n  \"author\": \"\",\n  \"license\": \"ISC\"\n}\n"
  },
  {
    "path": "pages/home.jsx",
    "content": "/** @jsx React.DOM */\n/*NPM includes*/\nvar _ = require('lodash');\nvar $ = require('jquery');\n\n/*React includes*/\nvar React = require(\"react/addons\");\nvar cx = React.addons.classSet;\n\n/*Styling*/\nrequire('./home.less');\n\nvar UploadForm = React.createClass({\n  handleClick : function(e){\n    var node = this.getDOMNode();\n    /*keep the callback local so the two forms do not share a global*/\n    var onComplete = this.props.onComplete;\n\n    e.preventDefault();\n    var formData = new FormData(node);\n    $.ajax({\n        type:$(node).attr('method'),\n        url: $(node).attr('action'),\n        data:formData,\n        cache:false,\n        contentType: false,\n        processData: false,\n        success:function(data){\n          onComplete(data);\n        },\n        error: function(data){\n          alert('error uploading!');\n        }\n    });\n  },\n  render : function(){\n    return (\n      <form method={this.props.method} action={this.props.action} encType='multipart/form-data' onSubmit={this.handleClick}>\n        <input type=\"file\" name=\"file\" />\n        <input type=\"submit\" value={this.props.value} />\n      </form>\n    );\n  }\n});\n\nvar View = React.createClass({\n  getInitialState : function(){\n    return {\n      bank : [],\n      count : 0,\n      related : []\n    };\n  },\n  refreshBank : function(){\n    $.getJSON('/bank', function(bank){\n      if(!bank || !bank.latest || !bank.count){\n        return;\n      }\n      this.setState({\n        bank : bank.latest,\n        count : bank.count\n      });\n    }.bind(this));\n  },\n  relatedImagesChanged : function(images){\n    if(!images || !images.length){\n      return;\n    }\n    images = JSON.parse(images);\n    this.setState({\n      related : images\n    });\n  },\n  componentDidMount : function(){\n    this.refreshBank();\n  },\n  render : function(){\n    var images = _.map(this.state.bank, function(image){\n      return (\n        <li key={image}>\n          <img src={image} className=\"thumb-100\" />\n     
   </li>\n      );\n    });\n    var related = _.map(this.state.related, function(image){\n      return (\n        <li key={image.image}>\n          <img src={image.image} className=\"thumb-100\" />\n          <h5>similarity : {image.similarity}</h5>\n        </li>\n      );\n    });\n    return (\n      <div>\n        <h1>Upload a file to check!</h1>\n        <UploadForm value=\"Upload\" method=\"POST\" action=\"/similar\" onComplete={ this.relatedImagesChanged }/>\n        <h1>Results</h1>\n        <p>Please upload a file to get some matches</p>\n        <ul className=\"thumbs\">\n          {related}\n        </ul>\n        <h1>Recent Images Uploaded</h1>\n        <ul className=\"thumbs\">\n          {images}\n        </ul>\n        <h3>Total Number of Images: {this.state.count} </h3>\n        <h1>Add a file to the bank!</h1>\n        <UploadForm value=\"Upload to bank\" method=\"POST\" action=\"/bank\" onComplete={ this.refreshBank }/>\n      </div>\n    )\n  }\n});\n\nReact.renderComponent(\n  View(),\n  document.getElementById('content')\n);\n"
  },
  {
    "path": "pages/home.less",
    "content": ".thumb-100{\n  width: 100px;\n  height: 100px;\n}\n\nul.thumbs{\n  >li{\n    display: inline-block;\n    margin-left: 1px;\n  }\n}\n"
  },
  {
    "path": "server.py",
    "content": "from os import listdir\nfrom os.path import isfile, join\nimport traceback\nimport json\nimport uuid\nimport re\nimport tempfile\nfrom flask import Flask, request\nimport wand.image\nimport wand.display\nimport wand.exceptions\napp = Flask(__name__)\n\n#local stuff\nfrom img import persisted_img\nim = persisted_img()\n\nBANK_PATH = 'static/img/bank'\nBANK_THUMB_PATH = join(BANK_PATH,'thumb')\nprint 'USING BANK PATH ' + BANK_PATH\nprint 'USING THUMB PATH ' + BANK_THUMB_PATH\n\ndef get_images(path):\n    #this isn't very robust, oh well\n    return filter(\n        lambda x : re.search('\\.(jpg|jpeg|png)', x.lower()) != None,\n        [join(path, f) for f in listdir(path) if isfile(join(path,f))]\n    )\n\ndef get_bank_images():\n    return get_images(BANK_PATH)\n\ndef get_thumb_images():\n    return get_images(BANK_THUMB_PATH)\n\n@app.route(\"/\")\ndef index():\n    return '''\n        <html>\n            <head>\n            </head>\n            <body>\n              <div id=\"content\"></div>\n              <script type=\"text/javascript\" src=\"/static/js/all.js\"></script>\n            </body>\n        </html>\n        '''\n\n@app.route('/similar', methods=['POST'])\ndef similar():\n    if request.method == 'POST':\n        file = request.files['file']\n        if file:\n            tmpfile = join(\n                tempfile.gettempdir(),\n                file.name\n            )\n            file.save(tmpfile)\n            #lol shitty\n            try:\n                with wand.image.Image(filename=tmpfile) as img:\n                    img.resize(256, 256)\n                    img.save(filename=tmpfile)\n                matches = im.match(tmpfile, limit=10)\n                return json.dumps(matches)\n            except:\n                traceback.print_exc()\n                pass\n    return '', 400\n\n@app.route('/bank', methods=['GET', 'POST'])\ndef bank():\n    if request.method == 'POST':\n        file = request.files['file']\n        print 
file\n        if file:\n            tmpfile = join(\n                tempfile.gettempdir(),\n                file.name\n            )\n            guid = str(uuid.uuid4().get_hex().upper()[0:12]) + '.jpg'\n            dstfile = join(\n                BANK_PATH,\n                guid\n            )\n            dstfile_thumb = join(\n                BANK_THUMB_PATH,\n                guid\n            )\n            file.save(tmpfile)\n            try:\n                with wand.image.Image(filename=tmpfile) as img:\n                    img.save(filename=dstfile)\n                    #will potentially produce some funny results with extremely wide/oblong images\n                    img.resize(256, 256)\n                    img.save(filename=dstfile_thumb)\n                    im.add_image(dstfile_thumb)\n            except wand.exceptions.MissingDelegateError:\n                return 'input is not a valid image', 500\n            return '', 200\n\n    elif request.method == 'GET':\n        limit = 10\n        try:\n            limit = int(request.args.get('limit', '10'))\n        except ValueError:\n            pass\n        #note, will spit back any non dir\n        files = get_bank_images()\n        return json.dumps({\n            'count' : im.get_count(),\n            'latest' : ['/'+f for f in files[0:limit]]\n            })\n    return '', 400\n\nif __name__ == \"__main__\":\n    #todo: toggle debug from config\n    app.debug = True\n    app.run()\n"
  },
  {
    "path": "static/img/.gitignore",
    "content": "*\n!.gitignore\n"
  },
  {
    "path": "static/js/.gitignore",
    "content": "all.js\n"
  },
  {
    "path": "webpack.config.js",
    "content": "module.exports = {\n  context: __dirname,\n  entry: './pages/home.jsx',\n  output: {\n    path: __dirname + '/static/js',\n    filename: 'all.js'\n  },\n  module: {\n    loaders: [\n      {test: /\\.jsx$/, loader: 'jsx-loader'},\n      {test: /\\.less$/, loader: 'style-loader!css-loader!less-loader'},\n      {test: /bower_components\\.*\\.js$/, loader: \"script-loader\"}\n    ]\n  }\n};\n"
  }
]