Repository: Pathgather/predictor Branch: master Commit: be866b424119 Files: 20 Total size: 88.7 KB Directory structure: gitextract_u972f2ab/ ├── .github/ │ └── workflows/ │ └── test.yml ├── .gitignore ├── Changelog.md ├── Gemfile ├── LICENSE ├── README.md ├── Rakefile ├── benchmark/ │ └── process.rb ├── docs/ │ └── READMEv1.md ├── lib/ │ ├── predictor/ │ │ ├── base.rb │ │ ├── distance.rb │ │ ├── input_matrix.rb │ │ ├── predictor.rb │ │ └── version.rb │ └── predictor.rb ├── predictor.gemspec └── spec/ ├── base_spec.rb ├── input_matrix_spec.rb ├── predictor_spec.rb └── spec_helper.rb ================================================ FILE CONTENTS ================================================ ================================================ FILE: .github/workflows/test.yml ================================================ name: Test on: [push, pull_request] jobs: test: runs-on: ${{ matrix.os }} strategy: fail-fast: false matrix: os: [ubuntu-18.04, ubuntu-20.04] ruby: [2.6, 2.7, 3.0] services: redis: image: redis options: >- --health-cmd "redis-cli ping" --health-interval 10s --health-timeout 5s --health-retries 5 ports: - 6379:6379 steps: - uses: actions/checkout@v2 - name: Set up Ruby ${{ matrix.ruby }} uses: ruby/setup-ruby@v1 with: bundler-cache: true ruby-version: ${{ matrix.ruby }} - name: Install dependencies run: bundle install - name: Run tests run: bundle exec rake ================================================ FILE: .gitignore ================================================ bin/ *.gem Gemfile.lock ext/Makefile ================================================ FILE: Changelog.md ================================================ # Predictor Changelog All notable changes to this project will be documented in this file. 
## [Unreleased]
### Changed
- Support rake version 11.0 or higher and rspec version 3.4.0 or higher
- Fix the title of the README
- Run the test suite with GitHub Actions
- Made it possible to run tests on ubuntu-18.04 and ubuntu-20.04
- Fix the homepage entry in predictor.gemspec

### **BREAKING CHANGES**
- Ruby 2.1 through 2.5 are no longer supported, as they have reached end of life

## [2.3.0] - 2014-09-06
- The logic for processing item similarities was ported to a Lua script. Use `Predictor.processing_technique(:lua)` to use the Lua script for all similarity calculations, or use `MyRecommender.processing_technique(:lua)` to use it for specific recommenders. It is substantially faster than the default (old) Ruby mechanism, but has the disadvantage of blocking the Redis server while it runs.
- An alternate method of calculating item similarities was added, which uses a ZUNIONSTORE across item sets. The results are similar to those achieved by using the Ruby or Lua scripts, but faster. Use `Predictor.processing_technique(:union)` to use the ZUNIONSTORE technique for all similarity calculations, or use `MyRecommender.processing_technique(:union)` to use it for specific recommenders.

## [2.2.0] - 2014-06-24
- The namespace used for keys in Redis is now configurable on a global or per-class basis. See the readme for more information. If you were overriding the redis_prefix instance method before, it is recommended that you use the new redis_prefix class method instead.
- Data stored in Redis is now namespaced by the class name of the recommender it is stored by. This change ensures that different recommenders with input matrices of the same name don't overwrite each others' data. After upgrading you'll need to either reindex your data in Redis or configure Predictor to use the naming system you were using before.
  If you were using the defaults before and you're not worried about matrix name collisions, you can mimic the old behavior with:

  ```ruby
  class MyRecommender
    include Predictor::Base
    redis_prefix [nil]
  end
  ```
- The #predictions_for method on recommenders now accepts a :boost option to give more weight to items with particular attributes. See the readme for more information.

## [2.1.0] - 2014-06-19
- The similarity limit now defaults to 128, instead of being unlimited. This is intended to save space in Redis. See the Readme for more information. It is strongly recommended that you run `ensure_similarity_limit_is_obeyed!` to shrink existing similarity sets.

## [2.0.0] - 2014-04-17
**Rewrite of 1.0.0 and contains several breaking changes!**

Version 1.0.0 (which really should have been 0.0.1) contained several issues that made compatibility with v2 not worth the trouble. This includes:

- In v1, similarities were cached per input_matrix, and Predictor::Base utilized those caches when determining similarities and predictions. This quickly ate up Redis memory with even a semi-large dataset, as each input_matrix had a significant memory requirement. v2 caches similarities at the root (Recommender::Base), which means you can add any number of input matrices with little impact on memory usage.
- Added the ability to limit the number of items stored in the similarity cache (via the 'limit_similarities_to' option). Now that similarities are cached at the root, this is possible and can greatly help memory usage.
- Removed bang methods from input_matrix (add_set!, add_single!, etc). These called process! for you previously, but since the cache is no longer kept at the input_matrix level, process! has to be called at the root (Recommender::Base).
- Bug fix: Fixed bug where a call to delete_item! on the input matrix didn't update the similarity cache.
- Other minor fixes.
================================================ FILE: Gemfile ================================================ source 'https://rubygems.org' gemspec ================================================ FILE: LICENSE ================================================ The MIT License (MIT) Copyright (c) 2014 Pathgather Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================ # Predictor Fast and efficient recommendations and predictions using Ruby & Redis. Developed by and used at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations to users. ![Test](https://github.com/nyagato-00/predictor/workflows/Test/badge.svg?branch=master) Originally forked and based on [Recommendify](https://github.com/paulasmuth/recommendify) by Paul Asmuth, so a huge thanks to him for his contributions to Recommendify. 
Predictor has been almost completely rewritten to:

* Be much, much more performant and efficient by using Redis for most logic.
* Provide item similarities such as "Users that read this book also read ..."
* Provide personalized predictions based on a user's past history, such as "You read these 10 books, so you might also like to read ..."

At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) or the [Sørensen-Dice coefficient](http://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) (the default is Jaccard) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)

Notice
---------------------
This is the readme for Predictor 2.0, which contains a few breaking changes from 1.0. The 1.0 readme can be found [here](https://github.com/Pathgather/predictor/blob/master/docs/READMEv1.md). See below for how to upgrade to 2.0.

Installation
---------------------
In your Gemfile:

```ruby
gem 'predictor'
```

Getting Started
---------------------
The first step is to configure Predictor with your Redis instance.

```ruby
# in config/initializers/predictor.rb
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])

# Or, to improve performance, use hiredis as your driver (you'll need to install the hiredis gem first)
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
```

Inputting Data
---------------------
Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.

Below, we're building a recommender to recommend courses based off of:

* Users that have taken a course.
  If 2 courses were taken by the same user, that's 3 times as important to us as the courses sharing the same topic. This will lead to sets like:
  * "user1" -> "course-1", "course-3"
  * "user2" -> "course-1", "course-4"
* Tags and their courses. This will lead to sets like:
  * "rails" -> "course-1", "course-2"
  * "microeconomics" -> "course-3", "course-4"
* Topics and their courses. This will lead to sets like:
  * "computer science" -> "course-1", "course-2"
  * "economics and finance" -> "course-3", "course-4"

```ruby
class CourseRecommender
  include Predictor::Base

  input_matrix :users, weight: 3.0
  input_matrix :tags, weight: 2.0
  input_matrix :topics, weight: 1.0, measure: :sorensen_coefficient # Use Sørensen-Dice instead of Jaccard
end
```

Now, we just need to update our matrices when courses are created, users take a course, topics are changed, etc:

```ruby
recommender = CourseRecommender.new

# Add a single course to topic-1's items. If topic-1 already exists as a set ID, this just adds course-1 to the set.
recommender.add_to_matrix!(:topics, "topic-1", "course-1")

# If your dataset is even remotely large, add_to_matrix! could take some time, as it must calculate the similarity scores
# for course-1 and other courses that share a set with course-1. If this is the case, use add_to_matrix and
# process the items at a more convenient time, perhaps in a background job.
recommender.topics.add_to_set("topic-1", "course-1", "course-2") # Same as recommender.add_to_matrix(:topics, "topic-1", "course-1", "course-2")
recommender.process_items!("course-1", "course-2")
```

As noted above, it's important to remember that if you don't use the bang method 'add_to_matrix!', you'll need to manually update your similarities. If your dataset is even remotely large, you'll probably want to do this:

* If you want to update the similarities for certain item(s):
````
recommender.process_items!(item1, item2, etc)
````
* If you want to update all similarities for all items:
````
recommender.process!
```` Retrieving Similarities and Recommendations --------------------- Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations! First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course. ```ruby recommender = CourseRecommender.new # Return all similarities for course-1 (ordered by most similar to least). recommender.similarities_for("course-1") # Need to paginate? Not a problem! Specify an offset and a limit recommender.similarities_for("course-1", offset: 10, limit: 10) # Gets similarities 11-20 # Want scores? recommender.similarities_for("course-1", with_scores: true) # Want to ignore a certain set of courses in similarities? recommender.similarities_for("course-1", exclusion_set: ["course-2"]) ``` The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem! ```ruby recommender = CourseRecommender.new # User has taken course-1 and course-2. Let's see what else they might like... recommender.predictions_for(item_set: ["course-1", "course-2"]) # Already have the set you need stored in an input matrix? In our case, we do (the users matrix stores the courses a user has taken), so we can just do: recommender.predictions_for("user-1", matrix_label: :users) # Paginate too! recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10) # Gimme some scores and ignore course-2....that course-2 is one sketchy fella recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["course-2"]) ``` Deleting Items --------------------- If your data is deleted from your persistent storage, you certainly don't want to recommend it to a user. To ensure that doesn't happen, simply call delete_from_matrix! 
with the individual matrix, or delete_item! if the item is completely gone:

```ruby
recommender = CourseRecommender.new

# User removed course-1 from topic-1, but course-1 still exists
recommender.delete_pair_from_matrix!(:topics, "topic-1", "course-1")

# User removed course-1 from all topics
recommender.delete_from_matrix!(:topics, "course-1")

# course-1 was permanently deleted
recommender.delete_item!("course-1")

# Something crazy has happened, so let's just start fresh and wipe out all previously stored similarities:
recommender.clean!
```

Limiting Similarities
---------------------
By default, Predictor caches 128 similarities for each item, since 128 is the maximum size at which the similarity sorted sets can be kept in a [memory-efficient format](http://redis.io/topics/memory-optimization). If you want to keep more similarities than that, and you don't mind using more memory, you can increase the similarity limit, like so:

```ruby
class CourseRecommender
  include Predictor::Base

  limit_similarities_to 500

  input_matrix :users, weight: 3.0
  input_matrix :tags, weight: 2.0
  input_matrix :topics, weight: 1.0
end
```

The memory penalty can be heavy, though. In our testing, similarity caches for 1,000 objects varied in size like so:

```
limit_similarities_to(128) # 8.5 MB (this is the default)
limit_similarities_to(129) # 22.74 MB
limit_similarities_to(500) # 76.72 MB
```

If you decide you need to store more than 128 similarities, you may want to see the Redis documentation linked above and consider increasing `zset-max-ziplist-entries` in your configuration.

Predictions fetched with the predictions_for call utilize the similarity caches, so if you're using predictions_for, make sure you set the limit high enough that intelligent predictions can be generated. If you aren't using predictions and are just using similarities, then feel free to set this to the maximum number of similarities you'd possibly want to show!
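The `zset-max-ziplist-entries` setting mentioned above lives in your Redis server configuration. A rough sketch (the values below are illustrative; match them to your chosen limit, and note that newer Redis releases rename these settings to `zset-max-listpack-entries` / `zset-max-listpack-value`):

```
# redis.conf: keep sorted sets of up to 500 entries in the compact encoding
# (the stock default for entries is 128)
zset-max-ziplist-entries 500
zset-max-ziplist-value 64
```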
You can also use `limit_similarities_to(nil)` to remove the limit entirely. This means if you have 10,000 items, and each item is somehow related to the other, you'll have 10,000 sets each with 9,999 items, which will run up your Redis bill quite quickly. Removing the limit is not recommended unless you're sure you know what you're doing. If at some point you decide to lower your similarity limits, you'll want to be sure to shrink the size of the sorted sets already in Redis. You can do this with `CourseRecommender.new.ensure_similarity_limit_is_obeyed!`. Boost --------------------- What if you want to recommend courses to users based not only on what courses they've taken, but on other attributes of courses that they may be interested in? You can do that by passing the :boost argument to predictions_for: ```ruby class CourseRecommender include Predictor::Base # Courses are compared to one another by the users taking them and their tags. input_matrix :users, weight: 3.0 input_matrix :tags, weight: 2.0 input_matrix :topics, weight: 2.0 end recommender = CourseRecommender.new # We want to find recommendations for Billy, who's told us that he's # especially interested in free, interactive courses on Photoshop. So, we give # a boost to courses that are tagged as free and interactive and have # Photoshop as a topic: recommender.predictions_for("Billy", matrix_label: :users, boost: {tags: ['free', 'interactive'], topics: ["Photoshop"]}) # We can also modify how much these tags and topics matter by specifying a # weight. The default is 1.0, but if that's too much we can just tweak it: recommender.predictions_for("Billy", matrix_label: :users, boost: {tags: {values: ['free', 'interactive'], weight: 0.4}, topics: {values: ["Photoshop"], weight: 0.3}}) ``` Key Prefixes --------------------- As of 2.2.0, there is much more control available over the format of the keys Predictor will use in Redis. 
By default, the CourseRecommender given as an example above will use keys like "predictor:CourseRecommender:users:items:user1". You can configure the global namespace like so:

```ruby
Predictor.redis_prefix 'my_namespace' # => "my_namespace:CourseRecommender:users:items:user1"

# Or, for a multitenanted setup:
Predictor.redis_prefix { "user-#{User.current.id}" } # => "user-7:CourseRecommender:users:items:user1"
```

You can also configure the namespace used by each class you create:

```ruby
class CourseRecommender
  include Predictor::Base

  redis_prefix "courses" # => "predictor:courses:users:items:user1"

  redis_prefix { "courses_for_user-#{User.current.id}" } # => "predictor:courses_for_user-7:users:items:user1"
end
```

You can also configure the namespace used by each instance you create, in addition to the class and global namespaces:

```ruby
class CourseRecommender
  include Predictor::Base

  def initialize(prefix)
    @prefix = prefix
  end

  # Simply override this instance method with the prefix you want
  def get_redis_prefix
    @prefix
  end
end

recommender = CourseRecommender.new("super")
recommender.redis_prefix # "predictor:CourseRecommender:super"
```

Processing Items
---------------------
As of 2.3.0, there are multiple techniques available for processing item similarities. You can choose between them by setting a global default like `Predictor.processing_technique(:lua)`, or by setting a technique for certain classes like `CourseRecommender.processing_technique(:union)`. There are three options:

- :ruby - This is the default, and is how Predictor calculated similarities before 2.3.0. With this technique the Jaccard and Sorensen calculations are performed in Ruby, with frequent calls to Redis to retrieve simple values. It is somewhat slow.
- :lua - This option performs the Jaccard and Sorensen calculations in a Lua script on the Redis server. It is substantially faster than the :ruby technique, but blocks the Redis server while each set of calculations is run.
The period of blocking will vary based on the size and disposition of your data, but each call may take up to several hundred milliseconds. If your application requires your Redis server to always return results quickly, and you're not able to simply run calculations during off-hours, you should use a different strategy. - :union - This option skips Jaccard and Sorensen entirely, and uses a simpler technique involving a ZUNIONSTORE across many item sets to calculate similarities. The results are different from, but similar to the results of using the Jaccard and Sorensen algorithms. It is even faster than the :lua option and does not have the same problem of blocking Redis for long periods of time, but before using it you should sample the output to ensure that it is good enough for your application. Predictor now contains a benchmarking script that you can use to compare the speed of these options. An example output from the processing of a relatively small dataset is: ``` ruby = 21.098 seconds lua = 2.106 seconds union = 0.741 seconds ``` Upgrading from 1.0 to 2.0 --------------------- As mentioned, 2.0.0 is quite a bit different than 1.0.0, so simply upgrading with no changes likely won't work. My apologies for this. I promise this won't happen in future releases, as I'm much more confident in this Predictor release than the last. Anywho, upgrading really shouldn't be that much of a pain if you follow these steps: * Change predictor.matrix.add_set! and predictor.matrix.add_single! calls to predictor.add_to_matrix!. For example: ```ruby # Change predictor.topics.add_single!("topic-1", "course-1") # to predictor.add_to_matrix!(:topics, "topic-1", "course-1") # Change predictor.tags.add_set!("tag-1", ["course-1", "course-2"]) # to predictor.add_to_matrix!(:tags, "tag-1", "course-1", "course-2") ``` * Change predictor.matrix.process! or predictor.matrix.process_item! calls to just predictor.process! or predictor.process_items! 
```ruby
# Change
predictor.topics.process_item!("course-1")
# to
predictor.process_items!("course-1")
```

* Change predictor.matrix.delete_item! calls to predictor.delete_from_matrix!. This will update similarities too, so you may want to queue this to run in a background job.

```ruby
# Change
predictor.topics.delete_item!("course-1")
# to delete_from_matrix! if you want to update similarities to account for the deleted item (in v1, this was a bug and didn't occur)
predictor.delete_from_matrix!(:topics, "course-1")
```

* Regenerate your recommendations, as the Redis keys have changed for Predictor 2. You can use recommender.clean! to clear out old similarities, then run your rake task (or whatever you've set up) to create new similarities.

About Pathgather
---------------------
Pathgather is an NYC-based startup building a platform that dramatically accelerates learning for enterprises by bringing employees, training content, and existing enterprise systems into one engaging platform. Every Friday, we work on open-source software (our own or other projects). Want to join our always-growing team? Peruse our [current opportunities](http://www.pathgather.com/jobs/) or reach out to us at !

Problems? Issues? Want to help out?
---------------------
Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!
The MIT License (MIT) --------------------- Copyright (c) 2014 Pathgather Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: Rakefile ================================================ require 'bundler/gem_tasks' require 'rspec/core/rake_task' RSpec::Core::RakeTask.new(:spec) task :default => :spec Dir["./benchmark/*.rb"].sort.each &method(:require) ================================================ FILE: benchmark/process.rb ================================================ namespace :benchmark do task :process do require 'predictor' require 'pry' require 'logger' Predictor.redis = Redis.new #logger: Logger.new(STDOUT) Predictor.redis_prefix "predictor-benchmark" def flush! keys = Predictor.redis.keys("predictor-benchmark*") Predictor.redis.del(keys) if keys.any? end class ItemRecommender include Predictor::Base input_matrix :users, weight: 2.0 input_matrix :parts, weight: 1.0 end flush! 
items = (1..200).map { |i| "item-#{i}" } users = (1..100).map { |i| "user-#{i}" } parts = (1..100).map { |i| "part-#{i}" } r = ItemRecommender.new start = Time.now users.each { |user| r.users.add_to_set user, *items.sample(40) } parts.each { |part| r.parts.add_to_set part, *items.sample(40) } elapsed = Time.now - start puts "add_to_set = #{elapsed.round(3)} seconds" [:ruby, :lua, :union].each do |technique| start = Time.now Predictor.processing_technique technique r.process! elapsed = Time.now - start puts "#{technique} = #{elapsed.round(3)} seconds" end flush! end end ================================================ FILE: docs/READMEv1.md ================================================ ======= Predictor ========= Fast and efficient recommendations and predictions using Ruby & Redis. Used in production over at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations to users. ![](https://www.codeship.io/projects/5aeeedf0-6053-0131-2319-5ede98f174ff/status) Originally forked and based on [Recommendify](https://github.com/paulasmuth/recommendify) by Paul Asmuth, so a huge thanks to him for his contributions to Recommendify. Predictor has been almost completely rewritten to * Be much, much more performant and efficient by using Redis for most logic. * Provide item similarities such as "Users that read this book also read ..." * Provide personalized predictions based on a user's past history, such as "You read these 10 books, so you might also like to read ..." At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) to determine similarities between items. 
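For intuition, the Jaccard index of two item sets can be sketched in plain Ruby. This is only an illustration of the formula (intersection size over union size); it is not Predictor's implementation, which performs this work in Redis:

```ruby
require 'set'

# Jaccard index of two collections: size of intersection over size of union.
def jaccard_index(a, b)
  a = a.to_set
  b = b.to_set
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Two courses taken by overlapping user sets: 2 shared users out of 4 distinct.
jaccard_index(%w[user1 user2 user3], %w[user2 user3 user4]) # => 0.5
```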
There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)

Installation
---------------------
```
gem install predictor
```
or in your Gemfile:
```ruby
gem 'predictor'
```

Getting Started
---------------------
The first step is to configure Predictor with your Redis instance.

```ruby
# in config/initializers/predictor.rb
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])

# Or, to improve performance, use hiredis as your driver (you'll need to install the hiredis gem first)
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
```

Inputting Data
---------------------
Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.

Below, we're building a recommender to recommend courses based off of:

* Users that have taken a course. If 2 courses were taken by the same user, that's 3 times as important to us as the courses sharing the same topic. This will lead to sets like:
  * "user1" -> "course-1", "course-3"
  * "user2" -> "course-1", "course-4"
* Tags and their courses. This will lead to sets like:
  * "rails" -> "course-1", "course-2"
  * "microeconomics" -> "course-3", "course-4"
* Topics and their courses. This will lead to sets like:
  * "computer science" -> "course-1", "course-2"
  * "economics and finance" -> "course-3", "course-4"

```ruby
class CourseRecommender
  include Predictor::Base

  input_matrix :users, weight: 3.0
  input_matrix :tags, weight: 2.0
  input_matrix :topics, weight: 1.0
end
```

Now, we just need to update our matrices when courses are created, users take a course, topics are changed, etc:

```ruby
recommender = CourseRecommender.new

# Add a single course to topic-1's items.
# If topic-1 already exists as a set ID, this just adds course-1 to the set.
recommender.topics.add_single!("topic-1", "course-1")

# If your matrix is quite large, add_single! could take some time, as it must calculate the similarity scores
# for course-1 across all other courses. If this is the case, use add_single and process the item at a more
# convenient time, perhaps in a background job.
recommender.topics.add_single("topic-1", "course-1")
recommender.topics.process_item!("course-1")

# Add an array of courses to tag-1. Again, these will simply be added to tag-1's existing set, if it exists.
# If not, the tag-1 set will be initialized with course-1 and course-2.
recommender.tags.add_set!("tag-1", ["course-1", "course-2"])

# Or, just add the set and process whenever you like
recommender.tags.add_set("tag-1", ["course-1", "course-2"])
["course-1", "course-2"].each { |course| recommender.topics.process_item!(course) }
```

As noted above, it's important to remember that if you don't use the bang methods (add_set! and add_single!), you'll need to manually update your similarities (the bang methods will likely suffice for most use cases, though). You can do so in a variety of ways:

* If you want to update the similarities for a single item in a specific matrix:
````
recommender.matrix.process_item!(item)
````
* If you want to update the similarities for all items in a specific matrix:
````
recommender.matrix.process!
````
* If you want to update the similarities for a single item in all matrices:
````
recommender.process_item!(item)
````
* If you want to update all similarities in all matrices:
````
recommender.process!
````

Retrieving Similarities and Recommendations
---------------------
Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations!
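Before diving in, it may help to see how the matrix weights shape the result: conceptually, an item's overall similarity score combines each per-matrix similarity scaled by that matrix's weight. The sketch below is purely illustrative (the score values are made up, and Predictor computes this inside Redis, not in Ruby like this):

```ruby
# Hypothetical per-matrix similarities between course-1 and course-2:
scores  = { users: 0.5, tags: 0.25, topics: 0.1 }
# The weights declared on CourseRecommender's input matrices:
weights = { users: 3.0, tags: 2.0, topics: 1.0 }

# Weighted combination used to rank course-2 among course-1's similar items:
combined = scores.sum { |matrix, score| score * weights[matrix] }
# => 2.1 (0.5 * 3.0 + 0.25 * 2.0 + 0.1 * 1.0)
```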
First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course. ![Course Alternative](http://pathgather.github.io/predictor/images/course-alts.png) ```ruby recommender = CourseRecommender.new # Return all similarities for course-1 (ordered by most similar to least). recommender.similarities_for("course-1") # Need to paginate? Not a problem! Specify an offset and a limit recommender.similarities_for("course-1", offset: 10, limit: 10) # Gets similarities 11-20 # Want scores? recommender.similarities_for("course-1", with_scores: true) # Want to ignore a certain set of courses in similarities? recommender.similarities_for("course-1", exclusion_set: ["course-2"]) ``` The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem! ![Course Recommendations](http://pathgather.github.io/predictor/images/suggested.png) ```ruby recommender = CourseRecommender.new # User has taken course-1 and course-2. Let's see what else they might like... recommender.predictions_for(item_set: ["course-1", "course-2"]) # Already have the set you need stored in an input matrix? In our case, we do (the users matrix stores the courses a user has taken), so we can just do: recommender.predictions_for("user-1", matrix_label: :users) # Paginate too! recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10) # Gimme some scores and ignore user-2....that user-2 is one sketchy fella recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["user-2"]) ``` Deleting Items --------------------- If your data is deleted from your persistent storage, you certainly don't want to recommend that data to a user. To ensure that doesn't happen, simply call delete_item! 
on the individual matrix or on the recommender as a whole:

```ruby
recommender = CourseRecommender.new

# User removed course-1 from topic-1, but course-1 still exists
recommender.topics.delete_item!("course-1")

# course-1 was permanently deleted
recommender.delete_item!("course-1")

# Something crazy has happened, so let's just start fresh and wipe out all previously stored similarities:
recommender.clean!
```

Memory Management
---------------------
Predictor works by caching the similarities for each item in each matrix, then computing overall similarities off those caches. With even a semi-large dataset, this can really eat up Redis's memory. To limit the number of similarities cached in each matrix, specify a similarity_limit option when defining the matrix.

```ruby
class CourseRecommender
  include Predictor::Base

  input_matrix :users, weight: 3.0, similarity_limit: 300
  input_matrix :tags, weight: 2.0, similarity_limit: 300
  input_matrix :topics, weight: 1.0, similarity_limit: 300
end
```

This will ensure that only the top 300 similarities for each item are cached in each matrix. This can greatly reduce your memory usage, and if you're just using Predictor for scenarios where you maybe show the top 5 or so similar items, it can be hugely helpful. But note, **don't set similarity_limit to 5 in that case**. This setting limits the similarities cached in each matrix, but does not limit the similarities for an item across all matrices. That is computed (and can be limited) on the fly, and uses the similarity cache in each matrix. So, you need a large enough cache in each matrix to determine an intelligent similarity list across all matrices.

*Note*: This is a bit of a hack, and there are most certainly other ways to improve Predictor's memory usage for large datasets, but each appears to require a more significant change than the trivial implementation of similarity_limit above.
PRs are quite welcome that experiment with these other ways :)

Oh, and if you decide to tinker with your limit to try and find a sweet spot, I added a helpful method that ensures limits are obeyed, so you can avoid regenerating all similarities. Of course, this only helps if you are decreasing the limit. If you're increasing it, you'll need to process all similarities over again.

```ruby
recommender.users.ensure_similarity_limit_is_obeyed!  # Remove similarities that disobey our current limit
recommender.tags.ensure_similarity_limit_is_obeyed!
recommender.topics.ensure_similarity_limit_is_obeyed!
```

Problems? Issues? Want to help out?
---------------------
Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!

The MIT License (MIT)
---------------------
Copyright (c) 2014 Pathgather

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
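One note before the library source that follows: the similarity scores Predictor caches come from simple set-overlap measures (the Jaccard index and the Sørensen coefficient), which `lib/predictor/distance.rb` computes inside Redis. As a reference, here are the same formulas in plain Ruby, operating on Ruby `Set`s instead of Redis sets (an illustrative sketch, not part of the library's API):

```ruby
require 'set'

# Jaccard index: |A ∩ B| / |A ∪ B|
def jaccard_index(a, b)
  union_size = (a | b).size
  union_size.zero? ? 0.0 : (a & b).size.to_f / union_size
end

# Sørensen coefficient: 2|A ∩ B| / (|A| + |B|)
def sorensen_coefficient(a, b)
  denom = a.size + b.size
  denom.zero? ? 0.0 : 2.0 * (a & b).size / denom
end

# Two courses, each represented by the set of users who took it.
course_1 = Set["user-1", "user-2", "user-3"]
course_2 = Set["user-2", "user-3", "user-4"]

jaccard_index(course_1, course_2)        # => 0.5    (2 shared / 4 total)
sorensen_coefficient(course_1, course_2) # => ~0.667 (2 * 2 / 6)
```

Both measures return 0.0 for disjoint or empty sets, matching the guard clauses in `Predictor::Distance`.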
================================================ FILE: lib/predictor/base.rb ================================================ module Predictor::Base def self.included(base) base.extend(ClassMethods) end module ClassMethods def input_matrix(key, opts={}) @matrices ||= {} @matrices[key] = opts end def limit_similarities_to(val) @similarity_limit_set = true @similarity_limit = val end def similarity_limit @similarity_limit_set ? @similarity_limit : 128 end def reset_similarity_limit! @similarity_limit_set = nil @similarity_limit = nil end def input_matrices=(val) @matrices = val end def input_matrices @matrices end def redis_prefix(prefix = nil, &block) @redis_prefix = block_given? ? block : prefix end def get_redis_prefix if @redis_prefix if @redis_prefix.respond_to?(:call) @redis_prefix.call else @redis_prefix end else to_s end end def processing_technique(technique) @technique = technique end def get_processing_technique @technique || Predictor.get_processing_technique end end def input_matrices @input_matrices ||= Hash[self.class.input_matrices.map{ |key, opts| opts.merge!(:key => key, :base => self) [ key, Predictor::InputMatrix.new(opts) ] }] end def get_redis_prefix nil # Override in subclass. end def redis_prefix [Predictor.get_redis_prefix, self.class.get_redis_prefix, self.get_redis_prefix].compact end def similarity_limit self.class.similarity_limit end def redis_key(*append) ([redis_prefix] + append).flatten.compact.join(":") end def method_missing(method, *args) if input_matrices.has_key?(method) input_matrices[method] else raise NoMethodError.new(method.to_s) end end def respond_to?(method, include_all = false) input_matrices.has_key?(method) ? 
true : super end def all_items Predictor.redis.smembers(redis_key(:all_items)) end def add_to_matrix(matrix, set, *items) items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax input_matrices[matrix].add_to_set(set, *items) end def add_to_matrix!(matrix, set, *items) items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax add_to_matrix(matrix, set, *items) process_items!(*items) end def related_items(item) keys = [] input_matrices.each do |key, matrix| sets = Predictor.redis.smembers(matrix.redis_key(:sets, item)) keys.concat(sets.map { |set| matrix.redis_key(:items, set) }) end keys.empty? ? [] : (Predictor.redis.sunion(keys) - [item.to_s]) end def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, on: nil, offset: 0, limit: -1, exclusion_set: [], boost: {}) fail "item_set or matrix_label and set is required" unless item_set || (matrix_label && set) on = Array(on) if matrix_label matrix = input_matrices[matrix_label] item_set = Predictor.redis.smembers(matrix.redis_key(:items, set)) end item_keys = [] weights = [] item_set.each do |item| item_keys << redis_key(:similarities, item) weights << 1.0 end boost.each do |matrix_label, values| m = input_matrices[matrix_label] # Passing plain sets to zunionstore is undocumented, but tested and supported: # https://github.com/antirez/redis/blob/2.8.11/tests/unit/type/zset.tcl#L481-L489 case values when Hash values[:values].each do |value| item_keys << m.redis_key(:items, value) weights << values[:weight] end when Array values.each do |value| item_keys << m.redis_key(:items, value) weights << 1.0 end else raise "Bad value for boost: #{boost.inspect}" end end return [] if item_keys.empty? predictions = nil Predictor.redis.multi do |multi| multi.zunionstore 'temp', item_keys, weights: weights multi.zrem 'temp', item_set if item_set.any? multi.zrem 'temp', exclusion_set if exclusion_set.length > 0 if on.any? 
multi.zadd 'temp2', on.map{ |val| [0.0, val] } multi.zinterstore 'temp', ['temp', 'temp2'] multi.del 'temp2' end predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores multi.del 'temp' end predictions.value end def similarities_for(item, with_scores: false, offset: 0, limit: -1, exclusion_set: []) neighbors = nil Predictor.redis.multi do |multi| multi.zunionstore 'temp', [1, redis_key(:similarities, item)] multi.zrem 'temp', exclusion_set if exclusion_set.length > 0 neighbors = multi.zrevrange('temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores) multi.del 'temp' end return neighbors.value end def sets_for(item) keys = input_matrices.map{ |k,m| m.redis_key(:sets, item) } Predictor.redis.sunion keys end def process_item!(item) process_items!(item) # Old method end def process_items!(*items) items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax case self.class.get_processing_technique when :lua matrix_data = {} input_matrices.each do |name, matrix| matrix_data[name] = {weight: matrix.weight, measure: matrix.measure_name} end matrix_json = JSON.dump(matrix_data) items.each do |item| Predictor.process_lua_script(redis_key, matrix_json, similarity_limit, item) end when :union items.each do |item| keys = [] weights = [] input_matrices.each do |key, matrix| k = matrix.redis_key(:sets, item) item_keys = Predictor.redis.smembers(k).map { |set| matrix.redis_key(:items, set) } counts = Predictor.redis.multi do |multi| item_keys.each { |key| Predictor.redis.scard(key) } end item_keys.zip(counts).each do |key, count| unless count.zero? keys << key weights << matrix.weight / count end end end Predictor.redis.multi do |multi| key = redis_key(:similarities, item) multi.del(key) if keys.any? 
multi.zunionstore(key, keys, weights: weights) multi.zrem(key, item) multi.zremrangebyrank(key, 0, -(similarity_limit + 1)) multi.zunionstore key, [key] # Rewrite zset for optimized storage. end end end else # Default to old behavior, processing things in Ruby. items.each do |item| related_items(item).each { |related_item| cache_similarity(item, related_item) } end end return self end def process! process_items!(*all_items) return self end def delete_from_matrix!(matrix, item) # Deleting from a specific matrix, so get related_items, delete, then update the similarity of those related_items items = related_items(item) input_matrices[matrix].delete_item(item) items.each { |related_item| cache_similarity(item, related_item) } return self end def delete_pair_from_matrix!(matrix, set, item) items = related_items(item) input_matrices[matrix].remove_from_set(set, item) items.each { |related_item| cache_similarity(item, related_item) } return self end def add_item(item) Predictor.redis.sadd(redis_key(:all_items), item) end def delete_item!(item) Predictor.redis.srem(redis_key(:all_items), item) Predictor.redis.watch(redis_key(:similarities, item)) do items = related_items(item) Predictor.redis.multi do |multi| items.each do |related_item| multi.zrem(redis_key(:similarities, related_item), item) end multi.del redis_key(:similarities, item) end end input_matrices.each do |k,m| m.delete_item(item) end return self end def clean! keys = Predictor.redis.keys(redis_key('*')) unless keys.empty? Predictor.redis.del(keys) end end def ensure_similarity_limit_is_obeyed! if similarity_limit items = all_items Predictor.redis.multi do |multi| items.each do |item| key = redis_key(:similarities, item) multi.zremrangebyrank(key, 0, -(similarity_limit + 1)) multi.zunionstore key, [key] # Rewrite zset to take advantage of ziplist implementation. 
end end end end private def cache_similarity(item1, item2) score = 0 input_matrices.each do |key, matrix| score += (matrix.score(item1, item2) * matrix.weight) end if score > 0 add_similarity_if_necessary(item1, item2, score) add_similarity_if_necessary(item2, item1, score) else Predictor.redis.multi do |multi| multi.zrem(redis_key(:similarities, item1), item2) multi.zrem(redis_key(:similarities, item2), item1) end end end def add_similarity_if_necessary(item, similarity, score) store = true key = redis_key(:similarities, item) if similarity_limit if Predictor.redis.zrank(key, similarity).nil? && Predictor.redis.zcard(key) >= similarity_limit # Similarity is not already stored and we are at limit of similarities lowest_scored_item = Predictor.redis.zrangebyscore(key, "0", "+inf", limit: [0, 1], with_scores: true) unless lowest_scored_item.empty? # If score is less than or equal to the lowest score, don't store it. Otherwise, make room by removing the lowest scored similarity score <= lowest_scored_item[0][1] ? store = false : Predictor.redis.zrem(key, lowest_scored_item[0][0]) end end end Predictor.redis.zadd(key, score, similarity) if store end end ================================================ FILE: lib/predictor/distance.rb ================================================ module Predictor module Distance extend self def jaccard_index(key_1, key_2, redis = Predictor.redis) x, y = nil redis.multi do |multi| x = multi.sinterstore 'temp', [key_1, key_2] y = multi.sunionstore 'temp', [key_1, key_2] multi.del 'temp' end y.value > 0 ? (x.value.to_f/y.value.to_f) : 0.0 end def sorensen_coefficient(key_1, key_2, redis = Predictor.redis) x, y, z = nil redis.multi do |multi| x = multi.sinterstore 'temp', [key_1, key_2] y = multi.scard key_1 z = multi.scard key_2 multi.del 'temp' end denom = (y.value + z.value) denom > 0 ? 
(2 * (x.value) / denom.to_f) : 0.0 end end end ================================================ FILE: lib/predictor/input_matrix.rb ================================================ module Predictor class InputMatrix def initialize(opts) @opts = opts end def measure_name @opts.fetch(:measure, :jaccard_index) end def base @opts[:base] end def parent_redis_key(*append) base.redis_key(*append) end def redis_key(*append) base.redis_key(@opts.fetch(:key), *append) end def weight (@opts[:weight] || 1).to_f end def add_to_set(set, *items) items = items.flatten if items.count == 1 && items[0].is_a?(Array) if items.any? Predictor.redis.multi do |redis| redis.sadd(parent_redis_key(:all_items), items) redis.sadd(redis_key(:items, set), items) items.each do |item| # add the set to the item's set--inverting the sets redis.sadd(redis_key(:sets, item), set) end end end end # Delete a specific relationship def remove_from_set(set, item) Predictor.redis.multi do |redis| redis.srem(redis_key(:items, set), item) redis.srem(redis_key(:sets, item), set) end end def add_set(set, items) add_to_set(set, *items) end def add_single(set, item) add_to_set(set, item) end def items_for(set) Predictor.redis.smembers redis_key(:items, set) end def sets_for(item) Predictor.redis.sunion redis_key(:sets, item) end def related_items(item) sets = Predictor.redis.smembers(redis_key(:sets, item)) keys = sets.map { |set| redis_key(:items, set) } keys.length > 0 ? 
Predictor.redis.sunion(keys) - [item.to_s] : [] end # delete item from the matrix def delete_item(item) Predictor.redis.watch(redis_key(:sets, item)) do sets = Predictor.redis.smembers(redis_key(:sets, item)) Predictor.redis.multi do |multi| sets.each do |set| multi.srem(redis_key(:items, set), item) end multi.del redis_key(:sets, item) end end end def score(item1, item2) Distance.send(measure_name, redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis) end def calculate_jaccard(item1, item2) warn 'InputMatrix#calculate_jaccard is now deprecated. Use InputMatrix#score instead' Distance.jaccard_index(redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis) end end end ================================================ FILE: lib/predictor/predictor.rb ================================================ module Predictor @@redis = nil @@redis_prefix = nil def self.redis=(redis) @@redis = redis end def self.redis return @@redis unless @@redis.nil? raise "redis not configured! - Predictor.redis = Redis.new" end def self.redis_prefix(prefix = nil, &block) @@redis_prefix = block_given? ? 
block : prefix end def self.get_redis_prefix if @@redis_prefix if @@redis_prefix.respond_to?(:call) @@redis_prefix.call else @@redis_prefix end else 'predictor' end end def self.capitalize(str_or_sym) str = str_or_sym.to_s.each_char.to_a str.first.upcase + str[1..-1].join("").downcase end def self.constantize(klass) Object.module_eval("Predictor::#{klass}", __FILE__, __LINE__) end def self.processing_technique(algorithm) @technique = algorithm end def self.get_processing_technique @technique || :ruby end def self.process_lua_script(*args) @process_sha ||= redis.script(:load, PROCESS_ITEMS_LUA_SCRIPT) redis.evalsha(@process_sha, argv: args) end PROCESS_ITEMS_LUA_SCRIPT = <<-LUA local redis_prefix = ARGV[1] local input_matrices = cjson.decode(ARGV[2]) local similarity_limit = tonumber(ARGV[3]) local item = ARGV[4] local keys = {} for name, options in pairs(input_matrices) do local key = table.concat({redis_prefix, name, 'sets', item}, ':') local sets = redis.call('SMEMBERS', key) for _, set in ipairs(sets) do table.insert(keys, table.concat({redis_prefix, name, 'items', set}, ':')) end end -- Account for empty tables. if next(keys) == nil then return nil end local related_items = redis.call('SUNION', unpack(keys)) local function add_similarity_if_necessary(item, similarity, score) local store = true local key = table.concat({redis_prefix, 'similarities', item}, ':') if similarity_limit ~= nil then local zrank = redis.call('ZRANK', key, similarity) if zrank ~= nil then local zcard = redis.call('ZCARD', key) if zcard >= similarity_limit then -- Similarity is not already stored and we are at limit of similarities. local lowest_scored_item = redis.call('ZRANGEBYSCORE', key, '0', '+inf', 'withscores', 'limit', 0, 1) if #lowest_scored_item > 0 then -- If score is less than or equal to the lowest score, don't store it. 
Otherwise, make room by removing the lowest scored similarity if score <= tonumber(lowest_scored_item[2]) then store = false else redis.call('ZREM', key, lowest_scored_item[1]) end end end end end if store then redis.call('ZADD', key, score, similarity) end end for i, related_item in ipairs(related_items) do -- Disregard the current item. if related_item ~= item then local score = 0.0 for name, matrix in pairs(input_matrices) do local s = 0.0 local key_1 = table.concat({redis_prefix, name, 'sets', item}, ':') local key_2 = table.concat({redis_prefix, name, 'sets', related_item}, ':') if matrix.measure == 'jaccard_index' then local x = tonumber(redis.call('SINTERSTORE', 'temp', key_1, key_2)) local y = tonumber(redis.call('SUNIONSTORE', 'temp', key_1, key_2)) redis.call('DEL', 'temp') if y > 0 then s = s + (x / y) end elseif matrix.measure == 'sorensen_coefficient' then local x = redis.call('SINTERSTORE', 'temp', key_1, key_2) local y = redis.call('SCARD', key_1) local z = redis.call('SCARD', key_2) redis.call('DEL', 'temp') local denom = y + z if denom > 0 then s = s + (2 * x / denom) end else error("Bad matrix.measure: " .. 
matrix.measure) end score = score + (s * matrix.weight) end if score > 0 then add_similarity_if_necessary(item, related_item, score) add_similarity_if_necessary(related_item, item, score) else redis.call('ZREM', table.concat({redis_prefix, 'similarities', item}, ':'), related_item) redis.call('ZREM', table.concat({redis_prefix, 'similarities', related_item}, ':'), item) end end end LUA end ================================================ FILE: lib/predictor/version.rb ================================================ module Predictor VERSION = "2.3.1" end ================================================ FILE: lib/predictor.rb ================================================ require 'json' require "redis" require "predictor/predictor" require "predictor/distance" require "predictor/input_matrix" require "predictor/base" ================================================ FILE: predictor.gemspec ================================================ # -*- encoding: utf-8 -*- require File.expand_path('../lib/predictor/version', __FILE__) Gem::Specification.new do |s| s.name = "predictor" s.version = Predictor::VERSION s.platform = Gem::Platform::RUBY s.authors = ["Pathgather"] s.email = ["tech@pathgather.com"] s.homepage = "https://github.com/nyagato-00/predictor" s.description = s.summary = "Fast and efficient recommendations and predictions using Redis" s.licenses = ["MIT"] s.add_dependency "redis", ">= 3.0.0" s.add_development_dependency "rspec", ">= 3.4.0" s.add_development_dependency "rake", ">= 11.0" s.add_development_dependency "pry" s.add_development_dependency "yard" s.files = `git ls-files`.split("\n") - [".gitignore", ".rspec", ".travis.yml"] s.test_files = `git ls-files -- spec/*`.split("\n") s.require_paths = ["lib"] end ================================================ FILE: spec/base_spec.rb ================================================ require 'spec_helper' describe Predictor::Base do before(:each) do flush_redis! 
BaseRecommender.input_matrices = {} BaseRecommender.reset_similarity_limit! BaseRecommender.redis_prefix(nil) UserRecommender.input_matrices = {} UserRecommender.reset_similarity_limit! BaseRecommender.processing_technique nil UserRecommender.processing_technique nil Predictor.processing_technique nil end describe "configuration" do it "should add an input_matrix by 'key'" do BaseRecommender.input_matrix(:myinput) expect(BaseRecommender.input_matrices.keys).to eq([:myinput]) end it "should default the similarity_limit to 128" do expect(BaseRecommender.similarity_limit).to eq(128) end it "should allow the similarity limit to be configured" do BaseRecommender.limit_similarities_to(500) expect(BaseRecommender.similarity_limit).to eq(500) end it "should allow the similarity limit to be removed" do BaseRecommender.limit_similarities_to(nil) expect(BaseRecommender.similarity_limit).to eq(nil) end it "should retrieve an input_matrix on a new instance" do BaseRecommender.input_matrix(:myinput) sm = BaseRecommender.new expect{ sm.myinput }.not_to raise_error end it "should retrieve an input_matrix on a new instance and correctly overload respond_to?" 
do BaseRecommender.input_matrix(:myinput) sm = BaseRecommender.new expect(sm.respond_to?(:process!)).to be_truthy expect(sm.respond_to?(:myinput)).to be_truthy expect(sm.respond_to?(:fnord)).to be_falsey end it "should retrieve an input_matrix on a new instance and intialize the correct class" do BaseRecommender.input_matrix(:myinput) sm = BaseRecommender.new expect(sm.myinput).to be_a(Predictor::InputMatrix) end it "should accept a custom processing_technique, or default to Predictor's default" do expect(BaseRecommender.get_processing_technique).to eq(:ruby) Predictor.processing_technique :lua expect(BaseRecommender.get_processing_technique).to eq(:lua) BaseRecommender.processing_technique :union expect(BaseRecommender.get_processing_technique).to eq(:union) end end describe "redis_key" do it "should vary based on the class name" do expect(BaseRecommender.new.redis_key).to eq('predictor-test:BaseRecommender') expect(UserRecommender.new.redis_key).to eq('predictor-test:UserRecommender') end end describe "redis_key" do it "should vary based on the class name" do expect(BaseRecommender.new.redis_key).to eq('predictor-test:BaseRecommender') expect(UserRecommender.new.redis_key).to eq('predictor-test:UserRecommender') end it "should be able to mimic the old naming defaults" do BaseRecommender.redis_prefix([nil]) expect(BaseRecommender.new.redis_key(:key)).to eq('predictor-test:key') end it "should respect the Predictor prefix configuration setting" do br = BaseRecommender.new expect(br.redis_key).to eq("predictor-test:BaseRecommender") expect(br.redis_key(:another)).to eq("predictor-test:BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:another:set:of:keys") i = 0 Predictor.redis_prefix { i += 1 } expect(br.redis_key).to eq("1:BaseRecommender") expect(br.redis_key(:another)).to eq("2:BaseRecommender:another") 
expect(br.redis_key(:another, :key)).to eq("3:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("4:BaseRecommender:another:set:of:keys") Predictor.redis_prefix nil expect(br.redis_key).to eq("predictor:BaseRecommender") expect(br.redis_key(:another)).to eq("predictor:BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("predictor:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor:BaseRecommender:another:set:of:keys") Predictor.redis_prefix [nil] expect(br.redis_key).to eq("BaseRecommender") expect(br.redis_key(:another)).to eq("BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("BaseRecommender:another:set:of:keys") Predictor.redis_prefix { [1, 2, 3] } expect(br.redis_key).to eq("1:2:3:BaseRecommender") expect(br.redis_key(:another)).to eq("1:2:3:BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("1:2:3:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("1:2:3:BaseRecommender:another:set:of:keys") Predictor.redis_prefix 'predictor-test' expect(br.redis_key).to eq("predictor-test:BaseRecommender") expect(br.redis_key(:another)).to eq("predictor-test:BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:another:set:of:keys") end it "should respect the class prefix configuration setting" do br = BaseRecommender.new BaseRecommender.redis_prefix('base') expect(br.redis_key).to eq("predictor-test:base") expect(br.redis_key(:another)).to eq("predictor-test:base:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:base:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:base:another:set:of:keys") i = 0 
BaseRecommender.redis_prefix { i += 1 } expect(br.redis_key).to eq("predictor-test:1") expect(br.redis_key(:another)).to eq("predictor-test:2:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:3:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:4:another:set:of:keys") BaseRecommender.redis_prefix(nil) expect(br.redis_key).to eq("predictor-test:BaseRecommender") expect(br.redis_key(:another)).to eq("predictor-test:BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:another:set:of:keys") end it "should respect the instance prefix configuration setting" do br = PrefixRecommender.new("foo") expect(br.redis_key).to eq("predictor-test:PrefixRecommender:foo") expect(br.redis_key(:another)).to eq("predictor-test:PrefixRecommender:foo:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:PrefixRecommender:foo:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:PrefixRecommender:foo:another:set:of:keys") br.prefix = nil expect(br.redis_key).to eq("predictor-test:PrefixRecommender") expect(br.redis_key(:another)).to eq("predictor-test:PrefixRecommender:another") end end describe "all_items" do it "returns all items across all matrices" do BaseRecommender.input_matrix(:anotherinput) BaseRecommender.input_matrix(:yetanotherinput) sm = BaseRecommender.new sm.add_to_matrix(:anotherinput, 'a', "foo", "bar") sm.add_to_matrix(:yetanotherinput, 'b', "fnord", "shmoo", "bar") expect(sm.all_items).to include('foo', 'bar', 'fnord', 'shmoo') expect(sm.all_items.length).to eq(4) end it "doesn't return items from other recommenders" do BaseRecommender.input_matrix(:anotherinput) BaseRecommender.input_matrix(:yetanotherinput) UserRecommender.input_matrix(:anotherinput) UserRecommender.input_matrix(:yetanotherinput) sm = BaseRecommender.new 
sm.add_to_matrix(:anotherinput, 'a', "foo", "bar") sm.add_to_matrix(:yetanotherinput, 'b', "fnord", "shmoo", "bar") expect(sm.all_items).to include('foo', 'bar', 'fnord', 'shmoo') expect(sm.all_items.length).to eq(4) ur = UserRecommender.new expect(ur.all_items).to eq([]) end end describe "add_to_matrix" do it "calls add_to_set on the given matrix" do BaseRecommender.input_matrix(:anotherinput) sm = BaseRecommender.new expect(sm.anotherinput).to receive(:add_to_set).with('a', 'foo', 'bar') sm.add_to_matrix(:anotherinput, 'a', 'foo', 'bar') end it "adds the items to the all_items storage" do BaseRecommender.input_matrix(:anotherinput) sm = BaseRecommender.new sm.add_to_matrix(:anotherinput, 'a', 'foo', 'bar') expect(sm.all_items).to include('foo', 'bar') end end describe "add_to_matrix!" do it "calls add_to_matrix and process_items! for the given items" do BaseRecommender.input_matrix(:anotherinput) sm = BaseRecommender.new expect(sm).to receive(:add_to_matrix).with(:anotherinput, 'a', 'foo') expect(sm).to receive(:process_items!).with('foo') sm.add_to_matrix!(:anotherinput, 'a', 'foo') end end describe "related_items" do it "returns items in the sets across all matrices that the given item is also in" do BaseRecommender.input_matrix(:anotherinput) BaseRecommender.input_matrix(:yetanotherinput) BaseRecommender.input_matrix(:finalinput) sm = BaseRecommender.new sm.anotherinput.add_to_set('a', "foo", "bar") sm.yetanotherinput.add_to_set('b', "fnord", "shmoo", "bar") sm.finalinput.add_to_set('c', "nada") sm.process! 
expect(sm.related_items("bar")).to include("foo", "fnord", "shmoo") expect(sm.related_items("bar").length).to eq(3) end end describe "predictions_for" do it "accepts an :on option to return scores of specific objects" do BaseRecommender.input_matrix(:users, weight: 4.0) BaseRecommender.input_matrix(:tags, weight: 1.0) sm = BaseRecommender.new sm.users.add_to_set('me', "foo", "bar", "fnord") sm.users.add_to_set('not_me', "foo", "shmoo") sm.users.add_to_set('another', "fnord", "other") sm.users.add_to_set('another', "nada") sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo") sm.tags.add_to_set('tag2', "bar", "shmoo", "other") sm.tags.add_to_set('tag3', "shmoo", "nada") sm.process! predictions = sm.predictions_for('me', matrix_label: :users, on: 'other', with_scores: true) expect(predictions).to eq([['other', 3.0]]) predictions = sm.predictions_for('me', matrix_label: :users, on: ['other'], with_scores: true) expect(predictions).to eq([['other', 3.0]]) predictions = sm.predictions_for('me', matrix_label: :users, on: ['other', 'nada'], with_scores: true) expect(predictions).to eq([['other', 3.0], ['nada', 2.0]]) predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada'], with_scores: true) expect(predictions).to eq([['other', 3.0], ['nada', 2.0]]) predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada']) expect(predictions).to eq(['other', 'nada']) predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, limit: 1, with_scores: true) expect(predictions).to eq([["other", 3.0]]) predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, with_scores: true) expect(predictions).to eq([['other', 3.0], ['nada', 2.0]]) end end [:ruby, :lua, :union].each do |technique| describe "predictions_for with #{technique} processing" do before do Predictor.processing_technique(technique) end it "returns relevant predictions" do 
BaseRecommender.input_matrix(:users, weight: 4.0) BaseRecommender.input_matrix(:tags, weight: 1.0) sm = BaseRecommender.new sm.users.add_to_set('me', "foo", "bar", "fnord") sm.users.add_to_set('not_me', "foo", "shmoo") sm.users.add_to_set('another', "fnord", "other") sm.users.add_to_set('another', "nada") sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo") sm.tags.add_to_set('tag2', "bar", "shmoo") sm.tags.add_to_set('tag3', "shmoo", "nada") sm.process! predictions = sm.predictions_for('me', matrix_label: :users) expect(predictions).to eq(["shmoo", "other", "nada"]) predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"]) expect(predictions).to eq(["shmoo", "other", "nada"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1) expect(predictions).to eq(["other"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1) expect(predictions).to eq(["other", "nada"]) end it "accepts a :boost option" do BaseRecommender.input_matrix(:users, weight: 4.0) BaseRecommender.input_matrix(:tags, weight: 1.0) sm = BaseRecommender.new sm.users.add_to_set('me', "foo", "bar", "fnord") sm.users.add_to_set('not_me', "foo", "shmoo") sm.users.add_to_set('another', "fnord", "other") sm.users.add_to_set('another', "nada") sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo") sm.tags.add_to_set('tag2', "bar", "shmoo") sm.tags.add_to_set('tag3', "shmoo", "nada") sm.process! 
# Syntax #1: Tags passed as array, weights assumed to be 1.0 predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']}) expect(predictions).to eq(["shmoo", "nada", "other"]) predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: ['tag3']}) expect(predictions).to eq(["shmoo", "nada", "other"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']}) expect(predictions).to eq(["nada"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']}) expect(predictions).to eq(["nada", "other"]) # Syntax #2: Weights explicitly set. predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}}) expect(predictions).to eq(["shmoo", "nada", "other"]) predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: {values: ['tag3'], weight: 1.0}}) expect(predictions).to eq(["shmoo", "nada", "other"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}}) expect(predictions).to eq(["nada"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}}) expect(predictions).to eq(["nada", "other"]) # Make sure weights are actually being passed to Redis. 
        shmoo, nada, other = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 10000.0}}, with_scores: true)
        expect(shmoo[0]).to eq('shmoo')
        expect(shmoo[1]).to be > 10000
        expect(nada[0]).to eq('nada')
        expect(nada[1]).to be > 10000
        expect(other[0]).to eq('other')
        expect(other[1]).to be < 10
      end

      it "accepts a :boost option, even with an empty item set" do
        BaseRecommender.input_matrix(:users, weight: 4.0)
        BaseRecommender.input_matrix(:tags, weight: 1.0)
        sm = BaseRecommender.new
        sm.users.add_to_set('not_me', "foo", "shmoo")
        sm.users.add_to_set('another', "fnord", "other")
        sm.users.add_to_set('another', "nada")
        sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
        sm.tags.add_to_set('tag2', "bar", "shmoo")
        sm.tags.add_to_set('tag3', "shmoo", "nada")
        sm.process!

        # Syntax #1: Tags passed as array, weights assumed to be 1.0
        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for(item_set: [], boost: {tags: ['tag3']})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
        expect(predictions).to eq(["nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
        expect(predictions).to eq(["nada"])

        # Syntax #2: Weights explicitly set.
        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for(item_set: [], boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["nada"])
      end
    end

    describe "process_items! with #{technique} processing" do
      before do
        Predictor.processing_technique(technique)
      end

      context "with no similarity_limit" do
        it "calculates the similarity between the item and all related_items (other items in a set the given item is in)" do
          BaseRecommender.input_matrix(:myfirstinput)
          BaseRecommender.input_matrix(:mysecondinput)
          BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
          sm = BaseRecommender.new
          sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
          sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
          expect(sm.similarities_for('item2')).to be_empty
          sm.process_items!('item2')
          similarities = sm.similarities_for('item2')
          expect(similarities).to eq(["item3", "item1"])
        end
      end

      context "with a similarity_limit" do
        it "calculates the similarity between the item and all related_items (other items in a set the given item is in), but obeys the similarity_limit" do
          BaseRecommender.input_matrix(:myfirstinput)
          BaseRecommender.input_matrix(:mysecondinput)
          BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
          BaseRecommender.limit_similarities_to(1)
          sm = BaseRecommender.new
          sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
          sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
          expect(sm.similarities_for('item2')).to be_empty
          sm.process_items!('item2')
          similarities = sm.similarities_for('item2')
          expect(similarities).to include("item3")
          expect(similarities.length).to eq(1)
        end
      end
    end
  end

  describe "similarities_for" do
    it "should not throw an exception for non-existing items" do
      sm = BaseRecommender.new
      expect(sm.similarities_for("not_existing_item").length).to eq(0)
    end

    it "correctly weighs and sums input matrices" do
      BaseRecommender.input_matrix(:users, weight: 1.0)
      BaseRecommender.input_matrix(:tags, weight: 2.0)
      BaseRecommender.input_matrix(:topics, weight: 4.0)
      sm = BaseRecommender.new
      sm.users.add_to_set('user1', "c1", "c2", "c4")
      sm.users.add_to_set('user2', "c3", "c4")
      sm.topics.add_to_set('topic1', "c1", "c4")
      sm.topics.add_to_set('topic2', "c2", "c3")
      sm.tags.add_to_set('tag1', "c1", "c2", "c4")
      sm.tags.add_to_set('tag2', "c1", "c4")
      sm.process!
      expect(sm.similarities_for("c1", with_scores: true)).to eq([["c4", 6.5], ["c2", 2.0]])
      expect(sm.similarities_for("c2", with_scores: true)).to eq([["c3", 4.0], ["c1", 2.0], ["c4", 1.5]])
      expect(sm.similarities_for("c3", with_scores: true)).to eq([["c2", 4.0], ["c4", 0.5]])
      expect(sm.similarities_for("c4", with_scores: true, exclusion_set: ["c3"])).to eq([["c1", 6.5], ["c2", 1.5]])
    end
  end

  describe "sets_for" do
    it "should return all the sets the given item is in" do
      BaseRecommender.input_matrix(:set1)
      BaseRecommender.input_matrix(:set2)
      sm = BaseRecommender.new
      sm.set1.add_to_set "item1", "foo", "bar"
      sm.set1.add_to_set "item2", "nada", "bar"
      sm.set2.add_to_set "item3", "bar", "other"
      expect(sm.sets_for("bar").length).to eq(3)
      expect(sm.sets_for("bar")).to include("item1", "item2", "item3")
      expect(sm.sets_for("other")).to eq(["item3"])
    end
  end

  describe "process!" do
    it "should call process_items! for all items in all_items" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "fnord", "shmoo")
      expect(sm.all_items).to include("foo", "bar", "fnord", "shmoo")
      expect(sm).to receive(:process_items!).with(*sm.all_items)
      sm.process!
    end
  end

  describe "delete_pair_from_matrix!" do
    it "should call remove_from_set on the matrix" do
      BaseRecommender.input_matrix(:anotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo")
      sm.anotherinput.add_to_set('a', "bar")
      sm.anotherinput.add_to_set('a', "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      expect(sm.anotherinput).to receive(:remove_from_set).with('a', 'foo')
      sm.delete_pair_from_matrix!(:anotherinput, 'a', 'foo')
    end

    it "updates similarities" do
      BaseRecommender.input_matrix(:anotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo")
      sm.anotherinput.add_to_set('a', "bar")
      sm.anotherinput.add_to_set('a', "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      sm.delete_pair_from_matrix!(:anotherinput, 'a', 'foo')
      expect(sm.similarities_for('bar')).to eq(['shmoo'])
    end
  end

  describe "delete_from_matrix!" do
    it "calls delete_item on the matrix" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "bar", "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      expect(sm.anotherinput).to receive(:delete_item).with('foo')
      sm.delete_from_matrix!(:anotherinput, 'foo')
    end

    it "updates similarities" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "bar", "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      sm.delete_from_matrix!(:anotherinput, 'foo')
      expect(sm.similarities_for('bar')).to eq(['shmoo'])
    end
  end

  describe "delete_item!" do
    it "should call delete_item on each input_matrix" do
      BaseRecommender.input_matrix(:myfirstinput)
      BaseRecommender.input_matrix(:mysecondinput)
      sm = BaseRecommender.new
      expect(sm.myfirstinput).to receive(:delete_item).with("fnorditem")
      expect(sm.mysecondinput).to receive(:delete_item).with("fnorditem")
      sm.delete_item!("fnorditem")
    end

    it "should remove the item from all_items" do
      BaseRecommender.input_matrix(:anotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.process!
      expect(sm.all_items).to include('foo')
      sm.delete_item!('foo')
      expect(sm.all_items).not_to include('foo')
    end

    it "should remove the item's similarities and also remove the item from related_items' similarities" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "bar", "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      expect(sm.similarities_for('shmoo')).to include('bar')
      sm.delete_item!('shmoo')
      expect(sm.similarities_for('bar')).not_to include('shmoo')
      expect(sm.similarities_for('shmoo')).to be_empty
    end
  end

  describe "clean!" do
    it "should clean out the Redis storage for this Predictor" do
      BaseRecommender.input_matrix(:set1)
      BaseRecommender.input_matrix(:set2)
      sm = BaseRecommender.new
      sm.set1.add_to_set "item1", "foo", "bar"
      sm.set1.add_to_set "item2", "nada", "bar"
      sm.set2.add_to_set "item3", "bar", "other"
      expect(Predictor.redis.keys(sm.redis_key('*'))).not_to be_empty
      sm.clean!
      expect(Predictor.redis.keys(sm.redis_key('*'))).to be_empty
    end
  end

  describe "ensure_similarity_limit_is_obeyed!" do
    it "should shorten similarities to the given limit and rewrite the zset" do
      BaseRecommender.limit_similarities_to(nil)
      BaseRecommender.input_matrix(:myfirstinput)
      sm = BaseRecommender.new
      sm.myfirstinput.add_to_set(*(['set1'] + 130.times.map { |i| "item#{i}" }))
      expect(sm.similarities_for('item2')).to be_empty
      sm.process_items!('item2')
      expect(sm.similarities_for('item2').length).to eq(129)

      redis = Predictor.redis
      key = sm.redis_key(:similarities, 'item2')
      expect(redis.zcard(key)).to eq(129)
      expect(redis.object(:encoding, key)).to eq('skiplist') # Inefficient

      BaseRecommender.reset_similarity_limit!
      sm.ensure_similarity_limit_is_obeyed!
      expect(redis.zcard(key)).to eq(128)
      expect(redis.object(:encoding, key)).to eq('ziplist') # Efficient
    end
  end
end

================================================
FILE: spec/input_matrix_spec.rb
================================================
require 'spec_helper'

describe Predictor::InputMatrix do
  let(:options) { @default_options.merge(@options) }

  before(:each) { @options = {} }

  before(:all) do
    @base = BaseRecommender.new
    @default_options = { base: @base, key: "mymatrix" }
    @matrix = Predictor::InputMatrix.new(@default_options)
  end

  before(:each) do
    flush_redis!
  end

  describe "redis_key" do
    it "should respect the global namespace configuration" do
      expect(@matrix.redis_key).to eq("predictor-test:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:mymatrix:another:set:of:keys")

      i = 0
      Predictor.redis_prefix { i += 1 }
      expect(@matrix.redis_key).to eq("1:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("2:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("3:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("4:BaseRecommender:mymatrix:another:set:of:keys")

      Predictor.redis_prefix(nil)
      expect(@matrix.redis_key).to eq("predictor:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor:BaseRecommender:mymatrix:another:set:of:keys")

      Predictor.redis_prefix('predictor-test')
      expect(@matrix.redis_key).to eq("predictor-test:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:mymatrix:another:set:of:keys")
    end

    it "should respect the class-level configuration" do
      i = 0
      BaseRecommender.redis_prefix { i += 1 }
      expect(@matrix.redis_key).to eq("predictor-test:1:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:2:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:3:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:4:mymatrix:another:set:of:keys")

      BaseRecommender.redis_prefix([nil])
      expect(@matrix.redis_key).to eq("predictor-test:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:mymatrix:another:set:of:keys")

      BaseRecommender.redis_prefix(['a', 'b'])
      expect(@matrix.redis_key).to eq("predictor-test:a:b:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:a:b:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:a:b:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:a:b:mymatrix:another:set:of:keys")

      BaseRecommender.redis_prefix(nil)
      expect(@matrix.redis_key).to eq("predictor-test:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:mymatrix:another:set:of:keys")
    end
  end

  describe "weight" do
    it "returns the weight configured or a default of 1" do
      expect(@matrix.weight).to eq(1.0) # default weight
      matrix = Predictor::InputMatrix.new(redis_prefix: "predictor-test", key: "mymatrix", weight: 5.0)
      expect(matrix.weight).to eq(5.0)
    end
  end

  describe "add_to_set" do
    it "adds each member of the set to the key's 'sets' set" do
      expect(@matrix.items_for("item1")).not_to include("foo", "bar", "fnord", "blubb")
      @matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
      expect(@matrix.items_for("item1")).to include("foo", "bar", "fnord", "blubb")
    end

    it "does not crash if the set of items is empty" do
      @matrix.add_to_set "item1"
      @matrix.add_to_set "item1", []
    end

    it "adds the key to each set member's 'items' set" do
      expect(@matrix.sets_for("foo")).not_to include("item1")
      expect(@matrix.sets_for("bar")).not_to include("item1")
      expect(@matrix.sets_for("fnord")).not_to include("item1")
      expect(@matrix.sets_for("blubb")).not_to include("item1")
      @matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
      expect(@matrix.sets_for("foo")).to include("item1")
      expect(@matrix.sets_for("bar")).to include("item1")
      expect(@matrix.sets_for("fnord")).to include("item1")
      expect(@matrix.sets_for("blubb")).to include("item1")
    end
  end

  describe "items_for" do
    it "returns the items in the given set ID" do
      @matrix.add_to_set "item1", ["foo", "bar", "fnord", "blubb"]
      expect(@matrix.items_for("item1")).to include("foo", "bar", "fnord", "blubb")
      @matrix.add_to_set "item2", ["foo", "bar", "snafu", "nada"]
      expect(@matrix.items_for("item2")).to include("foo", "bar", "snafu", "nada")
      expect(@matrix.items_for("item1")).not_to include("snafu", "nada")
    end
  end

  describe "sets_for" do
    it "returns the set IDs the given item is in" do
      @matrix.add_to_set "item1", ["foo", "bar", "fnord", "blubb"]
      @matrix.add_to_set "item2", ["foo", "bar", "snafu", "nada"]
      expect(@matrix.sets_for("foo")).to include("item1", "item2")
      expect(@matrix.sets_for("snafu")).to eq(["item2"])
    end
  end

  describe "related_items" do
    it "returns the items in sets the given item is also in" do
      @matrix.add_to_set "item1", ["foo", "bar", "fnord", "blubb"]
      @matrix.add_to_set "item2", ["foo", "bar", "snafu", "nada"]
      @matrix.add_to_set "item3", ["nada", "other"]
      expect(@matrix.related_items("bar")).to include("foo", "fnord", "blubb", "snafu", "nada")
      expect(@matrix.related_items("bar").length).to eq(5)
      expect(@matrix.related_items("other")).to eq(["nada"])
      expect(@matrix.related_items("snafu")).to include("foo", "bar", "nada")
      expect(@matrix.related_items("snafu").length).to eq(3)
    end
  end

  describe "delete_item" do
    before do
      @matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
      @matrix.add_to_set "item2", "foo", "bar", "snafu", "nada"
      @matrix.add_to_set "item3", "nada", "other"
    end

    it "should delete the item from sets it is in" do
      expect(@matrix.items_for("item1")).to include("bar")
      expect(@matrix.items_for("item2")).to include("bar")
      expect(@matrix.sets_for("bar")).to include("item1", "item2")
      @matrix.delete_item("bar")
      expect(@matrix.items_for("item1")).not_to include("bar")
      expect(@matrix.items_for("item2")).not_to include("bar")
      expect(@matrix.sets_for("bar")).to be_empty
    end
  end

  describe "#score" do
    let(:matrix) { Predictor::InputMatrix.new(options) }

    context "default" do
      it "scores as jaccard index by default" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        matrix.add_to_set "item2", "bar", "fnord", "shmoo", "snafu"
        matrix.add_to_set "item3", "bar", "nada", "snafu"
        expect(matrix.score("bar", "snafu")).to eq(2.0/3.0)
      end

      it "scores as jaccard index when given option" do
        matrix = Predictor::InputMatrix.new(options.merge(measure: :jaccard_index))
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        matrix.add_to_set "item2", "bar", "fnord", "shmoo", "snafu"
        matrix.add_to_set "item3", "bar", "nada", "snafu"
        expect(matrix.score("bar", "snafu")).to eq(2.0/3.0)
      end

      it "should handle missing sets" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        expect(matrix.score("is", "missing")).to eq(0.0)
      end
    end

    context "sorensen_coefficient" do
      before { @options[:measure] = :sorensen_coefficient }

      it "should calculate the correct sorensen index" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        matrix.add_to_set "item2", "fnord", "shmoo", "snafu"
        matrix.add_to_set "item3", "bar", "nada", "snafu"
        expect(matrix.score("bar", "snafu")).to eq(2.0/4.0)
      end

      it "should handle missing sets" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        expect(matrix.score("is", "missing")).to eq(0.0)
      end
    end
  end

  private

  def add_two_item_test_data!(matrix)
    matrix.add_to_set("user42", "fnord", "blubb")
    matrix.add_to_set("user44", "blubb")
    matrix.add_to_set("user46", "fnord")
    matrix.add_to_set("user48", "fnord", "blubb")
    matrix.add_to_set("user50", "fnord")
  end

  def add_three_item_test_data!(matrix)
    matrix.add_to_set("user42", "fnord", "blubb", "shmoo")
    matrix.add_to_set("user44", "blubb")
    matrix.add_to_set("user46", "fnord", "shmoo")
    matrix.add_to_set("user48", "fnord", "blubb")
    matrix.add_to_set("user50", "fnord", "shmoo")
  end
end

================================================
FILE: spec/predictor_spec.rb
================================================
require 'spec_helper'

describe Predictor do
  it "should store a redis connection" do
    Predictor.redis = "asd"
    expect(Predictor.redis).to eq("asd")
  end

  it "should raise an exception if unconfigured redis connection is accessed" do
    Predictor.redis = nil
    expect { Predictor.redis }.to raise_error(/not configured/i)
  end
end

================================================
FILE: spec/spec_helper.rb
================================================
require "predictor"
require "pry"

def flush_redis!
  Predictor.redis = Redis.new
  Predictor.redis.keys("predictor-test*").each do |k|
    Predictor.redis.del(k)
  end
end

Predictor.redis_prefix "predictor-test"

class BaseRecommender
  include Predictor::Base
end

class UserRecommender
  include Predictor::Base
end

class TestRecommender
  include Predictor::Base

  input_matrix :jaccard_one
end

class PrefixRecommender
  include Predictor::Base

  def initialize(prefix)
    @prefix = prefix
  end

  def prefix=(new_prefix)
    @prefix = new_prefix
  end

  def get_redis_prefix
    @prefix
  end
end

class Predictor::TestInputMatrix
  def initialize(opts)
    @opts = opts
  end

  def method_missing(method, *args)
    @opts[method]
  end
end