Repository: Pathgather/predictor
Branch: master
Commit: be866b424119
Files: 20
Total size: 88.7 KB
Directory structure:
gitextract_u972f2ab/
├── .github/
│ └── workflows/
│ └── test.yml
├── .gitignore
├── Changelog.md
├── Gemfile
├── LICENSE
├── README.md
├── Rakefile
├── benchmark/
│ └── process.rb
├── docs/
│ └── READMEv1.md
├── lib/
│ ├── predictor/
│ │ ├── base.rb
│ │ ├── distance.rb
│ │ ├── input_matrix.rb
│ │ ├── predictor.rb
│ │ └── version.rb
│ └── predictor.rb
├── predictor.gemspec
└── spec/
├── base_spec.rb
├── input_matrix_spec.rb
├── predictor_spec.rb
└── spec_helper.rb
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/workflows/test.yml
================================================
name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-18.04, ubuntu-20.04]
        ruby: [2.6, 2.7, 3.0]
    services:
      redis:
        image: redis
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v2
      - name: Set up Ruby ${{ matrix.ruby }}
        uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
          ruby-version: ${{ matrix.ruby }}
      - name: Install dependencies
        run: bundle install
      - name: Run tests
        run: bundle exec rake
================================================
FILE: .gitignore
================================================
bin/
*.gem
Gemfile.lock
ext/Makefile
================================================
FILE: Changelog.md
================================================
# Predictor Changelog
All notable changes to this project will be documented in this file.
## [Unreleased]
### Changed
- Support rake version 11.0 or higher and rspec version 3.4.0 or higher
- Fix the title of the README
- Move the test suite to GitHub Actions
- Run tests on ubuntu-18.04 and ubuntu-20.04
- Fix the homepage entry in predictor.gemspec
### **BREAKING CHANGES**
- Ruby 2.1 through 2.5 are no longer supported, as they have reached end of life
## [2.3.0] - 2014-09-06
- The logic for processing item similarities was ported to a Lua script. Use `Predictor.processing_technique(:lua)` to use the Lua script for all similarity calculations, or use `MyRecommender.processing_technique(:lua)` to use it for specific recommenders. It is substantially faster than the default (old) Ruby mechanism, but has the disadvantage of blocking the Redis server while it runs.
- An alternate method of calculating item similarities was added, which uses a ZUNIONSTORE across item sets. The results are similar to those achieved by using the Ruby or Lua scripts, but faster. Use `Predictor.processing_technique(:union)` to use the ZUNIONSTORE technique for all similarity calculations, or use `MyRecommender.processing_technique(:union)` to use it for specific recommenders.
## [2.2.0] - 2014-06-24
- The namespace used for keys in Redis is now configurable on a global or per-class basis. See the readme for more information. If you were overriding the redis_prefix instance method before, it is recommended that you use the new redis_prefix class method instead.
- Data stored in Redis is now namespaced by the class name of the recommender it is stored by. This change ensures that different recommenders with input matrices of the same name don't overwrite each other's data. After upgrading you'll need to either reindex your data in Redis or configure Predictor to use the naming system you were using before. If you were using the defaults before and you're not worried about matrix name collisions, you can mimic the old behavior with:
```ruby
class MyRecommender
include Predictor::Base
redis_prefix [nil]
end
```
- The #predictions_for method on recommenders now accepts a :boost option to give more weight to items with particular attributes. See the readme for more information.
## [2.1.0] - 2014-06-19
- The similarity limit now defaults to 128, instead of being unlimited. This is intended to save space in Redis. See the Readme for more information. It is strongly recommended that you run `ensure_similarity_limit_is_obeyed!` to shrink existing similarity sets.
## [2.0.0] - 2014-04-17
**Rewrite of 1.0.0 and contains several breaking changes!**
Version 1.0.0 (which really should have been 0.0.1) contained several issues that made compatibility with v2 not worth the trouble. These include:
- In v1, similarities were cached per input_matrix, and Predictor::Base utilized those caches when determining similarities and predictions. This quickly ate up Redis memory with even a semi-large dataset, as each input_matrix had a significant memory requirement. v2 caches similarities at the root (Recommender::Base), which means you can add any number of input matrices with little impact on memory usage.
- Added the ability to limit the number of items stored in the similarity cache (via the 'limit_similarities_to' option). Now that similarities are cached at the root, this is possible and can greatly help memory usage.
- Removed bang methods from input_matrix (add_set!, add_single!, etc.). These previously called process! for you, but since the cache is no longer kept at the input_matrix level, process! has to be called at the root (Recommender::Base).
- Bug fix: Fixed bug where a call to delete_item! on the input matrix didn't update the similarity cache.
- Other minor fixes.
================================================
FILE: Gemfile
================================================
source 'https://rubygems.org'
gemspec
================================================
FILE: LICENSE
================================================
The MIT License (MIT)
Copyright (c) 2014 Pathgather
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
================================================
FILE: README.md
================================================
# Predictor
Fast and efficient recommendations and predictions using Ruby & Redis. Developed by and used at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations to users.

Originally forked and based on [Recommendify](https://github.com/paulasmuth/recommendify) by Paul Asmuth, so a huge thanks to him for his contributions to Recommendify. Predictor has been almost completely rewritten to
* Be much, much more performant and efficient by using Redis for most logic.
* Provide item similarities such as "Users that read this book also read ..."
* Provide personalized predictions based on a user's past history, such as "You read these 10 books, so you might also like to read ..."
At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) or the [Sørensen-Dice coefficient](http://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) (the default is Jaccard) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)
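For concreteness, here is how the two measures compare on a pair of small sets. This is plain Ruby with made-up data, not Predictor's internals:

```ruby
require 'set'

# Hypothetical example: the sets of users who interacted with two items.
a = Set["user1", "user2", "user3"]
b = Set["user2", "user3", "user4"]

# Jaccard index: intersection size over union size.
jaccard = (a & b).size.to_f / (a | b).size

# Sørensen-Dice coefficient: twice the intersection size over the summed set sizes.
sorensen = 2.0 * (a & b).size / (a.size + b.size)

puts jaccard  # => 0.5
puts sorensen # => 0.6666666666666666
```

Sørensen-Dice weights the overlap more heavily, so it tends to score the same pair a bit higher than Jaccard.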
Notice
---------------------
This is the readme for Predictor 2.0, which contains a few breaking changes from 1.0. The 1.0 readme can be found [here](https://github.com/Pathgather/predictor/blob/master/docs/READMEv1.md). See below for how to upgrade to 2.0.
Installation
---------------------
In your Gemfile:
```ruby
gem 'predictor'
```
Getting Started
---------------------
First step is to configure Predictor with your Redis instance.
```ruby
# in config/initializers/predictor.rb
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])
# Or, to improve performance, add hiredis as your driver (you'll need to install the hiredis gem first)
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
```
Inputting Data
---------------------
Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.
Below, we're building a recommender to recommend courses based off of:
* Users that have taken a course. If two courses were taken by the same user, this is three times as important to us as if the courses share the same topic. This will lead to sets like:
* "user1" -> "course-1", "course-3",
* "user2" -> "course-1", "course-4"
* Tags and their courses. This will lead to sets like:
* "rails" -> "course-1", "course-2",
* "microeconomics" -> "course-3", "course-4"
* Topics and their courses. This will lead to sets like:
* "computer science" -> "course-1", "course-2",
* "economics and finance" -> "course-3", "course-4"
```ruby
class CourseRecommender
include Predictor::Base
input_matrix :users, weight: 3.0
input_matrix :tags, weight: 2.0
input_matrix :topics, weight: 1.0, measure: :sorensen_coefficient # Use the Sørensen-Dice coefficient instead of Jaccard
end
```
Now, we just need to update our matrices when courses are created, users take a course, topics are changed, etc:
```ruby
recommender = CourseRecommender.new
# Add a single course to topic-1's items. If topic-1 already exists as a set ID, this just adds course-1 to the set
recommender.add_to_matrix!(:topics, "topic-1", "course-1")
# If your dataset is even remotely large, add_to_matrix! could take some time, as it must calculate the similarity scores
# for course-1 and other courses that share a set with course-1. If this is the case, use add_to_matrix and
# process the items at a more convenient time, perhaps in a background job
recommender.topics.add_to_set("topic-1", "course-1", "course-2") # Same as recommender.add_to_matrix(:topics, "topic-1", "course-1", "course-2")
recommender.process_items!("course-1", "course-2")
```
As noted above, it's important to remember that if you don't use the bang method 'add_to_matrix!', you'll need to manually update your similarities. If your dataset is even remotely large, you'll probably want to do this:
* If you want to update the similarities for certain item(s):
```ruby
recommender.process_items!(item1, item2, etc)
```
* If you want to update all similarities for all items:
```ruby
recommender.process!
```
Retrieving Similarities and Recommendations
---------------------
Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations! First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course.
```ruby
recommender = CourseRecommender.new
# Return all similarities for course-1 (ordered by most similar to least).
recommender.similarities_for("course-1")
# Need to paginate? Not a problem! Specify an offset and a limit
recommender.similarities_for("course-1", offset: 10, limit: 10) # Gets similarities 11-20
# Want scores?
recommender.similarities_for("course-1", with_scores: true)
# Want to ignore a certain set of courses in similarities?
recommender.similarities_for("course-1", exclusion_set: ["course-2"])
```
The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem!
```ruby
recommender = CourseRecommender.new
# User has taken course-1 and course-2. Let's see what else they might like...
recommender.predictions_for(item_set: ["course-1", "course-2"])
# Already have the set you need stored in an input matrix? In our case, we do (the users matrix stores the courses a user has taken), so we can just do:
recommender.predictions_for("user-1", matrix_label: :users)
# Paginate too!
recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10)
# Gimme some scores and ignore course-2....that course-2 is one sketchy fella
recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["course-2"])
```
Deleting Items
---------------------
If your data is deleted from your persistent storage, you certainly don't want to keep recommending it to users. To ensure that doesn't happen, call delete_pair_from_matrix! to remove an item from a single set, delete_from_matrix! to remove it from an entire matrix, or delete_item! if the item is completely gone:
```ruby
recommender = CourseRecommender.new
# User removed course-1 from topic-1, but course-1 still exists
recommender.delete_pair_from_matrix!(:topics, "topic-1", "course-1")
# User removed course-1 from all topics
recommender.delete_from_matrix!(:topics, "course-1")
# course-1 was permanently deleted
recommender.delete_item!("course-1")
# Something crazy has happened, so let's just start fresh and wipe out all previously stored similarities:
recommender.clean!
```
Limiting Similarities
---------------------
By default, Predictor caches 128 similarities for each item. This is because this is the maximum size for the similarity sorted sets to be kept in a [memory-efficient format](http://redis.io/topics/memory-optimization). If you want to keep more similarities than that, and you don't mind using more memory, you may want to increase the similarity limit, like so:
```ruby
class CourseRecommender
include Predictor::Base
limit_similarities_to 500
input_matrix :users, weight: 3.0
input_matrix :tags, weight: 2.0
input_matrix :topics, weight: 1.0
end
```
The memory penalty can be heavy, though. In our testing, similarity caches for 1,000 objects varied in size like so:
```
limit_similarities_to(128) # 8.5 MB (this is the default)
limit_similarities_to(129) # 22.74 MB
limit_similarities_to(500) # 76.72 MB
```
If you decide you need to store more than 128 similarities, you may want to see the Redis documentation linked above and consider increasing `zset-max-ziplist-entries` in your configuration.
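As a sketch, the relevant directives in `redis.conf` might look like this. The directive names come from the Redis memory-optimization documentation; the values shown are illustrative, not a recommendation, and newer Redis versions use `listpack` in place of `ziplist` in these names:

```
# redis.conf
zset-max-ziplist-entries 512
zset-max-ziplist-value 64
```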
Predictions fetched with the predictions_for call utilize the similarity caches, so if you're using predictions_for, make sure the limit is set high enough that intelligent predictions can be generated. If you aren't using predictions and are just using similarities, then feel free to set this to the maximum number of similarities you'd possibly want to show!
You can also use `limit_similarities_to(nil)` to remove the limit entirely. This means if you have 10,000 items, and each item is somehow related to the other, you'll have 10,000 sets each with 9,999 items, which will run up your Redis bill quite quickly. Removing the limit is not recommended unless you're sure you know what you're doing.
If at some point you decide to lower your similarity limits, you'll want to be sure to shrink the size of the sorted sets already in Redis. You can do this with `CourseRecommender.new.ensure_similarity_limit_is_obeyed!`.
Boost
---------------------
What if you want to recommend courses to users based not only on what courses they've taken, but on other attributes of courses that they may be interested in? You can do that by passing the :boost argument to predictions_for:
```ruby
class CourseRecommender
include Predictor::Base
# Courses are compared to one another by the users taking them and their tags.
input_matrix :users, weight: 3.0
input_matrix :tags, weight: 2.0
input_matrix :topics, weight: 2.0
end
recommender = CourseRecommender.new
# We want to find recommendations for Billy, who's told us that he's
# especially interested in free, interactive courses on Photoshop. So, we give
# a boost to courses that are tagged as free and interactive and have
# Photoshop as a topic:
recommender.predictions_for("Billy", matrix_label: :users, boost: {tags: ['free', 'interactive'], topics: ["Photoshop"]})
# We can also modify how much these tags and topics matter by specifying a
# weight. The default is 1.0, but if that's too much we can just tweak it:
recommender.predictions_for("Billy", matrix_label: :users, boost: {tags: {values: ['free', 'interactive'], weight: 0.4}, topics: {values: ["Photoshop"], weight: 0.3}})
```
Key Prefixes
---------------------
As of 2.2.0, there is much more control available over the format of the keys Predictor will use in Redis. By default, the CourseRecommender given as an example above will use keys like "predictor:CourseRecommender:users:items:user1". You can configure the global namespace like so:
```ruby
Predictor.redis_prefix 'my_namespace' # => "my_namespace:CourseRecommender:users:items:user1"
# Or, for a multitenanted setup:
Predictor.redis_prefix { "user-#{User.current.id}" } # => "user-7:CourseRecommender:users:items:user1"
```
You can also configure the namespace used by each class you create:
```ruby
class CourseRecommender
include Predictor::Base
redis_prefix "courses" # => "predictor:courses:users:items:user1"
redis_prefix { "courses_for_user-#{User.current.id}" } # => "predictor:courses_for_user-7:users:items:user1"
end
```
You can also configure the namespace used by each instance you create in addition to class and global namespace:
```ruby
class CourseRecommender
include Predictor::Base
def initialize(prefix)
@prefix = prefix
end
# Simply override this instance method with the prefix you want
def get_redis_prefix
@prefix
end
end
recommender = CourseRecommender.new("super")
recommender.redis_prefix # "predictor:CourseRecommender:super"
```
Processing Items
---------------------
As of 2.3.0, there are multiple techniques available for processing item similarities. You can choose between them by setting a global default like `Predictor.processing_technique(:lua)` or setting a technique for certain classes like `CourseRecommender.processing_technique(:union)`. There are three options:
- :ruby - This is the default, and is how Predictor calculated similarities before 2.3.0. With this technique the Jaccard and Sorensen calculations are performed in Ruby, with frequent calls to Redis to retrieve simple values. It is somewhat slow.
- :lua - This option performs the Jaccard and Sorensen calculations in a Lua script on the Redis server. It is substantially faster than the :ruby technique, but blocks the Redis server while each set of calculations is run. The period of blocking will vary based on the size and disposition of your data, but each call may take up to several hundred milliseconds. If your application requires your Redis server to always return results quickly, and you're not able to simply run calculations during off-hours, you should use a different strategy.
- :union - This option skips Jaccard and Sorensen entirely, and uses a simpler technique involving a ZUNIONSTORE across many item sets to calculate similarities. The results are different from, but similar to the results of using the Jaccard and Sorensen algorithms. It is even faster than the :lua option and does not have the same problem of blocking Redis for long periods of time, but before using it you should sample the output to ensure that it is good enough for your application.
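As a rough illustration of the :union idea (this is plain Ruby with made-up data, not Predictor's actual implementation, which runs a ZUNIONSTORE inside Redis and applies matrix weights), an item's score is simply the sum of its co-occurrences with the target across all sets containing the target:

```ruby
# Hypothetical input sets, mapping set IDs to the items they contain.
sets = {
  "user1" => ["course-1", "course-3"],
  "user2" => ["course-1", "course-4"],
  "rails" => ["course-1", "course-3"],
}

target = "course-1"
scores = Hash.new(0.0)

# Union-style scoring: every co-occurrence with the target adds to an item's score.
sets.each_value do |items|
  next unless items.include?(target)
  items.each { |item| scores[item] += 1.0 unless item == target }
end

puts scores.sort_by { |_, score| -score }.inspect
# => [["course-3", 2.0], ["course-4", 1.0]]
```

Note there is no set-size normalization here, which is why the results differ from (but resemble) Jaccard or Sorensen output.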
Predictor now contains a benchmarking script that you can use to compare the speed of these options. An example output from the processing of a relatively small dataset is:
```
ruby = 21.098 seconds
lua = 2.106 seconds
union = 0.741 seconds
```
Upgrading from 1.0 to 2.0
---------------------
As mentioned, 2.0.0 is quite a bit different from 1.0.0, so simply upgrading with no changes likely won't work. My apologies for this. I promise this won't happen in future releases, as I'm much more confident in this Predictor release than the last. Anywho, upgrading really shouldn't be that much of a pain if you follow these steps:
* Change predictor.matrix.add_set! and predictor.matrix.add_single! calls to predictor.add_to_matrix!. For example:
```ruby
# Change
predictor.topics.add_single!("topic-1", "course-1")
# to
predictor.add_to_matrix!(:topics, "topic-1", "course-1")
# Change
predictor.tags.add_set!("tag-1", ["course-1", "course-2"])
# to
predictor.add_to_matrix!(:tags, "tag-1", "course-1", "course-2")
```
* Change predictor.matrix.process! or predictor.matrix.process_item! calls to just predictor.process! or predictor.process_items!
```ruby
# Change
predictor.topics.process_item!("course-1")
# to
predictor.process_items!("course-1")
```
* Change predictor.matrix.delete_item! calls to predictor.delete_from_matrix!. This will update similarities too, so you may want to queue this to run in a background job.
```ruby
# Change
predictor.topics.delete_item!("course-1")
# to delete_from_matrix! if you want to update similarities to account for the deleted item (in v1, this was a bug and didn't occur)
predictor.delete_from_matrix!(:topics, "course-1")
```
* Regenerate your recommendations, as Redis keys have changed for Predictor 2. You can use recommender.clean! to clear out old similarities, then run your rake task (or whatever you've set up) to create new similarities.
About Pathgather
---------------------
Pathgather is an NYC-based startup building a platform that dramatically accelerates learning for enterprises by bringing employees, training content, and existing enterprise systems into one engaging platform.
Every Friday, we work on open-source software (our own or other projects). Want to join our always growing team? Peruse our [current opportunities](http://www.pathgather.com/jobs/) or reach out to us at <tech@pathgather.com>!
Problems? Issues? Want to help out?
---------------------
Just submit a GitHub issue or pull request! We'd love to have you help out, as the most commonly used library for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!
The MIT License (MIT)
---------------------
Copyright (c) 2014 Pathgather
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
================================================
FILE: Rakefile
================================================
require 'bundler/gem_tasks'
require 'rspec/core/rake_task'
RSpec::Core::RakeTask.new(:spec)
task :default => :spec
Dir["./benchmark/*.rb"].sort.each &method(:require)
================================================
FILE: benchmark/process.rb
================================================
namespace :benchmark do
  task :process do
    require 'predictor'
    require 'pry'
    require 'logger'

    Predictor.redis = Redis.new # logger: Logger.new(STDOUT)
    Predictor.redis_prefix "predictor-benchmark"

    def flush!
      keys = Predictor.redis.keys("predictor-benchmark*")
      Predictor.redis.del(keys) if keys.any?
    end

    class ItemRecommender
      include Predictor::Base
      input_matrix :users, weight: 2.0
      input_matrix :parts, weight: 1.0
    end

    flush!

    items = (1..200).map { |i| "item-#{i}" }
    users = (1..100).map { |i| "user-#{i}" }
    parts = (1..100).map { |i| "part-#{i}" }

    r = ItemRecommender.new

    start = Time.now
    users.each { |user| r.users.add_to_set user, *items.sample(40) }
    parts.each { |part| r.parts.add_to_set part, *items.sample(40) }
    elapsed = Time.now - start
    puts "add_to_set = #{elapsed.round(3)} seconds"

    [:ruby, :lua, :union].each do |technique|
      start = Time.now
      Predictor.processing_technique technique
      r.process!
      elapsed = Time.now - start
      puts "#{technique} = #{elapsed.round(3)} seconds"
    end

    flush!
  end
end
================================================
FILE: docs/READMEv1.md
================================================
Predictor
=========
Fast and efficient recommendations and predictions using Ruby & Redis. Used in production over at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations to users.

Originally forked and based on [Recommendify](https://github.com/paulasmuth/recommendify) by Paul Asmuth, so a huge thanks to him for his contributions to Recommendify. Predictor has been almost completely rewritten to
* Be much, much more performant and efficient by using Redis for most logic.
* Provide item similarities such as "Users that read this book also read ..."
* Provide personalized predictions based on a user's past history, such as "You read these 10 books, so you might also like to read ..."
At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)
Installation
---------------------
```
gem install predictor
```
or in your Gemfile:
```ruby
gem 'predictor'
```
Getting Started
---------------------
First step is to configure Predictor with your Redis instance.
```ruby
# in config/initializers/predictor.rb
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])
# Or, to improve performance, add hiredis as your driver (you'll need to install the hiredis gem first)
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
```
Inputting Data
---------------------
Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.
Below, we're building a recommender to recommend courses based off of:
* Users that have taken a course. If two courses were taken by the same user, this is three times as important to us as if the courses share the same topic. This will lead to sets like:
* "user1" -> "course-1", "course-3",
* "user2" -> "course-1", "course-4"
* Tags and their courses. This will lead to sets like:
* "rails" -> "course-1", "course-2",
* "microeconomics" -> "course-3", "course-4"
* Topics and their courses. This will lead to sets like:
* "computer science" -> "course-1", "course-2",
* "economics and finance" -> "course-3", "course-4"
```ruby
class CourseRecommender
include Predictor::Base
input_matrix :users, weight: 3.0
input_matrix :tags, weight: 2.0
input_matrix :topics, weight: 1.0
end
```
Now, we just need to update our matrices when courses are created, users take a course, topics are changed, etc:
```ruby
recommender = CourseRecommender.new
# Add a single course to topic-1's items. If topic-1 already exists as a set ID, this just adds course-1 to the set
recommender.topics.add_single!("topic-1", "course-1")
# If your matrix is quite large, add_single! could take some time, as it must calculate the similarity scores
# for course-1 across all other courses. If this is the case, use add_single and process the item at a more
# convenient time, perhaps in a background job
recommender.topics.add_single("topic-1", "course-1")
recommender.topics.process_item!("course-1")
# Add an array of courses to tag-1. Again, these will simply be added to tag-1's existing set, if it exists.
# If not, the tag-1 set will be initialized with course-1 and course-2
recommender.tags.add_set!("tag-1", ["course-1", "course-2"])
# Or, just add the set and process whenever you like
recommender.tags.add_set("tag-1", ["course-1", "course-2"])
["course-1", "course-2"].each { |course| recommender.topics.process_item!(course) }
```
As noted above, it's important to remember that if you don't use the bang methods (add_set! and add_single!), you'll need to manually update your similarities (the bang methods will likely suffice for most use cases though). You can do so a variety of ways.
* If you want to simply update the similarities for a single item in a specific matrix:
```ruby
recommender.matrix.process_item!(item)
```
* If you want to update the similarities for all items in a specific matrix:
```ruby
recommender.matrix.process!
```
* If you want to update the similarities for a single item in all matrices:
```ruby
recommender.process_item!(item)
```
* If you want to update all similarities in all matrices:
```ruby
recommender.process!
```
Retrieving Similarities and Recommendations
---------------------
Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations! First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course.

```ruby
recommender = CourseRecommender.new
# Return all similarities for course-1 (ordered by most similar to least).
recommender.similarities_for("course-1")
# Need to paginate? Not a problem! Specify an offset and a limit
recommender.similarities_for("course-1", offset: 10, limit: 10) # Gets similarities 11-20
# Want scores?
recommender.similarities_for("course-1", with_scores: true)
# Want to ignore a certain set of courses in similarities?
recommender.similarities_for("course-1", exclusion_set: ["course-2"])
```
The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem!

```ruby
recommender = CourseRecommender.new
# User has taken course-1 and course-2. Let's see what else they might like...
recommender.predictions_for(item_set: ["course-1", "course-2"])
# Already have the set you need stored in an input matrix? In our case, we do (the users matrix stores the courses a user has taken), so we can just do:
recommender.predictions_for("user-1", matrix_label: :users)
# Paginate too!
recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10)
# Gimme some scores and ignore course-2....that course-2 is one sketchy fella
recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["course-2"])
```
Deleting Items
---------------------
If your data is deleted from your persistent storage, you certainly don't want to recommend that data to a user. To ensure that doesn't happen, simply call delete_item! on the individual matrix or recommender as a whole:
```ruby
recommender = CourseRecommender.new
# User removed course-1 from topic-1, but course-1 still exists
recommender.topics.delete_item!("course-1")
# course-1 was permanently deleted
recommender.delete_item!("course-1")
# Something crazy has happened, so let's just start fresh and wipe out all previously stored similarities:
recommender.clean!
```
Memory Management
---------------------
Predictor works by caching the similarities for each item in each matrix, then computing overall similarities off those caches. With an even semi-large dataset, this can really eat up Redis's memory. To limit the number of similarities cached in each matrix, specify a similarity_limit option when defining the matrix.
```ruby
class CourseRecommender
include Predictor::Base
input_matrix :users, weight: 3.0, similarity_limit: 300
input_matrix :tags, weight: 2.0, similarity_limit: 300
input_matrix :topics, weight: 1.0, similarity_limit: 300
end
```
This will ensure that only the top 300 similarities for each item are cached in each matrix. This can greatly reduce your memory usage, and if you're just using Predictor for scenarios where you maybe show the top 5 or so similar items, then this can be hugely helpful. But note, **don't set similarity_limit to 5 in that case**. This simply limits the similarities cached in each matrix, but does not limit the similarities for an item across all matrices. That is computed (and can be limited) on the fly, and uses the similarity cache in each matrix. So, you need a large enough cache in each matrix to determine an intelligent similarity list across all matrices.
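To make that concrete, here's a toy illustration in plain Ruby (no Redis) of how an item's overall similarity is the weighted sum of each matrix's cached scores, and why truncating each per-matrix cache to just your display size can distort the final list. All item names, weights, and scores below are made up for illustration.

```ruby
# Hypothetical cached similarities for "course-1" in each matrix.
matrices = {
  users:  { weight: 3.0, sims: { "course-2" => 0.9, "course-3" => 0.2 } },
  tags:   { weight: 2.0, sims: { "course-3" => 0.9, "course-4" => 0.7 } },
  topics: { weight: 1.0, sims: { "course-3" => 0.9, "course-2" => 0.1 } }
}

# Overall similarity = sum of (cached score * matrix weight) across matrices.
overall = Hash.new(0.0)
matrices.each_value do |m|
  m[:sims].each { |item, score| overall[item] += score * m[:weight] }
end

top = overall.sort_by { |_, score| -score }.map(&:first)
# "course-3" ranks last in the :users cache but first overall, so a
# per-matrix cache trimmed to your display size could have dropped it.
```

In the real library, that final ranking is what `similarities_for("course-1")` returns, built from the per-matrix caches stored in Redis.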
*Note*: This is a bit of a hack, and there are most certainly other ways to improve Predictor's memory usage for large datasets, but each appears to require a more significant change than the trivial implementation of similarity_limit above. PRs that experiment with these other approaches are quite welcome :)
Oh, and if you decide to tinker with your limit to try and find a sweet spot, I added a helpful method that trims existing caches to obey the current limit, so you can avoid regenerating all similarities. Of course, this only helps if you're decreasing the limit. If you're increasing it, you'll need to reprocess all similarities.
```ruby
recommender.users.ensure_similarity_limit_is_obeyed! # Remove similarities that disobey our current limit
recommender.tags.ensure_similarity_limit_is_obeyed!
recommender.topics.ensure_similarity_limit_is_obeyed!
```
Problems? Issues? Want to help out?
---------------------
Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!
The MIT License (MIT)
---------------------
Copyright (c) 2014 Pathgather
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
================================================
FILE: lib/predictor/base.rb
================================================
module Predictor::Base
def self.included(base)
base.extend(ClassMethods)
end
module ClassMethods
def input_matrix(key, opts={})
@matrices ||= {}
@matrices[key] = opts
end
def limit_similarities_to(val)
@similarity_limit_set = true
@similarity_limit = val
end
def similarity_limit
@similarity_limit_set ? @similarity_limit : 128
end
def reset_similarity_limit!
@similarity_limit_set = nil
@similarity_limit = nil
end
def input_matrices=(val)
@matrices = val
end
def input_matrices
@matrices
end
def redis_prefix(prefix = nil, &block)
@redis_prefix = block_given? ? block : prefix
end
def get_redis_prefix
if @redis_prefix
if @redis_prefix.respond_to?(:call)
@redis_prefix.call
else
@redis_prefix
end
else
to_s
end
end
def processing_technique(technique)
@technique = technique
end
def get_processing_technique
@technique || Predictor.get_processing_technique
end
end
def input_matrices
@input_matrices ||= Hash[self.class.input_matrices.map{ |key, opts|
opts.merge!(:key => key, :base => self)
[ key, Predictor::InputMatrix.new(opts) ]
}]
end
def get_redis_prefix
nil # Override in subclass.
end
def redis_prefix
[Predictor.get_redis_prefix, self.class.get_redis_prefix, self.get_redis_prefix].compact
end
def similarity_limit
self.class.similarity_limit
end
def redis_key(*append)
([redis_prefix] + append).flatten.compact.join(":")
end
def method_missing(method, *args)
if input_matrices.has_key?(method)
input_matrices[method]
else
raise NoMethodError.new(method.to_s)
end
end
def respond_to?(method, include_all = false)
input_matrices.has_key?(method) ? true : super
end
def all_items
Predictor.redis.smembers(redis_key(:all_items))
end
def add_to_matrix(matrix, set, *items)
items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
input_matrices[matrix].add_to_set(set, *items)
end
def add_to_matrix!(matrix, set, *items)
items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
add_to_matrix(matrix, set, *items)
process_items!(*items)
end
def related_items(item)
keys = []
input_matrices.each do |key, matrix|
sets = Predictor.redis.smembers(matrix.redis_key(:sets, item))
keys.concat(sets.map { |set| matrix.redis_key(:items, set) })
end
keys.empty? ? [] : (Predictor.redis.sunion(keys) - [item.to_s])
end
def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, on: nil, offset: 0, limit: -1, exclusion_set: [], boost: {})
fail "item_set or matrix_label and set is required" unless item_set || (matrix_label && set)
on = Array(on)
if matrix_label
matrix = input_matrices[matrix_label]
item_set = Predictor.redis.smembers(matrix.redis_key(:items, set))
end
item_keys = []
weights = []
item_set.each do |item|
item_keys << redis_key(:similarities, item)
weights << 1.0
end
boost.each do |matrix_label, values|
m = input_matrices[matrix_label]
# Passing plain sets to zunionstore is undocumented, but tested and supported:
# https://github.com/antirez/redis/blob/2.8.11/tests/unit/type/zset.tcl#L481-L489
case values
when Hash
values[:values].each do |value|
item_keys << m.redis_key(:items, value)
weights << values[:weight]
end
when Array
values.each do |value|
item_keys << m.redis_key(:items, value)
weights << 1.0
end
else
raise "Bad value for boost: #{boost.inspect}"
end
end
return [] if item_keys.empty?
predictions = nil
Predictor.redis.multi do |multi|
multi.zunionstore 'temp', item_keys, weights: weights
multi.zrem 'temp', item_set if item_set.any?
multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
if on.any?
multi.zadd 'temp2', on.map{ |val| [0.0, val] }
multi.zinterstore 'temp', ['temp', 'temp2']
multi.del 'temp2'
end
predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores
multi.del 'temp'
end
predictions.value
end
def similarities_for(item, with_scores: false, offset: 0, limit: -1, exclusion_set: [])
neighbors = nil
Predictor.redis.multi do |multi|
multi.zunionstore 'temp', [1, redis_key(:similarities, item)]
multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
neighbors = multi.zrevrange('temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores)
multi.del 'temp'
end
return neighbors.value
end
def sets_for(item)
keys = input_matrices.map{ |k,m| m.redis_key(:sets, item) }
Predictor.redis.sunion keys
end
def process_item!(item)
process_items!(item) # Old method
end
def process_items!(*items)
items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
case self.class.get_processing_technique
when :lua
matrix_data = {}
input_matrices.each do |name, matrix|
matrix_data[name] = {weight: matrix.weight, measure: matrix.measure_name}
end
matrix_json = JSON.dump(matrix_data)
items.each do |item|
Predictor.process_lua_script(redis_key, matrix_json, similarity_limit, item)
end
when :union
items.each do |item|
keys = []
weights = []
input_matrices.each do |key, matrix|
k = matrix.redis_key(:sets, item)
item_keys = Predictor.redis.smembers(k).map { |set| matrix.redis_key(:items, set) }
counts = Predictor.redis.multi do |multi|
item_keys.each { |key| multi.scard(key) } # Queue SCARDs on the MULTI object, not the bare connection
end
item_keys.zip(counts).each do |key, count|
unless count.zero?
keys << key
weights << matrix.weight / count
end
end
end
Predictor.redis.multi do |multi|
key = redis_key(:similarities, item)
multi.del(key)
if keys.any?
multi.zunionstore(key, keys, weights: weights)
multi.zrem(key, item)
multi.zremrangebyrank(key, 0, -(similarity_limit + 1))
multi.zunionstore key, [key] # Rewrite zset for optimized storage.
end
end
end
else # Default to old behavior, processing things in Ruby.
items.each do |item|
related_items(item).each { |related_item| cache_similarity(item, related_item) }
end
end
return self
end
def process!
process_items!(*all_items)
return self
end
def delete_from_matrix!(matrix, item)
# Deleting from a specific matrix, so get related_items, delete, then update the similarity of those related_items
items = related_items(item)
input_matrices[matrix].delete_item(item)
items.each { |related_item| cache_similarity(item, related_item) }
return self
end
def delete_pair_from_matrix!(matrix, set, item)
items = related_items(item)
input_matrices[matrix].remove_from_set(set, item)
items.each { |related_item| cache_similarity(item, related_item) }
return self
end
def add_item(item)
Predictor.redis.sadd(redis_key(:all_items), item)
end
def delete_item!(item)
Predictor.redis.srem(redis_key(:all_items), item)
Predictor.redis.watch(redis_key(:similarities, item)) do
items = related_items(item)
Predictor.redis.multi do |multi|
items.each do |related_item|
multi.zrem(redis_key(:similarities, related_item), item)
end
multi.del redis_key(:similarities, item)
end
end
input_matrices.each do |k,m|
m.delete_item(item)
end
return self
end
def clean!
keys = Predictor.redis.keys(redis_key('*'))
unless keys.empty?
Predictor.redis.del(keys)
end
end
def ensure_similarity_limit_is_obeyed!
if similarity_limit
items = all_items
Predictor.redis.multi do |multi|
items.each do |item|
key = redis_key(:similarities, item)
multi.zremrangebyrank(key, 0, -(similarity_limit + 1))
multi.zunionstore key, [key] # Rewrite zset to take advantage of ziplist implementation.
end
end
end
end
private
def cache_similarity(item1, item2)
score = 0
input_matrices.each do |key, matrix|
score += (matrix.score(item1, item2) * matrix.weight)
end
if score > 0
add_similarity_if_necessary(item1, item2, score)
add_similarity_if_necessary(item2, item1, score)
else
Predictor.redis.multi do |multi|
multi.zrem(redis_key(:similarities, item1), item2)
multi.zrem(redis_key(:similarities, item2), item1)
end
end
end
def add_similarity_if_necessary(item, similarity, score)
store = true
key = redis_key(:similarities, item)
if similarity_limit
if Predictor.redis.zrank(key, similarity).nil? && Predictor.redis.zcard(key) >= similarity_limit
# Similarity is not already stored and we are at limit of similarities
lowest_scored_item = Predictor.redis.zrangebyscore(key, "0", "+inf", limit: [0, 1], with_scores: true)
unless lowest_scored_item.empty?
# If score is less than or equal to the lowest score, don't store it. Otherwise, make room by removing the lowest scored similarity
score <= lowest_scored_item[0][1] ? store = false : Predictor.redis.zrem(key, lowest_scored_item[0][0])
end
end
end
Predictor.redis.zadd(key, score, similarity) if store
end
end
================================================
FILE: lib/predictor/distance.rb
================================================
module Predictor
module Distance
extend self
# Jaccard index of the two Redis sets: |A n B| / |A u B|, fetched in one MULTI.
def jaccard_index(key_1, key_2, redis = Predictor.redis)
x = y = nil
redis.multi do |multi|
x = multi.sinterstore 'temp', [key_1, key_2]
y = multi.sunionstore 'temp', [key_1, key_2]
multi.del 'temp'
end
y.value > 0 ? (x.value.to_f/y.value.to_f) : 0.0
end
# Sorensen-Dice coefficient of the two Redis sets: 2|A n B| / (|A| + |B|).
def sorensen_coefficient(key_1, key_2, redis = Predictor.redis)
x = y = z = nil
redis.multi do |multi|
x = multi.sinterstore 'temp', [key_1, key_2]
y = multi.scard key_1
z = multi.scard key_2
multi.del 'temp'
end
denom = (y.value + z.value)
denom > 0 ? (2 * (x.value) / denom.to_f) : 0.0
end
end
end
================================================
FILE: lib/predictor/input_matrix.rb
================================================
module Predictor
class InputMatrix
def initialize(opts)
@opts = opts
end
def measure_name
@opts.fetch(:measure, :jaccard_index)
end
def base
@opts[:base]
end
def parent_redis_key(*append)
base.redis_key(*append)
end
def redis_key(*append)
base.redis_key(@opts.fetch(:key), *append)
end
def weight
(@opts[:weight] || 1).to_f
end
def add_to_set(set, *items)
items = items.flatten if items.count == 1 && items[0].is_a?(Array)
if items.any?
Predictor.redis.multi do |redis|
redis.sadd(parent_redis_key(:all_items), items)
redis.sadd(redis_key(:items, set), items)
items.each do |item|
# add the set to the item's set--inverting the sets
redis.sadd(redis_key(:sets, item), set)
end
end
end
end
# Delete a specific relationship
def remove_from_set(set, item)
Predictor.redis.multi do |redis|
redis.srem(redis_key(:items, set), item)
redis.srem(redis_key(:sets, item), set)
end
end
def add_set(set, items)
add_to_set(set, *items)
end
def add_single(set, item)
add_to_set(set, item)
end
def items_for(set)
Predictor.redis.smembers redis_key(:items, set)
end
def sets_for(item)
Predictor.redis.sunion redis_key(:sets, item)
end
def related_items(item)
sets = Predictor.redis.smembers(redis_key(:sets, item))
keys = sets.map { |set| redis_key(:items, set) }
keys.length > 0 ? Predictor.redis.sunion(keys) - [item.to_s] : []
end
# delete item from the matrix
def delete_item(item)
Predictor.redis.watch(redis_key(:sets, item)) do
sets = Predictor.redis.smembers(redis_key(:sets, item))
Predictor.redis.multi do |multi|
sets.each do |set|
multi.srem(redis_key(:items, set), item)
end
multi.del redis_key(:sets, item)
end
end
end
def score(item1, item2)
Distance.send(measure_name, redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis)
end
def calculate_jaccard(item1, item2)
warn 'InputMatrix#calculate_jaccard is now deprecated. Use InputMatrix#score instead'
Distance.jaccard_index(redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis)
end
end
end
================================================
FILE: lib/predictor/predictor.rb
================================================
module Predictor
@@redis = nil
@@redis_prefix = nil
def self.redis=(redis)
@@redis = redis
end
def self.redis
return @@redis unless @@redis.nil?
raise "redis not configured! - Predictor.redis = Redis.new"
end
def self.redis_prefix(prefix = nil, &block)
@@redis_prefix = block_given? ? block : prefix
end
def self.get_redis_prefix
if @@redis_prefix
if @@redis_prefix.respond_to?(:call)
@@redis_prefix.call
else
@@redis_prefix
end
else
'predictor'
end
end
def self.capitalize(str_or_sym)
str = str_or_sym.to_s.each_char.to_a
str.first.upcase + str[1..-1].join("").downcase
end
def self.constantize(klass)
Object.module_eval("Predictor::#{klass}", __FILE__, __LINE__)
end
def self.processing_technique(algorithm)
@technique = algorithm
end
def self.get_processing_technique
@technique || :ruby
end
def self.process_lua_script(*args)
@process_sha ||= redis.script(:load, PROCESS_ITEMS_LUA_SCRIPT)
redis.evalsha(@process_sha, argv: args)
end
PROCESS_ITEMS_LUA_SCRIPT = <<-LUA
local redis_prefix = ARGV[1]
local input_matrices = cjson.decode(ARGV[2])
local similarity_limit = tonumber(ARGV[3])
local item = ARGV[4]
local keys = {}
for name, options in pairs(input_matrices) do
local key = table.concat({redis_prefix, name, 'sets', item}, ':')
local sets = redis.call('SMEMBERS', key)
for _, set in ipairs(sets) do
table.insert(keys, table.concat({redis_prefix, name, 'items', set}, ':'))
end
end
-- Account for empty tables.
if next(keys) == nil then
return nil
end
local related_items = redis.call('SUNION', unpack(keys))
local function add_similarity_if_necessary(item, similarity, score)
local store = true
local key = table.concat({redis_prefix, 'similarities', item}, ':')
if similarity_limit ~= nil then
-- In Lua scripts a nil reply (missing member) is converted to false, so test for that.
local zrank = redis.call('ZRANK', key, similarity)
if zrank == false then
local zcard = redis.call('ZCARD', key)
if zcard >= similarity_limit then
-- Similarity is not already stored and we are at limit of similarities.
local lowest_scored_item = redis.call('ZRANGEBYSCORE', key, '0', '+inf', 'withscores', 'limit', 0, 1)
if #lowest_scored_item > 0 then
-- If score is less than or equal to the lowest score, don't store it. Otherwise, make room by removing the lowest scored similarity
if score <= tonumber(lowest_scored_item[2]) then
store = false
else
redis.call('ZREM', key, lowest_scored_item[1])
end
end
end
end
end
if store then
redis.call('ZADD', key, score, similarity)
end
end
for i, related_item in ipairs(related_items) do
-- Disregard the current item.
if related_item ~= item then
local score = 0.0
for name, matrix in pairs(input_matrices) do
local s = 0.0
local key_1 = table.concat({redis_prefix, name, 'sets', item}, ':')
local key_2 = table.concat({redis_prefix, name, 'sets', related_item}, ':')
if matrix.measure == 'jaccard_index' then
local x = tonumber(redis.call('SINTERSTORE', 'temp', key_1, key_2))
local y = tonumber(redis.call('SUNIONSTORE', 'temp', key_1, key_2))
redis.call('DEL', 'temp')
if y > 0 then
s = s + (x / y)
end
elseif matrix.measure == 'sorensen_coefficient' then
local x = redis.call('SINTERSTORE', 'temp', key_1, key_2)
local y = redis.call('SCARD', key_1)
local z = redis.call('SCARD', key_2)
redis.call('DEL', 'temp')
local denom = y + z
if denom > 0 then
s = s + (2 * x / denom)
end
else
error("Bad matrix.measure: " .. matrix.measure)
end
score = score + (s * matrix.weight)
end
if score > 0 then
add_similarity_if_necessary(item, related_item, score)
add_similarity_if_necessary(related_item, item, score)
else
redis.call('ZREM', table.concat({redis_prefix, 'similarities', item}, ':'), related_item)
redis.call('ZREM', table.concat({redis_prefix, 'similarities', related_item}, ':'), item)
end
end
end
LUA
end
================================================
FILE: lib/predictor/version.rb
================================================
module Predictor
VERSION = "2.3.1"
end
================================================
FILE: lib/predictor.rb
================================================
require 'json'
require "redis"
require "predictor/predictor"
require "predictor/distance"
require "predictor/input_matrix"
require "predictor/base"
================================================
FILE: predictor.gemspec
================================================
# -*- encoding: utf-8 -*-
require File.expand_path('../lib/predictor/version', __FILE__)
Gem::Specification.new do |s|
s.name = "predictor"
s.version = Predictor::VERSION
s.platform = Gem::Platform::RUBY
s.authors = ["Pathgather"]
s.email = ["tech@pathgather.com"]
s.homepage = "https://github.com/Pathgather/predictor"
s.description = s.summary = "Fast and efficient recommendations and predictions using Redis"
s.licenses = ["MIT"]
s.add_dependency "redis", ">= 3.0.0"
s.add_development_dependency "rspec", ">= 3.4.0"
s.add_development_dependency "rake", ">= 11.0"
s.add_development_dependency "pry"
s.add_development_dependency "yard"
s.files = `git ls-files`.split("\n") - [".gitignore", ".rspec", ".travis.yml"]
s.test_files = `git ls-files -- spec/*`.split("\n")
s.require_paths = ["lib"]
end
================================================
FILE: spec/base_spec.rb
================================================
require 'spec_helper'
describe Predictor::Base do
before(:each) do
flush_redis!
BaseRecommender.input_matrices = {}
BaseRecommender.reset_similarity_limit!
BaseRecommender.redis_prefix(nil)
UserRecommender.input_matrices = {}
UserRecommender.reset_similarity_limit!
BaseRecommender.processing_technique nil
UserRecommender.processing_technique nil
Predictor.processing_technique nil
end
describe "configuration" do
it "should add an input_matrix by 'key'" do
BaseRecommender.input_matrix(:myinput)
expect(BaseRecommender.input_matrices.keys).to eq([:myinput])
end
it "should default the similarity_limit to 128" do
expect(BaseRecommender.similarity_limit).to eq(128)
end
it "should allow the similarity limit to be configured" do
BaseRecommender.limit_similarities_to(500)
expect(BaseRecommender.similarity_limit).to eq(500)
end
it "should allow the similarity limit to be removed" do
BaseRecommender.limit_similarities_to(nil)
expect(BaseRecommender.similarity_limit).to eq(nil)
end
it "should retrieve an input_matrix on a new instance" do
BaseRecommender.input_matrix(:myinput)
sm = BaseRecommender.new
expect{ sm.myinput }.not_to raise_error
end
it "should retrieve an input_matrix on a new instance and correctly overload respond_to?" do
BaseRecommender.input_matrix(:myinput)
sm = BaseRecommender.new
expect(sm.respond_to?(:process!)).to be_truthy
expect(sm.respond_to?(:myinput)).to be_truthy
expect(sm.respond_to?(:fnord)).to be_falsey
end
it "should retrieve an input_matrix on a new instance and initialize the correct class" do
BaseRecommender.input_matrix(:myinput)
sm = BaseRecommender.new
expect(sm.myinput).to be_a(Predictor::InputMatrix)
end
it "should accept a custom processing_technique, or default to Predictor's default" do
expect(BaseRecommender.get_processing_technique).to eq(:ruby)
Predictor.processing_technique :lua
expect(BaseRecommender.get_processing_technique).to eq(:lua)
BaseRecommender.processing_technique :union
expect(BaseRecommender.get_processing_technique).to eq(:union)
end
end
describe "redis_key" do
it "should vary based on the class name" do
expect(BaseRecommender.new.redis_key).to eq('predictor-test:BaseRecommender')
expect(UserRecommender.new.redis_key).to eq('predictor-test:UserRecommender')
end
it "should be able to mimic the old naming defaults" do
BaseRecommender.redis_prefix([nil])
expect(BaseRecommender.new.redis_key(:key)).to eq('predictor-test:key')
end
it "should respect the Predictor prefix configuration setting" do
br = BaseRecommender.new
expect(br.redis_key).to eq("predictor-test:BaseRecommender")
expect(br.redis_key(:another)).to eq("predictor-test:BaseRecommender:another")
expect(br.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:another:key")
expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:another:set:of:keys")
i = 0
Predictor.redis_prefix { i += 1 }
expect(br.redis_key).to eq("1:BaseRecommender")
expect(br.redis_key(:another)).to eq("2:BaseRecommender:another")
expect(br.redis_key(:another, :key)).to eq("3:BaseRecommender:another:key")
expect(br.redis_key(:another, [:set, :of, :keys])).to eq("4:BaseRecommender:another:set:of:keys")
Predictor.redis_prefix nil
expect(br.redis_key).to eq("predictor:BaseRecommender")
expect(br.redis_key(:another)).to eq("predictor:BaseRecommender:another")
expect(br.redis_key(:another, :key)).to eq("predictor:BaseRecommender:another:key")
expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor:BaseRecommender:another:set:of:keys")
Predictor.redis_prefix [nil]
expect(br.redis_key).to eq("BaseRecommender")
expect(br.redis_key(:another)).to eq("BaseRecommender:another")
expect(br.redis_key(:another, :key)).to eq("BaseRecommender:another:key")
expect(br.redis_key(:another, [:set, :of, :keys])).to eq("BaseRecommender:another:set:of:keys")
Predictor.redis_prefix { [1, 2, 3] }
expect(br.redis_key).to eq("1:2:3:BaseRecommender")
expect(br.redis_key(:another)).to eq("1:2:3:BaseRecommender:another")
expect(br.redis_key(:another, :key)).to eq("1:2:3:BaseRecommender:another:key")
expect(br.redis_key(:another, [:set, :of, :keys])).to eq("1:2:3:BaseRecommender:another:set:of:keys")
Predictor.redis_prefix 'predictor-test'
expect(br.redis_key).to eq("predictor-test:BaseRecommender")
expect(br.redis_key(:another)).to eq("predictor-test:BaseRecommender:another")
expect(br.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:another:key")
expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:another:set:of:keys")
end
it "should respect the class prefix configuration setting" do
br = BaseRecommender.new
BaseRecommender.redis_prefix('base')
expect(br.redis_key).to eq("predictor-test:base")
expect(br.redis_key(:another)).to eq("predictor-test:base:another")
expect(br.redis_key(:another, :key)).to eq("predictor-test:base:another:key")
expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:base:another:set:of:keys")
i = 0
BaseRecommender.redis_prefix { i += 1 }
expect(br.redis_key).to eq("predictor-test:1")
expect(br.redis_key(:another)).to eq("predictor-test:2:another")
expect(br.redis_key(:another, :key)).to eq("predictor-test:3:another:key")
expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:4:another:set:of:keys")
BaseRecommender.redis_prefix(nil)
expect(br.redis_key).to eq("predictor-test:BaseRecommender")
expect(br.redis_key(:another)).to eq("predictor-test:BaseRecommender:another")
expect(br.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:another:key")
expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:another:set:of:keys")
end
it "should respect the instance prefix configuration setting" do
br = PrefixRecommender.new("foo")
expect(br.redis_key).to eq("predictor-test:PrefixRecommender:foo")
expect(br.redis_key(:another)).to eq("predictor-test:PrefixRecommender:foo:another")
expect(br.redis_key(:another, :key)).to eq("predictor-test:PrefixRecommender:foo:another:key")
expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:PrefixRecommender:foo:another:set:of:keys")
br.prefix = nil
expect(br.redis_key).to eq("predictor-test:PrefixRecommender")
expect(br.redis_key(:another)).to eq("predictor-test:PrefixRecommender:another")
end
end
describe "all_items" do
it "returns all items across all matrices" do
BaseRecommender.input_matrix(:anotherinput)
BaseRecommender.input_matrix(:yetanotherinput)
sm = BaseRecommender.new
sm.add_to_matrix(:anotherinput, 'a', "foo", "bar")
sm.add_to_matrix(:yetanotherinput, 'b', "fnord", "shmoo", "bar")
expect(sm.all_items).to include('foo', 'bar', 'fnord', 'shmoo')
expect(sm.all_items.length).to eq(4)
end
it "doesn't return items from other recommenders" do
BaseRecommender.input_matrix(:anotherinput)
BaseRecommender.input_matrix(:yetanotherinput)
UserRecommender.input_matrix(:anotherinput)
UserRecommender.input_matrix(:yetanotherinput)
sm = BaseRecommender.new
sm.add_to_matrix(:anotherinput, 'a', "foo", "bar")
sm.add_to_matrix(:yetanotherinput, 'b', "fnord", "shmoo", "bar")
expect(sm.all_items).to include('foo', 'bar', 'fnord', 'shmoo')
expect(sm.all_items.length).to eq(4)
ur = UserRecommender.new
expect(ur.all_items).to eq([])
end
end
describe "add_to_matrix" do
it "calls add_to_set on the given matrix" do
BaseRecommender.input_matrix(:anotherinput)
sm = BaseRecommender.new
expect(sm.anotherinput).to receive(:add_to_set).with('a', 'foo', 'bar')
sm.add_to_matrix(:anotherinput, 'a', 'foo', 'bar')
end
it "adds the items to the all_items storage" do
BaseRecommender.input_matrix(:anotherinput)
sm = BaseRecommender.new
sm.add_to_matrix(:anotherinput, 'a', 'foo', 'bar')
expect(sm.all_items).to include('foo', 'bar')
end
end
describe "add_to_matrix!" do
it "calls add_to_matrix and process_items! for the given items" do
BaseRecommender.input_matrix(:anotherinput)
sm = BaseRecommender.new
expect(sm).to receive(:add_to_matrix).with(:anotherinput, 'a', 'foo')
expect(sm).to receive(:process_items!).with('foo')
sm.add_to_matrix!(:anotherinput, 'a', 'foo')
end
end
describe "related_items" do
it "returns items in the sets across all matrices that the given item is also in" do
BaseRecommender.input_matrix(:anotherinput)
BaseRecommender.input_matrix(:yetanotherinput)
BaseRecommender.input_matrix(:finalinput)
sm = BaseRecommender.new
sm.anotherinput.add_to_set('a', "foo", "bar")
sm.yetanotherinput.add_to_set('b', "fnord", "shmoo", "bar")
sm.finalinput.add_to_set('c', "nada")
sm.process!
expect(sm.related_items("bar")).to include("foo", "fnord", "shmoo")
expect(sm.related_items("bar").length).to eq(3)
end
end
describe "predictions_for" do
it "accepts an :on option to return scores of specific objects" do
BaseRecommender.input_matrix(:users, weight: 4.0)
BaseRecommender.input_matrix(:tags, weight: 1.0)
sm = BaseRecommender.new
sm.users.add_to_set('me', "foo", "bar", "fnord")
sm.users.add_to_set('not_me', "foo", "shmoo")
sm.users.add_to_set('another', "fnord", "other")
sm.users.add_to_set('another', "nada")
sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
sm.tags.add_to_set('tag2', "bar", "shmoo", "other")
sm.tags.add_to_set('tag3', "shmoo", "nada")
sm.process!
predictions = sm.predictions_for('me', matrix_label: :users, on: 'other', with_scores: true)
expect(predictions).to eq([['other', 3.0]])
predictions = sm.predictions_for('me', matrix_label: :users, on: ['other'], with_scores: true)
expect(predictions).to eq([['other', 3.0]])
predictions = sm.predictions_for('me', matrix_label: :users, on: ['other', 'nada'], with_scores: true)
expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada'], with_scores: true)
expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada'])
expect(predictions).to eq(['other', 'nada'])
predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, limit: 1, with_scores: true)
expect(predictions).to eq([["other", 3.0]])
predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, with_scores: true)
expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
end
end
  [:ruby, :lua, :union].each do |technique|
    describe "predictions_for with #{technique} processing" do
      before do
        Predictor.processing_technique(technique)
      end

      it "returns relevant predictions" do
        BaseRecommender.input_matrix(:users, weight: 4.0)
        BaseRecommender.input_matrix(:tags, weight: 1.0)
        sm = BaseRecommender.new
        sm.users.add_to_set('me', "foo", "bar", "fnord")
        sm.users.add_to_set('not_me', "foo", "shmoo")
        sm.users.add_to_set('another', "fnord", "other")
        sm.users.add_to_set('another', "nada")
        sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
        sm.tags.add_to_set('tag2', "bar", "shmoo")
        sm.tags.add_to_set('tag3', "shmoo", "nada")
        sm.process!
        predictions = sm.predictions_for('me', matrix_label: :users)
        expect(predictions).to eq(["shmoo", "other", "nada"])
        predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"])
        expect(predictions).to eq(["shmoo", "other", "nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1)
        expect(predictions).to eq(["other"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1)
        expect(predictions).to eq(["other", "nada"])
      end

      it "accepts a :boost option" do
        BaseRecommender.input_matrix(:users, weight: 4.0)
        BaseRecommender.input_matrix(:tags, weight: 1.0)
        sm = BaseRecommender.new
        sm.users.add_to_set('me', "foo", "bar", "fnord")
        sm.users.add_to_set('not_me', "foo", "shmoo")
        sm.users.add_to_set('another', "fnord", "other")
        sm.users.add_to_set('another', "nada")
        sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
        sm.tags.add_to_set('tag2', "bar", "shmoo")
        sm.tags.add_to_set('tag3', "shmoo", "nada")
        sm.process!

        # Syntax #1: Tags passed as array, weights assumed to be 1.0
        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
        expect(predictions).to eq(["shmoo", "nada", "other"])
        predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: ['tag3']})
        expect(predictions).to eq(["shmoo", "nada", "other"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
        expect(predictions).to eq(["nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
        expect(predictions).to eq(["nada", "other"])

        # Syntax #2: Weights explicitly set.
        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["shmoo", "nada", "other"])
        predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["shmoo", "nada", "other"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["nada", "other"])

        # Make sure weights are actually being passed to Redis.
        shmoo, nada, other = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 10000.0}}, with_scores: true)
        expect(shmoo[0]).to eq('shmoo')
        expect(shmoo[1]).to be > 10000
        expect(nada[0]).to eq('nada')
        expect(nada[1]).to be > 10000
        expect(other[0]).to eq('other')
        expect(other[1]).to be < 10
      end

      it "accepts a :boost option, even with an empty item set" do
        BaseRecommender.input_matrix(:users, weight: 4.0)
        BaseRecommender.input_matrix(:tags, weight: 1.0)
        sm = BaseRecommender.new
        sm.users.add_to_set('not_me', "foo", "shmoo")
        sm.users.add_to_set('another', "fnord", "other")
        sm.users.add_to_set('another', "nada")
        sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
        sm.tags.add_to_set('tag2', "bar", "shmoo")
        sm.tags.add_to_set('tag3', "shmoo", "nada")
        sm.process!

        # Syntax #1: Tags passed as array, weights assumed to be 1.0
        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for(item_set: [], boost: {tags: ['tag3']})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
        expect(predictions).to eq(["nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
        expect(predictions).to eq(["nada"])

        # Syntax #2: Weights explicitly set.
        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for(item_set: [], boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["nada"])
      end
    end

    describe "process_items! with #{technique} processing" do
      before do
        Predictor.processing_technique(technique)
      end

      context "with no similarity_limit" do
        it "calculates the similarity between the item and all related_items (other items in a set the given item is in)" do
          BaseRecommender.input_matrix(:myfirstinput)
          BaseRecommender.input_matrix(:mysecondinput)
          BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
          sm = BaseRecommender.new
          sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
          sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
          expect(sm.similarities_for('item2')).to be_empty
          sm.process_items!('item2')
          similarities = sm.similarities_for('item2')
          expect(similarities).to eq(["item3", "item1"])
        end
      end

      context "with a similarity_limit" do
        it "calculates the similarity between the item and all related_items (other items in a set the given item is in), but obeys the similarity_limit" do
          BaseRecommender.input_matrix(:myfirstinput)
          BaseRecommender.input_matrix(:mysecondinput)
          BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
          BaseRecommender.limit_similarities_to(1)
          sm = BaseRecommender.new
          sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
          sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
          expect(sm.similarities_for('item2')).to be_empty
          sm.process_items!('item2')
          similarities = sm.similarities_for('item2')
          expect(similarities).to include("item3")
          expect(similarities.length).to eq(1)
        end
      end
    end
  end

  describe "similarities_for" do
    it "should not throw exception for non existing items" do
      sm = BaseRecommender.new
      expect(sm.similarities_for("not_existing_item").length).to eq(0)
    end

    it "correctly weighs and sums input matrices" do
      BaseRecommender.input_matrix(:users, weight: 1.0)
      BaseRecommender.input_matrix(:tags, weight: 2.0)
      BaseRecommender.input_matrix(:topics, weight: 4.0)
      sm = BaseRecommender.new
      sm.users.add_to_set('user1', "c1", "c2", "c4")
      sm.users.add_to_set('user2', "c3", "c4")
      sm.topics.add_to_set('topic1', "c1", "c4")
      sm.topics.add_to_set('topic2', "c2", "c3")
      sm.tags.add_to_set('tag1', "c1", "c2", "c4")
      sm.tags.add_to_set('tag2', "c1", "c4")
      sm.process!
      expect(sm.similarities_for("c1", with_scores: true)).to eq([["c4", 6.5], ["c2", 2.0]])
      expect(sm.similarities_for("c2", with_scores: true)).to eq([["c3", 4.0], ["c1", 2.0], ["c4", 1.5]])
      expect(sm.similarities_for("c3", with_scores: true)).to eq([["c2", 4.0], ["c4", 0.5]])
      expect(sm.similarities_for("c4", with_scores: true, exclusion_set: ["c3"])).to eq([["c1", 6.5], ["c2", 1.5]])
    end
  end

  describe "sets_for" do
    it "should return all the sets the given item is in" do
      BaseRecommender.input_matrix(:set1)
      BaseRecommender.input_matrix(:set2)
      sm = BaseRecommender.new
      sm.set1.add_to_set "item1", "foo", "bar"
      sm.set1.add_to_set "item2", "nada", "bar"
      sm.set2.add_to_set "item3", "bar", "other"
      expect(sm.sets_for("bar").length).to eq(3)
      expect(sm.sets_for("bar")).to include("item1", "item2", "item3")
      expect(sm.sets_for("other")).to eq(["item3"])
    end
  end

  describe "process!" do
    it "should call process_items for all_items's" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "fnord", "shmoo")
      expect(sm.all_items).to include("foo", "bar", "fnord", "shmoo")
      expect(sm).to receive(:process_items!).with(*sm.all_items)
      sm.process!
    end
  end

  describe "delete_pair_from_matrix!" do
    it "should call remove_from_set on the matrix" do
      BaseRecommender.input_matrix(:anotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo")
      sm.anotherinput.add_to_set('a', "bar")
      sm.anotherinput.add_to_set('a', "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      expect(sm.anotherinput).to receive(:remove_from_set).with('a', 'foo')
      sm.delete_pair_from_matrix!(:anotherinput, 'a', 'foo')
    end

    it "updates similarities" do
      BaseRecommender.input_matrix(:anotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo")
      sm.anotherinput.add_to_set('a', "bar")
      sm.anotherinput.add_to_set('a', "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      sm.delete_pair_from_matrix!(:anotherinput, 'a', 'foo')
      expect(sm.similarities_for('bar')).to eq(['shmoo'])
    end
  end

  describe "delete_from_matrix!" do
    it "calls delete_item on the matrix" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "bar", "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      expect(sm.anotherinput).to receive(:delete_item).with('foo')
      sm.delete_from_matrix!(:anotherinput, 'foo')
    end

    it "updates similarities" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "bar", "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      sm.delete_from_matrix!(:anotherinput, 'foo')
      expect(sm.similarities_for('bar')).to eq(['shmoo'])
    end
  end

  describe "delete_item!" do
    it "should call delete_item on each input_matrix" do
      BaseRecommender.input_matrix(:myfirstinput)
      BaseRecommender.input_matrix(:mysecondinput)
      sm = BaseRecommender.new
      expect(sm.myfirstinput).to receive(:delete_item).with("fnorditem")
      expect(sm.mysecondinput).to receive(:delete_item).with("fnorditem")
      sm.delete_item!("fnorditem")
    end

    it "should remove the item from all_items" do
      BaseRecommender.input_matrix(:anotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.process!
      expect(sm.all_items).to include('foo')
      sm.delete_item!('foo')
      expect(sm.all_items).not_to include('foo')
    end

    it "should remove the item's similarities and also remove the item from related_items' similarities" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "bar", "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      expect(sm.similarities_for('shmoo')).to include('bar')
      sm.delete_item!('shmoo')
      expect(sm.similarities_for('bar')).not_to include('shmoo')
      expect(sm.similarities_for('shmoo')).to be_empty
    end
  end

  describe "clean!" do
    it "should clean out the Redis storage for this Predictor" do
      BaseRecommender.input_matrix(:set1)
      BaseRecommender.input_matrix(:set2)
      sm = BaseRecommender.new
      sm.set1.add_to_set "item1", "foo", "bar"
      sm.set1.add_to_set "item2", "nada", "bar"
      sm.set2.add_to_set "item3", "bar", "other"
      expect(Predictor.redis.keys(sm.redis_key('*'))).not_to be_empty
      sm.clean!
      expect(Predictor.redis.keys(sm.redis_key('*'))).to be_empty
    end
  end

  describe "ensure_similarity_limit_is_obeyed!" do
    it "should shorten similarities to the given limit and rewrite the zset" do
      BaseRecommender.limit_similarities_to(nil)
      BaseRecommender.input_matrix(:myfirstinput)
      sm = BaseRecommender.new
      sm.myfirstinput.add_to_set *(['set1'] + 130.times.map{|i| "item#{i}"})
      expect(sm.similarities_for('item2')).to be_empty
      sm.process_items!('item2')
      expect(sm.similarities_for('item2').length).to eq(129)
      redis = Predictor.redis
      key = sm.redis_key(:similarities, 'item2')
      expect(redis.zcard(key)).to eq(129)
      expect(redis.object(:encoding, key)).to eq('skiplist') # Inefficient
      BaseRecommender.reset_similarity_limit!
      sm.ensure_similarity_limit_is_obeyed!
      expect(redis.zcard(key)).to eq(128)
      expect(redis.object(:encoding, key)).to eq('ziplist') # Efficient
    end
  end
end
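The weighted-score expectations in the "correctly weighs and sums input matrices" spec above (e.g. `[["c4", 6.5], ["c2", 2.0]]`) can be reproduced without Redis: each matrix contributes the Jaccard score of the two items' row-sets, scaled by the matrix weight, and the contributions are summed. The following is a minimal plain-Ruby sketch of that arithmetic using the spec's fixture; `MATRICES`, `sets_for`, and `weighted_similarity` are illustrative names, not Predictor's API.

```ruby
require 'set'

# Fixture from the spec above: each matrix has a weight and maps
# set IDs (users/topics/tags) to the items they contain.
MATRICES = {
  users:  [1.0, { 'user1' => %w[c1 c2 c4], 'user2' => %w[c3 c4] }],
  topics: [4.0, { 'topic1' => %w[c1 c4], 'topic2' => %w[c2 c3] }],
  tags:   [2.0, { 'tag1' => %w[c1 c2 c4], 'tag2' => %w[c1 c4] }],
}

# The sets (rows) of one matrix that contain the given item.
def sets_for(sets, item)
  sets.select { |_, items| items.include?(item) }.keys.to_set
end

# Jaccard index of two sets: |A ∩ B| / |A ∪ B|.
def jaccard(a, b)
  union = a | b
  union.empty? ? 0.0 : (a & b).size.to_f / union.size
end

# Sum of per-matrix Jaccard scores, each scaled by the matrix weight.
def weighted_similarity(item1, item2)
  MATRICES.values.sum do |weight, sets|
    weight * jaccard(sets_for(sets, item1), sets_for(sets, item2))
  end
end

weighted_similarity('c1', 'c4') # => 6.5
weighted_similarity('c1', 'c2') # => 2.0
```

These are the scores the spec expects to find in the similarity zsets after `process!`.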
================================================
FILE: spec/input_matrix_spec.rb
================================================
require 'spec_helper'

describe Predictor::InputMatrix do
  let(:options) { @default_options.merge(@options) }

  before(:each) { @options = {} }

  before(:all) do
    @base = BaseRecommender.new
    @default_options = { base: @base, key: "mymatrix" }
    @matrix = Predictor::InputMatrix.new(@default_options)
  end

  before(:each) do
    flush_redis!
  end

  describe "redis_key" do
    it "should respect the global namespace configuration" do
      expect(@matrix.redis_key).to eq("predictor-test:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:mymatrix:another:set:of:keys")

      i = 0
      Predictor.redis_prefix { i += 1 }
      expect(@matrix.redis_key).to eq("1:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("2:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("3:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("4:BaseRecommender:mymatrix:another:set:of:keys")

      Predictor.redis_prefix(nil)
      expect(@matrix.redis_key).to eq("predictor:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor:BaseRecommender:mymatrix:another:set:of:keys")

      Predictor.redis_prefix('predictor-test')
      expect(@matrix.redis_key).to eq("predictor-test:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:mymatrix:another:set:of:keys")
    end

    it "should respect the class-level configuration" do
      i = 0
      BaseRecommender.redis_prefix { i += 1 }
      expect(@matrix.redis_key).to eq("predictor-test:1:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:2:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:3:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:4:mymatrix:another:set:of:keys")

      BaseRecommender.redis_prefix([nil])
      expect(@matrix.redis_key).to eq("predictor-test:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:mymatrix:another:set:of:keys")

      BaseRecommender.redis_prefix(['a', 'b'])
      expect(@matrix.redis_key).to eq("predictor-test:a:b:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:a:b:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:a:b:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:a:b:mymatrix:another:set:of:keys")

      BaseRecommender.redis_prefix(nil)
      expect(@matrix.redis_key).to eq("predictor-test:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:mymatrix:another:set:of:keys")
    end
  end

  describe "weight" do
    it "returns the weight configured or a default of 1" do
      expect(@matrix.weight).to eq(1.0) # default weight
      matrix = Predictor::InputMatrix.new(redis_prefix: "predictor-test", key: "mymatrix", weight: 5.0)
      expect(matrix.weight).to eq(5.0)
    end
  end

  describe "add_to_set" do
    it "adds each member of the set to the key's 'sets' set" do
      expect(@matrix.items_for("item1")).not_to include("foo", "bar", "fnord", "blubb")
      @matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
      expect(@matrix.items_for("item1")).to include("foo", "bar", "fnord", "blubb")
    end

    it "does not crash if the set of items is empty" do
      @matrix.add_to_set "item1"
      @matrix.add_to_set "item1", []
    end

    it "adds the key to each set member's 'items' set" do
      expect(@matrix.sets_for("foo")).not_to include("item1")
      expect(@matrix.sets_for("bar")).not_to include("item1")
      expect(@matrix.sets_for("fnord")).not_to include("item1")
      expect(@matrix.sets_for("blubb")).not_to include("item1")
      @matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
      expect(@matrix.sets_for("foo")).to include("item1")
      expect(@matrix.sets_for("bar")).to include("item1")
      expect(@matrix.sets_for("fnord")).to include("item1")
      expect(@matrix.sets_for("blubb")).to include("item1")
    end
  end

  describe "items_for" do
    it "returns the items in the given set ID" do
      @matrix.add_to_set "item1", ["foo", "bar", "fnord", "blubb"]
      expect(@matrix.items_for("item1")).to include("foo", "bar", "fnord", "blubb")
      @matrix.add_to_set "item2", ["foo", "bar", "snafu", "nada"]
      expect(@matrix.items_for("item2")).to include("foo", "bar", "snafu", "nada")
      expect(@matrix.items_for("item1")).not_to include("snafu", "nada")
    end
  end

  describe "sets_for" do
    it "returns the set IDs the given item is in" do
      @matrix.add_to_set "item1", ["foo", "bar", "fnord", "blubb"]
      @matrix.add_to_set "item2", ["foo", "bar", "snafu", "nada"]
      expect(@matrix.sets_for("foo")).to include("item1", "item2")
      expect(@matrix.sets_for("snafu")).to eq(["item2"])
    end
  end

  describe "related_items" do
    it "returns the items in sets the given item is also in" do
      @matrix.add_to_set "item1", ["foo", "bar", "fnord", "blubb"]
      @matrix.add_to_set "item2", ["foo", "bar", "snafu", "nada"]
      @matrix.add_to_set "item3", ["nada", "other"]
      expect(@matrix.related_items("bar")).to include("foo", "fnord", "blubb", "snafu", "nada")
      expect(@matrix.related_items("bar").length).to eq(5)
      expect(@matrix.related_items("other")).to eq(["nada"])
      expect(@matrix.related_items("snafu")).to include("foo", "bar", "nada")
      expect(@matrix.related_items("snafu").length).to eq(3)
    end
  end

  describe "delete_item" do
    before do
      @matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
      @matrix.add_to_set "item2", "foo", "bar", "snafu", "nada"
      @matrix.add_to_set "item3", "nada", "other"
    end

    it "should delete the item from sets it is in" do
      expect(@matrix.items_for("item1")).to include("bar")
      expect(@matrix.items_for("item2")).to include("bar")
      expect(@matrix.sets_for("bar")).to include("item1", "item2")
      @matrix.delete_item("bar")
      expect(@matrix.items_for("item1")).not_to include("bar")
      expect(@matrix.items_for("item2")).not_to include("bar")
      expect(@matrix.sets_for("bar")).to be_empty
    end
  end

  describe "#score" do
    let(:matrix) { Predictor::InputMatrix.new(options) }

    context "default" do
      it "scores as jaccard index by default" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        matrix.add_to_set "item2", "bar", "fnord", "shmoo", "snafu"
        matrix.add_to_set "item3", "bar", "nada", "snafu"
        expect(matrix.score("bar", "snafu")).to eq(2.0/3.0)
      end

      it "scores as jaccard index when given option" do
        matrix = Predictor::InputMatrix.new(options.merge(measure: :jaccard_index))
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        matrix.add_to_set "item2", "bar", "fnord", "shmoo", "snafu"
        matrix.add_to_set "item3", "bar", "nada", "snafu"
        expect(matrix.score("bar", "snafu")).to eq(2.0/3.0)
      end

      it "should handle missing sets" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        expect(matrix.score("is", "missing")).to eq(0.0)
      end
    end

    context "sorensen_coefficient" do
      before { @options[:measure] = :sorensen_coefficient }

      it "should calculate the correct sorensen index" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        matrix.add_to_set "item2", "fnord", "shmoo", "snafu"
        matrix.add_to_set "item3", "bar", "nada", "snafu"
        expect(matrix.score("bar", "snafu")).to eq(2.0/4.0)
      end

      it "should handle missing sets" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        expect(matrix.score("is", "missing")).to eq(0.0)
      end
    end
  end

  private

  def add_two_item_test_data!(matrix)
    matrix.add_to_set("user42", "fnord", "blubb")
    matrix.add_to_set("user44", "blubb")
    matrix.add_to_set("user46", "fnord")
    matrix.add_to_set("user48", "fnord", "blubb")
    matrix.add_to_set("user50", "fnord")
  end

  def add_three_item_test_data!(matrix)
    matrix.add_to_set("user42", "fnord", "blubb", "shmoo")
    matrix.add_to_set("user44", "blubb")
    matrix.add_to_set("user46", "fnord", "shmoo")
    matrix.add_to_set("user48", "fnord", "blubb")
    matrix.add_to_set("user50", "fnord", "shmoo")
  end
end
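The `#score` expectations above reduce to two set-similarity measures over the sets each item belongs to. Here is a self-contained plain-Ruby sketch (no Redis) using the fixture from the Jaccard examples; `sets_containing` is an illustrative stand-in for the matrix's `sets_for`.

```ruby
require 'set'

# Fixture from the "scores as jaccard index" examples above.
SETS = {
  'item1' => %w[foo bar fnord blubb],
  'item2' => %w[bar fnord shmoo snafu],
  'item3' => %w[bar nada snafu],
}

# The set IDs whose members include the given item.
def sets_containing(item)
  SETS.select { |_, items| items.include?(item) }.keys.to_set
end

# Jaccard index: |A ∩ B| / |A ∪ B|.
def jaccard_index(a, b)
  union = a | b
  union.empty? ? 0.0 : (a & b).size.to_f / union.size
end

# Sørensen coefficient: 2|A ∩ B| / (|A| + |B|).
def sorensen_coefficient(a, b)
  total = a.size + b.size
  total.zero? ? 0.0 : 2.0 * (a & b).size / total
end

bar   = sets_containing('bar')    # {item1, item2, item3}
snafu = sets_containing('snafu')  # {item2, item3}
jaccard_index(bar, snafu)         # => 2/3, as the spec expects
```

Note that the `sorensen_coefficient` spec uses a slightly different fixture (its `item2` omits "bar"), which is why it expects `2.0/4.0` rather than the value this fixture would give.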
================================================
FILE: spec/predictor_spec.rb
================================================
require 'spec_helper'

describe Predictor do
  it "should store a redis connection" do
    Predictor.redis = "asd"
    expect(Predictor.redis).to eq("asd")
  end

  it "should raise an exception if unconfigured redis connection is accessed" do
    Predictor.redis = nil
    expect{ Predictor.redis }.to raise_error(/not configured/i)
  end
end
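The two expectations above exercise a small module-level accessor pattern: a writer stores the connection, and the reader raises until one has been set. A minimal sketch of that pattern (`MiniPredictor` is a stand-in for illustration, not the gem's actual implementation):

```ruby
# A module-level accessor: the writer stores any client object, the
# reader raises a descriptive error while unconfigured.
module MiniPredictor
  @redis = nil

  def self.redis=(conn)
    @redis = conn
  end

  def self.redis
    raise "MiniPredictor.redis is not configured" unless @redis
    @redis
  end
end

MiniPredictor.redis = :fake_client
MiniPredictor.redis # => :fake_client
```

Because the reader accepts any object, specs can assign a plain string (as `"asd"` above) without touching a real Redis server.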
================================================
FILE: spec/spec_helper.rb
================================================
require "predictor"
require "pry"

def flush_redis!
  Predictor.redis = Redis.new
  Predictor.redis.keys("predictor-test*").each do |k|
    Predictor.redis.del(k)
  end
end

Predictor.redis_prefix "predictor-test"

class BaseRecommender
  include Predictor::Base
end

class UserRecommender
  include Predictor::Base
end

class TestRecommender
  include Predictor::Base

  input_matrix :jaccard_one
end

class PrefixRecommender
  include Predictor::Base

  def initialize(prefix)
    @prefix = prefix
  end

  def prefix=(new_prefix)
    @prefix = new_prefix
  end

  def get_redis_prefix
    @prefix
  end
end

class Predictor::TestInputMatrix
  def initialize(opts)
    @opts = opts
  end

  def method_missing(method, *args)
    @opts[method]
  end
end
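The `Predictor.redis_prefix "predictor-test"` call here, together with the `redis_key` specs in input_matrix_spec.rb, shows how keys are namespaced: a prefix that may be a plain string, a proc evaluated per key, an array of segments, or nil, joined to the remaining parts with ":". A rough plain-Ruby sketch of that composition rule (`compose_key` is a hypothetical helper, not Predictor's API):

```ruby
# Join a prefix and key parts with ":". The prefix may be a string,
# an array of segments, nil (dropped), or a callable evaluated now.
def compose_key(prefix, *parts)
  prefix = prefix.call if prefix.respond_to?(:call)
  [prefix, *parts].flatten.compact.map(&:to_s).join(':')
end

compose_key('predictor-test', 'BaseRecommender', 'mymatrix', :another)
# => "predictor-test:BaseRecommender:mymatrix:another"
compose_key(nil, 'mymatrix')     # => "mymatrix"
compose_key(%w[a b], 'mymatrix') # => "a:b:mymatrix"
```

A proc prefix explains the `1:...`, `2:...` expectations in the specs: the block is re-evaluated each time a key is built, so a counter increments per call.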