Repository: Pathgather/predictor Branch: master Commit: be866b424119 Files: 20 Total size: 88.7 KB Directory structure: gitextract_u972f2ab/ ├── .github/ │ └── workflows/ │ └── test.yml ├── .gitignore ├── Changelog.md ├── Gemfile ├── LICENSE ├── README.md ├── Rakefile ├── benchmark/ │ └── process.rb ├── docs/ │ └── READMEv1.md ├── lib/ │ ├── predictor/ │ │ ├── base.rb │ │ ├── distance.rb │ │ ├── input_matrix.rb │ │ ├── predictor.rb │ │ └── version.rb │ └── predictor.rb ├── predictor.gemspec └── spec/ ├── base_spec.rb ├── input_matrix_spec.rb ├── predictor_spec.rb └── spec_helper.rb ================================================ FILE CONTENTS ================================================ ================================================ FILE: .github/workflows/test.yml ================================================ name: Test on: [push, pull_request] jobs: test: runs-on: ${{ matrix.os }} strategy: fail-fast: false matrix: os: [ubuntu-18.04, ubuntu-20.04] ruby: [2.6, 2.7, 3.0] services: redis: image: redis options: >- --health-cmd "redis-cli ping" --health-interval 10s --health-timeout 5s --health-retries 5 ports: - 6379:6379 steps: - uses: actions/checkout@v2 - name: Set up Ruby ${{ matrix.ruby }} uses: ruby/setup-ruby@v1 with: bundler-cache: true ruby-version: ${{ matrix.ruby }} - name: Install dependencies run: bundle install - name: Run tests run: bundle exec rake ================================================ FILE: .gitignore ================================================ bin/ *.gem Gemfile.lock ext/Makefile ================================================ FILE: Changelog.md ================================================ # Predictor Changelog All notable changes to this project will be documented in this file. 
## [Unreleased]
### Changed
- Support rake version 11.0 or higher and rspec version 3.4.0 or higher
- Fix the title of the README
- Run the test suite with GitHub Actions
- Made it possible to run tests on ubuntu-18.04 and ubuntu-20.04
- Fix the homepage entry in predictor.gemspec

### **BREAKING CHANGES**
- Ruby 2.1 through 2.5 are no longer supported, as they have reached end of life

## [2.3.0] - 2014-09-06
- The logic for processing item similarities was ported to a Lua script. Use `Predictor.processing_technique(:lua)` to use the Lua script for all similarity calculations, or use `MyRecommender.processing_technique(:lua)` to use it for specific recommenders. It is substantially faster than the default (old) Ruby mechanism, but has the disadvantage of blocking the Redis server while it runs.
- An alternate method of calculating item similarities was added, which uses a ZUNIONSTORE across item sets. The results are similar to those achieved by using the Ruby or Lua scripts, but faster. Use `Predictor.processing_technique(:union)` to use the ZUNIONSTORE technique for all similarity calculations, or use `MyRecommender.processing_technique(:union)` to use it for specific recommenders.

## [2.2.0] - 2014-06-24
- The namespace used for keys in Redis is now configurable on a global or per-class basis. See the readme for more information. If you were overriding the redis_prefix instance method before, it is recommended that you use the new redis_prefix class method instead.
- Data stored in Redis is now namespaced by the class name of the recommender it is stored by. This change ensures that different recommenders with input matrices of the same name don't overwrite each others' data. After upgrading you'll need to either reindex your data in Redis or configure Predictor to use the naming system you were using before.
  If you were using the defaults before and you're not worried about matrix name collisions, you can mimic the old behavior with:

  ```ruby
  class MyRecommender
    include Predictor::Base
    redis_prefix [nil]
  end
  ```
- The #predictions_for method on recommenders now accepts a :boost option to give more weight to items with particular attributes. See the readme for more information.

## [2.1.0] - 2014-06-19
- The similarity limit now defaults to 128, instead of being unlimited. This is intended to save space in Redis. See the Readme for more information. It is strongly recommended that you run `ensure_similarity_limit_is_obeyed!` to shrink existing similarity sets.

## [2.0.0] - 2014-04-17
**Rewrite of 1.0.0 and contains several breaking changes!**

Version 1.0.0 (which really should have been 0.0.1) contained several issues that made compatibility with v2 not worth the trouble. This includes:

- In v1, similarities were cached per input_matrix, and Predictor::Base utilized those caches when determining similarities and predictions. This quickly ate up Redis memory with even a semi-large dataset, as each input_matrix had a significant memory requirement. v2 caches similarities at the root (Recommender::Base), which means you can add any number of input matrices with little impact on memory usage.
- Added the ability to limit the number of items stored in the similarity cache (via the 'limit_similarities_to' option). Now that similarities are cached at the root, this is possible and can greatly help memory usage.
- Removed bang methods from input_matrix (add_set!, add_single!, etc). These called process! for you previously, but since the cache is no longer kept at the input_matrix level, process! has to be called at the root (Recommender::Base).
- Bug fix: Fixed bug where a call to delete_item! on the input matrix didn't update the similarity cache.
- Other minor fixes.
================================================ FILE: Gemfile ================================================ source 'https://rubygems.org' gemspec ================================================ FILE: LICENSE ================================================ The MIT License (MIT) Copyright (c) 2014 Pathgather Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================ # Predictor Fast and efficient recommendations and predictions using Ruby & Redis. Developed by and used at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations to users. ![Test](https://github.com/nyagato-00/predictor/workflows/Test/badge.svg?branch=master) Originally forked and based on [Recommendify](https://github.com/paulasmuth/recommendify) by Paul Asmuth, so a huge thanks to him for his contributions to Recommendify. 
Predictor has been almost completely rewritten to:

* Be much, much more performant and efficient by using Redis for most logic.
* Provide item similarities such as "Users that read this book also read ..."
* Provide personalized predictions based on a user's past history, such as "You read these 10 books, so you might also like to read ..."

At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) or the [Sørensen-Dice coefficient](http://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) (the default is Jaccard) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)

Notice
---------------------
This is the readme for Predictor 2.0, which contains a few breaking changes from 1.0. The 1.0 readme can be found [here](https://github.com/Pathgather/predictor/blob/master/docs/READMEv1.md). See below for how to upgrade to 2.0.

Installation
---------------------
In your Gemfile:

```ruby
gem 'predictor'
```

Getting Started
---------------------
The first step is to configure Predictor with your Redis instance.

```ruby
# in config/initializers/predictor.rb
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])

# Or, to improve performance, use hiredis as your driver (you'll need to install the hiredis gem first)
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
```

Inputting Data
---------------------
Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.

Below, we're building a recommender to recommend courses based off of:

* Users that have taken a course.
  If 2 courses were taken by the same user, that's 3 times as important to us as the courses sharing the same topic. This will lead to sets like:
  * "user1" -> "course-1", "course-3"
  * "user2" -> "course-1", "course-4"
* Tags and their courses. This will lead to sets like:
  * "rails" -> "course-1", "course-2"
  * "microeconomics" -> "course-3", "course-4"
* Topics and their courses. This will lead to sets like:
  * "computer science" -> "course-1", "course-2"
  * "economics and finance" -> "course-3", "course-4"

```ruby
class CourseRecommender
  include Predictor::Base

  input_matrix :users, weight: 3.0
  input_matrix :tags, weight: 2.0
  input_matrix :topics, weight: 1.0, measure: :sorensen_coefficient # Use Sørensen-Dice instead of Jaccard
end
```

Now, we just need to update our matrices when courses are created, users take a course, topics are changed, etc:

```ruby
recommender = CourseRecommender.new

# Add a single course to topic-1's items. If topic-1 already exists as a set ID, this just adds course-1 to the set.
recommender.add_to_matrix!(:topics, "topic-1", "course-1")

# If your dataset is even remotely large, add_to_matrix! could take some time, as it must calculate the similarity scores
# for course-1 and other courses that share a set with course-1. If this is the case, use add_to_matrix and
# process the items at a more convenient time, perhaps in a background job.
recommender.topics.add_to_set("topic-1", "course-1", "course-2") # Same as recommender.add_to_matrix(:topics, "topic-1", "course-1", "course-2")
recommender.process_items!("course-1", "course-2")
```

As noted above, it's important to remember that if you don't use the bang method 'add_to_matrix!', you'll need to manually update your similarities. If your dataset is even remotely large, you'll probably want to do this:

* If you want to update the similarities for certain item(s):
````
recommender.process_items!(item1, item2, etc)
````
* If you want to update all similarities for all items:
````
recommender.process!
```` Retrieving Similarities and Recommendations --------------------- Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations! First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course. ```ruby recommender = CourseRecommender.new # Return all similarities for course-1 (ordered by most similar to least). recommender.similarities_for("course-1") # Need to paginate? Not a problem! Specify an offset and a limit recommender.similarities_for("course-1", offset: 10, limit: 10) # Gets similarities 11-20 # Want scores? recommender.similarities_for("course-1", with_scores: true) # Want to ignore a certain set of courses in similarities? recommender.similarities_for("course-1", exclusion_set: ["course-2"]) ``` The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem! ```ruby recommender = CourseRecommender.new # User has taken course-1 and course-2. Let's see what else they might like... recommender.predictions_for(item_set: ["course-1", "course-2"]) # Already have the set you need stored in an input matrix? In our case, we do (the users matrix stores the courses a user has taken), so we can just do: recommender.predictions_for("user-1", matrix_label: :users) # Paginate too! recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10) # Gimme some scores and ignore course-2....that course-2 is one sketchy fella recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["course-2"]) ``` Deleting Items --------------------- If your data is deleted from your persistent storage, you certainly don't want to recommend it to a user. To ensure that doesn't happen, simply call delete_from_matrix! 
with the individual matrix, or delete_item! if the item is completely gone:

```ruby
recommender = CourseRecommender.new

# User removed course-1 from topic-1, but course-1 still exists
recommender.delete_pair_from_matrix!(:topics, "topic-1", "course-1")

# User removed course-1 from all topics
recommender.delete_from_matrix!(:topics, "course-1")

# course-1 was permanently deleted
recommender.delete_item!("course-1")

# Something crazy has happened, so let's just start fresh and wipe out all previously stored similarities:
recommender.clean!
```

Limiting Similarities
---------------------
By default, Predictor caches 128 similarities for each item, since 128 is the maximum size at which the similarity sorted sets can be kept in a [memory-efficient format](http://redis.io/topics/memory-optimization). If you want to keep more similarities than that, and you don't mind using more memory, you can increase the similarity limit, like so:

```ruby
class CourseRecommender
  include Predictor::Base

  limit_similarities_to 500

  input_matrix :users, weight: 3.0
  input_matrix :tags, weight: 2.0
  input_matrix :topics, weight: 1.0
end
```

The memory penalty can be heavy, though. In our testing, similarity caches for 1,000 objects varied in size like so:

```
limit_similarities_to(128) # 8.5 MB (this is the default)
limit_similarities_to(129) # 22.74 MB
limit_similarities_to(500) # 76.72 MB
```

If you decide you need to store more than 128 similarities, you may want to see the Redis documentation linked above and consider increasing `zset-max-ziplist-entries` in your configuration.

Predictions fetched with the predictions_for call utilize the similarity caches, so if you're using predictions_for, make sure you set the limit high enough that intelligent predictions can be generated. If you aren't using predictions and are just using similarities, then feel free to set this to the maximum number of similarities you'd possibly want to show!
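The `zset-max-ziplist-entries` setting mentioned above lives in your Redis server configuration. A rough sketch (the values below are illustrative; match them to your chosen limit, and note that newer Redis releases rename these settings to `zset-max-listpack-entries` / `zset-max-listpack-value`):

```
# redis.conf: keep sorted sets of up to 500 entries in the compact encoding
# (the stock default for entries is 128)
zset-max-ziplist-entries 500
zset-max-ziplist-value 64
```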
You can also use `limit_similarities_to(nil)` to remove the limit entirely. This means if you have 10,000 items, and each item is somehow related to the other, you'll have 10,000 sets each with 9,999 items, which will run up your Redis bill quite quickly. Removing the limit is not recommended unless you're sure you know what you're doing. If at some point you decide to lower your similarity limits, you'll want to be sure to shrink the size of the sorted sets already in Redis. You can do this with `CourseRecommender.new.ensure_similarity_limit_is_obeyed!`. Boost --------------------- What if you want to recommend courses to users based not only on what courses they've taken, but on other attributes of courses that they may be interested in? You can do that by passing the :boost argument to predictions_for: ```ruby class CourseRecommender include Predictor::Base # Courses are compared to one another by the users taking them and their tags. input_matrix :users, weight: 3.0 input_matrix :tags, weight: 2.0 input_matrix :topics, weight: 2.0 end recommender = CourseRecommender.new # We want to find recommendations for Billy, who's told us that he's # especially interested in free, interactive courses on Photoshop. So, we give # a boost to courses that are tagged as free and interactive and have # Photoshop as a topic: recommender.predictions_for("Billy", matrix_label: :users, boost: {tags: ['free', 'interactive'], topics: ["Photoshop"]}) # We can also modify how much these tags and topics matter by specifying a # weight. The default is 1.0, but if that's too much we can just tweak it: recommender.predictions_for("Billy", matrix_label: :users, boost: {tags: {values: ['free', 'interactive'], weight: 0.4}, topics: {values: ["Photoshop"], weight: 0.3}}) ``` Key Prefixes --------------------- As of 2.2.0, there is much more control available over the format of the keys Predictor will use in Redis. 
By default, the CourseRecommender given as an example above will use keys like "predictor:CourseRecommender:users:items:user1". You can configure the global namespace like so:

```ruby
Predictor.redis_prefix 'my_namespace' # => "my_namespace:CourseRecommender:users:items:user1"

# Or, for a multitenanted setup:
Predictor.redis_prefix { "user-#{User.current.id}" } # => "user-7:CourseRecommender:users:items:user1"
```

You can also configure the namespace used by each class you create:

```ruby
class CourseRecommender
  include Predictor::Base

  redis_prefix "courses" # => "predictor:courses:users:items:user1"

  redis_prefix { "courses_for_user-#{User.current.id}" } # => "predictor:courses_for_user-7:users:items:user1"
end
```

You can also configure the namespace used by each instance you create, in addition to the class and global namespaces:

```ruby
class CourseRecommender
  include Predictor::Base

  def initialize(prefix)
    @prefix = prefix
  end

  # Simply override this instance method with the prefix you want
  def get_redis_prefix
    @prefix
  end
end

recommender = CourseRecommender.new("super")
recommender.redis_prefix # "predictor:CourseRecommender:super"
```

Processing Items
---------------------
As of 2.3.0, there are multiple techniques available for processing item similarities. You can choose between them by setting a global default like `Predictor.processing_technique(:lua)`, or by setting a technique for certain classes like `CourseRecommender.processing_technique(:union)`. There are three options:

- :ruby - This is the default, and is how Predictor calculated similarities before 2.3.0. With this technique the Jaccard and Sorensen calculations are performed in Ruby, with frequent calls to Redis to retrieve simple values. It is somewhat slow.
- :lua - This option performs the Jaccard and Sorensen calculations in a Lua script on the Redis server. It is substantially faster than the :ruby technique, but blocks the Redis server while each set of calculations is run.
The period of blocking will vary based on the size and disposition of your data, but each call may take up to several hundred milliseconds. If your application requires your Redis server to always return results quickly, and you're not able to simply run calculations during off-hours, you should use a different strategy. - :union - This option skips Jaccard and Sorensen entirely, and uses a simpler technique involving a ZUNIONSTORE across many item sets to calculate similarities. The results are different from, but similar to the results of using the Jaccard and Sorensen algorithms. It is even faster than the :lua option and does not have the same problem of blocking Redis for long periods of time, but before using it you should sample the output to ensure that it is good enough for your application. Predictor now contains a benchmarking script that you can use to compare the speed of these options. An example output from the processing of a relatively small dataset is: ``` ruby = 21.098 seconds lua = 2.106 seconds union = 0.741 seconds ``` Upgrading from 1.0 to 2.0 --------------------- As mentioned, 2.0.0 is quite a bit different than 1.0.0, so simply upgrading with no changes likely won't work. My apologies for this. I promise this won't happen in future releases, as I'm much more confident in this Predictor release than the last. Anywho, upgrading really shouldn't be that much of a pain if you follow these steps: * Change predictor.matrix.add_set! and predictor.matrix.add_single! calls to predictor.add_to_matrix!. For example: ```ruby # Change predictor.topics.add_single!("topic-1", "course-1") # to predictor.add_to_matrix!(:topics, "topic-1", "course-1") # Change predictor.tags.add_set!("tag-1", ["course-1", "course-2"]) # to predictor.add_to_matrix!(:tags, "tag-1", "course-1", "course-2") ``` * Change predictor.matrix.process! or predictor.matrix.process_item! calls to just predictor.process! or predictor.process_items! 
```ruby
# Change
predictor.topics.process_item!("course-1")
# to
predictor.process_items!("course-1")
```

* Change predictor.matrix.delete_item! calls to predictor.delete_from_matrix!. This will update similarities too, so you may want to queue this to run in a background job.

```ruby
# Change
predictor.topics.delete_item!("course-1")
# to delete_from_matrix! if you want to update similarities to account for the deleted item (in v1, this was a bug and didn't occur)
predictor.delete_from_matrix!(:topics, "course-1")
```

* Regenerate your recommendations, as the Redis keys have changed for Predictor 2. You can use recommender.clean! to clear out old similarities, then run your rake task (or whatever you've set up) to create new similarities.

About Pathgather
---------------------
Pathgather is an NYC-based startup building a platform that dramatically accelerates learning for enterprises by bringing employees, training content, and existing enterprise systems into one engaging platform. Every Friday, we work on open-source software (our own or other projects). Want to join our always-growing team? Peruse our [current opportunities](http://www.pathgather.com/jobs/) or reach out to us at !

Problems? Issues? Want to help out?
---------------------
Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!
The MIT License (MIT) --------------------- Copyright (c) 2014 Pathgather Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: Rakefile ================================================ require 'bundler/gem_tasks' require 'rspec/core/rake_task' RSpec::Core::RakeTask.new(:spec) task :default => :spec Dir["./benchmark/*.rb"].sort.each &method(:require) ================================================ FILE: benchmark/process.rb ================================================ namespace :benchmark do task :process do require 'predictor' require 'pry' require 'logger' Predictor.redis = Redis.new #logger: Logger.new(STDOUT) Predictor.redis_prefix "predictor-benchmark" def flush! keys = Predictor.redis.keys("predictor-benchmark*") Predictor.redis.del(keys) if keys.any? end class ItemRecommender include Predictor::Base input_matrix :users, weight: 2.0 input_matrix :parts, weight: 1.0 end flush! 
items = (1..200).map { |i| "item-#{i}" } users = (1..100).map { |i| "user-#{i}" } parts = (1..100).map { |i| "part-#{i}" } r = ItemRecommender.new start = Time.now users.each { |user| r.users.add_to_set user, *items.sample(40) } parts.each { |part| r.parts.add_to_set part, *items.sample(40) } elapsed = Time.now - start puts "add_to_set = #{elapsed.round(3)} seconds" [:ruby, :lua, :union].each do |technique| start = Time.now Predictor.processing_technique technique r.process! elapsed = Time.now - start puts "#{technique} = #{elapsed.round(3)} seconds" end flush! end end ================================================ FILE: docs/READMEv1.md ================================================ ======= Predictor ========= Fast and efficient recommendations and predictions using Ruby & Redis. Used in production over at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations to users. ![](https://www.codeship.io/projects/5aeeedf0-6053-0131-2319-5ede98f174ff/status) Originally forked and based on [Recommendify](https://github.com/paulasmuth/recommendify) by Paul Asmuth, so a huge thanks to him for his contributions to Recommendify. Predictor has been almost completely rewritten to * Be much, much more performant and efficient by using Redis for most logic. * Provide item similarities such as "Users that read this book also read ..." * Provide personalized predictions based on a user's past history, such as "You read these 10 books, so you might also like to read ..." At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) to determine similarities between items. 
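For intuition, the Jaccard index of two item sets can be sketched in plain Ruby. This is only an illustration of the formula (intersection size over union size); it is not Predictor's implementation, which performs this work in Redis:

```ruby
require 'set'

# Jaccard index of two collections: size of intersection over size of union.
def jaccard_index(a, b)
  a = a.to_set
  b = b.to_set
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Two courses taken by overlapping user sets: 2 shared users out of 4 distinct.
jaccard_index(%w[user1 user2 user3], %w[user2 user3 user4]) # => 0.5
```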
There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)

Installation
---------------------
```
gem install predictor
```
or in your Gemfile:
```ruby
gem 'predictor'
```

Getting Started
---------------------
The first step is to configure Predictor with your Redis instance.

```ruby
# in config/initializers/predictor.rb
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])

# Or, to improve performance, use hiredis as your driver (you'll need to install the hiredis gem first)
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
```

Inputting Data
---------------------
Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.

Below, we're building a recommender to recommend courses based off of:

* Users that have taken a course. If 2 courses were taken by the same user, that's 3 times as important to us as the courses sharing the same topic. This will lead to sets like:
  * "user1" -> "course-1", "course-3"
  * "user2" -> "course-1", "course-4"
* Tags and their courses. This will lead to sets like:
  * "rails" -> "course-1", "course-2"
  * "microeconomics" -> "course-3", "course-4"
* Topics and their courses. This will lead to sets like:
  * "computer science" -> "course-1", "course-2"
  * "economics and finance" -> "course-3", "course-4"

```ruby
class CourseRecommender
  include Predictor::Base

  input_matrix :users, weight: 3.0
  input_matrix :tags, weight: 2.0
  input_matrix :topics, weight: 1.0
end
```

Now, we just need to update our matrices when courses are created, users take a course, topics are changed, etc:

```ruby
recommender = CourseRecommender.new

# Add a single course to topic-1's items.
# If topic-1 already exists as a set ID, this just adds course-1 to the set.
recommender.topics.add_single!("topic-1", "course-1")

# If your matrix is quite large, add_single! could take some time, as it must calculate the similarity scores
# for course-1 across all other courses. If this is the case, use add_single and process the item at a more
# convenient time, perhaps in a background job.
recommender.topics.add_single("topic-1", "course-1")
recommender.topics.process_item!("course-1")

# Add an array of courses to tag-1. Again, these will simply be added to tag-1's existing set, if it exists.
# If not, the tag-1 set will be initialized with course-1 and course-2.
recommender.tags.add_set!("tag-1", ["course-1", "course-2"])

# Or, just add the set and process whenever you like
recommender.tags.add_set("tag-1", ["course-1", "course-2"])
["course-1", "course-2"].each { |course| recommender.topics.process_item!(course) }
```

As noted above, it's important to remember that if you don't use the bang methods (add_set! and add_single!), you'll need to manually update your similarities (the bang methods will likely suffice for most use cases, though). You can do so in a variety of ways:

* If you want to update the similarities for a single item in a specific matrix:
````
recommender.matrix.process_item!(item)
````
* If you want to update the similarities for all items in a specific matrix:
````
recommender.matrix.process!
````
* If you want to update the similarities for a single item in all matrices:
````
recommender.process_item!(item)
````
* If you want to update all similarities in all matrices:
````
recommender.process!
````

Retrieving Similarities and Recommendations
---------------------
Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations!
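Before diving in, it may help to see how the matrix weights shape the result: conceptually, an item's overall similarity score combines each per-matrix similarity scaled by that matrix's weight. The sketch below is purely illustrative (the score values are made up, and Predictor computes this inside Redis, not in Ruby like this):

```ruby
# Hypothetical per-matrix similarities between course-1 and course-2:
scores  = { users: 0.5, tags: 0.25, topics: 0.1 }
# The weights declared on CourseRecommender's input matrices:
weights = { users: 3.0, tags: 2.0, topics: 1.0 }

# Weighted combination used to rank course-2 among course-1's similar items:
combined = scores.sum { |matrix, score| score * weights[matrix] }
# => 2.1 (0.5 * 3.0 + 0.25 * 2.0 + 0.1 * 1.0)
```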
First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course. ![Course Alternative](http://pathgather.github.io/predictor/images/course-alts.png) ```ruby recommender = CourseRecommender.new # Return all similarities for course-1 (ordered by most similar to least). recommender.similarities_for("course-1") # Need to paginate? Not a problem! Specify an offset and a limit recommender.similarities_for("course-1", offset: 10, limit: 10) # Gets similarities 11-20 # Want scores? recommender.similarities_for("course-1", with_scores: true) # Want to ignore a certain set of courses in similarities? recommender.similarities_for("course-1", exclusion_set: ["course-2"]) ``` The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem! ![Course Recommendations](http://pathgather.github.io/predictor/images/suggested.png) ```ruby recommender = CourseRecommender.new # User has taken course-1 and course-2. Let's see what else they might like... recommender.predictions_for(item_set: ["course-1", "course-2"]) # Already have the set you need stored in an input matrix? In our case, we do (the users matrix stores the courses a user has taken), so we can just do: recommender.predictions_for("user-1", matrix_label: :users) # Paginate too! recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10) # Gimme some scores and ignore user-2....that user-2 is one sketchy fella recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["user-2"]) ``` Deleting Items --------------------- If your data is deleted from your persistent storage, you certainly don't want to recommend that data to a user. To ensure that doesn't happen, simply call delete_item! 
on the individual matrix or on the recommender as a whole:

```ruby
recommender = CourseRecommender.new

# User removed course-1 from topic-1, but course-1 still exists
recommender.topics.delete_item!("course-1")

# course-1 was permanently deleted
recommender.delete_item!("course-1")

# Something crazy has happened, so let's just start fresh and wipe out all previously stored similarities:
recommender.clean!
```

Memory Management
---------------------
Predictor works by caching the similarities for each item in each matrix, then computing overall similarities off those caches. With even a semi-large dataset, this can really eat up Redis's memory. To limit the number of similarities cached in each matrix, specify a similarity_limit option when defining the matrix.

```ruby
class CourseRecommender
  include Predictor::Base

  input_matrix :users, weight: 3.0, similarity_limit: 300
  input_matrix :tags, weight: 2.0, similarity_limit: 300
  input_matrix :topics, weight: 1.0, similarity_limit: 300
end
```

This will ensure that only the top 300 similarities for each item are cached in each matrix. This can greatly reduce your memory usage, and if you're just using Predictor for scenarios where you maybe show the top 5 or so similar items, it can be hugely helpful. But note, **don't set similarity_limit to 5 in that case**. This setting limits the similarities cached in each matrix, but does not limit the similarities for an item across all matrices. That is computed (and can be limited) on the fly, and uses the similarity cache in each matrix. So, you need a large enough cache in each matrix to determine an intelligent similarity list across all matrices.

*Note*: This is a bit of a hack, and there are most certainly other ways to improve Predictor's memory usage for large datasets, but each appears to require a more significant change than the trivial implementation of similarity_limit above.
PRs are quite welcome that experiment with these other ways :)

Oh, and if you decide to tinker with your limit to try and find a sweet spot, I added a helpful method that ensures limits are obeyed, so you can avoid regenerating all similarities. Of course, this only helps if you are decreasing the limit. If you're increasing it, you'll need to process all similarities over again.

```ruby
recommender.users.ensure_similarity_limit_is_obeyed!  # Remove similarities that disobey our current limit
recommender.tags.ensure_similarity_limit_is_obeyed!
recommender.topics.ensure_similarity_limit_is_obeyed!
```

Problems? Issues? Want to help out?
---------------------
Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!

The MIT License (MIT)
---------------------
Copyright (c) 2014 Pathgather

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
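One note before the library source that follows: the similarity scores Predictor caches come from simple set-overlap measures (the Jaccard index and the Sørensen coefficient), which `lib/predictor/distance.rb` computes inside Redis. As a reference, here are the same formulas in plain Ruby, operating on Ruby `Set`s instead of Redis sets (an illustrative sketch, not part of the library's API):

```ruby
require 'set'

# Jaccard index: |A ∩ B| / |A ∪ B|
def jaccard_index(a, b)
  union_size = (a | b).size
  union_size.zero? ? 0.0 : (a & b).size.to_f / union_size
end

# Sørensen coefficient: 2|A ∩ B| / (|A| + |B|)
def sorensen_coefficient(a, b)
  denom = a.size + b.size
  denom.zero? ? 0.0 : 2.0 * (a & b).size / denom
end

# Two courses, each represented by the set of users who took it.
course_1 = Set["user-1", "user-2", "user-3"]
course_2 = Set["user-2", "user-3", "user-4"]

jaccard_index(course_1, course_2)        # => 0.5    (2 shared / 4 total)
sorensen_coefficient(course_1, course_2) # => ~0.667 (2 * 2 / 6)
```

Both measures return 0.0 for disjoint or empty sets, matching the guard clauses in `Predictor::Distance`.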
================================================ FILE: lib/predictor/base.rb ================================================ module Predictor::Base def self.included(base) base.extend(ClassMethods) end module ClassMethods def input_matrix(key, opts={}) @matrices ||= {} @matrices[key] = opts end def limit_similarities_to(val) @similarity_limit_set = true @similarity_limit = val end def similarity_limit @similarity_limit_set ? @similarity_limit : 128 end def reset_similarity_limit! @similarity_limit_set = nil @similarity_limit = nil end def input_matrices=(val) @matrices = val end def input_matrices @matrices end def redis_prefix(prefix = nil, &block) @redis_prefix = block_given? ? block : prefix end def get_redis_prefix if @redis_prefix if @redis_prefix.respond_to?(:call) @redis_prefix.call else @redis_prefix end else to_s end end def processing_technique(technique) @technique = technique end def get_processing_technique @technique || Predictor.get_processing_technique end end def input_matrices @input_matrices ||= Hash[self.class.input_matrices.map{ |key, opts| opts.merge!(:key => key, :base => self) [ key, Predictor::InputMatrix.new(opts) ] }] end def get_redis_prefix nil # Override in subclass. end def redis_prefix [Predictor.get_redis_prefix, self.class.get_redis_prefix, self.get_redis_prefix].compact end def similarity_limit self.class.similarity_limit end def redis_key(*append) ([redis_prefix] + append).flatten.compact.join(":") end def method_missing(method, *args) if input_matrices.has_key?(method) input_matrices[method] else raise NoMethodError.new(method.to_s) end end def respond_to?(method, include_all = false) input_matrices.has_key?(method) ? 
true : super end def all_items Predictor.redis.smembers(redis_key(:all_items)) end def add_to_matrix(matrix, set, *items) items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax input_matrices[matrix].add_to_set(set, *items) end def add_to_matrix!(matrix, set, *items) items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax add_to_matrix(matrix, set, *items) process_items!(*items) end def related_items(item) keys = [] input_matrices.each do |key, matrix| sets = Predictor.redis.smembers(matrix.redis_key(:sets, item)) keys.concat(sets.map { |set| matrix.redis_key(:items, set) }) end keys.empty? ? [] : (Predictor.redis.sunion(keys) - [item.to_s]) end def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, on: nil, offset: 0, limit: -1, exclusion_set: [], boost: {}) fail "item_set or matrix_label and set is required" unless item_set || (matrix_label && set) on = Array(on) if matrix_label matrix = input_matrices[matrix_label] item_set = Predictor.redis.smembers(matrix.redis_key(:items, set)) end item_keys = [] weights = [] item_set.each do |item| item_keys << redis_key(:similarities, item) weights << 1.0 end boost.each do |matrix_label, values| m = input_matrices[matrix_label] # Passing plain sets to zunionstore is undocumented, but tested and supported: # https://github.com/antirez/redis/blob/2.8.11/tests/unit/type/zset.tcl#L481-L489 case values when Hash values[:values].each do |value| item_keys << m.redis_key(:items, value) weights << values[:weight] end when Array values.each do |value| item_keys << m.redis_key(:items, value) weights << 1.0 end else raise "Bad value for boost: #{boost.inspect}" end end return [] if item_keys.empty? predictions = nil Predictor.redis.multi do |multi| multi.zunionstore 'temp', item_keys, weights: weights multi.zrem 'temp', item_set if item_set.any? multi.zrem 'temp', exclusion_set if exclusion_set.length > 0 if on.any? 
multi.zadd 'temp2', on.map{ |val| [0.0, val] } multi.zinterstore 'temp', ['temp', 'temp2'] multi.del 'temp2' end predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores multi.del 'temp' end predictions.value end def similarities_for(item, with_scores: false, offset: 0, limit: -1, exclusion_set: []) neighbors = nil Predictor.redis.multi do |multi| multi.zunionstore 'temp', [1, redis_key(:similarities, item)] multi.zrem 'temp', exclusion_set if exclusion_set.length > 0 neighbors = multi.zrevrange('temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores) multi.del 'temp' end return neighbors.value end def sets_for(item) keys = input_matrices.map{ |k,m| m.redis_key(:sets, item) } Predictor.redis.sunion keys end def process_item!(item) process_items!(item) # Old method end def process_items!(*items) items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax case self.class.get_processing_technique when :lua matrix_data = {} input_matrices.each do |name, matrix| matrix_data[name] = {weight: matrix.weight, measure: matrix.measure_name} end matrix_json = JSON.dump(matrix_data) items.each do |item| Predictor.process_lua_script(redis_key, matrix_json, similarity_limit, item) end when :union items.each do |item| keys = [] weights = [] input_matrices.each do |key, matrix| k = matrix.redis_key(:sets, item) item_keys = Predictor.redis.smembers(k).map { |set| matrix.redis_key(:items, set) } counts = Predictor.redis.multi do |multi| item_keys.each { |key| Predictor.redis.scard(key) } end item_keys.zip(counts).each do |key, count| unless count.zero? keys << key weights << matrix.weight / count end end end Predictor.redis.multi do |multi| key = redis_key(:similarities, item) multi.del(key) if keys.any? 
multi.zunionstore(key, keys, weights: weights) multi.zrem(key, item) multi.zremrangebyrank(key, 0, -(similarity_limit + 1)) multi.zunionstore key, [key] # Rewrite zset for optimized storage. end end end else # Default to old behavior, processing things in Ruby. items.each do |item| related_items(item).each { |related_item| cache_similarity(item, related_item) } end end return self end def process! process_items!(*all_items) return self end def delete_from_matrix!(matrix, item) # Deleting from a specific matrix, so get related_items, delete, then update the similarity of those related_items items = related_items(item) input_matrices[matrix].delete_item(item) items.each { |related_item| cache_similarity(item, related_item) } return self end def delete_pair_from_matrix!(matrix, set, item) items = related_items(item) input_matrices[matrix].remove_from_set(set, item) items.each { |related_item| cache_similarity(item, related_item) } return self end def add_item(item) Predictor.redis.sadd(redis_key(:all_items), item) end def delete_item!(item) Predictor.redis.srem(redis_key(:all_items), item) Predictor.redis.watch(redis_key(:similarities, item)) do items = related_items(item) Predictor.redis.multi do |multi| items.each do |related_item| multi.zrem(redis_key(:similarities, related_item), item) end multi.del redis_key(:similarities, item) end end input_matrices.each do |k,m| m.delete_item(item) end return self end def clean! keys = Predictor.redis.keys(redis_key('*')) unless keys.empty? Predictor.redis.del(keys) end end def ensure_similarity_limit_is_obeyed! if similarity_limit items = all_items Predictor.redis.multi do |multi| items.each do |item| key = redis_key(:similarities, item) multi.zremrangebyrank(key, 0, -(similarity_limit + 1)) multi.zunionstore key, [key] # Rewrite zset to take advantage of ziplist implementation. 
end end end end private def cache_similarity(item1, item2) score = 0 input_matrices.each do |key, matrix| score += (matrix.score(item1, item2) * matrix.weight) end if score > 0 add_similarity_if_necessary(item1, item2, score) add_similarity_if_necessary(item2, item1, score) else Predictor.redis.multi do |multi| multi.zrem(redis_key(:similarities, item1), item2) multi.zrem(redis_key(:similarities, item2), item1) end end end def add_similarity_if_necessary(item, similarity, score) store = true key = redis_key(:similarities, item) if similarity_limit if Predictor.redis.zrank(key, similarity).nil? && Predictor.redis.zcard(key) >= similarity_limit # Similarity is not already stored and we are at limit of similarities lowest_scored_item = Predictor.redis.zrangebyscore(key, "0", "+inf", limit: [0, 1], with_scores: true) unless lowest_scored_item.empty? # If score is less than or equal to the lowest score, don't store it. Otherwise, make room by removing the lowest scored similarity score <= lowest_scored_item[0][1] ? store = false : Predictor.redis.zrem(key, lowest_scored_item[0][0]) end end end Predictor.redis.zadd(key, score, similarity) if store end end ================================================ FILE: lib/predictor/distance.rb ================================================ module Predictor module Distance extend self def jaccard_index(key_1, key_2, redis = Predictor.redis) x, y = nil redis.multi do |multi| x = multi.sinterstore 'temp', [key_1, key_2] y = multi.sunionstore 'temp', [key_1, key_2] multi.del 'temp' end y.value > 0 ? (x.value.to_f/y.value.to_f) : 0.0 end def sorensen_coefficient(key_1, key_2, redis = Predictor.redis) x, y, z = nil redis.multi do |multi| x = multi.sinterstore 'temp', [key_1, key_2] y = multi.scard key_1 z = multi.scard key_2 multi.del 'temp' end denom = (y.value + z.value) denom > 0 ? 
(2 * (x.value) / denom.to_f) : 0.0 end end end ================================================ FILE: lib/predictor/input_matrix.rb ================================================ module Predictor class InputMatrix def initialize(opts) @opts = opts end def measure_name @opts.fetch(:measure, :jaccard_index) end def base @opts[:base] end def parent_redis_key(*append) base.redis_key(*append) end def redis_key(*append) base.redis_key(@opts.fetch(:key), *append) end def weight (@opts[:weight] || 1).to_f end def add_to_set(set, *items) items = items.flatten if items.count == 1 && items[0].is_a?(Array) if items.any? Predictor.redis.multi do |redis| redis.sadd(parent_redis_key(:all_items), items) redis.sadd(redis_key(:items, set), items) items.each do |item| # add the set to the item's set--inverting the sets redis.sadd(redis_key(:sets, item), set) end end end end # Delete a specific relationship def remove_from_set(set, item) Predictor.redis.multi do |redis| redis.srem(redis_key(:items, set), item) redis.srem(redis_key(:sets, item), set) end end def add_set(set, items) add_to_set(set, *items) end def add_single(set, item) add_to_set(set, item) end def items_for(set) Predictor.redis.smembers redis_key(:items, set) end def sets_for(item) Predictor.redis.sunion redis_key(:sets, item) end def related_items(item) sets = Predictor.redis.smembers(redis_key(:sets, item)) keys = sets.map { |set| redis_key(:items, set) } keys.length > 0 ? 
Predictor.redis.sunion(keys) - [item.to_s] : [] end # delete item from the matrix def delete_item(item) Predictor.redis.watch(redis_key(:sets, item)) do sets = Predictor.redis.smembers(redis_key(:sets, item)) Predictor.redis.multi do |multi| sets.each do |set| multi.srem(redis_key(:items, set), item) end multi.del redis_key(:sets, item) end end end def score(item1, item2) Distance.send(measure_name, redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis) end def calculate_jaccard(item1, item2) warn 'InputMatrix#calculate_jaccard is now deprecated. Use InputMatrix#score instead' Distance.jaccard_index(redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis) end end end ================================================ FILE: lib/predictor/predictor.rb ================================================ module Predictor @@redis = nil @@redis_prefix = nil def self.redis=(redis) @@redis = redis end def self.redis return @@redis unless @@redis.nil? raise "redis not configured! - Predictor.redis = Redis.new" end def self.redis_prefix(prefix = nil, &block) @@redis_prefix = block_given? ? 
block : prefix end def self.get_redis_prefix if @@redis_prefix if @@redis_prefix.respond_to?(:call) @@redis_prefix.call else @@redis_prefix end else 'predictor' end end def self.capitalize(str_or_sym) str = str_or_sym.to_s.each_char.to_a str.first.upcase + str[1..-1].join("").downcase end def self.constantize(klass) Object.module_eval("Predictor::#{klass}", __FILE__, __LINE__) end def self.processing_technique(algorithm) @technique = algorithm end def self.get_processing_technique @technique || :ruby end def self.process_lua_script(*args) @process_sha ||= redis.script(:load, PROCESS_ITEMS_LUA_SCRIPT) redis.evalsha(@process_sha, argv: args) end PROCESS_ITEMS_LUA_SCRIPT = <<-LUA local redis_prefix = ARGV[1] local input_matrices = cjson.decode(ARGV[2]) local similarity_limit = tonumber(ARGV[3]) local item = ARGV[4] local keys = {} for name, options in pairs(input_matrices) do local key = table.concat({redis_prefix, name, 'sets', item}, ':') local sets = redis.call('SMEMBERS', key) for _, set in ipairs(sets) do table.insert(keys, table.concat({redis_prefix, name, 'items', set}, ':')) end end -- Account for empty tables. if next(keys) == nil then return nil end local related_items = redis.call('SUNION', unpack(keys)) local function add_similarity_if_necessary(item, similarity, score) local store = true local key = table.concat({redis_prefix, 'similarities', item}, ':') if similarity_limit ~= nil then local zrank = redis.call('ZRANK', key, similarity) if zrank ~= nil then local zcard = redis.call('ZCARD', key) if zcard >= similarity_limit then -- Similarity is not already stored and we are at limit of similarities. local lowest_scored_item = redis.call('ZRANGEBYSCORE', key, '0', '+inf', 'withscores', 'limit', 0, 1) if #lowest_scored_item > 0 then -- If score is less than or equal to the lowest score, don't store it. 
Otherwise, make room by removing the lowest scored similarity if score <= tonumber(lowest_scored_item[2]) then store = false else redis.call('ZREM', key, lowest_scored_item[1]) end end end end end if store then redis.call('ZADD', key, score, similarity) end end for i, related_item in ipairs(related_items) do -- Disregard the current item. if related_item ~= item then local score = 0.0 for name, matrix in pairs(input_matrices) do local s = 0.0 local key_1 = table.concat({redis_prefix, name, 'sets', item}, ':') local key_2 = table.concat({redis_prefix, name, 'sets', related_item}, ':') if matrix.measure == 'jaccard_index' then local x = tonumber(redis.call('SINTERSTORE', 'temp', key_1, key_2)) local y = tonumber(redis.call('SUNIONSTORE', 'temp', key_1, key_2)) redis.call('DEL', 'temp') if y > 0 then s = s + (x / y) end elseif matrix.measure == 'sorensen_coefficient' then local x = redis.call('SINTERSTORE', 'temp', key_1, key_2) local y = redis.call('SCARD', key_1) local z = redis.call('SCARD', key_2) redis.call('DEL', 'temp') local denom = y + z if denom > 0 then s = s + (2 * x / denom) end else error("Bad matrix.measure: " .. 
matrix.measure) end score = score + (s * matrix.weight) end if score > 0 then add_similarity_if_necessary(item, related_item, score) add_similarity_if_necessary(related_item, item, score) else redis.call('ZREM', table.concat({redis_prefix, 'similarities', item}, ':'), related_item) redis.call('ZREM', table.concat({redis_prefix, 'similarities', related_item}, ':'), item) end end end LUA end ================================================ FILE: lib/predictor/version.rb ================================================ module Predictor VERSION = "2.3.1" end ================================================ FILE: lib/predictor.rb ================================================ require 'json' require "redis" require "predictor/predictor" require "predictor/distance" require "predictor/input_matrix" require "predictor/base" ================================================ FILE: predictor.gemspec ================================================ # -*- encoding: utf-8 -*- require File.expand_path('../lib/predictor/version', __FILE__) Gem::Specification.new do |s| s.name = "predictor" s.version = Predictor::VERSION s.platform = Gem::Platform::RUBY s.authors = ["Pathgather"] s.email = ["tech@pathgather.com"] s.homepage = "https://github.com/nyagato-00/predictor" s.description = s.summary = "Fast and efficient recommendations and predictions using Redis" s.licenses = ["MIT"] s.add_dependency "redis", ">= 3.0.0" s.add_development_dependency "rspec", ">= 3.4.0" s.add_development_dependency "rake", ">= 11.0" s.add_development_dependency "pry" s.add_development_dependency "yard" s.files = `git ls-files`.split("\n") - [".gitignore", ".rspec", ".travis.yml"] s.test_files = `git ls-files -- spec/*`.split("\n") s.require_paths = ["lib"] end ================================================ FILE: spec/base_spec.rb ================================================ require 'spec_helper' describe Predictor::Base do before(:each) do flush_redis! 
BaseRecommender.input_matrices = {} BaseRecommender.reset_similarity_limit! BaseRecommender.redis_prefix(nil) UserRecommender.input_matrices = {} UserRecommender.reset_similarity_limit! BaseRecommender.processing_technique nil UserRecommender.processing_technique nil Predictor.processing_technique nil end describe "configuration" do it "should add an input_matrix by 'key'" do BaseRecommender.input_matrix(:myinput) expect(BaseRecommender.input_matrices.keys).to eq([:myinput]) end it "should default the similarity_limit to 128" do expect(BaseRecommender.similarity_limit).to eq(128) end it "should allow the similarity limit to be configured" do BaseRecommender.limit_similarities_to(500) expect(BaseRecommender.similarity_limit).to eq(500) end it "should allow the similarity limit to be removed" do BaseRecommender.limit_similarities_to(nil) expect(BaseRecommender.similarity_limit).to eq(nil) end it "should retrieve an input_matrix on a new instance" do BaseRecommender.input_matrix(:myinput) sm = BaseRecommender.new expect{ sm.myinput }.not_to raise_error end it "should retrieve an input_matrix on a new instance and correctly overload respond_to?" 
do BaseRecommender.input_matrix(:myinput) sm = BaseRecommender.new expect(sm.respond_to?(:process!)).to be_truthy expect(sm.respond_to?(:myinput)).to be_truthy expect(sm.respond_to?(:fnord)).to be_falsey end it "should retrieve an input_matrix on a new instance and intialize the correct class" do BaseRecommender.input_matrix(:myinput) sm = BaseRecommender.new expect(sm.myinput).to be_a(Predictor::InputMatrix) end it "should accept a custom processing_technique, or default to Predictor's default" do expect(BaseRecommender.get_processing_technique).to eq(:ruby) Predictor.processing_technique :lua expect(BaseRecommender.get_processing_technique).to eq(:lua) BaseRecommender.processing_technique :union expect(BaseRecommender.get_processing_technique).to eq(:union) end end describe "redis_key" do it "should vary based on the class name" do expect(BaseRecommender.new.redis_key).to eq('predictor-test:BaseRecommender') expect(UserRecommender.new.redis_key).to eq('predictor-test:UserRecommender') end end describe "redis_key" do it "should vary based on the class name" do expect(BaseRecommender.new.redis_key).to eq('predictor-test:BaseRecommender') expect(UserRecommender.new.redis_key).to eq('predictor-test:UserRecommender') end it "should be able to mimic the old naming defaults" do BaseRecommender.redis_prefix([nil]) expect(BaseRecommender.new.redis_key(:key)).to eq('predictor-test:key') end it "should respect the Predictor prefix configuration setting" do br = BaseRecommender.new expect(br.redis_key).to eq("predictor-test:BaseRecommender") expect(br.redis_key(:another)).to eq("predictor-test:BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:another:set:of:keys") i = 0 Predictor.redis_prefix { i += 1 } expect(br.redis_key).to eq("1:BaseRecommender") expect(br.redis_key(:another)).to eq("2:BaseRecommender:another") 
expect(br.redis_key(:another, :key)).to eq("3:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("4:BaseRecommender:another:set:of:keys") Predictor.redis_prefix nil expect(br.redis_key).to eq("predictor:BaseRecommender") expect(br.redis_key(:another)).to eq("predictor:BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("predictor:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor:BaseRecommender:another:set:of:keys") Predictor.redis_prefix [nil] expect(br.redis_key).to eq("BaseRecommender") expect(br.redis_key(:another)).to eq("BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("BaseRecommender:another:set:of:keys") Predictor.redis_prefix { [1, 2, 3] } expect(br.redis_key).to eq("1:2:3:BaseRecommender") expect(br.redis_key(:another)).to eq("1:2:3:BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("1:2:3:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("1:2:3:BaseRecommender:another:set:of:keys") Predictor.redis_prefix 'predictor-test' expect(br.redis_key).to eq("predictor-test:BaseRecommender") expect(br.redis_key(:another)).to eq("predictor-test:BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:another:set:of:keys") end it "should respect the class prefix configuration setting" do br = BaseRecommender.new BaseRecommender.redis_prefix('base') expect(br.redis_key).to eq("predictor-test:base") expect(br.redis_key(:another)).to eq("predictor-test:base:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:base:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:base:another:set:of:keys") i = 0 
BaseRecommender.redis_prefix { i += 1 } expect(br.redis_key).to eq("predictor-test:1") expect(br.redis_key(:another)).to eq("predictor-test:2:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:3:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:4:another:set:of:keys") BaseRecommender.redis_prefix(nil) expect(br.redis_key).to eq("predictor-test:BaseRecommender") expect(br.redis_key(:another)).to eq("predictor-test:BaseRecommender:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:another:set:of:keys") end it "should respect the instance prefix configuration setting" do br = PrefixRecommender.new("foo") expect(br.redis_key).to eq("predictor-test:PrefixRecommender:foo") expect(br.redis_key(:another)).to eq("predictor-test:PrefixRecommender:foo:another") expect(br.redis_key(:another, :key)).to eq("predictor-test:PrefixRecommender:foo:another:key") expect(br.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:PrefixRecommender:foo:another:set:of:keys") br.prefix = nil expect(br.redis_key).to eq("predictor-test:PrefixRecommender") expect(br.redis_key(:another)).to eq("predictor-test:PrefixRecommender:another") end end describe "all_items" do it "returns all items across all matrices" do BaseRecommender.input_matrix(:anotherinput) BaseRecommender.input_matrix(:yetanotherinput) sm = BaseRecommender.new sm.add_to_matrix(:anotherinput, 'a', "foo", "bar") sm.add_to_matrix(:yetanotherinput, 'b', "fnord", "shmoo", "bar") expect(sm.all_items).to include('foo', 'bar', 'fnord', 'shmoo') expect(sm.all_items.length).to eq(4) end it "doesn't return items from other recommenders" do BaseRecommender.input_matrix(:anotherinput) BaseRecommender.input_matrix(:yetanotherinput) UserRecommender.input_matrix(:anotherinput) UserRecommender.input_matrix(:yetanotherinput) sm = BaseRecommender.new 
sm.add_to_matrix(:anotherinput, 'a', "foo", "bar") sm.add_to_matrix(:yetanotherinput, 'b', "fnord", "shmoo", "bar") expect(sm.all_items).to include('foo', 'bar', 'fnord', 'shmoo') expect(sm.all_items.length).to eq(4) ur = UserRecommender.new expect(ur.all_items).to eq([]) end end describe "add_to_matrix" do it "calls add_to_set on the given matrix" do BaseRecommender.input_matrix(:anotherinput) sm = BaseRecommender.new expect(sm.anotherinput).to receive(:add_to_set).with('a', 'foo', 'bar') sm.add_to_matrix(:anotherinput, 'a', 'foo', 'bar') end it "adds the items to the all_items storage" do BaseRecommender.input_matrix(:anotherinput) sm = BaseRecommender.new sm.add_to_matrix(:anotherinput, 'a', 'foo', 'bar') expect(sm.all_items).to include('foo', 'bar') end end describe "add_to_matrix!" do it "calls add_to_matrix and process_items! for the given items" do BaseRecommender.input_matrix(:anotherinput) sm = BaseRecommender.new expect(sm).to receive(:add_to_matrix).with(:anotherinput, 'a', 'foo') expect(sm).to receive(:process_items!).with('foo') sm.add_to_matrix!(:anotherinput, 'a', 'foo') end end describe "related_items" do it "returns items in the sets across all matrices that the given item is also in" do BaseRecommender.input_matrix(:anotherinput) BaseRecommender.input_matrix(:yetanotherinput) BaseRecommender.input_matrix(:finalinput) sm = BaseRecommender.new sm.anotherinput.add_to_set('a', "foo", "bar") sm.yetanotherinput.add_to_set('b', "fnord", "shmoo", "bar") sm.finalinput.add_to_set('c', "nada") sm.process! 
expect(sm.related_items("bar")).to include("foo", "fnord", "shmoo") expect(sm.related_items("bar").length).to eq(3) end end describe "predictions_for" do it "accepts an :on option to return scores of specific objects" do BaseRecommender.input_matrix(:users, weight: 4.0) BaseRecommender.input_matrix(:tags, weight: 1.0) sm = BaseRecommender.new sm.users.add_to_set('me', "foo", "bar", "fnord") sm.users.add_to_set('not_me', "foo", "shmoo") sm.users.add_to_set('another', "fnord", "other") sm.users.add_to_set('another', "nada") sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo") sm.tags.add_to_set('tag2', "bar", "shmoo", "other") sm.tags.add_to_set('tag3', "shmoo", "nada") sm.process! predictions = sm.predictions_for('me', matrix_label: :users, on: 'other', with_scores: true) expect(predictions).to eq([['other', 3.0]]) predictions = sm.predictions_for('me', matrix_label: :users, on: ['other'], with_scores: true) expect(predictions).to eq([['other', 3.0]]) predictions = sm.predictions_for('me', matrix_label: :users, on: ['other', 'nada'], with_scores: true) expect(predictions).to eq([['other', 3.0], ['nada', 2.0]]) predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada'], with_scores: true) expect(predictions).to eq([['other', 3.0], ['nada', 2.0]]) predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada']) expect(predictions).to eq(['other', 'nada']) predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, limit: 1, with_scores: true) expect(predictions).to eq([["other", 3.0]]) predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, with_scores: true) expect(predictions).to eq([['other', 3.0], ['nada', 2.0]]) end end [:ruby, :lua, :union].each do |technique| describe "predictions_for with #{technique} processing" do before do Predictor.processing_technique(technique) end it "returns relevant predictions" do 
BaseRecommender.input_matrix(:users, weight: 4.0) BaseRecommender.input_matrix(:tags, weight: 1.0) sm = BaseRecommender.new sm.users.add_to_set('me', "foo", "bar", "fnord") sm.users.add_to_set('not_me', "foo", "shmoo") sm.users.add_to_set('another', "fnord", "other") sm.users.add_to_set('another', "nada") sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo") sm.tags.add_to_set('tag2', "bar", "shmoo") sm.tags.add_to_set('tag3', "shmoo", "nada") sm.process! predictions = sm.predictions_for('me', matrix_label: :users) expect(predictions).to eq(["shmoo", "other", "nada"]) predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"]) expect(predictions).to eq(["shmoo", "other", "nada"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1) expect(predictions).to eq(["other"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1) expect(predictions).to eq(["other", "nada"]) end it "accepts a :boost option" do BaseRecommender.input_matrix(:users, weight: 4.0) BaseRecommender.input_matrix(:tags, weight: 1.0) sm = BaseRecommender.new sm.users.add_to_set('me', "foo", "bar", "fnord") sm.users.add_to_set('not_me', "foo", "shmoo") sm.users.add_to_set('another', "fnord", "other") sm.users.add_to_set('another', "nada") sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo") sm.tags.add_to_set('tag2', "bar", "shmoo") sm.tags.add_to_set('tag3', "shmoo", "nada") sm.process! 
# Syntax #1: Tags passed as array, weights assumed to be 1.0 predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']}) expect(predictions).to eq(["shmoo", "nada", "other"]) predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: ['tag3']}) expect(predictions).to eq(["shmoo", "nada", "other"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']}) expect(predictions).to eq(["nada"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']}) expect(predictions).to eq(["nada", "other"]) # Syntax #2: Weights explicitly set. predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}}) expect(predictions).to eq(["shmoo", "nada", "other"]) predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: {values: ['tag3'], weight: 1.0}}) expect(predictions).to eq(["shmoo", "nada", "other"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}}) expect(predictions).to eq(["nada"]) predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}}) expect(predictions).to eq(["nada", "other"]) # Make sure weights are actually being passed to Redis. 
        shmoo, nada, other = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 10000.0}}, with_scores: true)
        expect(shmoo[0]).to eq('shmoo')
        expect(shmoo[1]).to be > 10000
        expect(nada[0]).to eq('nada')
        expect(nada[1]).to be > 10000
        expect(other[0]).to eq('other')
        expect(other[1]).to be < 10
      end

      it "accepts a :boost option, even with an empty item set" do
        BaseRecommender.input_matrix(:users, weight: 4.0)
        BaseRecommender.input_matrix(:tags, weight: 1.0)
        sm = BaseRecommender.new
        sm.users.add_to_set('not_me', "foo", "shmoo")
        sm.users.add_to_set('another', "fnord", "other")
        sm.users.add_to_set('another', "nada")
        sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
        sm.tags.add_to_set('tag2', "bar", "shmoo")
        sm.tags.add_to_set('tag3', "shmoo", "nada")
        sm.process!

        # Syntax #1: Tags passed as array, weights assumed to be 1.0
        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for(item_set: [], boost: {tags: ['tag3']})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
        expect(predictions).to eq(["nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
        expect(predictions).to eq(["nada"])

        # Syntax #2: Weights explicitly set.
        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for(item_set: [], boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["shmoo", "nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["nada"])
        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
        expect(predictions).to eq(["nada"])
      end
    end

    describe "process_items! with #{technique} processing" do
      before do
        Predictor.processing_technique(technique)
      end

      context "with no similarity_limit" do
        it "calculates the similarity between the item and all related_items (other items in a set the given item is in)" do
          BaseRecommender.input_matrix(:myfirstinput)
          BaseRecommender.input_matrix(:mysecondinput)
          BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
          sm = BaseRecommender.new
          sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
          sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
          expect(sm.similarities_for('item2')).to be_empty
          sm.process_items!('item2')
          similarities = sm.similarities_for('item2')
          expect(similarities).to eq(["item3", "item1"])
        end
      end

      context "with a similarity_limit" do
        it "calculates the similarity between the item and all related_items (other items in a set the given item is in), but obeys the similarity_limit" do
          BaseRecommender.input_matrix(:myfirstinput)
          BaseRecommender.input_matrix(:mysecondinput)
          BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
          BaseRecommender.limit_similarities_to(1)
          sm = BaseRecommender.new
          sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
          sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
          sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
          expect(sm.similarities_for('item2')).to be_empty
          sm.process_items!('item2')
          similarities = sm.similarities_for('item2')
          expect(similarities).to include("item3")
          expect(similarities.length).to eq(1)
        end
      end
    end
  end

  describe "similarities_for" do
    it "should not throw an exception for non-existing items" do
      sm = BaseRecommender.new
      expect(sm.similarities_for("not_existing_item").length).to eq(0)
    end

    it "correctly weighs and sums input matrices" do
      BaseRecommender.input_matrix(:users, weight: 1.0)
      BaseRecommender.input_matrix(:tags, weight: 2.0)
      BaseRecommender.input_matrix(:topics, weight: 4.0)
      sm = BaseRecommender.new
      sm.users.add_to_set('user1', "c1", "c2", "c4")
      sm.users.add_to_set('user2', "c3", "c4")
      sm.topics.add_to_set('topic1', "c1", "c4")
      sm.topics.add_to_set('topic2', "c2", "c3")
      sm.tags.add_to_set('tag1', "c1", "c2", "c4")
      sm.tags.add_to_set('tag2', "c1", "c4")
      sm.process!
      expect(sm.similarities_for("c1", with_scores: true)).to eq([["c4", 6.5], ["c2", 2.0]])
      expect(sm.similarities_for("c2", with_scores: true)).to eq([["c3", 4.0], ["c1", 2.0], ["c4", 1.5]])
      expect(sm.similarities_for("c3", with_scores: true)).to eq([["c2", 4.0], ["c4", 0.5]])
      expect(sm.similarities_for("c4", with_scores: true, exclusion_set: ["c3"])).to eq([["c1", 6.5], ["c2", 1.5]])
    end
  end

  describe "sets_for" do
    it "should return all the sets the given item is in" do
      BaseRecommender.input_matrix(:set1)
      BaseRecommender.input_matrix(:set2)
      sm = BaseRecommender.new
      sm.set1.add_to_set "item1", "foo", "bar"
      sm.set1.add_to_set "item2", "nada", "bar"
      sm.set2.add_to_set "item3", "bar", "other"
      expect(sm.sets_for("bar").length).to eq(3)
      expect(sm.sets_for("bar")).to include("item1", "item2", "item3")
      expect(sm.sets_for("other")).to eq(["item3"])
    end
  end

  describe "process!" do
    it "should call process_items! for all items in all_items" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "fnord", "shmoo")
      expect(sm.all_items).to include("foo", "bar", "fnord", "shmoo")
      expect(sm).to receive(:process_items!).with(*sm.all_items)
      sm.process!
    end
  end

  describe "delete_pair_from_matrix!" do
    it "should call remove_from_set on the matrix" do
      BaseRecommender.input_matrix(:anotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo")
      sm.anotherinput.add_to_set('a', "bar")
      sm.anotherinput.add_to_set('a', "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      expect(sm.anotherinput).to receive(:remove_from_set).with('a', 'foo')
      sm.delete_pair_from_matrix!(:anotherinput, 'a', 'foo')
    end

    it "updates similarities" do
      BaseRecommender.input_matrix(:anotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo")
      sm.anotherinput.add_to_set('a', "bar")
      sm.anotherinput.add_to_set('a', "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      sm.delete_pair_from_matrix!(:anotherinput, 'a', 'foo')
      expect(sm.similarities_for('bar')).to eq(['shmoo'])
    end
  end

  describe "delete_from_matrix!" do
    it "calls delete_item on the matrix" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "bar", "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      expect(sm.anotherinput).to receive(:delete_item).with('foo')
      sm.delete_from_matrix!(:anotherinput, 'foo')
    end

    it "updates similarities" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "bar", "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      sm.delete_from_matrix!(:anotherinput, 'foo')
      expect(sm.similarities_for('bar')).to eq(['shmoo'])
    end
  end

  describe "delete_item!" do
    it "should call delete_item on each input_matrix" do
      BaseRecommender.input_matrix(:myfirstinput)
      BaseRecommender.input_matrix(:mysecondinput)
      sm = BaseRecommender.new
      expect(sm.myfirstinput).to receive(:delete_item).with("fnorditem")
      expect(sm.mysecondinput).to receive(:delete_item).with("fnorditem")
      sm.delete_item!("fnorditem")
    end

    it "should remove the item from all_items" do
      BaseRecommender.input_matrix(:anotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.process!
      expect(sm.all_items).to include('foo')
      sm.delete_item!('foo')
      expect(sm.all_items).not_to include('foo')
    end

    it "should remove the item's similarities and also remove the item from related_items' similarities" do
      BaseRecommender.input_matrix(:anotherinput)
      BaseRecommender.input_matrix(:yetanotherinput)
      sm = BaseRecommender.new
      sm.anotherinput.add_to_set('a', "foo", "bar")
      sm.yetanotherinput.add_to_set('b', "bar", "shmoo")
      sm.process!
      expect(sm.similarities_for('bar')).to include('foo', 'shmoo')
      expect(sm.similarities_for('shmoo')).to include('bar')
      sm.delete_item!('shmoo')
      expect(sm.similarities_for('bar')).not_to include('shmoo')
      expect(sm.similarities_for('shmoo')).to be_empty
    end
  end

  describe "clean!" do
    it "should clean out the Redis storage for this Predictor" do
      BaseRecommender.input_matrix(:set1)
      BaseRecommender.input_matrix(:set2)
      sm = BaseRecommender.new
      sm.set1.add_to_set "item1", "foo", "bar"
      sm.set1.add_to_set "item2", "nada", "bar"
      sm.set2.add_to_set "item3", "bar", "other"
      expect(Predictor.redis.keys(sm.redis_key('*'))).not_to be_empty
      sm.clean!
      expect(Predictor.redis.keys(sm.redis_key('*'))).to be_empty
    end
  end

  describe "ensure_similarity_limit_is_obeyed!" do
    it "should shorten similarities to the given limit and rewrite the zset" do
      BaseRecommender.limit_similarities_to(nil)
      BaseRecommender.input_matrix(:myfirstinput)
      sm = BaseRecommender.new
      sm.myfirstinput.add_to_set(*(['set1'] + 130.times.map { |i| "item#{i}" }))
      expect(sm.similarities_for('item2')).to be_empty
      sm.process_items!('item2')
      expect(sm.similarities_for('item2').length).to eq(129)

      redis = Predictor.redis
      key = sm.redis_key(:similarities, 'item2')
      expect(redis.zcard(key)).to eq(129)
      expect(redis.object(:encoding, key)).to eq('skiplist') # Inefficient

      BaseRecommender.reset_similarity_limit!
      sm.ensure_similarity_limit_is_obeyed!
      expect(redis.zcard(key)).to eq(128)
      expect(redis.object(:encoding, key)).to eq('ziplist') # Efficient
    end
  end
end

================================================
FILE: spec/input_matrix_spec.rb
================================================
require 'spec_helper'

describe Predictor::InputMatrix do
  let(:options) { @default_options.merge(@options) }

  before(:each) { @options = {} }

  before(:all) do
    @base = BaseRecommender.new
    @default_options = { base: @base, key: "mymatrix" }
    @matrix = Predictor::InputMatrix.new(@default_options)
  end

  before(:each) do
    flush_redis!
  end

  describe "redis_key" do
    it "should respect the global namespace configuration" do
      expect(@matrix.redis_key).to eq("predictor-test:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:mymatrix:another:set:of:keys")

      i = 0
      Predictor.redis_prefix { i += 1 }
      expect(@matrix.redis_key).to eq("1:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("2:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("3:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("4:BaseRecommender:mymatrix:another:set:of:keys")

      Predictor.redis_prefix(nil)
      expect(@matrix.redis_key).to eq("predictor:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor:BaseRecommender:mymatrix:another:set:of:keys")

      Predictor.redis_prefix('predictor-test')
      expect(@matrix.redis_key).to eq("predictor-test:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:mymatrix:another:set:of:keys")
    end

    it "should respect the class-level configuration" do
      i = 0
      BaseRecommender.redis_prefix { i += 1 }
      expect(@matrix.redis_key).to eq("predictor-test:1:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:2:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:3:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:4:mymatrix:another:set:of:keys")

      BaseRecommender.redis_prefix([nil])
      expect(@matrix.redis_key).to eq("predictor-test:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:mymatrix:another:set:of:keys")

      BaseRecommender.redis_prefix(['a', 'b'])
      expect(@matrix.redis_key).to eq("predictor-test:a:b:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:a:b:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:a:b:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:a:b:mymatrix:another:set:of:keys")

      BaseRecommender.redis_prefix(nil)
      expect(@matrix.redis_key).to eq("predictor-test:BaseRecommender:mymatrix")
      expect(@matrix.redis_key(:another)).to eq("predictor-test:BaseRecommender:mymatrix:another")
      expect(@matrix.redis_key(:another, :key)).to eq("predictor-test:BaseRecommender:mymatrix:another:key")
      expect(@matrix.redis_key(:another, [:set, :of, :keys])).to eq("predictor-test:BaseRecommender:mymatrix:another:set:of:keys")
    end
  end

  describe "weight" do
    it "returns the weight configured or a default of 1" do
      expect(@matrix.weight).to eq(1.0) # default weight
      matrix = Predictor::InputMatrix.new(redis_prefix: "predictor-test", key: "mymatrix", weight: 5.0)
      expect(matrix.weight).to eq(5.0)
    end
  end

  describe "add_to_set" do
    it "adds each member of the set to the key's 'sets' set" do
      expect(@matrix.items_for("item1")).not_to include("foo", "bar", "fnord", "blubb")
      @matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
      expect(@matrix.items_for("item1")).to include("foo", "bar", "fnord", "blubb")
    end

    it "does not crash if the set of items is empty" do
      @matrix.add_to_set "item1"
      @matrix.add_to_set "item1", []
    end

    it "adds the key to each set member's 'items' set" do
      expect(@matrix.sets_for("foo")).not_to include("item1")
      expect(@matrix.sets_for("bar")).not_to include("item1")
      expect(@matrix.sets_for("fnord")).not_to include("item1")
      expect(@matrix.sets_for("blubb")).not_to include("item1")
      @matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
      expect(@matrix.sets_for("foo")).to include("item1")
      expect(@matrix.sets_for("bar")).to include("item1")
      expect(@matrix.sets_for("fnord")).to include("item1")
      expect(@matrix.sets_for("blubb")).to include("item1")
    end
  end

  describe "items_for" do
    it "returns the items in the given set ID" do
      @matrix.add_to_set "item1", ["foo", "bar", "fnord", "blubb"]
      expect(@matrix.items_for("item1")).to include("foo", "bar", "fnord", "blubb")
      @matrix.add_to_set "item2", ["foo", "bar", "snafu", "nada"]
      expect(@matrix.items_for("item2")).to include("foo", "bar", "snafu", "nada")
      expect(@matrix.items_for("item1")).not_to include("snafu", "nada")
    end
  end

  describe "sets_for" do
    it "returns the set IDs the given item is in" do
      @matrix.add_to_set "item1", ["foo", "bar", "fnord", "blubb"]
      @matrix.add_to_set "item2", ["foo", "bar", "snafu", "nada"]
      expect(@matrix.sets_for("foo")).to include("item1", "item2")
      expect(@matrix.sets_for("snafu")).to eq(["item2"])
    end
  end

  describe "related_items" do
    it "returns the items in sets the given item is also in" do
      @matrix.add_to_set "item1", ["foo", "bar", "fnord", "blubb"]
      @matrix.add_to_set "item2", ["foo", "bar", "snafu", "nada"]
      @matrix.add_to_set "item3", ["nada", "other"]
      expect(@matrix.related_items("bar")).to include("foo", "fnord", "blubb", "snafu", "nada")
      expect(@matrix.related_items("bar").length).to eq(5)
      expect(@matrix.related_items("other")).to eq(["nada"])
      expect(@matrix.related_items("snafu")).to include("foo", "bar", "nada")
      expect(@matrix.related_items("snafu").length).to eq(3)
    end
  end

  describe "delete_item" do
    before do
      @matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
      @matrix.add_to_set "item2", "foo", "bar", "snafu", "nada"
      @matrix.add_to_set "item3", "nada", "other"
    end

    it "should delete the item from sets it is in" do
      expect(@matrix.items_for("item1")).to include("bar")
      expect(@matrix.items_for("item2")).to include("bar")
      expect(@matrix.sets_for("bar")).to include("item1", "item2")
      @matrix.delete_item("bar")
      expect(@matrix.items_for("item1")).not_to include("bar")
      expect(@matrix.items_for("item2")).not_to include("bar")
      expect(@matrix.sets_for("bar")).to be_empty
    end
  end

  describe "#score" do
    let(:matrix) { Predictor::InputMatrix.new(options) }

    context "default" do
      it "scores as jaccard index by default" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        matrix.add_to_set "item2", "bar", "fnord", "shmoo", "snafu"
        matrix.add_to_set "item3", "bar", "nada", "snafu"
        expect(matrix.score("bar", "snafu")).to eq(2.0/3.0)
      end

      it "scores as jaccard index when given option" do
        matrix = Predictor::InputMatrix.new(options.merge(measure: :jaccard_index))
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        matrix.add_to_set "item2", "bar", "fnord", "shmoo", "snafu"
        matrix.add_to_set "item3", "bar", "nada", "snafu"
        expect(matrix.score("bar", "snafu")).to eq(2.0/3.0)
      end

      it "should handle missing sets" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        expect(matrix.score("is", "missing")).to eq(0.0)
      end
    end

    context "sorensen_coefficient" do
      before { @options[:measure] = :sorensen_coefficient }

      it "should calculate the correct sorensen index" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        matrix.add_to_set "item2", "fnord", "shmoo", "snafu"
        matrix.add_to_set "item3", "bar", "nada", "snafu"
        expect(matrix.score("bar", "snafu")).to eq(2.0/4.0)
      end

      it "should handle missing sets" do
        matrix.add_to_set "item1", "foo", "bar", "fnord", "blubb"
        expect(matrix.score("is", "missing")).to eq(0.0)
      end
    end
  end

  private

  def add_two_item_test_data!(matrix)
    matrix.add_to_set("user42", "fnord", "blubb")
    matrix.add_to_set("user44", "blubb")
    matrix.add_to_set("user46", "fnord")
    matrix.add_to_set("user48", "fnord", "blubb")
    matrix.add_to_set("user50", "fnord")
  end

  def add_three_item_test_data!(matrix)
    matrix.add_to_set("user42", "fnord", "blubb", "shmoo")
    matrix.add_to_set("user44", "blubb")
    matrix.add_to_set("user46", "fnord", "shmoo")
    matrix.add_to_set("user48", "fnord", "blubb")
    matrix.add_to_set("user50", "fnord", "shmoo")
  end
end

================================================
FILE: spec/predictor_spec.rb
================================================
require 'spec_helper'

describe Predictor do
  it "should store a redis connection" do
    Predictor.redis = "asd"
    expect(Predictor.redis).to eq("asd")
  end

  it "should raise an exception if unconfigured redis connection is accessed" do
    Predictor.redis = nil
    expect { Predictor.redis }.to raise_error(/not configured/i)
  end
end

================================================
FILE: spec/spec_helper.rb
================================================
require "predictor"
require "pry"

def flush_redis!
  Predictor.redis = Redis.new
  Predictor.redis.keys("predictor-test*").each do |k|
    Predictor.redis.del(k)
  end
end

Predictor.redis_prefix "predictor-test"

class BaseRecommender
  include Predictor::Base
end

class UserRecommender
  include Predictor::Base
end

class TestRecommender
  include Predictor::Base

  input_matrix :jaccard_one
end

class PrefixRecommender
  include Predictor::Base

  def initialize(prefix)
    @prefix = prefix
  end

  def prefix=(new_prefix)
    @prefix = new_prefix
  end

  def get_redis_prefix
    @prefix
  end
end

class Predictor::TestInputMatrix
  def initialize(opts)
    @opts = opts
  end

  def method_missing(method, *args)
    @opts[method]
  end
end