Repository: apache/predictionio Branch: develop Commit: d9628ca2f148 Files: 904 Total size: 2.9 MB Directory structure: gitextract_wm7b4ll_/ ├── .gitattributes ├── .gitignore ├── .travis.yml ├── CONTRIBUTING.md ├── Dockerfile ├── KEYS ├── LICENSE.txt ├── NOTICE.txt ├── PMC.md ├── README.md ├── RELEASE.md ├── assembly/ │ ├── build.sbt │ └── src/ │ ├── debian/ │ │ └── DEBIAN/ │ │ ├── postrm │ │ └── preinst │ └── rpm/ │ └── scriptlets/ │ ├── postun │ └── preinst ├── bin/ │ ├── cjson │ ├── compute-classpath.sh │ ├── install.sh │ ├── load-pio-env.sh │ ├── pio │ ├── pio-class │ ├── pio-daemon │ ├── pio-shell │ ├── pio-start-all │ ├── pio-stop-all │ ├── semver.sh │ └── travis/ │ ├── pio-start-travis │ └── pio-stop-travis ├── build.sbt ├── common/ │ ├── build.sbt │ └── src/ │ └── main/ │ ├── java/ │ │ └── org/ │ │ └── apache/ │ │ └── predictionio/ │ │ └── annotation/ │ │ ├── DeveloperApi.java │ │ └── Experimental.java │ ├── resources/ │ │ └── application.conf │ └── scala/ │ └── org/ │ └── apache/ │ └── predictionio/ │ ├── akkahttpjson4s/ │ │ └── Json4sSupport.scala │ ├── authentication/ │ │ └── KeyAuthentication.scala │ └── configuration/ │ └── SSLConfiguration.scala ├── conf/ │ ├── keystore.jks │ ├── log4j.properties │ ├── pio-env.sh.template │ ├── pio-env.sh.travis │ ├── pio-vendors.sh │ └── server.conf ├── core/ │ ├── build.sbt │ └── src/ │ ├── main/ │ │ ├── scala/ │ │ │ └── org/ │ │ │ └── apache/ │ │ │ └── predictionio/ │ │ │ ├── controller/ │ │ │ │ ├── CustomQuerySerializer.scala │ │ │ │ ├── Deployment.scala │ │ │ │ ├── Engine.scala │ │ │ │ ├── EngineFactory.scala │ │ │ │ ├── EngineParams.scala │ │ │ │ ├── EngineParamsGenerator.scala │ │ │ │ ├── Evaluation.scala │ │ │ │ ├── FastEvalEngine.scala │ │ │ │ ├── IdentityPreparator.scala │ │ │ │ ├── LAlgorithm.scala │ │ │ │ ├── LAverageServing.scala │ │ │ │ ├── LDataSource.scala │ │ │ │ ├── LFirstServing.scala │ │ │ │ ├── LPreparator.scala │ │ │ │ ├── LServing.scala │ │ │ │ ├── LocalFileSystemPersistentModel.scala │ │ │ │ 
├── Metric.scala │ │ │ │ ├── MetricEvaluator.scala │ │ │ │ ├── P2LAlgorithm.scala │ │ │ │ ├── PAlgorithm.scala │ │ │ │ ├── PDataSource.scala │ │ │ │ ├── PPreparator.scala │ │ │ │ ├── Params.scala │ │ │ │ ├── PersistentModel.scala │ │ │ │ ├── SanityCheck.scala │ │ │ │ ├── Utils.scala │ │ │ │ ├── java/ │ │ │ │ │ ├── JavaEngineParamsGenerator.scala │ │ │ │ │ ├── JavaEvaluation.scala │ │ │ │ │ ├── LJavaAlgorithm.scala │ │ │ │ │ ├── LJavaDataSource.scala │ │ │ │ │ ├── LJavaPreparator.scala │ │ │ │ │ ├── LJavaServing.scala │ │ │ │ │ ├── P2LJavaAlgorithm.scala │ │ │ │ │ ├── PJavaAlgorithm.scala │ │ │ │ │ ├── PJavaDataSource.scala │ │ │ │ │ ├── PJavaPreparator.scala │ │ │ │ │ └── SerializableComparator.scala │ │ │ │ └── package.scala │ │ │ ├── core/ │ │ │ │ ├── AbstractDoer.scala │ │ │ │ ├── BaseAlgorithm.scala │ │ │ │ ├── BaseDataSource.scala │ │ │ │ ├── BaseEngine.scala │ │ │ │ ├── BaseEvaluator.scala │ │ │ │ ├── BasePreparator.scala │ │ │ │ ├── BaseServing.scala │ │ │ │ ├── SelfCleaningDataSource.scala │ │ │ │ └── package.scala │ │ │ ├── package.scala │ │ │ └── workflow/ │ │ │ ├── BatchPredict.scala │ │ │ ├── CleanupFunctions.scala │ │ │ ├── CoreWorkflow.scala │ │ │ ├── CreateServer.scala │ │ │ ├── CreateWorkflow.scala │ │ │ ├── EngineServerPlugin.scala │ │ │ ├── EngineServerPluginContext.scala │ │ │ ├── EngineServerPluginsActor.scala │ │ │ ├── EvaluationWorkflow.scala │ │ │ ├── FakeWorkflow.scala │ │ │ ├── JsonExtractor.scala │ │ │ ├── JsonExtractorOption.scala │ │ │ ├── PersistentModelManifest.scala │ │ │ ├── Workflow.scala │ │ │ ├── WorkflowContext.scala │ │ │ ├── WorkflowParams.scala │ │ │ └── WorkflowUtils.scala │ │ └── twirl/ │ │ └── org/ │ │ └── apache/ │ │ └── predictionio/ │ │ ├── controller/ │ │ │ └── metric_evaluator.scala.html │ │ └── workflow/ │ │ └── index.scala.html │ └── test/ │ ├── java/ │ │ └── org/ │ │ └── apache/ │ │ └── predictionio/ │ │ └── workflow/ │ │ ├── JavaParams.java │ │ ├── JavaQuery.java │ │ └── JavaQueryTypeAdapterFactory.java │ └── 
scala/ │ └── org/ │ └── apache/ │ └── predictionio/ │ ├── controller/ │ │ ├── EngineTest.scala │ │ ├── EvaluationTest.scala │ │ ├── EvaluatorTest.scala │ │ ├── FastEvalEngineTest.scala │ │ ├── MetricEvaluatorTest.scala │ │ ├── MetricTest.scala │ │ └── SampleEngine.scala │ ├── core/ │ │ ├── SelfCleaningDataSourceTest.scala │ │ └── test.json │ └── workflow/ │ ├── BaseTest.scala │ ├── EngineWorkflowTest.scala │ ├── EvaluationWorkflowTest.scala │ └── JsonExtractorSuite.scala ├── data/ │ ├── README.md │ ├── build.sbt │ ├── src/ │ │ ├── main/ │ │ │ └── scala/ │ │ │ └── org/ │ │ │ └── apache/ │ │ │ └── predictionio/ │ │ │ └── data/ │ │ │ ├── Utils.scala │ │ │ ├── api/ │ │ │ │ ├── Common.scala │ │ │ │ ├── EventInfo.scala │ │ │ │ ├── EventServer.scala │ │ │ │ ├── EventServerPlugin.scala │ │ │ │ ├── EventServerPluginContext.scala │ │ │ │ ├── PluginsActor.scala │ │ │ │ ├── Stats.scala │ │ │ │ ├── StatsActor.scala │ │ │ │ ├── Webhooks.scala │ │ │ │ └── WebhooksConnectors.scala │ │ │ ├── package.scala │ │ │ ├── storage/ │ │ │ │ ├── AccessKeys.scala │ │ │ │ ├── Apps.scala │ │ │ │ ├── BiMap.scala │ │ │ │ ├── Channels.scala │ │ │ │ ├── DataMap.scala │ │ │ │ ├── DateTimeJson4sSupport.scala │ │ │ │ ├── EngineInstances.scala │ │ │ │ ├── EntityMap.scala │ │ │ │ ├── EvaluationInstances.scala │ │ │ │ ├── Event.scala │ │ │ │ ├── EventJson4sSupport.scala │ │ │ │ ├── LEventAggregator.scala │ │ │ │ ├── LEvents.scala │ │ │ │ ├── Models.scala │ │ │ │ ├── PEventAggregator.scala │ │ │ │ ├── PEvents.scala │ │ │ │ ├── PropertyMap.scala │ │ │ │ ├── Storage.scala │ │ │ │ ├── Utils.scala │ │ │ │ └── package.scala │ │ │ ├── store/ │ │ │ │ ├── Common.scala │ │ │ │ ├── LEventStore.scala │ │ │ │ ├── PEventStore.scala │ │ │ │ ├── java/ │ │ │ │ │ ├── LJavaEventStore.scala │ │ │ │ │ ├── OptionHelper.scala │ │ │ │ │ └── PJavaEventStore.scala │ │ │ │ ├── package.scala │ │ │ │ └── python/ │ │ │ │ └── PPythonEventStore.scala │ │ │ ├── view/ │ │ │ │ ├── DataView.scala │ │ │ │ ├── LBatchView.scala │ │ │ │ ├── 
PBatchView.scala │ │ │ │ └── QuickTest.scala │ │ │ └── webhooks/ │ │ │ ├── ConnectorException.scala │ │ │ ├── ConnectorUtil.scala │ │ │ ├── FormConnector.scala │ │ │ ├── JsonConnector.scala │ │ │ ├── exampleform/ │ │ │ │ └── ExampleFormConnector.scala │ │ │ ├── examplejson/ │ │ │ │ └── ExampleJsonConnector.scala │ │ │ ├── mailchimp/ │ │ │ │ └── MailChimpConnector.scala │ │ │ └── segmentio/ │ │ │ └── SegmentIOConnector.scala │ │ └── test/ │ │ ├── resources/ │ │ │ └── application.conf │ │ └── scala/ │ │ └── org/ │ │ └── apache/ │ │ └── predictionio/ │ │ └── data/ │ │ ├── api/ │ │ │ ├── EventServiceSpec.scala │ │ │ └── SegmentIOAuthSpec.scala │ │ ├── storage/ │ │ │ ├── BiMapSpec.scala │ │ │ ├── DataMapSpec.scala │ │ │ ├── LEventAggregatorSpec.scala │ │ │ ├── PEventAggregatorSpec.scala │ │ │ ├── StorageMockContext.scala │ │ │ └── TestEvents.scala │ │ └── webhooks/ │ │ ├── ConnectorTestUtil.scala │ │ ├── exampleform/ │ │ │ └── ExampleFormConnectorSpec.scala │ │ ├── examplejson/ │ │ │ └── ExampleJsonConnectorSpec.scala │ │ ├── mailchimp/ │ │ │ └── MailChimpConnectorSpec.scala │ │ └── segmentio/ │ │ └── SegmentIOConnectorSpec.scala │ ├── test-form.sh │ ├── test-normal.sh │ ├── test-segmentio.sh │ ├── test.sh │ ├── test2.sh │ ├── test3.sh │ └── very_long_batch_request.txt ├── doap.rdf ├── docker/ │ ├── .ivy2/ │ │ └── .keep │ ├── JUPYTER.md │ ├── README.md │ ├── bin/ │ │ └── pio-docker │ ├── charts/ │ │ ├── README.md │ │ ├── postgresql.yaml │ │ ├── predictionio/ │ │ │ ├── .helmignore │ │ │ ├── Chart.yaml │ │ │ ├── templates/ │ │ │ │ ├── NOTES.txt │ │ │ │ ├── _helpers.tpl │ │ │ │ ├── pio-deployment.yaml │ │ │ │ └── pio-service.yaml │ │ │ └── values.yaml │ │ ├── predictionio_postgresql.yaml │ │ └── spark/ │ │ ├── .helmignore │ │ ├── Chart.yaml │ │ ├── README.md │ │ ├── templates/ │ │ │ ├── NOTES.txt │ │ │ ├── _helpers.tpl │ │ │ ├── spark-master-deployment.yaml │ │ │ ├── spark-sql-test.yaml │ │ │ ├── spark-worker-deployment.yaml │ │ │ └── spark-worker-hpa.yaml │ │ └── 
values.yaml │ ├── deploy/ │ │ └── run.sh │ ├── docker-compose.deploy.yml │ ├── docker-compose.jupyter.yml │ ├── docker-compose.spark.yml │ ├── docker-compose.yml │ ├── elasticsearch/ │ │ ├── docker-compose.base.yml │ │ ├── docker-compose.event.yml │ │ └── docker-compose.meta.yml │ ├── jupyter/ │ │ ├── Dockerfile │ │ ├── fix-permissions │ │ ├── jupyter_notebook_config.py │ │ ├── requirements.txt │ │ ├── start-jupyter.sh │ │ └── start.sh │ ├── localfs/ │ │ └── docker-compose.model.yml │ ├── mysql/ │ │ ├── docker-compose.base.yml │ │ ├── docker-compose.event.yml │ │ ├── docker-compose.meta.yml │ │ └── docker-compose.model.yml │ ├── pgsql/ │ │ ├── docker-compose.base.yml │ │ ├── docker-compose.event.yml │ │ ├── docker-compose.meta.yml │ │ └── docker-compose.model.yml │ ├── pio/ │ │ ├── Dockerfile │ │ └── pio_run │ └── templates/ │ └── .keep ├── docs/ │ ├── javadoc/ │ │ ├── README.md │ │ └── javadoc-overview.html │ ├── manual/ │ │ ├── .gitignore │ │ ├── Gemfile │ │ ├── Rakefile │ │ ├── bower.json │ │ ├── config.rb │ │ ├── data/ │ │ │ ├── nav/ │ │ │ │ ├── build.yml │ │ │ │ └── main.yml │ │ │ └── versions.yml │ │ ├── helpers/ │ │ │ ├── application_helpers.rb │ │ │ ├── breadcrumb_helpers.rb │ │ │ ├── icon_helpers.rb │ │ │ ├── table_of_contents_helpers.rb │ │ │ └── url_helpers.rb │ │ ├── lib/ │ │ │ ├── custom_renderer.rb │ │ │ └── gallery_generator.rb │ │ └── source/ │ │ ├── 404.html.md │ │ ├── algorithm/ │ │ │ ├── custom.html.md │ │ │ ├── index.html.md │ │ │ ├── multiple.html.md │ │ │ └── switch.html.md │ │ ├── appintegration/ │ │ │ └── index.html.md │ │ ├── archived/ │ │ │ ├── community.html.md │ │ │ ├── index.html.md │ │ │ ├── install-linux.html.md.erb │ │ │ ├── install-vagrant.html.md.erb │ │ │ ├── launch-aws.html.md.erb │ │ │ ├── supervisedlearning.html.md │ │ │ └── tapster.html.md │ │ ├── batchpredict/ │ │ │ └── index.html.md │ │ ├── cli/ │ │ │ └── index.html.md │ │ ├── community/ │ │ │ ├── contribute-code.html.md │ │ │ ├── contribute-documentation.html.md │ │ │ ├── 
contribute-sdk.html.md │ │ │ ├── contribute-webhook.html.md │ │ │ ├── index.html.md │ │ │ ├── projects.html.md │ │ │ └── submit-template.html.md │ │ ├── customize/ │ │ │ ├── dase.html.md.erb │ │ │ ├── index.html.md │ │ │ └── troubleshooting.html.md │ │ ├── datacollection/ │ │ │ ├── analytics-ipynb.html.md.erb │ │ │ ├── analytics-tableau.html.md.erb │ │ │ ├── analytics-zeppelin.html.md.erb │ │ │ ├── analytics.html.md │ │ │ ├── batchimport.html.md │ │ │ ├── channel.html.md.erb │ │ │ ├── eventapi.html.md │ │ │ ├── eventmodel.html.md.erb │ │ │ ├── index.html.md │ │ │ ├── plugin.html.md │ │ │ └── webhooks.html.md.erb │ │ ├── demo/ │ │ │ ├── index.html.md.erb │ │ │ └── textclassification.html.md.erb │ │ ├── deploy/ │ │ │ ├── engineparams.html.md │ │ │ ├── enginevariants.html.md │ │ │ ├── index.html.md │ │ │ ├── monitoring.html.md │ │ │ └── plugin.html.md │ │ ├── evaluation/ │ │ │ ├── evaluationdashboard.html.md │ │ │ ├── history.html.md │ │ │ ├── index.html.md │ │ │ ├── metricbuild.html.md │ │ │ ├── metricchoose.html.md │ │ │ └── paramtuning.html.md │ │ ├── gallery/ │ │ │ └── templates.yaml │ │ ├── github.html │ │ ├── index.html.md.erb │ │ ├── install/ │ │ │ ├── index.html.md.erb │ │ │ ├── install-docker.html.md.erb │ │ │ ├── install-sourcecode.html.md.erb │ │ │ └── sdk.html.md │ │ ├── javascripts/ │ │ │ ├── application.js │ │ │ └── tryit.js │ │ ├── layouts/ │ │ │ ├── layout.html.slim │ │ │ └── tryit.html.slim │ │ ├── machinelearning/ │ │ │ ├── dimensionalityreduction.html.md │ │ │ └── modelingworkflow.html.md │ │ ├── partials/ │ │ │ ├── _action_call.html.slim │ │ │ ├── _edit_page.html.slim │ │ │ ├── _footer.html.slim │ │ │ ├── _header.html.slim │ │ │ ├── _search_bar.html.slim │ │ │ ├── _segment.html.slim │ │ │ ├── _swiftype.html.slim │ │ │ ├── _table_of_content.html.slim │ │ │ ├── head/ │ │ │ │ ├── _base.html.slim │ │ │ │ ├── _favicon.html.slim │ │ │ │ ├── _javascripts.html.slim │ │ │ │ ├── _meta.html.slim │ │ │ │ └── _stylesheets.html.slim │ │ │ ├── nav/ │ │ │ │ ├── 
_breadcrumbs.html.slim │ │ │ │ ├── _header.html.slim │ │ │ │ ├── _main.html.slim │ │ │ │ ├── _node.html.slim │ │ │ │ ├── _page.html.slim │ │ │ │ └── _swiftype.html.slim │ │ │ └── shared/ │ │ │ ├── dase/ │ │ │ │ └── _dase.html.md.erb │ │ │ ├── datacollection/ │ │ │ │ └── _parquet.html.md.erb │ │ │ ├── install/ │ │ │ │ ├── _dependent_services.html.erb │ │ │ │ ├── _elasticsearch.html.erb │ │ │ │ ├── _hbase.html.erb │ │ │ │ ├── _postgres.html.erb │ │ │ │ ├── _proceed_template.html.md.erb │ │ │ │ └── _spark.html.erb │ │ │ └── quickstart/ │ │ │ ├── _collect_data.html.md.erb │ │ │ ├── _create_app.html.md.erb │ │ │ ├── _create_engine.html.md.erb │ │ │ ├── _deploy.html.md.erb │ │ │ ├── _deploy_enginejson.html.md.erb │ │ │ ├── _import_sample_data.html.md.erb │ │ │ ├── _install.html.md.erb │ │ │ ├── _install_python_sdk.html.md.erb │ │ │ ├── _install_sdk.html.md.erb │ │ │ ├── _production.html.md.erb │ │ │ ├── _query_eventserver.html.md.erb │ │ │ └── _query_eventserver_short.html.md.erb │ │ ├── production/ │ │ │ └── deploy-cloudformation.html.md │ │ ├── resources/ │ │ │ ├── faq.html.md │ │ │ ├── glossary.html.md │ │ │ ├── intellij.html.md.erb │ │ │ ├── release.html.md │ │ │ └── upgrade.html.md │ │ ├── robots.txt │ │ ├── samples/ │ │ │ ├── index.html.md │ │ │ ├── languages.html.md │ │ │ ├── level-1.html.md │ │ │ ├── level-2-1.html.md │ │ │ ├── level-2-2.html.md │ │ │ ├── level-2.html.md │ │ │ ├── level-3-1.html.md │ │ │ ├── level-3-2.html.md │ │ │ ├── level-3.html.md │ │ │ ├── level-4-1.html.md │ │ │ ├── level-4-2.html.md │ │ │ ├── level-4-3.html.md │ │ │ ├── level-4.html.md │ │ │ ├── narrow.html.md │ │ │ ├── sizing.html.md │ │ │ └── tabs.html.md │ │ ├── sdk/ │ │ │ ├── index.html.md │ │ │ ├── java.html.md.erb │ │ │ ├── php.html.md.erb │ │ │ ├── python.html.md.erb │ │ │ └── ruby.html.md.erb │ │ ├── search/ │ │ │ └── index.html.md │ │ ├── start/ │ │ │ ├── customize.html.md │ │ │ ├── deploy.html.md.erb │ │ │ ├── download.html.md │ │ │ └── index.html.md │ │ ├── stylesheets/ │ │ │ 
├── application.css.scss │ │ │ ├── mixins/ │ │ │ │ └── _all.css.scss │ │ │ ├── partials/ │ │ │ │ ├── _action_call.css.scss │ │ │ │ ├── _alerts.css.scss │ │ │ │ ├── _breadcrumbs.css.scss │ │ │ │ ├── _buttons.css.scss │ │ │ │ ├── _classes.css.scss │ │ │ │ ├── _code.css.scss │ │ │ │ ├── _content.css.scss │ │ │ │ ├── _copyright.css.scss │ │ │ │ ├── _edit_page.css.scss │ │ │ │ ├── _footer.css.scss │ │ │ │ ├── _global.css.scss │ │ │ │ ├── _hacks.css.scss │ │ │ │ ├── _header.css.scss │ │ │ │ ├── _hybird_vim_highlight.css.scss │ │ │ │ ├── _jcarousel.css.scss │ │ │ │ ├── _layout.css.scss │ │ │ │ ├── _modules.css.scss │ │ │ │ ├── _off_canvas.css.scss │ │ │ │ ├── _page_title.css.scss │ │ │ │ ├── _responsive.css.scss │ │ │ │ ├── _search_bar_row.css.scss │ │ │ │ ├── _subscribe_form.css.scss │ │ │ │ ├── _table_of_contents.css.scss │ │ │ │ ├── _tables.css.scss │ │ │ │ ├── _tabs.css.scss │ │ │ │ ├── _tags.css.scss │ │ │ │ ├── _tryit.css.scss │ │ │ │ └── nav/ │ │ │ │ ├── _header.css.scss │ │ │ │ ├── _main.css.scss │ │ │ │ ├── _page.css.scss │ │ │ │ └── _swiftype.css.scss │ │ │ └── variables/ │ │ │ ├── _colors.css.scss │ │ │ ├── _fonts.css.scss │ │ │ └── _sizes.css.scss │ │ ├── support/ │ │ │ └── index.html.md.erb │ │ ├── system/ │ │ │ ├── anotherdatastore.html.md │ │ │ ├── deploy-cloudformation.html.md.erb │ │ │ └── index.html.md │ │ ├── templates/ │ │ │ ├── classification/ │ │ │ │ ├── add-algorithm.html.md │ │ │ │ ├── dase.html.md.erb │ │ │ │ ├── how-to.html.md │ │ │ │ ├── quickstart.html.md.erb │ │ │ │ └── reading-custom-properties.html.md │ │ │ ├── complementarypurchase/ │ │ │ │ ├── dase.html.md.erb │ │ │ │ └── quickstart.html.md.erb │ │ │ ├── ecommercerecommendation/ │ │ │ │ ├── adjust-score.html.md.erb │ │ │ │ ├── dase.html.md.erb │ │ │ │ ├── how-to.html.md │ │ │ │ ├── quickstart.html.md.erb │ │ │ │ └── train-with-rate-event.html.md.erb │ │ │ ├── index.html.md │ │ │ ├── javaecommercerecommendation/ │ │ │ │ ├── dase.html.md.erb │ │ │ │ └── quickstart.html.md.erb │ │ │ ├── 
leadscoring/ │ │ │ │ ├── dase.html.md.erb │ │ │ │ └── quickstart.html.md.erb │ │ │ ├── productranking/ │ │ │ │ ├── dase.html.md.erb │ │ │ │ └── quickstart.html.md.erb │ │ │ ├── recommendation/ │ │ │ │ ├── batch-evaluator.html.md │ │ │ │ ├── blacklist-items.html.md │ │ │ │ ├── customize-data-prep.html.md │ │ │ │ ├── customize-serving.html.md │ │ │ │ ├── dase.html.md.erb │ │ │ │ ├── evaluation.html.md.erb │ │ │ │ ├── how-to.html.md │ │ │ │ ├── quickstart.html.md.erb │ │ │ │ ├── reading-custom-events.html.md │ │ │ │ └── training-with-implicit-preference.html.md │ │ │ ├── similarproduct/ │ │ │ │ ├── dase.html.md.erb │ │ │ │ ├── how-to.html.md │ │ │ │ ├── multi-events-multi-algos.html.md.erb │ │ │ │ ├── quickstart.html.md.erb │ │ │ │ ├── recommended-user.html.md.erb │ │ │ │ ├── return-item-properties.html.md.erb │ │ │ │ ├── rid-user-set-event.html.md.erb │ │ │ │ └── train-with-rate-event.html.md.erb │ │ │ └── vanilla/ │ │ │ ├── dase.html.md.erb │ │ │ └── quickstart.html.md.erb │ │ └── tryit/ │ │ └── index.html.slim │ └── scaladoc/ │ ├── README.md │ ├── api-docs.css │ ├── api-docs.js │ └── rootdoc.txt ├── e2/ │ ├── build.sbt │ └── src/ │ ├── main/ │ │ └── scala/ │ │ └── org/ │ │ └── apache/ │ │ └── predictionio/ │ │ ├── e2/ │ │ │ ├── engine/ │ │ │ │ ├── BinaryVectorizer.scala │ │ │ │ ├── CategoricalNaiveBayes.scala │ │ │ │ ├── MarkovChain.scala │ │ │ │ └── PythonEngine.scala │ │ │ ├── evaluation/ │ │ │ │ └── CrossValidation.scala │ │ │ └── package.scala │ │ └── package.scala │ └── test/ │ └── scala/ │ └── org/ │ └── apache/ │ └── predictionio/ │ └── e2/ │ ├── engine/ │ │ ├── BinaryVectorizerTest.scala │ │ ├── CategoricalNaiveBayesTest.scala │ │ └── MarkovChainTest.scala │ ├── evaluation/ │ │ └── CrossValidationTest.scala │ └── fixture/ │ ├── BinaryVectorizerFixture.scala │ ├── MarkovChainFixture.scala │ ├── NaiveBayesFixture.scala │ └── SharedSparkContext.scala ├── examples/ │ ├── redeploy-script/ │ │ ├── local.sh.template │ │ └── redeploy.sh │ ├── 
scala-parallel-classification/ │ │ ├── README.md │ │ ├── add-algorithm/ │ │ │ ├── .gitignore │ │ │ ├── build.sbt │ │ │ ├── data/ │ │ │ │ ├── data.txt │ │ │ │ └── import_eventserver.py │ │ │ ├── engine.json │ │ │ ├── project/ │ │ │ │ ├── assembly.sbt │ │ │ │ └── build.properties │ │ │ ├── src/ │ │ │ │ └── main/ │ │ │ │ └── scala/ │ │ │ │ ├── CompleteEvaluation.scala │ │ │ │ ├── DataSource.scala │ │ │ │ ├── Engine.scala │ │ │ │ ├── Evaluation.scala │ │ │ │ ├── NaiveBayesAlgorithm.scala │ │ │ │ ├── PrecisionEvaluation.scala │ │ │ │ ├── Preparator.scala │ │ │ │ ├── RandomForestAlgorithm.scala │ │ │ │ └── Serving.scala │ │ │ └── template.json │ │ └── reading-custom-properties/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ ├── data/ │ │ │ ├── data.txt │ │ │ └── import_eventserver.py │ │ ├── engine.json │ │ ├── project/ │ │ │ ├── assembly.sbt │ │ │ └── build.properties │ │ ├── src/ │ │ │ └── main/ │ │ │ └── scala/ │ │ │ ├── CompleteEvaluation.scala │ │ │ ├── DataSource.scala │ │ │ ├── Engine.scala │ │ │ ├── Evaluation.scala │ │ │ ├── NaiveBayesAlgorithm.scala │ │ │ ├── PrecisionEvaluation.scala │ │ │ ├── Preparator.scala │ │ │ └── Serving.scala │ │ └── template.json │ ├── scala-parallel-ecommercerecommendation/ │ │ ├── README.md │ │ ├── adjust-score/ │ │ │ ├── .gitignore │ │ │ ├── build.sbt │ │ │ ├── data/ │ │ │ │ ├── import_eventserver.py │ │ │ │ └── send_query.py │ │ │ ├── engine.json │ │ │ ├── project/ │ │ │ │ ├── assembly.sbt │ │ │ │ └── build.properties │ │ │ ├── src/ │ │ │ │ └── main/ │ │ │ │ └── scala/ │ │ │ │ ├── DataSource.scala │ │ │ │ ├── ECommAlgorithm.scala │ │ │ │ ├── Engine.scala │ │ │ │ ├── Preparator.scala │ │ │ │ └── Serving.scala │ │ │ └── template.json │ │ └── train-with-rate-event/ │ │ ├── build.sbt │ │ ├── data/ │ │ │ ├── import_eventserver.py │ │ │ └── send_query.py │ │ ├── engine.json │ │ ├── project/ │ │ │ ├── assembly.sbt │ │ │ └── build.properties │ │ ├── src/ │ │ │ └── main/ │ │ │ └── scala/ │ │ │ ├── DataSource.scala │ │ │ ├── ECommAlgorithm.scala 
│ │ │ ├── Engine.scala │ │ │ ├── Preparator.scala │ │ │ └── Serving.scala │ │ └── template.json │ ├── scala-parallel-recommendation/ │ │ ├── README.md │ │ ├── blacklist-items/ │ │ │ ├── build.sbt │ │ │ ├── data/ │ │ │ │ ├── import_eventserver.py │ │ │ │ └── send_query.py │ │ │ ├── engine.json │ │ │ ├── project/ │ │ │ │ ├── assembly.sbt │ │ │ │ └── build.properties │ │ │ ├── src/ │ │ │ │ └── main/ │ │ │ │ └── scala/ │ │ │ │ ├── ALSAlgorithm.scala │ │ │ │ ├── ALSModel.scala │ │ │ │ ├── DataSource.scala │ │ │ │ ├── Engine.scala │ │ │ │ ├── Evaluation.scala │ │ │ │ ├── Preparator.scala │ │ │ │ └── Serving.scala │ │ │ └── template.json │ │ ├── customize-data-prep/ │ │ │ ├── .gitignore │ │ │ ├── build.sbt │ │ │ ├── data/ │ │ │ │ ├── import_eventserver.py │ │ │ │ ├── sample_not_train_data.txt │ │ │ │ └── send_query.py │ │ │ ├── engine.json │ │ │ ├── project/ │ │ │ │ ├── assembly.sbt │ │ │ │ └── build.properties │ │ │ ├── src/ │ │ │ │ └── main/ │ │ │ │ └── scala/ │ │ │ │ ├── ALSAlgorithm.scala │ │ │ │ ├── ALSModel.scala │ │ │ │ ├── DataSource.scala │ │ │ │ ├── Engine.scala │ │ │ │ ├── Evaluation.scala │ │ │ │ ├── Preparator.scala │ │ │ │ └── Serving.scala │ │ │ └── template.json │ │ ├── customize-serving/ │ │ │ ├── .gitignore │ │ │ ├── build.sbt │ │ │ ├── data/ │ │ │ │ ├── import_eventserver.py │ │ │ │ ├── sample_disabled_items.txt │ │ │ │ └── send_query.py │ │ │ ├── engine.json │ │ │ ├── project/ │ │ │ │ ├── assembly.sbt │ │ │ │ └── build.properties │ │ │ ├── src/ │ │ │ │ └── main/ │ │ │ │ └── scala/ │ │ │ │ ├── ALSAlgorithm.scala │ │ │ │ ├── ALSModel.scala │ │ │ │ ├── DataSource.scala │ │ │ │ ├── Engine.scala │ │ │ │ ├── Evaluation.scala │ │ │ │ ├── Preparator.scala │ │ │ │ └── Serving.scala │ │ │ └── template.json │ │ ├── reading-custom-events/ │ │ │ ├── .gitignore │ │ │ ├── build.sbt │ │ │ ├── data/ │ │ │ │ ├── import_eventserver.py │ │ │ │ └── send_query.py │ │ │ ├── engine.json │ │ │ ├── project/ │ │ │ │ ├── assembly.sbt │ │ │ │ └── build.properties │ │ │ ├── src/ │ 
│ │ │ └── main/ │ │ │ │ └── scala/ │ │ │ │ ├── ALSAlgorithm.scala │ │ │ │ ├── ALSModel.scala │ │ │ │ ├── DataSource.scala │ │ │ │ ├── Engine.scala │ │ │ │ ├── Evaluation.scala │ │ │ │ ├── Preparator.scala │ │ │ │ └── Serving.scala │ │ │ └── template.json │ │ └── train-with-view-event/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ ├── data/ │ │ │ ├── import_eventserver.py │ │ │ └── send_query.py │ │ ├── engine.json │ │ ├── project/ │ │ │ ├── assembly.sbt │ │ │ └── build.properties │ │ ├── src/ │ │ │ └── main/ │ │ │ └── scala/ │ │ │ ├── ALSAlgorithm.scala │ │ │ ├── ALSModel.scala │ │ │ ├── DataSource.scala │ │ │ ├── Engine.scala │ │ │ ├── Evaluation.scala │ │ │ ├── Preparator.scala │ │ │ └── Serving.scala │ │ └── template.json │ └── scala-parallel-similarproduct/ │ ├── README.md │ ├── multi-events-multi-algos/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ ├── data/ │ │ │ ├── import_eventserver.py │ │ │ └── send_query.py │ │ ├── engine-cooccurrence.json │ │ ├── engine.json │ │ ├── project/ │ │ │ ├── assembly.sbt │ │ │ └── build.properties │ │ ├── src/ │ │ │ └── main/ │ │ │ └── scala/ │ │ │ ├── ALSAlgorithm.scala │ │ │ ├── CooccurrenceAlgorithm.scala │ │ │ ├── DataSource.scala │ │ │ ├── Engine.scala │ │ │ ├── LikeAlgorithm.scala │ │ │ ├── Preparator.scala │ │ │ └── Serving.scala │ │ └── template.json │ ├── recommended-user/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ ├── data/ │ │ │ ├── import_eventserver.py │ │ │ └── send_query.py │ │ ├── engine.json │ │ ├── project/ │ │ │ ├── assembly.sbt │ │ │ └── build.properties │ │ ├── src/ │ │ │ └── main/ │ │ │ └── scala/ │ │ │ ├── ALSAlgorithm.scala │ │ │ ├── DataSource.scala │ │ │ ├── Engine.scala │ │ │ ├── Preparator.scala │ │ │ └── Serving.scala │ │ └── template.json │ ├── return-item-properties/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ ├── data/ │ │ │ ├── import_eventserver.py │ │ │ └── send_query.py │ │ ├── engine-cooccurrence.json │ │ ├── engine.json │ │ ├── project/ │ │ │ ├── assembly.sbt │ │ │ └── build.properties │ │ ├── src/ │ │ │ 
└── main/ │ │ │ └── scala/ │ │ │ ├── ALSAlgorithm.scala │ │ │ ├── CooccurrenceAlgorithm.scala │ │ │ ├── DataSource.scala │ │ │ ├── Engine.scala │ │ │ ├── Preparator.scala │ │ │ └── Serving.scala │ │ └── template.json │ ├── rid-user-set-event/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ ├── data/ │ │ │ ├── import_eventserver.py │ │ │ └── send_query.py │ │ ├── engine-cooccurrence.json │ │ ├── engine.json │ │ ├── project/ │ │ │ ├── assembly.sbt │ │ │ └── build.properties │ │ ├── src/ │ │ │ └── main/ │ │ │ └── scala/ │ │ │ ├── ALSAlgorithm.scala │ │ │ ├── CooccurrenceAlgorithm.scala │ │ │ ├── DataSource.scala │ │ │ ├── Engine.scala │ │ │ ├── Preparator.scala │ │ │ └── Serving.scala │ │ └── template.json │ └── train-with-rate-event/ │ ├── build.sbt │ ├── data/ │ │ ├── import_eventserver.py │ │ └── send_query.py │ ├── engine-cooccurrence.json │ ├── engine.json │ ├── project/ │ │ ├── assembly.sbt │ │ └── build.properties │ ├── src/ │ │ └── main/ │ │ └── scala/ │ │ ├── ALSAlgorithm.scala │ │ ├── CooccurrenceAlgorithm.scala │ │ ├── DataSource.scala │ │ ├── Engine.scala │ │ ├── Preparator.scala │ │ └── Serving.scala │ └── template.json ├── make-distribution.sh ├── project/ │ ├── PIOBuild.scala │ ├── assembly.sbt │ ├── build.properties │ ├── plugins.sbt │ └── unidoc.sbt ├── python/ │ └── pypio/ │ ├── __init__.py │ ├── data/ │ │ ├── __init__.py │ │ └── eventstore.py │ ├── pypio.py │ ├── utils.py │ └── workflow/ │ ├── __init__.py │ └── cleanup_functions.py ├── sbt/ │ └── sbt ├── scalastyle-config.xml ├── storage/ │ ├── elasticsearch/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ └── src/ │ │ ├── main/ │ │ │ └── scala/ │ │ │ └── org/ │ │ │ └── apache/ │ │ │ └── predictionio/ │ │ │ └── data/ │ │ │ └── storage/ │ │ │ └── elasticsearch/ │ │ │ ├── ESAccessKeys.scala │ │ │ ├── ESApps.scala │ │ │ ├── ESChannels.scala │ │ │ ├── ESEngineInstances.scala │ │ │ ├── ESEvaluationInstances.scala │ │ │ ├── ESEventsUtil.scala │ │ │ ├── ESLEvents.scala │ │ │ ├── ESPEvents.scala │ │ │ ├── 
ESSequences.scala │ │ │ ├── ESUtils.scala │ │ │ ├── StorageClient.scala │ │ │ └── package.scala │ │ └── test/ │ │ ├── resources/ │ │ │ └── application.conf │ │ └── scala/ │ │ └── org/ │ │ └── apache/ │ │ └── predictionio/ │ │ └── data/ │ │ └── storage/ │ │ └── elasticsearch/ │ │ ├── StorageClientSpec.scala │ │ └── StorageTestUtils.scala │ ├── hbase/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ └── src/ │ │ ├── main/ │ │ │ └── scala/ │ │ │ └── org/ │ │ │ └── apache/ │ │ │ └── predictionio/ │ │ │ └── data/ │ │ │ └── storage/ │ │ │ └── hbase/ │ │ │ ├── HBEventsUtil.scala │ │ │ ├── HBLEvents.scala │ │ │ ├── HBPEvents.scala │ │ │ ├── PIOHBaseUtil.scala │ │ │ ├── StorageClient.scala │ │ │ └── package.scala │ │ └── test/ │ │ ├── resources/ │ │ │ └── application.conf │ │ └── scala/ │ │ └── org/ │ │ └── apache/ │ │ └── predictionio/ │ │ └── data/ │ │ └── storage/ │ │ └── hbase/ │ │ ├── LEventsSpec.scala │ │ ├── PEventsSpec.scala │ │ ├── StorageTestUtils.scala │ │ └── TestEvents.scala │ ├── hdfs/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ ├── project/ │ │ │ └── build.properties │ │ └── src/ │ │ ├── main/ │ │ │ └── scala/ │ │ │ └── org/ │ │ │ └── apache/ │ │ │ └── predictionio/ │ │ │ └── data/ │ │ │ └── storage/ │ │ │ └── hdfs/ │ │ │ ├── HDFSModels.scala │ │ │ ├── StorageClient.scala │ │ │ └── package.scala │ │ └── test/ │ │ └── resources/ │ │ └── application.conf │ ├── jdbc/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ └── src/ │ │ ├── main/ │ │ │ └── scala/ │ │ │ └── org/ │ │ │ └── apache/ │ │ │ └── predictionio/ │ │ │ └── data/ │ │ │ └── storage/ │ │ │ └── jdbc/ │ │ │ ├── JDBCAccessKeys.scala │ │ │ ├── JDBCApps.scala │ │ │ ├── JDBCChannels.scala │ │ │ ├── JDBCEngineInstances.scala │ │ │ ├── JDBCEvaluationInstances.scala │ │ │ ├── JDBCLEvents.scala │ │ │ ├── JDBCModels.scala │ │ │ ├── JDBCPEvents.scala │ │ │ ├── JDBCUtils.scala │ │ │ ├── StorageClient.scala │ │ │ └── package.scala │ │ └── test/ │ │ ├── resources/ │ │ │ └── application.conf │ │ └── scala/ │ │ └── org/ │ │ └── apache/ 
│ │ └── predictionio/ │ │ └── data/ │ │ └── storage/ │ │ └── jdbc/ │ │ ├── JDBCUtilsSpec.scala │ │ ├── LEventsSpec.scala │ │ ├── PEventsSpec.scala │ │ ├── StorageTestUtils.scala │ │ └── TestEvents.scala │ ├── localfs/ │ │ ├── .gitignore │ │ ├── build.sbt │ │ └── src/ │ │ ├── main/ │ │ │ └── scala/ │ │ │ └── org/ │ │ │ └── apache/ │ │ │ └── predictionio/ │ │ │ └── data/ │ │ │ └── storage/ │ │ │ └── localfs/ │ │ │ ├── LocalFSModels.scala │ │ │ ├── StorageClient.scala │ │ │ └── package.scala │ │ └── test/ │ │ └── resources/ │ │ └── application.conf │ └── s3/ │ ├── .gitignore │ ├── build.sbt │ └── src/ │ └── main/ │ └── scala/ │ └── org/ │ └── apache/ │ └── predictionio/ │ └── data/ │ └── storage/ │ └── s3/ │ ├── S3Models.scala │ ├── StorageClient.scala │ └── package.scala ├── tests/ │ ├── .rat-excludes │ ├── Dockerfile │ ├── Dockerfile.base │ ├── README.md │ ├── after_script.travis.sh │ ├── before_script.travis.sh │ ├── build_docker.sh │ ├── check_libraries.sh │ ├── check_license.sh │ ├── docker-compose.yml │ ├── docker-files/ │ │ ├── awscredentials │ │ ├── env-conf/ │ │ │ ├── hbase-site.xml │ │ │ └── pio-env.sh │ │ ├── init.sh │ │ ├── pgpass │ │ └── set_build_profile.sh │ ├── pio_tests/ │ │ ├── README.md │ │ ├── __init__.py │ │ ├── data/ │ │ │ ├── eventserver_test/ │ │ │ │ ├── partially_malformed_events.json │ │ │ │ ├── rate_events_25.json │ │ │ │ └── signup_events_51.json │ │ │ └── quickstart_test/ │ │ │ └── engine.json │ │ ├── engines/ │ │ │ └── recommendation-engine/ │ │ │ ├── README.md │ │ │ ├── build.sbt │ │ │ ├── engine.json │ │ │ ├── project/ │ │ │ │ └── assembly.sbt │ │ │ ├── src/ │ │ │ │ └── main/ │ │ │ │ └── scala/ │ │ │ │ ├── ALSAlgorithm.scala │ │ │ │ ├── ALSModel.scala │ │ │ │ ├── DataSource.scala │ │ │ │ ├── Engine.scala │ │ │ │ ├── Evaluation.scala │ │ │ │ ├── Preparator.scala │ │ │ │ └── Serving.scala │ │ │ └── template.json │ │ ├── globals.py │ │ ├── integration.py │ │ ├── scenarios/ │ │ │ ├── __init__.py │ │ │ ├── basic_app_usecases.py │ │ │ ├── 
eventserver_test.py │ │ │ └── quickstart_test.py │ │ ├── tests.py │ │ └── utils.py │ ├── run_docker.sh │ ├── script.travis.sh │ └── unit.sh └── tools/ ├── build.sbt └── src/ ├── main/ │ ├── scala/ │ │ └── org/ │ │ └── apache/ │ │ └── predictionio/ │ │ └── tools/ │ │ ├── Common.scala │ │ ├── RunBatchPredict.scala │ │ ├── RunServer.scala │ │ ├── RunWorkflow.scala │ │ ├── Runner.scala │ │ ├── admin/ │ │ │ ├── AdminAPI.scala │ │ │ ├── CommandClient.scala │ │ │ └── README.md │ │ ├── commands/ │ │ │ ├── AccessKey.scala │ │ │ ├── App.scala │ │ │ ├── Engine.scala │ │ │ ├── Export.scala │ │ │ ├── Import.scala │ │ │ ├── Management.scala │ │ │ └── Template.scala │ │ ├── console/ │ │ │ ├── Console.scala │ │ │ └── Pio.scala │ │ ├── dashboard/ │ │ │ ├── CorsSupport.scala │ │ │ └── Dashboard.scala │ │ ├── export/ │ │ │ └── EventsToFile.scala │ │ └── imprt/ │ │ └── FileToEvents.scala │ └── twirl/ │ └── org/ │ └── apache/ │ └── predictionio/ │ └── tools/ │ ├── console/ │ │ ├── accesskey.scala.txt │ │ ├── adminserver.scala.txt │ │ ├── app.scala.txt │ │ ├── batchpredict.scala.txt │ │ ├── build.scala.txt │ │ ├── dashboard.scala.txt │ │ ├── deploy.scala.txt │ │ ├── eval.scala.txt │ │ ├── eventserver.scala.txt │ │ ├── export.scala.txt │ │ ├── imprt.scala.txt │ │ ├── main.scala.txt │ │ ├── run.scala.txt │ │ ├── status.scala.txt │ │ ├── template.scala.txt │ │ ├── train.scala.txt │ │ ├── upgrade.scala.txt │ │ └── version.scala.txt │ ├── dashboard/ │ │ └── index.scala.html │ └── templates/ │ ├── itemrank/ │ │ └── params/ │ │ ├── algorithmsJson.scala.txt │ │ ├── datasourceJson.scala.txt │ │ ├── preparatorJson.scala.txt │ │ └── servingJson.scala.txt │ ├── itemrec/ │ │ └── params/ │ │ ├── algorithmsJson.scala.txt │ │ ├── datasourceJson.scala.txt │ │ ├── preparatorJson.scala.txt │ │ └── servingJson.scala.txt │ ├── itemsim/ │ │ └── params/ │ │ ├── algorithmsJson.scala.txt │ │ ├── datasourceJson.scala.txt │ │ ├── preparatorJson.scala.txt │ │ └── servingJson.scala.txt │ └── scala/ │ ├── 
buildSbt.scala.txt │ ├── engineJson.scala.txt │ ├── manifestJson.scala.txt │ ├── project/ │ │ └── assemblySbt.scala.txt │ └── src/ │ └── main/ │ └── scala/ │ └── engine.scala.txt └── test/ └── scala/ └── org/ └── apache/ └── predictionio/ └── tools/ ├── RunnerSpec.scala └── admin/ └── AdminAPISpec.scala ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitattributes ================================================ .travis.yml merge=ours ================================================ FILE: .gitignore ================================================ .DS_Store .pio_store *.swp *.swo conf/pio-env.sh target/ sbt/sbt-launch-*.jar ..sxr/ *.class core/data *.orig examples/data/ml-* fs/ supervisord.conf /dist pio.log *.tar.gz *.pyc # Ignore source files whose name prefixed with "Private" Private*.scala quickstartapp/ # Eclipse .project .classpath .settings/ # IntelliJ *.iml .idea/ .templates-cache /vendors /docs/manual/source/gallery/template-gallery.html.md test-reports/ apache-rat-0.11.jar tests/dist tests/docker-files/*.jar tests/docker-files/*.tgz assembly/*.jar assembly/src/universal/ ================================================ FILE: .travis.yml ================================================ ########## # This is .travis.yml configuration file specifically for master and develop branch. # The travis job should contains only unit and integration tests. # # To avoid this file from being overwritten by .travis.yml from other branches, # please add the following to your local git config: # git config merge.ours.driver true ########## # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. 
# The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # branches: except: - livedoc language: scala jdk: - openjdk8 services: - docker sudo: required cache: directories: - $HOME/.ivy2/cache - $HOME/.sbt/boot - $HOME/.sbt/launchers env: matrix: - BUILD_TYPE=Unit METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=ELASTICSEARCH MODELDATA_REP=S3 PIO_ELASTICSEARCH_VERSION=6.8.1 - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=HBASE MODELDATA_REP=LOCALFS PIO_HBASE_VERSION=1.2.6 - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL PIO_SCALA_VERSION=2.11.12 PIO_SPARK_VERSION=2.0.2 PIO_HADOOP_VERSION=2.6.5 - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL PIO_SCALA_VERSION=2.11.12 PIO_SPARK_VERSION=2.1.3 PIO_HADOOP_VERSION=2.6.5 - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL PIO_SCALA_VERSION=2.11.12 PIO_SPARK_VERSION=2.2.3 PIO_HADOOP_VERSION=2.6.5 - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS PIO_SCALA_VERSION=2.11.12 PIO_SPARK_VERSION=2.3.3 PIO_HADOOP_VERSION=2.6.5 - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL PIO_SCALA_VERSION=2.11.12 PIO_SPARK_VERSION=2.0.2 PIO_HADOOP_VERSION=2.7.7 - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL PIO_SCALA_VERSION=2.11.12 
PIO_SPARK_VERSION=2.1.3 PIO_HADOOP_VERSION=2.7.7 - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL PIO_SCALA_VERSION=2.11.12 PIO_SPARK_VERSION=2.2.3 PIO_HADOOP_VERSION=2.7.7 - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS PIO_SCALA_VERSION=2.11.12 PIO_SPARK_VERSION=2.4.0 PIO_HADOOP_VERSION=2.7.7 - BUILD_TYPE=LicenseCheck before_install: - unset SBT_OPTS JVM_OPTS - sudo rm /usr/local/bin/docker-compose - travis_retry curl -L https://github.com/docker/compose/releases/download/1.11.1/docker-compose-`uname -s`-`uname -m` > docker-compose - chmod +x docker-compose - sudo mv docker-compose /usr/local/bin before_script: - sudo sysctl -w vm.max_map_count=262144 - docker-compose -v - travis_retry ./tests/before_script.travis.sh script: - travis_retry ./tests/script.travis.sh after_script: - ./tests/after_script.travis.sh ================================================ FILE: CONTRIBUTING.md ================================================ Thank you for your interest in contributing to Apache PredictionIO. Our mission is to enable developers to build scalable machine learning applications easily. Here is how you can help with the project development. If you have any questions regarding development at any time, please feel free to subscribe and post to the Development Mailing List . For code contributions, please follow the guidelines at http://predictionio.apache.org/community/contribute-code/. For documentation contributions, please follow the guidelines at http://predictionio.apache.org/community/contribute-documentation/. ================================================ FILE: Dockerfile ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # WARNING: THIS DOCKERFILE IS NOT INTENDED FOR PRODUCTION USE OR DEPLOYMENT. AT # THIS POINT, THIS IS ONLY INTENDED FOR USE IN AUTOMATED TESTS. IF YOU # ARE LOOKING TO DEPLOY PREDICTIONIO WITH DOCKER, PLEASE REFER TO # http://predictionio.apache.org/community/projects/#docker-installation-for-predictionio FROM predictionio/pio-testing-base # Include the entire code tree ENV PIO_HOME /PredictionIO ENV PATH ${PIO_HOME}/bin/:${PATH} ADD . ${PIO_HOME} ================================================ FILE: KEYS ================================================ This file contains the PGP keys of various developers. Please don't use them for email unless you have to. Their main purpose is code signing. Users: pgp < KEYS gpg --import KEYS Developers: pgp -kxa and append it to this file. (pgpk -ll && pgpk -xa ) >> this file. (gpg --list-sigs && gpg --armor --export ) >> this file. 
-------------------------------------------------------------------------------------------- pub 4096R/D3541808 2014-01-09 uid [ultimate] Suneel Marthi (CODE SIGNING KEY) sig 3 D3541808 2014-01-09 Suneel Marthi (CODE SIGNING KEY) sub 4096R/AF46E2DE 2014-01-09 sig D3541808 2014-01-09 Suneel Marthi (CODE SIGNING KEY) -----BEGIN PGP PUBLIC KEY BLOCK----- Comment: GPGTools - https://gpgtools.org mQINBFLPJmEBEAC9d/dUZCXeyhB0fVGmJAjdjXfLebav4VqGdNZC+M1T9C3dcVsh X/JGme5bjJeIgVwiH5UsdNceYn1+hyxs8jXuRAWEWKP76gD+pNrp8Az0ZdBkJoAy zCywOPtJV2PCOz7+S5ri2nUA2+1Kgcu6IlSLMmYAGO0IAmRrjBEzxy9iGaxiNGTc LvQt/iVtIXWkKKI8yvpoJ8iFf3TGhpjgaC/h7cJP3zpy0SScmhJJASLXRsfocLv9 sle6ndN9IPbDtRW8cL7Fk3VQlzp1ToVjmnQTyZZ6S1WafsjzCZ9hLN+k++o8VbvY v3icY6Sy0BKz0J6KwaxTkuZ6w1K7oUkVOQboKaWFIEdO+jwrEmU+Puyd8Np8jLnF Q0Y5GPfyMlqM3S/zaDm1t4D1eb5FLciStkxfg5wPVK6TkqB325KVD3aio5C7E7kt aQechHxaJXCQOtCtVY4X+L4iClnMSuk+hcSc8W8MYRTSVansItK0vI9eQZXMnpan w9/jk5rS4Gts1rHB7+kdjT3QRJmkyk6fEFT0fz5tfMC7N8waeEUhCaRW6lAoiqDW NW1h+0UGxJw+9YcGxBC0kkt3iofNOWQWmuf/BS3DHPKT7XV/YtBHe44wW0sF5L5P nfQUHpnA3pcZ0En6bXAvepKVZTNdOWWJqMyHV+436DA+33h45QL6lWb/GwARAQAB tDVTdW5lZWwgTWFydGhpIChDT0RFIFNJR05JTkcgS0VZKSA8c21hcnRoaUBhcGFj aGUub3JnPokCNwQTAQoAIQUCUs8mYQIbAwULCQgHAwUVCgkICwUWAgMBAAIeAQIX gAAKCRC08czE01QYCOKKEAChRtHBoYNTX+RZbFO0Kl1GlN+i1Ik0shEm5ZJ56XHv AnFx/gRK7CfZzJswWo7kf2s/dvJiFfs+rrolYVuO6E8gNhAaTEomSuvWQAMHdPcR 9G5APRKCSkbZYugElqplEbSphk78FKoFO+sml52M7Pr9jj88ApBjoFVVY8njdnNq 6DVlaDsg8YninCD78Z7PNFnRGwxyZ8Qd4Dh0rG+MUTfAWopZu6/MxpQxU7QpeVeX SIMLg7ClFrGfXnZcszYF4dnav1aa0i7W88PAdYNPko7tC5qz5yv2ep7t2gRbcYKf RXhYC2FHQey3wPhMKjA8V436lAqmfYnY/YdmhEy9Xq/1EdX1nHsQ7OEkfgXK14WM F+rnqXRAl/0cwiyb41eocdg5kpZFIKgCYT02usLWxwNnd3jOCe109Ze3y3acN/G8 +xOf9YRfNVAe6pD8H6ieRbv9gRjBmsbz9bXQCmxFnDqxNri5Me6gBAQPNmYTJD0h jgJTK6o0vJ0pwjBLauasJsLu+1tR3Cb0dxPE+JVaTF26FCd7pM7W6KdVfod9ZfrN cSyJ/cECc2KvYVGmTjQNVo1dYG0awBachlWnYNt+0Qx4opLsczZOLtPKtFY4BJA7 aZoXT4Qf9yB8km7x2/cgNExVbFummToJ/IP3M39/EaryspsQQuM5Qu5Q5lZp8Qnn 
ybkCDQRSzyZhARAA7bAawFzbJaghYnm6mTZyGG5hQmfAynbF6cPAE+g2SnXcNQjP 6kjYx3tSpb7rEzmjQqs46ztqdec6PIVBMhakON6z27Zz+IviAtO/TcaZHWNuCAjw FXVQZ+tYsSeiKInttfkrQc8jXAHWwSkSjLqNpvQpBdBEX80MYkFB6ZPOeON2+/Ta GC1H/HU2YngF0qQSmG33KKG6ezihBJdKxU6t2tsQfTlCmZW6R6MGpS9fVurYMKBk vR+7RGZ/H6dSjWPcpxhusGg92J9uz7r5SopN1wSdyPMUCMAFGeyoxcAuBDl38quU H/ENG3x5LDPq2aEH2AJ6yvZfIXbeJ1zmXf2cAHv+HbmvZaTSp0XIjq8Yxh8NkYEC ZdfRWmsGLIpU16TkBijpK3Dn9MDXjHGT3V8/qfdpURtMvIaL8WFrq9ejcy/vGRFn mCYqxIIPH+vLiMXKWtuMc61GN3ES21msKQH6IuQxxfQLyhK44L/pv7FpF4E+6LaE 8uRwAex5HIDpR1v4aJq089rRtye9VXTJJLZ7lYs0HctdZ30QbBRWT4jS9d9rj3cr HgQ7mIGO9TAfK2kWc6AJN/EvxPWNbOwptsTUzAF/adiy9ax8C18iw7nKczC+2eN6 UcbxXiPdytuKYK7O9A8S9e1w89GwpxYN7Xfn2o6QfpSbL9cLKiinOeV+xikAEQEA AYkCHwQYAQoACQUCUs8mYQIbDAAKCRC08czE01QYCG7yD/471dmyOD+go8cZkdqR 3CHhjH03odtI0EJNVy4VGEC0r9paz3BWYTy18LqWYkw3ygphOIU1r8/7QK3H5Ke3 c4yCSUxaMk5SlAJ+iVRek5TABkR8+zI+ZN5pQtqRH+ya5JxV4F/Sx5Q3KWMzpvgY n6AgSSc3hEfkgdI7SalIeyLaLDWv+RFdGZ5JU5gD28C0G8BeH8L62x6sixZcqoGT oy9rwkjs45/ZmmvBZhd1wLvC/au8l2Ecou6O8+8m26W8Z7vCuGKxuWn0KV3DLLWe 66uchDVlakGoMJSPIK06JWYUlE+gL0CW+U2ekt/v2qb8hGgMVET3CBAMq+bFWuJ6 juX7hJd7wHtCFfjnFDDAkdp2IIIZAlBW6FZGv7pJ82xsW6pSAg0A7VrV6nTtMtDv T8esOfo/t4t0gaL7bivy9DVVdATbUBcJJFpoVoe5MxiyjptveqPzIRwzt04n52Ph ordVWAnX5AokXWTg+Glem/EWEuf7jUuZArfqCSl/sZoQdXGTjR7G4iFscispji4+ kNjVQsItqFbgDpuc6n+GcFxlKQ7YMCnu5MVtTV01U4lFs0qy0NTUqsuR35DM4z14 DkFmj1upWAayCoXTpKzsHBvJZPC+Wqf9Pl3O47apelg7KxU3S011YfXpVPvCTKBv kD2o/5GKWS5QkSUEUXXY1oDiLg== =f8kJ -----END PGP PUBLIC KEY BLOCK----- pub 4096R/8BF4ABEB 2016-08-01 uid Donald Szeto (CODE SIGNING KEY) sig 3 8BF4ABEB 2016-08-01 Donald Szeto (CODE SIGNING KEY) sub 4096R/D8AB5D20 2016-08-01 sig 8BF4ABEB 2016-08-01 Donald Szeto (CODE SIGNING KEY) -----BEGIN PGP PUBLIC KEY BLOCK----- Version: GnuPG v1 mQINBFefie0BEAC8RvYKQJ7xOeqaBKAi+PpcRvLxvpO9G8HIXDiw/6GCO3/tBHJ8 Z2NMfGtFx351R+YpAd2KsiInU4iB25YoTeUqCrwR81zBnXPuNsKs6FXqSLlOZrYq O+a9wLkBY7bh6ABRc3OI3kGTpFMSqq8tlaJyLHvQIREHtQFckjSONMOjSnR0EAfn 
4DQS3xgVZNAUbpLeJUdc3B5XYAIzMnkFBPSXEQkBmA97kkDrgaoPpeUdGW4Cqsfz ekUjkjxcax9Dp/OjhLKWmLabHdiVp161Td0x6e24rBaGSVNRlpNLHXfBCBW/+iml iGEh8OGtW/Fc8b4V4HEhTXPbVLpvgt22T17OTIYKyueUGvSd+AIS0053asLlO9kQ X4Y00sH8nnCtJgeTDwwLiudCENvYmE5PvX6Kwiq3tOZJN/onFRKnOHrssXbPd87m +82yDx8/oKYKEoA23bz8f7yMPeqmiedgRr4/1b+ToVtiKSUGtnyzLiXbC2c0sxAZ /L8qFMEWQmO/iDMq5+JmMvZld+Ns4AO81gg+WiWoCaE0YB3kqo1L3yP+D0FDETke 5Ky0i2RtVlCzoM9aXz0zQkHx7vhN24h2IJdCADhGAloykmNVxIqlsbDxx02SsNgV IuZQ+jq9zwL/VR3UUm8uJ+o55XcgBDjBPALvilMTnUG+tB99ip9H/p3l2QARAQAB tDNEb25hbGQgU3pldG8gKENPREUgU0lHTklORyBLRVkpIDxkb25hbGRAYXBhY2hl Lm9yZz6JAjcEEwEKACEFAlefie0CGwMFCwkIBwMFFQoJCAsFFgIDAQACHgECF4AA CgkQbHTsq4v0q+ub4xAArvZBq7K1FjtqiKdwuOqOXGLuQC7Eq8e7mYUvac08nNsq rkvr2RtCDN9VaPbYh9TNJ/7BdcwG6IXOmOsW24FsAnrLSueGaw3zuaAhz8Q/vn+b 7VPcJ3OQEHbNpHlVkAur2NzZobznNhWGK4M9LQnXrVxEMTTDTd5MJqdDKAPNZ/TE Aav0AiAlOd56U8ONswBHgjqlXoS/xHvsUI9UrJIFGkdz96I1ohdcjmjkDiCYFJRt 2NvSWgGEtiN8oykYSCjU1qlyPgcIkdHu5E5xy5fXvVdQEV3bc/Y0Ghf02W6Nb8RY fuq8qBtBVhbi6T0xqwnuh5iuuO4k1BAJUUC2H+c68VTUWWJ0b6Wzz1x54MClUwQV u1hrBFbMGubRTo6uuB9hKMzwXfl3WY6iBXQvb1eY+Y0Fu/NEnNSSSVFq+qyaluUq 9RQn5u0+VCULomzr0TME4Etd+UbIliiylVFg+mtalvha0z5CrE4EJOJ/c+efI/JP vN/WOSJ44JDXUocvp05cZ0GZGyBcfTEepb/gR1dpidoYBnvScWkBak7P4trkadCS vpjbMPtWOOEa7hVP+vZg1MvmelZ2o+VvuyWvGMHryimpV6tFFtbiGR03ltC3cN1t HmjYjSb9rIsXGIN2c+b1LLuR0zxaK46y8UKNMWrwI++9Iqbl4BrBN4oofD78p4W5 Ag0EV5+J7QEQAKkBVsL1zSlOuh/GAeXBs3aIwfY+eQD3PIeo4DsCD1J1M8Dn5xa7 SBHqM+aql7t9hw2iIdqioS8P9ScN1uyWi/MppxDVdTR526ViBR8+739EeprzWPn2 k+cTGoTeisxQjgLgC/C48taCHDPcztDUh+rLnmcKxKJA+dfqswtiAK7qxCHrT+jq 5ru78lDqzbHbJU6BqsEyzP9rwtMnGzjbevNC8YLddkZ8iF8KQgSlr93EXlDj6KaU pZs1AUkPg4UEEkuHJv+pjDhqNfoRSM1vqFyilEe4dWFW+MOKt6yiexVmHB2kgXuV 7J0PjFi4V2tlgInDhimrvW/6gg2b0EPNF3hPIAo9dBUDpW5EEj1HG+CFPEvLBOHX V3LLi5NLDSnj8iL4eiJV+l0/pQToAxDjV7VwFQ3T7gLyKM1YwkOXtS8GsjtqCsJ8 xIGVWsgmaZWpZSIRuEWhgSLwSPOjZjo2YvXQA3WMdslrx+5/ZF5ElFPBKCYXxpUB DkSQ/jTLRSitAXH/0rHrsdw+TZdaW8GPsx7tzdXQVK1PajEpBIx5r+Ix/uhQarwr 
9T2yBQ75rMfPPCccHbAI5g4aGpAEpDNnXfjiYi2fEffB5mEiaDaftiHqjOcoMOBw OpN7Y6V+IWniaaqEfWfGMBjM+G/m6veLIgQhAv91TmWvdMksunM9sTAFABEBAAGJ Ah8EGAEKAAkFAlefie0CGwwACgkQbHTsq4v0q+v1rQ//bp0a6vBrRYJU5RKTZ5me Ux93RT5BxZqf0wX4deCz0GiaD9G8fJ9HZyv1jedygIeBiSU/dkrkemGA8j0fchca An8yt4tNamo4AAdkvmPa9c7Z4qpHQvKpKDqrVT+ztB+a6qGFjx9cw3iioji3HW9m ykOPYYk03q9H8h0dW2sa4jaVlXNq/3b2t5cWJ14GGk9XkraSfd+0ZEIT8ffT6u45 B5l35FuzBdxjyNh91T9UGrREjo1e5sgB0WSss/EJVBAJq+xDbWeOgE/azXQ1MhT+ Z+BwKfMfvI35SIHG4Ngr/OSirZbQy4s+OktFBbGhBy7dlmWbS2A9SMn7pt9e/i9K Y44sFGC1xAjq2gnVhbkal0mvT0iDLwIe4/sWuMZJHG05wUGJNMqf8au9KjVj5NzT iDo8roJi4jFolm6YmH6FRFCeYmLON0pXdNFnCe+cLanrQDI59TBgHh/XPFweOjla LqMCEjGpfSLvnopusA1SJUnenwjWWecdfnChnSB8EVIkBiAah1DP1wk+Qy9YfZlj MV2jufd24oqFWQmddOCCkIPYeDOavZvcxYevdftY3LX1B4WbiZOMHqh/HDH2NkyW 6YQIPrw2fEHG2av55RCzCKVcW0PKgW05zjUWwFVo7gEaj5CHK/SeYv0Fpc6QQKls 93sbykwywNMHkdFPE0109cM= =QAQh -----END PGP PUBLIC KEY BLOCK----- pub 4096R/4719A8F4 2017-09-12 [expires: 2021-09-12] uid [ultimate] Chan Lee sig 3 4719A8F4 2017-09-12 Chan Lee sub 4096R/A18B1E8F 2017-09-12 [expires: 2021-09-12] sig 4719A8F4 2017-09-12 Chan Lee -----BEGIN PGP PUBLIC KEY BLOCK----- mQINBFm4YoUBEAC3CFA/xIiqn+NoqSB8ya+mgnwfuL7XoRcR0gaZQ5BNXMjRZcqw 5On1v2TTcXo6LcD/g7oxdBYWzUaubsCSmZzMp+cT4w4bmLr3bSZol0akNq4n2MgT q7jXOhXDhMdKzIdxuJe7wGtFGLjm8Macc6576MEYq1AUtdDNYuEMWr5PmQwsbNQQ CBkH8007trDlPygvzh+w/tLOHNbIv3ynHCIYeY/vYpm3XLHPEF0CjV05sDUMdHkb l5s9xYFOfPT6JqXEC1gnjanSvQ20MlLwk1D9AYlkTir+a17/igi2S5YGBUHXvxvU 7xJrNfN9xDklUMUkHcyZNNWgIr1U7U//4hKm/3D2ele0AmMtL+Y0MmnSKvqXMiRf PTON/6rBwK4WdKsYtq1e2nUWv5Btfzb/oEovh8nmJwrpWUi4oBo8q7lBj9PEb1yY QXBnSjzg2C4zx/z8O1aHx0zb8njTMwknH0Ii/0ukMyCvC1y9yoDTEcD01ctivxwq h4I547vuzVQA7LrPob8fa003R5jR95+bvfedR4beJlDb4uTIS7654Usk58xzCyYt hVz7YrD8Zn1u841YYqw52pQ+d29OpVC6z/IaEN06v77TIW6uJVPxdQU/tUJM+fvT kFVVU8qcGo5SjPUFyc66s9eJ3NRBdoRQO8ls3YO9JxMNSd4wtSysj9vKOwARAQAB tB1DaGFuIExlZSA8Y2hhbmxlZUBhcGFjaGUub3JnPokCPQQTAQoAJwUCWbhihQIb AwUJB4YfgAULCQgHAwUVCgkICwUWAgMBAAIeAQIXgAAKCRB+I2PYRxmo9Eq3D/9Q 
Tg45t1DXEq911JAfCDA/nIFNBaziPt3FIdKeWR3EGxwTie/KrSH7sL6ou40+GnSb DcWxfq3CobFkTyaAAF7NGt3cmP5e7H02EaEqrSfewUnV0eZlZOb7qoc4qWoG8vGe OwQQEY48DFxmzBzSZSyUvtgJQBX08mlE4jtEMh0QngkxyH5qGi+NEfbvCZaRWf/3 nfErYSfFenQqqRmXol69zGJ+MPXDZLCBdpayTjNZRRs9Bw4B/VVm3zGgdm2GZ/2U G4aPNNBnqDBi6WDBqC8RwOtWcyvHKCXXu2suHTN79SIWhksuPQBILI0lMEfrdsdk 76WUlgHLvdUAjb/g4a+cV7XkbSstGbMvKi7foM5V2o5FppaL7BApMdIZUN5xTAsh zbJ/Wyru69SgNYm/4byWOFyzZBvZLtodqhT/BUHerwOO2ZNtRijwct8Q6o7PePbu K/35KTCTmU5D+91igIa72981ePnShuG1qx1v9a3wpoENYRt6YPchW9vhnErKdh3L nMfgcobI0HESQDx2Si8n8YGYIwXNeUAdDhkuhULXs22XIAay503R5tYiBm3mTaRR q25Tl1uSBeMeNGRoGOfHl7s8oIx/kH54PzgmEbLtEUs4SaUUW1l92ShQgkj87lKX Zytqx27HUTwkUz3AIxBHs9VloJ2Sh/cmMXB70uTV1LkCDQRZuGKFARAAzPli/9oU QLC5Bp75UDIRLQk4kDqyaSfCgcNTuqsXxL0UR/IoICz8S4qo1UccwO6LoUNH5L7j EFHdeG5P2a/Y/WQUIduwGs6M4Btb8vC9Q6kw2P+6R7gCoq9L6QKGOWjOwCbd7H1e YnxB7IizTYzgHq8llr3V7VXXkBSytCMOXdCxOMwoSyNqJdQtrrV1XdzPSwaHZ5vn i+VMuKVM3XlS3cB/KhJTJckWfaQzHMWmCYZytz5GwSJuMz37/YYFKTBPvYfk6z3F kVnpYTK/TZ9ZalbVUQY39Xk9MRhF22FbY1/kdmTQaA9+lxEONvmULXo/vmZIrJlS ua+DdUhrbmz8E6zLvqsH5HcPqJx7nbv/wcjcp4jRPk8vUNu3OO7a4ClFBymJzwL3 k9bgCoVEY/fo71jVSpeajzWR3psJjOFLIOufNortYx3AjdVR2TkyKko0k0nFJ3FV P2MN86nwCh89kU581Q/XLDmZFrlrqFek6V+J2fu4l42Bu4/rWlML4AGst7p7UsYU y/TbF9wJhHgpSV/70gcS2jW26xZpcjNM09fgGlaJFzeqCrxkuIfx5CbdJzkAeFFC 3Nq9o6rcLZXAqVGlLJLl2hGF7mnrzm+7RpZkoIXwX0RyJVVLaz4e6TYXXygYTaD3 kQK6fNabA1Yo+SvRiQLBGM9oAaNoe8ajfWEAEQEAAYkCJQQYAQoADwUCWbhihQIb DAUJB4YfgAAKCRB+I2PYRxmo9KlYD/0aP3YrtNjHdlmLMit9z1jS4hATMdIohS2M 5woZkNogmol2SUXLWz0jGNSKlsPvmlFfD7BEJtzoal27GL5/LTu53a7XXtKMdPrT RF8ZHQd4SkvAmNySPAZo10utskWnkEBtVntHT4/T4KYaum+f+9DbmVLmBIvOMzWU qN5QIYI1HcXw/flY/7F/jH3pZWR1IZ+lgF8AERldYnMUYn+PmEo0YCsu2MulLkfr 2puXx5k0U/dO2Ljnrai878maLuSH100T8nFnY0lJEvwTb1xns4Hszum//9JOk1kJ MpaD7iWKW8BtkN5bouWxT95Igi9I+9nxpDTF47s5fqCbhG7EGI7jeAJYaPUYkzof P/k4K44d4jdmPyxCDxcA2KhxU+8QaYYuKgKX51UZzYR2Xx9YKG5IJTsbZNeKvUku 2T7F+k4B2Z/XbvjzBtsfdQx0hqiDwfw2ZOMTFiaRQKzTbuJ4jb9QoNZazAiuR6gO 
A9pyEhk6ZBWq9zw00ym3W5IGy/YTS/3buDOfgLL82Axp9TEEJo7vk6JN0I4nJScN CDBXY/trGluhnOmV1mQ2lJnq4FJ2sJSIVcbYTNahAD3cJ8ub+XFgyeMno0OTlrVg LxEW+096aYvYBaxyb9GdKZzUplRHOZWg89DjGM5EVIy8yST1XDyCCMQeK/XglWCA gslTn9eM5A== =Jacn -----END PGP PUBLIC KEY BLOCK----- pub rsa4096/9F4FBC3D 2018-09-11 [SC] uid [ultimate] Takako Shimamoto (CODE SIGNING KEY) sig 3 9F4FBC3D 2018-09-11 Takako Shimamoto (CODE SIGNING KEY) sub rsa4096/EC151981 2018-09-11 [E] sig 9F4FBC3D 2018-09-11 Takako Shimamoto (CODE SIGNING KEY) -----BEGIN PGP PUBLIC KEY BLOCK----- mQINBFuXgEQBEADA2p47lLog6fWkm3yXB7+jcvzzhZVkLweePBkEi5I0QBOX7PpL CFNGGDdJW0L6p+8PhHWkzEeCdzYEJj74TGuMT9pZ+ibbjw3BLw3CvFaJa24/g83j 1jfoKOBLL7xdsvYyrMr/U3ZZYOpD6UkW4LjMWooGYcthlQgpuTXhmLswrym+b1YA 9xJbjFFL7gktB9O/XPf80nKDv+/duCtCcLKsWRuVsfFmnabJGJsok17wT9j5gjjc GfADZvsQdXJDYFS8Z7Os5fczPzx+xpIKioqLUN1bmXDuwF1+e+hgQuK4WS3RfOu6 N9bp2R/cnYOcPWIGi955wKkjbUo6ujFFg7ICxhWTEqALZuXXScDoA0SkjtD/E8u6 d8L43Hb64v4TA6qc7sTzyUGsKjzs/j8iTCFu7H9rl+MvpTZj6BnovHKcqufWO9Aj ndPPdVsnNse8MoBL7yxZ/eohVILA0LdHu/AnolfQTHtKkKFiCke18OhnS7x8Vg8j Q1rUDllsG77LhyA3EwecR//E518nOrxreD4PVXzQvkQz3HK9V2LBAeKrAzCtn0h8 fpBfCgcN3r5+f8eA34Y3f4b6SGgyRRYYQtIwJ1w0CLGCWm06CKK8rKBK1wa988QA 3W4r+vCNVKEKUjPBltOvSpebk3DS4Ymk6plRXxLWedS8c89UJE7jb9wF3wARAQAB tDpUYWtha28gU2hpbWFtb3RvIChDT0RFIFNJR05JTkcgS0VZKSA8c2hpbWFtb3Rv QGFwYWNoZS5vcmc+iQJOBBMBCgA4FiEEGRmUSApCN8H1BfYxFyTwJ59PvD0FAluX gEQCGwMFCwkIBwMFFQoJCAsFFgIDAQACHgECF4AACgkQFyTwJ59PvD3quw/+KJ7k lNPkF0ogvBW48bn9HZgm3M6I/fxmHoqqEF1q8uCjnSXuHboEb1LhQO+BKLA4WH+F fXUAlAIdzbGrUVIvuExr5QMhVY5oDofUMbUOouJSPG/1JXjikjnS3UP2eZYKyWst bNZH7OrMiXqKtGaF8HT0BgVsNYxIEeKAKo4N0QWaUS0n1ep/GqdCBKuFGfXnxH7e qMzJCEyuRhXu+S7t7EGdBUGz/2kYAHcfsuAj4y24xzyQOUYYox1wBGoMwg4MGzIy WmVflIO7Unqz33dquEfNrOopK6kAbHmI1MBloHcNOVclF8sDTBH2kkkBfnFPy+H7 munXFMg4Dtk+4fMsdPp8+wkvQd6J67ao67D9KBN/jnUSohw4bEuzOl6cy+dVlTf9 XI+t/vKNJq+N8gRRTHvbaDfT9j5JgH51abFnh7Y9UDILU9JmvIFbjkBIQPpUg/ZV CPaNzafFPvTn3G/KTVkpc75IDMjgEX4i9scPosvJL8rpGJnfMIEjt7tSwlLJ8lWf 
DsW7XkWo6KpKdlve0e36KV23EdyUAAZ3+Oy05nAdZo/DhaC9LmmSlJwg9l/dwoUG HEdPFdEWquoqkAQCbw3JXNuISugrpq7l6gD2cckAEOMg4ub5nGUVCLgojrWa3vGN eb0YLBR5HlcRHhCnAQj69l7jgID4/VLNCUMjKfi5Ag0EW5eARAEQANf2H038aioy 0wFO27pERyVbPQrDr9kmN3AX4QoJBQ8U67jEkO+/vjz8S33H/Y/x8crMm8QCly6l ECukdPCFmne+gloHlJm2pL2Qre/6YKuEDHMq/wut1/HDfKrkA86zfKkX5aut+Vtx jdh7awf8ub6XcSTmmACBk+g/bvVoKCBD21bdw6h1I1/rYR+X+cCSCTSzvDLlv8ye JeMEeLdld0/lTDUHXUyYf/z8AVr2IrtcFJFlA27ixtCqU3getyhBT1zGhlSFtQar 4tBzQ7UMOKQcxczHOCHM7lyB58zRsDv2PSr5LCV9tysR/CzNHdxIxvsK8qCf4wNp YQLqwZa0bMdk/1vfiHf2f9L5PIWvfXUcfLFEUqG0GvARia1JC/YCA2vFEluVRPPg EZgMsrUWoBOj+4qPaABAYFe8Tv7WW+vHKB9/+sETzUmo3TV0wcZz9qKiQwYoxWZQ /hME5q7SLWHd7kLfyTbHBZrYtIvYLRFBljH9RZQR7i+VMThG9yWPrmpiY9Pcxxfv sefGISNwlNWmXbK55+LuFHIcjPI+FTUQBF0SqvxJuCXjMmTsxjP3w+BiCrKgJEt4 ZVWN7VI8LYHP6IIctg65TPNti/rkd1AU4MDpSESN0o8b5R+04dIb05s2oEIGPY2R tr2WUZY6YwuEmcKQ/igsYq0TwOtIqEtdABEBAAGJAjYEGAEKACAWIQQZGZRICkI3 wfUF9jEXJPAnn0+8PQUCW5eARAIbDAAKCRAXJPAnn0+8PenjEADAIkw4B/AC7cP+ xJzjYsE0kVOKLRYo+2+5jRvyoLWffcU6WMs44sF9XI4BRDHAGgDC0xvK03LPeh0d mfhIMVEiqG8Zz+6Qkt4upCkXDuJ5QnjKZh4SWXNpW/avzOwCOX2f8JWz11Qoq+J4 Vnd0BbIjjI+rDiBbccr5Kc6tN43QhihLclJ5hO2QpdWIHGFjCaekIX6zWVYAkkFk UB4vHG/eghXJA44lP2kUtVb0Ay5Fl23G8bNqC30/DoswE5bJZjgEbKWUGHBCAA+q GtYDG6ttj1AzQuwhaW3mjCaspRHYbPp76Jqh/dw1mRuwWsgw/rYhw/Ptydpzeyad W3NlHFj0NweMmstfmvwMd6VOP9cXni87Ynra8pWUvzCO2kzCL+IpUylPFAOvsSHe M0exU5/K+ClKlsL5J9oL+6Nc02LDgk56H3aQimKc4sO7/TWqIhHtvYMHJj2PK+Da P0qc0vJo6f3wYNU3VhzX/IZg/94luTeMiQgCBie5jXrv3EtBqCg/B/+TzOIrl6pO TPFJ9Q5iQiLgoJNrBc2AjV34YaCa8esLRLTZQYoVX/9pN4ECcskNX3TxF45zExUi 8LMAftwE+fVt7zsJe3oZhPwtLN1RZoLJI/zvXJhXKAP/1LK57Ezrws0AwFSazc0I wEvmHAxp9J8DJY97zR+oCp/Or62cNw== =8icF -----END PGP PUBLIC KEY BLOCK----- ================================================ FILE: LICENSE.txt ================================================ Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. 
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ================================================================================ PredictionIO Subcomponents: The Apache PredictionIO project contains subcomponents with separate copyright notices and license terms. Your use of the source code for these subcomponents is subject to the terms and conditions of the following licenses. ================================================================================ For semver.sh in bin/: ================================================================================ Copyright (c) 2013, Ray Bejjani All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. 
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the FreeBSD Project. ================================================================================ For sbt in sbt/: ================================================================================ // Generated from http://www.opensource.org/licenses/bsd-license.php Copyright (c) 2011, Paul Phillips. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 
* Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ================================================================================ For binary distribution: ================================================================================ Binary distribution bundles javax.servlet # servlet-api # 2.5 javax.servlet.jsp # jsp-api # 2.1 javax.activation # activation # 1.1 javax.xml.stream # stax-api # 1.0-2 which are available under the CDDL v1.0 license (https://glassfish.java.net/public/CDDLv1.0.text) COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0 1. Definitions. 1.1. Contributor means each individual or entity that creates or contributes to the creation of Modifications. 1.2. Contributor Version means the combination of the Original Software, prior Modifications used by a Contributor (if any), and the Modifications made by that particular Contributor. 1.3. Covered Software means (a) the Original Software, or (b) Modifications, or (c) the combination of files containing Original Software with files containing Modifications, in each case including portions thereof. 1.4. 
Executable means the Covered Software in any form other than Source Code. 1.5. Initial Developer means the individual or entity that first makes Original Software available under this License. 1.6. Larger Work means a work which combines Covered Software or portions thereof with code not governed by the terms of this License. 1.7. License means this document. 1.8. Licensable means having the right to grant, to the maximum extent possible, whether at the time of the initial grant or subsequently acquired, any and all of the rights conveyed herein. 1.9. Modifications means the Source Code and Executable form of any of the following: A. Any file that results from an addition to, deletion from or modification of the contents of a file containing Original Software or previous Modifications; B. Any new file that contains any part of the Original Software or previous Modification; or C. Any new file that is contributed or otherwise made available under the terms of this License. 1.10. Original Software means the Source Code and Executable form of computer software code that is originally released under this License. 1.11. Patent Claims means any patent claim(s), now owned or hereafter acquired, including without limitation, method, process, and apparatus claims, in any patent Licensable by grantor. 1.12. Source Code means (a) the common form of computer software code in which modifications are made and (b) associated documentation included in or with such code. 1.13. You (or Your) means an individual or a legal entity exercising rights under, and complying with all of the terms of, this License. For legal entities, You includes any entity which controls, is controlled by, or is under common control with You. 
For purposes of this definition, control means (a) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (b) ownership of more than fifty percent (50%) of the outstanding shares or beneficial ownership of such entity. 2. License Grants. 2.1. The Initial Developer Grant. Conditioned upon Your compliance with Section 3.1 below and subject to third party intellectual property claims, the Initial Developer hereby grants You a world-wide, royalty-free, non-exclusive license: (a) under intellectual property rights (other than patent or trademark) Licensable by Initial Developer, to use, reproduce, modify, display, perform, sublicense and distribute the Original Software (or portions thereof), with or without Modifications, and/or as part of a Larger Work; and (b) under Patent Claims infringed by the making, using or selling of Original Software, to make, have made, use, practice, sell, and offer for sale, and/or otherwise dispose of the Original Software (or portions thereof). (c) The licenses granted in Sections 2.1(a) and (b) are effective on the date Initial Developer first distributes or otherwise makes the Original Software available to a third party under the terms of this License. (d) Notwithstanding Section 2.1(b) above, no patent license is granted: (1) for code that You delete from the Original Software, or (2) for infringements caused by: (i) the modification of the Original Software, or (ii) the combination of the Original Software with other software or devices. 2.2. Contributor Grant. 
Conditioned upon Your compliance with Section 3.1 below and subject to third party intellectual property claims, each Contributor hereby grants You a world-wide, royalty-free, non-exclusive license: (a) under intellectual property rights (other than patent or trademark) Licensable by Contributor to use, reproduce, modify, display, perform, sublicense and distribute the Modifications created by such Contributor (or portions thereof), either on an unmodified basis, with other Modifications, as Covered Software and/or as part of a Larger Work; and (b) under Patent Claims infringed by the making, using, or selling of Modifications made by that Contributor either alone and/or in combination with its Contributor Version (or portions of such combination), to make, use, sell, offer for sale, have made, and/or otherwise dispose of: (1) Modifications made by that Contributor (or portions thereof); and (2) the combination of Modifications made by that Contributor with its Contributor Version (or portions of such combination). (c) The licenses granted in Sections 2.2(a) and 2.2(b) are effective on the date Contributor first distributes or otherwise makes the Modifications available to a third party. (d) Notwithstanding Section 2.2(b) above, no patent license is granted: (1) for any code that Contributor has deleted from the Contributor Version; (2) for infringements caused by: (i) third party modifications of Contributor Version, or (ii) the combination of Modifications made by that Contributor with other software (except as part of the Contributor Version) or other devices; or (3) under Patent Claims infringed by Covered Software in the absence of Modifications made by that Contributor. 3. Distribution Obligations. 3.1. Availability of Source Code. Any Covered Software that You distribute or otherwise make available in Executable form must also be made available in Source Code form and that Source Code form must be distributed only under the terms of this License. 
You must include a copy of this License with every copy of the Source Code form of the Covered Software You distribute or otherwise make available. You must inform recipients of any such Covered Software in Executable form as to how they can obtain such Covered Software in Source Code form in a reasonable manner on or through a medium customarily used for software exchange. 3.2. Modifications. The Modifications that You create or to which You contribute are governed by the terms of this License. You represent that You believe Your Modifications are Your original creation(s) and/or You have sufficient rights to grant the rights conveyed by this License. 3.3. Required Notices. You must include a notice in each of Your Modifications that identifies You as the Contributor of the Modification. You may not remove or alter any copyright, patent or trademark notices contained within the Covered Software, or any notices of licensing or any descriptive text giving attribution to any Contributor or the Initial Developer. 3.4. Application of Additional Terms. You may not offer or impose any terms on any Covered Software in Source Code form that alters or restricts the applicable version of this License or the recipients' rights hereunder. You may choose to offer, and to charge a fee for, warranty, support, indemnity or liability obligations to one or more recipients of Covered Software. However, you may do so only on Your own behalf, and not on behalf of the Initial Developer or any Contributor. You must make it absolutely clear that any such warranty, support, indemnity or liability obligation is offered by You alone, and You hereby agree to indemnify the Initial Developer and every Contributor for any liability incurred by the Initial Developer or such Contributor as a result of warranty, support, indemnity or liability terms You offer. 3.5. Distribution of Executable Versions.
You may distribute the Executable form of the Covered Software under the terms of this License or under the terms of a license of Your choice, which may contain terms different from this License, provided that You are in compliance with the terms of this License and that the license for the Executable form does not attempt to limit or alter the recipient's rights in the Source Code form from the rights set forth in this License. If You distribute the Covered Software in Executable form under a different license, You must make it absolutely clear that any terms which differ from this License are offered by You alone, not by the Initial Developer or Contributor. You hereby agree to indemnify the Initial Developer and every Contributor for any liability incurred by the Initial Developer or such Contributor as a result of any such terms You offer. 3.6. Larger Works. You may create a Larger Work by combining Covered Software with other code not governed by the terms of this License and distribute the Larger Work as a single product. In such a case, You must make sure the requirements of this License are fulfilled for the Covered Software. 4. Versions of the License. 4.1. New Versions. Sun Microsystems, Inc. is the initial license steward and may publish revised and/or new versions of this License from time to time. Each version will be given a distinguishing version number. Except as provided in Section 4.3, no one other than the license steward has the right to modify this License. 4.2. Effect of New Versions. You may always continue to use, distribute or otherwise make the Covered Software available under the terms of the version of the License under which You originally received the Covered Software.
If the Initial Developer includes a notice in the Original Software prohibiting it from being distributed or otherwise made available under any subsequent version of the License, You must distribute and make the Covered Software available under the terms of the version of the License under which You originally received the Covered Software. Otherwise, You may also choose to use, distribute or otherwise make the Covered Software available under the terms of any subsequent version of the License published by the license steward. 4.3. Modified Versions. When You are an Initial Developer and You want to create a new license for Your Original Software, You may create and use a modified version of this License if You: (a) rename the license and remove any references to the name of the license steward (except to note that the license differs from this License); and (b) otherwise make it clear that the license contains terms which differ from this License. 5. DISCLAIMER OF WARRANTY. COVERED SOFTWARE IS PROVIDED UNDER THIS LICENSE ON AN AS IS BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE COVERED SOFTWARE IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE COVERED SOFTWARE IS WITH YOU. SHOULD ANY COVERED SOFTWARE PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. THIS DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF ANY COVERED SOFTWARE IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER. 6. TERMINATION. 6.1. This License and the rights granted hereunder will terminate automatically if You fail to comply with terms herein and fail to cure such breach within 30 days of becoming aware of the breach. 
Provisions which, by their nature, must remain in effect beyond the termination of this License shall survive. 6.2. If You assert a patent infringement claim (excluding declaratory judgment actions) against Initial Developer or a Contributor (the Initial Developer or Contributor against whom You assert such claim is referred to as Participant) alleging that the Participant Software (meaning the Contributor Version where the Participant is a Contributor or the Original Software where the Participant is the Initial Developer) directly or indirectly infringes any patent, then any and all rights granted directly or indirectly to You by such Participant, the Initial Developer (if the Initial Developer is not the Participant) and all Contributors under Sections 2.1 and/or 2.2 of this License shall, upon 60 days notice from Participant terminate prospectively and automatically at the expiration of such 60 day notice period, unless if within such 60 day period You withdraw Your claim with respect to the Participant Software against such Participant either unilaterally or pursuant to a written agreement with Participant. 6.3. In the event of termination under Sections 6.1 or 6.2 above, all end user licenses that have been validly granted by You or any distributor hereunder prior to termination (excluding licenses granted to You by any distributor) shall survive termination. 7. LIMITATION OF LIABILITY. 
UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL YOU, THE INITIAL DEVELOPER, ANY OTHER CONTRIBUTOR, OR ANY DISTRIBUTOR OF COVERED SOFTWARE, OR ANY SUPPLIER OF ANY OF SUCH PARTIES, BE LIABLE TO ANY PERSON FOR ANY INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOST PROFITS, LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES, EVEN IF SUCH PARTY SHALL HAVE BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION OF LIABILITY SHALL NOT APPLY TO LIABILITY FOR DEATH OR PERSONAL INJURY RESULTING FROM SUCH PARTY'S NEGLIGENCE TO THE EXTENT APPLICABLE LAW PROHIBITS SUCH LIMITATION. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THIS EXCLUSION AND LIMITATION MAY NOT APPLY TO YOU. 8. U.S. GOVERNMENT END USERS. The Covered Software is a commercial item, as that term is defined in 48 C.F.R. 2.101 (Oct. 1995), consisting of commercial computer software (as that term is defined at 48 C.F.R. 252.227-7014(a)(1)) and commercial computer software documentation as such terms are used in 48 C.F.R. 12.212 (Sept. 1995). Consistent with 48 C.F.R. 12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (June 1995), all U.S. Government End Users acquire Covered Software with only those rights set forth herein. This U.S. Government Rights clause is in lieu of, and supersedes, any other FAR, DFAR, or other clause or provision that addresses Government rights in computer software under this License. 9. MISCELLANEOUS. This License represents the complete agreement concerning subject matter hereof. If any provision of this License is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable.
This License shall be governed by the law of the jurisdiction specified in a notice contained within the Original Software (except to the extent applicable law, if any, provides otherwise), excluding such jurisdiction's conflict-of-law provisions. Any litigation relating to this License shall be subject to the jurisdiction of the courts located in the jurisdiction and venue specified in a notice contained within the Original Software, with the losing party responsible for costs, including, without limitation, court costs and reasonable attorneys' fees and expenses. The application of the United Nations Convention on Contracts for the International Sale of Goods is expressly excluded. Any law or regulation which provides that the language of a contract shall be construed against the drafter shall not apply to this License. You agree that You alone are responsible for compliance with the United States export administration regulations (and the export control laws and regulation of any other countries) when You use, distribute or otherwise make available any Covered Software. 10. RESPONSIBILITY FOR CLAIMS. As between Initial Developer and the Contributors, each party is responsible for claims and damages arising, directly or indirectly, out of its utilization of rights under this License and You agree to work with Initial Developer and Contributors to distribute such responsibility on an equitable basis. Nothing herein is intended or shall be deemed to constitute any admission of liability. NOTICE PURSUANT TO SECTION 9 OF THE COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) The GlassFish code released under the CDDL shall be governed by the laws of the State of California (excluding conflict-of-law provisions).
-------------------------------------------------------------------------------- Binary distribution bundles com.sun.jersey # jersey-core # 1.9 (https://github.com/jersey/jersey-1.x) com.sun.jersey # jersey-json # 1.9 (https://github.com/jersey/jersey-1.x) com.sun.jersey # jersey-server # 1.9 (https://github.com/jersey/jersey-1.x) javax.xml.bind # jaxb-api # 2.2.2 com.sun.xml.bind # jaxb-impl # 2.2.3-1 which are available under the CDDL v1.1 license (https://glassfish.java.net/public/CDDL+GPL_1_1.html) COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.1 1. Definitions. 1.1. "Contributor" means each individual or entity that creates or contributes to the creation of Modifications. 1.2. "Contributor Version" means the combination of the Original Software, prior Modifications used by a Contributor (if any), and the Modifications made by that particular Contributor. 1.3. "Covered Software" means (a) the Original Software, or (b) Modifications, or (c) the combination of files containing Original Software with files containing Modifications, in each case including portions thereof. 1.4. "Executable" means the Covered Software in any form other than Source Code. 1.5. "Initial Developer" means the individual or entity that first makes Original Software available under this License. 1.6. "Larger Work" means a work which combines Covered Software or portions thereof with code not governed by the terms of this License. 1.7. "License" means this document. 1.8. "Licensable" means having the right to grant, to the maximum extent possible, whether at the time of the initial grant or subsequently acquired, any and all of the rights conveyed herein. 1.9. "Modifications" means the Source Code and Executable form of any of the following: A. Any file that results from an addition to, deletion from or modification of the contents of a file containing Original Software or previous Modifications; B. 
Any new file that contains any part of the Original Software or previous Modification; or C. Any new file that is contributed or otherwise made available under the terms of this License. 1.10. "Original Software" means the Source Code and Executable form of computer software code that is originally released under this License. 1.11. "Patent Claims" means any patent claim(s), now owned or hereafter acquired, including without limitation, method, process, and apparatus claims, in any patent Licensable by grantor. 1.12. "Source Code" means (a) the common form of computer software code in which modifications are made and (b) associated documentation included in or with such code. 1.13. "You" (or "Your") means an individual or a legal entity exercising rights under, and complying with all of the terms of, this License. For legal entities, "You" includes any entity which controls, is controlled by, or is under common control with You. For purposes of this definition, "control" means (a) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (b) ownership of more than fifty percent (50%) of the outstanding shares or beneficial ownership of such entity. 2. License Grants. 2.1. The Initial Developer Grant. Conditioned upon Your compliance with Section 3.1 below and subject to third party intellectual property claims, the Initial Developer hereby grants You a world-wide, royalty-free, non-exclusive license: (a) under intellectual property rights (other than patent or trademark) Licensable by Initial Developer, to use, reproduce, modify, display, perform, sublicense and distribute the Original Software (or portions thereof), with or without Modifications, and/or as part of a Larger Work; and (b) under Patent Claims infringed by the making, using or selling of Original Software, to make, have made, use, practice, sell, and offer for sale, and/or otherwise dispose of the Original Software (or portions thereof). 
(c) The licenses granted in Sections 2.1(a) and (b) are effective on the date Initial Developer first distributes or otherwise makes the Original Software available to a third party under the terms of this License. (d) Notwithstanding Section 2.1(b) above, no patent license is granted: (1) for code that You delete from the Original Software, or (2) for infringements caused by: (i) the modification of the Original Software, or (ii) the combination of the Original Software with other software or devices. 2.2. Contributor Grant. Conditioned upon Your compliance with Section 3.1 below and subject to third party intellectual property claims, each Contributor hereby grants You a world-wide, royalty-free, non-exclusive license: (a) under intellectual property rights (other than patent or trademark) Licensable by Contributor to use, reproduce, modify, display, perform, sublicense and distribute the Modifications created by such Contributor (or portions thereof), either on an unmodified basis, with other Modifications, as Covered Software and/or as part of a Larger Work; and (b) under Patent Claims infringed by the making, using, or selling of Modifications made by that Contributor either alone and/or in combination with its Contributor Version (or portions of such combination), to make, use, sell, offer for sale, have made, and/or otherwise dispose of: (1) Modifications made by that Contributor (or portions thereof); and (2) the combination of Modifications made by that Contributor with its Contributor Version (or portions of such combination). (c) The licenses granted in Sections 2.2(a) and 2.2(b) are effective on the date Contributor first distributes or otherwise makes the Modifications available to a third party. 
(d) Notwithstanding Section 2.2(b) above, no patent license is granted: (1) for any code that Contributor has deleted from the Contributor Version; (2) for infringements caused by: (i) third party modifications of Contributor Version, or (ii) the combination of Modifications made by that Contributor with other software (except as part of the Contributor Version) or other devices; or (3) under Patent Claims infringed by Covered Software in the absence of Modifications made by that Contributor. 3. Distribution Obligations. 3.1. Availability of Source Code. Any Covered Software that You distribute or otherwise make available in Executable form must also be made available in Source Code form and that Source Code form must be distributed only under the terms of this License. You must include a copy of this License with every copy of the Source Code form of the Covered Software You distribute or otherwise make available. You must inform recipients of any such Covered Software in Executable form as to how they can obtain such Covered Software in Source Code form in a reasonable manner on or through a medium customarily used for software exchange. 3.2. Modifications. The Modifications that You create or to which You contribute are governed by the terms of this License. You represent that You believe Your Modifications are Your original creation(s) and/or You have sufficient rights to grant the rights conveyed by this License. 3.3. Required Notices. You must include a notice in each of Your Modifications that identifies You as the Contributor of the Modification. You may not remove or alter any copyright, patent or trademark notices contained within the Covered Software, or any notices of licensing or any descriptive text giving attribution to any Contributor or the Initial Developer. 3.4. Application of Additional Terms. 
You may not offer or impose any terms on any Covered Software in Source Code form that alters or restricts the applicable version of this License or the recipients' rights hereunder. You may choose to offer, and to charge a fee for, warranty, support, indemnity or liability obligations to one or more recipients of Covered Software. However, you may do so only on Your own behalf, and not on behalf of the Initial Developer or any Contributor. You must make it absolutely clear that any such warranty, support, indemnity or liability obligation is offered by You alone, and You hereby agree to indemnify the Initial Developer and every Contributor for any liability incurred by the Initial Developer or such Contributor as a result of warranty, support, indemnity or liability terms You offer. 3.5. Distribution of Executable Versions. You may distribute the Executable form of the Covered Software under the terms of this License or under the terms of a license of Your choice, which may contain terms different from this License, provided that You are in compliance with the terms of this License and that the license for the Executable form does not attempt to limit or alter the recipient's rights in the Source Code form from the rights set forth in this License. If You distribute the Covered Software in Executable form under a different license, You must make it absolutely clear that any terms which differ from this License are offered by You alone, not by the Initial Developer or Contributor. You hereby agree to indemnify the Initial Developer and every Contributor for any liability incurred by the Initial Developer or such Contributor as a result of any such terms You offer. 3.6. Larger Works. You may create a Larger Work by combining Covered Software with other code not governed by the terms of this License and distribute the Larger Work as a single product. In such a case, You must make sure the requirements of this License are fulfilled for the Covered Software. 4. 
Versions of the License. 4.1. New Versions. Oracle is the initial license steward and may publish revised and/or new versions of this License from time to time. Each version will be given a distinguishing version number. Except as provided in Section 4.3, no one other than the license steward has the right to modify this License. 4.2. Effect of New Versions. You may always continue to use, distribute or otherwise make the Covered Software available under the terms of the version of the License under which You originally received the Covered Software. If the Initial Developer includes a notice in the Original Software prohibiting it from being distributed or otherwise made available under any subsequent version of the License, You must distribute and make the Covered Software available under the terms of the version of the License under which You originally received the Covered Software. Otherwise, You may also choose to use, distribute or otherwise make the Covered Software available under the terms of any subsequent version of the License published by the license steward. 4.3. Modified Versions. When You are an Initial Developer and You want to create a new license for Your Original Software, You may create and use a modified version of this License if You: (a) rename the license and remove any references to the name of the license steward (except to note that the license differs from this License); and (b) otherwise make it clear that the license contains terms which differ from this License. 5. DISCLAIMER OF WARRANTY. COVERED SOFTWARE IS PROVIDED UNDER THIS LICENSE ON AN "AS IS" BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE COVERED SOFTWARE IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE COVERED SOFTWARE IS WITH YOU. 
SHOULD ANY COVERED SOFTWARE PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. THIS DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF ANY COVERED SOFTWARE IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER. 6. TERMINATION. 6.1. This License and the rights granted hereunder will terminate automatically if You fail to comply with terms herein and fail to cure such breach within 30 days of becoming aware of the breach. Provisions which, by their nature, must remain in effect beyond the termination of this License shall survive. 6.2. If You assert a patent infringement claim (excluding declaratory judgment actions) against Initial Developer or a Contributor (the Initial Developer or Contributor against whom You assert such claim is referred to as "Participant") alleging that the Participant Software (meaning the Contributor Version where the Participant is a Contributor or the Original Software where the Participant is the Initial Developer) directly or indirectly infringes any patent, then any and all rights granted directly or indirectly to You by such Participant, the Initial Developer (if the Initial Developer is not the Participant) and all Contributors under Sections 2.1 and/or 2.2 of this License shall, upon 60 days notice from Participant terminate prospectively and automatically at the expiration of such 60 day notice period, unless if within such 60 day period You withdraw Your claim with respect to the Participant Software against such Participant either unilaterally or pursuant to a written agreement with Participant. 6.3. 
If You assert a patent infringement claim against Participant alleging that the Participant Software directly or indirectly infringes any patent where such claim is resolved (such as by license or settlement) prior to the initiation of patent infringement litigation, then the reasonable value of the licenses granted by such Participant under Sections 2.1 or 2.2 shall be taken into account in determining the amount or value of any payment or license. 6.4. In the event of termination under Sections 6.1 or 6.2 above, all end user licenses that have been validly granted by You or any distributor hereunder prior to termination (excluding licenses granted to You by any distributor) shall survive termination. 7. LIMITATION OF LIABILITY. UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL YOU, THE INITIAL DEVELOPER, ANY OTHER CONTRIBUTOR, OR ANY DISTRIBUTOR OF COVERED SOFTWARE, OR ANY SUPPLIER OF ANY OF SUCH PARTIES, BE LIABLE TO ANY PERSON FOR ANY INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES, EVEN IF SUCH PARTY SHALL HAVE BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION OF LIABILITY SHALL NOT APPLY TO LIABILITY FOR DEATH OR PERSONAL INJURY RESULTING FROM SUCH PARTY'S NEGLIGENCE TO THE EXTENT APPLICABLE LAW PROHIBITS SUCH LIMITATION. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THIS EXCLUSION AND LIMITATION MAY NOT APPLY TO YOU. 8. U.S. GOVERNMENT END USERS. The Covered Software is a "commercial item," as that term is defined in 48 C.F.R. 2.101 (Oct. 1995), consisting of "commercial computer software" (as that term is defined at 48 C.F.R. § 252.227-7014(a)(1)) and "commercial computer software documentation" as such terms are used in 48 C.F.R. 
12.212 (Sept. 1995). Consistent with 48 C.F.R. 12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (June 1995), all U.S. Government End Users acquire Covered Software with only those rights set forth herein. This U.S. Government Rights clause is in lieu of, and supersedes, any other FAR, DFAR, or other clause or provision that addresses Government rights in computer software under this License. 9. MISCELLANEOUS. This License represents the complete agreement concerning subject matter hereof. If any provision of this License is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable. This License shall be governed by the law of the jurisdiction specified in a notice contained within the Original Software (except to the extent applicable law, if any, provides otherwise), excluding such jurisdiction's conflict-of-law provisions. Any litigation relating to this License shall be subject to the jurisdiction of the courts located in the jurisdiction and venue specified in a notice contained within the Original Software, with the losing party responsible for costs, including, without limitation, court costs and reasonable attorneys' fees and expenses. The application of the United Nations Convention on Contracts for the International Sale of Goods is expressly excluded. Any law or regulation which provides that the language of a contract shall be construed against the drafter shall not apply to this License. You agree that You alone are responsible for compliance with the United States export administration regulations (and the export control laws and regulation of any other countries) when You use, distribute or otherwise make available any Covered Software. 10. RESPONSIBILITY FOR CLAIMS. 
As between Initial Developer and the Contributors, each party is responsible for claims and damages arising, directly or indirectly, out of its utilization of rights under this License and You agree to work with Initial Developer and Contributors to distribute such responsibility on an equitable basis. Nothing herein is intended or shall be deemed to constitute any admission of liability. NOTICE PURSUANT TO SECTION 9 OF THE COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) The code released under the CDDL shall be governed by the laws of the State of California (excluding conflict-of-law provisions). Any litigation relating to this License shall be subject to the jurisdiction of the Federal Courts of the Northern District of California and the state courts of the State of California, with venue lying in Santa Clara County, California. The GNU General Public License (GPL) Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. 
Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. 
This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. 
b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. 
You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. 
If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. 
The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. One line to give the program's name and a brief idea of what it does. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. signature of Ty Coon, 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License. 
# "CLASSPATH" EXCEPTION TO THE GPL VERSION 2 Certain source files distributed by Oracle are subject to the following clarification and special exception to the GPL Version 2, but only where Oracle has expressly included in the particular source file's header the words "Oracle designates this particular file as subject to the "Classpath" exception as provided by Oracle in the License file that accompanied this code." Linking this library statically or dynamically with other modules is making a combined work based on this library. Thus, the terms and conditions of the GNU General Public License Version 2 cover the whole combination. As a special exception, the copyright holders of this library give you permission to link this library with independent modules to produce an executable, regardless of the license terms of these independent modules, and to copy and distribute the resulting executable under terms of your choice, provided that you also meet, for each linked independent module, the terms and conditions of the license of that module. An independent module is a module which is not derived from or based on this library. If you modify this library, you may extend this exception to your version of the library, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version. -------------------------------------------------------------------------------- Binary distribution bundles junit # junit # 4.12 (http://junit.org/junit4/) which are available under the CPL v1.0 license (https://eclipse.org/legal/cpl-v10.html) Common Public License - v 1.0 THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS COMMON PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT. 1. 
DEFINITIONS "Contribution" means: a) in the case of the initial Contributor, the initial code and documentation distributed under this Agreement, and b) in the case of each subsequent Contributor: i) changes to the Program, and ii) additions to the Program; where such changes and/or additions to the Program originate from and are distributed by that particular Contributor. A Contribution 'originates' from a Contributor if it was added to the Program by such Contributor itself or anyone acting on such Contributor's behalf. Contributions do not include additions to the Program which: (i) are separate modules of software distributed in conjunction with the Program under their own license agreement, and (ii) are not derivative works of the Program. "Contributor" means any person or entity that distributes the Program. "Licensed Patents " mean patent claims licensable by a Contributor which are necessarily infringed by the use or sale of its Contribution alone or when combined with the Program. "Program" means the Contributions distributed in accordance with this Agreement. "Recipient" means anyone who receives the Program under this Agreement, including all Contributors. 2. GRANT OF RIGHTS a) Subject to the terms of this Agreement, each Contributor hereby grants Recipient a non-exclusive, worldwide, royalty-free copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, distribute and sublicense the Contribution of such Contributor, if any, and such derivative works, in source code and object code form. b) Subject to the terms of this Agreement, each Contributor hereby grants Recipient a non-exclusive, worldwide, royalty-free patent license under Licensed Patents to make, use, sell, offer to sell, import and otherwise transfer the Contribution of such Contributor, if any, in source code and object code form. 
This patent license shall apply to the combination of the Contribution and the Program if, at the time the Contribution is added by the Contributor, such addition of the Contribution causes such combination to be covered by the Licensed Patents. The patent license shall not apply to any other combinations which include the Contribution. No hardware per se is licensed hereunder. c) Recipient understands that although each Contributor grants the licenses to its Contributions set forth herein, no assurances are provided by any Contributor that the Program does not infringe the patent or other intellectual property rights of any other entity. Each Contributor disclaims any liability to Recipient for claims brought by any other entity based on infringement of intellectual property rights or otherwise. As a condition to exercising the rights and licenses granted hereunder, each Recipient hereby assumes sole responsibility to secure any other intellectual property rights needed, if any. For example, if a third party patent license is required to allow Recipient to distribute the Program, it is Recipient's responsibility to acquire that license before distributing the Program. d) Each Contributor represents that to its knowledge it has sufficient copyright rights in its Contribution, if any, to grant the copyright license set forth in this Agreement. 3. 
REQUIREMENTS A Contributor may choose to distribute the Program in object code form under its own license agreement, provided that: a) it complies with the terms and conditions of this Agreement; and b) its license agreement: i) effectively disclaims on behalf of all Contributors all warranties and conditions, express and implied, including warranties or conditions of title and non-infringement, and implied warranties or conditions of merchantability and fitness for a particular purpose; ii) effectively excludes on behalf of all Contributors all liability for damages, including direct, indirect, special, incidental and consequential damages, such as lost profits; iii) states that any provisions which differ from this Agreement are offered by that Contributor alone and not by any other party; and iv) states that source code for the Program is available from such Contributor, and informs licensees how to obtain it in a reasonable manner on or through a medium customarily used for software exchange. When the Program is made available in source code form: a) it must be made available under this Agreement; and b) a copy of this Agreement must be included with each copy of the Program. Contributors may not remove or alter any copyright notices contained within the Program. Each Contributor must identify itself as the originator of its Contribution, if any, in a manner that reasonably allows subsequent Recipients to identify the originator of the Contribution. 4. COMMERCIAL DISTRIBUTION Commercial distributors of software may accept certain responsibilities with respect to end users, business partners and the like. While this license is intended to facilitate the commercial use of the Program, the Contributor who includes the Program in a commercial product offering should do so in a manner which does not create potential liability for other Contributors. 
Therefore, if a Contributor includes the Program in a commercial product offering, such Contributor ("Commercial Contributor") hereby agrees to defend and indemnify every other Contributor ("Indemnified Contributor") against any losses, damages and costs (collectively "Losses") arising from claims, lawsuits and other legal actions brought by a third party against the Indemnified Contributor to the extent caused by the acts or omissions of such Commercial Contributor in connection with its distribution of the Program in a commercial product offering. The obligations in this section do not apply to any claims or Losses relating to any actual or alleged intellectual property infringement. In order to qualify, an Indemnified Contributor must: a) promptly notify the Commercial Contributor in writing of such claim, and b) allow the Commercial Contributor to control, and cooperate with the Commercial Contributor in, the defense and any related settlement negotiations. The Indemnified Contributor may participate in any such claim at its own expense. For example, a Contributor might include the Program in a commercial product offering, Product X. That Contributor is then a Commercial Contributor. If that Commercial Contributor then makes performance claims, or offers warranties related to Product X, those performance claims and warranties are such Commercial Contributor's responsibility alone. Under this section, the Commercial Contributor would have to defend claims against the other Contributors related to those performance claims and warranties, and if a court requires any other Contributor to pay any damages as a result, the Commercial Contributor must pay those damages. 5. 
NO WARRANTY EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely responsible for determining the appropriateness of using and distributing the Program and assumes all risks associated with its exercise of rights under this Agreement, including but not limited to the risks and costs of program errors, compliance with applicable laws, damage to or loss of data, programs or equipment, and unavailability or interruption of operations. 6. DISCLAIMER OF LIABILITY EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 7. GENERAL If any provision of this Agreement is invalid or unenforceable under applicable law, it shall not affect the validity or enforceability of the remainder of the terms of this Agreement, and without further action by the parties hereto, such provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable. If Recipient institutes patent litigation against a Contributor with respect to a patent applicable to software (including a cross-claim or counterclaim in a lawsuit), then any patent licenses granted by that Contributor to such Recipient under this Agreement shall terminate as of the date such litigation is filed. 
In addition, if Recipient institutes patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Program itself (excluding combinations of the Program with other software or hardware) infringes such Recipient's patent(s), then such Recipient's rights granted under Section 2(b) shall terminate as of the date such litigation is filed. All Recipient's rights under this Agreement shall terminate if it fails to comply with any of the material terms or conditions of this Agreement and does not cure such failure in a reasonable period of time after becoming aware of such noncompliance. If all Recipient's rights under this Agreement terminate, Recipient agrees to cease use and distribution of the Program as soon as reasonably practicable. However, Recipient's obligations under this Agreement and any licenses granted by Recipient relating to the Program shall continue and survive. Everyone is permitted to copy and distribute copies of this Agreement, but in order to avoid inconsistency the Agreement is copyrighted and may only be modified in the following manner. The Agreement Steward reserves the right to publish new versions (including revisions) of this Agreement from time to time. No one other than the Agreement Steward has the right to modify this Agreement. IBM is the initial Agreement Steward. IBM may assign the responsibility to serve as the Agreement Steward to a suitable separate entity. Each new version of the Agreement will be given a distinguishing version number. The Program (including Contributions) may always be distributed subject to the version of the Agreement under which it was received. In addition, after a new version of the Agreement is published, Contributor may elect to distribute the Program (including its Contributions) under the new version. 
Except as expressly stated in Sections 2(a) and 2(b) above, Recipient receives no rights or licenses to the intellectual property of any Contributor under this Agreement, whether expressly, by implication, estoppel or otherwise. All rights in the Program not expressly granted under this Agreement are reserved. This Agreement is governed by the laws of the State of New York and the intellectual property laws of the United States of America. No party to this Agreement will bring a legal action under this Agreement more than one year after the cause of action arose. Each party waives its rights to a jury trial in any resulting litigation. -------------------------------------------------------------------------------- Binary distribution bundles org.jamon # jamon-runtime # 2.4.1 (http://www.jamon.org/) which are available under the MPL v1.1 license (http://www.mozilla.org/MPL/MPL-1.1.txt) -------------------------------------------------------------------------------- Binary distribution bundles org.slf4j # slf4j-api # 1.7.25 (https://www.slf4j.org/) org.slf4j # slf4j-api # 1.7.10 (https://www.slf4j.org/) org.slf4j # slf4j-api # 1.7.2 (https://www.slf4j.org/) org.slf4j # slf4j-log4j12 # 1.7.18 (https://www.slf4j.org/) org.slf4j # slf4j-log4j12 # 1.7.10 (https://www.slf4j.org/) org.jruby.jcodings # jcodings # 1.0.8 (https://github.com/jruby/jcodings/) org.jruby.joni # joni # 2.1.2 (https://github.com/jruby/joni/) which are available under the MIT license (http://opensource.org/licenses/mit-license.php) Copyright (c) 2004-2008 QOS.ch All rights reserved. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- Binary distribution bundles com.github.zafarkhaja # java-semver # 0.9.0 (https://github.com/zafarkhaja/jsemver) which are available under the MIT license (http://opensource.org/licenses/mit-license.php) The MIT License Copyright 2012-2014 Zafar Khaja . Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- Binary distribution bundles com.github.scopt # scopt_2.11 # 3.5.0 (https://github.com/scopt/scopt) which are available under the MIT license (http://opensource.org/licenses/mit-license.php) Copyright (c) scopt contributors Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
-------------------------------------------------------------------------------- Binary distribution bundles com.esotericsoftware # kryo # 3.0.3 (https://github.com/EsotericSoftware/kryo) com.esotericsoftware # minlog # 1.3.0 (https://github.com/EsotericSoftware/minlog) com.esotericsoftware # reflectasm # 1.10.1 (https://github.com/EsotericSoftware/reflectasm) com.esotericsoftware.kryo # kryo # 2.21 (https://github.com/EsotericSoftware/kryo) com.esotericsoftware.minlog # minlog # 1.2 (https://github.com/EsotericSoftware/minlog) com.esotericsoftware.reflectasm # reflectasm # 1.07 (https://github.com/EsotericSoftware/reflectasm) which is available under the BSD license (http://www.opensource.org/licenses/bsd-license.php) Copyright (c) 2008, Nathan Sweet All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Esoteric Software nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- Binary distribution bundles com.google.protobuf # protobuf-java # 2.5.0 (https://github.com/google/protobuf) com.google.protobuf # protobuf-java # 2.6.1 (https://github.com/google/protobuf) Copyright 2008, Google Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Google Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Code generated by the Protocol Buffer compiler is owned by the owner of the input file used when generating it. This code is not standalone and requires a support library to be linked with it. This support library is itself covered by the above license. which is available under the BSD license (http://www.opensource.org/licenses/bsd-license.php) -------------------------------------------------------------------------------- Binary distribution bundles xmlenc # xmlenc # 0.52 (http://xmlenc.sourceforge.net/) which is available under the BSD license (http://www.opensource.org/licenses/bsd-license.php) Copyright 2003-2005, Ernst de Haan All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- Binary distribution bundles com.thoughtworks.paranamer # paranamer # 2.3 (https://github.com/paul-hammant/paranamer) com.thoughtworks.paranamer # paranamer # 2.6 (https://github.com/paul-hammant/paranamer) which is available under the BSD license (http://www.opensource.org/licenses/bsd-license.php) Copyright (c) 2006 Paul Hammant & ThoughtWorks Inc All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- Binary distribution bundles org.hamcrest # hamcrest-core # 1.3 (http://hamcrest.org/JavaHamcrest/) which is available under the BSD license (http://www.opensource.org/licenses/bsd-license.php) Copyright (c) 2000-2015 www.hamcrest.org All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. Neither the name of Hamcrest nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- Binary distribution bundles asm # asm # 3.1 (http://asm.ow2.org/) org.ow2.asm # asm # 5.0.3 (http://asm.ow2.org/) which is available under the BSD license (http://www.opensource.org/licenses/bsd-license.php) Copyright (c) 2000-2011 INRIA, France Telecom All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- Binary distribution bundles org.clapper # grizzled-slf4j_2.11 # 1.0.2 (http://software.clapper.org/grizzled-slf4j/) which is available under the BSD license (http://www.opensource.org/licenses/bsd-license.php) Copyright © 2010-2016, Brian M. Clapper. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- Binary distribution bundles com.jcraft # jsch # 0.1.54 (http://www.jcraft.com/jsch/) which is available under the BSD license (http://www.jcraft.com/jsch/LICENSE.txt) Copyright (c) 2002-2015 Atsuhiko Yamanaka, JCraft,Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of the authors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL JCRAFT, INC. 
OR ANY CONTRIBUTORS TO THIS SOFTWARE BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- Binary distribution bundles org.scala-lang # scala-library # 2.11.12 (http://scala-lang.org/) org.scala-lang # scala-compiler # 2.11.12 (http://scala-lang.org/) org.scala-lang # scala-reflect # 2.11.12 (http://scala-lang.org/) org.scala-lang # scalap # 2.11.12 (http://scala-lang.org/) org.scala-lang.modules # scala-java8-compat_2.11 # 0.7.0 (http://scala-lang.org/) org.scala-lang.modules # scala-parser-combinators_2.11 # 1.0.6 (http://scala-lang.org/) org.scala-lang.modules # scala-parser-combinators_2.11 # 1.1.0 (http://scala-lang.org/) org.scala-lang.modules # scala-xml_2.11 # 1.0.5 (http://scala-lang.org/) which is available under the BSD license (http://www.scala-lang.org/downloads/license.html) Copyright (c) 2002-2017 EPFL Copyright (c) 2011-2017 Lightbend, Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 
* Neither the name of the EPFL nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- Binary distribution bundles org.fusesource.leveldbjni # leveldbjni-all # 1.8 (https://github.com/fusesource/leveldbjni) which is available under the BSD license (http://www.opensource.org/licenses/BSD-3-Clause) Copyright (c) 2011 FuseSource Corp. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of FuseSource Corp. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- The following libraries are from the public domain. org.tukaani # xz # 1.0 (http://tukaani.org/xz/java.html) org.reactivestreams # reactive-streams # 1.0.2 (http://www.reactive-streams.org/)
================================================
FILE: NOTICE.txt
================================================
Apache PredictionIO
Copyright 2016 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

This product depends on third party software that falls under a variety of licenses.
All dependencies with licenses other than Apache are specified in the LICENSE file.
Please see LICENSE for additional copyright and licensing information.

================================================
FILE: PMC.md
================================================
# Project Management Committee Documentation

This outlines the steps for a PMC member to create a new release. More details and policy guidelines can be found here: http://www.apache.org/dev/release-distribution

## Release Procedure

1. Generate a code signing key if you do not already have one for Apache.
   Refer to http://apache.org/dev/openpgp.html#generate-key for how to generate a strong code signing key.
2. Add your public key to the `KEYS` file at the root of the source code tree.
3. Create a new release branch, with the version bumped to the next release version.
    * `git checkout -b release/0.15.0`
    * Replace all occurrences of `0.15.0-SNAPSHOT` in the code tree with `0.15.0`
    * `git commit -am "Prepare 0.15.0-rc1"`
    * `git tag -am "Apache PredictionIO 0.15.0-rc1" v0.15.0-rc1`
4. Push the release branch and tag to the Apache git repo.
5. Wait for the Travis build to pass on the release branch.
6. Package a clean tarball for staging a release candidate.
    * `git archive --format tar v0.15.0-rc1 > ../apache-predictionio-0.15.0-rc1.tar`
    * `cd ..; gzip apache-predictionio-0.15.0-rc1.tar`
7. Generate a detached signature for the release candidate. (http://apache.org/dev/release-signing.html#openpgp-ascii-detach-sig)
    * `gpg --armor --output apache-predictionio-0.15.0-rc1.tar.gz.asc --detach-sig apache-predictionio-0.15.0-rc1.tar.gz`
8. Generate a SHA512 checksum for the release candidate.
    * `gpg --print-md SHA512 apache-predictionio-0.15.0-rc1.tar.gz > apache-predictionio-0.15.0-rc1.tar.gz.sha512`
9. Run `./make-distribution.sh` and repeat steps 6 to 8 to create the binary distribution release.
    * `mv PredictionIO-0.15.0.tar.gz apache-predictionio-0.15.0-bin.tar.gz`
    * `gpg --armor --output apache-predictionio-0.15.0-bin.tar.gz.asc --detach-sig apache-predictionio-0.15.0-bin.tar.gz`
    * `gpg --print-md SHA512 apache-predictionio-0.15.0-bin.tar.gz > apache-predictionio-0.15.0-bin.tar.gz.sha512`
10. If you have not done so, use SVN to check out https://dist.apache.org/repos/dist/dev/predictionio. This is the area for staging release candidates for voting.
    * `svn co https://dist.apache.org/repos/dist/dev/predictionio`
11. Create a subdirectory at the SVN staging area. The area should have a `KEYS` file.
    * `mkdir apache-predictionio-0.15.0-rc1`
    * `cp apache-predictionio-0.15.0-* apache-predictionio-0.15.0-rc1`
12.
    If you have updated the `KEYS` file, also copy it to the staging area.
13. `svn commit -m "Apache PredictionIO 0.15.0-rc1"`
14. Set up credentials with Apache Nexus using the SBT Sonatype plugin. Put this in `~/.sbt/1.0/sonatype.sbt`.
    ```
    publishTo := {
      val nexus = "https://repository.apache.org/"
      if (isSnapshot.value)
        Some("snapshots" at nexus + "content/repositories/snapshots")
      else
        Some("releases" at nexus + "service/local/staging/deploy/maven2")
    }

    credentials += Credentials("Sonatype Nexus Repository Manager",
      "repository.apache.org",
      "",  // your Apache username
      "")  // your Apache password
    ```
15. Run `sbt/sbt +publishLocal` first and then run `sbt/sbt +publishSigned +storage/publishSigned`. Close the staged repository on Apache Nexus.
16. Send out an email for voting on the PredictionIO dev mailing list.
    ```
    Subject: [VOTE] Apache PredictionIO 0.15.0 Release (RC1)

    This is the vote for 0.15.0 of Apache PredictionIO.

    The vote will run for at least 72 hours and will close on Apr 7th, 2017.

    The release candidate artifacts can be downloaded here: https://dist.apache.org/repos/dist/dev/predictionio/apache-predictionio-0.15.0-rc1/

    Test results of RC1 can be found here: https://travis-ci.org/apache/predictionio/builds/xxx

    Maven artifacts are built from the release candidate artifacts above, and are provided as a convenience for testing with engine templates. The Maven artifacts are provided at the Maven staging repo here: https://repository.apache.org/content/repositories/orgapachepredictionio-nnnn/

    All JIRAs completed for this release are tagged with 'FixVersion = 0.15.0'. You can view them here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320420&version=12337844

    The artifacts have been signed with Key : YOUR_KEY_ID

    Please vote accordingly:

    [ ] +1, accept RC as the official 0.15.0 release
    [ ] -1, do not accept RC as the official 0.15.0 release because...
    ```
17. After the vote has been accepted, update `RELEASE.md`.
18. Create a release tag.
19.
    Repeat steps 6 to 8 to create the official release, and step 15 to publish it.
20. Use SVN to check out https://dist.apache.org/repos/dist/release/predictionio/. This is the area for staging actual releases.
21. Create a subdirectory at the SVN staging area. The area should have a `KEYS` file.
    * `mkdir 0.15.0`
    * Copy the binary distribution from the dev/ tree to the release/ tree
    * Copy the official release to the release/ tree
22. If you have updated the `KEYS` file, also copy it to the staging area.
23. Remove old releases from the ASF distribution mirrors. (https://www.apache.org/dev/mirrors.html#location)
    * `svn delete 0.14.0`
24. `svn commit -m "Apache PredictionIO 0.15.0"`
25. Document breaking changes in https://predictionio.apache.org/resources/upgrade/.
26. Mark the version as released on JIRA. (https://issues.apache.org/jira/projects/PIO?selectedItem=com.atlassian.jira.jira-projects-plugin%3Arelease-page&status=no-filter)
27. Send out an email to the following mailing lists: announce, user, dev.
    ```
    Subject: [ANNOUNCE] Apache PredictionIO 0.15.0 Release

    The Apache PredictionIO team would like to announce the release of Apache PredictionIO 0.15.0.

    Release notes are here:
    https://github.com/apache/predictionio/blob/v0.15.0/RELEASE.md

    Apache PredictionIO is an open source Machine Learning Server built on top of a state-of-the-art open source stack that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks.

    More details regarding Apache PredictionIO can be found here:
    https://predictionio.apache.org/

    The release artifacts can be downloaded here:
    https://www.apache.org/dyn/closer.lua/predictionio/0.15.0/apache-predictionio-0.15.0-bin.tar.gz

    All JIRAs completed for this release are tagged with 'FixVersion = 0.15.0'; the JIRA release notes can be found here:
    https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320420&version=12337844

    Thanks!
    The Apache PredictionIO Team
    ```

================================================
FILE: README.md
================================================
# [Apache PredictionIO](http://predictionio.apache.org)

[![Build Status](https://api.travis-ci.org/apache/predictionio.svg?branch=develop)](https://travis-ci.org/apache/predictionio)

Apache PredictionIO is an open source machine learning framework for developers, data scientists, and end users. It supports event collection, deployment of algorithms, evaluation, and querying of predictive results via REST APIs. It is based on scalable open source services like Hadoop, HBase (and other DBs), Elasticsearch, and Spark, and implements what is called a Lambda Architecture.

To get started, check out http://predictionio.apache.org!

## Table of contents
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Bugs and Feature Requests](#bugs-and-feature-requests)
- [Documentation](#documentation)
- [Contributing](#contributing)
- [Community](#community)

## Installation

A few installation options are available:

* [Installing Apache PredictionIO from Binary/Source](http://predictionio.apache.org/install/install-sourcecode/)
* [Installing Apache PredictionIO with Docker](http://predictionio.apache.org/install/install-docker/)

## Quick Start

* [Recommendation Engine Template Quick Start](http://predictionio.apache.org/templates/recommendation/quickstart/) Guide
* [Similar Product Engine Template Quick Start](http://predictionio.apache.org/templates/similarproduct/quickstart/) Guide
* [Classification Engine Template Quick Start](http://predictionio.apache.org/templates/classification/quickstart/) Guide

## Bugs and Feature Requests

Use [Apache JIRA](https://issues.apache.org/jira/browse/PIO) to report bugs or request new features.

## Documentation

Documentation, included in this repo in the `docs/manual` directory, is built with [Middleman](http://middlemanapp.com/) and publicly hosted at [predictionio.apache.org](http://predictionio.apache.org/).

Interested in helping with our documentation? Read [Contributing Documentation](http://predictionio.apache.org/community/contribute-documentation/).

## Community

Keep track of development and community news.

* Subscribe to the user mailing list and the dev mailing list.
* Follow [@predictionio](https://twitter.com/predictionio) on Twitter.

## Contributing

Read the [Contribute Code](http://predictionio.apache.org/community/contribute-code/) page.

You can also list your projects on the [Community Project page](http://predictionio.apache.org/community/projects/).

## License

Apache PredictionIO is licensed under the [Apache 2 license](http://www.apache.org/licenses/LICENSE-2.0.html).

================================================
FILE: RELEASE.md
================================================
# Release Notes and News

**Note:** For upgrade instructions please refer to [this page](https://predictionio.apache.org/resources/upgrade/).
## Version History ### 0.14.0 Mar 11, 2019 #### Breaking changes - [PIO-168](https://issues.apache.org/jira/browse/PIO-168): Elasticsearch 6.x support (see the [pull request](https://github.com/apache/predictionio/pull/466)) #### New Features - [PIO-183](https://issues.apache.org/jira/browse/PIO-183): Add Jupyter Docker image - [PIO-199](https://issues.apache.org/jira/browse/PIO-199): Spark 2.4 (Scala 2.11) support #### Behavior Changes - [PIO-31](https://issues.apache.org/jira/browse/PIO-31): Move from spray to akka-http in servers - [PIO-171](https://issues.apache.org/jira/browse/PIO-171): Drop Scala 2.10 and Spark 1.6 support - [PIO-175](https://issues.apache.org/jira/browse/PIO-175): Deprecation of Elasticsearch 1.x support - [PIO-179](https://issues.apache.org/jira/browse/PIO-179): bump up hbase client version and make it configurable - [PIO-192](https://issues.apache.org/jira/browse/PIO-192): Enhance PySpark support - [PIO-196](https://issues.apache.org/jira/browse/PIO-196): Use external PySpark environment variables in Jupyter Docker image #### Other Changes - [PIO-153](https://issues.apache.org/jira/browse/PIO-153): Allow use of GNU tar on non-GNU systems - [PIO-170](https://issues.apache.org/jira/browse/PIO-170): Upgrade sbt to 1.x - [PIO-176](https://issues.apache.org/jira/browse/PIO-176): Clean up unmanaged sources in the data module - [PIO-182](https://issues.apache.org/jira/browse/PIO-182): Add asynchronous (non-blocking) methods to LEventStore - [PIO-188](https://issues.apache.org/jira/browse/PIO-188): Update the build matrix to the latest supported versions - [PIO-189](https://issues.apache.org/jira/browse/PIO-189): ES6 integration test fails - [PIO-194](https://issues.apache.org/jira/browse/PIO-194): S3 Model Data Storage should allow more flexible ways for specifying AWS credentials - [PIO-203](https://issues.apache.org/jira/browse/PIO-203): pio status warnings - [PIO-205](https://issues.apache.org/jira/browse/PIO-205): Update Dockerfile to reflect 
new Spark version - [PIO-206](https://issues.apache.org/jira/browse/PIO-206): Spark 2.3.2 to 2.3.3 #### Documentation - [PIO-172](https://issues.apache.org/jira/browse/PIO-172): Migration guide for ES 6.x changes - [PIO-180](https://issues.apache.org/jira/browse/PIO-180): Trivial LiveDoc Link Change in Readme - [PIO-185](https://issues.apache.org/jira/browse/PIO-185): Non-tracked Link in Apache Project page - [PIO-195](https://issues.apache.org/jira/browse/PIO-195): Improve readability and grammar of documentation #### Credits The following contributors have spent a great deal of effort to bring to you this release: Alexander Merritt, Chris Wewerka, Donald Szeto, Naoki Takezoe, Saurabh Gulati, Shinsuke Sugaya, Takako Shimamoto, Wei Chen, Yavor Stoychev ### 0.13.0 Sep 20, 2018 #### New Features - [PIO-161](https://issues.apache.org/jira/browse/PIO-161): Spark 2.3 support. #### Behavior Changes - [PIO-158](https://issues.apache.org/jira/browse/PIO-158): More officially deprecate support for Scala 2.10 and Spark 1.x. #### Other Changes - [PIO-152](https://issues.apache.org/jira/browse/PIO-152): DOAP syntax error. - [PIO-155](https://issues.apache.org/jira/browse/PIO-155): Fix 'Topic Labelling with Wikipedia' Template Link. - [PIO-156](https://issues.apache.org/jira/browse/PIO-156): Stale release on download page. - [PIO-160](https://issues.apache.org/jira/browse/PIO-160): Array out of bound exception in JDBCUtils when --env is not supplied to CreateWorkflow. #### Credits The following contributors have spent a great deal of effort to bring to you this release: Donald Szeto, Takako Shimamoto ### 0.12.1 Mar 11, 2018 #### New Features - [PIO-125](https://issues.apache.org/jira/browse/PIO-125): Add support for Spark 2.2. - [PIO-137](https://issues.apache.org/jira/browse/PIO-137): Add CleanupFunctions for Python. #### Behavior Changes - [PIO-126](https://issues.apache.org/jira/browse/PIO-126): Update install.sh to use binary release. 
- [PIO-137](https://issues.apache.org/jira/browse/PIO-137): Create a connection object at a worker to delete events. #### Other Changes - [PIO-101](https://issues.apache.org/jira/browse/PIO-101): Document usage of plug-in of event server and engine server. - [PIO-127](https://issues.apache.org/jira/browse/PIO-127): Update PMC documentation for release process. - [PIO-129](https://issues.apache.org/jira/browse/PIO-129): Move CLI document in side menu. - [PIO-131](https://issues.apache.org/jira/browse/PIO-131): Fix Apache licensing issues for doc site. - [PIO-133](https://issues.apache.org/jira/browse/PIO-133): Make sure project web site meets all requirements in Apache Project Website Branding Policy. - [PIO-135](https://issues.apache.org/jira/browse/PIO-135): Remove all incubating status. - [PIO-139](https://issues.apache.org/jira/browse/PIO-139): Update release process doc to include closing all resolved stories within the new release. - [PIO-146](https://issues.apache.org/jira/browse/PIO-146): Change TM to (R) on text marks. - [PIO-147](https://issues.apache.org/jira/browse/PIO-147): Fix broken Scala API documentation. - [PIO-150](https://issues.apache.org/jira/browse/PIO-150): Update Ruby gem dependency versions for security improvement. - [PIO-151](https://issues.apache.org/jira/browse/PIO-151): Add S3 storage provider docs. 
#### Credits The following contributors have spent a great deal of effort to bring to you this release: Chan Lee, Donald Szeto, Helene Brashear, James Ward, Jeffrey Cafferata, Mars Hall, Naoki Takezoe, Shinsuke Sugaya, Steven Yan, Takahiro Hagino, Takako Shimamoto ### 0.12.0 Sep 27, 2017 #### New Features - [PIO-61](https://issues.apache.org/jira/browse/PIO-61): S3 support for model data - [PIO-69](https://issues.apache.org/jira/browse/PIO-69), [PIO-91](https://issues.apache.org/jira/browse/PIO-91): Binary distribution of PredictionIO - [PIO-105](https://issues.apache.org/jira/browse/PIO-105), [PIO-110](https://issues.apache.org/jira/browse/PIO-110), [PIO-111](https://issues.apache.org/jira/browse/PIO-111): Batch predictions - [PIO-95](https://issues.apache.org/jira/browse/PIO-95): Raise request timeout for REST API to 35 seconds - [PIO-114](https://issues.apache.org/jira/browse/PIO-114): Basic HTTP authentication for Elasticsearch 5.x StorageClient - [PIO-116](https://issues.apache.org/jira/browse/PIO-116): PySpark support #### Breaking changes - [PIO-106](https://issues.apache.org/jira/browse/PIO-106): Elasticsearch 5.x StorageClient should reuse RestClient (see the [pull request](https://github.com/apache/predictionio/pull/421)) #### Behavior Changes - [PIO-59](https://issues.apache.org/jira/browse/PIO-59): `pio app new` uses /dev/urandom to generate entropy. - [PIO-72](https://issues.apache.org/jira/browse/PIO-72): `pio-shell` properly loads storage dependencies. - [PIO-83](https://issues.apache.org/jira/browse/PIO-83), [PIO-119](https://issues.apache.org/jira/browse/PIO-119): Default environment changed to Spark 2.1.1, Scala 2.11.8, and Elasticsearch 5.5.2. - [PIO-99](https://issues.apache.org/jira/browse/PIO-99): `pio-build` checks for compilation errors before proceeding to build engine. - [PIO-100](https://issues.apache.org/jira/browse/PIO-100): `pio` commands no longer display SLF4J warning messages.
#### Other Changes - [PIO-56](https://issues.apache.org/jira/browse/PIO-56): Core unit tests no longer require metadata setup. - [PIO-60](https://issues.apache.org/jira/browse/PIO-60), [PIO-62](https://issues.apache.org/jira/browse/PIO-62): Minor fixes in authorship information and license checking. - [PIO-63](https://issues.apache.org/jira/browse/PIO-63): Apache incubator logo and disclaimer are displayed on the website. - [PIO-65](https://issues.apache.org/jira/browse/PIO-65): Integration tests on Travis cache downloaded jars. - [PIO-66](https://issues.apache.org/jira/browse/PIO-66): More detailed documentation regarding release process and adding JIRA tickets. - [PIO-90](https://issues.apache.org/jira/browse/PIO-90): Improved performance for /batch/events.json API call. - [PIO-94](https://issues.apache.org/jira/browse/PIO-94): More detailed stack trace for REST API errors. - [PIO-97](https://issues.apache.org/jira/browse/PIO-97): Update examples in official templates. - [PIO-102](https://issues.apache.org/jira/browse/PIO-102), [PIO-117](https://issues.apache.org/jira/browse/PIO-117), [PIO-118](https://issues.apache.org/jira/browse/PIO-118), [PIO-120](https://issues.apache.org/jira/browse/PIO-120): Bug fixes, refactoring, and improved performance on Elasticsearch behavior. - [PIO-104](https://issues.apache.org/jira/browse/PIO-104): Bug fix regarding plugins. - [PIO-107](https://issues.apache.org/jira/browse/PIO-107): Obsolete experimental examples are removed.
#### Credits The following contributors have spent a great deal of effort to bring to you this release: Aayush Kumar, Chan Lee, Donald Szeto, Hugo Duksis, Juha Syrjälä, Lucas Bonatto, Marius Rabenarivo, Mars Hall, Naoki Takezoe, Nilmax Moura, Shinsuke Sugaya, Tomasz Stęczniewski, Vaghawan Ojha ### 0.11.0 Apr 25, 2017 #### New Features - [PIO-30](https://issues.apache.org/jira/browse/PIO-30): Scala 2.11 support - [PIO-30](https://issues.apache.org/jira/browse/PIO-30): Spark 2 support - [PIO-49](https://issues.apache.org/jira/browse/PIO-49): Elasticsearch 5 support - [PIO-30](https://issues.apache.org/jira/browse/PIO-30), [PIO-49](https://issues.apache.org/jira/browse/PIO-49): Flexible build system - [PIO-47](https://issues.apache.org/jira/browse/PIO-47), [PIO-51](https://issues.apache.org/jira/browse/PIO-51): Removal of engine manifests - [PIO-49](https://issues.apache.org/jira/browse/PIO-49): Modularized storage backend modules - [PIO-45](https://issues.apache.org/jira/browse/PIO-45): Self cleaning data source #### Behavior Changes - [PIO-25](https://issues.apache.org/jira/browse/PIO-25): `pio-start-all` will no longer start PostgreSQL if it is not being used. - [PIO-47](https://issues.apache.org/jira/browse/PIO-47), [PIO-51](https://issues.apache.org/jira/browse/PIO-51): `pio build` no longer requires access to the metadata repository. `pio` commands will now accept an optional `--engine-dir` parameter if you want to run `pio build`, `pio train`, or `pio deploy` outside of an engine directory. This is an interim solution before an engine registry feature becomes available in the future. - [PIO-49](https://issues.apache.org/jira/browse/PIO-49): PostgreSQL JDBC driver is no longer bundled with the core assembly. If you are using PostgreSQL, you must download the JDBC driver and update your configuration to point to the correct JDBC driver file. 
- [PIO-54](https://issues.apache.org/jira/browse/PIO-54): Newly generated access keys will no longer start with a `-` character. #### Other Changes - [PIO-28](https://issues.apache.org/jira/browse/PIO-28): Code refactoring of the command line interface. It is now possible to develop new interfaces that provide the same functionality as the CLI. - [PIO-53](https://issues.apache.org/jira/browse/PIO-53): Integration tests can now be tied to every single Git commit, without the need to update the official test Docker image. - The metadata and model data access object methods are now public and marked as Core Developer API. #### Credits The following contributors have spent a great deal of effort to bring to you this release: Ahmet DAL, Alexander Merritt, Amy Lin, Bansari Shah, Chan Lee, Chris Woodford, Daniel Gabrieli, Dennis Jung, Donald Szeto, Emily Rose, Hari Charan Ayada, infoquestsolutions, Jonny Daenen, Kenneth Chan, Laertis Pappas, Marcin Ziemiński, Naoki Takezoe, Rajdeep Dua, Shinsuke Sugaya, Pat Ferrel, scorpiovn, Suneel Marthi, Steven Yan, Takahiro Hagino, Takako Shimamoto ### 0.10.0 Oct 7, 2016 - Make SSL optional - Merge ActionML fork - First Apache PredictionIO release ### 0.9.7-aml (ActionML fork) Aug 5, 2016 - changed version id so artifacts don't conflict with naming in the Salesforce sponsored project. - bug fix in memory use during moving window event trim and compaction of EventStore data. - update [install.sh](https://github.com/actionml/PredictionIO/blob/master/bin/install.sh) script for single line installs with options that support various combinations required by some templates. ### 0.9.6 April 11, 2016 For a detailed list of commits check [this page](https://github.com/apache/predictionio/commits/v0.9.6) - Upgrade components for install/runtime to HBase 1 and Spark 1.5.2. PIO still runs on older HBase and Spark back to 1.3.1. Upgrade the installed Elasticsearch to 1.5.2, since PIO runs well on it, though PIO also runs on older versions.
- Support for maintaining a moving window of events by discarding old events from the EventStore - Support for doing a deploy without creating a Spark Context ### 0.9.6 (ActionML fork) March 26, 2016 - Upgrade components for install/runtime to HBase 1.X and Spark 1.5.2. PIO still runs on older HBase and Spark back to 1.3.1. Upgrade the installed Elasticsearch to 1.5.2, since PIO runs well on it, though PIO also runs on older versions. - Support for maintaining a moving window of events by discarding old events from the EventStore - Support for doing a deploy without creating a Spark Context ### 0.9.5 October 14th, 2015 [Release Notes](https://github.com/apache/predictionio/blob/master/RELEASE.md) have been moved to GitHub and you are reading them. For a detailed list of commits check [this page](https://github.com/apache/predictionio/commits/v0.9.5) - Support batches of events sent to the EventServer as JSON arrays - Support creating an Elasticsearch StorageClient for an Elasticsearch cluster from variables in pio-env.sh - Fixed some errors installing PredictionIO through install.sh when SBT was not correctly downloaded ### 0.9.4 July 16th, 2015 Release Notes - Support event permissions with different access keys at the Event Server interface - Support detection of 3rd party Apache Spark distributions - Support running `pio eval` without `engine.json` - Fix an issue where `--verbose` is not handled properly by `pio train` ### 0.9.3 May 20th, 2015 Release Notes - Add support for developing prediction engines in Java - Add support for PostgreSQL and MySQL - Spark 1.3.1 compatibility fix - Creation of specific app access keys - Prevent a case where `pio build` accidentally removes PredictionIO core library ### 0.9.2 April 14th, 2015 Release Notes - Channels in the Event Server - Spark 1.3+ support (upgrade to Spark 1.3+ required) - [Webhook Connector](http://predictionio.apache.org/community/contribute-webhook/) support - Engine and Event Servers now by default bind to 0.0.0.0
- Many documentation improvements ### 0.9.1 March 17th, 2015 Release Notes - Improved `pio-start-all` - Fixed a bug where `pio build` failed to set PredictionIO dependency version for engine templates ### 0.9.0 March 4th, 2015 Release Notes - [E-Commerce Recommendation Template](http://predictionio.apache.org/gallery/template-gallery#recommender-systems) which includes 1) out-of-stock items support, 2) new user recommendation, and 3) unseen items only - [Complementary Purchase Template](http://predictionio.apache.org/gallery/template-gallery#unsupervised-learning) for shopping cart recommendation - [Lead Scoring Template](http://predictionio.apache.org/gallery/template-gallery#classification) predicts the probability that a user will convert in the current session - `pio-start-all`, `pio-stop-all` commands to start and stop all PredictionIO related services ### 0.8.6 Feb 10th, 2015 Release Notes - New engine template - [Product Ranking](/templates/productranking/quickstart/) for personalized product listing - [CloudFormation deployment](/system/deploy-cloudformation/) available ================================================ FILE: assembly/build.sbt ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ import NativePackagerHelper._ import RpmConstants._ import com.typesafe.sbt.packager.linux.LinuxSymlink enablePlugins(RpmPlugin, DebianPlugin) name := "predictionio" maintainer in Linux := "Apache Software Foundation" packageSummary in Linux := "Apache PredictionIO" packageDescription := "Apache PredictionIO is an open source Machine Learning Server " + "built on top of a state-of-the-art open source stack for developers " + "and data scientists to create predictive engines for any machine learning task." version in Rpm := version.value.replace("-", "_") rpmRelease := "1" rpmVendor := "apache" rpmGroup := Some("Applications/System") rpmUrl := Some("http://predictionio.apache.org/") rpmLicense := Some("Apache License Version 2.0") maintainerScripts in Rpm := maintainerScriptsAppendFromFile((maintainerScripts in Rpm).value)( Pre -> (sourceDirectory.value / "rpm" / "scriptlets" / "preinst"), Postun -> (sourceDirectory.value / "rpm" / "scriptlets" / "postun") ) mappings in Universal ++= { val releaseFile = baseDirectory.value / ".." / "RELEASE.md" val buildPropFile = baseDirectory.value / ".." / "project" / "build.properties" val sbtFile = baseDirectory.value / ".." / "sbt" / "sbt" Seq(releaseFile -> "RELEASE", buildPropFile -> "project/build.properties", sbtFile -> "sbt/sbt") } mappings in Universal ++= { val files = IO.listFiles(baseDirectory.value / ".." / "conf") files filterNot { f => f.getName.endsWith(".travis") } map { case f if f.getName equals "pio-env.sh.template" => f -> "conf/pio-env.sh" case f => f -> s"conf/${f.getName}" } toSeq } mappings in Universal ++= { val files = IO.listFiles(baseDirectory.value / ".."
/ "bin") files map { f => f -> s"bin/${f.getName}" } toSeq } linuxPackageMappings := { val mappings = linuxPackageMappings.value mappings map { linuxPackage => val linuxFileMappings = linuxPackage.mappings map { case (f, n) if f.getName equals "conf" => f -> s"/etc/${name.value}" case (f, n) if f.getName equals "pio-env.sh.template" => f -> s"/etc/${name.value}/pio-env.sh" case (f, n) if f.getParent endsWith "conf" => f -> s"/etc/${name.value}/${f.getName}" case (f, n) if f.getName equals "log" => f -> s"/var/log/${name.value}" case (f, n) if f.getName equals "pio.log" => f -> s"/var/log/${name.value}/pio.log" case (f, n) => f -> n } val fileData = linuxPackage.fileData.copy( user = s"${name.value}", group = s"${name.value}" ) linuxPackage.copy( mappings = linuxFileMappings, fileData = fileData ) } } linuxPackageSymlinks := { Seq(LinuxSymlink("/usr/bin/pio", s"/usr/share/${name.value}/bin/pio"), LinuxSymlink("/usr/bin/pio-daemon", s"/usr/share/${name.value}/bin/pio-daemon")) } ================================================ FILE: assembly/src/debian/DEBIAN/postrm ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
# REMOVE_USER_AND_GROUP=false case "$1" in remove) ;; purge) REMOVE_USER_AND_GROUP=true ;; failed-upgrade|abort-install|abort-upgrade|disappear|upgrade) ;; *) echo "post remove script called with unknown argument \`$1'" >&2 exit 1 ;; esac if [ "$REMOVE_USER_AND_GROUP" = "true" ]; then if id "predictionio" > /dev/null 2>&1 ; then userdel "predictionio" fi if getent group "predictionio" > /dev/null 2>&1 ; then groupdel "predictionio" fi fi ================================================ FILE: assembly/src/debian/DEBIAN/preinst ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # if ! getent group "predictionio" > /dev/null 2>&1 ; then echo -n "Creating predictionio group..." addgroup --quiet --system "predictionio" echo " OK" fi if ! id predictionio > /dev/null 2>&1 ; then echo -n "Creating predictionio user..."
adduser --quiet \ --system \ --no-create-home \ --ingroup "predictionio" \ --disabled-password \ --shell /bin/false \ --home "/usr/share/predictionio" \ "predictionio" echo " OK" fi ================================================ FILE: assembly/src/rpm/scriptlets/postun ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # REMOVE_USER_AND_GROUP=false case "$1" in 0) REMOVE_USER_AND_GROUP=true ;; 1) ;; *) echo "post remove script called with unknown argument \`$1'" >&2 exit 1 ;; esac if [ "$REMOVE_USER_AND_GROUP" = "true" ]; then if id "predictionio" > /dev/null 2>&1 ; then userdel "predictionio" fi if getent group "predictionio" > /dev/null 2>&1 ; then groupdel "predictionio" fi fi ================================================ FILE: assembly/src/rpm/scriptlets/preinst ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. 
You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # if ! getent group "predictionio" > /dev/null 2>&1 ; then echo -n "Creating predictionio group..." groupadd -r "predictionio" echo " OK" fi if ! id predictionio > /dev/null 2>&1 ; then echo -n "Creating predictionio user..." useradd --system \ -M \ --gid "predictionio" \ --shell /sbin/nologin \ --comment "predictionio user" \ -d "/usr/share/predictionio" \ "predictionio" echo " OK" fi ================================================ FILE: bin/cjson ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License.
# curl -H "Content-Type: application/json" -d "$1" $2 ================================================ FILE: bin/compute-classpath.sh ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # Figure out where PredictionIO is installed FWDIR="$(cd `dirname $0`/..; pwd)" . 
. ${FWDIR}/bin/load-pio-env.sh if [ -n "$JAVA_HOME" ]; then JAR_CMD="$JAVA_HOME/bin/jar" else JAR_CMD="jar" fi # Use pio-assembly JAR from either RELEASE or assembly directory if [ -f "${FWDIR}/RELEASE" ]; then assembly_folder="${FWDIR}"/lib else assembly_folder="${FWDIR}"/assembly/src/universal/lib fi MAIN_JAR=$(ls "${assembly_folder}"/pio-assembly*.jar 2>/dev/null) DATA_JARS=$(ls "${assembly_folder}"/spark/pio-data-*assembly*.jar 2>/dev/null) # Comma-separated list of assembly jars for submitting to spark-shell ASSEMBLY_JARS=$(printf "${MAIN_JAR}\n${DATA_JARS}" | paste -sd "," -) # Build up classpath CLASSPATH="${PIO_CONF_DIR}" # stable classpath for plugin JARs if [ -d "${FWDIR}/plugins" ]; then lib_plugin_jars=`ls "${FWDIR}"/plugins/*` lib_plugin_classpath='' for J in $lib_plugin_jars; do lib_plugin_classpath="${lib_plugin_classpath}:${J}" done CLASSPATH="$CLASSPATH${lib_plugin_classpath}" fi # stable classpath for Spark JARs lib_spark_jars=`ls "${assembly_folder}"/spark/*.jar` lib_spark_classpath='' for J in $lib_spark_jars; do lib_spark_classpath="${lib_spark_classpath}:${J}" done CLASSPATH="$CLASSPATH${lib_spark_classpath}" CLASSPATH="$CLASSPATH:${MAIN_JAR}" # Add the Hadoop conf dir if given; otherwise FileSystem.* etc. fail! Note: this # assumes that there is either a HADOOP_CONF_DIR or YARN_CONF_DIR which hosts # the configuration files.
if [ -n "$HADOOP_CONF_DIR" ]; then CLASSPATH="$CLASSPATH:$HADOOP_CONF_DIR" fi if [ -n "$YARN_CONF_DIR" ]; then CLASSPATH="$CLASSPATH:$YARN_CONF_DIR" fi if [ -n "$HBASE_CONF_DIR" ]; then CLASSPATH="$CLASSPATH:$HBASE_CONF_DIR" fi if [ -n "$ES_CONF_DIR" ]; then CLASSPATH="$CLASSPATH:$ES_CONF_DIR" fi if [ -n "$POSTGRES_JDBC_DRIVER" ]; then CLASSPATH="$CLASSPATH:$POSTGRES_JDBC_DRIVER" ASSEMBLY_JARS="$ASSEMBLY_JARS,$POSTGRES_JDBC_DRIVER" fi if [ -n "$MYSQL_JDBC_DRIVER" ]; then CLASSPATH="$CLASSPATH:$MYSQL_JDBC_DRIVER" ASSEMBLY_JARS="$ASSEMBLY_JARS,$MYSQL_JDBC_DRIVER" fi echo "$CLASSPATH" ================================================ FILE: bin/install.sh ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
# OS=`uname` SPARK_VERSION=2.1.1 # Looks like support for Elasticsearch 2.0 will require 2.0 so deferring ELASTICSEARCH_VERSION=5.6.9 HBASE_VERSION=1.2.6 POSTGRES_VERSION=42.0.0 MYSQL_VERSION=5.1.41 PIO_DIR=$HOME/PredictionIO USER_PROFILE=$HOME/.profile PIO_FILE=PredictionIO-*.tar.gz TEMP_DIR=/tmp DISTRO_DEBIAN="Debian/Ubuntu" DISTRO_OTHER="Other" PGSQL="PostgreSQL" MYSQL="MySQL" ES_PGSQL="Elasticsearch + PostgreSQL" ES_HB="Elasticsearch + HBase" # Ask a yes/no question, with a default of "yes". confirm () { echo -ne $@ "[Y/n] " read -r response case ${response} in [yY][eE][sS]|[yY]|"") true ;; [nN][oO]|[nN]) false ;; *) confirm $@ ;; esac } echo -e "\033[1;32mWelcome to PredictionIO !\033[0m" # Detect OS if [[ "$OS" = "Darwin" ]]; then echo "Mac OS detected!" SED_CMD="sed -i ''" elif [[ "$OS" = "Linux" ]]; then echo "Linux OS detected!" SED_CMD="sed -i" else echo -e "\033[1;31mYour OS $OS is not yet supported for automatic install :(\033[0m" echo -e "\033[1;31mPlease do a manual install!\033[0m" exit 1 fi if [[ $USER ]]; then echo "Using user: $USER" else echo "No user found - this is OK!" fi if [[ "$OS" = "Linux" && $(cat /proc/1/cgroup) == *cpu:/docker/* ]]; then # Docker # REQUIRED: No user input for Docker! 
echo -e "\033[1;33mDocker detected!\033[0m" echo -e "\033[1;33mForcing Docker defaults!\033[0m" pio_dir=${PIO_DIR} vendors_dir=${pio_dir}/vendors spark_dir=${vendors_dir}/spark-${SPARK_VERSION} elasticsearch_dir=${vendors_dir}/elasticsearch-${ELASTICSEARCH_VERSION} hbase_dir=${vendors_dir}/hbase-${HBASE_VERSION} zookeeper_dir=${vendors_dir}/zookeeper echo "--------------------------------------------------------------------------------" echo -e "\033[1;32mOK, looks good!\033[0m" echo "You are going to install PredictionIO to: $pio_dir" echo -e "Vendor applications will go in: $vendors_dir\n" echo "Spark: $spark_dir" echo "Elasticsearch: $elasticsearch_dir" echo "HBase: $hbase_dir" echo "ZooKeeper: $zookeeper_dir" echo "--------------------------------------------------------------------------------" # Java Install echo -e "\033[1;36mStarting Java install...\033[0m" sudo add-apt-repository ppa:openjdk-r/ppa sudo apt-get update sudo apt-get install openjdk-8-jdk libgfortran3 -y echo -e "\033[1;32mJava install done!\033[0m" JAVA_HOME=$(readlink -f /usr/bin/javac | sed "s:/bin/javac::") elif [[ "$1" == "-y" ]]; then # Non-interactive echo -e "\033[1;33mNon-interactive installation requested!\033[0m" echo -e "\033[1;33mForcing defaults!\033[0m" pio_dir=${PIO_DIR} vendors_dir=${pio_dir}/vendors source_setup=${ES_HB} spark_dir=${vendors_dir}/spark-${SPARK_VERSION} elasticsearch_dir=${vendors_dir}/elasticsearch-${ELASTICSEARCH_VERSION} hbase_dir=${vendors_dir}/hbase-${HBASE_VERSION} zookeeper_dir=${vendors_dir}/zookeeper echo "--------------------------------------------------------------------------------" echo -e "\033[1;32mOK, looks good!\033[0m" echo "You are going to install PredictionIO to: $pio_dir" echo -e "Vendor applications will go in: $vendors_dir\n" echo "Spark: $spark_dir" echo "Elasticsearch: $elasticsearch_dir" echo "HBase: $hbase_dir" echo "ZooKeeper: $zookeeper_dir" echo "--------------------------------------------------------------------------------" # 
Java Install echo -e "\033[1;36mStarting Java install...\033[0m" # todo: make java installation platform independent sudo add-apt-repository ppa:openjdk-r/ppa sudo apt-get update sudo apt-get install openjdk-8-jdk libgfortran3 python-pip -y sudo pip install predictionio echo -e "\033[1;32mJava install done!\033[0m" JAVA_HOME=$(readlink -f /usr/bin/javac | sed "s:/bin/javac::") else # Interactive while true; do echo -e "\033[1mWhere would you like to install PredictionIO?\033[0m" read -e -p "Installation path ($PIO_DIR): " pio_dir pio_dir=${pio_dir:-$PIO_DIR} read -e -p "Vendor path ($pio_dir/vendors): " vendors_dir vendors_dir=${vendors_dir:-$pio_dir/vendors} echo -e "\033[1mPlease choose between the following sources (1, 2, 3 or 4):\033[0m" select source_setup in "$PGSQL" "$MYSQL" "$ES_PGSQL" "$ES_HB"; do case ${source_setup} in "$PGSQL") break ;; "$MYSQL") break ;; "$ES_PGSQL") break ;; "$ES_HB") break ;; *) ;; esac done spark_dir=${vendors_dir}/spark-${SPARK_VERSION} elasticsearch_dir=${vendors_dir}/elasticsearch-${ELASTICSEARCH_VERSION} hbase_dir=${vendors_dir}/hbase-${HBASE_VERSION} zookeeper_dir=${vendors_dir}/zookeeper echo "--------------------------------------------------------------------------------" echo -e "\033[1;32mOK, looks good!\033[0m" echo "You are going to install PredictionIO to: $pio_dir" echo -e "Vendor applications will go in: $vendors_dir\n" echo "Spark: $spark_dir" case $source_setup in "$PGSQL") # PostgreSQL installed by apt-get so no path is printed beforehand break ;; "$MYSQL") # MySQL installed by apt-get so no path is printed beforehand break ;; "$ES_PGSQL") # PostgreSQL installed by apt-get so no path is printed beforehand echo "Elasticsearch: $elasticsearch_dir" break ;; "$ES_HB") echo "Elasticsearch: $elasticsearch_dir" echo "HBase: $hbase_dir" echo "ZooKeeper: $zookeeper_dir" break ;; esac echo "--------------------------------------------------------------------------------" if confirm "\033[1mIs this correct?\033[0m"; then 
break; fi done echo -e "\033[1mSelect your linux distribution:\033[0m" select distribution in "$DISTRO_DEBIAN" "$DISTRO_OTHER"; do case $distribution in "$DISTRO_DEBIAN") break ;; "$DISTRO_OTHER") break ;; *) ;; esac done # Java Install if [[ ${OS} = "Linux" ]] && confirm "\033[1mWould you like to install Java?\033[0m"; then case ${distribution} in "$DISTRO_DEBIAN") echo -e "\033[1;36mStarting Java install...\033[0m" echo -e "\033[33mThis script requires superuser access!\033[0m" echo -e "\033[33mYou will be prompted for your password by sudo:\033[0m" sudo add-apt-repository ppa:openjdk-r/ppa sudo apt-get update sudo apt-get install openjdk-8-jdk libgfortran3 python-pip -y sudo pip install predictionio echo -e "\033[1;32mJava install done!\033[0m" break ;; "$DISTRO_OTHER") echo -e "\033[1;31mYour distribution not yet supported for automatic install :(\033[0m" echo -e "\033[1;31mPlease install Java manually!\033[0m" exit 2 ;; *) ;; esac fi # Try to find JAVA_HOME echo "Locating JAVA_HOME..." if [[ "$OS" = "Darwin" ]]; then JAVA_VERSION=`echo "$(java -version 2>&1)" | grep "java version" | awk '{ print substr($3, 2, length($3)-2); }'` JAVA_HOME=`/usr/libexec/java_home` elif [[ "$OS" = "Linux" ]]; then JAVA_HOME=$(readlink -f /usr/bin/javac | sed "s:/bin/javac::") fi echo "Found: $JAVA_HOME" # Check JAVA_HOME while [ ! -f "$JAVA_HOME/bin/javac" ]; do echo -e "\033[1;31mJAVA_HOME is incorrect!\033[0m" echo -e "\033[1;33mJAVA_HOME should be a directory containing \"bin/javac\"!\033[0m" read -e -p "Please enter JAVA_HOME manually: " JAVA_HOME done; fi if [ -n "$JAVA_VERSION" ]; then echo "Your Java version is: $JAVA_VERSION" fi echo "JAVA_HOME is now set to: $JAVA_HOME" # PredictionIO echo -e "\033[1;36mStarting PredictionIO setup in:\033[0m $pio_dir" cd ${TEMP_DIR} files=$(ls PredictionIO*.tar.gz 2> /dev/null | wc -l) if [[ $files == 0 ]]; then echo "Downloading PredictionIO..." 
curl -L https://dist.apache.org/repos/dist/release/predictionio/0.12.1/apache-predictionio-0.12.1-bin.tar.gz > predictionio-release.tar.gz tar zxf predictionio-0.12.1.tar.gz mv predictionio-0.12.1 PredictionIO sh PredictionIO/make-distribution.sh cp PredictionIO/${PIO_FILE} ${TEMP_DIR} rm -r PredictionIO fi tar zxf ${PIO_FILE} rm -rf ${pio_dir} mv PredictionIO*/ ${pio_dir} if [[ $USER ]]; then chown -R $USER ${pio_dir} fi echo "Updating ~/.profile to include: $pio_dir" PATH=$PATH:${pio_dir}/bin echo "export PATH=\$PATH:$pio_dir/bin" >> ${USER_PROFILE} echo -e "\033[1;32mPredictionIO setup done!\033[0m" mkdir -p ${vendors_dir} # Spark echo -e "\033[1;36mStarting Spark setup in:\033[0m $spark_dir" if [[ ! -e spark-${SPARK_VERSION}-bin-hadoop2.6.tgz ]]; then echo "Downloading Spark..." curl -O http://www-us.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.6.tgz fi tar xf spark-${SPARK_VERSION}-bin-hadoop2.6.tgz rm -rf ${spark_dir} mv spark-${SPARK_VERSION}-bin-hadoop2.6 ${spark_dir} echo "Updating: $pio_dir/conf/pio-env.sh" ${SED_CMD} "s|SPARK_HOME=.*|SPARK_HOME=$spark_dir|g" ${pio_dir}/conf/pio-env.sh echo -e "\033[1;32mSpark setup done!\033[0m" installPGSQL () { if [[ ${distribution} = "$DISTRO_DEBIAN" ]]; then echo -e "\033[1;36mInstalling PostgreSQL...\033[0m" sudo apt-get install postgresql-9.4 -y echo -e "\033[1;36mPlease use the default password 'pio' when prompted to enter one\033[0m" sudo -u postgres createdb pio sudo -u postgres createuser -P pio echo -e "\033[1;36mPlease update $pio_dir/conf/pio-env.sh if you did not enter the default password\033[0m" else echo -e "\033[1;31mYour distribution not yet supported for automatic install :(\033[0m" echo -e "\033[1;31mPlease install PostgreSQL manually!\033[0m" fi curl -O https://jdbc.postgresql.org/download/postgresql-${POSTGRES_VERSION}.jar mv postgresql-${POSTGRES_VERSION}.jar ${pio_dir}/lib/ echo -e "\033[1;32mPGSQL setup done!\033[0m" } installES() { echo -e "\033[1;36mStarting 
Elasticsearch setup in:\033[0m $elasticsearch_dir" if [[ -e elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz ]]; then if confirm "Delete existing elasticsearch-$ELASTICSEARCH_VERSION.tar.gz?"; then rm elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz fi fi if [[ ! -e elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz ]]; then echo "Downloading Elasticsearch..." curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz fi tar zxf elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz rm -rf ${elasticsearch_dir} mv elasticsearch-${ELASTICSEARCH_VERSION} ${elasticsearch_dir} echo "Updating: $elasticsearch_dir/config/elasticsearch.yml" echo 'network.host: 127.0.0.1' >> ${elasticsearch_dir}/config/elasticsearch.yml } case $source_setup in "$PGSQL") installPGSQL ;; "$ES_PGSQL") installES installPGSQL echo "Updating: $pio_dir/conf/pio-env.sh" ${SED_CMD} "s|PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL|PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|# PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE|PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=.*|PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$elasticsearch_dir|" ${pio_dir}/conf/pio-env.sh ;; "$MYSQL") if [[ ${distribution} = "$DISTRO_DEBIAN" ]]; then echo -e "\033[1;36mInstalling MySQL...\033[0m" echo -e "\033[1;36mPlease update $pio_dir/conf/pio-env.sh with your database configuration\033[0m" sudo apt-get install mysql-server -y sudo mysql -e "create database pio; grant all on pio.* to pio@localhost identified by 'pio'" echo -e "\033[1;36mUpdating: $pio_dir/conf/pio-env.sh\033[0m" ${SED_CMD} "s|PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL|PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL|PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} 
"s|PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL|PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|PIO_STORAGE_SOURCES_PGSQL|# PIO_STORAGE_SOURCES_PGSQL|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|# PIO_STORAGE_SOURCES_MYSQL|PIO_STORAGE_SOURCES_MYSQL|" ${pio_dir}/conf/pio-env.sh else echo -e "\033[1;31mYour distribution not yet supported for automatic install :(\033[0m" echo -e "\033[1;31mPlease install MySQL manually!\033[0m" exit 4 fi curl -O http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.37/mysql-connector-java-${MYSQL_VERSION}.jar mv mysql-connector-java-${MYSQL_VERSION}.jar ${pio_dir}/lib/ ;; "$ES_HB") # Elasticsearch installES echo "Updating: $pio_dir/conf/pio-env.sh" ${SED_CMD} "s|PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL|PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL|PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL|PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|PIO_STORAGE_SOURCES_PGSQL|# PIO_STORAGE_SOURCES_PGSQL|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|# PIO_STORAGE_SOURCES_LOCALFS|PIO_STORAGE_SOURCES_LOCALFS|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|# PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE|PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=.*|PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$elasticsearch_dir|" ${pio_dir}/conf/pio-env.sh echo -e "\033[1;32mElasticsearch setup done!\033[0m" # HBase echo -e "\033[1;36mStarting HBase setup in:\033[0m $hbase_dir" if [[ ! -e hbase-${HBASE_VERSION}-bin.tar.gz ]]; then echo "Downloading HBase..." 
curl -O http://archive.apache.org/dist/hbase/${HBASE_VERSION}/hbase-${HBASE_VERSION}-bin.tar.gz fi tar zxf hbase-${HBASE_VERSION}-bin.tar.gz rm -rf ${hbase_dir} mv hbase-${HBASE_VERSION} ${hbase_dir} echo "Creating default site in: $hbase_dir/conf/hbase-site.xml" cat <<EOT > ${hbase_dir}/conf/hbase-site.xml <configuration> <property> <name>hbase.rootdir</name> <value>file://${hbase_dir}/data</value> </property> <property> <name>hbase.zookeeper.property.dataDir</name> <value>${zookeeper_dir}</value> </property> </configuration> EOT echo "Updating: $hbase_dir/conf/hbase-env.sh to include $JAVA_HOME" ${SED_CMD} "s|# export JAVA_HOME=/usr/java/jdk1.6.0/|export JAVA_HOME=$JAVA_HOME|" ${hbase_dir}/conf/hbase-env.sh echo "Updating: $pio_dir/conf/pio-env.sh" ${SED_CMD} "s|# PIO_STORAGE_SOURCES_HBASE|PIO_STORAGE_SOURCES_HBASE|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|PIO_STORAGE_SOURCES_HBASE_HOME=.*|PIO_STORAGE_SOURCES_HBASE_HOME=$hbase_dir|" ${pio_dir}/conf/pio-env.sh ${SED_CMD} "s|# HBASE_CONF_DIR=.*|HBASE_CONF_DIR=$hbase_dir/conf|" ${pio_dir}/conf/pio-env.sh echo -e "\033[1;32mHBase setup done!\033[0m" ;; esac echo "Updating permissions on: $vendors_dir" if [[ $USER ]]; then chown -R $USER ${vendors_dir} fi echo -e "\033[1;32mInstallation done!\033[0m" echo "--------------------------------------------------------------------------------" echo -e "\033[1;32mInstallation of PredictionIO complete!\033[0m" echo -e "\033[1;32mPlease follow documentation at http://predictionio.apache.org/start/download/ to download the engine template based on your needs\033[0m" echo -e echo -e "\033[1;33mCommand Line Usage Notes:\033[0m" if [[ ${source_setup} = $ES_HB ]]; then echo -e "To start PredictionIO and dependencies, run: '\033[1mpio-start-all\033[0m'" else echo -e "To start PredictionIO Event Server in the background, run: '\033[1mpio eventserver &\033[0m'" fi echo -e "To check the PredictionIO status, run: '\033[1mpio status\033[0m'" echo -e "To train/deploy engine, run: '\033[1mpio [train|deploy|...]\033[0m' commands" if [[ ${source_setup} = $ES_HB ]]; then echo -e "To stop PredictionIO and dependencies, run: 
'\033[1mpio-stop-all\033[0m'" fi echo -e "" echo -e "Please report any problems to the user mailing list." echo -e "User mailing list instructions: \033[1;34mhttp://predictionio.apache.org/support/\033[0m" echo "--------------------------------------------------------------------------------" ================================================ FILE: bin/load-pio-env.sh ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # This script loads pio-env.sh if it exists, and ensures it is only loaded once. # pio-env.sh is loaded from PIO_CONF_DIR if set, or within the current # directory's conf/ subdirectory. if [ -z "$PIO_ENV_LOADED" ]; then export PIO_ENV_LOADED=1 # Returns the parent of the directory this script lives in. parent_dir="$(cd `dirname $0`/..; pwd)" use_conf_dir=${PIO_CONF_DIR:-"${parent_dir}/conf"} if [ -f "${use_conf_dir}/pio-env.sh" ]; then # Promote all variable declarations to environment (exported) variables set -a . "${use_conf_dir}/pio-env.sh" set +a else echo -e "\033[0;35mWarning: pio-env.sh was not found in ${use_conf_dir}. 
Using system environment variables instead.\033[0m\n" fi fi ================================================ FILE: bin/pio ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # search() { local i=0; local needle=$1; shift for str in $@; do if [ "${str}" = "$needle" ]; then echo ${i} return else ((i++)) fi done echo ${i} } if [ -z $PIO_HOME ] ; then PIO_FILE=$(readlink -f $0 2>/dev/null) if [ $? = 0 ] ; then export PIO_HOME="$(cd $(dirname $PIO_FILE)/..; pwd)" else CURRENT_DIR=`pwd` TARGET_FILE="$0" cd "$(dirname "$TARGET_FILE")" TARGET_FILE=$(basename "$TARGET_FILE") while [ -L "$TARGET_FILE" ] do TARGET_FILE=$(readlink "$TARGET_FILE") cd "$(dirname "$TARGET_FILE")" TARGET_FILE=$(basename "$TARGET_FILE") done export PIO_HOME="$(cd $(dirname "$TARGET_FILE")/..; pwd -P)" cd "$CURRENT_DIR" fi fi if [ -z $PIO_CONF_DIR ] ; then export PIO_CONF_DIR="${PIO_HOME}/conf" if [ ! -d $PIO_CONF_DIR ] ; then export PIO_CONF_DIR="/etc/predictionio" if [ ! -d $PIO_CONF_DIR ] ; then echo "PIO_CONF_DIR is not found." 
exit 1 fi fi fi FIRST_SEP=$(search "--" $@) FIRST_HALF="${@:1:$FIRST_SEP}" SECOND_HALF="${@:$FIRST_SEP+1}" exec ${PIO_HOME}/bin/pio-class org.apache.predictionio.tools.console.Console ${FIRST_HALF} --pio-home ${PIO_HOME} ${SECOND_HALF} ================================================ FILE: bin/pio-class ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # cygwin=false case "`uname`" in CYGWIN*) cygwin=true;; esac # Figure out where PredictionIO is installed FWDIR="$(cd `dirname $0`/..; pwd)" # Export this as PIO_HOME export PIO_HOME="${FWDIR}" . ${FWDIR}/bin/load-pio-env.sh . ${FWDIR}/bin/semver.sh if [ -z "$1" ]; then echo "Usage: pio-class <class> [<args>]" 1>&2 exit 1 fi # Warn if log4j.properties is not present if [ ! 
-f "$PIO_CONF_DIR/log4j.properties" ]; then echo -e "\033[0;35mWarning: log4j.properties is missing from $PIO_CONF_DIR\033[0m" fi # Make sure the Apache Spark version meets the prerequisite if it is a binary # distribution MIN_SPARK_VERSION="2.0.2" if [ -z "$SPARK_HOME" ]; then echo -e "\033[0;31mSPARK_HOME must be set in conf/pio-env.sh, or in the environment!\033[0m" exit 1 elif [ -r "$SPARK_HOME/RELEASE" ]; then SPARK_VERSION=`head -n 1 $SPARK_HOME/RELEASE | awk '{print $2}'` if [ -z "$SPARK_VERSION" ]; then echo -e "\033[0;35m$SPARK_HOME contains an empty RELEASE file. This is a known problem with certain vendors (e.g. Cloudera). Please make sure you are using at least $MIN_SPARK_VERSION.\033[0m" elif semverLT ${SPARK_VERSION} ${MIN_SPARK_VERSION}; then echo -e "\033[0;31mYou have Apache Spark $SPARK_VERSION at $SPARK_HOME which does not meet the minimum version requirement of $MIN_SPARK_VERSION.\033[0m" echo -e "\033[0;31mAborting.\033[0m" exit 1 fi else echo -e "\033[0;35m$SPARK_HOME is probably an Apache Spark development tree. Please make sure you are using at least $MIN_SPARK_VERSION.\033[0m" fi # Find the java binary if [ -n "${JAVA_HOME}" ]; then RUNNER="${JAVA_HOME}/bin/java" else if [ `command -v java` ]; then RUNNER="java" else echo -e "\033[0;31mJAVA_HOME is not set\033[0m" >&2 exit 1 fi fi # Compute classpath using external script classpath_output=$(${FWDIR}/bin/compute-classpath.sh) if [[ "$?" != "0" ]]; then echo "$classpath_output" exit 1 else CLASSPATH=${classpath_output} fi if [ -z $PIO_LOG_DIR ] ; then PIO_LOG_DIR=$PIO_HOME/log touch $PIO_LOG_DIR/pio.log > /dev/null 2>&1 if [ $? != 0 ] ; then PIO_LOG_DIR=/var/log/predictionio touch $PIO_LOG_DIR/pio.log > /dev/null 2>&1 if [ $? 
!= 0 ] ; then PIO_LOG_DIR=$HOME fi fi fi export CLASSPATH export JAVA_OPTS="$JAVA_OPTS -Dpio.log.dir=$PIO_LOG_DIR" exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@" ================================================ FILE: bin/pio-daemon ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # search() { local i=0; local needle=$1; shift for str in $@; do if [ "${str}" = "$needle" ]; then echo ${i} return else ((i++)) fi done echo ${i} } if [ -z $PIO_HOME ] ; then PIO_FILE=$(readlink -f $0 2>/dev/null) if [ $? = 0 ] ; then export PIO_HOME="$(cd $(dirname $PIO_FILE)/..; pwd)" else CURRENT_DIR=`pwd` TARGET_FILE="$0" cd "$(dirname "$TARGET_FILE")" TARGET_FILE=$(basename "$TARGET_FILE") while [ -L "$TARGET_FILE" ] do TARGET_FILE=$(readlink "$TARGET_FILE") cd "$(dirname "$TARGET_FILE")" TARGET_FILE=$(basename "$TARGET_FILE") done export PIO_HOME="$(cd $(dirname "$TARGET_FILE")/..; pwd -P)" cd "$CURRENT_DIR" fi fi if [ -z $PIO_CONF_DIR ] ; then export PIO_CONF_DIR="${PIO_HOME}/conf" if [ ! -d $PIO_CONF_DIR ] ; then export PIO_CONF_DIR="/etc/predictionio" if [ ! -d $PIO_CONF_DIR ] ; then echo "PIO_CONF_DIR is not found." 
exit 1 fi fi fi PIDFILE=$1 shift FIRST_SEP=$(search "--" $@) FIRST_HALF="${@:1:$FIRST_SEP}" SECOND_HALF="${@:$FIRST_SEP+1}" exec nohup ${PIO_HOME}/bin/pio-class org.apache.predictionio.tools.console.Console ${FIRST_HALF} --pio-home ${PIO_HOME} ${SECOND_HALF} <&- > /dev/null 2>&1 & echo $! > ${PIDFILE} ================================================ FILE: bin/pio-shell ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # if [ -z $PIO_HOME ] ; then PIO_FILE=$(readlink -f $0 2>/dev/null) if [ $? = 0 ] ; then export PIO_HOME="$(cd $(dirname $PIO_FILE)/..; pwd)" else CURRENT_DIR=`pwd` TARGET_FILE="$0" cd "$(dirname "$TARGET_FILE")" TARGET_FILE=$(basename "$TARGET_FILE") while [ -L "$TARGET_FILE" ] do TARGET_FILE=$(readlink "$TARGET_FILE") cd "$(dirname "$TARGET_FILE")" TARGET_FILE=$(basename "$TARGET_FILE") done export PIO_HOME="$(cd $(dirname "$TARGET_FILE")/..; pwd -P)" cd "$CURRENT_DIR" fi fi if [ -z $PIO_CONF_DIR ] ; then export PIO_CONF_DIR="${PIO_HOME}/conf" if [ ! -d $PIO_CONF_DIR ] ; then export PIO_CONF_DIR="/etc/predictionio" if [ ! -d $PIO_CONF_DIR ] ; then echo "PIO_CONF_DIR is not found." exit 1 fi fi fi . 
${PIO_HOME}/bin/load-pio-env.sh if [[ "$1" == "--with-spark" ]] then echo "Starting the PIO shell with the Apache Spark Shell." # Get paths of assembly jars to pass to spark-shell . ${PIO_HOME}/bin/compute-classpath.sh shift ${SPARK_HOME}/bin/spark-shell --jars ${ASSEMBLY_JARS} "$@" elif [[ "$1" == "--with-pyspark" ]] then echo "Starting the PIO shell with the Apache Spark Python shell (PySpark)." # Get paths of assembly jars to pass to pyspark . ${PIO_HOME}/bin/compute-classpath.sh shift export PYTHONPATH=${PIO_HOME}/python ${SPARK_HOME}/bin/pyspark --jars ${ASSEMBLY_JARS} "$@" else echo -e "\033[0;33mStarting the PIO shell without Apache Spark.\033[0m" echo -e "\033[0;33mIf you need the Apache Spark library, run 'pio-shell --with-spark [spark-submit arguments...]'.\033[0m" cd ${PIO_HOME} ./sbt/sbt console fi ================================================ FILE: bin/pio-start-all ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # Convenience script for starting all default dependent services in a single # node scenario. # Figure out where PredictionIO is installed export PIO_HOME="$(cd `dirname $0`/..; pwd)" . 
${PIO_HOME}/bin/load-pio-env.sh SOURCE_TYPE=$PIO_STORAGE_REPOSITORIES_METADATA_SOURCE SOURCE_TYPE=$SOURCE_TYPE$PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE SOURCE_TYPE=$SOURCE_TYPE$PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE # Elasticsearch if [ `echo $SOURCE_TYPE | grep -i elasticsearch | wc -l` != 0 ] ; then echo "Starting Elasticsearch..." if [ -n "$PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME" ]; then ELASTICSEARCH_HOME=$PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME fi if [ -n "$ELASTICSEARCH_HOME" ]; then if [ -n "$JAVA_HOME" ]; then JPS=`$JAVA_HOME/bin/jps` else JPS=`jps` fi if [[ ${JPS} =~ "Elasticsearch" ]]; then echo -e "\033[0;31mElasticsearch is already running. Please use pio-stop-all to try stopping it first.\033[0m" echo -e "\033[0;31mNote: If you started Elasticsearch manually, you will need to kill it manually.\033[0m" echo -e "\033[0;31mAborting...\033[0m" exit 1 else $ELASTICSEARCH_HOME/bin/elasticsearch -d -p $PIO_HOME/es.pid fi else echo -e "\033[0;31mPlease set PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME in conf/pio-env.sh, or in your environment.\033[0m" echo -e "\033[0;31mCannot start Elasticsearch. Aborting...\033[0m" exit 1 fi fi # HBase if [ `echo $SOURCE_TYPE | grep -i hbase | wc -l` != 0 ] ; then echo "Starting HBase..." if [ -n "$PIO_STORAGE_SOURCES_HBASE_HOME" ]; then $PIO_STORAGE_SOURCES_HBASE_HOME/bin/start-hbase.sh else echo -e "\033[0;31mPlease set PIO_STORAGE_SOURCES_HBASE_HOME in conf/pio-env.sh, or in your environment.\033[0m" # Kill everything for cleanliness echo -e "\033[0;31mCannot start HBase. 
Aborting...\033[0m" sleep 3 ${PIO_HOME}/bin/pio-stop-all exit 1 fi fi #PGSQL if [ `echo $SOURCE_TYPE | grep -i pgsql | wc -l` != 0 ] ; then pgsqlStatus="$(ps auxwww | grep postgres | wc -l)" if [[ "$pgsqlStatus" -lt 5 ]]; then # Detect OS OS=`uname` if [[ "$OS" = "Darwin" ]]; then pg_cmd=`which pg_ctl` if [[ "$pg_cmd" != "" ]]; then pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start fi elif [[ "$OS" = "Linux" ]]; then sudo service postgresql start else echo -e "\033[1;31mYour OS $OS is not yet supported for automatic postgresql startup :(\033[0m" echo -e "\033[1;31mPlease do a manual startup!\033[0m" ${PIO_HOME}/bin/pio-stop-all exit 1 fi fi fi # PredictionIO Event Server echo "Waiting 10 seconds for Storage Repositories to fully initialize..." sleep 10 echo "Starting PredictionIO Event Server..." ${PIO_HOME}/bin/pio-daemon ${PIO_HOME}/eventserver.pid eventserver --ip 0.0.0.0 ================================================ FILE: bin/pio-stop-all ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # Convenience script for stopping all default dependent services in a single # node scenario. 
# Figure out where PredictionIO is installed export PIO_HOME="$(cd `dirname $0`/..; pwd)" . ${PIO_HOME}/bin/load-pio-env.sh SOURCE_TYPE=$PIO_STORAGE_REPOSITORIES_METADATA_SOURCE SOURCE_TYPE=$SOURCE_TYPE$PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE SOURCE_TYPE=$SOURCE_TYPE$PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE # PredictionIO Event Server echo "Stopping PredictionIO Event Server..." PIDFILE=${PIO_HOME}/eventserver.pid if [ -e ${PIDFILE} ]; then cat ${PIDFILE} | xargs kill rm ${PIDFILE} fi # HBase if [ `echo $SOURCE_TYPE | grep -i hbase | wc -l` != 0 ] ; then echo "Stopping HBase..." if [ -n "$PIO_STORAGE_SOURCES_HBASE_HOME" ]; then $PIO_STORAGE_SOURCES_HBASE_HOME/bin/stop-hbase.sh fi fi # Elasticsearch if [ `echo $SOURCE_TYPE | grep -i elasticsearch | wc -l` != 0 ] ; then echo "Stopping Elasticsearch..." PIDFILE=${PIO_HOME}/es.pid if [ -e ${PIDFILE} ]; then cat ${PIDFILE} | xargs kill rm ${PIDFILE} fi fi #PGSQL if [ `echo $SOURCE_TYPE | grep -i pgsql | wc -l` != 0 ] ; then if [ -n "$PIO_STORAGE_SOURCES_PGSQL_TYPE" ]; then OS=`uname` if [[ "$OS" = "Darwin" ]]; then pg_cmd=`which pg_ctl` if [[ "$pg_cmd" != "" ]]; then pg_ctl -D /usr/local/var/postgres stop -s -m fast fi elif [[ "$OS" = "Linux" ]]; then sudo service postgresql stop else echo -e "\033[1;31mYour OS $OS is not yet supported for automatic postgresql shutdown :(\033[0m" echo -e "\033[1;31mPlease do a manual shutdown!\033[0m" exit 1 fi fi fi ================================================ FILE: bin/semver.sh ================================================ #!/usr/bin/env bash # # Copyright (c) 2013, Ray Bejjani # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are met: # # 1. Redistributions of source code must retain the above copyright notice, this # list of conditions and the following disclaimer. # 2. 
Redistributions in binary form must reproduce the above copyright notice, # this list of conditions and the following disclaimer in the documentation # and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND # ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. # # The views and conclusions contained in the software and documentation are those # of the authors and should not be interpreted as representing official policies, # either expressed or implied, of the FreeBSD Project. 
function semverParseInto() { local RE='[^0-9]*\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)\([0-9A-Za-z-]*\)' #MAJOR eval $2=`echo $1 | sed -e "s#$RE#\1#"` #MINOR eval $3=`echo $1 | sed -e "s#$RE#\2#"` #PATCH eval $4=`echo $1 | sed -e "s#$RE#\3#"` #SPECIAL eval $5=`echo $1 | sed -e "s#$RE#\4#"` } function semverEQ() { local MAJOR_A=0 local MINOR_A=0 local PATCH_A=0 local SPECIAL_A=0 local MAJOR_B=0 local MINOR_B=0 local PATCH_B=0 local SPECIAL_B=0 semverParseInto $1 MAJOR_A MINOR_A PATCH_A SPECIAL_A semverParseInto $2 MAJOR_B MINOR_B PATCH_B SPECIAL_B if [ $MAJOR_A -ne $MAJOR_B ]; then return 1 fi if [ $MINOR_A -ne $MINOR_B ]; then return 1 fi if [ $PATCH_A -ne $PATCH_B ]; then return 1 fi if [[ "_$SPECIAL_A" != "_$SPECIAL_B" ]]; then return 1 fi return 0 } function semverLT() { local MAJOR_A=0 local MINOR_A=0 local PATCH_A=0 local SPECIAL_A=0 local MAJOR_B=0 local MINOR_B=0 local PATCH_B=0 local SPECIAL_B=0 semverParseInto $1 MAJOR_A MINOR_A PATCH_A SPECIAL_A semverParseInto $2 MAJOR_B MINOR_B PATCH_B SPECIAL_B if [ $MAJOR_A -lt $MAJOR_B ]; then return 0 fi if [[ $MAJOR_A -le $MAJOR_B && $MINOR_A -lt $MINOR_B ]]; then return 0 fi if [[ $MAJOR_A -le $MAJOR_B && $MINOR_A -le $MINOR_B && $PATCH_A -lt $PATCH_B ]]; then return 0 fi # A numerically greater than B is never less-than, whatever the pre-release tags if [ $MAJOR_A -gt $MAJOR_B ]; then return 1 fi if [[ $MAJOR_A -eq $MAJOR_B && $MINOR_A -gt $MINOR_B ]]; then return 1 fi if [[ $MAJOR_A -eq $MAJOR_B && $MINOR_A -eq $MINOR_B && $PATCH_A -gt $PATCH_B ]]; then return 1 fi if [[ "_$SPECIAL_A" == "_" ]] && [[ "_$SPECIAL_B" == "_" ]] ; then return 1 fi if [[ "_$SPECIAL_A" == "_" ]] && [[ "_$SPECIAL_B" != "_" ]] ; then return 1 fi if [[ "_$SPECIAL_A" != "_" ]] && [[ "_$SPECIAL_B" == "_" ]] ; then return 0 fi if [[ "_$SPECIAL_A" < "_$SPECIAL_B" ]]; then return 0 fi return 1 } function semverGT() { semverEQ $1 $2 local EQ=$? semverLT $1 $2 local LT=$? 
if [ $EQ -ne 0 ] && [ $LT -ne 0 ]; then return 0 else return 1 fi } if [ "___semver.sh" == "___`basename $0`" ]; then MAJOR=0 MINOR=0 PATCH=0 SPECIAL="" semverParseInto $1 MAJOR MINOR PATCH SPECIAL echo "$1 -> M: $MAJOR m:$MINOR p:$PATCH s:$SPECIAL" semverParseInto $2 MAJOR MINOR PATCH SPECIAL echo "$2 -> M: $MAJOR m:$MINOR p:$PATCH s:$SPECIAL" semverEQ $1 $2 echo "$1 == $2 -> $?." semverLT $1 $2 echo "$1 < $2 -> $?." semverGT $1 $2 echo "$1 > $2 -> $?." fi ================================================ FILE: bin/travis/pio-start-travis ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # Convenience script for starting all default dependent services in a single # node scenario. # Figure out where PredictionIO is installed export PIO_HOME="$(cd `dirname $0`/..; pwd)" . ${PIO_HOME}/load-pio-env.sh # HBase echo "Starting HBase..." if [ -n "$PIO_STORAGE_SOURCES_HBASE_HOME" ]; then $PIO_STORAGE_SOURCES_HBASE_HOME/bin/start-hbase.sh else echo -e "\033[0;31mPlease set PIO_STORAGE_SOURCES_HBASE_HOME in conf/pio-env.sh, or in your environment.\033[0m" # Kill everything for cleanliness echo -e "\033[0;31mCannot start HBase. 
Aborting...\033[0m" sleep 3 ${PIO_HOME}/pio-stop-all exit 1 fi # PredictionIO Event Server echo "Waiting 10 seconds for HBase to fully initialize..." sleep 10 echo "Starting PredictionIO Event Server..." ${PIO_HOME}/pio-daemon ${PIO_HOME}/../eventserver.pid eventserver --ip 0.0.0.0 ================================================ FILE: bin/travis/pio-stop-travis ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # Convenience script for stopping all default dependent services in a single # node scenario. # Figure out where PredictionIO is installed export PIO_HOME="$(cd `dirname $0`/..; pwd)" . ${PIO_HOME}/load-pio-env.sh # PredictionIO Event Server echo "Stopping PredictionIO Event Server..." PIDFILE=${PIO_HOME}/../eventserver.pid if [ -e ${PIDFILE} ]; then cat ${PIDFILE} | xargs kill rm ${PIDFILE} fi # HBase echo "Stopping HBase..." if [ -n "$PIO_STORAGE_SOURCES_HBASE_HOME" ]; then $PIO_STORAGE_SOURCES_HBASE_HOME/bin/stop-hbase.sh fi ================================================ FILE: build.sbt ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements.
See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import PIOBuild._ lazy val scalaSparkDepsVersion = Map( "2.11" -> Map( "2.0" -> Map( "akka" -> "2.5.16", "hadoop" -> "2.7.7", "json4s" -> "3.2.11"), "2.1" -> Map( "akka" -> "2.5.17", "hadoop" -> "2.7.7", "json4s" -> "3.2.11"), "2.2" -> Map( "akka" -> "2.5.17", "hadoop" -> "2.7.7", "json4s" -> "3.2.11"), "2.3" -> Map( "akka" -> "2.5.17", "hadoop" -> "2.7.7", "json4s" -> "3.2.11"))) name := "apache-predictionio-parent" version in ThisBuild := "0.15.0-SNAPSHOT" organization in ThisBuild := "org.apache.predictionio" scalaVersion in ThisBuild := sys.props.getOrElse("scala.version", "2.11.12") scalaBinaryVersion in ThisBuild := binaryVersion(scalaVersion.value) crossScalaVersions in ThisBuild := Seq(scalaVersion.value) scalacOptions in ThisBuild ++= Seq("-deprecation", "-unchecked", "-feature") scalacOptions in (ThisBuild, Test) ++= Seq("-Yrangepos") fork in (ThisBuild, run) := true javacOptions in (ThisBuild, compile) ++= Seq("-source", "1.8", "-target", "1.8", "-Xlint:deprecation", "-Xlint:unchecked") // Ignore differentiation of Spark patch levels sparkVersion in ThisBuild := sys.props.getOrElse("spark.version", "2.1.3") sparkBinaryVersion in ThisBuild := binaryVersion(sparkVersion.value) hadoopVersion in ThisBuild := sys.props.getOrElse("hadoop.version", "2.7.7") akkaVersion in ThisBuild := 
sys.props.getOrElse("akka.version", "2.5.17") elasticsearchVersion in ThisBuild := sys.props.getOrElse("elasticsearch.version", "6.8.1") hbaseVersion in ThisBuild := sys.props.getOrElse("hbase.version", "1.2.6") json4sVersion in ThisBuild := { sparkBinaryVersion.value match { case "2.0" | "2.1" | "2.2" | "2.3" => "3.2.11" case "2.4" => "3.5.3" } } val conf = file("conf") val commonSettings = Seq( autoAPIMappings := true, licenseConfigurations := Set("compile"), licenseReportTypes := Seq(Csv), unmanagedClasspath in Test += conf, unmanagedClasspath in Test += baseDirectory.value.getParentFile / s"storage/jdbc/target/scala-${scalaBinaryVersion.value}/classes") val commonTestSettings = Seq( libraryDependencies ++= Seq( "org.postgresql" % "postgresql" % "9.4-1204-jdbc41" % "test", "org.scalikejdbc" %% "scalikejdbc" % "3.1.0" % "test")) val dataElasticsearch = (project in file("storage/elasticsearch")). settings(commonSettings: _*) val dataHbase = (project in file("storage/hbase")). settings(commonSettings: _*). enablePlugins(GenJavadocPlugin) val dataHdfs = (project in file("storage/hdfs")). settings(commonSettings: _*). enablePlugins(GenJavadocPlugin) val dataJdbc = (project in file("storage/jdbc")). settings(commonSettings: _*). enablePlugins(GenJavadocPlugin) val dataLocalfs = (project in file("storage/localfs")). settings(commonSettings: _*). enablePlugins(GenJavadocPlugin) val dataS3 = (project in file("storage/s3")). settings(commonSettings: _*). enablePlugins(GenJavadocPlugin) val common = (project in file("common")). settings(commonSettings: _*). enablePlugins(GenJavadocPlugin). disablePlugins(sbtassembly.AssemblyPlugin) val data = (project in file("data")). dependsOn(common). settings(commonSettings: _*). settings(commonTestSettings: _*). enablePlugins(GenJavadocPlugin). disablePlugins(sbtassembly.AssemblyPlugin) val core = (project in file("core")). dependsOn(data). settings(commonSettings: _*). settings(commonTestSettings: _*). 
enablePlugins(GenJavadocPlugin). enablePlugins(BuildInfoPlugin). settings( buildInfoKeys := Seq[BuildInfoKey]( name, version, scalaVersion, scalaBinaryVersion, sbtVersion, sparkVersion, hadoopVersion), buildInfoPackage := "org.apache.predictionio.core" ). enablePlugins(SbtTwirl). disablePlugins(sbtassembly.AssemblyPlugin) val e2 = (project in file("e2")). dependsOn(core). settings(commonSettings: _*). enablePlugins(GenJavadocPlugin). disablePlugins(sbtassembly.AssemblyPlugin) val tools = (project in file("tools")). dependsOn(e2). settings(commonSettings: _*). settings(commonTestSettings: _*). settings(skip in publish := true). enablePlugins(GenJavadocPlugin). enablePlugins(SbtTwirl) val storageProjectReference = Seq( dataElasticsearch, dataHbase, dataHdfs, dataJdbc, dataLocalfs, dataS3) map Project.projectToRef val storage = (project in file("storage")) .settings(skip in publish := true) .aggregate(storageProjectReference: _*) .disablePlugins(sbtassembly.AssemblyPlugin) val assembly = (project in file("assembly")). settings(commonSettings: _*) val root = (project in file(".")). settings(commonSettings: _*). enablePlugins(ScalaUnidocPlugin). 
settings( unidocProjectFilter in (ScalaUnidoc, unidoc) := inAnyProject -- inProjects(storageProjectReference: _*), unidocProjectFilter in (JavaUnidoc, unidoc) := inAnyProject -- inProjects(storageProjectReference: _*), scalacOptions in (ScalaUnidoc, unidoc) ++= Seq( "-groups", "-skip-packages", Seq( "akka", "org.apache.predictionio.annotation", "org.apache.predictionio.authentication", "org.apache.predictionio.configuration", "org.apache.predictionio.controller.html", "org.apache.predictionio.controller.java", "org.apache.predictionio.data.api", "org.apache.predictionio.data.storage.*", "org.apache.predictionio.data.view", "org.apache.predictionio.data.webhooks", "org.apache.predictionio.tools", "org.apache.predictionio.workflow.html", "scalikejdbc").mkString(":"), "-doc-title", "PredictionIO Scala API", "-doc-version", version.value, "-doc-root-content", "docs/scaladoc/rootdoc.txt")). settings( javacOptions in (JavaUnidoc, unidoc) := Seq( "-subpackages", "org.apache.predictionio", "-exclude", Seq( "org.apache.predictionio.controller.html", "org.apache.predictionio.data.api", "org.apache.predictionio.data.view", "org.apache.predictionio.data.webhooks.*", "org.apache.predictionio.workflow", "org.apache.predictionio.tools", "org.apache.hadoop").mkString(":"), "-windowtitle", "PredictionIO Javadoc " + version.value, "-group", "Java Controllers", Seq( "org.apache.predictionio.controller.java", "org.apache.predictionio.data.store.java").mkString(":"), "-group", "Scala Base Classes", Seq( "org.apache.predictionio.controller", "org.apache.predictionio.core", "org.apache.predictionio.data.storage", "org.apache.predictionio.data.storage.*", "org.apache.predictionio.data.store").mkString(":"), "-overview", "docs/javadoc/javadoc-overview.html", "-noqualifier", "java.lang")). aggregate(common, core, data, tools, e2). 
disablePlugins(sbtassembly.AssemblyPlugin) val pioUnidoc = taskKey[Unit]("Builds PredictionIO ScalaDoc") pioUnidoc := { (unidoc in Compile).value val log = streams.value.log log.info("Adding custom styling.") IO.append( crossTarget.value / "unidoc" / "lib" / "template.css", IO.read(baseDirectory.value / "docs" / "scaladoc" / "api-docs.css")) IO.append( crossTarget.value / "unidoc" / "lib" / "template.js", IO.read(baseDirectory.value / "docs" / "scaladoc" / "api-docs.js")) } homepage := Some(url("http://predictionio.apache.org")) pomExtra := { <parent> <groupId>org.apache</groupId> <artifactId>apache</artifactId> <version>18</version> </parent> <scm> <connection>scm:git:github.com/apache/predictionio</connection> <developerConnection>scm:git:https://gitbox.apache.org/repos/asf/predictionio.git</developerConnection> <url>github.com/apache/predictionio</url> </scm> <developers> <developer> <id>donald</id> <name>Donald Szeto</name> <url>http://predictionio.apache.org</url> <email>donald@apache.org</email> </developer> </developers> } childrenPomExtra in ThisBuild := { <parent> <groupId>{organization.value}</groupId> <artifactId>{name.value}_{scalaBinaryVersion.value}</artifactId> <version>{version.value}</version> </parent> } concurrentRestrictions in Global := Seq( Tags.limit(Tags.CPU, 1), Tags.limit(Tags.Network, 1), Tags.limit(Tags.Test, 1), Tags.limitAll( 1 ) ) parallelExecution := false parallelExecution in Global := false testOptions in Test += Tests.Argument("-oDF") printBuildInfo := { println(s"PIO_SCALA_VERSION=${scalaVersion.value}") println(s"PIO_SPARK_VERSION=${sparkVersion.value}") println(s"PIO_HADOOP_VERSION=${hadoopVersion.value}") println(s"PIO_ELASTICSEARCH_VERSION=${elasticsearchVersion.value}") println(s"PIO_HBASE_VERSION=${hbaseVersion.value}") } ================================================ FILE: common/build.sbt ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License.
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import PIOBuild._ name := "apache-predictionio-common" libraryDependencies ++= Seq( "com.typesafe.akka" %% "akka-actor" % akkaVersion.value, "com.typesafe.akka" %% "akka-slf4j" % akkaVersion.value, "com.typesafe.akka" %% "akka-http" % "10.1.5", "org.json4s" %% "json4s-native" % json4sVersion.value, "com.typesafe.akka" %% "akka-stream" % akkaVersion.value ) pomExtra := childrenPomExtra.value ================================================ FILE: common/src/main/java/org/apache/predictionio/annotation/DeveloperApi.java ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.annotation; import java.lang.annotation.*; /** * A lower-level, unstable API intended for developers. * * Developer APIs might change or be removed in minor versions of PredictionIO.
* * NOTE: If there exists a Scaladoc comment that immediately precedes this * annotation, the first line of the comment must be ":: DeveloperApi ::" with * no trailing blank line. This is because of the known issue that Scaladoc * displays only either the annotation or the comment, whichever comes first. */ @Retention(RetentionPolicy.RUNTIME) @Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER, ElementType.CONSTRUCTOR, ElementType.LOCAL_VARIABLE, ElementType.PACKAGE}) public @interface DeveloperApi {} ================================================ FILE: common/src/main/java/org/apache/predictionio/annotation/Experimental.java ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.annotation; import java.lang.annotation.*; /** * An experimental user-facing API. * * Experimental APIs might change or be removed, or be adopted as first-class * APIs. * * NOTE: If there exists a Scaladoc comment that immediately precedes this * annotation, the first line of the comment must be ":: Experimental ::" with * no trailing blank line.
This is because of the known issue that Scaladoc * displays only either the annotation or the comment, whichever comes first. */ @Retention(RetentionPolicy.RUNTIME) @Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER, ElementType.CONSTRUCTOR, ElementType.LOCAL_VARIABLE, ElementType.PACKAGE}) public @interface Experimental {} ================================================ FILE: common/src/main/resources/application.conf ================================================ akka { log-config-on-start = false loggers = ["akka.event.slf4j.Slf4jLogger"] loglevel = "INFO" } ================================================ FILE: common/src/main/scala/org/apache/predictionio/akkahttpjson4s/Json4sSupport.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.akkahttpjson4s // Referenced from https://github.com/hseeberger/akka-http-json // because of the difference of supported json4s version. 
import java.lang.reflect.InvocationTargetException import akka.http.scaladsl.marshalling.{ Marshaller, ToEntityMarshaller } import akka.http.scaladsl.model.ContentTypeRange import akka.http.scaladsl.model.MediaType import akka.http.scaladsl.model.MediaTypes.`application/json` import akka.http.scaladsl.unmarshalling.{ FromEntityUnmarshaller, Unmarshaller } import akka.util.ByteString import org.json4s.{ Formats, MappingException, Serialization } import scala.collection.immutable.Seq /** * Automatic to and from JSON marshalling/unmarshalling using an in-scope *Json4s* protocol. * * Pretty printing is enabled if an implicit [[Json4sSupport.ShouldWritePretty.True]] is in scope. */ object Json4sSupport extends Json4sSupport { sealed abstract class ShouldWritePretty final object ShouldWritePretty { final object True extends ShouldWritePretty final object False extends ShouldWritePretty } } /** * Automatic to and from JSON marshalling/unmarshalling using an in-scope *Json4s* protocol. * * Pretty printing is enabled if an implicit [[Json4sSupport.ShouldWritePretty.True]] is in scope. 
*/ trait Json4sSupport { import Json4sSupport._ def unmarshallerContentTypes: Seq[ContentTypeRange] = mediaTypes.map(ContentTypeRange.apply) def mediaTypes: Seq[MediaType.WithFixedCharset] = List(`application/json`) private val jsonStringUnmarshaller = Unmarshaller.byteStringUnmarshaller .forContentTypes(unmarshallerContentTypes: _*) .mapWithCharset { case (ByteString.empty, _) => throw Unmarshaller.NoContentException case (data, charset) => data.decodeString(charset.nioCharset.name) } private val jsonStringMarshaller = Marshaller.oneOf(mediaTypes: _*)(Marshaller.stringMarshaller) /** * HTTP entity => `A` * * @tparam A type to decode * @return unmarshaller for `A` */ implicit def unmarshaller[A: Manifest](implicit serialization: Serialization, formats: Formats): FromEntityUnmarshaller[A] = jsonStringUnmarshaller .map(s => serialization.read(s)) .recover { _ => _ => { case MappingException(_, ite: InvocationTargetException) => throw ite.getCause } } /** * `A` => HTTP entity * * @tparam A type to encode, must be upper bounded by `AnyRef` * @return marshaller for any `A` value */ implicit def marshaller[A <: AnyRef](implicit serialization: Serialization, formats: Formats, shouldWritePretty: ShouldWritePretty = ShouldWritePretty.False): ToEntityMarshaller[A] = shouldWritePretty match { case ShouldWritePretty.False => jsonStringMarshaller.compose(serialization.write[A]) case ShouldWritePretty.True => jsonStringMarshaller.compose(serialization.writePretty[A]) } } ================================================ FILE: common/src/main/scala/org/apache/predictionio/authentication/KeyAuthentication.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. 
* The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.authentication /** * This is a (very) simple authentication for the dashboard and engine servers. * It is highly recommended to implement a stronger authentication mechanism. */ import akka.http.scaladsl.model.HttpRequest import akka.http.scaladsl.model.headers.HttpChallenge import akka.http.scaladsl.server.{AuthenticationFailedRejection, Rejection, RequestContext} import com.typesafe.config.ConfigFactory import scala.concurrent.ExecutionContext.Implicits.global import scala.concurrent.Future trait KeyAuthentication { object ServerKey { private val config = ConfigFactory.load("server.conf") val authEnforced = config.getBoolean("org.apache.predictionio.server.key-auth-enforced") val get = config.getString("org.apache.predictionio.server.accessKey") val param = "accessKey" } def withAccessKeyFromFile: RequestContext => Future[Either[Rejection, HttpRequest]] = { ctx: RequestContext => val accessKeyParamOpt = ctx.request.uri.query().get(ServerKey.param) Future { if (!ServerKey.authEnforced || accessKeyParamOpt.contains(ServerKey.get)) Right(ctx.request) else Left(AuthenticationFailedRejection( AuthenticationFailedRejection.CredentialsRejected, HttpChallenge("", None))) } } } ================================================ FILE: 
common/src/main/scala/org/apache/predictionio/configuration/SSLConfiguration.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.configuration import java.security.KeyStore import com.typesafe.config.ConfigFactory import javax.net.ssl.{KeyManagerFactory, SSLContext, TrustManagerFactory} trait SSLConfiguration { private val serverConfig = ConfigFactory.load("server.conf") private val keyStoreResource = serverConfig.getString("org.apache.predictionio.server.ssl-keystore-resource") private val password = serverConfig.getString("org.apache.predictionio.server.ssl-keystore-pass") private val keyAlias = serverConfig.getString("org.apache.predictionio.server.ssl-key-alias") private val keyStore = { // Loading keystore from the classpath resource (also works inside a JAR) val clientStore = KeyStore.getInstance("JKS") val inputStream = getClass.getClassLoader.getResourceAsStream(keyStoreResource) require(inputStream != null, s"Keystore resource $keyStoreResource not found on the classpath") try clientStore.load(inputStream, password.toCharArray) finally inputStream.close() clientStore } // Creating SSL context def sslContext: SSLContext = { val context = SSLContext.getInstance("TLS") val tmf =
TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm) val kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm) kmf.init(keyStore, password.toCharArray) tmf.init(keyStore) context.init(kmf.getKeyManagers, tmf.getTrustManagers, null) context } } ================================================ FILE: conf/log4j.properties ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ log4j.rootLogger=INFO, console, file # console appender log4j.appender.console=org.apache.log4j.ConsoleAppender log4j.appender.console.follow=true log4j.appender.console.layout=org.apache.log4j.EnhancedPatternLayout log4j.appender.console.layout.ConversionPattern=[%p] [%c{1}] %m%n%throwable{0} # file appender log4j.appender.file=org.apache.log4j.FileAppender log4j.appender.file.File=${pio.log.dir}/pio.log log4j.appender.file.layout=org.apache.log4j.EnhancedPatternLayout log4j.appender.file.layout.ConversionPattern=%d %-5p %c [%t] - %m%n # quiet some packages that are too verbose log4j.logger.org.elasticsearch=WARN log4j.logger.org.apache.hadoop=WARN log4j.logger.org.apache.hadoop.hbase.zookeeper=ERROR log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR log4j.logger.org.apache.spark=WARN log4j.logger.org.apache.zookeeper=ERROR log4j.logger.org.eclipse.jetty=WARN log4j.logger.org.spark-project.jetty=WARN log4j.logger.akka=WARN ================================================ FILE: conf/pio-env.sh.template ================================================ #!/usr/bin/env bash # # Copy this file as pio-env.sh and edit it for your site's configuration. # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
# # PredictionIO Main Configuration # # This section controls core behavior of PredictionIO. It is very likely that # you need to change these to fit your site. # SPARK_HOME: Apache Spark is a hard dependency and must be configured. SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.6 POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.0.0.jar MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar # ES_CONF_DIR: You must configure this if you have advanced configuration for # your Elasticsearch setup. # ES_CONF_DIR=/opt/elasticsearch # HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO # with Hadoop 2. # HADOOP_CONF_DIR=/opt/hadoop # HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO # with HBase on a remote cluster. # HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.2.6/conf # Filesystem paths that PredictionIO uses as block storage. PIO_FS_BASEDIR=$HOME/.pio_store PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp # PredictionIO Storage Configuration # # This section controls programs that make use of PredictionIO's built-in # storage facilities. Default values are shown below.
# # For more information on storage configuration, please refer to # http://predictionio.apache.org/system/anotherdatastore/ # Storage Repositories # Default is to use PostgreSQL PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL # Storage Data Sources # PostgreSQL Default Settings # Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL # Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and # PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio # MySQL Example # PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc # PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio # PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio # PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio # Elasticsearch Example # PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200 # PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-6.8.1 # Optional basic HTTP auth # PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name # PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret # Local File System Example # PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs # PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models # HBase Example # PIO_STORAGE_SOURCES_HBASE_TYPE=hbase # PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6 # AWS S3 Example (bucket names may not contain underscores) # PIO_STORAGE_SOURCES_S3_TYPE=s3 # PIO_STORAGE_SOURCES_S3_BUCKET_NAME=pio-bucket # PIO_STORAGE_SOURCES_S3_BASE_PATH=pio_model ================================================ FILE: conf/pio-env.sh.travis
================================================ #!/usr/bin/env bash # # Copy this file as pio-env.sh and edit it for your site's configuration. # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # PredictionIO Main Configuration # # This section controls core behavior of PredictionIO. It is very likely that # you need to change these to fit your site. # SPARK_HOME: Apache Spark is a hard dependency and must be configured. # It is set up in script.travis.sh. SPARK_HOME=$SPARK_HOME # Filesystem paths that PredictionIO uses as block storage. PIO_FS_BASEDIR=$HOME/.pio_store PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp # PredictionIO Storage Configuration # # This section controls programs that make use of PredictionIO's built-in # storage facilities. Default values are shown below.
# Storage Data Sources PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models PIO_STORAGE_SOURCES_HBASE_TYPE=hbase PIO_STORAGE_SOURCES_HBASE_HOME=$HBASE_HOME # Storage Data Sources (pgsql) PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql:predictionio PIO_STORAGE_SOURCES_PGSQL_USERNAME=postgres PIO_STORAGE_SOURCES_PGSQL_PASSWORD= # Storage Repositories PIO_STORAGE_REPOSITORIES_METADATA_NAME=predictionio_metadata PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=predictionio_modeldata PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=predictionio_eventdata PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL ================================================ FILE: conf/pio-vendors.sh ================================================ #!/bin/bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # IMPORTANT: PIO_*_VERSION for dependencies should be set before invoking this script; any version left unset falls back to the default below.
# `source conf/set_build_profile.sh $BUILD_PROFILE` to get the proper versions if [ -z "$PIO_SCALA_VERSION" ]; then PIO_SCALA_VERSION="2.11.12" fi if [ -z "$PIO_SPARK_VERSION" ]; then PIO_SPARK_VERSION="2.1.3" fi if [ -z "$PIO_HADOOP_VERSION" ]; then PIO_HADOOP_VERSION="2.7.7" fi if [ -z "$PIO_ELASTICSEARCH_VERSION" ]; then PIO_ELASTICSEARCH_VERSION="6.8.1" fi if [ -z "$PIO_HBASE_VERSION" ]; then PIO_HBASE_VERSION="1.2.6" fi export ES_IMAGE="docker.elastic.co/elasticsearch/elasticsearch" export ES_TAG="$PIO_ELASTICSEARCH_VERSION" HBASE_MAJOR=`echo $PIO_HBASE_VERSION | awk -F. '{print $1 "." $2}'` export HBASE_TAG="$HBASE_MAJOR" PGSQL_JAR=postgresql-9.4-1204.jdbc41.jar PGSQL_DOWNLOAD=https://jdbc.postgresql.org/download/${PGSQL_JAR} HADOOP_MAJOR=`echo $PIO_HADOOP_VERSION | awk -F. '{print $1 "." $2}'` SPARK_DIR=spark-${PIO_SPARK_VERSION}-bin-hadoop${HADOOP_MAJOR} SPARK_ARCHIVE=${SPARK_DIR}.tgz SPARK_DOWNLOAD_MIRROR=https://www.apache.org/dyn/closer.lua\?action=download\&filename=spark/spark-${PIO_SPARK_VERSION}/${SPARK_ARCHIVE} SPARK_DOWNLOAD_ARCHIVE=https://archive.apache.org/dist/spark/spark-${PIO_SPARK_VERSION}/${SPARK_ARCHIVE} ================================================ FILE: conf/server.conf ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the License for the specific language governing permissions and # limitations under the License. # # Engine server and dashboard server configuration org.apache.predictionio.server { # This access key is used by org.apache.predictionio.authentication.KeyAuthentication # to authenticate the Evaluation Dashboard and the Engine Server /stop and /reload endpoints. # It should be passed as a query string parameter. key-auth-enforced = "false" accessKey = "" # Configs used by org.apache.predictionio.configuration.SSLConfiguration ssl-enforced = "false" ssl-keystore-resource = "keystore.jks" ssl-keystore-pass = "pioserver" ssl-key-alias = "selfsigned" } ================================================ FILE: core/build.sbt ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ import PIOBuild._ name := "apache-predictionio-core" libraryDependencies ++= Seq( "com.github.scopt" %% "scopt" % "3.5.0", "com.google.code.gson" % "gson" % "2.5", "com.twitter" %% "chill-bijection" % "0.7.2", "de.javakaffee" % "kryo-serializers" % "0.37", "net.jodah" % "typetools" % "0.3.1", "org.apache.spark" %% "spark-core" % sparkVersion.value % "provided", "org.json4s" %% "json4s-ext" % json4sVersion.value, "org.scalaj" %% "scalaj-http" % "1.1.6", "org.slf4j" % "slf4j-log4j12" % "1.7.18", "org.scalatest" %% "scalatest" % "2.1.7" % "test", "org.specs2" %% "specs2" % "2.3.13" % "test", "org.scalamock" %% "scalamock-scalatest-support" % "3.5.0" % "test", "com.h2database" % "h2" % "1.4.196" % "test" ) parallelExecution in Test := false pomExtra := childrenPomExtra.value ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/CustomQuerySerializer.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseQuerySerializer /** If your query class cannot be automatically serialized/deserialized to/from * JSON, implement a trait by extending this trait, and overriding the * `querySerializer` member with your * [[https://github.com/json4s/json4s#serializing-non-supported-types custom JSON4S serializer]]. * Algorithm and serving classes using your query class would only need to mix * in the trait to enable the custom serializer. * * @group Helper */ trait CustomQuerySerializer extends BaseQuerySerializer /** DEPRECATED. Use [[CustomQuerySerializer]] instead. * * @group Helper */ @deprecated("Use CustomQuerySerializer instead.", "0.9.2") trait WithQuerySerializer extends CustomQuerySerializer ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/Deployment.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseEngine import scala.language.implicitConversions /** Defines a deployment that contains an [[Engine]] * * @group Engine */ trait Deployment extends EngineFactory { protected[this] var _engine: BaseEngine[_, _, _, _] = _ protected[this] var engineSet: Boolean = false /** Returns the [[Engine]] of this [[Deployment]] */ def apply(): BaseEngine[_, _, _, _] = { assert(engineSet, "Engine not set") _engine } /** Returns the [[Engine]] contained in this [[Deployment]]. */ private[predictionio] def engine: BaseEngine[_, _, _, _] = { assert(engineSet, "Engine not set") _engine } /** Sets the [[Engine]] for this [[Deployment]] * * @param engine An implementation of [[Engine]] * @tparam EI Evaluation information class * @tparam Q Query class * @tparam P Predicted result class * @tparam A Actual result class */ def engine_=[EI, Q, P, A](engine: BaseEngine[EI, Q, P, A]) { assert(!engineSet, "Engine can be set at most once") _engine = engine engineSet = true } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/Engine.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.controller import grizzled.slf4j.Logger import org.apache.predictionio.core.BaseAlgorithm import org.apache.predictionio.core.BaseDataSource import org.apache.predictionio.core.BaseEngine import org.apache.predictionio.core.BasePreparator import org.apache.predictionio.core.BaseServing import org.apache.predictionio.core.Doer import org.apache.predictionio.data.storage.EngineInstance import org.apache.predictionio.data.storage.StorageClientException import org.apache.predictionio.workflow.CreateWorkflow import org.apache.predictionio.workflow.EngineLanguage import org.apache.predictionio.workflow.JsonExtractorOption.JsonExtractorOption import org.apache.predictionio.workflow.NameParamsSerializer import org.apache.predictionio.workflow.PersistentModelManifest import org.apache.predictionio.workflow.SparkWorkflowUtils import org.apache.predictionio.workflow.StopAfterPrepareInterruption import org.apache.predictionio.workflow.StopAfterReadInterruption import org.apache.predictionio.workflow.WorkflowParams import org.apache.predictionio.workflow.WorkflowUtils import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import org.json4s._ import org.json4s.native.JsonMethods._ import org.json4s.native.Serialization.read import scala.collection.JavaConversions import scala.language.implicitConversions /** This class chains up the entire data process. PredictionIO uses this * information to create workflows and deployments. In Scala, you should * implement an object that extends the [[EngineFactory]] trait similar to the * following example. * * {{{ * object ItemRankEngine extends EngineFactory { * def apply() = { * new Engine( * classOf[ItemRankDataSource], * classOf[ItemRankPreparator], * Map( * "knn" -> classOf[KNNAlgorithm], * "rand" -> classOf[RandomAlgorithm], * "mahoutItemBased" -> classOf[MahoutItemBasedAlgorithm]), * classOf[ItemRankServing]) * } * } * }}} * * @see [[EngineFactory]] * @tparam TD Training data class. 
* @tparam EI Evaluation info class. * @tparam PD Prepared data class. * @tparam Q Input query class. * @tparam P Output prediction class. * @tparam A Actual value class. * @param dataSourceClassMap Map of data source names to class. * @param preparatorClassMap Map of preparator names to class. * @param algorithmClassMap Map of algorithm names to classes. * @param servingClassMap Map of serving names to class. * @group Engine */ class Engine[TD, EI, PD, Q, P, A]( val dataSourceClassMap: Map[String, Class[_ <: BaseDataSource[TD, EI, Q, A]]], val preparatorClassMap: Map[String, Class[_ <: BasePreparator[TD, PD]]], val algorithmClassMap: Map[String, Class[_ <: BaseAlgorithm[PD, _, Q, P]]], val servingClassMap: Map[String, Class[_ <: BaseServing[Q, P]]]) extends BaseEngine[EI, Q, P, A] { private[predictionio] implicit lazy val formats = Utils.json4sDefaultFormats + new NameParamsSerializer @transient lazy protected val logger = Logger[this.type] /** This auxiliary constructor is provided for backward compatibility. * * @param dataSourceClass Data source class. * @param preparatorClass Preparator class. * @param algorithmClassMap Map of algorithm names to classes. * @param servingClass Serving class. */ def this( dataSourceClass: Class[_ <: BaseDataSource[TD, EI, Q, A]], preparatorClass: Class[_ <: BasePreparator[TD, PD]], algorithmClassMap: Map[String, Class[_ <: BaseAlgorithm[PD, _, Q, P]]], servingClass: Class[_ <: BaseServing[Q, P]]) = this( Map("" -> dataSourceClass), Map("" -> preparatorClass), algorithmClassMap, Map("" -> servingClass) ) /** Java-friendly constructor * * @param dataSourceClass Data source class. * @param preparatorClass Preparator class. * @param algorithmClassMap Map of algorithm names to classes. * @param servingClass Serving class. 
*/ def this(dataSourceClass: Class[_ <: BaseDataSource[TD, EI, Q, A]], preparatorClass: Class[_ <: BasePreparator[TD, PD]], algorithmClassMap: _root_.java.util.Map[String, Class[_ <: BaseAlgorithm[PD, _, Q, P]]], servingClass: Class[_ <: BaseServing[Q, P]]) = this( Map("" -> dataSourceClass), Map("" -> preparatorClass), JavaConversions.mapAsScalaMap(algorithmClassMap).toMap, Map("" -> servingClass) ) /** Returns a new Engine instance, mimicking the behavior of a case class's copy method. */ def copy( dataSourceClassMap: Map[String, Class[_ <: BaseDataSource[TD, EI, Q, A]]] = dataSourceClassMap, preparatorClassMap: Map[String, Class[_ <: BasePreparator[TD, PD]]] = preparatorClassMap, algorithmClassMap: Map[String, Class[_ <: BaseAlgorithm[PD, _, Q, P]]] = algorithmClassMap, servingClassMap: Map[String, Class[_ <: BaseServing[Q, P]]] = servingClassMap): Engine[TD, EI, PD, Q, P, A] = { new Engine( dataSourceClassMap, preparatorClassMap, algorithmClassMap, servingClassMap) } /** Trains this engine and returns a list of models, one per algorithm. * * @param sc An instance of SparkContext. * @param engineParams An instance of [[EngineParams]] for running a single training. * @param engineInstanceId ID of the engine instance; combined with each * algorithm's index and name to form the model ID used for persistence. * @param params An instance of [[WorkflowParams]] that controls the workflow. * @return A list of models.
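 *
 * @example A minimal usage sketch, assuming `engine` (an [[Engine]]), `sc`,
 * and `engineParams` are already in scope and that the default
 * [[WorkflowParams]] are acceptable; the instance ID is a placeholder:
 * {{{
 * val models: Seq[Any] = engine.train(
 *   sc = sc,
 *   engineParams = engineParams,
 *   engineInstanceId = "example-instance",
 *   params = WorkflowParams())
 * // models(i) corresponds to engineParams.algorithmParamsList(i)
 * }}}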
*/ def train( sc: SparkContext, engineParams: EngineParams, engineInstanceId: String, params: WorkflowParams): Seq[Any] = { val (dataSourceName, dataSourceParams) = engineParams.dataSourceParams val dataSource = Doer(dataSourceClassMap(dataSourceName), dataSourceParams) val (preparatorName, preparatorParams) = engineParams.preparatorParams val preparator = Doer(preparatorClassMap(preparatorName), preparatorParams) val algoParamsList = engineParams.algorithmParamsList require( algoParamsList.size > 0, "EngineParams.algorithmParamsList must have at least 1 element.") val algorithms = algoParamsList.map { case (algoName, algoParams) => Doer(algorithmClassMap(algoName), algoParams) } val models = Engine.train( sc, dataSource, preparator, algorithms, params) val algoCount = algorithms.size val algoTuples: Seq[(String, Params, BaseAlgorithm[_, _, _, _], Any)] = (0 until algoCount).map { ax => { val (name, params) = algoParamsList(ax) (name, params, algorithms(ax), models(ax)) }} makeSerializableModels( sc, engineInstanceId = engineInstanceId, algoTuples = algoTuples) } /** Algorithm models can be persisted before deployment, but they may also * be left unpersisted. This method retrains non-persisted models and returns * a list of models that can be used directly in deployment. */ private[predictionio] def prepareDeploy( sc: SparkContext, engineParams: EngineParams, engineInstanceId: String, persistedModels: Seq[Any], params: WorkflowParams): Seq[Any] = { val algoParamsList = engineParams.algorithmParamsList val algorithms = algoParamsList.map { case (algoName, algoParams) => Doer(algorithmClassMap(algoName), algoParams) } val models = if (persistedModels.exists(m => m.isInstanceOf[Unit])) { // If any of persistedModels is Unit, we need to re-train the model.
logger.info("Some persisted models are Unit, need to re-train.") val (dataSourceName, dataSourceParams) = engineParams.dataSourceParams val dataSource = Doer(dataSourceClassMap(dataSourceName), dataSourceParams) val (preparatorName, preparatorParams) = engineParams.preparatorParams val preparator = Doer(preparatorClassMap(preparatorName), preparatorParams) val td = dataSource.readTrainingBase(sc) val pd = preparator.prepareBase(sc, td) val models = algorithms.zip(persistedModels).map { case (algo, m) => m match { case () => algo.trainBase(sc, pd) case _ => m } } models } else { logger.info("Using persisted model") persistedModels } models .zip(algorithms) .zip(algoParamsList) .zipWithIndex .map { case (((model, algo), (algoName, algoParams)), ax) => { model match { case modelManifest: PersistentModelManifest => { logger.info("Custom-persisted model detected for algorithm " + algo.getClass.getName) SparkWorkflowUtils.getPersistentModel( modelManifest, Seq(engineInstanceId, ax, algoName).mkString("-"), algoParams, Some(sc), getClass.getClassLoader) } case m => { try { logger.info( s"Loaded model ${m.getClass.getName} for algorithm " + s"${algo.getClass.getName}") sc.stop } catch { case e: NullPointerException => logger.warn( s"Null model detected for algorithm ${algo.getClass.getName}") } m } } // model match } } } /** Extracts models for the persistence layer. * * PredictionIO persists models for future use and supports custom * persistence logic through the * [[org.apache.predictionio.controller.PersistentModel]] interface. This method * traverses all models in the workflow. If a model is a * [[org.apache.predictionio.controller.PersistentModel]], it calls the save method * for custom persistence logic. * * For models that do not support custom logic, PredictionIO serializes the whole * model if the corresponding algorithm is local. On the other hand, if the * model is parallel (i.e.
a model associated with a number of huge RDDs), this method returns Unit, in which case PredictionIO will retrain the whole * model from scratch the next time it is used. */ private def makeSerializableModels( sc: SparkContext, engineInstanceId: String, // (AlgoName, AlgoParams, Algo, Model) algoTuples: Seq[(String, Params, BaseAlgorithm[_, _, _, _], Any)] ): Seq[Any] = { logger.info(s"engineInstanceId=$engineInstanceId") algoTuples .zipWithIndex .map { case ((name, params, algo, model), ax) => algo.makePersistentModel( sc = sc, modelId = Seq(engineInstanceId, ax, name).mkString("-"), algoParams = params, bm = model) } } /** This is implemented such that [[org.apache.predictionio.controller.Evaluation]] can * use this method to generate inputs for [[org.apache.predictionio.controller.Metric]]. * * @param sc An instance of SparkContext. * @param engineParams An instance of [[EngineParams]] for running a single evaluation. * @param params An instance of [[WorkflowParams]] that controls the workflow. * @return A list of evaluation information and RDD of query, predicted * result, and actual result tuples. */ def eval( sc: SparkContext, engineParams: EngineParams, params: WorkflowParams) : Seq[(EI, RDD[(Q, P, A)])] = { val (dataSourceName, dataSourceParams) = engineParams.dataSourceParams val dataSource = Doer(dataSourceClassMap(dataSourceName), dataSourceParams) val (preparatorName, preparatorParams) = engineParams.preparatorParams val preparator = Doer(preparatorClassMap(preparatorName), preparatorParams) val algoParamsList = engineParams.algorithmParamsList require( algoParamsList.size > 0, "EngineParams.algorithmParamsList must have at least 1 element.") val algorithms = algoParamsList.map { case (algoName, algoParams) => { try { Doer(algorithmClassMap(algoName), algoParams) } catch { case e: NoSuchElementException => { if (algoName == "") { logger.error("Empty algorithm name supplied but it could not " + "match with any algorithm in the engine's definition.
" + "Existing algorithm name(s) are: " + s"${algorithmClassMap.keys.mkString(", ")}. Aborting.") } else { logger.error(s"$algoName cannot be found in the engine's " + "definition. Existing algorithm name(s) are: " + s"${algorithmClassMap.keys.mkString(", ")}. Aborting.") } sys.exit(1) } } }} val (servingName, servingParams) = engineParams.servingParams val serving = Doer(servingClassMap(servingName), servingParams) Engine.eval(sc, dataSource, preparator, algorithms, serving) } override def jValueToEngineParams( variantJson: JValue, jsonExtractor: JsonExtractorOption): EngineParams = { val engineLanguage = EngineLanguage.Scala // Extract EngineParams logger.info(s"Extracting datasource params...") val dataSourceParams: (String, Params) = WorkflowUtils.getParamsFromJsonByFieldAndClass( variantJson, "datasource", dataSourceClassMap, engineLanguage, jsonExtractor) logger.info(s"Datasource params: $dataSourceParams") logger.info(s"Extracting preparator params...") val preparatorParams: (String, Params) = WorkflowUtils.getParamsFromJsonByFieldAndClass( variantJson, "preparator", preparatorClassMap, engineLanguage, jsonExtractor) logger.info(s"Preparator params: $preparatorParams") val algorithmsParams: Seq[(String, Params)] = variantJson findField { case JField("algorithms", _) => true case _ => false } map { jv => val algorithmsParamsJson = jv._2 algorithmsParamsJson match { case JArray(s) => s.map { algorithmParamsJValue => val eap = algorithmParamsJValue.extract[CreateWorkflow.AlgorithmParams] ( eap.name, WorkflowUtils.extractParams( engineLanguage, compact(render(eap.params)), algorithmClassMap(eap.name), jsonExtractor) ) } case _ => Nil } } getOrElse Seq(("", EmptyParams())) logger.info(s"Extracting serving params...") val servingParams: (String, Params) = WorkflowUtils.getParamsFromJsonByFieldAndClass( variantJson, "serving", servingClassMap, engineLanguage, jsonExtractor) logger.info(s"Serving params: $servingParams") new EngineParams( dataSourceParams = 
dataSourceParams, preparatorParams = preparatorParams, algorithmParamsList = algorithmsParams, servingParams = servingParams) } private[predictionio] def engineInstanceToEngineParams( engineInstance: EngineInstance, jsonExtractor: JsonExtractorOption): EngineParams = { implicit val formats = DefaultFormats val engineLanguage = EngineLanguage.Scala val dataSourceParamsWithName: (String, Params) = { val (name, params) = read[(String, JValue)](engineInstance.dataSourceParams) if (!dataSourceClassMap.contains(name)) { logger.error(s"Unable to find datasource class with name '$name'" + " defined in Engine.") sys.exit(1) } val extractedParams = WorkflowUtils.extractParams( engineLanguage, compact(render(params)), dataSourceClassMap(name), jsonExtractor) (name, extractedParams) } val preparatorParamsWithName: (String, Params) = { val (name, params) = read[(String, JValue)](engineInstance.preparatorParams) if (!preparatorClassMap.contains(name)) { logger.error(s"Unable to find preparator class with name '$name'" + " defined in Engine.") sys.exit(1) } val extractedParams = WorkflowUtils.extractParams( engineLanguage, compact(render(params)), preparatorClassMap(name), jsonExtractor) (name, extractedParams) } val algorithmsParamsWithNames = read[Seq[(String, JValue)]](engineInstance.algorithmsParams).map { case (algoName, params) => val extractedParams = WorkflowUtils.extractParams( engineLanguage, compact(render(params)), algorithmClassMap(algoName), jsonExtractor) (algoName, extractedParams) } val servingParamsWithName: (String, Params) = { val (name, params) = read[(String, JValue)](engineInstance.servingParams) if (!servingClassMap.contains(name)) { logger.error(s"Unable to find serving class with name '$name'" + " defined in Engine.") sys.exit(1) } val extractedParams = WorkflowUtils.extractParams( engineLanguage, compact(render(params)), servingClassMap(name), jsonExtractor) (name, extractedParams) } new EngineParams( dataSourceParams = dataSourceParamsWithName, 
preparatorParams = preparatorParamsWithName, algorithmParamsList = algorithmsParamsWithNames, servingParams = servingParamsWithName) } } /** This object contains concrete implementation for some methods of the * [[Engine]] class. * * @group Engine */ object Engine { private type EX = Int private type AX = Int private type QX = Long @transient lazy private val logger = Logger[this.type] /** Helper class to accept either a single data source, or a map of data * sources, with a companion object providing implicit conversions, so * using this class directly is not necessary. * * @tparam TD Training data class * @tparam EI Evaluation information class * @tparam Q Input query class * @tparam A Actual result class */ class DataSourceMap[TD, EI, Q, A]( val m: Map[String, Class[_ <: BaseDataSource[TD, EI, Q, A]]]) { def this(c: Class[_ <: BaseDataSource[TD, EI, Q, A]]) = this(Map("" -> c)) } /** Companion object providing implicit conversions, so using this directly * is not necessary. */ object DataSourceMap { implicit def cToMap[TD, EI, Q, A]( c: Class[_ <: BaseDataSource[TD, EI, Q, A]]): DataSourceMap[TD, EI, Q, A] = new DataSourceMap(c) implicit def mToMap[TD, EI, Q, A]( m: Map[String, Class[_ <: BaseDataSource[TD, EI, Q, A]]]): DataSourceMap[TD, EI, Q, A] = new DataSourceMap(m) } /** Helper class to accept either a single preparator, or a map of * preparators, with a companion object providing implicit conversions, so * using this class directly is not necessary. * * @tparam TD Training data class * @tparam PD Prepared data class */ class PreparatorMap[TD, PD]( val m: Map[String, Class[_ <: BasePreparator[TD, PD]]]) { def this(c: Class[_ <: BasePreparator[TD, PD]]) = this(Map("" -> c)) } /** Companion object providing implicit conversions, so using this directly * is not necessary. 
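 *
 * @example A sketch, assuming a hypothetical `MyPreparator` that extends
 * `BasePreparator[MyTD, MyPD]`: a bare class is accepted wherever a
 * [[PreparatorMap]] is expected and is registered under the empty name.
 * {{{
 * val pm: PreparatorMap[MyTD, MyPD] = classOf[MyPreparator] // via cToMap
 * pm.m // Map("" -> classOf[MyPreparator])
 * }}}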
*/ object PreparatorMap { implicit def cToMap[TD, PD]( c: Class[_ <: BasePreparator[TD, PD]]): PreparatorMap[TD, PD] = new PreparatorMap(c) implicit def mToMap[TD, PD]( m: Map[String, Class[_ <: BasePreparator[TD, PD]]]): PreparatorMap[TD, PD] = new PreparatorMap(m) } /** Helper class to accept either a single serving class, or a map of serving classes, with * a companion object providing implicit conversions, so using this class * directly is not necessary. * * @tparam Q Input query class * @tparam P Predicted result class */ class ServingMap[Q, P]( val m: Map[String, Class[_ <: BaseServing[Q, P]]]) { def this(c: Class[_ <: BaseServing[Q, P]]) = this(Map("" -> c)) } /** Companion object providing implicit conversions, so using this directly * is not necessary. */ object ServingMap { implicit def cToMap[Q, P]( c: Class[_ <: BaseServing[Q, P]]): ServingMap[Q, P] = new ServingMap(c) implicit def mToMap[Q, P]( m: Map[String, Class[_ <: BaseServing[Q, P]]]): ServingMap[Q, P] = new ServingMap(m) } /** Convenience method for returning an instance of [[Engine]]. * * @param dataSourceMap Accepts either an instance of Class of the data * source, or a Map of data source classes (implicitly * converted to [[DataSourceMap]]). * @param preparatorMap Accepts either an instance of Class of the * preparator, or a Map of preparator classes * (implicitly converted to [[PreparatorMap]]). * @param algorithmClassMap Accepts a Map of algorithm classes. * @param servingMap Accepts either an instance of Class of the serving, or * a Map of serving classes (implicitly converted to * [[ServingMap]]).
* @tparam TD Training data class * @tparam EI Evaluation information class * @tparam PD Prepared data class * @tparam Q Input query class * @tparam P Predicted result class * @tparam A Actual result class * @return An instance of [[Engine]] */ def apply[TD, EI, PD, Q, P, A]( dataSourceMap: DataSourceMap[TD, EI, Q, A], preparatorMap: PreparatorMap[TD, PD], algorithmClassMap: Map[String, Class[_ <: BaseAlgorithm[PD, _, Q, P]]], servingMap: ServingMap[Q, P]): Engine[TD, EI, PD, Q, P, A] = new Engine( dataSourceMap.m, preparatorMap.m, algorithmClassMap, servingMap.m ) /** Provides concrete implementation of training for [[Engine]]. * * @param sc An instance of SparkContext * @param dataSource An instance of data source * @param preparator An instance of preparator * @param algorithmList A list of algorithm instances * @param params An instance of [[WorkflowParams]] that controls the training * process. * @tparam TD Training data class * @tparam PD Prepared data class * @tparam Q Input query class * @return A list of trained models */ def train[TD, PD, Q]( sc: SparkContext, dataSource: BaseDataSource[TD, _, Q, _], preparator: BasePreparator[TD, PD], algorithmList: Seq[BaseAlgorithm[PD, _, Q, _]], params: WorkflowParams ): Seq[Any] = { logger.info("EngineWorkflow.train") logger.info(s"DataSource: $dataSource") logger.info(s"Preparator: $preparator") logger.info(s"AlgorithmList: $algorithmList") if (params.skipSanityCheck) { logger.info("Data sanity check is off.") } else { logger.info("Data sanity check is on.") } val td = try { dataSource.readTrainingBase(sc) } catch { case e: StorageClientException => logger.error(s"Error occurred reading from data source. (Reason: " + e.getMessage + ") Please see the log for debugging details.", e) sys.exit(1) } if (!params.skipSanityCheck) { td match { case sanityCheckable: SanityCheck => { logger.info(s"${td.getClass.getName} supports data sanity" + " check. 
Performing check.") sanityCheckable.sanityCheck() } case _ => { logger.info(s"${td.getClass.getName} does not support" + " data sanity check. Skipping check.") } } } if (params.stopAfterRead) { logger.info("Stopping here because --stop-after-read is set.") throw StopAfterReadInterruption() } val pd = preparator.prepareBase(sc, td) if (!params.skipSanityCheck) { pd match { case sanityCheckable: SanityCheck => { logger.info(s"${pd.getClass.getName} supports data sanity" + " check. Performing check.") sanityCheckable.sanityCheck() } case _ => { logger.info(s"${pd.getClass.getName} does not support" + " data sanity check. Skipping check.") } } } if (params.stopAfterPrepare) { logger.info("Stopping here because --stop-after-prepare is set.") throw StopAfterPrepareInterruption() } val models: Seq[Any] = algorithmList.map(_.trainBase(sc, pd)) if (!params.skipSanityCheck) { models.foreach { model => model match { case sanityCheckable: SanityCheck => { logger.info(s"${model.getClass.getName} supports data sanity" + " check. Performing check.") sanityCheckable.sanityCheck() } case _ => { logger.info(s"${model.getClass.getName} does not support" + " data sanity check. Skipping check.") } } } } logger.info("EngineWorkflow.train completed") models } /** Provides concrete implementation of evaluation for [[Engine]]. * * @param sc An instance of SparkContext * @param dataSource An instance of data source * @param preparator An instance of preparator * @param algorithmList A list of algorithm instances * @param serving An instance of serving * @tparam TD Training data class * @tparam PD Prepared data class * @tparam Q Input query class * @tparam P Predicted result class * @tparam A Actual result class * @tparam EI Evaluation information class * @return A list of evaluation information, RDD of query, predicted result, * and actual result tuples.
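 *
 * @example A sketch of consuming the result, assuming `sc`, `dataSource`,
 * `preparator`, `algorithms`, and `serving` are already constructed with
 * compatible type parameters:
 * {{{
 * val results = Engine.eval(sc, dataSource, preparator, algorithms, serving)
 * results.foreach { case (evalInfo, qpa) =>
 *   // qpa holds (query, served prediction, actual result) tuples
 *   println(s"$evalInfo: ${qpa.count()} tuples")
 * }
 * }}}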
*/ def eval[TD, PD, Q, P, A, EI]( sc: SparkContext, dataSource: BaseDataSource[TD, EI, Q, A], preparator: BasePreparator[TD, PD], algorithmList: Seq[BaseAlgorithm[PD, _, Q, P]], serving: BaseServing[Q, P]): Seq[(EI, RDD[(Q, P, A)])] = { logger.info(s"DataSource: $dataSource") logger.info(s"Preparator: $preparator") logger.info(s"AlgorithmList: $algorithmList") logger.info(s"Serving: $serving") val algoMap: Map[AX, BaseAlgorithm[PD, _, Q, P]] = algorithmList .zipWithIndex .map(_.swap) .toMap val algoCount = algoMap.size val evalTupleMap: Map[EX, (TD, EI, RDD[(Q, A)])] = dataSource .readEvalBase(sc) .zipWithIndex .map(_.swap) .toMap val evalCount = evalTupleMap.size val evalTrainMap: Map[EX, TD] = evalTupleMap.mapValues(_._1) val evalInfoMap: Map[EX, EI] = evalTupleMap.mapValues(_._2) val evalQAsMap: Map[EX, RDD[(QX, (Q, A))]] = evalTupleMap .mapValues(_._3) .mapValues{ _.zipWithUniqueId().map(_.swap) } val preparedMap: Map[EX, PD] = evalTrainMap.mapValues { td => preparator.prepareBase(sc, td) } val algoModelsMap: Map[EX, Map[AX, Any]] = preparedMap.mapValues { pd => algoMap.mapValues(_.trainBase(sc,pd)) } val suppQAsMap: Map[EX, RDD[(QX, (Q, A))]] = evalQAsMap.mapValues { qas => qas.map { case (qx, (q, a)) => (qx, (serving.supplementBase(q), a)) } } val algoPredictsMap: Map[EX, RDD[(QX, Seq[P])]] = (0 until evalCount) .map { ex => val modelMap: Map[AX, Any] = algoModelsMap(ex) val qs: RDD[(QX, Q)] = suppQAsMap(ex).mapValues(_._1) val algoPredicts: Seq[RDD[(QX, (AX, P))]] = (0 until algoCount) .map { ax => val algo = algoMap(ax) val model = modelMap(ax) val rawPredicts: RDD[(QX, P)] = algo.batchPredictBase(sc, model, qs) val predicts: RDD[(QX, (AX, P))] = rawPredicts.map { case (qx, p) => (qx, (ax, p)) } predicts } val unionAlgoPredicts: RDD[(QX, Seq[P])] = sc.union(algoPredicts) .groupByKey() .mapValues { ps => assert (ps.size == algoCount, "Must have same length as algoCount") // TODO. 
Check size == algoCount ps.toSeq.sortBy(_._1).map(_._2) } (ex, unionAlgoPredicts) } .toMap val servingQPAMap: Map[EX, RDD[(Q, P, A)]] = algoPredictsMap .map { case (ex, psMap) => // The query passed to serving.serve is the original one, not // supplemented. val qasMap: RDD[(QX, (Q, A))] = evalQAsMap(ex) val qpsaMap: RDD[(QX, Q, Seq[P], A)] = psMap.join(qasMap) .map { case (qx, t) => (qx, t._2._1, t._1, t._2._2) } val qpaMap: RDD[(Q, P, A)] = qpsaMap.map { case (qx, q, ps, a) => (q, serving.serveBase(q, ps), a) } (ex, qpaMap) } (0 until evalCount).map { ex => (evalInfoMap(ex), servingQPAMap(ex)) } } } /** Mix in this trait for queries that contain prId (PredictedResultId). * This is useful when your engine expects queries to also be associated with * prId keys when feedback loop is enabled. * * @group Helper */ @deprecated("To be removed in future releases.", "0.9.2") trait WithPrId { val prId: String = "" } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/EngineFactory.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseEngine import scala.language.implicitConversions /** If you intend to let PredictionIO create the workflow and deploy the serving * layer automatically, you need to implement an object that extends this class * and whose `apply()` returns an [[Engine]]. * * @group Engine */ abstract class EngineFactory { /** Creates an instance of an [[Engine]]. */ def apply(): BaseEngine[_, _, _, _] /** Override this method to programmatically return engine parameters. */ def engineParams(key: String): EngineParams = EngineParams() } /** DEPRECATED. Use [[EngineFactory]] instead. * * @group Engine */ @deprecated("Use EngineFactory instead.", "0.9.2") trait IEngineFactory extends EngineFactory ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/EngineParams.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseDataSource import org.apache.predictionio.core.BaseAlgorithm import scala.collection.JavaConversions import scala.language.implicitConversions /** This class logically groups all of an engine's required parameters. * * @param dataSourceParams Data Source name-parameters tuple. * @param preparatorParams Preparator name-parameters tuple. * @param algorithmParamsList List of algorithm name-parameter pairs. * @param servingParams Serving name-parameters tuple. * @group Engine */ class EngineParams( val dataSourceParams: (String, Params) = ("", EmptyParams()), val preparatorParams: (String, Params) = ("", EmptyParams()), val algorithmParamsList: Seq[(String, Params)] = Nil, val servingParams: (String, Params) = ("", EmptyParams())) extends Serializable { /** Java-friendly constructor * * @param dataSourceName Data Source name * @param dataSourceParams Data Source parameters * @param preparatorName Preparator name * @param preparatorParams Preparator parameters * @param algorithmParamsList Map of algorithm names to parameters * @param servingName Serving name * @param servingParams Serving parameters */ def this( dataSourceName: String, dataSourceParams: Params, preparatorName: String, preparatorParams: Params, algorithmParamsList: _root_.java.util.Map[String, _ <: Params], servingName: String, servingParams: Params) = { // To work around a peculiar json4s limitation, the parameter names cannot be changed this( (dataSourceName, dataSourceParams), (preparatorName, preparatorParams), JavaConversions.mapAsScalaMap(algorithmParamsList).toSeq, (servingName, servingParams) ) } // A case-class-style copy method.
def copy( dataSourceParams: (String, Params) = dataSourceParams, preparatorParams: (String, Params) = preparatorParams, algorithmParamsList: Seq[(String, Params)] = algorithmParamsList, servingParams: (String, Params) = servingParams): EngineParams = { new EngineParams( dataSourceParams, preparatorParams, algorithmParamsList, servingParams) } } /** Companion object for creating [[EngineParams]] instances. * * @group Engine */ object EngineParams { /** Creates an [[EngineParams]] instance. * * @param dataSourceName Data Source name * @param dataSourceParams Data Source parameters * @param preparatorName Preparator name * @param preparatorParams Preparator parameters * @param algorithmParamsList List of algorithm name-parameter pairs. * @param servingName Serving name * @param servingParams Serving parameters */ def apply( dataSourceName: String = "", dataSourceParams: Params = EmptyParams(), preparatorName: String = "", preparatorParams: Params = EmptyParams(), algorithmParamsList: Seq[(String, Params)] = Nil, servingName: String = "", servingParams: Params = EmptyParams()): EngineParams = { new EngineParams( dataSourceParams = (dataSourceName, dataSourceParams), preparatorParams = (preparatorName, preparatorParams), algorithmParamsList = algorithmParamsList, servingParams = (servingName, servingParams) ) } } /** SimpleEngine has only one algorithm and uses the default preparator and * serving layer. The current default preparator is `IdentityPreparator` and the * default serving is `LFirstServing`. * * @tparam TD Training data class. * @tparam EI Evaluation info class. * @tparam Q Input query class. * @tparam P Output prediction class. * @tparam A Actual value class. * @param dataSourceClass Data source class. * @param algorithmClass Algorithm class.
* @group Engine */ class SimpleEngine[TD, EI, Q, P, A]( dataSourceClass: Class[_ <: BaseDataSource[TD, EI, Q, A]], algorithmClass: Class[_ <: BaseAlgorithm[TD, _, Q, P]]) extends Engine( dataSourceClass, IdentityPreparator(dataSourceClass), Map("" -> algorithmClass), LFirstServing(algorithmClass)) /** This shorthand class serves the `SimpleEngine` class. * * @param dataSourceParams Data source parameters. * @param algorithmParams List of algorithm name-parameter pairs. * @group Engine */ class SimpleEngineParams( dataSourceParams: Params = EmptyParams(), algorithmParams: Params = EmptyParams()) extends EngineParams( dataSourceParams = ("", dataSourceParams), algorithmParamsList = Seq(("", algorithmParams))) ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/EngineParamsGenerator.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import scala.language.implicitConversions /** Defines an engine parameters generator. * * Implementations of this trait can be supplied to "pio eval" as the second * command line argument. 
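 *
 * A minimal sketch of a generator, assuming a hypothetical `AlgorithmParams`
 * case class (extending [[Params]]) and an algorithm registered under the
 * hypothetical name "naive"; neither is part of PredictionIO. The candidate
 * list is assigned exactly once, because the setter asserts it has not been
 * set before:
 * {{{
 * object EngineParamsList extends EngineParamsGenerator {
 *   // Hypothetical base parameters shared by every candidate
 *   private[this] val baseEP = EngineParams()
 *
 *   // Invokes engineParamsList_=, which may only be called once
 *   engineParamsList = Seq(
 *     baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(10.0)))),
 *     baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(100.0)))))
 * }
 * }}}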
* * @group Evaluation */ trait EngineParamsGenerator { protected[this] var epList: Seq[EngineParams] = _ protected[this] var epListSet: Boolean = false /** Returns the list of [[EngineParams]] of this [[EngineParamsGenerator]]. */ def engineParamsList: Seq[EngineParams] = { assert(epListSet, "EngineParamsList not set") epList } /** Sets the list of [[EngineParams]] of this [[EngineParamsGenerator]]. */ def engineParamsList_=(l: Seq[EngineParams]) { assert(!epListSet, "EngineParamsList can be set at most once") epList = Seq(l:_*) epListSet = true } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/Evaluation.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseEngine import org.apache.predictionio.core.BaseEvaluator import org.apache.predictionio.core.BaseEvaluatorResult import scala.language.implicitConversions /** Defines an evaluation that contains an engine and a metric. * * Implementations of this trait can be supplied to "pio eval" as the first * argument.
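 *
 * A minimal sketch, assuming a hypothetical engine factory `MyEngine` whose
 * `apply()` returns a concretely typed [[Engine]], and a hypothetical
 * [[Metric]] implementation `Accuracy`; neither is defined in PredictionIO:
 * {{{
 * object AccuracyEvaluation extends Evaluation {
 *   // Invokes engineMetric_=, which wraps the metric in a MetricEvaluator
 *   // with no secondary metrics and an output path of "best.json"
 *   engineMetric = (MyEngine(), new Accuracy())
 * }
 * }}}
 * Such an object would typically be passed to "pio eval" together with an
 * [[EngineParamsGenerator]] as the second argument.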
* * @group Evaluation */ trait Evaluation extends Deployment { protected [this] var _evaluatorSet: Boolean = false protected [this] var _evaluator: BaseEvaluator[_, _, _, _, _ <: BaseEvaluatorResult] = _ private[predictionio] def evaluator: BaseEvaluator[_, _, _, _, _ <: BaseEvaluatorResult] = { assert(_evaluatorSet, "Evaluator not set") _evaluator } /** Gets the tuple of the [[Engine]] and the implementation of * [[org.apache.predictionio.core.BaseEvaluator]] */ def engineEvaluator : (BaseEngine[_, _, _, _], BaseEvaluator[_, _, _, _, _]) = { assert(_evaluatorSet, "Evaluator not set") (engine, _evaluator) } /** Sets both an [[Engine]] and an implementation of * [[org.apache.predictionio.core.BaseEvaluator]] for this [[Evaluation]] * * @param engineEvaluator A tuple of an [[Engine]] and an implementation of * [[org.apache.predictionio.core.BaseEvaluator]] * @tparam EI Evaluation information class * @tparam Q Query class * @tparam P Predicted result class * @tparam A Actual result class * @tparam R Metric result class */ def engineEvaluator_=[EI, Q, P, A, R <: BaseEvaluatorResult]( engineEvaluator: ( BaseEngine[EI, Q, P, A], BaseEvaluator[EI, Q, P, A, R])) { assert(!_evaluatorSet, "Evaluator can be set at most once") engine = engineEvaluator._1 _evaluator = engineEvaluator._2 _evaluatorSet = true } /** Not implemented: always throws NotImplementedError. This getter exists only * so that the `engineMetric = ...` setter syntax compiles. */ def engineMetric: (BaseEngine[_, _, _, _], Metric[_, _, _, _, _]) = { throw new NotImplementedError("This method is to keep the compiler happy") } /** Sets both an [[Engine]] and an implementation of [[Metric]] for this * [[Evaluation]] * * @param engineMetric A tuple of an [[Engine]] and an implementation of * [[Metric]] * @tparam EI Evaluation information class * @tparam Q Query class * @tparam P Predicted result class * @tparam A Actual result class */ def engineMetric_=[EI, Q, P, A]( engineMetric: (BaseEngine[EI, Q, P, A], Metric[EI, Q, P, A, _])) { engineEvaluator = (
engineMetric._1, MetricEvaluator( metric = engineMetric._2, otherMetrics = Seq[Metric[EI, Q, P, A, _]](), outputPath = "best.json")) } private[predictionio] def engineMetrics: (BaseEngine[_, _, _, _], Metric[_, _, _, _, _]) = { throw new NotImplementedError("This method is to keep the compiler happy") } /** Sets an [[Engine]], an implementation of [[Metric]], and a sequence of * implementations of [[Metric]] for this [[Evaluation]] * * @param engineMetrics A tuple of an [[Engine]], an implementation of * [[Metric]], and a sequence of implementations of [[Metric]] * @tparam EI Evaluation information class * @tparam Q Query class * @tparam P Predicted result class * @tparam A Actual result class */ def engineMetrics_=[EI, Q, P, A]( engineMetrics: ( BaseEngine[EI, Q, P, A], Metric[EI, Q, P, A, _], Seq[Metric[EI, Q, P, A, _]])) { engineEvaluator = ( engineMetrics._1, MetricEvaluator(engineMetrics._2, engineMetrics._3)) } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/FastEvalEngine.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseDataSource import org.apache.predictionio.core.BasePreparator import org.apache.predictionio.core.BaseAlgorithm import org.apache.predictionio.core.BaseServing import org.apache.predictionio.core.Doer import org.apache.predictionio.annotation.Experimental import grizzled.slf4j.Logger import org.apache.predictionio.workflow.WorkflowParams import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.rdd.RDD import scala.language.implicitConversions import _root_.java.util.NoSuchElementException import scala.collection.mutable.{ HashMap => MutableHashMap } /** :: Experimental :: * Workflow based on [[FastEvalEngine]] * * @group Evaluation */ @Experimental object FastEvalEngineWorkflow { @transient lazy val logger = Logger[this.type] type EX = Int type AX = Int type QX = Long case class DataSourcePrefix(dataSourceParams: (String, Params)) { def this(pp: PreparatorPrefix) = this(pp.dataSourceParams) def this(ap: AlgorithmsPrefix) = this(ap.dataSourceParams) def this(sp: ServingPrefix) = this(sp.dataSourceParams) } case class PreparatorPrefix( dataSourceParams: (String, Params), preparatorParams: (String, Params)) { def this(ap: AlgorithmsPrefix) = { this(ap.dataSourceParams, ap.preparatorParams) } } case class AlgorithmsPrefix( dataSourceParams: (String, Params), preparatorParams: (String, Params), algorithmParamsList: Seq[(String, Params)]) { def this(sp: ServingPrefix) = { this(sp.dataSourceParams, sp.preparatorParams, sp.algorithmParamsList) } } case class ServingPrefix( dataSourceParams: (String, Params), preparatorParams: (String, Params), algorithmParamsList: Seq[(String, Params)], servingParams: (String, Params)) { def this(ep: EngineParams) = this( ep.dataSourceParams, ep.preparatorParams, ep.algorithmParamsList, ep.servingParams) } def getDataSourceResult[TD, EI, PD, Q, P, A]( workflow: FastEvalEngineWorkflow[TD, EI, PD, Q, P, A], prefix: 
DataSourcePrefix) : Map[EX, (TD, EI, RDD[(QX, (Q, A))])] = { val cache = workflow.dataSourceCache if (!cache.contains(prefix)) { val dataSource = Doer( workflow.engine.dataSourceClassMap(prefix.dataSourceParams._1), prefix.dataSourceParams._2) val result = dataSource .readEvalBase(workflow.sc) .map { case (td, ei, qaRDD) => { (td, ei, qaRDD.zipWithUniqueId().map(_.swap)) }} .zipWithIndex .map(_.swap) .toMap cache += Tuple2(prefix, result) } cache(prefix) } def getPreparatorResult[TD, EI, PD, Q, P, A]( workflow: FastEvalEngineWorkflow[TD, EI, PD, Q, P, A], prefix: PreparatorPrefix): Map[EX, PD] = { val cache = workflow.preparatorCache if (!cache.contains(prefix)) { val preparator = Doer( workflow.engine.preparatorClassMap(prefix.preparatorParams._1), prefix.preparatorParams._2) val result = getDataSourceResult( workflow = workflow, prefix = new DataSourcePrefix(prefix)) .mapValues { case (td, _, _) => preparator.prepareBase(workflow.sc, td) } cache += Tuple2(prefix, result) } cache(prefix) } def computeAlgorithmsResult[TD, EI, PD, Q, P, A]( workflow: FastEvalEngineWorkflow[TD, EI, PD, Q, P, A], prefix: AlgorithmsPrefix): Map[EX, RDD[(QX, Seq[P])]] = { val algoMap: Map[AX, BaseAlgorithm[PD, _, Q, P]] = prefix.algorithmParamsList .map { case (algoName, algoParams) => { try { Doer(workflow.engine.algorithmClassMap(algoName), algoParams) } catch { case e: NoSuchElementException => { val algorithmClassMap = workflow.engine.algorithmClassMap if (algoName == "") { logger.error("Empty algorithm name supplied but it could not " + "match with any algorithm in the engine's definition. " + "Existing algorithm name(s) are: " + s"${algorithmClassMap.keys.mkString(", ")}. Aborting.") } else { logger.error(s"${algoName} cannot be found in the engine's " + "definition. Existing algorithm name(s) are: " + s"${algorithmClassMap.keys.mkString(", ")}. 
Aborting.") } sys.exit(1) } } }} .zipWithIndex .map(_.swap) .toMap val algoCount = algoMap.size // Model Train val algoModelsMap: Map[EX, Map[AX, Any]] = getPreparatorResult( workflow, new PreparatorPrefix(prefix)) .mapValues { pd => algoMap.mapValues(_.trainBase(workflow.sc,pd)) } // Predict val dataSourceResult = FastEvalEngineWorkflow.getDataSourceResult( workflow = workflow, prefix = new DataSourcePrefix(prefix)) val algoResult: Map[EX, RDD[(QX, Seq[P])]] = dataSourceResult .par .map { case (ex, (td, ei, iqaRDD)) => { val modelsMap: Map[AX, Any] = algoModelsMap(ex) val qs: RDD[(QX, Q)] = iqaRDD.mapValues(_._1) val algoPredicts: Seq[RDD[(QX, (AX, P))]] = (0 until algoCount) .map { ax => { val algo = algoMap(ax) val model = modelsMap(ax) val rawPredicts: RDD[(QX, P)] = algo.batchPredictBase( workflow.sc, model, qs) val predicts: RDD[(QX, (AX, P))] = rawPredicts.map { case (qx, p) => (qx, (ax, p)) } predicts }} val unionAlgoPredicts: RDD[(QX, Seq[P])] = workflow.sc .union(algoPredicts) .groupByKey .mapValues { ps => { assert (ps.size == algoCount, "Must have same length as algoCount") // TODO. 
Check size == algoCount ps.toSeq.sortBy(_._1).map(_._2) }} (ex, unionAlgoPredicts) }} .seq .toMap algoResult } def getAlgorithmsResult[TD, EI, PD, Q, P, A]( workflow: FastEvalEngineWorkflow[TD, EI, PD, Q, P, A], prefix: AlgorithmsPrefix): Map[EX, RDD[(QX, Seq[P])]] = { val cache = workflow.algorithmsCache if (!cache.contains(prefix)) { val result = computeAlgorithmsResult(workflow, prefix) cache += Tuple2(prefix, result) } cache(prefix) } def getServingResult[TD, EI, PD, Q, P, A]( workflow: FastEvalEngineWorkflow[TD, EI, PD, Q, P, A], prefix: ServingPrefix) : Seq[(EI, RDD[(Q, P, A)])] = { val cache = workflow.servingCache if (!cache.contains(prefix)) { val serving = Doer( workflow.engine.servingClassMap(prefix.servingParams._1), prefix.servingParams._2) val algoPredictsMap = getAlgorithmsResult( workflow = workflow, prefix = new AlgorithmsPrefix(prefix)) val dataSourceResult = getDataSourceResult( workflow = workflow, prefix = new DataSourcePrefix(prefix)) val evalQAsMap = dataSourceResult.mapValues(_._3) val evalInfoMap = dataSourceResult.mapValues(_._2) val servingQPAMap: Map[EX, RDD[(Q, P, A)]] = algoPredictsMap .map { case (ex, psMap) => { val qasMap: RDD[(QX, (Q, A))] = evalQAsMap(ex) val qpsaMap: RDD[(QX, Q, Seq[P], A)] = psMap.join(qasMap) .map { case (qx, t) => (qx, t._2._1, t._1, t._2._2) } val qpaMap: RDD[(Q, P, A)] = qpsaMap.map { case (qx, q, ps, a) => (q, serving.serveBase(q, ps), a) } (ex, qpaMap) }} val servingResult = (0 until evalQAsMap.size).map { ex => { (evalInfoMap(ex), servingQPAMap(ex)) }} .toSeq cache += Tuple2(prefix, servingResult) } cache(prefix) } def get[TD, EI, PD, Q, P, A]( workflow: FastEvalEngineWorkflow[TD, EI, PD, Q, P, A], engineParamsList: Seq[EngineParams]) : Seq[(EngineParams, Seq[(EI, RDD[(Q, P, A)])])] = { engineParamsList.map { engineParams => (engineParams, getServingResult(workflow, new ServingPrefix(engineParams))) } } } /** :: Experimental :: * Workflow based on [[FastEvalEngine]] * * @group Evaluation */ @Experimental 
class FastEvalEngineWorkflow[TD, EI, PD, Q, P, A]( val engine: FastEvalEngine[TD, EI, PD, Q, P, A], val sc: SparkContext, val workflowParams: WorkflowParams) extends Serializable { import org.apache.predictionio.controller.FastEvalEngineWorkflow._ type DataSourceResult = Map[EX, (TD, EI, RDD[(QX, (Q, A))])] type PreparatorResult = Map[EX, PD] type AlgorithmsResult = Map[EX, RDD[(QX, Seq[P])]] type ServingResult = Seq[(EI, RDD[(Q, P, A)])] val dataSourceCache = MutableHashMap[DataSourcePrefix, DataSourceResult]() val preparatorCache = MutableHashMap[PreparatorPrefix, PreparatorResult]() val algorithmsCache = MutableHashMap[AlgorithmsPrefix, AlgorithmsResult]() val servingCache = MutableHashMap[ServingPrefix, ServingResult]() } /** :: Experimental :: * FastEvalEngine is a subclass of [[Engine]] that exploits the immutability of * controllers to optimize the evaluation process: the results of the data * source, preparator, algorithm, and serving stages are cached by parameter * prefix, so engine parameter sets that share a prefix reuse earlier results. * * @group Evaluation */ @Experimental class FastEvalEngine[TD, EI, PD, Q, P, A]( dataSourceClassMap: Map[String, Class[_ <: BaseDataSource[TD, EI, Q, A]]], preparatorClassMap: Map[String, Class[_ <: BasePreparator[TD, PD]]], algorithmClassMap: Map[String, Class[_ <: BaseAlgorithm[PD, _, Q, P]]], servingClassMap: Map[String, Class[_ <: BaseServing[Q, P]]]) extends Engine[TD, EI, PD, Q, P, A]( dataSourceClassMap, preparatorClassMap, algorithmClassMap, servingClassMap) { @transient override lazy val logger = Logger[this.type] override def eval( sc: SparkContext, engineParams: EngineParams, params: WorkflowParams): Seq[(EI, RDD[(Q, P, A)])] = { logger.info("FastEvalEngine.eval") batchEval(sc, Seq(engineParams), params).head._2 } override def batchEval( sc: SparkContext, engineParamsList: Seq[EngineParams], params: WorkflowParams) : Seq[(EngineParams, Seq[(EI, RDD[(Q, P, A)])])] = { val fastEngineWorkflow = new FastEvalEngineWorkflow( this, sc, params) FastEvalEngineWorkflow.get( fastEngineWorkflow, engineParamsList) } } ================================================ FILE:
core/src/main/scala/org/apache/predictionio/controller/IdentityPreparator.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseDataSource import org.apache.predictionio.core.BasePreparator import org.apache.spark.SparkContext /** A helper concrete implementation of [[org.apache.predictionio.core.BasePreparator]] * that passes training data through without any special preparation. This can * be used in place of both [[PPreparator]] and [[LPreparator]]. * * @tparam TD Training data class. * @group Preparator */ class IdentityPreparator[TD] extends BasePreparator[TD, TD] { override def prepareBase(sc: SparkContext, td: TD): TD = td } /** Companion object of [[IdentityPreparator]] that conveniently returns an * instance of the class of [[IdentityPreparator]] for use with * [[EngineFactory]]. * * @group Preparator */ object IdentityPreparator { /** Produces an instance of the class of [[IdentityPreparator]]. * * @param ds Instance of the class of the data source for this preparator.
*/ def apply[TD](ds: Class[_ <: BaseDataSource[TD, _, _, _]]): Class[IdentityPreparator[TD]] = classOf[IdentityPreparator[TD]] } /** DEPRECATED. Use [[IdentityPreparator]] instead. * * @tparam TD Training data class. * @group Preparator */ class PIdentityPreparator[TD] extends IdentityPreparator[TD] /** DEPRECATED. Use [[IdentityPreparator]] instead. * * @group Preparator */ object PIdentityPreparator { /** Produces an instance of the class of [[IdentityPreparator]]. * * @param ds Instance of the class of the data source for this preparator. */ @deprecated("Use IdentityPreparator instead.", "0.9.2") def apply[TD](ds: Class[_ <: BaseDataSource[TD, _, _, _]]): Class[IdentityPreparator[TD]] = classOf[IdentityPreparator[TD]] } /** DEPRECATED. Use [[IdentityPreparator]] instead. * * @tparam TD Training data class. * @group Preparator */ class LIdentityPreparator[TD] extends IdentityPreparator[TD] /** DEPRECATED. Use [[IdentityPreparator]] instead. * * @group Preparator */ object LIdentityPreparator { /** Produces an instance of the class of [[IdentityPreparator]]. * * @param ds Instance of the class of the data source for this preparator. */ @deprecated("Use IdentityPreparator instead.", "0.9.2") def apply[TD](ds: Class[_ <: BaseDataSource[TD, _, _, _]]): Class[IdentityPreparator[TD]] = classOf[IdentityPreparator[TD]] } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/LAlgorithm.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import _root_.org.apache.predictionio.annotation.DeveloperApi import org.apache.predictionio.core.BaseAlgorithm import org.apache.predictionio.workflow.PersistentModelManifest import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import scala.reflect._ /** Base class of a local algorithm. * * A local algorithm runs locally within a single machine and produces a model * that can fit within a single machine. * * If your input query class requires custom JSON4S serialization, the most * idiomatic way is to implement a trait that extends [[CustomQuerySerializer]], * and mix that into your algorithm class, instead of overriding * [[querySerializer]] directly. * * @tparam PD Prepared data class. * @tparam M Trained model class. * @tparam Q Input query class. * @tparam P Output prediction class. * @group Algorithm */ abstract class LAlgorithm[PD, M : ClassTag, Q, P] extends BaseAlgorithm[RDD[PD], RDD[M], Q, P] { override def trainBase(sc: SparkContext, pd: RDD[PD]): RDD[M] = pd.map(train) /** Implement this method to produce a model from prepared data. * * @param pd Prepared data for model training. * @return Trained model. */ def train(pd: PD): M override def batchPredictBase(sc: SparkContext, bm: Any, qs: RDD[(Long, Q)]) : RDD[(Long, P)] = { val mRDD = bm.asInstanceOf[RDD[M]] batchPredict(mRDD, qs) } /** This is a default implementation to perform batch prediction. Override * this method for a custom implementation. * * @param mRDD A single model wrapped inside an RDD * @param qs An RDD of index-query tuples. 
The index is used to keep track of * predicted results with corresponding queries. * @return Batch of predicted results */ def batchPredict(mRDD: RDD[M], qs: RDD[(Long, Q)]): RDD[(Long, P)] = { val glomQs: RDD[Array[(Long, Q)]] = qs.glom() val cartesian: RDD[(M, Array[(Long, Q)])] = mRDD.cartesian(glomQs) cartesian.flatMap { case (m, qArray) => qArray.map { case (qx, q) => (qx, predict(m, q)) } } } override def predictBase(localBaseModel: Any, q: Q): P = { predict(localBaseModel.asInstanceOf[M], q) } /** Implement this method to produce a prediction from a query and trained * model. * * @param m Trained model produced by [[train]]. * @param q An input query. * @return A prediction. */ def predict(m: M, q: Q): P /** :: DeveloperApi :: * Engine developers should not use this directly (read on to see how local * algorithm models are persisted). * * Local algorithms produce local models. By default, models will be * serialized and stored automatically. Engine developers can override this behavior by * mixing the [[PersistentModel]] trait into the model class, and * PredictionIO will call [[PersistentModel.save]] instead. If it returns * true, a [[org.apache.predictionio.workflow.PersistentModelManifest]] will be * returned so that during deployment, PredictionIO will use * [[PersistentModelLoader]] to retrieve the model. Otherwise, Unit will be * returned and the model will be re-trained on-the-fly. 
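 *
 * As a hedged illustration of manual persistence (the model class `MyModel`,
 * its params type `MyParams`, which must extend [[Params]], and the storage
 * path are all hypothetical), a model opts in by implementing `save`, matching
 * the `m.save(modelId, algoParams, sc)` call in the implementation below:
 * {{{
 * class MyModel(val weights: Array[Double]) extends PersistentModel[MyParams] {
 *   // Returning true records a PersistentModelManifest; returning false
 *   // causes the model to be re-trained when the engine is deployed.
 *   def save(id: String, params: MyParams, sc: SparkContext): Boolean = {
 *     sc.parallelize(weights.toSeq).saveAsObjectFile(s"/tmp/$id/weights")
 *     true
 *   }
 * }
 * }}}
 * A companion object extending [[PersistentModelLoader]] would then be needed
 * to reload the saved model at deployment time.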
* * @param sc Spark context * @param modelId Model ID * @param algoParams Algorithm parameters that trained this model * @param bm Model * @return The model itself for automatic persistence, an instance of * [[org.apache.predictionio.workflow.PersistentModelManifest]] for manual * persistence, or Unit for re-training on deployment */ @DeveloperApi override def makePersistentModel( sc: SparkContext, modelId: String, algoParams: Params, bm: Any): Any = { // Check RDD[M].count == 1 val m = bm.asInstanceOf[RDD[M]].first() m match { case m: PersistentModel[Params] @unchecked => if(m.save(modelId, algoParams, sc)){ PersistentModelManifest(className = m.getClass.getName) } else () case _ => m } } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/LAverageServing.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseAlgorithm /** A concrete implementation of [[LServing]] returning the average of all * algorithms' predictions, where their classes are expected to be all Double. 
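 *
 * As a sketch, an engine can average the Double predictions of two
 * algorithms. `MyEngine`, `MyDataSource`, `MyPreparator`, `AlgoA`, and `AlgoB`
 * are hypothetical names. The companion object's `apply` infers the query
 * type from an algorithm class:
 *
 * {{{
 * object MyEngine extends EngineFactory {
 *   def apply() = new Engine(
 *     classOf[MyDataSource],
 *     classOf[MyPreparator],
 *     Map("a" -> classOf[AlgoA], "b" -> classOf[AlgoB]),
 *     LAverageServing(classOf[AlgoA]))
 * }
 * }}}
 *
 * Note that `serve` divides by `predictions.length`, so an empty list of
 * predictions yields NaN.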
* * @group Serving */ class LAverageServing[Q] extends LServing[Q, Double] { /** Returns the average of all algorithms' predictions. */ override def serve(query: Q, predictions: Seq[Double]): Double = { predictions.sum / predictions.length } } /** A concrete implementation of [[LServing]] returning the average of all * algorithms' predictions, where all predictions are expected to be of type Double. * * @group Serving */ object LAverageServing { /** Returns the [[LAverageServing]] class, with the query type inferred from the given algorithm class. */ def apply[Q](a: Class[_ <: BaseAlgorithm[_, _, Q, _]]): Class[LAverageServing[Q]] = classOf[LAverageServing[Q]] } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/LDataSource.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseDataSource import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import scala.reflect._ /** Base class of a local data source. * * A local data source runs locally within a single machine and returns data * that can fit within a single machine. * * @tparam TD Training data class. * @tparam EI Evaluation information class.
* @tparam Q Input query class. * @tparam A Actual value class. * @group Data Source */ abstract class LDataSource[TD: ClassTag, EI, Q, A] extends BaseDataSource[RDD[TD], EI, Q, A] { override def readTrainingBase(sc: SparkContext): RDD[TD] = { sc.parallelize(Seq(None)).map(_ => readTraining()) } /** Implement this method to return only training data from a data source. */ def readTraining(): TD override def readEvalBase(sc: SparkContext): Seq[(RDD[TD], EI, RDD[(Q, A)])] = { val localEvalData: Seq[(TD, EI, Seq[(Q, A)])] = readEval() localEvalData.map { case (td, ei, qaSeq) => { val tdRDD = sc.parallelize(Seq(None)).map(_ => td) val qaRDD = sc.parallelize(qaSeq) (tdRDD, ei, qaRDD) }} } /** To provide the evaluation feature for your engine, you must override this * method to return evaluation data from a data source. Returned data can * optionally include a sequence of query and actual value pairs for * evaluation purposes. * * The default implementation returns an empty sequence as a stub, so that * an engine can be compiled without implementing evaluation. */ def readEval(): Seq[(TD, EI, Seq[(Q, A)])] = Seq[(TD, EI, Seq[(Q, A)])]() @deprecated("Use readEval() instead.", "0.9.0") def read(): Seq[(TD, EI, Seq[(Q, A)])] = readEval() } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/LFirstServing.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License.
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseAlgorithm /** A concrete implementation of [[LServing]] returning the first algorithm's * prediction result directly without any modification. * * @group Serving */ class LFirstServing[Q, P] extends LServing[Q, P] { /** Returns the first algorithm's prediction. */ override def serve(query: Q, predictions: Seq[P]): P = predictions.head } /** A concrete implementation of [[LServing]] returning the first algorithm's * prediction result directly without any modification. * * @group Serving */ object LFirstServing { /** Returns the [[LFirstServing]] class, with the query and prediction types inferred from the given algorithm class. */ def apply[Q, P](a: Class[_ <: BaseAlgorithm[_, _, Q, P]]): Class[LFirstServing[Q, P]] = classOf[LFirstServing[Q, P]] } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/LPreparator.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License.
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.core.BasePreparator import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import scala.reflect._ /** Base class of a local preparator. * * A local preparator runs locally within a single machine and produces * prepared data that can fit within a single machine. * * @tparam TD Training data class. * @tparam PD Prepared data class. * @group Preparator */ abstract class LPreparator[TD, PD : ClassTag] extends BasePreparator[RDD[TD], RDD[PD]] { override def prepareBase(sc: SparkContext, rddTd: RDD[TD]): RDD[PD] = { rddTd.map(prepare) } /** Implement this method to produce prepared data that is ready for model * training. * * @param trainingData Training data to be prepared. */ def prepare(trainingData: TD): PD } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/LServing.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.annotation.Experimental import org.apache.predictionio.core.BaseServing /** Base class of serving. * * @tparam Q Input query class. * @tparam P Output prediction class. * @group Serving */ abstract class LServing[Q, P] extends BaseServing[Q, P] { override def supplementBase(q: Q): Q = supplement(q) /** :: Experimental :: * Implement this method to supplement the query before sending it to * algorithms. * * @param q Query * @return A supplemented query */ @Experimental def supplement(q: Q): Q = q override def serveBase(q: Q, ps: Seq[P]): P = { serve(q, ps) } /** Implement this method to combine multiple algorithms' predictions to * produce a single final prediction. The query is the original query sent to * the engine, not the supplemented query produced by [[LServing.supplement]]. * * @param query Original input query. * @param predictions A list of algorithms' predictions. * @return The final prediction. */ def serve(query: Q, predictions: Seq[P]): P } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/LocalFileSystemPersistentModel.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License.
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.spark.SparkContext /** This trait is a convenience helper for persisting your model to the local * filesystem. This trait and [[LocalFileSystemPersistentModelLoader]] already * provide concrete implementations, so no methods need to be implemented. * * The underlying implementation is [[Utils.save]]. * * {{{ * class MyModel extends LocalFileSystemPersistentModel[MyParams] { * ... * } * * object MyModel extends LocalFileSystemPersistentModelLoader[MyParams, MyModel] { * ... * } * }}} * * @tparam AP Algorithm parameters class. * @see [[LocalFileSystemPersistentModelLoader]] * @group Algorithm */ trait LocalFileSystemPersistentModel[AP <: Params] extends PersistentModel[AP] { override def save(id: String, params: AP, sc: SparkContext): Boolean = { Utils.save(id, this) true } } /** Implement an object that extends this trait for PredictionIO to support * loading a persisted model from the local filesystem during serving deployment. * * The underlying implementation is [[Utils.load]]. * * @tparam AP Algorithm parameters class. * @tparam M Model class. * @see [[LocalFileSystemPersistentModel]] * @group Algorithm */ trait LocalFileSystemPersistentModelLoader[AP <: Params, M] extends PersistentModelLoader[AP, M] { override def apply(id: String, params: AP, sc: Option[SparkContext]): M = { Utils.load(id).asInstanceOf[M] } } /** DEPRECATED. Use [[LocalFileSystemPersistentModel]] instead.
* * @group Algorithm */ @deprecated("Use LocalFileSystemPersistentModel instead.", "0.9.2") trait IFSPersistentModel[AP <: Params] extends LocalFileSystemPersistentModel[AP] /** DEPRECATED. Use [[LocalFileSystemPersistentModelLoader]] instead. * * @group Algorithm */ @deprecated("Use LocalFileSystemPersistentModelLoader instead.", "0.9.2") trait IFSPersistentModelLoader[AP <: Params, M] extends LocalFileSystemPersistentModelLoader[AP, M] ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/Metric.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import _root_.org.apache.predictionio.controller.java.SerializableComparator import org.apache.predictionio.core.BaseEngine import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import org.apache.spark.util.StatCounter import scala.Numeric.Implicits._ import scala.reflect._ /** Base class of a [[Metric]]. 
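 *
 * Subclasses implement `calculate` over the whole evaluation data set, and
 * [[MetricEvaluator]] treats higher scores under the implicit `Ordering[R]` as
 * better. Below is a sketch of a metric that simply counts evaluated query
 * tuples. `QueryCount`, `Query`, `Predicted`, and `Actual` are hypothetical names:
 *
 * {{{
 * class QueryCount extends Metric[Unit, Query, Predicted, Actual, Long] {
 *   // Sum the number of (Q, P, A) tuples across all evaluation sets.
 *   def calculate(sc: SparkContext,
 *       evalDataSet: Seq[(Unit, RDD[(Query, Predicted, Actual)])]): Long =
 *     evalDataSet.map { case (_, qpa) => qpa.count() }.sum
 * }
 * }}}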
* * @tparam EI Evaluation information * @tparam Q Query * @tparam P Predicted result * @tparam A Actual result * @tparam R Metric result * @group Evaluation */ abstract class Metric[EI, Q, P, A, R](implicit rOrder: Ordering[R]) extends Serializable { /** Java friendly constructor * * @param comparator A serializable comparator for sorting the metric results. * */ def this(comparator: SerializableComparator[R]) = { this()(Ordering.comparatorToOrdering(comparator)) } /** Class name of this [[Metric]]. */ def header: String = this.getClass.getSimpleName /** Calculates the result of this [[Metric]]. */ def calculate(sc: SparkContext, evalDataSet: Seq[(EI, RDD[(Q, P, A)])]): R /** Comparison function for R's ordering. */ def compare(r0: R, r1: R): Int = rOrder.compare(r0, r1) } private[predictionio] trait StatsMetricHelper[EI, Q, P, A] { def calculate(q: Q, p: P, a: A): Double def calculateStats(sc: SparkContext, evalDataSet: Seq[(EI, RDD[(Q, P, A)])]) : StatCounter = { val doubleRDD = sc.union( evalDataSet.map { case (_, qpaRDD) => qpaRDD.map { case (q, p, a) => calculate(q, p, a) } } ) doubleRDD.stats() } } private[predictionio] trait StatsOptionMetricHelper[EI, Q, P, A] { def calculate(q: Q, p: P, a: A): Option[Double] def calculateStats(sc: SparkContext, evalDataSet: Seq[(EI, RDD[(Q, P, A)])]) : StatCounter = { val doubleRDD = sc.union( evalDataSet.map { case (_, qpaRDD) => qpaRDD.flatMap { case (q, p, a) => calculate(q, p, a) } } ) doubleRDD.stats() } } /** Returns the global average of the score returned by the calculate method. * * @tparam EI Evaluation information * @tparam Q Query * @tparam P Predicted result * @tparam A Actual result * * @group Evaluation */ abstract class AverageMetric[EI, Q, P, A] extends Metric[EI, Q, P, A, Double] with StatsMetricHelper[EI, Q, P, A] with QPAMetric[Q, P, A, Double] { /** Implement this method to return a score that will be used for averaging * across all QPA tuples. 
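 *
 * For example, here is a sketch of a classification accuracy metric. The
 * `Accuracy`, `Query`, `PredictedResult`, and `ActualResult` names and their
 * `label` fields are hypothetical:
 *
 * {{{
 * case class Accuracy()
 *     extends AverageMetric[Unit, Query, PredictedResult, ActualResult] {
 *   // 1.0 for a correct label, 0.0 otherwise; the mean is the accuracy.
 *   def calculate(q: Query, p: PredictedResult, a: ActualResult): Double =
 *     if (p.label == a.label) 1.0 else 0.0
 * }
 * }}}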
*/ override def calculate(q: Q, p: P, a: A): Double override def calculate(sc: SparkContext, evalDataSet: Seq[(EI, RDD[(Q, P, A)])]) : Double = { calculateStats(sc, evalDataSet).mean } } /** Returns the global average of the non-None scores returned by the calculate * method. * * @tparam EI Evaluation information * @tparam Q Query * @tparam P Predicted result * @tparam A Actual result * * @group Evaluation */ abstract class OptionAverageMetric[EI, Q, P, A] extends Metric[EI, Q, P, A, Double] with StatsOptionMetricHelper[EI, Q, P, A] with QPAMetric[Q, P, A, Option[Double]] { /** Implement this method to return a score that will be used for averaging * across all QPA tuples. Return None to exclude a tuple from the average. */ override def calculate(q: Q, p: P, a: A): Option[Double] override def calculate(sc: SparkContext, evalDataSet: Seq[(EI, RDD[(Q, P, A)])]) : Double = { calculateStats(sc, evalDataSet).mean } } /** Returns the global (population) standard deviation of the scores returned by the calculate method. * * This uses Spark's org.apache.spark.util.StatCounter, which computes the * statistic in a single pass. * * @tparam EI Evaluation information * @tparam Q Query * @tparam P Predicted result * @tparam A Actual result * * @group Evaluation */ abstract class StdevMetric[EI, Q, P, A] extends Metric[EI, Q, P, A, Double] with StatsMetricHelper[EI, Q, P, A] with QPAMetric[Q, P, A, Double] { /** Implement this method to return a score that will be used for calculating * the standard deviation across all QPA tuples.
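 *
 * For example, here is a sketch that measures the spread of absolute rating
 * errors. The `AbsErrorStdev`, `Query`, `PredictedResult`, and `ActualResult`
 * names and their `rating: Double` fields are hypothetical. By default
 * [[MetricEvaluator]] keeps the engine params with the highest score under
 * `compare`, so this sketch reverses `compare` to prefer a smaller spread:
 *
 * {{{
 * case class AbsErrorStdev()
 *     extends StdevMetric[Unit, Query, PredictedResult, ActualResult] {
 *   def calculate(q: Query, p: PredictedResult, a: ActualResult): Double =
 *     math.abs(p.rating - a.rating)
 *
 *   // Reverse the default Ordering[Double] so lower stdev ranks as best.
 *   override def compare(r0: Double, r1: Double): Int =
 *     java.lang.Double.compare(r1, r0)
 * }
 * }}}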
*/ override def calculate(q: Q, p: P, a: A): Double override def calculate(sc: SparkContext, evalDataSet: Seq[(EI, RDD[(Q, P, A)])]) : Double = { calculateStats(sc, evalDataSet).stdev } } /** Returns the global (population) standard deviation of the non-None scores returned by the calculate method. * * This uses Spark's org.apache.spark.util.StatCounter, which computes the * statistic in a single pass. * * @tparam EI Evaluation information * @tparam Q Query * @tparam P Predicted result * @tparam A Actual result * * @group Evaluation */ abstract class OptionStdevMetric[EI, Q, P, A] extends Metric[EI, Q, P, A, Double] with StatsOptionMetricHelper[EI, Q, P, A] with QPAMetric[Q, P, A, Option[Double]] { /** Implement this method to return a score that will be used for calculating * the standard deviation across all QPA tuples. Return None to exclude a * tuple from the calculation. */ override def calculate(q: Q, p: P, a: A): Option[Double] override def calculate(sc: SparkContext, evalDataSet: Seq[(EI, RDD[(Q, P, A)])]) : Double = { calculateStats(sc, evalDataSet).stdev } } /** Returns the sum of the scores returned by the calculate method. * * @tparam EI Evaluation information * @tparam Q Query * @tparam P Predicted result * @tparam A Actual result * @tparam R Result type of calculate; requires an implicit Numeric instance * * @group Evaluation */ abstract class SumMetric[EI, Q, P, A, R: ClassTag](implicit num: Numeric[R]) extends Metric[EI, Q, P, A, R]()(num) with QPAMetric[Q, P, A, R] { /** Implement this method to return a score that will be used for summing * across all QPA tuples. */ override def calculate(q: Q, p: P, a: A): R override def calculate(sc: SparkContext, evalDataSet: Seq[(EI, RDD[(Q, P, A)])]) : R = { val union: RDD[R] = sc.union( evalDataSet.map { case (_, qpaRDD) => qpaRDD.map { case (q, p, a) => calculate(q, p, a) } } ) union.aggregate[R](num.zero)(_ + _, _ + _) } } /** Always returns zero. Useful as a placeholder during evaluation development when not all components are * implemented.
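 *
 * Here is a sketch of wiring it into an evaluation while the real metric is
 * still being written. `MyEvaluation` and `MyEngine` are hypothetical names,
 * and `MyEngine` is assumed to be an [[EngineFactory]] object. The companion
 * `apply` infers EI, Q, P, and A from the engine:
 *
 * {{{
 * object MyEvaluation extends Evaluation {
 *   engineMetric = (MyEngine(), ZeroMetric(MyEngine()))
 * }
 * }}}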
* @tparam EI Evaluation information * @tparam Q Query * @tparam P Predicted result * @tparam A Actual result * * @group Evaluation */ class ZeroMetric[EI, Q, P, A] extends Metric[EI, Q, P, A, Double]() { override def calculate(sc: SparkContext, evalDataSet: Seq[(EI, RDD[(Q, P, A)])]): Double = 0.0 } /** Companion object of [[ZeroMetric]] * * @group Evaluation */ object ZeroMetric { /** Returns a ZeroMetric instance using Engine's type parameters. */ def apply[EI, Q, P, A](engine: BaseEngine[EI, Q, P, A]): ZeroMetric[EI, Q, P, A] = { new ZeroMetric[EI, Q, P, A]() } } /** Trait for metric which returns a score based on Query, PredictedResult, * and ActualResult * * @tparam Q Query class * @tparam P Predicted result class * @tparam A Actual result class * @tparam R Metric result class * @group Evaluation */ trait QPAMetric[Q, P, A, R] { /** Calculate a metric result based on query, predicted result, and actual * result * * @param q Query * @param p Predicted result * @param a Actual result * @return Metric result */ def calculate(q: Q, p: P, a: A): R } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/MetricEvaluator.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
* See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import _root_.java.io.File import _root_.java.io.PrintWriter import com.github.nscala_time.time.Imports.DateTime import grizzled.slf4j.Logger import org.apache.predictionio.annotation.DeveloperApi import org.apache.predictionio.core.BaseEvaluator import org.apache.predictionio.core.BaseEvaluatorResult import org.apache.predictionio.data.storage.Storage import org.apache.predictionio.workflow.JsonExtractor import org.apache.predictionio.workflow.JsonExtractorOption.Both import org.apache.predictionio.workflow.NameParamsSerializer import org.apache.predictionio.workflow.WorkflowParams import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import org.json4s.native.Serialization.write import org.json4s.native.Serialization.writePretty import scala.language.existentials /** Case class storing a primary score, and other scores * * @param score Primary metric score * @param otherScores Other scores this metric might have * @tparam R Type of the primary metric score * @group Evaluation */ case class MetricScores[R]( score: R, otherScores: Seq[Any]) /** Contains all results of a [[MetricEvaluator]] * * @param bestScore The best score among all iterations * @param bestEngineParams The set of engine parameters that yielded the best score * @param bestIdx The index of iteration that yielded the best score * @param metricHeader Brief description of the primary metric score * @param otherMetricHeaders Brief descriptions of other metric scores * @param engineParamsScores All sets of engine parameters and corresponding metric scores * @param outputPath An optional output path where scores are saved * @tparam R Type of the primary metric score * @group Evaluation */ case class MetricEvaluatorResult[R]( bestScore: MetricScores[R], bestEngineParams: EngineParams, bestIdx: Int, metricHeader: String, otherMetricHeaders: 
Seq[String], engineParamsScores: Seq[(EngineParams, MetricScores[R])], outputPath: Option[String]) extends BaseEvaluatorResult { override def toOneLiner(): String = { val idx = engineParamsScores.map(_._1).indexOf(bestEngineParams) s"Best Params Index: $idx Score: ${bestScore.score}" } override def toJSON(): String = { implicit lazy val formats = Utils.json4sDefaultFormats + new NameParamsSerializer write(this) } override def toHTML(): String = html.metric_evaluator().toString() override def toString: String = { implicit lazy val formats = Utils.json4sDefaultFormats + new NameParamsSerializer val bestEPStr = JsonExtractor.engineParamstoPrettyJson(Both, bestEngineParams) val strings = Seq( "MetricEvaluatorResult:", s" # engine params evaluated: ${engineParamsScores.size}") ++ Seq( "Optimal Engine Params:", s" $bestEPStr", "Metrics:", s" $metricHeader: ${bestScore.score}") ++ otherMetricHeaders.zip(bestScore.otherScores).map { case (h, s) => s" $h: $s" } ++ outputPath.toSeq.map { p => s"The best variant params can be found in $p" } strings.mkString("\n") } } /** Companion object of [[MetricEvaluator]] * * @group Evaluation */ object MetricEvaluator { def apply[EI, Q, P, A, R]( metric: Metric[EI, Q, P, A, R], otherMetrics: Seq[Metric[EI, Q, P, A, _]], outputPath: String): MetricEvaluator[EI, Q, P, A, R] = { new MetricEvaluator[EI, Q, P, A, R]( metric, otherMetrics, Some(outputPath)) } def apply[EI, Q, P, A, R]( metric: Metric[EI, Q, P, A, R], otherMetrics: Seq[Metric[EI, Q, P, A, _]]) : MetricEvaluator[EI, Q, P, A, R] = { new MetricEvaluator[EI, Q, P, A, R]( metric, otherMetrics, None) } def apply[EI, Q, P, A, R](metric: Metric[EI, Q, P, A, R]) : MetricEvaluator[EI, Q, P, A, R] = { new MetricEvaluator[EI, Q, P, A, R]( metric, Seq[Metric[EI, Q, P, A, _]](), None) } case class NameParams(name: String, params: Params) { def this(np: (String, Params)) = this(np._1, np._2) } case class EngineVariant( id: String, description: String, engineFactory: String, datasource: 
NameParams, preparator: NameParams, algorithms: Seq[NameParams], serving: NameParams) { def this(evaluation: Evaluation, engineParams: EngineParams) = this( id = "", description = "", engineFactory = evaluation.getClass.getName, datasource = new NameParams(engineParams.dataSourceParams), preparator = new NameParams(engineParams.preparatorParams), algorithms = engineParams.algorithmParamsList.map(np => new NameParams(np)), serving = new NameParams(engineParams.servingParams)) } } /** :: DeveloperApi :: * Do not use this directly. Use the [[MetricEvaluator$]] companion object instead. This is an * implementation of [[org.apache.predictionio.core.BaseEvaluator]] that evaluates * prediction performance based on metric scores. * * @param metric Primary metric * @param otherMetrics Other metrics * @param outputPath Optional output path to save evaluation results * @tparam EI Evaluation information type * @tparam Q Query class * @tparam P Predicted result class * @tparam A Actual result class * @tparam R Metric result class * @group Evaluation */ @DeveloperApi class MetricEvaluator[EI, Q, P, A, R] ( val metric: Metric[EI, Q, P, A, R], val otherMetrics: Seq[Metric[EI, Q, P, A, _]], val outputPath: Option[String]) extends BaseEvaluator[EI, Q, P, A, MetricEvaluatorResult[R]] { @transient lazy val logger = Logger[this.type] @transient val engineInstances = Storage.getMetaDataEngineInstances() def saveEngineJson( evaluation: Evaluation, engineParams: EngineParams, outputPath: String) { val now = DateTime.now val evalClassName = evaluation.getClass.getName val variant = MetricEvaluator.EngineVariant( id = s"$evalClassName $now", description = "", engineFactory = evalClassName, datasource = new MetricEvaluator.NameParams(engineParams.dataSourceParams), preparator = new MetricEvaluator.NameParams(engineParams.preparatorParams), algorithms = engineParams.algorithmParamsList.map(np => new MetricEvaluator.NameParams(np)), serving = new MetricEvaluator.NameParams(engineParams.servingParams)) implicit lazy
val formats = Utils.json4sDefaultFormats logger.info(s"Writing best variant params to disk ($outputPath)...") val writer = new PrintWriter(new File(outputPath)) writer.write(writePretty(variant)) writer.close() } override def evaluateBase( sc: SparkContext, evaluation: Evaluation, engineEvalDataSet: Seq[(EngineParams, Seq[(EI, RDD[(Q, P, A)])])], params: WorkflowParams): MetricEvaluatorResult[R] = { val evalResultList: Seq[(EngineParams, MetricScores[R])] = engineEvalDataSet .par .map { case (engineParams, evalDataSet) => val metricScores = MetricScores[R]( metric.calculate(sc, evalDataSet), otherMetrics.map(_.calculate(sc, evalDataSet))) (engineParams, metricScores) } .seq implicit lazy val formats = Utils.json4sDefaultFormats + new NameParamsSerializer val evalResultListWithIndex = evalResultList.zipWithIndex evalResultListWithIndex.foreach { case ((ep, r), idx) => logger.info(s"Iteration $idx") logger.info(s"EngineParams: ${JsonExtractor.engineParamsToJson(Both, ep)}") logger.info(s"Result: $r") } // use max. take implicit from Metric. val ((bestEngineParams, bestScore), bestIdx) = evalResultListWithIndex .reduce { (x, y) => if (metric.compare(x._1._2.score, y._1._2.score) >= 0) x else y } // save engine params if it is set. outputPath.foreach { path => saveEngineJson(evaluation, bestEngineParams, path) } MetricEvaluatorResult( bestScore = bestScore, bestEngineParams = bestEngineParams, bestIdx = bestIdx, metricHeader = metric.header, otherMetricHeaders = otherMetrics.map(_.header), engineParamsScores = evalResultList, outputPath = outputPath) } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/P2LAlgorithm.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. 
* The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import _root_.org.apache.predictionio.annotation.DeveloperApi import org.apache.predictionio.core.BaseAlgorithm import org.apache.predictionio.workflow.PersistentModelManifest import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.rdd.RDD import scala.reflect._ /** Base class of a parallel-to-local algorithm. * * A parallel-to-local algorithm can be run in parallel on a cluster and * produces a model that can fit within a single machine. * * If your input query class requires custom JSON4S serialization, the most * idiomatic way is to implement a trait that extends [[CustomQuerySerializer]], * and mix that into your algorithm class, instead of overriding * [[querySerializer]] directly. * * @tparam PD Prepared data class. * @tparam M Trained model class. * @tparam Q Input query class. * @tparam P Output prediction class. * @group Algorithm */ abstract class P2LAlgorithm[PD, M: ClassTag, Q: ClassTag, P] extends BaseAlgorithm[PD, M, Q, P] { override def trainBase(sc: SparkContext, pd: PD): M = train(sc, pd) /** Implement this method to produce a model from prepared data. * * @param pd Prepared data for model training. * @return Trained model. 
*/ def train(sc: SparkContext, pd: PD): M override def batchPredictBase(sc: SparkContext, bm: Any, qs: RDD[(Long, Q)]) : RDD[(Long, P)] = batchPredict(bm.asInstanceOf[M], qs) /** This is a default implementation to perform batch prediction. Override * this method for a custom implementation. * * @param m A model * @param qs An RDD of index-query tuples. The index is used to keep track of * predicted results with corresponding queries. * @return Batch of predicted results */ def batchPredict(m: M, qs: RDD[(Long, Q)]): RDD[(Long, P)] = { qs.mapValues { q => predict(m, q) } } override def predictBase(bm: Any, q: Q): P = predict(bm.asInstanceOf[M], q) /** Implement this method to produce a prediction from a query and trained * model. * * @param model Trained model produced by [[train]]. * @param query An input query. * @return A prediction. */ def predict(model: M, query: Q): P /** :: DeveloperApi :: * Engine developers should not use this directly (read on to see how * parallel-to-local algorithm models are persisted). * * Parallel-to-local algorithms produce local models. By default, models will be * serialized and stored automatically. Engine developers can override this behavior by * mixing the [[PersistentModel]] trait into the model class, and * PredictionIO will call [[PersistentModel.save]] instead. If it returns * true, a [[org.apache.predictionio.workflow.PersistentModelManifest]] will be * returned so that during deployment, PredictionIO will use * [[PersistentModelLoader]] to retrieve the model. Otherwise, Unit will be * returned and the model will be re-trained on-the-fly. 
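 *
 * Here is a sketch of manual persistence. The `MyParams` and `MyModel` names
 * and the /tmp location are hypothetical. The model writes itself with plain
 * Java serialization, and a [[PersistentModelLoader]] reads it back at
 * deployment, following the companion-object pattern shown for
 * [[LocalFileSystemPersistentModel]]:
 *
 * {{{
 * import java.io._
 *
 * case class MyParams(rank: Int) extends Params
 *
 * class MyModel(val weights: Map[String, Double])
 *     extends PersistentModel[MyParams] with Serializable {
 *   override def save(id: String, params: MyParams, sc: SparkContext): Boolean = {
 *     val out = new ObjectOutputStream(new FileOutputStream("/tmp/" + id + ".model"))
 *     try out.writeObject(this) finally out.close()
 *     true // true => a PersistentModelManifest is returned
 *   }
 * }
 *
 * object MyModel extends PersistentModelLoader[MyParams, MyModel] {
 *   override def apply(id: String, params: MyParams,
 *       sc: Option[SparkContext]): MyModel = {
 *     val in = new ObjectInputStream(new FileInputStream("/tmp/" + id + ".model"))
 *     try in.readObject().asInstanceOf[MyModel] finally in.close()
 *   }
 * }
 * }}}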
* * @param sc Spark context * @param modelId Model ID * @param algoParams Algorithm parameters that trained this model * @param bm Model * @return The model itself for automatic persistence, an instance of * [[org.apache.predictionio.workflow.PersistentModelManifest]] for manual * persistence, or Unit for re-training on deployment */ @DeveloperApi override def makePersistentModel( sc: SparkContext, modelId: String, algoParams: Params, bm: Any): Any = { val m = bm.asInstanceOf[M] m match { case m: PersistentModel[Params] @unchecked => if(m.save(modelId, algoParams, sc)){ PersistentModelManifest(className = m.getClass.getName) } else () case _ => m } } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/PAlgorithm.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.annotation.DeveloperApi import org.apache.predictionio.core.BaseAlgorithm import org.apache.predictionio.workflow.PersistentModelManifest import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD /** Base class of a parallel algorithm. 
* * A parallel algorithm can be run in parallel on a cluster and produces a * model that can also be distributed across a cluster. * * If your input query class requires custom JSON4S serialization, the most * idiomatic way is to implement a trait that extends [[CustomQuerySerializer]], * and mix that into your algorithm class, instead of overriding * [[querySerializer]] directly. * * To support evaluation, one must override and implement the * [[batchPredict]] method. Otherwise, an exception will be thrown when `pio eval` * is used. * * @tparam PD Prepared data class. * @tparam M Trained model class. * @tparam Q Input query class. * @tparam P Output prediction class. * @group Algorithm */ abstract class PAlgorithm[PD, M, Q, P] extends BaseAlgorithm[PD, M, Q, P] { override def trainBase(sc: SparkContext, pd: PD): M = train(sc, pd) /** Implement this method to produce a model from prepared data. * * @param pd Prepared data for model training. * @return Trained model. */ def train(sc: SparkContext, pd: PD): M override def batchPredictBase(sc: SparkContext, bm: Any, qs: RDD[(Long, Q)]) : RDD[(Long, P)] = batchPredict(bm.asInstanceOf[M], qs) /** To support evaluation, one must override and implement this method * to generate many predictions in batch. Otherwise, an exception will be * thrown when `pio eval` is used. * * The default implementation throws an exception. * * @param m Trained model produced by [[train]]. * @param qs An RDD of index-query tuples. The index is used to keep track of * predicted results with corresponding queries. */ def batchPredict(m: M, qs: RDD[(Long, Q)]): RDD[(Long, P)] = throw new NotImplementedError("batchPredict not implemented") override def predictBase(baseModel: Any, query: Q): P = { predict(baseModel.asInstanceOf[M], query) } /** Implement this method to produce a prediction from a query and trained * model. * * @param model Trained model produced by [[train]]. * @param query An input query. * @return A prediction.
*/ def predict(model: M, query: Q): P /** :: DeveloperApi :: * Engine developers should not use this directly (read on to see how parallel * algorithm models are persisted). * * In general, parallel models may contain multiple RDDs. It is not easy to * infer and persist them programmatically since these RDDs may be * potentially huge. To persist these models, engine developers need to mix * the [[PersistentModel]] trait into the model class and implement * [[PersistentModel.save]]. If it returns true, a * [[org.apache.predictionio.workflow.PersistentModelManifest]] will be * returned so that during deployment, PredictionIO will use * [[PersistentModelLoader]] to retrieve the model. Otherwise, Unit will be * returned and the model will be re-trained on-the-fly. * * @param sc Spark context * @param modelId Model ID * @param algoParams Algorithm parameters that trained this model * @param bm Model * @return The model itself for automatic persistence, an instance of * [[org.apache.predictionio.workflow.PersistentModelManifest]] for manual * persistence, or Unit for re-training on deployment */ @DeveloperApi override def makePersistentModel( sc: SparkContext, modelId: String, algoParams: Params, bm: Any): Any = { val m = bm.asInstanceOf[M] m match { case m: PersistentModel[Params] @unchecked => if(m.save(modelId, algoParams, sc)){ PersistentModelManifest(className = m.getClass.getName) } else () case _ => () } } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/PDataSource.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.core.BaseDataSource import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD /** Base class of a parallel data source. * * A parallel data source runs locally within a single machine, or in parallel * on a cluster, to return data that is distributed across a cluster. * * @tparam TD Training data class. * @tparam EI Evaluation Info class. * @tparam Q Input query class. * @tparam A Actual value class. * @group Data Source */ abstract class PDataSource[TD, EI, Q, A] extends BaseDataSource[TD, EI, Q, A] { override def readTrainingBase(sc: SparkContext): TD = readTraining(sc) /** Implement this method to return only training data from a data source. */ def readTraining(sc: SparkContext): TD override def readEvalBase(sc: SparkContext): Seq[(TD, EI, RDD[(Q, A)])] = readEval(sc) /** To support evaluation for your engine, you must override this * method to return data for evaluation from a data source. Returned data can * optionally include a sequence of query and actual value pairs for * evaluation purposes. * * The default implementation returns an empty sequence as a stub, so that * an engine can be compiled without implementing evaluation.
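 *
 * For example, a minimal sketch of an override that holds out every fifth
 * rating for evaluation. The data classes, the `loadRatings` helper, and the
 * `data/ratings.csv` path are hypothetical and only illustrate the shape of
 * the returned sequence (one tuple per evaluation set).
 *
 * {{{
 * import org.apache.predictionio.controller.PDataSource
 * import org.apache.spark.SparkContext
 * import org.apache.spark.rdd.RDD
 *
 * case class TrainingData(ratings: RDD[(String, String, Double)])
 * case class EvalInfo(fold: Int)
 * case class Query(user: String, item: String)
 * case class ActualResult(rating: Double)
 *
 * class MyDataSource
 *     extends PDataSource[TrainingData, EvalInfo, Query, ActualResult] {
 *   // Hypothetical loader: parses "user,item,rating" lines.
 *   private def loadRatings(sc: SparkContext): RDD[(String, String, Double)] =
 *     sc.textFile("data/ratings.csv").map { line =>
 *       val Array(user, item, rating) = line.split(",")
 *       (user, item, rating.toDouble)
 *     }
 *
 *   def readTraining(sc: SparkContext): TrainingData =
 *     TrainingData(loadRatings(sc))
 *
 *   override def readEval(sc: SparkContext)
 *       : Seq[(TrainingData, EvalInfo, RDD[(Query, ActualResult)])] = {
 *     val indexed = loadRatings(sc).zipWithIndex()
 *     // Ratings whose index is divisible by 5 form the evaluation set.
 *     val train = indexed.filter { case (_, idx) => idx % 5 != 0 }.keys
 *     val test = indexed.filter { case (_, idx) => idx % 5 == 0 }.keys
 *     val qa = test.map { case (user, item, rating) =>
 *       (Query(user, item), ActualResult(rating))
 *     }
 *     Seq((TrainingData(train), EvalInfo(fold = 0), qa))
 *   }
 * }
 * }}}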
*/ def readEval(sc: SparkContext): Seq[(TD, EI, RDD[(Q, A)])] = Seq[(TD, EI, RDD[(Q, A)])]() @deprecated("Use readEval() instead.", "0.9.0") def read(sc: SparkContext): Seq[(TD, EI, RDD[(Q, A)])] = readEval(sc) } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/PPreparator.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.core.BasePreparator import org.apache.spark.SparkContext /** Base class of a parallel preparator. * * A parallel preparator can be run in parallel on a cluster and produces * prepared data that is distributed across a cluster. * * @tparam TD Training data class. * @tparam PD Prepared data class. * @group Preparator */ abstract class PPreparator[TD, PD] extends BasePreparator[TD, PD] { override def prepareBase(sc: SparkContext, td: TD): PD = { prepare(sc, td) } /** Implement this method to produce prepared data that is ready for model * training. * * @param sc An Apache Spark context. * @param trainingData Training data to be prepared.
*/ def prepare(sc: SparkContext, trainingData: TD): PD } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/Params.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller /** Base trait for all kinds of parameters that will be passed to constructors * of different controller classes. * * @group Helper */ trait Params extends Serializable {} /** A concrete implementation of [[Params]] representing empty parameters. * * @group Helper */ case class EmptyParams() extends Params { override def toString(): String = "Empty" } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/PersistentModel.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.spark.SparkContext /** Mix in and implement this trait if your model cannot be persisted by * PredictionIO automatically. A companion object extending * [[PersistentModelLoader]] is required for PredictionIO to load the persisted * model automatically during deployment. * * Note that models generated by [[PAlgorithm]] cannot be persisted * automatically by nature and must mix in these traits if model persistence * is desired. * * {{{ * class MyModel extends PersistentModel[MyParams] { * def save(id: String, params: MyParams, sc: SparkContext): Boolean = { * ... * } * } * * object MyModel extends PersistentModelLoader[MyParams, MyModel] { * def apply(id: String, params: MyParams, sc: Option[SparkContext]): MyModel = { * ... * } * } * }}} * * In Java, implement this interface and add a static * method with 3 arguments of type String, [[Params]], and SparkContext. * * {{{ * public class MyModel implements PersistentModel<MyParams>, Serializable { * ... * public boolean save(String id, MyParams params, SparkContext sc) { * ... * } * * public static MyModel load(String id, Params params, SparkContext sc) { * ... * } * ... * } * }}} * * @tparam AP Algorithm parameters class. * @see [[PersistentModelLoader]] * @group Algorithm */ trait PersistentModel[AP <: Params] { /** Save the model to some persistent storage. * * This method should return true if the model has been saved successfully so * that PredictionIO knows that it can be restored later during deployment.
* This method should return false if the model cannot be saved (or should * not be saved due to configuration) so that PredictionIO will re-train the * model during deployment. All arguments of this method are provided * automatically by PredictionIO. * * @param id ID of the run that trained this model. * @param params Algorithm parameters that were used to train this model. * @param sc An Apache Spark context. * @return Whether the model was saved and can be restored during deployment. */ def save(id: String, params: AP, sc: SparkContext): Boolean } /** Implement an object that extends this trait for PredictionIO to support * loading a persisted model during serving deployment. * * @tparam AP Algorithm parameters class. * @tparam M Model class. * @see [[PersistentModel]] * @group Algorithm */ trait PersistentModelLoader[AP <: Params, M] { /** Implement this method to restore a persisted model that extends the * [[PersistentModel]] trait. All arguments of this method are provided * automatically by PredictionIO. * * @param id ID of the run that trained this model. * @param params Algorithm parameters that were used to train this model. * @param sc An optional Apache Spark context. This will be injected if the * model was generated by a [[PAlgorithm]]. */ def apply(id: String, params: AP, sc: Option[SparkContext]): M } /** DEPRECATED. Use [[PersistentModel]] instead. * * @group Algorithm */ @deprecated("Use PersistentModel instead.", "0.9.2") trait IPersistentModel[AP <: Params] extends PersistentModel[AP] /** DEPRECATED. Use [[PersistentModelLoader]] instead. * * @group Algorithm */ @deprecated("Use PersistentModelLoader instead.", "0.9.2") trait IPersistentModelLoader[AP <: Params, M] extends PersistentModelLoader[AP, M] ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/SanityCheck.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements.
See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller /** Extend a data class with this trait if you want PredictionIO to * automatically perform sanity checks on your data classes during training. * This is useful when debugging your engine. * * @group Helper */ trait SanityCheck { /** Implement this method to perform checks on your data. This method should * contain assertions that throw exceptions when your data does not meet * your predefined requirements. */ def sanityCheck(): Unit } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/Utils.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License.
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller import org.apache.predictionio.workflow.KryoInstantiator import org.json4s._ import org.json4s.ext.JodaTimeSerializers import scala.io.Source import _root_.java.io.File import _root_.java.io.FileOutputStream /** Controller utilities. * * @group Helper */ object Utils { /** Default JSON4S serializers for PredictionIO controllers. */ val json4sDefaultFormats = DefaultFormats.lossless ++ JodaTimeSerializers.all /** Save a model object as a file to a temporary location on the local filesystem. * It will first try to use the location indicated by the environment * variable PIO_FS_TMPDIR, then fall back to the java.io.tmpdir property. * * @param id Used as the filename of the file. * @param model Model object. */ def save(id: String, model: Any): Unit = { val tmpdir = sys.env.getOrElse("PIO_FS_TMPDIR", System.getProperty("java.io.tmpdir")) val modelFile = tmpdir + File.separator + id (new File(tmpdir)).mkdirs() val fos = new FileOutputStream(modelFile) val kryo = KryoInstantiator.newKryoInjection try fos.write(kryo(model)) finally fos.close() } /** Load a model object from a file in a temporary location on the local * filesystem. It will first try to use the location indicated by the * environment variable PIO_FS_TMPDIR, then fall back to the java.io.tmpdir * property. * * @param id Used as the filename of the file.
*/ def load(id: String): Any = { val tmpdir = sys.env.getOrElse("PIO_FS_TMPDIR", System.getProperty("java.io.tmpdir")) val modelFile = tmpdir + File.separator + id val src = Source.fromFile(modelFile)(scala.io.Codec.ISO8859) try { val kryo = KryoInstantiator.newKryoInjection kryo.invert(src.map(_.toByte).toArray).get } finally { src.close() } } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/java/JavaEngineParamsGenerator.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller.java import org.apache.predictionio.controller.EngineParams import org.apache.predictionio.controller.EngineParamsGenerator import scala.collection.JavaConversions.asScalaBuffer /** Define an engine parameter generator in Java. * * Implementations of this abstract class can be supplied to "pio eval" as the second * command-line argument. * * @group Evaluation */ abstract class JavaEngineParamsGenerator extends EngineParamsGenerator { /** Set the list of [[EngineParams]].
* * @param engineParams A list of engine params */ def setEngineParamsList(engineParams: java.util.List[_ <: EngineParams]) { engineParamsList = asScalaBuffer(engineParams) } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/java/JavaEvaluation.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller.java import org.apache.predictionio.controller.Evaluation import org.apache.predictionio.controller.Metric import org.apache.predictionio.core.BaseEngine import scala.collection.JavaConversions.asScalaBuffer /** Define an evaluation in Java. * * Implementations of this abstract class can be supplied to "pio eval" as the first * argument. 
* * @group Evaluation */ abstract class JavaEvaluation extends Evaluation { /** Set the [[BaseEngine]] and [[Metric]] for this [[Evaluation]] * * @param baseEngine [[BaseEngine]] for this [[JavaEvaluation]] * @param metric [[Metric]] for this [[JavaEvaluation]] * @tparam EI Evaluation information class * @tparam Q Query class * @tparam P Predicted result class * @tparam A Actual result class */ def setEngineMetric[EI, Q, P, A]( baseEngine: BaseEngine[EI, Q, P, A], metric: Metric[EI, Q, P, A, _]) { engineMetric = (baseEngine, metric) } /** Set the [[BaseEngine]] and [[Metric]]s for this [[JavaEvaluation]] * * @param baseEngine [[BaseEngine]] for this [[JavaEvaluation]] * @param metric [[Metric]] for this [[JavaEvaluation]] * @param metrics Other [[Metric]]s for this [[JavaEvaluation]] * @tparam EI Evaluation information class * @tparam Q Query class * @tparam P Predicted result class * @tparam A Actual result class */ def setEngineMetrics[EI, Q, P, A]( baseEngine: BaseEngine[EI, Q, P, A], metric: Metric[EI, Q, P, A, _], metrics: java.util.List[_ <: Metric[EI, Q, P, A, _]]) { engineMetrics = (baseEngine, metric, asScalaBuffer(metrics)) } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/java/LJavaAlgorithm.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller.java import org.apache.predictionio.controller.LAlgorithm import scala.reflect.ClassTag /** Base class of a Java local algorithm. Refer to [[LAlgorithm]] for documentation. * * @tparam PD Prepared data class. * @tparam M Trained model class. * @tparam Q Input query class. * @tparam P Output prediction class. * @group Algorithm */ abstract class LJavaAlgorithm[PD, M, Q, P] extends LAlgorithm[PD, M, Q, P]()(ClassTag.AnyRef.asInstanceOf[ClassTag[M]]) ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/java/LJavaDataSource.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.controller.java import org.apache.predictionio.controller.LDataSource import scala.reflect.ClassTag /** Base class of a Java local data source. Refer to [[LDataSource]] for documentation. * * @tparam TD Training data class. * @tparam EI Evaluation Info class. * @tparam Q Input query class. * @tparam A Actual value class. * @group Data Source */ abstract class LJavaDataSource[TD, EI, Q, A] extends LDataSource[TD, EI, Q, A]()(ClassTag.AnyRef.asInstanceOf[ClassTag[TD]]) ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/java/LJavaPreparator.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller.java import org.apache.predictionio.controller.LPreparator import scala.reflect.ClassTag /** Base class of a Java local preparator. Refer to [[LPreparator]] for documentation. * * @tparam TD Training data class. * @tparam PD Prepared data class. 
* @group Preparator */ abstract class LJavaPreparator[TD, PD] extends LPreparator[TD, PD]()(ClassTag.AnyRef.asInstanceOf[ClassTag[PD]]) ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/java/LJavaServing.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller.java import org.apache.predictionio.controller.LServing /** Base class of Java local serving. Refer to [[LServing]] for documentation. * * @tparam Q Input query class. * @tparam P Output prediction class. * @group Serving */ abstract class LJavaServing[Q, P] extends LServing[Q, P] ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/java/P2LJavaAlgorithm.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller.java import org.apache.predictionio.controller.P2LAlgorithm import scala.reflect.ClassTag /** Base class of a Java parallel-to-local algorithm. Refer to [[P2LAlgorithm]] for documentation. * * @tparam PD Prepared data class. * @tparam M Trained model class. * @tparam Q Input query class. * @tparam P Output prediction class. * @group Algorithm */ abstract class P2LJavaAlgorithm[PD, M, Q, P] extends P2LAlgorithm[PD, M, Q, P]()( ClassTag.AnyRef.asInstanceOf[ClassTag[M]], ClassTag.AnyRef.asInstanceOf[ClassTag[Q]]) ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/java/PJavaAlgorithm.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.controller.java import org.apache.predictionio.controller.PAlgorithm /** Base class of a Java parallel algorithm. Refer to [[PAlgorithm]] for documentation. * * @tparam PD Prepared data class. * @tparam M Trained model class. * @tparam Q Input query class. * @tparam P Output prediction class. * @group Algorithm */ abstract class PJavaAlgorithm[PD, M, Q, P] extends PAlgorithm[PD, M, Q, P] ================================================ FILE: core/src/main/scala/org/apache/predictionio/controller/java/PJavaDataSource.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.controller.java import org.apache.predictionio.controller.PDataSource /** Base class of a Java parallel data source. Refer to [[PDataSource]] for documentation. * * @tparam TD Training data class. * @tparam EI Evaluation Info class. * @tparam Q Input query class. * @tparam A Actual value class. 
  * @group Data Source
  */
abstract class PJavaDataSource[TD, EI, Q, A] extends PDataSource[TD, EI, Q, A]

================================================
FILE: core/src/main/scala/org/apache/predictionio/controller/java/PJavaPreparator.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.controller.java

import org.apache.predictionio.controller.PPreparator

/** Base class of a Java parallel preparator. Refer to [[PPreparator]] for documentation.
  *
  * @tparam TD Training data class.
  * @tparam PD Prepared data class.
  * @group Preparator
  */
abstract class PJavaPreparator[TD, PD] extends PPreparator[TD, PD]

================================================
FILE: core/src/main/scala/org/apache/predictionio/controller/java/SerializableComparator.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.controller.java

import java.util.Comparator

trait SerializableComparator[T] extends Comparator[T] with java.io.Serializable

================================================
FILE: core/src/main/scala/org/apache/predictionio/controller/package.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio

/** Provides building blocks for writing a complete prediction engine
  * consisting of DataSource, Preparator, Algorithm, Serving, and Evaluation.
  *
  * == Start Building an Engine ==
  * The starting point of a prediction engine is the [[Engine]] class.
  *
  * == The DASE Paradigm ==
  * The building blocks together form the DASE paradigm. Learn more about DASE
  * [[http://predictionio.apache.org/customize/ here]].
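  *
  * For example, the four DASE components are typically wired together in an
  * [[EngineFactory]]. This is a minimal sketch: `MyDataSource`, `MyPreparator`,
  * `MyAlgorithm`, and `MyServing` are hypothetical placeholders for your own
  * component classes, and `"algo"` is an arbitrary algorithm name.
  * {{{
  * object MyEngine extends EngineFactory {
  *   def apply() = new Engine(
  *     classOf[MyDataSource],
  *     classOf[MyPreparator],
  *     Map("algo" -> classOf[MyAlgorithm]),
  *     classOf[MyServing])
  * }
  * }}}
  * The algorithm map may hold several entries. The serving component then
  * combines their predicted results.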
  *
  * == Types of Building Blocks ==
  * Depending on the problem you are solving, you need to pick the appropriate
  * flavors of building blocks.
  *
  * === Engines ===
  * There are 3 typical engine configurations:
  *
  *  1. [[PDataSource]], [[PPreparator]], [[P2LAlgorithm]], [[LServing]]
  *  2. [[PDataSource]], [[PPreparator]], [[PAlgorithm]], [[LServing]]
  *  3. [[LDataSource]], [[LPreparator]], [[LAlgorithm]], [[LServing]]
  *
  * In both configurations 1 and 2, data is sourced and prepared in a
  * parallelized fashion, using RDDs as the data type.
  *
  * The difference between configurations 1 and 2 comes at the algorithm stage.
  * In configuration 1, the algorithm operates on potentially large data as RDDs
  * in the Spark cluster, and eventually outputs a model that is small enough to
  * fit in a single machine.
  *
  * On the other hand, configuration 2 outputs a model that is potentially too
  * large to fit in a single machine, and must reside in the Spark cluster as
  * RDD(s).
  *
  * With configuration 1 ([[P2LAlgorithm]]), PredictionIO will automatically
  * try to persist the model to local disk or HDFS if the model is serializable.
  *
  * With configuration 2 ([[PAlgorithm]]), PredictionIO will not automatically
  * try to persist the model, unless the model implements the [[PersistentModel]]
  * trait.
  *
  * In special circumstances where both the data and the model are small,
  * configuration 3 may be used. Beware that RDDs cannot be used with
  * configuration 3.
  *
  * === Data Source ===
  * [[PDataSource]] is probably the most used data source base class with the
  * ability to process RDD-based data. [[LDataSource]] '''cannot''' handle
  * RDD-based data. Use it only when you have a special requirement.
  *
  * === Preparator ===
  * With [[PDataSource]], you must pick [[PPreparator]]. The same applies to
  * [[LDataSource]] and [[LPreparator]].
  *
  * === Algorithm ===
  * The workhorse of the engine comes in 3 different flavors.
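  *
  * As an illustration of the most common flavor, a [[P2LAlgorithm]] subclass
  * implements `train`, which may process RDDs in the prepared data, and
  * `predict`, which answers a single query against the local model. This is
  * a sketch only. All `My*` classes are hypothetical placeholders, and
  * `MyAlgorithmParams` is assumed to extend [[Params]].
  * {{{
  * class MyAlgorithm(val ap: MyAlgorithmParams)
  *   extends P2LAlgorithm[MyPreparedData, MyModel, MyQuery, MyPredictedResult] {
  *
  *   def train(sc: SparkContext, data: MyPreparedData): MyModel = {
  *     // Reduce the RDDs in `data` to a model small enough for one machine.
  *     ???
  *   }
  *
  *   def predict(model: MyModel, query: MyQuery): MyPredictedResult = {
  *     // Serve one query from the in-memory model; no RDDs are used here.
  *     ???
  *   }
  * }
  * }}}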
  *
  * ==== P2LAlgorithm ====
  * Produces a model that is small enough to fit in a single machine from
  * [[PDataSource]] and [[PPreparator]]. The model '''cannot''' contain any RDD.
  * If the produced model is serializable, PredictionIO will try to
  * automatically persist it. In addition, P2LAlgorithm.batchPredict is
  * already implemented for [[Evaluation]] purposes.
  *
  * ==== PAlgorithm ====
  * Produces a model that could contain RDDs from [[PDataSource]] and
  * [[PPreparator]]. PredictionIO will not try to persist it automatically
  * unless the model implements [[PersistentModel]]. [[PAlgorithm.batchPredict]]
  * must be implemented for [[Evaluation]].
  *
  * ==== LAlgorithm ====
  * Produces a model that is small enough to fit in a single machine from
  * [[LDataSource]] and [[LPreparator]]. The model '''cannot''' contain any RDD.
  * If the produced model is serializable, PredictionIO will try to
  * automatically persist it. In addition, LAlgorithm.batchPredict is
  * already implemented for [[Evaluation]] purposes.
  *
  * === Serving ===
  * The serving component comes in only one flavor: [[LServing]]. At the serving
  * stage, it is assumed that the result being served is already at a
  * human-consumable size.
  *
  * == Model Persistence ==
  * PredictionIO tries its best to persist trained models automatically. Please
  * refer to [[LAlgorithm.makePersistentModel]],
  * [[P2LAlgorithm.makePersistentModel]], and [[PAlgorithm.makePersistentModel]]
  * for descriptions of the different strategies.
  */
package object controller {

  /** Base class of several helper types that represent emptiness
    *
    * @group Helper
    */
  class SerializableClass() extends Serializable

  /** Empty data source parameters.
    * @group Helper
    */
  type EmptyDataSourceParams = EmptyParams

  /** Empty data parameters.
    * @group Helper
    */
  type EmptyDataParams = EmptyParams

  /** Empty evaluation info.
    * @group Helper
    */
  type EmptyEvaluationInfo = SerializableClass

  /** Empty preparator parameters.
    * @group Helper
    */
  type EmptyPreparatorParams = EmptyParams

  /** Empty algorithm parameters.
    * @group Helper
    */
  type EmptyAlgorithmParams = EmptyParams

  /** Empty serving parameters.
    * @group Helper
    */
  type EmptyServingParams = EmptyParams

  /** Empty metrics parameters.
    * @group Helper
    */
  type EmptyMetricsParams = EmptyParams

  /** Empty training data.
    * @group Helper
    */
  type EmptyTrainingData = SerializableClass

  /** Empty prepared data.
    * @group Helper
    */
  type EmptyPreparedData = SerializableClass

  /** Empty model.
    * @group Helper
    */
  type EmptyModel = SerializableClass

  /** Empty actual result.
    * @group Helper
    */
  type EmptyActualResult = SerializableClass
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/core/AbstractDoer.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.core

import grizzled.slf4j.Logging
import org.apache.predictionio.annotation.DeveloperApi
import org.apache.predictionio.controller.Params

/** :: DeveloperApi ::
  * Base class for all controllers
  */
@DeveloperApi
abstract class AbstractDoer extends Serializable

/** :: DeveloperApi ::
  * Provides facility to instantiate controller classes
  */
@DeveloperApi
object Doer extends Logging {
  /** :: DeveloperApi ::
    * Instantiates a controller class using supplied controller parameters as
    * constructor parameters
    *
    * @param cls Class of the controller class
    * @param params Parameters of the controller class
    * @tparam C Controller class
    * @return An instance of the controller class
    */
  @DeveloperApi
  def apply[C <: AbstractDoer] (
    cls: Class[_ <: C], params: Params): C = {

    // Subclasses only allow two kinds of constructors:
    // 1. A constructor taking P <: Params.
    // 2. An empty constructor.
    // Try (1) first; if that fails, try (2).
    try {
      val constr = cls.getConstructor(params.getClass)
      constr.newInstance(params)
    } catch {
      case e: NoSuchMethodException => try {
        val zeroConstr = cls.getConstructor()
        zeroConstr.newInstance()
      } catch {
        case e: NoSuchMethodException =>
          error(s"${params.getClass.getName} was used as the constructor " +
            s"argument to ${e.getMessage}, but no constructor can handle it. " +
            "Aborting.")
          sys.exit(1)
      }
    }
  }
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/core/BaseAlgorithm.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.core

import com.google.gson.TypeAdapterFactory
import org.apache.predictionio.annotation.DeveloperApi
import org.apache.predictionio.controller.Params
import org.apache.predictionio.controller.Utils
import net.jodah.typetools.TypeResolver
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

/** :: DeveloperApi ::
  * Base trait with default custom query serializer, exposed to engine developers
  * via [[org.apache.predictionio.controller.CustomQuerySerializer]]
  */
@DeveloperApi
trait BaseQuerySerializer {
  /** :: DeveloperApi ::
    * Serializer for Scala query classes using
    * [[org.apache.predictionio.controller.Utils.json4sDefaultFormats]]
    */
  @DeveloperApi
  @transient lazy val querySerializer = Utils.json4sDefaultFormats

  /** :: DeveloperApi ::
    * Serializer for Java query classes using Gson
    */
  @DeveloperApi
  @transient lazy val gsonTypeAdapterFactories = Seq.empty[TypeAdapterFactory]
}

/** :: DeveloperApi ::
  * Base class of all algorithm controllers
  *
  * @tparam PD Prepared data class
  * @tparam M Model class
  * @tparam Q Query class
  * @tparam P Predicted result class
  */
@DeveloperApi
abstract class BaseAlgorithm[PD, M, Q, P]
  extends AbstractDoer with BaseQuerySerializer {
  /** :: DeveloperApi ::
    * Engine developers should not use this directly. This is called by the
    * workflow to train a model.
    *
    * @param sc Spark context
    * @param pd Prepared data
    * @return Trained model
    */
  @DeveloperApi
  def trainBase(sc: SparkContext, pd: PD): M

  /** :: DeveloperApi ::
    * Engine developers should not use this directly.
    * This is called by the evaluation workflow to perform batch prediction.
    *
    * @param sc Spark context
    * @param bm Model
    * @param qs Batch of queries
    * @return Batch of predicted results
    */
  @DeveloperApi
  def batchPredictBase(sc: SparkContext, bm: Any, qs: RDD[(Long, Q)])
    : RDD[(Long, P)]

  /** :: DeveloperApi ::
    * Engine developers should not use this directly. Called by serving to
    * perform a single prediction.
    *
    * @param bm Model
    * @param q Query
    * @return Predicted result
    */
  @DeveloperApi
  def predictBase(bm: Any, q: Q): P

  /** :: DeveloperApi ::
    * Engine developers should not use this directly. Prepare a model for
    * persistence in the downstream consumer. PredictionIO supports 3 types of
    * model persistence: automatic persistence, manual persistence, and
    * re-training on deployment. This method provides a way for downstream
    * modules to determine in which mode the model should be persisted.
    *
    * @param sc Spark context
    * @param modelId Model ID
    * @param algoParams Algorithm parameters that trained this model
    * @param bm Model
    * @return The model itself for automatic persistence, an instance of
    * [[org.apache.predictionio.workflow.PersistentModelManifest]] for manual
    * persistence, or Unit for re-training on deployment
    */
  @DeveloperApi
  def makePersistentModel(
    sc: SparkContext,
    modelId: String,
    algoParams: Params,
    bm: Any): Any = ()

  /** :: DeveloperApi ::
    * Obtains the type signature of query for this algorithm
    *
    * @return Type signature of query
    */
  def queryClass: Class[Q] = {
    val types = TypeResolver.resolveRawArguments(classOf[BaseAlgorithm[PD, M, Q, P]], getClass)
    types(2).asInstanceOf[Class[Q]]
  }
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/core/BaseDataSource.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.
 * See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.core

import org.apache.predictionio.annotation.DeveloperApi
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

/** :: DeveloperApi ::
  * Base class of all data source controllers
  *
  * @tparam TD Training data class
  * @tparam EI Evaluation information class
  * @tparam Q Query class
  * @tparam A Actual result class
  */
@DeveloperApi
abstract class BaseDataSource[TD, EI, Q, A] extends AbstractDoer {
  /** :: DeveloperApi ::
    * Engine developers should not use this directly. This is called by the
    * workflow to read training data.
    *
    * @param sc Spark context
    * @return Training data
    */
  @DeveloperApi
  def readTrainingBase(sc: SparkContext): TD

  /** :: DeveloperApi ::
    * Engine developers should not use this directly. This is called by the
    * evaluation workflow to read training and validation data.
    *
    * @param sc Spark context
    * @return Sets of training data, evaluation information, queries, and actual
    * results
    */
  @DeveloperApi
  def readEvalBase(sc: SparkContext): Seq[(TD, EI, RDD[(Q, A)])]
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/core/BaseEngine.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.
 * See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.core

import org.apache.predictionio.annotation.DeveloperApi
import org.apache.predictionio.controller.EngineParams
import org.apache.predictionio.workflow.JsonExtractorOption.JsonExtractorOption
import org.apache.predictionio.workflow.WorkflowParams
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.json4s.JValue

/** :: DeveloperApi ::
  * Base class of all engine controller classes
  *
  * @tparam EI Evaluation information class
  * @tparam Q Query class
  * @tparam P Predicted result class
  * @tparam A Actual result class
  */
@DeveloperApi
abstract class BaseEngine[EI, Q, P, A] extends Serializable {
  /** :: DeveloperApi ::
    * Implement this method so that training this engine returns a list of
    * models.
    *
    * @param sc An instance of SparkContext.
    * @param engineParams An instance of [[EngineParams]] for running a single training.
    * @param engineInstanceId ID of the engine instance being trained.
    * @param params An instance of [[WorkflowParams]] that controls the workflow.
    * @return A list of models.
    */
  @DeveloperApi
  def train(
    sc: SparkContext,
    engineParams: EngineParams,
    engineInstanceId: String,
    params: WorkflowParams): Seq[Any]

  /** :: DeveloperApi ::
    * Implement this method so that [[org.apache.predictionio.controller.Evaluation]] can
    * use this method to generate inputs for [[org.apache.predictionio.controller.Metric]].
    *
    * @param sc An instance of SparkContext.
    * @param engineParams An instance of [[EngineParams]] for running a single evaluation.
    * @param params An instance of [[WorkflowParams]] that controls the workflow.
    * @return A list of tuples, each pairing evaluation information with an RDD
    * of (query, predicted result, actual result) tuples.
    */
  @DeveloperApi
  def eval(
    sc: SparkContext,
    engineParams: EngineParams,
    params: WorkflowParams): Seq[(EI, RDD[(Q, P, A)])]

  /** :: DeveloperApi ::
    * Override this method to further optimize the process that runs multiple
    * evaluations (during tuning, for example). By default, this method calls
    * [[eval]] for each element in the engine parameters list.
    *
    * @param sc An instance of SparkContext.
    * @param engineParamsList A list of [[EngineParams]] for running batch evaluation.
    * @param params An instance of [[WorkflowParams]] that controls the workflow.
    * @return A list of engine parameters and evaluation result (from [[eval]]) tuples.
    */
  @DeveloperApi
  def batchEval(
    sc: SparkContext,
    engineParamsList: Seq[EngineParams],
    params: WorkflowParams)
    : Seq[(EngineParams, Seq[(EI, RDD[(Q, P, A)])])] = {
    engineParamsList.map { engineParams =>
      (engineParams, eval(sc, engineParams, params))
    }
  }

  /** :: DeveloperApi ::
    * Implement this method to convert a JValue (read from an engine variant
    * JSON file) to an instance of [[EngineParams]].
    *
    * @param variantJson Content of the engine variant JSON as JValue.
    * @param jsonExtractor JSON extractor option used for the conversion.
    * @return An instance of [[EngineParams]] converted from JSON.
    */
  @DeveloperApi
  def jValueToEngineParams(
    variantJson: JValue,
    jsonExtractor: JsonExtractorOption): EngineParams =
    throw new NotImplementedError("JSON to EngineParams is not implemented.")
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/core/BaseEvaluator.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.core

import org.apache.predictionio.annotation.DeveloperApi
import org.apache.predictionio.annotation.Experimental
import org.apache.predictionio.controller.EngineParams
import org.apache.predictionio.controller.Evaluation
import org.apache.predictionio.workflow.WorkflowParams
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

/** :: DeveloperApi ::
  * Base class of all evaluator controller classes
  *
  * @tparam EI Evaluation information class
  * @tparam Q Query class
  * @tparam P Predicted result class
  * @tparam A Actual result class
  * @tparam ER Evaluation result class
  */
@DeveloperApi
abstract class BaseEvaluator[EI, Q, P, A, ER <: BaseEvaluatorResult]
  extends AbstractDoer {
  /** :: DeveloperApi ::
    * Engine developers should not use this directly.
    * This is called by the evaluation workflow to perform evaluation.
    *
    * @param sc Spark context
    * @param evaluation Evaluation to run
    * @param engineEvalDataSet Sets of engine parameters and data for evaluation
    * @param params Evaluation workflow parameters
    * @return Evaluation result
    */
  @DeveloperApi
  def evaluateBase(
    sc: SparkContext,
    evaluation: Evaluation,
    engineEvalDataSet: Seq[(EngineParams, Seq[(EI, RDD[(Q, P, A)])])],
    params: WorkflowParams): ER
}

/** Base trait of evaluator result */
trait BaseEvaluatorResult extends Serializable {
  /** A short description of the result */
  def toOneLiner(): String = ""

  /** HTML portion of the rendered evaluator results */
  def toHTML(): String = ""

  /** JSON portion of the rendered evaluator results */
  def toJSON(): String = ""

  /** :: Experimental ::
    * Indicates whether this result should be kept out of the database. When
    * `true`, the result is not inserted.
    */
  @Experimental
  val noSave: Boolean = false
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/core/BasePreparator.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.core

import org.apache.predictionio.annotation.DeveloperApi
import org.apache.spark.SparkContext

/** :: DeveloperApi ::
  * Base class of all preparator controller classes
  *
  * Dev note: Probably will add an extra parameter for ad hoc JSON formatter
  *
  * @tparam TD Training data class
  * @tparam PD Prepared data class
  */
@DeveloperApi
abstract class BasePreparator[TD, PD]
  extends AbstractDoer {
  /** :: DeveloperApi ::
    * Engine developers should not use this directly. This is called by the
    * training workflow to prepare data before handing it over to the algorithm.
    *
    * @param sc Spark context
    * @param td Training data
    * @return Prepared data
    */
  @DeveloperApi
  def prepareBase(sc: SparkContext, td: TD): PD
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/core/BaseServing.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.core

import org.apache.predictionio.annotation.DeveloperApi
import org.apache.predictionio.annotation.Experimental

/** :: DeveloperApi ::
  * Base class of all serving controller classes
  *
  * @tparam Q Query class
  * @tparam P Predicted result class
  */
@DeveloperApi
abstract class BaseServing[Q, P]
  extends AbstractDoer {
  /** :: Experimental ::
    * Engine developers should not use this directly. This is called by the
    * serving layer to supplement the query before sending it to the algorithms.
    *
    * @param q Query
    * @return A supplemented query
    */
  @Experimental
  def supplementBase(q: Q): Q

  /** :: DeveloperApi ::
    * Engine developers should not use this directly. This is called by the
    * serving layer to combine the predicted results from multiple algorithms,
    * and to apply custom business logic, before serving to the end user.
    *
    * @param q Query
    * @param ps List of predicted results
    * @return A single predicted result
    */
  @DeveloperApi
  def serveBase(q: Q, ps: Seq[P]): P
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/core/SelfCleaningDataSource.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.core

import grizzled.slf4j.Logger
import org.apache.predictionio.annotation.DeveloperApi
import org.apache.predictionio.data.storage.{DataMap, Event, Storage}
import org.apache.predictionio.data.store.{Common, LEventStore, PEventStore}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.joda.time.DateTime
import org.json4s._

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration

/** :: DeveloperApi ::
  * Base trait of self-cleaning data sources.
  *
  * A self-cleaning data source provides tools for removing events that happened
  * earlier than a specified duration before training time. It can also remove
  * duplicate events and compress properties (collapse set/unset events into one).
  */
@DeveloperApi
trait SelfCleaningDataSource {

  implicit object DateTimeOrdering extends Ordering[DateTime] {
    def compare(d1: DateTime, d2: DateTime): Int = d2.compareTo(d1)
  }

  @transient lazy private val pEventsDb = Storage.getPEvents()
  @transient lazy private val lEventsDb = Storage.getLEvents()

  /** :: DeveloperApi ::
    * Name of the app whose events will be cleaned.
    *
    * @return App name
    */
  @DeveloperApi
  def appName: String

  /** :: DeveloperApi ::
    * Event window parameters used for cleanup.
    *
    * @return The event window used to clean up events, or None to disable cleanup.
    */
  @DeveloperApi
  def eventWindow: Option[EventWindow] = None

  @transient lazy val logger = Logger[this.type]

  /** :: DeveloperApi ::
    *
    * Returns the events that happened within the event window's duration
    * (counting back from now), plus all set/unset events regardless of age.
    *
    * @return RDD[Event] most recent PEvents.
    */
  @DeveloperApi
  def getCleanedPEvents(pEvents: RDD[Event]): RDD[Event] = {

    eventWindow
      .flatMap(_.duration)
      .map { duration =>
        val fd = Duration(duration)
        pEvents.filter(e =>
          e.eventTime.isAfter(DateTime.now().minus(fd.toMillis)) || isSetEvent(e)
        )
      }.getOrElse(pEvents)
  }

  /** :: DeveloperApi ::
    *
    * Returns the events that happened within the event window's duration
    * (counting back from now), plus all set/unset events regardless of age.
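    *
    * For example, with the hypothetical override below, events older than
    * 30 days are dropped, but set/unset events are kept. The 30-day window is
    * an arbitrary illustration. The window string is parsed by
    * `scala.concurrent.duration.Duration`, so strings such as "7 days" or
    * "12 hours" can also be used.
    * {{{
    * override def eventWindow: Option[EventWindow] =
    *   Some(EventWindow(duration = Some("30 days")))
    * }}}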
* * @return Iterator[Event] most recent LEvents. */ @DeveloperApi def getCleanedLEvents(lEvents: Iterable[Event]): Iterable[Event] = { eventWindow .flatMap(_.duration) .map { duration => val fd = Duration(duration) lEvents.filter(e => e.eventTime.isAfter(DateTime.now().minus(fd.toMillis)) || isSetEvent(e) ) }.getOrElse(lEvents) } def compressPProperties(sc: SparkContext, rdd: RDD[Event]): RDD[Event] = { rdd.filter(isSetEvent) .groupBy(_.entityType) .flatMap { pair => val (_, ls) = pair ls.groupBy(_.entityId).map { anotherpair => val (_, anotherls) = anotherpair compress(anotherls) } } ++ rdd.filter(!isSetEvent(_)) } def compressLProperties(events: Iterable[Event]): Iterable[Event] = { events.filter(isSetEvent) .groupBy(_.entityType) .map { pair => val (_, ls) = pair compress(ls) } ++ events.filter(!isSetEvent(_)) } def removePDuplicates(sc: SparkContext, rdd: RDD[Event]): RDD[Event] = { val now = DateTime.now() rdd.sortBy(_.eventTime, true).map(x => (recreateEvent(x, None, now), (x.eventId, x.eventTime))) .groupByKey .map{case (x, y) => recreateEvent(x, y.head._1, y.head._2)} } def recreateEvent(x: Event, eventId: Option[String], creationTime: DateTime): Event = { Event(eventId = eventId, event = x.event, entityType = x.entityType, entityId = x.entityId, targetEntityType = x.targetEntityType, targetEntityId = x.targetEntityId, properties = x.properties, eventTime = creationTime, tags = x.tags, prId= x.prId, creationTime = creationTime) } def removeLDuplicates(ls: Iterable[Event]): Iterable[Event] = { val now = DateTime.now() ls.toList.reverse.map(x => (recreateEvent(x, None, now), (x.eventId, x.eventTime))) .groupBy(_._1).mapValues( _.map( _._2 ) ) .map(x => recreateEvent(x._1, x._2.head._1, x._2.head._2)) } /** :: DeveloperApi :: * * Filters most recent, compress properties and removes duplicates of PEvents * * @return RDD[Event] most recent PEvents */ @DeveloperApi def cleanPersistedPEvents(sc: SparkContext): Unit ={ eventWindow match { case Some(ew) => val 
result = cleanPEvents(sc) val originalEvents = PEventStore.find(appName)(sc) val newEvents = result subtract originalEvents val eventsToRemove = (originalEvents subtract result).map { e => e.eventId.getOrElse("") } wipePEvents(newEvents, eventsToRemove, sc) case None => } } /** Replaces events in the event store: writes `newEvents` and deletes the events whose IDs are in `eventsToRemove`. */ def wipePEvents( newEvents: RDD[Event], eventsToRemove: RDD[String], sc: SparkContext ): Unit = { val (appId, _) = Common.appNameToId(appName, None) pEventsDb.write(newEvents.map(x => recreateEvent(x, None, x.eventTime)), appId)(sc) removePEvents(eventsToRemove, appId, sc) } def removeEvents(eventsToRemove: Set[String], appId: Int): Unit = { val listOfFuture: List[Future[Boolean]] = eventsToRemove .filter(x => x != "").toList.map { eventId => lEventsDb.futureDelete(eventId, appId) } val futureOfList: Future[List[Boolean]] = Future.sequence(listOfFuture) Await.result(futureOfList, scala.concurrent.duration.Duration(60, "minutes")) } def removePEvents(eventsToRemove: RDD[String], appId: Int, sc: SparkContext): Unit = { pEventsDb.delete(eventsToRemove.filter(x => x != ""), appId, None)(sc) } /** Replaces events in the event store. * * @param newEvents new events to insert * @param eventsToRemove IDs of the events to remove */ def wipe( newEvents: Set[Event], eventsToRemove: Set[String] ): Unit = { val (appId, _) = Common.appNameToId(appName, None) val listOfFutureNewEvents: List[Future[String]] = newEvents.toList.map { event => lEventsDb.futureInsert(recreateEvent(event, None, event.eventTime), appId) } val futureOfListNewEvents: Future[List[String]] = Future.sequence(listOfFutureNewEvents) Await.result(futureOfListNewEvents, scala.concurrent.duration.Duration(60, "minutes")) removeEvents(eventsToRemove, appId) } /** :: DeveloperApi :: * * Returns the most recent PEvents, with properties compressed and duplicates removed * if enabled in the event window. Does not modify the event store. 
*/ @DeveloperApi def cleanPEvents(sc: SparkContext): RDD[Event] = { val pEvents = getCleanedPEvents(PEventStore.find(appName)(sc).sortBy(_.eventTime, false)) val rdd = eventWindow match { case
Some(ew) => val updated = if (ew.compressProperties) compressPProperties(sc, pEvents) else pEvents val deduped = if (ew.removeDuplicates) removePDuplicates(sc, updated) else updated deduped case None => pEvents } rdd } /** :: DeveloperApi :: * * Cleans LEvents (window filtering, plus property compression and duplicate removal * if enabled in the event window) and writes the result back to the event store, * replacing the original events. */ @DeveloperApi def cleanPersistedLEvents: Unit = { eventWindow match { case Some(ew) => val result = cleanLEvents().toSet val originalEvents = LEventStore.find(appName).toSet val newEvents = result -- originalEvents val eventsToRemove = (originalEvents -- result).map { e => e.eventId.getOrElse("") } wipe(newEvents, eventsToRemove) case None => } } /** :: DeveloperApi :: * * Returns the most recent LEvents, with properties compressed and duplicates removed * if enabled in the event window. Does not modify the event store. */ @DeveloperApi def cleanLEvents(): Iterable[Event] = { val lEvents = getCleanedLEvents(LEventStore.find(appName).toList.sortBy(_.eventTime).reverse) val events = eventWindow match { case Some(ew) => val updated = if (ew.compressProperties) compressLProperties(lEvents) else lEvents val deduped = if (ew.removeDuplicates) removeLDuplicates(updated) else updated deduped case None => lEvents } events } private def isSetEvent(e: Event): Boolean = { e.event == "$set" || e.event == "$unset" } private def compress(events: Iterable[Event]): Event = { events.find(_.event == "$set") match { case Some(_) => events.reduce { (e1, e2) => val props = e2.event match { case "$set" => e1.properties.fields ++ e2.properties.fields case "$unset" => e1.properties.fields .filterKeys(f => !e2.properties.fields.contains(f)) } e1.copy(properties = DataMap(props), eventTime = e2.eventTime) } case None => events.reduce { (e1, e2) => e1.copy(properties = DataMap(e1.properties.fields ++ e2.properties.fields), eventTime = e2.eventTime ) } } } } /** 
Event window parameters of a [[SelfCleaningDataSource]]. * * @param duration events older than this duration (for example "3 days") are removed; * None disables time-based cleaning * @param removeDuplicates whether to remove duplicate events * @param compressProperties whether to collapse each entity's set/unset events into one */ case class EventWindow( duration: Option[String] = None, removeDuplicates: Boolean = false, compressProperties: Boolean = false )
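The event-window cleaning in `SelfCleaningDataSource` is hard to follow in flattened form. The following is a minimal sketch of its two core ideas, assuming plain Scala collections and `java.time` instead of Spark RDDs, Joda-Time, and the event store: events older than the window are dropped unless they are `$set`/`$unset` events, and each entity's `$set`/`$unset` events are folded into one. `SimpleEvent`, `EventWindowSketch`, and `EventWindowDemo` are hypothetical names for illustration only. This is a simplified variant, not the trait's exact semantics: for example, it always emits a `$set` event and orders events by time explicitly.

```scala
import java.time.{Duration, Instant}

// Hypothetical, simplified stand-in for PredictionIO's Event (only the fields
// the cleaning logic inspects); it is NOT the real storage Event class.
final case class SimpleEvent(
    event: String,
    entityType: String,
    entityId: String,
    properties: Map[String, String],
    eventTime: Instant)

object EventWindowSketch {
  private def isSetEvent(e: SimpleEvent): Boolean =
    e.event == "$set" || e.event == "$unset"

  // Keep events newer than (now - window); $set/$unset events are always
  // kept so that entity properties survive the cut-off.
  def filterWindow(events: Seq[SimpleEvent], window: Duration, now: Instant): Seq[SimpleEvent] = {
    val cutoff = now.minus(window)
    events.filter(e => e.eventTime.isAfter(cutoff) || isSetEvent(e))
  }

  // Replay one entity's $set/$unset events oldest-first and emit a single
  // $set event holding the resulting properties and the latest event time.
  private def compress(events: Seq[SimpleEvent]): SimpleEvent = {
    val sorted = events.sortBy(_.eventTime.toEpochMilli)
    val props = sorted.foldLeft(Map.empty[String, String]) { (acc, e) =>
      if (e.event == "$set") acc ++ e.properties else acc -- e.properties.keys
    }
    sorted.last.copy(event = "$set", properties = props)
  }

  // Compress $set/$unset events per (entityType, entityId); other events pass through.
  def compressProperties(events: Seq[SimpleEvent]): Seq[SimpleEvent] = {
    val (setEvents, others) = events.partition(isSetEvent)
    val compressed = setEvents
      .groupBy(e => (e.entityType, e.entityId))
      .values
      .map(compress)
      .toSeq
    compressed ++ others
  }
}

object EventWindowDemo {
  // Returns (events after window filtering, events after compression).
  def example(): (Seq[SimpleEvent], Seq[SimpleEvent]) = {
    val now = Instant.parse("2024-01-10T00:00:00Z")
    val events = Seq(
      SimpleEvent("$set", "user", "u1", Map("plan" -> "free"), Instant.parse("2023-01-01T00:00:00Z")),
      SimpleEvent("$set", "user", "u1", Map("plan" -> "pro", "tz" -> "UTC"), Instant.parse("2024-01-05T00:00:00Z")),
      SimpleEvent("$unset", "user", "u1", Map("tz" -> ""), Instant.parse("2024-01-06T00:00:00Z")),
      SimpleEvent("view", "user", "u1", Map.empty, Instant.parse("2023-06-01T00:00:00Z")),
      SimpleEvent("view", "user", "u2", Map.empty, Instant.parse("2024-01-09T00:00:00Z")))
    val windowed = EventWindowSketch.filterWindow(events, Duration.ofDays(30), now)
    (windowed, EventWindowSketch.compressProperties(windowed))
  }

  def main(args: Array[String]): Unit = {
    val (_, cleaned) = example()
    cleaned.foreach(println)
  }
}
```

With a 30-day window, the 2023 `view` event for `u1` is dropped while the old `$set` survives, and the three `u1` property events collapse into a single `$set` carrying `plan -> pro` (the `tz` key having been unset). Grouping by both `entityType` and `entityId` keeps different entities' properties apart.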
================================================ FILE: core/src/main/scala/org/apache/predictionio/core/package.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio /** Core base classes of PredictionIO controller components. Engine developers * should not use these directly. */ package object core {} ================================================ FILE: core/src/main/scala/org/apache/predictionio/package.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
* See the License for the specific language governing permissions and * limitations under the License. */ package org.apache /** PredictionIO Scala API */ package object predictionio {} ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/BatchPredict.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.workflow import java.io.Serializable import com.twitter.bijection.Injection import com.twitter.chill.{KryoBase, KryoInjection, ScalaKryoInstantiator} import de.javakaffee.kryoserializers.SynchronizedCollectionsSerializer import grizzled.slf4j.Logging import org.apache.predictionio.controller.{Engine, Utils} import org.apache.predictionio.core.{BaseAlgorithm, BaseServing, Doer} import org.apache.predictionio.data.storage.{EngineInstance, Storage} import org.apache.predictionio.workflow.JsonExtractorOption.JsonExtractorOption import org.apache.predictionio.workflow.CleanupFunctions import org.apache.spark.rdd.RDD import org.json4s._ import org.json4s.native.JsonMethods._ import scala.language.existentials case class BatchPredictConfig( inputFilePath: String = "batchpredict-input.json", outputFilePath: String = "batchpredict-output.json", queryPartitions: Option[Int] = None, engineInstanceId: String = "", engineId: Option[String] = None, engineVersion: Option[String] = None, engineVariant: String = "", env: Option[String] = None, verbose: Boolean = false, debug: Boolean = false, jsonExtractor: JsonExtractorOption = JsonExtractorOption.Both) object BatchPredict extends Logging { class KryoInstantiator(classLoader: ClassLoader) extends ScalaKryoInstantiator { override def newKryo(): KryoBase = { val kryo = super.newKryo() kryo.setClassLoader(classLoader) SynchronizedCollectionsSerializer.registerSerializers(kryo) kryo } } object KryoInstantiator extends Serializable { def newKryoInjection : Injection[Any, Array[Byte]] = { val kryoInstantiator = new KryoInstantiator(getClass.getClassLoader) KryoInjection.instance(kryoInstantiator) } } val engineInstances = Storage.getMetaDataEngineInstances val modeldata = Storage.getModelDataModels def main(args: Array[String]): Unit = { val parser = new scopt.OptionParser[BatchPredictConfig]("BatchPredict") { opt[String]("input") action { (x, c) => c.copy(inputFilePath = x) } text("Path to file 
containing input queries; a " + "multi-object JSON file with one object per line.") opt[String]("output") action { (x, c) => c.copy(outputFilePath = x) } text("Path to file containing output predictions; a " + "multi-object JSON file with one object per line.") opt[Int]("query-partitions") action { (x, c) => c.copy(queryPartitions = Some(x)) } text("Limit concurrency of predictions by setting the number " + "of partitions used internally for the RDD of queries.") opt[String]("engineId") action { (x, c) => c.copy(engineId = Some(x)) } text("Engine ID.") opt[String]("engineVersion") action { (x, c) => c.copy(engineVersion = Some(x)) } text("Engine version.") opt[String]("engine-variant") required() action { (x, c) => c.copy(engineVariant = x) } text("Engine variant JSON.") opt[String]("env") action { (x, c) => c.copy(env = Some(x)) } text("Comma-separated list of environment variables (in 'FOO=BAR' " + "format) to pass to the Spark execution environment.") opt[String]("engineInstanceId") required() action { (x, c) => c.copy(engineInstanceId = x) } text("Engine instance ID.") opt[Unit]("verbose") action { (x, c) => c.copy(verbose = true) } text("Enable verbose output.") opt[Unit]("debug") action { (x, c) => c.copy(debug = true) } text("Enable debug output.") opt[String]("json-extractor") action { (x, c) => c.copy(jsonExtractor = JsonExtractorOption.withName(x)) } } parser.parse(args, BatchPredictConfig()) map { config => WorkflowUtils.modifyLogging(config.verbose) engineInstances.get(config.engineInstanceId) map { engineInstance => val engine = getEngine(engineInstance) run(config, engineInstance, engine) } getOrElse { error(s"Invalid engine instance ID.
Aborting batch predict.") } } } def getEngine(engineInstance: EngineInstance): Engine[_, _, _, _, _, _] = { val engineFactoryName = engineInstance.engineFactory val (engineLanguage, engineFactory) = WorkflowUtils.getEngine(engineFactoryName, getClass.getClassLoader) val maybeEngine = engineFactory() // EngineFactory returns a base engine, which may not be deployable.
maybeEngine match { case e: Engine[_, _, _, _, _, _] => e case _ => throw new NoSuchMethodException( s"Engine $maybeEngine cannot be used for batch predict") } } def run[Q, P]( config: BatchPredictConfig, engineInstance: EngineInstance, engine: Engine[_, _, _, Q, P, _]): Unit = { try { val engineParams = engine.engineInstanceToEngineParams( engineInstance, config.jsonExtractor) val kryo = KryoInstantiator.newKryoInjection val modelsFromEngineInstance = kryo.invert(modeldata.get(engineInstance.id).get.models).get. asInstanceOf[Seq[Any]] val prepareSparkContext = WorkflowContext( batch = engineInstance.engineFactory, executorEnv = engineInstance.env, mode = "Batch Predict (model)", sparkEnv = engineInstance.sparkConf) val models = engine.prepareDeploy( prepareSparkContext, engineParams, engineInstance.id, modelsFromEngineInstance, params = WorkflowParams() ) val algorithms = engineParams.algorithmParamsList.map { case (n, p) => Doer(engine.algorithmClassMap(n), p) } val servingParamsWithName = engineParams.servingParams val serving = Doer(engine.servingClassMap(servingParamsWithName._1), servingParamsWithName._2) val runSparkContext = WorkflowContext( batch = engineInstance.engineFactory, executorEnv = engineInstance.env, mode = "Batch Predict (runner)", sparkEnv = engineInstance.sparkConf) val inputRDD: RDD[String] = runSparkContext. textFile(config.inputFilePath).
filter(_.trim.nonEmpty) val queriesRDD: RDD[String] = config.queryPartitions match { case Some(p) => inputRDD.repartition(p) case None => inputRDD } val predictionsRDD: RDD[String] = queriesRDD.map { queryString => val jsonExtractorOption = config.jsonExtractor
// Extract the query from JSON.
val query = JsonExtractor.extract( jsonExtractorOption, queryString, algorithms.head.queryClass, algorithms.head.querySerializer, algorithms.head.gsonTypeAdapterFactories )
// Deploy logic: first call Serving.supplement, then Algorithm.predict,
// and finally Serving.serve.
val supplementedQuery = serving.supplementBase(query)
// TODO: Parallelize the following.
val predictions = algorithms.zip(models).map { case (a, m) => a.predictBase(m, supplementedQuery) }
// By design, Serving.serve is called with the *original* query,
// not the supplemented one.
val prediction = serving.serveBase(query, predictions)
// Combine the query with its prediction so that the batch results are
// self-descriptive.
val predictionJValue = JsonExtractor.toJValue( jsonExtractorOption, Map("query" -> query, "prediction" -> prediction), algorithms.head.querySerializer, algorithms.head.gsonTypeAdapterFactories)
// Return the JSON string.
compact(render(predictionJValue)) } predictionsRDD.saveAsTextFile(config.outputFilePath) } finally { CleanupFunctions.run() } } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/CleanupFunctions.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License.
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.workflow /** :: DeveloperApi :: * Singleton object that collects anonymous functions to be * executed to allow the process to end gracefully. * * For example, the Elasticsearch REST storage client * maintains an internal connection pool that must * be closed to allow the process to exit. */ object CleanupFunctions { @volatile private var functions: Seq[() => Unit] = Seq.empty[() => Unit] /** Add a function to be called during cleanup. * * {{{ * import org.apache.predictionio.workflow.CleanupFunctions * * CleanupFunctions.add { MyStorageClass.close } * }}} * * @param f function containing cleanup code. */ def add(f: () => Unit): Seq[() => Unit] = { functions = functions :+ f functions } /** Call all cleanup functions in order added. * * {{{ * import org.apache.predictionio.workflow.CleanupFunctions * * try { * // Much code that needs cleanup * // whether successful or error thrown. * } finally { * CleanupFunctions.run() * } * }}} */ def run(): Unit = { functions.foreach { f => f() } } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/CoreWorkflow.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.workflow import org.apache.predictionio.controller.EngineParams import org.apache.predictionio.controller.Evaluation import org.apache.predictionio.core.BaseEngine import org.apache.predictionio.core.BaseEvaluator import org.apache.predictionio.core.BaseEvaluatorResult import org.apache.predictionio.data.storage.EngineInstance import org.apache.predictionio.data.storage.EvaluationInstance import org.apache.predictionio.data.storage.Model import org.apache.predictionio.data.storage.Storage import com.github.nscala_time.time.Imports.DateTime import grizzled.slf4j.Logger import scala.language.existentials /** CoreWorkflow handles PredictionIO metadata and environment variables of * training and evaluation. 
*/ object CoreWorkflow { @transient lazy val logger = Logger[this.type] @transient lazy val engineInstances = Storage.getMetaDataEngineInstances @transient lazy val evaluationInstances = Storage.getMetaDataEvaluationInstances() def runTrain[EI, Q, P, A]( engine: BaseEngine[EI, Q, P, A], engineParams: EngineParams, engineInstance: EngineInstance, env: Map[String, String] = WorkflowUtils.pioEnvVars, params: WorkflowParams = WorkflowParams()): Unit = { logger.debug("Starting SparkContext") val mode = "training" val batch = if (params.batch.nonEmpty) { s"${engineInstance.engineFactory} (${params.batch})" } else { engineInstance.engineFactory } val sc = WorkflowContext( batch, env, params.sparkEnv, mode.capitalize) try { val models: Seq[Any] = engine.train( sc = sc, engineParams = engineParams, engineInstanceId = engineInstance.id, params = params ) val kryo = KryoInstantiator.newKryoInjection logger.info("Inserting persistent model") Storage.getModelDataModels.insert(Model( id = engineInstance.id, models = kryo(models))) logger.info("Updating engine instance") engineInstances.update(engineInstance.copy( status = "COMPLETED", endTime = DateTime.now )) logger.info("Training completed successfully.") } catch { case e @( _: StopAfterReadInterruption | _: StopAfterPrepareInterruption) => { logger.info(s"Training interrupted by $e.") } } finally { logger.debug("Stopping SparkContext") CleanupFunctions.run() sc.stop() } } def runEvaluation[EI, Q, P, A, R <: BaseEvaluatorResult]( evaluation: Evaluation, engine: BaseEngine[EI, Q, P, A], engineParamsList: Seq[EngineParams], evaluationInstance: EvaluationInstance, evaluator: BaseEvaluator[EI, Q, P, A, R], env: Map[String, String] = WorkflowUtils.pioEnvVars, params: WorkflowParams = WorkflowParams()): Unit = { logger.info("runEvaluation started") logger.debug("Start SparkContext") val mode = "evaluation" val batch = if
(params.batch.nonEmpty) { s"${evaluation.getClass.getName} (${params.batch})" } else { evaluation.getClass.getName } val sc = WorkflowContext( batch, env, params.sparkEnv, mode.capitalize) try { val evaluationInstanceId = evaluationInstances.insert(evaluationInstance) logger.info(s"Starting evaluation instance ID: $evaluationInstanceId") val evaluatorResult: BaseEvaluatorResult = EvaluationWorkflow.runEvaluation( sc, evaluation, engine, engineParamsList, evaluator, params) if (evaluatorResult.noSave) { logger.info(s"This evaluation result is not saved to the database: $evaluatorResult") } else { val evaluatedEvaluationInstance = evaluationInstance.copy( status = "EVALCOMPLETED", id = evaluationInstanceId, endTime = DateTime.now, evaluatorResults = evaluatorResult.toOneLiner, evaluatorResultsHTML = evaluatorResult.toHTML, evaluatorResultsJSON = evaluatorResult.toJSON ) logger.info(s"Updating evaluation instance with result: $evaluatorResult") evaluationInstances.update(evaluatedEvaluationInstance) } logger.info("runEvaluation completed") } finally { logger.debug("Stop SparkContext") CleanupFunctions.run() sc.stop() } } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/CreateServer.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.workflow import java.io.Serializable import java.util.concurrent.TimeUnit import akka.event.Logging import com.github.nscala_time.time.Imports.DateTime import com.twitter.bijection.Injection import com.twitter.chill.{KryoBase, KryoInjection, ScalaKryoInstantiator} import com.typesafe.config.ConfigFactory import de.javakaffee.kryoserializers.SynchronizedCollectionsSerializer import grizzled.slf4j.Logging import org.apache.commons.lang3.exception.ExceptionUtils import org.apache.predictionio.authentication.KeyAuthentication import org.apache.predictionio.controller.{Engine, Params, Utils, WithPrId} import org.apache.predictionio.core.{BaseAlgorithm, BaseServing, Doer} import org.apache.predictionio.data.storage.{EngineInstance, Storage} import org.apache.predictionio.workflow.JsonExtractorOption.JsonExtractorOption import org.json4s._ import org.json4s.native.JsonMethods._ import org.json4s.native.Serialization.write import akka.actor._ import akka.http.scaladsl.{ConnectionContext, Http, HttpsConnectionContext} import akka.http.scaladsl.Http.ServerBinding import akka.http.scaladsl.model.ContentTypes._ import akka.http.scaladsl.model.{HttpEntity, HttpResponse, StatusCodes} import akka.http.scaladsl.server.Directives.complete import akka.http.scaladsl.server.directives._ import akka.http.scaladsl.server._ import akka.pattern.ask import akka.util.Timeout import akka.http.scaladsl.server.Directives._ import akka.stream.ActorMaterializer import org.apache.predictionio.akkahttpjson4s.Json4sSupport._ import org.apache.predictionio.configuration.SSLConfiguration import scala.concurrent.ExecutionContext.Implicits.global import scala.concurrent.{Await, Future} import scala.concurrent.duration._ import scala.language.existentials import scala.util.{Failure, Random, Success} import scalaj.http.HttpOptions class KryoInstantiator(classLoader: 
ClassLoader) extends ScalaKryoInstantiator { override def newKryo(): KryoBase = { val kryo = super.newKryo() kryo.setClassLoader(classLoader) SynchronizedCollectionsSerializer.registerSerializers(kryo) kryo } } object KryoInstantiator extends Serializable { def newKryoInjection: Injection[Any, Array[Byte]] = { val kryoInstantiator = new KryoInstantiator(getClass.getClassLoader) KryoInjection.instance(kryoInstantiator) } } case class ServerConfig( batch: String = "", engineInstanceId: String = "", engineId: Option[String] = None, engineVersion: Option[String] = None, engineVariant: String = "", env: Option[String] = None, ip: String = "0.0.0.0", port: Int = 8000, feedback: Boolean = false, eventServerIp: String = "0.0.0.0", eventServerPort: Int = 7070, accessKey: Option[String] = None, logUrl: Option[String] = None, logPrefix: Option[String] = None, logFile: Option[String] = None, verbose: Boolean = false, debug: Boolean = false, jsonExtractor: JsonExtractorOption = JsonExtractorOption.Both) case class StartServer() case class BindServer() case class StopServer() case class ReloadServer() object CreateServer extends Logging { val actorSystem = ActorSystem("pio-server") val engineInstances = Storage.getMetaDataEngineInstances val modeldata = Storage.getModelDataModels def main(args: Array[String]): Unit = { val parser = new scopt.OptionParser[ServerConfig]("CreateServer") { opt[String]("batch") action { (x, c) => c.copy(batch = x) } text("Batch label of the deployment.") opt[String]("engineId") action { (x, c) => c.copy(engineId = Some(x)) } text("Engine ID.") opt[String]("engineVersion") action { (x, c) => c.copy(engineVersion = Some(x)) } text("Engine version.") opt[String]("engine-variant") required() action { (x, c) => c.copy(engineVariant = x) } text("Engine variant JSON.") opt[String]("ip") action { (x, c) => c.copy(ip = x) } opt[String]("env") action { (x, c) => c.copy(env = Some(x)) } text("Comma-separated list of environment variables (in 'FOO=BAR' " +
"format) to pass to the Spark execution environment.") opt[Int]("port") action { (x, c) => c.copy(port = x) } text("Port to bind to (default: 8000).") opt[String]("engineInstanceId") required() action { (x, c) => c.copy(engineInstanceId = x) } text("Engine instance ID.") opt[Unit]("feedback") action { (_, c) => c.copy(feedback = true) } text("Enable feedback loop to event server.") opt[String]("event-server-ip") action { (x, c) => c.copy(eventServerIp = x) } opt[Int]("event-server-port") action { (x, c) => c.copy(eventServerPort = x) } text("Event server port. Default: 7070") opt[String]("accesskey") action { (x, c) => c.copy(accessKey = Some(x)) } text("Event server access key.") opt[String]("log-url") action { (x, c) => c.copy(logUrl = Some(x)) } opt[String]("log-prefix") action { (x, c) => c.copy(logPrefix = Some(x)) } opt[String]("log-file") action { (x, c) => c.copy(logFile = Some(x)) } opt[Unit]("verbose") action { (x, c) => c.copy(verbose = true) } text("Enable verbose output.") opt[Unit]("debug") action { (x, c) => c.copy(debug = true) } text("Enable debug output.") opt[String]("json-extractor") action { (x, c) => c.copy(jsonExtractor = JsonExtractorOption.withName(x)) } } parser.parse(args, ServerConfig()) map { sc => WorkflowUtils.modifyLogging(sc.verbose) engineInstances.get(sc.engineInstanceId) map { engineInstance => val engineId = sc.engineId.getOrElse(engineInstance.engineId) val engineVersion = sc.engineVersion.getOrElse( engineInstance.engineVersion) val engineFactoryName = engineInstance.engineFactory val master = actorSystem.actorOf(Props( classOf[MasterActor], sc, engineInstance, engineFactoryName), "master") implicit val timeout = Timeout(5.seconds) master ? StartServer() val f = actorSystem.whenTerminated Await.ready(f, Duration.Inf) } getOrElse { error(s"Invalid engine instance ID. 
Aborting server.") } } } def createPredictionServerWithEngine[TD, EIN, PD, Q, P, A]( sc: ServerConfig, engineInstance: EngineInstance, engine: Engine[TD, EIN, PD, Q, P, A], engineLanguage: EngineLanguage.Value): PredictionServer[Q, P] = { val engineParams = engine.engineInstanceToEngineParams( engineInstance, sc.jsonExtractor) val kryo = KryoInstantiator.newKryoInjection val modelsFromEngineInstance = kryo.invert(modeldata.get(engineInstance.id).get.models).get. asInstanceOf[Seq[Any]] val batch = if (engineInstance.batch.nonEmpty) { s"${engineInstance.engineFactory} (${engineInstance.batch})" } else { engineInstance.engineFactory } val sparkContext = WorkflowContext( batch = batch, executorEnv = engineInstance.env, mode = "Serving", sparkEnv = engineInstance.sparkConf) val models = engine.prepareDeploy( sparkContext, engineParams, engineInstance.id, modelsFromEngineInstance, params = WorkflowParams() ) val algorithms = engineParams.algorithmParamsList.map { case (n, p) => Doer(engine.algorithmClassMap(n), p) } val servingParamsWithName = engineParams.servingParams val serving = Doer(engine.servingClassMap(servingParamsWithName._1), servingParamsWithName._2) new PredictionServer( sc, engineInstance, engine, engineLanguage, engineParams.dataSourceParams._2, engineParams.preparatorParams._2, algorithms, engineParams.algorithmParamsList.map(_._2), models, serving, engineParams.servingParams._2, actorSystem) } } object EngineServerJson4sSupport { implicit val serialization = org.json4s.jackson.Serialization implicit def json4sFormats: Formats = DefaultFormats } class MasterActor ( sc: ServerConfig, engineInstance: EngineInstance, engineFactoryName: String) extends Actor with KeyAuthentication with SSLConfiguration { val log = Logging(context.system, this) implicit val system = context.system implicit val materializer = ActorMaterializer() var currentServerBinding: Option[Future[ServerBinding]] = None var retry = 3 val serverConfig = ConfigFactory.load("server.conf") val 
sslEnforced = serverConfig.getBoolean("org.apache.predictionio.server.ssl-enforced") val protocol = if (sslEnforced) "https://" else "http://" val https: Option[HttpsConnectionContext] = if (sslEnforced) { val https = ConnectionContext.https(sslContext) Http().setDefaultServerHttpContext(https) Some(https) } else None def undeploy(ip: String, port: Int): Unit = { val serverUrl = s"${protocol}${ip}:${port}" log.info( s"Undeploying any existing engine instance at $serverUrl") try { val code = scalaj.http.Http(s"$serverUrl/stop") .option(HttpOptions.allowUnsafeSSL) .param(ServerKey.param, ServerKey.get) .method("POST").asString.code code match { case 200 => () case 404 => log.error( s"Another process is using $serverUrl. Unable to undeploy.") case _ => log.error( s"Another process is using $serverUrl, or an existing " + s"engine server is not responding properly (HTTP $code). " + "Unable to undeploy.") } } catch { case e: java.net.ConnectException => log.warning(s"Nothing at $serverUrl") case _: Throwable => log.error("Another process might be occupying " + s"$ip:$port. Unable to undeploy.") } } def receive: Actor.Receive = { case x: StartServer => undeploy(sc.ip, sc.port) self ! BindServer() case x: BindServer => currentServerBinding match { case Some(_) => log.error("Cannot bind a server backend because one is already bound.") case None => val server = createServer(sc, engineInstance, engineFactoryName) val route = server.createRoute() val binding = https match { case Some(https) => Http().bindAndHandle(route, sc.ip, sc.port, connectionContext = https) case None => Http().bindAndHandle(route, sc.ip, sc.port) } currentServerBinding = Some(binding) val serverUrl = s"${protocol}${sc.ip}:${sc.port}" log.info(s"Engine is deployed and running.
Engine API is live at ${serverUrl}.") } case x: StopServer => log.info("Stop server command received.") currentServerBinding match { case Some(f) => f.flatMap { binding => binding.unbind() }.foreach { _ => system.terminate() } case None => log.warning("No active server is running.") } case x: ReloadServer => log.info("Reload server command received.") currentServerBinding match { case Some(f) => f.flatMap { binding => binding.unbind() } val latestEngineInstance = CreateServer.engineInstances.getLatestCompleted( engineInstance.engineId, engineInstance.engineVersion, engineInstance.engineVariant) latestEngineInstance map { lr => val server = createServer(sc, lr, engineFactoryName) val route = server.createRoute() val binding = https match { case Some(https) => Http().bindAndHandle(route, sc.ip, sc.port, connectionContext = https) case None => Http().bindAndHandle(route, sc.ip, sc.port) } currentServerBinding = Some(binding) } getOrElse { log.warning( s"No latest completed engine instance for ${engineInstance.engineId} " + s"${engineInstance.engineVersion}. Aborting reload.") } case None => log.warning("No active server is running. Aborting reload.") } } def createServer( sc: ServerConfig, engineInstance: EngineInstance, engineFactoryName: String): PredictionServer[_, _] = { val (engineLanguage, engineFactory) = WorkflowUtils.getEngine(engineFactoryName, getClass.getClassLoader) val engine = engineFactory() // EngineFactory returns a base engine, which may not be deployable.
if (!engine.isInstanceOf[Engine[_,_,_,_,_,_]]) { throw new NoSuchMethodException(s"Engine $engine is not deployable") } val deployableEngine = engine.asInstanceOf[Engine[_,_,_,_,_,_]] CreateServer.createPredictionServerWithEngine( sc, engineInstance, // engine, deployableEngine, engineLanguage) } } class PredictionServer[Q, P]( val args: ServerConfig, val engineInstance: EngineInstance, val engine: Engine[_, _, _, Q, P, _], val engineLanguage: EngineLanguage.Value, val dataSourceParams: Params, val preparatorParams: Params, val algorithms: Seq[BaseAlgorithm[_, _, Q, P]], val algorithmsParams: Seq[Params], val models: Seq[Any], val serving: BaseServing[Q, P], val servingParams: Params, val system: ActorSystem) extends KeyAuthentication { val log = Logging(system, getClass) val serverStartTime = DateTime.now var requestCount: Int = 0 var avgServingSec: Double = 0.0 var lastServingSec: Double = 0.0 implicit val timeout = Timeout(5, TimeUnit.SECONDS) val pluginsActorRef = system.actorOf(Props(classOf[PluginsActor], args.engineVariant), "PluginsActor") val pluginContext = EngineServerPluginContext(log, args.engineVariant) val feedbackEnabled = if (args.feedback) { if (args.accessKey.isEmpty) { log.error("Feedback loop cannot be enabled because accessKey is empty.") false } else { true } } else false def remoteLog(logUrl: String, logPrefix: String, message: String): Unit = { implicit val formats = Utils.json4sDefaultFormats try { scalaj.http.Http(logUrl).postData( logPrefix + write(Map( "engineInstance" -> engineInstance, "message" -> message))).asString } catch { case e: Throwable => log.error(s"Unable to send remote log: ${e.getMessage}") } } def authenticate[T](authenticator: RequestContext => Future[Either[Rejection, T]]): AuthenticationDirective[T] = { extractRequestContext.flatMap { requestContext => onSuccess(authenticator(requestContext)).flatMap { case Right(x) => provide(x) case Left(x) => reject(x): Directive1[T] } } } def createRoute(): Route = { val myRoute 
= path("") { get { complete(HttpResponse(entity = HttpEntity( `text/html(UTF-8)`, html.index( args, engineInstance, algorithms.map(_.toString), algorithmsParams.map(_.toString), models.map(_.toString), dataSourceParams.toString, preparatorParams.toString, servingParams.toString, serverStartTime, feedbackEnabled, args.eventServerIp, args.eventServerPort, requestCount, avgServingSec, lastServingSec ).toString ))) } } ~ path("queries.json") { post { entity(as[String]) { queryString => try { val servingStartTime = DateTime.now val jsonExtractorOption = args.jsonExtractor val queryTime = DateTime.now // Extract Query from Json val query = JsonExtractor.extract( jsonExtractorOption, queryString, algorithms.head.queryClass, algorithms.head.querySerializer, algorithms.head.gsonTypeAdapterFactories ) val queryJValue = JsonExtractor.toJValue( jsonExtractorOption, query, algorithms.head.querySerializer, algorithms.head.gsonTypeAdapterFactories) // Deploy logic. First call Serving.supplement, then Algo.predict, // finally Serving.serve. val supplementedQuery = serving.supplementBase(query) // TODO: Parallelize the following. val predictions = algorithms.zip(models).map { case (a, m) => a.predictBase(m, supplementedQuery) } // Notice that it is by design to call Serving.serve with the // *original* query. 
val prediction = serving.serveBase(query, predictions) val predictionJValue = JsonExtractor.toJValue( jsonExtractorOption, prediction, algorithms.head.querySerializer, algorithms.head.gsonTypeAdapterFactories) /** Handle feedback to Event Server * Send the following back to the Event Server * - appId * - engineInstanceId * - query * - prediction * - prId */ val result = if (feedbackEnabled) { implicit val formats = algorithms.headOption map { alg => alg.querySerializer } getOrElse { Utils.json4sDefaultFormats } // val genPrId = Random.alphanumeric.take(64).mkString def genPrId: String = Random.alphanumeric.take(64).mkString val newPrId = prediction match { case id: WithPrId => val org = id.prId if (org.isEmpty) genPrId else org case _ => genPrId } // also save Query's prId as prId of this pio_pr predict events val queryPrId = query match { case id: WithPrId => Map("prId" -> id.prId) case _ => Map.empty } val data = Map( // "appId" -> dataSourceParams.asInstanceOf[ParamsWithAppId].appId, "event" -> "predict", "eventTime" -> queryTime.toString(), "entityType" -> "pio_pr", // prediction result "entityId" -> newPrId, "properties" -> Map( "engineInstanceId" -> engineInstance.id, "query" -> query, "prediction" -> prediction)) ++ queryPrId // At this point args.accessKey should be Some(String). val accessKey = args.accessKey.getOrElse("") val f: Future[Int] = Future { scalaj.http.Http( s"http://${args.eventServerIp}:${args.eventServerPort}/" + s"events.json?accessKey=$accessKey").postData( write(data)).header( "content-type", "application/json").asString.code } f onComplete { case Success(code) => { if (code != 201) { log.error(s"Feedback event failed. Status code: $code." 
+ s" Data: ${write(data)}.") } } case Failure(t) => { log.error(s"Feedback event failed: ${t.getMessage}") } } // overwrite prId in predictedResult // - if it is WithPrId, // then overwrite with new prId // - if it is not WithPrId, no prId injection if (prediction.isInstanceOf[WithPrId]) { predictionJValue merge parse(s"""{"prId" : "$newPrId"}""") } else { predictionJValue } } else predictionJValue val pluginResult = pluginContext.outputBlockers.values.foldLeft(result) { case (r, p) => p.process(engineInstance, queryJValue, r, pluginContext) } pluginsActorRef ! (engineInstance, queryJValue, result) // Bookkeeping val servingEndTime = DateTime.now lastServingSec = (servingEndTime.getMillis - servingStartTime.getMillis) / 1000.0 avgServingSec = ((avgServingSec * requestCount) + lastServingSec) / (requestCount + 1) requestCount += 1 complete(compact(render(pluginResult))) } catch { case e: MappingException => val msg = s"Query:\n$queryString\n\nStack Trace:\n" + s"${ExceptionUtils.getStackTrace(e)}\n\n" log.error(msg) args.logUrl map { url => remoteLog( url, args.logPrefix.getOrElse(""), msg) } complete(StatusCodes.BadRequest, e.getMessage) case e: Throwable => val msg = s"Query:\n$queryString\n\nStack Trace:\n" + s"${ExceptionUtils.getStackTrace(e)}\n\n" log.error(msg) args.logUrl map { url => remoteLog( url, args.logPrefix.getOrElse(""), msg) } complete(StatusCodes.InternalServerError, msg) } } } } ~ path("reload") { authenticate(withAccessKeyFromFile) { request => post { system.actorSelection("/user/master") ! ReloadServer() complete("Reloading...") } } } ~ path("stop") { authenticate(withAccessKeyFromFile) { request => post { system.scheduler.scheduleOnce(1.seconds) { system.actorSelection("/user/master") !
StopServer() } complete("Shutting down...") } } } ~ pathPrefix("assets") { getFromResourceDirectory("assets") } ~ path("plugins.json") { import EngineServerJson4sSupport._ get { complete( Map("plugins" -> Map( "outputblockers" -> pluginContext.outputBlockers.map { case (n, p) => n -> Map( "name" -> p.pluginName, "description" -> p.pluginDescription, "class" -> p.getClass.getName, "params" -> pluginContext.pluginParams(p.pluginName)) }, "outputsniffers" -> pluginContext.outputSniffers.map { case (n, p) => n -> Map( "name" -> p.pluginName, "description" -> p.pluginDescription, "class" -> p.getClass.getName, "params" -> pluginContext.pluginParams(p.pluginName)) } )) ) } } ~ path("plugins" / Segments) { segments => import EngineServerJson4sSupport._ get { val pluginArgs = segments.drop(2) val pluginType = segments(0) val pluginName = segments(1) pluginType match { case EngineServerPlugin.outputBlocker => complete(HttpResponse(entity = HttpEntity( `application/json`, pluginContext.outputBlockers(pluginName).handleREST(pluginArgs)))) case EngineServerPlugin.outputSniffer => complete(pluginsActorRef ? PluginsActor.HandleREST( pluginName = pluginName, pluginArgs = pluginArgs) map { json => HttpResponse(entity = HttpEntity( `application/json`, json.asInstanceOf[String] )) }) } } } myRoute } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/CreateWorkflow.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.workflow import java.net.URI import com.github.nscala_time.time.Imports._ import com.google.common.io.ByteStreams import grizzled.slf4j.Logging import org.apache.predictionio.controller.Engine import org.apache.predictionio.core.BaseEngine import org.apache.predictionio.data.storage.EngineInstance import org.apache.predictionio.data.storage.EvaluationInstance import org.apache.predictionio.data.storage.Storage import org.apache.predictionio.workflow.JsonExtractorOption.JsonExtractorOption import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.FileSystem import org.apache.hadoop.fs.Path import org.json4s.JValue import org.json4s.JString import org.json4s.native.JsonMethods.parse import scala.language.existentials object CreateWorkflow extends Logging { case class WorkflowConfig( deployMode: String = "", batch: String = "", engineId: String = "", engineVersion: String = "", engineVariant: String = "", engineFactory: String = "", engineParamsKey: String = "", evaluationClass: Option[String] = None, engineParamsGeneratorClass: Option[String] = None, env: Option[String] = None, skipSanityCheck: Boolean = false, stopAfterRead: Boolean = false, stopAfterPrepare: Boolean = false, verbosity: Int = 0, verbose: Boolean = false, debug: Boolean = false, logFile: Option[String] = None, jsonExtractor: JsonExtractorOption = JsonExtractorOption.Both) case class AlgorithmParams(name: String, params: JValue) private def stringFromFile(filePath: String): String = { try { val uri = new URI(filePath) val fs = FileSystem.get(uri, new 
Configuration()) new String(ByteStreams.toByteArray(fs.open(new Path(uri))), "UTF-8") } catch { case e: java.io.IOException => error(s"Error reading from file: ${e.getMessage}. Aborting workflow.") sys.exit(1) } } val parser = new scopt.OptionParser[WorkflowConfig]("CreateWorkflow") { override def errorOnUnknownArgument: Boolean = false opt[String]("batch") action { (x, c) => c.copy(batch = x) } text("Batch label of the workflow run.") opt[String]("engine-id") required() action { (x, c) => c.copy(engineId = x) } text("Engine's ID.") opt[String]("engine-version") required() action { (x, c) => c.copy(engineVersion = x) } text("Engine's version.") opt[String]("engine-variant") required() action { (x, c) => c.copy(engineVariant = x) } text("Path to the engine variant JSON file.") opt[String]("evaluation-class") action { (x, c) => c.copy(evaluationClass = Some(x)) } text("Class name of the run's evaluation.") opt[String]("engine-params-generator-class") action { (x, c) => c.copy(engineParamsGeneratorClass = Some(x)) } text("Class name of the engine parameters generator.") opt[String]("env") action { (x, c) => c.copy(env = Some(x)) } text("Comma-separated list of environment variables (in 'FOO=BAR' " + "format) to pass to the Spark execution environment.") opt[Unit]("verbose") action { (x, c) => c.copy(verbose = true) } text("Enable verbose output.") opt[Unit]("debug") action { (x, c) => c.copy(debug = true) } text("Enable debug output.") opt[Unit]("skip-sanity-check") action { (x, c) => c.copy(skipSanityCheck = true) } opt[Unit]("stop-after-read") action { (x, c) => c.copy(stopAfterRead = true) } opt[Unit]("stop-after-prepare") action { (x, c) => c.copy(stopAfterPrepare = true) } opt[String]("deploy-mode") action { (x, c) => c.copy(deployMode = x) } opt[Int]("verbosity") action { (x, c) => c.copy(verbosity = x) } opt[String]("engine-factory") action { (x, c) => c.copy(engineFactory = x) } opt[String]("engine-params-key") action { (x, c) => c.copy(engineParamsKey = x) } 
opt[String]("log-file")
action { (x, c) => c.copy(logFile = Some(x)) } opt[String]("json-extractor") action { (x, c) => c.copy(jsonExtractor = JsonExtractorOption.withName(x)) } } def main(args: Array[String]): Unit = { try { val wfcOpt = parser.parse(args, WorkflowConfig()) if (wfcOpt.isEmpty) { logger.error("WorkflowConfig is empty. Quitting") return } val wfc = wfcOpt.get WorkflowUtils.modifyLogging(wfc.verbose) val evaluation = wfc.evaluationClass.map { ec => try { WorkflowUtils.getEvaluation(ec, getClass.getClassLoader)._2 } catch { case e @ (_: ClassNotFoundException | _: NoSuchMethodException) => error(s"Unable to obtain evaluation $ec. Aborting workflow.", e) sys.exit(1) } } val engineParamsGenerator = wfc.engineParamsGeneratorClass.map { epg => try { WorkflowUtils.getEngineParamsGenerator(epg, getClass.getClassLoader)._2 } catch { case e @ (_: ClassNotFoundException | _: NoSuchMethodException) => error(s"Unable to obtain engine parameters generator $epg. " + "Aborting workflow.", e) sys.exit(1) } } val pioEnvVars = wfc.env.map { e => e.split(',').flatMap { p => p.split('=') match { case Array(k, v) => List(k -> v) case _ => Nil } }.toMap }.getOrElse(Map.empty) if (evaluation.isEmpty) { val variantJson = parse(stringFromFile(wfc.engineVariant)) val engineFactory = if (wfc.engineFactory == "") { variantJson \ "engineFactory" match { case JString(s) => s case _ => error("Unable to read engine factory class name from " + s"${wfc.engineVariant}. Aborting.") sys.exit(1) } } else wfc.engineFactory val variantId = variantJson \ "id" match { case JString(s) => s case _ => error("Unable to read engine variant ID from " + s"${wfc.engineVariant}. Aborting.") sys.exit(1) } val (engineLanguage, engineFactoryObj) = try { WorkflowUtils.getEngine(engineFactory, getClass.getClassLoader) } catch { case e @ (_: ClassNotFoundException | _: NoSuchMethodException) => error(s"Unable to obtain engine: ${e.getMessage}. 
Aborting workflow.") sys.exit(1) } val engine: BaseEngine[_, _, _, _] = engineFactoryObj() val customSparkConf = WorkflowUtils.extractSparkConf(variantJson) val workflowParams = WorkflowParams( verbose = wfc.verbosity, skipSanityCheck = wfc.skipSanityCheck, stopAfterRead = wfc.stopAfterRead, stopAfterPrepare = wfc.stopAfterPrepare, sparkEnv = WorkflowParams().sparkEnv ++ customSparkConf) // Evaluator Not Specified. Do training. if (!engine.isInstanceOf[Engine[_,_,_,_,_,_]]) { throw new NoSuchMethodException(s"Engine $engine is not trainable") } val trainableEngine = engine.asInstanceOf[Engine[_, _, _, _, _, _]] val engineParams = if (wfc.engineParamsKey == "") { trainableEngine.jValueToEngineParams(variantJson, wfc.jsonExtractor) } else { engineFactoryObj.engineParams(wfc.engineParamsKey) } val engineInstance = EngineInstance( id = "", status = "INIT", startTime = DateTime.now, endTime = DateTime.now, engineId = wfc.engineId, engineVersion = wfc.engineVersion, engineVariant = variantId, engineFactory = engineFactory, batch = wfc.batch, env = pioEnvVars, sparkConf = workflowParams.sparkEnv, dataSourceParams = JsonExtractor.paramToJson(wfc.jsonExtractor, engineParams.dataSourceParams), preparatorParams = JsonExtractor.paramToJson(wfc.jsonExtractor, engineParams.preparatorParams), algorithmsParams = JsonExtractor.paramsToJson(wfc.jsonExtractor, engineParams.algorithmParamsList), servingParams = JsonExtractor.paramToJson(wfc.jsonExtractor, engineParams.servingParams)) val engineInstanceId = Storage.getMetaDataEngineInstances.insert( engineInstance) CoreWorkflow.runTrain( env = pioEnvVars, params = workflowParams, engine = trainableEngine, engineParams = engineParams, engineInstance = engineInstance.copy(id = engineInstanceId)) } else { val workflowParams = WorkflowParams( verbose = wfc.verbosity, skipSanityCheck = wfc.skipSanityCheck, stopAfterRead = wfc.stopAfterRead, stopAfterPrepare = wfc.stopAfterPrepare, sparkEnv = WorkflowParams().sparkEnv) val evaluationInstance 
= EvaluationInstance( evaluationClass = wfc.evaluationClass.get, engineParamsGeneratorClass = wfc.engineParamsGeneratorClass.get, batch = wfc.batch, env = pioEnvVars, sparkConf = workflowParams.sparkEnv ) Workflow.runEvaluation( evaluation = evaluation.get, engineParamsGenerator = engineParamsGenerator.get, evaluationInstance = evaluationInstance, params = workflowParams) } } finally { CleanupFunctions.run() } } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/EngineServerPlugin.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.workflow import org.apache.predictionio.data.storage.EngineInstance import org.json4s._ trait EngineServerPlugin { val pluginName: String val pluginDescription: String val pluginType: String def process( engineInstance: EngineInstance, query: JValue, prediction: JValue, context: EngineServerPluginContext): JValue def handleREST(arguments: Seq[String]): String } object EngineServerPlugin { val outputBlocker = "outputblocker" val outputSniffer = "outputsniffer" } ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/EngineServerPluginContext.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.workflow import java.net.URI import java.util.ServiceLoader import akka.event.LoggingAdapter import com.google.common.io.ByteStreams import grizzled.slf4j.Logging import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.FileSystem import org.apache.hadoop.fs.Path import org.json4s.DefaultFormats import org.json4s.Formats import org.json4s.JObject import org.json4s.JValue import org.json4s.native.JsonMethods._ import scala.collection.JavaConversions._ import scala.collection.mutable class EngineServerPluginContext( val plugins: mutable.Map[String, mutable.Map[String, EngineServerPlugin]], val pluginParams: mutable.Map[String, JValue], val log: LoggingAdapter) { def outputBlockers: Map[String, EngineServerPlugin] = plugins.getOrElse(EngineServerPlugin.outputBlocker, Map.empty).toMap def outputSniffers: Map[String, EngineServerPlugin] = plugins.getOrElse(EngineServerPlugin.outputSniffer, Map.empty).toMap } object EngineServerPluginContext extends Logging { implicit val formats: Formats = DefaultFormats def apply(log: LoggingAdapter, engineVariant: String): EngineServerPluginContext = { val plugins = mutable.Map[String, mutable.Map[String, EngineServerPlugin]]( EngineServerPlugin.outputBlocker -> mutable.Map(), EngineServerPlugin.outputSniffer -> mutable.Map()) val pluginParams = mutable.Map[String, JValue]() val serviceLoader = ServiceLoader.load(classOf[EngineServerPlugin]) stringFromFile(engineVariant).foreach { variantJson => (parse(variantJson) \ "plugins").extractOpt[JObject].foreach { pluginDefs => pluginDefs.obj.foreach { pluginParams += _ } } } serviceLoader foreach { service => pluginParams.get(service.pluginName) map { params => if ((params \ "enabled").extractOrElse(false)) { info(s"Plugin ${service.pluginName} is enabled.") plugins(service.pluginType) += service.pluginName -> service } else { info(s"Plugin ${service.pluginName} is disabled.") } } getOrElse { info(s"Plugin ${service.pluginName} is 
disabled.") } } new EngineServerPluginContext( plugins, pluginParams, log) } private def stringFromFile(filePath: String): Option[String] = { try { val uri = new URI(filePath) val fs = FileSystem.get(uri, new Configuration()) val path = new Path(uri) if (fs.exists(path)) { Some(new String(ByteStreams.toByteArray(fs.open(path)), "UTF-8")) } else { None } } catch { case e: java.io.IOException => error(s"Error reading from file: ${e.getMessage}. Aborting.") sys.exit(1) } } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/EngineServerPluginsActor.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ package org.apache.predictionio.workflow import akka.actor.Actor import akka.event.Logging import org.apache.predictionio.data.storage.EngineInstance import org.json4s.JValue class PluginsActor(engineVariant: String) extends Actor { implicit val system = context.system val log = Logging(system, this) val pluginContext = EngineServerPluginContext(log, engineVariant) override def receive: PartialFunction[Any, Unit] = { case (ei: EngineInstance, q: JValue, p: JValue) => pluginContext.outputSniffers.values.foreach(_.process(ei, q, p, pluginContext)) case h: PluginsActor.HandleREST => try { sender() ! pluginContext.outputSniffers(h.pluginName).handleREST(h.pluginArgs) } catch { case e: Exception => sender() ! s"""{"message":"${e.getMessage}"}""" } case _ => log.error("Unknown message sent to the Engine Server output sniffer plugin host.") } } object PluginsActor { case class HandleREST(pluginName: String, pluginArgs: Seq[String]) } ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/EvaluationWorkflow.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.workflow import org.apache.predictionio.controller.EngineParams import org.apache.predictionio.controller.Evaluation import org.apache.predictionio.core.BaseEvaluator import org.apache.predictionio.core.BaseEvaluatorResult import org.apache.predictionio.core.BaseEngine import grizzled.slf4j.Logger import org.apache.spark.SparkContext import scala.language.existentials object EvaluationWorkflow { @transient lazy val logger = Logger[this.type] def runEvaluation[EI, Q, P, A, R <: BaseEvaluatorResult]( sc: SparkContext, evaluation: Evaluation, engine: BaseEngine[EI, Q, P, A], engineParamsList: Seq[EngineParams], evaluator: BaseEvaluator[EI, Q, P, A, R], params: WorkflowParams) : R = { val engineEvalDataSet = engine.batchEval(sc, engineParamsList, params) evaluator.evaluateBase(sc, evaluation, engineEvalDataSet, params) } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/FakeWorkflow.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.workflow import org.apache.predictionio.annotation.Experimental // FIXME(yipjustin): Remove wildcard import. 
import org.apache.predictionio.core._ import org.apache.predictionio.controller._ import grizzled.slf4j.Logger import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.rdd.RDD @Experimental private[predictionio] class FakeEngine extends BaseEngine[EmptyParams, EmptyParams, EmptyParams, EmptyParams] { @transient lazy val logger = Logger[this.type] override def train( sc: SparkContext, engineParams: EngineParams, engineInstanceId: String, params: WorkflowParams): Seq[Any] = { throw new StopAfterReadInterruption() } override def eval( sc: SparkContext, engineParams: EngineParams, params: WorkflowParams) : Seq[(EmptyParams, RDD[(EmptyParams, EmptyParams, EmptyParams)])] = { return Seq[(EmptyParams, RDD[(EmptyParams, EmptyParams, EmptyParams)])]() } } @Experimental private[predictionio] class FakeRunner(f: (SparkContext => Unit)) extends BaseEvaluator[EmptyParams, EmptyParams, EmptyParams, EmptyParams, FakeEvalResult] { @transient private lazy val logger = Logger[this.type] override def evaluateBase( sc: SparkContext, evaluation: Evaluation, engineEvalDataSet: Seq[(EngineParams, Seq[(EmptyParams, RDD[(EmptyParams, EmptyParams, EmptyParams)])])], params: WorkflowParams): FakeEvalResult = { f(sc) FakeEvalResult() } } @Experimental private[predictionio] case class FakeEvalResult() extends BaseEvaluatorResult { override val noSave: Boolean = true } /** FakeRun allows users to run a custom function in exactly the same environment * as other PredictionIO workflows. * * This is useful for developing new features: extend this trait and * implement a function of type (SparkContext => Unit). For example, the code below * can be run with `pio eval HelloWorld`. * * {{{ * object HelloWorld extends FakeRun { * // func defines the function pio runs; it must have the signature (SparkContext => Unit).
* func = f * * def f(sc: SparkContext): Unit = { * val logger = Logger[this.type] * logger.info("HelloWorld") * } * } * }}} * */ @Experimental trait FakeRun extends Evaluation with EngineParamsGenerator { private[this] var _runner: FakeRunner = _ def runner: FakeRunner = _runner def runner_=(r: FakeRunner): Unit = { _runner = r engineEvaluator = (new FakeEngine(), r) engineParamsList = Seq(new EngineParams()) } def func: (SparkContext => Unit) = { (sc: SparkContext) => () } def func_=(f: SparkContext => Unit): Unit = { runner = new FakeRunner(f) } } ================================================ FILE: core/src/main/scala/org/apache/predictionio/workflow/JsonExtractor.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
 */

package org.apache.predictionio.workflow

import com.google.gson.Gson
import com.google.gson.GsonBuilder
import com.google.gson.TypeAdapterFactory
import org.apache.predictionio.controller.EngineParams
import org.apache.predictionio.controller.Params
import org.apache.predictionio.controller.Utils
import org.apache.predictionio.workflow.JsonExtractorOption.JsonExtractorOption
import org.json4s.Extraction
import org.json4s.Formats
import org.json4s.JsonAST.{JArray, JValue}
import org.json4s.native.JsonMethods.compact
import org.json4s.native.JsonMethods.pretty
import org.json4s.native.JsonMethods.parse
import org.json4s.native.JsonMethods.render

object JsonExtractor {

  def toJValue(
    extractorOption: JsonExtractorOption,
    o: Any,
    json4sFormats: Formats = Utils.json4sDefaultFormats,
    gsonTypeAdapterFactories: Seq[TypeAdapterFactory] = Seq.empty[TypeAdapterFactory]): JValue = {

    extractorOption match {
      case JsonExtractorOption.Both =>

        val json4sResult = Extraction.decompose(o)(json4sFormats)
        json4sResult.children.size match {
          case 0 => parse(gson(gsonTypeAdapterFactories).toJson(o))
          case _ => json4sResult
        }
      case JsonExtractorOption.Json4sNative =>
        Extraction.decompose(o)(json4sFormats)
      case JsonExtractorOption.Gson =>
        parse(gson(gsonTypeAdapterFactories).toJson(o))
    }
  }

  def extract[T](
    extractorOption: JsonExtractorOption,
    json: String,
    clazz: Class[T],
    json4sFormats: Formats = Utils.json4sDefaultFormats,
    gsonTypeAdapterFactories: Seq[TypeAdapterFactory] = Seq.empty[TypeAdapterFactory]): T = {

    extractorOption match {
      case JsonExtractorOption.Both =>
        try {
          extractWithJson4sNative(json, json4sFormats, clazz)
        } catch {
          case e: Exception =>
            extractWithGson(json, clazz, gsonTypeAdapterFactories)
        }
      case JsonExtractorOption.Json4sNative =>
        extractWithJson4sNative(json, json4sFormats, clazz)
      case JsonExtractorOption.Gson =>
        extractWithGson(json, clazz, gsonTypeAdapterFactories)
    }
  }

  def paramToJson(extractorOption: JsonExtractorOption, param: (String, Params)): String = {
    // The JValue to be replaced must be built by Json4s; otherwise the tuple JValue is wrong.
    val toBeReplacedJValue =
      JsonExtractor.toJValue(JsonExtractorOption.Json4sNative, (param._1, null))
    val paramJValue = JsonExtractor.toJValue(extractorOption, param._2)

    compact(render(toBeReplacedJValue.replace(param._1 :: Nil, paramJValue)))
  }

  def paramsToJson(extractorOption: JsonExtractorOption, params: Seq[(String, Params)]): String = {
    compact(render(paramsToJValue(extractorOption, params)))
  }

  def engineParamsToJson(extractorOption: JsonExtractorOption, params: EngineParams) : String = {
    compact(render(engineParamsToJValue(extractorOption, params)))
  }

  def engineParamstoPrettyJson(
    extractorOption: JsonExtractorOption,
    params: EngineParams) : String = {

    pretty(render(engineParamsToJValue(extractorOption, params)))
  }

  private def engineParamsToJValue(extractorOption: JsonExtractorOption, params: EngineParams) = {
    var jValue = toJValue(JsonExtractorOption.Json4sNative, params)

    val dataSourceParamsJValue = toJValue(extractorOption, params.dataSourceParams._2)
    jValue = jValue.replace(
      "dataSourceParams" :: params.dataSourceParams._1 :: Nil,
      dataSourceParamsJValue)

    val preparatorParamsJValue = toJValue(extractorOption, params.preparatorParams._2)
    jValue = jValue.replace(
      "preparatorParams" :: params.preparatorParams._1 :: Nil,
      preparatorParamsJValue)

    val algorithmParamsJValue = paramsToJValue(extractorOption, params.algorithmParamsList)
    jValue = jValue.replace("algorithmParamsList" :: Nil, algorithmParamsJValue)

    val servingParamsJValue = toJValue(extractorOption, params.servingParams._2)
    jValue = jValue.replace("servingParams" :: params.servingParams._1 :: Nil, servingParamsJValue)

    jValue
  }

  private def paramsToJValue(extractorOption: JsonExtractorOption, params: Seq[(String, Params)]) = {
    val jValues = params.map { case (name, param) =>
      // The JValue to be replaced must be built by Json4s; otherwise the tuple JValue is wrong.
      val toBeReplacedJValue =
        JsonExtractor.toJValue(JsonExtractorOption.Json4sNative, (name, null))
      val paramJValue = JsonExtractor.toJValue(extractorOption, param)

      toBeReplacedJValue.replace(name :: Nil, paramJValue)
    }

    JArray(jValues.toList)
  }

  private def extractWithJson4sNative[T](
    json: String,
    formats: Formats,
    clazz: Class[T]): T = {

    implicit val f = formats
    implicit val m = if (clazz == classOf[Map[_, _]]) {
      Manifest.classType(clazz, manifest[String], manifest[Any])
    } else {
      Manifest.classType(clazz)
    }

    Extraction.extract(parse(json))
  }

  private def extractWithGson[T](
    json: String,
    clazz: Class[T],
    gsonTypeAdapterFactories: Seq[TypeAdapterFactory]): T = {

    gson(gsonTypeAdapterFactories).fromJson(json, clazz)
  }

  private def gson(gsonTypeAdapterFactories: Seq[TypeAdapterFactory]): Gson = {
    val gsonBuilder = new GsonBuilder()
    gsonTypeAdapterFactories.foreach { typeAdapterFactory =>
      gsonBuilder.registerTypeAdapterFactory(typeAdapterFactory)
    }

    gsonBuilder.create()
  }

}

================================================
FILE: core/src/main/scala/org/apache/predictionio/workflow/JsonExtractorOption.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.workflow

object JsonExtractorOption extends Enumeration {
  type JsonExtractorOption = Value
  val Json4sNative = Value
  val Gson = Value
  val Both = Value
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/workflow/PersistentModelManifest.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.workflow

case class PersistentModelManifest(className: String)

================================================
FILE: core/src/main/scala/org/apache/predictionio/workflow/Workflow.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.workflow

import org.apache.predictionio.annotation.Experimental
import org.apache.predictionio.controller.EngineParams
import org.apache.predictionio.controller.EngineParamsGenerator
import org.apache.predictionio.controller.Evaluation
import org.apache.predictionio.core.BaseEngine
import org.apache.predictionio.core.BaseEvaluator
import org.apache.predictionio.core.BaseEvaluatorResult
import org.apache.predictionio.data.storage.EvaluationInstance

/** Collection of workflow creation methods.
  * @group Workflow
  */
object Workflow {
  // The evaluator is already instantiated.
  // This is an undocumented and still experimental way of using an evaluator.
  // evaluatorParams is written into the EngineInstance and shown in the
  // dashboard.
  /*
  def runEval[EI, Q, P, A, ER <: AnyRef](
      engine: BaseEngine[EI, Q, P, A],
      engineParams: EngineParams,
      evaluator: BaseEvaluator[EI, Q, P, A, ER],
      evaluatorParams: Params,
      env: Map[String, String] = WorkflowUtils.pioEnvVars,
      params: WorkflowParams = WorkflowParams()) {

    implicit lazy val formats = Utils.json4sDefaultFormats +
      new NameParamsSerializer

    val engineInstance = EngineInstance(
      id = "",
      status = "INIT",
      startTime = DateTime.now,
      endTime = DateTime.now,
      engineId = "",
      engineVersion = "",
      engineVariant = "",
      engineFactory = "FIXME",
      evaluatorClass = evaluator.getClass.getName(),
      batch = params.batch,
      env = env,
      sparkConf = params.sparkEnv,
      dataSourceParams = write(engineParams.dataSourceParams),
      preparatorParams = write(engineParams.preparatorParams),
      algorithmsParams = write(engineParams.algorithmParamsList),
      servingParams = write(engineParams.servingParams),
      evaluatorParams = write(evaluatorParams),
      evaluatorResults = "",
      evaluatorResultsHTML = "",
      evaluatorResultsJSON = "")

    CoreWorkflow.runEval(
      engine = engine,
      engineParams = engineParams,
      engineInstance = engineInstance,
      evaluator = evaluator,
      evaluatorParams = evaluatorParams,
      env = env,
      params = params)
  }
  */

  def runEvaluation(
      evaluation: Evaluation,
      engineParamsGenerator: EngineParamsGenerator,
      env: Map[String, String] = WorkflowUtils.pioEnvVars,
      evaluationInstance: EvaluationInstance = EvaluationInstance(),
      params: WorkflowParams = WorkflowParams()) {
    runEvaluationTypeless(
      evaluation = evaluation,
      engine = evaluation.engine,
      engineParamsList = engineParamsGenerator.engineParamsList,
      evaluationInstance = evaluationInstance,
      evaluator = evaluation.evaluator,
      env = env,
      params = params
    )
  }

  def runEvaluationTypeless[
      EI, Q, P, A, EEI, EQ, EP, EA, ER <: BaseEvaluatorResult](
      evaluation: Evaluation,
      engine: BaseEngine[EI, Q, P, A],
      engineParamsList: Seq[EngineParams],
      evaluationInstance: EvaluationInstance,
      evaluator: BaseEvaluator[EEI, EQ, EP, EA, ER],
      env: Map[String, String] = WorkflowUtils.pioEnvVars,
      params: WorkflowParams = WorkflowParams()) {
    runEvaluationViaCoreWorkflow(
      evaluation = evaluation,
      engine = engine,
      engineParamsList = engineParamsList,
      evaluationInstance = evaluationInstance,
      evaluator = evaluator.asInstanceOf[BaseEvaluator[EI, Q, P, A, ER]],
      env = env,
      params = params)
  }

  /** :: Experimental :: */
  @Experimental
  def runEvaluationViaCoreWorkflow[EI, Q, P, A, R <: BaseEvaluatorResult](
      evaluation: Evaluation,
      engine: BaseEngine[EI, Q, P, A],
      engineParamsList: Seq[EngineParams],
      evaluationInstance: EvaluationInstance,
      evaluator: BaseEvaluator[EI, Q, P, A, R],
      env: Map[String, String] = WorkflowUtils.pioEnvVars,
      params: WorkflowParams = WorkflowParams()) {
    CoreWorkflow.runEvaluation(
      evaluation = evaluation,
      engine = engine,
      engineParamsList = engineParamsList,
      evaluationInstance = evaluationInstance,
      evaluator = evaluator,
      env = env,
      params = params)
  }
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/workflow/WorkflowContext.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.workflow

import grizzled.slf4j.Logging
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

import scala.language.existentials

// FIXME: move to better location.
object WorkflowContext extends Logging {
  def apply(
      batch: String = "",
      executorEnv: Map[String, String] = Map.empty,
      sparkEnv: Map[String, String] = Map.empty,
      mode: String = ""
    ): SparkContext = {
    val conf = new SparkConf()
    val prefix = if (mode == "") "PredictionIO" else s"PredictionIO ${mode}"
    conf.setAppName(s"${prefix}: ${batch}")
    debug(s"Executor environment received: ${executorEnv}")
    // Side effect only: copy each executor environment variable into the SparkConf.
    executorEnv.foreach(kv => conf.setExecutorEnv(kv._1, kv._2))
    debug(s"SparkConf executor environment: ${conf.getExecutorEnv}")
    debug(s"Application environment received: ${sparkEnv}")
    conf.setAll(sparkEnv)
    val sparkConfString = conf.getAll.toSeq
    debug(s"SparkConf environment: $sparkConfString")
    new SparkContext(conf)
  }
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/workflow/WorkflowParams.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.workflow

/** Workflow parameters.
  *
  * @param batch Batch label of the run.
  * @param verbose Verbosity level.
  * @param saveModel Controls whether trained models are persisted.
  * @param sparkEnv Spark properties that will be set in SparkConf.setAll().
  * @param skipSanityCheck Skips all data sanity checks.
  * @param stopAfterRead Stops workflow after reading from data source.
  * @param stopAfterPrepare Stops workflow after data preparation.
  * @group Workflow
  */
case class WorkflowParams(
  batch: String = "",
  verbose: Int = 2,
  saveModel: Boolean = true,
  sparkEnv: Map[String, String] =
    Map[String, String]("spark.executor.extraClassPath" -> "."),
  skipSanityCheck: Boolean = false,
  stopAfterRead: Boolean = false,
  stopAfterPrepare: Boolean = false) {
  // Temporary workaround for WorkflowParamsBuilder for Java. It doesn't support
  // a custom Spark environment yet.
  def this(batch: String, verbose: Int, saveModel: Boolean) =
    this(batch, verbose, saveModel, Map[String, String]())
}

================================================
FILE: core/src/main/scala/org/apache/predictionio/workflow/WorkflowUtils.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.workflow

import java.io.File
import java.net.URI

import com.google.gson.{Gson, JsonSyntaxException}
import grizzled.slf4j.Logging
import org.apache.log4j.{Level, LogManager}
import org.apache.predictionio.controller._
import org.apache.predictionio.workflow.JsonExtractorOption.JsonExtractorOption
import org.apache.spark.SparkContext
import org.apache.spark.api.java.JavaRDDLike
import org.apache.spark.rdd.RDD
import org.json4s.JsonAST.JValue
import org.json4s._
import org.json4s.native.JsonMethods._

import scala.language.existentials
import scala.reflect.runtime.universe

/** Collection of reusable workflow related utilities. */
object WorkflowUtils extends Logging {
  @transient private lazy val gson = new Gson

  /** Obtains an Engine object in Scala, or instantiates an Engine in Java.
    *
    * @param engine Engine factory name.
    * @param cl A Java ClassLoader to look for engine-related classes.
    *
    * @throws ClassNotFoundException
    *         Thrown when engine factory class does not exist.
    * @throws NoSuchMethodException
    *         Thrown when engine factory's apply() method is not implemented.
    */
  def getEngine(engine: String, cl: ClassLoader): (EngineLanguage.Value, EngineFactory) = {
    val runtimeMirror = universe.runtimeMirror(cl)
    val engineModule = runtimeMirror.staticModule(engine)
    val engineObject = runtimeMirror.reflectModule(engineModule)
    try {
      (
        EngineLanguage.Scala,
        engineObject.instance.asInstanceOf[EngineFactory]
      )
    } catch {
      case e @ (_: NoSuchFieldException | _: ClassNotFoundException) => (
        EngineLanguage.Java,
        Class.forName(engine).newInstance.asInstanceOf[EngineFactory]
      )
    }
  }

  def getEngineParamsGenerator(epg: String, cl: ClassLoader):
    (EngineLanguage.Value, EngineParamsGenerator) = {
    val runtimeMirror = universe.runtimeMirror(cl)
    val epgModule = runtimeMirror.staticModule(epg)
    val epgObject = runtimeMirror.reflectModule(epgModule)
    try {
      (
        EngineLanguage.Scala,
        epgObject.instance.asInstanceOf[EngineParamsGenerator]
      )
    } catch {
      case e @ (_: NoSuchFieldException | _: ClassNotFoundException) => (
        EngineLanguage.Java,
        Class.forName(epg).newInstance.asInstanceOf[EngineParamsGenerator]
      )
    }
  }

  def getEvaluation(evaluation: String, cl: ClassLoader): (EngineLanguage.Value, Evaluation) = {
    val runtimeMirror = universe.runtimeMirror(cl)
    val evaluationModule = runtimeMirror.staticModule(evaluation)
    val evaluationObject = runtimeMirror.reflectModule(evaluationModule)
    try {
      (
        EngineLanguage.Scala,
        evaluationObject.instance.asInstanceOf[Evaluation]
      )
    } catch {
      case e @ (_: NoSuchFieldException | _: ClassNotFoundException) => (
        EngineLanguage.Java,
        Class.forName(evaluation).newInstance.asInstanceOf[Evaluation]
      )
    }
  }

  /** Converts a JSON document to an instance of Params.
    *
    * @param language Engine's programming language.
    * @param json JSON document.
    * @param clazz Class of the component that is going to receive the resulting
    *              Params instance as a constructor argument.
    * @param jsonExtractor JSON extractor option.
    * @param formats JSON4S serializers for deserialization.
    *
    * @throws MappingException Thrown when JSON4S fails to perform conversion.
    * @throws JsonSyntaxException Thrown when GSON fails to perform conversion.
    */
  def extractParams(
      language: EngineLanguage.Value = EngineLanguage.Scala,
      json: String,
      clazz: Class[_],
      jsonExtractor: JsonExtractorOption,
      formats: Formats = Utils.json4sDefaultFormats): Params = {
    implicit val f = formats
    val pClass = clazz.getConstructors.head.getParameterTypes
    if (pClass.size == 0) {
      if (json != "") {
        warn(s"Non-empty parameters supplied to ${clazz.getName}, but its " +
          "constructor does not accept any arguments. Stubbing with empty " +
          "parameters.")
      }
      EmptyParams()
    } else {
      val apClass = pClass.head
      try {
        JsonExtractor.extract(jsonExtractor, json, apClass, f).asInstanceOf[Params]
      } catch {
        case e@(_: MappingException | _: JsonSyntaxException) =>
          error(
            s"Unable to extract parameters for ${apClass.getName} from " +
            s"JSON string: $json. Aborting workflow.",
            e)
          throw e
      }
    }
  }

  def getParamsFromJsonByFieldAndClass(
      variantJson: JValue,
      field: String,
      classMap: Map[String, Class[_]],
      engineLanguage: EngineLanguage.Value,
      jsonExtractor: JsonExtractorOption): (String, Params) = {
    variantJson findField {
      case JField(f, _) => f == field
      case _ => false
    } map { jv =>
      implicit lazy val formats = Utils.json4sDefaultFormats + new NameParamsSerializer
      val np: NameParams = try {
        jv._2.extract[NameParams]
      } catch {
        case e: Exception =>
          error(s"Unable to extract $field name and params $jv")
          throw e
      }
      val extractedParams = np.params.map { p =>
        try {
          if (!classMap.contains(np.name)) {
            error(s"Unable to find $field class with name '${np.name}'" +
              " defined in Engine.")
            sys.exit(1)
          }
          WorkflowUtils.extractParams(
            engineLanguage,
            compact(render(p)),
            classMap(np.name),
            jsonExtractor,
            formats)
        } catch {
          case e: Exception =>
            error(s"Unable to extract $field params $p")
            throw e
        }
      }.getOrElse(EmptyParams())

      (np.name, extractedParams)
    } getOrElse("", EmptyParams())
  }

  /** Grabs environment variables whose names start with 'PIO_'.
    */
  def pioEnvVars: Map[String, String] =
    sys.env.filter(kv => kv._1.startsWith("PIO_"))

  /** Converts Java (non-Scala) objects to a JSON4S JValue.
    *
    * @param params The Java object to be converted.
    */
  def javaObjectToJValue(params: AnyRef): JValue = parse(gson.toJson(params))

  // Extracts a debug string by recursively traversing the data.
  def debugString[D](data: D): String = {
    val s: String = data match {
      case rdd: RDD[_] => {
        debugString(rdd.collect())
      }
      case javaRdd: JavaRDDLike[_, _] => {
        debugString(javaRdd.collect())
      }
      case array: Array[_] => {
        "[" + array.map(debugString).mkString(",") + "]"
      }
      case d: AnyRef => {
        d.toString
      }
      case null => "null"
    }
    s
  }

  /** Detects third-party software configuration files to be submitted as
    * extras to Apache Spark. This makes sure all executors receive the same
    * configuration.
    */
  def thirdPartyConfFiles: Seq[String] = {
    val thirdPartyFiles = Map(
      "PIO_CONF_DIR" -> "log4j.properties",
      "ES_CONF_DIR" -> "elasticsearch.yml",
      "HADOOP_CONF_DIR" -> "core-site.xml",
      "HBASE_CONF_DIR" -> "hbase-site.xml")

    thirdPartyFiles.keys.toSeq.flatMap { k: String =>
      sys.env.get(k) map { x =>
        val p = Seq(x, thirdPartyFiles(k)).mkString(File.separator)
        if (new File(p).exists) Seq(p) else Seq[String]()
      } getOrElse Seq[String]()
    }
  }

  def thirdPartyClasspaths: Seq[String] = {
    val thirdPartyPaths = Seq(
      "PIO_CONF_DIR",
      "ES_CONF_DIR",
      "POSTGRES_JDBC_DRIVER",
      "MYSQL_JDBC_DRIVER",
      "HADOOP_CONF_DIR",
      "HBASE_CONF_DIR")
    thirdPartyPaths.flatMap(p =>
      sys.env.get(p).map(Seq(_)).getOrElse(Seq[String]())
    )
  }

  def thirdPartyJars: Seq[URI] = {
    val thirdPartyPaths = Seq(
      "POSTGRES_JDBC_DRIVER",
      "MYSQL_JDBC_DRIVER")
    thirdPartyPaths.flatMap(p =>
      sys.env.get(p) map { f =>
        val file = new File(f)
        if (file.exists()) {
          Seq(file.toURI)
        } else {
          warn(s"Environment variable $p is pointing to a nonexistent file $f. Ignoring.")
          Seq.empty
        }
      } getOrElse Seq.empty
    )
  }

  def modifyLogging(verbose: Boolean): Unit = {
    val rootLoggerLevel = if (verbose) Level.TRACE else Level.INFO
    val chattyLoggerLevel = if (verbose) Level.INFO else Level.WARN

    LogManager.getRootLogger.setLevel(rootLoggerLevel)

    LogManager.getLogger("org.elasticsearch").setLevel(chattyLoggerLevel)
    LogManager.getLogger("org.apache.hadoop").setLevel(chattyLoggerLevel)
    LogManager.getLogger("org.apache.spark").setLevel(chattyLoggerLevel)
    LogManager.getLogger("org.eclipse.jetty").setLevel(chattyLoggerLevel)
    LogManager.getLogger("akka").setLevel(chattyLoggerLevel)
  }

  def extractNameParams(jv: JValue): NameParams = {
    implicit val formats = Utils.json4sDefaultFormats
    val nameOpt = (jv \ "name").extract[Option[String]]
    val paramsOpt = (jv \ "params").extract[Option[JValue]]

    if (nameOpt.isEmpty && paramsOpt.isEmpty) {
      error("Unable to find 'name' or 'params' fields in" +
        s" ${compact(render(jv))}.\n" +
        "Since 0.8.4, the 'params' field is required in engine.json" +
        " in order to specify parameters for DataSource, Preparator or" +
        " Serving.\n" +
        "Please go to http://predictionio.apache.org/resources/upgrade/" +
        " for detailed instruction of how to change engine.json.")
      sys.exit(1)
    }

    if (nameOpt.isEmpty) {
      info(s"No 'name' is found. Default empty String will be used.")
    }

    if (paramsOpt.isEmpty) {
      info(s"No 'params' is found. Default EmptyParams will be used.")
    }

    NameParams(
      name = nameOpt.getOrElse(""),
      params = paramsOpt
    )
  }

  def extractSparkConf(root: JValue): List[(String, String)] = {
    def flatten(jv: JValue): List[(List[String], String)] = {
      jv match {
        case JObject(fields) =>
          for ((namePrefix, childJV) <- fields;
               (name, value) <- flatten(childJV))
          yield (namePrefix :: name) -> value
        case JArray(_) => {
          error("Arrays are not allowed in the sparkConf section of engine.json.")
          sys.exit(1)
        }
        case JNothing => Nil
        case _ => List(Nil -> jv.values.toString)
      }
    }

    flatten(root \ "sparkConf").map(x =>
      (x._1.reduce((a, b) => s"$a.$b"), x._2))
  }
}

case class NameParams(name: String, params: Option[JValue])

class NameParamsSerializer extends CustomSerializer[NameParams](format => ( {
  case jv: JValue => WorkflowUtils.extractNameParams(jv)
}, {
  case x: NameParams =>
    JObject(JField("name", JString(x.name)) ::
      JField("params", x.params.getOrElse(JNothing)) :: Nil)
}
))

/** Collection of reusable workflow related utilities that touch on Apache
  * Spark. They are separated to avoid compilation problems with certain code.
  */
object SparkWorkflowUtils extends Logging {
  def getPersistentModel[AP <: Params, M](
      pmm: PersistentModelManifest,
      runId: String,
      params: AP,
      sc: Option[SparkContext],
      cl: ClassLoader): M = {
    val runtimeMirror = universe.runtimeMirror(cl)
    val pmmModule = runtimeMirror.staticModule(pmm.className)
    val pmmObject = runtimeMirror.reflectModule(pmmModule)
    try {
      pmmObject.instance.asInstanceOf[PersistentModelLoader[AP, M]](
        runId,
        params,
        sc)
    } catch {
      case e @ (_: NoSuchFieldException | _: ClassNotFoundException) => try {
        val loadMethod = Class.forName(pmm.className).getMethod(
          "load",
          classOf[String],
          classOf[Params],
          classOf[SparkContext])
        loadMethod.invoke(null, runId, params, sc.orNull).asInstanceOf[M]
      } catch {
        case e: ClassNotFoundException =>
          error(s"Model class ${pmm.className} cannot be found.")
          throw e
        case e: NoSuchMethodException =>
          error(
            "The load(String, Params, SparkContext) method cannot be found.")
          throw e
      }
    }
  }
}

class WorkflowInterruption() extends Exception

case class StopAfterReadInterruption() extends WorkflowInterruption

case class StopAfterPrepareInterruption() extends WorkflowInterruption

object EngineLanguage extends Enumeration {
  val Scala, Java = Value
}

================================================
FILE: core/src/main/twirl/org/apache/predictionio/controller/metric_evaluator.scala.html
================================================

Metric Evaluator

Engine Params Evaluation Results

Click on table to view the engine params

    
  



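`WorkflowUtils.extractSparkConf` above turns the nested `sparkConf` object of `engine.json` into dotted Spark property names by recursively prefixing each leaf with the keys on its path. Below is a minimal, dependency-free sketch of that flattening, assuming nested `Map[String, Any]` values stand in for json4s `JObject`s; `SparkConfFlattenSketch` and its methods are hypothetical names, not part of PredictionIO. Unlike the original, it throws instead of calling `sys.exit(1)` when it meets an array, and it joins path segments with `mkString(".")` rather than `reduce`.

```scala
object SparkConfFlattenSketch {
  // Hypothetical stand-in for the json4s-based flatten: nested Maps play the role of JObject.
  def flatten(node: Any): List[(List[String], String)] = node match {
    case m: Map[_, _] =>
      // Prefix every leaf path found under each child with that child's key.
      m.toList.flatMap { case (key, child) =>
        flatten(child).map { case (path, value) => (key.toString :: path, value) }
      }
    case _: Seq[_] =>
      // The original logs an error and exits for JArray; the sketch throws instead.
      throw new IllegalArgumentException("Arrays are not allowed in the sparkConf section")
    case leaf =>
      // Leaves become strings, similar to jv.values.toString in the original.
      List((Nil, leaf.toString))
  }

  def extractSparkConf(root: Map[String, Any]): List[(String, String)] =
    root.get("sparkConf") match {
      case Some(conf) => flatten(conf).map { case (path, value) => (path.mkString("."), value) }
      case None       => Nil // mirrors `case JNothing => Nil`
    }
}
```

For example, `Map("sparkConf" -> Map("spark" -> Map("executor" -> Map("memory" -> "4g"))))` yields the single pair `("spark.executor.memory", "4g")`, which is the shape that `SparkConf.setAll` expects.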
================================================
FILE: core/src/main/twirl/org/apache/predictionio/workflow/index.scala.html
================================================
@import org.apache.predictionio.data.storage.EngineInstance
@import org.apache.predictionio.workflow.ServerConfig
@import org.joda.time.DateTime
@import org.joda.time.format.DateTimeFormat
@(args: ServerConfig,
  engineInstance: EngineInstance,
  algorithms: Seq[String],
  algorithmsParams: Seq[String],
  models: Seq[String],
  dataSourceParams: String,
  preparatorParams: String,
  servingParams: String,
  serverStartTime: DateTime,
  feedback: Boolean,
  eventServerIp: String,
  eventServerPort: Int,
  requestCount: Int,
  avgServingSec: Double,
  lastServingSec: Double
  )



  
    @{engineInstance.engineFactory} (@{engineInstance.engineVariant}) - PredictionIO Engine Server at @{args.ip}:@{args.port}
    
    
  
  
    

Engine Information

Training Start Time@{DateTimeFormat.forStyle("FF").print(engineInstance.startTime)}
Training End Time@{DateTimeFormat.forStyle("FF").print(engineInstance.endTime)}
Variant ID@{engineInstance.engineVariant}
Instance ID@{engineInstance.id}

Server Information

Start Time@{DateTimeFormat.forStyle("FF").print(serverStartTime)}
Request Count@{requestCount}
Average Serving Time@{f"${avgServingSec}%.4f"} seconds
Last Serving Time@{f"${lastServingSec}%.4f"} seconds
Engine Factory Class (Scala/Java)@{engineInstance.engineFactory}

Data Source

Parameters@{dataSourceParams}

Data Preparator

Parameters@{preparatorParams}

Algorithms and Models

@for((((algo, param), model), i) <- algorithms.zip(algorithmsParams).zip(models).zipWithIndex) { }
#Information
@{i + 1} Class@{algo}
Parameters@{param}
Model@{model}

Serving

Parameters@{servingParams}

Feedback Loop Information

Feedback Loop Enabled?@{feedback}
Event Server IP@{eventServerIp}
Event Server Port@{eventServerPort}
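The `@for` loop in `index.scala.html` above builds one table row per algorithm by chaining `algorithms.zip(algorithmsParams).zip(models).zipWithIndex` and destructuring each element as `(((algo, param), model), i)`, then prints `i + 1` as a 1-based row number. A minimal sketch of the same row construction in plain Scala follows; the `AlgorithmRowsSketch` object and its `Row` class are hypothetical illustrations, not PredictionIO types.

```scala
object AlgorithmRowsSketch {
  // Hypothetical row type holding what the template renders for one algorithm.
  final case class Row(position: Int, algo: String, param: String, model: String)

  def rows(
      algorithms: Seq[String],
      algorithmsParams: Seq[String],
      models: Seq[String]): Seq[Row] =
    // Same nested-tuple destructuring as the template: (((algo, param), model), i).
    for ((((algo, param), model), i) <-
           algorithms.zip(algorithmsParams).zip(models).zipWithIndex)
      yield Row(i + 1, algo, param, model) // i is 0-based; the page shows i + 1
}
```

Because `zip` stops at the shorter collection, a missing parameter or model entry silently drops rows instead of failing, so the three lists are expected to have the same length.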
================================================
FILE: core/src/test/java/org/apache/predictionio/workflow/JavaParams.java
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.workflow;

import org.apache.predictionio.controller.Params;

public class JavaParams implements Params {
  private final String p;

  public JavaParams(String p) {
    this.p = p;
  }

  public String getP() {
    return p;
  }
}

================================================
FILE: core/src/test/java/org/apache/predictionio/workflow/JavaQuery.java
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.workflow;

import java.io.Serializable;

public class JavaQuery implements Serializable {
  private final String q;

  public JavaQuery(String q) {
    this.q = q;
  }

  public String getQ() {
    return q;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    JavaQuery javaQuery = (JavaQuery) o;

    return !(q != null ? !q.equals(javaQuery.q) : javaQuery.q != null);
  }

  @Override
  public int hashCode() {
    return q != null ? q.hashCode() : 0;
  }
}

================================================
FILE: core/src/test/java/org/apache/predictionio/workflow/JavaQueryTypeAdapterFactory.java
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.workflow;

import com.google.gson.Gson;
import com.google.gson.TypeAdapter;
import com.google.gson.TypeAdapterFactory;
import com.google.gson.reflect.TypeToken;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonToken;
import com.google.gson.stream.JsonWriter;

import java.io.IOException;

public class JavaQueryTypeAdapterFactory implements TypeAdapterFactory {
  @Override
  @SuppressWarnings("unchecked")
  public <T> TypeAdapter<T> create(Gson gson, TypeToken<T> type) {
    if (type.getRawType().equals(JavaQuery.class)) {
      return (TypeAdapter<T>) new TypeAdapter<JavaQuery>() {
        public void write(JsonWriter out, JavaQuery value) throws IOException {
          if (value == null) {
            out.nullValue();
          } else {
            out.beginObject();
            out.name("q").value(value.getQ().toUpperCase());
            out.endObject();
          }
        }

        public JavaQuery read(JsonReader reader) throws IOException {
          if (reader.peek() == JsonToken.NULL) {
            reader.nextNull();
            return null;
          } else {
            reader.beginObject();
            reader.nextName();
            String q = reader.nextString();
            reader.endObject();
            return new JavaQuery(q.toUpperCase());
          }
        }
      };
    } else {
      return null;
    }
  }
}

================================================
FILE: core/src/test/scala/org/apache/predictionio/controller/EngineTest.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.controller

import org.apache.predictionio.workflow.PersistentModelManifest
import org.apache.predictionio.workflow.SharedSparkContext
import org.apache.predictionio.workflow.StopAfterPrepareInterruption
import org.apache.predictionio.workflow.StopAfterReadInterruption

import grizzled.slf4j.Logger
import org.apache.predictionio.workflow.WorkflowParams
import org.apache.spark.rdd.RDD
import org.scalatest.Inspectors._
import org.scalatest.Matchers._
import org.scalatest.FunSuite
import org.scalatest.Inside

import scala.util.Random

class EngineSuite extends FunSuite with Inside with SharedSparkContext {
  import org.apache.predictionio.controller.Engine0._
  @transient lazy val logger = Logger[this.type]

  test("Engine.train") {
    val engine = new Engine(
      classOf[PDataSource2],
      classOf[PPreparator1],
      Map("" -> classOf[PAlgo2]),
      classOf[LServing1])

    val engineParams = EngineParams(
      dataSourceParams = PDataSource2.Params(0),
      preparatorParams = PPreparator1.Params(1),
      algorithmParamsList = Seq(("", PAlgo2.Params(2))),
      servingParams = LServing1.Params(3))

    val models = engine.train(
      sc,
      engineParams,
      engineInstanceId = "",
      params = WorkflowParams())

    val pd = ProcessedData(1, TrainingData(0))

    // PAlgo2.Model does not implement the IPersistentModel trait, so the
    // model extracted after training is Unit.
models should contain theSameElementsAs Seq(()) } test("Engine.train persisting PAlgo.Model") { val engine = new Engine( classOf[PDataSource2], classOf[PPreparator1], Map( "PAlgo2" -> classOf[PAlgo2], "PAlgo3" -> classOf[PAlgo3] ), classOf[LServing1]) val engineParams = EngineParams( dataSourceParams = PDataSource2.Params(0), preparatorParams = PPreparator1.Params(1), algorithmParamsList = Seq( ("PAlgo2", PAlgo2.Params(2)), ("PAlgo3", PAlgo3.Params(21)), ("PAlgo3", PAlgo3.Params(22)) ), servingParams = LServing1.Params(3)) val pd = ProcessedData(1, TrainingData(0)) val model21 = PAlgo3.Model(21, pd) val model22 = PAlgo3.Model(22, pd) val models = engine.train( sc, engineParams, engineInstanceId = "", params = WorkflowParams()) val pModel21 = PersistentModelManifest(model21.getClass.getName) val pModel22 = PersistentModelManifest(model22.getClass.getName) models should contain theSameElementsAs Seq((), pModel21, pModel22) } test("Engine.train persisting LAlgo.Model") { val engine = Engine( classOf[LDataSource1], classOf[LPreparator1], Map( "LAlgo1" -> classOf[LAlgo1], "LAlgo2" -> classOf[LAlgo2], "LAlgo3" -> classOf[LAlgo3] ), classOf[LServing1]) val engineParams = EngineParams( dataSourceParams = LDataSource1.Params(0), preparatorParams = LPreparator1.Params(1), algorithmParamsList = Seq( ("LAlgo2", LAlgo2.Params(20)), ("LAlgo2", LAlgo2.Params(21)), ("LAlgo3", LAlgo3.Params(22))), servingParams = LServing1.Params(3)) val pd = ProcessedData(1, TrainingData(0)) val model20 = LAlgo2.Model(20, pd) val model21 = LAlgo2.Model(21, pd) val model22 = LAlgo3.Model(22, pd) //val models = engine.train(sc, engineParams, WorkflowParams()) val models = engine.train( sc, engineParams, engineInstanceId = "", params = WorkflowParams()) val pModel20 = PersistentModelManifest(model20.getClass.getName) val pModel21 = PersistentModelManifest(model21.getClass.getName) models should contain theSameElementsAs Seq(pModel20, pModel21, model22) } test("Engine.train persisting P&NAlgo.Model") 
{ val engine = new Engine( classOf[PDataSource2], classOf[PPreparator1], Map( "PAlgo2" -> classOf[PAlgo2], "PAlgo3" -> classOf[PAlgo3], "NAlgo2" -> classOf[NAlgo2], "NAlgo3" -> classOf[NAlgo3] ), classOf[LServing1]) val engineParams = EngineParams( dataSourceParams = PDataSource2.Params(0), preparatorParams = PPreparator1.Params(1), algorithmParamsList = Seq( ("PAlgo2", PAlgo2.Params(20)), ("PAlgo3", PAlgo3.Params(21)), ("PAlgo3", PAlgo3.Params(22)), ("NAlgo2", NAlgo2.Params(23)), ("NAlgo3", NAlgo3.Params(24)), ("NAlgo3", NAlgo3.Params(25)) ), servingParams = LServing1.Params(3)) val pd = ProcessedData(1, TrainingData(0)) val model21 = PAlgo3.Model(21, pd) val model22 = PAlgo3.Model(22, pd) val model23 = NAlgo2.Model(23, pd) val model24 = NAlgo3.Model(24, pd) val model25 = NAlgo3.Model(25, pd) //val models = engine.train(sc, engineParams, WorkflowParams()) val models = engine.train( sc, engineParams, engineInstanceId = "", params = WorkflowParams()) val pModel21 = PersistentModelManifest(model21.getClass.getName) val pModel22 = PersistentModelManifest(model22.getClass.getName) val pModel23 = PersistentModelManifest(model23.getClass.getName) models should contain theSameElementsAs Seq( (), pModel21, pModel22, pModel23, model24, model25) } test("Engine.eval") { val engine = new Engine( classOf[PDataSource2], classOf[PPreparator1], Map("" -> classOf[PAlgo2]), classOf[LServing1]) val qn = 10 val en = 3 val engineParams = EngineParams( dataSourceParams = PDataSource2.Params(id = 0, en = en, qn = qn), preparatorParams = PPreparator1.Params(1), algorithmParamsList = Seq(("", PAlgo2.Params(2))), servingParams = LServing1.Params(3)) val algoCount = engineParams.algorithmParamsList.size val pd = ProcessedData(1, TrainingData(0)) val model0 = PAlgo2.Model(2, pd) val evalDataSet = engine.eval(sc, engineParams, WorkflowParams()) evalDataSet should have size en forAll(evalDataSet.zipWithIndex) { case (evalData, ex) => { val (evalInfo, qpaRDD) = evalData evalInfo shouldBe 
EvalInfo(0) val qpaSeq: Seq[(Query, Prediction, Actual)] = qpaRDD.collect qpaSeq should have size qn forAll (qpaSeq) { case (q, p, a) => val Query(qId, qEx, qQx, _) = q val Actual(aId, aEx, aQx) = a qId shouldBe aId qEx shouldBe ex aEx shouldBe ex qQx shouldBe aQx inside (p) { case Prediction(pId, pQ, pModels, pPs) => { pId shouldBe 3 pQ shouldBe q pModels shouldBe None pPs should have size algoCount pPs shouldBe Seq( Prediction(id = 2, q = q, models = Some(model0))) }} } }} } test("Engine.prepareDeploy PAlgo") { val engine = new Engine( classOf[PDataSource2], classOf[PPreparator1], Map( "PAlgo2" -> classOf[PAlgo2], "PAlgo3" -> classOf[PAlgo3], "NAlgo2" -> classOf[NAlgo2], "NAlgo3" -> classOf[NAlgo3] ), classOf[LServing1]) val engineParams = EngineParams( dataSourceParams = PDataSource2.Params(0), preparatorParams = PPreparator1.Params(1), algorithmParamsList = Seq( ("PAlgo2", PAlgo2.Params(20)), ("PAlgo3", PAlgo3.Params(21)), ("PAlgo3", PAlgo3.Params(22)), ("NAlgo2", NAlgo2.Params(23)), ("NAlgo3", NAlgo3.Params(24)), ("NAlgo3", NAlgo3.Params(25)) ), servingParams = LServing1.Params(3)) val pd = ProcessedData(1, TrainingData(0)) val model20 = PAlgo2.Model(20, pd) val model21 = PAlgo3.Model(21, pd) val model22 = PAlgo3.Model(22, pd) val model23 = NAlgo2.Model(23, pd) val model24 = NAlgo3.Model(24, pd) val model25 = NAlgo3.Model(25, pd) val rand = new Random() val fakeEngineInstanceId = s"FakeInstanceId-${rand.nextLong()}" val persistedModels = engine.train( sc, engineParams, engineInstanceId = fakeEngineInstanceId, params = WorkflowParams() ) val deployableModels = engine.prepareDeploy( sc, engineParams, fakeEngineInstanceId, persistedModels, params = WorkflowParams() ) deployableModels should contain theSameElementsAs Seq( model20, model21, model22, model23, model24, model25) } } class EngineTrainSuite extends FunSuite with SharedSparkContext { import org.apache.predictionio.controller.Engine0._ val defaultWorkflowParams: WorkflowParams = WorkflowParams() 

  test("Parallel DS/P/Algos") {
    val models = Engine.train(
      sc,
      new PDataSource0(0),
      new PPreparator0(1),
      Seq(
        new PAlgo0(2),
        new PAlgo1(3),
        new PAlgo0(4)),
      defaultWorkflowParams
    )

    val pd = ProcessedData(1, TrainingData(0))

    models should contain theSameElementsAs Seq(
      PAlgo0.Model(2, pd),
      PAlgo1.Model(3, pd),
      PAlgo0.Model(4, pd))
  }

  test("Empty Algos Sequence") {
    val models = Engine.train(
      sc,
      new PDataSource0(0),
      new PPreparator0(1),
      Nil,
      defaultWorkflowParams
    )

    models should not be null
  }

  test("Null defaultWorkflowParams") {
    an [NullPointerException] should be thrownBy Engine.train(
      sc,
      new PDataSource0(0),
      new PPreparator0(1),
      Seq(
        new PAlgo0(2),
        new PAlgo1(3),
        new PAlgo0(4)),
      null
    )
  }

  test("Null Spark Context") {
    // Shouldn't we check whether the SparkContext is null?
    val models = Engine.train(
      null,
      new PDataSource0(0),
      new PPreparator0(1),
      Seq(
        new PAlgo0(2),
        new PAlgo1(3),
        new PAlgo0(4)),
      defaultWorkflowParams
    )

    val pd = ProcessedData(1, TrainingData(0))

    models should contain theSameElementsAs Seq(
      PAlgo0.Model(2, pd),
      PAlgo1.Model(3, pd),
      PAlgo0.Model(4, pd))
  }

  test("Null DataSource") {
    // Shouldn't we check whether the DataSource is null?
    an [NullPointerException] should be thrownBy Engine.train(
      sc,
      null,
      new PPreparator0(1),
      Seq(
        new PAlgo0(2),
        new PAlgo1(3),
        new PAlgo0(4)),
      defaultWorkflowParams
    )
  }

  test("Local DS/P/Algos") {
    val models = Engine.train(
      sc,
      new LDataSource0(0),
      new LPreparator0(1),
      Seq(
        new LAlgo0(2),
        new LAlgo1(3),
        new LAlgo0(4)),
      defaultWorkflowParams
    )

    val pd = ProcessedData(1, TrainingData(0))

    val expectedResults = Seq(
      LAlgo0.Model(2, pd),
      LAlgo1.Model(3, pd),
      LAlgo0.Model(4, pd))

    forAll(models.zip(expectedResults)) { case (model, expected) =>
      model shouldBe a [RDD[_]]
      val localModel = model.asInstanceOf[RDD[_]].collect
      localModel should contain theSameElementsAs Seq(expected)
    }
  }

  test("P2L DS/P/Algos") {
    val models = Engine.train(
      sc,
      new PDataSource0(0),
      new PPreparator0(1),
      Seq(
        new NAlgo0(2),
        new NAlgo1(3),
        new NAlgo0(4)),
      defaultWorkflowParams
    )

    val pd = ProcessedData(1, TrainingData(0))

    models should contain theSameElementsAs Seq(
      NAlgo0.Model(2, pd),
      NAlgo1.Model(3, pd),
      NAlgo0.Model(4, pd))
  }

  test("Parallel DS/P/Algos Stop-After-Read") {
    val workflowParams = defaultWorkflowParams.copy(
      stopAfterRead = true)

    an [StopAfterReadInterruption] should be thrownBy Engine.train(
      sc,
      new PDataSource0(0),
      new PPreparator0(1),
      Seq(
        new PAlgo0(2),
        new PAlgo1(3),
        new PAlgo0(4)),
      workflowParams
    )
  }

  test("Parallel DS/P/Algos Stop-After-Prepare") {
    val workflowParams = defaultWorkflowParams.copy(
      stopAfterPrepare = true)

    an [StopAfterPrepareInterruption] should be thrownBy Engine.train(
      sc,
      new PDataSource0(0),
      new PPreparator0(1),
      Seq(
        new PAlgo0(2),
        new PAlgo1(3),
        new PAlgo0(4)),
      workflowParams
    )
  }

  test("Parallel DS/P/Algos Dirty TrainingData") {
    val workflowParams = defaultWorkflowParams.copy(
      skipSanityCheck = false)

    an [AssertionError] should be thrownBy Engine.train(
      sc,
      new PDataSource3(0, error = true),
      new PPreparator0(1),
      Seq(
        new PAlgo0(2),
        new PAlgo1(3),
        new PAlgo0(4)),
      workflowParams
    )
  }

  test("Parallel DS/P/Algos Dirty TrainingData But Skip Check") {
    val workflowParams =
      defaultWorkflowParams.copy(
        skipSanityCheck = true)

    val models = Engine.train(
      sc,
      new PDataSource3(0, error = true),
      new PPreparator0(1),
      Seq(
        new PAlgo0(2),
        new PAlgo1(3),
        new PAlgo0(4)),
      workflowParams
    )

    val pd = ProcessedData(1, TrainingData(0, error = true))

    models should contain theSameElementsAs Seq(
      PAlgo0.Model(2, pd),
      PAlgo1.Model(3, pd),
      PAlgo0.Model(4, pd))
  }
}

class EngineEvalSuite
extends FunSuite with Inside with SharedSparkContext {
  import org.apache.predictionio.controller.Engine0._

  @transient lazy val logger = Logger[this.type]

  test("Simple Parallel DS/P/A/S") {
    val en = 2
    val qn = 5

    val evalDataSet: Seq[(EvalInfo, RDD[(Query, Prediction, Actual)])] =
    Engine.eval(
      sc,
      new PDataSource1(id = 1, en = en, qn = qn),
      new PPreparator0(id = 2),
      Seq(new PAlgo0(id = 3)),
      new LServing0(id = 10))

    val pd = ProcessedData(2, TrainingData(1))
    val model0 = PAlgo0.Model(3, pd)

    forAll(evalDataSet.zipWithIndex) { case (evalData, ex) => {
      val (evalInfo, qpaRDD) = evalData
      evalInfo shouldBe EvalInfo(1)

      val qpaSeq: Seq[(Query, Prediction, Actual)] = qpaRDD.collect
      forAll (qpaSeq) { case (q, p, a) =>
        val Query(qId, qEx, qQx, _) = q
        val Actual(aId, aEx, aQx) = a
        qId shouldBe aId
        qEx shouldBe ex
        aEx shouldBe ex
        qQx shouldBe aQx

        inside (p) { case Prediction(pId, pQ, pModels, pPs) => {
          pId shouldBe 10
          pQ shouldBe q
          pModels shouldBe None
          pPs should have size 1
          pPs shouldBe Seq(
            Prediction(id = 3, q = q, models = Some(model0)))
        }}
      }
    }}
  }

  test("Parallel DS/P/A/S") {
    val en = 2
    val qn = 5

    val evalDataSet: Seq[(EvalInfo, RDD[(Query, Prediction, Actual)])] =
    Engine.eval(
      sc,
      new PDataSource1(id = 1, en = en, qn = qn),
      new PPreparator0(id = 2),
      Seq(
        new PAlgo0(id = 3),
        new PAlgo1(id = 4),
        new NAlgo1(id = 5)),
      new LServing0(id = 10))

    val pd = ProcessedData(2, TrainingData(1))
    val model0 = PAlgo0.Model(3, pd)
    val model1 = PAlgo1.Model(4, pd)
    val model2 = NAlgo1.Model(5, pd)

    forAll(evalDataSet.zipWithIndex) { case (evalData, ex) => {
      val (evalInfo, qpaRDD) = evalData
      evalInfo shouldBe EvalInfo(1)

      val qpaSeq: Seq[(Query, Prediction, Actual)] = qpaRDD.collect
      forAll (qpaSeq) { case (q, p, a) =>
        val Query(qId, qEx, qQx, _) = q
        val Actual(aId, aEx, aQx) = a
        qId shouldBe aId
        qEx shouldBe ex
        aEx shouldBe ex
        qQx shouldBe aQx

        inside (p) { case Prediction(pId, pQ, pModels, pPs) => {
          pId shouldBe 10
          pQ shouldBe q
          pModels shouldBe None
          pPs should have size 3
          pPs shouldBe Seq(
            Prediction(id = 3, q = q, models = Some(model0)),
            Prediction(id = 4, q = q, models = Some(model1)),
            Prediction(id = 5, q = q, models = Some(model2))
          )
        }}
      }
    }}
  }

  test("Parallel DS/P/A/S with Supplemented Query") {
    val en = 2
    val qn = 5

    val evalDataSet: Seq[(EvalInfo, RDD[(Query, Prediction, Actual)])] =
    Engine.eval(
      sc,
      new PDataSource1(id = 1, en = en, qn = qn),
      new PPreparator0(id = 2),
      Seq(
        new PAlgo0(id = 3),
        new PAlgo1(id = 4),
        new NAlgo1(id = 5)),
      new LServing2(id = 10))

    val pd = ProcessedData(2, TrainingData(1))
    val model0 = PAlgo0.Model(3, pd)
    val model1 = PAlgo1.Model(4, pd)
    val model2 = NAlgo1.Model(5, pd)

    forAll(evalDataSet.zipWithIndex) { case (evalData, ex) => {
      val (evalInfo, qpaRDD) = evalData
      evalInfo shouldBe EvalInfo(1)

      val qpaSeq: Seq[(Query, Prediction, Actual)] = qpaRDD.collect
      forAll (qpaSeq) { case (q, p, a) =>
        val Query(qId, qEx, qQx, qSupp) = q
        val Actual(aId, aEx, aQx) = a
        qId shouldBe aId
        qEx shouldBe ex
        aEx shouldBe ex
        qQx shouldBe aQx
        qSupp shouldBe false

        inside (p) { case Prediction(pId, pQ, pModels, pPs) => {
          pId shouldBe 10
          pQ shouldBe q
          pModels shouldBe None
          pPs should have size 3
          // Queries inside the prediction should have supp set to true,
          // since they represent what the algorithms see.
          val qSupp = q.copy(supp = true)
          pPs shouldBe Seq(
            Prediction(id = 3, q = qSupp, models = Some(model0)),
            Prediction(id = 4, q = qSupp, models = Some(model1)),
            Prediction(id = 5, q = qSupp, models = Some(model2))
          )
        }}
      }
    }}
  }

  test("Local DS/P/A/S") {
    val en = 2
    val qn = 5

    val evalDataSet: Seq[(EvalInfo, RDD[(Query, Prediction, Actual)])] =
    Engine.eval(
      sc,
      new LDataSource0(id = 1, en = en, qn = qn),
      new LPreparator0(id = 2),
      Seq(
        new LAlgo0(id = 3),
        new LAlgo1(id = 4),
        new LAlgo1(id = 5)),
      new LServing0(id = 10))

    val pd = ProcessedData(2, TrainingData(1))
    val model0 = LAlgo0.Model(3, pd)
    val model1 = LAlgo1.Model(4, pd)
    val model2 = LAlgo1.Model(5, pd)

    forAll(evalDataSet.zipWithIndex) { case (evalData, ex) => {
      val (evalInfo, qpaRDD) = evalData
      evalInfo shouldBe EvalInfo(1)

      val qpaSeq: Seq[(Query, Prediction, Actual)] = qpaRDD.collect
      forAll (qpaSeq) { case (q, p, a) =>
        val Query(qId, qEx, qQx, _) = q
        val Actual(aId, aEx, aQx) = a
        qId shouldBe aId
        qEx shouldBe ex
        aEx shouldBe ex
        qQx shouldBe aQx

        inside (p) { case Prediction(pId, pQ, pModels, pPs) => {
          pId shouldBe 10
          pQ shouldBe q
          pModels shouldBe None
          pPs should have size 3
          pPs shouldBe Seq(
            Prediction(id = 3, q = q, models = Some(model0)),
            Prediction(id = 4, q = q, models = Some(model1)),
            Prediction(id = 5, q = q, models = Some(model2))
          )
        }}
      }
    }}
  }

}

================================================
FILE: core/src/test/scala/org/apache/predictionio/controller/EvaluationTest.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.controller

import org.apache.predictionio.workflow.{SharedSparkContext, SharedStorageContext}
import org.scalatest.FunSuite
import org.scalatest.Inside
import org.scalatest.Matchers._

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object EvaluationSuite {
  import org.apache.predictionio.controller.TestEvaluator._

  class Metric0 extends Metric[EvalInfo, Query, Prediction, Actual, Int] {
    def calculate(
      sc: SparkContext,
      evalDataSet: Seq[(EvalInfo, RDD[(Query, Prediction, Actual)])]): Int = 1
  }

  object Evaluation0 extends Evaluation {
    engineMetric = (new FakeEngine(1, 1, 1), new Metric0())
  }
}

class EvaluationSuite
extends FunSuite with Inside with SharedSparkContext with SharedStorageContext {
  import org.apache.predictionio.controller.EvaluationSuite._

  test("Evaluation makes MetricEvaluator") {
    // MetricEvaluator is typed [EvalInfo, Query, Prediction, Actual, Int];
    // however, this type information is erased on the JVM, so, as the
    // ScalaTest docs recommend, we match with wildcards.
    Evaluation0.evaluator shouldBe a [MetricEvaluator[_, _, _, _, _]]
  }

  test("Load from class path") {
    val r = org.apache.predictionio.workflow.WorkflowUtils.getEvaluation(
      "org.apache.predictionio.controller.EvaluationSuite.Evaluation0",
      getClass.getClassLoader)

    r._2 shouldBe EvaluationSuite.Evaluation0
  }

}

================================================
FILE: core/src/test/scala/org/apache/predictionio/controller/EvaluatorTest.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.controller

import org.apache.predictionio.core._
import org.apache.predictionio.workflow.WorkflowParams
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object TestEvaluator {
  case class EvalInfo(id: Int, ex: Int)
  case class Query(id: Int, ex: Int, qx: Int)
  case class Prediction(id: Int, ex: Int, qx: Int)
  case class Actual(id: Int, ex: Int, qx: Int)

  class FakeEngine(val id: Int, val en: Int, val qn: Int)
  extends BaseEngine[EvalInfo, Query, Prediction, Actual] {
    def train(
      sc: SparkContext,
      engineParams: EngineParams,
      instanceId: String = "",
      params: WorkflowParams = WorkflowParams()
    ): Seq[Any] = {
      Seq[Any]()
    }

    def eval(
      sc: SparkContext,
      engineParams: EngineParams,
      params: WorkflowParams)
    : Seq[(EvalInfo, RDD[(Query, Prediction, Actual)])] = {
      (0 until en).map { ex => {
        val qpas = (0 until qn).map { qx => {
          (Query(id, ex, qx), Prediction(id, ex, qx), Actual(id, ex, qx))
        }}

        (EvalInfo(id = id, ex = ex), sc.parallelize(qpas))
      }}
    }
  }

  /*
  class Evaluator0 extends Evaluator[EvalInfo, Query, Prediction, Actual,
      (Query, Prediction, Actual),
      (EvalInfo, Seq[(Query, Prediction, Actual)]),
      Seq[(EvalInfo, (EvalInfo, Seq[(Query, Prediction, Actual)]))]
      ] {
    def evaluateUnit(q: Query, p: Prediction, a: Actual)
    : (Query, Prediction, Actual) = (q, p, a)

    def evaluateSet(
        evalInfo: EvalInfo,
        eus: Seq[(Query, Prediction, Actual)])
    : (EvalInfo, Seq[(Query, Prediction, Actual)]) = (evalInfo, eus)

    def evaluateAll(
      input: Seq[(EvalInfo, (EvalInfo, Seq[(Query, Prediction, Actual)]))])
    = input
  }
  */

}

/*
class EvaluatorSuite
extends FunSuite with Inside with SharedSparkContext {
  import org.apache.predictionio.controller.TestEvaluator._
  @transient lazy val logger = Logger[this.type]

  test("Evaluator.evaluate") {
    val engine = new FakeEngine(1, 3, 10)
    val evaluator = new Evaluator0()

    val evalDataSet = engine.eval(sc, null.asInstanceOf[EngineParams])
    val er: Seq[(EvalInfo, (EvalInfo, Seq[(Query, Prediction, Actual)]))] =
      evaluator.evaluateBase(sc, evalDataSet)

    evalDataSet.zip(er).map { case (input, output) => {
      val (inputEvalInfo, inputQpaRDD) = input
      val (outputEvalInfo, (outputEvalInfo2, outputQpaSeq)) = output

      inputEvalInfo shouldBe outputEvalInfo
      inputEvalInfo shouldBe outputEvalInfo2

      val inputQpaSeq: Array[(Query, Prediction, Actual)] = inputQpaRDD.collect

      inputQpaSeq.size should be (outputQpaSeq.size)
      // TODO. match inputQpa and outputQpa content.
    }}
  }
}
*/

================================================
FILE: core/src/test/scala/org/apache/predictionio/controller/FastEvalEngineTest.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.controller

import org.apache.predictionio.workflow.WorkflowParams
import org.scalatest.FunSuite
import org.scalatest.Inside
import org.scalatest.Matchers._
import org.scalatest.Inspectors._

import org.apache.predictionio.workflow.SharedSparkContext

class FastEngineSuite
extends FunSuite with Inside with SharedSparkContext {
  import org.apache.predictionio.controller.Engine0._

  test("Single Evaluation") {
    val engine = new FastEvalEngine(
      Map("" -> classOf[PDataSource2]),
      Map("" -> classOf[PPreparator1]),
      Map(
        "PAlgo2" -> classOf[PAlgo2],
        "PAlgo3" -> classOf[PAlgo3]
      ),
      Map("" -> classOf[LServing1]))

    val qn = 10
    val en = 3

    val engineParams = EngineParams(
      dataSourceParams = PDataSource2.Params(id = 0, en = en, qn = qn),
      preparatorParams = PPreparator1.Params(1),
      algorithmParamsList = Seq(
        ("PAlgo2", PAlgo2.Params(20)),
        ("PAlgo2", PAlgo2.Params(21)),
        ("PAlgo3", PAlgo3.Params(22))
      ),
      servingParams = LServing1.Params(3))

    val algoCount = engineParams.algorithmParamsList.size
    val pd = ProcessedData(1, TrainingData(0))
    val model0 = PAlgo2.Model(20, pd)
    val model1 = PAlgo2.Model(21, pd)
    val model2 = PAlgo3.Model(22, pd)

    val evalDataSet = engine.eval(sc, engineParams, WorkflowParams())

    evalDataSet should have size en

    forAll(evalDataSet.zipWithIndex) { case (evalData, ex) => {
      val (evalInfo, qpaRDD) = evalData
      evalInfo shouldBe EvalInfo(0)

      val qpaSeq: Seq[(Query, Prediction, Actual)] = qpaRDD.collect

      qpaSeq should have size qn

      forAll (qpaSeq) { case (q, p, a) =>
        val Query(qId, qEx, qQx, _) = q
        val Actual(aId, aEx, aQx) = a
        qId shouldBe aId
        qEx shouldBe ex
        aEx shouldBe ex
        qQx shouldBe aQx

        inside (p) { case Prediction(pId, pQ, pModels, pPs) => {
          pId shouldBe 3
          pQ shouldBe q
          pModels shouldBe None
          pPs should have size algoCount
          pPs shouldBe Seq(
            Prediction(id = 20, q = q, models = Some(model0)),
            Prediction(id = 21, q = q, models = Some(model1)),
            Prediction(id = 22, q = q, models = Some(model2))
          )
        }}
      }
    }}
  }

  test("Batch Evaluation") {
    val engine = new FastEvalEngine(
      Map("" -> classOf[PDataSource2]),
      Map("" -> classOf[PPreparator1]),
      Map("" -> classOf[PAlgo2]),
      Map("" -> classOf[LServing1]))

    val qn = 10
    val en = 3

    val baseEngineParams = EngineParams(
      dataSourceParams = PDataSource2.Params(id = 0, en = en, qn = qn),
      preparatorParams = PPreparator1.Params(1),
      algorithmParamsList = Seq(("", PAlgo2.Params(2))),
      servingParams = LServing1.Params(3))

    val ep0 = baseEngineParams
    val ep1 = baseEngineParams.copy(
      algorithmParamsList = Seq(("", PAlgo2.Params(2))))
    val ep2 = baseEngineParams.copy(
      algorithmParamsList = Seq(("", PAlgo2.Params(20))))

    val engineEvalDataSet = engine.batchEval(
      sc,
      Seq(ep0, ep1, ep2),
      WorkflowParams())

    val evalDataSet0 = engineEvalDataSet(0)._2
    val evalDataSet1 = engineEvalDataSet(1)._2
    val evalDataSet2 = engineEvalDataSet(2)._2

    evalDataSet0 shouldBe evalDataSet1
    evalDataSet0 should not be evalDataSet2
    evalDataSet1 should not be evalDataSet2

    // evalDataSet0._1 should be theSameInstanceAs evalDataSet1._1
    // When caching works correctly, evalDataSet0 and evalDataSet1 should
    // share the same EI (EvalInfo) and QPA RDD instances.
    evalDataSet0.zip(evalDataSet1).foreach { case (e0, e1) => {
      e0._1 should be theSameInstanceAs e1._1
      e0._2 should be theSameInstanceAs e1._2
    }}

    // evalDataSet1 and evalDataSet2 also share the same EI; however, their
    // QPA RDDs should be different instances.
    evalDataSet1.zip(evalDataSet2).foreach { case (e1, e2) => {
      e1._1 should be theSameInstanceAs e2._1
      val e1Qpa = e1._2
      val e2Qpa = e2._2
      e1Qpa should not be theSameInstanceAs (e2Qpa)
    }}
  }

  test("Not cached when isEqual not implemented") {
    // PDataSource4.Params is a plain class, not a case class, so it does not
    // get structural equals/hashCode. Without overriding equals (and
    // hashCode), two Params instances with identical fields are not treated
    // as equal, so their results are not reused from the cache.
    val engine = new FastEvalEngine(
      Map("" -> classOf[PDataSource4]),
      Map("" -> classOf[PPreparator1]),
      Map("" -> classOf[PAlgo2]),
      Map("" -> classOf[LServing1]))

    val qn = 10
    val en = 3

    val baseEngineParams = EngineParams(
      dataSourceParams = new PDataSource4.Params(id = 0, en = en, qn = qn),
      preparatorParams = PPreparator1.Params(1),
      algorithmParamsList = Seq(("", PAlgo2.Params(2))),
      servingParams = LServing1.Params(3))

    val ep0 = baseEngineParams
    val ep1 = baseEngineParams.copy(
      algorithmParamsList = Seq(("", PAlgo2.Params(3))))
    // ep2's data source params are a different instance from ep0's.
    val ep2 = baseEngineParams.copy(
      dataSourceParams = ("", new PDataSource4.Params(id = 0, en = en, qn = qn)),
      algorithmParamsList = Seq(("", PAlgo2.Params(3))))

    val engineEvalDataSet = engine.batchEval(
      sc,
      Seq(ep0, ep1, ep2),
      WorkflowParams())

    val evalDataSet0 = engineEvalDataSet(0)._2
    val evalDataSet1 = engineEvalDataSet(1)._2
    val evalDataSet2 = engineEvalDataSet(2)._2

    evalDataSet0 should not be evalDataSet1
    evalDataSet0 should not be evalDataSet2
    evalDataSet1 should not be evalDataSet2

    // Set0 should have the same EI as Set1, since their dsp (data source
    // params) are the same instance.
    evalDataSet0.zip(evalDataSet1).foreach { case (e0, e1) => {
      e0._1 should be theSameInstanceAs (e1._1)
    }}

    // Set1 should have a different EI from Set2, since Set2's dsp is a
    // different instance.
    evalDataSet1.zip(evalDataSet2).foreach { case (e1, e2) => {
      e1._1 should not be theSameInstanceAs (e2._1)
    }}

  }
}

================================================
FILE: core/src/test/scala/org/apache/predictionio/controller/MetricEvaluatorTest.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.controller

import org.apache.predictionio.workflow.{SharedSparkContext, SharedStorageContext, WorkflowParams}
import org.scalatest.FunSuite

object MetricEvaluatorSuite {
  case class Metric0() extends SumMetric[EmptyParams, Int, Int, Int, Int] {
    def calculate(q: Int, p: Int, a: Int): Int = q
  }

  object Evaluation0 extends Evaluation {}
}

class MetricEvaluatorDevSuite
extends FunSuite with SharedSparkContext with SharedStorageContext {
  import org.apache.predictionio.controller.MetricEvaluatorSuite._

  test("a") {
    val metricEvaluator = MetricEvaluator(
      Metric0(),
      Seq(Metric0(), Metric0())
    )

    val engineEvalDataSet = Seq(
      (EngineParams(), Seq(
        (EmptyParams(), sc.parallelize(Seq((1,0,0), (2,0,0)))))),
      (EngineParams(), Seq(
        (EmptyParams(), sc.parallelize(Seq((1,0,0), (2,0,0)))))))

    val r = metricEvaluator.evaluateBase(
      sc, Evaluation0, engineEvalDataSet, WorkflowParams())
  }
}

================================================
FILE: core/src/test/scala/org/apache/predictionio/controller/MetricTest.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.controller

import org.apache.predictionio.workflow.SharedSparkContext

import grizzled.slf4j.Logger
import org.scalatest.Matchers._
import org.scalatest.FunSuite
import org.scalatest.Inside

object MetricDevSuite {
  class QIntSumMetric extends SumMetric[EmptyParams, Int, Int, Int, Int] {
    def calculate(q: Int, p: Int, a: Int): Int = q
  }

  class QDoubleSumMetric extends SumMetric[EmptyParams, Int, Int, Int, Double] {
    def calculate(q: Int, p: Int, a: Int): Double = q.toDouble
  }

  class QAverageMetric extends AverageMetric[EmptyParams, Int, Int, Int] {
    def calculate(q: Int, p: Int, a: Int): Double = q.toDouble
  }

  class QOptionAverageMetric extends OptionAverageMetric[EmptyParams, Int, Int, Int] {
    def calculate(q: Int, p: Int, a: Int): Option[Double] = {
      if (q < 0) { None } else { Some(q.toDouble) }
    }
  }

  class QStdevMetric extends StdevMetric[EmptyParams, Int, Int, Int] {
    def calculate(q: Int, p: Int, a: Int): Double = q.toDouble
  }

  class QOptionStdevMetric extends OptionStdevMetric[EmptyParams, Int, Int, Int] {
    def calculate(q: Int, p: Int, a: Int): Option[Double] = {
      if (q < 0) { None } else { Some(q.toDouble) }
    }
  }

}

class MetricDevSuite
extends FunSuite with Inside with SharedSparkContext {
  @transient lazy val logger = Logger[this.type]

  test("Average Metric") {
    val qpaSeq0 = Seq((1, 0, 0), (2, 0, 0), (3, 0, 0))
    val qpaSeq1 = Seq((4, 0, 0), (5, 0, 0), (6, 0, 0))

    val evalDataSet = Seq(
      (EmptyParams(), sc.parallelize(qpaSeq0)),
      (EmptyParams(), sc.parallelize(qpaSeq1)))

    val m = new MetricDevSuite.QAverageMetric()
    val result = m.calculate(sc, evalDataSet)

    result shouldBe (21.0 / 6)
  }

  test("Option Average Metric") {
    val qpaSeq0 = Seq((1, 0, 0), (2, 0, 0), (3, 0, 0))
    val qpaSeq1 = Seq((-4, 0, 0), (-5, 0, 0), (6, 0, 0))

    val evalDataSet = Seq(
      (EmptyParams(), sc.parallelize(qpaSeq0)),
      (EmptyParams(), sc.parallelize(qpaSeq1)))

    val m = new MetricDevSuite.QOptionAverageMetric()
    val result = m.calculate(sc, evalDataSet)

    result shouldBe (12.0 / 4)
  }

  test("Stdev Metric") {
    val qpaSeq0 = Seq((1, 0, 0), (1, 0, 0), (1, 0, 0), (1, 0, 0))
    val qpaSeq1 = Seq((5, 0, 0), (5, 0, 0), (5, 0, 0), (5, 0, 0))

    val evalDataSet = Seq(
      (EmptyParams(), sc.parallelize(qpaSeq0)),
      (EmptyParams(), sc.parallelize(qpaSeq1)))

    val m = new MetricDevSuite.QStdevMetric()
    val result = m.calculate(sc, evalDataSet)

    result shouldBe 2.0
  }

  test("Option Stdev Metric") {
    val qpaSeq0 = Seq((1, 0, 0), (1, 0, 0), (1, 0, 0), (1, 0, 0))
    val qpaSeq1 = Seq((5, 0, 0), (5, 0, 0), (5, 0, 0), (5, 0, 0), (-5, 0, 0))

    val evalDataSet = Seq(
      (EmptyParams(), sc.parallelize(qpaSeq0)),
      (EmptyParams(), sc.parallelize(qpaSeq1)))

    val m = new MetricDevSuite.QOptionStdevMetric()
    val result = m.calculate(sc, evalDataSet)

    result shouldBe 2.0
  }

  test("Sum Metric [Int]") {
    val qpaSeq0 = Seq((1, 0, 0), (2, 0, 0), (3, 0, 0))
    val qpaSeq1 = Seq((4, 0, 0), (5, 0, 0), (6, 0, 0))

    val evalDataSet = Seq(
      (EmptyParams(), sc.parallelize(qpaSeq0)),
      (EmptyParams(), sc.parallelize(qpaSeq1)))

    val m = new MetricDevSuite.QIntSumMetric()
    val result = m.calculate(sc, evalDataSet)

    result shouldBe 21
  }

  test("Sum Metric [Double]") {
    val qpaSeq0 = Seq((1, 0, 0), (2, 0, 0), (3, 0, 0))
    val qpaSeq1 = Seq((4, 0, 0), (5, 0, 0), (6, 0, 0))

    val evalDataSet = Seq(
      (EmptyParams(), sc.parallelize(qpaSeq0)),
      (EmptyParams(), sc.parallelize(qpaSeq1)))

    val m = new MetricDevSuite.QDoubleSumMetric()
    val result = m.calculate(sc, evalDataSet)

    result shouldBe 21.0
  }
}

================================================
FILE: core/src/test/scala/org/apache/predictionio/controller/SampleEngine.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.controller

import org.apache.predictionio.controller.{Params => PIOParams}
import org.apache.predictionio.core._

import grizzled.slf4j.Logger
import org.apache.predictionio.workflow.WorkflowParams
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

object Engine0 {
  @transient lazy val logger = Logger[this.type]

  case class TrainingData(id: Int, error: Boolean = false) extends SanityCheck {
    def sanityCheck(): Unit = {
      Predef.assert(!error, "Not Error")
    }
  }

  case class EvalInfo(id: Int)
  case class ProcessedData(id: Int, td: TrainingData)

  case class Query(id: Int, ex: Int = 0, qx: Int = 0, supp: Boolean = false)
  case class Actual(id: Int, ex: Int = 0, qx: Int = 0)
  case class Prediction(
    id: Int, q: Query, models: Option[Any] = None,
    ps: Seq[Prediction] = Seq[Prediction]())

  class PDataSource0(id: Int = 0)
  extends PDataSource[TrainingData, EvalInfo, Query, Actual] {
    def readTraining(sc: SparkContext): TrainingData = {
      TrainingData(id)
    }
  }

  class PDataSource1(id: Int = 0, en: Int = 0, qn: Int = 0)
  extends PDataSource[TrainingData, EvalInfo, Query, Actual] {
    def readTraining(sc: SparkContext): TrainingData = TrainingData(id)

    override def readEval(sc: SparkContext)
    : Seq[(TrainingData, EvalInfo, RDD[(Query, Actual)])] = {
      (0 until en).map { ex => {
        val qaSeq: Seq[(Query, Actual)] = (0 until qn).map { qx => {
          (Query(id, ex=ex, qx=qx), Actual(id, ex, qx))
        }}
        (TrainingData(id), EvalInfo(id), sc.parallelize(qaSeq))
      }}
    }
  }

  object PDataSource2 {
    case class Params(id: Int, en: Int = 0, qn: Int = 0) extends PIOParams
  }

  class PDataSource2(params: PDataSource2.Params)
  extends PDataSource[TrainingData, EvalInfo, Query, Actual] {
    val id = params.id
    def readTraining(sc: SparkContext): TrainingData = TrainingData(id)

    override def readEval(sc: SparkContext)
    : Seq[(TrainingData, EvalInfo, RDD[(Query, Actual)])] = {
      (0 until params.en).map { ex => {
        val qaSeq: Seq[(Query, Actual)] = (0 until params.qn).map { qx => {
          (Query(id, ex=ex, qx=qx), Actual(id, ex, qx))
        }}
        (TrainingData(id), EvalInfo(id), sc.parallelize(qaSeq))
      }}
    }
  }

  class PDataSource3(id: Int = 0, error: Boolean = false)
  extends PDataSource[TrainingData, EvalInfo, Query, Actual] {
    def readTraining(sc: SparkContext): TrainingData = {
      TrainingData(id = id, error = error)
    }
  }

  object PDataSource4 {
    class Params(val id: Int, val en: Int = 0, val qn: Int = 0)
      extends PIOParams
  }

  class PDataSource4(params: PDataSource4.Params)
  extends PDataSource[TrainingData, EvalInfo, Query, Actual] {
    val id = params.id
    def readTraining(sc: SparkContext): TrainingData = TrainingData(id)

    override def readEval(sc: SparkContext)
    : Seq[(TrainingData, EvalInfo, RDD[(Query, Actual)])] = {
      (0 until params.en).map { ex => {
        val qaSeq: Seq[(Query, Actual)] = (0 until params.qn).map { qx => {
          (Query(id, ex=ex, qx=qx), Actual(id, ex, qx))
        }}
        (TrainingData(id), EvalInfo(id), sc.parallelize(qaSeq))
      }}
    }
  }

  class LDataSource0(id: Int, en: Int = 0, qn: Int = 0)
  extends LDataSource[TrainingData, EvalInfo, Query, Actual] {
    def readTraining(): TrainingData = TrainingData(id)

    override def readEval()
    : Seq[(TrainingData, EvalInfo, Seq[(Query, Actual)])] = {
      (0 until en).map { ex => {
        val qaSeq: Seq[(Query, Actual)] = (0 until qn).map { qx => {
          (Query(id, ex=ex, qx=qx), Actual(id, ex, qx))
        }}
        (TrainingData(id), EvalInfo(id), qaSeq)
      }}
    }
  }

  object LDataSource1 {
    case class Params(id: Int, en: Int = 0, qn: Int = 0) extends PIOParams
  }

  class LDataSource1(params: LDataSource1.Params)
  extends LDataSource[TrainingData, EvalInfo, Query, Actual] {
    val id = params.id
    def readTraining(): TrainingData = TrainingData(id)

    override def readEval(): Seq[(TrainingData, EvalInfo, Seq[(Query, Actual)])] = {
      (0 until params.en).map { ex => {
        val qaSeq: Seq[(Query, Actual)] = (0 until params.qn).map { qx => {
          (Query(id, ex=ex, qx=qx), Actual(id, ex, qx))
        }}
        (TrainingData(id), EvalInfo(id), qaSeq)
      }}
    }
  }

  class PPreparator0(id: Int = 0)
  extends PPreparator[TrainingData, ProcessedData] {
    def prepare(sc: SparkContext, td: TrainingData): ProcessedData = {
      ProcessedData(id, td)
    }
  }

  object PPreparator1 {
    case class Params(id: Int = 0) extends PIOParams
  }

  class PPreparator1(params: PPreparator1.Params)
  extends PPreparator[TrainingData, ProcessedData] {
    def prepare(sc: SparkContext, td: TrainingData): ProcessedData = {
      ProcessedData(params.id, td)
    }
  }

  class LPreparator0(id: Int = 0)
  extends LPreparator[TrainingData, ProcessedData] {
    def prepare(td: TrainingData): ProcessedData = {
      ProcessedData(id, td)
    }
  }

  object LPreparator1 {
    case class Params(id: Int = 0) extends PIOParams
  }

  class LPreparator1(params: LPreparator1.Params)
  extends LPreparator[TrainingData, ProcessedData] {
    def prepare(td: TrainingData): ProcessedData = {
      ProcessedData(params.id, td)
    }
  }

  object PAlgo0 {
    case class Model(id: Int, pd: ProcessedData)
  }

  class PAlgo0(id: Int = 0)
  extends PAlgorithm[ProcessedData, PAlgo0.Model, Query, Prediction] {
    def train(sc: SparkContext, pd: ProcessedData)
    : PAlgo0.Model = PAlgo0.Model(id, pd)

    override def batchPredict(m: PAlgo0.Model, qs: RDD[(Long, Query)])
    : RDD[(Long, Prediction)] = {
      qs.mapValues(q => Prediction(id, q, Some(m)))
    }

    def predict(m: PAlgo0.Model, q: Query): Prediction = {
      Prediction(id, q, Some(m))
    }
  }

  object PAlgo1 {
    case class Model(id: Int, pd: ProcessedData)
  }

  class PAlgo1(id: Int = 0)
  extends PAlgorithm[ProcessedData, PAlgo1.Model, Query, Prediction] {
    def train(sc: SparkContext, pd: ProcessedData)
    : PAlgo1.Model = PAlgo1.Model(id, pd)

    override def batchPredict(m: PAlgo1.Model, qs: RDD[(Long, Query)])
    : RDD[(Long, Prediction)] = {
      qs.mapValues(q => Prediction(id, q, Some(m)))
    }

    def predict(m: PAlgo1.Model, q: Query): Prediction = {
      Prediction(id, q, Some(m))
    }
  }

  object PAlgo2 {
    case class Model(id: Int, pd: ProcessedData)
    case class Params(id: Int) extends PIOParams
  }

  class PAlgo2(params: PAlgo2.Params)
  extends PAlgorithm[ProcessedData, PAlgo2.Model, Query, Prediction] {
    val id = params.id

    def train(sc: SparkContext, pd: ProcessedData)
    : PAlgo2.Model = PAlgo2.Model(id, pd)

    override def batchPredict(m: PAlgo2.Model, qs: RDD[(Long, Query)])
    : RDD[(Long, Prediction)] = {
      qs.mapValues(q => Prediction(id, q, Some(m)))
    }

    def predict(m: PAlgo2.Model, q: Query): Prediction = {
      Prediction(id, q, Some(m))
    }
  }

  object PAlgo3 {
    case class Model(id: Int, pd: ProcessedData)
    extends LocalFileSystemPersistentModel[Params]

    object Model extends LocalFileSystemPersistentModelLoader[Params, Model]

    case class Params(id: Int) extends PIOParams
  }

  class PAlgo3(params: PAlgo3.Params)
  extends PAlgorithm[ProcessedData, PAlgo3.Model, Query, Prediction] {
    val id = params.id

    def train(sc: SparkContext, pd: ProcessedData)
    : PAlgo3.Model = PAlgo3.Model(id, pd)

    override def batchPredict(m: PAlgo3.Model, qs: RDD[(Long, Query)])
    : RDD[(Long, Prediction)] = {
      qs.mapValues(q => Prediction(id, q, Some(m)))
    }

    def predict(m: PAlgo3.Model, q: Query): Prediction = {
      Prediction(id, q, Some(m))
    }
  }

  object LAlgo0 {
    case class Model(id: Int, pd: ProcessedData)
  }

  class LAlgo0(id: Int = 0)
  extends LAlgorithm[ProcessedData, LAlgo0.Model, Query, Prediction] {
    def train(pd: ProcessedData): LAlgo0.Model = LAlgo0.Model(id, pd)

    def predict(m: LAlgo0.Model, q: Query): Prediction = {
      Prediction(id, q, Some(m))
    }
  }

  object LAlgo1 {
    case class Model(id: Int, pd: ProcessedData)
  }

  class LAlgo1(id: Int = 0)
  extends LAlgorithm[ProcessedData, LAlgo1.Model, Query, Prediction] {
    def train(pd: ProcessedData): LAlgo1.Model = LAlgo1.Model(id, pd)

    def predict(m: LAlgo1.Model, q: Query): Prediction = {
      Prediction(id, q, Some(m))
    }
  }

  object LAlgo2 {
    case class Params(id: Int) extends PIOParams

    case class Model(id: Int, pd: ProcessedData)
    extends LocalFileSystemPersistentModel[EmptyParams]

    object Model extends LocalFileSystemPersistentModelLoader[EmptyParams, Model]
  }

  class LAlgo2(params: LAlgo2.Params)
  extends LAlgorithm[ProcessedData, LAlgo2.Model, Query, Prediction] {
    def train(pd: ProcessedData): LAlgo2.Model = LAlgo2.Model(params.id, pd)

    def predict(m: LAlgo2.Model, q: Query): Prediction = {
      Prediction(params.id, q, Some(m))
    }
  }

  object LAlgo3 {
    case class Params(id: Int) extends PIOParams
    case class Model(id: Int, pd: ProcessedData)
  }

  class LAlgo3(params: LAlgo3.Params)
  extends LAlgorithm[ProcessedData, LAlgo3.Model, Query, Prediction] {
    def train(pd: ProcessedData): LAlgo3.Model = LAlgo3.Model(params.id, pd)

    def predict(m: LAlgo3.Model, q: Query): Prediction = {
      Prediction(params.id, q, Some(m))
    }
  }

  // N : P2L. As N is in the middle of P and L.
  object NAlgo0 {
    case class Model(id: Int, pd: ProcessedData)
  }

  class NAlgo0(id: Int = 0)
  extends P2LAlgorithm[ProcessedData, NAlgo0.Model, Query, Prediction] {
    def train(sc: SparkContext, pd: ProcessedData)
    : NAlgo0.Model = NAlgo0.Model(id, pd)

    def predict(m: NAlgo0.Model, q: Query): Prediction = {
      Prediction(id, q, Some(m))
    }
  }

  object NAlgo1 {
    case class Model(id: Int, pd: ProcessedData)
  }

  class NAlgo1(id: Int = 0)
  extends P2LAlgorithm[ProcessedData, NAlgo1.Model, Query, Prediction] {
    def train(sc: SparkContext, pd: ProcessedData)
    : NAlgo1.Model = NAlgo1.Model(id, pd)

    def predict(m: NAlgo1.Model, q: Query): Prediction = {
      Prediction(id, q, Some(m))
    }
  }

  object NAlgo2 {
    case class Params(id: Int) extends PIOParams

    case class Model(id: Int, pd: ProcessedData)
    extends LocalFileSystemPersistentModel[EmptyParams]

    object Model extends LocalFileSystemPersistentModelLoader[EmptyParams, Model]
  }

  class NAlgo2(params: NAlgo2.Params)
  extends P2LAlgorithm[ProcessedData, NAlgo2.Model, Query, Prediction] {
    def train(sc: SparkContext, pd: ProcessedData)
    : NAlgo2.Model = NAlgo2.Model(params.id, pd)

    def predict(m: NAlgo2.Model, q: Query): Prediction = {
      Prediction(params.id, q, Some(m))
    }
  }

  object NAlgo3 {
    case class Params(id: Int) extends PIOParams
    case class Model(id: Int, pd: ProcessedData)
  }

  class NAlgo3(params: NAlgo3.Params)
  extends P2LAlgorithm[ProcessedData, NAlgo3.Model, Query, Prediction] {
    def train(sc: SparkContext, pd: ProcessedData)
    : NAlgo3.Model = NAlgo3.Model(params.id, pd)

    def predict(m: NAlgo3.Model, q: Query): Prediction = {
      Prediction(params.id, q, Some(m))
    }
  }

  class LServing0(id: Int = 0) extends LServing[Query, Prediction] {
    def serve(q: Query, ps: Seq[Prediction]): Prediction = {
      Prediction(id, q, ps=ps)
    }
  }

  object LServing1 {
    case class Params(id: Int) extends PIOParams
  }

  class LServing1(params: LServing1.Params) extends LServing[Query, Prediction] {
    def serve(q: Query, ps: Seq[Prediction]): Prediction = {
      Prediction(params.id, q, ps=ps)
    }
  }

  class LServing2(id: Int) extends LServing[Query, Prediction] {
    override def supplement(q: Query): Query = q.copy(supp = true)

    def serve(q: Query, ps: Seq[Prediction]): Prediction = {
      Prediction(id, q, ps=ps)
    }
  }
}

object Engine1 {
  case class EvalInfo(v: Double) extends Serializable
  case class Query() extends Serializable
  case class Prediction() extends Serializable
  case class Actual() extends Serializable
  case class DSP(v: Double) extends Params
}

class Engine1
extends BaseEngine[
  Engine1.EvalInfo, Engine1.Query, Engine1.Prediction, Engine1.Actual] {

  def train(
    sc: SparkContext,
    engineParams: EngineParams,
    engineInstanceId: String = "",
    params: WorkflowParams = WorkflowParams()): Seq[Any] = Seq[Any]()

  def eval(sc: SparkContext, engineParams: EngineParams, params: WorkflowParams)
  : Seq[(Engine1.EvalInfo,
      RDD[(Engine1.Query, Engine1.Prediction, Engine1.Actual)])] = {
    val dsp = engineParams.dataSourceParams._2.asInstanceOf[Engine1.DSP]
    Seq(
      (Engine1.EvalInfo(dsp.v),
        sc.emptyRDD[(Engine1.Query, Engine1.Prediction, Engine1.Actual)]))
  }
}

class Metric0
extends Metric[Engine1.EvalInfo, Engine1.Query, Engine1.Prediction,
  Engine1.Actual, Double] {
  override def header: String = "Metric0"

  def calculate(
    sc: SparkContext,
    evalDataSet: Seq[(Engine1.EvalInfo, RDD[(Engine1.Query, Engine1.Prediction,
    Engine1.Actual)])]): Double = {
    evalDataSet.head._1.v
  }
}

object Metric1 {
  case class Result(c: Int, v: Double) extends Serializable
}

class Metric1
extends Metric[Engine1.EvalInfo, Engine1.Query, Engine1.Prediction,
  Engine1.Actual, Metric1.Result]()(Ordering.by[Metric1.Result, Double](_.v)) {
  override def header: String = "Metric1"

  def calculate(
    sc: SparkContext,
    evalDataSet: Seq[(Engine1.EvalInfo, RDD[(Engine1.Query, Engine1.Prediction,
    Engine1.Actual)])]): Metric1.Result = {
    Metric1.Result(0, evalDataSet.head._1.v)
  }
}

================================================
FILE: core/src/test/scala/org/apache/predictionio/core/SelfCleaningDataSourceTest.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.core

import org.apache.predictionio.core.SelfCleaningDataSource
import org.apache.predictionio.core.EventWindow
import org.apache.predictionio.workflow.SharedSparkContext
import org.apache.predictionio.controller.PDataSource
import org.apache.predictionio.controller.EmptyEvaluationInfo
import org.apache.predictionio.controller.EmptyActualResult
import org.apache.predictionio.controller.Params
import org.apache.predictionio.data.storage.Event
import org.apache.predictionio.data.storage.Storage
import org.apache.predictionio.data.store._

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.json4s._
import org.json4s.DefaultFormats
import org.apache.spark.rdd.RDD

import org.scalatest.Inspectors._
import org.scalatest.Matchers._
import org.scalatest.FunSuite
import org.scalatest.Inside

case class DataSourceParams(appName: String, eventWindow: Option[EventWindow], appId: Int)
  extends Params

class SelfCleaningPDataSource(anAppName: String)
  extends PDataSource[TrainingData, EmptyEvaluationInfo, Query, EmptyActualResult]
  with SelfCleaningDataSource {

  val (appId, channelId) =
    org.apache.predictionio.data.store.Common.appNameToId(anAppName, None)

  val dsp = DataSourceParams(anAppName,
    Some(EventWindow(Some("1825 days"), true, true)), appId = appId)

  override def appName = dsp.appName
  override def eventWindow = dsp.eventWindow

  override def readTraining(sc: SparkContext): TrainingData = new TrainingData()

  def events = Storage.getPEvents().find(appId = dsp.appId)_

  def itemEvents = Storage.getPEvents().find(appId = dsp.appId, entityType = Some("item"),
    eventNames = Some(Seq("$set")))_

  def eventsAgg = Storage.getPEvents().aggregateProperties(appId = dsp.appId,
    entityType = "item")_
}

class SelfCleaningDataSourceTest extends FunSuite with Inside with SharedSparkContext {

  // To run manually: requires an app named "cleanedTest" with test.json imported into it.
  ignore("Test event cleanup") {
    val source = new SelfCleaningPDataSource("cleanedTest")

    val eventsBeforeCount = source.events(sc).count
    val itemEventsBeforeCount = source.itemEvents(sc).count

    source.cleanPersistedPEvents(sc)

    val eventsAfterCount = source.events(sc).count
    val eventsAfter = source.events(sc)
    val itemEventsAfterCount = source.itemEvents(sc).count
    val distinctEventsAfterCount = eventsAfter.map(x =>
      CleanedDataSourceTest.stripIdAndCreationTimeFromEvents(x)).distinct.count

    val nexusSet = eventsAfter.filter(x =>
      x.event == "$set" && x.entityId == "Nexus").take(1)(0)

    implicit val formats = DefaultFormats

    nexusSet.properties.get[String]("available") should equal ("2016-03-18T13:31:49.016770+00:00")
    nexusSet.properties.get[JArray]("categories").values should equal (
      JArray(
        List(JString("Tablets"), JString("Electronics"), JString("Google"))).values)

    distinctEventsAfterCount should equal (eventsAfterCount)
    eventsBeforeCount should be > (eventsAfterCount)
    itemEventsBeforeCount should be > (itemEventsAfterCount)
    itemEventsAfterCount should be > 0L
  }
}

object CleanedDataSourceTest {
  def stripIdAndCreationTimeFromEvents(x: Event): Event = {
    Event(event = x.event, entityType = x.entityType, entityId =
x.entityId, targetEntityType = x.targetEntityType, targetEntityId = x.targetEntityId, properties = x.properties, eventTime = x.eventTime, tags = x.tags, prId= x.prId, creationTime = x.eventTime) } } case class Query() extends Serializable class TrainingData() extends Serializable ================================================ FILE: core/src/test/scala/org/apache/predictionio/core/test.json ================================================ {"eventId":"KpjNMVrQzY2s0TZhYB3vsAAAAVOFSkM1kLoZgQnOA1E","event":"$set","entityType":"item","entityId":"Nexus","properties":{"categories":["Tablets","Electronics","Google"]},"eventTime":"2016-03-17T15:55:49.941Z","creationTime":"2016-03-17T15:55:49.945Z"} {"event":"$set","entityType":"item","entityId":"Nexus","properties":{"categories":["Tablets","Electronics","Google"]},"eventTime":"2016-03-17T15:55:49.941Z","creationTime":"2016-03-17T15:55:49.945Z"} {"event":"$set","entityType":"item","entityId":"Nexus","properties":{"categories":["Tablets","Electronics","Google2"], "test": ["testA", "testB"]},"eventTime":"2006-03-17T15:54:49.941Z","creationTime":"2006-03-17T15:54:49.945Z"} {"eventId":"KpjNMVrQzY2s0TZhYB3vsAAAAVOFSkNogMMiTarDxQA","event":"$set","entityType":"item","entityId":"Nexus","properties":{"countries":["United States","Canada"]},"eventTime":"2016-03-17T15:55:49.992Z","creationTime":"2016-03-17T15:55:49.997Z"} {"eventId":"KpjNMVrQzY2s0TZhYB3vsAAAAVOFSkOdrr3SJaHTlQQ","event":"$set","entityType":"item","entityId":"Nexus","properties":{"available":"2016-03-14T13:31:49.016770+00:00","date":"2016-03-16T13:31:49.016770+00:00","expires":"2016-03-18T13:31:49.016770+00:00"},"eventTime":"2016-03-17T15:55:50.045Z","creationTime":"2016-03-17T15:55:50.049Z"} 
{"eventId":"KpjNMVrQzY2s0TZhYB3vsAAAAVOFSkOdrr3SJaHTlQQ","event":"$set","entityType":"item","entityId":"Nexus","properties":{"available":"2016-03-18T13:31:49.016770+00:00","date":"2016-03-16T13:31:49.016770+00:00","expires":"2016-03-18T13:31:49.016770+00:00"},"eventTime":"2016-03-18T15:55:50.045Z","creationTime":"2016-03-18T15:55:50.049Z"} {"eventId":"MdgNfySNSsz0WVh1q6f3_gAAAVOFSkNKjmJz4kil3F0","event":"$set","entityType":"item","entityId":"Surface","properties":{"categories":["Tablets","Electronics","Microsoft"]},"eventTime":"2016-03-17T15:55:49.962Z","creationTime":"2016-03-17T15:55:49.966Z"} {"eventId":"MdgNfySNSsz0WVh1q6f3_gAAAVOFSkN-lNLH6dbWhjI","event":"$set","entityType":"item","entityId":"Surface","properties":{"countries":["United States","Canada"]},"eventTime":"2016-03-17T15:55:50.014Z","creationTime":"2016-03-17T15:55:50.018Z"} {"eventId":"MdgNfySNSsz0WVh1q6f3_gAAAVOFSkOmhp8HSvY0l2M","event":"$set","entityType":"item","entityId":"Surface","properties":{"available":"2016-03-15T08:43:49.016770+00:00","date":"2016-03-17T08:43:49.016770+00:00","expires":"2016-03-19T08:43:49.016770+00:00"},"eventTime":"2016-03-17T15:55:50.054Z","creationTime":"2016-03-17T15:55:50.060Z"} {"eventId":"PxKvMIeTGaAvnzYFx0Il5AAAAVOFSkJTgLGAfdlk374","event":"purchase","entityType":"user","entityId":"U 2","targetEntityType":"item","targetEntityId":"Nexus","properties":{},"eventTime":"2016-03-17T15:55:49.715Z","creationTime":"2016-03-17T15:55:49.721Z"} {"eventId":"PxKvMIeTGaAvnzYFx0Il5AAAAVOFSkJflBwNuoxYZSk","event":"purchase","entityType":"user","entityId":"U 2","targetEntityType":"item","targetEntityId":"Galaxy","properties":{},"eventTime":"2016-03-17T15:55:49.727Z","creationTime":"2016-03-17T15:55:49.734Z"} {"eventId":"PxKvMIeTGaAvnzYFx0Il5AAAAVOFSkK0jq-hYskgTHQ","event":"view","entityType":"user","entityId":"U 2","targetEntityType":"item","targetEntityId":"Phones","properties":{},"eventTime":"2016-03-17T15:55:49.812Z","creationTime":"2016-03-17T15:55:49.816Z"} 
{"eventId":"PxKvMIeTGaAvnzYFx0Il5AAAAVOFSkLBme-oEd51kRc","event":"view","entityType":"user","entityId":"U 2","targetEntityType":"item","targetEntityId":"Tablets","properties":{},"eventTime":"2016-03-17T15:55:49.825Z","creationTime":"2016-03-17T15:55:49.830Z"} {"eventId":"PxKvMIeTGaAvnzYFx0Il5AAAAVOFSkLMtz_jVwJkrMo","event":"view","entityType":"user","entityId":"U 2","targetEntityType":"item","targetEntityId":"Mobile-acc","properties":{},"eventTime":"2016-03-17T15:55:49.836Z","creationTime":"2016-03-17T15:55:49.841Z"} {"eventId":"P0xK5wvjfKzdMwVGPH_MzgAAAVOFSkMXp9rxoAwBXfs","event":"$set","entityType":"item","entityId":"Iphone 5","properties":{"categories":["Phones","Electronics","Apple"]},"eventTime":"2016-03-17T15:55:49.911Z","creationTime":"2016-03-17T15:55:49.915Z"} {"eventId":"P0xK5wvjfKzdMwVGPH_MzgAAAVOFSkPHpu0QYdCNeC4","event":"$set","entityType":"item","entityId":"Iphone 5","properties":{"available":"2016-03-17T18:19:49.016770+00:00","date":"2016-03-19T18:19:49.016770+00:00","expires":"2016-03-21T18:19:49.016770+00:00"},"eventTime":"2016-03-17T15:55:50.087Z","creationTime":"2016-03-17T15:55:50.091Z"} {"eventId":"Rh2bjOtiPNen04BEL4hS1AAAAVOFSkJsqukj8XhK8UQ","event":"purchase","entityType":"user","entityId":"u-3","targetEntityType":"item","targetEntityId":"Surface","properties":{},"eventTime":"2016-03-17T15:55:49.740Z","creationTime":"2016-03-17T15:55:49.745Z"} {"eventId":"Rh2bjOtiPNen04BEL4hS1AAAAVOFSkLXofSwU6V_g8M","event":"view","entityType":"user","entityId":"u-3","targetEntityType":"item","targetEntityId":"Mobile-acc","properties":{},"eventTime":"2016-03-17T15:55:49.847Z","creationTime":"2016-03-17T15:55:49.851Z"} {"eventId":"Z0813DMQIKz7N4VGxZhmngAAAVOFSj-Yq3R7qgg6_Vk","event":"purchase","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"Iphone 6","properties":{},"eventTime":"2016-03-17T15:55:49.016Z","creationTime":"2016-03-17T15:55:49.401Z"} 
{"eventId":"Z0813DMQIKz7N4VGxZhmngAAAVOFSkIlpz1FtdazY3s","event":"purchase","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"Iphone 5","properties":{},"eventTime":"2016-03-17T15:55:49.669Z","creationTime":"2016-03-17T15:55:49.678Z"} {"eventId":"Z0813DMQIKz7N4VGxZhmngAAAVOFSkI0iPemXrYZZvo","event":"purchase","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"Iphone 4","properties":{},"eventTime":"2016-03-17T15:55:49.684Z","creationTime":"2016-03-17T15:55:49.693Z"} {"eventId":"Z0813DMQIKz7N4VGxZhmngAAAVOFSkJDiVnOzeypN7I","event":"purchase","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"Ipad-retina","properties":{},"eventTime":"2016-03-17T15:55:49.699Z","creationTime":"2016-03-17T15:55:49.707Z"} {"eventId":"Z0813DMQIKz7N4VGxZhmngAAAVOFSkKbpK-PGBIgfOI","event":"view","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"Phones","properties":{},"eventTime":"2016-03-17T15:55:49.787Z","creationTime":"2016-03-17T15:55:49.791Z"} {"eventId":"Z0813DMQIKz7N4VGxZhmngAAAVOFSkKptArC8-MR6bE","event":"view","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"Mobile-acc","properties":{},"eventTime":"2016-03-17T15:55:49.801Z","creationTime":"2016-03-17T15:55:49.806Z"} {"eventId":"dsw1LKGItnaOliG661FGeQAAAVOFSkJ2rb2DtAZ6Kc0","event":"purchase","entityType":"user","entityId":"u-4","targetEntityType":"item","targetEntityId":"Iphone 5","properties":{},"eventTime":"2016-03-17T15:55:49.750Z","creationTime":"2016-03-17T15:55:49.754Z"} {"eventId":"dsw1LKGItnaOliG661FGeQAAAVOFSkJ2rb2DtAZ6Kc0","event":"purchase","entityType":"user","entityId":"u-4","targetEntityType":"item","targetEntityId":"Iphone 5","properties":{},"eventTime":"2016-03-17T15:55:49.750Z","creationTime":"2016-03-17T15:55:49.754Z"} 
{"eventId":"dsw1LKGItnaOliG661FGeQAAAVOFSkKBmIoMOHYdSNc","event":"purchase","entityType":"user","entityId":"u-4","targetEntityType":"item","targetEntityId":"Iphone 4","properties":{},"eventTime":"2016-03-17T15:55:49.761Z","creationTime":"2016-03-17T15:55:49.769Z"} {"eventId":"dsw1LKGItnaOliG661FGeQAAAVOFSkKPlZrbJdSAuNo","event":"purchase","entityType":"user","entityId":"u-4","targetEntityType":"item","targetEntityId":"Galaxy","properties":{},"eventTime":"2016-03-17T15:55:49.775Z","creationTime":"2016-03-17T15:55:49.781Z"} {"eventId":"dsw1LKGItnaOliG661FGeQAAAVOFSkLhgVH2nSiQUk8","event":"view","entityType":"user","entityId":"u-4","targetEntityType":"item","targetEntityId":"Phones","properties":{},"eventTime":"2016-03-17T15:55:49.857Z","creationTime":"2016-03-17T15:55:49.862Z"} {"eventId":"dsw1LKGItnaOliG661FGeQAAAVOFSkLsv05zv25rTp8","event":"view","entityType":"user","entityId":"u-4","targetEntityType":"item","targetEntityId":"Tablets","properties":{},"eventTime":"2016-03-17T15:55:49.868Z","creationTime":"2016-03-17T15:55:49.872Z"} {"eventId":"dsw1LKGItnaOliG661FGeQAAAVOFSkL2lG__U2kPe1Y","event":"view","entityType":"user","entityId":"u-4","targetEntityType":"item","targetEntityId":"Soap","properties":{},"eventTime":"2016-03-17T15:55:49.878Z","creationTime":"2016-03-17T15:55:49.882Z"} {"eventId":"gmvnQ953Qb_tMUAzxNqgtQAAAVOFSkMKgheCJU2SSYI","event":"$set","entityType":"item","entityId":"Iphone 6","properties":{"categories":["Phones","Electronics","Apple"]},"eventTime":"2016-03-17T15:55:49.898Z","creationTime":"2016-03-17T15:55:49.903Z"} {"eventId":"gmvnQ953Qb_tMUAzxNqgtQAAAVOFSkOIh2BhgBjtKYU","event":"$set","entityType":"item","entityId":"Iphone 6","properties":{"available":"2016-03-12T23:07:49.016770+00:00","date":"2016-03-14T23:07:49.016770+00:00","expires":"2016-03-16T23:07:49.016770+00:00"},"eventTime":"2016-03-17T15:55:50.024Z","creationTime":"2016-03-17T15:55:50.028Z"} 
{"eventId":"pAabCfxStG8KscX91YcbQgAAAVOFSkMgnNUmSCOAk-k","event":"$set","entityType":"item","entityId":"Iphone 4","properties":{"categories":["Phones","Electronics","Apple"]},"eventTime":"2016-03-17T15:55:49.920Z","creationTime":"2016-03-17T15:55:49.925Z"} {"eventId":"pAabCfxStG8KscX91YcbQgAAAVOFSkNUieljPZ0N8Ks","event":"$set","entityType":"item","entityId":"Iphone 4","properties":{"countries":["United States","Canada","Estados Unidos Mexicanos"]},"eventTime":"2016-03-17T15:55:49.972Z","creationTime":"2016-03-17T15:55:49.976Z"} {"eventId":"pAabCfxStG8KscX91YcbQgAAAVOFSkOztX--kWmGKeg","event":"$set","entityType":"item","entityId":"Iphone 4","properties":{"available":"2016-03-16T03:55:49.016770+00:00","date":"2016-03-18T03:55:49.016770+00:00","expires":"2016-03-20T03:55:49.016770+00:00"},"eventTime":"2016-03-17T15:55:50.067Z","creationTime":"2016-03-17T15:55:50.071Z"} {"eventId":"7CvEfxvyU91u9adLcWdeDAAAAVOFSkMrhghB7z6eySU","event":"$set","entityType":"item","entityId":"Ipad-retina","properties":{"categories":["Tablets","Electronics","Apple"]},"eventTime":"2016-03-17T15:55:49.931Z","creationTime":"2016-03-17T15:55:49.935Z"} {"eventId":"7CvEfxvyU91u9adLcWdeDAAAAVOFSkNemo-5_T66338","event":"$set","entityType":"item","entityId":"Ipad-retina","properties":{"countries":["United States","Estados Unidos Mexicanos"]},"eventTime":"2016-03-17T15:55:49.982Z","creationTime":"2016-03-17T15:55:49.986Z"} {"eventId":"7CvEfxvyU91u9adLcWdeDAAAAVOFSkOSqPalKUCDgQI","event":"$set","entityType":"item","entityId":"Ipad-retina","properties":{"available":"2016-03-13T18:19:49.016770+00:00","date":"2016-03-15T18:19:49.016770+00:00","expires":"2016-03-17T18:19:49.016770+00:00"},"eventTime":"2016-03-17T15:55:50.034Z","creationTime":"2016-03-17T15:55:50.038Z"} 
{"eventId":"7QC1-7RtN0F-51rlq5irAgAAAVOFSkM_vOQlAmRTg3s","event":"$set","entityType":"item","entityId":"Galaxy","properties":{"categories":["Phones","Electronics","Samsung"]},"eventTime":"2016-03-17T15:55:49.951Z","creationTime":"2016-03-17T15:55:49.955Z"} {"eventId":"7QC1-7RtN0F-51rlq5irAgAAAVOFSkN0pUflT-SwR3w","event":"$set","entityType":"item","entityId":"Galaxy","properties":{"countries":["United States"]},"eventTime":"2016-03-17T15:55:50.004Z","creationTime":"2016-03-17T15:55:50.008Z"} {"eventId":"7QC1-7RtN0F-51rlq5irAgAAAVOFSkO9ko9YEaEplJs","event":"$set","entityType":"item","entityId":"Galaxy","properties":{"available":"2016-03-16T23:07:49.016770+00:00","date":"2016-03-18T23:07:49.016770+00:00","expires":"2016-03-20T23:07:49.016770+00:00"},"eventTime":"2016-03-17T15:55:50.077Z","creationTime":"2016-03-17T15:55:50.081Z"} {"eventId":"-ea0Iys05y2nvrM9WUmnwwAAAVOFSkL_k9xrJWi41qM","event":"view","entityType":"user","entityId":"u5","targetEntityType":"item","targetEntityId":"Soap","properties":{},"eventTime":"2016-03-17T15:55:49.887Z","creationTime":"2016-03-17T15:55:49.892Z"} {"eventId":"dsw1LKGItnaOliG661FGeQAAAVOFSkJ2rb2DtAZ6Kc0","event":"purchase","entityType":"user","entityId":"u-4","targetEntityType":"item","targetEntityId":"Iphone 5","properties":{},"eventTime":"1970-03-17T15:55:49.750Z","creationTime":"1970-03-17T15:55:49.754Z"} ================================================ FILE: core/src/test/scala/org/apache/predictionio/workflow/BaseTest.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// package org.apache.spark
package org.apache.predictionio.workflow

import _root_.io.netty.util.internal.logging.{InternalLoggerFactory, Slf4JLoggerFactory}
import org.apache.predictionio.data.storage.{EnvironmentFactory, EnvironmentService}
import org.scalatest.BeforeAndAfterAll
import org.scalatest.BeforeAndAfterEach
import org.scalatest.Suite
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.scalamock.scalatest.MockFactory

/** Manages a local `sc` [[SparkContext]] variable, correctly stopping it
  * after each test. */
trait LocalSparkContext
extends BeforeAndAfterEach with BeforeAndAfterAll { self: Suite =>

  @transient var sc: SparkContext = _

  override def beforeAll() {
    InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory())
    super.beforeAll()
  }

  override def afterEach() {
    resetSparkContext()
    super.afterEach()
  }

  def resetSparkContext(): Unit = {
    LocalSparkContext.stop(sc)
    sc = null
  }

}

object LocalSparkContext {
  def stop(sc: SparkContext) {
    if (sc != null) {
      sc.stop()
    }
    // To avoid Akka rebinding to the same port, since it doesn't unbind
    // immediately on shutdown
    System.clearProperty("spark.driver.port")
  }

  /** Runs `f` by passing in `sc` and ensures that `sc` is stopped. */
  def withSpark[T](sc: SparkContext)(f: SparkContext => T): Unit = {
    try {
      f(sc)
    } finally {
      stop(sc)
    }
  }

}

/** Shares a local `SparkContext` between all tests in a suite and closes it at the end */
trait SharedSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient private var _sc: SparkContext = _

  def sc: SparkContext = _sc

  var conf = new SparkConf(false)

  override def beforeAll() {
    _sc = new SparkContext("local[4]", "test", conf)
    super.beforeAll()
  }

  override def afterAll() {
    LocalSparkContext.stop(_sc)
    _sc = null
    super.afterAll()
  }
}

trait SharedStorageContext extends BeforeAndAfterAll { self: Suite =>

  override def beforeAll(): Unit = {
    ConfigurationMockUtil.createJDBCMockedConfig
    super.beforeAll()
  }

  override def afterAll(): Unit = {
    super.afterAll()
  }

}

object ConfigurationMockUtil extends MockFactory {

  def createJDBCMockedConfig: Unit = {
    val mockedEnvService = mock[EnvironmentService]

    (mockedEnvService.envKeys _)
      .expects
      .returning(List("PIO_STORAGE_REPOSITORIES_METADATA_NAME",
        "PIO_STORAGE_SOURCES_MYSQL_TYPE"))
      .twice

    (mockedEnvService.getByKey _)
      .expects("PIO_STORAGE_REPOSITORIES_METADATA_NAME")
      .returning("test_metadata")

    (mockedEnvService.getByKey _)
      .expects("PIO_STORAGE_REPOSITORIES_METADATA_SOURCE")
      .returning("MYSQL")

    (mockedEnvService.getByKey _)
      .expects("PIO_STORAGE_SOURCES_MYSQL_TYPE")
      .returning("jdbc")

    (mockedEnvService.filter _)
      .expects(*)
      .returning(Map(
        "URL" -> "jdbc:h2:~/test;MODE=MySQL;AUTO_SERVER=TRUE",
        "USERNAME" -> "sa",
        "PASSWORD" -> ""))

    EnvironmentFactory.environmentService = Some(mockedEnvService)
  }
}

================================================
FILE: core/src/test/scala/org/apache/predictionio/workflow/EngineWorkflowTest.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ ================================================ FILE: core/src/test/scala/org/apache/predictionio/workflow/EvaluationWorkflowTest.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.workflow import org.apache.predictionio.controller._ import org.scalamock.scalatest.MockFactory import org.scalatest.FunSuite import org.scalatest.Matchers._ class EvaluationWorkflowSuite extends FunSuite with SharedStorageContext with SharedSparkContext with MockFactory { test("Evaluation returns best engine params, simple result type: Double") { val engine = new Engine1() val ep0 = EngineParams(dataSourceParams = Engine1.DSP(0.2)) val ep1 = EngineParams(dataSourceParams = Engine1.DSP(0.3)) val ep2 = EngineParams(dataSourceParams = Engine1.DSP(0.3)) val ep3 = EngineParams(dataSourceParams = Engine1.DSP(-0.2)) val engineParamsList = Seq(ep0, ep1, ep2, ep3) val evaluator = MetricEvaluator(new Metric0()) object Eval extends Evaluation { engineEvaluator = (new Engine1(), MetricEvaluator(new Metric0())) } val result = EvaluationWorkflow.runEvaluation( sc, Eval, engine, engineParamsList, evaluator, WorkflowParams()) result.bestScore.score shouldBe 0.3 result.bestEngineParams shouldBe ep1 } test("Evaluation returns best engine params, complex result type") { val engine = new Engine1() val ep0 = EngineParams(dataSourceParams = Engine1.DSP(0.2)) val ep1 = EngineParams(dataSourceParams = Engine1.DSP(0.3)) val ep2 = EngineParams(dataSourceParams = Engine1.DSP(0.3)) val ep3 = EngineParams(dataSourceParams = Engine1.DSP(-0.2)) val engineParamsList = Seq(ep0, ep1, ep2, ep3) val evaluator = MetricEvaluator(new Metric1()) object Eval extends Evaluation { engineEvaluator = (new Engine1(), MetricEvaluator(new Metric1())) } val result = EvaluationWorkflow.runEvaluation( sc, Eval, engine, engineParamsList, evaluator, WorkflowParams()) result.bestScore.score shouldBe Metric1.Result(0, 0.3) result.bestEngineParams shouldBe ep1 } } ================================================ FILE: core/src/test/scala/org/apache/predictionio/workflow/JsonExtractorSuite.scala ================================================ /* * Licensed to the Apache Software
Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.workflow import org.apache.predictionio.controller.EngineParams import org.apache.predictionio.controller.Params import org.apache.predictionio.controller.Utils import org.json4s.CustomSerializer import org.json4s.JsonAST.JField import org.json4s.JsonAST.JObject import org.json4s.JsonAST.JString import org.json4s.MappingException import org.json4s.native.JsonMethods.compact import org.json4s.native.JsonMethods.render import org.scalatest.FunSuite import org.scalatest.Matchers class JsonExtractorSuite extends FunSuite with Matchers { test("Extract Scala object using option Json4sNative works with optional and default value " + "provided") { val json = """{"string": "query string", "optional": "optional string", "default": "d"}""" val query = JsonExtractor.extract( JsonExtractorOption.Json4sNative, json, classOf[ScalaQuery]) query should be (ScalaQuery("query string", Some("optional string"), "d")) } test("Extract Scala object using option Json4sNative works with no optional and no default " + "value provided") { val json = """{"string": "query string"}""" val query = JsonExtractor.extract( JsonExtractorOption.Json4sNative, json, classOf[ScalaQuery]) query should be (ScalaQuery("query string", None, 
"default")) } test("Extract Scala object using option Json4sNative works with null optional and null default" + " value") { val json = """{"string": "query string", "optional": null, "default": null}""" val query = JsonExtractor.extract( JsonExtractorOption.Json4sNative, json, classOf[ScalaQuery]) query should be (ScalaQuery("query string", None, "default")) } test("Extract Scala object using option Both works with optional and default value provided") { val json = """{"string": "query string", "optional": "optional string", "default": "d"}""" val query = JsonExtractor.extract( JsonExtractorOption.Both, json, classOf[ScalaQuery]) query should be (ScalaQuery("query string", Some("optional string"), "d")) } test("Extract Scala object using option Both works with no optional and no default value " + "provided") { val json = """{"string": "query string"}""" val query = JsonExtractor.extract( JsonExtractorOption.Both, json, classOf[ScalaQuery]) query should be (ScalaQuery("query string", None, "default")) } test("Extract Scala object using option Both works with null optional and null default value") { val json = """{"string": "query string", "optional": null, "default": null}""" val query = JsonExtractor.extract( JsonExtractorOption.Both, json, classOf[ScalaQuery]) query should be (ScalaQuery("query string", None, "default")) } test("Extract Scala object using option Gson should not get default value and optional none" + " value") { val json = """{"string": "query string"}""" val query = JsonExtractor.extract( JsonExtractorOption.Gson, json, classOf[ScalaQuery]) query should be (ScalaQuery("query string", null, null)) } test("Extract Scala object using option Gson should throw an exception with optional " + "value provided") { val json = """{"string": "query string", "optional": "o", "default": "d"}""" intercept[RuntimeException] { JsonExtractor.extract( JsonExtractorOption.Gson, json, classOf[ScalaQuery]) } } test("Extract Java object using
option Gson works") { val json = """{"q": "query string"}""" val query = JsonExtractor.extract( JsonExtractorOption.Gson, json, classOf[JavaQuery]) query should be (new JavaQuery("query string")) } test("Extract Java object using option Both works") { val json = """{"q": "query string"}""" val query = JsonExtractor.extract( JsonExtractorOption.Both, json, classOf[JavaQuery]) query should be (new JavaQuery("query string")) } test("Extract Java object using option Json4sNative should throw an exception") { val json = """{"q": "query string"}""" intercept[MappingException] { JsonExtractor.extract( JsonExtractorOption.Json4sNative, json, classOf[JavaQuery]) } } test("Extract Scala object using option Json4sNative with custom deserializer") { val json = """{"string": "query string", "optional": "o", "default": "d"}""" val query = JsonExtractor.extract( JsonExtractorOption.Json4sNative, json, classOf[ScalaQuery], Utils.json4sDefaultFormats + new UpperCaseFormat ) query should be(ScalaQuery("QUERY STRING", Some("O"), "D")) } test("Extract Java object using option Gson with custom deserializer") { val json = """{"q": "query string"}""" val query = JsonExtractor.extract( extractorOption = JsonExtractorOption.Gson, json = json, clazz = classOf[JavaQuery], gsonTypeAdapterFactories = Seq(new JavaQueryTypeAdapterFactory) ) query should be(new JavaQuery("QUERY STRING")) } test("Java object to JValue using option Both works") { val query = new JavaQuery("query string") val jValue = JsonExtractor.toJValue(JsonExtractorOption.Both, query) compact(render(jValue)) should be ("""{"q":"query string"}""") } test("Java object to JValue using option Gson works") { val query = new JavaQuery("query string") val jValue = JsonExtractor.toJValue(JsonExtractorOption.Gson, query) compact(render(jValue)) should be ("""{"q":"query string"}""") } test("Java object to JValue using option Json4sNative results in empty Json") { val query = new JavaQuery("query string") val jValue =
JsonExtractor.toJValue(JsonExtractorOption.Json4sNative, query) compact(render(jValue)) should be ("""{}""") } test("Scala object to JValue using option Both works") { val query = new ScalaQuery("query string", Some("option")) val jValue = JsonExtractor.toJValue(JsonExtractorOption.Both, query) compact(render(jValue)) should be ("""{"string":"query string","optional":"option","default":"default"}""") } test("Scala object to JValue using option Gson does not serialize optional") { val query = new ScalaQuery("query string", Some("option")) val jValue = JsonExtractor.toJValue(JsonExtractorOption.Gson, query) compact(render(jValue)) should be ("""{"string":"query string","optional":{},"default":"default"}""") } test("Scala object to JValue using option Json4sNative works") { val query = new ScalaQuery("query string", Some("option")) val jValue = JsonExtractor.toJValue(JsonExtractorOption.Json4sNative, query) compact(render(jValue)) should be ("""{"string":"query string","optional":"option","default":"default"}""") } test("Scala object to JValue using option Json4sNative with custom serializer") { val query = new ScalaQuery("query string", Some("option")) val jValue = JsonExtractor.toJValue( JsonExtractorOption.Json4sNative, query, Utils.json4sDefaultFormats + new UpperCaseFormat ) compact(render(jValue)) should be ("""{"string":"QUERY STRING","optional":"OPTION","default":"DEFAULT"}""") } test("Java object to JValue using option Gson with custom serializer") { val query = new JavaQuery("query string") val jValue = JsonExtractor.toJValue( extractorOption = JsonExtractorOption.Gson, o = query, gsonTypeAdapterFactories = Seq(new JavaQueryTypeAdapterFactory) ) compact(render(jValue)) should be ("""{"q":"QUERY STRING"}""") } test("Java Param to Json using option Both") { val param = ("algo", new JavaParams("parameter")) val json = JsonExtractor.paramToJson(JsonExtractorOption.Both, param) json should be ("""{"algo":{"p":"parameter"}}""") } test("Java Param to Json using 
option Gson") { val param = ("algo", new JavaParams("parameter")) val json = JsonExtractor.paramToJson(JsonExtractorOption.Gson, param) json should be ("""{"algo":{"p":"parameter"}}""") } test("Scala Param to Json using option Both") { val param = ("algo", AlgorithmParams("parameter")) val json = JsonExtractor.paramToJson(JsonExtractorOption.Both, param) json should be ("""{"algo":{"a":"parameter"}}""") } test("Scala Param to Json using option Json4sNative") { val param = ("algo", AlgorithmParams("parameter")) val json = JsonExtractor.paramToJson(JsonExtractorOption.Json4sNative, param) json should be ("""{"algo":{"a":"parameter"}}""") } test("Java Params to Json using option Both") { val params = Seq(("algo", new JavaParams("parameter")), ("algo2", new JavaParams("parameter2"))) val json = JsonExtractor.paramsToJson(JsonExtractorOption.Both, params) json should be ("""[{"algo":{"p":"parameter"}},{"algo2":{"p":"parameter2"}}]""") } test("Java Params to Json using option Gson") { val params = Seq(("algo", new JavaParams("parameter")), ("algo2", new JavaParams("parameter2"))) val json = JsonExtractor.paramsToJson(JsonExtractorOption.Gson, params) json should be ("""[{"algo":{"p":"parameter"}},{"algo2":{"p":"parameter2"}}]""") } test("Scala Params to Json using option Both") { val params = Seq(("algo", AlgorithmParams("parameter")), ("algo2", AlgorithmParams("parameter2"))) val json = JsonExtractor.paramsToJson(JsonExtractorOption.Both, params) json should be (org.json4s.native.Serialization.write(params)(Utils.json4sDefaultFormats)) } test("Scala Params to Json using option Json4sNative") { val params = Seq(("algo", AlgorithmParams("parameter")), ("algo2", AlgorithmParams("parameter2"))) val json = JsonExtractor.paramsToJson(JsonExtractorOption.Json4sNative, params) json should be (org.json4s.native.Serialization.write(params)(Utils.json4sDefaultFormats)) } test("Mixed Java and Scala Params to Json using option Both") { val params = Seq(("scala", 
AlgorithmParams("parameter")), ("java", new JavaParams("parameter2"))) val json = JsonExtractor.paramsToJson(JsonExtractorOption.Both, params) json should be ("""[{"scala":{"a":"parameter"}},{"java":{"p":"parameter2"}}]""") } test("Serializing Scala EngineParams works using option Json4sNative") { val ep = new EngineParams( dataSourceParams = ("ds", DataSourceParams("dsp")), algorithmParamsList = Seq(("a0", AlgorithmParams("ap")))) val json = JsonExtractor.engineParamsToJson(JsonExtractorOption.Json4sNative, ep) json should be ( """{"dataSourceParams":{"ds":{"a":"dsp"}},"preparatorParams":{"":{}},""" + """"algorithmParamsList":[{"a0":{"a":"ap"}}],"servingParams":{"":{}}}""") } test("Serializing Java EngineParams works using option Gson") { val ep = new EngineParams( dataSourceParams = ("ds", new JavaParams("dsp")), algorithmParamsList = Seq(("a0", new JavaParams("ap")), ("a1", new JavaParams("ap2")))) val json = JsonExtractor.engineParamsToJson(JsonExtractorOption.Gson, ep) json should be ( """{"dataSourceParams":{"ds":{"p":"dsp"}},"preparatorParams":{"":{}},""" + """"algorithmParamsList":[{"a0":{"p":"ap"}},{"a1":{"p":"ap2"}}],"servingParams":{"":{}}}""") } test("Serializing Java EngineParams works using option Both") { val ep = new EngineParams( dataSourceParams = ("ds", new JavaParams("dsp")), algorithmParamsList = Seq(("a0", new JavaParams("ap")), ("a1", new JavaParams("ap2")))) val json = JsonExtractor.engineParamsToJson(JsonExtractorOption.Both, ep) json should be ( """{"dataSourceParams":{"ds":{"p":"dsp"}},"preparatorParams":{"":{}},""" + """"algorithmParamsList":[{"a0":{"p":"ap"}},{"a1":{"p":"ap2"}}],"servingParams":{"":{}}}""") } } private case class AlgorithmParams(a: String) extends Params private case class DataSourceParams(a: String) extends Params private case class ScalaQuery(string: String, optional: Option[String], default: String = "default") private class UpperCaseFormat extends CustomSerializer[ScalaQuery](format => ( { case 
JObject(JField("string", JString(string)) :: JField("optional", JString(optional)) :: JField("default", JString(default)) :: Nil) => ScalaQuery(string.toUpperCase, Some(optional.toUpperCase), default.toUpperCase) }, { case x: ScalaQuery => JObject( JField("string", JString(x.string.toUpperCase)), JField("optional", JString(x.optional.get.toUpperCase)), JField("default", JString(x.default.toUpperCase))) })) ================================================ FILE: data/README.md ================================================ ## Data Collection API Please refer to the documentation site - [Collecting Data through REST/SDKs](http://predictionio.apache.org/datacollection/eventapi/). ## For Development Use only: ### Start Data API without bin/pio ``` $ sbt/sbt "data/compile" $ set -a $ source conf/pio-env.sh $ set +a $ sbt/sbt "data/run-main org.apache.predictionio.data.api.Run" ``` ### Very simple test ``` $ data/test.sh ``` ### Unit test (Very minimal) ``` $ set -a $ source conf/pio-env.sh $ set +a $ sbt/sbt "data/test" ``` - test for EventService ``` $ sbt/sbt "data/test-only org.apache.predictionio.data.api.EventServiceSpec" ``` - test for LEvents ``` $ sbt/sbt "data/test-only org.apache.predictionio.data.storage.LEventsSpec" ``` - test for ExampleJson and ExampleForm webhooks ``` $ sbt/sbt "data/test-only org.apache.predictionio.data.webhooks.examplejson.ExampleJsonConnectorSpec" $ sbt/sbt "data/test-only org.apache.predictionio.data.webhooks.exampleform.ExampleFormConnectorSpec" ``` ### Upgrade from 0.8.0/0.8.1 to 0.8.2 Experimental upgrade tool (Upgrade HBase schema from 0.8.0/0.8.1 to 0.8.2) Create an app to store the data: ``` $ bin/pio app new ``` Replace by the returned app ID: ( is the original app ID used in 0.8.0/0.8.2.) ``` $ set -a $ source conf/pio-env.sh $ set +a $ sbt/sbt "data/run-main org.apache.predictionio.data.storage.hbase.upgrade.Upgrade " "" ``` ### Upgrade from 0.8.2 to 0.8.3 0.8.3 disallows the entity types `pio_user` and `pio_item`.
These types are used by default in most SDKs. Their use is deprecated as of 0.8.3, and the SDK helper functions use `user` and `item`, respectively, instead. This script performs the migration by copying the data of one appId to another. You can then either point the engine to the new appId or migrate the data back to the old one using the HBase import/export tools. Suppose we are migrating ``. #### 1. First create a new app: ``` $ set -a $ source conf/pio-env.sh $ set +a $ bin/pio app new NewApp ... you will see ``` The app with `` must be empty before you upgrade. You can check the status of this newly created app using: ``` $ sbt/sbt "data/run-main org.apache.predictionio.data.storage.hbase.upgrade.CheckDistribution " ``` If it shows that the app is non-empty, you can clean it with: ``` $ bin/pio app data-delete ``` #### 2. Run the following to migrate from to ``` $ sbt/sbt "data/run-main org.apache.predictionio.data.storage.hbase.upgrade.Upgrade_0_8_3 " ... Done. ``` You can use the following to check the again. It should now display the amount of data that was migrated: ``` $ sbt/sbt "data/run-main org.apache.predictionio.data.storage.hbase.upgrade.CheckDistribution " ``` ================================================ FILE: data/build.sbt ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * limitations under the License. */ import PIOBuild._ name := "apache-predictionio-data" libraryDependencies ++= Seq( "org.scala-lang" % "scala-reflect" % scalaVersion.value, "com.github.nscala-time" %% "nscala-time" % "2.6.0", "com.google.guava" % "guava" % "14.0.1", "com.typesafe.akka" %% "akka-http-testkit" % "10.1.5" % "test", "org.apache.spark" %% "spark-sql" % sparkVersion.value % "provided", "org.clapper" %% "grizzled-slf4j" % "1.0.2", "org.scalatest" %% "scalatest" % "2.1.7" % "test", "org.specs2" %% "specs2" % "3.3.1" % "test" exclude("org.scalaz.stream", s"scalaz-stream_${scalaBinaryVersion.value}"), "org.scalamock" %% "scalamock-specs2-support" % "3.5.0" % "test", "com.h2database" % "h2" % "1.4.196" % "test") parallelExecution in Test := false pomExtra := childrenPomExtra.value ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/Utils.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data import org.joda.time.DateTime import org.joda.time.format.ISODateTimeFormat import java.lang.IllegalArgumentException private[predictionio] object Utils { // use dateTime() for strict ISO8601 format val dateTimeFormatter = ISODateTimeFormat.dateTime().withOffsetParsed() val dateTimeNoMillisFormatter = ISODateTimeFormat.dateTimeNoMillis().withOffsetParsed() def stringToDateTime(dt: String): DateTime = { // We accept two formats. // 1. "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" // 2. "yyyy-MM-dd'T'HH:mm:ssZZ" // The first one also takes milliseconds into account. try { // formatting for "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" dateTimeFormatter.parseDateTime(dt) } catch { case e: IllegalArgumentException => { // handle when the datetime string doesn't specify milliseconds. dateTimeNoMillisFormatter.parseDateTime(dt) } } } def dateTimeToString(dt: DateTime): String = dateTimeFormatter.print(dt) // dt.toString } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/api/Common.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.api import akka.http.scaladsl.server._ import org.apache.predictionio.data.storage.StorageException import org.apache.predictionio.data.webhooks.ConnectorException import org.json4s.{DefaultFormats, Formats} import akka.http.scaladsl.model._ import akka.http.scaladsl.server.Directives._ import org.apache.predictionio.akkahttpjson4s.Json4sSupport._ object Common { object Json4sProtocol { implicit val serialization = org.json4s.native.Serialization implicit def json4sFormats: Formats = DefaultFormats } import Json4sProtocol._ val exceptionHandler = ExceptionHandler { case e: ConnectorException => { complete(StatusCodes.BadRequest, Map("message" -> s"${e.getMessage()}")) } case e: StorageException => { complete(StatusCodes.InternalServerError, Map("message" -> s"${e.getMessage()}")) } case e: Exception => { complete(StatusCodes.InternalServerError, Map("message" -> s"${e.getMessage()}")) } } val rejectionHandler = RejectionHandler.newBuilder().handle { case MalformedRequestContentRejection(msg, _) => complete(StatusCodes.BadRequest, Map("message" -> msg)) case MissingQueryParamRejection(msg) => complete(StatusCodes.NotFound, Map("message" -> s"missing required query parameter ${msg}.")) case AuthenticationFailedRejection(cause, challengeHeaders) => { val msg = cause match { case AuthenticationFailedRejection.CredentialsRejected => "Invalid accessKey." case AuthenticationFailedRejection.CredentialsMissing => "Missing accessKey." 
} complete(StatusCodes.Unauthorized, Map("message" -> msg)) } case ChannelRejection(msg) => complete(StatusCodes.Unauthorized, Map("message" -> msg)) }.result() } /** invalid channel */ case class ChannelRejection(msg: String) extends Rejection ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/api/EventInfo.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.api import org.apache.predictionio.data.storage.Event case class EventInfo( appId: Int, channelId: Option[Int], event: Event) ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/api/EventServer.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.api import akka.event.{Logging, LoggingAdapter} import sun.misc.BASE64Decoder import java.util.concurrent.TimeUnit import akka.actor._ import akka.http.scaladsl.Http import akka.http.scaladsl.model.{FormData, HttpEntity, HttpResponse, StatusCodes} import akka.http.scaladsl.model.ContentTypes._ import akka.http.scaladsl.model.headers.HttpChallenge import akka.http.scaladsl.server.Directives.complete import akka.http.scaladsl.server.directives._ import akka.http.scaladsl.server._ import akka.pattern.ask import akka.util.Timeout import akka.http.scaladsl.server.Directives._ import akka.stream.ActorMaterializer import org.apache.predictionio.data.storage._ import org.apache.predictionio.akkahttpjson4s.Json4sSupport._ import org.json4s.{DefaultFormats, Formats, JObject} import scala.concurrent._ import scala.concurrent.duration.Duration import scala.util.{Failure, Success, Try} object Json4sProtocol { implicit val serialization = org.json4s.native.Serialization implicit def json4sFormats: Formats = DefaultFormats + new EventJson4sSupport.APISerializer + new BatchEventsJson4sSupport.APISerializer + // NOTE: don't use Json4s JodaTimeSerializers since it has issues, // some format not converted, or timezone not correct new DateTimeJson4sSupport.Serializer } case class EventServerConfig( ip: String = "localhost", port: Int = 7070, plugins: String = "plugins", stats: Boolean = false) object EventServer { import Json4sProtocol._ import FutureDirectives._ import Common._ private val MaxNumberOfEventsPerBatchRequest = 50 private lazy val 
base64Decoder = new BASE64Decoder private implicit val timeout = Timeout(5, TimeUnit.SECONDS) private case class AuthData(appId: Int, channelId: Option[Int], events: Seq[String]) private def FailedAuth[T]: Either[Rejection, T] = Left( AuthenticationFailedRejection( AuthenticationFailedRejection.CredentialsRejected, HttpChallenge("eventserver", None) ) ) private def MissedAuth[T]: Either[Rejection, T] = Left( AuthenticationFailedRejection( AuthenticationFailedRejection.CredentialsMissing, HttpChallenge("eventserver", None) ) ) def createRoute(eventClient: LEvents, accessKeysClient: AccessKeys, channelsClient: Channels, logger: LoggingAdapter, statsActorRef: ActorSelection, pluginsActorRef: ActorSelection, config: EventServerConfig)(implicit executionContext: ExecutionContext): Route = { /* with accessKey in query/header, return appId if authentication succeeds */ def withAccessKey: RequestContext => Future[Either[Rejection, AuthData]] = { ctx: RequestContext => val accessKeyParamOpt = ctx.request.uri.query().get("accessKey") val channelParamOpt = ctx.request.uri.query().get("channel") Future { // with accessKey in query, return appId if authentication succeeds accessKeyParamOpt.map { accessKeyParam => accessKeysClient.get(accessKeyParam).map { k => channelParamOpt.map { ch => val channelMap = channelsClient.getByAppid(k.appid) .map(c => (c.name, c.id)).toMap if (channelMap.contains(ch)) { Right(AuthData(k.appid, Some(channelMap(ch)), k.events)) } else { Left(ChannelRejection(s"Invalid channel '$ch'.")) } }.getOrElse{ Right(AuthData(k.appid, None, k.events)) } }.getOrElse(FailedAuth) }.getOrElse { // with accessKey in header, return appId if authentication succeeds ctx.request.headers.find(_.name == "Authorization").map { authHeader => authHeader.value.split("Basic ") match { case Array(_, value) => val appAccessKey = new String(base64Decoder.decodeBuffer(value)).trim.split(":")(0) accessKeysClient.get(appAccessKey) match { case Some(k) => Right(AuthData(k.appid, None, k.events)) case None => FailedAuth } case _
=> FailedAuth } }.getOrElse(MissedAuth) } } } def authenticate[T](authenticator: RequestContext => Future[Either[Rejection, T]]): AuthenticationDirective[T] = { handleRejections(rejectionHandler).tflatMap { _ => extractRequestContext.flatMap { requestContext => onSuccess(authenticator(requestContext)).flatMap { case Right(x) => provide(x) case Left(x) => reject(x): Directive1[T] } } } } val pluginContext = EventServerPluginContext(logger) val jsonPath = """(.+)\.json$""".r val formPath = """(.+)\.form$""".r val route: Route = pathSingleSlash { get { complete(Map("status" -> "alive")) } } ~ path("plugins.json") { get { complete( Map("plugins" -> Map( "inputblockers" -> pluginContext.inputBlockers.map { case (n, p) => n -> Map( "name" -> p.pluginName, "description" -> p.pluginDescription, "class" -> p.getClass.getName) }, "inputsniffers" -> pluginContext.inputSniffers.map { case (n, p) => n -> Map( "name" -> p.pluginName, "description" -> p.pluginDescription, "class" -> p.getClass.getName) } )) ) } } ~ path("plugins" / Segments) { segments => get { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val pluginArgs = segments.drop(2) val pluginType = segments(0) val pluginName = segments(1) pluginType match { case EventServerPlugin.inputBlocker => complete(HttpResponse(entity = HttpEntity( `application/json`, pluginContext.inputBlockers(pluginName).handleREST( authData.appId, authData.channelId, pluginArgs) ))) case EventServerPlugin.inputSniffer => complete(pluginsActorRef ? 
PluginsActor.HandleREST( appId = authData.appId, channelId = authData.channelId, pluginName = pluginName, pluginArgs = pluginArgs) map { json => HttpResponse(entity = HttpEntity( `application/json`, json.asInstanceOf[String] )) }) } } } } } ~ path("events" / jsonPath ) { eventId => get { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val appId = authData.appId val channelId = authData.channelId logger.debug(s"GET event ${eventId}.") onSuccess(eventClient.futureGet(eventId, appId, channelId)){ eventOpt => eventOpt.map { event => complete(StatusCodes.OK, event) }.getOrElse( complete(StatusCodes.NotFound, Map("message" -> "Not Found")) ) } } } } ~ delete { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val appId = authData.appId val channelId = authData.channelId logger.debug(s"DELETE event ${eventId}.") onSuccess(eventClient.futureDelete(eventId, appId, channelId)){ found => if (found) { complete(StatusCodes.OK, Map("message" -> "Found")) } else { complete(StatusCodes.NotFound, Map("message" -> "Not Found")) } } } } } } ~ path("events.json") { post { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val appId = authData.appId val channelId = authData.channelId val events = authData.events entity(as[Event]) { event => if (events.isEmpty || authData.events.contains(event.event)) { pluginContext.inputBlockers.values.foreach( _.process(EventInfo( appId = appId, channelId = channelId, event = event), pluginContext)) onSuccess(eventClient.futureInsert(event, appId, channelId)){ id => pluginsActorRef ! EventInfo( appId = appId, channelId = channelId, event = event) val result = (StatusCodes.Created, Map("eventId" -> s"${id}")) if (config.stats) { statsActorRef ! 
Bookkeeping(appId, result._1, event) } complete(result) } } else { complete(StatusCodes.Forbidden, Map("message" -> s"${event.event} events are not allowed")) } } } } } ~ get { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val appId = authData.appId val channelId = authData.channelId parameters( 'startTime.?, 'untilTime.?, 'entityType.?, 'entityId.?, 'event.?, 'targetEntityType.?, 'targetEntityId.?, 'limit.as[Int].?, 'reversed.as[Boolean].?) { (startTimeStr, untilTimeStr, entityType, entityId, eventName, // only support one event name targetEntityType, targetEntityId, limit, reversed) => logger.debug( s"GET events of appId=${appId} " + s"st=${startTimeStr} ut=${untilTimeStr} " + s"et=${entityType} eid=${entityId} " + s"li=${limit} rev=${reversed} ") require(!((reversed == Some(true)) && (entityType.isEmpty || entityId.isEmpty)), "the parameter reversed can only be used with" + " both entityType and entityId specified.") val parseTime = Future { val startTime = startTimeStr.map(Utils.stringToDateTime(_)) val untilTime = untilTimeStr.map(Utils.stringToDateTime(_)) (startTime, untilTime) } val f = parseTime.flatMap { case (startTime, untilTime) => val data = eventClient.futureFind( appId = appId, channelId = channelId, startTime = startTime, untilTime = untilTime, entityType = entityType, entityId = entityId, eventNames = eventName.map(List(_)), targetEntityType = targetEntityType.map(Some(_)), targetEntityId = targetEntityId.map(Some(_)), limit = limit.orElse(Some(20)), reversed = reversed) .map { eventIter => if (eventIter.hasNext) { (StatusCodes.OK, eventIter.toArray) } else { (StatusCodes.NotFound, Map("message" -> "Not Found")) } } data } onSuccess(f){ (status, body) => complete(status, body) } } } } } } ~ path("batch" / "events.json") { post { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val appId = authData.appId val channelId = authData.channelId val allowedEvents = authData.events 
entity(as[Seq[Try[Event]]]) { events => if (events.length <= MaxNumberOfEventsPerBatchRequest) { val eventWithIndex = events.zipWithIndex val taggedEvents = eventWithIndex.collect { case (Success(event), i) => if(allowedEvents.isEmpty || allowedEvents.contains(event.event)){ (Right(event), i) } else { (Left(event), i) } } val insertEvents = taggedEvents.collect { case (Right(event), i) => (event, i) } insertEvents.foreach { case (event, i) => pluginContext.inputBlockers.values.foreach( _.process(EventInfo( appId = appId, channelId = channelId, event = event), pluginContext)) } val f: Future[Seq[Map[String, Any]]] = eventClient.futureInsertBatch( insertEvents.map(_._1), appId, channelId).map { insertResults => val results = insertResults.zip(insertEvents).map { case (id, (event, i)) => pluginsActorRef ! EventInfo( appId = appId, channelId = channelId, event = event) val status = StatusCodes.Created if (config.stats) { statsActorRef ! Bookkeeping(appId, status, event) } (Map( "status" -> status.intValue, "eventId" -> s"${id}"), i) } ++ // Results of denied events taggedEvents.collect { case (Left(event), i) => (Map( "status" -> StatusCodes.Forbidden.intValue, "message" -> s"${event.event} events are not allowed"), i) } ++ // Results of events that failed to deserialize eventWithIndex.collect { case (Failure(exception), i) => (Map( "status" -> StatusCodes.BadRequest.intValue, "message" -> s"${exception.getMessage()}"), i) } // Restore original order results.sortBy { case (_, i) => i }.map { case (data, _) => data } } onSuccess(f.recover { case exception => Map( "status" -> StatusCodes.InternalServerError.intValue, "message" -> s"${exception.getMessage()}" ) }){ res => complete(res) } } else { complete(StatusCodes.BadRequest, Map("message" -> (s"Batch request must have less than or equal to " + s"${MaxNumberOfEventsPerBatchRequest} events"))) } } } } } } ~ path("stats.json") { get { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val appId =
authData.appId if (config.stats) { complete { statsActorRef ? GetStats(appId) map { _.asInstanceOf[Map[String, StatsSnapshot]] } } } else { complete( StatusCodes.NotFound, Map("message" -> "To see stats, launch Event Server with --stats argument.") ) } } } } // stats.json get } ~ path("webhooks" / jsonPath ) { web => post { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val appId = authData.appId val channelId = authData.channelId entity(as[JObject]) { jObj => onSuccess(Webhooks.postJson( appId = appId, channelId = channelId, web = web, data = jObj, eventClient = eventClient, log = logger, stats = config.stats, statsActorRef = statsActorRef )){ (status, body) => complete(status, body) } } } } } ~ get { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val appId = authData.appId val channelId = authData.channelId onSuccess( Webhooks.getJson( appId = appId, channelId = channelId, web = web, log = logger) ){ (status, body) => complete(status, body) } } } } } ~ path("webhooks" / formPath ) { web => post { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val appId = authData.appId val channelId = authData.channelId entity(as[FormData]){ formData => logger.debug(formData.toString) onSuccess(Webhooks.postForm( appId = appId, channelId = channelId, web = web, data = formData, eventClient = eventClient, log = logger, stats = config.stats, statsActorRef = statsActorRef )){ (status, body) => complete(status, body) } } } } } ~ get { handleExceptions(exceptionHandler) { authenticate(withAccessKey) { authData => val appId = authData.appId val channelId = authData.channelId onSuccess(Webhooks.getForm( appId = appId, channelId = channelId, web = web, log = logger )){ (status, body) => complete(status, body) } } } } } route } def createEventServer(config: EventServerConfig): ActorSystem = { implicit val system = ActorSystem("EventServerSystem") implicit val materializer = ActorMaterializer() 
implicit val executionContext = system.dispatcher val eventClient = Storage.getLEvents() val accessKeysClient = Storage.getMetaDataAccessKeys() val channelsClient = Storage.getMetaDataChannels() val statsActorRef = system.actorSelection("/user/StatsActor") val pluginsActorRef = system.actorSelection("/user/PluginsActor") val logger = Logging(system, getClass) val route = createRoute(eventClient, accessKeysClient, channelsClient, logger, statsActorRef, pluginsActorRef, config) Http().bindAndHandle(route, config.ip, config.port) system } } object Run { def main(args: Array[String]): Unit = { val f = EventServer.createEventServer(EventServerConfig( ip = "0.0.0.0", port = 7070)) .whenTerminated Await.ready(f, Duration.Inf) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/api/EventServerPlugin.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
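The batch endpoint in `EventServer.scala` above tags every element with its position, handles accepted, denied, and malformed events separately, and then sorts by position so the response lines up with the request. A minimal, dependency-free sketch of that bookkeeping follows; `SimpleEvent`, `BatchResults.process`, and the `insert` callback (standing in for `futureInsertBatch`) are hypothetical names, and plain integers replace Akka HTTP `StatusCodes`.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for PredictionIO's Event; only the event name matters here.
final case class SimpleEvent(event: String)

object BatchResults {
  // Returns one result map per input element, in the same order as the input.
  def process(
      events: Seq[Try[SimpleEvent]],
      allowedEvents: Seq[String],
      insert: Seq[SimpleEvent] => Seq[String]): Seq[Map[String, Any]] = {
    val withIndex = events.zipWithIndex

    // Successfully parsed events, split by whether the access key allows them.
    val parsed = withIndex.collect { case (Success(e), i) => (e, i) }
    val (accepted, denied) = parsed.partition { case (e, _) =>
      allowedEvents.isEmpty || allowedEvents.contains(e.event)
    }

    // Insert accepted events in one call; ids are assumed to come back in order.
    val ids = insert(accepted.map(_._1))
    val created = ids.zip(accepted).map { case (id, (_, i)) =>
      (Map[String, Any]("status" -> 201, "eventId" -> id), i)
    }
    val forbidden = denied.map { case (e, i) =>
      (Map[String, Any]("status" -> 403, "message" -> s"${e.event} events are not allowed"), i)
    }
    val malformed = withIndex.collect { case (Failure(ex), i) =>
      (Map[String, Any]("status" -> 400, "message" -> ex.getMessage), i)
    }

    // Restore the original request order, then drop the indices.
    (created ++ forbidden ++ malformed).sortBy(_._2).map(_._1)
  }
}
```

Sorting by the original index is what keeps the i-th response entry tied to the i-th request entry, even though the three groups are produced independently.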
*/ package org.apache.predictionio.data.api trait EventServerPlugin { val pluginName: String val pluginDescription: String val pluginType: String def process(eventInfo: EventInfo, context: EventServerPluginContext) def handleREST(appId: Int, channelId: Option[Int], arguments: Seq[String]): String } object EventServerPlugin { val inputBlocker = "inputblocker" val inputSniffer = "inputsniffer" } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/api/EventServerPluginContext.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.api import java.util.ServiceLoader import akka.event.LoggingAdapter import grizzled.slf4j.Logging import scala.collection.JavaConversions._ import scala.collection.mutable class EventServerPluginContext( val plugins: mutable.Map[String, mutable.Map[String, EventServerPlugin]], val log: LoggingAdapter) { def inputBlockers: Map[String, EventServerPlugin] = plugins.getOrElse(EventServerPlugin.inputBlocker, Map.empty).toMap def inputSniffers: Map[String, EventServerPlugin] = plugins.getOrElse(EventServerPlugin.inputSniffer, Map.empty).toMap } object EventServerPluginContext extends Logging { def apply(log: LoggingAdapter): EventServerPluginContext = { val plugins = mutable.Map[String, mutable.Map[String, EventServerPlugin]]( EventServerPlugin.inputBlocker -> mutable.Map(), EventServerPlugin.inputSniffer -> mutable.Map()) val serviceLoader = ServiceLoader.load(classOf[EventServerPlugin]) serviceLoader foreach { service => plugins(service.pluginType) += service.pluginName -> service } new EventServerPluginContext( plugins, log) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/api/PluginsActor.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
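`EventServerPluginContext.apply` discovers plugins with `java.util.ServiceLoader` and files each one under its `pluginType`, then by `pluginName`. The sketch below reproduces only the grouping step, assuming the discovered plugins are already available as a `Seq`; `Plugin`, `SimplePlugin`, and `PluginRegistry` are illustrative names, not PredictionIO APIs.

```scala
import scala.collection.mutable

// Hypothetical minimal plugin interface: just the two fields used for grouping.
trait Plugin {
  def pluginName: String
  def pluginType: String
}

final case class SimplePlugin(pluginName: String, pluginType: String) extends Plugin

object PluginRegistry {
  val InputBlocker = "inputblocker"
  val InputSniffer = "inputsniffer"

  // Files each plugin under its type, then its name. A later plugin with the
  // same type and name replaces an earlier one; an unknown type throws
  // NoSuchElementException because only the two known types are pre-registered.
  def group(discovered: Seq[Plugin]): Map[String, Map[String, Plugin]] = {
    val plugins = mutable.Map[String, mutable.Map[String, Plugin]](
      InputBlocker -> mutable.Map.empty[String, Plugin],
      InputSniffer -> mutable.Map.empty[String, Plugin])
    discovered.foreach { p => plugins(p.pluginType) += p.pluginName -> p }
    plugins.map { case (t, byName) => t -> byName.toMap }.toMap
  }
}
```

The failure on an unknown type mirrors the original loop, where `plugins(service.pluginType)` is a plain `mutable.Map` lookup with no default.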
* See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.api import akka.actor.Actor import akka.event.Logging class PluginsActor() extends Actor { implicit val system = context.system val log = Logging(system, this) val pluginContext = EventServerPluginContext(log) def receive: PartialFunction[Any, Unit] = { case e: EventInfo => pluginContext.inputSniffers.values.foreach(_.process(e, pluginContext)) case h: PluginsActor.HandleREST => try { sender() ! pluginContext.inputSniffers(h.pluginName).handleREST( h.appId, h.channelId, h.pluginArgs) } catch { case e: Exception => sender() ! s"""{"message":"${e.getMessage}"}""" } case _ => log.error("Unknown message sent to Event Server input sniffer plugin host.") } } object PluginsActor { case class HandleREST( pluginName: String, appId: Int, channelId: Option[Int], pluginArgs: Seq[String]) } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/api/Stats.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.api import akka.http.scaladsl.model.StatusCode import org.apache.predictionio.data.storage.Event import scala.collection.mutable.{HashMap => MHashMap} import scala.collection.mutable import com.github.nscala_time.time.Imports.DateTime case class EntityTypesEvent( val entityType: String, val targetEntityType: Option[String], val event: String) { def this(e: Event) = this( e.entityType, e.targetEntityType, e.event) } case class KV[K, V](key: K, value: V) case class StatsSnapshot( val startTime: DateTime, val endTime: Option[DateTime], val basic: Seq[KV[EntityTypesEvent, Long]], val statusCode: Seq[KV[StatusCode, Long]] ) class Stats(val startTime: DateTime) { private[this] var _endTime: Option[DateTime] = None var statusCodeCount = MHashMap[(Int, StatusCode), Long]().withDefaultValue(0L) var eteCount = MHashMap[(Int, EntityTypesEvent), Long]().withDefaultValue(0L) def cutoff(endTime: DateTime) { _endTime = Some(endTime) } def update(appId: Int, statusCode: StatusCode, event: Event) { statusCodeCount((appId, statusCode)) += 1 eteCount((appId, new EntityTypesEvent(event))) += 1 } def extractByAppId[K, V](appId: Int, m: mutable.Map[(Int, K), V]) : Seq[KV[K, V]] = { m .toSeq .flatMap { case (k, v) => if (k._1 == appId) { Seq(KV(k._2, v)) } else { Nil } } } def get(appId: Int): StatsSnapshot = { StatsSnapshot( startTime, _endTime, extractByAppId(appId, eteCount), extractByAppId(appId, statusCodeCount) ) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/api/StatsActor.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. 
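`Stats` keeps its counters in a `mutable.HashMap` keyed by `(appId, key)` with a default of `0L`, and `extractByAppId` filters that map down to one app. A small sketch of the same pattern follows, assuming a generic key type and returning a `Map` rather than the `Seq[KV[K, V]]` that `Stats` produces; `Counter` is a hypothetical name.

```scala
import scala.collection.mutable

// Counts events per (appId, key), in the style of Stats.update.
final class Counter[K] {
  private val counts: mutable.Map[(Int, K), Long] =
    mutable.HashMap[(Int, K), Long]().withDefaultValue(0L)

  // Missing keys read as 0L, so the first increment stores 1L.
  def increment(appId: Int, key: K): Unit = counts((appId, key)) += 1

  // Keeps only the entries belonging to appId and drops the appId from the key,
  // like Stats.extractByAppId (which returns Seq[KV[K, V]] instead of a Map).
  def forApp(appId: Int): Map[K, Long] =
    counts.toSeq.collect { case ((id, k), v) if id == appId => k -> v }.toMap
}
```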
* The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.api import akka.http.scaladsl.model.StatusCode import org.apache.predictionio.data.storage.Event import akka.actor.Actor import akka.event.Logging import com.github.nscala_time.time.Imports.DateTime /* message to StatsActor */ case class Bookkeeping(val appId: Int, statusCode: StatusCode, event: Event) /* message to StatsActor */ case class GetStats(val appId: Int) class StatsActor extends Actor { implicit val system = context.system val log = Logging(system, this) def getCurrent: DateTime = { DateTime.now. withMinuteOfHour(0). withSecondOfMinute(0). withMillisOfSecond(0) } var longLiveStats = new Stats(DateTime.now) var hourlyStats = new Stats(getCurrent) var prevHourlyStats = new Stats(getCurrent.minusHours(1)) prevHourlyStats.cutoff(hourlyStats.startTime) def bookkeeping(appId: Int, statusCode: StatusCode, event: Event) { val current = getCurrent // If the current hour is different from the stats start time, we create // another stats instance, and move the current to prev. if (current != hourlyStats.startTime) { prevHourlyStats = hourlyStats prevHourlyStats.cutoff(current) hourlyStats = new Stats(current) } hourlyStats.update(appId, statusCode, event) longLiveStats.update(appId, statusCode, event) } def receive: Actor.Receive = { case Bookkeeping(appId, statusCode, event) => bookkeeping(appId, statusCode, event) case GetStats(appId) => sender() ! 
Map( "time" -> DateTime.now, "currentHour" -> hourlyStats.get(appId), "prevHour" -> prevHourlyStats.get(appId), "longLive" -> longLiveStats.get(appId)) case _ => log.error("Unknown message.") } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/api/Webhooks.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
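`StatsActor.bookkeeping` truncates the current time to the hour and, when that hour differs from the start of `hourlyStats`, shifts the current bucket into `prevHourlyStats` before counting. The sketch below models the same rollover with `java.time` instead of nscala-time and counts events rather than keeping full `Stats` objects; `HourlyCounter` is a hypothetical class, and timestamps are passed in explicitly (the original reads the clock in the JVM default zone).

```scala
import java.time.ZonedDateTime
import java.time.temporal.ChronoUnit

// Current-hour and previous-hour event counts with lazy rollover, modelled on
// StatsActor.bookkeeping.
final class HourlyCounter(start: ZonedDateTime) {
  private def hourOf(t: ZonedDateTime): ZonedDateTime = t.truncatedTo(ChronoUnit.HOURS)

  var currentStart: ZonedDateTime = hourOf(start)
  var current: Long = 0L
  var previous: Long = 0L

  def record(now: ZonedDateTime): Unit = {
    val hour = hourOf(now)
    // A new hour has begun: the current bucket becomes the previous one.
    if (hour != currentStart) {
      previous = current
      current = 0L
      currentStart = hour
    }
    current += 1
  }
}
```

As in the original, the rollover only happens when the next event arrives, so after an idle gap of several hours the "previous" bucket holds the last active hour rather than the hour immediately before the current one.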
*/ package org.apache.predictionio.data.api import akka.http.scaladsl.model.{FormData, StatusCode, StatusCodes} import org.apache.predictionio.data.webhooks.ConnectorUtil import org.apache.predictionio.data.storage.LEvents import org.json4s.JObject import akka.event.LoggingAdapter import akka.actor.ActorSelection import scala.concurrent.{ExecutionContext, Future} private[predictionio] object Webhooks { def postJson( appId: Int, channelId: Option[Int], web: String, data: JObject, eventClient: LEvents, log: LoggingAdapter, stats: Boolean, statsActorRef: ActorSelection )(implicit ec: ExecutionContext): Future[(StatusCode, Map[String, String])] = { val eventFuture = Future { WebhooksConnectors.json.get(web).map { connector => ConnectorUtil.toEvent(connector, data) } } eventFuture.flatMap { case None => Future successful { val message = s"webhooks connection for ${web} is not supported." (StatusCodes.NotFound, Map("message" -> message)) } case Some(event) => val data = eventClient.futureInsert(event, appId, channelId).map { id => val result = (StatusCodes.Created, Map("eventId" -> s"${id}")) if (stats) { statsActorRef ! Bookkeeping(appId, result._1, event) } result } data } } def getJson( appId: Int, channelId: Option[Int], web: String, log: LoggingAdapter )(implicit ec: ExecutionContext): Future[(StatusCode, Map[String, String])] = { Future { WebhooksConnectors.json.get(web).map { connector => (StatusCodes.OK, Map("message" -> "Ok")) }.getOrElse { val message = s"webhooks connection for ${web} is not supported." 
(StatusCodes.NotFound, Map("message" -> message)) } } } def postForm( appId: Int, channelId: Option[Int], web: String, data: FormData, eventClient: LEvents, log: LoggingAdapter, stats: Boolean, statsActorRef: ActorSelection )(implicit ec: ExecutionContext): Future[(StatusCode, Map[String, String])] = { val eventFuture = Future { WebhooksConnectors.form.get(web).map { connector => ConnectorUtil.toEvent(connector, data.fields.toMap) } } eventFuture.flatMap { case None => Future successful { val message = s"webhooks connection for ${web} is not supported." (StatusCodes.NotFound, Map("message" -> message)) } case Some(event) => val data = eventClient.futureInsert(event, appId, channelId).map { id => val result = (StatusCodes.Created, Map("eventId" -> s"${id}")) if (stats) { statsActorRef ! Bookkeeping(appId, result._1, event) } result } data } } def getForm( appId: Int, channelId: Option[Int], web: String, log: LoggingAdapter )(implicit ec: ExecutionContext): Future[(StatusCode, Map[String, String])] = { Future { WebhooksConnectors.form.get(web).map { connector => (StatusCodes.OK, Map("message" -> "Ok")) }.getOrElse { val message = s"webhooks connection for ${web} is not supported." (StatusCodes.NotFound, Map("message" -> message)) } } } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/api/WebhooksConnectors.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.api import org.apache.predictionio.data.webhooks.JsonConnector import org.apache.predictionio.data.webhooks.FormConnector import org.apache.predictionio.data.webhooks.segmentio.SegmentIOConnector import org.apache.predictionio.data.webhooks.mailchimp.MailChimpConnector private[predictionio] object WebhooksConnectors { val json: Map[String, JsonConnector] = Map( "segmentio" -> SegmentIOConnector ) val form: Map[String, FormConnector] = Map( "mailchimp" -> MailChimpConnector ) } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/package.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
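`Webhooks.postJson` and `postForm` both look the connector name up in `WebhooksConnectors`, answer `404` with a "not supported" message when it is missing, and otherwise convert the payload, insert it, and return `201` with the new event ID. A synchronous sketch of that dispatch follows, assuming a hypothetical `Connector` trait that only derives an event name and an `insert` callback in place of `futureInsert`; `WebhookRouter` is likewise an illustrative name.

```scala
// Hypothetical connector: derives an event name from a flat payload.
trait Connector {
  def toEventName(payload: Map[String, String]): String
}

object WebhookRouter {
  // Registry keyed by the path segment, like WebhooksConnectors.json.
  val connectors: Map[String, Connector] = Map(
    "segmentio" -> new Connector {
      def toEventName(payload: Map[String, String]): String =
        payload.getOrElse("type", "unknown")
    })

  // Returns (HTTP status, response body); `insert` stands in for futureInsert.
  def post(web: String,
           payload: Map[String, String],
           insert: String => String): (Int, Map[String, String]) =
    connectors.get(web) match {
      case None =>
        (404, Map("message" -> s"webhooks connection for $web is not supported."))
      case Some(connector) =>
        (201, Map("eventId" -> insert(connector.toEventName(payload))))
    }
}
```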
*/ package org.apache.predictionio /** Provides data access for PredictionIO and any engines running on top of * PredictionIO */ package object data {} ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/AccessKeys.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import java.security.SecureRandom import org.apache.predictionio.annotation.DeveloperApi import org.apache.commons.codec.binary.Base64 /** :: DeveloperApi :: * Stores mapping of access keys, app IDs, and lists of allowed event names * * @param key Access key * @param appid App ID * @param events List of allowed events for this particular app key * @group Meta Data */ @DeveloperApi case class AccessKey( key: String, appid: Int, events: Seq[String]) /** :: DeveloperApi :: * Base trait of the [[AccessKey]] data access object * * @group Meta Data */ @DeveloperApi trait AccessKeys { /** Insert a new [[AccessKey]]. If the key field is empty, a key will be * generated. 
*/ def insert(k: AccessKey): Option[String] /** Get an [[AccessKey]] by key */ def get(k: String): Option[AccessKey] /** Get all [[AccessKey]]s */ def getAll(): Seq[AccessKey] /** Get all [[AccessKey]]s for a particular app ID */ def getByAppid(appid: Int): Seq[AccessKey] /** Update an [[AccessKey]] */ def update(k: AccessKey): Unit /** Delete an [[AccessKey]] */ def delete(k: String): Unit /** Default implementation of key generation */ def generateKey: String = { val sr = new SecureRandom val srBytes = Array.fill(48)(0.toByte) sr.nextBytes(srBytes) Base64.encodeBase64URLSafeString(srBytes) match { case x if x startsWith "-" => generateKey case x => x } } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/Apps.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import org.apache.predictionio.annotation.DeveloperApi /** :: DeveloperApi :: * Stores mapping of app IDs and names * * @param id ID of the app. * @param name Name of the app. * @param description Long description of the app. 
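`AccessKeys.generateKey` draws 48 bytes from `SecureRandom`, encodes them as URL-safe Base64 with Commons Codec, and retries whenever the result starts with `-`. The sketch below does the same with the JDK's `java.util.Base64` URL encoder without padding, which like Commons Codec's URL-safe encoder emits `-` and `_` and no trailing `=`, so it needs no extra dependency; `AccessKeyGen` is a hypothetical object name. The source does not say why a leading `-` is rejected; a plausible reason is that such a key could be mistaken for an option when passed on a command line.

```scala
import java.security.SecureRandom
import java.util.Base64

object AccessKeyGen {
  private val random = new SecureRandom

  // 48 random bytes encode to exactly 64 URL-safe Base64 characters.
  @annotation.tailrec
  def generateKey(): String = {
    val bytes = new Array[Byte](48)
    random.nextBytes(bytes)
    val key = Base64.getUrlEncoder.withoutPadding.encodeToString(bytes)
    // Draw again if the key would start with '-'.
    if (key.startsWith("-")) generateKey() else key
  }
}
```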
* @group Meta Data */ @DeveloperApi case class App( id: Int, name: String, description: Option[String]) /** :: DeveloperApi :: * Base trait of the [[App]] data access object * * @group Meta Data */ @DeveloperApi trait Apps { /** Insert a new [[App]]. Returns a generated app ID if the supplied app ID is 0. */ def insert(app: App): Option[Int] /** Get an [[App]] by app ID */ def get(id: Int): Option[App] /** Get an [[App]] by app name */ def getByName(name: String): Option[App] /** Get all [[App]]s */ def getAll(): Seq[App] /** Update an [[App]] */ def update(app: App): Unit /** Delete an [[App]] */ def delete(id: Int): Unit } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/BiMap.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.storage import scala.collection.immutable.HashMap import org.apache.spark.rdd.RDD /** Immutable Bi-directional Map * */ class BiMap[K, V] private[predictionio] ( private val m: Map[K, V], private val i: Option[BiMap[V, K]] = None ) extends Serializable { // NOTE: make inverse's inverse point back to current BiMap val inverse: BiMap[V, K] = i.getOrElse { val rev = m.map(_.swap) require((rev.size == m.size), s"Failed to create reversed map. Cannot have duplicated values.") new BiMap(rev, Some(this)) } def get(k: K): Option[V] = m.get(k) def getOrElse(k: K, default: => V): V = m.getOrElse(k, default) def contains(k: K): Boolean = m.contains(k) def apply(k: K): V = m.apply(k) /** Converts to a map. * @return a map of type immutable.Map[K, V] */ def toMap: Map[K, V] = m /** Converts to a sequence. * @return a sequence containing all elements of this map */ def toSeq: Seq[(K, V)] = m.toSeq def size: Int = m.size def take(n: Int): BiMap[K, V] = BiMap(m.take(n)) override def toString: String = m.toString } object BiMap { def apply[K, V](x: Map[K, V]): BiMap[K, V] = new BiMap(x) /** Create a BiMap[String, Long] from a set of String. The Long index starts * from 0. * @param keys a set of String * @return a String to Long BiMap */ def stringLong(keys: Set[String]): BiMap[String, Long] = { val hm = HashMap(keys.toSeq.zipWithIndex.map(t => (t._1, t._2.toLong)) : _*) new BiMap(hm) } /** Create a BiMap[String, Long] from an array of String. * NOTE: the array cannot have duplicate elements. * The Long index starts from 0. * @param keys an array of String * @return a String to Long BiMap */ def stringLong(keys: Array[String]): BiMap[String, Long] = { val hm = HashMap(keys.zipWithIndex.map(t => (t._1, t._2.toLong)) : _*) new BiMap(hm) } /** Create a BiMap[String, Long] from RDD[String]. The Long index starts * from 0.
* @param keys RDD of String * @return a String to Long BiMap */ def stringLong(keys: RDD[String]): BiMap[String, Long] = { stringLong(keys.distinct.collect) } /** Create a BiMap[String, Int] from a set of String. The Int index starts * from 0. * @param keys a set of String * @return a String to Int BiMap */ def stringInt(keys: Set[String]): BiMap[String, Int] = { val hm = HashMap(keys.toSeq.zipWithIndex : _*) new BiMap(hm) } /** Create a BiMap[String, Int] from an array of String. * NOTE: the array must not contain duplicate elements. * The Int index starts from 0. * @param keys an array of String * @return a String to Int BiMap */ def stringInt(keys: Array[String]): BiMap[String, Int] = { val hm = HashMap(keys.zipWithIndex : _*) new BiMap(hm) } /** Create a BiMap[String, Int] from RDD[String]. The Int index starts * from 0. Duplicate strings are removed before indexing. * @param keys RDD of String * @return a String to Int BiMap */ def stringInt(keys: RDD[String]): BiMap[String, Int] = { stringInt(keys.distinct.collect) } private[this] def stringDoubleImpl(keys: Seq[String]) : BiMap[String, Double] = { val ki = keys.zipWithIndex.map(e => (e._1, e._2.toDouble)) new BiMap(HashMap(ki : _*)) } /** Create a BiMap[String, Double] from a set of String. The Double index * starts from 0. * @param keys a set of String * @return a String to Double BiMap */ def stringDouble(keys: Set[String]): BiMap[String, Double] = { stringDoubleImpl(keys.toSeq) } /** Create a BiMap[String, Double] from an array of String. * NOTE: the array must not contain duplicate elements. * The Double index starts from 0. * @param keys an array of String * @return a String to Double BiMap */ def stringDouble(keys: Array[String]): BiMap[String, Double] = { stringDoubleImpl(keys.toSeq) } /** Create a BiMap[String, Double] from RDD[String]. The Double index starts * from 0. Duplicate strings are removed before indexing.
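 *
 * @example A hedged sketch that maps string class labels to Double codes,
 * such as the Double label of Spark MLlib's LabeledPoint. It assumes an
 * existing SparkContext named `sc`; the label names are illustrative:
 * {{{
 * val labels = BiMap.stringDouble(sc.parallelize(Seq("spam", "ham")))
 * val code: Double = labels("spam")       // 0.0 or 1.0; order is not guaranteed
 * val name: String = labels.inverse(code) // "spam"
 * }}}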
* @param keys RDD of String * @return a String to Double BiMap */ def stringDouble(keys: RDD[String]): BiMap[String, Double] = { stringDoubleImpl(keys.distinct.collect) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/Channels.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import org.apache.predictionio.annotation.DeveloperApi /** :: DeveloperApi :: * Stores mapping of channel IDs, names and app ID * * @param id ID of the channel * @param name Name of the channel (must be unique within the same app) * @param appid ID of the app which this channel belongs to * @group Meta Data */ @DeveloperApi case class Channel( id: Int, name: String, // must be unique within the same app appid: Int ) { require(Channel.isValidName(name), s"Invalid channel name: ${name}. ${Channel.nameConstraint}") } /** :: DeveloperApi :: * Companion object of [[Channel]] * * @group Meta Data */ @DeveloperApi object Channel { /** Examines whether the supplied channel name is valid. A valid channel name * must consist of 1 to 16 alphanumeric and '-' characters.
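 *
 * @example Illustrative calls (the channel names are hypothetical):
 * {{{
 * Channel.isValidName("mobile-app-1")  // true
 * Channel.isValidName("mobile_app")    // false: '_' is not allowed
 * Channel.isValidName("a" * 17)        // false: longer than 16 characters
 * Channel.isValidName("")              // false: at least 1 character is required
 * }}}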
* * @param s Channel name to examine * @return true if channel name is valid, false otherwise */ def isValidName(s: String): Boolean = { // note: update nameConstraint if this rule is changed s.matches("^[a-zA-Z0-9-]{1,16}$") } /** For consistent error message display */ val nameConstraint: String = "Only alphanumeric and - characters are allowed and max length is 16." } /** :: DeveloperApi :: * Base trait of the [[Channel]] data access object * * @group Meta Data */ @DeveloperApi trait Channels { /** Insert a new [[Channel]]. Returns a generated channel ID if the supplied channel ID is 0. */ def insert(channel: Channel): Option[Int] /** Get a [[Channel]] by channel ID */ def get(id: Int): Option[Channel] /** Get all [[Channel]]s by app ID */ def getByAppid(appid: Int): Seq[Channel] /** Delete a [[Channel]] */ def delete(id: Int): Unit } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/DataMap.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ package org.apache.predictionio.data.storage import org.json4s._ import org.json4s.native.JsonMethods.parse import scala.collection.GenTraversableOnce import scala.collection.JavaConversions /** Exception class for [[DataMap]] * * @group Event Data */ case class DataMapException(msg: String, cause: Exception) extends Exception(msg, cause) { def this(msg: String) = this(msg, null) } /** A DataMap stores properties of the event or entity. Internally it is a Map * whose keys are property names and whose values are the corresponding JSON * values. Use `get[T](name)` to retrieve the value of a mandatory property, * or use [[getOpt]] or [[getOrElse]] to retrieve the value of an optional * property. * * @param fields Map of property name to JValue * @group Event Data */ class DataMap ( val fields: Map[String, JValue] ) extends Serializable { @transient lazy implicit private val formats = DefaultFormats + new DateTimeJson4sSupport.Serializer /** Check the existence of a required property name. Throws a * [[DataMapException]] if it does not exist. * * @param name The property name */ def require(name: String): Unit = { if (!fields.contains(name)) { throw new DataMapException(s"The field $name is required.") } } /** Check if this DataMap contains a specific property. * * @param name The property name * @return Return true if the property exists, else false. */ def contains(name: String): Boolean = { fields.contains(name) } /** Get the value of a mandatory property. Throws a [[DataMapException]] if * the property does not exist or its value is null. * * @tparam T The type of the property value * @param name The property name * @return Return the property value of type T */ def get[T: Manifest](name: String): T = { require(name) fields(name) match { case JNull => throw new DataMapException( s"The required field $name cannot be null.") case x: JValue => x.extract[T] } } /** Get the value of an optional property. 
Returns None if the property does * not exist or its value is null.
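 *
 * @example A hedged sketch of the accessors on a DataMap parsed from JSON
 * (the property names are illustrative):
 * {{{
 * val dm = DataMap("""{ "rating": 4, "note": null }""")
 * dm.get[Int]("rating")          // 4
 * dm.getOpt[String]("note")      // None, because the value is null
 * dm.getOpt[String]("missing")   // None, because the property is absent
 * dm.getOrElse[Int]("count", 0)  // 0
 * dm.get[String]("note")         // throws DataMapException
 * }}}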
* * @tparam T The type of the property value * @param name The property name * @return Return the property value of type Option[T] */ def getOpt[T: Manifest](name: String): Option[T] = { // either the field doesn't exist or its value is null fields.get(name).flatMap(_.extract[Option[T]]) } /** Get the value of an optional property. Returns `default` if the * property does not exist or its value is null. * * @tparam T The type of the property value * @param name The property name * @param default The default property value of type T * @return Return the property value of type T */ def getOrElse[T: Manifest](name: String, default: T): T = { getOpt[T](name).getOrElse(default) } /** Java-friendly method for getting the value of a property. Returns null if * the property does not exist or its value is null. * * @tparam T The type of the property value * @param name The property name * @param clazz The class of the type of the property value * @return Return the property value of type T */ def get[T](name: String, clazz: java.lang.Class[T]): T = { val manifest = new Manifest[T] { override def erasure: Class[_] = clazz override def runtimeClass: Class[_] = clazz } fields.get(name) match { case None => null.asInstanceOf[T] case Some(JNull) => null.asInstanceOf[T] case Some(x) => x.extract[T](formats, manifest) } } /** Java-friendly method for getting a list of values of a property. Returns * null if the property does not exist or its value is null. * * @param name The property name * @return Return the list of property values */ def getStringList(name: String): java.util.List[String] = { fields.get(name) match { case None => null case Some(JNull) => null case Some(x) => JavaConversions.seqAsJavaList(x.extract[List[String]](formats, manifest[List[String]])) } } /** Returns a new DataMap containing the properties of both operands. If a * property name appears in both, the value from the right-hand operand is kept.
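 *
 * @example Merging and removing properties (the values are illustrative):
 * {{{
 * val a = DataMap("""{ "x": 1, "y": 2 }""")
 * val b = DataMap("""{ "y": 3 }""")
 * (a ++ b).get[Int]("y")    // 3: the right-hand operand wins on conflicts
 * (a -- Seq("x")).keySet    // Set("y")
 * }}}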
* * @param that Right hand side DataMap * @return A new DataMap */ def ++ (that: DataMap): DataMap = DataMap(this.fields ++ that.fields) /** Creates a new DataMap from this DataMap by removing all elements of * another collection. * * @param that A collection containing the removed property names * @return A new DataMap */ def -- (that: GenTraversableOnce[String]): DataMap = DataMap(this.fields -- that) /** Tests whether the DataMap is empty. * * @return true if the DataMap is empty, false otherwise. */ def isEmpty: Boolean = fields.isEmpty /** Collects all property names of this DataMap in a set. * * @return a set containing all property names of this DataMap. */ def keySet: Set[String] = this.fields.keySet /** Converts this DataMap to a List. * * @return a list of (property name, JSON value) tuples. */ def toList(): List[(String, JValue)] = fields.toList /** Converts this DataMap to a JObject. * * @return the JObject initialized by this DataMap. */ def toJObject(): JObject = JObject(toList()) /** Converts this DataMap to case class of type T. * * @return the object of type T. 
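 * @example A hedged sketch, assuming a hypothetical top-level case class
 * `Rating` whose field names match the property names:
 * {{{
 * case class Rating(user: String, score: Double)
 * val r = DataMap("""{ "user": "u1", "score": 4.5 }""").extract[Rating]
 * r.score  // 4.5
 * }}}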
*/ def extract[T: Manifest]: T = { toJObject().extract[T] } override def toString: String = s"DataMap($fields)" override def hashCode: Int = 41 + fields.hashCode override def equals(other: Any): Boolean = other match { case that: DataMap => that.canEqual(this) && this.fields.equals(that.fields) case _ => false } def canEqual(other: Any): Boolean = other.isInstanceOf[DataMap] } /** Companion object of the [[DataMap]] class * * @group Event Data */ object DataMap { /** Create an empty DataMap * @return an empty DataMap */ def apply(): DataMap = new DataMap(Map[String, JValue]()) /** Create a DataMap from a Map of String to JValue * @param fields a Map of String to JValue * @return a new DataMap initialized by fields */ def apply(fields: Map[String, JValue]): DataMap = new DataMap(fields) /** Create a DataMap from a JObject. A null JObject yields an empty DataMap. * @param jObj JObject * @return a new DataMap initialized by a JObject */ def apply(jObj: JObject): DataMap = { if (jObj == null) { apply() } else { new DataMap(jObj.obj.toMap) } } /** Create a DataMap from a JSON String * @param js JSON String, e.g. """{ "a": 1, "b": "foo" }""" * @return a new DataMap initialized by a JSON string */ def apply(js: String): DataMap = apply(parse(js).asInstanceOf[JObject]) } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/DateTimeJson4sSupport.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License.
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import org.apache.predictionio.annotation.DeveloperApi import org.apache.predictionio.data.{Utils => DataUtils} import org.joda.time.DateTime import org.json4s._ /** :: DeveloperApi :: * JSON4S serializer for Joda-Time * * @group Common */ @DeveloperApi object DateTimeJson4sSupport { @transient lazy implicit val formats = DefaultFormats /** Serialize DateTime to JValue */ def serializeToJValue: PartialFunction[Any, JValue] = { case d: DateTime => JString(DataUtils.dateTimeToString(d)) } /** Deserialize JValue to DateTime */ def deserializeFromJValue: PartialFunction[JValue, DateTime] = { case jv: JValue => DataUtils.stringToDateTime(jv.extract[String]) } /** Custom JSON4S serializer for Joda-Time */ class Serializer extends CustomSerializer[DateTime](format => ( deserializeFromJValue, serializeToJValue)) } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/EngineInstances.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import com.github.nscala_time.time.Imports._ import org.apache.predictionio.annotation.DeveloperApi import org.json4s._ /** :: DeveloperApi :: * Stores parameters, model, and other information for each engine instance * * @param id Engine instance ID. * @param status Status of the engine instance. * @param startTime Start time of the training/evaluation. * @param endTime End time of the training/evaluation. * @param engineId Engine ID of the instance. * @param engineVersion Engine version of the instance. * @param engineVariant Engine variant ID of the instance. * @param engineFactory Engine factory class for the instance. * @param batch A batch label of the engine instance. * @param env The environment in which the instance was created. * @param sparkConf Custom Spark configuration of the instance. * @param dataSourceParams Data source parameters of the instance. * @param preparatorParams Preparator parameters of the instance. * @param algorithmsParams Algorithms parameters of the instance. * @param servingParams Serving parameters of the instance. 
* @group Meta Data */ @DeveloperApi case class EngineInstance( id: String, status: String, startTime: DateTime, endTime: DateTime, engineId: String, engineVersion: String, engineVariant: String, engineFactory: String, batch: String, env: Map[String, String], sparkConf: Map[String, String], dataSourceParams: String, preparatorParams: String, algorithmsParams: String, servingParams: String) /** :: DeveloperApi :: * Base trait of the [[EngineInstance]] data access object * * @group Meta Data */ @DeveloperApi trait EngineInstances { /** Insert a new [[EngineInstance]] */ def insert(i: EngineInstance): String /** Get an [[EngineInstance]] by ID */ def get(id: String): Option[EngineInstance] /** Get all [[EngineInstance]]s */ def getAll(): Seq[EngineInstance] /** Get the most recently started instance of the given engine ID, version, * and variant that has trained to completion */ def getLatestCompleted( engineId: String, engineVersion: String, engineVariant: String): Option[EngineInstance] /** Get all instances of the given engine ID, version, and variant that have * trained to completion */ def getCompleted( engineId: String, engineVersion: String, engineVariant: String): Seq[EngineInstance] /** Update an [[EngineInstance]] */ def update(i: EngineInstance): Unit /** Delete an [[EngineInstance]] */ def delete(id: String): Unit } /** :: DeveloperApi :: * JSON4S serializer for [[EngineInstance]] * * @group Meta Data */ @DeveloperApi class EngineInstanceSerializer extends CustomSerializer[EngineInstance]( format => ({ case JObject(fields) => implicit val formats = DefaultFormats val seed = EngineInstance( id = "", status = "", startTime = DateTime.now, endTime = DateTime.now, engineId = "", engineVersion = "", engineVariant = "", engineFactory = "", batch = "", env = Map.empty, sparkConf = Map.empty, dataSourceParams = "", preparatorParams = "", algorithmsParams = "", servingParams = "") fields.foldLeft(seed) { case (i, field) => field match { case JField("id", JString(id)) => i.copy(id = id) case 
JField("status", JString(status)) => i.copy(status =
status) case JField("startTime", JString(startTime)) => i.copy(startTime = Utils.stringToDateTime(startTime)) case JField("endTime", JString(endTime)) => i.copy(endTime = Utils.stringToDateTime(endTime)) case JField("engineId", JString(engineId)) => i.copy(engineId = engineId) case JField("engineVersion", JString(engineVersion)) => i.copy(engineVersion = engineVersion) case JField("engineVariant", JString(engineVariant)) => i.copy(engineVariant = engineVariant) case JField("engineFactory", JString(engineFactory)) => i.copy(engineFactory = engineFactory) case JField("batch", JString(batch)) => i.copy(batch = batch) case JField("env", env) => i.copy(env = Extraction.extract[Map[String, String]](env)) case JField("sparkConf", sparkConf) => i.copy(sparkConf = Extraction.extract[Map[String, String]](sparkConf)) case JField("dataSourceParams", JString(dataSourceParams)) => i.copy(dataSourceParams = dataSourceParams) case JField("preparatorParams", JString(preparatorParams)) => i.copy(preparatorParams = preparatorParams) case JField("algorithmsParams", JString(algorithmsParams)) => i.copy(algorithmsParams = algorithmsParams) case JField("servingParams", JString(servingParams)) => i.copy(servingParams = servingParams) case _ => i } } }, { case i: EngineInstance => JObject( JField("id", JString(i.id)) :: JField("status", JString(i.status)) :: JField("startTime", JString(i.startTime.toString)) :: JField("endTime", JString(i.endTime.toString)) :: JField("engineId", JString(i.engineId)) :: JField("engineVersion", JString(i.engineVersion)) :: JField("engineVariant", JString(i.engineVariant)) :: JField("engineFactory", JString(i.engineFactory)) :: JField("batch", JString(i.batch)) :: JField("env", Extraction.decompose(i.env)(DefaultFormats)) :: JField("sparkConf", Extraction.decompose(i.sparkConf)(DefaultFormats)) :: JField("dataSourceParams", JString(i.dataSourceParams)) :: JField("preparatorParams", JString(i.preparatorParams)) :: JField("algorithmsParams", 
JString(i.algorithmsParams)) :: JField("servingParams", JString(i.servingParams)) :: Nil) } )) ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/EntityMap.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.storage import org.apache.predictionio.annotation.Experimental import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.rdd.RDD /** :: Experimental :: */ @Experimental class EntityIdIxMap(val idToIx: BiMap[String, Long]) extends Serializable { val ixToId: BiMap[Long, String] = idToIx.inverse def apply(id: String): Long = idToIx(id) def apply(ix: Long): String = ixToId(ix) def contains(id: String): Boolean = idToIx.contains(id) def contains(ix: Long): Boolean = ixToId.contains(ix) def get(id: String): Option[Long] = idToIx.get(id) def get(ix: Long): Option[String] = ixToId.get(ix) def getOrElse(id: String, default: => Long): Long = idToIx.getOrElse(id, default) def getOrElse(ix: Long, default: => String): String = ixToId.getOrElse(ix, default) def toMap: Map[String, Long] = idToIx.toMap def size: Long = idToIx.size def take(n: Int): EntityIdIxMap = new EntityIdIxMap(idToIx.take(n)) override def toString: String = idToIx.toString } /** :: Experimental :: */ @Experimental object EntityIdIxMap { def apply(keys: RDD[String]): EntityIdIxMap = { new EntityIdIxMap(BiMap.stringLong(keys)) } } /** :: Experimental :: */ @Experimental class EntityMap[A](val idToData: Map[String, A], override val idToIx: BiMap[String, Long]) extends EntityIdIxMap(idToIx) { def this(idToData: Map[String, A]) = this( idToData, BiMap.stringLong(idToData.keySet) ) def data(id: String): A = idToData(id) def data(ix: Long): A = idToData(ixToId(ix)) def getData(id: String): Option[A] = idToData.get(id) def getData(ix: Long): Option[A] = idToData.get(ixToId(ix)) def getOrElseData(id: String, default: => A): A = getData(id).getOrElse(default) def getOrElseData(ix: Long, default: => A): A = getData(ix).getOrElse(default) override def take(n: Int): EntityMap[A] = { val newIdToIx = idToIx.take(n) new EntityMap[A](idToData.filterKeys(newIdToIx.contains(_)), newIdToIx) } override def toString: String = { s"idToData: 
${idToData.toString} " + s"idToIx: ${idToIx.toString}" } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/EvaluationInstances.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import com.github.nscala_time.time.Imports._ import org.apache.predictionio.annotation.DeveloperApi import org.json4s._ /** :: DeveloperApi :: * Stores meta information for each evaluation instance. * * @param id Instance ID. * @param status Status of this instance. * @param startTime Start time of this instance. * @param endTime End time of this instance. * @param evaluationClass Evaluation class name of this instance. * @param engineParamsGeneratorClass Engine parameters generator class name of this instance. * @param batch Batch label of this instance. * @param env The environment in which this instance was created. * @param sparkConf Custom Spark configuration of this instance. * @param evaluatorResults Results of the evaluator. * @param evaluatorResultsHTML HTML results of the evaluator. * @param evaluatorResultsJSON JSON results of the evaluator.
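 *
 * @example Every field has a default, so a new instance needs only the
 * fields of interest. The ID and status strings below are illustrative,
 * not the values the workflow itself writes:
 * {{{
 * val ev = EvaluationInstance(id = "ev-1", status = "RUNNING")
 * val done = ev.copy(status = "COMPLETED", endTime = DateTime.now)
 * }}}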
* @group Meta Data */ @DeveloperApi case class EvaluationInstance( id: String = "", status: String = "", startTime: DateTime = DateTime.now, endTime: DateTime = DateTime.now, evaluationClass: String = "", engineParamsGeneratorClass: String = "", batch: String = "", env: Map[String, String] = Map.empty, sparkConf: Map[String, String] = Map.empty, evaluatorResults: String = "", evaluatorResultsHTML: String = "", evaluatorResultsJSON: String = "") /** :: DeveloperApi :: * Base trait of the [[EvaluationInstance]] data access object * * @group Meta Data */ @DeveloperApi trait EvaluationInstances { /** Insert a new [[EvaluationInstance]] */ def insert(i: EvaluationInstance): String /** Get an [[EvaluationInstance]] by ID */ def get(id: String): Option[EvaluationInstance] /** Get all [[EvaluationInstance]]s */ def getAll: Seq[EvaluationInstance] /** Get instances that are produced by evaluation and have run to completion, * sorted by start time in descending order (most recent first) */ def getCompleted: Seq[EvaluationInstance] /** Update an [[EvaluationInstance]] */ def update(i: EvaluationInstance): Unit /** Delete an [[EvaluationInstance]] */ def delete(id: String): Unit } /** :: DeveloperApi :: * JSON4S serializer for [[EvaluationInstance]] * * @group Meta Data */ @DeveloperApi class EvaluationInstanceSerializer extends CustomSerializer[EvaluationInstance]( format => ({ case JObject(fields) => implicit val formats = DefaultFormats fields.foldLeft(EvaluationInstance()) { case (i, field) => field match { case JField("id", JString(id)) => i.copy(id = id) case JField("status", JString(status)) => i.copy(status = status) case JField("startTime", JString(startTime)) => i.copy(startTime = Utils.stringToDateTime(startTime)) case JField("endTime", JString(endTime)) => i.copy(endTime = Utils.stringToDateTime(endTime)) case JField("evaluationClass", JString(evaluationClass)) => i.copy(evaluationClass = evaluationClass) case JField("engineParamsGeneratorClass", JString(engineParamsGeneratorClass)) =>
i.copy(engineParamsGeneratorClass = engineParamsGeneratorClass) case JField("batch", JString(batch)) => i.copy(batch = batch) case JField("env", env) => i.copy(env = Extraction.extract[Map[String, String]](env)) case JField("sparkConf", sparkConf) => i.copy(sparkConf = Extraction.extract[Map[String, String]](sparkConf)) case JField("evaluatorResults", JString(evaluatorResults)) => i.copy(evaluatorResults = evaluatorResults) case JField("evaluatorResultsHTML", JString(evaluatorResultsHTML)) => i.copy(evaluatorResultsHTML = evaluatorResultsHTML) case JField("evaluatorResultsJSON", JString(evaluatorResultsJSON)) => i.copy(evaluatorResultsJSON = evaluatorResultsJSON) case _ => i } } }, { case i: EvaluationInstance => JObject( JField("id", JString(i.id)) :: JField("status", JString(i.status)) :: JField("startTime", JString(i.startTime.toString)) :: JField("endTime", JString(i.endTime.toString)) :: JField("evaluationClass", JString(i.evaluationClass)) :: JField("engineParamsGeneratorClass", JString(i.engineParamsGeneratorClass)) :: JField("batch", JString(i.batch)) :: JField("env", Extraction.decompose(i.env)(DefaultFormats)) :: JField("sparkConf", Extraction.decompose(i.sparkConf)(DefaultFormats)) :: JField("evaluatorResults", JString(i.evaluatorResults)) :: JField("evaluatorResultsHTML", JString(i.evaluatorResultsHTML)) :: JField("evaluatorResultsJSON", JString(i.evaluatorResultsJSON)) :: Nil ) } ) ) ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/Event.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import org.apache.predictionio.annotation.DeveloperApi import org.joda.time.DateTime import org.joda.time.DateTimeZone /** Each event in the Event Store can be represented by fields in this case * class. * * @param eventId Unique ID of this event. * @param event Name of this event. * @param entityType Type of the entity associated with this event. * @param entityId ID of the entity associated with this event. * @param targetEntityType Type of the target entity associated with this * event. * @param targetEntityId ID of the target entity associated with this event. * @param properties Properties associated with this event. * @param eventTime Time at which this event happened. * @param tags Tags of this event. * @param prId PredictedResultId of this event. * @param creationTime Time at which this event was created in the system.
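 *
 * @example A hedged sketch of a user-rates-item event. The entity types,
 * IDs, and property name are illustrative:
 * {{{
 * val rate = Event(
 *   event = "rate",
 *   entityType = "user",
 *   entityId = "u1",
 *   targetEntityType = Some("item"),
 *   targetEntityId = Some("i42"),
 *   properties = DataMap("""{ "rating": 5 }"""))
 * EventValidation.validate(rate)  // passes; no validation rule is violated
 * }}}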
* @group Event Data */ case class Event( val eventId: Option[String] = None, val event: String, val entityType: String, val entityId: String, val targetEntityType: Option[String] = None, val targetEntityId: Option[String] = None, val properties: DataMap = DataMap(), // default empty val eventTime: DateTime = DateTime.now, val tags: Seq[String] = Nil, val prId: Option[String] = None, val creationTime: DateTime = DateTime.now ) { override def toString(): String = { s"Event(id=$eventId,event=$event,eType=$entityType,eId=$entityId," + s"tType=$targetEntityType,tId=$targetEntityId,p=$properties,t=$eventTime," + s"tags=$tags,pKey=$prId,ct=$creationTime)" } } /** :: DeveloperApi :: * Utilities for validating [[Event]]s * * @group Event Data */ @DeveloperApi object EventValidation { /** Default time zone is set to UTC */ val defaultTimeZone = DateTimeZone.UTC /** Checks whether an event name starts with a reserved prefix * * @param name Event name * @return true if event name starts with \$ or pio_, false otherwise */ def isReservedPrefix(name: String): Boolean = name.startsWith("$") || name.startsWith("pio_") /** PredictionIO reserves some single-entity event names. They are currently * \$set, \$unset, and \$delete.
*/ val specialEvents = Set("$set", "$unset", "$delete") /** Checks whether an event name is a special PredictionIO event name * * @param name Event name * @return true if the name is a special event, false otherwise */ def isSpecialEvents(name: String): Boolean = specialEvents.contains(name) /** Validate an [[Event]], throwing exceptions when the candidate violates any * of the following: * * - event name must not be empty * - entityType must not be empty * - entityId must not be empty * - targetEntityType must not be Some of empty * - targetEntityId must not be Some of empty * - targetEntityType and targetEntityId must be both Some or None * - properties must not be empty when event is \$unset * - event name must be a special event if it has a reserved prefix * - targetEntityType and targetEntityId must be None if the event name has * a reserved prefix * - entityType must be a built-in entity type if entityType has a * reserved prefix * - targetEntityType must be a built-in entity type if targetEntityType is * Some and has a reserved prefix * * @param e Event to be validated */ def validate(e: Event): Unit = { require(!e.event.isEmpty, "event must not be empty.") require(!e.entityType.isEmpty, "entityType must not be empty string.") require(!e.entityId.isEmpty, "entityId must not be empty string.") require(e.targetEntityType.map(!_.isEmpty).getOrElse(true), "targetEntityType must not be empty string") require(e.targetEntityId.map(!_.isEmpty).getOrElse(true), "targetEntityId must not be empty string.") require(!((e.targetEntityType != None) && (e.targetEntityId == None)), "targetEntityType and targetEntityId must be specified together.") require(!((e.targetEntityType == None) && (e.targetEntityId != None)), "targetEntityType and targetEntityId must be specified together.") require(!((e.event == "$unset") && e.properties.isEmpty), "properties cannot be empty for $unset event") require(!isReservedPrefix(e.event) || isSpecialEvents(e.event), s"${e.event} is not a 
supported reserved event name.") require(!isSpecialEvents(e.event) || ((e.targetEntityType == None) && (e.targetEntityId == None)), s"Reserved event ${e.event} cannot have targetEntity") require(!isReservedPrefix(e.entityType) || isBuiltinEntityTypes(e.entityType), s"The entityType ${e.entityType} is not allowed. " + s"'pio_' is a reserved name prefix.") require(e.targetEntityType.map{ t => (!isReservedPrefix(t) || isBuiltinEntityTypes(t))}.getOrElse(true), s"The targetEntityType ${e.targetEntityType.get} is not allowed. " + s"'pio_' is a reserved name prefix.") validateProperties(e) } /** Defines built-in entity types. The current built-in type is pio_pr. */ val builtinEntityTypes: Set[String] = Set("pio_pr") /** Defines built-in properties. This is currently empty. */ val builtinProperties: Set[String] = Set() /** Checks whether an entity type is a built-in entity type */ def isBuiltinEntityTypes(name: String): Boolean = builtinEntityTypes.contains(name) /** Validate event properties, throwing exceptions when the candidate violates * any of the following: * * - property name must not contain a reserved prefix * * @param e Event to be validated */ def validateProperties(e: Event): Unit = { e.properties.keySet.foreach { k => require(!isReservedPrefix(k) || builtinProperties.contains(k), s"The property ${k} is not allowed. " + s"'pio_' is a reserved name prefix.") } } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/EventJson4sSupport.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import org.apache.predictionio.annotation.DeveloperApi import org.apache.predictionio.data.{Utils => DataUtils} import org.joda.time.DateTime import org.json4s._ import scala.util.{Try, Success, Failure} /** :: DeveloperApi :: * Support library for dealing with [[Event]] and JSON4S * * @group Event Data */ @DeveloperApi object EventJson4sSupport { /** This is set to org.json4s.DefaultFormats. Do not use JSON4S to serialize * or deserialize Joda-Time DateTime because it has some issues with timezone * (as of version 3.2.10) */ implicit val formats = DefaultFormats /** :: DeveloperApi :: * Convert JSON from Event Server to [[Event]] * * @return deserialization routine used by [[APISerializer]] */ @DeveloperApi def readJson: PartialFunction[JValue, Event] = { case JObject(x) => { val fields = new DataMap(x.toMap) // use get() if required in json // use getOpt() if not required in json try { val event = fields.get[String]("event") val eventId = fields.getOpt[String]("eventId") val entityType = fields.get[String]("entityType") val entityId = fields.get[String]("entityId") val targetEntityType = fields.getOpt[String]("targetEntityType") val targetEntityId = fields.getOpt[String]("targetEntityId") val properties = fields.getOrElse[Map[String, JValue]]( "properties", Map.empty) // default currentTime expressed as UTC timezone lazy val currentTime = DateTime.now(EventValidation.defaultTimeZone) val eventTime = fields.getOpt[String]("eventTime") .map{ s => try { DataUtils.stringToDateTime(s) } catch { case _: Exception => throw 
new MappingException(s"Fail to extract eventTime ${s}") } }.getOrElse(currentTime) // disable tags from API for now. // val tags = fields.getOpt[Seq[String]]("tags").getOrElse(List()) val prId = fields.getOpt[String]("prId") // don't allow user set creationTime from API for now. val creationTime = currentTime // val creationTime = fields.getOpt[String]("creationTime") // .map{ s => // try { // DataUtils.stringToDateTime(s) // } catch { // case _: Exception => // throw new MappingException(s"Fail to extract creationTime ${s}") // } // }.getOrElse(currentTime) val newEvent = Event( eventId=eventId, event = event, entityType = entityType, entityId = entityId, targetEntityType = targetEntityType, targetEntityId = targetEntityId, properties = DataMap(properties), eventTime = eventTime, prId = prId, creationTime = creationTime ) EventValidation.validate(newEvent) newEvent } catch { case e: Exception => throw new MappingException(e.toString, e) } } } /** :: DeveloperApi :: * Convert [[Event]] to JSON for use by the Event Server * * @return serialization routine used by [[APISerializer]] */ @DeveloperApi def writeJson: PartialFunction[Any, JValue] = { case d: Event => { JObject( JField("eventId", d.eventId.map( eid => JString(eid)).getOrElse(JNothing)) :: JField("event", JString(d.event)) :: JField("entityType", JString(d.entityType)) :: JField("entityId", JString(d.entityId)) :: JField("targetEntityType", d.targetEntityType.map(JString(_)).getOrElse(JNothing)) :: JField("targetEntityId", d.targetEntityId.map(JString(_)).getOrElse(JNothing)) :: JField("properties", d.properties.toJObject) :: JField("eventTime", JString(DataUtils.dateTimeToString(d.eventTime))) :: // disable tags from API for now // JField("tags", JArray(d.tags.toList.map(JString(_)))) :: // disable tags from API for now JField("prId", d.prId.map(JString(_)).getOrElse(JNothing)) :: // don't show creationTime for now JField("creationTime", JString(DataUtils.dateTimeToString(d.creationTime))) :: Nil) } } /** 
:: DeveloperApi :: * Convert JSON4S JValue to [[Event]] * * @return deserialization routine used by [[DBSerializer]] */ @DeveloperApi def deserializeFromJValue: PartialFunction[JValue, Event] = { case jv: JValue => { val event = (jv \ "event").extract[String] val entityType = (jv \ "entityType").extract[String] val entityId = (jv \ "entityId").extract[String] val targetEntityType = (jv \ "targetEntityType").extract[Option[String]] val targetEntityId = (jv \ "targetEntityId").extract[Option[String]] val properties = (jv \ "properties").extract[JObject] val eventTime = DataUtils.stringToDateTime( (jv \ "eventTime").extract[String]) val tags = (jv \ "tags").extract[Seq[String]] val prId = (jv \ "prId").extract[Option[String]] val creationTime = DataUtils.stringToDateTime( (jv \ "creationTime").extract[String]) Event( event = event, entityType = entityType, entityId = entityId, targetEntityType = targetEntityType, targetEntityId = targetEntityId, properties = DataMap(properties), eventTime = eventTime, tags = tags, prId = prId, creationTime = creationTime) } } /** :: DeveloperApi :: * Convert [[Event]] to JSON4S JValue * * @return serialization routine used by [[DBSerializer]] */ @DeveloperApi def serializeToJValue: PartialFunction[Any, JValue] = { case d: Event => { JObject( JField("event", JString(d.event)) :: JField("entityType", JString(d.entityType)) :: JField("entityId", JString(d.entityId)) :: JField("targetEntityType", d.targetEntityType.map(JString(_)).getOrElse(JNothing)) :: JField("targetEntityId", d.targetEntityId.map(JString(_)).getOrElse(JNothing)) :: JField("properties", d.properties.toJObject) :: JField("eventTime", JString(DataUtils.dateTimeToString(d.eventTime))) :: JField("tags", JArray(d.tags.toList.map(JString(_)))) :: JField("prId", d.prId.map(JString(_)).getOrElse(JNothing)) :: JField("creationTime", JString(DataUtils.dateTimeToString(d.creationTime))) :: Nil) } } /** :: DeveloperApi :: * Custom JSON4S serializer for [[Event]] intended to be used 
by database * access, or anywhere that demands serdes of [[Event]] to/from JSON4S JValue */ @DeveloperApi class DBSerializer extends CustomSerializer[Event](format => ( deserializeFromJValue, serializeToJValue)) /** :: DeveloperApi :: * Custom JSON4S serializer for [[Event]] intended to be used by the Event * Server, or anywhere that demands serdes of [[Event]] to/from JSON */ @DeveloperApi class APISerializer extends CustomSerializer[Event](format => ( readJson, writeJson)) } @DeveloperApi object BatchEventsJson4sSupport { implicit val formats = DefaultFormats @DeveloperApi def readJson: PartialFunction[JValue, Seq[Try[Event]]] = { case JArray(events) => { events.map { event => try { Success(EventJson4sSupport.readJson(event)) } catch { case e: Exception => Failure(e) } } } } @DeveloperApi class APISerializer extends CustomSerializer[Seq[Try[Event]]](format => (readJson, Map.empty)) } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/LEventAggregator.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.storage import org.apache.predictionio.annotation.DeveloperApi import org.joda.time.DateTime /** :: DeveloperApi :: * Provides aggregation of [[Event]]s for [[LEvents]]. Engine developers * should use [[org.apache.predictionio.data.store.LEventStore]] instead of using this * directly. * * @group Event Data */ @DeveloperApi object LEventAggregator { /** :: DeveloperApi :: * Aggregate all properties grouped by entity ID given an iterator of * [[Event]]s with the latest property values from all [[Event]]s, and their * first and last updated time * * @param events An iterator of [[Event]]s whose properties will be aggregated * @return A map of entity ID to [[PropertyMap]] */ @DeveloperApi def aggregateProperties(events: Iterator[Event]): Map[String, PropertyMap] = { events.toList .groupBy(_.entityId) .mapValues(_.sortBy(_.eventTime.getMillis) .foldLeft[Prop](Prop())(propAggregator)) .filter{ case (k, v) => v.dm.isDefined } .mapValues{ v => require(v.firstUpdated.isDefined, "Unexpected Error: firstUpdated cannot be None.") require(v.lastUpdated.isDefined, "Unexpected Error: lastUpdated cannot be None.") PropertyMap( fields = v.dm.get.fields, firstUpdated = v.firstUpdated.get, lastUpdated = v.lastUpdated.get ) } } /** :: DeveloperApi :: * Aggregate all properties given an iterator of [[Event]]s with the latest * property values from all [[Event]]s, and their first and last updated time * * @param events An iterator of [[Event]]s whose properties will be aggregated * @return An optional [[PropertyMap]] */ @DeveloperApi def aggregatePropertiesSingle(events: Iterator[Event]) : Option[PropertyMap] = { val prop = events.toList .sortBy(_.eventTime.getMillis) .foldLeft[Prop](Prop())(propAggregator) prop.dm.map{ d => require(prop.firstUpdated.isDefined, "Unexpected Error: firstUpdated cannot be None.") require(prop.lastUpdated.isDefined, "Unexpected Error: lastUpdated cannot be None.") PropertyMap( fields = d.fields,
firstUpdated = prop.firstUpdated.get, lastUpdated = prop.lastUpdated.get ) } } /** Event names that control aggregation: \$set, \$unset, and \$delete */ val eventNames = List("$set", "$unset", "$delete") private def dataMapAggregator: ((Option[DataMap], Event) => Option[DataMap]) = { (p, e) => { e.event match { case "$set" => { if (p == None) { Some(e.properties) } else { p.map(_ ++ e.properties) } } case "$unset" => { if (p == None) { None } else { p.map(_ -- e.properties.keySet) } } case "$delete" => None case _ => p // do nothing for others } } } private def propAggregator: ((Prop, Event) => Prop) = { (p, e) => { e.event match { case "$set" | "$unset" | "$delete" => { Prop( dm = dataMapAggregator(p.dm, e), firstUpdated = p.firstUpdated.map { t => first(t, e.eventTime) }.orElse(Some(e.eventTime)), lastUpdated = p.lastUpdated.map { t => last(t, e.eventTime) }.orElse(Some(e.eventTime)) ) } case _ => p // do nothing for others } } } private def first(a: DateTime, b: DateTime): DateTime = if (b.isBefore(a)) b else a private def last(a: DateTime, b: DateTime): DateTime = if (b.isAfter(a)) b else a private case class Prop( dm: Option[DataMap] = None, firstUpdated: Option[DateTime] = None, lastUpdated: Option[DateTime] = None ) } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/LEvents.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import org.apache.predictionio.annotation.DeveloperApi import org.apache.predictionio.annotation.Experimental import scala.concurrent.Future import scala.concurrent.Await import scala.concurrent.duration.Duration import scala.concurrent.ExecutionContext import scala.concurrent.TimeoutException import org.joda.time.DateTime /** :: DeveloperApi :: * Base trait of a data access object that directly returns [[Event]] without * going through Spark's parallelization. Engine developers should use * [[org.apache.predictionio.data.store.LEventStore]] instead of using this directly. * * @group Event Data */ @DeveloperApi trait LEvents { /** Default timeout for asynchronous operations that is set to 1 minute */ val defaultTimeout = Duration(60, "seconds") /** :: DeveloperApi :: * Initialize Event Store for an app ID and optionally a channel ID. * This routine is to be called when an app is first created. * * @param appId App ID * @param channelId Optional channel ID * @return true if initialization was successful; false otherwise. */ @DeveloperApi def init(appId: Int, channelId: Option[Int] = None): Boolean /** :: DeveloperApi :: * Remove Event Store for an app ID and optional channel ID. * * @param appId App ID * @param channelId Optional channel ID * @return true if removal was successful; false otherwise. */ @DeveloperApi def remove(appId: Int, channelId: Option[Int] = None): Boolean /** :: DeveloperApi :: * Close this Event Store interface object, e.g. close connection, release * resources, etc. 
*/ @DeveloperApi def close(): Unit /** :: DeveloperApi :: * Insert an [[Event]] in a non-blocking fashion. * * @param event An [[Event]] to be inserted * @param appId App ID to insert the [[Event]] into */ @DeveloperApi def futureInsert(event: Event, appId: Int)(implicit ec: ExecutionContext): Future[String] = futureInsert(event, appId, None) /** :: DeveloperApi :: * Insert an [[Event]] in a non-blocking fashion. * * @param event An [[Event]] to be inserted * @param appId App ID to insert the [[Event]] into * @param channelId Optional channel ID to insert the [[Event]] into */ @DeveloperApi def futureInsert( event: Event, appId: Int, channelId: Option[Int])(implicit ec: ExecutionContext): Future[String] /** :: DeveloperApi :: * Insert [[Event]]s in a non-blocking fashion. * * The default implementation calls * [[LEvents.futureInsert(Event, Int, Option[Int])]] once per event and * combines the results with Future.sequence. Override this method in the * storage implementation if the storage has a more efficient way to insert * multiple events at once. * * @param events [[Event]]s to be inserted * @param appId App ID to insert the [[Event]]s into * @param channelId Optional channel ID to insert the [[Event]]s into */ @DeveloperApi def futureInsertBatch(events: Seq[Event], appId: Int, channelId: Option[Int]) (implicit ec: ExecutionContext): Future[Seq[String]] = { val seq = events.map { event => futureInsert(event, appId, channelId) } Future.sequence(seq) } /** :: DeveloperApi :: * Get an [[Event]] in a non-blocking fashion. * * @param eventId ID of the [[Event]] * @param appId ID of the app that contains the [[Event]] */ @DeveloperApi def futureGet(eventId: String, appId: Int)(implicit ec: ExecutionContext): Future[Option[Event]] = futureGet(eventId, appId, None) /** :: DeveloperApi :: * Get an [[Event]] in a non-blocking fashion.
* * @param eventId ID of the [[Event]] * @param appId ID of the app that contains the [[Event]] * @param channelId Optional channel ID that contains the [[Event]] */ @DeveloperApi def futureGet( eventId: String, appId: Int, channelId: Option[Int] )(implicit ec: ExecutionContext): Future[Option[Event]] /** :: DeveloperApi :: * Delete an [[Event]] in a non-blocking fashion. * * @param eventId ID of the [[Event]] * @param appId ID of the app that contains the [[Event]] */ @DeveloperApi def futureDelete(eventId: String, appId: Int)(implicit ec: ExecutionContext): Future[Boolean] = futureDelete(eventId, appId, None) /** :: DeveloperApi :: * Delete an [[Event]] in a non-blocking fashion. * * @param eventId ID of the [[Event]] * @param appId ID of the app that contains the [[Event]] * @param channelId Optional channel ID that contains the [[Event]] */ @DeveloperApi def futureDelete( eventId: String, appId: Int, channelId: Option[Int] )(implicit ec: ExecutionContext): Future[Boolean] /** :: DeveloperApi :: * Reads from database and returns a Future of Iterator of [[Event]]s. * * @param appId return events of this app ID * @param channelId return events of this channel ID (default channel if it's None) * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param entityType return events of this entityType * @param entityId return events of this entityId * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. * @param limit Limit number of events. 
Get all events if None or Some(-1) * @param reversed Reverse the order. * - return oldest events first if None or Some(false) (default) * - return latest events first if Some(true) * @param ec ExecutionContext * @return Future[Iterator[Event]] */ @DeveloperApi def futureFind( appId: Int, channelId: Option[Int] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, entityType: Option[String] = None, entityId: Option[String] = None, eventNames: Option[Seq[String]] = None, targetEntityType: Option[Option[String]] = None, targetEntityId: Option[Option[String]] = None, limit: Option[Int] = None, reversed: Option[Boolean] = None )(implicit ec: ExecutionContext): Future[Iterator[Event]] /** Aggregate properties of entities based on these special events: * \$set, \$unset, and \$delete. * Returns a Future of Map of entityId to properties. * * @param appId use events of this app ID * @param channelId use events of this channel ID (default channel if it's None) * @param entityType aggregate properties of the entities of this entityType * @param startTime use events with eventTime >= startTime * @param untilTime use events with eventTime < untilTime * @param required only keep entities with these required properties defined * @param ec ExecutionContext * @return Future[Map[String, PropertyMap]] */ private[predictionio] def futureAggregateProperties( appId: Int, channelId: Option[Int] = None, entityType: String, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, required: Option[Seq[String]] = None)(implicit ec: ExecutionContext): Future[Map[String, PropertyMap]] = { futureFind( appId = appId, channelId = channelId, startTime = startTime, untilTime = untilTime, entityType = Some(entityType), eventNames = Some(LEventAggregator.eventNames) ).map{ eventIt => val dm = LEventAggregator.aggregateProperties(eventIt) if (required.isDefined) { dm.filter { case (k, v) => required.get.forall(v.contains(_)) } } else dm } }
/** * :: Experimental :: * * Aggregate properties of the specified entity (entityType + entityId) * based on these special events: * \$set, \$unset, and \$delete. * Returns a Future of Option[PropertyMap]. * * @param appId use events of this app ID * @param channelId use events of this channel ID (default channel if it's None) * @param entityType the entityType * @param entityId the entityId * @param startTime use events with eventTime >= startTime * @param untilTime use events with eventTime < untilTime * @param ec ExecutionContext * @return Future[Option[PropertyMap]] */ @Experimental private[predictionio] def futureAggregatePropertiesOfEntity( appId: Int, channelId: Option[Int] = None, entityType: String, entityId: String, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None)(implicit ec: ExecutionContext): Future[Option[PropertyMap]] = { futureFind( appId = appId, channelId = channelId, startTime = startTime, untilTime = untilTime, entityType = Some(entityType), entityId = Some(entityId), eventNames = Some(LEventAggregator.eventNames) ).map{ eventIt => LEventAggregator.aggregatePropertiesSingle(eventIt) } } // following is blocking private[predictionio] def insert(event: Event, appId: Int, channelId: Option[Int] = None, timeout: Duration = defaultTimeout)(implicit ec: ExecutionContext): String = { Await.result(futureInsert(event, appId, channelId), timeout) } private[predictionio] def get(eventId: String, appId: Int, channelId: Option[Int] = None, timeout: Duration = defaultTimeout)(implicit ec: ExecutionContext): Option[Event] = { Await.result(futureGet(eventId, appId, channelId), timeout) } private[predictionio] def delete(eventId: String, appId: Int, channelId: Option[Int] = None, timeout: Duration = defaultTimeout)(implicit ec: ExecutionContext): Boolean = { Await.result(futureDelete(eventId, appId, channelId), timeout) } /** Reads from the database and returns an iterator of events.
* * @param appId return events of this app ID * @param channelId return events of this channel ID (default channel if it's None) * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param entityType return events of this entityType * @param entityId return events of this entityId * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. * @param limit Limit number of events. Get all events if None or Some(-1) * @param reversed Reverse the order (should be used with both * targetEntityType and targetEntityId specified) * - return oldest events first if None or Some(false) (default) * - return latest events first if Some(true) * @param ec ExecutionContext * @return Iterator[Event] */ private[predictionio] def find( appId: Int, channelId: Option[Int] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, entityType: Option[String] = None, entityId: Option[String] = None, eventNames: Option[Seq[String]] = None, targetEntityType: Option[Option[String]] = None, targetEntityId: Option[Option[String]] = None, limit: Option[Int] = None, reversed: Option[Boolean] = None, timeout: Duration = defaultTimeout)(implicit ec: ExecutionContext): Iterator[Event] = { Await.result(futureFind( appId = appId, channelId = channelId, startTime = startTime, untilTime = untilTime, entityType = entityType, entityId = entityId, eventNames = eventNames, targetEntityType = targetEntityType, targetEntityId = targetEntityId, limit = limit, reversed = 
reversed), timeout) } // NOTE: remove in next release @deprecated("Use find() instead.", "0.9.2") private[predictionio] def findLegacy( appId: Int, channelId: Option[Int] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, entityType: Option[String] = None, entityId: Option[String] = None, eventNames: Option[Seq[String]] = None, targetEntityType: Option[Option[String]] = None, targetEntityId: Option[Option[String]] = None, limit: Option[Int] = None, reversed: Option[Boolean] = None, timeout: Duration = defaultTimeout)(implicit ec: ExecutionContext): Either[StorageError, Iterator[Event]] = { try { // return Either for legacy usage Right(Await.result(futureFind( appId = appId, channelId = channelId, startTime = startTime, untilTime = untilTime, entityType = entityType, entityId = entityId, eventNames = eventNames, targetEntityType = targetEntityType, targetEntityId = targetEntityId, limit = limit, reversed = reversed), timeout)) } catch { case e: TimeoutException => Left(StorageError(s"${e}")) case e: Exception => Left(StorageError(s"${e}")) } } /** reads events of the specified entity. * * @param appId return events of this app ID * @param channelId return events of this channel ID (default channel if it's None) * @param entityType return events of this entityType * @param entityId return events of this entityId * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. 
* @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param limit Limit number of events. Get all events if None or Some(-1) * @param latest Return latest event first (default true) * @param ec ExecutionContext * @return Either[StorageError, Iterator[Event]] */ // NOTE: remove this function in next release @deprecated("Use LEventStore.findByEntity() instead.", "0.9.2") def findSingleEntity( appId: Int, channelId: Option[Int] = None, entityType: String, entityId: String, eventNames: Option[Seq[String]] = None, targetEntityType: Option[Option[String]] = None, targetEntityId: Option[Option[String]] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, limit: Option[Int] = None, latest: Boolean = true, timeout: Duration = defaultTimeout)(implicit ec: ExecutionContext): Either[StorageError, Iterator[Event]] = { findLegacy( appId = appId, channelId = channelId, startTime = startTime, untilTime = untilTime, entityType = Some(entityType), entityId = Some(entityId), eventNames = eventNames, targetEntityType = targetEntityType, targetEntityId = targetEntityId, limit = limit, reversed = Some(latest), timeout = timeout) } /** Aggregate properties of entities based on these special events: * \$set, \$unset, and \$delete. * Returns a Map of entityId to properties.
* * @param appId use events of this app ID * @param channelId use events of this channel ID (default channel if it's None) * @param entityType aggregate properties of the entities of this entityType * @param startTime use events with eventTime >= startTime * @param untilTime use events with eventTime < untilTime * @param required only keep entities with these required properties defined * @param ec ExecutionContext * @return Map[String, PropertyMap] */ private[predictionio] def aggregateProperties( appId: Int, channelId: Option[Int] = None, entityType: String, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, required: Option[Seq[String]] = None, timeout: Duration = defaultTimeout)(implicit ec: ExecutionContext): Map[String, PropertyMap] = { Await.result(futureAggregateProperties( appId = appId, channelId = channelId, entityType = entityType, startTime = startTime, untilTime = untilTime, required = required), timeout) } /** * :: Experimental :: * * Aggregate properties of the specified entity (entityType + entityId) * based on these special events: * \$set, \$unset, \$delete events. 
* Returns Option[PropertyMap]. * * @param appId use events of this app ID * @param channelId use events of this channel ID (default channel if it's None) * @param entityType the entityType * @param entityId the entityId * @param startTime use events with eventTime >= startTime * @param untilTime use events with eventTime < untilTime * @param timeout maximum time to wait for the result * @param ec ExecutionContext * @return Option[PropertyMap] */ @Experimental private[predictionio] def aggregatePropertiesOfEntity( appId: Int, channelId: Option[Int] = None, entityType: String, entityId: String, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, timeout: Duration = defaultTimeout)(implicit ec: ExecutionContext): Option[PropertyMap] = { Await.result(futureAggregatePropertiesOfEntity( appId = appId, channelId = channelId, entityType = entityType, entityId = entityId, startTime = startTime, untilTime = untilTime), timeout) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/Models.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ package org.apache.predictionio.data.storage import com.google.common.io.BaseEncoding import org.apache.predictionio.annotation.DeveloperApi import org.json4s._ /** :: DeveloperApi :: * Stores the model for each engine instance * * @param id ID of the model, which should be the same as engine instance ID * @param models Trained models of all algorithms * @group Model Data */ @DeveloperApi case class Model( id: String, models: Array[Byte]) /** :: DeveloperApi :: * Base trait of the [[Model]] data access object * * @group Model Data */ @DeveloperApi trait Models { /** Insert a new [[Model]] */ def insert(i: Model): Unit /** Get a [[Model]] by ID */ def get(id: String): Option[Model] /** Delete a [[Model]] */ def delete(id: String): Unit } /** :: DeveloperApi :: * JSON4S serializer for [[Model]] * * @group Model Data */ @DeveloperApi class ModelSerializer extends CustomSerializer[Model]( format => ({ case JObject(fields) => implicit val formats = DefaultFormats val seed = Model( id = "", models = Array[Byte]()) fields.foldLeft(seed) { case (i, field) => field match { case JField("id", JString(id)) => i.copy(id = id) case JField("models", JString(models)) => i.copy(models = BaseEncoding.base64.decode(models)) case _ => i } } }, { case i: Model => JObject( JField("id", JString(i.id)) :: JField("models", JString(BaseEncoding.base64.encode(i.models))) :: Nil) } )) // Use when models are saved outside the usual methods in pio case class NullModel() ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/PEventAggregator.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import org.joda.time.DateTime import org.json4s.JValue import org.apache.spark.rdd.RDD // each JValue data associated with the time it is set private[predictionio] case class PropTime(val d: JValue, val t: Long) extends Serializable private[predictionio] case class SetProp ( val fields: Map[String, PropTime], // last set time. Note: fields could be empty with valid set time val t: Long) extends Serializable { def ++ (that: SetProp): SetProp = { val commonKeys = fields.keySet.intersect(that.fields.keySet) val common: Map[String, PropTime] = commonKeys.map { k => val thisData = this.fields(k) val thatData = that.fields(k) // only keep the value with latest time val v = if (thisData.t > thatData.t) thisData else thatData (k, v) }.toMap val combinedFields = common ++ (this.fields -- commonKeys) ++ (that.fields -- commonKeys) // keep the latest set time val combinedT = if (this.t > that.t) this.t else that.t SetProp( fields = combinedFields, t = combinedT ) } } private[predictionio] case class UnsetProp (fields: Map[String, Long]) extends Serializable { def ++ (that: UnsetProp): UnsetProp = { val commonKeys = fields.keySet.intersect(that.fields.keySet) val common: Map[String, Long] = commonKeys.map { k => val thisData = this.fields(k) val thatData = that.fields(k) // only keep the value with latest time val v = if (thisData > thatData) thisData else thatData (k, v) 
}.toMap val combinedFields = common ++ (this.fields -- commonKeys) ++ (that.fields -- commonKeys) UnsetProp( fields = combinedFields ) } } private[predictionio] case class DeleteEntity (t: Long) extends Serializable { def ++ (that: DeleteEntity): DeleteEntity = { if (this.t > that.t) this else that } } private[predictionio] case class EventOp ( val setProp: Option[SetProp] = None, val unsetProp: Option[UnsetProp] = None, val deleteEntity: Option[DeleteEntity] = None, val firstUpdated: Option[DateTime] = None, val lastUpdated: Option[DateTime] = None ) extends Serializable { def ++ (that: EventOp): EventOp = { val firstUp = (this.firstUpdated ++ that.firstUpdated).reduceOption{ (a, b) => if (b.getMillis < a.getMillis) b else a } val lastUp = (this.lastUpdated ++ that.lastUpdated).reduceOption { (a, b) => if (b.getMillis > a.getMillis) b else a } EventOp( setProp = (setProp ++ that.setProp).reduceOption(_ ++ _), unsetProp = (unsetProp ++ that.unsetProp).reduceOption(_ ++ _), deleteEntity = (deleteEntity ++ that.deleteEntity).reduceOption(_ ++ _), firstUpdated = firstUp, lastUpdated = lastUp ) } def toPropertyMap(): Option[PropertyMap] = { setProp.flatMap { set => val unsetKeys: Set[String] = unsetProp.map( unset => unset.fields.filter{ case (k, v) => (v >= set.fields(k).t) }.keySet ).getOrElse(Set()) val combinedFields = deleteEntity.map { delete => if (delete.t >= set.t) { None } else { val deleteKeys: Set[String] = set.fields .filter { case (k, PropTime(kv, t)) => (delete.t >= t) }.keySet Some(set.fields -- unsetKeys -- deleteKeys) } }.getOrElse{ Some(set.fields -- unsetKeys) } // Note: mapValues() doesn't return concrete Map and causes // NotSerializableException issue. Use map(identity) to work around this. 
// see https://issues.scala-lang.org/browse/SI-7005 combinedFields.map{ f => require(firstUpdated.isDefined, "Unexpected Error: firstUpdated cannot be None.") require(lastUpdated.isDefined, "Unexpected Error: lastUpdated cannot be None.") PropertyMap( fields = f.mapValues(_.d).map(identity), firstUpdated = firstUpdated.get, lastUpdated = lastUpdated.get ) } } } } private[predictionio] object EventOp { // create EventOp from Event object def apply(e: Event): EventOp = { val t = e.eventTime.getMillis e.event match { case "$set" => { val fields = e.properties.fields.mapValues(jv => PropTime(jv, t) ).map(identity) EventOp( setProp = Some(SetProp(fields = fields, t = t)), firstUpdated = Some(e.eventTime), lastUpdated = Some(e.eventTime) ) } case "$unset" => { val fields = e.properties.fields.mapValues(jv => t).map(identity) EventOp( unsetProp = Some(UnsetProp(fields = fields)), firstUpdated = Some(e.eventTime), lastUpdated = Some(e.eventTime) ) } case "$delete" => { EventOp( deleteEntity = Some(DeleteEntity(t)), firstUpdated = Some(e.eventTime), lastUpdated = Some(e.eventTime) ) } case _ => { EventOp() } } } } private[predictionio] object PEventAggregator { val eventNames = List("$set", "$unset", "$delete") def aggregateProperties(eventsRDD: RDD[Event]): RDD[(String, PropertyMap)] = { eventsRDD .map( e => (e.entityId, EventOp(e) )) .aggregateByKey[EventOp](EventOp())( // within same partition seqOp = { case (u, v) => u ++ v }, // across partition combOp = { case (accu, u) => accu ++ u } ) .mapValues(_.toPropertyMap) .filter{ case (k, v) => v.isDefined } .map{ case (k, v) => (k, v.get) } } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/PEvents.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. 
See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import grizzled.slf4j.Logger import org.apache.predictionio.annotation.DeveloperApi import org.apache.predictionio.annotation.Experimental import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import org.joda.time.DateTime import scala.reflect.ClassTag /** :: DeveloperApi :: * Base trait of a data access object that returns [[Event]] related RDD data * structure. Engine developers should use * [[org.apache.predictionio.data.store.PEventStore]] instead of using this directly. * * @group Event Data */ @DeveloperApi trait PEvents extends Serializable { @transient protected lazy val logger = Logger[this.type] @deprecated("Use PEventStore.find() instead.", "0.9.2") def getByAppIdAndTimeAndEntity(appId: Int, startTime: Option[DateTime], untilTime: Option[DateTime], entityType: Option[String], entityId: Option[String])(sc: SparkContext): RDD[Event] = { find( appId = appId, startTime = startTime, untilTime = untilTime, entityType = entityType, entityId = entityId, eventNames = None )(sc) } /** :: DeveloperApi :: * Read from the database and return the events. The deprecation here is intended * for engine developers only.
* * @param appId return events of this app ID * @param channelId return events of this channel ID (default channel if it's None) * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param entityType return events of this entityType * @param entityId return events of this entityId * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. * @param sc Spark context * @return RDD[Event] */ @deprecated("Use PEventStore.find() instead.", "0.9.2") @DeveloperApi def find( appId: Int, channelId: Option[Int] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, entityType: Option[String] = None, entityId: Option[String] = None, eventNames: Option[Seq[String]] = None, targetEntityType: Option[Option[String]] = None, targetEntityId: Option[Option[String]] = None)(sc: SparkContext): RDD[Event] /** Aggregate properties of entities based on these special events: * \$set, \$unset, \$delete events. The deprecation here is intended for * engine developers only.
* * @param appId use events of this app ID * @param channelId use events of this channel ID (default channel if it's None) * @param entityType aggregate properties of the entities of this entityType * @param startTime use events with eventTime >= startTime * @param untilTime use events with eventTime < untilTime * @param required only keep entities with these required properties defined * @param sc Spark context * @return RDD[(String, PropertyMap)] RDD of entityId and PropertyMap pair */ @deprecated("Use PEventStore.aggregateProperties() instead.", "0.9.2") def aggregateProperties( appId: Int, channelId: Option[Int] = None, entityType: String, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, required: Option[Seq[String]] = None) (sc: SparkContext): RDD[(String, PropertyMap)] = { val eventRDD = find( appId = appId, channelId = channelId, startTime = startTime, untilTime = untilTime, entityType = Some(entityType), eventNames = Some(PEventAggregator.eventNames))(sc) val dmRDD = PEventAggregator.aggregateProperties(eventRDD) required map { r => dmRDD.filter { case (k, v) => r.forall(v.contains(_)) } } getOrElse dmRDD } /** :: Experimental :: * Extract EntityMap[A] from events for the entityType * NOTE: it is a local EntityMap[A] */ @deprecated("Use PEventStore.aggregateProperties() instead.", "0.9.2") @Experimental def extractEntityMap[A: ClassTag]( appId: Int, entityType: String, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, required: Option[Seq[String]] = None) (sc: SparkContext)(extract: DataMap => A): EntityMap[A] = { val idToData: Map[String, A] = aggregateProperties( appId = appId, entityType = entityType, startTime = startTime, untilTime = untilTime, required = required )(sc).map{ case (id, dm) => try { (id, extract(dm)) } catch { case e: Exception => { logger.error(s"Failed to extract entity from DataMap $dm of " + s"entityId $id.", e) throw e } } }.collectAsMap.toMap new EntityMap(idToData) } /**
:: DeveloperApi :: * Write events to database * * @param events RDD of Event * @param appId the app ID * @param sc Spark Context */ @DeveloperApi def write(events: RDD[Event], appId: Int)(sc: SparkContext): Unit = write(events, appId, None)(sc) /** :: DeveloperApi :: * Write events to database * * @param events RDD of Event * @param appId the app ID * @param channelId channel ID (default channel if it's None) * @param sc Spark Context */ @DeveloperApi def write(events: RDD[Event], appId: Int, channelId: Option[Int])(sc: SparkContext): Unit @DeveloperApi def delete(eventIds: RDD[String], appId: Int, channelId: Option[Int])(sc: SparkContext): Unit } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/PropertyMap.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import org.joda.time.DateTime import org.json4s.JValue import org.json4s.JObject import org.json4s.native.JsonMethods.parse /** A PropertyMap stores aggregated properties of the entity. * Internally it is a Map * whose keys are property names and values are corresponding JSON values * respectively. 
Use the get() method to retrieve the value of a mandatory * property or use getOpt() to retrieve the value of an optional property. * * @param fields Map of property name to JValue * @param firstUpdated first updated time of this PropertyMap * @param lastUpdated last updated time of this PropertyMap */ class PropertyMap( fields: Map[String, JValue], val firstUpdated: DateTime, val lastUpdated: DateTime ) extends DataMap(fields) { override def toString: String = s"PropertyMap(${fields}, ${firstUpdated}, ${lastUpdated})" override def hashCode: Int = 41 * ( 41 * ( 41 + fields.hashCode ) + firstUpdated.hashCode ) + lastUpdated.hashCode override def equals(other: Any): Boolean = other match { case that: PropertyMap => { (that.canEqual(this)) && (super.equals(that)) && (this.firstUpdated.equals(that.firstUpdated)) && (this.lastUpdated.equals(that.lastUpdated)) } case that: DataMap => { // for testing purposes super.equals(that) } case _ => false } override def canEqual(other: Any): Boolean = other.isInstanceOf[PropertyMap] } /** Companion object of the [[PropertyMap]] class. */ object PropertyMap { /** Create a PropertyMap from a Map of String to JValue, * firstUpdated and lastUpdated time. * * @param fields a Map of String to JValue * @param firstUpdated First updated time * @param lastUpdated Last updated time * @return a new PropertyMap */ def apply(fields: Map[String, JValue], firstUpdated: DateTime, lastUpdated: DateTime): PropertyMap = new PropertyMap(fields, firstUpdated, lastUpdated) /** Create a PropertyMap from a JSON String and firstUpdated and lastUpdated * time. * @param js JSON String.
e.g. """{ "a": 1, "b": "foo" }""" * @param firstUpdated First updated time * @param lastUpdated Last updated time * @return a new PropertyMap */ def apply(js: String, firstUpdated: DateTime, lastUpdated: DateTime) : PropertyMap = apply( fields = parse(js).asInstanceOf[JObject].obj.toMap, firstUpdated = firstUpdated, lastUpdated = lastUpdated ) } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/Storage.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import grizzled.slf4j.Logging import org.apache.commons.lang3.exception.ExceptionUtils import org.apache.predictionio.annotation.DeveloperApi import scala.concurrent.ExecutionContext.Implicits.global import scala.language.existentials import scala.reflect.runtime.universe._ /** :: DeveloperApi :: * Any storage backend driver will need to implement this trait with exactly * '''StorageClient''' as the class name. The PredictionIO storage layer will look * for this class when it instantiates the actual backend for use by higher * level storage access APIs.
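 *
 * A minimal sketch of a driver for a hypothetical backend whose source is
 * configured with `PIO_STORAGE_SOURCES_<NAME>_TYPE=mybackend`. The package
 * `mybackend`, the `MyConnection` class and the `MyBackend` prefix are
 * illustrative assumptions, not part of PredictionIO. The storage layer
 * instantiates the driver reflectively, passing the [[StorageClientConfig]]
 * as the only constructor argument, so the driver should expose exactly that
 * constructor.
 * {{{
 * package org.apache.predictionio.data.storage.mybackend
 *
 * import org.apache.predictionio.data.storage.{BaseStorageClient, StorageClientConfig}
 *
 * class StorageClient(val config: StorageClientConfig) extends BaseStorageClient {
 *   // Hypothetical connection built from the source's configuration properties
 *   override val client: AnyRef = new MyConnection(config.properties)
 *   // With this prefix, data objects are looked up as e.g. mybackend.MyBackendApps
 *   override val prefix: String = "MyBackend"
 * }
 * }}}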
* * @group Storage System */ @DeveloperApi trait BaseStorageClient { /** Configuration of the '''StorageClient''' */ val config: StorageClientConfig /** The actual client object. This could be a database connection or any kind * of database access object. */ val client: AnyRef /** Set a prefix for storage class discovery. As an example, if this prefix * is set as ''JDBC'', when the storage layer instantiates an implementation * of [[Apps]], it will try to look for a class named ''JDBCApps''. */ val prefix: String = "" } /** :: DeveloperApi :: * A wrapper of storage client configuration that will be populated by * PredictionIO automatically, and passed to the StorageClient during * instantiation. * * @param parallel This is set to true by PredictionIO when the storage client * is instantiated in a parallel data source. * @param test This is set to true by PredictionIO when tests are being run. * @param properties This is populated by PredictionIO automatically from * environmental configuration variables. If you have these * variables, * - PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc * - PIO_STORAGE_SOURCES_PGSQL_USERNAME=abc * - PIO_STORAGE_SOURCES_PGSQL_PASSWORD=xyz * * this field will be filled as a map of string to string: * - TYPE -> jdbc * - USERNAME -> abc * - PASSWORD -> xyz * * @group Storage System */ @DeveloperApi case class StorageClientConfig( parallel: Boolean = false, // parallelized access (RDD)? test: Boolean = false, // test mode config properties: Map[String, String] = Map.empty) /** :: DeveloperApi :: * Thrown when a StorageClient runs into an exceptional condition * * @param message Exception error message * @param cause The underlying exception that caused the exception * @group Storage System */ @DeveloperApi class StorageClientException(message: String, cause: Throwable) extends RuntimeException(message, cause) /** DEPRECATED. Use [[StorageException]]. 
* * @deprecated Use [[StorageException]] */ private[predictionio] case class StorageError(message: String) /** :: DeveloperApi :: * Thrown by data access objects when they run into exceptional conditions * * @param message Exception error message * @param cause The underlying exception that caused the exception * * @group Storage System */ @DeveloperApi class StorageException(message: String, cause: Throwable) extends Exception(message, cause) { def this(message: String) = this(message, null) } class EnvironmentService{ def envKeys(): Iterable[String] = { sys.env.keys } def getByKey(key: String): String = { sys.env(key) } def filter(filterExpression: ((String, String)) => Boolean): Map[String, String] = { sys.env.filter(filterExpression) } } object EnvironmentFactory{ var environmentService: Option[EnvironmentService] = None def create(): EnvironmentService = { if(environmentService.isEmpty){ environmentService = new Some[EnvironmentService](new EnvironmentService) } environmentService.get } } /** Backend-agnostic data storage layer with lazy initialization. Use this * object when you need to interface with Event Store in your engine. 
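 *
 * For example, assuming the METADATA and EVENTDATA repositories are configured
 * through the `PIO_STORAGE_REPOSITORIES_<NAME>_NAME` and
 * `PIO_STORAGE_REPOSITORIES_<NAME>_SOURCE` variables (the app name "MyApp"
 * below is illustrative):
 * {{{
 * // Resolve an app ID through the metadata repository
 * val appId: Option[Int] = Storage.getMetaDataApps().getByName("MyApp").map(_.id)
 * // Obtain the local (non-Spark) event data access object
 * val events: LEvents = Storage.getLEvents()
 * }}}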
* * @group Storage System */ object Storage extends Logging { private case class ClientMeta( sourceType: String, client: BaseStorageClient, config: StorageClientConfig) var environmentService: EnvironmentService = EnvironmentFactory.create private case class DataObjectMeta(sourceName: String, namespace: String) private var errors = 0 private val sourcesPrefix = "PIO_STORAGE_SOURCES" private val sourceTypesRegex = """PIO_STORAGE_SOURCES_([^_]+)_TYPE""".r private val sourceKeys: Seq[String] = environmentService.envKeys.toSeq.flatMap { k => sourceTypesRegex findFirstIn k match { case Some(sourceTypesRegex(sourceType)) => Seq(sourceType) case None => Nil } } if (sourceKeys.size == 0) warn("There is no properly configured data source.") private val s2cm = scala.collection.mutable.Map[String, Option[ClientMeta]]() /** Reference to the app data repository. */ private val EventDataRepository = "EVENTDATA" private val ModelDataRepository = "MODELDATA" private val MetaDataRepository = "METADATA" private val repositoriesPrefix = "PIO_STORAGE_REPOSITORIES" private val repositoryNamesRegex = """PIO_STORAGE_REPOSITORIES_([^_]+)_NAME""".r private val repositoryKeys: Seq[String] = environmentService.envKeys.toSeq.flatMap { k => repositoryNamesRegex findFirstIn k match { case Some(repositoryNamesRegex(repositoryName)) => Seq(repositoryName) case None => Nil } } if (repositoryKeys.size == 0) { warn("There is no properly configured repository.") } private val requiredRepositories = Seq(MetaDataRepository) requiredRepositories foreach { r => if (!repositoryKeys.contains(r)) { error(s"Required repository (${r}) configuration is missing.") errors += 1 } } private val repositoriesToDataObjectMeta: Map[String, DataObjectMeta] = repositoryKeys.map(r => try { val keyedPath = repositoriesPrefixPath(r) val name = environmentService.getByKey(prefixPath(keyedPath, "NAME")) val sourceName = environmentService.getByKey(prefixPath(keyedPath, "SOURCE")) if (sourceKeys.contains(sourceName)) { r -> 
DataObjectMeta( sourceName = sourceName, namespace = name) } else { error(s"$sourceName is not a configured storage source.") r -> DataObjectMeta("", "") } } catch { case e: Throwable => val stackTrace = ExceptionUtils.getStackTrace(e) error(s"${e.getMessage}\n${stackTrace}\n\n") errors += 1 r -> DataObjectMeta("", "") } ).toMap if (errors > 0) { error(s"There were $errors configuration errors. Exiting.") sys.exit(errors) } // End of constructor and field definitions and begin method definitions private def prefixPath(prefix: String, body: String) = s"${prefix}_$body" private def sourcesPrefixPath(body: String) = prefixPath(sourcesPrefix, body) private def repositoriesPrefixPath(body: String) = prefixPath(repositoriesPrefix, body) private def sourcesToClientMeta( source: String, parallel: Boolean, test: Boolean): Option[ClientMeta] = { val sourceName = if (parallel) s"parallel-$source" else source s2cm.getOrElseUpdate(sourceName, updateS2CM(source, parallel, test)) } private def getClient( clientConfig: StorageClientConfig, pkg: String): BaseStorageClient = { val className = "org.apache.predictionio.data.storage." + pkg + ".StorageClient" try { Class.forName(className).getConstructors()(0).newInstance(clientConfig). asInstanceOf[BaseStorageClient] } catch { case e: ClassNotFoundException => val originalClassName = pkg + ".StorageClient" Class.forName(originalClassName).getConstructors()(0). 
newInstance(clientConfig).asInstanceOf[BaseStorageClient] case e: java.lang.reflect.InvocationTargetException => throw e.getCause } } /** Get the StorageClient config data from PIO Framework's environment variables */ def getConfig(sourceName: String): Option[StorageClientConfig] = { if (s2cm.contains(sourceName) && s2cm.get(sourceName).nonEmpty && s2cm.get(sourceName).get.nonEmpty) { Some(s2cm.get(sourceName).get.get.config) } else None } private def updateS2CM(k: String, parallel: Boolean, test: Boolean): Option[ClientMeta] = { try { val keyedPath = sourcesPrefixPath(k) val sourceType = environmentService.getByKey(prefixPath(keyedPath, "TYPE")) val props = environmentService.filter(t => t._1.startsWith(keyedPath)).map( t => t._1.replace(s"${keyedPath}_", "") -> t._2) val clientConfig = StorageClientConfig( properties = props, parallel = parallel, test = test) val client = getClient(clientConfig, sourceType) Some(ClientMeta(sourceType, client, clientConfig)) } catch { case e: Throwable => val stackTrace = ExceptionUtils.getStackTrace(e) error(s"Error initializing storage client for source ${k}.\n" + s"${stackTrace}\n\n") errors += 1 None } } private[predictionio] def getDataObjectFromRepo[T](repo: String, test: Boolean = false) (implicit tag: TypeTag[T]): T = { val repoDOMeta = repositoriesToDataObjectMeta(repo) val repoDOSourceName = repoDOMeta.sourceName getDataObject[T](repoDOSourceName, repoDOMeta.namespace, test = test) } private[predictionio] def getPDataObject[T](repo: String)(implicit tag: TypeTag[T]): T = { val repoDOMeta = repositoriesToDataObjectMeta(repo) val repoDOSourceName = repoDOMeta.sourceName getPDataObject[T](repoDOSourceName, repoDOMeta.namespace) } private[predictionio] def getDataObject[T]( sourceName: String, namespace: String, parallel: Boolean = false, test: Boolean = false)(implicit tag: TypeTag[T]): T = { val clientMeta = sourcesToClientMeta(sourceName, parallel, test) getOrElse { throw new StorageClientException( s"Data source 
$sourceName was not properly initialized.", null) } val sourceType = clientMeta.sourceType val ctorArgs = dataObjectCtorArgs(clientMeta.client, namespace) val classPrefix = clientMeta.client.prefix val originalClassName = tag.tpe.toString.split('.') val rawClassName = sourceType + "." + classPrefix + originalClassName.last val className = "org.apache.predictionio.data.storage." + rawClassName val clazz = try { Class.forName(className) } catch { case e: ClassNotFoundException => try { Class.forName(rawClassName) } catch { case e: ClassNotFoundException => throw new StorageClientException("No storage backend " + "implementation can be found (tried both " + s"$className and $rawClassName)", e) } } val constructor = clazz.getConstructors()(0) try { constructor.newInstance(ctorArgs: _*). asInstanceOf[T] } catch { case e: IllegalArgumentException => error( "Unable to instantiate data object with class '" + constructor.getDeclaringClass.getName + "' because its constructor" + " does not have the right number of arguments." + " Number of required constructor arguments: " + ctorArgs.size + "." + " Number of existing constructor arguments: " + constructor.getParameterTypes.size + "." + s" Storage source name: ${sourceName}."
+ s" Exception message: ${e.getMessage}.", e) errors += 1 throw e case e: java.lang.reflect.InvocationTargetException => throw e.getCause } } private def getPDataObject[T]( sourceName: String, databaseName: String)(implicit tag: TypeTag[T]): T = getDataObject[T](sourceName, databaseName, true) private def dataObjectCtorArgs( client: BaseStorageClient, namespace: String): Seq[AnyRef] = { Seq(client.client, client.config, namespace) } private[predictionio] def verifyAllDataObjects(): Unit = { info("Verifying Meta Data Backend (Source: " + s"${repositoriesToDataObjectMeta(MetaDataRepository).sourceName})...") getMetaDataEngineInstances() getMetaDataEvaluationInstances() getMetaDataApps() getMetaDataAccessKeys() info("Verifying Model Data Backend (Source: " + s"${repositoriesToDataObjectMeta(ModelDataRepository).sourceName})...") getModelDataModels() info("Verifying Event Data Backend (Source: " + s"${repositoriesToDataObjectMeta(EventDataRepository).sourceName})...") val eventsDb = getLEvents(test = true) info("Test writing to Event Store (App Id 0)...") // use appId=0 for testing purposes eventsDb.init(0) eventsDb.insert(Event( event = "test", entityType = "test", entityId = "test"), 0) eventsDb.remove(0) eventsDb.close() } /** :: DeveloperApi :: * Get a data access object for [[EngineInstances]] * * @return An implementation of [[EngineInstances]], depending on the runtime configuration */ def getMetaDataEngineInstances(): EngineInstances = getDataObjectFromRepo[EngineInstances](MetaDataRepository) /** :: DeveloperApi :: * Get a data access object for [[EvaluationInstances]] * * @return An implementation of [[EvaluationInstances]], depending on the runtime configuration */ def getMetaDataEvaluationInstances(): EvaluationInstances = getDataObjectFromRepo[EvaluationInstances](MetaDataRepository) /** :: DeveloperApi :: * Get a data access object for [[Apps]] * * @return An implementation of [[Apps]], depending on the runtime configuration */ def getMetaDataApps(): Apps
= getDataObjectFromRepo[Apps](MetaDataRepository) /** :: DeveloperApi :: * Get a data access object for [[AccessKeys]] * * @return An implementation of [[AccessKeys]], depending on the runtime configuration */ def getMetaDataAccessKeys(): AccessKeys = getDataObjectFromRepo[AccessKeys](MetaDataRepository) /** :: DeveloperApi :: * Get a data access object for [[Channels]] * * @return An implementation of [[Channels]], depending on the runtime configuration */ def getMetaDataChannels(): Channels = getDataObjectFromRepo[Channels](MetaDataRepository) /** :: DeveloperApi :: * Get a data access object for [[Models]] * * @return An implementation of [[Models]], depending on the runtime configuration */ def getModelDataModels(): Models = getDataObjectFromRepo[Models](ModelDataRepository) /** Obtains a data access object that returns [[Event]] related local data * structure. */ def getLEvents(test: Boolean = false): LEvents = getDataObjectFromRepo[LEvents](EventDataRepository, test = test) /** Obtains a data access object that returns [[Event]] related RDD data * structure. */ def getPEvents(): PEvents = getPDataObject[PEvents](EventDataRepository) def config: Map[String, Map[String, Map[String, String]]] = Map( "sources" -> s2cm.toMap.map { case (source, clientMeta) => source -> clientMeta.map { cm => Map( "type" -> cm.sourceType, "config" -> cm.config.properties.map(t => s"${t._1} -> ${t._2}").mkString(", ") ) }.getOrElse(Map.empty) } ) } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/Utils.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. 
* The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.storage import org.joda.time.DateTime import org.joda.time.format.ISODateTimeFormat /** Backend-agnostic storage utilities. */ private[predictionio] object Utils { /** Add prefix to custom attribute keys. */ def addPrefixToAttributeKeys[T]( attributes: Map[String, T], prefix: String = "ca_"): Map[String, T] = { attributes map { case (k, v) => (prefix + k, v) } } /** Remove prefix from custom attribute keys. */ def removePrefixFromAttributeKeys[T]( attributes: Map[String, T], prefix: String = "ca_"): Map[String, T] = { attributes map { case (k, v) => (k.stripPrefix(prefix), v) } } /** Appends App ID to any ID. * Used for distinguishing different apps' data within a single collection. */ def idWithAppid(appid: Int, id: String): String = appid + "_" + id def stringToDateTime(dt: String): DateTime = ISODateTimeFormat.dateTimeParser.parseDateTime(dt) } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/storage/package.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data /** If you are an engine developer, please refer to the [[store]] package. * * This package provides convenient access to underlying data access objects. * The common entry point is [[Storage]]. * * Developer APIs are available to advanced developers to add support of other * data store backends. */ package object storage {} ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/store/Common.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.store import org.apache.predictionio.data.storage.Storage import scala.collection.mutable import grizzled.slf4j.Logger private[predictionio] object Common { @transient lazy val logger = Logger[this.type] @transient lazy private val appsDb = Storage.getMetaDataApps() @transient lazy private val channelsDb = Storage.getMetaDataChannels() // Memoize app & channel name-to-ID resolution to avoid excessive storage IO @transient lazy val appNameToIdCache = mutable.Map[(String, Option[String]), (Int, Option[Int])]() /* throw exception if invalid app name or channel name */ def appNameToId(appName: String, channelName: Option[String]): (Int, Option[Int]) = { appNameToIdCache.getOrElseUpdate((appName, channelName), { val appOpt = appsDb.getByName(appName) appOpt.map { app => val channelMap: Map[String, Int] = channelsDb.getByAppid(app.id) .map(c => (c.name, c.id)).toMap val channelId: Option[Int] = channelName.map { ch => if (channelMap.contains(ch)) { channelMap(ch) } else { logger.error(s"Invalid channel name ${ch}.") throw new IllegalArgumentException(s"Invalid channel name ${ch}.") } } appNameToIdCache((appName, channelName)) = (app.id, channelId) (app.id, channelId) }.getOrElse { logger.error(s"Invalid app name ${appName}") throw new IllegalArgumentException(s"Invalid app name ${appName}") } }) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/store/LEventStore.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.store import org.apache.predictionio.data.storage.Storage import org.apache.predictionio.data.storage.Event import org.joda.time.DateTime import scala.concurrent.{Await, ExecutionContext, Future} import scala.concurrent.duration.Duration /** This object provides a set of operations to access the Event Store * without going through Spark's parallelization. * * Note that the blocking methods of this object use * `scala.concurrent.ExecutionContext.Implicits.global` internally. * Since this thread pool has, by default, as many threads as there are available * processors, parallelism is limited to the number of processors. * * If this limitation becomes a resource bottleneck, you can increase the * number of threads by setting the following JVM options before calling "pio deploy": * *
  * export JAVA_OPTS="$JAVA_OPTS \
  *   -Dscala.concurrent.context.numThreads=1000 \
  *   -Dscala.concurrent.context.maxThreads=1000"
  * 
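  *
  * For example, an Algorithm's predict() might fetch a user's recent events
  * with a short timeout. This is a sketch only: the app name "MyApp", entity
  * type "user", entity ID "u1", event name "view" and the pool size of 32 are
  * hypothetical placeholders. If the timeout elapses, `Await.result` throws a
  * `java.util.concurrent.TimeoutException`.
  * {{{
  * import java.util.concurrent.Executors
  * import org.apache.predictionio.data.storage.Event
  * import org.apache.predictionio.data.store.LEventStore
  * import scala.concurrent.{ExecutionContext, Future}
  * import scala.concurrent.duration._
  *
  * // Blocking call on the global execution context: up to 10 of the most
  * // recent "view" events of user "u1", waiting at most 200 ms.
  * val recentViews: List[Event] = LEventStore.findByEntity(
  *   appName = "MyApp",
  *   entityType = "user",
  *   entityId = "u1",
  *   eventNames = Some(Seq("view")),
  *   limit = Some(10),
  *   timeout = 200.millis).toList
  *
  * // Non-blocking alternative: the async variant takes an implicit
  * // ExecutionContext, so a dedicated pool (hypothetical size 32) sidesteps
  * // the global pool's per-processor limit.
  * implicit val ec: ExecutionContext =
  *   ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(32))
  * val futureViews: Future[Iterator[Event]] = LEventStore.findByEntityAsync(
  *   appName = "MyApp",
  *   entityType = "user",
  *   entityId = "u1",
  *   eventNames = Some(Seq("view")),
  *   limit = Some(10))
  * }}}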
* * You can learn more about the global execution context in the Scala documentation: * [[https://docs.scala-lang.org/overviews/core/futures.html#the-global-execution-context]] */ object LEventStore { private val defaultTimeout = Duration(60, "seconds") @transient lazy private val eventsDb = Storage.getLEvents() /** Reads events of the specified entity. May use this in Algorithm's predict() * or Serving logic to have fast event store access. * * @param appName return events of this app * @param entityType return events of this entityType * @param entityId return events of this entityId * @param channelName return events of this channel (default channel if it's None) * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param limit Limit number of events. 
Get all events if None or Some(-1) * @param latest Return latest event first (default true) * @return Iterator[Event] */ def findByEntity( appName: String, entityType: String, entityId: String, channelName: Option[String] = None, eventNames: Option[Seq[String]] = None, targetEntityType: Option[Option[String]] = None, targetEntityId: Option[Option[String]] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, limit: Option[Int] = None, latest: Boolean = true, timeout: Duration = defaultTimeout): Iterator[Event] = { // Import here to ensure ExecutionContext.Implicits.global is used only in this method import scala.concurrent.ExecutionContext.Implicits.global Await.result(findByEntityAsync( appName = appName, entityType = entityType, entityId = entityId, channelName = channelName, eventNames = eventNames, targetEntityType = targetEntityType, targetEntityId = targetEntityId, startTime = startTime, untilTime = untilTime, limit = limit, latest = latest), timeout) } /** Reads events of the specified entity. May use this in Algorithm's predict() * or Serving logic to have fast event store access. * * @param appName return events of this app * @param entityType return events of this entityType * @param entityId return events of this entityId * @param channelName return events of this channel (default channel if it's None) * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. 
* @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param limit Limit number of events. Get all events if None or Some(-1) * @param latest Return latest event first (default true) * @return Future[Iterator[Event]] */ def findByEntityAsync( appName: String, entityType: String, entityId: String, channelName: Option[String] = None, eventNames: Option[Seq[String]] = None, targetEntityType: Option[Option[String]] = None, targetEntityId: Option[Option[String]] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, limit: Option[Int] = None, latest: Boolean = true)(implicit ec: ExecutionContext): Future[Iterator[Event]] = { val (appId, channelId) = Common.appNameToId(appName, channelName) eventsDb.futureFind( appId = appId, channelId = channelId, startTime = startTime, untilTime = untilTime, entityType = Some(entityType), entityId = Some(entityId), eventNames = eventNames, targetEntityType = targetEntityType, targetEntityId = targetEntityId, limit = limit, reversed = Some(latest)) } /** Reads events generically. If entityType or entityId is not specified, it * results in table scan. * * @param appName return events of this app * @param entityType return events of this entityType * - None means no restriction on entityType * - Some(x) means entityType should match x. * @param entityId return events of this entityId * - None means no restriction on entityId * - Some(x) means entityId should match x. * @param channelName return events of this channel (default channel if it's None) * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. 
* @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param limit Limit number of events. Get all events if None or Some(-1) * @return Iterator[Event] */ def find( appName: String, entityType: Option[String] = None, entityId: Option[String] = None, channelName: Option[String] = None, eventNames: Option[Seq[String]] = None, targetEntityType: Option[Option[String]] = None, targetEntityId: Option[Option[String]] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, limit: Option[Int] = None, timeout: Duration = defaultTimeout): Iterator[Event] = { // Import here to ensure ExecutionContext.Implicits.global is used only in this method import scala.concurrent.ExecutionContext.Implicits.global Await.result(findAsync( appName = appName, entityType = entityType, entityId = entityId, channelName = channelName, eventNames = eventNames, targetEntityType = targetEntityType, targetEntityId = targetEntityId, startTime = startTime, untilTime = untilTime, limit = limit), timeout) } /** Reads events generically. If entityType or entityId is not specified, it * results in table scan. * * @param appName return events of this app * @param entityType return events of this entityType * - None means no restriction on entityType * - Some(x) means entityType should match x. * @param entityId return events of this entityId * - None means no restriction on entityId * - Some(x) means entityId should match x. * @param channelName return events of this channel (default channel if it's None) * @param eventNames return events with any of these event names. 
* @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param limit Limit number of events. Get all events if None or Some(-1) * @return Future[Iterator[Event]] */ def findAsync( appName: String, entityType: Option[String] = None, entityId: Option[String] = None, channelName: Option[String] = None, eventNames: Option[Seq[String]] = None, targetEntityType: Option[Option[String]] = None, targetEntityId: Option[Option[String]] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, limit: Option[Int] = None)(implicit ec: ExecutionContext): Future[Iterator[Event]] = { val (appId, channelId) = Common.appNameToId(appName, channelName) eventsDb.futureFind( appId = appId, channelId = channelId, startTime = startTime, untilTime = untilTime, entityType = entityType, entityId = entityId, eventNames = eventNames, targetEntityType = targetEntityType, targetEntityId = targetEntityId, limit = limit) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/store/PEventStore.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.store import org.apache.predictionio.data.storage.Storage import org.apache.predictionio.data.storage.Event import org.apache.predictionio.data.storage.PropertyMap import org.joda.time.DateTime import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import scala.concurrent.ExecutionContext /** This object provides a set of operations to access the Event Store * with Spark's parallelization. */ object PEventStore { @transient lazy private val eventsDb = Storage.getPEvents() /** Read events from Event Store * * @param appName return events of this app * @param channelName return events of this channel (default channel if it's None) * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param entityType return events of this entityType * @param entityId return events of this entityId * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x.
* @param sc Spark context * @return RDD[Event] */ def find( appName: String, channelName: Option[String] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, entityType: Option[String] = None, entityId: Option[String] = None, eventNames: Option[Seq[String]] = None, targetEntityType: Option[Option[String]] = None, targetEntityId: Option[Option[String]] = None )(sc: SparkContext): RDD[Event] = { val (appId, channelId) = Common.appNameToId(appName, channelName) eventsDb.find( appId = appId, channelId = channelId, startTime = startTime, untilTime = untilTime, entityType = entityType, entityId = entityId, eventNames = eventNames, targetEntityType = targetEntityType, targetEntityId = targetEntityId )(sc) } /** Aggregate properties of entities based on these special events: * \$set, \$unset, \$delete events. * * @param appName use events of this app * @param entityType aggregate properties of the entities of this entityType * @param channelName use events of this channel (default channel if it's None) * @param startTime use events with eventTime >= startTime * @param untilTime use events with eventTime < untilTime * @param required only keep entities with these required properties defined * @param sc Spark context * @return RDD[(String, PropertyMap)] RDD of (entityId, PropertyMap) pairs */ def aggregateProperties( appName: String, entityType: String, channelName: Option[String] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, required: Option[Seq[String]] = None) (sc: SparkContext): RDD[(String, PropertyMap)] = { val (appId, channelId) = Common.appNameToId(appName, channelName) eventsDb.aggregateProperties( appId = appId, entityType = entityType, channelId = channelId, startTime = startTime, untilTime = untilTime, required = required )(sc) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/store/java/LJavaEventStore.scala
================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.store.java import java.util.concurrent.{CompletableFuture, CompletionStage, ExecutorService} import org.apache.predictionio.data.storage.Event import org.apache.predictionio.data.store.LEventStore import org.joda.time.DateTime import scala.collection.JavaConversions import scala.concurrent.ExecutionContext.fromExecutorService import scala.concurrent.duration.Duration import scala.compat.java8.FutureConverters._ /** This Java-friendly object provides a set of operations to access the Event Store * without going through Spark's parallelization. * * Note that the blocking methods of this object use * `scala.concurrent.ExecutionContext.Implicits.global` internally. * Since this thread pool has, by default, as many threads as there are available * processors, parallelism is limited to the number of processors. * * If this limitation becomes a resource bottleneck, you can increase the * number of threads by setting the following JVM options before calling "pio deploy": * *
  * export JAVA_OPTS="$JAVA_OPTS \
  *   -Dscala.concurrent.context.numThreads=1000 \
  *   -Dscala.concurrent.context.maxThreads=1000"
  * 
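  *
  * For example, from Java (a sketch only; "MyApp", "user", "u1" and "view"
  * are hypothetical placeholders). `OptionHelper` builds the Scala `Option`
  * arguments that this object expects:
  * {{{
  * import java.util.Collections;
  * import java.util.List;
  * import java.util.concurrent.TimeUnit;
  * import org.apache.predictionio.data.storage.Event;
  * import org.apache.predictionio.data.store.java.LJavaEventStore;
  * import org.apache.predictionio.data.store.java.OptionHelper;
  * import org.joda.time.DateTime;
  * import scala.Option;
  * import scala.concurrent.duration.Duration;
  *
  * List<Event> recentViews = LJavaEventStore.findByEntity(
  *     "MyApp",                                              // appName
  *     "user",                                               // entityType
  *     "u1",                                                 // entityId
  *     OptionHelper.<String>none(),                          // default channel
  *     OptionHelper.some(Collections.singletonList("view")), // eventNames
  *     OptionHelper.<Option<String>>none(),                  // any targetEntityType
  *     OptionHelper.<Option<String>>none(),                  // any targetEntityId
  *     OptionHelper.<DateTime>none(),                        // no startTime
  *     OptionHelper.<DateTime>none(),                        // no untilTime
  *     OptionHelper.some(10),                                // limit
  *     true,                                                 // latest first
  *     Duration.create(200, TimeUnit.MILLISECONDS));         // timeout
  * }}}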
* * You can learn more about the global execution context in the Scala documentation: * [[https://docs.scala-lang.org/overviews/core/futures.html#the-global-execution-context]] */ object LJavaEventStore { /** Reads events of the specified entity. May use this in Algorithm's predict() * or Serving logic to have fast event store access. * * @param appName return events of this app * @param entityType return events of this entityType * @param entityId return events of this entityId * @param channelName return events of this channel (default channel if it's None) * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param limit Limit number of events. 
Get all events if None or Some(-1) * @param latest Return latest event first * @return java.util.List[Event] */ def findByEntity( appName: String, entityType: String, entityId: String, channelName: Option[String], eventNames: Option[java.util.List[String]], targetEntityType: Option[Option[String]], targetEntityId: Option[Option[String]], startTime: Option[DateTime], untilTime: Option[DateTime], limit: Option[Integer], latest: Boolean, timeout: Duration): java.util.List[Event] = { val eventNamesSeq = eventNames.map(JavaConversions.asScalaBuffer(_).toSeq) val limitInt = limit.map(_.intValue()) JavaConversions.seqAsJavaList( LEventStore.findByEntity( appName, entityType, entityId, channelName, eventNamesSeq, targetEntityType, targetEntityId, startTime, untilTime, limitInt, latest, timeout ).toSeq) } /** Reads events of the specified entity. May use this in Algorithm's predict() * or Serving logic to have fast event store access. * * @param appName return events of this app * @param entityType return events of this entityType * @param entityId return events of this entityId * @param channelName return events of this channel (default channel if it's None) * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param limit Limit number of events. 
Get all events if None or Some(-1) * @param latest Return latest event first * @return CompletableFuture[java.util.List[Event]] */ def findByEntityAsync( appName: String, entityType: String, entityId: String, channelName: Option[String], eventNames: Option[java.util.List[String]], targetEntityType: Option[Option[String]], targetEntityId: Option[Option[String]], startTime: Option[DateTime], untilTime: Option[DateTime], limit: Option[Integer], latest: Boolean, executorService: ExecutorService): CompletableFuture[java.util.List[Event]] = { val eventNamesSeq = eventNames.map(JavaConversions.asScalaBuffer(_).toSeq) val limitInt = limit.map(_.intValue()) implicit val ec = fromExecutorService(executorService) LEventStore.findByEntityAsync( appName, entityType, entityId, channelName, eventNamesSeq, targetEntityType, targetEntityId, startTime, untilTime, limitInt, latest ).map { x => JavaConversions.seqAsJavaList(x.toSeq) }.toJava.toCompletableFuture } /** Reads events generically. If entityType or entityId is not specified, it * results in table scan. * * @param appName return events of this app * @param entityType return events of this entityType * - None means no restriction on entityType * - Some(x) means entityType should match x. * @param entityId return events of this entityId * - None means no restriction on entityId * - Some(x) means entityId should match x. * @param channelName return events of this channel (default channel if it's None) * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. 
* @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param limit Limit number of events. Get all events if None or Some(-1) * @return java.util.List[Event] */ def find( appName: String, entityType: Option[String], entityId: Option[String], channelName: Option[String], eventNames: Option[java.util.List[String]], targetEntityType: Option[Option[String]], targetEntityId: Option[Option[String]], startTime: Option[DateTime], untilTime: Option[DateTime], limit: Option[Integer], timeout: Duration): java.util.List[Event] = { val eventNamesSeq = eventNames.map(JavaConversions.asScalaBuffer(_).toSeq) val limitInt = limit.map(_.intValue()) JavaConversions.seqAsJavaList( LEventStore.find( appName, entityType, entityId, channelName, eventNamesSeq, targetEntityType, targetEntityId, startTime, untilTime, limitInt, timeout ).toSeq) } /** Reads events generically. If entityType or entityId is not specified, it * results in table scan. * * @param appName return events of this app * @param entityType return events of this entityType * - None means no restriction on entityType * - Some(x) means entityType should match x. * @param entityId return events of this entityId * - None means no restriction on entityId * - Some(x) means entityId should match x. * @param channelName return events of this channel (default channel if it's None) * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x. 
* @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param limit Limit number of events. Get all events if None or Some(-1) * @return CompletableFuture[java.util.List[Event]] */ def findAsync( appName: String, entityType: Option[String], entityId: Option[String], channelName: Option[String], eventNames: Option[java.util.List[String]], targetEntityType: Option[Option[String]], targetEntityId: Option[Option[String]], startTime: Option[DateTime], untilTime: Option[DateTime], limit: Option[Integer], executorService: ExecutorService): CompletableFuture[java.util.List[Event]] = { val eventNamesSeq = eventNames.map(JavaConversions.asScalaBuffer(_).toSeq) val limitInt = limit.map(_.intValue()) implicit val ec = fromExecutorService(executorService) LEventStore.findAsync( appName, entityType, entityId, channelName, eventNamesSeq, targetEntityType, targetEntityId, startTime, untilTime, limitInt ).map { x => JavaConversions.seqAsJavaList(x.toSeq) }.toJava.toCompletableFuture } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/store/java/OptionHelper.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
* See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.store.java /** Used by Java-based engines to mock Some and None */ object OptionHelper { /** Mimics a None from Java-based engine */ def none[T]: Option[T] = { Option(null.asInstanceOf[T]) } /** Mimics a Some from Java-based engine */ def some[T](value: T): Option[T] = { Some(value) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/store/java/PJavaEventStore.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.store.java import org.apache.predictionio.data.storage.Event import org.apache.predictionio.data.storage.PropertyMap import org.apache.predictionio.data.store.PEventStore import org.apache.spark.SparkContext import org.apache.spark.api.java.JavaRDD import org.joda.time.DateTime import scala.collection.JavaConversions /** This Java-friendly object provides a set of operations to access the Event Store * with Spark's parallelization. */ object PJavaEventStore { /** Read events from Event Store * * @param appName return events of this app * @param channelName return events of this channel (default channel if it's None) * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param entityType return events of this entityType * @param entityId return events of this entityId * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - None means no restriction on targetEntityType * - Some(None) means no targetEntityType for this event * - Some(Some(x)) means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - None means no restriction on targetEntityId * - Some(None) means no targetEntityId for this event * - Some(Some(x)) means targetEntityId should match x.
* @param sc Spark context * @return JavaRDD[Event] */ def find( appName: String, channelName: Option[String], startTime: Option[DateTime], untilTime: Option[DateTime], entityType: Option[String], entityId: Option[String], eventNames: Option[java.util.List[String]], targetEntityType: Option[Option[String]], targetEntityId: Option[Option[String]], sc: SparkContext): JavaRDD[Event] = { val eventNamesSeq = eventNames.map(JavaConversions.asScalaBuffer(_).toSeq) PEventStore.find( appName, channelName, startTime, untilTime, entityType, entityId, eventNamesSeq, targetEntityType, targetEntityId )(sc) } /** Aggregate properties of entities based on these special events: * \$set, \$unset, \$delete events. * * @param appName use events of this app * @param entityType aggregate properties of the entities of this entityType * @param channelName use events of this channel (default channel if it's None) * @param startTime use events with eventTime >= startTime * @param untilTime use events with eventTime < untilTime * @param required only keep entities with these required properties defined * @param sc Spark context * @return JavaRDD[(String, PropertyMap)] JavaRDD of (entityId, PropertyMap) pairs */ def aggregateProperties( appName: String, entityType: String, channelName: Option[String], startTime: Option[DateTime], untilTime: Option[DateTime], required: Option[java.util.List[String]], sc: SparkContext): JavaRDD[(String, PropertyMap)] = { val requiredSeq = required.map(JavaConversions.asScalaBuffer(_).toSeq) PEventStore.aggregateProperties( appName, entityType, channelName, startTime, untilTime, requiredSeq )(sc) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/store/package.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data /** Provides high level interfaces to the Event Store from within a prediction * engine. */ package object store {} ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/store/python/PPythonEventStore.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.store.python import java.sql.Timestamp import org.apache.predictionio.data.store.PEventStore import org.apache.spark.sql.{DataFrame, SparkSession} import org.joda.time.DateTime /** This object provides a set of operations to access the Event Store * with Spark's parallelization */ object PPythonEventStore { /** Read events from the Event Store * * @param appName return events of this app * @param channelName return events of this channel (default channel if it's null) * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param entityType return events of this entityType * @param entityId return events of this entityId * @param eventNames return events with any of these event names. * @param targetEntityType return events of this targetEntityType: * - null means no restriction on targetEntityType * - "" (empty string) means no targetEntityType for this event * - any other value x means targetEntityType should match x. * @param targetEntityId return events of this targetEntityId * - null means no restriction on targetEntityId * - "" (empty string) means no targetEntityId for this event * - any other value x means targetEntityId should match x.
* @param spark Spark session * @return DataFrame */ def find( appName: String, channelName: String, startTime: Timestamp, untilTime: Timestamp, entityType: String, entityId: String, eventNames: Array[String], targetEntityType: String, targetEntityId: String )(spark: SparkSession): DataFrame = { import spark.implicits._ val colNames: Seq[String] = Seq( "eventId", "event", "entityType", "entityId", "targetEntityType", "targetEntityId", "eventTime", "tags", "prId", "creationTime", "fields" ) PEventStore.find(appName, Option(channelName), Option(startTime).map(t => new DateTime(t.getTime)), Option(untilTime).map(t => new DateTime(t.getTime)), Option(entityType), Option(entityId), Option(eventNames), targetEntityType match { case null => None case "" => Option(None) case _ => Option(Option(targetEntityType)) }, targetEntityId match { case null => None case "" => Option(None) case _ => Option(Option(targetEntityId)) } )(spark.sparkContext).map { e => ( e.eventId, e.event, e.entityType, e.entityId, e.targetEntityType.orNull, e.targetEntityId.orNull, new Timestamp(e.eventTime.getMillis), e.tags.mkString("\t"), e.prId.orNull, new Timestamp(e.creationTime.getMillis), e.properties.fields.mapValues(_.values.toString) ) }.toDF(colNames: _*) } /** Aggregate properties of entities based on these special events: * \$set, \$unset, \$delete events.
* * @param appName use events of this app * @param entityType aggregate properties of the entities of this entityType * @param channelName use events of this channel (default channel if it's null) * @param startTime use events with eventTime >= startTime * @param untilTime use events with eventTime < untilTime * @param required only keep entities with these required properties defined (no restriction if null) * @param spark Spark session * @return DataFrame with columns entityId, firstUpdated, lastUpdated and fields */ def aggregateProperties( appName: String, entityType: String, channelName: String, startTime: Timestamp, untilTime: Timestamp, required: Array[String] ) (spark: SparkSession): DataFrame = { import spark.implicits._ val colNames: Seq[String] = Seq( "entityId", "firstUpdated", "lastUpdated", "fields" ) PEventStore.aggregateProperties(appName, entityType, Option(channelName), Option(startTime).map(t => new DateTime(t.getTime)), Option(untilTime).map(t => new DateTime(t.getTime)), Option(required).map(_.toSeq))(spark.sparkContext).map { x => val m = x._2 (x._1, new Timestamp(m.firstUpdated.getMillis), new Timestamp(m.lastUpdated.getMillis), m.fields.mapValues(_.values.toString) ) }.toDF(colNames: _*) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/view/DataView.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License.
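Because Python callers pass plain, possibly null strings, `PPythonEventStore.find` folds `targetEntityType` and `targetEntityId` into the `Option[Option[String]]` encoding that `PEventStore.find` uses. The sketch below isolates that mapping. The object name `TargetFilter` and the `matches` predicate are hypothetical; `matches` only illustrates the documented meaning of each case, not how a storage backend actually applies the filter.

```scala
// Three-way encoding used when a nullable String arrives from Python:
//   null -> None            (no restriction)
//   ""   -> Some(None)      (event must have no target entity)
//   x    -> Some(Some(x))   (target must equal x)
object TargetFilter {
  def fromNullable(value: String): Option[Option[String]] = value match {
    case null => None
    case ""   => Some(None)
    case x    => Some(Some(x))
  }

  // Hypothetical predicate: does an event's optional target satisfy the filter?
  def matches(filter: Option[Option[String]], target: Option[String]): Boolean =
    filter match {
      case None           => true            // no restriction at all
      case Some(expected) => expected == target
    }
}
```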
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.view import org.apache.predictionio.annotation.Experimental import org.apache.predictionio.data.storage.Event import grizzled.slf4j.Logger import org.apache.predictionio.data.store.PEventStore import org.apache.spark.rdd.RDD import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession} import org.apache.spark.SparkContext import org.joda.time.DateTime import scala.reflect.ClassTag import scala.reflect.runtime.universe._ import scala.util.hashing.MurmurHash3 /** :: Experimental :: */ @Experimental object DataView { /** * :: Experimental :: * * Create a DataFrame from events of a specified app. * * @param appName return events of this app * @param channelName use events of this channel (default channel if it's None) * @param startTime return events with eventTime >= startTime * @param untilTime return events with eventTime < untilTime * @param conversionFunction a function that turns raw Events into events of interest. * If conversionFunction returns None, such events are dropped. * @param name identify the DataFrame created * @param version used to track changes to the conversionFunction, e.g. version = "20150413" * and update whenever the function is changed. * @tparam E the output type of the conversion function. The type needs to extend Product * (e.g. 
case class) * @return a DataFrame of events */ @Experimental def create[E <: Product: TypeTag: ClassTag]( appName: String, channelName: Option[String] = None, startTime: Option[DateTime] = None, untilTime: Option[DateTime] = None, conversionFunction: Event => Option[E], name: String = "", version: String = "")(sc: SparkContext): DataFrame = { @transient lazy val logger = Logger[this.type] val sqlSession = SparkSession.builder().getOrCreate() val beginTime = startTime match { case Some(t) => t case None => new DateTime(0L) } val endTime = untilTime match { case Some(t) => t case None => DateTime.now() // fix the current time } // detect changes to the case class val uid = java.io.ObjectStreamClass.lookup(implicitly[reflect.ClassTag[E]].runtimeClass) .getSerialVersionUID val hash = MurmurHash3.stringHash(s"$beginTime-$endTime-$version-$uid") val baseDir = s"${sys.env("PIO_FS_BASEDIR")}/view" val fileName = s"$baseDir/$name-$appName-$hash.parquet" try { sqlSession.read.parquet(fileName) } catch { case e: java.io.FileNotFoundException => logger.info("Cached copy not found, reading from DB.") // if cached copy is found, use it. If not, grab from Storage val result: RDD[E] = PEventStore.find( appName = appName, channelName = channelName, startTime = startTime, untilTime = Some(endTime))(sc) .flatMap((e) => conversionFunction(e)) import sqlSession.implicits._ // needed for RDD.toDF() val resultDF = result.toDF() resultDF.write.mode(SaveMode.ErrorIfExists).parquet(fileName) sqlSession.read.parquet(fileName) case e: java.lang.RuntimeException => if (e.toString.contains("is not a Parquet file")) { logger.error(s"$fileName does not contain a valid Parquet file. 
" + "Please delete it and try again.") } throw e } } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/view/LBatchView.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
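`DataView.create` caches its result as a Parquet file whose name hashes the time window, the `version` string and the `serialVersionUID` of the output case class, so changing any of them makes the next call miss the cache and read from the Event Store again. A self-contained sketch of just the naming step, with a hypothetical `ViewCacheKey` object and `Rating` case class; the time bounds are passed as strings here instead of Joda `DateTime` values:

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical output type of a conversion function.
case class Rating(user: String, item: String, score: Double)

object ViewCacheKey {
  // serialVersionUID changes when the case class structure changes,
  // which is how DataView detects a modified output type.
  def uidOf(cls: Class[_]): Long =
    java.io.ObjectStreamClass.lookup(cls).getSerialVersionUID

  // Same string template as DataView.create: "<begin>-<end>-<version>-<uid>".
  def fileName(pioBaseDir: String, name: String, appName: String,
               begin: String, end: String, version: String, uid: Long): String = {
    val hash = MurmurHash3.stringHash(s"$begin-$end-$version-$uid")
    s"$pioBaseDir/view/$name-$appName-$hash.parquet"
  }
}
```

Since the hash is deterministic, identical inputs always map to the same file, while bumping `version` (for example from "20150413" to "20150414") points to a new file.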
*/ package org.apache.predictionio.data.view import org.apache.predictionio.data.storage.Event import org.apache.predictionio.data.storage.EventValidation import org.apache.predictionio.data.storage.DataMap import org.apache.predictionio.data.storage.Storage import org.joda.time.DateTime import scala.language.implicitConversions import scala.concurrent.ExecutionContext.Implicits.global // TODO object ViewPredicates { @deprecated("Use LEvents or LEventStore instead.", "0.9.2") def getStartTimePredicate(startTimeOpt: Option[DateTime]) : (Event => Boolean) = { startTimeOpt.map(getStartTimePredicate).getOrElse(_ => true) } @deprecated("Use LEvents or LEventStore instead.", "0.9.2") def getStartTimePredicate(startTime: DateTime): (Event => Boolean) = { e => (!(e.eventTime.isBefore(startTime) || e.eventTime.isEqual(startTime))) } @deprecated("Use LEvents or LEventStore instead.", "0.9.2") def getUntilTimePredicate(untilTimeOpt: Option[DateTime]) : (Event => Boolean) = { untilTimeOpt.map(getUntilTimePredicate).getOrElse(_ => true) } @deprecated("Use LEvents or LEventStore instead.", "0.9.2") def getUntilTimePredicate(untilTime: DateTime): (Event => Boolean) = { _.eventTime.isBefore(untilTime) } @deprecated("Use LEvents or LEventStore instead.", "0.9.2") def getEntityTypePredicate(entityTypeOpt: Option[String]): (Event => Boolean) = { entityTypeOpt.map(getEntityTypePredicate).getOrElse(_ => true) } @deprecated("Use LEvents or LEventStore instead.", "0.9.2") def getEntityTypePredicate(entityType: String): (Event => Boolean) = { (_.entityType == entityType) } @deprecated("Use LEvents or LEventStore instead.", "0.9.2") def getEventPredicate(eventOpt: Option[String]): (Event => Boolean) = { eventOpt.map(getEventPredicate).getOrElse(_ => true) } @deprecated("Use LEvents or LEventStore instead.", "0.9.2") def getEventPredicate(event: String): (Event => Boolean) = { (_.event == event) } } object ViewAggregators { @deprecated("Use LEvents instead.", "0.9.2") def 
getDataMapAggregator(): ((Option[DataMap], Event) => Option[DataMap]) = { (p, e) => { e.event match { case "$set" => { if (p == None) { Some(e.properties) } else { p.map(_ ++ e.properties) } } case "$unset" => { if (p == None) { None } else { p.map(_ -- e.properties.keySet) } } case "$delete" => None case _ => p // do nothing for others } } } } object EventSeq { // Need to // >>> import scala.language.implicitConversions // to enable implicit conversion. Only import in the code where this is // necessary to avoid confusion. @deprecated("Use LEvents instead.", "0.9.2") implicit def eventSeqToList(es: EventSeq): List[Event] = es.events @deprecated("Use LEvents instead.", "0.9.2") implicit def listToEventSeq(l: List[Event]): EventSeq = new EventSeq(l) } class EventSeq(val events: List[Event]) { @deprecated("Use LEvents instead.", "0.9.2") def filter( eventOpt: Option[String] = None, entityTypeOpt: Option[String] = None, startTimeOpt: Option[DateTime] = None, untilTimeOpt: Option[DateTime] = None): EventSeq = { events .filter(ViewPredicates.getEventPredicate(eventOpt)) .filter(ViewPredicates.getStartTimePredicate(startTimeOpt)) .filter(ViewPredicates.getUntilTimePredicate(untilTimeOpt)) .filter(ViewPredicates.getEntityTypePredicate(entityTypeOpt)) } @deprecated("Use LEvents instead.", "0.9.2") def filter(p: (Event => Boolean)): EventSeq = events.filter(p) @deprecated("Use LEvents instead.", "0.9.2") def aggregateByEntityOrdered[T](init: T, op: (T, Event) => T) : Map[String, T] = { events .groupBy( _.entityId ) .mapValues( _.sortBy(_.eventTime.getMillis).foldLeft[T](init)(op)) } } class LBatchView( val appId: Int, val startTime: Option[DateTime], val untilTime: Option[DateTime]) { @transient lazy val eventsDb = Storage.getLEvents() @transient lazy val _events = eventsDb.find( appId = appId, startTime = startTime, untilTime = untilTime).toList @transient lazy val events: EventSeq = new EventSeq(_events) /* Aggregate event data * * @param entityType only aggregate event 
with entityType * @param startTimeOpt if specified, only aggregate events after (inclusive) * startTimeOpt * @param untilTimeOpt if specified, only aggregate events until (exclusive) * untilTimeOpt */ @deprecated("Use LEventStore instead.", "0.9.2") def aggregateProperties( entityType: String, startTimeOpt: Option[DateTime] = None, untilTimeOpt: Option[DateTime] = None ): Map[String, DataMap] = { events .filter(entityTypeOpt = Some(entityType)) .filter(e => EventValidation.isSpecialEvents(e.event)) .aggregateByEntityOrdered( init = None, op = ViewAggregators.getDataMapAggregator()) .filter{ case (k, v) => (v != None) } .mapValues(_.get) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/view/PBatchView.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
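`ViewAggregators.getDataMapAggregator` and `LBatchView.aggregateProperties` replay each entity's special events in `eventTime` order. The following sketch models that fold with plain `Map[String, String]` properties instead of `DataMap`; `SimpleEvent` and `PropertyFold` are hypothetical names, and the time-window parameters are omitted:

```scala
// Hypothetical, minimal event: only the fields the aggregator reads.
final case class SimpleEvent(event: String, entityId: String, time: Long,
                             props: Map[String, String])

object PropertyFold {
  // Mirrors getDataMapAggregator: $set merges properties, $unset drops the
  // named keys, $delete discards the entity, other events change nothing.
  def step(state: Option[Map[String, String]], e: SimpleEvent): Option[Map[String, String]] =
    e.event match {
      case "$set"    => Some(state.getOrElse(Map.empty[String, String]) ++ e.props)
      case "$unset"  => state.map(_ -- e.props.keySet)
      case "$delete" => None
      case _         => state
    }

  // Mirrors aggregateProperties: group by entity, fold each entity's events
  // in time order, and keep only entities whose final state is defined.
  def aggregate(events: Seq[SimpleEvent]): Map[String, Map[String, String]] =
    events.groupBy(_.entityId).flatMap { case (id, evs) =>
      evs.sortBy(_.time)
        .foldLeft(Option.empty[Map[String, String]])(step)
        .map(id -> _)
    }
}
```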
*/ package org.apache.predictionio.data.view import org.apache.predictionio.data.storage.{DataMap, Event, EventValidation, Storage} import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import org.joda.time.DateTime import org.json4s.JValue // each JValue data associated with the time it is set private[predictionio] case class PropTime(val d: JValue, val t: Long) extends Serializable private[predictionio] case class SetProp ( val fields: Map[String, PropTime], // last set time. Note: fields could be empty with valid set time val t: Long) extends Serializable { def ++ (that: SetProp): SetProp = { val commonKeys = fields.keySet.intersect(that.fields.keySet) val common: Map[String, PropTime] = commonKeys.map { k => val thisData = this.fields(k) val thatData = that.fields(k) // only keep the value with latest time val v = if (thisData.t > thatData.t) thisData else thatData (k, v) }.toMap val combinedFields = common ++ (this.fields -- commonKeys) ++ (that.fields -- commonKeys) // keep the latest set time val combinedT = if (this.t > that.t) this.t else that.t SetProp( fields = combinedFields, t = combinedT ) } } private[predictionio] case class UnsetProp (fields: Map[String, Long]) extends Serializable { def ++ (that: UnsetProp): UnsetProp = { val commonKeys = fields.keySet.intersect(that.fields.keySet) val common: Map[String, Long] = commonKeys.map { k => val thisData = this.fields(k) val thatData = that.fields(k) // only keep the value with latest time val v = if (thisData > thatData) thisData else thatData (k, v) }.toMap val combinedFields = common ++ (this.fields -- commonKeys) ++ (that.fields -- commonKeys) UnsetProp( fields = combinedFields ) } } private[predictionio] case class DeleteEntity (t: Long) extends Serializable { def ++ (that: DeleteEntity): DeleteEntity = { if (this.t > that.t) this else that } } private[predictionio] case class EventOp ( val setProp: Option[SetProp] = None, val unsetProp: Option[UnsetProp] = None, val deleteEntity: 
Option[DeleteEntity] = None ) extends Serializable { def ++ (that: EventOp): EventOp = { EventOp( setProp = (setProp ++ that.setProp).reduceOption(_ ++ _), unsetProp = (unsetProp ++ that.unsetProp).reduceOption(_ ++ _), deleteEntity = (deleteEntity ++ that.deleteEntity).reduceOption(_ ++ _) ) } def toDataMap(): Option[DataMap] = { setProp.flatMap { set => val unsetKeys: Set[String] = unsetProp.map( unset => unset.fields.filter{ case (k, v) => set.fields.get(k).exists(p => v >= p.t) }.keySet ).getOrElse(Set()) val combinedFields = deleteEntity.map { delete => if (delete.t >= set.t) { None } else { val deleteKeys: Set[String] = set.fields .filter { case (k, PropTime(kv, t)) => (delete.t >= t) }.keySet Some(set.fields -- unsetKeys -- deleteKeys) } }.getOrElse{ Some(set.fields -- unsetKeys) } // Note: mapValues() doesn't return concrete Map and causes // NotSerializableException issue. Use map(identity) to work around this. // see https://issues.scala-lang.org/browse/SI-7005 combinedFields.map(f => DataMap(f.mapValues(_.d).map(identity))) } } } private[predictionio] object EventOp { def apply(e: Event): EventOp = { val t = e.eventTime.getMillis e.event match { case "$set" => { val fields = e.properties.fields.mapValues(jv => PropTime(jv, t) ).map(identity) EventOp( setProp = Some(SetProp(fields = fields, t = t)) ) } case "$unset" => { val fields = e.properties.fields.mapValues(jv => t).map(identity) EventOp( unsetProp = Some(UnsetProp(fields = fields)) ) } case "$delete" => { EventOp( deleteEntity = Some(DeleteEntity(t)) ) } case _ => { EventOp() } } } } @deprecated("Use PEvents or PEventStore instead.", "0.9.2") class PBatchView( val appId: Int, val startTime: Option[DateTime], val untilTime: Option[DateTime], val sc: SparkContext) { // NOTE: parallel Events DB interface @transient lazy val eventsDb = Storage.getPEvents() @transient lazy val _events: RDD[Event] = eventsDb.getByAppIdAndTimeAndEntity( appId = appId, startTime = startTime, untilTime = untilTime, entityType = None,
entityId = None)(sc) // TODO: change to use EventSeq? @transient lazy val events: RDD[Event] = _events def aggregateProperties( entityType: String, startTimeOpt: Option[DateTime] = None, untilTimeOpt: Option[DateTime] = None ): RDD[(String, DataMap)] = { _events .filter( e => ((e.entityType == entityType) && (EventValidation.isSpecialEvents(e.event))) ) .map( e => (e.entityId, EventOp(e) )) .aggregateByKey[EventOp](EventOp())( // within same partition seqOp = { case (u, v) => u ++ v }, // across partition combOp = { case (accu, u) => accu ++ u } ) .mapValues(_.toDataMap) .filter{ case (k, v) => v.isDefined } .map{ case (k, v) => (k, v.get) } } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/view/QuickTest.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
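`PBatchView` cannot rely on event order, because `aggregateByKey` combines partial results from different partitions. `EventOp` therefore stores a timestamp with every set field, every unset field and the latest delete, and `++` keeps the newest of each. The sketch below is a simplified, hypothetical `Op` type (string values instead of `JValue`, no Spark) that follows the same merge rules and the same resolution rules as `toDataMap`. It only inspects keys that were actually set, so an `$unset` of a field that was never set is simply ignored.

```scala
// Hypothetical, simplified stand-in for PBatchView's EventOp.
final case class Op(
    set: Map[String, (String, Long)] = Map.empty, // field -> (value, time set)
    setTime: Option[Long] = None,                 // latest $set time
    unset: Map[String, Long] = Map.empty,         // field -> latest $unset time
    delete: Option[Long] = None) {                // latest $delete time

  // Merge two field maps, keeping the entry with the later timestamp;
  // on a tie the right-hand side wins, as in SetProp/UnsetProp.++.
  private def latestBy[V](a: Map[String, V], b: Map[String, V])(time: V => Long): Map[String, V] =
    b.foldLeft(a) { case (acc, (k, v)) =>
      acc.get(k) match {
        case Some(old) if time(old) > time(v) => acc
        case _                                => acc.updated(k, v)
      }
    }

  // Associative and commutative, so partitions may be combined in any order.
  def ++(that: Op): Op = Op(
    latestBy(set, that.set)(_._2),
    Seq(setTime, that.setTime).flatten.reduceOption(_ max _),
    latestBy(unset, that.unset)(identity),
    Seq(delete, that.delete).flatten.reduceOption(_ max _))

  // Like toDataMap: no $set means no entity; a $delete at or after the last
  // $set removes it; otherwise drop fields unset or deleted after being set.
  def toMap: Option[Map[String, String]] = setTime.flatMap { st =>
    if (delete.exists(_ >= st)) None
    else Some(set.collect {
      case (k, (v, t)) if !unset.get(k).exists(_ >= t) && !delete.exists(_ >= t) => k -> v
    })
  }
}

object Op {
  def setAt(t: Long, fields: (String, String)*): Op =
    Op(set = fields.map { case (k, v) => (k, (v, t)) }.toMap, setTime = Some(t))
  def unsetAt(t: Long, keys: String*): Op = Op(unset = keys.map(k => (k, t)).toMap)
  def deleteAt(t: Long): Op = Op(delete = Some(t))
}
```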
*/ package org.apache.predictionio.data.view import org.apache.predictionio.data.storage.Storage import scala.concurrent.ExecutionContext.Implicits.global // TODO import grizzled.slf4j.Logger import org.joda.time.DateTime import scala.language.implicitConversions class TestHBLEvents() { @transient lazy val eventsDb = Storage.getLEvents() def run(): Unit = { val r = eventsDb.find( appId = 1, startTime = None, untilTime = None, entityType = Some("pio_user"), entityId = Some("3")).toList println(r) } } class TestSource(val appId: Int) { @transient lazy val logger = Logger[this.type] @transient lazy val batchView = new LBatchView(appId, None, None) def run(): Unit = { println(batchView.events) } } object QuickTest { def main(args: Array[String]) { val t = new TestHBLEvents() t.run() // val ts = new TestSource(args(0).toInt) // ts.run() } } object TestEventTime { @transient lazy val batchView = new LBatchView(9, None, None) // implicit def back2list(es: EventSeq) = es.events def main(args: Array[String]) { val e = batchView.events.filter( eventOpt = Some("rate"), startTimeOpt = Some(new DateTime(1998, 1, 1, 0, 0)) // untilTimeOpt = Some(new DateTime(1997, 1, 1, 0, 0)) ) // untilTimeOpt = Some(new DateTime(2000, 1, 1, 0, 0))) e.foreach { println } println() println() println() val u = batchView.aggregateProperties("pio_item") u.foreach { println } println() println() println() // val l: Seq[Event] = e val l = e.map { _.entityId } l.foreach { println } } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/webhooks/ConnectorException.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. 
* The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.webhooks /** Webhooks Connector Exception * * @param message the detail message * @param cause the cause */ private[predictionio] class ConnectorException(message: String, cause: Throwable) extends Exception(message, cause) { /** Webhooks Connector Exception with cause being set to null * * @param message the detail message */ def this(message: String) = this(message, null) } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/webhooks/ConnectorUtil.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ package org.apache.predictionio.data.webhooks import org.apache.predictionio.data.storage.Event import org.apache.predictionio.data.storage.EventJson4sSupport import org.json4s.Formats import org.json4s.DefaultFormats import org.json4s.JObject import org.json4s.native.Serialization.read import org.json4s.native.Serialization.write private[predictionio] object ConnectorUtil { implicit val eventJson4sFormats: Formats = DefaultFormats + new EventJson4sSupport.APISerializer // intentionally use EventJson4sSupport.APISerializer to convert // from JSON to Event object. Don't allow connectors to create Event // objects directly, so that Event object formation stays consistent // by enforcing the JSON format def toEvent(connector: JsonConnector, data: JObject): Event = { read[Event](write(connector.toEventJson(data))) } def toEvent(connector: FormConnector, data: Map[String, String]): Event = { read[Event](write(connector.toEventJson(data))) } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/webhooks/FormConnector.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ package org.apache.predictionio.data.webhooks import org.json4s.JObject /** Connector for Webhooks connection with Form submission data format */ private[predictionio] trait FormConnector { // TODO: support conversion to multiple events? /** Convert the original form submission data to an Event JObject * @param data Map of key-value pairs in String type received through webhooks * @return Event JObject */ def toEventJson(data: Map[String, String]): JObject } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/webhooks/JsonConnector.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.webhooks import org.json4s.JObject /** Connector for Webhooks connection */ private[predictionio] trait JsonConnector { // TODO: support conversion to multiple events?
/** Convert the original JObject to an Event JObject * @param data original JObject received through webhooks * @return Event JObject */ def toEventJson(data: JObject): JObject } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/webhooks/exampleform/ExampleFormConnector.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ package org.apache.predictionio.data.webhooks.exampleform import org.apache.predictionio.data.webhooks.FormConnector import org.apache.predictionio.data.webhooks.ConnectorException import org.json4s.JObject /** Example FormConnector with the following types of webhook form data inputs: * * UserAction * * "type"="userAction", * "userId"="as34smg4", * "event"="do_something", * "context[ip]"="24.5.68.47", // optional * "context[prop1]"="2.345", // optional * "context[prop2]"="value1", // optional * "anotherProperty1"="100", * "anotherProperty2"="optional1", // optional * "timestamp"="2015-01-02T00:30:12.984Z" * * UserActionItem * * "type"="userActionItem", * "userId"="as34smg4", * "event"="do_something_on", * "itemId"="kfjd312bc", * "context[ip]"="1.23.4.56", * "context[prop1]"="2.345", * "context[prop2]"="value1", * "anotherPropertyA"="4.567", // optional * "anotherPropertyB"="false", // optional * "timestamp"="2015-01-15T04:20:23.567Z" * */ private[predictionio] object ExampleFormConnector extends FormConnector { override def toEventJson(data: Map[String, String]): JObject = { val json = try { data.get("type") match { case Some("userAction") => userActionToEventJson(data) case Some("userActionItem") => userActionItemToEventJson(data) case Some(x) => throw new ConnectorException( s"Cannot convert unknown type ${x} to event JSON") case None => throw new ConnectorException( s"The field 'type' is required.") } } catch { case e: ConnectorException => throw e case e: Exception => throw new ConnectorException( s"Cannot convert ${data} to event JSON.
${e.getMessage()}", e) } json } def userActionToEventJson(data: Map[String, String]): JObject = { import org.json4s.JsonDSL._ // two level optional data val context = if (data.exists(_._1.startsWith("context["))) { Some( ("ip" -> data.get("context[ip]")) ~ ("prop1" -> data.get("context[prop1]").map(_.toDouble)) ~ ("prop2" -> data.get("context[prop2]")) ) } else { None } val json = ("event" -> data("event")) ~ ("entityType" -> "user") ~ ("entityId" -> data("userId")) ~ ("eventTime" -> data("timestamp")) ~ ("properties" -> ( ("context" -> context) ~ ("anotherProperty1" -> data("anotherProperty1").toInt) ~ ("anotherProperty2" -> data.get("anotherProperty2")) )) json } def userActionItemToEventJson(data: Map[String, String]): JObject = { import org.json4s.JsonDSL._ val json = ("event" -> data("event")) ~ ("entityType" -> "user") ~ ("entityId" -> data("userId")) ~ ("targetEntityType" -> "item") ~ ("targetEntityId" -> data("itemId")) ~ ("eventTime" -> data("timestamp")) ~ ("properties" -> ( ("context" -> ( ("ip" -> data("context[ip]")) ~ ("prop1" -> data("context[prop1]").toDouble) ~ ("prop2" -> data("context[prop2]")) )) ~ ("anotherPropertyA" -> data.get("anotherPropertyA").map(_.toDouble)) ~ ("anotherPropertyB" -> data.get("anotherPropertyB").map(_.toBoolean)) )) json } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/webhooks/examplejson/ExampleJsonConnector.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.webhooks.examplejson import org.apache.predictionio.data.webhooks.JsonConnector import org.apache.predictionio.data.webhooks.ConnectorException import org.json4s.Formats import org.json4s.DefaultFormats import org.json4s.JObject /** Example JsonConnector with the following types of webhook JSON input: * * UserAction * * { * "type": "userAction", * "userId": "as34smg4", * "event": "do_something", * "context": { * "ip": "24.5.68.47", * "prop1": 2.345, * "prop2": "value1" * }, * "anotherProperty1": 100, * "anotherProperty2": "optional1", * "timestamp": "2015-01-02T00:30:12.984Z" * } * * UserActionItem * * { * "type": "userActionItem", * "userId": "as34smg4", * "event": "do_something_on", * "itemId": "kfjd312bc", * "context": { * "ip": "1.23.4.56", * "prop1": 2.345, * "prop2": "value1" * }, * "anotherPropertyA": 4.567, * "anotherPropertyB": false, * "timestamp": "2015-01-15T04:20:23.567Z" * } */ private[predictionio] object ExampleJsonConnector extends JsonConnector { implicit val json4sFormats: Formats = DefaultFormats override def toEventJson(data: JObject): JObject = { val common = try { data.extract[Common] } catch { case e: Exception => throw new ConnectorException( s"Cannot extract Common field from ${data}.
${e.getMessage()}", e) } val json = try { common.`type` match { case "userAction" => toEventJson(common = common, userAction = data.extract[UserAction]) case "userActionItem" => toEventJson(common = common, userActionItem = data.extract[UserActionItem]) case x: String => throw new ConnectorException( s"Cannot convert unknown type '${x}' to Event JSON.") } } catch { case e: ConnectorException => throw e case e: Exception => throw new ConnectorException( s"Cannot convert ${data} to eventJson. ${e.getMessage()}", e) } json } def toEventJson(common: Common, userAction: UserAction): JObject = { import org.json4s.JsonDSL._ // map to EventAPI JSON val json = ("event" -> userAction.event) ~ ("entityType" -> "user") ~ ("entityId" -> userAction.userId) ~ ("eventTime" -> userAction.timestamp) ~ ("properties" -> ( ("context" -> userAction.context) ~ ("anotherProperty1" -> userAction.anotherProperty1) ~ ("anotherProperty2" -> userAction.anotherProperty2) )) json } def toEventJson(common: Common, userActionItem: UserActionItem): JObject = { import org.json4s.JsonDSL._ // map to EventAPI JSON val json = ("event" -> userActionItem.event) ~ ("entityType" -> "user") ~ ("entityId" -> userActionItem.userId) ~ ("targetEntityType" -> "item") ~ ("targetEntityId" -> userActionItem.itemId) ~ ("eventTime" -> userActionItem.timestamp) ~ ("properties" -> ( ("context" -> userActionItem.context) ~ ("anotherPropertyA" -> userActionItem.anotherPropertyA) ~ ("anotherPropertyB" -> userActionItem.anotherPropertyB) )) json } // Common required fields case class Common( `type`: String ) // User Actions fields case class UserAction ( userId: String, event: String, context: Option[JObject], anotherProperty1: Int, anotherProperty2: Option[String], timestamp: String ) // UserActionItem fields case class UserActionItem ( userId: String, event: String, itemId: String, context: JObject, anotherPropertyA: Option[Double], anotherPropertyB: Option[Boolean], timestamp: String ) } 
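The routing in `ExampleJsonConnector.toEventJson` above works in two steps: it extracts the `type` discriminator into `Common`, then passes the payload to a type-specific converter or fails with a `ConnectorException`. A small standalone sketch lets you exercise that flow outside the event server. It assumes json4s is on the classpath, as it is for this module. `ExampleJsonSketch`, `SketchConnectorException` and `sample` are hypothetical names introduced for illustration, and only the `userAction` branch is reproduced.

```scala
import org.json4s._
import org.json4s.JsonDSL._

// Hypothetical stand-in for PredictionIO's package-private ConnectorException.
class SketchConnectorException(message: String) extends Exception(message)

object ExampleJsonSketch {
  implicit val formats: Formats = DefaultFormats

  // Discriminator shared by every payload type.
  case class Common(`type`: String)

  // Mirrors the UserAction fields documented in ExampleJsonConnector.
  case class UserAction(
    userId: String,
    event: String,
    context: Option[JObject],
    anotherProperty1: Int,
    anotherProperty2: Option[String],
    timestamp: String)

  // Hypothetical sample payload; the optional fields are omitted on purpose.
  val sample: JObject =
    ("type" -> "userAction") ~
    ("userId" -> "as34smg4") ~
    ("event" -> "do_something") ~
    ("anotherProperty1" -> 100) ~
    ("timestamp" -> "2015-01-02T00:30:12.984Z")

  def toEventJson(data: JObject): JObject = data.extract[Common].`type` match {
    case "userAction" =>
      val ua = data.extract[UserAction]
      // Absent Options become JNothing and are dropped when rendered.
      ("event" -> ua.event) ~
      ("entityType" -> "user") ~
      ("entityId" -> ua.userId) ~
      ("eventTime" -> ua.timestamp) ~
      ("properties" -> (
        ("context" -> ua.context) ~
        ("anotherProperty1" -> ua.anotherProperty1) ~
        ("anotherProperty2" -> ua.anotherProperty2)))
    case other =>
      throw new SketchConnectorException(
        s"Cannot convert unknown type '$other' to Event JSON.")
  }
}
```

Keeping extraction (`extract[Common]`) separate from conversion means a bad discriminator and a bad body fail at different points. The connector above reflects this by wrapping each step in its own `try`.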
================================================ FILE: data/src/main/scala/org/apache/predictionio/data/webhooks/mailchimp/MailChimpConnector.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.predictionio.data.webhooks.mailchimp import org.apache.predictionio.data.webhooks.FormConnector import org.apache.predictionio.data.webhooks.ConnectorException import org.apache.predictionio.data.storage.EventValidation import org.apache.predictionio.data.Utils import org.json4s.JObject import org.joda.time.DateTime import org.joda.time.format.DateTimeFormat private[predictionio] object MailChimpConnector extends FormConnector { override def toEventJson(data: Map[String, String]): JObject = { val json = data.get("type") match { case Some("subscribe") => subscribeToEventJson(data) // UNSUBSCRIBE case Some("unsubscribe") => unsubscribeToEventJson(data) // PROFILE UPDATES case Some("profile") => profileToEventJson(data) // EMAIL UPDATE case Some("upemail") => upemailToEventJson(data) // CLEANED EMAILS case Some("cleaned") => cleanedToEventJson(data) // CAMPAIGN SENDING STATUS case Some("campaign") => campaignToEventJson(data) // invalid type case Some(x) => throw new ConnectorException( 
s"Cannot convert unknown MailChimp data type ${x} to event JSON") case None => throw new ConnectorException( s"The field 'type' is required for MailChimp data.") } json } val mailChimpDateTimeFormat = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss") .withZone(EventValidation.defaultTimeZone) def parseMailChimpDateTime(s: String): DateTime = { mailChimpDateTimeFormat.parseDateTime(s) } def subscribeToEventJson(data: Map[String, String]): JObject = { import org.json4s.JsonDSL._ /* "type": "subscribe", "fired_at": "2009-03-26 21:35:57", "data[id]": "8a25ff1d98", "data[list_id]": "a6b5da1054", "data[email]": "api@mailchimp.com", "data[email_type]": "html", "data[merges][EMAIL]": "api@mailchimp.com", "data[merges][FNAME]": "MailChimp", "data[merges][LNAME]": "API", "data[merges][INTERESTS]": "Group1,Group2", "data[ip_opt]": "10.20.10.30", "data[ip_signup]": "10.20.10.30" */ // convert to ISO8601 format val eventTime = Utils.dateTimeToString(parseMailChimpDateTime(data("fired_at"))) // TODO: handle optional fields val json = ("event" -> "subscribe") ~ ("entityType" -> "user") ~ ("entityId" -> data("data[id]")) ~ ("targetEntityType" -> "list") ~ ("targetEntityId" -> data("data[list_id]")) ~ ("eventTime" -> eventTime) ~ ("properties" -> ( ("email" -> data("data[email]")) ~ ("email_type" -> data("data[email_type]")) ~ ("merges" -> ( ("EMAIL" -> data("data[merges][EMAIL]")) ~ ("FNAME" -> data("data[merges][FNAME]"))) ~ ("LNAME" -> data("data[merges][LNAME]")) ~ ("INTERESTS" -> data.get("data[merges][INTERESTS]")) )) ~ ("ip_opt" -> data("data[ip_opt]")) ~ ("ip_signup" -> data("data[ip_signup]") )) json } def unsubscribeToEventJson(data: Map[String, String]): JObject = { import org.json4s.JsonDSL._ /* "action" will either be "unsub" or "delete". 
The reason will be "manual" unless caused by a spam complaint - then it will be "abuse" "type": "unsubscribe", "fired_at": "2009-03-26 21:40:57", "data[action]": "unsub", "data[reason]": "manual", "data[id]": "8a25ff1d98", "data[list_id]": "a6b5da1054", "data[email]": "api+unsub@mailchimp.com", "data[email_type]": "html", "data[merges][EMAIL]": "api+unsub@mailchimp.com", "data[merges][FNAME]": "MailChimp", "data[merges][LNAME]": "API", "data[merges][INTERESTS]": "Group1,Group2", "data[ip_opt]": "10.20.10.30", "data[campaign_id]": "cb398d21d2", */ // convert to ISO8601 format val eventTime = Utils.dateTimeToString(parseMailChimpDateTime(data("fired_at"))) val json = ("event" -> "unsubscribe") ~ ("entityType" -> "user") ~ ("entityId" -> data("data[id]")) ~ ("targetEntityType" -> "list") ~ ("targetEntityId" -> data("data[list_id]")) ~ ("eventTime" -> eventTime) ~ ("properties" -> ( ("action" -> data("data[action]")) ~ ("reason" -> data("data[reason]")) ~ ("email" -> data("data[email]")) ~ ("email_type" -> data("data[email_type]")) ~ ("merges" -> ( ("EMAIL" -> data("data[merges][EMAIL]")) ~ ("FNAME" -> data("data[merges][FNAME]"))) ~ ("LNAME" -> data("data[merges][LNAME]")) ~ ("INTERESTS" -> data.get("data[merges][INTERESTS]")) )) ~ ("ip_opt" -> data("data[ip_opt]")) ~ ("campaign_id" -> data("data[campaign_id]") )) json } def profileToEventJson(data: Map[String, String]): JObject = { import org.json4s.JsonDSL._ /* "type": "profile", "fired_at": "2009-03-26 21:31:21", "data[id]": "8a25ff1d98", "data[list_id]": "a6b5da1054", "data[email]": "api@mailchimp.com", "data[email_type]": "html", "data[merges][EMAIL]": "api@mailchimp.com", "data[merges][FNAME]": "MailChimp", "data[merges][LNAME]": "API", "data[merges][INTERESTS]": "Group1,Group2", \\OPTIONAL "data[ip_opt]": "10.20.10.30" */ // convert to ISO8601 format val eventTime = Utils.dateTimeToString(parseMailChimpDateTime(data("fired_at"))) val json = ("event" -> "profile") ~ ("entityType" -> "user") ~ ("entityId" -> 
data("data[id]")) ~ ("targetEntityType" -> "list") ~ ("targetEntityId" -> data("data[list_id]")) ~ ("eventTime" -> eventTime) ~ ("properties" -> ( ("email" -> data("data[email]")) ~ ("email_type" -> data("data[email_type]")) ~ ("merges" -> ( ("EMAIL" -> data("data[merges][EMAIL]")) ~ ("FNAME" -> data("data[merges][FNAME]"))) ~ ("LNAME" -> data("data[merges][LNAME]")) ~ ("INTERESTS" -> data.get("data[merges][INTERESTS]")) )) ~ ("ip_opt" -> data("data[ip_opt]") )) json } def upemailToEventJson(data: Map[String, String]): JObject = { import org.json4s.JsonDSL._ /* "type": "upemail", "fired_at": "2009-03-26 22:15:09", "data[list_id]": "a6b5da1054", "data[new_id]": "51da8c3259", "data[new_email]": "api+new@mailchimp.com", "data[old_email]": "api+old@mailchimp.com" */ // convert to ISO8601 format val eventTime = Utils.dateTimeToString(parseMailChimpDateTime(data("fired_at"))) val json = ("event" -> "upemail") ~ ("entityType" -> "user") ~ ("entityId" -> data("data[new_id]")) ~ ("targetEntityType" -> "list") ~ ("targetEntityId" -> data("data[list_id]")) ~ ("eventTime" -> eventTime) ~ ("properties" -> ( ("new_email" -> data("data[new_email]")) ~ ("old_email" -> data("data[old_email]")) )) json } def cleanedToEventJson(data: Map[String, String]): JObject = { import org.json4s.JsonDSL._ /* Reason will be one of "hard" (for hard bounces) or "abuse" "type": "cleaned", "fired_at": "2009-03-26 22:01:00", "data[list_id]": "a6b5da1054", "data[campaign_id]": "4fjk2ma9xd", "data[reason]": "hard", "data[email]": "api+cleaned@mailchimp.com" */ // convert to ISO8601 format val eventTime = Utils.dateTimeToString(parseMailChimpDateTime(data("fired_at"))) val json = ("event" -> "cleaned") ~ ("entityType" -> "list") ~ ("entityId" -> data("data[list_id]")) ~ ("eventTime" -> eventTime) ~ ("properties" -> ( ("campaignId" -> data("data[campaign_id]")) ~ ("reason" -> data("data[reason]")) ~ ("email" -> data("data[email]")) )) json } def campaignToEventJson(data: Map[String, String]): JObject = { 
import org.json4s.JsonDSL._ /* "type": "campaign", "fired_at": "2009-03-26 21:31:21", "data[id]": "5aa2102003", "data[subject]": "Test Campaign Subject", "data[status]": "sent", "data[reason]": "", "data[list_id]": "a6b5da1054" */ // convert to ISO8601 format val eventTime = Utils.dateTimeToString(parseMailChimpDateTime(data("fired_at"))) val json = ("event" -> "campaign") ~ ("entityType" -> "campaign") ~ ("entityId" -> data("data[id]")) ~ ("targetEntityType" -> "list") ~ ("targetEntityId" -> data("data[list_id]")) ~ ("eventTime" -> eventTime) ~ ("properties" -> ( ("subject" -> data("data[subject]")) ~ ("status" -> data("data[status]")) ~ ("reason" -> data("data[reason]")) )) json } } ================================================ FILE: data/src/main/scala/org/apache/predictionio/data/webhooks/segmentio/SegmentIOConnector.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.webhooks.segmentio import org.apache.predictionio.data.webhooks.{ConnectorException, JsonConnector} import org.json4s._ private[predictionio] object SegmentIOConnector extends JsonConnector { // private lazy val supportedAPI = Vector("2", "2.0", "2.0.0") implicit val json4sFormats: Formats = DefaultFormats override def toEventJson(data: JObject): JObject = { try { val version: String = data.values("version").toString /* if (!supportedAPI.contains(version)) { throw new ConnectorException( s"Supported segment.io API versions: [2]. got [$version]" ) } */ } catch { case _: Throwable => throw new ConnectorException(s"Failed to get segment.io API version.") } val common = try { data.extract[Common] } catch { case e: Throwable => throw new ConnectorException( s"Cannot extract Common field from $data. ${e.getMessage}", e ) } try { common.`type` match { case "identify" => toEventJson( common = common, identify = data.extract[Events.Identify] ) case "track" => toEventJson( common = common, track = data.extract[Events.Track] ) case "alias" => toEventJson( common = common, alias = data.extract[Events.Alias] ) case "page" => toEventJson( common = common, page = data.extract[Events.Page] ) case "screen" => toEventJson( common = common, screen = data.extract[Events.Screen] ) case "group" => toEventJson( common = common, group = data.extract[Events.Group] ) case _ => throw new ConnectorException( s"Cannot convert unknown type ${common.`type`} to event JSON." ) } } catch { case e: ConnectorException => throw e case e: Exception => throw new ConnectorException( s"Cannot convert $data to event JSON. 
${e.getMessage}", e ) } } def toEventJson(common: Common, identify: Events.Identify ): JObject = { import org.json4s.JsonDSL._ val eventProperties = "traits" -> identify.traits toJson(common, eventProperties) } def toEventJson(common: Common, track: Events.Track): JObject = { import org.json4s.JsonDSL._ val eventProperties = ("properties" -> track.properties) ~ ("event" -> track.event) toJson(common, eventProperties) } def toEventJson(common: Common, alias: Events.Alias): JObject = { import org.json4s.JsonDSL._ toJson(common, "previous_id" -> alias.previous_id) } def toEventJson(common: Common, screen: Events.Screen): JObject = { import org.json4s.JsonDSL._ val eventProperties = ("name" -> screen.name) ~ ("properties" -> screen.properties) toJson(common, eventProperties) } def toEventJson(common: Common, page: Events.Page): JObject = { import org.json4s.JsonDSL._ val eventProperties = ("name" -> page.name) ~ ("properties" -> page.properties) toJson(common, eventProperties) } def toEventJson(common: Common, group: Events.Group): JObject = { import org.json4s.JsonDSL._ val eventProperties = ("group_id" -> group.group_id) ~ ("traits" -> group.traits) toJson(common, eventProperties) } private def toJson(common: Common, props: JObject): JsonAST.JObject = { val commonFields = commonToJson(common) JObject(("properties" -> properties(common, props)) :: commonFields.obj) } private def properties(common: Common, eventProps: JObject): JObject = { import org.json4s.JsonDSL._ common.context map { context => try { ("context" -> Extraction.decompose(context)) ~ eventProps } catch { case e: Throwable => throw new ConnectorException( s"Cannot convert $context to event JSON. 
${e.getMessage }", e ) } } getOrElse eventProps } private def commonToJson(common: Common): JObject = commonToJson(common, common.`type`) private def commonToJson(common: Common, typ: String): JObject = { import org.json4s.JsonDSL._ common.user_id.orElse(common.anonymous_id) match { case Some(userId) => ("event" -> typ) ~ ("entityType" -> "user") ~ ("entityId" -> userId) ~ ("eventTime" -> common.timestamp) case None => throw new ConnectorException( "there was no `userId` or `anonymousId` in the common fields." ) } } } object Events { private[predictionio] case class Track( event: String, properties: Option[JObject] = None ) private[predictionio] case class Alias(previous_id: String, user_id: String) private[predictionio] case class Group( group_id: String, traits: Option[JObject] = None ) private[predictionio] case class Screen( name: Option[String] = None, properties: Option[JObject] = None ) private[predictionio] case class Page( name: Option[String] = None, properties: Option[JObject] = None ) private[predictionio] case class Identify( user_id: String, traits: Option[JObject] ) } object Common { private[predictionio] case class Integrations( All: Boolean = false, Mixpanel: Boolean = false, Marketo: Boolean = false, Salesforse: Boolean = false ) private[predictionio] case class Context( ip: String, library: Library, user_agent: String, app: Option[App] = None, campaign: Option[Campaign] = None, device: Option[Device] = None, network: Option[Network] = None, location: Option[Location] = None, os: Option[OS] = None, referrer: Option[Referrer] = None, screen: Option[Screen] = None, timezone: Option[String] = None ) private[predictionio] case class Screen(width: Int, height: Int, density: Int) private[predictionio] case class Referrer(id: String, `type`: String) private[predictionio] case class OS(name: String, version: String) private[predictionio] case class Location( city: Option[String] = None, country: Option[String] = None, latitude: Option[Double] = None, 
longitude: Option[Double] = None, speed: Option[Int] = None ) case class Page( path: String, referrer: String, search: String, title: String, url: String ) private[predictionio] case class Network( bluetooth: Option[Boolean] = None, carrier: Option[String] = None, cellular: Option[Boolean] = None, wifi: Option[Boolean] = None ) private[predictionio] case class Library(name: String, version: String) private[predictionio] case class Device( id: Option[String] = None, advertising_id: Option[String] = None, ad_tracking_enabled: Option[Boolean] = None, manufacturer: Option[String] = None, model: Option[String] = None, name: Option[String] = None, `type`: Option[String] = None, token: Option[String] = None ) private[predictionio] case class Campaign( name: Option[String] = None, source: Option[String] = None, medium: Option[String] = None, term: Option[String] = None, content: Option[String] = None ) private[predictionio] case class App( name: Option[String] = None, version: Option[String] = None, build: Option[String] = None ) } private[predictionio] case class Common( `type`: String, sent_at: String, timestamp: String, version: String, anonymous_id: Option[String] = None, user_id: Option[String] = None, context: Option[Common.Context] = None, integrations: Option[Common.Integrations] = None ) ================================================ FILE: data/src/test/resources/application.conf ================================================ org.apache.predictionio.data.storage { sources { mongodb { type = mongodb hosts = [localhost] ports = [27017] } elasticsearch { type = elasticsearch hosts = [localhost] ports = [9300] } } repositories { # This section is dummy just to make storage happy. # The actual testing will not bypass these repository settings completely. # Please refer to StorageTestUtils.scala. 
settings { name = "test_predictionio" source = mongodb } appdata { name = "test_predictionio_appdata" source = mongodb } } } ================================================ FILE: data/src/test/scala/org/apache/predictionio/data/api/EventServiceSpec.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.api import akka.event.Logging import org.apache.predictionio.data.storage.Storage import org.specs2.mutable.Specification import akka.http.scaladsl.testkit.Specs2RouteTest class EventServiceSpec extends Specification with Specs2RouteTest { val eventClient = Storage.getLEvents() val accessKeysClient = Storage.getMetaDataAccessKeys() val channelsClient = Storage.getMetaDataChannels() val statsActorRef = system.actorSelection("/user/StatsActor") val pluginsActorRef = system.actorSelection("/user/PluginsActor") val logger = Logging(system, getClass) val config = EventServerConfig(ip = "0.0.0.0", port = 7070) val route = EventServer.createRoute( eventClient, accessKeysClient, channelsClient, logger, statsActorRef, pluginsActorRef, config ) "GET / request" should { "properly produce OK HttpResponses" in { Get() ~> route ~> check { status.intValue() shouldEqual 200 responseAs[String] shouldEqual """{"status":"alive"}""" } } } } ================================================ FILE: data/src/test/scala/org/apache/predictionio/data/api/SegmentIOAuthSpec.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.api import akka.event.Logging import akka.http.scaladsl.model.ContentTypes import akka.http.scaladsl.model.headers.RawHeader import akka.http.scaladsl.server.Route import org.apache.predictionio.data.storage._ import org.joda.time.DateTime import org.specs2.mutable.Specification import sun.misc.BASE64Encoder import akka.http.scaladsl.testkit.Specs2RouteTest import scala.concurrent.{ExecutionContext, Future} class SegmentIOAuthSpec extends Specification with Specs2RouteTest { sequential isolated val eventClient = new LEvents { override def init(appId: Int, channelId: Option[Int]): Boolean = true override def futureInsert(event: Event, appId: Int, channelId: Option[Int]) (implicit ec: ExecutionContext): Future[String] = Future successful "event_id" override def futureFind( appId: Int, channelId: Option[Int], startTime: Option[DateTime], untilTime: Option[DateTime], entityType: Option[String], entityId: Option[String], eventNames: Option[Seq[String]], targetEntityType: Option[Option[String]], targetEntityId: Option[Option[String]], limit: Option[Int], reversed: Option[Boolean]) (implicit ec: ExecutionContext): Future[Iterator[Event]] = Future successful List.empty[Event].iterator override def futureGet(eventId: String, appId: Int, channelId: Option[Int]) (implicit ec: ExecutionContext): Future[Option[Event]] = Future successful None override def remove(appId: Int, channelId: Option[Int]): Boolean = true override def futureDelete(eventId: String, appId: Int, channelId: Option[Int]) (implicit ec: ExecutionContext): Future[Boolean] = Future successful true override def close(): Unit = {} } val appId = 0 val accessKeysClient = new AccessKeys { override def insert(k: AccessKey): Option[String] = null override def getByAppid(appid: Int): Seq[AccessKey] = null override def update(k: AccessKey): Unit = {} override def delete(k: String): Unit = {} override def getAll(): Seq[AccessKey] = null override def get(k: String): Option[AccessKey] 
= k match { case "abc" => Some(AccessKey(k, appId, Seq.empty)) case _ => None } } val channelsClient = Storage.getMetaDataChannels() val statsActorRef = system.actorSelection("/user/StatsActor") val pluginsActorRef = system.actorSelection("/user/PluginsActor") val base64Encoder = new BASE64Encoder val logger = Logging(system, getClass) val config = EventServerConfig(ip = "0.0.0.0", port = 7070) val route = EventServer.createRoute( eventClient, accessKeysClient, channelsClient, logger, statsActorRef, pluginsActorRef, config ) "Event Service" should { "reject with CredentialsRejected with invalid credentials" in new StorageMockContext { val accessKey = "abc123:" Post("/webhooks/segmentio.json") .withHeaders(RawHeader("Authorization", s"Basic $accessKey")) ~> Route.seal(route) ~> check { status.intValue() shouldEqual 401 responseAs[String] shouldEqual """{"message":"Invalid accessKey."}""" } success } } "reject with CredentialsMissed without credentials" in { Post("/webhooks/segmentio.json") ~> Route.seal(route) ~> check { status.intValue() shouldEqual 401 responseAs[String] shouldEqual """{"message":"Missing accessKey."}""" } success } "process SegmentIO identity request properly" in { val jsonReq = """ |{ | "anonymous_id": "507f191e810c19729de860ea", | "channel": "browser", | "context": { | "ip": "8.8.8.8", | "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5)" | }, | "message_id": "022bb90c-bbac-11e4-8dfc-aa07a5b093db", | "timestamp": "2015-02-23T22:28:55.387Z", | "sent_at": "2015-02-23T22:28:55.111Z", | "traits": { | "name": "Peter Gibbons", | "email": "peter@initech.com", | "plan": "premium", | "logins": 5 | }, | "type": "identify", | "user_id": "97980cfea0067", | "version": "2" |} """.stripMargin val accessKey = "abc:" val accessKeyEncoded = base64Encoder.encodeBuffer(accessKey.getBytes) Post("/webhooks/segmentio.json") .withHeaders(RawHeader("Authorization", s"Basic $accessKeyEncoded")) .withEntity(ContentTypes.`application/json`, jsonReq) ~> route ~> 
check { println(responseAs[String]) status.intValue() shouldEqual 201 responseAs[String] shouldEqual """{"eventId":"event_id"}""" } success } } ================================================ FILE: data/src/test/scala/org/apache/predictionio/data/storage/BiMapSpec.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.predictionio.data.storage import org.specs2.mutable._ import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.SparkConf import org.apache.spark.rdd.RDD class BiMapSpec extends Specification { System.clearProperty("spark.driver.port") System.clearProperty("spark.hostPort") val sc = new SparkContext("local[4]", "BiMapSpec test") "BiMap created with map" should { val keys = Seq(1, 4, 6) val orgValues = Seq(2, 5, 7) val org = keys.zip(orgValues).toMap val bi = BiMap(org) "return correct values for each key of original map" in { val biValues = keys.map(k => bi(k)) biValues must beEqualTo(orgValues) } "get return Option[V]" in { val checkKeys = keys ++ Seq(12345) val biValues = checkKeys.map(k => bi.get(k)) val expected = orgValues.map(Some(_)) ++ Seq(None) biValues must beEqualTo(expected) } "getOrElse return value for each key of original map" in { val biValues = keys.map(k => bi.getOrElse(k, -1)) biValues must beEqualTo(orgValues) } "getOrElse return default values for invalid key" in { val keys = Seq(999, -1, -2) val defaults = Seq(1234, 5678, 987) val biValues = keys.zip(defaults).map{ case (k,d) => bi.getOrElse(k, d) } biValues must beEqualTo(defaults) } "contains() returns true/false correctly" in { val checkKeys = keys ++ Seq(12345) val biValues = checkKeys.map(k => bi.contains(k)) val expected = orgValues.map(_ => true) ++ Seq(false) biValues must beEqualTo(expected) } "same size as original map" in { (bi.size) must beEqualTo(org.size) } "take(2) returns BiMap of size 2" in { bi.take(2).size must beEqualTo(2) } "toMap contain same element as original map" in { (bi.toMap) must beEqualTo(org) } "toSeq contain same element as original map" in { (bi.toSeq) must containTheSameElementsAs(org.toSeq) } "inverse and return correct keys for each values of original map" in { val biKeys = orgValues.map(v => bi.inverse(v)) biKeys must beEqualTo(keys) } "inverse with same size" in { bi.inverse.size must 
beEqualTo(org.size) } "inverse's inverse reference back to the same original object" in { // NOTE: reference equality bi.inverse.inverse == bi } } "BiMap created with duplicated values in map" should { val dup = Map(1 -> 2, 4 -> 7, 6 -> 7) "return IllegalArgumentException" in { BiMap(dup) must throwA[IllegalArgumentException] } } "BiMap.stringLong and stringInt" should { "create BiMap from set of string" in { val keys = Set("a", "b", "foo", "bar") val values: Seq[Long] = Seq(0, 1, 2, 3) val bi = BiMap.stringLong(keys) val biValues = keys.map(k => bi(k)) val biInt = BiMap.stringInt(keys) val valuesInt: Seq[Int] = values.map(_.toInt) val biIntValues = keys.map(k => biInt(k)) biValues must containTheSameElementsAs(values) and (biIntValues must containTheSameElementsAs(valuesInt)) } "create BiMap from Array of unique string" in { val keys = Array("a", "b", "foo", "bar") val values: Seq[Long] = Seq(0, 1, 2, 3) val bi = BiMap.stringLong(keys) val biValues = keys.toSeq.map(k => bi(k)) val biInt = BiMap.stringInt(keys) val valuesInt: Seq[Int] = values.map(_.toInt) val biIntValues = keys.toSeq.map(k => biInt(k)) biValues must containTheSameElementsAs(values) and (biIntValues must containTheSameElementsAs(valuesInt)) } "not guarantee sequential index for Array with duplicated string" in { val keys = Array("a", "b", "foo", "bar", "a", "b", "x") val dupValues: Seq[Long] = Seq(0, 1, 2, 3, 4, 5, 6) val values = keys.zip(dupValues).toMap.values.toSeq val bi = BiMap.stringLong(keys) val biValues = keys.toSet[String].map(k => bi(k)) val biInt = BiMap.stringInt(keys) val valuesInt: Seq[Int] = values.map(_.toInt) val biIntValues = keys.toSet[String].map(k => biInt(k)) biValues must containTheSameElementsAs(values) and (biIntValues must containTheSameElementsAs(valuesInt)) } "create BiMap from RDD[String]" in { val keys = Seq("a", "b", "foo", "bar") val values: Seq[Long] = Seq(0, 1, 2, 3) val rdd = sc.parallelize(keys) val bi = BiMap.stringLong(rdd) val biValues = keys.map(k => bi(k)) 
val biInt = BiMap.stringInt(rdd) val valuesInt: Seq[Int] = values.map(_.toInt) val biIntValues = keys.map(k => biInt(k)) biValues must containTheSameElementsAs(values) and (biIntValues must containTheSameElementsAs(valuesInt)) } "create BiMap from RDD[String] with duplicated string" in { val keys = Seq("a", "b", "foo", "bar", "a", "b", "x") val values: Seq[Long] = Seq(0, 1, 2, 3, 4) val rdd = sc.parallelize(keys) val bi = BiMap.stringLong(rdd) val biValues = keys.distinct.map(k => bi(k)) val biInt = BiMap.stringInt(rdd) val valuesInt: Seq[Int] = values.map(_.toInt) val biIntValues = keys.distinct.map(k => biInt(k)) biValues must containTheSameElementsAs(values) and (biIntValues must containTheSameElementsAs(valuesInt)) } } step(sc.stop()) } ================================================ FILE: data/src/test/scala/org/apache/predictionio/data/storage/DataMapSpec.scala ================================================ /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
 */

package org.apache.predictionio.data.storage

import org.specs2.mutable._

class DataMapSpec extends Specification {

  "DataMap" should {

    val properties = DataMap("""
      {
        "prop1" : 1,
        "prop2" : "value2",
        "prop3" : [1, 2, 3],
        "prop4" : true,
        "prop5" : ["a", "b", "c", "c"],
        "prop6" : 4.56
      }
      """)

    "get Int data" in {
      properties.get[Int]("prop1") must beEqualTo(1)
      properties.getOpt[Int]("prop1") must beEqualTo(Some(1))
    }

    "get String data" in {
      properties.get[String]("prop2") must beEqualTo("value2")
      properties.getOpt[String]("prop2") must beEqualTo(Some("value2"))
    }

    "get List of Int data" in {
      properties.get[List[Int]]("prop3") must beEqualTo(List(1,2,3))
      properties.getOpt[List[Int]]("prop3") must beEqualTo(Some(List(1,2,3)))
    }

    "get Boolean data" in {
      properties.get[Boolean]("prop4") must beEqualTo(true)
      properties.getOpt[Boolean]("prop4") must beEqualTo(Some(true))
    }

    "get List of String data" in {
      properties.get[List[String]]("prop5") must beEqualTo(List("a", "b", "c", "c"))
      properties.getOpt[List[String]]("prop5") must beEqualTo(Some(List("a", "b", "c", "c")))
    }

    "get Set of String data" in {
      properties.get[Set[String]]("prop5") must beEqualTo(Set("a", "b", "c"))
      properties.getOpt[Set[String]]("prop5") must beEqualTo(Some(Set("a", "b", "c")))
    }

    "get Double data" in {
      properties.get[Double]("prop6") must beEqualTo(4.56)
      properties.getOpt[Double]("prop6") must beEqualTo(Some(4.56))
    }

    "get empty optional Int data" in {
      properties.getOpt[Int]("prop9999") must beEqualTo(None)
    }

  }

  "DataMap with multi-level data" should {
    val properties = DataMap("""
      {
        "context": {
          "ip": "1.23.4.56",
          "prop1": 2.345
          "prop2": "value1",
          "prop4": [1, 2, 3]
        },
        "anotherPropertyA": 4.567,
        "anotherPropertyB": false
      }
      """)

    "get case class data" in {
      val expected = DataMapSpec.Context(
        ip = "1.23.4.56",
        prop1 = Some(2.345),
        prop2 = Some("value1"),
        prop3 = None,
        prop4 = List(1,2,3)
      )

      properties.get[DataMapSpec.Context]("context") must beEqualTo(expected)
    }

    "get empty optional case class data" in {
      properties.getOpt[DataMapSpec.Context]("context999") must beEqualTo(None)
    }

    "get double data" in {
      properties.get[Double]("anotherPropertyA") must beEqualTo(4.567)
    }

    "get boolean data" in {
      properties.get[Boolean]("anotherPropertyB") must beEqualTo(false)
    }
  }

  "DataMap extract" should {

    "extract to case class object" in {
      val properties = DataMap("""
        {
          "prop1" : 1,
          "prop2" : "value2",
          "prop3" : [1, 2, 3],
          "prop4" : true,
          "prop5" : ["a", "b", "c", "c"],
          "prop6" : 4.56
        }
        """)

      val result = properties.extract[DataMapSpec.BasicProperty]
      val expected = DataMapSpec.BasicProperty(
        prop1 = 1,
        prop2 = "value2",
        prop3 = List(1,2,3),
        prop4 = true,
        prop5 = List("a", "b", "c", "c"),
        prop6 = 4.56
      )

      result must beEqualTo(expected)
    }

    "extract with optional fields" in {
      val propertiesEmpty = DataMap("""{}""")
      val propertiesSome = DataMap("""
        {
          "prop1" : 1,
          "prop5" : ["a", "b", "c", "c"],
          "prop6" : 4.56
        }
        """)

      val resultEmpty = propertiesEmpty.extract[DataMapSpec.OptionProperty]
      val expectedEmpty = DataMapSpec.OptionProperty(
        prop1 = None,
        prop2 = None,
        prop3 = None,
        prop4 = None,
        prop5 = None,
        prop6 = None
      )

      val resultSome = propertiesSome.extract[DataMapSpec.OptionProperty]
      val expectedSome = DataMapSpec.OptionProperty(
        prop1 = Some(1),
        prop2 = None,
        prop3 = None,
        prop4 = None,
        prop5 = Some(List("a", "b", "c", "c")),
        prop6 = Some(4.56)
      )

      resultEmpty must beEqualTo(expectedEmpty)
      resultSome must beEqualTo(expectedSome)
    }

    "extract to multi-level object" in {
      val properties = DataMap("""
        {
          "context": {
            "ip": "1.23.4.56",
            "prop1": 2.345
            "prop2": "value1",
            "prop4": [1, 2, 3]
          },
          "anotherPropertyA": 4.567,
          "anotherPropertyB": false
        }
        """)

      val result = properties.extract[DataMapSpec.MultiLevelProperty]
      val expected = DataMapSpec.MultiLevelProperty(
        context = DataMapSpec.Context(
          ip = "1.23.4.56",
          prop1 = Some(2.345),
          prop2 = Some("value1"),
          prop3 = None,
          prop4 = List(1,2,3)
        ),
        anotherPropertyA = 4.567,
        anotherPropertyB = false
      )

      result must beEqualTo(expected)
    }

  }
}

object DataMapSpec {

  // define this case class inside object to avoid case class name conflict with other tests
  case class Context(
    ip: String,
    prop1: Option[Double],
    prop2: Option[String],
    prop3: Option[Int],
    prop4: List[Int]
  )

  case class BasicProperty(
    prop1: Int,
    prop2: String,
    prop3: List[Int],
    prop4: Boolean,
    prop5: List[String],
    prop6: Double
  )

  case class OptionProperty(
    prop1: Option[Int],
    prop2: Option[String],
    prop3: Option[List[Int]],
    prop4: Option[Boolean],
    prop5: Option[List[String]],
    prop6: Option[Double]
  )

  case class MultiLevelProperty(
    context: Context,
    anotherPropertyA: Double,
    anotherPropertyB: Boolean
  )
}

================================================
FILE: data/src/test/scala/org/apache/predictionio/data/storage/LEventAggregatorSpec.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.data.storage

import org.specs2.mutable._

import org.json4s.JObject
import org.json4s.native.JsonMethods.parse

import org.joda.time.DateTime

class LEventAggregatorSpec extends Specification with TestEvents {

  "LEventAggregator.aggregateProperties()" should {

    "aggregate two entities' properties as DataMap correctly" in {
      val events = Vector(u1e5, u2e2, u1e3, u1e1, u2e3, u2e1, u1e4, u1e2)
      val result: Map[String, DataMap] =
        LEventAggregator.aggregateProperties(events.toIterator)

      val expected = Map(
        "u1" -> DataMap(u1),
        "u2" -> DataMap(u2)
      )

      result must beEqualTo(expected)
    }

    "aggregate two entities' properties as PropertyMap correctly" in {
      val events = Vector(u1e5, u2e2, u1e3, u1e1, u2e3, u2e1, u1e4, u1e2)
      val result: Map[String, PropertyMap] =
        LEventAggregator.aggregateProperties(events.toIterator)

      val expected = Map(
        "u1" -> PropertyMap(u1, u1BaseTime, u1LastTime),
        "u2" -> PropertyMap(u2, u2BaseTime, u2LastTime)
      )

      result must beEqualTo(expected)
    }

    "aggregate deleted entity correctly" in {
      val events = Vector(u1e5, u2e2, u1e3, u1ed, u1e1, u2e3, u2e1, u1e4, u1e2)

      val result = LEventAggregator.aggregateProperties(events.toIterator)
      val expected = Map(
        "u2" -> PropertyMap(u2, u2BaseTime, u2LastTime)
      )

      result must beEqualTo(expected)
    }
  }

  "LEventAggregator.aggregatePropertiesSingle()" should {

    "aggregate single entity properties as DataMap correctly" in {
      val events = Vector(u1e5, u1e3, u1e1, u1e4, u1e2)
      val eventsIt = events.toIterator

      val result: Option[DataMap] = LEventAggregator
        .aggregatePropertiesSingle(eventsIt)
      val expected = DataMap(u1)

      result must beEqualTo(Some(expected))
    }

    "aggregate single entity properties as PropertyMap correctly" in {
      val events = Vector(u1e5, u1e3, u1e1, u1e4, u1e2)
      val eventsIt = events.toIterator

      val result: Option[PropertyMap] = LEventAggregator
        .aggregatePropertiesSingle(eventsIt)
      val expected = PropertyMap(u1, u1BaseTime, u1LastTime)

      result must beEqualTo(Some(expected))
    }

    "aggregate deleted entity correctly" in {
      // put the delete event in the middle
      val events = Vector(u1e4, u1e2, u1ed, u1e3, u1e1, u1e5)
      val eventsIt = events.toIterator

      val result = LEventAggregator.aggregatePropertiesSingle(eventsIt)

      result must beEqualTo(None)
    }
  }
}

================================================
FILE: data/src/test/scala/org/apache/predictionio/data/storage/PEventAggregatorSpec.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.data.storage

import org.specs2.mutable._

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

class PEventAggregatorSpec extends Specification with TestEvents {

  System.clearProperty("spark.driver.port")
  System.clearProperty("spark.hostPort")
  val sc = new SparkContext("local[4]", "PEventAggregatorSpec test")

  "PEventAggregator" should {

    "aggregate two entities' properties as DataMap/PropertyMap correctly" in {
      val events = sc.parallelize(Seq(
        u1e5, u2e2, u1e3, u1e1, u2e3, u2e1, u1e4, u1e2))

      val users = PEventAggregator.aggregateProperties(events)

      val userMap = users.collectAsMap.toMap
      val expectedDM = Map(
        "u1" -> DataMap(u1),
        "u2" -> DataMap(u2)
      )

      val expectedPM = Map(
        "u1" -> PropertyMap(u1, u1BaseTime, u1LastTime),
        "u2" -> PropertyMap(u2, u2BaseTime, u2LastTime)
      )

      userMap must beEqualTo(expectedDM)
      userMap must beEqualTo(expectedPM)
    }

    "aggregate deleted entity correctly" in {
      // put the delete event in middle
      val events = sc.parallelize(Seq(
        u1e5, u2e2, u1e3, u1ed, u1e1, u2e3, u2e1, u1e4, u1e2))

      val users = PEventAggregator.aggregateProperties(events)

      val userMap = users.collectAsMap.toMap
      val expectedPM = Map(
        "u2" -> PropertyMap(u2, u2BaseTime, u2LastTime)
      )

      userMap must beEqualTo(expectedPM)
    }
  }

  step(sc.stop())
}

================================================
FILE: data/src/test/scala/org/apache/predictionio/data/storage/StorageMockContext.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.data.storage

import org.scalamock.specs2.MockContext

trait StorageMockContext extends MockContext {

  if(!EnvironmentFactory.environmentService.isDefined){
    val mockedEnvService = mock[EnvironmentService]
    (mockedEnvService.envKeys _)
      .expects
      .returning(List("PIO_STORAGE_REPOSITORIES_METADATA_NAME",
        "PIO_STORAGE_SOURCES_MYSQL_TYPE",
        "PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME",
        "PIO_STORAGE_SOURCES_EVENTDATA_TYPE"))
      .twice

    (mockedEnvService.getByKey _)
      .expects("PIO_STORAGE_REPOSITORIES_METADATA_NAME")
      .returning("test_metadata")

    (mockedEnvService.getByKey _)
      .expects("PIO_STORAGE_REPOSITORIES_METADATA_SOURCE")
      .returning("MYSQL")

    (mockedEnvService.getByKey _)
      .expects("PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME")
      .returning("test_eventdata")

    (mockedEnvService.getByKey _)
      .expects("PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE")
      .returning("MYSQL")

    (mockedEnvService.getByKey _)
      .expects("PIO_STORAGE_SOURCES_MYSQL_TYPE")
      .returning("jdbc")

    (mockedEnvService.filter _)
      .expects(*)
      .returning(Map(
        "URL" -> "jdbc:h2:~/test;MODE=MySQL;AUTO_SERVER=TRUE",
        "USERNAME" -> "sa",
        "PASSWORD" -> "")
      )

    EnvironmentFactory.environmentService = new Some(mockedEnvService)
  }
}

================================================
FILE: data/src/test/scala/org/apache/predictionio/data/storage/TestEvents.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.data.storage

import org.joda.time.DateTime
import org.joda.time.DateTimeZone

trait TestEvents {

  val u1BaseTime = new DateTime(654321)
  val u2BaseTime = new DateTime(6543210)
  val u3BaseTime = new DateTime(6543410)

  // u1 events
  val u1e1 = Event(
    event = "$set",
    entityType = "user",
    entityId = "u1",
    properties = DataMap(
      """{
        "a" : 1,
        "b" : "value2",
        "d" : [1, 2, 3],
      }"""),
    eventTime = u1BaseTime
  )

  val u1e2 = u1e1.copy(
    event = "$set",
    properties = DataMap("""{"a" : 2}"""),
    eventTime = u1BaseTime.plusDays(1)
  )

  val u1e3 = u1e1.copy(
    event = "$set",
    properties = DataMap("""{"b" : "value4"}"""),
    eventTime = u1BaseTime.plusDays(2)
  )

  val u1e4 = u1e1.copy(
    event = "$unset",
    properties = DataMap("""{"b" : null}"""),
    eventTime = u1BaseTime.plusDays(3)
  )

  val u1e5 = u1e1.copy(
    event = "$set",
    properties = DataMap("""{"e" : "new"}"""),
    eventTime = u1BaseTime.plusDays(4)
  )

  val u1LastTime = u1BaseTime.plusDays(4)
  val u1 = """{"a": 2, "d": [1, 2, 3], "e": "new"}"""

  // delete event for u1
  val u1ed = u1e1.copy(
    event = "$delete",
    properties = DataMap(),
    eventTime = u1BaseTime.plusDays(5)
  )

  // u2 events
  val u2e1 = Event(
    event = "$set",
    entityType = "user",
    entityId = "u2",
    properties = DataMap(
      """{
        "a" : 21,
        "b" : "value12",
        "d" : [7, 5, 6],
      }"""),
    eventTime = u2BaseTime
  )

  val u2e2 = u2e1.copy(
    event = "$unset",
    properties = DataMap("""{"a" : null}"""),
    eventTime = u2BaseTime.plusDays(1)
  )

  val u2e3 = u2e1.copy(
    event = "$set",
    properties = DataMap("""{"b" : "value9", "g": "new11"}"""),
    eventTime = u2BaseTime.plusDays(2)
  )

  val u2LastTime = u2BaseTime.plusDays(2)
  val u2 = """{"b": "value9", "d": [7, 5, 6], "g": "new11"}"""

  // u3 events
  val u3e1 = Event(
    event = "$set",
    entityType = "user",
    entityId = "u3",
    properties = DataMap(
      """{
        "a" : 22,
        "b" : "value13",
        "d" : [5, 6, 1],
      }"""),
    eventTime = u3BaseTime
  )

  val u3e2 = u3e1.copy(
    event = "$unset",
    properties = DataMap("""{"a" : null}"""),
    eventTime = u3BaseTime.plusDays(1)
  )

  val u3e3 = u3e1.copy(
    event = "$set",
    properties = DataMap("""{"b" : "value10", "f": "new12", "d" : [1, 3, 2]}"""),
    eventTime = u3BaseTime.plusDays(2)
  )

  val u3LastTime = u3BaseTime.plusDays(2)
  val u3 = """{"b": "value10", "d": [1, 3, 2], "f": "new12"}"""

  // some random events
  val r1 = Event(
    event = "my_event",
    entityType = "my_entity_type",
    entityId = "my_entity_id",
    targetEntityType = Some("my_target_entity_type"),
    targetEntityId = Some("my_target_entity_id"),
    properties = DataMap(
      """{
        "prop1" : 1,
        "prop2" : "value2",
        "prop3" : [1, 2, 3],
        "prop4" : true,
        "prop5" : ["a", "b", "c"],
        "prop6" : 4.56
      }"""
    ),
    eventTime = DateTime.now,
    prId = Some("my_prid")
  )

  val r2 = Event(
    event = "my_event2",
    entityType = "my_entity_type2",
    entityId = "my_entity_id2"
  )

  val r3 = Event(
    event = "my_event3",
    entityType = "my_entity_type",
    entityId = "my_entity_id",
    targetEntityType = Some("my_target_entity_type"),
    targetEntityId = Some("my_target_entity_id"),
    properties = DataMap(
      """{
        "propA" : 1.2345,
        "propB" : "valueB",
      }"""
    ),
    prId = Some("my_prid")
  )

  val r4 = Event(
    event = "my_event4",
    entityType = "my_entity_type4",
    entityId = "my_entity_id4",
    targetEntityType = Some("my_target_entity_type4"),
    targetEntityId = Some("my_target_entity_id4"),
    properties = DataMap(
      """{
        "prop1" : 1,
        "prop2" : "value2",
        "prop3" : [1, 2, 3],
        "prop4" : true,
        "prop5" : ["a", "b", "c"],
        "prop6" : 4.56
      }"""),
    eventTime = DateTime.now
  )

  val r5 = Event(
    event = "my_event5",
    entityType = "my_entity_type5",
    entityId = "my_entity_id5",
    targetEntityType = Some("my_target_entity_type5"),
    targetEntityId = Some("my_target_entity_id5"),
    properties = DataMap(
      """{
        "prop1" : 1,
        "prop2" : "value2",
        "prop3" : [1, 2, 3],
        "prop4" : true,
        "prop5" : ["a", "b", "c"],
        "prop6" : 4.56
      }"""
    ),
    eventTime = DateTime.now
  )

  val r6 = Event(
    event = "my_event6",
    entityType = "my_entity_type6",
    entityId = "my_entity_id6",
    targetEntityType = Some("my_target_entity_type6"),
    targetEntityId = Some("my_target_entity_id6"),
    properties = DataMap(
      """{
        "prop1" : 6,
        "prop2" : "value2",
        "prop3" : [6, 7, 8],
        "prop4" : true,
        "prop5" : ["a", "b", "c"],
        "prop6" : 4.56
      }"""
    ),
    eventTime = DateTime.now
  )

  // timezone
  val tz1 = Event(
    event = "my_event",
    entityType = "my_entity_type",
    entityId = "my_entity_id0",
    targetEntityType = Some("my_target_entity_type"),
    targetEntityId = Some("my_target_entity_id"),
    properties = DataMap(
      """{
        "prop1" : 1,
        "prop2" : "value2",
        "prop3" : [1, 2, 3],
        "prop4" : true,
        "prop5" : ["a", "b", "c"],
        "prop6" : 4.56
      }"""
    ),
    eventTime = new DateTime(12345678, DateTimeZone.forID("-08:00")),
    prId = Some("my_prid")
  )

  val tz2 = Event(
    event = "my_event",
    entityType = "my_entity_type",
    entityId = "my_entity_id1",
    eventTime = new DateTime(12345678, DateTimeZone.forID("+02:00")),
    prId = Some("my_prid")
  )

  val tz3 = Event(
    event = "my_event",
    entityType = "my_entity_type",
    entityId = "my_entity_id2",
    eventTime = new DateTime(12345678, DateTimeZone.forID("+08:00")),
    prId = Some("my_prid")
  )

}

================================================
FILE: data/src/test/scala/org/apache/predictionio/data/webhooks/ConnectorTestUtil.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.data.webhooks

import org.specs2.execute.Result
import org.specs2.mutable._

import org.json4s.JObject
import org.json4s.DefaultFormats
import org.json4s.native.JsonMethods.parse
import org.json4s.native.Serialization.write

/** TestUtil for JsonConnector and FormConnector */
trait ConnectorTestUtil extends Specification {

  implicit val formats = DefaultFormats

  def check(connector: JsonConnector, original: String, event: String): Result = {
    val originalJson = parse(original).asInstanceOf[JObject]
    val eventJson = parse(event).asInstanceOf[JObject]
    // write and parse back to discard any JNothing field
    val result = parse(write(connector.toEventJson(originalJson))).asInstanceOf[JObject]
    result.obj must containTheSameElementsAs(eventJson.obj)
  }

  def check(connector: FormConnector, original: Map[String, String], event: String) = {

    val eventJson = parse(event).asInstanceOf[JObject]
    // write and parse back to discard any JNothing field
    val result = parse(write(connector.toEventJson(original))).asInstanceOf[JObject]

    result.obj must containTheSameElementsAs(eventJson.obj)
  }
}

================================================
FILE: data/src/test/scala/org/apache/predictionio/data/webhooks/exampleform/ExampleFormConnectorSpec.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.
 * See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.data.webhooks.exampleform

import org.apache.predictionio.data.webhooks.ConnectorTestUtil

import org.specs2.mutable._

/** Test the ExampleFormConnector */
class ExampleFormConnectorSpec extends Specification with ConnectorTestUtil {

  "ExampleFormConnector" should {

    "convert userAction to Event JSON" in {
      // webhooks input
      val userAction = Map(
        "type" -> "userAction",
        "userId" -> "as34smg4",
        "event" -> "do_something",
        "context[ip]" -> "24.5.68.47", // optional
        "context[prop1]" -> "2.345", // optional
        "context[prop2]" -> "value1", // optional
        "anotherProperty1" -> "100",
        "anotherProperty2" -> "optional1", // optional
        "timestamp" -> "2015-01-02T00:30:12.984Z"
      )

      // expected converted Event JSON
      val expected = """
        {
          "event": "do_something",
          "entityType": "user",
          "entityId": "as34smg4",
          "properties": {
            "context": {
              "ip": "24.5.68.47",
              "prop1": 2.345
              "prop2": "value1"
            },
            "anotherProperty1": 100,
            "anotherProperty2": "optional1"
          }
          "eventTime": "2015-01-02T00:30:12.984Z"
        }
      """

      check(ExampleFormConnector, userAction, expected)
    }

    "convert userAction without optional fields to Event JSON" in {
      // webhooks input
      val userAction = Map(
        "type" -> "userAction",
        "userId" -> "as34smg4",
        "event" -> "do_something",
        "anotherProperty1" -> "100",
        "timestamp" -> "2015-01-02T00:30:12.984Z"
      )

      // expected converted Event JSON
      val expected = """
        {
          "event": "do_something",
          "entityType": "user",
          "entityId": "as34smg4",
          "properties": {
            "anotherProperty1": 100,
          }
          "eventTime": "2015-01-02T00:30:12.984Z"
        }
      """

      check(ExampleFormConnector, userAction, expected)
    }

    "convert userActionItem to Event JSON" in {
      // webhooks input
      val userActionItem = Map(
        "type" -> "userActionItem",
        "userId" -> "as34smg4",
        "event" -> "do_something_on",
        "itemId" -> "kfjd312bc",
        "context[ip]" -> "1.23.4.56",
        "context[prop1]" -> "2.345",
        "context[prop2]" -> "value1",
        "anotherPropertyA" -> "4.567", // optional
        "anotherPropertyB" -> "false", // optional
        "timestamp" -> "2015-01-15T04:20:23.567Z"
      )

      // expected converted Event JSON
      val expected = """
        {
          "event": "do_something_on",
          "entityType": "user",
          "entityId": "as34smg4",
          "targetEntityType": "item",
          "targetEntityId": "kfjd312bc"
          "properties": {
            "context": {
              "ip": "1.23.4.56",
              "prop1": 2.345
              "prop2": "value1"
            },
            "anotherPropertyA": 4.567
            "anotherPropertyB": false
          }
          "eventTime": "2015-01-15T04:20:23.567Z"
        }
      """

      check(ExampleFormConnector, userActionItem, expected)
    }

    "convert userActionItem without optional fields to Event JSON" in {
      // webhooks input
      val userActionItem = Map(
        "type" -> "userActionItem",
        "userId" -> "as34smg4",
        "event" -> "do_something_on",
        "itemId" -> "kfjd312bc",
        "context[ip]" -> "1.23.4.56",
        "context[prop1]" -> "2.345",
        "context[prop2]" -> "value1",
        "timestamp" -> "2015-01-15T04:20:23.567Z"
      )

      // expected converted Event JSON
      val expected = """
        {
          "event": "do_something_on",
          "entityType": "user",
          "entityId": "as34smg4",
          "targetEntityType": "item",
          "targetEntityId": "kfjd312bc"
          "properties": {
            "context": {
              "ip": "1.23.4.56",
              "prop1": 2.345
              "prop2": "value1"
            }
          }
          "eventTime": "2015-01-15T04:20:23.567Z"
        }
      """

      check(ExampleFormConnector, userActionItem, expected)
    }

  }
}

================================================
FILE: data/src/test/scala/org/apache/predictionio/data/webhooks/examplejson/ExampleJsonConnectorSpec.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.data.webhooks.examplejson

import org.apache.predictionio.data.webhooks.ConnectorTestUtil

import org.specs2.mutable._

/** Test the ExampleJsonConnector */
class ExampleJsonConnectorSpec extends Specification with ConnectorTestUtil {

  "ExampleJsonConnector" should {

    "convert userAction to Event JSON" in {
      // webhooks input
      val userAction = """
        {
          "type": "userAction"
          "userId": "as34smg4",
          "event": "do_something",
          "context": {
            "ip": "24.5.68.47",
            "prop1": 2.345
            "prop2": "value1"
          },
          "anotherProperty1": 100,
          "anotherProperty2": "optional1",
          "timestamp": "2015-01-02T00:30:12.984Z"
        }
      """

      // expected converted Event JSON
      val expected = """
        {
          "event": "do_something",
          "entityType": "user",
          "entityId": "as34smg4",
          "properties": {
            "context": {
              "ip": "24.5.68.47",
              "prop1": 2.345
              "prop2": "value1"
            },
            "anotherProperty1": 100,
            "anotherProperty2": "optional1"
          }
          "eventTime": "2015-01-02T00:30:12.984Z"
        }
      """

      check(ExampleJsonConnector, userAction, expected)
    }

    "convert userAction without optional field to Event JSON" in {
      // webhooks input
      val userAction = """
        {
          "type": "userAction"
          "userId": "as34smg4",
          "event": "do_something",
          "anotherProperty1": 100,
          "timestamp": "2015-01-02T00:30:12.984Z"
        }
      """

      // expected converted Event JSON
      val expected = """
        {
          "event": "do_something",
          "entityType": "user",
          "entityId": "as34smg4",
          "properties": {
            "anotherProperty1": 100,
          }
          "eventTime": "2015-01-02T00:30:12.984Z"
        }
      """

      check(ExampleJsonConnector, userAction, expected)
    }

    "convert userActionItem to Event JSON" in {
      // webhooks input
      val userActionItem = """
        {
          "type": "userActionItem"
          "userId": "as34smg4",
          "event": "do_something_on",
          "itemId": "kfjd312bc",
          "context": {
            "ip": "1.23.4.56",
            "prop1": 2.345
            "prop2": "value1"
          },
          "anotherPropertyA": 4.567
          "anotherPropertyB": false
          "timestamp": "2015-01-15T04:20:23.567Z"
        }
      """

      // expected converted Event JSON
      val expected = """
        {
          "event": "do_something_on",
          "entityType": "user",
          "entityId": "as34smg4",
          "targetEntityType": "item",
          "targetEntityId": "kfjd312bc"
          "properties": {
            "context": {
              "ip": "1.23.4.56",
              "prop1": 2.345
              "prop2": "value1"
            },
            "anotherPropertyA": 4.567
            "anotherPropertyB": false
          }
          "eventTime": "2015-01-15T04:20:23.567Z"
        }
      """

      check(ExampleJsonConnector, userActionItem, expected)
    }

    "convert userActionItem without optional fields to Event JSON" in {
      // webhooks input
      val userActionItem = """
        {
          "type": "userActionItem"
          "userId": "as34smg4",
          "event": "do_something_on",
          "itemId": "kfjd312bc",
          "context": {
            "ip": "1.23.4.56",
            "prop1": 2.345
            "prop2": "value1"
          }
          "timestamp": "2015-01-15T04:20:23.567Z"
        }
      """

      // expected converted Event JSON
      val expected = """
        {
          "event": "do_something_on",
          "entityType": "user",
          "entityId": "as34smg4",
          "targetEntityType": "item",
          "targetEntityId": "kfjd312bc"
          "properties": {
            "context": {
              "ip": "1.23.4.56",
              "prop1": 2.345
              "prop2": "value1"
            }
          }
          "eventTime": "2015-01-15T04:20:23.567Z"
        }
      """

      check(ExampleJsonConnector, userActionItem, expected)
    }

  }
}

================================================
FILE: data/src/test/scala/org/apache/predictionio/data/webhooks/mailchimp/MailChimpConnectorSpec.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.predictionio.data.webhooks.mailchimp

import org.apache.predictionio.data.webhooks.ConnectorTestUtil

import org.specs2.mutable._

class MailChimpConnectorSpec extends Specification with ConnectorTestUtil {

  // TODO: test other events
  // TODO: test different optional fields

  "MailChimpConnector" should {

    "convert subscribe to event JSON" in {

      val subscribe = Map(
        "type" -> "subscribe",
        "fired_at" -> "2009-03-26 21:35:57",
        "data[id]" -> "8a25ff1d98",
        "data[list_id]" -> "a6b5da1054",
        "data[email]" -> "api@mailchimp.com",
        "data[email_type]" -> "html",
        "data[merges][EMAIL]" -> "api@mailchimp.com",
        "data[merges][FNAME]" -> "MailChimp",
        "data[merges][LNAME]" -> "API",
        "data[merges][INTERESTS]" -> "Group1,Group2", // optional
        "data[ip_opt]" -> "10.20.10.30",
        "data[ip_signup]" -> "10.20.10.30"
      )

      val expected = """
        {
          "event" : "subscribe",
          "entityType" : "user",
          "entityId" : "8a25ff1d98",
          "targetEntityType" : "list",
          "targetEntityId" : "a6b5da1054",
          "properties" : {
            "email" : "api@mailchimp.com",
            "email_type" : "html",
            "merges" : {
              "EMAIL" : "api@mailchimp.com",
              "FNAME" : "MailChimp",
              "LNAME" : "API"
              "INTERESTS" : "Group1,Group2"
            },
            "ip_opt" : "10.20.10.30",
            "ip_signup" : "10.20.10.30"
          },
          "eventTime" : "2009-03-26T21:35:57.000Z"
        }
      """

      check(MailChimpConnector, subscribe, expected)
    }

    // check unsubscribe to event Json
    "convert unsubscribe to event JSON" in {

      val unsubscribe = Map(
        "type" -> "unsubscribe",
        "fired_at" -> "2009-03-26 21:40:57",
        "data[action]" -> "unsub",
        "data[reason]" -> "manual",
        "data[id]" -> "8a25ff1d98",
        "data[list_id]" -> "a6b5da1054",
        "data[email]" -> "api+unsub@mailchimp.com",
        "data[email_type]" -> "html",
        "data[merges][EMAIL]" -> "api+unsub@mailchimp.com",
        "data[merges][FNAME]" -> "MailChimp",
        "data[merges][LNAME]" -> "API",
        "data[merges][INTERESTS]" -> "Group1,Group2", // optional
        "data[ip_opt]" -> "10.20.10.30",
        "data[campaign_id]" -> "cb398d21d2"
      )

      val expected = """
        {
          "event" : "unsubscribe",
          "entityType" : "user",
          "entityId" : "8a25ff1d98",
          "targetEntityType" : "list",
          "targetEntityId" : "a6b5da1054",
          "properties" : {
            "action" : "unsub",
            "reason" : "manual",
            "email" : "api+unsub@mailchimp.com",
            "email_type" : "html",
            "merges" : {
              "EMAIL" : "api+unsub@mailchimp.com",
              "FNAME" : "MailChimp",
              "LNAME" : "API"
              "INTERESTS" : "Group1,Group2"
            },
            "ip_opt" : "10.20.10.30",
            "campaign_id" : "cb398d21d2"
          },
          "eventTime" : "2009-03-26T21:40:57.000Z"
        }
      """

      check(MailChimpConnector, unsubscribe, expected)
    }

    // check profile update to event Json
    "convert profile update to event JSON" in {

      val profileUpdate = Map(
        "type" -> "profile",
        "fired_at" -> "2009-03-26 21:31:21",
        "data[id]" -> "8a25ff1d98",
        "data[list_id]" -> "a6b5da1054",
        "data[email]" -> "api@mailchimp.com",
        "data[email_type]" -> "html",
        "data[merges][EMAIL]" -> "api@mailchimp.com",
        "data[merges][FNAME]" -> "MailChimp",
        "data[merges][LNAME]" -> "API",
        "data[merges][INTERESTS]" -> "Group1,Group2", // optional
        "data[ip_opt]" -> "10.20.10.30"
      )

      val expected = """
        {
          "event" : "profile",
          "entityType" : "user",
          "entityId" : "8a25ff1d98",
          "targetEntityType" : "list",
          "targetEntityId" : "a6b5da1054",
          "properties" : {
            "email" : "api@mailchimp.com",
            "email_type" : "html",
            "merges" : {
              "EMAIL" : "api@mailchimp.com",
              "FNAME" : "MailChimp",
              "LNAME" : "API"
              "INTERESTS" : "Group1,Group2"
            },
            "ip_opt" : "10.20.10.30"
          },
          "eventTime" : "2009-03-26T21:31:21.000Z"
        }
      """

      check(MailChimpConnector, profileUpdate, expected)
    }

    // check email update to event Json
    "convert email update to event JSON" in {

      val emailUpdate = Map(
        "type" -> "upemail",
        "fired_at" -> "2009-03-26 22:15:09",
        "data[list_id]" -> "a6b5da1054",
        "data[new_id]" -> "51da8c3259",
        "data[new_email]" -> "api+new@mailchimp.com",
        "data[old_email]" -> "api+old@mailchimp.com"
      )

      val expected = """
        {
          "event" : "upemail",
          "entityType" : "user",
          "entityId" : "51da8c3259",
          "targetEntityType" : "list",
          "targetEntityId" : "a6b5da1054",
          "properties" : {
            "new_email" : "api+new@mailchimp.com",
            "old_email" : "api+old@mailchimp.com"
          },
          "eventTime" : "2009-03-26T22:15:09.000Z"
        }
      """

      check(MailChimpConnector, emailUpdate, expected)
    }

    // check cleaned email to event Json
    "convert cleaned email to event JSON" in {

      val cleanedEmail = Map(
        "type" -> "cleaned",
        "fired_at" -> "2009-03-26 22:01:00",
        "data[list_id]" -> "a6b5da1054",
        "data[campaign_id]" -> "4fjk2ma9xd",
        "data[reason]" -> "hard",
        "data[email]" -> "api+cleaned@mailchimp.com"
      )

      val expected = """
        {
          "event" : "cleaned",
          "entityType" : "list",
          "entityId" : "a6b5da1054",
          "properties" : {
            "campaignId" : "4fjk2ma9xd",
            "reason" : "hard",
            "email" : "api+cleaned@mailchimp.com"
          },
          "eventTime" : "2009-03-26T22:01:00.000Z"
        }
      """

      check(MailChimpConnector, cleanedEmail, expected)
    }

    // check campaign sending status to event Json
    "convert campaign sending status to event JSON" in {

      val campaign = Map(
        "type" -> "campaign",
        "fired_at" -> "2009-03-26 22:15:09",
        "data[id]" -> "5aa2102003",
        "data[subject]" -> "Test Campaign Subject",
        "data[status]" -> "sent",
        "data[reason]" -> "",
        "data[list_id]" -> "a6b5da1054"
      )

      val expected = """
        {
          "event" : "campaign",
          "entityType" : "campaign",
          "entityId" : "5aa2102003",
          "targetEntityType" : "list",
          "targetEntityId" : "a6b5da1054",
          "properties" : {
            "subject" : "Test Campaign Subject",
            "status" : "sent",
            "reason" : ""
          },
          "eventTime" : "2009-03-26T22:15:09.000Z"
        }
      """

      check(MailChimpConnector, campaign, expected)
    }

  }
}

================================================
FILE: data/src/test/scala/org/apache/predictionio/data/webhooks/segmentio/SegmentIOConnectorSpec.scala
================================================
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
*/ package org.apache.predictionio.data.webhooks.segmentio import org.apache.predictionio.data.webhooks.ConnectorTestUtil import org.specs2.mutable._ class SegmentIOConnectorSpec extends Specification with ConnectorTestUtil { // TODO: test different optional fields val commonFields = s""" | "anonymous_id": "id", | "sent_at": "sendAt", | "version": "2", """.stripMargin "SegmentIOConnector" should { "convert group with context to event JSON" in { val context = """ | "context": { | "app": { | "name": "InitechGlobal", | "version": "545", | "build": "3.0.1.545" | }, | "campaign": { | "name": "TPS Innovation Newsletter", | "source": "Newsletter", | "medium": "email", | "term": "tps reports", | "content": "image link" | }, | "device": { | "id": "B5372DB0-C21E-11E4-8DFC-AA07A5B093DB", | "advertising_id": "7A3CBEA0-BDF5-11E4-8DFC-AA07A5B093DB", | "ad_tracking_enabled": true, | "manufacturer": "Apple", | "model": "iPhone7,2", | "name": "maguro", | "type": "ios", | "token": "ff15bc0c20c4aa6cd50854ff165fd265c838e5405bfeb9571066395b8c9da449" | }, | "ip": "8.8.8.8", | "library": { | "name": "analytics-ios", | "version": "1.8.0" | }, | "network": { | "bluetooth": false, | "carrier": "T-Mobile NL", | "cellular": true, | "wifi": false | }, | "location": { | "city": "San Francisco", | "country": "United States", | "latitude": 40.2964197, | "longitude": -76.9411617, | "speed": 0 | }, | "os": { | "name": "iPhone OS", | "version": "8.1.3" | }, | "referrer": { | "id": "ABCD582CDEFFFF01919", | "type": "dataxu" | }, | "screen": { | "width": 320, | "height": 568, | "density": 2 | }, | "timezone": "Europe/Amsterdam", | "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5)" | } """.stripMargin val group = s""" |{ $commonFields | "type": "group", | "group_id": "groupId", | "user_id": "userIdValue", | "timestamp" : "2012-12-02T00:30:08.276Z", | "traits": { | "name": "groupName", | "employees": 329, | }, | $context |} """.stripMargin val expected = s""" |{ | "event": "group", | 
"entityType": "user", | "entityId": "userIdValue", | "properties": { | $context, | "group_id": "groupId", | "traits": { | "name": "groupName", | "employees": 329 | }, | }, | "eventTime" : "2012-12-02T00:30:08.276Z" |} """.stripMargin check(SegmentIOConnector, group, expected) } "convert group to event JSON" in { val group = s""" |{ $commonFields | "type": "group", | "group_id": "groupId", | "user_id": "userIdValue", | "timestamp" : "2012-12-02T00:30:08.276Z", | "traits": { | "name": "groupName", | "employees": 329, | } |} """.stripMargin val expected = """ |{ | "event": "group", | "entityType": "user", | "entityId": "userIdValue", | "properties": { | "group_id": "groupId", | "traits": { | "name": "groupName", | "employees": 329 | } | }, | "eventTime" : "2012-12-02T00:30:08.276Z" |} """.stripMargin check(SegmentIOConnector, group, expected) } "convert screen to event JSON" in { val screen = s""" |{ $commonFields | "type": "screen", | "name": "screenName", | "user_id": "userIdValue", | "timestamp" : "2012-12-02T00:30:08.276Z", | "properties": { | "variation": "screenVariation" | } |} """.stripMargin val expected = """ |{ | "event": "screen", | "entityType": "user", | "entityId": "userIdValue", | "properties": { | "properties": { | "variation": "screenVariation" | }, | "name": "screenName" | }, | "eventTime" : "2012-12-02T00:30:08.276Z" |} """.stripMargin check(SegmentIOConnector, screen, expected) } "convert page to event JSON" in { val page = s""" |{ $commonFields | "type": "page", | "name": "pageName", | "user_id": "userIdValue", | "timestamp" : "2012-12-02T00:30:08.276Z", | "properties": { | "title": "pageTitle", | "url": "pageUrl" | } |} """.stripMargin val expected = """ |{ | "event": "page", | "entityType": "user", | "entityId": "userIdValue", | "properties": { | "properties": { | "title": "pageTitle", | "url": "pageUrl" | }, | "name": "pageName" | }, | "eventTime" : "2012-12-02T00:30:08.276Z" |} """.stripMargin check(SegmentIOConnector, page, expected) } 
"convert alias to event JSON" in { val alias = s""" |{ $commonFields | "type": "alias", | "previous_id": "previousIdValue", | "user_id": "userIdValue", | "timestamp" : "2012-12-02T00:30:08.276Z" |} """.stripMargin val expected = """ |{ | "event": "alias", | "entityType": "user", | "entityId": "userIdValue", | "properties": { | "previous_id" : "previousIdValue" | }, | "eventTime" : "2012-12-02T00:30:08.276Z" |} """.stripMargin check(SegmentIOConnector, alias, expected) } "convert track to event JSON" in { val track = s""" |{ $commonFields | "user_id": "some_user_id", | "type": "track", | "event": "Registered", | "timestamp" : "2012-12-02T00:30:08.276Z", | "properties": { | "plan": "Pro Annual", | "accountType" : "Facebook" | } |} """.stripMargin val expected = """ |{ | "event": "track", | "entityType": "user", | "entityId": "some_user_id", | "properties": { | "event": "Registered", | "properties": { | "plan": "Pro Annual", | "accountType": "Facebook" | } | }, | "eventTime" : "2012-12-02T00:30:08.276Z" |} """.stripMargin check(SegmentIOConnector, track, expected) } "convert identify to event JSON" in { val identify = s""" { $commonFields "type" : "identify", "user_id" : "019mr8mf4r", "traits" : { "email" : "achilles@segment.com", "name" : "Achilles", "subscription_plan" : "Premium", "friendCount" : 29 }, "timestamp" : "2012-12-02T00:30:08.276Z" } """ val expected = """ { "event" : "identify", "entityType": "user", "entityId" : "019mr8mf4r", "properties" : { "traits" : { "email" : "achilles@segment.com", "name" : "Achilles", "subscription_plan" : "Premium", "friendCount" : 29 } }, "eventTime" : "2012-12-02T00:30:08.276Z" } """ check(SegmentIOConnector, identify, expected) } } } ================================================ FILE: data/test-form.sh ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. 
See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # accessKey=$1 # normal subscribe event curl -i -X POST http://localhost:7070/webhooks/mailchimp?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ --data-urlencode "type"="subscribe" \ --data-urlencode "fired_at"="2009-03-26 21:35:57" \ --data-urlencode "data[id]"="8a25ff1d98" \ --data-urlencode "data[list_id]"="a6b5da1054" \ --data-urlencode "data[email]"="api@mailchimp.com" \ --data-urlencode "data[email_type]"="html" \ --data-urlencode "data[merges][EMAIL]"="api@mailchimp.com" \ --data-urlencode "data[merges][FNAME]"="MailChimp" \ --data-urlencode "data[merges][LNAME]"="API" \ --data-urlencode "data[merges][INTERESTS]"="Group1,Group2" \ --data-urlencode "data[ip_opt]"="10.20.10.30" \ --data-urlencode "data[ip_signup]"="10.20.10.30" \ -w %{time_total} # normal unsubscribe event curl -i -X POST http://localhost:7070/webhooks/mailchimp?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ --data-urlencode "type"="unsubscribe" \ --data-urlencode "fired_at"="2009-03-26 21:40:57" \ --data-urlencode "data[action]"="unsub" \ --data-urlencode "data[reason]"="manual" \ --data-urlencode "data[id]"="8a25ff1d98" \ --data-urlencode "data[list_id]"="a6b5da1054" \ --data-urlencode "data[email]"="api+unsub@mailchimp.com" \ --data-urlencode "data[email_type]"="html" \ 
--data-urlencode "data[merges][EMAIL]"="api+unsub@mailchimp.com" \ --data-urlencode "data[merges][FNAME]"="MailChimp" \ --data-urlencode "data[merges][LNAME]"="API" \ --data-urlencode "data[merges][INTERESTS]"="Group1,Group2" \ --data-urlencode "data[ip_opt]"="10.20.10.30" \ --data-urlencode "data[campaign_id]"="cb398d21d2" \ -w %{time_total} # normal profile update event curl -i -X POST http://localhost:7070/webhooks/mailchimp?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ --data-urlencode "type"="profile" \ --data-urlencode "fired_at"="2009-03-26 21:31:21" \ --data-urlencode "data[id]"="8a25ff1d98" \ --data-urlencode "data[list_id]"="a6b5da1054" \ --data-urlencode "data[email]"="api@mailchimp.com" \ --data-urlencode "data[email_type]"="html" \ --data-urlencode "data[merges][EMAIL]"="api@mailchimp.com" \ --data-urlencode "data[merges][FNAME]"="MailChimp" \ --data-urlencode "data[merges][LNAME]"="API" \ --data-urlencode "data[merges][INTERESTS]"="Group1,Group2" \ --data-urlencode "data[ip_opt]"="10.20.10.30" \ -w %{time_total} # normal email update event curl -i -X POST http://localhost:7070/webhooks/mailchimp?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ --data-urlencode "type"="upemail" \ --data-urlencode "fired_at"="2009-03-26 22:15:09" \ --data-urlencode "data[list_id]"="a6b5da1054" \ --data-urlencode "data[new_id]"="51da8c3259" \ --data-urlencode "data[new_email]"="api+new@mailchimp.com" \ --data-urlencode "data[old_email]"="api+old@mailchimp.com" \ -w %{time_total} # normal cleaned email event curl -i -X POST http://localhost:7070/webhooks/mailchimp?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ --data-urlencode "type"="cleaned" \ --data-urlencode "fired_at"="2009-03-26 22:01:00" \ --data-urlencode "data[list_id]"="a6b5da1054" \ --data-urlencode "data[campaign_id]"="4fjk2ma9xd" \ --data-urlencode "data[reason]"="hard" \ --data-urlencode 
"data[email]"="api+cleaned@mailchimp.com" \ -w %{time_total} # normal campaign sending status event curl -i -X POST http://localhost:7070/webhooks/mailchimp?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ --data-urlencode "type"="campaign" \ --data-urlencode "fired_at"="2009-03-26 21:31:21" \ --data-urlencode "data[id]"="5aa2102003" \ --data-urlencode "data[subject]"="Test Campaign Subject" \ --data-urlencode "data[status]"="sent" \ --data-urlencode "data[reason]"="" \ --data-urlencode "data[list_id]"="a6b5da1054" \ -w %{time_total} # invalid type curl -i -X POST http://localhost:7070/webhooks/mailchimp?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ --data-urlencode "type"="something_invalid" \ --data-urlencode "fired_at"="2009-03-26 21:35:57" \ --data-urlencode "data[id]"="8a25ff1d98" \ --data-urlencode "data[list_id]"="a6b5da1054" \ --data-urlencode "data[email]"="api@mailchimp.com" \ --data-urlencode "data[email_type]"="html" \ --data-urlencode "data[merges][EMAIL]"="api@mailchimp.com" \ --data-urlencode "data[merges][FNAME]"="MailChimp" \ --data-urlencode "data[merges][LNAME]"="API" \ --data-urlencode "data[merges][INTERESTS]"="Group1,Group2" \ --data-urlencode "data[ip_opt]"="10.20.10.30" \ --data-urlencode "data[ip_signup]"="10.20.10.30" \ -w %{time_total} # missing data (type) curl -i -X POST http://localhost:7070/webhooks/mailchimp?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ --data-urlencode "fired_at"="2009-03-26 21:35:57" \ --data-urlencode "data[id]"="8a25ff1d98" \ --data-urlencode "data[list_id]"="a6b5da1054" \ --data-urlencode "data[email]"="api@mailchimp.com" \ --data-urlencode "data[email_type]"="html" \ --data-urlencode "data[merges][EMAIL]"="api@mailchimp.com" \ --data-urlencode "data[merges][FNAME]"="MailChimp" \ --data-urlencode "data[merges][LNAME]"="API" \ --data-urlencode "data[merges][INTERESTS]"="Group1,Group2" \ --data-urlencode 
"data[ip_opt]"="10.20.10.30" \ --data-urlencode "data[ip_signup]"="10.20.10.30" \ -w %{time_total} # invalid webhooks path curl -i -X POST http://localhost:7070/webhooks/invalid?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ --data-urlencode "type"="subscribe" \ --data-urlencode "fired_at"="2009-03-26 21:35:57" \ --data-urlencode "data[id]"="8a25ff1d98" \ --data-urlencode "data[list_id]"="a6b5da1054" \ --data-urlencode "data[email]"="api@mailchimp.com" \ --data-urlencode "data[email_type]"="html" \ --data-urlencode "data[merges][EMAIL]"="api@mailchimp.com" \ --data-urlencode "data[merges][FNAME]"="MailChimp" \ --data-urlencode "data[merges][LNAME]"="API" \ --data-urlencode "data[merges][INTERESTS]"="Group1,Group2" \ --data-urlencode "data[ip_opt]"="10.20.10.30" \ --data-urlencode "data[ip_signup]"="10.20.10.30" \ -w %{time_total} # get normal curl -i -X GET http://localhost:7070/webhooks/mailchimp?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ -w %{time_total} # get invalid curl -i -X GET http://localhost:7070/webhooks/invalid?accessKey=$accessKey \ -H "Content-type: application/x-www-form-urlencoded" \ -w %{time_total} ================================================ FILE: data/test-normal.sh ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. 
You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # accessKey=$1 curl -i -X POST http://localhost:7070/events.json?accessKey=$1 \ -H "Content-Type: application/json" \ -d '{ "event" : "my_event1", "entityType" : "user", "entityId" : "uid", "eventTime" : "2004-12-13T21:39:45.618-07:00" }' \ -w %{time_total} ================================================ FILE: data/test-segmentio.sh ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
# accessKey=$1 # normal case curl -H "Accept: application/json; version=2.0" \ http://spec.segment.com/generate/identify | \ curl -X POST \ -H "Content-Type: application/json" \ -d @- \ http://localhost:7070/webhooks/segmentio.json?accessKey=$accessKey echo '' # normal case api key in header for identify event curl -H "Accept: application/json; version=2.0" \ http://spec.segment.com/generate/identify | \ curl -X POST \ --user "$accessKey:" \ -H "Content-Type: application/json" \ -d @- \ http://localhost:7070/webhooks/segmentio.json echo '' # normal case api key in header for track event curl -H "Accept: application/json; version=2.0" \ http://spec.segment.com/generate/track | \ curl -X POST \ --user "$accessKey:" \ -H "Content-Type: application/json" \ -d @- \ http://localhost:7070/webhooks/segmentio.json echo '' # normal case api key in header for page event curl -H "Accept: application/json; version=2.0" \ http://spec.segment.com/generate/page | \ curl -X POST \ --user "$accessKey:" \ -H "Content-Type: application/json" \ -d @- \ http://localhost:7070/webhooks/segmentio.json echo '' # normal case api key in header for screen event curl -H "Accept: application/json; version=2.0" \ http://spec.segment.com/generate/screen | \ curl -X POST \ --user "$accessKey:" \ -H "Content-Type: application/json" \ -d @- \ http://localhost:7070/webhooks/segmentio.json echo '' # normal case api key in header for group event curl -H "Accept: application/json; version=2.0" \ http://spec.segment.com/generate/group | \ curl -X POST \ --user "$accessKey:" \ -H "Content-Type: application/json" \ -d @- \ http://localhost:7070/webhooks/segmentio.json echo '' # normal case api key in header for alias event curl -H "Accept: application/json; version=2.0" \ http://spec.segment.com/generate/alias | \ curl -X POST \ --user "$accessKey:" \ -H "Content-Type: application/json" \ -d @- \ http://localhost:7070/webhooks/segmentio.json echo '' # invalid type curl -i -X POST 
http://localhost:7070/webhooks/segmentio.json?accessKey=$accessKey \ -H "Content-Type: application/json" \ -d '{ "version" : 1, "type" : "invalid_type", "userId" : "019mr8mf4r", "sent_at":"2015-08-21T15:25:32.799Z", "traits" : { "email" : "achilles@segment.com", "name" : "Achilles", "subscriptionPlan" : "Premium", "friendCount" : 29 }, "timestamp" : "2012-12-02T00:30:08.276Z" }' \ -w %{time_total} echo '' # invalid data format curl -i -X POST http://localhost:7070/webhooks/segmentio.json?accessKey=$accessKey \ -H "Content-Type: application/json" \ -d '{ "version" : 1, "userId" : "019mr8mf4r", "sent_at":"2015-08-21T15:25:32.799Z", "traits" : { "email" : "achilles@segment.com", "name" : "Achilles", "subscriptionPlan" : "Premium", "friendCount" : 29 }, "timestamp" : "2012-12-02T00:30:08.276Z" }' \ -w %{time_total} echo '' # invalid webhooks path curl -i -X POST http://localhost:7070/webhooks/invalidpath.json?accessKey=$accessKey \ -H "Content-Type: application/json" \ -d '{ "version" : 1, "type" : "identify", "userId" : "019mr8mf4r", "sent_at":"2015-08-21T15:25:32.799Z", "traits" : { "email" : "achilles@segment.com", "name" : "Achilles", "subscriptionPlan" : "Premium", "friendCount" : 29 }, "timestamp" : "2012-12-02T00:30:08.276Z" }' \ -w %{time_total} echo '' # get request curl -i -X GET http://localhost:7070/webhooks/segmentio.json?accessKey=$accessKey \ -H "Content-Type: application/json" \ -w %{time_total} echo '' # get invalid curl -i -X GET http://localhost:7070/webhooks/invalidpath.json?accessKey=$accessKey \ -H "Content-Type: application/json" \ -w %{time_total} echo '' ================================================ FILE: data/test.sh ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. 
# The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # simple test script for dataapi accessKey=$1 function checkGET () { resp=$( curl -i -s -X GET "http://localhost:7070$1" ) status=$( echo "$resp" | grep HTTP/1.1 ) exp=$2 if [[ $status =~ (.*HTTP/1.1 $exp [a-zA-Z]+) ]]; then echo "[pass] GET $1 $status" else echo "[fail] GET $1 $resp" echo "expect $exp" exit 1 fi } function checkPOST () { resp=$( curl -i -s -X POST "http://localhost:7070$1" \ -H "Content-Type: application/json" \ -d "$2" ) status=$( echo "$resp" | grep HTTP/1.1 ) exp=$3 if [[ $status =~ (.*HTTP/1.1 $exp [a-zA-Z]+) ]]; then #echo "POST $1 $2 good $status" echo "[pass] POST $1 $status" else echo "[fail] POST $1 $2 $resp" echo "expect $exp" exit 1 fi } # --------------- # status # ---------------- checkGET "/" 200 # ----------- # reserved events # ------------ testdata='{ "event" : "$set", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "properties" : { "prop1" : 1, } "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 testdata='{ "event" : "$unset", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "properties" : { "prop1" : "", } "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 testdata='{ "event" : "$delete", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey"
"$testdata" 201 testdata='{ "event" : "$xxxx", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "properties" : { "prop1" : 1, } "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # ------------- # create events # ------------- # full testdata='{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "properties" : { "prop1" : 1, "prop2" : "value2", "prop3" : [1, 2, 3], "prop4" : true, "prop5" : ["a", "b", "c"], "prop6" : 4.56 } "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 # different time zone testdata='{ "event" : "my_event_tzone", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "eventTime" : "2004-12-13T21:39:45.618-08:00" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 testdata='{ "event" : "my_event_tzone", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "eventTime" : "2004-12-13T21:39:45.618+02:00" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 # invalid timezone testdata='{ "event" : "my_event_tzone", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "eventTime" : "2004-12-13T21:39:45.618ABC" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # invalid timezone testdata='{ "event" : "my_event_tzone", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "eventTime" : "2004-12-13T21:39:45.618+1" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # no properties testdata='{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST 
"/events.json?accessKey=$accessKey" "$testdata" 201 # no properties with $unset event testdata='{ "event" : "$unset", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 testdata='{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "eventTime" : "2004-12-14T21:39:45.618Z", "properties": {} }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 # no properties with $unset event testdata='{ "event" : "$unset", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "eventTime" : "2004-12-14T21:39:45.618Z", "properties": {} }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # no tags testdata='{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "properties" : { "prop1" : "value1", "prop2" : "value2" } "eventTime" : "2004-12-15T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 ## no eventTIme testdata='{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "properties" : { "prop1" : "value1", "prop2" : "value2" } }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 ## with prid testdata='{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "properties" : { "prop1" : "value1", "prop2" : "value2" } "eventTime" : "2004-12-13T21:39:45.618Z" "prId" : "asfasfdsafdcsdFDWd" }' checkPOST "/events.json?accessKey=$accessKey" 
"$testdata" 201 # minimum testdata='{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 # check accepting null for optional fields testdata='{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : null, "targetEntityId" : null, "properties" : null, "eventTime" : null }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 testdata='{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : null, "targetEntityId" : null, "properties" : { "prop1": 1, "prop2": null }, "eventTime" : null }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 # ---------------------------- # create events error cases # ---------------------------- # missing event testdata='{ "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "properties" : { "prop1" : 1, "prop2" : "value2", "prop3" : [1, 2, 3], "prop4" : true, "prop5" : ["a", "b", "c"], "prop6" : 4.56 } "eventTime" : "2004-12-13T21:39:45.618Z" }' # missing entityType testdata='{ "event" : "my_event", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "properties" : { "prop1" : 1, "prop2" : "value2", "prop3" : [1, 2, 3], "prop4" : true, "prop5" : ["a", "b", "c"], "prop6" : 4.56 } "eventTime" : "2004-12-13T21:39:45.618Z" }' # missing entityId testdata='{ "event" : "my_event", "entityType" : "my_entity_type", "properties" : { "prop1" : "value1", "prop2" : "value2" } "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # empty event string testdata='{ "event" : "", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", 
"properties" : { "prop1" : "value1", "prop2" : "value2" } "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # empty testdata='{}' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # empty testdata='' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # invalid data testdata='asfd' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # invalid pio_ entityType testdata='{ "event" : "my_event", "entityType" : "pio_xx", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "properties" : { "prop1" : 1, "prop2" : "value2" } "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # invalid pio_ targetEntityType testdata='{ "event" : "my_event", "entityType" : "food", "entityId" : "my_entity_id", "targetEntityType" : "pio_xxx", "targetEntityId" : "my_target_entity_id", "properties" : { "prop1" : 1, "prop2" : "value2" } "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # invalid pio_ properties testdata='{ "event" : "my_event", "entityType" : "food", "entityId" : "my_entity_id", "targetEntityType" : "food2", "targetEntityId" : "my_target_entity_id", "properties" : { "pio_aaa" : 1, "prop2" : "value2" } "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 400 # valid pio_pr entityType testdata='{ "event" : "my_event", "entityType" : "pio_pr", "entityId" : "my_entity_id", "targetEntityType" : "my_target_entity_type", "targetEntityId" : "my_target_entity_id", "properties" : { "prop1" : 1, "prop2" : "value2" } "eventTime" : "2004-12-13T21:39:45.618Z" }' checkPOST "/events.json?accessKey=$accessKey" "$testdata" 201 # ----- # get events # ---- checkGET "/events.json?accessKey=$accessKey" 200 # invalid accessKey checkGET "/events.json?accessKey=999" 401 checkGET 
"/events.json?accessKey=$accessKey&startTime=abc" 400 checkGET "/events.json?accessKey=$accessKey&untilTime=abc" 400 checkGET "/events.json?accessKey=$accessKey&startTime=2004-12-13T21:39:45.618Z&untilTime=2004-12-15T21:39:45.618Z" 200 # ----- # batch request # ---- # normal request testdata='[{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", "targetEntityType" : "item", "targetEntityId" : "iid", "properties" : { "someProperty" : "value1", "anotherProperty" : "value2" }, "eventTime" : "2004-12-13T21:39:45.618Z" }]' checkPOST "/batch/events.json?accessKey=$accessKey" "$testdata" 200 # request with a malformed event (2nd event) # the response code is successful but the error for individual event is reflected in the response's body. testdata='[{ "event" : "my_event_1", "entityType" : "user", "entityId" : "uid", "eventTime" : "2004-12-13T21:39:45.618Z" }, { "eve" : "my_event_2", "entityType" : "user", "entityId" : "uid", "eventTime" : "2015-12-13T21:39:45.618Z" }]' checkPOST "/batch/events.json?accessKey=$accessKey" "$testdata" 200 # request with too many events (more than 50) testdata=`cat data/very_long_batch_request.txt` checkPOST "/batch/events.json?accessKey=$accessKey" "$testdata" 400 ================================================ FILE: data/test2.sh ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. 
You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # simple test script for dataapi accessKey=$1 curl -i -X POST "http://localhost:7070/events.json?accessKey=$accessKey" \ -H "Content-Type: application/json" \ -d '{ "event" : "$delete", "entityType" : "pio_user", "entityId" : "123" }' curl -i -X POST "http://localhost:7070/events.json?accessKey=$accessKey" \ -H "Content-Type: application/json" \ -d '{ "event" : "$delete", "entityType" : "pio_item", "entityId" : "174" }' curl -i -X POST "http://localhost:7070/events.json?accessKey=$accessKey" \ -H "Content-Type: application/json" \ -d '{ "event" : "$set", "entityType" : "pio_item", "entityId" : "174", "properties" : { "piox_a" : 1 } }' curl -i -X POST "http://localhost:7070/events.json?accessKey=$accessKey" \ -H "Content-Type: application/json" \ -d '{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : null, "targetEntityId" : null, "properties" : { "prop1" : 1, "prop2" : null, } "eventTime" : "2004-12-13T21:39:45.618Z" }' curl -i -X POST "http://localhost:7070/events.json?accessKey=$accessKey" \ -H "Content-Type: application/json" \ -d '{ "event" : "my_event", "entityType" : "my_entity_type", "entityId" : "my_entity_id", "targetEntityType" : null, "targetEntityId" : null, "properties" : null, "eventTime" : null }' ## prId curl -i -X POST "http://localhost:7070/events.json?accessKey=$accessKey" \ -H "Content-Type: application/json" \ -d '{ "event" : "some_event", "entityType" : "pio_user", "entityId" : "123", "prId" : "AbcdefXXFFdsf1" }' ================================================ FILE: data/test3.sh 
================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # curl -i -X POST http://localhost:7070/events.json?accessKey=testingkeyasdfasdf \ -H "Content-Type: application/json" \ -d '{ "event" : "my_event1", "entityType" : "user", "entityId" : "uid", "eventTime" : "2004-12-13T21:39:45.618-07:00" }' curl -i -X POST http://localhost:7070/events.json?accessKey=yT8WHQMkQLBPxGdcGWstu6Z12XaNjANu7py98Ysve2NHwGNp825bkCt2G3LPU6aK \ -H "Content-Type: application/json" \ -d '{ "event" : "my_event1", "entityType" : "user", "entityId" : "uid", "eventTime" : "2004-12-13T21:39:45.618-07:00" }' curl -i -X POST http://localhost:7070/events.json \ -H "Content-Type: application/json" \ -d '{ "event" : "my_event2", "entityType" : "user", "entityId" : "uid", "eventTime" : "2004-12-14T21:39:45.618-07:00" }' curl -i -X POST http://localhost:7070/events.json \ -H "Content-Type: application/json" \ -d '{ "event" : "my_event3", "entityType" : "user", "entityId" : "uid", "eventTime" : "2004-12-11T21:39:45.618-07:00" }' curl -i -X POST http://localhost:7070/events.json \ -H "Content-Type: application/json" \ -d '{ "event" : "my_event4", "entityType" : "user", "entityId" : "uid2", "eventTime" : 
"2004-12-11T22:39:45.618-07:00" }' curl -i -X POST http://localhost:7070/events.json \ -H "Content-Type: application/json" \ -d '{ "event" : "my_event5", "entityType" : "user", "entityId" : "uid2", "eventTime" : "2004-12-14T21:39:45.618-07:00" }' curl -i -X POST http://localhost:7070/events.json \ -H "Content-Type: application/json" \ -d '{ "event" : "my_event1", "entityType" : "item", "entityId" : "uid", "eventTime" : "2004-12-13T21:39:45.618-07:00" }' ================================================ FILE: data/very_long_batch_request.txt ================================================ [{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", 
"entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : 
"my_event", "entityType" : "user", "entityId" : "uid", },{ "event" : "my_event", "entityType" : "user", "entityId" : "uid", }] ================================================ FILE: doap.rdf ================================================ 2016-05-26 Apache PredictionIO PredictionIO is an open source Machine Learning Server built on top of state-of-the-art open source stack, that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks. PredictionIO is an open source Machine Learning Server built on top of state-of-the-art open source stack, that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks. Scala Donald Szeto ================================================ FILE: docker/.ivy2/.keep ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. ================================================ FILE: docker/JUPYTER.md ================================================ Jupyter With PredictionIO ========================= ## Overview Using the Jupyter-based Docker image, you can run Jupyter Notebook inside the PredictionIO environment. This is useful for exploratory data analysis (EDA).
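To give a flavor of the EDA this environment is meant for, the following plain-Python sketch summarizes a ratings file. It assumes the `user::item::rating` line format of Spark's `sample_movielens_data.txt`, which the Scala walkthrough below downloads; `summarize_ratings` is a hypothetical helper, not part of any PredictionIO template.

```python
from collections import Counter
from statistics import mean


def summarize_ratings(lines):
    """Summarize ratings given as 'user::item::rating' lines (assumed format)."""
    users, items, ratings = set(), set(), []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Only the first three '::'-separated fields are used.
        user, item, rating = line.split("::")[:3]
        users.add(user)
        items.add(item)
        ratings.append(float(rating))
    return {
        "users": len(users),
        "items": len(items),
        "ratings": len(ratings),
        "mean_rating": mean(ratings) if ratings else None,
        "rating_counts": Counter(ratings),  # distribution of rating values
    }
```

In a notebook cell, `summarize_ratings(open("data/sample_movielens_data.txt"))` would return user, item, and rating counts plus the rating distribution for the downloaded file.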
## Run Jupyter Notebook First, start the Jupyter container with the PredictionIO environment: ``` docker-compose -f docker-compose.jupyter.yml \ -f pgsql/docker-compose.base.yml \ -f pgsql/docker-compose.meta.yml \ -f pgsql/docker-compose.event.yml \ -f pgsql/docker-compose.model.yml \ up ``` Open `http://127.0.0.1:8888/` and then open a new terminal in Jupyter from the `New` pull-down menu. ## Getting Started With Scala Based Template ### Download Template Clone a template using Git: ``` cd templates/ git clone https://github.com/apache/predictionio-template-recommender.git cd predictionio-template-recommender/ ``` Replace the app name `INVALID_APP_NAME` in `engine.json` with `MyApp1`. ``` sed -i "s/INVALID_APP_NAME/MyApp1/" engine.json ``` ### Register New Application Using the `pio` command, register a new application named `MyApp1`. ``` pio app new MyApp1 ``` This command prints an access key as below. ``` [INFO] [Pio$] Access Key: bbe8xRHN1j3Sa8WeAT8TSxt5op3lUqhvXmKY1gLRjg70K-DUhHIJJ0-UzgKumxGm ``` Store it in the environment variable `ACCESS_KEY`. ``` ACCESS_KEY=bbe8xRHN1j3Sa8WeAT8TSxt5op3lUqhvXmKY1gLRjg70K-DUhHIJJ0-UzgKumxGm ``` ### Import Training Data Download the training data and import it into the PredictionIO Event Server. ``` curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt python data/import_eventserver.py --access_key $ACCESS_KEY ``` ### Build Template Build your template with the following command: ``` pio build --verbose ``` ### Create Model To create a model, run: ``` pio train ``` ## Getting Started With Python Based Template ### Download Template Clone a template using Git: ``` cd templates/ git clone https://github.com/jpioug/predictionio-template-iris.git predictionio-template-iris/ ``` ### Register New Application Using the `pio` command, register a new application named `IrisApp`.
``` pio app new --access-key IRIS_TOKEN IrisApp ``` ### Import Training Data Download the training data and import it into the PredictionIO Event Server. ``` python data/import_eventserver.py ``` ### Build Template Build your template with the following command: ``` pio build --verbose ``` ### EDA To do data analysis, open `templates/predictionio-template-iris/eda.ipynb` in Jupyter. ### Create Model You need to clear the following environment variables in the terminal before executing `pio train`. ``` unset PYSPARK_PYTHON unset PYSPARK_DRIVER_PYTHON unset PYSPARK_DRIVER_PYTHON_OPTS ``` To create a model, run: ``` pio train --main-py-file train.py ``` ================================================ FILE: docker/README.md ================================================ Apache PredictionIO Docker ========================== ## Overview PredictionIO Docker provides Docker images for use in development and production environments. ## Usage ### Run PredictionIO with Selectable docker-compose Files You can choose the storage backends for event, meta, and model data by selecting which docker-compose files to combine. ``` docker-compose -f docker-compose.yml -f ... up ``` The supported storage backends are: | Type | Storage | |:-----:|:---------------------------------| | Event | PostgreSQL, MySQL, Elasticsearch | | Meta | PostgreSQL, MySQL, Elasticsearch | | Model | PostgreSQL, MySQL, LocalFS | To run PredictionIO with PostgreSQL: ``` docker-compose -f docker-compose.yml \ -f pgsql/docker-compose.base.yml \ -f pgsql/docker-compose.meta.yml \ -f pgsql/docker-compose.event.yml \ -f pgsql/docker-compose.model.yml \ up ``` To use LocalFS as model storage instead, swap in the LocalFS model file: ``` docker-compose -f docker-compose.yml \ -f pgsql/docker-compose.base.yml \ -f pgsql/docker-compose.meta.yml \ -f pgsql/docker-compose.event.yml \ -f localfs/docker-compose.model.yml \ up ``` ## Tutorial In this demo, we will show you how to build a recommendation template.
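The tutorial drives everything through `pio-docker` and `curl`. The two HTTP calls it relies on, posting an event to the Event Server (port 7070, as in the repository's test scripts) and querying the deployed engine (port 8000, as in the `curl` check after deployment), can also be built with Python's standard library. The sketch below is a minimal illustration under those assumptions; `event_request` and `query_request` are hypothetical helper names, not part of the PredictionIO Python SDK.

```python
import json
import urllib.request

# Assumed endpoints, taken from this guide: Event Server on 7070, engine on 8000.
EVENT_SERVER = "http://localhost:7070"
ENGINE_SERVER = "http://localhost:8000"


def _json_post(url, body):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def event_request(access_key, event, entity_type, entity_id, **optional):
    """Build a POST /events.json request.

    Optional fields (targetEntityType, targetEntityId, properties, eventTime)
    are passed as keyword arguments and included only when given.
    """
    # Client-side check (assumption): the Event Server test cases reject
    # missing or empty event/entityType/entityId with HTTP 400.
    if not (event and entity_type and entity_id):
        raise ValueError("event, entityType and entityId are required")
    body = {"event": event, "entityType": entity_type, "entityId": entity_id}
    body.update(optional)
    return _json_post(f"{EVENT_SERVER}/events.json?accessKey={access_key}", body)


def query_request(user, num):
    """Build a POST /queries.json request for the recommendation engine."""
    return _json_post(f"{ENGINE_SERVER}/queries.json", {"user": user, "num": num})
```

Passing a returned request to `urllib.request.urlopen` would send it once the servers are running. For bulk imports, the template's own `data/import_eventserver.py` remains the intended path.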
### Run PredictionIO Environment The following command starts PredictionIO with an Event Server. The PredictionIO Docker image mounts the `./templates` directory at `/templates`. ``` $ docker-compose -f docker-compose.yml \ -f pgsql/docker-compose.base.yml \ -f pgsql/docker-compose.meta.yml \ -f pgsql/docker-compose.event.yml \ -f pgsql/docker-compose.model.yml \ up ``` We provide the `pio-docker` command as a wrapper for the `pio` command: `pio-docker` invokes `pio` inside the PredictionIO container. ``` $ export PATH=`pwd`/bin:$PATH $ pio-docker status ... [INFO] [Management$] Your system is all ready to go. ``` ### Download Recommendation Template This demo uses [predictionio-template-recommender](https://github.com/apache/predictionio-template-recommender). ``` $ cd templates $ git clone https://github.com/apache/predictionio-template-recommender.git MyRecommendation $ cd MyRecommendation ``` ### Register Application You need to register this application with PredictionIO: ``` $ pio-docker app new MyApp1 [INFO] [App$] Initialized Event Store for this app ID: 1. [INFO] [Pio$] Created a new app: [INFO] [Pio$] Name: MyApp1 [INFO] [Pio$] ID: 1 [INFO] [Pio$] Access Key: i-zc4EleEM577EJhx3CzQhZZ0NnjBKKdSbp3MiR5JDb2zdTKKzH9nF6KLqjlMnvl ``` Since the access key is required in subsequent steps, store it in `ACCESS_KEY`. ``` $ ACCESS_KEY=i-zc4EleEM577EJhx3CzQhZZ0NnjBKKdSbp3MiR5JDb2zdTKKzH9nF6KLqjlMnvl ``` `engine.json` contains the application name, so replace `INVALID_APP_NAME` with `MyApp1`. ``` ... "datasource": { "params" : { "appName": "MyApp1" } }, ... ``` ### Import Data To import training data into the PredictionIO Event Server, this template provides an import tool.
The tool depends on the PredictionIO Python SDK, which you can install as follows: ``` $ pip install predictionio ``` Then download and import the data: ``` $ curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt $ python data/import_eventserver.py --access_key $ACCESS_KEY ``` ### Build Template This is a Scala-based template, so you need to build it with the `pio` command (here through `pio-docker`). ``` $ pio-docker build --verbose ``` ### Train and Create Model To train a recommendation model, run the `train` subcommand: ``` $ pio-docker train ``` ### Deploy Model If the recommendation model is created successfully, deploy it to the PredictionIO prediction server. ``` $ pio-docker deploy ``` You can check predictions as follows: ``` $ curl -H "Content-Type: application/json" \ -d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json ``` ## Advanced Topics ### Run with Elasticsearch Elasticsearch can be used for Meta and Event storage; model storage needs another backend (LocalFS below). To start PredictionIO with Elasticsearch: ``` docker-compose -f docker-compose.yml \ -f elasticsearch/docker-compose.base.yml \ -f elasticsearch/docker-compose.meta.yml \ -f elasticsearch/docker-compose.event.yml \ -f localfs/docker-compose.model.yml \ up ``` ### Run with Spark Cluster By adding `docker-compose.spark.yml`, you can use a Spark cluster for `pio train`. ``` docker-compose -f docker-compose.yml \ -f docker-compose.spark.yml \ -f elasticsearch/docker-compose.base.yml \ -f elasticsearch/docker-compose.meta.yml \ -f elasticsearch/docker-compose.event.yml \ -f localfs/docker-compose.model.yml \ up ``` To submit a training task to the Spark cluster, run `pio-docker train` with the `--master` option: ``` pio-docker train -- --master spark://spark-master:7077 ``` See `docker-compose.spark.yml` to change the Spark cluster settings. ### Run Engine Server To deploy your engine and start an engine server, run Docker Compose with `docker-compose.deploy.yml`.
``` docker-compose -f docker-compose.yml \ -f pgsql/docker-compose.base.yml \ -f pgsql/docker-compose.meta.yml \ -f pgsql/docker-compose.event.yml \ -f pgsql/docker-compose.model.yml \ -f docker-compose.deploy.yml \ up ``` See `deploy/run.sh` and `docker-compose.deploy.yml` to customize the deployment. ### Run with Jupyter You can also launch PredictionIO with Jupyter: ``` docker-compose -f docker-compose.jupyter.yml \ -f pgsql/docker-compose.base.yml \ -f pgsql/docker-compose.meta.yml \ -f pgsql/docker-compose.event.yml \ -f pgsql/docker-compose.model.yml \ up ``` For more information, see [JUPYTER.md](./JUPYTER.md). ## Development ### Build Base Docker Image ``` docker build -t predictionio/pio pio ``` ### Build Jupyter Docker Image ``` docker build -t predictionio/pio-jupyter jupyter ``` ### Push Docker Image ``` docker push predictionio/pio:latest docker tag predictionio/pio:latest predictionio/pio:$PIO_VERSION docker push predictionio/pio:$PIO_VERSION ``` ================================================ FILE: docker/bin/pio-docker ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License.
# BASE_WORK_DIR=/templates CURRENT_DIR=`pwd` get_container_id() { if [ x"$PIO_CONTAINER_ID" != "x" ] ; then echo $PIO_CONTAINER_ID return fi for i in `docker ps -f "name=pio" -q` ; do echo $i return done } get_current_dir() { if [ x"$PIO_CURRENT_DIR" != "x" ] ; then echo $PIO_CURRENT_DIR return fi D=`echo $CURRENT_DIR | sed -e "s,.*$BASE_WORK_DIR,$BASE_WORK_DIR,"` if [[ $D = $BASE_WORK_DIR* ]] ; then echo $D else echo $BASE_WORK_DIR fi } cid=`get_container_id` if [ x"$cid" = "x" ] ; then echo "Docker Container is not found." exit 1 fi wdir=`get_current_dir` # quote "$@" so arguments containing spaces reach pio unchanged docker exec -w "$wdir" -it "$cid" pio "$@" ================================================ FILE: docker/charts/README.md ================================================ Helm Charts for Apache PredictionIO ============================ ## Overview Helm Charts are packages of pre-configured Kubernetes resources. Using these charts, you can install and manage PredictionIO on Kubernetes. ## Usage ### Install PredictionIO with PostgreSQL To install PostgreSQL and PredictionIO, run the `helm install` command: ``` helm install --name my-postgresql stable/postgresql -f postgresql.yaml helm install --name my-pio ./predictionio -f predictionio_postgresql.yaml ``` `postgresql.yaml` and `predictionio_postgresql.yaml` are the configuration files for the charts. To access Jupyter for PredictionIO, run `kubectl port-forward` and then open `http://localhost:8888/`.
``` export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=predictionio,app.kubernetes.io/instance=my-pio" -o jsonpath="{.items[0].metadata.name}") kubectl port-forward $POD_NAME 8888:8888 ``` ### Install Spark Cluster To install a Spark cluster, run the following command: ``` helm install --name my-spark ./spark ``` To train a model on that cluster, run `pio train` with the `--master` option: ``` pio train -- --master spark://my-spark-master:7077 ``` ================================================ FILE: docker/charts/postgresql.yaml ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # postgresqlUsername: pio postgresqlPassword: pio postgresqlDatabase: pio # for testing persistence: enabled: false ================================================ FILE: docker/charts/predictionio/.helmignore ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. 
# The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License.
You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # Patterns to ignore when building packages. # This supports shell glob matching, relative path matching, and # negation (prefixed with !). Only one pattern per line. .DS_Store # Common VCS dirs .git/ .gitignore .bzr/ .bzrignore .hg/ .hgignore .svn/ # Common backup files *.swp *.bak *.tmp *~ # Various IDEs .project .idea/ *.tmproj ================================================ FILE: docker/charts/predictionio/Chart.yaml ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
# name: predictionio version: 0.1.0 appVersion: 0.13.0 description: Machine learning server home: http://predictionio.apache.org icon: http://predictionio.apache.org/images/logos/logo-ee2b9bb3.png sources: - https://github.com/apache/predictionio maintainers: - name: Shinsuke Sugaya email: shinsuke@apache.org ================================================ FILE: docker/charts/predictionio/templates/NOTES.txt ================================================ {{/* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */}} 1. Get the application URL by running these commands: {{- if contains "NodePort" .Values.pio.service.type }} export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "predictionio.fullname" . }}) export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}") echo http://$NODE_IP:$NODE_PORT {{- else if contains "LoadBalancer" .Values.pio.service.type }} NOTE: It may take a few minutes for the LoadBalancer IP to be available. You can watch its status by running 'kubectl get svc -w {{ include "predictionio.fullname" .
}}' export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "predictionio.fullname" . }} -o jsonpath='{.status.loadBalancer.ingress[0].ip}') echo http://$SERVICE_IP:{{ .Values.pio.service.port }} {{- else if contains "ClusterIP" .Values.pio.service.type }} export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "predictionio.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}") echo "Visit http://127.0.0.1:8888 to use your application" kubectl port-forward $POD_NAME 8888:8888 {{- end }} ================================================ FILE: docker/charts/predictionio/templates/_helpers.tpl ================================================ {{/* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/}} {{- define "predictionio.name" -}} {{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}} {{- end -}} {{- define "predictionio.fullname" -}} {{- if .Values.fullnameOverride -}} {{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}} {{- else -}} {{- $name := default .Chart.Name .Values.nameOverride -}} {{- if contains $name .Release.Name -}} {{- .Release.Name | trunc 63 | trimSuffix "-" -}} {{- else -}} {{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}} {{- end -}} {{- end -}} {{- end -}} {{- define "predictionio.chart" -}} {{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}} {{- end -}} ================================================ FILE: docker/charts/predictionio/templates/pio-deployment.yaml ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # apiVersion: apps/v1 kind: Deployment metadata: name: {{ include "predictionio.fullname" . }} labels: app.kubernetes.io/name: {{ include "predictionio.name" . }} helm.sh/chart: {{ include "predictionio.chart" .
}} app.kubernetes.io/instance: {{ .Release.Name }} app.kubernetes.io/managed-by: {{ .Release.Service }} spec: replicas: {{ .Values.pio.replicas }} selector: matchLabels: app.kubernetes.io/name: {{ include "predictionio.name" . }} app.kubernetes.io/instance: {{ .Release.Name }} template: metadata: labels: app.kubernetes.io/name: {{ include "predictionio.name" . }} app.kubernetes.io/instance: {{ .Release.Name }} spec: containers: - name: {{ .Chart.Name }} image: "{{ .Values.pio.image.repository }}:{{ .Values.pio.image.tag }}" imagePullPolicy: {{ .Values.pio.image.pullPolicy }} env: {{ toYaml .Values.pio.env | indent 12 }} ports: - name: event containerPort: 7070 protocol: TCP - name: predict containerPort: 8000 protocol: TCP - name: jupyter containerPort: 8888 protocol: TCP livenessProbe: httpGet: path: / port: 7070 readinessProbe: httpGet: path: / port: 7070 resources: {{ toYaml .Values.pio.resources | indent 12 }} {{- with .Values.pio.nodeSelector }} nodeSelector: {{ toYaml . | indent 8 }} {{- end }} {{- with .Values.pio.affinity }} affinity: {{ toYaml . | indent 8 }} {{- end }} {{- with .Values.pio.tolerations }} tolerations: {{ toYaml . | indent 8 }} {{- end }} ================================================ FILE: docker/charts/predictionio/templates/pio-service.yaml ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. 
You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # apiVersion: v1 kind: Service metadata: name: {{ include "predictionio.fullname" . }} labels: app.kubernetes.io/name: {{ include "predictionio.name" . }} helm.sh/chart: {{ include "predictionio.chart" . }} app.kubernetes.io/instance: {{ .Release.Name }} app.kubernetes.io/managed-by: {{ .Release.Service }} spec: type: {{ .Values.pio.service.type }} ports: - port: {{ .Values.pio.service.port }} targetPort: 8888 protocol: TCP name: jupyter selector: app.kubernetes.io/name: {{ include "predictionio.name" . }} app.kubernetes.io/instance: {{ .Release.Name }} ================================================ FILE: docker/charts/predictionio/values.yaml ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
#
pio:
  replicas: 1
  image:
    repository: predictionio/pio-jupyter
    tag: latest
    pullPolicy: IfNotPresent
  service:
    type: ClusterIP
    port: 8888
  env:
    - name: PIO_STORAGE_SOURCES_PGSQL_TYPE
      value: jdbc
    - name: PIO_STORAGE_SOURCES_PGSQL_URL
      value: "jdbc:postgresql://postgresql/pio"
    - name: PIO_STORAGE_SOURCES_PGSQL_USERNAME
      value: pio
    - name: PIO_STORAGE_SOURCES_PGSQL_PASSWORD
      value: pio
    - name: PIO_STORAGE_REPOSITORIES_MODELDATA_NAME
      value: pio_model
    - name: PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE
      value: PGSQL
    - name: PIO_STORAGE_REPOSITORIES_METADATA_NAME
      value: pio_meta
    - name: PIO_STORAGE_REPOSITORIES_METADATA_SOURCE
      value: PGSQL
    - name: PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME
      value: pio_event
    - name: PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE
      value: PGSQL
    - name: PYSPARK_DRIVER_PYTHON_OPTS
      value: "notebook --NotebookApp.token=''"
  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}

================================================
FILE: docker/charts/predictionio_postgresql.yaml
================================================
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
pio:
  env:
    - name: PIO_STORAGE_SOURCES_PGSQL_TYPE
      value: jdbc
    - name: PIO_STORAGE_SOURCES_PGSQL_URL
      value: "jdbc:postgresql://my-postgresql-postgresql:5432/pio"
    - name: PIO_STORAGE_SOURCES_PGSQL_USERNAME
      value: pio
    - name: PIO_STORAGE_SOURCES_PGSQL_PASSWORD
      value: pio
    - name: PIO_STORAGE_REPOSITORIES_MODELDATA_NAME
      value: pio_model
    - name: PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE
      value: PGSQL
    - name: PIO_STORAGE_REPOSITORIES_METADATA_NAME
      value: pio_meta
    - name: PIO_STORAGE_REPOSITORIES_METADATA_SOURCE
      value: PGSQL
    - name: PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME
      value: pio_event
    - name: PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE
      value: PGSQL
    - name: PYSPARK_DRIVER_PYTHON_OPTS
      value: "notebook --NotebookApp.token=''"

================================================
FILE: docker/charts/spark/.helmignore
================================================
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj

================================================
FILE: docker/charts/spark/Chart.yaml
================================================
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: spark
version: 0.3.0
appVersion: 2.3.2
description: Fast and general-purpose cluster computing system.
home: http://spark.apache.org
icon: http://spark.apache.org/images/spark-logo-trademark.png
sources:
  - https://github.com/kubernetes/kubernetes/tree/master/examples/spark
  - https://github.com/apache/spark
maintainers:
  - name: lachie83
    email: lachlan.evenson@gmail.com
  - name: Shinsuke Sugaya
    email: shinsuke@apache.org

================================================
FILE: docker/charts/spark/README.md
================================================
# Apache Spark Helm Chart

Apache Spark is a fast and general-purpose cluster computing system.

* http://spark.apache.org/

This chart is based on stable/spark in [Helm Charts](https://github.com/helm/charts).

## Chart Details

This chart will do the following:

* 1 x Spark Master with port 8080 exposed on an external LoadBalancer
* 3 x Spark Workers with a HorizontalPodAutoscaler that scales to at most 10 pods when CPU hits 50% of 100m (autoscaling is disabled by default; set `Worker.Autoscaling.Enabled` to `true` to turn it on)
* All using Kubernetes Deployments

## Prerequisites

* Assumes that serviceAccount tokens are available under the hostname `metadata` (works on GKE by default). URL: http://metadata/computeMetadata/v1/instance/service-accounts/default/token

## Installing the Chart

To install the chart with the release name `my-release`:

```bash
$ helm install --name my-release stable/spark
```

## Configuration

The following tables list the configurable parameters of the Spark chart and their default values.

### Spark Master

| Parameter               | Description                        | Default                  |
| ----------------------- | ---------------------------------- | ------------------------ |
| `Master.Name`           | Spark master name                  | `master`                 |
| `Master.Image`          | Container image name               | `bde2020/spark-master`   |
| `Master.ImageTag`       | Container image tag                | `2.2.2-hadoop2.7`        |
| `Master.Replicas`       | k8s deployment replicas            | `1`                      |
| `Master.Component`      | k8s selector key                   | `spark-master`           |
| `Master.Cpu`            | container requested cpu            | `100m`                   |
| `Master.Memory`         | container requested memory         | `512Mi`                  |
| `Master.ServicePort`    | k8s service port                   | `7077`                   |
| `Master.ContainerPort`  | Container listening port           | `7077`                   |
| `Master.DaemonMemory`   | Master JVM Xms and Xmx option      | `1g`                     |
| `Master.ServiceType`    | Kubernetes Service type            | `LoadBalancer`           |

### Spark WebUi

| Parameter             | Description                      | Default        |
| --------------------- | -------------------------------- | -------------- |
| `WebUi.Name`          | Spark webui name                 | `webui`        |
| `WebUi.ServicePort`   | k8s service port                 | `8080`         |
| `WebUi.ContainerPort` | Container listening port         | `8080`         |
| `WebUi.ServiceType`   | Kubernetes Service type          | `LoadBalancer` |

### Spark Worker

| Parameter                    | Description                          | Default                |
| ---------------------------- | ------------------------------------ | ---------------------- |
| `Worker.Name`                | Spark worker name                    | `worker`               |
| `Worker.Image`               | Container image name                 | `bde2020/spark-worker` |
| `Worker.ImageTag`            | Container image tag                  | `2.2.2-hadoop2.7`      |
| `Worker.Replicas`            | k8s hpa and deployment replicas      | `3`                    |
| `Worker.ReplicasMax`         | k8s hpa max replicas                 | `10`                   |
| `Worker.Component`           | k8s selector key                     | `spark-worker`         |
| `Worker.Cpu`                 | container requested cpu              | `100m`                 |
| `Worker.Memory`              | container requested memory           | `512Mi`                |
| `Worker.ContainerPort`       | Container listening port             | `8081`                 |
| `Worker.CpuTargetPercentage` | k8s hpa cpu targetPercentage         | `50`                   |
| `Worker.DaemonMemory`        | Worker JVM Xms and Xmx setting       | `1g`                   |
| `Worker.ExecutorMemory`      | Worker memory available for executor | `1g`                   |
| `Worker.Autoscaling.Enabled` | Enable horizontal pod autoscaling    | `false`                |

Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`.

Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,

```bash
$ helm install --name my-release -f values.yaml stable/spark
```

> **Tip**: You can use the default [values.yaml](values.yaml)

================================================
FILE: docker/charts/spark/templates/NOTES.txt
================================================
{{/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */}}
1. Get the Spark URL to visit by running these commands in the same shell:

  NOTE: It may take a few minutes for the LoadBalancer IP to be available.
  You can watch its status by running 'kubectl get svc --namespace {{ .Release.Namespace }} -w {{ template "webui-fullname" . }}'

  export SPARK_SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ template "webui-fullname" . }} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo http://$SPARK_SERVICE_IP:{{ .Values.WebUi.ServicePort }}

================================================
FILE: docker/charts/spark/templates/_helpers.tpl
================================================
{{/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */}}
{{/* vim: set filetype=mustache: */}}
{{/*
Expand the name of the chart.
*/}}
{{- define "name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{/*
Create fully qualified names.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
*/}}
{{- define "master-fullname" -}}
{{- $name := default .Chart.Name .Values.Master.Name -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{- define "webui-fullname" -}}
{{- $name := default .Chart.Name .Values.WebUi.Name -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{- define "worker-fullname" -}}
{{- $name := default .Chart.Name .Values.Worker.Name -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}

================================================
FILE: docker/charts/spark/templates/spark-master-deployment.yaml
================================================
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: v1
kind: Service
metadata:
  name: {{ template "master-fullname" . }}
  labels:
    heritage: {{ .Release.Service | quote }}
    release: {{ .Release.Name | quote }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    component: "{{ .Release.Name }}-{{ .Values.Master.Component }}"
spec:
  ports:
    - port: {{ .Values.Master.ServicePort }}
      targetPort: {{ .Values.Master.ContainerPort }}
  selector:
    component: "{{ .Release.Name }}-{{ .Values.Master.Component }}"
  type: {{ .Values.Master.ServiceType }}
---
apiVersion: v1
kind: Service
metadata:
  name: {{ template "webui-fullname" . }}
  labels:
    heritage: {{ .Release.Service | quote }}
    release: {{ .Release.Name | quote }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    component: "{{ .Release.Name }}-{{ .Values.Master.Component }}"
spec:
  ports:
    - port: {{ .Values.WebUi.ServicePort }}
      targetPort: {{ .Values.WebUi.ContainerPort }}
  selector:
    component: "{{ .Release.Name }}-{{ .Values.Master.Component }}"
  type: {{ .Values.WebUi.ServiceType }}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ template "master-fullname" . }}
  labels:
    heritage: {{ .Release.Service | quote }}
    release: {{ .Release.Name | quote }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    component: "{{ .Release.Name }}-{{ .Values.Master.Component }}"
spec:
  replicas: {{ default 1 .Values.Master.Replicas }}
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      component: "{{ .Release.Name }}-{{ .Values.Master.Component }}"
  template:
    metadata:
      labels:
        heritage: {{ .Release.Service | quote }}
        release: {{ .Release.Name | quote }}
        chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
        component: "{{ .Release.Name }}-{{ .Values.Master.Component }}"
    spec:
      containers:
        - name: {{ template "master-fullname" . }}
          image: "{{ .Values.Master.Image }}:{{ .Values.Master.ImageTag }}"
          command: ["/bin/sh","-c"]
          args: ["echo $(hostname -i) {{ template "master-fullname" . }} >> /etc/hosts; {{ .Values.Spark.Path }}/bin/spark-class org.apache.spark.deploy.master.Master"]
          ports:
            - containerPort: {{ .Values.Master.ContainerPort }}
            - containerPort: {{ .Values.WebUi.ContainerPort }}
          resources:
            requests:
              cpu: "{{ .Values.Master.Cpu }}"
              memory: "{{ .Values.Master.Memory }}"
          env:
            - name: SPARK_DAEMON_MEMORY
              value: {{ default "1g" .Values.Master.DaemonMemory | quote }}
            - name: SPARK_MASTER_HOST
              value: {{ template "master-fullname" . }}
            - name: SPARK_MASTER_PORT
              value: {{ .Values.Master.ServicePort | quote }}
            - name: SPARK_MASTER_WEBUI_PORT
              value: {{ .Values.WebUi.ContainerPort | quote }}

================================================
FILE: docker/charts/spark/templates/spark-sql-test.yaml
================================================
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: v1
kind: Pod
metadata:
  name: "{{ .Release.Name }}-sql-test-{{ randAlphaNum 5 | lower }}"
  annotations:
    "helm.sh/hook": test-success
spec:
  containers:
    - name: {{ .Release.Name }}-sql-test
      image: {{ .Values.Master.Image }}:{{ .Values.Master.ImageTag }}
      command: ["{{ .Values.Spark.Path }}/bin/spark-sql", "--master", "spark://{{ .Release.Name }}-master:{{ .Values.Master.ServicePort }}", "-e", "show databases;"]
  restartPolicy: Never

================================================
FILE: docker/charts/spark/templates/spark-worker-deployment.yaml
================================================
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ template "worker-fullname" . }}
  labels:
    heritage: {{ .Release.Service | quote }}
    release: {{ .Release.Name | quote }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    component: "{{ .Release.Name }}-{{ .Values.Worker.Component }}"
spec:
  replicas: {{ default 1 .Values.Worker.Replicas }}
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      component: "{{ .Release.Name }}-{{ .Values.Worker.Component }}"
  template:
    metadata:
      labels:
        heritage: {{ .Release.Service | quote }}
        release: {{ .Release.Name | quote }}
        chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
        component: "{{ .Release.Name }}-{{ .Values.Worker.Component }}"
    spec:
      containers:
        - name: {{ template "worker-fullname" . }}
          image: "{{ .Values.Worker.Image }}:{{ .Values.Worker.ImageTag }}"
          command: ["{{ .Values.Spark.Path }}/bin/spark-class", "org.apache.spark.deploy.worker.Worker", "spark://{{ template "master-fullname" . }}:{{ .Values.Master.ServicePort }}"]
          ports:
            - containerPort: {{ .Values.Worker.ContainerPort }}
          resources:
            requests:
              cpu: "{{ .Values.Worker.Cpu }}"
              memory: "{{ .Values.Worker.Memory }}"
          env:
            - name: SPARK_DAEMON_MEMORY
              value: {{ default "1g" .Values.Worker.DaemonMemory | quote }}
            - name: SPARK_WORKER_MEMORY
              value: {{ default "1g" .Values.Worker.ExecutorMemory | quote }}
            - name: SPARK_WORKER_WEBUI_PORT
              value: {{ .Values.WebUi.ContainerPort | quote }}

================================================
FILE: docker/charts/spark/templates/spark-worker-hpa.yaml
================================================
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
{{- if .Values.Worker.Autoscaling.Enabled }}
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  labels:
    heritage: {{ .Release.Service | quote }}
    release: {{ .Release.Name | quote }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    component: "{{ .Release.Name }}-{{ .Values.Worker.Component }}"
  name: {{ template "worker-fullname" . }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: {{ template "worker-fullname" . }}
  minReplicas: {{ .Values.Worker.Replicas }}
  maxReplicas: {{ .Values.Worker.ReplicasMax }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: {{ .Values.Worker.CpuTargetPercentage }}
{{- end }}

================================================
FILE: docker/charts/spark/values.yaml
================================================
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Default values for spark.
# This is a YAML-formatted file.
# Declare name/value pairs to be passed into your templates.
# name: value

Spark:
  Path: "/spark"

Master:
  Name: master
  Image: "bde2020/spark-master"
  ImageTag: "2.2.2-hadoop2.7"
  Replicas: 1
  Component: "spark-master"
  Cpu: "100m"
  Memory: "512Mi"
  ServicePort: 7077
  ContainerPort: 7077
  # Set Master JVM memory. Default 1g
  # DaemonMemory: 1g
  ServiceType: LoadBalancer

WebUi:
  Name: webui
  ServicePort: 8080
  ContainerPort: 8080
  ServiceType: LoadBalancer

Worker:
  Name: worker
  Image: "bde2020/spark-worker"
  ImageTag: "2.2.2-hadoop2.7"
  Replicas: 3
  Component: "spark-worker"
  Cpu: "100m"
  Memory: "512Mi"
  ContainerPort: 8081
  # Set Worker JVM memory. Default 1g
  # DaemonMemory: 1g
  # Set how much total memory workers have to give executors
  # ExecutorMemory: 1g
  Autoscaling:
    Enabled: false
  ReplicasMax: 10
  CpuTargetPercentage: 50

================================================
FILE: docker/deploy/run.sh
================================================
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

cd /templates/$PIO_TEMPLATE_NAME
pio deploy

================================================
FILE: docker/docker-compose.deploy.yml
================================================
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
  pio:
    environment:
      - "PIO_TEMPLATE_NAME=MyRecommendation"
      - "PIO_RUN_FILE=/deploy/run.sh"
    volumes:
      - ./deploy:/deploy

================================================
FILE: docker/docker-compose.jupyter.yml
================================================
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
  pio:
    image: predictionio/pio-jupyter:latest
    ports:
      - 7070:7070
      - 8000:8000
      - 8888:8888
    volumes:
      - ./templates:/home/jovyan/templates
      - ./.ivy2:/home/jovyan/.ivy2
    environment:
      - CHOWN_HOME=yes
      - GRANT_SUDO=yes
      - VOLUME_UID=yes
      - "PYSPARK_DRIVER_PYTHON_OPTS=notebook --NotebookApp.token=''"
    dns: 8.8.8.8

================================================
FILE: docker/docker-compose.spark.yml
================================================
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
  spark-master:
    image: bde2020/spark-master:2.2.2-hadoop2.7
    container_name: spark-master
    ports:
      - "8080:8080"
      - "7077:7077"
    environment:
      - INIT_DAEMON_STEP=setup_spark
  spark-worker-1:
    image: bde2020/spark-worker:2.2.2-hadoop2.7
    container_name: spark-worker-1
    depends_on:
      - spark-master
    ports:
      - "8081:8081"
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"

================================================
FILE: docker/docker-compose.yml
================================================
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
  pio:
    image: predictionio/pio:latest
    ports:
      - 7070:7070
      - 8000:8000
    volumes:
      - ./templates:/templates
    dns: 8.8.8.8

================================================
FILE: docker/elasticsearch/docker-compose.base.yml
================================================
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.4
    environment:
      - xpack.graph.enabled=false
      - xpack.ml.enabled=false
      - xpack.monitoring.enabled=false
      - xpack.security.enabled=false
      - xpack.watcher.enabled=false
      - cluster.name=predictionio
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
  pio:
    depends_on:
      - elasticsearch
    environment:
      PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE: elasticsearch
      PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS: elasticsearch
      PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS: 9200
      PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES: http

================================================
FILE: docker/elasticsearch/docker-compose.event.yml
================================================
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
  pio:
    environment:
      PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME: pio_event
      PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE: ELASTICSEARCH

================================================
FILE: docker/elasticsearch/docker-compose.meta.yml
================================================
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
  pio:
    environment:
      PIO_STORAGE_REPOSITORIES_METADATA_NAME: pio_meta
      PIO_STORAGE_REPOSITORIES_METADATA_SOURCE: ELASTICSEARCH

================================================
FILE: docker/jupyter/Dockerfile
================================================
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# FROM predictionio/pio:latest ENV DEBIAN_FRONTEND noninteractive RUN apt-get update \ && apt install -y build-essential curl git gcc make openssl libssl-dev libbz2-dev \ apt-transport-https ca-certificates g++ gnupg graphviz lsb-release openssh-client zip \ libreadline-dev libsqlite3-dev cmake libxml2-dev wget bzip2 sudo vim unzip locales \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && locale-gen ENV LC_ALL=en_US.UTF-8 \ LANG=en_US.UTF-8 \ LANGUAGE=en_US.UTF-8 \ NB_USER=jovyan \ NB_UID=1000 \ NB_GID=100 \ CONDA_DIR=/opt/conda \ PIP_DEFAULT_TIMEOUT=180 ENV PATH=$CONDA_DIR/bin:$PATH \ HOME=/home/$NB_USER ADD fix-permissions /usr/local/bin/fix-permissions RUN chmod +x /usr/local/bin/fix-permissions \ && groupadd wheel -g 11 \ && echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su \ && useradd -m -s /bin/bash -N -u $NB_UID $NB_USER \ && mkdir -p $CONDA_DIR \ && chmod g+w /etc/passwd \ && fix-permissions $HOME \ && fix-permissions $CONDA_DIR USER $NB_USER ENV MINICONDA_VERSION 4.4.10 RUN wget -q https://repo.continuum.io/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O /tmp/miniconda.sh \ && echo 'bec6203dbb2f53011e974e9bf4d46e93 */tmp/miniconda.sh' | md5sum -c - \ && bash /tmp/miniconda.sh -f -b -p $CONDA_DIR \ && rm /tmp/miniconda.sh \ && conda config --system --prepend channels conda-forge \ && conda config --system --set auto_update_conda false \ && conda config --system --set show_channel_urls true \ && conda install --quiet --yes conda="${MINICONDA_VERSION%.*}.*" \ && conda update --all --quiet --yes \ && conda clean -tipsy \ && rm -rf /home/$NB_USER/.cache/yarn \ && fix-permissions $CONDA_DIR \ && fix-permissions /home/$NB_USER RUN conda install --quiet --yes 'tini=0.18.0' \ && conda list tini | grep tini | tr -s ' ' | cut -d ' ' -f 1,2 >> $CONDA_DIR/conda-meta/pinned \ && conda clean -tipsy \ && fix-permissions $CONDA_DIR \ && fix-permissions /home/$NB_USER RUN conda install --quiet --yes 
'notebook=5.6.*' 'jupyterlab=0.34.*' nodejs\ && jupyter labextension install @jupyterlab/hub-extension@^0.11.0 \ && jupyter notebook --generate-config \ && conda clean -tipsy \ && npm cache clean --force \ && rm -rf $CONDA_DIR/share/jupyter/lab/staging \ && rm -rf /home/$NB_USER/.cache/yarn \ && fix-permissions $CONDA_DIR \ && fix-permissions /home/$NB_USER ADD requirements.txt /tmp/requirements.txt RUN pip --no-cache-dir install -r /tmp/requirements.txt \ && fix-permissions $CONDA_DIR \ && fix-permissions /home/$NB_USER COPY jupyter_notebook_config.py /home/$NB_USER/.jupyter/ COPY start*.sh /usr/local/bin/ USER root RUN chmod +x /usr/local/bin/*.sh EXPOSE 8888 WORKDIR $HOME ENTRYPOINT ["tini", "--"] CMD ["/usr/local/bin/start-jupyter.sh"] ================================================ FILE: docker/jupyter/fix-permissions ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # set -e for d in "$@"; do find "$d" \ ! \( \ -group $NB_GID \ -a -perm -g+rwX \ \) \ -exec chgrp $NB_GID {} \; \ -exec chmod g+rwX {} \; find "$d" \ \( \ -type d \ -a !
-perm -6000 \ \) \ -exec chmod +6000 {} \; done ================================================ FILE: docker/jupyter/jupyter_notebook_config.py ================================================ # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. c = get_config() c.NotebookApp.ip = '*' c.NotebookApp.port = 8888 c.NotebookApp.open_browser = False ================================================ FILE: docker/jupyter/requirements.txt ================================================ cython google-cloud h5py ipywidgets jupyter_contrib_nbextensions keras matplotlib pandas pandas-gbq predictionio scikit-learn tensor2tensor tensorflow widgetsnbextension ================================================ FILE: docker/jupyter/start-jupyter.sh ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # limitations under the License. # set -e # store PIO environment to pio-env.sh PIO_ENV_FILE=/etc/predictionio/pio-env.sh env | grep ^PIO_ >> $PIO_ENV_FILE if [ $(grep _MYSQL_ $PIO_ENV_FILE | wc -l) = 0 ] ; then sed -i "s/^MYSQL/#MYSQL/" $PIO_ENV_FILE fi # start event server sh /usr/bin/pio_run & export PYSPARK_PYTHON=$CONDA_DIR/bin/python if [ x"$PYSPARK_DRIVER_PYTHON" = "x" ] ; then export PYSPARK_DRIVER_PYTHON=$CONDA_DIR/bin/jupyter fi if [ x"$PYSPARK_DRIVER_PYTHON_OPTS" = "x" ] ; then export PYSPARK_DRIVER_PYTHON_OPTS=notebook fi . /usr/local/bin/start.sh $PIO_HOME/bin/pio-shell --with-pyspark ================================================ FILE: docker/jupyter/start.sh ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
# set -e if [[ "$VOLUME_UID" == "1" || "$VOLUME_UID" == 'yes' ]]; then DIR_UID=`ls -lnd /home/jovyan/templates | awk '{print $3}'` if [ x"$DIR_UID" != "x" -a x"$DIR_UID" != "x0" ] ; then NB_UID=$DIR_UID fi fi if [ $(id -u) == 0 ] ; then if id jovyan &> /dev/null ; then echo "Set username to $NB_USER" usermod -d /home/$NB_USER -l $NB_USER jovyan fi if [[ "$CHOWN_HOME" == "1" || "$CHOWN_HOME" == 'yes' ]]; then echo "Change ownership of /home/$NB_USER to $NB_UID" chown -R $NB_UID /home/$NB_USER fi if [ ! -z "$CHOWN_EXTRA" ]; then for extra_dir in $(echo $CHOWN_EXTRA | tr ',' ' '); do chown -R $NB_UID $extra_dir done fi if [[ "$NB_USER" != "jovyan" ]]; then if [[ ! -e "/home/$NB_USER" ]]; then echo "Move home dir to /home/$NB_USER" mv /home/jovyan "/home/$NB_USER" fi if [[ "$PWD/" == "/home/jovyan/"* ]]; then newcwd="/home/$NB_USER/${PWD:13}" echo "Set CWD to $newcwd" cd "$newcwd" fi fi if [ "$NB_UID" != $(id -u $NB_USER) ] ; then echo "Set $NB_USER to uid:$NB_UID" usermod -u $NB_UID $NB_USER fi if [ "$NB_GID" != $(id -g $NB_USER) ] ; then echo "Add $NB_USER to gid:$NB_GID" groupadd -g $NB_GID -o ${NB_GROUP:-${NB_USER}} usermod -g $NB_GID -a -G $NB_GID,100 $NB_USER fi if [[ "$GRANT_SUDO" == "1" || "$GRANT_SUDO" == 'yes' ]]; then echo "Set sudo access to $NB_USER" echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook fi echo "Execute command as $NB_USER" exec su $NB_USER -c "env PATH=$PATH $*" else echo "Execute command" exec $* fi ================================================ FILE: docker/localfs/docker-compose.model.yml ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. 
You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. version: "3" services: pio: environment: PIO_STORAGE_REPOSITORIES_MODELDATA_NAME: pio_model PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE: LOCALFS PIO_FS_BASEDIR: /work/pio_store PIO_FS_ENGINESDIR: /work/pio_store/engines PIO_FS_TMPDIR: /work/pio_store/tmp PIO_STORAGE_SOURCES_LOCALFS_TYPE: localfs PIO_STORAGE_SOURCES_LOCALFS_PATH: /work/pio_store/models ================================================ FILE: docker/mysql/docker-compose.base.yml ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
version: "3" services: mysql: image: mysql:8 command: mysqld --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci environment: MYSQL_ROOT_PASSWORD: root MYSQL_USER: pio MYSQL_PASSWORD: pio MYSQL_DATABASE: pio pio: depends_on: - mysql environment: PIO_STORAGE_SOURCES_MYSQL_TYPE: jdbc PIO_STORAGE_SOURCES_MYSQL_URL: "jdbc:mysql://mysql/pio" PIO_STORAGE_SOURCES_MYSQL_USERNAME: pio PIO_STORAGE_SOURCES_MYSQL_PASSWORD: pio ================================================ FILE: docker/mysql/docker-compose.event.yml ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. version: "3" services: pio: environment: PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME: pio_event PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE: MYSQL ================================================ FILE: docker/mysql/docker-compose.meta.yml ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. 
# The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. version: "3" services: pio: environment: PIO_STORAGE_REPOSITORIES_METADATA_NAME: pio_meta PIO_STORAGE_REPOSITORIES_METADATA_SOURCE: MYSQL ================================================ FILE: docker/mysql/docker-compose.model.yml ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
version: "3" services: pio: environment: PIO_STORAGE_REPOSITORIES_MODELDATA_NAME: pio_model PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE: MYSQL ================================================ FILE: docker/pgsql/docker-compose.base.yml ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. version: "3" services: postgres: image: postgres:9 environment: POSTGRES_USER: pio POSTGRES_PASSWORD: pio POSTGRES_INITDB_ARGS: --encoding=UTF8 pio: depends_on: - postgres environment: PIO_STORAGE_SOURCES_PGSQL_TYPE: jdbc PIO_STORAGE_SOURCES_PGSQL_URL: "jdbc:postgresql://postgres/pio" PIO_STORAGE_SOURCES_PGSQL_USERNAME: pio PIO_STORAGE_SOURCES_PGSQL_PASSWORD: pio ================================================ FILE: docker/pgsql/docker-compose.event.yml ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. 
You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. version: "3" services: pio: environment: PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME: pio_event PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE: PGSQL ================================================ FILE: docker/pgsql/docker-compose.meta.yml ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. version: "3" services: pio: environment: PIO_STORAGE_REPOSITORIES_METADATA_NAME: pio_meta PIO_STORAGE_REPOSITORIES_METADATA_SOURCE: PGSQL ================================================ FILE: docker/pgsql/docker-compose.model.yml ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. 
# The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. version: "3" services: pio: environment: PIO_STORAGE_REPOSITORIES_MODELDATA_NAME: pio_model PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE: PGSQL ================================================ FILE: docker/pio/Dockerfile ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
# FROM openjdk:8 ARG PIO_GIT_URL=https://github.com/apache/predictionio.git ARG PIO_TAG=v0.13.0 ENV SCALA_VERSION=2.11.12 ENV SPARK_VERSION=2.2.3 ENV HADOOP_VERSION=2.7.7 ENV ELASTICSEARCH_VERSION=5.5.3 ENV PGSQL_VERSION=42.2.4 ENV MYSQL_VERSION=8.0.12 ENV PIO_HOME=/usr/share/predictionio RUN apt-get update && \ apt-get install -y dpkg-dev fakeroot && \ apt-get clean && \ rm -rf /var/lib/apt/lists/* WORKDIR /opt/src RUN git clone -b $PIO_TAG $PIO_GIT_URL WORKDIR /opt/src/predictionio RUN bash ./make-distribution.sh \ -Dscala.version=$SCALA_VERSION \ -Dspark.version=$SPARK_VERSION \ -Dhadoop.version=$HADOOP_VERSION \ -Delasticsearch.version=$ELASTICSEARCH_VERSION \ --with-deb && \ dpkg -i ./assembly/target/predictionio_*.deb && \ cp -r ./python /usr/share/predictionio && \ mkdir /var/log/predictionio && \ rm -rf /opt/src/predictionio/* RUN cp /etc/predictionio/pio-env.sh /etc/predictionio/pio-env.sh.orig && \ echo "#!/usr/bin/env bash" > /etc/predictionio/pio-env.sh RUN curl -o $PIO_HOME/lib/postgresql-$PGSQL_VERSION.jar \ https://repo1.maven.org/maven2/org/postgresql/postgresql/$PGSQL_VERSION/postgresql-$PGSQL_VERSION.jar && \ echo "POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-$PGSQL_VERSION.jar" >> /etc/predictionio/pio-env.sh && \ echo "MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-$MYSQL_VERSION.jar" >> /etc/predictionio/pio-env.sh WORKDIR /usr/share RUN curl -o /opt/src/spark-$SPARK_VERSION.tgz \ http://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop2.7.tgz && \ tar zxvf /opt/src/spark-$SPARK_VERSION.tgz && \ echo "SPARK_HOME="`pwd`/`ls -d spark*` >> /etc/predictionio/pio-env.sh && \ rm -rf /opt/src WORKDIR /templates ADD pio_run /usr/bin/pio_run EXPOSE 7070 EXPOSE 8000 CMD ["sh", "/usr/bin/pio_run"] ================================================ FILE: docker/pio/pio_run ================================================ #!/usr/bin/env bash # # Licensed to the Apache Software Foundation (ASF) under one or more #
contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # . /etc/predictionio/pio-env.sh # check elasticsearch status if [ x"$PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE" != "x" ] ; then RET=-1 COUNT=0 ES_HOST=`echo $PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS | sed -e "s/,.*//"` ES_PORT=`echo $PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS | sed -e "s/,.*//"` # Wait for elasticsearch startup while [ $RET != 0 -a $COUNT -lt 10 ] ; do echo "Waiting for ${ES_HOST}..." curl --connect-timeout 60 --retry 10 -s "$ES_HOST:$ES_PORT/_cluster/health?wait_for_status=green&timeout=1m" RET=$? COUNT=`expr $COUNT + 1` sleep 1 done fi # check mysql jar file if [ x"$PIO_STORAGE_SOURCES_MYSQL_TYPE" != "x" ] ; then MYSQL_JAR_FILE=$PIO_HOME/lib/mysql-connector-java-$MYSQL_VERSION.jar if [ ! -f $MYSQL_JAR_FILE ] ; then curl -o $MYSQL_JAR_FILE https://repo1.maven.org/maven2/mysql/mysql-connector-java/$MYSQL_VERSION/mysql-connector-java-$MYSQL_VERSION.jar fi fi # Check PIO status RET=-1 COUNT=0 while [ $RET != 0 -a $COUNT -lt 10 ] ; do echo "Waiting for PredictionIO..." $PIO_HOME/bin/pio status RET=$? 
COUNT=`expr $COUNT + 1` sleep 1 done if [ x"$PIO_RUN_FILE" != "x" ] ; then sh $PIO_RUN_FILE else # Start PIO Event Server $PIO_HOME/bin/pio eventserver fi ================================================ FILE: docker/templates/.keep ================================================ # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. ================================================ FILE: docs/javadoc/README.md ================================================ Java API Documentation ====================== 1. Run this command at the project's root. ``` $ sbt/sbt unidoc ``` 2. Point your web browser at `target/javaunidoc/index.html`. ================================================ FILE: docs/javadoc/javadoc-overview.html ================================================

PredictionIO API Documentation

If you are building a prediction engine, the most relevant packages are org.apache.predictionio.controller.java and org.apache.predictionio.data.store.java. You may also want to look at org.apache.predictionio.controller, because some functionality, such as custom model persistence ({@link org.apache.predictionio.controller.PersistentModel}), is provided directly by that package.

================================================ FILE: docs/manual/.gitignore ================================================ # Bower /bower_components # Bundler /.bundle # Build Directory /build # Cache /.sass-cache /.cache # Sanity /sanity.html # OS Files .DS_Store .DS_Store? ._* .Spotlight-V100 .Trashes .AppleDouble .LSOverride Icon Desktop.ini Icon? ehthumbs.db Thumbs.db *~ .project _site ================================================ FILE: docs/manual/Gemfile ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # source 'https://rubygems.org' gem 'middleman', '~> 3.3.10' gem 'middleman-livereload', '~> 3.4.2' gem 'middleman-autoprefixer' gem 'middleman-minify-html' gem 'middleman-syntax' gem 'middleman-s3_sync' gem 'middleman-search_engine_sitemap' gem 'slim' gem 'therubyracer' gem 'oj' gem 'redcarpet', '>= 3.2.3' gem 'travis' gem 'nokogiri' gem 'rainbow' gem "bootstrap-sass", require: false platforms :mswin, :mingw do gem 'tzinfo-data' gem 'wdm', '~> 0.1.0' end ================================================ FILE: docs/manual/Rakefile ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. 
See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # require 'middleman' require 'nokogiri' require 'rainbow/ext/string' require 'uri' require 'net/http' require 'redcarpet' require File.join(File.dirname(__FILE__), 'lib', 'custom_renderer') task :test do HTML = <
``` Test 0 <> Test 1 > Test 3 < Test 4 >< Test 5 => Test 6 <= Test 7 <>

Test 8

```
EOT HTML2 = <
```ruby Test 0 <> Test 1 > Test 3 < Test 4 >< Test 5 => Test 6 <= Test 7 <>

Test

# This is a ruby file. class MyClass def foo 'bar' end end ```
This is a test of **markdown** inside a tab! ``` // This tab does not have the data-lang attribute set! $ cd path/to/your/file ```
```html

Yes you can still use HTML in code blocks!

```
EOT def block_html(raw_html) done = raw_html.gsub(/(```.*?```)/m) do |match| markdown = Redcarpet::Markdown.new(CustomRenderer, fenced_code_blocks: true) markdown.render(match) end doc = Nokogiri::HTML::DocumentFragment.parse(done) nodes = doc.css('div.tabs > div') if nodes.empty? raw_html else ul = Nokogiri::XML::Node.new('ul', doc) ul['class'] = 'control' nodes.each do |node| title = node.attribute('data-tab').to_s lang = node.attribute('data-lang').to_s uuid = SecureRandom.uuid id = "tab-#{uuid}" li = Nokogiri::XML::Node.new('li', doc) li['data-lang'] = lang li.inner_html = %Q(#{title}) ul.add_child(li) node['id'] = id end nodes.first.before(ul) doc.to_html end end puts 'start block' puts block_html(HTML2) puts 'end block' end desc 'Check site for broken links' task :check do sets = [] cache = Sanity::Cache.new Dir["build/**/*.html"].each do |filename| p = Sanity::Page.new(filename, cache) sets << p.check_links end html = Sanity::Report::HTML.new(sets.map{ |s| s.to_html }) File.open('sanity.html', 'w') { |file| file.write(html) } end module Sanity module Report class HTML include Padrino::Helpers HEADER = < Sanity Report

Sanity Report

EOT FOOTER = < EOT def initialize(content) if content.respond_to?(:to_html) html = content.to_html else html = content end @content = content_tag(:div, html, id: 'content') end def to_html "#{HEADER}#{@content}#{FOOTER}" end def to_s to_html end end end class ResultSet include Padrino::Helpers attr_accessor :set def initialize(path, set = []) @path = path @set = set end def push(item) @set.push(item) end def to_html content_tag(:h2, @path) << content_tag(:table, class: 'table table-striped') do content_tag(:thead) do content_tag(:tr) do content_tag(:th, 'Type') << content_tag(:th, 'Status') << content_tag(:th, 'URI') << content_tag(:th, 'Message') << content_tag(:th, 'Path') end end << content_tag(:tbody) do @set.map do |item| item.to_html end end end end end class Result include Padrino::Helpers attr_accessor :type, :status, :path, :uri, :message, :cache, :backtrace def initialize end def exception=(e) @status = :exception @message = e.message @backtrace = e.backtrace end def to_s "#{@type} [#{@status}] #{@uri} #{@message} #{@path}".color(terminal_color) end def to_html content_tag(:tr, class: bootstrap_css_class) do content_tag(:td, @type) << content_tag(:td, @status) << content_tag(:td, @uri) << content_tag(:td, @message) << content_tag(:td, @path) end end private def bootstrap_css_class case @status when :success 'success' when :info 'info' when :warning 'warning' when :error 'danger' when :exception 'danger' else raise ArgumentError, "Status `#{@status}` is not a valid type" end end def terminal_color case @status when :success :green when :info :blue when :warning :yellow when :error :red when :exception :red else raise ArgumentError, "Status `#{@status}` is not a valid type" end end end class Cache def initialize(store = {}) @store = store end def read(uri) @store[uri] end def write(uri, value) @store[uri] = value end def fetch(uri) if block_given? 
if exists?(uri) read(uri) else write(uri, yield) end else read(uri) end end def exists?(uri) @store.has_key?(uri) end end class Page INDEX_FILE = 'index.html' def initialize(filename, cache = Sanity::Cache.new) @filename = filename @cache = cache f = File.open(@filename) @doc = Nokogiri::HTML(f) @build_path = File.join(Middleman::Application.root, 'build') f.close end def check_links rs = Sanity::ResultSet.new(@filename) @doc.css('a').each do |link| uri = link['href'] r = check_href(uri) r.path = @filename puts r rs.push(r) end rs end def check_href(href) # TODO: add trailing slash, relative url, and in-page anchor links. # TODO: Test for missing titles! # TODO: Test for unneeded .html extension! # TODO: Switch from Nokogiri to raw ID checker case href when /\A\s*\z/, nil check_empty_href(href) when /\A(https?):\/\/.+\z/ check_external_href(href) when /\A#.+\z/ check_anchor_href(href) when /\A#\z/ check_empty_anchor_href(href) when /\A\/\z/ check_root_href(href) when /\Amailto:.+\z/ check_mailto_href(href) else check_internal_href(href) end end def check_external_href(href) r = Sanity::Result.new r.uri = href r.type = :external_uri begin response = @cache.fetch(href) do r.cache = :miss uri = URI(href) Net::HTTP.get_response(uri) end case response when Net::HTTPSuccess r.status = :success when Net::HTTPNotFound r.status = :error when Net::HTTPRedirection location = response['location'] r.status = :info r.message = "Redirect: #{location}" else r.status = :warning r.message = "Response: #{response.class}" end rescue => e r.exception = e end r end def check_anchor_href(href) r = Sanity::Result.new r.uri = href r.type = :anchor begin result = @doc.css(href) if result.count > 0 r.status = :success else r.status = :error end rescue => e r.exception = e end r end def check_empty_anchor_href(href) r = Sanity::Result.new r.uri = href r.type = :empty_anchor r.status = :info r end def check_root_href(href) r = Sanity::Result.new r.uri = href r.type = :root_path filename =
File.join(@build_path, INDEX_FILE) if File.exist?(filename) r.status = :success else r.status = :error r.message = "Not found: #{filename}" end r end def check_mailto_href(href) r = Sanity::Result.new r.uri = href r.type = :mailto uri = URI.parse(href) if uri.is_a?(URI::MailTo) r.status = :success else r.status = :error end r end def check_empty_href(href) r = Sanity::Result.new r.uri = href r.type = :empty_uri r.status = :error r end def check_internal_href(href) r = Sanity::Result.new r.uri = href r.type = :internal_uri filename = File.join(@build_path, href.gsub('/', File::SEPARATOR)) if File.directory?(filename) filename = File.join(filename, INDEX_FILE) end if File.exist?(filename) r.status = :success else r.status = :error end r end end end ================================================ FILE: docs/manual/bower.json ================================================ { "name": "predictionio.apache.org", "description": "Apache PredictionIO Documentation", "license": "Apache-2.0", "homepage": "predictionio.apache.org", "ignore": [ "**/.*", "node_modules", "bower_components" ], "dependencies": { "jquery": "~2.1.1", "normalize.css": "~3.0.2", "Slidebars": "~0.10.2", "Tabslet": "~1.4.8", "jcarousel": "~0.3.3" } } ================================================ FILE: docs/manual/config.rb ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License.
You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # require 'lib/custom_renderer' require 'lib/gallery_generator' # General Settings set :css_dir, 'stylesheets' set :js_dir, 'javascripts' set :images_dir, 'images' set :partials_dir, 'partials' activate :directory_indexes activate :syntax, line_numbers: true activate :autoprefixer # Markdown set :markdown_engine, :redcarpet set :markdown, renderer: ::CustomRenderer, fenced_code_blocks: true, no_intra_emphasis: true, autolink: true, strikethrough: true, superscript: true, highlight: true, underline: true, tables: true # Sprockets sprockets.append_path File.join root, 'bower_components' # Sitemap set :url_root, '//predictionio.apache.org' activate :search_engine_sitemap, exclude_attr: 'hidden' # Development Settings configure :development do set :scheme, 'http' set :host, Middleman::PreviewServer.host rescue 'localhost' set :port, Middleman::PreviewServer.port rescue 80 Slim::Engine.set_options pretty: false, sort_attrs: false set :debug_assets, true end # Build Settings configure :build do set :scheme, 'https' set :host, 'predictionio.apache.org' set :port, 80 Slim::Engine.set_options pretty: false, sort_attrs: false activate :asset_hash activate :minify_css activate :minify_javascript activate :minify_html do |html| html.remove_multi_spaces = true html.remove_comments = true html.remove_intertag_spaces = false html.remove_quotes = false html.simple_doctype = false html.remove_script_attributes = true html.remove_style_attributes = false html.remove_link_attributes = false html.remove_form_attributes = false html.remove_input_attributes = false 
html.remove_javascript_protocol = true html.remove_http_protocol = false html.remove_https_protocol = false html.preserve_line_breaks = false html.simple_boolean_attributes = false end end # Hacks # Engine Template Gallery generation current_dir = File.dirname(__FILE__) yaml_file_path = "#{current_dir}/source/gallery/templates.yaml" out_file_path = "#{current_dir}/source/gallery/template-gallery.html.md" Gallery.generate_md(yaml_file_path, out_file_path) # https://github.com/middleman/middleman/issues/612 Slim::Engine.disable_option_validator! # https://github.com/Aupajo/middleman-search_engine_sitemap/issues/2 set :file_watcher_ignore, [ /^bin(\/|$)/, /^\.bundle(\/|$)/, # /^vendor(\/|$)/, # Keep this commented out! /^node_modules(\/|$)/, /^\.sass-cache(\/|$)/, /^\.cache(\/|$)/, /^\.git(\/|$)/, /^\.gitignore$/, /\.DS_Store/, /^\.rbenv-.*$/, /^Gemfile$/, /^Gemfile\.lock$/, /~$/, /(^|\/)\.?#/, /^tmp\// ] ================================================ FILE: docs/manual/data/nav/build.yml ================================================ root: - body: 'Samples' url: '#' children: - body: 'Typography' url: '/samples/' - body: 'Sizing' url: '/samples/sizing/' - body: 'Tabs' url: '/samples/tabs/' - body: 'Languages' url: '/samples/languages/' - body: 'Menu' url: '/samples/level-1/' children: - body: 'Level 2.1' url: '/samples/level-2-1/' - body: 'Level 2.2' url: '/samples/level-2-2/' children: - body: 'Level 3.1 This Title Is Very Long To Test Line Wrap' url: '/samples/level-3-1/' children: - body: 'Level 4.1' url: '/samples/level-4-1/' - body: 'Level 4.2' url: '/samples/level-4-2/' - body: 'Level 4.3' url: '/samples/level-4-3/' - body: 'Level 3.2' url: '/samples/level-3-2/' ================================================ FILE: docs/manual/data/nav/main.yml ================================================ root: - body: 'Apache PredictionIO® Documentation' url: '/' children: - body: 'Welcome to Apache PredictionIO®' url: '/' - body: 'Getting Started' url: '#' children: - 
body: 'A Quick Intro' url: '/start/' - body: 'Installing Apache PredictionIO' url: '/install/' - body: 'Downloading an Engine Template' url: '/start/download/' - body: 'Deploying Your First Engine' url: '/start/deploy/' - body: 'Customizing the Engine' url: '/start/customize/' - body: 'Integrating with Your App' url: '#' children: - body: 'App Integration Overview' url: '/appintegration/' - body: 'List of SDKs' url: '/sdk/' children: - body: 'Java & Android SDK' url: '/sdk/java/' - body: 'PHP SDK' url: '/sdk/php/' - body: 'Python SDK' url: '/sdk/python/' - body: 'Ruby SDK' url: '/sdk/ruby/' - body: 'Community Powered SDKs' url: '/community/projects.html#sdks' - body: 'Deploying an Engine' url: '#' children: - body: 'Deploying as a Web Service' url: '/deploy/' - body: 'Batch Predictions' url: '/batchpredict/' - body: 'Monitoring Engine' url: '/deploy/monitoring/' - body: 'Setting Engine Parameters' url: '/deploy/engineparams/' - body: 'Deploying Multiple Engine Variants' url: '/deploy/enginevariants/' - body: 'Engine Server Plugin' url: '/deploy/plugin/' - body: 'Customizing an Engine' url: '#' children: - body: 'Learning DASE' url: '/customize/' - body: 'Implement DASE' url: '/customize/dase/' - body: 'Troubleshooting Engine Development' url: '/customize/troubleshooting/' - body: 'Engine Scala APIs' url: '/api/current/#package' - body: 'Collecting and Analyzing Data' url: '#' children: - body: 'Event Server Overview' url: '/datacollection/' - body: 'Collecting Data with REST/SDKs' url: '/datacollection/eventapi/' - body: 'Events Modeling' url: '/datacollection/eventmodel/' - body: 'Unifying Multichannel Data with Webhooks' url: '/datacollection/webhooks/' - body: 'Channel' url: '/datacollection/channel/' - body: 'Importing Data in Batch' url: '/datacollection/batchimport/' - body: 'Using Analytics Tools' url: '/datacollection/analytics/' - body: 'Event Server Plugin' url: '/datacollection/plugin/' - body: 'Choosing an Algorithm' url: '#' children: - body: 'Built-in 
Algorithm Libraries' url: '/algorithm/' - body: 'Switching to Another Algorithm' url: '/algorithm/switch/' - body: 'Combining Multiple Algorithms' url: '/algorithm/multiple/' - body: 'Adding Your Own Algorithms' url: '/algorithm/custom/' - body: 'Tuning and Evaluation' url: '#' children: - body: 'Overview' url: '/evaluation/' - body: 'Hyperparameter Tuning' url: '/evaluation/paramtuning/' - body: 'Evaluation Dashboard' url: '/evaluation/evaluationdashboard/' - body: 'Choosing Evaluation Metrics' url: '/evaluation/metricchoose/' - body: 'Building Evaluation Metrics' url: '/evaluation/metricbuild/' - body: 'System Architecture' url: '#' children: - body: 'Architecture Overview' url: '/system/' - body: 'Using Another Data Store' url: '/system/anotherdatastore/' - body: 'PredictionIO® Official Templates' url: '#' children: - body: 'Intro' url: '/templates/' - body: 'Recommendation' children: - body: 'Quick Start' url: '/templates/recommendation/quickstart/' - body: 'DASE' url: '/templates/recommendation/dase/' - body: 'Evaluation Explained' url: '/templates/recommendation/evaluation/' - body: 'How-To' url: '/templates/recommendation/how-to/' - body: 'Read Custom Events' url: '/templates/recommendation/reading-custom-events/' - body: 'Customize Data Preparator' url: '/templates/recommendation/customize-data-prep/' - body: 'Customize Serving' url: '/templates/recommendation/customize-serving/' - body: 'Train with Implicit Preference' url: '/templates/recommendation/training-with-implicit-preference/' - body: 'Filter Recommended Items by Blacklist in Query' url: '/templates/recommendation/blacklist-items/' - body: 'Batch Persistable Evaluator' url: '/templates/recommendation/batch-evaluator/' - body: 'E-Commerce Recommendation' children: - body: 'Quick Start' url: '/templates/ecommercerecommendation/quickstart/' - body: 'DASE' url: '/templates/ecommercerecommendation/dase/' - body: 'How-To' url: '/templates/ecommercerecommendation/how-to/' - body: 'Train with Rate Event' 
url: '/templates/ecommercerecommendation/train-with-rate-event/' - body: 'Adjust Score' url: '/templates/ecommercerecommendation/adjust-score/' - body: 'Similar Product' children: - body: 'Quick Start' url: '/templates/similarproduct/quickstart/' - body: 'DASE' url: '/templates/similarproduct/dase/' - body: 'How-To' url: '/templates/similarproduct/how-to/' - body: 'Multiple Events and Multiple Algorithms' url: '/templates/similarproduct/multi-events-multi-algos/' - body: 'Returns Item Properties' url: '/templates/similarproduct/return-item-properties/' - body: 'Train with Rate Event' url: '/templates/similarproduct/train-with-rate-event/' - body: 'Get Rid of Events for Users' url: '/templates/similarproduct/rid-user-set-event/' - body: 'Recommend Users' url: '/templates/similarproduct/recommended-user/' - body: 'Classification' children: - body: 'Quick Start' url: '/templates/classification/quickstart/' - body: 'DASE' url: '/templates/classification/dase/' - body: 'How-To' url: '/templates/classification/how-to/' - body: 'Use Alternative Algorithm' url: '/templates/classification/add-algorithm/' - body: 'Read Custom Properties' url: '/templates/classification/reading-custom-properties/' - body: 'Engine Template Gallery' url: '#' children: - body: 'Browse' url: '/gallery/template-gallery/' - body: 'Submit your Engine as a Template' url: '/community/submit-template/' - body: 'Demo Tutorials' url: '#' children: - body: 'Community Contributed Demo' url: '/community/projects.html#demos' - body: 'Text Classification Engine Tutorial' url: '/demo/textclassification/' - body: 'Getting Involved' url: '/community/' children: - body: 'Contribute Code' url: '/community/contribute-code/' - body: 'Contribute Documentation' url: '/community/contribute-documentation/' - body: 'Contribute a SDK' url: '/community/contribute-sdk/' - body: 'Contribute a Webhook' url: '/community/contribute-webhook/' - body: 'Community Projects' url: '/community/projects/' - body: 'Getting Help' url: 
'#' children: - body: 'FAQs' url: '/resources/faq/' - body: 'Support' url: '/support/' - body: 'Resources' url: '#' children: - body: 'Command-line Interface' url: '/cli/' - body: 'Release Cadence' url: '/resources/release/' - body: 'Developing Engines with IntelliJ IDEA' url: '/resources/intellij/' - body: 'Upgrade Instructions' url: '/resources/upgrade/' - body: 'Glossary' url: '/resources/glossary/' - body: 'Apache Software Foundation' url: '#' children: - body: 'Apache Homepage' url: 'https://www.apache.org/' - body: 'License' url: 'https://www.apache.org/licenses/' - body: 'Sponsorship' url: 'https://www.apache.org/foundation/sponsorship.html' - body: 'Thanks' url: 'https://www.apache.org/foundation/thanks.html' - body: 'Security' url: 'https://www.apache.org/security/' ================================================ FILE: docs/manual/data/versions.yml ================================================ pio: 0.14.0 spark: 2.4.0 spark_download_filename: spark-2.4.0-bin-hadoop2.7 elasticsearch_download_filename: elasticsearch-6.8.1 hbase_version: 1.2.6 hbase_basename: hbase-1.2.6 hbase_variant: bin ================================================ FILE: docs/manual/helpers/application_helpers.rb ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the License for the specific language governing permissions and # limitations under the License. # module ApplicationHelpers def page_title if current_page.data.title content_tag :h1 do rendered_title end else content_tag :h1, class: 'missing' do 'Missing Title' end end end def rendered_title return unless current_page.data.title title = current_page.data.title template = Tilt['erb'].new { title } template.render(self, current_page.data) end def github_url base = 'https://github.com/apache/predictionio/tree/livedoc/docs/manual' path = current_page.source_file.sub(Middleman::Application.root_path.to_s, '') base + path end def page_title_in_nav_menu(nodes) def is_current_page(node) if node.url == current_page.url return true else return false end end if nodes result = "" nodes.each do |node| if node.children node.children.each do |child| if is_current_page(child) result = child end end else if is_current_page(node) result = node end end end if result != "" return result.body else return current_page.data.title end else return "Welcome to Apache PredictionIO Documentation!" end end def link_to_with_active(body, url, options = {}) if url == current_page.url link_to body, url, options.merge(class: [options[:class], 'active'].join(' ')) else link_to body, url, options end end end ================================================ FILE: docs/manual/helpers/breadcrumb_helpers.rb ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. 
You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # require 'rainbow/ext/string' module BreadcrumbHelpers def breadcrumbs result = false data.nav.main.root.each do |node| result = breadcrumb_search(current_page.url, node) break if result end partial 'nav/breadcrumbs', locals: { crumbs: result } end def breadcrumb_search(path, node, depth = 0, crumb = []) crumb[depth] = node return crumb if node.url == path if node.children node.children.each do |child| result = breadcrumb_search(path, child, depth + 1, crumb) return result if result end end false end end ================================================ FILE: docs/manual/helpers/icon_helpers.rb ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # module IconHelpers def icon(name) if name.nil? 
%Q{} else %Q{} end end end ================================================ FILE: docs/manual/helpers/table_of_contents_helpers.rb ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # module TableOfContentsHelpers def table_of_contents(resource) content = remove_front_matter_data(File.read(resource.source_file)) extension = File.extname(resource.source_file)[1..-1] # Trim the first dot. if extension != 'md' # Render other extensions first if they exist. template = Tilt[extension].new { content } content = template.render(self, resource.data) end # Now the custom Markdown TOC. markdown = Redcarpet::Markdown.new(Redcarpet::Render::HTML_TOC.new(nesting_level: 2)) # TOC gets confused with Ruby comments inside code blocks so we removed them. 
content_without_code = content.gsub(/(```[\s\S]*?```)/, '') output = markdown.render(content_without_code) if output.length == 0 return else content_tag :aside, output, id: 'table-of-contents' end end private def remove_front_matter_data(content) yaml_regex = /\A(---\s*\n.*?\n?)^((---|\.\.\.)\s*$\n?)/m if content =~ yaml_regex content = content.sub(yaml_regex, '') end json_regex = /\A(;;;\s*\n.*?\n?)^(;;;\s*$\n?)/m if content =~ json_regex content = content.sub(json_regex, '') end content end end ================================================ FILE: docs/manual/helpers/url_helpers.rb ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # module UrlHelpers def absolute_url(path) URI::Generic.build( scheme: 'https', host: host, path: path ).to_s end end ================================================ FILE: docs/manual/lib/custom_renderer.rb ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. 
# The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # require 'middleman-core/renderers/redcarpet' class CustomRenderer < Middleman::Renderers::MiddlemanRedcarpetHTML def initialize(options = {}) defaults = { with_toc_data: true } super(defaults.merge(options)) end def paragraph(text) case text when/\A(INFO|SUCCESS|WARNING|DANGER|NOTE|TODO):/ convert_alerts(text) else %Q(

#{text}

) end end def header(text, level) id = text.downcase.tr(" ", "-") id = "'" + id + "'" # The anchors placed before the headings provide proper jumping points. "#{text}" end def block_code(code, language) language = language ? language : 'bash' super end def block_html(raw_html) # Render fenced code blocks first! replace = raw_html.gsub(/(```.*?```)/m) do |match| markdown = Redcarpet::Markdown.new(CustomRenderer, fenced_code_blocks: true) markdown.render(match) end doc = Nokogiri::HTML::DocumentFragment.parse(replace) nodes = doc.css('div.tabs > div') if nodes.empty? raw_html else ul = Nokogiri::XML::Node.new('ul', doc) ul['class'] = 'control' nodes.each do |node| title = node.attribute('data-tab').to_s lang = node.attribute('data-lang').to_s uuid = SecureRandom.uuid id = "tab-#{uuid}" li = Nokogiri::XML::Node.new('li', doc) li['data-lang'] = lang li.inner_html = %Q(#{title}) ul.add_child(li) node['id'] = id end nodes.first.before(ul) doc.to_html end end private def convert_alerts(text) text.gsub(/\A(INFO|SUCCESS|WARNING|DANGER|NOTE|TODO):(.*?)(\n(?=\n)|\z)/m) do css_class = $1.downcase content = $2.strip %Q(

#{content}

) end end end ================================================ FILE: docs/manual/lib/gallery_generator.rb ================================================ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # require 'yaml' require 'uri' module Gallery private INTRO = '--- title: Engine Template Gallery --- Pick a tab for the type of template you are looking for. Some still need to be ported (a simple process) to Apache PIO and these are marked. Also see each Template description for special support instructions. ' BEGIN_TABS = '
' RECOMMENDER_SYSTEMS = '
' CLASSIFICATION = '
' REGRESSION = '
' CLUSTERING = '
' NLP = '
' SIMILARITY = '
' OTHER = '
' TEMPLATE_INTRO = '

%{name}

' STAR_BUTTON ='' TEMPLATE_DETAILS = '

%{description}

Support: %{support}


TypeLanguageLicenseStatusPIO min versionApache PIO Conversion Required
%{type}%{language}%{license}%{status}%{pio_min_version}%{apache_pio_convesion_required}

' SECTION_SEPARATOR ='
' END_TABS ='
' class Template public attr_accessor :has_github, :github_repo, :github_user def initialize(engine) engine.each do |key, val| self.instance_variable_set("@#{key}", val) self.class.send :define_method, key, lambda { self.instance_variable_get("@#{key}") } end @tags = @tags.map{ |s| s.downcase } @has_github = parse_github end private def parse_github uri = URI.parse(@repo) if uri.host == 'github.com' path = uri.path.split('/') raise "Wrong github repo url" unless path.length >= 3 @github_user = path[1] @github_repo = path[2] return true else return false end end end def self.write_template(mdfile, template) intro = TEMPLATE_INTRO % { name: template.name, repo: template.repo } if template.has_github intro += STAR_BUTTON % { user: template.github_user, repo: template.github_repo} end mdfile.write(intro) mdfile.write(TEMPLATE_DETAILS % { description: template.description, type: template.type, language: template.language, license: template.license, status: template.status, support: template.support_link, pio_min_version: template.pio_min_version, apache_pio_convesion_required: template.apache_pio_convesion_required }) end def self.write_templates(mdfile, templates) templates.each do |t| write_template(mdfile, t) end end def self.write_markdown(mdfile, templates) recommenders = templates.select{ |engine| engine.tags.include? 'recommender' } classification = templates.select{ |engine| engine.tags.include? 'classification' } regression = templates.select{ |engine| engine.tags.include? 'regression' } similarity = templates.select{ |engine| engine.tags.include? 'similarity' } nlps = templates.select{ |engine| engine.tags.include? 'nlp' } clustering = templates.select{ |engine| engine.tags.include? 'clustering' } others = templates.select{ |engine| engine.tags.include? 
'other' } mdfile.write(INTRO) mdfile.write(BEGIN_TABS) mdfile.write(RECOMMENDER_SYSTEMS) write_templates(mdfile, recommenders) mdfile.write(SECTION_SEPARATOR) mdfile.write(CLASSIFICATION) write_templates(mdfile, classification) mdfile.write(SECTION_SEPARATOR) mdfile.write(REGRESSION) write_templates(mdfile, regression) mdfile.write(SECTION_SEPARATOR) mdfile.write(NLP) write_templates(mdfile, nlps) mdfile.write(SECTION_SEPARATOR) mdfile.write(CLUSTERING) write_templates(mdfile, clustering) mdfile.write(SECTION_SEPARATOR) mdfile.write(SIMILARITY) write_templates(mdfile, similarity) mdfile.write(SECTION_SEPARATOR) mdfile.write(OTHER) write_templates(mdfile, others) mdfile.write(SECTION_SEPARATOR) mdfile.write(END_TABS) end public def self.generate_md(yaml_file_path, out_file_path) File.open(yaml_file_path) do |in_file| File.open(out_file_path, 'w') do |out_file| templates = YAML.load(in_file) parsed = templates.map{ |t| Template.new(t["template"]) } write_markdown(out_file, parsed) end end end end ================================================ FILE: docs/manual/source/404.html.md ================================================ --- title: Error 404 description: Page not found! --- # Page Not Found Sorry, the page you were looking for was not found :( ================================================ FILE: docs/manual/source/algorithm/custom.html.md ================================================ --- title: Adding your own Algorithms --- (Coming soon) ================================================ FILE: docs/manual/source/algorithm/index.html.md ================================================ --- title: Built-in Algorithm Libraries --- An engine can call virtually any algorithm from its Algorithm class. Apache PredictionIO currently offers native support for the [Spark MLlib](http://spark.apache.org/docs/latest/mllib-guide.html) machine learning library, which is used by some of the engine templates in the [template gallery](/gallery/template-gallery).
More library support will be added soon. ================================================ FILE: docs/manual/source/algorithm/multiple.html.md ================================================ --- title: Combining Multiple Algorithms --- You can use more than one algorithm to build multiple models in an engine. The predicted results can be combined in the Serving class. Here are some How-to examples: * [Similar Product template - Multiple Events and Multiple Algorithms](/templates/similarproduct/multi-events-multi-algos/) ================================================ FILE: docs/manual/source/algorithm/switch.html.md ================================================ --- title: Switching to Another Algorithm --- Every engine template comes with default algorithm(s). To switch to another algorithm, you simply need to modify the Algorithm class. Here are some How-to examples: * [Classification template - switching from NaiveBayes to Random Forests](/templates/classification/add-algorithm/) ================================================ FILE: docs/manual/source/appintegration/index.html.md ================================================ --- title: App Integration Overview --- Apache PredictionIO is designed as a machine learning server that integrates with your applications in production environments. A web or mobile app normally: 1. Sends event data to Apache PredictionIO's Event Server for model training 2. Sends dynamic queries to deployed engine(s) to retrieve predicted results ![Apache PredictionIO Single Engine Overview](/images/overview-singleengine.png) ## Sending Event Data Apache PredictionIO's Event Server receives event data from your application. The data can be used by engines as training data to build predictive models. The Event Server listens on port 7070 by default. You can change the port with the [--port arg](/cli/#event-server-commands) when you launch the Event Server.
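To make the flow above concrete, here is a minimal Ruby sketch that prepares one event for the Event Server's REST endpoint `/events.json` using only the standard library. It assumes the Event Server is running on `localhost` at the default port 7070 and that `YOUR_ACCESS_KEY` is replaced with the access key of your app; the helper names `build_event_request` and `send_event` and the sample `rate` event are hypothetical, chosen only for illustration.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumption: Event Server on localhost with the default port 7070.
EVENT_SERVER = 'http://localhost:7070'

# Hypothetical helper: builds (but does not send) a POST request to
# /events.json, passing the app's access key as the accessKey query parameter.
def build_event_request(access_key, event)
  uri = URI("#{EVENT_SERVER}/events.json")
  uri.query = URI.encode_www_form(accessKey: access_key)
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(event)
  request
end

# Hypothetical helper: opens a connection to the request's host and port
# and sends it, returning the Net::HTTPResponse.
def send_event(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

# Sample event: user u0 rates item i0 with a rating of 5.
event = {
  event: 'rate',
  entityType: 'user',
  entityId: 'u0',
  targetEntityType: 'item',
  targetEntityId: 'i0',
  properties: { rating: 5 }
}

request = build_event_request('YOUR_ACCESS_KEY', event)
# response = send_event(request)  # requires a running Event Server
```

Building the request separately from sending it keeps the JSON payload easy to inspect before anything goes over the network. Against a running Event Server, `send_event(request)` should return a `201 Created` response whose JSON body contains the new `eventId`.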
For further information, please read:

* [Event Server Overview](/datacollection/)
* [Collecting Data with REST/SDKs](/datacollection/eventapi)

## Sending Query

After you deploy an engine as a web service, it waits for queries from your application and returns predicted results in JSON format. An engine listens on port 8000 by default. If you want to deploy multiple engines, you can specify a different port for each of them.

For further information, please read:

* [Deploying an Engine as a Web Service](/deploy/)

================================================
FILE: docs/manual/source/archived/community.html.md
================================================
---
title: Archived Community Projects
---

## SDKs

### Node.js SDK for PredictionIO

URL: https://github.com/asafyish/predictionio-driver and https://www.npmjs.org/package/predictionio-driver

Node.js PredictionIO 0.8+ client supporting both callback syntax and promise syntax.

- Core Author: Asaf Yishai
- Status: It works with PredictionIO v0.8 - Under active development

### C#/.NET SDK for PredictionIO

URL: https://github.com/orbyone/Sensible.PredictionIO.NET

C#/.NET library for PredictionIO 0.9.4, supporting both synchronous and asynchronous calls, for item recommendation and item ranking algorithms. Loosely based on the PredictionIO Java SDK API.

- Core Author: Themos Piperakis
- Status: It works with PredictionIO v0.9.4 - Under active development

### .NET SDK for PredictionIO

URL: https://github.com/ibrahimozgon/PredictionIO-.Net-SDK

.NET SDK for PredictionIO

- Core Author: Ibrahim Özgön
- Status: It works with PredictionIO v0.9 - Under active development

## Installations

### Vagrant Installation for Apache PredictionIO®

URL: https://github.com/PredictionIO/PredictionIO-Vagrant

Bring up a PredictionIO 0.9.x VM with Vagrant.
- Core Author: Raphael Mäder
- Status: It works with PredictionIO v0.8 - Under active development

### Another Docker Installation for Apache PredictionIO®

URL: https://github.com/sphereio/docker-predictionio

Docker container for PredictionIO-based machine learning services.

- Core Author: Fabian M. Borschel
- Status: It works with PredictionIO v0.9.3 - Under active development

## Extensions

### GraphX Parallel SimRank Algorithm

URL: https://github.com/ZhouYii/PIO-Parallel-Simrank-Engine

Implementation of the Delta-SimRank algorithm using Spark's GraphX framework.

- Core Author: Joey Zhou
- Status: It works with PredictionIO v0.8 - Under active development

### Magento Similar Products Extension

URL: https://github.com/magento-hackathon/Predictionio

Similar Products is a Magento extension that uses PredictionIO to provide more personalized up-sell suggestions on the Magento product page.

- Core Author: Steven Richardson, Raphael Mäder & Damian Luszczymak
- Status: It works with PredictionIO v0.8 - Under active development

## Wrappers

### Laravel Wrapper for PredictionIO

URL: https://github.com/michael-hopkins/PredictionIO-Laravel-Wrapper and https://packagist.org/packages/hopkins/predictionio-laravel-wrapper

A Laravel wrapper for PredictionIO v0.8.

- Core Author: Bruno Cabral & Michael Hopkins
- Status: It works with PredictionIO v0.8 - Under active development

### Magento 2 Personalised Products Module

URL: https://github.com/richdynamix/personalised-products

Personalised Products is a Magento 2 module that serves real-time predicted suggestions for product up-sells on the product page and complementary suggestions for cross-sells on the basket page. It is powered by PredictionIO using the [Similar Product](/gallery/template-gallery/#recommender-systems "Similar Product") engine and the [Complementary Purchase](/gallery/template-gallery/#unsupervised-learning "Complementary Purchase") engine.
- Core Author: Steven Richardson
- Status: It works with PredictionIO v0.9.5 - Under active development

## Demos

### NoGoodGamez

NoGoodGamez PS3/PS4 game recommendation built by [pashadude](https://github.com/pashadude/)

URL: http://nogoodgamez.com

### OnTapp

OnTapp beer recommendation app built by [Victor Leung](https://twitter.com/victorleungtw).

Writeup: https://medium.com/@victorleungtw/beer-recommendation-engine-using-predictionio-36488ea0c50d

### Yelpio

Yelpio business recommendation built by [TRAN QUOC HOAN](https://twitter.com/k09ht), [Inhwan Lee](https://github.com/ihlee01), and 山本直人.

URL: http://yelpio.hongo.wide.ad.jp/

================================================
FILE: docs/manual/source/archived/index.html.md
================================================
---
title: List of Archived Pages
---

## Archived Contribution

Please help move pages here if they are no longer maintained.

================================================
FILE: docs/manual/source/archived/install-linux.html.md.erb
================================================
---
title: Installing Apache PredictionIO on Linux / Mac OS X
---

Follow the steps below to set up Apache PredictionIO and its dependencies. In these instructions we will assume you are in your home directory. Wherever you see `/home/abc`, replace it with your own home directory.

### Java

Ensure you have an appropriate Java version installed. For example:

```
$ java -version
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b25)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
```

### Download Apache PredictionIO

Download Apache PredictionIO and extract it.

```
$ cd
$ pwd
/home/abc
$ wget http://download.prediction.io/PredictionIO-<%= data.versions.pio %>.tar.gz
$ tar zxvf PredictionIO-<%= data.versions.pio %>.tar.gz
```

NOTE: The download instructions above apply to previous non-Apache releases only. Once we have made an Apache release, new instructions will be provided.
### Installing Dependencies

Let us install dependencies inside a subdirectory of the Apache PredictionIO installation. By following this convention, you can use PredictionIO's default configuration as is.

```
$ mkdir PredictionIO-<%= data.versions.pio %>/vendors
```

#### Spark Setup

<%= partial 'shared/install/spark' %>

#### Elasticsearch Setup

WARNING: You may skip this section if you are using PostgreSQL or MySQL.

<%= partial 'shared/install/elasticsearch' %>

#### HBase Setup

WARNING: You may skip this section if you are using PostgreSQL or MySQL.

<%= partial 'shared/install/hbase' %>

In addition, you must set your environment variable `JAVA_HOME`. For example, in `/home/abc/.bashrc` add the following line:

```
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
```

<%= partial 'shared/install/dependent_services' %>

Now you have installed everything you need!

<%= partial 'shared/install/proceed_template' %>

================================================
FILE: docs/manual/source/archived/install-vagrant.html.md.erb
================================================
---
title: Installing PredictionIO with Vagrant (VirtualBox)
---

WARNING: Running PredictionIO with Vagrant is intended for simple tests in an isolated environment. Due to the resource limits and overhead of a virtual machine (VM), it runs much more slowly and may encounter memory issues. We recommend using a Linux or Mac machine for serious usage.

## Install VirtualBox

If you don't have VirtualBox installed, please follow the instructions on the [VirtualBox site](https://www.virtualbox.org/wiki/Downloads) to download and install it.

After installation is done, you don't need to set up anything in VirtualBox. Vagrant will do it for you later.

## Install Vagrant

If you don't have Vagrant installed, please follow the instructions on the [Vagrant site](https://www.vagrantup.com/downloads.html) to download and install it.
## Bring up PredictionIO VM with Vagrant

Get the latest Vagrant setup from GitHub and make sure you are on the master branch:

```
$ git clone https://github.com/PredictionIO/PredictionIO-Vagrant.git
$ cd PredictionIO-Vagrant/
$ git checkout master
```

Inside the directory `PredictionIO-Vagrant/`, you will find a file named `Vagrantfile`, which is the configuration file Vagrant uses to set up the VM. You may modify this file if you want to change the VM configuration.

For example, if you want to change the memory of the VM, you can locate the following line in the `Vagrantfile` and change the value passed to the `memory` parameter (default is 2048MB):

```
v.customize ["modifyvm", :id, "--cpuexecutioncap", "90", "--memory", "2048"]
```

In the directory `PredictionIO-Vagrant/`, bring up the PredictionIO VM by running:

```
$ vagrant up
```

INFO: When you run `vagrant up` for the first time, it will download the base box ubuntu/trusty64 if you don't have it. Then it will also install all necessary libraries and set up PredictionIO in the virtual machine.

When it finishes successfully, you should see something like the following:

```
==> default: Installation done!
==> default: --------------------------------------------------------------------------------
==> default: Installation of PredictionIO <%= data.versions.pio %> complete!
==> default: IMPORTANT: You still have to start PredictionIO and dependencies manually:
==> default: Run: 'pio-start-all'
==> default: Check the status with: 'pio status'
==> default: Use: 'pio [train|deploy|...]' commands
==> default: Please report any problems to: dev@predictionio.apache.org
==> default: Documentation at: http://predictionio.apache.org
==> default: --------------------------------------------------------------------------------
==> default: Finish PredictionIO installation.
```

That's it! Now you have a PredictionIO VM running!

Please see the following notes on how to use the PredictionIO VM with Vagrant.
## Using the PredictionIO VM

### Log in to the VM

You can SSH into the VM by running the following from your host machine, in the same directory where you ran `vagrant up` (i.e. `PredictionIO-Vagrant/`):

```
$ vagrant ssh
```

Your console prompt then becomes something like the following, which means you have logged into the VM:

```
vagrant@vagrant-ubuntu-trusty-64:~$
```

Once you've logged into the VM, you can proceed to [Choosing an Engine Template](/start/download) or continue the QuickStart of the engine template you have chosen.

### Shutdown and bring up PredictionIO VM again

When you are not using the PredictionIO VM, you should shut it down properly by running the following **on the host machine** (not inside the VM):

```
$ vagrant halt
```

WARNING: If you didn't shut down the VM properly, or you ran `vagrant suspend`, the VM may go into a suspended state, and HBase may not run properly the next time you run `vagrant up`. In this case, you can always run `vagrant halt` to do a clean shutdown before running `vagrant up` again.

Then you can run `vagrant up` later to bring up the PredictionIO VM again.

```
$ vagrant up
```

When it's ready, you should see the following:

```
==> default: --------------------------------------------------------------------------------
==> default: PredictionIO VM is up!
==> default: You could run 'pio status' inside VM ('vagrant ssh' to VM first) to confirm if PredictionIO is ready.
==> default: IMPORTANT: You still have to start the eventserver manually (inside VM):
==> default: Run: 'pio eventserver'
==> default: --------------------------------------------------------------------------------
```

================================================
FILE: docs/manual/source/archived/launch-aws.html.md.erb
================================================
---
title: Launching PredictionIO on AWS
---

Deploying PredictionIO on Amazon Web Services is extremely easy thanks to AWS Marketplace.
As long as you have access to AWS, you can launch a ready-to-use PredictionIO Amazon EC2 instance with a single click.

## Prerequisites

* Amazon Web Services account
* Amazon EC2

## Access AWS Marketplace

Visit [PredictionIO product's page on AWS Marketplace](https://aws.amazon.com/marketplace/pp/B00RPIFSYS/) and sign in with your AWS account.

## Using 1-Click Launch

You should see the following screen after you have logged in.

![alt text](../images/awsm-product.png)

Under the big yellow "Continue" button, select the region where you want to launch the PredictionIO EC2 instance, then click "Continue".

![alt text](../images/awsm-1click.png)

Review your instance's settings before launching. For quick prototyping, we recommend the "memory optimized" instances, which offer the cheapest memory configurations: use at least "Memory Optimized R3 (r3.large)", or "r3.xlarge" for larger datasets.

## Setting Security Group

The default security group, marked by "AutogenByAWSMP", has the following ports open to the public:

* 22 (SSH)
* 7070 (PredictionIO Event Server)
* 8000 (PredictionIO Server)
* 8080 (Spark Master)
* 9200 (Elasticsearch)

## Start Using PredictionIO

It may take a few minutes after the EC2 instance has launched for all PredictionIO components to become ready. When they are ready, you can connect to your instance; see the [AWS documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-connect-to-instance-linux.html) for details.

Once you connect to your instance, you can find PredictionIO at `/opt/PredictionIO`, and the binary command path is `/opt/PredictionIO/bin`.

<%= partial 'shared/install/proceed_template' %>

NOTE: The AWS instance has all PredictionIO components started automatically, so you can safely skip the **pio-start-all** command described in the QuickStart.
================================================
FILE: docs/manual/source/archived/supervisedlearning.html.md
================================================
---
title: Machine Learning With PredictionIO
---

This guide is designed to give developers a brief introduction to fundamental concepts in machine learning, as well as an explanation of how these concepts tie into PredictionIO's engine development platform. This particular guide will largely deal with giving some

## Introduction to Supervised Learning

The first question we must ask is: what is machine learning? **Machine learning** is the field of study at the intersection of computer science, engineering, mathematics, and statistics which seeks to discover or infer patterns hidden within a set of observations, which we call our data. Some examples of problems that machine learning seeks to solve are:

- Predict whether a patient has breast cancer based on their mammogram results.
- Predict whether an e-mail is spam or not based on the e-mail's content.
- Predict today's temperature based on climate variables collected for the previous week.

### Thinking About Data

In the examples above, we are trying to predict an outcome \\(Y\\), or **response**, based on some recorded or observed variables \\(X\\), or **features**. For example, in the first problem each observation is a patient, the response variable \\(Y\\) is equal to 1 if this patient has breast cancer and 0 otherwise, and \\(X\\) represents the mammogram results.

When we say we want to predict \\(Y\\) using \\(X\\), we are trying to answer the question: how does the response \\(Y\\) depend on the features \\(X\\)? To answer it, we need a set of observations for which both \\(X\\) and \\(Y\\) have been observed, which we call our **training data**, in order to make inferences about this relationship.
### Different Types of Supervised Learning Problems

Note that in the first two examples, the outcome \\(Y\\) can only take on two values (1: cancer/spam, 0: no cancer/no spam). Whenever the outcome variable \\(Y\\) denotes a label associated with a particular group of observations (e.g. the cancer group), the **supervised learning** problem is also called a **classification** problem. In the third example, however, \\(Y\\) can take on any numerical value since it denotes a temperature reading (e.g. 25.143, 25.14233, 32.0). These types of supervised learning problems are called **regression** problems.

### Training a Predictive Model

A predictive model should be thought of as a function \\(f\\) that takes a set of features as input and outputs a predicted outcome (i.e. \\(f(X) = Y\\)). The phrase **training a model** simply refers to the process of using the training data to estimate such a function.

## PredictionIO and Supervised Learning

Machine learning methods generally assume that our observation responses and features are numeric vectors. We will say that observations in this format are in **standard form**. However, when you are working with real-life data, this will often not be the case: the data will usually be formatted in a manner that is specific to the application's needs.

As an example, let's suppose our application is [Stack Overflow](http://stackoverflow.com). The data we want to analyze are questions, and we want to predict, based on a question's content, whether or not it is related to Scala.

**Self-check:** Is this a classification or regression problem?

### Thinking About Data With PredictionIO

PredictionIO's predictive engine development platform allows you to easily incorporate observations that are not in standard form.
Continuing with our example, we can import the observations, or Stack Overflow questions, into [PredictionIO's Event Server](/datacollection/) as events with the following properties:

`properties = {question : String, topic : String}`

The value `question` is the actual question stored as a `String`, and `topic` is also a string equal to either `"Scala"` or `"Other"`. Our outcome here is `topic`, and `question` will provide a source for extracting features. That is, we will be using `question` to predict the outcome `topic`.

Once the observations are loaded as events into the Event Server, the engine's [Data Source](/customize/) component can read them, which allows you to treat them as objects in a Scala project. The engine's Preparator component is in charge of converting these observations into standard form. To do this, we can first map the topic values as follows: `Map("Other" -> 0, "Scala" -> 1)`. We can then vectorize each observation's question text to obtain a numeric feature vector. This text vectorization procedure is an example of a general concept in machine learning called **feature extraction**. After these transformations, the observations are in standard form and can be used to train a wide range of machine learning models.

### Training the Model With PredictionIO

The Algorithm engine component serves two purposes: outputting a predictive model \\(f\\) and using it to predict the outcome variable. Here \\(f\\) takes a vectorized question as input and outputs either 0 or 1. However, our `Query` input will again be a raw question, and our `PredictedResult` the topic associated with the predicted label (0 or 1):

`Query = {question : String}`

`PredictedResult = {topic : String}`

With PredictionIO's engine development platform, you can easily automate the vectorization of the Query question, as well as the mapping of the predicted label to the appropriate topic output format.
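To make the Preparator step more concrete, here is a minimal Ruby sketch of the two transformations described above: mapping `topic` to the labels from the `Map` shown earlier, and turning each `question` into a numeric vector. It assumes a simple lowercase word tokenizer and raw word counts (a plain bag-of-words model). The helper names and the two sample questions are hypothetical. This illustrates feature extraction in general; it is not PredictionIO's API, and a real engine would typically do this in Scala inside its Preparator, possibly with a richer scheme such as TF-IDF.

```ruby
# Label mapping from the text: "Other" -> 0, "Scala" -> 1.
LABELS = { 'Other' => 0, 'Scala' => 1 }.freeze

# Assumed tokenizer: lowercase, keep runs of letters, digits, '+' and '#'.
def tokenize(text)
  text.downcase.scan(/[a-z0-9+#]+/)
end

# Assign each distinct training token a column index (bag-of-words vocabulary).
def build_vocabulary(questions)
  vocab = {}
  questions.each do |q|
    tokenize(q).each { |tok| vocab[tok] ||= vocab.size }
  end
  vocab
end

# Raw term counts over the fixed vocabulary; unseen tokens are ignored,
# so query-time vectors have the same length as training vectors.
def vectorize(question, vocab)
  counts = Array.new(vocab.size, 0)
  tokenize(question).each do |tok|
    idx = vocab[tok]
    counts[idx] += 1 if idx
  end
  counts
end

# Hypothetical observations shaped like the events described above.
observations = [
  { question: 'How do I pattern match on a case class in Scala?', topic: 'Scala' },
  { question: 'Why is my CSS grid not centered?', topic: 'Other' }
]

vocab = build_vocabulary(observations.map { |o| o[:question] })
standard_form = observations.map do |o|
  { label: LABELS.fetch(o[:topic]), features: vectorize(o[:question], vocab) }
end
```

The vocabulary is built once from the training data and reused at query time, which is what lets an incoming `Query` question be vectorized into the same feature space the model was trained on.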
================================================
FILE: docs/manual/source/archived/tapster.html.md
================================================
---
title: Comics Recommendation Demo
---

## Introduction

In this demo, we will show you how to build a Tinder-style web application (named "Tapster") that interactively recommends comics to users based on their likes and dislikes of episodes.

The demo will use the [Similar Product Template](https://predictionio.apache.org/templates/similarproduct/quickstart/). The Similar Product Template is a great choice if you want to make recommendations based on immediate user activities or for new users with limited history. It uses the MLlib Alternating Least Squares (ALS) recommendation algorithm, a [Collaborative filtering](http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering) (CF) algorithm commonly used for recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix. Users and products are described by a small set of latent factors that can be used to predict missing entries. A layman's interpretation of Collaborative Filtering is "People who like this comic also like these comics."

All the code and data is on GitHub at: [github.com/PredictionIO/Demo-Tapster](https://github.com/PredictionIO/Demo-Tapster).

### Data

The data comes from [Tapastic](http://tapastic.com/). You can find the data files [here](https://github.com/PredictionIO/Demo-Tapster/tree/master/data).

The data structure looks like this:

[Episode List](https://github.com/PredictionIO/Demo-Tapster/blob/master/data/episode_list.csv) `data/episode_list.csv`

**Fields:** episodeId | episodeTitle | episodeCategories | episodeUrl | episodeImageUrls

1,000 rows. Each row represents one episode.

[User Like Event List](https://github.com/PredictionIO/Demo-Tapster/blob/master/data/user_list.csv) `data/user_list.csv`

**Fields:** userId | episodeId | likedTimestamp

192,587 rows.
Each row represents one user like for the given episode.

The tutorial has four major steps:

- Demo application setup
- PredictionIO installation and setup
- Import data into the database and PredictionIO
- Integrate the demo application with PredictionIO

## Tapster Demo Application

The demo application is built using Rails.

You can clone the existing application with:

```
$ git clone https://github.com/PredictionIO/Demo-Tapster.git
$ cd Demo-Tapster
$ bundle install
```

You will need to edit `config/database.yml` to match your local database settings. We have provided some sensible defaults for PostgreSQL, MySQL, and SQLite.

Set up the database with:

```
$ rake db:create
$ rake db:migrate
```

At this point, you should have the demo application ready but with an empty database. Let's import the episode data into our database with: `$ rake import:episodes`. An "Episode" is a single [comic strip](http://en.wikipedia.org/wiki/Comic_strip).

[View on GitHub](https://github.com/PredictionIO/Demo-Tapster/blob/master/lib/tasks/import/episodes.rake)

This script is pretty simple. It loops through the CSV file and, for each line, creates a new episode in our local database.

You can start the app and point your browser to [http://localhost:3000](http://localhost:3000)

```
$ rails server
```

![Rails Server](/images/demo/tapster/rails-server.png)

## Apache PredictionIO Setup

### Install Apache PredictionIO

Follow the installation instructions [here](http://predictionio.apache.org/install/) or simply run:

```
$ bash -c "$(curl -s https://raw.githubusercontent.com/apache/predictionio/master/bin/install.sh)"
```

![PIO Install](/images/demo/tapster/pio-install.png)

### Create a New App

You will need to create a new app on Apache PredictionIO to house the Tapster demo. You can do this with:

```
$ pio app new tapster
```

Take note of the App ID and Access Key.
![PIO App New](/images/demo/tapster/pio-app-new.png)

### Set Up the Engine

We are going to copy the Similar Product Template into the PIO directory.

```
$ cd PredictionIO
$ git clone https://github.com/apache/predictionio-template-similar-product.git tapster-episode-similar
```

Next, we are going to update the App ID in the `engine.json` file to match the App ID we just created.

```
$ cd tapster-episode-similar
$ nano engine.json
$ cd ..
```

![Engine Setup](/images/demo/tapster/pio-engine-setup.png)

### Modify Engine Template

By default, the engine template reads "view" events. We can easily change it to read "like" events.

Modify `readTraining()` in DataSource.scala:

```scala
  override def readTraining(sc: SparkContext): TrainingData = {

    ...

    val viewEventsRDD: RDD[ViewEvent] = eventsDb.find(
      appId = dsp.appId,
      entityType = Some("user"),
      eventNames = Some(List("like")), // MODIFIED
      // targetEntityType is optional field of an event.
      targetEntityType = Some(Some("item")))(sc)
      // eventsDb.find() returns RDD[Event]
      .map { event =>
        val viewEvent = try {
          event.event match {
            case "like" => ViewEvent( // MODIFIED
              user = event.entityId,
              item = event.targetEntityId.get,
              t = event.eventTime.getMillis)
            case _ => throw new Exception(s"Unexpected event ${event} is read.")
          }
        } catch {
          case e: Exception => {
            logger.error(s"Cannot convert ${event} to ViewEvent." +
              s" Exception: ${e}.")
            throw e
          }
        }
        viewEvent
      }

    ...
  }
}
```

Finally, to build the engine we will run:

```
$ cd tapster-episode-similar
$ pio build
$ cd ..
```

![PIO Build](/images/demo/tapster/pio-build.png)

## Import Data

Once everything is installed, start the Event Server by running: `$ pio eventserver`

![Event Server](/images/demo/tapster/pio-eventserver.png)

INFO: You can check the status of Apache PredictionIO at any time by running: `$ pio status`

ALERT: If your laptop goes to sleep, you might need to restart HBase manually with:

```
$ cd PredictionIO/vendors/hbase-0.98.6/bin
$ ./stop-hbase.sh
$ ./start-hbase.sh
```

The key event we are importing into the Apache PredictionIO Event Server is the "Like" event (for example, user X likes episode Y).

We will send this data to Apache PredictionIO by executing the `$ rake import:predictionio` command.

[View on GitHub](https://github.com/PredictionIO/Demo-Tapster/blob/master/lib/tasks/import/predictionio.rake)

This script is a little more complex. First we need to connect to the Event Server.

```ruby
client = PredictionIO::EventClient.new(ENV['PIO_ACCESS_KEY'], ENV['PIO_EVENT_SERVER_URL'], THREADS)
```

You will need to create the environment variables `PIO_ACCESS_KEY` and `PIO_EVENT_SERVER_URL`. The default Event Server URL is: http://localhost:7070.

INFO: If you forget your **Access Key**, you can always run: `$ pio app list`

You can set these values in the `.env` file located in the application root directory, and they will be automatically loaded into your environment each time Rails is run.

The next part of the script loops through each line of the `data/user_list.csv` file and returns an array of unique user and episode IDs. Once we have those, we can send the data to Apache PredictionIO like this.

First the users:

```ruby
user_ids.each_with_index do |id, i|
  # Send unique user IDs to PredictionIO.
  client.aset_user(id)
  puts "Sent user ID #{id} to PredictionIO. Action #{i + 1} of #{user_count}"
end
```

And now the episodes:

```ruby
episode_ids.each_with_index do |id, i|
  # Load episode from database - we will need this to include the categories!
  episode = Episode.where(episode_id: id).take

  if episode
    # Send unique episode IDs to PredictionIO.
    client.acreate_event(
      '$set',
      'item',
      id,
      properties: { categories: episode.categories }
    )
    puts "Sent episode ID #{id} to PredictionIO. Action #{i + 1} of #{episode_count}"
  else
    puts "Episode ID #{id} not found in database! Skipping!".color(:red)
  end
end
```

Finally, we loop through the `data/user_list.csv` file one last time to send the like events:

```ruby
CSV.foreach(USER_LIST, headers: true) do |row|
  user_id = row[0] # userId
  episode_id = row[1] # episodeId

  # Send like to PredictionIO.
  client.acreate_event(
    'like',
    'user',
    user_id,
    { 'targetEntityType' => 'item', 'targetEntityId' => episode_id }
  )

  puts "Sent user ID #{user_id} liked episode ID #{episode_id} to PredictionIO. Action #{$INPUT_LINE_NUMBER} of #{line_count}."
end
```

In total, the script takes about 4 minutes to run on a basic laptop. At this point, all the data has been imported into Apache PredictionIO.

![Import](/images/demo/tapster/pio-import-predictionio.png)

### Engine Training

We train the engine with the following command:

```
$ cd tapster-episode-similar
$ pio train -- --driver-memory 4g
```

![PIO Train](/images/demo/tapster/pio-train.png)

We use the `--driver-memory` option to control the memory used by Apache PredictionIO. Without it, Apache PredictionIO can run into memory problems that lead to a crash. You can adjust the 4g up or down depending on your system specs.

You can set up a job to periodically retrain the engine so the model is updated with the latest dataset.

### Deploy Model

You can deploy the model with `$ pio deploy` from the `tapster-episode-similar` directory.

At this point, you have a demo app with data and an Apache PredictionIO server with a trained model, all set up. Next, we will connect the two so you can log live interaction (like) events into the Apache PredictionIO Event Server and query the engine server for recommendations.
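Before wiring up the Rails app, you may want to query the deployed engine by hand. The Ruby sketch below assumes the engine listens on the default `http://localhost:8000` and accepts `POST /queries.json` with the same `items`, `blackList` and `num` fields that the demo's Rails controller sends. The helper names `build_query_request` and `top_item` are hypothetical. The sketch also shows a more defensive way to read the top item from an `itemScores` response, which is the kind of sanity check the controller's comments recommend for a real application.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds, but does not send, a query for a deployed engine. Assumes the
# default engine URL http://localhost:8000 and the /queries.json endpoint;
# the field names mirror the demo's send_query(items:, blackList:, num:).
def build_query_request(likes, dislikes, num: 1, base_url: 'http://localhost:8000')
  request = Net::HTTP::Post.new(URI.join(base_url, '/queries.json'))
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate('items' => likes, 'blackList' => dislikes, 'num' => num)
  request
end

# Returns the first recommended item ID, or nil when the response is not
# valid JSON or carries no itemScores, instead of raising the way
# response['itemScores'][0]['item'] would on an empty list.
def top_item(response_body)
  parsed = JSON.parse(response_body)
  return nil unless parsed.is_a?(Hash)
  scores = parsed['itemScores']
  return nil unless scores.is_a?(Array) && scores.first.is_a?(Hash)
  scores.first['item']
rescue JSON::ParserError
  nil
end

# Hypothetical episode IDs: two likes and one dislike.
req = build_query_request(%w[101 205], %w[17])
# Against a running engine:
# res = Net::HTTP.start(req.uri.host, req.uri.port) { |http| http.request(req) }
# puts top_item(res.body)
```

Returning `nil` lets the caller fall back to something sensible, such as a random episode, instead of failing with an exception when the engine has nothing to recommend.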
## Connect the Demo App with Apache PredictionIO

### Overview

At a high level, the application keeps a record of each like and dislike. It uses jQuery to send an array of both likes and dislikes to the server on each click. The server then queries Apache PredictionIO for a similar episode, which is relayed to jQuery and displayed to the user.

Data flow:

- The user likes an episode.
- Tapster sends the "Like" event to the Apache PredictionIO Event Server.
- Tapster queries the Apache PredictionIO engine with all the episodes the user has rated (likes and dislikes) in this session.
- Apache PredictionIO returns 1 recommended episode.

### JavaScript

All the important code lives in `app/assets/javascripts/application.js` [View on GitHub](https://github.com/PredictionIO/Demo-Tapster/blob/master/app/assets/javascripts/application.js)

Most of this file is just click handlers, code for displaying the loading dialog, and similar UI logic. The most important function queries the Rails server for results from Apache PredictionIO.

```javascript
// Query the server for a comic based on previous likes. See episodes#query.
queryPIO: function() {
  var _this = this; // For closure.
  $.ajax({
    url: '/episodes/query',
    type: 'POST',
    data: {
      likes: JSON.stringify(_this.likes),
      dislikes: JSON.stringify(_this.dislikes),
    }
  }).done(function(data) {
    _this.setComic(data);
  });
}
```

### Rails

On the Rails side, all the interesting work happens in the episodes controller located at `app/controllers/episodes_controller.rb` [View on GitHub](https://github.com/PredictionIO/Demo-Tapster/blob/master/app/controllers/episodes_controller.rb).

```ruby
def query
  # Create PredictionIO client.
  client = PredictionIO::EngineClient.new(ENV['PIO_ENGINE_URL'])

  # Get posted likes and dislikes.
  likes = ActiveSupport::JSON.decode(params[:likes])
  dislikes = ActiveSupport::JSON.decode(params[:dislikes])

  if likes.empty?
    # We can't query PredictionIO with no likes so
    # we will return a random comic instead.
    @episode = random_episode
    render json: @episode
    return
  end

  # Query PredictionIO.
  # Here we black list the disliked items so they are not shown again!
  response = client.send_query(items: likes, blackList: dislikes, num: 1)

  # With a real application you would want to do some
  # better sanity checking of the response here!

  # Get ID of response.
  id = response['itemScores'][0]['item']

  # Find episode in database.
  @episode = Episode.where(episode_id: id).take

  render json: @episode
end
```

On the first line we make a connection to Apache PredictionIO. You will need to set `PIO_ENGINE_URL`; this can be done in the `.env` file. The default URL is: http://localhost:8000.

Next, we decode the JSON sent from the browser.

After that, we check whether the user has liked anything yet. If not, we just return a random episode.

If the user has likes, we send them as a query to the Apache PredictionIO engine server. We also blacklist the dislikes so that they are not returned.

With the response from Apache PredictionIO, it's just a matter of looking up the episode in the database and rendering that object as JSON.

Once the response is sent to the browser, JavaScript replaces the existing comic and hides the loading message.

That's it. You're done! If Ruby is not your language of choice, check out our other [SDKs](http://predictionio.apache.org/sdk/), and remember that you can always interact with the Event Server through its native JSON API.

## Links

Source code is on GitHub at: [github.com/PredictionIO/Demo-Tapster](https://github.com/PredictionIO/Demo-Tapster)

## Conclusion

Love this tutorial and Apache PredictionIO? Both are open source (Apache 2 License). [Fork](https://github.com/PredictionIO/Demo-Tapster) this demo and build upon it. If you produce something cool, shoot us an email and we will link to it from here.

Found a typo? Think something should be explained better?
This tutorial (and all our other documentation) lives in the main repo [here](https://github.com/apache/predictionio/blob/livedoc/docs/manual/source/demo/tapster.html.md). Our documentation is in the `livedoc` branch. Find out how to contribute documentation at http://predictionio.apache.org/community/contribute-documentation/. We ♥ pull requests!

================================================
FILE: docs/manual/source/batchpredict/index.html.md
================================================
---
title: Batch Predictions
---

## Overview

Process predictions for many queries using efficient parallelization through Spark. Useful for mass auditing of predictions and for generating predictions to push into other systems.

Batch predict reads and writes multi-object JSON files similar to the [batch import](/datacollection/batchimport/) format. JSON objects are separated by newlines and cannot themselves contain unencoded newlines.

## Compatibility

`pio batchpredict` loads the engine and processes queries exactly like `pio deploy`. There is only one additional requirement for engines to utilize batch predict:

WARNING: All algorithm classes used in the engine must be [serializable](https://www.scala-lang.org/api/2.11.8/index.html#scala.Serializable). **This is already true for PredictionIO's base algorithm classes**, but it may be broken by including non-serializable fields in their constructors. Using the [`@transient` annotation](http://fdahms.com/2015/10/14/scala-and-the-transient-lazy-val-pattern/) may help in these cases.

This requirement exists because the input queries are processed as a [Spark RDD](https://spark.apache.org/docs/latest/rdd-programming-guide.html#resilient-distributed-datasets-rdds), which enables high-performance parallelization, even on a single machine.

## Usage

### `pio batchpredict`

Command to process bulk predictions.
Takes the same options as `pio deploy` plus:

### `--input `

Path to file containing queries; a multi-object JSON file with one query object per line. Accepts any valid Hadoop file URL.

Default: `batchpredict-input.json`

### `--output `

Path to file to receive results; a multi-object JSON file with one object per line, the prediction + original query. Accepts any valid Hadoop file URL. Actual output will be written as Hadoop partition files in a directory with the output name.

Default: `batchpredict-output.json`

### `--query-partitions `

Configure the concurrency of predictions by setting the number of partitions used internally for the RDD of queries. This directly affects the number of resulting `part-*` output files. While setting it to `1` may seem appealing to get a single output file, this removes parallelization from the batch process, reducing performance and possibly exhausting memory.

Default: the number of partitions created by the Spark context's `textFile` (probably the number of cores available on the local machine)

### `--engine-instance-id `

Identifier for the trained instance to use for batch predict.

Default: the latest trained instance.

## Example

### Input

A multi-object JSON file of queries as they would be sent to the engine's HTTP Queries API.

NOTE: Read via [SparkContext's `textFile`](https://spark.apache.org/docs/latest/rdd-programming-guide.html#external-datasets) and so may be a single file or any supported Hadoop format.

File: `batchpredict-input.json`

```json
{"user":"1"}
{"user":"2"}
{"user":"3"}
{"user":"4"}
{"user":"5"}
```

### Execute

```bash
pio batchpredict \
  --input batchpredict-input.json \
  --output batchpredict-output.json
```

This command will run to completion, aborting if any errors are encountered.

### Output

A multi-object JSON file of predictions + original queries. The predictions are JSON objects as they would be returned from the engine's HTTP Queries API.
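Because every output line is a standalone JSON object pairing the original query with its prediction, downstream code can parse the results line by line. The Ruby sketch below assumes the output directory contains Spark's `part-*` files with one `{"query": ..., "prediction": ...}` object per line, as in the example output for this command; the helper name `read_batch_predictions` is hypothetical and not part of PredictionIO.

```ruby
require "json"

# Hypothetical helper: collect every prediction written by `pio batchpredict`.
# Assumes `output_dir` is the directory passed to --output and that Spark wrote
# one JSON object per line into files named part-00000, part-00001, ...
# Marker files such as _SUCCESS and hidden .crc files do not match "part-*".
def read_batch_predictions(output_dir)
  Dir.glob(File.join(output_dir, "part-*")).sort.flat_map do |path|
    File.foreach(path).map(&:strip).reject(&:empty?).map do |line|
      record = JSON.parse(line)
      # Each record pairs the original query with the engine's prediction.
      [record.fetch("query"), record.fetch("prediction")]
    end
  end
end
```

For example, `read_batch_predictions("batchpredict-output.json")` would return `[query, prediction]` pairs from all partitions, so the order follows partition files rather than the input file.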
NOTE: Results are written via Spark RDD's `saveAsTextFile`, so each partition will be written to its own `part-*` file. See [post-processing results](#post-processing-results).

File 1: `batchpredict-output.json/part-00000`

```json
{"query":{"user":"1"},"prediction":{"itemScores":[{"item":"1","score":33},{"item":"2","score":32}]}}
{"query":{"user":"3"},"prediction":{"itemScores":[{"item":"2","score":16},{"item":"3","score":12}]}}
{"query":{"user":"4"},"prediction":{"itemScores":[{"item":"3","score":19},{"item":"1","score":18}]}}
```

File 2: `batchpredict-output.json/part-00001`

```json
{"query":{"user":"2"},"prediction":{"itemScores":[{"item":"5","score":55},{"item":"3","score":28}]}}
{"query":{"user":"5"},"prediction":{"itemScores":[{"item":"1","score":24},{"item":"4","score":14}]}}
```

### Post-processing Results

After the process exits successfully, the parts may be concatenated into a single output file using a command like:

```bash
cat batchpredict-output.json/part-* > batchpredict-output-all.json
```

================================================
FILE: docs/manual/source/cli/index.html.md
================================================
---
title: Command Line
---

## Overview

Interaction with Apache PredictionIO is done through the command line interface. It follows the format of:

```pio [options] ...```

You can run ```pio help``` to see a list of all available commands and ```pio help ``` to see details of the command.

Apache PredictionIO commands can be separated into the following three categories.

## General Commands

```pio help```
Display usage summary. `pio help ` to read about a specific subcommand.

```pio version```
Displays the version of the installed PredictionIO.

```pio status```
Displays install path and running status of PredictionIO system and its dependencies.

## Event Server Commands

```pio eventserver```
Launch the Event Server.

```pio app```
Manage apps that are used by the Event Server.
```pio app data-delete ```
Deletes all data associated with the app.

```pio app delete ```
Deletes the app and its data.

```--ip ``` IP to bind to. Defaults to localhost.

```--port ``` Port to bind to. Defaults to 7070.

```pio accesskey```
Manage app access keys.

## Engine Commands

Engine commands need to be run from the directory that contains the engine project. The ```--debug``` and ```--verbose``` flags will provide debug and third-party informational messages.

```pio build```
Build the engine at the current directory.

```pio train```
Kick off a training using an engine.

```pio deploy```
Deploy an engine as an engine server.

```pio batchpredict```
Process bulk predictions using an engine.

For ```deploy``` & ```batchpredict```, if ```--engine-instance-id``` is not specified, the latest trained instance is used.

================================================
FILE: docs/manual/source/community/contribute-code.html.md
================================================
---
title: Contribute Code
---

Thank you for your interest in contributing to Apache PredictionIO. Our mission is to enable developers to build scalable machine learning applications easily. Here is how you can help with the project development. If you have any questions regarding development at any time, please feel free to [subscribe](mailto:dev-subscribe@predictionio.apache.org) and post to the [Development Mailing List](mailto:dev@predictionio.apache.org).

## Areas in Need of Help

We accept contributions of all kinds at any time. We are compiling this list to show features that are highly sought after by the community.
- Tests and CI
- Engine template, tutorials, and samples
- Client SDKs
- Building engines in Java (updating the Java controller API)
- Code clean up and refactoring
- Code and data pipeline optimization
- Developer experience (UX) improvement

## How to Report an Issue

If you wish to report an issue you found, you can do so on [Apache PredictionIO JIRA](https://issues.apache.org/jira/browse/PIO).

## How to Help Resolve Existing Issues

In general, bug fixes should be done the same way as new features, but critical bug fixes will follow a different path.

## How to Add / Propose a New Feature

Before adding new features into JIRA, please check that the feature does not already exist in JIRA.

1. To propose a new feature, simply [subscribe](mailto:dev-subscribe@predictionio.apache.org) and post your proposal to the [Apache PredictionIO Development Mailing List](mailto:dev@predictionio.apache.org).
2. Discuss with the community and the core development team what needs to be done, and lay down concrete plans on deliverables.
3. Once solid plans are made, start creating tickets in the [issue tracker](https://issues.apache.org/jira/browse/PIO).
4. Work side by side with other developers, using the Apache PredictionIO Development Mailing List as the primary mode of communication. You never know if someone else has a better idea. ;)

### Adding a Ticket to JIRA

1. Add a descriptive Summary and a detailed description
2. Set Issue Type to Bug, Improvement, New Feature, Test or Wish
3. Set Priority to Blocker, Critical, Major, Minor or Trivial
4. Fill out Affects Version with the version of PredictionIO you are currently using
5. Fill out Environment if needed for the description of your bug / feature
6. Please leave other fields blank

### Triaging JIRA

Tickets will be triaged by PredictionIO committers.
- **Target Version**: Either a particular version or `Future` if to be done later
  + Once a fix has been committed, the Fix Version will be filled in with the appropriate release
- **Component**: Each ticket will be annotated with one or more of the following Components
  + **Core**: affects the main code branch / will be part of a release
  + **Documentation**: affects the documents / will be pushed to the livedoc branch
  + **Templates**: affects one of the separate GitHub repositories for a template

## How to Issue a Pull Request

When you have finished your code, you can [create a pull request](https://help.github.com/articles/creating-a-pull-request/) against the **develop** branch.

- The title must contain a tag associating it with an existing JIRA ticket. You must create a ticket so that the infrastructure can correctly track issues across Apache JIRA and GitHub. If your ticket is `PIO-789`, your title must look something like `[PIO-789] Some short description`.
- Please also include the JIRA ticket number in your commit message summary, in the same way.
- Make sure the title and description are clear and concise. For more details on writing a good commit message, check out [this guide](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).
- If the change is visual, make sure to include a screenshot or GIF.
- Make sure it is being opened into the right branch.
- Make sure it has been rebased on top of that branch.

NOTE: When a release is close and major development is ongoing, a release branch will be forked from the develop branch to stabilize the code for the binary release. Please refer to the *git flow* methodology page for more information.

## Getting Started

Apache PredictionIO relies heavily on the [git flow methodology](http://nvie.com/posts/a-successful-git-branching-model/). Please make sure you read and understand it before you start your development.
By default, cloning Apache PredictionIO will put you in the *develop* branch, which in most cases is where all the latest development goes.

NOTE: For core development, please follow the [Scala Style Guide](http://docs.scala-lang.org/style/).

### Create a Fork of the Apache PredictionIO Repository

1. Start by creating a GitHub account if you do not already have one.
2. Go to [Apache PredictionIO's GitHub mirror](https://github.com/apache/predictionio) and fork it to your own account.
3. Clone your fork to your local machine.

If you need additional help, please refer to https://help.github.com/articles/fork-a-repo/.

### Building Apache PredictionIO from Source

After the previous section, you should have a copy of Apache PredictionIO on your local machine, ready to be built.

1. Make sure you are on the *develop* branch. You can double check with `git status` or simply run `git checkout develop`.
2. At the root of the repository, run `./make-distribution.sh` to build PredictionIO.

### Setting Up the Environment

Apache PredictionIO relies on 3rd party software to perform its tasks. To set them up, simply follow this [documentation](http://predictionio.apache.org/install/install-sourcecode/#installing-dependencies).

### Start Hacking

You should have an Apache PredictionIO development environment by now. Happy hacking!

## Anatomy of Apache PredictionIO Code Tree

The following describes each directory's purpose.

### bin

Shell scripts and any relevant components to go into the binary distribution. Utility shell scripts can also be included here.

### conf

Configuration files that are used by both a source tree and a binary distribution.

### core

Core Apache PredictionIO code that provides the DASE controller API, core data structures, and workflow creation and management code.

### data

Apache PredictionIO Event Server, and the backend-agnostic storage layer for the event store and metadata store.
### docs

Source code for the http://predictionio.apache.org site, and any other documentation support files.

### examples

Complete code examples showing Apache PredictionIO's application.

### sbt

Embedded SBT (Simple Build Tool) launcher.

### storage

Storage implementations.

### tools

Tools for running Apache PredictionIO. Contains primarily the CLI (command-line interface) and its supporting code, and the experimental evaluation dashboard.

================================================
FILE: docs/manual/source/community/contribute-documentation.html.md
================================================
---
title: Contribute Documentation
---

## How to Write Documentation

You can help improve the Apache PredictionIO documentation by submitting tutorials, writing how-tos, fixing errors, and adding missing information. You can edit any page live on [GitHub](https://github.com/apache/predictionio) by clicking the pencil icon on any page, or open a [Pull Request](https://help.github.com/articles/creating-a-pull-request/).

## Branching

Use the `livedoc` branch if you want to update the current documentation.

Use the `develop` branch if you want to write documentation for the next release.

## Installing Locally

Apache PredictionIO documentation uses [Middleman](http://middlemanapp.com/) and is hosted on Apache.

[Gems](http://rubygems.org/) are managed with [Bundler](http://bundler.io/). Front-end dependencies are managed with [Bower](http://bower.io/).

Requires [Ruby](https://www.ruby-lang.org/en/) 2.1 or greater. We recommend [RVM](http://rvm.io/) or [rbenv](https://github.com/sstephenson/rbenv).

WARNING: **OS X** users will first need to install [Xcode Command Line Tools](https://developer.apple.com/xcode/downloads/) with `$ xcode-select --install`.
You can install everything with the following commands:

```bash
$ cd docs/manual
$ gem install bundler
$ bundle install
$ npm install -g bower
$ bower install
```

## Starting the Server

Start the server with:

```
$ bundle exec middleman server
```

This will start the local web server at [localhost:4567](http://localhost:4567/).

## Building the Site

Build the site with:

```
$ bundle exec middleman build
```

## Styleguide

Please follow this styleguide for any documentation contributions.

### Text

View our [Sample Typography](/samples/) page for all possible styles.

### Headings

The main heading `h1` is derived from the title data attribute:

```
---
title: Page Title
---
```

Start other headings with `h2`. Prefer the `## Heading` format in Markdown.

### Links

Internal links:

* Should start with / (relative to root).
* Should end with / (S3 requirement).
* Should **not** end with .html.

Following these rules helps keep everything consistent and allows our version parser to correctly version links. Middleman is configured for directory indexes. Linking to a file in `sources/samples/index.html` should be done with `[Title](/samples/)`.

```md
[Good](/path/to/page/)
[Bad](../page) Not relative to root!
[Bad](page.html) Do not use the .html extension!
[Bad](/path/to/page) Does not end with a /.
```

### Images

Images should be exactly 900px wide. [Chrome Window Resizer](https://chrome.google.com/webstore/detail/window-resizer/kkelicaakdanhinjdeammmilcgefonfh) is an excellent extension for browser resizing.

WARNING: **OS X** users, please [Disable Shadows](http://www.idownloadblog.com/2014/08/03/how-to-remove-the-shadow-window-screenshots-on-mac-os-x/) before taking a screenshot.

Images should only show the relevant tab/terminal. Hide any additional toolbars.

Images will **automatically scale** by default. If you want an image to remain a set size, you can use a raw HTML tag like this:

```
Image
```

### Code Blocks

Fenced code blocks are created using the ```language format.
An example of each language is available on our [Language Samples](/samples/languages/) page.

### Code Tabs

Code tabs use the following HTML format:

```html
Markdown, code blocks, or HTML is OK inside a tab.
...
```

You can see an example of this on our [Tab Samples](/samples/tabs/) page.

### SEO

You can hide a page from the `sitemap.xml` file by setting the page's [Frontmatter](http://middlemanapp.com/basics/frontmatter/) like this:

```md
---
title: Secret Page
hidden: true
---
```

## Important Files

| Description | File |
| ------------- | ------------- |
| Left side navigation. | `data/nav/main.yml` |
| Main site layout. | `source/layouts/layout.html.slim` |
| Custom Markdown renderer based on [Redcarpet](https://github.com/vmg/redcarpet). | `lib/custom_renderer.rb` |
| Custom TOC helper. | `helpers/table_of_contents_helpers.rb` |

### Versions

Various site-wide versions are defined in `data/versions.yml` and embedded with ERB like `<%= data.versions.pio %>`.

NOTE: Files must end with a `.erb` extension to be processed as ERB.

## Going Live

For Apache project committers, pushing to the `livedoc` branch of PredictionIO ASF git will update http://predictionio.apache.org in about 10 minutes.

Make sure the **apache.org** remote is attached to your `predictionio` repo, and if not, add it:

```
$ git remote -v
$ git remote add apache https://gitbox.apache.org/repos/asf/predictionio.git
```

Then, push the `livedoc` branch. (It will be published and synced with the public GitHub mirror):

```
$ git push apache livedoc
```

You can check the progress of each build on [Apache's Jenkins](https://builds.apache.org/):

* [build-site](https://builds.apache.org/job/PredictionIO-build-site/)
* [publish-site](https://builds.apache.org/job/PredictionIO-publish-site/)

## Checking the Site

WARNING: The check rake task is still in **beta**, but it is extremely useful for catching accidental errors.

```bash
$ bundle exec middleman build
$ bundle exec rake check
```

The `rake check` task parses each HTML page in the `build` folder and checks it for common errors, including 404s.

## License

Documentation is under an [Apache License Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
================================================
FILE: docs/manual/source/community/contribute-sdk.html.md
================================================
---
title: Contribute an SDK
---

An SDK should provide convenient methods for client applications to easily record users' behaviors in Apache PredictionIO's Event Server and also query recommendations from machine learning Engines. Therefore, an SDK typically has 2 corresponding clients: `Event Client` and `Engine Client`.

The following guidelines are based on the REST API provided by Apache PredictionIO's Event Server, whose details can be found [here](http://predictionio.apache.org/datacollection/eventapi/).

## Event Client

Because the Event Server has only 1 connection point, the `Event Client` needs to implement this core request first. The core request has the following rules.

- **URL**: `/events.json?accessKey=` (e.g. http://localhost:7070/events.json?accessKey=1234567890)
- **Request**: `POST` + JSON data. Please refer to the [Event Creation API](http://predictionio.apache.org/datacollection/eventapi/) for details on the fields of the JSON data object.
- **Response**:
  + **Success**: status code `201` with a JSON result containing the `eventId`.
  + **Failure**: a JSON result containing a `message` field describing the error.
    * Status code `401`: invalid access key.
    * Status code `400`: failure to parse the JSON request, e.g. missing required fields like `event`, or an invalid `eventTime` format.

Other convenience methods are just shortcuts. They can simply build the event's parameters and call the core request.
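The core request and one shortcut built on it can be sketched with Ruby's standard library alone. This is a minimal sketch, not an official SDK: it assumes an Event Server base URL such as `http://localhost:7070`, and the helper names `build_event_request`, `interpret_event_response`, and `build_set_user_event` are hypothetical. It only builds requests and interprets the documented status codes, so no server is needed to try it.

```ruby
require "json"
require "net/http"
require "time"
require "uri"

# Hypothetical helper: build (but do not send) the core Event Server request.
# Assumes `base_url` such as "http://localhost:7070" and a valid access key.
def build_event_request(base_url, access_key, event)
  uri = URI.join(base_url, "/events.json")
  uri.query = URI.encode_www_form(accessKey: access_key)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(event)
  [uri, request]
end

# Hypothetical helper: map the documented status codes to a result hash.
def interpret_event_response(status, body)
  payload =
    begin
      JSON.parse(body.to_s)
    rescue JSON::ParserError
      {}
    end
  payload = {} unless payload.is_a?(Hash)
  case status
  when 201 then { ok: true, event_id: payload["eventId"] }
  when 401 then { ok: false, error: "invalid access key: #{payload["message"]}" }
  when 400 then { ok: false, error: "bad request: #{payload["message"]}" }
  else { ok: false, error: "unexpected status #{status}" }
  end
end

# Hypothetical shortcut: a `$set` event for a user, built on the core request.
# Adds `eventTime` (ISO 8601 with milliseconds) when the caller did not supply one.
def build_set_user_event(user_id, properties, event_time: nil)
  {
    "event" => "$set",
    "entityType" => "user",
    "entityId" => user_id.to_s,
    "properties" => properties,
    "eventTime" => (event_time || Time.now).iso8601(3)
  }
end
```

To actually send an event, one option is `Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }`, then passing `Integer(response.code)` and `response.body` to `interpret_event_response`. Keeping request construction separate from I/O makes these helpers easy to unit test.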
`Event Client` should support the following 7 shorthand operations:

- **User entities**
  + **Sets properties of a user**: with the JSON object
    ```json
    {
      "event": "$set",
      "entityType": "user",
      "entityId": ,
      "properties": 
    }
    ```
  + **Unsets some properties of a user**: with the JSON object
    ```json
    {
      "event": "$unset",
      "entityType": "user",
      "entityId": ,
      "properties": 
    }
    ```
  + **Delete a user**: with the JSON object
    ```json
    {
      "event": "$delete",
      "entityType": "user",
      "entityId": 
    }
    ```
- **Item entities**
  + **Sets properties of an item**: with the JSON object
    ```json
    {
      "event": "$set",
      "entityType": "item",
      "entityId": ,
      "properties": 
    }
    ```
  + **Unsets some properties of an item**: with the JSON object
    ```json
    {
      "event": "$unset",
      "entityType": "item",
      "entityId": ,
      "properties": 
    }
    ```
  + **Delete an item**: with the JSON object
    ```json
    {
      "event": "$delete",
      "entityType": "item",
      "entityId": 
    }
    ```
- **Others**
  + **Record a user's action on some item**: with the JSON object
    ```json
    {
      "event": ,
      "entityType": "user",
      "entityId": ,
      "targetEntityType": "item",
      "targetEntityId": ,
      "properties": 
    }
    ```

Again, please refer to the [API documentation](http://predictionio.apache.org/datacollection/eventapi/) for explanations of the reserved events like `$set`, `$unset` or `$delete`.

INFO: The `eventTime` is optional, but it is recommended that the client application include the time in the request. Therefore, it is best for the `Event Client` to add the time field, if missing, before sending the event to the server.

## Engine Client

The `Engine Client`'s main job is to retrieve recommendation or prediction results from Apache PredictionIO's Engines. It has only a few rules on the request and response type.

- **URL**: `/queries.json` (e.g. http://localhost:8000/queries.json)
- **Request**: `POST` + JSON data. For example,
  ```json
  {
    "user": 1,
    "num": 4
  }
  ```
- **Response**:
  + **Success**: status code `200` with a JSON result object.
    For example,
    ```json
    {
      "itemScores": [
        {
          "item": 39,
          "score": "6.177719297832409"
        },
        {
          "item": 79,
          "score": "5.931687319083594"
        },
        ...
      ]
    }
    ```
  + **Failure**: status code `400`, e.g. failure to parse the query.

The formats of the JSON objects in both the request and response are defined by the Apache PredictionIO Engine and differ across applications. The above examples are taken from the Recommendation Engine template, in which the query and prediction results are defined as follows.

```scala
case class Query(
  user: String,
  num: Int
) extends Serializable

case class PredictedResult(
  itemScores: Array[ItemScore]
) extends Serializable
```

## Testing Your SDK

You can set up a local Apache PredictionIO environment to test your SDK. However, it is hard to set one up online to test your SDK automatically using services like Travis CI. In that case, you should consider using these lightweight [mock servers](https://github.com/minhtule/PredictionIO-Mock-Server). Please see the instructions in the repo on how to use them. It takes less than 5 minutes!

That's it! We are looking forward to seeing your SDK!

================================================
FILE: docs/manual/source/community/contribute-webhook.html.md
================================================
---
title: Contribute a Webhooks Connector
---

NOTE: Please check out the [latest develop branch](https://github.com/apache/predictionio).

The Event Server can collect data from third-party sites or software through their webhooks services (for example, SegmentIO, MailChimp). To support that, a *Webhooks Connector* for the third-party data needs to be integrated into the Event Server.

The job of the *Webhooks Connector* is simply to convert the third-party data into Event JSON. You can find an example below.

Currently we support two types of connectors: `JsonConnector` and `FormConnector`, which are responsible for accepting *JSON* data and *Form-submission* data, respectively.
**JsonConnector**:

```scala
package org.apache.predictionio.data.webhooks

import org.json4s.JObject

/** Connector for Webhooks connection */
private[predictionio] trait JsonConnector {

  /** Convert from original JObject to Event JObject
    * @param data original JObject received through webhooks
    * @return Event JObject
    */
  def toEventJson(data: JObject): JObject

}
```

The EventServer URL path to collect webhooks JSON data:

```
http:///webhooks/.json?accessKey=&channel=
```

Note that you may collect Webhooks data into the default channel (without the `channel` parameter in the URL), but it's highly recommended to create a dedicated [Channel](/datacollection/channel/) to collect specific Webhooks data (e.g. create one channel "segmentio" for SegmentIO and another channel "mailchimp" for MailChimp data), because it allows you to manage and query data more easily, and the Webhooks data won't be mixed with your other normal app data.

**FormConnector**:

```scala
package org.apache.predictionio.data.webhooks

import org.json4s.JObject

/** Connector for Webhooks connection with Form submission data format */
private[predictionio] trait FormConnector {

  /** Convert from original Form submission data to Event JObject
    * @param data Map of key-value pairs in String type received through webhooks
    * @return Event JObject
    */
  def toEventJson(data: Map[String, String]): JObject

}
```

The EventServer URL path to collect webhooks form-submission data (no .json):

```
http:///webhooks/?accessKey=&channel=
```

Note that you may collect Webhooks data into the default channel (without the `channel` parameter in the URL), but it's highly recommended to create a dedicated [Channel](/datacollection/channel/) to collect specific Webhooks data (e.g. create one channel "segmentio" for SegmentIO and another channel "mailchimp" for MailChimp data), because it allows you to manage and query data more easily, and the Webhooks data won't be mixed with your other normal app data.
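The two URL shapes differ only in the `.json` suffix and the optional `channel` parameter. A minimal Ruby sketch of building them, assuming an Event Server reachable at a base URL such as `http://localhost:7070`; the helper name `webhooks_url` is hypothetical and not part of PredictionIO.

```ruby
require "uri"

# Hypothetical helper: build the Event Server webhooks URL for a connector.
# JSON connectors use "/webhooks/<name>.json", form connectors "/webhooks/<name>".
# `channel` is optional; omitting it sends data to the default channel.
def webhooks_url(base_url, connector, access_key, json:, channel: nil)
  path = "/webhooks/#{connector}#{json ? '.json' : ''}"
  params = { accessKey: access_key }
  params[:channel] = channel if channel
  uri = URI.join(base_url, path)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end
```

For instance, `webhooks_url("http://localhost:7070", "segmentio", "KEY", json: true, channel: "segmentio")` yields `http://localhost:7070/webhooks/segmentio.json?accessKey=KEY&channel=segmentio`. Using `URI.encode_www_form` keeps access keys and channel names URL-encoded.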
# Example

For example, let's say there is a third-party website (named "ExampleJson") which can send the following JSON data through its webhooks service, and we would like to collect it into the Event Store.

**UserActionItem**:

```json
{
  "type": "userActionItem",
  "userId": "as34smg4",
  "event": "do_something_on",
  "itemId": "kfjd312bc",
  "context": {
    "ip": "1.23.4.56",
    "prop1": 2.345,
    "prop2": "value1"
  },
  "anotherPropertyA": 4.567,
  "anotherPropertyB": false,
  "timestamp": "2015-01-15T04:20:23.567Z"
}
```

## 1. Implement Webhooks Connector

Because the data sent by this third-party "ExampleJson" site is in JSON format, we implement an object `ExampleJsonConnector` which extends `JsonConnector`:

```scala
package org.apache.predictionio.data.webhooks.examplejson

import org.apache.predictionio.data.webhooks.{ConnectorException, JsonConnector}
import org.json4s.{DefaultFormats, Formats, JObject}

private[predictionio] object ExampleJsonConnector extends JsonConnector {

  implicit val json4sFormats: Formats = DefaultFormats

  override def toEventJson(data: JObject): JObject = {
    val common = try {
      data.extract[Common]
    } catch {
      case e: Exception => throw new ConnectorException(
        s"Cannot extract Common field from ${data}. ${e.getMessage()}", e)
    }

    val json = try {
      common.`type` match {
        case "userActionItem" =>
          toEventJson(common = common, userActionItem = data.extract[UserActionItem])
        case x: String =>
          throw new ConnectorException(
            s"Cannot convert unknown type '${x}' to Event JSON.")
      }
    } catch {
      case e: ConnectorException => throw e
      case e: Exception => throw new ConnectorException(
        s"Cannot convert ${data} to eventJson. ${e.getMessage()}", e)
    }

    json
  }

  // Convert the UserActionItem JSON to Event JSON
  def toEventJson(common: Common, userActionItem: UserActionItem): JObject = {
    import org.json4s.JsonDSL._

    // map to EventAPI JSON
    val json =
      ("event" -> userActionItem.event) ~
      ("entityType" -> "user") ~
      ("entityId" -> userActionItem.userId) ~
      ("targetEntityType" -> "item") ~
      ("targetEntityId" -> userActionItem.itemId) ~
      ("eventTime" -> userActionItem.timestamp) ~
      ("properties" -> (
        ("context" -> userActionItem.context) ~
        ("anotherPropertyA" -> userActionItem.anotherPropertyA) ~
        ("anotherPropertyB" -> userActionItem.anotherPropertyB)
      ))
    json
  }

  // Common required fields
  case class Common(
    `type`: String
  )

  // UserActionItem fields
  case class UserActionItem (
    userId: String,
    event: String,
    itemId: String,
    context: JObject,
    anotherPropertyA: Option[Double],
    anotherPropertyB: Option[Boolean],
    timestamp: String
  )
}
```

You can find the complete example in [the GitHub repo](https://github.com/apache/predictionio/blob/develop/data/src/main/scala/org/apache/predictionio/data/webhooks/examplejson/ExampleJsonConnector.scala) and how to write [tests for the connector](https://github.com/apache/predictionio/blob/develop/data/src/test/scala/org/apache/predictionio/data/webhooks/examplejson/ExampleJsonConnectorSpec.scala).

Please put the connector code in a separate directory for each site.
For example, code for the segmentio connector should be in

```
data/src/main/scala/org/apache/predictionio/data/webhooks/segmentio/
```

and tests should be in

```
data/src/test/scala/org/apache/predictionio/data/webhooks/segmentio/
```

**For form-submission data**, you can find the complete example in [the GitHub repo](https://github.com/apache/predictionio/blob/develop/data/src/main/scala/org/apache/predictionio/data/webhooks/exampleform/ExampleFormConnector.scala) and how to write [tests for the connector](https://github.com/apache/predictionio/blob/develop/data/src/test/scala/org/apache/predictionio/data/webhooks/exampleform/ExampleFormConnectorSpec.scala).

## 2. Integrate the Connector into Event Server

Once we have implemented the connector, we can add it to the EventServer so that we can collect real-time data.

Add the connector to the [`WebhooksConnectors` object](https://github.com/apache/predictionio/blob/develop/data/src/main/scala/org/apache/predictionio/data/api/WebhooksConnectors.scala):

```scala
import org.apache.predictionio.data.webhooks.examplejson.ExampleJsonConnector // ADDED

private[predictionio] object WebhooksConnectors {

  // Map of Connector Name to Connector
  val json: Map[String, JsonConnector] = Map(
    "segmentio" -> SegmentIOConnector,
    "examplejson" -> ExampleJsonConnector // ADDED
  )

  // Map of Connector Name to Connector
  val form: Map[String, FormConnector] = Map(
    "mailchimp" -> MailChimpConnector
  )
}
```

Note that the names of the connectors (e.g. "examplejson", "segmentio") are used in the webhooks URL. In this example, the Event Server URL to collect data from "ExampleJson" would be:

```
http:///webhooks/examplejson.json?accessKey=&channel=
```

For `FormConnector`, the URL doesn't have `.json`. For example,

```
http:///webhooks/mailchimp?accessKey=&channel=
```

That's it. Once you re-compile Apache PredictionIO, you can send the ExampleJson data to the URL above and the data will be stored in the App of the corresponding Access Key.
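To exercise a newly registered connector, a client only needs to POST the third-party payload to the webhooks URL. The Ruby sketch below builds (but does not send) such requests using only the standard library. It assumes an Event Server at a base URL such as `http://localhost:7070`; the helper names are hypothetical, and the form fields in the usage are made up for illustration rather than taken from MailChimp's actual format.

```ruby
require "json"
require "net/http"
require "uri"

# Sample payload from the "ExampleJson" site described in this guide.
SAMPLE_USER_ACTION_ITEM = {
  "type" => "userActionItem",
  "userId" => "as34smg4",
  "event" => "do_something_on",
  "itemId" => "kfjd312bc",
  "context" => { "ip" => "1.23.4.56", "prop1" => 2.345, "prop2" => "value1" },
  "anotherPropertyA" => 4.567,
  "anotherPropertyB" => false,
  "timestamp" => "2015-01-15T04:20:23.567Z"
}.freeze

# Hypothetical helper: JSON connectors receive the raw payload as a JSON body.
def build_json_webhook_request(base_url, access_key, payload)
  uri = URI.join(base_url, "/webhooks/examplejson.json")
  uri.query = URI.encode_www_form(accessKey: access_key)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(payload)
  request
end

# Hypothetical helper: form connectors receive URL-encoded form fields instead.
def build_form_webhook_request(base_url, connector, access_key, fields)
  uri = URI.join(base_url, "/webhooks/#{connector}")
  uri.query = URI.encode_www_form(accessKey: access_key)
  request = Net::HTTP::Post.new(uri)
  request.set_form_data(fields) # also sets application/x-www-form-urlencoded
  request
end
```

A request built this way could be sent with `Net::HTTP.start(req.uri.host, req.uri.port) { |http| http.request(req) }`. A successful conversion should produce an event with `entityId` `as34smg4` and `targetEntityId` `kfjd312bc` in the app that owns the access key.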
================================================
FILE: docs/manual/source/community/index.html.md
================================================
---
title: Community Page
---

## User Mailing List

This list is for users of Apache PredictionIO to ask questions, share knowledge, and discuss issues. Do send mail to this list with usage and configuration questions and problems. Also, please send questions to this list to verify your problem before filing issues in JIRA.

[Subscribe](mailto:user-subscribe@predictionio.apache.org) to our User Mailing List.

[Unsubscribe](mailto:user-unsubscribe@predictionio.apache.org) from our User Mailing List.

## Twitter

Follow us on Twitter [@predictionio](https://twitter.com/PredictionIO).

## Facebook Page

Like us on Facebook at https://www.facebook.com/predictionio.

## GitHub

View our code on GitHub at https://github.com/apache/predictionio.

================================================
FILE: docs/manual/source/community/projects.html.md
================================================
---
title: Community Powered Projects
---

Here you will find great projects contributed by the Apache PredictionIO community.

INFO: If you have built an Apache PredictionIO-related project, we would love to showcase it to the community! Simply edit [this page](https://github.com/apache/predictionio/blob/livedoc/docs/manual/source/community/projects.html.md) and submit a pull request.
## SDKs

### Swift SDK

- Minh-Tu Le: https://github.com/minhtule/PredictionIO-Swift-SDK

## DEMOs

### Tapster iOS Demo

- Minh-Tu Le: https://github.com/minhtule/Tapster-iOS-Demo

## Universal Recommender

- ActionML: https://github.com/actionml/universal-recommender

## Docker Images

- Ming Fang: https://github.com/mingfang/docker-predictionio
- Steven Yan: https://github.com/steveny2k/docker-predictionio
- Japan PredictionIO User Group: https://github.com/jpioug/predictionio-docker
- Inspectorio Inc: https://github.com/inspectorioinc/docker-prediction-io

## Archived Projects

Some community projects have not been updated for quite some time. These projects are listed in the [archived list](/archived/community/). If an archived project is updated, please edit [this page](https://github.com/apache/predictionio/blob/livedoc/docs/manual/source/community/projects.html.md) and submit a pull request to move the project back to the active projects list.

================================================
FILE: docs/manual/source/community/submit-template.html.md
================================================
---
title: Submitting a Template to Template Gallery
---

## Template Guidelines

- Please give your template and GitHub repo a meaningful name (for example, My-MLlibKMeansClustering-Template).

- Please tag your repo for each released version. This is required by the Template Gallery. For example, tag the release with v0.1.0:

    ```
    $ git tag -a v0.1.0 -m 'version 0.1.0'
    ```

- For clarity, the engine template directory structure should be:

    ```
    data/         # contains sample data or related files
    project/      # contains the necessary sbt files for build (e.g. assembly.sbt)
    src/          # template source code
    .gitignore
    README.md
    build.sbt
    engine.json   # one or more engine.json
    template.json
    ```

- Try to keep the root directory clean. If you have additional script files or other files, please create new folders for them and provide a description.
- Include a QuickStart explaining how to use the engine, including:
    1. An overview description of the template
    2. Events and data required by the template
    3. A description of the Query and PredictedResult
    4. Steps to import sample data
    5. A description of the sample data
    6. Steps to build, train and deploy the engine
    7. Steps to send a sample query, and the expected output

- If you have additional sample data, please also describe it in the README and explain how to import it.

- If you have multiple engine.json files, please describe them in the README.

- Following the [Scala Style Guide](http://docs.scala-lang.org/style/) is recommended.

## How to submit

- Fork the repository.
- Modify *docs/manual/source/gallery/templates.yaml* to introduce the new template. The schema of the engine description is as follows:

    ```yml
    - template:
        name: (Name of your template)
        repo: (Link to your repository)
        description: |-
          (Brief description of your template written in markdown syntax)
        tags: [ (One of [classification, regression, unsupervised, recommender, nlp, other]) ]
        type: (Parallel or Local)
        language: (Language)
        license: (License)
        status: (e.g. alpha, stable or requested (under development))
        pio_min_version: (Minimum version of PredictionIO to run your template)
    ```

- Submit your changes via a pull request.

================================================
FILE: docs/manual/source/customize/dase.html.md.erb
================================================
---
title: Implementing DASE
---

This section gives you an overview of DASE components and how to implement them. You will find links to some engine templates for more concrete examples.

# DataSource

DataSource reads and selects useful data from the Event Store (the data store of the Event Server) and returns TrainingData.
## readTraining()

You need to implement readTraining() of [PDataSource](https://predictionio.apache.org/api/current/#org.apache.predictionio.controller.PDataSource), where you can use the [PEventStore Engine API](https://predictionio.apache.org/api/current/#org.apache.predictionio.data.store.PEventStore$) to read the events and create the TrainingData from them.

The following code example reads user "view" and "buy" item events, filters a specific type of event for further processing, and returns TrainingData accordingly.

```scala
import grizzled.slf4j.Logger
import org.apache.predictionio.controller.{EmptyActualResult, EmptyEvaluationInfo, PDataSource}
import org.apache.predictionio.data.storage.Event
import org.apache.predictionio.data.store.PEventStore
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

class DataSource(val dsp: DataSourceParams)
  extends PDataSource[TrainingData,
      EmptyEvaluationInfo, Query, EmptyActualResult] {

  @transient lazy val logger = Logger[this.type]

  override
  def readTraining(sc: SparkContext): TrainingData = {

    val eventsRDD: RDD[Event] = PEventStore.find(
      appName = dsp.appName,
      entityType = Some("user"),
      eventNames = Some(List("view", "buy")),
      // targetEntityType is an optional field of an event.
      targetEntityType = Some(Some("item")))(sc)
      .cache()

    val viewEventsRDD: RDD[ViewEvent] = eventsRDD
      .filter { event => event.event == "view" }
      .map { ... }

    ...

    new TrainingData(...)
  }
}
```

## Using PEventStore Engine API

Please see [Event Server Overview](https://predictionio.apache.org/datacollection/) to understand the [EventAPI](https://predictionio.apache.org/datacollection/eventapi/) and [event modeling](https://predictionio.apache.org/datacollection/eventmodel/).

With the [PEventStore Engine API](https://predictionio.apache.org/api/current/#org.apache.predictionio.data.store.PEventStore$), you can easily read different events in DataSource and get the information you need.
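To make the filter arguments of `PEventStore.find()` more concrete, here is a minimal, in-memory Python sketch of how `entityType`, `eventNames` and `targetEntityType` narrow down a set of events. It is an illustration over plain dicts, not the PredictionIO API. The sentinel `ANY` stands in for passing `None` (no restriction) in Scala, and `target_entity_type=None` stands in for `Some(None)` (events without a target entity); this mapping is an assumption about the intended semantics.

```python
ANY = object()  # sentinel meaning "no restriction" (Scala: None)


def find_events(events, entity_type=ANY, event_names=ANY,
                target_entity_type=ANY):
    """Filter event dicts in the spirit of PEventStore.find (illustrative sketch).

    target_entity_type:
      ANY    -> no restriction           (Scala: None)
      None   -> no target entity at all  (Scala: Some(None))
      "item" -> target type must match   (Scala: Some(Some("item")))
    """
    result = []
    for e in events:
        if entity_type is not ANY and e.get("entityType") != entity_type:
            continue
        if event_names is not ANY and e.get("event") not in event_names:
            continue
        if (target_entity_type is not ANY
                and e.get("targetEntityType") != target_entity_type):
            continue
        result.append(e)
    return result


events = [
    {"event": "view", "entityType": "user", "entityId": "u1",
     "targetEntityType": "item", "targetEntityId": "i1"},
    {"event": "buy", "entityType": "user", "entityId": "u1",
     "targetEntityType": "item", "targetEntityId": "i2"},
    {"event": "$set", "entityType": "user", "entityId": "u1"},
]

# Mirrors the readTraining() call above: user "view"/"buy" events on items
training = find_events(events, entity_type="user",
                       event_names=["view", "buy"],
                       target_entity_type="item")
# Mirrors the later .filter { event => event.event == "view" }
views = [e for e in training if e["event"] == "view"]
```

In the real engine the same narrowing happens in the event store query, so only the matching events are loaded into the RDD.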
For example, let's say you have events like the following:

```json
{
  "event": "myEvent",
  "entityType": "user",
  "entityId": "u0",
  "targetEntityType": "item",
  "targetEntityId": "i0",
  "properties" : {
    "a" : 3,
    "b" : "some_string",
    "c" : ["a", "b", "c"],
    "d" : [1.2, 3.4, 5.6],
    "e" : 6
  }
}
```

Then the following code reads these events, extracts the properties field of each event, and converts it to a `MyEvent` object.

```scala
val myEvents: RDD[MyEvent] = PEventStore.find(
  appName = dsp.appName,
  entityType = Some("user"),
  eventNames = Some(List("myEvent")),
  // targetEntityType is an optional field of an event.
  targetEntityType = Some(Some("item")))(sc)
  .map { event =>
    try {
      MyEvent(
        entityId = event.entityId,
        targetEntityId = event.targetEntityId.get,
        a = event.properties.get[Int]("a"),
        b = event.properties.get[String]("b"),
        c = event.properties.get[List[String]]("c"),
        d = event.properties.get[List[Double]]("d"),
        e = event.properties.getOpt[Int]("e") // use getOpt for optional data
      )
    } catch {
      case e: Exception =>
        logger.error(s"Cannot convert ${event}. Exception: ${e}.")
        throw e
    }
  }
```

If you have used the special events `$set/$unset/$delete` to set an entity's properties, you can retrieve the aggregated properties with `PEventStore.aggregateProperties()`. Please see [event modeling](https://predictionio.apache.org/datacollection/eventmodel/) to understand the usage of the special `$set/$unset/$delete` events.
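Conceptually, the special events are folded in event-time order to produce each entity's current properties. The following minimal Python sketch models that folding over plain dicts. It is an assumption-laden illustration of what `aggregateProperties()` returns, not its implementation; for example, it ignores channels and time-range filters, and it assumes `eventTime` strings share one ISO-8601 UTC format so that they sort chronologically.

```python
def aggregate_properties(events, entity_type):
    """Fold $set/$unset/$delete events into per-entity properties (sketch)."""
    entities = {}
    # Assumes identically formatted ISO-8601 UTC strings, which sort by time.
    for e in sorted(events, key=lambda ev: ev["eventTime"]):
        if e.get("entityType") != entity_type:
            continue
        eid = e["entityId"]
        props = e.get("properties", {})
        if e["event"] == "$set":
            # later values overwrite earlier ones
            entities.setdefault(eid, {}).update(props)
        elif e["event"] == "$unset" and eid in entities:
            for key in props:  # only the keys matter for $unset
                entities[eid].pop(key, None)
        elif e["event"] == "$delete":
            entities.pop(eid, None)  # the whole entity disappears
    return entities


events = [
    {"event": "$set", "entityType": "item", "entityId": "i0",
     "properties": {"a": 3, "b": "x"}, "eventTime": "2015-01-01T00:00:00.000Z"},
    {"event": "$set", "entityType": "item", "entityId": "i0",
     "properties": {"a": 5}, "eventTime": "2015-01-02T00:00:00.000Z"},
    {"event": "$unset", "entityType": "item", "entityId": "i0",
     "properties": {"b": None}, "eventTime": "2015-01-03T00:00:00.000Z"},
    {"event": "$set", "entityType": "item", "entityId": "i1",
     "properties": {"a": 1}, "eventTime": "2015-01-01T00:00:00.000Z"},
    {"event": "$delete", "entityType": "item", "entityId": "i1",
     "eventTime": "2015-01-04T00:00:00.000Z"},
]

items = aggregate_properties(events, "item")  # {'i0': {'a': 5}}
```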
For example, the following code shows how you could retrieve properties of the "item" entities:

```scala
// create an RDD of (entityID, Item)
val itemsRDD: RDD[(String, Item)] = PEventStore.aggregateProperties(
  appName = dsp.appName,
  entityType = "item"
)(sc).map { case (entityId, properties) =>
  try {
    val item = Item(
      a = properties.get[Int]("a"),
      b = properties.get[String]("b"),
      c = properties.get[List[String]]("c"),
      d = properties.get[List[Double]]("d"),
      e = properties.getOpt[Int]("e") // use getOpt for optional data
    )
    (entityId, item)
  } catch {
    case e: Exception =>
      logger.error(s"Failed to get properties ${properties} of" +
        s" ${entityId}. Exception: ${e}.")
      throw e
  }
}
```

Example:

- [DataSource of Similar Product Template](/templates/similarproduct/dase/#data)

# Preparator

The Preparator is responsible for pre-processing `TrainingData` for any necessary feature selection and data processing tasks, and for generating `PreparedData`, which contains the data the Algorithm needs.

A few example usages of the Preparator:

- Feature extraction
- Common pre-processing logic if you have multiple algorithms
- For simple cases, the Preparator may simply pass the same `TrainingData` as `PreparedData` to the Algorithm.

## prepare()

You need to implement the `prepare()` method of [PPreparator](https://predictionio.apache.org/api/current/#org.apache.predictionio.controller.PPreparator) to perform such tasks.

Example:

- [Preparator of Lead Scoring Template](/templates/leadscoring/dase/#data): it pre-processes the TrainingData and generates the feature vectors needed by the algorithm.
- [Preparator of Similar Product Template](/templates/similarproduct/dase/#data): it simply passes the TrainingData as PreparedData to the algorithm.

# Algorithm

The two methods of the Algorithm class are train() and predict():

## train()

train() is responsible for training a predictive model. It is called when you run `pio train`. Apache PredictionIO will store this model.
## predict()

predict() is responsible for using this model to make a prediction. It is called when you send a JSON query to the engine. Note that predict() is called in real time.

Apache PredictionIO supports two types of algorithms:

- **[P2LAlgorithm](https://predictionio.apache.org/api/current/#org.apache.predictionio.controller.P2LAlgorithm)**: trains a Model which does not contain an RDD
- **[PAlgorithm](https://predictionio.apache.org/api/current/#org.apache.predictionio.controller.PAlgorithm)**: trains a Model which contains an RDD

## P2LAlgorithm

For `P2LAlgorithm`, the Model is automatically serialized and persisted by Apache PredictionIO after training.

Implementing `IPersistentModel` and `IPersistentModelLoader` is optional for P2LAlgorithm.

Example:

- [Algorithm of Similar Product Template](/templates/similarproduct/dase/#algorithm)

## PAlgorithm

`PAlgorithm` should be used when your Model contains an RDD. The model produced by `PAlgorithm` is not persisted by default. To persist the model, you need to do the following:

- The Model class should extend the `IPersistentModel` trait and implement the `save()` method for saving the model. The trait `IPersistentModel` requires a type parameter, which is the class type of the algorithm parameter.
- Implement a Model factory object which extends the `IPersistentModelLoader` trait and implements the `apply()` method for loading the model. The trait `IPersistentModelLoader` requires two type parameters, which are the types of the algorithm parameter and of the model produced by the algorithm.

Example:

- [Algorithm of Recommendation Template](/templates/recommendation/dase/#algorithm): it implements PAlgorithm and the IPersistentModel and IPersistentModelLoader traits.
- [Algorithm of Vanilla Template](/templates/vanilla/dase): it walks through examples of P2LAlgorithm and PAlgorithm.
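The save/load contract for a `PAlgorithm` model can be illustrated without Spark. Below is a minimal Python analogue. It assumes that a model is saved under a directory keyed by an instance id, that `save()` returns `True` once it has persisted the model, and that a loader's `apply()` rebuilds the model from the same id and algorithm params. The class names, the JSON file layout and the `model_dir` parameter are hypothetical. A real Scala model would typically write its RDDs with Spark instead (for example, with `saveAsObjectFile`).

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass
class AlgorithmParams:
    model_dir: str  # hypothetical: where models are written
    rank: int = 10


@dataclass
class Model:
    user_features: dict
    item_features: dict

    def save(self, instance_id, params):
        """Persist the model; return True to signal that it was saved."""
        path = os.path.join(params.model_dir, instance_id)
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "model.json"), "w") as f:
            json.dump({"user_features": self.user_features,
                       "item_features": self.item_features}, f)
        return True


class ModelLoader:
    """Counterpart of the loader object: rebuilds a saved model."""

    @staticmethod
    def apply(instance_id, params):
        path = os.path.join(params.model_dir, instance_id, "model.json")
        with open(path) as f:
            data = json.load(f)
        return Model(data["user_features"], data["item_features"])


# Usage: save after "training", then load as a deployed engine would.
params = AlgorithmParams(model_dir=tempfile.mkdtemp())
m = Model({"u1": [0.1, 0.2]}, {"i1": [0.3, 0.4]})
saved = m.save("instance-1", params)
restored = ModelLoader.apply("instance-1", params)
```

Passing the same id and params to both sides is what lets the loader find exactly what `save()` wrote.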
## Using LEventStore Engine API in predict()

You may use [LEventStore.findByEntity()](https://predictionio.apache.org/api/current/#org.apache.predictionio.data.store.LEventStore$) to retrieve events of a specific entity. For example, you can retrieve recent events of the user specified in the query and use them to make a prediction in real time.

For example, the following code reads the 10 most recent view events of `query.user`:

```scala
import org.apache.predictionio.data.storage.Event
import org.apache.predictionio.data.store.LEventStore
import scala.concurrent.duration.Duration

val recentEvents = try {
  LEventStore.findByEntity(
    appName = ap.appName,
    // entityType and entityId are specified for fast lookup
    entityType = "user",
    entityId = query.user,
    eventNames = Some(List("view")),
    targetEntityType = Some(Some("item")),
    limit = Some(10),
    latest = true,
    // set time limit to avoid super long DB access
    timeout = Duration(200, "millis")
  )
} catch {
  case e: scala.concurrent.TimeoutException =>
    logger.error(s"Timeout when reading recent events." +
      s" Empty list is used. ${e}")
    Iterator[Event]()
  case e: Exception =>
    logger.error(s"Error when reading recent events: ${e}")
    throw e
}
```

Example:

- [Algorithm of E-Commerce Recommendation template](/templates/ecommercerecommendation/dase#algorithm): LEventStore.findByEntity() is used to retrieve all items seen by the user and filter them out of the recommendations in predict().

# Serving

## serve()

You need to implement the serve() method of the class [LServing](https://predictionio.apache.org/api/current/#org.apache.predictionio.controller.LServing). The serve() method processes the predicted result. It is also responsible for combining multiple predicted results into one if you have more than one predictive model.
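As an illustration of combining predicted results, here is a minimal Python sketch of three serving strategies: return the first algorithm's result, average numeric results, or merge per-item scores. The first two follow the same ideas as the `LFirstServing` and `LAverageServing` classes in the controller package. The score-merging function is a hypothetical ensemble shown only for illustration, and all function names are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def serve_first(query, predictions: Sequence):
    """Return the first algorithm's prediction unchanged."""
    return predictions[0]


def serve_average(query, predictions: Sequence[float]) -> float:
    """Average numeric predictions from several algorithms."""
    return sum(predictions) / len(predictions)


def serve_merge_scores(query, predictions: Sequence[Dict[str, float]],
                       num: int) -> List[Tuple[str, float]]:
    """Hypothetical ensemble: sum per-item scores and return the top `num`."""
    combined: Dict[str, float] = {}
    for item_scores in predictions:
        for item, score in item_scores.items():
            combined[item] = combined.get(item, 0.0) + score
    # highest score first; ties broken by item id for a stable output
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:num]
```

Summing raw scores assumes the algorithms produce scores on comparable scales. If they do not, normalize each algorithm's scores before merging.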
Example:

- [Serving of Similar Product Template](/templates/similarproduct/dase/#serving): It simply returns the predicted result.
- [Serving of multi-algorithm examples of Similar Product Template](/templates/similarproduct/multi-events-multi-algos/): It combines the results of multiple algorithms and returns the combined result.

================================================
FILE: docs/manual/source/customize/index.html.md
================================================
---
title: Learning DASE
---

The code of an engine consists of D-A-S-E components:

### [D] Data Source and Data Preparator

Data Source reads data from an input source and transforms it into a desired format. Data Preparator preprocesses the data and forwards it to the algorithm for model training.

### [A] Algorithm

The Algorithm component, which includes the Machine Learning algorithm and the settings of its parameters, determines how a predictive model is constructed.

### [S] Serving

The Serving component takes prediction *queries* and returns prediction results. If the engine has multiple algorithms, Serving combines the results into one. Additionally, business-specific logic can be added in Serving to further customize the final returned results.

### [E] Evaluation Metrics

An Evaluation Metric quantifies prediction accuracy with a numerical score. It can be used to compare algorithms or algorithm parameter settings.

> Apache PredictionIO helps you modularize these components so you can build, for example, several Serving components for an Engine. You will be able to choose which one to deploy when you create an Engine.
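To see how the D, A, S and E pieces fit together, here is a toy Python sketch of the two flows: training (Data Source, then Data Preparator, then each Algorithm) and querying (each trained model predicts, then Serving combines the results). Every class, function and data value here is a hypothetical stand-in that mirrors readTraining(), prepare(), train(), predict() and serve(). The mean-absolute-error function is just one example of an evaluation metric.

```python
from statistics import mean


class DataSource:
    def read_training(self):
        # (user, item, rating) triples standing in for collected events
        return [("u1", "i1", 5.0), ("u1", "i2", 1.0), ("u2", "i1", 3.0)]


class Preparator:
    def prepare(self, training_data):
        return training_data  # pass-through, the simplest possible preparator


class MeanItemRating:
    """Toy algorithm: predicts an item's mean rating."""

    def train(self, prepared):
        by_item = {}
        for _, item, rating in prepared:
            by_item.setdefault(item, []).append(rating)
        return {item: mean(rs) for item, rs in by_item.items()}

    def predict(self, model, query):
        return model.get(query["item"], 0.0)


class GlobalMean:
    """Toy algorithm: predicts the overall mean rating."""

    def train(self, prepared):
        return mean(r for _, _, r in prepared)

    def predict(self, model, query):
        return model


class AverageServing:
    def serve(self, query, predictions):
        return sum(predictions) / len(predictions)


def train_engine(ds, prep, algos):
    """Training flow: read, prepare, then train every algorithm."""
    prepared = prep.prepare(ds.read_training())
    return [algo.train(prepared) for algo in algos]


def query_engine(algos, models, serving, query):
    """Query flow: every model predicts, Serving combines the results."""
    predictions = [a.predict(m, query) for a, m in zip(algos, models)]
    return serving.serve(query, predictions)


def mean_absolute_error(pairs):
    """Example evaluation metric over (predicted, actual) pairs; lower is better."""
    return mean(abs(pred - actual) for pred, actual in pairs)


ds, prep, serving = DataSource(), Preparator(), AverageServing()
algos = [MeanItemRating(), GlobalMean()]
models = train_engine(ds, prep, algos)                       # like `pio train`
print(query_engine(algos, models, serving, {"item": "i1"}))  # prints 3.5
```

Because each algorithm only sees the prepared data and Serving only sees the predictions, you can swap in another Serving component without touching the rest of the engine.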
![Engine Overview](/images/engineinstance-overview.png)

## The Roles of an Engine

The main functions of an engine are:

* Train a model using the training data and be deployed as a web service
* Respond to prediction queries in real time

An engine puts all DASE components into a deployable state by specifying:

* One Data Source
* One Data Preparator
* One or more Algorithm(s)
* One Serving

INFO: If more than one algorithm is specified, the prediction results of each model are passed to Serving for ensembling.

Each Engine processes data and constructs predictive models independently. Therefore, every engine serves its own set of prediction results. For example, you may deploy two engines for your mobile application: one for recommending news to users and another for suggesting new friends to users.

### Training a Model - The DASE View

The following graph shows the workflow of DASE components when `pio train` is run.

![Engine Overview](/images/engine-training.png)

### Respond to Prediction Query - The DASE View

The following graph shows the workflow of DASE components when a REST query is received by a deployed engine.

![Engine Overview](/images/engine-query.png)

Please see [Implement DASE](/customize/dase) for DASE implementation details.

Please refer to the following templates and their how-to guides for concrete examples.

## Examples of DASE

- [DASE of Recommendation Template](/templates/recommendation/dase/)
- [DASE of Similar Product Template](/templates/similarproduct/dase/)
- [DASE of Classification Template](/templates/classification/dase/)
- [DASE of Lead Scoring Template](/templates/leadscoring/dase/)

================================================
FILE: docs/manual/source/customize/troubleshooting.html.md
================================================
---
title: Engine Development - Troubleshoot
---

Apache PredictionIO provides the following features to help you debug engines during the development cycle.
## Stop Training between Stages

By default, `pio train` runs through the whole training process, including [DataSource, Preparator and Algorithm](/templates/recommendation/dase/). To speed up the development and debug cycle, you can stop the process after each stage to verify that it has completed correctly.

If you have modified DataSource and want to confirm that the TrainingData is generated as expected, you can run `pio train` with the `--stop-after-read` option:

```
pio train --stop-after-read
```

This stops the training process after the TrainingData is generated.

For example, if you are running the [Recommendation Template](/templates/recommendation/quickstart/), you should see the training process stop after the TrainingData is printed.

```
[INFO] [CoreWorkflow$] TrainingData:
[INFO] [CoreWorkflow$] ratings: [1501] (List(Rating(3,0,4.0), Rating(3,1,4.0))...)
...
[INFO] [CoreWorkflow$] Training interrupted by org.apache.predictionio.workflow.StopAfterReadInterruption.
```

Similarly, you can stop the training after the Preparator phase with the `--stop-after-prepare` option, which stops after the PreparedData is generated:

```
pio train --stop-after-prepare
```

## Sanity Check

You can extend the trait `SanityCheck` and implement the method `sanityCheck()` with your error checking code. `sanityCheck()` is called when the data is generated. This can be applied to the `TrainingData`, `PreparedData` and `Model` classes, which are the outputs of DataSource's `readTraining()`, Preparator's `prepare()` and Algorithm's `train()` methods, respectively.

For example, one frequent error with the Recommendation Template is that the TrainingData is empty because the DataSource is not reading data correctly. You can add a check for empty data inside the `sanityCheck()` function, and you can easily add other checking logic to `sanityCheck()` based on your own needs. Also, if you implement the `toString()` method in your TrainingData, you can call `toString()` inside `sanityCheck()` to print out some data for visual checking.

For example, to print TrainingData to the console and check whether `ratings` is empty, you can do the following:

```scala
import org.apache.predictionio.controller.SanityCheck // ADDED
import org.apache.spark.rdd.RDD

class TrainingData(
  val ratings: RDD[Rating]
) extends Serializable with SanityCheck { // EXTEND SanityCheck
  override def toString = {
    s"ratings: [${ratings.count()}] (${ratings.take(2).toList}...)"
  }

  // IMPLEMENT sanityCheck()
  override def sanityCheck(): Unit = {
    println(toString())
    // add your other checking here
    require(!ratings.take(1).isEmpty, s"ratings cannot be empty!")
  }
}
```

You may also use it together with the `--stop-after-read` flag to debug the DataSource:

```
pio build
pio train --stop-after-read
```

If your data is empty, you should see the following error thrown by the `sanityCheck()` function:

```
[INFO] [CoreWorkflow$] Performing data sanity check on training data.
[INFO] [CoreWorkflow$] org.template.recommendation.TrainingData supports data sanity check. Performing check.
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: ratings cannot be empty!
    at scala.Predef$.require(Predef.scala:233)
    at org.template.recommendation.TrainingData.sanityCheck(DataSource.scala:73)
    at org.apache.predictionio.workflow.CoreWorkflow$$anonfun$runTypelessContext$7.apply(Workflow.scala:474)
    at org.apache.predictionio.workflow.CoreWorkflow$$anonfun$runTypelessContext$7.apply(Workflow.scala:465)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
...
```

You can specify the `--skip-sanity-check` option to turn off sanityCheck:

```
pio train --stop-after-read --skip-sanity-check
```

You should see that the check is skipped, as in the following output:

```
[INFO] [CoreWorkflow$] Data sanity checking is off.
[INFO] [CoreWorkflow$] Data Source
...
[INFO] [CoreWorkflow$] Training interrupted by org.apache.predictionio.workflow.StopAfterReadInterruption.
```

## Engine Status Page

After running `pio deploy`, you can access the engine status page by opening the URL and port of the deployed engine in your browser, which is "http://localhost:8000" by default.

On the engine status page, you can find the Engine information and the parameters of each DASE component. In particular, you can also see the "Model" trained by the algorithm, as rendered by the `toString()` method of the Algorithm's Model class.

## pio-shell

Apache PredictionIO also provides `pio-shell`, in which you can easily access the Apache PredictionIO API, the Spark context and the Spark API to quickly test code or debug.

To bring up the shell, simply run:

```
$ pio-shell --with-spark
```

(`pio-shell` is available in the `bin/` directory of the installed Apache PredictionIO directory. You should be able to access it if you have added PredictionIO/bin to your `PATH` environment variable.)

Note that the Spark context is available as the variable `sc` inside the shell.

For example, to get the events of `MyApp1` using the PEventStore API inside pio-shell and collect them into an array `c`, run the following in the shell:

```
> import org.apache.predictionio.data.store.PEventStore
> val eventsRDD = PEventStore.find(appName="MyApp1")(sc)
> val c = eventsRDD.collect()
```

Then you should see the following returned in the shell:

```
...
15/05/18 14:24:42 INFO DAGScheduler: Job 0 finished: collect at :24, took 1.850779 s
c: Array[org.apache.predictionio.data.storage.Event] = Array(Event(id=Some(AaQUUBsFZxteRpDV_7fDGQAAAU1ZfRW1tX9LSWdZSb0),event=$set,eType=item,eId=i42,tType=None,tId=None,p=DataMap(Map(categories -> JArray(List(JString(c2), JString(c1), JString(c6), JString(c3))))),t=2015-05-15T21:31:19.349Z,tags=List(),pKey=None,ct=2015-05-15T21:31:19.354Z), Event(id=Some(DjvP3Dnci9F4CWmiqoLabQAAAU1ZfROaqdRYO-pZ_no),event=$set,eType=user,eId=u9,tType=None,tId=None,p=DataMap(Map()),t=2015-05-15T21:31:18.810Z,tags=List(),pKey=None,ct=2015-05-15T21:31:18.817Z), Event(id=Some(DjvP3Dnci9F4CWmiqoLabQAAAU1ZfRq7tsanlemwmZQ),event=view,eType=user,eId=u9,tType=Some(item),tId=Some(i25),p=DataMap(Map()),t=2015-05-15T21:31:20.635Z,tags=List(),pKey=None,ct=2015-05-15T21:31:20.639Z), Event(id=Some(DjvP3Dnci9F4CWmiqoLabQAAAU1ZfR...
```

================================================
FILE: docs/manual/source/datacollection/analytics-ipynb.html.md.erb
================================================
---
title: Machine Learning Analytics with IPython Notebook
---

[IPython Notebook](http://ipython.org/notebook.html) is a very powerful interactive computational environment, and with [Apache PredictionIO](http://predictionio.apache.org), [PySpark](http://spark.apache.org/docs/latest/api/python/) and [Spark SQL](https://spark.apache.org/sql/), you can easily analyze your collected events when you are developing or tuning your engine.

## Prerequisites

Before you begin, please make sure you have the latest stable IPython installed, and that the command `ipython` can be accessed from your shell's search path.

<%= partial 'shared/datacollection/parquet' %>

## Preparing IPython Notebook

Launch IPython Notebook with PySpark using the following command, with `$SPARK_HOME` replaced by the location of Apache Spark.
```
$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook --pylab inline" $SPARK_HOME/bin/pyspark
```

If an error like this appears in the console:

```
[E 10:07:53.900 NotebookApp] Support for specifying --pylab on the command line has been removed.
[E 10:07:53.901 NotebookApp] Please use `%pylab inline` or `%matplotlib inline` in the notebook itself.
```

then launch the notebook without the `--pylab` option:

```
$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" $SPARK_HOME/bin/pyspark
```

and run `%matplotlib inline` in a notebook cell before plotting.

By default, you should be able to access your IPython Notebook via a web browser at http://localhost:8888.

Let's initialize our notebook with the following code in the first cell.

```python
import pandas as pd

def rows_to_df(rows):
    # Convert Spark SQL Row objects into a pandas DataFrame
    return pd.DataFrame([e.asDict() for e in rows])

from pyspark.sql import SQLContext
sqlc = SQLContext(sc)
rdd = sqlc.parquetFile("/tmp/movies")
rdd.registerTempTable("events")
```

![Initialization for IPython Notebook](/images/datacollection/ipynb-01.png)

`rows_to_df(rows)` will come in handy when we want to dump the results from Spark SQL using IPython Notebook's native table rendering.

## Performing Analysis with Spark SQL

If all steps above ran successfully, you should have a ready-to-use analytics environment by now. Let's try a few examples to see if everything is functional.

In the second cell, put in this piece of code and run it.

```python
summary = sqlc.sql("SELECT "
                   "entityType, event, targetEntityType, COUNT(*) AS c "
                   "FROM events "
                   "GROUP BY entityType, event, targetEntityType").collect()
rows_to_df(summary)
```

You should see the following screen.

![Summary of Events](/images/datacollection/ipynb-02.png)

We can also plot our data in the next two cells.
```python
import matplotlib.pyplot as plt
count = [e.c for e in summary]
event = ["%s (%d)" % (e.event, e.c) for e in summary]
colors = ['gold', 'lightskyblue']
plt.pie(count, labels=event, colors=colors, startangle=90, autopct="%1.1f%%")
plt.axis('equal')
plt.show()
```

![Summary in Pie Chart](/images/datacollection/ipynb-03.png)

```python
ratings = sqlc.sql("SELECT properties.rating AS r, COUNT(*) AS c "
                   "FROM events "
                   "WHERE properties.rating IS NOT NULL "
                   "GROUP BY properties.rating "
                   "ORDER BY r").collect()
count = [e.c for e in ratings]
rating = ["%s (%d)" % (e.r, e.c) for e in ratings]
colors = ['yellowgreen', 'plum', 'gold', 'lightskyblue', 'lightcoral']
plt.pie(count, labels=rating, colors=colors, startangle=90, autopct="%1.1f%%")
plt.axis('equal')
plt.show()
```

![Breakdown of Ratings](/images/datacollection/ipynb-04.png)

Happy analyzing!

================================================
FILE: docs/manual/source/datacollection/analytics-tableau.html.md.erb
================================================
---
title: Machine Learning Analytics with Tableau
---

With Spark SQL, it is possible to connect Tableau to the Apache PredictionIO Event Server for interactive analysis of event data.

## Prerequisites

- Tableau Desktop 8.3+ with a proper license key that supports Spark SQL;
- Spark ODBC Driver from Databricks (https://databricks.com/spark/odbc-driver-download);
- Apache Hadoop 2.4+
- Apache Hive 0.13.1+

INFO: In this article, we will assume that you have a working HDFS, and that your environment variable `HADOOP_HOME` has been properly set. This is essential for Apache Hive to function properly. In addition, `HADOOP_CONF_DIR` in `$PIO_HOME/conf/pio-env.sh` must also be properly set for the `pio export` command to write to HDFS instead of the local filesystem.

<%= partial 'shared/datacollection/parquet' %>

## Creating Hive Tables

Before you can use Spark SQL's Thrift JDBC/ODBC Server, you will need to create the table schema in Hive first.
Please make sure to replace `path_of_hive` with the real path.

```
$ cd path_of_hive
$ bin/hive
hive> CREATE EXTERNAL TABLE events (event STRING, entityType STRING, entityId STRING, targetEntityType STRING, targetEntityId STRING, properties STRUCT) STORED AS parquet LOCATION '/tmp/movies';
hive> exit;
```

## Launch Spark SQL's Thrift JDBC/ODBC Server

Once you have created your Hive tables, create a Hive configuration in your Spark installation.

If you have a custom `hive-site.xml`, simply copy or link it to `$SPARK_HOME/conf`. Otherwise, Hive will have created a local Derby database, and you will need to let Spark know about it. Create `$SPARK_HOME/conf/hive-site.xml` from scratch with the following template.

WARNING: You must change `/opt/apache-hive-0.13.1-bin` below to a real Hive path.

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/opt/apache-hive-0.13.1-bin/metastore_db;create=true</value>
  </property>
</configuration>
```

Launch Spark SQL's Thrift JDBC/ODBC Server by running:

```
$ $SPARK_HOME/sbin/start-thriftserver.sh
```

You can test the server using the included Beeline client.
```
$ $SPARK_HOME/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000
(Use empty username and password when prompted)
0: jdbc:hive2://localhost:10000> select * from events limit 10;
+--------+-------------+-----------+-------------------+-----------------+------------------+
| event  | entitytype  | entityid  | targetentitytype  | targetentityid  | properties       |
+--------+-------------+-----------+-------------------+-----------------+------------------+
| buy    | user        | 3         | item              | 0               | {"rating":null}  |
| buy    | user        | 3         | item              | 1               | {"rating":null}  |
| rate   | user        | 3         | item              | 2               | {"rating":1.0}   |
| buy    | user        | 3         | item              | 7               | {"rating":null}  |
| buy    | user        | 3         | item              | 8               | {"rating":null}  |
| buy    | user        | 3         | item              | 9               | {"rating":null}  |
| rate   | user        | 3         | item              | 14              | {"rating":1.0}   |
| buy    | user        | 3         | item              | 15              | {"rating":null}  |
| buy    | user        | 3         | item              | 16              | {"rating":null}  |
| buy    | user        | 3         | item              | 18              | {"rating":null}  |
+--------+-------------+-----------+-------------------+-----------------+------------------+
10 rows selected (0.515 seconds)
0: jdbc:hive2://localhost:10000>
```

Now you are ready to use Tableau!

## Performing Analysis with Tableau

Launch Tableau and Connect to Data. Click on **Spark SQL (Beta)** and enter Spark SQL's Thrift JDBC/ODBC Server information. Make sure to pick **User Name** as **Authentication**. Click **Connect**.

![Tableau and Spark SQL](/images/datacollection/tableau-01.png)

On the next page, pick **default** under **Schema**.

INFO: You may not see any choices when you click on Schema. Simply press Enter and Tableau will try to list all schemas.

Once you see a list of tables that includes **events**, click **New Custom SQL**, then enter the following.

```sql
SELECT event, entityType, entityId,
       targetEntityType, targetEntityId,
       properties.rating
FROM events
```

Click **Update Now**. You should see the following screen by now, indicating success in loading data.
Using custom SQL allows you to extract arbitrary fields from within properties.

![Setting up Tableau](/images/datacollection/tableau-02.png)

Click **Go to Worksheet** and start analyzing. The following shows an example of breaking down different rating values.

![Rating Values Breakdown](/images/datacollection/tableau-03.png)

The following shows a summary of interactions.

![Interactions](/images/datacollection/tableau-04.png)

Happy analyzing!

================================================
FILE: docs/manual/source/datacollection/analytics-zeppelin.html.md.erb
================================================
---
title: Machine Learning Analytics with Zeppelin
---

[Apache Zeppelin](http://zeppelin-project.org/) is an interactive computational environment built on Apache Spark, similar to IPython Notebook. With [Apache PredictionIO](http://predictionio.apache.org) and [Spark SQL](https://spark.apache.org/sql/), you can easily analyze your collected events when you are developing or tuning your engine.

## Prerequisites

The following instructions assume that you have the command `sbt` accessible in your shell's search path. Alternatively, you can use the `sbt` command that comes with Apache PredictionIO at `$PIO_HOME/sbt/sbt`.

<%= partial 'shared/datacollection/parquet' %>

## Building Zeppelin for Apache Spark 1.2+

Start by cloning Zeppelin.

```
$ git clone https://github.com/apache/zeppelin.git
```

Build Zeppelin with Hadoop 2.4 and Spark 1.2 profiles.

```
$ cd zeppelin
$ mvn clean package -Pspark-1.2 -Dhadoop.version=2.4.0 -Phadoop-2.4 -DskipTests
```

Now you should have working Zeppelin binaries.

## Preparing Zeppelin

First, start Zeppelin.

```
$ bin/zeppelin-daemon.sh start
```

By default, you should be able to access Zeppelin via a web browser at http://localhost:8080. Create a new notebook and put the following in the first cell.
```scala
sqlc.parquetFile("/tmp/movies").registerTempTable("events")
```

![Preparing Zeppelin](/images/datacollection/zeppelin-01.png)

## Performing Analysis with Zeppelin

If all steps above ran successfully, you should have a ready-to-use analytics environment by now. Let's try a few examples to see if everything is functional.

In the second cell, put in this piece of code and run it.

```
%sql
SELECT entityType, event, targetEntityType, COUNT(*) AS c FROM events
GROUP BY entityType, event, targetEntityType
```

![Summary of Events](/images/datacollection/zeppelin-02.png)

We can also easily plot a pie chart.

```
%sql
SELECT event, COUNT(*) AS c FROM events GROUP BY event
```

![Summary of Event in Pie Chart](/images/datacollection/zeppelin-03.png)

And see a breakdown of rating values.

```
%sql
SELECT properties.rating AS r, COUNT(*) AS c FROM events
WHERE properties.rating IS NOT NULL GROUP BY properties.rating ORDER BY r
```

![Breakdown of Rating Values](/images/datacollection/zeppelin-04.png)

Happy analyzing!

================================================
FILE: docs/manual/source/datacollection/analytics.html.md
================================================
---
title: Using Analytics Tools
---

Event Server collects and unifies data for your application from multiple channels. Data can be exported to the Apache Parquet format with `pio export` for fast analysis. The following analytics tools are currently supported:

1. [IPython Notebook](/datacollection/analytics-ipynb/)
2. [Tableau](/datacollection/analytics-tableau/)
3. [Zeppelin](/datacollection/analytics-zeppelin/)

================================================
FILE: docs/manual/source/datacollection/batchimport.html.md
================================================
---
title: Importing Data in Batch
---

If you have a large amount of data to start with, performing a batch import will be much faster than sending every event over an HTTP connection.
## Preparing Input File

The import tool expects its input to be a file stored either in the local filesystem or on HDFS. Each line of the file should be a JSON object string representing an event. For more information about the format of the event JSON object, please refer to [this page](/datacollection/eventapi/#using-event-api).

Shown below is an example that contains 5 events ready to be imported to the Event Server.

```json
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"0","eventTime":"2014-11-21T01:04:14.716Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"1","eventTime":"2014-11-21T01:04:14.722Z"}
{"event":"rate","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"2","properties":{"rating":1.0},"eventTime":"2014-11-21T01:04:14.729Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"7","eventTime":"2014-11-21T01:04:14.735Z"}
{"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"8","eventTime":"2014-11-21T01:04:14.741Z"}
```

WARNING: Please make sure your import file does not contain any empty lines. Empty lines will be treated as a null object and will return an error during import.

## Use SDK to Prepare Batch Input File

Some of the Apache PredictionIO SDKs also provide a FileExporter client. You may use them to prepare the JSON file described above. The FileExporter creates events in the same way as the EventClient, except that the events are written to a JSON file instead of being sent to the Event Server. The written JSON file can then be used for batch import.
(coming soon)
```python
import predictionio
from datetime import datetime
import pytz

# Create a FileExporter and specify "my_events.json" as destination file
exporter = predictionio.FileExporter(file_name="my_events.json")

event_properties = {
    "someProperty" : "value1",
    "anotherProperty" : "value2",
    }
# write the events to a file
event_response = exporter.create_event(
    event="my_event",
    entity_type="user",
    entity_id="uid",
    target_entity_type="item",
    target_entity_id="iid",
    properties=event_properties,
    event_time=datetime(2014, 12, 13, 21, 38, 45, 618000, pytz.utc))

# ...

# close the FileExporter when finished writing all events
exporter.close()
```
(coming soon)
```java (coming soon) ```
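If you produce the import file with your own tooling rather than a FileExporter, a quick local check can catch the empty-line problem mentioned in the warning above before you run `pio import`. The following is a minimal sketch using only the Python standard library. The helpers `write_events` and `validate_events_file` are hypothetical and not part of any PredictionIO SDK. It assumes that `event`, `entityType` and `entityId` are the only mandatory fields, as listed in the Event Creation API reference. It does not replace the import tool's own validation.

```python
import json

# Assumed mandatory fields, taken from the Event Creation API reference;
# every other field (targetEntity*, properties, eventTime) is optional.
REQUIRED_FIELDS = ("event", "entityType", "entityId")


def write_events(path, events):
    """Write one JSON object per line, never emitting blank lines."""
    with open(path, "w", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")


def validate_events_file(path):
    """Return a list of (line_number, problem) tuples; empty means OK."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            text = line.strip()
            if not text:
                # Empty lines make the import fail, so flag them.
                problems.append((lineno, "empty line"))
                continue
            try:
                obj = json.loads(text)
            except json.JSONDecodeError as e:
                problems.append((lineno, f"invalid JSON: {e.msg}"))
                continue
            if not isinstance(obj, dict):
                problems.append((lineno, "not a JSON object"))
                continue
            missing = [k for k in REQUIRED_FIELDS if k not in obj]
            if missing:
                problems.append((lineno, "missing " + ", ".join(missing)))
    return problems
```

For example, `write_events("my_events.json", events)` followed by `validate_events_file("my_events.json")` should return an empty list when every event dictionary has the mandatory fields.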
## Import Events from Input File

Importing events from a file can be done easily using the command line interface. Assuming that `pio` is in your search path, your App ID is `123`, and the input file `my_events.json` is in your current working directory:

```bash
$ pio import --appid 123 --input my_events.json
```

After a brief while, the tool should return to the console without any error. Congratulations! You have successfully imported your events.

================================================
FILE: docs/manual/source/datacollection/channel.html.md.erb
================================================
---
title: Channel
---

Each App has a default channel (without a name) which stores all incoming events. This "default" channel is used when no channel is specified.

You may create additional Channels for the App. Creating multiple Channels is an advanced usage; you don't need to create any in order to use Apache PredictionIO. A Channel is associated with one App only and must have a unique name within that App.

Creating multiple Channels makes it easier to identify, manage and use specific event data if you collect events from multiple sources (e.g. mobile, website, or a third-party webhooks service) for your application.

(More usage details coming soon...)

## Create a new Channel

For example, to create a new channel "myChannel" for the app "myApp", run the following `pio` command:

```
pio app channel-new myApp myChannel
```

You should see output similar to the following:

```
[INFO] [App$] Updated Channel meta-data.
[INFO] [HBLEvents] The table predictionio_eventdata:events_5_2 doesn't exist yet. Creating now...
[INFO] [App$] Initialized Event Store for the channel: myChannel.
[INFO] [App$] Created new channel:
[INFO] [App$] Channel Name: myChannel
[INFO] [App$] Channel ID: 2
[INFO] [App$] App ID: 5
```

"myChannel" is now created and ready for collecting data.

## Collect data through Channel

The Event API supports an optional `channel` query parameter.
This allows you to import and query events of the specified channel. When the `channel` parameter is not specified, the data is collected through the default channel.

URL: `http://localhost:7070/events.json?accessKey=yourAccessKeyString&channel=yourChannelName`

Query parameters:

Field | Type | Description
:---- | :----| :-----
`accessKey` | String | The Access Key for your App
`channel` | String | The channel name (optional). Specify this to import data to this channel. **NOTE: supported in PIO version >= 0.9.2** only. The Channel must be created first.

For SDK usage, one EventClient should be responsible for collecting data of one specific channel. The channel name is specified when the EventClient object is instantiated.

For example, the following code imports an event to "YOUR_CHANNEL" of the corresponding App.
```bash
$ curl -i -X POST "http://localhost:7070/events.json?accessKey=YOUR_ACCESS_KEY&channel=YOUR_CHANNEL" \
-H "Content-Type: application/json" \
-d '{
  "event" : "my_event",
  "entityType" : "user",
  "entityId" : "uid",
  "targetEntityType" : "item",
  "targetEntityId" : "iid",
  "properties" : {
    "someProperty" : "value1",
    "anotherProperty" : "value2"
  },
  "eventTime" : "2004-12-13T21:39:45.618Z"
}'
```
(TODO: update me)
```python
from predictionio import EventClient
from datetime import datetime
import pytz

# Create an EventClient for "YOUR_CHANNEL"
client = EventClient('YOUR_ACCESS_KEY', "http://localhost:7070",
  channel='YOUR_CHANNEL') # default channel if not specified

event_properties = {
    "someProperty" : "value1",
    "anotherProperty" : "value2",
    }
event_response = client.create_event(
    event="my_event",
    entity_type="user",
    entity_id="uid",
    target_entity_type="item",
    target_entity_id="iid",
    properties=event_properties,
    event_time=datetime(2014, 12, 13, 21, 38, 45, 618000, pytz.utc))
```
(TODO: update me)
```java (coming soon) ```
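When calling the Event API without an SDK, `channel` is just another query-string field next to `accessKey`. The following minimal Python sketch uses only the standard library. The `events_url` helper and its defaults are hypothetical and not part of any PredictionIO SDK. It builds the URL for either the default channel or a named one, and lets `urlencode` escape any special characters. If you paste such a URL into a shell, quote it because of the `&`.

```python
from urllib.parse import urlencode


def events_url(access_key, channel=None, base_url="http://localhost:7070"):
    """Build the Event API URL; omit `channel` to use the default channel."""
    params = {"accessKey": access_key}
    if channel is not None:
        params["channel"] = channel
    # urlencode percent-escapes the values, so characters such as "&"
    # in a key or channel name cannot break the query string.
    return f"{base_url}/events.json?{urlencode(params)}"
```

For example, `events_url("YOUR_ACCESS_KEY", "YOUR_CHANNEL")` produces the same URL used in the curl example above.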
You can also follow the Event API [debugging recipes](/datacollection/eventapi/#debugging-recipes) to query the events of a specific channel by adding the `channel` query parameter to the URL.

## Delete a Channel (including all imported data)

```
pio app channel-delete
```

## Delete the data-only of a Channel

```
pio app data-delete --channel
```

## Accessing Channel Data in Engine

To access channel data, simply specify the channel name when using the PEventStore or LEventStore API. Data is read from the default channel if channelName is not specified.

For example, read data from the default channel:

```scala
    val eventsRDD: RDD[Event] = PEventStore.find(
      appName = dsp.appName,
      entityType = Some("user"),
      eventNames = Some(List("rate", "buy")), // read "rate" and "buy" event
      // targetEntityType is optional field of an event.
      targetEntityType = Some(Some("item")))(sc)
```

For example, read data from the channel "CHANNEL_NAME":

```scala
    val eventsRDD: RDD[Event] = PEventStore.find(
      appName = dsp.appName,
      channelName = Some("CHANNEL_NAME"), // ADDED
      entityType = Some("user"),
      eventNames = Some(List("rate", "buy")), // read "rate" and "buy" event
      // targetEntityType is optional field of an event.
      targetEntityType = Some(Some("item")))(sc)
```

================================================
FILE: docs/manual/source/datacollection/eventapi.html.md
================================================
---
title: Collecting Data through REST/SDKs
---

The **Event Server** is designed to collect data into Apache PredictionIO in an event-based style. Once the Event Server is launched, your application can send data to it through its **Event API** with HTTP requests or with the `EventClient`s of PredictionIO's SDKs.

INFO: All Apache PredictionIO-compliant engines support accessing the Event Store (i.e. the data store of the Event Server) through [Apache PredictionIO's Storage API](http://predictionio.apache.org/api/current/index.html#org.apache.predictionio.data.storage.package).
## Launching the Event Server

INFO: Before launching the Event Server, make sure that your event data store backend is properly configured and running. By default, Apache PredictionIO uses Apache HBase, and a quick configuration can be found [here](/install/install-sourcecode/#hbase). Please allow a minute (usually less than 30 seconds) after HBase is started for its initialization to complete before starting the Event Server.

Everything about Apache PredictionIO can be done through the `pio` command. Please add the PIO binary command path to your `PATH` first. Assuming PredictionIO is installed at `/home/yourname/PredictionIO/`, you can run

```
$ PATH=$PATH:/home/yourname/PredictionIO/bin; export PATH
```

To start the Event Server, run

```
$ pio eventserver
```

INFO: By default, the Event Server is bound to 0.0.0.0, which serves global traffic. To tighten security, you may use `pio eventserver --ip 127.0.0.1` to serve only local traffic.

### Check Server Status

```
$ curl -i -X GET http://localhost:7070
```

Sample response:

```
HTTP/1.1 200 OK
Server: akka-http/10.1.5
Date: Wed, 10 Sep 2014 22:37:30 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 18

{"status":"alive"}
```

### Generating App ID and Access Key

First, you need to create a new app in the Event Server. You will later send data into it.

```
$ pio app new MyTestApp
```

> You can replace `MyTestApp` with the name of your App.

Take note of the *Access Key* and *App ID* generated. You need the *Access Key* to use the Event API. You should see something like the following output:

```
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyTestApp
[INFO] [App$] ID: 6
[INFO] [App$] Access Key: WPgcXKd42FPQpZHVbVeMyqF4CQJUnXQmIMTHhX3ZUrSzvy1KXJjdFUrslifa9rnB
```

### Creating Your First Event

You may connect to the Event Server with HTTP requests or by using one of the many **Apache PredictionIO SDKs**.

For example, the following shows how to create an event involving a single entity.
Replace the value of `accessKey` by the *Access Key* generated for your App.
```bash
$ curl -i -X POST http://localhost:7070/events.json?accessKey=WPgcXKd42FPQpZHVbVeMyqF4CQJUnXQmIMTHhX3ZUrSzvy1KXJjdFUrslifa9rnB \
-H "Content-Type: application/json" \
-d '{
  "event" : "my_event",
  "entityType" : "user",
  "entityId" : "uid",
  "properties" : {
    "prop1" : 1,
    "prop2" : "value2",
    "prop3" : [1, 2, 3],
    "prop4" : true,
    "prop5" : ["a", "b", "c"],
    "prop6" : 4.56
  },
  "eventTime" : "2004-12-13T21:39:45.618-07:00"
}'
```
```php createEvent(array( 'event' => 'my_event', 'entityType' => 'user', 'entityId' => 'uid', 'properties' => array('prop1' => 1, 'prop2' => 'value2', 'prop3' => array(1,2,3), 'prop4' => true, 'prop5' => array('a','b','c'), 'prop6' => 4.56 ), 'eventTime' => '2004-12-13T21:39:45.618-07:00' )); ?> ```
```python
from predictionio import EventClient
from datetime import datetime
import pytz
client = EventClient('YOUR_ACCESS_KEY', "http://localhost:7070")

first_event_properties = {
    "prop1" : 1,
    "prop2" : "value2",
    "prop3" : [1, 2, 3],
    "prop4" : True,
    "prop5" : ["a", "b", "c"],
    "prop6" : 4.56 ,
    }
# Attach the pytz time zone with localize(); passing it as the tzinfo
# argument of datetime() yields an incorrect (LMT-based) UTC offset.
first_event_time = pytz.timezone('US/Mountain').localize(
    datetime(2004, 12, 13, 21, 39, 45, 618000))
first_event_response = client.create_event(
    event="my_event",
    entity_type="user",
    entity_id="uid",
    properties=first_event_properties,
    event_time=first_event_time,
)
```
```ruby
require 'predictionio'

event_client = PredictionIO::EventClient.new('YOUR_ACCESS_KEY')
event_client.create_event('my_event', 'user', 'uid',
                          'eventTime' => '2004-12-13T21:39:45.618-07:00',
                          'properties' => { 'prop1' => 1,
                                            'prop2' => 'value2',
                                            'prop3' => [1, 2, 3],
                                            'prop4' => true,
                                            'prop5' => %w(a b c),
                                            'prop6' => 4.56 })
```
```java (coming soon) ```
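The `eventTime` field must be an ISO 8601 string. The examples above use millisecond precision with an explicit numeric offset (`-07:00`); elsewhere on this page the `Z` form is used for UTC (`2004-12-13T21:39:45.618Z`). The following minimal sketch uses only the Python standard library to produce strings in that shape. The helper name `format_event_time` is hypothetical. As a design choice, it rejects naive datetimes instead of guessing their offset.

```python
from datetime import datetime, timedelta, timezone


def format_event_time(dt):
    """Format an aware datetime as ISO 8601 with millisecond precision."""
    if dt.tzinfo is None:
        # Refuse to guess: a naive datetime has no UTC offset to record.
        raise ValueError("eventTime should carry an explicit UTC offset")
    text = dt.isoformat(timespec="milliseconds")
    # Use the compact "Z" designator for UTC instead of "+00:00".
    if text.endswith("+00:00"):
        text = text[:-6] + "Z"
    return text
```

For example, a datetime of 2004-12-13 21:39:45.618 with a fixed `-07:00` offset formats as `2004-12-13T21:39:45.618-07:00`, matching the request bodies above.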
For example, the following shows how one can create an event involving two entities (with `targetEntity`).
```bash
$ curl -i -X POST http://localhost:7070/events.json?accessKey=WPgcXKd42FPQpZHVbVeMyqF4CQJUnXQmIMTHhX3ZUrSzvy1KXJjdFUrslifa9rnB \
-H "Content-Type: application/json" \
-d '{
  "event" : "my_event",
  "entityType" : "user",
  "entityId" : "uid",
  "targetEntityType" : "item",
  "targetEntityId" : "iid",
  "properties" : {
    "someProperty" : "value1",
    "anotherProperty" : "value2"
  },
  "eventTime" : "2004-12-13T21:39:45.618Z"
}'
```
```php createEvent(array( 'event' => 'my_event', 'entityType' => 'user', 'entityId' => 'uid', 'targetEntityType' => 'item', 'targetEntityId' => 'iid', 'properties' => array('someProperty'=>'value1', 'anotherProperty'=>'value2'), 'eventTime' => '2004-12-13T21:39:45.618Z' )); ?> ```
```python
# Second Event
second_event_properties = {
    "someProperty" : "value1",
    "anotherProperty" : "value2",
    }
second_event_response = client.create_event(
    event="my_event",
    entity_type="user",
    entity_id="uid",
    target_entity_type="item",
    target_entity_id="iid",
    properties=second_event_properties,
    event_time=datetime(2014, 12, 13, 21, 38, 45, 618000, pytz.utc))
```
```ruby
require 'predictionio'

event_client = PredictionIO::EventClient.new('YOUR_ACCESS_KEY')
event_client.create_event('my_event', 'user', 'uid',
                          'targetEntityType' => 'item',
                          'targetEntityId' => 'iid',
                          'eventTime' => '2004-12-13T21:39:45.618Z',
                          'properties' => { 'someProperty' => 'value1',
                                            'anotherProperty' => 'value2' })
```
```java (coming soon) ```
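The request body is plain JSON, so it can also be assembled without an SDK. The following is a minimal Python sketch using only the standard library. The helpers `build_event_body` and `_check_name` are hypothetical and not part of any PredictionIO SDK. The sketch builds the body for a two-entity event like the one above and applies the naming rules from the Event Creation API table: names starting with `$` or `pio_` are rejected, except the special `$set`, `$unset` and `$delete` events. It also requires `targetEntityType` and `targetEntityId` to be given together. The Event Server remains the authority on validation; this check only catches obvious mistakes before sending.

```python
import json

RESERVED_PREFIXES = ("$", "pio_")
# Special events documented for recording entity property changes.
SPECIAL_EVENTS = {"$set", "$unset", "$delete"}


def _check_name(kind, name):
    """Reject names that use a reserved prefix."""
    if name.startswith(RESERVED_PREFIXES):
        raise ValueError(f"{kind} {name!r} uses a reserved prefix")


def build_event_body(event, entity_type, entity_id, target_entity_type=None,
                     target_entity_id=None, properties=None, event_time=None):
    """Return the JSON request body for POST /events.json."""
    if event not in SPECIAL_EVENTS:
        _check_name("event name", event)
    _check_name("entityType", entity_type)
    body = {"event": event, "entityType": entity_type, "entityId": entity_id}
    # The target entity is optional, but its type and ID belong together.
    if (target_entity_type is None) != (target_entity_id is None):
        raise ValueError("targetEntityType and targetEntityId go together")
    if target_entity_type is not None:
        _check_name("targetEntityType", target_entity_type)
        body["targetEntityType"] = target_entity_type
        body["targetEntityId"] = target_entity_id
    if properties:
        for key in properties:
            _check_name("property key", key)
        body["properties"] = properties
    if event_time is not None:
        body["eventTime"] = event_time  # an ISO 8601 string
    return json.dumps(body)
```

The returned string can be sent as the `-d` payload of the curl command above, with `Content-Type: application/json`.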
Sample response:

```
HTTP/1.1 201 Created
Server: akka-http/10.1.5
Date: Wed, 10 Sep 2014 22:51:33 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 41

{"eventId":"AAAABAAAAQDP3-jSlTMGVu0waj8"}
```

## Using Event API

### Event Creation API

URL: `http://localhost:7070/events.json?accessKey=yourAccessKeyString`

Query parameters:

Field | Type | Description
:---- | :----| :-----
`accessKey` | String | The Access Key for your App

The Event Creation API supports many commonly used data fields.

POST request body:

Field | Type | Description
:---- | :----| :-----
`event` | String | Name of the event (examples: "sign-up", "rate", "view", "buy"). **Note**: All event names starting with "$" and "pio_" are reserved and shouldn't be used as your custom event name (e.g. "$set").
`entityType` | String | The entity type. It is the namespace of the entityId and is analogous to the table name of a relational database. The entityId must be unique within the same entityType. **Note**: All entityType names starting with "$" and "pio_" are reserved and shouldn't be used.
`entityId` | String | The entity ID. `entityType-entityId` becomes the unique identifier of the entity. For example, you may have an entityType named `user`, and different entity IDs, say `1` and `2`. In this case, `user-1` and `user-2` uniquely identify these two entities.
`targetEntityType` | String | (Optional) The target entity type. **Note**: All entityType names starting with "$" and "pio_" are reserved and shouldn't be used.
`targetEntityId` | String | (Optional) The target entity ID.
`properties` | JSON | (Optional) See **Note About Properties** below. **Note**: All property names starting with "$" and "pio_" are reserved and shouldn't be used as keys inside `properties`.
`eventTime` | String | (Optional) The time of the event. Although the Event Server's current system time and the UTC timezone will be used if this is unspecified, it is highly recommended that the client application generate this time in order to accurately record when the event happened. Must be in ISO 8601 format (e.g. `2004-12-13T21:39:45.618Z`, or `2014-09-09T16:17:42.937-08:00`).

## Note About Properties

Note that `properties` can be:

1. Associated with a *generic event*: The `properties` field provides additional information about this event.
2. Associated with an *entity*: The `properties` field is used to record changes to an entity's properties with the special events `$set`, `$unset` and `$delete`.

Please see [Events Modeling](/datacollection/eventmodel/) for a detailed explanation.

## Debugging Recipes

WARNING: The following APIs are mainly for development or debugging purposes only. They should not be supported by SDKs nor used by real applications under normal circumstances, and they are subject to change.

INFO: Instead of using `curl`, you can also install JSON browser plugins such as **JSONView** to pretty-print the JSON in your browser. With the browser plugin you can make the `GET` queries below by passing in the URL. Plugins like **Postman - REST Client** provide a more advanced interface for making queries.

The `accessKey` query parameter is mandatory.

Replace `` and `` by a real one in the following:

### Get an Event

```
$ curl -i -X GET http://localhost:7070/events/.json?accessKey=
```

### Delete an Event

```
$ curl -i -X DELETE http://localhost:7070/events/.json?accessKey=
```

### Get Events of an App

```
$ curl -i -X GET http://localhost:7070/events.json?accessKey=
```

INFO: By default, it returns at most 20 events. Use the `limit` parameter to specify how many events are returned (see below). Use cautiously!

In addition, the following *optional* parameters are supported:

- `startTime`: time in ISO8601 format. Return events with `eventTime >= startTime`.
- `untilTime`: time in ISO8601 format. Return events with `eventTime < untilTime`.
- `entityType`: String. The entityType. Return events for this `entityType` only.
- `entityId`: String. The entityId. Return events for this `entityId` only.
- `event`: String. The event name. Return events with this name only.
- `targetEntityType`: String. The targetEntityType. Return events for this `targetEntityType` only.
- `targetEntityId`: String. The targetEntityId. Return events for this `targetEntityId` only.
- `limit`: Integer. The maximum number of events returned. Default is 20. Use -1 to get all.
- `reversed`: Boolean. **Must be used with both `entityType` and `entityId` specified**, returns events in reversed chronological order. Default is false.

WARNING: If you are using curl with the & symbol, you should quote the entire URL by using single or double quotes.

WARNING: Depending on the size of data, you may encounter a timeout when querying with some of the above filters. The Event Server uses `entityType` and `entityId` as the key, so any query without both `entityType` and `entityId` specified might result in a timeout.

For example, get all events of an app with `eventTime >= startTime`

```
$ curl -i -X GET "http://localhost:7070/events.json?accessKey=&startTime=