--- name: ModelDeployment topic: Model Deployment with R maintainer: Yuan Tang, James Joseph Balamuta email: terrytangyuan@gmail.com version: 2022-08-24 source: https://github.com/cran-task-views/ModelDeployment --- This CRAN task view contains a list of packages, grouped by topic, that provides functionalities to streamline the process of deploying models to various environments, such as mobile devices, edge devices, cloud, and GPUs, for scoring or inferencing on new data. It complements the related task views on `r view("HighPerformanceComputing")` and `r view("MachineLearning")`. Model deployment is often challenging due to various reasons. Some example challenges are: - It involves deploying models on heterogenous environments, e.g. edge devices, mobile devices, GPUs, etc. - It is hard to compress the model to very small size that could fit on devices with limited storage while keeping the same precision and minimizing the overhead to load the model for inference. - Deployed models sometimes need to process new data records within limited memory on small devices. - Many deployment environments have bad network connectivity so sometimes cloud solutions may not meet the requirements. - There's interest in stronger user data privacy paradigms where user data does not need to leave the mobile device. - There's growing demand to perform on-device model-based data filtering before collecting the data. Many of the areas discussed in this task view are undergoing rapid changes in industries and academia. Please send any suggestions to the maintainer via e-mail or submit an issue or pull request in the GitHub repository linked above. All suggestions and corrections by others are gratefully acknowledged. ### Deployment through different types of artifacts This section includes packages that provides functionalities to export the trained model to an artifact that could fit in small devices such as mobile devices (e.g. Android, iOS) and edge devices (Rasberri Pi). These packages are built based on different model format. - Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define statistical and data mining models and to share models between PMML compliant applications. The following packages are based on PMML: - The `r pkg("pmml")` package provides the main interface to PMML. - The `r pkg("pmmlTransformations")` package allows for data to be transformed before using it to construct models. Builds structures to allow functions in the PMML package to output transformation details in addition to the model in the resulting PMML file. - The `r pkg("arules")` package provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). The associations can be written to disk in PMML. - The `r pkg("arulesSequences")` package is an add-on for arules to handle and mine frequent sequences. - The `r pkg("arulesCBA")` package provides a function to build an association rule-based classifier for data frames, and to classify incoming data frames using such a classifier. - Plain Old Java Object (POJO) or a Model Object, Optimized (MOJO) are intended to be easily embeddable in any Java environment. The only compilation and runtime dependency for a generated model is a h2o-genmodel.jar file produced as the build output of these packages. The `r pkg("h2o")` package provides easy-to-use interface to build a wide range of machine learning models, such as GLM, DRF, and XGBoost models based on `r pkg("xgboost")` package, which can then be exported as MOJO and POJO format. The MOJO and POJO artifacts can then be loaded by its REST interface as well as different language bindings, e.g. Java, Scala, R, and Python. - [TensorFlow](https://www.tensorflow.org/)'s [SavedModel](https://www.tensorflow.org/api_docs/python/tf/saved_model) as well as its optimized version [TensorFlow Lite](https://www.tensorflow.org/mobile/tflite/), which uses many techniques for achieving low latency such as optimizing the kernels for mobile apps, pre-fused activations, and quantized kernels that allow smaller and faster (fixed-point math) models. It enables on-device machine learning inference with low latency and small binary size. The packages listed below can produce models in this format. Note that these packages are R wrappers of their corresponding Python API based on the `r pkg("reticulate")` package. Though Python binary is required for creating the models, it's not required during inference time for deployment. - The `r pkg("tensorflow")` package provides full access to TensorFlow API for numerical computation using data flow graphs. - The `r pkg("tfestimators")` package provides high-level API to machine learning models as well as highly customized neural network architectures. - The `r pkg("keras")` package high-level API to construct different types of neural networks. - The `r pkg("onnx")` package provides the interface to [Open Neural Network Exchange (ONNX)](https://onnx.ai/) which is a standard format for models built using different frameworks (e.g. TensorFlow, MXNet, PyTorch, CNTK, etc). It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types. Models trained in one framework can be easily transferred to another framework for inference. This open source format enables the interoperability between different frameworks and streamlining the path from research to production will increase the speed of innovation in the AI community. Note that this package is based on the `r pkg("reticulate")` package to interface with the original Python API so Python binary is required for deployment. - The `r pkg("xgboost")` and `r pkg("lightgbm")` packages can be used to create gradient-boosted decision tree (GBDT) models and serialize them to text and binary formats which can be used to create predictions with other technologies outside of R, including but not limited to [Apache Spark](https://spark.apache.org/), [Dask](https://dask.org/), and [treelite](https://github.com/dmlc/treelite). ### Deployment through cloud/server Many deployment environments are based on cloud/server. The following packages provides functionalities to deploy models in those types of environments: - The `r pkg("yhatr")` package allows to deploy, maintain, and invoke models via the [Yhat](https://www.yhat.com) REST API. - The `r pkg("cloudml")` package provides functionality to easily deploy models to [Google Cloud ML Engine](https://cloud.google.com/ml-engine/). - The `r pkg("tfdeploy")` package provides functions to run a local test server that supports the same REST API as CloudML and [RStudio Connect](https://www.rstudio.com/products/connect/). - The `r pkg("vetiver")` package provides tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. This package is extensible, with generics to support many kinds of models. - The `r pkg("domino")` package provides R interface to [Domino](https://www.dominodatalab.com/) CLI, a service that makes it easy to run your code on scalable hardware, with integrated version control and collaboration features designed for analytical workflows. - The `r pkg("tidypredict")` package provides functionalities to run predictions inside database. It's based on `r pkg("dplyr")` and `r pkg("dbplyr")` that could translate data manipulations written in R to database queries that can be used later to execute the data transformations and aggregations inside various types of databases. - The `r pkg("ibmdbR")` package allows many basic and complex R operations to be pushed down into the database, which removes the main memory boundary of R and allows to make full use of parallel processing in the underlying database. - The `r pkg("sparklyr")` package provides bindings to [Apache Spark](https://spark.apache.org/)'s distributed machine learning library and allows to deploy the trained models to clusters. Additionally, the `r pkg("rsparkling")` package uses `r pkg("sparklyr")` for Spark job deployment while using `r pkg("h2o")` package for regular model building. - The non-CRAN [mrsdeploy](https://docs.microsoft.com/en-us/machine-learning-server/r-reference/mrsdeploy/mrsdeploy-package) package provides functions for establishing a remote session in a console application and for publishing and managing a web service that is backed by the R code block or script you provided. - The `r pkg("opencpu")` package provides a server that exposes a simple but powerful HTTP API for RPC and data interchange with R. This provides a reliable and scalable foundation for statistical services or building R web applications. - Several general purpose server/client frameworks for R exist that could help deploy models in server based environments: - The `r pkg("Rserve")` and `r pkg("RSclient")` packages both provide server and client functionality for TCP/IP or local socket interfaces to enable access to R from many languages and systems. - The `r pkg("httpuv")` package provides a low-level socket and protocol support for handling HTTP and WebSocket requests directly within R. - Several packages offer functionality for turning R code into a web API: - The `r pkg("FastRWeb")` package provides some basic infrastructure for this. - The `r pkg("plumber")` package allows you to create a web API by merely decorating your existing R source code with special comments. - The `r pkg("RestRserve")` package is a R web API framework for building high-performance microservices and app backends based on `r pkg("Rserve")`. ### Links - Non-CRAN package: [mrsdeploy](https://docs.microsoft.com/en-us/machine-learning-server/r-reference/mrsdeploy/mrsdeploy-package)