by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft)
As an in-memory application, R is sometimes thought to be constrained in performance or scalability for enterprise-grade applications. But by deploying R in a high-performance cloud environment, and by leveraging the scale of parallel architectures and dedicated big-data technologies, you can build applications using R that provide the necessary computational efficiency, scale, and cost-effectiveness.
We identify four application areas, together with the associated tools and Azure services, that you can use to deploy R in enterprise applications. They cover the tasks required to prototype, build, and operationalize an enterprise-level data science and AI solution. In each of the four, there are R packages and tools designed specifically to accelerate the development of analytics.
Below is a brief introduction to each.
Cloud resource management and operation
Cloud computing instances and services can be controlled from within an R session, which enables programmatic control and operationalization of R-based analytical pipelines. The R packages and tools in this category offer a simplified way to interact with the Azure cloud platform and to operate Azure resources (e.g., blob storage, Data Science Virtual Machine, Azure Batch Service) for various tasks.
Remote interaction and access to cloud resources
Data scientists can seamlessly log in to and out of R sessions in the cloud for experimentation and exploratory study. The R packages and tools in this category help data scientists and developers remotely access or interact with Azure cloud instances and services for convenient development.
- mrsdeploy – an R package that provides functions for establishing a remote session in a console application and for publishing and managing a web service that is backed by the R code block or script you provided.
- R Tools for Visual Studio – IDE with R support.
- RStudio Server – IDE for remote R session with access via Internet browser.
- JupyterHub – Jupyter notebook with multi-user access.
- IRkernel – R kernel for Jupyter notebook.
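As a minimal sketch of the remote-session workflow described for mrsdeploy, the snippet below first evaluates a small piece of R code locally and then, only if a server is configured, sends the same code to a remote session. The environment variable names (MLSERVER_ENDPOINT, MLSERVER_USER, MLSERVER_PASSWORD) are hypothetical placeholders, and the remote part assumes the mrsdeploy package and a reachable ML Server instance.

```r
# The R code we want to run, kept as a string so it can be sent to a server.
r_code <- "summary_stats <- summary(mtcars$mpg)"

# Run it locally first, in its own environment, to check that it works.
local_env <- new.env()
eval(parse(text = r_code), envir = local_env)
print(local_env$summary_stats)

# Hypothetical configuration: these variable names are assumptions.
endpoint <- Sys.getenv("MLSERVER_ENDPOINT")

# Only attempt the remote run when mrsdeploy and an endpoint are available.
if (nzchar(endpoint) && requireNamespace("mrsdeploy", quietly = TRUE)) {
  library(mrsdeploy)
  # Open a remote R session without dropping into the interactive prompt.
  remoteLogin(
    endpoint,
    username = Sys.getenv("MLSERVER_USER"),
    password = Sys.getenv("MLSERVER_PASSWORD"),
    session = TRUE,
    commandline = FALSE
  )
  # Execute the same code in the remote session, then close it.
  remote_result <- remoteExecute(r_code)
  remoteLogout()
}
```

Keeping the code in a string and checking it locally before sending it is only one possible design; it makes it easy to confirm that the remote session runs exactly what was tested.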
Scalable and advanced analytics
Scalable analytics and advanced machine (deep) learning model creation can be performed in R on cloud services, accelerated by specialized hardware such as GPUs. R packages and tools in this category allow one to perform large-scale R-based analytics on Azure with modern frameworks such as Spark, Hadoop, Microsoft Cognitive Toolkit, TensorFlow, and Keras. It is worth mentioning that many of the tools are pre-installed and configured for direct use on the Azure Data Science Virtual Machine.
- dplyrXdf – a dplyr backend for the XDF data format used in Microsoft ML Server.
- sparklyr – R interface for Apache Spark.
- SparkR – an R package that provides a light-weight frontend to use Apache Spark from R.
- CNTK-R – R bindings to the Cognitive Toolkit (CNTK) deep learning library.
- tensorflow – R interface to TensorFlow.
- mxnet – R interface to MXNet, bringing flexible and efficient GPU computing and state-of-the-art deep learning to R.
- keras – R interface to Keras.
- darch – Create deep architectures in R.
- deepnet – Implement some deep learning architectures and neural network algorithms, including BP, RBM, DBN, Deep autoencoder and so on.
- gpuR – R interface to use GPUs.
- RevoScaleR – a collection of portable, scalable, and distributable R functions for importing, transforming, and analyzing data at scale, included with Microsoft ML Server.
- MicrosoftML – a package that provides state-of-the-art fast, scalable machine learning algorithms and transforms for R.
- h2o – R interface to H2O.
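To illustrate how an in-memory computation carries over to a scalable backend, here is a hedged sketch that computes a grouped mean in base R and then, where possible, repeats it on Spark through sparklyr. It assumes sparklyr, dplyr, and a local Spark installation; when any of these is missing, only the in-memory part runs. The table name "mtcars_spark" is an arbitrary choice for the example.

```r
# In-memory baseline: mean mpg per cylinder count, computed with base R.
local_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(local_means)

# Run on Spark only if sparklyr, dplyr, and a local Spark install are present
# (an assumption about the environment, not a requirement of the article).
spark_ready <- requireNamespace("sparklyr", quietly = TRUE) &&
  requireNamespace("dplyr", quietly = TRUE) &&
  nrow(sparklyr::spark_installed_versions()) > 0

spark_means <- NULL
if (spark_ready) {
  # Connect to a local Spark instance; on a cluster the master would differ.
  sc <- sparklyr::spark_connect(master = "local")

  # Copy the data into Spark and express the same aggregation with dplyr verbs.
  cars_tbl <- dplyr::copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)
  spark_means <- dplyr::collect(
    dplyr::summarise(
      dplyr::group_by(cars_tbl, cyl),
      mean_mpg = mean(mpg, na.rm = TRUE)
    )
  )

  sparklyr::spark_disconnect(sc)
  print(spark_means)
}
```

The aggregation is written once in dplyr verbs and executed by Spark, so the same pipeline could be pointed at a much larger table without loading it into local memory.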
Application and service deployment
R-based applications can be easily deployed as services for end-users or developers. The R packages and tools in this category are used for deploying R-based analytics or applications as services or interfaces that can be conveniently consumed by end-users or developers.
- mrsdeploy – an R package included with Microsoft ML Server that provides functions for deploying easily consumable services from within an R session.
- AzureML – an R package for interacting with Azure Machine Learning Studio to publish R functions as API services.
- Azure Container Instances – service to allow running containerized R analytics in Azure.
- Azure Container Service – service that simplifies deployment, management, and operation of orchestrated containers of R analytics in Azure.
- Shiny server – Develop and publish Shiny based web applications online.
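As a rough sketch of publishing an R function as a web service with mrsdeploy, the example below fits a small model, wraps prediction in a function that can be tested locally, and then publishes it only when a server is configured. The service name "mpgService", the version string, and the environment variable names are hypothetical; the publishing step assumes the mrsdeploy package and a reachable ML Server instance.

```r
# A small illustrative model on the built-in mtcars data.
model <- lm(mpg ~ hp + wt, data = mtcars)

# The function to expose as a service; it can be tested locally first.
predict_mpg <- function(hp, wt) {
  newdata <- data.frame(hp = hp, wt = wt)
  answer <- as.numeric(predict(model, newdata))
  answer
}

# Local check before any deployment.
print(predict_mpg(110, 2.62))

# Hypothetical configuration: these variable names are assumptions.
endpoint <- Sys.getenv("MLSERVER_ENDPOINT")

if (nzchar(endpoint) && requireNamespace("mrsdeploy", quietly = TRUE)) {
  library(mrsdeploy)
  # Authenticate without opening a remote R session.
  remoteLogin(
    endpoint,
    username = Sys.getenv("MLSERVER_USER"),
    password = Sys.getenv("MLSERVER_PASSWORD"),
    session = FALSE
  )
  # Publish the function together with the model object it depends on.
  api <- publishService(
    "mpgService",
    code = predict_mpg,
    model = model,
    inputs = list(hp = "numeric", wt = "numeric"),
    outputs = list(answer = "numeric"),
    v = "v1.0.0"
  )
  remoteLogout()
}
```

Testing the function locally before publishing keeps the deployed service identical to code that has already been exercised in the R session.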
For more information
Companies around the world are using R to build enterprise-grade applications on Azure. For in-depth examples (with code and architecture), you can also find a selection of R-based solutions for real-world use cases. A more detailed list of packages and tools for deploying R in Azure is provided at the link below, and will be updated as new tools become available.
GitHub (yueguoguo): R in Azure