Introduction

RStudio provides a feature-rich user interface for the R programming language. R is widely used by statisticians and data scientists to perform statistical analysis and to create graphical representations of data sets.

Amherst College provides web-based access to a shared RStudio environment at https://r.amherst.edu. Users new to R may find the R intro or the student guide useful.

Instructions

Creating Shared Projects 

Ordinarily, an RStudio project is accessible only to the user who creates it. In a classroom, however, students often work on projects in groups. RStudio supports this with shared projects, which let a project's owner grant other RStudio users access to that project.

In most cases, you can create a shared project in two steps. First, create a new project (File -> New Project...) and save it in your home directory. Then choose File -> Share Project... and select the users who should have access to the project.

Projects shared from your home directory are subject to a limit on the total number of users who can be added to them. That limit is typically 20 distinct users across all of your projects. If you plan to share projects with more than 20 distinct users, IT will set up a dedicated directory for you on the RStudio server at /shared/projects/{username}. This arrangement is especially well suited to classroom groups.

When creating a project for many users, be sure to create it inside your dedicated shared projects directory.

For more information about setting up shared RStudio projects for classroom use, please send a message to askIT@amherst.edu.

RStudio, Spark and Hadoop

In addition to the single-node RStudio instance described above, you can use RStudio to interact with a Spark/Hadoop cluster. To request access to this resource, send a message to askIT@amherst.edu. Note that you must be on campus or connected through a VPN to access it.

Connecting to Spark involves, at a minimum:

> library(sparklyr)                              # load the sparklyr package
> sc <- spark_connect(master = "yarn-client")    # connect to the cluster through YARN in client mode

If you are using H2O through Sparkling Water, you must specify the correct Sparkling Water version before connecting to Spark.

> library(sparklyr)
> library(rsparkling)
> options(rsparkling.sparklingwater.version = "1.6.8")   # must be set before spark_connect()
> sc <- spark_connect(master = "yarn-client")

More information about using Spark and RStudio is available from the RStudio website: http://spark.rstudio.com/
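Once a connection `sc` is open, a typical session copies or reads data into Spark, queries it with dplyr verbs, and disconnects when finished. The example below is a minimal sketch of that workflow. It assumes that the dplyr package is installed on the server alongside sparklyr, which is an assumption you should confirm with IT. It uses R's built-in mtcars data set only as a stand-in for your own data, and the Spark table name "mtcars" is an arbitrary choice.

```r
# Minimal sketch: assumes sparklyr and dplyr are installed and that you
# have been granted access to the cluster (on campus or via VPN).
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn-client")

# Copy a small built-in data set into Spark as a table named "mtcars"
# (the table name is an arbitrary example)
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and run on the cluster;
# collect() brings the small aggregated result back into R
result <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()

print(result)

# Release the cluster resources held by this session
spark_disconnect(sc)
```

Because `collect()` pulls data into the memory of your RStudio session, it is best applied only to aggregated or otherwise small results. Large tables are better left in Spark.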

Audience

students
faculty
