Your First R Session

Last updated on 2026-04-22

Overview

Questions

  • How does RStudio compare to the SPSS interface?
  • How do I import data, including SPSS .sav files?
  • How do I get descriptive statistics and frequency tables in R?

Objectives

  • Navigate the RStudio interface and identify the equivalent of SPSS panels
  • Import CSV and SPSS .sav files into R
  • Run basic descriptive statistics and frequency tables
  • Inspect variables and data structure

Figure: Cartoon of a researcher at a Caribbean beach bar opening his laptop to the R console, with an iguana watching from the counter
The iguana is optional. The coconut water is not.

RStudio orientation


When you open RStudio for the first time, you see four panes. If you have used SPSS before, each one has a rough equivalent:

RStudio pane         | Location     | SPSS equivalent  | What it does
Source Editor        | Top-left     | Syntax Editor    | Where you write and save your code (scripts)
Console              | Bottom-left  | Output Viewer    | Where R runs commands and prints results
Environment          | Top-right    | Data View header | Lists all objects (datasets, values) currently in memory
Files / Plots / Help | Bottom-right | (no equivalent)  | File browser, plot preview, and built-in documentation

The key difference from SPSS: in SPSS, you usually have one dataset open at a time and interact through menus. In RStudio, you write instructions in the Source Editor (top-left), send them to the Console (bottom-left), and the results appear either in the Console or the Plots pane.

Callout

The Source Editor is your new best friend

In SPSS, many users never open the Syntax Editor — they click menus instead. In R, the Source Editor is how you work. Think of it as a recipe: you write the steps once, and you (or anyone else) can re-run them at any time.

Save your scripts with the .R extension. There are three reasons you will thank yourself later. First, RStudio recognises .R files and turns on syntax highlighting, error checking, and the “Run” button. Second, .R files are plain text, so version control systems like Git can track their changes line by line, whereas binary formats such as Word documents are treated as opaque blobs. Third, when a colleague opens the file in six months, the extension tells them immediately that this is R code, not a Word document or a loose text file. The extension is small, but the habit pays for itself the first time you come back to your own work.

Objects and assignment


In SPSS, when you compute a new variable, it appears as a column in your dataset. In R, everything you create is stored as a named object.

You create objects with the assignment operator <- (a less-than sign followed by a hyphen). Read it as “gets” or “is assigned”.

R

# Store a number
population <- 106739

# Store text (called a "character string" in R)
island <- "Aruba"

# Store the result of a calculation
density <- population / 180  # Aruba is about 180 km²

To see the value of an object, type its name and run it:

R

population

OUTPUT

[1] 106739

R

island

OUTPUT

[1] "Aruba"

R

density

OUTPUT

[1] 592.9944

Callout

Why <- and not =?

You will see some people use = for assignment, and it works in most cases. However, the R community convention is <-. It makes your code easier to read because = is also used inside function arguments (as you will see shortly).

In RStudio, the keyboard shortcut Alt + - (Alt and the minus key; Option + - on macOS) types <- for you automatically.
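One detail that often surprises SPSS users: an object keeps the value it was given at the moment of assignment. It does not update itself when the objects it was calculated from change. A minimal sketch, reusing the object names from above (the new population figure of 110000 is made up purely for illustration):

```r
population <- 106739
density <- population / 180

# Overwrite population with a new (hypothetical) value
population <- 110000

# density still holds the old result; R did not recalculate it
density

# Re-run the calculation to bring density up to date
density <- population / 180
density
```

This is one reason to keep your steps in a script: re-running the script from top to bottom rebuilds every object in the right order.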

Functions: R’s version of menu clicks


In SPSS, you click Analyze > Descriptive Statistics > Descriptives and a dialog box appears. In R, you call a function instead. A function has a name, and you pass it arguments inside parentheses.

R

# round() is a function. 592.777 is the input, digits = 1 is an option.
round(592.777, digits = 1)

OUTPUT

[1] 592.8

R

# sqrt() calculates a square root
sqrt(density)

OUTPUT

[1] 24.35148

The pattern is always: function_name(argument1, argument2, ...). This is the R equivalent of filling in an SPSS dialog box — the function name is the menu item, and the arguments are the fields you would fill in.
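To make the pattern concrete, here is a small sketch using round() again. It shows that arguments can be matched by name or by position, and that an optional argument falls back to its default when you leave it out. The object names by_name and by_position are only for illustration.

```r
# Arguments can be matched by name ...
by_name <- round(592.777, digits = 1)

# ... or by position (digits is the second argument of round())
by_position <- round(592.777, 1)

# Both calls give the same result
identical(by_name, by_position)

# Leaving out an optional argument uses its default (digits = 0)
round(592.777)

# List the arguments a function accepts
args(round)

# To open the help page in RStudio's Help pane, run:
# ?round
```

Naming your arguments takes a few more keystrokes, but it makes a script much easier to read later, in the same way that a labelled field in a dialog box is easier to understand than an unlabelled one.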

Packages: extending R


R comes with many built-in functions, but its real power comes from packages — add-on libraries written by other users. Think of them as SPSS modules, except they are free.

There are two steps:

  1. Install the package (once per computer, like installing an app):

R

install.packages("tidyverse")

  2. Load the package (once per session, like opening an app):

R

library(tidyverse)

OUTPUT

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Callout

install.packages() vs library()

A common source of confusion for beginners:

  • install.packages("tidyverse") — downloads and installs the package. You only need to do this once (or when you want to update). Note the quotation marks.
  • library(tidyverse) — loads a package that is already installed so you can use it in your current session. You do this every time you start R. No quotation marks needed (though they work too).

Analogy: install.packages() is buying a book and putting it on your shelf. library() is taking the book off the shelf and opening it.
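If you are not sure whether a package is already on your shelf, you can ask R before loading it. The sketch below is one possible pattern, not the only one: it uses requireNamespace() from base R to check for tidyverse and prints a reminder instead of installing anything automatically. It also shows the package::function() notation, which calls a function from a specific package without loading the whole package.

```r
# Check whether tidyverse is installed before trying to load it
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)
} else {
  message("tidyverse is not installed yet; run install.packages(\"tidyverse\") first")
}

# package::function() calls one function from a named package.
# stats ships with R, so this line works on any installation.
middle_value <- stats::median(c(2, 4, 9))
middle_value
```

You will meet the same notation later in this episode in forms such as haven::read_sav().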

Importing data


Before you import — set up your workshop folder

R can read a file from the internet, and we saw that in Episode 1. In daily work you will more often read from a file that already lives on your computer — on a shared drive, in a project folder, next to your script. We will do that here.

Three short steps, and then every read_csv() line in the rest of the course will just work.

1. Download the two course datasets, aruba_visitors.csv and aruba_visitors.xlsx. Later in this episode we compare loading the same data from a CSV file and from an Excel file, so download both now. Open each file's link in your browser and click the Download raw file button near the top right of the preview.

Do not open the CSV in Excel and re-save — that can silently change the encoding. Just save both files as they are.

2. Create an RStudio project. In RStudio, go to File → New Project → New Directory → New Project. Name the directory r-workshop and save it somewhere you can find again (your Documents folder or Desktop is fine). RStudio will open a fresh session with this folder as its working directory.

3. Put the files where R expects to find them. Inside your r-workshop project folder, create a subfolder called data (lower-case, no spaces). Move the downloaded aruba_visitors.csv and aruba_visitors.xlsx into it. Your structure should look like this:

r-workshop/
├── r-workshop.Rproj
└── data/
    ├── aruba_visitors.csv
    └── aruba_visitors.xlsx

In RStudio’s Files pane (bottom-right), click into the data folder. If you see aruba_visitors.csv and aruba_visitors.xlsx, you are ready: put up your green sticky note.

Callout

Why a project folder?

A project folder answers the single most common beginner error in R: “R cannot find my file.” The file path "data/aruba_visitors.csv" is read relative to R’s current working directory. When you open a project, RStudio automatically sets the working directory to the project folder, so the path works. Without a project, R’s working directory could be anywhere — usually somewhere unhelpful like your Documents folder — and the file is not found.

Projects also keep scripts, data, and outputs organised in one place you can hand to a colleague or archive at the end of an engagement.
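If R ever reports that it cannot find your file, a quick check from the Console usually shows why. The sketch below uses three base R functions and assumes the folder layout described above. The object name csv_found is only for illustration.

```r
# Where is R currently looking? This should end in "r-workshop"
getwd()

# Which files does R see inside the data folder?
# (an empty result means the folder is missing or empty)
list.files("data")

# Does the exact path used in this course exist? TRUE means you are ready
csv_found <- file.exists("data/aruba_visitors.csv")
csv_found
```

If csv_found is FALSE, compare the output of getwd() with the location of your project folder before changing anything in your code.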

CSV files with read_csv()

The most common data format in R is CSV (comma-separated values). The readr package (loaded as part of tidyverse) provides read_csv():

R

visitors <- read_csv("data/aruba_visitors.csv")

OUTPUT

Rows: 120 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): quarter, origin
dbl (7): year, visitors_stayover, visitors_cruise, avg_stay_nights, avg_spen...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

R prints a summary of the column types it detected. This is equivalent to opening a CSV in SPSS via File > Open > Data and checking the variable types.
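Sometimes you want to tell R what type a column should be instead of letting it guess, much as you would set a variable's type by hand in SPSS. The sketch below is a minimal illustration of the col_types argument of read_csv(). To keep it self-contained, it writes a two-row stand-in file (with values taken from the first rows of the course data) to a temporary location. The names tmp and demo are made up for this example.

```r
library(readr)

# Write a two-row stand-in for the course file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "year,quarter,origin,visitors_stayover",
  "2019,Q1,United States,72450",
  "2019,Q1,Netherlands,18300"
), tmp)

# Declare year as an integer; the remaining columns are still guessed
demo <- read_csv(tmp, col_types = cols(year = col_integer()))

# Inspect the type readr used for each column
spec(demo)
```

With your own data you would replace tmp with a path such as "data/aruba_visitors.csv".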

SPSS .sav files with haven

If you have existing SPSS datasets, the haven package reads them directly — including variable labels and value labels:

R

# install.packages("haven")  # run once if needed
library(haven)
spss_data <- read_sav("path/to/your/file.sav")

This means you do not have to convert your SPSS files to CSV first. R reads them as-is.
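To see what "including variable labels and value labels" looks like in practice, the sketch below builds a tiny labelled dataset, saves it as a .sav file in a temporary location, and reads it back. The survey variable and its labels are invented for this illustration. With your own data you would start at the read_sav() line.

```r
library(haven)
library(tibble)

# Build a tiny labelled dataset (hypothetical) and save it as .sav
survey <- tibble(
  satisfied = labelled(
    c(1, 2, 2, 1),
    labels = c(Yes = 1, No = 2),
    label = "Satisfied with stay?"
  )
)
tmp <- tempfile(fileext = ".sav")
write_sav(survey, tmp)

# Read it back, as you would with an existing SPSS file
spss_demo <- read_sav(tmp)

# Variable label (the text shown in the Label column of SPSS Variable View)
attr(spss_demo$satisfied, "label")

# Value labels (the code-to-text mapping)
attr(spss_demo$satisfied, "labels")

# Convert the coded values to an R factor that uses the value labels
as_factor(spss_demo$satisfied)
```

Converting with as_factor() is often the first step after import, because many R functions, including table(), then display the label text instead of the numeric codes.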

Excel files with readxl

Many datasets arrive as Excel files (.xlsx or .xls), especially from government agencies and international organizations. In SPSS, you would import these through File > Open > Data and select the Excel file type from the dropdown. In R, the readxl package handles this.

R

# Install once if needed
install.packages("readxl")

R

library(readxl)

# Basic import -- reads the first sheet by default
visitors_xl <- read_excel("data/aruba_visitors.xlsx")

If your Excel file has multiple sheets, use the sheet argument to specify which one you want – either by name or by position:

R

# By sheet name
stayover <- read_excel("data/aruba_visitors.xlsx", sheet = "stayover")

# By position (second sheet)
cruise <- read_excel("data/aruba_visitors.xlsx", sheet = 2)
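If you do not know the sheet names in advance, readxl can list them for you with excel_sheets(). So that it runs anywhere, the sketch below uses the example workbook that ships with the readxl package (datasets.xlsx, located with readxl_example()) and not the course file. The object names are only for illustration.

```r
library(readxl)

# Path to an example workbook bundled with readxl (not the course data)
example_path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
sheets <- excel_sheets(example_path)
sheets

# Read the first listed sheet by name
first_sheet <- read_excel(example_path, sheet = sheets[1])
head(first_sheet)
```

For the course file you would call excel_sheets("data/aruba_visitors.xlsx") in the same way.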

You can also read a specific cell range with the range argument, which is useful when the data does not start at cell A1:

R

# Read only cells B2 through F50
# (the object is not called "subset", because base R already has a function with that name)
visitors_range <- read_excel("data/aruba_visitors.xlsx", range = "B2:F50")

Callout

read_excel() vs read_csv() – when to use which

If you have a choice, CSV is simpler: it is plain text, lightweight, and avoids formatting surprises. Use read_excel() when you receive data in Excel format and do not want to manually export it to CSV first – or when the file contains multiple sheets you need to access programmatically.

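Unlike read_csv(), read_excel() is not loaded by library(tidyverse). The readxl package is installed together with the tidyverse, but it is not one of the core packages, so you need to load it yourself with library(readxl).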

Challenge

Challenge: Import from Excel

Suppose you received an Excel file called aruba_visitors.xlsx with two sheets: “stayover” and “cruise”.

  1. Write the code to load the readxl package.
  2. Write the code to read the “cruise” sheet into an object called cruise_data.
  3. How would you check how many rows and columns cruise_data has?

R

# 1: Load the package
library(readxl)

# 2: Read the cruise sheet (assuming the file is saved in your data/ folder)
cruise_data <- read_excel("data/aruba_visitors.xlsx", sheet = "cruise")

# 3: Check dimensions
dim(cruise_data)
# Or: glimpse(cruise_data)

Exploring your data


Now that we have the visitors dataset loaded, let us explore it. Each of the functions below is the R equivalent of something you would do in SPSS.

View() — the Data View equivalent

R

View(visitors)

This opens a spreadsheet-like viewer in RStudio, just like SPSS Data View. You can scroll, sort columns by clicking headers, and filter. (Note the capital V.)

head() — see the first few rows

R

head(visitors)

OUTPUT

# A tibble: 6 × 9
   year quarter origin        visitors_stayover visitors_cruise avg_stay_nights
  <dbl> <chr>   <chr>                     <dbl>           <dbl>           <dbl>
1  2019 Q1      United States             72450           45200             6.8
2  2019 Q1      Netherlands               18300            1200            10.2
3  2019 Q1      Venezuela                  8200             400             5.1
4  2019 Q1      Colombia                   6100             800             4.8
5  2019 Q1      Canada                     4500            3200             7.1
6  2019 Q1      Other                      9800            5600             5.5
# ℹ 3 more variables: avg_spending_usd <dbl>, hotel_occupancy_pct <dbl>,
#   satisfaction_score <dbl>

This is faster than View() when you just want a quick look. By default it shows 6 rows. You can change that: head(visitors, n = 10).
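head() has a counterpart, tail(), which shows the last rows instead of the first. The sketch below uses a six-row stand-in built from the rows printed above, so it runs without the course file. The name visitors_demo is made up for this example.

```r
# A small stand-in with two columns from the first six rows of the course data
visitors_demo <- data.frame(
  origin = c("United States", "Netherlands", "Venezuela",
             "Colombia", "Canada", "Other"),
  visitors_stayover = c(72450, 18300, 8200, 6100, 4500, 9800)
)

# First three rows
head(visitors_demo, n = 3)

# Last two rows
tail(visitors_demo, n = 2)
```

Checking both ends of a dataset is a quick way to spot import problems such as a stray totals row at the bottom of a file.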

str() — the Variable View equivalent

In SPSS, you would switch to Variable View to see variable names, types, and labels. In R, str() gives a comparable overview of the names and types:

R

str(visitors)

OUTPUT

spc_tbl_ [120 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ year               : num [1:120] 2019 2019 2019 2019 2019 ...
 $ quarter            : chr [1:120] "Q1" "Q1" "Q1" "Q1" ...
 $ origin             : chr [1:120] "United States" "Netherlands" "Venezuela" "Colombia" ...
 $ visitors_stayover  : num [1:120] 72450 18300 8200 6100 4500 ...
 $ visitors_cruise    : num [1:120] 45200 1200 400 800 3200 5600 38100 900 300 700 ...
 $ avg_stay_nights    : num [1:120] 6.8 10.2 5.1 4.8 7.1 5.5 6.5 9.8 4.9 4.6 ...
 $ avg_spending_usd   : num [1:120] 1250 980 620 710 1180 890 1220 960 600 690 ...
 $ hotel_occupancy_pct: num [1:120] 82.3 82.3 82.3 82.3 82.3 82.3 78.1 78.1 78.1 78.1 ...
 $ satisfaction_score : num [1:120] 8.1 7.9 7.5 7.7 8 7.6 8 7.8 7.4 7.6 ...
 - attr(*, "spec")=
  .. cols(
  ..   year = col_double(),
  ..   quarter = col_character(),
  ..   origin = col_character(),
  ..   visitors_stayover = col_double(),
  ..   visitors_cruise = col_double(),
  ..   avg_stay_nights = col_double(),
  ..   avg_spending_usd = col_double(),
  ..   hotel_occupancy_pct = col_double(),
  ..   satisfaction_score = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

This tells you: how many observations (rows), how many variables (columns), and the type of each variable (num for numbers, chr for text).

glimpse() — a tidyverse alternative to str()

The glimpse() function from dplyr gives similar information in a tidier format:

R

glimpse(visitors)

OUTPUT

Rows: 120
Columns: 9
$ year                <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 20…
$ quarter             <chr> "Q1", "Q1", "Q1", "Q1", "Q1", "Q1", "Q2", "Q2", "Q…
$ origin              <chr> "United States", "Netherlands", "Venezuela", "Colo…
$ visitors_stayover   <dbl> 72450, 18300, 8200, 6100, 4500, 9800, 65200, 15400…
$ visitors_cruise     <dbl> 45200, 1200, 400, 800, 3200, 5600, 38100, 900, 300…
$ avg_stay_nights     <dbl> 6.8, 10.2, 5.1, 4.8, 7.1, 5.5, 6.5, 9.8, 4.9, 4.6,…
$ avg_spending_usd    <dbl> 1250, 980, 620, 710, 1180, 890, 1220, 960, 600, 69…
$ hotel_occupancy_pct <dbl> 82.3, 82.3, 82.3, 82.3, 82.3, 82.3, 78.1, 78.1, 78…
$ satisfaction_score  <dbl> 8.1, 7.9, 7.5, 7.7, 8.0, 7.6, 8.0, 7.8, 7.4, 7.6, …

summary() — Descriptives in one command

In SPSS: Analyze > Descriptive Statistics > Descriptives. In R:

R

summary(visitors)

OUTPUT

      year        quarter             origin          visitors_stayover
 Min.   :2019   Length:120         Length:120         Min.   :  180
 1st Qu.:2020   Class :character   Class :character   1st Qu.: 4050
 Median :2021   Mode  :character   Mode  :character   Median : 5900
 Mean   :2021                                         Mean   :15948
 3rd Qu.:2022                                         3rd Qu.:17875
 Max.   :2023                                         Max.   :78200
 visitors_cruise   avg_stay_nights  avg_spending_usd hotel_occupancy_pct
 Min.   :    0.0   Min.   : 3.500   Min.   : 400.0   Min.   :12.10
 1st Qu.:  242.5   1st Qu.: 4.575   1st Qu.: 677.5   1st Qu.:70.72
 Median : 1100.0   Median : 5.650   Median : 900.0   Median :77.95
 Mean   : 6624.8   Mean   : 6.220   Mean   : 898.2   Mean   :71.61
 3rd Qu.: 4425.0   3rd Qu.: 6.900   3rd Qu.:1142.5   3rd Qu.:81.47
 Max.   :48900.0   Max.   :10.700   Max.   :1310.0   Max.   :86.40
 satisfaction_score
 Min.   :6.50
 1st Qu.:7.30
 Median :7.65
 Mean   :7.63
 3rd Qu.:8.00
 Max.   :8.40      

For numeric columns, you get the minimum, maximum, mean, median, and quartiles. For character columns, you get the length and type.
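The SPSS Descriptives table also reports the standard deviation, which summary() leaves out. You can get individual statistics with dedicated functions. The sketch below uses the first six avg_spending_usd values from the output above as a stand-in vector, and adds a missing value by hand to show how na.rm works. The names spending and spending_with_gap are made up for this example.

```r
# First six avg_spending_usd values from the course data
spending <- c(1250, 980, 620, 710, 1180, 890)

mean(spending)    # arithmetic mean
median(spending)  # middle value
sd(spending)      # standard deviation

# A missing value (NA) makes the result NA by default
spending_with_gap <- c(spending, NA)
mean(spending_with_gap)

# na.rm = TRUE tells R to drop missing values before calculating
mean(spending_with_gap, na.rm = TRUE)
```

SPSS excludes missing values silently. R asks you to say so explicitly, which makes it harder to overlook gaps in your data.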

table() — Frequency tables

In SPSS: Analyze > Descriptive Statistics > Frequencies. In R:

R

table(visitors$origin)

OUTPUT


       Canada      Colombia   Netherlands         Other United States
           20            20            20            20            20
    Venezuela
           20 

The $ operator extracts a single column from a data frame. So visitors$origin means “the origin column from the visitors dataset” — like clicking on a single variable in SPSS.
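The SPSS Frequencies output also includes a percentage column. In R you can get the same from a table with prop.table(). The sketch below uses a short made-up vector of origins so the percentages differ from each other. The names origin_demo, counts, and shares are only for illustration.

```r
# A short, made-up vector of origin values
origin_demo <- c("United States", "Netherlands", "Venezuela",
                 "Netherlands", "United States", "United States")

# Counts per category
counts <- table(origin_demo)
counts

# Proportions (they sum to 1)
shares <- prop.table(counts)

# Percentages rounded to one decimal place
round(100 * shares, digits = 1)
```

With the course data you would start from table(visitors$origin) instead of the made-up vector.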

You can also make two-way frequency tables:

R

table(visitors$year, visitors$origin)

OUTPUT


       Canada Colombia Netherlands Other United States Venezuela
  2019      4        4           4     4             4         4
  2020      4        4           4     4             4         4
  2021      4        4           4     4             4         4
  2022      4        4           4     4             4         4
  2023      4        4           4     4             4         4

Challenge

Challenge 1: Explore the Aruba visitors dataset

Import the Aruba visitors dataset and answer the following questions using R functions. Write your code in the Source Editor and run each line.

  1. How many rows and how many columns does the dataset have?
  2. What data type is the origin column? What data type is visitors_stayover?
  3. What is the mean avg_spending_usd across all rows?
  4. How many rows are there for each origin country?

R

# Load the data (if not already loaded)
library(tidyverse)
visitors <- read_csv("data/aruba_visitors.csv")

OUTPUT

Rows: 120 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): quarter, origin
dbl (7): year, visitors_stayover, visitors_cruise, avg_stay_nights, avg_spen...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Question 1: How many rows and columns?

R

# Either of these works:
dim(visitors)

OUTPUT

[1] 120   9

R

glimpse(visitors)

OUTPUT

Rows: 120
Columns: 9
$ year                <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 20…
$ quarter             <chr> "Q1", "Q1", "Q1", "Q1", "Q1", "Q1", "Q2", "Q2", "Q…
$ origin              <chr> "United States", "Netherlands", "Venezuela", "Colo…
$ visitors_stayover   <dbl> 72450, 18300, 8200, 6100, 4500, 9800, 65200, 15400…
$ visitors_cruise     <dbl> 45200, 1200, 400, 800, 3200, 5600, 38100, 900, 300…
$ avg_stay_nights     <dbl> 6.8, 10.2, 5.1, 4.8, 7.1, 5.5, 6.5, 9.8, 4.9, 4.6,…
$ avg_spending_usd    <dbl> 1250, 980, 620, 710, 1180, 890, 1220, 960, 600, 69…
$ hotel_occupancy_pct <dbl> 82.3, 82.3, 82.3, 82.3, 82.3, 82.3, 78.1, 78.1, 78…
$ satisfaction_score  <dbl> 8.1, 7.9, 7.5, 7.7, 8.0, 7.6, 8.0, 7.8, 7.4, 7.6, …

The dataset has 120 rows and 9 columns (5 years x 4 quarters x 6 origin countries = 120 rows).

Question 2: Data types?

R

str(visitors)

OUTPUT

spc_tbl_ [120 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ year               : num [1:120] 2019 2019 2019 2019 2019 ...
 $ quarter            : chr [1:120] "Q1" "Q1" "Q1" "Q1" ...
 $ origin             : chr [1:120] "United States" "Netherlands" "Venezuela" "Colombia" ...
 $ visitors_stayover  : num [1:120] 72450 18300 8200 6100 4500 ...
 $ visitors_cruise    : num [1:120] 45200 1200 400 800 3200 5600 38100 900 300 700 ...
 $ avg_stay_nights    : num [1:120] 6.8 10.2 5.1 4.8 7.1 5.5 6.5 9.8 4.9 4.6 ...
 $ avg_spending_usd   : num [1:120] 1250 980 620 710 1180 890 1220 960 600 690 ...
 $ hotel_occupancy_pct: num [1:120] 82.3 82.3 82.3 82.3 82.3 82.3 78.1 78.1 78.1 78.1 ...
 $ satisfaction_score : num [1:120] 8.1 7.9 7.5 7.7 8 7.6 8 7.8 7.4 7.6 ...
 - attr(*, "spec")=
  .. cols(
  ..   year = col_double(),
  ..   quarter = col_character(),
  ..   origin = col_character(),
  ..   visitors_stayover = col_double(),
  ..   visitors_cruise = col_double(),
  ..   avg_stay_nights = col_double(),
  ..   avg_spending_usd = col_double(),
  ..   hotel_occupancy_pct = col_double(),
  ..   satisfaction_score = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

origin is character (chr), visitors_stayover is numeric (num).

Question 3: Mean average spending?

R

summary(visitors$avg_spending_usd)

OUTPUT

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  400.0   677.5   900.0   898.2  1142.5  1310.0 

The mean avg_spending_usd is shown in the summary output. You can also get just the mean with:

R

mean(visitors$avg_spending_usd)

OUTPUT

[1] 898.1667

Question 4: Rows per origin country?

R

table(visitors$origin)

OUTPUT


       Canada      Colombia   Netherlands         Other United States
           20            20            20            20            20
    Venezuela
           20 

Each origin country has 20 rows (4 quarters x 5 years).

Challenge

Challenge 2: Practice with objects and functions

  1. Create an object called my_island that stores the text "Aruba".
  2. Create an object called area_km2 that stores the value 180.
  3. Use the nchar() function to count the number of characters in my_island.
  4. Use round() to round the mean of avg_spending_usd to the nearest whole number. (Hint: you can put one function inside another.)

R

# 1 and 2: Create objects
my_island <- "Aruba"
area_km2 <- 180

# 3: Count characters
nchar(my_island)

OUTPUT

[1] 5

R

# 4: Round the mean spending
round(mean(visitors$avg_spending_usd), digits = 0)

OUTPUT

[1] 898

Nesting functions (putting one inside another) is common in R. R evaluates from the inside out: first it calculates mean(visitors$avg_spending_usd), then it passes that result to round().
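Nested calls are read from the inside out, which gets harder to follow as more steps are added. An alternative you will often see in R code is the pipe operator |>, which passes the result on its left into the function on its right. The sketch below assumes R 4.1 or newer (where |> is available) and uses the first six avg_spending_usd values as a stand-in vector. The names nested and piped are made up for this example.

```r
# First six avg_spending_usd values from the course data
spending <- c(1250, 980, 620, 710, 1180, 890)

# Nested: read from the inside out
nested <- round(mean(spending), digits = 0)

# Piped: read from left to right, "take spending, then mean, then round"
piped <- spending |> mean() |> round(digits = 0)

# Both approaches give the same answer
identical(nested, piped)
```

Neither style is more correct. Use whichever makes the order of the steps clearest to you.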

Summary


You have now completed your first hands-on R session. You can:

  • Find your way around RStudio
  • Create objects and use functions
  • Install and load packages
  • Import a CSV file
  • Inspect your data with View(), head(), str(), glimpse(), summary(), and table()

In SPSS terms, you have learned the equivalent of opening a dataset, switching between Data View and Variable View, and running Descriptives and Frequencies. The difference is that everything you did is saved in a script that you can re-run at any time.

Key Points
  • RStudio is your workspace — it combines a script editor, console, and data viewer
  • haven::read_sav() imports SPSS files directly, preserving labels
  • summary() and table() replace the Descriptives and Frequencies menus in SPSS; str() and glimpse() play the role of Variable View