Your First R Session
Last updated on 2026-04-22
Estimated time: 90 minutes
Overview
Questions
- How does RStudio compare to the SPSS interface?
- How do I import data, including SPSS .sav files?
- How do I get descriptive statistics and frequency tables in R?
Objectives
- Navigate the RStudio interface and identify the equivalent of SPSS panels
- Import CSV and SPSS .sav files into R
- Run basic descriptive statistics and frequency tables
- Inspect variables and data structure

RStudio orientation
When you open RStudio for the first time, you see four panes. If you have used SPSS before, each one has a rough equivalent:
| RStudio pane | Location | SPSS equivalent | What it does |
|---|---|---|---|
| Source Editor | Top-left | Syntax Editor | Where you write and save your code (scripts) |
| Console | Bottom-left | Output Viewer | Where R runs commands and prints results |
| Environment | Top-right | Data View header | Lists all objects (datasets, values) currently in memory |
| Files / Plots / Help | Bottom-right | (no equivalent) | File browser, plot preview, and built-in documentation |
The key difference from SPSS: in SPSS, you usually have one dataset open at a time and interact through menus. In RStudio, you write instructions in the Source Editor (top-left), send them to the Console (bottom-left), and the results appear either in the Console or the Plots pane.
The Source Editor is your new best friend
In SPSS, many users never open the Syntax Editor — they click menus instead. In R, the Source Editor is how you work. Think of it as a recipe: you write the steps once, and you (or anyone else) can re-run them at any time.
Save your scripts with the .R extension. There are three reasons
you will thank yourself later. First, RStudio recognises .R files
and turns on syntax highlighting, error checking, and the “Run” button.
Second, a script is plain text, so version control systems like Git can
track changes to it line by line; binary formats such as Word documents
are treated as opaque blobs. Third, when a colleague opens the file in
six months, the extension tells them immediately that this is R code,
not a Word document or a loose text file. The extension is small, and
the habit pays for itself the first time you come back to your own work.
Objects and assignment
In SPSS, when you compute a new variable, it appears as a column in your dataset. In R, everything you create is stored as a named object.
You create objects with the assignment operator
<- (a less-than sign followed by a hyphen). Read it as
“gets” or “is assigned”.
R
# Store a number
population <- 106739
# Store text (called a "character string" in R)
island <- "Aruba"
# Store the result of a calculation
density <- population / 180 # Aruba is about 180 km²
To see the value of an object, type its name and run it:
R
population
OUTPUT
[1] 106739
R
island
OUTPUT
[1] "Aruba"
R
density
OUTPUT
[1] 592.9944
Why <- and not =?
You will see some people use = for assignment, and it
works in most cases. However, the R community convention is
<-. It makes your code easier to read because
= is also used inside function arguments (as you will see
shortly).
In RStudio, the keyboard shortcut Alt + - (Alt and
the minus key; Option + - on macOS) types <- for you automatically.
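A minimal sketch of the difference, in base R. The object name `spending` and its three values are made up for illustration; they are not read from the course dataset.

```r
# Made-up example values (not from the course dataset)
spending <- c(1250, 980, 620)

# Inside a function call, `=` only names an argument: x is mean()'s
# argument, and no object called x is created in your environment
mean(x = spending)

# `<-` stores the result as a named object you can reuse later
spending_mean <- mean(x = spending)
spending_mean
```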
Functions: R’s version of menu clicks
In SPSS, you click Analyze > Descriptive Statistics > Descriptives and a dialog box appears. In R, you call a function instead. A function has a name, and you pass it arguments inside parentheses.
R
# round() is a function. 592.777 is the input, digits = 1 is an option.
round(592.777, digits = 1)
OUTPUT
[1] 592.8
R
# sqrt() calculates a square root
sqrt(density)
OUTPUT
[1] 24.35148
The pattern is always:
function_name(argument1, argument2, ...). This is the R
equivalent of filling in an SPSS dialog box — the function name is the
menu item, and the arguments are the fields you would fill in.
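A short base R illustration of how arguments are matched. In round(), the first argument is x and the second is digits, so you can pass them by name or by position. args() prints a function's arguments and their defaults, a bit like looking at the fields of an SPSS dialog box before filling them in.

```r
# By name and by position: both give 592.8
round(592.777, digits = 1)
round(592.777, 1)

# With names, the order does not matter
round(digits = 1, x = 592.777)

# Left out, digits falls back to its default of 0
round(592.777)

# List the arguments round() accepts, with their defaults
args(round)
```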
Packages: extending R
R comes with many built-in functions, but its real power comes from packages — add-on libraries written by other users. Think of them as SPSS modules, except they are free.
There are two steps:
- Install the package (once per computer, like installing an app):
R
install.packages("tidyverse")
- Load the package (once per session, like opening an app):
R
library(tidyverse)
OUTPUT
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.1 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
install.packages() vs library()
A common source of confusion for beginners:
- install.packages("tidyverse"): downloads and installs the package. You only need to do this once (or when you want to update). Note the quotation marks.
- library(tidyverse): loads a package that is already installed so you can use it in your current session. You do this every time you start R. No quotation marks needed (though they work too).
Analogy: install.packages() is buying a book and putting
it on your shelf. library() is taking the book off the
shelf and opening it.
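Scripts you share often combine the two steps so they also run on a colleague's computer. The sketch below uses only base R; `ensure_package()` is a hypothetical helper name, not a function from any package. requireNamespace() returns TRUE or FALSE depending on whether a package can be found, instead of stopping with an error.

```r
# Hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE (instead of an error) if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # buy the book: once per computer
  }
  # character.only = TRUE tells library() that the name is stored in pkg
  library(pkg, character.only = TRUE)  # open the book: every session
  invisible(TRUE)
}

# Usage (commented out so the sketch does not install anything by itself):
# ensure_package("readxl")
```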
Importing data
Before you import — set up your workshop folder
R can read a file from the internet, and we saw that in Episode 1. In daily work you will more often read from a file that already lives on your computer — on a shared drive, in a project folder, next to your script. We will do that here.
Three short steps, and then every read_csv() line in the
rest of the course will just work.
1. Download the two course datasets. Later in this episode we compare loading the same data from a CSV file and from an Excel file. Download both now. Open each link in your browser and click the Download raw file button near the top right of the preview:
- aruba_visitors.csv: the plain-text version (one flat table)
- aruba_visitors.xlsx: the Excel version, with two sheets, stayover and cruise

Do not open the CSV in Excel and re-save it: that can silently change the encoding. Just save both files as they are.
2. Create an RStudio project. In RStudio, go to
File → New Project → New Directory → New Project. Name
the directory r-workshop and save it somewhere you can find
again (your Documents folder or Desktop is fine). RStudio will open a
fresh session with this folder as its working directory.
3. Put the data files where R expects to find them. Inside
your r-workshop project folder, create a subfolder called
data (lower-case, no spaces). Move both downloaded files,
aruba_visitors.csv and aruba_visitors.xlsx, into it. Your
structure should look like this:
r-workshop/
├── r-workshop.Rproj
└── data/
    ├── aruba_visitors.csv
    └── aruba_visitors.xlsx
In RStudio’s Files pane (bottom-right), click into the
data folder. If you see both aruba_visitors.csv and
aruba_visitors.xlsx, you are ready: put up your green sticky note.
Why a project folder?
A project folder prevents the single most common beginner error in R:
“R cannot find my file.” The file path
"data/aruba_visitors.csv" is read relative to R’s current
working directory. When you open a project, RStudio automatically sets
the working directory to the project folder, so the path works. Without
a project, R’s working directory could be anywhere — usually somewhere
unhelpful like your Documents folder — and the file is not found.
Projects also keep scripts, data, and outputs organised in one place you can hand to a colleague or archive at the end of an engagement.
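If you are unsure where R is looking, two base R functions help: getwd() prints the current working directory, and file.exists() checks whether a relative path points at a real file from there. This sketch assumes the folder layout from step 3 above.

```r
# Where is R looking right now? Inside the project this should end in r-workshop
getwd()

# file.path() joins folder and file names with "/"
csv_path <- file.path("data", "aruba_visitors.csv")

# TRUE means read_csv(csv_path) will find the file;
# FALSE usually means the project is not open or the file is in the wrong folder
file.exists(csv_path)
```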
CSV files with read_csv()
The most common data format in R is CSV (comma-separated values). The
readr package (loaded as part of tidyverse)
provides read_csv():
R
visitors <- read_csv("data/aruba_visitors.csv")
OUTPUT
Rows: 120 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): quarter, origin
dbl (7): year, visitors_stayover, visitors_cruise, avg_stay_nights, avg_spen...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
R prints a summary of the column types it detected. This is equivalent to opening a CSV in SPSS via File > Open > Data and checking the variable types.
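You can also tell read_csv() the column types yourself, much like setting the Type column in SPSS Variable View; doing so also silences the column specification message. So that it runs without the course file, the sketch below reads a few lines of literal CSV text wrapped in I() (supported in readr 2.0 and later). The three rows mirror the first rows of aruba_visitors.csv shown later in this episode, with most columns left out. For the real file you would pass "data/aruba_visitors.csv" instead.

```r
library(readr)

# Literal CSV text standing in for the file (only 3 of the 9 columns kept)
csv_text <- "year,quarter,origin
2019,Q1,United States
2019,Q1,Netherlands
2019,Q1,Venezuela"

# Declare each column's type instead of letting read_csv() guess
sample_rows <- read_csv(
  I(csv_text),
  col_types = cols(
    year = col_integer(),
    quarter = col_character(),
    origin = col_character()
  )
)
sample_rows
```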
SPSS .sav files with haven
If you have existing SPSS datasets, the haven package
reads them directly — including variable labels and value labels:
R
# install.packages("haven") # run once if needed
library(haven)
spss_data <- read_sav("path/to/your/file.sav")
This means you do not have to convert your SPSS files to CSV first. R reads them as-is.
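Columns with value labels arrive as a special labelled type: R keeps the numeric codes and stores the labels alongside them. The sketch below builds a small labelled vector by hand with haven's labelled() so that it runs without a .sav file; the codes, value labels, and variable label are made up. as_factor() then swaps the codes for their labels, roughly like switching SPSS Data View to show value labels.

```r
library(haven)

# A made-up labelled variable, stored the way read_sav() stores SPSS value labels
satisfied <- labelled(
  c(1, 2, 2, 1),
  labels = c(No = 1, Yes = 2),
  label = "Satisfied with stay?"
)

# The variable label travels with the data as an attribute
attr(satisfied, "label")

# Replace the codes with their value labels
satisfied_factor <- as_factor(satisfied)
satisfied_factor
```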
Excel files with readxl
Many datasets arrive as Excel files (.xlsx or
.xls), especially from government agencies and
international organizations. In SPSS, you would import these through
File > Open > Data and select the Excel file type
from the dropdown. In R, the readxl package handles
this.
R
# Install once if needed (already installed if you installed tidyverse)
# install.packages("readxl")
R
library(readxl)
# Basic import -- reads the first sheet by default
visitors_xl <- read_excel("data/aruba_visitors.xlsx")
If your Excel file has multiple sheets, use the sheet
argument to specify which one you want – either by name or by
position:
R
# By sheet name
stayover <- read_excel("data/aruba_visitors.xlsx", sheet = "stayover")
# By position (second sheet)
cruise <- read_excel("data/aruba_visitors.xlsx", sheet = 2)
You can also read a specific cell range with the range
argument, which is useful when the data does not start at cell A1:
R
# Read only cells B2 through F50
# (named visitors_range so it is not confused with R's built-in subset() function)
visitors_range <- read_excel("data/aruba_visitors.xlsx", range = "B2:F50")
read_excel() vs
read_csv() – when to use which
If you have a choice, CSV is simpler: it is plain text, lightweight,
and avoids formatting surprises. Use read_excel() when you
receive data in Excel format and do not want to manually export it to
CSV first – or when the file contains multiple sheets you need to access
programmatically.
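read_excel() comes from the readxl package. readxl is installed
along with the tidyverse, but it is not one of the core packages that
library(tidyverse) loads, so you need to load it separately with
library(readxl).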
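When you do not know a workbook's sheet names, readxl's excel_sheets() lists them. So that it runs without the course files, the sketch below uses datasets.xlsx, an example workbook that ships with readxl (readxl_example() returns its path); for the course workbook you would use "data/aruba_visitors.xlsx" instead. It then reads every sheet into a named list.

```r
library(readxl)

# Path to an example workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# List the sheet names
sheets <- excel_sheets(path)
sheets

# Read every sheet; setNames() labels each list element with its sheet name
all_sheets <- lapply(setNames(sheets, sheets), function(sheet) read_excel(path, sheet = sheet))
names(all_sheets)
```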
Challenge: Import from Excel
Suppose you received an Excel file called
aruba_visitors.xlsx with two sheets: “stayover” and
“cruise”.
- Write the code to load the readxl package.
- Write the code to read the “cruise” sheet into an object called cruise_data.
- How would you check how many rows and columns cruise_data has?
R
# 1: Load the package
library(readxl)
# 2: Read the cruise sheet (the file lives in the project's data folder)
cruise_data <- read_excel("data/aruba_visitors.xlsx", sheet = "cruise")
# 3: Check dimensions (rows, then columns)
dim(cruise_data)
# Or: glimpse(cruise_data) (needs dplyr, loaded with tidyverse)
Exploring your data
Now that we have the visitors dataset loaded, let us
explore it. Each of the functions below is the R equivalent of something
you would do in SPSS.
View() — the Data View equivalent
R
View(visitors)
This opens a spreadsheet-like viewer in RStudio, just like SPSS Data View. You can scroll, sort columns by clicking headers, and filter. (Note the capital V.)
head() — see the first few rows
R
head(visitors)
OUTPUT
# A tibble: 6 × 9
year quarter origin visitors_stayover visitors_cruise avg_stay_nights
<dbl> <chr> <chr> <dbl> <dbl> <dbl>
1 2019 Q1 United States 72450 45200 6.8
2 2019 Q1 Netherlands 18300 1200 10.2
3 2019 Q1 Venezuela 8200 400 5.1
4 2019 Q1 Colombia 6100 800 4.8
5 2019 Q1 Canada 4500 3200 7.1
6 2019 Q1 Other 9800 5600 5.5
# ℹ 3 more variables: avg_spending_usd <dbl>, hotel_occupancy_pct <dbl>,
# satisfaction_score <dbl>
This is faster than View() when you just want a quick
look. By default it shows 6 rows. You can change that:
head(visitors, n = 10).
str() — the Variable View equivalent
In SPSS, you would switch to Variable View to see
variable names, types, and labels. In R, str() gives you a
similar overview:
R
str(visitors)
OUTPUT
spc_tbl_ [120 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ year : num [1:120] 2019 2019 2019 2019 2019 ...
$ quarter : chr [1:120] "Q1" "Q1" "Q1" "Q1" ...
$ origin : chr [1:120] "United States" "Netherlands" "Venezuela" "Colombia" ...
$ visitors_stayover : num [1:120] 72450 18300 8200 6100 4500 ...
$ visitors_cruise : num [1:120] 45200 1200 400 800 3200 5600 38100 900 300 700 ...
$ avg_stay_nights : num [1:120] 6.8 10.2 5.1 4.8 7.1 5.5 6.5 9.8 4.9 4.6 ...
$ avg_spending_usd : num [1:120] 1250 980 620 710 1180 890 1220 960 600 690 ...
$ hotel_occupancy_pct: num [1:120] 82.3 82.3 82.3 82.3 82.3 82.3 78.1 78.1 78.1 78.1 ...
$ satisfaction_score : num [1:120] 8.1 7.9 7.5 7.7 8 7.6 8 7.8 7.4 7.6 ...
- attr(*, "spec")=
.. cols(
.. year = col_double(),
.. quarter = col_character(),
.. origin = col_character(),
.. visitors_stayover = col_double(),
.. visitors_cruise = col_double(),
.. avg_stay_nights = col_double(),
.. avg_spending_usd = col_double(),
.. hotel_occupancy_pct = col_double(),
.. satisfaction_score = col_double()
.. )
- attr(*, "problems")=<externalptr>
This tells you: how many observations (rows), how many variables
(columns), and the type of each variable (num for numbers,
chr for text).
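To check a single column, class() returns its type directly, and sapply() applies it to every column at once. A base R sketch with a tiny made-up data frame (the two rows copy the Colombia and Canada stay-over values from the head() output above):

```r
# Two rows and two columns, copied from the head() output above
toy <- data.frame(
  origin = c("Colombia", "Canada"),
  visitors_stayover = c(6100, 4500)
)

class(toy$origin)             # "character" (chr in str())
class(toy$visitors_stayover)  # "numeric" (num in str())

# The type of every column at once
sapply(toy, class)
```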
glimpse() — a tidyverse alternative to
str()
The glimpse() function from dplyr gives
similar information in a tidier format (dbl, short for “double”, is the
same number type that str() shows as num):
R
glimpse(visitors)
OUTPUT
Rows: 120
Columns: 9
$ year <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 20…
$ quarter <chr> "Q1", "Q1", "Q1", "Q1", "Q1", "Q1", "Q2", "Q2", "Q…
$ origin <chr> "United States", "Netherlands", "Venezuela", "Colo…
$ visitors_stayover <dbl> 72450, 18300, 8200, 6100, 4500, 9800, 65200, 15400…
$ visitors_cruise <dbl> 45200, 1200, 400, 800, 3200, 5600, 38100, 900, 300…
$ avg_stay_nights <dbl> 6.8, 10.2, 5.1, 4.8, 7.1, 5.5, 6.5, 9.8, 4.9, 4.6,…
$ avg_spending_usd <dbl> 1250, 980, 620, 710, 1180, 890, 1220, 960, 600, 69…
$ hotel_occupancy_pct <dbl> 82.3, 82.3, 82.3, 82.3, 82.3, 82.3, 78.1, 78.1, 78…
$ satisfaction_score <dbl> 8.1, 7.9, 7.5, 7.7, 8.0, 7.6, 8.0, 7.8, 7.4, 7.6, …
summary() — Descriptives in one command
In SPSS: Analyze > Descriptive Statistics > Descriptives. In R:
R
summary(visitors)
OUTPUT
year quarter origin visitors_stayover
Min. :2019 Length:120 Length:120 Min. : 180
1st Qu.:2020 Class :character Class :character 1st Qu.: 4050
Median :2021 Mode :character Mode :character Median : 5900
Mean :2021 Mean :15948
3rd Qu.:2022 3rd Qu.:17875
Max. :2023 Max. :78200
visitors_cruise avg_stay_nights avg_spending_usd hotel_occupancy_pct
Min. : 0.0 Min. : 3.500 Min. : 400.0 Min. :12.10
1st Qu.: 242.5 1st Qu.: 4.575 1st Qu.: 677.5 1st Qu.:70.72
Median : 1100.0 Median : 5.650 Median : 900.0 Median :77.95
Mean : 6624.8 Mean : 6.220 Mean : 898.2 Mean :71.61
3rd Qu.: 4425.0 3rd Qu.: 6.900 3rd Qu.:1142.5 3rd Qu.:81.47
Max. :48900.0 Max. :10.700 Max. :1310.0 Max. :86.40
satisfaction_score
Min. :6.50
1st Qu.:7.30
Median :7.65
Mean :7.63
3rd Qu.:8.00
Max. :8.40
For numeric columns, you get the minimum, maximum, mean, median, and quartiles. For character columns, you get the length and type.
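One difference from SPSS: summary() does not report the standard deviation, which SPSS Descriptives shows by default. The sketch below assembles an SPSS-style set of numbers (N, minimum, maximum, mean, standard deviation) in base R. describe_spss() is a hypothetical helper name, and the four example values are the first avg_stay_nights values from the head() output above.

```r
# Hypothetical helper: SPSS-style descriptives for one numeric variable
describe_spss <- function(x) {
  x <- x[!is.na(x)]  # drop missing values, as SPSS Descriptives does
  c(
    N = length(x),
    Minimum = min(x),
    Maximum = max(x),
    Mean = mean(x),
    Std.Deviation = sd(x)  # sd() divides by n - 1, like SPSS
  )
}

describe_spss(c(6.8, 10.2, 5.1, 4.8))

# With the course data you would write:
# describe_spss(visitors$avg_spending_usd)
```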
table() — Frequency tables
In SPSS: Analyze > Descriptive Statistics > Frequencies. In R:
R
table(visitors$origin)
OUTPUT
Canada Colombia Netherlands Other United States
20 20 20 20 20
Venezuela
20
The $ operator extracts a single column from a data
frame. So visitors$origin means “the origin column from the
visitors dataset” — like clicking on a single variable in SPSS.
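SPSS Frequencies also shows percentages and cumulative percentages. A base R sketch using prop.table() and cumsum(); the short origin vector is made up, and for the course data you would start from table(visitors$origin).

```r
# A made-up vector of origins
origin <- c("Canada", "Colombia", "Canada", "Other")

counts <- table(origin)

# One row per category; responseName sets the name of the count column
freq <- as.data.frame(counts, responseName = "Frequency")

# Percentages from the unrounded shares, so the cumulative column ends at 100
share <- 100 * as.vector(prop.table(counts))
freq$Percent <- round(share, 1)
freq$Cumulative <- round(cumsum(share), 1)
freq
```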
You can also make two-way frequency tables:
R
table(visitors$year, visitors$origin)
OUTPUT
Canada Colombia Netherlands Other United States Venezuela
2019 4 4 4 4 4 4
2020 4 4 4 4 4 4
2021 4 4 4 4 4 4
2022 4 4 4 4 4 4
2023 4 4 4 4 4 4
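SPSS Crosstabs usually also shows row and column totals. Base R's addmargins() adds them (labelled Sum), and prop.table() with margin = 1 turns counts into row percentages. A sketch with made-up year and origin vectors; with the course data you would pass visitors$year and visitors$origin.

```r
# Made-up example vectors
year <- c(2019, 2019, 2020, 2020, 2020)
origin <- c("Canada", "Other", "Canada", "Canada", "Other")

crosstab <- table(year, origin)

# Counts with row and column totals
addmargins(crosstab)

# Row percentages: each year's row adds up to 100
round(100 * prop.table(crosstab, margin = 1), 1)
```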
- Spend time on the RStudio pane orientation. Have participants identify each pane on their own screen before moving on.
- The <- assignment operator trips people up. Give them a few minutes to practice creating objects with different names and values.
- When loading tidyverse, the startup messages can be alarming to beginners. Reassure them that the “Attaching core tidyverse packages” and “Conflicts” messages are normal and expected.
- Do the “Before you import” subsection as a whole-room moment, not as reading. Project the download pages, then walk everyone through the browser download, the New Project dialog, and creating the data subfolder. Wait for green sticky notes in the Files pane before typing read_csv(). This is the most common point of failure in the course, and five deliberate minutes up front avoid twenty scattered minutes of troubleshooting later.
- If a participant cannot download the files (blocked network, locked laptop), have a USB stick or shared-drive copy of aruba_visitors.csv and aruba_visitors.xlsx ready as a fallback.
Challenge 1: Explore the Aruba visitors dataset
Import the Aruba visitors dataset and answer the following questions using R functions. Write your code in the Source Editor and run each line.
- How many rows and how many columns does the dataset have?
- What data type is the origin column? What data type is visitors_stayover?
- What is the mean avg_spending_usd across all rows?
- How many rows are there for each origin country?
R
# Load the data (if not already loaded)
library(tidyverse)
visitors <- read_csv("data/aruba_visitors.csv")
OUTPUT
Rows: 120 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): quarter, origin
dbl (7): year, visitors_stayover, visitors_cruise, avg_stay_nights, avg_spen...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Question 1: How many rows and columns?
R
# Either of these works:
dim(visitors)
OUTPUT
[1] 120 9
R
glimpse(visitors)
OUTPUT
Rows: 120
Columns: 9
$ year <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 20…
$ quarter <chr> "Q1", "Q1", "Q1", "Q1", "Q1", "Q1", "Q2", "Q2", "Q…
$ origin <chr> "United States", "Netherlands", "Venezuela", "Colo…
$ visitors_stayover <dbl> 72450, 18300, 8200, 6100, 4500, 9800, 65200, 15400…
$ visitors_cruise <dbl> 45200, 1200, 400, 800, 3200, 5600, 38100, 900, 300…
$ avg_stay_nights <dbl> 6.8, 10.2, 5.1, 4.8, 7.1, 5.5, 6.5, 9.8, 4.9, 4.6,…
$ avg_spending_usd <dbl> 1250, 980, 620, 710, 1180, 890, 1220, 960, 600, 69…
$ hotel_occupancy_pct <dbl> 82.3, 82.3, 82.3, 82.3, 82.3, 82.3, 78.1, 78.1, 78…
$ satisfaction_score <dbl> 8.1, 7.9, 7.5, 7.7, 8.0, 7.6, 8.0, 7.8, 7.4, 7.6, …
The dataset has 120 rows and 9 columns (5 years x 4 quarters x 6 origin countries = 120 rows).
Question 2: Data types?
R
str(visitors)
OUTPUT
spc_tbl_ [120 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ year : num [1:120] 2019 2019 2019 2019 2019 ...
$ quarter : chr [1:120] "Q1" "Q1" "Q1" "Q1" ...
$ origin : chr [1:120] "United States" "Netherlands" "Venezuela" "Colombia" ...
$ visitors_stayover : num [1:120] 72450 18300 8200 6100 4500 ...
$ visitors_cruise : num [1:120] 45200 1200 400 800 3200 5600 38100 900 300 700 ...
$ avg_stay_nights : num [1:120] 6.8 10.2 5.1 4.8 7.1 5.5 6.5 9.8 4.9 4.6 ...
$ avg_spending_usd : num [1:120] 1250 980 620 710 1180 890 1220 960 600 690 ...
$ hotel_occupancy_pct: num [1:120] 82.3 82.3 82.3 82.3 82.3 82.3 78.1 78.1 78.1 78.1 ...
$ satisfaction_score : num [1:120] 8.1 7.9 7.5 7.7 8 7.6 8 7.8 7.4 7.6 ...
- attr(*, "spec")=
.. cols(
.. year = col_double(),
.. quarter = col_character(),
.. origin = col_character(),
.. visitors_stayover = col_double(),
.. visitors_cruise = col_double(),
.. avg_stay_nights = col_double(),
.. avg_spending_usd = col_double(),
.. hotel_occupancy_pct = col_double(),
.. satisfaction_score = col_double()
.. )
- attr(*, "problems")=<externalptr>
origin is character (chr),
visitors_stayover is numeric (num).
Question 3: Mean average spending?
R
summary(visitors$avg_spending_usd)
OUTPUT
Min. 1st Qu. Median Mean 3rd Qu. Max.
400.0 677.5 900.0 898.2 1142.5 1310.0
The mean avg_spending_usd (898.2) is the Mean entry of the summary
output. You can also get just the mean with:
R
mean(visitors$avg_spending_usd)
OUTPUT
[1] 898.1667
Question 4: Rows per origin country?
R
table(visitors$origin)
OUTPUT
Canada Colombia Netherlands Other United States
20 20 20 20 20
Venezuela
20
Each origin country has 20 rows (4 quarters x 5 years).
Challenge 2: Practice with objects and functions
- Create an object called my_island that stores the text "Aruba".
- Create an object called area_km2 that stores the value 180.
- Use the nchar() function to count the number of characters in my_island.
- Use round() to round the mean of avg_spending_usd to the nearest whole number. (Hint: you can put one function inside another.)
R
# 1 and 2: Create objects
my_island <- "Aruba"
area_km2 <- 180
# 3: Count characters
nchar(my_island)
OUTPUT
[1] 5
R
# 4: Round the mean spending
round(mean(visitors$avg_spending_usd), digits = 0)
OUTPUT
[1] 898
Nesting functions (putting one inside another) is common in R. R
evaluates from the inside out: first it calculates
mean(visitors$avg_spending_usd), then it passes that result
to round().
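Since R 4.1 there is also the native pipe |>, which passes the result on its left as the first argument of the function on its right. The sketch below, with made-up spending values, writes the same calculation both ways; the piped version reads left to right, in the order the steps happen.

```r
# Made-up spending values (not the full course dataset)
spending <- c(1250, 980, 620, 710)

# Nested: evaluated from the inside out
round(mean(spending), digits = 0)

# Piped: take spending, then mean(), then round()
spending |> mean() |> round(digits = 0)
```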
Summary
You have now completed your first hands-on R session. You can:
- Find your way around RStudio
- Create objects and use functions
- Install and load packages
- Import a CSV file
- Inspect your data with View(), head(), str(), glimpse(), summary(), and table()
In SPSS terms, you have learned the equivalent of opening a dataset, switching between Data View and Variable View, and running Descriptives and Frequencies. The difference is that everything you did is saved in a script that you can re-run at any time.
- RStudio is your workspace: it combines a script editor, console, and data viewer
- haven::read_sav() imports SPSS files directly, preserving labels
- summary() and table() replace the Descriptives and Frequencies menus in SPSS, and str() plays the role of Variable View