Instructor Notes
General Teaching Approach
This course follows the Carpentries live-coding pedagogy: the instructor types code live while participants follow along. Avoid slides for code — always demonstrate in RStudio.
Key principles:
- Start from what they know: Every R operation is introduced alongside its SPSS equivalent. Use SPSS terminology first, then introduce the R term.
- Wow first, skills second: Episode 1 is pure motivation. Show impressive things before asking anyone to type.
- Sticky notes: Use colored sticky notes (or digital equivalents) for real-time feedback. Green = I’m following. Red = I need help.
- Helpers: Aim for 1 helper per 5-8 participants to assist with individual issues without stopping the class.
Session Structure
Session 1 (5-6 hours with breaks)
| Episode | Time | Notes |
|---|---|---|
| 01 - The Case for Switching | 45 min | Instructor demo only, no participant coding. Demo is the UA SIDS reference-list pull from island-research-reference-data (see Episode 1 instructor block). Also shows the Xander Bogaerts capstone HTML as the Friday-afternoon target: rendered version at episodes/files/xander-bogaerts-report.html and source at episodes/files/xander-bogaerts-report-template.Rmd. |
| Break | 15 min | |
| 02 - Your First R Session | 90 min | First hands-on. Go slow. Many will struggle with typos. The “Before you import — set up your workshop folder” subsection is a deliberate whole-room synchronized moment: project the download links on the screen, wait for green stickies in the Files pane before typing read_csv(). |
| Break | 15 min | |
| 03 - Data Manipulation | 90 min | The pipe operator is the key “aha” moment |
| Break | 15 min | |
| 04 - Visualization | 60 min | End on a high — everyone leaves with a beautiful chart |
| Wrap-up + homework brief | 15 min | Project the homework page on the screen. Walk through the four-step assignment out loud. Tell participants the page URL is bookmarked under “For Learners → Homework brief” on the course site so they can open it on any device overnight. Emphasise: 30–60 minutes is enough, do not attempt R Markdown yet (that is Day 2). Bring the script to Day 2 open lab. |
Session 2 (5-6 hours with breaks)
| Episode | Time | Notes |
|---|---|---|
| Review and troubleshooting | 30 min | Address questions from between-session practice |
| 05 - Statistical Analysis | 120 min | Core for survey researchers. The normality-testing section (histogram, Q-Q plot, Shapiro-Wilk, robustness note) maps directly onto the SPSS Explore output most participants will recognise. Take your time. |
| Break | 15 min | |
| 06 - Reproducible Reporting | 60 min | R Markdown is often the biggest “wow” for SPSS users. Ends with the Xander Bogaerts capstone section that Episode 1’s opening teased: walk through the template at episodes/files/xander-bogaerts-report-template.Rmd if it has landed; otherwise describe its structure using the scaffold in the episode. |
| Break | 15 min | |
| 07 - Where to Go from Here | 55 min | End with practical next steps. The new UA datasets subsection (CAS_election_data and island-research-reference-data) is a chance to live-demo read.csv() straight from a raw GitHub URL; most SPSS users have never seen data load over HTTPS without a manual download. |
Per-episode scene transitions
Each episode opens with one atmospheric scene image at the top of the
page (fig/scene_1.jpg through scene_7.jpg) and
a one-line quip caption. The captions do most of the work; instructors
who prefer to start straight from the first heading should feel free to.
For instructors who want a single-beat acknowledgment when arriving at
each new episode, the lines below match the captions in tone and feed
naturally into the episode’s opening content. They are optional.
| Ep | Caption on page | Optional transition line |
|---|---|---|
| 1 | One road costs you a license fee. The other one costs you a learning curve. | (Episode 1 opens with the workshop’s full opening sequence; no separate transition needed.) |
| 2 | The iguana is optional. The coconut water is not. | “We’re at the bar, R is open, the iguana is doing iguana things. Time to type something.” |
| 3 | You can’t cook without ingredients. You can’t wrangle without verbs. | “Three jars on the counter today: filter, select, mutate. Everything else in dplyr is a variation on those three.” |
| 4 | SPSS gives you a chart. ggplot2 gives you a language. | “ggplot is grammar, not buttons. By the end of this episode you’ll be writing sentences.” |
| 5 | Same tests, fewer menus, more crabs. | “The tests you know from SPSS — t-test, ANOVA, chi-square, regression — are all here. The crabs are the new part. Trust the crabs.” |
| 6 | Your supervisor changed the sample. Again. Good thing you only need one button. | “This is the moment R Markdown earns the price of admission. One button, new data in, finished document out.” |
| 7 | You learned the basics. The map has a lot more islands. | “The basics are behind you. The next forty-five minutes are about where to go from here, with islands marked for you to chart.” |
Pick one beat, deliver it, move into the page’s first heading. Do not stack a second sentence on top.
Common Issues
- Installation problems: The pre-course installation clinic should catch most of these. Have a USB drive with R and RStudio installers as backup.
- Typos: SPSS users are not used to typing commands. Expect many syntax errors. Normalize this — “error messages are how R talks to you.”
- Parentheses and quotes: The most common beginner errors. Show how RStudio auto-completes these.
- Loading packages: Participants will forget library(). Remind them at the start of each episode.
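It can help to project the error a forgotten library() call produces, so participants recognise it later. The sketch below triggers it on purpose; summarise_visitors() is a made-up name that exists in no package and is used only to provoke the error safely.

```r
# Calling a function whose package is not loaded fails with
# "could not find function". summarise_visitors() is a hypothetical
# name, chosen so the error is guaranteed.
msg <- tryCatch(
  summarise_visitors(),
  error = function(e) conditionMessage(e)
)
print(msg)
```

When a participant reports this message for a real function such as read_csv(), the first thing to check is whether the matching library() line has been run in the current session.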
Local Data Notes
The course uses Dutch Caribbean datasets to keep examples relevant:
- CBS Aruba tourism and CPI data (Excel downloads from cbs.aw)
- World Bank indicators via the WDI package (used in Episode 7 only; Episode 1’s demo was switched to the SIDS reference list below after the WDI tourism series was found missing for 2019-2023)
- CBS Netherlands BES island data via cbsodataR
- CAS_election_data: Aruba, Curacao, Sint Maarten election results 1985-2025 (tidy CSV at github.com/University-of-Aruba/CAS_election_data). Used in Episode 7.
- island-research-reference-data: country reference list with SIDS, SNIJ, and World Bank classifications (CSV at github.com/University-of-Aruba/island-research-reference-data). Used in Episodes 1 and 7. A backup copy is committed at episodes/data/countries_backup.csv for offline fallback.
Prepare cleaned versions of these datasets in the
episodes/data/ folder before the course. Test all data
downloads — URLs and APIs can change.
Note on the elections example
The Episode 7 election-data example deliberately groups MEP and AVP
together versus all other parties, rather than singling out one party.
Aruba’s two-party dynamic means filtering on party == "MEP"
(or "AVP") on its own can read as partisan bias in a
publicly distributed course. If extending the example live, default to
the same grouped framing or to all-parties views (e.g. all parties for
the most recent election, or vote share over time). If a participant
asks why we don’t filter to one party, this is the reason worth naming
briefly: it is a small editorial choice that protects the course and the
network’s neutrality.
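The grouped framing can be sketched as follows. This is an illustration only: the column names (election, party, votes) are assumptions rather than the actual CAS_election_data schema, "Party C" and "Party D" are placeholders, and the vote counts are invented.

```r
library(dplyr)

# Toy rows with invented vote counts; column names are assumed,
# not taken from the real CAS_election_data file.
results <- tibble(
  election = "Election X",
  party    = c("MEP", "AVP", "Party C", "Party D"),
  votes    = c(10, 8, 4, 2)
)

grouped <- results |>
  # Put the two large parties in one group, everyone else in another
  mutate(
    party_group = if_else(
      party %in% c("MEP", "AVP"), "MEP + AVP", "All other parties"
    )
  ) |>
  group_by(election, party_group) |>
  summarise(votes = sum(votes), .groups = "drop_last") |>
  # Still grouped by election here, so shares sum to 1 per election
  mutate(share = votes / sum(votes)) |>
  ungroup()

grouped
```

Because the grouping happens in a single mutate() step, switching to an all-parties view only means dropping that step and grouping by party instead.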
Train-the-Trainer
This course is designed for replication. If you are adapting it for another island or institution:
1. Replace datasets with locally relevant equivalents
2. Adjust the SPSS operations covered based on your pre-course survey results
3. Keep the “wow first, skills second” structure
4. All materials are CC-BY 4.0; please attribute the DCDC Network
The Case for Switching
Live demo script
This is the complete script to run live. Practice this before
the workshop. Make sure tidyverse is installed —
everything we need for the demo comes with it.
Step 1: Frame the source before you type
Before the first keystroke, name what the room is about to see. The
CSV about to load is in a GitHub repository maintained at the University
of Aruba — island-research-reference-data — part of the
DCDC Network’s shared infrastructure for island research. It is not a
third-party service you hope stays up. It is research data the network
owns and curates. That framing matters: the “wow” is not just that R can
read a URL. It is that the data layer underneath belongs to us.
Step 2: Pull the SIDS reference list
Open a new R script in RStudio and type (or paste) the following. Run it line by line so participants can watch each step.
R
# Load packages (install tidyverse once, before the workshop)
# install.packages("tidyverse")
library(tidyverse)
# Pull the UA island-research reference list straight from GitHub
countries <- read_csv(
  "https://raw.githubusercontent.com/University-of-Aruba/island-research-reference-data/main/countries/countries_reference_xlsform.csv"
)
# Quick look at what we got
head(countries)
Pause. Point out: “No browser. No download dialog. No save-as. The file is now a live object in my session, with over a dozen columns per country.”
Step 3: Filter to SIDS and chart by region
R
countries |>
  filter(is_sids == 1) |>
  count(wb_region) |>
  ggplot(aes(x = reorder(wb_region, n), y = n)) +
  geom_col(fill = "#44759e") +
  coord_flip() +
  labs(
    title = "Small island developing states by World Bank region",
    x = NULL,
    y = "Number of SIDS"
  ) +
  theme_minimal(base_size = 14)
Pause again. Key talking points:
- “This chart is ready for a report as-is. Title, axis labels, colour, proportions — all set in code.”
- “If the UA team adds a country to the reference list tomorrow, I re-run this script and the chart updates. No re-click, no re-export.”
- “Every editorial choice — what counts as a SIDS, which region goes where — is traceable, because the definitions sit in the source CSV you just pulled.”
Step 4: Show the contrast with SPSS
Ask the audience: “How would you have done this in SPSS?”
Walk through it slowly. Make it sting a little — this is the moment the cost of the current workflow lands.
- Go looking for a canonical SIDS list. UN-OHRLLS? UN DESA? A supplementary table from a recent paper? Pick one and hope it is current.
- If you cannot find a clean download, email a colleague who might have one saved somewhere. Wait for a reply. Half a day, on a good day.
- Open the CSV you eventually receive. Discover that “Cabo Verde” and “Cape Verde” are different strings, that some country codes are ISO-2 and others ISO-3, that one row has a stray trailing comma. Clean by hand.
- Import the cleaned file into SPSS. Recode the region variable because whatever classification the file uses does not match World Bank regions.
- Analyze > Descriptive Statistics > Frequencies on region. Copy the output table.
- Graphs > Chart Builder, drag variables, format the chart, copy, paste into Word.
Then say: “That is half a morning, on a good day, assuming the colleague replies and the file is clean. In R it was six lines and ten seconds — from a dataset the DCDC Network maintains, so the next person who needs it gets the same clean answer.”
Backup plan
If the Wi-Fi is unreliable, the same CSV is saved locally at
episodes/data/countries_backup.csv. Swap the
read_csv() call for the local path:
R
countries <- read_csv("data/countries_backup.csv")
Then proceed with the
filter() |> count() |> ggplot() pipeline as
normal.
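If you would rather not edit the script mid-demo, the swap can be automated. The sketch below is one possible approach, not part of the course materials: read_with_fallback() is a hypothetical helper, and the temporary file with two toy rows only stands in for the real backup so the example runs without a network connection.

```r
library(readr)

# Hypothetical helper: try the primary source first and fall back to a
# local copy if reading fails for any reason (no Wi-Fi, moved URL, ...).
read_with_fallback <- function(primary, fallback) {
  tryCatch(
    read_csv(primary, show_col_types = FALSE),
    error = function(e) {
      message("Primary source failed, using local copy: ", fallback)
      read_csv(fallback, show_col_types = FALSE)
    }
  )
}

# Self-contained check: a temporary file with toy rows plays the backup,
# and a path that does not exist plays the unreachable source.
backup <- tempfile(fileext = ".csv")
writeLines(c("country,is_sids", "A,1", "B,0"), backup)

demo <- read_with_fallback("file-that-does-not-exist.csv", backup)
nrow(demo)
```

In the live demo, the first argument would be the raw GitHub URL and the second "data/countries_backup.csv". The message() call makes it visible to the room which source was actually used.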
Your First R Session
Pacing notes
- Spend time on the RStudio pane orientation. Have participants identify each pane on their own screen before moving on.
- The <- assignment operator trips people up. Give them a few minutes to practice creating objects with different names and values.
- When loading tidyverse, the startup messages can be alarming to beginners. Reassure them that the “Attaching packages” and “Conflicts” messages are normal and expected.
- Do the “Before you import — set up your workshop folder” subsection as a whole-room moment, not as reading. Project the CSV download page, walk everyone through the browser download, the New Project dialog, and creating the data subfolder. Wait for green sticky notes in the Files pane before typing read_csv(). This is the most common point of failure in the course and it is worth five deliberate minutes up front to avoid twenty scattered minutes of troubleshooting later.
- If a participant cannot download (blocked network, locked laptop), have a USB stick or shared-drive copy of aruba_visitors.csv ready as fallback.
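For the assignment-operator practice, a short warm-up along these lines may be enough. The object names and values are invented for illustration and do not come from the course data.

```r
# Create objects with <- and use them in a calculation (values are made up)
island <- "Aruba"
nights_stayed <- 7
daily_budget <- 150

trip_cost <- nights_stayed * daily_budget
trip_cost

# Reassigning replaces the old value without any warning
nights_stayed <- 10

# trip_cost does NOT update by itself; it keeps the value computed earlier
trip_cost
```

The last line is worth a pause: an object stores the result of a calculation, not the formula, so it only changes when the assignment is run again.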
Data Manipulation
Teaching tips
- Write the pipe |> on the whiteboard and say “and then” out loud every time you use it. This mental model sticks.
- Build pipelines live, one step at a time. Run after each added line so participants can see how the output changes.
- The most common beginner mistake is putting |> at the start of a line instead of at the end of the previous line. Emphasize: the pipe goes at the end of the line, so R knows the expression continues.
- Compare nested function calls to piped code side by side. The readability advantage sells itself.
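A side-by-side comparison might look like the sketch below. The toy table and its column names are assumptions made for illustration, not the real aruba_visitors.csv schema, and the numbers are invented.

```r
library(dplyr)

# Toy data with invented values; column names are hypothetical
arrivals <- tibble(
  origin   = c("A", "B", "A", "B"),
  year     = c(2022, 2022, 2023, 2023),
  visitors = c(100, 50, 120, 40)
)

# Nested: has to be read from the inside out
nested <- arrange(
  select(filter(arrivals, year == 2023), origin, visitors),
  desc(visitors)
)

# Piped: read top to bottom, saying "and then" at every |>
piped <- arrivals |>
  filter(year == 2023) |>
  select(origin, visitors) |>
  arrange(desc(visitors))

# Both versions give the same result
all.equal(nested, piped)

# Common mistake (do not run): the first line is already a complete
# expression, so the line starting with |> is a syntax error.
# arrivals
#   |> filter(year == 2023)
```

Building the piped version live, one line at a time, also demonstrates the tip about running after each added step.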
Visualization with ggplot2
Instructor note
The faceting example is a great place to pause and let learners experiment. Encourage them to try:
- facet_wrap(~ origin, ncol = 2) to control the layout
- facet_grid(origin ~ .) for a grid arrangement
- removing scales = "free_y" to see the difference
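If learners need a starting point for these experiments, the three variants can be sketched on a toy table. Only the origin variable comes from the note above; the other column names and all numbers are invented, so adapt them to the episode's data.

```r
library(ggplot2)

# Toy data with invented values; groups differ a lot in scale on purpose
arrivals <- data.frame(
  origin   = rep(c("A", "B", "C", "D"), each = 3),
  year     = rep(2021:2023, times = 4),
  visitors = c(100, 120, 140, 10, 12, 15, 50, 45, 60, 5, 8, 6)
)

base_plot <- ggplot(arrivals, aes(x = year, y = visitors)) +
  geom_line()

# Two columns of panels, each panel with its own y-axis
p_wrap_free <- base_plot + facet_wrap(~ origin, ncol = 2, scales = "free_y")

# Same layout with a shared y-axis: small groups flatten out
p_wrap_fixed <- base_plot + facet_wrap(~ origin, ncol = 2)

# One row of the grid per origin
p_grid <- base_plot + facet_grid(origin ~ .)

p_wrap_free
```

Printing p_wrap_free and p_wrap_fixed one after the other makes the effect of scales = "free_y" obvious: with a shared axis, the small groups become nearly flat lines.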
Statistical Analysis in R
Instructor note
This is a good time to reinforce the reproducibility advantage. In SPSS, if a reviewer asks you to re-run an analysis with a different subset, you have to click through the dialogs again. In R, you change one line of code and re-run the script.
Emphasize that broom::tidy() produces a data frame —
this means students can use all the dplyr verbs they learned in Episode
3 on their statistical results (filtering significant results, arranging
by p-value, etc.).
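A minimal sketch of that point, using the built-in mtcars data as a stand-in for the workshop dataset (the model itself is arbitrary and chosen only for illustration):

```r
library(dplyr)
library(broom)

# mtcars ships with R; it stands in here for the workshop data
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# tidy() returns one row per model term, as a data frame
results <- tidy(model)
results

# Because it is a data frame, the Episode 3 verbs apply directly
significant <- results |>
  filter(term != "(Intercept)", p.value < 0.05) |>
  arrange(p.value)

significant
```

The same pattern works for the other tests broom supports, which is what makes it easy to collect several results into one table for a report.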
Reproducible Reporting
Common knitting problems
The most common issue learners face is that knitting fails because
the .Rmd file does not load packages or data that earlier
chunks depend on. Remind participants that knitting starts from a
blank environment — every package and dataset must be
loaded within the .Rmd file itself, even if it is already
loaded in their current R session.
Another common issue: file paths. During knitting, the working directory
is the folder where the .Rmd file is saved, which is not necessarily the
project folder. A call such as read_csv("data/aruba_visitors.csv")
therefore only works if the data folder sits next to the .Rmd file.
Make sure the data file is in the right relative location.
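When a participant's document will not knit, a quick diagnostic in the first chunk can show where R is actually looking. The helper below is a hypothetical sketch in base R, not part of the course materials. In a real .Rmd the same chunk would also hold the library() calls, since knitting starts from a blank environment.

```r
# Hypothetical helper: report the working directory and whether the
# data file exists there, before read_csv() is ever called.
check_data_file <- function(path) {
  found <- file.exists(path)
  message("Working directory: ", getwd())
  message(if (found) "Found: " else "Missing: ", path)
  invisible(found)
}

# Returns TRUE or FALSE invisibly and prints where it looked
check_data_file(file.path("data", "aruba_visitors.csv"))
```

If the message says "Missing", compare the printed working directory with the location of the data folder in the Files pane. The usual fix is to move the .Rmd file or the data folder so that the relative path matches.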