Instructor Notes

General Teaching Approach

This course follows the Carpentries live-coding pedagogy: the instructor types code live while participants follow along. Avoid slides for code — always demonstrate in RStudio.

Key principles:
- Start from what they know: Every R operation is introduced alongside its SPSS equivalent. Use SPSS terminology first, then introduce the R term.
- Wow first, skills second: Episode 1 is pure motivation. Show impressive things before asking anyone to type.
- Sticky notes: Use colored sticky notes (or digital equivalents) for real-time feedback. Green = I’m following. Red = I need help.
- Helpers: Aim for 1 helper per 5-8 participants to assist with individual issues without stopping the class.

Session Structure

Session 1 (5-6 hours with breaks)

Episode	Time	Notes
01 - The Case for Switching	45 min	Instructor demo only, no participant coding. Demo is the UA SIDS reference-list pull from `island-research-reference-data` (see Episode 1 instructor block). Also shows the Xander Bogaerts capstone HTML as the Friday-afternoon target — rendered version at `episodes/files/xander-bogaerts-report.html` and source at `episodes/files/xander-bogaerts-report-template.Rmd`.
Break	15 min
02 - Your First R Session	90 min	First hands-on. Go slow. Many will struggle with typos. The “Before you import — set up your workshop folder” subsection is a deliberate whole-room synchronized moment: project the download links on the screen, then wait for green stickies confirming the file shows in the Files pane before typing `read_csv()`.
Break	15 min
03 - Data Manipulation	90 min	The pipe operator is the key “aha” moment
Break	15 min
04 - Visualization	60 min	End on a high — everyone leaves with a beautiful chart
Wrap-up + homework brief	15 min	Project the homework page on the screen. Walk through the four-step assignment out loud. Tell participants the page URL is bookmarked under “For Learners → Homework brief” on the course site so they can open it on any device overnight. Emphasise: 30–60 minutes is enough, do not attempt R Markdown yet (that is Day 2). Bring the script to Day 2 open lab.

Session 2 (5-6 hours with breaks)

Episode	Time	Notes
Review and troubleshooting	30 min	Address questions from between-session practice
05 - Statistical Analysis	120 min	Core for survey researchers. The normality-testing section (histogram, Q-Q plot, Shapiro-Wilk, robustness note) maps directly onto the SPSS Explore output most participants will recognise. Take your time.
Break	15 min
06 - Reproducible Reporting	60 min	R Markdown is often the biggest “wow” for SPSS users. Ends with the Xander Bogaerts capstone section that Episode 1’s opening teased. Participants pull `xander-bogaerts-report-template.Rmd` and `xander-report.css` from the UA GitHub raw URL via the `download.file()` block in the episode; walk through the template’s structure live once both files are in their working directory.
Break	15 min
07 - Where to Go from Here	55 min	End with practical next steps. The new UA datasets subsection (CAS_election_data and island-research-reference-data) is a chance to live-demo `read.csv()` straight from a raw GitHub URL — most SPSS users have never seen data load over HTTPS without a manual download.

Per-episode scene transitions

Each episode opens with one atmospheric scene image at the top of the page (fig/scene_1.jpg through scene_7.jpg) and a one-line quip caption. The captions do most of the work; instructors who prefer to start straight from the first heading should feel free to. For instructors who want a single-beat acknowledgment when arriving at each new episode, the lines below match the captions in tone and feed naturally into the episode’s opening content. They are optional.

Ep	Caption on page	Optional transition line
1	One road costs you a license fee. The other one costs you a learning curve.	(Episode 1 opens with the workshop’s full opening sequence; no separate transition needed.)
2	The iguana is optional. The coconut water is not.	“We’re at the bar, R is open, the iguana is doing iguana things. Time to type something.”
3	You can’t cook without ingredients. You can’t wrangle without verbs.	“Three jars on the counter today: filter, select, mutate. Everything else in dplyr is a variation on those three.”
4	SPSS gives you a chart. ggplot2 gives you a language.	“ggplot is grammar, not buttons. By the end of this episode you’ll be writing sentences.”
5	Same tests, fewer menus, more crabs.	“The tests you know from SPSS — t-test, ANOVA, chi-square, regression — are all here. The crabs are the new part. Trust the crabs.”
6	Your supervisor changed the sample. Again. Good thing you only need one button.	“This is the moment R Markdown earns the price of admission. One button, new data in, finished document out.”
7	You learned the basics. The map has a lot more islands.	“The basics are behind you. The next fifty-five minutes are about where to go from here, with islands marked for you to chart.”

Pick one beat, deliver it, move into the page’s first heading. Do not stack a second sentence on top.

Common Issues

Installation problems: The pre-course installation clinic should catch most of these. Have a USB drive with R and RStudio installers as backup.
Typos: SPSS users are not used to typing commands. Expect many syntax errors. Normalize this — “error messages are how R talks to you.”
Parentheses and quotes: The most common beginner errors. Show how RStudio auto-completes these.
Loading packages: Participants will forget library(). Remind them at the start of each episode.
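
One way to catch missing packages before anyone types library() is a short check at the start of the session. The sketch below is an illustration, not part of the lesson. The package list is an assumption based on the packages named in these notes, so adjust it to what your episodes actually load.

```r
# Packages assumed for this workshop; edit to match your episodes
required <- c("tidyverse", "broom", "rmarkdown")

# requireNamespace() returns TRUE or FALSE without attaching the package
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All workshop packages are installed.")
}
```

Participants can paste this during the installation clinic and raise a red sticky if anything is listed.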

Local Data Notes

The course uses Dutch Caribbean datasets to keep examples relevant:
- CBS Aruba tourism and CPI data (Excel downloads from cbs.aw)
- World Bank indicators via the WDI package (used in Episode 7 only; Episode 1’s demo was switched to the SIDS reference list below after the WDI tourism series was found missing for 2019-2023)
- CBS Netherlands BES island data via cbsodataR
- CAS_election_data: Aruba, Curacao, Sint Maarten election results 1985-2025 (tidy CSV at github.com/University-of-Aruba/CAS_election_data). Used in Episode 7.
- island-research-reference-data: country reference list with SIDS, SNIJ, and World Bank classifications (CSV at github.com/University-of-Aruba/island-research-reference-data). Used in Episodes 1 and 7. A backup copy is committed at episodes/data/countries_backup.csv for offline fallback.

Prepare cleaned versions of these datasets in the episodes/data/ folder before the course. Test all data downloads — URLs and APIs can change.
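
A pre-course check of the data sources could look like the sketch below. It only tests whether the first line of each source can be read. The helper name can_read() is hypothetical, and the local path assumes the script is run from the repository root. Only the reference-list URL from the Episode 1 demo is included; add your own sources as needed.

```r
# Returns TRUE if the first line of a file or URL can be read, FALSE otherwise
can_read <- function(location) {
  tryCatch(
    {
      readLines(location, n = 1, warn = FALSE)
      TRUE
    },
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
}

# Sources taken from these notes; the local path assumes the repository root
sources <- c(
  reference_list = "https://raw.githubusercontent.com/University-of-Aruba/island-research-reference-data/main/countries/countries_reference_xlsform.csv",
  offline_backup = "episodes/data/countries_backup.csv"
)

status <- vapply(sources, can_read, logical(1))
print(status)
```

Any FALSE in the output is worth fixing before the workshop day rather than during it.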

Note on the elections example

The Episode 7 election-data example deliberately groups MEP and AVP together versus all other parties, rather than singling out one party. Aruba’s two-party dynamic means filtering on party == "MEP" (or "AVP") on its own can read as partisan bias in a publicly distributed course. If extending the example live, default to the same grouped framing or to all-parties views (e.g. all parties for the most recent election, or vote share over time). If a participant asks why we don’t filter to one party, this is the reason worth naming briefly: it is a small editorial choice that protects the course and the network’s neutrality.
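
If you extend the example live, the grouped framing can be sketched as follows. The rows are invented for illustration and are not real election results. The column names party and votes are assumptions, so check them against the actual CSV before using this in class.

```r
library(dplyr)

# Invented rows for illustration only; not real election results,
# and the column names are assumptions rather than the dataset's own
votes <- tibble(
  party = c("MEP", "AVP", "Party C", "Party D"),
  votes = c(400, 350, 150, 100)
)

grouped <- votes |>
  # Put the two large parties in one bloc, everything else in another
  mutate(bloc = if_else(party %in% c("MEP", "AVP"), "MEP + AVP", "All other parties")) |>
  group_by(bloc) |>
  summarise(votes = sum(votes), .groups = "drop") |>
  # Share of the total vote per bloc
  mutate(share = votes / sum(votes))

grouped
```

The same mutate() step works unchanged if you then facet or plot by bloc instead of by party.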

Train-the-Trainer

This course is designed for replication. If you are adapting it for another island or institution:
1. Replace datasets with locally relevant equivalents
2. Adjust the SPSS operations covered based on your pre-course survey results
3. Keep the “wow first, skills second” structure
4. All materials are CC-BY 4.0; please attribute the DCDC Network

The Case for Switching

Live demo script

This is the complete script to run live. Practice this before the workshop. Make sure tidyverse is installed — everything we need for the demo comes with it.

Step 1: Frame the source before you type

Before the first keystroke, name what the room is about to see. The CSV about to load is in a GitHub repository maintained at the University of Aruba — island-research-reference-data — part of the DCDC Network’s shared infrastructure for island research. It is not a third-party service you hope stays up. It is research data the network owns and curates. That framing matters: the “wow” is not just that R can read a URL. It is that the data layer underneath belongs to us.

Step 2: Pull the SIDS reference list

Open a new R script in RStudio and type (or paste) the following. Run it line by line so participants can watch each step.

R

# Load packages (install tidyverse once, before the workshop)
# install.packages("tidyverse")
library(tidyverse)

# Pull the UA island-research reference list straight from GitHub
countries <- read_csv(
  "https://raw.githubusercontent.com/University-of-Aruba/island-research-reference-data/main/countries/countries_reference_xlsform.csv"
)

# Quick look at what we got
head(countries)

Pause. Point out: “No browser. No download dialog. No save-as. The file is now a live object in my session, with over a dozen columns per country.”

Step 3: Filter to SIDS and chart by region

R

countries |>
  filter(is_sids == 1) |>
  count(wb_region) |>
  ggplot(aes(x = reorder(wb_region, n), y = n)) +
  geom_col(fill = "#44759e") +
  coord_flip() +
  labs(
    title = "Small island developing states by World Bank region",
    x = NULL,
    y = "Number of SIDS"
  ) +
  theme_minimal(base_size = 14)

Pause again. Key talking points:

“This chart is ready for a report as-is. Title, axis labels, colour, proportions — all set in code.”
“If the UA team adds a country to the reference list tomorrow, I re-run this script and the chart updates. No re-click, no re-export.”
“Every editorial choice — what counts as a SIDS, which region goes where — is traceable, because the definitions sit in the source CSV you just pulled.”

Step 4: Show the contrast with SPSS

Ask the audience: “How would you have done this in SPSS?”

Walk through it slowly. Make it sting a little — this is the moment the cost of the current workflow lands.

Go looking for a canonical SIDS list. UN-OHRLLS? UN DESA? A supplementary table from a recent paper? Pick one and hope it is current.
If you cannot find a clean download, email a colleague who might have one saved somewhere. Wait for a reply. Half a day, on a good day.
Open the CSV you eventually receive. Discover that “Cabo Verde” and “Cape Verde” are different strings, that some country codes are ISO-2 and others ISO-3, that one row has a stray trailing comma. Clean by hand.
Import the cleaned file into SPSS. Recode the region variable because whatever classification the file uses does not match World Bank regions.
Analyze > Descriptive Statistics > Frequencies on region. Copy the output table.
Graphs > Chart Builder, drag variables, format the chart, copy, paste into Word.

Then say: “That is half a morning, on a good day, assuming the colleague replies and the file is clean. In R it was six lines and ten seconds — from a dataset the DCDC Network maintains, so the next person who needs it gets the same clean answer.”

Backup plan

If the Wi-Fi is unreliable, the same CSV is saved locally at episodes/data/countries_backup.csv. Swap the read_csv() call for the local path. The path below is relative to the episodes/ folder, so make that your working directory first:

R

countries <- read_csv("data/countries_backup.csv")

Then proceed with the filter() |> count() |> ggplot() pipeline as normal.
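
If you would rather not edit the script mid-demo, the swap can be automated. The sketch below uses a hypothetical helper, read_with_fallback(), which is not part of the lesson. It assumes the reader function signals an error when the primary source cannot be reached. The demonstration uses a temporary file and base read.csv() so that it runs without a network connection.

```r
# Try the primary source first; on any error, read the fallback instead.
# `reader` defaults to readr::read_csv to match the demo script.
read_with_fallback <- function(primary, fallback, reader = readr::read_csv) {
  tryCatch(
    reader(primary),
    error = function(e) {
      message("Primary source failed, reading local backup instead.")
      reader(fallback)
    }
  )
}

# Demonstration with a deliberately missing primary and a temporary local file
backup <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, label = c("a", "b")), backup, row.names = FALSE)

demo <- suppressWarnings(
  read_with_fallback("no-such-file.csv", backup, reader = read.csv)
)
demo
```

In the demo itself, the primary would be the GitHub URL from Step 2 and the fallback would be "data/countries_backup.csv", with the default reader left in place.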

Your First R Session

Pacing notes

Spend time on the RStudio pane orientation. Have participants identify each pane on their own screen before moving on.
The <- assignment operator trips people up. Give them a few minutes to practice creating objects with different names and values.
When loading tidyverse, the startup messages can be alarming to beginners. Reassure them that the “Attaching packages” and “Conflicts” messages are normal and expected.
Do the “Before you import — set up your workshop folder” subsection as a whole-room moment, not as reading. Project the CSV download page, walk everyone through the browser download, the New Project dialog, and creating the data subfolder. Wait for green sticky notes confirming the file appears in the Files pane before typing read_csv(). This is the most common point of failure in the course, and five deliberate minutes up front avoid twenty scattered minutes of troubleshooting later.
If a participant cannot download (blocked network, locked laptop), have a USB stick or shared-drive copy of aruba_visitors.csv ready as fallback.
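
For the assignment-operator practice mentioned above, a few lines like these can be typed live. The names and numbers are made up. The last step is worth dwelling on, on the assumption that SPSS users may expect a computed value to update on its own.

```r
# Made-up values for practice; any names and numbers will do
island <- "Aruba"
visitors_jan <- 1200
visitors_feb <- 950

# Objects can be combined into new objects
visitors_total <- visitors_jan + visitors_feb

# Reassigning replaces the old value without any warning
visitors_jan <- 1300

# visitors_total still holds the value computed earlier;
# it changes only when the line that creates it is run again
visitors_total
```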

Data Manipulation

Teaching tips

Write the pipe |> on the whiteboard and say “and then” out loud every time you use it. This mental model sticks.
Build pipelines live, one step at a time. Run after each added line so participants can see how the output changes.
The most common beginner mistake is putting |> at the start of a line instead of at the end of the previous line. Emphasize: the pipe goes at the end of the line, so R knows the expression continues.
Compare nested function calls to piped code side by side. The readability advantage sells itself.
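
The side-by-side comparison and the misplaced-pipe mistake can both be shown with base R alone. This is a minimal sketch with made-up numbers. It assumes R 4.1 or later, where the native pipe |> is available.

```r
# Made-up monthly counts for illustration
visitors <- c(1200, 950, 1430, 1100)

# Nested: has to be read from the inside out
nested <- round(mean(sqrt(visitors)), 1)

# Piped: read top to bottom, saying "and then" at each |>
piped <- visitors |>
  sqrt() |>
  mean() |>
  round(1)

# Both forms give the same answer
identical(nested, piped)

# Pipe at the start of a line: R treats the first line as finished,
# then cannot parse the next line because it begins with |>
misplaced <- "visitors\n  |> sqrt()"
result <- tryCatch(
  parse(text = misplaced),
  error = function(e) "parse error"
)
result
```

Wrapping the broken code in parse() keeps the script running so the error can be shown without stopping the demo.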

Visualization with ggplot2

Instructor note

The faceting example is a great place to pause and let learners experiment. Encourage them to try:

facet_wrap(~ origin, ncol = 2) to control the layout
facet_grid(origin ~ .) for a grid arrangement
removing scales = "free_y" to see the difference
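
The three variations can be prepared as separate plot objects so learners can print and compare them. The data below are invented for illustration. Only the origin column name comes from these notes; year and arrivals are assumptions.

```r
library(ggplot2)

# Invented numbers; only the `origin` column name comes from the notes
visitors <- data.frame(
  year = rep(2021:2023, times = 3),
  origin = rep(c("Region A", "Region B", "Region C"), each = 3),
  arrivals = c(500, 650, 700, 40, 55, 60, 120, 150, 180)
)

base_plot <- ggplot(visitors, aes(x = year, y = arrivals)) +
  geom_line()

# Each panel gets its own y-axis range
free_scales <- base_plot + facet_wrap(~ origin, ncol = 2, scales = "free_y")

# All panels share one y-axis, so small series look flat
shared_scales <- base_plot + facet_wrap(~ origin, ncol = 2)

# One row per origin, stacked vertically
grid_layout <- base_plot + facet_grid(origin ~ .)
```

Print each object in turn (for example, type free_scales in the console) and ask the room what changed.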

Statistical Analysis in R

Instructor note

This is a good time to reinforce the reproducibility advantage. In SPSS, if a reviewer asks you to re-run an analysis with a different subset, you have to click through the dialogs again. In R, you change one line of code and re-run the script.

Emphasize that broom::tidy() produces a data frame — this means students can use all the dplyr verbs they learned in Episode 3 on their statistical results (filtering significant results, arranging by p-value, etc.).
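
A short illustration of that point, using the built-in mtcars data as a stand-in for the workshop dataset. The model itself is arbitrary and chosen only to produce a few coefficients.

```r
library(dplyr)
library(broom)

# mtcars ships with R; it stands in here for the workshop data
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# tidy() returns one row per model term, as a data frame
results <- tidy(model)

# The Episode 3 verbs work on it like on any other data frame
significant <- results |>
  filter(term != "(Intercept)", p.value < 0.05) |>
  arrange(p.value)

significant
```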

Reproducible Reporting

Common knitting problems

The most common issue learners face is that knitting fails because the .Rmd file does not load packages or data that earlier chunks depend on. Remind participants that knitting starts from a blank environment — every package and dataset must be loaded within the .Rmd file itself, even if it is already loaded in their current R session.

Another common issue: file paths. During knitting, the working directory is the folder where the .Rmd file is saved, so a relative path such as read_csv("data/aruba_visitors.csv") is resolved from there. Make sure the data folder sits next to the .Rmd file.
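
To make path problems easier to diagnose, a small guard can go in the first chunk of the .Rmd. The helper name check_data_file() is hypothetical and not part of the lesson template. It is a sketch of one way to turn a confusing knit failure into a readable message.

```r
# Stops with a readable message if the file is not where the .Rmd expects it
check_data_file <- function(path) {
  if (!file.exists(path)) {
    stop(
      "Cannot find '", path, "' from working directory '", getwd(), "'.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# In the workshop .Rmd this would be called before read_csv(), for example:
# check_data_file("data/aruba_visitors.csv")
```

The error message prints the working directory, which usually shows at once whether the .Rmd was saved in the wrong folder.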