The problem
Researchers working on Small Island Developing States (SIDS), the Caribbean, or other sub-sovereign territories run into two recurring frictions when joining classifications onto their data.
The first is sub-sovereign disambiguation. Aruba (AW), Curaçao (CW), and Sint Maarten (SX) have their own ISO 3166-1 codes, but Bonaire, Sint Eustatius, and Saba share BQ. Most country-code packages either drop these three islands or collapse them into a single BQ row. That silently corrupts joins: island-level observations either go unmatched or all match the same row.
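The following minimal base-R sketch, using hypothetical values, shows why the collapse is silent. When both sides of a join carry the shared BQ code for three island-level rows, merge() pairs every row with every other row and returns nine rows without any warning. When each island has its own code (here the ISO 3166-2 subdivision codes BQ-BO, BQ-SE and BQ-SA), the same join returns one row per island.

```r
# Hypothetical island-level observations whose key has been collapsed to "BQ".
obs_collapsed <- data.frame(iso = c("BQ", "BQ", "BQ"), value = c(10, 20, 30))

# Hypothetical island-level reference attributes, keyed the same way.
ref_collapsed <- data.frame(
  iso    = c("BQ", "BQ", "BQ"),
  island = c("Bonaire", "Sint Eustatius", "Saba"),
  stringsAsFactors = FALSE
)

# Every observation matches every reference row: 3 x 3 = 9 rows, and no warning.
bad <- merge(obs_collapsed, ref_collapsed, by = "iso")
nrow(bad)  # 9

# The same data keyed by one ISO 3166-2 subdivision code per island.
obs_split <- data.frame(iso = c("BQ-BO", "BQ-SE", "BQ-SA"), value = c(10, 20, 30))
ref_split <- data.frame(
  iso    = c("BQ-BO", "BQ-SE", "BQ-SA"),
  island = c("Bonaire", "Sint Eustatius", "Saba"),
  stringsAsFactors = FALSE
)

# One row per island, as intended.
good <- merge(obs_split, ref_split, by = "iso")
nrow(good)  # 3
```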
The second is classification. UN-DESA's SIDS list, which covers both sovereign and associate members, is not a standard field in country-code dictionaries. The same holds for the sub-national island jurisdiction (SNIJ) classifications drawn from the broader island-studies literature. Researchers tend to keep these in a side spreadsheet and copy them by hand into each project.
islandcodes has a narrow scope. It ships the classification list, with disambiguating codes for the sub-sovereign cases, plus a few helpers designed to work alongside countrycode.
A first pass
library(islandcodes)
#>
#> Attaching package: 'islandcodes'
#> The following object is masked from 'package:datasets':
#>
#> islands
is_sids(c("Aruba", "Curacao", "Bonaire", "Brazil"))
#> [1] TRUE TRUE FALSE FALSE
is_snij(c("Aruba", "Curacao", "Bonaire", "Brazil"))
#> [1] TRUE TRUE TRUE FALSE
Notice that Aruba returns TRUE for both is_sids and is_snij. It is a UN-DESA SIDS associate member and also a sub-national island jurisdiction within the Kingdom of the Netherlands. Bonaire returns FALSE for SIDS but TRUE for SNIJ. It is a special municipality of the Netherlands proper, not a separate jurisdiction recognised by UN-DESA.
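The two classifications overlap without being nested, so crossing them explicitly can help. The sketch below hard-codes the flags printed above rather than calling islandcodes, which lets it run without the package installed. It then uses base R to pull out the places that are SNIJ but not SIDS.

```r
# Flags copied by hand from the is_sids()/is_snij() output above.
places <- c("Aruba", "Curacao", "Bonaire", "Brazil")
sids   <- c(TRUE, TRUE, FALSE, FALSE)
snij   <- c(TRUE, TRUE, TRUE, FALSE)

# Cross-tabulate the two classifications.
table(SIDS = sids, SNIJ = snij)

# SNIJ but not SIDS.
places[snij & !sids]  # "Bonaire"

# Both SIDS and SNIJ.
places[sids & snij]   # "Aruba" "Curacao"
```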
The package accepts country names, ISO 3166-1 alpha-2 codes, or hyphenated ISO 3166-2 subdivision codes for the three BES islands, such as BQ-BO for Bonaire.
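One way to accept names, alpha-2 codes and hyphenated codes through a single argument is to normalise the input and look it up in an alias table. The sketch below uses a hypothetical to_code() helper and a tiny hand-written lookup. It is not the package's internal matching, which also draws on countrycode for country names.

```r
# Hypothetical alias table: lower-cased inputs mapped to a canonical code.
# Not islandcodes' internal data; for illustration only.
lookup <- c(
  "aruba"   = "AW",    "aw"    = "AW",
  "curacao" = "CW",    "cw"    = "CW",
  "bonaire" = "BQ-BO", "bq-bo" = "BQ-BO"
)

# Hypothetical helper: normalise case and surrounding whitespace, then look up.
to_code <- function(x) {
  key <- tolower(trimws(x))
  unname(lookup[key])  # inputs absent from the table become NA
}

to_code(c("Aruba", "CW", "bq-bo", "Brazil"))
# "AW" "CW" "BQ-BO" NA
```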
Adding columns to a research data frame
df <- data.frame(
country = c("Aruba", "Curacao", "Bonaire", "Sint Maarten", "Brazil"),
variable = c(3.5, 3.1, 0.5, 1.2, 1900)
)
add_island_cols(df, "country")
#> country variable iso_code is_sids is_snij sids_tier
#> 1 Aruba 3.5 AW 1 1 Associate member
#> 2 Curacao 3.1 CW 1 1 Associate member
#> 3 Bonaire 0.5 BQ-BO 0 1 <NA>
#> 4 Sint Maarten 1.2 SX 1 1 Associate member
#> 5 Brazil 1900.0 BR 0 0 <NA>
#> political_association wb_region wb_income_group
#> 1 Dutch Kingdom Latin America & Caribbean High income
#> 2 Dutch Kingdom Latin America & Caribbean High income
#> 3 Dutch Kingdom Latin America & Caribbean High income
#> 4 Dutch Kingdom Latin America & Caribbean High income
#> 5 Independent Latin America & Caribbean Upper-middle income
By default, add_island_cols attaches iso_code, is_sids, is_snij, sids_tier, political_association, wb_region, and wb_income_group. Pass a character vector to cols for a narrower selection.
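Conceptually, add_island_cols behaves like a left join onto a reference table. The rough base-R equivalent below assumes a hypothetical reference table, ref, copied from the output above. It uses match() so that every row of df is kept in its original order and unmatched rows get NA, and the cols vector stands in for the function's cols argument. This is a sketch, not the package's implementation.

```r
df <- data.frame(
  country  = c("Aruba", "Curacao", "Bonaire", "Sint Maarten", "Brazil"),
  variable = c(3.5, 3.1, 0.5, 1.2, 1900),
  stringsAsFactors = FALSE
)

# Hypothetical reference table, copied from the add_island_cols output above.
ref <- data.frame(
  country  = c("Aruba", "Curacao", "Bonaire", "Sint Maarten", "Brazil"),
  iso_code = c("AW", "CW", "BQ-BO", "SX", "BR"),
  is_sids  = c(1, 1, 0, 1, 0),
  is_snij  = c(1, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)

cols <- c("iso_code", "is_sids")  # analogue of the cols argument

# Left join: keep every row of df, in order; unmatched keys would give NA.
idx <- match(df$country, ref$country)
out <- cbind(df, ref[idx, cols, drop = FALSE])
rownames(out) <- NULL
out
```

Because match() returns only the first hit, the reference table needs one row per key. The disambiguated BES codes provide exactly that.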
Working alongside countrycode
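islandcodes imports countrycode for name-to-code resolution on the long tail of country names. For projects that already use countrycode, run it first to get an ISO column. Then call add_island_cols on the original name column, as below, so that the cases countrycode leaves unmatched, such as Bonaire, are still resolved.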
library(countrycode)
df$iso2 <- countrycode(df$country, "country.name", "iso2c")
#> Warning: Some values were not matched unambiguously: Bonaire
#> To fix unmatched values, please use the `custom_match` argument. If you think the default matching rules should be improved, please file an issue at https://github.com/vincentarelbundock/countrycode/issues
df$iso2 # note: countrycode leaves Bonaire unmatched (NA)
#> [1] "AW" "CW" NA "SX" "BR"
# islandcodes resolves Bonaire from the name column to its hyphenated BES code
add_island_cols(df, "country",
cols = c("iso_code", "is_sids", "is_snij"))
#> country variable iso2 iso_code is_sids is_snij
#> 1 Aruba 3.5 AW AW 1 1
#> 2 Curacao 3.1 CW CW 1 1
#> 3 Bonaire 0.5 <NA> BQ-BO 0 1
#> 4 Sint Maarten 1.2 SX SX 1 1
#> 5 Brazil 1900.0 BR BR 0 0
The pattern is: countrycode for the standard ISO conversion, islandcodes for everything that does not fit.
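The same division of labour can be written as a coalesce. Keep countrycode's code where it has one, and fall back to an island-specific table where it returns NA. The fallback vector below is hypothetical and hand-written. In the vignette's workflow, islandcodes plays that role.

```r
# Result of the countrycode step above: Bonaire is unmatched.
country <- c("Aruba", "Curacao", "Bonaire", "Sint Maarten", "Brazil")
iso2    <- c("AW", "CW", NA, "SX", "BR")

# Hypothetical fallback for names countrycode cannot resolve.
fallback <- c("Bonaire" = "BQ-BO", "Sint Eustatius" = "BQ-SE", "Saba" = "BQ-SA")

# Keep countrycode's answer where present; otherwise use the fallback.
code <- ifelse(is.na(iso2), unname(fallback[country]), iso2)
code
# "AW" "CW" "BQ-BO" "SX" "BR"
```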
Filtered subsets
nrow(small_islands(sids_only = TRUE))
#> [1] 58
nrow(small_islands(snij_only = TRUE))
#> [1] 47
head(small_islands(criteria = c(small = TRUE, island = TRUE, sovereign = TRUE)),
8)
#> name label iso_code wb_region
#> 1 AG Antigua and Barbuda AG Latin America & Caribbean
#> 2 BS Bahamas BS Latin America & Caribbean
#> 3 BB Barbados BB Latin America & Caribbean
#> 4 BN Brunei Darussalam BN East Asia & Pacific
#> 5 CV Cabo Verde CV Sub-Saharan Africa
#> 6 KM Comoros KM Sub-Saharan Africa
#> 7 CY Cyprus CY Europe & Central Asia
#> 8 DM Dominica DM Latin America & Caribbean
#> wb_income_group political_association is_sids sids_tier is_snij
#> 1 High income Independent 1 Sovereign member 0
#> 2 High income Independent 1 Sovereign member 0
#> 3 High income Independent 1 Sovereign member 0
#> 4 High income Independent 0 <NA> 0
#> 5 Lower-middle income Independent 1 Sovereign member 0
#> 6 Low income Independent 1 Sovereign member 0
#> 7 High income Independent 0 <NA> 0
#> 8 Upper-middle income Independent 1 Sovereign member 0
#> criterion_small criterion_island criterion_developing criterion_sovereign
#> 1 1 1 0 1
#> 2 1 1 0 1
#> 3 1 1 0 1
#> 4 1 1 0 1
#> 5 1 1 1 1
#> 6 1 1 1 1
#> 7 1 1 0 1
#> 8 1 1 1 1
Source and citation
The bundled dataset is mirrored from the University of Aruba island-research-reference-data repository and is licensed CC BY 4.0. Run citation("islandcodes") for the canonical citation.