The problem
Researchers working on Small Island Developing States (SIDS), the Caribbean, or other sub-sovereign territories run into two recurring frictions when joining classifications onto their data.
The first is sub-sovereign disambiguation. Aruba (AW),
Curaçao (CW), and Sint Maarten (SX) have their
own ISO 3166-1 codes, but Bonaire, Sint Eustatius, and Saba share
BQ. Most country-code packages either drop these three or
collapse them into one row, which silently corrupts joins: no error is raised, but rows end up duplicated or attributed to the wrong island.
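To see why the collapse is silent, here is a minimal base-R sketch with toy data. The `value` column is a placeholder, and the `group` labels simply reuse the ABC/SSS island groupings mentioned later in this vignette. Keying three island rows on the shared code BQ turns a one-to-one join into a many-to-many one, and `merge()` raises no error.

```r
# Toy observations for the three BES islands; `value` is a placeholder.
obs <- data.frame(
  island = c("Bonaire", "Sint Eustatius", "Saba"),
  code = "BQ",                      # shared ISO 3166-1 code, recycled to 3 rows
  value = 1:3,
  stringsAsFactors = FALSE
)

# A per-island attribute table keyed the same way.
attrs <- data.frame(
  code = "BQ",
  group = c("ABC", "SSS", "SSS"),
  stringsAsFactors = FALSE
)

# Every BQ row matches every other BQ row: 3 x 3 = 9 rows, no error raised.
collapsed <- merge(obs, attrs, by = "code")
nrow(collapsed)  # 9

# Keying on the ISO 3166-2 subdivision codes restores a one-to-one join.
obs$code <- c("BQ-BO", "BQ-SE", "BQ-SA")
attrs$code <- c("BQ-BO", "BQ-SE", "BQ-SA")
distinct <- merge(obs, attrs, by = "code")
nrow(distinct)  # 3
```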
The second is classification. Neither UN-DESA’s SIDS list, with its sovereign and associate members, nor the sub-national island jurisdiction (SNIJ) classifications drawn from the broader literature appear as standard fields in country-code dictionaries. Researchers tend to keep these in a side spreadsheet and hand-copy them into each project.
islandcodes does one thing: it ships the classification
list, with disambiguating codes for the sub-sovereign cases, and a few
helpers that work alongside countrycode.
A first pass
library(islandcodes)
#>
#> Attaching package: 'islandcodes'
#> The following object is masked from 'package:datasets':
#>
#> islands
is_sids(c("Aruba", "Curacao", "Bonaire", "Brazil"))
#> [1] TRUE TRUE FALSE FALSE
is_snij(c("Aruba", "Curacao", "Bonaire", "Brazil"))
#> [1] TRUE TRUE TRUE FALSE
Notice that Aruba returns TRUE for both
is_sids and is_snij. It is a UN-DESA SIDS
associate member and a sub-national island jurisdiction within the
Kingdom of the Netherlands. Bonaire returns FALSE for SIDS
but TRUE for SNIJ: it is part of the Netherlands proper as
a special municipality, not a separate jurisdiction recognised by
UN-DESA.
The package accepts country names, ISO 3166-1 alpha-2 codes, or, for the three BES islands, hyphenated codes following ISO 3166-2: BQ-BO (Bonaire), BQ-SE (Sint Eustatius), and BQ-SA (Saba).
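One way such mixed input could be resolved is to try a code match first and fall back to a case-insensitive name match. The sketch below is hypothetical: `resolve_code()` and the six-row `ref` table are illustrations, not islandcodes' internals.

```r
# Hypothetical reference table (an illustrative subset, not the package's data).
ref <- data.frame(
  code = c("AW", "CW", "SX", "BQ-BO", "BQ-SA", "BQ-SE"),
  name = c("Aruba", "Curacao", "Sint Maarten", "Bonaire", "Saba",
           "Sint Eustatius"),
  stringsAsFactors = FALSE
)

# Resolve a mix of names and codes to canonical codes, in input order.
# Codes are tried first (case-insensitive), then names (case-insensitive);
# anything unmatched comes back as NA.
resolve_code <- function(x) {
  by_code <- match(toupper(x), ref$code)
  by_name <- match(tolower(x), tolower(ref$name))
  ref$code[ifelse(is.na(by_code), by_name, by_code)]
}

resolve_code(c("Aruba", "cw", "BQ-SA", "Bonaire", "Atlantis"))
# "AW" "CW" "BQ-SA" "BQ-BO" NA
```

A real resolver also needs aliases and accent handling (Curaçao versus Curacao), which is where a curated table earns its keep.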
Adding columns to a research data frame
df <- data.frame(
country = c("Aruba", "Curacao", "Bonaire", "Sint Maarten", "Brazil"),
variable = c(3.5, 3.1, 0.5, 1.2, 1900)
)
add_island_cols(df, "country")
#> country variable iso_code is_sids is_snij sids_tier
#> 1 Aruba 3.5 AW 1 1 Associate member
#> 2 Curacao 3.1 CW 1 1 Associate member
#> 3 Bonaire 0.5 BQ-BO 0 1 <NA>
#> 4 Sint Maarten 1.2 SX 1 1 Associate member
#> 5 Brazil 1900.0 BR 0 0 <NA>
#> political_association wb_region wb_income_group
#> 1 Dutch Kingdom Latin America & Caribbean High income
#> 2 Dutch Kingdom Latin America & Caribbean High income
#> 3 Dutch Kingdom Latin America & Caribbean High income
#> 4 Dutch Kingdom Latin America & Caribbean High income
#> 5 Independent Latin America & Caribbean Upper-middle income
By default add_island_cols attaches
iso_code, is_sids, is_snij,
sids_tier, political_association,
wb_region, and wb_income_group. Override
cols for a narrower selection.
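The output above keeps the input row order (Aruba, Curacao, Bonaire, Sint Maarten, Brazil), which is what you want when appending columns. If you roll your own lookup, `match()` gives that behaviour directly, returning one index per input row and NA for unmatched keys, whereas `merge()` sorts by key and drops unmatched rows unless `all.x = TRUE`. A minimal base-R sketch; `add_cols_sketch()` and the `attrs` table (values copied from the output above) are hypothetical, not the package's implementation.

```r
# Hypothetical attribute table; values copied from the add_island_cols()
# output above, not the package's bundled data.
attrs <- data.frame(
  name = c("Aruba", "Curacao", "Bonaire", "Sint Maarten", "Brazil"),
  iso_code = c("AW", "CW", "BQ-BO", "SX", "BR"),
  is_sids = c(1L, 1L, 0L, 1L, 0L),
  is_snij = c(1L, 1L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Append `cols` from attrs to `data`, matching data[[key]] against attrs$name.
# match() returns one index per row of `data` (NA when unmatched), so the
# result keeps the input's row order and row count.
add_cols_sketch <- function(data, key, cols = c("iso_code", "is_sids", "is_snij")) {
  idx <- match(data[[key]], attrs$name)
  extra <- attrs[idx, cols, drop = FALSE]
  rownames(extra) <- NULL  # drop row names inherited from attrs
  cbind(data, extra)
}

# Toy input in non-alphabetical order; "Atlantis" is deliberately unmatched.
toy <- data.frame(
  country = c("Sint Maarten", "Brazil", "Atlantis"),
  variable = c(1.2, 1900, 7),
  stringsAsFactors = FALSE
)
add_cols_sketch(toy, "country")
```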
Working alongside countrycode
islandcodes imports countrycode for
name-to-code resolution on the long tail of country names. For projects
that already use countrycode, keep it for the standard ISO column, but
pass the original name column to islandcodes, which resolves the BES islands that countrycode leaves as NA.
library(countrycode)
df$iso2 <- countrycode(df$country, "country.name", "iso2c")
#> Warning: Some values were not matched unambiguously: Bonaire
#> To fix unmatched values, please use the `custom_match` argument. If you think the default matching rules should be improved, please file an issue at https://github.com/vincentarelbundock/countrycode/issues
df$iso2 # Bonaire comes back as NA from countrycode
#> [1] "AW" "CW" NA "SX" "BR"
# islandcodes resolves Bonaire from the name column to its hyphenated code
add_island_cols(df, "country",
cols = c("iso_code", "is_sids", "is_snij"))
#> country variable iso2 iso_code is_sids is_snij
#> 1 Aruba 3.5 AW AW 1 1
#> 2 Curacao 3.1 CW CW 1 1
#> 3 Bonaire 0.5 <NA> BQ-BO 0 1
#> 4 Sint Maarten 1.2 SX SX 1 1
#> 5 Brazil 1900.0 BR BR 0 0
The pattern is: countrycode for the standard ISO
conversion, islandcodes for everything that does not
fit.
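If you prefer to patch the gap inside countrycode itself, its `custom_match` argument, which the warning above points to, takes a named vector of per-value overrides: names are origin values, elements are the destination codes to use. A minimal sketch, assuming you want Bonaire keyed on the hyphenated code BQ-BO; that choice of override is yours, not something countrycode prescribes.

```r
library(countrycode)

countries <- c("Aruba", "Curacao", "Bonaire", "Sint Maarten", "Brazil")

# custom_match supplies an explicit destination for "Bonaire", which the
# default country.name regexes do not resolve unambiguously.
codes <- countrycode(
  countries,
  origin = "country.name",
  destination = "iso2c",
  custom_match = c(Bonaire = "BQ-BO")
)
codes
```

This only fixes the code column; the SIDS and SNIJ flags still have to come from somewhere, which is the gap islandcodes fills.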
Filtered subsets
nrow(small_islands(sids_only = TRUE))
#> [1] 58
nrow(small_islands(snij_only = TRUE))
#> [1] 47
head(small_islands(criteria = c(small = TRUE, island = TRUE, sovereign = TRUE)),
8)
#> name label iso_code wb_region
#> 1 AG Antigua and Barbuda AG Latin America & Caribbean
#> 2 BS Bahamas BS Latin America & Caribbean
#> 3 BB Barbados BB Latin America & Caribbean
#> 4 BN Brunei Darussalam BN East Asia & Pacific
#> 5 CV Cabo Verde CV Sub-Saharan Africa
#> 6 KM Comoros KM Sub-Saharan Africa
#> 7 CY Cyprus CY Europe & Central Asia
#> 8 DM Dominica DM Latin America & Caribbean
#> wb_income_group political_association is_sids sids_tier is_snij
#> 1 High income Independent 1 Sovereign member 0
#> 2 High income Independent 1 Sovereign member 0
#> 3 High income Independent 1 Sovereign member 0
#> 4 High income Independent 0 <NA> 0
#> 5 Lower-middle income Independent 1 Sovereign member 0
#> 6 Low income Independent 1 Sovereign member 0
#> 7 High income Independent 0 <NA> 0
#> 8 Upper-middle income Independent 1 Sovereign member 0
#> criterion_small criterion_island criterion_developing criterion_sovereign
#> 1 1 1 0 1
#> 2 1 1 0 1
#> 3 1 1 0 1
#> 4 1 1 0 1
#> 5 1 1 1 1
#> 6 1 1 1 1
#> 7 1 1 0 1
#> 8 1 1 1 1
#> latitude longitude capital capital_latitude capital_longitude
#> 1 17.3522 -61.7906 Saint John's 17.1180 -61.8500
#> 2 26.4018 -77.1467 Nassau 25.0834 -77.3500
#> 3 13.1637 -59.5690 Bridgetown 13.1020 -59.6165
#> 4 4.4483 114.5519 Bandar Seri Begawan 4.8833 114.9333
#> 5 15.0748 -23.6394 Praia 14.9167 -23.5167
#> 6 -11.7277 43.3181 Moroni -11.7042 43.2402
#> 7 34.9133 33.0842 Nicosia 35.1667 33.3666
#> 8 15.4588 -61.3450 Roseau 15.3010 -61.3870
Coordinates
Each territory carries two kinds of location, both in WGS84 decimal
degrees: a representative point on the main landmass
(latitude, longitude) and the capital city
(capital, capital_latitude,
capital_longitude). island_coords() returns the landmass point by
default, or the capital with which = "capital", for a vector of names or codes, in input order.
island_coords(c("Aruba", "Curacao", "Bonaire", "Sint Maarten"))
#> label iso_code latitude longitude
#> 1 Aruba AW 12.5174 -69.9728
#> 2 Curaçao CW 12.1450 -68.9206
#> 3 Bonaire BQ-BO 12.1830 -68.2500
#> 4 Sint Maarten SX 18.0409 -63.0701
How far apart the two kinds of location sit depends on the territory. French Polynesia is widely dispersed, yet both of its points fall on Tahiti, where Papeete is located; for the United States the landmass point lies in the continental interior (in Kansas), roughly 20 degrees of longitude west of Washington, D.C.
island_coords(c("PF", "US"))
#> label iso_code latitude longitude
#> 1 French Polynesia PF -17.6281 -149.4616
#> 2 United States US 39.5385 -97.4826
island_coords(c("PF", "US"), which = "capital")
#> label iso_code capital latitude longitude
#> 1 French Polynesia PF Papeete -17.5334 -149.5667
#> 2 United States US Washington, D.C. 38.9015 -77.0114
Distance and isolation
island_distance() turns the coordinates into
great-circle (haversine) distances, in kilometres by default. Given a single set it returns a
symmetric matrix, which exposes the spread of the Dutch Caribbean, from
the ABC islands off Venezuela to the SSS islands roughly 800 to 960 km to the
northeast.
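For reference, the haversine formula fits in a few lines of base R. This is an illustration, not the package's code: `haversine_km()` and `dist_matrix()` are hypothetical names, the unit labels "mi" and "nmi" are assumptions, and the mean Earth radius of 6371 km is an assumed value (it reproduces the Aruba to Curaçao figure of about 121.57 km shown in this section). The coordinates are the landmass points printed by island_coords() above.

```r
# Great-circle distance on a sphere via the haversine formula.
# Inputs in decimal degrees; returns kilometres (radius = 6371 km assumed).
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Symmetric distance matrix for a set of points, with unit conversion.
# 1 statute mile = 1.609344 km; 1 nautical mile = 1.852 km.
dist_matrix <- function(lat, lon, labels, unit = c("km", "mi", "nmi")) {
  unit <- match.arg(unit)
  per_km <- c(km = 1, mi = 1 / 1.609344, nmi = 1 / 1.852)[[unit]]
  n <- length(lat)
  m <- outer(seq_len(n), seq_len(n), function(i, j) {
    haversine_km(lat[i], lon[i], lat[j], lon[j])
  })
  dimnames(m) <- list(labels, labels)
  m * per_km
}

# Landmass points as printed by island_coords() above.
pts <- data.frame(
  code = c("AW", "CW", "BQ-BO", "SX"),
  lat = c(12.5174, 12.1450, 12.1830, 18.0409),
  lon = c(-69.9728, -68.9206, -68.2500, -63.0701),
  stringsAsFactors = FALSE
)
round(dist_matrix(pts$lat, pts$lon, pts$code))
```

The haversine form is preferred over the plain spherical law of cosines because it stays numerically stable for short distances, such as the 73 km between Curaçao and Bonaire.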
round(island_distance(c("AW", "CW", "BQ-BO", "SX", "BQ-SA", "BQ-SE")))
#> AW CW BQ-BO SX BQ-SA BQ-SE
#> AW 0 122 191 962 920 933
#> CW 122 0 73 908 863 873
#> BQ-BO 191 73 0 856 811 818
#> SX 962 908 856 0 49 62
#> BQ-SA 920 863 811 49 0 32
#> BQ-SE 933 873 818 62 32 0
Pass a second argument for distances from one or more origins to a set of destinations. The shorter side is recycled, so a single origin against many destinations gives the distance to each.
island_distance("Aruba", c("Curacao", "Bonaire", "Sint Maarten"))
#> CW BQ-BO SX
#> 121.5695 190.7913 961.7026
which = "capital" measures between capital cities
instead of landmass points, and unit switches between
kilometres, statute miles, and nautical miles.
island_distance("AW", "CW", which = "capital", unit = "nmi")
#> CW
#> 71.96274
Because both helpers keep input order and fill unresolved rows with
NA, they slot directly into a pipeline: resolve a column of
names, attach coordinates or distances, then rank by isolation or hand
the points to a mapping package. A quick isolation measure is the
distance from each island to its nearest neighbour, the smallest
off-diagonal value in its row.
abc <- island_distance(c("AW", "CW", "BQ-BO"))
diag(abc) <- NA
apply(abc, 1, min, na.rm = TRUE) # km to nearest of the other two
#> AW CW BQ-BO
#> 121.56948 73.01551 73.01551
Source and citation
The bundled dataset is mirrored from the University
of Aruba island-research-reference-data repository, licensed CC BY
4.0. Run citation("islandcodes") for the canonical
citation.