
Auto-repair country names to their closest known match
Source: R/diagnostics.R
The "act on it" companion to check_country_match(): replaces unmatched
country names with their closest known country name (by string distance), but
only when the match is confident enough, and reports what it changed. Pipe the
result into standardize_country() / join_world().
Arguments
- x
A vector of country names.
- threshold
Maximum string distance to accept a repair (0 = identical, 1 = unrelated). Lower is stricter; default 0.2.
Uses Jaro-Winkler when stringdist is installed, otherwise a length-normalised edit distance.
- origin
countrycode origin scheme (default "country.name").
- verbose
Whether to message the substitutions made (default TRUE).
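To make the threshold concrete, here is a minimal base-R sketch of the fallback path described above (a length-normalised edit distance). It is an illustration only, not the package's implementation: the helper names norm_edit_dist() and repair_sketch(), the small known list, the normalisation by the longer string's length, and the "distance <= threshold" comparison are all assumptions.

```r
# Hypothetical reference list; the real function works from countrycode's names.
known <- c("Brazil", "Germany", "United States", "France")

# Levenshtein distance from utils::adist(), divided by the longer string's
# length so the result lies in [0, 1] (assumed normalisation).
norm_edit_dist <- function(a, b) {
  d <- utils::adist(a, b)                 # matrix: length(a) rows, length(b) cols
  len <- outer(nchar(a), nchar(b), pmax)  # longer length of each pair
  d / len
}

# Replace each name with its nearest known name, but only when the
# distance is within `threshold`; otherwise leave the input untouched.
repair_sketch <- function(x, known, threshold = 0.2) {
  d <- norm_edit_dist(x, known)
  best <- apply(d, 1, which.min)
  best_d <- d[cbind(seq_along(x), best)]
  ifelse(best_d <= threshold, known[best], x)
}

repair_sketch(c("United States", "Brzil", "Germny", "Atlantis"), known)
```

In this sketch "Brzil" is one edit away from "Brazil" (1/6, about 0.17) and "Germny" one edit from "Germany" (1/7, about 0.14), so both fall under the default 0.2 and are repaired, while "Atlantis" has no close neighbour and is left as is. With stringdist installed, the documented behaviour uses Jaro-Winkler instead, which produces different distance values for the same pairs.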
Value
A character vector the same length as x, in which each unmatched name
whose closest known country name lies within threshold is replaced by that
name (all other elements are left unchanged). The applied substitutions are
attached as the attribute "repairs".
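The attribute can be read back with base R's attr(). The sketch below builds a stand-in for the return value by hand, so it runs without the package; the object fixed is hypothetical, and a plain data.frame is used where the real function returns a tibble.

```r
# Stand-in for the documented return shape: a character vector carrying a
# "repairs" attribute (mocked here with a data.frame rather than a tibble).
fixed <- structure(
  c("United States", "Brazil", "Germany"),
  repairs = data.frame(
    from = c("Brzil", "Germny"),
    to   = c("Brazil", "Germany")
  )
)

# Inspect what was changed.
repairs <- attr(fixed, "repairs")
repairs$from

# Drop the attribute when only the plain names are needed downstream.
plain <- as.vector(fixed)
```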
Examples
repair_country_names(c("United States", "Brzil", "Germny"))
#> ✔ Repaired 2 country names:
#> • "Brzil -> Brazil" and "Germny -> Germany"
#> [1] "United States" "Brazil" "Germany"
#> attr(,"repairs")
#> # A tibble: 2 × 2
#> from to
#> <chr> <chr>
#> 1 Brzil Brazil
#> 2 Germny Germany