--- title: "Approach to Correcting Misspelt Local Government Areas" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Approach to Correcting Misspelt Local Government Areas} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: inline --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Motivation Nigeria has 774 Local Government Areas (LGAs). There are a number of factors that can make working with them challenging: 1. Some LGAs share the same name with their respective States e.g. Bauchi LGA in Bauchi State, and this can lead to wrong application of the data. 2. Some LGAs bear the same name although they are in different States, e.g. Obi LGA in Benue and Nasarawa States. 3. LGA spelling discrepancies are rife in the literature, and believe it or not, even in official government documents. ## A proposed solution The function `fix_region` has methods that are designed to address spelling errors LGAs^[There is a method for repairing the names of States but the operation is more manageable since there are far fewer States]. These fixes can be carried out in 3 incremental phases; if the user cannot effect the corrections at a particular level s/he can proceed to the next stage. The phases are as follows: 1. Automatic 2. Interactive 3. Manual #### 1. Automatic fixes When the spelling error is slight and unambiguous, the function automatically effects the repair. However, when there is only a single misspelt LGA, and especially if it is supplied as a string, `fix_region` will signal an error. ```{r, error = TRUE} library(naijR) fix_region("Legos Island") ``` The thinking is that an automated solution may not be necessary for a single value that was probably provided interactively. When another LGA is added, such that the argument passed to the function is a character vector with more that one element: ```{r} fix_region(c("Legos Island", "Amuwo-Odofin")) ``` Sometimes when we use a character vector to perform this check, the function `fix_region` may find it difficult to decide on what fixes to apply. In such instances, the best thing is to convert the vector into an `lgas` object. ```{r, error = TRUE} chars <- c("Legos Island", "Amuwo Odofin") fix_region(chars) # Create an `lgas` object lgs <- lgas(c("Legos Island", "Amuwo Odofin")) fix_region(lgs) ``` Note that the constructor `lgas()`, by default, signals a warning when the vector we are supplying has misspelt LGAs. However, to avoid the unnecessary verbosity whilst attempting the fix, this warning is suppressed if `lgas()` is nested in `fix_region()`. ```{r} fix_region(lgas(c("Legos Island", "Amuwo Odofin"))) ``` When the spelling mistakes depart far from the available LGA spellings, the user is shown a message stating what fixes could not be applied. To now carry out a fix, one can proceed to the next stage. ```{r} fix_region(lgas(c("Orange County", "Amuwo Odofin"))) ``` #### 2. Interactive fixes When the automatic fix is not feasible, the user has the option of doing it interactively by calling `fix_region` and setting its `interactive` argument to `TRUE`. By following the prompts, the misspelt LGA as well as possible replacements are presented. All the user needs to do is to select the desired replacement value. This is particularly useful when the user is not sure of what the correct spelling might be. When a misspelt LGA has more than one match, the interactive approach is the viable option for effecting fixes. In the example below, not all the mispelt LGAs could be fixed automatically: ```{r} mispelt.adamawa <- c("Fufure", "Demsa", "Machika", "Fufure", "Ganye", "Hong") # check for misspelt LGAs and, if necessary, attempt to fix length(mispelt.adamawa) sum(is_lga(mispelt.adamawa)) corrected.adamawa <- fix_region(mispelt.adamawa) sum(is_lga(corrected.adamawa)) ``` We see that the string 'Machika' is not an actual LGA and there are more than one possible candidate LGA that are considered as possible replacement. Our original intent was to use "Michika LGA". To address this mistake, we can run `fix_region` interactively. Note that the `interactive` option is only available for use with objects of class `lgas`. ```{r, eval=FALSE} adamawa <- fix_region(lgas(corrected.adamawa), interactive = TRUE) ``` Next, the user is provided with a prompt to provide a search item for likely LGAs that would be an appropriate replacement for the misspelt one. Possible replacements are presented and to select any value, the appropriate number should be entered at the prompt. The option `RETRY` allows the user to restart the query and provide another search term, `SKIP` is to try out a different misspelt item (where there are more than one) and to `QUIT` is to exit the prompt. On Windows machines, the user can use the native dialog boxes by setting `graphic` to `TRUE`. ```{r manual-fix, include = FALSE} adamawa <- fix_region_manual(corrected.adamawa, wrong = "Machika", correct = "Michika") adamawa ``` ```{r confirm-fix} # Confirm that the spelling mistakes have been fixed. all(is_lga(adamawa)) ``` #### 3. Manual fixes When spelling errors are identified and the correct one is already known, then a manual fix can be applied. For this, we use the function `fix_region_manual` and its use is straightforward^[Works for both character vectors and `lgas` objects.]: ```{r manual-fix} ```