Last updated: 2021-10-05

Checks: 6 passed, 1 warning

Knit directory: codemapper/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20210923)

The command set.seed(20210923) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: d285b07

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version d285b07. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    R/.DS_Store
    Ignored:    _targets/
    Ignored:    config.ini
    Ignored:    data/
    Ignored:    output/
    Ignored:    renv/library/
    Ignored:    renv/local/
    Ignored:    renv/staging/
    Ignored:    tests/.DS_Store

Untracked files:
    Untracked:  R/codemapper.R
    Untracked:  man/codemapper.Rd

Unstaged changes:
    Modified:   DESCRIPTION
    Modified:   R/clinical_codes.R
    Modified:   R/constants.R
    Modified:   R/utils.R
    Modified:   _targets.R
    Modified:   analysis/clinical_codes_lkps_and_mappings.Rmd
    Modified:   analysis/index.Rmd
    Modified:   man/get_child_codes.Rd
    Modified:   man/lookup_codes.Rd
    Modified:   man/map_codes.Rd
    Modified:   man/reformat_icd10_codes.Rd
    Modified:   man/search_codes_by_description.Rd
    Modified:   tests/testthat/test_constants.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/clinical_codes_lkps_and_mappings.Rmd) and HTML (public/clinical_codes_lkps_and_mappings.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	d285b07	rmgpanw	2021-09-29	add analysis/reformat_all_lkps_maps.Rmd to _targets.R; start analysis/clinical_codes_lkps_and_mappings.Rmd

library(codemapper)
#> Loading required package: ukbwranglr
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
#> ✓ tibble  3.1.4     ✓ dplyr   1.0.7
#> ✓ tidyr   1.1.4     ✓ stringr 1.4.0
#> ✓ readr   2.0.2     ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()

Introduction

This vignette describes how to work with clinical codes using codemapper. Specifically:

Looking up codes and their descriptions
Getting ‘children’ codes
Mapping codes from one coding system to another

The functions provided by codemapper rely on UK Biobank resource 592, which includes an Excel workbook containing lookup and mapping tables, and on the NHSBSA BNF to SNOMED code mapping file. These tables have been converted into a named list of data frames, which the code below retrieves with the internal function codemapper:::get_all_lkps_maps():

# retrieve code mappings in .Rdata format
all_lkps_maps <- codemapper:::get_all_lkps_maps()

# each item in the list is a sheet in the UKB Excel workbook (resource 592)
names(all_lkps_maps)
#>  [1] "bnf_lkp"           "dmd_lkp"           "icd9_lkp"         
#>  [4] "icd10_lkp"         "icd9_icd10"        "read_v2_lkp"      
#>  [7] "read_v2_drugs_lkp" "read_v2_drugs_bnf" "read_v2_icd9"     
#> [10] "read_v2_icd10"     "read_v2_opcs4"     "read_v2_read_ctv3"
#> [13] "read_ctv3_lkp"     "read_ctv3_icd9"    "read_ctv3_icd10"  
#> [16] "read_ctv3_opcs4"   "read_ctv3_read_v2" "bnf_dmd"

Code lookups

To look up details for a particular code, use lookup_codes(). Setting preferred_description_only to TRUE returns only the preferred description for each code when synonyms are present (read2 and read3 may include multiple descriptions for the same code):

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# lookup details
lookup_codes(codes = t1dm_read2,
             code_type = "read2",
             all_lkps_maps = all_lkps_maps,
             preferred_description_only = TRUE)
#> # A tibble: 2 × 3
#>   code  description                         code_type
#>   <chr> <chr>                               <chr>    
#> 1 C108. Insulin dependent diabetes mellitus read2    
#> 2 C10E. Type 1 diabetes mellitus            read2

By default, the output is standardised to produce the columns shown above. To output the original formatting from UK Biobank resource 592, set standardise_output to FALSE:

# lookup details
lookup_codes(codes = t1dm_read2,
             code_type = "read2",
             all_lkps_maps = all_lkps_maps,
             preferred_description_only = TRUE,
             standardise_output = FALSE)
#> # A tibble: 2 × 3
#>   read_code term_code term_description                   
#>   <chr>     <chr>     <chr>                              
#> 1 C108.     00        Insulin dependent diabetes mellitus
#> 2 C10E.     00        Type 1 diabetes mellitus
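
The lookup behaviour described above can be approximated in base R. This is only a minimal sketch on a toy table, not codemapper's implementation: toy_lookup() is a hypothetical helper, the synonym row is invented, and it assumes that term_code "00" marks the preferred description (as the output above suggests).

```r
# Toy stand-in for a Read 2 lookup table (not the real resource 592 data);
# the row with term_code "11" is invented purely for illustration
toy_read_v2_lkp <- data.frame(
  read_code = c("C108.", "C108.", "C10E."),
  term_code = c("00", "11", "00"),
  term_description = c(
    "Insulin dependent diabetes mellitus",
    "Illustrative synonym (hypothetical)",
    "Type 1 diabetes mellitus"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows for the requested codes and, optionally,
# only rows assumed to hold the preferred description (term_code "00")
toy_lookup <- function(codes, lkp, preferred_description_only = TRUE) {
  result <- lkp[lkp$read_code %in% codes, , drop = FALSE]
  if (preferred_description_only) {
    result <- result[result$term_code == "00", , drop = FALSE]
  }
  rownames(result) <- NULL
  result
}

# One row per code when only preferred descriptions are requested
toy_lookup(c("C10E.", "C108."), toy_read_v2_lkp)

# All three rows, including the invented synonym
toy_lookup(c("C10E.", "C108."), toy_read_v2_lkp,
           preferred_description_only = FALSE)
```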

Code mapping

For example, to map codes for type 1 diabetes from Read2 to Read3 (note that both Read2 codes, C10E. and C108., map to a single Read3 code, X40J4):

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# map from Read2 to Read3
map_codes(codes = t1dm_read2, 
          from = "read2", 
          to = "read3", 
          all_lkps_maps = all_lkps_maps)
#> # A tibble: 5 × 3
#>   code  description                                code_type
#>   <chr> <chr>                                      <chr>    
#> 1 X40J4 Type I diabetes mellitus                   read3    
#> 2 X40J4 Type 1 diabetes mellitus                   read3    
#> 3 X40J4 IDDM - Insulin-dependent diabetes mellitus read3    
#> 4 X40J4 Juvenile onset diabetes mellitus           read3    
#> 5 X40J4 Insulin-dependent diabetes mellitus        read3

Note that with map_codes(), setting preferred_description_only to TRUE while standardise_output is FALSE raises an error, because some codes may otherwise be ‘lost’ in the mapping process. When standardise_output is TRUE, the mapped codes are passed on to lookup_codes(), at which point only the preferred code descriptions can be requested:

# mapping the Read 2 code "D4104" to Read 3 only returns the secondary Read 3
# description (`TERMV3_TYPE` == "S"), unlike for "D4103". 
map_codes(codes = c("D4103", "D4104"), 
          from = "read2", 
          to = "read3", 
          all_lkps_maps = all_lkps_maps, 
          codes_only = FALSE,
          preferred_description_only = FALSE,
          standardise_output = FALSE) %>% 
  dplyr::select(tidyselect::contains("V3"))
#> # A tibble: 2 × 4
#>   READV3_CODE TERMV3_CODE TERMV3_TYPE TERMV3_DESC                               
#>   <chr>       <chr>       <chr>       <chr>                                     
#> 1 D4103       Y20e3       P           Polycythaemia due to cyanotic respiratory…
#> 2 D4104       Y20e6       S           Renal polycythaemia

# if `standardise_output` is `TRUE`, then `preferred_description_only` may also
# be set to `TRUE`
map_codes(codes = c("D4103", "D4104"), 
          from = "read2", 
          to = "read3", 
          all_lkps_maps = all_lkps_maps, 
          codes_only = FALSE,
          preferred_description_only = TRUE,
          standardise_output = TRUE)
#> # A tibble: 2 × 3
#>   code  description                                        code_type
#>   <chr> <chr>                                              <chr>    
#> 1 D4103 Polycythaemia due to cyanotic respiratory disease  read3    
#> 2 D4104 Secondary polycythaemia with excess erythropoietin read3

Mapping to ICD is more problematic as some results are a range of ICD codes (note also that for ICD-10, the mapping sheets use an alternative code format which removes any “.” characters):

map_codes(codes = t1dm_read2, 
          from = "read2", 
          to = "icd10", 
          all_lkps_maps = all_lkps_maps, 
          codes_only = TRUE,
          standardise_output = FALSE)
#> [1] "E100-E109"
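
A range such as the one returned above, and the dot-free ICD-10 format, can be post-processed with a little base R. The following is a minimal sketch, assuming a range always has the form "prefix + digit" to "same prefix + digit"; expand_icd10_range() and add_icd10_dot() are hypothetical helpers and not part of codemapper.

```r
# Hypothetical helper: expand "E100-E109" into "E100", ..., "E109".
# Assumes both ends share a prefix and differ only in a final digit;
# anything that is not a two-part range is returned unchanged.
expand_icd10_range <- function(code_range) {
  ends <- strsplit(code_range, "-", fixed = TRUE)[[1]]
  if (length(ends) != 2) {
    return(code_range)
  }
  prefix <- substr(ends, 1, nchar(ends) - 1)
  last_char <- substr(ends, nchar(ends), nchar(ends))
  digits <- suppressWarnings(as.integer(last_char))
  stopifnot(prefix[1] == prefix[2], !anyNA(digits), digits[1] <= digits[2])
  paste0(prefix[1], seq(digits[1], digits[2]))
}

# Hypothetical helper: re-insert the "." after the third character
# (codes with only three characters are left as they are)
add_icd10_dot <- function(codes) {
  sub("^(.{3})(.+)$", "\\1.\\2", codes)
}

expanded <- expand_icd10_range("E100-E109")
expanded
# "E100" "E101" "E102" "E103" "E104" "E105" "E106" "E107" "E108" "E109"

add_icd10_dot(expanded[1:2])
# "E10.0" "E10.1"
```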

The available mappings do not cover all possible mapping directions. For example, while there is a mapping from Read2 to ICD-10, there is none from ICD-10 to Read2. In cases like this, map_codes() will attempt to map anyway by using the same mapping sheet in reverse (e.g. for mapping ICD-10 to Read2, map_codes() uses the read_v2_icd10 mapping sheet). However, this returns no results when attempting to map H36.0, the ICD-10 code for diabetic retinopathy:

# find ICD-10 code matching "diabetic retinopathy"
icd10_diabetic_retinopathy <-
  search_codes_by_description(
    reg_expr = "diabetic retinopathy",
    code_type = "icd10",
    all_lkps_maps = all_lkps_maps,
    ignore_case = TRUE,
    codes_only = TRUE
  )

# attempting to map this to Read 2 returns a NULL result however
map_codes(
  codes = icd10_diabetic_retinopathy,
  from = "icd10",
  to = "read2",
  all_lkps_maps = all_lkps_maps,
  standardise_output = FALSE,
  codes_only = TRUE
)
#> Warning in map_codes(codes = icd10_diabetic_retinopathy, from = "icd10", :
#> Warning! No mapping sheet available for this request. Attempting to map anyway
#> using: read_v2_icd10
#> 
#> No codes found after mapping. Returning `NULL`
#> NULL

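Inspecting the mapping sheet read_v2_icd10 shows why. Most rows of the icd10_code column hold two ICD-10 codes in a single string (e.g. “E103D H360A”), and even the single-code row carries a trailing letter (“H360A”), so no value equals “H360” exactly: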

all_lkps_maps$read_v2_icd10 %>% 
  dplyr::filter(stringr::str_detect(icd10_code, pattern = "H360"))
#> # A tibble: 17 × 3
#>    read_code icd10_code  icd10_code_def
#>    <chr>     <chr>       <chr>         
#>  1 C1087     E103D H360A 7             
#>  2 C1096     E113D H360A 7             
#>  3 C10E7     E103D H360A 7             
#>  4 C10EP     E103D H360A 7             
#>  5 C10F6     E113D H360A 7             
#>  6 C10FQ     E113D H360A 7             
#>  7 F420.     E143D H360A 7             
#>  8 F4200     E143D H360A 7             
#>  9 F4201     E143D H360A 7             
#> 10 F4202     E143D H360A 7             
#> 11 F4203     E143D H360A 7             
#> 12 F4204     E143D H360A 7             
#> 13 F4205     H360A       5             
#> 14 F4206     E143D H360A 7             
#> 15 F4207     E143D H360A 7             
#> 16 F4208     E143D H360A 7             
#> 17 F420z     E143D H360A 7

Note: “H36.0” is converted to “H360” by map_codes internally, as this is the format used by the mapping sheets.
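
One way to find such rows is to split each cell into its individual codes before matching. The sketch below uses three rows copied from the output above; cell_contains_code() is a hypothetical helper, and treating the trailing letter as a modifier that can be dropped is an assumption, not documented codemapper behaviour.

```r
# Three rows copied from the read_v2_icd10 output shown above
toy_read_v2_icd10 <- data.frame(
  read_code = c("C1087", "F4205", "F420."),
  icd10_code = c("E103D H360A", "H360A", "E143D H360A"),
  stringsAsFactors = FALSE
)

# Exact matching on the whole cell finds nothing
toy_read_v2_icd10$read_code[toy_read_v2_icd10$icd10_code == "H360"]
# character(0)

# Hypothetical helper: split each cell on whitespace, drop one trailing
# capital letter from each code (assumed to be a modifier such as "A" or
# "D"), then test whether the target code is among the pieces
cell_contains_code <- function(cells, target) {
  vapply(
    strsplit(cells, "\\s+"),
    function(parts) target %in% sub("[A-Z]$", "", parts),
    logical(1)
  )
}

matches <- cell_contains_code(toy_read_v2_icd10$icd10_code, "H360")
toy_read_v2_icd10$read_code[matches]
# "C1087" "F4205" "F420."
```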

Find codes that match a description

Use search_codes_by_description(). For example, to find Read2 codes that match the description ‘diabetic retinopathy’:

search_codes_by_description(reg_expr = "diabetic retinopathy", 
                          code_type = "read2", 
                          all_lkps_maps = all_lkps_maps, 
                          ignore_case = TRUE, 
                          codes_only = FALSE,
                          preferred_description_only = TRUE) %>% 
  head()
#> # A tibble: 6 × 3
#>   read_code term_code term_description                                          
#>   <chr>     <chr>     <chr>                                                     
#> 1 2BBJ.     00        O/E - no right diabetic retinopathy                       
#> 2 2BBk.     00        O/E - right eye stable treated proliferative diabetic ret…
#> 3 2BBK.     00        O/E - no left diabetic retinopathy                        
#> 4 2BBl.     00        O/E - left eye stable treated proliferative diabetic reti…
#> 5 2BBo.     00        O/E - sight threatening diabetic retinopathy              
#> 6 2BBP.     00        O/E - right eye background diabetic retinopathy
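
Conceptually, a description search of this kind is a case-insensitive regular expression match against the description column. A minimal base R sketch follows, using a toy table built from descriptions shown earlier on this page; toy_search() is a hypothetical helper, not the codemapper implementation.

```r
# Toy lookup table using descriptions that appear in the outputs above
toy_descriptions <- data.frame(
  read_code = c("2BBJ.", "2BBK.", "C10E."),
  term_description = c(
    "O/E - no right diabetic retinopathy",
    "O/E - no left diabetic retinopathy",
    "Type 1 diabetes mellitus"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose description matches the pattern
toy_search <- function(reg_expr, lkp, ignore_case = TRUE) {
  hits <- grepl(reg_expr, lkp$term_description, ignore.case = ignore_case)
  lkp[hits, , drop = FALSE]
}

# Case-insensitive by default, so an upper-case pattern still matches
toy_search("DIABETIC RETINOPATHY", toy_descriptions)$read_code
# "2BBJ." "2BBK."

# A regular expression can cover several word forms at once
toy_search("diabet(es|ic)", toy_descriptions)$read_code
# "2BBJ." "2BBK." "C10E."
```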

Children codes

To get the children codes, use get_child_codes(). This will return all unique clinical codes that start with the codes of interest:

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# get child codes
get_child_codes(codes = t1dm_read2, 
                code_type = "read2", 
                all_lkps_maps = all_lkps_maps, 
                codes_only = TRUE,
                standardise_output = FALSE)
#>  [1] "C108." "C1080" "C1081" "C1082" "C1083" "C1084" "C1085" "C1086" "C1087"
#> [10] "C1088" "C1089" "C108A" "C108B" "C108C" "C108D" "C108E" "C108F" "C108G"
#> [19] "C108H" "C108J" "C108y" "C108z" "C10E." "C10E0" "C10E1" "C10E2" "C10E3"
#> [28] "C10E4" "C10E5" "C10E6" "C10E7" "C10E8" "C10E9" "C10EA" "C10EB" "C10EC"
#> [37] "C10ED" "C10EE" "C10EF" "C10EG" "C10EH" "C10EJ" "C10EK" "C10EL" "C10EM"
#> [46] "C10EN" "C10EP" "C10EQ" "C10ER"

Note: Some coding systems include a ‘.’ in their codes (e.g. ICD-10). This may return unexpected results with get_child_codes(), because the function searches using regular expressions, in which ‘.’ is a wildcard that matches any single character.
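
The wildcard effect can be seen with base R alone. In this small sketch the candidate strings are made up for illustration (they are not taken from a lookup table), and startsWith() is shown only as one possible literal alternative, not as what get_child_codes() does.

```r
# Made-up candidate strings, chosen to expose the wildcard behaviour
candidates <- c("E10.1", "E10.11", "E1011", "E11.1")
parent <- "E10.1"

# As a regular expression, the "." in "E10.1" matches any character,
# so "E1011" is picked up as well
grepl(paste0("^", parent), candidates)
# TRUE TRUE TRUE FALSE

# A literal prefix comparison treats "." as an ordinary character
startsWith(candidates, parent)
# TRUE TRUE FALSE FALSE
```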

By default, a character vector of codes is returned. To return a data frame including code descriptions, set the argument codes_only to FALSE:

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# get child codes with descriptions
get_child_codes(codes = t1dm_read2, 
                code_type = "read2", 
                all_lkps_maps = all_lkps_maps, 
                codes_only = FALSE) %>% 
  head()
#> # A tibble: 6 × 3
#>   code  description                                                    code_type
#>   <chr> <chr>                                                          <chr>    
#> 1 C108. Insulin dependent diabetes mellitus                            read2    
#> 2 C1080 Insulin-dependent diabetes mellitus with renal complications   read2    
#> 3 C1081 Insulin-dependent diabetes mellitus with ophthalmic complicat… read2    
#> 4 C1082 Insulin-dependent diabetes mellitus with neurological complic… read2    
#> 5 C1083 Insulin dependent diabetes mellitus with multiple complicatio… read2    
#> 6 C1084 Unstable insulin dependent diabetes mellitus                   read2

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices datasets  utils     methods   base     
#> 
#> other attached packages:
#>  [1] forcats_0.5.1         stringr_1.4.0         dplyr_1.0.7          
#>  [4] purrr_0.3.4           readr_2.0.2           tidyr_1.1.4          
#>  [7] tibble_3.1.4          ggplot2_3.3.5         tidyverse_1.3.1      
#> [10] codemapper_0.0.0.9000 ukbwranglr_0.0.0.9000 workflowr_1.6.2      
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.7        lubridate_1.7.10  ps_1.6.0          assertthat_0.2.1 
#>  [5] rprojroot_2.0.2   digest_0.6.28     utf8_1.2.2        R6_2.5.1         
#>  [9] cellranger_1.1.0  backports_1.2.1   reprex_2.0.1      evaluate_0.14    
#> [13] httr_1.4.2        pillar_1.6.3      rlang_0.4.11      readxl_1.3.1     
#> [17] rstudioapi_0.13   data.table_1.14.2 callr_3.7.0       whisker_0.4      
#> [21] jquerylib_0.1.4   rmarkdown_2.9     igraph_1.2.6      munsell_0.5.0    
#> [25] broom_0.7.9       compiler_4.1.0    httpuv_1.6.3      modelr_0.1.8     
#> [29] xfun_0.24         pkgconfig_2.0.3   htmltools_0.5.2   tidyselect_1.1.1 
#> [33] codetools_0.2-18  fansi_0.5.0       crayon_1.4.1      tzdb_0.1.2       
#> [37] dbplyr_2.1.1      withr_2.4.2       later_1.3.0       grid_4.1.0       
#> [41] jsonlite_1.7.2    gtable_0.3.0      lifecycle_1.0.1   DBI_1.1.1        
#> [45] git2r_0.28.0      magrittr_2.0.1    scales_1.1.1      cli_3.0.1        
#> [49] stringi_1.7.4     renv_0.13.2       fs_1.5.0          promises_1.2.0.1 
#> [53] xml2_1.3.2        bslib_0.3.0       targets_0.8.0     ellipsis_0.3.2   
#> [57] generics_0.1.0    vctrs_0.3.8       tools_4.1.0       glue_1.4.2       
#> [61] hms_1.1.1         processx_3.5.2    fastmap_1.1.0     yaml_2.2.1       
#> [65] colorspace_2.0-2  rvest_1.0.1       knitr_1.34        haven_2.4.3      
#> [69] sass_0.4.0

Clinical codes - lookups and mappings

Alasdair Warwick

05 October, 2021
