Last updated: 2022-02-22


Knit directory: codemapper/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

The global environment had objects present when the code in the R Markdown file was run. These objects can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment. Use wflow_publish or wflow_build to ensure that the code is always run in an empty environment.

The following objects were defined in the global environment when these results were created:

Name                Class     Size
install_codemapper  function  1.2 Kb

The command set.seed(20210923) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version a42dc66. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Renviron
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    _targets/
    Ignored:    all_lkps_maps.db
    Ignored:    renv/library/
    Ignored:    renv/staging/
    Ignored:    tar_make.R
    Ignored:    ukbb_pan_ancestry-master/

Unstaged changes:
    Modified:   R/utils.R
    Modified:   analysis/read3_icd10_mapping.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/clinical_codes_lkps_and_mappings.Rmd) and HTML (public/clinical_codes_lkps_and_mappings.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
html 1df97df rmgpanw 2022-02-17 incorporate icd9 and icd10 to phecode maps
Rmd 81047b4 rmgpanw 2022-02-17 setup for gitlab CI with pkgdown site and test coverage; start adding read3 to snomed mapping
html 81047b4 rmgpanw 2022-02-17 setup for gitlab CI with pkgdown site and test coverage; start adding read3 to snomed mapping
html 5c2a3e3 Chuin Ying Ung 2022-02-17 update _targets.R (housekeeping) and phecode.Rmd
Rmd f8d1889 rmgpanw 2021-10-09 icd10 codes now returned as ALT CODE; codemapper app includes self-reported codes
html f8d1889 rmgpanw 2021-10-09 icd10 codes now returned as ALT CODE; codemapper app includes self-reported codes
Rmd 919be0d rmgpanw 2021-10-07 renamed functions and made shiny app for selecting codes
html 919be0d rmgpanw 2021-10-07 renamed functions and made shiny app for selecting codes
Rmd d285b07 rmgpanw 2021-09-29 add analysis/reformat_all_lkps_maps.Rmd to _targets.R; start analysis/clinical_codes_lkps_and_mappings.Rmd

library(codemapper)
#> Loading required package: ukbwranglr
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
#> ✓ tibble  3.1.4     ✓ dplyr   1.0.7
#> ✓ tidyr   1.1.4     ✓ stringr 1.4.0
#> ✓ readr   2.0.2     ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()

Introduction

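This vignette describes how to work with clinical codes using codemapper. Specifically, it covers looking up codes, mapping between coding systems, finding codes that match a description, retrieving children codes, and the two ICD-10 code formats.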

The functions provided by codemapper rely on UK Biobank resource 592, which includes an Excel workbook containing lookup and mapping tables, and on the NHSBSA BNF to SNOMED code mapping file. These tables have been converted into a named list of data frames. In this analysis, the list is loaded from the project's targets pipeline:

# load the code mappings (a named list of data frames) from the targets store
targets::tar_load(all_lkps_maps)

# each item in the list is a lookup or mapping table, e.g. a sheet in the UKB
# Excel workbook (resource 592)
names(all_lkps_maps)
#>  [1] "bnf_lkp"                    "dmd_lkp"                   
#>  [3] "icd9_lkp"                   "icd10_lkp"                 
#>  [5] "icd9_icd10"                 "read_v2_lkp"               
#>  [7] "read_v2_drugs_lkp"          "read_v2_drugs_bnf"         
#>  [9] "read_v2_icd9"               "read_v2_icd10"             
#> [11] "read_v2_opcs4"              "read_v2_read_ctv3"         
#> [13] "read_ctv3_lkp"              "read_ctv3_icd9"            
#> [15] "read_ctv3_icd10"            "read_ctv3_opcs4"           
#> [17] "read_ctv3_read_v2"          "bnf_dmd"                   
#> [19] "opcs4_lkp"                  "self_report_cancer"        
#> [21] "self_report_medication"     "self_report_operation"     
#> [23] "self_report_non_cancer"     "self_report_med_to_atc_map"
#> [25] "read_ctv3_sct"              "phecode_lkp"               
#> [27] "icd10_phecode"              "icd9_phecode"
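
As a rough illustration of the data structure described above, the sketch below builds a toy named list of data frames and inspects it the same way. `toy_lkps_maps` is a hypothetical stand-in (it is not the real `all_lkps_maps`), and only columns and rows that appear in the output on this page are included.

```r
# Toy stand-in for `all_lkps_maps`: a named list of data frames.
# Only a few rows and columns are included, for illustration.
toy_lkps_maps <- list(
  read_v2_lkp = data.frame(
    read_code = c("C108.", "C10E."),
    term_code = c("00", "00"),
    term_description = c("Insulin dependent diabetes mellitus",
                         "Type 1 diabetes mellitus"),
    stringsAsFactors = FALSE
  ),
  read_v2_icd10 = data.frame(
    read_code = c("C1087", "F4205"),
    icd10_code = c("E103D H360A", "H360A"),
    stringsAsFactors = FALSE
  )
)

# List the available tables, then pull one out by name
table_names <- names(toy_lkps_maps)
read_v2_lkp <- toy_lkps_maps[["read_v2_lkp"]]

# Number of rows in each table, as a named integer vector
table_sizes <- vapply(toy_lkps_maps, nrow, integer(1))
print(table_sizes)
```

Each table can then be filtered or joined like any other data frame.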

Code lookups

To look up details for a particular code, use lookup_codes(). Setting preferred_description_only to TRUE returns only the preferred description for each code when synonyms are present (Read 2 and Read 3 may include multiple descriptions for the same code):

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# lookup details
lookup_codes(codes = t1dm_read2,
             code_type = "read2",
             all_lkps_maps = all_lkps_maps,
             preferred_description_only = TRUE)
#> # A tibble: 2 × 3
#>   code  description                         code_type
#>   <chr> <chr>                               <chr>    
#> 1 C108. Insulin dependent diabetes mellitus read2    
#> 2 C10E. Type 1 diabetes mellitus            read2

By default, the output is standardised to produce the columns shown above. To output the original formatting from UK Biobank resource 592, set standardise_output to FALSE:

# lookup details
lookup_codes(codes = t1dm_read2,
             code_type = "read2",
             all_lkps_maps = all_lkps_maps,
             preferred_description_only = TRUE,
             standardise_output = FALSE)
#> # A tibble: 2 × 3
#>   read_code term_code term_description                   
#>   <chr>     <chr>     <chr>                              
#> 1 C108.     00        Insulin dependent diabetes mellitus
#> 2 C10E.     00        Type 1 diabetes mellitus
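
The effect of preferred_description_only can be sketched on a toy Read 2 lookup table. This is only a minimal illustration, not how lookup_codes() is implemented: `lookup_toy()` is a hypothetical helper, the synonym rows are placeholders, and it assumes that term_code "00" marks the preferred description (consistent with the output above).

```r
# Toy Read 2 lookup table. The "11" rows are placeholder synonyms,
# not real Read 2 terms.
toy_read2_lkp <- data.frame(
  read_code = c("C108.", "C108.", "C10E.", "C10E."),
  term_code = c("00", "11", "00", "11"),
  term_description = c("Insulin dependent diabetes mellitus",
                       "Placeholder synonym for C108.",
                       "Type 1 diabetes mellitus",
                       "Placeholder synonym for C10E."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows for the requested codes and, optionally,
# only the rows assumed to hold the preferred description (term_code "00").
lookup_toy <- function(codes, lkp, preferred_description_only = TRUE) {
  result <- lkp[lkp$read_code %in% codes, , drop = FALSE]
  if (preferred_description_only) {
    result <- result[result$term_code == "00", , drop = FALSE]
  }
  rownames(result) <- NULL
  result
}

preferred <- lookup_toy(c("C10E.", "C108."), toy_read2_lkp)
all_terms <- lookup_toy(c("C10E.", "C108."), toy_read2_lkp,
                        preferred_description_only = FALSE)
print(preferred)
```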

Code mapping

An example: map codes for type 1 diabetes from Read 2 to Read 3. Note that both of the Read 2 codes, C10E. and C108., map to a single Read 3 code, X40J4:

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# map to Read 3
map_codes(codes = t1dm_read2, 
          from = "read2", 
          to = "read3", 
          all_lkps_maps = all_lkps_maps)
#> # A tibble: 5 × 3
#>   code  description                                code_type
#>   <chr> <chr>                                      <chr>    
#> 1 X40J4 Type I diabetes mellitus                   read3    
#> 2 X40J4 Type 1 diabetes mellitus                   read3    
#> 3 X40J4 IDDM - Insulin-dependent diabetes mellitus read3    
#> 4 X40J4 Juvenile onset diabetes mellitus           read3    
#> 5 X40J4 Insulin-dependent diabetes mellitus        read3
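
The many-to-one behaviour above can be sketched with a toy mapping table. This is an illustration under stated assumptions rather than the logic of map_codes(): the column names `read2` and `read3` are hypothetical, and the third row is a placeholder pair of codes added only so that the filter has something to exclude.

```r
# Toy Read 2 -> Read 3 mapping table. The first two rows reflect the example
# above; "XXXX." -> "YYYYY" is a placeholder, not a real mapping.
toy_read2_read3 <- data.frame(
  read2 = c("C10E.", "C108.", "XXXX."),
  read3 = c("X40J4", "X40J4", "YYYYY"),
  stringsAsFactors = FALSE
)

t1dm_read2 <- c("C10E.", "C108.")

# Keep the rows for the codes of interest, then de-duplicate the target codes
mapped_rows <- toy_read2_read3[toy_read2_read3$read2 %in% t1dm_read2, ]
mapped_codes <- unique(mapped_rows$read3)
print(mapped_codes)
```

Two input codes yield a single mapped code here, which is why the number of rows returned by a mapping need not match the number of codes supplied.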

Note that, with map_codes, preferred_description_only cannot be TRUE if standardise_output is FALSE (this raises an error). This is because some codes may otherwise be ‘lost’ in the mapping process. When standardise_output is TRUE, the mapped codes from map_codes are passed on to lookup_codes, at which point one can request only the preferred code descriptions:

# mapping the Read 2 code "D4104" to Read 3 only returns the secondary Read 3
# description (`TERMV3_TYPE` == "S"), unlike for "D4103". 
map_codes(codes = c("D4103", "D4104"), 
          from = "read2", 
          to = "read3", 
          all_lkps_maps = all_lkps_maps, 
          codes_only = FALSE,
          preferred_description_only = FALSE,
          standardise_output = FALSE) %>% 
  dplyr::select(tidyselect::contains("V3"))
#> # A tibble: 2 × 4
#>   READV3_CODE TERMV3_CODE TERMV3_TYPE TERMV3_DESC                               
#>   <chr>       <chr>       <chr>       <chr>                                     
#> 1 D4103       Y20e3       P           Polycythaemia due to cyanotic respiratory…
#> 2 D4104       Y20e6       S           Renal polycythaemia
# if `standardise_output` is `TRUE`, then `preferred_description_only` may also
# be set to `TRUE`
map_codes(codes = c("D4103", "D4104"), 
          from = "read2", 
          to = "read3", 
          all_lkps_maps = all_lkps_maps, 
          codes_only = FALSE,
          preferred_description_only = TRUE,
          standardise_output = TRUE)
#> # A tibble: 2 × 3
#>   code  description                                        code_type
#>   <chr> <chr>                                              <chr>    
#> 1 D4103 Polycythaemia due to cyanotic respiratory disease  read3    
#> 2 D4104 Secondary polycythaemia with excess erythropoietin read3

Mapping to ICD is less straightforward, as some results are a range of ICD codes rather than a single code (note also that for ICD-10, the mapping sheets use an alternative code format which removes any “.” characters):

map_codes(codes = t1dm_read2, 
          from = "read2", 
          to = "icd10", 
          all_lkps_maps = all_lkps_maps, 
          codes_only = TRUE,
          standardise_output = FALSE)
#> [1] "E100-E109"
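
If individual codes are needed, a range such as the one returned above could be expanded along the following lines. `expand_icd10_range()` is a hypothetical helper, not part of codemapper. It assumes both ends of the range share the same leading letter and have the same number of digits, and it simply enumerates the numbers in between, so the result should still be checked against the ICD-10 lookup table.

```r
# Hypothetical helper: expand "E100-E109" into "E100", "E101", ..., "E109".
# Assumes both ends share one leading letter followed by digits only.
expand_icd10_range <- function(x) {
  ends <- strsplit(x, "-", fixed = TRUE)[[1]]
  if (length(ends) == 1) {
    return(ends) # not a range, return unchanged
  }
  stopifnot(length(ends) == 2, nchar(ends[1]) == nchar(ends[2]))

  letter <- substr(ends, 1, 1)
  stopifnot(letter[1] == letter[2])

  digits <- as.integer(substr(ends, 2, nchar(ends)))
  stopifnot(!anyNA(digits), digits[1] <= digits[2])

  # Zero-pad so that e.g. 5 becomes "005" when the codes have 3 digits
  width <- nchar(ends[1]) - 1
  paste0(letter[1],
         formatC(seq(digits[1], digits[2]), width = width, flag = "0"))
}

expanded <- expand_icd10_range("E100-E109")
print(expanded)
```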

The available mappings do not cover all possible mapping directions. For example, while there are mappings for Read 2 to ICD-10, there is no mapping for ICD-10 to Read 2. For cases like this, map_codes() will attempt to map anyway by using the same mapping sheet in reverse (e.g. for mapping ICD-10 to Read 2, map_codes() uses the read_v2_icd10 mapping sheet). However, this returns no results when attempting to map the ICD-10 code for diabetic retinopathy, H36.0:

# find ICD-10 code matching "diabetic retinopathy"
icd10_diabetic_retinopathy <-
  code_descriptions_like(
    reg_expr = "diabetic retinopathy",
    code_type = "icd10",
    all_lkps_maps = all_lkps_maps,
    ignore_case = TRUE,
    codes_only = TRUE
  )

# attempting to map this to Read 2 returns a NULL result however
map_codes(
  codes = icd10_diabetic_retinopathy,
  from = "icd10",
  to = "read2",
  all_lkps_maps = all_lkps_maps,
  standardise_output = FALSE,
  codes_only = TRUE
)
#> Warning in check_mapping_args(from = from, to = to): Warning! No mapping sheet
#> available for this request. Attempting to map anyway using: read_v2_icd10
#> 
#> No codes found after mapping. Returning `NULL`
#> NULL

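Inspecting the mapping sheet read_v2_icd10 shows why. In most of the matching rows, the icd10_code column holds 2 ICD-10 codes in a single string (e.g. “E103D H360A”), and every “H360” entry carries a trailing letter, so no value is exactly equal to “H360”: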

all_lkps_maps$read_v2_icd10 %>% 
  dplyr::filter(stringr::str_detect(icd10_code, pattern = "H360"))
#> # A tibble: 17 × 3
#>    read_code icd10_code  icd10_code_def
#>    <chr>     <chr>       <chr>         
#>  1 C1087     E103D H360A 7             
#>  2 C1096     E113D H360A 7             
#>  3 C10E7     E103D H360A 7             
#>  4 C10EP     E103D H360A 7             
#>  5 C10F6     E113D H360A 7             
#>  6 C10FQ     E113D H360A 7             
#>  7 F420.     E143D H360A 7             
#>  8 F4200     E143D H360A 7             
#>  9 F4201     E143D H360A 7             
#> 10 F4202     E143D H360A 7             
#> 11 F4203     E143D H360A 7             
#> 12 F4204     E143D H360A 7             
#> 13 F4205     H360A       5             
#> 14 F4206     E143D H360A 7             
#> 15 F4207     E143D H360A 7             
#> 16 F4208     E143D H360A 7             
#> 17 F420z     E143D H360A 7

Note: “H36.0” is converted to “H360” by map_codes internally, as this is the format used by the mapping sheets.
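
One possible workaround is sketched below on two rows copied from the output above. It is not a codemapper feature: it assumes that the icd10_code values are space-separated and that the trailing “D” or “A” is a modifier (presumably the dagger/asterisk marker) that can be dropped before matching.

```r
# Two rows copied from the read_v2_icd10 output above
toy_read_v2_icd10 <- data.frame(
  read_code = c("C1087", "F4205"),
  icd10_code = c("E103D H360A", "H360A"),
  stringsAsFactors = FALSE
)

# Exact matching on "H360" finds nothing
exact_matches <-
  toy_read_v2_icd10$read_code[toy_read_v2_icd10$icd10_code == "H360"]

# Split each cell into individual codes: one row per (read_code, icd10_code)
parts <- strsplit(toy_read_v2_icd10$icd10_code, " ", fixed = TRUE)
long <- data.frame(
  read_code = rep(toy_read_v2_icd10$read_code, lengths(parts)),
  # Assumption: a trailing "D" or "A" is a modifier and can be removed
  icd10_code = sub("[DA]$", "", unlist(parts)),
  stringsAsFactors = FALSE
)

# Reverse lookup now succeeds
reverse_matches <- long$read_code[long$icd10_code == "H360"]
print(reverse_matches)
```

Only “D” and “A” are stripped, so that codes ending in “X” (such as A38X) are left intact.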

Find codes that match a description

Use code_descriptions_like(). For example, to find Read2 codes that match the description ‘diabetic retinopathy’:

code_descriptions_like(reg_expr = "diabetic retinopathy",
                       code_type = "read2",
                       all_lkps_maps = all_lkps_maps,
                       ignore_case = TRUE,
                       codes_only = FALSE,
                       preferred_description_only = TRUE) %>%
  head()
#> # A tibble: 6 × 3
#>   code  description                                                    code_type
#>   <chr> <chr>                                                          <chr>    
#> 1 2BBJ. O/E - no right diabetic retinopathy                            read2    
#> 2 2BBk. O/E - right eye stable treated proliferative diabetic retinop… read2    
#> 3 2BBK. O/E - no left diabetic retinopathy                             read2    
#> 4 2BBl. O/E - left eye stable treated proliferative diabetic retinopa… read2    
#> 5 2BBo. O/E - sight threatening diabetic retinopathy                   read2    
#> 6 2BBP. O/E - right eye background diabetic retinopathy                read2
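
The underlying idea, a case-insensitive regular expression search over a description column, can be sketched in base R on a toy table. The column names below follow the standardised output shown above. Whether code_descriptions_like() works this way internally is an assumption.

```r
# Toy lookup table using descriptions shown elsewhere on this page
toy_lkp <- data.frame(
  code = c("2BBJ.", "2BBK.", "C10E."),
  description = c("O/E - no right diabetic retinopathy",
                  "O/E - no left diabetic retinopathy",
                  "Type 1 diabetes mellitus"),
  stringsAsFactors = FALSE
)

# Case-insensitive search: the upper-case pattern still matches
is_match <- grepl("DIABETIC RETINOPATHY", toy_lkp$description,
                  ignore.case = TRUE)
hits <- toy_lkp[is_match, ]
print(hits$code)
```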

Children codes

To get the children codes, use codes_starting_with(). This will return all unique clinical codes that start with the codes of interest:

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# get children codes
codes_starting_with(codes = t1dm_read2,
                    code_type = "read2",
                    all_lkps_maps = all_lkps_maps,
                    codes_only = TRUE,
                    standardise_output = FALSE)
#> [1] "C108." "C10E."

Note: Some coding systems include a ‘.’ (e.g. ICD-10). This may return unexpected results with codes_starting_with(), as this function searches using regular expressions, in which ‘.’ is interpreted as a wildcard.
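
The wildcard behaviour can be seen with base R alone. In the sketch below, the code strings are made up for illustration (“E1011” in particular is not a real code), and the two alternatives shown are general ways of matching a literal ‘.’ rather than options of codes_starting_with().

```r
# Made-up code strings for illustration
codes <- c("E10.1", "E1011", "E11.0")

# Unescaped "." matches any character, so "E1011" is matched as well
regex_hits <- codes[grepl("^E10.", codes)]

# Escaping the "." matches it literally
escaped_hits <- codes[grepl("^E10\\.", codes)]

# startsWith() does plain prefix matching with no regular expressions
literal_hits <- codes[startsWith(codes, "E10.")]

print(regex_hits)
print(escaped_hits)
```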

By default, a character vector of codes is returned. To return a data frame including code descriptions, set the argument codes_only to FALSE:

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# get children codes, with descriptions
codes_starting_with(codes = t1dm_read2,
                    code_type = "read2",
                    all_lkps_maps = all_lkps_maps,
                    codes_only = FALSE) %>%
  head()
#> # A tibble: 2 × 3
#>   code  description                         code_type
#>   <chr> <chr>                               <chr>    
#> 1 C108. Insulin dependent diabetes mellitus read2    
#> 2 C10E. Type 1 diabetes mellitus            read2

ICD-10

ICD-10 codes are presented in two different formats in the icd10_lkp table: ICD_10 and ALT_CODE. The latter is how ICD-10 codes are recorded in UKB. Note that, in this format, an ‘X’ is appended to 3-character codes without any 4-character children (e.g. A38X, ‘Scarlet fever’).

Another issue with the ICD_10 format is that it contains some duplicated codes, e.g. I70.0 appears 3 times. This is because of MODIFIER_5: the ALT_CODE format records a different code for each of these.
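
The relationship between the two formats can be approximated as below. `icd10_to_alt_code()` is a hypothetical helper, not a codemapper function. The set of 3-character codes without children has to be supplied by the caller (in practice it would come from icd10_lkp), and the MODIFIER_5 duplicates are not handled, so in general the ALT_CODE column itself should be preferred.

```r
# Hypothetical helper: approximate the ALT_CODE format from the ICD_10 format.
# `no_children` lists 3-character codes that have no 4-character children.
icd10_to_alt_code <- function(icd10, no_children = character()) {
  # Remove the "." (fixed = TRUE so it is not treated as a regex wildcard)
  alt <- gsub(".", "", icd10, fixed = TRUE)

  # Append "X" to 3-character codes known to have no children
  needs_x <- nchar(alt) == 3 & alt %in% no_children
  alt[needs_x] <- paste0(alt[needs_x], "X")
  alt
}

alt_codes <- icd10_to_alt_code(c("H36.0", "A38"), no_children = "A38")
print(alt_codes)

# Duplicated codes in a vector (e.g. repeated ICD_10 values) can be flagged
has_duplicates <- anyDuplicated(c("I70.0", "I70.0", "I70.0")) > 0
```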


sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Monterey 12.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices datasets  utils     methods   base     
#> 
#> other attached packages:
#>  [1] forcats_0.5.1         stringr_1.4.0         dplyr_1.0.7          
#>  [4] purrr_0.3.4           readr_2.0.2           tidyr_1.1.4          
#>  [7] tibble_3.1.4          ggplot2_3.3.5         tidyverse_1.3.1      
#> [10] codemapper_0.0.0.9000 ukbwranglr_0.0.0.9000 workflowr_1.6.2      
#> 
#> loaded via a namespace (and not attached):
#>  [1] httr_1.4.2        sass_0.4.0        jsonlite_1.7.2    modelr_0.1.8     
#>  [5] bslib_0.3.0       shiny_1.7.0       assertthat_0.2.1  highr_0.9        
#>  [9] renv_0.13.2       cellranger_1.1.0  yaml_2.2.1        pillar_1.6.3     
#> [13] backports_1.2.1   glue_1.4.2        digest_0.6.28     promises_1.2.0.1 
#> [17] rvest_1.0.1       colorspace_2.0-2  htmltools_0.5.2   httpuv_1.6.3     
#> [21] pkgconfig_2.0.3   broom_0.7.9       haven_2.4.3       xtable_1.8-4     
#> [25] scales_1.1.1      processx_3.5.2    whisker_0.4       later_1.3.0      
#> [29] tzdb_0.1.2        git2r_0.28.0      generics_0.1.0    ellipsis_0.3.2   
#> [33] withr_2.4.2       cli_3.0.1         magrittr_2.0.1    crayon_1.4.1     
#> [37] readxl_1.3.1      mime_0.12         evaluate_0.14     ps_1.6.0         
#> [41] fs_1.5.0          fansi_0.5.0       xml2_1.3.2        tools_4.1.2      
#> [45] data.table_1.14.2 hms_1.1.1         lifecycle_1.0.1   munsell_0.5.0    
#> [49] reprex_2.0.1      targets_0.8.0     callr_3.7.0       compiler_4.1.2   
#> [53] jquerylib_0.1.4   rlang_0.4.11      grid_4.1.2        rstudioapi_0.13  
#> [57] igraph_1.2.6      rmarkdown_2.11    gtable_0.3.0      codetools_0.2-18 
#> [61] DBI_1.1.1         R6_2.5.1          lubridate_1.7.10  knitr_1.34       
#> [65] fastmap_1.1.0     utf8_1.2.2        rprojroot_2.0.2   stringi_1.7.4    
#> [69] Rcpp_1.0.7        vctrs_0.3.8       dbplyr_2.1.1      tidyselect_1.1.1 
#> [73] xfun_0.24