Last updated: 2022-06-10


Knit directory: codemapper_notes/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

The global environment had objects present when the code in the R Markdown file was run. These objects can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment. Use wflow_publish or wflow_build to ensure that the code is always run in an empty environment.

The following objects were defined in the global environment when these results were created:

| Name | Class | Size |
|:---|:---|:---|
| install_codemapper | function | 1.2 Kb |

The command set.seed(20210923) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
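As a minimal base R sketch of what this guarantees (the sampled values are arbitrary and not part of this analysis), resetting the same seed before a random draw reproduces the draw exactly:

```r
# Draw five values after setting the seed used for this analysis
set.seed(20210923)
first_draw <- sample(1:100, size = 5)

# Resetting the same seed reproduces exactly the same draw
set.seed(20210923)
second_draw <- sample(1:100, size = 5)

identical(first_draw, second_draw)
#> [1] TRUE
```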

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 301314f. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Renviron
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    _targets/meta/process
    Ignored:    _targets/meta/progress
    Ignored:    _targets/objects/
    Ignored:    _targets/user/
    Ignored:    all_lkps_maps.db
    Ignored:    renv/library/
    Ignored:    renv/staging/
    Ignored:    tar_make.R

Unstaged changes:
    Modified:   _targets/meta/meta
    Modified:   analysis/read3_icd10_mapping.Rmd
    Modified:   renv.lock

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/clinical_codes_lkps_and_mappings.Rmd) and HTML (public/clinical_codes_lkps_and_mappings.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

| File | Version | Author | Date | Message |
|:---|:---|:---|:---|:---|
| Rmd | 92339ca | rmgpanw | 2022-04-28 | various code tweaks; rerun targets pipeline |
| html | 92339ca | rmgpanw | 2022-04-28 | various code tweaks; rerun targets pipeline |
| html | ae02335 | Chuin Ying Ung | 2022-02-22 | update notes for read3_icd10 |
| html | 1df97df | rmgpanw | 2022-02-17 | incorporate icd9 and icd10 to phecode maps |
| Rmd | 81047b4 | rmgpanw | 2022-02-17 | setup for gitlab CI with pkgdown site and test coverage; start adding read3 to snomed mapping |
| html | 81047b4 | rmgpanw | 2022-02-17 | setup for gitlab CI with pkgdown site and test coverage; start adding read3 to snomed mapping |
| html | 5c2a3e3 | Chuin Ying Ung | 2022-02-17 | update _targets.R (housekeeping) and phecode.Rmd |
| Rmd | f8d1889 | rmgpanw | 2021-10-09 | icd10 codes now returned as ALT CODE; codemapper app includes self-reported codes |
| html | f8d1889 | rmgpanw | 2021-10-09 | icd10 codes now returned as ALT CODE; codemapper app includes self-reported codes |
| Rmd | 919be0d | rmgpanw | 2021-10-07 | renamed functions and made shiny app for selecting codes |
| html | 919be0d | rmgpanw | 2021-10-07 | renamed functions and made shiny app for selecting codes |
| Rmd | d285b07 | rmgpanw | 2021-09-29 | add analysis/reformat_all_lkps_maps.Rmd to _targets.R; start analysis/clinical_codes_lkps_and_mappings.Rmd |

library(codemapper)
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✔ ggplot2 3.3.5     ✔ purrr   0.3.4
#> ✔ tibble  3.1.7     ✔ dplyr   1.0.9
#> ✔ tidyr   1.2.0     ✔ stringr 1.4.0
#> ✔ readr   2.1.2     ✔ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
library(targets)

tar_load(ALL_LKPS_MAPS_DB)

Introduction

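This vignette describes how to work with clinical codes using codemapper. Specifically, it covers code lookups, code mapping, finding codes that match a description, retrieving children codes, and the two ICD-10 code formats.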

The functions provided by codemapper rely on UK Biobank resource 592, which includes an Excel workbook containing lookup and mapping tables, and the NHSBSA BNF to SNOMED code mapping file (available here). These tables have been converted into a named list of data frames. This can be retrieved with TODO():

# retrieve code mappings in .Rdata format
targets::tar_load(all_lkps_maps)

# each item in the list is a sheet in the UKB Excel workbook (resource 592)
names(all_lkps_maps)
#>  [1] "bnf_lkp"                    "dmd_lkp"                   
#>  [3] "icd9_lkp"                   "icd10_lkp"                 
#>  [5] "icd9_icd10"                 "read_v2_lkp"               
#>  [7] "read_v2_drugs_lkp"          "read_v2_drugs_bnf"         
#>  [9] "read_v2_icd9"               "read_v2_icd10"             
#> [11] "read_v2_opcs4"              "read_v2_read_ctv3"         
#> [13] "read_ctv3_lkp"              "read_ctv3_icd9"            
#> [15] "read_ctv3_icd10"            "read_ctv3_opcs4"           
#> [17] "read_ctv3_read_v2"          "opcs4_lkp"                 
#> [19] "self_report_cancer"         "self_report_medication"    
#> [21] "self_report_operation"      "self_report_non_cancer"    
#> [23] "bnf_dmd"                    "self_report_med_to_atc_map"
#> [25] "read_ctv3_sct"              "phecode_lkp"               
#> [27] "icd10_phecode"              "icd9_phecode"

Code lookups

To look up details for a particular code, use lookup_codes(). Setting preferred_description_only to TRUE returns only the preferred description for each code when synonyms are present (Read 2 and Read 3 may include multiple descriptions for the same code):

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# lookup details
lookup_codes(codes = t1dm_read2,
             code_type = "read2",
             all_lkps_maps = all_lkps_maps,
             preferred_description_only = TRUE)
#> # A tibble: 2 × 3
#>   code  description                         code_type
#>   <chr> <chr>                               <chr>    
#> 1 C108. Insulin dependent diabetes mellitus read2    
#> 2 C10E. Type 1 diabetes mellitus            read2
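
Conceptually, a lookup of this kind is a filter on a lookup table. The base R sketch below illustrates the idea only and is not codemapper's implementation: `toy_read2_lkp` and `toy_lookup()` are hypothetical names, the synonym row is made up, and it assumes that the preferred description is the row with term code "00".

```r
# Toy lookup table. The second row is a made-up synonym added for illustration;
# this is not the real content of UK Biobank resource 592
toy_read2_lkp <- data.frame(
  read_code = c("C108.", "C108.", "C10E."),
  term_code = c("00", "11", "00"),
  term_description = c(
    "Insulin dependent diabetes mellitus",
    "Hypothetical synonym for C108.",
    "Type 1 diabetes mellitus"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows for the requested codes, optionally keeping
# only the preferred description (assumed here to have term_code "00")
toy_lookup <- function(codes, lkp, preferred_description_only = TRUE) {
  result <- lkp[lkp$read_code %in% codes, , drop = FALSE]
  if (preferred_description_only) {
    result <- result[result$term_code == "00", , drop = FALSE]
  }
  result
}

# One row per code when only preferred descriptions are kept
toy_lookup(c("C10E.", "C108."), toy_read2_lkp)
```

Calling `toy_lookup()` with `preferred_description_only = FALSE` would also return the synonym row, which is why a single code can appear more than once in lookup results.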

By default, the output is standardised to produce the columns shown above. To output the original formatting from UK Biobank resource 592, set standardise_output to FALSE:

# lookup details
lookup_codes(codes = t1dm_read2,
             code_type = "read2",
             all_lkps_maps = all_lkps_maps,
             preferred_description_only = TRUE,
             standardise_output = FALSE)
#> # A tibble: 2 × 4
#>   .rowid read_code term_code term_description                   
#>    <int> <chr>     <chr>     <chr>                              
#> 1  56884 C108.     00        Insulin dependent diabetes mellitus
#> 2  57026 C10E.     00        Type 1 diabetes mellitus

Code mapping

As an example, map codes for type 1 diabetes from Read 2 to Read 3. Note that both of the Read 2 codes, C10E. and C108., map to a single Read 3 code, X40J4:

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# map to Read 3
map_codes(codes = t1dm_read2, 
          from = "read2", 
          to = "read3", 
          all_lkps_maps = all_lkps_maps)
#> # A tibble: 5 × 3
#>   code  description                                code_type
#>   <chr> <chr>                                      <chr>    
#> 1 X40J4 Type I diabetes mellitus                   read3    
#> 2 X40J4 Type 1 diabetes mellitus                   read3    
#> 3 X40J4 IDDM - Insulin-dependent diabetes mellitus read3    
#> 4 X40J4 Juvenile onset diabetes mellitus           read3    
#> 5 X40J4 Insulin-dependent diabetes mellitus        read3
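
The shape of this result (one code, several rows) can be reproduced with two toy tables in base R. This is a sketch under assumptions, not how map_codes() is implemented: the table names, column names and two-step structure are hypothetical, and only two of the descriptions shown above are included.

```r
# Toy Read 2 to Read 3 mapping table (hypothetical structure and column names)
toy_read2_read3 <- data.frame(
  read2 = c("C10E.", "C108."),
  read3 = c("X40J4", "X40J4"),
  stringsAsFactors = FALSE
)

# Toy Read 3 lookup table with more than one description for the same code
toy_read3_lkp <- data.frame(
  read3 = c("X40J4", "X40J4"),
  description = c("Type I diabetes mellitus", "Type 1 diabetes mellitus"),
  stringsAsFactors = FALSE
)

# Step 1: map the input codes, keeping each target code once
input_codes <- c("C10E.", "C108.")
mapped <- unique(toy_read2_read3$read3[toy_read2_read3$read2 %in% input_codes])

# Step 2: look up every description for the mapped codes
mapped_descriptions <- toy_read3_lkp[toy_read3_lkp$read3 %in% mapped, ]
mapped_descriptions
```

Two input codes collapse to one target code in step 1, and the number of output rows is then driven by how many descriptions that target code has.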

Note that with map_codes(), preferred_description_only cannot be TRUE when standardise_output is FALSE (this raises an error), because some codes may otherwise be ‘lost’ in the mapping process. When standardise_output is TRUE, the mapped codes from map_codes() are passed on to lookup_codes(), at which point one can request only the preferred code descriptions:

# mapping the Read 2 code "D4104" to Read 3 only returns the secondary Read 3
# description (`TERMV3_TYPE` == "S"), unlike for "D4103". 
map_codes(codes = c("D4103", "D4104"), 
          from = "read2", 
          to = "read3", 
          all_lkps_maps = all_lkps_maps, 
          codes_only = FALSE,
          preferred_description_only = FALSE,
          standardise_output = FALSE) %>% 
  dplyr::select(tidyselect::contains("V3"))
#> # A tibble: 2 × 4
#>   READV3_CODE TERMV3_CODE TERMV3_TYPE TERMV3_DESC                               
#>   <chr>       <chr>       <chr>       <chr>                                     
#> 1 D4103       Y20e3       P           Polycythaemia due to cyanotic respiratory…
#> 2 D4104       Y20e6       S           Renal polycythaemia

# if `standardise_output` is `TRUE`, then `preferred_description_only` may also
# be set to `TRUE`
map_codes(codes = c("D4103", "D4104"), 
          from = "read2", 
          to = "read3", 
          all_lkps_maps = all_lkps_maps, 
          codes_only = FALSE,
          preferred_description_only = TRUE,
          standardise_output = TRUE)
#> # A tibble: 2 × 3
#>   code  description                                        code_type
#>   <chr> <chr>                                              <chr>    
#> 1 D4103 Polycythaemia due to cyanotic respiratory disease  read3    
#> 2 D4104 Secondary polycythaemia with excess erythropoietin read3

Mapping to ICD is more problematic, as some results are a range of ICD codes. Note also that for ICD-10, the mapping sheets use an alternative code format which removes any “.” characters:

map_codes(codes = t1dm_read2, 
          from = "read2", 
          to = "icd10", 
          all_lkps_maps = all_lkps_maps, 
          codes_only = TRUE,
          standardise_output = FALSE, 
          # by default, an error is raised if any unrecognised codes are present
          unrecognised_codes = 'warning')
#> Warning in handle_unrecognised_codes(unrecognised_codes = unrecognised_codes, :
#> The following 2 codes were not found for 'read2' in table 'read_v2_icd10':
#> 'C10E.', 'C108.'
#> No codes found after mapping. Returning `NULL`
#> NULL
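
When comparing ICD-10 codes written with a “.” against the undotted format, the separator has to be removed first. A minimal base R sketch (the variable names are illustrative, and the codes are examples only):

```r
# ICD-10 codes written with a "." separator
icd10_with_dot <- c("E10.3", "E11.3", "H36.0")

# Remove the "." to get the undotted format; fixed = TRUE makes gsub() treat
# "." as a literal character rather than as a regular expression wildcard
icd10_no_dot <- gsub(".", "", icd10_with_dot, fixed = TRUE)
icd10_no_dot
#> [1] "E103" "E113" "H360"
```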

The available mappings do not cover all possible mapping directions. For example, while there is a mapping for Read 2 to ICD-10, there is no mapping for ICD-10 to Read 2. In cases like this, setting the reverse_mapping argument of map_codes() to ‘warning’ makes it raise a warning and attempt the mapping anyway, using the same mapping sheet in reverse (e.g. for mapping ICD-10 to Read 2, map_codes() uses the read_v2_icd10 mapping sheet).

# find ICD-10 code matching "diabetic retinopathy"
icd10_diabetic_retinopathy <-
  code_descriptions_like(
    reg_expr = "diabetic retinopathy",
    code_type = "icd10",
    all_lkps_maps = all_lkps_maps,
    ignore_case = TRUE,
    codes_only = TRUE
  )

# there is no ICD-10 to Read 2 mapping sheet, so map in reverse (with a warning)
map_codes(
  codes = icd10_diabetic_retinopathy,
  from = "icd10",
  to = "read2",
  all_lkps_maps = all_lkps_maps,
  standardise_output = TRUE,
  codes_only = FALSE, 
  reverse_mapping = 'warning'
) %>% 
  knitr::kable()
#> Warning in check_mapping_args(from = from, to = to, reverse_mapping =
#> reverse_mapping): No mapping sheet available for this request. Attempting to map
#> anyway using: read_v2_icd10
| code | description | code_type |
|:---|:---|:---|
| C1087 | Insulin dependent diabetes mellitus with retinopathy | read2 |
| C1087 | Type I diabetes mellitus with retinopathy | read2 |
| C1087 | Type 1 diabetes mellitus with retinopathy | read2 |
| C1096 | Non-insulin-dependent diabetes mellitus with retinopathy | read2 |
| C1096 | Type II diabetes mellitus with retinopathy | read2 |
| C1096 | Type 2 diabetes mellitus with retinopathy | read2 |
| C10E7 | Type 1 diabetes mellitus with retinopathy | read2 |
| C10E7 | Type I diabetes mellitus with retinopathy | read2 |
| C10E7 | Insulin dependent diabetes mellitus with retinopathy | read2 |
| C10EP | Type 1 diabetes mellitus with exudative maculopathy | read2 |
| C10EP | Type I diabetes mellitus with exudative maculopathy | read2 |
| C10F6 | Type 2 diabetes mellitus with retinopathy | read2 |
| C10F6 | Type II diabetes mellitus with retinopathy | read2 |
| C10FQ | Type 2 diabetes mellitus with exudative maculopathy | read2 |
| C10FQ | Type II diabetes mellitus with exudative maculopathy | read2 |
| F420. | Diabetic retinopathy | read2 |
| F4200 | Background diabetic retinopathy | read2 |
| F4201 | Proliferative diabetic retinopathy | read2 |
| F4202 | Preproliferative diabetic retinopathy | read2 |
| F4203 | Advanced diabetic maculopathy | read2 |
| F4204 | Diabetic maculopathy | read2 |
| F4205 | Advanced diabetic retinal disease | read2 |
| F4206 | Non proliferative diabetic retinopathy | read2 |
| F4207 | High risk proliferative diabetic retinopathy | read2 |
| F4208 | High risk non proliferative diabetic retinopathy | read2 |
| F420z | Diabetic retinopathy NOS | read2 |
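
The idea of using one mapping sheet in both directions can be sketched in base R with a toy table. The codes below are placeholders rather than real Read 2 or ICD-10 codes, and the column names are hypothetical; this is not codemapper's implementation.

```r
# Toy mapping sheet in the Read 2 -> ICD-10 direction (placeholder codes)
toy_map <- data.frame(
  read2 = c("r2_a", "r2_b", "r2_c"),
  icd10 = c("i10_x", "i10_x", "i10_y"),
  stringsAsFactors = FALSE
)

# Forward direction: filter on the 'from' column and return the 'to' column
forward <- unique(toy_map$icd10[toy_map$read2 %in% "r2_a"])

# Reverse direction: same sheet with the roles of the columns swapped
reverse <- unique(toy_map$read2[toy_map$icd10 %in% "i10_x"])
reverse
#> [1] "r2_a" "r2_b"
```

Because a sheet built for one direction is typically many-to-one, the reverse lookup tends to be one-to-many, so its results are worth reviewing manually.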

Find codes that match a description

Use code_descriptions_like(). For example, to find Read2 codes that match the description ‘diabetic retinopathy’:

code_descriptions_like(reg_expr = "diabetic retinopathy", 
                          code_type = "read2", 
                          all_lkps_maps = all_lkps_maps, 
                          ignore_case = TRUE, 
                          codes_only = FALSE,
                          preferred_description_only = TRUE) %>% 
  head() %>% 
  knitr::kable()
| code | description | code_type |
|:---|:---|:---|
| 2BBJ. | O/E - no right diabetic retinopathy | read2 |
| 2BBk. | O/E - right eye stable treated proliferative diabetic retinopathy | read2 |
| 2BBK. | O/E - no left diabetic retinopathy | read2 |
| 2BBl. | O/E - left eye stable treated proliferative diabetic retinopathy | read2 |
| 2BBo. | O/E - sight threatening diabetic retinopathy | read2 |
| 2BBP. | O/E - right eye background diabetic retinopathy | read2 |
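
A description search of this kind amounts to a case-insensitive regular expression match on the description column. The base R sketch below uses a three-row toy table (`toy_lkp` is a hypothetical name) and is an illustration only, not the function's actual implementation:

```r
# Toy lookup table with a handful of Read 2 codes and descriptions
toy_lkp <- data.frame(
  code = c("F420.", "F4201", "C10E."),
  description = c(
    "Diabetic retinopathy",
    "Proliferative diabetic retinopathy",
    "Type 1 diabetes mellitus"
  ),
  stringsAsFactors = FALSE
)

# Keep rows whose description matches the pattern, ignoring case
is_match <- grepl("diabetic retinopathy", toy_lkp$description, ignore.case = TRUE)
matches <- toy_lkp[is_match, ]
matches$code
#> [1] "F420." "F4201"
```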

Children codes

To get the children codes, use codes_starting_with(). This will return all unique clinical codes that start with the codes of interest:

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# get children codes
codes_starting_with(codes = t1dm_read2, 
                code_type = "read2", 
                all_lkps_maps = all_lkps_maps, 
                codes_only = TRUE,
                standardise_output = FALSE) %>% 
  knitr::kable()
| x |
|:---|
| C108. |
| C10E. |

Note: Some coding systems include a ‘.’ (e.g. ICD-10). This may return unexpected results with codes_starting_with(), as this function searches using regular expressions, in which ‘.’ is interpreted as a wildcard that matches any single character.
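
The wildcard behaviour is easy to demonstrate in base R. In this sketch the strings are illustrative only ("E1009" is a placeholder, not a real code), and startsWith() is shown as one way to compare literal text:

```r
codes <- c("E10.9", "E1009", "E11.9")

# As a regular expression, "." matches any single character, so the pattern
# "^E10.9" also matches "E1009"
regex_match <- grepl("^E10.9", codes)
regex_match
#> [1]  TRUE  TRUE FALSE

# startsWith() compares literal text, so only "E10.9" matches
literal_match <- startsWith(codes, "E10.9")
literal_match
#> [1]  TRUE FALSE FALSE
```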

By default, a character vector of codes is returned. To return a data frame including code descriptions, set the argument codes_only to FALSE:

# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")

# get children codes, with descriptions
codes_starting_with(codes = t1dm_read2, 
                code_type = "read2", 
                all_lkps_maps = all_lkps_maps, 
                codes_only = FALSE) %>% 
  head() %>% 
  knitr::kable()
| code | description | code_type |
|:---|:---|:---|
| C108. | Insulin dependent diabetes mellitus | read2 |
| C10E. | Type 1 diabetes mellitus | read2 |

ICD-10

ICD-10 codes are presented in two different formats in the icd10_lkp table: ICD_10 and ALT_CODE. The latter is how ICD-10 codes are recorded in UKB, except that an ‘X’ is appended to 3-character codes without any 4-character children (e.g. A38X, ‘Scarlet fever’).

Another issue with the ICD_10 format is that it contains some duplicated codes, e.g. I70.0 appears 3 times. This is because of MODIFIER_5: the ALT_CODE format records a different code for each of these.
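
Duplicates of this kind can be checked for with base R's duplicated(). The table below is a toy stand-in for icd10_lkp: apart from the repeated I70.0, every value is a placeholder, and none of the ALT_CODE values are real.

```r
# Toy stand-in for icd10_lkp (placeholder values, for illustration only)
toy_icd10_lkp <- data.frame(
  ICD_10 = c("I70.0", "I70.0", "I70.0", "placeholder"),
  ALT_CODE = c("alt_1", "alt_2", "alt_3", "alt_4"),
  stringsAsFactors = FALSE
)

# Codes that appear more than once in the ICD_10 column
duplicated_icd10 <- unique(toy_icd10_lkp$ICD_10[duplicated(toy_icd10_lkp$ICD_10)])
duplicated_icd10
#> [1] "I70.0"

# anyDuplicated() returns 0 when every value in the column is unique
anyDuplicated(toy_icd10_lkp$ALT_CODE)
#> [1] 0
```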


sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices datasets  utils     methods   base     
#> 
#> other attached packages:
#>  [1] targets_0.12.0        forcats_0.5.1         stringr_1.4.0        
#>  [4] dplyr_1.0.9           purrr_0.3.4           readr_2.1.2          
#>  [7] tidyr_1.2.0           tibble_3.1.7          ggplot2_3.3.5        
#> [10] tidyverse_1.3.1       codemapper_0.0.0.9001 workflowr_1.7.0      
#> 
#> loaded via a namespace (and not attached):
#>  [1] httr_1.4.2        sass_0.4.1        jsonlite_1.8.0    modelr_0.1.8     
#>  [5] bslib_0.3.1       shiny_1.7.1       assertthat_0.2.1  getPass_0.2-2    
#>  [9] highr_0.9         base64url_1.4     renv_0.13.2       cellranger_1.1.0 
#> [13] yaml_2.3.5        pillar_1.7.0      backports_1.4.1   glue_1.6.2       
#> [17] digest_0.6.29     promises_1.2.0.1  rvest_1.0.2       colorspace_2.0-3 
#> [21] htmltools_0.5.2   httpuv_1.6.5      pkgconfig_2.0.3   broom_0.8.0      
#> [25] haven_2.5.0       xtable_1.8-4      scales_1.2.0      processx_3.5.3   
#> [29] whisker_0.4       later_1.3.0       tzdb_0.3.0        git2r_0.30.1     
#> [33] generics_0.1.2    ellipsis_0.3.2    withr_2.5.0       cli_3.3.0        
#> [37] magrittr_2.0.3    crayon_1.5.1      readxl_1.4.0      mime_0.12        
#> [41] evaluate_0.15     ps_1.7.0          fs_1.5.2          fansi_1.0.3      
#> [45] xml2_1.3.3        tools_4.2.0       data.table_1.14.2 hms_1.1.1        
#> [49] lifecycle_1.0.1   munsell_0.5.0     reprex_2.0.1      callr_3.7.0      
#> [53] compiler_4.2.0    jquerylib_0.1.4   rlang_1.0.2       grid_4.2.0       
#> [57] rstudioapi_0.13   igraph_1.3.1      rmarkdown_2.14    gtable_0.3.0     
#> [61] codetools_0.2-18  DBI_1.1.2         R6_2.5.1          lubridate_1.8.0  
#> [65] knitr_1.39        fastmap_1.1.0     utf8_1.2.2        rprojroot_2.0.3  
#> [69] stringi_1.7.6     Rcpp_1.0.8.3      vctrs_0.4.1       dbplyr_2.2.0     
#> [73] tidyselect_1.1.2  xfun_0.30