Last updated: 2022-02-22
Knit directory: codemapper/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
The global environment had objects present when the code in the R Markdown file was run. These objects can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment. Use wflow_publish or wflow_build to ensure that the code is always run in an empty environment.
The following objects were defined in the global environment when these results were created:
| Name | Class | Size |
|---|---|---|
| install_codemapper | function | 1.2 Kb |
The command set.seed(20210923) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version a42dc66. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Renviron
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: _targets/
Ignored: all_lkps_maps.db
Ignored: renv/library/
Ignored: renv/staging/
Ignored: tar_make.R
Ignored: ukbb_pan_ancestry-master/
Unstaged changes:
Modified: R/utils.R
Modified: analysis/read3_icd10_mapping.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/clinical_codes_lkps_and_mappings.Rmd) and HTML (public/clinical_codes_lkps_and_mappings.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| html | 1df97df | rmgpanw | 2022-02-17 | incorporate icd9 and icd10 to phecode maps |
| Rmd | 81047b4 | rmgpanw | 2022-02-17 | setup for gitlab CI with pkgdown site and test coverage; start adding read3 to snomed mapping |
| html | 81047b4 | rmgpanw | 2022-02-17 | setup for gitlab CI with pkgdown site and test coverage; start adding read3 to snomed mapping |
| html | 5c2a3e3 | Chuin Ying Ung | 2022-02-17 | update _targets.R (housekeeping) and phecode.Rmd |
| Rmd | f8d1889 | rmgpanw | 2021-10-09 | icd10 codes now returned as ALT CODE; codemapper app includes self-reported codes |
| html | f8d1889 | rmgpanw | 2021-10-09 | icd10 codes now returned as ALT CODE; codemapper app includes self-reported codes |
| Rmd | 919be0d | rmgpanw | 2021-10-07 | renamed functions and made shiny app for selecting codes |
| html | 919be0d | rmgpanw | 2021-10-07 | renamed functions and made shiny app for selecting codes |
| Rmd | d285b07 | rmgpanw | 2021-09-29 | add analysis/reformat_all_lkps_maps.Rmd to _targets.R; start analysis/clinical_codes_lkps_and_mappings.Rmd |
library(codemapper)
#> Loading required package: ukbwranglr
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
#> ✓ tibble 3.1.4 ✓ dplyr 1.0.7
#> ✓ tidyr 1.1.4 ✓ stringr 1.4.0
#> ✓ readr 2.0.2 ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
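This vignette describes how to work with clinical codes using codemapper. Specifically, it covers looking up code details, mapping codes between coding systems, finding codes by description, and retrieving child codes.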
The functions provided by codemapper rely on UK Biobank resource 592, which includes an Excel workbook containing lookup and mapping tables, and the NHSBSA BNF to SNOMED code mapping file (available here). These tables have been converted into a named list of data frames. This can be retrieved with TODO():
# retrieve code mappings in .Rdata format
targets::tar_load(all_lkps_maps)
# each item in the list is a sheet in the UKB Excel workbook (resource 592)
names(all_lkps_maps)
#> [1] "bnf_lkp" "dmd_lkp"
#> [3] "icd9_lkp" "icd10_lkp"
#> [5] "icd9_icd10" "read_v2_lkp"
#> [7] "read_v2_drugs_lkp" "read_v2_drugs_bnf"
#> [9] "read_v2_icd9" "read_v2_icd10"
#> [11] "read_v2_opcs4" "read_v2_read_ctv3"
#> [13] "read_ctv3_lkp" "read_ctv3_icd9"
#> [15] "read_ctv3_icd10" "read_ctv3_opcs4"
#> [17] "read_ctv3_read_v2" "bnf_dmd"
#> [19] "opcs4_lkp" "self_report_cancer"
#> [21] "self_report_medication" "self_report_operation"
#> [23] "self_report_non_cancer" "self_report_med_to_atc_map"
#> [25] "read_ctv3_sct" "phecode_lkp"
#> [27] "icd10_phecode" "icd9_phecode"
To look up details for a particular code, use lookup_codes(). Setting preferred_description_only to TRUE returns only the preferred description for each code when synonyms are present (Read 2 and Read 3 may include multiple descriptions for the same code):
# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")
# lookup details
lookup_codes(codes = t1dm_read2,
code_type = "read2",
all_lkps_maps = all_lkps_maps,
preferred_description_only = TRUE)
#> # A tibble: 2 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 C108. Insulin dependent diabetes mellitus read2
#> 2 C10E. Type 1 diabetes mellitus read2
By default, the output is standardised to produce the columns shown above. To output the original formatting from UK Biobank resource 592, set standardise_output to FALSE:
# lookup details
lookup_codes(codes = t1dm_read2,
code_type = "read2",
all_lkps_maps = all_lkps_maps,
preferred_description_only = TRUE,
standardise_output = FALSE)
#> # A tibble: 2 × 3
#> read_code term_code term_description
#> <chr> <chr> <chr>
#> 1 C108. 00 Insulin dependent diabetes mellitus
#> 2 C10E. 00 Type 1 diabetes mellitus
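The effect of preferred_description_only can be pictured with a toy table. This is a minimal base R sketch, not codemapper's implementation: the data frame `toy_read2_lkp` and the helper `toy_lookup()` are hypothetical, the synonym row is made up for illustration, and the sketch assumes that the preferred Read 2 description is the row with `term_code` "00" (as in the output above).

```r
# Toy lookup table mimicking the read_v2_lkp columns shown above.
# The row with term_code "11" is invented purely for illustration.
toy_read2_lkp <- data.frame(
  read_code = c("C108.", "C108.", "C10E."),
  term_code = c("00", "11", "00"),
  term_description = c(
    "Insulin dependent diabetes mellitus",
    "Illustrative synonym",
    "Type 1 diabetes mellitus"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows for the requested codes and, optionally,
# only the rows assumed to hold the preferred description (term_code "00").
toy_lookup <- function(codes, lkp, preferred_description_only = TRUE) {
  out <- lkp[lkp$read_code %in% codes, , drop = FALSE]
  if (preferred_description_only) {
    out <- out[out$term_code == "00", , drop = FALSE]
  }
  rownames(out) <- NULL
  out
}

toy_result <- toy_lookup(c("C10E.", "C108."), toy_read2_lkp)
print(toy_result)
```

Calling `toy_lookup()` with `preferred_description_only = FALSE` keeps the synonym row as well, which is the behaviour to expect when several descriptions exist for one code.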
To map codes from one coding system to another, use map_codes(). For example, to map codes for type 1 diabetes from Read 2 to Read 3 (note that both Read 2 codes, C10E. and C108., map to a single Read 3 code, X40J4):
# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")
# map to Read 3
map_codes(codes = t1dm_read2,
from = "read2",
to = "read3",
all_lkps_maps = all_lkps_maps)
#> # A tibble: 5 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 X40J4 Type I diabetes mellitus read3
#> 2 X40J4 Type 1 diabetes mellitus read3
#> 3 X40J4 IDDM - Insulin-dependent diabetes mellitus read3
#> 4 X40J4 Juvenile onset diabetes mellitus read3
#> 5 X40J4 Insulin-dependent diabetes mellitus read3
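Conceptually, a mapping of this kind is a filter on a mapping table followed by de-duplication of the target codes. The following base R sketch illustrates that idea only and is not how map_codes() is implemented: `toy_read_v2_read_ctv3` and `toy_map()` are hypothetical, and `READV2_CODE` is an assumed column name (only the `V3` columns are shown elsewhere in this vignette).

```r
# Toy Read 2 to Read 3 mapping table. READV2_CODE is an assumed column name.
toy_read_v2_read_ctv3 <- data.frame(
  READV2_CODE = c("C10E.", "C108."),
  READV3_CODE = c("X40J4", "X40J4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: select rows whose source code is requested, then
# return the unique target codes.
toy_map <- function(codes, map_tbl, from_col, to_col) {
  hits <- map_tbl[map_tbl[[from_col]] %in% codes, to_col]
  unique(hits)
}

toy_mapped <- toy_map(
  codes = c("C10E.", "C108."),
  map_tbl = toy_read_v2_read_ctv3,
  from_col = "READV2_CODE",
  to_col = "READV3_CODE"
)
print(toy_mapped)
```

Both source codes collapse to one target code here, which is why the real output above lists a single Read 3 code with several descriptions.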
Note that, with map_codes(), preferred_description_only cannot be TRUE when standardise_output is FALSE (this raises an error), because some codes may otherwise be ‘lost’ in the mapping process. When standardise_output is TRUE, the mapped codes are passed on to lookup_codes(), at which point only the preferred code descriptions can be requested:
# mapping the Read 2 code "D4104" to Read 3 only returns the secondary Read 3
# description (`TERMV3_TYPE` == "S"), unlike for "D4103".
map_codes(codes = c("D4103", "D4104"),
from = "read2",
to = "read3",
all_lkps_maps = all_lkps_maps,
codes_only = FALSE,
preferred_description_only = FALSE,
standardise_output = FALSE) %>%
dplyr::select(tidyselect::contains("V3"))
#> # A tibble: 2 × 4
#> READV3_CODE TERMV3_CODE TERMV3_TYPE TERMV3_DESC
#> <chr> <chr> <chr> <chr>
#> 1 D4103 Y20e3 P Polycythaemia due to cyanotic respiratory…
#> 2 D4104 Y20e6 S Renal polycythaemia
# if `standardise_output` is `TRUE`, then `preferred_description_only` may also
# be set to `TRUE`
map_codes(codes = c("D4103", "D4104"),
from = "read2",
to = "read3",
all_lkps_maps = all_lkps_maps,
codes_only = FALSE,
preferred_description_only = TRUE,
standardise_output = TRUE)
#> # A tibble: 2 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 D4103 Polycythaemia due to cyanotic respiratory disease read3
#> 2 D4104 Secondary polycythaemia with excess erythropoietin read3
Mapping to ICD is more problematic as some results are a range of ICD codes (note also that for ICD-10, the mapping sheets use an alternative code format which removes any “.” characters):
map_codes(codes = t1dm_read2,
from = "read2",
to = "icd10",
all_lkps_maps = all_lkps_maps,
codes_only = TRUE,
standardise_output = FALSE)
#> [1] "E100-E109"
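If individual codes are needed, a range such as the one above could be expanded by hand. The helper below is a hypothetical base R sketch, not part of codemapper. It assumes that both ends of the range share the same leading letter, that the remaining characters are digits of equal length, and that the range is in the dot-free format; ranges that break these assumptions trigger an error.

```r
# Hypothetical helper: expand a range like "E100-E109" into individual codes.
expand_alt_code_range <- function(code_range) {
  ends <- strsplit(code_range, "-", fixed = TRUE)[[1]]
  stopifnot(length(ends) == 2)

  prefix <- substr(ends, 1, 1)            # leading letter of each end
  digits <- substr(ends, 2, nchar(ends))  # numeric part of each end
  stopifnot(
    prefix[1] == prefix[2],
    all(grepl("^[0-9]+$", digits)),
    nchar(digits[1]) == nchar(digits[2])
  )

  values <- seq(as.integer(digits[1]), as.integer(digits[2]))
  # Zero-pad so that, e.g., "A000-A009" keeps its leading zeros
  paste0(prefix[1], formatC(values, width = nchar(digits[1]), flag = "0"))
}

expanded <- expand_alt_code_range("E100-E109")
print(expanded)
```

Not every expanded value is guaranteed to be a valid ICD-10 code, so in practice the result would probably need to be intersected with the codes present in the ICD-10 lookup table.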
The available mappings do not cover all possible mapping directions. For example, while there are mappings for Read 2 to ICD-10, there is no mapping for ICD-10 to Read 2. For cases like this, map_codes() will attempt to map anyway by using the same mapping sheet in reverse (e.g. for mapping ICD-10 to Read 2, map_codes() uses the read_v2_icd10 mapping sheet). However, this returns no results when attempting to map the ICD-10 code H36.0 (diabetic retinopathy):
# find ICD-10 code matching "diabetic retinopathy"
icd10_diabetic_retinopathy <-
code_descriptions_like(
reg_expr = "diabetic retinopathy",
code_type = "icd10",
all_lkps_maps = all_lkps_maps,
ignore_case = TRUE,
codes_only = TRUE
)
# attempting to map this to Read 2 returns a NULL result however
map_codes(
codes = icd10_diabetic_retinopathy,
from = "icd10",
to = "read2",
all_lkps_maps = all_lkps_maps,
standardise_output = FALSE,
codes_only = TRUE
)
#> Warning in check_mapping_args(from = from, to = to): Warning! No mapping sheet
#> available for this request. Attempting to map anyway using: read_v2_icd10
#>
#> No codes found after mapping. Returning `NULL`
#> NULL
Inspecting the mapping sheet read_v2_icd10 shows why. In the matching rows, the icd10_code column either carries a suffix letter (“H360A”) or holds 2 ICD-10 codes in a single value (e.g. “E103D H360A”), so no entry is exactly “H360”:
all_lkps_maps$read_v2_icd10 %>%
dplyr::filter(stringr::str_detect(icd10_code, pattern = "H360"))
#> # A tibble: 17 × 3
#> read_code icd10_code icd10_code_def
#> <chr> <chr> <chr>
#> 1 C1087 E103D H360A 7
#> 2 C1096 E113D H360A 7
#> 3 C10E7 E103D H360A 7
#> 4 C10EP E103D H360A 7
#> 5 C10F6 E113D H360A 7
#> 6 C10FQ E113D H360A 7
#> 7 F420. E143D H360A 7
#> 8 F4200 E143D H360A 7
#> 9 F4201 E143D H360A 7
#> 10 F4202 E143D H360A 7
#> 11 F4203 E143D H360A 7
#> 12 F4204 E143D H360A 7
#> 13 F4205 H360A 5
#> 14 F4206 E143D H360A 7
#> 15 F4207 E143D H360A 7
#> 16 F4208 E143D H360A 7
#> 17 F420z E143D H360A 7
Note: “H36.0” is converted to “H360” by map_codes internally, as this is the format used by the mapping sheets.
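One plausible reading of the NULL result is that an exact comparison against “H360” cannot match values such as “H360A” or “E103D H360A”. The base R sketch below contrasts exact matching with a looser, token-wise prefix match on a toy copy of two rows from the output above. The helpers `to_alt_format()` and `rows_matching_icd10()` are hypothetical, and whether map_codes() matches exactly is an assumption.

```r
# Two rows copied from the read_v2_icd10 output above
toy_read_v2_icd10 <- data.frame(
  read_code = c("C1087", "F4205"),
  icd10_code = c("E103D H360A", "H360A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop "." to get the dot-free format ("H36.0" -> "H360")
to_alt_format <- function(icd10) gsub(".", "", icd10, fixed = TRUE)

# Hypothetical helper: split each icd10_code value on whitespace and keep
# rows where any token starts with the requested code.
rows_matching_icd10 <- function(icd10, map_tbl) {
  target <- to_alt_format(icd10)
  tokens <- strsplit(map_tbl$icd10_code, "[[:space:]]+")
  keep <- vapply(tokens, function(tok) any(startsWith(tok, target)), logical(1))
  map_tbl[keep, , drop = FALSE]
}

# Exact comparison finds nothing ...
exact <- toy_read_v2_icd10[toy_read_v2_icd10$icd10_code == to_alt_format("H36.0"), ]
# ... whereas token-wise prefix matching finds both rows
loose <- rows_matching_icd10("H36.0", toy_read_v2_icd10)
print(loose$read_code)
```

Prefix matching is deliberately permissive and could also pick up unrelated longer codes, so any results would need manual review.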
To find codes whose description matches a regular expression, use code_descriptions_like(). For example, to find Read 2 codes that match the description ‘diabetic retinopathy’:
code_descriptions_like(reg_expr = "diabetic retinopathy",
code_type = "read2",
all_lkps_maps = all_lkps_maps,
ignore_case = TRUE,
codes_only = FALSE,
preferred_description_only = TRUE) %>%
head()
#> # A tibble: 6 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 2BBJ. O/E - no right diabetic retinopathy read2
#> 2 2BBk. O/E - right eye stable treated proliferative diabetic retinop… read2
#> 3 2BBK. O/E - no left diabetic retinopathy read2
#> 4 2BBl. O/E - left eye stable treated proliferative diabetic retinopa… read2
#> 5 2BBo. O/E - sight threatening diabetic retinopathy read2
#> 6 2BBP. O/E - right eye background diabetic retinopathy read2
To get child codes, use codes_starting_with(). This returns all unique clinical codes that start with the codes of interest:
# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")
# get child codes
codes_starting_with(codes = t1dm_read2,
code_type = "read2",
all_lkps_maps = all_lkps_maps,
codes_only = TRUE,
standardise_output = FALSE)
#> [1] "C108." "C10E."
Note: Some coding systems include a ‘.’ (e.g. ICD-10). This may return unexpected results with codes_starting_with(), as this function searches using regular expressions, in which ‘.’ is interpreted as a wildcard.
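The wildcard behaviour is easy to reproduce in base R, independently of codemapper. In the sketch below the code strings are illustrative only ("E1000" is not a real code), and `startsWith()` and an escaped pattern are shown as two general ways to match a literal ‘.’.

```r
# Illustrative strings only; "E1000" is not a real code
codes <- c("E10.0", "E1000", "E11.0")

# Unescaped "." matches any single character, so "E1000" is also returned
regex_hits <- codes[grepl("^E10.", codes)]

# Literal prefix matching with startsWith() avoids regular expressions
literal_hits <- codes[startsWith(codes, "E10.")]

# Escaping the "." gives the same result with a regular expression
escaped_hits <- codes[grepl("^E10\\.", codes)]

print(regex_hits)
print(literal_hits)
print(escaped_hits)
```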
By default, a character vector of codes is returned. To return a data frame including code descriptions, set the argument codes_only to FALSE:
# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")
# get child codes, with descriptions
codes_starting_with(codes = t1dm_read2,
code_type = "read2",
all_lkps_maps = all_lkps_maps,
codes_only = FALSE) %>%
head()
#> # A tibble: 2 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 C108. Insulin dependent diabetes mellitus read2
#> 2 C10E. Type 1 diabetes mellitus read2
ICD-10 codes are presented in two different formats in the icd10_lkp table: ICD_10 and ALT_CODE. The latter is how ICD-10 codes are recorded in UKB: the ‘.’ is removed, and an ‘X’ is appended to 3 character codes without any 4 character children (e.g. A38X, ‘Scarlet fever’).
Another issue with the ICD_10 format is that it contains some duplicated codes, e.g. I70.0 appears 3 times. This is because of MODIFIER_5 - the ALT_CODE format records a different code for each of these.
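The relationship between the two formats can be sketched in base R. The helper `to_alt_code()` below is hypothetical and only covers the two rules stated above (remove the ‘.’, and append ‘X’ to 3 character codes with no longer child in the supplied vector). It does not handle the MODIFIER_5 duplicates, and the input vector is a small illustrative subset rather than the full lookup table.

```r
# Illustrative subset of ICD-10 codes in ICD_10 format
toy_icd10 <- c("A38", "E10", "E10.0", "E10.1")

# Hypothetical helper approximating the ALT_CODE format
to_alt_code <- function(icd10) {
  stripped <- gsub(".", "", icd10, fixed = TRUE)

  # Does each code have a longer code in the vector that starts with it?
  has_child <- vapply(
    stripped,
    function(parent) any(nchar(stripped) > nchar(parent) & startsWith(stripped, parent)),
    logical(1)
  )

  # Append "X" only to 3 character codes without children
  needs_x <- nchar(stripped) == 3 & !has_child
  unname(ifelse(needs_x, paste0(stripped, "X"), stripped))
}

alt <- to_alt_code(toy_icd10)
print(alt)
```

Because the child check depends on which codes are supplied, the helper only gives sensible results when run on a complete set of codes.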
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Monterey 12.2
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices datasets utils methods base
#>
#> other attached packages:
#> [1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
#> [4] purrr_0.3.4 readr_2.0.2 tidyr_1.1.4
#> [7] tibble_3.1.4 ggplot2_3.3.5 tidyverse_1.3.1
#> [10] codemapper_0.0.0.9000 ukbwranglr_0.0.0.9000 workflowr_1.6.2
#>
#> loaded via a namespace (and not attached):
#> [1] httr_1.4.2 sass_0.4.0 jsonlite_1.7.2 modelr_0.1.8
#> [5] bslib_0.3.0 shiny_1.7.0 assertthat_0.2.1 highr_0.9
#> [9] renv_0.13.2 cellranger_1.1.0 yaml_2.2.1 pillar_1.6.3
#> [13] backports_1.2.1 glue_1.4.2 digest_0.6.28 promises_1.2.0.1
#> [17] rvest_1.0.1 colorspace_2.0-2 htmltools_0.5.2 httpuv_1.6.3
#> [21] pkgconfig_2.0.3 broom_0.7.9 haven_2.4.3 xtable_1.8-4
#> [25] scales_1.1.1 processx_3.5.2 whisker_0.4 later_1.3.0
#> [29] tzdb_0.1.2 git2r_0.28.0 generics_0.1.0 ellipsis_0.3.2
#> [33] withr_2.4.2 cli_3.0.1 magrittr_2.0.1 crayon_1.4.1
#> [37] readxl_1.3.1 mime_0.12 evaluate_0.14 ps_1.6.0
#> [41] fs_1.5.0 fansi_0.5.0 xml2_1.3.2 tools_4.1.2
#> [45] data.table_1.14.2 hms_1.1.1 lifecycle_1.0.1 munsell_0.5.0
#> [49] reprex_2.0.1 targets_0.8.0 callr_3.7.0 compiler_4.1.2
#> [53] jquerylib_0.1.4 rlang_0.4.11 grid_4.1.2 rstudioapi_0.13
#> [57] igraph_1.2.6 rmarkdown_2.11 gtable_0.3.0 codetools_0.2-18
#> [61] DBI_1.1.1 R6_2.5.1 lubridate_1.7.10 knitr_1.34
#> [65] fastmap_1.1.0 utf8_1.2.2 rprojroot_2.0.2 stringi_1.7.4
#> [69] Rcpp_1.0.7 vctrs_0.3.8 dbplyr_2.1.1 tidyselect_1.1.1
#> [73] xfun_0.24