Last updated: 2021-10-05
Knit directory: codemapper/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20210923) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version d285b07. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: R/.DS_Store
Ignored: _targets/
Ignored: config.ini
Ignored: data/
Ignored: output/
Ignored: renv/library/
Ignored: renv/local/
Ignored: renv/staging/
Ignored: tests/.DS_Store
Untracked files:
Untracked: R/codemapper.R
Untracked: man/codemapper.Rd
Unstaged changes:
Modified: DESCRIPTION
Modified: R/clinical_codes.R
Modified: R/constants.R
Modified: R/utils.R
Modified: _targets.R
Modified: analysis/clinical_codes_lkps_and_mappings.Rmd
Modified: analysis/index.Rmd
Modified: man/get_child_codes.Rd
Modified: man/lookup_codes.Rd
Modified: man/map_codes.Rd
Modified: man/reformat_icd10_codes.Rd
Modified: man/search_codes_by_description.Rd
Modified: tests/testthat/test_constants.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/clinical_codes_lkps_and_mappings.Rmd) and HTML (public/clinical_codes_lkps_and_mappings.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | d285b07 | rmgpanw | 2021-09-29 | add analysis/reformat_all_lkps_maps.Rmd to _targets.R; start analysis/clinical_codes_lkps_and_mappings.Rmd |
library(codemapper)
#> Loading required package: ukbwranglr
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
#> ✓ tibble 3.1.4 ✓ dplyr 1.0.7
#> ✓ tidyr 1.1.4 ✓ stringr 1.4.0
#> ✓ readr 2.0.2 ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
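This vignette describes how to work with clinical codes using codemapper. Specifically, it covers looking up the details for a code, mapping codes from one coding system to another, searching for codes by description, and retrieving child codes.
The functions provided by codemapper rely on two sources: UK Biobank resource 592, which includes an Excel workbook of lookup and mapping tables, and the NHSBSA BNF to SNOMED code mapping file. These tables have been converted into a named list of data frames, which can be retrieved as follows (the example calls the internal function get_all_lkps_maps()):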
# retrieve code mappings in .Rdata format
all_lkps_maps <- codemapper:::get_all_lkps_maps()
# each item in the list is a sheet in the UKB Excel workbook (resource 592)
names(all_lkps_maps)
#> [1] "bnf_lkp" "dmd_lkp" "icd9_lkp"
#> [4] "icd10_lkp" "icd9_icd10" "read_v2_lkp"
#> [7] "read_v2_drugs_lkp" "read_v2_drugs_bnf" "read_v2_icd9"
#> [10] "read_v2_icd10" "read_v2_opcs4" "read_v2_read_ctv3"
#> [13] "read_ctv3_lkp" "read_ctv3_icd9" "read_ctv3_icd10"
#> [16] "read_ctv3_opcs4" "read_ctv3_read_v2" "bnf_dmd"
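Because the result is an ordinary named list of data frames, base R is enough to explore it. The sketch below uses a small made-up list (`toy_lkps_maps`, a stand-in for the real `all_lkps_maps`) to show one way to extract a single table and check the size of each one. The column name `READV2_CODE` and the rows themselves are illustrative assumptions rather than a copy of resource 592.

```r
# Toy stand-in for `all_lkps_maps`; rows and some column names are illustrative
toy_lkps_maps <- list(
  read_v2_lkp = data.frame(
    read_code = c("C108.", "C10E."),
    term_code = c("00", "00"),
    term_description = c("Insulin dependent diabetes mellitus",
                         "Type 1 diabetes mellitus"),
    stringsAsFactors = FALSE
  ),
  read_v2_read_ctv3 = data.frame(
    READV2_CODE = c("C108.", "C10E."),  # assumed column name
    READV3_CODE = c("X40J4", "X40J4"),
    stringsAsFactors = FALSE
  )
)

# extract one table by name
read_v2_lkp <- toy_lkps_maps[["read_v2_lkp"]]

# number of rows in each table, returned as a named integer vector
table_sizes <- vapply(toy_lkps_maps, nrow, integer(1))
print(table_sizes)
```

The same two calls should work unchanged on the real list, since they rely only on it being a named list of data frames.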
To look up details for a particular code, use lookup_codes(). Setting preferred_description_only to TRUE returns only the preferred description for each code when synonyms are present (Read 2 and Read 3 may include multiple descriptions for the same code):
# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")
# lookup details
lookup_codes(codes = t1dm_read2,
             code_type = "read2",
             all_lkps_maps = all_lkps_maps,
             preferred_description_only = TRUE)
#> # A tibble: 2 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 C108. Insulin dependent diabetes mellitus read2
#> 2 C10E. Type 1 diabetes mellitus read2
By default, the output is standardised to produce the columns shown above. To output the original formatting from UK Biobank resource 592, set standardise_output to FALSE:
# lookup details
lookup_codes(codes = t1dm_read2,
             code_type = "read2",
             all_lkps_maps = all_lkps_maps,
             preferred_description_only = TRUE,
             standardise_output = FALSE)
#> # A tibble: 2 × 3
#> read_code term_code term_description
#> <chr> <chr> <chr>
#> 1 C108. 00 Insulin dependent diabetes mellitus
#> 2 C10E. 00 Type 1 diabetes mellitus
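To make the two options concrete, here is a minimal base R sketch of what filtering to preferred descriptions and standardising the columns could look like. It is not codemapper's implementation. It assumes that `term_code` "00" marks the preferred Read 2 term (consistent with the output above), and the synonym rows in `toy_read2_lkp` are made up for illustration.

```r
# Toy Read 2 lookup table; the "11" synonym rows are illustrative
toy_read2_lkp <- data.frame(
  read_code = c("C108.", "C108.", "C10E.", "C10E."),
  term_code = c("00", "11", "00", "11"),
  term_description = c("Insulin dependent diabetes mellitus",
                       "IDDM-Insulin dependent diabetes mellitus",
                       "Type 1 diabetes mellitus",
                       "Type I diabetes mellitus"),
  stringsAsFactors = FALSE
)

# keep only the preferred description for each code
# (assumption: term_code "00" is the preferred term)
preferred <- toy_read2_lkp[toy_read2_lkp$term_code == "00", ]

# rename to the standardised column layout shown earlier
standardised <- data.frame(
  code = preferred$read_code,
  description = preferred$term_description,
  code_type = "read2",
  stringsAsFactors = FALSE
)
print(standardised)
```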
To map codes from one coding system to another, use map_codes(). For example, to map codes for type 1 diabetes from Read 2 to Read 3 (note that both Read 2 codes, C10E. and C108., map to a single Read 3 code, X40J4):
# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")
# map to Read 3
map_codes(codes = t1dm_read2,
          from = "read2",
          to = "read3",
          all_lkps_maps = all_lkps_maps)
#> # A tibble: 5 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 X40J4 Type I diabetes mellitus read3
#> 2 X40J4 Type 1 diabetes mellitus read3
#> 3 X40J4 IDDM - Insulin-dependent diabetes mellitus read3
#> 4 X40J4 Juvenile onset diabetes mellitus read3
#> 5 X40J4 Insulin-dependent diabetes mellitus read3
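The shape of this result (two input codes, one mapped code, several description rows) can be reproduced with a toy mapping table. The sketch below is only an illustration of a many-to-one mapping followed by a lookup. The tables `toy_read2_read3` and `toy_read3_lkp`, their column names, and the extra `C109.`/`X40J5` rows are assumptions, not the contents of resource 592.

```r
# Toy Read 2 -> Read 3 mapping; the third row is illustrative padding
toy_read2_read3 <- data.frame(
  read2 = c("C108.", "C10E.", "C109."),
  read3 = c("X40J4", "X40J4", "X40J5"),
  stringsAsFactors = FALSE
)

# Toy Read 3 lookup with two descriptions for the same code
toy_read3_lkp <- data.frame(
  code = c("X40J4", "X40J4", "X40J5"),
  description = c("Type I diabetes mellitus",
                  "Type 1 diabetes mellitus",
                  "Type II diabetes mellitus"),
  stringsAsFactors = FALSE
)

t1dm_read2 <- c("C10E.", "C108.")

# step 1: map, then de-duplicate (both inputs map to the same Read 3 code)
mapped_codes <- unique(
  toy_read2_read3$read3[toy_read2_read3$read2 %in% t1dm_read2]
)

# step 2: look up every description for the mapped codes
mapped_details <- toy_read3_lkp[toy_read3_lkp$code %in% mapped_codes, ]
print(mapped_details)
```

Two input codes collapse to one mapped code, which then expands again to one row per description.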
Note that with map_codes(), preferred_description_only cannot be TRUE when standardise_output is FALSE. This combination raises an error, because some codes may otherwise be ‘lost’ in the mapping process. When standardise_output is TRUE, the mapped codes from map_codes() are passed on to lookup_codes(), at which point one can request only the preferred code descriptions:
# mapping the Read 2 code "D4104" to Read 3 only returns the secondary Read 3
# description (`TERMV3_TYPE` == "S"), unlike for "D4103".
map_codes(codes = c("D4103", "D4104"),
          from = "read2",
          to = "read3",
          all_lkps_maps = all_lkps_maps,
          codes_only = FALSE,
          preferred_description_only = FALSE,
          standardise_output = FALSE) %>%
  dplyr::select(tidyselect::contains("V3"))
#> # A tibble: 2 × 4
#> READV3_CODE TERMV3_CODE TERMV3_TYPE TERMV3_DESC
#> <chr> <chr> <chr> <chr>
#> 1 D4103 Y20e3 P Polycythaemia due to cyanotic respiratory…
#> 2 D4104 Y20e6 S Renal polycythaemia
# if `standardise_output` is `TRUE`, then `preferred_description_only` may also
# be set to `TRUE`
map_codes(codes = c("D4103", "D4104"),
          from = "read2",
          to = "read3",
          all_lkps_maps = all_lkps_maps,
          codes_only = FALSE,
          preferred_description_only = TRUE,
          standardise_output = TRUE)
#> # A tibble: 2 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 D4103 Polycythaemia due to cyanotic respiratory disease read3
#> 2 D4104 Secondary polycythaemia with excess erythropoietin read3
Mapping to ICD is more problematic, as some results are a range of ICD codes rather than a single code (note also that for ICD-10, the mapping sheets use an alternative code format that removes any “.” characters):
map_codes(codes = t1dm_read2,
          from = "read2",
          to = "icd10",
          all_lkps_maps = all_lkps_maps,
          codes_only = TRUE,
          standardise_output = FALSE)
#> [1] "E100-E109"
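A range such as "E100-E109" usually needs expanding before it can be matched against individual codes. The helper below, `expand_icd10_range()`, is hypothetical and not part of codemapper. It is a minimal sketch that assumes both ends of the range share the same leading letter and have the same number of digits, and it does not check that every generated value is a valid ICD-10 code.

```r
# Hypothetical helper (not part of codemapper): expand a simple ICD-10 range
# written in the "alternative" format, e.g. "E100-E109"
expand_icd10_range <- function(x) {
  ends <- strsplit(x, "-", fixed = TRUE)[[1]]

  # a single code is returned unchanged
  if (length(ends) == 1) {
    return(ends)
  }
  stopifnot(length(ends) == 2)

  # assumption: one leading letter, shared by both ends of the range
  prefix <- substr(ends, 1, 1)
  stopifnot(prefix[1] == prefix[2])

  # assumption: the remainder is purely numeric and of equal width
  digits <- substr(ends, 2, nchar(ends))
  stopifnot(nchar(digits[1]) == nchar(digits[2]),
            !grepl("[^0-9]", digits))

  values <- seq(as.integer(digits[1]), as.integer(digits[2]))
  paste0(prefix[1], formatC(values, width = nchar(digits[1]), flag = "0"))
}

expanded <- expand_icd10_range("E100-E109")
print(expanded)
```

In practice one would probably filter the expanded values against the `icd10_lkp` table, since a numeric range can include values that are not real codes.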
The available mappings do not cover all possible mapping directions. For example, while there is a mapping from Read 2 to ICD-10, there is none from ICD-10 to Read 2. In cases like this, map_codes() will attempt to map anyway by using the same mapping sheet in reverse (e.g. for mapping ICD-10 to Read 2, map_codes() uses the read_v2_icd10 mapping sheet). However, this returns no results when attempting to map the ICD-10 code for diabetic retinopathy, H36.0:
# find ICD-10 code matching "diabetic retinopathy"
icd10_diabetic_retinopathy <-
  search_codes_by_description(
    reg_expr = "diabetic retinopathy",
    code_type = "icd10",
    all_lkps_maps = all_lkps_maps,
    ignore_case = TRUE,
    codes_only = TRUE
  )
# attempting to map this to Read 2 returns a NULL result however
map_codes(
  codes = icd10_diabetic_retinopathy,
  from = "icd10",
  to = "read2",
  all_lkps_maps = all_lkps_maps,
  standardise_output = FALSE,
  codes_only = TRUE
)
#> Warning in map_codes(codes = icd10_diabetic_retinopathy, from = "icd10", :
#> Warning! No mapping sheet available for this request. Attempting to map anyway
#> using: read_v2_icd10
#>
#> No codes found after mapping. Returning `NULL`
#> NULL
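Inspecting the mapping sheet read_v2_icd10 shows why. In most of the matching rows, the icd10_code column holds two ICD-10 codes in a single string (for example “E103D H360A”), both of which describe diabetic retinopathy, so “H360” does not appear as an entry on its own in any of the rows shown:
all_lkps_maps$read_v2_icd10 %>%
  dplyr::filter(stringr::str_detect(icd10_code, pattern = "H360"))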
#> # A tibble: 17 × 3
#> read_code icd10_code icd10_code_def
#> <chr> <chr> <chr>
#> 1 C1087 E103D H360A 7
#> 2 C1096 E113D H360A 7
#> 3 C10E7 E103D H360A 7
#> 4 C10EP E103D H360A 7
#> 5 C10F6 E113D H360A 7
#> 6 C10FQ E113D H360A 7
#> 7 F420. E143D H360A 7
#> 8 F4200 E143D H360A 7
#> 9 F4201 E143D H360A 7
#> 10 F4202 E143D H360A 7
#> 11 F4203 E143D H360A 7
#> 12 F4204 E143D H360A 7
#> 13 F4205 H360A 5
#> 14 F4206 E143D H360A 7
#> 15 F4207 E143D H360A 7
#> 16 F4208 E143D H360A 7
#> 17 F420z E143D H360A 7
Note: “H36.0” is converted to “H360” by map_codes internally, as this is the format used by the mapping sheets.
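The difference between an exact match and a partial match on this column can be reproduced with a toy table. The sketch below assumes that a reverse lookup compares the reformatted code with whole entries of `icd10_code`, which the source does not state explicitly. The first two rows of `toy_read_v2_icd10` are copied from the output above and the third is illustrative.

```r
# Toy extract of the read_v2_icd10 sheet; the third row is illustrative
toy_read_v2_icd10 <- data.frame(
  read_code = c("C1087", "F4205", "C108."),
  icd10_code = c("E103D H360A", "H360A", "E100-E109"),
  stringsAsFactors = FALSE
)

# reformat "H36.0" to the alternative format by dropping the "."
icd10_alt <- gsub(".", "", "H36.0", fixed = TRUE)

# an exact comparison finds nothing, because no entry is exactly "H360"
exact_hits <- toy_read_v2_icd10$read_code[
  toy_read_v2_icd10$icd10_code == icd10_alt
]

# a fixed substring search does find the relevant rows
partial_hits <- toy_read_v2_icd10$read_code[
  grepl(icd10_alt, toy_read_v2_icd10$icd10_code, fixed = TRUE)
]

print(exact_hits)
print(partial_hits)
```

A substring search can also over-match (for instance a shorter code that is a prefix of unrelated codes), so any results would need checking against the code descriptions.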
To search for codes by their description, use search_codes_by_description(). For example, to find Read 2 codes that match the description ‘diabetic retinopathy’:
search_codes_by_description(reg_expr = "diabetic retinopathy",
                            code_type = "read2",
                            all_lkps_maps = all_lkps_maps,
                            ignore_case = TRUE,
                            codes_only = FALSE,
                            preferred_description_only = TRUE) %>%
  head()
#> # A tibble: 6 × 3
#> read_code term_code term_description
#> <chr> <chr> <chr>
#> 1 2BBJ. 00 O/E - no right diabetic retinopathy
#> 2 2BBk. 00 O/E - right eye stable treated proliferative diabetic ret…
#> 3 2BBK. 00 O/E - no left diabetic retinopathy
#> 4 2BBl. 00 O/E - left eye stable treated proliferative diabetic reti…
#> 5 2BBo. 00 O/E - sight threatening diabetic retinopathy
#> 6 2BBP. 00 O/E - right eye background diabetic retinopathy
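For intuition, a case-insensitive description search can be sketched in base R with `grepl()`. This is only an approximation of the idea and not how search_codes_by_description() is necessarily implemented. The table `toy_read2_lkp` is a made-up extract that reuses a few descriptions from the outputs above.

```r
# Toy Read 2 lookup table, reusing descriptions shown earlier in this vignette
toy_read2_lkp <- data.frame(
  read_code = c("2BBJ.", "2BBo.", "C10E."),
  term_description = c("O/E - no right diabetic retinopathy",
                       "O/E - sight threatening diabetic retinopathy",
                       "Type 1 diabetes mellitus"),
  stringsAsFactors = FALSE
)

# case-insensitive regular expression match on the description column
is_match <- grepl("Diabetic Retinopathy",
                  toy_read2_lkp$term_description,
                  ignore.case = TRUE)

# full rows (comparable to codes_only = FALSE) ...
matching_rows <- toy_read2_lkp[is_match, ]

# ... or just the codes (comparable to codes_only = TRUE)
matching_codes <- toy_read2_lkp$read_code[is_match]
print(matching_codes)
```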
To get child codes, use get_child_codes(). This returns all unique clinical codes that start with the codes of interest:
# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")
# get child codes
get_child_codes(codes = t1dm_read2,
                code_type = "read2",
                all_lkps_maps = all_lkps_maps,
                codes_only = TRUE,
                standardise_output = FALSE)
#> [1] "C108." "C1080" "C1081" "C1082" "C1083" "C1084" "C1085" "C1086" "C1087"
#> [10] "C1088" "C1089" "C108A" "C108B" "C108C" "C108D" "C108E" "C108F" "C108G"
#> [19] "C108H" "C108J" "C108y" "C108z" "C10E." "C10E0" "C10E1" "C10E2" "C10E3"
#> [28] "C10E4" "C10E5" "C10E6" "C10E7" "C10E8" "C10E9" "C10EA" "C10EB" "C10EC"
#> [37] "C10ED" "C10EE" "C10EF" "C10EG" "C10EH" "C10EJ" "C10EK" "C10EL" "C10EM"
#> [46] "C10EN" "C10EP" "C10EQ" "C10ER"
Note: Some coding systems include a ‘.’ in their codes (e.g. ICD-10). This may return unexpected results with get_child_codes(), as this function searches using regular expressions, in which ‘.’ is interpreted as a wildcard that matches any character.
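The wildcard behaviour is easy to demonstrate in base R. The sketch below uses a made-up vector of codes and compares an unescaped regular expression with two literal alternatives. It illustrates the pitfall only and makes no claim about how get_child_codes() builds its patterns.

```r
# Made-up codes: two genuine children of "E10.", plus two that are not
codes <- c("E10.0", "E10.1", "E1000", "E11.0")

# unescaped: "." matches any character, so "E1000" is also returned
regex_hits <- codes[grepl("^E10.", codes)]

# literal prefix match: only codes that really start with "E10."
literal_hits <- codes[startsWith(codes, "E10.")]

# alternatively, neutralise the "." by wrapping it in a character class
escaped_pattern <- paste0("^", gsub(".", "[.]", "E10.", fixed = TRUE))
escaped_hits <- codes[grepl(escaped_pattern, codes)]

print(regex_hits)
print(literal_hits)
print(escaped_hits)
```

For Read 2 codes such as "C108.", the trailing "." acting as a wildcard is what allows child codes like "C1080" to be matched, so the same behaviour is useful in one coding system and a hazard in another.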
By default, a character vector of codes is returned. To return a data frame including code descriptions, set the argument codes_only to FALSE:
# Some Read2 codes for T1DM
t1dm_read2 <- c("C10E.", "C108.")
# get child codes, including descriptions
get_child_codes(codes = t1dm_read2,
                code_type = "read2",
                all_lkps_maps = all_lkps_maps,
                codes_only = FALSE) %>%
  head()
#> # A tibble: 6 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 C108. Insulin dependent diabetes mellitus read2
#> 2 C1080 Insulin-dependent diabetes mellitus with renal complications read2
#> 3 C1081 Insulin-dependent diabetes mellitus with ophthalmic complicat… read2
#> 4 C1082 Insulin-dependent diabetes mellitus with neurological complic… read2
#> 5 C1083 Insulin dependent diabetes mellitus with multiple complicatio… read2
#> 6 C1084 Unstable insulin dependent diabetes mellitus read2
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices datasets utils methods base
#>
#> other attached packages:
#> [1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
#> [4] purrr_0.3.4 readr_2.0.2 tidyr_1.1.4
#> [7] tibble_3.1.4 ggplot2_3.3.5 tidyverse_1.3.1
#> [10] codemapper_0.0.0.9000 ukbwranglr_0.0.0.9000 workflowr_1.6.2
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.7 lubridate_1.7.10 ps_1.6.0 assertthat_0.2.1
#> [5] rprojroot_2.0.2 digest_0.6.28 utf8_1.2.2 R6_2.5.1
#> [9] cellranger_1.1.0 backports_1.2.1 reprex_2.0.1 evaluate_0.14
#> [13] httr_1.4.2 pillar_1.6.3 rlang_0.4.11 readxl_1.3.1
#> [17] rstudioapi_0.13 data.table_1.14.2 callr_3.7.0 whisker_0.4
#> [21] jquerylib_0.1.4 rmarkdown_2.9 igraph_1.2.6 munsell_0.5.0
#> [25] broom_0.7.9 compiler_4.1.0 httpuv_1.6.3 modelr_0.1.8
#> [29] xfun_0.24 pkgconfig_2.0.3 htmltools_0.5.2 tidyselect_1.1.1
#> [33] codetools_0.2-18 fansi_0.5.0 crayon_1.4.1 tzdb_0.1.2
#> [37] dbplyr_2.1.1 withr_2.4.2 later_1.3.0 grid_4.1.0
#> [41] jsonlite_1.7.2 gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.1
#> [45] git2r_0.28.0 magrittr_2.0.1 scales_1.1.1 cli_3.0.1
#> [49] stringi_1.7.4 renv_0.13.2 fs_1.5.0 promises_1.2.0.1
#> [53] xml2_1.3.2 bslib_0.3.0 targets_0.8.0 ellipsis_0.3.2
#> [57] generics_0.1.0 vctrs_0.3.8 tools_4.1.0 glue_1.4.2
#> [61] hms_1.1.1 processx_3.5.2 fastmap_1.1.0 yaml_2.2.1
#> [65] colorspace_2.0-2 rvest_1.0.1 knitr_1.34 haven_2.4.3
#> [69] sass_0.4.0