Last updated: 2020-05-15

Checks: 4 passed, 3 warnings

Knit directory: SolFaMi_bioinfo/

This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: objects present

The global environment had objects present when the code in the R Markdown file was run. These objects can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment. Use wflow_publish or wflow_build to ensure that the code is always run in an empty environment.

The following objects were defined in the global environment when these results were created:

Name	Class	Size
add_dna_to_phyloseq	function	7.6 Kb
asv2otu	function	23.7 Kb
dada_to_phyloseq	function	1.8 Kb
drake_envir	environment	56 bytes
funguild_from_taxo	function	27.1 Kb
list_fastq_files	function	10.2 Kb
plan	drake_plan;tbl_df;tbl;data.frame	21.1 Kb
quality_filter_fastp	function	19.7 Kb
quality_report	function	336 bytes
remove_chimera	function	2.5 Kb
rename_sample	function	6.7 Kb
sample_data_matching	function	3.2 Kb
taxo_from_seqtab	function	2.5 Kb
track_wkflow	function	57.6 Kb
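
A table like the one above can be approximated with base R. The following is only a minimal sketch: `summarise_env` is a hypothetical helper written for illustration, not the function workflowr uses internally, and the two demo objects merely reuse names from the table.

```r
# Hypothetical helper: list the objects of an environment with their
# class and approximate memory size (not part of workflowr).
summarise_env <- function(env = globalenv()) {
  nms <- ls(envir = env)
  data.frame(
    Name = nms,
    Class = vapply(
      nms,
      function(n) paste(class(get(n, envir = env)), collapse = ";"),
      character(1),
      USE.NAMES = FALSE
    ),
    Size = vapply(
      nms,
      function(n) format(utils::object.size(get(n, envir = env)), units = "auto"),
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

# Demonstration on a throwaway environment rather than the real global one
demo_env <- new.env()
assign("plan", data.frame(target = "a"), envir = demo_env)
assign("quality_report", function(x) x, envir = demo_env)
env_summary <- summarise_env(demo_env)
print(env_summary)
```

Calling `summarise_env()` with no argument before knitting would show which objects a non-empty session could leak into the analysis.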

Seed: set.seed(20200401)

The command set.seed(20200401) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
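
As a small illustration of why the seed matters, the sketch below draws the same random sample twice after resetting the seed. The sampling call is an arbitrary example and is not taken from the analysis itself.

```r
# Resetting the seed before each draw makes the random sample repeatable
set.seed(20200401)
first_draw <- sample(1:100, 5)

set.seed(20200401)
second_draw <- sample(1:100, 5)

# Both draws contain exactly the same values in the same order
print(identical(first_draw, second_draw))
```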

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: detected

The following chunks had caches available:

session-info-chunk-inserted-by-workflowr
unnamed-chunk-1
unnamed-chunk-2
unnamed-chunk-3
unnamed-chunk-4

To ensure reproducibility of the results, delete the cache directory Bioinfo_cache and re-run the analysis. To have workflowr automatically delete the cache directory prior to building the file, set delete_cache = TRUE when running wflow_build() or wflow_publish().
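
Deleting the cache by hand can be sketched in base R as follows. `delete_cache_dir` is a hypothetical helper introduced only for this example, and the demonstration runs on a temporary directory so that nothing in the project is touched.

```r
# Hypothetical helper: remove a knitr cache directory if it exists.
# Returns TRUE when a directory was found and removed, FALSE otherwise.
delete_cache_dir <- function(path) {
  if (!dir.exists(path)) {
    return(FALSE)
  }
  unlink(path, recursive = TRUE)
  !dir.exists(path)
}

# Demonstration on a throwaway directory that mimics a cache
demo_cache <- file.path(tempdir(), "Bioinfo_cache")
dir.create(demo_cache, showWarnings = FALSE)
file.create(file.path(demo_cache, "chunk.rdb"))

removed <- delete_cache_dir(demo_cache)
print(removed)
```

In this project the cache appears under analysis/Bioinfo_cache/ in the Git status, so that is presumably the path to pass. Setting delete_cache = TRUE in wflow_build() or wflow_publish(), as described above, achieves the same result without a manual step.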

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: aab21dc

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version aab21dc. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .drake/config/
    Ignored:    .drake/data/
    Ignored:    .drake/drake/
    Ignored:    .drake/keys/
    Ignored:    .drake/scratch/
    Ignored:    analysis/Bioinfo_cache/
    Ignored:    data/raw_seq/
    Ignored:    output/filterAndTrim_fwd/
    Ignored:    output/filterAndTrim_rev/
    Ignored:    renv/library/
    Ignored:    renv/staging/

Unstaged changes:
    Modified:   .workInProgress.Rmd
    Modified:   analysis/Bioinfo.Rmd
    Modified:   analysis/_site.yml
    Modified:   code/plan.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Bioinfo.Rmd) and HTML (public/Bioinfo.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	08ab169	adrientaudiere	2020-05-15	Add report to drake plan and new physeq objects with re-clusterisation
Rmd	006a4e7	adrientaudiere	2020-05-12	Add _drake.R and plan.R allowing the use of r_make
Rmd	73a6a6a	adrientaudiere	2020-05-12	New drake plan
Rmd	be2d56f	adrientaudiere	2020-05-08	ammend plan_ligth and try a first plan with all files
Rmd	40fb4ef	adrientaudiere	2020-05-07	make a plan_ligth plan to test the transform argument of target function
Rmd	be05791	adrientaudiere	2020-05-07	Add target and use format = “qs” to save big object
Rmd	2b5702d	adrientaudiere	2020-05-06	First try with the entire dataset
Rmd	9ab379b	adrientaudiere	2020-05-06	Better plan
Rmd	7adc02c	adrientaudiere	2020-05-04	first drake version on a subset of fastq files
Rmd	6b478fe	adrientaudiere	2020-04-28	update nav links
html	6b478fe	adrientaudiere	2020-04-28	update nav links
html	a1f274a	adrientaudiere	2020-04-28	Build site.
Rmd	2a1f393	adrientaudiere	2020-04-28	workflowr::wflow_publish(all = T)
Rmd	b5a3cac	adrientaudiere	2020-04-28	Update package
html	f9f6bd8	adrientaudiere	2020-04-28	Build site.
Rmd	81e0125	adrientaudiere	2020-04-28	try publish 2
html	302cf0c	adrientaudiere	2020-04-28	Build site.
Rmd	cdde2f0	adrientaudiere	2020-04-28	Add analysis files

Bioinformatics pipeline

Load packages, functions and drake plan

The _drake.R script builds the drake plan after sourcing code/packages.R, code/functions_bioinfo.R and code/plan.R.

source("_drake.R")

* The lockfile is already up to date.

# r_make()  # need to run r_make if you change the plan

Warning: The above code chunk cached its results, but it won’t be re-run if previous chunks it depends on are updated. If you need to use caching, it is highly recommended to also set knitr::opts_chunk$set(autodep = TRUE) at the top of the file (in a chunk that is not cached). Alternatively, you can customize the option dependson for each individual chunk that is cached. Using either autodep or dependson will remove this warning. See the knitr cache options for more details.

Visualisation of the pipeline (drake plan)

vis_drake_graph(plan, envir = drake_envir)

Tracking sequences and samples across the pipeline

track <- track_wkflow(
  list( 
    "raw_data_fw" = normalizePath(readd(data)$fnfs),
    "raw_data_rev" = normalizePath(readd(data)$fnrs),
    "filter_data_fw" = normalizePath(
      list.files("output/filterAndTrim_fwd/",
                 full.names = TRUE)),
    "filter_data_rev" = normalizePath(
      list.files("output/filterAndTrim_rev/",
                 full.names = TRUE)),
    "derep_fs" = readd(derep_fs), 
    "derep_rs" = readd(derep_rs),
    "ddR" = readd(ddR), 
    "ddF" = readd(ddF), 
    "merged_seq" = readd(merged_seq),
    "seqtab_chim" = readd(seqtab_w_short_seq),
    "seqtab" = readd(seqtab),
    "data_phyloseq" = readd(data_phyloseq),
    "data_phyloseq_otu" = readd(data_phyloseq_otu),
    "data_phyloseq_otu_vsearch" = readd(data_phyloseq_otu_vsearch)
  )
)

Compute the number of sequences

Start object of class: character
Start object of class: character
Start object of class: character
Start object of class: character

Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list

Start object of class: matrix
Start object of class: matrix

Start object of class: phyloseq
Start object of class: phyloseq
Start object of class: phyloseq

Compute the number of clusters

Start object of class: character
Start object of class: character
Start object of class: character
Start object of class: character

Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list

Start object of class: matrix
Start object of class: matrix

Start object of class: phyloseq
Start object of class: phyloseq
Start object of class: phyloseq

Compute the number of samples

Start object of class: character
Start object of class: character
Start object of class: character
Start object of class: character

Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list

Start object of class: matrix
Start object of class: matrix

Start object of class: phyloseq
Start object of class: phyloseq
Start object of class: phyloseq

# Work on a copy so that the original `track` table stays untouched
track_formattable <- track
track_formattable$nb_singletons <- NA
# For the dereplicated reads, move the nb_clusters value into nb_singletons
track_formattable[c("derep_fs", "derep_rs"),]$nb_singletons <- 
  track_formattable[c("derep_fs", "derep_rs"),]$nb_clusters
track_formattable[c("derep_fs", "derep_rs"),]$nb_clusters <- NA

# Display missing values as empty cells
track_formattable[is.na(track_formattable)] <- ""


# Render the table with one colour scheme per column
formattable(track_formattable, 
            list(
              area(col = nb_sequences) ~ color_bar("cyan", na.rm = TRUE),
              area(col = nb_clusters) ~ normalize_bar("yellowgreen", na.rm = TRUE),
              area(col = nb_singletons) ~ color_tile("orange", "orange"),
              area(col = nb_samples) ~ color_tile("red", "pink")
            )
)

	nb_sequences	nb_clusters	nb_samples	nb_singletons
raw_data_fw	18929458		96
raw_data_rev	18929458		96
filter_data_fw	16595729		96
filter_data_rev	16595729		96
derep_fs	16595729		96	2608188
derep_rs	16595729		96	2081688
ddR	16572308	1398	96
ddF	16540730	4148	96
merged_seq	16524311	9637	96
seqtab_chim	9063976	1446	96
seqtab	9030839	1295	96
data_phyloseq	9030839	1295	96
data_phyloseq_otu	9030839	304	96
data_phyloseq_otu_vsearch	9030839	250	96
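
To make the losses between steps easier to read, the sequence counts can be turned into retention percentages. The sketch below copies a few nb_sequences values from the table above by hand. The vector and variable names are illustrative assumptions, and the code is not part of the original analysis.

```r
# nb_sequences values copied from the tracking table (forward reads only)
track_counts <- c(
  raw_data_fw    = 18929458,
  filter_data_fw = 16595729,
  merged_seq     = 16524311,
  seqtab_chim    = 9063976,
  seqtab         = 9030839
)

# Percentage of the raw reads still present at each step
pct_of_raw <- round(100 * track_counts / track_counts[["raw_data_fw"]], 1)

# Percentage retained relative to the previous step (NA for the first step)
pct_of_previous <- round(100 * track_counts / c(NA, head(track_counts, -1)), 1)

print(data.frame(nb_sequences = track_counts, pct_of_raw, pct_of_previous))
```

With these values, the largest drop among the listed steps occurs between merged_seq and seqtab_chim.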

sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: KDE neon User Edition 5.18

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8       
 [4] LC_COLLATE=fr_FR.UTF-8     LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] knitr_1.28           tidyselect_1.0.0     speedyseq_0.1.2.9004 DECIPHER_2.14.0      RSQLite_2.2.0       
 [6] Biostrings_2.54.0    XVector_0.26.0       IRanges_2.20.2       S4Vectors_0.24.4     BiocGenerics_0.32.0 
[11] phyloseq_1.30.0      dada2_1.14.1         pbapply_1.4-2        formattable_0.2.0.1  dplyr_0.8.5         
[16] styler_1.3.2         qs_0.21.2            ggraph_2.0.2         networkD3_0.4        here_0.1            
[21] lubridate_1.7.8      visNetwork_2.0.9     forcats_0.5.0        readxl_1.3.1         Rcpp_1.0.4.6        
[26] drake_7.12.0         devtools_2.3.0       usethis_1.6.0        BiocManager_1.30.10  gridExtra_2.3       
[31] ggplot2_3.3.0        workflowr_1.6.1     

loaded via a namespace (and not attached):
  [1] backports_1.1.6             plyr_1.8.6                  igraph_1.2.5               
  [4] splines_3.6.3               storr_1.2.1                 RApiSerialize_0.1.0        
  [7] BiocParallel_1.20.1         GenomeInfoDb_1.22.1         digest_0.6.25              
 [10] foreach_1.5.0               htmltools_0.4.0             viridis_0.5.1              
 [13] fansi_0.4.1                 magrittr_1.5                memoise_1.1.0              
 [16] base64url_1.4               cluster_2.1.0               remotes_2.1.1              
 [19] graphlayouts_0.7.0          RcppParallel_5.0.0          matrixStats_0.56.0         
 [22] prettyunits_1.1.1           jpeg_0.1-8.1                colorspace_1.4-1           
 [25] blob_1.2.1                  ggrepel_0.8.2               xfun_0.13                  
 [28] callr_3.4.3                 crayon_1.3.4                RCurl_1.98-1.2             
 [31] jsonlite_1.6.1              survival_3.1-12             iterators_1.0.12           
 [34] ape_5.3                     glue_1.4.0                  polyclip_1.10-0            
 [37] gtable_0.3.0                zlibbioc_1.32.0             DelayedArray_0.12.3        
 [40] pkgbuild_1.0.7              Rhdf5lib_1.8.0              scales_1.1.0               
 [43] DBI_1.1.0                   viridisLite_0.3.0           progress_1.2.2             
 [46] bit_1.1-15.2                txtq_0.2.0                  htmlwidgets_1.5.1          
 [49] RColorBrewer_1.1-2          ellipsis_0.3.0              pkgconfig_2.0.3            
 [52] farver_2.0.3                utf8_1.1.4                  rlang_0.4.5                
 [55] reshape2_1.4.4              later_1.0.0                 munsell_0.5.0              
 [58] cellranger_1.1.0            tools_3.6.3                 cli_2.0.2                  
 [61] generics_0.0.2              ade4_1.7-15                 evaluate_0.14              
 [64] biomformat_1.14.0           stringr_1.4.0               yaml_2.2.1                 
 [67] bit64_0.9-7                 processx_3.4.2              fs_1.4.1                   
 [70] tidygraph_1.1.2             purrr_0.3.4                 nlme_3.1-147               
 [73] whisker_0.4                 compiler_3.6.3              rstudioapi_0.11            
 [76] filelock_1.0.2              png_0.1-7                   testthat_2.3.2             
 [79] tibble_3.0.1                tweenr_1.0.1                stringi_1.4.6              
 [82] highr_0.8                   ps_1.3.2                    desc_1.2.0                 
 [85] lattice_0.20-41             Matrix_1.2-18               permute_0.9-5              
 [88] vegan_2.5-6                 multtest_2.42.0             vctrs_0.2.4                
 [91] pillar_1.4.3                lifecycle_0.2.0             data.table_1.12.8          
 [94] bitops_1.0-6                httpuv_1.5.2                GenomicRanges_1.38.0       
 [97] R6_2.4.1                    latticeExtra_0.6-29         hwriter_1.3.2              
[100] promises_1.1.0              renv_0.9.3                  ShortRead_1.44.3           
[103] sessioninfo_1.1.1           codetools_0.2-16            MASS_7.3-51.6              
[106] assertthat_0.2.1            pkgload_1.0.2               rhdf5_2.30.1               
[109] SummarizedExperiment_1.16.1 rprojroot_1.3-2             withr_2.2.0                
[112] GenomicAlignments_1.22.1    Rsamtools_2.2.3             GenomeInfoDbData_1.2.2     
[115] mgcv_1.8-31                 hms_0.5.3                   grid_3.6.3                 
[118] tidyr_1.0.2                 rmarkdown_2.1               git2r_0.26.1               
[121] ggforce_0.3.1               Biobase_2.46.0
