Last updated: 2020-05-16

Checks: 6 passed, 1 warning

Knit directory: SolFaMi_bioinfo/

This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20200401)

The command set.seed(20200401) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
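As a small illustration of why the seed matters, the sketch below draws the same random sample twice after resetting the seed. The variable names are hypothetical and the sampling call is only an example; it is not part of the analysis itself.

```r
# Minimal sketch: resetting the seed reproduces the same random draw.
set.seed(20200401)
first_draw <- sample(1:100, 5)

set.seed(20200401)
second_draw <- sample(1:100, 5)

# TRUE when both draws are the same, which is what the seed guarantees
same_draw <- identical(first_draw, second_draw)
print(same_draw)
```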

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: detected

The following chunks had caches available:

session-info-chunk-inserted-by-workflowr
unnamed-chunk-1
unnamed-chunk-2
unnamed-chunk-3
unnamed-chunk-4

To ensure reproducibility of the results, delete the cache directory Bioinfo_cache and re-run the analysis. To have workflowr automatically delete the cache directory prior to building the file, set delete_cache = TRUE when running wflow_build() or wflow_publish().
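The two options described above could look roughly like this. This is a sketch that assumes the working directory is the project root and that the cache lives in analysis/Bioinfo_cache (the path listed in the Git status of this report); the variable names are illustrative.

```r
# Path of the knitr cache directory named in the workflowr report.
cache_dir <- file.path("analysis", "Bioinfo_cache")

# Option 1: remove the cache by hand before rebuilding.
if (dir.exists(cache_dir)) {
  unlink(cache_dir, recursive = TRUE)
}

# Option 2: let workflowr delete the cache as part of the build.
# Guarded so that the snippet does nothing outside the project.
rmd_file <- file.path("analysis", "Bioinfo.Rmd")
if (requireNamespace("workflowr", quietly = TRUE) && file.exists(rmd_file)) {
  workflowr::wflow_build(rmd_file, delete_cache = TRUE)
}
```

Either option alone is enough; wflow_publish() accepts the same delete_cache argument.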

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: bf17f72

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version bf17f72. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .drake/config/
    Ignored:    .drake/data/
    Ignored:    .drake/drake/
    Ignored:    .drake/keys/
    Ignored:    .drake/scratch/
    Ignored:    analysis/Bioinfo_cache/
    Ignored:    data/raw_seq/
    Ignored:    output/filterAndTrim_fwd/
    Ignored:    output/filterAndTrim_rev/
    Ignored:    renv/library/
    Ignored:    renv/staging/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Bioinfo.Rmd) and HTML (public/Bioinfo.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	bf17f72	adrientaudiere	2020-05-16	workflowr::wflow_publish(“analysis/Bioinfo.Rmd”)
Rmd	e6e7518	adrientaudiere	2020-05-16	Fix the wflow_publish report in drake plan
html	e6e7518	adrientaudiere	2020-05-16	Fix the wflow_publish report in drake plan
Rmd	08ab169	adrientaudiere	2020-05-15	Add report to drake plan and new physeq objects with re-clusterisation
Rmd	006a4e7	adrientaudiere	2020-05-12	Add _drake.R and plan.R allowing the use of r_make
Rmd	73a6a6a	adrientaudiere	2020-05-12	New drake plan
Rmd	be2d56f	adrientaudiere	2020-05-08	ammend plan_ligth and try a first plan with all files
Rmd	40fb4ef	adrientaudiere	2020-05-07	make a plan_ligth plan to test the transform argument of target function
Rmd	be05791	adrientaudiere	2020-05-07	Add target and use format = “qs” to save big object
Rmd	2b5702d	adrientaudiere	2020-05-06	First try with the entire dataset
Rmd	9ab379b	adrientaudiere	2020-05-06	Better plan
Rmd	7adc02c	adrientaudiere	2020-05-04	first drake version on a subset of fastq files
Rmd	6b478fe	adrientaudiere	2020-04-28	update nav links
html	6b478fe	adrientaudiere	2020-04-28	update nav links
html	a1f274a	adrientaudiere	2020-04-28	Build site.
Rmd	2a1f393	adrientaudiere	2020-04-28	workflowr::wflow_publish(all = T)
Rmd	b5a3cac	adrientaudiere	2020-04-28	Update package
html	f9f6bd8	adrientaudiere	2020-04-28	Build site.
Rmd	81e0125	adrientaudiere	2020-04-28	try publish 2
html	302cf0c	adrientaudiere	2020-04-28	Build site.
Rmd	cdde2f0	adrientaudiere	2020-04-28	Add analysis files

Bioinformatics pipeline

Load packages, functions and drake plan

The _drake.R script sets up the drake plan after sourcing code/packages.R, code/functions_bioinfo.R and code/plan.R.
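The content of _drake.R is not shown in this report. As a rough sketch only, a file following drake's documented r_make() convention could look like the block below. The object name plan (assumed to be created by code/plan.R) and the final drake_config() call are assumptions; the real file may differ, for instance by defining the drake_envir environment used later on.

```r
# _drake.R (hypothetical sketch, not the project's actual file)

# Load packages, helper functions and the plan definition.
source("code/packages.R")
source("code/functions_bioinfo.R")
source("code/plan.R")  # assumed to create an object called `plan`

# r_make() expects _drake.R to end with a drake_config() call.
drake::drake_config(plan)
```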

source("_drake.R")

Loading required package: conflicted

Loading required package: ggplot2

Loading required package: gridExtra

Loading required package: BiocManager

Bioconductor version 3.10 (BiocManager 1.30.10), ?BiocManager::install for help

Bioconductor version '3.10' is out-of-date; the current release version '3.11'
  is available with R version '4.0'; see https://bioconductor.org/install

Loading required package: devtools

Loading required package: usethis


Attaching package: 'devtools'

The following object is masked from 'package:BiocManager':

    install

Loading required package: drake


Attaching package: 'drake'

The following object is masked from 'package:devtools':

    check

Loading required package: Rcpp

Loading required package: readxl

Loading required package: forcats

Loading required package: visNetwork

Loading required package: lubridate


Attaching package: 'lubridate'

The following objects are masked from 'package:base':

    date, intersect, setdiff, union

Loading required package: here

here() starts at /home/adrien/Bureau/Analyse_Solfami/SolFaMi_bioinfo

Loading required package: networkD3

Loading required package: ggraph

Loading required package: qs

qs v0.21.2. See ChangeLog for update info.

Loading required package: styler

Loading required package: dplyr


Attaching package: 'dplyr'

The following objects are masked from 'package:lubridate':

    intersect, setdiff, union

The following object is masked from 'package:gridExtra':

    combine

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Loading required package: formattable

Loading required package: pbapply

Loading required package: dada2

Loading required package: phyloseq

Loading required package: DECIPHER

Loading required package: Biostrings

Loading required package: BiocGenerics

Loading required package: parallel


Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from 'package:formattable':

    normalize

The following objects are masked from 'package:dplyr':

    combine, intersect, setdiff, union

The following objects are masked from 'package:lubridate':

    intersect, setdiff, union

The following object is masked from 'package:gridExtra':

    combine

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors

Loading required package: stats4


Attaching package: 'S4Vectors'

The following objects are masked from 'package:dplyr':

    first, rename

The following objects are masked from 'package:lubridate':

    second, second<-

The following object is masked from 'package:drake':

    expand

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges


Attaching package: 'IRanges'

The following object is masked from 'package:phyloseq':

    distance

The following objects are masked from 'package:dplyr':

    collapse, desc, slice

The following object is masked from 'package:lubridate':

    %within%

Loading required package: XVector


Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit

Loading required package: RSQLite

Loading required package: speedyseq


Attaching package: 'speedyseq'

The following objects are masked from 'package:phyloseq':

    plot_bar, plot_heatmap, plot_tree, psmelt, tax_glom, tip_glom

* The lockfile is already up to date.

# r_make()  # re-run r_make() whenever the plan changes

Visualisation of the pipeline (drake plan)

vis_drake_graph(plan, envir = drake_envir)

Tracking sequences and samples across the pipeline

track <- track_wkflow(
  list( 
    "raw_data_fw" = normalizePath(readd(data)$fnfs),
    "raw_data_rev" = normalizePath(readd(data)$fnrs),
    "filter_data_fw" = normalizePath(
      list.files("output/filterAndTrim_fwd/",
                 full.names = TRUE)),
    "filter_data_rev" = normalizePath(
      list.files("output/filterAndTrim_rev/",
                 full.names = TRUE)),
    "derep_fs" = readd(derep_fs), 
    "derep_rs" = readd(derep_rs),
    "ddR" = readd(ddR), 
    "ddF" = readd(ddF), 
    "merged_seq" = readd(merged_seq),
    "seqtab_chim" = readd(seqtab_w_short_seq),
    "seqtab" = readd(seqtab),
    "data_phyloseq" = readd(data_phyloseq),
   "data_phyloseq_otu" = readd(data_phyloseq_otu),
    "data_phyloseq_otu_vsearch" = readd(data_phyloseq_otu_vsearch)
  )
)

Compute the number of sequences

Start object of class: character
Start object of class: character
Start object of class: character
Start object of class: character

Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list

Start object of class: matrix
Start object of class: matrix

Start object of class: phyloseq
Start object of class: phyloseq
Start object of class: phyloseq

Compute the number of clusters

Start object of class: character
Start object of class: character
Start object of class: character
Start object of class: character

Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list

Start object of class: matrix
Start object of class: matrix

Start object of class: phyloseq
Start object of class: phyloseq
Start object of class: phyloseq

Compute the number of samples

Start object of class: character
Start object of class: character
Start object of class: character
Start object of class: character

Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list
Start object of class: list

Start object of class: matrix
Start object of class: matrix

Start object of class: phyloseq
Start object of class: phyloseq
Start object of class: phyloseq

track_formattable <- track
track_formattable$nb_singletons <- NA
track_formattable[c("derep_fs", "derep_rs"),]$nb_singletons <- 
  track_formattable[c("derep_fs", "derep_rs"),]$nb_clusters
track_formattable[c("derep_fs", "derep_rs"),]$nb_clusters <- NA

track_formattable[is.na(track_formattable)] <- ""


formattable(track_formattable, 
            list(
              area(col = nb_sequences) ~ color_bar("cyan", na.rm = TRUE),
              area(col = nb_clusters) ~ normalize_bar("yellowgreen", na.rm = TRUE),
              area(col = nb_singletons) ~ color_tile("orange", "orange"),
              area(col = nb_samples) ~ color_tile("red",  "pink")
            )
)

	nb_sequences	nb_clusters	nb_samples	nb_singletons
raw_data_fw	18929458		96
raw_data_rev	18929458		96
filter_data_fw	16595729		96
filter_data_rev	16595729		96
derep_fs	16595729		96	2608188
derep_rs	16595729		96	2081688
ddR	16572308	1398	96
ddF	16540730	4148	96
merged_seq	16524311	9637	96
seqtab_chim	9063976	1446	96
seqtab	9030839	1295	96
data_phyloseq	9030839	1295	96
data_phyloseq_otu	9030839	304	96
data_phyloseq_otu_vsearch	9030839	250	96
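To read the table as losses per step, the sequence counts can be expressed as a percentage of the raw reads. The sketch below copies a few nb_sequences values from the table above by hand; the vector and variable names are illustrative and not part of the pipeline.

```r
# Sequence counts copied from the nb_sequences column of the tracking table.
nb_sequences <- c(
  raw_data_fw    = 18929458,
  filter_data_fw = 16595729,
  merged_seq     = 16524311,
  seqtab_chim    = 9063976,
  seqtab         = 9030839
)

# Percentage of the raw forward reads still present at each step.
pct_retained <- round(100 * nb_sequences / nb_sequences[["raw_data_fw"]], 1)

# Number of sequences lost between consecutive steps.
lost_per_step <- -diff(nb_sequences)

print(pct_retained)
print(lost_per_step)
```

The same calculation could be run on the track object directly, assuming its nb_sequences column is still numeric, that is, before the missing values are replaced with empty strings.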

sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: KDE neon User Edition 5.18

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices datasets  utils    
[8] methods   base     

other attached packages:
 [1] tidyselect_1.0.0     speedyseq_0.1.2.9004 DECIPHER_2.14.0     
 [4] RSQLite_2.2.0        Biostrings_2.54.0    XVector_0.26.0      
 [7] IRanges_2.20.2       S4Vectors_0.24.4     BiocGenerics_0.32.0 
[10] phyloseq_1.30.0      dada2_1.14.1         pbapply_1.4-2       
[13] formattable_0.2.0.1  dplyr_0.8.5          styler_1.3.2        
[16] qs_0.21.2            ggraph_2.0.2         networkD3_0.4       
[19] here_0.1             lubridate_1.7.8      visNetwork_2.0.9    
[22] forcats_0.5.0        readxl_1.3.1         Rcpp_1.0.4.6        
[25] drake_7.12.0         devtools_2.3.0       usethis_1.6.0       
[28] BiocManager_1.30.10  gridExtra_2.3        ggplot2_3.3.0       
[31] conflicted_1.0.4     knitr_1.28           workflowr_1.6.1     

loaded via a namespace (and not attached):
  [1] backports_1.1.6             plyr_1.8.6                 
  [3] igraph_1.2.5                splines_3.6.3              
  [5] storr_1.2.1                 RApiSerialize_0.1.0        
  [7] BiocParallel_1.20.1         GenomeInfoDb_1.22.1        
  [9] digest_0.6.25               foreach_1.5.0              
 [11] htmltools_0.4.0             viridis_0.5.1              
 [13] fansi_0.4.1                 magrittr_1.5               
 [15] memoise_1.1.0               base64url_1.4              
 [17] cluster_2.1.0               remotes_2.1.1              
 [19] graphlayouts_0.7.0          RcppParallel_5.0.0         
 [21] matrixStats_0.56.0          prettyunits_1.1.1          
 [23] jpeg_0.1-8.1                colorspace_1.4-1           
 [25] blob_1.2.1                  ggrepel_0.8.2              
 [27] xfun_0.13                   callr_3.4.3                
 [29] crayon_1.3.4                RCurl_1.98-1.2             
 [31] jsonlite_1.6.1              survival_3.1-12            
 [33] iterators_1.0.12            ape_5.3                    
 [35] glue_1.4.0                  polyclip_1.10-0            
 [37] gtable_0.3.0                zlibbioc_1.32.0            
 [39] DelayedArray_0.12.3         pkgbuild_1.0.7             
 [41] Rhdf5lib_1.8.0              scales_1.1.0               
 [43] DBI_1.1.0                   viridisLite_0.3.0          
 [45] progress_1.2.2              bit_1.1-15.2               
 [47] txtq_0.2.0                  htmlwidgets_1.5.1          
 [49] RColorBrewer_1.1-2          ellipsis_0.3.0             
 [51] pkgconfig_2.0.3             farver_2.0.3               
 [53] rlang_0.4.5                 reshape2_1.4.4             
 [55] later_1.0.0                 munsell_0.5.0              
 [57] cellranger_1.1.0            tools_3.6.3                
 [59] cli_2.0.2                   generics_0.0.2             
 [61] ade4_1.7-15                 evaluate_0.14              
 [63] biomformat_1.14.0           stringr_1.4.0              
 [65] yaml_2.2.1                  bit64_0.9-7                
 [67] processx_3.4.2              fs_1.4.1                   
 [69] tidygraph_1.1.2             purrr_0.3.4                
 [71] nlme_3.1-147                whisker_0.4                
 [73] compiler_3.6.3              filelock_1.0.2             
 [75] png_0.1-7                   testthat_2.3.2             
 [77] tibble_3.0.1                tweenr_1.0.1               
 [79] stringi_1.4.6               ps_1.3.2                   
 [81] desc_1.2.0                  lattice_0.20-41            
 [83] Matrix_1.2-18               permute_0.9-5              
 [85] vegan_2.5-6                 multtest_2.42.0            
 [87] vctrs_0.2.4                 pillar_1.4.3               
 [89] lifecycle_0.2.0             data.table_1.12.8          
 [91] bitops_1.0-6                httpuv_1.5.2               
 [93] GenomicRanges_1.38.0        R6_2.4.1                   
 [95] latticeExtra_0.6-29         hwriter_1.3.2              
 [97] promises_1.1.0              renv_0.9.3                 
 [99] ShortRead_1.44.3            sessioninfo_1.1.1          
[101] codetools_0.2-16            MASS_7.3-51.6              
[103] assertthat_0.2.1            pkgload_1.0.2              
[105] rhdf5_2.30.1                SummarizedExperiment_1.16.1
[107] rprojroot_1.3-2             withr_2.2.0                
[109] GenomicAlignments_1.22.1    Rsamtools_2.2.3            
[111] GenomeInfoDbData_1.2.2      mgcv_1.8-31                
[113] hms_0.5.3                   grid_3.6.3                 
[115] tidyr_1.0.2                 rmarkdown_2.1              
[117] git2r_0.26.1                ggforce_0.3.1              
[119] Biobase_2.46.0
