1 Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SingleCellMultiModal")

1.1 Load packages

library(SingleCellMultiModal)
library(MultiAssayExperiment)

2 scNMT: Single-cell nucleosome, methylation and transcription sequencing

The scNMT dataset was obtained from Argelaguet et al. (2019).

The scripts used to process the raw data were written and are maintained by Argelaguet and colleagues, and reside on GitHub: https://github.com/rargelaguet/scnmt_gastrulation

For more information on the protocol, see Clark et al. (2018).

2.1 Dataset lookup

Users can view the available datasets by using the dry.run argument:

scNMT("mouse_gastrulation", mode = "*", version = "1.0.0", dry.run = TRUE)
## snapshotDate(): 2022-04-19
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3738      acc_cgi      7 Mb     matrix     2020-09-03             <NA>
## 2  EH3739     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3740      acc_DHS    0.3 Mb     matrix     2020-09-03             <NA>
## 4  EH3741 acc_genebody   49.6 Mb     matrix     2020-09-03             <NA>
## 5  EH3742     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3743 acc_promoter   27.2 Mb     matrix     2020-09-03             <NA>
## 7  EH3745      met_cgi    4.6 Mb     matrix     2020-09-03             <NA>
## 8  EH3746     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3747      met_DHS    0.1 Mb     matrix     2020-09-03             <NA>
## 10 EH3748 met_genebody   26.8 Mb     matrix     2020-09-03             <NA>
## 11 EH3749     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3750 met_promoter   11.5 Mb     matrix     2020-09-03             <NA>
## 13 EH3751          rna   18.6 Mb     matrix     2020-09-03             <NA>

Or simply run the scNMT function with its default arguments:

scNMT("mouse_gastrulation", version = "1.0.0")
## snapshotDate(): 2022-04-19
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3738      acc_cgi      7 Mb     matrix     2020-09-03             <NA>
## 2  EH3739     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3740      acc_DHS    0.3 Mb     matrix     2020-09-03             <NA>
## 4  EH3741 acc_genebody   49.6 Mb     matrix     2020-09-03             <NA>
## 5  EH3742     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3743 acc_promoter   27.2 Mb     matrix     2020-09-03             <NA>
## 7  EH3745      met_cgi    4.6 Mb     matrix     2020-09-03             <NA>
## 8  EH3746     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3747      met_DHS    0.1 Mb     matrix     2020-09-03             <NA>
## 10 EH3748 met_genebody   26.8 Mb     matrix     2020-09-03             <NA>
## 11 EH3749     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3750 met_promoter   11.5 Mb     matrix     2020-09-03             <NA>
## 13 EH3751          rna   18.6 Mb     matrix     2020-09-03             <NA>
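The mode column follows an assay_context naming pattern: an acc (chromatin accessibility) or met (DNA methylation) prefix joined to a genomic context such as cgi or genebody, plus a single rna entry. The base-R sketch below splits the names, copied by hand from the listing above; split_mode() is a hypothetical helper written for illustration, not a SingleCellMultiModal function.

```r
# Mode names copied by hand from the version 1.0.0 listing above.
modes <- c("acc_cgi", "acc_CTCF", "acc_DHS", "acc_genebody", "acc_p300",
           "acc_promoter", "met_cgi", "met_CTCF", "met_DHS", "met_genebody",
           "met_p300", "met_promoter", "rna")

# Hypothetical helper: split "<assay>_<context>" into its two parts.
# "rna" has no genomic context, so its context is NA.
split_mode <- function(x) {
  has_ctx <- grepl("_", x, fixed = TRUE)
  data.frame(
    mode    = x,
    assay   = sub("_.*$", "", x),
    context = ifelse(has_ctx, sub("^[^_]*_", "", x), NA_character_),
    stringsAsFactors = FALSE
  )
}

parts <- split_mode(modes)
table(parts$assay)    # number of modes per assay prefix
unique(parts$context) # genomic contexts available
```

Listing the contexts this way makes it easier to write the glob patterns used when downloading data.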

2.2 Data versions

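Argelaguet and colleagues have provided an updated version of the "mouse_gastrulation" dataset. It includes additional cells that did not pass the original quality metrics imposed on the version 1.0.0 dataset.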

Use the version argument to indicate the newer dataset version (i.e., 2.0.0):

scNMT("mouse_gastrulation", version = '2.0.0', dry.run = TRUE)
## snapshotDate(): 2022-04-19
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3753      acc_cgi   21.1 Mb     matrix     2020-09-03             <NA>
## 2  EH3754     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3755      acc_DHS   16.2 Mb     matrix     2020-09-03             <NA>
## 4  EH3756 acc_genebody   60.1 Mb     matrix     2020-09-03             <NA>
## 5  EH3757     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3758 acc_promoter   33.8 Mb     matrix     2020-09-03             <NA>
## 7  EH3760      met_cgi   12.1 Mb     matrix     2020-09-03             <NA>
## 8  EH3761     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3762      met_DHS    3.9 Mb     matrix     2020-09-03             <NA>
## 10 EH3763 met_genebody   33.9 Mb     matrix     2020-09-03             <NA>
## 11 EH3764     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3765 met_promoter   18.7 Mb     matrix     2020-09-03             <NA>
## 13 EH3766          rna   43.5 Mb     matrix     2020-09-03             <NA>
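Every version 2.0.0 file is at least as large as its version 1.0.0 counterpart, which is consistent with the added cells, although file size alone does not say how many cells or features were added. A small sketch comparing the file_size values, copied by hand from the two listings above:

```r
# File sizes in Mb, copied by hand from the two dry.run listings above.
modes <- c("acc_cgi", "acc_CTCF", "acc_DHS", "acc_genebody", "acc_p300",
           "acc_promoter", "met_cgi", "met_CTCF", "met_DHS", "met_genebody",
           "met_p300", "met_promoter", "rna")
v1 <- c(7, 1.2, 0.3, 49.6, 0.2, 27.2, 4.6, 0.1, 0.1, 26.8, 0.1, 11.5, 18.6)
v2 <- c(21.1, 1.2, 16.2, 60.1, 0.2, 33.8, 12.1, 0.1, 3.9, 33.9, 0.1, 18.7, 43.5)
names(v1) <- names(v2) <- modes

# Size ratio between versions, largest growth first.
sizes <- data.frame(v1 = v1, v2 = v2, ratio = round(v2 / v1, 1))
sizes[order(-sizes$ratio), ]

# Total download size (Mb) for each version.
sum(v1)
sum(v2)
```

The DHS files grow the most, so those are the modes where the choice of version matters most for download time.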

2.3 Downloading datasets

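To obtain the data, we can use the mode argument to select specific datasets with 'glob' patterns that match the output above. For example, to get the 'genebody' datasets for all available assays, we would use *_genebody as an input pattern. The call below combines three such patterns: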

nmt <- scNMT("mouse_gastrulation", mode = c("*_DHS", "*_cgi", "*_genebody"),
    version = "1.0.0", dry.run = FALSE)
nmt
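## A MultiAssayExperiment object of 6 listed
##  experiments with user-defined names and respective classes.
##  Containing an ExperimentList class object of length 6:
##  [1] acc_DHS: matrix with 290 rows and 826 columns
##  [2] met_DHS: matrix with 66 rows and 826 columns
##  [3] acc_cgi: matrix with 4459 rows and 826 columns
##  [4] met_cgi: matrix with 5536 rows and 826 columns
##  [5] acc_genebody: matrix with 17139 rows and 826 columns
##  [6] met_genebody: matrix with 15837 rows and 826 columns
## Functionality:
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DataFrame
##  sampleMap() - the sample coordination DataFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DataFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
##  exportClass() - save data to flat files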
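The mode patterns behave like shell wildcards. As an illustration only, the sketch below uses base R's utils::glob2rx() to turn each glob into an anchored regular expression and keeps the mode names matched by any pattern. match_modes() is a hypothetical helper, and this is an assumption about the matching semantics, not the package's internal code. The six matches line up with the six experiments in the object printed above, although the helper returns them in listing order.

```r
# Mode names copied by hand from the version 1.0.0 listing.
modes <- c("acc_cgi", "acc_CTCF", "acc_DHS", "acc_genebody", "acc_p300",
           "acc_promoter", "met_cgi", "met_CTCF", "met_DHS", "met_genebody",
           "met_p300", "met_promoter", "rna")

# Hypothetical helper: keep the choices matched by at least one glob.
# glob2rx("*_DHS") gives the anchored regex "^.*_DHS$", so matching is
# on the whole name and is case-sensitive.
match_modes <- function(patterns, choices) {
  regexes <- vapply(patterns, utils::glob2rx, character(1))
  # One column per pattern, one row per choice.
  hits <- vapply(regexes, function(rx) grepl(rx, choices),
                 logical(length(choices)))
  choices[rowSums(hits) > 0]
}

match_modes(c("*_DHS", "*_cgi", "*_genebody"), modes)
```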

2.4 Checking cell metadata

The colData DataFrame of the MultiAssayExperiment object contains the variables cellID, stage, lineage10x_2, and stage_lineage. To extract this DataFrame, use colData on the MultiAssayExperiment object:

colData(nmt)
## DataFrame with 826 rows and 4 columns
##                            cellID       stage     lineage10x_2
##                       <character> <character>      <character>
## E7.5_Plate1_A3     E7.5_Plate1_A3        E7.5         Endoderm
## E7.5_Plate1_H3     E7.5_Plate1_H3        E7.5         Endoderm
## E7.5_Plate1_D2     E7.5_Plate1_D2        E7.5         Endoderm
## E7.5_Plate1_D7     E7.5_Plate1_D7        E7.5         Endoderm
## E7.5_Plate1_F5     E7.5_Plate1_F5        E7.5         Endoderm
## ...                           ...         ...              ...
## PS_VE_Plate9_E11 PS_VE_Plate9_E11        E6.5         Epiblast
## PS_VE_Plate9_D11 PS_VE_Plate9_D11        E6.5 Primitive_Streak
## PS_VE_Plate9_A11 PS_VE_Plate9_A11        E6.5 Primitive_Streak
## PS_VE_Plate9_B11 PS_VE_Plate9_B11        E6.5         Mesoderm
##                          stage_lineage
##                            <character>
## E7.5_Plate1_A3           E7.5_Endoderm
## E7.5_Plate1_H3           E7.5_Endoderm
## E7.5_Plate1_D2           E7.5_Endoderm
## E7.5_Plate1_D7           E7.5_Endoderm
## E7.5_Plate1_F5           E7.5_Endoderm
## ...                                ...
## PS_VE_Plate9_E11         E6.5_Epiblast
## PS_VE_Plate9_D11 E6.5_Primitive_Streak
## PS_VE_Plate9_A11 E6.5_Primitive_Streak
## PS_VE_Plate9_B11         E6.5_Mesoderm
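In the rows printed above, stage_lineage reads as stage and lineage10x_2 joined with an underscore (for example, E7.5 and Endoderm give E7.5_Endoderm). A minimal sketch on a hypothetical four-row data.frame built from cells shown in the output, which also cross-tabulates stage against lineage:

```r
# Hypothetical stand-in for colData(nmt), using four cells from the output.
cells <- data.frame(
  cellID       = c("E7.5_Plate1_A3", "E7.5_Plate1_H3",
                   "PS_VE_Plate9_D11", "PS_VE_Plate9_A11"),
  stage        = c("E7.5", "E7.5", "E6.5", "E6.5"),
  lineage10x_2 = c("Endoderm", "Endoderm", "Primitive_Streak", "Primitive_Streak"),
  stringsAsFactors = FALSE
)

# Rebuild stage_lineage as stage and lineage joined by "_".
cells$stage_lineage <- paste(cells$stage, cells$lineage10x_2, sep = "_")

# Number of cells per stage and lineage.
table(cells$stage, cells$lineage10x_2)
```

On the real object, the MultiAssayExperiment $ accessor extracts colData columns, so table(nmt$stage, nmt$lineage10x_2) should give the same cross-tabulation for all 826 cells.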

2.5 Exploring the data structure

Check the row annotations:

rownames(nmt)
## CharacterList of length 6
## [["acc_DHS"]] ESC_DHS_118970 ESC_DHS_118919 ... ESC_DHS_68996 ESC_DHS_109494
## [["met_DHS"]] ESC_DHS_20778 ESC_DHS_14504 ... ESC_DHS_72133 ESC_DHS_72129
## [["acc_cgi"]] CGI_5278 CGI_6058 CGI_10627 ... CGI_7832 CGI_11329 CGI_10964
## [["met_cgi"]] CGI_3481 CGI_8941 CGI_956 CGI_9461 ... CGI_2867 CGI_3499 CGI_365
## [["acc_genebody"]] ENSMUSG00000036181 ENSMUSG00000071862 ... ENSMUSG00000025576
## [["met_genebody"]] ENSMUSG00000059334 ENSMUSG00000024026 ... ENSMUSG00000078302
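Each assay is indexed by a different kind of feature identifier: DHS region IDs, CpG island (CGI) IDs, or Ensembl mouse gene IDs for the gene body assays. As a hedged sketch, the hypothetical helper id_type() below labels IDs by their naming pattern, using a few row names copied from the output:

```r
# A few row names copied from the output above, one vector per assay.
ids <- list(
  acc_DHS      = c("ESC_DHS_118970", "ESC_DHS_118919"),
  met_cgi      = c("CGI_3481", "CGI_8941"),
  met_genebody = c("ENSMUSG00000059334", "ENSMUSG00000024026")
)

# Hypothetical helper: label each ID by its naming scheme.
# Ensembl mouse gene IDs are "ENSMUSG" followed by 11 digits.
id_type <- function(x) {
  ifelse(grepl("^ENSMUSG[0-9]{11}$", x), "Ensembl mouse gene",
  ifelse(grepl("^CGI_[0-9]+$", x), "CpG island",
  ifelse(grepl("^ESC_DHS_[0-9]+$", x), "DHS region", "other")))
}

lapply(ids, id_type)
```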

The sampleMap is a graph representation of the relationship between cells and 'assay' datasets:

sampleMap(nmt)
## DataFrame with 4956 rows and 3 columns
##             assay                primary                colname
##          <factor>            <character>            <character>
## 1    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 2    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 3    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 4    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 5    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## ...           ...                    ...                    ...
## 4952      acc_DHS       PS_VE_Plate9_G05       PS_VE_Plate9_G05
## 4953      acc_DHS       PS_VE_Plate9_G08       PS_VE_Plate9_G08
## 4954      acc_DHS       PS_VE_Plate9_G09       PS_VE_Plate9_G09
## 4955      acc_DHS       PS_VE_Plate9_G12       PS_VE_Plate9_G12
## 4956      acc_DHS       PS_VE_Plate9_H08       PS_VE_Plate9_H08
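The sampleMap is a long table with one row per assay column: assay names the experiment, primary names the cell in colData, and colname names the matching column inside that assay. A toy version with hypothetical cell names shows how per-assay counts fall out of this layout:

```r
# Toy long-format map with the same three columns as sampleMap(nmt).
# Cell names are hypothetical.
smap <- data.frame(
  assay   = factor(c("acc_DHS", "acc_DHS", "met_DHS", "met_DHS", "met_DHS")),
  primary = c("cell_1", "cell_2", "cell_1", "cell_2", "cell_3"),
  colname = c("cell_1", "cell_2", "cell_1", "cell_2", "cell_3"),
  stringsAsFactors = FALSE
)

# Number of columns contributed by each assay.
table(smap$assay)

# How many cells were measured in 1, 2, ... assays.
table(table(smap$primary))
```

For the real object, each of the six assays has 826 columns, so the map has 6 * 826 = 4956 rows, matching the last row index printed above; table(sampleMap(nmt)$assay) should report 826 for each assay.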

Take a look at the cell identifiers, or barcodes, across assays:

colnames(nmt)
## CharacterList of length 6
## [["acc_DHS"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["met_DHS"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["acc_cgi"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["met_cgi"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["acc_genebody"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["met_genebody"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
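When assays may contain different subsets of cells, it helps to check which barcodes are shared. A base-R sketch on a hypothetical named list shaped like colnames(nmt):

```r
# Hypothetical named list shaped like colnames(nmt).
cn <- list(
  acc_DHS = c("cell_1", "cell_2", "cell_3"),
  met_DHS = c("cell_1", "cell_2", "cell_3"),
  acc_cgi = c("cell_1", "cell_3")
)

# Cells present in every assay.
shared <- Reduce(intersect, cn)

# TRUE only if every assay has exactly the same columns in the same order.
all_same <- all(vapply(cn, identical, logical(1), cn[[1]]))

shared
all_same
```

MultiAssayExperiment also provides intersectColumns(), which subsets the object to the cells present in every assay.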

2.6 Chromatin accessibility (acc_*)

See the accessibility levels (as proportions) at DNase hypersensitive sites:

head(assay(nmt, "acc_DHS"))[, 1:4]
## ESC_DHS_118970   0.66666667        NA
## ESC_DHS_11818182  0.7142857        NA
## ESC_DHS_6229     0.85714286 0.8000000
## ESC_DHS_9413     0.06666667 0.6800000
##                E4.5-5.5_new_Plate1_A07 E4.5-5.5_new_Plate1_A08
## ESC_DHS_118919               0.3636364               0.8421053
## ESC_DHS_66330                0.7391304               0.8888889
## ESC_DHS_6229                 0.3333333               0.7142857
## ESC_DHS_9413                 0.2142857               0.5217391
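Because each entry is a proportion and some entries are NA, per-cell summaries need to handle missing values. The sketch below uses a small hypothetical matrix (values and names invented for illustration) with the same features-by-cells layout as assay(nmt, "acc_DHS"); reading NA as lack of coverage is an assumption.

```r
# Hypothetical features x cells matrix of rates in [0, 1]; NA is assumed
# to mark a feature with no coverage in that cell.
acc <- matrix(
  c(0.67, NA,   0.86, 0.07,   # cell_1
    NA,   0.50, 0.80, 0.68),  # cell_2
  nrow = 4,
  dimnames = list(paste0("DHS_", 1:4), c("cell_1", "cell_2"))
)

# Fraction of features observed in each cell.
coverage <- colMeans(!is.na(acc))
# Average rate per cell over its observed features only.
mean_rate <- colMeans(acc, na.rm = TRUE)
# Features observed in every cell, e.g. for cell-to-cell comparisons.
feature_ok <- rowSums(!is.na(acc)) == ncol(acc)

coverage
mean_rate
acc[feature_ok, , drop = FALSE]
```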

2.7 DNA methylation (met_*)

See the methylation percentages (as proportions):

head(assay(nmt, "met_DHS"))[, 1:4]
## ESC_DHS_20778  0.8000000  NA
## ESC_DHS_14504  0.8000000 0.8
## ESC_DHS_112143        NA 0.4
## ESC_DHS_34593  0.6666667 0.6
## ESC_DHS_20747         NA 0.6
## ESC_DHS_33671         NA
##                E4.5-5.5_new_Plate1_A07 E4.5-5.5_new_Plate1_A08
## ESC_DHS_20778                0.8571429               0.8000000
## ESC_DHS_14504                0.8000000               0.6000000
## ESC_DHS_112143               0.5714286               0.5000000
## ESC_DHS_34593                0.7142857               0.8000000
## ESC_DHS_20747                       NA               0.6000000
## ESC_DHS_33671                0.8333333               0.6666667
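The methylation matrices share the same layout, so the same NA-aware summaries apply. This sketch uses the A07 and A08 values printed above; the 0.5 cut-off for calling a feature mostly methylated is a hypothetical choice for illustration, not part of the dataset.

```r
# Methylation rates copied from the A07/A08 columns of the met_DHS output.
met <- matrix(
  c(0.8571429, 0.8000000, 0.5714286, 0.7142857, NA,        0.8333333,
    0.8000000, 0.6000000, 0.5000000, 0.8000000, 0.6000000, 0.6666667),
  ncol = 2,
  dimnames = list(
    c("ESC_DHS_20778", "ESC_DHS_14504", "ESC_DHS_112143",
      "ESC_DHS_34593", "ESC_DHS_20747", "ESC_DHS_33671"),
    c("E4.5-5.5_new_Plate1_A07", "E4.5-5.5_new_Plate1_A08")
  )
)

# Mean rate per cell, ignoring features without a value (NA).
colMeans(met, na.rm = TRUE)

# Hypothetical threshold: count features with a rate above 0.5 per cell.
mostly_met <- met > 0.5
colSums(mostly_met, na.rm = TRUE)
```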

For protocol information, please see the references below.

3 sessionInfo

sessionInfo()
## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_GB              LC_COLLATE=C
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods
## [8] base
##
## other attached packages:
##  [1] scater_1.24.0               ggplot2_3.3.5
##  [3] scran_1.24.0                scuttle_1.6.0
##  [5] rhdf5_2.40.0                SingleCellExperiment_1.18.0
##  [7] RaggedExperiment_1.20.0     SingleCellMultiModal_1.8.0
##  [9] MultiAssayExperiment_1.22.0 SummarizedExperiment_1.26.0
## [11] Biobase_2.56.0              GenomicRanges_1.48.0
## [13] GenomeInfoDb_1.32.0         IRanges_2.30.0
## [15] S4Vectors_0.34.0            BiocGenerics_0.42.0
## [17] MatrixGenerics_1.8.0        matrixStats_0.62.0
## [19] BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
##   [1] AnnotationHub_3.4.0           BiocFileCache_2.4.0
##   [3] plyr_1.8.7                    igraph_1.3.1
##   [5] BiocParallel_1.30.0           digest_0.6.29
##   [7] htmltools_0.5.2               viridis_0.6.2
##   [9] magick_2.7.3                  fansi_1.0.3
##  [11] magrittr_2.0.3                memoise_2.0.1
##  [13] ScaledMatrix_1.4.0            SpatialExperiment_1.6.0
##  [15] cluster_2.1.3                 limma_3.52.0
##  [17] Biostrings_2.64.0             R.utils_2.11.0
##  [19] colorspace_2.0-3              blob_1.2.3
##  [21] rappdirs_0.3.3                ggrepel_0.9.1
##  [23] xfun_0.30                     dplyr_1.0.8
##  [25] crayon_1.5.1                  RCurl_1.98-1.6
##  [39] DBI_1.1.2                     edgeR_3.38.0
##  [41] Rcpp_1.0.8.3                  dqrng_0.3.0
##  [45] bit_4.0.4                     rsvd_1.0.5
##  [47] metapod_1.4.0                 httr_1.4.2
##  [49] ellipsis_0.3.2                farver_2.1.0
##  [51] pkgconfig_2.0.3               R.methodsS3_1.8.1
##  [53] uwot_0.1.11                   sass_0.4.1
##  [55] dbplyr_2.1.1                  locfit_1.5-9.5
##  [57] utf8_1.2.2                    labeling_0.4.2
##  [59] tidyselect_1.1.2              rlang_1.0.2
##  [61] later_1.3.0                   AnnotationDbi_1.58.0
##  [63] munsell_0.5.0                 BiocVersion_3.15.2
##  [65] tools_4.2.0                   cachem_1.0.6
##  [67] cli_3.3.0                     generics_0.1.2
##  [69] RSQLite_2.2.12                ExperimentHub_2.4.0
##  [73] fastmap_1.1.0                 yaml_2.3.5
##  [75] knitr_1.39                    bit64_4.0.5
##  [77] purrr_0.3.4                   KEGGREST_1.36.0
##  [79] sparseMatrixStats_1.8.0       mime_0.12
##  [83] compiler_4.2.0                beeswarm_0.4.0
##  [85] filelock_1.0.2                curl_4.3.2
##  [87] png_0.1-7                     interactiveDisplayBase_1.34.0
##  [91] bslib_0.3.1                   stringi_1.7.6
##  [93] highr_0.9                     RSpectra_0.16-1
##  [95] lattice_0.20-45               bluster_1.6.0
##  [97] Matrix_1.4-1                  vctrs_0.4.1
##  [99] pillar_1.7.0                  lifecycle_1.0.1
## [101] rhdf5filters_1.8.0            BiocManager_1.30.17
## [103] jquerylib_1.1.4               RcppAnnoy_0.0.19
## [105] BiocNeighbors_1.14.0          cowplot_1.1.1
## [107] bitops_1.0-7                  irlba_2.3.5
## [109] httpuv_1.6.5                  R6_2.5.1
## [111] bookdown_0.26                 promises_1.2.0.1
## [113] gridExtra_2.3                 vipor_0.4.5
## [115] codetools_0.2-18              assertthat_0.2.1
## [117] rjson_0.2.21                  withr_2.5.0
## [119] GenomeInfoDbData_1.2.8        parallel_4.2.0
## [121] grid_4.2.0                    beachmat_2.12.0
## [123] rmarkdown_2.14                DelayedMatrixStats_1.18.0
## [125] shiny_1.7.1                   ggbeeswarm_0.6.0

References

Argelaguet, Ricard, Stephen J Clark, Hisham Mohammed, L Carine Stapel, Christel Krueger, Chantriolnt-Andreas Kapourani, Ivan Imaz-Rosshandler, et al. 2019. "Multi-Omics Profiling of Mouse Gastrulation at Single-Cell Resolution." Nature 576 (7787): 487-91.

Clark, Stephen J, Ricard Argelaguet, Chantriolnt-Andreas Kapourani, Thomas M Stubbs, Heather J Lee, Celia Alda-Catalinas, Felix Krueger, et al. 2018. "scNMT-seq Enables Joint Profiling of Chromatin Accessibility DNA Methylation and Transcription in Single Cells." Nat Commun 9 (1): 781.