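RNA-Seq data from TCGA, covering 9264 tumor samples and 741 normal samples across 24 cancer types, were re-processed and made available from GEO as GSE62944. This data is also available as an ExpressionSet from ExperimentHub and can be used for differential gene expression analysis.
In the example below, we show how one can download this dataset from ExperimentHub.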
library(ExperimentHub)
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
##     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which.max, which.min
## Loading required package: AnnotationHub
## Loading required package: BiocFileCache
## Loading required package: dbplyr
eh = ExperimentHub()
## snapshotDate(): 2022-04-19
query(eh, "GSE62944")
## ExperimentHub with 3 records
## # snapshotDate(): 2022-04-19
## # $dataprovider: GEO
## # $species: Homo sapiens
## # $rdataclass: SummarizedExperiment, ExpressionSet
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH1"]]'
##
##            title
##   EH1    | RNA-Sequencing and clinical data for 7706 tumor samples from ...
##   EH1043 | RNA-Sequencing and clinical data for 9246 tumor samples from ...
##   EH1044 | RNA-Sequencing and clinical data for 741 normal samples from ...
One can then extract the data using
tcga_data <- eh[["EH1"]]
## see ?GSE62944 and browseVignettes('GSE62944') for documentation
## loading from cache
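The different cancer types can be accessed using:
head(phenoData(tcga_data)$CancerType)
## 20 Levels: BLCA BRCA COAD GBM HNSC KICH KIRC KIRP LAML LGG LIHC LUAD … UCEC
The output above shows the cancer type of the first six samples only; the factor has 20 levels in total.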
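Because head() returns only the first six elements, it does not say how many samples each cancer type contributes. The sketch below shows one way to get that overview with levels() and table(). It uses a small made-up factor (cancer_type, with hypothetical values) as a stand-in for phenoData(tcga_data)$CancerType, so it runs without downloading anything.

```r
# Toy stand-in for phenoData(tcga_data)$CancerType (hypothetical values).
cancer_type <- factor(c("GBM", "GBM", "LGG", "BRCA", "LGG", "LGG", "BRCA"))

# head() returns the first six elements, not six distinct types.
head(cancer_type)

# levels() lists every distinct type.
levels(cancer_type)

# table() counts the samples per type; sort() puts the largest group first.
type_counts <- table(cancer_type)
sort(type_counts, decreasing = TRUE)
```

With the real data, the same calls could be applied directly to phenoData(tcga_data)$CancerType.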
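We are interested in identifying the IDH1 mutant and IDH1 wild-type samples among TCGA's low grade glioma (LGG) samples and then conducting a differential expression analysis using DESeq2.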
# subset the ExpressionSet to contain only samples from LGG
lgg_data <- tcga_data[, which(phenoData(tcga_data)$CancerType == "LGG")]

# extract the IDH1 mutant samples
mut_idx <- which(phenoData(lgg_data)$idh1_mutation_found == "YES")
mut_data <- exprs(lgg_data)[, mut_idx]

# extract the IDH1 wild-type samples
wt_idx <- which(phenoData(lgg_data)$idh1_mutation_found == "NO")
wt_data <- exprs(lgg_data)[, wt_idx]

# make a count table
countData <- cbind(mut_data, wt_data)

# for DE analysis with DESeq2 we need a sample table
samples <- c(colnames(mut_data), colnames(wt_data))
group <- c(rep("mut", length(mut_idx)), rep("wt", length(wt_idx)))
coldata <- cbind(samples, group)
colnames(coldata) <- c("sampleName", "Group")
# coldata is a character matrix, so this column remains character after the assignment
coldata[, "Group"] <- factor(coldata[, "Group"], c("wt", "mut"))

# now we can run the DE analysis
library(DESeq2)
## Loading required package: S4Vectors
## Loading required package: stats4
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
##
##     I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomicRanges
## Loading required package: GenomeInfoDb
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
##
##     anyMissing, rowMedians
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## The following object is masked from 'package:Biobase':
##
##     rowMedians
ddsMat <- DESeqDataSetFromMatrix(countData = countData,
                                 colData = DataFrame(coldata),
                                 design = ~ Group)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
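The warning above arises because the sample table is built with cbind() on character vectors, which gives a character matrix. A matrix cannot hold a factor, so the Group column is still character when it reaches DESeq2. The sketch below shows this on toy data and one possible alternative: building the sample table as a data.frame, which does keep the factor and its level order. The sample names here are hypothetical, and the names coldata_matrix and coldata_df are introduced only for this illustration.

```r
# Hypothetical sample names standing in for the TCGA sample identifiers.
samples <- c("s1", "s2", "s3", "s4")
group   <- c("mut", "mut", "wt", "wt")

# Matrix version: the column stays character even after assigning a factor.
coldata_matrix <- cbind(sampleName = samples, Group = group)
coldata_matrix[, "Group"] <- factor(coldata_matrix[, "Group"], c("wt", "mut"))
is.character(coldata_matrix[, "Group"])

# data.frame version: the factor and its level order are preserved.
coldata_df <- data.frame(
  sampleName = samples,
  Group      = factor(group, levels = c("wt", "mut")),
  row.names  = samples
)
is.factor(coldata_df$Group)
levels(coldata_df$Group)
```

DESeq2 treats the first factor level as the reference, so listing "wt" first makes the reported fold changes read as mut relative to wt. A data.frame like coldata_df could be passed as colData to DESeqDataSetFromMatrix(); note that doing so may change the direction of the fold changes relative to the output shown in this document.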
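dds <- ddsMat
# keep only genes with more than one read in total across all samples
dds <- dds[rowSums(counts(dds)) > 1, ]
dds <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 865 genes
## -- DESeq argument 'minReplicatesForReplace' = 7
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing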
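The pre-filtering step before DESeq() keeps only the rows whose total count across all samples is greater than 1. As a minimal sketch of that row filter, the snippet below applies the same rowSums() condition to a small made-up count matrix (counts_toy, with hypothetical gene and sample names) in place of counts(dds).

```r
# Hypothetical 4-gene x 3-sample count matrix.
counts_toy <- matrix(
  c(0, 0, 0,
    1, 0, 0,
    5, 3, 8,
    0, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:3))
)

# Total reads per gene across all samples.
rowSums(counts_toy)

# Keep genes whose total count is greater than 1.
keep <- rowSums(counts_toy) > 1
filtered <- counts_toy[keep, , drop = FALSE]
rownames(filtered)
```

Here gene1 (total 0) and gene2 (total 1) are dropped, while gene3 and gene4 are kept.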
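res <- results(dds)
summary(res)
##
## out of 22546 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 2892, 13%
## LFC < 0 (down)     : 5094, 23%
## outliers [1]       : 0, 0%
## low counts [2]     : 1749, 7.8%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results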
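The percentages in the summary are relative to the 22546 genes with a nonzero total read count. As a quick arithmetic check, the sketch below recomputes them from the counts printed above; the variable names are introduced only for this illustration.

```r
# Counts copied from the summary(res) output above.
n_tested   <- 22546
gene_count <- c(up = 2892, down = 5094, low_counts = 1749)

# Share of tested genes in each category, to two significant digits.
pct <- 100 * gene_count / n_tested
signif(pct, 2)

# Total number of genes called differentially expressed at adjusted p < 0.1.
n_de <- sum(gene_count[c("up", "down")])
n_de
```

The rounded shares reproduce the 13%, 23% and 7.8% shown in the summary, and together the up and down groups account for 7986 genes.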
For a detailed RNASeq analysis, see Mike Love's RNASeq workflow.
sessionInfo()
## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_GB              LC_COLLATE=C
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods
## [8] base
##
## other attached packages:
##  [1] DESeq2_1.36.0 MatrixGenerics_1.8.0 matrixStats_0.62.0
##  [5] GenomicRanges_1.48.0 GenomeInfoDb_1.32.0
##  [7] IRanges_2.30.0 S4Vectors_0.34.0
##  [9] GSE62944_1.24.0 GEOquery_2.64.0
## [11] Biobase_2.56.0 ExperimentHub_2.4.0
## [13] AnnotationHub_3.4.0 BiocFileCache_2.4.0
## [15] dbplyr_2.1.1 BiocGenerics_0.42.0
## [17] BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
##  [9] R6_2.5.1 colorspace_2.0-3
## [11] DBI_1.1.2 withr_2.5.0
## [13] tidyselect_1.1.2 bit_4.0.4
## [15] curl_4.3.2 compiler_4.2.0
## [15] DelayedArray_0.22.0 bookdown_0.26
## [21] sass_0.4.1 scales_1.2.0
## [23] genefilter_1.78.0 readr_2.1.2
## [25] rappdirs_0.3.3 stringr_1.4.0
## [27] digest_0.6.29 rmarkdown_2.14
## [29] XVector_0.36.0 pkgconfig_2.0.3
## [31] htmltools_0.5.2 fastmap_1.1.0
## [33] limma_3.52.0 rlang_1.0.2
## [35] RSQLite_2.2.12 shiny_1.7.1
## [37] jquerylib_0.1.4 generics_0.1.2
## [39] jsonlite_1.8.0 BiocParallel_1.30.0
## [41] dplyr_1.0.8 RCurl_1.98-1.6
## [43] magrittr_2.0.3 GenomeInfoDbData_1.2.8
## [45] Matrix_1.4-1 munsell_0.5.0
## [47] Rcpp_1.0.8.3 fansi_1.0.3
## [49] lifecycle_1.0.1 stringi_1.7.6
## [51] yaml_2.3.5 zlibbioc_1.42.0
## [53] grid_4.2.0 blob_1.2.3
## [55] parallel_4.2.0
## [61] annotate_1.74.0 hms_1.1.1
## [63] KEGGREST_1.36.0 locfit_1.5-9.5
## [65] knitr_1.39 pillar_1.7.0
## [67] geneplotter_1.74.0 XML_3.99-0.9
## [69] glue_1.6.2 BiocVersion_3.15.2
## [71] evaluate_0.15 data.table_1.14.2
## [73] BiocManager_1.30.17 png_0.1-7
## [75] vctrs_0.4.1 tzdb_0.3.0
## [77] httprs_0.6.5 gtable_0.3.0
## [81] assertthat_0.2.1 ggplot2_3.3.5
## [83] cachem_1.0.6 xfun_0.30
## [85] mime_0.12 xtable_1.8-4
## [87] later_1.3.0 survival_3.3-1
## [89] tibble_3.1.6 AnnotationDbi_1.58.0
## [91] memoise_2.0.1 ellipsis_0.3.2
## [93] interactiveDisplayBase_1.34.0