genefu 2.29.0
该基因包为基因表达分析,特别是乳腺癌的基因表达分析提供了相关功能。该软件包包括一些用于分子亚型分类的算法。该软件包还包括预后预测算法的实现,以及这些算法所基于的预后基因签名列表。
请参考稿件URL和实验室网站:http://www.pmgenomics.ca/bhklab/software/genefu
也请参考下面的参考文献部分,以获得引用genefu版本1的出版物的额外信息。
首先,我们将genefu加载到工作区中。该软件包是公开可用的,可以从R版本2.13.0或更高版本的Bioconductor 2.8或更高版本安装。
安装genefu包。
BiocManager::安装(“genefu”)
为了计算风险分数,估计风险分数的表现,结合估计和比较估计,我们必须加载genefu
而且survcomp
包到工作区中。我们还装载了进行案例研究所需的所有包。
库(genefu)库(xtable)库(rmeta)库(Biobase)库(插入)
下面的案例研究比较了风险预测模型。这包括计算风险评分,计算风险评分表现的估计,以及组合估计并进行比较。
我们在案例研究中使用的五个数据集在bioconductor网站上作为实验数据包公开提供。我们特别使用了:
请注意:在本案例研究中,我们没有使用breastCancerVDX实验包,因为它已被用作GENIUS的训练数据集。请参考Haibe-Kains et al, 2010。breastCancerVDX可在以下链接找到:
这些实验数据包可以从Bioconductor version 2.8或更高版本的R version 2.13.0或更高版本安装。对于实验数据包,安装数据集的命令为:
BiocManager::install("breastCancerMAINZ") BiocManager::install("breastCancerTRANSBIG") BiocManager::install(" breastcancerstrup ") BiocManager::install("breastCancerUNT") BiocManager::install("breastCancerNKI")
图书馆(breastCancerMAINZ)图书馆(breastCancerTRANSBIG)图书馆(breastcancer喷发)图书馆(breastCancerUNT)图书馆(breastcancerki)
数据集 | 病人[#] | ER + [#] | HER2 + [#] | 年龄(年) | 年级(1/2/3) | 平台 |
---|---|---|---|---|---|---|
美因茨 | 200 | 155 | 23 | 25 - 90 | 29/136/35 | HGU133A |
TRANSBIG | 198 | 123 | 35 | 24-60 | 30/83/83 | HGU133A |
UPP | 251 | 175 | 46 | 28 - 93 | 67/128/54 | HGU133AB |
螺母 | 137 | 94 | 21 | 24 - 73 | 32/51/29 | HGU133AB |
NKI | 337 | 212 | 53 | 26 - 62 | 79/109/149 | 安捷伦科技公司 |
整体 | 1123 | 759 | 178 | 25 - 73 | 237/507/350 | Affy /安捷伦 |
表格1显示了数据集和患者(n=1123)的概述。关于ER和HER2状态的信息,以及患者年龄(每个数据集的患者年龄范围)已经从基因表达综合(GEO)下对应数据集的表型(pData)中提取出来。Mainz、Transbig、UPP和UNT数据集对应的GEO接入号分别为GSE11121、GSE7390、GSE3494和GSE2990。数据也从NKI数据集的出版物补充信息中获得。
对于涉及分子分型分类的分析[第4节],我们在数据集中去除重复患者后,分别对每个数据集进行分子分型。
在比较风险预测模型和判断预后的分析中[第5节],我们从这1123例乳腺癌患者中只选择了淋巴结阴性且未接受任何治疗(局部放疗除外)的患者,结果有713例患者[详情请参阅第5节]。
data(breastCancerData) cinfo <- colnames(pData(mainz7g)) data。所有<- c("transbig7g"=transbig7g, "unt7g"=unt7g, "upp7g"=upp7g, "mainz7g"=mainz7g, "nki7g"=nki7g) idtoremove。all <- NULL duplres <- NULL ## MainZ和NKI数据集中没有重叠。##关注UNT vs UPP vs TRANSBIG的演示。all <- rbind(pData(transbig7g), pData(unt7g), pData(upp7g)) dn2 <- c("TRANSBIG", "UNT", "UPP") ## Karolinska ##查找VDXKIU, KIU, UPPU系列ds2 <- c("VDXKIU", "KIU", "UPPU") demot <- demo.all[完成。case(演示。所有[,c("series")]) & is.element(演示。全部,“系列”,ds2)] #找到系列的重复病人duplid < -排序(独特(demot[复制(demot [" id "]), " id "])) duplrest <——零(我在1:长度(duplid)) {tt <——零(k 1:长度(dn2)) {myx < -排序(row.names (demot) [complete.cases (demot [c(“id”、“数据集”)])& demot (" id ") = = duplid[我]& demot(,“数据集”)= = dn2 [k]])如果(长度(myx) > 0) {tt < - c (tt, myx)}} duplrest < - c (duplrest、列表(tt))}的名字(duplrest) < - duplid duplres < - c (duplres,##牛津##查找VVDXOXFU, OXFU系列ds2 <- c("VDXOXFU", "OXFU") demot <- demo.all[complete.cases(demo. case)所有[,c("series")]) & is.element(演示。全部,“系列”,ds2)] #找到系列的重复病人duplid < -排序(独特(demot[复制(demot [" id "]), " id "])) duplrest <——零(我在1:长度(duplid)) {tt <——零(k 1:长度(dn2)) {myx < -排序(row.names (demot) [complete.cases (demot [c(“id”、“数据集”)])& demot (" id ") = = duplid[我]& demot(,“数据集”)= = dn2 [k]])如果(长度(myx) > 0) {tt < - c (tt, myx)}} duplrest < - c (duplrest、列表(tt))}的名字(duplrest) < - duplid duplres < - c (duplres,duPL <- sort(unlist(lapply(duplres, function(x) {return(x[-1])}))))
我们现在对每个数据集进行分子分型。在这里,我们使用PAM50和SCMOD2子类型算法执行子类型。
#加载必要的数据data(scmod2.robust) data(pam50.robust) data(scmgen .robust) data(sig.ggi) data(scmod1.robust) data(sig.genius) dn <- c("transbig", "unt", "upp", "mainz", "nki") dn。平台< - c(“affy”、“affy”,“affy”、“affy”、“安捷伦科技公司”)res < - ddemo。所有<- ddemo。coln <- NULL for(i in 1:length(dn)){##加载数据集dd <- get(data(list=dn[i]))) #删除重复识别的第一个消息("已获得的数据集!")#提取表达式集,pData, fData为每个数据集ddata<- t(exprs(dd)) ddemo<- phenoData(dd)@data if(length(intersect(rownames(ddata),duPL))>0) {ddata<-ddata[-which(rownames(ddata) %in% duPL),] ddemo<-ddemo[-which(rownames(ddemo) %in% duPL),]} dannot <- featureData(dd)@data # MOLECULAR SUBTYPING #使用scmod2执行子类型。健壮的# scmod2。健壮的:定义子类型集群模型的参数列表#(由Wirapati等人定义)#过时的函数调用- GENEFU的旧版本# SubtypePredictions<-subtype.cluster.predict(st .model=scmod2. predict)健壮,data=ddata, # annot=dannot,do。map =TRUE, # verbose=TRUE) #当前函数调用- GENEFU SubtypePredictions的最新版本<- molecular.subtyping(sbt. subtyping)Model = "scmod2",data = ddata, annot = dannot,do。mapping = TRUE) #获取与每个子类型表相关的样本计数(SubtypePredictions$subtype == "ER-/HER2-") #选择与基础子类型Basals相关的样本<-names(which(SubtypePredictions$subtype == "ER-/HER2-")) #选择与HER2子类型HER2s相关的样本<-names(which(SubtypePredictions$subtype == "HER2+")) #选择与Luminal Subtypes相关的样本LuminalB<-names(which(SubtypePredictions$subtype == "ER+/HER2-高产量"))LuminalA<-names(which(SubtypePredictions$subtype == "ER+/HER2-低产量"Prolif")) #为每个样本分配子类型,添加到现有的表型数据ddemo$SCMOD2<- subtypeprediction $subtype ddemo[LuminalB,]$SCMOD2<-"LumB" ddemo[LuminalA,]$SCMOD2<-"LumA" ddemo[Basals,]$SCMOD2<-"Basal" ddemo[HER2s,]$SCMOD2<-"Her2" #使用PAM50执行子类型#矩阵应该有样本作为ROWS,基因作为COLUMNS # rownames(dannot)<-dannot$probe<-dannot$EntrezGene。ID #旧函数调用# PAM50Preds<-intrinsic.cluster.predict(sbt. preds)Model =pam50,data=ddata, # annot=dannot,do。mapmapping =TRUE, # verbose=TRUE) #基于最新版本的更新函数调用PAM50Preds<-molecular.subtyping(sbt. subtyping)model = "pam50",data=ddata, annot=dannot,do.mapping=TRUE) table(PAM50Preds$subtype) ddemo$ pam50 <-PAM50Preds$subtype LumA<-names(PAM50Preds$subtype == "LumA") [which(PAM50Preds$subtype == "LumB")] LumB<-names(PAM50Preds$subtype)[which(PAM50Preds$subtype == "LumB")] ddemo[LumA,]$ pam50 <-"LumA" ddemo[LumB,]$ pam50 <-"LumB" ddemo。所有<- rbind(ddemo, ddemo. All)}
##已获取数据集!##已获取数据集!##已获取数据集!##已获取数据集!##已获取数据集!
我们可以比较两种分子分型方法的性能,并确定在全球人口中亚型预测的一致性。我们首先生成子类型预测的混淆矩阵。
获取PAM50表的子类型预测计数(ddemo.all$PAM50)
## ##基础Her2 LumB LumA正常## 161 116 306 398 38
法线< -rownames (ddemo.all [(ddemo。获取SCMOD2表的子类型预测计数(ddemo.all$SCMOD2)
## ##基底Her2 LumA LumB ## 184 118 434 283
ddemo.all$PAM50<-as.character(ddemo.all$PAM50) #我们比较被预测为属于分子子类型的样本#我们现在忽略由PAM50 confusionMatrix(factor(ddemo.all[-which(rownames(ddemo.all) %in% Normals),]$SCMOD2), factor(ddemo.all[-which(rownames(ddemo.all) %in% Normals),]$PAM50)预测为'Normal'的样本
# #混淆矩阵和统计参考# # # # # #预测基底Her2亮度LumB # #基底151 16 2 4 # # Her2 6 84 4 22 # #亮度1 4 361 43 # # LumB 12 31日237 # # # #总体统计精度# # # #:# # 95%置信区间:0.8491(0.8252,0.871)# #没有信息率:0.4057 # #假定值(Acc > NIR): < 2 e-16 # # # # Kappa: 0.7838 # # # # Mcnemar检验法测试假定值:0.1285 # # # #统计类:# # # #类:基础类:Her2类:亮度类:灵敏度0.9379 0.72414 0.9070 0.7745特异性0.9732 0.96301 0.9177 0.9319 Pos Pred值0.8728 0.72414 0.8826 0.8375阴性Pred值0.9876 0.96301 0.9353 0.9011患病率0.1641 0.11825 0.4057 0.3119检出率0.1539 0.08563 0.3680 0.2416检测患病率0.1764 0.11825 0.4169 0.2885平衡精度0.9555 0.84357 0.9124 0.8532
从这些结果来看,这些模型之间的预测一致性约为85%。
我们还可以比较每个亚型患者的生存率。我们根据每个分子分类算法,按亚型绘制患者的生存曲线
# http://www.inside-r.org/r-doc/survival/survfit.coxph library(survival) ddemo<-ddemo。所有data.for.survival。SCMOD2 <- ddemo[,c("e。os", " t.s "," SCMOD2","age")] data.for.survival。PAM50 <- ddemo[,c("e。os", " t.s "," PAM50","age")] #删除缺失生存信息data.for.survival的患者。SCMOD2 <- data.for.survival.SCMOD2[complete.cases(data.for.survival.SCMOD2),] data.for.survival.SCMOD2。PAM50 <- data.for.survival.PAM50[complete.cases(data.for. survivor .PAM50),] days.per.month <- 30.4368 days.per.year <- 365.242 data.for.survival.PAM50[complete.cases(data.for. survivor .PAM50),] days.per.month <- 30.4368 days.per.year <- 365.242 data.for.survival.PAM50]PAM50$months_to_death <- data.for.survival.PAM50$t。OS / days.per.month数据。for.survival。PAM50$ vitital_status <- data.for.survival.PAM50$e. .OS == "1" .obj。PAM50 <- survfit(Surv(data.for.survival。PAM50$months_to_death, data.for.survival.PAM50$ vitital_status) ~ data.for.survival.PAM50$PAM50) data.for.survival。SCMOD2$months_to_death <- data.for.survival.SCMOD2$t。OS / days.per.month数据。for.survival。SCMOD2$ vitital_status <- data.for.survival.SCMOD2$e. txtOS == "1" .obj。SCMOD2 <- survfit(Surv(data.for.survival。SCMOD2$months_to_death, data.for.survival.SCMOD2$ vitital_status) ~ data.for.survival.SCMOD2$SCMOD2)消息("KAPLAN-MEIR CURVE - USING PAM50")
## kaplan-meir曲线-使用pam50
plot(main =“生存曲线PAM50”,survey .obj。PAM50坳= c(“# 006 d2c”、“# 8856 a7”、“# a50f15”、“# 08519 c”、“# 000000”),lty = 1, lwd = 3, xlab =“时间(月)”,传奇ylab =“生存概率”)(“topright”,填补= c(“# 006 d2c”、“# 8856 a7”、“# a50f15”、“# 08519 c”、“# 000000”),传说= c(“基底”,“Her2”,“亮度”、“LumB”、“正常”),电池=“n”)
消息("KAPLAN-MEIR曲线-使用SCMOD2")
kaplan-meir曲线-使用scmod2
plot(main =“生存曲线SCMOD2”,survey .obj。SCMOD2, col =c("#006d2c", "#8856a7","#a50f15", "#08519c"),lty = 1,lwd = 3, xlab = "时间(月)",ylab = "存活概率")legend("topright", fill =c("#006d2c", "#8856a7","#a50f15", "#08519c"), legend =c(" Basal","Her2","LumA","LumB"),bty = "n")
生成一个生存曲线叠加图消息(“基于PAM50和SCMOD2的生存曲线叠加图”)
基于PAM50和SCMOD2的重叠生存图
##基础Her2 LuminalA LuminalB正常值图(survey .obj。PAM50,col =c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),lty = 1,lwd = 3, xlab = "时间(月)",ylab = "存活概率",ymin = 0.2) legend("topright", fill =c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"), legend =c("基底","Her2","LumA","LumB","正常"),bty = "n") par(new=TRUE) ##基底Her2 LuminalA LuminalB lines(surv.obj.)SCMOD2坳= c(“# 006 d2c”、“# 8856 a7”、“# a50f15”、“# 08519 c”),lwd = 2, lty = 5)传说(“bottomright”,c(“PAM50”、“SCMOD2”),lty = c(“固体”、“冲”))
我们现在可以比较哪种分子分型算法更有预后性。为此,我们使用survcomp的交叉验证部分似然(cvpl)计算。这将返回平均交叉验证的部分似然,对于每个算法,使用分子亚型进行分层
set.seed (12345) PAM5_CVPL < -cvpl (x = data.for.survival。PAM50时代,美元surv.time = data.for.survival。PAM50 months_to_death美元,surv.event = data.for.survival。$cvpl SCMOD2_CVPL<-cvpl(x=data.for.survival. PAM50$vital_status, strata=as.integer(factor(data.for.survival.PAM50$PAM50)), nfold=10, setseed=54321)$cvpl SCMOD2_CVPL<-cvpl(x=data.for.survival. PAM50$PAM50)SCMOD2时代,美元surv.time = data.for.survival。SCMOD2 months_to_death美元,surv.event = data.for.survival。SCMOD2$vital_status, strata=as.integer(factor(data.for. alive .SCMOD2$SCMOD2)), nfold=10, setseed=54321)$cvpl print.data.frame(data.frame(cbind(PAM5_CVPL,SCMOD2_CVPL)))
# PAM5_CVPL SCMOD2_CVPL ## logpl 1.424844 1.429175
我们使用以下算法列表(以及相应的genefu函数)计算风险分数:
#加载基因特征数据(sig. endoppredict) data(sig.oncotypedx) data(sig.tamr13) data(sig.gene70) data(sig.pik3cags) data(pam50) dn <- c("transbig", "unt", "upp", "mainz", "nki") dn。平台< - c(“affy”、“affy”,“affy”、“affy”、“安捷伦科技公司”)res < - ddemo。所有<- ddemo。coln <- NULL for(i in 1:length(dn)) {## load dataset dd <- get(data(list=dn[i]))) #提取表达式集,pData, fData for每个数据集ddata <- t(exprs(dd)) ddemo <- phenoData(dd)@data dannot <- featureData(dd)@data ddemo。所有<- c(ddemo。All, list(ddemo)) if(is.null(ddemo.col)) {ddemo. list(ddemo))Coln <- colnames(ddemo)} else {ddemo. colnames(ddemo)}Coln <- intersect(ddemo。colnames(ddemo))} rest <- NULL ## AURKA ## if affy平台考虑探针发表在Desmedt et al., CCR, 2008 if(dn。平台[i] == "affy") {domap <- FALSE}其他{domap <- TRUE} modt <- scmgene. exe . exe。健壮的$mod$AURKA ##如果agilent平台考虑探针发表在Desmedt等人,CCR, 2008如果(dn。platform[i] == "agilent") {domap <- FALSE modt[, "probe"] <- "NM_003600"} rest <- cbind(rest, "AURKA"=sig. net)score(x=modt, data=ddata, annot=dannot, do.mapping=domap)$score) ## ESR1 ##如果affy平台考虑Desmedt等人发表的探针,CCR, 2008 if(dn. map)平台[i] == "affy") {domap <- FALSE}其他{domap <- TRUE} modt <- scmgene. exe . exe。健壮的$mod$ESR1 ##如果agilent平台考虑探针发表在Desmedt等人,CCR, 2008如果(dn。platform[i] == "agilent") {domap <- FALSE modt[, "probe"] <- "NM_000125"} rest <- cbind(rest, "ESR1"=sig. cn)score(x=modt, data=ddata, annot=dannot, do.mapping=domap)$score) ## ERBB2 ##如果affy平台考虑Desmedt等人发表的探针,CCR, 2008 if(dn. map)平台[i] == "affy") {domap <- FALSE}其他{domap <- TRUE} modt <- scmgene. exe . exe。robust$mod$ERBB2 ## if agilent platform consider the probe published in Desmedt et al., CCR, 2008 if(dn.platform[i] == "agilent") { domap <- FALSE modt[ , "probe"] <- "NM_004448" } rest <- cbind(rest, "ERBB2"=sig.score(x=modt, data=ddata, annot=dannot, do.mapping=domap)$score) ## NPI ss <- ddemo[ , "size"] gg <- ddemo[ , "grade"] nn <- rep(NA, nrow(ddemo)) nn[complete.cases(ddemo[ , "node"]) & ddemo[ , "node"] == 0] <- 1 nn[complete.cases(ddemo[ , "node"]) & ddemo[ , "node"] == 1] <- 3 names(ss) <- names(gg) <- names(nn) <- rownames(ddemo) rest <- cbind(rest, "NPI"=npi(size=ss, grade=gg, node=nn, na.rm=TRUE)$score) ## GGI if(dn.platform[i] == "affy") { domap <- FALSE } else { domap <- TRUE } rest <- cbind(rest, "GGI"=ggi(data=ddata, annot=dannot, do.mapping=domap)$score) ## GENIUS if(dn.platform[i] == "affy") { domap <- FALSE } else { domap <- TRUE } rest <- cbind(rest, "GENIUS"=genius(data=ddata, annot=dannot, do.mapping=domap)$score) ## ENDOPREDICT if(dn.platform[i] == "affy") { domap <- FALSE } else { domap <- TRUE } rest <- cbind(rest, "EndoPredict"=endoPredict(data=ddata, annot=dannot, do.mapping=domap)$score) # OncotypeDx if(dn.platform[i] == "affy") { domap <- FALSE } else { domap <- TRUE } rest <- cbind(rest, "OncotypeDx"=oncotypedx(data=ddata, annot=dannot, do.mapping=domap)$score) ## TamR # Note: risk is not implemented, the function will return NA values if(dn.platform[i] == "affy") { domap <- FALSE } else { domap <- TRUE } rest <- cbind(rest, "TAMR13"=tamr13(data=ddata, annot=dannot, do.mapping=domap)$score) ## GENE70 # Need to do mapping for Affy platforms because this is based on Agilent. # Hence the mapping rule is reversed here! if(dn.platform[i] == "affy") { domap <- TRUE } else { domap <- FALSE } rest <- cbind(rest, "GENE70"=gene70(data=ddata, annot=dannot, std="none", do.mapping=domap)$score) ## Pik3cags if(dn.platform[i] == "affy") { domap <- FALSE } else { domap <- TRUE } rest <- cbind(rest, "PIK3CA"=pik3cags(data=ddata, annot=dannot, do.mapping=domap)) ## rorS # Uses the pam50 algorithm. Need to do mapping for both Affy and Agilent rest <- cbind(rest, "rorS"=rorS(data=ddata, annot=dannot, do.mapping=TRUE)$score) ## GENE76 # Mainly designed for Affy platforms. Has been excluded here # BIND ALL TOGETHER res <- rbind(res, rest) } names(ddemo.all) <- dn
为了进一步分析和处理数据,我们将所有信息存储在一个对象中。我们还从分析中删除重复的患者,只考虑那些具有完整的淋巴结、生存和治疗状况信息的患者。
ddemot <- NULL for(i in 1:length(ddemo.all)) {ddemot <- rbind(ddemot, ddemo.all)。所有[[i]][, ddemo.]coln, drop=FALSE])} res[完整的。案例(ddemot[,"dataset"]) & ddemot[,"dataset"] == "VDX", "GENIUS"] <- NA ##仅选择具有所有风险预测的未经治疗的节点阴性患者##即(不完整的病例(其中样本可能缺少风险预测)随后被删除))#注意,增加风险预测分析的数量#可能会增加不完整病例的数量#在genefu version1的前面小图中,我们只测试了4个风险预测因子,#所以我们总共剩下722个完整的病例#在这里,我们现在测试12个风险预测因子,所以我们只剩下713个完整的病例。两个版本之间9个案例的差异都来自NKI数据集。Myx <- complete。cases(res, ddemot[, c("node", "treatment")]) & ddemot[, "treatment"] == 0 & ddemot[, "node"] == 0 & !is.element(rownames(ddemot), duPL) res <- res[myx,, drop=FALSE] ddemot <- ddemot[myx,, drop=FALSE]
为了比较风险评分表现,我们计算一致性指数,这是一种概率,对于一对随机选择的可比样本,具有更高风险预测的样本将在另一个样本之前经历事件或属于更高的二进制类。
cc.res <- complete.cases(res) datasetList <- c("MAINZ","TRANSBIG","UPP","UNT","NKI") riskPList <- c("AURKA","ESR1","ERBB2","NPI", "GGI", "GENIUS", " endoppredict ","OncotypeDx","TAMR13","GENE70","PIK3CA","rorS") setT <- setE <- NULL resMatrix <- as.list(NULL) for(i in datasetList) {dataset。only <- ddemot[,"dataset"] == i patientsAll <- cc.res & dataset。仅##设置可用生存数据类型if(i == "UPP") {setT <- "t.rfs" setE <- "e.rfs"} else {setT <- " t.m mfs" setE <- " e.m mfs"} #计算每个预测器的cindex计算(Dat in riskPList) {cindex <- t(apply(X=t(res[patientsAll,Dat]), MARGIN=1, function(X, y, z) {tt <- concordance。指数(x = x,测量员时间= y,测量员event=z, method="noether", na.rm=TRUE);返回(c(“cindex”= tt $ c。cindex指数。”Se "=tt$ Se, "lower"=tt$lower, "upper"=tt$upper));}, y=ddemot[patientsAll,setT], z=ddemot[patientsAll, setE])) resMatrix[[Dat]] <- rbind(resMatrix[[Dat]], cindex)}}
使用随机效应模型,我们将特定于数据集的性能估计合并到每个风险预测模型的总体估计中:
for(i in names(resMatrix)){#获取一个元估计ceData <- combine.est(x=resMatrix[[i]][,"cindex"], x.se=resMatrix[[i]][,"cindex. est],* ceData$se cUpper <- ceData$estimate + qnorm(0.025, lower.tail=TRUE) * ceData$se cUpper <- ceData$estimate + qnorm(0.025, lower.tail=FALSE) * ceData$se cindexO <- cbind("cindex"=ceData$estimate, "cindex. tail=TRUE)se"=ceData$se, "lower"=cLower, "upper"=cUpper) resMatrix[[i]] <- rbind(resMatrix[[i]], cinindexo) rownames(resMatrix[[i]]) <- c(datasetList, "Overall")}
为了比较不同的风险预测模型,我们计算元估计的单边p值:
pv <- sapply(resMatrix, function(x) {return(x["Overall", c("cindex","cindex.se")])}) pv <- apply(pv, 2, function(x) {return(pnorm((x[1] - 0.5) / x[2], lower. 0)。tail=x[1] < 0.5))}) printPV<- matrix(pv,ncol=length(names(resMatrix))) rownames(printPV) <- "P-value" colnames(printPV) <- names(pv) printPV<-t(printPV)
打印p值表:
knitr::kable(printPV, digits=c(0, -1))
假定值 | |
---|---|
AURKA | 0 |
ESR1 | 0 |
ERBB2 | 0 |
NPI | 0 |
GGI | 0 |
天才 | 0 |
EndoPredict | 0 |
OncotypeDx | 0 |
TAMR13 | 0 |
GENE70 | 0 |
PIK3CA | 0 |
ror | 0 |
下图表示由每个预后预测因子的一致性指数衡量的风险评分表现。
RiskPList < - c(“AURKA”、“ESR1”、“ERBB2”、“NPI”、“GGI”、“天才”、“EndoPredict”、“OncotypeDx”、“TAMR13”、“GENE70”、“PIK3CA”、“ror”)datasetListF < - c(“美因茨”,“TRANSBIG”、“UPP”、“单元”、“NKI”、“整体”)myspace < -”“par (mfrow = c(2, 2))为(RiskPList RP) {# < < forestplotDat,无花果= TRUE > > = # # Forestplot tt < - rbind (resMatrix [(RP)][1:5,],“整体”= resMatrix [(RP)] [6]) tt < - as.data.frame (tt) labeltext < - (datasetListF) r.mean < c (tt cindex美元)r.lower < - c (tt降低美元)r.upper < - c (tt上美元)metaplot.surv (mn = r。意思是,降低= r。低,上层= r。upper, labels=labeltext, xlim=c(0.3,0.9), boxsize=0.5, zero=0.5, col=meta.colors(box="royalblue",line="深蓝色",zero="耐火砖"),main=粘贴(RP))}
我们还可以表示所有预测因子的总体估计,所有数据集的总体估计。
# #整体Forestplot mybigspace < -“tt < rbind(“OverallA”= resMatrix[[“AURKA”]][6],“OverallE1”= resMatrix[[“ESR1”]][6],“OverallE2”= resMatrix[[“ERBB2”]][6],“OverallN”= resMatrix[[“NPI”]][6],“OverallM”= resMatrix[[“GGI”]][6],“OverallG”= resMatrix[[“天才”]][6],“OverallE3”= resMatrix[[“EndoPredict”]][6],“OverallOD”= resMatrix[[“OncotypeDx”]][6],“OverallT”= resMatrix[[“TAMR13”]][6],“OverallG70”= resMatrix[[“GENE70”]][6],“OverallP”= resMatrix[[“PIK3CA”]][6],"OverallR"=resMatrix[["rorS"]][6,]) tt <- as.data.frame(tt) labeltext <- cbind(c("Risk Prediction","AURKA","ESR1","ERBB2","NPI", "GGI","GENIUS"," endoppredict ","OncotypeDx","TAMR13","GENE70","PIK3CA","rorS")) r.mean <- c(NA,tt$cindex) r.lower <- c(NA,tt$lower) r.upper <- c(NA,tt$upper) metaplot.surv(mn=r。意思是,降低= r。低,上层= r。upper, labels=labeltext, xlim=c(0.35,0.75), boxsize=0.5, 0 =0.5, col=meta.colors(box="royalblue",line="深蓝色",0 ="耐火砖"),main="整体和谐指数")
为了评估风险评分之间的差异,我们计算了一致性指数与它们的p值,并将估计值与p值进行比较cindex.comp.meta
用配对的学生t检验。
cc.res <- complete.cases(res) datasetList <- c("MAINZ","TRANSBIG","UPP","UNT","NKI") riskPList <- c("AURKA","ESR1","ERBB2","NPI","GGI","GENIUS", " endoppredict ","OncotypeDx","TAMR13","GENE70","PIK3CA","rorS") setT <- setE <- NULL resMatrixFull <- as.list(NULL) for(i in datasetList) {dataset。only <- ddemot[,"dataset"] == i patientsAll <- cc.res & dataset。如果(i == "UPP") {setT <- "t.rfs" setE <- "e.rfs"} else {setT <- " t.m mfs" setE <- " e.m mfs"} ## cindex和每个算法的p值计算(Dat in riskPList) {cindex <- t(apply(X=t(res[patientsAll,Dat])), MARGIN=1, function(X, y, z) {tt <- concordance。指数(x = x,测量员时间= y,测量员event=z, method="noether", na.rm=TRUE);返回(tt);}, y=ddemot[patientsAll,setT], z=ddemot[patientsAll, setE])) resMatrixFull[[Dat]] <- rbind(resMatrixFull[[Dat]], cindex)}} for(i in names(resMatrixFull)){rownames(resMatrixFull[[i]]) <- datasetList} ccmData <- tt <- rr <- NULL for(i in 1:length(resMatrixFull))){tt <- NULL for(j in 1:length(resMatrixFull)){if(i != j) {rr <- cindex.comp.meta(list.cindex1=resMatrixFull[[i]]], list。cindex2 = resMatrixFull [[j]],异性恋= TRUE) $ p。value} else {rr <- 1} tt <- cbind(tt, rr)} ccmData <- rbind(ccmData, tt)} ccmData <- as.data.frame(ccmData) colnames(ccmData) <- riskPList rownames(ccmData) <- riskPList . value
表2显示了用于不同方法比较的未经校正的p值。
表3显示了使用霍尔姆斯方法校正的p值,以校正多次测试。
#kable(ccmData,format =" latex") knitr::kable(ccmData[,1:6], digits=c(0, rep(-1,ncol(ccmData[,1:6]))), size="footnotesize") knitr::kable(ccmData[,7:12], digits=c(0, rep(-1,ncol(ccmData[,7:12]))), size="footnotesize",标题="不同方法比较的未校正p值"))
ccmDataPval <- matrix(p.adjust(data.matrix(ccmData), method="holm"), ncol=length(riskPList), dimnames=list(rownames(ccmData), colnames(ccmData))))
knitr::kable(ccmDataPval[,1:6], digits=c(0, rep(-1,ncol(ccmDataPval[,1:6])), size="footnotesize") knitr::kable(ccmDataPval[,7:12], digits=c(0, rep(-1,ncol(ccmDataPval[,7:12]))), size="footnotesize",标题="使用Holm方法校正p值"))
## R版本4.2.0 RC (2022-04-21 r82226) ##平台:x86_64-pc-linux-gnu(64位)##运行在Ubuntu 20.04.4 LTS ## ##矩阵产品:默认## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas。/home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack。所以## ## locale: ## [1] LC_CTYPE=en_US。UTF-8 LC_NUMERIC= c# # [3] LC_TIME=en_GB LC_COLLATE= c# # [5] LC_MONETARY=en_US。utf - 8 LC_MESSAGES = en_US。UTF-8 ## [7] LC_PAPER=en_US。UTF-8 LC_NAME= c# # [9] LC_ADDRESS=C lc_phone = c# # [11] LC_MEASUREMENT=en_US。UTF-8 LC_IDENTIFICATION=C ## ##附加的基本包:##[1]统计图形grDevices utils数据集方法基础## ##其他附加包:# # # # [1] breastCancerNKI_1.33.0 breastCancerUNT_1.33.0 [3] breastCancerUPP_1.33.0 breastCancerTRANSBIG_1.33.0 # # [5] breastCancerMAINZ_1.33.0 caret_6.0 - 92 # # [7] lattice_0.20-45 ggplot2_3.3.5 # # [9] rmeta_3.0 xtable_1.8-4 # # [11] genefu_2.29.0 AIMS_1.29.0 # # [13] Biobase_2.57.0 BiocGenerics_0.43.0 # # [15] e1071_1.7-9 iC10_1.5 # # [17] iC10TrainingData_1.3.1 impute_1.71.0 # # [19] pamr_1.56.1 cluster_2.1.3 # # [21] biomaRt_2.53.0 survcomp_1.47.0 # # [23] prodlim_2019.11.13 survival_3.3-1 # # [25]BiocStyle_2.25.0 ## ##通过命名空间加载(并且没有附加):[1] BiocFileCache_2.5.0 plyr_1.8.7 rcurl_1 . 1.6 ## [28] jsonlite_1.8.0 iterators_1.0.14 ## [7] SuppDists_1.1-9.7 foreach_1.5.2 htmltools_0.5.2 ## [10] magick_2.7.3 fansi_1.0.3 magrittr_2.0.3 ## [13] memoise_2.0.1 limma_3.53.0 recipes_0.2.0 ## [16] globals_0.14.0 Biostrings_2.65.0 gower_1.0.0 ## [19] hardhat_0.2.0 prettyunits_1.1.1 colorspace_2.0-3 ## [25] blob_1.2.3 rappdirs_0.3.3 xfun_0.30 ## [25] dplyr_1.0.8 crayon_1.5.1 RCurl_1.98-1.6 ## jsonlite_1.8.0 iterators_1.0.14glue_1.6.2 ## [31] gtable_0.3.0 ipred_0.9-12 zlibbioc_1.43.0 ## [34] XVector_0.37.0 future.apply_1.9.0 scales_1.2.0 ## [37] DBI_1.1.2 Rcpp_1.0.8.3 progress_1.2.2 ## [40] bit_4.0.4 proxy_0.4-26 mclust_5.4.9 ## [43] stats4_4.2.0 lava_1.6.10 httr_1.4.2 ## [46] ellipsis_0.3.2 pkgconfig_2.0.3 XML_3.99-0.9 ## [49] nnet_7.3-17 sass_0.4.1 dbplyr_2.1.1 ## [52] utf8_1.2.2 tidyselect_1.1.2 rlang_1.0.2 ## [55] reshape2_1.4.4 AnnotationDbi_1.59.0 munsell_0.5.0 ## [58] tools_4.2.0 cachem_1.0.6 cli_3.3.0 ## [61] generics_0.1.2 RSQLite_2.2.12 evaluate_0.15 ## [64] stringr_1.4.0 fastmap_1.1.0 yaml_2.3.5 ## [67] bootstrap_2019.6 ModelMetrics_1.2.2.2 knitr_1.38 ## [70] bit64_4.0.5 purrr_0.3.4 KEGGREST_1.37.0 ## [73] future_1.25.0 nlme_3.1-157 xml2_1.3.3 ## [76] compiler_4.2.0 filelock_1.0.2 curl_4.3.2 ## [79] png_0.1-7 tibble_3.1.6 bslib_0.3.1 ## [82] stringi_1.7.6 highr_0.9 Matrix_1.4-1 ## [85] survivalROC_1.0.3 vctrs_0.4.1 pillar_1.7.0 ## [88] lifecycle_1.0.1 BiocManager_1.30.17 jquerylib_0.1.4 ## [91] data.table_1.14.2 bitops_1.0-7 R6_2.5.1 ## [94] bookdown_0.26 KernSmooth_2.23-20 IRanges_2.31.0 ## [97] parallelly_1.31.1 codetools_0.2-18 MASS_7.3-57 ## [100] assertthat_0.2.1 withr_2.5.0 S4Vectors_0.35.0 ## [103] GenomeInfoDbData_1.2.8 parallel_4.2.0 hms_1.1.1 ## [106] grid_4.2.0 rpart_4.1.16 timeDate_3043.102 ## [109] class_7.3-20 rmarkdown_2.14 pROC_1.18.0 ## [112] lubridate_1.8.0