Introduction to scGPS

Quan Nguyen and Michael Thompson

2022-04-26

1. Installation instructions

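# To install scGPS from GitHub (depending on the configuration of the local
# computer or HPC, custom C++ compilation may be required; see the
# installation troubleshooting notes below)
devtools::install_github("IMB-Computational-Genomics-Lab/scGPS")

# For C++ compilation troubleshooting, the package can be downloaded and
# installed manually from GitHub (run in a shell, not in R):
# git clone https://github.com/IMB-Computational-Genomics-Lab/scGPS

# Then check scGPS/src: if any precompiled files (e.g. *.so and *.o) exist,
# delete them before recompiling.

# Then, with scGPS as the R working directory, install and load the package
# manually using devtools functionality.
# Install the package
devtools::install()
# Load the package into the workspace
library(scGPS)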
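The clean-up step described above (removing stale `*.o` and `*.so` files from `scGPS/src` before recompiling) can be sketched in base R. The helper name `remove_compiled_artifacts` and the temporary directory standing in for `scGPS/src` are assumptions made for illustration; neither is part of scGPS.

```r
# Hypothetical helper (not part of scGPS): delete precompiled object files
# (*.o, *.so) from a source directory and return the names of removed files.
remove_compiled_artifacts <- function(src_dir) {
  stale <- list.files(src_dir, pattern = "\\.(o|so)$", full.names = TRUE)
  removed <- file.remove(stale)
  basename(stale[removed])
}

# Demonstration on a temporary directory that stands in for scGPS/src.
demo_src <- tempfile("scGPS_src_demo")
dir.create(demo_src)
invisible(file.create(file.path(demo_src,
                                c("distance.cpp", "distance.o", "scGPS.so"))))

removed_files <- remove_compiled_artifacts(demo_src)
remaining_files <- list.files(demo_src)
print(removed_files)
print(remaining_files)
```

On a real checkout, the same call would be pointed at the `src` directory of the cloned repository before running `devtools::install()`.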

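2. A simple workflow of scGPS:

The purpose of this workflow is to solve the following task:

2.1 Create scGPS objects

2.2 Run prediction

2.3 Summarise results

3. A complete workflow of scGPS:

The purpose of this workflow is to solve the following task:

3.1 Identify clusters in a dataset using CORE

(skip this step if clusters are known)

3.2 Identify clusters in a dataset using SCORE (Stable Clustering at Optimal REsolution)

(skip this step if clusters are known)

(SCORE aims to obtain stable subpopulation results by introducing bagging aggregation and bootstrapping into the CORE algorithm)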
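The idea of stabilising cluster assignments through bootstrapping and aggregation can be illustrated with a small base-R sketch. This is not SCORE's implementation: the toy matrix `toy_expr`, the use of `hclust` with Ward linkage, the fixed `k = 2`, and the co-clustering frequency as a stability measure are all assumptions chosen only to show the general technique.

```r
set.seed(1)

# Toy data: 60 "cells" x 5 "genes", drawn from two well-separated groups.
toy_expr <- rbind(
  matrix(rnorm(30 * 5, mean = 0), nrow = 30),
  matrix(rnorm(30 * 5, mean = 6), nrow = 30)
)
n_cells <- nrow(toy_expr)
n_boot  <- 20

# together[i, j]: number of resamples in which cells i and j share a cluster.
# sampled[i, j]:  number of resamples in which both cells were drawn.
together <- matrix(0, n_cells, n_cells)
sampled  <- matrix(0, n_cells, n_cells)

for (b in seq_len(n_boot)) {
  # Bootstrap resample of cells (duplicates collapsed to unique indices).
  idx <- sort(unique(sample(n_cells, replace = TRUE)))
  tree <- hclust(dist(toy_expr[idx, ]), method = "ward.D2")
  cl <- cutree(tree, k = 2)
  same <- outer(cl, cl, "==")
  together[idx, idx] <- together[idx, idx] + same
  sampled[idx, idx]  <- sampled[idx, idx] + 1
}

# Aggregate: co-clustering frequency across resamples.
stability <- together / sampled
stability[sampled == 0] <- NA  # pairs never drawn together

within_group  <- mean(stability[1:30, 1:30], na.rm = TRUE)
between_group <- mean(stability[1:30, 31:60], na.rm = TRUE)
print(c(within = within_group, between = between_group))
```

Pairs of cells that are grouped together in nearly every resample form a stable cluster; pairs whose frequency is far from both 0 and 1 indicate an unstable boundary.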

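3.3 Visualise all cluster results in all iterations

3.4 Compare clustering results with other dimensionality-reduction methods (e.g. tSNE)

3.5 Find gene markers and annotate clusters

4. Relationship between clusters within one sample or between two samples

The purpose of this workflow is to solve the following task:

4.1 Start the scGPS prediction to find relationships between clusters

4.2 Display summary results for the prediction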

# Get the number of rows for the summary matrix
row_cluster <- length(unique(colData(mixedpop2)[, 1]))

# LDA summary: percentage of cells in each target subpopulation that the
# LDA classifier assigns to the selected cluster (one column per bootstrap)
summary_prediction_lda(LSOLDA_dat = LSOLDA_dat, nPredSubpop = row_cluster)
#>                 V1               V2                                names
#> 1 6.41711229946524 8.02139037433155 LDA for subpop 1 in target mixedpop2
#> 2 92.1428571428571 89.2857142857143 LDA for subpop 2 in target mixedpop2
#> 3  3.7593984962406 6.76691729323308 LDA for subpop 3 in target mixedpop2
#> 4             32.5             47.5 LDA for subpop 4 in target mixedpop2

# Lasso summary: percentage of cells in each target subpopulation that the
# Lasso (ElasticNet) classifier assigns to the selected cluster
summary_prediction_lasso(LSOLDA_dat = LSOLDA_dat, nPredSubpop = row_cluster)
#>                 V1               V2                                      names
#> 1 40.1069518716578 83.4224598930481 ElasticNet for subpop1 in target mixedpop2
#> 2 96.4285714285714 97.1428571428571 ElasticNet for subpop2 in target mixedpop2
#> 3              ...              ... ElasticNet for subpop3 in target mixedpop2
#> 4               75             72.5 ElasticNet for subpop4 in target mixedpop2

# Maximum deviance explained by the model during model training
summary_deviance(object = LSOLDA_dat)
#> $allDeviance
#> [1] "61.37" "82.73"
#>
#> $DeviMax
#>           dat_DE$Dfd          Deviance           DEgenes
#> 1                  0             82.73    genes_cluster1
#> 2                  1             82.73    genes_cluster1
#> 3                  2             82.73    genes_cluster1
#> 4                  3             82.73    genes_cluster1
#> 5                  6             82.73    genes_cluster1
#> 6                  8             82.73    genes_cluster1
#> 7                 11             82.73    genes_cluster1
#> 8                 12             82.73    genes_cluster1
#> 9                 16             82.73    genes_cluster1
#> 10                17             82.73    genes_cluster1
#> 11                19             82.73    genes_cluster1
#> 12                21             82.73    genes_cluster1
#> 13                23             82.73    genes_cluster1
#> 14                25             82.73    genes_cluster1
#> 15                27             82.73    genes_cluster1
#> 16                28             82.73    genes_cluster1
#> 17                32             82.73    genes_cluster1
#> 18                36             82.73    genes_cluster1
#> 19                39             82.73    genes_cluster1
#> 20                41             82.73    genes_cluster1
#> 21                45             82.73    genes_cluster1
#> 22                51             82.73    genes_cluster1
#> 23                56             82.73    genes_cluster1
#> 24                58             82.73    genes_cluster1
#> 25                59             82.73    genes_cluster1
#> 26                60             82.73    genes_cluster1
#> 27                61             82.73    genes_cluster1
#> 28                67             82.73    genes_cluster1
#> 29                71             82.73    genes_cluster1
#> 30 remaining DEgenes remaining DEgenes remaining DEgenes
#>
#> $LassoGenesMax
#> NULL

# Model accuracy on the left-out test set
summary_accuracy(object = LSOLDA_dat)
#> [1] 66.07143 66.07143
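As read from the output above, each row of the prediction summaries gives, per bootstrap, the percentage of cells in one target subpopulation that the classifier assigns to the selected cluster. A minimal base-R sketch of that calculation follows; the vectors `target_subpop` and `predicted_in` are made-up toy inputs, not scGPS output, and the scGPS internals may compute the figure differently.

```r
# Toy inputs (illustrative only, not scGPS output):
# target_subpop: cluster label of each cell in the target sample
# predicted_in:  TRUE if the classifier assigned the cell to the source cluster
target_subpop <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3)
predicted_in  <- c(TRUE, FALSE, FALSE, FALSE,
                   TRUE, TRUE, TRUE, TRUE, FALSE,
                   FALSE, FALSE)

# Percentage of cells per target subpopulation classified as the source cluster.
percent_by_subpop <- tapply(predicted_in, target_subpop,
                            function(x) 100 * mean(x))
print(percent_by_subpop)
```

Repeating this over several bootstrap fits would yield one column per bootstrap, which is the layout of the `V1` and `V2` columns shown above.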

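4.3 Plot the relationship between clusters in one sample

Here we look at an example use case: finding the relationship between clusters within one sample or between two samples.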

# Run prediction for the 4 clusters of mixedpop2
cluster_mixedpop1 <- colData(mixedpop1)[, 1]
cluster_mixedpop2 <- colData(mixedpop2)[, 1]
#cluster_mixedpop2 <- as.numeric(as.vector(colData(mixedpop2)[,1]))

c_selectID <- 1
# Top 200 gene markers distinguishing cluster 1
genes = DEgenes$id[1:200]

LSOLDA_dat1 <- bootstrap_prediction(nboots = 2, mixedpop1 = mixedpop2,
    mixedpop2 = mixedpop2, genes = genes, c_selectID,
    listData = list(),
    cluster_mixedpop1 = cluster_mixedpop2,
    cluster_mixedpop2 = cluster_mixedpop2)

c_selectID <- 2
genes = DEgenes$id[1:200]

LSOLDA_dat2 <- bootstrap_prediction(nboots = 2, mixedpop1 = mixedpop2,
    mixedpop2 = mixedpop2, genes = genes, c_selectID,
    listData = list(),
    cluster_mixedpop1 = cluster_mixedpop2,
    cluster_mixedpop2 = cluster_mixedpop2)

c_selectID <- 3
genes = DEgenes$id[1:200]

LSOLDA_dat3 <- bootstrap_prediction(nboots = 2, mixedpop1 = mixedpop2,
    mixedpop2 = mixedpop2, genes = genes, c_selectID,
    listData = list(),
    cluster_mixedpop1 = cluster_mixedpop2,
    cluster_mixedpop2 = cluster_mixedpop2)

c_selectID <- 4
genes = DEgenes$id[1:200]

LSOLDA_dat4 <- bootstrap_prediction(nboots = 2, mixedpop1 = mixedpop2,
    mixedpop2 = mixedpop2, genes = genes, c_selectID,
    listData = list(),
    cluster_mixedpop1 = cluster_mixedpop2,
    cluster_mixedpop2 = cluster_mixedpop2)

# Prepare table input for the Sankey plot
LASSO_C1S2 <- reformat_LASSO(c_selectID = 1, mp_selectID = 2,
    LSOLDA_dat = LSOLDA_dat1,
    nPredSubpop = length(unique(colData(mixedpop2)[, 1])),
    Nodes_group = "#7570b3")

LASSO_C2S2 <- reformat_LASSO(c_selectID = 2, mp_selectID = 2,
    LSOLDA_dat = LSOLDA_dat2,
    nPredSubpop = length(unique(colData(mixedpop2)[, 1])),
    Nodes_group = "#1b9e77")

LASSO_C3S2 <- reformat_LASSO(c_selectID = 3, mp_selectID = 2,
    LSOLDA_dat = LSOLDA_dat3,
    nPredSubpop = length(unique(colData(mixedpop2)[, 1])),
    Nodes_group = "#e7298a")

LASSO_C4S2 <- reformat_LASSO(c_selectID = 4, mp_selectID = 2,
    LSOLDA_dat = LSOLDA_dat4,
    nPredSubpop = length(unique(colData(mixedpop2)[, 1])),
    Nodes_group = "#00FFFF")

combined <- rbind(LASSO_C1S2, LASSO_C2S2, LASSO_C3S2, LASSO_C4S2)
# Drop rows without a prediction value
combined <- combined[is.na(combined$Value) != TRUE, ]

nboots = 2
# Links: source, target, value
# Nodes: node, node group
combined_D3obj <- list(Nodes = combined[, (nboots + 3):(nboots + 4)],
    Links = combined[, c((nboots + 2):(nboots + 1), ncol(combined))])

library(networkD3)

Node_source <- as.vector(sort(unique(combined_D3obj$Links$Source)))
Node_target <- as.vector(sort(unique(combined_D3obj$Links$Target)))
Node_all <- unique(c(Node_source, Node_target))

# Assign IDs to Source and Target (starting from 0)
Source <- combined_D3obj$Links$Source
Target <- combined_D3obj$Links$Target

for (i in 1:length(Node_all)) {
    Source[Source == Node_all[i]] <- i - 1
    Target[Target == Node_all[i]] <- i - 1
}

combined_D3obj$Links$Source <- as.numeric(Source)
combined_D3obj$Links$Target <- as.numeric(Target)
combined_D3obj$Links$LinkColor <- combined$NodeGroup

# Prepare node information
node_df <- data.frame(Node = Node_all)
node_df$id <- as.numeric(c(0, 1:(length(Node_all) - 1)))

suppressMessages(library(dplyr))
Color <- combined %>% count(Node, color = NodeGroup) %>% select(2)
node_df$color <- Color$color

suppressMessages(library(networkD3))
p1 <- sankeyNetwork(Links = combined_D3obj$Links, Nodes = node_df,
    Value = "Value", NodeGroup = "color", LinkGroup = "LinkColor",
    NodeID = "Node", Source = "Source", Target = "Target", fontSize = 22)
p1

#saveNetwork(p1, file = paste0(path,'Subpopulation_Net.html'))
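The ID-assignment loop above exists because `sankeyNetwork` expects the source and target columns to hold zero-based row positions into the node table. The same mapping can be sketched with `match()` on a toy link table; the data frame `links` and the node names in it are invented for illustration.

```r
# Toy link table (illustrative only): node names as they would appear in the
# Source and Target columns before conversion to numeric IDs.
links <- data.frame(
  Source = c("C1S2", "C1S2", "C2S2"),
  Target = c("C2S2", "C3S2", "C3S2"),
  Value  = c(10, 5, 7),
  stringsAsFactors = FALSE
)

# One entry per distinct node; the position in this vector defines the index.
node_names <- unique(c(sort(unique(links$Source)), sort(unique(links$Target))))
nodes <- data.frame(Node = node_names, stringsAsFactors = FALSE)

# match() returns 1-based positions, so subtract 1 for zero-based indices.
links$SourceID <- match(links$Source, node_names) - 1
links$TargetID <- match(links$Target, node_names) - 1
print(links)
```

Writing the IDs to new columns with `match()` avoids overwriting names in place, which could misfire if a node name happened to look like an already assigned ID.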

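4.4 Plot the relationship between clusters in two samples

Here we look at an example use case: finding the relationship between clusters within one sample or between two samples.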

# Run prediction for 3 clusters
cluster_mixedpop1 <- colData(mixedpop1)[, 1]
cluster_mixedpop2 <- colData(mixedpop2)[, 1]
row_cluster <- length(unique(colData(mixedpop2)[, 1]))

c_selectID <- 1
# Top 200 gene markers distinguishing cluster 1
genes = DEgenes$id[1:200]
LSOLDA_dat1 <- bootstrap_prediction(nboots = 2, mixedpop1 = mixedpop1,
    mixedpop2 = mixedpop2, genes = genes, c_selectID,
    listData = list(),
    cluster_mixedpop1 = cluster_mixedpop1,
    cluster_mixedpop2 = cluster_mixedpop2)

c_selectID <- 2
genes = DEgenes$id[1:200]
LSOLDA_dat2 <- bootstrap_prediction(nboots = 2, mixedpop1 = mixedpop1,
    mixedpop2 = mixedpop2, genes = genes, c_selectID,
    listData = list(),
    cluster_mixedpop1 = cluster_mixedpop1,
    cluster_mixedpop2 = cluster_mixedpop2)

c_selectID <- 3
genes = DEgenes$id[1:200]
LSOLDA_dat3 <- bootstrap_prediction(nboots = 2, mixedpop1 = mixedpop1,
    mixedpop2 = mixedpop2, genes = genes, c_selectID,
    listData = list(),
    cluster_mixedpop1 = cluster_mixedpop1,
    cluster_mixedpop2 = cluster_mixedpop2)

# Prepare table input for the Sankey plot
LASSO_C1S1 <- reformat_LASSO(c_selectID = 1, mp_selectID = 1,
    LSOLDA_dat = LSOLDA_dat1, nPredSubpop = row_cluster,
    Nodes_group = "#7570b3")

LASSO_C2S1 <- reformat_LASSO(c_selectID = 2, mp_selectID = 1,
    LSOLDA_dat = LSOLDA_dat2, nPredSubpop = row_cluster,
    Nodes_group = "#1b9e77")

LASSO_C3S1 <- reformat_LASSO(c_selectID = 3, mp_selectID = 1,
    LSOLDA_dat = LSOLDA_dat3, nPredSubpop = row_cluster,
    Nodes_group = "#e7298a")

combined <- rbind(LASSO_C2S1, LASSO_C1S1, LASSO_C3S1)
# Drop rows without a prediction value before building the link table, so
# that the links and combined$NodeGroup stay the same length
combined <- combined[is.na(combined$Value) != TRUE, ]

nboots = 2
# Links: source, target, value
# Nodes: node, node group
combined_D3obj <- list(Nodes = combined[, (nboots + 3):(nboots + 4)],
    Links = combined[, c((nboots + 2):(nboots + 1), ncol(combined))])

library(networkD3)

Node_source <- as.vector(sort(unique(combined_D3obj$Links$Source)))
Node_target <- as.vector(sort(unique(combined_D3obj$Links$Target)))
Node_all <- unique(c(Node_source, Node_target))

# Assign IDs to Source and Target (starting from 0)
Source <- combined_D3obj$Links$Source
Target <- combined_D3obj$Links$Target

for (i in 1:length(Node_all)) {
    Source[Source == Node_all[i]] <- i - 1
    Target[Target == Node_all[i]] <- i - 1
}

combined_D3obj$Links$Source <- as.numeric(Source)
combined_D3obj$Links$Target <- as.numeric(Target)
combined_D3obj$Links$LinkColor <- combined$NodeGroup

# Prepare node information
node_df <- data.frame(Node = Node_all)
node_df$id <- as.numeric(c(0, 1:(length(Node_all) - 1)))

suppressMessages(library(dplyr))
n <- length(unique(node_df$Node))
# Set2 provides at most 8 colours; interpolate to n colours
getPalette = colorRampPalette(RColorBrewer::brewer.pal(8, "Set2"))
Color = getPalette(n)
node_df$color <- Color

suppressMessages(library(networkD3))
p1 <- sankeyNetwork(Links = combined_D3obj$Links, Nodes = node_df,
    Value = "Value", NodeGroup = "color", LinkGroup = "LinkColor",
    NodeID = "Node", Source = "Source", Target = "Target", fontSize = 22)
p1

#saveNetwork(p1, file = paste0(path,'Subpopulation_Net.html'))
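The palette step above relies on `colorRampPalette`, which turns a few anchor colours into a function that returns any requested number of interpolated colours. A small base-R sketch, using the three node-group colours from this section as anchors (the choice of six output colours is arbitrary):

```r
# Anchor colours taken from the node groups used in this section.
anchor_cols <- c("#7570b3", "#1b9e77", "#e7298a")

# colorRampPalette() returns a function; calling it with n yields n colours
# interpolated along the anchors.
make_palette <- colorRampPalette(anchor_cols)
node_cols <- make_palette(6)
print(node_cols)
```

This is why the number of nodes `n` can exceed the number of colours in the underlying palette: the extra colours are interpolated rather than recycled.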
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 RC (2022-04-21 r82226)
#>  os       Ubuntu 20.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2022-04-26
#>  pandoc   @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package          * version    date (UTC) lib source
#>  aplot              0.1.3      2022-04-01 [2] CRAN (R 4.2.0)
#>  bit64              4.0.5      2020-08-30 [2] CRAN (R 4.2.0)
#>  bslib              0.3.1      2021-10-06 [2] CRAN (R 4.2.0)
#>  callr              3.7.0      2021-04-20 [2] CRAN (R 4.2.0)
#>  clusterProfiler  * 4.5.0      2022-04-26 [2]
#>  codetools          0.2-18     2020-11-04 [2] CRAN (R 4.2.0)
#>  cowplot            1.1.1      2020-12-30 [2] CRAN (R 4.2.0)
#>  crayon             1.5.1      2022-03-26 [2] CRAN (R 4.2.0)
#>  data.table         1.14.2     2021-09-27 [2] CRAN (R 4.2.0)
#>  DBI                1.1.2      2021-12-20 [2] CRAN (R 4.2.0)
#>  dendextend         1.15.2     2021-10-28 [2] CRAN (R 4.2.0)
#>  devtools           2.4.3                 [2] CRAN (R 4.2.0)
#>  dplyr            * 1.0.8      2022-02-08 [2] CRAN (R 4.2.0)
#>  dynamicTreeCut   * 1.63-      2016-03-11 [2] CRAN (R 4.2.0)
#>  e1071              1.7-9      2021-09-16 [2] CRAN (R 4.2.0)
#>  farver             2.1.0                 [2] CRAN (R 4.2.0)
#>  fastcluster        1.2.3      2021-05-24 [2] CRAN (R 4.2.0)
#>  fastmatch          1.1-3      2021-07-23 [2] CRAN (R 4.2.0)
#>  fs                 1.5.2      2021-12-08 [2] CRAN (R 4.2.0)
#>  future             1.25.0     2022-04-24 [2] CRAN (R 4.2.0)
#>  future.apply       1.9.0      2022-04-25 [2] CRAN (R 4.2.0)
#>  GenomeInfoDbData                         [2] Bioconductor
#>  ggforce            0.3.3      2021-03-05 [2] CRAN (R 4.2.0)
#>  ggplot2          * 3.3.5      2021-06-25 [2] CRAN (R 4.2.0)
#>  ggplotify          0.1.0      2021-09-02 [2] CRAN (R 4.2.0)
#>  ggraph             2.0.5      2021-02-23 [2] CRAN (R 4.2.0)
#>  ggrepel            0.9.1      2021-01-15 [2] CRAN (R 4.2.0)
#>  ggtree             3.5.0      2022-04-26 [2] Bioconductor
#>  globals            0.14.0     2020-11-22 [2] CRAN (R 4.2.0)
#>  glue               1.6.2      2022-02-24 [2] CRAN (R 4.2.0)
#>  GO.db              3.15.0     2022-04-25 [2]
#>  gower              1.0.0      2022-02-03 [2] CRAN (R 4.2.0)
#>  gridExtra          2.3        2017-09-09 [2] CRAN (R 4.2.0)
#>  gridGraphics       0.5-1      2020-12-13 [2] CRAN (R 4.2.0)
#>  hardhat            0.2.0      2022-01-24 [2] CRAN (R 4.2.0)
#>  htmlwidgets        1.5.4      2021-09-08 [2] CRAN (R 4.2.0)
#>  ipred              0.9-12     2021-09-15 [2] CRAN (R 4.2.0)
#>  IRanges          * 2.31.0     2022-04-26 [2] Bioconductor
#>  iterators          1.0.14     2022-02-05 [2] CRAN (R 4.2.0)
#>  knitr              1.38       2022-03-25 [2] CRAN (R 4.2.0)
#>  labeling           0.4.2      2020-10-20 [2] CRAN (R 4.2.0)
#>  lattice          * 0.20-45    2021-09-22 [2] CRAN (R 4.2.0)
#>  lava               1.6.10     2021-09-02 [2] CRAN (R 4.2.0)
#>  lifecycle          1.0.1      2021-09-24 [2] CRAN (R 4.2.0)
#>  listenv            0.8.0      2019-12-05 [2] CRAN (R 4.2.0)
#>  locfit           * 1.5-9.5    2022-03-03 [2] CRAN (R 4.2.0)
#>  magrittr           2.0.3      2022-03-30 [2] CRAN (R 4.2.0)
#>  MatrixGenerics   * 1.9.0      2022-04-26 [2] Bioconductor
#>  ModelMetrics       1.2.2.2    2020-03-17 [2] CRAN (R 4.2.0)
#>  munsell            0.5.0      2018-06-12 [2] CRAN (R 4.2.0)
#>  networkD3        * 0.4        2017-03-18 [2] CRAN (R 4.2.0)
#>  nnet               7.3-17     2022-01-16 [2] CRAN (R 4.2.0)
#>  pkgconfig          2.0.3      2019-09-22 [2] CRAN (R 4.2.0)
#>  pkgload            1.2.4      2021-11-30 [2] CRAN (R 4.2.0)
#>  plyr               1.8.7      2022-03-24 [2] CRAN (R 4.2.0)
#>  polyclip           1.10-0     2019-03-14 [2] CRAN (R 4.2.0)
#>  prettyunits        1.1.1      2020-01-24 [2] CRAN (R 4.2.0)
#>  pROC               1.18.0     2021-09-03 [2] CRAN (R 4.2.0)
#>  prodlim            2019.11.13 2019-11-17 [2] CRAN (R 4.2.0)
#>  proxy              0.4-26     2021-06-07 [2] CRAN (R 4.2.0)
#>  ps                 1.7.0                 [2] CRAN (R 4.2.0)
#>  purrr              0.3.4      2020-04-17 [2] CRAN (R 4.2.0)
#>  R6                 2.5.1      2021-08-19 [2] CRAN (R 4.2.0)
#>  rappdirs           0.3.3      2021-01-31 [2] CRAN (R 4.2.0)
#>  RColorBrewer       1.1-3      2022-04-03 [2] CRAN (R 4.2.0)
#>  Rcpp               1.0.8.3               [2] CRAN (R 4.2.0)
#>  RcppArmadillo      0.11.0.0.0 2022-04-04 [2] CRAN (R 4.2.0)
#>  RcppParallel       5.1.5      2022-01-05 [2] CRAN (R 4.2.0)
#>  RCurl              1.98-1.6   2022-02-08 [2] CRAN (R 4.2.0)
#>  ReactomePA       * 1.40.0                [2] Bioconductor
#>  recipes            0.2.0      2022-02-18 [2] CRAN (R 4.2.0)
#>  remotes            2.4.2      2021-11-30 [2] CRAN (R 4.2.0)
#>  rlang              1.0.2      2022-03-04 [2] CRAN (R 4.2.0)
#>  rpart              4.1.16     2022-01-24 [2] CRAN (R 4.2.0)
#>  rprojroot          2.0.3      2022-04-02 [2] CRAN (R 4.2.0)
#>  Rtsne              0.16       2022-04-17 [2] CRAN (R 4.2.0)
#>  S4Vectors        * 0.35.0     2022-04-26 [2] Bioconductor
#>  sass               0.4.1                 [2] CRAN (R 4.2.0)
#>  scGPS            * 1.11.0     2022-04-26 [1] Bioconductor
#>  stringi            1.7.6      2021-11-29 [2] CRAN (R 4.2.0)
#>  tibble             3.1.6      2021-11-07 [2] CRAN (R 4.2.0)
#>  tidygraph          1.2.1      2022-04-05 [2] CRAN (R 4.2.0)
#>  tidyr              1.2.0      2022-02-01 [2] CRAN (R 4.2.0)
#>  tidytree           0.3.9      2022-03-04 [2] CRAN (R 4.2.0)
#>  timeDate           3043.102   2018-02-21 [2] CRAN (R 4.2.0)
#>  utf8               1.2.2      2021-07-24 [2] CRAN (R 4.2.0)
#>  vctrs              0.4.1                 [2] CRAN (R 4.2.0)
#>  viridis            0.6.2                 [2] CRAN (R 4.2.0)
#>  xfun               0.30       2022-03-02 [2] CRAN (R 4.2.0)
#>  yaml               2.3.5      2022-02-21 [2] CRAN (R 4.2.0)
#>
#>  [1] /tmp/RtmpkjCPYf/Rinst2e0f2214339d01
#>  [2] /home/biocbuild/bbs-3.16-bioc/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────