Introduction to PANTHER.db

The PANTHER.db package provides a select interface to the compiled PANTHER ontology residing in a SQLite database.

PANTHER.db can be installed from Bioconductor using

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("PANTHER.db")

The underlying SQLite database is currently about 500 MB and must be downloaded beforehand with AnnotationHub, as shown below

if (!requireNamespace("AnnotationHub"))
    BiocManager::install("AnnotationHub")
library(AnnotationHub)
ah <- AnnotationHub()
query(ah, "PANTHER.db")[[1]]

Finally, PANTHER.db can be loaded with

library(PANTHER.db)

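If you are already familiar with the select interface, you can learn about the various methods for this object right away by looking at the help page.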

help("PANTHER.db")

When you load the PANTHER.db package, it creates a PANTHER.db object. If you look at this object, you will see some helpful information about it.

PANTHER.db
## PANTHER.db object:
## | Organisms: AMBTC | ANOCA | ANOPHELES | AQUAE | ARABIDOPSIS | ASHGO | ASPFU | BACCR | BACSU | BACTN | BATDJ | BOVINE | BRADI | BRADU | BRAFL | BRANA | BRARP | CAEBR | ... | VITVI | WHEAT | WORM | XANCP | XENOPUS | YARLI | YEAST | YERPE | ZEBRAFISH | ZOSMR
## | PANTHERVERSION: 16.0
## | PANTHERSOURCEURL: ftp.pantherdb.org
## | PANTHERSOURCEDATE: 2021-Feb02
## | package: AnnotationDbi
## | Db type: PANTHER.db
## | DBSCHEMA: PANTHER_DB
## | DBSCHEMAVERSION: 2.1
## | UNIPROT to ENTREZ mapping: 2021-Feb02

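By default, the PANTHER.db object is set to retrieve records from all the organisms supported by http://pantherdb.org. Methods are provided to restrict all queries to a specific organism. To do so, you first need to look up the appropriate identifier for the organism you are interested in. The PANTHER gene ontology is based on the UniProt Reference Proteome set. To display the choices, we provide the helper function availablePthOrganisms, which lists all supported organisms along with their UniProt organism names and taxonomy IDs: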

availablePthOrganisms(PANTHER.db)[1:5, ]
##   AnnotationDbi Species PANTHER Species Genome Source
## 1                 HUMAN           HUMAN  HGNC,Ensembl
## 2                 MOUSE           MOUSE   Ensembl,MGI
## 3                   RAT             RAT   Ensembl,RGD
## 4               CHICKEN           CHICK       Ensembl
## 5             ZEBRAFISH           DANRE  ZFIN,Ensembl
##                  Genome Date UNIPROT Species ID UNIPROT Species Name
## 1 Reference Proteome 2020_04              HUMAN         Homo sapiens
## 2 Reference Proteome 2020_04              MOUSE         Mus musculus
## 3 Reference Proteome 2020_04                RAT    Rattus norvegicus
## 4 Reference Proteome 2020_04              CHICK        Gallus gallus
## 5 Reference Proteome 2020_04              DANRE          Danio rerio
##   UNIPROT Taxon ID
## 1             9606
## 2            10090
## 3            10116
## 4             9031
## 5             7955
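As a minimal sketch of the lookup step, the data frame below copies a few columns of the five rows printed above, and the hypothetical helper taxon_to_pth() (not part of PANTHER.db) maps a UniProt taxon ID to the value in the AnnotationDbi Species column. It assumes that column holds the names pthOrganisms<- accepts; this matches the "HUMAN" example that follows but is an assumption for other organisms. With the database loaded, you would pass availablePthOrganisms(PANTHER.db) as the table argument instead of the hand-copied rows.

```r
# Selected columns of the five rows printed by availablePthOrganisms(PANTHER.db)[1:5, ],
# copied by hand so this sketch runs without the ~500 MB database.
orgs <- data.frame(
  "AnnotationDbi Species" = c("HUMAN", "MOUSE", "RAT", "CHICKEN", "ZEBRAFISH"),
  "UNIPROT Species Name"  = c("Homo sapiens", "Mus musculus", "Rattus norvegicus",
                              "Gallus gallus", "Danio rerio"),
  "UNIPROT Taxon ID"      = c(9606L, 10090L, 10116L, 9031L, 7955L),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper (not part of PANTHER.db): taxon ID -> AnnotationDbi species name.
# Assumes the column names shown in the printed table above.
taxon_to_pth <- function(taxon, table = orgs) {
  rows <- which(table[["UNIPROT Taxon ID"]] == taxon)
  if (length(rows) != 1) stop("taxon ", taxon, " not found or not unique")
  table[["AnnotationDbi Species"]][rows]
}

taxon_to_pth(10090)
# With the database loaded, one might then run (assumption, see text):
# pthOrganisms(PANTHER.db) <- taxon_to_pth(10090, availablePthOrganisms(PANTHER.db))
```

Using which() rather than a plain logical index keeps the lookup from returning spurious NA entries if the taxon column ever contains missing values.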

Once you have learned the PANTHER organism name for the organism of interest, you can restrict the PANTHER.db object to that organism:

pthOrganisms(PANTHER.db) <- "HUMAN"
PANTHER.db
## | PANTHERSOURCEURL: ftp.pantherdb.org
## | PANTHERSOURCEDATE: 2021-Feb02
## | package: AnnotationDbi
## | Db type: PANTHER.db
## | DBSCHEMA: PANTHER_DB
## | DBSCHEMAVERSION: 2.1
## | UNIPROT to ENTREZ mapping: 2021-Feb02
resetPthOrganisms(PANTHER.db)
PANTHER.db
## PANTHER.db object:
## | Organisms: AMBTC | ANOCA | ANOPHELES | AQUAE | ARABIDOPSIS | ASHGO | ASPFU | BACCR | BACSU | BACTN | BATDJ | BOVINE | BRADI | BRADU | BRAFL | BRANA | BRARP | CAEBR | ... | VITVI | WHEAT | WORM | XANCP | XENOPUS | YARLI | YEAST | YERPE | ZEBRAFISH | ZOSMR
## | PANTHERVERSION: 16.0
## | PANTHERSOURCEURL: ftp.pantherdb.org
## | PANTHERSOURCEDATE: 2021-Feb02
## | package: AnnotationDbi
## | Db type: PANTHER.db
## | DBSCHEMA: PANTHER_DB
## | DBSCHEMAVERSION: 2.1
## | UNIPROT to ENTREZ mapping: 2021-Feb02

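As you can see, the organism is now restricted to Homo sapiens, and resetPthOrganisms restores the full organism list. To display all the data that can be returned by a select query, use the columns method: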

columns(PANTHER.db)
##  [1] "CLASS_ID"        "CLASS_TERM"      "COMPONENT_ID"    "COMPONENT_TERM"
##  [5] "CONFIDENCE_CODE" "ENTREZ"          "EVIDENCE"        "EVIDENCE_TYPE"
##  [9] "FAMILY_ID"       "FAMILY_TERM"     "GOSLIM_ID"       "GOSLIM_TERM"
## [13] "PATHWAY_ID"      "PATHWAY_TERM"    "SPECIES"         "SUBFAMILY_TERM"
## [17] "UNIPROT"

Some of these fields can also be used as keytypes:

keytypes(PANTHER.db)
## [1] "CLASS_ID"     "COMPONENT_ID" "ENTREZ"       "FAMILY_ID"    "GOSLIM_ID"
## [6] "PATHWAY_ID"   "SPECIES"      "UNIPROT"
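Which fields are output-only can be checked directly. This base R sketch copies both vectors from the printed output of the PANTHER 16.0 build so it runs without the database; in a live session, columns(PANTHER.db) and keytypes(PANTHER.db) would supply them instead.

```r
# Values copied from the columns(PANTHER.db) and keytypes(PANTHER.db) output above
cols_available <- c("CLASS_ID", "CLASS_TERM", "COMPONENT_ID", "COMPONENT_TERM",
                    "CONFIDENCE_CODE", "ENTREZ", "EVIDENCE", "EVIDENCE_TYPE",
                    "FAMILY_ID", "FAMILY_TERM", "GOSLIM_ID", "GOSLIM_TERM",
                    "PATHWAY_ID", "PATHWAY_TERM", "SPECIES", "SUBFAMILY_TERM",
                    "UNIPROT")
keytypes_available <- c("CLASS_ID", "COMPONENT_ID", "ENTREZ", "FAMILY_ID",
                        "GOSLIM_ID", "PATHWAY_ID", "SPECIES", "UNIPROT")

# Every keytype is also a returnable column ...
all(keytypes_available %in% cols_available)

# ... but these columns can only be returned, not used as keytype
output_only <- setdiff(cols_available, keytypes_available)
output_only
```

In this build the output-only fields are the descriptive *_TERM columns plus the evidence and confidence annotations, so lookups have to start from an ID-type field.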

You can also display all possible keys of a table for any keytype. If no keytype is specified, the FAMILY_ID keys are returned.

go_ids <- head(keys(PANTHER.db, keytype="GOSLIM_ID"))
go_ids
## [1] "GO:0000003" "GO:0000018" "GO:0000027" "GO:0000030" "GO:0000038"
## [6] "GO:0000041"

Finally, you can look up any combination of columns, keytypes, and keys that you need with select or mapIds.

cols <- "CLASS_ID"
res <- mapIds(PANTHER.db, keys=go_ids, column=cols, keytype="GOSLIM_ID", multiVals="list")
lengths(res)
## GO:0000003 GO:0000018 GO:0000027 GO:0000030 GO:0000038 GO:0000041
##         54         10          6          5          8         13
res_inner <- select(PANTHER.db, keys=go_ids, columns=cols, keytype="GOSLIM_ID")
nrow(res_inner)
## [1] 96
tail(res_inner)
##       GOSLIM_ID CLASS_ID
## 1072 GO:0000041  PC00191
## 1073 GO:0000041  PC00149
## 1074 GO:0000041  PC00068
## 1082 GO:0000041  PC00003
## 1083 GO:0000041  PC00262
## 1084 GO:0000041  PC00176
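The two results describe the same matches in different shapes: the per-key counts from lengths(res) add up to 96, the number of rows in res_inner. The base R sketch below reproduces that arithmetic, then uses split() on a small hypothetical long-format table to mimic the list shape of multiVals = "list" and the first-match behaviour of multiVals = "first". It illustrates the idea only and is not how mapIds is implemented.

```r
# Per-key counts printed by lengths(res) above
counts <- c("GO:0000003" = 54, "GO:0000018" = 10, "GO:0000027" = 6,
            "GO:0000030" = 5,  "GO:0000038" = 8,  "GO:0000041" = 13)
total <- sum(counts)
total  # 96, matching nrow(res_inner) above

# Hypothetical long-format table shaped like select() output (IDs are made up)
long <- data.frame(
  GOSLIM_ID = c("GO:A", "GO:A", "GO:A", "GO:B"),
  CLASS_ID  = c("PC1",  "PC2",  "PC3",  "PC2"),
  stringsAsFactors = FALSE
)

# One list element per key, analogous to mapIds(..., multiVals = "list")
as_list <- split(long$CLASS_ID, long$GOSLIM_ID)
lengths(as_list)

# First match per key, analogous to multiVals = "first"
first_hit <- vapply(as_list, function(x) x[[1]], character(1))
first_hit
```

The long table (select) is convenient for merging with other data frames, while the keyed list (mapIds) keeps one entry per input key.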

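By default, all tables are joined to the central table of PANTHER family IDs with an inner join, so every row without an associated PANTHER family ID is removed from the output. To keep all results associated with PANTHER family IDs, including those with no value in the requested column, set the jointype argument of select to "left":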

res_left <- select(PANTHER.db, keys=go_ids, columns=cols, keytype="GOSLIM_ID", jointype="left")
nrow(res_left)
## [1] 1705
tail(res_left)
## 1702 GO:0000041 PTHR45820:SF2 <NA>
## 1703 GO:0000041 PTHR45820:SF3 <NA>
## 1704 GO:0000041 PTHR45820:SF5 <NA>
## 1705 GO:0000041 PTHR45820:SF5 <NA>
## 1705 GO:0000041 PTHR45820:SF6 <NA>
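The difference between the two join types can be reproduced with base R merge() on two small hypothetical tables: a central table of family IDs and a class table in which one family has no class assignment. This is a sketch of standard inner versus left join semantics, not of the SQL that PANTHER.db runs internally; the family and class IDs are invented.

```r
# Hypothetical central table: family IDs annotated with one GO slim term
families <- data.frame(
  FAMILY_ID = c("FAM1", "FAM2", "FAM3"),
  GOSLIM_ID = c("GO:X", "GO:X", "GO:X"),
  stringsAsFactors = FALSE
)
# Hypothetical protein-class table: FAM3 has no class assignment
classes <- data.frame(
  FAMILY_ID = c("FAM1", "FAM2"),
  CLASS_ID  = c("PC1", "PC2"),
  stringsAsFactors = FALSE
)

inner <- merge(families, classes, by = "FAMILY_ID")                # inner join
left  <- merge(families, classes, by = "FAMILY_ID", all.x = TRUE)  # left join

nrow(inner)  # FAM3 is dropped
nrow(left)   # FAM3 is kept, with CLASS_ID = NA
left[is.na(left$CLASS_ID), ]
```

This mirrors the pattern in the output above, where the left join returns many more rows and the extra ones carry <NA> in the requested column.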

To obtain the tree structure of the PANTHER protein class ontology, use the traverseClassTree method:

term <- "PC00209"
select(PANTHER.db, term, "CLASS_TERM", "CLASS_ID")
## [1] CLASS_ID   CLASS_TERM
## <0 rows> (or 0-length row.names)
ancestors <- traverseClassTree(PANTHER.db, term, scope="ANCESTOR")
select(PANTHER.db, ancestors, "CLASS_TERM", "CLASS_ID")
## [1] CLASS_ID   CLASS_TERM
## <0 rows> (or 0-length row.names)
parents <- traverseClassTree(PANTHER.db, term, scope="PARENT")
select(PANTHER.db, parents, "CLASS_TERM", "CLASS_ID")
## [1] CLASS_ID   CLASS_TERM
## <0 rows> (or 0-length row.names)
children <- traverseClassTree(PANTHER.db, term, scope="CHILD")
select(PANTHER.db, children, "CLASS_TERM", "CLASS_ID")
## [1] CLASS_ID   CLASS_TERM
## <0 rows> (or 0-length row.names)
offspring <- traverseClassTree(PANTHER.db, term, scope="OFFSPRING")
select(PANTHER.db, offspring, "CLASS_TERM", "CLASS_ID")
## [1] CLASS_ID   CLASS_TERM
## <0 rows> (or 0-length row.names)
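The four scope values differ in direction (towards the root or towards the leaves) and in depth (one step or the full transitive closure). A base R sketch on a hypothetical edge list makes this concrete; scope_lookup() and the class IDs are invented for illustration and are not the package's implementation.

```r
# Hypothetical class tree as a parent -> child edge list (not real PANTHER IDs)
edges <- data.frame(
  parent = c("ROOT", "ROOT", "A",  "A",  "A1"),
  child  = c("A",    "B",    "A1", "A2", "A1a"),
  stringsAsFactors = FALSE
)

# One-step lookups in each direction
parents_of  <- function(ids) unique(edges$parent[edges$child %in% ids])
children_of <- function(ids) unique(edges$child[edges$parent %in% ids])

# Apply a one-step lookup repeatedly until no new terms appear
closure <- function(ids, step) {
  found <- character(0)
  frontier <- step(ids)
  while (length(frontier) > 0) {
    found <- union(found, frontier)
    frontier <- setdiff(step(frontier), found)
  }
  found
}

# Hypothetical stand-in for the four scopes used above
scope_lookup <- function(term, scope = c("PARENT", "CHILD", "ANCESTOR", "OFFSPRING")) {
  scope <- match.arg(scope)
  switch(scope,
    PARENT    = parents_of(term),
    CHILD     = children_of(term),
    ANCESTOR  = closure(term, parents_of),
    OFFSPRING = closure(term, children_of)
  )
}

scope_lookup("A", "PARENT")      # "ROOT"
scope_lookup("A", "CHILD")       # "A1" "A2"
scope_lookup("A1a", "ANCESTOR")  # "A1" "A" "ROOT"
scope_lookup("A", "OFFSPRING")   # "A1" "A2" "A1a"
```

Tracking already-found terms in closure() also guards against revisiting nodes if the hierarchy is a DAG rather than a strict tree.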

SessionInfo

sessionInfo()
## R Under development (unstable) (2021-01-20 r79850)
## Platform: x86_64-apple-darwin17.7.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS:   /Users/ka36530_ca/R-stuff/bin/R-devel/lib/libRblas.dylib
## LAPACK: /Users/ka36530_ca/R-stuff/bin/R-devel/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets
## [8] methods   base
##
## other attached packages:
##  [1] PANTHER.db_1.0.11    RSQLite_2.2.3        AnnotationHub_2.23.2
##  [4] BiocFileCache_1.15.1 dbplyr_2.1.0         AnnotationDbi_1.53.1
##  [7] IRanges_2.25.6       S4Vectors_0.29.7     Biobase_2.51.0
## [10] BiocGenerics_0.37.1  BiocStyle_2.19.1
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.6          png_0.1-7
##  [3] Biostrings_2.59.2   assertthat_0.2.1
##  [7] mime_0.10           R6_2.5.0
##  [9] evaluate_0.14       httr_1.4.2
## [11] pillar_1.5.0        zlibbioc_1.37.0
## [13] rlang_0.4.10        curl_4.3
## [15] jquerylib_0.1.3     blob_1.2.1
## [17] rmarkdown_2.7       stringr_1.4.0
## [19] compiler_4.1.0      httpuv_1.5.5
## [23] xfun_0.21           pkgconfig_2.0.3
## [25] htmltools_0.5.1.1   tibble_3.1.0
## [39] lifecycle_1.0.0     DBI_1.1.1
## [41] magrittr_2.0.1      stringi_1.5.3
## [43] cachem_1.0.4        XVector_0.31.1
## [45] promises_1.2.0.1    bslib_0.2.4
## [47] ellipsis_0.3.1      filelock_1.0.2
## [49] generics_0.1.0      vctrs_0.3.6
## [51] tools_4.1.0         bit64_4.0.5
## [53] glue_1.4.2          purrr_0.3.4
## [55] BiocVersion_3.13.1  fastmap_1.1.0
## [57] yaml_2.2.1          BiocManager_1.30.10
## [59] memoise_2.0.0       knitr_1.31
## [61] sass_0.3.1