library(MungeSumstats)
MungeSumstats now offers high-throughput query and import functionality for data from the MRC IEU Open GWAS Project.
#### Search for datasets ####
metagwas <- MungeSumstats::find_sumstats(traits = c("parkinson","alzheimer"),
                                         min_sample_size = 1000)
head(metagwas,3)
ids <- (dplyr::arrange(metagwas, nsnp))$id
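##          id               trait group_name year    author
## 1 ieu-a-298 Alzheimer's disease     public 2013   Lambert
## 2   ieu-b-2 Alzheimer's disease     public 2019 Kunkle BW
## 3 ieu-a-297 Alzheimer's disease     public 2013   Lambert
##   consortium
## 1 IGAP
## 2 Alzheimer Disease Genetics Consortium (ADGC), European Alzheimer's Disease Initiative (EADI), Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (CHARGE), Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer's Disease Consortium
## 3 IGAP
##                 sex population     unit     nsnp sample_size       build
## 1 Males and Females   European log odds    11633       74046 HG19/GRCh37
## 2 Males and Females   European       NA 10528610       63926 HG19/GRCh37
## 3 Males and Females   European log odds  7055882       54162 HG19/GRCh37
##   category                subcategory ontology mr priority     pmid sd
## 1  Disease Psychiatric / neurological       NA  1        1 24162737 NA
## 2   Binary Psychiatric / neurological       NA  1        0 30820047 NA
## 3  Disease Psychiatric / neurological       NA  1        2 24162737 NA
##                                                           note ncase
## 1 Exposure; Effect allele frequency missing; forward(+) strand 25580
## 2                                                           NA 21982
## 3           Effect allele frequency missing; forward(+) strand 17008
##   ncontrol     N
## 1    48466 74046
## 2    41944 63926
## 3    37154 54162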
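The `dplyr::arrange()` call above simply sorts the metadata table by SNP count before pulling the IDs. As a minimal sketch of the same step in base R, the block below uses a hypothetical `metagwas_demo` data frame holding only the `id`, `nsnp` and `N` values from the three rows printed above (the real `find_sumstats()` result has more columns and rows), so it runs without querying OpenGWAS. The 60,000-sample cut-off is an arbitrary illustration, not a MungeSumstats default.

```r
# Hypothetical stand-in for the `metagwas` table returned by find_sumstats(),
# keeping only the id, nsnp and N values printed above.
metagwas_demo <- data.frame(
  id   = c("ieu-a-298", "ieu-b-2", "ieu-a-297"),
  nsnp = c(11633, 10528610, 7055882),
  N    = c(74046, 63926, 54162),
  stringsAsFactors = FALSE
)

# Base-R equivalent of (dplyr::arrange(metagwas, nsnp))$id:
# order rows by ascending SNP count, then take the id column.
ids_demo <- metagwas_demo$id[order(metagwas_demo$nsnp)]
print(ids_demo)

# Optional extra filter (threshold chosen only for illustration):
# keep datasets with at least 60,000 samples, in table order.
big_ids <- metagwas_demo$id[metagwas_demo$N >= 60000]
print(big_ids)
```

Using `order()` instead of `dplyr::arrange()` avoids the dplyr dependency; either way the result is a plain character vector of IDs, which is the form passed to `import_sumstats(ids = ...)` later in this vignette.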
You can supply import_sumstats() with as many OpenGWAS IDs as you want, but to save time we only give it one here.
datasets <- MungeSumstats::import_sumstats(ids = "ieu-a-298", ref_genome = "GRCH37")
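By default, import_sumstats() returns a named list, where the names are the Open GWAS dataset IDs and the items are the respective paths to the formatted summary statistics.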
print(datasets)
## $`ieu-a-298`
## [1] "/tmp/RtmpfsG8K8/ieu-a-298.tsv.gz"
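Because the result is an ordinary named list, base R tools are enough to inspect it. The sketch below assumes a hypothetical `datasets_demo` list in the same shape (one real temporary file and one path that was never written) and uses `file.exists()` to check which IDs actually have a formatted file on disk; it does not call MungeSumstats itself.

```r
# Hypothetical stand-in for the named list returned by import_sumstats():
# names are OpenGWAS IDs, values are paths to the formatted files.
tmp_path <- tempfile(fileext = ".tsv.gz")
invisible(file.create(tmp_path))  # plays the role of ".../ieu-a-298.tsv.gz"
datasets_demo <- list(
  "ieu-a-298" = tmp_path,
  "ieu-b-2"   = tempfile("never_created_", fileext = ".tsv.gz")
)

# Named logical vector: does each dataset's file exist on disk?
present <- vapply(datasets_demo, file.exists, logical(1))
print(present)

# Keep only the IDs whose files exist.
ok_ids <- names(datasets_demo)[present]
print(ok_ids)
```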
You can also easily convert this to a data.frame.
results_df <- data.frame(id=names(datasets), path=unlist(datasets))
print(results_df)
##                  id                             path
## ieu-a-298 ieu-a-298 /tmp/RtmpfsG8K8/ieu-a-298.tsv.gz
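In the output above the ID appears twice because `unlist()` keeps the list names and `data.frame()` uses them as row names. If plain row numbers are preferred, a minimal variant is to pass `use.names = FALSE`. The hypothetical `datasets_demo` list below reuses the path printed above so the snippet runs on its own.

```r
# Hypothetical named list shaped like the import_sumstats() result.
datasets_demo <- list("ieu-a-298" = "/tmp/RtmpfsG8K8/ieu-a-298.tsv.gz")

# unlist() normally keeps the list names, which data.frame() turns into
# row names; use.names = FALSE leaves default integer row names instead.
results_df <- data.frame(
  id   = names(datasets_demo),
  path = unlist(datasets_demo, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(results_df)
```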
Optional: speed up downloads with multi-threaded downloading via axel.
datasets <- MungeSumstats::import_sumstats(ids = ids, vcf_download = TRUE, download_method = "axel", nThread = max(2,future::availableCores()-2))
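The `nThread` expression reserves two cores for the rest of the system while never requesting fewer than two threads. A sketch of the same rule as a small helper is below; `pick_nthread()` is a hypothetical name, and base R's `parallel::detectCores()` is swapped in for `future::availableCores()` so the snippet needs no extra packages. Note that `detectCores()` reports the machine's cores and, unlike `availableCores()`, does not account for limits set by a job scheduler or R options.

```r
# Hypothetical helper mirroring max(2, future::availableCores() - 2):
# leave `reserve` cores free, but never go below `min_threads`.
pick_nthread <- function(n_cores, reserve = 2, min_threads = 2) {
  max(min_threads, n_cores - reserve)
}

# Base-R core count; detectCores() can return NA on some platforms,
# so fall back to 1 in that case.
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1
print(pick_nthread(n_cores))
```

As with the original expression, a single-core machine still gets two threads, since the lower bound wins.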
See the Getting started vignette for more information on how to use MungeSumstats and its functionality.
utils::sessionInfo()
## R Under development (unstable) (2022-12-10 r83428)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_GB              LC_COLLATE=C
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
## 
## other attached packages:
## [1] MungeSumstats_1.7.12 BiocStyle_2.27.0
## 
## loaded via a namespace (and not attached):
##   [1] tidyselect_1.2.0
##   [2] dplyr_1.0.10
##   [3] blob_1.2.3
##   [4] filelock_1.0.2
##   [5] R.utils_2.12.2
##   [6] Biostrings_2.67.0
##   [7] bitops_1.0-7
##   [8] fastmap_1.1.0
##   [9] RCurl_1.98-1.9
##  [10] BiocFileCache_2.7.1
##  [11] VariantAnnotation_1.45.0
##  [12] GenomicAlignments_1.35.0
##  [13] XML_3.99-0.13
##  [14] digest_0.6.31
##  [15] lifecycle_1.0.3
##  [16] ellipsis_0.3.2
##  [17] KEGGREST_1.39.0
##  [18] RSQLite_2.2.19
##  [19] googleAuthR_2.0.0
##  [20] magrittr_2.0.3
##  [21] compiler_4.3.0
##  [22] rlang_1.0.6
##  [23] sass_0.4.4
##  [24] progress_1.2.2
##  [25] tools_4.3.0
##  [26] utf8_1.2.2
##  [27] yaml_2.3.6
##  [28] data.table_1.14.6
##  [29] rtracklayer_1.59.0
##  [30] knitr_1.41
##  [31] prettyunits_1.1.1
##  [32] curl_4.3.3
##  [33] bit_4.0.5
##  [34] DelayedArray_0.25.0
##  [35] xml2_1.3.3
##  [36] BiocParallel_1.33.7
##  [37] BiocGenerics_0.45.0
##  [38] R.oo_1.25.0
##  [39] grid_4.3.0
##  [40] stats4_4.3.0
##  [41] fansi_1.0.3
##  [42] biomaRt_2.55.0
##  [43] SummarizedExperiment_1.29.1
##  [44] cli_3.5.0
##  [45] rmarkdown_2.19
##  [46] crayon_1.5.2
##  [47] generics_0.1.3
##  [48] BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1
##  [49] httr_1.4.4
##  [50] rjson_0.2.21
##  [51] DBI_1.1.3
##  [52] cachem_1.0.6
##  [53] stringr_1.5.0
##  [54] zlibbioc_1.45.0
##  [55] assertthat_0.2.1
##  [56] parallel_4.3.0
##  [57] AnnotationDbi_1.61.0
##  [58] BiocManager_1.30.19
##  [59] XVector_0.39.0
##  [60] restfulr_0.0.15
##  [61] matrixStats_0.63.0
##  [62] vctrs_0.5.1
##  [63] Matrix_1.5-3
##  [64] jsonlite_1.8.4
##  [65] bookdown_0.31
##  [66] IRanges_2.33.0
##  [67] hms_1.1.2
##  [68] S4Vectors_0.37.3
##  [69] bit64_4.0.5
##  [70] GenomicFiles_1.35.0
##  [71] GenomicFeatures_1.51.4
##  [72] jquerylib_0.1.4
##  [73] glue_1.6.2
##  [74] codetools_0.2-18
##  [75] stringi_1.7.8
##  [76] GenomeInfoDb_1.35.8
##  [77] BiocIO_1.9.1
##  [78] GenomicRanges_1.51.4
##  [79] tibble_3.1.8
##  [80] pillar_1.8.1
##  [81] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.22
##  [82] rappdirs_0.3.3
##  [83] htmltools_0.5.4
##  [84] GenomeInfoDbData_1.2.9
##  [85] BSgenome_1.67.1
##  [86] R6_2.5.1
##  [87] dbplyr_2.2.1
##  [88] evaluate_0.19
##  [89] lattice_0.20-45
##  [90] Biobase_2.59.0
##  [91] highr_0.9
##  [92] R.methodsS3_1.8.2
##  [93] png_0.1-8
##  [94] Rsamtools_2.15.0
##  [95] gargle_1.2.1
##  [96] memoise_2.0.1
##  [97] bslib_0.4.2
##  [98] Rcpp_1.0.9
##  [99] xfun_0.36
## [100] fs_1.5.2
## [101] MatrixGenerics_1.11.0
## [102] pkgconfig_2.0.3