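1 Introduction

Genomic regions resulting from next-generation sequencing experiments and bioinformatics pipelines are more meaningful when annotated to genomic features. A SNP occurring in an exon or an enhancer is likely of greater interest than one occurring in an intergenic region. It may be of interest to find that a particular transcription factor overwhelmingly binds in promoters, while another binds mostly in 3'UTRs. Hypermethylation at promoters containing a CpG island may indicate different regulatory regimes in one condition compared to another.

annotatr provides genomic annotations and a set of functions to read, intersect, summarize, and visualize genomic regions in the context of genomic annotations.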

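2 Installation

The release version of annotatr is available via Bioconductor, and can be installed as follows:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("annotatr")

The development version of annotatr can be obtained via the GitHub repository or Bioconductor. It is easiest to install development versions with the devtools package, as follows:

devtools::install_github('rcavalcante/annotatr')

Changelogs for development releases are detailed on GitHub releases.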

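3 Annotations

There are three types of annotations available to annotatr:

Built-in annotations, including CpG annotations, genic annotations, enhancers, GENCODE lncRNAs, and chromatin states from chromHMM. Base data for each of the annotations is retrieved and processed in some way. See each section below for details on data source and processing.
AnnotationHub annotations, which include any GRanges resource within the Bioconductor AnnotationHub web resource.
Custom annotations provided by the user.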

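3.1 CpG Annotations

The CpG islands are the basis for all CpG annotations, and are given by the AnnotationHub package for the given organism. CpG shores are defined as 2Kb upstream/downstream from the ends of the CpG islands, less the CpG islands. CpG shelves are defined as another 2Kb upstream/downstream of the farthest upstream/downstream limits of the CpG shores, less the CpG islands and CpG shores. The remaining genomic regions make up the inter-CGI annotation.

CpG annotations are available for hg19, hg38, mm9, mm10, rn4, rn5, and rn6.

Schematic of CpG annotations.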
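The shore and shelf definitions above amount to simple interval arithmetic. The following is a minimal base R sketch for a single, hypothetical island; the coordinates and variable names are assumptions made for illustration, and it ignores chromosome ends and neighboring islands, which a real implementation would have to handle.

```r
# Hypothetical CpG island (1-based, inclusive coordinates).
island <- c(start = 10000, end = 11000)
flank_width <- 2000  # 2Kb, as in the definitions above

# Shores: the 2Kb immediately upstream and downstream of the island.
shores <- data.frame(
  side  = c("upstream", "downstream"),
  start = c(island[["start"]] - flank_width, island[["end"]] + 1),
  end   = c(island[["start"]] - 1, island[["end"]] + flank_width)
)

# Shelves: a further 2Kb beyond the outer limit of each shore.
shelves <- data.frame(
  side  = c("upstream", "downstream"),
  start = c(shores$start[1] - flank_width, shores$end[2] + 1),
  end   = c(shores$start[1] - 1, shores$end[2] + flank_width)
)

print(shores)
print(shelves)
```

Everything outside the island, its shores, and its shelves would fall into the inter-CGI annotation.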

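3.2 Genic Annotations

The genic annotations are determined by functions from GenomicFeatures and data from the TxDb.* and org.*.eg.db packages. Genic annotations include 1-5Kb upstream of the TSS, the promoter (< 1Kb upstream of the TSS), 5'UTR, first exons, exons, introns, CDS, 3'UTR, and intergenic regions (the intergenic regions exclude the previous list of annotations). The schematic below illustrates the relationship between the different annotations as extracted from the TxDb.* packages via GenomicFeatures functions.

Schematic of knownGene annotations.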
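The promoter and 1-5Kb definitions depend on strand, since "upstream" points to lower coordinates on the + strand and higher coordinates on the - strand. The sketch below is a base R illustration only: the helper `upstream_windows()` and the TSS position are hypothetical, and excluding the TSS base itself from the promoter window is an assumption.

```r
# Hypothetical helper: promoter (< 1Kb upstream) and 1-5Kb upstream windows
# for one TSS, as 1-based inclusive c(start, end) pairs.
upstream_windows <- function(tss, strand) {
  stopifnot(strand %in% c("+", "-"))
  if (strand == "+") {
    promoter  <- c(tss - 1000, tss - 1)
    one_to_5k <- c(tss - 5000, tss - 1001)
  } else {
    promoter  <- c(tss + 1, tss + 1000)
    one_to_5k <- c(tss + 1001, tss + 5000)
  }
  list(promoter = promoter, one_to_5kb = one_to_5k)
}

# Assumed TSS position on the + strand, for illustration only.
w <- upstream_windows(tss = 11987, strand = "+")
print(w)
```

With this assumed TSS the windows are 1000 bp and 4000 bp wide, the same widths as the promoter and 1to5kb ranges that appear in the annotated output of Section 4.2.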

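Also included in genic annotations are intron/exon and exon/intron boundaries. These annotations are 200bp upstream/downstream of any boundary between an exon and an intron. It is important to note that the boundaries are defined with respect to the strand of the gene.

Non-intergenic gene annotations include Entrez ID and gene symbol information where it exists. The org.*.eg.db packages for the appropriate organisms are used to provide gene IDs and gene symbols.

The genic annotations have populated tx_id, gene_id, and symbol columns. Respectively, they are the knownGene transcript name, Entrez Gene ID, and gene symbol.

Genic annotations are available for hg19, hg38, mm9, mm10, rn4, rn5, rn6, dm3, and dm6.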

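3.3 FANTOM5 Permissive Enhancers

FANTOM5 permissive enhancers were determined from bi-directional CAGE transcription as in Andersson et al. (2014), and are downloaded and processed for hg19 and mm9 from the FANTOM5 resource. Using the rtracklayer::liftOver() function, enhancers from hg19 are lifted to hg38, and those from mm9 to mm10.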

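3.4 GENCODE lncRNA transcripts

The long non-coding RNA (lncRNA) annotations are from GENCODE for hg19, hg38, and mm10. The lncRNA transcripts are used, and we plan to include the lncRNA introns/exons at a later date. The lncRNA annotations have populated tx_id, gene_id, and symbol columns. Respectively, they are the Ensembl transcript name, Entrez Gene ID, and gene symbol. The biotypes, as given by the transcript_type field, are stored in the id column.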

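3.5 Chromatin states from chromHMM

Chromatin states determined by chromHMM (Ernst and Kellis (2012)) are available for nine cell lines (Gm12878, H1hesc, Hepg2, Hmec, Hsmm, Huvec, K562, Nhek, and Nhlf) via the UCSC Genome Browser tracks. Annotations for all states can be built using a shortcut like hg19_Gm12878-chromatin, or specific chromatin states can be accessed via codes like hg19_chromatin_Gm12878-StrongEnhancer or hg19_chromatin_Gm12878-Repressed.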
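The naming pattern of these codes can be assembled with plain string operations. This is a base R sketch that only mirrors the two patterns quoted above; the helper `state_code()` is hypothetical, and whether a given code is actually available should be checked against builtin_annotations().

```r
genome <- "hg19"
cell_lines <- c("Gm12878", "H1hesc", "Hepg2", "Hmec", "Hsmm",
                "Huvec", "K562", "Nhek", "Nhlf")

# Shortcut pattern for all states of a cell line: <genome>_<cell line>-chromatin
all_state_shortcuts <- paste0(genome, "_", cell_lines, "-chromatin")

# Hypothetical helper for one state: <genome>_chromatin_<cell line>-<state>
state_code <- function(genome, cell_line, state) {
  paste0(genome, "_chromatin_", cell_line, "-", state)
}

print(all_state_shortcuts)
print(state_code("hg19", "Gm12878", c("StrongEnhancer", "Repressed")))
```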

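3.6 `AnnotationHub` Annotations

The AnnotationHub Bioconductor package is a client for the AnnotationHub web resource. From the package description:

The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be discovered. The resource includes metadata about each resource, e.g., a textual description, tags, and date of modification. The client creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access.

Using the build_ah_annots() function, users can turn any resource of class GRanges into an annotation for use in annotatr. As an example, we create annotations for H3K4me3 ChIP-seq peaks in Gm12878 and H1-hesc cells.

# Create a named vector for the AnnotationHub accession codes
h3k4me3_codes = c('Gm12878' = 'AH23256')
# Fetch ah_codes from AnnotationHub and create annotations annotatr understands
build_ah_annots(genome = 'hg19', ah_codes = h3k4me3_codes, annotation_class = 'H3K4me3')
# The annotations as they appear in annotatr_cache
ah_names = c('hg19_H3K4me3_Gm12878')
print(annotatr_cache$get('hg19_H3K4me3_Gm12878'))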

## GRanges object with 57476 ranges and 5 metadata columns:
##           seqnames              ranges strand |                    id     tx_id
##              <Rle>           <IRanges>  <Rle> |           <character> <logical>
##       [1]     chr1       713208-713477      * |     H3K4me3_Gm12878:1      <NA>
##       [2]     chr1       713874-714056      * |     H3K4me3_Gm12878:2      <NA>
##       [3]     chr1       714474-714750      * |     H3K4me3_Gm12878:3      <NA>
##       [4]     chr1       715069-715388      * |     H3K4me3_Gm12878:4      <NA>
##       [5]     chr1       724097-724311      * |     H3K4me3_Gm12878:5      <NA>
##       ...      ...                 ...    ... .                   ...       ...
##   [57472]     chrX 154996923-154996923      * | H3K4me3_Gm12878:57472      <NA>
##   [57473]     chrX 154997422-154997422      * | H3K4me3_Gm12878:57473      <NA>
##   [57474]     chrX 155100454-155100454      * | H3K4me3_Gm12878:57474      <NA>
##   [57475]     chrX 155148379-155148379      * | H3K4me3_Gm12878:57475      <NA>
##   [57476]     chrX 155227027-155227027      * | H3K4me3_Gm12878:57476      <NA>
##             gene_id    symbol                 type
##           <logical> <logical>          <character>
##       [1]      <NA>      <NA> hg19_H3K4me3_Gm12878
##       [2]      <NA>      <NA> hg19_H3K4me3_Gm12878
##       [3]      <NA>      <NA> hg19_H3K4me3_Gm12878
##       [4]      <NA>      <NA> hg19_H3K4me3_Gm12878
##       [5]      <NA>      <NA> hg19_H3K4me3_Gm12878
##       ...       ...       ...                  ...
##   [57472]      <NA>      <NA> hg19_H3K4me3_Gm12878
##   [57473]      <NA>      <NA> hg19_H3K4me3_Gm12878
##   [57474]      <NA>      <NA> hg19_H3K4me3_Gm12878
##   [57475]      <NA>      <NA> hg19_H3K4me3_Gm12878
##   [57476]      <NA>      <NA> hg19_H3K4me3_Gm12878
##   -------
##   seqinfo: 298 sequences (2 circular) from hg19 genome

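3.7 Custom Annotations

Users may load their own annotations from BED files using the read_annotations() function, which uses the rtracklayer::import() function. The output is a GRanges with mcols() for id, tx_id, gene_id, symbol, and type. If a user wants to include tx_id, gene_id, and/or symbol in their custom annotations, they can be included as extra columns on a BED6 input file.

## These files contain chr, start, and end columns
ezh2_file = system.file(...)  # path arguments are missing from the source
## Custom annotation objects are given names of the form genome_custom_name
read_annotations(con = ezh2_file, genome = 'hg19', name = 'ezh2', format = 'bed')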

## GRanges object with 2472 ranges and 5 metadata columns:
##          seqnames              ranges strand |          id     tx_id   gene_id
##             <Rle>           <IRanges>  <Rle> | <character> <logical> <logical>
##      [1]     chr1       860063-860382      * |      ezh2:1      <NA>      <NA>
##      [2]     chr1       934911-935230      * |      ezh2:2      <NA>      <NA>
##      [3]     chr1     3573321-3573321      * |      ezh2:3      <NA>      <NA>
##      [4]     chr1     6301401-6301401      * |      ezh2:4      <NA>      <NA>
##      [5]     chr1     6301996-6301996      * |      ezh2:5      <NA>      <NA>
##      ...      ...                 ...    ... .         ...       ...       ...
##   [2468]     chrX   99880950-99880950      * |   ezh2:2468      <NA>      <NA>
##   [2469]     chrX 108514101-108514101      * |   ezh2:2469      <NA>      <NA>
##   [2470]     chrX 111981673-111981673      * |   ezh2:2470      <NA>      <NA>
##   [2471]     chrX 118109216-118109216      * |   ezh2:2471      <NA>      <NA>
##   [2472]     chrX 136114771-136114771      * |   ezh2:2472      <NA>      <NA>
##             symbol             type
##          <logical>      <character>
##      [1]      <NA> hg19_custom_ezh2
##      [2]      <NA> hg19_custom_ezh2
##      [3]      <NA> hg19_custom_ezh2
##      [4]      <NA> hg19_custom_ezh2
##      [5]      <NA> hg19_custom_ezh2
##      ...       ...              ...
##   [2468]      <NA> hg19_custom_ezh2
##   [2469]      <NA> hg19_custom_ezh2
##   [2470]      <NA> hg19_custom_ezh2
##   [2471]      <NA> hg19_custom_ezh2
##   [2472]      <NA> hg19_custom_ezh2
##   -------
##   seqinfo: 298 sequences (2 circular) from hg19 genome

To see what is in the annotatr_cache environment, do the following:

print(annotatr_cache$list_env())

## [1] "hg19_H3K4me3_Gm12878" "hg19_custom_ezh2"

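4 Usage

The following example is based on the results of testing for differential methylation of genomic regions between two conditions using methylSig. The file (inst/extdata/IDH2mut_v_NBM_multi_data_chr9.txt.gz) contains chromosome locations, as well as categorical and numerical data columns, and provides a good example of the flexibility of annotatr.

4.1 Reading Genomic Regions

read_regions() uses the rtracklayer::import() function to read in BED files and convert them to GRanges objects. The name and score columns in a normal BED file can be used for categorical and numeric data, respectively. Additionally, an arbitrary number of categorical and numeric data columns can be appended to a BED6 file. The extraCols parameter is used for this purpose, and the rename_name and rename_score parameters allow users to give more descriptive names to these columns.

# This file in inst/extdata represents regions tested for differential
# methylation between two conditions. Additionally, there are columns
# reporting the p-value on the test for differential meth., the
# meth. difference between the two groups, and the group meth. rates.
dm_file = system.file('extdata', 'IDH2mut_v_NBM_multi_data_chr9.txt.gz', package = 'annotatr')
extraCols = c(diff_meth = 'numeric', mu0 = 'numeric', mu1 = 'numeric')
dm_regions = read_regions(con = dm_file, genome = 'hg19', extraCols = extraCols, format = 'bed',
    rename_name = 'DM_status', rename_score = 'pval')
# Use fewer regions to speed things up
dm_regions = dm_regions[1:2000]
print(dm_regions)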

## GRanges object with 2000 ranges and 5 metadata columns:
##          seqnames            ranges strand |   DM_status      pval   diff_meth
##             <Rle>         <IRanges>  <Rle> | <character> <numeric>   <numeric>
##      [1]     chr9       10850-10948      * |        none 0.5045502 -10.7329047
##      [2]     chr9       10950-11048      * |        none 0.2227126   8.7195270
##      [3]     chr9       28950-29048      * |        none 0.5530958   0.0700847
##      [5]     chr9       72950-73048      * |        none 0.1752872  17.7606626
##      ...      ...               ...    ... .         ...       ...         ...
##   [1996]     chr9 35605150-35605150      * |        none  0.274255  -0.0539158
##   [1997]     chr9 35605250-35605250      * |        none  0.918064   0.0329283
##   [1998]     chr9 35605350-35605350      * |        none  0.614312  -0.0977500
##   [1999]     chr9 35605450-35605450      * |        none  1.000000   0.0000000
##   [2000]     chr9 35605550-35605550      * |        none  0.814567   0.0349967
##                mu0        mu1
##          <numeric>  <numeric>
##      [1] 79.981920 90.7148252
##      [2] 86.704015 77.9844878
##      [3]  0.124081  0.0539963
##      [4] 72.455413 27.5800883
##      [5] 28.440368 10.6797057
##      ...       ...        ...
##   [1996]  0.000000  0.0539158
##   [1997]  0.328024  0.2950959
##   [1998]  0.130184  0.2279345
##   [1999]  0.000000  0.0000000
##   [2000]  0.118272  0.0832756
##   -------
##   seqinfo: 298 sequences (2 circular) from hg19 genome
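To make the BED6-plus-extra-columns layout concrete, the following base R sketch writes two hypothetical records to a temporary file and reads them back with typed columns. It does not use read_regions() or rtracklayer; the records, the file, and the column names are assumptions chosen to mirror the extraCols, rename_name, and rename_score arguments above. One detail worth showing is that BED starts are 0-based, so they are shifted by one to obtain 1-based inclusive coordinates.

```r
# Two hypothetical tab-separated records: BED6 (chr, start, end, name, score,
# strand) followed by three extra numeric columns.
bed_lines <- c(
  "chr9\t10849\t10948\tnone\t0.50\t.\t-10.7\t80.0\t90.7",
  "chr9\t10949\t11048\tnone\t0.22\t.\t8.7\t86.7\t78.0"
)
bed_file <- tempfile(fileext = ".bed")
writeLines(bed_lines, bed_file)

# name -> DM_status and score -> pval play the role of the renamed columns.
col_names <- c("chr", "start", "end", "DM_status", "pval", "strand",
               "diff_meth", "mu0", "mu1")
col_classes <- c("character", "integer", "integer", "character", "numeric",
                 "character", "numeric", "numeric", "numeric")

regions <- read.table(bed_file, sep = "\t", col.names = col_names,
                      colClasses = col_classes)

# BED starts are 0-based; convert to 1-based inclusive starts.
regions$start <- regions$start + 1L

print(regions)
```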

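4.2 Annotating Regions

Users may select annotations individually via the accessors listed with builtin_annotations(), use shortcuts, or use custom annotations as described above. The hg19_cpgs shortcut annotates regions to CpG islands, CpG shores, CpG shelves, and inter-CGI. The hg19_basicgenes shortcut annotates regions to 1-5Kb, promoters, 5'UTRs, exons, introns, and 3'UTRs. The shortcuts for the other builtin_genomes() are accessed in a similar way.

annotate_regions() requires a GRanges object (either the result of read_regions() or an existing object), a GRanges object of the annotations, and a logical value indicating whether to ignore.strand when calling GenomicRanges::findOverlaps(). The positive integer minoverlap is also passed to GenomicRanges::findOverlaps() and specifies the minimum overlap required for a region to be assigned to an annotation.

Before annotating regions, the annotations must be built with build_annotations(), which requires a character vector of the desired annotation codes.

# Select annotations for intersection with regions
# Note inclusion of custom annotation, and use of shortcuts
annots = c('hg19_cpgs', 'hg19_basicgenes', 'hg19_genes_intergenic',
    'hg19_genes_intronexonboundaries',
    'hg19_custom_ezh2', 'hg19_H3K4me3_Gm12878')
# Build the annotations (a single GRanges object)
annotations = build_annotations(genome = 'hg19', annotations = annots)
# Intersect the regions we read in with the annotations
dm_annotated = annotate_regions(
    regions = dm_regions,
    annotations = annotations,
    ignore.strand = TRUE,
    quiet = FALSE)
# A GRanges object is returned
print(dm_annotated)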

## GRanges object with 11374 ranges and 6 metadata columns:
##           seqnames            ranges strand |   DM_status      pval diff_meth
##              <Rle>         <IRanges>  <Rle> | <character> <numeric> <numeric>
##       [1]     chr9       10850-10948      * |        none  0.504550 -10.73290
##       [2]     chr9       10850-10948      * |        none  0.504550 -10.73290
##       [3]     chr9       10950-11048      * |        none  0.222713   8.71953
##       [4]     chr9       10950-11048      * |        none  0.222713   8.71953
##       [5]     chr9       10950-11048      * |        none  0.222713   8.71953
##       ...      ...               ...    ... .         ...       ...       ...
##   [11370]     chr9 35605550-35605550      * |        none  0.814567 0.0349967
##   [11371]     chr9 35605550-35605550      * |        none  0.814567 0.0349967
##   [11372]     chr9 35605550-35605550      * |        none  0.814567 0.0349967
##   [11373]     chr9 35605550-35605550      * |        none  0.814567 0.0349967
##   [11374]     chr9 35605550-35605550      * |        none  0.814567 0.0349967
##                 mu0       mu1                     annot
##           <numeric> <numeric>                 <GRanges>
##       [1]   79.9819   90.7148         chr9:6987-10986:+
##       [2]   79.9819   90.7148            chr9:1-24849:*
##       [3]   86.7040   77.9845        chr9:10987-11986:+
##       [4]   86.7040   77.9845         chr9:6987-10986:+
##       [5]   86.7040   77.9845            chr9:1-24849:*
##       ...       ...       ...                       ...
##   [11370]  0.118272 0.0832756  chr9:35605281-35605835:+
##   [11372]  0.118272 0.0832756  chr9:35605281-35605835:+
##   [11373]  0.118272 0.0832756  chr9:35605281-35605835:+
##   [11374]  0.118272 0.0832756  chr9:35603969-35605991:*
##   -------
##   seqinfo: 298 sequences (2 circular) from hg19 genome
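The role of minoverlap can be illustrated with the coordinates printed above. The sketch below is plain base R, not a call to GenomicRanges::findOverlaps(); the helper `overlap_width()` and the threshold of 50 are hypothetical choices for illustration. It computes how many bases the region chr9:10950-11048 shares with the promoter and 1to5kb ranges and applies the threshold.

```r
# Hypothetical helper: number of shared bases between a region and each
# annotation (1-based inclusive coordinates); 0 when they do not overlap.
overlap_width <- function(r_start, r_end, a_start, a_end) {
  pmax(0, pmin(r_end, a_end) - pmax(r_start, a_start) + 1)
}

region <- c(start = 10950, end = 11048)
annots_demo <- data.frame(
  id    = c("promoter", "1to5kb"),
  start = c(10987, 6987),
  end   = c(11986, 10986)
)

annots_demo$overlap <- overlap_width(region[["start"]], region[["end"]],
                                     annots_demo$start, annots_demo$end)

# Assumed threshold for illustration; an overlap of 1 base would keep both.
minoverlap <- 50
annots_demo$kept <- annots_demo$overlap >= minoverlap

print(annots_demo)
```

With a threshold of a single base, both annotations are assigned to the region, as in the output above; a larger threshold keeps only the annotation sharing more bases with the region.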

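The annotate_regions() function returns a GRanges, but it may be more convenient to manipulate a coerced data.frame. For example,

# Coerce to a data.frame
df_dm_annotated = data.frame(dm_annotated)
# See the GRanges column of dm_annotated expanded
print(head(df_dm_annotated))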

##   seqnames start   end width strand DM_status      pval    diff_meth       mu0
## 1     chr9 10850 10948    99      *      none 0.5045502 -10.73290471 79.981920
## 2     chr9 10850 10948    99      *      none 0.5045502 -10.73290471 79.981920
## 3     chr9 10950 11048    99      *      none 0.2227126   8.71952705 86.704015
## 4     chr9 10950 11048    99      *      none 0.2227126   8.71952705 86.704015
## 5     chr9 10950 11048    99      *      none 0.2227126   8.71952705 86.704015
## 6     chr9 28950 29048    99      *      none 0.5530958   0.07008468  0.124081
##          mu1 annot.seqnames annot.start annot.end annot.width annot.strand
## 1 90.7148252           chr9        6987     10986        4000            +
## 2 90.7148252           chr9           1     24849       24849            *
## 3 77.9844878           chr9       10987     11986        1000            +
## 4 77.9844878           chr9        6987     10986        4000            +
## 5 77.9844878           chr9           1     24849       24849            *
## 6  0.0539963           chr9       28923     29077         155            *
##                annot.id annot.tx_id annot.gene_id annot.symbol
## 1          1to5kb:34327  uc011llp.1     100287596      DDX11L5
## 2            inter:8599        <NA>          <NA>         <NA>
## 3        promoter:34327  uc011llp.1     100287596      DDX11L5
## 4          1to5kb:34327  uc011llp.1     100287596      DDX11L5
## 5            inter:8599        <NA>          <NA>         <NA>
## 6 H3K4me3_Gm12878:27530        <NA>          <NA>         <NA>
##             annot.type
## 1    hg19_genes_1to5kb
## 2       hg19_cpg_inter
## 3 hg19_genes_promoters
## 4    hg19_genes_1to5kb
## 5       hg19_cpg_inter
## 6 hg19_H3K4me3_Gm12878

# Subset based on a gene symbol, in this case NOTCH1
notch1_subset = subset(df_dm_annotated, annot.symbol == 'NOTCH1')
print(head(notch1_subset))

##  [1] seqnames       start          end            width          strand
##  [6] DM_status      pval           diff_meth      mu0            mu1
## [11] annot.seqnames annot.start    annot.end      annot.width    annot.strand
## [16] annot.id       annot.tx_id    annot.gene_id  annot.symbol   annot.type
## <0 rows> (or 0-length row.names)

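4.3 Randomizing Regions

Given a set of annotated regions, it is important to know how the annotations compare to those of a randomized set of regions. The randomize_regions() function is a wrapper of regioneR::randomizeRegions() from the regioneR package that creates a set of random regions given a GRanges object. After creating the random set, it must be annotated with annotate_regions() for later use. Only builtin_genomes() can be used in the wrapper function. Downstream functions that support the use of random region annotations are summarize_annotations(), plot_annotation(), and plot_categorical().

It is important to note that if the regions to be randomized have a particular property, for example they are CpGs, the randomize_regions() wrapper will not preserve that property! Instead, we recommend using regioneR::resampleRegions() with universe being the superset of the data regions you want to sample from.

# Randomize the input regions
dm_random_regions = randomize_regions(
    regions = dm_regions,
    allow.overlaps = TRUE,
    per.chromosome = TRUE)
# Annotate the random regions using the same annotations as above
# These will be used in later functions
dm_random_annotated = annotate_regions(
    regions = dm_random_regions,
    annotations = annotations,
    ignore.strand = TRUE,
    quiet = TRUE)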
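The difference between free randomization and resampling from a universe can be sketched without any packages. The example below is a base R illustration of the resampling idea only, not a call to regioneR::resampleRegions(): the universe of CpG-like positions and the number of regions are hypothetical. Because every sampled region is drawn from the universe, it keeps whatever property the universe members share.

```r
set.seed(1)  # make the sampling reproducible

# Hypothetical universe: 20 two-base regions standing in for all tested CpGs.
universe <- data.frame(chr = "chr9",
                       start = seq(1000, by = 500, length.out = 20))
universe$end <- universe$start + 1

# Draw as many regions as were observed, without replacement.
n_observed <- 5
picked <- sort(sample(nrow(universe), size = n_observed, replace = FALSE))
resampled <- universe[picked, ]

print(resampled)
```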

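4.4 Summarizing Over Annotations

When there is no categorical or numerical information associated with the regions, summarize_annotations() is the only available summarization function. It gives the counts of regions in each annotation type (see example below). If there is categorical and/or numerical information, then summarize_numerical() and/or summarize_categorical() may be used. Random region annotations can only be used with summarize_annotations().

# Find the number of regions per annotation type
dm_annsum = summarize_annotations(
    annotated_regions = dm_annotated,
    quiet = TRUE)
print(dm_annsum)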

## # A tibble: 14 × 2
##    annot.type                          n
##    <chr>                           <int>
##  1 hg19_H3K4me3_Gm12878              747
##  2 hg19_cpg_inter                    905
##  3 hg19_cpg_islands                  848
##  4 hg19_cpg_shelves                   46
##  5 hg19_cpg_shores                   341
##  6 hg19_custom_ezh2                    7
##  7 hg19_genes_1to5kb                 257
##  8 hg19_genes_3UTRs                   28
##  9 hg19_genes_5UTRs                  271
## 10 hg19_genes_exons                  483
## 11 hg19_genes_intergenic             557
## 12 hg19_genes_intronexonboundaries   319
## 13 hg19_genes_introns                951
## 14 hg19_genes_promoters              393

# Find the number of regions per annotation type
# and the number of random regions per annotation type
dm_annsum_rnd = summarize_annotations(
    annotated_regions = dm_annotated,
    annotated_random = dm_random_annotated,
    quiet = TRUE)
print(dm_annsum_rnd)

## # A tibble: 28 × 3
## # Groups:   data_type [2]
##    data_type annot.type               n
##    <chr>     <chr>                <int>
##  1 Data      hg19_H3K4me3_Gm12878   747
##  2 Data      hg19_cpg_inter         905
##  3 Data      hg19_cpg_islands       848
##  4 Data      hg19_cpg_shelves        46
##  5 Data      hg19_cpg_shores        341
##  6 Data      hg19_custom_ezh2         7
##  7 Data      hg19_genes_1to5kb      257
##  8 Data      hg19_genes_3UTRs        28
##  9 Data      hg19_genes_5UTRs       271
## 10 Data      hg19_genes_exons       483
## # … with 18 more rows

# Take the mean of the diff_meth column across all regions
# occurring in an annotation.
dm_numsum = summarize_numerical(
    annotated_regions = dm_annotated,
    by = c('annot.type', 'annot.id'),
    over = c('diff_meth'),
    quiet = TRUE)
print(dm_numsum)

## # A tibble: 3,597 × 5
## # Groups:   annot.type [14]
##    annot.type           annot.id                  n    mean      sd
##    <chr>                <chr>                 <int>   <dbl>   <dbl>
##  1 hg19_H3K4me3_Gm12878 H3K4me3_Gm12878:27530     1  0.0701 NA
##  2 hg19_H3K4me3_Gm12878 H3K4me3_Gm12878:27531     8  1.28    3.78
##  3 hg19_H3K4me3_Gm12878 H3K4me3_Gm12878:27532     2 13.4     5.11
##  4 hg19_H3K4me3_Gm12878 H3K4me3_Gm12878:27534    10  0.526   0.975
##  5 hg19_H3K4me3_Gm12878 H3K4me3_Gm12878:27535     8  0.407   0.923
##  6 hg19_H3K4me3_Gm12878 H3K4me3_Gm12878:27543     2 -0.0530  0.0749
##  7 hg19_H3K4me3_Gm12878 H3K4me3_Gm12878:27544    11  0.192   0.427
##  8 hg19_H3K4me3_Gm12878 H3K4me3_Gm12878:27545     2  2.80   10.1
##  9 hg19_H3K4me3_Gm12878 H3K4me3_Gm12878:27549     2 -0.811   1.75
## 10 hg19_H3K4me3_Gm12878 H3K4me3_Gm12878:27555     1 -1.50   NA
## # … with 3,587 more rows

# Count the occurrences of classifications in the DM_status
# column across the annotation types.
dm_catsum = summarize_categorical(
    annotated_regions = dm_annotated,
    by = c('annot.type', 'DM_status'),
    quiet = TRUE)
print(dm_catsum)

## # A tibble: 39 × 3
## # Groups:   annot.type [14]
##    annot.type           DM_status     n
##    <chr>                <chr>     <int>
##  1 hg19_H3K4me3_Gm12878 hyper        78
##  2 hg19_H3K4me3_Gm12878 hypo          8
##  3 hg19_H3K4me3_Gm12878 none        661
##  4 hg19_cpg_inter       hyper        32
##  5 hg19_cpg_inter       hypo         90
##  6 hg19_cpg_inter       none        783
##  7 hg19_cpg_islands     hyper       151
##  8 hg19_cpg_islands     hypo          4
##  9 hg19_cpg_islands     none        693
## 10 hg19_cpg_shelves     hyper         2
## # … with 29 more rows
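The kind of grouping these summaries perform can be imitated on a small table with base R. The sketch below is not the package's implementation: it takes the annot.type and diff_meth values of the six rows printed by head(df_dm_annotated) in Section 4.2, adds hypothetical region labels (r1 to r3), and then counts rows per annotation type and averages diff_meth per annotation type.

```r
# Six annotated rows; values follow the head(df_dm_annotated) output,
# region labels r1..r3 are hypothetical.
annotated <- data.frame(
  region     = c("r1", "r1", "r2", "r2", "r2", "r3"),
  annot.type = c("hg19_genes_1to5kb", "hg19_cpg_inter", "hg19_genes_promoters",
                 "hg19_genes_1to5kb", "hg19_cpg_inter", "hg19_H3K4me3_Gm12878"),
  diff_meth  = c(-10.73290471, -10.73290471, 8.71952705,
                 8.71952705, 8.71952705, 0.07008468),
  stringsAsFactors = FALSE
)

# Counts per annotation type (in the spirit of summarize_annotations()).
counts <- as.data.frame(table(annot.type = annotated$annot.type),
                        responseName = "n")

# Mean of a numeric column per annotation type
# (in the spirit of summarize_numerical()).
means <- aggregate(diff_meth ~ annot.type, data = annotated, FUN = mean)

print(counts)
print(means)
```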

4.5 Plotting

The 5 plot_*() functions described below are to be used on the object returned by annotate_regions(). The plot functions return an object of type ggplot that can be viewed (print), saved (ggsave), or extended with additional ggplot2 code.

4.5.1 Plotting Regions per Annotation

# View the number of regions per annotation. This function
# is useful when there is no classification or data
# associated with the regions.
annots_order = c(
    'hg19_custom_ezh2',
    'hg19_H3K4me3_Gm12878',
    'hg19_genes_1to5kb',
    'hg19_genes_promoters',
    'hg19_genes_5UTRs',
    'hg19_genes_exons',
    'hg19_genes_intronexonboundaries',
    'hg19_genes_introns',
    'hg19_genes_3UTRs',
    'hg19_genes_intergenic')
dm_vs_kg_annotations = plot_annotation(
    annotated_regions = dm_annotated,
    annotation_order = annots_order,
    plot_title = '# of Sites Tested for DM annotated on chr9',
    x_label = 'knownGene Annotations',
    y_label = 'Count')
print(dm_vs_kg_annotations)

Figure 1: Number of DM regions per annotation

The plot_annotation() function can also use the annotated random regions in the annotated_random argument to plot the number of random regions per annotation type next to the number of input data regions.

# View the number of regions per annotation and include the annotation
# of randomized regions
annots_order = c(
    'hg19_custom_ezh2',
    'hg19_H3K4me3_Gm12878',
    'hg19_genes_1to5kb',
    'hg19_genes_promoters',
    'hg19_genes_5UTRs',
    'hg19_genes_exons',
    'hg19_genes_intronexonboundaries',
    'hg19_genes_introns',
    'hg19_genes_3UTRs',
    'hg19_genes_intergenic')
dm_vs_kg_annotations_wrandom = plot_annotation(
    annotated_regions = dm_annotated,
    annotated_random = dm_random_annotated,
    annotation_order = annots_order,
    plot_title = 'Dist. of Sites Tested for DM (with rndm.)',
    x_label = 'Annotations',
    y_label = 'Count')
print(dm_vs_kg_annotations_wrandom)

Figure 2: Number of DM regions per annotation, with random regions

4.5.2 Plotting Regions Occurring in Pairs of Annotations

# View a heatmap of regions occurring in pairs of annotations
annots_order = c(
    'hg19_custom_ezh2',
    'hg19_H3K4me3_Gm12878',
    'hg19_genes_promoters',
    'hg19_genes_5UTRs',
    'hg19_genes_exons',
    'hg19_genes_introns',
    'hg19_genes_3UTRs',
    'hg19_genes_intergenic')
dm_vs_coannotations = plot_coannotations(
    annotated_regions = dm_annotated,
    annotation_order = annots_order,
    axes_label = 'Annotations',
    plot_title = 'Regions in Pairs of Annotations')
print(dm_vs_coannotations)

Figure 3: Number of DM regions per pair of annotations

4.5.3 Plotting Numerical Data Over Regions

With numerical data, the plot_numerical() function plots a single variable (histogram) or two variables (scatterplot) at the region level, faceting over the categorical variable of choice. Two categorical variables can be given to facet over (see below). Note that when the plot is a histogram, the distribution over all regions is plotted within each facet.

dm_vs_regions_annot = plot_numerical(
    annotated_regions = dm_annotated,
    x = 'mu0',
    facet = 'annot.type',
    facet_order = c('hg19_genes_1to5kb', 'hg19_genes_promoters',
        'hg19_genes_5UTRs', 'hg19_genes_3UTRs', 'hg19_custom_ezh2',
        'hg19_genes_intergenic', 'hg19_cpg_islands'),
    bin_width = 5,
    plot_title = 'Group 0 Region Methylation In Genes',
    x_label = 'Group 0')
print(dm_vs_regions_annot)

Figure 4: Methylation rates in regions over DM status in group 0

dm_vs_regions_annot2 = plot_numerical(
    annotated_regions = dm_annotated,
    x = 'diff_meth',
    facet = c('annot.type', 'DM_status'),
    facet_order = list(
        c('hg19_genes_promoters', 'hg19_genes_5UTRs', 'hg19_cpg_islands'),
        c('hyper', 'hypo', 'none')),
    bin_width = 5,
    plot_title = 'Group 0 Region Methylation In Genes',
    x_label = 'Methylation Difference')
print(dm_vs_regions_annot2)

Figure 5: Methylation differences in regions over DM status and annotation type

dm_vs_regions_name = plot_numerical(
    annotated_regions = dm_annotated,
    x = 'mu0',
    y = 'mu1',
    facet = 'annot.type',
    facet_order = c('hg19_genes_1to5kb', 'hg19_genes_promoters',
        'hg19_genes_5UTRs', 'hg19_genes_3UTRs', 'hg19_custom_ezh2',
        'hg19_genes_intergenic', 'hg19_cpg_islands', 'hg19_cpg_shores'),
    plot_title = 'Region Methylation: Group 0 vs Group 1',
    x_label = 'Group 0',
    y_label = 'Group 1')
print(dm_vs_regions_name)

Figure 6: Methylation rates in regions over DM status in group 0 vs. group 1

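The plot_numerical_coannotations() function shows the distribution of numerical data for regions occurring in both of two annotations, as well as for regions occurring in only one or the other. For example, the following shows CpG methylation rates for CpGs occurring in just promoters, just CpG islands, and both promoters and CpG islands.

dm_vs_num_co = plot_numerical_coannotations(
    annotated_regions = dm_annotated,
    x = 'mu0',
    annot1 = 'hg19_cpg_islands',
    annot2 = 'hg19_genes_promoters',
    bin_width = 5,
    plot_title = 'Group 0 Perc. Meth. in CpG Islands and Promoters',
    x_label = '% Methylation')
print(dm_vs_num_co)

Figure 7: Group 0 methylation rates in regions in promoters, CpG islands, and both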
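The three groups in this plot (one annotation only, the other only, or both) correspond to simple set operations on region identifiers. The following base R sketch uses a hypothetical annotated table with made-up region labels; it illustrates the grouping logic and is not the package's code.

```r
# Hypothetical annotated rows: one row per (region, annotation type) pair.
annotated <- data.frame(
  region     = c("r1", "r1", "r2", "r3", "r4", "r4"),
  annot.type = c("hg19_cpg_islands", "hg19_genes_promoters",
                 "hg19_cpg_islands", "hg19_genes_promoters",
                 "hg19_cpg_islands", "hg19_genes_promoters"),
  stringsAsFactors = FALSE
)

in_islands   <- unique(annotated$region[annotated$annot.type == "hg19_cpg_islands"])
in_promoters <- unique(annotated$region[annotated$annot.type == "hg19_genes_promoters"])

both           <- intersect(in_islands, in_promoters)  # in both annotations
islands_only   <- setdiff(in_islands, in_promoters)    # CpG islands only
promoters_only <- setdiff(in_promoters, in_islands)    # promoters only

print(list(both = both, islands_only = islands_only,
           promoters_only = promoters_only))
```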

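4.5.4 Plotting Categorical Data

# View the counts of CpG annotations in data classes
# The orders for the x-axis labels. This is also a subset
# of the labels (hyper, hypo, none).
x_order = c('hyper', 'hypo')
# The orders for the fill labels. Can also use this
# parameter to subset annotation types to fill.
fill_order = c(
    'hg19_cpg_islands',
    'hg19_cpg_shores',
    'hg19_cpg_shelves',
    'hg19_cpg_inter')
# Make a barplot of the data class where each bar
# is composed of the counts of CpG annotations.
dm_vs_cpg_cat1 = plot_categorical(
    annotated_regions = dm_annotated, x = 'DM_status', fill = 'annot.type',
    x_order = x_order, fill_order = fill_order, position = 'stack',
    plot_title = 'DM Status by CpG Annotation Counts',
    legend_title = 'Annotations',
    x_label = 'DM status',
    y_label = 'Count')
print(dm_vs_cpg_cat1)

Figure 8: Differential methylation classification with counts of CpG annotations

# Use the same order vectors as the previous code block,
# but use proportional fill instead of counts.
# Make a barplot of the data class where each bar
# is composed of the *proportion* of CpG annotations.
dm_vs_cpg_cat2 = plot_categorical(
    annotated_regions = dm_annotated, x = 'DM_status', fill = 'annot.type',
    x_order = x_order, fill_order = fill_order, position = 'fill',
    plot_title = 'DM Status by CpG Annotation Proportions',
    legend_title = 'Annotations',
    x_label = 'DM status',
    y_label = 'Proportion')
print(dm_vs_cpg_cat2)

Figure 9: Differential methylation classification with proportions of CpG annotations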
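The difference between position = 'stack' and position = 'fill' is the difference between raw counts and within-bar proportions. The base R sketch below takes four counts from the summarize_categorical() output in Section 4.4 (CpG islands and inter-CGI, hyper and hypo). It is a partial illustration: shores and shelves are left out because their rows are not all shown in that output.

```r
# Counts from the summarize_categorical() output above; columns are the
# x-axis categories, rows are the fill categories.
counts <- matrix(
  c(151, 32,   # hyper: islands, inter
      4, 90),  # hypo:  islands, inter
  nrow = 2,
  dimnames = list(annot.type = c("hg19_cpg_islands", "hg19_cpg_inter"),
                  DM_status  = c("hyper", "hypo"))
)

# 'stack' plots the counts as they are; 'fill' rescales each bar (column)
# so that its segments sum to 1.
proportions <- prop.table(counts, margin = 2)

print(counts)
print(round(proportions, 3))
```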

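As with plot_annotation(), one may add annotations for random regions via the annotated_random parameter of plot_categorical(). The result is a Random Regions bar representing the distribution of random regions for the categorical variable used for fill. NOTE: Random regions can only be used when fill = 'annot.type'.

# Add in the randomized annotations for the "Random Regions" bar
# Make a barplot of the data class where each bar
# is composed of the *proportion* of CpG annotations, and
# includes "All" regions tested for DM and a "Random Regions"
# bar consisting of randomized regions.
dm_vs_cpg_cat_random = plot_categorical(
    annotated_regions = dm_annotated, annotated_random = dm_random_annotated,
    x = 'DM_status', fill = 'annot.type',
    x_order = x_order, fill_order = fill_order, position = 'fill',
    plot_title = 'DM Status by CpG Annotation Proportions',
    legend_title = 'Annotations',
    x_label = 'DM status',
    y_label = 'Proportion')
print(dm_vs_cpg_cat_random)

Figure 10: Differential methylation classification with proportions of CpG annotations and random regions

# View the proportions of data classes in knownGene annotations
# The orders for the x-axis labels.
x_order = c(
    'hg19_custom_ezh2',
    'hg19_genes_1to5kb',
    'hg19_genes_promoters',
    'hg19_genes_5UTRs',
    'hg19_genes_exons',
    'hg19_genes_introns',
    'hg19_genes_3UTRs',
    'hg19_genes_intergenic')
# The orders for the fill labels.
fill_order = c('hyper', 'hypo', 'none')
dm_vs_kg_cat = plot_categorical(
    annotated_regions = dm_annotated, x = 'annot.type', fill = 'DM_status',
    x_order = x_order, fill_order = fill_order, position = 'fill',
    legend_title = 'DM Status',
    x_label = 'knownGene Annotations',
    y_label = 'Proportion')
print(dm_vs_kg_cat)

Figure 11: Basic gene annotations with proportions of DM classification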

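`annotatr`: Making sense of genomic regions

2022-04-26

Contents

1 Introduction

2 Installation

3 Annotations

3.1 CpG Annotations

3.2 Genic Annotations

3.3 FANTOM5 Permissive Enhancers

3.4 GENCODE lncRNA transcripts

3.5 Chromatin states from chromHMM

3.6 `AnnotationHub` Annotations

3.7 Custom Annotations

4 Usage

4.1 Reading Genomic Regions

4.2 Annotating Regions

4.3 Randomizing Regions

4.4 Summarizing Over Annotations

4.5 Plotting

4.5.1 Plotting Regions per Annotation

4.5.2 Plotting Regions Occurring in Pairs of Annotations

4.5.3 Plotting Numerical Data Over Regions

4.5.4 Plotting Categorical Data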
