IMMAN

Minoo Ashtiani, Payman Nickchi, Abdullah Safari, Mehdi Mirzaie, Mohieddin Jafari

2022-11-01

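IMMAN is designed to retrieve interolog protein networks that are conserved across different species. To this end, we first identify orthologs by iterating over every pair of input species and computing all-versus-all pairwise cross-species alignments with the Needleman-Wunsch algorithm, combined with a best reciprocal hit strategy, to infer orthology relationships among the sequences of the different species. From these ortholog assignments we derive Orthologous Protein Sets (OPSs), clusters of orthologs containing at most one protein per species, which form the nodes of the so-called Interolog Protein Network (IPN).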
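To make the best reciprocal hit idea concrete, here is a minimal base-R sketch for two species. It is not IMMAN's internal code. The score matrix, the helper name `reciprocal_best_hits`, and the `min_score` cutoff (which plays a role similar to identityU) are hypothetical. Ties are broken by taking the first maximum.

```r
# Sketch of best reciprocal hits between two species (not IMMAN's own code).
# `scores`: hypothetical alignment-score matrix, rows = species A proteins,
# columns = species B proteins. `min_score` acts as a cutoff like identityU.
reciprocal_best_hits <- function(scores, min_score = -Inf) {
  best_in_b <- apply(scores, 1, which.max)  # best B partner of each A protein
  best_in_a <- apply(scores, 2, which.max)  # best A partner of each B protein
  a_idx <- seq_len(nrow(scores))
  # Keep A protein i only if its best B hit points straight back to i
  # and the pair's score passes the cutoff.
  keep <- best_in_a[best_in_b] == a_idx &
    scores[cbind(a_idx, best_in_b)] >= min_score
  pairs <- data.frame(
    A = rownames(scores)[a_idx[keep]],
    B = colnames(scores)[best_in_b[keep]],
    stringsAsFactors = FALSE
  )
  # Label each reciprocal pair as one (hypothetical) orthologous protein set.
  pairs$OPS <- sprintf("OPS%d", seq_len(nrow(pairs)))
  pairs
}

scores <- matrix(c(90, 10,  5,
                   20, 80, 30,
                   15, 85, 25),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("a1", "a2", "a3"), c("b1", "b2", "b3")))
rbh <- reciprocal_best_hits(scores, min_score = 50)
rbh
```

In this example a2 is dropped. Its best hit is b2, but b2's best hit is a3, so the hit is not reciprocal.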

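We then retrieve the n species-specific protein interaction networks from the STRING database, in which each node maps to a single OPS in the IPN. The edges of the resulting IPN are filtered by keeping only the edges between OPSs whose proteins are connected in at least k species-specific networks, where k is set as a parameter.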
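The edge-selection rule can be sketched in base R as follows. This is an illustration, not IMMAN's implementation. The inputs are hypothetical: `nodes` is a protein-to-OPS table, and `networks` is a list of per-species edge lists with `from`/`to` columns. The helper name is also hypothetical. `k` is the threshold from the text, which IMMAN exposes as its coverage parameter.

```r
# Keep an IPN edge only if the proteins of its two OPSs interact in at
# least k species-specific networks (illustrative sketch, hypothetical names).
filter_ipn_edges <- function(nodes, networks, k) {
  per_species <- lapply(networks, function(net) {
    a <- nodes$OPS[match(net$from, nodes$protein)]
    b <- nodes$OPS[match(net$to, nodes$protein)]
    ok <- !is.na(a) & !is.na(b) & a != b
    # Order each pair so A-B and B-A are the same undirected edge, and
    # count an edge at most once per species.
    unique(paste(pmin(a[ok], b[ok]), pmax(a[ok], b[ok]), sep = "|"))
  })
  counts <- table(unlist(per_species))
  kept <- names(counts)[counts >= k]
  parts <- strsplit(kept, "|", fixed = TRUE)
  data.frame(node1 = vapply(parts, `[`, character(1), 1),
             node2 = vapply(parts, `[`, character(1), 2),
             stringsAsFactors = FALSE)
}

nodes <- data.frame(
  protein = c("f1", "f2", "f3", "w1", "w2", "w3"),
  OPS     = c("OPS1", "OPS2", "OPS3", "OPS1", "OPS2", "OPS3"),
  stringsAsFactors = FALSE
)
networks <- list(
  fly  = data.frame(from = c("f1", "f2", "f2"), to = c("f2", "f3", "f1"),
                    stringsAsFactors = FALSE),
  worm = data.frame(from = c("w1", "w1"), to = c("w2", "w3"),
                    stringsAsFactors = FALSE)
)
filter_ipn_edges(nodes, networks, k = 2)  # only OPS1-OPS2 occurs in both species
```

Raising `k` can only remove edges, which is why a higher threshold gives a smaller IPN.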

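The alignment process relies on a scoring system: a set of values that quantify how likely one residue is to be substituted by another in an alignment. This scoring system is called a substitution matrix. It is derived from statistical analysis of residue substitution data in sets of reliable alignments of closely related sequences. With identityU, which ranges from 0 to 100 (values of 25 to 30 are common), users can control the size of the IPN. The higher identityU, the more similar (and the fewer) the orthologs the algorithm retains, and vice versa. The gapOpening and gapExtension parameters set the gap penalties used when aligning orthologous proteins. Opening a gap (skipping residues in one of the sequences) incurs the gapOpening penalty, and gapExtension penalizes the length of the gap. The fewer the gaps, the more similar the aligned proteins. The score_threshold parameter sets the minimum STRING combined score an interaction must reach to be retrieved (400 in the example below).

Transferring interactions between orthologs of different species is known as the interolog approach. We use the BestHit parameter to keep only the proteins with the highest similarity in the all-versus-all protein alignment. If an interaction exists between the proteins of a pair of OPSs, an edge links those OPSs in the IPN. The coverage parameter specifies the minimum number of species-specific networks in which the proteins of a pair of OPSs must interact; it ranges from 1 to the number of species. The higher the coverage, the more robust and usually the smaller the final IPN. The NetworkShrinkage parameter determines whether two similar OPSs that share orthologous proteins are merged; if it is TRUE, the resulting IPN is smaller.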
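To show how gap-opening and gap-extension penalties enter an alignment score, the following base-R sketch computes a global (Needleman-Wunsch) alignment score with affine gaps, using Gotoh's three-matrix recurrence. It is a textbook stand-in, not IMMAN's alignment routine. A toy match/mismatch score replaces a substitution matrix such as BLOSUM62, the function name is hypothetical, and this sketch assumes that a gap of length L costs gap_open + L * gap_ext.

```r
# Global (Needleman-Wunsch) alignment score with affine gaps (Gotoh).
# Assumptions of this sketch: toy match/mismatch scores instead of BLOSUM62;
# penalties are positive costs, and a gap of length L costs
# gap_open + L * gap_ext.
nw_affine_score <- function(a, b, match_score = 5, mismatch_score = -4,
                            gap_open = 8, gap_ext = 8) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x); m <- length(y)
  M  <- matrix(-Inf, n + 1, m + 1)  # best score ending in a residue pair
  Ix <- matrix(-Inf, n + 1, m + 1)  # ending with x[i] opposite a gap
  Iy <- matrix(-Inf, n + 1, m + 1)  # ending with y[j] opposite a gap
  M[1, 1] <- 0
  if (n > 0) Ix[2:(n + 1), 1] <- -(gap_open + (1:n) * gap_ext)
  if (m > 0) Iy[1, 2:(m + 1)] <- -(gap_open + (1:m) * gap_ext)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match_score else mismatch_score
      M[i + 1, j + 1] <- s + max(M[i, j], Ix[i, j], Iy[i, j])
      # Either open a new gap (gap_open + gap_ext) or extend one (gap_ext).
      Ix[i + 1, j + 1] <- max(M[i, j + 1] - gap_open - gap_ext,
                              Ix[i, j + 1] - gap_ext,
                              Iy[i, j + 1] - gap_open - gap_ext)
      Iy[i + 1, j + 1] <- max(M[i + 1, j] - gap_open - gap_ext,
                              Iy[i + 1, j] - gap_ext,
                              Ix[i + 1, j] - gap_open - gap_ext)
    }
  }
  max(M[n + 1, m + 1], Ix[n + 1, m + 1], Iy[n + 1, m + 1])
}

nw_affine_score("HEAGAWGHEE", "HEAGAWGHEE")  # 10 matches, no gaps
nw_affine_score("ACGT", "AGT")               # 3 matches minus one gap of length 1
```

Raising `gap_open` relative to `gap_ext` favours alignments with a few long gaps over many short ones.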

To use this package, we assume that IMMAN has been properly installed in the R environment. After installation, IMMAN can be loaded with:

```r
library(IMMAN)
```

For illustration, we read two datasets from different species, which can be accessed as follows:

```r
data(FruitFly)
data(Celegance)
subFruitFly <- as.character(FruitFly$V1)[1:10]
subCelegance <- as.character(Celegance$V1)[1:10]
```

Next, we make a list of the datasets, one per species, and set their taxonomy IDs.

```r
ProteinLists = list(subFruitFly, subCelegance)
List1_Species_ID = 7227  # taxonomy ID of Drosophila melanogaster
List2_Species_ID = 6239  # taxonomy ID of Caenorhabditis elegans
Species_IDs = c(List1_Species_ID, List2_Species_ID)
```

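To continue, set the parameters for the analysis. The IMMAN parameters are described below; for more information, refer to the paper.

identityU: the cutoff for selecting proteins whose alignment score is greater than or equal to identityU.

substitutionMatrix: the scoring matrix used for the alignment.

gapOpening and gapExtension: the gap penalties used for the alignment.

NetworkShrinkage, coverage and BestHit: please refer to the paper.

STRINGversion: the version of the STRING database in which the program searches for PPI scores.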
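One way to picture NetworkShrinkage is as repeatedly merging OPSs that share a protein until no two sets overlap. The sketch below illustrates that reading only. The helper name and the toy OPSs are hypothetical, and the paper describes how IMMAN actually merges OPSs.

```r
# Merge OPSs (character vectors of protein IDs) that share any protein.
# Illustrative sketch only; merging is transitive, so if A overlaps B and
# B overlaps C, all three end up in one set.
merge_overlapping_ops <- function(ops) {
  result <- list()
  for (set in ops) {
    # Sets already in `result` are pairwise disjoint; find those that
    # overlap the incoming set and fold them into it.
    hits <- vapply(result, function(r) any(set %in% r), logical(1))
    merged <- unique(c(set, unlist(result[hits])))
    result <- c(result[!hits], list(merged))
  }
  result
}

ops <- list(OPS1 = c("f1", "w1"),
            OPS2 = c("f2", "w2"),
            OPS3 = c("f1", "w3"))
merged <- merge_overlapping_ops(ops)
length(merged)  # OPS1 and OPS3 share f1, so three sets become two
```

Fewer, larger sets mean fewer IPN nodes, which matches the statement that NetworkShrinkage = TRUE yields a smaller IPN.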

Next, we set the parameter values:

```r
identityU = 30
substitutionMatrix = "BLOSUM62"
gapOpening = 8
gapExtension = 8
NetworkShrinkage = FALSE
coverage = 1
BestHit = TRUE
score_threshold = 400
STRINGversion = "11"
```

Finally, we can run the IMMAN function:

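```r
output = IMMAN(ProteinLists, fileNames = NULL, Species_IDs, identityU,
               substitutionMatrix, gapOpening, gapExtension, BestHit,
               coverage, NetworkShrinkage, score_threshold, STRINGversion,
               InputDirectory = getwd())
## Step 1/4: Downloading amino acid sequences...
## Downloading amino acid sequences of List1
## Downloading amino acid sequences of List2
## Step 2/4: Aligning...
## Aligning List1 with List2
## Step 3/4: Detecting STRING...
## Detecting List1 in STRING
## Detecting List2 in STRING
## Step 4/4: Retrieving STRING networks...
## Retrieving List1
## Retrieving List2
## Producing IPN...
## Done!
```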

To view specific parts of the result, you can use:

```r
output$IPNEdges
##        node1    node2
## 21   OPS0001  OPS0002
## 41   OPS0001  OPS0004
## 51   OPS0001  OPS0005
## 61   OPS0001  OPS0006
## 101  OPS0001 OPS00010
## 411  OPS0002  OPS0004
## 511  OPS0002  OPS0005
## 611  OPS0002  OPS0006
## 1011 OPS0002 OPS00010
## 412  OPS0003  OPS0004
## 612  OPS0003  OPS0006
## 81   OPS0003  OPS0008
## 512  OPS0004  OPS0005
## 613  OPS0004  OPS0006
## 811  OPS0004  OPS0008
## 1012 OPS0004 OPS00010
## 91   OPS0005  OPS0009
## 1013 OPS0005 OPS00010
## 812  OPS0006  OPS0008
## 1014 OPS0006 OPS00010
## 911  OPS0007  OPS0009
```
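IPNEdges is a plain two-column edge list (node1, node2), so ordinary R tools apply to it. The minimal base-R sketch below counts how many IPN edges each OPS takes part in (its degree). `ops_degree` is a hypothetical helper, and the toy rows are copied from the output above; on a real run you would pass `output$IPNEdges` instead.

```r
# Five rows copied from the output$IPNEdges table above.
edges <- data.frame(
  node1 = c("OPS0001", "OPS0001", "OPS0001", "OPS0002", "OPS0003"),
  node2 = c("OPS0002", "OPS0004", "OPS0005", "OPS0004", "OPS0004"),
  stringsAsFactors = FALSE
)
# Degree of an OPS = number of IPN edges it takes part in (hypothetical helper).
ops_degree <- function(edges) {
  table(c(as.character(edges$node1), as.character(edges$node2)))
}
deg <- ops_degree(edges)
sort(deg, decreasing = TRUE)
```

For richer analyses, the same data frame can be handed to a graph package, for example `igraph::graph_from_data_frame(output$IPNEdges, directed = FALSE)`.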
```r
output$IPNNodes
##                7227              6239 OPSLabel
## 1  7227.FBpp0100177      6239.MTCE.31  OPS0001
## 2  7227.FBpp0070871      6239.T20G5.2  OPS0002
## 3  7227.FBpp0073568      6239.ZK652.9  OPS0003
## 4  7227.FBpp0305828   6239.C34E10.6.1  OPS0004
## 5  7227.FBpp0073290 6239.Y22D7AL.5a.2  OPS0005
## 6  7227.FBpp0298344    6239.Y37D8A.14  OPS0006
## 7  7227.FBpp0076520    6239.T08G2.3.1  OPS0007
## 8  7227.FBpp0081347     6239.F57B9.4b  OPS0008
## 9  7227.FBpp0085821      6239.B0250.5  OPS0009
## 10 7227.FBpp0071794    6239.H28O16.1a OPS00010
```
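Combining IPNNodes with IPNEdges translates each IPN edge back into a protein pair of one species. The base-R sketch below does this. The column names ("7227", "6239", "OPSLabel") are an assumption based on the printed table, so check `names(output$IPNNodes)` on a real run. The helper `ops_edges_to_proteins` and the toy rows (copied from the outputs above) are hypothetical.

```r
# Three rows copied from the IPNNodes output; column names are assumed.
nodes <- data.frame(
  `7227` = c("7227.FBpp0100177", "7227.FBpp0070871", "7227.FBpp0305828"),
  `6239` = c("6239.MTCE.31", "6239.T20G5.2", "6239.C34E10.6.1"),
  OPSLabel = c("OPS0001", "OPS0002", "OPS0004"),
  check.names = FALSE, stringsAsFactors = FALSE
)
# Translate IPN edges (pairs of OPS labels) into protein pairs of one species.
ops_edges_to_proteins <- function(edges, nodes, species) {
  idx1 <- match(edges$node1, nodes$OPSLabel)
  idx2 <- match(edges$node2, nodes$OPSLabel)
  data.frame(from = nodes[[species]][idx1], to = nodes[[species]][idx2],
             stringsAsFactors = FALSE)
}
edges <- data.frame(node1 = c("OPS0001", "OPS0002"),
                    node2 = c("OPS0002", "OPS0004"),
                    stringsAsFactors = FALSE)
ops_edges_to_proteins(edges, nodes, "6239")
```

Both resulting C. elegans pairs (MTCE.31 with T20G5.2, and T20G5.2 with C34E10.6.1) also appear in the species network listed in `output$Networks[[2]]`.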
```r
output$Networks
## [[1]]
##                from               to
## 1  7227.FBpp0070871 7227.FBpp0071794
## 2  7227.FBpp0070871 7227.FBpp0071794
## 3  7227.FBpp0070871 7227.FBpp0073290
## 4  7227.FBpp0070871 7227.FBpp0073290
## 5  7227.FBpp0071794 7227.FBpp0073290
## 6  7227.FBpp0071794 7227.FBpp0073290
## 7  7227.FBpp0073568 7227.FBpp0081347
## 8  7227.FBpp0073568 7227.FBpp0081347
## 9  7227.FBpp0076520 7227.FBpp0085821
## 10 7227.FBpp0076520 7227.FBpp0085821
## 11 7227.FBpp0070871 7227.FBpp0100177
## 12 7227.FBpp0070871 7227.FBpp0100177
## 13 7227.FBpp0071794 7227.FBpp0100177
## 14 7227.FBpp0071794 7227.FBpp0100177
## 15 7227.FBpp0073568 7227.FBpp0298344
## 16 7227.FBpp0073568 7227.FBpp0298344
## 17 7227.FBpp0081347 7227.FBpp0298344
## 18 7227.FBpp0081347 7227.FBpp0298344
## 19 7227.FBpp0070871 7227.FBpp0305828
## 20 7227.FBpp0070871 7227.FBpp0305828
## 21 7227.FBpp0071794 7227.FBpp0305828
## 22 7227.FBpp0071794 7227.FBpp0305828
## 23 7227.FBpp0073290 7227.FBpp0305828
## 24 7227.FBpp0073290 7227.FBpp0305828
## 25 7227.FBpp0073568 7227.FBpp0305828
## 26 7227.FBpp0073568 7227.FBpp0305828
## 27 7227.FBpp0081347 7227.FBpp0305828
## 28 7227.FBpp0081347 7227.FBpp0305828
## 29 7227.FBpp0100177 7227.FBpp0305828
## 30 7227.FBpp0100177 7227.FBpp0305828
## 31 7227.FBpp0298344 7227.FBpp0305828
## 32 7227.FBpp0298344 7227.FBpp0305828
##
## [[2]]
##               from                to
## 1  6239.C34E10.6.1     6239.F57B9.4b
## 2  6239.C34E10.6.1     6239.F57B9.4b
## 3  6239.C34E10.6.1    6239.H28O16.1a
## 4  6239.C34E10.6.1    6239.H28O16.1a
## 5  6239.C34E10.6.1      6239.MTCE.31
## 6  6239.C34E10.6.1      6239.MTCE.31
## 7   6239.H28O16.1a      6239.MTCE.31
## 8   6239.H28O16.1a      6239.MTCE.31
## 9     6239.B0250.5    6239.T08G2.3.1
## 10    6239.B0250.5    6239.T08G2.3.1
## 11 6239.C34E10.6.1      6239.T20G5.2
## 12 6239.C34E10.6.1      6239.T20G5.2
## 13  6239.H28O16.1a      6239.T20G5.2
## 14  6239.H28O16.1a      6239.T20G5.2
## 15    6239.MTCE.31      6239.T20G5.2
## 16    6239.MTCE.31      6239.T20G5.2
## 17    6239.B0250.5 6239.Y22D7AL.5a.2
## 18    6239.B0250.5 6239.Y22D7AL.5a.2
## 19 6239.C34E10.6.1 6239.Y22D7AL.5a.2
## 20 6239.C34E10.6.1 6239.Y22D7AL.5a.2
## 21  6239.H28O16.1a 6239.Y22D7AL.5a.2
## 22  6239.H28O16.1a 6239.Y22D7AL.5a.2
## 23    6239.MTCE.31 6239.Y22D7AL.5a.2
## 24    6239.MTCE.31 6239.Y22D7AL.5a.2
## 25    6239.T20G5.2 6239.Y22D7AL.5a.2
## 26    6239.T20G5.2 6239.Y22D7AL.5a.2
## 27 6239.C34E10.6.1    6239.Y37D8A.14
## 28 6239.C34E10.6.1    6239.Y37D8A.14
## 29  6239.H28O16.1a    6239.Y37D8A.14
## 30  6239.H28O16.1a    6239.Y37D8A.14
## 31    6239.MTCE.31    6239.Y37D8A.14
## 32    6239.MTCE.31    6239.Y37D8A.14
## 33    6239.T20G5.2    6239.Y37D8A.14
## 34    6239.T20G5.2    6239.Y37D8A.14
## 35 6239.C34E10.6.1      6239.ZK652.9
## 36 6239.C34E10.6.1      6239.ZK652.9
## 37   6239.F57B9.4b      6239.ZK652.9
## 38   6239.F57B9.4b      6239.ZK652.9
```
```r
output$Networks[[1]]
##                from               to
## 1  7227.FBpp0070871 7227.FBpp0071794
## 2  7227.FBpp0070871 7227.FBpp0071794
## 3  7227.FBpp0070871 7227.FBpp0073290
## 4  7227.FBpp0070871 7227.FBpp0073290
## 5  7227.FBpp0071794 7227.FBpp0073290
## 6  7227.FBpp0071794 7227.FBpp0073290
## 7  7227.FBpp0073568 7227.FBpp0081347
## 8  7227.FBpp0073568 7227.FBpp0081347
## 9  7227.FBpp0076520 7227.FBpp0085821
## 10 7227.FBpp0076520 7227.FBpp0085821
## 11 7227.FBpp0070871 7227.FBpp0100177
## 12 7227.FBpp0070871 7227.FBpp0100177
## 13 7227.FBpp0071794 7227.FBpp0100177
## 14 7227.FBpp0071794 7227.FBpp0100177
## 15 7227.FBpp0073568 7227.FBpp0298344
## 16 7227.FBpp0073568 7227.FBpp0298344
## 17 7227.FBpp0081347 7227.FBpp0298344
## 18 7227.FBpp0081347 7227.FBpp0298344
## 19 7227.FBpp0070871 7227.FBpp0305828
## 20 7227.FBpp0070871 7227.FBpp0305828
## 21 7227.FBpp0071794 7227.FBpp0305828
## 22 7227.FBpp0071794 7227.FBpp0305828
## 23 7227.FBpp0073290 7227.FBpp0305828
## 24 7227.FBpp0073290 7227.FBpp0305828
## 25 7227.FBpp0073568 7227.FBpp0305828
## 26 7227.FBpp0073568 7227.FBpp0305828
## 27 7227.FBpp0081347 7227.FBpp0305828
## 28 7227.FBpp0081347 7227.FBpp0305828
## 29 7227.FBpp0100177 7227.FBpp0305828
## 30 7227.FBpp0100177 7227.FBpp0305828
## 31 7227.FBpp0298344 7227.FBpp0305828
## 32 7227.FBpp0298344 7227.FBpp0305828
```
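In the printed species networks, each interaction is listed twice. If you need one row per undirected interaction, the minimal base-R sketch below removes the repeats. The helper `dedupe_edges` is hypothetical, and it is applied here to the first rows of the output above.

```r
# First four rows of output$Networks[[1]] above; each pair appears twice.
net <- data.frame(
  from = c("7227.FBpp0070871", "7227.FBpp0070871",
           "7227.FBpp0070871", "7227.FBpp0070871"),
  to   = c("7227.FBpp0071794", "7227.FBpp0071794",
           "7227.FBpp0073290", "7227.FBpp0073290"),
  stringsAsFactors = FALSE
)
# Treat edges as undirected: order each pair, then drop repeated rows.
dedupe_edges <- function(net) {
  a <- pmin(net$from, net$to)
  b <- pmax(net$from, net$to)
  keep <- !duplicated(data.frame(a, b))
  net[keep, , drop = FALSE]
}
dedupe_edges(net)
```

Ordering each pair before calling `duplicated()` also collapses A-B and B-A, in case a network lists both directions.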
```r
output$maps
## [[1]]
##    UNIPROT_AC        STRING_id
## 1      Q9V8M5 7227.FBpp0085821
## 2      Q9VSA3 7227.FBpp0076520
## 3      P35381           7227.…
## 4      Q05825 7227.FBpp0305828
## 5      …02649           7227.…
## 6      Q9W401           7227.…
## 7      Q9VHS7           7227.…
## 8      Q9VVG6           7227.…
## 9      Q9VYF8 7227.FBpp0073568
## 10     P00408 7227.FBpp0100177
##
## [[2]]
##    UNIPROT_AC         STRING_id
## 1      Q9XTI0      6239.B0250.5
## 2      Q22347    6239.T08G2.3.1
## 3      Q9XXK1    6239.H28O16.1a
## 4      P46561   6239.C34E10.6.1
## 5      P50140 6239.Y22D7AL.5a.2
## 6      P34575      6239.T20G5.2
## 7      Q8I7J4     6239.F57B9.4b
## 8      P55954    6239.Y37D8A.14
## 9      P34666      6239.ZK652.9
## 10     P24894      6239.MTCE.31
```
```r
output$maps[[2]]
##    UNIPROT_AC         STRING_id
## 1      Q9XTI0      6239.B0250.5
## 2      Q22347    6239.T08G2.3.1
## 3      Q9XXK1    6239.H28O16.1a
## 4      P46561   6239.C34E10.6.1
## 5      P50140 6239.Y22D7AL.5a.2
## 6      P34575      6239.T20G5.2
## 7      Q8I7J4     6239.F57B9.4b
## 8      P55954    6239.Y37D8A.14
## 9      P34666      6239.ZK652.9
## 10     P24894      6239.MTCE.31
```
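The maps tables link UniProt accessions to STRING identifiers. You can therefore translate the STRING IDs in a species network back to UniProt accessions with `match()`. Below is a minimal sketch that uses a few rows copied from the C. elegans map above. The helper `to_uniprot` is hypothetical, and the column names `UNIPROT_AC` and `STRING_id` are taken from the printed output.

```r
# Rows copied from the C. elegans map printed above.
map2 <- data.frame(
  UNIPROT_AC = c("Q9XTI0", "Q22347", "P24894"),
  STRING_id  = c("6239.B0250.5", "6239.T08G2.3.1", "6239.MTCE.31"),
  stringsAsFactors = FALSE
)
# Translate STRING identifiers back to UniProt accessions
# (NA for IDs that are not in the map). Hypothetical helper.
to_uniprot <- function(ids, map) map$UNIPROT_AC[match(ids, map$STRING_id)]

to_uniprot(c("6239.MTCE.31", "6239.B0250.5"), map2)
```

Applied column-wise (for example, `to_uniprot(net$from, map2)`), the same lookup relabels a whole edge list.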

References

Ashtiani M, et al. IMMAN: an R/Bioconductor package for Interolog protein network reconstruction, mapping and mining analysis. BMC Bioinformatics. 2019 Dec;20(1):73.