MetaboSignal 2: merging KEGG with additional interaction resources
Andrea Rodriguez-Martinez, Maryam Anwar, Rafael Ayala, Joram M. Posma, Ana L. Neves, Marc-Emmanuel Dumas
May 22, 2017
MetaboSignal is an R package designed to investigate the genetic regulation of the metabolome, using KEGG as primary reference database. The main goal of this vignette is to illustrate how KEGG interactions can be merged with two large literature-curated resources of human regulatory interactions: OmniPath and TRRUST.
Metabolites are organized in biochemical pathways regulated by signaling-transduction pathways, allowing the organism to adapt to environmental changes and maintain homeostasis. We developed MetaboSignal (Rodriguez-Martinez et al. 2017) as a tool to explore the relationships between genes (both enzymatic and signaling) and metabolites, using the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa & Goto 2000) as primary reference database. To generate a more complete picture of the genetic regulation of the metabolome, we have now updated and standardized the functionalities of MetaboSignal to facilitate its integration with additional resources of molecular interactions. In this vignette we show how KEGG interactions can be merged with human regulatory interactions from two large literature-curated resources: OmniPath (Turei et al. 2016) and TRRUST (Hang et al. 2015).
We begin by loading the MetaboSignal package:
## Load MetaboSignal
library(MetaboSignal)
We then load the “regulatory_interactions” and “kegg_pathways” datasets, containing the following information:
- regulatory_interactions: matrix containing a set of regulatory interactions reported in OmniPath (directed protein-protein and signaling interactions) and TRRUST (transcription factor-target interactions). For each interaction, both literature reference(s) and primary database reference(s) are reported. Users are responsible for respecting the terms of the licences of these databases and for citing them when required. Note that the databases are sometimes inconsistent in the direction and sign of an interaction. This is likely due to curation errors, or to the fact that some interactions may be bidirectional or change sign depending on the tissue. Users can update or edit this matrix as required.
- kegg_pathways: matrix containing the identifiers (IDs) of relevant metabolic (n = 85) and signaling (n = 126) human KEGG pathways. These IDs were retrieved using the function “MS_getPathIds( )”.
## Regulatory interactions
data("regulatory_interactions")
head(regulatory_interactions[, c(1, 3, 5)])

##      source_entrez target_entrez interaction_type
## [1,] "351"         "2"           "o_Unknown"
## [2,] "3576"        "2"           "o_Unknown"
## [3,] "7040"        "2"           "o_Unknown"
## [4,] "7042"        "2"           "o_Unknown"
## [5,] "2064"        "12"          "o_Unknown"
## [6,] "3817"        "12"          "o_Unknown"
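Since the databases can disagree on the direction of an interaction, it may be useful to flag pairs that are reported in both directions before building a network. The following is a minimal sketch in base R, not a function of the package. It uses a small toy matrix (hypothetical values) with the same column names as the output shown above; the names `toy_reg`, `forward`, `reverse` and `both_directions` are introduced here for illustration only.

```r
# Toy stand-in for the regulatory_interactions columns shown above
# (hypothetical values, used only to illustrate the check)
toy_reg <- matrix(c("351",  "2",   "o_Unknown",
                    "2",    "351", "t_Activation",
                    "7040", "2",   "o_Unknown",
                    "2064", "12",  "o_Unknown"),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("source_entrez", "target_entrez",
                                          "interaction_type")))

# Build a "source>target" key for each edge and for its reverse
forward <- paste(toy_reg[, "source_entrez"], toy_reg[, "target_entrez"], sep = ">")
reverse <- paste(toy_reg[, "target_entrez"], toy_reg[, "source_entrez"], sep = ">")

# Flag an edge when its reverse is also present in the matrix
both_directions <- toy_reg[forward %in% reverse, , drop = FALSE]
print(both_directions)
```

Whether a flagged pair reflects a genuine bidirectional interaction or a curation error has to be decided case by case from the literature references.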
## KEGG metabolic pathways
data("kegg_pathways")
head(kegg_pathways[, -2])

##      Path_id    Path_category                         Path_type
## [1,] "hsa00010" "Metabolism; Carbohydrate metabolism" "metabolic"
## [2,] "hsa00020" "Metabolism; Carbohydrate metabolism" "metabolic"
## [3,] "hsa00030" "Metabolism; Carbohydrate metabolism" "metabolic"
## [4,] "hsa00040" "Metabolism; Carbohydrate metabolism" "metabolic"
## [5,] "hsa00051" "Metabolism; Carbohydrate metabolism" "metabolic"
## [6,] "hsa00052" "Metabolism; Carbohydrate metabolism" "metabolic"

## KEGG signaling pathways
tail(kegg_pathways[, -2])

##        Path_id    Path_category                          Path_type
## [206,] "hsa04964" "Organismal Systems; Excretory system" "signaling"
## [207,] "hsa04966" "Organismal Systems; Excretory system" "signaling"
## [208,] "hsa04970" "Organismal Systems; Digestive system" "signaling"
## [209,] "hsa04971" "Organismal Systems; Digestive system" "signaling"
## [210,] "hsa04972" "Organismal Systems; Digestive system" "signaling"
## [211,] "hsa04976" "Organismal Systems; Digestive system" "signaling"
We use the function “MS_getPathIds( )” to retrieve the IDs of all human metabolic and signaling KEGG pathways.
## IDs of human metabolic and signaling pathways
hsa_paths <- MS_getPathIds(organism_code = "hsa")
This function generates a “.txt” file in the working directory named “hsa_pathways.txt”. We recommend that users take some time to inspect this file and carefully select the metabolic and signaling pathways that will be used to build the network. In this example, we selected the pathways stored in the “kegg_pathways” dataset.
Next, we use the function “MS_keggNetwork( )” to build a MetaboSignal network, by merging the selected metabolic and signaling KEGG pathways stored in the “kegg_pathways” dataset:
## Create metabo_paths and signaling_paths vectors
metabo_paths <- kegg_pathways[kegg_pathways[, "Path_type"] == "metabolic", "Path_id"]
signaling_paths <- kegg_pathways[kegg_pathways[, "Path_type"] == "signaling", "Path_id"]
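The same subsetting idiom can be used to narrow the selection further, for instance to a single KEGG category. The sketch below is an illustration in base R that runs on a toy matrix (a hypothetical four-row subset) with the "Path_id", "Path_category" and "Path_type" columns shown earlier; `toy_paths` and `digestive_paths` are names introduced here, not objects of the package.

```r
# Toy stand-in for kegg_pathways with the three columns displayed above
toy_paths <- matrix(c("hsa00010", "Metabolism; Carbohydrate metabolism",  "metabolic",
                      "hsa00020", "Metabolism; Carbohydrate metabolism",  "metabolic",
                      "hsa04964", "Organismal Systems; Excretory system", "signaling",
                      "hsa04970", "Organismal Systems; Digestive system", "signaling"),
                    ncol = 3, byrow = TRUE,
                    dimnames = list(NULL, c("Path_id", "Path_category", "Path_type")))

# Keep signaling pathways whose category mentions the digestive system
is_signaling <- toy_paths[, "Path_type"] == "signaling"
is_digestive <- grepl("Digestive system", toy_paths[, "Path_category"], fixed = TRUE)
digestive_paths <- toy_paths[is_signaling & is_digestive, "Path_id"]
print(digestive_paths)
```

A vector built this way could be passed to the network-building step in place of the full signaling_paths vector.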
## Build KEGG network (might take a while)
keggNet_example <- MS_keggNetwork(metabo_paths, signaling_paths,
                                  expand_genes = TRUE, convert_entrez = TRUE)

## See network format
head(keggNet_example)

##      source       target interaction_type
## [1,] "cpd:C00084" "217"  "k_compound:reversible"
## [2,] "cpd:C00084" "224"  "k_compound:reversible"
## [3,] "cpd:C00084" "221"  "k_compound:reversible"
## [4,] "cpd:C00084" "219"  "k_compound:reversible"
## [5,] "cpd:C00084" "222"  "k_compound:reversible"
## [6,] "cpd:C00084" "220"  "k_compound:reversible"
The network is formatted as a three-column matrix where each row represents an edge between two nodes (from source to target). The nodes represent the following molecular entities: chemical compounds (KEGG IDs), reactions (KEGG IDs), signaling genes (Entrez IDs) and metabolic genes (Entrez IDs). The type of interaction is reported in the “interaction_type” column. Compound-gene (or gene-compound) interactions are designated as: “k_compound:reversible” or “k_compound:irreversible”, depending on the direction of the interaction. Other types of interactions correspond to gene-gene interactions. When KEGG reports various types of interaction for the same interactant pair, the “interaction_type” is collapsed using “/”.
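To make this format concrete, the sketch below splits collapsed interaction types and picks out compound nodes from a toy network. It is an illustration only: the gene-gene rows are hypothetical, we assume that collapsed types look like "k_activation/phosphorylation", and the object names (`toy_net`, `types_per_edge`, `type_counts`, `compound_nodes`) are introduced here.

```r
# Toy three-column network in the format described above
# (the first row mirrors the printed output; the others are hypothetical)
toy_net <- matrix(c("cpd:C00084", "217",  "k_compound:reversible",
                    "5566",       "5567", "k_activation/phosphorylation",
                    "107",        "5566", "k_indirect-compound"),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("source", "target", "interaction_type")))

# Drop the "k_" prefix, then split collapsed types on "/"
clean_types <- gsub("k_", "", toy_net[, "interaction_type"], fixed = TRUE)
types_per_edge <- strsplit(clean_types, "/", fixed = TRUE)

# Count how often each individual interaction type occurs
type_counts <- table(unlist(types_per_edge))
print(type_counts)

# Compound nodes carry the "cpd:" prefix, as in the output above
nodes <- unique(c(toy_net[, "source"], toy_net[, "target"]))
compound_nodes <- nodes[startsWith(nodes, "cpd:")]
print(compound_nodes)
```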
Notice that when transforming KEGG signaling maps into binary interactions, a number of indirect interactions are introduced, such as interactions involving all members of a protein complex or proteins interacting via an intermediary compound (e.g. AC and PKA, via cAMP). We recommend excluding these indirect interactions, as they might alter further topological analyses. In this example, we remove interactions classified as: “unknown”, “indirect-compound”, “indirect-effect”, “dissociation”, “state-change”, “binding”, “association”.
## Get all types of interaction
all_types <- unique(unlist(strsplit(keggNet_example[, "interaction_type"], "/")))
all_types <- gsub("k_", "", all_types)

## Select wanted interactions
wanted_types <- setdiff(all_types, c("unknown", "indirect-compound", "indirect-effect",
                                     "dissociation", "state-change", "binding",
                                     "association"))
print(wanted_types) # interactions that will be retained

##  [1] "compound:reversible"   "compound:irreversible" "expression"
##  [4] "activation"            "phosphorylation"       "dephosphorylation"
##  [7] "inhibition"            "repression"            "ubiquitination"
## [10] "methylation"           "glycosylation"

## Filter keggNet_example to retain only wanted interactions
wanted_types <- paste(wanted_types, collapse = "|")
keggNet_clean <- keggNet_example[grep(wanted_types, keggNet_example[, 3]), ]
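Because the pattern above is matched against the whole "interaction_type" string, an edge whose collapsed type combines a wanted and an unwanted interaction (e.g. "k_activation/binding") is retained. Users who prefer to drop such edges could apply a stricter filter. The following is a sketch of one possible alternative, not the approach used in this vignette; it runs on a toy network with hypothetical rows, and `toy_net`, `unwanted`, `all_wanted` and `strict_net` are names introduced here.

```r
# Toy network; rows 2 and 4 carry collapsed interaction types (hypothetical)
toy_net <- matrix(c("cpd:C00084", "217",  "k_compound:reversible",
                    "5566",       "5567", "k_activation/binding",
                    "107",        "5566", "k_indirect-compound",
                    "207",        "2932", "k_phosphorylation/inhibition"),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("source", "target", "interaction_type")))

unwanted <- c("unknown", "indirect-compound", "indirect-effect",
              "dissociation", "state-change", "binding", "association")

# Split each edge's interaction types after removing the "k_" prefix
clean_types <- gsub("k_", "", toy_net[, "interaction_type"], fixed = TRUE)
types_per_edge <- strsplit(clean_types, "/", fixed = TRUE)

# Keep an edge only if none of its types belongs to the unwanted set
all_wanted <- vapply(types_per_edge,
                     function(types) !any(types %in% unwanted),
                     logical(1))
strict_net <- toy_net[all_wanted, , drop = FALSE]
print(strict_net)
```

In this toy example the "k_activation/binding" edge is removed, whereas the grep-based filter would keep it. Which behaviour is preferable depends on how much weight the analysis gives to interactions that KEGG annotates ambiguously.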
We then use the function “MS2_ppiNetwork( )” to generate a regulatory network, by merging the signaling interactions from OmniPath and TRRUST, or by selecting the interactions of only one of these databases. Some examples are shown below:
## Build regulatory network of TRRUST interactions
trrustNet_example <- MS2_ppiNetwork(datasets = "trrust")

## Build regulatory network of OmniPath interactions
omnipathNet_example <- MS2_ppiNetwork(datasets = "omnipath")

## Build regulatory network by merging OmniPath and TRRUST interactions
ppiNet_example <- MS2_ppiNetwork(datasets = "all")

## See network format
head(ppiNet_example)

##      source target interaction_type
## [1,] "351"  "2"    "o_Unknown"
## [2,] "3576" "2"    "o_Unknown"
## [3,] "7040" "2"    "o_Unknown"
## [4,] "7042" "2"    "o_Unknown"
## [5,] "2064" "12"   "o_Unknown"
## [6,] "3817" "12"   "o_Unknown"
Each of these networks is formatted as a three-column matrix where each row represents an edge between two nodes (from source to target). The third column indicates the interaction type and the source of the interaction (OmniPath: “o_”, TRRUST: “t_”). Notice that common interactions between both databases are collapsed, and the interaction type is reported as: “o_; t_;”.
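The source prefixes make it straightforward to tabulate how many interactions come from each database. The sketch below works on a toy vector of interaction types; the exact layout of collapsed entries is an assumption made here for illustration (an "o_" entry followed by a "t_" entry, separated by ";"), and the names `toy_types`, `in_omnipath`, `in_trrust` and `source_db` are introduced here.

```r
# Toy interaction_type values (hypothetical; the collapsed layout is assumed)
toy_types <- c("o_Unknown", "t_Activation", "o_Unknown; t_Repression;", "o_Activation")

# A prefix counts only at the start of the string or right after a ";"
in_omnipath <- grepl("(^|;) *o_", toy_types)
in_trrust   <- grepl("(^|;) *t_", toy_types)

# Label each interaction by the database(s) reporting it
source_db <- ifelse(in_omnipath & in_trrust, "both",
                    ifelse(in_omnipath, "OmniPath", "TRRUST"))
print(table(source_db))
```

Anchoring the pattern to the start of an entry avoids false matches when the characters "o_" or "t_" happen to occur inside an interaction label.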
Finally, we use the function “MS2_mergeNetworks( )” to merge the KEGG-based network with the regulatory network.
## Merge networks
mergedNet_example <- MS2_mergeNetworks(keggNet_clean, ppiNet_example)

## See network format
head(mergedNet_example)

##      source       target interaction_type
## [1,] "cpd:C00084" "217"  "k_compound:reversible"
## [2,] "cpd:C00084" "224"  "k_compound:reversible"
## [3,] "cpd:C00084" "221"  "k_compound:reversible"
## [4,] "cpd:C00084" "219"  "k_compound:reversible"
## [5,] "cpd:C00084" "222"  "k_compound:reversible"
## [6,] "cpd:C00084" "220"  "k_compound:reversible"
The network is formatted as a three-column matrix where each row represents an edge between two nodes (from source to target). The third column indicates the interaction type and the source of the interaction (KEGG: “k_”, OmniPath: “o_”, TRRUST: “t_”). Notice that interactions shared across databases are collapsed, and the interaction type is reported as: “k_;o_;t_;”. This network can be further customized and subsequently used to explore gene-metabolite associations as described in the introductory vignette of the package.
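To illustrate what collapsing shared interactions means, the sketch below merges two toy networks in base R and joins the interaction types of duplicated edges with ";". It is a conceptual illustration under our own assumptions, not the implementation of “MS2_mergeNetworks( )”, and the exact separator layout produced by the package may differ. All object names and the gene-gene rows are hypothetical.

```r
cols <- c("source", "target", "interaction_type")

# Two toy networks sharing one edge (5566 -> 5567)
net_a <- matrix(c("5566",       "5567", "k_activation",
                  "cpd:C00084", "217",  "k_compound:reversible"),
                ncol = 3, byrow = TRUE, dimnames = list(NULL, cols))
net_b <- matrix(c("5566", "5567", "o_Unknown",
                  "351",  "2",    "t_Activation"),
                ncol = 3, byrow = TRUE, dimnames = list(NULL, cols))

# Stack both networks and key each edge by "source>target"
stacked <- rbind(net_a, net_b)
edge_key <- paste(stacked[, "source"], stacked[, "target"], sep = ">")
keys <- unique(edge_key)

# For each unique edge, join all of its interaction types with ";"
collapsed <- vapply(keys, function(k) {
  paste(stacked[edge_key == k, "interaction_type"], collapse = ";")
}, character(1))

# Rebuild a three-column matrix from the unique edges
first_row <- match(keys, edge_key)
merged_toy <- cbind(stacked[first_row, c("source", "target"), drop = FALSE],
                    interaction_type = unname(collapsed))
print(merged_toy)
```

Here the shared edge appears once in the result, with an interaction type that records both of its sources.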
Hang, H., Shim, H., Shin, D., Shim, J.E., Ko, Y., Shin, J., Kim, H., Cho, A., Kim, E., Lee, T., Kim, H., Kim, K., Yang, S., Bae, D., Yun, A., Kim, S., Kim, C.Y., Cho, H.J., Kang, B., Shin, S. & Lee, I. (2015). TRRUST: A reference database of human transcriptional regulatory interactions. Scientific Reports, 5, 11432. Retrieved from https://www.nature.com/articles/srep11432
Kanehisa, M. & Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 28, 27–30. Retrieved from http://nar.oxfordjournals.org/content/28/1/27
Rodriguez-Martinez, A., Ayala, R., Posma, J.M., Neves, A.L., Gauguier, D., Nicholson, J.K. & Dumas, M.-E. (2017). MetaboSignal: A network-based approach for topological analysis of metabotype regulation via metabolic and signaling pathways. Bioinformatics, 33, 773–775. Retrieved from https://academic.oup.com/bioinformatics/article/33/5/773/2725552/
Turei, D., Korcsmaros, T. & Saez-Rodriguez, J. (2016). OmniPath: Guidelines and gateway for literature-curated signaling pathway resources. Nature Methods, 13, 966–967. Retrieved from http://www.nature.com/nmeth/journal/v13/n12/full/nmeth.4077.html