Enhancing gene regulatory networks inference through hub-based data integration

  • Atefeh Naseri*
  • Mehran Sharghi
  • Seyed Mohammad Hossein Hasheminejad

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Gene Regulatory Network (GRN) reconstruction, one of the main research topics in computational biology, refers to inferring the relationships between genes that regulate cell conditions in response to internal or external stimuli. Most computational methods use only transcriptional gene expression data to reconstruct GRNs, but recent studies suggest that gene expression data must be integrated with other types of data to obtain more accurate models of the real relationships between genes. In this study, a diffusion-based method is enhanced to integrate network-type biological data in addition to structural prior knowledge. The Random Walk with Restart (RWR) algorithm, with an emphasis on hub nodes, is executed separately on each network, and low-dimensional feature vectors for the network nodes are then jointly optimized by diffusion component analysis. These feature vectors are then used to infer gene regulatory networks. Fourteen centrality measures are studied for detecting the hub nodes used in the RWR algorithm, and the centrality measure with the greatest effect on the improvement of gene network inference is selected. A case study on the Saccharomyces cerevisiae and E. coli networks shows that using the proposed features, compared with gene expression data alone, improves the Area Under the Receiver Operating Characteristic curve (AUROC) by 0.02–0.08 across different GRN inference methods. Furthermore, the proposed method was applied to esophageal cancer data to infer its gene regulatory network. The proposed framework substantially improves the accuracy and scalability of GRN inference. The fused features and the best centrality measure detected can be used to provide functional insights about genes or proteins in various biological applications. Moreover, the method can serve as a general framework for integrating and analyzing network data and structural data in various scientific disciplines, including biology.
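
The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea, under several assumptions: degree centrality stands in for the centrality measure the study selects, the hypothetical `hub_weight` parameter controls how much restart probability is redirected to hubs, and a truncated SVD of the log-transformed diffusion states stands in for the full diffusion component analysis optimization. The function names and the toy star network are illustrative and not taken from the article.

```python
import numpy as np

def rwr_diffusion_states(adj, restart_prob=0.5, hub_weight=0.0):
    """Random Walk with Restart started from every node of one network.

    adj         : symmetric, non-negative adjacency matrix (n x n)
    hub_weight  : hypothetical knob in [0, 1]; share of the restart mass
                  sent to nodes in proportion to their degree centrality
                  (assumes the network has at least one edge)
    Returns an (n x n) matrix whose row i is the stationary distribution
    of the walk associated with node i.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    degree = adj.sum(axis=1)

    # Row-normalise into a transition matrix; isolated nodes get a self-loop.
    trans = adj / np.where(degree > 0, degree, 1.0)[:, None]
    isolated = np.flatnonzero(degree == 0)
    trans[isolated, isolated] = 1.0

    # Restart distribution: mostly the start node, partly the hubs.
    centrality = degree / degree.sum()
    restart = (1.0 - hub_weight) * np.eye(n) + hub_weight * np.tile(centrality, (n, 1))

    # Closed form of S = (1 - r) * S @ P + r * R, solved for S.
    system = np.eye(n) - (1.0 - restart_prob) * trans
    return np.linalg.solve(system.T, restart_prob * restart.T).T

def svd_embedding(states_per_network, dim):
    """Joint low-dimensional node features from several diffusion-state matrices.

    This is an SVD shortcut used here as a stand-in, not the full
    diffusion component analysis optimization.
    """
    n = states_per_network[0].shape[0]
    stacked = np.hstack([np.log(s + 1.0 / n) for s in states_per_network])
    u, sing, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :dim] * np.sqrt(sing[:dim])

# Toy star network: node 0 is the hub, nodes 1-3 are leaves.
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])
states_plain = rwr_diffusion_states(star, hub_weight=0.0)
states_hub = rwr_diffusion_states(star, hub_weight=0.5)
features = svd_embedding([states_plain, states_hub], dim=2)
```

In this sketch, each row of the diffusion-state matrix remains a probability distribution, and raising `hub_weight` shifts probability mass toward the hub for walks associated with leaf nodes. The resulting feature matrix has one row per node and could be passed, alongside expression data, to a GRN inference method.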
Original language: English
Article number: 107589
Journal: Computational Biology and Chemistry
Volume: 95
Early online date: 6 Oct 2021
Publication status: Published - Dec 2021

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
