Abstract:
Independent Component Analysis (ICA) is a matrix factorization method for dimensionality
reduction. ICA has been widely applied to transcriptomic data for the blind
separation of biological, environmental, and technical factors affecting gene expression.
This study aimed to analyze publicly available esophageal cancer (EC) data using ICA to
identify and comprehensively characterize reproducible signaling pathways and
molecular signatures involved in this cancer type. Four independent
esophageal cancer transcriptomic datasets from the GEO database were used. A
bioinformatics tool, BiODICA (Independent Component Analysis of Big Omics Data),
was applied to compute independent components (ICs). Gene Set Enrichment Analysis
(GSEA) and ToppGene were used to identify the most significantly enriched pathways.
Gene networks and graphs were constructed and visualized using Cytoscape and the
HPRD database. A correlation graph between the decompositions into 30 ICs was built,
keeping only edges with
absolute correlation values exceeding 0.3. Clusters of components (pseudocliques) were
observed in the structure of the correlation graph. The top 1,000 most contributing genes
of each IC in the pseudocliques were mapped to the protein-protein interaction (PPI)
network to construct the associated signaling pathways. Some cliques were composed of
densely interconnected nodes and
included components common to most cancer types (such as cell cycle and extracellular
matrix signals), while others were specific to EC. The results of this investigation may reveal
potential biomarkers of esophageal carcinogenesis and functional subsystems dysregulated
in tumor cells, and may help predict the early development of a tumor.
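
The graph-building and gene-selection steps described above can be illustrated with a minimal sketch. It is not the BiODICA implementation. It assumes each IC is available as a mapping from gene to weight, that components are compared by Pearson correlation over the genes shared between two datasets, and that "most contributing" means largest absolute weight. The function names (`pearson`, `correlation_graph`, `top_genes`) and the toy values are hypothetical; only the 0.3 threshold and the top-1,000 cutoff come from the text.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(vx * vy)


def correlation_graph(components, threshold=0.3):
    """Link ICs from different datasets whose |r| exceeds the threshold.

    components: {(dataset, ic_name): {gene: weight}}  (assumed layout)
    Returns a list of (node_a, node_b, r) edges.
    """
    edges = []
    for a, b in combinations(sorted(components), 2):
        if a[0] == b[0]:
            continue  # skip pairs from the same decomposition
        shared = sorted(set(components[a]) & set(components[b]))
        if len(shared) < 3:
            continue  # too few shared genes for a meaningful correlation
        r = pearson([components[a][g] for g in shared],
                    [components[b][g] for g in shared])
        if abs(r) > threshold:
            edges.append((a, b, r))
    return edges


def top_genes(weights, n=1000):
    """Genes with the largest absolute weight in one component."""
    return sorted(weights, key=lambda g: abs(weights[g]), reverse=True)[:n]


# Toy data (hypothetical): one IC from dataset ds1, two ICs from dataset ds2.
toy = {
    ("ds1", "IC1"): {"g1": 2.0, "g2": 1.0, "g3": -1.0, "g4": -2.0},
    ("ds2", "IC1"): {"g1": 1.8, "g2": 1.1, "g3": -0.9, "g4": -2.1},
    ("ds2", "IC2"): {"g1": 1.0, "g2": -2.0, "g3": 2.0, "g4": -1.0},
}
edges = correlation_graph(toy)
```

In this toy example only the two similar components from different datasets are linked. Densely connected groups of such edges would correspond to the pseudocliques, and `top_genes` gives the gene list that would then be mapped to the PPI network.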