DEEP LEARNING IN GENOMIC SIGNAL PROCESSING
Date
2022-04
Authors
Bekbolat, Marzhan
Publisher
Nazarbayev University School of Engineering and Digital Sciences
Abstract
The complexity of genomics data is increasing in parallel with the development of
the field, creating new computational challenges. The recent appearance of new
generation sequencing (NGS) technologies such as single-cell RNA sequencing (scRNA-seq) increases
the chance of discovering new disease biomarkers and helps to deepen knowledge about
cellular functions. In parallel with developments in genomics, a number of algorithmic and
computational advances in machine learning have enabled deep learning technologies to
find unprecedented applications in many fields. However, the application of deep learning in
genomics remains limited. This is mainly attributed to the relatively small sample size (n)
with respect to the large number of genes (p) in such biomedical data. Moreover, the presence
of lowly expressed genes in the cell causes dropout events, which make scRNA-seq expression
data sparse.
Among the various types of neural networks, the convolutional neural network (CNN) has
become a particularly attractive choice in many applications. Although it has been presented as an
effective tool for complex classification and regression problems in fields such as computer
vision and natural language processing, which work with high-dimensional data, there are
limitations in applying CNNs to scRNA-seq data.
Even though a CNN has a weight-sharing feature that improves the generalization of the
network, the “large p, small n” nature of scRNA-seq data can lead to overfitting. Another
problem is that a CNN is designed to work with data that have a grid-like topology, such as
time series or digital images, which is not the case for scRNA-seq data. Therefore, in this thesis,
we propose a combination of methods based on hierarchical clustering, random projection,
and ensemble learning to train a CNN with scRNA-seq data. The integration of ensemble learning
with random projection is helpful when dealing with the high dimensionality of scRNA-seq
data, whereas hierarchical clustering is used as a tool for creating sequential data. The
proposed method does not require any domain-specific knowledge to create the
sequential data, and hence is applicable not only to scRNA-seq data but also to other applications
where data are sparse and high-dimensional.
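
The abstract does not spell out how the three components fit together, so the following is only a minimal sketch under stated assumptions, not the thesis implementation. It assumes that each ensemble member draws its own Gaussian random projection of the expression matrix, that the projected features are then ordered by average-linkage hierarchical clustering on correlation distance so that similar features become neighbours in a 1-D sequence, and that member predictions are combined by majority vote. The function names, the toy matrix, and the choice of linkage and distance are all hypothetical, and the CNN itself is omitted. The code is kept dependency-free for illustration only.

```python
import math
import random
from collections import Counter


def correlation_distance(a, b):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # constant profile: treat as uncorrelated
    return 1.0 - cov / (norm_a * norm_b)


def hierarchical_order(X):
    """Order the columns of X by average-linkage agglomerative clustering.

    Each cluster is an ordered list of column indices. Merging two clusters
    concatenates their lists, so the final list is a leaf ordering in which
    similar features end up next to each other.
    """
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    dist = [[correlation_distance(cols[i], cols[j]) for j in range(p)]
            for i in range(p)]
    clusters = [[j] for j in range(p)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


def random_projection(X, k, rng):
    """Project the rows of X (n x p) to k dimensions with a Gaussian matrix."""
    p = len(X[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)]
         for _ in range(p)]
    return [[sum(row[i] * R[i][j] for i in range(p)) for j in range(k)]
            for row in X]


def build_views(X, n_members, k, seed=0):
    """Create one projected, cluster-ordered view of X per ensemble member."""
    rng = random.Random(seed)
    views = []
    for _ in range(n_members):
        Z = random_projection(X, k, rng)
        order = hierarchical_order(Z)
        views.append([[row[j] for j in order] for row in Z])
    return views


def majority_vote(member_predictions):
    """Combine labels, where member_predictions[m][i] is member m's label for sample i."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*member_predictions)]


# Toy expression matrix (hypothetical): 5 cells x 4 genes.
# Genes 0 and 2 are perfectly correlated; genes 1 and 3 are nearly so.
X = [
    [1.0, 5.0, 2.0, 5.0],
    [2.0, 1.0, 4.0, 1.0],
    [3.0, 4.0, 6.0, 4.0],
    [4.0, 2.0, 8.0, 2.0],
    [5.0, 3.0, 10.0, 4.0],
]
order = hierarchical_order(X)
views = build_views(X, n_members=3, k=3, seed=0)
```

In this sketch, each entry of `views` would be the input to one base learner (for example a 1-D CNN), and `majority_vote` would merge the labels those learners predict. Because the ordering is derived from the data alone, no gene annotations are needed, which matches the abstract's claim that the approach uses no domain-specific knowledge.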
Keywords
Type of access: Gated Access, Research Subject Categories::TECHNOLOGY, new generation sequencing, NGS, single cell RNA sequence, scRNA-seq, Neural Networks
Citation
Bekbolat, M. (2022). DEEP LEARNING IN GENOMIC SIGNAL PROCESSING (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan