DEEP LEARNING IN GENOMIC SIGNAL PROCESSING

Bekbolat, Marzhan

dc.contributor.author	Bekbolat, Marzhan
dc.date.accessioned	2022-09-21T08:29:21Z
dc.date.available	2022-09-21T08:29:21Z
dc.date.issued	2022-04
dc.description.abstract	The complexity of genomics data is increasing in parallel with the development of the field, creating new computational challenges. The recent emergence of new generation sequencing (NGS) technologies such as single-cell RNA sequencing (scRNA-seq) increases the chance of discovering new disease biomarkers and helps to deepen knowledge of cellular functions. In parallel with developments in genomics, a number of algorithmic and computational advances in machine learning have enabled deep learning technologies to find unprecedented applications in many fields. However, the application of deep learning in genomics is limited. This state of affairs is mainly attributed to the relatively small sample size (n) with respect to the large number of genes (p) in such biomedical data. Moreover, the presence of lowly expressed genes in the cell causes dropout events, which lead to the sparse nature of scRNA-seq expression data. Among the various types of neural networks, the convolutional neural network (CNN) has become a particularly attractive choice in many applications. Although it has been presented as an effective tool for complex classification and regression problems in fields that work with high-dimensional data, such as computer vision and natural language processing, there are limitations in applying CNNs to scRNA-seq data. Even though a CNN has a weight-sharing feature that improves the generalization of the network, the “large p, small n” nature of scRNA-seq data can lead to overfitting. Another problem is that a CNN is designed to work with data that has a grid-like topology, such as time series or digital images, which is not the case for scRNA-seq data. Therefore, in this thesis, we propose a combination of methods based on hierarchical clustering, random projection, and ensemble learning to train a CNN with scRNA-seq data.
The integration of ensemble learning with random projection is helpful when dealing with the high dimensionality of scRNA-seq data, whereas hierarchical clustering is used as a tool for creating sequential data. The proposed method does not rely on any domain-specific knowledge in creating the sequential data, and hence is applicable not only to scRNA-seq data but also to other applications where data is sparse and high-dimensional.	en_US
dc.identifier.citation	Bekbolat, M. (2022). DEEP LEARNING IN GENOMIC SIGNAL PROCESSING (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan	en_US
dc.identifier.uri	http://nur.nu.edu.kz/handle/123456789/6718
dc.language.iso	en	en_US
dc.publisher	Nazarbayev University School of Engineering and Digital Sciences	en_US
dc.rights	Attribution-NonCommercial-ShareAlike 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/us/	*
dc.subject	type of access: gated access	en_US
dc.subject	Research Subject Categories::TECHNOLOGY	en_US
dc.subject	new generation sequencing	en_US
dc.subject	NGS	en_US
dc.subject	single cell RNA sequence	en_US
dc.subject	scRNA-seq	en_US
dc.subject	Neural Networks	en_US
dc.title	DEEP LEARNING IN GENOMIC SIGNAL PROCESSING	en_US
dc.type	Master's thesis	en_US
workflow.import.source	science
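
The abstract above names three building blocks (random projection, hierarchical clustering to create sequential data, and ensemble learning) without giving details. The following is a minimal standard-library sketch of how such pieces could fit together, not the thesis's actual implementation. Everything specific here is an assumption made for illustration: the function names, the Gaussian projection matrix, the use of single-linkage clustering on the projected features to obtain an ordering, the order of the steps, and majority voting as the ensemble rule. The CNN itself is omitted; each ensemble member would be trained on the ordered sequences produced from its own random seed.

```python
import random
from collections import Counter

def random_projection(X, k, seed):
    """Project the rows of X (n cells x p genes) onto k random Gaussian directions."""
    rng = random.Random(seed)
    p = len(X[0])
    # Assumed projection matrix: i.i.d. N(0, 1/k) entries.
    R = [[rng.gauss(0.0, 1.0) / k ** 0.5 for _ in range(k)] for _ in range(p)]
    return [[sum(row[j] * R[j][c] for j in range(p)) for c in range(k)] for row in X]

def column_distance(Z, a, b):
    """Euclidean distance between feature columns a and b of Z."""
    return sum((row[a] - row[b]) ** 2 for row in Z) ** 0.5

def hierarchical_order(Z):
    """Order the features of Z by single-linkage agglomerative clustering.

    Each cluster is kept as an ordered list and a merge concatenates two
    lists, so the final list is one leaf order of the resulting dendrogram.
    """
    k = len(Z[0])
    dist = {(a, b): column_distance(Z, a, b) for a in range(k) for b in range(a + 1, k)}
    clusters = [[j] for j in range(k)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[(min(a, b), max(a, b))]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters[0]

def make_sequences(X, k, seed):
    """Build one ensemble member's input: projected features in clustered order."""
    Z = random_projection(X, k, seed)
    order = hierarchical_order(Z)
    return [[row[j] for j in order] for row in Z]

def majority_vote(predictions):
    """Combine per-member label lists into one label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Hypothetical toy expression matrix: 4 cells x 6 genes, mostly zeros.
toy_cells = [
    [0.0, 3.0, 0.0, 1.0, 0.0, 2.0],
    [0.0, 2.5, 0.0, 0.0, 0.0, 1.5],
    [4.0, 0.0, 1.0, 0.0, 2.0, 0.0],
    [3.5, 0.0, 0.0, 0.0, 2.5, 0.0],
]

# One input set per ensemble member, each from a different random projection.
member_inputs = [make_sequences(toy_cells, k=3, seed=s) for s in range(5)]
```

In this sketch each seed yields a different low-dimensional view of the same cells, so the ensemble members see different feature sequences; `majority_vote` would then be applied to the label lists predicted by the trained members.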

Files

Original bundle

Name: Thesis - Marzhan Bekbolat.pdf
Size: 586.14 KB
Format: Adobe Portable Document Format
Description: Thesis

Name: Presentation - Marzhan Bekbolat.pptx
Size: 4.19 MB
Format: Microsoft Powerpoint XML
Description: Presentation

Collections

02. Master's Thesis