DEEP LEARNING IN GENOMIC SIGNAL PROCESSING

dc.contributor.author: Bekbolat, Marzhan
dc.date.accessioned: 2022-09-21T08:29:21Z
dc.date.available: 2022-09-21T08:29:21Z
dc.date.issued: 2022-04
dc.description.abstract: The complexity of genomics data is increasing in parallel with the development of the field, creating new computational challenges. The recent appearance of new generation sequencing (NGS) technologies such as single-cell RNA sequencing (scRNA-seq) increases the chance of discovering new disease biomarkers and helps to deepen knowledge about cellular functions. In parallel with developments in genomics, a number of algorithmic and computational advancements in machine learning have enabled deep learning technologies to find unprecedented applications in many fields. However, the applications of deep learning in genomics are limited. This state of affairs is mainly attributed to the relatively small sample size (n) with respect to the large number of genes (p) in such biomedical data. Moreover, the presence of lowly expressed genes in the cell causes dropout events, which lead to the sparse nature of scRNA-seq expression data. Among the various types of neural networks, the convolutional neural network (CNN) has become a particularly attractive choice in many applications. Although it has been presented as an effective tool for complex classification and regression problems in fields that work with high-dimensional data, such as computer vision and natural language processing, there are limitations in applying CNNs to scRNA-seq data. Even though a CNN has a weight-sharing feature that improves the network's generalization, the "large p, small n" nature of scRNA-seq data can lead to overfitting. Another problem is that a CNN is designed to work with data that has a grid-like topology, such as time series or digital images, which is not the case for scRNA-seq data. Therefore, in this thesis, we propose a combination of methods based on hierarchical clustering, random projection, and ensemble learning to train a CNN with scRNA-seq data.
The integration of ensemble learning with random projection is helpful when dealing with the high dimensionality of scRNA-seq data, whereas hierarchical clustering is used as a tool for creating sequential data. The proposed method does not require any domain-specific knowledge to create the sequential data, and hence is applicable not only to scRNA-seq data but also to other applications where data is sparse and high-dimensional. [en_US]
dc.identifier.citation: Bekbolat, M. (2022). DEEP LEARNING IN GENOMIC SIGNAL PROCESSING (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan [en_US]
dc.identifier.uri: http://nur.nu.edu.kz/handle/123456789/6718
dc.language.iso: en [en_US]
dc.publisher: Nazarbayev University School of Engineering and Digital Sciences [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/ [*]
dc.subject: Type of access: Gated Access [en_US]
dc.subject: Research Subject Categories::TECHNOLOGY [en_US]
dc.subject: new generation sequencing [en_US]
dc.subject: NGS [en_US]
dc.subject: single cell RNA sequence [en_US]
dc.subject: scRNA-seq [en_US]
dc.subject: Neural Networks [en_US]
dc.title: DEEP LEARNING IN GENOMIC SIGNAL PROCESSING [en_US]
dc.type: Master's thesis [en_US]
workflow.import.source: science
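
Illustrative note on the abstract: the abstract names random projection combined with ensemble learning as the way to handle the "large p, small n" setting. The sketch below is not taken from the thesis; it is a minimal illustration under stated assumptions. Each ensemble member draws its own Gaussian random matrix, projects the p-dimensional expression vectors down to k dimensions, and trains a base classifier; predictions are combined by majority vote. A nearest-centroid classifier stands in for the CNN, and the hierarchical-clustering step that orders the genes is omitted. All function names and parameters (k, n_members, seed) are hypothetical.

```python
import random
from collections import Counter


def random_projection_matrix(p, k, rng):
    """Return a p x k Gaussian random matrix scaled by 1/sqrt(k)."""
    scale = 1.0 / (k ** 0.5)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(p)]


def project(x, R):
    """Project a length-p vector x to k dimensions using matrix R."""
    k = len(R[0])
    return [sum(x[i] * R[i][j] for i in range(len(x))) for j in range(k)]


def fit_centroids(Z, y):
    """Stand-in base learner (assumption): mean vector of each class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(Z, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_centroid(centroids, z):
    """Return the class whose centroid is closest to z (squared Euclidean)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], z)),
    )


def fit_ensemble(X, y, k=8, n_members=5, seed=0):
    """Train one base learner per independent random projection."""
    rng = random.Random(seed)
    p = len(X[0])
    members = []
    for _ in range(n_members):
        R = random_projection_matrix(p, k, rng)
        Z = [project(x, R) for x in X]
        members.append((R, fit_centroids(Z, y)))
    return members


def predict_ensemble(members, x):
    """Majority vote over the predictions of all ensemble members."""
    votes = Counter(
        predict_centroid(centroids, project(x, R)) for R, centroids in members
    )
    return votes.most_common(1)[0][0]
```

Calling fit_ensemble(X, y) on a list of expression vectors X with labels y returns the list of (projection matrix, model) pairs, and predict_ensemble(members, x) classifies a new vector x. Averaging over several projections is meant to reduce the variance that any single random projection introduces.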

Files

Original bundle
Name: Thesis - Marzhan Bekbolat.pdf
Size: 586.14 KB
Format: Adobe Portable Document Format
Description: Thesis

Name: Presentation - Marzhan Bekbolat.pptx
Size: 4.19 MB
Format: Microsoft Powerpoint XML
Description: Presentation

License bundle
Name: license.txt
Size: 6.28 KB
Format: Item-specific license agreed upon to submission