DSpace Repository

DEEP LEARNING IN GENOMIC SIGNAL PROCESSING



dc.contributor.author Bekbolat, Marzhan
dc.date.accessioned 2022-09-21T08:29:21Z
dc.date.available 2022-09-21T08:29:21Z
dc.date.issued 2022-04
dc.identifier.citation Bekbolat, M. (2022). DEEP LEARNING IN GENOMIC SIGNAL PROCESSING (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan en_US
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/6718
dc.description.abstract The complexity of genomics data is increasing in parallel with the development of the field, creating new computational challenges. The recent emergence of next-generation sequencing (NGS) technologies such as single-cell RNA sequencing (scRNA-seq) increases the chance of discovering new disease biomarkers and helps deepen the understanding of cellular functions. In parallel with these developments in genomics, a number of algorithmic and computational advances in machine learning have enabled deep learning technologies to find unprecedented applications in many fields. However, the application of deep learning in genomics remains limited. This is mainly attributed to the relatively small sample size (n) compared with the large number of genes (p) in such biomedical data. Moreover, lowly expressed genes give rise to dropout events, which make scRNA-seq expression data sparse. Among the various types of neural networks, the convolutional neural network (CNN) has become a particularly attractive choice in many applications. Although CNNs have proven effective for complex classification and regression problems in fields that work with high-dimensional data, such as computer vision and natural language processing, there are limitations in applying them to scRNA-seq data. Even though weight sharing in a CNN improves the network's generalization, the “large p, small n” nature of scRNA-seq data can still lead to overfitting. Another problem is that CNNs are designed for data with a grid-like topology, such as time series or digital images, which scRNA-seq data lack. Therefore, in this thesis, we propose a combination of methods based on hierarchical clustering, random projection, and ensemble learning to train a CNN on scRNA-seq data.
The integration of ensemble learning with random projection helps to cope with the high dimensionality of scRNA-seq data, while hierarchical clustering is used to create sequential data. The proposed method does not require any domain-specific knowledge to create the sequential data and is therefore applicable not only to scRNA-seq data but also to other applications where data are sparse and high-dimensional. en_US
dc.language.iso en en_US
dc.publisher Nazarbayev University School of Engineering and Digital Sciences en_US
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/ *
dc.subject Type of access: Gated Access en_US
dc.subject Research Subject Categories::TECHNOLOGY en_US
dc.subject new generation sequencing en_US
dc.subject NGS en_US
dc.subject single cell RNA sequence en_US
dc.subject scRNA-seq en_US
dc.subject Neural Networks en_US
dc.title DEEP LEARNING IN GENOMIC SIGNAL PROCESSING en_US
dc.type Master's thesis en_US
workflow.import.source science
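The abstract combines random projection, ensemble learning, and hierarchical clustering so that a CNN can be trained on scRNA-seq data. The following is a minimal, dependency-free Python sketch of one way these pieces could fit together; it is not the thesis's implementation. Assumptions: each ensemble member draws its own Gaussian random projection; the projected features are ordered by the leaf order of an average-linkage dendrogram (Euclidean distance on z-scored feature profiles), which turns each cell into a 1D sequence; a fixed smoothing convolution followed by a nearest-centroid rule stands in for the trained CNN; and members are combined by majority vote. All names (`leaf_order`, `conv1d_valid`, `ProjectedMember`, `ensemble_predict`, `make_toy_data`) and the toy data are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def leaf_order(vectors):
    """Average-linkage agglomerative clustering (Euclidean), returning leaf order.

    Members of every merged cluster stay contiguous. O(n^3): illustration only.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(math.dist(vectors[a], vectors[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


def conv1d_valid(seq, kernel):
    """'Valid' 1D cross-correlation, the operation a CNN convolution layer applies."""
    w = len(kernel)
    return [sum(seq[t + s] * kernel[s] for s in range(w))
            for t in range(len(seq) - w + 1)]


class ProjectedMember:
    """Ensemble member: random projection -> clustered ordering -> stand-in classifier."""

    def __init__(self, n_components, seed):
        self.k = n_components
        self.rng = random.Random(seed)

    def _project(self, x):
        return [sum(r * v for r, v in zip(row, x)) for row in self.R]

    def _features(self, x):
        z = self._project(x)
        ordered = [z[i] for i in self.order]
        # Assumption: a fixed smoothing kernel stands in for learned CNN filters.
        return conv1d_valid(ordered, [1 / 3, 1 / 3, 1 / 3])

    def fit(self, X, y):
        # Gaussian random projection to k dimensions, entries ~ N(0, 1/k).
        scale = 1 / math.sqrt(self.k)
        self.R = [[self.rng.gauss(0.0, scale) for _ in X[0]] for _ in range(self.k)]
        Z = [self._project(x) for x in X]
        # Order projected features by clustering their z-scored profiles across
        # cells, so correlated features become neighbours in the 1D sequence.
        columns = []
        for j in range(self.k):
            col = [z[j] for z in Z]
            mu = statistics.fmean(col)
            sd = statistics.pstdev(col) or 1.0
            columns.append([(v - mu) / sd for v in col])
        self.order = leaf_order(columns)
        # Assumption: nearest-centroid rule on the convolved sequence (CNN stand-in).
        feats = [self._features(x) for x in X]
        self.centroids = {}
        for label in set(y):
            rows = [f for f, t in zip(feats, y) if t == label]
            self.centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
        return self

    def predict_one(self, x):
        f = self._features(x)
        return min(self.centroids, key=lambda c: math.dist(f, self.centroids[c]))


def ensemble_predict(members, x):
    """Combine members by majority vote."""
    return Counter(m.predict_one(x) for m in members).most_common(1)[0][0]


def make_toy_data(n_per_class, p, seed):
    """Hypothetical sparse toy data: two cell types with 5 marker genes each."""
    rng = random.Random(seed)
    X, y = [], []
    for label, markers in ((0, range(0, 5)), (1, range(10, 15))):
        for _ in range(n_per_class):
            cell = []
            for g in range(p):
                value = max(0.0, rng.gauss(8.0 if g in markers else 0.5, 1.0))
                if rng.random() < 0.2:  # simulated dropout event
                    value = 0.0
                cell.append(math.log1p(value))
            X.append(cell)
            y.append(label)
    return X, y
```

A possible call sequence: build `X_train, y_train = make_toy_data(30, 20, seed=1)`, fit seven members with `ProjectedMember(10, seed=s).fit(X_train, y_train)` for `s` in `range(7)`, then label each held-out cell with `ensemble_predict(members, x)`. Replacing the smoothing-plus-centroid step with a trainable 1D CNN (for example, a `Conv1d` layer in PyTorch) would bring the sketch closer to the setting the abstract describes.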


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States.