DSpace Repository

USING NATURAL LANGUAGE PROCESSING TO THE HUMANITIES: EXPLORING CLIMATE CHANGE IN THE SCIENTIFIC LITERATURE

dc.contributor.author Berdimbetov, Dossym
dc.date.accessioned 2022-09-16T05:14:05Z
dc.date.available 2022-09-16T05:14:05Z
dc.date.issued 2022-07
dc.identifier.citation Berdimbetov, D. (2022). Using Natural Language Processing to the Humanities: Exploring Climate Change in the Scientific Literature (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan en_US
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/6702
dc.description.abstract Researchers today struggle to search and process the scientific literature and to extract hidden relations and key facts from articles. This study looks at how to improve the situation. We explored methods for obtaining metadata from several scientific article aggregators before settling on "www.core.ac.uk". After duplicate removal, language filtering, and preprocessing, the data set was reduced from 111,552 records to 49,310 records. Stemming, standardization, and lemmatization were used for data preprocessing. Our goal is to compare the clustering results obtained with Word2Vec, FastText, Doc2Vec, and Top2Vec embeddings, using TF-IDF with K-Means clustering as the baseline model. The critical point is that real-world data is diverse and varies in density, so both the local and the global structure of the data must be preserved. We therefore used the UMAP dimensionality reduction approach for dense and arbitrary data and the HDBSCAN algorithm, which detects clusters based on density. We interviewed five assessors about the test results. The results are promising, and we intend to continue research on this subject in the future, including topic models such as LDA and BERTopic. en_US
dc.language.iso en en_US
dc.publisher Nazarbayev University School of Engineering and Digital Sciences en_US
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/ *
dc.subject Type of access: Gated Access en_US
dc.subject Research Subject Categories::TECHNOLOGY en_US
dc.subject Natural Language Processing en_US
dc.subject Climate Change en_US
dc.subject Scientific Literature en_US
dc.title USING NATURAL LANGUAGE PROCESSING TO THE HUMANITIES: EXPLORING CLIMATE CHANGE IN THE SCIENTIFIC LITERATURE en_US
dc.type Master's thesis en_US
workflow.import.source science
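
The abstract names TF-IDF with K-Means clustering as the baseline model. The sketch below illustrates that general technique only; it is not the thesis code. It uses the Python standard library, whitespace tokenization, the smoothed IDF form that scikit-learn applies by default, and deterministic centroid initialization from the first k documents. The function names (`tfidf_vectors`, `kmeans`) and the four toy documents are hypothetical and do not come from the thesis data set.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def kmeans(vectors, k, n_iter=20):
    """Lloyd's algorithm; centroids start at the first k vectors (an assumption made for determinism)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy documents, for illustration only.
docs = [
    "sea level rise coastal flooding",
    "carbon emissions policy tax",
    "coastal flooding sea level",
    "carbon tax emissions policy",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

On a corpus of the size reported in the abstract, a library implementation with sparse matrices and a better initialization scheme would be the practical choice; the sketch is meant only to make the two steps of the baseline concrete.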


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States