dc.contributor.author | Shomanov, A S | |
dc.contributor.author | Mansurova, M E | |
dc.date.accessioned | 2021-09-16T07:30:38Z | |
dc.date.available | 2021-09-16T07:30:38Z | |
dc.date.issued | 2021 | |
dc.identifier.citation | Shomanov, A. S., & Mansurova, M. E. (2021). Parallel news clustering and topic modeling approaches. Journal of Physics: Conference Series, 1727, 012018. https://doi.org/10.1088/1742-6596/1727/1/012018 | en_US |
dc.identifier.uri | http://nur.nu.edu.kz/handle/123456789/5793 | |
dc.description.abstract | There is currently an urgent need to develop massively scalable and efficient tools for Big Data processing. Even the smallest companies now require ever more resources for data processing routines that can enhance decision making and reliably predict and simulate different scenarios. In this paper we present our combined work on several massively scalable approaches to clustering and topic modeling of a dataset collected by crawling Kazakhstan news websites. In particular, we propose parallel Apache Spark solutions to the news clustering and topic modeling problems and, additionally, describe the results of implementing document clustering using a partitioned global address space MapReduce system that we developed. We describe our experience in solving these problems and investigate the efficiency and scalability of the proposed solutions. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Journal of Physics: Conference Series | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/us/ | * |
dc.subject | Type of access: Open Access | en_US |
dc.subject | Big Data processing | en_US |
dc.title | PARALLEL NEWS CLUSTERING AND TOPIC MODELING APPROACHES | en_US |
dc.type | Article | en_US |
workflow.import.source | science |