PARALLEL NEWS CLUSTERING AND TOPIC MODELING APPROACHES

Authors

Shomanov, A S
Mansurova, M E

Publisher

Journal of Physics: Conference Series

Abstract

There is an urgent need for massively scalable and efficient tools for Big Data processing. Even the smallest companies now require ever more resources for data processing routines that can enhance decision making and reliably predict and simulate different scenarios. In this paper we present our combined work on several massively scalable approaches to clustering and topic modeling of a dataset collected by crawling Kazakhstan news websites. In particular, we propose parallel Apache Spark solutions to the news clustering and topic modeling problems and, additionally, describe the results of implementing document clustering with a partitioned global address space (PGAS) MapReduce system that we developed. We describe our experience in solving these problems and investigate the efficiency and scalability of the proposed solutions.

Citation

Shomanov, A. S., & Mansurova, M. E. (2021). Parallel news clustering and topic modeling approaches. Journal of Physics: Conference Series, 1727, 012018. https://doi.org/10.1088/1742-6596/1727/1/012018

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States