A Distributed Software Architecture for Performing Text Analysis on Web content
Date
2017-05
Authors
Aldabergenov, Aibek
Publisher
Nazarbayev University School of Science and Technology
Abstract
With the high availability of data on the World Wide Web, researchers are actively using
Web content for performing various text analysis operations. The large amount of data
introduces challenges in data acquisition, storage and processing for researchers who want to
use data from different sources on the Internet. In an environment where several people might
want to share their data and code, the problem is further complicated by researchers' use of
different software applications for performing data collection, storage and analysis tasks.
The goal of this thesis is to study the components that make up different parts of web
mining systems, and present a scalable software architecture for large-scale Web content
analytics tasks performed in a multi-user setting. Additionally, an implementation of the
proposed software architecture using modern open source software frameworks and tools is
presented in this work.
Keywords
World Wide Web, data
Citation
Aibek Aldabergenov. A Distributed Software Architecture for Performing Text Analysis on Web content. 2017. Department of Computer Science, School of Science and Technology, Nazarbayev University