TEXT CATEGORIZATION USING MACHINE LEARNING

Publisher

School of Engineering and Digital Sciences

Abstract

In today’s technological world, the amount of textual data is growing dramatically, and databases, libraries, bookstores, and private collections are expanding to keep up with demand. Many new and upcoming books and articles need to be classified, for example as science, entertainment, or fiction. Machine learning can therefore be applied to classify large volumes of text into categories and genres. This automates the process and saves thousands of hours of routine manual work. Text categorization is the process of assigning texts to categories or genres. There are a variety of ways to do this. One common method is to use a set of predetermined categories, such as news, sports, and weather. Another approach is to let the system learn the categories from a training set of texts that have previously been labeled with their genres. Once the system has learned the different genres, it can be used to label new texts automatically. In this paper, we focus on classifying books by genre using machine learning. Popular genres include fiction, non-fiction, mystery, romance, and science fiction; there are countless others, and new ones continue to emerge. The main objective of this paper is to build an automated genre-classification system for large texts. The outcome may support text categorization, recommendation systems, search engines that retrieve books or articles on a given topic, document classification, and related applications. To this end, the paper investigates state-of-the-art methods.
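The supervised approach described in the abstract trains on texts already labeled with their genres and then labels new texts. A minimal sketch of that idea follows. It assumes a multinomial naive Bayes classifier with add-one (Laplace) smoothing, which is chosen here only as a simple illustration. It is not necessarily the method used in this work. The class name `NaiveBayesGenreClassifier`, the `tokenize` helper, and the toy training sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesGenreClassifier:
    """Hypothetical illustration: multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesGenreClassifier":
        self.doc_counts = Counter(labels)          # number of training texts per genre
        self.word_counts = defaultdict(Counter)    # word frequencies per genre
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # vocabulary = every word seen in any genre during training
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict(self, text: str) -> str:
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unseen words
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log prior P(genre) plus smoothed log likelihoods P(word | genre)
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set: two labeled texts per genre.
train_texts = [
    "the detective examined the murder weapon and questioned the suspect",
    "a clue hidden in the locked room pointed to the killer",
    "their eyes met and she felt her heart race with love",
    "he proposed under the stars and she said yes to the wedding",
    "the starship crew activated the warp drive near the alien planet",
    "robots and lasers defended the colony on mars",
]
train_labels = ["mystery", "mystery", "romance", "romance",
                "science fiction", "science fiction"]

clf = NaiveBayesGenreClassifier().fit(train_texts, train_labels)
print(clf.predict("the detective found a new clue about the killer"))  # mystery
```

Naive Bayes is used here because it needs only word counts and fits in a few lines, which makes the "learn from labeled texts, then label new ones" loop easy to see. A real book-genre system would train on far larger corpora and would likely compare stronger models.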

Citation

Okhassov, T. (2023). Text categorization using machine learning. School of Engineering and Digital Sciences

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States.