A Comparative Study for Arabic Text Classification Based on BOW and Mixed Words Representations

Sallam, Rouhia M. and Mousa, Hamdy and Hussien, Mahmoud (2016) A Comparative Study for Arabic Text Classification Based on BOW and Mixed Words Representations. IJCI. International Journal of Computers and Information, 5 (1). pp. 24-34. ISSN 1687-7853



Official URL: https://doi.org/10.21608/ijci.2016.33954

Abstract

This paper compares two feature representation methods for Arabic text classification: bag of words (BOW), that is, word-level unigrams, and mixed words. The mixed words representation combines the bag of words with pairs of adjacent words in different proportions. The main objective is to measure the accuracy of each method and to determine which is more accurate for Arabic text classification based on the representation modes. Each method uses normalization and stemming. The results show that the mixed words representation achieves the highest accuracy, 98.61%, when normalization is used.
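
The two representations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The normalization rules (stripping diacritics and tatweel, unifying alef variants, alef maqsura and taa marbuta) are common Arabic preprocessing conventions assumed here, and the abstract does not define "proportions", so the hypothetical `unigram_share` parameter interprets it as the fraction of a fixed feature budget given to single words, with the remainder going to adjacent word pairs. All function names are illustrative.

```python
import re
from collections import Counter

# Assumed normalization rules (common in Arabic NLP, not taken from the paper).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")  # harakat and tatweel
_ALEF_VARIANTS = re.compile("[\u0623\u0625\u0622]")  # alef with hamza or madda


def normalize(text):
    """Apply a simple, assumed Arabic normalization to a string."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)   # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")     # alef maqsura -> yaa
    text = text.replace("\u0629", "\u0647")     # taa marbuta -> haa
    return text


def bow_features(tokens):
    """Bag of words: counts of word-level unigrams."""
    return Counter(tokens)


def bigram_features(tokens):
    """Counts of two adjacent words, joined by a space."""
    return Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))


def mixed_features(tokens, n_features, unigram_share):
    """Mix the most frequent unigrams and bigrams.

    unigram_share is a hypothetical knob: the fraction of the
    n_features budget given to unigrams; bigrams get the rest.
    """
    n_uni = round(n_features * unigram_share)
    n_bi = n_features - n_uni
    top_uni = bow_features(tokens).most_common(n_uni)
    top_bi = bigram_features(tokens).most_common(n_bi)
    return dict(top_uni + top_bi)
```

In this sketch a document would first be passed through `normalize`, split on whitespace, and then handed to `bow_features` or `mixed_features`; setting `unigram_share` to 1.0 keeps only single-word features. Stemming, which the paper also applies, is omitted.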

Item Type:	Article
Subjects:	Research Scholar Guardian > Computer Science
Depositing User:	Unnamed user with email support@scholarguardian.com
Date Deposited:	15 Sep 2023 05:09
Last Modified:	15 Sep 2023 05:09
URI:	http://science.sdpublishers.org/id/eprint/1383
