Sallam, Rouhia M. and Mousa, Hamdy and Hussien, Mahmoud (2016) A Comparative Study for Arabic Text Classification Based on BOW and Mixed Words Representations. IJCI. International Journal of Computers and Information, 5 (1). pp. 24-34. ISSN 1687-7853
IJCI_Volume 5_Issue 1_Pages 24-34.pdf - Published Version
Abstract
This paper compares two feature representation methods for Arabic text classification: bag of words (BOW), that is, word-level unigrams, and mixed words. The mixed-words representation combines bag-of-words features with pairs of adjacent words in different proportions. The main objective is to measure the accuracy of each method and to determine which representation is more accurate for Arabic text classification. Each method is applied with normalization and stemming. The results show that the mixed-words representation achieves the highest accuracy, 98.61%, when normalization is used.
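As a rough illustration of the two representations described in the abstract, the following minimal sketch builds unigram (BOW) counts and then mixes in a share of adjacent-word pairs. It is not the authors' implementation: the function names, the `pair_fraction` parameter, and the rule of keeping the most frequent pairs are all assumptions made for illustration, and the normalization and stemming steps are omitted because the abstract does not specify them.

```python
from collections import Counter


def adjacent_pairs(tokens):
    """Return two-adjacent-word features, joined with an underscore."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def mixed_features(tokens, pair_fraction=0.5):
    """Count unigrams plus a fraction of the distinct adjacent-word pairs.

    pair_fraction is a hypothetical knob standing in for the abstract's
    "different proportions". Keeping the most frequent pairs is an
    assumption, not a rule taken from the paper.
    """
    if not 0.0 <= pair_fraction <= 1.0:
        raise ValueError("pair_fraction must be between 0 and 1")
    features = Counter(tokens)  # plain bag of words (unigrams)
    pair_counts = Counter(adjacent_pairs(tokens))
    keep = int(len(pair_counts) * pair_fraction)  # number of distinct pairs kept
    features.update(dict(pair_counts.most_common(keep)))
    return features


# Placeholder tokens; in practice these would be normalized, stemmed Arabic words.
tokens = ["w1", "w2", "w3", "w1", "w2"]
bow = mixed_features(tokens, pair_fraction=0.0)    # unigrams only
mixed = mixed_features(tokens, pair_fraction=0.5)  # unigrams plus some pairs
```

With `pair_fraction=0.0` the result reduces to the plain BOW counts, so both representations can be fed to the same classifier and compared on equal terms.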
Item Type: | Article |
---|---|
Subjects: | Research Scholar Guardian > Computer Science |
Depositing User: | Unnamed user with email support@scholarguardian.com |
Date Deposited: | 15 Sep 2023 05:09 |
Last Modified: | 15 Sep 2023 05:09 |
URI: | http://science.sdpublishers.org/id/eprint/1383 |