Sentiment Analysis using Unlabeled Email data

Rayan Salah Hag Ali, Neamat El Gayar

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Citations (Scopus)


Sentiment Analysis (SA), in the context of text mining, is an automated process for detecting subjective information such as opinions, attitudes, emotions, and feelings. Most prior work in SA treats it as a text classification problem, which requires labeled data to train the model. However, labeled datasets are hard to obtain, and most of the time the labeling has to be done by hand. Another issue is the lack of portability across domains, which makes it hard to reuse the same labeled data in different applications, so labeled data must be created manually for each domain. In this paper, we apply sentiment analysis to the Enron email dataset. This work aims to find the best techniques for labeling the dataset automatically, avoiding manual labeling. In the labeling phase, we compare lexicon labeling with k-means labeling. Lexicon labeling gave better and more reliable results, so we used the lexicon-labeled dataset as training data for a supervised machine learning classifier. We used TF-IDF for feature extraction and trained Naïve Bayes and Support Vector Machine (SVM) classifiers.
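
The pipeline described in the abstract (label automatically with a lexicon, extract TF-IDF features, train a classifier) can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the toy word lists in POSITIVE and NEGATIVE, the sample emails, the "neutral" fallback, and the helper names tokenize, lexicon_label, fit_idf, tfidf and NaiveBayes. The sketch covers only the lexicon labeling and Naïve Bayes steps; the k-means labeling and SVM classifier that the paper also evaluates are left out, and a real experiment would more likely rely on an established machine learning library.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon; the abstract does not say which lexicon was used.
POSITIVE = {"thanks", "great", "good", "pleased"}
NEGATIVE = {"sorry", "delay", "problem", "bad", "angry"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?;:") for w in text.lower().split())
    return [w for w in words if w]


def lexicon_label(text):
    """Label a text by counting positive and negative lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumed fallback for ties and texts with no hits


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every training term."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(tokens, idf):
    """Map tokens to {term: tf * idf}; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


class NaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with add-one smoothing."""

    def fit(self, vectors, labels):
        self.vocab = {t for v in vectors for t in v}
        self.prior = {y: math.log(c / len(labels)) for y, c in Counter(labels).items()}
        self.weights = {y: defaultdict(float) for y in self.prior}
        for v, y in zip(vectors, labels):
            for t, w in v.items():
                self.weights[y][t] += w
        self.totals = {y: sum(ws.values()) for y, ws in self.weights.items()}
        return self

    def predict(self, vector):
        def log_score(y):
            denom = self.totals[y] + len(self.vocab)
            return self.prior[y] + sum(
                w * math.log((self.weights[y].get(t, 0.0) + 1) / denom)
                for t, w in vector.items()
            )
        return max(self.prior, key=log_score)


# Made-up sample emails standing in for unlabeled data.
emails = [
    "Thanks for the great work on the report",
    "Sorry about the delay, there is a problem with the server",
    "Good news, the client is pleased with the results",
    "This is a bad outcome and the team is angry",
]
labels = [lexicon_label(e) for e in emails]      # automatic labeling step
token_lists = [tokenize(e) for e in emails]
idf = fit_idf(token_lists)                       # feature extraction step
model = NaiveBayes().fit([tfidf(t, idf) for t in token_lists], labels)
prediction = model.predict(tfidf(tokenize("Thanks, the results are good"), idf))
print(labels, prediction)
```

The point of the design is that the classifier never sees a hand-assigned label: the lexicon supplies the training targets, so the quality of the trained model is bounded by the quality of the lexicon labels.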

Original language: English
Title of host publication: 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE)
Number of pages: 6
ISBN (Electronic): 9781728137780
Publication status: Published - 20 Feb 2020
Event: 2019 International Conference on Computational Intelligence and Knowledge Economy - Dubai, United Arab Emirates
Duration: 11 Dec 2019 to 12 Dec 2019


Conference: 2019 International Conference on Computational Intelligence and Knowledge Economy
Abbreviated title: ICCIKE 2019
Country/Territory: United Arab Emirates


Keywords

  • k-means
  • Sentiment analysis
  • support vector machine

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications

