Abstract
Sentiment Analysis (SA), in the context of text mining, is an automated process for detecting subjective information such as opinions, attitudes, emotions and feelings. Most prior work treats SA as a text classification problem, which requires labeled data to train a model. However, labeled datasets are hard to obtain, and most of the time they must be built by hand. A further issue is poor portability across domains: labeled data from one domain is hard to reuse in a different application, so labeled data has to be created manually for each domain. In this paper, we use sentiment analysis to analyze the Enron email dataset. This work aims to find the best techniques for labeling the dataset automatically and avoiding manual labeling. The labeled data is then used to train a classifier with a supervised machine learning algorithm. In the labeling phase, we compare lexicon labeling with k-means labeling. Lexicon labeling gave better and more reliable results, so we used the lexicon-labeled dataset to train the classifiers. We used TF-IDF for feature extraction to train Naïve Bayes and Support Vector Machine (SVM) classifiers.
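The label-then-train pipeline summarized in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The positive and negative word lists are a hypothetical toy lexicon. The TF-IDF variant (raw term counts times a smoothed idf, without normalization) is one common choice among several. Only the Naïve Bayes step is shown; the k-means labeling baseline and the SVM classifier are omitted.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy lexicon (assumption); the paper's actual lexicon is not named here.
POSITIVE = {"great", "thanks", "good", "happy", "excellent"}
NEGATIVE = {"bad", "problem", "delay", "angry", "wrong"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_label(text):
    """Return 'pos' or 'neg' from lexicon hit counts, or None on a tie."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None  # no clear polarity: leave the email unlabeled


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    tokenized = [set(tokenize(d)) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in toks)
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector (raw term count x idf); unseen terms are dropped."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing over TF-IDF weights."""
    classes = sorted(set(labels))
    vocab = {t for v in vectors for t in v}
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    mass = {c: Counter() for c in classes}
    for vec, y in zip(vectors, labels):
        mass[y].update(vec)  # accumulate TF-IDF weight per class and term
    log_lik = {}
    for c in classes:
        total = sum(mass[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((mass[c][t] + alpha) / total) for t in vocab}
    return classes, log_prior, log_lik


def predict_nb(model, vec):
    """Return the class with the highest log-posterior for a TF-IDF vector."""
    classes, log_prior, log_lik = model

    def score(c):
        return log_prior[c] + sum(
            w * log_lik[c][t] for t, w in vec.items() if t in log_lik[c]
        )

    return max(classes, key=score)


if __name__ == "__main__":
    # Illustrative strings only, not taken from the Enron corpus.
    emails = [
        "Thanks, great work on the report",
        "Happy to help, excellent meeting",
        "There is a problem with the delay",
        "This is wrong and I am angry",
        "Please see the attached schedule",
    ]
    labeled = [(e, lexicon_label(e)) for e in emails]
    train = [(e, y) for e, y in labeled if y is not None]  # drop ties
    idf = fit_idf([e for e, _ in train])
    model = train_nb([tfidf(e, idf) for e, _ in train], [y for _, y in train])
    print(predict_nb(model, tfidf("great work, thanks", idf)))  # pos
```

Emails with no lexicon hits, or equal positive and negative hits, are left unlabeled and excluded from training. This reflects the general trade-off of automatic labeling: fewer training examples, but no manual annotation.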
Original language | English |
---|---|
Title of host publication | 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE) |
Publisher | IEEE |
Pages | 328-333 |
Number of pages | 6 |
ISBN (Electronic) | 9781728137780 |
Publication status | Published - 20 Feb 2020 |
Event | 2019 International Conference on Computational Intelligence and Knowledge Economy - Dubai, United Arab Emirates, 11 Dec 2019 → 12 Dec 2019 |
Conference
Conference | 2019 International Conference on Computational Intelligence and Knowledge Economy |
---|---|
Abbreviated title | ICCIKE 2019 |
Country/Territory | United Arab Emirates |
City | Dubai |
Period | 11/12/19 → 12/12/19 |
Keywords
- k-means
- Sentiment analysis
- support vector machine
- TFIDF
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications