Abstract
In the last decade, social media has emerged as the largest centralized source of opinions, expressions, blogs and micro-blogs, news, and other information. It has given researchers, industries, and governments a great opportunity to understand the behavior of their customers and constituents and to better align their products and services with the requirements of those customers and citizens. Among social media sources, Twitter is unique in that its data (microblogs) is unstructured and available for free. Twitter is used widely across the globe, and its microblog format lends itself to analysis of the underlying sentiment. A recent debate has concerned the COVID vaccines, specifically whether the potential benefits outweigh the side effects. Many vaccines are currently available, each with a different claimed efficacy against the virus, and these varying efficacies have attracted public discourse. This research analyzes tweets related to COVID-19 vaccines to better understand the pattern of public sentiment and opinion about the vaccines with respect to their side effects, potency, availability, and efficacy. The tweets are categorized and analyzed based on their polarity and subjectivity towards the vaccines. To classify the tweets by aspect, the machine learning techniques Logistic Regression (LR), Multinomial Naïve Bayes (MNB), and Support Vector Machine (SVM) are used along with the deep learning technique Long Short-Term Memory (LSTM). These classification algorithms are then compared and evaluated on precision, recall, accuracy, and F1-score. Apart from categorization and classification, the topic modeling method Latent Dirichlet Allocation (LDA) is used to extract topics, based on their similarity and frequency, that can sum up the sentiment of the general public towards the whole COVID vaccination process. Based on 60,000 tweets posted between 1 March 2021 and 31 May 2021, overall public sentiment towards the vaccines showed a positive trend. 
Among the aspects analyzed, vaccine efficacy turned out to be the most positive, encouraging people to advocate for vaccines. In the evaluation and comparison of model performance, the bidirectional LSTM, with 92% accuracy, outperformed all the machine learning algorithms. Among the machine learning algorithms, which were evaluated with different vectorization techniques, Logistic Regression achieved the highest accuracy of 73% with a count vectorizer, while SVM and MNB reached only 64%. Topic modeling with LDA on the whole dataset of 60,000 tweets yielded 4 as the optimum number of topics, with a coherence score of 0.625. Most of the topics shared the common themes of availability, efficacy, and side effects.
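
The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a binary setting in which one sentiment label is treated as the positive class; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study, which may have computed its scores differently (for example, averaged over several classes).

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="positive"):
    """Return accuracy, precision, recall and F1 with one label treated as positive."""
    # Count (actual is positive, predicted is positive) outcome pairs.
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = pairs[(True, True)]    # positive tweets predicted positive
    fp = pairs[(False, True)]   # other tweets predicted positive
    fn = pairs[(True, False)]   # positive tweets predicted as something else
    tn = pairs[(False, False)]  # other tweets predicted as something else
    total = tp + fp + fn + tn

    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels for illustration only, not data from the study.
y_true = ["positive", "positive", "negative", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive", "negative"]
scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four measures matters when classes are imbalanced: a classifier can reach high accuracy by favoring the majority class while its precision or recall on the minority class stays low.
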
Original language | English |
---|---|
Journal | International Research Journal of Computer Science |
Volume | 9 |
Issue number | 4 |
Publication status | Published - 30 Apr 2022 |