THE 3rd INTERNATIONAL SCIENTIFIC CONFERENCES OF STUDENTS AND YOUNG RESEARCHERS
dedicated to the 99th anniversary of the National Leader of Azerbaijan Heydar Aliyev
The e-commerce industry should pay more attention to sentiment analysis,
which has a strong effect on understanding customer needs. Sentiment
analysis involves identifying the emotions expressed by a series of words,
or the opinions and tone of a text. The most important task is to determine
whether the feedback is positive or negative. An integral part of the
analysis is to understand not only the content or emotion of the feedback
but also which aspect of the product the customer is talking about. The
number of e-commerce platforms using sentiment analysis is increasing
rapidly because of the need to understand consumers. Sentiment analysis
techniques also offer numerous benefits, such as processing very large
amounts of data in a short time and identifying the weak points of products
(for example, size mismatches for fashion items) [1].
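To make the combination of polarity and aspect concrete, here is a toy keyword-based sketch. It is not the XLNet model used in this work: the aspect keyword lists, the positive and negative word lists, and the function names `analyse_review` and `weak_points` are all hypothetical and chosen only for illustration. Each review is tagged as positive, negative, or neutral, the aspects it mentions are recorded, and the aspects appearing in negative reviews are counted, which is one simple way to surface weak points such as size mismatches.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only; a real system would learn
# these associations from data instead of hard-coding them.
ASPECT_KEYWORDS = {
    "size": {"size", "small", "large", "tight", "loose", "fit"},
    "quality": {"fabric", "material", "stitching", "quality"},
    "price": {"price", "expensive", "cheap", "value"},
}
POSITIVE_WORDS = {"love", "great", "perfect", "comfortable", "beautiful"}
NEGATIVE_WORDS = {"poor", "bad", "disappointed", "returned", "tight", "uncomfortable"}


def analyse_review(text):
    """Return (polarity, sorted list of aspects mentioned) for one review."""
    words = set(re.findall(r"[a-z]+", text.lower()))  # lowercase word tokens
    aspects = sorted(a for a, keywords in ASPECT_KEYWORDS.items() if words & keywords)
    # Polarity = number of positive words minus number of negative words.
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    if score > 0:
        polarity = "positive"
    elif score < 0:
        polarity = "negative"
    else:
        polarity = "neutral"
    return polarity, aspects


def weak_points(reviews):
    """Count how often each aspect is mentioned in negative reviews."""
    counts = Counter()
    for review in reviews:
        polarity, aspects = analyse_review(review)
        if polarity == "negative":
            counts.update(aspects)
    return counts
```

For example, `analyse_review("Disappointed: too tight, had to be returned.")` returns `("negative", ["size"])`, so such a review adds one to the "size" count in `weak_points`.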
Traditional customer surveys still exist in the market, but the growth of
e-commerce has brought sentiment analysis to a more innovative phase based on
NLP (Natural Language Processing). The first step is data cleaning: removing
missing values, stop words, digits, and unnecessary symbols. These steps are
followed by converting the whole text to lowercase.
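The cleaning steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' pipeline: the tiny `STOP_WORDS` set and the helper names `clean_review` and `clean_reviews` are hypothetical, and a real project would typically use a fuller stop-word list from a library such as NLTK or spaCy. Stop words are matched case-insensitively so that their removal still works before the final lowercasing step.

```python
import re

# Hypothetical, deliberately tiny stop-word list used only for illustration.
STOP_WORDS = {"a", "an", "the", "is", "it", "and", "or", "of", "to", "this"}


def clean_review(text):
    """Remove digits, symbols and stop words, then lowercase; None for missing values."""
    if not isinstance(text, str):
        return None  # missing value (None or a NaN float): caller drops it
    text = re.sub(r"\d+", " ", text)      # remove digits
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation and other symbols
    # Remove stop words, comparing case-insensitively.
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    return " ".join(tokens).lower()       # finally, lowercase the whole text


def clean_reviews(reviews):
    """Clean a list of reviews, discarding missing or empty results."""
    cleaned = (clean_review(r) for r in reviews)
    return [c for c in cleaned if c]
```

For instance, `clean_review("The dress is 2 sizes TOO small!!!")` returns `"dress sizes too small"`.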
Various methods are used for NLP; after testing several of them (Bag of
Words, BERT, Hugging Face libraries), XLNet was selected as the best model.
XLNet takes advantage of autoregressive (AR) language modeling and, unlike
BERT, does not rely on corrupting the input data (BERT replaces some input
tokens with a mask token during pre-training), which helps it outperform
BERT. AR models learn to predict each token from the tokens that precede it,
which makes them well suited to NLP tasks. The dataset used to train the
model contains the review title, the review text, a binary value indicating
whether the product is recommended, and the star rating. The model was
trained with XLNet for 4 epochs, where an epoch is one complete pass of the
whole training dataset through the model. Throughout training, the loss
(prediction error) was observed to decrease, indicating that the model's
performance improved.
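For reference, the autoregressive idea behind XLNet can be written out. A standard AR model factorizes the likelihood of a token sequence $\mathbf{x} = (x_1, \dots, x_T)$ from left to right, whereas XLNet maximizes the expected likelihood over permutations $\mathbf{z}$ of the factorization order, drawn from the set $\mathcal{Z}_T$ of all permutations of $\{1, \dots, T\}$. This lets each token use context from both sides without masking (corrupting) the input as BERT does. The symbols are introduced here for illustration.

$$\max_{\theta}\; \log p_{\theta}(\mathbf{x}) = \sum_{t=1}^{T} \log p_{\theta}(x_t \mid \mathbf{x}_{<t})$$

$$\max_{\theta}\; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}\left[\sum_{t=1}^{T} \log p_{\theta}\left(x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}}\right)\right]$$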
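The notions of an epoch and a decreasing loss can be illustrated without XLNet itself. The sketch below swaps in a deliberately tiny logistic-regression classifier on hand-made binary features; the data, the feature meanings, and the function names `train` and `mean_log_loss` are hypothetical, and this is not the authors' fine-tuning setup. Each epoch makes one full pass over every training example to compute the gradient, updates the weights once, and records the mean log loss, mirroring the 4 epochs used for XLNet.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(X, y, w, b):
    """Average binary cross-entropy (log loss) of the model over the dataset."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def train(X, y, epochs=4, lr=0.5):
    """Full-batch gradient descent; returns weights, bias and the loss after each epoch."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    losses = []
    for _ in range(epochs):
        # One epoch = one complete pass over all training examples.
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj / n for g, xj in zip(grad_w, xi)]
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
        losses.append(mean_log_loss(X, y, w, b))  # loss recorded after each epoch
    return w, b, losses


# Hypothetical toy data: features = [mentions "love", mentions "perfect",
# mentions "small", mentions "returned"]; label 1 = recommended.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
w, b, losses = train(X, y, epochs=4)
```

Full-batch updates are used here only so that the loss falls steadily from one epoch to the next; large models such as XLNet are normally trained with mini-batches, where the loss trend is noisier.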
Because our dataset is imbalanced, precision, recall, and the F1-score (the
harmonic mean of precision and recall) are more appropriate than accuracy for
evaluating model performance. Accuracy is dominated by the true negatives and
true positives (correct predictions for the negative and positive classes),
which are not the focus here; on an imbalanced dataset a model can reach high
accuracy simply by predicting the majority class. The key is to highlight the
model's behavior on false negatives and false positives (wrong predictions
for the negative and positive classes) so that the cost of these errors can
be reduced [2].
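The argument about imbalanced data can be checked on a tiny hypothetical example. The sketch below computes accuracy, precision ($TP/(TP+FP)$), recall ($TP/(TP+FN)$), and F1 ($2PR/(P+R)$) from label lists; the function names and the ten-review example, in which only two reviews belong to the class of interest (label 1), are invented for illustration. In practice, scikit-learn's `precision_recall_fscore_support` computes the same metrics.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for the given positive label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1; a ratio with a zero denominator is reported as 0.0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 8 majority-class reviews (0) and 2 minority-class reviews (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_majority = [0] * 10                  # a "model" that ignores the minority class
some_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # one TP, one FP, one FN
```

Both predictions score 0.8 accuracy, but `always_majority` has precision, recall, and F1 of 0.0, while `some_model` scores 0.5 on all three, which is exactly the distinction that accuracy hides.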
$$\mathrm{Precision} = \frac{\mathit{TruePositives}}{\mathit{TruePositives} + \mathit{FalsePositives}}, \qquad \mathrm{Recall} = \frac{\mathit{TruePositives}}{\mathit{TruePositives} + \mathit{FalseNegatives}},$$