The Emotion Annotation Task
We worked with blog posts that we collected directly from the Web. First, we prepared a list of seed words for the six basic emotion categories proposed by Ekman [3]. These categories represent the distinctly identifiable facial expressions of emotion: happiness, sadness, anger, disgust, surprise and fear. For each category, we took words commonly used in the context of that emotion. Thus, we chose “happy”, “enjoy”, “pleased” as seed words for the happiness category, “afraid”, “scared”, “panic” for the fear category, and so on. Next, using the seed words for each category, we retrieved blog posts containing one or more of those words. Table 1 gives the details of the datasets thus collected. Examples of annotated text appear in Table 2.
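The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual collection code: the function names are hypothetical, only the seed words quoted in the text are listed, and whole-word, case-insensitive matching is an assumption (the text does not say whether inflected forms such as “enjoyed” were matched).

```python
import re

# Seed words quoted in the text. The lists for the other four categories
# are not given in this section, so they are left out rather than guessed.
SEED_WORDS = {
    "happiness": ["happy", "enjoy", "pleased"],
    "fear": ["afraid", "scared", "panic"],
}


def matching_categories(post, seed_words=SEED_WORDS):
    """Return the sorted categories with at least one seed word in the post.

    Assumption: matching is whole-word and case-insensitive.
    """
    tokens = set(re.findall(r"[a-z']+", post.lower()))
    return sorted(
        category
        for category, words in seed_words.items()
        if tokens.intersection(words)
    )


def collect_posts(posts, category, seed_words=SEED_WORDS):
    """Keep the posts that contain one or more seed words of the category."""
    return [p for p in posts if category in matching_categories(p, seed_words)]
```

Under these assumptions, a post mentioning both “happy” and “scared” would be retrieved for both the happiness and the fear datasets, while a post containing only “unhappy” would match neither.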
Table 1. The details of the datasets
Dataset | # posts | # sentences | Collected using seed words for
Ec-hp | 34 | 848 | Happiness
Ec-sd | 30 | 884 | Sadness
Ec-ag | 26 | 883 | Anger
Ec-dg | 21 | 882 | Disgust
Ec-sp | 31 | 847 | Surprise
Ec-fr | 31 | 861 | Fear
Total | 173 | 5205 |
Table 2. Sample examples from the annotated text
I have to look at life in her perspective, and it would break anyone’s heart. (sadness, high)
We stayed in a tiny mountain village called Droushia, and these people brought hospitality to incredible new heights. (surprise, medium)
But the rest of it came across as a really angry, drunken rant. (anger, high)
And I realllllly want to go to Germany – dang terrorists are making flying overseas all scary and annoying and expensive though!! (mixed emotion, high)
I hate it when certain people always seem to be better at me in everything they do. (disgust, low)
Which, to be honest, was making Brad slightly nervous. (fear, low)
Emotion labeling is reliable if there is more than one judgment for each label. Four judges manually annotated the corpus; each sentence was subject to two judgments. The first author of this paper produced one set of annotations, while the second set was shared by the three other judges. The annotators received no training, though they were given samples of annotated sentences to illustrate the kind of annotations required. The annotated data was prepared over a period of three months.
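With two judgments per sentence, agreement between the two annotation sets can be quantified. This section does not name a measure, so the sketch below uses Cohen's kappa purely as an assumption; it is one common chance-corrected statistic for two annotators assigning categorical labels. The function name and the toy labels in the usage note are illustrative only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected by chance from each annotator's own
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # A Counter returns 0 for labels the second annotator never used.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)
```

For instance, if one annotator labels four sentences happiness, sadness, no emotion, no emotion and the other labels them happiness, no emotion, no emotion, no emotion, the observed agreement is 0.75, the chance agreement is 7/16, and kappa is 5/9.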
The annotators were required to label each sentence with the emotion category that best describes its affective content. To Ekman's six emotions [3], we added mixed emotion and no emotion, resulting in eight categories to which a sentence could be assigned. While sentiment analysis usually focuses on documents, this work focuses on sentence-level analysis. The main consideration behind this decision is that there is often a dynamic progression of emotions in the narrative texts found in fiction, as well as in conversational texts and blogs.
The initial annotation effort suggested that in many instances a sentence exhibited more than one emotion. Consider (1), for example, marked for both happiness and surprise. Similarly, (2) shows how more than one type of emotion can be present in a sentence that refers to the emotional states of more than one person.
(1) Everything from trying to order a baguette in the morning to asking directions or talking to cabbies, we were always pleasantly surprised at how open and welcoming they were.
(2) I felt bored and wanted to leave at intermission, but my wife was really enjoying it, so we stayed.
We also found that the emotion conveyed in some sentences could not be attributed to any basic category, for example in (3). We decided to have an additional category called mixed emotion to account for all such instances. All sentences that had no emotion content were to be assigned to the no emotion category.
(3) It's like everything everywhere is going crazy, so we don't go out any more.
In the final annotated corpus, the no emotion category was the most frequent. It is important to have no emotion sentences in the corpus, as both positive and negative examples are required to train any automatic analysis system. It should also be noted that in both sets of annotations a significant number of sentences were assigned to the mixed emotion category, justifying its addition in the first place.
The second kind of annotation involved assigning emotion intensity (high, medium, or low) to all emotion sentences in the corpus, irrespective of the emotion category assigned to them. No intensity label was assigned to the no emotion sentences. A study of emotion intensity can help recognize the linguistic choices writers make to modify the strength of their expressions of emotion. The knowledge of emotion intensity can also help locate highly emotional snippets of text, which can be further analyzed to identify emotional topics. Intensity values can also help distinguish borderline cases from clear cases [20], as the latter will generally have higher intensity.
Beyond labeling the emotion category and intensity, a secondary objective of the annotation task was to identify spans of text (individual words or strings of consecutive words) that convey emotional content in a sentence. We call them emotion indicators. Knowing them could help identify a broad range of affect-bearing lexical tokens and possibly syntactic phrases. The annotators were permitted to mark any number of emotion indicators of any length in a sentence.
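The three layers of annotation described so far (category, intensity, and emotion indicators) could be represented by a record like the one below. This is only a sketch: the class and field names are hypothetical, and treating each indicator as a contiguous substring of the sentence is an assumption based on the description of indicators as words or strings of consecutive words.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The eight categories and three intensity levels named in the text.
CATEGORIES = {
    "happiness", "sadness", "anger", "disgust", "surprise", "fear",
    "mixed emotion", "no emotion",
}
INTENSITIES = {"high", "medium", "low"}


@dataclass
class SentenceAnnotation:
    """One judge's annotation of one sentence (hypothetical structure)."""
    sentence: str
    category: str
    intensity: Optional[str] = None
    indicators: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.category == "no emotion":
            # No intensity label is assigned to no emotion sentences.
            if self.intensity is not None:
                raise ValueError("no emotion sentences carry no intensity")
        elif self.intensity not in INTENSITIES:
            raise ValueError(f"unknown intensity: {self.intensity!r}")
        for span in self.indicators:
            # Assumption: an indicator is a contiguous span of the sentence.
            if span not in self.sentence:
                raise ValueError(f"indicator not found in sentence: {span!r}")
```

As a usage example, the last sentence of Table 2 could be stored as `SentenceAnnotation("Which, to be honest, was making Brad slightly nervous.", "fear", "low", ["slightly nervous"])`; the category and intensity come from Table 2, while the indicator span is invented here for illustration.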
We considered several annotation schemes for emotion indicators. At first, we planned to mark only individual words, which would simplify calculating the agreement between annotation sets. We soon realized, however, that individual words may not be sufficient. Emotion is often conveyed by longer units of text or by phrases, for example, the expressions “can't believe” and “blissfully unaware” in (4). Marking longer spans would also allow the study of the various linguistic features that serve to emphasize or modify emotion, such as the use of the word “blissfully” in (4) and “little” in (5).
(4) I can't believe this went on for so long, and we were blissfully unaware of it.
(5) The news brought them little happiness.