Classification of Speech Acts: Direct and indirect speech acts
During the early stages of speech act studies, it was observed that, for instance, the illocutionary act of ordering can be realized not only by means of imperatives but also by interrogative sentences. Further investigations in this field led Heringer to conceive the notion of indirect speech acts. To illustrate this idea, let us consider the following examples, as presented by Yule (2000: 54):
You wear a seat belt. (declarative)
Do you wear a seat belt? (interrogative)
Wear a seat belt! (imperative)
We can easily recognize the relationship between the three structural forms mentioned above (declarative, interrogative, imperative) and the three general communicative functions they perform (statement, question, command / request).
Next, Yule (2000: 54-55) divides speech acts into direct and indirect, explaining that a direct speech act occurs “whenever there is a direct relationship between a structure and a function” and that “whenever there is an indirect relationship between a structure and a function, we have an indirect speech act.”16 Therefore, a declarative sentence used to make a statement is an example of a direct speech act, while a declarative sentence used to make a request is an indirect speech act. Various structures can be used to accomplish the same basic function. Levinson, for instance, enumerates a large number of ways in which a request can be uttered and claims that “what people do with sentences seems quite unrestricted by the surface form (i.e. sentence-type) of the sentences uttered.” Cohen likewise states that nearly every type of speech act can be realized by means of another act. Consequently, the possible interpretations of a given sentence, e.g. Go to London tomorrow! (a command, an instruction, advice, and many more), are numerous and difficult to assess, but their number is not unlimited.
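The structure-function mapping underlying this distinction can be made concrete. The following Python sketch is purely illustrative (the mapping table and function names are my own, not part of Yule's account): a speech act counts as direct when the sentence type's conventional function matches the intended one.

```python
# Illustrative sketch of the conventional form-function pairing
# described by Yule (2000): declarative/statement, interrogative/
# question, imperative/command-request.
DIRECT_FUNCTION = {
    "declarative": "statement",
    "interrogative": "question",
    "imperative": "command/request",
}

def classify_speech_act(sentence_type: str, intended_function: str) -> str:
    """A speech act is direct when structure matches its conventional
    function, and indirect otherwise."""
    if DIRECT_FUNCTION.get(sentence_type) == intended_function:
        return "direct"
    return "indirect"

# "You wear a seat belt." used as a statement -> direct
print(classify_speech_act("declarative", "statement"))        # direct
# "You wear a seat belt." used as a request -> indirect
print(classify_speech_act("declarative", "command/request"))  # indirect
```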
The question arises whether such a classification (into direct and indirect speech acts) makes any sense. The answer seems to be yes, if only because a considerable number of linguists, scholars and philosophers have tried to distinguish direct from indirect speech acts. However, there is still no agreement among linguists on a common definition of the indirect speech act. According to Leech, indirectness is gradable, and there is therefore no need to treat direct and indirect as two opposed groups, since direct speech acts are simply those which are the least indirect. We need not, however, pursue this matter in detail, because the idea of classifying speech acts into direct and indirect has been widely accepted and practised, even when the two categories are not regarded as being in opposition. Another definition is proposed by Wiertlewski, who writes that speakers uttering indirect speech acts have something more in mind than what is actually said, and that indirect speech acts are utterances which “break the connection” between the grammatical structure of the utterance and its illocutionary force. In other words, “the speaker means what the sentence means, but something else as well.”
We should not forget Searle’s tremendous contribution to this field and his understanding of indirect acts. He described indirect speech acts as cases in which “one illocutionary act can be uttered to perform, in addition, another type of illocutionary act”. He called the additional, indirectly performed act the primary illocutionary act and the literal one the secondary illocutionary act. For instance, in uttering the sentence Can you reach the salt? a speaker intends it not merely as a question but as a request to pass the salt. Here the direct act is a question about the hearer’s ability to pass the salt (the secondary illocutionary act); taken literally, it could be answered by saying yes and doing nothing.
However, both interlocutors are aware that action is expected: what the speaker actually told the hearer was to pass the salt (the primary illocutionary act), but the speaker did so indirectly. The meaning of utterances is to a large extent indirect; with requests, for example, “(…) most usages are indirect. (…) the imperative is very rarely used to issue requests in English; instead we tend to employ sentences that only indirectly do requesting.”
If this is the case, then a question arises: “How do I know that he has made a request when he only asked me a question about my abilities?” How are we to know whether I’ll meet you tomorrow is a promise, a threat, or perhaps just an announcement? To solve this problem we ought to take a pragmatic approach. Searle postulates that in understanding indirect speech acts we bring together our knowledge of three elements:
- the felicity conditions of direct speech acts,
- the context of the utterance, and
- the principles of conversational cooperation.
As has already been pointed out in this unit, felicity conditions concern the speaker being in an appropriate situation to make the utterance: one cannot, for instance, promise someone a bike one does not possess. The context of the utterance is the situation in which it is made, and it largely determines how a particular utterance should be interpreted. The conversational principles are the baseline assumptions which speakers and hearers conventionally hold about relevance, orderliness and truthfulness. “The process of combining these elements draws heavily on inference because much of what is meant is not explicitly stated. It is here that the work of speech act theories links up with the more general approach of H. P. Grice and his interest in conversational implicatures”.
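As a rough computational analogy only (the predicates, context flags, and inference rule below are hypothetical simplifications of mine, not Searle's own formulation), the way these three elements combine for Can you reach the salt? might be sketched as follows:

```python
# Hypothetical sketch of inferring the primary act behind
# "Can you reach the salt?" by combining (1) the literal reading,
# (2) the context, and (3) cooperative assumptions.
def infer_primary_act(literal_act, context):
    # 1. Literal (secondary) act: a question about the hearer's ability.
    if literal_act == ("question", "hearer can pass the salt"):
        # 2. Context: at the table the answer is obviously yes, so a
        #    mere question would violate conversational relevance.
        if context.get("answer_obvious") and context.get("salt_within_reach"):
            # 3. The felicity conditions of a request are met (the hearer
            #    can perform the action), so infer the primary act.
            return ("request", "pass the salt")
    return literal_act  # otherwise take the utterance at face value

context = {"answer_obvious": True, "salt_within_reach": True}
print(infer_primary_act(("question", "hearer can pass the salt"), context))
# -> ('request', 'pass the salt')
```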
A further question then arises: why are indirect speech acts used at all, given that they apparently create some difficulties in human communication? This issue has been taken up by pragmalinguists, some of whom claim that by using indirect speech acts we sound more polite. Lakoff, building on Grice’s Cooperative Principle, presents her “logic of politeness”, whose main requirement is to be polite. Leech introduces a Tact Maxim aimed at preventing conflict; he also observes that direct speech acts used as directives, e.g. requests, may provoke hostile behavior, and he therefore recommends making requests with indirect speech acts.
Language use is not just an exercise in understanding, as it is, for example, when we are learning to speak an unfamiliar foreign language. It is not used randomly and out of context, but often with a clear aim or as a focus for some further action. It is this context that determines for the participants what counts as an acceptable level of understanding. If a hearer is dissatisfied in some way with his mental model of what the speaker has said, he will ask for clarification, or check his understanding by rephrasing what has been said in his own words.
Moreover, it is a mistake to think that once an interpretation has been made it is immutably fixed, stored in the state in which it was originally conceived. We do not keep a permanent mental representation of an utterance. Instead, we incorporate our interpretation, the various beliefs and knowledge attendant on it, and our understanding of the function of the utterance into our mental model, both to inform an appropriate response and to support further interpretation of the discourse later on. If evidence in later talk reveals that we misunderstood something earlier in the conversation, we refine and update our model accordingly.
A speaker does not form his utterances using the only possible set of words for the ‘correct’ communication of his ideas, but packages what he says in a way he believes the hearer is most likely to understand in the context of the discourse situation.
If the speaker includes too much detail, the conversation becomes boring for the hearer, or the hearer may become overloaded with information and unable to process it into a ‘correct’ interpretation; too little information, on the other hand, leads to ambiguity. Speech is thus constantly balanced between too much and too little information. A speaker is always vying for a hearer’s attention and so must try to convey his message as simply as possible.
Minimal specification is often the best strategy for speakers to follow. This is often the way children behave in conversation, because they tend to believe that others (especially adults) are already aware of all the background information necessary to decode their message. (In very young children this belief extends to all behavior: they are incapable of deceit because they assume the other person has complete knowledge of all that they themselves know.) It is worth noting that minimal specification is often enough, and is easily expanded when extra information is required; this is negotiated between the participants at the moment the need arises. If it becomes apparent that a hearer cannot understand all that is said, the speaker can easily switch from a strategy of under-specification to one of over-specification (for example, when a hearer’s background information is inadequate to follow the speaker’s references, as when an outsider joins a closely knit group of friends).
Likewise, it is expected that the hearer will try to make sense of what he is hearing and co-operate in the process of communication. Sperber and Wilson claim that every utterance comes with a presumption of its own optimal relevance for the listener. This is evidently too strong an assertion; it seems obvious that there are some utterances that intrude on the hearer, and whose outcome is of sole benefit to the speaker (anyone who has been approached in the streets by Big Issue homeless magazine sellers will understand what this means). From the hearer’s point of view, there is no guarantee that, in the end, it will be in the hearer’s interests to attend to what the speaker says. Yet we do pay attention to each other when we speak (sometimes!). Brown says:
It is not necessary to postulate a universal guarantee of relevance to the hearer as the motivation for a hearer paying attention to what a speaker says.
All one needs to do is look at the social aspects of communication to find a reason for a hearer’s attention. We can explain a hearer’s willingness to attend to the speaker’s utterance by the general elements of co-operative behavior that govern all human interaction. We work together in life because in the end we as individuals will benefit. The principle of ‘if I listen to you now, you may listen to me later’ comes into play; this is the fundamental basis of all co-operative activities. One also never knows when what we hear might be of some use.
The view is often expressed in the literature that speakers take the active role in conversation while hearers are merely passive. Clark says: “All that counts in the end is the speaker’s meaning and the recovery of the speaker’s intentions in uttering the sentence”.
This ignores the case when the hearer was originally the prime mover within a conversation, when it is the hearer who decides what information is most important in interaction (because he himself has first elicited it). This is an important point for my thesis as it motivates the interpretation of speech acts based on prior speech acts.
Focusing only on the speaker’s intentions, and thus overlooking the importance of both speaker and hearer, is as inadequate as assuming that the participants share common goals and contexts. Johnson-Laird points out the fallacy of a common context (or mutual beliefs): in a conversation there are two contexts, one for each participant17:
…the notion of the context overlooks the fact that an utterance generally has at least two contexts: one for the speaker and one for the listener. The differences between them are not merely contingent, but… a crucial datum for communication.
In fact, even this does not cover all the possibilities, for when a third observer is present to hear the communication between the speaker and the recognized addressee, we must surely add an extra dimension of context to the conversation. We can now see the various roles that people can take in a conversation. It will become important to our later discussions to have drawn the distinction between the different ‘current’ roles, or ‘statuses’, of participants within a conversation at the time of an utterance.
People minimize the risk of miscommunication by judging how much information the hearer needs in order to decode the message in context. A speaker will thus constantly be deciding whether to maximize or minimize the referents (using pronominalization or ellipsis, for instance), depending on whether those referents are in focus.
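A crude illustration of this choice (a hypothetical rule of thumb of mine, not a claim about any particular model of reference): pronominalize a referent that is currently in focus, otherwise spell it out in full.

```python
# Toy referring-expression choice: minimize (pronoun) for in-focus
# referents, maximize (full description) for out-of-focus ones.
def refer(entity: str, in_focus: set) -> str:
    return "it" if entity in in_focus else f"the {entity}"

in_focus = {"report"}
print(refer("report", in_focus))   # "it"          (minimized)
print(refer("meeting", in_focus))  # "the meeting" (maximized)
```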
Participants in a conversation also constantly check whether the message has been conveyed correctly. In spoken language the speaker includes information about how the hearer should treat the content of his utterance, and the hearer repeatedly feeds back reassurances that he has in fact received the message correctly. Misunderstandings are in this way often caught quickly and rectified.
There have been a number of studies of spontaneous speech used in dialogues. Spontaneous speech differs from read speech in that the speaker is often greatly influenced by the environment. Not only will a speaker adjust to his or her dialogue partner, but he or she may adopt the partner’s words and even sentence intonation. Dybkjaer noted that while collecting dialogues in a Wizard-of-Oz (WoZ) experiment, speakers tended to use the words which appeared in the description of the scenario. To avoid this problem, Dybkjaer designed two sets of experiments: one used a text-based description of the scenario and task, the other a graphical representation of the same tasks. The graphic-based experiments removed the priming effect but, paradoxically, generated a smaller vocabulary than the text-based approach. In a similar way, system questions and responses can influence the style of language of the user. Schillo also tried to avoid lexical "priming" by using graphical icons to prompt user responses, but found that a great deal of planning was required to "engineer" an icon-based dialogue, and he had to resort to language for anything beyond basic functions.
A large proportion of spontaneous speech is ungrammatical, and yet it is remarkably resilient to the errors that arise. Misunderstandings, termed communication deviations by Taleb (1996), are resolved within six turns on average. Fraser notes that communication is inherently ‘messy’, and even goes as far as to say that messy communication may be a natural phenomenon, and perhaps necessary for a pleasant, natural dialogue.
At any stage in a dialogue, one participant holds the initiative in the conversation. For example, when agent A asks agent B a question, agent A has the initiative. When agent B answers, agent A still has the initiative, i.e. retains effective control of the conversation. If agent B then asks a question of A, B takes control of the dialogue, i.e. takes the initiative. In a sense, the initiative is a function of the roles of the dialogue participants. Dialogue systems may allow the user or the system to take the initiative, or may allow both to switch roles as required.
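The turn-by-turn rule just described can be stated compactly. The sketch below assumes a simple labelling of dialogue acts as questions or answers, which is only one possible way to operationalize initiative:

```python
# Minimal sketch: asking a question transfers the initiative to the
# asker; answering leaves the initiative where it was.
def update_initiative(current_holder: str, speaker: str, act: str) -> str:
    if act == "question":
        return speaker        # the questioner takes the initiative
    return current_holder     # answers do not transfer it

initiative = "A"                                             # A asks B
initiative = update_initiative(initiative, "B", "answer")    # still A
initiative = update_initiative(initiative, "B", "question")  # now B
print(initiative)  # -> B
```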
Menu-navigation and system-oriented question-and-answer systems, where the system holds the initiative throughout the dialogue, are the simplest to model. In this case, the user must choose between a small number of well-defined options. Where the user takes the initiative, as in command-and-control and database-query systems, more complex modeling strategies are needed: although the range of tasks to be performed may be small, the user has greater expressibility and freedom of choice. The hardest dialogues to model, by far, are those in which the initiative can be taken by either the system or the user at various points in the dialogue. As noted by Eckert (1996), mixed-initiative systems involve dialogues which approach the intricacies of conversational turn-taking, requiring strategies which determine when, for example, the system can take the initiative away from the user. For systems which use speech recognition, the ability to confirm or clarify given information is essential, hence some degree of system or mixed initiative must exist.
A dialogue taxonomy. Dahlback (1995) argues for a classification or taxonomy of dialogues. Comparisons of different dialogue management techniques or computational theories of discourse are difficult when dialogues have differing features. If dialogue systems could be classified according to some taxonomy, then any algorithms developed could be applied to different systems with comparable results. Dahlback's taxonomy is based on work by Rubin (1980) and Clark (1985). It distinguishes four main dimensions:
1. Type of agent
2. Channel
3. Task type
4. Shared knowledge
The type of agent, either human or computer, greatly influences the style of spoken language used. Work by Guindon and by Kennedy et al. shows that when addressing a computer, human utterances are shorter, show less lexical variation, and contain a minimal number of pronouns.
The channel of communication involves many different aspects. The most obvious is the modality of the channel, which may be either written or spoken; the distinctions between the two have been explored above. Other aspects of the channel include the style of interaction (influencing, for example, anaphor-antecedent relations), spatial and temporal commonality, the concreteness of referents (whether the objects and events referred to are visually present), and the separability of characters, all of which influence the dialogue structure and style.
As noted above, the dialogue structure is influenced by the structure of the task; the number of different tasks managed also affects it. Finally, three types of information can be used to infer the common ground between speaker and listener: shared perceptual, linguistic and cultural knowledge. The first two are the easiest to model. When both participants are aware of the same physical or visual objects, they share perceptual knowledge. Linguistic knowledge is the shared knowledge of the dialogue up to that point. Cultural knowledge is more difficult to model, since it depends on knowledge obtained from the participant’s community.
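Expressed as a data structure (the field names and example values are my own shorthand, not taken from Dahlback's paper), a dialogue could be classified along these four dimensions as follows:

```python
from dataclasses import dataclass

# Sketch of Dahlback's (1995) four dimensions as a record type; the
# example values are illustrative assumptions, not from the paper.
@dataclass
class DialogueClass:
    agent: str             # "human-human" or "human-computer"
    channel: str           # modality: "spoken" or "written"
    task_type: str         # structure and number of tasks managed
    shared_knowledge: set  # subset of {"perceptual", "linguistic", "cultural"}

flight_booking = DialogueClass(
    agent="human-computer",
    channel="spoken",
    task_type="single well-defined task",
    shared_knowledge={"linguistic"},
)
print(flight_booking)
```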
Approaches to dialogue management. A dialogue manager must model, to a certain extent, two aspects of the interaction between the user and the machine. The first is the interaction history, which allows the user’s sentences to be interpreted in the current context; this context also covers phenomena such as anaphora and ellipsis. The second is a model of the interaction, governing the strategy the system adopts to shape the dialogue structure.
This strategy also determines the role of initiative in the dialogue. There are many approaches to the design of dialogue management systems; typically, a system will consist of the components in Figure 1, below. Text or transcribed speech is input to the system, where it is parsed and semantic information is extracted by the natural language components. The dialogue manager then coordinates the other components in order to understand fully the implications of the input in the current context, resolving conflicts and ambiguities and dealing with anaphora and ellipsis. Multi-modal systems can receive input from, and send output to, many different devices. The response generation module produces a response based on the output of the dialogue manager, usually as natural language text to be read out by the speech synthesizer, though it may also generate graphical or other visual output. Each device needs a level of abstraction, a driver, which mediates between the actual device and the dialogue manager, so that all input and output can be modeled with the same mechanisms.
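The component flow just described (and depicted in Figure 1) can be caricatured as a simple processing loop. Everything in the sketch below, class names and method signatures included, is a hypothetical skeleton rather than a description of any particular system:

```python
# Hypothetical skeleton of the Figure 1 architecture: input is parsed,
# interpreted in context by the dialogue manager, and a response is
# generated and rendered on each output device via its driver.
class DialogueManager:
    def __init__(self, parser, generator, devices):
        self.parser = parser        # natural language components
        self.generator = generator  # response generation module
        self.devices = devices      # device drivers (abstraction layer)
        self.history = []           # interaction history / context

    def handle(self, utterance: str):
        semantics = self.parser.parse(utterance, context=self.history)
        # Resolve anaphora, ellipsis, and ambiguities against the history.
        interpretation = self.interpret(semantics)
        self.history.append(interpretation)
        response = self.generator.respond(interpretation)
        for device in self.devices:  # speech synthesizer, display, ...
            device.render(response)

    def interpret(self, semantics):
        return semantics  # placeholder for context resolution
```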
Conversation and Cultural Context (Ethnography of Speaking). Ethnographers of communication are primarily concerned with analyzing patterns of communicative behavior by observing how participants use language. Their aim is to discover how members of a specific culture perceive their experiences and how they then transmit their interpretations of them.
Hymes developed a schema for breaking down the constituents of a context, the speech events in which language happens, into units of analysis. He listed these in a classificatory grid, called the SPEAKING grid (after the letters used to identify its components).
S – Setting, scene: temporal and physical circumstances; the subjective definition of an occasion