Universal meaning extensions of perception verbs are grounded in
2.1 The database
Data from informal naturally-occurring conversation were collated from thirteen languages from nine language families (Figure 1). These families are: Austronesian (Whitesands), Barbacoan (Cha’palaa), Duna-Bogaia (Duna), Indo-European (English, Italian, Spanish), Mayan (Tzeltal), Mon-Khmer/Austroasiatic (Semai), Niger-Congo (Avatime, Siwu), Sino-Tibetan (Chintang, Mandarin) and Tai-Kadai (Lao). This was a convenience sample selected on the basis of available data and expertise, with an emphasis on the inclusion of lesser-described indigenous languages. [2] Such languages make up more than half of the group, alongside five national or international languages (English, Italian, Lao, Mandarin, Spanish).
Figure 1: Languages featured in this study, with names of the contributing researchers. The corresponding database was first reported in San Roque et al. (2015).
The investigators named in Figure 1 provided transcripts and coding for each language. To do so, they drew on long-term fieldwork with the relevant community, having worked in collaboration with (other) native speakers to record, transcribe, and translate conversational material. This is time-consuming and challenging work, in many cases made more so by a dearth of prior language description and difficult field conditions. Such difficulties, indeed, go partway to explaining the low representation of conversational corpora in cross-linguistic study (cf. Floyd et al. in press).
Each language contributed six video-recorded conversation segments of approximately 10 minutes each (i.e., an hour in total), primarily of interlocutors in domestic settings. The samples are comparable in that they cover people’s daily interactions doing typical activities such as chatting, preparing food, engaging in craft activities, and so on. Further details about the sampling procedures can be found in San Roque et al. (2015).
For each language, the relevant investigator(s) identified a set of perception terms according to Viberg’s (1983) method (allowing complex predicates as well as simple verbs), so that each of the five sense modalities (sight, hearing, touch, taste, smell) was represented by one or more terms. [3] Seven languages in the sample (Avatime, Duna, Italian, Semai, Spanish, Tzeltal, Whitesands) have a multi-sense term in their core perception vocabulary, described and discussed in §3.4. For the purposes of this study, words that refer only to internal sensation, temperature, proprioception and/or emotion were not considered.
Perception terms were located in the sample, and the conversational turn in which the term occurred was entered into a database. For each example, researchers provided a free translation into English of the whole utterance and of the perception term. Each item was coded for a number of features (see San Roque et al. 2015), including the sense modality of the lemma (sight, hearing, touch, taste, smell, multi-sense) and whether it had a discourse function in this context; that is, whether in this instance it appeared to be serving a primarily discourse-oriented purpose in the conversation, such as directing the attention of the addressee to upcoming talk.
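For readers who want a concrete picture of such an entry, the coding described above could be represented roughly as follows. This is a minimal illustrative sketch, not the actual database schema: the class name, field names, and helper function are hypothetical, only the features named in the text are modelled, and the toy records are placeholders rather than data from the study.

```python
from collections import Counter
from dataclasses import dataclass

# Modality values as listed in the text; everything else here is hypothetical.
MODALITIES = {"sight", "hearing", "touch", "taste", "smell", "multi-sense"}


@dataclass(frozen=True)
class PerceptionToken:
    """One database entry: a perception term in its conversational turn."""
    language: str
    turn: str                  # the conversational turn containing the term
    utterance_gloss: str       # free English translation of the whole utterance
    term_gloss: str            # free English translation of the perception term
    modality: str              # sense modality of the lemma
    discourse_function: bool   # primarily discourse-oriented use in this context?

    def __post_init__(self):
        # Reject modality labels outside the coding scheme.
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality!r}")


def modality_counts(tokens):
    """Tally tokens per sense modality."""
    return Counter(t.modality for t in tokens)


# Placeholder records for illustration only (not data from the study).
toy = [
    PerceptionToken("LangA", "<turn 1>", "<utterance>", "<term>", "sight", True),
    PerceptionToken("LangA", "<turn 2>", "<utterance>", "<term>", "sight", False),
    PerceptionToken("LangA", "<turn 3>", "<utterance>", "<term>", "hearing", False),
]
counts = modality_counts(toy)
```

Keeping the discourse-function code as a separate field from the modality reflects the description above, in which the modality belongs to the lemma while the discourse function is judged for each instance in context.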
The number of perception tokens noted in each language varied considerably; for example, the Cha’palaa hour included fewer than 50 basic perception verbs, whereas the Avatime sample included nearly 180. This is likely to affect the number of polysemies observed in each language. In addition, verbs of vision were by far the most frequent, accounting for 75–85% of the basic perception verbs used (see San Roque et al. 2015), except in Tzeltal, where the multi-sense verb was almost twice as frequent as the vision verb. Apart from Tzeltal, then, additional meanings for vision are likely to be the best represented here, potentially as a genuine reflection of how meaning and form are distributed (i.e., there is likely a relationship between frequency and number of meanings, as per Zipf 1945; Winter et al. 2018).
Taking all of this into consideration, the database allows us to draw conclusions about the presence of meanings across languages and modalities, but not their absence. Furthermore, given the sample, quantitative comparisons between sensory modalities or languages would be inappropriate. However, we believe the qualitative approach of the following sections provides rich insights into perception verb usage in everyday talk, and does so for diverse communities worldwide in an unprecedented manner.