DISPEL Corpus


Nikolinka Collier
Department of Computer Science
University College Dublin

website

Participants: 26
Type of Study: task
Location: Ireland
Media type: video
DOI: doi:10.21415/T50K7K

Browsable transcripts

Download transcripts

Media folder

Citation information

Some citation information here.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The Dispel corpus was collected in the autumn of 2001 in the Department of Computer Science, University College Dublin. It is a set of 30 two-channel, 16-bit recordings made on DAT (Sony MP2). An excerpt of approximately five minutes from each dialogue was annotated using CHAT. The corpus is available on 3 CDs, which contain the sound files, the annotated files, and the design of the data collection, the settings and the demographic details of the participants. It is also available through TalkBank: http://www.talkbank.org/data/tutor/dispel/

Dispel was collected to provide sufficient material for the analysis of discourse particles (DPs) in spontaneous speech. Although many corpora are available for discourse analysis, few account for DPs consistently. The core reason for this omission is that DPs are often considered peripheral to speech processing, so it is common for a corpus to lack an adequate representation of them. This lack can also reflect the design of the data collection. The design is often oriented towards collaboration at some level, such as a topic of interest or a task to be solved. The subjects' behavior is usually constrained: either the situation lies outside their daily routine, or the partners are not familiar with each other and are therefore restricted in the range of social interaction they bring to the collaboration. In this collection the aim was to preserve the loose task-collaboration pattern of the interaction without dispensing with the phenomena that occur in more informal communication. The objective was to provoke intensive interaction by promoting a topic from everyday life that the participants consider entertainment. This usually neither increases nor decreases the number of tokens, but it expands the range of DPs produced.

Design setting

The design setting for collecting the dialogues approximated a collaborative introductory tutorial between two participants: a Beginner and an Expert. The two participants are aware that they are engaged in a role-playing game and that their task is to grasp the basic notions of one of the following computer games: Age of Empires or Civilization. The speakers are told that the aim of their session is to conduct an introductory tutorial on the game. One of the participants, the Expert, is familiar with the game. The other participant, the Beginner, has not played the game before. While they play, the Expert instructs the Beginner on the course of action to take and on general tactics in their improvised demo tutorial. They sit next to each other, looking at the computer screen and following the events of the game. The Beginner has control over the keyboard. The Beginner is entitled to ask for and to get help from the Expert with all the movements and strategy puzzles in the game.

The design was aimed at providing conditions that allow for the average frequency of DPs found in task-oriented interactions. The participants' roles are, in general terms, those of a novice and an expert. The environment was intended to simulate, remotely, an interaction between a user and an intelligent interactive help assistant. Apart from providing enough DP tokens, the corpus may offer some insights for studying interaction patterns outside classroom conversations, for human-computer interaction.

The games

The choice of the games was motivated in two ways. First, enough subjects had to be familiar with the game. Second, the game's strategy had to be complex enough to create a need for interaction between the participants, and especially to prompt the Beginner to ask questions and elicit discussion about the better and worse choices in the game.

In order to choose a set of games we interviewed the subjects about their proficiency in various games. There were five games that a large group of the interviewees was familiar with: Team Fortress, Sim City, Tomb Raider, Age of Empires and Civilization. Sim City was a complex game, and the subjects engaged in conversation about it rather than playing it. The second trial, with Team Fortress, showed that the players' proficiency was very similar and that they did not interact: instructions were not requested because the two players were equally familiar with the spatial and problem layout of the game. Tomb Raider involved many puzzles, both in spatial orientation in the virtual world and in handling monsters and finding bounties. As the game involves only one avatar, it is not demanding enough to provoke the Beginner to make inquiries about the course of action to be taken.

After the pilot study on a variety of games, the choice was restricted to Age of Empires and Civilization II. Both are strategy games whose main purpose is to create and evolve a civilization strong and smart enough to survive internal difficulties (diseases, lack of resources, discontented citizens and so on) and external difficulties (hostile and friendly tribes or civilizations, attacks and/or wars) in order to become the most prosperous ruling civilization. They were complicated enough to engage participants in conversation, yet not so demanding as to prevent them from discussing points while playing.

There are also a few differences and points to be made about each game. In the following sections we present two introductory descriptions of each game. One is a summary of the commercial descriptions of the games, and the second is a transcribed version of one of the introductions given by the subjects.

Civilization II is a journey through time in which players are challenged to create their own version of history as they match wits against the world's greatest leaders and build, expand and rule a world-dominating civilization to stand the test of time. The leader (the player) rules the citizens with the help of advisors (that is, the interactive help of the game). The main occupations of the citizens are to build a strong base, to explore new and uncharted territories in search of valuable resources, and to conquer enemies through force or diplomacy. An important point about this game is that technologies must be chosen for the benefit of the population, and these determine the stages that the participants go through in the development of their civilization.

Age of Empires is an epic real-time strategy game spanning 10,000 years, in which players are the guiding spirit in the evolution of small Stone Age tribes [1]. Starting with minimal resources, players are challenged to build their tribes into civilizations. Gamers can choose from several ways to win the game, including world domination by conquering enemy civilizations, exploration of the known world, and economic victory through the accumulation of wealth. The game offers a choice of twelve civilizations, a technology tree for selecting the next step in the evolution, dozens of units, and randomly generated maps. Although the game sets players within a historical context, it is supplied with a built-in scenario editor so that players can create their own conflicts and scenarios. It allows up to eight other players to play in a multiplayer forum.

The corpus involves 26 participants: 17 male and 9 female. The participants are graduate students and members of staff in the Department of Computer Science, UCD. English is the first language of all participants. The participants are predominantly Irish, with the exception of three subjects: two Americans and one English participant. Most participants were between 20 and 30 years old; the youngest is 22 and the oldest 36. The demographic details of the participants are listed in Table 1 below.

The roles

The roles in the dialogues were based on the domain knowledge of the participants. There are two roles: Beginner and Expert. People who were familiar with the game were assigned to the group of Experts. To be designated an Expert, a person had to have spent no less than 20 hours playing the game in question. People who had never played the chosen game before were assigned to the group of Beginners.

In the orthographic transcription of the sessions we used the initials provided as part of the tiers available for encoding the roles of the participants in the tools of the CHILDES system. The Expert takes the initials TEA for teacher and the Beginner takes the initials STU for student.
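As an illustration of how these role codes can be used, the following minimal Python sketch counts main-tier utterances per speaker in a CHAT-format text. It relies only on the general CHAT convention that main speaker tiers begin with an asterisk followed by the speaker code and a colon (for example *TEA: or *STU:), while header lines begin with @ and dependent tiers with %. The function name count_turns and the sample transcript are hypothetical and invented for this illustration; the sample is not taken from the corpus.

```python
from collections import Counter


def count_turns(chat_text: str) -> Counter:
    """Count main-tier utterances per speaker code in CHAT-format text.

    Assumption: main tiers start with '*CODE:'; headers ('@...'),
    dependent tiers ('%...') and continuation lines are ignored.
    """
    counts = Counter()
    for line in chat_text.splitlines():
        if line.startswith("*") and ":" in line:
            speaker = line[1:line.index(":")]
            counts[speaker] += 1
    return counts


# Invented sample for illustration only; not an excerpt from Dispel.
SAMPLE = (
    "@Begin\n"
    "@Participants:\tTEA Teacher, STU Student\n"
    "*TEA:\tclick on the villager .\n"
    "*STU:\twhat is going on ?\n"
    "%com:\ta pop-up window appears\n"
    "*TEA:\twell that is just the advisor .\n"
    "@End\n"
)

if __name__ == "__main__":
    for speaker, n in sorted(count_turns(SAMPLE).items()):
        print(speaker, n)
```

The same loop could be extended to collect the utterance text for each role, for instance to compare the DPs produced by TEA and STU speakers.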

The feedback from the pilot data collection

In the beginning a few pilot sessions were run with people who had different levels of familiarity with a game but were nevertheless all familiar with it. This worked for pairs whose proficiency in the game contrasted strongly. The choice of the games was motivated by two main criteria: the first was diversity of subjects, and the second was a level of game complexity that would induce intensive discussion in the sessions. The first criterion required enough subjects to be familiar with the game so that there would be a variety of both Experts and Beginners. The second required the game's objectives to be challenging enough to create a need for interaction between the participants, and especially to prompt the Beginner to ask questions and elicit discussion about the course of action they should undertake for better results.

In the initial design the subjects' proficiency was not placed at the two opposite ends of the knowledge scale. The intention was to use subjects with different levels of knowledge of one and the same game, both of whom had some familiarity with it and had spent no less than five hours exploring it. It proved difficult to evaluate proficiency precisely when the evaluation was based primarily on the subjects' own estimates of the time they had spent exploring the game. Problems emerged when sessions were carried out with people who had almost the same level of knowledge of the domain. There was almost no verbal interaction between these participants, as they were equally proficient and there was little to prompt inquiries from either of them. The setting was therefore changed to use participants at opposite ends of the domain-knowledge scale: one Expert with an intermediate to advanced level in playing the game, and one Beginner who has never played the game before.

Another factor in the intensity of the interaction was the level of familiarity between the participants, as well as the amount of time they had spent together in that problem-solving environment. As the aim of the data collection was to allow abundant production of discourse particles, the interaction had to be intensive in order for them to occur naturally. Intensity of interaction was taken to be indicated by frequent overlapping speech and by signals of surprise and frustration at unexpected outcomes of the actions set in motion.

The sessions

Originally the conversations between the participants lasted approximately 10 to 15 minutes. From the collected sound data, an extract of about 5 minutes was chosen from each dialogue and transcribed in order to measure the frequency of the disfluencies and to investigate their functions in these dialogues. After listening through the collected data, the extracts were restricted in most instances to the first five minutes of the introduction, when the participants interact about the goals of the game and are exposed to the task of discovering the realms of their world and the possibilities they have in it.

Some of the sessions were more successful than others in the sense that the participants reached a more advanced stage in the game they were playing. This was usually related to the participants' proficiency in either similar games or game playing in general. The most challenged Beginners were those who had not played the games before and had no experience of virtual reality environments. There are basic conventions for how movements are performed and for how the display switches between a bird's-eye view of the world being explored and information on current resources or political development. For example, one of the most frequent sources of disorientation in Civilization II was the pop-up window that gave statistics on the population increase, the necessary resources, or the moment when a certain technology had been developed. Out of the eight sessions with this particular game, six of the participants playing the role of the Beginner asked "What is going on?", while in Age of Empires, which does not have pop-up menus and tutorials, this question was not asked. There was also a general sense of detachment from the avatars among the more experienced players, while those who had no knowledge of the games, or of game scenarios in general, seemed to concentrate more on the avatars as personalities. More about the peculiarities of each session can be read in sessions.txt.