Langman Corpus

Juliet Langman
Division of Bicultural-Bilingual Studies
University of Texas at San Antonio


Participants: 11
Type of Study: interview
Location: Hungary
Media type: audio
DOI: doi:10.21415/T5C027

Publications using these data should cite:

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This corpus consists of 10 files containing interviews conducted in 1994 with 11 Chinese immigrants living in Hungary. Most of the conversation is in Hungarian, although interviews with participants who speak English also contain some English, and one transcript (KIN10) contains significant amounts of Chinese, with a Hungarian translation on a %tra dependent tier. The interviews focused on the participants' arrival in Hungary and on their daily activities. With the exception of the participants in KIN2 and KIN10, none of the participants had had formal training in Hungarian. The interviewers were the researcher and three Hungarian undergraduates. The data were collected with two purposes in mind: the analysis of communicative strategies among adult second-language learners acquiring a language in an unstructured environment, and the analysis of the acquisition of morphology in an agglutinative language.

Partial support for data collection and analysis was provided through a grant awarded to Dr. Csaba Pléh, OTKA grant T018173, A magyar morfológia pszicholingvistikai vizsgálata (The psycholinguistic study of Hungarian morphology).

Special Coding

The following additional form markers are used on the main (*) speaker tiers of the transcripts:
@e = English word, e.g., go@e
@c = Chinese word, e.g., xie@c
@a = adult-invented word, e.g., pigyilni@a
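For analyses that need to separate Hungarian words from English, Chinese, or invented forms, the markers above can be read off a speaker line mechanically. The following Python sketch is one way to do this, assuming each marker is attached directly to its word (as in go@e) and that the line starts with a speaker code and a colon. The function name marked_words and the speaker code *SPK are hypothetical, chosen only for illustration.

```python
import re

# Form markers documented for this corpus.
FORM_MARKERS = {
    "e": "English word",
    "c": "Chinese word",
    "a": "adult-invented word",
}

# A word immediately followed by @e, @c, or @a, then whitespace, basic
# punctuation, or the end of the line. Assumes no space before the marker.
MARKED_WORD = re.compile(r"(\S+?)@([eca])(?=[\s.,?!]|$)")


def marked_words(main_tier: str) -> list[tuple[str, str]]:
    """Return (word, marker description) pairs from one main (*) speaker line."""
    # Strip a leading speaker code such as "*SPK:" (a hypothetical speaker ID).
    text = main_tier.split(":", 1)[1] if main_tier.startswith("*") else main_tier
    return [(word, FORM_MARKERS[code]) for word, code in MARKED_WORD.findall(text)]


print(marked_words("*SPK:\tén go@e xie@c pigyilni@a ."))
```

Unmarked words such as én are skipped, so the result lists only go, xie, and pigyilni with their marker descriptions.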

The following special codes are used on the %lan tier:
$MIX = utterance with some form of code-switching or borrowing
$CHI = utterance in Chinese (used only in KIN10)

The %rep (repetition) tier uses three types of special codes, which identify:
1. whose speech is repeated
2. the function of the repetition
3. the form of the repetition

These three types of codes can be combined, as in %rep: SRP:MIS:PAR
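Because a combined %rep code packs three pieces of information into one colon-separated string, it can be split back into its parts for tabulation. The Python sketch below does this under an explicit assumption: the components appear in the order listed above (whose speech, function, form), as the example SRP:MIS:PAR suggests. The meanings of the individual codes are not defined here, so the sketch only assigns them to slots. The names RepCode and parse_rep are hypothetical, and the sketch handles only the fully combined three-part case.

```python
from typing import NamedTuple


class RepCode(NamedTuple):
    """One %rep code split into the three code types.

    Assumption: components appear in the order whose speech, function, form.
    """

    source: str    # whose speech is repeated, e.g. "SRP"
    function: str  # function of the repetition, e.g. "MIS"
    form: str      # form of the repetition, e.g. "PAR"


def parse_rep(tier: str) -> RepCode:
    """Split a line such as '%rep: SRP:MIS:PAR' into its three components."""
    # Remove the '%rep:' label if present, keeping only the combined code.
    value = tier.split(":", 1)[1].strip() if tier.startswith("%rep") else tier.strip()
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated codes, got {value!r}")
    return RepCode(*parts)


print(parse_rep("%rep:\tSRP:MIS:PAR"))
```

Raising an error on other shapes keeps the positional assignment honest: if a transcript contains codes with fewer components, the mapping to slots would need the full code list, which this description does not give.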

Error coding focuses exclusively on morphology and is represented on two separate tiers, %err and %mor. The %mor tier gives the target form for each marked error, and the %err tier marks the error type using the following codes:
$OMI = omission
$OMI:PAR = partial omission
$INS = insertion
$INS:PAR = partial insertion
$SWI = switched form
$SWI:PAR = partially switched form
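Each %err code combines an error type (omission, insertion, or switch) with an optional :PAR suffix marking a partial error, so the two can be counted separately. The Python sketch below extracts both from a %err line, assuming the codes appear on the line exactly as listed above, each prefixed with $. The function name classify_errors is hypothetical, and the sketch does not read the target forms on the %mor tier.

```python
import re

# Error types documented for the %err tier.
ERROR_TYPES = {
    "OMI": "omission",
    "INS": "insertion",
    "SWI": "switched form",
}

# A $-prefixed error code with an optional ":PAR" (partial) suffix.
ERR_CODE = re.compile(r"\$(OMI|INS|SWI)(:PAR)?\b")


def classify_errors(err_tier: str) -> list[tuple[str, bool]]:
    """Return (error type, is_partial) for each code found on a %err line."""
    # findall yields (base, suffix) pairs; suffix is "" when :PAR is absent.
    return [(ERROR_TYPES[base], bool(par)) for base, par in ERR_CODE.findall(err_tier)]


print(classify_errors("%err:\t$OMI $SWI:PAR"))
```

Keeping the partial flag as a separate boolean means totals per error type and totals of partial versus full errors can both be computed from the same list.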