Eppler Corpus

Eva Eppler
Department of Media, Culture, and Language
University of Surrey Roehampton


Participants: 4 (2-8 files per participant)
Type of Study: naturalistic, interview
Location: UK
Media type: audio
DOI: doi:10.21415/T5GK6J

Browsable transcripts

Download transcripts

Media folder

Citation information

Publications that use these data should cite:

Eppler, Eva. 1999. ‘Word order in German-English mixed discourse’, UCL Working Papers in Linguistics 11, 285-309.

Eva Duran Eppler. 2010. Emigranto. The syntax of a German/English mixed code. Vienna: Braumueller. ISBN 978-3-7003-1739-5

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The data was collected from a community of Austrian Jewish refugees from Nazi-occupied Austria (approx. 30,000 Austrians fled to the UK) who settled in Northwest London in the late 1930s. We are therefore dealing with a community in which German and English have been in close contact for over sixty years. The L1 of the informants is close to Standard German, although it is occasionally interspersed with Yiddish lexical items and phonetically influenced by the Viennese variety. A peculiarity of the linguistic profile of this community is that they do NOT speak Yiddish. For most speakers, the age of onset of L2 (English) was in the late teens or early twenties. At the time the audio-recordings were made (1993), all informants were in their late sixties or early seventies. Patterns of language use in this bilingual community changed over the last half century: up to the 1970s, mainly English was used in both public and private domains. Once the second generation had left the parents' household, and especially after retirement, both languages came to be used in the private domain. A close-knit network within a subset of the community facilitated the development of a bilingual mode of interaction, sometimes called 'Emigranto'. This mode of interaction is only used in in-group situations, is regarded as the 'we-code' (Gumperz 1982) and has covert prestige. Linguistically it is characterised by intra-sentential code-switching and frequent switching at speaker turn boundaries. Biographical information (age, gender, schooling, social class of informants etc.) and situational information, where available, is provided under the relevant headers in the .cha files. Pseudonyms are used for all participants. The goal of the project was to provide a linguistic profile of the Jewish refugee community in London and to study patterns of code-mixing.

Sampling and Data Collection

A random sample of 70 members of the target community was selected from a list of clients of an Austrian solicitor specializing in pension claims for refugees. Of these, 27 were audio-recorded for approx. 90 minutes each in one-to-one or one-to-two sociolinguistic interviews/oral history collections. Other informants were added to this body of subjects by referral (snowball sampling). All audio-recordings were collected in the informants' homes. Informants were encouraged to choose as the language of interaction the one they normally use at home. An additional 400 minutes of group recordings with three informants and the researcher were collected through participant observation during informal gatherings. Another 540 minutes of audio data collected in the Day-Centre of a Refugee Organisation are almost impossible to transcribe because of the low quality of the recordings and the amount of overlap.

Data Transcription

Full transcripts of the sound files were made using the CHAT/LIDES transcription systems. LIDES (Language Interaction Data Exchange System) is based on CHAT but was extended to deal with code-mixed data. For this purpose, language tags (@2 for English and @4 for German) are added to each word/morpheme to indicate its language. Where it was impossible to determine the language in which a word was produced, @u was attached, e.g. in@u preceding English or German place-names. Morphologically mixed words only display the full language tag on the suffix, because CHECK does not pass sequences such as ge@4#bother@2-t@. The comma was used to indicate syntactic juncture, because one of the research aims concerns co- and subordination. The CHAT symbol for tag questions was also used to delimit discourse markers (Schiffrin 1987). Due to the nature of some of the data (group recordings), overlaps are only indicated when the beginning and end points of the overlap were clearly recognisable. Eva Eppler and Maggie Brueckner of the Language Centre of the University of Rostock, Germany, both transcribed and checked each transcript. Project-specific codes are not included in the files on the web.
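
As a rough illustration of the word-level language tags described above, the following Python sketch splits simple tagged tokens into a word form and a language label and counts words per language. It is a minimal sketch and not part of the CHAT/LIDES tooling: the function names and the example utterance are hypothetical, the utterance is invented rather than taken from the corpus, and only simple word@tag tokens are handled (morphologically mixed words and other CHAT symbols are passed through without a language).

```python
import re
from collections import Counter

# Tag meanings as stated in the corpus description.
LANGUAGE_TAGS = {"2": "English", "4": "German", "u": "undetermined"}

# A simple word followed by exactly one language tag, e.g. "bus@2".
TAGGED_WORD = re.compile(r"^(?P<form>[^@\s]+)@(?P<tag>[24u])$")


def tag_tokens(utterance):
    """Return (form, language) pairs for each whitespace-separated token.

    Tokens that do not match the simple word@tag pattern (punctuation,
    morphologically mixed words, untagged material) get language None.
    """
    pairs = []
    for token in utterance.split():
        match = TAGGED_WORD.match(token)
        if match:
            pairs.append((match.group("form"), LANGUAGE_TAGS[match.group("tag")]))
        else:
            pairs.append((token, None))
    return pairs


def language_counts(utterance):
    """Count tagged words per language, ignoring tokens without a tag."""
    return Counter(lang for _, lang in tag_tokens(utterance) if lang is not None)


# Hypothetical utterance, invented for illustration; not taken from the corpus.
example = "ich@4 habe@4 the@2 bus@2 genommen@4 in@u London@u ."

if __name__ == "__main__":
    for form, lang in tag_tokens(example):
        print(form, lang)
    print(language_counts(example))
```

Counts of this kind give only a coarse picture of language mixing within an utterance; any real analysis of the .cha files would also need to handle the main-tier speaker prefixes, dependent tiers and the full set of CHAT symbols.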

The collection and transcription of the data was funded by various research grants from the University of Vienna and the University of Surrey Roehampton. The Austrian Ministry of Science also funded this research. Many thanks to the media team at Roehampton for technical support, and to LIPPS and TalkBank.