HKPU Corpus


Angel Chan
Department of Chinese and Bilingual Studies
Hong Kong Polytechnic University


Participants: 20
Type of Study: xxx
Location: xxx
Media type: audio
DOI: 10.21415/T5489G

Citation information

Chan, A., Feng, Z.-H., & Yang, W.-C. (2013). A new multimedia shared L2 spoken Mandarin Chinese corpus: Construction and linguistic analyses. Paper presented at the 21st Annual Meeting of the International Association of Chinese Linguistics (IACL-21), 7-9 June 2013, Taiwan.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by the above reference.

Project Description

This corpus builds on a completed MA dissertation by Jeff Zhen-Hui Feng, with technical support from doctoral student Wenchun Yang and research assistant Shanrong Xie, under the supervision of Angel Chan. It is a small, video-linked corpus of L2 spoken Mandarin Chinese featuring 14 adult learners whose L1 is English and 6 adult native Mandarin speakers as L1 controls. We hope that this corpus will further raise the visibility, within TalkBank's international data-sharing consortium, of SLA learner corpora with Chinese as the target language.


Fourteen adult L2 learners of Mandarin (L1: English) and six native Mandarin speakers (as L1 controls), aged between 20 and 70, were recruited. The 6 L1 participants were selected to match 6 of the L2 participants on gender, age, and education level, to enable systematic L1-L2 comparisons in future research. Tables 1 and 2 below give background information on the L2 and L1 participants, respectively. All L2 participants can communicate orally in Mandarin Chinese at least at the simple-sentence level. All participants gave written consent to take part in this project.

Table 1. Background Information of the 14 L2 Subjects (L1 English)
1   Aa  23  M  Bachelor     22  Classroom/Conversation                French, Swedish
3   Ba  20  M  Bachelor     18  Classroom/Conversation                Spanish, German
6   Ja  31  M  Master       24  Classroom/Conversation/Self-learning  Spanish, French
8   Jo  37  M  High School  34  Classroom/Conversation                None
9   Mi  24  F  Master       18  Classroom/Conversation/Reading        German, French
12  Pa  48  M  Master       28  Classroom/Conversation/Self-learning  German, French
14  Ta  36  F  High School  33  Classroom/Conversation                None

Table 2. Background Information of the 6 L1 Mandarin Subjects
No.  Code  Age  Education Level  Gender  L2 Match
1    Do    35   Doctor           M       Je
2    Gu    38   Bachelor         M       Ga
3    Qi    49   Bachelor         M       Pa
4    Wu    35   Doctor           M       Jo
5    Ya    24   Master           F       Mi
6    Zh    32   Master           F       Ta


Speech samples were collected from each of the 20 participants individually. Each participant performed a structured narrative task, retelling in Mandarin Chinese the classic frog story "Frog, Where Are You?" (Mayer 1969; Berman & Slobin 1994), which is widely used in cross-linguistic research. Each session was video-recorded with a high-quality audio track.

As is typical of studies using the frog story as an elicitation tool, each participant first looked through the standard "frog story" picture book, which tells the story in 24 pictures without words, and was then asked to tell the story in the target language. In this project, after looking through the wordless storybook once, each participant (L1 and L2 alike) also listened once to an audio recording of the story narrated in English from a standard script, to ensure familiarity with its content, before retelling it in Mandarin. The L1 Mandarin participants were selected for having enough L2 English proficiency to understand the English narration.

The corpus was constructed following the TalkBank format. We also conducted inter-rater reliability checks on the transcriptions and on the video linking and synchronization of the data, and manually disambiguated the automatic tagging.


The construction of this corpus was supported by a grant (project code: 1-ZVAQ) from the Hong Kong Polytechnic University to Angel Chan.