ESRC Centre for Research on Bilingualism
Type of Study: naturalistic
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The Siarad corpus of Welsh-English bilingual speech was recorded and transcribed between 2005 and 2008 as part of a research project funded by the Arts and Humanities Research Council (AHRC) and entitled ‘Code switching and convergence in Welsh: a universal versus a typological approach’. The project's main theoretical aim was to test competing models of code switching against Welsh-English data. The name of the corpus, Siarad, is the Welsh word for ‘speaking’.
The corpus consists of 69 audio recordings of informal conversation between two or more speakers, together with their transcripts, involving a total of 153 speakers from across Wales. Participants were recruited in several ways, including advertising, approaching visitors at a Welsh-language cultural event, and drawing on the research team’s extended social network. In total, the corpus comprises 452,116 words of text from 40 hours of recorded conversation. The transcriptions (in CHAT format) are linked to the digitized recordings through sound links at the end of each main tier. Most recordings were made in stereo using radio microphones and a Marantz hard disk recorder. A MiniDisc recorder was also used occasionally, so some recordings are in mono.
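Because each main tier ends with a sound link, utterances can be matched to their stretch of audio programmatically. The following is a minimal Python sketch, assuming the general CHAT convention that a media bullet is stored as `start_end` times in milliseconds between two U+0015 control characters. The function `parse_main_tier`, the `Utterance` type and the speaker code `SPK` are hypothetical illustrations, not part of CLAN or any TalkBank tool.

```python
import re
from typing import NamedTuple, Optional

# Assumption: a CHAT media bullet looks like "\x15<start>_<end>\x15",
# with start and end times given in milliseconds.
BULLET = re.compile(r"\x15(\d+)_(\d+)\x15")


class Utterance(NamedTuple):
    speaker: str
    text: str
    start_ms: Optional[int]  # None if the tier has no sound link
    end_ms: Optional[int]


def parse_main_tier(line: str) -> Utterance:
    """Split a CHAT main tier ('*SPK:\\t...') into speaker, text and media times."""
    if not line.startswith("*"):
        raise ValueError("not a main tier: " + line[:20])
    # The speaker code runs from after '*' up to the first colon.
    speaker, _, rest = line[1:].partition(":")
    match = BULLET.search(rest)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        # Remove the bullet so that only the transcribed words remain.
        rest = rest[:match.start()] + rest[match.end():]
    else:
        start = end = None
    return Utterance(speaker, rest.strip(), start, end)
```

For example, `parse_main_tier("*SPK:\tdw i'n gwybod that . \x1512345_13980\x15")` yields speaker `SPK`, text `dw i'n gwybod that .` and the span 12345 to 13980 ms. The dependent tiers and file headers would need their own handling.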
The recordings were made at a place convenient for the speakers, e.g. at their homes, at their workplaces or at the university. After setting up the equipment, the researcher left the speakers to talk freely with one another. In every recording, the first five minutes after the researcher left the room have been deleted. In some cases the researcher briefly re-entered during the recording. These passages have not been transcribed, but notes have been added at the relevant points in the transcripts.
At the end of each recording, all participants were asked to fill in a questionnaire giving background information such as their age, gender and the places they had lived, to support sociolinguistic analysis. They were also asked to sign consent forms giving permission for their recording and its transcript to be used for research purposes and to be submitted to online linguistic archives. The consent form stipulated that the names of speakers, and of other people named in the recording, would be replaced by pseudonyms in the transcript. For participants aged 16 or younger, a parent or guardian also signed the consent form.
When using these data, please refer to the corpus as the Bangor Siarad corpus, and provide a link to the website from which you accessed the corpus.