|
Velka Popova Laboratory of Applied Linguistics University of Shumen v.popova@shu.bg |
|
Dmitar Popov Laboratory of Applied Linguistics University of Shumen labling@shu.bg |
| Participants: | 5, 50, 71 |
| Type of Study: | naturalistic, narrative |
| Location: | Bulgaria |
| Media type: | audio |
| DOI: | doi:10.21415/PHWH-J834 |
Popova, V. (2024). Колекция с българска детска реч в термините на корпусната лингвистика [A Collection of Bulgarian Child Language in Corpus Linguistics Terms]. Институт за български език „Проф. Любомир Андрейчин“, Българска академия на науките. 328-346.
Popova, V. (2021). Български корпус с детска реч на платформата CHILDES – Рогожникова, Т. М. (ред.) Теория и практика языковой коммуникации: материалы XIII Международной научно-методической конференции / Уфимск. гос. авиац. техн. ун-т; [отв. ред. Т. М. Рогожникова]. – Уфа: РИК УГАТУ, 2021, 136–147 (РИНЦ) ISBN 978-5-4221-1507-5.
Popova, V., Iglikova, R., & Kordov, K. (2021). LABLASS and the BULGARIAN LABLING CORPUS for Teaching Linguistics. Selected Papers from the CLARIN Annual Conference 2020. Linköping Electronic Conference Proceedings 180, 208–213. https://doi.org/10.3384/ecp18022
Popova, V., & Popov, D. (2023). Computer-assisted Transcription and Analysis of Bulgarian Child Speech Data using CHILDES and CLAN. Journal of Computational and Applied Linguistics, 1, 66–76. https://doi.org/10.33919/JCAL.23.1.3
Popova, V., Popov, D. (2025). Bulgarian Speech Resources in the CHILDES System. In: Karpov, A., Delić, V. (eds) Speech and Computer. SPECOM 2024. Lecture Notes in Computer Science, vol 15299. Springer, Cham. https://doi.org/10.1007/978-3-031-77961-9_13
The main focus of the LabLing research program is the creation of a Bulgarian child language corpus as part of the CHILDES database. LabLing is a member of the consortium for the Bulgarian national research infrastructure for resources and technologies for linguistic, cultural and historical heritage, integrated within CLARIN EU and DARIAH EU (CLaDA-BG – https://clada-bg.eu/en). The data will be important for building a national interdisciplinary electronic infrastructure as electronic resources in Bulgarian are integrated and developed. The construction of the LabLing CORPUS is therefore a priority task of the CLaDA-BG consortium.
The Cyrillic letters Я, Ю, Ъ, Ч, Ш, Щ, Ж, Ц, Й, Х are assigned the following Latin correspondences: Я – ja, Ю – ju, Ъ – y, Ч – ch, Ш – sh, Щ – sht, Ж – zh, Ц – c, Й – j, Х – x.
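The letter correspondences listed above can be illustrated with a minimal Python sketch. It covers only the letters named in this description; the function name `transliterate_special` and the dictionary name are hypothetical. The handling of case (upper-case Cyrillic mapped to the same lower-case Latin string) and the reading of the final pair as Cyrillic Х are assumptions, not documented rules of the corpus. All other characters are passed through unchanged, because their correspondences are not given here.

```python
# Correspondences named in the corpus description (Cyrillic -> Latin).
# Assumption: the last pair refers to the Cyrillic letter "х".
SPECIAL_LETTERS = {
    "я": "ja", "ю": "ju", "ъ": "y", "ч": "ch", "ш": "sh",
    "щ": "sht", "ж": "zh", "ц": "c", "й": "j", "х": "x",
}


def transliterate_special(text: str) -> str:
    """Replace only the listed Cyrillic letters; keep everything else as-is.

    Assumption: upper-case letters map to the same lower-case Latin string.
    """
    out = []
    for ch in text:
        latin = SPECIAL_LETTERS.get(ch.lower())
        out.append(ch if latin is None else latin)
    return "".join(out)
```

A call such as `transliterate_special("щ")` returns `"sht"`. A full romanization of the transcripts would need the correspondences for the remaining letters, which this description does not list.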
The children were born and live in the northeastern part of Bulgaria (Shumen and Varna). They were recorded in everyday situations (playing games, dressing, eating, going to sleep, looking through children's picture books, free play with the mother, free play with the father, free play with other children, reading a book, and others) during their daily interaction with their relatives. All individuals entered in the database as participants in the dialogues are monolingual native speakers of Bulgarian. The adults around the children have secondary or higher (university) education. The audio recordings of two of the children (ALE and TEF) were made by the LabLing research team, and those of BOG, SIM, and ELI by their mothers. The digitization and transcription of the material were done by members of the research team.
The narrative corpus consists of two segments. The first uses the fox and cat stories and the second uses the birds and dogs stories.
The fox-cat collection contains 91 transcripts of children's narratives elicited from 50 monolingual children (native speakers of Bulgarian). The narratives were recorded with an audio recorder in several kindergartens in Shumen and Varna (northeastern Bulgaria) and, in a few cases, at home or in the street. The children are grouped into 3 age groups: