TalkBank MOR and UD Grammars

Processing with UD

We are currently transitioning the TalkBank system for morphosyntactic analysis from the MOR/POST/MEGRASP system to the UD (Universal Dependencies) system, which is described in detail here. We apply UD taggers to TalkBank files using Stanford's Stanza system, which has been built into the Batchalign2 program created by Houjun Liu, as described in this article published in Language Development Research.

Processing with Batchalign

Creating the new UD analysis requires Batchalign2, which can be downloaded and installed from here. Users who are not familiar with the type of installation required by Batchalign are welcome to send their transcripts to macw@cmu.edu for tagging. It takes us only minutes to tag them and send back the result. However, the transcripts must already have been validated by the CHECK program inside CLAN.

As of March 2024, we have tagged these languages in CHILDES using UD: Afrikaans, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, French, German, Icelandic, Irish, Italian, Japanese, Korean, Mandarin, Norwegian, Polish, Portuguese, Serbian, Slovenian, Spanish, Swedish, Turkish, and Welsh. Once UD grammars become available for languages such as Sesotho or Nungon, we hope to apply UD through Batchalign to these languages as well. Currently, application to Arabic, Bulgarian, Farsi, Greek, Hebrew, Russian, and Tamil is blocked because the transcripts were written in non-standard romanizations that UD does not support. Application to Danish and Hungarian will require extensive cleanup of the transcripts.

The great advantage of UD over MOR is that it is available for many more languages. It also performs much better than MOR at computing dependency relations on the %gra line. However, its morphological analysis on the %mor line is not as analytic as that of MOR. For this reason, for English, French, and Spanish, we will retain access to the MOR grammars from these links:

The manual for MOR grammars is available here. Researchers who might wish to develop the older, incomplete MOR grammars for Italian, Dutch, German, Hebrew, or Japanese can write to Brian MacWhinney at macw@cmu.edu for copies of those grammars.
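
For readers who have not worked with the %gra line discussed above, the following Python sketch illustrates the general idea of turning dependency relations into a tier of index|head|relation items. It is only an illustration under stated assumptions and is not the conversion that Batchalign2 performs: the function names format_gra and stanza_dependencies are hypothetical, the relation labels are passed through unchanged, and the example dependencies are written by hand rather than produced by a tagger. The optional helper uses Stanza's default pipeline for a language and requires Stanza and its models to be installed separately.

```python
from typing import Iterable, List, Tuple

# (token index, head index, relation); indices are 1-based and head 0 marks the root.
Dependency = Tuple[int, int, str]


def format_gra(deps: Iterable[Dependency]) -> str:
    """Render dependencies as a %gra-style tier of index|head|relation items.

    Hypothetical helper for illustration; labels are kept exactly as given.
    """
    items = [f"{idx}|{head}|{rel}" for idx, head, rel in deps]
    return "%gra:\t" + " ".join(items)


def stanza_dependencies(text: str, lang: str = "en") -> List[List[Dependency]]:
    """Parse text with Stanza and return one dependency list per sentence.

    Assumes `pip install stanza` and a downloaded model for `lang`
    (for example via stanza.download("en")).
    """
    import stanza  # imported lazily so format_gra works without Stanza installed

    nlp = stanza.Pipeline(lang=lang)  # default processors include depparse
    doc = nlp(text)
    return [
        [(word.id, word.head, word.deprel) for word in sent.words]
        for sent in doc.sentences
    ]


# Hand-written dependencies for "the dog barks ." (illustrative, not tagger output).
example = [(1, 2, "det"), (2, 3, "nsubj"), (3, 0, "root"), (4, 3, "punct")]
print(format_gra(example))
```

With Stanza installed, each list returned by stanza_dependencies can be passed to format_gra in the same way. For actual corpus work, the tiers should be produced by Batchalign2 itself, since its output conventions may differ from this sketch.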