TalkBank | MOR and UD Grammars
We are currently transitioning the TalkBank system for morphosyntactic analysis from the MOR/POST/MEGRASP system to the UD (Universal Dependencies) system, which is described in detail here. We apply UD taggers to TalkBank files using Stanford's Stanza system, which has been built into the Batchalign2 program created by Houjun Liu, as described in this article published in Language Development Research.
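For readers who want a rough picture of what a Stanza-based UD pass produces, the following is a minimal sketch and not Batchalign2's actual implementation. It assumes Stanza is installed and that the model for the language has already been fetched with stanza.download. The helper names format_gra and parse_with_stanza are hypothetical, and the uppercase relation labels and the sample triples are illustrative assumptions. The sketch turns UD dependency triples into an index|head|RELATION line in the style of a CHAT %gra tier.

```python
from typing import Iterable, List, Tuple

# (word index, head index, relation); a head index of 0 marks the root
Dep = Tuple[int, int, str]


def format_gra(deps: Iterable[Dep]) -> str:
    """Render dependency triples as a CHAT-style %gra tier line.

    Uppercasing the relation labels is an assumption made for illustration.
    """
    items = [f"{idx}|{head}|{rel.upper()}" for idx, head, rel in deps]
    return "%gra:\t" + " ".join(items)


def parse_with_stanza(text: str, lang: str = "en") -> List[Dep]:
    """Run a default Stanza pipeline and return triples for the first sentence.

    Hypothetical helper: requires Stanza plus a downloaded model for `lang`.
    """
    import stanza  # imported lazily so format_gra works without Stanza

    nlp = stanza.Pipeline(lang)  # default processors for the language
    sentence = nlp(text).sentences[0]
    return [(word.id, word.head, word.deprel) for word in sentence.words]


# Hand-written triples for "the dog barks ." (illustrative, not tagger output)
sample = [(1, 2, "det"), (2, 3, "nsubj"), (3, 0, "root"), (4, 3, "punct")]
line = format_gra(sample)
print(line)
```

With Stanza and an English model available, format_gra(parse_with_stanza("the dog barks .")) would format real tagger output the same way. The formatting step is kept separate so that it can be checked without loading any model.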
As of March 2024, we have tagged these languages in CHILDES using UD: Afrikaans, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, French, German, Icelandic, Irish, Italian, Japanese, Korean, Mandarin, Norwegian, Polish, Portuguese, Serbian, Slovenian, Spanish, Swedish, Turkish, and Welsh. Once UD grammars become available for languages such as Sesotho or Nungon, we hope to apply UD through Batchalign to these languages as well. Currently, application to Arabic, Bulgarian, Farsi, Greek, Hebrew, Russian, and Tamil is blocked because the transcripts use a non-standard romanization that UD does not support. Application to Danish and Hungarian will require extensive cleanup of the transcripts.
The great advantage of UD over MOR is that it is available for many more languages. It also performs much better than MOR at computing the dependency relations on the %gra line. However, its morphological analysis on the %mor line is less analytic than MOR's. So, for English, French, and Spanish, we will retain access to the MOR grammars from these links: