The MOR program provides a method for automatic morphological tagging of corpora in the CHAT format. MOR requires a separate grammar for each language. After analysis with MOR, users can run the POST program to disambiguate the %mor line. We provide a POST disambiguation database for English, but for other languages, users will need to train a POST database themselves. The whole system is described in full in the MOR Manual and in a book chapter on morphosyntactic analysis in CLAN.
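To make the disambiguation step concrete, the following is a minimal sketch, not part of CLAN, that lists the tokens on a %mor line that still carry more than one analysis. It assumes that alternative analyses of a word are joined by `^` on the %mor line and that tokens are separated by whitespace; the function name `ambiguous_tokens` and the sample line are invented for illustration.

```python
def ambiguous_tokens(mor_line):
    """Return (position, alternatives) for each token with more than one analysis.

    Assumption: alternatives for one word are joined by '^', e.g. 'n|walk^v|walk'.
    """
    # Drop the tier label ('%mor:') if the line still carries it.
    body = mor_line.split("\t", 1)[1] if "\t" in mor_line else mor_line
    found = []
    for position, token in enumerate(body.split()):
        alternatives = token.split("^")
        if len(alternatives) > 1:
            found.append((position, alternatives))
    return found


# Invented sample line, not taken from any corpus.
sample = "%mor:\tpro:sub|I n|walk^v|walk adv|home ."
for position, alternatives in ambiguous_tokens(sample):
    print(position, alternatives)
```

A line that has already been disambiguated should produce no output here, which makes a check of this kind a quick way to see whether a file has been through the disambiguation step.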
We have working MOR grammars for these languages:
These grammars also include POST databases created by Christophe Parisse's POSTTRAIN program. After MOR finishes, POST runs automatically to disambiguate the output of MOR. The grammars for English, Hebrew, Japanese, Mandarin, and Spanish then also run the MEGRASP program to automatically create a dependency grammar analysis on the %gra line. However, the accuracy of these analyses varies across languages because some still need more training data.
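As an illustration of how the %gra line relates to the %mor line, here is a minimal sketch, not part of CLAN, that pairs each morphological token with its dependency relation. It assumes that each %gra item has the form `index|head|RELATION`, that a head of 0 marks the root, and that the two tiers align one token to one item; real transcripts with clitics or other joined forms may not align this simply. The function name `pair_mor_gra` and the sample tiers are invented for illustration.

```python
def pair_mor_gra(mor_line, gra_line):
    """Return (index, mor_token, head_index, relation) rows for one utterance.

    Assumption: one %gra item per whitespace-separated %mor token.
    """
    mor_tokens = mor_line.split("\t", 1)[1].split()
    gra_items = gra_line.split("\t", 1)[1].split()
    if len(mor_tokens) != len(gra_items):
        raise ValueError("%mor and %gra tiers do not align one-to-one")
    rows = []
    for mor_token, gra_item in zip(mor_tokens, gra_items):
        index, head, relation = gra_item.split("|")
        rows.append((int(index), mor_token, int(head), relation))
    return rows


# Invented sample tiers, not taken from any corpus.
mor = "%mor:\tpro:sub|I v|want det:art|a n|cookie ."
gra = "%gra:\t1|2|SUBJ 2|0|ROOT 3|4|DET 4|2|OBJ 5|2|PUNCT"
for row in pair_mor_gra(mor, gra):
    print(row)
```

Raising an error on a length mismatch, rather than silently truncating, keeps misaligned tiers visible instead of producing wrong pairings.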
To help those interested in building their own MOR grammars, we provide these two examples of minMOR grammars. One is a basic example, and the other shows how to build a grammar that targets only a few word forms, such as the German article.