The current version defines two types of documents: the global elements
below... The global types are available for reuse through schema type extension or restriction.
The most up-to-date document definition is CHAT, which is also the richest in structure. Ideally,
each group should develop a schema module defining the structure of its specific (class of)
annotations; this schema should then be an assembly of those definitions.
Developed by Romeo Anghelache, from the CHAT specifications, released under
the GNU Public License, 2001. Continuing development by Franklin Chen.
structure of a CHAT document
@Participants; a structure enumerating the participating beings
31 March 1999 is formatted as 1999-03-31
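As a small illustration of the date convention above, the following sketch uses Python's standard datetime module to produce the same year-month-day form. The variable names are illustrative only and are not part of the schema.

```python
from datetime import date

# 31 March 1999, the example date given in the description above
example = date(1999, 3, 31)

# isoformat() yields the YYYY-MM-DD form
formatted = example.isoformat()
print(formatted)
```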
an AIF document, see http://morph.ldc.upenn.edu/AG/doc/xml/
administrative descriptions, reused from Dublin Core
() in a word
unscoped code in the middle of an utterance; CHAT {...}
postcode at the end of an utterance; CHAT [+ ...]
allows semi-structured extensions to the current set of annotations
allows identification of the user who made this annotation
inline annotations; the conventional CHAT symbols are listed too
[!]
[!!]
[?] in CHAT, ( text ) in CA
[/] in CHAT
[//] in CHAT, - in CA
[///] in CHAT
[/?]
[/-]
quicker tempo, no CHAT equivalent, used in CA
slower tempo, no CHAT equivalent, used in CA
larger volume, louder, no CHAT equivalent, used in CA
lower volume, no CHAT equivalent, used in CA
CA-style overlap
fmc
fmc
fmc
fmc
mark overlap scoping
[>]
[<]
[*] or [* text]
,, for %mor
For %mor
nonverbal happenings
0
0word
0*word
00word
&; phonological fragment
&=; happening, such as sneeze
&*WHO=word; word spoken by someone else
a reference to a point/portion of a mute/action signal, e.g. 0
intended as a feature of a word, see also the CHAT conventional notations
@ap
@b
@c
@cue
@d
@f
@fp
@fs
@g
@i
@inf
@ins
@k
@l
@m
@n
@nv
@o
@p
@pm
@pr
@q
@sc
@sas
@si
@sl
@t
@u
@x
@wp
a nonempty string
list of languages
syntactic structure
the unit of a %mor line corresponding to a word; this element belongs to a
word element, but if the precise correspondence is not yet established, these elements will
be present at the utterance level (contained in an utterance)
%mor part of speech
omitted, CHAT equivalent is 0
subcategory
a group of words in %mor or %trn; can be empty if associated with a separator or terminator
a single word or a compound word
a compound word
structure used to let annotations belong to more than one word; it can be
recursive, although this is unnecessary: one can attach more than one annotation to a word, a group
of words, or a whole utterance
a word
xx
yy
xxx
yyy
www
0
0word
0*word
00word
&; phonological fragment
&=; happening, such as sneeze
utterance initiators or linkers; they indicate how the current utterance
fits with an earlier one; the conventional CHAT symbols are listed too
+"
+""
+^
+<
+,
++
+≋
+≈
a pointer to a selection in the single video/audio file associated with the transcript
frame
second
millisecond
byte
character
+ for mor
word#
=word (English translation)
morphemes
suffix marker, CHAT equivalent is -
suffix fusion marker, CHAT equivalent is &;
morphological category, CHAT equivalent is :, when used after the stem
the beings along with their characteristics (age, sex...)
stress, blocking etc.
/
//
///
:
^ internal
^ at beginning
#, pause between words
[x number] in CHAT
,
,,
;
:
[c] clause-delimiter;
⇗
↗
→
↘
⇘
≡
period, question, exclamation; basic utterance terminator; tone terminator
+.
+...
+..?
+!?
+/.
+/?
+//.
+//?
+"/.
+".
For heritage only
≋
≈
structure used to let annotations belong to more than one word; it can be
recursive, although this is unnecessary: one can attach more than one annotation to a word, a group
of words, or a whole utterance
Phonetic transcriptions of orthographic forms.
Collection of syllable constituents.
Specifies a syllable constituent. The type is one of constituentTypeType.
Each constituent can consist of one or more phones, each identified by its zero-based index in the
parent phonetic rep.
If two adjacent nuclei exist, diphthongMember controls the parsing of a hiatus.
Valid syllable constituent labels.
Syllable boundary marker (e.g., space, '.')
Syllable stress (i.e., primary or secondary)
Left appendix
Onset
Nucleus
Coda
Right appendix
Onset of an empty headed syllable
Ambisyllabic
Unknown
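To make the indexing convention for syllable constituents concrete, here is a minimal sketch in Python. The tuple layout, the sample phones, and the function name resolve are assumptions made for illustration; only the labels (Onset, Nucleus, Coda) and the zero-based indexing come from the descriptions above.

```python
# Hypothetical in-memory form of a phonetic representation; not the schema's own layout.
phones = ["k", "æ", "t"]

# Each constituent pairs a label with zero-based indices into `phones`.
constituents = [
    ("Onset", [0]),
    ("Nucleus", [1]),
    ("Coda", [2]),
]

def resolve(constituents, phones):
    """Replace each index with the phone it points to in the parent representation."""
    return [(label, [phones[i] for i in indices]) for label, indices in constituents]

resolved = resolve(constituents, phones)
print(resolved)
```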
This type represents the alignment of two phonetic representations.
The number -1 represents an indel (insertion-deletion point). Any number >= 0 is the index of a
phone identified by the referenced syllabification element.
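The indel convention can be sketched as follows. This example assumes the alignment is held as two parallel index lists, one per phonetic representation; that layout, the sample phones, and the name pair_up are hypothetical. Only the meaning of -1 and of non-negative indices is taken from the description above.

```python
INDEL = -1  # marks an insertion-deletion point, as described above

def pair_up(target, actual, target_idx, actual_idx):
    """Return (target_phone, actual_phone) pairs; None stands in for an indel."""
    if len(target_idx) != len(actual_idx):
        raise ValueError("index lists must have equal length")
    pairs = []
    for t, a in zip(target_idx, actual_idx):
        left = None if t == INDEL else target[t]
        right = None if a == INDEL else actual[a]
        pairs.append((left, right))
    return pairs

# Hypothetical data: the first target phone has no counterpart in the actual form.
target = ["k", "æ", "t"]
actual = ["æ", "t"]
aligned = pair_up(target, actual, [0, 1, 2], [-1, 0, 1])
print(aligned)
```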
clitic or compound or reduplication markers in wordnet
compound, CHAT +
clitic, CHAT ~
hyphen, CHAT -
clitic separators in morphemics
preclitic, CHAT $
postclitic, CHAT ~
a group of utterances having something in common, usually the speaker
these are the (legacy) dependent tiers; the %mor line is now the
<morphemics> element
%add
%act
%alt
%cod; general purpose coding
%coh; cohesion tier
%com;[% text]; comments by investigator
%eng
%err; error coding
[%exc ...]
%exp; [= text]
%flo
%fac
%gls
%gpx
%int
%lan
%ort
%par:
%:
%pho:
%pht:
%mod:
%def; on the main line, not recommended
%sit
%ssy
%spa
%spe
%tim
arbitrary annotations of the form %xfoo, intended as an extension mechanism
%ton
%rom
%sdi
%sch
%sxx
#
##
###
fmc should change to xs:duration
For use with delimited material. A workaround for the lack of overlapping
elements in XML.
Begin delimited material
End delimited material
For use with delimited material. A workaround for the lack of overlapping
elements in XML.
Begin delimited material
End delimited material
Begin and end delimited material (degenerate case)
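The begin/end marker idea can be illustrated with a short sketch. It assumes a flat token stream in which plain strings are words and ("begin", label) / ("end", label) tuples are the markers; this representation and the name delimited_spans are hypothetical and do not reflect how the schema itself encodes the markers. The point is only that paired markers can describe spans that overlap, which nested XML elements cannot express directly.

```python
def delimited_spans(tokens):
    """Collect (label, words) spans from a flat stream of markers and words."""
    open_spans = {}   # label -> words collected since its begin marker
    finished = []     # completed spans, in the order their end markers appear
    for tok in tokens:
        if isinstance(tok, tuple):
            kind, label = tok
            if kind == "begin":
                open_spans[label] = []
            elif kind == "end":
                # Raises KeyError if no matching begin marker was seen.
                finished.append((label, open_spans.pop(label)))
        else:
            # A word belongs to every span that is currently open.
            for words in open_spans.values():
                words.append(tok)
    return finished

# Span A covers "one two"; span B covers "two three"; they overlap on "two".
stream = [("begin", "A"), "one", ("begin", "B"), "two",
          ("end", "A"), "three", ("end", "B")]
spans = delimited_spans(stream)
print(spans)
```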
Underline arbitrary content
Long feature <TAG material TAG> for Santa Barbara; other
begin/end features
Nonvocal <<TAG material TAG>> for Santa Barbara
CA delimited material
CA subword element
∾
∙
Ἡ
↓
↻
↑
⁎
∆
▔
◉
▁
§
∮
∇
☺
°
⁇
∬
Ϋ
[: word1 ...]
["]
scoped symbols
plural after form marker
a pointer to a graphics file
a word spoken by some other speaker
equivalent of CHAT symbol @;
category
Hack for CA heritage
the place to add research content