Tools for Analyzing Talk

 

Part 1:  The CHAT Transcription Format

 

 

Brian MacWhinney

Carnegie Mellon University

 

September 1, 2022

https://doi.org/10.21415/3mhn-0z89

When citing the use of TalkBank and CHILDES facilities, please use this reference to the last printed version of the CHILDES manual:

 

MacWhinney, B. (2000).  The CHILDES Project: Tools for Analyzing Talk. 3rd Edition.  Mahwah, NJ: Lawrence Erlbaum Associates.

 

This allows us to track usage of the programs and data systematically through scholar.google.com.


 

 

1        Introduction

2        The CHILDES Project

2.1     Impressionistic Observation

2.2     Baby Biographies

2.3     Transcripts

2.4     Computers

2.5     Connectivity

3        From CHILDES to TalkBank

3.1     Three Tools

3.2     Shaping CHAT

3.3     Building CLAN

3.4     Constructing the Database

3.5     Dissemination

3.6     Funding

3.7     How to Use These Manuals

3.8     Changes

4        Principles

4.1     Computerization

4.2     Words of Caution

4.2.1      The Dominance of the Written Word

4.2.2      The Misuse of Standard Punctuation

4.2.3      Working With Video

4.3     Problems With Forced Decisions

4.4     Transcription and Coding

4.5     Three Goals

5        minCHAT

5.1     minCHAT – the Form of Files

5.2     minCHAT – Words and Utterances

5.3     Analyzing One Small File

5.4     Next Steps

5.5     Checking Syntactic Accuracy

6        Corpus Organization

6.1     File Naming

6.2     Metadata

6.3     The Documentation File

7        File Headers

7.1     Hidden Headers

7.2     Initial Headers

7.3     Participant-Specific Headers

7.4     Constant Headers

7.5     Changeable Headers

8        Words

8.1     The Main Line

8.2     Basic Words

8.3     Special Form Markers

8.4     Unidentifiable Material

8.5     Incomplete and Omitted Words

8.6     De-Identification, Anonymization, and Pseudonyms

8.7     Standardized Spellings

8.7.1      Letters

8.7.2      Compounds and Linkages

8.7.3      Capitalization and Acronyms

8.7.4      Numbers and Titles

8.7.5      Kinship Forms

8.7.6      Shortenings

8.7.7      Assimilations and Cliticizations

8.7.8      Communicators and Interjections

8.7.9      Spelling Variants

8.7.10    Colloquial Forms

8.7.11    Dialectal Variations

8.7.12    Baby Talk

8.7.13    Word separation in Japanese

8.7.14    Abbreviations in Dutch

9        Utterances

9.1     One Utterance or Many?

9.2     Satellite Markers

9.3     Discourse Repetition

9.4     C-Units, sentences, utterances, and run-ons

9.5     Retracing

9.6     Basic Utterance Terminators

9.7     Separators

9.8     Tone Direction

9.9     Prosody Within Words

9.10       Local Events

9.10.1    Simple Events

9.10.2    Interposed Word &*

9.10.3    Complex Local Events

9.10.4    Pauses

9.10.5    Long Events

9.11       Special Utterance Terminators

9.12       Utterance Linkers

10      Scoped Symbols

10.1       Audio and Video Time Marks

10.2       Paralinguistic and Duration Scoping

10.3       Explanations and Alternatives

10.4       Retracing, Overlap, and Clauses

10.5       Error Marking

10.6       Precodes and Postcodes

11      Dependent Tiers

11.1       Standard Dependent Tiers

11.2       Synchrony Relations

12      CHAT-CA Transcription

13      Disfluency Transcription

14      Transcribing Aphasic Language

15      Arabic and Hebrew Transcription

16      Specific Applications

16.1       Code-Switching

16.2       Elicited Narratives and Picture Descriptions

16.3       Written Language

16.4       Sign and Speech

17      Speech Act Codes

17.1       Interchange Types

17.2       Illocutionary Force Codes

18      Error Coding

18.1       Word level error codes