This page explains the principles involved in securing IRB
permission for data sharing. If you already have IRB clearance and are
ready to contribute your data to TalkBank (CHILDES, AphasiaBank,
SLABank, etc.), you should follow these instructions
on how to actually submit your data.
1. IRB Applications
Contributors to TalkBank should obtain IRB approval
for the study, along with informed consent from individual
participants for data sharing. There are no standard forms for IRB applications, since
every university or institute creates its own forms, procedures, and
templates.
2. Informed Consent
You can select from a series of options for contribution to TalkBank,
as described in this OPTIONS summary.
Using less restrictive options will make the data more useful for research.
The crucial point is to ask participants to permit password-protected data
access for authorized researchers, because
all access to TalkBank data is in fact password protected. NIH refers to
this as "registered required" data access, as described in
this guide to data management and sharing from NICHD.
You should include on your form the fact
that participants always have the right to request that parts or all of
the data in which they participate be removed from TalkBank at any time.
3. Contributions of Archival Data
Often researchers will wish to contribute data collected in projects that have
already been completed. In such cases, it may be difficult or impossible to
contact participants to obtain a new consent form. However, IRBs are allowed
to permit including these data in TalkBank if certain conditions are met.
In particular, the original consent forms must not have contained exclusionary language such
as "These data will only be made available to Professor XYZ and her laboratory".
If the consent form says something like "These data will only be made available to
qualified researchers," then inclusion in TalkBank should be allowed, as long as
only qualified researchers are given the necessary password. If the consent form
is still more general, then passwords may not be necessary.
It is important to emphasize that granting agencies stipulate that data collected with federal funds
should be made available to researchers, as long as anonymity is preserved.
4. GDPR Compliance
The General Data Protection Regulation (GDPR) establishes rules for personal data
on the web. The EU web site for GDPR issues is https://gdpr-info.eu/.
With regard to TalkBank, there are five core GDPR issues.
The commercial purposes issue:
GDPR is designed to apply to data transferred for commercial purposes.
TalkBank has no commercial purposes. However, it could still apply if
TalkBank were to collect emails and addresses, which it does not do.
The scientific data issue:
A good summary of these issues can be found in
this Nature article, which notes that consent is given "to certain areas of
scientific research when in keeping with recognised ethical standards
for scientific research." Article 89 of the GDPR states: "Where
personal data are processed for scientific or historical research
purposes or statistical purposes, Union or Member State law may provide
for derogations from the rights referred to in Articles 15, 16, 18 and
21 subject to the conditions and safeguards referred to in paragraph 1
of this Article in so far as such rights are likely to render impossible
or seriously impair the achievement of the specific purposes, and such
derogations are necessary for the fulfilment of those purposes." In other words,
data-sharing is allowed for research purposes. In addition,
Recital 113 allows for transfers of data from a limited number of
data subjects for scientific purposes for an increase of knowledge.
The informed consent issue:
NIH IRB informed consent guidelines are in accord with the GDPR consent
rules. Given this, if participants give consent for making data available to
qualified researchers, then sharing the data should be approved. GDPR also emphasizes that
this consent must be revocable and that there should be methods for allowing
participants to revoke consent.
The deidentification issue:
If data are deidentified, then they are not personal data, are not
covered by GDPR, and should receive IRB exemption. Data are not
deidentified if they contain a first name plus surname, credit card number, telephone number,
address, or number plate. A first name alone is not identifying, unless it
is uncommon or the reference population is very small. Anonymization must
be irreversible. This means that contributors should destroy
participant names. This holds in both the EU and the USA. However, the GDPR
catch-22 here is that a link to the data needs to be maintained to allow
for data removal. The solution is to make the information
linking the data to a person available only to a third-party "honest broker". See
below for a discussion of identification based on voice samples.
The Code of Conduct issue:
Article 40 allows for development of a Code of Conduct to facilitate
data transfer to non-EU countries. If an institution prefers
to have identifiable media stored on servers in the EU, it is possible to
implement CORS (Cross-Origin Resource Sharing) so that a CHAT file at CMU can link to
media on a server in the EU. This is done by configuring the EU media server to allow access from https://*.talkbank.org.
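The origin check behind such a CORS setup can be sketched as follows. This is a minimal illustration, not TalkBank's actual server configuration: the function name cors_headers_for is hypothetical, the subdomain sla.talkbank.org is used only as an example origin, and the sketch assumes that the pattern https://*.talkbank.org means "any HTTPS subdomain of talkbank.org". Because the Access-Control-Allow-Origin header accepts only a single origin or a bare "*", a media server typically has to compare the request's Origin header against the pattern itself and echo back the exact origin.

```python
from urllib.parse import urlsplit

# Assumption: https://*.talkbank.org is read as "any HTTPS subdomain of talkbank.org".
ALLOWED_PARENT_DOMAIN = "talkbank.org"


def cors_headers_for(origin):
    """Return the CORS response headers for a request's Origin header value.

    An empty dict means the origin is not allowed, so no CORS headers are sent
    and the browser will block the cross-origin read.
    """
    if not origin:
        return {}
    try:
        parts = urlsplit(origin)
    except ValueError:
        # Malformed origin values are simply refused.
        return {}
    host = parts.hostname or ""
    if parts.scheme != "https" or not host.endswith("." + ALLOWED_PARENT_DOMAIN):
        return {}
    # Echo the exact origin back; Vary tells caches the response depends on Origin.
    return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}


print(cors_headers_for("https://sla.talkbank.org"))
print(cors_headers_for("https://example.com"))
```

Matching on the leading dot (".talkbank.org") rather than on the bare suffix keeps look-alike hosts such as eviltalkbank.org from passing the check.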
5. Methods for Deidentification
If a transcript contains last names, these can be replaced
with the word "Lastname" with a capital L. Similarly,
addresses or local city names should be replaced with "Addressname" with a
capital A. Other forms include "Cityname", "Schoolname", "Hospitalname" and so on.
These same English words should be used even in transcripts in other
languages. It is not crucial to replace children's first names unless
they are highly unusual.
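For a corpus with many transcripts, the replacements described above can be scripted. The following is a minimal sketch, assuming you have already compiled a list of the identifying strings that occur in your own data. The names in REPLACEMENTS and the function name deidentify are hypothetical, and the sketch does plain whole-word text substitution only. It does not detect names on its own, so the output still needs to be checked by hand.

```python
import re

# Hypothetical lookup table for one corpus: each identifying string maps to
# one of the placeholder words described above.
REPLACEMENTS = {
    "Smith": "Lastname",
    "Pittsburgh": "Cityname",
    "Lincoln Elementary": "Schoolname",
}


def deidentify(line, replacements=REPLACEMENTS):
    """Replace each listed identifying string in one line of transcript text."""
    # Try longer strings first so multi-word names win over their parts.
    for original in sorted(replacements, key=len, reverse=True):
        # \b keeps "Smith" from matching inside a longer word such as "Smithson".
        pattern = r"\b" + re.escape(original) + r"\b"
        line = re.sub(pattern, replacements[original], line)
    return line


print(deidentify("Mrs Smith teaches at Lincoln Elementary in Pittsburgh."))
# prints: Mrs Lastname teaches at Schoolname in Cityname.
```

The same English placeholder words are used regardless of the language of the transcript, so one table format works across corpora.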
Deidentification of names and addresses in audio files linked to transcripts can be done through silencing of the relevant audio segment using Amadeus Pro or Audacity. It is more difficult to deidentify video. Therefore, video can only be made available with explicit informed consent or through the higher level of password protection (committee approval).
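If you prefer to script the silencing step rather than use an audio editor, the idea can be sketched with Python's standard wave module. This is an illustration under stated assumptions, not a replacement for the tools named above: it assumes uncompressed PCM WAV files with 16-bit or wider samples (where silence is all zero bytes), and the function name silence_segment and the file paths are hypothetical. You need to supply the start and end times of each identifying segment yourself.

```python
import wave


def silence_segment(src_path, dst_path, start_sec, end_sec):
    """Copy a PCM WAV file, replacing the span start_sec..end_sec with silence.

    Assumes 16-bit (or wider) signed PCM, where silence is all zero bytes.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth < 2:
            raise ValueError("8-bit WAV is unsigned; zero bytes are not silence")
        frames = bytearray(src.readframes(params.nframes))

    # One frame holds one sample for every channel.
    frame_size = params.sampwidth * params.nchannels
    total_frames = len(frames) // frame_size
    first = max(0, int(start_sec * params.framerate))
    last = min(total_frames, int(end_sec * params.framerate))
    if last > first:
        # Overwrite the selected frames with the same number of zero bytes.
        frames[first * frame_size:last * frame_size] = bytes((last - first) * frame_size)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(bytes(frames))
```

For example, silence_segment("session1.wav", "session1_deid.wav", 12.4, 13.1) would write a copy in which the span from 12.4 to 13.1 seconds is silent. Writing to a new file leaves the original recording untouched until you have checked the result.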
The EU Amnesia project at https://amnesia.openaire.eu provides software for
deidentification of spreadsheet data.
You can avoid much of this extra work if you avoid using identifying
information when making recordings.
Voiceprints
Researchers often ask whether they
need to request additional IRB approval for contributing audio data.
The concern is that audio data may be less confidential than transcript
data. However, as long as identifying material is removed from both
transcripts and audio, they do not present additional confidentiality
issues.
Some reviewers and IRB committees believe that spoken data are identifiable
through voice recognition technology. However, this judgment is based on a confusion
between closed-set identification and open-set identification. Closed-set
identification relies on a pre-existing pool of voiceprints from a given group,
such as members of a company or subscribers to a service. Open-set identification
does not rely on this pre-existing pool of voiceprints. As noted by Togneri and
Pullella (2011), "in open-set identification the unknown individual can come from the
general population. However as identification is always carried out
against a finite, known pool of individuals it is not possible to
identify arbitrary people."
Togneri, R., & Pullella, D. (2011). An overview of speaker
identification: Accuracy and robustness issues. IEEE Circuits and
Systems Magazine, 11(2), 23-61.
As Yuan and Liberman (2008) found, speaker identification even within
a closed group of Supreme Court justices in TalkBank's SCOTUS corpus is still
very difficult.
Yuan, J., & Liberman, M. (2008). Speaker identification on the SCOTUS
corpus. Journal of the Acoustical Society of America, 123(5), 3878.
6. Contributions to CHILDES and PhonBank
Although each university and project will have different requirements, contributors
often ask for a generic contribution template, so here is a
sample CHILDES/PhonBank consent form based
roughly on the local format at CMU.
7. Contributions to AphasiaBank/DementiaBank/TBIBank/RHDBank
Research with subjects with
disabilities requires additional access restriction, such as password
protection. It may also require more complete IRB documentation. In
this regard, researchers working with the AphasiaBank protocol will find
these additional IRB-approved materials useful:
Contributions to the other three clinical databanks (DementiaBank, RHDBank, and TBIBank)
can follow formats similar to those given above for AphasiaBank. The issues involved are generally
similar.
8. Contributions to FluencyBank
To protect subject confidentiality, all research
contributions to FluencyBank are restricted and require a password for
access. We suggest that new projects use a graduated consent form
developed at the University of Maryland that allows participants to
specify use of video, audio-only, or transcript-only in contributed
data.
When communicating with your IRB, you may find the suggestions in this
briefing sheet helpful.
For projects that are underway or recently completed, or for longitudinal
projects in which PIs would like to establish an ongoing relationship with
subjects before asking them to contribute, we have a
sample post-hoc consent form from the University of
Maryland.
For completed projects that have used video without permission to share the
video, we will work with
you to extract the audio tracks from your video files. (Please see the
Voiceprints discussion above for reasons why this may not require additional IRB
consideration.) Please contact Brian MacWhinney or Nan Bernstein Ratner
to determine how best to handle your data.