Through the lens Ex Machina: using NLP and statistical learning methods to model eyewitness statements and choosing behavior
University of Cape Town
Project type: Open call project
Start Date: 01-11-2018
End Date: 31-10-2020
Project Objective:
The primary aim of this project is to develop and trial a new way of analysing eyewitness statements and descriptions in order to predict eyewitness identification performance. This has not been done before with natural language processing or machine learning methods, and it could solve the current difficulty of analysing large quantities of verbal data.
A secondary aim is that the modelling of witness statements may provide usable predictions of whether an eyewitness i) will be able to make an identification; ii) will be reliable; and iii) has a strong memory of the event in question. Such a model might well provide more useful and objective information about meta-memory than subjective measures such as self-reported confidence and willingness to testify.
A third aim is to develop a large, publicly available archive of witness statements. To achieve this, we will curate our 2000 witness statements in an online, open-access database. The database will have a sub-section for witness statements from criminal cases in South Africa, for which we are presently able to offer potentially 108 witness statements about real crimes.
A fourth aim is to test the proposition that statements taken by an interviewer and statements given directly by witnesses are distinguishable in the language models we build, and are differentially useful. This will require an experimental programme of research with at least one study, and probably two or three.
Impact of the project in the long run
This project advances the use of natural language processing techniques in the applied field of eyewitness memory, and may provide a solution to working with qualitative data collected in an experimental setting.
Additionally, this project has resulted in a large curated archive of eyewitness data collected in South Africa and internationally; an archive such as this is extremely valuable to researchers. A second curated archive has also been constructed, consisting of 108 eyewitness statements collected from real eyewitnesses in South Africa. Archives of this kind are rare and of great value.
Finally, teaching materials on the topic of NLP were developed for use in a Masters-level Statistics course.
Scope of the project:
NLP techniques for assessing eyewitness memory can be used in any language and any country, as long as the materials used are appropriate for the language. For example, the libraries that we used were specifically constructed for English; if an equivalent French library exists, then the same techniques can be applied to French data.
Research and development opportunities of the project
This project has resulted in four smaller projects:
1) a project that aims to use NLP techniques to analyse eyewitness statements and descriptions;
2) a Masters-level project that is investigating methods of building composites from descriptions; and
3) two Honours-level projects that are investigating different methods of eliciting descriptions and statements from witnesses.
Achievements
With the assistance of SADiLaR, we have achieved the following:
- 3 x presented papers at international conferences (American Psychology-Law Society 2020; British Psychological Society Cognitive Section Conference 2020)
- 1 x accepted paper to the International Congress of Psychology 2020 in Prague (the conference has been moved to 2021 due to COVID)
- 4 x accepted papers in peer-reviewed journals (local and international)
- 3 x papers in progress