Abstract
In corpus-based interpreting studies, two persistent challenges are the time-consuming, labour-intensive transcription of spoken data and the identification of prosodic properties. This paper addresses these challenges by exploring methods for the automatic compilation of multimodal interpreting corpora, with a focus on English/Chinese consecutive interpreting. The results show that: 1) automatic transcription can achieve an accuracy rate of 95.3% in transcribing consecutive interpretations; 2) prosodic properties related to filled pauses, unfilled pauses, articulation rate, and mispronounced words can be automatically extracted using our rule-based programming; 3) mispronounced words can be effectively identified by employing a Confidence Measure, with any word whose Confidence Measure is lower than 0.321 treated as mispronounced; 4) automatic alignment can be achieved through automatic segmentation, sentence embedding, and alignment techniques. This study contributes to interpreting studies by broadening the empirical understanding of orality, enabling multimodal analyses of interpreting products, and providing a new methodological solution for the construction and use of multimodal interpreting corpora. It also has implications for exploring the applicability of new technologies in interpreting studies.
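The mispronunciation rule reported above (finding 3) can be sketched as a simple threshold check over per-word recogniser confidence scores. This is an illustrative sketch, not the authors' code: the function name, data format, and the example word/score pairs are hypothetical; only the 0.321 cut-off comes from the abstract.

```python
# Illustrative sketch of Confidence-Measure thresholding for
# mispronounced-word detection. The 0.321 cut-off is the value
# reported in the study; everything else here is assumed.

CONFIDENCE_THRESHOLD = 0.321  # words below this are flagged as mispronounced


def flag_mispronounced(word_scores):
    """Return the words whose ASR confidence falls below the threshold.

    word_scores: list of (word, confidence) pairs, confidence in [0, 1].
    """
    return [word for word, score in word_scores if score < CONFIDENCE_THRESHOLD]


# Hypothetical per-word confidence scores from an ASR system:
example = [("interpreting", 0.91), ("corpora", 0.28), ("prosodic", 0.45)]
print(flag_mispronounced(example))  # ['corpora']
```

In practice the confidence scores would come from the ASR system's word-level output; the threshold check itself is the only part the abstract specifies.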
| Original language | English |
|---|---|
| Pages (from-to) | 48-70 |
| Number of pages | 23 |
| Journal | Across Languages and Cultures |
| Volume | 25 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 10 Jun 2024 |
Keywords
- multimodal interpreting corpus
- multi-layer model
- automatic extraction of paralinguistic features
- disfluency
- mispronounced words
- automatic alignment