The recent worldwide audiovisual content boom has led to an ever-increasing demand for such content to be made accessible in real time, in different languages, and for a wide audience in a variety of settings, including television, conferences and other live events (e.g. lectures, museum tours, business meetings, medical appointments). The SMART project (Shaping Multilingual Access through Respeaking Technology), funded by the ESRC UK (Economic and Social Research Council UK, ES/T002530/1, 2020-2022), addresses the urgent challenge of delivering high-quality real-time speech-to-text services across languages by exploring a new practice, interlingual respeaking. SMART will investigate how this technique can be fine-tuned to best produce subtitles in a different language, and the impact this may have on society.
What is speech-to-text via respeaking?
Speech-to-text via respeaking is an innovative speech-recognition based method that relies on human-machine interaction to provide subtitles in real time. Respeaking can take two forms: intralingual and interlingual. In both cases, a professional, the respeaker, uses speech recognition software to produce the written text. Respeakers are also known as live subtitlers, speech-to-text interpreters, transpeakers or voice-writers.
Intralingual respeaking: the respeaker converts spoken content into written output (subtitles, captions or scrolling text) in the same language. This has become standard practice for many broadcasters in the UK and around the world. It makes content accessible to a wide-ranging audience who need access to audio for different reasons. It is likely that you have used intralingual live subtitles yourself, as they are often displayed on television in places such as waiting rooms, pubs and airports.
Interlingual respeaking: the respeaker converts spoken content into written output in a different language. It is a far newer technique that combines human simultaneous interpreting and live subtitling skills with speech recognition technology to rapidly produce text on screen. The professional listens to the live spoken input and simultaneously translates it into a target language, voicing it to speech recognition software, which then converts the spoken utterances into written text as shown below.
Here the words of the Spanish speaker are converted into written English subtitles in real time.
As the presenter gives their presentation (step 1), the respeaker listens and simultaneously translates it into a target language, and voices it along with punctuation for the speech recognition software (step 2). The speech recognition software converts the spoken input into written text and this appears on screen as quickly as possible (step 3).
What does SMART aim to do?
1. Map out interlingual respeaking as an emerging field and practice
- Where can we position interlingual respeaking within the interlingual speech-to-text landscape?
- When and where are speech-to-text practices currently used?
- How can technology be used to ease interlingual respeaking challenges?
Many different practices exist, ranging from human-centred to semi- and fully-automated workflows. We want to identify the different communicative settings in which interlingual respeaking might be used, and the possible configurations and set-ups that might be needed. As part of this, we will be exploring the different terminology used because, as you can see here, there is a great deal of variation.
2. Explore the human variables at play as interlingual respeaking is learned and used professionally
- How do different combinations of variables affect skill development and performance? These may be cognitive, interpersonal and procedural.
- What challenges arise as interlingual respeaking is learned and used professionally? How are they dealt with?
- How can we refine existing competence models for interlingual respeaking on the basis of practice we see during the upskilling course?
3. Measure the accuracy of interlingual respeaking by language professionals
- How well can interlingual respeaking be done?
- Which variables lead to high-accuracy performance?
- What effect does the duration of a programme or event have on performance?
- What else affects quality in interlingual respeaking?
We are developing an upskilling course to assess how language professionals cope with different respeaking tasks as they learn to respeak interlingually. We expect the personality traits, cognitive differences and expertise of each person to have an impact on how they approach tasks and perform in the course. This is the first study that will collect this range of data and insights about how language professionals from different walks of life perform in interlingual respeaking. We will be using mixed methods to fully explore the process of respeaking and the written output displayed to the audience.
4. Optimise upskilling
- What are the strengths and weaknesses of our upskilling course for different language professionals?
- Which skills and techniques to overcome the challenges of interlingual respeaking can be taught? Which ones rely on the respeaker’s existing personality traits, cognitive abilities and expertise?
- How can we optimise upskilling for language professionals to ensure high-quality interlingual respeaking?
We hope that our findings from the upskilling course and study will ultimately help identify how interlingual respeaking is learned and performed. These findings will be used to refine the upskilling course and train language professionals to help fulfil industry needs in a more timely and efficient manner.
Why is SMART important?
SMART explores a practice that has the potential to address important accessibility challenges in a variety of contexts. While this practice is already being used, there is still more to learn. What settings and set-up lead to the best respeaking practice? What is it like for language professionals to begin working as interlingual respeakers? How well does interlingual respeaking meet the access needs of those who use this service?
This project will collect extensive evidence to explore the first two questions in depth. We will be exploring what features may lead to an increase or decrease in the quality of the respoken text, and how language professionals from different walks of life respond to specialised training in this complex practice. Our findings will help us better characterise the skill set necessary to perform interlingual respeaking, including interpersonal traits and cognitive abilities that can best support this practice.
Analysing the process in so much depth and sharing findings with a variety of stakeholders will hopefully contribute to improving the recognition, status and working conditions of interlingual respeakers. The “Advanced Introduction to Interlingual Respeaking” course delivered in the first part of the project will enable language professionals to take an active role in shaping this practice.
Our findings will lead to a refined version of the course being offered in the future. An intensive Summer School will be organised in the final part of the project, alongside a Public Engagement event that will help us gauge audience satisfaction in a live setting (2022). Stay tuned: subscribe to our mailing list to keep abreast of developments, and follow us on Twitter!
Who will benefit from SMART?
As our use of digital media from around the world continues to increase, it is vital that audiovisual content can be enjoyed by a wide audience. Intralingual respeaking became a standard practice for broadcasters worldwide in response to legal requirements to produce live subtitles for people who are deaf or hard of hearing. Interlingual respeaking, with a language transfer, can revolutionise the way subtitles are produced for live foreign-language content, making it accessible across sensory and language barriers.
Interlingual respeaking makes content accessible to a wide-ranging audience of all ages who need access to audio, including people with sensory, learning or cognitive disabilities who would benefit from a live transcript, people who are viewing content in public locations where the audio is muted, and people who are not native speakers of the language being spoken. It can be used in a variety of settings, including television, conferences, education and other live events.
This reflects the inclusivity of respeaking and how it can be used to make information, content and entertainment more accessible.