TURNING WHISPER INTO REAL-TIME TRANSCRIPTION SYSTEM

IEEE

Abstract:

Whisper is one of the recent state-of-the-art multilingual speech recognition and translation models; however, it is not designed for real-time transcription. In this paper, we build on top of Whisper and create Whisper-Streaming, an implementation of real-time speech transcription and translation for Whisper-like models. Whisper-Streaming uses a local agreement policy with self-adaptive latency to enable streaming transcription. We show that Whisper-Streaming achieves high quality and 3.3 seconds of latency on an unsegmented long-form speech transcription test set, and we demonstrate its robustness and practical usability as a component in a live transcription service at a multilingual conference.
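The idea behind a local agreement policy can be illustrated with a short sketch. It assumes one simple form of the policy: the recognizer re-transcribes the growing audio buffer after each new chunk, and a word is committed (shown to the user as final) only once two consecutive hypotheses agree on it as part of a common prefix. The recognizer is replaced by a hard-coded list of hypotheses, and the names `LocalAgreement` and `longest_common_prefix` are hypothetical. This is not the Whisper-Streaming code, and it omits the buffer management behind the self-adaptive latency.

```python
from typing import List


def longest_common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Return the longest word-level prefix shared by two hypotheses."""
    prefix: List[str] = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


class LocalAgreement:
    """Hypothetical sketch: commit words once two consecutive hypotheses agree."""

    def __init__(self) -> None:
        self.committed: List[str] = []  # final output, never revised
        self.previous: List[str] = []   # last hypothesis seen

    def update(self, hypothesis: List[str]) -> List[str]:
        """Feed a new hypothesis for the whole buffer; return newly committed words."""
        # Simplifying assumption: every hypothesis re-emits the committed words
        # first, so only the part after them is compared.
        start = len(self.committed)
        agreed = longest_common_prefix(self.previous[start:], hypothesis[start:])
        self.committed.extend(agreed)
        self.previous = hypothesis
        return agreed


# Simulated recognizer output for a growing audio buffer (invented data).
simulated_hypotheses = [
    ["i", "scream"],
    ["ice", "cream", "is"],
    ["ice", "cream", "is", "cold"],
    ["ice", "cream", "is", "cold", "today"],
]

policy = LocalAgreement()
emitted = [policy.update(h) for h in simulated_hypotheses]
print(emitted)
print("committed:", " ".join(policy.committed))
```

The early misrecognition "i scream" is never committed, because the next hypothesis disagrees with it. The cost is that every word waits for one more update to confirm it, so "today" is still pending at the end.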

Introduction:

Whisper (Radford et al., 2022) is a recent state-of-the-art system for automatic speech recognition (ASR) in 97 languages and for translation from 96 languages into English. Whisper models are publicly available under the MIT license. However, the current public implementations of Whisper inference usually allow only offline processing of audio documents that are completely available at the time of processing, without any processing-time constraints.

Real-time streaming mode is useful in certain situations, e.g. for live captioning. It means that the source speech audio has to be processed while it is being recorded, and the transcripts or translations have to be delivered with a short additive latency, on the order of seconds. There are some implementations of Whisper for streaming, but their approach is rather naive: they first record a 30-second audio segment and then process it. The latency of these methods is large, and the quality of the segment boundaries is low, because simple content-unaware segmentation can split a word in the middle.
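The two drawbacks of the naive approach can be made concrete with a small calculation. The sketch assumes fixed 30-second segments and uses invented word timings; it lists the words that a segment boundary cuts in half and the minimum time each word waits before its segment is complete (processing time is ignored). The function names and data are hypothetical.

```python
import math
from typing import List, Tuple

CHUNK_SECONDS = 30.0  # fixed, content-unaware segment length

# Hypothetical word timings (word, start in s, end in s), invented for illustration.
WORDS: List[Tuple[str, float, float]] = [
    ("welcome", 28.6, 29.2),
    ("to", 29.3, 29.5),
    ("the", 29.6, 29.8),
    ("conference", 29.9, 30.6),
    ("today", 30.7, 31.1),
]


def words_split_by_chunks(words: List[Tuple[str, float, float]], chunk: float) -> List[str]:
    """Return the words whose time span contains a segment boundary k * chunk."""
    split = []
    for word, start, end in words:
        next_boundary = (start // chunk + 1) * chunk  # first boundary after start
        if next_boundary < end:
            split.append(word)
    return split


def naive_wait(end: float, chunk: float) -> float:
    """Seconds from the end of a word to the end of its segment, i.e. the
    minimum delay before the naive system can start processing that word."""
    segment_end = math.ceil(end / chunk) * chunk
    return segment_end - end


print("split words:", words_split_by_chunks(WORDS, CHUNK_SECONDS))
for word, _, end in WORDS:
    print(f"{word}: waits {naive_wait(end, CHUNK_SECONDS):.1f} s before processing")
```

Words near the start of a segment, such as "today", wait almost the full 30 seconds, so fixed-length segmentation yields large latency however fast the model itself runs.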
