Abstract

Large Scale Speech Recognition for Low Resource Language Amharic, an End-to-End Approach

Yohannes Ayana Ejigu* and Tesfa Tegegne Asfaw

Speech recognition, or Automatic Speech Recognition (ASR), is a technology that converts spoken language into text using software. Conventional ASR methods, however, involve several distinct components, including language, acoustic, and pronunciation models built with dictionaries. This modular approach is time-consuming to build and can limit performance. In this study, we propose a method that streamlines the speech recognition pipeline into a single end-to-end architecture. Our model combines a Convolutional Neural Network (CNN) front end with a Recurrent Neural Network (RNN) and is trained with the Connectionist Temporal Classification (CTC) loss function.

Key experiments were carried out on a dataset of 576,656 valid sentences, using erosion techniques. Model performance, measured by the Word Error Rate (WER) metric, reached a WER of 2%. This approach has significant implications for speech recognition: it eliminates the need for labor-intensive dictionary creation, enhances the efficiency and accuracy of ASR systems, and makes them more applicable to real-world scenarios.

For future work, we recommend including dialectal and spontaneous speech in the dataset to broaden the model's adaptability. Additionally, fine-tuning the model for specific tasks can optimize its performance for targeted objectives or domains.

Published Date: 2024-03-21; Received Date: 2024-02-15