2 February 1100

AP19678995 – Development of a speaker recognition method using deep neural networks with ultra-short duration of pure speech

Objective of the project: to investigate whether deep neural networks can be implemented and trained to identify speakers from ultrashort phrases in cases where standard statistical methods fail.

Relevance: The project investigates how effectively deep neural networks can be used to build voice identification systems that work with ultrashort phrases, whose duration of pure speech does not exceed a few seconds. Current speaker recognition methods mainly build a statistical model of the speaker's voice, using Gaussian mixture models (GMMs), i-vectors, and similar techniques. In practice, however, a person often has to be identified from only a few short phrases, and it is virtually impossible to build a statistical digital voice model from such an ultrashort utterance. The problem, therefore, is to create a speaker voice model that does not require long utterances (more than 15 seconds of pure speech). Accordingly, the project aims to research and develop algorithms for building a human voice model in cases where traditional statistical methods are not applicable.
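As a rough illustration of the data-scarcity argument, the sketch below counts the analysis frames available in an utterance and the number of free parameters in one diagonal-covariance GMM. The 16 kHz sampling rate, 25 ms window, 10 ms hop, 512 mixture components, and 13 MFCCs per frame are assumed typical values, not parameters taken from the project.

```python
def n_frames(duration_s, sr=16000, frame_len=0.025, hop=0.010):
    # Number of complete analysis frames (assumed 25 ms window, 10 ms hop)
    flen, fhop = int(round(frame_len * sr)), int(round(hop * sr))
    n_samples = int(round(duration_s * sr))
    return 1 + max(0, (n_samples - flen) // fhop)


def diag_gmm_params(n_components, dim):
    # Per component: dim means + dim diagonal variances;
    # the weights sum to 1, so there are n_components - 1 free weights
    return n_components * 2 * dim + (n_components - 1)


K, D = 512, 13  # assumed: a common UBM size and 13 MFCCs per frame
print("free parameters in one model:", diag_gmm_params(K, D))
for seconds in (2.0, 15.0):
    frames = n_frames(seconds)
    print(f"{seconds:4.1f} s of pure speech -> {frames} frames ({frames / K:.2f} per Gaussian)")
```

Under these assumptions a 2 s utterance yields 198 frames, fewer than the 512 Gaussians in the model, while 15 s yields 1,498 frames, roughly 7.5 times more data for the same 13,823 parameters.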

Scientific adviser: Akhmediyarova Ainur Tanatarovna, PhD, Professor

The results obtained: Within the project, existing speech recognition and speaker identification methods were analyzed in depth, and a digital voice model based on neural network architectures was developed. CNN, RNN, and BiLSTM models operating at the phoneme level were designed, trained, and evaluated experimentally. Algorithms for audio processing, Mel-frequency cepstral coefficient (MFCC) extraction, and embedding visualization were implemented. The resulting model identifies a speaker with an accuracy of about 97% from short speech segments of 2–3 seconds. It also proved robust to noise, and its embeddings clustered effectively by speaker. The project outcomes are supported by scientific publications and are ready for practical application.
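The processing chain listed above (MFCC extraction, an embedding model, and speaker identification) can be sketched in NumPy as follows. This is a minimal illustration, not the project's implementation: the 16 kHz sampling rate, 25 ms Hamming frames with a 10 ms hop, 26 mel filters, 13 coefficients, and 0.97 pre-emphasis are common textbook defaults assumed here. The trained CNN, RNN, or BiLSTM network is replaced by a hypothetical `utterance_embedding` that only pools MFCC means and standard deviations; `identify` and `toy_voice` are likewise illustrative names, and the synthetic harmonic signals stand in for real speech.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to sr / 2
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    # Orthonormal DCT-II along the last axis, keeping the first n_out coefficients
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    out = x @ np.cos(np.pi * k * (2 * m + 1) / (2 * n)).T
    out[..., 0] *= np.sqrt(1.0 / n)
    out[..., 1:] *= np.sqrt(2.0 / n)
    return out


def mfcc(signal, sr=16000, frame_len=0.025, hop=0.010, n_fft=512, n_filters=26, n_mfcc=13):
    # Pre-emphasis, then Hamming-windowed frames (assumed defaults above)
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    flen, fhop = int(round(frame_len * sr)), int(round(hop * sr))
    n_frames = 1 + max(0, (len(x) - flen) // fhop)
    x = np.append(x, np.zeros(max(0, (n_frames - 1) * fhop + flen - len(x))))
    idx = np.arange(flen)[None, :] + fhop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(flen)
    # Power spectrum -> mel filterbank energies -> log -> DCT
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.log(np.maximum(energies, 1e-10)), n_mfcc)


def utterance_embedding(features):
    # HYPOTHETICAL placeholder for the trained CNN/RNN/BiLSTM embedding:
    # per-coefficient mean and standard deviation over time (2 * n_mfcc values)
    return np.concatenate([features.mean(axis=0), features.std(axis=0)])


def identify(test_emb, enrolled):
    # Cosine similarity against each enrolled speaker; the highest score wins
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    scores = {spk: cosine(test_emb, emb) for spk, emb in enrolled.items()}
    return max(scores, key=scores.get), scores


# Usage with synthetic signals (HYPOTHETICAL toy data, not speech recordings)
sr = 16000
t = np.arange(2 * sr) / sr  # 2 s of audio, within the 2-3 s range reported above
rng = np.random.default_rng(0)


def toy_voice(f0):
    # Five harmonics of a fundamental frequency f0 plus light white noise
    tone = sum(np.sin(2 * np.pi * h * f0 * t) / h for h in range(1, 6))
    return tone + 0.01 * rng.standard_normal(t.size)


enrolled = {
    "speaker_A": utterance_embedding(mfcc(toy_voice(120.0), sr)),
    "speaker_B": utterance_embedding(mfcc(toy_voice(220.0), sr)),
}
best, scores = identify(utterance_embedding(mfcc(toy_voice(120.0), sr)), enrolled)
print(best, scores)
```

Because identification only compares fixed-length vectors, the pooled-MFCC placeholder could be swapped for a learned network embedding without changing `identify`; cosine scoring is a common back end for neural speaker embeddings.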

List of publications (with links)

Akhmediyarova A.T., Olzhabayeva A.B., Namazbayev T.A. Development of a speech recognition mobile application for people with disabilities (in Russian) // Eurasian Union of Scientists. Series: Technical and Physical-Mathematical Sciences. – 2023. – No. 10 (113), Vol. 1. – pp. 3–16. – URL: https://fizmat-tech.euroasia-science.ru/index.php/Euroasia/issue/view/147
Nurlankyzy A., Akhmediyarova A., Zhetpisbayeva A., Namazbayev T., Yskak A., Yerzhan N., Medetov B. The dependence of the effectiveness of neural networks for recognizing human voice on language // Eastern-European Journal of Enterprise Technologies. – 2024. – No. 1 (9 (127)). – pp. 72–81. – DOI: 10.15587/1729-4061.2024.298687
Akhmediyarova A.T., Nurlankyzy A., Kulakayeva A.E., Medetov B.Zh. Analysis of the effectiveness of neural networks for human voice recognition (in Russian) // Bulletin of Almaty University of Power Engineering and Telecommunications. – 2024. – No. 1 (64). – pp. 37–46. – DOI: 10.51775/2790-0886_2024_64_1_37
Medetov B., Nurlankyzy A., Kulakayeva A., Zhetpisbayeva A., Namazbayev T. Assessing the influence of language on the accuracy of human voice recognition using artificial neural networks (in Russian) // Bulletin of KazATC. – 2024. – No. 2 (131). – pp. 456–466. – DOI: 10.52167/1609-1817-2024-131-2-456-466
Akhmediyarova A.T., Zhetpisbayeva A.T., Kasymova D.T., Uristimbek G.K. Application of speech recognition technologies (in Kazakh) // Proceedings of the International Scientific and Practical Conference AIACIT-2024 "Achievements and Application of Artificial Intelligence in Automation, Control and Information Technologies". – Almaty, 2024. – Vol. 2. – pp. 108–117.
Medetov B., Zhetpisbayeva A., Akhmediyarova A., Nurlankyzy A., Namazbayev T., Kulakayeva A., Albanbay N., Turdalyuly M., Yskak A., Uristimbek G. Evaluating the effectiveness of a voice activity detector based on various neural networks // Eastern-European Journal of Enterprise Technologies. – 2025. – No. 1 (5 (133)). – pp. 19–28. – DOI: 10.15587/1729-4061.2025.321659
Medetov B., Nurlankyzy A., Namazbayev T., Akhmediyarova A., Zhetpisbayev K., Zhetpisbayeva A., Kargulova A. Speaker recognition by ultrashort utterances // Eastern-European Journal of Enterprise Technologies. – 2025. – No. 2 (9 (134)). – pp. 62–69. – DOI: 10.15587/1729-4061.2025.327907
Akhmediyarova A.T., Alibiyeva Zh.M., Mukazhanov N.K. Speech segmentation in speaker identification (in Kazakh) // Herald of the Kazakh-British Technical University. – 2025. – No. 2 (73). – pp. 10–23. – URL: https://vestnik.kbtu.edu.kz/jour/article/view/1983
Uristimbek G.K., Akhmediyarova A.T. Modern approaches to non-standard speech recognition: a review of models and prospects (in Russian) // Proceedings of the International Scientific and Practical Conference "Digital Transformation: The Role of Artificial Intelligence and Big Data in Law, Governance and International Relations". – Almaty, 2025. – pp. 295–302.
