AP19678995 – Development of a speaker recognition method using deep neural networks with ultra-short duration of pure speech
Objective of the project: To investigate whether deep neural networks can be implemented and trained to identify speakers from ultrashort phrases, where standard statistical methods fail.
Relevance: The project investigates how effectively deep neural networks can be used to build voice identification systems that work from ultrashort phrases containing no more than a few seconds of pure speech. This research is relevant because current speaker recognition methods mainly build a statistical model of the speaker's voice, using Gaussian mixture models (GMMs), i-vectors, and similar techniques. In practice, however, a person often has to be identified from short phrases alone, and it is virtually impossible to build a reliable statistical voice model from such an ultrashort utterance. This raises the problem of creating a speaker voice model that does not require long utterances (more than 15 seconds of pure speech). Accordingly, the project set out to research and develop algorithms for building a model of a human voice in cases where traditional statistical methods are not applicable.
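To make the data shortage concrete, the following back-of-the-envelope sketch counts how many feature frames a short utterance yields and compares that with the number of free parameters in a diagonal-covariance GMM. All settings are assumptions chosen as common defaults, not values taken from this project: 16 kHz audio, a 25 ms window with a 10 ms shift, 13 MFCCs per frame, and 64 mixture components.

```python
# Rough illustration (assumed settings, not the project's): how much data a
# short utterance provides for fitting a diagonal-covariance GMM voice model.

SAMPLE_RATE = 16_000  # assumed sampling rate, Hz
WIN = 400             # assumed 25 ms analysis window at 16 kHz, in samples
HOP = 160             # assumed 10 ms frame shift at 16 kHz, in samples
N_MFCC = 13           # assumed feature dimension D
N_COMPONENTS = 64     # assumed number of Gaussians M


def num_frames(duration_s: float) -> int:
    """Number of full analysis frames in an utterance of the given duration."""
    n_samples = int(round(duration_s * SAMPLE_RATE))
    if n_samples < WIN:
        return 0
    return 1 + (n_samples - WIN) // HOP


def diag_gmm_params(m: int, d: int) -> int:
    """Free parameters of a diagonal GMM: M*D means, M*D variances, M-1 weights."""
    return m * d + m * d + (m - 1)


if __name__ == "__main__":
    params = diag_gmm_params(N_COMPONENTS, N_MFCC)
    for seconds in (2, 3, 15):
        frames = num_frames(seconds)
        print(f"{seconds:>2} s: {frames} frames, "
              f"{frames / N_COMPONENTS:.1f} frames per Gaussian, "
              f"{frames * N_MFCC / params:.1f} feature values per parameter")
```

Under these assumptions a 2 s utterance gives 198 frames, about 3.1 frames per Gaussian, while 15 s gives 1498 frames, about 23.4 per Gaussian. Because neighboring frames overlap, they are strongly correlated, so the effective amount of data is lower still. Practical GMM systems usually adapt a pre-trained universal background model instead of training from scratch, which eases but does not remove this shortage.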
Scientific adviser: Akhmediyarova Ainur Tanatarovna, PhD, Professor
The results obtained: Within the project, existing speech recognition and speaker identification methods were analyzed in depth, and a digital voice model based on neural network architectures was developed. CNN, RNN, and BiLSTM models operating at the phoneme level were designed, trained, and evaluated experimentally. Algorithms were implemented for audio processing, extraction of Mel-frequency cepstral coefficients (MFCCs), and visualization of embeddings. The resulting model identifies a speaker with an accuracy of about 97% from speech segments only 2–3 seconds long. It proved robust to noise and clustered speaker-specific features effectively. The outcomes are reported in the scientific publications listed below and are ready for practical application.
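The MFCC extraction step mentioned above can be illustrated with a generic textbook pipeline: pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, logarithm, and DCT. This is a minimal NumPy sketch under assumed parameters (16 kHz input, 25 ms frames, 10 ms shift, 512-point FFT, 26 filters, 13 coefficients); it is not the project's own implementation, whose settings are not given here.

```python
import numpy as np

# Generic MFCC sketch; every default below is an assumption, not the project's setting.


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # shape (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sr=16000, n_mfcc=13, n_filters=26, win=400, hop=160, n_fft=512):
    """Return an (n_frames, n_mfcc) matrix of MFCCs for a mono signal."""
    if len(signal) < win:
        raise ValueError("signal is shorter than one analysis window")
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(emphasized) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(win)
    # Power spectrum of each zero-padded frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(energies + 1e-10)  # epsilon avoids log(0)
    return dct_ii(log_energies, n_mfcc)
```

For a 2 s signal at 16 kHz this yields a 198 × 13 matrix, one row per frame; a CNN, RNN, or BiLSTM model of the kind described above could take such a matrix as its input sequence. Production code would more typically rely on an established library such as librosa (`librosa.feature.mfcc`).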
List of publications (with links)
- Akhmediyarova A.T., Olzhabayeva A.B., Namazbayev T.A. Development of a speech recognition mobile application for people with disabilities // Eurasian Union of Scientists. Series: Technical and Physical-Mathematical Sciences. – 2023. – No. 10 (113), Vol. 1. – P. 3–16. – URL: https://fizmat-tech.euroasia-science.ru/index.php/Euroasia/issue/view/147
- Nurlankyzy A., Akhmediyarova A., Zhetpisbayeva A., Namazbayev T., Yskak A., Yerzhan N., Medetov B. The dependence of the effectiveness of neural networks for recognizing human voice on language // Eastern-European Journal of Enterprise Technologies. – 2024. – No. 1 (9 (127)). – P. 72–81. – DOI: 10.15587/1729-4061.2024.298687
- Akhmediyarova A.T., Nurlankyzy A., Kulakayeva A.E., Medetov B.Zh. Analysis of the effectiveness of neural networks in recognizing the human voice // Bulletin of the Almaty University of Power Engineering and Telecommunications. – 2024. – No. 1 (64). – P. 37–46. – DOI: 10.51775/2790-0886_2024_64_1_37
- Medetov B., Nurlankyzy A., Kulakayeva A., Zhetpisbayeva A., Namazbayev T. Assessing the influence of language on the accuracy of human voice recognition using artificial neural networks // Bulletin of KazATC. – 2024. – No. 2 (131). – P. 456–466. – DOI: 10.52167/1609-1817-2024-131-2-456-466
- Akhmediyarova A.T., Zhetpisbayeva A.T., Kasymova D.T., Uristimbek G.K. Application of speech recognition technologies // Proceedings of the International Scientific and Practical Conference AIACIT-2024 "Achievements and Applications of Artificial Intelligence in Automation, Control and Information Technologies". – Almaty, 2024. – Vol. 2. – P. 108–117.
- Medetov B., Zhetpisbayeva A., Akhmediyarova A., Nurlankyzy A., Namazbayev T., Kulakayeva A., Albanbay N., Turdalyuly M., Yskak A., Uristimbek G. Evaluating the effectiveness of a voice activity detector based on various neural networks // Eastern-European Journal of Enterprise Technologies. – 2025. – No. 1 (5(133)). – P. 19–28. – DOI: 10.15587/1729-4061.2025.321659
- Medetov B., Nurlankyzy A., Namazbayev T., Akhmediyarova A., Zhetpisbayev K., Zhetpisbayeva A., Kargulova A. Speaker recognition by ultrashort utterances // Eastern-European Journal of Enterprise Technologies. – 2025. – No. 2 (9(134)). – P. 62–69. – DOI: 10.15587/1729-4061.2025.327907
- Akhmediyarova A.T., Alibiyeva Zh.M., Mukazhanov N.K. Speech segmentation in speaker identification // Herald of the Kazakh-British Technical University. – 2025. – No. 2 (73). – P. 10–23. – URL: https://vestnik.kbtu.edu.kz/jour/article/view/1983
- Uristimbek G.K., Akhmediyarova A.T. Modern approaches to the recognition of non-standard speech: a review of models and prospects // Proceedings of the International Scientific and Practical Conference "Digital Transformation: The Role of Artificial Intelligence and Big Data in Law, Management and International Relations". – Almaty, 2025. – P. 295–302.