24 June

BR21882268 — Automated Construction of a Multilingual Ontology for Empowering the Kazakh Language through Advanced AI Technologies

Goal of the work: To develop tools for automatically populating an intelligent NLP information resource with content. This involves creating and implementing effective algorithms and technologies that process and analyze large volumes of multilingual text data in order to extract useful information and knowledge for a given subject area.

Relevance of the work: The developed methods and algorithms are sufficiently efficient and have been validated in the experiments conducted.

Scientific supervisor: PhD, Research Professor, Musabaev Rustam Rafikovich

Results obtained: Within the project, an intelligent NLP-based information resource was developed, including optimized search and navigation functionality. A methodology for annotating parallel texts was created, following modern computational linguistics standards for named entity extraction and language model training. A multilingual semantic dictionary was developed, along with approaches for building an interoperable domain-specific thesaurus. An algorithm for constructing a controlled multilingual thesaurus was implemented and integrated into the information system. A platform giving a wide range of users access to the annotated text corpora and reference materials was established. Scientific papers were published, a monograph was issued, and intellectual property certificates were obtained. The results are consistent with current advances in natural language processing and demonstrate strong practical relevance.

List of publications (with links)

  1. Mussabayev R., Mussabayev R. Strategies for Parallelizing the Big-Means Algorithm: A Comprehensive Tutorial for Effective Big Data Clustering // arXiv. – 2023. – arXiv:2311.04517v1. – P. 1–53. – https://arxiv.org/pdf/2311.04517v1.pdf.
  2. Mussabayev R., Mussabayev R., Kuldeyev N. Using Competitive Optimization and Stochastic Sampling in Big-means for Efficient Parallel K-means Clustering // Proc. 21st Int. Conf. "Mathematical Methods for Pattern Recognition" (MMRO-23). – Moscow, 2023. – P. 1–4.
  3. Kozbagarov O., Mussabayev R. Distributed random swap: An efficient algorithm for minimum sum-of-squares clustering // Information Sciences. – 2024. – Vol. 681. – Art. 121204. – https://doi.org/10.1016/j.ins.2024.121204.
  4. Mussabayev R., Mussabayev R. High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall Data // Mathematics. – 2024. – Vol. 12, No. 13. – Art. 1930. – https://doi.org/10.3390/math12131930.
  5. Baktibayev D., Baigozha B., Akhmetov I., Mussabayev R., Krassovitskiy A., Toleu A. Literature review on aftershock and earthquake prediction models aided by NLP summarization and ontology extraction techniques // Procedia Computer Science. – 2024. – Vol. 238. – P. 579–586. – https://doi.org/10.1016/j.procs.2024.06.064.
  6. Toleu A., Tolegen G., Mussabayev R. Topic Modeling with Variable Neighborhood Search // Communications in Computer and Information Science. – 2024. – Vol. 2166. – P. 234–246. – https://doi.org/10.1007/978-3-031-70259-4_18.
  7. Kozbagarov O., Mussabayev R., Krassovitskiy A., Kuldeyev N. Interpretable Dense Embedding for Large-Scale Textual Data via Fast Fuzzy Clustering // Communications in Computer and Information Science. – 2024. – Vol. 2165. – P. 206–218. – https://doi.org/10.1007/978-3-031-70248-8_16.
  8. Tolegen G., Toleu A., Mussabayev R. Enhancing Low-Resource NER via Knowledge Transfer from LLM // Lecture Notes in Computer Science. – 2024. – Vol. 14810. – P. 238–248. – https://doi.org/10.1007/978-3-031-70816-9_19.
  9. Mussabayev R., Mussabayev R. Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means // Lecture Notes in Computer Science. – 2024. – Vol. 14796. – P. 224–236. – https://doi.org/10.1007/978-981-97-4985-0_18.
  10. Barakhnin V.B., Karpov M.V., Machikina E.P., Mussabayev R.R. Optimization of Database Operations in the Application for Text Corpus Analysis // Proc. 20th Int. Asian School-Seminar on Optimization Problems of Complex Systems (OPCS). – Novosibirsk, 2024. – P. 1–6. – https://doi.org/10.1109/OPCS63516.2024.10720387.
  11. Mussabayev R. WRDScore: New Metric for Evaluation of Natural Language Generation Models // Proc. 20th Int. Asian School-Seminar on Optimization Problems of Complex Systems (OPCS). – Novosibirsk, 2024. – P. 20–23. – https://doi.org/10.1109/OPCS63516.2024.10720439.
  12. Aubakirov S., Akhmetov I. Dynamic Optimization of Minimum Document Frequency for Extractive Text Summarization Using GreedSum Algorithm // Proc. 20th Int. Asian School-Seminar on Optimization Problems of Complex Systems (OPCS). – Novosibirsk, 2024. – P. 33–37. – https://doi.org/10.1109/OPCS63516.2024.10720390.
  13. Mussabayev R., Mussabayev R. BiModalClust: Fused Data and Neighborhood Variation for Advanced K-Means Big Data Clustering // Applied Sciences. – 2025. – Vol. 15, No. 3. – Art. 1032. – https://doi.org/10.3390/app15031032.
  14. Shestov A., Levichev R., Mussabayev R., Maslov E., Zadorozhny P., Cheshkov A., Toleu A., Tolegen G., Krassovitskiy A. Finetuning Large Language Models for Vulnerability Detection // IEEE Access. – 2025. – Vol. 13. – P. 38889–38900. – https://doi.org/10.1109/ACCESS.2025.3546700.
  15. Tolegen G., Toleu A., Mussabayev R. Contrastive Learning for Morphological Disambiguation Using Large Language Models in Low-Resource Settings // Applied Sciences. – 2024. – Vol. 14, No. 21. – Art. 9992. – https://doi.org/10.3390/app14219992.
  16. Aubakirov S., Akhmetov I., Gelbukh A., Mussabayev R. Dynamic optimization of min-df in the GreedSum algorithm for enhanced extractive summarization // Artificial Intelligence Review. – 2025. – Vol. 58, No. 9. – Art. 270. – https://doi.org/10.1007/s10462-025-11276-w.
  17. Mussabayev R. Optimizing Euclidean Distance Computation // Mathematics. – 2024. – Vol. 12, No. 23. – Art. 3787. – https://doi.org/10.3390/math12233787.
  18. Tolegen G., Toleu A., Mussabayev R., Krassovitskiy A., Zhuldyzbayuly N. Automatic Creation of Multilingual Knowledge Graph with Large Language Models // Lecture Notes in Computer Science. – 2025. – Vol. 15683. – P. 271–285. – https://doi.org/10.1007/978-981-96-6008-7_20.
  19. Mussabayev R., Mussabayev R. Boosting K-means for Big Data by Fusing Data Streaming with Global Optimization // Communications in Computer and Information Science. – 2025. – Vol. 2493. – P. 1–17. – https://doi.org/10.1007/978-981-96-5881-7_3.
  20. Mussabayev R., Mussabayev R., Toleu A., Ibraimova A. Efficient big data clustering via VNS-accelerated optimization // Book of Abstracts: 11th Int. Conf. on Variable Neighborhood Search (ICVNS 2025). – Montreal, 2025. – P. 17. – URL: https://2025.icvns.com/static/ICVNS_2025_Book_of_Abstracts.pdf.
  21. Toleu A., Tolegen G., Mussabayev R.R. Towards Full-Stack Kazakh NLP: from Morphology to Ontology and LLMs. – Almaty: KazNITU named after K.I. Satbayev, 2025. – 175 p.
  22. Mussabayev R., Mussabayev R. Parallel Memetic Differential Evolution for Minimum Sum-of-Squares Clustering // Communications in Computer and Information Science. – 2026. – Vol. 2747. – P. 1–17.
  23. Copyright certificate No. 62665. Recursive algorithm for hierarchical time-series segmentation to build a structural topic taxonomy of text / Mussabayev R.R. – publ. 03.10.2025.
  24. Copyright certificate No. 62708. Fuzzy clustering algorithm for topic modeling and ontological structuring of text corpora / Mussabayev R.R. – publ. 06.10.2025.
  25. Copyright certificate No. 62700. Extractive text summarization algorithm based on contextual embeddings and variable neighborhood search / Mussabayev R.R. – publ. 06.10.2025.
  26. Copyright certificate No. 62164. KazNER: a named entity recognition system for the Kazakh language / Toleu A., Tolegen G. – publ. 16.09.2025.
  27. Copyright certificate No. 62340. QazAnalyzer: a tool for Kazakh text analysis and ontology construction / Toleu A., Tolegen G. – publ. 23.09.2025.