Nazarbayev University Repository (NUR) is an institutional electronic archive designed for the long-term preservation, aggregation, and dissemination of scientific research outcomes and intellectual property produced by the Nazarbayev University community and affiliated organizations.

Recent Submissions

  • Item type: Item, Access status: Embargo
    Energy-Efficient Virtual Network Function (VNF) Management in Open Radio Access Networks
    (Nazarbayev University School of Engineering and Digital Sciences, 2026-04-29) Toporkov, Maxim; Kizilirmak, Refik; Maham, Behrouz
    This thesis explores energy-efficient resource management in Open Radio Access Networks (Open RAN). We focus mainly on optimizing the distribution of processing functions between the centralized unit (CU) and the distributed unit (DU). We propose a system-level model to analyze and evaluate the impact of functional split selection and algorithm complexity on total energy consumption. A threshold-based heuristic, Binary Integer Programming (BIP), and Mixed Integer Linear Programming (MILP) were chosen as the main algorithms for analysis and evaluation because they differ in algorithmic complexity and therefore have different power and computational resource consumption profiles. We evaluated them under different conditions, considering processing, transport, and control energy components. The results demonstrate that MILP clearly outperforms the other two algorithms: it achieves the highest energy savings because it jointly optimizes resource allocation. The less complex algorithms come close to the MILP results during off-peak traffic hours but perform far worse at peak hours. This work mainly shows the trade-off between energy optimization and algorithmic complexity in the Open RAN architecture.
  • Item type: Item, Access status: Open Access
    Detecting Machine-Generated Code in Multiple Programming Languages and Domains
    (Nazarbayev University School of Engineering and Digital Sciences, 2026-04-30) Khamitov, Rakhat
    The widespread adoption of large language models for software development has created an urgent need for reliable detection of machine-generated code. This thesis studies machine-generated code detection under realistic conditions where code varies across programming languages, application domains, model families, and generation strategies. The experiments are grounded in SemEval-2026 Task 13, Subtask A, an externally organized benchmark for machine-generated code detection. The benchmark used in this thesis contains training and validation data in three programming languages and evaluation data spanning eight programming languages, multiple domains, unseen generator families, adversarial examples, and human–AI co-authored settings. This thesis contributes a systematic comparison of lexical, structural, neural-embedding, metric-learning, comment-embedding, and stylometric approaches under in-distribution and out-of-distribution evaluation. The results show that high in-distribution validation performance does not predict robust detection: direct classifiers reach validation Macro-F1 above 0.94 but fall to 0.24–0.41 OOD Macro-F1. The strongest configuration, a comment-embedding SVM, achieves 0.671 OOD Macro-F1 on the labeled diagnostic test sample and a 0.638 Kaggle submission score, suggesting that comment style is a more stable cross-language signal than code-surface patterns alone.
  • Item type: Item, Access status: Open Access
    Designing a machine learning-calibrated IOT sensor network for real-time air quality assessment
    (Nazarbayev University School of Engineering and Digital Sciences, 2026-04-27) Zhexenov, Adil; Almagambetov, Akhan; Arzykulov, Sultangali
    Existing air quality monitoring infrastructure in Kazakhstan provides limited spatial coverage, particularly in cities with extreme continental climates and coal-dominated PM2.5 emissions. This thesis presents the design, deployment, and evaluation of an IoT-based sensor network for real-time PM2.5 monitoring in Astana. Four ESP32-based sensor nodes with PMS5003 sensors were deployed across the city, collecting 14,444 measurements over 28 days (February–March 2026) with 89.95% data completeness at temperatures down to −26.8 °C. Colocation with the Kazhydromet-14 reference station enabled machine learning calibration, with Random Forest achieving the highest accuracy (R² = 0.84, RMSE = 3.80 µg/m³), satisfying the EPA performance criterion. Age-based calibration analysis revealed that linear model coefficients degrade by 76.8% within one week during seasonal transitions, while Random Forest maintains stable performance (R² = 0.93–0.99), leading to a weekly retraining recommendation. For 7-day PM2.5 forecasting, LSTM was identified as the best model (R² = 0.23, RMSE = 11.62 µg/m³).
  • Item type: Item, Access status: Open Access
    Computational comparative analysis of global water legislation: an NLP and LLM-based framework for cross-jurisdictional policy assessment
    (Nazarbayev University School of Engineering and Digital Sciences, 2026-05-08) Alikhanov, Adilkhan; Siamac, Fazli
    This dissertation presents an approach to analyzing international water legislation using a computational pipeline that processes water legislation from 165 countries, written in over 35 languages and more than 10 writing systems. The computational pipeline included seven steps: extracting the text from documents, translating the extracted text into English, evaluating the quality of those translations using multiple metrics, using a large language model to extract legal information from the translated text, calculating the similarity between pieces of legislation using embedded representations of the text, and finally clustering similar pieces of legislation to identify patterns of similarity among them. The pipeline shows how automated methods can extend the existing manual comparative tradition in water law research, allowing researchers to analyze volumes of data that would be impossible to compare manually. The main findings of this project were: (1) that translation quality was sufficient to allow meaningful comparison for the majority of the sample (COMET reference-free quality estimation gave a mean score of 0.83); however, a phenomenon referred to as “contextual flattening” was observed, in which text from low-resource languages was reduced to a flat rendering that did not preserve the linguistic complexity of the original language; (2) that the large language model-based extraction pipeline was able to extract all relevant information on three dimensions of water law policy (groundwater regulation, river basin management, and the polluter-pays principle) with 100% schema compliance; (3) that cluster analysis revealed five distinct typologies of water law that corresponded to some extent to traditional classifications of legal families but also indicated cross-traditional convergence in basin-based governance practices; and (4) that the polluter-pays principle was found to be the most frequently used mechanism of implementation, although it was never explicitly mentioned in any of the examined country profiles. The methodology presented in this dissertation will serve as the basis for future research involving the use of computational comparative law in areas outside of the water sector.
  • Item type: Item, Access status: Open Access
    Energy-Efficient GPU Frequency Scaling Characterization for SLM Fine-Tuning on Embedded Platforms
    (Nazarbayev University School of Engineering and Digital Sciences, 2026-05-12) Amangeldi, Aidar; Park, Jurn Gyu; Do, Ton Duc; Lee, Min-Ho
    While embedded GPU dynamic voltage and frequency scaling (DVFS) is well-studied for inference workloads, fine-tuning exhibits different memory access patterns and runs 100–1000× longer, making inference-derived policies inappropriate. We present the first per-frequency characterization of transformer fine-tuning across three model scales (BERT-tiny 14M, BERT-base 110M, DeBERTa-xlarge 900M) on the NVIDIA Jetson AGX Orin, sweeping GPU frequencies from 306 to 1300 MHz on SST-2 and QNLI benchmarks. Across 77 experiments, optimal frequencies fall consistently in the 612–1020 MHz range, with production-scale models achieving 22–32% energy savings over the default governor. We develop a GPU-utilization-guided frequency selection algorithm requiring only 30 profiling steps that achieves a 1.5% average gap from the true optimum across 13 validation workloads, versus 21% energy waste for the default governor.
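
The last abstract above describes choosing a GPU frequency that minimizes fine-tuning energy. As a rough illustration only, the sketch below picks the frequency with the lowest energy per training step (average power times step time) from a set of profiled operating points. This is not the thesis's GPU-utilization-guided algorithm, whose details are not given here; the `Profile` class, the function names, and all numbers in `SWEEP` are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Profile:
    """One profiled operating point (all fields are illustrative assumptions)."""
    freq_mhz: int        # GPU clock used during profiling
    avg_power_w: float   # average board power over the profiling window
    step_time_s: float   # average wall-clock time per training step

    @property
    def energy_per_step_j(self) -> float:
        # Energy (J) = power (W) * time (s)
        return self.avg_power_w * self.step_time_s


def pick_min_energy(profiles: Sequence[Profile]) -> Profile:
    """Return the operating point with the lowest energy per training step."""
    if not profiles:
        raise ValueError("at least one profile is required")
    return min(profiles, key=lambda p: p.energy_per_step_j)


def savings_vs_baseline(best: Profile, baseline: Profile) -> float:
    """Fractional energy saved by running at `best` instead of `baseline`."""
    return 1.0 - best.energy_per_step_j / baseline.energy_per_step_j


# Hypothetical sweep: made-up numbers chosen only to show the typical shape,
# where mid-range clocks can beat both the lowest and the highest frequency.
SWEEP = [
    Profile(freq_mhz=306, avg_power_w=12.0, step_time_s=1.00),
    Profile(freq_mhz=612, avg_power_w=16.0, step_time_s=0.55),
    Profile(freq_mhz=816, avg_power_w=20.0, step_time_s=0.45),
    Profile(freq_mhz=1300, avg_power_w=34.0, step_time_s=0.35),
]

if __name__ == "__main__":
    best = pick_min_energy(SWEEP)
    baseline = SWEEP[-1]  # treat the highest clock as the comparison point
    print(f"lowest-energy frequency: {best.freq_mhz} MHz")
    print(f"savings vs {baseline.freq_mhz} MHz: "
          f"{savings_vs_baseline(best, baseline):.1%}")
```

In practice the power and step-time values would come from a short profiling run at each candidate frequency; the selection step itself stays the same.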