Energy and memory-efficient PEFT optimization for on-device SLMs on consumer single-GPU

dc.contributor.advisor: Park, Jurn-Gyu
dc.contributor.author: Akhmetzhanov, Kuanysh
dc.date.accessioned: 2026-05-29T12:12:44Z
dc.date.issued: 2026-04-30
dc.description.abstract: Advances in large language models (LLMs) have driven rapid progress in NLP. However, there is growing demand for smaller, more efficient models that can be deployed on resource-constrained devices (such as smartphones and edge hardware) and personalized to individual users. Full fine-tuning remains costly in VRAM, training time, and power consumption even when applied to small language models (SLMs). In addition, few prior studies have explored Parameter-Efficient Fine-Tuning (PEFT) approaches on on-device platforms or with personalization benchmarks, and none of these have accounted for energy cost. In this work, we address each of these gaps by comparing five fine-tuning approaches (Full Fine-Tuning, LoRA, LoRA+, QLoRA, and BitFit) on four SLMs from two distinct families (Transformer-based: TinyLlama-1.1B, Qwen3-1.7B; SSM-based: Mamba-1.4B, Mamba-2-1.3B) across three GLUE benchmark tasks (SST-2, QNLI, STS-B) and three LaMP personalization tasks (LaMP-1, LaMP-2, LaMP-3). To evaluate the PEFT methods, we use the Sustainable Accuracy Metric (SAM), which combines task accuracy and energy cost into a single score. LoRA+ achieved the highest SAM in 19 of the 24 model-task configurations we tested. Full fine-tuning incurred the highest energy cost while yielding no statistically significant improvement in performance. Across all model-PEFT pairs, TinyLlama-1.1B combined with LoRA+ gave the best SAM on five of the six GLUE evaluations. These findings show that compact transformer-based models paired with parameter-efficient fine-tuning can be a practical, energy-aware approach to building personalized models for resource-constrained on-device platforms, achieving up to a 31% improvement over the fully fine-tuned TinyLlama baseline.
dc.identifier.citation: Akhmetzhanov, K. (2026). Energy and Memory-Efficient PEFT Optimization for On-device SLMs on Consumer Single-GPU. Nazarbayev University School of Engineering and Digital Sciences
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/18786
dc.language.iso: en
dc.publisher: Nazarbayev University School of Engineering and Digital Sciences
dc.rights: Attribution-ShareAlike 3.0 United States (en)
dc.rights.uri: http://creativecommons.org/licenses/by-sa/3.0/us/
dc.subject: Small Language Models (SLMs)
dc.subject: Parameter-Efficient Fine-Tuning (PEFT)
dc.subject: LoRA
dc.subject: On-Device AI
dc.subject: Model Personalization
dc.subject: Resource-Constrained Deployment
dc.title: Energy and memory-efficient PEFT optimization for on-device SLMs on consumer single-GPU
dc.type: Master's thesis

Files

Original bundle

Name: Kuanysh_Akhmetzhanov_FinalThesis.pdf
Size: 603.9 KB
Format: Adobe Portable Document Format
Description: Master's thesis
Access status: Embargo until 2027-05-13