Energy and memory-efficient PEFT optimization for on-device SLMs on consumer single-GPU

Akhmetzhanov, Kuanysh

dc.contributor.advisor	Park, Jurn-Gyu
dc.contributor.author	Akhmetzhanov, Kuanysh
dc.date.accessioned	2026-05-29T12:12:44Z
dc.date.issued	2026-04-30
dc.description.abstract	Advances in large language models (LLMs) are driving rapid progress in NLP. However, there is growing demand for smaller, more efficient models that can be deployed on resource-constrained devices (such as smartphones and edge hardware) and personalized to individual users. Full fine-tuning remains costly in terms of VRAM, training time, and power consumption, even when applied to small language models (SLMs). Moreover, few prior studies have explored Parameter-Efficient Fine-Tuning (PEFT) approaches on on-device platforms or with personalization benchmarks, or accounted for energy cost. In this work, we address each of these gaps by comparing five fine-tuning approaches (Full Fine-Tuning, LoRA, LoRA+, QLoRA, and BitFit) on four SLMs from two distinct families (Transformer-based: TinyLlama-1.1B, Qwen3-1.7B; SSM-based: Mamba-1.4B, Mamba-2-1.3B) across three GLUE benchmark tasks (SST-2, QNLI, STS-B) and three LaMP personalization tasks (LaMP-1, LaMP-2, LaMP-3). To evaluate the fine-tuning methods, we use the Sustainable Accuracy Metric (SAM), which captures both task accuracy and energy cost in a single score. LoRA+ achieved the highest SAM in 19 of the 24 model-task configurations we tested. Furthermore, full fine-tuning incurred the highest energy cost while yielding no statistically significant improvement in performance. Across all model-PEFT pairs, TinyLlama-1.1B combined with LoRA+ provided the best SAM results on five of the six GLUE evaluations. These findings demonstrate that compact Transformer-based models paired with parameter-efficient fine-tuning can be a practical and energy-aware approach to developing personalized models for deployment on resource-constrained on-device platforms, with as much as a 31% improvement over the fully fine-tuned TinyLlama baseline.
dc.identifier.citation	Akhmetzhanov, K. (2026). Energy and Memory-Efficient PEFT Optimization for On-device SLMs on Consumer Single-GPU. Nazarbayev University School of Engineering and Digital Sciences
dc.identifier.uri	https://nur.nu.edu.kz/handle/123456789/18786
dc.language.iso	en
dc.publisher	Nazarbayev University School of Engineering and Digital Sciences
dc.rights	Attribution-ShareAlike 3.0 United States	en
dc.rights.uri	http://creativecommons.org/licenses/by-sa/3.0/us/
dc.subject	Small Language Models (SLMs)
dc.subject	Parameter-Efficient Fine-Tuning (PEFT)
dc.subject	LoRA
dc.subject	On-Device AI
dc.subject	Model Personalization
dc.subject	Resource-Constrained Deployment
dc.title	Energy and memory-efficient PEFT optimization for on-device SLMs on consumer single-GPU
dc.type	Master's thesis

Files

Original bundle

Name: Kuanysh_Akhmetzhanov_FinalThesis.pdf
Size: 603.9 KB
Format: Adobe Portable Document Format
Description: Master's thesis

Embargo until 2027-05-13

Collections

02. Master's Thesis