SUBLINEAR AND SUBQUADRATIC ALGORITHMS FOR SVMS

dc.contributor.authorOmarova, Gauhar
dc.date.accessioned2022-09-21T07:57:36Z
dc.date.available2022-09-21T07:57:36Z
dc.date.issued2022-05
dc.description.abstractThe Support Vector Machine (SVM) is an important supervised machine learning algorithm used for regression and classification tasks. The core idea behind this algorithm is to find a separating hyperplane between the data points of two classes. The algorithm is intuitive and works well when the points of the two classes are (almost) linearly separable and the training set is not large. However, if the points in the dataset are not linearly separable, then in order to use SVM one needs to map the data points into a higher-dimensional space, which is a costly operation. Alternatively, a different (and typically more complicated) algorithm must be used. It is therefore important to know in advance whether the data is linearly separable, because the further steps of solving the given regression or classification task depend on that. One part of this thesis focuses on investigating linear separability of data in 2 and 3 dimensions. We propose an efficient algorithm for testing whether a dataset is linearly separable in the context of property testing. For a given parameter ε ∈ (0, 1), the sample complexity and the running time complexity of our algorithm are both O(1/ε), which is sublinear in the number of samples. When the data points have a constant number of dimensions, the running time of the standard SVM algorithm can reach Ω(n²) (where n is the size of the training set), which makes the algorithm impractical for large values of n. In the second part of the thesis, we propose a more efficient algorithm for training SVMs. Our algorithm does not use the entire dataset to determine the optimal hyperplane, but only a specially constructed subset of the data that guarantees an approximate solution. This allows us to design a subquadratic-time algorithm.
More formally, our algorithm approximates the optimal hyperplane with an (e−1)/e multiplicative error in time O(nt · min(t, k)) + o(n²), where t is the total number of training samples in our subset and k is a hyperparameter that controls the number of nearest neighbors when we construct our subset.en_US
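The property-testing idea from the first part of the abstract can be illustrated with a minimal sketch: sample O(1/ε) labeled points and run an exact separability check on the sample alone. The brute-force 2D check and the constant `c` below are illustrative stand-ins, not the thesis's actual algorithm (the full text is gated):

```python
import itertools
import random

def strictly_separable_2d(A, B):
    """Exact check: does a line exist with all of A strictly on one side
    and all of B strictly on the other? Brute force over candidate normal
    directions: for strictly separated convex hulls, a separating direction
    can be taken perpendicular to some pair of input points, or along the
    difference vector of a cross-class pair."""
    pts = A + B
    for p, q in itertools.combinations(pts, 2):
        d = (q[0] - p[0], q[1] - p[1])
        for w in ((-d[1], d[0]), d):  # perpendicular, then the pair direction
            if w == (0, 0):
                continue
            projA = [w[0] * x + w[1] * y for x, y in A]
            projB = [w[0] * x + w[1] * y for x, y in B]
            if max(projA) < min(projB) or max(projB) < min(projA):
                return True
    return False

def separability_tester(A, B, eps, c=10, seed=0):
    """Property-testing style tester: draw c/eps uniform samples (with
    replacement) from the labeled dataset and run the exact check on the
    sample only, so both sample and time complexity depend on 1/eps."""
    rng = random.Random(seed)
    m = max(1, int(c / eps))
    labeled = [(p, 0) for p in A] + [(p, 1) for p in B]
    sample = [labeled[rng.randrange(len(labeled))] for _ in range(m)]
    SA = [p for p, lab in sample if lab == 0]
    SB = [p for p, lab in sample if lab == 1]
    if not SA or not SB:
        return True  # a one-sided sample is trivially separable
    return strictly_separable_2d(SA, SB)
```

A separable dataset always passes (every subsample of a separable set is separable), while a dataset far from separable is rejected with high probability once the sample catches a witness of non-separability; touching ("weakly separable") configurations are treated as not separable by this strict check.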
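The second part's subset-then-train idea can also be sketched. Both pieces below are stand-ins chosen for illustration, not the thesis's actual construction: the subset rule keeps each point's k cross-class nearest neighbors (on the intuition that support vectors lie near the class boundary), and a Pegasos-style hinge-loss solver plays the role of the SVM trainer on the subset:

```python
import random

def cross_class_subset(A, B, k):
    """Hypothetical subset rule: keep, for every point, its k nearest
    neighbors of the opposite class, so the kept points hug the boundary."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    keep = set()
    for p in A:
        keep.update(sorted(B, key=lambda q: dist2(p, q))[:k])
    for q in B:
        keep.update(sorted(A, key=lambda p: dist2(p, q))[:k])
    return [p for p in A if p in keep], [q for q in B if q in keep]

def pegasos_svm(A, B, lam=0.01, iters=2000, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge loss:
    a standard linear-SVM trainer, run here only on the subset."""
    data = [(p, 1.0) for p in A] + [(q, -1.0) for q in B]
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    for t in range(1, iters + 1):
        x, y = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)
        w = [(1 - eta * lam) * wi for wi in w]  # shrink (regularization step)
        if y * (w[0] * x[0] + w[1] * x[1] + b) < 1:  # hinge loss is active
            w[0] += eta * y * x[0]
            w[1] += eta * y * x[1]
            b += eta * y
    return w, b
```

Training touches only the t subset points rather than all n, which is the source of the subquadratic behavior; the thesis's actual guarantee on the (e−1)/e approximation comes from its specific subset construction, which this sketch does not reproduce.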
dc.identifier.citationOmarova, G. (2022). Sublinear and Subquadratic algorithms for SVMs (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstanen_US
dc.identifier.urihttp://nur.nu.edu.kz/handle/123456789/6715
dc.language.isoenen_US
dc.publisherNazarbayev University School of Engineering and Digital Sciencesen_US
dc.rightsAttribution-NonCommercial-ShareAlike 3.0 United States*
dc.rights.urihttp://creativecommons.org/licenses/by-nc-sa/3.0/us/*
dc.subjectSupport Vector Machineen_US
dc.subjectResearch Subject Categories::TECHNOLOGYen_US
dc.subjecttype of access: gated accessen_US
dc.subjectSVMen_US
dc.subjectalgorithmen_US
dc.subjectmachine learning algorithmen_US
dc.titleSUBLINEAR AND SUBQUADRATIC ALGORITHMS FOR SVMSen_US
dc.typeMaster's thesisen_US
workflow.import.sourcescience

Files

Original bundle

Name: Thesis - Gauhar Omarova.pdf
Size: 2.8 MB
Format: Adobe Portable Document Format
Description: Thesis

Name: Presentation - Gauhar Omarova.pdf
Size: 2.32 MB
Format: Adobe Portable Document Format
Description: Presentation