SUBLINEAR AND SUBQUADRATIC ALGORITHMS FOR SVMS

dc.contributor.authorOmarova, Gauhar
dc.date.accessioned2022-09-21T07:57:36Z
dc.date.available2022-09-21T07:57:36Z
dc.date.issued2022-05
dc.description.abstractThe Support Vector Machine (SVM) is an important supervised machine learning algorithm used for regression and classification tasks. The core idea behind this algorithm is to find a separating hyperplane between the data points of two classes. The algorithm is intuitive and works well when the points of the two classes are (almost) linearly separable and the training set is not large. However, if the points in the dataset are not linearly separable, then in order to use SVM one needs to map the data points into a higher-dimensional space, which is a costly operation. Alternatively, a different (and typically more complicated) algorithm must be used. It is therefore important to know in advance whether the data is linearly separable, because the further steps of solving the given regression or classification task depend on that. One part of this thesis focuses on investigating linear separability of data in 2 and 3 dimensions. We propose an efficient algorithm for testing whether a dataset is linearly separable in the context of property testing. For a given parameter ε ∈ (0, 1), the sample complexity and the running time complexity of our algorithm are both O(1/ε), which is sublinear in the number of samples. When the data points have a constant number of dimensions, the running time of the standard SVM algorithm can reach Ω(n²) (where n is the size of the training set), which makes the algorithm impractical for large values of n. In the second part of the thesis, we propose a more efficient algorithm for training SVMs. Our algorithm does not use the entire dataset to determine the optimal hyperplane, but only a specially constructed subset of the data that guarantees an approximate solution. This allows us to design a subquadratic-time algorithm.
More formally, our algorithm approximates the optimal hyperplane with an (e−1)/e multiplicative error in time O(nt · min(t, k)) + o(n²), where t is the total number of training samples in our subset and k is a hyperparameter that controls the number of nearest neighbors when we construct our subset.en_US
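The property-testing idea from the first part of the abstract can be illustrated with a minimal sketch: sample O(1/ε) labeled points and run an exact separability check on the sample alone. The brute-force 2D check and the constant `c` below are illustrative stand-ins, not the thesis's actual algorithm (the full text is gated):

```python
import itertools
import random

def strictly_separable_2d(A, B):
    """Exact check: does a line exist with all of A strictly on one side
    and all of B strictly on the other? Brute force over candidate normal
    directions: for strictly separated convex hulls, a separating direction
    can be taken perpendicular to some pair of input points, or along the
    difference vector of a cross-class pair."""
    pts = A + B
    for p, q in itertools.combinations(pts, 2):
        d = (q[0] - p[0], q[1] - p[1])
        for w in ((-d[1], d[0]), d):  # perpendicular, then the pair direction
            if w == (0, 0):
                continue
            projA = [w[0] * x + w[1] * y for x, y in A]
            projB = [w[0] * x + w[1] * y for x, y in B]
            if max(projA) < min(projB) or max(projB) < min(projA):
                return True
    return False

def separability_tester(A, B, eps, c=10, seed=0):
    """Property-testing style tester: draw c/eps uniform samples (with
    replacement) from the labeled dataset and run the exact check on the
    sample only, so both sample and time complexity depend on 1/eps."""
    rng = random.Random(seed)
    m = max(1, int(c / eps))
    labeled = [(p, 0) for p in A] + [(p, 1) for p in B]
    sample = [labeled[rng.randrange(len(labeled))] for _ in range(m)]
    SA = [p for p, lab in sample if lab == 0]
    SB = [p for p, lab in sample if lab == 1]
    if not SA or not SB:
        return True  # a one-sided sample is trivially separable
    return strictly_separable_2d(SA, SB)
```

A separable dataset always passes (every subsample of a separable set is separable), while a dataset far from separable is rejected with high probability once the sample catches a witness of non-separability; touching ("weakly separable") configurations are treated as not separable by this strict check.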
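The second part's subset-then-train idea can also be sketched. Both pieces below are stand-ins chosen for illustration, not the thesis's actual construction: the subset rule keeps each point's k cross-class nearest neighbors (on the intuition that support vectors lie near the class boundary), and a Pegasos-style hinge-loss solver plays the role of the SVM trainer on the subset:

```python
import random

def cross_class_subset(A, B, k):
    """Hypothetical subset rule: keep, for every point, its k nearest
    neighbors of the opposite class, so the kept points hug the boundary."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    keep = set()
    for p in A:
        keep.update(sorted(B, key=lambda q: dist2(p, q))[:k])
    for q in B:
        keep.update(sorted(A, key=lambda p: dist2(p, q))[:k])
    return [p for p in A if p in keep], [q for q in B if q in keep]

def pegasos_svm(A, B, lam=0.01, iters=2000, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge loss:
    a standard linear-SVM trainer, run here only on the subset."""
    data = [(p, 1.0) for p in A] + [(q, -1.0) for q in B]
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    for t in range(1, iters + 1):
        x, y = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)
        w = [(1 - eta * lam) * wi for wi in w]  # shrink (regularization step)
        if y * (w[0] * x[0] + w[1] * x[1] + b) < 1:  # hinge loss is active
            w[0] += eta * y * x[0]
            w[1] += eta * y * x[1]
            b += eta * y
    return w, b
```

Training touches only the t subset points rather than all n, which is the source of the subquadratic behavior; the thesis's actual guarantee on the (e−1)/e approximation comes from its specific subset construction, which this sketch does not reproduce.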
dc.identifier.citationOmarova, G. (2022). Sublinear and Subquadratic algorithms for SVMs (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstanen_US
dc.identifier.urihttp://nur.nu.edu.kz/handle/123456789/6715
dc.language.isoenen_US
dc.publisherNazarbayev University School of Engineering and Digital Sciencesen_US
dc.rightsAttribution-NonCommercial-ShareAlike 3.0 United States*
dc.rights.urihttp://creativecommons.org/licenses/by-nc-sa/3.0/us/*
dc.subjectSupport Vector Machineen_US
dc.subjectResearch Subject Categories::TECHNOLOGYen_US
dc.subjecttype of access: gated accessen_US
dc.subjectSVMen_US
dc.subjectalgorithmen_US
dc.subjectmachine learning algorithmen_US
dc.titleSUBLINEAR AND SUBQUADRATIC ALGORITHMS FOR SVMSen_US
dc.typeMaster's thesisen_US
workflow.import.sourcescience

Files

Original bundle

Name: Thesis - Gauhar Omarova.pdf
Size: 2.8 MB
Format: Adobe Portable Document Format
Description: Thesis

Name: Presentation - Gauhar Omarova.pdf
Size: 2.32 MB
Format: Adobe Portable Document Format
Description: Presentation