CRIME PREDICTION AND FORECASTING: FEATURE SELECTION AND VULNERABLE REGION DETECTION MODELS

Bekmaganbet, Galym

dc.contributor.author	Bekmaganbet, Galym
dc.date.accessioned	2021-07-26T04:28:08Z
dc.date.available	2021-07-26T04:28:08Z
dc.date.issued	2021-07
dc.identifier.citation	Bekmaganbet, G. (2021). Crime Prediction and Forecasting: Feature Selection and Vulnerable Region Detection Models (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan	en_US
dc.identifier.uri	http://nur.nu.edu.kz/handle/123456789/5603
dc.description.abstract	Crime is one of the most destructive factors affecting society. The efforts of law enforcement bodies are mostly oriented toward identifying criminals post factum. However, reducing crime growth requires proactive measures. Constructing an effective model for predicting crime-sensitive regions, together with identifying the proper features (factors), would therefore concentrate the efforts of governmental bodies on the most vulnerable areas. The objective of this research is to apply a suitable machine learning algorithm to crime, economic and social data to predict the likelihood of particular regions having low or high crime levels, and then to identify the main social and economic factors that correlate with crime growth, in order to assist not only law enforcement bodies but also wider governmental programs in solving related issues and improving crime prevention measures. In the current work, the most accurate prediction models were compared and investigated. The models were first tested on available open-source data and then applied to data available from Kazakhstani officials. During evaluation, two main issues were encountered: inconsistency and inadequacy of data. Consequently, data collection, exploration, preprocessing and normalization were significant steps. Furthermore, a number of popular models with efficient methodology were compared and combined, and the one that proved appropriate for the Kazakhstani situation was identified. The main prediction models selected are based on classification, regression and clustering techniques: Decision Tree, Random Forest, Naïve Bayes, K-means and Support Vector Machine. They were tested on both data available from open-source materials and data collected from Kazakhstani state bodies. 
As a result of tuning parameters and testing various feature selection techniques, the Random Forest model proved to be the most accurate among the listed models (UCI Repository materials, Accuracy: 0.837, Precision: 0.884, Recall: 0.872, F1 score: 0.868), whereas Decision Tree achieved the best result on Kazakhstani data (govstat.kz materials, Accuracy: 0.781, Precision: 0.801, Recall: 0.767, F1 score: 0.784). Furthermore, statistical analysis was performed to define an appropriate threshold for classifying regions into high and low crime rate groups. At the final stage, a hypothesis about the importance of a certain feature was tested; the model showed that this feature correlates with the target (crime rate) and that its inclusion improved the accuracy of the result. Therefore, it can be claimed that the more expertise we acquire about important features, the better the selected model will perform.	en_US
dc.language.iso	en	en_US
dc.publisher	Nazarbayev University School of Engineering and Digital Sciences	en_US
dc.rights	Attribution-NonCommercial-ShareAlike 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/us/	*
dc.subject	Research Subject Categories::TECHNOLOGY	en_US
dc.subject	Random Forest model	en_US
dc.subject	Crime Prediction	en_US
dc.subject	Vulnerable Region	en_US
dc.subject	Forecasting	en_US
dc.subject	Type of access: Open Access
dc.title	CRIME PREDICTION AND FORECASTING: FEATURE SELECTION AND VULNERABLE REGION DETECTION MODELS	en_US
dc.type	Master's thesis	en_US
workflow.import.source	science
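
The abstract describes labelling regions as high or low crime by a threshold and reporting Accuracy, Precision, Recall and F1 score. The following is a minimal sketch of those two steps, not the thesis's actual code. The median split, the function names `label_by_threshold` and `binary_metrics`, and the sample rates are assumptions made for illustration; the record does not state which threshold or averaging scheme the thesis used.

```python
from statistics import median


def label_by_threshold(rates, threshold=None):
    """Label each region 1 (high crime) or 0 (low crime).

    Assumption: when no threshold is given, split at the median rate.
    The thesis derives its threshold from statistical analysis instead.
    """
    if threshold is None:
        threshold = median(rates)
    return [1 if rate > threshold else 0 for rate in rates]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical crime rates per region, for illustration only.
    rates = [12.0, 48.5, 30.1, 75.2, 22.4, 60.0]
    y_true = label_by_threshold(rates)
    # Hypothetical classifier output for the same six regions.
    y_pred = [0, 1, 1, 1, 0, 0]
    print(y_true)
    print(binary_metrics(y_true, y_pred))
```

In practice the predicted labels would come from a trained classifier such as the Decision Tree or Random Forest models named in the abstract, evaluated on held-out regions.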