CRIME PREDICTION AND FORECASTING: FEATURE SELECTION AND VULNERABLE REGION DETECTION MODELS

dc.contributor.author: Bekmaganbet, Galym
dc.date.accessioned: 2021-07-26T04:28:08Z
dc.date.available: 2021-07-26T04:28:08Z
dc.date.issued: 2021-07
dc.description.abstract: Crime is one of the most destructive factors affecting society. The efforts of law enforcement bodies are mostly oriented toward identifying criminals post factum. However, reducing the growth of crime requires proactive measures. Constructing an effective model for predicting crime-sensitive regions, along with identifying the proper features (factors), would therefore concentrate the efforts of governmental bodies on the most vulnerable areas. The objective of this research is to apply a suitable machine learning algorithm to crime, economic and social data to predict the likelihood of particular regions having low or high crime levels, and then to determine the main social and economic factors that correlate with crime growth. The aim is to assist not only law enforcement bodies but governmental programs as a whole in solving related issues and improving crime prevention measures. In the current work, the most accurate prediction models were compared and investigated. Tests were run on available open-source data, and the resulting models were applied to data available from Kazakhstani officials. During evaluation, two main issues were faced: inconsistency and inadequacy of the data. Consequently, data collection, exploration, preprocessing and normalization were significant steps. Furthermore, a number of popular models with efficient methodology were compared and combined, and the one that proved appropriate for the Kazakhstani situation was identified. The main prediction models selected were based on classification, regression and clustering techniques: Decision Tree, Random Forest, Naïve Bayes, K-means and Support Vector Machine. They were tested on both data from open-source materials and data collected from Kazakhstani state bodies.
As a result of tuning parameters and testing various feature selection techniques, the Random Forest model proved to be the most accurate among the listed models (UCI Repository materials, Accuracy: 0.837, Precision: 0.884, Recall: 0.872, F1 score: 0.868), whereas Decision Tree achieved the best result on Kazakhstani data (govstat.kz materials, Accuracy: 0.781, Precision: 0.801, Recall: 0.767, F1 score: 0.784). Furthermore, statistical analysis was performed to define an appropriate threshold for classifying the high and low crime rate groups. At the final stage, a hypothesis about the importance of a certain feature was tested: the model showed that this feature correlates with the target (crime rate) and that its inclusion improved the accuracy of the result. It can therefore be claimed that the more expertise we acquire about the important features, the better the selected model will perform. [en_US]
dc.identifier.citation: Bekmaganbet, G. (2021). Crime Prediction and Forecasting: Feature Selection and Vulnerable Region Detection Models (Unpublished master's thesis). Nazarbayev University, Nur-Sultan, Kazakhstan [en_US]
dc.identifier.uri: http://nur.nu.edu.kz/handle/123456789/5603
dc.language.iso: en [en_US]
dc.publisher: Nazarbayev University School of Engineering and Digital Sciences [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/ [*]
dc.subject: Research Subject Categories::TECHNOLOGY [en_US]
dc.subject: Random Forest model [en_US]
dc.subject: Crime Prediction [en_US]
dc.subject: Vulnerable Region [en_US]
dc.subject: Forecasting [en_US]
dc.subject: Type of access: Open Access
dc.title: CRIME PREDICTION AND FORECASTING: FEATURE SELECTION AND VULNERABLE REGION DETECTION MODELS [en_US]
dc.type: Master's thesis [en_US]
workflow.import.source: science
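
Illustrative note on the abstract: the abstract describes splitting regions into high and low crime rate groups by a threshold and reporting Accuracy, Precision, Recall and F1 score. The sketch below shows one way these two steps could look. It is not the thesis implementation: the median threshold, the function names label_by_threshold and binary_metrics, and all numbers in the example are assumptions made up for illustration, since the abstract does not state which threshold or tooling was used.

```python
from statistics import median


def label_by_threshold(rates, threshold=None):
    """Label each region 1 (high crime) or 0 (low crime).

    Assumption: when no threshold is given, the median rate is used.
    The thesis abstract does not say which threshold was chosen.
    """
    if threshold is None:
        threshold = median(rates)
    return [1 if rate > threshold else 0 for rate in rates]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical crime rates for six regions (made-up values).
rates = [12.0, 30.5, 7.2, 45.1, 22.3, 18.9]
y_true = label_by_threshold(rates)

# Hypothetical predictions from some classifier, one per region.
y_pred = [0, 1, 0, 1, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(y_true)
print(metrics)
```

In this made-up example one high-crime region is missed, so precision stays at 1.0 while recall drops, which is the kind of trade-off the four reported metrics are meant to expose.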

Files

Original bundle

Name: Thesis - Galym Bekmaganbet.pdf
Size: 4.3 MB
Format: Adobe Portable Document Format
Description: Thesis

Name: Presentation - Galym Bekmaganbet.pptx
Size: 5.27 MB
Format: Microsoft Powerpoint XML
Description: Presentation