



Almakhan, Symbat



School of Engineering and Digital Sciences


Data labeling is a critical step in the machine learning workflow, since the quality and accuracy of the labeled data can significantly affect the performance of the trained models. Labeling can be performed manually or automatically, depending on the dataset’s complexity and the available resources. Manual labeling relies on human experts who review and annotate the data according to predefined guidelines or criteria. This approach yields high-quality labels, but it is time-consuming, labor-intensive, and costly, particularly for large datasets. This research evaluates an automatic annotation and classification approach to determine whether autonomous data labeling can be designed for tree classification problems. The study used a dataset of 465 high-resolution images, each 6000x4000 pixels. Feature vectors were extracted from the images with a feature extractor model, and K-means then clustered the images by the similarity of those vectors, once into 4 classes and once into 10 classes. The clustered data was used to train a CNN model, which was tested on a separate dataset. The predictions were saved in a new folder under their respective class labels and mapped back to the original images, with the location of each cropped image indicated. Mapping the clustered crops onto their original images shows visually which features characterize each cluster. Examining those features makes it possible to identify the features most commonly associated with a particular cluster, and in this way to identify the cluster whose features are specific to trees.
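The clustering and mapping steps described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the 2000-pixel tile size, the hand-made two-dimensional feature vectors standing in for the output of a real feature extractor, the file name `IMG_0001.jpg`, and the helper names `tile_boxes`, `kmeans`, and `label_crops`. The k-means routine is a plain standard-library implementation of Lloyd's algorithm, not necessarily the implementation the author used.

```python
import random
from math import dist


def tile_boxes(width, height, tile):
    """Return (left, top, right, bottom) crop boxes covering the image in a grid."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Returns (labels, centroids). Initial centroids are k sampled input vectors.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def label_crops(image_name, boxes, features, k):
    """Cluster the crop features and keep each crop's location in the original image."""
    labels, _ = kmeans(features, k)
    return [
        {"image": image_name, "box": box, "cluster": lab}
        for box, lab in zip(boxes, labels)
    ]


# Hypothetical example: one 6000x4000 image cut into six 2000x2000 crops.
boxes = tile_boxes(6000, 4000, 2000)
# Assumed toy features; a real pipeline would take these from a feature extractor.
features = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
records = label_crops("IMG_0001.jpg", boxes, features, k=2)
for rec in records:
    print(rec)
```

Because each record keeps the crop box alongside the cluster label, the labels can be drawn back onto the original image, which is the kind of visual inspection the abstract uses to decide which cluster corresponds to trees.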



Type of access: Restricted, data labeling


Almakhan, S. (2023). Autonomous tree labeling and recognition. School of Engineering and Digital Sciences.