Roadheader performance prediction using Machine Learning Methods
Case Study: San Manuel Mine, Arizona

by
Askar Omirzak

THESIS SUPERVISOR
Saffet Yagiz

Thesis submitted to the School of Mining and Geosciences of Nazarbayev University in Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in Mining Engineering

Nazarbayev University
May 2025

ORIGINALITY STATEMENT

I, Askar Omirzak, hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at Nazarbayev University or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at NU or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Signed on 03.05.2025 _______________________________________

ABSTRACT

This thesis is dedicated to the development and evaluation of predictive models for the performance of roadheader machines, measured by the instantaneous cutting rate (ICR), using machine learning algorithms under conditions of limited data. The scarcity of available datasets is primarily due to high collection costs, commercial confidentiality, and heterogeneous geological conditions, which significantly complicates the application of traditional prediction models. To address this challenge, the study employs data synthesis techniques that expand the training set by generating artificial observations through the addition of Gaussian noise, as well as alternative approaches based on Ridge Regression and Random Forest methods.
A comparative analysis of various models is conducted, including linear methods (Ridge, Lasso, ElasticNet), ensemble algorithms (Random Forest, Gradient Boosting, Extra Trees), and nonlinear approaches (SVR, MLP). The results demonstrate that ensemble methods achieve the highest prediction accuracy, as evidenced by high R² values and low MSE values, even when using synthetically expanded datasets. However, while data synthesis improves model performance, it does not fully replace real-world observations, necessitating further validation of the developed models under practical conditions. The findings hold practical significance for optimizing planning processes and economic evaluations in the mining and construction industries, and they point to the promising prospects of integrating data synthesis techniques with real-time monitoring systems to enhance the robustness and interpretability of predictive models.

ACKNOWLEDGMENT

I want to express my sincerest gratitude to my thesis supervisor, Professor Saffet Yagiz, for his invaluable guidance and support throughout this research. His expertise, encouragement, and constant feedback have been instrumental in shaping my work. I would also like to thank my friends for their support and mentorship, especially Ulan Sharipov, who guided me through the world of Machine Learning. Special thanks to Abylai Salimzhanov, Sultan Turan, Adel Kolesnikov, Alibek Abilgazym, Abylai Kubeyev, Yersultan Tursyn, Zhandos Dauletov, Arsen Kenzhebekov, and many others who were with me during my time at Pochinki and School. The Faculty Development Competitive Research Grant program of Nazarbayev University (Grant Number 201223FD8837) is also acknowledged for funding this research.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
1.1 Background
1.2 Objectives of the Thesis
1.3 Justification of the Research
1.4 Scope of Work
2. LITERATURE REVIEW
2.1. Introduction to the topic and Relevance of the study
2.1.1 Problem statement: "small data" and complex geology
2.1.2 Purpose and objectives of the literature review
2.2 Key Parameters Affecting the Performance of Roadheader Machines
2.3 Machine learning and its implementation in the mining industry
2.4 Analysis of existing studies on ICR prediction for roadheaders
2.5. Critical analysis and identified gaps
2.6. Future research directions
2.7. Literature Review Summary
3. CASE STUDY – SAN MANUEL MINE
4. METHODOLOGY
4.1 Data collection and description
4.2 Data Preprocessing and Feature Generation
4.3 Synthetic Data Generation
4.4 Cross-validation and evaluation metrics
5. DATA ANALYSIS
5.2 Model training and evaluation
5.3 Performance Comparison
6. DISCUSSION
7. CONCLUSIONS AND RECOMMENDATIONS
REFERENCES
APPENDICES

LIST OF FIGURES

Figure 1: Comparison of Roadheader Machine Excavation with Drilling and Blasting in a Mining Environment (Krzystof & Piotr, 2019).
Figure 2: Transverse Roadheader and Axial roadheaders.
Figure 3: Geologic Map of San Manuel area (Schwartz, 1953)
Figure 4: RMR table and Machine Performance Graph
Figure 5: Visualization for Random Forest Regressor Labeling.
Figure 6: Correlation Heatmaps of the Original Data. (Figure A - Heatmap of the Base Features, B - Polynomial Features)
Figure 7: Histograms of Standardized Original Data Distributions (Histogram A - UCS, B - RQD, C - RMR value).
Figure 8: Base Features Correlation Heatmaps of the Synthetic Data. (Figure A – Ridge, B - Gaussian, C - Random Forest).
Figure 9: Polynomial Features Correlation Heatmaps of the Synthetic Data. (Figure A – Ridge, B - Gaussian, C - Random Forest)
Figure 10: Synthetic Data Distribution for Ridge Regression Labeling (Figure A - RQD, B - RMR, C - UCS).
Figure 11: Synthetic Data Distribution for Gaussian Noise Labeling (Figure A - RQD, B - RMR, C - UCS).
Figure 12: Synthetic Data Distribution for Random Forest Regression Labeling (Figure A - RQD, B - RMR, C - UCS).
Figure 13: ICR plots for Linear Models trained on Ridge Regression Labeling data (Scatterplot A - Ridge regression, B - Lasso, C - ElasticNet).
Figure 14: ICR plots for Ensemble Models trained on Ridge Regression Labeling data (Scatterplot A - Random Forest, B - Gradient Boosting, C - ExtraTrees).
Figure 15: ICR plots for Non-Linear Models trained on Ridge Regression Labeling data (Scatterplot A - MLP, B - SVR).
Figure 16: ICR plot for Base Model trained on Ridge Regression Labeling data (ZeroR).
Figure 17: ICR plots for Linear Models trained on Gaussian Noise Labeling method (Scatterplot A - Ridge regression, B - Lasso, C - ElasticNet).
Figure 18: ICR plots for Ensemble Models trained on Gaussian Noise Labeling method (Random Forest, Gradient Boosting, ExtraTrees).
Figure 19: ICR plots for Non-Linear Models trained on Gaussian Noise Labeling data (Scatterplot A - MLP, B - SVR).
Figure 20: ICR plot for Base Model trained on Ridge Regression Labeling data (ZeroR).
Figure 21: ICR plots for Linear Models trained on Random Forest Regression Labeling data (Scatterplot A - Ridge Regression, B - Lasso, C - ElasticNet).
Figure 22: ICR plots for Ensemble Models trained on Ridge Regression Labeling data (Scatterplot A - Random Forest, B - Gradient Boosting, C - ExtraTrees).
Figure 23: ICR plots for Non-Linear Models trained on Random Forest Regression Labeling data (Scatterplot A - MLP, B - SVR).
Figure 24: ICR plot for Base Model (ZeroR) trained on Random Forest Regression Labeling.

LIST OF TABLES

Table 1: General Comparison of Axial vs Transverse Roadheaders (Taylor & Francis Group, 2014)
Table 2: Atlas Copco – Eickhoff classification
Table 3: Neil et al. classification
Table 4: Dataset Utilized for Model Establishing
Table 5: EDA summary for the Original Dataset
Table 6: EDA summary for the Polynomial features of the Original Dataset
Table 7: Example Outputs of Predicted ICR values
Table 8: Performance Metrics – Ridge Regression Labeling
Table 9: Performance Metrics – Gaussian Noise Labeling
Table 10: Performance Metrics – Random Forest Regression Labeling
Table 11: Ranking table for models trained on Ridge Regression Labeling data.
Table 12: Ranking table for models trained on Gaussian Noise Labeling data.
Table 13: Ranking table for models trained on Random Forest Regression Labeling data.

1. INTRODUCTION

The mining industry is constantly evolving, adopting advanced technologies to improve efficiency, safety, and environmental sustainability. One of the key technological advances is the use of roadheader machines, versatile excavators equipped with a rotating cutting head that were originally developed for coal mining and are now widely used in various mining and tunneling projects due to their flexibility and accuracy. Applying machine learning methods to predict roadheader performance is complicated by the limited datasets available, which are often small and do not fully cover all relevant parameters. Therefore, this study focuses on small dataset methods, aiming to develop robust and accurate forecasting models that can operate effectively even with a limited amount of data.

1.1 Background

Roadheader machines are highly specialized equipment for the precise and efficient excavation of rock, soil and other geological formations. Unlike traditional methods that rely on drilling and blasting, these machines use mechanical cutting heads equipped with carbide picks that continuously break the material.
This technology has revolutionized the fields of underground mining, tunneling and civil engineering, offering a more controlled and safer alternative to traditional mining methods.

The origins of roadheader machine technology date back to mid-20th-century Europe (Kogelmann & Schenck, 1982), when the need for mechanized excavation methods increased due to the limitations and dangers associated with conventional blasting methods. Roadheader machines were initially developed primarily for coal mines, where their excavation efficiency made them indispensable. Over time, improvements in machine design, the use of new materials and the development of automation have expanded the scope of application of roadheader machines, and today they are successfully used even in difficult geological conditions, including work in hard rocks.

Modern roadheader machines are equipped with automated control systems and modern sensors, which allow the cutting process to be optimized and operational safety to be significantly increased. These improvements ensure stable operation of the equipment, help reduce production costs and improve the quality of mining operations.

Advantages Over Traditional Excavation Methods

Traditional mining methods such as drilling and blasting present a number of operational challenges. Blasting generates significant vibration, noise and dust, which can have a negative impact on worker health and the surrounding rock mass. In addition, the unpredictability of blasting causes unintended fracturing of the surrounding rock, which requires additional stabilization of the workings and increases the cost of transporting the material.

In contrast, roadheader machines provide a more controlled and continuous mining process. The use of a mechanical cutting head allows the workings to be formed with high accuracy, which minimizes excess material removal and reduces the requirements for stabilizing the workings.
As described by Sandbak (1985), the ability to cut rock without excessive vibration makes roadheaders especially valuable for tunneling projects in urban and geologically sensitive areas, where it is critical to minimize structural disturbance. Ozdemir (1997) highlighted another important advantage of roadheaders: their ability to operate in varying geological conditions with minimal downtime. Unlike blasting operations, which require scheduled delays for loading and detonation, roadheader machines allow continuous excavation, which helps to shorten project schedules and increase overall productivity.

The advanced automation technologies implemented in these machines include real-time monitoring and adaptive control systems, which allow for prompt adjustment of operating parameters depending on rock characteristics and equipment specifications. These innovations increase the efficiency of the cutting process and make roadheaders the preferred choice for underground operations where high precision and reliability are required.

Figure 1: Comparison of Roadheader Machine Excavation with Drilling and Blasting in a Mining Environment (Krzystof & Piotr, 2019).

Roadheader Technical Components

Roadheaders are complex machines that consist of several key components that provide efficient excavation, mobility, and material handling. The main systems include the cutting head mechanism, hydraulic and electrical systems, and advanced navigation and control systems.

Cutting Head Mechanism

The cutting head is the primary excavation tool in a roadheader machine, responsible for breaking up rock and soil using mechanical force. It consists of a rotating drum equipped with multiple tungsten carbide picks strategically positioned to maximize cutting efficiency. The cutting process is accomplished through a combination of rotary motion and forward thrust, allowing the machine to effectively penetrate rock masses and break up their structure.
Roadheaders are equipped with two main types of cutting heads: transverse and axial (longitudinal). Transverse heads have a horizontally oriented drum, which makes them particularly effective in soft rock, where wide, uniform cuts facilitate faster excavation. In contrast, longitudinal heads, whose drum rotates about an axis parallel to the boom, provide deeper penetration and high efficiency in hard rock, concentrating the cutting force on a smaller contact area. The choice of cutting head orientation is determined by the geological conditions of the excavation site and the requirements of the specific project.

The efficiency of a cutting head depends on several factors, including tooth geometry, material composition, and cutting force distribution. Proper tooth spacing is critical to optimizing rock fragmentation and preventing excess energy consumption. Research shows that improper tooth spacing can lead to uneven force distribution, accelerated tool wear, and reduced overall excavation efficiency. In addition, advances in materials science, such as the use of polycrystalline diamond coatings, have improved tool wear resistance and extended tool life.

Modern developments in automation and real-time monitoring have further improved cutting head efficiency. Sensor-based systems are now able to analyze cutting force data in real time, allowing automatic adjustments to the cutting parameters to optimize the process. Furthermore, the integration of water jet cutting technologies has proven effective in reducing cutting resistance, minimizing dust, and reducing the risk of cutting failure.

Hydraulic and Electrical Systems

The hydraulic and electrical systems of a roadheader machine play a critical role in its operational efficiency, providing the necessary power for excavation, maneuverability, and control. The hydraulic system is responsible for actuating the cutting head, controlling boom movement, and driving the machine's crawler tracks.
High-pressure hydraulic cylinders adjust the position of the cutting head, providing precise adjustments that improve excavation efficiency. The responsiveness of the hydraulic system directly affects cutting performance, especially in hard rock conditions where greater force is required to penetrate the rock mass.

To optimize energy consumption, modern roadheader machines use variable displacement hydraulic pumps that dynamically adjust fluid flow based on cutting conditions. This allows hydraulic power to be used as efficiently as possible, reducing energy loss and extending the life of the machine. Additional advances in hydraulic drive design have improved machine stability, enabling the machine to maintain consistent cutting performance even in challenging geological conditions.

The machine's electrical system powers key components including the cutter head motor, integrated sensors and lighting. Programmable logic controllers (PLCs) and advanced sensors are integrated into modern machine systems to improve accuracy and automate processes. Variable frequency drive systems dynamically change the cutter head speed, enabling the machine to adapt to changing rock conditions in real time. Remote monitoring capabilities provide continuous feedback on machine performance, facilitating predictive maintenance and reducing unplanned downtime.

Navigation and control systems

The integration of modern navigation and control systems has significantly improved the accuracy and efficiency of roadheaders. Previously, traditional machines required skilled operators to manually adjust cutting parameters based on visual judgment and experience. However, modern machines are equipped with automated guidance systems that use geotechnical sensors, GPS, and laser scanning to improve cutting accuracy. Geotechnical sensors integrated into the machine continuously measure rock properties such as uniaxial compressive strength (UCS) and abrasiveness.
These sensors provide real-time data, allowing adaptive control systems to dynamically adjust cutting force, speed, and tooth rotation.

The use of 3D mapping technologies further improves cutting accuracy by allowing operators to visualize the process and ensure compliance with tunnel or excavation design requirements. Remote control capabilities have also become an important feature of modern machines, improving the safety and efficiency of underground mining operations. Wireless communication systems allow operators to control the machine from the safety of operating rooms, reducing direct exposure to hazardous working conditions. In addition, machine learning algorithms are increasingly being used to analyze cutting patterns and optimize cutting parameters based on historical performance data.

The future of roadheader machine control systems is moving towards full autonomy, where AI-based decision making and robotic drives have the potential to revolutionize underground mining methods. According to Yin (2024), the integration of these advanced automation technologies enables modern roadheader machines to operate with higher excavation efficiency, improved safety, and enhanced adaptability to diverse mining and tunneling conditions. Continued development of automated, sensor-based control and monitoring systems is expected to further enhance the accuracy and reliability of these machines in the coming years.

Classification of Roadheaders

Roadheaders are complex equipment used for various purposes and working conditions, which determines the existence of various types and classifications of this equipment. In the literature, several key criteria for classifying roadheaders are distinguished: by the type of cutting head, machine power, movement method and area of application.
Classification by cutting head type

There are two types of roadheaders according to cutting head type:

Transverse head roadheaders: characterized by a cutting head located perpendicular to the axis of the machine (Figure ). Such roadheaders are most effective when driving large-diameter tunnels and provide an accurate contour of the workings, which makes them suitable for the construction of transport and utility tunnels. Transverse cutting heads, whose action is commonly referred to as the ripping method, are adapted from continuous mining machines and rotate perpendicular to the boom axis. This method is particularly effective in soft rock, where high extraction efficiency and greater adaptability to changing geological conditions are achieved. Transverse machines generate turning forces at right angles to the gripping force, which makes them more stable when cutting rock. The design of these machines allows them to cut rock with strengths up to 100 MPa (15,000 psi), with the most powerful models capable of working at strengths up to 150 MPa (22,000 psi). However, optimum performance is achieved at rock strengths of around 30 MPa (5,000 psi), making them ideal for coal mining, sedimentary work, and soft rock tunneling (Hemphill, 2012).

Axial head roadheaders: have a cutting head located parallel to the longitudinal axis of the machine. This type of roadheader is mainly used for narrow workings, as well as inclined and vertical passages, due to its high stability and lower vibration level. An axial cutting head, also known as an inline drilling head, rotates parallel to the boom axis. This design provides maximum forward cutting force, making it particularly effective in hard rock. Due to the lower cutting speed, axial machines consume fewer picks, which reduces wear and operating costs. They are often equipped with telescopic booms, which provide the necessary force for direct penetration into the rock mass.
In hard rock conditions, axial machines are stabilized by hydraulic jacks or support arms, similar to the outriggers on a crane, which increases cutting stability. However, when working in soft rock, support arms may not be effective enough due to the low strength of the rock, and in wide tunnels, the fixed length of the booms limits maneuverability.

Table 1: General Comparison of Axial vs Transverse Roadheaders (Taylor & Francis Group, 2014)
Parameter | Axial | Transverse
Profile smoothness | Favorable | Unfavorable
Muck loading efficiency | Unfavorable | Favorable
Application limits | For UCS < 60–80 MPa, non-abrasive | Soft to medium-strength rocks (UCS < 100–120 MPa), moderately abrasive
Production rate | Higher for UCS < 40–60 MPa | Higher for UCS > 60–80 MPa

Figure 2: Transverse Roadheader and Axial roadheaders.

Classification by weight

There are several approaches to classifying roadheaders by weight.

According to Tucker's classification (Tucker, 1985):
Light: weight up to 30 t, able to cut rock with a strength of up to 70 MPa.
Medium: weight from 34 to 45 t, able to cut rock with a strength of up to 100 MPa.
Heavy: weight over 45 t, able to cut rock with a strength of up to 120 MPa.

According to Atlas Copco – Eickhoff classification (Schneider, 1988):

Table 2: Atlas Copco – Eickhoff classification (1988)
Class | Weight (tons)
Class 0 | less than 20 t
Class I | 20 to 30 t
Class II | 30 to 50 t
Class III | 50 to 75 t
Class IV | over 75 t

According to Neil et al. (1994) classification:

Table 3: Neil et al. (1994) classification
Class | Weight (tons)
Small | less than 20 t
Medium | 20 to 30 t
Large | 30 to 50 t

Rock cutting mechanics of Roadheaders

The efficiency of roadheader machines is determined by the rock cutting mechanics, which include the interaction between the cutting tools and the rock surface. The ability of the machine to effectively crush rock depends on factors such as rock properties, cutting tool geometry, applied forces, and control systems.
The cutting process is based on mechanical fragmentation, where the picks of the cutting head create local stresses in the rock, which leads to its destruction by crushing, shearing, or tensile tearing.

Rock destruction during machine operation occurs primarily through indentation, crack propagation, and chip formation. When the tooth of the cutting tool penetrates the rock surface, it creates local stress concentrations, leading to the formation of an initial crushing zone directly below the tool. As the penetration depth increases, radial and lateral cracks propagate outward, which ultimately leads to fragmentation of the material. The main failure mechanisms are:
- tensile failure, which occurs when tensile stresses exceed the tensile strength of the rock, causing crack propagation and fragmentation;
- shear failure, associated with displacement of material along shear planes, typical of plastic or layered rocks;
- compressive failure, which occurs when the applied forces exceed the compressive strength of the rock, which leads to its fragmentation; this mechanism predominates in hard and dense rock masses.

The machine's cutting performance depends on a number of geomechanical and operational factors. The uniaxial compressive strength (UCS) of the rock is one of the key parameters: soft rocks with a UCS below 50 MPa are easily fragmented, while hard rocks with a UCS above 100 MPa require greater efforts and specialized tooth configurations. Another important indicator is the Cerchar Abrasiveness Index (CAI), which measures the abrasiveness of the rock and its effect on tooth wear: highly abrasive rocks such as quartzite cause rapid tool wear, which increases maintenance costs and reduces cutting efficiency. The choice of pick geometry and material is critical to cutting efficiency.
Picks with a wide tip angle generate higher cutting forces, making them suitable for hard rocks, while picks with a sharp angle are better suited for soft rocks, reducing energy consumption and improving the degree of fragmentation. Modern picks are made of tungsten carbide, which has high wear resistance and impact resistance, and picks with a polycrystalline diamond coating are successfully used for cutting hard rocks, significantly extending tool life and reducing downtime (Huff, 1980).

In addition, performance is affected by operating parameters such as cutting speed, thrust force, torque and power consumption. Higher cutting speeds improve fragmentation but increase tooth wear, and excessive thrust forces can lead to premature tool failure. Higher torque allows deeper cutting, especially in hard rock conditions.

Modern machines employ advances in automation, real-time monitoring, and adaptive control systems to optimize cutting efficiency. Machine learning algorithms analyze rock properties on the fly and adjust cutting force, tooth angle, and speed to achieve optimal efficiency. Water-jet-assisted cutting technology has been developed to reduce cutting resistance and minimize wear, and vibration-assisted cutting with ultrasonic vibrations can reduce required cutting forces by up to 30%, improving productivity in hard rock conditions (Grosso, 2014).

1.2 Objectives of the Thesis

1.2.1 To develop a predictive model for estimating the performance of roadheaders using machine learning methods, taking into account rock properties, machine specifications, and operational parameters.
1.2.2 To conduct a comprehensive analysis of existing performance prediction methods and identify the most effective approaches for use in conditions of limited and small datasets.

1.3 Justification of the Research

The mining industry is increasingly challenged by the need to improve operational efficiency and safety, particularly in the context of limited and heterogeneous datasets.
Traditional models for predicting the performance of roadheader machines often fail to account for the nonlinear and multifactorial interactions present in complex geological conditions. This shortcoming results in unreliable productivity forecasts and can lead to significant operational delays and increased costs. Furthermore, existing research has largely focused on large datasets, leaving a gap in methodologies that can effectively handle "small data" scenarios.

By integrating advanced machine learning techniques with synthetic data generation, this research aims to bridge that gap. The approach not only leverages the predictive power of ensemble algorithms and neural networks but also enriches the available data with carefully generated synthetic observations. The practical benefits of this study include more accurate productivity forecasts, optimized equipment utilization, and a reduction in the economic risks associated with overestimating machine performance. Scientifically, this research contributes to the field by combining classical mining engineering principles with modern data-driven methods, offering a perspective that could be applied to a range of similar challenges in the mining and construction sector.

1.4 Scope of Work

The primary objective of this research is to develop and validate a predictive model for estimating the performance of roadheader machines under conditions of limited data. To achieve this, the study will:
1. Define the Research Objectives: Focus on the development of machine learning models that incorporate synthetic data generation to overcome the constraints imposed by small sample sizes.
2. Data Preprocessing and Feature Engineering: Describe the methods used for cleaning the dataset, normalizing features, and expanding the feature space with polynomial transformations to capture nonlinear relationships.
3. Synthetic Data Generation: Evaluate three distinct methods (Ridge Regression-based labeling, Gaussian noise-based labeling, and Random Forest-based labeling) to augment the dataset, and assess their impact on model performance.
4. Model Development and Validation: Compare the performance of various predictive models, including linear models, ensemble methods, and neural networks, using standard evaluation metrics such as the coefficient of determination (R²) and mean squared error (MSE).
5. Analysis and Comparison: Analyze the robustness of each model and synthetic data approach, providing insights into the conditions under which the models perform optimally.
6. Limitations and Future Work: Discuss the inherent limitations of synthetic data generation and the challenges associated with small datasets, as well as propose potential directions for further research.

2. LITERATURE REVIEW

2.1. Introduction to the topic and Relevance of the study

With the development of the mining industry and large-scale construction of underground structures (transport tunnels, mine shafts, workings for laying communications), mechanized mining technologies are becoming especially important. One of the key types of equipment in this area is roadheaders. These machines allow for safer and more efficient work compared to the traditional drilling and blasting method. Due to their mobility and the ability to continuously extract rock, roadheaders reduce downtime, cut costs and increase productivity.

However, despite the obvious advantages, the actual cutting rate (Instantaneous Cutting Rate, ICR) can vary greatly depending on many factors: geological (strength, fracturing, abrasiveness of rocks), technical (type of machine, power, cutting head), as well as organizational and technological (shift schedule, availability of qualified personnel, ventilation scheme, etc.). Incorrect performance assessment can lead to schedule delays, budget overruns and increased risks on the project.
Therefore, a reliable ICR forecast is one of the most important elements of planning and economic assessment of future work.

2.1.1 Problem statement: "small data" and complex geology

San Manuel Mine, due to its complex geological structure, represents the classical problem faced by researchers trying to develop ML models for mining purposes. Classical approaches to performance assessment or forecasting widely use empirical formulas (Bilgin et al. 1990, Copur et al. 1998, Thuro & Plinninger 1990, etc.), which take into account key rock parameters (UCS, RQD, RMR) and the characteristics of the machine itself. However, the versatility and accuracy of such models are often questioned. The main limitation is that each formula is obtained for specific conditions (a certain type of rock, class of equipment) and can give large errors outside the "native" range of values.

In recent years, machine learning (ML) and data mining methods have been increasingly used to solve engineering problems. For ICR prediction, the most interesting are artificial neural networks (ANN), support vector machines (SVM), decision trees, random forests, and ensemble methods (boosting, bagging). These methods are better at capturing nonlinear dependencies and, given a sufficient set of examples, can learn these relationships directly from data, increasing the accuracy of forecasts.

However, the mining industry often faces the problem of "small data": there are few real projects with a full range of measurements, mining and geological conditions are often different, and some information may be unavailable due to commercial secrecy and other restrictions. As a result, many ML models begin to overfit and lose accuracy because the sample is too small. Thus, the problem comes down to the need to create methods that can build reliable forecasts even with a small amount of input data, and also take into account the heterogeneity of geological and operational factors.
In this context, methods of synthetic data augmentation, model regularization, and hybrid approaches (combining empirical formulas with ML) are of particular interest.

2.1.2 Purpose and objectives of the literature review

The main purpose of the review is to analyze existing approaches to predicting the productivity of roadheaders and to identify key trends and “blank spots” in research. To achieve this goal, the following tasks are addressed:
● Characterize the role and types of roadheaders, showing their importance for modern underground construction and mining.
● Analyze modern approaches that use machine learning, evaluating their effectiveness and typical problems (in particular, small data sets).
● Formulate the main gaps in research that require more in-depth study or a new methodology.
● Justify the choice of a specific direction (for example, the use of ML algorithms with a limited sample) for further research work.

Structure of the further presentation

The literature review is divided into several subsequent sections:
● Section 2 briefly describes the evolution of roadheaders and the key technical parameters that affect their performance.
● Section 3 is devoted to the analysis of traditional empirical and statistical methods for predicting ICR.
● Section 4 considers the use of machine learning methods, including neural networks, SVR and ensemble approaches, as well as the specifics of their application in mining.
● Section 5 contains a critical analysis of the state of research, discussing the issues of “small data” and the variability of geological conditions.
● Section 6 summarizes the review, highlighting the directions that will form the basis for the methodological part and future experiments.
2.2 Key Parameters Affecting the Performance of Roadheader Machines

In mechanized excavation, the rate of rock extraction and the efficiency of work (Instantaneous Cutting Rate, ICR) depend on a combination of geological, technical and organizational factors. This section considers the main parameters that, according to a number of studies (Bilgin et al. 1990, Copur et al. 1998, Ebrahimabadi et al. 2011, Seker & Ocak 2011, etc.), have the most significant impact on the result.

Geological and geomechanical factors

Uniaxial compressive strength (UCS): UCS is considered one of the main parameters for assessing rock strength and the difficulty of breaking the rock. The higher the UCS value, the more difficult the cutting process and the lower the potential productivity of the machine, all other things being equal. Various authors (e.g. Balci et al., 2004) point out that in most empirical models UCS enters as a linear or power function to predict ICR.

Rock Quality Designation (RQD) and Classification Systems:

RQD (Rock Quality Designation): reflects the degree of fracturing of the rock mass. With a high RQD, the rock is relatively intact, which makes it harder to break; with a low RQD, the rock mass is more fragmented, and excavation can proceed faster.

Rock Mass Rating (RMR) or similar systems: take into account UCS, fracture frequency and orientation, joint condition, etc. The higher the class (good rock mass quality), the more difficult mechanical cutting becomes. Therefore, in some models, such as those of Bilgin et al. (1988) and Ebrahimabadi et al. (2011), a high RMR correlates with lower ICR values.

Analysis of oriented fracturing and soil parameters

The orientation of cracks (angle α between the working axis and the planes of weakness) can significantly change the way the rock breaks. For example, with a favorable orientation of the layers, the cutting process is facilitated.
Abrasivity: the presence of quartz or other hard minerals in the rock accelerates the wear of the machine's cutters, reducing efficiency and increasing tool costs.

Humidity and water saturation: if the rock mass is saturated with water, the stability of the roof and side walls may deteriorate. In addition, a high water content sometimes reduces the strength of the rock but, on the other hand, may require additional drainage and complicate the organization of work.

Thus, these parameters (UCS, RQD, RMR, fracture characteristics, water saturation, etc.) have a combined effect on the instantaneous cutting rate (ICR). The more accurate the values obtained during engineering and geological surveys, the more reliable the forecast for the roadheader.

2.3 Machine learning and its implementation in the mining industry

In the last decade, machine learning (ML) technologies have spread rapidly in the mining sector, playing a significant role in digital transformation. One of the key reasons has been the development of computing power and data collection tools. Processing large volumes of information from numerous sensors and automated monitoring systems has allowed ML algorithms to find hidden nonlinear dependencies and form accurate predictive models. For example, Mahdevari et al. (2014), studying the relationship between the geomechanical properties of rock and cutting speed, showed that neural networks can outperform classical regression methods in complex geology.

In the mining industry, machine learning is actively used in several main areas. Firstly, it is used to predict the productivity of mining equipment (including roadheaders), where ANN models or support vector machines (SVM) look for patterns between rock characteristics (UCS, RQD, etc.) and the actual rate of advance. Secondly, it is used to analyze the condition of equipment to prevent accidents (Predictive Maintenance).
Such approaches use data from vibration diagnostics and from acoustic and thermal sensors, which allows failures to be detected early and downtime to be prevented. Thirdly, ML technologies are integrated into mine management systems: algorithms optimize haulage routes and the allocation of excavators and dump trucks, thereby increasing the overall efficiency of the mine transport system.

Despite advantages such as the ability to identify deep nonlinear dependencies and to adapt quickly to new data, machine learning methods face the problem of “small data”. In the mining sector, large datasets are often unavailable: measurements are taken in unique geological conditions, and storage formats and levels of detail differ. This entails the risk of model overfitting and makes it impossible to validate the results properly. Various approaches to these problems are proposed in the scientific literature. Those discussed by Seker & Ocak (2019) include regularization (dropout, L2 regularization), generation of synthetic data based on the statistical properties of the original samples, and hybrid approaches that combine classical mining engineering models with learning algorithms.

Thus, machine learning provides significant advantages in the analysis and forecasting of underground and open-pit mining processes. However, to achieve high accuracy and reliability, both the technical features of the algorithms and the specifics of the geological data must be taken into account. In particular, to compensate for the lack of observations, ensemble learning (bagging, boosting) or transfer learning can be used, in which a model trained on one data set is further trained on another with similar characteristics. Such a flexible approach provides a more general solution for the variable operating conditions of mining equipment (Ebrahimabadi et al., 2011).
2.4 Analysis of existing studies on ICR prediction for roadheaders

Early work by Sandbak (1985) attempted to relate the instantaneous cutting rate (ICR) of a roadheader to rock parameters (usually UCS and a fracturing index) and the installed power. The models remained relatively simple and did not reflect the multifactorial nature of the cutting process in complex geology.

Gehring (1989) presented formulas for two types of roadheaders.

For a transverse roadheader (power ≈250 kW):
$$\mathrm{ICR} = \frac{719}{\sigma_c^{0.78}} \tag{1}$$
For an axial roadheader (power ≈230 kW):
$$\mathrm{ICR} = \frac{1739}{\sigma_c^{1.13}} \tag{2}$$
where σc is the uniaxial compressive strength of the rock (MPa). The higher the σc, the lower the ICR value.

Later, Bilgin et al. (1990) used more detailed empirical relationships. In particular, the study proposed the formula:
$$\mathrm{ICR} = 0.28 \times P \times (0.974)^{\mathrm{RMCI}} \tag{3}$$
where P is the cutting head power (hp). The RMCI index is calculated as follows:
$$\mathrm{RMCI} = \sigma_c \times \left(\frac{\mathrm{RQD}}{100}\right)^{2/3} \tag{4}$$
where σc is the UCS (MPa) and RQD is the Rock Quality Designation (%). It was assumed that, for a fixed power, an increase in RQD often leads to a decrease in ICR, since the rock becomes more intact.

In the work of Copur et al. (1998), the emphasis shifted to the machine weight and specific energy. The authors derived the relationship:
$$\mathrm{ICR} = 27.511 \times \exp(0.0023 \times \mathrm{RPI}) \tag{5}$$
where RPI = (P × W) / σc, P is the power (kW), W is the weight of the machine (tons), and σc is the UCS. It was shown that more massive machines (at equal power) better stabilize the cutting process in hard rocks.

Thuro & Plinninger (1999), using the example of a 132 kW machine, found the trend relationship:
$$\mathrm{ICR} = 75.7 - 14.3 \times \ln(\sigma_c) \tag{6}$$
where σc (MPa) is the uniaxial compressive strength. The model gave acceptable results in the given UCS range (about 30–100 MPa).
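The relationships in Equations (1) to (6) can be written as short functions, which makes their sensitivity to UCS easy to inspect. The sketch below is illustrative only: the function names are hypothetical, the equations are read as the quotients and powers described in the surrounding text, and each formula should be applied only to the machine class and rock range for which it was derived.

```python
import math


def icr_gehring_transverse(ucs_mpa):
    """Eq. (1): transverse roadheader, about 250 kW."""
    return 719.0 / ucs_mpa ** 0.78


def icr_gehring_axial(ucs_mpa):
    """Eq. (2): axial roadheader, about 230 kW."""
    return 1739.0 / ucs_mpa ** 1.13


def rmci(ucs_mpa, rqd_percent):
    """Eq. (4): rock mass cuttability index from UCS and RQD."""
    return ucs_mpa * (rqd_percent / 100.0) ** (2.0 / 3.0)


def icr_bilgin(power_hp, ucs_mpa, rqd_percent):
    """Eq. (3): cutting head power in hp, RMCI from Eq. (4)."""
    return 0.28 * power_hp * 0.974 ** rmci(ucs_mpa, rqd_percent)


def icr_copur(power_kw, weight_t, ucs_mpa):
    """Eq. (5): RPI = (P x W) / UCS."""
    rpi = power_kw * weight_t / ucs_mpa
    return 27.511 * math.exp(0.0023 * rpi)


def icr_thuro_plinninger(ucs_mpa):
    """Eq. (6): trend for a 132 kW machine, roughly 30-100 MPa."""
    return 75.7 - 14.3 * math.log(ucs_mpa)


if __name__ == "__main__":
    # Compare the UCS-only formulas at two strengths (values chosen for illustration).
    for ucs in (50.0, 100.0):
        print(
            ucs,
            round(icr_gehring_transverse(ucs), 2),
            round(icr_gehring_axial(ucs), 2),
            round(icr_thuro_plinninger(ucs), 2),
        )
```

All of these functions decrease as UCS grows, which matches the qualitative statement in the text that stronger rock lowers the cutting rate.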
Since the early 2000s, ML technologies have been used increasingly (Yagiz et al., 2009; Mahdevari et al., 2014), but empirical methods have also continued to develop. Thus, Balci et al. (2004) consider the cutting depth (d).

For d = 5 mm:
$$\mathrm{ICR} = 0.8 \times \frac{P}{0.37\,\sigma_c^{0.86}} \tag{7}$$
For d = 9 mm:
$$\mathrm{ICR} = 0.8 \times \frac{P}{0.41\,\sigma_c^{0.67}} \tag{8}$$
where P is the power (kW) and σc is the UCS (MPa). The value of 0.8 was determined empirically.

Keles (2005) also proposed an expression for the MK2B (milling type) roadheader:
$$\mathrm{ICR} = 163.93 \times \sigma_c^{-0.5737} \tag{9}$$

Among more modern variants, Ebrahimabadi et al. (2011) can be noted. The authors applied the rock mass brittleness index (RMBI) and took into account the orientation of the layers. One of the models has the form:
$$\mathrm{ICR} = 5.56 \times \mathrm{RMBI} + 0.60 \times a - 8.17 \tag{10}$$
where a is the bedding angle of the layers, and RMBI is calculated from UCS, RQD and rock brittleness indices. Another formula relates ICR to the specific energy (SE):
$$\mathrm{ICR} = -0.18 \times SE^{3} + 28.57 \times SE - 92.82 \tag{11}$$

Earlier, the specific energy (SE) approach was also developed by Rostami et al. (1994), who proposed:
$$\mathrm{ICR} = k \times \frac{P}{SE} \tag{12}$$
where k is the energy transfer coefficient (0.45–0.55 for a roadheader), P is the power (kW), and SE is the specific energy (kWh/m³).

Finally, Choudhary et al. (2017) can be mentioned, who reused the cubic dependence on SE, similar to Ebrahimabadi et al. (2011). This emphasizes the trend towards a more “energy-based” view of the rock cutting process.

Thus, ICR models have evolved from the simplest dependencies, such as those of Gehring (1989), to models with an extended set of rock-mechanical indicators, e.g. Bilgin et al. (1990) and Copur et al. (1998), and then to composite indices of the rock mass structure, as in the work of Ebrahimabadi et al. (2011), and specific energy characteristics, as in Rostami et al. (1994).
Today, many authors combine these formulas with machine learning methods, which allows them to refine the coefficients and increase the reliability of models when the range of conditions is expanded.

Key Works Focusing on Geotechnical parameters

A significant part of the early ICR prediction formulas was built around UCS (uniaxial compressive strength). For example, Gehring (1989) derived an equation for a 250 kW roadheader (transverse) in the form:
$$\mathrm{ICR} = \frac{719}{\sigma_c^{0.78}} \tag{13}$$
and for 230 kW (axial):
$$\mathrm{ICR} = \frac{1739}{\sigma_c^{1.13}} \tag{14}$$
where σc is the UCS (MPa). An increase in σc leads to a decrease in ICR.

Another common parameter is RQD (Rock Quality Designation). For example, Bilgin et al. (1990) included it in the formula:
$$\mathrm{ICR} = 0.28 \times P \times (0.974)^{\mathrm{RMCI}} \tag{15}$$
where P is the cutting head power (hp). The RMCI index is calculated as follows:
$$\mathrm{RMCI} = \sigma_c \times \left(\frac{\mathrm{RQD}}{100}\right)^{2/3} \tag{16}$$
At higher RQD (more intact rock), productivity tends to decrease at a fixed power P.

The third basic parameter is RMR (Rock Mass Rating), which summarizes UCS, RQD, and the orientation and condition of cracks. An increase in RMR indicates a higher quality rock mass, which complicates mechanical cutting and, accordingly, reduces ICR. The problem is the ambiguity of RMR calculation methods: different authors can use slightly different scales and weighting factors.

UCS, RQD and RMR therefore form a triad that is encountered most often in predictive models. However, there is no single universal equation that can accurately account for all the diversity of geology. Many researchers have to combine classical rock-mechanics formulas with correction factors or pay attention to the specifics of the equipment, such as the mass of the machine and the design of the cutting head.

Comparison of Machine Learning Approaches

Since the early 2000s, machine learning (ML) methods, which are capable of analyzing the relationships between several dozen parameters at once, have become increasingly popular.
The most frequently mentioned are ANN (artificial neural networks), SVR (support vector regression), Random Forest, Gradient Boosting and other ensemble methods. Seker & Ocak (2019) compared a range of such algorithms on a dataset that included UCS, RQD, machine weight, cutting head power, and actual ICR values. Ensemble methods (Random Forest, Gradient Boosted Trees) gave 5–10% higher accuracy than single models (e.g., a simple neural network or linear regression).

Ebrahimabadi et al. (2011) used a hybrid approach, combining empirical formulas with the concept of RMBI (Rock Mass Brittleness Index). For example, one of the models is:
$$\mathrm{ICR} = 5.56\,\mathrm{RMBI} + 0.6\,a - 8.17 \tag{17}$$
where a is the bedding or fracturing angle, and RMBI is calculated using UCS, RQD and brittleness indices. Further optimization of the coefficients was carried out using learning algorithms (ANN, SVR), which increased the accuracy.

The main advantage of the ML approach is its flexibility and its ability to “learn” from data from different deposits. However, a high-quality result requires careful tuning of hyperparameters and a sufficient number of observations, which is not always achievable in mining.

Experience with small data sets

Data limitations are one of the most pressing issues in building predictive ICR models. Performance measurements are usually taken over narrow intervals and depend strongly on the unique local geology. This often leads to overfitting and reduced reliability of forecasts. Salsani et al. (2013) and Seker & Ocak (2019) proposed several strategies to improve the situation:

Synthetic sample expansion: introducing “artificial” points around existing values (based on KNN).

Cross-validation (k-fold or Leave-One-Out): allows each observation to be used as a test point in turn, increasing the objectivity of the accuracy assessment.
Regularization: in neural networks, this can be dropout or L2 regularization, and in trees, a depth limit or a minimum number of observations per leaf.

Such techniques help keep the model from overfitting to noisy data and make it more robust to the variability of mining conditions. The final results are usually better than those of classical empirical equations, which are designed only for a narrow range of input parameters.

2.5. Critical analysis and identified gaps

This part of the research aims to identify the gaps in previous work. It gives a brief analysis of studies based on empirical models and of more recent work on ML applications.

Analysis of the accuracy and limitations of various models

Some researchers (Bilgin et al., 1990; Copur et al., 1998) emphasized the importance of a comprehensive account of mining and geological factors (UCS, RQD, RMR), but their formulas often agree well with observations only within the range of the original data. When such empirical dependencies are extended to other types of rock or other types of tunneling machines (differing in weight or power), the accuracy drops significantly. These limits on generalization, in the language of machine learning, arise because each model is "tied" to a specific set of conditions and features.

Another obstacle to comparing results is the variety of methods used to process the initial data. Some authors include rock abrasiveness in the calculation (CERCHAR, CAI), while others do not take this parameter into account at all. A similar problem concerns the method of measuring (or calculating) RMR: there are different "branches" of the classification (Bieniawski 1973 vs. the modifications of 1989, 1993, etc.), which leads to variations in the assessment of the rock mass. As a result, direct comparison of models, or their combination, may be incorrect if the feature spaces are not consistent.
Disadvantages and advantages of research for "small data"

In mining, the set of empirical ICR measurement points is often limited to tens of observations, and the measurements themselves may contain noise or errors. Under such conditions, even the most advanced machine learning algorithms risk overfitting, i.e. fitting the model to random fluctuations instead of stable patterns. In addition, there is often no independent test sample: all available data are used in training, and it becomes difficult to assess the real accuracy of predictions correctly.

Insufficient variability of geological conditions also plays a role. If, for example, the entire sample is taken from one mine or a single site, the model can "memorize" these specific parameters. When it is transferred to another deposit (with different UCS, RQD, fracture orientation), the forecast accuracy decreases sharply.

Despite these risks, research in the field of "small data" also has a positive side: it stimulates the search for adaptive or hybrid approaches in which basic empirical models and modern ML methods are combined. This can give a more practical result than universal formulas calibrated on thousands of observations, because it is difficult or almost impossible to obtain so many observations in a real project.

The need for an integrated approach

Modern practice shows that one linear (or purely empirical) model is not enough to take into account all the aspects that determine the instantaneous cutting rate of a roadheader. Many authors, such as Ebrahimabadi et al. (2011) and Seker & Ocak (2019), emphasize the benefits of combining the classical geomechanical equations of Bilgin et al. (1990), Copur et al. (1998) and Rostami et al. (1994) with machine learning methods. An empirical formula or specific indices (RMBI, SE) provide the supporting “physical logic”, and the ML model refines the coefficients and better adapts to the local features of the rock mass.
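One simple way to picture this refinement of coefficients is to keep the functional form of an empirical equation and re-estimate its constants from site data. The sketch below assumes a power-law form ICR = a × σc^b, the same shape as the Gehring (1989) equations, and fits a and b by ordinary least squares in log-log space. The function name and the sample points are hypothetical: the points are generated from the transverse-roadheader equation purely to check that the procedure recovers its constants, and they are not field measurements. It is not presented as the method of any cited author.

```python
import math


def fit_power_law(ucs_values, icr_values):
    """Fit ICR = a * UCS**b by least squares on log-transformed data."""
    xs = [math.log(u) for u in ucs_values]
    ys = [math.log(v) for v in icr_values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                       # exponent of the power law
    a = math.exp(y_mean - b * x_mean)   # multiplicative constant
    return a, b


# Hypothetical check data: exact values of ICR = 719 / UCS**0.78.
ucs = [40.0, 60.0, 80.0, 100.0, 120.0]
icr = [719.0 / u ** 0.78 for u in ucs]

a, b = fit_power_law(ucs, icr)
print(f"a = {a:.2f}, b = {b:.3f}")
```

With real site observations in place of the generated points, the fitted constants would differ from the published ones, and that difference is one way to see how strongly local conditions depart from those behind the original formula.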
Additionally, expert opinion (soft computing) can be used when working with incomplete or fuzzy data. For example, when UCS, RQD and other key indicators are scarce, a fuzzy description can be used instead. Expert systems based on the knowledge of geology specialists allow for high-quality model calibration where classical statistical learning is inapplicable due to a lack of numerical data.

Thus, the main problem identified in most studies is the lack of a single “universal” solution capable of producing stable results on any sample. The most promising path is a combination of several approaches, from traditional rock-mechanics equations to hybrid ML algorithms that take expert assessments into account. Such a comprehensive path offers a chance not only to improve the accuracy of forecasts, but also to ensure that the model adapts to new or poorly studied conditions.

2.6. Future research directions

Integration with additional technologies

Current research tends to connect roadheaders with complex sensor systems and real-time data analysis systems. Li & Gao (2021) showed that the use of laser scanners and 3D face visualization can significantly improve the accuracy of automatic assessment of the excavation geometry and, accordingly, influence the choice of the optimal cutting mode. In parallel, the idea of continuous monitoring of machine parameters is being developed: Zhang et al. (2022) implemented a set of sensors to record acoustic signals and vibrations of the cutters in order to detect the transition from “normal” operation to potential emergency modes. These data were integrated with machine learning algorithms that quickly update the forecast of the instantaneous cutting rate (ICR).

Given a sufficient volume of accumulated observations, a number of authors (Mahdevari et al., 2014; Ghasemi et al., 2020) point to the prospects of deep learning. For example, LSTM or transformer architectures can use time series recorded during mining.
So far, these methods remain relatively rare in mining because of the difficulty of generating “big data”, but with the advent of larger datasets the results can be very promising.

Strengthening the robustness of ML models

Many studies, such as Chang & Peng (2020), emphasize the need for regularization and optimization of models to cope with noisy and heterogeneous mining data. One of the key techniques is strict cross-validation (k-fold), which allows for a reliable assessment of the generalization ability of the algorithm with a small number of observations.

An important direction is the introduction of hybrid methods. For example, Seker & Ocak (2019) proposed ensembles (bagging, boosting) for working with a small set of training data. Other authors (Salsani et al., 2013; Ghasemi et al., 2020) are experimenting with combining classical neural networks and evolutionary algorithms (genetic algorithms, particle swarm optimization) for selecting hyperparameters. This strategy makes it possible to find suitable settings for network depth, activation function, etc. quickly, reducing the risk of overfitting.

Bayesian regression approaches deserve special mention, as they allow prior distributions to be placed on UCS, RQD, or other features when the data are too sparse (Li et al., 2022). Such models not only predict ICR, but also generate a probability interval, which provides a more complete picture of uncertainty.

Model transferability

Transferability (or transfer learning) becomes an important factor when a model trained on one deposit must adapt quickly to other locations. Zhang et al. (2022) showed experimentally that, given a “base” neural network for a narrow range of UCS, an additional 10-15 points from a new deposit are enough to fine-tune the network and achieve accuracy comparable to “pure” training on a large dataset.

Alternatively, Chang & Peng (2020) point to the idea of forming a “universal” set of features (e.g.
UCS, RMR, CAI, machine weight, power, cutter head type), standardized by measurement methods. This will allow researchers from different regions to “put” data into a common database and obtain more universal ML models. In the long term, such a step will accelerate the implementation of excavation optimization algorithms already at the design stage, where an engineer will be able to pre-calculate the expected ICR without expensive field tests.

To summarize, further research should cover at least three areas of development: integration with real-time sensor data, enhancing the robustness of ML models using regularization and hybridization, and increasing portability through transfer learning and unification of the methods for measuring mining and geological parameters.

2.7. Literature Review Summary

Summary of Key Results

The analysis of existing work on forecasting the instantaneous cutting rate (ICR) of roadheaders revealed that the methods developed from simple empirical formulas, such as those of Gehring (1989), Bilgin et al. (1990) and Copur et al. (1998), which take into account mainly UCS and RQD, to more complex models reflecting many factors: crack orientation, abrasiveness, and design features of the machine itself. Meanwhile, machine learning methods (ANN, SVR, ensemble approaches) are gaining popularity because they capture nonlinear relationships in data better and expand the applicability of models.

An important condition for high-quality ICR forecasting remains the presence of a representative observation base. In mining practice, however, data volumes are often small, which makes it difficult to train models and to assess their generalization ability. To overcome these limitations, synthetic data generation, strict cross-validation and regularization are used.

Justification of the Selected Scientific Direction

Summarizing the results of the review, several critical gaps can be identified.
Firstly, most empirical formulas (Bilgin et al. 1990, Copur et al. 1998, Thuro & Plinninger 1999) are designed for a narrow range of geological conditions or for certain types of machines, which means that their accuracy drops sharply when the deposit changes or when a new type of roadheader is used. Secondly, even multifactor models (taking into account RMR, RMBI, SE) remain vulnerable to noisy data and fluctuations in geological properties.

On the other hand, machine learning shows promise when the data are sufficiently varied and numerous. However, “small data” complicates the use of deep neural networks and requires improved techniques to combat overfitting. Therefore, it is logical to move towards an integrated approach, in which nonlinear features are added to classical rock-mechanics parameters, and regularization and synthetic generation mechanisms are incorporated into the model.

Transition to methods and experimentation

Based on the identified gaps, a methodological part is proposed in which the main focus is on methods for processing limited data sets and expanding the feature space. In particular:

Procedures for generating artificial observations (data augmentation) will be used to increase the model's resistance to noise.

Second-order polynomial features reflecting potential nonlinear interactions (e.g., UCS × RQD, RMR², etc.) will be introduced.

Various machine learning algorithms (neural networks, SVM, ensembles) will be used to train the models, combined with a set of regularization measures and strict cross-validation.

Therefore, the next chapter details the methodology based on the results of the literature review: which data features are taken into account, how the hyperparameters of the ML models are selected, and why the proposed techniques are especially important for predicting ICR from a limited sample.

3. CASE STUDY – SAN MANUEL MINE, ARIZONA USA

The San Manuel Mine, located in Arizona's Lower San Pedro River Basin, was a significant underground copper mine and a cornerstone of the region's mining industry. Operational from the 1950s, the mine used various methods, including block caving, open-pit mining, and in-situ leaching, to adapt to technological advances and changing resource extraction demands. However, due to economic constraints and resource depletion, the mine ceased operations in 1999 (BHP Copper Inc., 2002).

Geologically, the mine is characterized by an ore body hosted in granodiorite porphyry and quartz monzonite, intersected by fault systems such as the San Manuel, East, and West Faults. These structural features provided pathways for mineralization but also posed operational challenges due to zones of weakness and variability in rock strength. The depth of the ore deposits, which extended from 700 to 3,000 feet below the surface, further complicated mining operations (Arizona Geological Society, 1987).

In response to these challenges, roadheader technology was tested and used at the San Manuel Mine to improve drift excavation. Roadheaders, such as the DOSCO SL-120, were employed to overcome the limitations of conventional drill-and-blast techniques, particularly in fractured and jointed rock. The technology achieved faster excavation rates, reduced overbreak, and improved drift stability. Tests conducted in the San Manuel and Kalamazoo ore bodies revealed that roadheaders could be a viable alternative, particularly in areas where structural complexity and variable rock properties called for precise excavation (Sandbak, 1985).

Figure 3. Geologic Map of San Manuel area (Schwartz, 1953).

Figure 4. RMR table and Machine Performance Graph

Table 4: Dataset Utilized for Model Establishing

#    Geology Description   RQD (%)   RMR (-)   UCS (MPa)   ICR (m^3/hr)
1    Good Rock             72        70        200         8.309
2    Good Rock             70        68        170         6.492
3    Good Rock             62        64        150         8.547
4    Poor Rock             35        34        120         12.115
5    Poor Rock             50        37        137         15.016
6    Poor Rock             58        31        116         17.406
7    Poor Rock             25        24        127         18.789
8    Poor Rock             38        27        172         18.491
9    Poor Rock             53        28        162         19.199
10   Very Poor Rock        5         15        130         20.06
11   Very Poor Rock        18        18        100         21.099
12   Very Poor Rock        3         15        75          21.807
13   Poor Rock             19        27        60          22.167
14   Poor Rock             28        31        88          18.997
15   Poor Rock             42        34        103         13.809
16   Poor Rock             22        25        95          12.317
17   Poor Rock             39        26        90          14.357
18   Poor Rock             39        29        92          16.567
19   Poor Rock             55        30        85          19.45
20   Poor Rock             18        27        90          16.446
21   Fair Rock             54        45        115         13.952
22   Fair Rock             52        44        130         14.665
23   Fair Rock             63        49        157         12.182
24   Fair Rock             52        44        130         11.044
25   Fair Rock             50        48        145         12.596
26   Poor Rock             7         34        70          12.79
27   Poor Rock             47        40        115         13.151
28   Poor Rock             44        38        120         14.193

Roadheader Operational Features

The San Manuel mine has served as a testbed for roadheader technology. Field studies, including those conducted by Sandbak, have demonstrated that roadheaders significantly reduce the risk of fracturing and excessive rock failure compared to conventional blasting methods. In the Kalamazoo and San Manuel deposits, roadheading achieved development rates up to 38% higher, especially in weak, highly fractured, and hydrothermally altered rocks.

Detailed geomechanical mapping of the test levels (e.g., the 2890 and 2375 ft tests) revealed that the different rock classes, from weak, highly fractured porphyry to massive quartz monzonite, have a direct impact on cutting rates (in ft/hour) and cutting tool consumption (bits per foot). The results of these tests were used to calibrate the predictive models, confirming that ensemble machine learning methods are particularly effective in describing the complex nonlinear behavior of rock masses during roadheader operations.

The San Manuel case study demonstrates the practical benefits and challenges of applying predictive models to the mining industry.
Heterogeneous geological settings, complex structural settings, and associated sedimentation require advanced data synthesis techniques and robust ensemble models. The analysis suggests that ensemble methods significantly outperform simple linear regressions in predicting roadheader performance in such variable environments. In addition, geomechanical observations provide an important link between model output and real mining conditions, which is critical for optimizing mine planning, reducing costs, and ensuring mine safety.

4. METHODOLOGY

This chapter describes the methodological approach used to predict the ICR (roadheader machine performance) with machine learning methods. It covers the stages of data collection and description, pre-processing and feature formation, the methods for generating synthetic data, and the selected models and the strategy for their evaluation.

4.1 Data collection and description

The research on tunneling and drift excavation at San Manuel, performed by Louis Sandbak in 1985, contains a complete dataset with the geotechnical parameters required for model development. This dataset includes measurements of the Rock Quality Designation (RQD), the RMR value, and the uniaxial compressive strength (UCS), as well as the target variable, ICR, which characterizes the equipment productivity in m³/hour. The section describes the data source, the conditions under which the measurements were collected, and the main characteristics and statistics of the dataset. Such a detailed analysis makes it possible to understand the original distribution of values and to identify potential problems, such as missing values or outliers.

4.2 Data Preprocessing and Feature Generation

In this study, data preprocessing and feature generation play a key role, since the success of subsequent modeling depends on the quality of the input data.
In the first step, the data were extracted from the original source: values from the RMR table and the Machine Performance graphs were read off and digitized. Careful handling at this stage creates a reliable base for further analysis.

The next important step is data normalization. In the original dataset, the variables are measured in different units (for example, RQD is expressed in percent, UCS in MPa, and ICR in cubic meters per hour), which can lead machine learning algorithms to give too much weight to features with a large numeric range. To address this issue, standard scaling was applied, transforming all features so that their mean is zero and their standard deviation is one. This not only speeds up the convergence of the algorithms, but also puts all variables on a comparable footing.

In addition, to enable the models to capture more complex, nonlinear relationships between the features and the target variable, the original feature space was expanded using a polynomial transformation. Specifically, second-degree terms (the square of each original variable and the pairwise products between variables) were generated, revealing patterns not reflected in the original features. Although adding polynomial features increases the dimensionality of the dataset, the improved ability of the model to describe complex dependencies outweighs the additional computational cost.

Thus, the preprocessing includes removing incomplete records to ensure data integrity, normalizing features to bring them to a common scale, and creating additional polynomial features to reveal nonlinear relationships. This approach forms a solid foundation for the subsequent stages of analysis, synthetic data generation, and model training, contributing to the accuracy and reliability of predictions.
4.3 Synthetic Data Generation

Due to the limited size of the original dataset, synthetic data generation methods are used to improve the learning ability of the models. Three methods are described below, each with its mathematical formulation.

1. Ridge Regression-based labeling method. The idea of this method is to use a linear model with L2 regularization to predict the target variable on a bootstrapped sample of features. Mathematically, the Ridge regression problem is formulated as the minimization of the loss function

\[ \lVert X\beta - y \rVert^2 + \lambda \lVert \beta \rVert^2, \tag{18} \]

where $X$ is the feature matrix, $y$ is the vector of target values, $\beta$ is the vector of model coefficients, and $\lambda$ is the regularization coefficient. After training the model on the original data, synthetic labels are calculated for the bootstrapped (randomly selected with replacement) subsample $X_{\text{boot}}$:

\[ y_{\text{synthetic}} = X_{\text{boot}} \hat{\beta}. \tag{19} \]

To simulate natural variability, a small amount of noise is added to the features, for example,

\[ X'_{\text{boot}} = X_{\text{boot}} + \varepsilon, \tag{20} \]

where $\varepsilon \sim N(0, \sigma^2 I)$. This approach preserves the linear dependencies present in the data and allows the training sample to be expanded.

2. Gaussian Noise generation method. This method generates synthetic labels by adding random noise to the target values obtained by resampling. Formally, for a resampled target variable $y_{\text{resample}}$, the synthetic label is defined as

\[ y_{\text{synthetic}} = y_{\text{resample}} + \epsilon, \tag{21} \]

where the noise $\epsilon$ is normally distributed, $\epsilon \sim N(0, \sigma^2)$. Here, the parameter $\sigma$ controls the degree of random fluctuation, allowing the synthetic data to mimic the natural fluctuations found in real measurements.

3. Random Forest Regressor-based method. Using the Random Forest model, an ensemble of decision trees is trained to predict the target variable.
Mathematically, Random Forest estimates a function $f(x)$ such that

\[ y \approx f(X) = \frac{1}{T} \sum_{t=1}^{T} f_t(X), \tag{22} \]

where $T$ is the number of trees and $f_t(X)$ is the prediction of the $t$-th tree. After training, for the bootstrapped feature set $X_{\text{boot}}$, the synthetic labels are computed as

\[ y_{\text{synthetic}} = f(X_{\text{boot}}). \tag{23} \]

Similarly to the first method, a small amount of noise can be added to the features to increase diversity:

\[ X'_{\text{boot}} = X_{\text{boot}} + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I). \tag{24} \]

This approach takes into account the complex nonlinear dependencies present in the data and generates synthetic labels that stay close to the real ones.

Figure 5: Visualization for Random Forest Regressor Labeling

In this study, a wide range of machine learning models was considered for the ICR forecasting problem in order to find the best balance between accuracy and interpretability. Linear methods were analyzed first (Ridge, Lasso, and ElasticNet), which are well suited to situations where simplicity and good generalization with a large number of features are important. Ridge regression (L2 regularization) shrinks the coefficients and reduces overfitting, while Lasso (L1 regularization) can set the coefficients of insignificant features to zero and thus perform feature selection. ElasticNet represents a compromise, combining the properties of both approaches and often giving more stable results, especially in problems with many correlated variables. However, given the complexity of the rocks and potential nonlinearities in the data, the analysis was supplemented with ensemble methods: Random Forest, Gradient Boosting, and Extra Trees. They build multiple decision trees and combine their results, which allows them to model complex relationships and deal with noise more effectively.
For example, Random Forest averages the results of individual trees trained on bootstrapped samples, which reduces the risk of overfitting and increases robustness to outliers. Gradient Boosting sequentially corrects the errors of previous trees, which helps to better capture subtle patterns in the data, and Extra Trees adds further randomness to the tree construction process, which can make it more effective on heterogeneous datasets.

To account for nonlinear relationships, Support Vector Regression (SVR) and a neural network in the form of a multilayer perceptron (MLP) were included. SVR with an appropriate kernel (e.g., RBF) can handle highly nonlinear functions, and a neural network, with suitable architecture and training settings, can outperform other models in problems where the ability to generalize to complex distributions is important. Finally, to assess how much better all these approaches are than the trivial option, the DummyRegressor (ZeroR) model, which simply predicts the mean value of the target variable, served as the baseline.

The training process for each model consisted of several stages. First, synthetic datasets were generated (using the three methods) to expand the training set and make it more diverse. Next, cross-validation was performed for each model (usually 5-fold), with hyperparameters selected using Grid Search or alternative optimization algorithms. This helped to refine parameters such as the regularization coefficients for linear models, the number of trees and their depth for Random Forest and Gradient Boosting, the kernel parameters for SVR, and the number of layers and neurons in the MLP. After tuning on the synthetic data, each model was trained on the entire expanded dataset and tested on the original one (without synthetic samples) to check its real generalization ability.
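The stages above (synthetic data generation by Eqs. (19) to (24), 5-fold grid search on the synthetic set, and a final test on the original rows) can be sketched as follows. This is an illustrative outline under stated assumptions rather than the thesis implementation: the function `make_synthetic`, the sample size, the noise levels `sigma_x` and `sigma_y`, the Ridge `alpha`, and the hyperparameter grid are all hypothetical, only the first twelve rows of the San Manuel table are used, and scaling and polynomial expansion are omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV

# Rows 1-12 of the San Manuel table: RQD (%), RMR, UCS (MPa), ICR (m^3/hr).
DATA = np.array([
    [72, 70, 200, 8.309], [70, 68, 170, 6.492], [62, 64, 150, 8.547],
    [35, 34, 120, 12.115], [50, 37, 137, 15.016], [58, 31, 116, 17.406],
    [25, 24, 127, 18.789], [38, 27, 172, 18.491], [53, 28, 162, 19.199],
    [5, 15, 130, 20.06], [18, 18, 100, 21.099], [3, 15, 75, 21.807],
])
X, y = DATA[:, :3], DATA[:, 3]


def make_synthetic(X, y, method, n_samples=200, sigma_x=1.0, sigma_y=0.5, seed=0):
    """Bootstrap the rows of (X, y) and label them with one of three schemes.

    n_samples, sigma_x and sigma_y are hypothetical settings for illustration.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=n_samples)  # sampling with replacement
    X_boot = X[idx]
    if method == "gaussian":
        # Eq. (21): resampled targets plus Gaussian label noise.
        return X_boot, y[idx] + rng.normal(0.0, sigma_y, size=n_samples)
    if method == "ridge":
        labeler = Ridge(alpha=1.0)
    elif method == "rf":
        labeler = RandomForestRegressor(n_estimators=100, random_state=seed)
    else:
        raise ValueError(f"unknown method: {method}")
    # Eqs. (19)/(23): label the bootstrapped rows with a model fitted on the
    # original data; Eqs. (20)/(24): then add small noise to the features.
    y_syn = labeler.fit(X, y).predict(X_boot)
    X_syn = X_boot + rng.normal(0.0, sigma_x, size=X_boot.shape)
    return X_syn, y_syn


# Tune on synthetic data with 5-fold CV, then score on the original rows.
X_syn, y_syn = make_synthetic(X, y, method="gaussian")
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=5,
    scoring="r2",
)
search.fit(X_syn, y_syn)  # refit=True retrains the best setting on all synthetic rows
cv_r2 = search.best_score_
test_r2 = r2_score(y, search.predict(X))
```

Passing `method="ridge"` or `method="rf"` switches the labeling scheme without changing the rest of the procedure, so the same loop can be repeated for each of the three synthetic datasets and for each model family.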
This comprehensive approach to model selection and training ensures that different classes of algorithms were considered and tested on a wide range of data, and that the models selected are those that actually perform best in predicting the performance of roadheader machines.

Linear models (Ridge, Lasso and ElasticNet)

For linear methods, it is assumed that the target variable $y$ depends linearly on the set of features $X$. Let $X$ be a matrix of size $n \times d$, where $n$ is the number of observations and $d$ is the number of features, and let $y$ be a vector of targets of size $n$. The goal is to find a vector of coefficients $\beta \in \mathbb{R}^d$ that best approximates the dependence $y \approx X\beta$.

Ridge regression (L2 regularization). The objective function of Ridge regression is formulated as

\[ \min_{\beta} \lVert X\beta - y \rVert^2 + \lambda \lVert \beta \rVert^2, \tag{25} \]

where $\lVert \cdot \rVert$ denotes the Euclidean norm, and $\lambda \ge 0$ is the regularization coefficient. The first term is responsible for minimizing the forecast error, and the second limits the growth of the coefficients, reducing the risk of overfitting.

Lasso regression (L1 regularization). Unlike Ridge, the L1 norm is used here:

\[ \min_{\beta} \lVert X\beta - y \rVert^2 + \alpha \lVert \beta \rVert_1, \tag{26} \]

where $\lVert \beta \rVert_1 = \sum_{j=1}^{d} \lvert \beta_j \rvert$. The parameter $\alpha$ controls the strength of the regularization. Due to the L1 norm, some components of $\beta$ can be set exactly to zero, which gives a feature selection effect.

ElasticNet (mixed regularization). ElasticNet combines the properties of Ridge and Lasso by minimizing

\[ \min_{\beta} \lVert X\beta - y \rVert^2 + \alpha \left( \rho \lVert \beta \rVert_1 + (1 - \rho) \lVert \beta \rVert^2 \right), \tag{27} \]

where $\alpha$ and $\rho$ control the contributions of the L1 and L2 terms to the regularization. This helps in cases where there is both feature sparsity and correlation between features.

Ensemble methods (Random Forest, Gradient Boosting, Extra Trees)

Ensemble models build several base learners (usually decision trees) and combine their predictions.

Random Forest. Suppose there are $T$ trees, each of which is denoted as $f_t(x)$.
In regression, the final prediction is obtained by averaging:

\[ \hat{y}(x) = \frac{1}{T} \sum_{t=1}^{T} f_t(x). \tag{28} \]

Each tree is trained on a bootstrap sample of the original data, and a random subset of features is considered when choosing the split at each node. This reduces the correlation between trees and improves generalization ability.

Gradient Boosting. The idea of gradient boosting is that at each step a new tree is built that approximates the negative gradient (antigradient) of the loss function. Let $F_m(x)$ be the model after the $m$-th step; then at step $m+1$ a tree $h_{m+1}(x)$ is built, which approximates the residuals or errors of the previous model. The updated model takes the form

\[ F_{m+1}(x) = F_m(x) + \nu \cdot h_{m+1}(x), \tag{29} \]

where $\nu$ is the learning rate. This iterative scheme gradually refines the predictions, reducing the error.

Extra Trees. The Extra Trees (Extremely Randomized Trees) model is similar to Random Forest in many ways, but it selects split thresholds even more randomly. Formally, when constructing a tree, a random subset of features and random thresholds are drawn at each node, instead of an exhaustive search for the best threshold. This additional randomization of the tree structure can improve robustness to overfitting on heterogeneous datasets.

Nonlinear methods and neural networks (SVR, MLP)

Support Vector Regression (SVR). The SVR model searches for a function $f(x) = \langle w, x \rangle + b$ that stays within an $\epsilon$-neighborhood of the target values and has the minimum norm of the vector $w$. The problem is formulated as follows:

\[ \min_{w, b} \frac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \epsilon, \\ \langle w, x_i \rangle + b - y_i \le \epsilon, \end{cases} \tag{30} \]

where $\epsilon$ defines the error tolerance. When kernels are used (e.g., RBF), the regression is carried out in a nonlinear feature space, which helps to capture complex dependencies.

Multilayer Perceptron (MLP). An MLP neural network consists of several layers: an input layer, one or more hidden layers, and an output layer.
Let there be $H$ neurons in the hidden layer; then the output of the $j$-th neuron is calculated as

\[ z_j = \sigma\!\left( \sum_{i=1}^{d} w_{ji}^{(1)} x_i + b_j^{(1)} \right), \quad j = 1, \dots, H, \tag{31} \]

where $\sigma(\cdot)$ is the activation function (e.g. ReLU or sigmoid), $w_{ji}^{(1)}$ and $b_j^{(1)}$ are the weights and biases of the first layer, and the sum runs over the $d$ input features. The output layer applies a similar transformation, and the final output of the network gives an estimate of the target variable. The network is trained by minimizing the loss function (e.g. MSE) using backpropagation, where the weights are adjusted using gradient methods.

Baseline Model (DummyRegressor, ZeroR)

The baseline model simply predicts the mean (or median) of the target variable, ignoring the feature values. Let $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$. Then for any $x$, the model predicts $\hat{y}(x) = \bar{y}$. This is the simplest approach, which serves as a lower bound on quality: all other models should perform better than trivial averaging.

4.4 Cross-validation and evaluation metrics

For an objective assessment of the constructed models, the study uses 5-fold cross-validation. The synthetic sample is divided into five equal parts; the model is trained on four parts and tested on the remaining one, and the procedure is repeated so that each part serves once as the test fold. This gives a more reliable estimate of the generalization ability of the model, reducing the impact of a particular random partition and helping to detect overfitting. Cross-validation also makes it possible to tune the hyperparameters of each model, which is especially important when working with a small amount of initial data supplemented synthetically.

As metrics for assessing the effectiveness of the models, three main indicators are used: the coefficient of determination (R²), the mean squared error (MSE), and the variance accounted for (VAF). R² and VAF are closely related: both show what proportion of the variance of the target variable is explained by the model.
An R² value close to 1 indicates that the model describes the data adequately, while low values indicate insufficient explanatory power. The mean squared error (MSE) measures the average squared difference between predicted and actual values: the smaller the MSE, the more accurate the model. Together, these metrics assess both the relative ability of the model to explain the variability of the data and the absolute accuracy of its forecasts.

The metrics are calculated in two modes: during cross-validation on synthetic data, which is used to select hyperparameters and assess the stability of the model, and when testing the trained model on the original dataset. This checks that the gains obtained on synthetic data are actually reflected in the quality of forecasts on the real sample. As a result, comparing R² and MSE across models and synthetic data generation methods allows the most suitable algorithm for ICR forecasting to be chosen.

5. DATA ANALYSIS

5.1 Data Analysis

At this stage of the study, a detailed analysis was carried out of the original dataset containing the geotechnical and roadheader performance parameters from the San Manuel mine. The dataset contains 28 data points, including the RMR, RQD, UCS, and ICR values. The initial data are given in the RMR table and the machine performance graph in Figure 3.

Data preprocessing and expansion of the feature space

First, the features were normalized to bring them to a single scale, which is critical for correct model training. After that, second-degree polynomial features were generated, which allowed hidden nonlinear relationships between the original characteristics and the target variable (ICR) to be identified. Expansion of the feature space made it possible to detect additional dependencies that were not obvious when analyzing the original data.
Descriptive statistics

To assess the quality and distribution of the data after the preprocessing stages, a descriptive statistics table was compiled. It presents the following indicators for each feature (both original and generated):

Count: number of observations (in this case, 28 records).
Mean: average value of the feature.
Std: standard deviation, reflecting the spread of values around the mean.
Min, 25%, 50%, 75%, Max: minimum value, first quartile, median, third quartile and maximum value, which describe the distribution and range of values.

Examples of statistics interpretation

For example, for the standardized RQD feature, the mean value is zero and the standard deviation is one (Table 6), which indicates that the scaling was applied correctly. For polynomial features such as RQD² or RQD × RMR, the means and standard deviations are likewise consistent with correctly scaled inputs.

Table 5: EDA summary for the Original Dataset

       count  mean     std     min    25%     50%     75%     max
RQD    28.0   40.0     19.459  3.0    24.25   43.0    53.25   72.0
RMR    28.0   35.786   14.312  15.0   27.0    32.5    44.0    70.0
UCS    28.0   119.429  34.009  60.0   91.5    118.0   139.0   200.0
ICR    28.0   15.215   4.151   6.492  12.526  14.511  18.841  22.167

Table 6: EDA summary for the Polynomial features of the Original Dataset

Feature   count  mean   std    min     25%     50%     75%    max
RQD       28     0      1.00   -1.936  -0.824  0.157   0.693  1.675
RMR       28     0      1.00   -1.479  -0.625  -0.234  0.584  2.434
UCS       28     0      1.00   -1.779  -0.836  -0.043  0.586  2.413
RQD²      28     1      1.096  0.003   0.239   0.576   1.326  3.749
RQD RMR   28     0.798  1.204  -0.377  0.036   0.367   0.825  4.077
RQD UCS   28     0.674  1.116  -0.809  -0.049  0.237   1.024  4.04
RMR²      28     1      1.571  0.007   0.116   0.391   0.787  5.926
RMR UCS   28     0.636  1.348  -0.984  0.001   0.254   0.683  5.873
UCS²      28     1      1.286  0       0.1     0.63    1.355  5.821

Second-degree polynomial features were created to capture complex nonlinear relationships between original features.
Generating new features, such as the squared original features and their interactions, produced data with diverse values and large scatter. For example, the squared features RQD², RMR², and UCS² have a mean of 1 (the mean of a squared standardized variable equals its variance, which is one after scaling) with a high standard deviation, indicating significant variation in the data. These terms give the models additional inputs that may improve prediction. Interactions between features, such as RQD RMR and RMR UCS, take both positive and negative values, reflecting complex relationships between geological characteristics. These interactions can help models uncover hidden dependencies and improve predictive ability, especially for nonlinear models such as ensemble methods.

The high scatter and large ranges of the polynomial features highlight their importance for machine learning models, where such features can improve prediction accuracy. They enable models to capture more complex relationships, which is especially useful for predicting roadheader performance, where the interaction of different characteristics is critical.

Figure 6: Correlation Heatmaps of the Original Data. (Figure A - Heatmap of the Base Features, B - Polynomial Features)

Figure 7: Histograms of Standardized Original Data Distributions (Histogram A - UCS, B - RQD, C - RMR value).

5.2 Model training and evaluation

This section provides a comparative analysis of the performance of models trained on a synthetically expanded sample, followed by an evaluation of their performance on the original data. A detailed description of the training process is presented in the methodology chapter, so here the emphasis is on comparing the results and interpreting the obtained metrics. Based on data generated synthetically by three different methods (Ridge Regression Labeling, Gaussian Noise Labeling and Random Forest Regression Labeling), 5-fold cross-validation was performed for each model.
The main evaluation metrics are the coefficient of determination (R²) and the mean squared error (MSE). These indicators show the extent to which models trained on synthetic data generalize to the original dataset.

Figure 8: Base Features Correlation Heatmaps of the Synthetic Data. (Figure A - Ridge, B - Gaussian, C - Random Forest)

Figure 9: Polynomial Features Correlation Heatmaps of the Synthetic Data. (Figure A - Ridge, B - Gaussian, C - Random Forest)

Following the correlation heatmaps for base and polynomial features, the histograms of the synthetic data distributions were plotted:

Figure 10: Synthetic Data Distribution for Ridge Regression Labeling (Figure A - RQD, B - RMR, C - UCS).

Figure 11: Synthetic Data Distribution for Gaussian Noise Labeling (Figure A - RQD, B - RMR, C - UCS).

Figure 12: Synthetic Data Distribution for Random Forest Regression Labeling (Figure A - RQD, B - RMR, C - UCS).

Following the synthetic data generation, the models were trained on the augmented data. After training, the models were evaluated on the original dataset, which serves as the test set. Scatterplots were used to present the output:

Figure 13: ICR plots for Linear Models trained on Ridge Regression Labeling data (Scatterplot A - Ridge regression, B - Lasso, C - ElasticNet).

Figure 14: ICR plots for Ensemble Models trained on Ridge Regression Labeling data (Scatterplot A - Random Forest, B - Gradient Boosting, C - ExtraTrees).

Figure 15: ICR plots for Non-Linear Models trained on Ridge Regression Labeling data (Scatterplot A - MLP, B - SVR).

Figure 16: ICR plot for Base Model (ZeroR) trained on Ridge Regression Labeling data.

The same plots were made for models trained with the Gaussian Noise Labeling method.
Figure 17: ICR plots for Linear Models trained on Gaussian Noise Labeling data (Scatterplot A - Ridge regression, B - Lasso, C - ElasticNet).

Figure 18: ICR plots for Ensemble Models trained on Gaussian Noise Labeling data (Scatterplot A - Random Forest, B - Gradient Boosting, C - ExtraTrees).

Figure 19: ICR plots for Non-Linear Models trained on Gaussian Noise Labeling data (Scatterplot A - MLP, B - SVR).

Figure 20: ICR plot for Base Model (ZeroR) trained on Gaussian Noise Labeling data.

The scatterplots below show the ICR values obtained with models trained on Random Forest Regression Labeling data.

Figure 21: ICR plots for Linear Models trained on Random Forest Regression Labeling data (Scatterplot A - Ridge Regression, B - Lasso, C - ElasticNet).

Figure 22: ICR plots for Ensemble Models trained on Random Forest Regression Labeling data (Scatterplot A - Random Forest, B - Gradient Boosting, C - ExtraTrees).

Figure 23: ICR plots for Non-Linear Models trained on Random Forest Regression Labeling data (Scatterplot A - MLP, B - SVR).

Figure 24: ICR plot for Base Model (ZeroR) trained on Random Forest Regression Labeling data.

The results show that ensemble methods (e.g. Random Forest, Gradient Boosting, Extra Trees) demonstrate the best performance, providing high prediction accuracy and resistance to overfitting. At the same time, linear models (Ridge, Lasso, ElasticNet) are inferior in forecasting quality, which reflects their difficulty in capturing the nonlinear relationships in the original data.
Table 7: Example Outputs of Predicted ICR values (columns 3 to 5 give the prediction of each model when trained on data from the named labeling method)

Model         ICR actual  Ridge Regression  Gaussian Noise  R.Forest Regression
Ridge         8.309       8.662             8.120           8.737
Lasso         8.309       8.684             8.216           8.783
ElasticNet    8.309       9.077             9.093           9.539
RandomForest  8.309       8.451             8.209           8.339
XGBoost       8.309       8.475             8.400           8.438
ExtraTrees    8.309       8.451             8.257           8.390
MLP           8.309       8.359             8.447           8.263
SVR           8.309       8.322             8.421           8.345
ZeroR         8.309       14.935            14.910          14.736

The values in the table support the effectiveness of using a synthetically expanded sample for training models. The tables containing the results for all predicted ICR values are attached in Appendices 1-9. The comparative analysis showed that the best balance between prediction accuracy and model stability is achieved with ensemble algorithms. These findings serve as an important basis for further practical application of the developed ICR forecasting technique.

5.3 Performance Comparison

This section provides a detailed comparison of the models' performance based on the key metrics: the coefficient of determination (R²) and the mean squared error (MSE). The experimental results are obtained for three synthetic data generation methods, which makes it possible to evaluate how the selected models cope with ICR forecasting under different approaches to sample expansion.
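The three metrics reported in the tables of this section can be computed as in the sketch below. The thesis does not state the formula used for VAF, so the usual definition, one minus the ratio of residual variance to target variance (expressed as a fraction rather than a percentage), is assumed here; the function names and the two example predictions are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def vaf(y_true, y_pred):
    """Variance accounted for, as a fraction: 1 - Var(residual) / Var(y)."""
    return float(1.0 - np.var(y_true - y_pred) / np.var(y_true))


# ICR values of rows 1-5 of the San Manuel table (m^3/hr).
y_true = np.array([8.309, 6.492, 8.547, 12.115, 15.016])

# A constant prediction (hypothetical training mean), as a ZeroR model would give.
y_const = np.full_like(y_true, 14.9)
# A prediction that is perfect apart from a constant offset of 1 m^3/hr.
y_shift = y_true + 1.0

scores = {
    "const": (r2(y_true, y_const), mse(y_true, y_const), vaf(y_true, y_const)),
    "shift": (r2(y_true, y_shift), mse(y_true, y_shift), vaf(y_true, y_shift)),
}
```

Under this definition R² and VAF differ only by the squared mean of the residuals divided by the variance of the target, so they coincide when the predictions are unbiased. A constant prediction gives a VAF of zero and a negative R² whenever the constant differs from the test mean, which matches the pattern of the ZeroR rows in the tables that follow.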
Table 8: Performance Metrics – Ridge Regression Labeling

Model              CV R² Mean (Synthetic)  CV R² Std (Synthetic)  CV MSE Mean (Synthetic)  Test R² (Original)  Test MSE (Original)  Test VAF (Original)
Ridge              0.9352                  0.0092                 0.8340                   0.7825              3.6142               0.7827
Lasso              0.9347                  0.0096                 0.8401                   0.7783              3.6826               0.7786
ElasticNet         0.7484                  0.0248                 3.2639                   0.5776              7.0171               0.5787
Random Forest      0.9392                  0.0102                 0.7833                   0.7663              3.8832               0.7663
Gradient Boosting  0.9749                  0.0042                 0.3227                   0.8156              3.0629               0.8157
Extra Trees        0.8649                  0.0195                 1.7434                   0.7122              4.7816               0.7128
MLP                0.9676                  0.0082                 0.4179                   0.8084              3.1831               0.8091
SVR                0.9771                  0.0054                 0.2934                   0.8153              3.0689               0.8154
ZeroR              -0.0058                 0.0028                 13.0774                  -0.0047             16.6920              0.0000

Table 9: Performance Metrics – Gaussian Noise Labeling

Model              R² Mean (Train)  R² Std (Train)  MSE Mean (Train)  Test R² (Test)  Test MSE (Test)  Test VAF (Test)
Ridge              0.8182           0.0152          2.9544            0.8200          2.9899           0.8204
Lasso              0.8170           0.0155          2.9743            0.8184          3.0166           0.8188
ElasticNet         0.5831           0.0383          6.7863            0.5747          7.0662           0.5774
Random Forest      0.9505           0.0085          0.8039            0.9565          0.7222           0.9566
Gradient Boosting  0.9831           0.0039          0.2746            0.9841          0.2634           0.9841
Extra Trees        0.8358           0.0268          2.6580            0.8396          2.6644           0.8406
MLP                0.9676           0.0056          0.5248            0.9568          0.7177           0.9569
SVR                0.8915           0.0159          1.7601            0.8913          1.8056           0.8915
ZeroR              -0.0103          0.0076          16.4882           -0.0080         16.7470          0.0000

Table 10: Performance Metrics – Random Forest Regression Labeling

Model              R² Mean (Train)  R² Std (Train)  MSE Mean (Train)  Test R² (Test)  Test MSE (Test)  Test VAF (Test)
Ridge              0.8261           0.0103          2.0391            0.7674          3.8640           0.7698
Lasso              0.8255           0.0111          2.0454            0.7633          3.9329           0.7657
ElasticNet         0.6355           0.0321          4.2635            0.5264          7.8689           0.5329
Random Forest      0.8743           0.0177          1.4620            0.8081          3.1885           0.8105
Gradient Boosting  0.9386           0.0126          0.7118            0.8900          1.8272           0.8922
Extra Trees        0.7877           0.0290          2.4714            0.6999          4.9853           0.7052
MLP                0.9330           0.0207          0.7737            0.8984          1.6885           0.9033
SVR                0.9245           0.0105          0.8816            0.8572          2.3724           0.8592
ZeroR              -0.0103          0.0061          11.8555           -0.0138         16.8432          0.0000

In this work, three different synthetic generation methods were used to evaluate the quality
of the forecasting models: Ridge Regression Labeling, Gaussian Noise Labeling, and Random Forest Regression Labeling. Each of these strategies generated synthetic training data, on which the models were cross-validated (with R² and MSE estimates), and then the ability of the models to generalize to the original dataset was assessed. In addition to the main metrics R² and MSE, the Test VAF (Variance Accounted For) metric was also calculated; it indicates how well the model explains the variance of the target variable and closely tracks the R² value.

When using the Ridge Regression Labeling method, linear models such as Ridge and Lasso showed high results: the cross-validation R² was about 0.935, and the test R² was about 0.782. Test VAF was comparable to the test R², indicating a consistent estimate of the explained variance. ElasticNet performed noticeably worse, with a test R² of around 0.578, indicating weaker generalization ability in this configuration. Among the nonlinear models and ensemble methods, Gradient Boosting performed best with a test R² of around 0.816, while SVR and MLP showed comparable values (around 0.815 and 0.808, respectively). Interestingly, Random Forest and Extra Trees performed more modestly with this method, while the mean-predicting ZeroR model, as expected, explained none of the variance (R² close to zero).

The synthetic data generated by Gaussian Noise Labeling show the best quality of the three methods, as reflected in the cross-validation R² values for the models (the CV and test values are nearly identical). As a result, the models achieve high performance on the test set: the test R² for Ridge and Lasso reaches around 0.820, while Random Forest, Gradient Boosting, and the MLP reach a test R² of around 0.956–0.984. Since the Test VAF is almost identical to the Test R², it can be concluded that the data quality allows the models to generalize well on the original dataset.
The Random Forest Regression Labeling method shows intermediate results. The cross-validation R² values for the Ridge and Lasso models are around 0.826, but their test R² values are somewhat lower (around 0.767 and 0.763, respectively). ElasticNet again shows poor generalization ability, with a test R² of around 0.526. Among the ensemble methods, Gradient Boosting shows good results (test R² of about 0.890), and the MLP and SVR models also perform well (test R² of about 0.898 and 0.857, respectively). It is worth noting that for this synthetic generation method, Extra Trees turned out to be less robust, which is reflected in a decrease of its test R² to 0.6999.

Next, the models are ranked by performance. The ranking was done for all three methods of data synthesis; however, attention is focused on Gaussian Noise Labeling, since the models generally show higher performance with this method.

The ranking is based on the following criteria:
R² value: the higher the better;
MSE: the lower the better;
VAF: the higher the better.

The ranks computed for each model are given below.

Table 11: Ranking table for models trained on Ridge Regression Labeling data.

Model              Rank
Gradient Boosting  1
SVR                2
MLP                3
Ridge              4
Lasso              5
Random Forest      6
Extra Trees        7
ElasticNet         8

A strong correlation between the performance metrics is seen in the comparison. Since the metrics are mathematically related, they tend to move together. The three metrics are therefore not independent sources of evidence; however, each of them is a standard measure for assessing ML models. The same ranking is performed for Gaussian Noise Labeling, where the model