ADVANCED MACHINE LEARNING FOR PREDICTING CO₂ SOLUBILITY AND DIFFUSION IN RESERVOIR FLUIDS: INTEGRATING DATA-DRIVEN AND PHYSICS-BASED APPROACHES FOR CARBON SEQUESTRATION AND ENHANCED OIL RECOVERY
Publisher
Nazarbayev University School of Mining and Geosciences
Abstract
Accurate prediction of carbon dioxide (CO₂) solubility and diffusion in reservoir fluids is essential for designing effective carbon sequestration and enhanced oil recovery (EOR) strategies. This thesis presents a unified machine learning framework developed to predict CO₂ solubility and diffusion coefficients across two distinct reservoir fluid systems: (1) crude oils and liquid hydrocarbons, and (2) pure water and saline brines. Four data-driven models were applied: Artificial Neural Networks (ANN), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Support Vector Regression (SVR). Each model was trained, optimized, and evaluated independently for each fluid-property combination. In addition, a Physics-Informed Neural Network (PINN) was implemented to model CO₂ diffusion in aqueous systems by incorporating physical relations, namely the Stokes–Einstein and Arrhenius relations, directly into the loss function during training.
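The idea of placing the Stokes–Einstein and Arrhenius relations in the loss function can be illustrated with a minimal sketch. This is not the thesis's implementation, which the abstract does not specify. The log-space residuals, the penalty weights `w_se` and `w_arr`, and the parameters `radius`, `d0`, and `e_a` are all assumptions chosen for illustration.

```python
import numpy as np

K_B = 1.380649e-23    # Boltzmann constant, J/K
R_GAS = 8.314462618   # universal gas constant, J/(mol K)

def stokes_einstein(T, eta, radius):
    """Stokes-Einstein estimate D = k_B T / (6 pi eta r).

    T in K, eta (dynamic viscosity) in Pa s, radius in m; returns m^2/s.
    """
    return K_B * T / (6.0 * np.pi * eta * radius)

def arrhenius(T, d0, e_a):
    """Arrhenius estimate D = D0 exp(-Ea / (R T)), with Ea in J/mol."""
    return d0 * np.exp(-e_a / (R_GAS * T))

def physics_informed_loss(d_pred, d_obs, T, eta, radius, d0, e_a,
                          w_se=0.1, w_arr=0.1):
    """Data misfit plus two physics penalties, all in log space.

    The weights and the log-space form are illustrative assumptions,
    not values or choices taken from the thesis.
    """
    log_pred = np.log(d_pred)
    data_term = np.mean((log_pred - np.log(d_obs)) ** 2)
    se_term = np.mean((log_pred - np.log(stokes_einstein(T, eta, radius))) ** 2)
    arr_term = np.mean((log_pred - np.log(arrhenius(T, d0, e_a))) ** 2)
    return data_term + w_se * se_term + w_arr * arr_term
```

In a real PINN, `d_pred` would be the network output and the loss would be written in an automatic-differentiation framework so that gradients flow back to the network weights. The NumPy version only shows how the terms combine.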
The compiled datasets include over 3,000 experimental records from more than 80 published references, covering a wide range of conditions: temperatures from 252 to 513 K, pressures from 0.1 to 200 MPa, salinities up to 6.8 mol/kg, viscosities up to 224,500 cP, and densities from 443 to 1026 kg/m³. Measurements at high salinity and high pressure are sparse, which limits the models' ability to generalize accurately under those conditions. A structured preprocessing pipeline was applied, including outlier handling, salinity and viscosity grouping, engineered features, and categorical encodings tailored to each fluid system. Among the data-driven models, ANN and XGBoost outperformed RF and SVR, with R² values exceeding 0.98 for most tasks; this can be attributed to their ability to capture nonlinear dependencies and generalize well across diverse input conditions. The PINN achieved a test R² of 0.992 on the aqueous diffusion dataset, along with an 80% reduction in RMSE and a 75% reduction in MAE compared to the ANN model. Compared to XGBoost, RMSE and MAE dropped by approximately 89% and 84%, respectively. These improvements were achieved while preserving physical consistency through integration of the Stokes–Einstein and Arrhenius relations. The PINN also showed strong generalization in regions with limited experimental data, offering a distinct advantage over conventional models in low-data settings.
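For reference, the metrics quoted above (R², RMSE, MAE, and the percentage reduction of one model's error relative to another's) follow standard definitions. The sketch below shows how they could be computed. The function names are hypothetical, and it is an assumption that the reported reductions are relative to the baseline model's error.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def percent_reduction(baseline_error, improved_error):
    """Relative error reduction, in percent of the baseline error."""
    return 100.0 * (baseline_error - improved_error) / baseline_error
```

Under this definition, an 80% reduction in RMSE means the improved model's RMSE is one fifth of the baseline's.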
This research shows that modern machine learning methods, when combined with structured data and guided by physical principles, can provide accurate, generalizable, and interpretable tools for CO₂ transport prediction. The developed models, curated datasets, and the PINN framework offer practical value for fluid screening, digital reservoir simulation, and the design of CO₂ storage and EOR operations.
Keywords
type of access: embargo, CO₂ solubility, CO₂ diffusion coefficient, reservoir fluids, machine learning, artificial neural networks (ANN), physics-informed neural networks (PINN), carbon sequestration, enhanced oil recovery (EOR), data-driven modeling, physics-based modeling, water and brine systems, hydrocarbon systems, predictive modeling, Optuna hyperparameter tuning, subsurface fluid transport
Citation
Hassan, S. (2025). Advanced machine learning for predicting CO₂ solubility and diffusion in reservoir fluids: Integrating data-driven and physics-based approaches for carbon sequestration and enhanced oil recovery. Nazarbayev University School of Mining and Geosciences.
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution 3.0 United States
