ADVANCED MACHINE LEARNING FOR PREDICTING CO₂ SOLUBILITY AND DIFFUSION IN RESERVOIR FLUIDS: INTEGRATING DATA-DRIVEN AND PHYSICS-BASED APPROACHES FOR CARBON SEQUESTRATION AND ENHANCED OIL RECOVERY

Files

Access status: Embargo until 2027-05-15, FinalThesis_GrM_2025_Suleiman_Hassan.pdf (6.2 MB)

Publisher

Nazarbayev University School of Mining and Geosciences

Abstract

Accurate prediction of carbon dioxide (CO₂) solubility and diffusion in reservoir fluids is essential for designing effective carbon sequestration and enhanced oil recovery (EOR) strategies. This thesis presents a unified machine learning framework for predicting CO₂ solubility and diffusion coefficients in two distinct reservoir fluid systems: (1) crude oils and liquid hydrocarbons, and (2) pure water and saline brines. Four data-driven models were applied: Artificial Neural Networks (ANN), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Support Vector Regression (SVR). Each model was trained, optimized, and evaluated independently for each fluid-property combination. In addition, a Physics-Informed Neural Network (PINN) was implemented to model CO₂ diffusion in aqueous systems by incorporating governing equations, including the Stokes–Einstein and Arrhenius relations, directly into the loss function during training. The compiled datasets include over 3,000 experimental records from more than 80 published references, spanning temperatures from 252 to 513 K, pressures from 0.1 to 200 MPa, salinities up to 6.8 mol/kg, viscosities up to 224,500 cP, and densities from 443 to 1,026 kg/m³. Measurements at high salinity and high pressure are sparse, which limits the models' ability to generalize accurately under those conditions. A structured preprocessing pipeline was applied, including outlier handling, salinity and viscosity grouping, engineered features, and categorical encodings tailored to each fluid system. Among the data-driven models, ANN and XGBoost outperformed RF and SVR, with R² values exceeding 0.98 for most tasks; this can be attributed to their ability to capture nonlinear dependencies and generalize well across diverse input conditions.
The PINN achieved a test R² of 0.992 on the aqueous diffusion dataset, along with an 80% reduction in RMSE and a 75% reduction in MAE compared to the ANN model. When compared to XGBoost, RMSE and MAE dropped by approximately 89% and 84%, respectively. These improvements were achieved while preserving physical consistency through integration of the Stokes–Einstein and Arrhenius relations. The PINN also showed strong generalization in regions with limited experimental data, offering a distinct advantage over conventional models in low-data settings. This research shows that modern machine learning methods, when combined with structured data and guided by physical principles, can provide accurate, generalizable, and interpretable tools for CO₂ transport prediction. The developed models, curated datasets, and the PINN framework offer practical value for fluid screening, digital reservoir simulation, and the design of CO₂ storage and EOR operations.
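
As an illustration of what incorporating the Stokes–Einstein and Arrhenius relations into the loss function can look like, the following is a minimal sketch, not the thesis implementation. It assumes a composite loss made of a data-misfit term plus two weighted physics penalties, all evaluated on the logarithm of the diffusion coefficient. The function names, the weights `w_se` and `w_arr`, the log-space formulation, and every numerical value below are hypothetical and are not taken from the thesis. A real PINN would evaluate the same terms inside an automatic-differentiation framework so that the network weights can be trained against them.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
R_GAS = 8.314462618   # universal gas constant, J/(mol*K)


def stokes_einstein(temp_k, viscosity_pa_s, radius_m):
    """Stokes-Einstein estimate D = k_B*T / (6*pi*mu*r), in m^2/s."""
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def arrhenius(temp_k, d0, activation_energy):
    """Arrhenius form D = D0 * exp(-Ea / (R*T)), with Ea in J/mol."""
    return d0 * math.exp(-activation_energy / (R_GAS * temp_k))


def _mean_sq_log_diff(a, b):
    """Mean squared difference of natural logs (inputs must be positive)."""
    return sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(a, b)) / len(a)


def physics_informed_loss(pred, target, temp_k, viscosity_pa_s, *,
                          radius_m, d0, activation_energy,
                          w_se=1.0, w_arr=1.0):
    """Data misfit plus weighted Stokes-Einstein and Arrhenius penalties.

    Assumed form (illustrative only):
        L = L_data + w_se * L_SE + w_arr * L_Arr
    """
    d_se = [stokes_einstein(t, mu, radius_m)
            for t, mu in zip(temp_k, viscosity_pa_s)]
    d_arr = [arrhenius(t, d0, activation_energy) for t in temp_k]
    data_term = _mean_sq_log_diff(pred, target)   # fit to measurements
    se_term = _mean_sq_log_diff(pred, d_se)       # viscosity consistency
    arr_term = _mean_sq_log_diff(pred, d_arr)     # temperature consistency
    return data_term + w_se * se_term + w_arr * arr_term


# Made-up inputs for demonstration; none of these are thesis data.
temps = [298.15, 323.15]             # K
viscosities = [8.9e-4, 5.5e-4]       # Pa*s
predicted = [2.0e-9, 3.5e-9]         # m^2/s, hypothetical model output
measured = [1.9e-9, 3.6e-9]          # m^2/s, hypothetical measurements

loss = physics_informed_loss(
    predicted, measured, temps, viscosities,
    radius_m=1.0e-10,                # hypothetical effective radius
    d0=1.0e-6, activation_energy=1.6e4,  # hypothetical Arrhenius parameters
)
print(f"composite loss: {loss:.4f}")
```

Working in log space is a design choice of this sketch: diffusion coefficients in liquids are very small in SI units, so squared errors on the raw values would be vanishingly small and hard to balance against one another. The penalty weights control how strongly predictions are pulled toward the physical relations where measurements are sparse.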

Citation

Hassan, S. (2025). Advanced machine learning for predicting CO₂ solubility and diffusion in reservoir fluids: Integrating data-driven and physics-based approaches for carbon sequestration and enhanced oil recovery. Nazarbayev University School of Mining and Geosciences.

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution 3.0 United States