ADVANCED MACHINE LEARNING FOR PREDICTING CO₂ SOLUBILITY AND DIFFUSION IN RESERVOIR FLUIDS: INTEGRATING DATA-DRIVEN AND PHYSICS-BASED APPROACHES FOR CARBON SEQUESTRATION AND ENHANCED OIL RECOVERY

dc.contributor.author: Hassan, Suleiman
dc.date.accessioned: 2025-05-21T10:28:22Z
dc.date.available: 2025-05-21T10:28:22Z
dc.date.issued: 2025-04-24
dc.description.abstract: Accurate prediction of carbon dioxide (CO₂) solubility and diffusion in reservoir fluids is essential for designing effective carbon sequestration and enhanced oil recovery (EOR) strategies. This thesis presents a unified machine learning framework for predicting CO₂ solubility and diffusion coefficients in two distinct reservoir fluid systems: (1) crude oils and liquid hydrocarbons, and (2) pure water and saline brines. Four data-driven models were applied: Artificial Neural Networks (ANN), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Support Vector Regression (SVR). Each model was trained, optimized, and evaluated independently for each fluid-property combination. In addition, a Physics-Informed Neural Network (PINN) was implemented to model CO₂ diffusion in aqueous systems by incorporating governing equations, including the Stokes–Einstein and Arrhenius relations, directly into the loss function during training. The compiled datasets include over 3,000 experimental records from more than 80 published references and span temperatures from 252 to 513 K, pressures from 0.1 to 200 MPa, salinities up to 6.8 mol/kg, viscosities up to 224,500 cP, and densities from 443 to 1026 kg/m³. Measurements at high salinity and high pressure are scarce, which limits the models' ability to generalize accurately under those conditions. A structured preprocessing pipeline was applied to each fluid system, including outlier handling, salinity and viscosity grouping, engineered features, and tailored categorical encodings. Among the data-driven models, ANN and XGBoost outperformed RF and SVR, with R² values exceeding 0.98 for most tasks; this can be attributed to their ability to capture nonlinear dependencies and generalize across diverse input conditions.
The PINN achieved a test R² of 0.992 on the aqueous diffusion dataset, with 80% lower RMSE and 75% lower MAE than the ANN model. Relative to XGBoost, RMSE and MAE dropped by approximately 89% and 84%, respectively. These gains were achieved while preserving physical consistency through the Stokes–Einstein and Arrhenius relations. The PINN also generalized well in regions with limited experimental data, a distinct advantage over conventional models in low-data settings. This research shows that modern machine learning methods, when combined with structured data and guided by physical principles, can provide accurate, generalizable, and interpretable tools for CO₂ transport prediction. The developed models, curated datasets, and PINN framework offer practical value for fluid screening, digital reservoir simulation, and the design of CO₂ storage and EOR operations.
dc.identifier.citation: Hassan, S. (2025). Advanced machine learning for predicting CO₂ solubility and diffusion in reservoir fluids: Integrating data-driven and physics-based approaches for carbon sequestration and enhanced oil recovery. Nazarbayev University School of Mining and Geosciences.
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/8580
dc.language.iso: en
dc.publisher: Nazarbayev University School of Mining and Geosciences
dc.rights: Attribution 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/
dc.subject: type of access: embargo
dc.subject: CO₂ solubility
dc.subject: CO₂ diffusion coefficient
dc.subject: reservoir fluids
dc.subject: machine learning
dc.subject: artificial neural networks (ANN)
dc.subject: physics-informed neural networks (PINN)
dc.subject: carbon sequestration
dc.subject: enhanced oil recovery (EOR)
dc.subject: data-driven modeling
dc.subject: physics-based modeling
dc.subject: water and brine systems
dc.subject: hydrocarbon systems
dc.subject: predictive modeling
dc.subject: Optuna hyperparameter tuning
dc.subject: subsurface fluid transport
dc.subject: EOR
dc.title: ADVANCED MACHINE LEARNING FOR PREDICTING CO₂ SOLUBILITY AND DIFFUSION IN RESERVOIR FLUIDS: INTEGRATING DATA-DRIVEN AND PHYSICS-BASED APPROACHES FOR CARBON SEQUESTRATION AND ENHANCED OIL RECOVERY
dc.type: Master's thesis
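
The abstract states that the PINN embeds the Stokes–Einstein and Arrhenius relations directly in its loss function, but this record does not give the exact formulation. The following is a minimal sketch of one common way to write such a soft physics constraint, assuming a composite loss made of a data misfit plus weighted squared residuals against each relation, all evaluated on log10(D). The function names, the hydrodynamic radius `r`, the Arrhenius parameters `d0` and `ea`, and the weights `lam_se` and `lam_arr` are hypothetical illustrations, not code or values from the thesis. NumPy is used only to make the structure of the loss explicit; in an actual PINN the same expression would be evaluated inside an automatic-differentiation framework so that its gradients reach the network weights.

```python
import numpy as np

# Physical constants in SI units
K_B = 1.380649e-23    # Boltzmann constant, J/K
R_GAS = 8.314462618   # molar gas constant, J/(mol*K)


def stokes_einstein_d(T, mu, r):
    """Stokes-Einstein estimate D = k_B*T / (6*pi*mu*r).

    T in K, mu (dynamic viscosity) in Pa*s (1 cP = 1e-3 Pa*s),
    r (hydrodynamic radius of the solute) in m; returns D in m^2/s.
    """
    return K_B * T / (6.0 * np.pi * mu * r)


def arrhenius_d(T, d0, ea):
    """Arrhenius form D = D0 * exp(-Ea / (R*T)), with Ea in J/mol."""
    return d0 * np.exp(-ea / (R_GAS * T))


def physics_informed_loss(d_pred, d_obs, T, mu, r, d0, ea,
                          lam_se=0.1, lam_arr=0.1):
    """Data misfit plus soft penalties for departing from Stokes-Einstein
    and Arrhenius behavior (hypothetical formulation for illustration).

    Squared errors are taken on log10(D) because diffusion coefficients
    span several orders of magnitude.
    """
    log_pred = np.log10(d_pred)
    # Fit to measured diffusion coefficients
    data_term = np.mean((log_pred - np.log10(d_obs)) ** 2)
    # Penalty for deviating from the Stokes-Einstein estimate
    se_term = np.mean((log_pred - np.log10(stokes_einstein_d(T, mu, r))) ** 2)
    # Penalty for deviating from the Arrhenius temperature trend
    arr_term = np.mean((log_pred - np.log10(arrhenius_d(T, d0, ea))) ** 2)
    return data_term + lam_se * se_term + lam_arr * arr_term


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not data from the thesis
    T = np.array([298.15, 323.15, 348.15])        # K
    mu = np.array([0.89, 0.55, 0.38]) * 1e-3      # cP converted to Pa*s
    d_obs = np.array([1.9e-9, 3.0e-9, 4.3e-9])    # m^2/s
    d_pred = 1.05 * d_obs                         # stand-in for network output
    loss = physics_informed_loss(d_pred, d_obs, T, mu,
                                 r=1.6e-10, d0=5e-7, ea=1.5e4)
    print(f"composite loss: {loss:.4f}")
```

Holding `r`, `d0`, and `ea` fixed is a simplification; a PINN could instead learn them jointly with the network. The weights control the trade-off between fidelity to the measurements and consistency with the physical relations, which is what allows such a model to stay physically plausible where experimental data are sparse.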

Files

Original bundle

Name: FinalThesis_GrM_2025_Suleiman_Hassan.pdf
Size: 6.2 MB
Format: Adobe Portable Document Format
Description: Master's thesis
Access status: Embargo until 2027-05-15