Abstract:
Objectives: Chronic diseases pose a significant threat to global health, highlighting the need for innovative approaches to predicting patient outcomes. This study aims to predict one-year mortality in patients with chronic viral hepatitis (CVH) and tuberculosis (TB) using administrative data, including demographic information, comorbidities, diagnoses, and characteristics of service providers.
Methods: Clinical data collected from a nationwide database from January 2014 to December 2019 were analyzed, covering 82,700 CVH patients and 150,000 TB patients. The data were segmented into yearly cohorts so that mortality within a given year was forecast using only information available up to the end of the preceding year. We developed a machine learning platform utilizing six categories of models: linear, nearest neighbors, support vector machines, naïve Bayes, and ensemble methods (including gradient boosting, AdaBoost, and random forest). Feature importance was assessed using SHapley Additive exPlanations (SHAP) values.
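The yearly cohort construction described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the `Patient` record, its fields, and the `build_cohort` helper are hypothetical names introduced here, and the inclusion rules (diagnosed on or before the cutoff, alive at the cutoff) are assumptions about how such a cohort might reasonably be defined.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical record layout; the real database schema is not given in the source.
    patient_id: str
    diagnosis_date: date
    death_date: Optional[date] = None  # None means no recorded death


def build_cohort(patients: List[Patient], year: int) -> List[Tuple[str, int]]:
    """Return (patient_id, label) pairs for the prediction year `year`.

    Assumed rules: a patient enters the cohort if diagnosed on or before
    31 December of the preceding year and still alive on that date.
    The label is 1 if death is recorded during `year`, otherwise 0.
    """
    cutoff = date(year - 1, 12, 31)   # last day of information available to the model
    horizon_end = date(year, 12, 31)  # end of the one-year outcome window
    cohort = []
    for p in patients:
        if p.diagnosis_date > cutoff:
            continue  # not yet diagnosed at the cutoff
        if p.death_date is not None and p.death_date <= cutoff:
            continue  # died before the outcome window opened
        died = p.death_date is not None and p.death_date <= horizon_end
        cohort.append((p.patient_id, int(died)))
    return cohort


patients = [
    Patient("a", date(2014, 3, 1), date(2016, 5, 1)),
    Patient("b", date(2015, 6, 1)),
    Patient("c", date(2017, 1, 1)),
]
cohort_2016 = build_cohort(patients, 2016)
```

Separating the information cutoff from the outcome window in this way keeps features from the outcome year out of the model inputs, which is the point of training year-specific models on data from the preceding year only.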
Results: The year-specific models achieved an area under the receiver operating characteristic curve (AUC) between 0.74 and 0.83 on separate test sets. SHAP analysis showed that age, sex, type of hepatitis, and ethnicity were the main predictors of one-year mortality for CVH patients. For TB patients, the main predictors were age, type of TB, ethnicity, and duration of TB.
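For readers less familiar with the AUC metric reported above, the following Python sketch computes it directly from its pairwise definition: the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name `roc_auc` and the toy data are illustrative assumptions, not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    This is O(n_pos * n_neg), which is fine for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = died within one year, 0 = survived.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two outcome groups.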
Conclusion: The results show that accurate machine learning models for predicting one-year mortality in patients with CVH and TB can be constructed from administrative health data. In future work, detailed laboratory and medical history data could be incorporated to improve performance. Such integration could give healthcare workers a useful tool for managing and treating chronic diseases.