Abstract:
Blood sample analysis plays a crucial role in modern medical practice,
aiding in the detection of a wide array of diseases. Despite its
significance, the potential of blood samples for predicting various
diseases has remained largely unexplored. Our project evaluates
the efficacy of blood samples in predicting a broad spectrum
of diseases using the large-scale MIMIC-III medical dataset. Given
the sparse nature of the data, we combine imputation with multi-task
models, for which we identify and utilize meaningful auxiliary tasks.
This approach reaches an average state-of-the-art ROC-AUC score
of 81% across the 50 most prevalent diseases within the dataset. To
further validate our findings, we sought the expertise of five medical
doctors, who independently rated the predictability of these diseases
from blood samples. Spearman’s rho analysis revealed a substantial
agreement (ρ = 0.61) between the doctors’ ratings and the actual ROC-AUC
values of our machine learning models. In order to add transparency
and reliability, we employed the Local Interpretable Model-agnostic
Explanations (LIME) method to identify the most predictive
blood sample features. These findings were rigorously cross-checked
with medical experts, affirming the robustness and credibility of our
predictive models. Our study represents a significant advancement in
the field of medical diagnostics, showcasing the untapped potential of
blood sample analysis in disease prediction. By integrating cutting-edge
machine learning techniques with expert validation, we pave the
way for enhanced patient care and improved healthcare outcomes.
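
The two summary statistics quoted above, the ROC-AUC averaged over diseases and Spearman's rho between expert ratings and per-disease ROC-AUC, can be illustrated with a minimal sketch. This is not the pipeline used in the study: the helper names (average_ranks, roc_auc, spearman_rho) and all toy numbers are hypothetical and chosen only to show the arithmetic, using the Python standard library.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def roc_auc(labels, scores):
    """ROC-AUC for binary labels (0/1) via the rank-sum (Mann-Whitney) identity."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    ranks = average_ranks(scores)
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values for four diseases (NOT results from the study).
toy_auc_per_disease = [0.90, 0.70, 0.80, 0.60]
toy_doctor_rating = [5, 2, 3, 4]  # e.g. a mean predictability rating per disease

macro_auc = mean(toy_auc_per_disease)  # unweighted average over diseases
rho = spearman_rho(toy_doctor_rating, toy_auc_per_disease)
print(f"macro ROC-AUC = {macro_auc:.2f}, Spearman rho = {rho:.2f}")
```

In this sketch each disease contributes equally to the average, and rho depends only on how the diseases are ordered by rating and by ROC-AUC, not on the raw values.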