Balancing accuracy and Interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction
Shamsutdinova, Diana; Stamate, Daniel and Stahl, Daniel. 2025. Balancing accuracy and Interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction. International Journal of Medical Informatics, 194, 105700. ISSN 1386-5056 [Article]
Text: 1-s2.0-S1386505624003630-main.pdf (Published Version). Available under License Creative Commons Attribution.
Text: ijmedinf24.pdf (Accepted Version; Administrator Access Only). Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract or Description
Background
Accurate and interpretable models are essential for clinical decision-making, where predictions can directly impact patient care. Machine learning (ML) survival methods can handle complex multidimensional data and achieve high accuracy but require post-hoc explanations. Traditional models such as the Cox Proportional Hazards model (Cox-PH) are less flexible but fast, stable, and intrinsically transparent. Moreover, ML does not always outperform Cox-PH in clinical settings, warranting diligent model validation. We aimed to develop a set of R functions to help explore the limits of Cox-PH compared to tree-based and deep learning survival models for clinical prediction modelling, employing ensemble learning and nested cross-validation.
Methods
We developed a set of R functions, publicly available as the package "survcompare". It supports Cox-PH and Cox-Lasso as the traditional models, Survival Random Forest (SRF) and DeepHit as the ML alternatives, and ensemble methods that integrate Cox-PH with SRF or DeepHit, designed to isolate the marginal value of ML. The package performs repeated nested cross-validation and tests the statistical significance of ML's superiority using survival-specific performance metrics: the concordance index, time-dependent AUC-ROC, and the calibration slope. To obtain practical insights, we applied this methodology to clinical and simulated datasets of varying complexity and size.
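As an illustration of the workflow described above, the sketch below shows how such a comparison might be set up in R. It assumes the survcompare package exposes a survcompare() entry point that takes a training data frame (with "time" and "event" columns) and a vector of predictor names; the data preparation, column names, and argument layout are assumptions for illustration only and should be checked against the package documentation.

```r
# Minimal usage sketch: Cox-PH vs. ML survival models with survcompare.
# NOTE: the survcompare() arguments and expected column names below are
# assumptions for illustration; consult the package documentation.
library(survcompare)
library(survival)

# Example survival data: 'lung' from the survival package,
# recoded so that the event indicator is 0/1.
df <- na.omit(survival::lung)
df$event <- df$status - 1   # 2 = death -> 1, 1 = censored -> 0
predictors <- c("age", "sex", "ph.ecog", "ph.karno", "wt.loss")

# Repeated nested cross-validation comparing the Cox model against
# the Cox-PH + Survival Random Forest ensemble.
set.seed(42)
result <- survcompare(
  df[, c("time", "event", predictors)],
  predictors   # assumed: vector of predictor column names
)
print(result)  # assumed: a print/summary method reports the comparison
```

Under these assumptions, the returned object would carry the cross-validated concordance index, time-dependent AUC-ROC and calibration estimates for each model, together with the significance test for the ML model's added value.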
Results
In simulated data with non-linearities or interactions, ML models outperformed Cox-PH at sample sizes ≥500. ML superiority was also observed in imaging and high-dimensional clinical data. However, for tabular clinical data, the performance gains of ML were minimal; in some cases, the regularised Cox-Lasso recovered much of ML's performance advantage with significantly faster computation. Ensemble methods combining Cox-PH and ML predictions were instrumental in quantifying Cox-PH's limits and improving ML calibration. Traditional models such as Cox-PH or Cox-Lasso should not be overlooked when developing clinical prediction models from tabular data or data of limited size.
Conclusion
Our package offers researchers a framework and practical tool for evaluating the accuracy-interpretability trade-off, helping make informed decisions about model selection.
Item Type: Article
Identification Number (DOI):
Additional Information: Funding: D Shamsutdinova and D Stahl are funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the NHS, NIHR or the Department of Health and Social Care. This paper represents independent research part-funded by the NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. Declaration of Generative AI and AI-assisted technologies in the writing process: During the preparation of this work the author(s) used Grammarly (https://www.grammarly.com) in order to improve the readability of the manuscript, including spelling and grammar check. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the published article.
Keywords: Clinical prediction model, Interpretability, Survival analysis, Ensemble methods, Internal validation, R
Departments, Centres and Research Units:
Dates:
Item ID: 37853
Date Deposited: 18 Nov 2024 11:54
Last Modified: 18 Nov 2024 11:58
Peer Reviewed: Yes, this version has been peer-reviewed.
URI: