Balancing accuracy and interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction

Shamsutdinova, Diana; Stamate, Daniel and Stahl, Daniel. 2025. Balancing accuracy and interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction. International Journal of Medical Informatics, 194, 105700. ISSN 1386-5056 [Article]

1-s2.0-S1386505624003630-main.pdf (Published Version, 4 MB): available under a Creative Commons Attribution licence.
ijmedinf24.pdf (Accepted Version, 1 MB): administrator access only; available under a Creative Commons Attribution Non-commercial No Derivatives licence.

Abstract or Description

Background
Accurate and interpretable models are essential for clinical decision-making, where predictions can directly impact patient care. Machine learning (ML) survival methods can handle complex multidimensional data and achieve high accuracy, but require post-hoc explanations. Traditional models such as the Cox Proportional Hazards model (Cox-PH) are less flexible, but fast, stable, and intrinsically transparent. Moreover, ML does not always outperform Cox-PH in clinical settings, warranting diligent model validation. We aimed to develop a set of R functions to help explore the limits of Cox-PH compared with tree-based and deep learning survival models for clinical prediction modelling, employing ensemble learning and nested cross-validation.
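
For context (a standard formula, not specific to this work): the Cox-PH model assumes a hazard of the form h(t | x) = h0(t) · exp(β1x1 + … + βpxp), so each exp(βj) is a hazard ratio for a one-unit change in xj. This log-linear, proportional-hazards structure is what makes the model intrinsically transparent, and it is also what ML survival models relax by allowing non-linear effects and interactions.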

Methods
We developed a set of R functions, publicly available as the package "survcompare". It supports Cox-PH and Cox-Lasso as the traditional models, and Survival Random Forest (SRF) and DeepHit as the ML alternatives, along with ensemble methods integrating Cox-PH with SRF or DeepHit, designed to isolate the marginal value of ML. The package performs repeated nested cross-validation and tests the statistical significance of the ML models' superiority using survival-specific performance metrics: the concordance index, time-dependent AUC-ROC, and calibration slope. To gain practical insights, we applied this methodology to clinical and simulated datasets of varying complexity and size.
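
As an illustrative sketch of the kind of comparison the package automates (not the survcompare API itself; the package additionally tunes hyperparameters in an inner cross-validation loop, adds the Cox-based ensembles and DeepHit, and reports time-dependent AUC-ROC and calibration slope), the following R code repeatedly cross-validates Cox-PH against a Survival Random Forest and compares their concordance indices. The survival and randomForestSRC packages and the pbc example data are choices made for this sketch, not taken from the paper.

library(survival)         # coxph(), Surv(), concordance()
library(randomForestSRC)  # rfsrc() for Survival Random Forest

data(pbc, package = "survival")
df <- na.omit(pbc[, c("time", "status", "age", "bili", "albumin", "edema")])
df$status <- as.integer(df$status == 2)   # 2 = death; treat transplant as censored

set.seed(42)
n_repeats <- 5
n_folds   <- 5
results   <- data.frame()

for (r in seq_len(n_repeats)) {
  folds <- sample(rep(seq_len(n_folds), length.out = nrow(df)))
  for (k in seq_len(n_folds)) {
    train <- df[folds != k, ]
    test  <- df[folds == k, ]

    # Fit both models on the training fold
    cox_fit <- coxph(Surv(time, status) ~ ., data = train)
    srf_fit <- rfsrc(Surv(time, status) ~ ., data = train, ntree = 300)

    # Risk scores on the held-out fold: Cox linear predictor, SRF predicted mortality
    test$cox_risk <- predict(cox_fit, newdata = test, type = "lp")
    test$srf_risk <- predict(srf_fit, newdata = test)$predicted

    # Harrell's concordance index; reverse = TRUE because higher risk implies shorter survival
    c_cox <- concordance(Surv(time, status) ~ cox_risk, data = test, reverse = TRUE)$concordance
    c_srf <- concordance(Surv(time, status) ~ srf_risk, data = test, reverse = TRUE)$concordance
    results <- rbind(results, data.frame(rep = r, fold = k, c_cox = c_cox, c_srf = c_srf))
  }
}

# Mean concordance per model and a paired test of SRF's gain over Cox-PH
print(colMeans(results[, c("c_cox", "c_srf")]))
print(t.test(results$c_srf, results$c_cox, paired = TRUE))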

Results
In simulated data with non-linearities or interactions, ML models outperformed Cox-PH at sample sizes ≥500. ML superiority was also observed in imaging and high-dimensional clinical data. However, for tabular clinical data, the performance gains of ML were minimal; in some cases, the regularised Cox-Lasso recovered much of the ML performance advantage with significantly faster computation. Ensemble methods combining Cox-PH and ML predictions were instrumental in quantifying Cox-PH's limits and improving ML calibration. Traditional models such as Cox-PH or Cox-Lasso should therefore not be overlooked when developing clinical prediction models from tabular data or data of limited size.
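
To make the simulation setting concrete, here is a hedged sketch (our own construction for illustration, not the paper's exact simulation design) of survival data with a non-linearity and an interaction that a linear Cox-PH on (x1, x2) cannot capture without manual feature engineering:

set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
# true log-hazard depends on x1 squared and an x1 * x2 interaction
lin_pred  <- 0.7 * x1^2 + 0.8 * x1 * x2
true_time <- rexp(n, rate = 0.10 * exp(lin_pred))   # proportional-hazards exponential times
cens_time <- rexp(n, rate = 0.05)                   # independent censoring
sim <- data.frame(time   = pmin(true_time, cens_time),
                  status = as.integer(true_time <= cens_time),
                  x1 = x1, x2 = x2)
# 'sim' can be passed to the cross-validated comparison sketched above, where
# SRF should recover the non-linear signal that the linear Cox-PH misses
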

Conclusion
Our package offers researchers a framework and a practical tool for evaluating the accuracy-interpretability trade-off, helping them make informed decisions about model selection.

Item Type:

Article

Identification Number (DOI):

https://doi.org/10.1016/j.ijmedinf.2024.105700

Additional Information:

Funding: D Shamsutdinova and D Stahl are funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the NHS, NIHR or the Department of Health and Social Care. This paper represents independent research part-funded by the NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London.

Declaration of Generative AI and AI-assisted technologies in the writing process: During the preparation of this work the author(s) used Grammarly (https://www.grammarly.com) in order to improve the readability of the manuscript, including spelling and grammar check. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the published article.

Keywords:

Clinical prediction model, Interpretability, Survival analysis, Ensemble methods, Internal validation, R

Departments, Centres and Research Units:

Computing

Dates:

8 November 2024: Accepted
15 November 2024: Published Online
February 2025: Published

Item ID:

37853

Date Deposited:

18 Nov 2024 11:54

Last Modified:

18 Nov 2024 11:58

Peer Reviewed:

Yes, this version has been peer-reviewed.

URI:

https://research.gold.ac.uk/id/eprint/37853

