Combining Cox Model and Tree-Based Algorithms to Boost Performance and Preserve Interpretability for Health Outcomes

Shamsutdinova, Diana; Stamate, Daniel; Roberts, Angus and Stahl, Daniel. 2022. 'Combining Cox Model and Tree-Based Algorithms to Boost Performance and Preserve Interpretability for Health Outcomes'. In: 18th IFIP International Conference on Artificial Intelligence Applications and Innovations. Hersonissos, Crete, Greece 17 - 20 June 2022. [Conference or Workshop Item]

AIAI_2022_1.pdf - Accepted Version


Abstract or Description

Predicting health outcomes such as disease onset, recovery or mortality is an important part of medical research. Classical survival analysis methods such as the Cox proportional hazards model have been employed successfully and have proved robust and easy to interpret. Recent developments in computational methods and the digitalization of medical records have brought new tools to survival analysis that can handle large data with complex non-linear relationships. However, such methods often result in “black box” models that are hard to interpret. In this project we combine the Cox model with tree-based machine-learning algorithms to take advantage of both approaches’ strengths and to boost overall predictive performance. We also aimed to preserve the interpretability of the results, quantify the contributions of linear, non-linear and cross-term dependencies, and gain insight into potential non-linearity. The first method ensembles the Cox model with a survival random forest. The second employs a survival tree algorithm to cluster the data and then fits a separate Cox model in each cluster. The third uses the clusters obtained with a survival tree to identify interaction and non-linear terms and adds them as new terms to the Cox model. We tested the methods on simulated and real-life medical data and compared their internally validated discrimination and calibration. Our results show that classical models outperform the combined methods on data with predominantly linear relationships. The proposed methods were more effective at predicting survival outcomes with strong non-linear and inter-dependent relationships, and they provided insight into where the non-linearity lies.
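
The abstract compares models by their discrimination but does not name the metric used. A common choice for survival models is Harrell's concordance index (C-index), and the sketch below assumes that metric purely for illustration. The function name `concordance_index` and the toy data are hypothetical and are not taken from the paper.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- observed follow-up time for each subject
    events      -- 1 if the event was observed, 0 if the subject was censored
    risk_scores -- model output where a higher score means higher predicted risk

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under observation at that time (times[i] < times[j]).
    Pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data (hypothetical): four subjects, the third one is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
well_ordered = [0.9, 0.7, 0.5, 0.1]   # risk decreases as survival time grows
uninformative = [0.5, 0.5, 0.5, 0.5]  # same score for everyone

c_good = concordance_index(times, events, well_ordered)
c_flat = concordance_index(times, events, uninformative)
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to ranking no better than chance. Under this assumption, the Cox model and each combined method would be scored on the same held-out or resampled data so that their C-index values can be compared directly.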

Item Type:

Conference or Workshop Item (Paper)

Identification Number (DOI):

https://doi.org/10.1007/978-3-031-08337-2_15

Additional Information:

“This version of the contribution has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/978-3-031-08337-2_15. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms”.

Daniel Stahl and Angus Roberts are part-funded by the National Institute for Health Research (NIHR) Maudsley Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London.

Keywords:

survival analysis; health research; Cox model; survival random forest; machine learning; ensemble methods

Departments, Centres and Research Units:

Computing

Dates:

Accepted: 31 March 2022
Published: 10 June 2022

Event Location:

Hersonissos, Crete, Greece

Date range:

17 - 20 June 2022

Item ID:

32819

Date Deposited:

20 Dec 2022 13:26

Last Modified:

10 Jun 2023 01:26

URI:

https://research.gold.ac.uk/id/eprint/32819
