Predicting risk of dementia with machine learning and survival models using routine primary care records

Langham, John; Stamate, Daniel; Wu, Charlotte A.; Murtagh, Fionn; Morgan, Catharine; Reeves, David; Ashcroft, Darren; Kontopantelis, Evan and McMillan, Brian. 2022. 'Predicting risk of dementia with machine learning and survival models using routine primary care records'. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Houston, TX, United States 9-12 December 2021. [Conference or Workshop Item]

paper_2021254853.pdf - Accepted Version

Abstract or Description

Worldwide, it is forecast that 131.5 million people will suffer from dementia by 2050, and that the annual cost of care will increase from 818 billion USD in 2016 to 2 trillion USD by 2030, with burgeoning social consequences. Given a timely prediction of a dementia outcome in patients, appropriate mitigating interventions can be applied to reduce risk. However, for interventions to have a meaningful quantitative impact, such prediction facilities need to be made available to wider populations, and they cannot rely on specialised, costly and invasive testing (such as neuroimaging or cerebrospinal fluid collection, which are important instruments used in diagnosis). Hence there is an emerging need for the wider application of prognostic measures that can be deployed using lower-cost data sources, such as longitudinal records routinely collected by general practices. This paper proposes an efficient prediction modelling approach to the risk of dementia, using Clinical Practice Research Datalink (CPRD) data collected from general practices in the UK. The approach is based on machine learning, in particular the Gradient Boosting Machines model, combined with a survival model such as Cox proportional hazards, and is encapsulated in a semi-supervised learning and model calibration methodology.

Item Type:

Conference or Workshop Item (Paper)

Identification Number (DOI):

Additional Information:

© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.


Keywords:

dementia risk, CPRD, primary care, prediction modelling, machine learning, classification, gradient boosting machines, Cox proportional hazards, model calibration

Departments, Centres and Research Units:



Dates:

1 November 2021: Accepted
14 January 2022: Published

Event Location:

Houston, TX, United States

Date range:

9-12 December 2021

Item ID:


Date Deposited:

25 Feb 2022 16:17

Last Modified:

26 Feb 2022 17:00

