#ChronicPain: Automated Building of a Chronic Pain Cohort from Twitter Using Machine Learning

Sarker, A; Lakamana, S; Guo, Y; Ge, Y; Leslie, A; Okunromade, O; Gonzalez-Polledo, EJ; Perrone, J and Mackenzie-Brown, AM. 2023. #ChronicPain: Automated Building of a Chronic Pain Cohort from Twitter Using Machine Learning. Health Data Science, 3, 0078. ISSN 2765-8783 [Article]

hds.0078.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract or Description

Background: Due to the high burden of chronic pain, and the detrimental public health consequences of its treatment with opioids, there is a high-priority need to identify effective alternative therapies. Social media is a potentially valuable resource for knowledge about self-reported therapies by chronic pain sufferers.

Methods: We attempted to (a) verify the presence of large-scale chronic pain-related chatter on Twitter, (b) develop natural language processing and machine learning methods for automatically detecting self-disclosures of chronic pain, (c) collect longitudinal data posted by the users who made those self-disclosures, and (d) semiautomatically analyze the types of chronic pain-related information these users reported. We collected data using chronic pain-related hashtags and keywords and manually annotated 4,998 posts to indicate whether they were self-reports of chronic pain experiences. We trained and evaluated several state-of-the-art supervised text classification models and deployed the best-performing classifier. We then collected all publicly available posts from the detected cohort members and conducted manual and natural language processing-driven descriptive analyses.

Results: Interannotator agreement for the binary annotation was 0.82 (Cohen’s kappa). The RoBERTa model performed best (F1 score: 0.84; 95% confidence interval: 0.80 to 0.89), and we used this model to classify all collected unlabeled posts. We discovered 22,795 self-reported chronic pain sufferers and collected over 3 million of their past posts. Further analyses revealed information about, but not limited to, alternative treatments, patient sentiments about treatments, side effects, and self-management strategies.
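The two metrics quoted in the Results, Cohen's kappa and the F1 score, can be illustrated with a minimal sketch. The function names and toy labels below are hypothetical and are not taken from the paper. The record does not state how the 95% confidence interval was computed, so the percentile bootstrap shown here is an assumption rather than the authors' method.

```python
import random
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def f1_score(gold, pred, positive=1):
    """F1 score for the positive class (harmonic mean of precision and recall)."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_f1_ci(gold, pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1 (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        # Resample item indices with replacement and rescore.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([gold[i] for i in idx], [pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy labels for illustration only: 1 = self-report, 0 = not a self-report.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
annotator_b = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]

kappa = cohen_kappa(annotator_a, annotator_b)
f1 = f1_score(annotator_a, annotator_b)
ci_low, ci_high = bootstrap_f1_ci(annotator_a, annotator_b)
print(f"kappa={kappa:.2f}  F1={f1:.2f}  95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is the usual choice for reporting annotation reliability on a binary task like this one.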

Conclusion: Our social media-based approach will yield a large cohort that grows automatically over time, and the data can be leveraged to identify effective opioid-alternative therapies for diverse chronic pain types.

Item Type:

Article

Additional Information:

Funding: Research reported in this publication was supported in part by the National Institute on Drug Abuse (NIDA) of the National Institutes of Health (NIH) under award number R01DA057599.

Data Access Statement:

The tweet IDs and their labels are available (Supplementary material S2). Researchers may download the texts of the tweets associated with the IDs as long as they are publicly available via the Twitter API.

12 June 2023 Accepted
4 July 2023 Published

Date Deposited:

15 Aug 2023 10:37

Last Modified:

15 Aug 2023 10:37

Peer Reviewed:

Yes, this version has been peer-reviewed.


