EmoMBTI-Net: introducing and leveraging a novel emoji dataset for personality profiling with large language models

Kumar, Akshi and Jain, Dipika. 2024. EmoMBTI-Net: introducing and leveraging a novel emoji dataset for personality profiling with large language models. Social Network Analysis and Mining, 14, 234. ISSN 1869-5450 [Article]

Text
s13278-024-01400-z.pdf - Published Version
Available under License Creative Commons Attribution.


Abstract or Description

Emojis, integral to digital communication, often encapsulate complex emotional layers that enhance text beyond mere words. This research leverages the expressive power of emojis to predict Myers-Briggs Type Indicator (MBTI) personalities, diverging from conventional text-based approaches. We developed a unique dataset, EmoMBTI, by mapping emojis to specific MBTI traits using diverse posts scraped from Reddit. This dataset enabled the integration of Natural Language Processing (NLP) techniques tailored for emoji analysis. Large Language Models (LLMs) such as FlanT5, BART, and PEGASUS were trained to generate contextual linkages between text and emojis, further correlating these emojis with MBTI personalities. Following the creation of this dataset, these LLMs were applied to understand the context conveyed by emojis and were subsequently fine-tuned. Additionally, transformer models like RoBERTa, DeBERTa, and BART were specifically fine-tuned to predict MBTI personalities based on emoji mappings from MBTI dataset posts. Our methodology significantly enhances the capability of personality assessments, with the fine-tuned BART model achieving an impressive accuracy of 0.875 in predicting MBTI types, which notably exceeds the performances of RoBERTa and DeBERTa, at 0.82 and 0.84 respectively. By leveraging the nuanced communication potential of emojis, this approach not only advances personality profiling techniques but also deepens insights into digital behaviour, highlighting the substantial impact of emotive icons in online interactions.

Item Type:

Article

Identification Number (DOI):

https://doi.org/10.1007/s13278-024-01400-z

Data Access Statement:

No datasets were generated or analysed during the current study.

Keywords:

Sentiment analysis, Personality, MBTI, Emojis, LLM, Natural language understanding

Departments, Centres and Research Units:

Computing

Dates:

Date | Event
28 November 2024 | Accepted
10 December 2024 | Published

Item ID:

37968

Date Deposited:

11 Dec 2024 09:15

Last Modified:

11 Dec 2024 09:15

Peer Reviewed:

Yes, this version has been peer-reviewed.

URI:

https://research.gold.ac.uk/id/eprint/37968
