A probabilistic approach to mining mobile phone data sequences

Farrahi, Katayoun and Gatica-Perez, Daniel. 2014. A probabilistic approach to mining mobile phone data sequences. Personal and Ubiquitous Computing, 18(1), pp. 223-238. ISSN 1617-4909 [Article]

No full text available

Abstract or Description

We present a new approach to address the problem of large sequence mining from big data. The particular problem of interest is the effective mining of long sequences from large-scale location data to be practical for Reality Mining applications, which suffer from large amounts of noise and lack of ground truth. To address this complex data, we propose an unsupervised probabilistic topic model called the distant n-gram topic model (DNTM). The DNTM is based on latent Dirichlet allocation (LDA), which is extended to integrate sequential information. We define the generative process for the model, derive the inference procedure, and evaluate our model on both synthetic data and real mobile phone data. We consider two different mobile phone datasets containing natural human mobility patterns obtained by location sensing, the first considering GPS/wi-fi locations and the second considering cell tower connections. The DNTM discovers meaningful topics on the synthetic data as well as the two mobile phone datasets. Finally, the DNTM is compared to LDA by considering log-likelihood performance on unseen data, showing the predictive power of the model. The results show that the DNTM consistently outperforms LDA as the sequence length increases.

Item Type:

Article

Keywords:

Mobile Phone, Topic Model, Latent Dirichlet Allocation, Unseen Data, Mobile Phone Data

Published Online: 20 February 2013
Published: January 2014

Date Deposited:

04 Nov 2013 11:00

Last Modified:

23 Apr 2021 14:31

Peer Reviewed:

Yes, this version has been peer-reviewed.


