A probabilistic approach to mining mobile phone data sequences

Farrahi, Katayoun and Gatica-Perez, Daniel. 2013. A probabilistic approach to mining mobile phone data sequences. Personal and Ubiquitous Computing, n/a, n/a-n/a. ISSN 1617-4909 [Article]

No full text available

Abstract or Description

We present a new approach to mining long sequences from large-scale data. The specific problem of interest is the effective mining of long sequences from large-scale location data in a way that is practical for Reality Mining applications, where the data are noisy and lack ground truth. To handle such data, we propose an unsupervised probabilistic topic model called the distant n-gram topic model (DNTM). The DNTM extends latent Dirichlet allocation (LDA) to integrate sequential information. We define the generative process for the model, derive the inference procedure, and evaluate the model on both synthetic data and real mobile phone data. We consider two mobile phone datasets containing natural human mobility patterns obtained by location sensing: the first uses GPS/Wi-Fi locations and the second uses cell tower connections. The DNTM discovers meaningful topics on the synthetic data as well as on the two mobile phone datasets. Finally, the DNTM is compared with LDA in terms of log-likelihood on unseen data, as a measure of the model's predictive power. The results show that the DNTM consistently outperforms LDA as the sequence length increases.
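
The log-likelihood comparison on unseen data mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `heldout_log_likelihood`, the toy distributions, and the assumption that the topic proportions of the unseen document are already known are all hypothetical choices made for illustration. It only shows the generic idea of scoring a held-out sequence under a topic mixture, where a higher average log-likelihood indicates better predictive fit.

```python
import math

def heldout_log_likelihood(doc, theta, phi):
    """Average per-token log-likelihood of one unseen document.

    doc   : list of word ids (e.g. location labels mapped to integers)
    theta : topic proportions for this document (assumed given, sums to 1)
    phi   : phi[k][w] = probability of word w under topic k
    """
    total = 0.0
    for w in doc:
        # Marginalise over topics: p(w) = sum_k theta[k] * phi[k][w]
        p_w = sum(t * topic[w] for t, topic in zip(theta, phi))
        total += math.log(p_w)
    return total / len(doc)

# Toy, made-up numbers: 2 topics over a vocabulary of 3 symbols.
phi = [[0.7, 0.2, 0.1],
       [0.1, 0.2, 0.7]]
theta = [0.5, 0.5]
doc = [0, 0, 2, 1]

score = heldout_log_likelihood(doc, theta, phi)
print(score)
```

In a real comparison the topic proportions of an unseen document would have to be inferred rather than supplied, and the same held-out documents would be scored under each model being compared.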

Date Deposited:

04 Nov 2013 11:00

Last Modified:

20 Jun 2017 10:06

Peer Reviewed:

Yes, this version has been peer-reviewed.


