Synthetic Sensor Data for Human Activity Recognition

Alharbi, F; Ouarbya, L and Ward, J A. 2020. 'Synthetic Sensor Data for Human Activity Recognition'. In: International Joint Conference on Neural Networks (IJCNN). Glasgow, United Kingdom 19-24 July 2020. [Conference or Workshop Item]

Final__PRESUBMIT_JW- -29 - April -2020.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial.


Abstract or Description

Human activity recognition (HAR) based on wearable sensors has emerged as an active topic of research in machine learning and human behavior analysis because of its applications in several fields, including health, security and surveillance, and remote monitoring. Machine learning algorithms are frequently applied in HAR systems to learn from labeled sensor data. The effectiveness of these algorithms generally relies on access to large amounts of accurately labeled training data. But labeled data for HAR is hard to come by and is often heavily imbalanced in favor of one or a few dominant classes, which in turn leads to poor recognition performance.
In this study we introduce a generative adversarial network (GAN)-based approach for HAR that we use to automatically synthesize balanced and realistic sensor data. GANs are robust generative networks, typically used to create synthetic images that cannot be distinguished from real images. Here we explore and construct a model for generating several types of human activity sensor data using a Wasserstein GAN (WGAN). We assess the synthetic data using two commonly used classifier models, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. We evaluate the quality and diversity of the synthetic data by training on synthetic data and testing on real sensor data, and vice versa. We then use synthetic sensor data to oversample the imbalanced training set. We demonstrate the efficacy of the proposed method on two publicly available human activity datasets, the Sussex-Huawei Locomotion (SHL) dataset and the Smoking Activity Dataset (SAD). With a CNN activity classifier, training on WGAN-augmented data improves on the imbalanced case for both SHL (F1-score from 0.85 to 0.95) and SAD (F1-score from 0.70 to 0.77).
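
The oversampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the window representation, the function names `synthetic_counts` and `oversample`, and the `placeholder_generator` standing in for a trained per-class WGAN generator are all assumptions made for illustration. The sketch only shows the bookkeeping, that is, how many synthetic windows each minority class would need to match the largest class.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed representation: one fixed-length window of sensor readings.
Window = List[float]


def synthetic_counts(labels: Sequence[str]) -> Dict[str, int]:
    """Return how many synthetic windows each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def oversample(
    windows: List[Window],
    labels: List[str],
    generate: Callable[[str, int], List[Window]],
) -> Tuple[List[Window], List[str]]:
    """Append generated windows to the real data until all classes are equally represented."""
    out_x, out_y = list(windows), list(labels)
    for cls, missing in synthetic_counts(labels).items():
        if missing == 0:
            continue  # the majority class needs no synthetic data
        fake = generate(cls, missing)
        out_x.extend(fake)
        out_y.extend([cls] * len(fake))
    return out_x, out_y


def placeholder_generator(cls: str, n: int) -> List[Window]:
    """Hypothetical stand-in for a trained generator; returns constant windows."""
    return [[0.0] * 4 for _ in range(n)]


if __name__ == "__main__":
    labels = ["walk"] * 5 + ["smoke"] * 2
    windows = [[1.0] * 4 for _ in labels]
    x, y = oversample(windows, labels, placeholder_generator)
    print(Counter(y))
```

In a real pipeline the placeholder would be replaced by sampling from a generator trained on each activity class, and the balanced set would then be used to train the CNN or LSTM classifier.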

Item Type:

Conference or Workshop Item (Paper)

Identification Number (DOI):

Additional Information:

© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Departments, Centres and Research Units:



20 March 2020 (Accepted)
28 September 2020 (Published)

Event Location:

Glasgow, United Kingdom

Date range:

19-24 July 2020

Item ID:


Date Deposited:

01 May 2020 14:24

Last Modified:

11 Jun 2021 17:19

