Examining Student Coding Behaviours in Creative Computing Lessons using Abstract Syntax Trees and Vocabulary Analysis

Yee-King, Matthew; McCallum, Louis; Llano, Maria Teresa; Ruzicka, Vit; d'Inverno, Mark and Grierson, Mick. 2020. 'Examining Student Coding Behaviours in Creative Computing Lessons using Abstract Syntax Trees and Vocabulary Analysis'. In: 2020 ACM Conference on Innovation and Technology in Computer Science Education. Trondheim, Norway. [Conference or Workshop Item]

Text (yeeking_iticse_2020)
3341525.3387408.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial.


Abstract or Description

Creative computing is an approach to computing education which emphasises the creation of interactive audiovisual software and an art-school-influenced pedagogy. Given this emphasis on Dewey’s “learning by doing”, we set out to investigate the processes students use to develop their programs. We refer to these processes as the students’ ‘coding behaviour’, and we expect that understanding it will provide valuable information about how students learn in our creative computing classes. As existing metrics were not sufficient, we introduce a new set of quantitative metrics to describe coding behaviours. The metrics consider factors such as students’ vocabulary use and development, how fast and how much they alter the functionality of code over time, and how they iterate on their code through text insert and delete operations. Many of our lessons provide students with demonstrator code which they use as a base for developing their own programs, so we use demo code as the entry point to our dataset. Within a dataset of over 16,000 programs, we examine the programs students wrote by developing the demo code. We clustered the demo code using the set of descriptive metrics. This led to a set of clusters containing programs associated with distinct coding behaviours; four clusters gave the best cluster density and separation. We found that the clusters had distinct behaviour patterns, that they were associated with different instructors and that they contained demo programs of different lengths.
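As an illustration only, the following is a minimal sketch of how per-edit metrics of the kind listed above (vocabulary development, the size of text insert and delete operations, and whether an edit changes the code's abstract syntax tree) might be computed from successive snapshots of one program. It assumes the programs are Python and uses the standard `ast` and `difflib` modules; the paper's own metric definitions, target language and parser may differ, and every name here (`StepMetrics`, `vocabulary`, `step_metrics`, `session_metrics`) is hypothetical.

```python
import ast
import difflib
from dataclasses import dataclass


@dataclass
class StepMetrics:
    """Hypothetical metrics for one edit between two consecutive program versions."""
    chars_inserted: int  # characters added by insert/replace operations
    chars_deleted: int   # characters removed by delete/replace operations
    new_vocab: int       # identifiers appearing for the first time in this version
    ast_changed: bool    # False when only whitespace or comments changed


def vocabulary(source: str) -> set[str]:
    """Collect variable, attribute and function names used in a Python program."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
        elif isinstance(node, ast.FunctionDef):
            names.add(node.name)
    return names


def step_metrics(before: str, after: str, seen: set[str]) -> StepMetrics:
    """Compare two versions; `seen` (vocabulary used so far) is updated in place."""
    inserted = deleted = 0
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted += i2 - i1
        if tag in ("insert", "replace"):
            inserted += j2 - j1

    vocab = vocabulary(after)
    new_vocab = len(vocab - seen)
    seen |= vocab

    # ast.dump omits line/column attributes by default, so edits that only
    # touch layout or comments produce identical dumps.
    ast_changed = ast.dump(ast.parse(before)) != ast.dump(ast.parse(after))
    return StepMetrics(inserted, deleted, new_vocab, ast_changed)


def session_metrics(versions: list[str]) -> list[StepMetrics]:
    """Metrics for each consecutive pair of snapshots, starting from the demo code."""
    seen = vocabulary(versions[0])  # vocabulary already present in the demo code
    return [step_metrics(a, b, seen) for a, b in zip(versions, versions[1:])]
```

For example, on three snapshots where the second only adds a blank line and a comment, and the third introduces a new variable, the first step reports `ast_changed=False` and the second reports one new vocabulary item. Treating an unchanged AST as a non-functional edit is only an approximation: a variable rename, for instance, changes the AST without necessarily changing behaviour.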

Item Type:

Conference or Workshop Item (Paper)

Identification Number (DOI):


Additional Information:

The work reported in this paper was supported by the Arts and Humanities Research Council under grant number AH/R002657/1, and the Higher Education Funding Council England Catalyst Scheme under grant number PK31.

(c) 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


creative computing, automated code analysis, MOOCs, demonstrator code

Accepted: 2 March 2020
Published: 17 June 2020

Event Location:

Trondheim, Norway

Date Deposited:

23 Jun 2020 10:22

Last Modified:

10 Jun 2021 08:16