Audiovisual Scene Synthesis

Mital, Parag Kumar. 2014. Audiovisual Scene Synthesis. Doctoral thesis, Goldsmiths, University of London [Thesis]

COM_thesis_MitalPK_2014.pdf - Accepted Version
Permissions: Administrator Access Only


Abstract or Description

This thesis attempts to open a dialogue around fundamental questions of perception such as: how do we represent our ongoing auditory or visual perception of the world using our brain; what could these representations explain and not explain; and how can these representations eventually be modeled by computers? Rather than answer these questions scientifically, we will attempt to develop a computational arts practice presenting these questions to participants. The approach this thesis takes is computational scene synthesis: a computationally generative collage process where the units of the collage are built using perceptually-inspired representations. We explain how scene synthesis is built in detail and relate it to an existing lineage of collage-based practitioners. Then, working in auditory and visual domains separately, in order to bring questions of perception to the experience of the artwork, this thesis makes significant interdisciplinary strides from reviewing fundamental issues in perception in terms of experimental psychology and cognitive neuroscience, to formulating and developing perceptually-inspired computational models of large databases of audiovisual material, to finally developing these models with a computationally generative collage-based arts practice. Two final practical outputs using audiovisual scene synthesis will be explored: (1) a short film series which attempts to recreate the number 1 video of the week on YouTube using only the audiovisual content from the remaining top 10 videos; and (2) a real-time augmented reality experience presented through a virtual reality headset and headphones presenting a scene synthesis of a participant's surroundings using only previously learned audiovisual fragments. Results from both outputs demonstrate the ability of scene synthesis to provoke meaningful engagements with one's own process of perception. The results further demonstrate that scene synthesis is capable of highlighting both theoretical and practical gaps in our current understanding of human perception and its computational implementations.

Item Type:

Thesis (Doctoral)

Identification Number (DOI):


Keywords:

Collage, Scene Analysis, Synthesis, Perception, Attention, Proto-objects, Streaming, Decoding, Information Retrieval, Augmented Reality, YouTube, Digital Copyright, Smash Up, Mashup, Infringement, Audiovisual, Memory Mosaic, Photosynthesizer, Daphne Oram, MFCC, PLCA, MSCR, Visual Acuity, ERP, N1, PN, MMN, ORN

Departments, Centres and Research Units:

Computing > Embodied AudioVisual Interaction Group (EAVI)


Date:

11 September 2014

Item ID:


Date Deposited:

16 Sep 2014 13:30

Last Modified:

29 Apr 2020 16:02

