Using Virtual Objects With Hand-Tracking: The Effects of Visual Congruence and Mid-Air Haptics on Sense of Agency

Virtual reality expands the possibilities of human action. With hand-tracking technology, we can directly interact with these environments without the need for a mediating controller. Much previous research has looked at the user-avatar relationship. Here we explore the avatar-object relationship by manipulating the visual congruence and haptic feedback of the virtual object of interaction. We examine the effect of these variables on the sense of agency (SoA), which refers to the feeling of control over our actions and their effects. This psychological variable is highly relevant to user experience and is attracting increased interest in the field. Our results showed that implicit SoA was not significantly affected by visual congruence and haptics. However, both of these manipulations significantly affected explicit SoA, which was strengthened by the presence of mid-air haptics and was weakened by the presence of visual incongruence. We propose an explanation of these findings that draws on the cue integration theory of SoA. We also discuss the implications of these findings for HCI research and design.


I. INTRODUCTION
Virtual reality opens up new possibilities for human action and interaction. This has expanded the horizons of human agency and has promising applications in domains such as medicine [1], motor rehabilitation [2], [3], cooperation [4], and animation and editing [5]. Many of these applications depend on the user interacting with a virtual object in an immersive or non-immersive environment, normally through a virtual avatar that is itself controlled by the user. An important consideration is the means of this interaction. One option is a physical device, such as a controller or a wearable device that tracks the user's movements. Another option is hand-tracking, which allows the user to interact directly with virtual environments and has been suggested to be a more naturalistic mode of interaction [6].
Here, we consider the psychological variable known as the Sense of Agency (SoA) in these potentially more naturalistic interactions with virtual objects. SoA refers to the feeling of control over one's actions and their effects [7]. It has been the focus of much research in psychology and, over roughly the past 10 years, has also attracted growing interest in the HCI community [8]. This is primarily because users' sense of being in control of a system is recognised as fundamental to effective user interface design [9]. Additionally, HCI research has benefitted from adopting the rigorous measures and theories developed in psychological research on SoA. We aim to continue this approach in the present study, in which we investigate SoA in the context of the avatar-object relationship.

II. TRACKING THE VIRTUAL AGENT
Although hand-tracking may be preferable because it supports natural, gesture-based interaction, there are concerns about its accuracy and precision [6]. This is particularly relevant to SoA, which is known to be acutely sensitive to perturbations in the relationship between a movement and its visual representation (e.g., [10], [11]). This feature of agency processing is captured by the comparator model, which emphasises the importance of correspondence between expected and actual action feedback in generating SoA [12].
In line with this, an extensive body of research has confirmed that the relationship between user and avatar movement is important for the experience of agency. For example, artefacts that disrupt the user-avatar relationship, such as latency, jitter and spatial incongruence, have been shown to affect SoA [13], [14], [15], [16], [17], [18]. Naturally, these contingencies are considered important for user representation in HCI [19]. What has seldom been investigated, however, is whether this extends to our interactions with objects in the virtual environment. We explore this here by assessing the effect of manipulating the relationship between a virtual action aimed at an object and the behaviour of that object. Psychological theories have consistently emphasised the importance of environmental feedback in informing SoA (e.g., [20], [21]), and the limited research in this area appears to support this. For example, it has been shown that when a person causes an object to move on a screen, the congruency between the extent of the movement and the force applied can affect SoA [22]. In light of this, we would expect disruption of the virtual action-object relationship to reduce SoA.
Another variable of interest in the context of hand-tracking technology is haptics. Although hand-tracking allows for more naturalistic interactions, it lacks the tactile feedback that typically accompanies actions in the physical world. Psychological theories of SoA emphasise the importance of bodily feedback and sensory signals in the construction of this experience [20], [21]. The absence of haptic feedback could therefore harm SoA. To overcome this issue, technology has recently been developed that provides mid-air haptic feedback without the need for wearables or physical objects [23], [24]. These devices use ultrasound arrays that target focused points on the hand, stimulating mechanoreceptors and producing vibrotactile sensations.
Cornelio-Martinez et al. [25] showed that mid-air haptic feedback benefits gesture-based touchless interactions, increasing SoA compared with visual feedback. Recent research by Evangelou et al. [14] has examined the presence of mid-air haptics for virtual objects of interaction and shown that it can optimise SoA under certain conditions. Moreover, their study demonstrated that this haptic information also protects against the loss of SoA arising from user-avatar latency. This latter finding is important in the present context, as it suggests that any putative disruption of the avatar-object relationship with hand-tracking could also be mitigated by mid-air haptics.

III. EXPERIMENT AND CONTRIBUTIONS
The present study explores a) the effect of disrupting the avatar-object relationship and b) its possible mitigation by haptic feedback in a non-immersive virtual environment. In doing so, we aim to contribute to HCI by examining whether the responsiveness of virtual objects affects SoA, and whether the positive effects of mid-air haptics extend from the user-avatar relationship to the avatar-object relationship.
Participants pressed a virtual button with their avatar hand, which caused an auditory tone after a brief delay. In a visually congruent condition, the virtual hand made contact with the button, causing it to visibly depress. In an incongruent condition, the button did not visibly depress. The button press was accompanied either by haptic feedback emulating a physical button press or by no feedback at all. We measured SoA via the interval estimation paradigm [26]. This is an implicit measure of SoA based on changes in time perception associated with voluntary actions and their effects (Fig. 1). More specifically, when someone feels in control of their action and its effect, they perceive the interval between the two as compressed, a phenomenon referred to as intentional binding [27], [28]. We supplemented the binding measure with explicit self-report measures of agency, in which participants rated their feelings of controlling the button press and of causing the tone. These questions were adapted from previous research [14] and tailored to the task.

A. Participants
Based on a medium effect size (f = .25) and a desired power of .9, we used G*Power [29] to calculate the required sample size as 30 participants. In total, we recruited 32 participants (18 female, 1 preferred not to say) via email or the SONA participation database. They received a £15 Amazon voucher as compensation for their participation. Ages ranged from 18 to 50 years (M = 30.2 years, SD = 7.8 years). Two participants were excluded from the analyses, either for not following instructions (time estimates exceeding the instructed maximum) or for too many unreported missing trials, indicating a lack of concentration. Handedness was measured with the short form of the revised Edinburgh Handedness Inventory [30] to ensure that the dominant hand was used. For mixed handers (scores between −60 and 60), the self-reported preferred hand was used. No participants reported visual or hearing impairments.
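The handedness rule above can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure: it assumes the item scoring commonly used for the short-form inventory (always right = 100, usually right = 50, both equally = 0, usually left = −50, always left = −100, averaged over four items), and the function names are hypothetical.

```python
from statistics import mean

# Assumed item scoring for the short-form Edinburgh Handedness Inventory
# (four items, each answered on a five-point scale).
ITEM_SCORES = {
    "always right": 100,
    "usually right": 50,
    "both equally": 0,
    "usually left": -50,
    "always left": -100,
}


def laterality_quotient(responses):
    """Average the item scores; the result lies between -100 and 100."""
    if len(responses) != 4:
        raise ValueError("the short form has exactly four items")
    return mean(ITEM_SCORES[r] for r in responses)


def hand_to_track(responses, self_reported_hand):
    """Return 'left' or 'right': the dominant hand, or the self-reported
    preference for mixed handers (scores from -60 to 60, as in the text)."""
    lq = laterality_quotient(responses)
    if -60 <= lq <= 60:
        return self_reported_hand
    return "right" if lq > 60 else "left"


# Example: a mixed hander (score 25) who prefers the left hand.
print(hand_to_track(["usually right", "both equally", "usually left", "always right"], "left"))
```

Falling back on the self-reported hand only inside the mixed range mirrors the rule stated in the text; outside that range the sign of the score decides.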

B. Materials and Apparatus
An interactive non-immersive virtual scene (see Fig. 2(a)) was set up and run in the Unity game engine (v2019.4.12f1). A virtual button and a virtual hand were displayed on the screen. A Leap Motion camera tracked the participants' hand movements, which were displayed on the screen as movements of the virtual hand towards the virtual button. The Leap Motion camera was attached to an Ultraleap STRATOS Xplore development kit, which uses ultrasound to transmit tactile sensations directly to the hand [23]. This was used to provide haptic feedback for the button press (see Fig. 2(b)). The sensation was designed to emulate the force of a physical button: a circle-shaped sensation whose intensity varied dynamically from maximum at the tip down to no feedback at the point of click, and back up again.
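The dynamic button sensation described above can be expressed as an intensity profile over press depth. The sketch below is a hypothetical linear mapping that only illustrates the described shape (maximum at first contact, zero at the click point, rising again as the hand withdraws); the 5 mm click depth and the linear ramp are assumptions, and the code does not use the Ultraleap SDK.

```python
def button_haptic_intensity(depth_mm, click_depth_mm=5.0, max_intensity=1.0):
    """Hypothetical intensity of the circular sensation for a given press depth.

    depth_mm < 0: hand above the button, no contact, so no sensation.
    depth_mm == 0: first contact with the tip, maximum intensity.
    depth_mm >= click_depth_mm: click point reached, no feedback.
    Because the value depends only on depth, intensity rises again
    as the hand moves back up after the click.
    """
    if depth_mm < 0:
        return 0.0
    fraction_pressed = min(depth_mm / click_depth_mm, 1.0)
    return max_intensity * (1.0 - fraction_pressed)


# Press down and release: intensity falls to zero at the click, then recovers.
for depth in (0.0, 2.5, 5.0, 2.5, 0.0):
    print(depth, button_haptic_intensity(depth))
```

A real implementation would feed such a value to the device's own sensation API each frame; a nonlinear ramp could equally match the description.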
A 14" HD monitor displayed the virtual hand and button. The Ultraleap device was positioned so that the participant's dominant hand would be tracked at a height similar to that of the desk (see Fig. 2(a)), allowing a more natural button-press interaction. Pressing the virtual button was followed by an auditory tone after a variable delay. One second later, a UI panel was displayed on the screen, which participants could interact with via keyboard and mouse. Participants wore headphones to minimise possible sensory conflict between the mid-air tactile sensation and the audible noise generated by the ultrasound array.

C. Tasks and Measures
To measure intentional binding (implicit SoA), we adopted the direct interval estimation method from Moore et al. [31]. Participants were told that the interval between the button press and the tone would vary randomly between 1 ms and 999 ms. In reality, only three intervals were presented (100 ms, 400 ms or 700 ms), in a pseudorandomised order. Participants entered their estimates manually in the UI panel and clicked to submit and continue on each trial. Shorter interval estimates are taken to indicate a stronger SoA.
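One way to generate the pseudorandomised interval sequences is sketched below. The text does not specify the randomisation constraint, so this assumes that each 12-trial step (described under D. Design) contains every interval equally often, four times each, in shuffled order; the function names are hypothetical.

```python
import random

INTERVALS_MS = (100, 400, 700)  # actual action-tone delays used in the task


def make_step(rng, trials_per_step=12):
    """One step: each interval appears equally often, in shuffled order."""
    repeats, remainder = divmod(trials_per_step, len(INTERVALS_MS))
    if remainder:
        raise ValueError("trials per step must be a multiple of the interval count")
    step = list(INTERVALS_MS) * repeats
    rng.shuffle(step)
    return step


def make_condition(rng, steps=3, trials_per_step=12):
    """A 36-trial condition as three 12-trial steps."""
    return [make_step(rng, trials_per_step) for _ in range(steps)]


rng = random.Random(42)  # seeded so the schedule is reproducible
for step in make_condition(rng):
    print(step)
```

Balancing within each step keeps the three intervals evenly spread across the condition, which a fully random draw would not guarantee.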
For explicit SoA, two questions were adapted from previous work [14] and tailored to the task: "I feel in control of the button press" (control over the intentional action) and "I feel I am causing the tone by pressing the button" (causation of the outcome). These were rated on a Likert scale from 1 (strongly disagree) to 7 (strongly agree) every 12 trials (three times per condition); higher average scores therefore represent greater explicit agency.

D. Design
We used a 2 (haptic feedback) x 2 (visual congruence) within-subject design. Haptic feedback was manipulated at two levels: with or without. Visual congruence was also manipulated at two levels: the button either depressed with the movement of the virtual hand (Fig. 3(a)) or remained fixed (Fig. 3(b)). Each 36-trial condition was split into three steps of 12 trials, with the three interval lengths presented in a pseudorandomised order. At the end of each step, we collected the self-report measures. A Latin square was used to counterbalance the order of conditions across participants.
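The text does not say which Latin square was used. As an illustration only, the sketch below builds a balanced (Williams) Latin square for the four conditions, in which each condition appears once in every serial position and immediately follows every other condition exactly once; participant p would receive row p mod 4. The condition labels are hypothetical shorthand.

```python
def balanced_latin_square(conditions):
    """Williams design for an even number of conditions.

    The first row follows the pattern 0, 1, n-1, 2, n-2, ...; each later
    row adds 1 (mod n) to every entry of the previous row.
    """
    n = len(conditions)
    if n % 2:
        raise ValueError("this construction needs an even number of conditions")
    first_row, low, high = [0], 1, n - 1
    take_low = True
    while len(first_row) < n:
        if take_low:
            first_row.append(low)
            low += 1
        else:
            first_row.append(high)
            high -= 1
        take_low = not take_low
    return [[conditions[(idx + shift) % n] for idx in first_row] for shift in range(n)]


# Hypothetical labels: H+/H- = haptics on/off, C/I = visually congruent/incongruent.
CONDITIONS = ["H+C", "H+I", "H-C", "H-I"]
square = balanced_latin_square(CONDITIONS)
for participant in range(4):
    print(participant, square[participant % len(square)])
```

A plain cyclic Latin square would also balance serial position, but not which condition precedes which; the balanced form additionally controls first-order carry-over.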

E. Procedure
Participants were told that they would interact with a non-immersive virtual scene using a hand-tracking system, in which they would press a button and hear a tone after a short delay. They were asked to estimate the time interval between pressing the button and hearing the tone, and were told that this interval could vary between 1 and 999 ms.
For the learning phase, participants were seated at a safe distance from the monitor and put on the headphones, and the Ultraleap apparatus was adjusted to a natural position. In this practice block, they hovered their hand over the ultrasound array to enter the virtual environment, pressed the button by making a downward movement, and then typed their estimate in the "Enter milliseconds" UI panel via the keyboard.
They then clicked submit with the mouse. On these practice trials only, they also received feedback on the exact time delay. The practice delays were 50 ms, 500 ms or 950 ms, to give participants an idea of the lower, middle and upper ends of the scale. This block consisted of 10 trials with haptic feedback and visual congruence, so that participants could also familiarise themselves with the technology. During this block, participants were also instructed to avoid pressing the button twice in a single trial, as this would render the trial void. If this did occur, they were to report it and enter 0.
In the experimental block (Fig. 4), participants were reminded that intervals would range from 1 to 999 ms. They then completed 36 trials per condition, split into three blocks of 12 trials. After each block, an additional UI panel presented each self-report question in turn, and participants were told to click the answer (1-7, where 1 is strongly disagree and 7 is strongly agree) that best indicated their experience. They then clicked continue to proceed to the next block of trials. A message signalled the end of each condition, after which participants could take a two-minute break if necessary. When the session finished, participants were debriefed and asked whether they had any questions or had noticed anything about the experiment.

V. RESULTS
One participant was removed from the intentional binding analysis because they reported losing concentration in one condition, which led to consistent inputs of under 100 ms. No outliers were detected (all Z < 3). Interval estimates were averaged for each condition; lower scores indicate greater binding and, therefore, stronger implicit SoA. Scores for self-reported control and causation were averaged separately for each condition, with higher scores indicating greater explicit SoA. Data were processed in Excel, and the analysis was carried out in Jamovi 2 and R.
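The preprocessing described above can be sketched as follows. This is an illustration, not the authors' Excel/Jamovi workflow: it assumes trials are stored as (participant, condition, value) records, drops estimates entered as 0 (the code participants used for void double-press trials), averages per participant and condition, and flags values whose absolute z-score, computed with the sample standard deviation, reaches 3. The records in the example are made up, and the same averaging applies to the 1-7 ratings, which can never be 0.

```python
from collections import defaultdict
from statistics import mean, stdev


def condition_means(trials):
    """Average values per (participant, condition), skipping void trials.

    trials: iterable of (participant, condition, value) tuples, where a
    value of 0 marks a trial the participant reported as void.
    """
    grouped = defaultdict(list)
    for participant, condition, value in trials:
        if value == 0:
            continue
        grouped[(participant, condition)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def z_outliers(values, threshold=3.0):
    """Indices of values whose absolute z-score is at least `threshold`."""
    m, sd = mean(values), stdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - m) / sd >= threshold]


# Made-up interval estimates (ms) for two participants in one condition.
trials = [("p01", "H+C", 320), ("p01", "H+C", 0), ("p01", "H+C", 410),
          ("p02", "H+C", 290), ("p02", "H+C", 350)]
means = condition_means(trials)
print(means)
print(z_outliers(list(means.values())))
```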

B. Haptics and Visual Congruence on Self-Reported Control and Causal Influence
Because the self-report data departed significantly from normality (Shapiro-Wilk, p < .05; skewness Z > 1.96), we applied the aligned rank transform (ART; [32]) before conducting the ANOVAs. This method allows factorial ANOVA, including tests of interactions, to be applied to nonparametric data.
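The alignment step at the heart of the aligned rank transform [32] can be sketched for this 2x2 within-subject design as below. This is a simplified illustration rather than the software the authors used: for each effect, the response is stripped of all fixed effects by subtracting its cell mean, the estimated effect of interest is added back, and the aligned values are converted to average ranks. A separate repeated measures ANOVA is then run on each effect's ranks, interpreting only the effect those ranks were aligned for; that step is not shown. Function names, the record layout and the example ratings are assumptions.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def align_and_rank(rows, effect):
    """rows: list of (subject, a_level, b_level, y); effect: 'A', 'B' or 'AB'."""
    grand = mean(r[3] for r in rows)
    by_cell, by_a, by_b = defaultdict(list), defaultdict(list), defaultdict(list)
    for _, a, b, y in rows:
        by_cell[(a, b)].append(y)
        by_a[a].append(y)
        by_b[b].append(y)
    cell = {k: mean(v) for k, v in by_cell.items()}
    a_mean = {k: mean(v) for k, v in by_a.items()}
    b_mean = {k: mean(v) for k, v in by_b.items()}
    aligned = []
    for _, a, b, y in rows:
        residual = y - cell[(a, b)]  # remove all fixed effects
        if effect == "A":
            estimate = a_mean[a] - grand
        elif effect == "B":
            estimate = b_mean[b] - grand
        elif effect == "AB":
            estimate = cell[(a, b)] - a_mean[a] - b_mean[b] + grand
        else:
            raise ValueError("effect must be 'A', 'B' or 'AB'")
        # Round so that floating-point noise does not split genuine ties.
        aligned.append(round(residual + estimate, 9))
    return average_ranks(aligned)


# Made-up ratings: (subject, haptics 0/1, congruence 0/1, rating).
rows = [(s, h, c, 3 + 2 * h + (1 if s == "s1" else 0))
        for s in ("s1", "s2") for h in (0, 1) for c in (0, 1)]
print(align_and_rank(rows, "A"))
```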
A 2x2 repeated measures ANOVA was conducted on the aligned ranks for self-reported control, with haptic feedback (with or without) and visual congruence (congruent or incongruent) entered as within-subject factors. There was a main effect of haptic feedback, F(1, 87) = 18.78, p < .001, ηp² = .18, such that feelings of control over the button press were greater with haptic feedback than without (Fig. 6(a)). There was also a main effect of visual congruence, F(1, 87) = 30.46, p < .001, ηp² = .26, revealing a greater sense of control over the action when the button press was congruent than when it was not (Fig. 6(b)).
A 2x2 repeated measures ANOVA was conducted on the aligned ranks for self-reported causation, with haptic feedback (with or without) and visual congruence (congruent or incongruent) entered as within-subject factors. There was a main effect of haptic feedback, F(1, 87) = 5.26, p = .024, ηp² = .06, such that feelings of causing the outcome were greater with haptic feedback than without (Fig. 7(a)). There was also a main effect of visual congruence, F(1, 87) = 9.17, p = .003, ηp² = .10, revealing a greater sense of causal influence when the button press was congruent than when it was not (Fig. 7(b)). There was no significant interaction, F(1, 87) = 0.07, p = .785, so post-hoc tests were not carried out.
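Partial eta squared can be recovered from an F statistic and its degrees of freedom as ηp² = (F × df_effect) / (F × df_effect + df_error). Applying this to the F(1, 87) values reported above reproduces the reported effect sizes, which offers a quick consistency check; the helper name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values reported above, all with df = (1, 87), and the reported effect sizes.
reported = {
    "control: haptics": (18.78, 0.18),
    "control: congruence": (30.46, 0.26),
    "causation: haptics": (5.26, 0.06),
    "causation: congruence": (9.17, 0.10),
}
for label, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, 1, 87)
    print(f"{label}: computed {eta:.2f}, reported {eta_reported:.2f}")
```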

VI. DISCUSSION
This study aimed to investigate the effects of mid-air haptics and visual congruence on SoA in touchless virtual interactions. We found that binding was not affected by either manipulation; however, both self-reported control over the action and causal influence over the outcome were. We discuss these results and their implications below.
To our knowledge, this study is the first to examine the effects of mid-air haptics and visual congruence on implicit SoA when interacting with virtual objects. The lack of a significant effect here is surprising, especially given the apparent importance of these variables for SoA [8], [20]. However, one possible explanation comes from the cue integration model [21]. According to this model, SoA is based on various agency cues, including internal sensorimotor signals and external sensory feedback, and the relative influence of these cues is determined by their reliability. Indeed, it has been shown that when internal sensorimotor signals are reliable, external sensory information has less influence (e.g., [31], [33]). This may explain our findings: the presence of internal sensorimotor signals could have attenuated the influence of haptics and visual congruence, which are external cues to agency.
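One common way to formalise reliability weighting, assumed here since the text does not commit to a specific formula, is inverse-variance (maximum-likelihood) cue combination: each cue's weight is its precision divided by the summed precision of all cues. The sketch below illustrates that general principle with made-up numbers; it is not a model fitted to the present data, and the cue values and variances are chosen only to show how a reliable internal signal dominates a noisier external one.

```python
def integrate_cues(cues):
    """Inverse-variance weighted combination of independent cues.

    cues: list of (estimate, variance) pairs. Returns the combined estimate
    and its variance; lower-variance (more reliable) cues get more weight.
    """
    precisions = [1.0 / variance for _, variance in cues]
    total_precision = sum(precisions)
    combined = sum(p * est for p, (est, _) in zip(precisions, cues)) / total_precision
    return combined, 1.0 / total_precision


# Made-up example on an arbitrary 0-1 "agency evidence" scale:
internal = (1.0, 0.1)  # reliable sensorimotor cue signalling agency
external = (0.0, 1.0)  # noisier external cue signalling weaker agency
estimate, variance = integrate_cues([internal, external])
print(round(estimate, 3), round(variance, 3))
```

With these numbers the internal cue receives ten times the weight of the external one, so the combined estimate stays close to the internal signal, which is the attenuation the paragraph describes.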
Intriguingly, explicit SoA was strengthened by haptic feedback and weakened by visual incongruence. Notably, both control over the action and the perceived sense of causing the resulting outcome were affected. Although these findings at first seem at odds with the implicit agency findings, the cue integration approach may again shed some light. It has been suggested that implicit and explicit aspects of SoA are influenced by different agency cues [34]: implicit levels rely more on sensorimotor signals, and explicit levels more on external sensory feedback. On this account, the modulation of self-reported control and causation by haptics and visual congruence is predicted by the model.
In terms of user experience and design considerations, our findings have two key implications. The first is that they confirm the importance of visual congruence when users interact with virtual objects. We show that visual incongruence can weaken users' experience of controlling the object and, through that, their sense of causal influence on the environment. Future research could examine the extent of these effects, for example whether more recent physics-based hand-object interactions [35] actually strengthen agency. Second, our findings extend previous research on the effect of mid-air haptics on explicit SoA. We previously suggested that the influence of haptics may be limited to protecting explicit feelings of control under conditions of agentic uncertainty [14]. Here, however, our data suggest that the presence of haptics can generally strengthen both explicit control over objects and the resulting causal influence. Overall, these findings are noteworthy for HCI and design, given the foundational role of SoA in broader user experience and its influence on other psychological variables such as motivation, engagement and presence [36].

VII. LIMITATIONS
One limitation concerns the minimal self-report data collected, which limited the scope for examining other effects on user experience. For example, a virtual embodiment questionnaire [16] could explore the effects of this external avatar-object relationship on the general sense of embodiment. Relatedly, it would have been interesting to explore the relationship, if any, between embodiment and SoA, which has attracted interest in both psychology and HCI [37]. Furthermore, open-ended qualitative questions could have given voice to a broader range of agentic experiences than our purely quantitative approach permits.
Another limitation relates to the non-immersive virtual environment. While this is appropriate for our aim here, it limits the broader significance of the findings for HCI applications. For example, it would be interesting to test whether these effects extend to, or even change in, an immersive virtual environment. That said, previous research has shown that implicit and explicit SoA are not affected by such a change of modality [38].

VIII. CONCLUSION
In sum, this study investigated object-related visual and haptic effects on SoA in a non-immersive virtual environment. Implicit SoA was not significantly influenced by these external sensory variables, perhaps because of the presence of internal sensorimotor signals, on which implicit SoA relies heavily. Explicit SoA was strengthened overall by haptic feedback and weakened overall by visual incongruence. These findings can be explained by the cue integration approach, which may offer a useful framework for understanding how different variables are likely to influence user experience in this context.

George Evangelou, Orestis Georgiou (Senior Member, IEEE), and James Moore

Fig. 1. Changes in perceived time between actions and outcomes associated with the sense of agency.

Fig. 4. Visualization of a typical experimental trial within a block. Actual intervals were pseudorandomized within each 12-trial step, with three steps per condition, each followed by the self-report measures.

Fig. 5. Mean interval estimates plotted as a function of visual congruence and haptic feedback. Error bars represent the standard error across participants.

Fig. 6. Ratings of control over the virtual button plotted as a function of visual congruence and haptic feedback. The middle line of each box indicates the median; the lower and upper edges indicate the first and third quartiles. The whiskers represent 1.5 × the interquartile range, or the minimum or maximum value.

Fig. 7. Ratings of causal influence over the tone plotted as a function of visual congruence and haptic feedback. The middle line of each box indicates the median; the lower and upper edges indicate the first and third quartiles. The whiskers represent 1.5 × the interquartile range, or the minimum or maximum value.