The project Virtual Humans in the Brabant Economy (ViBE) aims to develop embodied conversational agents (ECAs) for healthcare applications. An ECA is a virtual character that can interact with humans using spoken language and non-verbal behaviors. ViBE is an interdisciplinary research project that includes five knowledge institutions, five industry partners, and two hospitals in the Netherlands. The collaboration between academic institutions and companies brings together expertise in artificial intelligence, virtual reality, and 3D scanning, while the two hospitals contribute medical expertise and living lab environments for testing the developed agents. The ViBE project commenced in 2018 and is supported by a research grant of 7 million euros from the European Union, OP Zuid, the Ministry of Economic Affairs, and the municipality of Tilburg. Tilburg University, with a team of two faculty members, two PhD students, and one programmer, leads the project as principal investigator. The project runs until December 2022.
While the virtual agents developed in ViBE are intended for healthcare use cases, this "digital human" technology, as it is sometimes called, can be employed in a variety of areas: customer service, military training, counselling, tutoring, and entertainment. Creating a virtual agent requires close collaboration between researchers and experts in different fields, including computer science, design, cognitive science, and computational linguistics. This is where the strength of the ViBE project lies: it brings together researchers and practitioners across a range of areas to realize scientific and societal goals.
From the applied perspective, ViBE seeks to test the developed agent technology in two concrete business cases in the medical sphere. In the first use case, a virtual agent will provide users with advice and information on bariatric surgery, a type of surgery that facilitates weight loss. In the second use case, the developed agent will be combined with a physical mannequin for training medical staff in obstetrics, with the goal of making training in delivery rooms more immersive than it typically is with a physical mannequin alone. From the scientific perspective, ViBE strives to change how virtual agents are developed. Many of today's virtual agents still lack realistic appearance and realistic verbal and non-verbal behavior. Moreover, interactions with these agents fall short of human-human interaction in the richness and sophistication of their social cues. Research conducted under ViBE aims to provide new insights into how effective behavior for virtual agents can be created and how real humans perceive these agents. Understanding the mechanisms that govern behavior generation and users' perception of agents can help design better virtual agents in the future. The overarching goal of the ViBE project is to deliver a generic platform for virtual agent development that can generate a virtual agent customizable to any domain.
Creating a virtual agent
Today, an increasing number of people are becoming familiar with physically embodied social robots, such as the Pepper robot, which has humanoid features: a body and a head with a face. Virtual agents, in contrast, are graphically embodied and can be presented on a computer screen, a tablet, a smartphone, or in virtual reality. Like physical robots, virtual agents can interact with humans using speech and various non-verbal cues, such as facial expressions, eye gaze, gestures, tone of voice, and posture. By conveying these social cues, virtual agents can regulate communication with humans. Most virtual agents are currently created in the image of a human, although the quality of the representation varies greatly, from cartoon-like to photorealistic agents.
Appearance and Behavior
Virtual agents with a photorealistic appearance are created using a combination of state-of-the-art 3D face digitization (3D scanning and photogrammetry) and deep learning. While 3D scanning produces a 3D model of a real object, in this case the face and body of a real person, deep learning can be used to enhance specific features that are difficult to scan, such as a person's hair or teeth.
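To make the enhancement step concrete, the sketch below shows how a learned refinement model might be applied only to hard-to-scan regions of a texture. It uses PyTorch; the TextureRefiner network, the random texture, and the hair mask are all illustrative placeholders, not ViBE's actual models or data.

```python
# A minimal sketch of learned texture refinement, assuming a mask that
# marks hard-to-scan regions. TextureRefiner is an untrained placeholder
# standing in for a model trained on real scan data.

import torch
import torch.nn as nn

class TextureRefiner(nn.Module):
    def __init__(self):
        super().__init__()
        # A small residual block: the network predicts a correction that
        # is added to the raw scanned texture.
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, texture: torch.Tensor) -> torch.Tensor:
        return texture + self.body(texture)

texture = torch.rand(1, 3, 256, 256)     # scanned RGB texture (placeholder)
hair_mask = torch.zeros(1, 1, 256, 256)  # 1 where scanning is unreliable
hair_mask[..., :64, :] = 1.0             # say, the top rows cover the hair

# Blend: keep the scan where it is good, use the refined texture elsewhere.
refined = TextureRefiner()(texture)
result = hair_mask * refined + (1 - hair_mask) * texture
```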
Modelling a virtual agent after a real person entails a series of steps. First, multiple images of the person's face and/or body are taken from different angles, using a rig that typically contains between 30 and more than 60 cameras. The person being scanned is asked to produce different facial expressions so that an animated model can be created. The captured images are then processed and combined into a 3D model. Next, a texture, extracted from the 2D images, is applied to the 3D model, adding color and detail. For the purposes of the ViBE project, one of the project partners created a custom studio and workflow to efficiently generate virtual renditions of real people.
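The following sketch outlines that capture-to-model pipeline. All helper functions are hypothetical stubs standing in for real capture hardware and photogrammetry software; they are not ViBE's custom studio workflow.

```python
# A minimal sketch of the capture-to-model pipeline described above.
# Every function body is a stub; real systems delegate these steps to
# a synchronized camera rig and photogrammetry software.

EXPRESSIONS = ["neutral", "smile", "surprise"]
NUM_CAMERAS = 40  # rigs typically range from 30 to more than 60 cameras

def capture_views(expression: str, num_cameras: int) -> list[str]:
    # Stub: all cameras fire simultaneously, once per facial expression.
    return [f"{expression}_cam{i:02d}.jpg" for i in range(num_cameras)]

def reconstruct_mesh(images: list[str]) -> str:
    # Stub: photogrammetry aligns the 2D views and fuses them into a mesh.
    return "head.obj"

def bake_texture(mesh: str, images: list[str]) -> str:
    # Stub: color and detail from the 2D images are projected onto the mesh.
    return "head_albedo.png"

def build_agent_model():
    # One capture pass per expression, so the model can later be animated.
    views = {expr: capture_views(expr, NUM_CAMERAS) for expr in EXPRESSIONS}
    mesh = reconstruct_mesh(views["neutral"])
    texture = bake_texture(mesh, views["neutral"])
    return mesh, texture

print(build_agent_model())
```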
A natural question is whether photorealistic virtual agents are needed at all, given that people generally tend to treat computers that are humanoid, but not necessarily realistic, as if they were human. The answer lies in the application domain. For healthcare and education use cases, photorealistic virtual agents might be beneficial for kick-starting the same perceptual and cognitive processes that are expected to take place when, for example, one looks at the face of a real human. This, in turn, can help create and foster trust between a human and a virtual agent, which is desirable in both healthcare and educational settings. Research conducted in ViBE and elsewhere shows that humans generally have high standards for what they consider human-like and that even tiny details in the face matter for the perceived human-likeness of an agent.
Creating convincing verbal and non-verbal behavior for virtual agents is an even more difficult task than creating a photorealistic appearance, for several reasons. First, behavior is dynamic: a virtual agent needs to know not only what facial expression to show but also when to show it, that is, when to raise an eyebrow or produce a smile. Second, behavior is situated and contextual: the same facial expression may mean different things depending on the context. Research conducted in the ViBE project showed that computationally light models can be used to generate facial expressions. Third, non-verbal behaviors such as facial expressions and eye gaze need to be aligned with the generation of spoken language utterances, and generating spoken language is complex in itself. By studying the statistical regularities present in spoken dialog, research in the ViBE project helped to clarify what makes conversations human-like.
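As an illustration of the timing problem, the sketch below aligns simple non-verbal cues with word-level timestamps, assuming such timestamps are available from a speech synthesizer. The rule it encodes (raise an eyebrow at the onset of an emphasized word, smile at the end of the utterance) is a deliberately simple placeholder, not ViBE's behavior model.

```python
# A minimal sketch of aligning non-verbal cues with speech. Word-level
# onsets are assumed to come from a TTS engine; the scheduling rules are
# illustrative only.

from dataclasses import dataclass

@dataclass
class Word:
    text: str
    onset: float      # seconds from utterance start
    emphasized: bool

@dataclass
class BehaviorEvent:
    cue: str          # e.g. "eyebrow_raise", "smile"
    start: float
    duration: float

def schedule_cues(words: list[Word]) -> list[BehaviorEvent]:
    events = []
    for word in words:
        # Timing matters as much as the cue itself: the eyebrow raise
        # must start with the emphasized word, not merely near it.
        if word.emphasized:
            events.append(BehaviorEvent("eyebrow_raise", word.onset, 0.4))
    if words:
        # End the utterance with a brief smile as a turn-yielding signal.
        events.append(BehaviorEvent("smile", words[-1].onset, 0.8))
    return events

utterance = [Word("Your", 0.0, False), Word("surgery", 0.25, True),
             Word("went", 0.70, False), Word("well", 0.90, True)]
for e in schedule_cues(utterance):
    print(f"{e.cue} at {e.start:.2f}s for {e.duration:.2f}s")
```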