Team Leader
Interaction Design and Technologies, Fraunhofer IAO, Germany
I am currently leading the research group Interaction Design and Technologies at the Fraunhofer Institute for Industrial Engineering IAO, Stuttgart, Germany. My team focuses on interdisciplinary methods of human-computer interaction and artificial intelligence to empower people through intuitive and inspiring user interfaces. Previously, I worked as a senior researcher at the Analytic Computing group, University of Stuttgart, and at the Institute for Web Science and Technologies, University of Koblenz, Germany. I received my Ph.D. in Computer Science in 2016 from the University of Oldenburg, working with the Interactive Systems group at OFFIS, Germany.
I believe in a user-centered approach, and most of my research is driven by real-world problems. Currently, I work on user-centered aspects of accessibility and usability, i.e., (1) enhancing interface accessibility through novel interaction techniques such as eye gaze, touch, and verbal and non-verbal input, and (2) improving interface usability by understanding user/customer behavior through quantitative and qualitative analysis.
Research Interests:
- User Experience
- Multimodal Interaction
- Eye Tracking and Usability
- Geo-analytics and Visualization
- Web Information Retrieval
- Personalization and User Modeling
Interaction Design and Technologies, Fraunhofer IAO, Germany
Analytic Computing, University of Stuttgart, Germany
Institute for Web Science and Technologies, University of Koblenz, Germany
Media Informatics and Multimedia Systems Group, University of Oldenburg, Germany
Interactive Systems Group, OFFIS - Institute for Information Technology, Germany
Search Sciences Group, Yahoo! Labs Bangalore, India
Language Technologies Research Centre, IIIT Hyderabad, India
Ph.D. in Computer Science
University of Oldenburg, Germany
Master of Science by Research (Informatics)
IIIT Hyderabad, India
Bachelor of Engineering (Computer Science)
Bhilai Institute of Technology, Durg, India
GazeTheWeb integrates the visual appearance and control functionality of webpages in an eye tracking environment. It combines webpage element extraction with emulation of traditional input devices within the browser to provide smooth and reliable Web access. GazeTheWeb not only supports efficient interaction with the webpage (scrolling, link navigation, text input), but also supports all essential browsing menu operations such as bookmarks, history, and tab management.
A detailed description of this project can be found on the official project page or in my recent publications.
The aim of the research project GazeMining is to capture Web sessions semantically and thus obtain a complete picture of visual content, perception, and interaction. The log streams of usability tests are evaluated using data mining. The analysis and interpretation of the data collected in this way are made possible by a user-friendly presentation as well as semi-automatic and automatic analysis procedures.
A detailed description of this project can be found on the official project page or in my recent publications.
Coastal urban development encompasses a wide range of development activities that take place as a result of the water element in the fabric of the city. We aim to shift the existing paradigm of policy making in coastal areas, which is largely based on intuition, towards an evidence-driven approach enabled by big data. Our basis is the sensing infrastructure installed in the cities, offering demographic data, statistical information, sensor readings, and user-contributed content, which together form the big data layer. Methods for big data analytics and visualization are used to measure economic activity, assess the environmental impact, and evaluate the social consequences.
A detailed description of this project can be found on the official project page.
MAMEM is a platform designed to aid people with physical disabilities in using digital devices, in order to create optimal conditions for digitally and socially inclusive activities that promote their quality of life. Thus, MAMEM has sought to radically reshape human-computer interaction with the purpose of offering a technology that enables individuals with disabilities to fully use software applications and perform multimedia-related tasks using their eyes and mind.
A detailed description of this project can be found on the official project page or in my recent publications.
In this project I investigated methods for retrieving geo-spatially related information of various types and from different sources, integrating it into novel visualizations that are easy to interpret. We developed interactive interfaces that go beyond map-based points and provide intuitive visualizations for relevant aspects of the geo-spatial availability of services and infrastructure, such as spreading and distribution, sparseness and density, reachability and connectivity.
A detailed description of this project can be found on the university project page.
We have developed the "GazeTheKey" interface for eye typing, where keys not only signify the input letter but also predict relevant words that can be selected by the user's gaze using a two-step dwell time. The proposed design brings relevant suggestions into the visual attention of users, minimizing the additional cost of scanning an external word suggestion list. Furthermore, it offers the possibility to include many more suggestions than conventional interfaces, which show only a few suggestions at the top of the keyboard layout.
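A minimal sketch of how such a two-step dwell could be handled is shown below. It is a simplified illustration, not the actual GazeTheKey implementation: the thresholds, the `suggestions_for` lookup, and the choice to accept the top suggestion on the second dwell are assumptions.

```python
from dataclasses import dataclass

FIRST_DWELL = 0.4   # seconds until the letter is committed (assumed value)
SECOND_DWELL = 0.4  # additional dwell to accept a word suggestion (assumed value)

@dataclass
class KeyState:
    key: str = ""
    dwell: float = 0.0
    letter_committed: bool = False

def update(state: KeyState, gazed_key: str, dt: float, suggestions_for):
    """Advance the dwell timer for the currently gazed key.
    Returns ('letter', ch), ('word', w), or None."""
    if gazed_key != state.key:                 # gaze moved to another key: reset
        state.key, state.dwell, state.letter_committed = gazed_key, 0.0, False
        return None
    state.dwell += dt
    if not state.letter_committed and state.dwell >= FIRST_DWELL:
        state.letter_committed = True          # first dwell: type the letter,
        return ('letter', state.key)           # word suggestions are now shown on the key
    if state.letter_committed and state.dwell >= FIRST_DWELL + SECOND_DWELL:
        words = suggestions_for(state.key)     # second dwell: accept a suggestion
        return ('word', words[0]) if words else None
    return None
```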
The project goal was to present useful spatial information from the Web to car passengers with respect to the context and driving situation. To realise such an information system, I contributed to the development of a specialised geographical search engine, enhancing retrieval methods for geospatial data collection through focused web crawling, address extraction, and indexing.
A detailed description of this project can be found on the OFFIS project page.
I worked on the problem of attribute extraction from template-generated web pages. I contributed to building a high-performance extraction system using machine learning techniques to perform extraction on real-life web pages, specifically in the product and business domains. Another focus was the problem of robust wrapper generation, to accommodate seasonal changes in web page structure and produce precise extractions over time.
In this work my aim was to provide "more information with less overload", giving end users a faster and easier way to access precise information. The project especially targets users of small devices, where accessing information through documents becomes difficult due to limitations such as a small display, a tiny keypad, and low bandwidth. I used techniques like clustering, summarization, and personalization to produce the final result as a summary text in response to a user query.
The objective of my PhD research was to investigate and develop sophisticated visual interfaces and ranking methods that enable end users to discover knowledge hidden in multi-dimensional geospatial databases. The proposed interaction procedure goes beyond conventional list-based local search interfaces and provides access to geospatial data sources with a regional overview. Users can compare the characterization of urban areas with respect to multiple spatial dimensions of interest and can search for the most suitable region. The search experience is further enhanced via efficient regional ranking algorithms and optimization methods to accomplish the complex search task in a computationally efficient manner.
Advisor: Prof. Susanne Boll
The thesis proposes a theoretical framework for document summarization: I formalize document summarization as a decision-making problem and derive a general extraction mechanism that picks sentences based on the expected risk of information loss. Through this formulation I arrive at a lightweight function that generates more informative summaries than earlier approaches, which rely on complex algorithms for summary generation (a minimal sketch of the idea is given below).
Advisor: Prof. Vasudeva Varma
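The sketch below illustrates the risk-based selection idea under a deliberately simple assumption: a unigram document model, where a sentence's expected information loss is the probability mass of document content it does not cover. The exact loss function and selection procedure of the thesis are not reproduced here.

```python
from collections import Counter

def sentence_scores(sentences, doc_tokens):
    """Score sentences by (negative) expected information loss under a
    unigram document model: covering more probable content means losing
    less information if the rest of the document is dropped."""
    doc_freq = Counter(doc_tokens)
    total = sum(doc_freq.values())
    p = {w: c / total for w, c in doc_freq.items()}   # unigram document model

    scores = []
    for sent in sentences:                            # each sentence is a token list
        covered = set(sent)
        loss = sum(prob for w, prob in p.items() if w not in covered)
        scores.append(1.0 - loss)                     # higher = more informative
    return scores

def summarize(sentences, doc_tokens, k=3):
    """Pick the k sentences with the lowest expected information loss."""
    ranked = sorted(zip(sentence_scores(sentences, doc_tokens), sentences),
                    key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in ranked[:k]]
```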
Text entry by gaze is a useful means of hands-free interaction that is applicable in settings where dictation suffers from poor voice recognition or where spoken words and sentences jeopardize privacy or confidentiality. However, text entry by gaze still shows inferior performance and quickly exhausts its users. We introduce text entry by gaze and hum as a novel hands-free text entry method. We review related literature to converge on word-level text entry by analysis of gaze paths that are temporally constrained by humming. We develop and evaluate two design choices: "HumHum" and "Hummer". The first method requires short hums to indicate the start and end of a word. The second method interprets one continuous hum as the indication of the start and end of a word. In an experiment with 12 participants, Hummer achieved a commendable text entry rate of 20.45 words per minute and outperformed HumHum and the gaze-only method EyeSwipe in both quantitative and qualitative measures.
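For readers unfamiliar with the reported numbers, the text entry rate used in such studies is the standard words-per-minute measure, where one "word" is five characters including spaces. A minimal computation is sketched below; the logging format is assumed, not taken from the study.

```python
def words_per_minute(transcribed: str, seconds: float) -> float:
    """Standard WPM: (|T| - 1) transcribed characters over elapsed time,
    with 5 characters counted as one word."""
    return ((len(transcribed) - 1) / seconds) * 60.0 / 5.0

def keystrokes_per_character(keystrokes: int, transcribed: str) -> float:
    """KSPC: key activations needed per produced character (lower is better)."""
    return keystrokes / max(len(transcribed), 1)

# e.g. 102 characters entered in 30 s is roughly 40.4 wpm
print(round(words_per_minute("x" * 102, 30.0), 1))
```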
The conventional dwell-based methods for text entry by gaze are typically slow and uncomfortable. A swipe-based method that maps a gaze path to a word offers an alternative. However, it requires the user to explicitly indicate the beginning and end of a word, which is typically achieved by tedious gaze-only selection. This paper introduces TAGSwipe, a bi-modal method that combines the simplicity of touch with the speed of gaze for swiping through a word. The result is an efficient and comfortable dwell-free text entry method. In the lab study, TAGSwipe achieved an average text entry rate of 15.46 wpm and significantly outperformed conventional swipe-based and dwell-based methods in efficacy and user satisfaction.
The evolution of eye tracking and brain-computer interfaces has given a new perspective on the control channels that can be used for interacting with computer applications. In this book leading researchers show how these technologies can be used as control channels with signal processing algorithms and interface adaptations to drive a human-computer interface. Topics covered in the book include a comprehensive overview of eye-mind interaction incorporating algorithm and interface developments; modeling the (dis)abilities of people with motor impairment and their computer-use requirements and expectations of assistive interfaces; and signal processing aspects including acquisition, preprocessing, enhancement, feature extraction, and classification of eye gaze, EEG (steady-state visual evoked potentials, motor imagery, and error-related potentials) and near-infrared spectroscopy (NIRS) signals. Finally, the book presents a comprehensive set of guidelines, with examples, for conducting evaluations to assess the usability, performance, and feasibility of multimodal interfaces combining eye gaze and EEG-based interaction algorithms. The contributors to this book are researchers, engineers, clinical experts, and industry practitioners who have collaborated on these topics, providing an interdisciplinary perspective on the underlying challenges of eye and mind interaction and outlining future directions in the field.
The motivation to investigate eye tracking as a hands-free input method is pertinent, because eye control can be a significant addition to the lives of people with a motor disability that hinders their use of mouse and keyboard. With this motivation in mind, research in eye-controlled interaction has so far focused on several aspects of interpreting eye tracking as input for pointing, typing, and interaction methods with interfaces. In this regard, the major question is how well eye-controlled interaction works for the proposed methods. How efficiently can pointing and selection be performed? Can common tasks be performed quickly and accurately with the novel interface? How can different gaze interaction methods be compared? What is the user experience while using eye-controlled interfaces? These are the sorts of questions that can be answered with an appropriate evaluation methodology. Therefore, in this chapter, we review and elaborate on different evaluation methods used in gaze interaction research, so that readers can inform themselves of the procedures and metrics to assess their novel gaze interaction method or interface.
We developed a gaze-immersive YouTube player, called GIUPlayer, with two objectives: first, to enable eye-controlled interaction with video content to support people with motor disabilities; second, to enable the prospect of quantifying attention when users view video content, which can be used to estimate natural viewing behaviour. In this paper, we illustrate the functionality and design of GIUPlayer, and the visualization of video viewing patterns. The long-term perspective of this work could lead to the realization of eye control and attention-based recommendations in online video platforms and smart TV applications that record eye tracking data.
Usability analysis plays a significant role in optimizing Web interaction by understanding the behavior of end users. To support such analysis, we present a tool to visualize gaze and mouse data of Web site interactions. The proposed tool provides not only traditional visualizations such as fixations, scanpaths, and heatmaps, but also allows for more detailed analysis with data clustering, demographic correlation, and advanced visualizations like attention flow and 3D scanpaths. To demonstrate the usefulness of the proposed tool, we conducted a remote qualitative study with six analysts, using a dataset of 20 users browsing eleven real-world Web sites.
This chapter describes how eye tracking can be used for interaction. The term eye tracking refers to the process of tracking the movement of the eyes in relation to the head, to estimate the direction of eye gaze. The eye gaze direction can be related to the absolute head position and the geometry of the scene, such that a point-of-regard (POR) may be estimated. In the following, we call the sequential estimations of the POR gaze signals, and a single estimation a gaze sample. In Section 5.1, we provide a basic description of eye anatomy, which is required to understand the technologies behind eye tracking and their limitations. Moreover, we discuss popular technologies to perform eye tracking and explain how to process the gaze signals for real-time interaction. In Section 5.2, we describe the unique challenges of eye tracking for interaction, as we use the eyes primarily for perception and potentially overload them with interaction. In Section 5.3, we survey graphical interfaces for multimedia access that have been adapted to work effectively with eye-controlled interaction. After discussing the state of the art in eye-controlled multimedia interfaces, we outline in Section 5.4 how the contextualized integration of gaze signals might proceed in order to provide richer interaction with eye tracking.
We present TouchGazePath, a multimodal method for entering personal identification numbers (PINs). Using a touch-sensitive display showing a virtual keypad, the user initiates input with a touch at any location, glances with their eye gaze at the keys bearing the PIN digits, then terminates input by lifting their finger. TouchGazePath is not susceptible to security attacks such as shoulder surfing, thermal attacks, or smudge attacks. In a user study with 18 participants, TouchGazePath was compared with the traditional Touch-Only method and the multimodal Touch+Gaze method, the latter using eye gaze for targeting and touch for selection. The average time to enter a PIN with TouchGazePath was 3.3 s. This was not as fast as Touch-Only (as expected), but was about twice as fast as Touch+Gaze. TouchGazePath was also more accurate than Touch+Gaze. TouchGazePath had high user ratings as a secure PIN input method and was the preferred PIN input method for 11 of 18 participants.
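A rough sketch of the interaction protocol follows, assuming a hypothetical event log and keypad hit-test function; it is illustrative only and omits the fixation filtering a real decoder would need to distinguish intended glances from gaze passing over keys.

```python
def decode_touch_gaze_pin(events, keypad_hit_test):
    """Decode a PIN from a TouchGazePath-style trial: touch-down starts input,
    gaze samples are mapped to keypad digits while the finger stays down,
    and touch-up commits the sequence. (No fixation filtering, for brevity.)"""
    pin, recording, last_key = [], False, None
    for kind, x, y in events:              # events: ('touch_down'|'touch_up'|'gaze', x, y)
        if kind == 'touch_down':
            recording, last_key = True, None
        elif kind == 'touch_up':
            recording = False
        elif kind == 'gaze' and recording:
            key = keypad_hit_test(x, y)    # digit under the gaze sample, or None
            if key is not None and key != last_key:
                pin.append(key)            # a newly glanced key extends the PIN
                last_key = key
    return ''.join(pin)
```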
Eye tracking systems have greatly improved in recent years, becoming a viable and affordable option as a digital communication channel, especially for people lacking fine motor skills. Using eye tracking as an input method is challenging due to accuracy and ambiguity issues, and therefore research in eye gaze interaction is mainly focused on better pointing and typing methods. However, these methods eventually need to be assimilated to enable users to control application interfaces. A common approach to employing eye tracking for controlling application interfaces is to emulate mouse and keyboard functionality. We argue that the emulation approach incurs unnecessary interaction and visual overhead for users, aggravating the entire experience of gaze-based computer access. We discuss how knowledge about the interface semantics can help reduce the interaction and visual overhead to improve the user experience. Thus, we propose the efficient introspection of interfaces to retrieve the interface semantics and adapt the interaction with eye gaze. We have developed a Web browser, GazeTheWeb, that introspects Web page interfaces and adapts both the browser interface and the interaction elements on Web pages for gaze input. In a summative lab study with 20 participants, GazeTheWeb allowed the participants to accomplish information search and browsing tasks significantly faster than an emulation approach. Additional feasibility tests of GazeTheWeb in lab and home environments showcase its effectiveness in accomplishing daily Web browsing activities and adapting a large variety of modern Web pages to support interaction for people with motor impairments.
Text predictions play an important role in improving the performance of gaze-based text entry systems. However, visual search, scanning, and selection of text predictions require a shift in the user's attention from the keyboard layout. Hence the spatial positioning of predictions becomes an imperative aspect of the end-user experience. In this work, we investigate the role of spatial positioning by comparing the performance of three different keyboards with different positions for text predictions. The experiment results show no significant differences in text entry performance, i.e., displaying suggestions closer to the visual fovea did not enhance the text entry rate of participants; however, they used more keystrokes and backspace. This implies inessential usage of suggestions when they are in the constant visual attention of users, resulting in an increased cost of correction. Furthermore, we argue that fast saccadic eye movements undermine the spatial distance optimization in prediction positioning.
Eye tracking as a tool to quantify user attention plays a major role in research and application design. For Web usability, it has become a prominent measure to assess which sections of a Web page are read, glanced at, or skipped. Such assessments primarily depend on the mapping of gaze data to a page representation. However, current representation methods, a virtual screenshot of the Web page or a video recording of the complete interaction session, suffer from either accuracy or scalability issues. We present a method that identifies fixed elements on Web pages and combines user viewport screenshots in relation to fixed elements for an enhanced representation of the page, in alignment with the user experience. We conducted an experiment with 10 participants, and the results signify that analysis with our method is more efficient than a video recording, which is an essential criterion for large-scale Web studies.
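The coordinate mapping underlying such a representation can be sketched as below. The data layout (per-sample scroll offset and bounding boxes of detected fixed elements) is an assumption for illustration, not the tool's actual API.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height in viewport coordinates

def to_page_coords(gaze_x: float, gaze_y: float,
                   scroll_x: float, scroll_y: float,
                   fixed_elements: List[Box]):
    """Map a viewport gaze sample either to a fixed element (which does not
    scroll with the page) or to document coordinates of the scrollable content."""
    for i, (x, y, w, h) in enumerate(fixed_elements):
        if x <= gaze_x <= x + w and y <= gaze_y <= y + h:
            # attention on a fixed element (e.g. a sticky header):
            # keep coordinates relative to that element
            return ('fixed', i, gaze_x - x, gaze_y - y)
    # otherwise translate by the scroll offset to get page-relative coordinates
    return ('page', None, gaze_x + scroll_x, gaze_y + scroll_y)
```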
Hands-free browsers provide an effective tool for Web interaction and accessibility, overcoming the need for conventional input sources. Current approaches to hands-free interaction are primarily categorized into either voice- or gaze-based modalities. In this work, we investigate how these two modalities could be integrated to provide a better hands-free experience for end users. We demonstrate a multimodal browsing approach combining eye gaze and voice inputs for optimized interaction, and to satisfy user preferences alongside unimodal benefits. The initial assessment with five participants indicates improved performance for the multimodal prototype in comparison to single modalities for hands-free Web browsing.
Enabling Web interaction by non-conventional input sources like eyes has great potential to enhance Web accessibility. In this paper, we present a Chromium based inclusive framework to adapt eye gaze events in Web interfaces. The framework provides more utility and control to develop a full-featured interactive browser, compared to the related approaches of gaze-based mouse and keyboard emulation or browser extensions. We demonstrate the framework through a sophisticated gaze driven Web browser, which effectively supports all browsing operations like search, navigation, bookmarks, and tab management.
The Web is essential for most people, and its accessibility should not be limited to conventional input sources like mouse and keyboard. In recent years, eye tracking systems have greatly improved, beginning to play an important role as an input medium. In this work, we present GazeTheWeb, a Web browser accessible solely by eye gaze input. It effectively supports all browsing operations like search, navigation, and bookmarks. GazeTheWeb is based on a Chromium-powered framework, comprising Web extraction to classify interactive elements and the application of gaze interaction paradigms to represent these elements.
Gaze-based virtual keyboards provide an effective interface for text entry by eye movements. The efficiency and usability of these keyboards have traditionally been evaluated with conventional text entry performance measures such as words per minute, keystrokes per character, backspace usage, etc. However, in comparison to traditional text entry approaches, gaze-based typing involves natural eye movements that are highly correlated with human brain cognition. Employing eye gaze as an input could lead to excessive mental demand, and in this work we argue the need to include cognitive load as an eye typing evaluation measure. We evaluate three variations of gaze-based virtual keyboards, which implement variable designs in terms of word suggestion positioning. The conventional text entry metrics indicate no significant difference in the performance of the different keyboard designs. However, STFT (short-time Fourier transform) based analysis of EEG signals indicates variances in the mental workload of participants while interacting with these designs. Moreover, the EEG analysis provides insights into the user's cognition variation for different typing phases and intervals, which should be considered in order to improve eye typing usability.
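To make the STFT-based analysis concrete, the sketch below computes band power over time from a single EEG channel and a common theta/alpha workload proxy. The sampling rate, channel selection, frequency bands, and the specific workload index used in the study are assumptions here, not taken from the paper.

```python
import numpy as np
from scipy.signal import stft

def band_power(eeg: np.ndarray, fs: float, lo: float, hi: float, nperseg: int = 256):
    """Average spectral power in [lo, hi] Hz per STFT window (one value per window)."""
    f, t, Z = stft(eeg, fs=fs, nperseg=nperseg)
    band = (f >= lo) & (f <= hi)
    return (np.abs(Z[band, :]) ** 2).mean(axis=0)

def workload_index(eeg: np.ndarray, fs: float):
    """Illustrative workload proxy: theta (4-8 Hz) power relative to alpha (8-13 Hz)."""
    theta = band_power(eeg, fs, 4.0, 8.0)
    alpha = band_power(eeg, fs, 8.0, 13.0)
    return theta / np.maximum(alpha, 1e-12)
```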
In conventional keyboard interfaces for eye typing, the functionalities of the virtual keys are static, i.e., the user's gaze at a particular key simply translates the associated letter into the user's input. In this work we argue that keys should be more dynamic and embed intelligent predictions to support gaze-based text entry. In this regard, we demonstrate a novel "GazeTheKey" interface where a key not only signifies the input character but also predicts relevant words that can be selected by the user's gaze using a two-step dwell time.
In recent years, eye tracking systems have greatly improved, beginning to play a promising role as an input medium. Eye trackers can be used for application control either by simply emulating the mouse and keyboard devices in the traditional graphical user interface, or through customized interfaces for eye gaze events. In this work, we evaluate these two approaches to assess their impact on usability. We present a gaze-adapted Twitter application interface with direct interaction by eye gaze input, and compare it to Twitter in a conventional browser interface with gaze-based mouse and keyboard emulation. We conducted an experimental study, which indicates a significantly better subjective user experience for the gaze-adapted approach. Based on the results, we argue the need for user interfaces that interact directly with eye gaze input to provide an improved user experience, more specifically in the field of accessibility.
Eye tracking devices have become affordable. However, they are still not very present in everyday life. To explore the feasibility of modern low-cost hardware in terms of reliability and usability for broad user groups, we present a gaze-controlled game in a standalone arcade box with a single physical buzzer for activation. The player controls an avatar in the appearance of a butterfly, which flies over a meadow towards the horizon. The goal of the game is to collect spawning flowers by hitting them with the avatar, which increases the score. Three mappings of gaze on screen to the world position of the avatar, featuring different levels of intelligence, were defined and randomly assigned to players. Both a survey after a session and the high score distribution are considered for the evaluation of these control styles. An additional serious part of the game educates players about flower species and rewards prior knowledge with a point multiplier. During this part, gaze data on images is collected, which can be used for saliency calculations. Nearly 3000 completed game sessions were recorded at a state horticulture show in Germany, which demonstrates the impact and acceptability of this novel input technique among lay users.
The evolution of affordable assistive technologies like eye tracking helps people with motor disabilities to communicate with computers through eye-based interaction. Eye-controlled interface environments need to be specially built for better usability and accessibility of the content, and should not rely on interface layouts designed for conventional mouse- or touch-based interfaces. In this work we argue the need for a domain-specific heuristic checklist for eye-controlled interfaces that conforms to usability and design principles and is less demanding from a cognitive load perspective. It focuses on the need to understand the product in use inside the gaze-based environment and to apply the heuristic guidelines for design and evaluation. We revisit Nielsen's heuristic guidelines to adapt them to the eye-tracking environment, and infer a questionnaire for the subjective assessment of eye-controlled user interfaces.
The EU-funded MAMEM project (Multimedia Authoring and Management using your Eyes and Mind) aims to propose a framework for natural interaction with multimedia information for users who lack fine motor skills. As part of this project, the authors have developed a gaze-based control paradigm. Here, they outline the challenges of eye-controlled interaction with multimedia information and present initial project results. Their objective is to investigate how eye-based interaction techniques can be made precise and fast enough to let disabled people easily interact with multimedia information.
The user interfaces and input events of generic applications are typically composed of mouse and keyboard interactions. Eye-controlled applications need to revise these interactions into eye gestures, and hence the design and optimization of interface elements becomes a substantial feature. In this work, we propose a novel eyeGUI framework to support the development of such interactive eye-controlled applications, covering significant aspects such as rendering, layout, dynamic modification of content, and support for graphics and animation.
Increasing volumes of spatial data about urban areas are captured and made available via volunteered geographic information (VGI) sources, such as OpenStreetMap (OSM). Hence, new opportunities arise for regional exploration that can lead to improvements in the lives of citizens through spatial decision support. We believe that the VGI data of the urban environment can be used to present a constructive overview of the regional infrastructure with the advent of web technologies. Current location-based services provide general map-based information for end users with conventional local search functionality, and hence the presentation of rich urban information is limited. In this work, we analyze the OSM data to classify geo entities into meaningful categories covering facilities, landscape, and land-use distribution. We employ heat-map overlays and interactive visualizations to present the regional characterization based on the OSM data classification. In the proposed interface, users can express a variety of spatial queries to exemplify their geographic interests. They can compare the characterization of urban areas with respect to multiple spatial dimensions of interest and can search for the most suitable region. The search experience is further enhanced via efficient optimization and interaction methods to support the decision making of end users. We report the end-user acceptability and efficiency of the proposed system via usability studies and performance analysis.
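The classification step can be illustrated with a simple rule-based mapping from OSM tags to coarse regional categories. The category scheme and rules below are illustrative assumptions, not the project's actual taxonomy.

```python
# illustrative mapping from OSM key/value pairs to coarse regional categories
CATEGORY_RULES = {
    ('amenity', 'school'): 'education',
    ('amenity', 'hospital'): 'health',
    ('shop', None): 'shopping',            # any shop=* value
    ('leisure', 'park'): 'green_space',
    ('landuse', 'residential'): 'residential',
}

def categorize(tags: dict) -> str:
    """Assign an OSM entity (given its tag dictionary) to a regional category."""
    for (key, value), category in CATEGORY_RULES.items():
        if key in tags and (value is None or tags[key] == value):
            return category
    return 'other'

print(categorize({'amenity': 'school', 'name': 'Grundschule West'}))  # -> education
```

Counting the categorized entities per map cell then yields the per-category densities that the heat-map overlays visualize.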
Visualization of maps to explore relevant geographic areas is a common practice in spatial decision scenarios. However, visualizing geographic distributions with multidimensional criteria becomes nontrivial in the conventional point-based map space. In this work we address this problem with swarm intelligence. We exploit the particle swarm optimization (PSO) framework, where particles represent geographic regions moving in the map space to find better positions with respect to the user's criteria. We track the swarm movement on the map surface to generate a relevance heatmap, which can effectively support the spatial analysis tasks of end users.
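A minimal sketch of this idea follows: a generic PSO update moves particles over the map while their visited positions are accumulated into a heatmap. The scoring function, parameter values, and grid resolution are placeholders standing in for the user's multidimensional criteria, not the paper's actual formulation.

```python
import numpy as np

def pso_heatmap(score, bounds, n_particles=50, iters=100, grid=64,
                w=0.7, c1=1.5, c2=1.5, seed=0):
    """Move particles (candidate regions) over the map to maximize score(x, y)
    and accumulate their visited positions into a relevance heatmap."""
    rng = np.random.default_rng(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    pos = rng.uniform([xmin, ymin], [xmax, ymax], size=(n_particles, 2))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([score(x, y) for x, y in pos])
    gbest = pbest[pbest_val.argmax()].copy()
    heat = np.zeros((grid, grid))

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, [xmin, ymin], [xmax, ymax])
        vals = np.array([score(x, y) for x, y in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[pbest_val.argmax()].copy()
        # accumulate visited positions into the relevance heatmap
        ix = ((pos[:, 0] - xmin) / (xmax - xmin) * (grid - 1)).astype(int)
        iy = ((pos[:, 1] - ymin) / (ymax - ymin) * (grid - 1)).astype(int)
        np.add.at(heat, (iy, ix), 1)
    return heat

# usage with a toy criterion: regions near the origin are most relevant
heat = pso_heatmap(lambda x, y: -(x ** 2 + y ** 2), bounds=((-1, 1), (-1, 1)))
```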
The ranking problem in geospatial Web search has traditionally focused on query-dependent relevance, i.e., the combination of textual and geographic similarity of pages with respect to the query text and footprint. Query-independent relevance, also known as the popularity or significance of a page, is a valuable aspect of generic Web search ranking. However, research on and formalization of query-independent significance for geospatial Web search has been limited to basic adaptations of the general popularity of pages. In this paper, we discuss how several location-sensitive properties can alter the significance of geospatial Web pages. We particularly argue the significance of pages with respect to categorical, regional, and granular criteria. We analyze these criteria over a large geospatial Web graph of different German cities and perform small-scale evaluations of our approach. We derive valuable heuristics on the link structure of the geospatial Web that can be used in ranking formulations, or to cater to certain contextual information needs of end users of a geospatial Web search system.
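One generic way to fold such categorical and regional criteria into a query-independent significance score is a personalized PageRank over the link graph, biased towards pages matching a category or region. The sketch below is an illustration of that general idea only; the attribute names, boost factors, and the use of personalized PageRank are assumptions, not the heuristics derived in the paper.

```python
import networkx as nx

def geospatial_significance(edges, page_attrs, target_category, target_region,
                            category_boost=2.0, region_boost=2.0, alpha=0.85):
    """Query-independent significance: PageRank over the link graph with a
    personalization vector favouring pages of a given category and region."""
    G = nx.DiGraph(edges)                       # edges: iterable of (source, target) URLs
    personalization = {}
    for node in G.nodes:
        attrs = page_attrs.get(node, {})        # e.g. {'category': 'hotel', 'region': 'Stuttgart'}
        weight = 1.0
        if attrs.get('category') == target_category:
            weight *= category_boost
        if attrs.get('region') == target_region:
            weight *= region_boost
        personalization[node] = weight
    return nx.pagerank(G, alpha=alpha, personalization=personalization)
```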
I am currently looking for a new team member, a 'Researcher for Data Science and Human-Computer Interaction', to work on well-known national and international research and consulting projects, starting at the earliest possible date. Please apply here.
We have several opportunities for student projects, internships, and theses. Please contact me if you are interested.
Multimodal Interaction with Gaze and Hum
Multimodal Text Entry
User comment analysis
Hands-free gaming
Web Interaction and Analysis by Gaze
Eye Typing Keyboards
Opinion Clustering and Summarization
Hands-free Web Browsers
Optimizing Gaze-Touch Interaction
Web usability analysis
A Computational Intelligence Framework for Multidimensional Regional Search
Visual Representation of Similar Touristic Regions Using Georeferenced Images
Geovisual analytics for lay users: A user-centred approach
I would be happy to talk to you and discuss any kind of research or business collaboration.
You can find me at my office at Universitaetsstraße 32, 70569 Stuttgart, Germany.