Category Archive: Speech user interface
Driving simulator & R&D & Speech user interface Andrew Kun on 06 Nov 2008
Project54 research featured in SLTC e-Newsletter
The Fall 2008 e-Newsletter of the Speech and Language Technical Committee of the IEEE Signal Processing Society includes an article about the driving simulator-based research going on within Project54. Thanks to Mike Seltzer for his help with editing the article.
Btw, the image that appears in the article was created by Alex Shyrokov and may be included in an upcoming book by Matthias Wölfel and John McDonough entitled Distant Speech Recognition.
Andrew Kun
PDA & Speech user interface & Ubicomp & User interface Michael Farrar on 03 Nov 2008
Using voice to tag digital photographs on the spot
Hi ecebloggers,
In the past I’ve discussed the imaging application, and in particular, the tagging capabilities it provides. Now it’s time to put the application to work. Tagging of media, particularly photographs, has become a very popular and efficient means of organizing material on the internet and on personal computers. Over a short period of time the technique has evolved from an optional feature to a must-provide service, and can be found within modern desktop and internet photo galleries. However, tagging is normally done long after the images have been captured, possibly at the expense of in-the-moment information. With this in mind, we hypothesize that tagging photos right after they are taken, or on the spot, will result in a larger number of tags than tagging photos long after they are taken. We also hypothesize that the tags created on the spot will be perceived to better describe the photos by consumers of the photos. Finally, we hypothesize that a convenient way of tagging photos on the spot is by using voice commands.
To test these hypotheses we will conduct a study in which participants will be asked to introduce the University of New Hampshire campus in a number of low-resolution pictures captured using a Symbol MC50 PDA. Participants will be divided into two groups. One group will be able to issue voice commands to select tags from a list while the other group will have to manually select or type in tags from the same list. The images and their tags will be posted to an internet photo gallery, such as Flickr, which will allow us to recruit a third group of participants who will compare the quality of the tags created by the first two participant groups. The study will be conducted throughout the months of November and December, so check back for the results soon after. Below is a sample image of UNH posted on Flickr. The tagging section is highlighted in red.
Michael Farrar
Driving simulator & Speech user interface nemanja on 29 Oct 2008
Effects of using a push-to-talk button for speech user interfaces on visual attention
Hello ecebloggers,
I’m very proud today because I’m writing about my first experiment. The experiment builds on two of our previous studies. In one of them Zeljko Medenica found that when speech recognition performance is poor, having to use a push-to-talk button results in worse driving performance than using ambient recognition (you can find the paper on his study here). In his thesis Oskar Palinko found that drivers divert their visual attention from the outside world when using the push-to-talk button. This happens because drivers often cast a glance away from the road in order to locate the push-to-talk button before operating it.
A simple solution to this latter problem may be training drivers not to look for the push-to-talk button. However, just like users of consumer electronics in general, drivers are not willing to be trained for long periods of time in order to use an in-car device. The goal of this study is to determine if simple training, such as verbal instructions issued before driving, would improve drivers’ visual attention directed at the road ahead.
The experiment setup is shown in the picture below.

The experiment involves our high-fidelity driving simulator and eye tracker. I’m also using three cameras: one to record subjects’ head movements, one to record hand positions while driving, and one to record the eye tracker laptop screen. The eye tracker laptop shows the video captured by the two eye tracking cameras, but there is no elegant way of saving this video directly, hence the third camera.
I’ve started the experiments this week with Oskar’s help so I’ll be able to post about results very soon!
Have a good one,
Nemanja Memarovic
Driving simulator & Speech user interface oszkar on 28 Oct 2008
License Plate Experiment
For the last few weeks I have been working on designing a new driving experiment that will compare different user interfaces for entering license plate numbers into the police cruiser system for checking records. The two user interfaces are manual and speech. To make this possible, license plates had to be added to our simulations. I have built on the results of Zeljko Medenica’s work to put in license plates as textures in the DriveSafety simulator computers. These worked only for some car models, because of the way textures are applied to them. Here is an example of a successful application:

The plates had to be made larger than in real life, because of the limited resolution of the projection system (1024×768). This way, the plates became readable from moderate distances and still preserved a realistic feel.
Our hypothesis is that a voice input system will have a beneficial effect on driving performance compared to a manual interface.

We will also examine whether completion time differs between the two methods. Further, by looking at the eye-tracker data, it will be possible to say which method demands more visual attention. My bet is on the manual UI. We will post all interesting results on this blog.
Oszkar Palinko
Driving simulator & Navigation & Project54 & R&D & Speech user interface zeljko.medenica on 27 Oct 2008
Navigation aids and driving performance
Probably everybody has at least some experience with GPS-based personal navigation aids. They usually provide directions using voice prompts and information displayed on LCD screens. While such devices appear to be less distracting to use than paper directions, in-car displays may distract drivers from their primary task, driving.
In order to assess how different in-car navigation aids affect driving performance and visual attention, we conducted an experiment using our high-fidelity driving simulator. The simulator is also equipped with an eye-tracker which provides information about subjects’ visual attention. The following picture illustrates what the experimental setup looked like.

Subjects tested three navigation aids in this experiment: paper directions (turn-by-turn directions with a map printed on a sheet of paper), standard PND directions (a map displayed on an LCD screen with turn-by-turn directions delivered through voice prompts), and voice-only directions (just voice-delivered turn-by-turn directions). Participants drove on a two-lane city road with lane markings and light traffic. The following picture shows one snapshot from the simulation.

We recorded three measures of driving performance from our driving simulator: lane position, steering wheel angle, and velocity. Higher variances of these variables represent worse driving performance. We also calculated the percent dwell time on the outside world for each subject using the data collected by the eye-tracker.
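As a rough illustration of the measures described above, the following Python sketch computes the variance of each driving signal and the percent dwell time on the outside world. The function names, the sample values and the idea of one boolean gaze label per equally spaced sample are assumptions made for illustration only; they are not data or code from the experiment.

```python
from statistics import pvariance


def driving_variances(lane_position, steering_angle, velocity):
    """Return the population variance of each driving performance signal.

    Higher variance is read as worse driving performance.
    """
    return {
        "lane_position": pvariance(lane_position),
        "steering_angle": pvariance(steering_angle),
        "velocity": pvariance(velocity),
    }


def percent_dwell_time(on_road_flags):
    """Percentage of gaze samples labeled as looking at the outside world.

    Assumes equally spaced samples, one boolean per sample (hypothetical format).
    """
    if not on_road_flags:
        raise ValueError("no gaze samples")
    return 100.0 * sum(on_road_flags) / len(on_road_flags)


# Hypothetical samples for one subject (not experimental data).
lane = [0.0, 0.1, -0.1, 0.2, -0.2]          # meters from lane center
steering = [0.0, 2.0, -2.0, 1.0, -1.0]      # degrees
speed = [30.0, 31.0, 29.0, 30.0, 30.0]      # arbitrary speed units
gaze_on_road = [True] * 9 + [False]         # 9 of 10 samples on the road

print(driving_variances(lane, steering, speed))
print(percent_dwell_time(gaze_on_road))
```

Comparing these per-subject numbers across the three navigation aids would still require a proper statistical test, which this sketch does not attempt.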
Our results showed that using paper directions degrades driving performance (lane position variance, steering wheel angle variance, and the mean velocity were significantly different between the paper and other navigation aids) and visual attention significantly more than using either a navigation device that provides voice prompts with a map or a device that provides voice prompts only.
Regarding visual attention, the time participants spent looking at the road ahead was significantly higher when using the voice-only aid than when using either the paper or the voice-and-visual aid. On average participants spent around 94% of the time looking at the road when using the voice-only aid and around 89% when using the voice-and-visual aid.
While these findings supported our hypotheses, one interesting result was that the majority of the participants said they would prefer the navigation aid that provides both visual and voice directions. Based on this, we propose for our future work to model the glancing behavior of drivers, which would enable us to predict when a driver requires the next instruction so that we can issue a voice prompt at that moment. This in turn would enable drivers to keep their eyes on the road at all times, while still receiving the next instruction in a timely manner.
Lots of other experiments regarding this issue will follow, so we’ll keep you posted!
Zeljko Medenica
Education & Speech user interface & Talk puneet_IITguwahati on 30 Aug 2008
Speech User Interface Lecture - IIT Guwahati
Hi Ecebloggers,
This time, I decided to write about current activities at my college, IIT Guwahati. After completing my internship at UNH, I kept thinking about how I could promote the current research work at UNH and the pilot experiment work on obstacle testing that I did during the internship. Fortunately, the electronics department in our institute has initiated a lecture series on certain crucial areas of research in the electronics domain. This lecture series is being organized and managed by Cepstrum, the IIT Guwahati ECE society. I volunteered to contribute to this lecture series, the topic being “Speech Processing and In vehicle Interaction”. Interestingly, many more students showed up for this lecture than expected. I began by defining speech synthesis, speech recognition, etc., and then gradually picked up the pace, covering the in-vehicle speech user interface. For explaining an “in vehicle speech user interface” in a limited time, nothing could serve better than the demo video available for the Project54 speech user interface on the Catlab website. I then moved on to give some details about my work at UNH and finally covered the future prospects and current research at the Microsoft Research Lab, US. I used the Microsoft Research driving simulator video to show the future directions in this field. Since some of the listeners were first-year and second-year undergraduates, I had to restrict myself to the basics of speech without going into the technical details. Here is a picture from the lecture and the rest can be found on this website.

Now, something about the listeners’ responses. Some of the second-year undergraduates were interested in specific fields like keyword spotting and speech recognition. Since my final-year B.Tech project at IIT Guwahati is on “Speech based Emotion recognition”, I was able to suggest some parameters that are important for emotion and speech recognition, but again without going into the finer technical details. Moving on, some juniors wanted to know how the lane variance and steering wheel angle variance measures could be used to actually improve driving performance. To answer that query, I mentioned the current research work being done in the Project54 Lab to improve driver performance. I remembered reading Oszkar’s paper on the wireless Push to Talk Glove and quoted it as an example, explaining how it is better than the fixed PTT switch and helps to improve driving performance. Finally, I concluded the lecture by providing the website address for eceblogger as a resource for information about the latest work in the Project54 Lab. To summarize, I had a wonderful time and I hope I sparked some enthusiasm in my junior undergraduates.
Puneet Lakhanpal
Education & People & Project54 & R&D & Speech user interface & UNH ECE & Ubicomp Andrew Kun on 29 Aug 2008
Oszkar Palinko defends MS thesis
Last Friday Oszkar Palinko defended his MS thesis. Oszkar’s thesis centered on the cool push-to-talk (PTT) glove he designed.

Oszkar ran a rather large user study (24 participants) to evaluate whether the PTT glove outperforms a fixed PTT button. Oszkar didn’t find a main effect when comparing driving performance with the two PTT solutions, but he did find that the experiment participants looked down at the steering wheel more often when using the fixed PTT. Is this a problem? Maybe. While the total amount of time subjects spent looking at the steering wheel when using the fixed PTT button amounted to only about 1% of the total experiment time, the average fixation was around 300 ms long. If such a fixation came at the wrong time (e.g. at the moment a leading vehicle started to brake), this could be a problem.
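To get a feel for why a 300 ms fixation might matter, here is a small back-of-the-envelope calculation in Python: the distance covered while looking away is simply speed times glance duration. The driving speeds used below are assumed for illustration and are not taken from Oszkar’s experiment.

```python
def distance_during_glance(speed_kmh, glance_ms):
    """Distance in meters covered while the driver looks away from the road."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0
    return speed_m_per_s * glance_ms / 1000.0


# Hypothetical city and highway speeds, with the roughly 300 ms fixation from the study.
for speed_kmh in (50, 100):
    meters = distance_during_glance(speed_kmh, 300)
    print(f"{speed_kmh} km/h: about {meters:.1f} m traveled during the glance")
```

At these assumed speeds the car covers several meters during a single glance, which is why the timing of the glance matters more than its small share of total time.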
Congratulations Oszkar on a job well done!
Andrew Kun
People & R&D & Speech user interface Andrew Kun on 17 Aug 2008
Tim Paek talk on mobile speech interaction
Last week Tim Paek of Microsoft Research visited the Project54 lab at UNH. Tim and I have been collaborating for about two years now, and this summer Tim is the supervisor of my PhD student, Zeljko Medenica, during Zeljko’s summer internship with MSR.
As part of his visit Tim gave a talk on his research in the field of mobile speech interaction. The talk covered three topics. First, Tim discussed his work on the Voice Command. Tim worked on utilizing user models to reduce the semantic error rate. Next, he talked about using reinforcement learning in a voice-enabled browser. Finally, Tim talked about a mobile voice search application. The application provides intuitive ways for the user to help the search application find the right answer. One way this is accomplished is by allowing multimodal refinements to the search. For example, the user can help the speech recognizer by typing in the first letters of a problem word in an utterance. Another way is by allowing uncertain queries in the search (you can say “something” as part of the search to indicate you’re not sure what the exact search term should be). As part of the talk Tim gave a live demo of the mobile voice search application:

The talk was very interesting and the demo was impressive - thanks Tim! For more pictures click here.
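For readers curious about the two refinement ideas mentioned above, here is a toy Python sketch of how they could work in principle. It is not a description of how the MSR application is implemented: the function names, the list of recognizer hypotheses with scores, and the business names are all made up for illustration.

```python
import re


def refine_by_prefix(hypotheses, position, typed_prefix):
    """Keep recognizer hypotheses whose word at `position` starts with the typed letters.

    `hypotheses` is a list of (text, score) pairs; the result is sorted best-first.
    """
    prefix = typed_prefix.lower()
    kept = []
    for text, score in hypotheses:
        words = text.lower().split()
        if position < len(words) and words[position].startswith(prefix):
            kept.append((text, score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def uncertain_query_to_regex(query):
    """Turn a query into a regex where the word "something" matches one or more words."""
    parts = []
    for word in query.lower().split():
        if word == "something":
            parts.append(r"\S+(?:\s+\S+)*")
        else:
            parts.append(re.escape(word))
    return re.compile("^" + r"\s+".join(parts) + "$")


# Hypothetical recognizer output and listings.
n_best = [("visa hut", 0.7), ("pizza hut", 0.6), ("pita hut", 0.5)]
print(refine_by_prefix(n_best, 0, "pi"))

pattern = uncertain_query_to_regex("something pizza")
listings = ["joe's famous pizza", "pizza", "corner bakery"]
print([name for name in listings if pattern.match(name)])
```

In the first call the typed letters "pi" remove the top-scoring but wrong hypothesis; in the second, "something" stands in for the part of the name the user is unsure about.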
Andrew Kun
Conferences & PDA & Speech user interface & US travel oszkar on 07 Aug 2008
The Intelligent Environments Conference ‘08
A few days ago, Prof Andrew Kun, Andras Fekete and I visited the Intelligent Environments ‘08 Conference in Seattle, WA. An earlier post already introduced this conference on eceblogger. We presented three papers there. Andras had a great poster on the deployment of his new P54 PDA software. The poster session took place in the afternoon of the first day. I think his work drew the biggest crowd.

Andras presented the PDA study with great confidence and answered the questions flawlessly. I also presented my research results from the past year in two oral presentations. The first one was on the steering wheel sensor device. This was a mixture of a regular slide-show presentation and a demonstration. For this purpose we shipped out a scaled-down version of our driving simulator equipped with the new sensor. Here, we are testing the system right before the start of the presentation.

Luckily, none of the equipment got broken during transportation, so everything worked perfectly. My other presentation took place in the afternoon of the second day. It was on the results of the PTT glove experiment that we mentioned here before. This presentation also went smoothly.
The conference was organized very nicely, with helpful hosts and great food. They even scheduled a visit for us to see the Microsoft Home project. The conference took place on the campus of the University of Washington in Seattle, which proved to be a beautiful place. I didn’t even know that there are campuses in the USA built in the Gothic style. I had seen this before only in Europe. Here Andras and Andrew explore the square in front of the university’s landmark library building, which looks more like a Gothic cathedral.

Thanks go to Prof Kun for helping us and actively participating in writing all three papers (second author on all of them). Also, thanks to Erika Clifford for doing all the logistics for the trip and shipping the equipment.
Oszkar
Driving simulator & People & Speech user interface oszkar on 27 Jun 2008
Automotive discussions at YRRSDS’08
This year’s Young Researchers’ Roundtable on Spoken Dialog Systems was a great place to share ideas with fellow speech research enthusiasts. Lots of attention was devoted to the design of spoken dialog systems in general (as the name of the event might suggest), but also to more specific areas, such as automotive speech user interfaces. Besides the main program, the pleasant meal breaks provided a great environment for people to discuss research ideas in an informal manner.
I had some very interesting conversations about automotive topics during these breaks with Ben Reaves from Toyota ITC, Stefan Hamerich from Harman/Becker and Zeljko Medenica from Project54/UNH. We discussed the state of the art in automotive speech user interfaces and where the field is headed. Most of us had heard about Microsoft’s Sync Technology for Ford vehicles, thanks to their advertising campaign in the USA, but other big auto makers are also selling cars equipped with speech recognition systems (e.g. Toyota, Honda, etc). Ben proposed that now is the time to set standards for automotive speech user interfaces, which could be accepted by all relevant parties in business and research. We all agreed that the accepted standard might not be the best one (see VHS vs. Betamax), but it could still be very beneficial to the field by focusing its development.
The poster session was cleverly piggy-backed onto the afternoon coffee break. This way people didn’t even notice that their minds were “on-duty” even during “recess”. I presented a poster on my ongoing research concerning push-to-talk button solutions for in-car speech user interfaces. It drew a bigger crowd than I imagined. Professor Alex Rudnicky from CMU asked about the premises and methods of my research. Then the automotive specialists, Stefan and Ben, were joined by several other participants in discussing details of the poster with me.
Stefan Hamerich, Ben Reaves, Antoine Raux, Oskar Palinko, Milica Gasic
We had a very good conversation on the advantages and disadvantages of high-fidelity driving simulators in automotive research. They can provide lots of measured variables (lane position, steering wheel angle, distances, speeds, etc.), but at the same time researchers must cope with their possible undesired effects (e.g. motion sickness).
I found that informal discussions are a very effective way of sharing ideas within small groups of people. I very much enjoyed talking to fellow researchers about common interests at YRRSDS’08.
Oskar Palinko
Project54, UNH

