Category Archive: Speech user interface



Driving simulator & R&D & Speech user interface & Talk & User interface zeljko.medenica on 01 Apr 2010

Bret Harsham talk at UNH

A couple of weeks ago our ECE department hosted Bret Harsham of Mitsubishi Electric Research Labs (MERL) as part of the ECE900 graduate seminar. Bret gave a very interesting lecture about shortening voice dialogs by using contextual push-to-talk buttons. The title of the lecture was “Contextual Push-to-Talk: Shortening Voice Dialogs to Improve Driving Performance.”

The lecture focused on a prototype in-car voice user interface (VUI) which was tested during my MERL internship last summer. Contemporary in-car VUIs use a single push-to-talk button for issuing all commands. This work instead uses multiple push-to-talk buttons, one for each context of the query. For example, if we have three domains of interest, we can assign one push-to-talk button to each of them. Pressing a button then selects the domain directly, so we skip the steps that are otherwise required to switch to the desired domain and initiate a voice search.
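To make the idea of skipped steps concrete, here is a minimal sketch in Python. It is only an illustration: the three domain names are taken from the internship post further down this page, while the function names and the exact dialog steps are my own assumptions and not the actual MERL prototype.

```python
# Hypothetical illustration of contextual push-to-talk.
# The domain names come from the blog posts; the dialog steps are assumed.
DOMAINS = ("contacts", "music", "points_of_interest")


def _check_domain(domain):
    """Reject domains that have no button or menu entry in this sketch."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")


def single_button_dialog(domain, query):
    """One generic button: the driver first has to say which domain to search."""
    _check_domain(domain)
    return [
        "press push-to-talk",
        f"say domain: {domain}",
        "press push-to-talk",
        f"say query: {query}",
    ]


def contextual_dialog(domain, query):
    """One button per domain: the button press itself selects the domain."""
    _check_domain(domain)
    return [
        f"press {domain} push-to-talk",
        f"say query: {query}",
    ]


if __name__ == "__main__":
    for step in contextual_dialog("music", "ABBA"):
        print(step)
```

Under these assumed steps the contextual dialog needs two driver actions where the single-button dialog needs four. The real saving depends on how the baseline dialog is designed.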

This work was recently accepted for publication at the MobileHCI 2010 conference.

Zeljko Medenica

Driving simulator & R&D & Speech user interface & User interface zeljko.medenica on 13 Aug 2009

Summer internship at MERL

This summer I have had a great opportunity to be an intern at Mitsubishi Electric Research Labs (MERL) in Cambridge, MA. This is not only a great prospect for my professional career, but also a chance to experience what working in a real company is like.

MERL is a subsidiary of the Mitsubishi Electric Corporation from Japan, and the two collaborate closely. As the name implies, it is a research facility where people work in many different areas, such as digital communications, multimedia, user interfaces, speech interaction, mechatronics and many others. Their extensive publications page confirms how important research is at this institution.

As a MERL intern, I am a member of the speech group. My advisor here is Bret Harsham and this summer we have been working on testing an in-car speech user interface which was developed here at MERL. This interface lets users select contacts, music and points of interest using voice commands. The experiments will examine the influence this interface may have on driving performance and will be performed on their driving simulator (shown in the picture below).

MERL driving simulator

The driving simulator is based on a racing game simulation and consists of three huge DLP projector screens which create a very wide field of view, a force-feedback steering wheel and pedals, and a motion chair. The motion chair is very powerful: it simulates the vibrations caused by the road surface and engine, as well as the tilting of the car caused by acceleration and deceleration. The feeling it produces is very realistic and it may help prevent simulator sickness, which was common in our simulator studies.

We are looking forward to publishing the results of this interesting study. So, stay tuned for more info on this topic.

Zeljko Medenica

Conferences & Speech processing & Speech user interface & Ubicomp Andrew Kun on 26 Mar 2009

SiMPE Workshop at MobileHCI 2009

The call for papers for the Fourth Workshop on Speech in Mobile and Pervasive Environments (SiMPE) is now available. The workshop will focus on speech user interfaces for pervasive (ubiquitous) computing applications. SiMPE will be held in conjunction with MobileHCI 2009 in Bonn. This is the fourth year SiMPE is being organized and I’m thrilled to be one of three new members of the Organizing Committee, along with Tim Paek and Ivan Tashev.

Some of the topics I hope will get attention at this year’s workshop are those related to leveraging speech for user interaction in cars. We encourage the submission of position papers on the best practices and techniques for facilitating effective interaction through on-board computers or mobile devices such as phones or media players. Topics of interest range from noise robustness technology, to the effects of speech interaction on driving performance, to societal and cultural issues relevant to in-car speech interaction.

For questions or comments please email me at andrew DOT kun AT unh DOT edu.

Andrew Kun

Driving simulator & Speech user interface & Ubicomp Andrew Kun on 12 Mar 2009

Commute UX at Microsoft TechFest 2009

Check out the video below in which Ivan Tashev, Mike Seltzer and Y.C. Ju discuss the Commute UX project at TechFest 2009.

Last summer Oszkar and I visited the MSR driving simulator lab and it’s great to see how quickly Ivan and co. are making progress with the simulator. And it’s pretty cool that the video features ABBA!

UPDATE: The video above doesn’t feature the MSR driving simulator, but in the video below Ivan Tashev introduces Commute UX using the simulator - thanks for the comment Oszkar.

Andrew Kun

Conferences & PDA & PowerPoint & Speech user interface & UNH ECE Michael Farrar on 18 Nov 2008

My talk at Northeastern University: NECHFES student conference

Hello ecebloggers,

Last week, a group of us from the lab traveled to Boston’s Northeastern University to attend the NECHFES student conference. It was my first conference experience and I have to say that it surpassed my expectations. The atmosphere was very relaxed, and presentations were nicely sequenced with 10 – 15 minute breaks. Of course, breakfast and lunch were served, both of which were outstanding, and free! The keynote speaker, Daniel Serfaty from Aptima Inc, had some unique and very interesting perspectives on the ubiquitous computing world of today and tomorrow. Photos of my speech, entitled “Using voice to tag digital photographs on the spot,” can be found here, and the complete presentation here. I look forward to attending similar events.

Michael Farrar

Driving simulator & R&D & Speech user interface Andrew Kun on 06 Nov 2008

Project54 research featured in SLTC e-Newsletter

The Fall 2008 e-Newsletter of the Speech and Language Technical Committee (SLTC) of the IEEE Signal Processing Society includes an article about the driving simulator-based research going on within Project54. Thanks to Mike Seltzer for his help with editing the article.

Btw, the image that appears in the article was created by Alex Shyrokov and may be included in an upcoming book by Matthias Wölfel and John McDonough entitled Distant Speech Recognition.

Andrew Kun

PDA & Speech user interface & Ubicomp & User interface Michael Farrar on 03 Nov 2008

Using voice to tag digital photographs on the spot

Hi ecebloggers,

In the past I’ve discussed the imaging application, and in particular, the tagging capabilities it provides. Now it’s time to put the application to work. Tagging of media, particularly photographs, has become a very popular and efficient means of organizing material on the internet and on personal computers. Over a short period of time the technique has evolved from an optional feature to a must-provide service, and can be found within modern desktop and internet photo galleries. However, tagging is normally done long after the images have been captured, possibly at the expense of in-the-moment information. With this in mind, we hypothesize that tagging photos right after they are taken, or on the spot, will result in a larger number of tags than tagging photos long after they are taken. We also hypothesize that consumers of the photos will perceive the tags created on the spot as better descriptions of the photos. Finally, we hypothesize that a convenient way of tagging photos on the spot is by using voice commands.

To test these hypotheses we will conduct a study in which participants will be asked to introduce the University of New Hampshire campus through a number of low-resolution pictures captured using a Symbol MC50 PDA. Participants will be divided into two groups. One group will be able to issue voice commands to select tags from a list while the other group will have to manually select or type in tags from the same list. The images and their tags will be posted to an internet photo gallery, such as Flickr, which will allow us to recruit a third group of participants who will compare the quality of the tags created by the first two participant groups. The study will be conducted throughout the months of November and December, so check back for the results soon after. Below is a sample image of UNH posted on Flickr. The tagging section is highlighted in red.

 

 

Michael Farrar

Driving simulator & Speech user interface Nemanja Memarovic on 29 Oct 2008

Effects of using a push-to-talk button for speech user interfaces on visual attention

Hello ecebloggers,

I’m very proud today because I’m writing about my first experiment. The experiment builds on two of our previous studies. In one of them Zeljko Medenica found that when speech recognition performance is poor, having to use a push-to-talk button results in worse driving performance than using ambient recognition (you can find the paper on his study here). In his thesis Oszkar Palinko found that drivers divert their visual attention from the outside world when using the push-to-talk button. This happens because drivers often glance away from the road in order to locate the push-to-talk button before operating it.

A simple solution to this latter problem may be training drivers not to look for the push-to-talk button. However, just like users of consumer electronics in general, drivers are not willing to be trained for long periods of time in order to use an in-car device. The goal of this study is to determine if simple training, such as verbal instructions issued before driving, would improve drivers’ visual attention directed at the road ahead.

The experiment setup is shown in the picture below.

The experiment involves our high fidelity driving simulator and eye tracker. I’m also using three cameras: one to record subjects’ head movements, one to record hand positions while driving, and one to record the eye tracker laptop screen. The eye tracker laptop shows the video captured by the two eye tracking cameras but there is no elegant way of saving this video.

I’ve started the experiments this week with Oszkar’s help so I’ll be able to post about results very soon!

Have a good one,

Nemanja Memarovic


Driving simulator & Speech user interface oszkar on 28 Oct 2008

License Plate Experiment

For the last few weeks I have been working on designing a new driving experiment which will compare different user interfaces for entering license plate numbers into the police cruiser system for checking records. The two user interfaces are manual and speech. To make this possible, license plates had to be added to our simulations. I have built on the results of Zeljko Medenica’s work to put in license plates as textures on the DriveSafety simulator computers. These worked only for some car models, because of the way textures are applied to them. Here is an example of a successful application:

The plates had to be made larger than in real life, because of the limited resolution of the projection system (1024×768). This way, the plates became readable from moderate distances and still preserved a realistic feel.

We hypothesize that a voice input system will have a beneficial effect on driving performance compared to using a manual interface.

We will also examine whether completion time differs between the two methods. Further, by looking at the eye-tracker data, we will be able to say which method demands more visual attention. My bet is on the manual UI. We will post all interesting results on this blog.

Oszkar Palinko

Driving simulator & Navigation & Project54 & R&D & Speech user interface zeljko.medenica on 27 Oct 2008

Navigation aids and driving performance

Probably everybody has at least some experience with GPS-based personal navigation aids. They usually provide directions using voice prompts and information displayed on LCD screens. While such devices appear to be less distracting to use than paper directions, in-car displays may distract drivers from their primary task, driving.

In order to assess how different in-car navigation aids affect driving performance and visual attention, we conducted an experiment using our high-fidelity driving simulator. The simulator is also equipped with an eye-tracker which provides information about subjects’ visual attention. The following picture illustrates what the experimental setup looked like.

Experimental setup

There were three navigation aids that subjects tested in this experiment: paper directions (turn-by-turn directions with a map printed on a sheet of paper), standard PND directions (a map displayed on an LCD screen with turn-by-turn directions delivered through voice prompts), and voice-only directions (just voice-delivered turn-by-turn directions). Participants drove on a two-lane city road with lane markings and light traffic. The following picture shows one snapshot from the simulation.

Sample scenario

We recorded three measures of driving performance from our driving simulator: lane position, steering wheel angle, and velocity. Higher variances of these variables represent worse driving performance. We also calculated the percent dwell time on the outside world for each subject using the data collected by the eye-tracker.
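As a rough illustration of how such measures can be computed from logged samples, here is a minimal Python sketch. The function names, the data layout (plain lists sampled at a fixed rate) and the choice of population variance are all assumptions made for this example. This is not the analysis code used in the study.

```python
from statistics import mean, pvariance


def driving_performance(lane_position, steering_angle, velocity):
    """Summarize one drive; higher variances are read as worse performance.

    Each argument is a list of samples logged by the simulator
    (hypothetical layout). Population variance is an assumption.
    """
    return {
        "lane_position_variance": pvariance(lane_position),
        "steering_angle_variance": pvariance(steering_angle),
        "velocity_variance": pvariance(velocity),
        "mean_velocity": mean(velocity),
    }


def percent_dwell_time(gaze_on_road):
    """Percent of eye-tracker samples in which the gaze was on the outside world.

    gaze_on_road is a list of booleans, one per fixed-rate sample
    (hypothetical layout).
    """
    if not gaze_on_road:
        raise ValueError("no gaze samples")
    return 100.0 * sum(gaze_on_road) / len(gaze_on_road)


if __name__ == "__main__":
    print(driving_performance([0.1, -0.1, 0.2], [1.0, -2.0, 0.5], [30, 31, 29]))
    print(percent_dwell_time([True, True, True, False]))
```

With a fixed sampling rate, the share of samples on the road equals the share of time spent looking at the road, which is why a simple count is enough here.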

Our results showed that using paper directions degrades driving performance (lane position variance, steering wheel angle variance, and the mean velocity were significantly different between paper directions and the other navigation aids) and visual attention significantly more than using either a navigation device that provides voice prompts with a map or a device that provides voice prompts only.

Regarding visual attention, the time participants spent looking at the road ahead was significantly higher when using the voice-only aid than when using either the paper or the voice and visual aid. On average participants spent around 94% of the time looking at the road when using the voice-only aid and around 89% when using the voice and visual aid.

While these findings supported our hypotheses, one interesting thing was that the majority of the participants said that they would prefer using the navigation aid which provides both visual and voice directions. Based on this, we propose for our future work to model the glancing behavior of drivers, which would enable us to predict when a driver will need the next instruction and issue a voice prompt at that moment. This in turn would enable drivers to keep their eyes on the road at all times, while still getting the next instruction in a timely manner.

Lots of other experiments regarding this issue will follow, so we’ll keep you posted!

Zeljko Medenica
