Conferences & Project54 & User interface zeljko.medenica on 15 Aug 2007 03:35 pm
Speech Interface Accuracy and Driving Performance
Hello again. This has been a very productive year for our papers at various conferences, and we are getting ready for another one. This time our paper will appear at Interspeech 2007, which will be held at the end of August in Belgium. The paper, The Effect of Speech Interface Accuracy on Driving Performance, written in collaboration with Tim Paek of Microsoft Research, was accepted for the poster session. Here, I will say a few words about the research described in the paper.
As the number of electronic devices finding their way into vehicles increases, so does the number of potential sources of distraction. As we showed in our paper presented at Driving Assessment 2007, speech interaction with in-vehicle devices has the potential to eliminate or at least reduce this distraction.
When designing speech user interfaces, several important factors must be considered: speech recognition accuracy, the use of a push-to-talk (PTT) button, and dialog repair.
Speech recognition accuracy refers to the accuracy of the speech recognizer, which should clearly be as high as possible. Because current speech recognizers do not work well with ambient recognition, PTT buttons are still needed to improve the recognition rate. Dialog repair refers to the procedure the user has to perform when the recognizer makes a mistake.
Given these factors, the natural question is how they influence driving performance. To investigate this, we designed an experiment that varied three factors: speech recognizer accuracy (Low accuracy, 44%, vs. High accuracy, 89%), PTT button usage (with vs. without), and dialog repair (misunderstanding, where the system responds with an incorrect recognition, vs. non-understanding, where the system responds with “unrecognized”).
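To make the three factors concrete, here is a minimal Python sketch that enumerates their combinations. It assumes the factors were fully crossed, which this post does not state explicitly; the names `ACCURACY`, `PTT`, `REPAIR`, and `build_conditions` are made up for illustration and are not taken from the study's software.

```python
from itertools import product

# Factor levels as described in the post. The dictionary keys and string
# labels are illustrative names, not identifiers from the actual study.
ACCURACY = {"low": 0.44, "high": 0.89}
PTT = ("with_ptt", "without_ptt")
REPAIR = ("misunderstanding", "non_understanding")


def build_conditions():
    """Return every combination of the three factors.

    Assumption: the factors are fully crossed (2 x 2 x 2).
    """
    return [
        {
            "accuracy": accuracy,
            "recognition_rate": ACCURACY[accuracy],
            "ptt": ptt,
            "repair": repair,
        }
        for accuracy, ptt, repair in product(ACCURACY, PTT, REPAIR)
    ]


if __name__ == "__main__":
    for condition in build_conditions():
        print(condition)
```

Under that assumption, two levels per factor give eight combinations in total.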
The experiment was performed in our driving simulator. Twenty subjects participated in the study. From the simulator we collected three variables: lane position, steering wheel angle, and velocity. We calculated the variances of these variables taking into account only the road segments on which subjects performed interactions with the system. Higher variances indicated worse driving performance.
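The variance calculation described above can be sketched roughly as follows. This is only an illustration, not the analysis code we used: the sample layout (a list of dictionaries with an `interacting` flag) and the function name `interaction_variances` are assumptions, and the sketch uses the sample variance from Python's standard library.

```python
from statistics import variance

# The three driving variables collected from the simulator.
# The key names are illustrative assumptions.
SIGNALS = ("lane_position", "steering_angle", "velocity")


def interaction_variances(samples):
    """Compute the variance of each driving variable.

    Only samples recorded while the subject was interacting with the
    system are used. Each sample is assumed to be a dict holding the
    three signals plus a boolean "interacting" flag.
    """
    selected = [s for s in samples if s["interacting"]]
    if len(selected) < 2:
        raise ValueError("need at least two interaction samples")
    return {name: variance([s[name] for s in selected]) for name in SIGNALS}


if __name__ == "__main__":
    # Made-up numbers, purely to show the call.
    log = [
        {"lane_position": 0.0, "steering_angle": 1.0, "velocity": 20.0, "interacting": True},
        {"lane_position": 0.2, "steering_angle": 3.0, "velocity": 22.0, "interacting": True},
        {"lane_position": 5.0, "steering_angle": 9.0, "velocity": 40.0, "interacting": False},
        {"lane_position": 0.4, "steering_angle": 5.0, "velocity": 24.0, "interacting": True},
    ]
    print(interaction_variances(log))
```

Samples from road segments without interaction are dropped before the variance is taken, so they do not affect the result.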
The video below will help you get a feel for how the experiment was performed. The subject goes through all of the interaction types: successful recognition, misrecognition, and dialog repair. When a misrecognition occurs, the subject performs dialog repair by issuing a “cancel” command and then repeating the misrecognized command. The interactions were performed using the PTT button.
After performing statistical analyses on the simulator data, we could draw two major conclusions. First, steering wheel angle variance was higher (indicating worse driving performance) when speech recognition accuracy was Low. Second, lane position variance was higher (again indicating worse driving performance) when speech recognition accuracy was Low and the PTT button was used. The type of dialog repair did not have a statistically significant influence on driving performance.
These results are very important for the design of in-vehicle speech interfaces. However, these factors may interact with other variables that can also influence driving performance, such as the position of the PTT button and the frustration induced in subjects when the speech recognizer performs poorly. We will investigate these questions in our future research.
The conference starts in a couple of weeks, and after we come back, we will probably have more experiences and information to share. So, stay tuned!
Zeljko Medenica
