Category Archive: Speech processing
DSP & Just for fun & Software & Speech processing & User interface Ivan Elhart on 30 Jun 2008
Sony’s MP3 dancing robot - Rolly
Sony revealed an egg-shaped digital music player named Rolly (picture below) at the end of 2007, but I didn’t get the chance to see it until last weekend. It plays MP3 and AAC music files and supports direct music streaming over a Bluetooth connection. It can also dance.

The Rolly is more than an ordinary music player. It is a motion-controlled robot with a bunch of sensors, colored lights, and two flapping wings. It uses two wheels that surround the body to roll, wiggle, and spin. In the vertical position, the wheels can be used to change songs and adjust the volume. The Rolly creates motion automatically by analyzing the music, so it can dance to any song. You can also create new motions or customize existing ones using PC software. You can see the Rolly’s dance in the video below. It is amazing how well the sound and motion are synchronized.
Ivan Elhart
Education & Speech processing & UNH ECE nemanja on 12 May 2008
ECE 992 Speech signal processing – Student presentations, Monday 05/12/2008
Hello ecebloggers,
Today, Monday 05/12/2008, was the last day of the student project presentations and the day of LPC (Linear Predictive Coding). The session chair was Nate Bourgoine (picture below).

After a little confusion in the beginning, Ivan Elhart (in the picture below), CATlab member, opened today’s presentations with Project 25 MATLAB implementation of compression. He explained the standard he tried to implement for speech compression and he compared his results with Praat.

Yuanli Wang (picture below), who we know as the session chair for Wednesday 05/07/2008, used LPC for compression and synthesis of speech. We had a nice demo where we heard his results.

After Yuanli, another member of CATlab presented. Oszkar Palinko (picture below), presented his work on a wireless microphone. He transmitted LPC coefficients to lower the amount of data transmitted over RF.
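To give a feel for why sending LPC coefficients can reduce the data rate, here is a minimal sketch of LPC analysis for one frame. It uses the textbook autocorrelation method with the Levinson-Durbin recursion; this is an assumption for illustration and not necessarily what Oszkar implemented. The function names, the frame length and the test signal are all made up for the example.

```python
import math

def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(frame, order):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns (a, err), where x[n] is predicted as
    a[0]*x[n-1] + a[1]*x[n-2] + ... and err is the residual energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # silent or perfectly predicted frame
        # Reflection coefficient for predictor order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        prev = a[:]
        a[i] = k
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        err *= 1.0 - k * k
    return a, err

# Illustrative frame: 2000 samples of a sinusoid (not real speech).
frame = [math.sin(0.2 * math.pi * n) for n in range(2000)]
coeffs, residual = lpc(frame, 2)
# A sinusoid obeys x[n] = 2*cos(w)*x[n-1] - x[n-2], so the two
# coefficients come out close to [2*cos(0.2*pi), -1].
print(len(frame), "samples ->", len(coeffs), "coefficients:", coeffs)
```

For speech, a typical setup would use a short frame and a predictor order of around 10, so each frame is described by a handful of coefficients (plus some information about the excitation) instead of hundreds of samples.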

The honor of closing the student presentations went to Dan Reynolds (picture below). He implemented audio watermarking, and his presentation was voted the best student presentation of the class. He also received a nice book as a reward. Congrats Dan!

Here should be a picture of all of us with a satisfied, wide smile after the end of the ECE 992 Speech class. Unfortunately you’ll get just the description of the picture because my camera battery died, but I’m sure you can imagine it. Here’s some help : )))
All pictures from the presentations can be found here.
Have a good one,
Nemanja Memarovic
Education & Speech processing nemanja on 09 May 2008
ECE 992 Speech signal processing – Student presentations, Friday 05/09/2008
Hello ecebloggers,
Friday 05/09/2008 was another exciting day of student presentations. The session chair was Dave Estes. He didn’t bring Budweiser (Dude!) but he did have a nice slide listing all the presenters (see picture below).

The first presentation today was from Nate Bourgoine. Nate’s project was on speaker recognition using high-order formants. He thoroughly explained what he did and gave a nice demo. Here’s a picture of Nate:

Next was Keith Spaulding (see picture below). Keith analyzed and synthesized the chant “OM”. He explained what we are supposed to achieve with chanting and the difference between speech processing we did in class and what he did.

Jon Carrier (in the picture below) closed today’s presentations. Echo cancellation was the topic of his project. He explained how echo arises and how he tried to handle it.

Monday 05/12/2008 is going to be the last day of student presentations. All pictures from the presentations can be found here.
Have a good one,
Nemanja Memarovic
Education & Speech processing nemanja on 07 May 2008
ECE 992 Speech signal processing – Student presentations, Wednesday 05/07/2008
Hello ecebloggers,
Wednesday 05/07/2008 was the second day of the student project presentations, and the day of prosody. The session chair was Yuanli Wang. He did a nice job with a PowerPoint presentation so we would all know who was presenting. Here’s Yuanli’s picture.

I had the honor of breaking the ice today. My presentation was on prosody in HCI. I tried to describe the human voice in HCI using pitch and power variances. Here’s a picture of me in action.

Jeremy Kent analyzed prosody of emotions in R2D2’s voice. It seems that R2D2 has feelings, so if you run into him be careful how you talk to him. Here’s Jeremy giving us some details about R2D2’s voice.

After Jeremy, Mike Farrar presented. We had a similar topic. Mike was analyzing voice recorded on PDAs. He extracted a lot of data and gave a very informative presentation. Nice job Mike!

Another member of CATlab presented on Wednesday. Jon Oppelaar analyzed speech recorded during a map task (one person explaining directions to another person).

Another post about the presentations is coming this Friday 05/09/2008. All pictures from the presentations can be found here.
Have a good one,
Nemanja
Education & Speech processing nemanja on 07 May 2008
ECE 992 Speech signal processing – Student presentations, Monday 05/05/2008
Hello ecebloggers,
Monday was the first day of the student project presentations. The session chair was Ivan Elhart. Here’s Ivan introducing a presenter:

Four students presented: Zeljko Medenica, Dave Schwarzenberg, Zach Clifton and Dave Estes. All four presentations were very interesting. Zeljko tried to estimate pitch in a way similar to how people hear it. Here’s Zeljko during his presentation:

Zeljko ran a demo on a short wave file to show us how it works.
Dave Schwarzenberg’s project was also about pitch estimation. He tried to detect pitch with dynamic time warping. Here’s Dave:

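For readers who haven’t met dynamic time warping (DTW) before, here is a minimal sketch of the basic algorithm. It only shows how DTW compares two sequences that differ in timing; it is not Dave’s pitch detector, and the function name and the pitch values are made up for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences.

    cost[i][j] holds the cheapest total mismatch when aligning the
    first i items of a with the first j items of b.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])      # local mismatch
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical pitch contours in Hz: the second is the first at half speed.
fast = [100, 110, 120]
slow = [100, 100, 110, 110, 120, 120]
print(dtw_distance(fast, slow))  # 0.0, since the shapes match after warping
```

Because DTW allows one sequence to stretch against the other, two contours with the same shape but different speaking rates end up with a small distance.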
Zach Clifton (see picture below), a former musician, tried to show the acceptable variance in pitch for a vocalist. We also heard his lovely singing.

Before reading about Dave Estes’s project, watch this video:
Dave (see picture below) used prosody to determine human emotions. He found his inspiration in Budweiser’s “Dude” ad. We had a few laughs listening to his wave files.

That’s all about Monday’s presentations. There’s more coming Wednesday 05/07/2008. You can find all the pictures from the presentation here.
Have a good one,
Nemanja
People & R&D & Speech processing & Speech user interface Andrew Kun on 12 Mar 2008
Microsoft acoustic anechoic chamber displayed at TechFest
Wired Magazine’s online edition has a couple of blog posts about Microsoft Research’s TechFest, an annual internal event that allows MS researchers from around the world to meet and share ideas in Redmond, WA.
One of the Wired blog posts is about the so-called Soundless Audio Lab, an acoustic anechoic chamber located at the Redmond offices of Microsoft Research. The researcher describing the lab to the Wired blogger is Ivan Tashev. Ivan’s research interests include microphone arrays, which is one reason for his involvement with the anechoic chamber. Ivan is also involved in developing in-car speech interfaces, e.g. the Commute UX technology, which provides location-based services via telephone and which he presented at SigDial 2007 (here’s the paper).
Andrew Kun
Speech processing & UNH ECE Andrew Kun on 16 Feb 2008
Lecture: Dr. Jenni Cook on Speech Production and the Singing Voice
Dr. Jenni Cook, Associate Professor in the UNH Music Department, visited my Speech Processing class to discuss speech production as it relates to the singing voice. Here’s a picture of Jenni I took during the lecture:

I invited Jenni because I was intrigued by a figure in the Quatieri textbook that I use in this course, and I thought she’d be the perfect person to bring this figure to life for us. The figure (a reproduction from a paper by Johan Sundberg) pictorially explains how singers can generate loud sounds, or using music terminology, how they can project well. In essence, the vocal tract has resonances, called formants. For so-called voiced sounds, the vocal tract is excited by periodic air puffs from the glottis. The frequency of the air puffs is the fundamental frequency of the voice. When the formants line up with the fundamental frequency and/or its harmonics (integer multiples of the fundamental frequency), the singer projects well. Here’s the figure:

Jenni discussed this issue, as well as others, in her lecture. For me, the most exciting parts of the lecture were the demonstrations: Jenni sang short sequences to demonstrate various effects (including the one in the figure above), and she also had us participate in some of the demonstrations by vocalizing and by exploring the anatomy of our voice production mechanism.
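The idea of formants lining up with harmonics can be made concrete with a bit of arithmetic. The sketch below, which is only an illustration and not part of Jenni’s lecture or the Sundberg figure, finds the harmonic of the fundamental that lies closest to a given formant. The function name and the frequencies are made up.

```python
def nearest_harmonic(f0_hz, formant_hz):
    """Find the harmonic of f0 closest to a formant frequency.

    Returns (harmonic number, harmonic frequency in Hz,
    formant minus harmonic in Hz).
    """
    k = max(1, round(formant_hz / f0_hz))  # harmonics start at 1 * f0
    harmonic_hz = k * f0_hz
    return k, harmonic_hz, formant_hz - harmonic_hz

# Hypothetical numbers: a sung A4 (440 Hz) and a formant at 900 Hz.
print(nearest_harmonic(440.0, 900.0))  # (2, 880.0, 20.0)
```

A small offset means a harmonic sits near the peak of the resonance and gets boosted by the vocal tract; a large offset means the resonance falls between harmonics and contributes less to loudness.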
Jenni has kindly agreed to help me in assigning a project topic for my Speech Processing course (each student will complete a project as part of the course). Jenni and I will work together on guiding a project on generating feedback to a voice student about how he or she is projecting while singing, as well as about other relevant features of his or her voice (we’ll determine which ones as the project gets going). The feedback will be based on digital speech signal processing algorithms and it will be presented in a user-friendly form to the singer. Should be fun!
Andrew Kun
Education & Speech processing & UNH ECE Andrew Kun on 29 Jan 2008
Speech Signal Processing course - links
Last week I started teaching a Speech Signal Processing course. This promises to be a fun course for me, since it mixes DSP and human speech production, both topics I’m very interested in. I keep track of links relevant to this course here. The latest link I’ve added to this list is a link to SPAN, the Speech Production and Articulation Knowledge Group at USC. They have fascinating movies showing human speech production as it happens. The movies were made using USC’s high-speed MRI technique. There are several movies on the SPAN home page, and for more click on Videos.
A couple of notes on viewing the SPAN videos. First, to see a video of a particular sound being produced, select the sound and then select the speaker (American or Indian English speaker). Second, on one computer I had problems viewing the videos in Firefox, but they worked fine in IE. So if you’re having problems viewing them, try switching browsers (or let me know if you have a better explanation as to why the QuickTime plugin works OK on the home page but not on the videos page).
Andrew Kun
