Mobile phone & Project54 & Ubicomp Michael Farrar on 15 Apr 2009 02:11 pm

Using voice to tag digital photographs on the spot: seeking participants

Have you ever wanted to tag photographs immediately upon capture?  Your answer to this question is probably twofold: (1) Yes, I have; it would seem to be more convenient. (2) No, I have not; text entry on a mobile device is a difficult process.  Now, what if I told you that we’ve developed an application which provides this functionality while sparing you most of the text entry?  If I told you this then you’d be willing to participate in its evaluation study, right?

The figure above depicts the latest version of Project54’s imaging application.  It’s come a long way since its initial development and now targets any camera-equipped Windows Mobile 6.0 device.  The leftmost portion of the figure shows the Manager window, where you can capture photographs.  The rightmost portion of the figure shows the Tags window, where you can tag your photographs via text entry, screen-tap selections or voice commands.  That’s right, voice commands; hence, using voice to tag digital photographs on the spot.  Interested?  The middle portion of the figure demonstrates the imaging application’s cooperation with Flickr, allowing you to upload your live photo stream directly to your account wherever you are.  Interested?

Participation in the study is free and usage of the application will not expire; you get to keep it for as long as you want.  Everything you need to get started is downloadable: the installer, a written tutorial and video demonstrations.  The application also houses an on-device tutorial, so if you’re the type who likes to figure things out as you go along, then this may provide a quick and easy route through basic training.  You should be aware that this is a research study and that large amounts of data will be logged and transferred from your device.  Therefore, if your cellular data plan is not unlimited then you may not wish to participate.  For additional information regarding the logging and collection of data please review the study’s consent and release forms.

So how does it work?  The imaging application implements many features, all of which are detailed in the written tutorial.  In an effort to keep this post as simple as possible I’ll only review the tagging process.  The figure below depicts the First Use window (left) and the Tag Bank window (right).  Upon first use of the application, the First Use window will be displayed and you’ll be able to link with your Flickr account.  At this time the most frequently used tags from your Flickr account will be downloaded to your phone.  The First Use window also allows you to specify up to five photographic interests, each of which is compared against Flickr’s immense tag database for similar listings, which are then downloaded to your phone.  This series of downloads initially populates the tag bank.

Only the tags displayed by the Tag Bank window are valid voice commands.  Tags may be inserted into or removed from the tag bank as necessary, as shown in the rightmost portion of the figure where we are inserting the tag “john”.  The more you use the application the more voice becomes a viable interaction method.  The performance of speech recognition depends on your knowing which tags exist in the tag bank, so please review its contents after linking with your Flickr account.
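
For the curious, the tag bank idea can be pictured with a small Python sketch.  This is only an illustration and not the application’s actual code (which runs on Windows Mobile): the `TagBank` class, its method names and the lowercase normalization are all assumptions made for the example, and the sample tags are made up.

```python
class TagBank:
    """Hypothetical model of the tag bank: the only tags accepted as voice commands."""

    def __init__(self):
        self._tags = set()

    def populate(self, frequent_tags, interest_tags):
        # Initial population: frequently used tags plus tags related to
        # the interests given at first use (both lists are supplied by the caller).
        for tag in list(frequent_tags) + list(interest_tags):
            self.insert(tag)

    def insert(self, tag):
        # Assumption: tags are compared case-insensitively.
        self._tags.add(tag.strip().lower())

    def remove(self, tag):
        # discard() does nothing if the tag is not present.
        self._tags.discard(tag.strip().lower())

    def is_valid_command(self, spoken):
        # A spoken word counts as a command only if it is in the bank.
        return spoken.strip().lower() in self._tags

    def tags(self):
        return sorted(self._tags)


bank = TagBank()
bank.populate(["beach", "family"], ["sunset"])  # made-up sample tags
bank.insert("john")
print(bank.tags())
print(bank.is_valid_command("John"))
print(bank.is_valid_command("mary"))
```

The point of the sketch is the restriction itself: a word that is not in the bank is simply not a command, which is why it pays to know what the bank contains.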

Please do not hesitate to contact me with any questions you may have.  Thank you for your participation – and don’t forget to speak up!

Michael A. Farrar
mafarrar@unh.edu
