Train Your Computer to Listen

Try out a Java applet that classifies sound based on your own recordings and shows the results graphically.

A while ago I gave an internal presentation about machine learning here at the Alexandra Institute. To illustrate the basic principles, I made a simple Java application that learns to recognize different sounds from the microphone and plots the training data points of each class. I have since refined it a bit and thought I'd share it here on the blog. Just click the image below to try it out!

Sound Classifier Demo

The main idea of the application is to show that if the features (in this case zero crossing rate, flux, centroid and so on) are chosen wisely, the training data points of different classes end up in different regions of the feature graph. When a new sample is taken for classification (shown as a big blue dot in the screenshot above), it is basically just a matter of finding out which of these regions the new data point falls in.
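To make the idea concrete, here is a minimal sketch of how two of the features mentioned above could be computed for a short chunk of samples, and how a new point could be assigned to the class of its closest training point. This is not the code of the demo itself: the class and method names are made up for illustration, the nearest-neighbour rule is an assumption (the demo may decide on regions differently), and the centroid is computed with a slow, naive DFT and scaled to the range 0 to 1 so that both features are comparable. Flux is left out because it needs two consecutive chunks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: two features per sound chunk plus a nearest-neighbour lookup. */
final class SoundClassifierSketch {

    /** One labelled training point: {zero crossing rate, normalized spectral centroid}. */
    static final class Sample {
        final String label;
        final double[] features;

        Sample(String label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Fraction of adjacent sample pairs whose signs differ (0 for chunks shorter than 2). */
    static double zeroCrossingRate(double[] chunk) {
        if (chunk.length < 2) return 0;
        int crossings = 0;
        for (int i = 1; i < chunk.length; i++) {
            if ((chunk[i - 1] >= 0) != (chunk[i] >= 0)) crossings++;
        }
        return (double) crossings / (chunk.length - 1);
    }

    /** Magnitude-weighted mean frequency bin, divided by the Nyquist bin (range 0..1). */
    static double spectralCentroid(double[] chunk) {
        int n = chunk.length;
        int half = n / 2;
        double weighted = 0, total = 0;
        for (int k = 0; k <= half; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {          // naive DFT, fine for short chunks
                double angle = 2 * Math.PI * k * t / n;
                re += chunk[t] * Math.cos(angle);
                im -= chunk[t] * Math.sin(angle);
            }
            double magnitude = Math.hypot(re, im);
            weighted += k * magnitude;
            total += magnitude;
        }
        return (total == 0 || half == 0) ? 0 : weighted / total / half;
    }

    static double[] features(double[] chunk) {
        return new double[] { zeroCrossingRate(chunk), spectralCentroid(chunk) };
    }

    /** Assumed decision rule: label of the training point with the smallest Euclidean distance. */
    static String classify(List<Sample> training, double[] query) {
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Sample s : training) {
            double dist = 0;
            for (int i = 0; i < query.length; i++) {
                double diff = s.features[i] - query[i];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = s.label;
            }
        }
        return best;
    }

    /** Synthetic test tone with a whole number of cycles per chunk (stands in for microphone input). */
    static double[] sine(int cyclesPerChunk, int length) {
        double[] out = new double[length];
        for (int t = 0; t < length; t++) {
            out[t] = Math.sin(2 * Math.PI * cyclesPerChunk * t / length);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Sample> training = new ArrayList<>();
        training.add(new Sample("low", features(sine(2, 64))));
        training.add(new Sample("high", features(sine(20, 64))));
        System.out.println(classify(training, features(sine(3, 64))));
    }
}
```

In this toy setup the two training tones land far apart in the feature plane, so a tone that is close in pitch to the "low" example falls in the "low" region. Real recordings are noisier, which is why many training points per class are needed.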

This application is rather generic and can be used in many different ways. It can easily recognize typical human phonemes or distinguish between different kinds of noise, like footsteps, whistling, coughing and so on. I also used it successfully on a train to recognize whether the train was standing at a station, going slowly, going fast or passing through a tunnel, based only on the ambient noise. To some degree it can also recognize different human speakers, but since it only works on individual short chunks of sound, it fails to recognize sequential patterns, which turn out to be an important element of the way we speak. I will get back to the subject of sequential patterns in a future post.

If you want to play with it a bit more, try training it once with little variation and once with a lot of variation in the input for each class. For example, use a consistently monotonous voice for the first session, then reload the page and, for the second session, record the same classes with varying pitch and some extra noise, and compare the results. You should see that when the variation in the training data is higher, the reported accuracy becomes lower, but in reality the classifier is more robust to variation and noise. This highlights the need to collect training data that varies and contains noise in the same way as the real expected input.
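One way to see why the reported accuracy drops is to look at how such a number could be estimated from the training data alone. The sketch below uses leave-one-out testing with a nearest-neighbour rule: each training point is classified using all the other points, and the fraction of correct answers is the accuracy. This is an assumption for illustration, not necessarily how the demo computes its figure, and the feature values are invented toy numbers, not measurements.

```java
/** Hypothetical sketch: leave-one-out accuracy on invented two-dimensional feature points. */
final class LeaveOneOutSketch {

    static final String[] LABELS = { "a", "a", "a", "b", "b", "b" };

    // Invented data: each class recorded with very little variation.
    static final double[][] TIGHT = {
        { 0.10, 0.10 }, { 0.12, 0.11 }, { 0.11, 0.09 },
        { 0.60, 0.60 }, { 0.62, 0.61 }, { 0.61, 0.59 }
    };

    // Invented data: the same classes recorded with more variation, so the clusters overlap.
    static final double[][] VARIED = {
        { 0.10, 0.10 }, { 0.30, 0.30 }, { 0.45, 0.42 },
        { 0.60, 0.60 }, { 0.40, 0.45 }, { 0.75, 0.70 }
    };

    /** Index of the point closest to points[query], ignoring the query point itself. */
    static int nearestOther(double[][] points, int query) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < points.length; i++) {
            if (i == query) continue;
            double dx = points[i][0] - points[query][0];
            double dy = points[i][1] - points[query][1];
            double dist = dx * dx + dy * dy;
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    /** Fraction of points whose nearest other point carries the same label. */
    static double accuracy(double[][] points, String[] labels) {
        int correct = 0;
        for (int i = 0; i < points.length; i++) {
            int j = nearestOther(points, i);
            if (j >= 0 && labels[j].equals(labels[i])) correct++;
        }
        return (double) correct / points.length;
    }

    public static void main(String[] args) {
        System.out.println("tight:  " + accuracy(TIGHT, LABELS));
        System.out.println("varied: " + accuracy(VARIED, LABELS));
    }
}
```

With the tight clusters every point's nearest neighbour belongs to its own class, so the estimate is perfect. With the varied data the classes overlap and some points are closer to the other class, so the estimate is lower, even though that training set covers much more of what real input might look like.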
