Can a Smartphone Recognize Birds?

The Alexandra Institute and DTU conducted a study on automatic recognition of bird sounds on smartphones. The conclusion was that no one has yet demonstrated the recognition accuracy that one would expect from a bird recognizer app.

Would it be possible to develop a smartphone app that recognizes birds from their sounds, so that you could bring your smartphone to the forest and have it tell you what birds you’re hearing? That’s the question I was asked to sort out a couple of years ago, when the small Danish company pr-development hired us for an initial feasibility study. However, since I was rather new to machine learning at that time, I contacted Jan Larsen and Lasse Lohilahti Mølgaard at DTU (Technical University of Denmark) to help me out - they have extensive experience with machine learning on sound.

We started out with experiments on sound data from six different bird species, all recorded with a smartphone. With some help from Lasse at DTU, I extracted 19 features for each 100 ms chunk of sound data - features like frequency, energy, some tonal components, etc. Below I have taken two of these 19 features and plotted them in a two-dimensional chart, just to show how the different species dominate different regions of the feature chart - somewhat similar to how socialist and liberal voters dominate different regions on a map of Denmark. Each cross corresponds to a 100 ms chunk, and the color indicates the species.


Feature plot for six bird species: Blue - Grasshopper Warbler; Green - Garden Warbler; Red - Blackcap; Cyan - Common Redstart; Violet - Common Blackbird; Black - Eurasian Pygmy Owl.

Even though this chart only shows two out of the 19 features (it’s difficult to produce a 19-dimensional chart), one can see that some bird species, notably Grasshopper Warbler (blue) and Eurasian Pygmy Owl (black), are clearly distinguishable, while others are more mixed up. Lasse now tried a number of different classifier models and ended up with an accuracy of 73% with the best model, which in this case turned out to be multinomial logistic regression.
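To make the idea of per-chunk features a bit more concrete, here is a minimal sketch in Python. It is not the code from the study: it computes only two simple stand-in features (mean energy and a frequency estimate based on zero crossings) for each 100 ms chunk. The function name `chunk_features` and the synthetic 1 kHz test tone are assumptions made purely for illustration.

```python
import math


def chunk_features(samples, sample_rate, chunk_ms=100):
    """Return one (energy, frequency estimate in Hz) pair per chunk."""
    chunk_len = int(sample_rate * chunk_ms / 1000)
    features = []
    for start in range(0, len(samples) - chunk_len + 1, chunk_len):
        chunk = samples[start:start + chunk_len]
        # Mean squared amplitude of the chunk.
        energy = sum(x * x for x in chunk) / chunk_len
        # Count sign changes between neighbouring samples.
        crossings = sum(
            1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0)
        )
        # A pure tone crosses zero twice per period.
        freq_hz = crossings * sample_rate / (2 * chunk_len)
        features.append((energy, freq_hz))
    return features


# Hypothetical input: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [
    math.sin(2 * math.pi * 1000 * n / sample_rate + 0.3)
    for n in range(sample_rate)
]
features = chunk_features(tone, sample_rate)
```

Each entry in `features` is one point in a two-dimensional feature chart like the one above. Real bird sounds are far from pure tones, so a practical feature set would need to be considerably richer than this.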

But 73% on a catalog of six birds is obviously not good enough for the envisioned bird recognition app, so I went on to study some research articles to see if others had achieved better results. Indeed, Jancovic and Köküer [1] achieved 91.5% accuracy on 95 birds, and Chou et al. [2] achieved 78% on 420 birds. However, without going into details, none of the articles studied recognition under the conditions that would apply to a smartphone app, so these results can’t really be relied on for our case.

So the only thing we could conclude for sure is that if it’s at all possible to make a well-functioning bird recognizer app, then no one has yet proven it to be feasible. Indeed, if you look at the FAQ section on the home page of iBird, one of the most popular apps for bird identification, they also seem to have investigated the matter and concluded that it’s too difficult - at least for now. Thus, the question in the title remains open, and unfortunately we didn’t have the resources to conduct a full-scale study to settle it. If anyone can contribute some information or ideas, don’t hesitate to leave a comment!

[1] Peter Jancovic, Münevver Köküer: Automatic Detection and Recognition of Tonal Bird Sounds in Noisy Environments. EURASIP Journal on Advances in Signal Processing, 2011.
[2] Chih-Hsun Chou, Chang-Hsing Lee, Hui-Wen Ni: Bird Species Recognition by Comparing the HMMs of the Syllables. Second International Conference on Innovative Computing, Information and Control, 2007.

The Danish Voter Classifier

A graphic of Danish election results produced by an unknowing statistician can be used to explain the fundamental concepts of machine learning.

For those new to the concepts of machine learning, it’s really instructive to study the figure below, which shows Danish voters’ tendency to vote for either the socialist (red) or the liberal-conservative (blue) political bloc in the last parliamentary election. In fact, it constitutes what’s known as a classifier: Given the home address of a voter, it predicts which bloc he or she will vote for.

Danish voters’ tendency to vote for political blocs depending on home address. Based on a graphic from Berlingske Tidende.

That’s what a classifier does: Given some information about something, the classifier predicts which class that something belongs to. And just like all classifiers based on machine learning, this one is produced from the statistical exploration of some training data - in this case the 2011 election data. I don’t know for sure, but presumably a statistician has applied some mathematical method to the election data to calculate where the straight line separating the red and the blue areas should be. It’s this process that’s called learning in machine learning. Notice that learning must take place before the classifier can be used, and that learning requires a set of training data to learn from. This is the case for all machine learning applications.
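For readers who like to see things in code, here is a minimal sketch of what learning a straight-line separator can look like. It does not use the election data and is certainly not the statistician’s method: I have assumed two synthetic clusters of "addresses" (hypothetical coordinates) and swapped in plain logistic regression trained by gradient steps. The names `train` and `predict` are made up for this illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(points, labels, epochs=200, rate=0.1):
    """Learn (w1, w2, b) for the separating line w1*x + w2*y + b = 0."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x, y), label in zip(points, labels):
            error = label - sigmoid(w1 * x + w2 * y + b)
            g1 += error * x
            g2 += error * y
            gb += error
        # Move the line a small step in the direction that reduces the error.
        w1 += rate * g1 / n
        w2 += rate * g2 / n
        b += rate * gb / n
    return w1, w2, b


def predict(model, point):
    """Return 1 (blue) or 0 (red) depending on the side of the line."""
    w1, w2, b = model
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else 0


# Synthetic training data: class 0 (red) and class 1 (blue) clusters.
rng = random.Random(0)
red = [(rng.gauss(-2, 1), rng.gauss(-2, 1)) for _ in range(100)]
blue = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(100)]
points = red + blue
labels = [0] * 100 + [1] * 100

model = train(points, labels)  # learning happens here, before any use
accuracy = sum(
    predict(model, p) == l for p, l in zip(points, labels)
) / len(points)
```

Once `train` has run, a call such as `predict(model, (3, 3))` classifies a new point without looking at the training data again. That is the split between learning and use described above.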

The second point that one can make from this figure concerns the difficulties in achieving good accuracy. Frankly, that figure is a lousy classifier, because in reality there are many socialist voters in the blue region and many liberal/conservative voters in the red region, so the classification accuracy is not very high. The problems here essentially boil down to:

  1. The home address of a voter alone doesn’t really say that much about his or her political standpoints. One needs to know more: Income, education, age, gender, IQ, favourite pet... any variables that might correlate with political views should be taken into account. In general, selecting these pieces of information about the object to be classified (they’re called features in machine learning jargon) is often both crucial and difficult and requires considerable domain knowledge to achieve good results.
  2. A single straight line is too simple a separator. It’s surely possible to define areas of Denmark where one or the other political bloc is dominant, but their borders would be curved, and it would be necessary to use several distinct lines to represent them. Fortunately (maybe), the machine learning literature is packed with different models for classification: Linear discriminants (that’s the single straight line), logistic regression models, neural networks, support vector machines, etc., and each of these models allows for different possibilities to fit separators to the data.
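The two points above can be illustrated with a small sketch. It rests on assumptions of my own: synthetic points where one class lives inside a circle, and a nearest-centroid rule, which is a very simple linear discriminant that I have swapped in for brevity. On the raw coordinates the rule can only draw a straight line. Adding one extra feature, the squared distance from the origin, lets the same rule draw a curved border in the original coordinates. All names (`fit`, `predict`, `accuracy`) are hypothetical.

```python
import random


def centroid(rows):
    # Mean of each column.
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit(rows, labels):
    """Nearest-centroid 'learning': store one mean vector per class."""
    return {
        c: centroid([r for r, l in zip(rows, labels) if l == c])
        for c in set(labels)
    }


def predict(model, row):
    """Return the class whose centroid is closest to the row."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(model, key=lambda c: dist2(model[c]))


def accuracy(model, rows, labels):
    hits = sum(predict(model, r) == l for r, l in zip(rows, labels))
    return hits / len(rows)


# Synthetic data: class 1 inside a circle, class 0 outside it.
rng = random.Random(1)
points = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(400)]
labels = [1 if x * x + y * y < 2.5 else 0 for x, y in points]

# Raw features give a straight-line border.
raw = [(x, y) for x, y in points]
# One extra feature gives a curved border in the (x, y) plane.
expanded = [(x, y, x * x + y * y) for x, y in points]

acc_line = accuracy(fit(raw, labels), raw, labels)
acc_curve = accuracy(fit(expanded, labels), expanded, labels)
```

On data like this, `acc_line` should end up close to chance level, while `acc_curve` should be much higher. The learning rule is identical in both cases, which hints at how closely the choice of features and the choice of model are tied together.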

Of course, I’m not blaming the unknowing statistician for bad machine learning craftsmanship, because the figure wasn’t made for that purpose. However, for real machine learning applications, proper feature selection and model selection are crucial for achieving high accuracy, and together with the data collection, they are the most important development activities. We’ll surely get back to these concepts repeatedly in the postings to come.
