Reality is Difficult

The implementation of machine learning in real-world products calls for knowledge and skills far beyond standard machine learning theory. The Alexandra Institute has filed a research application to explore this field, which we believe will receive considerable attention in the next few years.

Machine learning theory is complex in itself, but just wait until you have to implement it in a real-world product! As an excellent article by Aria Haghighi points out, creating well-functioning products based on machine learning almost invariably involves application-specific problems beyond the standard techniques, and solving such problems calls for an understanding of the application domain just as much as of machine learning theory.

And it’s not just about developing the classifier or recognizer. Once sufficient recognition accuracy has been achieved, there are often several other challenges to attend to: the algorithm must be optimized for the specific platform, the computation must be distributed over several devices, sensitive data must be secured, the user interface must cope with possibly inaccurate output, the system must improve by itself from user feedback, and so on.

As an example of what I’m talking about here, take a look at Google Translate. While the translation is impressive in itself, they didn’t stop there: if the user isn’t happy with the translation, he or she can choose alternative translations of individual elements and move words around. The machine and the user collaborate on finding the best translation through a clever user interface. On top of that, the input from the user is fed back to Google’s database, so that it can be used to improve future translations.

It’s all about making the application useful for the actual usage scenario, and this often requires more than a good recognizer. At the Alexandra Institute, we believe that problems of this kind are an emerging area of research, simply because recognizers based on machine learning are only now becoming accurate enough to be used in many different real products. We have therefore filed a research proposal called Data Mining and Machine Learning in Practice, and it’s currently under open evaluation at the website Bedre Innovation (Danish for Better Innovation). If you understand Danish, you are very welcome to take a look at our proposal and leave a comment directly on the website - your feedback is very useful for us in the application process.

Can a Smartphone Recognize Birds?

The Alexandra Institute and DTU conducted a study on automatic recognition of bird sounds on smartphones. The conclusion was that no one has yet demonstrated the recognition accuracy that one would expect from a bird recognizer app.

Would it be possible to develop a smartphone app that recognizes birds from their sounds, so that you could bring your smartphone to the forest and have it tell you what birds you’re hearing? That’s the question I was asked to sort out a couple of years ago, when the small Danish company pr-development hired us for an initial feasibility study. Since I was rather new to machine learning at that time, I contacted Jan Larsen and Lasse Lohilahti Mølgaard at DTU (Technical University of Denmark) to help me out - they have extensive experience with machine learning on sound.

We started out with experiments on sound data from six different bird species, all recorded with a smartphone. With some help from Lasse at DTU, I extracted 19 features for each 100 ms chunk of sound data - features like frequency, energy, some tonal components, etc. Below I have taken two of these 19 features and plotted them in a two-dimensional chart, just to show how the different species dominate different regions of the feature chart - somewhat similar to how socialist and liberal voters dominate different regions on a map of Denmark. Each cross corresponds to a 100 ms chunk, and the color indicates the species.


Feature plot for six bird species: Blue - Grasshopper Warbler; Green - Garden Warbler; Red - Blackcap; Cyan - Common Redstart; Violet - Common Blackbird; Black - Eurasian Pygmy Owl.
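
To make the per-chunk feature extraction a bit more concrete, here is a minimal Python sketch. It is not the code used in the study: it computes only two illustrative features per 100 ms chunk (mean energy, and zero-crossing rate as a crude stand-in for frequency), and it runs on synthetic sine tones instead of bird recordings. The function name `chunk_features`, the 8 kHz sample rate and the test tones are all assumptions made for the example.

```python
import math

def chunk_features(samples, sample_rate, chunk_ms=100):
    """Split a mono signal into fixed-length chunks and return one
    (energy, zero_crossing_rate) tuple per full chunk."""
    chunk_len = int(sample_rate * chunk_ms / 1000)
    features = []
    # Step through the signal one chunk at a time; a trailing partial chunk is dropped.
    for start in range(0, len(samples) - chunk_len + 1, chunk_len):
        chunk = samples[start:start + chunk_len]
        # Mean squared amplitude of the chunk.
        energy = sum(x * x for x in chunk) / chunk_len
        # Fraction of neighbouring sample pairs whose sign differs.
        crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0))
        zcr = crossings / (chunk_len - 1)
        features.append((energy, zcr))
    return features

# Synthetic stand-ins for recordings (assumption): one second each of a
# loud 500 Hz tone and a quieter 2000 Hz tone, sampled at 8 kHz.
sample_rate = 8000
phase = 0.3  # small offset so that no sample falls exactly on zero
low = [math.sin(2 * math.pi * 500 * n / sample_rate + phase)
       for n in range(sample_rate)]
high = [0.5 * math.sin(2 * math.pi * 2000 * n / sample_rate + phase)
        for n in range(sample_rate)]

low_feats = chunk_features(low, sample_rate)
high_feats = chunk_features(high, sample_rate)

for (e_lo, z_lo), (e_hi, z_hi) in zip(low_feats, high_feats):
    print(f"low: energy={e_lo:.3f} zcr={z_lo:.3f} | high: energy={e_hi:.3f} zcr={z_hi:.3f}")
```

Each tuple corresponds to one cross in a two-feature chart like the one above: the two tones end up in different regions because they differ in both energy and zero-crossing rate.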

Even though this chart only shows two of the 19 features (it’s difficult to produce a 19-dimensional chart), one can see that some bird species, notably Grasshopper Warbler (blue) and Eurasian Pygmy Owl (black), are clearly distinguishable, while others are more mixed up. Lasse then tried a number of different classifier models and ended up with an accuracy of 73% with the best model, which in this case turned out to be multinomial logistic regression.
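
For readers unfamiliar with multinomial logistic regression, the following Python sketch shows the idea on a toy problem. It is an illustration only, not the model or data from the study: the three Gaussian clusters, the two features, the learning rate and the number of epochs are all assumptions, and the training loop is plain batch gradient descent written out by hand.

```python
import math
import random

def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(W, x):
    """Linear score per class; the last weight in each row is the bias."""
    xb = list(x) + [1.0]
    return [sum(w * v for w, v in zip(row, xb)) for row in W]

def train(X, y, n_classes, lr=0.2, epochs=500):
    """Fit multinomial logistic regression by batch gradient descent
    on the average cross-entropy loss."""
    n_weights = len(X[0]) + 1
    W = [[0.0] * n_weights for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_weights for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            probs = softmax(class_scores(W, x))
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. score k: p_k - 1[k == label].
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_weights):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

def predict(W, x):
    """Return the index of the class with the highest score."""
    scores = class_scores(W, x)
    return max(range(len(scores)), key=lambda k: scores[k])

def accuracy(W, X, y):
    """Fraction of samples whose predicted class equals the true label."""
    return sum(predict(W, x) == label for x, label in zip(X, y)) / len(y)

# Toy data (assumption): three "species", each a Gaussian cluster in a
# two-dimensional feature space.
rng = random.Random(0)
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]

def sample(n_per_class):
    X, y = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            y.append(label)
    return X, y

X_train, y_train = sample(40)
X_test, y_test = sample(20)
W = train(X_train, y_train, n_classes=3)
test_accuracy = accuracy(W, X_test, y_test)
print(f"accuracy on held-out chunks: {test_accuracy:.2f}")
```

The toy clusters are far better separated than real bird sounds, so the accuracy printed here says nothing about the 73% figure. The point is only the mechanics: one weight vector per class, a softmax over the scores, and accuracy measured on chunks that were not used for training.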

But 73% on a catalog of six birds is obviously not good enough for the envisioned bird recognition app, so I went on to study some research articles to see if others had achieved better results. Indeed, Jancovic and Köküer [1] achieved 91.5% accuracy on 95 birds, and Chou et al. [2] achieved 78% on 420 birds. However, without going into details, none of the articles studied recognition under the conditions that would apply to a smartphone app, so these results don’t carry over directly to our case.

So the only thing we could conclude for sure is that if it’s at all possible to make a well-functioning bird recognizer app, then no one has yet proven it to be feasible. Indeed, if you look at the FAQ section on the home page of iBird, one of the most popular apps for bird identification, they also seem to have investigated the matter and concluded that it’s too difficult - at least for now. Thus, the question in the title remains open, and unfortunately we didn’t have the resources to conduct a full-scale study to settle it. If anyone can contribute some information or ideas, don’t hesitate to leave a comment!

[1] Peter Jancovic, Münevver Köküer: Automatic Detection and Recognition of Tonal Bird Sounds in Noisy Environments. EURASIP Journal on Advances in Signal Processing, 2011.
[2] Chih-Hsun Chou, Chang-Hsing Lee, Hui-Wen Ni: Bird Species Recognition by Comparing the HMMs of the Syllables. Second International Conference on Innovative Computing, Information and Control, 2007.
