Machine Learning for Safer Landings

The Alexandra Institute is involved in developing an optical sensor for measuring friction on airport runways in winter conditions. Machine learning is used to relate the optical features to the runway state.

If you feel nervous when driving a car on a slippery road, then think about what it would be like to maneuver a 300-tonne airplane with 300 passengers at 200 kph on a slippery runway. Thankfully, airports operating in winter conditions put considerable effort into keeping runways clear of ice, snow, water and other contaminants that could reduce the grip. An important part of this effort is to regularly measure the friction and register unwanted contaminants, both for reporting to the pilots and for planning the cleaning work.

To make life easier for the field staff, we’re currently trying to develop an optical sensor that can replace the existing mechanical one, which is expensive and cumbersome to use. The sensor can be mounted on a car, and it has lasers of different wavelengths directed towards the runway and optical instruments to measure five different features of the reflected light. The task now is to find the relationship between these features and the state of the runway. And here is where machine learning comes into play.

PCA plot of optical measurements for five runway contamination classes.

The figure above shows some data collected from a runway in summertime with contaminants like water, rubber and white paint (winter is coming to Denmark just now and I’m expecting to receive some real winter data soon). Each cross corresponds to a measurement from the sensor, projected from five features to two using a method called principal component analysis (PCA), so that it can be plotted in a two-dimensional chart. One can clearly see that the wet measurements end up in a different region than the dry ones, and indeed a classifier can distinguish between these two classes with 99.5% accuracy. However, when distinguishing between all five classes, the accuracy drops to 88.1%. The problem can also be seen in the graph: Dry rubber is mixed up with both Paint and Dry. I will get back to what we can do about this problem to increase the accuracy in a later post.
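To give a feeling for what the projection step does, here is a minimal pure-Python sketch of PCA based on power iteration. It is not the code used in the project: the measurement values, the "dry"/"wet" grouping and all function names are made up for illustration.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dominant_direction(matrix, iterations=500, seed=0):
    """Power iteration: approximate the direction of largest variance."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return v, eigenvalue


def pca_project(rows, n_components=2):
    """Project each row onto the first n_components principal directions."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for k in range(n_components):
        v, eigenvalue = dominant_direction(cov, seed=k)
        components.append(v)
        # Deflation: remove the variance explained by the direction just found.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return [[sum(a * b for a, b in zip(row, c)) for c in components]
            for row in centered]


# Made-up five-feature measurements, three "dry" rows followed by three "wet" rows.
dry = [[1.0, 1.1, 0.9, 1.0, 1.2], [1.1, 0.9, 1.0, 1.2, 1.0], [0.9, 1.0, 1.1, 1.0, 0.9]]
wet = [[5.0, 5.2, 4.9, 5.1, 5.0], [5.1, 4.9, 5.0, 5.0, 5.2], [4.9, 5.0, 5.1, 4.8, 5.0]]
projected = pca_project(dry + wet)
for point in projected:
    print([round(x, 2) for x in point])
```

In this toy data the two groups land on opposite sides of the first axis, which is the same effect that separates the wet and dry measurements in the chart.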

We also tried to relate the optical features to the friction number, which was measured simultaneously with a mechanical device. Now we’re not doing classification any more, but regression, where the objective is to predict a number instead of a class. My colleague Christian trained a small neural network with our optical and friction data, and the result can be seen below.

Estimated friction from neural network and actual measurement from a mechanical device along two runways, joined at the middle of the chart.

As you can see, the estimate (green) follows the actual measurement (blue) quite well. We even have reason to believe that wherever the difference is large, for example at 3450 m, it might be the mechanical device that’s wrong and our neural network that’s right: After a wet spot, which gives low friction, the mechanics will stay wet for a little while and report too low a friction number.

Train Your Computer to Listen

Try out a Java applet that classifies sound based on your own recordings and shows the results graphically.

A while ago I held an internal presentation about machine learning here at the Alexandra Institute, and to illustrate the basic principles, I made a simple Java application that was able to learn to recognize different sounds from the microphone and show the training data points of the different classes graphically. Now I have refined it a bit and I thought I’d share it here on the blog, so, here it is - just click the image below to try it out!

Sound Classifier Demo

The main idea with the application is to show that if the features (zero crossing rate, flux, centroid etc. in this case) are wisely chosen, it will be possible to see how the training data points of different classes end up in different regions of the feature graph. Then, when a new sample is taken for classification (shown as a big blue dot in the screenshot above), it’s basically just a matter of finding out in which of these regions this new data point is.
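The idea of regions in a feature graph can be sketched in a few lines of Python. This is not the code behind the applet: the nearest-centroid rule, the synthetic sine tones and the use of RMS energy as a second feature are assumptions chosen to keep the example short, and all names are hypothetical.

```python
import math


def zero_crossing_rate(samples):
    """Fraction of neighbouring sample pairs where the sign changes."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def rms_energy(samples):
    """Root mean square amplitude of the chunk."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def extract_features(samples):
    return (zero_crossing_rate(samples), rms_energy(samples))


def train_centroids(examples):
    """examples maps a class label to a list of feature tuples."""
    centroids = {}
    for label, points in examples.items():
        dims = len(points[0])
        centroids[label] = tuple(
            sum(p[d] for p in points) / len(points) for d in range(dims))
    return centroids


def classify(point, centroids):
    """Pick the class whose centroid is closest in the feature graph."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


def tone(freq_hz, rate=8000, n=800):
    """Synthetic stand-in for a microphone recording."""
    return [math.sin(2 * math.pi * freq_hz * i / rate + 0.1) for i in range(n)]


examples = {
    "hum": [extract_features(tone(f)) for f in (100, 120)],
    "whistle": [extract_features(tone(f)) for f in (1900, 2100)],
}
centroids = train_centroids(examples)
print(classify(extract_features(tone(150)), centroids))
print(classify(extract_features(tone(1800)), centroids))
```

A real recording would of course be noisier than a sine tone, which is why the choice of features matters so much.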

This application is rather generic and can be used in many different ways. It can easily recognize typical human phonemes or distinguish between different kinds of noise, like footsteps, whistling, coughing and so on. I also successfully used it on a train to recognize whether the train was standing at a station, going slowly, going fast or was in a tunnel - only based on the ambient noise. To some degree it can also recognize different human speakers, but since it only works on individual short chunks of sound, it fails to recognize sequential patterns, which turn out to be an important element of the way we speak. I will get back to the subject of sequential patterns in a future post.

If you want to play with it a bit more, try training it with both little and much variation in the input for each class. For example, try using a monotonous voice consistently for the first session, then reload the page and record the same classes with varying pitch and add some extra noise for the second session and compare the results. You should see that when the training data variation is higher, the reported accuracy becomes lower, but in reality the classifier is more robust to variation and noise. This highlights the necessity to collect training data that varies and contains noise like the real expected input.

Heritage Health Prize: $3 million

If you are really good at data mining, there is big money to be made.

As you may know, Kaggle is a company that provides match-making between data scientists and companies in serious need of data mining. Kaggle does this by hosting data mining competitions for the companies and letting the rest of us join these competitions.

Since its inception Kaggle has grown enormously, and last year marked a milestone for the company when, together with Heritage Provider Network, they launched the Heritage Health Prize.

The Heritage Health Prize competition is a single competition running until April 2013 where the first prize for the best prediction is $3 million. The competition has attracted a lot of interest. Currently there are more than 1200 teams in the competition.

The Heritage Provider Network has provided more than 70000 rows of data of real people and their history with the health-care system (in the USA) over a two-year period (giving more than 140000 rows), and the variable that we want to predict is the following: the total number of days spent in hospital by the patients.

The reason for the large prize is that the ability to predict this number also gives the hospitals the ability to optimize the use of their capacity, and this is one of the holy grails of cost reduction in the health-care domain.

The competition has several milestones, where the best prediction so far receives a small cash prize. If this has sparked your interest, then there is an excellent blog post from the winners of last year's milestone: http://anotherdataminingblog.blogspot.dk/2011/10/code-for-respectable-hhp-model.html

You can use the blog post as a starting point to get in the game for the prize money, or as a good read on how to do data mining on complex datasets.


To SSAS or Not to SSAS

This is a case where a data mining tool seemed like the best approach for data mining but was bested by standard algorithms and careful analysis of the data.

A local computer hardware retailer approached the Alexandra Institute with an interesting problem: they receive a lot of data files from their suppliers, but the data is not entirely consistent, nor is it necessarily correct (from a computer's narrow perspective). A supplier data file has, amongst other things, the supplier's best guess at who manufactured the particular piece of hardware, but this is not always correct. So the retailer wanted to automate this procedure and thought that data mining using Microsoft's SQL Server Analysis Services (SSAS) could be the answer.

I went into this project with a feeling that SSAS might not do the trick. And I was right. No matter how I tweaked the parameters and the data the hit rate would not go higher than 87%. This is still good, but the best hit rate was with a decision tree. By looking at that tree I found that it basically boiled down to one big if-statement which only looked at one particular input value:

if inputvalue = … then result = a
else if inputvalue = … then result = b
else if …

...you get the picture.

After analysing the data I found that in most cases the supplier's guess at a manufacturer was correct (in about 87% of the cases, what a coincidence) although it didn't quite match the list of "legal" manufacturer names that the retailer had ("Hewlett & Packard" versus "Hewlett-Packard" for example).
The data from the suppliers also contains a short description of the particular piece of hardware, most often in English, but German descriptions were also present. It turned out that by using algorithms like Levenshtein Distance, Longest Common Substring and Longest Common Subsequence on both the supplier's guess at a manufacturer and the description, I could find matches in the list of legal manufacturer names with a hit rate > 96%. The last few percent are down to noisy data, since the (manually maintained) list of legal manufacturers contains entries like "Hewlett Packard", "HP" and "Hewlett Packard Toner".
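To illustrate the string-matching idea, here is a minimal Python sketch of the Levenshtein distance used to pick the closest legal manufacturer name. It is not the code from the project: the normalization step, the similarity score and the short list of legal names are assumptions made for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(name):
    """Assumed preprocessing: lower-case and drop everything but letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def best_match(guess, legal_names):
    """Return the legal name closest to the guess and a similarity in [0, 1]."""
    target = normalize(guess)

    def similarity(candidate):
        cleaned = normalize(candidate)
        longest = max(len(target), len(cleaned), 1)
        return 1.0 - levenshtein(target, cleaned) / longest

    best = max(legal_names, key=similarity)
    return best, similarity(best)


# Hypothetical excerpt of the retailer's list of legal manufacturer names.
legal_names = ["Hewlett-Packard", "Lenovo", "Microsoft"]
print(best_match("Hewlett & Packard", legal_names))
print(best_match("Microsft Corp", legal_names))
```

The similarity score can double as a cut-off: guesses below some threshold could be handed to a human instead of being matched automatically.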

So Far, So Good

That was the easy part of the problem. The retailer also wanted an automated process that could put each hardware item in a group that is searchable on their website. "Microsoft Office 2007" should be in the group "Applications", "HP 146 GB SAS 15K Universal Hard Drive" should be in the group "Hard drives - internal" et cetera.
No matter how much I tweaked, twisted and mistreated SSAS I could not get a hit rate above 52%. One of the problems with tools like SSAS is that they only operate on discrete and continuous values; a neural network in SSAS, for example, does not look at a discrete string value and see that "236-72AX8G" looks rather similar to "23672AX8G".
So a model was built that allowed the algorithm to determine whether we have seen the particular item before, whether we have seen something similar, and how many times we have seen it before. The algorithm then outputs its guess at a group as well as a score that indicates how reliable the guess is. This will then be used to further refine the model when a user looks at the values and determines whether or not the guess was correct: correct values will be fed into the model.


I built the initial model from a number of different data set sizes and tested each one. On the chart you can see how well it performs (percent hit rate) as a function of the model size (percent of full data set). Note that the hit rate is measured against the full data set, which includes the cases used to build the model - the error rate will be higher for new and "unseen" data, but the hit rate will grow as the model is gradually refined.

The conclusion is: use the right tool for the job. Not all that looks like a duck and quacks like a duck is suitable for the dinner table. While tools like SSAS can be great for some problems, a careful analysis and some old-fashioned code can solve problems unsuited for data mining tools.

Machine Learning Goes Big Brother

New surveillance products use machine learning to automatically detect dangerous or criminal events from video and sound feeds.

I was at the security exhibition IFSEC in Birmingham a few weeks ago as part of a project totally unrelated to machine learning. Or so I thought. It turned out that machine learning has found its way into some rather neat applications in the security business - in particular for automatically detecting dangerous or criminal events from surveillance equipment, thereby reducing the workload for the security guards who continuously monitor a large number of video feeds.

BRS Labs, based in Houston, Texas, has developed a video analytics technology that they call behavioral recognition. This essentially means that the system automatically learns common patterns in the movements of the objects passing the scene, and whenever it detects a movement pattern that is not normal for that kind of object, it issues an alarm. Compared to traditional video analytics, which is based on rules that are set up for each camera, behavioral recognition reduces the number of false alarms and eases the installation.

Sound Intelligence, based in Amersfoort, Netherlands, has developed an audio analytics system that recognizes sounds that indicate criminality or dangerous situations, for example breaking glass, gunshots and aggressive voices. When such a sound is detected, it alerts a security guard, who can then focus his or her attention on the associated video feed. I looked a bit more into the technology, and found out that the system employs some rather interesting signal processing techniques that I hadn’t seen before. In particular, it filters out the background noise and detects the onset of a foreground sound event by continuously monitoring and modelling the background noise. The analysis is based on a graph of the sound known as a cochleogram (see example below), which mimics the information extracted from a sound by the human ear.

Cochleogram for the Dutch word "welkom".
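The general idea of tracking the background noise and flagging the onset of a foreground sound can be sketched as follows. This is only a toy illustration and not Sound Intelligence's algorithm: it works on plain per-frame energies rather than a cochleogram, and the function name, the ratio of 4 and the smoothing factor are assumptions.

```python
def detect_onsets(frame_energies, ratio=4.0, smoothing=0.9):
    """Return the frame indices where a foreground sound event starts.

    The background level is a running average that is only updated while
    no event is active, so a loud event does not raise its own threshold.
    """
    background = frame_energies[0]
    onsets = []
    in_event = False
    for index, energy in enumerate(frame_energies[1:], start=1):
        if energy > ratio * background:
            if not in_event:
                onsets.append(index)  # first loud frame of a new event
            in_event = True
        else:
            in_event = False
            background = smoothing * background + (1.0 - smoothing) * energy
    return onsets


# Made-up frame energies: quiet background with two loud events.
energies = [1.0, 1.1, 0.9, 1.0, 9.0, 8.5, 1.0, 1.1, 12.0, 1.0]
print(detect_onsets(energies))
```

The per-class recognition rules would then only need to look at the frames following each detected onset.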

It was also interesting to notice that the system doesn't seem to employ any of the well-known, generic classification techniques found in the standard literature, such as support vector machines or hidden Markov models. Instead, once all the heavy signal processing is complete, it uses rather simple recognition rules that are specialized for the different sound classes that it’s designed to recognize. For example, to detect an aggressive voice, it merely extracts some sound parameters that are known to be characteristic of aggressive voices and checks whether these parameters are above a certain threshold. I think it displays a pragmatic approach to system design: If the system doesn’t need to be generic and flexible, one can often achieve better results by tailoring the algorithms to the specific needs rather than using generic techniques.

Reality is Difficult

The implementation of machine learning in real-world products calls for knowledge and skills far beyond standard machine learning theory. The Alexandra Institute has filed a research application to explore this field, which we believe will receive considerable attention in the next few years.

Machine learning theory is complex in itself, but just wait until you have to implement it in a real-world product! As this excellent article by Aria Haghighi points out, creating well-functioning products based on machine learning almost invariably involves application-specific problems beyond the standard techniques, and solving such problems calls for an understanding of the application domain just as much as of machine learning theory.

And it’s not just about developing the classifier or recognizer. When a sufficient recognition accuracy has been achieved, there are often several other challenges to attend to: The algorithm must be optimized for the specific platform, the computation must be distributed over several devices, the data is sensitive and must be secured, the user interface must be adapted for possibly inaccurate output, the system must improve by itself from user feedback, and so on.

As an example of what I’m talking about here, take a look at Google Translate. While the translation is impressive in itself, they didn’t stop there: If the user isn’t happy with the translation, he or she has the possibility to choose alternative translations of individual elements and to move words around. The machine and the user are collaborating in finding the best translation through a clever user interface. On top of that, the input from the user is fed back to Google’s database, so that it can be used to improve future translations.

It’s all about making the application useful for the actual usage scenario, and this often requires more than a good recognizer. At the Alexandra Institute, we believe that problems of this kind are an upcoming area of research, simply because it is not until now that the recognizers based on machine learning are becoming so accurate that they are ready to be used in many different real products. We have therefore filed a research proposal called Data Mining and Machine Learning in Practice, and it’s currently under open evaluation at the website Bedre Innovation (Danish for Better Innovation). If you understand Danish, you are very welcome to take a look at our proposal and leave a comment directly on the website - your feedback is very useful for us in the application process.

Can a Smartphone Recognize Birds?

The Alexandra Institute and DTU conducted a study on automatic recognition of bird sounds on smartphones. It was concluded that no one has yet demonstrated the recognition accuracy that one could expect from a bird recognizer app.

Would it be possible to develop a smartphone app that recognizes birds from their sounds, so that you could bring your smartphone to the forest and have it tell you what birds you’re hearing? That’s the question that I was asked to sort out a couple of years ago as the small Danish company pr-development hired us for an initial feasibility study. However, since I was rather new to machine learning at that time, I contacted Jan Larsen and Lasse Lohilahti Mølgaard at DTU (Technical University of Denmark) to help me out - they have extensive experience with machine learning on sound.

We started out with experiments on some sound data from six different bird species, all recorded with a smartphone. With some help from Lasse at DTU, I extracted 19 features for each 100 ms chunk of sound data - those were features like frequency, energy, some tonal components, etc. Below I have taken two out of these 19 features and plotted them in a two-dimensional chart, just to show how the different species dominate different regions of the feature chart - somewhat similar to how socialistic and liberal voters dominate different regions on a map of Denmark. Each cross corresponds to a 100 ms chunk, and the color indicates the species.


Feature plot for six bird species: Blue - Grasshopper Warbler; Green - Garden Warbler; Red - Blackcap; Cyan - Common Redstart; Violet - Common Blackbird; Black - Eurasian Pygmy Owl.

Even though this chart only shows two out of the 19 features (it’s difficult to produce a 19-dimensional chart), one can see that some bird species, notably Grasshopper Warbler (blue) and Eurasian Pygmy Owl (black), are clearly distinguishable, while others are more mixed up. Lasse now tried a number of different classifier models and ended up with an accuracy of 73% with the best model, which in this case turned out to be multinomial logistic regression.
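For readers curious about what multinomial logistic regression looks like in code, here is a minimal pure-Python sketch trained with plain gradient descent. It is not Lasse's model: it uses two made-up features and three hypothetical classes instead of 19 features and six species, and the learning rate and epoch count are arbitrary choices.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, x):
    xb = list(x) + [1.0]  # constant 1.0 acts as the bias input
    return [sum(w * v for w, v in zip(row, xb)) for row in weights]


def train(points, labels, n_classes, epochs=500, rate=0.5):
    """Fit one weight vector per class by stochastic gradient descent."""
    dims = len(points[0]) + 1
    weights = [[0.0] * dims for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(points, labels):
            xb = list(x) + [1.0]
            probs = softmax(class_scores(weights, x))
            for k in range(n_classes):
                error = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dims):
                    weights[k][j] -= rate * error * xb[j]
    return weights


def predict(weights, x):
    scores = class_scores(weights, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up feature pairs for three hypothetical species (classes 0, 1 and 2).
points = [(0.0, 0.1), (0.1, 0.0), (1.0, 0.1), (0.9, 0.0), (0.1, 1.0), (0.0, 0.9)]
labels = [0, 0, 1, 1, 2, 2]
weights = train(points, labels, n_classes=3)
print(predict(weights, (0.05, 0.05)), predict(weights, (0.95, 0.05)), predict(weights, (0.05, 0.95)))
```

Each 100 ms chunk would be classified on its own in this way; how the per-chunk results are combined into one answer per recording is a separate design choice.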

But 73% on a catalog of six birds is obviously not good enough for the envisioned bird recognition app, so I went on to study some research articles to see if others had achieved better results. Indeed, Jancovic and Köküer [1] achieved 91.5% accuracy on 95 birds, and Chou et al. [2] achieved 78% on 420 birds. However, without going into details, none of the articles studied recognition under conditions that would apply for a smartphone app, so these results don’t really carry over to our case.

So the only thing we could conclude for sure is that if it’s at all possible to make a well-functioning bird recognizer app, then no one has yet proven it to be feasible. Indeed, if you look at the FAQ section at the home page of iBird, one of the most popular apps for bird identification, they also seem to have investigated the matter and concluded that it’s too difficult - at least for now. Thus, the question in the title remains open, and unfortunately we didn’t have the resources to conduct a full-scale study to settle it. If anyone can contribute with some information or ideas, don’t hesitate to leave a comment!

[1] Peter Jancovic, Münevver Köküer: Automatic Detection and Recognition of Tonal Bird Sounds in Noisy Environments. EURASIP Journal on Advances in Signal Processing, 2011.
[2] Chih-Hsun Chou, Chang-Hsing Lee, Hui-Wen Ni: Bird Species Recognition by Comparing the HMMs of the Syllables. Second International Conference on Innovative Computing, Information and Control, 2007.

The Danish Voter Classifier

A graphic of Danish election results produced by an unknowing statistician can be used to explain the fundamental concepts of machine learning.

For those new to the concepts of machine learning, it’s really instructive to study the figure below, which shows Danish voters’ tendency to vote for either the socialistic (red) or liberal-conservative (blue) political blocs in the last parliamentary election. In fact, it constitutes what’s known as a classifier: Given the home address of a voter, it predicts what bloc he or she will vote for.

Danish voters’ tendency to vote for political blocs depending on home address. Based on a graphic from Berlingske Tidende.

That’s what a classifier does: Given some information about something, the classifier predicts which class that something belongs to. And just like all classifiers based on machine learning, this classifier is also produced from the statistical exploration of some training data - in this case the 2011 election data. I don’t know for sure, but presumably, a statistician has used some mathematical methods on the election data to calculate where the straight line separating the red and the blue area should be. It’s this process that’s called learning in machine learning. Notice that learning must take place before the classifier can be used, and that learning requires a set of training data to learn from. This is the case for all machine learning applications.

The second point that one can make from this figure concerns the difficulties in achieving good accuracy. Frankly, that figure is a lousy classifier, because in reality, there are many socialistic voters in the blue region and many liberal/conservative voters in the red region, so the classification accuracy is not very high. The problems here essentially boil down to:

  1. The home address of a voter alone doesn’t really say that much about his or her political standpoints. One needs to know more: Income, education, age, gender, IQ, favourite pet... any variables that might correlate with political views should be taken into account. In general, selecting these pieces of information about the object to be classified (they’re called features in machine learning jargon) is often both crucial and difficult and requires considerable domain knowledge to achieve good results.
  2. A single straight line is too simple a separator. It’s surely possible to define areas of Denmark where one or the other political bloc is dominant, but their borders would be curved, and it would be necessary to use several distinct lines to represent them. Fortunately (maybe), the machine learning literature is packed with different models for classification: Linear discriminants (that’s the single straight line), logistic regression models, neural networks, support vector machines, etc., and each of these models allows for different possibilities to fit separators to the data.
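To make the learning step concrete, here is a minimal Python sketch that learns a single straight-line separator with the classic perceptron rule. It is only an illustration: I don't know which method was used for the election graphic, and the coordinates, the labels (-1 for red, +1 for blue) and the function names are made up.

```python
def train_perceptron(points, labels, epochs=1000):
    """Learn weights w and bias b so that w[0]*x + w[1]*y + b separates the classes.

    Each label must be +1 or -1. Training stops early once a full pass
    over the training data produces no mistakes.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x, y), label in zip(points, labels):
            if label * (w[0] * x + w[1] * y + b) <= 0:
                # Misclassified: nudge the line towards the correct side.
                w[0] += label * x
                w[1] += label * y
                b += label
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def classify(w, b, point):
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1


# Made-up home coordinates (east, north) with -1 for red voters and +1 for blue voters.
homes = [(1, 1), (2, 1), (1, 3), (2, 4), (6, 2), (7, 1), (6, 4), (8, 3)]
votes = [-1, -1, -1, -1, 1, 1, 1, 1]
w, b = train_perceptron(homes, votes)
print(classify(w, b, (1.5, 2)), classify(w, b, (7, 2)))
```

This only works because the toy data can be split perfectly by a straight line; with real election data a single line would leave many voters on the wrong side, which is exactly the second problem above.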

Of course, I’m not blaming the unknowing statistician for bad machine learning craftsmanship, because the figure wasn’t made for that purpose. However, for real machine learning applications, proper feature selection and model selection are crucial for achieving high accuracy, and together with the data collection, they are the most important development activities. We’ll surely get back to these concepts repeatedly in the postings to come.

Stories of Machine Learning Applications

The Alexandra Institute has seen an increasing interest in machine learning among our clients in recent years. That’s why we are starting this blog on machine learning applications. In this blog, we will tell stories about machine learning applications and projects that we have come across.

A few months ago, I got an email from my colleague over at the communications department, where she asked if we, the tech nerds at the Copenhagen branch of the Alexandra Institute, would like to start up a blog on machine learning. I made a quick search on the Internet for machine learning blogs, and there were already a few of them - at least of the highly technical sort, written by experts for other experts. So that’s been done.

Tech nerds: Christian, myself and Morten.

On the other hand, we’ve seen an increasing interest in machine learning applications from our clients in the last few years, not least because of the booming smartphone market. Most of these clients don’t even know what machine learning is - they just have an idea for a cool application and a mysterious feeling that the idea might be difficult to implement, so they turn to us to sort things out for them. Often it turns out that their applications have research potential, and then we involve machine learning researchers from the universities in the project.

So, given this interest from our clients, given all the potential cool applications, and given the remarkable research progress in the last few years (which renders machine learning applications so accurate that they’re actually usable and no longer frustrating), we decided to start up a blog with stories about how machine learning can be used, plain and simple. Some of these stories will be inspirational, some of them will be more educational. Some stories will be about our own projects, some will be about other projects from all over the world. But all of them will focus on applications rather than technical details, and all of them will be written with the non-expert reader in mind. With these stories, we want to inform and inspire, hopefully contributing to the innovation of even more cool applications.

And if you have an idea for a cool application, or you just have anything machine learning-ish to say - don’t hesitate to leave a comment on the posts or to contact me directly. You’ll find my contact information by following the link on the About page.
