
Machine Learning for Safer Landings



The Alexandra Institute is involved in developing an optical sensor for measuring friction on airport runways in winter conditions. Machine learning is used to relate the optical features to the state of the runway.

If you feel nervous driving a car on a slippery road, then think about what it would be like to maneuver a 300-tonne airplane with 300 passengers at 200 kph on a slippery runway. Thankfully, airports operating in winter conditions put considerable effort into keeping runways clear of ice, snow, water and other contaminants that could reduce the grip. An important part of this effort is to regularly measure the friction and register unwanted contaminants, both to report the conditions to pilots and to plan the clearing work.

To make life easier for the field staff, we’re currently trying to develop an optical sensor that can replace the existing mechanical one, which is expensive and cumbersome to use. The sensor can be mounted on a car, and it has lasers of different wavelengths directed towards the runway and optical instruments to measure five different features of the reflected light. The task now is to find the relationship between these features and the state of the runway. And here is where machine learning comes into play.

PCA plot of optical measurements for five runway contamination classes.

The above figure shows some data collected from a runway in the summer, with contaminants like water, rubber and white paint (winter is arriving in Denmark just now, and I’m expecting to receive some real winter data soon). Each cross corresponds to a measurement from the sensor, projected from five features down to two using a method called principal component analysis (PCA), so they can be plotted in a two-dimensional chart. One can clearly see that the wet measurements end up in a different region than the dry ones, and indeed a classifier can distinguish between these two classes with 99.5% accuracy. However, when distinguishing between all five classes, the accuracy drops to 88.1%. The problem is also visible in the graph: Dry rubber is mixed up with both Paint and Dry. I will get back to what we can do about this problem to increase the accuracy in a later post.
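To make the workflow concrete, here is a minimal sketch in Python with scikit-learn, assuming the five optical features sit in an array X with contamination labels in y. The data, class names and choice of classifier below are placeholders for illustration, not the actual project code:

```python
# Minimal sketch of the PCA projection and classification step.
# X and y are random placeholders; the real data come from the sensor.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))   # five optical features per measurement
y = rng.choice(["Dry", "Wet", "Paint", "Dry rubber", "Wet rubber"], size=500)

# Project the five features down to two principal components for plotting.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)     # each row is one cross in the chart

# Train the classifier on the full five-dimensional feature vectors;
# PCA is only used for visualization here.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```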

We also tried to relate the optical features to the friction number, which was measured simultaneously with a mechanical device. Now we’re not doing classification any more, but regression, where the objective is to predict a number instead of a class. My colleague Christian trained a small neural network with our optical and friction data, and the result can be seen below.

Estimated friction from neural network and actual measurement from a mechanical device along two runways, joined at the middle of the chart.

As you can see, the estimate (green) follows the actual measurement (blue) quite well. We even have reason to believe that where the difference is large, for example at 3450 m, it might be the mechanical device that’s wrong and our neural network that’s right: after a wet spot, which gives low friction, the mechanics stay wet for a little while and report too low a friction number.
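A minimal sketch of this regression step, assuming the same five optical features and the mechanically measured friction number as training target. The post doesn’t specify Christian’s exact network, so the small architecture below is only an assumption:

```python
# Minimal sketch of friction regression with a small neural network.
# The features, targets and architecture are placeholder assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                 # placeholder optical features
friction = rng.uniform(0.1, 0.9, size=2000)    # placeholder friction numbers

X_train, X_test, y_train, y_test = train_test_split(X, friction, random_state=0)

# A "small" network: one hidden layer of 16 units.
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)
print("R^2 on held-out data:", net.score(X_test, y_test))
```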


Can a Smartphone Recognize Birds?



The Alexandra Institute and DTU conducted a study on automatic recognition of bird sounds on smartphones. The conclusion: no one has yet demonstrated the recognition accuracy one would expect from a bird recognizer app.

Would it be possible to develop a smartphone app that recognizes birds from their sounds, so that you could bring your smartphone to the forest and have it tell you which birds you’re hearing? That’s the question I was asked to sort out a couple of years ago, when the small Danish company pr-development hired us for an initial feasibility study. However, since I was rather new to machine learning at the time, I contacted Jan Larsen and Lasse Lohilahti Mølgaard at DTU (Technical University of Denmark) to help me out - they have extensive experience with machine learning on sound.

We started out with experiments on sound data from six different bird species, all recorded with a smartphone. With some help from Lasse, I extracted 19 features for each 100 ms chunk of sound data - features like frequency, energy, some tonal components, and so on. Below I have taken two of these 19 features and plotted them in a two-dimensional chart, just to show how the different species dominate different regions of the feature space - somewhat similar to how socialistic and liberal voters dominate different regions on a map of Denmark. Each cross corresponds to a 100 ms chunk, and the color indicates the species.
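As a rough illustration of this kind of per-chunk feature extraction, here is a minimal sketch. The actual 19 features aren’t listed in the post, so signal energy and spectral centroid below are merely two representative examples of the same kind of per-chunk quantities:

```python
# Minimal sketch of frame-level feature extraction, assuming a mono
# signal sampled at 44.1 kHz. Energy and spectral centroid stand in
# for the study's 19 features, which are not listed in the post.
import numpy as np

def chunk_features(signal, sample_rate=44100, chunk_ms=100):
    """Return one feature vector per 100 ms chunk of the signal."""
    chunk_len = int(sample_rate * chunk_ms / 1000)
    n_chunks = len(signal) // chunk_len
    features = []
    for i in range(n_chunks):
        chunk = signal[i * chunk_len:(i + 1) * chunk_len]
        spectrum = np.abs(np.fft.rfft(chunk))
        freqs = np.fft.rfftfreq(chunk_len, d=1.0 / sample_rate)
        energy = float(np.sum(chunk ** 2))      # total energy in the chunk
        centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))
        features.append([energy, centroid])
    return np.array(features)

# Example: features for one second of synthetic "birdsong".
t = np.linspace(0, 1, 44100, endpoint=False)
demo = np.sin(2 * np.pi * 4000 * t)       # a 4 kHz tone as a stand-in
print(chunk_features(demo).shape)         # (10, 2): ten 100 ms chunks
```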

Feature plot for six bird species: Blue - Grasshopper Warbler; Green - Garden Warbler; Red - Blackcap; Cyan - Common Redstart; Violet - Common Blackbird; Black - Eurasian Pygmy Owl.

Even though this chart only shows two of the 19 features (it’s difficult to produce a 19-dimensional chart), one can see that some bird species, notably Grasshopper Warbler (blue) and Eurasian Pygmy Owl (black), are clearly distinguishable, while others are more mixed up. Lasse then tried a number of different classifier models and reached an accuracy of 73% with the best one, which in this case turned out to be multinomial logistic regression.
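Here is a minimal sketch of that classification step, assuming per-chunk feature vectors and integer species labels. The data below are random placeholders, so the printed accuracy is meaningless except as a demonstration of the workflow:

```python
# Minimal sketch of multi-class classification of bird-sound chunks.
# X and y are random placeholders for the real per-chunk features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 19))     # 19 features per 100 ms chunk
y = rng.integers(0, 6, size=1200)   # six species, encoded 0-5

# Multinomial logistic regression, the best model in the study.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)
print("Mean cross-validated accuracy:", scores.mean())
```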

But 73% on a catalog of six birds is obviously not good enough for the envisioned bird recognition app, so I went on to study research articles to see whether others had achieved better results. Indeed, Jancovic and Köküer1 achieved 91.5% accuracy on 95 birds, and Chou et al.2 achieved 78% on 420 birds. However, without going into details, none of the articles studied recognition under conditions that would apply to a smartphone app, so these results don’t really transfer to our case.

So the only thing we could conclude for sure is that if it’s at all possible to make a well-functioning bird recognizer app, no one has yet proven it feasible. Indeed, if you look at the FAQ section on the home page of iBird, one of the most popular apps for bird identification, they also seem to have investigated the matter and concluded that it’s too difficult - at least for now. Thus, the question in the title remains open, and unfortunately we didn’t have the resources to conduct a full-scale study to settle it. If anyone can contribute with information or ideas, don’t hesitate to leave a comment!

1 Peter Jancovic, Münevver Köküer: Automatic Detection and Recognition of Tonal Bird Sounds in Noisy Environments. EURASIP Journal on Advances in Signal Processing, 2011.
2 Chih-Hsun Chou, Chang-Hsing Lee, Hui-Wen Ni: Bird Species Recognition by Comparing the HMMs of the Syllables. Second International Conference on Innovative Computing, Information and Control, 2007.


The Danish Voter Classifier



A graphic of Danish election results produced by an unwitting statistician can be used to explain the fundamental concepts of machine learning.

For those new to the concepts of machine learning, it’s really instructive to study the figure below, which shows Danish voters’ tendency to vote for either the socialistic (red) or liberal-conservative (blue) political bloc in the last parliamentary election. In fact, it constitutes what’s known as a classifier: given the home address of a voter, it predicts which bloc he or she will vote for.

Danish voters’ tendency to vote for political blocs depending on home address. Based on a graphic from Berlingske Tidende.

That’s what a classifier does: given some information about something, the classifier predicts which class that something belongs to. And just like all classifiers based on machine learning, this classifier was produced by statistical exploration of some training data - in this case the 2011 election data. I don’t know for sure, but presumably a statistician used some mathematical method on the election data to calculate where the straight line separating the red and the blue areas should go. It’s this process that’s called learning in machine learning. Notice that learning must take place before the classifier can be used, and that learning requires a set of training data to learn from. This is the case for all machine learning applications.

The second point one can make from this figure concerns the difficulty of achieving good accuracy. Frankly, the figure is a lousy classifier, because in reality there are many socialistic voters in the blue region and many liberal-conservative voters in the red region, so the classification accuracy is not very high. The problems here essentially boil down to:

  1. The home address of a voter alone doesn’t really say that much about his or her political standpoints. One needs to know more: Income, education, age, gender, IQ, favourite pet... any variables that might correlate with political views should be taken into account. In general, selecting these pieces of information about the object to be classified (they’re called features in machine learning jargon) is often both crucial and difficult and requires considerable domain knowledge to achieve good results.
  2. A single straight line is too simple a separator. It’s surely possible to define areas of Denmark where one or the other political bloc is dominant, but their borders would be curved, and it would take several distinct curves to represent them. Fortunately (maybe), the machine learning literature is packed with different models for classification: linear discriminants (that’s the single straight line), logistic regression models, neural networks, support vector machines, etc., and each of these models offers different possibilities for fitting separators to the data (see the sketch after this list).
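To make these two points concrete, the sketch below fits both a single straight line and a more flexible model to synthetic two-dimensional data with a curved class border. The data are invented and have nothing to do with the actual election; the point is only how model choice affects accuracy on the same features:

```python
# Minimal sketch contrasting a linear separator with a more flexible
# model, on synthetic 2D points standing in for home addresses.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))   # (east-west, north-south)
# A curved "border" between red and blue regions that no single
# straight line can follow exactly.
y = (X[:, 1] > 0.5 * np.sin(3 * X[:, 0])).astype(int)

linear = LogisticRegression()            # one straight line
flexible = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                         random_state=0) # can bend around the border

for name, model in [("straight line", linear), ("neural network", flexible)]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```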

Of course, I’m not blaming the unwitting statistician for bad machine learning craftsmanship, because the figure wasn’t made for that purpose. However, for real machine learning applications, proper feature selection and model selection are crucial for achieving high accuracy, and together with data collection, they are the most important development activities. We’ll surely get back to these concepts repeatedly in the posts to come.

