Anita Faul is a Data Scientist helping to bring Machine Learning to BAS. Before this she was a Teaching Associate, Fellow and Director of Studies in Mathematics at Selwyn College, University of Cambridge. She came to Cambridge after studying for two years in Germany. She did Part II and Part III Mathematics at Churchill College, Cambridge. Since these take only two years, and three years are necessary for a first degree, she does not hold one. However, this was followed by a PhD on the Faul-Powell Algorithm for Radial Basis Function Interpolation under the supervision of Professor Mike Powell. She then worked on the Relevance Vector Machine with Mike Tipping at Microsoft Research Cambridge. Ten years in industry followed, where she worked on various algorithms for mobile phone networks, image processing and data visualisation. The recording of a relatively recent talk is here. A recording of a short talk about work at BAS is here.
Profile picture explained: It was classified by ImageNet Roulette, an art project to shed light on what happens when technical systems are trained on problematic training data. ImageNet is one of the most important and historically significant training sets in artificial intelligence. It consists of over 14 million labelled images organised into more than twenty thousand categories. Under the top-level category “Person” it contains 2833 sub-categories and classifies people into a huge range of types including race, nationality, profession, economic status, behaviour, character, and even morality. When a user uploads a picture, ImageNet Roulette first runs a face detector to locate any faces. It then classifies them using the open-source Caffe deep learning framework (produced at UC Berkeley), trained on the “Person” category. The category “Nurse” has positive connotations, but ImageNet contains a number of problematic, offensive and bizarre categories, all drawn from WordNet, including misogynistic or racist terminology.
Data present us with several challenges nowadays. For one, there is the abundance of data and the necessity to extract the essential information from it. When tackling this task a balance has to be struck between putting aside irrelevant information and keeping the relevant information, without getting lost in detail, which is known as over-fitting. The law of parsimony, also known as Occam’s razor, should be a guiding principle: keep models simple while still explaining the data.
The next challenge is the fact that the data samples are not static. New samples arrive constantly through the pipeline. Therefore, there is a need for models which update themselves as new samples become available. The models should be flexible enough to become more complex should this be necessary. In addition, the models should inform us which samples need to be collected so that the collection process becomes as informative as possible.
Another challenge lies in the conclusions we draw from the data. After all, as popularised by Mark Twain: “There are three kinds of lies: lies, damned lies, and statistics.” An objective measure of confidence is needed to make generalised statements. Conclusions should be drawn in line with ethical principles.
The last challenge is the analysis. Can we build systems which inform us of the underlying structure and processes which gave rise to the data? Moreover, it is not enough to discover the structure and processes; we also need to add meaning. Here different disciplines need to work together.
The emphasis of the book is on the question of Why: only if we understand why an algorithm is successful can it be properly applied and its results trusted. Algorithms are often taught side by side without showing the similarities and differences between them. This book addresses the commonalities, and aims to give a thorough and in-depth treatment and develop intuition, while remaining concise. This class-tested textbook uses mathematics as the common language. It covers a variety of machine learning concepts from basic principles, and illustrates every concept using examples in MATLAB®. Accompanying resources are available in Moodle.
This textbook provides an accessible and concise introduction to numerical analysis for upper undergraduate and beginning graduate students from various backgrounds. It was developed from the lecture notes of four successful courses on numerical analysis taught within the MPhil of Scientific Computing at the University of Cambridge. The book is easily accessible, even to those with limited knowledge of mathematics. Students will get a concise but thorough introduction to numerical analysis. In addition, the algorithmic principles are emphasised to encourage a deeper understanding of why an algorithm is suitable, and sometimes unsuitable, for a particular problem. Additional material such as the solutions to odd-numbered exercises and MATLAB® examples can be downloaded from the publishers’ webpage.