As Helios prepared to sit down at EASA's headquarters in Cologne for the Data Science in aviation workshop, a quick look around the room showed a full and diverse audience. The steady rise in the attendance and profile of the workshop, which started as a small and specialised event five years ago, reflects the change in attitude towards complex data analysis. Terms such as 'machine learning', 'natural language processing' and 'cryptocurrency', which used to feature only in the academic dealings of computer scientists, are now commonplace in daily conversations and news articles. The aviation domain, of course, is not exempt.
The company Safety Data presented its software 'PLUS', which uses natural language processing to read and analyse flight reports. A traditional keyword search gives a binary answer (is the specific keyword present, yes or no?) and therefore produces false positives (the keyword is present but in the wrong context) and false negatives (the right context but without the keyword). Natural language processing instead allows a text (here, a flight report) to be weighted according to its likelihood of addressing the desired topic.
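The contrast between a binary keyword search and a weighted relevance score can be sketched in a few lines of Python. This is only a toy illustration and not how PLUS works: the report snippets, the topic vocabulary, its weights and the function names are all invented for the example, and a real natural language processing system would learn such weights from data rather than have them written by hand.

```python
import math
import re
from collections import Counter

# Hypothetical report snippets, invented purely for illustration.
REPORTS = [
    "Bird strike on departure, engine vibration noted, returned to field.",
    "Passenger asked about the strike by ground staff; flight delayed.",
    "Flock of gulls ingested into number two engine shortly after rotation.",
]

# Hypothetical hand-picked topic vocabulary for "bird strike" (an assumption;
# a real system would learn these weights from labelled reports).
TOPIC_WEIGHTS = {
    "bird": 2.0, "gulls": 2.0, "flock": 1.5,
    "ingested": 1.5, "engine": 1.0, "strike": 0.5,
}


def tokens(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_hit(text, keyword):
    """Traditional search: a yes/no answer on whether the keyword appears."""
    return keyword in tokens(text)


def topic_score(text):
    """Weighted score in [0, 1): higher means more likely on topic.

    This is an uncalibrated score, not a true probability.
    """
    counts = Counter(tokens(text))
    raw = sum(TOPIC_WEIGHTS.get(word, 0.0) * n for word, n in counts.items())
    return 1.0 - math.exp(-raw / 3.0)


if __name__ == "__main__":
    for report in REPORTS:
        print(keyword_hit(report, "strike"), round(topic_score(report), 2), report)
```

In this sketch the second report contains the keyword 'strike' but in the wrong context, so the keyword search flags it while its weighted score stays low. The third report never uses the keyword, so the keyword search misses it, yet it receives the highest weighted score.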
Natural language processing is one branch of artificial intelligence. Another common one is machine learning, which featured in a presentation by Atos on a project that aims to partly automate the design of aircraft parts. However, artificial intelligence may not always be the most pertinent method of analysis. For example, the University of Westminster argued against it and chose physical modelling instead to analyse the interdependency between KPIs in aviation.
All these techniques could bring about substantial improvements in the aviation industry, from enhancing safety to minimising environmental impact. However, they all depend on access to sufficient data to perform these complex analyses and obtain meaningful results. As is often the case, the quality of the input determines the quality of the output. Many of the successful projects presented at this event relied on a commercial partnership with the data owner. Safety Data sells its PLUS software to airlines, ANSPs and regulators alike, often customising it in the process, and each user's interaction with the tool feeds daily updates that benefit the whole user base. Boeing sells airlines seeking better fuel efficiency a service that monitors flight plans and compares them with real-time flight recordings. Consequently, and therein lies the issue, data access tends to be given only to trusted companies selling a service or diagnosing a problem. Since most complex data analytics projects need a sufficiently large data set just to get started, only a very limited number of established companies have the resources and connections to develop new tools.
By contrast, open-source projects have been thriving: from 3D printing to the peer-to-peer economy, many of the most revolutionary inventions of the last decade have been the result of open-source collaborations. Indeed, the crowdsourcing of data collection in aviation has already begun, with ADS-B data exchange websites and apps such as the OpenSky Network. The EU has also been encouraging more open data exchange, sponsoring projects such as Innaxis' SafeClouds, part of Horizon 2020, which involves 16 airlines and ANSPs that share data with the aim of improving aviation safety. Moreover, the development of hashing functions and blockchain protocols enables the partial anonymisation of data, allowing it to be shared without compromising on safety or privacy.
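As a rough illustration of how a hashing function can partially anonymise data before it is shared, the sketch below replaces aircraft registrations with keyed hashes (HMAC-SHA256 from Python's standard library). It is an assumption-laden example rather than the method used by any project mentioned here: the secret key, the `pseudonymise` function, the registrations and the fuel figures are all hypothetical, and the blockchain side of the argument is not covered.

```python
import hashlib
import hmac

# Hypothetical secret key, known only to the data owner (an assumption).
# In practice it would be randomly generated and stored securely.
SECRET_KEY = b"replace-with-a-randomly-generated-secret"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash.

    The same identifier always maps to the same token, so records can still
    be grouped and compared, but the original value cannot be recovered or
    re-computed without the key.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened token for readability


# Hypothetical records: (aircraft registration, fuel burn in kg), invented values.
records = [("G-ABCD", 5120), ("G-ABCD", 4980), ("D-EFGH", 6105)]

# The shared data set keeps the measurements but hides the registrations.
shared = [(pseudonymise(registration), fuel) for registration, fuel in records]

if __name__ == "__main__":
    for token, fuel in shared:
        print(token, fuel)
```

A keyed hash is used here instead of a plain hash because registrations come from a small, guessable set: with a plain hash, anyone could hash every known registration and match the results. This is pseudonymisation rather than full anonymisation, which is why the anonymisation it offers is only partial.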
Despite these shining examples of data analytics, whether by established companies or through open-source projects, 95% of all data recorded in the world is not analysed. With a staggering 2.5 quintillion bytes (2.5 exabytes) of data created every day, most companies lack the resources to analyse it all unless there is a clear financial advantage or a specific problem to solve. As a result, it lies untouched, taking up storage space on servers throughout the world, often in an unusable format. Here lies an opportunity for us as an industry to find ways of providing access to more of our data, so that a new generation of insight and innovation can emerge.