Last year saw machine learning continue at the forefront of technological innovation, with a number of eye-catching projects coming to fruition. Arguably the most widely publicized was Google DeepMind’s AlphaGo defeating 9-dan professional Go star Lee Sedol in a five-game match, but the huge success of consumer products such as Amazon’s Alexa probably has more long-term ramifications for AI in everyday life. As Amazon chief Jeff Bezos told the Internet Association’s annual gala in Washington DC, ‘It is a renaissance, it is a golden age. We are now solving problems with machine learning and artificial intelligence that were in the realm of science fiction for the last several decades. And natural language understanding, machine vision problems, it really is an amazing renaissance.’
‘It will empower and improve every business, every government organization, every philanthropy,’ Bezos continued, ‘basically there’s no institution in the world that cannot be improved with machine learning.’ And you would be hard pressed to find anyone who disagrees with him. In 2017, there is unlikely to be any let-up in the speed of innovation, as the biggest players in tech continue to invest heavily in the field.
We’ve had a look at some of the most exciting developments arising in machine learning this year, and also asked four machine learning experts from some of the world’s leading companies what was giving them cause for excitement in the field.
The rise of fake data
According to Crowdflower’s 2017 Data Scientist report, ‘when asked to identify the biggest bottleneck in successfully completing AI projects, over half the respondents named issues related to training data such as “Getting good quality training data or improving the training dataset”, while less than 10% identified the machine learning code as the biggest bottleneck.’
Collecting enough data to build a training set for AI is fairly easy today, but ensuring it is of high enough quality is far more difficult. It is also far more important, and data scientists currently spend the bulk of their time cleaning, labeling, and categorizing data to get it up to the quality they need. One solution, increasingly touted and set to become more prominent this year, is synthetic data.
Synthetic data is artificially produced data that mimics, more or less exactly, the properties of real data. There are two primary ways to generate it. The first is to observe the statistical distributions of the original data and produce fake data by sampling random numbers from those distributions. The second is to build a model that explains the observed behaviour, then generate random data from that model. For example, a generative model built on a neural network and trained on a series of images of faces for facial recognition would produce fake images of faces. The same approach can be applied to a wide range of other data: establish the patterns, then produce something that fits within the range established. In this sense, synthetic data needs real datasets to work and will never be able to replace them entirely; no model can generate examples of things it has never seen real examples of.
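The first of those two approaches can be sketched in a few lines: estimate the statistical properties of the real data, then draw new samples from the fitted distribution. This is an illustrative toy only (the dataset is invented, and real systems must also preserve correlations between fields, not just per-field distributions):

```python
import random
import statistics

# Toy "real" dataset: e.g., observed session durations in minutes.
real_data = [12.1, 9.8, 14.3, 11.0, 10.5, 13.7, 12.9, 9.4, 11.8, 12.5]

# Step 1: estimate the statistical distribution of the real data.
mu = statistics.mean(real_data)
sigma = statistics.stdev(real_data)

# Step 2: draw synthetic samples from the fitted distribution.
def generate_synthetic(n, mu, sigma, seed=42):
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

synthetic = generate_synthetic(1000, mu, sigma)
```

The synthetic set mimics the distribution of the real data without copying any individual record - useful both for augmenting scarce training data and for sharing realistic data without exposing the original.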
One early-stage startup gaining ground in the field is Automated DL. The Virginia-based company creates synthetic data using generative models that produce data resembling, or in some way related to, the historical examples they are trained on. We expect a number of others to start making moves in the area as the year progresses.
Democratization of machine learning becomes more important
Calls to democratize AI have come from leaders at every major tech company. Microsoft CEO Satya Nadella recently wrote that he wants AI ‘in the hands of every developer, every organization, every public sector organization around the world’ to allow them to build their own intelligence and AI capability. Fei-Fei Li, chief scientist of artificial intelligence and machine learning at Google Cloud, agrees, stating, ‘The next step for AI must be democratization. This means lowering the barriers of entry, and making it available to the largest possible community of developers, users and enterprises.’
At the moment, Google has TensorFlow, the set of machine learning libraries it open sourced in 2015; Amazon has made its Deep Scalable Sparse Tensor Network Engine (DSSTNE, pronounced ‘Destiny’) library available on GitHub under the Apache 2.0 license; and Elon Musk has OpenAI, which bills itself as a ‘non-profit AI research company, discovering and enacting the path to safe artificial general intelligence.’ Google also recently announced its acquisition of Kaggle, which is not only an online community of data scientists but also one of the largest repositories of datasets - datasets that will help train the next generation of machine learning algorithms.
The cloud will also help the push for democratization. Machine learning requires an immense amount of computing power to function correctly - power that was previously out of reach for many companies. The scalability offered by Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure enables developers to build out infrastructure optimized for machine learning at a fraction of the cost of developing a proprietary system of their own.
Machine learning becomes vital for cybersecurity
As we saw in the recent WannaCry attacks, cybersecurity poses a clear and present danger to organizations in every industry, and in truth it is unlikely to be resolved anytime soon. Research by Accenture found that the average organization faces 106 targeted cyber-attacks per year, with one in three of those attacks resulting in a security breach. Towards the end of 2016, estimates put the number of new malware samples generated in a single quarter at around 18 million - as many as 200,000 per day.
This threat is constantly mutating, as hackers adapt to cybersecurity measures and find new ways to infect systems. In order to deal with this, organizations must be extremely quick to adapt their security countermeasures, and machine learning techniques are the only technology currently available with this capability. Former Department of Defense Chief Information Officer Terry Halvorsen believes that ‘within the next 18 months, AI will become a key factor in helping human analysts make decisions about what to do.’ This point of view is reinforced by significant investment in the field by the world’s largest technology companies. According to DFLabs’ May 2017 report ‘Next Generation Cybersecurity Analytics and Operations Survey,’ 93% of IT leaders are using or planning to use these types of solutions: 12% have deployed machine learning technologies designed for security analytics and operations automation and orchestration, 27% said they’re doing so on a limited basis, and 22% said they’re adding them. Just 6% said they’re either not planning on or not interested in deploying these technologies.
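At its simplest, this kind of security analytics comes down to learning a baseline of normal behaviour from historical activity and flagging deviations from it. The sketch below is a minimal, hypothetical illustration (the baseline numbers are invented, and production systems use far richer features and models than a single z-score):

```python
import statistics

# Hypothetical baseline: failed-login counts per hour for a user,
# learned from historical activity logs.
baseline = [3, 2, 4, 3, 2, 3, 5, 4, 3, 2, 4, 3]

mu = statistics.mean(baseline)
sigma = statistics.stdev(baseline)

def is_anomalous(observation, threshold=3.0):
    """Flag an observation more than `threshold` standard
    deviations above the learned baseline."""
    return (observation - mu) / sigma > threshold

# A sudden burst of failed logins stands out against the baseline,
# while normal activity does not.
print(is_anomalous(40))  # brute-force-like spike -> True
print(is_anomalous(4))   # within normal range -> False
```

Because the baseline is learned from data rather than hand-coded as a rule, the same detector adapts automatically when it is retrained on fresh activity - which is exactly the property that makes machine learning attractive against a constantly mutating threat.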
MIT has been experimenting with machine learning for security for some years, while IBM is training its AI-based Watson in security protocols and has now made it available to customers. Amazon also recently acquired AI-based cybersecurity company Harvest.ai, which uses AI-based algorithms to identify a business’s most important documents and intellectual property before combining user behavior analytics with data loss prevention techniques.
And what the experts say…
Ashish Rastogi, Senior Data Scientist at Netflix
'The trend in machine learning has been strongly in favor of neural networks. If 2016 was the year when AlphaGo bested the world's best Go players, I think 2017 will be the year where we make continued progress towards neural-network-based technologies entering the everyday world. I'm thinking better personal assistants (Amazon's Alexa and Google Home are already great successes), greater progress towards autonomous vehicles, machine learning in healthcare, etc.'
Saket Kumar, Chief Data Scientist At Google
'We see tons of business and consumer activities being digitized. The amount of data that gets digitized continues to grow. Machine learning is great for situations where there are large data sets and cases to learn from. Examples of this include image identification, voice transcription, translation, etc. The most important applications will likely be analyzing consumer behavior, as companies like Google, Facebook, Amazon, and others have tons of such data and have developed a large knowledge base that they can leverage for machine learning solutions.
I am excited about the intersection of video/multi-media consumption and analytics. Image and video recognition is still work in progress. There is a lot of exciting stuff that can be done with respect to what ML sees in videos and actual consumption/interaction response of the consumers.'
Nikhil Garg, Software Engineering Manager at Quora
'The most exciting promise of ML to me is that its impact won't be limited to a single vertical or area, but rather would be felt in an extremely wide variety of economic sectors - all the way from retail, manufacturing, and logistics to education, healthcare and everything in between.
I'm very excited to see more progress in the area of end-to-end differentiable network architectures. I feel that our rate of innovation will greatly accelerate if we could somehow also offload the learning of network architectures to machines.'
Jérôme Selles, Director of Data Science at Turo
'I can't wait to see applications of machine learning completely disrupting our approach to the biggest challenges of humanity. Managing our resources, improving health, preventing risk... there are countless opportunities where machine learning, carefully applied, can lead us to a better world.
In the mobility industry, in particular, autonomous vehicles come to mind. Let's not forget that vehicles are resources that are completely underutilized today: in the US alone, there are 300 million cars and 200 million people able to drive them; worldwide, we're talking about 1 billion cars. Our mission at Turo is to put the world's billion cars to better use, which can be seen as a macro machine learning problem.
I'm pretty excited to see Google getting into the field as a vendor and providing their expertise in the domain with the different parts of Google Cloud. Also, the understanding of the processes and the infrastructure it takes to make a real-world application of machine learning is getting better and better. It is great to see progress being made in abstracting the infrastructure complexity and making it easier for more data-savvy people to manipulate datasets. Amazon has done an amazing job in that space, but it's especially exciting to see Google becoming a challenger. Along with that, more and more projects are becoming open source - this friendly competition should elevate the ecosystem as a whole.'