Amazon last month became the latest tech giant to open source some of its most sophisticated deep learning technology, following the lead of Google, Microsoft, Yahoo, and Facebook.
The world’s largest eCommerce site has made its Deep Scalable Sparse Tensor Network Engine (DSSTNE - pronounced ‘Destiny’) library available on GitHub under the Apache 2.0 license. As its name suggests, DSSTNE stands out among deep learning libraries for its focus on sparse datasets. Amazon uses it for its recommendation service, where the data is sparse by nature: any one customer has interacted with only a tiny fraction of the catalog. Amazon also claims that DSSTNE is 2.1 times faster than TensorFlow, Google's open source machine learning system.
In the Q&A section of its DSSTNE GitHub page, Amazon said: ‘We are releasing DSSTNE as open source software so that the promise of deep learning can extend beyond speech and language understanding and object recognition to other areas such as search and recommendations. We hope that researchers around the world can collaborate to improve it. But more importantly, we hope that it spurs innovation in many more areas.’
The move follows Google’s decision to open source TensorFlow last November. Initial reviews were poor. Google itself runs the platform across thousands of servers, but the version it open sourced could only run on a single machine. Matthew Mayo, for one, wrote on KDnuggets that he was unsure where or why exactly someone would use it. This appears to have since been rectified, with a new version of TensorFlow released in May that can run on multiple machines at the same time. Nor did the issues prevent it from being a hit: it was one of the six open source projects to receive the most attention from developers in all of 2015, despite only being released in November.
It seems that the major tech companies are in some sort of race to open source as much as they possibly can, and there are a number of clear benefits to doing so. Open sourcing a project means more users can contribute back to it, which can surface use cases that Google and Amazon may never have found otherwise and help make their algorithms both more efficient and more powerful. For individuals and small organizations, it makes possible data crunching and predictive tasks that were previously out of reach. Nor is it the giveaway of trade secrets that it may appear to be. While Google and Amazon are making public some of their most important data center software, they haven’t given anyone access to the advanced hardware infrastructure that drives it.
Ultimately, the tech giants can afford to give away all the algorithms they want. Without the data to train them on, you can’t build a search engine anywhere near as good as Google’s or a recommendation engine anywhere near as good as Amazon’s. The main thing the open sourcing trend shows is that algorithms are no longer considered the most important part of machine learning. The data that the algorithms are trained on is the key. The trend among tech companies towards open sourcing algorithms will therefore likely continue apace, but so will the rush to set up as many collection points as possible to gather the most data. This has huge potential to drive innovation among these companies in deep learning, machine learning, and AI, but entrants to the market that lack the data of their larger competitors are unlikely to be in any better a position than before.