IBM is aiming to popularise its proprietary machine learning program SystemML through open-source communities.
Announcing on the company blog the decision to share the system's source code, IBM’s Analytics VP Rob Thomas said application developers are in need of a good translator. This was a reference to the huge challenges developers face when combining information from different sources into data-heavy applications that run on a variety of computers. It is also a reference to the transformation of a little-used proprietary IBM system into a popular, widely adopted artificial intelligence tool for the big data market. The vehicle for this transformation, according to Thomas, will be the open-source community.
IBM says SystemML is now freely available to share and modify through the Apache Software Foundation, the open-source organisation. Acceptance by Apache, which manages 150 open-source projects, represents the first step to widespread adoption, Thomas said. The new Apache Incubator project will be code-named Apache SystemML.
The machine learning platform originally came out of IBM’s Almaden research lab ten years ago, when IBM was looking for ways to simplify the creation of customised machine learning software, Thomas said. Now that it is open source, it could be used by a developer of cloud-based services to create risk-modelling and fraud-prevention software for the financial services industry, he added.
The current version of SystemML could work well with Apache Spark, Thomas said, since Spark is designed for processing large amounts of data that stream in from continuous sources such as monitors and smartphones. SystemML will save companies valuable time by allowing developers to write a machine learning algorithm once and automatically scale it up using the open-source data analytics tools Spark and Hadoop.
MLlib, the machine learning library for Spark, provides developers with a rich set of machine learning algorithms, according to Thomas, and SystemML enables developers to translate those algorithms so they can easily digest different kinds of data and run on different kinds of computers.
“We believe that Apache Spark is the most important new open-source project in a decade. We’re embedding Spark into our Analytics and Commerce platforms, offering Spark as a service on IBM Cloud, and putting more than 3,500 IBM researchers and developers to work on Spark-related projects,” said Thomas.
While other tech companies have open-sourced machine learning technologies, these are generally niche, specialised tools for training neural networks. IBM aims to popularise machine learning within Spark or Hadoop, and that ubiquity will be critical in the long run, said Thomas.