Øyvind Roti, Head Of Solutions Architecture, Google Cloud-APAC
Machine learning is used extensively by companies across the industry spectrum. Examples include recommendation engines in media and e-commerce, quality control on factory floors and trading systems in financial services. However, many enterprises are just getting started and are looking for the best way to manage their machine learning initiatives. Large-scale machine learning requires substantial computing power and data, so managing costs through efficient use of resources is also an important consideration.
Given the wide range of business problems it addresses, machine learning will become a ubiquitous part of software applications throughout enterprises, and managing the machine learning lifecycle will become a critical skill for software engineers and IT practitioners of the future. This will require some new skills, but fortunately, it fits neatly into existing good practices like agile development. In short, the solution is to build an end-to-end platform for machine learning pipelines.
Building Machine Learning Models
Machine learning is a subfield of artificial intelligence that builds models which learn from examples, without hard-coded rules or behaviours. It is usually broken into three types: supervised, unsupervised, and reinforcement learning. We will focus on supervised learning, but the platform concept extends to all three. Supervised machine learning works by training a model with labelled examples. For instance, to build a system that can recognise objects in an image, the training data could be photos, each with a label (e.g. a photo labelled “cat”). After seeing enough examples, the model learns which pixels and patterns make up a “cat”. By repeatedly training and tweaking, the model will gradually improve until it is able to take an unlabelled example and predict the correct output.
Some models can reach this level of performance with hundreds of training examples and a single machine, but a particularly powerful form of supervised learning, Deep Learning, is especially hungry for data and compute resources.
For Deep Learning models, you may need a cluster of servers and tens of thousands of training examples to achieve good results. While they require more resources, Deep Learning models have advanced the cutting edge of fields like computer vision, speech recognition, natural language processing and recommender systems, and are now widely used by companies in numerous industries as well as in popular consumer mobile apps.
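The supervised learning loop described above (train on labelled examples, adjust, then predict on unlabelled input) can be sketched in a few lines of plain Python. This is a minimal illustration rather than a realistic image model: it uses a simple perceptron instead of Deep Learning, and the two numeric features, the toy data, and the function names are all assumptions made for the example.

```python
from typing import List, Tuple

# One labelled example: (features, label), where label is 1 ("cat") or 0 ("not cat").
Example = Tuple[List[float], int]


def predict(weights: List[float], bias: float, features: List[float]) -> int:
    """Return 1 if the weighted sum of the features is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(
    examples: List[Example], epochs: int = 20, learning_rate: float = 0.1
) -> Tuple[List[float], float]:
    """Repeatedly nudge the weights whenever a labelled example is misclassified."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # -1, 0 or +1
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Toy labelled data (hypothetical): two made-up features per example.
training_data: List[Example] = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.7, 0.95], 1),
    ([0.1, 0.2], 0),
    ([0.2, 0.1], 0),
    ([0.15, 0.3], 0),
]

weights, bias = train_perceptron(training_data)

# Predict labels for two examples the model has not seen during training.
print(predict(weights, bias, [0.85, 0.9]))
print(predict(weights, bias, [0.1, 0.15]))
```

A real computer vision model would replace the perceptron with a deep neural network and the six toy rows with many thousands of labelled images, but the train-then-predict shape stays the same.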
Building a Platform for End-to-End Machine Learning Pipelines
At a high level, the end-to-end machine learning pipeline can be broken into three stages: data transformation, model training, and model serving. A common mistake is to focus almost exclusively on the model training stage. However, much of the work going into an effective machine learning pipeline is the cleaning and transformation of input data. The key to data transformation is automation, which embeds good practices into the handling of future data sources and reduces the errors that are prevalent in ad hoc, manual processes. Automating the data pipeline requires sufficient flexibility to add new data sources, whilst blocking poor-quality data points. This data engineering step is not trivial and will likely require the most effort and political capital to set up initially, as data silos are broken down and processing is standardized. This initial investment will pay off as more and more real-time and batch data sources are added.
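As a rough sketch of what an automated transformation step might look like, the example below normalises incoming records and blocks poor-quality data points instead of relying on manual clean-up. The record fields (`price`, `category`), the quality rules, and the function names are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, object]


def clean_record(raw: Record) -> Optional[Record]:
    """Return a normalised record, or None if it fails the quality checks."""
    try:
        price = float(raw["price"])  # accepts numbers and numeric strings
    except (KeyError, TypeError, ValueError):
        return None  # missing or non-numeric price
    category = raw.get("category")
    if not isinstance(category, str) or not category.strip():
        return None  # missing or empty category
    if not math.isfinite(price) or price < 0:
        return None  # NaN, infinite or negative price
    return {"price": price, "category": category.strip().lower()}


def transform(records: Iterable[Record]) -> Tuple[List[Record], int]:
    """Apply the same checks to every source; return clean rows and a reject count."""
    accepted: List[Record] = []
    rejected = 0
    for raw in records:
        cleaned = clean_record(raw)
        if cleaned is None:
            rejected += 1
        else:
            accepted.append(cleaned)
    return accepted, rejected


# Hypothetical batch from a newly added data source.
batch = [
    {"price": "19.99", "category": " Shoes "},
    {"price": "-5", "category": "hats"},
    {"price": "abc", "category": "hats"},
    {"category": "hats"},
    {"price": 3, "category": ""},
]
accepted, rejected = transform(batch)
print(accepted, rejected)
```

Because every source passes through the same `transform` function, adding a new batch or real-time feed does not require rewriting the quality rules, and the reject count gives a simple signal to monitor.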
Once the input data is available, training the machine learning models can begin. It is usual for this step to be led by experts like data scientists. However, if you abstract the underlying complexity and encapsulate best practices, people without machine learning expertise can train models as well, opening machine learning up to many more industry applications. It is a good idea to conduct many experiments in parallel and expect some of these to work well and some not to work at all. This requires automation and scalable infrastructure resources. Note that measures of the model's real-life performance should be tied directly to business results, e.g. the change in customer conversion rates, not just to technical metrics like prediction accuracy. After all, the model could be incredibly accurate but bring no business benefit whatsoever.
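The idea of running many experiments in parallel and ranking them by a business result rather than by accuracy alone can be sketched as follows. This is an illustrative toy under stated assumptions: the validation data is made up, each "experiment" is just a different decision threshold, a local thread pool stands in for scalable infrastructure, and `conversion_rate` is a simple proxy for a business metric.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Toy validation set (hypothetical): (model score, whether the customer converted).
VALIDATION: List[Tuple[float, bool]] = [
    (0.95, True), (0.80, True), (0.65, False), (0.55, True),
    (0.40, False), (0.30, False), (0.20, True), (0.10, False),
]


def run_experiment(threshold: float) -> Dict[str, float]:
    """Evaluate one candidate: predict 'will convert' when score >= threshold."""
    correct = sum((score >= threshold) == converted for score, converted in VALIDATION)
    targeted = [converted for score, converted in VALIDATION if score >= threshold]
    return {
        "threshold": threshold,
        "accuracy": correct / len(VALIDATION),
        # Share of targeted customers who actually converted.
        "conversion_rate": sum(targeted) / len(targeted) if targeted else 0.0,
    }


def run_all(thresholds: List[float]) -> List[Dict[str, float]]:
    """Run every experiment concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_experiment, thresholds))


results = run_all([0.25, 0.5, 0.75])

# Rank by the business proxy, not by accuracy alone.
best = max(results, key=lambda r: r["conversion_rate"])
for result in results:
    print(result)
print("selected threshold:", best["threshold"])
```

In this toy data two candidates have the same accuracy but different conversion rates, which is the kind of difference a purely technical metric would hide.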
Serving machine learning models requires production levels of availability and scalability. Models will often be deployed to the client or bundled with mobile apps. By creating several candidate models, the platform can be built to allow A/B testing, where user subsets are exposed to different model versions. Just as with Continuous Integration and Delivery, automated testing must ensure that every model that gets deployed has reached the acceptable level of generalisation accuracy before being exposed to end users. Once a model is deployed, end-user interactions can be used to further train and improve it, and so the lifecycle continues.
For example, companies like Urban Outfitters are using ML to enhance the customer e-commerce shopping experience by maintaining a comprehensive set of product attributes that is able to provide shoppers with better discovery, recommendation and search experiences. Meanwhile, Disney is building vision models to annotate its products to improve discovery and product recommendations on shopDisney.
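Two of the serving practices described above, an automated accuracy gate before deployment and A/B testing across user subsets, can be sketched as below. The candidate models, the hold-out data, the 0.9 threshold, and the function names are all hypothetical, and hashing the user ID is just one common way to keep each user on the same model version.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[float], int]
Holdout = Sequence[Tuple[float, int]]


def holdout_accuracy(model: Model, holdout: Holdout) -> float:
    """Fraction of held-out examples (not used in training) predicted correctly."""
    if not holdout:
        raise ValueError("holdout set must not be empty")
    return sum(model(x) == y for x, y in holdout) / len(holdout)


def passes_gate(model: Model, holdout: Holdout, min_accuracy: float = 0.9) -> bool:
    """Automated check run before any model is exposed to end users."""
    return holdout_accuracy(model, holdout) >= min_accuracy


def assign_variant(user_id: str, variants: Sequence[str]) -> str:
    """Deterministically map a user to a variant so they always see the same model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Hypothetical candidates: simple threshold rules standing in for trained models.
candidates: Dict[str, Model] = {
    "model_a": lambda x: int(x >= 0.5),
    "model_b": lambda x: int(x >= 0.9),
    "model_c": lambda x: int(x >= 0.45),
}
holdout_data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.95, 1)]

# Only candidates that clear the gate are eligible for the A/B test.
deployable: List[str] = [
    name for name, model in candidates.items() if passes_gate(model, holdout_data)
]
print(deployable)
print(assign_variant("user-123", deployable))
```

In a Continuous Integration and Delivery setup, a check like `passes_gate` would run as part of the automated test suite, so a model that falls below the threshold never reaches the A/B test.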
Putting it All Together
Many enterprises focus on the data science part of machine learning but neglect the criticality of building a platform for end-to-end, automated machine learning pipelines.
As machine learning becomes a standard part of every software engineer's and IT professional's toolkit, it should be integrated into existing agile practices and Continuous Integration and Delivery processes. Rather than re-inventing the wheel, aim to integrate machine learning pipelines into your enterprise IT delivery capability and governance.