If you’ve been following the progress of Machine Learning (ML) in industry over the last decade, you may be aware of a growing gap between the capabilities demonstrated in academic papers and the prevalence or viability of ML in production robots. Engineers know that staging a sparkly demo for a research paper or a marketing video is relatively easy. Once that’s done, you are perhaps only 5% of the way to a robust production-ready system; the latter must withstand the “long tail” of edge cases, unexpected circumstances, unforeseen requirements, and implicit expectations that messy reality throws at it. The chasm between ML in theory and ML in practice is well-known and well-documented. This post, however, will discuss another, perhaps more intractable, problem: one that has been preventing ML from gaining a foothold in robotics.
Imagine you’re an engineer training ML models for use in production robots. Imagine also that you’re aware of the “long tail” problem. Your goal, therefore, is to control the environment and the types of tasks so as to reduce the negative impact of unexpected factors on performance. You design a narrow, well-defined space, like you might find in a factory or warehouse, where lighting conditions, collision possibilities, and surrounding object types can be more or less prescribed and controlled. This lets you limit the out-of-distribution cases to a negligible fraction of the total.
Given such an ideal situation, you might be tempted to predict that your model will perform well, or at least as well as is described in academic papers. This is when you encounter another, more prosaic, and less-discussed problem, one that is built into the very process of training ML models for robots and that is more difficult to solve. Specifically: the act of collecting data to train your model will, ironically, end up shutting you out of the product cycle itself.
When training a robot to perform some task in a physical space (e.g. moving crates in a warehouse), you have a few options available to you. The first and seemingly most sensible one would be to train a Reinforcement Learning (RL) agent from scratch. Using carefully crafted rewards and an appropriate state-space to explore, such an agent could learn to perform well without needing direct supervision. However, pure RL systems take a long time to converge, and generally necessitate randomized actions during their exploration phases. Allowing a $2 million, 1-ton robot to flail wildly while in exploration mode is generally frowned upon by the flesh-and-blood operations team standing next to it. A consequence-free training environment is hard to find, since Sim2Real algorithms are still not good enough to deliver the level of performance required by industrial robots. And unless your company plans to stop evolving its products, this kind of exploration would have to be repeated frequently, as small changes in product direction force you to retrain the model. Remember: the premise behind this hypothetical scenario was that ML is only reliable when changes in both the task and environment are kept to a minimum.
This leaves only supervised or hybrid approaches for training your robot. A popular candidate is imitation learning, by which a system learns from expert demonstrations performed on real examples. This is where the true paradox reveals itself. Any supervised approach you choose, including imitation learning, will be data-hungry. This means you will need to gather large volumes of data from production machines, ideally working in real conditions (rather than simulations¹), and use that data to train a robust model. The more complex (and therefore valuable) the task, the more edge cases the domain presents the agent, and the more data is needed for the model to be robust. Yet operations teams are rarely willing to let an expensive and dangerous robot run free on the production floor unless they trust that it will act safely.
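To make the data-hunger point concrete, here is a minimal sketch of the simplest form of imitation learning (behaviour cloning by nearest neighbour). Everything in it is a made-up illustration: the state features, the action labels, and the four demonstrations are hypothetical, and a real system would use a far richer model.

```python
import math

# Hypothetical expert demonstrations: (state, action) pairs.
# State = (distance_to_crate_m, lateral_offset_m); actions are illustrative labels.
DEMONSTRATIONS = [
    ((2.0, 0.0), "drive_forward"),
    ((1.5, 0.4), "steer_right"),
    ((1.5, -0.4), "steer_left"),
    ((0.1, 0.0), "grip"),
]


def cloned_policy(state, demonstrations=DEMONSTRATIONS):
    """Copy the expert's action from the most similar demonstrated state."""
    _, action = min(demonstrations, key=lambda pair: math.dist(pair[0], state))
    return action


def nearest_demo_distance(state, demonstrations=DEMONSTRATIONS):
    """Distance to the closest demonstrated state: a crude 'have we seen this?' signal."""
    return min(math.dist(demo_state, state) for demo_state, _ in demonstrations)


# Close to the demonstrations, the cloned policy behaves sensibly.
print(cloned_policy((0.2, 0.05)))

# Far from anything the expert showed it, the policy still answers confidently.
print(cloned_policy((10.0, 3.0)), nearest_demo_distance((10.0, 3.0)))
```

The policy is only as good as its coverage of the state space. Every edge case that was never demonstrated is answered by whatever example happens to be nearest, which is why the volume of real-world demonstrations matters so much.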
On the financial side, early prototypes can be expensive, and this expense can only be justified if there is a business case and a potential customer waiting for you to deliver a product. Such stakeholders want returns on their investments soon, which means you as the developer need to quickly produce a good-enough model to get the company on its feet. Using human demonstration to train a model through imitation learning is expensive and slow, and will generally take so long that the business will demand an alternative product or solution for their customers in order to stay solvent. Unless your company has treasure chests full of cash to throw into generating millions of mocked-up samples, these same machines should be productive enough (financially) without needing ML models to justify the cost of their installation.
So the only option available to you, the plucky developer, is to write a piece of traditional software to run these early machines, one that is reliable enough to be trusted on the factory floor by both the business and operations. From these machines you hope to collect enough data to train a robust model, and eventually to replace the traditional software with it.
Months pass, and you’ve finally trained a candidate you are happy with. You take it to the production teams and submit it for deployment. However, these folks have been at it for a while now, meeting the customer’s needs with the existing hand-scripted codebase, which has since become more performant and reliable. It had to, in order for the business to feel safe putting it in a fleet of robots. Supporting systems have also been built around the stack. Developers have been trained to work on it and are engaged in ongoing improvement projects. The operations team has written a handbook of troubleshooting steps for the existing solution. To expect them to throw out all this investment, then reorganize and retrain employees around a brand-new product, is a tall order. They would only be motivated to do so if the performance improvements above and beyond the existing system were high enough to justify the operational cost of switching.
But they aren’t; the ML model is only good enough. To get better you need more data and more time to research. And to collect more data, the current systems have to be run for even longer, during which they will be continually improved. The business may also demand new products, or changes to the old one, which forces you to start training your model again from scratch. This serves to entrench the status quo even further, while your model is continually playing catch-up.
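The catch-up dynamic can be illustrated with a toy model. This is a sketch under loud assumptions, not a measurement: the incumbent's score is assumed to improve linearly while it runs, the ML model's score is assumed to grow with the logarithm of the data collected (diminishing returns), and every number below is invented purely for illustration.

```python
import math


def months_until_switch(horizon_months, incumbent_start, incumbent_gain_per_month,
                        model_scale, samples_per_month, switch_margin):
    """Return the first month in which the model beats the incumbent by the
    required margin, or None if that never happens within the horizon.

    All scores are in arbitrary, hypothetical units.
    """
    for month in range(1, horizon_months + 1):
        # Assumption: the hand-written system keeps getting better while it runs.
        incumbent = incumbent_start + incumbent_gain_per_month * month
        # Assumption: the model improves with log(data), i.e. diminishing returns.
        samples = samples_per_month * month
        model = model_scale * math.log10(1 + samples)
        # The business only switches if the model clears the incumbent plus a margin.
        if model >= incumbent + switch_margin:
            return month
    return None


# Frozen incumbent: the model eventually clears the bar.
print(months_until_switch(60, 80, 0.0, 20, 1000, 5))

# Incumbent that keeps improving: the model never catches up within five years.
print(months_until_switch(60, 80, 0.5, 20, 1000, 5))
```

Under these invented parameters the model overtakes a frozen incumbent, but not one that improves steadily while the data is being gathered. The point is the shape of the race, not the numbers.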
The above cycle resembles a variety of the Innovator’s Dilemma. In this case the dilemma emerges out of the basic dynamics of the ML development cycle. Curiously, it was the requirements of the disruptor itself, namely an ML model that needed so much data, that entrenched the incumbent system in the first place. A sort of tragic irony is at play.
In my line of work I have seen this pattern recur many times on robotics projects. Indeed, any automated system whose task is critical, expensive, or dangerous may succumb to this vicious cycle. Trivial systems, on the other hand, such as labelling images in Google Photos, don’t suffer from this paradox since the cost of making mistakes is low. The cycle only gets triggered when the bar for running a model in production is so high that a certain investment in traditional code is necessary to get the data collection pipeline started.
How can a company break out of this cycle? I’d be lying if I said I had a panacea. The academic world is already well aware of this problem (called the cold start problem), and there are many research projects trying to resolve it. In my own experience, the only solutions that I’ve seen work require a confluence of several factors, including: (1) clever tech leads who know how to negotiate with stakeholders and set their ML products up for success, (2) production teams that are willing to put in the time to try something new in parallel with the current system, and (3) plain luck: being in the right place at the right time to get the business hooked on the ML solution.
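One way to read factor (2) is as a "shadow mode" arrangement, in which the candidate model sees the same inputs as the trusted system but never controls the robot. The sketch below is one possible shape for that idea, not a description of any particular team's setup; the two policies and their thresholds are hypothetical stand-ins.

```python
def run_in_shadow(states, incumbent_policy, candidate_policy):
    """Execute only the incumbent's actions; log where the candidate disagrees."""
    executed, disagreements = [], []
    for state in states:
        action = incumbent_policy(state)    # this is what the robot actually does
        proposal = candidate_policy(state)  # this is only recorded, never executed
        executed.append(action)
        if proposal != action:
            disagreements.append((state, action, proposal))
    agreement_rate = 1 - len(disagreements) / len(states) if states else None
    return executed, disagreements, agreement_rate


# Hypothetical stand-ins for the hand-scripted controller and the ML candidate.
def scripted_policy(distance_m):
    return "stop" if distance_m < 1.0 else "advance"


def learned_policy(distance_m):
    return "stop" if distance_m < 0.8 else "advance"


executed, disagreements, rate = run_in_shadow(
    [0.5, 0.9, 1.5, 3.0], scripted_policy, learned_policy
)
print(executed)
print(disagreements)
print(rate)
```

The logged disagreements give the production team evidence about the candidate without exposing the floor to its mistakes, at the cost of the extra time and attention needed to run and review both systems side by side.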
Another viable option is to present an opportunity for reducing costs — the new model may not perform as well as the existing solution, but if the existing solution requires continual human labour, the cost savings alone may be enough to tip the scales. Cutting down on human labour is perhaps the most straightforward method for introducing ML models into production, especially if that same labour has generated the required training data, and has been doing so for a while.
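The cost argument boils down to simple break-even arithmetic. The sketch below is a back-of-the-envelope illustration with hypothetical figures and a deliberately crude model: it assumes a one-off switching cost, a constant monthly labour saving, and a constant monthly cost from the model's weaker performance.

```python
import math


def months_to_break_even(switching_cost, monthly_labour_saving,
                         monthly_performance_loss):
    """Months until labour savings repay the switch, or None if they never do.

    All three inputs are hypothetical amounts in the same currency.
    """
    net_monthly_benefit = monthly_labour_saving - monthly_performance_loss
    if net_monthly_benefit <= 0:
        # The model's shortfall costs as much as (or more than) the labour it saves.
        return None
    return math.ceil(switching_cost / net_monthly_benefit)


# Savings outweigh the performance shortfall: the switch pays for itself.
print(months_to_break_even(120_000, 15_000, 5_000))

# Savings are smaller than the shortfall: the switch never pays off.
print(months_to_break_even(120_000, 4_000, 5_000))
```

A model that is worse than the incumbent on raw performance can still win this calculation, provided the labour it removes is worth more than the performance it gives up.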
Generally there is a bit of serendipity involved in introducing ML into production. The right combination of business needs, technological opportunities, and talent must come together to sustain an ongoing ML product, one that is trusted to perform in robots without the team back-sliding into alternative approaches. Because even if the model has its foot in the door, your work still isn’t finished. In any dynamic space unexpected issues will emerge. You must have the engineering expertise and MLOps infrastructure at the ready to retrain the model and address these issues quickly, otherwise the business may lose faith in your ability to provide a robust system.
The fact is that ML as a technology is not yet at the stage of maturity where businesses can wholeheartedly trust it in production. Supporting systems, human oversight, and traditional code are still necessary, and are liberally used even by self-styled “AI” companies. I foresee a time in the near future when this will not be the case, and when the vicious cycle described above will not have a chance to gain a foothold in a product’s development cycle. Until then, we in engineering can only wait outside research labs with high hopes and bated breath.
¹ Sim2Real approaches are always less than ideal, and tend to reduce the robot’s already small performance margins.