The current industrial revolution, Industry 4.0, seeks to take advantage of Artificial Intelligence (AI) for process control in manufacturing. The main impediment to this initiative is the current learning paradigm, which requires learning from a large sample of both good and bad actions. Suppose that one wishes to train an AI for process control in a factory using currently available learning methodologies. This requires creating an environment within the factory in which the AI can execute a wide variety of possible control actions and record the relevant outcomes of each. Clearly, such approaches are expensive and can lead to production losses. Their infeasibility is further exacerbated by any real damage (e.g., machine breakdowns) caused by bad control actions taken during training.
To overcome these drawbacks, we have been developing a two-phase learning paradigm that is both intuitive and powerful.
Phase One of this paradigm involves training the AI to learn baseline control policies from recorded expert actions. For this, the AI is trained on readily available historical data, for example, factory-specific parameters, machine ages and conditions, control actions taken, and the quality of the manufactured products. The AI uses such data to learn acceptable action sequences, also known as baseline control policies. The research goal for the first phase is the development of a suite of deep learning algorithms (based on imitation learning) that facilitate training, strictly from recorded data, for a multitude of factory settings and control scenarios.
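As a rough illustration of learning a baseline policy strictly from recorded data, the following minimal sketch uses behavior cloning, the simplest form of imitation learning, with a linear policy in place of a deep network. The state features (machine wear, temperature deviation), the synthetic "expert" rule, and all function names are assumptions made for this example and are not taken from any factory dataset or from the algorithms under development.

```python
import random


def fit_baseline_policy(states, actions, lr=0.3, epochs=500):
    """Behavior cloning: fit a linear policy a = w.s + b to logged
    (state, action) pairs by gradient descent on the mean squared error."""
    n = len(states)
    n_features = len(states[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for s, a in zip(states, actions):
            pred = sum(wi * si for wi, si in zip(w, s)) + b
            err = pred - a
            for j in range(n_features):
                grad_w[j] += 2.0 * err * s[j] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def policy(w, b, state):
    """Baseline control policy: the action recommended for a given state."""
    return sum(wi * si for wi, si in zip(w, state)) + b


# Hypothetical historical log: state = (machine wear, temperature deviation),
# both scaled to [0, 1]; the action is what the "expert" operator did.
rng = random.Random(0)
states = [(rng.random(), rng.random()) for _ in range(200)]
actions = [0.5 * s[0] - 0.3 * s[1] + 0.1 for s in states]  # synthetic expert rule

w, b = fit_baseline_policy(states, actions)
```

No control action is ever executed in the factory during this fit; the policy is recovered entirely from the log, which is the property Phase One relies on.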
While the AI trained in Phase One could be deployed in a factory to maintain the status quo, it cannot respond to changes in the factory environment and machinery, nor can it, in principle, outperform the learned baseline policies. This much-needed adaptation and improvement process constitutes Phase Two of our learning paradigm. The research goal for this phase involves the development of deep learning algorithms for baseline policy improvement (PI) and for finding policies that are robust to changes. We are developing novel critic networks that can ``retain'' baseline policies from Phase One, yet help in policy improvement in Phase Two. In Phase Two the AI is typically deployed in the factory, where it continuously improves upon baseline policies and adapts to changing factory conditions; however, Phase Two can also be realized offline using suitable process models.
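One way to picture offline policy improvement that stays anchored to a baseline is sketched below. It is not the critic-network approach described above; it swaps in a much simpler technique, a bounded residual correction on top of a frozen baseline policy, tuned by random hill climbing against a process model. The baseline coefficients, the process model, the deviation bound `max_dev`, and all function names are hypothetical.

```python
import random


def baseline_policy(state):
    # Stand-in for a frozen Phase One policy (hypothetical coefficient).
    return 0.5 * state


def process_model(state, action):
    # Hypothetical process model: product quality peaks at action = 0.7 * state,
    # i.e. the optimum has drifted away from the baseline.
    return -(action - 0.7 * state) ** 2


def improved_policy(state, gain, max_dev=0.3):
    # Baseline plus a residual correction, clipped so that the action never
    # moves more than max_dev away from the baseline action.
    residual = max(-max_dev, min(max_dev, gain * state))
    return baseline_policy(state) + residual


def average_reward(gain, states):
    total = sum(process_model(s, improved_policy(s, gain)) for s in states)
    return total / len(states)


def improve(states, steps=200, step_size=0.05, seed=0):
    """Random hill climbing on the residual gain, evaluated on the process
    model only. A candidate is kept only if it scores strictly better."""
    rng = random.Random(seed)
    gain = 0.0  # gain = 0 reproduces the baseline policy exactly
    best = average_reward(gain, states)
    for _ in range(steps):
        candidate = gain + rng.uniform(-step_size, step_size)
        score = average_reward(candidate, states)
        if score > best:
            gain, best = candidate, score
    return gain, best


states = [i / 10 for i in range(11)]
baseline_reward = average_reward(0.0, states)
gain, improved_reward = improve(states)
```

Because the search starts at the baseline and accepts only strict improvements on the model, the resulting policy is never worse than the baseline under that model, and the clipping bounds how far any single action can depart from what the experts would have done.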
We believe that research conducted in this direction will usher in an era of data-driven AI in manufacturing, and contribute to making AI business-ready. Further, we believe that our paradigm is general enough to be useful in a wide range of applications where feedback and control play a major role.