How Failure Mode Analytics Streamline the Factory Floor

This article was written and contributed by Anubhav Bhatia, VP Engineering, Intelligent Asset Management at SAP.

Learning from failures is an important part of moving forward in any industry, job or project. The manufacturing environment is no different: equipment failures can provide invaluable operational insight that helps streamline processes and improve safety.

Manufacturing organizations that manage assets may have years of data collected from their machines or equipment. Buried within that data are failure-related notifications describing events associated with each asset (e.g., maintenance performed, error codes displayed, customer complaints, part replacements, work orders). Unfortunately, this valuable information is often hidden in millions of lines of free-form text within notifications and work orders, which makes it difficult, if not impossible, to analyze how often a particular failure occurred in the past or whether some equipment in the fleet fails more often than the rest.

So, while an organization may have a wealth of data on asset performance, without the proper tools in place it cannot derive value from that data or use it to improve business operations.

Getting Started

Failure mode analytics is a system that helps organizations derive value from existing notification data in a way that was previously not possible. It analyzes historical notification texts and assigns the most likely failure mode in a fraction of the time it would take a human to do so manually. Experts validate the resulting failure mode analysis to keep it high quality, and the validated results are fed back into the machine learning engine so that past knowledge informs future failure mode models. The resulting model enables a proactive maintenance approach that avoids unplanned downtime through timely maintenance or replacement of equipment that is near failure.

How it Works

Failure mode analytics creates actionable knowledge from historical data by giving users a way to build a machine learning model for failure mode analysis based on their own historical data. They can then train, validate, and score the model to determine its effectiveness.

During an unsupervised learning stage, the system trains a topic model (either a newly created model or an update to an existing one) for failure mode analysis based on the historical asset data. Using machine learning, it extracts topics from the textual information in the existing notifications and matches those topics to predefined, asset-specific failure modes. The resulting unsupervised model is then stored in a database for ongoing access.
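The article does not say which topic model the system uses. As a sketch only, assuming latent Dirichlet allocation (LDA) fit by collapsed Gibbs sampling, the unsupervised stage could look roughly like this in plain Python: extract topics from tokenized notification texts, take each topic's top keywords, then match every topic to the predefined failure mode whose (hypothetical) keyword set overlaps most. The notification texts, failure-mode keyword sets, and all function names are illustrative assumptions.

```python
import random
import re
from collections import Counter

# Illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def train_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove this token's current assignment from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_keywords(topic_word, n=5):
    """Most frequent words per topic (what an expert would later review)."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


def match_topics(keywords_per_topic, failure_modes):
    """Map each topic index to the failure mode sharing the most keywords."""
    return {
        k: max(failure_modes, key=lambda fm: len(set(kws) & failure_modes[fm]))
        for k, kws in enumerate(keywords_per_topic)
    }


# Hypothetical notification texts and asset-specific failure modes.
notifications = [
    "Pump bearing noisy, high vibration on drive end",
    "Vibration alarm, bearing worn and replaced",
    "Seal leaking oil at pump housing",
    "Oil leak found, mechanical seal replaced",
    "Excessive vibration, bearing temperature high",
    "Leak at seal, oil on floor",
]
failure_modes = {
    "Bearing failure": {"bearing", "vibration", "noisy", "worn"},
    "Seal leakage": {"seal", "leak", "leaking", "oil"},
}

docs = [tokenize(n) for n in notifications]
topics = train_lda(docs, num_topics=2)
keywords = top_keywords(topics)
mapping = match_topics(keywords, failure_modes)
print(keywords)
print(mapping)
```

Keyword overlap is the simplest possible matching rule; a production system might instead compare topic and failure-mode descriptions with embeddings or weighted similarity.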

Further, a streamlined user interface allows users (e.g., subject matter experts, reliability engineers) to validate the unsupervised model and make modifications if necessary. For example, a user can double-check the machine-generated matches between topics and predefined failure modes and reassign a topic to a different failure mode. The interface also provides information about each topic, such as its top keywords.
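The validation step is essentially a human-in-the-loop review of the topic-to-failure-mode mapping. A minimal sketch, assuming a hypothetical `TopicReview` record per topic, might let an expert either confirm a suggested assignment or reassign the topic to another predefined failure mode, while a plain-text listing stands in for the user interface. None of these names come from the product itself.

```python
from dataclasses import dataclass


@dataclass
class TopicReview:
    """Hypothetical review record for one topic of the unsupervised model."""

    topic_id: int
    top_keywords: list
    failure_mode: str  # assignment suggested by the matching step
    validated: bool = False  # True once an expert confirms or changes it


def confirm(review):
    """Expert accepts the machine-suggested failure mode."""
    review.validated = True
    return review


def reassign(review, new_mode, allowed_modes):
    """Expert moves a topic to a different predefined failure mode."""
    if new_mode not in allowed_modes:
        raise ValueError(f"unknown failure mode: {new_mode!r}")
    review.failure_mode = new_mode
    review.validated = True
    return review


def render(reviews):
    """Plain-text listing: topic, top keywords, current failure mode, status."""
    rows = []
    for r in reviews:
        status = "validated" if r.validated else "pending"
        rows.append(
            f"topic {r.topic_id}: {', '.join(r.top_keywords)} -> {r.failure_mode} [{status}]"
        )
    return "\n".join(rows)


allowed = {"Bearing failure", "Seal leakage", "Overheating"}
reviews = [
    TopicReview(0, ["bearing", "vibration", "noisy"], "Bearing failure"),
    TopicReview(1, ["temperature", "overheat", "oil"], "Seal leakage"),
]
confirm(reviews[0])
reassign(reviews[1], "Overheating", allowed)  # expert corrects a mismatch
print(render(reviews))
```

Restricting reassignments to the predefined set keeps expert corrections consistent with the asset's failure-mode catalog, which matters once the validated mapping feeds the supervised stage.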

During a second stage, the system performs supervised learning on the validated model, also referred to as ensemble learning. In this stage, the system uses the model to predict the failure modes associated with notifications, creating mappings on the raw data and providing insight into the model's quality and effectiveness through various metrics (e.g., top failure modes, KPIs). Once the user is satisfied with the results of the supervised learning, the resulting text classification model can be stored and/or provided to a system for monitoring assets, such as a condition-based monitoring platform.
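To make the supervised stage concrete, here is a sketch that assumes notifications have already been labeled with failure modes through the validated topic mapping. The article describes this stage as ensemble learning; for brevity the sketch swaps in a single multinomial naive Bayes text classifier with Laplace smoothing, then reports two of the kinds of metrics mentioned above: accuracy on held-out notifications and the most frequent predicted failure modes. The class name, data, and metric choice are all illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFailureClassifier:
    """Multinomial naive Bayes with Laplace smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (self.total_words[c] + vocab_size)
                    )
            if score > best_score:
                best, best_score = c, score
        return best


def evaluate(model, texts, labels):
    """Accuracy plus predicted failure modes ranked by frequency."""
    predictions = [model.predict(t) for t in texts]
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    return accuracy, Counter(predictions).most_common()


# Hypothetical notifications labeled via the validated topic mapping.
train = [
    ("bearing noisy high vibration", "Bearing failure"),
    ("vibration alarm bearing worn", "Bearing failure"),
    ("seal leaking oil at housing", "Seal leakage"),
    ("oil leak mechanical seal replaced", "Seal leakage"),
]
held_out = [
    ("excessive vibration at bearing", "Bearing failure"),
    ("oil on floor near seal", "Seal leakage"),
]

model = NaiveBayesFailureClassifier().fit(*zip(*train))
accuracy, top_modes = evaluate(model, *zip(*held_out))
print(accuracy, top_modes)
```

Naive Bayes is chosen here only because it fits in a few lines without dependencies; any text classifier (including an actual ensemble) would slot into the same fit/predict/evaluate shape.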

The finished failure mode analytics model receives a notification, identifies which failure mode the notification belongs to, and automatically assigns the most suitable failure mode. Using these assignments, the system can calculate indicators such as MTTF (Mean Time To Failure), MTTR (Mean Time To Repair) and MTBF (Mean Time Between Failures). The technology also gives end users additional detail about failures, such as how often each failure mode appeared in notifications for an equipment model, and it flags when a failure mode is detected more often than the average across all equipment of that model.
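Once failures are assigned, the indicators are straightforward arithmetic. Definitions of MTTF, MTTR and MTBF vary between organizations, so the sketch below assumes one common convention for a single repairable asset: times are in hours from the start of an observation window, each failure is a hypothetical `(failure_time, restored_time)` pair, MTTF averages the uptime before each failure, MTTR averages repair durations, and MTBF averages the gap between consecutive failure onsets. It also sketches the above-average check: a failure mode is flagged for a piece of equipment when its count exceeds that mode's average across the whole fleet of the same model. Function names and data are illustrative.

```python
from collections import Counter
from statistics import mean


def reliability_kpis(events, start=0.0):
    """events: (failure_time, restored_time) pairs in hours, sorted by time."""
    uptimes, repairs = [], []
    last_restore = start  # first uptime runs from the start of observation
    for failed, restored in events:
        uptimes.append(failed - last_restore)
        repairs.append(restored - failed)
        last_restore = restored
    failure_times = [f for f, _ in events]
    gaps = [b - a for a, b in zip(failure_times, failure_times[1:])]
    return {
        "MTTF": mean(uptimes),
        "MTTR": mean(repairs),
        "MTBF": mean(gaps) if gaps else None,  # needs at least two failures
    }


def above_average(assignments, fleet):
    """Flag (equipment, failure mode) pairs whose count beats the fleet average.

    assignments: (equipment_id, failure_mode) pairs for one equipment model.
    fleet: every equipment_id of that model, including ones with no failures.
    """
    counts = Counter(assignments)
    flagged = {}
    for mode in {m for _, m in assignments}:
        avg = sum(counts[(eq, mode)] for eq in fleet) / len(fleet)
        for eq in fleet:
            if counts[(eq, mode)] > avg:
                flagged[(eq, mode)] = counts[(eq, mode)]
    return flagged


kpis = reliability_kpis([(100.0, 104.0), (250.0, 256.0), (400.0, 402.0)])
print(kpis)

fleet = ["P-101", "P-102", "P-103"]
assignments = [("P-101", "Bearing failure")] * 3 + [
    ("P-102", "Bearing failure"),
    ("P-103", "Seal leakage"),
]
print(above_average(assignments, fleet))
```

Passing the fleet explicitly matters: equipment with no notifications still pulls the average down, which is what "compared across all equipment of that model" implies.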

Business Benefits

Failure mode analytics allows companies to get to the root cause of malfunctions and aids in swift remediation. Further, full visibility into product performance can help inform future design efforts, ultimately creating smoother factory operations. Being able to pinpoint equipment failures helps organizations get ahead of potential setbacks, reducing the chance of downtime and the steep costs associated with it.