ABSTRACT
Overall Equipment Effectiveness (OEE) stands as a key performance metric widely adopted in the manufacturing industry to enhance productivity. Then, prominent machi
INTRODUCTION
Rationale
In today's business landscape, the focus on digital technology development is paramount, particularly with the adoption of the Internet of Things (IoT), real-time data access, and Cyber-Physical Systems. In manufacturing, Industry 4.0 provides a more interconnected and comprehensive approach, enabling managers to enhance operational control and understanding, ultimately boosting productivity and fostering growth. To meet rising customer demands and improve revenue while minimizing costs, such as investment, quality, labor, and maintenance, decisions must be made swiftly and effectively throughout the production process, with equipment playing a crucial role.
Overall Equipment Effectiveness (OEE) is a vital metric in manufacturing, rooted in Total Productive Maintenance (TPM) introduced by Nakajima in 1988. OEE evaluates equipment performance and identifies opportunities for improvement, focusing on the continuous enhancement of key parameters to reach optimal OEE. By minimizing machine downtime and reducing equipment-related defects, TPM boosts production line efficiency, lowers costs, minimizes inventory, and enhances labor productivity.
The OEE metric plays a crucial role in manufacturing by improving return on investment (ROI). As machinery constitutes a substantial investment for companies, they actively pursue methods to optimize their ROI. By analyzing OEE data, businesses can effectively showcase the benefits of investing in their machinery and equipment systems. In the initial phases of production, however, equipment utilization efficiency only marginally impacts a company's overall profits.
Increasing production scale and investing in machinery lead to reduced waste and losses, ultimately resulting in higher profitability for businesses. OEE (Overall Equipment Effectiveness) plays a crucial role in enhancing competitiveness by enabling efficient equipment management, which allows companies to optimize asset utilization and minimize the need for additional capital investments. By focusing on continuous improvement through OEE metrics, businesses can identify production constraints and bottlenecks, thus improving their operational efficiency and competitive capabilities within the industry.
The Overall Equipment Effectiveness (OEE) metric provides production managers with a visual representation of the production line's performance, making it easier for businesses to understand production efficiency. By utilizing calculated formulas and tracking production losses, OEE quantifies key factors such as availability, performance, and quality with precise numerical values. This allows companies to evaluate their current production status and pinpoint areas that need prompt improvement.
Senior managers can leverage detailed production insights to implement timely measures for managing factory fluctuations. To enhance production operations effectively, businesses must rely on direct data regarding equipment status. This data-driven approach enables manufacturers to optimize their production lines and machinery, ultimately reducing unplanned downtime and boosting production speed during or after scheduled stops. Analyzing the correlation between performance and losses also indicates the potential for future maintenance, ultimately saving investment costs for production equipment.
Figure 1.1 Production dashboard used in daily meetings, with OEE visualization in the top-right corner
To create a visual dashboard for Overall Equipment Effectiveness (OEE), data is gathered from the Manufacturing Execution System (MES) through tablet computers at each production line station. Operators input essential information, including order details, material codes, operator data, and the quantity of defective products, while also marking the completion of each production stage. The maintenance department contributes by recording details about maintenance personnel and the start and end times for maintenance tasks, such as equipment repairs and inspections. This comprehensive data collection enables the calculation of OEE, utilizing three key components extracted from the MES system.
The operating time of equipment is tracked through the MES system, where operators must log the start and end times for each order. This data reflects the duration of equipment usage for specific orders at a designated station.
In manufacturing, downtime is defined as the time when machinery or equipment is non-operational. The Manufacturing Execution System (MES) mandates that operators log downtime whenever equipment fails or produces inconsistent product quality. The downtime is officially recorded from the moment the issue arises, while maintenance personnel begin tracking the "fixtime" once they input maintenance-related details into the system.
To ensure high product quality, operators must accurately log the quantity of defective items and their associated error codes at different production stages before finalizing an order. This critical data is subsequently recorded and stored for quality control purposes.
In the factory case study, Overall Equipment Effectiveness (OEE) serves as a crucial Key Performance Indicator (KPI). Data flows from the Manufacturing Execution System (MES) to SQL, where it is processed and visualized on a KPI board. The production team analyzes issues and monitors equipment efficiency during daily meetings, allowing high-level management to understand the assembly line's production status. This enables the identification of necessary improvement measures when OEE scores are suboptimal or when unexpected drops occur, fostering informed decision-making and proactive problem-solving in manufacturing processes.
The Manufacturing Execution System (MES) empowers production managers to monitor orders in real time by collecting data directly from workers, enhancing decision-making efficiency. This real-time reporting of Overall Equipment Effectiveness (OEE) enables timely interventions to address potential risks affecting the production line due to equipment failures. Equipment-related issues can severely impact production capabilities, resulting in unmet orders and compromised product quality. To counter these challenges, daily maintenance and repairs are performed based on production needs and past experience with equipment malfunctions.
Maintenance personnel must proactively predict potential equipment failures caused by specific issues. However, documenting and conveying experiential knowledge remains difficult. The key challenge is creating a tool that can accurately replicate this often-hidden source of knowledge.
Based on this evidence, a predictive OEE model is proposed to serve the following purposes:
1. Automating Overall Equipment Effectiveness (OEE): By leveraging data science techniques, including machine learning predictive models and algorithms, the model aims to evaluate the productivity and performance of manufacturing operations. It combines three key factors: availability, performance, and quality. Traditionally, OEE calculations were manual, but with advancements in data science, automation can provide valuable insights for optimizing manufacturing processes.
2. Real-time Predictions: By feeding real-time data into the trained model, it can predict OEE performance. These predictions help identify bottlenecks, optimize machine utilization, and support data-driven decisions to improve overall equipment efficiency.
Research objective
This research investigates algorithms to predict Overall Equipment Effectiveness (OEE) based on historical data collected from an assembly production line in Viet Nam.
Research scope
The study was conducted in an assembly production line in Viet Nam.
Research structure
This research article is organized into five chapters: Chapter II offers a literature review on the implementation of prediction for Overall Equipment Effectiveness, while Chapter III details the methodology, focusing on the algorithms used in the study. Chapter IV showcases a case study conducted on an assembly production line in Vietnam, and the article concludes with insights and suggestions for future research in Chapter V.
LITERATURE REVIEW
Overall Equipment Effectiveness Metric
Overall Equipment Effectiveness (OEE) is a vital performance measurement tool that assesses equipment efficiency by analyzing three key factors: Availability, Performance, and Quality. It identifies equipment-related losses to enhance production line performance and stability, categorizing the significant factors that contribute to low efficiency. OEE provides valuable insights for prioritizing improvements and conducting root cause analysis, while also revealing production blind spots, particularly in low-demand product lines, which aids in workload balancing. Additionally, OEE facilitates real-time monitoring of equipment efficiency through advanced information systems such as Manufacturing Execution Systems (MES).
OEE (Overall Equipment Effectiveness) is a crucial metric for assessing the performance and efficiency of manufacturing processes and equipment. It highlights how effectively equipment is utilized and its operational efficiency in producing goods or delivering services. OEE aims to pinpoint losses that hinder equipment efficiency, which are actions that waste resources without adding value. As noted by Jonsson and Lesshammar (1999), these losses can stem from chronic disruptions, which are minor and arise from multiple causes, or sporadic disruptions, which are more critical as they occur suddenly and significantly impact quality. Nakajima further classified OEE development into six distinct types of sporadic losses.
Unplanned machinery and equipment failures lead to significant downtime losses, making them one of the most noticeable disruptions in production activities. These failures disrupt continuous operations in the factory, highlighting the critical importance of maintaining equipment availability for optimal productivity.
Losses can arise from various factors, including damage to molds or jigs, unexpected maintenance activities, mechanical or electrical failures, and instances where equipment fails to meet operational requirements or technological parameters.
Availability is significantly impacted by downtime losses and planned stops, particularly during equipment installation and adjustment to meet customer demands. These losses can be less noticeable during production due to the standardization of setup processes. Key examples include the time spent on setup and startup at the beginning of shifts, as well as adjustments made when changing orders, fixtures, technological parameters, and dealing with shortages or changes in raw materials.
Performance losses due to idle running and short stops occur when production is interrupted by temporary breakdowns or insufficient manpower. For instance, contamination of a mechanical component can lead to brief equipment downtime; recovery time is quick, but losses accumulate with frequency. Short stops, lasting under 5 minutes and not requiring maintenance staff, often arise from assembly line flow issues, jamming, improper feeding of parts, incorrect materials, obscured sensors, or minor malfunctions in subsequent stages.
Performance – Speed losses – Slow cycles: losses due to reduced speed, resulting from differences between the designed process and the operating conditions that can actually be achieved.
In some cases, equipment is operated at a lower speed than the designed or standard speed, causing losses to OEE. Situations contributing to this loss include operating equipment under unsuitable conditions (environment, materials, etc.), operating equipment at a slower speed than the designed or standard operation, equipment or components being dirty, worn, or damaged, and operators lacking competence.
Quality losses in manufacturing arise from defective products and the need for rework, often triggered by equipment malfunctions. These losses encompass items produced under the assumption of normal operational conditions, leading to immediate rework, scrap, and quality issues in later stages if defects are not identified.
Quality losses during startup can significantly impact overall equipment effectiveness (OEE) due to both machine downtime and the production of defective products. When equipment is being started or adjusted, some defects may be detected early for elimination or repair, but others may slip through and affect later processes, leading to more substantial quality issues. Addressing these startup losses is crucial to minimize waste and enhance productivity.
Figure 2.1 OEE measurement tools and industry integration aspects [1]
Overall Equipment Effectiveness (OEE) is an essential manufacturing metric that measures the true productivity of production time. It encompasses three main components: availability, which compares actual production time to planned time; performance, assessing equipment efficiency against its maximum potential; and quality, focusing on the rate of defect-free products. By calculating OEE through the multiplication of these factors, manufacturers can pinpoint areas for optimization, monitor progress, and enhance overall efficiency.
The Overall Equipment Effectiveness (OEE) is calculated using the formula OEE = A × P × Q, where A represents Availability. Availability accounts for the operating rate by considering various factors that impact a machine's usability, including both unplanned and planned downtime that may influence expected production activities over an extended period.
Performance (P) measures the efficiency rate by accounting for factors that lead to loss or reduced output. This includes elements that cause manufacturing equipment to operate below its maximum speed, such as slow cycles and minor pauses during operation.
Quality (Q) is the quality rate, which assesses product quality by accounting for items that do not meet quality standards, including products requiring rework.
OEE prediction research
Machine learning (ML) models are increasingly being utilized in manufacturing to predict key performance indicators. These models excel in learning, generalization, and adaptability, making them suitable for various applications. For instance, Kuo & Lin integrated neural networks and decision trees to forecast availability efficiency, closely linked to Overall Equipment Effectiveness (OEE), in washing machines. Similarly, Mazgualdi et al. employed several algorithms, including SVM, RF, and XGBoost, to predict OEE in automotive cable production, with ANN and RF showing the highest accuracy. Additionally, De Souza et al. utilized both supervised and unsupervised learning models to analyze historical data for estimating production line efficiency, yielding valuable insights into equipment behavior and identifying patterns to optimize production processes.
Numerous supervised machine-learning models have been developed for predicting Overall Equipment Effectiveness (OEE), with the Bayesian Ridge Regression model achieving remarkable accuracy exceeding 99%. Lukas et al. implemented a machine learning model utilizing sensor data to predict OEE, validated through real-world data from a continuous extrusion process in the plastics industry. Additionally, Lucantoni et al. investigated rule-based machine learning techniques, which led to a significant OEE improvement of 5.4% by effectively selecting enhancement actions.
Table 2-1 Overview of OEE prediction literature
Literature Objective Application field Model Input
Availability of the previous 8 weeks
Supervised and unsupervised learning models
Extrusion process in the plastics industry
Time, OEE factors, failure mode
LR, SVM, RF, XGBoost, ANN
Predicting Overall Equipment Effectiveness (OEE) is achievable across various industries, primarily using supervised learning models that exhibit low prediction errors. Different fields necessitate distinct input data; for example, some studies utilize data from the past eight weeks, while others consider product characteristics and OEE formulation factors. Additionally, certain research employs device sensor inputs, and other work analyzes historical failure modes to enhance predictions and propose solutions. Most of the production environments studied are relatively straightforward, featuring few processes and machines. In contrast, this study examines a complex assembly line with 49 production processes dedicated to miniature electronic devices, integrating insights from diverse literature sources to combine time and OEE factors, thereby confirming the viability of OEE prediction in this intricate setting.
METHODOLOGY
Overall Equipment Effectiveness
In the context of Total Productive Maintenance (TPM), Overall Equipment Effectiveness (OEE) is assessed through six primary losses categorized into three key groups: Availability, Performance, and Quality. Availability measures the percentage of machine downtime, encompassing both unplanned and planned stoppages. Performance evaluates the operational speed relative to small stops, such as idle time and brief interruptions, as well as slow cycles characterized by reduced speed. Quality assesses the proportion of acceptable products by accounting for production rejects, including defective items and those requiring rework, along with startup rejects caused by inconsistent equipment productivity.
Overall Equipment Effectiveness (OEE) is defined as the product of these three factors, as shown in Eq. (1).
Availability (A) refers to the operating rate of a machine, taking into account various factors that impact its usability. This includes both unplanned and planned downtime that can influence expected production activities over an extended period.
Performance measures the efficiency of manufacturing equipment by evaluating the impact of factors that reduce operational speed, such as slow cycles and minor pauses, which prevent the machinery from reaching its maximum potential.
Quality (Q) is the quality rate, which assesses product quality by accounting for items that do not meet quality standards, including products requiring rework.
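For reference, the three ratios can be written explicitly as below. This is a minimal formulation consistent with how the case study computes them in Section 4.3 (running time as planned production time minus downtime, output and reject counts taken from the MES records); the numbering (1)-(4) is chosen to match the equations referenced elsewhere in the text.

```latex
\begin{align}
\mathrm{OEE} &= A \times P \times Q \tag{1}\\[2pt]
A &= \frac{\text{Running time}}{\text{Planned production time}}
   = \frac{\text{Planned production time} - \text{Downtime}}{\text{Planned production time}} \tag{2}\\[2pt]
P &= \frac{\text{Actual output}}{\text{Planned output}} \tag{3}\\[2pt]
Q &= \frac{\text{Good quantity}}{\text{Total quantity}}
   = \frac{\text{Total quantity} - \text{Reject quantity}}{\text{Total quantity}} \tag{4}
\end{align}
```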
AI-based OEE prediction models
AI-driven OEE prediction models utilize supervised learning techniques, as illustrated in Fig. 3.1 [18]. These models forecast the Overall Equipment Effectiveness (OEE) for the upcoming shift by analyzing historical data gathered from prior shifts.
An investigation is initiated to analyze business objectives, current conditions, and existing challenges to define the model goals, with a particular emphasis on forecasting Overall Equipment Effectiveness (OEE). Subsequently, relevant data is identified and collected, undergoing essential statistical procedures such as descriptive analysis and exploration to guarantee data quality.
In Step 3, the selected data is meticulously prepared by structuring it into the desired format and applying data cleaning techniques to eliminate outliers and inconsistencies. Symbolic and categorical data is converted into numerical formats, while numerical data undergoes normalization through methods such as standardization. This thorough preparation ensures that the data is ready for building machine learning models in Step 4.
Subsequently, five prominent data mining methods, namely Linear Regression [19], Support Vector Regression [20], Random Forest [21], Extreme Gradient Boosting [22], and Artificial Neural Networks [23], are applied, and their effectiveness is assessed through the performance metrics detailed in Section 3.4, with outcomes compared against the established business objectives.
Various visualization packages and tools are utilized to enhance communication and contextualization of the data. In the final step, the selected model is used to forecast OEE.
Figure 3.1 Step-by-step data processing workflow.
Algorithm
Artificial Intelligence (AI) has become integral to our daily lives, evident in technologies such as facial recognition on smartphones, Apple's virtual assistant, and e-commerce product recommendations. Its influence is expanding into manufacturing, where AI is utilized for quality control and predictive maintenance, showcasing its versatility and growing importance across various sectors.
Machine Learning (ML), a key component of artificial intelligence, utilizes algorithms that learn from input data to forecast output values. These algorithms are primarily classified into three categories, based on their learning methodologies: supervised learning, unsupervised learning, and reinforcement learning.
1. Supervised Learning: This algorithm predicts the output of new input data based on known input-output pairs. It has two fundamental applications: classification and regression. Classification involves assigning data points to predefined categories, while regression predicts continuous values.
2. Unsupervised Learning: In contrast, unsupervised learning relies on the structure of input data (without knowing the output) to perform tasks such as clustering or dimension reduction. Clustering groups similar data points together, while dimension reduction simplifies data storage and computation.
3. Reinforcement Learning: This type of algorithm automatically determines output values to maximize benefits. It is commonly used in scenarios where an agent interacts with an environment and learns from feedback.
In this research, we address the Overall Equipment Effectiveness (OEE) problem by calculating the predicted value for the next production shift using input-output pairs from prior shifts. To achieve this, we employ supervised learning algorithms, particularly regression techniques, which are essential for optimizing processes and enhancing efficiency across multiple industries.
Figure 3.2 Prediction approach of the thesis
Linear refers to a straight or flat relationship in mathematics. In two-dimensional space, a function is linear if its graph is a straight line, while in three-dimensional space, it forms a plane. For higher-dimensional spaces, the concept extends to hyperplanes. In regression analysis, the linear regression algorithm identifies the best-fitting line, plane, or hyperplane based on the input data, aiming to minimize the error across the training dataset.
- Weight: a column vector containing the coefficients to be optimized during the regression process:

$\mathbf{w} = [w_0, w_1, \dots, w_d]^T$

- Input data: a row vector containing the input features, extended with a leading 1 for the bias term:

$\bar{\mathbf{x}} = [1, x_1, x_2, \dots, x_d]$

The linear equation of the regression algorithm is then:

$\hat{y} = \bar{\mathbf{x}}\mathbf{w}$

The error $e$ represents the discrepancy between the actual value $y$ and the predicted value $\hat{y}$:

$e = y - \hat{y} = y - \bar{\mathbf{x}}\mathbf{w}$

The objective is to determine the weight vector $\mathbf{w}$ that minimizes this error, typically written as $\frac{1}{2}e^2$. The fraction $\frac{1}{2}$ simply facilitates the calculation when the algorithm takes a derivative to find the optimal value, without affecting the prediction results.

b. Loss function

For linear regression problems, the data provided to the model are input-output pairs

$(\mathbf{x}_i, y_i), \quad i = 1, 2, \dots, N \qquad (9)$

where $N$ is the number of observed data points. The objective of the problem is to minimize the error with the following loss function:

$\mathcal{L}(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N} \left(y_i - \bar{\mathbf{x}}_i \mathbf{w}\right)^2 = \frac{1}{2}\left\| \mathbf{y} - \bar{\mathbf{X}}\mathbf{w} \right\|_2^2$

To minimize the loss function, it is essential to identify the coefficient vector $\mathbf{w}$ that achieves the lowest possible value; in other words, the goal is to determine the optimal point of the weight vector $\mathbf{w}$. Taking the derivative of the loss function with respect to the weights gives

$\frac{\partial \mathcal{L}(\mathbf{w})}{\partial \mathbf{w}} = \bar{\mathbf{X}}^T\left(\bar{\mathbf{X}}\mathbf{w} - \mathbf{y}\right)$

Setting this derivative to zero is equivalent to:

$\bar{\mathbf{X}}^T\bar{\mathbf{X}}\,\mathbf{w} = \bar{\mathbf{X}}^T\mathbf{y}, \quad \text{i.e. } A\mathbf{w} = \mathbf{b}$

If the square matrix $A \triangleq \bar{\mathbf{X}}^T\bar{\mathbf{X}}$ is invertible, the equation has the solution $\mathbf{w} = A^{-1}\mathbf{b}$.

If matrix $A$ is not invertible, indicated by a determinant of zero, the equation may either have no solutions or infinitely many solutions. In such cases, the pseudoinverse, denoted $A^{\dagger}$, is used to obtain the optimal solution $\mathbf{w} = A^{\dagger}\mathbf{b}$.
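As a concrete illustration of the closed-form solution above, the following NumPy sketch fits the weights with the pseudoinverse, which covers both the invertible and the singular case of A. The toy data and variable names are illustrative only and do not come from the thesis code.

```python
import numpy as np

# Toy training data: N observations with d features (illustrative values only).
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y = np.array([3.1, 2.6, 4.4, 6.9])

# Add a bias column of ones to form X_bar.
X_bar = np.hstack([np.ones((X.shape[0], 1)), X])

# Closed-form least squares: w = pinv(X_bar^T X_bar) @ (X_bar^T y).
# np.linalg.pinv also handles the case where A = X_bar^T X_bar is singular.
A = X_bar.T @ X_bar
b = X_bar.T @ y
w = np.linalg.pinv(A) @ b

y_hat = X_bar @ w  # predictions on the training data
print("weights:", w)
print("training MSE:", np.mean((y - y_hat) ** 2))
```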
Support Vector Machine (SVM) is a widely used machine learning technique for classification tasks, while Support Vector Regression (SVR) applies similar principles but focuses on predicting continuous real numbers. The key distinction between SVM and SVR is that SVR addresses a regression problem, which involves predicting a nearly infinite range of values. Ultimately, both methods aim to minimize prediction error and maximize the margin between data points.
In the regression problem, the main goal is to find a continuous function based on the initial training data set. For SVM regression, the input value $x$ is mapped into an $m$-dimensional feature space using fixed (nonlinear) mappings, and a linear model is then built in this space:

$f(x, \mathbf{w}) = \sum_{j=1}^{m} w_j\, g_j(x) + b$

where the functions $g_j(x)$ represent a set of nonlinear transformations.
Figure 3.3 Support Vector Machine model in two-dimensional space
SVM regression performs linear regression in the multidimensional feature space using the $\varepsilon$-insensitive loss, while trying to reduce model complexity by minimizing $\|\mathbf{w}\|^2$. Two slack variables $\xi_i, \xi_i^*$, $i = 1, \dots, n$, measure the deviations of training data outside the $\varepsilon$ region. SVM regression is then formulated to minimize the following function:

$\min \; \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)$

subject to

$y_i - f(x_i, \mathbf{w}) \le \varepsilon + \xi_i^*, \quad f(x_i, \mathbf{w}) - y_i \le \varepsilon + \xi_i, \quad \xi_i, \xi_i^* \ge 0$

The optimization problem can be replaced by its dual problem, whose solution is

$f(x) = \sum_{i=1}^{n_{SV}} \left(\alpha_i - \alpha_i^*\right) K(x_i, x), \qquad 0 \le \alpha_i, \alpha_i^* \le C$

where $n_{SV}$ is the number of support vectors and $K(x_i, x)$ is a kernel function.
Figure 3.4 Role of Kernel function in SVM/SVR
The kernel function plays a crucial role in identifying a hyperplane within a higher-dimensional space of a dataset, enabling classification without increasing computational demands. It is particularly useful when a suitable plane cannot be found to classify the data effectively. The accuracy of Support Vector Regression (SVR) largely hinges on user-defined meta-parameters, including C, ε, and the kernel function. The parameter C balances model complexity with the tolerance for data points outside the ε region, while ε determines the width of this region, influencing the optimization process.
The size of the ε value relative to the training dataset influences the selection of support vectors, with larger ε values resulting in fewer selected vectors. Both the real numbers C and ε play a significant role in determining model complexity, albeit through different trends. Additionally, there are four commonly utilized types of kernel functions, as outlined in the table below.
Support Vector Machines (SVM) are applied to regression problems by identifying a function that maps input data in a multidimensional space to predict real values. In this context, the two red lines represent decision boundaries, while the blue line denotes the hyperplane. The primary objective of Support Vector Regression (SVR) is to determine a hyperplane function that encompasses the maximum number of training data points within the smallest ε margin.
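A brief scikit-learn sketch of SVR with the meta-parameters discussed above (C, ε, and the kernel). The data and parameter values here are placeholders rather than the thesis dataset or its tuned settings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))                # placeholder features
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)     # placeholder target

# One SVR model per kernel; C trades model complexity against tolerance
# for points outside the epsilon-insensitive region, epsilon sets its width.
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = make_pipeline(StandardScaler(),
                          SVR(kernel=kernel, C=100.0, epsilon=0.1, gamma="scale"))
    model.fit(X, y)
    print(kernel, "training R^2:", round(model.score(X, y), 3))
```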
Random Forest Regression (RF) is an ensemble regression algorithm that utilizes the collective computation of multiple decision trees to forecast a specific variable value. Upon receiving a set of input values, RF generates several decision trees based on the data's characteristics, and it predicts outcomes by averaging the predictions from all the individual trees.
Figure 3.5 Random Forest Regression model
Suppose the data set has n observations, each with d attributes. The sequence for building each decision tree is as follows:
1. Randomly sample a subset of the data.
2. Randomly select a subset of the attributes.
3. Use the Decision Tree algorithm to build a decision tree with the data selected above.
4. Repeat similarly for the remaining decision trees.
5. Calculate the average of the predicted values of the decision trees.
A decision tree is a fundamental data structure in machine learning that systematically partitions a dataset into smaller subsets based on specific features. This process continues until only a single class remains in each subset, effectively categorizing the data. For instance, a decision tree may use a sequence of yes or no questions to incrementally divide the data into distinct categories.
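The bootstrap-and-average procedure above corresponds to scikit-learn's RandomForestRegressor; the sketch below uses placeholder data and hyperparameter values, not those of the case study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))                                      # placeholder features
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)   # placeholder target

# n_estimators = number of decision trees; max_features = number of attributes
# randomly considered at each split; the forest prediction is the tree average.
rf = RandomForestRegressor(n_estimators=200, max_features="sqrt", random_state=42)
rf.fit(X, y)
print("prediction for the first sample:", rf.predict(X[:1])[0])
```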
Model Performance Metrics
Performance metrics are essential for evaluating predictive models, with key indicators including Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and R-squared or adjusted R-squared. These metrics use the actual values (y_t), predicted values (ŷ_t), and residuals (ε_t) to assess the accuracy of predictions, considering the number of observations (n) and features (p).
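For completeness, the standard definitions of these metrics are given below, written with the notation used above (actual value $y_t$, predicted value $\hat{y}_t$, residual $\varepsilon_t = y_t - \hat{y}_t$, $n$ observations, $p$ features).

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{n}\sum_{t=1}^{n} |\varepsilon_t|, &
\mathrm{MSE}  &= \frac{1}{n}\sum_{t=1}^{n} \varepsilon_t^2, &
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}},\\
\mathrm{MAPE} &= \frac{1}{n}\sum_{t=1}^{n} \left|\frac{\varepsilon_t}{y_t}\right|, &
R^2 &= 1 - \frac{\sum_{t} \varepsilon_t^2}{\sum_{t} (y_t - \bar{y})^2}, &
R^2_{\mathrm{adj}} &= 1 - (1 - R^2)\,\frac{n-1}{n-p-1}.
\end{align}
```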
Cross validation
Cross-validation is a crucial technique for assessing the effectiveness of predictive machine learning models. It involves dividing a dataset into multiple folds, allowing the model to be trained and tested on different subsets. For instance, with 5-fold cross-validation, the dataset is split into five parts, and the model is trained five times, each time using a different combination of training and testing data. This method helps to prevent overfitting, which can occur when a model is trained on a fixed dataset and fails to generalize well to new input data.
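A minimal 5-fold cross-validation sketch with scikit-learn; the estimator, synthetic data, and scoring choice are illustrative assumptions, not the thesis configuration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=0.2, random_state=0)

# Each of the 5 folds serves once as the test set while the model
# is trained on the remaining 4 folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestRegressor(random_state=0), X, y,
                         cv=cv, scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)
print("mean MAE:", -scores.mean())
```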
CASE STUDY
Business understanding
A case study was carried out at an electronic components assembly company in Vietnam, aimed at predicting the Overall Equipment Effectiveness (OEE) index. The study sought to identify strategies for enhancing equipment performance across production lines.
The assembly process consists of 49 stations, following an S-line layout, which includes critical operations such as laser marking, laser welding, gluing, coil winding, soldering, and performance measurement of final products. Each station is equipped with a PC control unit or tablet that communicates with the MES system, facilitating real-time data transfer to a central storage system. Users can efficiently collect and analyze this data through Microsoft SQL Server, ensuring precise control over the manufacturing of products with extremely small dimensions.
Data understanding
Figure 4.3 Data process flow of the case study
Data collected from SQL encompasses four distinct fields, including cycle time data for each order as it moves through various workstations. In fully automated stages, the system automatically sends a start signal to the MES system, and upon completion at the station, a signal is sent to mark the end of the order. For semi-automatic or manual stations, operators must transmit this signal to the MES system using tablets at their workstations. The recording time is divided into two components, MES activity time and processing time, with cycle time being the sum of both. This process captures essential information such as time, order details, cycle time, and the operation name or position.
Second, the line error data field provides information related to time, the number of defective products for each workstation, and failure modes. Similar to the cycle time data field, this data is recorded by operators or through the automated inspection and measurement system.
The downtime data field is crucial for tracking equipment issues, complementing the previously mentioned data fields. It includes the manually recorded fixtime, which maintenance staff must log after submitting equipment inspection orders prior to repairs or installations. Timely response from maintenance personnel is essential to minimize production disruptions; however, delays may occur if resources are unresponsive or if specialized tools are needed, resulting in a discrepancy between downtime and fixtime. Additionally, this data field captures information about the errors faced by maintenance and the methods employed to resolve them.
Finally, the customer demand data directly affects orders on the production line. It includes the quantity of each product series.
Data preparation
This case study collected data from early 2022 to the end of 2023, requiring processing to create a structured dataset for prediction modeling. The dataset includes columns representing features for input and rows for training and testing observations. The processed data is used to compute Overall Equipment Effectiveness (OEE), as illustrated in Fig. 4.4. Key time-related fields include date, week, month, shift, and day of the week, which facilitate the calculations. By integrating the cycle time data, actual production time is assessed alongside downtime to compute running time and the availability ratio. Additionally, the Demand data field contains planned output quantities, which are compared with actual output from the cycle time data field to compute the performance ratio. The last one is the quality ratio, which is calculated from the line error data field using the good and reject quantities.
Figure 4.4 Data processing to create label dataset
Related to the "Cycle time" data field, the column names are listed in Table 4-1. Seven columns are collected for the OEE calculation: week (column 1), shift (column 3), date_time (column 4), item_configuration (column 14), yield (column 18), scrap (column 19), and total_time (column 25). The time-related columns (1, 3, and 4) are separated into five distinct columns for integration purposes: date, week, month, working shift, and day of the week.
Figure 4.5 Example of related-time data
Figure 4.6 Example of related-time data after processing
The production line consists of 49 working stations, and the planned production time is calculated as the average of the total time across all stations (column 25). Using Python and the pandas library, the related time data is grouped and summed to determine the total time, which is then divided by 49 to find the average. Additionally, downtime is collected from the "Downtime" data field. Running time is then computed as the planned production time minus downtime, and availability is defined from Equation (2).
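A simplified pandas sketch of this aggregation is shown below. The column names and example values are hypothetical stand-ins modeled on Tables 4-1 and 4-2; the actual queries of the case study are not reproduced here.

```python
import pandas as pd

N_STATIONS = 49  # number of working stations on the line

# Hypothetical extracts of the "Cycle time" and "Downtime" data fields.
cycle = pd.DataFrame({"week": [1, 1], "shift": ["A", "B"],
                      "total_time": [42_000.0, 40_500.0]})   # seconds, per station rows
down = pd.DataFrame({"week": [1, 1], "shift": ["A", "B"],
                     "downtime": [3_600.0, 1_800.0]})        # seconds

keys = ["week", "shift"]
planned = cycle.groupby(keys)["total_time"].sum() / N_STATIONS   # planned production time
downtime = down.groupby(keys)["downtime"].sum() / N_STATIONS     # averaged over stations
running = planned - downtime
availability = (running / planned).rename("availability")
print(availability)
```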
Table 4-1 Column list of “Cycle time” data field
No Column name Description Collected data
1 Week The recorded week of the order at each working station X
2 Date The recorded date of the order at each working station
3 Shift The recorded shift of the order at each working station X
4 Date_time The recorded time of the order at each working station X
5 Pool The variable to get data from the SQL server; a pool represents a production line
6 Warehouse Warehouse position where the order will be stored
7 Mes_order_number Mes code of the order
8 Order_status Status of the order
9 Op_type Level of OP
11 Mes_operation_number Mes code of working station
12 Operation_name Working station name
13 Operation position Position of the working station as an integer (1~49)
14 Item_configuration Code of product X
15 Product name Name of product
16 Last_operation The variable to track which one is the last operation of the order
17 Workplace_machine Code of machine
18 Yield Number of good products after assembling process X
19 Scrap Number of failed products after assembling process X
20 Rework Number of “hot” rework products after assembling process
21 Scheduled_start_time Plan to start order
22 Scheduled_end_time Plan to end order
23 First_logon_time Time when operators start the order at working station
24 Last_logoff_time Time when operators end the order at working station
25 Total time Calculated time based on 23 and 24 columns X
26 Modified_at Last modified time of the observation
Related-time data is processed similarly to the "Cycle time" data field, serving as a unified form of time-related data for integration purposes. For the OEE calculation, five key columns are collected: week (column 1), shift (column 3), date_time (column 4), downtime (column 23), and fixtime (column 24). Notably, downtime is averaged across the 49 working stations.
Table 4-2 Column list of “Downtime” data field
No Column name Description Collected data
1 Week The recorded week of the order at each working station X
2 Date The recorded date of the order at each working station
3 Shift The recorded shift of the order at each working station X
4 Date_time The recorded time of the order at each working station X
5 Pool The variable to get data from the SQL server; a pool represents a production line
6 Event_id Id of trouble event
7 Event_name Name of trouble event
8 Msg_subject Working station code
9 Machine name Name of machine
10 Machine id ID of machine
11 Msg_body First initial assessment of trouble from operator
12 Information Chanel to inform the trouble
15 End_remark Comment from maintenance at the end of fixing
18 Cause of trouble Cause of trouble
20 Msg_date Recorded when machine start being broken
21 Read_date Recorded when maintenance start fixing machine
22 End_date Recorded when maintenance is done fixing
23 Downtime Calculated downtime from columns 20 and 22 X
24 Fixtime Calculated fixtime from columns 21 and 22 X
25 Modified_at Last modified time of the observation
To calculate the performance ratio, the actual output is determined by aggregating the related time data and summing the yield column, which is then divided by 49, the number of workstations. The planned output is obtained from the production planner's data, which is not stored in the MES system. The performance ratio is then defined by Equation (3).
Table 4-3 Column list of “Line error” data field
No Column name Description Collected data
1 Week The recorded week of the order at each working station X
2 Date The recorded date of the order at each working station
3 Shift The recorded shift of the order at each working station X
4 Date_time The recorded time of the order at each working station X
5 Pool The variable to get data from the SQL server; a pool represents a production line
6 Warehouse Warehouse position where the order will be stored
7 Mes_order_number Mes code of the order
8 Mes_operation_number Mes code of the working station
9 Operation name Name of the working station
10 Operation position Position of the working station as an integer (1~49)
11 Gross The number of order gross X
12 Item_configuration Code of product
13 Product name Name of product
14 Reason code Code of failure mode
16 Error The number of error X
17 Modified_at Last modified time of the observation
The quality ratio for the OEE calculation is derived from five key columns: week, shift, date_time, gross, and error. Total errors are calculated by summing all errors based on the corresponding time data, while the total count is obtained by adding the gross values. The quality ratio is computed using Equation (4), and the overall OEE ratio is determined by multiplying the three essential components: availability, performance, and quality.
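Continuing the earlier sketch, the performance and quality ratios and the resulting OEE label can be computed per working shift along the following lines; the column names and values are hypothetical stand-ins for the MES and demand fields.

```python
import pandas as pd

keys = ["week", "shift"]

# Hypothetical per-shift aggregates produced by the earlier group-by steps.
df = pd.DataFrame({"week": [1, 1], "shift": ["A", "B"],
                   "availability": [0.92, 0.96],
                   "actual_output": [1850, 1900],   # sum of yield / 49 stations
                   "planned_output": [2000, 2000],  # from the planner's demand file
                   "gross": [1900, 1950],           # total units inspected
                   "error": [25, 18]})              # defective units

df["performance"] = df["actual_output"] / df["planned_output"]
df["quality"] = (df["gross"] - df["error"]) / df["gross"]
df["oee"] = df["availability"] * df["performance"] * df["quality"]
print(df[keys + ["oee"]])
```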
A data processing workflow was established to generate a feature dataset, where columns signify features and rows denote observations. This dataset integrates four data fields from the Manufacturing Execution System (MES), with a key focus on time-related information, including date, week, month, shift, and day of the week, to consolidate all data fields into a single cohesive dataset.
A set of predictors involves three main data groups: time-related data, production- related data, and maintenance-related data, as summarised in Table 4-1
Time-related data includes essential metrics such as date, month, shift, and weekday, which are crucial for tracking production schedules. On the production assembly line, diverse products are assembled using uniform machinery, ensuring consistency in the assembly process. Each product follows the same operational workflow, despite variations in design. The overall production output is determined by the primary order quantity along with any necessary rework quantities. Operators maintain fixed work positions throughout their shifts, enhancing efficiency and productivity on the assembly line.
Operators undergo a minimum of one month of training before integrating into the main production line, ensuring a consistent skill level across the team. The facility features 20 distinct product lines, each differing in materials, assembly processes, and intended applications.
Maintenance-related data: data related to maintenance, which contains scheduled inspections, repairs, replacements, and overall asset health. From such data, downtime and fixtime can be calculated.
Figure 4.8 Data process flow to create feature dataset
Category Data Type of variables
Number of error products Integer
Actual output of the previous working shift Integer
OEE value of the previous working shift Integer
Number of orders per product series (20 product series available)
Actual total production time Integer Unit= secs
Fixed Time Integer Unit= secs
Table 4-5 Data collected before modeling
Category Data Data field Note
Week, Month, Working shift, Day of week; grouped in each data field
Number of error products; Line error; group by time-related columns and sum of the error column
Actual output of the previous working shift; group by time-related columns and sum of the yield column
OEE value of the previous working shift; OEE label; collected from the OEE label as in Section 4.3.1
Number of orders per product series; Demand; the demand file stores the production plan per working shift and product series
Actual total production time; Cycle time; group by time-related columns and sum of the total_time column
Downtime; Downtime; group by time-related columns and sum of downtime
Fixed time; Fixtime; group by time-related columns and sum of fixtime
This case study analyzes data collected from early 2022 to the end of 2023, encompassing 31 features and 1,639 observations, with Overall Equipment Effectiveness (OEE) as the label. Each observation corresponds to a single working shift. While OEE remains relatively stable throughout the study, certain shifts exhibit unacceptable levels of performance, highlighting the necessity for OEE prediction to facilitate system adjustments. The dataset was divided into two groups: 90% for training and 10% for testing.
Modeling and evaluation
Before modeling, the data undergoes a cleaning process to eliminate noise, including NaN values and duplicates. Subsequently, the data is standardized to ensure a balanced scale. The dataset is then divided into training and testing groups, with the testing set comprising 10% of the total data. Following this, the models are trained and optimized using two methods: GridSearchCV and hyperparameter experiments. Finally, cross-validation is performed to re-evaluate model performance.
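The preprocessing and tuning steps described above can be organized roughly as follows with scikit-learn. The file name, feature matrix, parameter grid, and estimator are illustrative placeholders rather than the exact configuration used in the study.

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# 'oee_features.csv' stands in for the cleaned feature dataset with an 'oee' label column.
data = pd.read_csv("oee_features.csv").dropna().drop_duplicates()
X, y = data.drop(columns=["oee"]), data["oee"]

# 10% of the observations are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])
grid = {"svr__kernel": ["linear", "poly", "rbf", "sigmoid"],
        "svr__C": [1, 10, 100, 1000],
        "svr__gamma": [0.001, 0.01, 0.1]}

# 5-fold grid search over kernels and hyperparameters.
search = GridSearchCV(pipe, grid, cv=5, scoring="neg_mean_absolute_error")
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test MAE:", -search.score(X_test, y_test))
```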
Figure 4.10 Prediction result of Linear Regression model
The linear regression results indicate strong performance, with predicted values closely aligning with actual outcomes. However, during significant downtrends in the actual values, the predictions fall short, leaving more room for subjective judgment by production managers.
Figure 4.11 Prediction result of SVR model using “linear” kernel
Figure 4.12 Prediction result of SVR model using "polynomial" kernel
Figure 4.13 Prediction result of SVR model using “sigmoid” kernel
Figure 4.14 Prediction result of SVR model using “rbf” kernel
Through the application of GridSearchCV and hyperparameter experiments, the optimal parameters for the Support Vector Regression (SVR) model were identified as C = 1000, gamma = 0.01, and the "rbf" kernel. Despite achieving excellent training results, the SVR model exhibited high error rates during cross-validation, indicating overfitting. Consequently, the findings suggest that Support Vector Regression may not be the most suitable method for predicting Overall Equipment Effectiveness (OEE) based on the case study data.
Figure 4.15 Prediction result of Random Forest Regression model
The Random Forest Regression model demonstrates impressive performance, particularly excelling in low error rates, despite a tendency to provide conservative estimates for high values. This characteristic is not a major concern, as low values hold greater significance in this context. Furthermore, cross-validation results support the effectiveness of the Random Forest model in accurately predicting Overall Equipment Effectiveness (OEE) values.
Figure 4.16 Prediction result of XGBoost Regression model
The XGBoost Regression model excels in estimation accuracy, building on a tree ensemble approach similar to the Random Forest model. Its performance is characterized by minimal error rates and impressive cross-validation results.
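A minimal XGBoost regression sketch using the xgboost Python package is shown below; the synthetic data only mirrors the dataset dimensions (1,639 observations, 31 features), and the hyperparameter values are assumptions rather than the tuned values from the study.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1639, n_features=31, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

# Gradient-boosted decision trees: each new tree corrects the residuals
# of the current ensemble.
xgb = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.1, random_state=0)
xgb.fit(X_train, y_train)
print("test MAE:", mean_absolute_error(y_test, xgb.predict(X_test)))
```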
The Artificial Neural Network is optimized for Mean Squared Error (MSE) by fine-tuning hyperparameters, achieving its best performance with a structure of three hidden layers. The architecture includes 192 neurons in the first layer, 256 neurons in the second layer, and 192 neurons in the final layer.
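One way to realize this three-hidden-layer architecture is with scikit-learn's MLPRegressor, shown below as a hedged sketch: the study does not state which neural-network library was used, and the remaining hyperparameters and synthetic data here are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1639, n_features=31, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

# Three hidden layers with 192, 256, and 192 neurons, trained to minimize MSE.
ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(192, 256, 192),
                                 activation="relu", max_iter=1000, random_state=0))
ann.fit(X_train, y_train)
print("test R^2:", ann.score(X_test, y_test))
```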
Figure 4.17 Prediction result of ANN model
Deployment
In daily production meetings, the predicted Overall Equipment Effectiveness (OEE) value is crucial for discussions. If the forecast indicates a downward trend, it is essential for the production manager to closely monitor the production line and implement preventive measures based on prior experience.
Comparison performance of models
Figure 4.18 Mean absolute error comparison among prediction models
The comparison among the models used to predict OEE is shown in Figures 4.18 to 4.21.
The most effective models identified are Extreme Gradient Boosting (XGB) and Artificial Neural Networks (ANN). To evaluate their performance, key metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) are calculated.
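These comparison metrics can be computed directly with scikit-learn [24]; the test values and per-model predictions below are placeholders used only to show the calls.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error)

y_test = np.array([0.82, 0.75, 0.88, 0.91])            # placeholder actual OEE values
predictions = {"XGB": np.array([0.81, 0.76, 0.87, 0.90]),
               "ANN": np.array([0.80, 0.74, 0.86, 0.92])}

for name, y_pred in predictions.items():
    mse = mean_squared_error(y_test, y_pred)
    print(name,
          "MAE:", round(mean_absolute_error(y_test, y_pred), 4),
          "MSE:", round(mse, 4),
          "RMSE:", round(np.sqrt(mse), 4),
          "MAPE:", round(mean_absolute_percentage_error(y_test, y_pred), 4))
```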
Figure 4.19 Root mean squared error comparison among prediction models
Figure 4.20 Mean squared error comparison among prediction models
Figure 4.21 Mean absolute percentage error comparison among prediction models
The adjusted R-squared metric is utilized to assess the predictive capability on the dataset, with the XGBoost model demonstrating superior performance, achieving adjusted R-squared values of 0.998 for training and 0.997 for testing. XGBoost is followed by the ANN, SVR, and RF models.
The Linear Regression (LR) model ranks lowest among the evaluated models. In both the training and testing phases, the XGBoost model achieved the lowest values for MAE, MSE, and RMSE, and demonstrated a narrower range during 5-fold cross-validation. Conversely, the SVR model exhibited relatively poor performance, particularly in MAPE, recording an average value of 1.14 after cross-validation. In contrast, the other models outperformed SVR in MAPE, with XGBoost achieving 0.076, ANN 0.201, LR 0.260, and RF 0.233.
CONCLUSION AND FUTURE DIRECTION
Conclusion and future direction
This study highlights the significance of predicting organizational KPIs to enable timely adjustments in production plans. We employed various machine learning models to forecast Overall Equipment Effectiveness (OEE), achieving highly accurate results. A practical case study conducted in a Vietnamese electronics component assembly company demonstrated our approach, utilizing a dataset with 31 features and 1,639 samples. We trained five machine learning models: Linear Regression, Support Vector Regression, Random Forest, Extreme Gradient Boosting, and Artificial Neural Networks. The findings reveal that the XGBoost (XGB) model outperforms the others in accuracy.
To achieve effective practical application, it is essential to utilize a model that incorporates high-quality inputs. Overall Equipment Effectiveness (OEE) serves as a crucial metric influenced by various factors, such as human involvement and environmental conditions, both of which carry inherent uncertainties.
To tackle this challenge, a reinforcement learning model that incorporates uncertainty factors such as operator performance scores, temperature, and equipment characteristics could be implemented. Furthermore, additional development directions include suggesting preventive measures derived from Overall Equipment Effectiveness (OEE) values and historical repair data from the maintenance team.
Limitation
This study has notable limitations that present opportunities for future research. While our model is designed specifically to predict Overall Equipment Effectiveness (OEE), it can be extended to additional Key Performance Indicators (KPIs) with suitable modifications. Furthermore, future research could concentrate on generating and identifying the most effective subset of predictors for training machine learning models, with a particular emphasis on feature selection techniques.
REFERENCES
[1] E. Cevikcan, Industry 4.0: Managing the Digital Transformation, Springer.
[2] C. Li, Y. Chen, and Y. Shang, "A review of industrial big data for decision making in intelligent manufacturing," Engineering Science and Technology, an International Journal, vol. 29, 2022.
[3] F. Tao, Q. Qi, A. Liu, and A. Kusiak, "Data-driven smart manufacturing," Journal of Manufacturing Systems, vol. 48, pp. 157-169, 2018.
[4] M. Žilka, Z. T. Kalender, J. Lhota, V. Kalina, and R. Pinto, "Tools to support managerial decision-building competencies in data driven decision making in manufacturing SMEs," Procedia Computer Science, vol. 232, pp. 416-425.
[5] S. Nakajima, TPM Tenkai, Tokyo: Japan Institute of Plant Maintenance, 1982.
[6] P. Jonsson and M. Lesshammar, "Evaluation and improvement of manufacturing performance measurement systems: the role of OEE," International Journal of Operations and Production Management, vol. 19, 1999.
[7] Z. Kang, C. Catal, and B. Tekinerdogan, "Machine learning applications in production lines: A systematic literature review," Computers and Industrial Engineering, vol. 149, 2020.
[8] Y. Kuo and K. P. Lin, "Using neural network and decision tree for machine reliability prediction," The International Journal of Advanced Manufacturing Technology.
[9] C. E. Mazgualdi, T. Masrour, I. E. Hassani, and A. Khdoudi, "Using machine learning for predicting efficiency in manufacturing industry," in Advanced Intelligent Systems for Sustainable Development (AI2SD'2019), Volume 3: Advanced Intelligent Systems for Sustainable Development Applied to Environment, Industry and Economy, Springer, 2020, pp. 750-762.
[10] C. E. Mazgualdi, T. Masrour, I. E. Hassani, and A. Khdoudi, "Machine learning for KPIs prediction: a case study of the overall equipment effectiveness within the automotive industry," Soft Computing, vol. 25, pp. 2891-2909, 2021.
[11] B. V. D. Souza, S. R. B. D. Santos, A. M. D. Oliveira, and S. N. Givigi, "Analysing and predicting Overall Equipment Effectiveness in manufacturing industries using machine learning," in 2022 IEEE International Systems Conference (SysCon), IEEE, 2022.
[12] M. Imane, E. S. Aoula, and E. H. Achouyab, "Using Bayesian ridge regression to predict the overall equipment effectiveness performance," in 2022 2nd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), IEEE, 2022, pp. 1-4.
[13] L. Longard, T. Prein, and J. Metternich, "Intraday forecasting of OEE through sensor data and machine learning," Procedia CIRP, vol. 120, pp. 93-98, 2023.
[14] L. Lucantoni, S. Antomarioni, F. E. Ciarapica, and M. Bevilacqua, "A rule-based machine learning methodology for the proactive improvement of OEE: a real case study," International Journal of Quality and Reliability Management, vol. 41, pp. 1356-1376, 2024.
[15] P. Dobra and J. Jósvai, "Assembly Line Overall Equipment Effectiveness (OEE) Prediction from Human Estimation to Supervised Machine Learning," Journal of Manufacturing and Materials Processing, vol. 6, no. 3, p. 59.
[16] I. E. Hassani, C. E. Mazgualdi, and T. Masrour, "Artificial Intelligence and Machine Learning to Predict and Improve Efficiency in Manufacturing Industry." Internet: https://arxiv.org/abs/1901.02256, 2019.
[17] P. D. Groote, "Maintenance performance analysis: a practical approach," Journal of Quality in Maintenance Engineering, vol. 1, pp. 4-24, 1995.
[18] J. Han, J. Pei, and H. Tong, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2022.
[19] S. Chatterjee and A. S. Hadi, Regression Analysis by Example, John Wiley & Sons.
[20] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, pp. 121-167, 1998.
[21] T. K. Ho, "Random decision forests," in Proceedings of the 3rd International Conference on Document Analysis and Recognition, IEEE, 1995, pp. 278-282.
[22] T. Chen, T. He, M. Benesty, V. Khotilovich, Y. Tang, and H. Cho, "Xgboost: extreme gradient boosting," R package version, vol. 4, pp. 1-4, 2015.
[23] S. C. Wang, "Artificial neural network," in Interdisciplinary Computing in Java Programming, Springer, 2003, pp. 81-100.
[24] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, and O. Grisel, "Scikit-learn: Machine learning in Python," The Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
[25] D. D. Nguyen, C. Lohrmann, and P. Luukka, "A Comparison of Feature Construction Methods in the Context of Supervised Feature Selection for Classification," in International Conference on Green Technology and Sustainable Development, Springer, 2022, pp. 48-59.
[26] V. H. Tiệp, "Machine Learning cơ bản," Internet: www.machinelearningcoban.com, Dec. 26, 2016.