


VIET NAM NATIONAL UNIVERSITY HO CHI MINH CITY

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

HUYNH HUU DUC

APPLICATION OF CLUSTERING ALGORITHM FOR STORAGE LOCATION ASSIGNMENT PROBLEM IN SYNCHRONIZED ZONE ORDER PICKING WAREHOUSES

Major: Industrial Engineering

Major ID: 8520117

MASTER THESIS

HO CHI MINH CITY, January 2023


THIS RESEARCH IS COMPLETED AT:

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY – VNU HCM

Instructor: PhD Nguyen Duc Duy

Examiner 1: PhD Do Thanh Luu

Examiner 2: PhD Le Song Thanh Quynh

The Master’s thesis was defended at HCMC University of Technology, VNU-HCM, on January 08, 2023. The Board of the Master’s Thesis Defense Council includes:

1. Chairman: Assoc. Prof. PhD Do Ngoc Hien

2. Secretary: PhD Le Duc Dao

3. Counter-Argument Member: PhD Do Thanh Luu

4. Counter-Argument Member: PhD Le Song Thanh Quynh

5. Council Member: PhD Nguyen Van Thanh

Verification by the Chairman of the Master’s Thesis Defense Council and the Dean of the Faculty of Mechanical Engineering after the thesis has been corrected (if any)

CHAIRMAN OF THE COUNCIL

(Full name and signature)

DEAN OF FACULTY OF MECHANICAL ENGINEERING

(Full name and signature)


VIETNAM NATIONAL UNIVERSITY HCMC

VNUHCM UNIVERSITY OF TECHNOLOGY

SOCIALIST REPUBLIC OF VIETNAM

Independence – Liberty – Happiness

MASTER’S THESIS ASSIGNMENTS

Full name: HUYNH HUU DUC

Learner ID: 2070127

Date of birth: August 08, 1997

Place of birth: Long An

Major: Industrial Engineering

Major ID: 8520117

I – TITLE: APPLICATION OF CLUSTERING ALGORITHM FOR STORAGE LOCATION ASSIGNMENT PROBLEM IN SYNCHRONIZED ZONE ORDER PICKING WAREHOUSES (Vietnamese title: ỨNG DỤNG GIẢI THUẬT PHÂN CỤM VÀO BÀI TOÁN XÁC ĐỊNH VỊ TRÍ LƯU TRỮ TRONG KHO LẤY HÀNG ĐỒNG THỜI)

ASSIGNMENTS AND CONTENT:

Assignments:

• Understanding the structure of k-means clustering algorithms

• Understanding the problem of storage location assignment in synchronized order picking warehouses

• Applying the k-means clustering algorithm in a suitable way to solve the problem of storage location assignment in synchronized order picking warehouses

• Demonstrating the effectiveness of the proposed algorithm by using different data sets within the scope and goals of the thesis

Content:

• Chapter 1 gives an introduction and determines the goal and scope of the study

• Chapter 2 reports the literature review and presents the methodology of the study

• Chapter 3 presents the problem statement and the modified design of the clustering algorithm

• Chapter 4 shows the experimental validation and analysis results to demonstrate the ability and extent of the proposed algorithm in solving the problem

• Chapter 5 gives conclusion statements and some suggestions for future studies

IV – INSTRUCTOR: PhD Nguyen Duc Duy

Ho Chi Minh City, ………

INSTRUCTOR

(Full name and signature)

HEAD OF DEPARTMENT

(Full name and signature)

DEAN OF FACULTY OF MECHANICAL ENGINEERING

(Full name and signature)


ACKNOWLEDGEMENTS

This thesis was completed with the valuable support of many people. First of all, I would like to express my special and sincere gratitude to my research advisor, PhD Nguyen Duc Duy, for thoughtfully instructing and advising me during two hard semesters of conducting this thesis. Secondly, I want to express many thanks to Assoc. Prof. PhD Do Ngoc Hien, Head of the Department of Industrial Systems Engineering, Ho Chi Minh City University of Technology, for kindly giving me a good environment to work and study.

I am also grateful to all staff of the Department of Industrial Systems Engineering and members of the Faculty of Mechanical Engineering for their kindness and sympathy while I carried out this thesis.

Last but not least, I am pleased and lucky to be my parents’ son and to be a member of my close friends’ circle as well as my extended family. Many thanks to them for accompanying me not only on my academic path but also in my memorable life.

HUYNH HUU DUC


ABSTRACT

The development of e-commerce has led to higher numbers of orders, but with smaller item quantities per order. In this situation, the important role of warehouses as buffering places is becoming more obvious. One of the common and vital performance metrics of warehouse operations is the efficiency of the order-picking phase. To improve this metric, several solutions have been studied and applied, including the policy of synchronized zone order-picking systems. In this thesis, the author tries to show that in synchronized zone order-picking warehouses, a proper storage location assignment mechanism can contribute to improving order-picking efficiency. More specifically, the more similar the picking demands of two items are, the less likely these items should be located in the same zone, so that the idle time between pickers when they simultaneously fulfill an order can be cut down, leading to a shorter completion time for the order. Therefore, the author tries to develop a suitable k-means clustering algorithm that produces lists of items that should be located in different zones, as suggestions for warehouse staff planning the storage phase, aiming at improving order-picking efficiency.


THESIS SUMMARY (VIETNAMESE VERSION)

The development of e-commerce has led to a larger number of orders, but with a smaller order quantity of each item per order. In this context, the important buffering role of warehouses is becoming increasingly clear. One of the common and important performance measures of warehouse operations is the efficiency of the picking phase. To improve this measure, several solutions have been studied and applied, including the policy of synchronized picking systems. In this thesis, the author tries to demonstrate that in synchronized picking warehouses, an appropriate storage location assignment mechanism can contribute to improving picking efficiency. More specifically, the more similar the picking demands of two items are, the smaller the likelihood that these items are located in the same zone should be, so that the idle time between pickers when they simultaneously pick for the same order can be cut down, leading to an improvement in order completion time. Therefore, the author tries to develop a suitable k-means clustering algorithm that produces lists of items that should be placed in different zones, thereby providing suggestions for warehouse staff to plan the storage phase in order to improve picking efficiency.


DECLARATION

I hereby declare that this is my own research. All the data and results used in this research are honest and have not been published in other studies. I will take full responsibility for my research if any of the above statements is found to be incorrect.


Chapter 2 LITERATURE REVIEW AND METHODOLOGY

2.1 Literature review and contributions

2.1.1 Storage location assignment

2.1.2 Synchronized order picking warehouses

2.1.3 Clustering algorithms

2.1.4 Hierarchical and partitional clustering

2.1.5 k-means clustering algorithms

2.2 Methodology

Chapter 3 PROBLEM AND ALGORITHM

3.1 Definitions

3.2 Solution orientation and concept of flow

3.3 Feature selection and distance measure

3.4 Clustering evaluation function

3.5 Initial cluster centers selection method

3.5.1 Dimension reduction component

3.5.2 Initialization component

3.6 Determination of the number of clusters

3.7 Convergence condition

3.8 Flow of the algorithm

Chapter 4 EXPERIMENTAL RESULTS AND ANALYSIS

4.1 Operational context and notations

4.2 Input data

4.3 Implementation and results

4.4 Managerial insights

4.5 Discussions on computational complexity

Chapter 5 CONCLUSION AND FUTURE DIRECTIONS


List of Figures

Figure 2.1 Illustration of Synchronized Order Picking (source: [14])

Figure 2.2 Methodology

Figure 3.1 An example of the method of determining the number of clusters

Figure 3.2 The master flow of the k-means algorithm


List of Tables

Table 2.1 Literature review on Storage Location Assignment

Table 3.1 Pseudo-code of Distance calculation function

Table 3.2 Literature review on issue groups of random initialization

Table 3.3 Summary of quantification orientations for initial selection of cluster centers

Table 3.4 Pseudo-code of function of initial selecting cluster centers

Table 3.5 Matching between properties of the proposed initialization method based on m-NN and issue groups of random initialization

Table 3.6 Pseudo-code of the function to determine the number of clusters

Table 3.7 An example of the method of determining the number of clusters

Table 3.8 The detailed content of each component of the master flow

Table 4.1 Summary of experimental results


List of Acronyms

| Acronym | Term |
|---|---|
| CRCCM | Cumulative Rate of Change of Candidate Measure |


Chapter 1 INTRODUCTION

1.1 MOTIVATION

Warehousing plays a vital role in logistics and supply chain management, especially in Viet Nam, where 53.7% of logistics enterprises provide warehousing services [1]. Additionally, the e-commerce industry in Viet Nam has recently experienced significant development [2], leading to a dramatically increasing demand for various items, but in small volumes. In such a trend, since warehouses play a vital role as buffering spaces in supply chains, ensuring that the uncertainty of consumer demand has as few adverse effects as possible on the stability of production systems [3-6], improvement activities in warehouse operations remain a valuable and practical area of study.

There are six fundamental warehouse processes: receiving, putaway, storage, picking, packing, and shipping [7]. Among them, storage is vital to warehouse operation. While the other steps ensure smooth material flows inside and outside a warehouse, storage is where the buffering role of a warehouse in a supply chain is most apparent: items are placed there after the put-away process and before specific quantities of them are picked in the order-picking process to cover demands [8]. As a result, a mechanism set in the storage phase, such as the solution to a Storage Location Assignment (SLA) problem, has various effects on the efficiency of both the inbound and outbound flows of a warehouse regarding several key performance indicators such as picking time and cost, productivity, delivery, and inventory accuracy [5, 8, 9].

The order-picking process accounts for a large proportion, about 55% [7], of the operational cost of a warehouse. Therefore, this study focuses on SLA problems in terms of their relationship to order-picking efficiency and inherits the idea, from the study of Ene and Öztürk [5], of employing order-picking processing time (OPT) as the measure of this efficiency. According to Tompkins et al. [10], OPT consists mainly (about 95%) of four components: traveling time, searching time, picking time, and set-up time. Traveling time accounts for the largest proportion (approximately 50%), so it is chosen as the indicator of the efficiency of the order-picking process in this study.

To enhance the efficiency of the order-picking process in terms of OPT, synchronized order picking (SOP), in which multiple items of the same picking order (PO) are collected simultaneously by pickers from their assigned zones, is utilized [11]. Because of its operational advantages, SOP warehouses are selected as the scope of this study.

An SLA solution that targets improving the order-picking process can be broken down into two sub-processes: family grouping and storage allocation [12, 13]. In more detail, the family grouping sub-process includes an analysis of the correlation (or similarity) between items and a clustering step that forms clusters of items that are highly similar in terms of some selected characteristics, while the storage allocation sub-process includes drafting a priority list and assigning positions to items based on this list. In reality, the storage allocation sub-process can be enormously affected by various factors, because real-time assignment happens after cargo is received from the inbound docks, and this sub-process needs to adapt quickly to changes in the inbound flow; therefore, it is more appropriate to focus on the family grouping sub-process. Because of its structure, the family grouping sub-process can be performed by a suitable clustering algorithm. When it comes to clustering algorithms applied to SLA in SOP warehouses with order-picking efficiency as the objective, this topic has evidently drawn little attention in research: the author has found two studies related to it [11, 14].


1.2 PROBLEM STATEMENT

When it comes to warehouse operations, traveling time is the largest contributor to order-picking processing time. In synchronized order-picking warehouses, traveling time depends on the extent to which items that are often demanded in the same order are located in different zones. In this study, a clustering algorithm is developed to provide separate lists of items that should be located in different zones, aiming at decreasing the traveling time and thereby the order-picking processing time in synchronized order-picking warehouses.

1.3 GOAL AND SCOPE

Based on the motivation above, this research aims to develop a clustering algorithm to tackle SLA problems within the scope of SOP warehouses, with the objective of improving order-picking traveling time.

1.4 THESIS LAYOUT

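The thesis comprises five chapters:

• Chapter 1 gives an introduction and determines the goal and scope of the study;

• Chapter 2 reports the literature review and presents the methodology of the study;

• Chapter 3 presents the problem statement and the modified design of the clustering algorithm;

• Chapter 4 shows the experimental validation and analysis results to demonstrate the ability and extent of the proposed algorithm in solving the problem;

• Chapter 5 gives conclusion statements and some suggestions for future studies.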


Chapter 2 LITERATURE REVIEW AND METHODOLOGY

2.1 LITERATURE REVIEW AND CONTRIBUTIONS

2.1.1 Storage location assignment

SLA problems focus on allocating product items to storage locations, targeting optimized material handling cost and storage space utilization. Two common approaches aim to improve warehouse space utilization and the cycle time of the order fulfillment process. Typical constraints include the capacity of available storage and resources and dispatching policies [15]. SLA plays an important role in warehouse operations because it provides a basis for various improvements in the inventory management system of warehouses: reducing the complexity of inventory management by standardizing how a new item unit is located in the storage system, giving optimal solutions in terms of the utilization rate of resources such as space or handling devices, and considering demand patterns, separately or simultaneously [9, 16-18]. More details can be seen in Table 2.1. The column “Targeted function” in Table 2.1 is based on the two-component structure of family grouping and storage location proposed by Bindi et al. [12], as mentioned in Chapter 1.

Table 2.1 Literature review on Storage Location Assignment

| Studies | Target of improvement/optimization | Approach/Solution orientation | Method | Targeted function |
|---|---|---|---|---|
| Fontana and Nepomuceno, 2017 [16] | Multiple criteria of order-picking process efficiency | Classification of items and their storage location assignment | MCDM | Family grouping (determining shelf); Storage location (determining location in shelf) |
| Guerriero et al., 2013 [19] | Total inbound cost | Classification of products | LP | Family grouping (based on characteristics); Storage location (based on physical compatibility) |
| Muppani and Adil, 2008 [17] | Total cost of order picking and storage space | Establishing classes of products and allocating them to storage locations | ILP; SA | Family grouping; Storage location (“class” plays the role of an intermediate factor between the two functions) |
| Zhang et al., 2019 [18] | Total travel distance | Demand correlation pattern (the DCP of item i is the set of i and the items that are frequently ordered with i, with a certain probability) | ILP; H&SA | Family grouping and Storage location |

Notes: ILP - Integer linear programming model; MCDM - Multi-criteria decision-making; LP - Linear programming; SA - Simulated annealing; H&SA - Heuristic and Simulated annealing

As can be seen in Table 2.1, when it comes to SLA problems:

• Targets are related to improving order-picking efficiency (cost or time);

• The most common approach is characteristics-based classification or grouping;

• An algorithm that targets optimal or near-optimal solutions is applied.

In this study, the author tries to propose an approach to solve a subset of SLA problems in warehouses in which the synchronized order picking system is applied. The common objective of improving order-picking efficiency via grouping is still pursued, in the form of a clustering problem. Additionally, a k-means algorithm is applied to solve it, widening the range of algorithm classes that can be considered for the problem under discussion.

2.1.2 Synchronized order picking warehouses

Synchronized order picking is a type of order-picking process in which pickers simultaneously fulfill the demands of the same PO from different zones (see Figure 2.1).


Figure 2.1 Illustration of Synchronized Order Picking (source: [14])

To the best of the author’s knowledge, only a limited number of studies focus on SLA problems in SOP warehouses. For example, Jane and Laith (2005) formulated a model to minimize the total similarity between items in the same zone, measuring the similarity by the co-appearance of items in the same order. The main idea of that study is that the higher the similarity between two items, the lower the likelihood that these items should be in the same zone. Kuo et al. [11] proposed two metaheuristics, one based on the particle swarm optimization algorithm and the other based on the genetic algorithm, with a similarity measure similar to that of Jane and Laith [14]. The objective function properly reflects the purpose of reducing the waiting time between pickers in SOP warehouses. However, the similarity measure in these two studies still has a limitation. When two items appear together in the same PO but their picking quantities differ greatly, the number of times the two items co-appear does not lead to any considerable improvement in the probability that the processing time of a PO can be cut down.

In this study, therefore, the level of difference between items’ quantities in the same PO is used instead, aiming at grouping into clusters those items that are likely to make two pickers spend similar picking times; these clusters then serve as lists of items that should be in different zones. In more detail, let $O$ be the set of POs, $I$ the set of items, and $N$ the set of expected zones (and thus pickers). The quantity of item $i$ in PO $o$, the picking time of picker $m$ for PO $o$, the picking time for item $i$ corresponding to order $o$, and the binary indicator of whether item $i$ is assigned to picker $m$ are denoted by $Q_{io}$, $T_{mo}$, $T_{io}$, and $A_{mi}$, respectively ($i \in I$, $o \in O$, $m \in N$); the idea can then be formulated mathematically in terms of these quantities.
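A small numerical sketch may make the idea more concrete. It is a hypothetical illustration, not the thesis’s own formulation. It assumes that the picking time $T_{io}$ of item $i$ is proportional to its quantity $Q_{io}$, that picker $m$’s time is $T_{mo} = \sum_i A_{mi} T_{io}$, and that a PO is complete only when its slowest picker finishes. The function name `po_makespan` and the parameter `time_per_unit` are illustrative.

```python
# Hypothetical illustration (not the thesis's formulation). Assumptions:
#   T_io = Q_io * time_per_unit, T_mo = sum_i A_mi * T_io,
#   and a PO is finished only when its slowest picker finishes.

def po_makespan(quantities, assignment, n_pickers, time_per_unit=1.0):
    """Return (makespan, per-picker times T_mo) for a single PO.

    quantities: dict item -> Q_io for this PO
    assignment: dict item -> picker index in 0..n_pickers-1 (i.e. A_mi = 1)
    """
    picker_times = [0.0] * n_pickers
    for item, qty in quantities.items():
        # Add T_io to the time of the picker responsible for this item.
        picker_times[assignment[item]] += qty * time_per_unit
    return max(picker_times), picker_times


order = {"A": 5, "B": 5}          # two items with identical picking quantities
same_zone = {"A": 0, "B": 0}      # both items in picker 0's zone
split_zones = {"A": 0, "B": 1}    # one item per zone

print(po_makespan(order, same_zone, 2))    # (10.0, [10.0, 0.0])
print(po_makespan(order, split_zones, 2))  # (5.0, [5.0, 5.0])
```

With both equal-quantity items in one zone, one picker works for 10 time units while the other idles; placing them in different zones halves the makespan. This is the intuition behind keeping items with similar quantities apart.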

2.1.3 Clustering algorithms

Clustering algorithms operate on the principle that entities with a relatively high level of similarity in some characteristics should be grouped into the same cluster, so that specific operations can treat them together to achieve some benefits [14, 19]. A typical clustering algorithm often includes the following steps (a “pattern” is also referred to as an “object” or a “data point” in more recent literature):

1. Pattern representation (optionally including feature extraction and/or selection)

2. Definition of a pattern proximity (or distance, similarity) measure that is appropriate to the data domain

3. Clustering (or grouping)

4. Data abstraction (optional)

5. Assessment of output (optional)

Regarding the categorization of clustering algorithms, the most commonly used framework divides them into two groups: hierarchical and partitional [20]. The main difference between the two groups is whether the resulting clusters are nested (in hierarchical algorithms) or separate (in partitional algorithms).

2.1.4 Hierarchical and partitional clustering

The central concept of a hierarchical clustering algorithm is that a group of data patterns should be organized into a structure that is as meaningful as possible in terms of the number of clusters and the interrelationships between clusters, without trying to separate these clusters; the clusters appear in the form of a multi-level dendrogram [21, 22]. Although hierarchical clustering algorithms show advantages in illustrating the nature and meaningfulness of the clustering results, they have drawbacks regarding the control of the clustering process through parameters and high computational complexity [23]. Therefore, they may not be suitable in situations that require relatively strict control over how the clustering process happens, such as SLA problems.

On the other hand, a partitional clustering algorithm tries to develop distinct clusters, so it needs some initial input data, such as the number of clusters or the threshold for point density in clusters [21, 23]. Partitional clustering algorithms are beneficial when applied to large data sets. However, they are sensitive to outliers, and it is complicated to define an appropriate set of parameters [24]. In this research, the clustering-driven parameters are known in advance, as is the case for SLA problems in SOP warehouses; therefore, partitional clustering algorithms should be used to take advantage of their low computational complexity.

2.1.5 k-means clustering algorithms

k-means clustering has been the most common partitional clustering algorithm for 50 years since it was introduced [25]. According to Jain et al. [21], a basic k-means clustering algorithm includes four steps:

1. Choose k initial cluster centers from the data set.

2. Assign each pattern (a data point or a group of data points) to the closest cluster center.

3. Recompute the cluster centers.

4. Repeat step 2 if the convergence criterion is not met.

MacKay [26] defined the standard version of k-means clustering as an algorithm that updates an initial set of k means $m_1^{(1)}, \dots, m_k^{(1)}$ by repeating two steps until the convergence criterion is met:

• Assignment step: Assign each observation $x_p$ to the cluster $S_i^{(t)}$ whose mean has the least squared Euclidean distance to it, as in equation (2.1):

$$S_i^{(t)} = \left\{ x_p : \left\| x_p - m_i^{(t)} \right\|^2 \le \left\| x_p - m_j^{(t)} \right\|^2 \ \forall j,\ 1 \le j \le k \right\} \tag{2.1}$$

• Update step: Recalculate the means (centroids) $m_i^{(t+1)}$ of the observations assigned to each cluster, as in equation (2.2):

$$m_i^{(t+1)} = \frac{1}{\left| S_i^{(t)} \right|} \sum_{x_j \in S_i^{(t)}} x_j \tag{2.2}$$
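The two steps above can be sketched in plain Python as follows. This is a minimal illustration of the standard algorithm only. The initial means are drawn at random from the data, and the loop stops when no mean changes. Both choices are assumptions of this sketch, since the thesis proposes its own initialization method and convergence condition. The function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2, as used in equation (2.1)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Standard k-means: alternate the assignment step (2.1) and the update step (2.2).

    Returns the final means and the clusters from the last assignment step.
    """
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(points, k)]  # initial means m_1 .. m_k
    clusters = []
    for _ in range(max_iter):
        # Assignment step (2.1): each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, means[j]))
            clusters[nearest].append(p)
        # Update step (2.2): each mean becomes the centroid of its cluster.
        new_means = []
        for i, members in enumerate(clusters):
            if members:
                new_means.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_means.append(means[i])  # keep the old mean for an empty cluster
        if new_means == means:  # no mean moved: converged
            break
        means = new_means
    return means, clusters


means, clusters = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
print(sorted(means))  # [[0.0, 0.5], [10.0, 10.5]]
```

The empty-cluster branch shows concretely why random initialization can produce empty clusters, one of the issues discussed in Section 3.5.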


Within the scope of the literature review results, this study contributes to addressing some common issues of k-means, including:

• Proposing a proper measurement of clustering performance: the clustering evaluation function in this study is based on the distance measure, as in previous related works;

• Designing an algorithm for the initial cluster centers selection phase: in this study, the author tries to tackle the issues of the Elbow method and random initialization, which are commonly applied in this phase of clustering algorithms, in terms of execution time and clustering performance. The concept of nearest-neighborhood radius is utilized to calculate the candidate measure of every object in the data set, ensuring the requirements on the size and density of a cluster;

• Determining an appropriate convergence condition: in this study, the author inherits the idea that the clustering evaluation function must be integrated into the convergence checking function so that the stopping criterion is aligned with the solution quality. Additionally, the idea of a consecutive interval in which an indicator exceeds a threshold is applied to decrease the probability of being stuck in a local optimum.

The core orientation of the contributions regarding these three points is that the author examines whether solutions from previous studies can be applied to the case of SLA in SOP warehouses and modifies them where necessary.

2.2 METHODOLOGY

The methodology is summarized in Figure 2.2.

object to be clustered will be defined, together with the selection of features and the distance measure. After that, the traditional general processes of a typical clustering algorithm (sub-section 2.1.3) and a k-means algorithm (sub-section 2.1.5) are utilized with some modifications drawn from related studies. Some experimental analysis is then conducted to check the practicability and the possible scope of applying the modified algorithm to the problem studied. Finally, the author summarizes the critical points of the study and gives some suggestions for future studies.


Chapter 3 PROBLEM AND ALGORITHM

3.1 DEFINITIONS

• Object or data point: the smallest unit of data providing the material for the clustering algorithm; in this study, an object or a data point expresses an item’s problem-related information;

• Data set: the whole set of objects or data points;

• Item: a group of similar products that are managed together in a warehouse (equivalent to a Stock-Keeping Unit, or SKU, code shown in a Warehouse Management System, or WMS);

• Storage location: the smallest unit of warehouse space to keep item units;

• Zone: a group of storage locations that will be assigned to a picker.

3.2 SOLUTION ORIENTATION AND CONCEPT OF FLOW

As presented in Chapter 1, traveling time is selected as the indicator of order-picking efficiency. To optimize the traveling time in SOP warehouses from the viewpoint of SLA, as also discussed in Chapter 2, a solution needs to be developed so that if two items are more frequently scheduled in the same PO, they are less likely to be located in the same zone. This way, a PO’s makespan can be minimized [14], leading to a more efficient order-picking process.

As mentioned in Chapter 2, the flow of the algorithm in this study is based on the traditional flows of a typical clustering algorithm and a k-means algorithm, integrated with some modifications drawn from the literature review and operational context-based information. Some essential points of the concept of flow are:

• “Object” (sometimes “data point” in this study) is defined by an item;

• The features (or “dimensions” in this thesis) to be selected and the measurement of distance (or similarity) between objects must reflect the solution orientation of “more frequently scheduled” mentioned previously, which is clarified in sub-section 3.3;

• Feature extraction (or dimension reduction) is considered based on the results of analyzing the operational context of SLA in SOP warehouses;

• The methods for selecting the number of clusters, determining and re-determining the cluster centers, clustering, assessing clustering results, and checking the convergence condition must be coordinated with each other so that together they support the solution orientation stated above. These contents are presented in sub-sections 3.4 to 3.7.

3.3 FEATURE SELECTION AND DISTANCE MEASURE

Assume that all pickers travel at the same speed. The traveling time then depends on the total distance that pickers must cover to complete their assigned routes. For convenience, distance is measured in “location lengths”, one location length being the size of a storage location along the aisle direction of a rack. Therefore, the traveling distance corresponding to a PO is obtained by multiplying the “location length” by the item quantity in the PO, and the traveling time is proportional to this distance.

Since the “location length” is a layout-based value, it is appropriate to choose the picking quantity as the only feature type. Note that the number of features equals the number of POs that appear in the data set, i.e., each item is described by its picking quantity in every PO. As a result, the absolute difference between the quantities of two items in the same PO reasonably reflects how frequently they are scheduled together.
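To make this feature definition concrete, the sketch below turns order lines into one quantity vector per item, with one dimension per PO. The input format `(po_id, item_id, quantity)` and the function name are hypothetical. The sketch also assumes that an item absent from a PO has quantity 0 in that dimension.

```python
from collections import defaultdict


def build_feature_vectors(order_lines):
    """Map each item to its vector of picking quantities, one entry per PO.

    order_lines: iterable of (po_id, item_id, quantity) tuples (hypothetical format).
    An item that does not appear in a PO gets quantity 0 for that PO.
    """
    order_lines = list(order_lines)
    pos = sorted({po for po, _, _ in order_lines})  # feature dimensions = POs
    column = {po: j for j, po in enumerate(pos)}
    vectors = defaultdict(lambda: [0] * len(pos))
    for po, item, qty in order_lines:
        vectors[item][column[po]] += qty
    return pos, dict(vectors)


lines = [("PO1", "A", 3), ("PO1", "B", 3), ("PO2", "A", 1), ("PO2", "C", 4)]
pos, vectors = build_feature_vectors(lines)
print(pos)      # ['PO1', 'PO2']
print(vectors)  # {'A': [3, 1], 'B': [3, 0], 'C': [0, 4]}
```

Items A and B have the same quantity in PO1 and differ only slightly in PO2, so their vectors are close. Under the orientation above, such items are candidates for different zones.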

Euclidean distance has various applications in clustering algorithms in the realm of logistics and supply chain management, such as area segmentation [27], decision-making support [28], and facilities planning [29]. Therefore, in this study, the second-order Euclidean distance is selected as the method of aggregating the distances between item pairs over multiple POs, as in equation (3.1):

$$D_{ij} = \left( \sum_{o=1}^{O} \left| q_{io} - q_{jo} \right|^{2} \right)^{1/2} \tag{3.1}$$

where:

• $D_{ij}$ is the Euclidean distance between two items $i$ and $j$;

• $o$ is the index of a PO;

• $O$ is the number of POs to be considered;

• $q_{io}$ is the picking quantity of item $i$ in PO $o$.

The pseudo-code of the function responsible for calculating the distance is given in Table 3.1.

Table 3.1 Pseudo-code of Distance calculation function

FUNCTION EuclideanDistance(order, ListOfFeatureValues_1, ListOfFeatureValues_2):
    # ListOfFeatureValues_1 and ListOfFeatureValues_2 have the same length
    SET result = 0
    FOR each i = 1 to length(ListOfFeatureValues_1):
        UPDATE result = result + ABS(ListOfFeatureValues_1[i] - ListOfFeatureValues_2[i]) ^ order
    UPDATE result = result ^ (1 / order)
    RETURN result
END FUNCTION
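A runnable Python counterpart of the pseudo-code in Table 3.1 might look as follows. It is a sketch, and the snake_case name is an adaptation of `EuclideanDistance`. With `order=2` it computes $D_{ij}$ of equation (3.1). Other positive values of `order` give the general Minkowski distance, which the `order` parameter of the pseudo-code appears to allow.

```python
def euclidean_distance(order, features_1, features_2):
    """Distance of the given order between two equal-length feature lists.

    order=2 gives the second-order Euclidean distance D_ij of equation (3.1);
    other positive values give the general Minkowski distance.
    """
    if len(features_1) != len(features_2):
        raise ValueError("feature lists must have the same length")
    result = 0.0
    for a, b in zip(features_1, features_2):
        result += abs(a - b) ** order  # accumulate |q_io - q_jo|^order over POs
    return result ** (1 / order)


# Per-PO quantity vectors of two items (illustrative values).
print(euclidean_distance(2, [3, 1], [3, 0]))  # 1.0
```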


3.4 CLUSTERING EVALUATION FUNCTION

Because the clustering result decides whether some items must be stored in different zones, it is vital to develop a suitable cluster evaluation function to assess the quality of the clustering process and, in turn, of the location assignment based on it.

As mentioned earlier, in the context of SLA problems in SOP warehouses, the most critical quality that the clustering result needs to meet is that the quantities of intra-cluster items must be as similar as possible, i.e., the distances between pairs of cluster members must be as small as possible.

Therefore, in this study, the evaluation function value of a cluster is constructed as the square root of the sum of squared distances between pairs of items in the cluster, without any computation involving the relationships between a cluster center and the cluster’s members, or between cluster centers. This is also the difference between the proposed method and previous methods, for example, those of Hatamlou et al. [30] and Tzortzis and Likas [31], whose intra-cluster quality is measured by the total squared error between a cluster center and each of the cluster’s members. The square root operator supports an intuitive evaluation of the function value relative to the feature values of objects, by bringing the evaluation function back to the same unit as the feature values.

The clustering evaluation function is expressed in equation (3.2):

$$E_C = \sqrt{\sum_{i,j \in C} D_{ij}^{2}} \tag{3.2}$$

In equation (3.2):

• $C$ is the index of a cluster;

• $E_C$ is the clustering evaluation function corresponding to cluster $C$;

• $i, j$ are the indices of items (objects) that belong to cluster $C$;

• $D_{ij}$ is the distance between items $i$ and $j$.

To the extent of the whole data set, i.e., all the clusters, the clustering evaluation function must be toward the high possibility that all clusters have good problem-driven clustering quality Therefore, the overall value of the clustering evaluation function is calculated by the equation (3.3)

$$E_{\text{overall}} = \sum_{C} E_C \qquad (3.3)$$
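As an illustration, equations (3.2) and (3.3) can be computed with the minimal Python sketch below. It assumes one-dimensional item features (e.g., item quantities) and uses the absolute difference only as a placeholder for the distance $D_{ij}$; the actual distance should follow the method chosen in sub-section 3.1. It also assumes that the sum in (3.2) counts each unordered pair i < j once; summing over ordered pairs would scale every $E_C$, and hence $E_{\text{overall}}$, by √2 without changing which clustering scores better. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def distance(a: float, b: float) -> float:
    """Hypothetical placeholder distance: absolute difference of 1-D features."""
    return abs(a - b)


def cluster_evaluation(members: Sequence[float],
                       dist: Callable[[float, float], float] = distance) -> float:
    """E_C (eq. 3.2): square root of the sum of squared pairwise distances.

    Assumption: each unordered pair (i, j) with i < j is counted once.
    No cluster center is involved in the computation.
    """
    return math.sqrt(sum(dist(a, b) ** 2 for a, b in combinations(members, 2)))


def overall_evaluation(clusters: Dict[str, List[float]]) -> float:
    """E_overall (eq. 3.3): sum of E_C over all clusters."""
    return sum(cluster_evaluation(members) for members in clusters.values())


if __name__ == "__main__":
    toy = {"C1": [10, 12, 11], "C2": [50, 53]}
    print(cluster_evaluation(toy["C1"]))  # sqrt(2**2 + 1**2 + 1**2)
    print(overall_evaluation(toy))
```

For the toy clusters, C1 gives √6 ≈ 2.449 and C2 gives 3, so the overall value is about 5.449; a single-item cluster scores 0, and smaller values indicate more similar intra-cluster quantities.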

3.5 INITIAL CLUSTER CENTERS SELECTION METHOD

Alongside the many applications of k-means clustering algorithms, methods for selecting initial cluster centers for this group of algorithms have also drawn considerable attention because of their important role in the overall k-means process, for example in the studies of Khan and Ahmad [32], Frey and Dueck [33], and Cao et al. [34]. Random selection is the most basic and prevalent method of determining initial cluster centers. However, this method has limitations: because it is essentially an uncertain, trial-and-error mechanism, it can lead to a time-consuming iterative clustering process before an optimal or acceptable solution is reached.

Previous related studies have proposed modifications to overcome the randomness problems mentioned above, for example Su and Dy [35], Erisoglu et al. [36], and Xiong et al. [37]. Three common issues arise with random initialization of a k-means clustering algorithm: (1) the large search space of initialization results, (2) the risk of being stuck in a local optimum, and (3) the risk of establishing empty clusters due to the effects of isolated objects, i.e., outliers (see Table 3.2). Several solutions have been proposed to address these issues. They aim to develop a method that picks out a set of k initial centers without any need to rerun the algorithm to escape an accidentally reached local optimum. In this way, both the variety of possible initialization results and its consequences, namely being stuck in a local optimum or establishing empty clusters, are prevented.

Table 3.2 Literature review on issue groups of random initialization

• Dimension-reduction-based: This orientation follows from the widely accepted notion that, in a high-dimensional vector space, a small number of principal dimensions contain most of the information of the data set. Erisoglu et al. [36] choose the two dimensions that best describe the spread of the data set, using the variation coefficient and the correlation coefficient, and project the objects onto them. Su and Dy [35] choose the dimension that contributes the most to the largest clustering evaluation function value of a cluster and partition the data set along this dimension.

• Previous-centers-based: This orientation determines a new cluster center based on information related to the already chosen cluster centers, taking into account the interaction between clusters. Erisoglu et al. [36] use the cumulative distance between an object and the chosen center candidates (or, for the first candidate, the distance between an object and the mean of the data set) to ensure that the nearest neighbors of a chosen candidate cannot become center candidates. Xiong et al. [37] select the data object that is furthest from the set of previously chosen cluster centers, where the distance between an object and a set is defined as the minimum of the distances between the object and each element of the set.

• Cluster-radius-threshold-based: This orientation focuses on ensuring an intuitively acceptable radius threshold for all established clusters. The method proposed by Xiong et al. [37] uses the mean distance of the data set as a cluster radius threshold, and then as a criterion to decide whether an object is an isolated object (which is removed from the candidate set) or not.

• Density-based: This orientation uses density (the number of objects within a given radius of an object) as one of the control parameters of the initial selection method, in addition to distances between objects. As a result, it strongly supports selecting cluster centers that fit the concept of a "cluster center" mentioned above. Xiong et al. [37] define density with respect to the radius threshold given by the mean distance of the data set, and then use density as a criterion to select cluster centers sequentially.
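To make the previous-centers-based orientation concrete, the Python sketch below implements only the selection rule described above for Xiong et al. [37]: the next center is the object whose distance to the set of chosen centers (the minimum of its distances to each chosen center) is largest. It is not a reproduction of their full method. The choice of the first center (the object farthest from the data mean), the one-dimensional features, the absolute-difference distance, and the function name are all illustrative assumptions.

```python
from typing import List, Sequence


def farthest_first_centers(values: Sequence[float], k: int) -> List[int]:
    """Pick k center indices with a farthest-first (max-min distance) rule.

    The distance between an object and the set of chosen centers is the
    minimum of its distances to the individual centers; each new center is
    the object that maximizes this set distance. Features are assumed to be
    1-D and compared by absolute difference (illustrative assumption).
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of objects")
    mean = sum(values) / n
    # Assumption: the first center is the object farthest from the data mean.
    first = max(range(n), key=lambda i: abs(values[i] - mean))
    centers = [first]
    # Distance from every object to the current set of centers.
    to_set = [abs(v - values[first]) for v in values]
    while len(centers) < k:
        nxt = max(range(n), key=lambda i: to_set[i])
        if to_set[nxt] == 0:
            raise ValueError("fewer than k distinct values")
        centers.append(nxt)
        # Refresh set distances now that nxt has joined the set.
        to_set = [min(d, abs(v - values[nxt])) for d, v in zip(to_set, values)]
    return centers
```

On the toy values [1, 2, 3, 10, 11, 30] with k = 3, the sketch returns indices [5, 0, 4]. The isolated value 30 is chosen first, which illustrates why such distance-only rules are usually combined with radius-threshold or density criteria.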

Table 3.3 Summary of quantification orientations for initial selection of cluster centers

Study | Quantification orientations: Dimension-based | Previous-based | Cluster-radius-threshold-based | Density-based

3.5.1 Dimension reduction component

For the dimension reduction component, Principal Component Analysis (PCA), introduced by Karl Pearson in 1901 [39], is a prevalent solution. To the best of the author's knowledge, this method has three main characteristics:

• Spread representation: The dimensions (or features) chosen to implement dimension reduction, called "principal components" from now on, must best reflect the spread (or variance) of the data set;


• Orthogonality: The principal components must be orthogonal to each other;

• Information loss minimization: The lower-dimensional space onto which the data objects are projected must minimize information loss.

These three characteristics also appear in two related studies, those of Su and Dy [35] and Erisoglu et al. [36]. Therefore, PCA is a natural choice for developing the dimension reduction component. For a first application of this method, the author plans to apply the PCA process presented in a previous study [40] via the dedicated PCA class of the Scikit-learn library for Python (sklearn.decomposition.PCA). However, due to the time limitation of the thesis, the dimension reduction component is only partially conceptualized here; its details are left as a direction for future work.
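Since the dimension reduction component is only conceptualized here, the following is a minimal NumPy sketch of standard PCA (centering followed by a singular value decomposition), written without Scikit-learn so that the three characteristics above are visible in the code. It illustrates textbook PCA, not the specific procedure of [40]; the function name is hypothetical. For the same data, sklearn.decomposition.PCA(n_components=...).fit_transform(X) should give the same projection up to the sign of each component.

```python
import numpy as np


def pca_project(X, n_components: int):
    """Project rows of X onto the leading principal components.

    Returns (projected data, components, explained variance ratio).
    """
    X = np.asarray(X, dtype=float)
    if not 1 <= n_components <= min(X.shape):
        raise ValueError("invalid number of components")
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are orthonormal directions ordered by captured variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]  # mutually orthogonal principal components
    explained = S**2 / np.sum(S**2)  # share of total variance per direction
    return Xc @ components.T, components, explained[:n_components]
```

Keeping only the leading rows of Vt discards the directions with the smallest variance, which is the sense in which PCA minimizes information loss (squared reconstruction error) for a given number of components.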

3.5.2 Initialization component

Because SLA results are requested right after items are received from inbound docks, the decision on where an item will be placed must be made within a relatively short time, so that warehouse workers can be released as soon as possible for other receiving requests or for other warehouse activities such as order picking, packing, or checking. Therefore, the practical requirement is that the applied k-means clustering algorithm is initialized by a fast cluster center selection algorithm. To operate fast, the selection algorithm should not contain any highly complex calculation or analysis; at the same time, the intuitive concept of a cluster center must still be ensured by an appropriate measurement.

According to Li et al. [38], a cluster center is an object surrounded by a high density of other objects within a small radius threshold and located relatively far from other cluster centers. Li et al. [38] proposed an algorithm built on the radius threshold and density. Applying the same idea to the problem statement in sub-section 3.1, the radius threshold models the acceptable maximum difference in item quantities between two items, and the density models the number of items whose quantities are very close to that of the cluster center. Because the algorithm suits the problem, the author applies it in this study for initial cluster center selection.

Additionally, the algorithm of Li et al. [38] reduces, to some extent, the uncertainty of random initialization while preserving the concept of a cluster center. It also relies only on simple comparison and arithmetic operations, which meets the need for a fast algorithm. The detailed steps of the algorithm, with notes specific to this problem, are as follows:

1. Calculate the distance values between pairs of objects (based on the method chosen in sub-section 3.1).

2. Calculate each object's neighborhood radius with an m-nearest-neighbor (m-NN) algorithm; this value is called the "m-NN radius". In this study, the m-NN radius of an object is the largest distance among the m objects closest to it, i.e., its distance to its m-th nearest neighbor. The value of parameter m is set to the number of pickers participating in the considered shift, because this makes it more likely that the pickers finish a PO at the same time after an SOP process, cutting down the tardiness of this PO.

3. Calculate the average of all m-NN radius values from step 2, called the "average neighborhood radius".

4. Calculate the density of each object by counting the objects whose distances to it (from step 1) lie within the average neighborhood radius (from step 3).

5. Calculate a measure of how suitable an object is to be chosen as a cluster center, called the "candidate measure", by dividing the object's density (from step 4) by its m-NN radius (from step 2).

6. Determine the number of clusters k via the method presented in a later sub-section (this is an additional step).

7. Pick the k objects with the top k candidate measure values (from step 5) as the k initial cluster centers.

The pseudo-code of the function for selecting initial cluster centers is given in Table 3.4. Note that the pseudo-code of the function NumberOfClusters will be presented in a later sub-section.

FROM objectData GET NumberOfItems
FROM distanceData GET DistanceValues
FOR i = 1 TO NumberOfItems
    Calculate and save m-NN radius of item i to initialClusterCenterData
Calculate average neighborhood radius from all m-NN radii
FOR i = 1 TO NumberOfItems
    Calculate and save density of item i to initialClusterCenterData
    Calculate and save Candidate Measure of item i to initialClusterCenterData
CALL NumberOfClusters → k
SORT_DESCENDING initialClusterCenterData, Key = Candidate Measure → SortedCandidateMeasure
GET top k of SortedCandidateMeasure → ListOfInitialClusterCenters
RETURN initialClusterCenterData, ListOfInitialClusterCenters
END FUNCTION
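The steps and pseudo-code above can be expressed as the following runnable Python sketch. Several details are assumptions rather than part of the method as described: items have one-dimensional features (e.g., quantities) compared by absolute difference, standing in for the distance of sub-section 3.1; an object's density does not count the object itself; k is passed in directly because the NumberOfClusters function is not shown in this section; and an m-NN radius of zero (m identical neighbors) is treated as an infinite candidate measure. The function and key names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def initial_cluster_centers(values: Sequence[float], m: int, k: int) -> Dict[str, List]:
    """m-NN density-based selection of k initial cluster centers (sketch)."""
    n = len(values)
    if not 1 <= m < n or not 1 <= k <= n:
        raise ValueError("require 1 <= m < n and 1 <= k <= n")
    # Step 1: pairwise distances (absolute difference as a stand-in).
    dist = [[abs(a - b) for b in values] for a in values]
    # Step 2: m-NN radius = distance to the m-th nearest other object.
    radius = [sorted(dist[i][j] for j in range(n) if j != i)[m - 1]
              for i in range(n)]
    # Step 3: average neighborhood radius (needs all m-NN radii first).
    avg_radius = sum(radius) / n
    # Step 4: density = number of other objects within the average radius.
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= avg_radius)
               for i in range(n)]
    # Step 5: candidate measure = density / m-NN radius.
    measure = [density[i] / radius[i] if radius[i] > 0 else math.inf
               for i in range(n)]
    # Steps 6-7: k is given; keep the k objects with the largest measures
    # (sorted() is stable, so ties keep their original item order).
    order = sorted(range(n), key=lambda i: measure[i], reverse=True)
    return {"radius": radius, "density": density,
            "measure": measure, "centers": order[:k]}
```

On the toy quantities [10, 11, 12, 30, 31, 32, 100] with m = 2 and k = 2, the sketch selects indices [1, 4] (quantities 11 and 31, the middle of each dense group), while the isolated item 100 has density 0 and therefore the lowest candidate measure.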


The m-NN-based method above helps to resolve the issue groups of random initialization to some extent. First, the concepts of m-NN objects, m-NN radius, and m-NN-based density ensure that a center candidate is surrounded by a large number of neighbors (high density) within an acceptable radius threshold that suits the structure of the data set (namely, the average of the m-NN radius values of the data objects). In this way, the method lowers the chance that isolated data objects become cluster centers. Second, computing candidate measure values from the density and radius of each data object yields a unique initialization result that also fits the concept of a cluster center. Therefore, the candidate measure in this method addresses two issue groups: the wide space of possible initialization results and the risk of being stuck in a local optimum. A summary of how the properties of the proposed m-NN-based initialization method match the issue groups of random initialization is presented in Table 3.5.

Table 3.5 Matching between properties of the proposed initialization method based on m-NN and issue groups of random initialization

Properties of m-NN-based initialization component | Issue groups of random initialization: Large search space | Risk of being stuck in a local optimum | Risk of empty clusters/effects of isolated objects
m-NN cluster radius

Posted: 31/07/2024, 09:32
