Main steps of differential evolution

Part of the document (Master's thesis) RESEARCH AND APPLY EVOLUTIONARY COMPUTATION TECHNIQUES ON AUTOMATIC TEXT SUMMARIZATION (pages 56 - 66)

This section explains, step by step, how the differential evolution algorithm is used to solve the problem of automatic text summarization, in particular problem (1).

Step 1: Initialization:

Randomly generate a population of P individuals. Each individual is a real-valued vector:

$$U_p(t) = [u_{p,1}(t), \dots, u_{p,n}(t)], \quad p = 1, 2, \dots, P$$

where P is the population size, n is the number of sentences in the document collection, and t is the generation number.

At generation 0, each element $u_{p,i}(0)$ of individual $U_p(0)$ is initialized as:

$$u_{p,i}(0) = u_i^{\min} + (u_i^{\max} - u_i^{\min}) \cdot rand_{p,i} \tag{7}$$


where $rand_{p,i}$ is a random number in [0, 1], drawn anew for each element $u_{p,i}$ of vector $U_p(0)$, and $u_i^{\min}$ and $u_i^{\max}$ are often set to -5 and 5, respectively.
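As a minimal sketch of formula (7), the initialization can be written as follows. The function name `init_population`, the optional `rng` argument, and the use of the same bounds for every component are assumptions made for illustration; they are not taken from the thesis.

```python
import random

def init_population(pop_size, n, u_min=-5.0, u_max=5.0, rng=None):
    """Formula (7): u_{p,i}(0) = u_min + (u_max - u_min) * rand_{p,i}.

    Assumption: every component i shares the same bounds u_min and u_max.
    """
    rng = rng or random.Random()
    return [
        # A fresh random number is drawn for every element of every vector.
        [u_min + (u_max - u_min) * rng.random() for _ in range(n)]
        for _ in range(pop_size)
    ]

# Hypothetical sizes: P = 4 individuals, n = 6 sentences.
population = init_population(pop_size=4, n=6, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes a run reproducible, which is convenient when comparing parameter settings.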

Because we are working on the text summarization problem, the solution vectors must be in binary representation.

Step 2: Binarization:

Convert P real-valued vectors to P binary vectors using the formula:

$$u_{p,i}(t) = \begin{cases} 1, & \text{if } rand_{p,i} < sigm(u_{p,i}(t)) \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

$$sigm(z) = \frac{1}{1 + \exp(-z)} \tag{9}$$

where $rand_{p,i}$ is a random number in [0, 1], drawn anew for the i-th component of the p-th vector.
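Formulas (8) and (9) might be sketched as below. The names `sigm` and `binarize` and the `rng` argument are illustrative assumptions. The plain form of the sigmoid is adequate here because the components stay within the small bounds used in Step 1.

```python
import math
import random

def sigm(z):
    """Formula (9): logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def binarize(vector, rng=None):
    """Formula (8): component i becomes 1 with probability sigm(u_i)."""
    rng = rng or random.Random()
    # A new random number is drawn for each component.
    return [1 if rng.random() < sigm(u) else 0 for u in vector]

bits = binarize([-5.0, 0.0, 5.0], rng=random.Random(0))
```

Note that the mapping is stochastic: a component equal to 5 is set to 1 with probability of about 0.993, not with certainty.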

Step 3: Evaluation:

Calculate the fitness value of each of the P individuals in the population.

Step 4: Mutation:

The aim of this operator is to generate mutant vectors, which lets the algorithm widen its search directions and explore the search space.

For each target vector $U_p(t)$, we randomly choose three vectors $U_{p_1}(t)$, $U_{p_2}(t)$ and $U_{p_3}(t)$ such that the indices p, p1, p2 and p3 are mutually distinct.

The mutant vector is:

$$V_p(t) = U_{p_1}(t) + F \times (U_{p_2}(t) - U_{p_3}(t)) \tag{10}$$

F is the mutation factor, which scales the difference $(U_{p_2}(t) - U_{p_3}(t))$; it is often chosen in the interval [0.4, 1.0] [19].

Figure 3.1 describes the position of vector $V_p$ relative to vectors $U_{p_1}$, $U_{p_2}$ and $U_{p_3}$ [6].


Figure 3.1. Illustration of mutation operation
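The mutation of formula (10) can be sketched as follows, assuming a population stored as a list of lists with at least four individuals. The function name `mutate` and the default F = 0.5 are illustrative choices, not values prescribed by the thesis.

```python
import random

def mutate(population, p, F=0.5, rng=None):
    """Formula (10): V_p = U_p1 + F * (U_p2 - U_p3).

    p1, p2, p3 are distinct indices, all different from p,
    so the population must hold at least four individuals.
    """
    rng = rng or random.Random()
    candidates = [q for q in range(len(population)) if q != p]
    p1, p2, p3 = rng.sample(candidates, 3)  # sampling without replacement
    u1, u2, u3 = population[p1], population[p2], population[p3]
    return [a + F * (b - c) for a, b, c in zip(u1, u2, u3)]

# Hypothetical toy population with P = 4 and n = 2.
toy_population = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mutant = mutate(toy_population, p=0, F=0.5, rng=random.Random(1))
```

Because `random.sample` draws without replacement from the indices other than p, the distinctness requirement is satisfied by construction.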

Step 5: Check the boundary restriction:

Each component of the mutant vector is checked against the boundary constraints and corrected if it violates them:

$$v_{p,i}(t) = \begin{cases} 2 u_i^{\min} - v_{p,i}(t), & \text{if } v_{p,i}(t) < u_i^{\min} \\ 2 u_i^{\max} - v_{p,i}(t), & \text{if } v_{p,i}(t) > u_i^{\max} \\ v_{p,i}(t), & \text{otherwise} \end{cases} \tag{11}$$
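Formula (11) reflects a violating component back across the bound it crossed. A minimal sketch, assuming the same bounds for every component and the hypothetical name `reflect`:

```python
def reflect(v, u_min=-5.0, u_max=5.0):
    """Formula (11): mirror out-of-range components across the violated bound."""
    out = []
    for x in v:
        if x < u_min:
            x = 2 * u_min - x   # reflect across the lower bound
        elif x > u_max:
            x = 2 * u_max - x   # reflect across the upper bound
        out.append(x)
    return out

print(reflect([-6.5, 0.3, 7.0]))  # prints [-3.5, 0.3, 3.0]
```

A single reflection lands back inside the bounds as long as the violation is no larger than the width of the interval, which holds when the parent vectors are within the bounds and F is at most 1.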

Formula (11) makes sure that $v_{p,i}(t)$ is always in the interval $(u_i^{\min}, u_i^{\max})$.

Step 6: Crossover:

This operator lets offspring vectors inherit some features of their parents while diversifying the properties of the children. The target vector is mixed with the mutant vector to generate a trial vector:

$$Z_p(t) = [z_{p,1}(t), \dots, z_{p,n}(t)]$$


$$z_{p,i}(t) = \begin{cases} v_{p,i}(t), & \text{if } rand_{p,i} \le CR \text{ or } i = k \\ u_{p,i}(t), & \text{otherwise} \end{cases} \tag{12}$$

$rand_{p,i}$ is a random number in [0, 1], drawn anew for the i-th component of the p-th parameter vector.

CR ∈ [0, 1]: crossover constant

k ∈ {1, 2, …, n} is an index chosen at random for each p-th parameter vector. It guarantees that at least one element of the trial vector comes from the mutant vector rather than from the target (parent) vector; otherwise the trial vector could be identical to the target vector and no new vector would be created. The larger CR is, the more elements the trial vector is likely to take from the mutant vector rather than from the target vector. Figure 3.2 gives an example in which $rand_{p,2} \le CR$ and i = k = 4.
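The crossover of formula (12) can be sketched as follows. The function name `crossover` and the default CR = 0.9 are assumptions for illustration; the index k is 0-based here, whereas the text counts from 1.

```python
import random

def crossover(target, mutant, CR=0.9, rng=None):
    """Formula (12): take the mutant component if rand <= CR or i == k."""
    rng = rng or random.Random()
    n = len(target)
    k = rng.randrange(n)  # forced position, so the trial differs from the target
    return [
        mutant[i] if (rng.random() <= CR or i == k) else target[i]
        for i in range(n)
    ]

# Hypothetical vectors that differ in every position.
trial = crossover([0.0] * 5, [1.0] * 5, CR=0.0, rng=random.Random(0))
```

With CR = 0 essentially only position k is copied from the mutant vector, and with CR = 1 the trial vector equals the mutant vector, which illustrates the role of CR described in the text.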

Figure 3.2. Illustration of crossover operation

Step 7: Binarization:

Convert real-valued trial vectors to binary trial vectors (the same as step 2).

Step 8: Constraint handling:


However, the summary length constraint must also be satisfied. These researchers handle this restriction as follows:

- Any feasible solution is preferred over any infeasible solution.

- Two feasible solutions are compared based on their fitness values.

- Two infeasible solutions are compared based on how much they violate the constraint.

Here, feasible solutions are the vectors (individuals) that satisfy the restriction; all others are infeasible solutions. This method favors feasible solutions over infeasible ones, while infeasible solutions with high fitness values can still be kept.
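The three comparison rules can be sketched as a pairwise test. Several details here are assumptions, not taken from the thesis: the violation is measured as the amount by which the summary exceeds a maximum length, lengths are given per sentence, and ties are resolved in favor of the first argument.

```python
def violation(bits, sentence_lengths, max_length):
    """Assumed measure: how far the selected sentences exceed max_length."""
    length = sum(l for b, l in zip(bits, sentence_lengths) if b)
    return max(0, length - max_length)

def is_better(fit_a, viol_a, fit_b, viol_b):
    """Return True if solution a is preferred over solution b."""
    if viol_a == 0 and viol_b == 0:
        return fit_a >= fit_b      # both feasible: compare fitness
    if viol_a == 0 or viol_b == 0:
        return viol_a == 0         # exactly one feasible: it wins
    return viol_a <= viol_b        # both infeasible: smaller violation wins

# Hypothetical example: sentences of 10, 20 and 30 words, limit of 35 words.
print(violation([1, 0, 1], [10, 20, 30], 35))  # prints 5
```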

Step 9: Selection:

This operator keeps the population size constant. For each pair, the better of the target vector and the trial vector survives to the next generation:

$$U_p(t+1) = \begin{cases} Z_p(t), & \text{if } f(Z_p(t)) \ge f(U_p(t)) \\ U_p(t), & \text{otherwise} \end{cases} \tag{13}$$

where f(·) is the fitness function.

Thus, if the trial vector has a better or equal fitness value, it replaces its target vector in the next generation; otherwise the target vector is kept. That is why the population either improves or stays the same, but never gets worse.
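Formula (13) amounts to a one-to-one comparison between each target vector and its trial vector. In the sketch below, `select` is a hypothetical name and `sum` is a toy stand-in for the fitness function f, which in the thesis is the objective of problem (1).

```python
def select(targets, trials, fitness):
    """Formula (13): keep the trial vector if its fitness is not worse."""
    return [
        z if fitness(z) >= fitness(u) else u
        for u, z in zip(targets, trials)
    ]

# Toy fitness (assumption): the sum of the components.
next_generation = select([[0, 1], [1, 1]], [[1, 1], [0, 0]], fitness=sum)
print(next_generation)  # prints [[1, 1], [1, 1]]
```

Since each slot p is filled by either the target or the trial vector, the population size stays at P by construction.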

Step 10: Stopping criteria:

The evolution process continues by going back to step 2 until one of the following criteria is met:

- the maximum number of generations $t_{\max}$ is reached

- the best fitness of the population does not change considerably over consecutive iterations

- a specified CPU time limit is reached

- a pre-specified fitness value is attained

In this case, we choose the first one as the termination criterion.
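The four stopping criteria can be combined in a single check, as in the sketch below. The parameter names and defaults (`tol`, `patience`, `time_limit`, `target`) are hypothetical, and "does not change considerably" is interpreted here as a change smaller than `tol` over the last `patience` generations.

```python
import time

def should_stop(t, t_max, best_history, start_time,
                tol=1e-6, patience=20, time_limit=None, target=None):
    """Return True if any of the four stopping criteria is met."""
    if t >= t_max:                                   # generation budget used up
        return True
    if (len(best_history) > patience and
            abs(best_history[-1] - best_history[-1 - patience]) < tol):
        return True                                  # best fitness has stalled
    if (time_limit is not None and
            time.process_time() - start_time >= time_limit):
        return True                                  # CPU time limit reached
    if target is not None and best_history and best_history[-1] >= target:
        return True                                  # target fitness attained
    return False

start = time.process_time()
```

Leaving `time_limit` and `target` at `None` and choosing a `patience` larger than `t_max` reduces the check to the first criterion, which is the one chosen in the text.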

Step 11: Output:


Return the best vector found during the whole run as the final solution, and build the summary from it.
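Building the summary from the best binary vector can be sketched as follows, under the assumption that component i equal to 1 means that sentence i is included and that the selected sentences keep their original order. The name `build_summary` and the sample sentences are hypothetical.

```python
def build_summary(best_bits, sentences):
    """Join the sentences whose component in the best binary vector is 1."""
    return " ".join(s for b, s in zip(best_bits, sentences) if b == 1)

summary = build_summary([1, 0, 1], ["First.", "Second.", "Third."])
print(summary)  # prints First. Third.
```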

