Up to now, many methods have been applied to summarize text automatically, including [21]:
- Traditional methods: term, word, phrase frequencies
- Corpus-based approaches: combination of statistical features, learning to extract
- Discourse structures: WordNet, rhetorical analysis
- Knowledge-rich approaches: tailored to particular domains
Evolutionary computation is a newer approach to automatic text summarization, in which candidate solutions are evolved until a certain benchmark is satisfied.
2.2. Evolutionary computation
In computer science, evolutionary computation is a subfield of artificial intelligence comprising several types of evolutionary algorithms based on Darwinian principles. These algorithms belong to the family of trial-and-error problem solvers and can be regarded as global optimization methods of a meta-heuristic or stochastic character that operate on a population of candidate solutions [1].
Evolutionary computation relies on the continual progression of a population, whose members are selected in a guided random search until the required stopping condition is reached.
Automated problem solving that uses Darwinian principles started in the 1950s.
However, it was only in the 1960s that three different interpretations of this idea began to be implemented, in three separate strands.
Evolutionary programming (EP) was invented by Lawrence J. Fogel in the US, while John Henry Holland proposed the genetic algorithm (GA), and Ingo Rechenberg and Hans-Paul Schwefel introduced evolution strategies (ES).
Although these algorithms were proposed quite early, they have only been regarded as variants of one technology, known as evolutionary computation, since the early nineties [1].
This is a concept based on natural evolution. In nature, the plants and animals that have survived and adapted to a changing environment are the fittest ones, those not eliminated by natural selection. Individuals in a population act as parents, producing new offspring through mutation and crossover. These offspring must compete against others, including their parents, to survive into the next generation. Overall, mutation and crossover diversify the properties of the offspring, while natural selection increases the quality (fitness) of the population [2]. Table 2.1 below shows the corresponding concepts of natural evolution and problem solving [3].
| Evolution   | Problem solving    |
|-------------|--------------------|
| Environment | Problem            |
| Individual  | Candidate solution |
| Fitness     | Quality            |

Table 2.1. The basic evolutionary computation metaphor linking natural evolution to problem solving
Figure 2.5 and Figure 2.6 illustrate the typical pseudo-code and general scheme of evolutionary algorithms [3].
Figure 2.5. The general scheme of an Evolutionary Algorithm in pseudo-code
Figure 2.6. General scheme of evolutionary algorithms
Evolutionary computation comprises several algorithms that search for optimal solutions to a problem.
Figure 2.6 illustrates how a typical population is transformed from start to finish. An evolutionary algorithm begins by initializing a number of individuals that form a population. Each individual is evaluated with a fitness function, which varies with the type of algorithm and the specific problem. Some or all of these individuals are chosen as parents and undergo reproduction operators to produce new children. The fitness values of these offspring are then calculated; in other words, their quality is assessed. The better individuals among parents and children are chosen as members of the next generation. The process is repeated until the best individual is found according to a certain stopping criterion.
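To make this loop concrete, the following Python sketch mirrors the general scheme of Figure 2.5. It is illustrative only: the uniform initialization, truncation-style parent selection, uniform crossover and Gaussian mutation are assumptions chosen for the sake of a runnable example, not part of the cited scheme.

```python
import random

def evolutionary_algorithm(fitness, dim, pop_size=30, generations=100):
    """Generic EA skeleton: initialise, evaluate, vary, select survivors, repeat."""
    # INITIALISE: random population of real-valued candidates in [0, 1].
    population = [[random.random() for _ in range(dim)] for _ in range(pop_size)]

    for _ in range(generations):                  # TERMINATION: fixed generation budget
        # EVALUATE each candidate and keep the better half as parents (one possible choice).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]

        # RECOMBINE and MUTATE to produce offspring.
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            a, b = random.sample(parents, 2)
            child = [a[i] if random.random() < 0.5 else b[i] for i in range(dim)]  # uniform crossover
            j = random.randrange(dim)
            child[j] += random.gauss(0.0, 0.1)                                     # Gaussian mutation
            offspring.append(child)

        # SELECT survivors: parents plus their offspring form the next generation.
        population = parents + offspring

    return max(population, key=fitness)

# Example: maximise a toy fitness function over three variables.
best = evolutionary_algorithm(lambda x: -sum((v - 0.3) ** 2 for v in x), dim=3)
```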
According to A. E. Eiben and J. E. Smith, the typical progression of fitness during a run is shown in Figure 2.7 [3]:
Figure 2.7. Correlation between number of generations and best fitness in population
The mechanisms that decide how children are created and how survivors are chosen among parents and children vary among specific evolutionary algorithms. The following section explains in detail one technique applied to a real-world problem: automatic text summarization. In this thesis, we deal with extractive multi-document summarization.
Typical evolutionary algorithms include differential evolution, genetic algorithms, genetic programming, and evolutionary programming. In this research, we focus on the first of these.
2.3. Differential evolution (DE)
Differential evolution appeared when Ken Price tried to solve the Chebyshev polynomial fitting problem [14] posed by Rainer Storn. Progress was made when Ken came up with the idea of using vector differences to perturb the vector population. Since then, discussions between Ken and Rainer and computer simulations on both sides have brought many considerable improvements that make DE the flexible and powerful tool it is today. The "DE community" has been growing since the early DE years of 1994-1996, and more and more researchers are working on and with DE. Ken and Rainer hope that DE will be improved further by scientists around the world and that it may help more users in their daily work. This wish is the reason why DE has not been patented [8].
Figure 2.8. Steps of differential evolution algorithm
In this algorithm, a population of a certain number of individuals is initialized first; each individual is a float-valued vector bounded in a specific range. These vectors (target vectors) may then be binarized and evaluated with a fitness/objective function. The idea of the algorithm is that new generations of individuals are created from their parents by a set of operators: mutation, which is based on the difference of randomly sampled pairs of individuals and defines the search mechanism; crossover, which exchanges elements of a pair of individuals and increases the diversity of the offspring's features; and selection, which chooses the better individuals between parents and their children to become members of the next generation.
The process is repeated until generation t_max is reached or a predefined fitness value is satisfied. The result of the algorithm is the best individual, corresponding to a float-valued or binary vector P of dimension n. In the case of text summarization, n is the number of sentences in the document collection and P is a binary vector in which P[i] = 1 if the i-th sentence is selected for the summary and P[i] = 0 otherwise, i = {1, 2, ..., n}. Figure 2.8 illustrates the main steps of a typical differential evolution algorithm.
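How the real-valued vector is turned into the binary selection vector P is specified in the next chapter; the threshold rule below is only an assumed, minimal sketch for illustration.

```python
import numpy as np

def binarize(x, threshold=0.5):
    """Hypothetical binarization: P[i] = 1 means sentence i is selected for the summary."""
    return (np.asarray(x) > threshold).astype(int)

# A float-valued individual over a collection of n = 6 sentences:
x = [0.91, 0.12, 0.55, 0.08, 0.73, 0.30]
print(binarize(x))        # [1 0 1 0 1 0] -> sentences 1, 3 and 5 would form the summary
```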
Pseudo-code of this algorithm is given below [15]:
Begin
    Generate randomly an initial population of solutions
    Calculate the fitness of the initial population
    Do
        For each parent
            Select three different solutions at random
            Create one offspring using DE operators (mutation, crossover)
            If offspring is the same or better than its parent (selection)
                Parent is replaced
        End For
    While the stopping condition is not satisfied
End
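A minimal Python sketch of this pseudo-code (the classic DE/rand/1/bin variant) is shown below; the population size, F and CR values are illustrative defaults, not the settings used later in this thesis.

```python
import numpy as np

def differential_evolution(fitness, dim, pop_size=20, F=0.8, CR=0.5,
                           bounds=(0.0, 1.0), max_gen=100, seed=None):
    """Minimal DE/rand/1/bin sketch following the pseudo-code above (maximization)."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    # Generate randomly an initial population of solutions.
    pop = rng.uniform(low, high, size=(pop_size, dim))
    # Calculate the fitness of the initial population.
    fit = np.array([fitness(ind) for ind in pop])

    for _ in range(max_gen):                                  # stopping condition: t_max generations
        for p in range(pop_size):                             # for each parent (target vector)
            # Select three different solutions at random, all distinct from the parent.
            r1, r2, r3 = rng.choice([i for i in range(pop_size) if i != p],
                                    size=3, replace=False)
            # Mutation: perturb one vector with the weighted difference of two others.
            mutant = pop[r3] + F * (pop[r1] - pop[r2])
            # Crossover: binomial mixing of parent and mutant components.
            k = rng.integers(dim)
            mask = rng.random(dim) <= CR
            mask[k] = True                                    # at least one component from the mutant
            trial = np.where(mask, mutant, pop[p])
            # Selection: the offspring replaces the parent if it is the same or better.
            f_trial = fitness(trial)
            if f_trial >= fit[p]:
                pop[p], fit[p] = trial, f_trial

    best = int(np.argmax(fit))
    return pop[best], fit[best]

# Example use on the numerical problem of the next subsection: maximize x1 + x2 + x3.
best_x, best_f = differential_evolution(lambda x: x.sum(), dim=3, seed=0)
```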
Example [7]:
The following numerical example is given to demonstrate the DE algorithm. We have the objective/fitness function:
Maximize f(X) = x1 + x2 + x3, in which X = [x1, x2, x3]
Our goal is to find x1, x2 and x3. We will follow the steps of the pseudo-code above to solve this problem.
Generate randomly an initial population of solutions:
Each individual (solution) contains values for x1, x2 and x3. Thus, it is a three-dimensional vector of the form Xp = [xp.1, xp.2, xp.3]. We initialize P individuals bounded in the interval [xmin, xmax]. In this case, we choose xmin = 0 and xmax = 1.
Overall, we randomly generate six three-dimensional vectors X1, X2, X3, X4, X5 and X6 whose elements are bounded in (0, 1).
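In code, this initialization step could look as follows (a sketch with numpy; no fixed seed is assumed, so each run produces different vectors than those listed in Table 2.2):

```python
import numpy as np

rng = np.random.default_rng()                     # each run yields a different random population
population = rng.uniform(0.0, 1.0, size=(6, 3))   # six individuals X1..X6 with genes x1, x2, x3
fitness = population.sum(axis=1)                  # f(X) = x1 + x2 + x3 for every individual
```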
Calculate the fitness of the initial population (generation 0):
|      | X1   | X2   | X3   | X4   | X5   | X6   |
|------|------|------|------|------|------|------|
| x1   | 0.68 | 0.92 | 0.22 | 0.12 | 0.40 | 0.94 |
| x2   | 0.89 | 0.92 | 0.14 | 0.09 | 0.81 | 0.63 |
| x3   | 0.04 | 0.33 | 0.40 | 0.05 | 0.83 | 0.13 |
| f(X) | 1.61 | 2.17 | 0.76 | 0.26 | 2.04 | 1.70 |

Table 2.2. Fitness of the six individuals at generation 0

For each parent:
Now these six individuals are going to produce their own children.
Firstly, choose individual 1 as the first target vector (the first parent):
Select three different solutions at random:
We randomly select three different individuals, for example individuals 2, 4 and 6.
Create one offspring using DE operators:
Mutation: the mutant vector V1 = [v1.1, v1.2, v1.3] is computed as

v1.i = x6.i + F * (x2.i - x4.i), i = {1, 2, 3}

where F is the mutant factor.
|    | X2   | X4   | Difference vector (X2 - X4) | Weighted difference vector (x F, F = 0.8) | X6   | Mutant vector (V1) |
|----|------|------|------|------|------|------|
| x1 | 0.92 | 0.12 | 0.80 | 0.64 | 0.94 | 1.58 |
| x2 | 0.92 | 0.09 | 0.83 | 0.66 | 0.63 | 1.29 |
| x3 | 0.33 | 0.05 | 0.28 | 0.22 | 0.13 | 0.35 |

Table 2.3. Creation of mutant vector V1
The result of the mutation operator is a mutant vector. The mutant vector corresponding to target vector X1 = [0.68, 0.89, 0.04] is V1 = [1.58, 1.29, 0.35].
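This arithmetic can be verified in a few lines, using the values of X2, X4 and X6 from Table 2.2 and F = 0.8:

```python
import numpy as np

X2 = np.array([0.92, 0.92, 0.33])
X4 = np.array([0.12, 0.09, 0.05])
X6 = np.array([0.94, 0.63, 0.13])
F = 0.8

V1 = X6 + F * (X2 - X4)          # mutation: base vector plus weighted difference
print(np.round(V1, 2))           # [1.58 1.29 0.35], the mutant vector of Table 2.3
```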
Crossover: the mutant vector V1 is crossed with the target vector X1 to create the trial vector Z1, as shown in Table 2.4. Each component is chosen according to

zp.i = vp.i   if randp.i <= CR or i = k
zp.i = xp.i   otherwise

where
k: a random index, k ∈ {1, 2, 3}
CR: the crossover rate
randp.i: a random number within [0, 1], redrawn for each i-th component of the p-th vector
If k = 1, then the trial vector is formed as follows (crossover with CR = 0.5):

|      | Target vector (X1) | Random number (randp.i) | Mutant vector (V1) | Trial vector (Z1) |
|------|------|------|------|------|
| x1   | 0.68 | 0.8  | 1.58 | 1.58 |
| x2   | 0.89 | 0.3  | 0.89 | 0.89 |
| x3   | 0.04 | 0.7  | 0.35 | 0.04 |
| f(X) | 1.61 |      |      | 2.51 |

Table 2.4. Creation of the trial vector Z1
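The crossover rule itself fits in a small helper function; in a real DE run the random numbers and the index k would be drawn inside the loop, so the arguments here are placeholders:

```python
import numpy as np

def binomial_crossover(target, mutant, rand, k, CR=0.5):
    """z_i = mutant_i if rand_i <= CR or i == k, otherwise z_i = target_i."""
    mask = np.asarray(rand) <= CR
    mask[k] = True                      # component k is always taken from the mutant
    return np.where(mask, mutant, target)
```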
If the offspring is the same or better than its parent (selection), the parent is replaced:
Since f(X1) < f(Z1), the trial vector Z1 becomes target vector X1 of the next generation (generation 1), as shown in Table 2.5.
|      | X1   | X2 | X3 | X4 | X5 | X6 |
|------|------|----|----|----|----|----|
| x1   | 1.58 |    |    |    |    |    |
| x2   | 0.89 |    |    |    |    |    |
| x3   | 0.04 |    |    |    |    |    |
| f(X) | 2.51 |    |    |    |    |    |

Table 2.5. Values of X1 in generation 1
The process continues with target vector X2 of generation 0 and ends when generation t_max (user defined) is reached. We take the maximum f(X) as the final fitness value and the corresponding X as the final solution. Figure 2.9 summarizes the steps used to obtain X1 at generation 1:
Figure 2.9. Steps to get the next X1 (generation 1)
On the whole, the properties we need to consider in DE are the following (a sample parameter configuration is sketched after this list):
- Solution representation: a real-valued or binary vector
- Population size and initialization: the population is usually initialized randomly, bounded in an interval. The population size is the number of individuals in a generation and is an important parameter to decide. If it is too small, the algorithm converges too fast and individuals can reach only a small part of the search space. If it is too large, resources are wasted and the search process is prolonged.
- Objective/fitness function: this function evaluates how good a solution is, so it needs to be designed carefully.
- Operators [22]:
o Mutation: its goal is to define the search mechanism of the algorithm by generating mutant vectors. The mutant factor (F) decides how strongly a solution is perturbed. If the mutant factor is large, the jumps are bigger, so the population can escape local optima more effectively; a small mutant factor makes convergence faster, but there is a high probability that the algorithm gets stuck in a local optimum.
o Crossover: its aim is to create trial vectors from mutant vectors. A trial vector is a mixture of the parent and mutant vectors. The higher the crossover rate (CR), the more likely the trial vector is to inherit components from the mutant vector rather than from the parent. In other words, CR decides the swap probability between the target and trial vectors.
o Selection: this ensures that the next generation is at least as good as, or better than, the previous one.
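These properties boil down to a handful of design choices that can be collected in one place. The values below are a hypothetical configuration for illustration only; the settings actually used for text summarization are given in the next chapter.

```python
from dataclasses import dataclass

@dataclass
class DEConfig:
    pop_size: int = 30            # number of individuals per generation
    F: float = 0.8                # mutant factor: larger values mean bigger jumps in the search space
    CR: float = 0.5               # crossover rate: chance of inheriting a component from the mutant
    max_gen: int = 200            # stopping criterion: maximum number of generations
    bounds: tuple = (0.0, 1.0)    # interval in which each component is initialized
```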
2.4. Conclusion
This chapter has provided background knowledge about automatic text summarization, its types and methodologies. It has then presented the main idea of evolutionary computation and its associated algorithms, one of which is differential evolution. For DE, we have outlined the general algorithm and the properties that affect its efficiency. The detailed DE algorithm used to solve the text summarization problem is described in the next chapter.