Approximation Capabilities of Feedforward Networks and SLFNs


In this section, we review the function approximation capability of multilayer feedforward neural networks and, in particular, the capability of SLFNs for solving such problems. From a mathematical viewpoint, research on the approximation capabilities of multilayer feedforward neural networks focuses on two aspects: universal approximation in $R^p$ (or on a compact subset of $R^p$), and approximation of a finite set of patterns $\{(x_i, t_i)\,|\, x_i \in R^p,\ t_i \in R^c,\ i = 1, 2, \ldots, N\}$.

The universal approximation capability of standard multilayer feedforward neural networks has been studied by many researchers. One of the first results was provided by Hecht-Nielsen [25], who used Kolmogorov's theorem to prove that an arbitrary continuous mapping can be approximated by a concrete neural network. Rigorous mathematical proofs of the approximation capability of feedforward networks with continuous or monotonic sigmoidal and some other classical activation functions were given in [26-28]. Hornik [29] proved that continuous mappings can be approximated in measure by neural networks on compact input sets if the activation function is non-constant, bounded and continuous.

An improvement of Hornik's result was given by Leshno et al. [30], who showed that feedforward networks with a non-polynomial activation function can approximate (in measure) continuous functions. In addition, Ito and Chen established the approximation capability of neural networks with generalized sigmoidal functions [31-33]. Before reviewing these results, let us first recall the following definitions.

Definition 2.1 [31]: A function $f: R \to R$ is called a generalized sigmoidal function if the limits
$$\lim_{x \to -\infty} f(x) = 0, \qquad \lim_{x \to +\infty} f(x) = 1$$
exist. Note that $f$ does not need to be continuous or monotone.
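As a concrete illustration (an example added here, not taken from [31]), both of the following satisfy Definition 2.1: the logistic function, which is continuous and monotone, and a perturbed version of it, which is bounded but not monotone,
$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad f(x) = \frac{1}{1 + e^{-x}} + e^{-x^2}\sin(5x),$$
since the perturbation $e^{-x^2}\sin(5x)$ vanishes as $x \to \pm\infty$, so $f$ keeps the limits $0$ and $1$ while oscillating near the origin.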

Definition 2.2 [31]: If a continuous function $g$ is defined in $R^p$ and $\lim_{\|x\| \to \infty} g(x)$ exists, then $g(x)$ is called a continuous function in the extended $R^p$ (denoted as $\overline{R}^p$), and the set of all continuous functions defined in the extended $R^p$ is written as $C(\overline{R}^p)$, namely
$$C(\overline{R}^p) = \left\{ g \in C(R^p) : \lim_{\|x\| \to \infty} g(x) \text{ exists} \right\}.$$
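For instance (an illustrative example, not from [31]), the Gaussian $g(x) = e^{-\|x\|^2}$ belongs to $C(\overline{R}^p)$ because it is continuous on $R^p$ and tends to $0$ as $\|x\| \to \infty$, whereas $g(x) = \sin(\|x\|)$ does not, since it has no limit at infinity.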

In [32, 33], Ito proved the uniform approximation capability of neural networks in $C(\overline{R}^p)$, where the sigmoidal function was assumed to be monotone. T. Chen et al. [31] improved Ito's results and showed the approximation capability in $C(\overline{R}^p)$ of neural networks with a bounded sigmoidal function that is not necessarily monotone (a generalized sigmoidal function). A significant result proved by T. Chen et al. in [31] is the following:

Lemma 2.1 [31]: If $f(x)$ is a bounded generalized sigmoidal function, and $g(x)$ is a continuous function in $R$ for which $\lim_{x \to -\infty} g(x) = A$ and $\lim_{x \to +\infty} g(x) = B$, where $A$ and $B$ are constants, then for any $\varepsilon > 0$ there exist $K$, $\alpha_i$, $w_i$, and $b_i$ such that
$$\left| g(x) - \sum_{i=1}^{K} \alpha_i f(w_i x + b_i) \right| < \varepsilon$$
holds for all $x$ in $R$.

This extended the earlier results on continuous and monotonic sigmoidal functions. However, the result cannot be extended to arbitrary functions $f$ in $R$, as the following lemma shows:

Lemma 2.2 [34]: Given any (bounded or unbounded) function $f$ in $R$, if $\lim_{x \to -\infty} f(x) = \lim_{x \to +\infty} f(x)$, then for any mapping $g$ (continuous or non-continuous) defined in $R$ for which $\lim_{x \to -\infty} g(x)$ and $\lim_{x \to +\infty} g(x)$ exist, and for every $\varepsilon > 0$, there do not always exist $\alpha_i$, $w_i$, $b_i \in R$ and $K \in N$ such that
$$\left| g(x) - \sum_{i=1}^{K} \alpha_i f(w_i x + b_i) \right| < \varepsilon$$
holds for all $x \in R$.
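To make the obstruction concrete (this example is an added illustration, not from [34]): take $f(x) = e^{-x^2}$, which is bounded with $\lim_{x \to -\infty} f(x) = \lim_{x \to +\infty} f(x) = 0$. Every term $\alpha_i f(w_i x + b_i)$ of a finite sum then has the same limit at $-\infty$ and $+\infty$ (either $0$ when $w_i \neq 0$, or the constant $\alpha_i f(b_i)$ when $w_i = 0$), so the whole sum has equal limits at both infinities. Such a sum cannot stay within $\varepsilon$ of a target $g$ whose limits at $-\infty$ and $+\infty$ differ by more than $2\varepsilon$, e.g. $g(x) = \tanh(x)$.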

This lemma shows that boundedness alone is not sufficient for extending the result on sigmoidal functions to more general activation functions. Some additional condition is needed, and Huang et al. [34] found that unequal limits at the two infinities are what is required. The following lemma confirms this.

Lemma 2.3 [34]: Given a bounded function $f$ in $R$ such that the limits $\lim_{x \to -\infty} f(x)$ and $\lim_{x \to +\infty} f(x)$ exist and $\lim_{x \to -\infty} f(x) \neq \lim_{x \to +\infty} f(x)$, then for an arbitrary mapping $g$ in $C(\overline{R}^p)$ and for every $\varepsilon > 0$ there exist $\alpha_i, b_i \in R$, $w_i \in R^p$, and $K \in N$ such that
$$\left| g(x) - \sum_{i=1}^{K} \alpha_i f(w_i \cdot x + b_i) \right| < \varepsilon$$
holds for all $x \in R^p$.

From the above results, we can see that standard SLFNs can uniformly approximate arbitrary continuous mappings in $C(\overline{R}^p)$ with any bounded activation function (continuous or non-continuous) that has two unequal limits at the infinities. Boundedness of the activation function is sufficient here, but not necessary.
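As a numerical illustration of this approximation capability (a minimal sketch added here, not part of the thesis; the target function, the logistic activation, and the least-squares fitting of the output weights are assumptions of the demo, not the constructive proofs of the lemmas), the snippet below fits an SLFN with a bounded sigmoidal activation to a continuous target on an interval and reports the maximum absolute error:

```python
import numpy as np

def sigmoid(z):
    # Bounded generalized sigmoidal activation: limits 0 and 1 at -inf / +inf.
    return 1.0 / (1.0 + np.exp(-z))

def fit_slfn(x, y, n_hidden, rng):
    """Fit an SLFN g(x) ~ sum_i alpha_i * sigmoid(w_i * x + b_i).

    Hidden weights/biases are drawn randomly; the output weights alpha are
    obtained by linear least squares (a common practical shortcut, not the
    construction used in the cited proofs)."""
    w = rng.uniform(-5.0, 5.0, size=n_hidden)      # hidden weights
    b = rng.uniform(-5.0, 5.0, size=n_hidden)      # hidden biases
    H = sigmoid(np.outer(x, w) + b)                # hidden-layer outputs, N x K
    alpha, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights
    return w, b, alpha

def slfn(x, w, b, alpha):
    return sigmoid(np.outer(x, w) + b) @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-3.0, 3.0, 400)
    g = np.sin(2.0 * x) + 0.5 * x                  # continuous target on [-3, 3]
    for K in (5, 20, 80):                          # more hidden units: typically smaller error
        w, b, alpha = fit_slfn(x, g, K, rng)
        err = np.max(np.abs(g - slfn(x, w, b, alpha)))
        print(f"K = {K:3d} hidden units, max |error| = {err:.4f}")
```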

Regarding the approximation of a finite set by SLFNs, it has been shown that $N$ distinct patterns $(x_i, t_i)$ can be learned precisely by an SLFN with $N$ hidden units and the signum activation function, which is defined by
$$\operatorname{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ \phantom{-}0 & \text{if } x = 0 \\ \phantom{-}1 & \text{if } x > 0 \end{cases}$$

Bounds on the number of hidden units were given in [35]. However, such bounds are not easy to obtain with Huang's approach, especially for non-regular activation functions. Sartori and Antsaklis [36] observed that the bounds can be derived simply by satisfying a rank condition on the hidden-layer output. It was also shown that the hidden-layer weights can be chosen almost arbitrarily, while the output-layer weights can then be calculated by solving N-1 linear equations. These results were proved for the case where the activation function of the hidden units is the signum function, although it was pointed out that the hidden-unit nonlinearities are not restricted to the signum function. While this method is efficient for activation functions such as the signum and sigmoidal functions, it is not feasible in all cases, and its success depends on the activation function and on the distribution of the input patterns. For some activation functions, the almost arbitrarily chosen weights may place the hidden-unit inputs inside an interval where the activation behaves linearly, in which case the hidden-layer output matrix is not invertible. This issue was demonstrated in [37].
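The following sketch (added for illustration; the random weight ranges and the logistic activation are assumptions of this demo, not choices made in [36] or [37]) shows the rank-condition idea in practice: with N hidden units and almost arbitrarily chosen hidden weights, the N x N hidden-layer output matrix H is generically invertible, and the output weights that reproduce the N targets exactly are obtained by solving a linear system:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def learn_exactly(X, T, rng):
    """Learn N distinct patterns with zero error using N hidden units.

    X: (N, p) inputs, T: (N, c) targets. Hidden weights W and biases b are
    chosen at random; the output weights beta solve H @ beta = T, where H is
    the N x N hidden-layer output matrix."""
    N, p = X.shape
    W = rng.uniform(-1.0, 1.0, size=(p, N))  # hidden weights, chosen almost arbitrarily
    b = rng.uniform(-1.0, 1.0, size=N)       # hidden biases
    H = sigmoid(X @ W + b)                   # hidden-layer output matrix, N x N
    beta = np.linalg.solve(H, T)             # output weights via a linear solve
    return W, b, beta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, p, c = 10, 3, 2
    X = rng.normal(size=(N, p))              # N distinct random patterns
    T = rng.normal(size=(N, c))              # arbitrary targets
    W, b, beta = learn_exactly(X, T, rng)
    T_hat = sigmoid(X @ W + b) @ beta
    print("max training error:", np.max(np.abs(T - T_hat)))  # ~0 (machine precision)
```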

The nonlinear activation functions admissible for the approximation of a finite set by SLFNs were also investigated by Huang et al. [37]. It was rigorously proved that standard SLFNs with at most N hidden units and any bounded nonlinear activation function that has a limit at infinity can learn N distinct patterns with zero error. Such functions cover nearly all activation functions commonly used in applications. In addition, it has been conjectured that "the necessary and sufficient condition on the activation functions for SLFNs with N hidden units to approximate N distinct patterns precisely is that these activation functions are nonlinear".

From the above results, we can conclude that SLFNs can precisely approximate arbitrary continuous mappings in $C(\overline{R}^p)$ as well as a finite set of patterns $\{(x_i, t_i)\,|\, x_i \in R^p,\ t_i \in R^c,\ i = 1, 2, \ldots, N\}$ if the activation functions are chosen properly. Therefore, in the following we focus on SLFNs for function approximation instead of more general multilayer feedforward neural models.
