In mathematical statistics, a priori information about the structure of the mathematical model is needed. In neural networks, the user estimates this structure by choosing the number of layers and the number and transfer functions of the nodes. This requires not only knowledge of the theory of neural networks, but also knowledge of the nature of the object and considerable time. Moreover, knowledge from systems theory about the systems being modelled is not applicable in the neural network world without transformation, and the rules of translation are unknown. These problems can be overcome by GMDH-type neural networks, which extract knowledge about the object directly from the data sample. The Group Method of Data Handling is an inductive sorting-out method that has advantages for rather complex objects having no definite theory, particularly for objects with fuzzy characteristics.
The following table compares the two methodologies, neural networks and inductive self-organizing modelling, with respect to their application to data analysis.
| | Neural networks | GMDH networks (statistical learning networks) |
|---|---|---|
| Data analysis | universal approximator | universal structure identifier |
| Analytical model | indirect approximation | direct approximation |
| Architecture | preselected, unbounded network structure; experimental selection of an adequate architecture demands time and experience | bounded network structure, evolved during the estimation process |
| Network synthesis | globally optimized fixed network structure | adaptively synthesized structure |
| A priori information | not usable without transformation into the concepts of neural networks | can be used directly to select the reference functions and criteria |
| Self-organization | deductive; subjective choice of the number of layers and number of nodes | inductive; the number of layers and nodes is estimated by the minimum of an external criterion (objective choice) |
| Parameter estimation | in a recursive way; demands long samples | estimation on a training set by maximum likelihood techniques, selection on a testing set (which may be extremely short or noisy) |
| Optimization | global search in a highly multimodal space; the result depends on the initial solution; tedious, requiring the user to set various algorithmic parameters by trial and error; time-consuming | the structure and the dependencies in the model are optimized simultaneously; not time-consuming; inappropriate parameters are excluded automatically |
| Access to result | available transiently in a real-time environment | usually stored and repeatedly accessible |
| Initial knowledge | requires knowledge of the theory of neural networks | requires only knowledge of the kind of task (criterion) and the class of system (linear, non-linear) |
| Convergence | global convergence is difficult to guarantee | a model of optimal complexity is found |
| Computing | suitable for implementation on hardware with parallel computation | efficient on ordinary computers, and also suited to massively parallel computation |
| Features | general-purpose, flexible, linear or non-linear, static or dynamic models | general-purpose, flexible, linear or non-linear, static or dynamic, parametric and non-parametric models |
Results obtained by statistical learning networks, and especially by GMDH algorithms, are comparable with results obtained by neural networks [30]. The well-known problem of the optimal (subjective) choice of neural network architecture is solved in GMDH algorithms by an adaptive synthesis (objective choice) of the architecture. Such algorithms combine the best features of neural networks and statistical techniques in a powerful way: they discover the entire model structure directly from the data sample, in the form of a network of polynomial functions, difference equations or another structure type. Models are selected automatically based on their ability to solve the task at hand (approximation, identification, forecasting, classification).
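To make this adaptive synthesis concrete, the sketch below shows the core loop of a multilayer GMDH-style algorithm in Python: all pairs of inputs are combined by quadratic partial descriptions fitted on a training subset, and only the candidates with the smallest external criterion (here, mean squared error on a separate testing subset) survive into the next layer. The quadratic reference function and all names are illustrative assumptions, not a fixed specification of the GMDH algorithm.

```python
import numpy as np
from itertools import combinations

def fit_partial(u, v, y):
    """Fit a quadratic partial description
    y ~ a0 + a1*u + a2*v + a3*u*v + a4*u^2 + a5*v^2
    by least squares (an illustrative reference function)."""
    X = np.column_stack([np.ones_like(u), u, v, u * v, u**2, v**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def eval_partial(coef, u, v):
    X = np.column_stack([np.ones_like(u), u, v, u * v, u**2, v**2])
    return X @ coef

def gmdh_layer(Z_train, y_train, Z_test, y_test, keep=4):
    """Build one GMDH layer: try all input pairs, rank them by the
    external criterion (MSE on the testing subset), and keep the
    best `keep` outputs as inputs for the next layer."""
    candidates = []
    for i, j in combinations(range(Z_train.shape[1]), 2):
        coef = fit_partial(Z_train[:, i], Z_train[:, j], y_train)
        pred = eval_partial(coef, Z_test[:, i], Z_test[:, j])
        err = np.mean((pred - y_test) ** 2)
        candidates.append((err, i, j, coef))
    candidates.sort(key=lambda c: c[0])
    best = candidates[:keep]
    new_train = np.column_stack([eval_partial(c, Z_train[:, i], Z_train[:, j])
                                 for _, i, j, c in best])
    new_test = np.column_stack([eval_partial(c, Z_test[:, i], Z_test[:, j])
                                for _, i, j, c in best])
    return new_train, new_test, best[0][0]  # best external-criterion value
```

Layers of this kind are stacked until the best external-criterion value stops decreasing; this stopping rule is what makes the number of layers and nodes an objective, data-driven choice rather than a user selection.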
Example: comparison of the identification and prediction of a system of equations by ANN and GMDH networks
Let us consider the system of equations

y1(t) = -2 y1(t-1) (1 - y1(t-1)),
y2(t) = 1 + 0.5 y1(t-2) y2(t-1),

where the first equation is the logistic map. For estimation, data with additive noise, yi(t) + a·z, were used, where -0.5 < z < 0.5 and a = 0, 0.1, 0.5.
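A minimal sketch of how such estimation data can be generated, assuming uniform noise z ~ U(-0.5, 0.5) as stated above; the initial values and the random seed are illustrative choices, not taken from the original experiment:

```python
import numpy as np

def simulate(n=50, a=0.0, seed=0):
    """Generate n observations of the coupled system and add
    noise a*z with z ~ U(-0.5, 0.5) to each series."""
    rng = np.random.default_rng(seed)
    y1 = np.empty(n)
    y2 = np.empty(n)
    y1[0], y1[1] = 0.1, 0.2   # illustrative initial values
    y2[0], y2[1] = 1.0, 1.0
    for t in range(2, n):
        y1[t] = -2.0 * y1[t - 1] * (1.0 - y1[t - 1])
        y2[t] = 1.0 + 0.5 * y1[t - 2] * y2[t - 1]
    z1 = rng.uniform(-0.5, 0.5, n)
    z2 = rng.uniform(-0.5, 0.5, n)
    return y1 + a * z1, y2 + a * z2
```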
For a = 0, the following models were obtained by the self-organizing algorithm on the basis of 50 observations:

y1(t) = -2.00 y1(t-1) + 2.00 y1(t-1)^2 + 2.34e-6,
y2(t) = 1.002 + 0.50005 y1(t-2) y2(t-1) + 4.8e-8 y1(t-2).
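These coefficients can be verified with an ordinary least-squares fit on the true model terms. The sketch below assumes the `simulate` helper from the previous block and supplies the correct terms by hand, whereas a GMDH algorithm would have to select them from a larger candidate set:

```python
import numpy as np

y1, y2 = simulate(n=50, a=0.0)
T = np.arange(2, 50)

# Regress y1(t) on the candidate terms 1, y1(t-1), y1(t-1)^2 ...
X1 = np.column_stack([np.ones(len(T)), y1[T - 1], y1[T - 1] ** 2])
c1, *_ = np.linalg.lstsq(X1, y1[T], rcond=None)   # ~ [0, -2, 2]

# ... and y2(t) on the terms 1, y1(t-2)*y2(t-1)
X2 = np.column_stack([np.ones(len(T)), y1[T - 2] * y2[T - 1]])
c2, *_ = np.linalg.lstsq(X2, y2[T], rcond=None)   # ~ [1, 0.5]

print(c1, c2)
```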
Neural networks were unable to identify the system. Several implicit models, distributed in a backpropagation (BP) neural network, were obtained using 5 input neurons, 2 output neurons and n = 2, 4, 6, 10, 20 neurons in the hidden layer. This example showed that for large complexity n the approximation error is independent of the noise level and the models are overfitted. The selected models were used for prediction 10 steps ahead.
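For comparison, the ANN side of the experiment can be approximated with a standard backpropagation regressor, sketched below with scikit-learn. The exact five inputs used in the original setup are not stated, so this sketch feeds in the four lags that appear in the equations; that choice, like all names here, is an assumption.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

y1, y2 = simulate(n=50, a=0.1)          # data generator sketched above
T = np.arange(2, 50)

# Assumed lagged inputs (the original experiment used 5 input neurons,
# but the exact inputs are not specified in the text).
X = np.column_stack([y1[T - 1], y1[T - 2], y2[T - 1], y2[T - 2]])
Y = np.column_stack([y1[T], y2[T]])

# Single hidden layer with n neurons, trained by backpropagation.
net = MLPRegressor(hidden_layer_sizes=(6,), max_iter=5000, random_state=0)
net.fit(X, Y)
```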
Table: Prediction error MAD = (1/10) · Σ |(y − ym)/y| · 100%

| noise level | n | ANN y1 | ANN y2 | GMDH y1 | GMDH y2 |
|---|---|---|---|---|---|
| a = 0 | 2 | 116.0 | 7.7 | 0.0 | 0.0 |
| | 6 | 8.0 | 2.0 | | |
| | 10 | 18.9 | 1.4 | | |
| a = 0.1 | 2 | 21.1 | 4.8 | 6.9 | 1.3 |
| | 6 | 20.6 | 4.0 | | |
| | 10 | 18.9 | 3.7 | | |
| a = 0.5 | 2 | 24.6 | 8.5 | 29.6 | 5.0 |
| | 6 | 27.0 | 5.8 | | |
| | 10 | 27.1 | 10.2 | | |

(n is the number of neurons in the ANN hidden layer; the GMDH results are given once per noise level.)
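A sketch of the MAD computation used in the table above, given arrays `y` of true values and `ym` of the model's 10-step-ahead predictions (illustrative names):

```python
import numpy as np

def mad(y, ym):
    """Mean absolute percentage deviation over the 10 prediction steps."""
    return np.mean(np.abs((y - ym) / y)) * 100.0
```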
These results were obtained by Prof. J.-A. Mueller and Dr. F. Lemke. We thank them for their help in preparing this subsection.