In mathematical statistics, a priori information about the structure
of the mathematical model is needed. In neural networks the user
estimates this structure by choosing the number of layers and the
number and transfer functions of the nodes of a neural network. This
requires not only knowledge of the theory of neural networks, but
also knowledge of the nature of the object, and it takes time.
Moreover, knowledge from systems theory about the systems being
modelled is not applicable in the neural network world without
transformation, and the rules of this translation are unknown. These
problems can be overcome by GMDH-type neural networks, which can
extract knowledge about the object directly from a data sample. The
Group Method of Data Handling is an inductive sorting-out method,
which has advantages in the case of rather complex objects having no
definite theory, particularly for objects with fuzzy characteristics.
The list below compares both methodologies, neural networks and
inductive self-organizing modelling, with respect to their
application to data analysis.

Comparison of neural networks and statistical learning GMDH networks:

Data analysis:
  Neural networks: universal approximator.
  GMDH networks: universal structure identifier.

Analytical model:
  Neural networks: indirect approximation.
  GMDH networks: direct approximation.

Architecture:
  Neural networks: preselected unbounded network structure;
  experimental selection of an adequate architecture demands time
  and experience.
  GMDH networks: bounded network structure evolved during the
  estimation process.

Network synthesis:
  Neural networks: globally optimized fixed network structure.
  GMDH networks: adaptively synthesized structure.

A priori information:
  Neural networks: not usable without transformation into the
  concepts of neural networks.
  GMDH networks: can be used directly to select the reference
  functions and criteria.

Self-organization:
  Neural networks: deductive; subjective choice of the number of
  layers and nodes.
  GMDH networks: inductive; the number of layers and nodes is
  estimated by the minimum of an external criterion (objective
  choice).

Parameter estimation:
  Neural networks: in a recursive way; demands long samples.
  GMDH networks: estimation on the training set by maximum
  likelihood techniques, selection on the testing set (samples may
  be extremely short or noisy).

Optimization:
  Neural networks: global search in a highly multimodal space; the
  result depends on the initial solution; tedious, requiring the
  user to set various algorithmic parameters by trial and error;
  a time-consuming technique.
  GMDH networks: optimizes the structure and the dependencies in
  the model simultaneously; not a time-consuming technique;
  inappropriate parameters are excluded automatically.

Access to result:
  Neural networks: available transiently in a real-time environment.
  GMDH networks: usually stored and repeatedly accessible.

Initial knowledge:
  Neural networks: needs knowledge about the theory of neural
  networks.
  GMDH networks: needs knowledge about the kind of task (criterion)
  and the class of system (linear, nonlinear).

Convergence:
  Neural networks: global convergence is difficult to guarantee.
  GMDH networks: a model of optimal complexity is found.

Computing:
  Neural networks: suitable for implementation in hardware with
  parallel computation.
  GMDH networks: efficient on ordinary computers and also for
  massively parallel computation.

Features:
  Neural networks: general-purpose, flexible, nonlinear (or linear),
  static or dynamic models.
  GMDH networks: general-purpose, flexible, linear or nonlinear,
  static or dynamic, parametric and nonparametric models.
Results obtained by statistical learning networks, and especially by
GMDH algorithms, are comparable with results obtained by neural
networks [30]. The well-known problem of an optimal (subjective)
choice of the neural network architecture is solved in GMDH
algorithms by means of an adaptive synthesis (objective choice) of
the architecture. Such algorithms, combining the best features of
neural nets and statistical techniques in a powerful way, discover
the entire model structure directly from a data sample, in the form
of a network of polynomial functions, difference equations or
another structure type. Models are selected automatically based on
their ability to solve the task (approximation, identification,
forecasting, classification).
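
To make the adaptive synthesis concrete, here is a minimal Python
sketch of a single GMDH layer; the function name, the quadratic
reference function and the mean-squared external criterion are
illustrative assumptions, not the exact algorithm of [30]:

    import itertools
    import numpy as np

    def gmdh_layer(X_train, y_train, X_test, y_test, keep=4):
        # One GMDH layer: fit a quadratic (Ivakhnenko) polynomial in
        # every pair of inputs on the training set, then rank the
        # candidates by the external criterion, i.e. the error on the
        # separate testing set.
        def design(X, i, j):
            xi, xj = X[:, i], X[:, j]
            return np.column_stack(
                [np.ones_like(xi), xi, xj, xi * xj, xi ** 2, xj ** 2])
        candidates = []
        for i, j in itertools.combinations(range(X_train.shape[1]), 2):
            coef, *_ = np.linalg.lstsq(design(X_train, i, j), y_train,
                                       rcond=None)
            ext = np.mean((design(X_test, i, j) @ coef - y_test) ** 2)
            candidates.append((ext, (i, j), coef))
        candidates.sort(key=lambda c: c[0])
        return candidates[:keep]  # survivors feed the next layer

Layers of such nodes are stacked, and growth stops as soon as the
external criterion no longer improves; this is the objective choice
of the number of layers and nodes referred to above.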
Example: Comparison of identification and prediction of a system
of equations by ANN and GMDH networks
Let us consider the system of equations
y_{1,t} = 2 y_{1,t-1} (1 - y_{1,t-1}),
y_{2,t} = 1 + 0.5 y_{1,t-2} y_{2,t-1},
where the first equation is the logistic map. For estimation, data
with additive noise y^{i}_{t} = y_{i,t} + a z were used, where
-0.5 < z < 0.5 and a = 0; 0.1; 0.5.
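As a minimal sketch, such an estimation data set can be generated in
Python as follows (the start values and the random seed are our own
assumptions; the original experiment does not state them):

    import numpy as np

    def simulate(n=50, a=0.0, seed=0):
        # Simulate y1 (logistic map) and y2, then add uniform noise
        # a*z with z drawn from (-0.5, 0.5), as in the text above.
        rng = np.random.default_rng(seed)
        y1 = np.empty(n); y2 = np.empty(n)
        y1[:2] = 0.1, 0.3   # assumed initial conditions
        y2[:2] = 1.0, 1.0
        for t in range(2, n):
            y1[t] = 2.0 * y1[t-1] * (1.0 - y1[t-1])
            y2[t] = 1.0 + 0.5 * y1[t-2] * y2[t-1]
        z = rng.uniform(-0.5, 0.5, size=(2, n))
        return y1 + a * z[0], y2 + a * z[1]

    y1_obs, y2_obs = simulate(n=50, a=0.1)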
For a = 0 the following models were obtained by the self-organizing
algorithm on the basis of 50 observations:
y^{1}_{t} = 2.00 y^{1}_{t-1} - 2.00 (y^{1}_{t-1})^{2} + 2.34*10^{-6},
y^{2}_{t} = 1.002 + 0.50005 y^{1}_{t-2} y^{2}_{t-1}
+ 4.8*10^{-8} y^{1}_{t-2}.
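Assuming noise-free data (a = 0), the structure of the first
equation is indeed recoverable by ordinary least squares once the
right terms are in the candidate set; a minimal check, reusing the
simulate() sketch above:

    import numpy as np

    # noise-free series from the simulate() sketch above
    y1, _ = simulate(n=50, a=0.0)

    # regress y1[t] on the terms y1[t-1], y1[t-1]^2 and a constant
    X = np.column_stack([y1[1:-1], y1[1:-1] ** 2, np.ones(len(y1) - 2)])
    coef, *_ = np.linalg.lstsq(X, y1[2:], rcond=None)
    print(coef)  # close to [2.0, -2.0, 0.0], as in the model above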
Neural networks are unable to identify the system in explicit form.
Several implicit models, distributed in a backpropagation (BP)
neural network, were obtained using 5 input neurons, 2 output
neurons and n = 2, 4, 6, 10, 20 neurons in the hidden layer. This
example shows that for large complexity n the approximation error is
independent of the noise level and the models are overfitted. The
selected models were used for prediction 10 steps ahead.
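A comparable BP setup could be sketched in Python as below; this is
a hypothetical reconstruction, since the text fixes only the number
of inputs and outputs, not which lagged values feed the network:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # lagged data set from the simulate() sketch above; the concrete
    # choice of the five inputs is an assumption
    y1, y2 = simulate(n=50, a=0.1)
    inputs = np.column_stack(
        [y1[2:-1], y1[1:-2], y1[:-3], y2[2:-1], y2[1:-2]])
    targets = np.column_stack([y1[3:], y2[3:]])

    # one hidden layer with n neurons, trained by backpropagation
    for n in (2, 6, 10):
        net = MLPRegressor(hidden_layer_sizes=(n,), max_iter=5000,
                           random_state=0)
        net.fit(inputs, targets)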
Table: Prediction error MAD = (1/10) * SUM |(y - y_{m})/y| * 100%

                   ANN             GMDH
   a      n     y_1    y_2      y_1    y_2
   0      2    116.0    7.7     0.0    0.0
          6      8.0    2.0
         10     18.9    1.4
   0.1    2     21.1    4.8     6.9    1.3
          6     20.6    4.0
         10     18.9    3.7
   0.5    2     24.6    8.5    29.6    5.0
          6     27.0    5.8
         10     27.1   10.2

(GMDH entries are given once per noise level.)
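For completeness, the table's error measure as a small Python helper
(the function name is ours):

    import numpy as np

    def mad_percent(y_true, y_pred):
        # MAD = (1/N) * sum(|(y - y_m)/y|) * 100%, with N = 10
        # prediction steps in the table above
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))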
These results were obtained by Prof. J.A. Mueller and Dr. F. Lemke.
We thank them for their help in preparing this subsection.