Slide 1: Neural Networks, Key Notes
An Introduction to Neural Networks, eighth edition, 1996
Authors: Ben Kröse, Faculty of Mathematics & Computer Science, University of Amsterdam. Patrick van der Smagt, Institute of Robotics and Systems Dynamics, German Aerospace Research Establishment
Keynote: Nelson Piedra, Computer Sciences School - Advanced Tech, Technical University of Loja UTPL, Ecuador.

Slide 2: Part I Fundamentals
1. Introduction

Slide 5: First wave of interest
• The first wave of interest emerged after the introduction of simplified neurons by McCulloch and Pitts in 1943.
• These neurons were introduced as models of biological neurons and as conceptual components for circuits that could perform computational tasks.

Slide 8: ANN, “black age”
• Perceptrons book (Minsky & Papert, 1969): showed the deficiencies of perceptron models; most neural network funding was redirected and researchers left the field.
• Only a few researchers continued their efforts, most notably Teuvo Kohonen, Stephen Grossberg, James Anderson, and Kunihiko Fukushima.

Slide 11: ANN re-emerged
• Early eighties: ANN re-emerged only after some important theoretical results, most notably the discovery of error back-propagation, and new hardware developments that increased processing capacities.
• Nowadays most universities have a neural networks group (e.g. Advanced Tech - UTPL).

Slide 18: How can ANNs be adequately characterised?
• Artificial neural networks can be most adequately characterised as “computational models” with particular properties such as the ability
  • to adapt or learn,
  • to generalise, or
  • to cluster or organise data,
  and whose operation is based on parallel processing.
• Parallels with biological systems also exist.

Slide 24: [Figure labels: to adapt, to learn, parallel process, to organise data, to cluster]

Slide 25: The slide above shows properties that can be attributed both to neural network models and to existing (non-neural) models.

Slide 26: Question: to what extent does the neural approach prove to be better suited for certain applications than existing models?

Slide 27: Part I Fundamentals
2. Fundamentals

Slide 30: A framework for distributed representation
• To understand ANNs, think in terms of the parallel distributed processing (PDP) idea.
• An artificial network consists of a pool of simple processing units which communicate by sending signals to each other over a large number of weighted connections.

Slide 36: 1/2 Rumelhart and McClelland, 1986:
• a set of processing units (‘neurons’, ‘cells’);
• a state of activation $y_k$ for every unit, which is equivalent to the output of the unit;
• connections between the units. Generally each connection is defined by a weight $w_{jk}$ which determines the effect which the signal of unit j has on unit k;
• a propagation rule, which determines the effective input $s_k$ of a unit from its external inputs.

Slide 42: 2/2 Rumelhart and McClelland, 1986:
• an activation function $F_k$, which determines the new level of activation based on the effective input $s_k(t)$ and the current activation $y_k(t)$;
• an external input (aka bias, offset) $\theta_k$ for each unit;
• a method for information gathering (the learning rule);
• an environment within which the system must operate, providing input signals and, if necessary, error signals.

Slide 47: Processing Units
• Each unit performs a relatively simple job:
  • a) receive input from neighbours or external sources and use this to compute an output which is propagated to other units;
  • b) adjust the weights.
• The system is inherently parallel in the sense that many units can carry out their computations at the same time.

Slide 48: [Figure: unit k receives the outputs $y_j$ over the weights $w_{1k}, w_{2k}, \dots, w_{jk}, \dots, w_{nk}$, adds the bias $\theta_k$, and applies $F_k$ to produce $y_k$.]
The basic components of an artificial neural network. The propagation rule used here is the standard weighted summation: $s_k = \sum_j w_{jk}\, y_j + \theta_k$

Slide 49: Three types of units
• input units, i: which receive data from outside the neural network
• output units, o: which send data out of the neural network
• hidden units, h: whose input and output signals remain within the neural network

Slide 50: Update of units
• Synchronously: all units update their activation simultaneously.
• Asynchronously: each unit has a (usually fixed) probability of updating its activation at a time t, and usually only one unit will be able to do this at a time; in some cases the latter model has some advantages.

Slide 55: Connections between units
• Assume that each unit provides an additive contribution to the input of the unit to which it is connected.
• The total input to unit k is simply the weighted sum of the separate outputs from each of the connected units plus a bias or offset term $\theta_k$.
• A positive $w_{jk}$ is considered excitation and a negative $w_{jk}$ inhibition.
• Units with this propagation rule are called sigma units.
$s_k(t) = \sum_j w_{jk}(t)\, y_j(t) + \theta_k(t)$

Slide 58: Different propagation rule
• Propagation rule for the sigma-pi unit (Feldman and Ballard, 1982).
• Often, the $y_{j_m}$ are weighted before multiplication. Although these units are not frequently used, they have their value for gating of input, as well as implementation of lookup tables (Mel, 1990).
$s_k(t) = \sum_j w_{jk}(t) \prod_m y_{j_m}(t) + \theta_k(t)$

Slide 59: Activation and output rules
• New value of activation: we need a function $F_k$ which takes the total input $s_k(t)$ and the current activation $y_k(t)$ and produces a new value of the activation of unit k.
$y_k(t+1) = F_k\big(y_k(t), s_k(t)\big)$

Slide 60:
• Often, the activation function is a nondecreasing function of the total input of the unit:
$y_k(t+1) = F_k\big(s_k(t)\big) = F_k\Big(\sum_j w_{jk}(t)\, y_j(t) + \theta_k(t)\Big)$
[Figure: three activation functions: sgn (hard limiting threshold function), linear or semi-linear function, and sigmoid (smoothly limiting threshold).]

Slide 61:
• For this smoothly limiting function, often a sigmoid (S-shaped) function is used, like:
$y_k = F(s_k) = \dfrac{1}{1 + e^{-s_k}}$
• In some cases, the output of a unit can be a stochastic function of the total input of the unit. In that case the activation is not deterministically determined by the neuron input, but the neuron input determines the probability p that a neuron gets a high activation value:
$p(y_k \leftarrow 1) = \dfrac{1}{1 + e^{-s_k/T}}$

Slide 62: Network topologies
• This section focuses on the pattern of connections between the units and the propagation of data:
  • Feed-forward networks
  • Recurrent networks, which do contain feedback connections

Slide 63: Feed-forward networks
• The data processing can extend over multiple (layers of) units, but no feedback connections are present, that is, connections extending from outputs of units to inputs of units in the same layer or previous layers.

Slide 68: Recurrent networks that do contain feedback connections
• Contrary to feed-forward networks, the dynamical properties of the network are important.
• In some cases, the activation values of the units undergo a relaxation process such that the network will evolve to a stable state in which these activations do not change anymore.
• In other applications, the changes of the activation values of the output neurons are significant, such that the dynamical behaviour constitutes the output of the network (Pearlmutter, 1990).
• Classical examples of feed-forward networks are the Perceptron and Adaline.

Slide 72: Training of artificial neural networks
• A neural network has to be configured such that the application of a set of inputs produces (either ‘directly’ or via a relaxation process) the desired set of outputs.
• One way is to set the weights explicitly, using a priori knowledge.
• Another way is to ‘train’ the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule.

Slide 74: Paradigms of learning
• Supervised learning or associative learning, in which the network is trained by providing it with input and matching output patterns. These input-output pairs can be provided by an external teacher, or by the system which contains the network (self-supervised).

Slide 76: Paradigms of learning
• Unsupervised learning or self-organisation, in which an (output) unit is trained to respond to clusters of patterns within the input. In this paradigm the system is supposed to discover statistically salient features of the input population. Unlike the supervised learning paradigm, there is no a priori set of categories into which the patterns are to be classified; rather the system must develop its own representation of the input stimuli.

Slide 78: Modifying patterns of connectivity
• Hebbian learning rule
• Widrow-Hoff rule
In the next chapters some of these update rules will be discussed.

Slide 81: Hebbian learning rule
• Suggested by Hebb in his classic book Organization of Behaviour (Hebb, 1949).
• The basic idea is that if two units j and k are active simultaneously, their interconnection must be strengthened. If j receives input from k, the simplest version of Hebbian learning prescribes to modify the weight $w_{jk}$ with:
$\Delta w_{jk} = \gamma\, y_j\, y_k$
where $\gamma$ is a positive constant of proportionality representing the learning rate.

Slide 84: Widrow-Hoff rule or the delta rule
• Another common rule uses not the actual activation of unit k but the difference between the actual and desired activation for adjusting the weights.
• $d_k$ is the desired activation provided by a teacher.
$\Delta w_{jk} = \gamma\, y_j\, (d_k - y_k)$

Slide 85: Terminology
• Output vs activation of a unit: these are considered to be one and the same thing; that is, the output of each neuron equals its activation value.
• Bias, offset, threshold: these terms all refer to a constant term which is input to a unit. This external input is usually implemented (and can be written) as a weight from a unit with activation value 1.
• Number of layers: in a feed-forward network, the inputs perform no computation and their layer is therefore not counted. Thus a network with one input layer, one hidden layer, and one output layer is referred to as a network with two layers.
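
Note: the propagation and activation rules from the slides can be tried out with a few lines of code. The following is a minimal sketch, not code from the original course material. It assumes a single sigma unit k, the sigmoid as activation function, and reads T in the stochastic rule as a slope parameter (often called a temperature). All function names and the numeric values are hypothetical, chosen only for illustration.

```python
import math
import random


def effective_input(weights, activations, bias):
    """Sigma-unit propagation rule: s_k = sum_j w_jk * y_j + theta_k."""
    return sum(w * y for w, y in zip(weights, activations)) + bias


def sigmoid(s):
    """Smoothly limiting threshold: F(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + math.exp(-s))


def fire_probability(s, temperature=1.0):
    """Stochastic unit: p(y_k <- 1) = 1 / (1 + exp(-s / T))."""
    return 1.0 / (1.0 + math.exp(-s / temperature))


def stochastic_output(s, temperature=1.0, rng=random):
    """Return 1 with probability fire_probability(s, T), otherwise 0."""
    return 1 if rng.random() < fire_probability(s, temperature) else 0


# Hypothetical unit k with three incoming connections (illustrative values).
w_k = [0.5, -1.0, 2.0]   # w_jk: positive = excitation, negative = inhibition
y = [1.0, 0.5, 0.25]     # outputs y_j of the connected units
theta_k = -0.5           # bias (offset) of unit k

s_k = effective_input(w_k, y, theta_k)   # 0.5 - 0.5 + 0.5 - 0.5 = 0.0
y_k = sigmoid(s_k)                       # sigmoid(0) = 0.5
```

With these values the excitatory and inhibitory contributions cancel the bias, so the unit sits exactly at the midpoint of the sigmoid. Lowering `temperature` in `fire_probability` makes the probability curve steeper around `s = 0`.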
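
Note: the two weight-update rules from the slides (Hebbian and Widrow-Hoff) can be compared side by side. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a single linear unit (output equal to its total input), treats the bias as a weight from a unit with activation 1 as described under Terminology, and uses made-up teaching patterns. All function names, the learning rate, and the epoch count are hypothetical.

```python
def hebbian_update(w_jk, y_j, y_k, gamma):
    """Hebbian rule: w_jk + gamma * y_j * y_k."""
    return w_jk + gamma * y_j * y_k


def delta_update(w_jk, y_j, y_k, d_k, gamma):
    """Widrow-Hoff (delta) rule: w_jk + gamma * y_j * (d_k - y_k)."""
    return w_jk + gamma * y_j * (d_k - y_k)


def train_linear_unit(patterns, gamma=0.1, epochs=500):
    """Train one linear unit (y_k = s_k) with the delta rule.

    patterns is a list of (inputs, desired_output) pairs.
    The bias is updated like a weight from a unit with activation 1.
    """
    weights = [0.0] * len(patterns[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, d_k in patterns:
            # Current output of the unit for this pattern.
            y_k = sum(w * y_j for w, y_j in zip(weights, inputs)) + bias
            weights = [delta_update(w, y_j, y_k, d_k, gamma)
                       for w, y_j in zip(weights, inputs)]
            bias = delta_update(bias, 1.0, y_k, d_k, gamma)
    return weights, bias


# Hypothetical teaching patterns generated from d = 2*x1 - x2 + 0.5.
patterns = [
    ([0.0, 0.0], 0.5),
    ([1.0, 0.0], 2.5),
    ([0.0, 1.0], -0.5),
    ([1.0, 1.0], 1.5),
]
weights, bias = train_linear_unit(patterns)
```

Because the patterns are exactly linear in the inputs, the trained weights should approach 2 and -1 and the bias should approach 0.5. The Hebbian update only grows a weight when both activations have the same sign, whereas the delta update stops changing a weight once the output matches the teacher value `d_k`.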