In the early years of artificial intelligence and computing, neurons were quickly considered for their role as computing devices. In an earlier post I discussed the mapping of neurons by McCulloch and Pitts (1943). That mapping was part of the same enterprise of envisioning various computing machines. This is not to claim that those engaged in this work viewed these mappings as perfectly realistic. Rather, the mappings captured the idea that nervous systems could be described as rule-based, deterministic systems, a necessary feature of computing machines.
The Perceptron is essentially a device that generates a line for categorizing (separating) observations. If you are familiar with Logit and Probit models, you already have a sense of the Perceptron's goal. The Perceptron arrives at a separating line of good fit by initially guessing the defining parameters. The algorithm weights the input values and sums them, and observations are categorized by the magnitude of the resulting output. When the sum of the weighted values is greater than a threshold value, $\theta$, the observation is assigned the category associated with relatively high output values. Observations whose sums fall below the threshold are assigned the category associated with relatively low output values.
Predictions are marginally adjusted by shifting the weights. When a prediction falls below the threshold but should be above it, the weights marginally increase. When a prediction is incorrectly above the threshold, the weights decrease. This simultaneously shifts the separating line and changes its slope. Through this automatic feedback, the accuracy of prediction tends to improve as the algorithm repeatedly iterates through the observations.
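To make the geometry concrete: in two dimensions the weights and threshold together imply a boundary line. A minimal sketch (the function name `separating_line` and the sample values are my own illustration, not part of the original script):

```python
# The boundary consists of the points where the weighted sum equals
# the threshold: w1*x1 + w2*x2 = theta. Solving for x2 gives a line
# with slope -w1/w2 and intercept theta/w2.
def separating_line(w1, w2, theta):
    """Return (slope, intercept) of the boundary implied by the weights."""
    return -w1 / w2, theta / w2

slope, intercept = separating_line(w1=1.0, w2=2.0, theta=1.0)
# boundary: x2 = 0.5 - 0.5 * x1
```

Shifting the weights therefore moves both the slope and the intercept of the line at once, which is exactly the adjustment described above.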
Adaptive Learning
Before formally outlining and demonstrating the functioning of a Perceptron, it is appropriate to identify its significance. An artificially intelligent agent must be able to interact with an environment in an orderly manner, typically in pursuit of some goal (e.g., accurate classification, which may be a means to some higher-level goal like survival). Learning requires adaptation. There are multiple means by which an agent or system can adapt. In previous work, I have leveraged the power of collective learning to demonstrate that the coordinating powers of markets do not require agents to be particularly intelligent. Rather, some minimum level of experimentation and some mechanism(s) driving adoption of successful innovation are all that is required. Concerning real-world corollaries, successful entrepreneurs might succeed in mentoring other entrepreneurs, or other entrepreneurs may simply copy their most successful competitors. This is sufficient not only for survival: learning in this manner supports innovation capable of increasing the carrying capacity of the economic system. That is, competitive markets naturally generate innovations that improve the productivity of the average entrepreneur.
The approach taken by Rosenblatt focuses on the ability of an individual agent to learn. The metaphor considered is one of vision. How can an observer learn to differentiate classes of observations based on some limited number of features that define those observations? There was a clear sense that the nervous system played an important role in such categorization. In this light, Pitts and McCulloch (1947) open their discussion of the relationship between perception and the nervous system, noting that:
Numerous nets, embodied in special nervous structures, serve to classify information according to useful common characters. In vision they detect the equivalence of apparitions related by similarity and congruence, like those of a single physical thing seen from various places. In audition, they recognize timbre and chord, regardless of pitch. The equivalent apparitions in all cases share a common figure and define a group of transformations that take the equivalents into one another but preserve the figure invariant. . . We seek general methods for designing nervous nets which recognize figures in such a way as to produce the same output for every input belonging to the figure. We endeavor particularly to find those which fit the histology and physiology of the actual structure. (127-128)
Rosenblatt draws from Pitts and McCulloch (1947); he would have been familiar with the passage quoted above and cites the work elsewhere. He makes special effort to identify inspiration from the connectionist approaches of Donald Hebb (1949) and F. A. Hayek (1952). Three years later, Rosenblatt provides a bit more narrative regarding these citations:
Hebb (Ref. 33) and Hayek (Ref. 32), following the tradition of James Stuart Mill and Helmholtz, have attempted to show how an organism can acquire perceptual capabilities through a maturational process. For Hayek, the recognition of the attributes of a stimulus is essentially a problem in classification [emphasis mine], and his point of view has inspired Uttley (Refs. 101, 102) to design a type of classifying automaton which attempts to translate the approach into more rigorous mathematical form. Hebb's model is more detailed in its biological description, and suggests a process by which neurons which are frequently activated together become linked into functional organizations called "cell assemblies" and "phase sequences" which, when stimulated, correspond to the evocation of an elementary idea or percept.
Hayek also follows a similar approach with regard to neurons being activated in groups. In the preface to The Sensory Order, Hayek recognizes the similarity in approach and indicates that his work was generated independently, having discovered Hebb after the fact:
It seems as if the problem discussed here were coming back into favour and some recent contributions have come to my knowledge too late to make full use of them. This applies particularly to Professor D. O. Hebb's Organization of Behaviour which appeared when the final version of the present book was practically finished. That work contains a theory of sensation which in many respects is similar to the one expounded here; and in view of the much greater technical competence of the author I doubted for a while whether publication of the present book was still justified. In the end I decided that the very fullness with which Professor Hebb has worked out the physiological detail has prevented him from bringing out as clearly as might be wished the general principles of the theory; and as I am concerned more with the general significance of a theory of that kind than with its detail, the two books, I hope, are complementary rather than covering the same ground.
Rosenblatt seems to have identified this distinction, as I have indicated with emphasis in his description of the significance of Hayek (1952). This raises a question that I will not pursue here: if Rosenblatt, one of the earliest developers of neural networks, credited Hayek for inspiring his contribution, why does Hayek not receive credit comparable to Donald Hebb, who shows up regularly in the list of early citations in the neural network literature?
Building and Applying a Perceptron
While I have described the structure of a Perceptron, a formal presentation will remove any remaining opaqueness in the description of the processes that define it.
The key idea is that the Perceptron uses weights to predict the identity of an observation. We presume that identity can be divided between two distinct classes; that is, we predict that an observation is either $A$ or $\neg A$.
We can generalize to $n$ inputs as follows.
$$\hat{y} = w_bb + \sum^n_{i=1}{w_ix_i} = w_bb + w_1x_1 + ... + w_nx_n$$
If you would like, recognize that $b$ is a column of $1$ values and call it $x_0$, with $w_b = w_0$, so that
$$\hat{y} = \sum^n_{i=0}{w_ix_i} = w_0x_0 + w_1x_1 + ... + w_nx_n$$
The weighted sum of the inputs is compared to the threshold value, $\theta$. The observation is assigned the high-output category when
$$\sum^n_{i=1}{w_ix_i} > {\theta}$$
Moving $\theta$ to the left-hand side, it is easy to see that $-\theta$ stands in for the bias term $w_bb$, with $b$ playing the role of $x_0$.
$$\sum^n_{i=1}{w_ix_i} - \theta = \sum^n_{i=1}{w_ix_i} + w_bb$$
$$w_bb = -\theta$$
The value $\hat{y}$ is transformed to $\hat{y}'$, a discrete value of either $1$ or $-1$.
$$\hat{y}' = \left\{
\begin{array}{ll}
1 & \text{if } \hat{y} > 0 \\
-1 & \text{otherwise}
\end{array}
\right.
$$
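This mapping from inputs to a $1$ or $-1$ label can be sketched directly in Python. A minimal version, assuming NumPy and my own conventions for the function name `predict` and for storing the bias weight as `w[0]`:

```python
import numpy as np

def predict(X, w):
    """Classify each row of X as 1 or -1 using the weighted sum.

    X is an (m, n) array of observations; w is a length n + 1 weight
    vector whose first entry plays the role of the bias weight w_b
    (equivalently, -theta), with b = 1 for every observation.
    """
    y_hat = w[0] + X @ w[1:]           # y-hat = w_b * b + sum_i(w_i * x_i)
    return np.where(y_hat > 0, 1, -1)  # 1 if y-hat > 0, otherwise -1
```

For example, with `w = np.array([-0.5, 1.0, 1.0])`, the point $(1, 1)$ yields $\hat{y} = 1.5$ and is labeled $1$, while the origin yields $\hat{y} = -0.5$ and is labeled $-1$.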
Updating Weights
After each iteration, the weights are updated. If an observation is incorrectly classified, each weight is adjusted by the product of the learning rate, $\eta$, the prediction error, $(y_j - \hat{y}_j')$, and the corresponding input value.
$$w_{j+1} = w_j + \Delta w_j$$
$$\Delta w_j = \eta (y_j - \hat{y}_j')x_j$$
Notice that if $\hat{y}_j'=y_j$, then $\Delta = 0$ and $w_{j+1} = w_j$.
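Placed in a loop over the observations, the update rule yields the complete learning procedure. A minimal sketch, assuming labels coded as $1$/$-1$ and my own choices for the function name, hyperparameters (`eta`, `n_iter`), and random initialization:

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, n_iter=20, seed=0):
    """Fit Perceptron weights by repeated application of the update rule.

    w[0] serves as the bias weight w_b (with b = 1); eta is the
    learning rate and n_iter the number of passes over the data.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1] + 1)  # initial guess
    for _ in range(n_iter):
        for x_j, y_j in zip(X, y):
            y_hat = 1 if w[0] + x_j @ w[1:] > 0 else -1
            delta = eta * (y_j - y_hat)  # zero when the prediction is correct
            w[0] += delta                # bias update, since b = 1
            w[1:] += delta * x_j         # Delta w_i = eta * (y - y-hat) * x_i
    return w
```

On linearly separable data the loop settles on weights that classify every observation correctly; once every prediction is right, $y_j - \hat{y}_j' = 0$ and the weights stop changing.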
Below, I include the script for visualizing this process. In the final step, only one edge is really needed since $y_j-\hat{y}'_j=0$, but I find the outcome easier to read when the case where $w_{j+1}=w_j$ is kept separate from the case where $w_{j+1}\ne w_j$.