screwlisp proposes kittens

Pure ansi common lisp ffnn deep learning examination

| link | Last. 20260221T063110000Z |

This article shows how to deliberately add explicitly planned deep behaviours (temporary hallucinations) to a feedforward neural network implemented as a modern hopfield network, which is still the normal meaning of a feedforward neural network of a single hidden layer. These behaviours are plainly plannable and programmable once we take the view that a deep learning inference is a receiver operating characteristic of the previous time step, modulated by the training data.

  1. Top
  2. Input context and training data
  3. Informally look at it working in lockstep DL inference
  4. Zoomed in one-neuron-at-a-time version
  5. Interactive output
  6. Reformulating the update rule as a receiver operating characteristic
  7. The one-by-one example noting ROC numbers
  8. Conclusions

I hope seeing my ansi common lisp feedforward neural network helps clear up popular misconceptions about other uses of feedforward neural network deep learning, such as the large language model transformer chatbots, particularly by illustrating how differences in implementation choices get misunderstood as claims about deep learning feedforward neural networks in general.

We will show a planned hallucinatory deep behaviour, then look at the training data that added it, then trivially rewrite the modern hopfield neuron update function in terms of the usual receiver operating characteristic metrics (true positive, false positive, true negative and false negative), modulated by the neuron's explicit value across the training data.

Input context and training data

Set/see these first here, so the rest of the article can flow. I should note that even though I built the keyword :rectified-polynomial into it, there are lots of valid activation functions, such as the lionized softmax, exponentials or others (a sketch of the rectified polynomial idea follows the setup forms below). My ff-nn-dl.lisp source is linked at the end. I used lists of symbols instead of numeric bits, since I made it with lisp!

(load "ff-nn-dl.lisp")
(labels
    ((rplaca-x (old new) (declare (ignore old)) (rplaca new 'x))
     (rplaca-o (old new) (declare (ignore old)) (rplaca new 'o)))
  (setq *keys*
	(list :rectified-polynomial (make-rectified-polynomial 3)
	      :predicate 'member
	      :hit #'rplaca-x
	      :miss #'rplaca-o)))
(setq *memories*
  '((((X) (O) (X) (O) (X))
     ((O) (X) (O) (X))
     ((X) (O) (X) (X))
     ((X) (X) (X) (X))
     ((X) (X) (X) (X))
     ((X) (X) (X) (X))
     ((X) (X) (X) (X))
     ((X) (X) (X) (X)))
    
    (((X) (O) (X) (O) (X))
     ((O) (X) (O) (X))
     ((X) (O) (O) (O))
     ((O) (O) (O) (O))
     ((O) (O) (O) (O))
     ((O) (O) (O) (O))
     ((O) (O) (O) (O))
     ((O) (O) (O) (O)))
    
    (((X) (X) (X) (X) (O))
     ((X) (X) (X) (X))
     ((O) (O) (O) (O))
     ((O) (O) (O) (O))
     ((O) (O) (O) (O))
     ((O) (O) (O) (O))
     ((O) (O) (O) (O))
     ((O) (O) (O) (O)))))

(defvar *input* nil "The ffnn input context")
(defun reset-input ()
  (setq *input*
	(copy-tree '(((O) (X) (O) (X) (O)) 
		     ((O) (O) (X) (O)) 
		     ((X) (O) (X) (O)) 
		     ((O) (X) (O) (X)) 
		     ((X) (O) (X) (O)) 
		     ((O) (X) (O) (X)) 
		     ((X) (O) (X) (O)) 
		     ((O) (X) (O) (X))))))
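
As a sketch of the idea only (ff-nn-dl.lisp has its own definition), the usual Krotov and Hopfield rectified polynomial of degree n is F(x) = xⁿ when x is positive and 0 otherwise, so a constructor in that spirit could be:

(defun make-rectified-polynomial-sketch (degree)
  "Sketch only: degree-n rectified polynomial, x^n when x is positive, else 0."
  (lambda (x)
    (if (plusp x)
        (expt x degree)
        0)))

;; (funcall (make-rectified-polynomial-sketch 3) 2)  => 8
;; (funcall (make-rectified-polynomial-sketch 3) -2) => 0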

The ffnn training data are always directly available and unobfuscated in modern hopfield network implementations (they were historically mostly inaccessible in classical hopfield network implementations of ffnns). For this reason the training data are conventionally referred to as memories.

With our 33 neurons and using a rectified polynomial of degree 3

CL-USER> (max-mem 33 3)
51

we can reliably use up to 51 training data, each with values for the 33 neurons.
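
For reference, Krotov and Hopfield's capacity estimate K ≈ Nⁿ⁻¹ / (2 (2n−3)!! ln N) also gives 51 for N = 33 neurons and degree n = 3. A minimal sketch of that estimate (not necessarily how max-mem computes it):

(defun approx-max-mem (neurons degree)
  "Sketch of the Krotov-Hopfield capacity bound K ~ N^(n-1) / (2 (2n-3)!! ln N)."
  (let ((double-factorial
          (loop :with acc := 1
                :for k :downfrom (- (* 2 degree) 3) :above 0 :by 2
                :do (setf acc (* acc k))
                :finally (return acc))))
    (floor (expt neurons (1- degree))
           (* 2 double-factorial (log neurons)))))

;; (approx-max-mem 33 3) => 51 (first value)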

Informally look at it working in lockstep DL inference

This initial REPL output shows the lockstep regime inferencing as it might commonly be used. We get a glimpse of a deep behaviour in which the jagged list's ear at the top changes twice. When we inspect the minutiae of its operation later, we can keep the long view that this is what those minutiae are doing in common usage on the same data.
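
By lockstep I mean that every neuron's next value is computed from the same frozen copy of the previous state, so the updates within one step cannot see one another. A hypothetical sketch of that regime (not the lockstep-dl internals):

(defun lockstep-step-sketch (state update-neuron)
  "Sketch of one synchronous (lockstep) step over a jagged list of neuron cells.
UPDATE-NEURON is called with the frozen previous state, a row and a column,
and returns the cell's new contents."
  (let ((frozen (copy-tree state)))   ; every neuron reads the same old state
    (loop :for row :in state
          :for r :from 0
          :do (loop :for cell :in row
                    :for c :from 0
                    :do (setf (car cell) (funcall update-neuron frozen r c))))
    state))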

CL-USER> (reset-input)
(((O) (X) (O) (X) (O)) ((O) (O) (X) (O)) ((X) (O) (X) (O)) ((O) (X) (O) (X))
 ((X) (O) (X) (O)) ((O) (X) (O) (X)) ((X) (O) (X) (O)) ((O) (X) (O) (X)))

CL-USER> (mapc 'print *input*)

((O) (X) (O) (X) (O)) 
((O) (O) (X) (O)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
(((O) (X) (O) (X) (O)) ((O) (O) (X) (O)) ((X) (O) (X) (O)) ((O) (X) (O) (X))
 ((X) (O) (X) (O)) ((O) (X) (O) (X)) ((X) (O) (X) (O)) ((O) (X) (O) (X)))
 
CL-USER> (apply 'lockstep-dl *input* *memories* 'x *keys*)
(((X) (X) (X) (X) (X)) ((X) (X) (X) (X)) ((O) (X) (O) (X)) ((X) (O) (X) (O))
 ((O) (X) (O) (X)) ((X) (O) (X) (O)) ((O) (X) (O) (X)) ((X) (O) (X) (O)))

CL-USER> (mapc 'print *) ()

((X) (X) (X) (X) (X)) 
((X) (X) (X) (X)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
NIL

CL-USER> (apply 'lockstep-dl ** *memories* 'x *keys*)
(((X) (X) (X) (X) (O)) ((X) (X) (X) (X)) ((O) (O) (O) (O)) ((O) (O) (O) (O))
 ((O) (O) (O) (O)) ((O) (O) (O) (O)) ((O) (O) (O) (O)) ((O) (O) (O) (O)))

CL-USER> (mapc 'print *) ()

((X) (X) (X) (X) (O)) 
((X) (X) (X) (X)) 
((O) (O) (O) (O)) 
((O) (O) (O) (O)) 
((O) (O) (O) (O)) 
((O) (O) (O) (O)) 
((O) (O) (O) (O)) 
((O) (O) (O) (O)) 
NIL

Zoomed in one-neuron-at-a-time version

These are the forms we will evaluate one neuron at a time; their interactive output follows in the next section.

(reset-input) nil
(mapc 'print *input*) nil
(test '((0 4)) 'x)
(test '((0 0)(0 4)) 'x)

Interactive output

The neuron to watch is the ear sticking out stage left, and the top row besides. The unused checkered neurons passively affect the amount of training data the ffnn can bear.

CL-USER> (reset-input) nil
NIL
CL-USER> (mapc 'print *input*) nil

((O) (X) (O) (X) (O)) 
((O) (O) (X) (O)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
NIL
CL-USER> (test '((0 4)) 'x)

((O) (X) (O) (X) (X)) 
((O) (O) (X) (O)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
NIL
CL-USER> (test '((0 0)(0 4)) 'x)

((X) (X) (O) (X) (O)) 
((O) (O) (X) (O)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
NIL

Reformulating the update rule as a receiver operating characteristic

The well-known update formula for a modern hopfield network implementation of a feedforward neural network of a single hidden layer is

Vᵢ⁽ⁿ⁺¹⁾ = sign[ ∑ᵩ ( F(+ξᵩᵢ + ∑ⱼ ξᵩⱼVⱼ⁽ⁿ⁾) − F(−ξᵩᵢ + ∑ⱼ ξᵩⱼVⱼ⁽ⁿ⁾) ) ], with 𝑗≠𝑖, 𝜑 indexing into the memories, V the neurons at timesteps 𝑛 and 𝑛+1, and F a rectified polynomial.

Without changing anything, define TPᵩ, FPᵩ, TNᵩ and FNᵩ to be the usual true positive, false positive, true negative and false negative neuron counts (the receiver operating characteristic metrics), comparing the state V⁽ⁿ⁾ against the 𝜑th memory over the neurons 𝑗≠𝑖. Each term ξᵩⱼVⱼ⁽ⁿ⁾ is +1 when the two agree (a true positive or true negative) and −1 when they disagree (a false positive or false negative), so the inner sum can be written as

Χᵩᵢ = TPᵩ + TNᵩ − FPᵩ − FNᵩ, 𝑗≠𝑖

seen from the 𝑖th neuron at timestep 𝑛. Next

Yᵩᵢ = P( Χᵩᵢ + ξᵩᵢ ) − P( Χᵩᵢ − ξᵩᵢ ), using P for the rectified polynomial (which was F in Krotov and Hopfield’s formulation above, but we used F to mean False in our receiver operating characteristic). ξᵩᵢ is +1 or −1 according to my :predicate above on the 𝜑th training data.
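
For example, taking the degree-3 rectified polynomial to be P(x) = x³ for positive x and 0 otherwise: with Χᵩᵢ = 2, a memory whose ξᵩᵢ = +1 contributes Yᵩᵢ = P(3) − P(1) = 27 − 1 = 26, while a memory whose ξᵩᵢ = −1 contributes Yᵩᵢ = P(1) − P(3) = 1 − 27 = −26. Each memory votes for its own value of the 𝑖th neuron, weighted polynomially by how well the rest of the current state matches that memory.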

We reconnect:

Vᵢ⁽ⁿ⁺¹⁾ = miss(old-value) if ∑ᵩYᵩᵢ is negative, or else hit(old-value), where miss and hit are the :miss and :hit functions from my *keys* at the beginning, applied to the possibly non-numeric old value of the 𝑖th neuron at timestep 𝑛.

This view makes it apparent how to introduce planned deep learning inference hallucinations into a feedforward neural network of a single hidden layer, in terms of each memory's receiver operating characteristic and the intended hallucination (i.e., ξᵩᵢ).
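
A minimal sketch of that reformulated one-neuron update, written directly from the definitions above (it is not taken from ff-nn-dl.lisp; the roc-counts and xis arguments and the :hit/:miss handling are this sketch's own assumptions):

(defun roc-form-update-sketch (roc-counts xis rectified-polynomial)
  "Sketch of the ROC form of the one-neuron update.
ROC-COUNTS holds one (tp tn fp fn) list per memory, counted over the neurons
j /= i against the current state; XIS holds the +1/-1 value of neuron i in each
memory.  Returns :miss when the summed Y is negative, otherwise :hit, so the
caller can apply its :miss or :hit function to the old neuron value."
  (let ((sum-y
          (loop :for (tp tn fp fn) :in roc-counts
                :for xi :in xis
                :for x := (- (+ tp tn) fp fn)                        ; X_phi_i
                :sum (- (funcall rectified-polynomial (+ x xi))
                        (funcall rectified-polynomial (- x xi))))))  ; Y_phi_i
    (if (minusp sum-y) :miss :hit)))

;; (roc-form-update-sketch '((10 12 5 5) (8 10 7 7)) '(+1 -1)
;;                         (lambda (x) (if (plusp x) (expt x 3) 0)))
;; => :HIT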

The one-by-one example noting ROC numbers

(loop
  :initially
     (reset-input)
  :for test-args :in '((0 1) (0 4) (1 0) (0 4))
  :for (r c) := test-args :do
    (print (list 'test test-args))
    (loop
      :initially
	 (print '(x-or-o? tp+tn fp+fn ξᵩᵢ))
      :for memory :in *memories* :do
	(print
	 (append
	  (multiple-value-list
	   (apply 'sum-one-memory 'x *input* memory
		  :start-row 0 :start-col 0 :target-row r :target-col c
		  *keys*))
	  (list
	   (if (funcall (getf *keys* :predicate) 'x
			(elt (elt memory r) c))
	       '+1
	       '−1)))))
  (funcall 'test `(,test-args) 'x) (terpri))

whence

(TEST (0 1)) 
(X-OR-O? TP+TN FP+FN ΞΦI) 
(0 -5 -3 −1) 
(0 -5 -3 −1) 
(0 -1 -3 1) 
((O) (X) (O) (X) (O)) 
((O) (O) (X) (O)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 

(TEST (0 4)) 
(X-OR-O? TP+TN FP+FN ΞΦI) 
(0 -3 -5 1) 
(0 -3 -5 1) 
(0 -3 -1 −1) 
((O) (X) (O) (X) (X)) 
((O) (O) (X) (O)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 

(TEST (1 0)) 
(X-OR-O? TP+TN FP+FN ΞΦI) 
(0 -5 -3 −1) 
(0 -5 -3 −1) 
(0 -1 -3 1) 
((O) (X) (O) (X) (X)) 
((X) (O) (X) (O)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 

(TEST (0 4)) 
(X-OR-O? TP+TN FP+FN ΞΦI) 
(0 -5 -7 1) 
(0 -5 -7 1) 
(-1 -1 1 −1) 
((O) (X) (O) (X) (O)) 
((X) (O) (X) (O)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
((X) (O) (X) (O)) 
((O) (X) (O) (X)) 
NIL

Conclusions

Sorry about the mathematics-heavy writing. The rigor is intended to show that deep learning inference hallucinations can be explicitly planned and intentionally programmed into the deep learning inference results, and that they are easily understood in terms of the receiver operating characteristic of the previous time step. It also shows that there is nothing inherently random or hidden about this academically mainstream method of implementing a feedforward neural network of a single hidden layer, and that there is no inherent obfuscation or transformative change of the training data for ffnn use.

To the extent that my personal pure ansi common lisp implementation is quirky, its quirks stay within the scope of what is allowed and commonly meant by the term of art feedforward neural network of a single hidden layer, as it occurs in the definition of large language model transformer chatbots and other deep learning. I hope to have empowered everyone to critically examine what they hear said or implied in biased channels about the nature and calculus of ffnns.

My implementation is a modern hopfield network using a rectified polynomial of degree 3. Since all hopfield networks are dual to (are implementations of) a feedforward neural network of a single hidden layer, but not all ffnns are dual to a hopfield network (in the sense of having an update rule that minimizes a Lyapunov energy function), I think it is best to refer to the abstract object as a feedforward neural network, and to a modern hopfield network as one of a variety of algorithms that implement a feedforward neural network.

My code is available as the standalone ff-nn-dl.lisp on my Leonardo system itch.io page, just as a regular lisp file, in the pure ansi common lisp ffnn deep learning spirit of this article.

I am a bit bashful to ask, but you could consider a voluntary donation, of which I guess I will leave the 10% voluntary contribution to itch.io for, among other things, mediating with the inescapable payment processors, either of whom takes 3%.

The weekly lispy gopher climate show, which is broadcast live every 0UTC Wednesday (Tuesday evening in the Americas) as provided by sdf.org, has a peertube archive donated by ajroach out of his affordable and community-supporting but necessarily paid peertube fediverse service. Even with just the recent (“just the recent”) Tuesday night in the Americas weekly episodes, we are about to bump our heads on the base cap.

screwlisp proposes kittens