The Scalable Forward-Forward Algorithm

Scalable Forward-Forward (SFF; Krutsylo, 2025) carries Hinton's Forward-Forward (FF) idea to deep convolutional networks. FF wrote the class label into a few input pixels and trained every layer in isolation with two forward passes. In a deep CNN, however, those label pixels are convolved and pooled away, and pure layerwise training severs the shortcut connections that residual blocks rely on. SFF makes two changes. First, a tiny auxiliary convolution on each block, a 1×1 (or slightly larger) kernel with one output channel per class, turns a single forward pass into a per-class goodness vector: the mean of the squared activations in each class channel, one value per class. The true class is the positive signal and the rest are negative, so there is no second pass and no pixels are overwritten. Second, it trains the network one block at a time on each minibatch: ordinary backpropagation runs inside each block (keeping shortcuts intact) and the block output is layer-normalized, but the signal is detached between blocks, so no gradient ever crosses a block boundary. At inference, every block (or only the last one) contributes its goodness vector; the vectors are averaged and the argmax is the predicted class. The result is a drop-in wrapper around unmodified models such as ResNet18 and MobileNetV3: comparable to backpropagation in accuracy, 1.5-3× slower when the number of classes is large, often lighter on memory (though the saving is architecture-dependent), and stronger when data is scarce.
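The per-class goodness and the inference vote can be illustrated with a small, dependency-free sketch. It is not the paper's implementation: the names `pointwise_conv`, `goodness`, and `vote`, the nested-list feature-map layout, and the toy numbers are all assumptions made for illustration, and the auxiliary head is fixed at a 1×1 kernel. Training (the loss on the goodness vector and the detach between blocks) is left out because it needs an autograd framework.

```python
from typing import List

# A feature map is indexed [channel][row][col]. Layout and names are illustrative.
FeatureMap = List[List[List[float]]]


def pointwise_conv(features: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution: weights[k][c] mixes input channel c into class channel k."""
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [sum(w * features[c][i][j] for c, w in enumerate(class_weights))
             for j in range(width)]
            for i in range(height)
        ]
        for class_weights in weights
    ]


def goodness(class_maps: FeatureMap) -> List[float]:
    """Per-class goodness: mean of the squared activations in each class channel."""
    scores = []
    for channel in class_maps:
        values = [v for row in channel for v in row]
        scores.append(sum(v * v for v in values) / len(values))
    return scores


def vote(per_block_goodness: List[List[float]]) -> int:
    """Average the goodness vectors across blocks and return the argmax class."""
    num_classes = len(per_block_goodness[0])
    averaged = [
        sum(block[k] for block in per_block_goodness) / len(per_block_goodness)
        for k in range(num_classes)
    ]
    return max(range(num_classes), key=averaged.__getitem__)


# Toy block output: 2 channels, 2x2 spatial, with a hypothetical 3-class head.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.0, 2.0], [2.0, 0.0]]]
head_weights = [[1.0, 0.0],   # class 0 reads channel 0
                [0.0, 1.0],   # class 1 reads channel 1
                [0.5, 0.5]]   # class 2 reads both channels equally
block_scores = goodness(pointwise_conv(features, head_weights))
```

Here `block_scores` holds one goodness value per class for a single block. Passing a list of such vectors, one per block, to `vote` gives the averaged prediction; passing a one-element list corresponds to using only the last block.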
