Bit-Neurons That Organize Themselves: Memory and Emergence Without Gradients

May 24, 2026

Languages: Rust

Patterns: Hamming-resonance, Bit-flip learning, Winner-take-all, Density-fair scoring, Online competitive clustering, Pattern-valued outputs

Architecture: Bit-vector value domain, Content-addressable memory, Self-organizing units, Density-uniform encoding, Substrate consumer

Tags: research

The Hypothesis

The bet is simple to state. A neuron is a bit-pattern in and a bit-pattern out. Alone it does little. Connect many and let their patterns interfere and structure should emerge: recognition, memory, specialization. No gradients, no weight matrices, no floating point. Just bits and a few bitwise operations.

This post is the state of that bet: what we measured, what failed, and what we haven’t run. The short version is that the core works on real data, gradient-free and integer-only, but only after we fixed how the neurons compete, and only on inputs encoded to a uniform density.

How It Works

Everything is one value, a fixed-width bit-vector, plus a handful of bitwise ops. Two do most of the work.

Resonance measures how well an input matches a stored pattern. The basic form is shared bits minus differing bits. Higher means a closer match.
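A minimal sketch of the basic resonance score, under one reading of "shared minus differing": bits set in both patterns, minus bits set in exactly one. The 256-bit width, the `Pattern` alias and the function name are assumptions for illustration, not names from the project. Counting agreeing positions instead (zeros included) would give width minus twice the Hamming distance.

```rust
/// Assumed width: 4 words of 64 bits = 256 bits.
const WORDS: usize = 4;
type Pattern = [u64; WORDS];

/// Basic resonance, integer-only.
/// shared    = bits set in both patterns      (popcount of AND)
/// differing = bits set in exactly one        (popcount of XOR)
fn resonance(input: &Pattern, stored: &Pattern) -> i32 {
    let mut shared = 0i32;
    let mut differing = 0i32;
    for (a, b) in input.iter().zip(stored.iter()) {
        shared += (a & b).count_ones() as i32;
        differing += (a ^ b).count_ones() as i32;
    }
    shared - differing
}
```

An exact match scores its own popcount, and every disagreeing bit costs one point.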

Learning flips bits toward a target through a mask that decides which bits may change:

next = current XOR ((target XOR current) AND mask)

No gradients. The mask is the only knob.
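The update rule translates almost word for word into code. This is a sketch over word slices; the function name and the slice-based signature are assumptions.

```rust
/// Flip bits of `current` toward `target`, but only where `mask` has a 1.
/// Implements: next = current XOR ((target XOR current) AND mask)
fn learn(current: &mut [u64], target: &[u64], mask: &[u64]) {
    for ((c, t), m) in current.iter_mut().zip(target).zip(mask) {
        // Bits that differ from the target and are allowed to change.
        let flip = (*t ^ *c) & *m;
        *c ^= flip;
    }
}
```

With an all-ones mask the neuron copies the target in one step; with an all-zero mask it is frozen. Anything in between changes only the unmasked positions.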

The choice that matters is what counts as a neuron’s output. We read the pattern it holds, not a score. A score collapses the neuron to one number and turns the system into a nearest-neighbour classifier. The pattern keeps the information that lets neurons remember and organize.

A population learns by competition. For each input, the neuron that resonates most is the only one that updates. That is winner-take-all, and bounding the interference this way is load-bearing, for reasons the results below make concrete.
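One competitive step could look roughly like this. It is a sketch, not the project's loop: the names, the 256-bit width, the AND/XOR reading of resonance and the tie-breaking rule are all assumptions.

```rust
type Pattern = [u64; 4]; // assumed 256-bit neurons

/// Basic resonance: bits set in both, minus bits set in exactly one.
fn resonance(a: &Pattern, b: &Pattern) -> i32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x & y).count_ones() as i32 - (x ^ y).count_ones() as i32)
        .sum()
}

/// One winner-take-all step: only the best-resonating neuron learns.
/// Returns the winner's index, or None if there are no neurons.
/// Ties go to the last maximal neuron (an arbitrary choice here).
fn wta_step(neurons: &mut [Pattern], input: &Pattern, mask: &Pattern) -> Option<usize> {
    let winner = neurons
        .iter()
        .enumerate()
        .max_by_key(|(_, n)| resonance(input, n))
        .map(|(i, _)| i)?;
    // Bit-flip update applied to the winner alone; every other neuron is untouched.
    for ((c, t), m) in neurons[winner].iter_mut().zip(input).zip(mask) {
        let flip = (*t ^ *c) & *m;
        *c ^= flip;
    }
    Some(winner)
}
```

Dropping the `max_by_key` gate and updating every neuron on every input gives the unbounded variant that the results compare against.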

What We’ve Tested

Approach	Method	Result	Status
Self-organizing memory, real digits	online winner-take-all with a density-fair score; recall and novelty in one loop	clusters self-organize to 0.48 purity (vs 0.10 chance); recall 3.8–4.5× chance at 10% corruption; novelty AUROC 1.00; running all three together degrades none	Measured
Same loop, a second dataset	identical loop, inputs rank-encoded to uniform density	purity 0.49–0.53; recall 4.7–5.0× chance; novelty AUROC 1.00; composition within ±2pp	Measured
Unsupervised specialization, synthetic clusters	16 neurons, winner-take-all bit-flip	one neuron per cluster, purity 1.0 with enough neurons, holds at 20% noise	Measured
Associative recall, synthetic	store by bit-flip, recall from a corrupted cue by resonance	256-bit neurons recall up to 128 patterns at 10% corruption, no ceiling reached	Measured
Confidence / novelty, synthetic	low resonance everywhere flags out-of-distribution	novelty AUROC 0.81 (to 50% corruption); recall-correctness AUROC 0.87 in-capacity	Measured
Continual learning	settle on clusters, then stream new ones in	with spare neurons 80–100% retained; below capacity retention falls to ~0.65	Measured
Pattern classifier (the detour)	per-patch best-alignment over stored prototypes	91.94%, up from 84.56% flat and past an 88.28% capacity ceiling	Measured
Two-layer composition	detect parts, then compose	30.7%, lost to 88.2% flat	Measured
Classifier as an executable graph	run the same math through the graph engine	0 of 50 predictions differ from the host computation, ~0.1 ms/image at 50 neurons	Measured
Large network on the graph path	—	bit-exact only verified at small scale	Built, not validated
Compile to a tight bitwise loop	—	scoped, not built	Not built

Self-organization, and the fix that made it work on real data

We stream unlabelled patterns into a group of neurons and let them compete.

flowchart LR
    In["unlabelled input"] --> Cmp["who resonates most?<br/>(density-fair score)"]
    N1["neuron 1"] --> Cmp
    N2["neuron 2"] --> Cmp
    N3["neuron ..."] --> Cmp
    Cmp --> Win["winner only"]
    Win --> Learn["winner flips bits toward the input"]
    Learn -.->|"over many inputs"| Spec["one specialist per cluster"]

On synthetic clusters this worked at once. With at least as many neurons as clusters, each cluster claimed its own neuron, purity 1.0, and it held at 20% noise.

Real handwritten digits broke it. Every neuron drifted to the single sparsest digit. Purity fell to 0.11, about the chance floor. The unbounded control we keep as a warning case did not blur on real data; it covered all ten digits at purity 0.27–0.35 and beat the competitive version. So the synthetic win had been measuring how cleanly synthetic clusters separate, not the mechanism.

The cause was the resonance score. Shared-minus-differing rewards sparse patterns, so on digits that all share strokes every neuron slid to the same low-density blob, prototypes filling to about 360 of 784 bits. We swapped in a density-fair score, shared bits over their union (a Jaccard-style ratio). Prototypes settled sparse and distinct at about 190 of 784 bits, purity rose to 0.48, and the competitive loop beat the unbounded control again, 0.48 vs 0.35. The same inter-neuron distance check that detects blur showed the competitive neurons spread about 219 bits further apart than the unbounded ones, so this is real specialization, not hidden blur.
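A density-fair score of this kind can stay integer-only by comparing ratios through cross-multiplication rather than dividing. The sketch below assumes that approach; the function names and the treatment of an empty union as a zero score are illustrative choices, not the project's code.

```rust
use std::cmp::Ordering;

/// Returns (shared, either): popcount of AND and popcount of OR.
/// Their ratio shared/either is the Jaccard-style score.
fn jaccard_parts(a: &[u64], b: &[u64]) -> (u64, u64) {
    let mut shared = 0u64;
    let mut either = 0u64;
    for (x, y) in a.iter().zip(b) {
        shared += u64::from((x & y).count_ones());
        either += u64::from((x | y).count_ones());
    }
    (shared, either)
}

/// Compare two scores without floating point:
/// s1/e1 vs s2/e2 is decided by s1*e2 vs s2*e1.
/// An empty union (both patterns all-zero) is treated as score 0.
fn cmp_score(p: (u64, u64), q: (u64, u64)) -> Ordering {
    let (s1, e1) = if p.1 == 0 { (0, 1) } else { p };
    let (s2, e2) = if q.1 == 0 { (0, 1) } else { q };
    (s1 * e2).cmp(&(s2 * e1))
}
```

Under this score a prototype gains nothing by being blank or by being saturated: extra set bits that the input lacks grow the union and pull the ratio down.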

The ceiling is real too. 48% unsupervised purity is decent clustering, not classification, and the most confusable digits go uncovered when neurons are scarce.
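For reference, the purity figures can be read through the standard cluster-purity definition: each neuron is credited with its most common true label, and purity is the credited fraction of all inputs. We assume that definition here; the function below is a sketch with hypothetical names, and labels are used only for scoring, never for learning.

```rust
/// Cluster purity from (winning_neuron, true_label) pairs.
/// Pairs with an out-of-range neuron or label index are ignored.
fn purity(assignments: &[(usize, usize)], neurons: usize, classes: usize) -> f64 {
    // counts[n][c] = how many inputs of class c were won by neuron n
    let mut counts = vec![vec![0usize; classes]; neurons];
    let mut total = 0usize;
    for &(winner, label) in assignments {
        if let Some(cell) = counts.get_mut(winner).and_then(|row| row.get_mut(label)) {
            *cell += 1;
            total += 1;
        }
    }
    if total == 0 {
        return 0.0;
    }
    // Each neuron contributes the size of its majority class.
    let majority: usize = counts
        .iter()
        .map(|row| row.iter().copied().max().unwrap_or(0))
        .sum();
    majority as f64 / total as f64
}
```

If every neuron collapses onto one pattern and a single neuron wins everything, purity on ten balanced classes falls to about 0.10, which matches the chance floor quoted above.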

Recall from a damaged cue

A neuron stores a pattern and emits it back. Give the population a corrupted version and it returns the original by resonance.

flowchart LR
    Cue["corrupted cue"] --> R["resonance vs each stored pattern"]
    M1["stored pattern A"] --> R
    M2["stored pattern B"] --> R
    M3["stored ..."] --> R
    R --> W["best match wins"]
    W --> Out["recalled PATTERN<br/>(not a label)"]

On synthetic patterns, 256-bit neurons recalled up to 128 stored patterns at 10% corruption without hitting a ceiling. Wider vectors store more and tolerate more damage. On real digits the same recall, run inside the self-organizing loop, returned the right class 3.8–4.5 times more often than chance at 10% corruption and recovered about 0.86 of the stored bits. It recalls the gist of a cluster, not an exact copy, which for a memory is the behaviour we want.
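Recall by resonance is small enough to sketch directly. The key point is the return type: the stored pattern itself, not a label or a score. Names, the 256-bit width and the corruption helper are assumptions made for the example.

```rust
type Pattern = [u64; 4]; // assumed 256-bit patterns

/// Basic resonance: bits set in both, minus bits set in exactly one.
fn resonance(a: &Pattern, b: &Pattern) -> i32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x & y).count_ones() as i32 - (x ^ y).count_ones() as i32)
        .sum()
}

/// Return the stored pattern that resonates most with the cue.
/// None if nothing is stored.
fn recall<'a>(stored: &'a [Pattern], cue: &Pattern) -> Option<&'a Pattern> {
    stored.iter().max_by_key(|p| resonance(cue, p))
}

/// Corrupt a pattern by flipping the bits at the given positions
/// (positions past the end of the pattern are skipped).
fn flip_bits(p: &mut Pattern, positions: &[usize]) {
    for &pos in positions {
        if let Some(word) = p.get_mut(pos / 64) {
            *word ^= 1u64 << (pos % 64);
        }
    }
}
```

A cue made by flipping a few bits of a stored pattern still resonates most with its origin, so `recall` hands back the clean original.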

One loop, and a second dataset

The three behaviours (self-organizing, recalling, flagging novelty) run as one online loop rather than three programs. On real digits, running them together did not degrade any of them relative to measuring each alone, and novelty detection on noise reached AUROC 1.00.
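Novelty falls out of the same scores: if no neuron resonates strongly, the input is unfamiliar. The sketch below assumes a Jaccard-style score scaled to parts per thousand and a fixed cutoff. Both the scaling and the threshold are hypothetical; the AUROC figures come from sweeping a cutoff, not from one fixed value.

```rust
/// Jaccard-style score scaled to 0..=1000, integer-only.
fn score_permille(a: &[u64], b: &[u64]) -> u32 {
    let (mut shared, mut either) = (0u32, 0u32);
    for (x, y) in a.iter().zip(b) {
        shared += (x & y).count_ones();
        either += (x | y).count_ones();
    }
    if either == 0 { 0 } else { shared * 1000 / either }
}

/// An input is flagged novel when even its best match is weak.
/// `threshold_permille` is a hypothetical tuning knob.
fn is_novel(neurons: &[Vec<u64>], input: &[u64], threshold_permille: u32) -> bool {
    let best = neurons
        .iter()
        .map(|n| score_permille(input, n))
        .max()
        .unwrap_or(0); // no neurons: everything is novel
    best < threshold_permille
}
```

Because the best score is already computed to pick the winner, the novelty flag costs one extra comparison per input.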

Then the real test of a mechanism: change the dataset. We ran the same loop, untouched, on a clothing-image set. With the same encoding it failed and even inverted, the competitive neurons ending up more blurred than the unbounded ones. The cause was density again. These images set about 32% of their bits, with every class packed into one band, so the density-fair score lost its grip.

The fix was the input. We re-encoded each image by rank: keep the brightest k pixels, set exactly those, so every image carries the same number of bits, landing near the 15% density the digits happened to have. With nothing else changed, the loop came back: purity 0.49–0.53, recall 4.7–5.0× chance, novelty AUROC 1.00, and the three behaviours run together stayed within 2 percentage points of each run alone. So the mechanism carries to a second real dataset on one condition: the inputs have to be encoded to a uniform density first. The digit benchmark only ever worked because it was already close to uniform.
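Rank encoding as described can be sketched in a few lines. We assume 8-bit grayscale pixels and break brightness ties by lower pixel index; the tie rule and the function name are our assumptions, since the post does not specify them.

```rust
/// Set exactly the k brightest pixels as 1-bits (all of them if k exceeds
/// the pixel count). Ties in brightness go to the lower pixel index.
/// Output is packed into 64-bit words, bit i = pixel i.
fn rank_encode(pixels: &[u8], k: usize) -> Vec<u64> {
    let mut order: Vec<usize> = (0..pixels.len()).collect();
    // Brightest first, then lower index first among equals.
    order.sort_by(|&a, &b| pixels[b].cmp(&pixels[a]).then(a.cmp(&b)));

    let mut bits = vec![0u64; (pixels.len() + 63) / 64];
    for &i in order.iter().take(k) {
        bits[i / 64] |= 1u64 << (i % 64);
    }
    bits
}
```

For a 784-pixel image, k of about 118 corresponds to the 15% density mentioned above (784 × 0.15 ≈ 117.6). Every encoded image then has the same popcount regardless of how bright or dark the original was.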

Forgetting is a capacity limit, not decay

We let the loop settle on one set of clusters, then streamed new ones in. With spare neurons, nothing broke: new clusters took idle neurons and the old specialists kept their patterns, 80–100% retained. Forgetting only appeared when we forced fewer neurons than clusters, where retention dropped to about 0.65. Shrinking a specialized neuron’s mask after it settles recovered most of that, lifting retention from 0.68 to 0.87 in the tight case, until too few neurons were left. So forgetting here is a capacity wall, which points at adding neurons rather than changing the rule.

What Didn’t Work

The classifier was the wrong shape. Chasing accuracy, we built a recognition unit that scores how well an input matches stored prototypes and reached 91.94%, up from 84.56% flat and past an 88.28% capacity ceiling. The gain came from comparing local patches and letting each find its best small shift, structure beating brute template count. But to get a score we collapsed the neuron’s output to one number. That is a nearest-neighbour classifier, not the idea. The number is fine; it just doesn’t measure remembering or organizing.

Stacking layers bought nothing on digits. A two-level version, detect parts then compose them, scored 30.7% against 88.2% for the flat version. Forcing the image through a sparse part-code threw away the spatial detail digit identity lives in. The lesson is that this benchmark isn’t a composition problem, not that depth is wrong.

Unbounded interference collapses to blur on synthetic data. Let every neuron absorb every input and they converge to the same pattern. This is how an earlier version of the idea failed. The same loop with the gate on versus off is the clearest evidence we have that bounded competition is what makes structure appear. On real digits the failure mode flipped to the score bias above, which is why the gate alone wasn’t enough there.

The Untested Frontier

Is uniform-density encoding a general fix? It carried the loop from one dataset to a second once inputs were rank-encoded. We don’t know if that’s a portable rule or a coincidence that fits these two. A third, different dataset would test the fix itself. (mechanism)
A task that isn’t an image benchmark. Everything ran on synthetic clusters and two image sets. A different modality would test the core claim properly. (mechanism)
Recovering the confusable classes. Competition leaves the hardest-to-separate classes uncovered when neurons are scarce. Whether adding neurons on demand fixes that or only defers the wall is open. (structural)
The neuron as a 3-D volume. The original idea was a volume, not a flat string of bits. We haven’t tried it and suspect it adds nothing without a reason for the third dimension to mean something. (structural)
Scale. The neurons run as a graph and match the host math exactly at fifty neurons, ~0.1 ms/image. A large network would need compiling to a bitwise loop, which we’ve scoped but not built. (structural)

Open Questions

Is rank-based, uniform-density encoding a portable fix across datasets, or does it only happen to suit the two we’ve run?
Below the capacity wall, does adding neurons on demand restore continual learning, or just move the cliff?
Is there a task, not a digit or clothing benchmark, where composing parts in layers actually helps?
Can the most confusable classes get their own specialist without supervision?
Does a 3-D neuron ever earn its third dimension, or is a flat pattern enough?

Considerations

We read the neuron’s pattern as its output instead of a score. We give up a ready-made class label and get memory and self-organization, which a single number can’t carry.

We bound the competition to one winner. It’s less natural than letting every signal sum, but the free-for-all blurs everything to the same pattern, and the gate is why structure appears.

We made the score density-fair and, when that wasn’t enough, made the inputs uniform-density. That is a preprocessing assumption, not “works on anything raw.” Without it the neurons collapse onto the densest blob; with it the same loop carried to a second dataset.

We treat the 91.94% classifier as a side-quest. It’s a fine number, but the things we care about, remembering and organizing and knowing what you don’t know, aren’t measured by it.