AIR DEFENSE

Imitation learning, live

Drones drift down from the dark. You command the right turret: aim with the mouse and click to fire one round. Your barrel turns slowly and your magazine reloads one round at a time, so every shot has to count. Lead the targets and make every shot land.

The left turret learns from you: each hit you land becomes a training example, and a small neural net learns to aim — leads and all. It also learns which drone to take next from the cost of the ones that slip through, and it defends the whole skyline. Train it well and watch it hold the city. Tune everything under ⚙ Options.

Mouse: aim · Click: fire one shot · P: pause

ABOUT THE TRAINING

Two tiny neural nets, trained live in your browser

The left turret can't see the sky the way you do. Instead it watches you play and learns two separate things: how to aim (by copying you) and what to shoot first (from the cost of the drones that get through). Both are small multilayer perceptrons (MLPs) written in plain JavaScript and trained live — no server, no pre-training.

1 · Learning to aim — imitation learning

Every time you land a shot, the game stores one training example:

Input (the state): four numbers describing the drone you hit, as it was at the moment you fired: its position relative to your gun (Δx, Δy) and its velocity (vx, vy), each scaled into a small, roughly unit range.
Target (the action): two numbers, the unit vector (cos θ, sin θ) of the direction you actually fired. Compared with the drone's bearing at that moment, it encodes how far you led the moving target.
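A minimal sketch of turning one successful shot into a demonstration, written in Python for readability (the game itself uses JavaScript). The scale constants `POS_SCALE` and `VEL_SCALE` and the helper `make_demo` are hypothetical; the text only says the inputs are normalised, not how.

```python
import math
from dataclasses import dataclass

# Hypothetical scale constants: the source does not give the real ones.
POS_SCALE = 800.0   # assumed playfield width, in pixels
VEL_SCALE = 200.0   # assumed top drone speed, in pixels per second

@dataclass
class Demo:
    state: tuple[float, float, float, float]  # (dx, dy, vx, vy), scaled
    target: tuple[float, float]               # unit firing direction

def make_demo(gun_x, gun_y, drone_x, drone_y, vx, vy, fire_angle):
    """Build one (state, action) pair from the moment of a shot that hit."""
    state = (
        (drone_x - gun_x) / POS_SCALE,
        (drone_y - gun_y) / POS_SCALE,
        vx / VEL_SCALE,
        vy / VEL_SCALE,
    )
    # The label is the direction actually fired, as a unit vector.
    target = (math.cos(fire_angle), math.sin(fire_angle))
    return Demo(state, target)
```

Because the label is a direction rather than an angle, it has no wrap-around at ±180°, which keeps a plain MSE loss well behaved.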

That state → firing-direction pair is a demonstration. Copying an expert's policy from demonstrations, with no reward function, is behavioural cloning — the simplest form of imitation learning. Two details make it work well:

Only hits become demos. A miss is a bad label, so the dataset is filtered to successful expert actions — much cleaner supervision.
A shared coordinate frame. Your turret sits on the right and the AI's sits on the left, facing the other way, so the game mirrors the x-axis to map both into one "canonical" frame; your examples are then directly reusable by the left gun. Bullet speed is held fixed, so the amount of lead you demonstrate carries over unchanged.
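Both details can be sketched in a few lines. This assumes the canonical frame is the player's right-hand gun, so only the AI's queries get mirrored; the source does not say which side is canonical, and `record_shot` and `ai_aim` are hypothetical names. Mirroring the x-axis simply negates every x-component: the offset, the velocity and the aim.

```python
# Assumption: the canonical frame is the player's (right-hand) gun,
# so player demos are stored as-is and the left-hand AI gun is mirrored.

def mirror_state(state):
    """Flip the x-axis: negate the x offset and the x velocity."""
    dx, dy, vx, vy = state
    return (-dx, dy, -vx, vy)

def mirror_dir(direction):
    """Flip the x component of an aim direction."""
    tx, ty = direction
    return (-tx, ty)

def record_shot(dataset, state, fired_dir, hit):
    """Store a player shot as a demo, but only if it hit."""
    if hit:
        dataset.append((state, fired_dir))  # already in the canonical frame

def ai_aim(net, ai_state):
    """Query the shared net from the left gun: mirror in, mirror out."""
    return mirror_dir(net(mirror_state(ai_state)))
```

Here `net` can be any callable that maps a canonical state to a direction, so the same trained model serves both guns.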

The network is an MLP shaped 4 → 32 → 24 → 2, with tanh hidden layers and a linear output. The task is regression: make its predicted direction match yours, scored by mean-squared error (MSE). It trains by full-batch gradient descent with momentum (plus a little weight decay to resist overfitting). The green sparkline in the bottom readout is that MSE falling. As demos pile up the net generalises — it produces sensible leads for drone states you never explicitly demonstrated.
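A minimal pure-Python sketch of the aim net described above: a 4 → 32 → 24 → 2 MLP with tanh hidden layers and a linear output, trained by full-batch gradient descent on the MSE with momentum and weight decay. The initialisation, learning rate, momentum and decay values are assumptions for illustration, not the game's settings.

```python
import math
import random


class MLP:
    """Tiny MLP, 4 -> 32 -> 24 -> 2: tanh hidden layers, linear output."""

    def __init__(self, sizes=(4, 32, 24, 2), seed=0):
        rng = random.Random(seed)
        self.W, self.b = [], []
        for n_in, n_out in zip(sizes, sizes[1:]):
            s = 1.0 / math.sqrt(n_in)  # assumed fan-in scaled init
            self.W.append([[rng.uniform(-s, s) for _ in range(n_in)]
                           for _ in range(n_out)])
            self.b.append([0.0] * n_out)
        # Momentum ("velocity") buffers with the same shapes as the params.
        self.vW = [[[0.0] * len(row) for row in W] for W in self.W]
        self.vb = [[0.0] * len(b) for b in self.b]

    def forward(self, x):
        """Return every layer's activations, input first, output last."""
        acts = [list(x)]
        last = len(self.W) - 1
        for i, (W, b) in enumerate(zip(self.W, self.b)):
            z = [sum(w * a for w, a in zip(row, acts[-1])) + bj
                 for row, bj in zip(W, b)]
            acts.append(z if i == last else [math.tanh(v) for v in z])
        return acts

    def predict(self, x):
        return self.forward(x)[-1]

    def train_step(self, X, Y, lr=0.05, momentum=0.9, weight_decay=1e-4):
        """One full-batch gradient step on the MSE; returns the pre-step loss."""
        scale = 1.0 / (len(X) * len(Y[0]))  # mean over samples and outputs
        gW = [[[0.0] * len(row) for row in W] for W in self.W]
        gb = [[0.0] * len(b) for b in self.b]
        loss = 0.0
        for x, y in zip(X, Y):
            acts = self.forward(x)
            err = [o - t for o, t in zip(acts[-1], y)]
            loss += scale * sum(e * e for e in err)
            delta = [2.0 * scale * e for e in err]  # dLoss/d(output)
            for layer in range(len(self.W) - 1, -1, -1):
                a_in = acts[layer]
                for j, d in enumerate(delta):
                    gb[layer][j] += d
                    for k, a in enumerate(a_in):
                        gW[layer][j][k] += d * a
                if layer > 0:
                    # Back through the weights, then through tanh: 1 - a^2.
                    W = self.W[layer]
                    delta = [sum(W[j][k] * delta[j] for j in range(len(delta)))
                             * (1.0 - a_in[k] ** 2) for k in range(len(a_in))]
        for layer, W in enumerate(self.W):
            for j, row in enumerate(W):
                for k, w in enumerate(row):
                    # Weight decay adds an L2 pull toward zero (weights only).
                    g = gW[layer][j][k] + weight_decay * w
                    self.vW[layer][j][k] = momentum * self.vW[layer][j][k] - lr * g
                    row[k] = w + self.vW[layer][j][k]
                self.vb[layer][j] = momentum * self.vb[layer][j] - lr * gb[layer][j]
                self.b[layer][j] += self.vb[layer][j]
        return loss
```

Calling `train_step` repeatedly on the whole demo set and plotting the returned loss gives the kind of falling curve the green sparkline shows.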

2 · The honesty rule — it never cheats

The aim always comes from the network, never from a closed-form intercept formula. Before firing, the turret runs an honest self-check: "if I fire where my net is pointing, does the straight bullet pass close to where the drone will be?" If yes, it shoots; if not, it holds fire. This is pure ammo discipline: the check decides whether to fire, but never corrects the aim. So an undertrained net genuinely misses (or wisely holds), and only real learning makes it connect.
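One way to sketch that self-check is a closest-approach test: take the net's direction unchanged, fly a straight bullet along it, and measure how near it gets to the drone. This assumes the drone keeps a constant velocity; `should_fire`, `hit_radius` and `max_time` are hypothetical names and values.

```python
import math

def should_fire(gun, aim_dir, drone_pos, drone_vel, bullet_speed,
                hit_radius=12.0, max_time=3.0):
    """Ammo-discipline check: fire only if the net's aim would connect.

    The aim direction is used as-is and never adjusted. Assumes the drone
    keeps a constant velocity and the bullet flies straight at fixed speed.
    """
    n = math.hypot(*aim_dir)
    if n == 0.0:
        return False
    dx, dy = aim_dir[0] / n, aim_dir[1] / n  # unit direction from the net
    # Drone position and velocity relative to the bullet.
    px, py = drone_pos[0] - gun[0], drone_pos[1] - gun[1]
    wx = drone_vel[0] - bullet_speed * dx
    wy = drone_vel[1] - bullet_speed * dy
    ww = wx * wx + wy * wy
    # Time of closest approach, clamped to the bullet's flight window.
    t = 0.0 if ww == 0.0 else -(px * wx + py * wy) / ww
    t = max(0.0, min(max_time, t))
    miss = math.hypot(px + wx * t, py + wy * t)
    return miss <= hit_radius
```

Note that the check only yields yes or no; a wrong direction from the net produces a large `miss` and a held shot, never a silently corrected one.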

3 · Learning what to shoot — from the penalty

With a slow barrel and scarce ammo, target choice decides whether the city survives, so a second net learns it. For each candidate drone it reads five features: time-to-ground (urgency), fall speed, how far the barrel must swing to face it (slew cost), distance, and reachability — the aim net's own predicted miss.
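The five features could be computed per candidate roughly as below. The skyline height, the scale constants and the function name are hypothetical, as is the choice to pass the aim net's predicted miss in as a number (for example from a closest-approach check).

```python
import math

# Hypothetical constants: the source names the five features but not
# how they are measured or scaled.
GROUND_Y = 600.0                          # skyline height; y grows downward
SCALES = (5.0, 200.0, math.pi, 800.0, 100.0)

def candidate_features(gun, barrel_angle, drone_pos, drone_vel,
                       predicted_miss):
    """Five scaled features for one candidate drone."""
    fall_speed = drone_vel[1]
    # Urgency: seconds until the drone reaches the skyline.
    time_to_ground = max(0.0, GROUND_Y - drone_pos[1]) / max(fall_speed, 1e-6)
    dx, dy = drone_pos[0] - gun[0], drone_pos[1] - gun[1]
    bearing = math.atan2(dy, dx)
    # Slew cost: smallest angle the barrel must swing, wrapped into [0, pi].
    diff = bearing - barrel_angle
    slew = abs(math.atan2(math.sin(diff), math.cos(diff)))
    distance = math.hypot(dx, dy)
    # Reachability: the aim net's own predicted miss distance.
    raw = (time_to_ground, fall_speed, slew, distance, predicted_miss)
    return [r / s for r, s in zip(raw, SCALES)]
```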

The label comes from the outcome: a drone the turret destroys → 1 (good pick); a drone that slips through and hits the city → 0 (the penalty). The net is trained online, mid-game, to predict the probability that engaging a drone will pay off. It's a small value / credit-assignment model in which the cost of missed drones is the learning signal. Its output is blended with a plain "take the makeable shot" prior, which keeps its choices anchored to the obvious heuristic while it is still learning and lets it improve on that heuristic as outcomes accumulate.
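A minimal sketch of the outcome-labelled picker and the blend. To keep it short, a single logistic unit stands in for the small net, and the blend weight simply grows with the number of labelled outcomes; both simplifications, the `trust_after` schedule and all names here are assumptions, not the game's method.

```python
import math

def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class TargetPicker:
    """Online logistic model of P(engaging this drone pays off)."""

    def __init__(self, n_features=5, lr=0.1, trust_after=50):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.seen = 0
        self.trust_after = trust_after  # hypothetical blend schedule

    def prob(self, x):
        return _sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x, destroyed):
        """Label 1 if the drone was destroyed, 0 if it hit the city."""
        err = (1.0 if destroyed else 0.0) - self.prob(x)  # log-loss gradient
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * err
        self.seen += 1

    def score(self, x, makeable):
        """Blend the learned probability with a 'makeable shot' prior."""
        prior = 1.0 if makeable else 0.0
        alpha = self.seen / (self.seen + self.trust_after)
        return alpha * self.prob(x) + (1.0 - alpha) * prior

def pick(picker, candidates):
    """candidates: list of (features, makeable); returns the best index."""
    return max(range(len(candidates)),
               key=lambda i: picker.score(*candidates[i]))
```

With no outcomes yet, `alpha` is zero and the picker just takes the makeable shot; as labels arrive, the learned probability gradually takes over.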

What to watch

Loss falling = the aim net fitting your style.
Hit-rate climbing = both nets working together — better aim, smarter targets.
Reset brain in Options wipes both nets so you can watch it learn from zero.

Concepts in play: behavioural cloning · imitation vs. reinforcement learning · regression with MSE · gradient descent + momentum · weight decay · feature normalisation · learning from successful demonstrations · generalisation · online learning · outcome-driven target selection.