Drones drift down from the dark. You command the right turret — aim with the mouse, click to fire one round. Your barrel turns slowly and your magazine reloads a round at a time, so every shot has to count. Lead the targets and make them land.
The left turret learns from you: each hit you land becomes a training example, and a small neural net learns to aim — leads and all. It also learns which drone to take next from the cost of the ones that slip through, and it defends the whole skyline. Train it well and watch it hold the city. Tune everything under ⚙ Options.
The left turret can't see the sky the way you do. Instead it watches you play and learns two separate things: how to aim (by copying you) and what to shoot first (from the cost of the drones that get through). Both are small multilayer perceptrons (MLPs) written in plain JavaScript and trained live — no server, no pre-training.
Every time you land a shot, the game stores one training example:
That state → firing-direction pair is a demonstration. Copying an expert's policy from
demonstrations, with no reward function, is behavioural cloning — the simplest form of imitation
learning. Two details make it work well:
The network is an MLP shaped 4 → 32 → 24 → 2, with tanh hidden layers and a linear
output. The task is regression: make its predicted direction match yours, scored by
mean-squared error (MSE). It trains by full-batch gradient descent with momentum (plus a
little weight decay to resist overfitting). The green sparkline in the bottom readout is that MSE
falling. As demos pile up the net generalises — it produces sensible leads for drone states you
never explicitly demonstrated.
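As a rough illustration of the pieces named above, here is a minimal sketch of a 4 → 32 → 24 → 2 forward pass with tanh hidden layers and a linear output, the MSE score, and a single momentum update with weight decay. Every function name, the weight initialisation, and the hyperparameters are assumptions made for the example, not the game's actual code. Backpropagation is omitted, so `momentumStep` simply takes gradients as an input.

```javascript
// Hypothetical helpers: only the layer sizes, tanh hidden units, linear output,
// MSE, momentum and weight decay come from the description above.

// One dense layer with small random weights (initialisation scheme is an assumption).
function makeLayer(nIn, nOut) {
  const scale = 1 / Math.sqrt(nIn);
  const W = Array.from({ length: nOut }, () =>
    Array.from({ length: nIn }, () => (Math.random() * 2 - 1) * scale));
  const b = new Array(nOut).fill(0);
  return { W, b };
}

// Affine map, optionally followed by tanh.
function dense(layer, x, useTanh) {
  return layer.W.map((row, i) => {
    let z = layer.b[i];
    for (let j = 0; j < row.length; j++) z += row[j] * x[j];
    return useTanh ? Math.tanh(z) : z;
  });
}

// 4 inputs -> 32 tanh -> 24 tanh -> 2 linear outputs.
function makeAimNet() {
  return [makeLayer(4, 32), makeLayer(32, 24), makeLayer(24, 2)];
}

function forward(net, x) {
  let h = x;
  net.forEach((layer, k) => {
    h = dense(layer, h, k < net.length - 1); // the last layer stays linear
  });
  return h;
}

// Mean-squared error over a batch of { x, y } demonstrations.
function mse(net, demos) {
  let sum = 0;
  let count = 0;
  for (const { x, y } of demos) {
    const p = forward(net, x);
    for (let i = 0; i < y.length; i++) {
      sum += (p[i] - y[i]) ** 2;
      count++;
    }
  }
  return sum / count;
}

// One momentum update on a flat parameter array, with L2 weight decay
// folded into the gradient. All arrays are updated in place.
function momentumStep(params, grads, velocity, lr, beta, decay) {
  for (let i = 0; i < params.length; i++) {
    const g = grads[i] + decay * params[i];
    velocity[i] = beta * velocity[i] - lr * g;
    params[i] += velocity[i];
  }
}
```

In a full-batch setup, `grads` would be the MSE gradient averaged over every stored demo, recomputed before each call to `momentumStep`.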
The aim always comes from the network, never from a closed-form intercept formula. Before firing, the turret runs an honest self-check: "if I fire where my net is pointing, does the straight bullet pass close to where the drone will be?" If yes, it shoots; if not, it holds fire. This is pure ammo discipline: the check decides whether to fire but never corrects the aim. So an undertrained net genuinely misses (or wisely holds); only real learning makes it connect.
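One way such a self-check could be written is as a closest-approach test between a straight-line bullet and a drone assumed to move at constant velocity. The sketch below is an assumption about the geometry, not the game's actual code: `predictedMiss`, `shouldFire`, the object fields, `tolerance` and `maxTime` are all hypothetical names.

```javascript
// Hypothetical fire gate: takes the net's aim direction as given and only
// answers "fire or hold". It never adjusts the aim.

// Smallest bullet-to-drone distance over 0 <= t <= maxTime, assuming the
// bullet flies straight at bulletSpeed and the drone keeps a constant velocity.
function predictedMiss(turret, aim, bulletSpeed, drone, maxTime) {
  const len = Math.hypot(aim.x, aim.y);
  if (len === 0) return Infinity; // no usable direction
  const ux = aim.x / len;
  const uy = aim.y / len;

  // Drone position and velocity relative to the bullet.
  const rx = drone.x - turret.x;
  const ry = drone.y - turret.y;
  const vx = drone.vx - ux * bulletSpeed;
  const vy = drone.vy - uy * bulletSpeed;

  // Time of closest approach, clamped to the allowed window.
  const vv = vx * vx + vy * vy;
  let t = vv > 0 ? -(rx * vx + ry * vy) / vv : 0;
  t = Math.min(Math.max(t, 0), maxTime);

  return Math.hypot(rx + vx * t, ry + vy * t);
}

// Fire only when the predicted miss is within tolerance.
function shouldFire(turret, aim, bulletSpeed, drone, tolerance, maxTime) {
  return predictedMiss(turret, aim, bulletSpeed, drone, maxTime) <= tolerance;
}
```

For example, with a drone at `{ x: 100, y: 0, vx: 0, vy: 10 }`, a turret at the origin and a bullet speed of 20, aiming straight at the drone's current position fails the check with a tolerance of 5, while a correctly led aim passes.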
With a slow barrel and scarce ammo, target choice decides whether the city survives, so a second net learns it. For each candidate drone it reads five features: time-to-ground (urgency), fall speed, how far the barrel must swing to face it (slew cost), distance, and reachability — the aim net's own predicted miss.
The label comes from the outcome: a drone the turret destroys → 1 (good pick); a drone that slips through and hits the city → 0 (the penalty). The net is trained online, mid-game, to predict the probability that engaging a drone will pay off. In effect it is a small value / credit-assignment model in which the cost of missed drones is the learning signal. Its output is blended with a plain "take the makeable shot" prior, so an undertrained net stays close to the obvious heuristic and should only pull away from it as the outcomes justify.
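The outcome-labelled update and the blend with a prior might look roughly like the sketch below. To stay short it swaps the MLP for a single logistic unit trained by online gradient steps on binary cross-entropy, which is a deliberate simplification and not the game's model. The function names, the `mix` weight, the form of `prior`, and the assumption that features are pre-normalised are all hypothetical.

```javascript
// Hypothetical target selector: a logistic unit standing in for the MLP.
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

function makeSelector(nFeatures) {
  return { w: new Array(nFeatures).fill(0), b: 0 };
}

// Predicted probability that engaging a drone with features f pays off.
function payoff(model, f) {
  let z = model.b;
  for (let i = 0; i < f.length; i++) z += model.w[i] * f[i];
  return sigmoid(z);
}

// One online update. label is 1 (drone destroyed) or 0 (drone hit the city).
// For sigmoid plus cross-entropy, the gradient w.r.t. the logit is (p - label).
function learn(model, f, label, lr) {
  const err = payoff(model, f) - label;
  for (let i = 0; i < f.length; i++) model.w[i] -= lr * err * f[i];
  model.b -= lr * err;
}

// Blend the learned probability with a heuristic prior in [0, 1].
// mix = 0 is pure heuristic, mix = 1 is pure model.
function blendedScore(model, f, prior, mix) {
  return (1 - mix) * prior + mix * payoff(model, f);
}

// Pick the candidate with the highest blended score.
// Each candidate is { features: [...five numbers...], prior: number }.
function pickTarget(model, candidates, mix) {
  let best = null;
  let bestScore = -Infinity;
  for (const c of candidates) {
    const s = blendedScore(model, c.features, c.prior, mix);
    if (s > bestScore) {
      bestScore = s;
      best = c;
    }
  }
  return best;
}
```

Note that a fixed blend like this limits how far the model can move the ranking away from the heuristic; it does not by itself guarantee the blended choice is never worse.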
Concepts in play: behavioural cloning · imitation vs. reinforcement learning · regression with MSE · gradient descent + momentum · weight decay · feature normalisation · learning from successful demonstrations · generalisation · online learning · outcome-driven target selection.