Introduction to Automatic Differentiation
Automatic differentiation (AD) is the backbone of modern deep learning. Unlike symbolic differentiation (which manipulates expressions) or numerical differentiation (which approximates via finite differences), AD computes exact derivatives by decomposing programs into elementary operations and applying the chain rule systematically.
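To see why finite differences only approximate, here is a quick sketch; the function `f`, the point `2`, and the step size `h` are my own illustrative choices:

```typescript
// Numerical differentiation approximates f'(x) with a finite step h.
const f = (x: number) => x * x * x; // f(x) = x^3, so f'(2) = 12 exactly
const h = 1e-5;

// Central difference: (f(x + h) - f(x - h)) / 2h
const approx = (f(2 + h) - f(2 - h)) / (2 * h);

console.log(approx);                 // ≈ 12, but not exact
console.log(Math.abs(approx - 12)); // small but nonzero: truncation + round-off error
```

Shrinking `h` reduces truncation error but amplifies floating-point round-off; AD avoids this trade-off entirely by computing exact derivatives of each elementary operation.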
The Chain Rule
Given a composition of functions $y = f(g(h(x)))$, the chain rule tells us:

$$\frac{dy}{dx} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial h} \cdot \frac{\partial h}{\partial x}$$

The order in which we multiply these Jacobians gives rise to two modes of AD.
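As a concrete instance (my own example, not from the text above), take $y = \sin(x^2)$, i.e. $f = \sin$ and $g(x) = x^2$:

$$\frac{dy}{dx} = f'(g(x)) \cdot g'(x) = \cos(x^2) \cdot 2x$$

Forward mode evaluates this product left-to-right alongside the computation; reverse mode evaluates it right-to-left after the fact.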
Forward Mode
Forward mode propagates derivatives alongside the computation, from inputs to outputs. For each input variable $x_i$, we track a dual number $(v, \dot{v})$ where $\dot{v} = \frac{\partial v}{\partial x_i}$.
For a function $f: \mathbb{R}^n \to \mathbb{R}^m$, forward mode computes one column of the Jacobian per pass — efficient when $n \ll m$.
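A minimal forward-mode sketch in TypeScript (a hypothetical `Dual` class, not part of TSTorch's API): each value carries its derivative with respect to one seeded input, and each operation applies the matching derivative rule.

```typescript
// A dual number: a value and its derivative w.r.t. one chosen input.
class Dual {
  constructor(public val: number, public dot: number) {}

  add(other: Dual): Dual {
    // Sum rule: (u + v)' = u' + v'
    return new Dual(this.val + other.val, this.dot + other.dot);
  }

  mul(other: Dual): Dual {
    // Product rule: (u·v)' = u'·v + u·v'
    return new Dual(
      this.val * other.val,
      this.dot * other.val + this.val * other.dot
    );
  }
}

// Differentiate f(x) = x·x + x at x = 3.
// Seed the input with dot = 1, since dx/dx = 1.
const x = new Dual(3, 1);
const y = x.mul(x).add(x);
console.log(y.val); // 12  (3·3 + 3)
console.log(y.dot); // 7   (f'(x) = 2x + 1 = 7)
```

One pass computes the derivative with respect to the single seeded input; differentiating with respect to $n$ inputs takes $n$ passes, which is why this mode suits few-input functions.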
Reverse Mode
Reverse mode — what frameworks like PyTorch call “backpropagation” — propagates derivatives from outputs to inputs. It first performs a forward pass to build a computation graph, then walks it backward:

$$\bar{v}_i = \sum_{j \in \operatorname{succ}(i)} \bar{v}_j \, \frac{\partial v_j}{\partial v_i}$$

where $\bar{v} = \frac{\partial y}{\partial v}$ is the adjoint. This computes one row of the Jacobian per pass — efficient when $m \ll n$, which is exactly the case in ML (scalar loss, many parameters).
A Minimal Value Class
Here’s the core of a reverse-mode AD engine in TypeScript:
```typescript
class Value {
  data: number;
  grad: number = 0;
  // Backward step for this node; set by the operation that produced it.
  private backward: () => void = () => {};
  private children: Value[];

  constructor(data: number, children: Value[] = []) {
    this.data = data;
    this.children = children;
  }

  add(other: Value): Value {
    const out = new Value(this.data + other.data, [this, other]);
    out.backward = () => {
      // ∂(a + b)/∂a = ∂(a + b)/∂b = 1, so the upstream grad passes through.
      this.grad += out.grad;
      other.grad += out.grad;
    };
    return out;
  }

  mul(other: Value): Value {
    const out = new Value(this.data * other.data, [this, other]);
    out.backward = () => {
      // ∂(a·b)/∂a = b and ∂(a·b)/∂b = a.
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }

  backprop(): void {
    // Topologically sort the graph so each node's backward runs only after
    // every node that consumes it has already contributed to its grad.
    const topo: Value[] = [];
    const visited = new Set<Value>();
    const build = (v: Value) => {
      if (!visited.has(v)) {
        visited.add(v);
        v.children.forEach(build);
        topo.push(v);
      }
    };
    build(this);
    this.grad = 1; // seed: ∂y/∂y = 1
    for (const v of topo.reverse()) {
      v.backward();
    }
  }
}
```
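Here is the class driven end to end on $y = w \cdot x + b$; the class body is repeated in condensed form so the snippet runs standalone, and the input values are arbitrary choices of mine:

```typescript
// Condensed copy of the Value class above, so this snippet is self-contained.
class Value {
  grad = 0;
  private backward: () => void = () => {};
  constructor(public data: number, private children: Value[] = []) {}
  add(o: Value): Value {
    const out = new Value(this.data + o.data, [this, o]);
    out.backward = () => { this.grad += out.grad; o.grad += out.grad; };
    return out;
  }
  mul(o: Value): Value {
    const out = new Value(this.data * o.data, [this, o]);
    out.backward = () => { this.grad += o.data * out.grad; o.grad += this.data * out.grad; };
    return out;
  }
  backprop(): void {
    const topo: Value[] = [];
    const seen = new Set<Value>();
    const build = (v: Value) => {
      if (seen.has(v)) return;
      seen.add(v);
      v.children.forEach(build);
      topo.push(v);
    };
    build(this);
    this.grad = 1;
    for (const v of topo.reverse()) v.backward();
  }
}

// y = w·x + b with example values w = 2, x = 3, b = 1.
const w = new Value(2);
const x = new Value(3);
const b = new Value(1);
const y = w.mul(x).add(b);
y.backprop();

console.log(y.data); // 7
console.log(w.grad); // 3  (∂y/∂w = x)
console.log(x.grad); // 2  (∂y/∂x = w)
console.log(b.grad); // 1  (∂y/∂b = 1)
```

Note that gradients accumulate with `+=`; a second call to `backprop` without zeroing `grad` first would double-count, which is why frameworks like PyTorch expose an explicit `zero_grad` step.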
Interactive Computation Graph
Try adjusting the inputs below to see how gradients flow through a simple computation $y = w \cdot x + b$:
Computation graph: y = w·x + b. Toggle gradients to see ∂y/∂(each variable).
Key Takeaways
- Forward mode is $n$ passes for $n$ inputs — good for few inputs, many outputs
- Reverse mode is $m$ passes for $m$ outputs — ideal for ML’s scalar-loss setup
- The computation graph records operations during the forward pass, then gradients propagate backward through the graph
- TSTorch implements reverse-mode AD, just like PyTorch, but targeting TypeScript and WebGPU