It's really easy to apply once you understand it, and the formula is easy to remember if you know where it comes from.
Basically, the aim is to find the gradient of a curve at a point.
A good approximation of the gradient is to draw a secant (a straight line through two points on the curve) and find the gradient of that straight line. As the two points get closer and closer together, the gradient of the secant gets closer to the actual gradient of the curve at that point.
So say we have y = f(x). The usual straight-line formula for the gradient is (y<sub>2</sub> - y<sub>1</sub>) / (x<sub>2</sub> - x<sub>1</sub>)
If we replace y with f(x), we get:
[f(x<sub>2</sub>) - f(x<sub>1</sub>)] / [x<sub>2</sub> - x<sub>1</sub>]
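To make the secant formula concrete, here is a minimal Python sketch. The names `secant_gradient` and `square` are made up for illustration, and y = x² is just an example curve:

```python
def secant_gradient(f, x1, x2):
    """Gradient of the straight line through (x1, f(x1)) and (x2, f(x2))."""
    if x1 == x2:
        # The two points must differ, otherwise we would divide by zero.
        raise ValueError("x1 and x2 must be different points")
    return (f(x2) - f(x1)) / (x2 - x1)


def square(x):
    # Example curve y = x^2 (an assumption, chosen only for illustration).
    return x * x


print(secant_gradient(square, 1.0, 3.0))  # (9 - 1) / (3 - 1) = 4.0
print(secant_gradient(square, 1.0, 1.5))  # (2.25 - 1) / (1.5 - 1) = 2.5
```

Moving the second point closer to x<sub>1</sub> = 1 changes the secant gradient from 4.0 to 2.5.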
If x<sub>2</sub> and x<sub>1</sub> are really close to each other, we can call the small gap between them Δx (more often written as h), so Δx = x<sub>2</sub> - x<sub>1</sub>.
This means x<sub>2</sub> = x<sub>1</sub> + Δx
so the approximation is [f(x<sub>1</sub> + Δx) - f(x<sub>1</sub>)] / Δx
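As a rough numerical check of this approximation, the sketch below evaluates it for smaller and smaller Δx. The function names are hypothetical, and y = x² at x = 3 is only an example:

```python
def difference_quotient(f, x, dx):
    """[f(x + dx) - f(x)] / dx: the secant gradient written in terms of dx."""
    return (f(x + dx) - f(x)) / dx


def square(x):
    # Example curve y = x^2 (an assumption, chosen only for illustration).
    return x * x


# Shrink dx and watch the approximate gradient at x = 3 settle down.
for dx in (1.0, 0.1, 0.01, 0.001):
    print(dx, difference_quotient(square, 3.0, dx))
```

For this curve the quotient works out algebraically to 6 + Δx, so the printed values should head towards 6 as Δx shrinks (give or take a little floating-point rounding).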
As x<sub>2</sub> and x<sub>1</sub> get really close to each other (almost equal but never exactly equal), Δx approaches zero. Hence the first principles formula (generalising x<sub>1</sub> to just the variable x) is:
lim<sub>Δx --> 0</sub> [f(x + Δx) - f(x)] / Δx
Basically it's just:
lim<sub>'a little bit' --> 0</sub> [f(x + 'a little bit') - f(x)] / 'a little bit'
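As a worked example of applying the formula, here it is for f(x) = x², written out in LaTeX. The choice of x² is just an illustration:

```latex
\begin{align*}
\text{gradient of } x^2
  &= \lim_{\Delta x \to 0} \frac{(x + \Delta x)^2 - x^2}{\Delta x} \\
  &= \lim_{\Delta x \to 0} \frac{x^2 + 2x\,\Delta x + (\Delta x)^2 - x^2}{\Delta x} \\
  &= \lim_{\Delta x \to 0} \frac{2x\,\Delta x + (\Delta x)^2}{\Delta x} \\
  &= \lim_{\Delta x \to 0} \left( 2x + \Delta x \right) \\
  &= 2x
\end{align*}
```

Cancelling Δx in the fourth line is allowed because Δx gets close to zero but never actually equals zero. Once it has been cancelled, letting 'a little bit' go to zero just leaves 2x.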