Derivatives

A derivative allows you to calculate the gradient at points along a curved graph.

A derivative is defined as the instantaneous rate of change at a point on a graph. However, this is a confusing, almost meaningless definition. It seems paradoxical for something to have a rate of change in a single instant, as change is, by definition, something that occurs over time.

A better definition of a derivative might be: the rate of change between two points that are so close together that the effect of the distance between them on the calculation can be ignored.

This is often rephrased as a difference between two points that shrinks towards zero.

Visual Approach

This calculation is trivial when your graph is just a straight line. The formula for it is:

$$ slope = \frac{\Delta y}{\Delta x} $$

This is read as "delta y divided by delta x", which simply means the change in y divided by the change in x. The graph below shows the function:

$$ y = 2x $$

You can calculate the gradient of this line by taking two points along it and dividing the difference in their y values by the difference in their x values. For example, for the coordinates (1, 2) and (2, 4) the calculation for the gradient would be:

$$ \frac{4 - 2}{2 - 1} = \frac{2}{1} = 2 $$

So the rate of change of the graph above is 2. This function is a straight line, so its rate of change is constant: no matter which two points you use to run the calculation above, you will get the same result.
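The *change in y over change in x* calculation can be sketched in a few lines of Python. The helper name `slope` is hypothetical, chosen only for illustration; it takes two points as `(x, y)` tuples.

```python
def slope(p1, p2):
    """Hypothetical helper: gradient of the straight line through p1 = (x1, y1) and p2 = (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)  # change in y divided by change in x

# On y = 2x, any two distinct points give the same gradient.
print(slope((1, 2), (2, 4)))
print(slope((-3, -6), (10, 20)))
```

Both calls print 2.0, because the line has the same gradient everywhere.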

But how would you calculate the gradient of a curved graph? The next graph shows the function:

$$ y = x^2 $$

It’s not clear how we can solve the problem of finding the gradient or rate of change of this graph because it is constantly changing. At x = 0 the rate of change will be completely different from x = 10.

We can, however, approximate the rate of change in a local area of the graph by drawing a straight line between two points on the graph and calculating the gradient of that straight line with the simple *change in y over change in x* formula above. The graph below shows a zoomed-in section of the y = x^2 curve with a line that passes through two points on the curve. Calculating the gradient of this line approximates the rate of change in that region.

To get a more accurate rate of change for a specific point on the graph we could shrink the distance between the two intersection points until the distance is almost unnoticeable.

At this point the line we are calculating begins to look like a tangent to the curve at the point where we want to know the rate of change, and in fact, the gradient of this tangent is exactly what the derivative at that point is!

However, there’s a problem. By testing pairs of real points that are closer and closer together we can only ever reach an approximation of the gradient at a specific point, but we would like a pure, beautiful and general way to describe the gradient at an arbitrary point on the graph.
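The shrinking-gap idea can be tried numerically. This is a minimal Python sketch for y = x^2 around x = 3; the helper name `secant_slope` and the particular gap sizes `h` are illustrative assumptions. The printed gradients get closer to 6 as `h` shrinks, but for any nonzero `h` they remain approximations, which is exactly the problem described above.

```python
def f(x):
    return x ** 2

def secant_slope(f, x, h):
    """Hypothetical helper: gradient of the straight line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

# Shrink the gap between the two points around x = 3.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, secant_slope(f, 3.0, h))
```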

Algebraic Approach

You can derive this intuition about the derivative algebraically from the *change in y over change in x* formula by thinking about it in terms of this very small difference between two points, a difference which shrinks towards (but never actually reaches) zero. This small difference is written Δx, and in the limit it is often written dx. For a function:

$$ y = f(x) $$

This means that:

$$ \frac{\Delta y}{\Delta x} = \frac{f(x + \Delta x) - f(x)}{\Delta x} $$

This is because we need to calculate the value of y at both x and x plus that very small difference Δx. The difference between these two y values is Δy.

Let’s apply this to the example graph above, the function:

$$ f(x) = x^2 $$

This means that we substitute x^2 for f(x):

$$ \frac{(x + \Delta x)^2 - x^2}{\Delta x} \\[5pt] = \frac{x^2 + 2x\Delta x + \Delta x^2 - x^2}{\Delta x} \\[5pt] = \frac{2x\Delta x + \Delta x^2}{\Delta x} $$

We then cancel the Δx in the denominator with each of the terms in the numerator to get:

$$ 2x + \Delta x $$

This is the most important part of the intuition: because Δx is so small as to be almost zero, or shrinking towards zero, it can be ignored. We can just remove it from the equation! 🤯

So the derivative of the function f(x) = x^2 is just 2x. At x = 2 the slope of the graph is 4, and at x = 5 the slope of the graph is 10.
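The algebra says the difference quotient for x^2 is exactly 2x + Δx before Δx is dropped. A small Python sketch can check this with exact rational arithmetic from the standard `fractions` module. The helper name `difference_quotient` and the sample values of x and Δx are illustrative choices.

```python
from fractions import Fraction

def difference_quotient(x, dx):
    """Hypothetical helper: (f(x + dx) - f(x)) / dx for f(x) = x**2, computed exactly."""
    return ((x + dx) ** 2 - x ** 2) / dx

for x in [Fraction(2), Fraction(5)]:
    for dx in [Fraction(1, 10), Fraction(1, 1000)]:
        q = difference_quotient(x, dx)
        # The algebra says q is exactly 2x + dx; dropping dx leaves the derivative 2x.
        assert q == 2 * x + dx
        print(x, dx, q, "->", 2 * x)
```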

The derivative at a point gives the slope of the line that is perfectly tangent to the graph at that point. The graph below demonstrates this interactively by dynamically calculating the tangent line for a point on the graph.

Derivative Rules

The derivative is often written with a tick mark ('), called a prime. For example, the derivative of the function f is written f'.

Power Rule

The power rule states that you can calculate the derivative of a function with a power in it by placing the power as a multiplier in front of the algebraic term and subtracting 1 from the power.

$$ (x^n)' = nx^{n-1} $$

Examples:

$$ (x^3)' = 3x^2 \\[5pt] (5x^4)' = 20x^3 $$

This also works for fractional and negative powers:

$$ \left(\frac{1}{x}\right)' = (x^{-1})' = -x^{-2} = \frac{-1}{x^2} $$

And for functions that use some root of x:

$$ (\sqrt{x})' = (x^{\frac{1}{2}})' = \frac{1}{2}x^{-\frac{1}{2}} $$

Example:

$$ (\sqrt[3]{x})' = (x^{\frac{1}{3}})' = \frac{1}{3}x^{-\frac{2}{3}} $$
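The power rule can be written as a tiny transformation on a term `coeff * x**power`. The Python sketch below is illustrative: the helper names are hypothetical and terms are represented as `(coefficient, power)` pairs using `fractions.Fraction`, so fractional powers stay exact. It cross-checks each result at x = 2 against a central-difference approximation.

```python
from fractions import Fraction

def power_rule(coeff, power):
    """Hypothetical helper: derivative of coeff * x**power as a new (coeff, power) pair."""
    return coeff * power, power - 1

def numeric_derivative(func, x, step=1e-6):
    """Central-difference approximation of func'(x), used only as a cross-check."""
    return (func(x + step) - func(x - step)) / (2 * step)

# (coefficient, power) pairs for x^3, 5x^4, 1/x, sqrt(x) and the cube root of x.
examples = [(1, 3), (5, 4), (1, -1), (1, Fraction(1, 2)), (1, Fraction(1, 3))]

x = 2.0
for coeff, power in examples:
    d_coeff, d_power = power_rule(Fraction(coeff), Fraction(power))
    rule_value = float(d_coeff) * x ** float(d_power)
    numeric = numeric_derivative(lambda t: coeff * t ** float(power), x)
    print(f"{coeff}x^({power}) -> {d_coeff}x^({d_power}): {rule_value:.6f} vs {numeric:.6f}")
```

For the cube root this gives the pair `(1/3, -2/3)`, i.e. (1/3)x^(-2/3).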

Why does the power rule work?

Why does this method of manipulating the powers generalise? One way to think about it is based on the rules for expanding expressions that are raised to a power. Take the example of f(x) = x^3. The f(x + Δx) part of the slope formula will then be:

$$ (x + \Delta x)^3 \\[5pt] $$

Which if we expand it will be:

$$ x^3 + 3x^2\Delta x + 3x\Delta x^2 + \Delta x^3 \\[5pt] $$

Δx, as ever, is a small amount, approaching zero, so any term which raises Δx to a power will be even more tiny and insignificant. This means that all terms apart from the 3x^2Δx term can be ignored. When we calculate the slope we divide by Δx, which removes the Δx from the 3x^2Δx term and leaves us with just 3x^2. The x^3 term is also removed, because it cancels when we subtract f(x) = x^3 to get Δy.

This generalises to a function with x raised to the nth power:

$$ f(x) = x^n \\[5pt] slope = \frac{(x + \Delta x)^n - x^n}{\Delta x} $$

If we expand (x + Δx)^n in the numerator of the slope formula, the first term we get is:

$$ x^n $$

We then get n terms, each made by multiplying one Δx term with n - 1 of the x terms:

$$ \Delta x \times x \times x ... \times x = x^{n-1}\Delta x \\[5pt] x \times \Delta x \times x ... \times x = x^{n-1}\Delta x \\[5pt] x \times x \times \Delta x ... \times x = x^{n-1}\Delta x \\[5pt] \vdots\\ x \times x \times x ... \times \Delta x = x^{n-1}\Delta x \\[5pt] $$

Then all those terms in x raised to the n - 1 power are added together. So, at this point the expanded equation will look like:

$$ x^n + nx^{n-1} \Delta x $$

The next terms come from a similar expansion to the second term, this time picking two Δx factors, in a form like:

$$ \Delta x \times \Delta x \times x ... \times x = x^{n-2} \Delta x^2 \\[5pt] x \times \Delta x \times \Delta x ... \times x = x^{n-2} \Delta x^2 \\[5pt] \vdots\\ x \times x \times \Delta x ... \times \Delta x = x^{n-2} \Delta x^2 \\[5pt] $$

This time there is more than one Δx factor in each term, which results in Δx being raised to a power. As we know, raising a tiny amount to a power makes it even more tiny and negligible, and since every term after this will also feature Δx raised to some power, we can ignore all of them. As usual, the x^n cancels when we subtract f(x), the Δx in our second term cancels with the denominator, and we are left with just:

$$ nx^{n-1} $$
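The counting argument above is the binomial expansion: the term with k factors of Δx has coefficient C(n, k). A minimal Python sketch using the standard `math.comb` lists these terms (the helper name `expansion_terms` is hypothetical) and checks that the single-Δx term always has coefficient n, which is the n in nx^(n-1).

```python
from math import comb

def expansion_terms(n):
    """Hypothetical helper: (coefficient, power of x, power of dx) for each term of (x + dx)**n."""
    return [(comb(n, k), n - k, k) for k in range(n + 1)]

# The n = 3 case matches the expansion x^3 + 3x^2 dx + 3x dx^2 + dx^3.
for coeff, x_power, dx_power in expansion_terms(3):
    print(f"{coeff} * x^{x_power} * dx^{dx_power}")

# The term with exactly one dx always has coefficient n, matching n * x**(n-1).
for n in range(1, 8):
    assert expansion_terms(n)[1] == (n, n - 1, 1)
```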

Sum and Difference Rules

The sum rule states that the derivative of a sum of function terms is the sum of the individual derivatives of those terms. The same rule applies to the difference between function terms.

$$ (f + g)' = f' + g' \\[5pt] (f - g)' = f' - g' $$

Examples:

$$ (x^2 + x^3)' = 2x + 3x^2 \\[5pt] (x^5 - x^6)' = 5x^4 - 6x^5 $$
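One way to see the sum and difference rules concretely is to store a polynomial as a list of coefficients, where index k holds the coefficient of x^k. The Python sketch below is illustrative and uses hypothetical helper names. It differentiates term by term with the power rule and checks both examples.

```python
def derivative(coeffs):
    """Hypothetical helper: derivative of [a0, a1, a2, ...], where a_k multiplies x**k.
    Applies the power rule to each term: a_k x^k -> k a_k x^(k-1)."""
    return [k * a for k, a in enumerate(coeffs)][1:]

def add(p, q):
    """Term-by-term sum of two coefficient lists."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def negate(p):
    return [-a for a in p]

x2 = [0, 0, 1]
x3 = [0, 0, 0, 1]
x5 = [0, 0, 0, 0, 0, 1]
x6 = [0, 0, 0, 0, 0, 0, 1]

# Sum rule: (x^2 + x^3)' equals (x^2)' + (x^3)'.
assert derivative(add(x2, x3)) == add(derivative(x2), derivative(x3))
# Difference rule: (x^5 - x^6)' is 5x^4 - 6x^5.
print(derivative(add(x5, negate(x6))))  # [0, 0, 0, 0, 5, -6]
```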

Product Rule

The product rule states that the derivative of two functions multiplied together is the first function multiplied by the derivative of the second function, plus the derivative of the first function multiplied by the second function. This can be remembered with the mnemonic:

right d left plus left d right

The derivative is therefore given by:

$$ (g(x)h(x))' = g(x)\frac{dh}{dx} + h(x)\frac{dg}{dx} $$

Examples:

$$ [(4t^{2} - t)(t^3 - 8t^2 + 12)]' = \\[5pt] (4t^{2} - t)(3t^2 - 16t) + (t^3 - 8t^2 + 12)(8t - 1) = \\[5pt] 20t^4 - 132t^3 + 24t^2 + 96t - 12 $$
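The worked example can be checked in a short Python sketch. It stores polynomials as coefficient lists, with index k holding the coefficient of t^k. The helper names are hypothetical. The sketch compares the product-rule answer with multiplying the polynomials out first and then differentiating.

```python
def derivative(p):
    """Hypothetical helper: power rule applied term by term; p[k] is the coefficient of t**k."""
    return [k * a for k, a in enumerate(p)][1:]

def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def multiply(p, q):
    """Product of two polynomials, multiplying every pair of terms."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

g = [0, -1, 4]          # 4t^2 - t
h = [12, 0, -8, 1]      # t^3 - 8t^2 + 12

# Product rule: (gh)' = g h' + h g'
by_product_rule = add(multiply(g, derivative(h)), multiply(h, derivative(g)))
# Direct route: multiply out first, then differentiate.
direct = derivative(multiply(g, h))

# Coefficients from t^0 upwards: -12 + 96t + 24t^2 - 132t^3 + 20t^4
print(by_product_rule)  # [-12, 96, 24, -132, 20]
assert by_product_rule == direct
```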

This generalises further to expressions with more than two functions multiplied together: take the derivative of each individual function in turn, multiply it by the other functions (left undifferentiated), and add the results together, in the form:

$$ (g(x)h(x)f(x))' = g'(x)h(x)f(x) + g(x)h'(x)f(x) + g(x)h(x)f'(x) $$
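A numerical spot check of the three-function form is a quick way to build confidence in it. In this Python sketch the three factors (sin x, x^2 and e^x) are arbitrary example choices, not taken from the text. The central-difference helper is only an approximation used for comparison.

```python
import math

def numeric_derivative(func, x, step=1e-6):
    """Central-difference approximation of func'(x), used only as a cross-check."""
    return (func(x + step) - func(x - step)) / (2 * step)

# Example factors (arbitrary choices) paired with their known derivatives.
g, dg = math.sin, math.cos
h, dh = (lambda x: x ** 2), (lambda x: 2 * x)
f, df = math.exp, math.exp

def product(x):
    return g(x) * h(x) * f(x)

def product_rule(x):
    # Differentiate one factor at a time, leave the other two alone, and add.
    return dg(x) * h(x) * f(x) + g(x) * dh(x) * f(x) + g(x) * h(x) * df(x)

for x in [0.5, 1.0, 2.0]:
    print(x, product_rule(x), numeric_derivative(product, x))
```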

Visualising the Product Rule

You can visualise the product rule using simple geometric shapes. For example, in the case of two functions being multiplied together, you can imagine the product as the area of a rectangle whose length is given by the first function and whose height is given by the second function. The example below shows a box, the length of which is given by the function sin(x) and the height of which is given by x^2.

┏━━sin(x)━━┓
┏━━━━━━━━━━┓ ┓
┃          ┃ ┃
┃          ┃ x^2
┃          ┃ ┃
┗━━━━━━━━━━┛ ┛

If you were to increase x slightly you would get 3 new pieces of area: A, B and C.

┏━━━━━━━━━━┓ ┏━┓
┃          ┃ ┃ ┃
┃          ┃ ┃A┃
┃          ┃ ┃ ┃
┗━━━━━━━━━━┛ ┗━┛
┏━━━━━━━━━━┓ ┏━┓
┃     B    ┃ ┃C┃
┗━━━━━━━━━━┛ ┗━┛

The areas of these different sections are:

$$ A = \Delta(\sin x)\,x^2 \\[5pt] B = \sin(x)\,\Delta(x^2) \\[5pt] C = \Delta(\sin x)\,\Delta(x^2) $$

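We can ignore C because it is the product of two tiny changes, which makes it negligible next to A and B. That means the whole change of this expression over a tiny Δx is approximately the area of A plus the area of B. Dividing by Δx gives the rate of change of sin(x) times x^2, plus sin(x) times the rate of change of x^2, which is exactly the product rule outlined above. So the solution would be: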

$$ \cos(x)x^2 + \sin(x) \cdot 2x $$
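The area picture can be checked numerically. This Python sketch computes the three extra pieces A, B and C for a small step `dx` at x = 1. The helper name `area_pieces` and the sample values are illustrative. As `dx` shrinks, (A + B)/dx approaches cos(x)x^2 + 2x sin(x) and C/dx heads towards zero.

```python
import math

def area_pieces(x, dx):
    """Hypothetical helper: extra area when the sin(x)-by-x^2 rectangle grows from x to x + dx."""
    d_sin = math.sin(x + dx) - math.sin(x)     # change in the length
    d_sq = (x + dx) ** 2 - x ** 2              # change in the height
    a = d_sin * x ** 2                         # strip A
    b = math.sin(x) * d_sq                     # strip B
    c = d_sin * d_sq                           # corner C
    return a, b, c

x = 1.0
exact = math.cos(x) * x ** 2 + math.sin(x) * 2 * x
for dx in [0.1, 0.01, 0.001]:
    a, b, c = area_pieces(x, dx)
    # (A + B) / dx approaches the product-rule answer; C / dx shrinks towards zero.
    print(dx, (a + b) / dx, c / dx, exact)
```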

This geometric intuition can be extended to three dimensions to illustrate what happens with three functions multiplied together. In that case the three functions give the side lengths of a cuboid, and the change in its volume is dominated by three thin cuboids, one for each term of the derivative.

Chain Rule

The chain rule is used for taking the derivative of functions inside other functions, i.e. derivatives of function compositions. For example, the function:

$$ sin(x^2) $$

Is a composition of two functions (let's call them g and h):

$$ g(x) = sin(x) \\[5pt] h(x) = x^2 $$

With the function h plugged into the x input on function g.

The chain rule states that the derivative of two composed functions is the derivative of the outer function multiplied by the derivative of the inner function. So in the example above this would mean that:

$$ (\sin(x^2))' = \\[5pt] \cos(x^2) \cdot 2x $$

More generally:

$$ (g(h(x)))' = \\[5pt] g'(h(x)) \cdot h'(x) $$

Intuitively this makes sense. A small change in x changes the inner function h by roughly h'(x) times that change, and the outer function g responds to a change in its input at the rate g'(h(x)). This is why the outer derivative is evaluated at the unchanged input h(x): it measures how sensitive g is at the value it is actually being fed. Multiplying by the inner function's derivative then scales that sensitivity by how quickly the input itself is changing.
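A minimal Python check of the chain rule for sin(x^2) compares g'(h(x)) · h'(x) with a central-difference approximation. The helper names and sample x values are illustrative choices.

```python
import math

def numeric_derivative(func, x, step=1e-6):
    """Central-difference approximation of func'(x), used only as a cross-check."""
    return (func(x + step) - func(x - step)) / (2 * step)

def composed(x):
    return math.sin(x ** 2)          # g(h(x)) with g = sin and h(x) = x^2

def chain_rule(x):
    inner = x ** 2                   # h(x): the value fed into the outer function
    return math.cos(inner) * 2 * x   # g'(h(x)) * h'(x)

for x in [0.3, 1.0, 1.7]:
    print(x, chain_rule(x), numeric_derivative(composed, x))
```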

Exponent Derivatives

The derivative of an exponential function is the exponential term itself, multiplied by some constant specific to its base. This means that the derivative of an exponential is proportional to itself.

Is there a base $ b $ where $ (b^{x})' = b^{x}\times{1} = b^{x} $?

In fact there is: it is the special number $ e $. The derivative of $ e^{x} $ is $ e^{x} $ itself.

You can apply the chain rule to derivatives involving $ e $. For $ e^{3x} $ we take the derivative of the outer function (which is itself, $ e^{3x} $) and multiply it by the derivative of the inner function $ 3x $, which is $ 3 $. So the derivative of $ e^{3x} $ is $ 3e^{3x} $.

You can write any exponential function in terms of $ e $ by rewriting its base as a power of $ e $. For example, $ 2 = e^{\ln 2} $, so $ 2^{x} $ is the same as $ e^{x\ln 2} $. To take the derivative of this we can use the chain rule again: the derivative of the outer function is $ e^{x\ln 2} $, i.e. itself, multiplied by the derivative of the inner function, which by the standard power rule is $ (x\ln 2)' = \ln 2 $, because $ \ln 2 $ is just a constant: the power that $ e $ must be raised to in order to equal 2. So $ (2^{x})' = \ln 2 \cdot e^{x\ln 2} = \ln 2 \cdot 2^{x} $.
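A short numerical sketch in Python illustrates these claims. The hypothetical helper `growth_constant` estimates the constant k in (b^x)' = k · b^x from the slope of b^x at x = 0. For b = e the estimate comes out close to 1, and in general it is close to ln(b). The last two lines check the e^{3x} and 2^x results at an arbitrary sample point, x = 0.8.

```python
import math

def numeric_derivative(func, x, step=1e-6):
    """Central-difference approximation of func'(x), used only as a cross-check."""
    return (func(x + step) - func(x - step)) / (2 * step)

def growth_constant(b, step=1e-7):
    """Hypothetical helper: approximate k in (b**x)' = k * b**x via the slope of b**x at x = 0."""
    return (b ** step - 1) / step

# k is below 1 for b = 2, about 1 for b = e, and above 1 for b = 3; in general k = ln(b).
for b in [2.0, math.e, 3.0]:
    print(b, growth_constant(b), math.log(b))

x = 0.8
# e^{3x}: derivative of the outer function (itself) times the inner derivative, 3.
print(3 * math.exp(3 * x), numeric_derivative(lambda t: math.exp(3 * t), x))
# 2^x = e^{x ln 2}: derivative of the outer function (itself) times ln 2.
print(math.log(2) * 2 ** x, numeric_derivative(lambda t: 2 ** t, x))
```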

Notation

There are several notation considerations with the different ways of describing derivatives.

The derivative of a function can be written in several different ways:

\[ f' = \frac{df}{dx} = \frac{dy}{dx} = \frac{d}{dx}f = \frac{d}{dx}y \]

There is also a distinction that is made between the average and instantaneous rates of change.

The average rate of change over a period is often written as:

\[ \frac{\Delta y}{\Delta x} \]

Whereas the instantaneous rate of change (the rate given by the tangent at a single point on the graph) is written as:

\[ \frac{dy}{dx} \]