__COVARIANCE AND CONTRAVARIANCE__

When studying tensor calculus, the distinction between *covariance* and *contravariance* can seem obscure, and it is rarely explained
visually. A geometric explanation is exhibited here.

First we will explain the distinction between the covariant and contravariant components of vectors, thinking of vector fields where a
vector is defined at a point rather than as a position vector. This extends naturally to the components of higher-order tensors.
Strictly speaking, despite usage to the contrary, there is no such thing as a “covariant vector” or a “contravariant vector”. A vector
is a vector is a vector. However it may be handled in two ways. The first is by means of its components parallel to the coordinate
directions, which form a parallelogram in the two-dimensional case, in the same way that **dx** and **dy** are defined as the sides of the
parallelogram related to an infinitesimal displacement **ds**. These components are referred to as its *contravariant components*.
The second is by means of its resolved parts along the coordinate directions, which are its *covariant components*.
The latter are the inner products of the vector with the coordinate unit vectors. The distinction is important e.g. when finding inner
products such as **F.s** for the work done by a force **F** producing a displacement **s**. We will follow that up later.
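As a concrete numerical sketch (not part of the original argument — the axis angle and the sample vector are arbitrary illustrative choices), the two kinds of components can be computed for a vector referred to non-orthogonal unit axes:

```python
# Contravariant (parallelogram) vs covariant (resolved) components of a
# plane vector relative to non-orthogonal unit axes.  The axis angle phi
# and the sample vector are illustrative choices, not from the text.
import numpy as np

phi = np.radians(60.0)                     # angle between axes OX and OY
e1 = np.array([1.0, 0.0])                  # unit vector along OX
e2 = np.array([np.cos(phi), np.sin(phi)])  # unit vector along OY
E = np.column_stack([e1, e2])

v = np.array([2.0, 1.0])                   # the vector in orthonormal coords

# Contravariant components: the unique a, b with v = a*e1 + b*e2.
contra = np.linalg.solve(E, v)

# Covariant components: inner products with the unit axis vectors.
cov = np.array([v @ e1, v @ e2])

print(contra, cov)     # the two sets differ unless the axes are orthogonal
```

The two sets coincide only when the axes are orthogonal, which is why the distinction never arises in ordinary rectangular coordinates.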

We will work with vectors in two dimensions to illustrate the principles involved. We will use non-orthogonal (oblique) Cartesian
coordinates, i.e. coordinates defined relative to non-orthogonal axes. However tensors are especially concerned with the use of
*curvilinear coordinates*, where vectors and tensors are referred to curved coordinate lines which approach linearity at infinitesimal
distances. In such cases the coordinate axes used below should be regarded as the tangents to such coordinate lines, and vectors as
directed magnitudes at an origin O which is a local point in a field. The coordinate directions thus vary as O is varied. This covers
cases where both coordinates are of the same type (plane polar coordinates are an example where they are not).

Figure 1

__Contravariant Components__

The components of a vector in two dimensions are defined in the literature in relation to a change of coordinates from (x,y) to (x',y'), say.
The contravariant components are those which transform as follows e.g. for the new coordinate x' in terms of the old (x,y):

x' = (∂x'/∂x)x + (∂x'/∂y)y    (1)

and similarly for y'. This is far from obvious at first sight, so we will show how the partial derivatives relate to the geometry.

This is how the coordinates themselves are transformed, and oddly enough vectors defined in this way are referred to as contravariant,
which at first sight seems rather perverse. However the comments about inner products below may shed light
on this oddity.

Figure 2

The vector at O is represented by OV and its parallelogram component on the axis OX is OA, where VA is parallel to the axis OY; we will only illustrate the situation for the x-components. If we change coordinates to OX', OY' (α being the angle from OX to OX', β the angle from OY to OY', and φ the angle from OX to OY) then the new x-component is OA', where VA' is parallel to OY'. Now we join A to P on OX' such that AP is parallel to OY'. Using the sine rule we get

x' = (x sin μ + y sin β)/sin γ    (2)

where γ = φ + β − α and μ = 180° − φ − β.

Noting that OA' = x', OA = x and AV = y, partial differentiation of this with respect to x gives

∂x'/∂x = sin μ/sin γ

from triangle OAP, holding y constant, and

∂x'/∂y = sin β/sin γ

from triangle VQA', holding x constant, giving from (2)

x' = (∂x'/∂x)x + (∂x'/∂y)y

as required. A similar argument holds for the new y coordinate. The generalised version of (1) for more than two dimensions, using overlines instead of primes, is

x̄^j = Σ_k (∂x̄^j/∂x^k) x^k

or, using the repeated-index summing convention for k,

x̄^j = (∂x̄^j/∂x^k) x^k    (3)

For the contravariant components it is customary to use superscripts for the indices such as j and k.

Thus our previous x and y become x^{1} and x^{2}, and x' and y' become x̄^{1} and x̄^{2}.
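The transformation law (3) can be checked numerically. A minimal sketch, with two arbitrary sets of oblique unit axes (for a linear change of coordinates the partials ∂x̄^j/∂x^k form a constant matrix):

```python
# Numerical check of the contravariant law: new components = (matrix of
# partials) @ (old components).  The two sets of oblique unit axes are
# arbitrary illustrative choices.
import numpy as np

def axes(a1, a2):
    # columns are unit vectors along axes at angles a1, a2 (radians)
    return np.column_stack([[np.cos(a1), np.sin(a1)],
                            [np.cos(a2), np.sin(a2)]])

E_old = axes(0.0, np.radians(60.0))               # axes OX, OY
E_new = axes(np.radians(20.0), np.radians(95.0))  # axes OX', OY'

v = np.array([2.0, 1.0])
c_old = np.linalg.solve(E_old, v)          # parallelogram components x^k
c_new = np.linalg.solve(E_new, v)          # parallelogram components in OX', OY'

# For this linear change the matrix of partials is the constant matrix:
J = np.linalg.inv(E_new) @ E_old
print(np.allclose(c_new, J @ c_old))       # True
```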

Useful expressions for the contravariant coordinates of OV, of magnitude v at angle θ to OX, are, using the sine rule,

x = v sin(φ − θ)/sin φ,   y = v sin θ/sin φ    (4)
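These sine-rule expressions can be checked against a direct decomposition along the axes; a sketch with arbitrary illustrative values:

```python
# Check of the sine-rule expressions x = v sin(phi - theta)/sin(phi),
# y = v sin(theta)/sin(phi) for the parallelogram components of OV.
# Magnitude and angles are arbitrary illustrative values.
import numpy as np

v, theta, phi = 2.5, np.radians(25.0), np.radians(70.0)

# Sine-rule formulas:
x = v * np.sin(phi - theta) / np.sin(phi)
y = v * np.sin(theta) / np.sin(phi)

# Direct decomposition along unit axes at angles 0 and phi:
E = np.column_stack([[1.0, 0.0], [np.cos(phi), np.sin(phi)]])
OV = v * np.array([np.cos(theta), np.sin(theta)])
print(np.allclose([x, y], np.linalg.solve(E, OV)))   # True
```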

__Covariant Components__

The covariant components of a vector are defined by the transformation

x̄_j = (∂x^k/∂x̄^j) x_k    (5)

using subscripts for the indices in the covariant case. For the x-component in two dimensions this is

x' = (∂x/∂x')x + (∂y/∂x')y    (6)

where the partial derivatives are "inverted" compared with the contravariant case.
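This "inverted" law can also be checked numerically; a sketch, with the same kind of arbitrary oblique axes as before:

```python
# Numerical check of the covariant law: the resolved parts transform with
# the "inverted" partials, i.e. with the transpose of the inverse of the
# contravariant matrix of partials.  Axes are arbitrary choices.
import numpy as np

def axes(a1, a2):
    return np.column_stack([[np.cos(a1), np.sin(a1)],
                            [np.cos(a2), np.sin(a2)]])

E_old = axes(0.0, np.radians(60.0))
E_new = axes(np.radians(20.0), np.radians(95.0))

v = np.array([2.0, 1.0])
cov_old = E_old.T @ v                      # resolved parts on OX, OY
cov_new = E_new.T @ v                      # resolved parts on OX', OY'

# Entry (k, j) of J_inv is the partial of old coordinate k with respect
# to new coordinate j -- the inverse of the contravariant matrix:
J_inv = np.linalg.inv(E_old) @ E_new
print(np.allclose(cov_new, J_inv.T @ cov_old))   # True
```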

We start by assuming we know x, y, α and φ, i.e. we know the initial coordinates of the vector rather than its magnitude OV = v or
its angle θ to OX. OA = x and OB = y (Figure 3):

Figure 3

then

x = v cos θ,   y = v cos(φ − θ)    (7)

Solving for θ gives

tan θ = (y − x cos φ)/(x sin φ)    (8)

Now the new covariant x-component, the resolved part of OV along OX', is

x' = v cos(θ − α) = v cos θ cos α + v sin θ sin α

which by (7) is

x' = x cos α + x tan θ sin α

which by (8) is

x' = (x sin(φ − α) + y sin α)/sin φ    (9)
We now encounter a subtlety of the meaning of the "inverted" partial derivatives, for they refer to the coordinates which are contravariant, so we must relate this back to them as follows:

Figure 4

If OX' = δx', OX = δx and OY = δy then using the sine rule in the infinitesimal case we get

δx = δx' sin(φ − α)/sin φ,   δy = δx' sin α/sin φ

so that ∂x/∂x' = sin(φ − α)/sin φ and ∂y/∂x' = sin α/sin φ, showing that (9) is the same as (6), as required. For more than two dimensions the principle is the same but OV is no longer necessarily
in a coordinate plane.

We have thus exhibited how the geometrical interpretation of covariance and contravariance relates to the formal definitions when
the components are of the same type.

__Inner Product__

The distinction between contravariance and covariance is important e.g. when finding inner products such as **F.s** for the work
**W** done by a force **F** producing a displacement **s**. We take the inner product of the two vectors which usually means
resolving **F** along the direction of **s**. The actual evaluation of **W** amounts to summing the products of the
coordinate-system-components of **s** by the resolved parts of **F**. That is, we sum the products of the contravariant components
of **s** and the covariant components of **F** as for an inner vector product. To use instead the contravariant components
of **F** (which are perfectly respectable quantities) would obviously give the wrong result for **W**. However, we may instead
use the covariant components of **s** multiplied by the contravariant ones of **F** and get the correct result, but it seems an
unnatural way to handle the problem. It is more natural to handle **F** by means of its covariant components, which is perhaps why
the loose description of a force as a “covariant vector” has crept in. Similarly **s** is most naturally handled by means of its
parallelogram components.
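A numerical sketch of this pairing rule (the oblique axes, force and displacement are arbitrary illustrative values, not from the text):

```python
# Work done W = F.s evaluated three ways in oblique axes: covariant F
# with contravariant s gives the right answer, as does the opposite
# mixed pairing; contravariant with contravariant does not.
import numpy as np

phi = np.radians(60.0)
E = np.column_stack([[1.0, 0.0], [np.cos(phi), np.sin(phi)]])

F = np.array([3.0, 1.0])                   # force, orthonormal coords
s = np.array([1.0, 2.0])                   # displacement
W = F @ s                                  # the true work done

F_cov, s_cov = E.T @ F, E.T @ s                              # resolved parts
F_con, s_con = np.linalg.solve(E, F), np.linalg.solve(E, s)  # parallelogram

print(np.isclose(F_cov @ s_con, W))        # True
print(np.isclose(F_con @ s_cov, W))        # True
print(np.isclose(F_con @ s_con, W))        # False for oblique axes
```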

We will now show how this works explicitly. Applying (4) to the vector **s** represented by OV of length s as in Figure 2, but at
an angle ψ to OX, gives

s^1 = s sin(φ − ψ)/sin φ,   s^2 = s sin ψ/sin φ

The covariant components of **F**, represented by OV of magnitude F at angle θ to OX as in Figure 3, are by (7):

F_1 = F cos θ,   F_2 = F cos(φ − θ)

and combining the two gives the inner product in tensor form:

W = F_j s^j = Fs [cos θ sin(φ − ψ) + cos(φ − θ) sin ψ]/sin φ = Fs cos(θ − ψ)

which is the standard expression for the inner product.

If we change the coordinate system then the covariant components of **F** will change such that the above inner product remains
invariant (and valid!). This may explain the use of covariant for such components.
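This invariance can be seen numerically; a sketch with two arbitrary sets of oblique axes and arbitrary vectors:

```python
# The inner product is invariant under a change of oblique axes: the
# covariant components of F and the contravariant components of s both
# change, but their contraction does not.
import numpy as np

def axes(a1, a2):
    return np.column_stack([[np.cos(a1), np.sin(a1)],
                            [np.cos(a2), np.sin(a2)]])

F = np.array([3.0, 1.0])
s = np.array([1.0, 2.0])

for E in (axes(0.0, np.radians(60.0)),
          axes(np.radians(20.0), np.radians(95.0))):
    F_cov = E.T @ F                        # covariant components of F
    s_con = np.linalg.solve(E, s)          # contravariant components of s
    print(F_cov @ s_con)                   # same value in every system
```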

Generally a tensor is characterised by a set of functions defining how its components vary with the coordinates. A set of functions
comprises a tensor if the components satisfy (3) or (5). Another test is to multiply a set of functions by a tensor: if the result
is a tensor then so are those functions. To find out whether the functions are the simplest possible for a tensor is more difficult,
remembering that the tensor is an entity that is described by the functions, just as a velocity is an independent physical entity
that may be described in various ways. Such an entity exists independently of the coordinates used to describe it, since any equations
involving it will, in view of (3) and (5), be the same in any coordinate system, e.g. work done expressed by an inner product.
However the functions may prove to be simpler in one coordinate system than another, e.g. a radial electric field is better described
in polar coordinates than in Cartesian.

In three or more dimensions, the resolved parts are obtained by projecting a vector onto the __axes__, not onto the coordinate planes.

Higher order tensors are in principle handled similarly, but they may be expressed with mixed coordinate types i.e. some indices
covariant and some contravariant. The metric tensor g_{ij} is most easily understood when represented by a square matrix.
Both of its indices are of the same (covariant) type, and the square of the infinitesimal distance between two neighbouring points is
ds^2 = g_{ij}dx^{i}dx^{j}. The repeated suffix convention then implies that g_{ij} is summed with dx^{i}
and independently with dx^{j}. For two dimensions *i* and *j* vary from 1 to 2, so that g_{ij} is a
2×2 matrix, whereas for three dimensions they vary from 1 to 3 and the matrix is 3×3. This illustrates the
power of tensor notation, where the same equation applies for any number of dimensions above 1, but of course the number of expressions
for the terms of g_{ij} increases with increasing dimensionality.
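For oblique unit axes at angle φ (an illustrative choice) the metric is g = [[1, cos φ], [cos φ, 1]], and contracting it with the contravariant components of a small displacement reproduces the Euclidean squared length; a sketch:

```python
# The metric tensor for oblique unit axes is g_ij = e_i . e_j; contracting
# it with the contravariant components of a displacement reproduces the
# Euclidean ds^2.  The angle and displacement are illustrative choices.
import numpy as np

phi = np.radians(60.0)
E = np.column_stack([[1.0, 0.0], [np.cos(phi), np.sin(phi)]])
g = E.T @ E                                # g = [[1, cos phi], [cos phi, 1]]

dv = np.array([0.3, 0.4])                  # displacement, orthonormal coords
dx = np.linalg.solve(E, dv)                # contravariant components dx^i

ds2 = dx @ g @ dx                          # g_ij dx^i dx^j, summed over i, j
print(np.isclose(ds2, dv @ dv))            # True
```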