A metric or distance function is a function that maps a pair of elements of a set to a non-negative real number, their distance, where a distance of zero means the two elements are equivalent under that metric.
There are multiple ways to define a metric on a set. A typical distance for real numbers is the absolute difference,
$ d : (x, y) \mapsto |x-y| $. But a scaled version of the absolute difference, or even $d(x, y) = \begin{cases} 0 &\mbox{if } x = y \\ 1 & \mbox{if } x \ne y \end{cases}$,
is a valid metric as well. Every normed vector space induces a distance given by $d : (x, y) \mapsto \|x-y\|$.
Math.NET Numerics provides the following distance functions on vectors and arrays:
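All of the snippets below operate on two equally sized sample vectors. A minimal setup with hypothetical sample values, assuming plain double arrays, could look like this:

[lang=csharp]
using MathNet.Numerics;

// Hypothetical sample data used by the snippets below.
double[] x = { 1.0, 2.0, 3.0 };
double[] y = { 2.0, 2.0, 5.0 };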
The sum of absolute difference is equivalent to the $L_1$-norm of the difference, also known as the Manhattan- or Taxicab-norm. The absolute value
function makes this metric a bit complicated to deal with analytically, but it is more robust than SSD; see the comparison after the SSD snippet below.
$$$ d_{\mathbf{SAD}} : (x, y) \mapsto \|x-y\|_1 = \sum_{i=1}^{n} |x_i-y_i|
[lang=csharp]
double d = Distance.SAD(x, y);
The sum of squared difference is equivalent to the squared $L_2$-norm, also known as the squared Euclidean distance. The absence of the absolute value
function makes this metric convenient to deal with analytically, but the squares cause it to be very
sensitive to large outliers.
$$$ d_{\mathbf{SSD}} : (x, y) \mapsto \|x-y\|_2^2 = \langle x-y, x-y\rangle = \sum_{i=1}^{n} (x_i-y_i)^2
[lang=csharp]
double d = Distance.SSD(x, y);
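To illustrate the robustness claim, a quick comparison with hypothetical values where one entry is a large outlier:

[lang=csharp]
// A single large outlier dominates SSD far more than SAD:
double sad = Distance.SAD(new[] { 0.0, 0.0 }, new[] { 1.0, 10.0 }); // 1 + 10  = 11
double ssd = Distance.SSD(new[] { 0.0, 0.0 }, new[] { 1.0, 10.0 }); // 1 + 100 = 101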
The mean absolute error is a normalized version of the sum of absolute difference.
$$$ d_{\mathbf{MAE}} : (x, y) \mapsto \frac{d_{\mathbf{SAD}}}{n} = \frac{\|x-y\|_1}{n} = \frac{1}{n}\sum_{i=1}^{n} |x_i-y_i|
[lang=csharp]
double d = Distance.MAE(x, y);
The mean squared error is a normalized version of the sum of squared difference.
$$$ d_{\mathbf{MSE}} : (x, y) \mapsto \frac{d_{\mathbf{SSD}}}{n} = \frac{\|x-y\|_2^2}{n} = \frac{1}{n}\sum_{i=1}^{n} (x_i-y_i)^2
[lang=csharp]
double d = Distance.MSE(x, y);
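As the formulas show, both metrics are simply their un-normalized counterparts divided by the number of entries, so with the sample arrays from the setup above:

[lang=csharp]
double mae = Distance.SAD(x, y) / x.Length; // same value as Distance.MAE(x, y)
double mse = Distance.SSD(x, y) / x.Length; // same value as Distance.MSE(x, y)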
The Euclidean distance is the $L_2$-norm of the difference, a special case of the Minkowski distance with $p=2$. It is the natural distance in a geometric interpretation.
$$$ d_{\mathbf{2}} : (x, y) \mapsto \|x-y\|_2 = \sqrt{d_{\mathbf{SSD}}} = \sqrt{\sum_{i=1}^{n} (x_i-y_i)^2}
[lang=csharp]
double d = Distance.Euclidean(x, y);
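Per the formula above, the Euclidean distance is just the square root of SSD:

[lang=csharp]
double root = Math.Sqrt(Distance.SSD(x, y)); // same value as Distance.Euclidean(x, y)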
The Manhattan distance is the $L_1$-norm of the difference, a special case of the Minkowski distance with $p=1$ and equivalent to the sum of absolute difference.
$$$ d_{\mathbf{1}} \equiv d_{\mathbf{SAD}} : (x, y) \mapsto \|x-y\|_1 = \sum_{i=1}^{n} |x_i-y_i|
[lang=csharp]
double d = Distance.Manhattan(x, y);
The Chebyshev distance is the $L_\infty$-norm of the difference, the limiting case of the Minkowski distance as $p$ goes to infinity. It is also known as the chessboard distance.
$$$ d_{\mathbf{\infty}} : (x, y) \mapsto \|x-y\|_\infty = \lim_{p \rightarrow \infty}\bigg(\sum_{i=1}^{n} |x_i-y_i|^p\bigg)^\frac{1}{p} = \max_{i} |x_i-y_i|
[lang=csharp]
double d = Distance.Chebyshev(x, y);
The Minkowski distance is the generalized $L_p$-norm of the difference.
$$$ d_{\mathbf{p}} : (x, y) \mapsto \|x-y\|_p = \bigg(\sum_{i=1}^{n} |x_i-y_i|^p\bigg)^\frac{1}{p}
[lang=csharp]
double d = Distance.Minkowski(p, x, y);
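The special cases above can be verified directly; for growing $p$ the Minkowski distance approaches the Chebyshev distance. A small check using the sample arrays from the setup:

[lang=csharp]
double d1 = Distance.Minkowski(1.0, x, y);   // = Distance.Manhattan(x, y)
double d2 = Distance.Minkowski(2.0, x, y);   // = Distance.Euclidean(x, y)
double dp = Distance.Minkowski(100.0, x, y); // ≈ Distance.Chebyshev(x, y)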
The Canberra distance is a weighted version of the Manhattan distance, introduced and refined in 1967 by Lance, Williams and Adkins. It is often used for data scattered around an origin, as it is biased for measures around the origin and very sensitive to values close to zero.
$$$ d_{\mathbf{CAD}} : (x, y) \mapsto \sum_{i=1}^{n} \frac{|x_i-y_i|}{|x_i|+|y_i|}
[lang=csharp]
double d = Distance.Canberra(x, y);
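The sensitivity near zero is easy to demonstrate: the same absolute difference contributes far more when the coordinates themselves are small (hypothetical sample values):

[lang=csharp]
double near = Distance.Canberra(new[] { 0.01, 100.0 }, new[] { 0.02, 100.0 });   // 0.01/0.03 ≈ 0.33
double far  = Distance.Canberra(new[] { 90.0, 100.0 }, new[] { 90.01, 100.0 });  // 0.01/180.01 ≈ 0.00006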
The cosine distance is one minus the dot product scaled by the product of the Euclidean norms of the two vectors. It represents the angular distance between two vectors while ignoring their scale.
$$$ d_{\mathbf{cos}} : (x, y) \mapsto 1-\frac{\langle x, y\rangle}{\|x\|_2\|y\|_2} = 1-\frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\sqrt{\sum_{i=1}^{n} y_i^2}}
[lang=csharp]
double d = Distance.Cosine(x, y);
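Since only the angle matters, scaling either vector leaves the cosine distance unchanged; with the sample arrays from the setup:

[lang=csharp]
double[] xScaled = { 10.0, 20.0, 30.0 };   // 10 * x
double a = Distance.Cosine(x, y);
double b = Distance.Cosine(xScaled, y);    // same value as a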
The Pearson distance is a correlation distance based on Pearson's product-moment correlation coefficient of the two sample vectors. Since the correlation coefficient lies in $[-1, 1]$, the Pearson distance lies in $[0, 2]$ and measures the linear relationship between the two vectors.
$$$ d_{\mathbf{Pearson}} : (x, y) \mapsto 1 - \mathbf{Corr}(x, y)
[lang=csharp]
double d = Distance.Pearson(x, y);
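Equivalently, the Pearson distance can be computed from the correlation coefficient directly, here using the Correlation class from MathNet.Numerics.Statistics:

[lang=csharp]
using MathNet.Numerics.Statistics;

double dCorr = 1.0 - Correlation.Pearson(x, y); // same value as Distance.Pearson(x, y)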
The Hamming distance represents the number of entries in the two sample vectors which differ. It is a fundamental distance measure in information theory but less relevant in non-integer numerical problems.
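Written as a formula, it counts the differing entries:

$$$ d_{\mathbf{Hamming}} : (x, y) \mapsto \sum_{i=1}^{n} \mathbf{1}_{[x_i \ne y_i]}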
[lang=csharp]
double d = Distance.Hamming(x, y);
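With the sample arrays from the setup, x and y differ in their first and last entries:

[lang=csharp]
double count = Distance.Hamming(x, y); // 2: entries at index 0 and 2 differ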