dask_distance package

dask_distance.braycurtis(u, v)

Finds the Bray-Curtis distance between two 1-D arrays.

\[\frac{ \sum_{i} \lvert u_{i} - v_{i} \rvert } { \sum_{i} \lvert u_{i} + v_{i} \rvert }\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
Returns:

Bray-Curtis distance

Return type:

float
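
As a quick check of the formula above, the following plain-Python sketch evaluates it for two short lists. The helper name `braycurtis_sketch` is hypothetical and not part of dask_distance; it only mirrors the expression, not the library's implementation.

```python
def braycurtis_sketch(u, v):
    """Plain-Python evaluation of the Bray-Curtis formula (illustration only)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    return numerator / denominator


u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]
result = braycurtis_sketch(u, v)  # (1 + 2 + 3) / (3 + 6 + 9)
```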

dask_distance.canberra(u, v)

Finds the Canberra distance between two 1-D arrays.

\[\sum_{i} \frac{ \lvert u_{i} - v_{i} \rvert } { \lvert u_{i} \rvert + \lvert v_{i} \rvert }\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
Returns:

Canberra distance

Return type:

float
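
The sum above can be sketched in plain Python as follows. The name `canberra_sketch` is hypothetical, and skipping terms whose denominator is zero is an assumption made here for illustration; the formula itself leaves that case undefined.

```python
def canberra_sketch(u, v):
    """Plain-Python evaluation of the Canberra formula (illustration only)."""
    total = 0.0
    for a, b in zip(u, v):
        denominator = abs(a) + abs(b)
        # Assumption: a term with a zero denominator (0/0) contributes nothing.
        if denominator != 0:
            total += abs(a - b) / denominator
    return total


result = canberra_sketch([1.0, 0.0, 3.0], [3.0, 0.0, 1.0])  # 2/4 + (skipped) + 2/4
```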

dask_distance.cdist(XA, XB, metric='euclidean', **kwargs)

Finds the distance matrix using the metric on each pair of points.

Parameters:
  • XA – 2-D array of points
  • XB – 2-D array of points
  • metric – string or callable
  • **kwargs – provided to the metric (see below)
Keyword Arguments:
 
  • p – p-norm for minkowski only (default: 2)
  • V – 1-D array of variances for seuclidean only (default: estimated from XA and XB)
  • VI – Inverse of the covariance matrix for mahalanobis only (default: estimated from XA and XB)
  • w – 1-D array of weights for wminkowski only (required)
Returns:

distance between each combination of points

Return type:

array
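
To make "each combination of points" concrete, here is a minimal plain-Python sketch that applies a metric to every pair drawn from XA and XB. The names `cdist_sketch` and `euclidean_sketch` are hypothetical, and the layout (row i for the i-th point of XA, column j for the j-th point of XB) is an assumption that follows the SciPy convention.

```python
import math


def euclidean_sketch(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cdist_sketch(XA, XB, metric=euclidean_sketch):
    """Nested-list distance matrix: one row per point in XA, one column per point in XB."""
    return [[metric(a, b) for b in XB] for a in XA]


XA = [[0.0, 0.0], [3.0, 4.0]]
XB = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
D = cdist_sketch(XA, XB)  # 2 rows (points in XA) by 3 columns (points in XB)
```

Any callable taking two 1-D sequences can be passed as `metric` in this sketch, which is the same idea as the callable option described above.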

dask_distance.chebyshev(u, v)

Finds the Chebyshev distance between two 1-D arrays.

\[\max_{i} \lvert u_{i} - v_{i} \rvert\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
Returns:

Chebyshev distance

Return type:

float
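
The formula picks out the single largest coordinate difference. A plain-Python sketch (the name `chebyshev_sketch` is hypothetical and not part of dask_distance):

```python
def chebyshev_sketch(u, v):
    """Largest absolute coordinate difference (illustration only)."""
    return max(abs(a - b) for a, b in zip(u, v))


result = chebyshev_sketch([1.0, 5.0, 2.0], [4.0, 1.0, 2.0])  # max(3, 4, 0)
```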

dask_distance.cityblock(u, v)

Finds the City Block (Manhattan) distance between two 1-D arrays.

\[\sum_{i} \lvert u_{i} - v_{i} \rvert\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
Returns:

City Block (Manhattan) distance

Return type:

float

dask_distance.correlation(u, v)

Finds the correlation distance between two 1-D arrays.

\[1 - \frac{ (u - \bar{u}) \cdot (v - \bar{v}) } { \lVert u - \bar{u} \rVert_{2} \lVert v - \bar{v} \rVert_{2} }\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
Returns:

correlation distance

Return type:

float
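
The formula is one minus the Pearson correlation of u and v, so it ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated). The sketch below evaluates it in plain Python; `correlation_sketch` is a hypothetical name and the code is an illustration of the expression only.

```python
import math


def correlation_sketch(u, v):
    """Plain-Python evaluation of the correlation distance formula."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [a - mean_u for a in u]  # u centered on its mean
    dv = [b - mean_v for b in v]  # v centered on its mean
    dot = sum(a * b for a, b in zip(du, dv))
    norm_u = math.sqrt(sum(a * a for a in du))
    norm_v = math.sqrt(sum(b * b for b in dv))
    return 1.0 - dot / (norm_u * norm_v)


correlated = correlation_sketch([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])       # close to 0
anti_correlated = correlation_sketch([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])  # close to 2
```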

dask_distance.cosine(u, v)

Finds the Cosine distance between two 1-D arrays.

\[1 - \frac{ u \cdot v } { \lVert u \rVert_{2} \lVert v \rVert_{2} }\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
Returns:

Cosine distance

Return type:

float
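
Unlike the correlation distance, the Cosine formula does not center the inputs. A plain-Python sketch of the expression (the name `cosine_sketch` is hypothetical):

```python
import math


def cosine_sketch(u, v):
    """Plain-Python evaluation of the Cosine distance formula."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


orthogonal = cosine_sketch([1.0, 0.0], [0.0, 1.0])  # dot product is 0
parallel = cosine_sketch([1.0, 2.0], [2.0, 4.0])    # same direction, close to 0
```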

dask_distance.dice(u, v)

Finds the Dice dissimilarity between two 1-D bool arrays.

\[\frac{ c_{TF} + c_{FT} }{ 2 \cdot c_{TT} + c_{TF} + c_{FT} }\]

where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)

Parameters:
  • u – 1-D bool array or collection of 1-D bool arrays
  • v – 1-D bool array or collection of 1-D bool arrays
Returns:

Dice dissimilarity

Return type:

float
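
The counts \(c_{XY}\) tally how often u holds X while v holds Y at the same position. The sketch below computes them explicitly and then applies the Dice formula; `bool_counts` and `dice_sketch` are hypothetical helper names used only for illustration.

```python
def bool_counts(u, v):
    """Return (c_TT, c_TF, c_FT, c_FF) for two equal-length bool sequences."""
    c_tt = sum(1 for a, b in zip(u, v) if a and b)
    c_tf = sum(1 for a, b in zip(u, v) if a and not b)
    c_ft = sum(1 for a, b in zip(u, v) if not a and b)
    c_ff = sum(1 for a, b in zip(u, v) if not a and not b)
    return c_tt, c_tf, c_ft, c_ff


def dice_sketch(u, v):
    """Plain-Python evaluation of the Dice dissimilarity formula."""
    c_tt, c_tf, c_ft, _ = bool_counts(u, v)
    return (c_tf + c_ft) / (2 * c_tt + c_tf + c_ft)


u = [True, True, False, False]
v = [True, False, True, False]
counts = bool_counts(u, v)  # each of the four combinations occurs once
result = dice_sketch(u, v)  # (1 + 1) / (2 * 1 + 1 + 1)
```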

dask_distance.euclidean(u, v)

Finds the Euclidean distance between two 1-D arrays.

\[\lVert u - v \rVert_{2}\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
Returns:

Euclidean distance

Return type:

float

dask_distance.hamming(u, v)

Finds the Hamming distance between two 1-D bool arrays.

\[\frac{ c_{TF} + c_{FT} }{ c_{TT} + c_{TF} + c_{FT} + c_{FF} }\]

where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)

Parameters:
  • u – 1-D bool array or collection of 1-D bool arrays
  • v – 1-D bool array or collection of 1-D bool arrays
Returns:

Hamming distance

Return type:

float
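
Because the denominator counts every position and the numerator counts the positions where u and v disagree, the formula reduces to the fraction of mismatched entries. A plain-Python sketch (the name `hamming_sketch` is hypothetical):

```python
def hamming_sketch(u, v):
    """Fraction of positions at which two equal-length bool sequences differ."""
    mismatches = sum(1 for a, b in zip(u, v) if bool(a) != bool(b))
    return mismatches / len(u)


u = [True, True, False, False]
v = [True, False, True, False]
result = hamming_sketch(u, v)  # 2 mismatches out of 4 positions
```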

dask_distance.jaccard(u, v)

Finds the Jaccard-Needham dissimilarity between two 1-D bool arrays.

\[\frac{ c_{TF} + c_{FT} }{ c_{TT} + c_{TF} + c_{FT} }\]

where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)

Parameters:
  • u – 1-D bool array or collection of 1-D bool arrays
  • v – 1-D bool array or collection of 1-D bool arrays
Returns:

Jaccard-Needham dissimilarity

Return type:

float

dask_distance.kulsinski(u, v)

Finds the Kulsinski dissimilarity between two 1-D bool arrays.

\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} }\]

where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)

Parameters:
  • u – 1-D bool array or collection of 1-D bool arrays
  • v – 1-D bool array or collection of 1-D bool arrays
Returns:

Kulsinski dissimilarity

Return type:

float

dask_distance.mahalanobis(u, v, VI)

Finds the Mahalanobis distance between two 1-D arrays.

\[\sqrt{ (u - v) \cdot V^{-1} \cdot (u - v)^{T} }\]

where \(V^{-1}\) is the inverse of the covariance matrix, passed as VI

Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
  • VI – Inverse of the covariance matrix
Returns:

Mahalanobis distance

Return type:

float
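
The quadratic form in the formula can be evaluated step by step as below. This is a plain-Python sketch with nested lists standing in for the matrix; `mahalanobis_sketch` is a hypothetical name, and the example VI is the inverse of an assumed diagonal covariance matrix diag(1, 4).

```python
import math


def mahalanobis_sketch(u, v, VI):
    """Plain-Python evaluation of sqrt((u - v) . VI . (u - v)^T)."""
    d = [a - b for a, b in zip(u, v)]
    n = len(d)
    # Row vector times matrix: (d . VI)[j] = sum_i d[i] * VI[i][j]
    d_vi = [sum(d[i] * VI[i][j] for i in range(n)) for j in range(n)]
    return math.sqrt(sum(x * y for x, y in zip(d_vi, d)))


VI = [[1.0, 0.0], [0.0, 0.25]]  # inverse of the assumed covariance diag(1, 4)
result = mahalanobis_sketch([0.0, 0.0], [3.0, 8.0], VI)  # sqrt(9 * 1 + 64 * 0.25)
```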

dask_distance.minkowski(u, v, p)

Finds the Minkowski distance between two 1-D arrays.

\[\left( \sum_{i} \lvert u_{i} - v_{i} \rvert^{p} \right)^{\frac{1}{p}}\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
  • p – degree of the norm to use
Returns:

Minkowski distance

Return type:

float
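
Setting p to 1 or 2 in the formula recovers the City Block and Euclidean expressions respectively. The plain-Python sketch below illustrates this; `minkowski_sketch` is a hypothetical name, not the library function.

```python
def minkowski_sketch(u, v, p):
    """Plain-Python evaluation of the Minkowski formula for a given p."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


u, v = [0.0, 0.0], [3.0, 4.0]
d1 = minkowski_sketch(u, v, 1)  # matches the City Block expression: 3 + 4
d2 = minkowski_sketch(u, v, 2)  # matches the Euclidean expression: sqrt(9 + 16)
```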

dask_distance.pdist(X, metric='euclidean', **kwargs)

Finds the pairwise condensed distance matrix using the metric.

Parameters:
  • X – 2-D array of points
  • metric – string or callable
  • **kwargs – provided to the metric (see below)
Keyword Arguments:
 
  • p – p-norm for minkowski only (default: 2)
  • V – 1-D array of variances for seuclidean only (default: estimated from X)
  • VI – Inverse of the covariance matrix for mahalanobis only (default: estimated from X)
  • w – 1-D array of weights for wminkowski only (required)
Returns:

condensed distance between each pair

Return type:

array

Note

Tries to avoid redundant computations where possible. How much it can avoid depends on the chunk size of X, particularly along the first dimension: smaller chunks increase the savings, though there may be other tradeoffs (for example, a larger task graph).
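
For reference, a condensed distance matrix stores one value per unordered pair of points, giving n(n-1)/2 entries for n points. The plain-Python sketch below builds one; `pdist_sketch` and `euclidean_sketch` are hypothetical names, and the pair ordering (0,1), (0,2), ..., (1,2), ... is an assumption that follows the SciPy convention.

```python
import math
from itertools import combinations


def euclidean_sketch(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pdist_sketch(X, metric=euclidean_sketch):
    """Condensed distances: one entry per pair (i, j) with i < j."""
    return [metric(X[i], X[j]) for i, j in combinations(range(len(X)), 2)]


X = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
condensed = pdist_sketch(X)  # pairs (0,1), (0,2), (1,2)
```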

dask_distance.rogerstanimoto(u, v)

Finds the Rogers-Tanimoto dissimilarity between two 1-D bool arrays.

\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} }\]

where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)

Parameters:
  • u – 1-D bool array or collection of 1-D bool arrays
  • v – 1-D bool array or collection of 1-D bool arrays
Returns:

Rogers-Tanimoto dissimilarity

Return type:

float

dask_distance.russellrao(u, v)

Finds the Russell-Rao dissimilarity between two 1-D bool arrays.

\[\frac{ c_{TF} + c_{FT} + c_{FF} } { c_{TT} + c_{TF} + c_{FT} + c_{FF} }\]

where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)

Parameters:
  • u – 1-D bool array or collection of 1-D bool arrays
  • v – 1-D bool array or collection of 1-D bool arrays
Returns:

Russell-Rao dissimilarity

Return type:

float

dask_distance.seuclidean(u, v, V)

Finds the standardized Euclidean distance between two 1-D arrays.

\[\sqrt{\sum_{i} \left( \frac{\left( u_{i} - v_{i} \right)^{2}}{V_{i}} \right)}\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
  • V – 1-D array of variances
Returns:

standardized Euclidean distance

Return type:

float
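
Each squared difference is divided by the variance of its coordinate before summing. A plain-Python sketch of the formula (the name `seuclidean_sketch` and the example variances are hypothetical):

```python
import math


def seuclidean_sketch(u, v, V):
    """Plain-Python evaluation of the standardized Euclidean formula."""
    return math.sqrt(sum((a - b) ** 2 / var for a, b, var in zip(u, v, V)))


V = [9.0, 16.0]  # assumed per-coordinate variances
result = seuclidean_sketch([1.0, 2.0], [4.0, 6.0], V)  # sqrt(9/9 + 16/16)
```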

dask_distance.sokalmichener(u, v)

Finds the Sokal-Michener dissimilarity between two 1-D bool arrays.

\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} }\]

where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)

Parameters:
  • u – 1-D bool array or collection of 1-D bool arrays
  • v – 1-D bool array or collection of 1-D bool arrays
Returns:

Sokal-Michener dissimilarity

Return type:

float

dask_distance.sokalsneath(u, v)

Finds the Sokal-Sneath dissimilarity between two 1-D bool arrays.

\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) }\]

where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)

Parameters:
  • u – 1-D bool array or collection of 1-D bool arrays
  • v – 1-D bool array or collection of 1-D bool arrays
Returns:

Sokal-Sneath dissimilarity

Return type:

float

dask_distance.sqeuclidean(u, v)

Finds the squared Euclidean distance between two 1-D arrays.

\[\lVert u - v \rVert_{2}^{2}\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
Returns:

squared Euclidean distance

Return type:

float

dask_distance.squareform(X, force='no')

Converts between the condensed (1-D vector) and square (2-D symmetric matrix) forms of a distance matrix.

Parameters:
  • X – 2-D square symmetric matrix or 1-D vector of distances
  • force – whether to force to a vector or a matrix
Returns:

1-D vector or 2-D square symmetric matrix of distances

Return type:

array
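
The two representations hold the same information: the 1-D vector lists the entries above the diagonal of the square matrix, whose diagonal is zero. The plain-Python sketch below converts in both directions; `to_square` and `to_condensed` are hypothetical helpers, and the row-by-row ordering of the vector is an assumption that follows the SciPy convention.

```python
def to_square(condensed, n):
    """Expand n*(n-1)/2 pairwise distances into an n-by-n symmetric matrix."""
    square = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


def to_condensed(square):
    """Collect the entries above the diagonal, row by row."""
    n = len(square)
    return [square[i][j] for i in range(n) for j in range(i + 1, n)]


condensed = [3.0, 4.0, 5.0]
square = to_square(condensed, 3)
round_trip = to_condensed(square)
```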

dask_distance.wminkowski(u, v, p, w)

Finds the weighted Minkowski distance between two 1-D arrays.

\[\left( \sum_{i} \lvert w_{i} \cdot (u_{i} - v_{i}) \rvert^{p} \right)^{ \frac{1}{p} }\]
Parameters:
  • u – 1-D array or collection of 1-D arrays
  • v – 1-D array or collection of 1-D arrays
  • p – degree of the norm to use
  • w – 1-D array of weights
Returns:

weighted Minkowski distance

Return type:

float
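
Note that in the formula the weight multiplies the difference before the power p is applied. A plain-Python sketch of that expression (the name `wminkowski_sketch` and the example weights are hypothetical):

```python
def wminkowski_sketch(u, v, p, w):
    """Plain-Python evaluation of the weighted Minkowski formula."""
    total = sum(abs(wi * (a - b)) ** p for a, b, wi in zip(u, v, w))
    return total ** (1.0 / p)


w = [1.0, 2.0]  # assumed weights
result = wminkowski_sketch([0.0, 0.0], [3.0, 2.0], 2, w)  # sqrt(|1*3|^2 + |2*2|^2)
```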

dask_distance.yule(u, v)

Finds the Yule dissimilarity between two 1-D bool arrays.

\[\frac{ 2 \cdot c_{TF} \cdot c_{FT} } { c_{TT} \cdot c_{FF} + c_{TF} \cdot c_{FT} }\]

where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)

Parameters:
  • u – 1-D bool array or collection of 1-D bool arrays
  • v – 1-D bool array or collection of 1-D bool arrays
Returns:

Yule dissimilarity

Return type:

float
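
The Yule formula uses products of the counts rather than sums. The plain-Python sketch below evaluates it for a small example; `yule_sketch` is a hypothetical name, and the sketch does not handle the case where the denominator is zero.

```python
def yule_sketch(u, v):
    """Plain-Python evaluation of the Yule dissimilarity formula."""
    c_tt = sum(1 for a, b in zip(u, v) if a and b)
    c_tf = sum(1 for a, b in zip(u, v) if a and not b)
    c_ft = sum(1 for a, b in zip(u, v) if not a and b)
    c_ff = sum(1 for a, b in zip(u, v) if not a and not b)
    return (2 * c_tf * c_ft) / (c_tt * c_ff + c_tf * c_ft)


u = [True, True, False, False]
v = [True, False, True, False]
result = yule_sketch(u, v)  # (2 * 1 * 1) / (1 * 1 + 1 * 1)
```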