dask_distance package

dask_distance.braycurtis(u, v)
Finds the Bray-Curtis distance between two 1-D arrays.
\[\frac{ \sum_{i} \lvert u_{i} - v_{i} \rvert } { \sum_{i} \lvert u_{i} + v_{i} \rvert }\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: Bray-Curtis distance
Return type: float
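
As a rough illustration of the formula above, the following plain-Python sketch evaluates the Bray-Curtis distance for two short lists. The helper name `braycurtis_sketch` is hypothetical and not part of dask_distance, which operates on arrays rather than lists.

```python
def braycurtis_sketch(u, v):
    """Illustrative only: sum of |u_i - v_i| over sum of |u_i + v_i|."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    return numerator / denominator


# |1-2| + |2-4| + |3-6| = 6 and |1+2| + |2+4| + |3+6| = 18
result = braycurtis_sketch([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```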

dask_distance.canberra(u, v)
Finds the Canberra distance between two 1-D arrays.
\[\sum_{i} \frac{ \lvert u_{i} - v_{i} \rvert } { \lvert u_{i} \rvert + \lvert v_{i} \rvert }\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: Canberra distance
Return type: float
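
The Canberra formula can be sketched in plain Python as below. The helper `canberra_sketch` is hypothetical. Skipping terms where both entries are zero is an assumption made here to avoid dividing by zero; this page does not state how dask_distance treats such terms.

```python
def canberra_sketch(u, v):
    """Illustrative only: sum of |u_i - v_i| / (|u_i| + |v_i|)."""
    total = 0.0
    for a, b in zip(u, v):
        denominator = abs(a) + abs(b)
        if denominator == 0:
            # Assumption: a 0/0 term contributes nothing to the sum.
            continue
        total += abs(a - b) / denominator
    return total


# Each of the three terms equals 1/3, so the sum is 1.
result = canberra_sketch([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```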

dask_distance.cdist(XA, XB, metric='euclidean', **kwargs)
Finds the distance matrix using the metric on each pair of points.
Parameters:
- XA – 2-D array of points
- XB – 2-D array of points
- metric – string or callable
- **kwargs – provided to the metric (see below)
Keyword Arguments:
- p – p-norm for minkowski only (default: 2)
- V – 1-D array of variances for seuclidean only (default: estimated from XA and XB)
- VI – Inverse of the covariance matrix for mahalanobis only (default: estimated from XA and XB)
- w – 1-D array of weights for wminkowski only (required)
Returns: distance between each combination of points
Return type: array
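
To make the shape of the result concrete, here is a minimal plain-Python sketch of what "each combination of points" means: one row per point in XA and one column per point in XB. The names `euclidean_sketch` and `cdist_sketch` are hypothetical; the sketch uses nested lists and does not reflect how dask_distance computes the result on chunked arrays.

```python
import math


def euclidean_sketch(u, v):
    """Illustrative Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cdist_sketch(XA, XB, metric=euclidean_sketch):
    """Illustrative only: entry [i][j] is metric(XA[i], XB[j])."""
    return [[metric(a, b) for b in XB] for a in XA]


XA = [[0.0, 0.0], [3.0, 4.0]]
XB = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
D = cdist_sketch(XA, XB)  # 2 rows (points in XA) by 3 columns (points in XB)
```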

dask_distance.chebyshev(u, v)
Finds the Chebyshev distance between two 1-D arrays.
\[\max_{i} \lvert u_{i} - v_{i} \rvert\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: Chebyshev distance
Return type: float

dask_distance.cityblock(u, v)
Finds the City Block (Manhattan) distance between two 1-D arrays.
\[\sum_{i} \lvert u_{i} - v_{i} \rvert\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: City Block (Manhattan) distance
Return type: float

dask_distance.correlation(u, v)
Finds the correlation distance between two 1-D arrays.
\[1 - \frac{ (u - \bar{u}) \cdot (v - \bar{v}) } { \lVert u - \bar{u} \rVert_{2} \lVert v - \bar{v} \rVert_{2} }\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: correlation distance
Return type: float
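
The correlation distance is the cosine distance applied to mean-centered vectors. A plain-Python sketch of the formula above follows; `correlation_sketch` is a hypothetical helper, not the library's implementation.

```python
import math


def correlation_sketch(u, v):
    """Illustrative only: 1 minus the Pearson correlation of u and v."""
    u_mean = sum(u) / len(u)
    v_mean = sum(v) / len(v)
    uc = [a - u_mean for a in u]  # u - mean(u)
    vc = [b - v_mean for b in v]  # v - mean(v)
    dot = sum(a * b for a, b in zip(uc, vc))
    norm_u = math.sqrt(sum(a * a for a in uc))
    norm_v = math.sqrt(sum(b * b for b in vc))
    return 1.0 - dot / (norm_u * norm_v)


perfectly_correlated = correlation_sketch([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
perfectly_anticorrelated = correlation_sketch([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Up to floating-point rounding, the first value is 0 and the second is 2, the two extremes of this distance.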

dask_distance.cosine(u, v)
Finds the Cosine distance between two 1-D arrays.
\[1 - \frac{ u \cdot v } { \lVert u \rVert_{2} \lVert v \rVert_{2} }\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: Cosine distance
Return type: float
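
A small plain-Python sketch of the cosine formula is shown below. The helper `cosine_sketch` is hypothetical and assumes neither input is the zero vector.

```python
import math


def cosine_sketch(u, v):
    """Illustrative only: 1 - (u . v) / (||u|| * ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


orthogonal = cosine_sketch([1.0, 0.0], [0.0, 1.0])  # vectors at a right angle
parallel = cosine_sketch([1.0, 1.0], [2.0, 2.0])    # same direction
```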

dask_distance.dice(u, v)
Finds the Dice dissimilarity between two 1-D bool arrays.
\[\frac{ c_{TF} + c_{FT} }{ 2 \cdot c_{TT} + c_{TF} + c_{FT} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters:
- u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Dice dissimilarity
Return type: float
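
The counts \(c_{XY}\) simply tally how many positions have `u_i == X` and `v_i == Y`. The sketch below computes them in plain Python and plugs them into the Dice formula; `bool_counts` and `dice_sketch` are hypothetical helpers used only for illustration.

```python
def bool_counts(u, v):
    """Return (c_TT, c_TF, c_FT, c_FF) for two equal-length bool sequences."""
    c_tt = sum(1 for a, b in zip(u, v) if a and b)
    c_tf = sum(1 for a, b in zip(u, v) if a and not b)
    c_ft = sum(1 for a, b in zip(u, v) if not a and b)
    c_ff = sum(1 for a, b in zip(u, v) if not a and not b)
    return c_tt, c_tf, c_ft, c_ff


def dice_sketch(u, v):
    """Illustrative only: (c_TF + c_FT) / (2 * c_TT + c_TF + c_FT)."""
    c_tt, c_tf, c_ft, _ = bool_counts(u, v)
    return (c_tf + c_ft) / (2 * c_tt + c_tf + c_ft)


u = [True, True, False, False]
v = [True, False, True, False]
counts = bool_counts(u, v)  # one position in each of the four categories
result = dice_sketch(u, v)  # (1 + 1) / (2 * 1 + 1 + 1)
```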

dask_distance.euclidean(u, v)
Finds the Euclidean distance between two 1-D arrays.
\[\lVert u - v \rVert_{2}\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: Euclidean distance
Return type: float

dask_distance.hamming(u, v)
Finds the Hamming distance between two 1-D bool arrays.
\[\frac{ c_{TF} + c_{FT} }{ c_{TT} + c_{TF} + c_{FT} + c_{FF} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters:
- u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Hamming distance
Return type: float

dask_distance.jaccard(u, v)
Finds the Jaccard-Needham dissimilarity between two 1-D bool arrays.
\[\frac{ c_{TF} + c_{FT} }{ c_{TT} + c_{TF} + c_{FT} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters:
- u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Jaccard-Needham dissimilarity
Return type: float
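
Unlike the Hamming distance, the Jaccard-Needham formula leaves \(c_{FF}\) out of the denominator, so positions where both inputs are False do not affect the result. The plain-Python sketch below illustrates this; `jaccard_sketch` is a hypothetical helper.

```python
def jaccard_sketch(u, v):
    """Illustrative only: (c_TF + c_FT) / (c_TT + c_TF + c_FT)."""
    c_tt = sum(1 for a, b in zip(u, v) if a and b)
    mismatches = sum(1 for a, b in zip(u, v) if a != b)  # c_TF + c_FT
    return mismatches / (c_tt + mismatches)


u = [True, True, False, False, True]
v = [True, False, True, False, True]
result = jaccard_sketch(u, v)  # c_TT = 2, c_TF + c_FT = 2

# Appending positions that are False in both inputs leaves the value unchanged.
padded = jaccard_sketch(u + [False, False], v + [False, False])
```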

dask_distance.kulsinski(u, v)
Finds the Kulsinski dissimilarity between two 1-D bool arrays.
\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters:
- u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Kulsinski dissimilarity
Return type: float

dask_distance.mahalanobis(u, v, VI)
Finds the Mahalanobis distance between two 1-D arrays.
\[\sqrt{ (u - v) \cdot V^{-1} \cdot (u - v)^{T} }\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
- VI – Inverse of the covariance matrix
Returns: Mahalanobis distance
Return type: float
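
In the formula, `VI` plays the role of \(V^{-1}\). The following plain-Python sketch evaluates the quadratic form with `VI` given as a nested list; `mahalanobis_sketch` is a hypothetical helper and not the library's array-based implementation.

```python
import math


def mahalanobis_sketch(u, v, VI):
    """Illustrative only: sqrt((u - v) . VI . (u - v)^T)."""
    d = [a - b for a, b in zip(u, v)]
    # Matrix-vector product VI . d, one row at a time.
    VI_d = [sum(row[j] * d[j] for j in range(len(d))) for row in VI]
    return math.sqrt(sum(d[i] * VI_d[i] for i in range(len(d))))


u, v = [1.0, 0.0], [0.0, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
with_identity = mahalanobis_sketch(u, v, identity)  # reduces to Euclidean
scaled = mahalanobis_sketch(u, v, [[2.0, 0.0], [0.0, 0.5]])  # sqrt(2 + 0.5)
```

With the identity matrix as `VI`, the result coincides with the Euclidean distance between `u` and `v`.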

dask_distance.minkowski(u, v, p)
Finds the Minkowski distance between two 1-D arrays.
\[\left( \sum_{i} \lvert u_{i} - v_{i} \rvert^{p} \right)^{\frac{1}{p}}\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
- p – degree of the norm to use
Returns: Minkowski distance
Return type: float
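
The Minkowski formula generalizes two other distances on this page: p = 1 gives the City Block distance and p = 2 gives the Euclidean distance. A plain-Python sketch, using the hypothetical helper `minkowski_sketch`:

```python
def minkowski_sketch(u, v, p):
    """Illustrative only: (sum of |u_i - v_i|^p) raised to 1/p."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


u, v = [0.0, 0.0], [3.0, 4.0]
city_block = minkowski_sketch(u, v, 1)  # |3| + |4|
euclidean = minkowski_sketch(u, v, 2)   # sqrt(9 + 16)
```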

dask_distance.pdist(X, metric='euclidean', **kwargs)
Finds the pairwise condensed distance matrix using the metric.
Parameters:
- X – 2-D array of points
- metric – string or callable
- **kwargs – provided to the metric (see below)
Keyword Arguments:
- p – p-norm for minkowski only (default: 2)
- V – 1-D array of variances for seuclidean only (default: estimated from X)
- VI – Inverse of the covariance matrix for mahalanobis only (default: estimated from X)
- w – 1-D array of weights for wminkowski only (required)
Returns: condensed distance between each pair
Return type: array
Note
Avoids redundant computations where it can. How much it can avoid depends on the chunk size of X, particularly along the first dimension: smaller chunks increase the savings, though they may bring other tradeoffs.
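
A condensed distance matrix stores each unordered pair of points once. The plain-Python sketch below assumes the ordering used by `scipy.spatial.distance.pdist`, namely pairs (i, j) with i < j listed row by row; that dask_distance follows the same ordering is an assumption here, and `pdist_sketch` is a hypothetical helper.

```python
import math


def pdist_sketch(X):
    """Illustrative only: Euclidean distance for every pair i < j, row by row."""
    n = len(X)
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
        for i in range(n)
        for j in range(i + 1, n)
    ]


X = [[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]]
condensed = pdist_sketch(X)  # pairs (0, 1), (0, 2), (1, 2)
```

For n points the condensed form has n * (n - 1) / 2 entries, here 3.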

dask_distance.rogerstanimoto(u, v)
Finds the Rogers-Tanimoto dissimilarity between two 1-D bool arrays.
\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters:
- u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Rogers-Tanimoto dissimilarity
Return type: float

dask_distance.russellrao(u, v)
Finds the Russell-Rao dissimilarity between two 1-D bool arrays.
\[\frac{ c_{TF} + c_{FT} + c_{FF} } { c_{TT} + c_{TF} + c_{FT} + c_{FF} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters:
- u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Russell-Rao dissimilarity
Return type: float

dask_distance.seuclidean(u, v, V)
Finds the standardized Euclidean distance between two 1-D arrays.
\[\sqrt{\sum_{i} \left( \frac{\left( u_{i} - v_{i} \right)^{2}}{V_{i}} \right)}\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
- V – 1-D array of variances
Returns: standardized Euclidean distance
Return type: float
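
Each squared difference is divided by the variance of its component before summing. A plain-Python sketch of the formula, using the hypothetical helper `seuclidean_sketch`:

```python
import math


def seuclidean_sketch(u, v, V):
    """Illustrative only: sqrt of the sum of (u_i - v_i)^2 / V_i."""
    return math.sqrt(sum((a - b) ** 2 / var for a, b, var in zip(u, v, V)))


# Differences of 2 and 3 with variances 4 and 9 each contribute exactly 1.
result = seuclidean_sketch([0.0, 0.0], [2.0, 3.0], [4.0, 9.0])
```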

dask_distance.sokalmichener(u, v)
Finds the Sokal-Michener dissimilarity between two 1-D bool arrays.
\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters:
- u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Sokal-Michener dissimilarity
Return type: float

dask_distance.sokalsneath(u, v)
Finds the Sokal-Sneath dissimilarity between two 1-D bool arrays.
\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters:
- u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Sokal-Sneath dissimilarity
Return type: float

dask_distance.sqeuclidean(u, v)
Finds the squared Euclidean distance between two 1-D arrays.
\[\lVert u - v \rVert_{2}^{2}\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: squared Euclidean distance
Return type: float

dask_distance.squareform(X, force='no')
Converts a distance matrix between its square (2-D matrix) form and its condensed (1-D vector) form.
Parameters:
- X – 2-D square symmetric matrix or 1-D vector of distances
- force – whether to force to a vector or a matrix
Returns: 1-D vector or 2-D square symmetric matrix of distances
Return type: array
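
The two forms hold the same information: the condensed vector lists the upper triangle of the square matrix, whose diagonal is zero. The plain-Python sketch below converts in both directions, assuming the row-by-row upper-triangle ordering used by SciPy. `to_square` and `to_condensed` are hypothetical helpers and do not model the `force` argument.

```python
import math


def to_square(condensed):
    """Illustrative only: rebuild the symmetric matrix from its upper triangle."""
    m = len(condensed)
    n = (1 + math.isqrt(1 + 8 * m)) // 2  # solve n * (n - 1) / 2 == m
    if n * (n - 1) // 2 != m:
        raise ValueError("length is not a valid condensed size")
    square = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


def to_condensed(square):
    """Illustrative only: read the upper triangle row by row."""
    n = len(square)
    return [square[i][j] for i in range(n) for j in range(i + 1, n)]


square = to_square([3.0, 5.0, 4.0])
round_trip = to_condensed(square)
```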

dask_distance.wminkowski(u, v, p, w)
Finds the weighted Minkowski distance between two 1-D arrays.
\[\left( \sum_{i} \lvert w_{i} \cdot (u_{i} - v_{i}) \rvert^{p} \right)^{ \frac{1}{p} }\]
Parameters:
- u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
- p – degree of the norm to use
- w – 1-D array of weights
Returns: weighted Minkowski distance
Return type: float
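
In the formula above, each weight multiplies the difference before the absolute value is raised to the power p. A plain-Python sketch, using the hypothetical helper `wminkowski_sketch`:

```python
def wminkowski_sketch(u, v, p, w):
    """Illustrative only: (sum of |w_i * (u_i - v_i)|^p) raised to 1/p."""
    total = sum(abs(wi * (a - b)) ** p for a, b, wi in zip(u, v, w))
    return total ** (1.0 / p)


u, v = [0.0, 0.0], [3.0, 4.0]
unit_weights = wminkowski_sketch(u, v, 2, [1.0, 1.0])  # plain Euclidean case
weighted = wminkowski_sketch(u, v, 1, [2.0, 0.5])      # |2 * 3| + |0.5 * 4|
```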

dask_distance.yule(u, v)
Finds the Yule dissimilarity between two 1-D bool arrays.
\[\frac{ 2 \cdot c_{TF} \cdot c_{FT} } { c_{TT} \cdot c_{FF} + c_{TF} \cdot c_{FT} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters:
- u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Yule dissimilarity
Return type: float
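
The Yule formula uses products of the counts rather than sums. The plain-Python sketch below works through one small case; `yule_sketch` is a hypothetical helper and does not handle inputs for which the denominator is zero.

```python
def yule_sketch(u, v):
    """Illustrative only: 2 * c_TF * c_FT / (c_TT * c_FF + c_TF * c_FT)."""
    c_tt = sum(1 for a, b in zip(u, v) if a and b)
    c_tf = sum(1 for a, b in zip(u, v) if a and not b)
    c_ft = sum(1 for a, b in zip(u, v) if not a and b)
    c_ff = sum(1 for a, b in zip(u, v) if not a and not b)
    return 2 * c_tf * c_ft / (c_tt * c_ff + c_tf * c_ft)


u = [True, True, True, False, False, False]
v = [True, True, False, True, False, False]
# c_TT = 2, c_TF = 1, c_FT = 1, c_FF = 2, so the value is 2 / (4 + 1).
result = yule_sketch(u, v)
```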