dask_distance package
-
dask_distance.braycurtis(u, v)
Finds the Bray-Curtis distance between two 1-D arrays.
\[\frac{ \sum_{i} \lvert u_{i} - v_{i} \rvert } { \sum_{i} \lvert u_{i} + v_{i} \rvert }\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: Bray-Curtis distance
Return type: float
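To make the formula concrete, here is a minimal pure-Python sketch that evaluates it for one pair of plain lists. It is an illustration only, not the dask_distance implementation (which works on dask arrays); the function name simply mirrors the library's.

```python
def braycurtis(u, v):
    """Bray-Curtis distance: sum_i |u_i - v_i| / sum_i |u_i + v_i| (illustrative sketch)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    return numerator / denominator


# |1-3| + |2-2| + |3-1| = 4 and |1+3| + |2+2| + |3+1| = 12, so the distance is 4 / 12.
d = braycurtis([1, 2, 3], [3, 2, 1])
```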
-
dask_distance.canberra(u, v)
Finds the Canberra distance between two 1-D arrays.
\[\sum_{i} \frac{ \lvert u_{i} - v_{i} \rvert } { \lvert u_{i} \rvert + \lvert v_{i} \rvert }\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: Canberra distance
Return type: float
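A minimal pure-Python sketch of the Canberra sum for one pair of lists (an illustration, not dask_distance's code). The formula above leaves a term with \(u_{i} = v_{i} = 0\) undefined as 0/0; the sketch assumes such terms contribute nothing, as SciPy does, which may or may not match dask_distance.

```python
def canberra(u, v):
    """Canberra distance: sum_i |u_i - v_i| / (|u_i| + |v_i|) (illustrative sketch)."""
    total = 0.0
    for a, b in zip(u, v):
        denominator = abs(a) + abs(b)
        # Assumption: a 0/0 term (u_i == v_i == 0) is skipped, i.e. treated as 0.
        if denominator != 0:
            total += abs(a - b) / denominator
    return total


# Terms: 0/0 (skipped), |1-3| / (1+3) = 0.5, |2-2| / (2+2) = 0.
d = canberra([0, 1, 2], [0, 3, 2])
```

Because every term is divided by the magnitudes involved, the Canberra distance is very sensitive to differences between values close to zero.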
-
dask_distance.cdist(XA, XB, metric='euclidean', **kwargs)
Finds the distance matrix between each point in XA and each point in XB using the given metric.
Parameters: - XA – 2-D array of points
- XB – 2-D array of points
- metric – string or callable
- **kwargs – provided to the metric (see below)
Keyword Arguments: - p – p-norm for minkowski only (default: 2)
- V – 1-D array of variances for seuclidean only (default: estimated from XA and XB)
- VI – Inverse of the covariance matrix for mahalanobis only (default: estimated from XA and XB)
- w – 1-D array of weights for wminkowski only (required)
Returns: distance between each combination of points
Return type: array
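The layout of the result is easiest to see on a tiny example. The sketch below is a hypothetical pure-Python reference (`cdist_reference` is not part of dask_distance) that applies a callable metric to every pair formed by one point of XA and one point of XB, so the result has one row per point of XA and one column per point of XB.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cdist_reference(XA, XB, metric=euclidean):
    """Hypothetical reference: entry [i][j] is metric(XA[i], XB[j])."""
    return [[metric(a, b) for b in XB] for a in XA]


XA = [[0, 0], [3, 4]]
XB = [[0, 0], [6, 8], [3, 0]]
D = cdist_reference(XA, XB)  # 2 rows (points of XA) x 3 columns (points of XB)
```

In dask_distance the metric can instead be given by name, for example `metric='euclidean'`, with metric-specific parameters such as `p` or `VI` passed as keyword arguments, as listed above.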
-
dask_distance.chebyshev(u, v)
Finds the Chebyshev distance between two 1-D arrays.
\[\max_{i} \lvert u_{i} - v_{i} \rvert\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: Chebyshev distance
Return type: float
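A one-line pure-Python sketch of the Chebyshev formula for a single pair of lists (illustrative only, not the library's implementation):

```python
def chebyshev(u, v):
    """Chebyshev distance: the largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


d = chebyshev([1, 5, 2], [4, 1, 2])  # coordinate differences 3, 4, 0
```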
-
dask_distance.cityblock(u, v)
Finds the City Block (Manhattan) distance between two 1-D arrays.
\[\sum_{i} \lvert u_{i} - v_{i} \rvert\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: City Block (Manhattan) distance
Return type: float
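A minimal pure-Python sketch of the City Block sum for one pair of lists (an illustration, not dask_distance's code). Using the same points as a Chebyshev example would show the difference: the City Block distance adds every coordinate difference instead of keeping only the largest.

```python
def cityblock(u, v):
    """City Block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


d = cityblock([1, 5, 2], [4, 1, 2])  # 3 + 4 + 0
```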
-
dask_distance.correlation(u, v)
Finds the correlation distance between two 1-D arrays.
\[1 - \frac{ (u - \bar{u}) \cdot (v - \bar{v}) } { \lVert u - \bar{u} \rVert_{2} \lVert v - \bar{v} \rVert_{2} }\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: correlation distance
Return type: float
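The fraction in the formula is the Pearson correlation coefficient of u and v, so the distance ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated). The pure-Python sketch below (an illustration, not the library's implementation) checks both extremes; floating-point rounding makes the results only approximately 0 and 2.

```python
import math


def correlation(u, v):
    """Correlation distance: 1 minus the Pearson correlation of u and v."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [a - mean_u for a in u]  # u - u_bar
    dv = [b - mean_v for b in v]  # v - v_bar
    dot = sum(a * b for a, b in zip(du, dv))
    norms = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return 1 - dot / norms


same = correlation([1, 2, 3], [2, 4, 6])      # perfectly correlated: about 0
opposite = correlation([1, 2, 3], [3, 2, 1])  # perfectly anti-correlated: about 2
```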
-
dask_distance.cosine(u, v)
Finds the cosine distance between two 1-D arrays.
\[1 - \frac{ u \cdot v } { \lVert u \rVert_{2} \lVert v \rVert_{2} }\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: Cosine distance
Return type: float
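A minimal pure-Python sketch of the cosine formula (illustrative only, not dask_distance's code). Unlike the correlation distance, the vectors are not centered first, so only the angle between the raw vectors matters: parallel vectors of different lengths are at distance about 0.

```python
import math


def cosine(u, v):
    """Cosine distance: 1 - (u . v) / (||u||_2 * ||v||_2)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)


orthogonal = cosine([1, 0], [0, 1])  # 1.0
parallel = cosine([1, 2], [2, 4])    # about 0: same direction, different length
```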
-
dask_distance.dice(u, v)
Finds the Dice dissimilarity between two 1-D bool arrays.
\[\frac{ c_{TF} + c_{FT} }{ 2 \cdot c_{TT} + c_{TF} + c_{FT} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\) is the number of positions \(i\) at which \(u_{i} = X\) and \(v_{i} = Y\), for \(X, Y \in \{T, F\}\)
Parameters: - u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Dice dissimilarity
Return type: float
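The boolean dissimilarities are all built from the four counts \(c_{TT}\), \(c_{TF}\), \(c_{FT}\) and \(c_{FF}\). The pure-Python sketch below computes them with a hypothetical helper, `bool_counts`, and then evaluates the Dice formula; neither function is dask_distance's code.

```python
from collections import Counter


def bool_counts(u, v):
    """Hypothetical helper: return (c_TT, c_TF, c_FT, c_FF) for two bool sequences."""
    c = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return c[True, True], c[True, False], c[False, True], c[False, False]


def dice(u, v):
    """Dice dissimilarity: (c_TF + c_FT) / (2 c_TT + c_TF + c_FT)."""
    c_tt, c_tf, c_ft, _ = bool_counts(u, v)
    return (c_tf + c_ft) / (2 * c_tt + c_tf + c_ft)


u = [True, True, True, False, False]
v = [True, False, True, True, False]
counts = bool_counts(u, v)  # (c_TT, c_TF, c_FT, c_FF) = (2, 1, 1, 1)
d = dice(u, v)              # 2 / (2 * 2 + 2)
```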
-
dask_distance.euclidean(u, v)
Finds the Euclidean distance between two 1-D arrays.
\[\lVert u - v \rVert_{2}\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: Euclidean distance
Return type: float
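A minimal pure-Python sketch of the Euclidean norm of \(u - v\) (illustrative only, not dask_distance's code). For a single pair of points, the standard library's `math.dist` (Python 3.8+) computes the same quantity, which the example uses as a cross-check.

```python
import math


def euclidean(u, v):
    """Euclidean distance: the 2-norm of u - v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


d = euclidean([0, 0], [3, 4])          # sqrt(9 + 16)
reference = math.dist([0, 0], [3, 4])  # standard-library cross-check
```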
-
dask_distance.hamming(u, v)
Finds the Hamming distance between two 1-D bool arrays.
\[\frac{ c_{TF} + c_{FT} }{ c_{TT} + c_{TF} + c_{FT} + c_{FF} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters: - u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Hamming distance
Return type: float
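Since the denominator is the total number of positions, this is the fraction of positions at which u and v disagree. The pure-Python sketch below (an illustration; `bool_counts` is a hypothetical helper, not a dask_distance function) evaluates the count formula and compares it with that direct definition.

```python
from collections import Counter


def bool_counts(u, v):
    """Hypothetical helper: return (c_TT, c_TF, c_FT, c_FF) for two bool sequences."""
    c = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return c[True, True], c[True, False], c[False, True], c[False, False]


def hamming(u, v):
    """Hamming distance: (c_TF + c_FT) / (c_TT + c_TF + c_FT + c_FF)."""
    c_tt, c_tf, c_ft, c_ff = bool_counts(u, v)
    return (c_tf + c_ft) / (c_tt + c_tf + c_ft + c_ff)


u = [True, True, True, False, False]
v = [True, False, True, True, False]
d = hamming(u, v)                                      # 2 disagreements out of 5 positions
direct = sum(a != b for a, b in zip(u, v)) / len(u)    # same quantity, counted directly
```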
-
dask_distance.jaccard(u, v)
Finds the Jaccard-Needham dissimilarity between two 1-D bool arrays.
\[\frac{ c_{TF} + c_{FT} }{ c_{TT} + c_{TF} + c_{FT} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters: - u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Jaccard-Needham dissimilarity
Return type: float
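The Jaccard-Needham formula differs from the Hamming one only by leaving \(c_{FF}\) out of the denominator, so positions where both arrays are False do not count. A pure-Python sketch of it, using a hypothetical `bool_counts` helper (neither function is dask_distance's code):

```python
from collections import Counter


def bool_counts(u, v):
    """Hypothetical helper: return (c_TT, c_TF, c_FT, c_FF) for two bool sequences."""
    c = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return c[True, True], c[True, False], c[False, True], c[False, False]


def jaccard(u, v):
    """Jaccard-Needham dissimilarity: (c_TF + c_FT) / (c_TT + c_TF + c_FT)."""
    c_tt, c_tf, c_ft, _ = bool_counts(u, v)  # c_FF is ignored
    return (c_tf + c_ft) / (c_tt + c_tf + c_ft)


u = [True, True, True, False, False]
v = [True, False, True, True, False]
d = jaccard(u, v)  # 2 / (2 + 1 + 1)
```

Note that the formula is 0/0 when both arrays are entirely False; the sketch would raise `ZeroDivisionError` in that case.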
-
dask_distance.kulsinski(u, v)
Finds the Kulsinski dissimilarity between two 1-D bool arrays.
\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters: - u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Kulsinski dissimilarity
Return type: float
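A pure-Python sketch of the Kulsinski formula as written above, with a hypothetical `bool_counts` helper (an illustration, not dask_distance's implementation):

```python
from collections import Counter


def bool_counts(u, v):
    """Hypothetical helper: return (c_TT, c_TF, c_FT, c_FF) for two bool sequences."""
    c = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return c[True, True], c[True, False], c[False, True], c[False, False]


def kulsinski(u, v):
    """Kulsinski dissimilarity: (2 (c_TF + c_FT) + c_FF) / (c_TT + 2 (c_TF + c_FT) + c_FF)."""
    c_tt, c_tf, c_ft, c_ff = bool_counts(u, v)
    mismatches = 2 * (c_tf + c_ft)
    return (mismatches + c_ff) / (c_tt + mismatches + c_ff)


u = [True, True, True, False, False]
v = [True, False, True, True, False]
d = kulsinski(u, v)  # (4 + 1) / (2 + 4 + 1)
```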
-
dask_distance.mahalanobis(u, v, VI)
Finds the Mahalanobis distance between two 1-D arrays.
\[\sqrt{ (u - v) \cdot V^{-1} \cdot (u - v)^{T} }\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
- VI – Inverse of the covariance matrix
Returns: Mahalanobis distance
Return type: float
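A pure-Python sketch of the quadratic form under the square root, with VI given as a nested list (an illustration, not dask_distance's code). With the identity matrix as VI the distance reduces to the Euclidean distance; with a diagonal VI each squared coordinate difference is scaled by the corresponding entry.

```python
import math


def mahalanobis(u, v, VI):
    """Mahalanobis distance: sqrt((u - v) . VI . (u - v)^T), VI = inverse covariance."""
    d = [a - b for a, b in zip(u, v)]
    n = len(d)
    # Expand the quadratic form as a double sum over the matrix entries.
    q = sum(d[i] * VI[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)


identity = [[1, 0], [0, 1]]
scaled = [[0.25, 0], [0, 1]]  # inverse of the diagonal covariance diag(4, 1)

d_identity = mahalanobis([0, 0], [3, 4], identity)  # reduces to Euclidean: 5.0
d_scaled = mahalanobis([0, 0], [3, 4], scaled)      # sqrt(9 / 4 + 16)
```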
-
dask_distance.minkowski(u, v, p)
Finds the Minkowski distance between two 1-D arrays.
\[\left( \sum_{i} \lvert u_{i} - v_{i} \rvert^{p} \right)^{\frac{1}{p}}\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
- p – degree of the norm to use
Returns: Minkowski distance
Return type: float
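A pure-Python sketch of the Minkowski formula (illustrative only, not the library's implementation). Setting \(p = 1\) gives the City Block distance and \(p = 2\) the Euclidean distance; as \(p\) grows the value decreases toward the Chebyshev distance, which is its limit as \(p \to \infty\).

```python
def minkowski(u, v, p):
    """Minkowski distance: (sum_i |u_i - v_i|^p)^(1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)


u, v = [0, 0], [3, 4]
d1 = minkowski(u, v, 1)  # p = 1, City Block: 3 + 4
d2 = minkowski(u, v, 2)  # p = 2, Euclidean: sqrt(9 + 16)
d3 = minkowski(u, v, 3)  # (27 + 64) ** (1/3), between Chebyshev (4) and Euclidean
```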
-
dask_distance.pdist(X, metric='euclidean', **kwargs)
Finds the distances between each pair of points in X using the given metric, returned as a condensed distance matrix.
Parameters: - X – 2-D array of points
- metric – string or callable
- **kwargs – provided to the metric (see below)
Keyword Arguments: - p – p-norm for minkowski only (default: 2)
- V – 1-D array of variances for seuclidean only (default: estimated from X)
- VI – Inverse of the covariance matrix for mahalanobis only (default: estimated from X)
- w – 1-D array of weights for wminkowski only (required)
Returns: condensed distance between each pair
Return type: array
Note
pdist tries to avoid redundant computations where possible. How much it can avoid is limited by the chunk size of X, particularly along the first dimension: smaller chunks increase the savings, though they may bring other tradeoffs (for example, more tasks to schedule).
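Assuming the same condensed layout SciPy uses (the upper triangle of the full distance matrix, read row by row), the hypothetical pure-Python reference below shows what the condensed result contains: each unordered pair \(i < j\) is evaluated once and the all-zero diagonal is omitted, so \(n\) points give \(n(n - 1)/2\) distances. `pdist_reference` is not a dask_distance function, and the sketch says nothing about how the library splits work across chunks.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pdist_reference(X, metric=euclidean):
    """Hypothetical reference: metric(X[i], X[j]) for every i < j, row by row."""
    n = len(X)
    return [metric(X[i], X[j]) for i in range(n) for j in range(i + 1, n)]


X = [[0, 0], [3, 4], [6, 8]]
d = pdist_reference(X)  # pairs (0, 1), (0, 2), (1, 2)
```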
-
dask_distance.rogerstanimoto(u, v)
Finds the Rogers-Tanimoto dissimilarity between two 1-D bool arrays.
\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters: - u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Rogers-Tanimoto dissimilarity
Return type: float
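The Rogers-Tanimoto formula counts every mismatch twice, in both the numerator and the denominator. A pure-Python sketch with a hypothetical `bool_counts` helper (an illustration, not dask_distance's code):

```python
from collections import Counter


def bool_counts(u, v):
    """Hypothetical helper: return (c_TT, c_TF, c_FT, c_FF) for two bool sequences."""
    c = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return c[True, True], c[True, False], c[False, True], c[False, False]


def rogerstanimoto(u, v):
    """Rogers-Tanimoto: 2 (c_TF + c_FT) / (c_TT + 2 (c_TF + c_FT) + c_FF)."""
    c_tt, c_tf, c_ft, c_ff = bool_counts(u, v)
    mismatches = 2 * (c_tf + c_ft)  # mismatches weighted double
    return mismatches / (c_tt + mismatches + c_ff)


u = [True, True, True, False, False]
v = [True, False, True, True, False]
d = rogerstanimoto(u, v)  # 4 / (2 + 4 + 1)
```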
-
dask_distance.russellrao(u, v)
Finds the Russell-Rao dissimilarity between two 1-D bool arrays.
\[\frac{ c_{TF} + c_{FT} + c_{FF} } { c_{TT} + c_{TF} + c_{FT} + c_{FF} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters: - u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Russell-Rao dissimilarity
Return type: float
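The Russell-Rao numerator is every position except the shared Trues, i.e. \(n - c_{TT}\) for arrays of length \(n\), so even positions where both arrays are False add to the dissimilarity. A pure-Python sketch with a hypothetical `bool_counts` helper (neither is dask_distance's code):

```python
from collections import Counter


def bool_counts(u, v):
    """Hypothetical helper: return (c_TT, c_TF, c_FT, c_FF) for two bool sequences."""
    c = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return c[True, True], c[True, False], c[False, True], c[False, False]


def russellrao(u, v):
    """Russell-Rao: (c_TF + c_FT + c_FF) / (c_TT + c_TF + c_FT + c_FF)."""
    c_tt, c_tf, c_ft, c_ff = bool_counts(u, v)
    return (c_tf + c_ft + c_ff) / (c_tt + c_tf + c_ft + c_ff)


u = [True, True, True, False, False]
v = [True, False, True, True, False]
d = russellrao(u, v)                     # 3 / 5
n_minus_tt = (len(u) - 2) / len(u)       # same value via n - c_TT, with c_TT = 2 here
```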
-
dask_distance.seuclidean(u, v, V)
Finds the standardized Euclidean distance between two 1-D arrays.
\[\sqrt{\sum_{i} \left( \frac{\left( u_{i} - v_{i} \right)^{2}}{V_{i}} \right)}\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
- V – 1-D array of variances
Returns: standardized Euclidean distance
Return type: float
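A pure-Python sketch of the standardized Euclidean formula (illustrative only, not the library's code). Each squared difference is divided by the corresponding variance, so unit variances give the plain Euclidean distance, and the result equals the Mahalanobis distance with a diagonal VI whose entries are \(1 / V_{i}\).

```python
import math


def seuclidean(u, v, V):
    """Standardized Euclidean distance: each squared difference is divided by V_i."""
    return math.sqrt(sum((a - b) ** 2 / var for a, b, var in zip(u, v, V)))


d_unit = seuclidean([0, 0], [3, 4], [1, 1])    # unit variances: plain Euclidean
d_scaled = seuclidean([0, 0], [3, 4], [4, 1])  # sqrt(9 / 4 + 16 / 1)
```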
-
dask_distance.sokalmichener(u, v)
Finds the Sokal-Michener dissimilarity between two 1-D bool arrays.
\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) + c_{FF} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters: - u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Sokal-Michener dissimilarity
Return type: float
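As written, this formula is term-by-term identical to the one given for rogerstanimoto, so for bool input the two dissimilarities coincide. The pure-Python sketch below (an illustration with a hypothetical `bool_counts` helper, not dask_distance's code) groups it as \(R / (S + R)\) with \(R = 2(c_{TF} + c_{FT})\) mismatches and \(S = c_{TT} + c_{FF}\) matches.

```python
from collections import Counter


def bool_counts(u, v):
    """Hypothetical helper: return (c_TT, c_TF, c_FT, c_FF) for two bool sequences."""
    c = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return c[True, True], c[True, False], c[False, True], c[False, False]


def sokalmichener(u, v):
    """Sokal-Michener: R / (S + R), R = 2 (c_TF + c_FT), S = c_TT + c_FF."""
    c_tt, c_tf, c_ft, c_ff = bool_counts(u, v)
    r = 2 * (c_tf + c_ft)  # weighted mismatches
    s = c_tt + c_ff        # matches
    return r / (s + r)


u = [True, True, True, False, False]
v = [True, False, True, True, False]
d = sokalmichener(u, v)  # 4 / (3 + 4)
```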
-
dask_distance.sokalsneath(u, v)
Finds the Sokal-Sneath dissimilarity between two 1-D bool arrays.
\[\frac{ 2 \cdot \left(c_{TF} + c_{FT}\right) } { c_{TT} + 2 \cdot \left(c_{TF} + c_{FT}\right) }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters: - u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Sokal-Sneath dissimilarity
Return type: float
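The Sokal-Sneath formula is the Rogers-Tanimoto one with \(c_{FF}\) dropped from the denominator, so shared Falses are ignored. A pure-Python sketch with a hypothetical `bool_counts` helper (an illustration, not dask_distance's code):

```python
from collections import Counter


def bool_counts(u, v):
    """Hypothetical helper: return (c_TT, c_TF, c_FT, c_FF) for two bool sequences."""
    c = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return c[True, True], c[True, False], c[False, True], c[False, False]


def sokalsneath(u, v):
    """Sokal-Sneath: 2 (c_TF + c_FT) / (c_TT + 2 (c_TF + c_FT))."""
    c_tt, c_tf, c_ft, _ = bool_counts(u, v)  # c_FF is not used
    mismatches = 2 * (c_tf + c_ft)
    return mismatches / (c_tt + mismatches)


u = [True, True, True, False, False]
v = [True, False, True, True, False]
d = sokalsneath(u, v)  # 4 / (2 + 4)
```

As with Jaccard-Needham, the formula is 0/0 when both arrays are entirely False.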
-
dask_distance.sqeuclidean(u, v)
Finds the squared Euclidean distance between two 1-D arrays.
\[\lVert u - v \rVert_{2}^{2}\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
Returns: squared Euclidean distance
Return type: float
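A pure-Python sketch of the squared Euclidean distance (illustrative only, not the library's code). Skipping the square root makes it cheaper, but it is not a true metric: the last lines show three points on a line for which the triangle inequality fails.

```python
def sqeuclidean(u, v):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


d = sqeuclidean([0, 0], [3, 4])  # 9 + 16

# Points 0, 1, 2 on a line: d(0, 2) = 4 exceeds d(0, 1) + d(1, 2) = 1 + 1.
direct = sqeuclidean([0], [2])
via_midpoint = sqeuclidean([0], [1]) + sqeuclidean([1], [2])
```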
-
dask_distance.squareform(X, force='no')
Converts a distance matrix between its condensed form (a 1-D vector) and its square form (a 2-D symmetric matrix).
Parameters: - X – 2-D square symmetric matrix or 1-D vector of distances
- force – whether to force to a vector or a matrix
Returns: 1-D vector or 2-D square symmetric matrix of distances
Return type: array
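The two conversions can be sketched in pure Python, assuming the condensed vector holds the upper triangle of the square matrix row by row (the convention SciPy uses). `to_matrix` and `to_vector` are hypothetical names for illustration, not dask_distance functions.

```python
import math


def to_matrix(d):
    """Condensed vector -> square symmetric matrix with a zero diagonal."""
    # Solve n * (n - 1) / 2 == len(d) for the number of points n.
    n = int(round((1 + math.sqrt(1 + 8 * len(d))) / 2))
    if n * (n - 1) // 2 != len(d):
        raise ValueError("length is not a triangular number")
    M = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            M[i][j] = M[j][i] = d[k]  # fill both halves to keep M symmetric
            k += 1
    return M


def to_vector(M):
    """Square symmetric matrix -> condensed vector (upper triangle, row by row)."""
    n = len(M)
    return [M[i][j] for i in range(n) for j in range(i + 1, n)]


d = [5.0, 10.0, 5.0]  # distances for the pairs (0, 1), (0, 2), (1, 2)
M = to_matrix(d)
```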
-
dask_distance.wminkowski(u, v, p, w)
Finds the weighted Minkowski distance between two 1-D arrays.
\[\left( \sum_{i} \lvert w_{i} \cdot (u_{i} - v_{i}) \rvert^{p} \right)^{ \frac{1}{p} }\]
Parameters: - u – 1-D array or collection of 1-D arrays
- v – 1-D array or collection of 1-D arrays
- p – degree of the norm to use
- w – 1-D array of weights
Returns: weighted Minkowski distance
Return type: float
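A pure-Python sketch of the weighted formula (illustrative only, not dask_distance's code). Because each weight sits inside the \(p\)-th power, it effectively contributes \(w_{i}^{p}\); this differs from the `w` argument of SciPy's `minkowski`, which multiplies \(\lvert u_{i} - v_{i} \rvert^{p}\) by \(w_{i}\).

```python
def wminkowski(u, v, p, w):
    """Weighted Minkowski distance: (sum_i |w_i * (u_i - v_i)|^p)^(1/p)."""
    return sum(abs(wi * (a - b)) ** p for a, b, wi in zip(u, v, w)) ** (1 / p)


u, v = [0, 0], [3, 4]
d_unweighted = wminkowski(u, v, 2, [1, 1])  # unit weights: plain Minkowski, 5.0
d_weighted = wminkowski(u, v, 1, [2, 0])    # |2 * -3| + |0 * -4| = 6.0
d_doubled = wminkowski(u, v, 2, [2, 2])     # sqrt(6**2 + 8**2) = 10.0
```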
-
dask_distance.yule(u, v)
Finds the Yule dissimilarity between two 1-D bool arrays.
\[\frac{ 2 \cdot c_{TF} \cdot c_{FT} } { c_{TT} \cdot c_{FF} + c_{TF} \cdot c_{FT} }\]
where \(c_{XY} = \sum_{i} \delta_{u_{i} X} \delta_{v_{i} Y}\)
Parameters: - u – 1-D bool array or collection of 1-D bool arrays
- v – 1-D bool array or collection of 1-D bool arrays
Returns: Yule dissimilarity
Return type: float
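Unlike the other boolean dissimilarities, the Yule formula multiplies counts rather than adding them. A pure-Python sketch with a hypothetical `bool_counts` helper (an illustration, not dask_distance's code):

```python
from collections import Counter


def bool_counts(u, v):
    """Hypothetical helper: return (c_TT, c_TF, c_FT, c_FF) for two bool sequences."""
    c = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return c[True, True], c[True, False], c[False, True], c[False, False]


def yule(u, v):
    """Yule dissimilarity: 2 c_TF c_FT / (c_TT c_FF + c_TF c_FT)."""
    c_tt, c_tf, c_ft, c_ff = bool_counts(u, v)
    return (2 * c_tf * c_ft) / (c_tt * c_ff + c_tf * c_ft)


u = [True, True, True, False, False]
v = [True, False, True, True, False]
d = yule(u, v)  # 2 * 1 * 1 / (2 * 1 + 1 * 1)
```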