TY - GEN
T1 - Hardness of bichromatic closest pair with Jaccard similarity
AU - Pagh, Rasmus
AU - Stausholm, Nina Mesing
AU - Thorup, Mikkel
PY - 2019
Y1 - 2019
N2 - Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A × B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a ∩ b|/|a ∪ b| for (a, b) ∈ A × B. We consider the approximate version of the problem where we are given thresholds j1 > j2 and wish to return a pair from A × B that has Jaccard similarity higher than j2 if there exists a pair in A × B with Jaccard similarity at least j1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC’98), instantiated with the MinHash LSH function of Broder et al., solves this problem in Õ(n2−δ) time if j1 ≥ j21−δ. In particular, for δ = Ω(1), the approximation ratio j1/j2 = 1/j2δ increases polynomially in 1/j2. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n2−Ω(1)) time for j1/j2 = 1/j2o(1). Specifically, assuming OVC, we prove that for any δ > 0 there exists an ε > 0 such that Bichromatic Closest Pair with Jaccard similarity requires time Ω(n2−δ) for any choice of thresholds j2 < j1 < 1 − δ, that satisfy j1 ≤ j21−ε
AB - Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A × B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a ∩ b|/|a ∪ b| for (a, b) ∈ A × B. We consider the approximate version of the problem where we are given thresholds j1 > j2 and wish to return a pair from A × B that has Jaccard similarity higher than j2 if there exists a pair in A × B with Jaccard similarity at least j1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC’98), instantiated with the MinHash LSH function of Broder et al., solves this problem in Õ(n2−δ) time if j1 ≥ j21−δ. In particular, for δ = Ω(1), the approximation ratio j1/j2 = 1/j2δ increases polynomially in 1/j2. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n2−Ω(1)) time for j1/j2 = 1/j2o(1). Specifically, assuming OVC, we prove that for any δ > 0 there exists an ε > 0 such that Bichromatic Closest Pair with Jaccard similarity requires time Ω(n2−δ) for any choice of thresholds j2 < j1 < 1 − δ, that satisfy j1 ≤ j21−ε
KW - Bichromatic closest pair
KW - Fine-grained complexity
KW - Jaccard similarity
KW - Set similarity search
U2 - 10.4230/LIPIcs.ESA.2019.74
DO - 10.4230/LIPIcs.ESA.2019.74
M3 - Article in proceedings
AN - SCOPUS:85074870798
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 27th Annual European Symposium on Algorithms, ESA 2019
A2 - Bender, Michael A.
A2 - Svensson, Ola
A2 - Herman, Grzegorz
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 27th Annual European Symposium on Algorithms, ESA 2019
Y2 - 9 September 2019 through 11 September 2019
ER -