HyperLogLogLog: Cardinality Estimation With One Log More

Matti Karppa, Rasmus Pagh

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

12 Citations (Scopus)
24 Downloads (Pure)

Abstract

We present HyperLogLogLog, a practical compression of the HyperLogLog sketch that compresses the sketch from O(mogog n) bits down to m log2log2log2 m + O(m+loglog n) bits for estimating the number of distinct elements∼n using m∼registers. The algorithm works as a drop-in replacement that preserves all estimation properties of the HyperLogLog sketch, it is possible to convert back and forth between the compressed and uncompressed representations, and the compressed sketch maintains mergeability in the compressed domain. The compressed sketch can be updated in amortized constant time, assuming n is sufficiently larger than m. We provide a C++ implementation of the sketch, and show by experimental evaluation against well-known implementations by Google and Apache that our implementation provides small sketches while maintaining competitive update and merge times. Concretely, we observed approximately a 40% reduction in the sketch size. Furthermore, we obtain as a corollary a theoretical algorithm that compresses the sketch down to mlog2log2log2log2 m+O(mlogloglog m/loglog m+loglog n) bits.

Original languageEnglish
Title of host publicationKDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery, Inc.
Publication date2022
Pages753-761
ISBN (Electronic)9781450393850
DOIs
Publication statusPublished - 2022
Event28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 - Washington, United States
Duration: 14 Aug 202218 Aug 2022

Conference

Conference28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Country/TerritoryUnited States
CityWashington
Period14/08/202218/08/2022
SponsorACM SIGKDD, ACM SIGMOD

Bibliographical note

Publisher Copyright:
© 2022 ACM.

Keywords

  • cardinality estimation
  • distinct elements
  • hashing
  • hyperloglog

Cite this