International Association for Cryptologic Research

International Association
for Cryptologic Research

CryptoDB

Compact Frequency Estimators in Adversarial Environments

Authors:
Sam A. Markelon
Mia Filić
Thomas Shrimpton
Download:
Search ePrint
Search Google
Presentation: Slides
Abstract: Count-Min Sketch (CMS) and HeavyKeeper (HK) are two realizations of a compact frequency estimator (CFE). These probabilistic data structures maintain a compact summary of high-volume streaming data and provide approximate estimates of the number of times an element occurred. CFEs are commonly used in streaming settings to identify elements with the largest frequencies (i.e., top-K elements, heavy hitters, elephant flows). Finding extreme elements is important for network planning, network monitoring, recommendation systems, etc. Traditionally, probabilistic guarantees on the accuracy of frequency estimates are proved under the implicit assumption that stream elements do not depend upon the internal randomness of the structure. This assumption is often not well-matched with reality; malicious actors could be incentivized to manipulate the data stream. In this talk, we reveal vulnerabilities in CMS and HK to adaptive attacks, by presenting attacks that cause significant estimation errors. For instance, elements never seen in the stream can be manipulated to resemble heavy hitters in CMS. This could, for example, cause network flow monitoring systems relying on CFEs to identify non-existent or benign flows as possible threats. Conversely, HK can make legitimate heavy hitters disappear. We analyze our attacks analytically and experimentally, obtaining a tight agreement between the two. These negative results seem unavoidable for (at least) sketch-based CFEs with parameters that are reasonable in practice. On the positive side, we build a new CFE (Count-Keeper) that can be seen as a composition of the CMS and HK structures. Count-Keeper estimates are typically more accurate (by at least a factor of two) than CMS for “honest” streams. Further, our attacks against CMS and HK are less effective (and more resource intensive) when used against Count-Keeper. Lastly, Count-Keeper has a native ability to flag estimates that are suspicious, which neither CMS or HK (or any other CFE, to our knowledge) admits.
Video: https://www.youtube.com/watch?v=EZzSEvqMmKA
BibTeX
@misc{rwc-2024-35378,
  title={Compact Frequency Estimators in Adversarial Environments},
  note={Video at \url{https://www.youtube.com/watch?v=EZzSEvqMmKA}},
  howpublished={Talk given at RWC 2024},
  author={Sam A. Markelon and Mia Filić and Thomas Shrimpton},
  year=2024
}