If you’re like most developers, you probably have no idea what probabilistic data structures are. In fact, I did a super-scientific poll on Twitter and found that out of 119 participants, 58% had never heard of them and 22% had heard the term but nothing more. I wonder what percentage of that 22% heard the term for the first time in the poll. We’re a literal-minded lot at times.
Anyhow. That’s 4 out of 5 developers. And that’s a lot of folks that need to be educated. So, let’s do that.
A probabilistic data structure is, well, they’re sort of like the TARDIS—bigger on the inside—and JPEG compression—a bit lossy. And, like both, they are fast, accurate enough, and can take you to interesting places of adventure. That last one might not be something a JPEG does.
More technically speaking, most probabilistic data structures use hashes to give you faster and smaller data structures in exchange for precision. If you’ve got a mountain of data to process, this is super useful. In this talk, we’ll briefly go over some common probabilistic data structures; dive deep into a couple (Bloom Filter, MinHash, and Top-K); and show a running application that makes use of Top-K in Redis to analyze the most commonly used words in all 112,092 of my UFO sightings.
When we’re done, you’ll be ready to start using some of these structures in your own applications.
Scheduled on Tuesday from 15:50 to 16:40 in Room E