This is the second of a two-part post about how we improved query performance on our analytics dashboard by over 7000x, just by moving some of our data from MySQL to Redis. Part 1 was a technical explanation of the setup, while part 2 shows the benchmarks we saw when fetching the same data from both systems.
We use Redis a lot. It is fast, stable, effective and awesome! This time, we found Redis useful in solving a painful problem: counting unique users for multiple different filters.
We recently found a new feature in Redis (new for us at least): HyperLogLog. HyperLogLog is a probabilistic data structure that makes estimating the number of distinct objects in a set very fast (actually, more like blazing fast), at the cost of a small standard error (you can read more about it here). The moment we read about HyperLogLog we knew there was something in it. And now that Redis has made it so easy to use, our testing started almost immediately.
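To give a feel for how such a structure can estimate a distinct count without storing the elements themselves, here is a minimal sketch of the textbook HyperLogLog algorithm in Python. It is not Redis's implementation: the class name TinyHLL, the SHA-1 hash and the 14-bit register index are assumptions picked for illustration, and the only correction applied is the standard small-range (linear counting) one.

```python
import hashlib
import math


class TinyHLL:
    """Illustrative HyperLogLog; not how Redis implements it internally."""

    def __init__(self, p=14):
        self.p = p                     # number of hash bits used as register index
        self.m = 1 << p                # number of registers (16384 for p=14)
        self.registers = [0] * self.m

    def add(self, item):
        # Take 64 bits of a hash of the item (SHA-1 is an arbitrary choice here).
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant for m >= 128
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


hll = TinyHLL()
for i in range(100_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")   # duplicates never raise a register, so they are ignored
print("estimated distinct users:", hll.count())
```

Memory stays fixed at one small register per slot no matter how many users are added, which is where the speed comes from; the price is that count() returns an estimate rather than an exact number.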
We Want Real-Time Data
Until now, we kept all data about unique users in MySQL. The data was saved in different variations, ready for filtering (country, day …). As time went by, our queries became slower and slower. It was a pretty grim situation: every optimization we tried on MySQL showed us there was no real solution there. Several different approaches were suggested to us, using Redshift, Hadoop or ElasticSearch, but we didn't want our data to reach our users with any delay. We wanted a complete, real-time presentation of the data in our dashboard, updated instantly by our background workers.
Redis to The Rescue
Once we had Redis running and had migrated the MySQL data in, the results were astonishing. We had been tweaking MySQL for a couple of months to try to make distinct counting faster, and results were mediocre at best (through no fault of MySQL's; we were counting cardinality in tables of 10 million+ rows), but Redis was FAST. Speed wasn't the only thing we had to benchmark, though: we weren't sure how well the 0.8% standard error Redis promises for HyperLogLog would hold up when we ran queries on our data.
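Part 1 describes the actual setup. Purely as an illustration of how filtered unique counts can be served from HyperLogLog keys, the sketch below assumes one key per (country, day) pair. The key layout and the helper names hll_key, record_visit and unique_users are hypothetical; the client is assumed to expose pfadd and pfcount the way redis-py does.

```python
from datetime import date, timedelta


def hll_key(country, day):
    """Hypothetical key layout: one HyperLogLog per (country, day)."""
    return f"uniques:{country}:{day.isoformat()}"


def record_visit(client, user_id, country, day):
    # PFADD adds the element to the HyperLogLog stored at the key,
    # creating the key if it does not exist yet.
    client.pfadd(hll_key(country, day), user_id)


def unique_users(client, country, start, end):
    # PFCOUNT with several keys returns the estimated cardinality of
    # the union, so a user seen on two days is only counted once.
    days = (end - start).days + 1
    keys = [hll_key(country, start + timedelta(days=i)) for i in range(days)]
    return client.pfcount(*keys)
```

With redis-py, client would be something like redis.Redis(), and a background worker would call record_visit for each event. A date-range query then costs one PFCOUNT over a handful of small keys instead of a scan over millions of rows.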
MySQL Is Underperforming
To get us started, here is a benchmark of some of the many different ways we tried tweaking MySQL specifically for
We tried different query and index structures. The conclusions we drew from the process:
- SELECT COUNT(*) FROM (SELECT * GROUP BY id) seemed to consistently work better than SELECT COUNT(DISTINCT id).
- MySQL Workbench is awesome.
- With 10M rows and growing every day, MySQL just wasn't the tool for counting the cardinality of our user data.
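For readers who want to compare the two query shapes from the first conclusion, here is a small runnable sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, so it says nothing about MySQL timings. The events table and its contents are made up, and the subquery selects only id (rather than *) so that it stays valid under strict GROUP BY rules.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "US"), (1, "US"), (2, "US"), (3, "DE"), (3, "DE"), (3, "FR")],
)

# The two query shapes compared in the list above.
distinct_form = "SELECT COUNT(DISTINCT id) FROM events"
group_by_form = "SELECT COUNT(*) FROM (SELECT id FROM events GROUP BY id) AS t"

count_distinct = conn.execute(distinct_form).fetchone()[0]
count_grouped = conn.execute(group_by_form).fetchone()[0]
print(count_distinct, count_grouped)  # 3 3
```

The two forms agree only when id is never NULL: COUNT(DISTINCT id) skips NULLs, while GROUP BY puts them in a group of their own that COUNT(*) then counts.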
Revelation of Goodness
Once we migrated all of our MySQL data into Redis keys, we saw Redis zip by MySQL in the blink of an eye.
There's no mistake in that graph. We tried to chart the query times of both MySQL and Redis on the same graph, but you probably can't see Redis's values there. Here's a close-up of the Redis query times.
The Fly in The Ointment
It can't all be this good. HyperLogLog only gives an estimate, so it was time to compare the estimates to the actual MySQL counts. For most queries, the difference was much smaller than the 0.8% standard error (the smallest was 0.03%), but after benchmarking many different queries, we also had two that reached errors of 1.1% and 1.7%.
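It helps to remember that the quoted figure is a standard error, not an upper bound. Redis's HyperLogLog uses 16384 registers, and the usual HyperLogLog formula of 1.04 divided by the square root of the register count gives about 0.81%. If the estimation error is roughly normally distributed (an assumption), around a third of the estimates would land more than one standard error from the true count and a few percent more than two, so occasional misses above 0.8% are expected. The sketch below puts an error in those terms; the counts are hypothetical, chosen to reproduce the 1.7% case.

```python
import math

REGISTERS = 16384                              # registers in a Redis HyperLogLog
STANDARD_ERROR = 1.04 / math.sqrt(REGISTERS)   # 0.008125, i.e. about 0.81%


def relative_error(estimate, exact):
    """Absolute difference between estimate and exact count, as a fraction."""
    return abs(estimate - exact) / exact


def sigmas(estimate, exact):
    """How many standard errors the estimate is away from the exact count."""
    return relative_error(estimate, exact) / STANDARD_ERROR


# Hypothetical counts that give a 1.7% error.
exact, estimate = 1_000_000, 1_017_000
print(f"{relative_error(estimate, exact):.2%}")   # 1.70%
print(f"{sigmas(estimate, exact):.1f} sigma")     # 2.1 sigma
```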
In the end, these errors were acceptable for some of our use cases. We're still saving exact counts outside of Redis … just in case.
HyperLogLog is a very powerful tool for counting unique entities. You should definitely use it if you’re willing to accept its minor standard error.