This is the most underrated feature in TiDB

Morgan Tocker

Several years ago, Alexander Rubin (a friend of mine!) published a blog post on percona.com that showed TiDB performs about 5-10x worse than MySQL on a single server. I was invited to review a draft of this post, and apart from suggesting a few clarifications, I agreed the findings were correct.

However, if you talk to users that deploy both TiDB and MySQL, you will frequently hear them report that TiDB actually requires fewer servers than sharded MySQL.

Wait, that doesn’t make sense! How can both of these facts be true at the same time?

Understanding Alex’s blog post

Alex demonstrated two different types of tests:

  1. An analytical workload, based on some sample queries he selected. Here TiDB performed better (by latency) than MySQL because of its multi-threaded query execution. These queries were performed on TiKV, as TiFlash (column storage) was not available at the time.
  2. A transactional workload, based on sysbench. Here TiDB performed 5-10x worse (by throughput), and although this was performed on an older version of TiDB, it matches my experience. I still see a single server performance gap of about 3x when comparing MySQL to TiDB.

But the part about the test being performed on a single server needs to be emphasized: the performance benefits of TiDB only become clear once you use multiple servers.

Why does TiDB require fewer servers?

The simple answer: TiDB allows you to utilize your hardware better, and there are two main reasons for this:

#1 Simplified Topology

Each TiKV node is both a source and a replica (to use MySQL terms) at the same time. In MySQL deployments, replicas are typically either on standby, or, where there is a read/write split, they perform more work than the source, since they must both apply every write and serve reads.

In TiDB clusters, the data is divided into chunks of roughly 100 MB (called Regions), and a single TiKV server will be the source for some of them and a replica for others (in TiKV terms: leader and follower).
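
You can see this split directly from SQL. As a rough sketch (the orders table name is just an example, and the exact output columns vary by TiDB version), SHOW TABLE ... REGIONS lists each Region of a table along with the store that currently holds its leader:

    -- Illustrative only: "orders" is a made-up table name.
    -- List the Regions (~100 MB chunks) that make up the table, and which
    -- TiKV store is the leader (source) for each one.
    SHOW TABLE orders REGIONS;

    -- Output typically includes REGION_ID, START_KEY, END_KEY,
    -- LEADER_STORE_ID and PEERS; the same store shows up as the leader
    -- for some Regions and as a follower for others.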

Historically, the limiting factor in this MySQL topology was always replication lag. Parallel apply was introduced in MySQL 5.6/5.7, and then significantly improved again in MySQL 8.0. But in general, the binary log (not InnoDB) is still the limiting factor for workloads with high modification rates. The impact of the binary log was not in scope for Alex's single-server test.
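
For comparison, this is what you would typically be watching on a MySQL replica. A minimal sketch using the MySQL 8.0 names (older versions use the SLAVE/slave_* equivalents):

    -- How far behind is this replica? Seconds_Behind_Source in the output
    -- is the usual (rough) indicator of replication lag.
    SHOW REPLICA STATUS\G

    -- Parallel apply settings (renamed from slave_* in MySQL 8.0),
    -- e.g. replica_parallel_workers and replica_parallel_type.
    SHOW VARIABLES LIKE 'replica_parallel%';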

#2 TiDB requires less over-provisioning for hotspots than MySQL

I think it’s well understood that most datasets have hotspots - whether it be some tenants on a SaaS system, or some SKUs that are a lot more popular than others. So if you have a database sharded into 20 shards, it is typical that the most popular shard has 3x more load than the average.

But what is less well understood is that hotspots change. That is to say, what was popular last week might not be popular today. Sometimes this is obvious (newer content is hotter), but not all trends are predictable.

Most of the time, when a user shards MySQL, it is largely a one-time event. They seldom intend to split a shard if it suddenly becomes hot, and they are even less likely to merge it back when the hotspot dies down.

Why not? It takes a lot of work! It’s easier to over-provision each shard so that it can absorb all but the very largest hotspots without manual intervention. For some users I’ve spoken to, this can be a 5x or even 10x over-provisioning factor.

In TiDB, however, hotspots are automatically balanced across the cluster. This is only possible because both of the following are true:

  1. At roughly 100 MB, the data chunk (Region) size is relatively small.
  2. TiDB always uses two-phase commit internally, so moving data around does not create a shard-boundary problem where changes are no longer atomic.

Thus, because TiDB is able to dynamically balance hotspots, it effectively requires a lower over-provisioning factor than MySQL.
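
You can also watch this happening. As a rough sketch (the exact columns vary by TiDB version, and newer releases add a TIDB_HOT_REGIONS_HISTORY table), the information_schema exposes the Regions that PD currently considers hot:

    -- Regions currently tracked as read or write hotspots. These are the
    -- chunks that PD's hot-region scheduling will move or rebalance
    -- across TiKV stores automatically.
    SELECT * FROM information_schema.TIDB_HOT_REGIONS;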

Conclusion

Both features can be classified as “scheduling”, which is performed by the PD component of a TiDB cluster. In my opinion, scheduling is the most underrated feature of TiDB when compared to sharded MySQL.
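
If you want to see what PD is working with, its scheduling configuration is also visible from SQL. A sketch, assuming a reasonably recent cluster (the exact key names differ between versions):

    -- PD's scheduling-related settings, including the limits that govern
    -- leader, Region, and hot-Region scheduling.
    SELECT * FROM information_schema.CLUSTER_CONFIG
    WHERE type = 'pd' AND `key` LIKE 'schedule.%';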

Why do we not talk about it enough? I have my own theory for that:

  • It’s not obvious how common hotspot/balance issues are until you are running a system at scale.
  • It has been a TiDB feature from the start. There tends to be a natural habit to talk about recent improvements only. I know I fall into this trap myself.

There are in fact many different schedulers that run in PD, including Leader Balance, Region Balance, and Hot Region Balance. While there are some limits to scheduling (it is more reactive than proactive), it is still an incredibly useful feature. I also see a bright future where, with AI, it should be possible to predict and eliminate hotspots before they occur.