UUID v4 vs v7: Why Your Database Will Thank You for Switching

The Problem Nobody Talks About Until Production Melts

You added UUID primary keys because you read that auto-increment integers are predictable and leak row counts. Fair enough. You reached for uuid4() because it was the obvious choice — it's random, it's unique, every library ships it. Six months later your DBA is staring at a query plan that shows a sequential scan across 40 million rows, and your inserts have degraded from 8,000/sec to under 2,000/sec. The UUIDs are not the problem. The version of UUID is.

UUID v7, finalized in RFC 9562 in May 2024, fixes the core performance pathology of v4 at the database layer. Understanding why requires a brief detour into how B-trees actually store and navigate data.

How B-Trees Fail Under Random Keys

A B-tree index is an ordered data structure. Every leaf page holds a sorted range of key values, and every internal node holds separator keys that tell the database engine which leaf page to descend into. This ordering is what makes range scans and lookups logarithmically fast — O(log n) rather than O(n).

When you insert a new row, the database has to find the correct sorted position for the new key and write it there. With a monotonically increasing key — an auto-increment integer, a timestamp, a sequence — new rows always land at the right edge of the index. The last leaf page is almost always in the buffer pool (it was just written), so the insert is essentially sequential memory access plus one or two logical reads to traverse the tree depth.

UUID v4 breaks this completely. A v4 UUID looks like this:

f47ac10b-58cc-4372-a567-0e02b2c3d479

128 bits of cryptographic randomness, uniformly distributed across the entire key space. Every insert lands at a random position in the B-tree. That position's leaf page is almost certainly not in the buffer pool. The database must perform a physical read to load it, update it, and write it back. At low row counts this is invisible. At tens of millions of rows, your working set of "recently touched index pages" approaches the size of the entire index, and your buffer pool hit rate collapses.

This effect has a name in PostgreSQL internals: write amplification. Each logical insert causes multiple physical page reads and writes as the tree rebalances around random positions. The phenomenon is well-documented in the PostgreSQL mailing lists and MySQL InnoDB internals documentation under "page splits" — when a leaf page fills up, it must be split in two, its parent node updated, and potentially that parent split as well. With sequential keys, splits happen at one predictable location (the right edge). With random keys, splits can happen anywhere, constantly.

What UUID v7 Actually Looks Like

RFC 9562 defines UUID v7 as a time-ordered identifier. The first 48 bits are a millisecond-precision Unix timestamp. Bits 48–51 are the version field (0111 = 7). The remaining 74 bits are random. A v7 UUID at the moment I'm writing this looks like:

0190f3a2-1b4c-7e8d-9a2b-f3c7d8e9f012

The leading segment changes slowly (it's a timestamp in milliseconds), and the trailing segment is random enough to prevent collisions even at high insertion rates across distributed nodes. Two inserts happening within the same millisecond will differ in those 74 random bits, and they'll sort adjacent to each other in the index — which is exactly the behavior you want.

This gives you the uniqueness guarantees of UUID (no coordination required, safe for distributed systems) with the insert locality of a sequential integer.

The Numbers: What "Insert Locality" Actually Buys You

Let me reason through the performance difference with concrete estimates rather than vague claims about "better performance."

A typical PostgreSQL installation on commodity hardware has a shared buffer pool configured at 25% of RAM — say 4 GB on a 16 GB instance. A B-tree leaf page is 8 KB. That gives you roughly 500,000 index pages in cache at any moment.

A table with 50 million rows and a UUID v4 primary key will have an index of approximately 2–3 GB (depending on fill factor and page density). The ratio of cache size to index size is around 1.5x to 2x — meaning a significant fraction of the index fits in memory, but as you insert more rows, cache pressure grows and your hit rate drops.

Here's the asymmetry: with UUID v4, inserts are uniformly distributed across all index pages. If you have 300,000 index leaf pages and your buffer pool holds 200,000 pages, roughly 1 in 3 inserts will target a page not in cache — a physical read. At 5,000 inserts per second, that's about 1,667 physical reads per second just for index maintenance. Your storage subsystem is doing double duty: handling actual data writes and constantly swapping index pages in and out of the buffer pool.

With UUID v7, inserts cluster near the right edge of the index. The "hot zone" of recently written index pages is small — typically just a handful of leaf pages near the maximum key value. These pages stay warm in the buffer pool essentially forever. Physical reads for index maintenance drop toward zero. Your storage I/O budget can be spent on what matters: serving reads.

Benchmarks run by the PostgreSQL community and corroborated by MySQL InnoDB testing (documented in the uuid-ossp and pgcrypto extension discussions) consistently show UUID v7 achieving 2x to 4x better insert throughput compared to UUID v4 at tables beyond ~10 million rows, with the gap widening as the table grows. The InnoDB team at Oracle has documented similar observations in their performance blog posts around the time UUIDv7 was being standardized.

The Secondary Index Problem

Primary keys aren't the only concern. In PostgreSQL and MySQL InnoDB, secondary indexes store the primary key as their row pointer. If your primary key is a random UUID, every secondary index implicitly carries 128 bits of entropy as its row locator. This affects:

  • Index size: A UUID primary key is 16 bytes versus 4 bytes for a 32-bit integer. Each secondary index row is 12 bytes larger than it would need to be.
  • Join performance: When the query planner performs an index scan followed by a heap fetch, each heap fetch requires looking up the primary key in the cluster. With a random UUID, that heap access is also random.
  • Covering index effectiveness: The larger the non-covering index rows, the fewer fit per page, the more pages must be read for a range scan.

UUID v7 doesn't eliminate these costs, but it substantially mitigates the clustering problem. Because recently inserted rows have temporally adjacent UUIDs, range queries on the primary key (which are common in "give me the last N items" patterns) benefit from physical locality on disk.

Adoption: Where the Ecosystem Stands

Language and library support has moved quickly since RFC 9562 was published.

Python: The standard library's uuid module added uuid.uuid7() in Python 3.14 (currently in release candidate). For production use today, the uuid6 package on PyPI is the defacto standard and has been for a couple of years — it's a single-file implementation with no dependencies, well-maintained, and commonly recommended in Django and SQLAlchemy communities.

JavaScript/TypeScript: The uuid package (the canonical one, 50+ million weekly downloads) has supported uuidv7() since version 9.0.0, released in 2023.

Go: The google/uuid package added uuid.NewV7() in v1.6.0 (late 2023).

Database-level generation: PostgreSQL doesn't natively generate UUID v7 yet (as of PostgreSQL 16), but the pg_uuidv7 extension provides a uuid_generate_v7() function. MySQL 8.0 and MariaDB don't have native support either. The community workaround is generating UUIDs in the application layer, which is generally preferable anyway — it means your application code knows the ID before the round-trip to the database, which simplifies a lot of patterns.

ORMs: Hibernate (Java) added UUID v7 support in 6.2. Doctrine (PHP) supports it through custom generators. Django's UUID field doesn't have built-in v7 support yet, but generating the value in Python code before passing it to the model is trivial.

Migration Considerations

If you have an existing table with UUID v4 primary keys, migrating is non-trivial. You can't simply change the format of existing keys — every foreign key reference, every application-level cache, every link handed out to users would break.

The practical approaches are:

Greenfield tables: Use v7 from day one. There's no compelling reason to use v4 for new tables unless you specifically want the keys to be opaque and non-ordered (some security contexts prefer non-predictable IDs; see below).

New tables in existing schemas: If you're adding tables that join to existing v4-keyed tables, you can use v7 for the new table's own primary key while storing the v4 foreign key as a lookup column.

Full migration: For high-value tables where insert performance is measurably degraded, a dual-write migration (write to both old and new tables during a transition window, then cut over reads) is the standard approach. Painful, but sometimes worth it at scale.

The Legitimate Argument for Keeping v4

There are real cases where UUID v4 is the right choice. If your IDs are exposed in URLs or APIs and you want them to be genuinely unpredictable — not just unique, but non-enumerable and non-sequential — then v7's timestamp prefix is a liability. An adversary who knows an approximate creation time can narrow the key space significantly. For short-lived authorization tokens, invite links, or password reset URLs, v4's full randomness is a feature.

The common pattern is to use v7 as the internal database primary key (never exposed externally) and generate a separate v4 token or a short opaque hash for any external-facing identifier. This is how systems like Stripe structure their object IDs — the database row has a sequential surrogate key, but the API exposes a different, non-sequential identifier.

The Bottom Line

UUID v7 is not a silver bullet for database performance, and switching to it won't rescue a poorly indexed schema or a query that needed an index months ago. But it directly addresses a real and well-understood pathology of using random keys in ordered data structures. If you're building a new system that will need UUIDs — and you don't have a specific reason to prefer non-sequential external identifiers at the primary key level — there's no good argument left for defaulting to v4.

The RFC is finalized. The libraries are there. Your B-tree will thank you.