Dan Donahue bio photo

Dan Donahue

Musician. Traveler. Programmer.

LinkedIn Github Stackoverflow

In most applications, developers use integers as their primary keys. And why not? They're dead simple. Most databases have a mechanism with which they can auto-generate these ids, they're a native language construct in every language imaginable and they work. They're also small data types - 4 bytes for an integer - so they don't take up too much space and perform slightly faster when you query.

GUIDs, however, have some benefits that are hard to ignore. And in fact, when building a system with disparate services that are autonomous, it's actually easier to use them than what you're used to. Allow me to explain.

The biggest benefit from a service-oriented standpoint is that they are generated on the client rather than the server. This means you don't have to go to the database to get the next available id. That's quite important. When your data is organized into discrete services, it's quite often the case that the data which makes up a concrete, real world entity is split up between different services. As an example, take the often used Order object in an ecommerce system. In a service-oriented architecture, it could be the case that the shipping address belongs to the Shipping service and the billing address and credit card details belong to the Billing service. And while we will not make a foreign key reference to the data in another BC, you will need an identifier that lets you know that you're working with the same concrete concept/entity in both subsytems. That's where client id generation comes in handy.

Another nice benefit of GUIDs is right in the name - they are GLOBALLY unique. That means that the id given to an entity in your system is globally unique to that entity, not just other entities of the same type. In concrete terms, imagine you have an Order with id = 1 and a Product with id = 1. That happens fairly frequently. Not the case when using GUIDs. Your order will have an id that is unique throughout the entirety of the system. That is useful. Anywhere that GUID is in your data, you can trace it back to the exact entity that it relates to. There's no chance of seeing an id = 5 and needing extra information to determine WHICH entity type it is referring to.

A third benefit of using GUIDs is that as soon as data enters your system, you can identify it. This is not the case with db-generated ids. Until first save, you can have a new entity in your system that has no identity. That's a strange thing to think about. The flip side of the argument is that until it's saved, does it really exist? Discarding a GUID that you've generated is not a big deal. You're not really wasting something that you have such a large supply of. But think about most systems being developed. Before an Order is saved, it has no id. That's strange to think about. You can perform actions on an object living in the system that can't be identified. Sure - it doesn't often happen that way in practice, but it does seem to map more closely to the real world that an object in your system has identity from the moment it is born.

A few things to note. When people first hear about generating on the client, they have a few concerns. The first being the chance of collision - meaning the worry that the client may generate the same GUID twice. Taken from the wikipedia page on GUIDs, "Assuming uniform probability for simplicity and that every second, every person on earth generated one million GUIDs, it would statistically take far longer than the current age of the universe to generate two identical GUIDs." I feel pretty confident in that.

The other concern is that ints can perform better in JOINs when querying the database because they can be more easily indexed. This is true. But remember - if you've done SOA correctly, you won't be running JOINs on data outside of your autonomous service so this is negligible. And yes, they do take up more space in the database for sure, but space is pretty cheap these days.

So think about GUIDs for ids. They're not ONLY for databases that hold more data than your int/long data type can support. They're going to give you benefits which make moving to SOA much easier.