Wiki
Just
to make sure we're all on the same page –
Unique
Identifier (which from now on will be referred to as ‘GUID', a.k.a. 'UUID'), is
an algorithm-based 32-byte hex string (128 bit integer) which is supposed to
generate completely unique values.
I’d
like to compare the chances of two identical generated GUIDs to the chance
earth will be destroyed by a meteor in the next five minutes.
Now,
if that happens – you won’t be here to prove me wrong! J
Here’s
one: 5B5D6F63-1FF0-4574-9B77-BD77D716FD66.
It
is probably unique and there’s no other like it in the world.
You
can read more here - http://en.wikipedia.org/wiki/Globally_Unique_Identifier
Why should I?
With
today’s increasing demand for fully-distributed systems, GUID usage is
increasing accordingly.
It
allows managing two or more disconnected databases which later on need to
merge, knowing that each row on desired shared tables is unique.
Here’s a simple example –
Consider a very large company with many
branches across the world.
Each branch has local users, and there’s a sync process which merges the entire
users list.
The ‘traditional’ way is to maintain a
range of numbers for each branch, for example:
- use
a prefix on the username to maintain uniqueness
- Assign
a numeric range for each branch (easier today with SQL 2012's sequences
support)
- Make
a combined key of two columns, for example - UserId+BranchId columns
This is a classic sample for using GUID
instead – make the UserId column GUID and problem is solved.
Sounds like fun! Any reason
why I shouldn’t use it?
Well, yes.
Before you get all excited and change
your “int’s” and “bigint’s” to GUIDs, consider the following:
- GUID
column takes 16 bytes, which is double than Bigint (long).
o This
means that index on the column will also take more space
- Index
Fragmentation is an issue and it is most likely to happen.
o (See
more about it on the Tips section below)
- Not
all databases support GUIDs.
o So,
if you migrate to mysql, you'll have to change the datatype to CHAR(32), so it
takes even more space as a result
Some Best-Practices Tips:
[Tip #1]
The first, and probably the most
important tip, is to avoid using CLUSTERED INDEX on GUID's.
This is because the GUID values are random – a physical sort will cause the
table to be re-sorted very frequently and will also result with an excessive
I/O consumption.
So do one of the following:
Either create another identity on the
table, which also serves as the PRIMARY KEY (also a generic best practice of
avoiding 'heap' tables)
OR -
When
possible - use the NEWSEQUENTIALID function.
This
will create unique GUIDs on a table but will also make sure the new generated
GUID is greater than the last. (It is still globally unique as long as the
machine has its own MAC address).
Let’s
review a quick sample of the differences between the NEWID() function and the
NEWSEQUENTIALID default value.
We'll
create two tables; each uses one of the techniques:
-- Creating the
tables:
IF OBJECT_ID('UsersNewId') IS NULL
BEGIN
CREATE TABLE UsersNewId
(
UserId UNIQUEIDENTIFIER PRIMARY KEY NONCLUSTERED DEFAULT NEWID(),
UserName VarChar(255)
)
END
IF OBJECT_ID('UsersNewSeqId') IS NULL
BEGIN
CREATE TABLE UsersNewSeqId
(
UserId UNIQUEIDENTIFIER PRIMARY KEY NONCLUSTERED DEFAULT NEWSEQUENTIALID(),
UserName VarChar(255)
)
END;
-- Now, let's
fill the tables with some data
-- Note: I'm
using SQL2008+ syntax here; On SQL2005 you need to separate the insert
statements
INSERT INTO UsersNewId(UserName) VALUES ('User1'),('User2'),('User3'),('User4'),('User5')
INSERT INTO UsersNewSeqId(UserName) VALUES ('User1'),('User2'),('User3'),('User4'),('User5')
-- Now, let's
see the tables
SELECT * FROM UsersNewId
SELECT * FROM UsersNewSeqId
The
results:
Note
the GUID on the 2nd table looks the same on first sight, but
actually it is not.
These
are increasing values that still maintain uniqueness.
This
is obviously much better for index fragmentation of any time!
[Tip #2]
Related to the above tip – as a DBA,
try to keep the GUID generation on your side!
Otherwise, if the GUID’s are generated
by a different layer in your application (DAL or so), the GUIDs cannot be
sequential.
Besides, try to think as if the GUID’s
are actually a new version of your INT identities; you wouldn’t let anyone
control the auto-increase now would you?
In
some cases you may want to consider surrogate keys, which will be used as the
foreign key between the tables.
To
use the sample at the beginning of this article – each user will have its own
GUID, but with an additional “UserInternalId” column (int) which will be the
table's primary key.
[Tip #4]
Don’t
use GUID if you don’t have to.
When
a table needs an ID which does not have to be unique across the enterprise,
keep using the regular numeric id.
It
takes less space and easier to manage.
Final Words:
GUIDs
are great and simple solution to maintain uniqueness across the enterprise.
However,
don’t overuse it when not required as it might hurt your databases
performances.
This
is a very wide subject – hope this in-a-nutshell post helps or at least gives
you a good head start.