XML Bulk Load: Need some suggestions about using UNIQUEIDENTIFIER

Hi all,
I have always tried to stay away from UNIQUEIDENTIFIER column types,
but I am working on a new project and I am thinking that maybe would be
the best solution.
In this project we are going to have around 200 tables, some of them may
have about 20 million records. Most of the tables are related each other.
The queries that we will have on the system will involve several tables
at the same time.
And here the most important... we are going to have multiple sites. The
information is going to be transferred between sites. Not all the tables
will be transferred but most of them. That's the reason I am thinking in
using a UNIQUEIDENTIFIER column type for the PKs of my tables.
We will be using SQL Server 2005.
What do you guys think? Do you think the overall performance of the
system can be altered a lot?
Thanks!On Mon, 24 Apr 2006 10:54:08 -0400, LEM wrote:

>What do you guys think?
Hi LEM,
First question is if you need a suurrogate key at all. Maybe the
business key is short enough to be used as the only key in the table?
But if you do need to add a surrogate key to some tables, AND you need
to be able to add values to the tables on various sites without running
into replication problems, UNIQUEIDENTIFIER might be the best choice for
you.
Hugo Kornelis, SQL Server MVP|||Thanks for your reply, Hugo.

> First question is if you need a suurrogate key at all. Maybe the
> business key is short enough to be used as the only key in the table?
Not sure if I understand. A regular int PK should be ok for each table,
but I would run into replication problems when transferring data.
The main reason I was considering using a UNIQUEIDENTIFIER column type
as PK for each table was because of replication.|||On Tue, 25 Apr 2006 10:49:32 -0400, LEM wrote:

>Thanks for your reply, Hugo.
>
>Not sure if I understand. A regular int PK should be ok for each table,
Hi LEM,
No, it's not.
In the design phase, you should already find out how your entities are
identified in the business. In 99.9% of all cases, you'll find that the
business already has a way to make sure that employees are discussing
the same customer / product / task / whatever. The attribute (or set of
attributes) the use for this is the business key fir the entity.
During implementation, you'll have to assess for each idividual entity's
business key if it's a good candidate to be the PRIMARY KEY for the
corresponding table. The default answer to that question should be
"yes". Reasons to answer "no" instead are
1. Business key is apt to frequent change, OR
2. Business key is a column with long character data or spans multiple
columns (or both) AND there are references to the entity in other
tables.
Consider using a surrogate key if either 1 or 2 applies. A surrogate key
is any system-generated key. This is added to the table IN ADDITION TO
the column(s) of the business key. The business key is NOT removed!!
Both surrogate key and business key are constrained to be unique (in
most cases, the surrogate key is made PRIMARY KEY and the business key
gets a UNIQUE constraint, but this may be reversed). Any references to
the entity (FOREIGN KEY references) are made using the surrogate key
instead of the business key. For performance reasons, the surrogate key
should be single-column and short - this is why an integer column with
the IDENTITY attribute is often chosen.
IMPORTANT: The surrogate key values should be kept internal to the
system. The end users only need to see the business key. Use stored
procedures or views to make sure that the surrogate key values are never
exposed to end users.
If you have tables with ONLY a system-generated PRIMARY KEY and no other
UNIQUE constraint, then you *will* get duplicates. Maybe not today, and
if you're lucky not tomorrow either - but one day, you will.

>but I would run into replication problems when transferring data.
>The main reason I was considering using a UNIQUEIDENTIFIER column type
>as PK for each table was because of replication.
As I said in my earlier reply - *if* you need a surrogate key in a
replicated scenarion, then UNIQUEIDENTIFIER might be a good choice. But
using no surrogate key (if possible) is better yet.
Hugo Kornelis, SQL Server MVP|||Hugo, thanks a lot for your detailed explanation.
Following your instructions this is what I have done
(I hope I have understood everything correctly):
CREATE TABLE [dbo].[TestTable](
[MySurrogateKey] [uniqueidentifier] NOT NULL CONSTRAINT
[DF_TestTable_MySurrogateKey] DEFAULT (newid()),
[MyBusinessKey] [int] NOT NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY NONCLUSTERED
(
[MySurrogateKey] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY],
CONSTRAINT [IX_TestTable] UNIQUE CLUSTERED
(
[MyBusinessKey] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
My question now is the following. If I transfer a record from one site
to another... don't you think that I may get duplicates because of the
MyBusinessKey column? That column is unique only in the CURRENT SITE.
That's the reason I had in mind having only the surrogate key, instead
of keeping both. Don't you think I would have that problem if I keep both?
Thanks!|||On Wed, 26 Apr 2006 13:50:07 -0400, LEM wrote:

>Hugo, thanks a lot for your detailed explanation.
>Following your instructions this is what I have done
>(I hope I have understood everything correctly):
>CREATE TABLE [dbo].[TestTable](
> [MySurrogateKey] [uniqueidentifier] NOT NULL CONSTRAINT
>[DF_TestTable_MySurrogateKey] DEFAULT (newid()),
> [MyBusinessKey] [int] NOT NULL,
> CONSTRAINT [PK_TestTable] PRIMARY KEY NONCLUSTERED
>(
> [MySurrogateKey] ASC
> )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY],
> CONSTRAINT [IX_TestTable] UNIQUE CLUSTERED
>(
> [MyBusinessKey] ASC
> )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
> ) ON [PRIMARY]
Hi LEM,
Looks good - assuming that the real table uses a better name for the
business key and includes some other data columns as well.
I like how you've made the business key clustered, rather than accepting
the default of clustering on the primary key. A uniqueidentifier often
is a bad choice for the clustered index due to frequent page splits.

>My question now is the following. If I transfer a record from one site
>to another... don't you think that I may get duplicates because of the
>MyBusinessKey column? That column is unique only in the CURRENT SITE.
That would indeed cause problems (failed replication due to violations
of the UNIQUE constraint). How to handle this depends on the real cause:
a) Either the business key is a good, unique identifier for the business
objects across the entire business. In that case, there really has been
a violation of a business rule and you want the replication to detect it
and handle it appropriately. As an example, think of tax payers
identified by their SSN - if two tax offices add a tax assessment for
the same SSN and the same fiscal year, you WANT the replication process
to detect that duplicate information has been entered and remove one of
the rows.
b) Or the business key turns out to be incomplete. If, for instance,
sales are numbered 1, 2, ... each day in each store, than the business
key (SaleDate, SaleNumber) is wrong as soon as you start replicating
data between stores - change it to (StoreID, SaleDate, SaleNumber) to
solve the problem and prevent the UNIQUE constraint violation.
Hugo Kornelis, SQL Server MVP|||Hugo,
Thanks a lot for your help.

> Looks good - assuming that the real table uses a better name for the
> business key and includes some other data columns as well.
Yes, that was only an example.

> I like how you've made the business key clustered, rather than accepting
> the default of clustering on the primary key. A uniqueidentifier often
> is a bad choice for the clustered index due to frequent page splits.
Exactly. I was reading a little bit about it and that's why I decided to
do that.

> b) Or the business key turns out to be incomplete. If, for instance,
> sales are numbered 1, 2, ... each day in each store, than the business
> key (SaleDate, SaleNumber) is wrong as soon as you start replicating
> data between stores - change it to (StoreID, SaleDate, SaleNumber) to
> solve the problem and prevent the UNIQUE constraint violation.
Yes, that should resolve my problem!
Thanks again for everything, Hugo.

Friday, March 23, 2012

Need some suggestions about using UNIQUEIDENTIFIER

No comments:

Post a Comment

XML Bulk Load

Blog Archive

About Me