Friday, March 23, 2012

Need sql script or statement to extract all relations from one tree

Suppose I have a table with relations (REL). In this table there are relations between parent en child, so the columns are: Rel_Id, parent_Id, child_Id. Example:

1, A, B
2, A, C
3, A, D
4, B, E
5, C, F
6, G, H
7, G, I
8, H, J

I need a query which returns all relations from REL that are in the same tree as the input unit.
In the example, giving D as input unit, it should return relations 1, 2, 3, 4 and 5, because A, B, C, E and F are (in)directly related to D and belong therefore to the same tree.
Giving H as input, it should return relations 6, 7 and 8.
(with a tree, I mean that a parent can have 0, 1 or more children and a child belongs at most one parent.)

Thanks for your help in advance!

Here you go...

CREATE TABLE REL(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE Direct(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE Indirect(Num int, ColumnA varchar(5), ColumnB varchar(5))

INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(1, 'A', 'B')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(2, 'A', 'C')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(3, 'A', 'D')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(4, 'B', 'E')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(5, 'C', 'F')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(6, 'G', 'H')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(7, 'G', 'I')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(8, 'H', 'J')

--Delete above after first execution

DECLARE @.Input varchar(5), @.Counter int

SET @.Input = 'D'

SET @.Counter = 0

INSERT INTO Direct
SELECT * FROM REL WHERE ColumnA = @.Input OR ColumnB = @.Input

WHILE @.Counter <> (SELECT COUNT(*) FROM REL)
BEGIN

INSERT INTO Indirect
SELECT * FROM REL
WHERE ColumnA IN (SELECT ColumnA FROM Direct)
OR ColumnA IN (SELECT ColumnB FROM Direct)
OR ColumnB IN (SELECT ColumnB FROM Direct)
OR ColumnB IN (SELECT ColumnA FROM Direct)

INSERT INTO Direct
SELECT DISTINCT * FROM Indirect

DELETE FROM Indirect

SET @.Counter = @.Counter + 1

END

SELECT * FROM Direct
UNION
SELECT * FROM Indirect

DELETE FROM Direct
DELETE FROM Indirect

Adamus

|||

Thanks Adamus,

That helps a lot. I have not run the script yet, but when looking at it, I have two questions.

1) You write

INSERT INTO Direct
SELECT DISTINCT * FROM Indirect

At the first cycle, the relation A-D is in table Direct, but this one is also added in the above insert into statement, when I am right. So, after the insert, I have two relations with A-D in Direct. Is that correct? If yes, I assume that I can use the distinct afterwards.

2) You write

WHILE @.Counter <> (SELECT COUNT(*) FROM REL)

The example I gave, had only 8 relations, but my actual table has thousands of relations. The while statement will then be executed a lot of times unnecessary (also in the order of thousands, because most trees are quite simple).

Is there a way to compare tables? If yes how can I compare if Direct is exactly the same as Indirect? Because, if that is the case, no more relations are found and I can end the loop.

|||

The loop is a necessary evil because the logical number of possibilities are NumOfRecords * NumOfFieldsCompared. You must iterate fully without exiting the loop in order to get accurate information. As a result, the Indirect table will contain many duplicate rows that are necessary in your case. This is why the DISTINCT Keyword is used.

Run the query without the loop. Look at the resultset. Run it again with the loop and compare. It may make more sense.

Also, the UNION makes a difference if you remove it and replace it.

And yes, I tested the query with all 8 relations. Please run the script before replying.

Thanks,

Adamus

|||

ratslav wrote:

The example I gave, had only 8 relations, but my actual table has thousands of relations. The while statement will then be executed a lot of times unnecessary (also in the order of thousands, because most trees are quite simple).

Is there a way to compare tables? If yes how can I compare if Direct is exactly the same as Indirect? Because, if that is the case, no more relations are found and I can end the loop.

Because of the temporary relationships, comparing the tables won't work. For each iteration, the relationships will change. For iteration 1, The child becomes the parent for the next iteration. During distinct iterations, each element will at some point be the parent with a direct relation to the proceeding child. Technically, there is never an indirect relationship until the query completes or before the query is executed.

There is a reason I named the tables Direct and Indirect. Indirect relationships become Direct until all relationships are found.

Do you understand that the query and the logic force all elements to be parents in order to find the children?

Also, there is no such thing as unnecessary iterations. You don't know how many records to compare until you compare all of them.

Thanks,

Adamus

|||Here is an example of how to solve this using SQL Server 2005

recursive queries - perhaps not as efficiently as is possible,

but it should work. This uses as sample data Employee,Manager

pairs from the sample database AdventureWorks, excluding some

top managers' rows so there are several "trees."

Your scenario is made a bit messy by the fact that some of

your items are only in ColumnA and some only in ColumnB. If

you had a single table containing one of every item, it would

be a little easier.

DECLARE @.Input varchar(5)

SET @.Input = 'D';

with Manages(EmpID,MgrID,Distance) as (

select

ColumnB,

ColumnB,

0

from REL

union

select

M1.ColumnA,

M1.ColumnA,

0

from REL as M1

where not exists (

select * from REL as M2

where M1.ColumnA = M2.ColumnB

)

union all

select

M.EmpID,

E.ColumnA,

Distance+1

from Manages as M

join REL as E

on E.ColumnB = M.MgrID

where E.ColumnA is not null

)

select

EmpID

from Manages

where MgrID = (

select top 1 MgrID

from Manages

where EmpID = @.Input

order by Distance desc

)

order by EmpID

go

Here is another solution that is a bit more direct and that

may be faster.

create function TopManager(

@.emp char

) returns char as begin

declare @.mgr char

set @.mgr = @.emp

set @.emp = case when @.mgr = 'A' then '$' else 'A' end

-- anything but @.mgr

while @.emp <> @.mgr begin

select

@.emp = @.mgr,

@.mgr = coalesce(ColumnA,@.mgr)

from REL

where ColumnB = @.mgr

if @.@.rowcount = 0 return @.mgr

end

return @.emp

end

go

declare @.emp char;

set @.emp = 'D';

with Descendants(EmpID) as (

select ColumnB

from REL

where ColumnA = dbo.TopManager(@.emp)

union all

select M.ColumnB

from REL as M

join Descendants as D

on D.EmpID = M.ColumnA

)

select EmpID

from Descendants

union

select dbo.TopManager(@.emp)

order by EmpID

go

-- Steve Kass

-- Drew University

-- http://www.stevekass.com

-- CBCEF696-CEA4-4B36-8099-0CBAC7C86AB5

ratslav@.discussions.microsoft.com wrote:

> Thanks Adamus,

>

> That helps a lot. I have not run the script yet, but when looking at it,

> I have two questions.

>

> 1) You write

>

> INSERT INTO Direct

> SELECT DISTINCT * FROM Indirect

>

> At the first cycle, the relation A-D is in table Direct, but this one is

> also added in the above insert into statement, when I am right. So,

> after the insert, I have two relations with A-D in Direct. Is that

> correct? If yes, I assume that I can use the distinct afterwards.

>

> 2) You write

>

> WHILE @.Counter <> (SELECT COUNT(*) FROM REL)

>

> The example I gave, had only 8 relations, but my actual table has

> thousands of relations. The while statement will then be executed a lot

> of times unnecessary (also in the order of thousands, because most trees

> are quite simple).

>

> Is there a way to compare tables? If yes how can I compare if Direct is

> exactly the same as Indirect? Because, if that is the case, no more

> relations are found and I can end the loop.

>

>

>

>|||

Adamus,

I would like to reply to your two answers:
First, I believe in your solution. Today I tried it at work, and it gave the correct relations.
I know a loop is necessary, but going through it that many times, seems unlogical to me.
I did run the script on the thousands of relations that were in my actual #REL.
After a minute I still didn′t have any results. Because others employees use the same database,
and their performance must not be affected by my query, this is no solution. I need a more efficient one.

Please review the script below. I am unable to run this at home. Tomorrow at the office I will try it myself.
I hope it works.


NNTP User,

Thanks for your contribution, but I don′t understand the text. Probably because I am quite new to SQL.
If the script below fails, I will look at your answer again (I first have to learn something about "with" etc).

CREATE TABLE #REL(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #Direct(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #Indirect(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #UniqueIndirect(Num int, ColumnA varchar(5), ColumnB varchar(5))

INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(1, 'A', 'B')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(2, 'A', 'C')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(3, 'A', 'D')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(4, 'B', 'E')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(5, 'C', 'F')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(6, 'G', 'H')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(7, 'G', 'I')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(8, 'H', 'J')

-- Delete above after first execution

DECLARE @.Input varchar(5)
SET @.Input = 471524

INSERT INTO #Direct
SELECT * FROM #REL WHERE ColumnA = @.Input OR ColumnB = @.Input

INSERT INTO #Indirect
SELECT * FROM #REL WHERE
ColumnA IN (SELECT ColumnA FROM #Direct) OR
ColumnA IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnA FROM #Direct)

INSERT INTO #UniqueIndirect
SELECT DISTINCT * FROM #Indirect

DELETE FROM #Indirect


WHILE (SELECT COUNT(*) FROM #UniqueIndirect) <> (SELECT COUNT(*) FROM #Direct)
BEGIN

DELETE FROM #Direct

INSERT INTO #Direct
SELECT * FROM #UniqueIndirect

DELETE FROM #UniqueIndirect

INSERT INTO #Indirect
SELECT * FROM #REL WHERE
ColumnA IN (SELECT ColumnA FROM #Direct) OR
ColumnA IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnA FROM #Direct)

INSERT INTO #UniqueIndirect
SELECT DISTINCT * FROM #Indirect

DELETE FROM #Indirect

END

SELECT * FROM #UniqueIndirect

DELETE FROM #Direct
DELETE FROM #Indirect

|||

Friend,

A query that runs for 1+ minutes is not that surprising. I have queries that take 15 minutes to run and these are standard queries. This is not unusual. Unless you have a terabyte of processing and maxed out RAM on the server, it's inevitable that some queries will take longer than others.

My only other suggestion, if possible, is to use concatenated keys on the fields so that indexes are created. This should speed up the query time. As far as minimizing the loops, I have no answer. As I have said before, you don't know how many records/fields to compare until you've compared them all at least once.

Best of luck,

Adamus

|||

ratslav wrote:

CREATE TABLE #REL(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #Direct(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #Indirect(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #UniqueIndirect(Num int, ColumnA varchar(5), ColumnB varchar(5))

INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(1, 'A', 'B')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(2, 'A', 'C')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(3, 'A', 'D')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(4, 'B', 'E')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(5, 'C', 'F')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(6, 'G', 'H')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(7, 'G', 'I')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(8, 'H', 'J')

-- Delete above after first execution

DECLARE @.Input varchar(5)
SET @.Input = 471524

INSERT INTO #Direct
SELECT * FROM #REL WHERE ColumnA = @.Input OR ColumnB = @.Input

INSERT INTO #Indirect
SELECT * FROM #REL WHERE
ColumnA IN (SELECT ColumnA FROM #Direct) OR
ColumnA IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnA FROM #Direct)

INSERT INTO #UniqueIndirect
SELECT DISTINCT * FROM #Indirect

DELETE FROM #Indirect


WHILE (SELECT COUNT(*) FROM #UniqueIndirect) <> (SELECT COUNT(*) FROM #Direct)
BEGIN

DELETE FROM #Direct

INSERT INTO #Direct
SELECT * FROM #UniqueIndirect

DELETE FROM #UniqueIndirect

INSERT INTO #Indirect
SELECT * FROM #REL WHERE
ColumnA IN (SELECT ColumnA FROM #Direct) OR
ColumnA IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnA FROM #Direct)

INSERT INTO #UniqueIndirect
SELECT DISTINCT * FROM #Indirect

DELETE FROM #Indirect

END

SELECT * FROM #UniqueIndirect

DELETE FROM #Direct
DELETE FROM #Indirect

This work fine for me?

You just have to drop the tables

Drop Table #Rel
Drop Table #Direct
Drop Table #Indirect
Drop Table #UniqueIndirect

Where are you having difficulty?

Adamus

|||

Also,

You may want to consult a MSVP on the performance difference in derived tables vs. real tables. I don't know enough about the memory allocations to make an experienced judgement call.

As a hunch, I would think temp tables would take longer simply due to allocations but I could be wrong.

Adamus

|||

Thanks Adamus,

I will try the script tomorrow and let you know if it works fine for me as well.

The reason why I used temp tables is because of permissions. I am not allowed to create dbo or user tables, but I have permission to create temp tables.

|||

The script worked for me for a small tree (the one from the example). I wasn′t satisfied with the result of a larger tree, but this may have been caused by the statements after the things were executed which were described above. Anyway, it looks very promising.

I have just one question left. I need to save the output of the following three lines into a textfile. In Oracle I could use something like spool. In SQL Server this won′t work. Has anyone suggestions?

Thanks!

SET NOCOUNT ON

print 'blablabefore' + CHAR(13)
select * from #endresult
print 'blablaafter'

No comments:

Post a Comment