Suppose I have a table with relations (REL). In this table there are relations between parent en child, so the columns are: Rel_Id, parent_Id, child_Id. Example:
1, A, B
2, A, C
3, A, D
4, B, E
5, C, F
6, G, H
7, G, I
8, H, J
I need a query which returns all relations from REL that are in the same tree as the input unit.
In the example, giving D as input unit, it should return relations 1, 2, 3, 4 and 5, because A, B, C, E and F are (in)directly related to D and belong therefore to the same tree.
Giving H as input, it should return relations 6, 7 and 8.
(with a tree, I mean that a parent can have 0, 1 or more children and a child belongs at most one parent.)
Thanks for your help in advance!
Here you go...
CREATE TABLE REL(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE Direct(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE Indirect(Num int, ColumnA varchar(5), ColumnB varchar(5))
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(1, 'A', 'B')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(2, 'A', 'C')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(3, 'A', 'D')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(4, 'B', 'E')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(5, 'C', 'F')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(6, 'G', 'H')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(7, 'G', 'I')
INSERT INTO REL(Num, ColumnA, ColumnB)VALUES(8, 'H', 'J')
--Delete above after first execution
DECLARE @.Input varchar(5), @.Counter int
SET @.Input = 'D'
SET @.Counter = 0
INSERT INTO Direct
SELECT * FROM REL WHERE ColumnA = @.Input OR ColumnB = @.Input
WHILE @.Counter <> (SELECT COUNT(*) FROM REL)
BEGIN
INSERT INTO Indirect
SELECT * FROM REL
WHERE ColumnA IN (SELECT ColumnA FROM Direct)
OR ColumnA IN (SELECT ColumnB FROM Direct)
OR ColumnB IN (SELECT ColumnB FROM Direct)
OR ColumnB IN (SELECT ColumnA FROM Direct)
INSERT INTO Direct
SELECT DISTINCT * FROM Indirect
DELETE FROM Indirect
SET @.Counter = @.Counter + 1
END
SELECT * FROM Direct
UNION
SELECT * FROM Indirect
DELETE FROM Direct
DELETE FROM Indirect
Adamus
|||Thanks Adamus,
That helps a lot. I have not run the script yet, but when looking at it, I have two questions.
1) You write
INSERT INTO Direct
SELECT DISTINCT * FROM Indirect
At the first cycle, the relation A-D is in table Direct, but this one is also added in the above insert into statement, when I am right. So, after the insert, I have two relations with A-D in Direct. Is that correct? If yes, I assume that I can use the distinct afterwards.
2) You write
WHILE @.Counter <> (SELECT COUNT(*) FROM REL)
The example I gave, had only 8 relations, but my actual table has thousands of relations. The while statement will then be executed a lot of times unnecessary (also in the order of thousands, because most trees are quite simple).
Is there a way to compare tables? If yes how can I compare if Direct is exactly the same as Indirect? Because, if that is the case, no more relations are found and I can end the loop.
|||
The loop is a necessary evil because the logical number of possibilities are NumOfRecords * NumOfFieldsCompared. You must iterate fully without exiting the loop in order to get accurate information. As a result, the Indirect table will contain many duplicate rows that are necessary in your case. This is why the DISTINCT Keyword is used.
Run the query without the loop. Look at the resultset. Run it again with the loop and compare. It may make more sense.
Also, the UNION makes a difference if you remove it and replace it.
And yes, I tested the query with all 8 relations. Please run the script before replying.
Thanks,
Adamus
|||ratslav wrote: The example I gave, had only 8 relations, but my actual table has thousands of relations. The while statement will then be executed a lot of times unnecessary (also in the order of thousands, because most trees are quite simple).
Is there a way to compare tables? If yes how can I compare if Direct is exactly the same as Indirect? Because, if that is the case, no more relations are found and I can end the loop.
Because of the temporary relationships, comparing the tables won't work. For each iteration, the relationships will change. For iteration 1, The child becomes the parent for the next iteration. During distinct iterations, each element will at some point be the parent with a direct relation to the proceeding child. Technically, there is never an indirect relationship until the query completes or before the query is executed.
There is a reason I named the tables Direct and Indirect. Indirect relationships become Direct until all relationships are found.
Do you understand that the query and the logic force all elements to be parents in order to find the children?
Also, there is no such thing as unnecessary iterations. You don't know how many records to compare until you compare all of them.
Thanks,
Adamus
|||Here is an example of how to solve this using SQL Server 2005recursive queries - perhaps not as efficiently as is possible,
but it should work. This uses as sample data Employee,Manager
pairs from the sample database AdventureWorks, excluding some
top managers' rows so there are several "trees."
Your scenario is made a bit messy by the fact that some of
your items are only in ColumnA and some only in ColumnB. If
you had a single table containing one of every item, it would
be a little easier.
DECLARE @.Input varchar(5)
SET @.Input = 'D';
with Manages(EmpID,MgrID,Distance) as (
select
ColumnB,
ColumnB,
0
from REL
union
select
M1.ColumnA,
M1.ColumnA,
0
from REL as M1
where not exists (
select * from REL as M2
where M1.ColumnA = M2.ColumnB
)
union all
select
M.EmpID,
E.ColumnA,
Distance+1
from Manages as M
join REL as E
on E.ColumnB = M.MgrID
where E.ColumnA is not null
)
select
EmpID
from Manages
where MgrID = (
select top 1 MgrID
from Manages
where EmpID = @.Input
order by Distance desc
)
order by EmpID
go
Here is another solution that is a bit more direct and that
may be faster.
create function TopManager(
@.emp char
) returns char as begin
declare @.mgr char
set @.mgr = @.emp
set @.emp = case when @.mgr = 'A' then '$' else 'A' end
-- anything but @.mgr
while @.emp <> @.mgr begin
select
@.emp = @.mgr,
@.mgr = coalesce(ColumnA,@.mgr)
from REL
where ColumnB = @.mgr
if @.@.rowcount = 0 return @.mgr
end
return @.emp
end
go
declare @.emp char;
set @.emp = 'D';
with Descendants(EmpID) as (
select ColumnB
from REL
where ColumnA = dbo.TopManager(@.emp)
union all
select M.ColumnB
from REL as M
join Descendants as D
on D.EmpID = M.ColumnA
)
select EmpID
from Descendants
union
select dbo.TopManager(@.emp)
order by EmpID
go
-- Steve Kass
-- Drew University
-- http://www.stevekass.com
-- CBCEF696-CEA4-4B36-8099-0CBAC7C86AB5
ratslav@.discussions.microsoft.com wrote:
> Thanks Adamus,
>
> That helps a lot. I have not run the script yet, but when looking at it,
> I have two questions.
>
> 1) You write
>
> INSERT INTO Direct
> SELECT DISTINCT * FROM Indirect
>
> At the first cycle, the relation A-D is in table Direct, but this one is
> also added in the above insert into statement, when I am right. So,
> after the insert, I have two relations with A-D in Direct. Is that
> correct? If yes, I assume that I can use the distinct afterwards.
>
> 2) You write
>
> WHILE @.Counter <> (SELECT COUNT(*) FROM REL)
>
> The example I gave, had only 8 relations, but my actual table has
> thousands of relations. The while statement will then be executed a lot
> of times unnecessary (also in the order of thousands, because most trees
> are quite simple).
>
> Is there a way to compare tables? If yes how can I compare if Direct is
> exactly the same as Indirect? Because, if that is the case, no more
> relations are found and I can end the loop.
>
>
>
>|||
Adamus,
I would like to reply to your two answers:
First, I believe in your solution. Today I tried it at work, and it gave the correct relations.
I know a loop is necessary, but going through it that many times, seems unlogical to me.
I did run the script on the thousands of relations that were in my actual #REL.
After a minute I still didn′t have any results. Because others employees use the same database,
and their performance must not be affected by my query, this is no solution. I need a more efficient one.
Please review the script below. I am unable to run this at home. Tomorrow at the office I will try it myself.
I hope it works.
NNTP User,
Thanks for your contribution, but I don′t understand the text. Probably because I am quite new to SQL.
If the script below fails, I will look at your answer again (I first have to learn something about "with" etc).
CREATE TABLE #REL(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #Direct(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #Indirect(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #UniqueIndirect(Num int, ColumnA varchar(5), ColumnB varchar(5))
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(1, 'A', 'B')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(2, 'A', 'C')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(3, 'A', 'D')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(4, 'B', 'E')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(5, 'C', 'F')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(6, 'G', 'H')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(7, 'G', 'I')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(8, 'H', 'J')
-- Delete above after first execution
DECLARE @.Input varchar(5)
SET @.Input = 471524
INSERT INTO #Direct
SELECT * FROM #REL WHERE ColumnA = @.Input OR ColumnB = @.Input
INSERT INTO #Indirect
SELECT * FROM #REL WHERE
ColumnA IN (SELECT ColumnA FROM #Direct) OR
ColumnA IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnA FROM #Direct)
INSERT INTO #UniqueIndirect
SELECT DISTINCT * FROM #Indirect
DELETE FROM #Indirect
WHILE (SELECT COUNT(*) FROM #UniqueIndirect) <> (SELECT COUNT(*) FROM #Direct)
BEGIN
DELETE FROM #Direct
INSERT INTO #Direct
SELECT * FROM #UniqueIndirect
DELETE FROM #UniqueIndirect
INSERT INTO #Indirect
SELECT * FROM #REL WHERE
ColumnA IN (SELECT ColumnA FROM #Direct) OR
ColumnA IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnA FROM #Direct)
INSERT INTO #UniqueIndirect
SELECT DISTINCT * FROM #Indirect
DELETE FROM #Indirect
END
SELECT * FROM #UniqueIndirect
DELETE FROM #Direct
DELETE FROM #Indirect
Friend,
A query that runs for 1+ minutes is not that surprising. I have queries that take 15 minutes to run and these are standard queries. This is not unusual. Unless you have a terabyte of processing and maxed out RAM on the server, it's inevitable that some queries will take longer than others.
My only other suggestion, if possible, is to use concatenated keys on the fields so that indexes are created. This should speed up the query time. As far as minimizing the loops, I have no answer. As I have said before, you don't know how many records/fields to compare until you've compared them all at least once.
Best of luck,
Adamus
|||This work fine for me?ratslav wrote: CREATE TABLE #REL(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #Direct(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #Indirect(Num int, ColumnA varchar(5), ColumnB varchar(5))
CREATE TABLE #UniqueIndirect(Num int, ColumnA varchar(5), ColumnB varchar(5))INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(1, 'A', 'B')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(2, 'A', 'C')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(3, 'A', 'D')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(4, 'B', 'E')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(5, 'C', 'F')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(6, 'G', 'H')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(7, 'G', 'I')
INSERT INTO #REL(Num, ColumnA, ColumnB) VALUES(8, 'H', 'J')-- Delete above after first execution
DECLARE @.Input varchar(5)
SET @.Input = 471524
INSERT INTO #Direct
SELECT * FROM #REL WHERE ColumnA = @.Input OR ColumnB = @.InputINSERT INTO #Indirect
SELECT * FROM #REL WHERE
ColumnA IN (SELECT ColumnA FROM #Direct) OR
ColumnA IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnA FROM #Direct)INSERT INTO #UniqueIndirect
SELECT DISTINCT * FROM #IndirectDELETE FROM #Indirect
WHILE (SELECT COUNT(*) FROM #UniqueIndirect) <> (SELECT COUNT(*) FROM #Direct)
BEGINDELETE FROM #Direct
INSERT INTO #Direct
SELECT * FROM #UniqueIndirectDELETE FROM #UniqueIndirect
INSERT INTO #Indirect
SELECT * FROM #REL WHERE
ColumnA IN (SELECT ColumnA FROM #Direct) OR
ColumnA IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnB FROM #Direct) OR
ColumnB IN (SELECT ColumnA FROM #Direct)INSERT INTO #UniqueIndirect
SELECT DISTINCT * FROM #IndirectDELETE FROM #Indirect
END
SELECT * FROM #UniqueIndirect
DELETE FROM #Direct
DELETE FROM #Indirect
You just have to drop the tables
Drop Table #Rel
Drop Table #Direct
Drop Table #Indirect
Drop Table #UniqueIndirect
Where are you having difficulty?
Adamus
|||Also,
You may want to consult a MSVP on the performance difference in derived tables vs. real tables. I don't know enough about the memory allocations to make an experienced judgement call.
As a hunch, I would think temp tables would take longer simply due to allocations but I could be wrong.
Adamus
|||Thanks Adamus,
I will try the script tomorrow and let you know if it works fine for me as well.
The reason why I used temp tables is because of permissions. I am not allowed to create dbo or user tables, but I have permission to create temp tables.
|||The script worked for me for a small tree (the one from the example). I wasn′t satisfied with the result of a larger tree, but this may have been caused by the statements after the things were executed which were described above. Anyway, it looks very promising.
I have just one question left. I need to save the output of the following three lines into a textfile. In Oracle I could use something like spool. In SQL Server this won′t work. Has anyone suggestions?
Thanks!
SET NOCOUNT ON
print 'blablabefore' + CHAR(13)
select * from #endresult
print 'blablaafter'
No comments:
Post a Comment