Wednesday, March 7, 2012

Need ID after OLE DB Command (insert)

Hi,

I'm stuck on the following thing:

After a Slowly Changing Dimension task I replaced the OLE DB Destination with an OLE DB Command and wrote the insert manually, because I need to work further with the dataset. So I do a Union All between the outputs of the two OLE DB Commands (insert and update). Up to there, no problem. But then, because I need an ID further on, I do a Lookup against the table into which I just inserted and updated my data, to fetch the right IDs. When I run this package I get the error message "row yielded no match during lookup".
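To make it concrete, the insert in the OLE DB Command and the lookup look roughly like this (the table and column names below are made up just to illustrate, not our real ones):

    -- OLE DB Command on the "new" output: executed once per row,
    -- with the ? parameters mapped to input columns
    INSERT INTO dbo.DimCustomer (BusinessKey, CustomerName, City)
    VALUES (?, ?, ?)

    -- Lookup reference further down the flow, to fetch the surrogate key I need
    SELECT CustomerID, BusinessKey
    FROM dbo.DimCustomer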

I don't understand this because I just inserted the data, and I've checked: it's there.

I could resolve this by splitting it up into two data flows (re-selecting all the needed data with the ID field included in my selection), but I would prefer to solve it another way.

Greets,

Tom

You cannot be assured that the inserted records exist and are available to you until after the data-flow completes. You will need two data-flows I'm afraid. Use a raw file to pass data between them if you require.

-Jamie

|||

Hi Jamie,

In my opinion SSIS does a sort of pre-caching (I mean loading the lookup table into memory before the actual execution of the whole data flow task), and that must be the reason why it can't perform a lookup against an 'empty' table during the execution. << Can't I force it not to cache that table? Just an idea >>

But in order to continue and to satisfy my boss, I will split it up...

Thanks anyway

Greets,
-Tom

|||

Tom DC wrote:

Hi Jamie,

In my opinion SSIS does a sort of pre-caching (I mean loading the lookup table into memory before the actual execution of the whole data flow task), and that must be the reason why it can't perform a lookup against an 'empty' table during the execution. << Can't I force it not to cache that table? Just an idea >>

But in order to continue and to satisfy my boss, I will split it up...

Thanks anyway

Greets,
-Tom

Yes, you can easily configure the LOOKUP component not to cache any data. Still, though, you cannot guarantee that a row you think will be there will indeed be there. [This is due to the buffer architecture of the pipeline, which is elaborated on in various places around the internet and in textbooks if you fancy some heavy reading.]

So, regardless of your caching options you still need two data-flows.

-Jamie

|||

All right, I've split it up. Thanks for your help!

|||

Setting the CacheType to none (and using a lookup table instead of a SQL statement, to avoid weird errors about unmatched parameters) did solve our problem. The data we expect to find in the table was written in the same data flow by an OLE DB Command; it is not part of the data flow itself.
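For reference, with CacheType set to None the Lookup does not load the reference table into memory up front; it queries the table for each incoming row instead. Against a simplified dimension table it effectively fires something like this per row (the component generates the actual parameterised statement itself, so this is only an approximation):

    -- executed once per incoming row, with ? bound to the input column
    SELECT CustomerID
    FROM dbo.DimCustomer
    WHERE BusinessKey = ?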

Also, the lookup happens in a different "Execution Tree" than the insert via the OLE DB Command.

I think we are safe and do not have to split the dataflow.

Tom (the satisfied boss)

|||

Tom VdP wrote:

Setting the CacheType to none (and using a lookup table instead of a SQL statement, to avoid weird errors about unmatched parameters) did solve our problem. The data we expect to find in the table was written in the same data flow by an OLE DB Command; it is not part of the data flow itself.

Also, the lookup happens in a different "Execution Tree" than the insert via the OLE DB Command.

I think we are safe and do not have to split the dataflow.

Tom (the satisfied boss)

Fair enough. I can only advise as strongly as I possibly can that you do not under any circumstances do this. There is no way to guarantee that a record that you expect to be in the table will indeed be there. This is because SSIS processes data buffer-by-buffer, not row-by-row.

It is of course up to you.

-Jamie

|||

Jamie,

I understand. But how could SSIS treat an OLE DB Command differently than just directly executing it? The data flow engine doesn't know what the SQL command does, hence there is no way it could buffer the result. Every "row" in the data flow below the OLE DB Command must already have been processed (in our case: inserted into the table). Or am I thinking too much in the old "DTS way"?

Regards,

Tom

|||

Tom VdP wrote:

Jamie,

I understand. But how could SSIS treat an OLE DB Command differently than just directly executing it? The data flow engine doesn't know what the SQL command does, hence there is no way it could buffer the result. Every "row" in the data flow below the OLE DB Command must already have been processed (in our case: inserted into the table). Or am I thinking too much in the old "DTS way"?

Regards,

Tom

Tom,

Basically, yeah. You're thinking in DTS terms. The data-flow processes rows in groups called buffers whereas DTS was more row-by-row. Hence, if a row enters the LOOKUP component and it requires a row from the same buffer to be in the lookup table - it won't (yet) be in the lookup table.

So when I talk about buffers, I don't mean that any result is being buffered. Sorry, I should have elaborated on that more. The buffer architecture is a key feature of SSIS and I highly recommend you read around it to understand what it does and why it does it.

-Jamie

|||

Slight misunderstanding :-)

I thought you were hinting at the fact that the output of the OLE DB Command might not yet be available. But you are referring to the input of the Lookup component. But then what is the CacheType property for? If it is set to "none", doesn't that indicate that each lookup should be done against the actual contents of the table? This property is undocumented in MSDN...

Regards,

Tom

|||

Correct. But what I'm trying (and failing miserably) to explain is that there is no guarantee that a row put there by an OLE DB Command/Destination adapter/whatever... will be available to all subsequent rows in the data-flow.

-Jamie

|||

Ok... to round up, allow me to summarize:

All data below an OLE DB Command will have had its corresponding SQL statement executed; hence those changes are visible in the database.
