2012年3月6日星期二

Any Problems with Large Packages?

I've been suffering with a variety of symptoms lately, and I don't know why:

    Can only debug once in BIDS. After that, it can never attach to DtsDebugHost.exe. DtsDebugHost.exe never receives any CPU time, and never grows past its initial memory allocation.

    Save takes a long time with 100% CPU usage. This can be as simple as openning a SQL Server Destination component to correct the metadata due to an added column in the underlying table, then saving. It takes several minutes at 100% CPU.

    I've had several Visual Studio crashes today (DW has reported these)

I've read here about people with 10MB or 17MB packages, so I feel badly complaining about my 3.2MB package! However, something is going on, and I'd like to know what it is, so I can debug again.

Thanks for any help.

Rebooting can sometimes work wonders Smile

What SP level are you running? I highly recommend SP2.

As a "good" practice, I try to keep packages small, for a variety of reasons. Smaller packages are easier to debug, seem to be more stable in BIDS, and are easier to manage in a team environment with multiple developers. My packages normally are under 1mb, and most often are 300-700kb.

|||

I'm running SP2.

Once I get it out to QA, I may look into breaking it up into smaller packages. I'm sure the size is due to the 17 outputs from the XML Source all being processed in the same data flow task. At the cost of reading the XML more than once, and of creating subset schemas (so I don't get hundreds of column not used warnings), I can break some pieces off and execute them in parallel.There are performance reasons as well, as there are buffers made wide by large string and NTEXT fields.

In the meantime, I'm wondering about how to avoid that reboot issue...

|||

A couple of other suggestion:

Try setting Work Offline (under the SSIS menu) to true before saving, and see if that makes a difference (it shouldn't, but I've seen stranger things).

How complex are the transformations in the data flow? You might want to use one package to read the file once, and save all the outputs to RAW files. Then have other packages that pick up the RAW files to do the rest of the processing. RAW files are extremely fast, so they don't introduce as much overhead as you might expect. On the other hand, if the transformations are simple, you won't get a lot of benefit from that approach.

|||

Another wonderful behavior is that data viewers aren't working reliably. I've just given up on one pair: one each on the default output and on the error output of a derived column transformation. The final step in that execution path writes to a SQL Server Destination, and indeed, the rows get there. But they don't stop at either data viewer, and there's no path to the destination that does not go through the derived column transformation.

没有评论:

发表评论