显示标签为“distinct”的博文。显示所有博文
显示标签为“distinct”的博文。显示所有博文

2012年3月11日星期日

Any T-SQL advice?

Okay, given my newness to SQL, and the complexity of this query, I thought I'd
run this by you for your opinion:

SELECT DISTINCT a.WeekEnding, c.lastdate, c.numlate, b.totaldate

FROM Accomplishment a LEFT OUTER JOIN

(SELECT weekending, COUNT(weekending) AS
totaldate

FROM Accomplishment

WHERE (EmployeeID = 50)

GROUP BY weekending) b ON a.WeekEnding =
b.weekending LEFT OUTER JOIN

(SELECT weekending, MAX(entrydate) AS
lastdate, COUNT(weekending) AS numlate

FROM accomplishment

WHERE employeeid = 50 AND entrydate >
weekending

GROUP BY weekending) c ON a.WeekEnding =
c.weekending

ORDER BY a.WeekEnding

What I'm trying to do is for each pay period find which ones the employee
submitted a timesheet and which they were late (and if they were late, how
many of them). However, the query takes a good 5 seconds, and it seems
removing the "entrydate > weekending" clause speeds things up to almost
instant, however it does ruin the count that I really want. No idea why
that makes such a difference..CK (c_kettenbach@.hotmail.com) writes:
> Okay, given my newness to SQL, and the complexity of this query, I
> thought I'd run this by you for your opinion:
>...
> What I'm trying to do is for each pay period find which ones the employee
> submitted a timesheet and which they were late (and if they were late, how
> many of them). However, the query takes a good 5 seconds, and it seems
> removing the "entrydate > weekending" clause speeds things up to almost
> instant, however it does ruin the count that I really want. No idea why
> that makes such a difference..

Really why it takes longer with that clause I cannot tell, as I don't
know its tables nor its indexes. However, I found a simplification of
the query:

SELECT a.WeekEnding, b.lastdate, b.numlate, b.totaldate
FROM Accomplishment a
LEFT JOIN (SELECT weekending,
COUNT(weekending) AS totaldate,
SUM(CASE WHEN entrydate > weekending
THEN 1
ELSE 0
END) AS numlate,
MAX(CASE WHEN entrydate > weekending
THEN entrydate
END) AS lastdate
FROM Accomplishment
WHERE EmployeeID = 50
GROUP BY weekending) b ON a.WeekEnding = b.weekending
ORDER BY a.WeekEnding

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||On Mon, 06 Feb 2006 17:11:38 GMT, CK wrote:

>Okay, given my newness to SQL, and the complexity of this query, I thought I'd
>run this by you for your opinion:
>
>SELECT DISTINCT a.WeekEnding, c.lastdate, c.numlate, b.totaldate
>FROM Accomplishment a LEFT OUTER JOIN
> (SELECT weekending, COUNT(weekending) AS
>totaldate
> FROM Accomplishment
> WHERE (EmployeeID = 50)
> GROUP BY weekending) b ON a.WeekEnding =
>b.weekending LEFT OUTER JOIN
> (SELECT weekending, MAX(entrydate) AS
>lastdate, COUNT(weekending) AS numlate
> FROM accomplishment
> WHERE employeeid = 50 AND entrydate >
>weekending
> GROUP BY weekending) c ON a.WeekEnding =
>c.weekending
>ORDER BY a.WeekEnding
>
>What I'm trying to do is for each pay period find which ones the employee
>submitted a timesheet and which they were late (and if they were late, how
>many of them). However, the query takes a good 5 seconds, and it seems
>removing the "entrydate > weekending" clause speeds things up to almost
>instant, however it does ruin the count that I really want. No idea why
>that makes such a difference..

Hi CK,

The query looks more complicated than it needs to be. Based on a whole
lot of assumptions about your data and without any testing (check out
www.aspfaq.com/5006 if you prefer less assumptions and more testing),
I'd suggest changing it to

SELECT WeekEnding,
MAX(CASE WHEN entrydate > weekending THEN entrydate END) AS
LastDate,
COUNT(CASE WHEN entrydate > weekending THEN 'CountMe' END) AS
NumLate,
COUNT(*) AS TotalDate
FROM Accomplishment
WHERE EmployeeId = 30
GROUP BY WeekEnding
ORDER BY WeekEnding

Or, if you want a report for all employees:

SELECT EmployeeID,
WeekEnding,
MAX(CASE WHEN entrydate > weekending THEN entrydate END) AS
LastDate,
COUNT(CASE WHEN entrydate > weekending THEN 'CountMe' END) AS
NumLate,
COUNT(*) AS TotalDate
FROM Accomplishment
GROUP BY EmployeeID,
WeekEnding
ORDER BY EmployeeID,
WeekEnding

--
Hugo Kornelis, SQL Server MVP|||Seeing Hugo's queries, I realize that I did a blunder when I cut
DISTINCT. Assuming that you want all weekendings - also those when
Employee 50 did not enter anything at all, this may be better:

SELECT a.WeekEnding, b.lastdate, b.numlate, b.totaldate
FROM (SELECT DISTINCT WeekEnding FROM Accomplishment) AS a
LEFT JOIN (SELECT weekending,
COUNT(weekending) AS totaldate,
SUM(CASE WHEN entrydate > weekending
THEN 1
ELSE 0
END) AS numlate,
MAX(CASE WHEN entrydate > weekending
THEN entrydate
END) AS lastdate
FROM Accomplishment
WHERE EmployeeID = 50
GROUP BY weekending) b ON a.WeekEnding = b.weekending
ORDER BY a.WeekEnding

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||>> What I'm trying to do is for each pay period find which of the employees submitted a timesheet and which they were late (and if they were late, how many of them). <<

Where is the payperiod table in your data model??

>> No idea why that makes such a difference.. <<

Because your schema design is a pile of crap?? Duh!

Wild Idea! for uou Please post DDL, so that people do not have to
guess what the keys, constraints, Declarative Referential Integrity,
data types, etc. in your schema are. Sample data is also a good idea,
along with clear specifications. It is very hard to debug code when
you do not let us see it.

Newbies think that DDL is not as important as DML. They assume they
can "repair" a bad schema in code. You cannot. Ever. The best you can
hope for is a horrible nested sert of joins, like Sommarskog posted.

Do you want ot do it right or just kludge and patch it?

2012年2月13日星期一

Any clue to make this query run faster

Any help would be really appreciated...
My stored procedure...

CREATE PROCEDURE business3rd7
@.Fromdate DATETIME,
@.ToDate DATETIME
AS

select distinct CONVERT(VARCHAR(10),Receipts.Companynumber1)+CONVE RT(VARCHAR(10),Receipts.Companynumber2) as co ,
Receipts.Premium1+Receipts.Premium2 as Premium,
"CAN"=case when Receipts.transactiontype='CAN'
then (receipts.premium1+receipts.premium2)
else 0
end,
"NET"=Receipts.Premium1+Receipts.Premium2-case when Receipts.transactiontype='CAN'
then (receipts.premium1+receipts.premium2)
else 0
end,

"#NEW"=case when Receipts.transactiontype='NEW' then count(Receipts.policynumber) else
0 end,

---
"$NEW"=
case when Receipts.transactiontype='NEW' then (Receipts.Premium1+Receipts.premium2)
else 0 end,
"#REN"=case when Receipts.transactiontype='REN' then count(Receipts.policynumber) else
0 end,
"$RENEW"=
case when Receipts.transactiontype='REN' then (Receipts.Premium1+Receipts.premium2)
else 0 end,
"#AP"=case when Receipts.transactiontype='AP' then count(Receipts.policynumber) else 0
end,
"$AP"=
case when Receipts.transactiontype='AP' then (Receipts.Premium1+Receipts.premium2) else
0 end,
"#SENT"=case when policy.Renewalofferdate between @.Fromdate AND @.ToDate then
count(policy.policynumber) end,

"%"=case when case when Receipts.transactiontype='REN' then count(Receipts.policynumber)else 0
end =0 then 0
when case when policy.Renewalofferdate between @.Fromdate AND @.ToDate then
count(policy.policynumber) else 0 end=0 then 0
else
case when Receipts.transactiontype='REN' then count(Receipts.policynumber)else 0
end /case when policy.Renewalofferdate between @.Fromdate AND @.ToDate then
count(policy.policynumber) else 0 end end,

"Current Year"= case when policy.Renewalofferdate between @.Fromdate AND @.ToDate then
count(clubamount) end,
"Previous Year"=case when policy.Renewalofferdate between DateAdd(year, -1, @.Fromdate)
AND DateAdd(year, -1, @.ToDate) then count(clubamount) end,
"#AA"=count(receipts.clubamount),
"$AA"=sum(receipts.clubamount)

FROM Receipts,Policy
where Receipts.Agencyid=Policy.Agentid
group by
Receipts.CompanyNumber1,Receipts.CompanyNumber2,
Receipts.Premium1,Receipts.Premium2,
Receipts.TransactionType,policy.Renewalofferdate,
Receipts.Agencyid

GO

Query plan...

-------Query Plan

|--Sort(DISTINCT ORDER BY:([Expr1008] ASC, [Expr1009] ASC, [Expr1010] ASC, [Expr1011] ASC, [Expr1012] ASC, [Expr1013] ASC, [Expr1014] ASC, [Expr1015] ASC, [Expr1016] ASC, [Expr1017] ASC, [Expr1018] ASC, [Expr1019] ASC, [Expr1020] ASC, [Expr1021] ASC, [Ex
|--Compute Scalar(DEFINE:([Expr1008]=Convert([Receipts].[CompanyNumber1])+Convert([Receipts].[CompanyNumber2]), [Expr1009]=[Receipts].[Premium1]+[Receipts].[Premium2], [Expr1010]=If ([Receipts].[TransactionType]='CAN') then ([Receipts].[Premium1]+[R
|--Compute Scalar(DEFINE:([Expr1004]=Convert([Expr1076]), [Expr1005]=Convert([Expr1077]), [Expr1006]=Convert([Expr1078]), [Expr1007]=If ([Expr1078]=0) then NULL else [Expr1079]))
|--Stream Aggregate(GROUP BY:([Receipts].[CompanyNumber1], [Receipts].[CompanyNumber2], [Receipts].[Premium1], [Receipts].[Premium2], [Receipts].[TransactionType], [Policy].[RenewalOfferDate], [Receipts].[AgencyID]) DEFINE:([Expr1076]=COUN
|--Sort(ORDER BY:([Receipts].[CompanyNumber1] ASC, [Receipts].[CompanyNumber2] ASC, [Receipts].[Premium1] ASC, [Receipts].[Premium2] ASC, [Receipts].[TransactionType] ASC, [Policy].[RenewalOfferDate] ASC, [Receipts].[AgencyID] ASC))
|--Hash Match(Inner Join, HASH:([Policy].[AgentID])=([Receipts].[AgencyID]), RESIDUAL:([Policy].[AgentID]=[Receipts].[AgencyID]))
|--Table Scan(OBJECT:([gasInquiry].[dbo].[Policy]))
|--Table Scan(OBJECT:([gasInquiry].[dbo].[Receipts]))

\

The two tables has number of records as 13349 and 97032.It taking more than 30 mins...
Any way to make it faster...well distinct sucks.

you have 2 table scans. I always use the graphical execution plan, but if these are big query costs, you may want to index the columns involved.|||well distinct sucks.

you have 2 table scans. I always use the graphical execution plan, but if these are big query costs, you may want to index the columns involved.

Thank you Thrasymachus.My guess was right...yea one of the culprit is that two table scan let me create some index on those columns and I have to think over that distinct again.|||Lol - I didn't even know you could use distinct and group by in a single query. In any case - this makes distinct doubly unnecessary.|||Yea,thats my mistake ..I overlooked that...well after creating the index on the columns not a significant change has resulted.
...:S

Any clue...|||A where clause?|||A where clause?
I didnt get you,I can't use where clause with @.fromdate and @.todate ,bcoz I am doing the calculation of previous year also.
well, I think I have to consider the whole query in a different approach...|||where did you create your indices?|||I mean that it is usually pretty rare to need every record from one table and all related records from another. If you need them all then you need them all but point was - do you?|||I mean that it is usually pretty rare to need every record from one table and all related records from another. If you need them all then you need them all but point was - do you?

No,I don't need all records of one table only those records which satisfy the datefrom clause, and then I want some records of other table which is related to this table by agentid.
Clear?|||where did you create your indices?

I have created the clustered and non clustered index on the columns on both tables which are invloved in this query.ok?|||ummm, are they on the columns causing your scans? are your scans a large percentage of your query cost? how big are the tables involved?|||ummm, are they on the columns causing your scans? are your scans a large percentage of your query cost? how big are the tables involved?

well, I checked my estiimated execution plan and found that the sort is taking the 96% cost...I mean the grouping cost 96%.
Any clue?
The tables are having data of 13349 and 97032 respectively,its a testing databse so few data are there...|||Ok, I've taken some liberties trying to get to what I think you want. Some columns are just plain missing because they made no sense to me. The result set is quite differently arranged in that what was once one row has become many rows based on transaction type. This isn't an "apples to apples" comparison, but I think it does what I think you really want, and will actually be easier for you to use!SELECT
Policy.AgentID
, Receipts.CompanyNumber1
, Receipts.CompanyNumber2
, Receipts.transactiontype
, Count(*) AS c
, Sum(Receipts.premium1 + Receipts.premium2) AS dollars
, Count(Receipts.clubamount) AS "AA#"
, Sum(Receipts.clubamount) AS "AA$"
FROM Receipts
JOIN Policy
ON (policy.agentID = Receipts.AgentID)
WHERE policy.renewalofferdate BETWEEN @.Fromdate AND @.ToDate
GROUP BY Policy.AgentID
, Receipts.Company1
, Receipts.Company2
, Receipts.TransactionType-PatP|||This query still not working ...:(
I am trying all things but for some strange reason its not working...
SEE the modified code which contains the crosstab operations...
The two tables are Policy and Receipts tables...

CREATE TABLE [Policy] (

[PolicyNumber] [nvarchar] (10) ,
[PolicySequence] [smallint] NULL ,
[Name] [nvarchar] (40) ,

--there are many columns here and I am are not mentioning those

[AgentID] [int] NULL ,
[RenewalOfferDate] [smalldatetime] NULL

) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

CREATE TABLE [Receipts] (
[RecordType] [nchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[AgencyID] [int] NULL ,
[ContractNumber] NULL ,
[MonthlyPayment] [money] NULL ,
[PaymentType] [nchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PolicyNumber] [nchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PolicySequence] [smallint] NOT NULL ,
[ReceiptNumber] [bigint] NULL ,
[TransactionType] [nchar] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[InsuredName] [nchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DownPayment] [money] NULL ,
[ClubAmount] [money] NULL ,
[ServiceFee] [money] NULL ,
[MVRFee] [money] NULL ,
[SR22Fee] [money] NULL ,
[AgentInitial] [nchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ProcessedInitial] [nchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[QuotedInitial] [nchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Source] [nchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CompanyNumber1] [int] NULL ,
[PolicyNumber1] [nchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Premium1] [money] NULL ,
[CompanyNumber2] [int] NULL ,
[PolicyNumber2] [nchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Premium2] [money] NULL ,
[TotalAmount] [money] NULL ,
[HowPaid] [nchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[HowPaidCode] [nchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[AmountTendered] [money] NULL ,
[Change] [money] NULL ,
[DateEntered] [datetime] NULL ,
[Void] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[VoidDate] [datetime] NULL ,
[VoidBy] [nchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[RequestVoidBy] [nchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ReasonVoid] [nchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]

Here is the code which ran really fast...

@.Fromdate DATETIME,
@.ToDate DATETIME
set @.fromdate='1/1/2006'
set @.todate='12/1/2006'

SELECT
Policy.AgentID
, Receipts.CompanyNumber1
, Receipts.CompanyNumber2
, Receipts.transactiontype
, Count(*) AS c
, Sum(Receipts.premium1 + Receipts.premium2) AS dollars
, Count(Receipts.clubamount) AS "AA#"
, Sum(Receipts.clubamount) AS "AA$"

FROM Receipts
JOIN Policy
ON (Policy.Agentid=Receipts.Agencyid)
WHERE policy.renewalofferdate BETWEEN @.Fromdate AND @.ToDate
GROUP BY Policy.AgentID
, Receipts.CompanyNumber1
, Receipts.CompanyNumber2
, Receipts.TransactionType

This is the modified code which is very slow one...

declare
@.Fromdate DATETIME,
@.ToDate DATETIME
set @.fromdate='1/1/2006'
set @.todate='12/1/2006'

SELECT
Policy.AgentID
, Receipts.CompanyNumber1
, Receipts.CompanyNumber2
, Receipts.transactiontype
, Count(*) AS c
, Sum(Receipts.premium1 + Receipts.premium2) AS dollars
, Count(Receipts.clubamount) AS "AA#"
, Sum(Receipts.clubamount) AS "AA$",

----- [B]These calculations are needed here

"CAN"=case when Receipts.transactiontype='CAN'
then (receipts.premium1+receipts.premium2)
else 0
end,
"NET"=Receipts.Premium1+Receipts.Premium2-case when Receipts.transactiontype='CAN'
then (receipts.premium1+receipts.premium2)
else 0
end,

"#NEW"=case when Receipts.transactiontype='NEW' then count(Receipts.policynumber) else
0 end,

---
"$NEW"=
case when Receipts.transactiontype='NEW' then (Receipts.Premium1+Receipts.premium2)
else 0 end,
"#REN"=case when Receipts.transactiontype='REN' then count(Receipts.policynumber) else
0 end,
"$RENEW"=
case when Receipts.transactiontype='REN' then (Receipts.Premium1+Receipts.premium2)
else 0 end,
"#AP"=case when Receipts.transactiontype='AP' then count(Receipts.policynumber) else 0
end,
"$AP"=
case when Receipts.transactiontype='AP' then (Receipts.Premium1+Receipts.premium2) else
0 end,
"#SENT"=case when policy.Renewalofferdate between @.Fromdate AND @.ToDate then
count(policy.policynumber) end,

"%"=case when case when Receipts.transactiontype='REN' then count(Receipts.policynumber)else 0
end =0 then 0
when case when policy.Renewalofferdate between @.Fromdate AND @.ToDate then
count(policy.policynumber) else 0 end=0 then 0
else
case when Receipts.transactiontype='REN' then count(Receipts.policynumber)else 0
end /case when policy.Renewalofferdate between @.Fromdate AND @.ToDate then
count(policy.policynumber) else 0 end end,

"Current Year"= case when policy.Renewalofferdate between @.Fromdate AND @.ToDate then
count(clubamount) end,
"Previous Year"=case when policy.Renewalofferdate between DateAdd(year, -1, @.Fromdate)
AND DateAdd(year, -1, @.ToDate) then count(clubamount) end

----

FROM Receipts
JOIN Policy
ON (Policy.Agentid=Receipts.Agencyid)
WHERE policy.renewalofferdate BETWEEN @.Fromdate AND @.ToDate
GROUP BY Policy.AgentID
, Receipts.CompanyNumber1
, Receipts.CompanyNumber2
, Receipts.TransactionType

--- three(3) more group by added

, Receipts.Premium1
, Receipts.Premium2
, Policy.Renewalofferdate

I have created neccessary Indices on the columns which are involed and are grouped.
I have created an index comprises of all the columns involed in the group except Policy.Renewalofferdate.

Any clue?