Join | Login | Why Join?   
SQL Server, Oracle, DB2, Sybase, MySQL Help - SSWUG.ORG HACKER SAFE certified sites prevent over 99.9% of hacker crime.
Search SSWUG:   
 
Access to 492 free guest articles, discussions and more, just create your free SSWUG User ID:
Email address:  
This will be your login ID - we'll email you your password - you'll even receive the newsletter, opt-out at any time.
Email to Friend //  Discuss Article //  Rate Article //  Digg Article //  Add to Del.icio.us //  Add to Technorati

Full site membership is required for this article. Already a full member? Login here.
You currently have a guest membership. You can upgrade to full site membership here (free trial available).

Deleting Duplicate Rows in Sql Server


 Article Abstract:

(Mohammad Luqman Usman) Here is how to Delete Duplicate Rows both in Sql Server and Oracle.

 Read this article...

 Related Articles - For Members.
All Articles By Author

Identify and delete duplicate data from a table.
The Top Ten Problem
SQL Server 2005 TOP clause
How to remove duplicate rows/date from a table
Find Duplicate Indexes
Identifying Duplicate Data in the table
Oracle: Deleting Duplicate Rows Efficiently
Second Normal Form (2NF)



Key (Please note):
(R) - registration may be required for access at the target site
($) - target site may require paid membership for access to this or other content


Reader Comments: Post Your Comments/Feedback

RE: Deleting Duplicate Rows in Sql Server
(posted: 3/25/2004 7:38:28 AM)
Well this works for exact duplicates. but not for run of the near duplicates. You need to use keys made from left, right etc. In your example you could see why--
You spelled out Road. What if the near dup used RD?

Through testing in large databases, we found the most accurate test for dups would be to use last4 of a credit +
the zip + the first five of the last name + the first of the first name. This too can have its issues.

see example in next comments




RE: Deleting Duplicate Rows in Sql Server
(posted: 3/25/2004 7:40:00 AM)

update K set duplicate = 0 from dbo.keyfinancial_test2 K

select * into #tmpDup from keyfinancial_test2
where uniquekeywithCC is not null and
left(zip,5)+ left(lastname,5)+ left(BillingAddress1,5)+left(firstname,2)
in

(select left(zip,5)+ left(lastname,5)+
left(BillingAddress1,5)+left(firstname,2) from
keyfinancial_test2
where uniquekeywithCC is not null

continued in next



RE: Deleting Duplicate Rows in Sql Server
(posted: 4/1/2004 10:45:32 AM)
Awesome. I had created a cursor (loop) sproc like that described in this article. I tested the processing of that sproc against this code. Against 100K email addresses with an estimated 60% duplication, my sproc took just under 26 minutes. Using this code, from creation of the identity field to completion of dupe removal, total processing time was less than 4 minutes.



Email to Friend //  Discuss Article //  Rate Article //  Digg Article //  Add to Del.icio.us //  Add to Technorati

   




 

[ Register ] [ Webcasts ] [ Podcasts ] [ Newsletter Archive ] [ RSS/Feeds ]
[ About ] [ Advertise ] [ Contact ] [ Privacy ] [ Terms of Service ]
[ Link to SSWUG ] [ List Server Archives ] [ Recent Orig. Content ]
(c) 1997-2008, Bits on the Wire, Inc.  (0)

Some names and products covered by SSWUG are the registered trademarks of their respective owners.
DAA10354WWW004