Deleting Duplicate Rows in Sql Server
|
(Mohammad Luqman Usman) Here is how to Delete Duplicate Rows both in Sql Server and Oracle. |
|
| Related Articles - For Members.
|
|
Key (Please note):
(R) - registration may be required for access at the target site
($) - target site may require paid membership for access to this or other content
Reader Comments: Post Your Comments/Feedback
RE: Deleting Duplicate Rows in Sql Server (posted: 3/25/2004 7:38:28 AM) Well this works for exact duplicates. but not for run of the near duplicates. You need to use keys made from left, right etc. In your example you could see why-- You spelled out Road. What if the near dup used RD?
Through testing in large databases, we found the most accurate test for dups would be to use last4 of a credit + the zip + the first five of the last name + the first of the first name. This too can have its issues.
see example in next comments
RE: Deleting Duplicate Rows in Sql Server (posted: 3/25/2004 7:40:00 AM)
update K set duplicate = 0 from dbo.keyfinancial_test2 K
select * into #tmpDup from keyfinancial_test2 where uniquekeywithCC is not null and left(zip,5)+ left(lastname,5)+ left(BillingAddress1,5)+left(firstname,2) in
(select left(zip,5)+ left(lastname,5)+ left(BillingAddress1,5)+left(firstname,2) from keyfinancial_test2 where uniquekeywithCC is not null
continued in next
RE: Deleting Duplicate Rows in Sql Server (posted: 4/1/2004 10:45:32 AM) Awesome. I had created a cursor (loop) sproc like that described in this article. I tested the processing of that sproc against this code. Against 100K email addresses with an estimated 60% duplication, my sproc took just under 26 minutes. Using this code, from creation of the identity field to completion of dupe removal, total processing time was less than 4 minutes.
|