Crazy shell scripting
April 14th, 2009 by shashi
Not quite a MapReduce implementation, but nevertheless…
while IFS= read -r l; do echo "SELECT id, email FROM tblName WHERE email='$l' ORDER BY id DESC LIMIT 1;" | mysql -N -u root dbName; done < /tmp/dupes3.txt | cut -f1 | sed 's/^\(.*\)$/DELETE FROM tblName WHERE id=\1;/' > /tmp/deletedupes3.sql
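A more readable take on the same pipeline, split into small helper functions so the text-processing steps can be tried without a database. The names `sql_quote`, `select_newest_sql` and `ids_to_delete_sql` are hypothetical, not from the original. The sketch also doubles single quotes in each address so a value like `o'neil@example.com` cannot break out of the SQL string literal. This is a minimal escape, not a complete one (MySQL also treats backslashes specially by default). As in the one-liner, it targets one row per address per run: the one with the highest id.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; tblName is the post's table name.

# Escape a value for a single-quoted SQL string literal by doubling
# each single quote (standard SQL escaping; backslashes are NOT handled).
sql_quote() {
  printf '%s' "$1" | sed "s/'/''/g"
}

# Read addresses (one per line) and emit one SELECT per address that
# fetches the highest id for that email. The "|| [ -n ... ]" keeps a
# final line that lacks a trailing newline.
select_newest_sql() {
  local email
  while IFS= read -r email || [ -n "$email" ]; do
    [ -n "$email" ] || continue   # skip blank lines
    printf "SELECT id FROM tblName WHERE email='%s' ORDER BY id DESC LIMIT 1;\n" \
      "$(sql_quote "$email")"
  done
}

# Turn a stream of ids (one per line) into DELETE statements.
ids_to_delete_sql() {
  sed 's/^\(.*\)$/DELETE FROM tblName WHERE id=\1;/'
}
```

Wired together as `select_newest_sql < /tmp/dupes3.txt | mysql -N -u root dbName | ids_to_delete_sql > /tmp/deletedupes3.sql`, all the SELECTs go through a single `mysql` connection instead of one connection per address. `-N` (`--skip-column-names`) drops the header row, so no `grep` is needed to remove it.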
- Posted in notes, unix
April 14th, 2009 at 10:48 pm
Good one
As per my understanding:
1. Start with dupes.txt, which has hundreds of records.
2. Split that file into dupes1.txt, dupes2.txt, …, dupesN.txt.
3. Run the above script on multiple systems.
4. Combine the files generated by each system and run the result to remove the duplicates from the DB.
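Steps 2 and 4 map onto standard tools: `split -l` cuts the list into fixed-size chunks and `cat` joins the per-machine outputs. The following is a minimal sketch. The chunk size, the function names `split_dupes` and `combine_results`, and the file-name prefixes are assumptions for illustration. Copying chunks to the other machines (step 3) is left out.

```shell
#!/usr/bin/env bash
# Sketch: function names, chunk size and file prefixes are hypothetical.

# Step 2: split the input into pieces of at most $1 lines each, named
# <prefix>aa, <prefix>ab, ... (split's default alphabetic suffixes).
split_dupes() {
  local lines=$1 input=$2 prefix=$3
  split -l "$lines" "$input" "$prefix"
}

# Step 4: concatenate the per-machine outputs into one file.
# Globs expand in sorted order, so <prefix>aa comes before <prefix>ab.
combine_results() {
  local output=$1
  shift
  cat "$@" > "$output"
}
```

For example, `split_dupes 100 dupes.txt dupes_part_` would produce `dupes_part_aa`, `dupes_part_ab`, … and each machine would run the one-liner against its own chunk in place of `/tmp/dupes3.txt`. `combine_results deletedupes.sql deletedupes_part_*.sql` would then gather the generated statements into one file.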
But how is dupes3.txt (or dupes.txt) populated? Something like:
echo "SELECT email FROM tblName GROUP BY email HAVING COUNT(email) > 1" | mysql -N -uroot dbName > dupes.txt
Correct me if I am wrong.
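If the addresses are already available as a plain-text export, a similar duplicate list can be produced without the database using `sort | uniq -d`, which prints one copy of each line that occurs more than once. This is a sketch, assuming a hypothetical export with one address per line and a hypothetical function name `find_dupes`. MySQL's default collations compare strings case-insensitively, so the sketch folds case with `sort -f` and `uniq -i` to match. Trailing-space and accent handling can still differ from the SQL result.

```shell
#!/usr/bin/env bash
# Sketch: find_dupes is a hypothetical helper, not from the post.

# Print one representative of every line that appears more than once,
# comparing case-insensitively. sort -f makes lines that differ only in
# case adjacent; uniq -d -i then reports each such group once.
find_dupes() {
  sort -f "$@" | uniq -d -i
}
```

Usage would be along the lines of `find_dupes emails.txt > dupes.txt`.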
April 15th, 2009 at 8:20 am
Aravinda,
Bang on! You’re right on all the points.