Not able to remove duplicates using uniq command


Dear Sir,

Removing duplicates using sort and then the uniq command is not working on my file, which contains blacklisted URLs. For example:

The command
uniq input.txt > output.txt results in:

whereas, I want output:

Can you please suggest how to remove these duplicates (IP addresses or integer values)?

Thanks in advance.


There must be some "whitespace" causing uniq to treat the lines as different.
I would guess that some of the entries came from a Windows environment and some from *nix,
so some lines end in a carriage return (^M) and others don't.

Try this to remove carriage return, tab and space characters first (uniq only collapses adjacent duplicates, so the sort has to stay in the pipeline):

tr -d '\r\t ' < input.txt | sort | uniq > output.txt
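To see why this helps, here is a small sketch you can run on made-up data. The addresses and the temporary directory are my own assumptions, not taken from your file. The character set is given to tr without square brackets, because tr treats brackets as literal characters and would delete them as well.

```shell
# Hypothetical sample data: the same addresses with Unix line endings,
# a Windows line ending (\r\n), and a trailing space.
tmpdir=$(mktemp -d)
printf '10.0.0.1\n10.0.0.1\r\n10.0.0.2 \n10.0.0.2\n' > "$tmpdir/input.txt"

# Plain sort + uniq: the invisible characters keep the lines distinct.
before=$(sort "$tmpdir/input.txt" | uniq | wc -l | tr -d ' ')

# Delete CR, tab and space first, then sort and deduplicate.
tr -d '\r\t ' < "$tmpdir/input.txt" | sort | uniq > "$tmpdir/output.txt"
after=$(wc -l < "$tmpdir/output.txt" | tr -d ' ')

echo "before: $before, after: $after"   # prints: before: 4, after: 2
```

On a real file you may prefer "sort -u", which does the sort and the uniq in one step.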

If that doesn't work, look at the raw content of input.txt using something like "hd" or "od":
hd input.txt | less
Then identify the character(s) causing the problem.
Once you know which character it is, add it to the list of characters to delete in the "tr" command.
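If "hd" is not installed on your system, "od -c" should show the same information. A minimal sketch, again on a made-up two-line sample rather than your real file:

```shell
# Hypothetical sample: one clean line and one with a Windows line ending.
sample=$(mktemp)
printf '10.0.0.1\n10.0.0.1\r\n' > "$sample"

# od -c prints every byte; a carriage return shows up as \r,
# a tab as \t, and a newline as \n.
od -c "$sample"

# Count how many lines contain a carriage return.
cr=$(printf '\r')
cr_lines=$(grep -c "$cr" "$sample")
echo "lines with CR: $cr_lines"
```

Any character that od shows between the last digit and the \n is a candidate for the tr delete list.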

Good Luck!

