Monday, June 22, 2009

Why isn't my unix sort working?

Gaah. Today I ran into a strange problem while running the 'sort' command on Unix. When I ran it on the following input,

AECS
@ADS
@AED

I was getting


@ADS
AECS
@AED


as the output. I was expecting the output to be


@ADS
@AED
AECS



It was as if the '@' character in my input data was being ignored completely. This caused a long-running data load process to fail because of bad data, as I was using sort and merge logic to eliminate duplicates and merge data from multiple files.


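On searching the internet, I found that the 'sort' command depends on the locale to decide the ordering of characters. In many non-C locales, punctuation such as '@' carries little or no weight in the comparison, which would explain the output above. You can check your current locale settings by using the 'locale' command.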


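The fix for the above sort is to set LC_ALL to "C" before calling sort. "C" is the name of the standard POSIX locale (named after the C language), in which characters are compared by their byte values, so '@' (0x40) sorts before 'A' (0x41).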

> export LC_ALL=C
> cat inputdata | sort -s -T .
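To check the fix on the three sample lines from this post, here is a minimal sketch. The temporary file and the variable names (input, sorted) are made up for illustration, and it sets LC_ALL only for the single sort command instead of exporting it, which is an alternative to the export shown above and not what I originally ran.

```shell
# Write the sample input from the post to a temporary file (hypothetical path).
input=$(mktemp)
printf '%s\n' 'AECS' '@ADS' '@AED' > "$input"

# Prefixing the assignment applies LC_ALL=C to this one sort invocation only,
# so the rest of the shell session keeps its normal locale.
sorted=$(LC_ALL=C sort "$input")
printf '%s\n' "$sorted"

# Clean up the temporary file.
rm -f "$input"
```

With byte-order comparison, the two lines starting with '@' come out ahead of AECS, which is the order I was expecting. Setting the variable per command can be safer in a long script, since an exported LC_ALL also changes the behaviour of every other locale-aware command that runs afterwards.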


It turns out that there are some other commands that depend on the locale. Read more on this subject here.


