Monday, April 19, 2010

DOS/Windows End of Line vs Linux EOLN

Working with bash scripts on text files can be really annoying. Everything looks perfect, but you just can't get the desired results.
One common reason is that different operating systems mark the end of a line differently. Windows uses two characters, '\r\n', while Linux uses only one, '\n'. That extra '\r' can really mess up your terminal and your echo output. In the other direction, opening a Linux file in Windows Notepad displays everything on a single line. But don't worry, this can be fixed. If you are on Windows, try WordPad or Word instead of Notepad; they display the file correctly.
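
To make the problem concrete, here is a small sketch of how a stray '\r' breaks a simple string comparison in a script. The value "yes" and the variable names are made up for the illustration; the point is only that the carriage return survives when the line is read.

```shell
# Simulate reading a line that came from a Windows-style file.
# Command substitution strips the trailing '\n' but keeps the '\r'.
line=$(printf 'yes\r\n')

if [ "$line" = "yes" ]; then
    result="match"
else
    result="no match"    # this branch runs: "yes\r" is not equal to "yes"
fi
echo "$result"

# Removing the carriage return restores the expected behaviour.
clean=$(printf '%s' "$line" | tr -d '\r')
[ "$clean" = "yes" ] && echo "match after cleanup"
```

On screen both strings look identical, which is exactly why this kind of bug is so hard to spot.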

On Linux, first check the file to see what you are dealing with. You can use hexdump or mc (Midnight Commander, via its editor mcedit).


$ hexdump -C dos_test.txt
[...] 6f 77 73 20 73 74 79 6c |DOS/Windows styl|
[...] 20 4c 69 6e 65 0d 0a |e End Of Line..|

$ hexdump -C lin_test.txt
[...] 75 78 20 73 74 79 6c 65 |Unix/Linux style|
[...] 20 45 4f 4c 4e 0a | EOLN.|


The first file ends its line with the sequence 0d 0a ('\r\n'), while the second file has only 0a ('\n').
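
If you would rather not read hex dumps, the same check can be scripted. The following is a minimal sketch, assuming a helper called has_cr (a name invented here, not a standard command) that simply asks grep whether the file contains any carriage return. The sample files are created on the fly so the example runs on its own.

```shell
# has_cr is a hypothetical helper: it succeeds if the file contains at least one '\r'.
has_cr() {
    cr=$(printf '\r')     # a literal carriage return, portable across shells
    grep -q "$cr" "$1"
}

# Create two small sample files to try it on (names are made up for this example).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'DOS/Windows style End Of Line\r\n' > "$tmpdir/dos_sample.txt"
printf 'Unix/Linux style EOLN\n' > "$tmpdir/lin_sample.txt"

for f in "$tmpdir/dos_sample.txt" "$tmpdir/lin_sample.txt"; do
    if has_cr "$f"; then
        echo "$(basename "$f"): contains carriage returns"
    else
        echo "$(basename "$f"): Unix line endings only"
    fi
done
```

Note that this only detects the presence of '\r' anywhere in the file; it does not verify that every line ends in '\r\n'.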

To avoid headaches when you feed a Windows file to a bash script or something similar, you need to convert it first.

You can use tr:

$ tr -d '\r' < inputfile.txt > outputfile.txt

or

$ dos2unix -n dosfile.txt unixfile.txt

You can also convert files with awk, sed or PHP, or even let ftp do it by transferring the file in ASCII mode.
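
For the sed and awk route, something along these lines should work. This is a sketch rather than the one true way: the file names are invented for the example, and the sed command uses a literal carriage return in the pattern because not every sed understands '\r' as an escape.

```shell
# Build a small DOS-style test file in a temporary directory (names are made up).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'line one\r\nline two\r\n' > "$tmpdir/dosfile.txt"

# sed: delete a carriage return at the end of each line.
cr=$(printf '\r')
sed "s/${cr}\$//" "$tmpdir/dosfile.txt" > "$tmpdir/sed_out.txt"

# awk: same idea, strip a trailing '\r' from each record before printing it.
awk '{ sub(/\r$/, ""); print }' "$tmpdir/dosfile.txt" > "$tmpdir/awk_out.txt"

# Both results should be byte-for-byte identical.
cmp "$tmpdir/sed_out.txt" "$tmpdir/awk_out.txt" && echo "sed and awk agree"
```

Unlike tr -d '\r', both commands remove the '\r' only at the end of a line, so a carriage return in the middle of a line is left alone.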