Most every Unix sysadmin has run smack into line ending incompatibilities from time to time. The most common problem is the appearance of ^M characters (the way vi and similar tools display a carriage return) at the ends of lines in text files that were built for or on Windows systems. Text files often end up with the pesky ^M characters when they're transferred from one system to another byte for byte, whether with scp, which has no text mode, or with ftp in binary mode instead of ASCII mode.
For many applications, the extra ^M characters cause no problems whatsoever. If you have a configuration file that includes these characters, the software that reads the file may not notice or balk. A ^M at the end of a shebang line, on the other hand, can confuse Unix systems. The string #!/bin/bash^M, after all, doesn't exactly match what the system needs to identify /bin/bash as the appropriate shell to process the file's contents; the system goes looking for an interpreter whose name ends in a carriage return. Try to execute a script that looks like this when viewed in vi and you'll end up with a "./shoplist: No such file or directory" error:
#!/bin/bash^M
^M
echo apple^M
echo banana^M
echo coconut^M
echo donut^M
echo egg
To make matters worse, the difference between DOS-based systems, which use both a carriage return and a linefeed to terminate lines, and Unix systems, which end lines with linefeeds only, is not the whole story. We also have systems (e.g., classic Mac OS, the predecessor of Mac OS X) that end lines with carriage returns only; Mac OS X itself uses Unix-style linefeeds. This third convention probably only makes sense to those of us old enough to remember using typewriters, but it is nonetheless a convention you may still run into.
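With three conventions in play, it can help to check which one a file actually uses before converting it. The following is a minimal sketch, not a standard utility: eol_style is a hypothetical function name, and it assumes a POSIX-style shell with tr, wc, and awk available.

```shell
# eol_style FILE  (hypothetical helper, not a standard command)
# Prints crlf, lf, cr, mixed, or none for the named file.
eol_style() (
    file=$1
    # Count the carriage return (octal 015) and linefeed (octal 012) bytes.
    cr=$(( $(tr -cd '\015' < "$file" | wc -c) ))
    lf=$(( $(tr -cd '\012' < "$file" | wc -c) ))
    # Count linefeed-delimited records that end in a carriage return.
    crlf=$(awk '/\r$/ { n++ } END { print n + 0 }' "$file")

    if [ "$cr" -eq 0 ] && [ "$lf" -eq 0 ]; then
        echo none       # no line terminators at all
    elif [ "$lf" -eq 0 ]; then
        echo cr         # carriage returns only
    elif [ "$cr" -eq 0 ]; then
        echo lf         # linefeeds only (Unix)
    elif [ "$cr" -eq "$lf" ] && [ "$crlf" -eq "$lf" ]; then
        echo crlf       # every linefeed is preceded by a carriage return (DOS)
    else
        echo mixed      # some other combination
    fi
)
```

Running eol_style shoplist.txt on a file with DOS line endings throughout should print crlf. The function body is wrapped in parentheses so that its variables stay out of the calling shell.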
How to make it right
Some Unix systems (like Solaris) provide utilities for converting DOS text files to Unix (linefeed) text files and vice versa. The dos2unix and unix2dos utilities will read a file in one format and create another, adding or removing the carriage return depending on the direction of the conversion. Numerous other tools can be used to effect the same conversion.
The dos2unix and unix2dos commands are generally used in one of two ways:
% dos2unix < shoplist.txt > shoplist
% dos2unix shoplist.txt > shoplist
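In either case, the new shoplist file is a copy of shoplist.txt with the ^M (carriage return) characters stripped. Be aware that some dos2unix implementations, including the one commonly packaged for Linux, convert a file named on the command line in place and write nothing to standard output, so the second form would leave shoplist empty there. The first form, which reads standard input, avoids that difference.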
The tr (translate) command can also be used to remove carriage returns:
cat shoplist.txt | tr -d '\015' > shoplist
In this command the 015 represents the octal code for a carriage return in ASCII.
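Note that tr -d removes every carriage return in the file, not just the ones at the ends of lines. When that distinction matters, a sed substitution anchored to the end of the line is one alternative. This is a minimal sketch, assuming a POSIX sed; strip_trailing_cr is a hypothetical function name.

```shell
# strip_trailing_cr  (hypothetical helper): filter stdin to stdout,
# removing a carriage return only where it is the last character on a line.
strip_trailing_cr() (
    # Capture a literal carriage return; command substitution strips only
    # trailing newlines, so the CR survives.
    cr=$(printf '\r')
    # Delete one CR immediately before the end of each line.
    sed "s/${cr}\$//"
)
```

It would be used the same way as the tr command: strip_trailing_cr < shoplist.txt > shoplist. The tr form remains the simpler choice when the file is known to contain no stray carriage returns.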
A similar tr command strips the linefeeds instead, leaving just the carriage returns:
cat shoplist.txt | tr -d '\012' > shoplist.macos
Files treated this way would look very strange on most Unix systems:
% cat -v shoplist.macos
a apple^Mb banana^Mc coconut^Md donut^Me egg
Then, of course, there are the Perl commands for doing the same thing. To change carriage return, linefeed endings to linefeed-only endings, you could do this:
perl -p -i -e 's/\r\n/\n/' shoplist.txt
The forward slashes in this command delimit the two parts of the substitution: the pattern to match, a carriage return followed by a linefeed (\r\n), and its replacement, a linefeed by itself (\n).
Or you could just strip out the carriage returns like this:
perl -p -i -e 's/\r//' shoplist.txt
To turn linefeeds back into carriage return linefeeds, you could use a command like this one:
perl -p -i -e 's/\n/\r\n/' shoplist.txt
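One thing to watch with this direction of conversion: run against a file that already has carriage return, linefeed endings, the substitution adds a second carriage return to every line. A conversion that first drops any existing trailing carriage return is safe to run more than once. The following is a minimal sketch using awk rather than perl; to_crlf is a hypothetical function name.

```shell
# to_crlf  (hypothetical helper): filter stdin to stdout so that every line
# ends in carriage return + linefeed, whether or not it already did.
to_crlf() {
    # Drop a trailing CR if one is present, then write the line back out
    # with exactly one CR LF pair.
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}
```

Because awk reads linefeed-terminated records, this also adds a terminator to a final line that lacked one. Usage would be to_crlf < shoplist > shoplist.txt.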
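To turn linefeeds into carriage returns or carriage returns into linefeeds, you would use one of the following commands. The g flag on the second one matters: a file with no linefeeds looks like a single long line to perl, so without it only the first carriage return would be replaced.
perl -p -i -e 's/\n/\r/' shoplist
perl -p -i -e 's/\r/\n/g' shoplist.macos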
Similar commands using tr would look like these:
cat shoplist | tr '\012' '\015' > shoplist.macos
cat shoplist.macos | tr '\015' '\012' > shoplist
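Unlike perl -i, tr cannot edit a file in place, and redirecting its output onto its own input file truncates the file before tr reads it. If an in-place conversion with tr is wanted, one approach is to go through a temporary file. This is a minimal sketch; tr_in_place is a hypothetical function name, and it assumes the mktemp command is available, which is common but not guaranteed on every Unix system.

```shell
# tr_in_place FROM TO FILE  (hypothetical helper)
# Translate FROM to TO (tr-style arguments such as '\012') within FILE.
tr_in_place() (
    from=$1; to=$2; file=$3
    tmp=$(mktemp) || exit 1
    if tr "$from" "$to" < "$file" > "$tmp"; then
        # Copy the result back over the original rather than renaming the
        # temporary file, so the original keeps its permissions and owner.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    exit "$status"
)
```

With this in place, tr_in_place '\012' '\015' shoplist would convert linefeeds to carriage returns, and tr_in_place '\015' '\012' shoplist would convert them back.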
There are many ways to convert text files to the proper format for the target system. The only problem I have with the perl approach is that it tempts me to add "pie" to my shopping list!