Unix Tip: Managing line termination differences in text files

ITworld.com 08/31/2006

Sandra Henry-Stocker, ITworld.com

Most every Unix sysadmin has run smack into line ending incompatibilities from time to time. The most common problem is the appearance of ^M characters at the ends of lines in text files that were built for or on Windows systems. Text files often end up with the pesky ^M characters when they're transferred from one system to another byte for byte, as happens with ftp in binary mode (instead of ASCII mode) or with scp, which never translates line endings.

For many applications, the extra ^M characters cause no problems whatsoever. If you have a configuration file that includes these characters, the software that reads the file may not notice or balk. The ^M characters at the end of shebang lines, on the other hand, can confuse Unix systems. The string #!/bin/bash^M, after all, doesn't exactly match what the system needs to identify /bin/bash as the appropriate shell to process the file's contents. Try to execute a script that looks like the following when viewed in vi and you'll end up with an error such as "./shoplist: No such file or directory" (the exact wording varies from shell to shell):

#!/bin/bash^M
^M
echo apple^M
echo banana^M
echo coconut^M
echo donut^M
echo egg
To make matters worse, there are more than two conventions to deal with. DOS-based systems terminate lines with both a carriage return and a linefeed, and Unix systems end lines with a linefeed only, but some systems (e.g., classic Mac OS, before Mac OS X) end lines with a carriage return only. This third convention probably only makes sense to those of us old enough to remember using typewriters, but files that follow it still turn up.
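Before converting anything, it can help to know which of the three conventions a file follows. The following is a minimal sketch, not a standard utility: the eol_type function, the sample file names and their contents are all made up for illustration, and the check is only a rough heuristic based on counting carriage returns and linefeeds.

```shell
# Hypothetical helper (not from any standard toolkit): guess a file's
# line-ending style by counting carriage returns (octal 015) and
# linefeeds (octal 012). This is a heuristic, not a strict parser.
eol_type() {
    cr_count=$(tr -cd '\015' < "$1" | wc -c | tr -d ' ')
    lf_count=$(tr -cd '\012' < "$1" | wc -c | tr -d ' ')
    if [ "$cr_count" -eq 0 ] && [ "$lf_count" -gt 0 ]; then
        echo "unix"
    elif [ "$cr_count" -gt 0 ] && [ "$lf_count" -eq 0 ]; then
        echo "cr-only"
    elif [ "$cr_count" -gt 0 ] && [ "$cr_count" -eq "$lf_count" ]; then
        echo "dos"
    else
        echo "mixed or none"
    fi
}

# Sample files; names and contents are assumptions for this sketch.
tmpdir=$(mktemp -d)
printf 'apple\r\nbanana\r\n' > "$tmpdir/dos.txt"
printf 'apple\nbanana\n'     > "$tmpdir/unix.txt"
printf 'apple\rbanana\r'     > "$tmpdir/cr.txt"

for f in dos.txt unix.txt cr.txt; do
    echo "$f: $(eol_type "$tmpdir/$f")"
done
```

A file like the script shown earlier, where the last line lacks a carriage return, would be reported as "mixed or none" by this sketch, which is a reasonable prompt to look at it more closely.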

How to make it right

Some Unix systems (like Solaris) provide utilities for converting DOS text files to Unix (linefeed) text files and vice versa. The dos2unix and unix2dos utilities will read a file in one format and create another, adding or removing the carriage return depending on the direction of the conversion. Numerous other tools can be used to effect the same conversion.

The dos2unix and unix2dos commands are generally used in one of two ways:
% dos2unix < shoplist.txt > shoplist
% dos2unix shoplist.txt > shoplist
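In either case, the new shoplist file is written without the ^M (carriage return) characters. Note that this describes the Solaris commands. The dos2unix found on many Linux systems converts a file named on the command line in place rather than writing it to standard output, so the first (redirected) form is the safer of the two there.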

The tr (translate) command can also be used to remove carriage returns:
cat shoplist.txt | tr -d '\015' > shoplist
In this command the 015 represents the octal code for a carriage return in ASCII.
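To check that a conversion like this did what you wanted, you can count the carriage returns left in the output. This is a small sketch under assumed conditions: the sample file is created on the spot in a temporary directory, and input redirection is used in place of cat, which is equivalent here.

```shell
# Create a sample DOS-format file; its contents are made up for this sketch.
tmpdir=$(mktemp -d)
printf 'apple\r\nbanana\r\negg\r\n' > "$tmpdir/shoplist.txt"

# Strip every carriage return (octal 015), reading the file via redirection.
tr -d '\015' < "$tmpdir/shoplist.txt" > "$tmpdir/shoplist"

# Delete everything EXCEPT carriage returns and count what is left.
remaining=$(tr -cd '\015' < "$tmpdir/shoplist" | wc -c | tr -d ' ')
echo "carriage returns remaining: $remaining"
```

Keep in mind that tr -d removes every carriage return in the file, not only those at the ends of lines.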

A similar command could be used to strip linefeeds, leaving just the carriage returns:
cat shoplist.txt | tr -d '\012' > shoplist.macos
Files treated this way would look very strange on most Unix systems. Using cat -v to make the carriage returns visible:
% cat -v shoplist.macos
a apple^Mb banana^Mc coconut^Md donut^Me egg
Then, of course, there are the Perl commands for doing the same thing. To change carriage return, linefeed endings to linefeed-only endings, you could do this:
perl -p -i -e 's/\r\n/\n/' shoplist.txt
The forward slashes in this command are the delimiters of the substitution. They separate the pattern to find, a carriage return followed by a linefeed (\r\n), from its replacement, a linefeed by itself (\n).

Or you could just strip out the carriage returns like this:
perl -p -i -e 's/\r//' shoplist.txt
To turn linefeeds back into carriage return linefeeds, you could use a command like this one:
perl -p -i -e 's/\n/\r\n/' shoplist.txt
To turn linefeeds into carriage returns or carriage returns into linefeeds, you would use one of the following commands. The second needs the g modifier because Perl reads a file with no linefeeds as a single line:
perl -p -i -e 's/\n/\r/' shoplist
perl -p -i -e 's/\r/\n/g' shoplist.macos
Similar commands using tr would look like these:
cat shoplist | tr '\012' '\015' > shoplist.macos
cat shoplist.macos | tr '\015' '\012' > shoplist
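Since tr swaps one character for another, converting in one direction and then back again should return the original file. A quick way to convince yourself, sketched here with a made-up sample file and an extra output name (shoplist.back) that is purely illustrative:

```shell
# Create a sample Unix-format file (contents are made up).
tmpdir=$(mktemp -d)
printf 'apple\nbanana\negg\n' > "$tmpdir/shoplist"

# Linefeeds to carriage returns, then back again.
tr '\012' '\015' < "$tmpdir/shoplist" > "$tmpdir/shoplist.macos"
tr '\015' '\012' < "$tmpdir/shoplist.macos" > "$tmpdir/shoplist.back"

# cmp -s is silent and reports through its exit status only.
if cmp -s "$tmpdir/shoplist" "$tmpdir/shoplist.back"; then
    roundtrip=same
else
    roundtrip=different
fi
echo "round trip: $roundtrip"
```

This one-for-one property is also why tr alone cannot add carriage returns to produce DOS endings: that conversion replaces one character with two.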
There are many ways to convert text files to the proper format for the target system. The only problem I have with the perl approach is that it tempts me to add "pie" to my shopping list!

Sandra Henry-Stocker has been administering Unix systems for more than 18 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She currently works for TeleCommunication Systems, a wireless communications company, in Annapolis, Maryland, where no one else necessarily shares any of her opinions. She lives with her second family on a small farm on Maryland's Eastern Shore. Send comments and suggestions to bugfarm@gmail.com.

Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.