From: www.itworld.com

Linux file compression tool guide

by Jacek Artymiak

March 23, 2001

 

The compression and decompression of files is one of the most useful inventions in the history of computing, but the lack of portable and open tools poses a major obstacle to making the process as useful as it could be. As long as we exchange files with other users of Unix-like operating systems, we can use portable tools like compress, gzip, and bzip2 (see Resources for links). But the rest of the world uses a lot of proprietary software -- and even proprietary compression algorithms -- that the owners may never release to the public.



Such a state of affairs could prove dangerous. Imagine a future in which we can't decipher a substantial portion of our archives because we created them using proprietary tools and algorithms, and the operating systems and hardware those tools ran on became obsolete and disappeared. That is why we should use open source compression tools, or at least commercial compression tools that use free and well-documented algorithms. (Just remember to turn any special enhancements off.)



This article is a guide to working with .arc, .arj, .lzh (.lha), .rar, .sit, .zip, and .zoo files on the Linux operating system. I chose to discuss those particular formats because users of the three most popular operating systems (MS-DOS, Microsoft Windows, and Mac OS) use them most often. I wrote this guide to quickly point out the right tools, rather than to act as a detailed and technical discussion of file compression techniques.
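The tool choices in this guide can be summarized as a small lookup table. The following Python sketch maps each extension to the extraction command named in the corresponding section. The names EXTRACTORS and extraction_command are hypothetical, and the function only builds the command; it does not run it or check that the tool is installed.

```python
import os

# Extraction commands as given in this guide, keyed by filename extension.
EXTRACTORS = {
    ".arc": ["unstuff"],
    ".arj": ["unarj", "e"],
    ".lzh": ["lha", "e"],
    ".lha": ["lha", "e"],
    ".rar": ["rar", "e"],
    ".sit": ["unstuff"],
    ".zip": ["unzip"],
    ".zoo": ["zoo", "e"],
}


def extraction_command(path):
    """Return the argument list that would extract `path`, or None if unknown."""
    ext = os.path.splitext(path)[1].lower()
    prefix = EXTRACTORS.get(ext)
    if prefix is None:
        return None
    return prefix + [path]
```

For example, extraction_command("old.arj") returns ["unarj", "e", "old.arj"], which could be handed to subprocess.run once the tool is installed.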



Lastly, I'd like to include a short note about self-extracting files. Such archives are in reality programs that contain data in compressed form. When you execute them, they will unpack and copy the data stored within the body of the program to whatever drive you specify. The only problem is that they do not work under Linux and there are no tools to extract them. In such cases you will need to ask the person who created the archive to compress it as an ordinary archive.



That's it for now. I hope you will find this guide useful and that you will write to me with your comments and suggestions.






.arc



Files with the .arc extension are relatively rare. If you do stumble upon one of them, you can reasonably assume that it was created with the old MS-DOS SEA ARC or PKware PKARC archival utilities. This format is not well supported on Linux; at best, you can decompress those files on a Linux machine. If you need to create .arc files, try running the original SEA ARC or PKware PKARC (look for them on FTP servers carrying utilities for MS-DOS) under DOSEMU or VMware.



To decompress .arc files, use Aladdin Expander for Linux. At the time of this writing, Aladdin Expander was available for free for public beta testing; whether it will remain free is an open question.



To decompress an .arc file, type unstuff file.arc. A useful option, -d, specifies the destination directory for decompressed files, for example: unstuff -d=/home/james/incoming file.arc.



Note: to learn more about Aladdin Expander for Linux, read the section about .sit files.



VMware: http://www.vmware.com



DOSEMU: http://www.dosemu.org



Aladdin Systems: http://www.aladdinsys.com/expander/expander_linux_login.html







.arj



Files with the .arj extension are created using ARJ Software's ARJ utility for MS-DOS and Windows. Since ARJ is a shareware program without freely available source code, tools matching its functionality on the Linux platform are almost nonexistent, which makes working with .arj files unnecessarily difficult.



All you can hope for with an ARJ file is to successfully decompress it. Compression in the .arj format is not possible under Linux, since there are no native compression tools for Linux that generate the files. (You may try running the original ARJ software under DOSEMU or VMware, but that hardly represents an easy-to-use solution.)



To decompress .arj files, use the unarj utility. It is slower and less capable than ARJ itself, but at least it extracts a majority of .arj files without problems. It can only extract files into the present working directory, list the contents of an archive, or test an archive.



To see a short help page, type unarj; for a longer description, see the unarj.doc file, usually located in the /usr/doc/unarj-2.43 directory.



You can decompress an .arj file anywhere you like, but remember that unarj extracts everything into the present working directory. To avoid cluttering up your home directory, create a temporary subdirectory there, move the compressed file into it, change the present working directory to the new directory, and then use unarj e archive.arj to decompress the file in question.
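The steps above can be scripted. This is a sketch in Python, assuming unarj is installed and on the PATH. The function name unarj_in_subdir is hypothetical, and the run parameter exists only so that the command can be swapped out when testing.

```python
import os
import shutil
import subprocess
import tempfile


def unarj_in_subdir(archive, home=None, run=subprocess.run):
    """Move `archive` into a fresh subdirectory of `home` and extract it there."""
    home = home or os.path.expanduser("~")
    # Create a uniquely named temporary subdirectory inside the home directory.
    workdir = tempfile.mkdtemp(prefix="unarj-", dir=home)
    moved = shutil.move(archive, workdir)
    # unarj can only extract into the present working directory,
    # so run it with cwd set to the new subdirectory.
    run(["unarj", "e", os.path.basename(moved)], cwd=workdir, check=True)
    return workdir
```

The function returns the path of the new subdirectory, so the extracted files can be inspected and the directory removed afterwards.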



unarj for Linux is also limited to extracting all files at once; you cannot extract individual files from an archive. On the positive side, unarj will not overwrite a file that already exists in the present working directory when the archive contains a file with the same name. Also, unarj does not support empty directories or self-extracting archives; the latter are MS-DOS/Windows programs that will not run under Linux, but may execute under DOSEMU or VMware.



Here are some additional unarj options:




Please note that the unarj options do not begin with a minus (-) sign.



If unarj fails to decompress a file, try Aladdin Expander for Linux (see the section on .sit for more information). You could also ask the person who created the archive to compress it using another tool such as gzip, which is available for MS-DOS for free. You might also ask for a simple .arj archive, in which all advanced ARJ options like volumes, dividing the archive into smaller parts, and self-extraction would be turned off.



Why not use self-extracting ARJ archives? These are MS-DOS-style executables that do not run under Linux; they use different system libraries, and their internal format differs from what Linux expects an executable binary to be. You may have luck running self-extracting ARJ archives under DOSEMU or VMware.



Compression in ARJ or JAR formats remains impossible on Linux and, according to ARJ Software's FAQ page, we should not hold our breath for a Linux port. If you know of any Linux tools that can decompress .jar files, I'd like to hear about them.



You can download unarj sources from ARJ Software's site or from the FTP server that carries your favorite Linux distribution. Also, unarj is often a part of basic Linux distributions, so you may find it on the main distribution CD-ROM. For a list of links to all Linux distributions, see the Linux distributions page.



ARJ Software: http://www.arjsoft.com



ARJ's FAQ page: http://www.arjsoft.com/faq.htm



ARJ's download page: http://www.arjsoft.com/files.htm



DOSEMU: http://www.dosemu.org



VMware: http://www.vmware.com



Aladdin Expander: http://www.aladdinsys.com/expander/expander_linux_login.html



gzip: http://www.gnu.org/software/gzip/gzip.html



Linux distribution page: http://www.linux.org/dist/index.html







.lzh (.lha)



Files with .lzh or .lha extensions have been compressed with the LHa, LHarc, or LHx compression utilities developed by Y. Tagawa, H. Yoshizaki, Momozou, and Masaru Oki. Those utilities have been ported to many operating systems, and files created with them are fairly portable. Mats Andersson did the Linux port.



Unlike gzip, the lha utility performs multiple file compression. To compress a single file, type lha a archive file. Archives lha creates bear the .lzh extension. Should archive.lzh already exist in the present working directory, the file you tell lha to compress will be added to this existing archive. The previous contents of archive.lzh will be preserved, unless it already contains a file with the same name, in which case the old file will be replaced with the new one. This action is blind, meaning that it does not check the time stamps of files; to make sure that only files with a newer time stamp replace older files already saved in the archive, use the u option instead of the a option (e.g., lha u archive file).
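The difference between the a and u options can be modeled in a few lines. The Python sketch below is a toy model of the replacement rules as described in this guide, not lha's actual implementation; the archive is represented as a plain dictionary, and the name add_member is hypothetical.

```python
def add_member(archive, name, mtime, update_only=False):
    """Model of the `a` and `u` rules described in the text.

    `archive` maps member names to modification times (any comparable value).
    Returns True if the member was stored, False if it was skipped.
    """
    if update_only and name in archive and archive[name] >= mtime:
        # `u`: the stored copy is at least as new, so keep it.
        return False
    # `a` (or a newer file under `u`): store it, replacing any same-named member.
    archive[name] = mtime
    return True
```

With the a rule, an older file on disk silently replaces a newer copy in the archive; with update_only=True the newer stored copy survives.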



When the person who receives an .lzh file complains that the archive cannot be decompressed, try lha ag archive file or lha ao archive file. The g and o options instruct lha to use the generic or the lha-compatible archival methods respectively. If you are only updating existing archives, use u instead of a.



You can archive multiple files or a single file with equal ease: either use wildcards (e.g., lha a archive files*) or type the name of the top directory in which the files are stored (e.g., lha a archive directory). All files and subdirectories will be automatically stored in the archive. File replacement rules and the u option work for multiple files in the same way as they work for a single file. Wildcards use the same syntax as they usually do in your favorite shell.



Here are some additional lha options:




Note that lha options do not have to start with a minus sign (-), and that there should not be spaces between them. For a list of additional options, type lha.



Decompression of .lzh and .lha files is easy; just type lha e archive.lha. If you prefer to decompress files into a directory other than the present working directory, type lha ew=path archive.lha.



You can also use lha w=path -e archive.lzh to extract the contents of the archive into the directory indicated by path. lha -ie archive.lzh will extract the contents of the archive but ignore the directory paths stored in it.



The source and binary versions of lha are freely available from servers that carry your favorite Linux distributions. For a list of links to all Linux distributions, see the Linux distributions page.



Linux distro page: http://www.linux.org/dist/index.html



gzip: http://www.gnu.org/software/gzip/gzip.html







.rar



Files with the .rar filename extension are archives created with the help of the RAR and WinRAR archival and compression tools developed by Eugene Roshal. He first released them for the MS-DOS operating system. RAR became a hit among users who valued its wide range of features and high compression ratios, as well as its friendly user interface, which was similar to Norton Commander's (or Midnight Commander's).



Since RAR for Linux is a full port of the software, you can enjoy all of the benefits of original RAR (except the MC-style interface). Remember, though, that RAR for Linux is not free; you must register it with T:mi Softronic, which is based in Finland.



The list of RAR for Linux options is impressive and could be used as a to-do list by developers of other archival and compression tools. To compress a single file, type rar a archive file. The .rar extension will be appended automatically to the archive name (so its full filename will be archive.rar). If archive.rar already exists in the present working directory, then the file you are trying to add will simply be added to the existing archive. Only when archive.rar already contains a file with the same name will the old file be replaced with the new one. Other files stored inside the archive will be unaffected. Keep in mind that this action is blind, meaning that RAR does not check the time stamps of files; that check is only done if you add the -u option after a (e.g., rar a -u archive file). Note that there is no minus sign (-) before a.



The archiving of multiple files is very easy: to archive all files and directories in the present working directory, type rar a archive. To archive the contents of a particular directory, type rar a archive /path/to/directory. To archive groups of files, enclose them in quotes, like this: rar a archive '*.cpp'.



File replacement rules and the -u option work for multiple files in the same way that they work for a single file. Wildcards use the same syntax that they usually do in your favorite shell, but you must remember to quote them.
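The quoting matters because the shell would otherwise try to expand the wildcard itself before RAR sees the pattern. A short Python sketch, using the standard shlex module, shows how a command line with safely quoted patterns could be assembled; the helper name rar_add_command is hypothetical, and the function only builds the string without running RAR.

```python
import shlex


def rar_add_command(archive, *patterns, update=False):
    """Build a shell command line for `rar a`, quoting wildcard patterns."""
    args = ["rar", "a"]
    if update:
        args.append("-u")  # only replace members that are older than the file on disk
    args.append(archive)
    args.extend(patterns)
    # shlex.quote wraps any argument containing shell metacharacters in single quotes.
    return " ".join(shlex.quote(arg) for arg in args)
```

Calling rar_add_command("archive", "*.cpp") produces the same quoted form used in the text above.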



Here are some additional RAR options:




Note that not all RAR options start with a minus sign (-); for details, run RAR without any options and arguments (e.g., rar/rar, if the RAR directory is located in the present working directory). See the RAR manual file (it's the rar.txt file, also located in the RAR directory) for more details.



With RAR, you can decompress whole archives (use rar e archive.rar), a single file (use rar e archive.rar file), or groups of files (use rar e archive.rar '*.cpp'). Since RAR can work with multiple files, you can also extract groups of files from more than one archive using rar e '*.rar' '*.cpp'.



When RAR cannot extract a file from an archive, use the repair option to rescue it: rar r archive.rar.



You can get RAR for Linux from the official RAR site as a self-extracting archive. You will need to run it with the ./rarlnx271.sfx command (the number at the end may differ as new versions of RAR are released). The archive will unpack its contents into the automatically created rar subdirectory of the present working directory. There you will find the rar binary. You may move the rar binary to the /sbin or /usr/sbin directory to make it available to all users. Apart from RAR itself, you can also get UnRAR, a small utility to decompress .rar files, which is handy if you want to give other users the option to open such archives but not to create them.



RAR site: http://www.rarsoft.com







.sit



The .sit extension usually accompanies archives created with Aladdin Systems' StuffIt archiver for Mac OS. Those files may have additional .hqx or .bin extensions indicating that they have been processed with BinHex or MacBinary utilities to create a single text or binary file which can be transferred electronically over computer networks. (Macintosh files often have two parts, called forks, that must be joined together before transfer to make sure that they are transferred as a whole.) You can handle such encoded files with utilities from the macutil package, which is free and available for all decent Linux distributions.



Since Aladdin Systems controls the source code for StuffIt, your only choice when you receive an .sit file is to use the Aladdin Expander for Linux, which is currently in beta (but quite usable). Aladdin Systems has made the beta available to the public as freeware: you do not have to pay cash for it, but you do not get access to the source code either. Currently there are no tools for Linux that will create .sit archives.



To decompress an .sit archive, type unstuff archive.sit. The extracted files will go into the present working directory unless you use the -d option, which lets you specify the destination directory: unstuff -d=/home/james/oldmacfiles archive.sit. If the file you unpack was protected with a password, use the -p option: unstuff -p=secret archive.sit. The text file translation options take care of translating end-of-line characters from LF to CRLF and back again: unstuff -text=auto -eol=unix archive.sit.



Find more information about the Expander on its man page (type man unstuff to display it).



Aladdin Expander for Linux is available from the Aladdin Systems Website. There are no fees for using or downloading it, but you must register with Aladdin Systems. There are two versions of the Expander, one for RPM-based systems (Red Hat, Mandrake, SuSE, and others), and the other for .deb-based systems (Debian, Corel, and others). There are no Slackware-specific packages, but it should not be difficult to convert the RPM package to work on Slackware.



Aladdin Systems: http://www.aladdinsys.com



Linux distro page: http://www.linux.org/dist/index.html



macutil package: http://www.linux.org/dist/index.html



Aladdin Expander for Linux: http://www.aladdinsys.com/expander/expander_linux_login.html







.zip



The .zip extension denotes a file that was created using one of many zip archivers and compressors (but not gzip). Since this is a very popular compression format, and detailed descriptions of the algorithm are widely available, you can find useful ports of it for all operating systems. This includes both compression and decompression utilities that create and expand archives with the .zip filename extension. On Linux, there are two such tools: the free Info-ZIP and the commercial PKZIP for Linux. If you only need to occasionally create or open zip files, use Info-ZIP. Choose PKZIP if you want to use the same tool you used on MS-DOS or other systems (PKZIP is available for many operating systems). Both utilities can create and open archives compatible with each other, WinZIP for Microsoft Windows, or StuffIt for Mac OS.



Info-ZIP offers a good choice of compression and decompression options, and zip is probably the best format for exchanging compressed files between users of Linux, Microsoft Windows, and Mac OS when using gzip or tar is not an option. There are a lot of good zip programs (both open source and commercial) available for those operating systems, and they should ensure a smooth exchange of files (as long as special features unique to one particular tool are turned off, of course).



To archive a single file, type zip archive file. To compress multiple files, use wildcards (e.g., zip archive picture*jpg). It is also possible to pipe files into zip; type man zip for more information. To archive the contents of whole directories, including all subdirectories within them, use the -r option and type the name of the directory after the name of the archive (e.g., zip -r archive directory). You can also type zip -r archive . to archive the current working directory. The .zip extension is automatically added to the name of the archive.
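Because the zip format is openly documented, archives of this kind can also be produced without the zip binary. The following Python sketch uses the standard zipfile module as a rough equivalent of the recursive directory case; the function name zip_directory is hypothetical, and the sketch skips empty directories for brevity.

```python
import os
import zipfile


def zip_directory(archive, directory):
    """Store every file under `directory` in `archive`, roughly like `zip -r`."""
    if not archive.lower().endswith(".zip"):
        archive += ".zip"  # add the extension when the name lacks it
    # Member names are stored relative to the parent of `directory`,
    # so they keep the directory name as their first path component.
    parent = os.path.dirname(os.path.abspath(directory))
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, arcname=os.path.relpath(path, parent))
    return archive
```

The archive should be written outside the directory being stored; otherwise the walk would pick up the partially written archive itself.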



When zip finds an archive with the same name as the one you are trying to create, it will add the file you want to compress to the existing archive without removing the files that are already there. There is one exception to this rule: if one or more files stored in the existing archive have the same names as files that you want to compress, then the old files will be replaced with the new ones. File replacement rules can be modified with the -u option (see man zip for more information).



Here are some additional zip options:




Note that not all zip options start with a minus sign (-).



PKZIP options are similar to Info-ZIP, but you should read the pkzip.htm manual (it's an HTML document; read it with lynx pkzip.htm or netscape pkzip.htm) to make sure you know what you are doing.



To decompress a zip file, use the unzip archive.zip command. If you want, you may specify the directory in which the archive should be expanded with the -d option (e.g., unzip file.zip -d /home/james/zips will extract the contents of the file.zip into the /home/james/zips directory).
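The same extraction can be sketched with Python's standard zipfile module, which may be convenient on a machine where unzip is not installed. The function name unzip_to is hypothetical; it mirrors the unzip file.zip -d directory form shown above.

```python
import zipfile


def unzip_to(archive, destination):
    """Extract every member of `archive` into `destination` and list the members."""
    with zipfile.ZipFile(archive) as zf:
        # extractall creates the destination directory if it does not exist yet.
        zf.extractall(destination)
        return zf.namelist()
```

The returned list of member names can be used to check what was unpacked.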



Note: Aladdin Expander for Linux can decompress zip files as well. For more information see the section about .sit files.



When unzip cannot extract files from an archive, use zip's -F or -FF repair options to rescue it (e.g., zip -F archive.zip or zip -FF archive.zip). Remember to make backup copies before trying to rescue damaged archives.



You can get Info-ZIP for Linux in binary or in source form from the official Info-ZIP site.



You can download PKZIP for Linux from the PKZIP official site. It is distributed as a self-extracting archive. To unpack it, type ./pklin251.exe, and the contents of the archive will be unpacked into the present working directory. You may want to copy the pkzip25 binary into the /sbin or /usr/sbin directory to make it available to all users on your system. (You must buy an appropriate license, though, as it is a commercial package.)



gzip: http://www.gnu.org/software/gzip/gzip.html



tar: http://www.gnu.org/software/tar/tar.html



WinZIP: http://www.winzip.com



StuffIt: http://www.aladdinsys.com



Aladdin Expander: http://www.aladdinsys.com/expander/expander_linux_login.html



Download Info-ZIP in source form: http://www.freesoftware.com/pub/infozip



Download Info-ZIP in binary form: http://www.linux.org/dist/index.html



Download PKZIP: http://www.pkzip.org/shareware/pkzip_unix.html







.zoo



The .zoo extension does not appear on the Internet very often, but rest assured that you can open and create such files with Linux's zoo utility. Keep in mind, however, that zoo is old and rather simple by today's standards.



To archive a single file, type zoo a archive file. Similarly, to compress multiple files, you can use wildcards -- for example, zoo a archive picture*jpg. The .zoo extension is automatically added to the name of the archive.



Here are some additional zoo options and the functionality they add:




Note that zoo options do not start with a minus sign (-).



To decompress a .zoo file, type zoo e archive.zoo.



You can get zoo for Linux in binary form for all Linux distributions.



Download zoo in binary form: http://www.linux.org/dist/index.html





Resources