Unix Tip: File typing with magic

ITworld 12/5/2007

Following last week's column on file extensions, several readers wrote in to mention the other ways in which Unix systems determine file types and, as a consequence, how to handle files when you work with them. In particular, they mentioned the /etc/magic file and the file signatures that it provides to identify file types regardless of how the files are named -- even, in fact, when no file extensions are used.

To demonstrate the file typing operation, let's examine a file named "unknown" and see what we can learn about it.

First, here's a listing of the file:

  > ls -l unknown
  -rw-r--r-- 1 henrystocker staff 12578 Dec 4 11:10 unknown
 
When we ask the file command to identify it, it has no trouble determining that this particular file is a JPEG file:

  > file unknown
  unknown: JPEG file
 
JPEG files, like other image file types (PNG, GIF, TIFF, etc.), contain a form of file identifier in addition to the data that makes up the image itself. They start with a particular sequence of bytes. This byte sequence might be \377\330\377\340 (0xffd8ffe0 in hexadecimal) or \377\330\377\341 (0xffd8ffe1). The first two bytes, 0xffd8, mark the start of every JPEG image; the next two, 0xffe0 or 0xffe1, indicate whether the image uses JFIF or EXIF. You might think of these formats as extensions of the JPEG format, designed to support image details not specified by the JPEG standard.

If we examine the beginning of our "unknown" JPEG file, for example, we might see something like this:

  > od -bc unknown | head -2
  0000000 377 330 377 340 000 020 112 106 111 106 000 001 001 000 000 001
  377 330 377 340 \0 020 J F I F \0 001 001 \0 \0 001
 
As you can see from the second line of output, this file uses JFIF (JPEG File Interchange Format).

The second variety of JPEG uses EXIF (Exchangeable Image File Format). Most digital cameras store image files using this format.

  % od -bc myphoto.jpg | head -2
  0000000 377 330 377 341 077 376 105 170 151 146 000 000 111 111 052 000
  377 330 377 341 ? 376 E x i f \0 \0 I I * \0
 
EXIF markers store additional information about images such as an optional thumbnail and sometimes even audio information.
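
If you'd like to check for these two signatures yourself, the comparison can be sketched as a small shell function. This is only an illustration, not how the file command is implemented. The function name jpeg_flavor is made up for this example, and it assumes an od that accepts the POSIX -A, -t and -N options.

```shell
# jpeg_flavor FILE
# Hypothetical helper (not a standard command): prints "JFIF", "Exif",
# or "unknown" depending on the first four bytes of FILE.
jpeg_flavor() {
    # od -An drops the offset column, -to1 prints each byte in octal,
    # and -N4 stops after four bytes. The unquoted $(...) inside echo
    # collapses od's padding into single spaces.
    sig=$(echo $(od -An -to1 -N4 "$1" 2>/dev/null))
    case "$sig" in
        "377 330 377 340") echo "JFIF" ;;
        "377 330 377 341") echo "Exif" ;;
        *)                 echo "unknown" ;;
    esac
}
```

Given the bytes in the od output shown earlier, running jpeg_flavor on the "unknown" file should print JFIF, and running it on myphoto.jpg should print Exif.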

The /etc/magic file identifies file types by capturing this type of information and expressing it in four or five fields -- the byte offset (distance from the start of the file), the type of the identifying value (e.g., string or short), an optional operator, the value to be matched and the string to be printed by commands that are meant to identify the files.

Here are the Solaris /etc/magic descriptors for both forms of JPEG files:

  0 string \377\330\377\340 JPEG file
  0 string \377\330\377\341 JPEG file

The offset for both JPEG identifiers is zero. In other words, the identifying information is at the beginning of the file. The values to be matched are classified as strings, though each is expressed as four bytes in octal notation, and the "string to be printed" is "JPEG file". The description may vary slightly from one OS to another. My Mac OS X system, for example, uses the description "JPEG image data" while Solaris uses "JPEG file", but both systems share the same basic knowledge about the file types and how to identify them.
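
To make the offset, value and description fields a little more concrete, here is a rough shell sketch of how a single "string" entry might be tested against a file. It is not the file command's actual logic. The function names magic_string_test and identify are invented for this example, the expected bytes are passed as space-separated three-digit octal values rather than in magic-file syntax, and a POSIX od (with -A, -t, -j and -N) is assumed.

```shell
# magic_string_test FILE OFFSET "OCTAL BYTES" DESCRIPTION
# Hypothetical helper that imitates one "string" entry: it compares
# the bytes found at OFFSET with the expected octal values (three
# digits each, separated by single spaces) and prints
# "FILE: DESCRIPTION" when they match.
magic_string_test() {
    path=$1 offset=$2 want=$3 desc=$4
    # Count the expected bytes by splitting the value into words.
    set -- $want
    count=$#
    # Read exactly that many bytes, starting at the offset, in octal.
    got=$(echo $(od -An -to1 -j "$offset" -N "$count" "$path" 2>/dev/null))
    if [ "$got" = "$want" ]; then
        echo "$path: $desc"
        return 0
    fi
    return 1
}

# Try the two Solaris JPEG entries in order and fall back to "data"
# when neither one matches.
identify() {
    magic_string_test "$1" 0 "377 330 377 340" "JPEG file" ||
    magic_string_test "$1" 0 "377 330 377 341" "JPEG file" ||
    echo "$1: data"
}
```

Trying the entries in order and stopping at the first match keeps the sketch short; a real magic file holds many more entries and several value types besides string.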

Ask a Unix system about a .doc or .docx file, on the other hand, and the file command is likely to tell you simply that the file is "data". Windows systems depend on file extensions to a greater degree than their Unix counterparts. While some Windows files do have embedded identifiers, they're not as obvious as those associated with standards-based image files and they don't seem to be heavily relied upon for file identification. Some .exe files, for example, start with the letters "MZ" (the initials of Mark Zbikowski, one of the MS-DOS developers), as shown in this example looking at the PuTTY executable:

  bash-2.05a$ od -bc putty.exe | head -2
  0000000 115 132 220 000 003 000 000 000 004 000 000 000 377 377 000 000
  M Z 220 \0 003 \0 \0 \0 004 \0 \0 \0 377 377 \0 \0
 
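The same kind of check works for the "MZ" signature. The function below is a minimal sketch with a made-up name, is_dos_exe, and it assumes a POSIX od. It says nothing about whether the file is a usable program, only that its first two bytes match.

```shell
# is_dos_exe FILE
# Hypothetical helper: succeeds (exit status 0) when FILE starts with
# the two bytes "MZ", and fails otherwise.
is_dos_exe() {
    # 115 and 132 are the octal codes for "M" and "Z", as in the od
    # output above.
    [ "$(echo $(od -An -to1 -N2 "$1" 2>/dev/null))" = "115 132" ]
}
```

Because the function reports its result through its exit status, it can be combined with other commands, as in: is_dos_exe putty.exe && echo "putty.exe starts with MZ".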


In both Windows Explorer and the Solaris file manager, when you double-click a file to open it, the system examines the file name extension. If it recognizes the extension, it opens the file with whatever program is associated with it. If Windows doesn't recognize a file's extension, it asks you what program it should use. If Solaris doesn't recognize a file's extension, or the file doesn't have one, it subjects the file to greater scrutiny and seems to assign the proper icon and action to it.

Identifying the type of files on the systems you manage isn't really magic. Instead, it's a carefully crafted system that makes use of the ways in which file types are defined.

Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.