File Formats
Audio
Windows Media Audio is not widely supported on non-Windows operating systems; most players that do support it rely on the FFmpeg video transcoding toolkit as their backend. In addition, FFmpeg does not currently support Lossless WMA.
On a related note, many lossless compression formats are difficult, if not impossible, to play back on non-Windows operating systems. For example, the reference implementations of Monkey's Audio Codec and Shorten (upon which most playback tools are based) have non-commercial licensing clauses which make them incompatible with the popular GNU GPL source license.
In addition, some Linux distributions, fearing software patents, omit support for MP3 and AAC playback, but asking people to avoid those formats entirely would be going overboard.
If you are employing lossy compression for audio, please use one of the following:
- Ogg Vorbis (Recommended)
For lossless compression, one of the following formats is suggested:
As expected, no open-source tool can play back DRM-encumbered WMA or AAC files.
Books, Papers, and Other Large Writing Projects
For most people, a tool like Microsoft Word is the first choice for writing anything. In fact, it makes your job harder without you even knowing it.
If you use Microsoft Office to write a book, you have to handle both the composition and the typesetting at the same time, and if you change your mind, you have to edit the entire document manually.
Use a semantic-markup language like LaTeX or DocBook or, if you prefer to keep things visual, a WYSIWYM (What You See Is What You Mean) tool like LyX. That way you can focus exclusively on the content of your book and apply the formatting after it's done. In addition, it keeps content separate from formatting. Change your mind? It takes just one setting to change everything.
Even better, it lets you automatically generate all the different versions (a PDF for printing, a set of HTML files for web hosting, etc.) from one source document with no extra effort.
When sending me word processing documents, please use one of the following formats:
- OASIS OpenDocument (Recommended, a popular international standard)
- PDF (Read-only, but great for printing)
- RTF (but only as a last resort, please)
Please avoid the following formats:
- Microsoft Word (I don't have Microsoft Office)
- Microsoft Office OpenXML (There are technical and legal reasons that this should have been denied international standard status. I also currently cannot open such documents.)
Compression and Archival
On Windows, the archive formats in use vary from community to community, but Zip, RAR, and ACE are among the most popular. However, ACE 2.x (the current version) is a closed format and RAR 3.x (also current) uses patented compression algorithms.
In addition, the native Linux-on-x86 extraction tool for ACE archives is of poor quality, and the only current unrar tool is under a GPL-incompatible license, which, at the very least, makes it very difficult to integrate RAR decompression into GUI tools.
Please use one of the following formats for file compression and archival:
- Zip
- UNIX tar compressed with gzip or bzip2 (Not supported by MS WinXP's "Compressed Folders" feature)
- 7-Zip (Compresses better than RAR and ACE in most situations. Requires p7zip (non-Windows, free) or 7-Zip (Windows, free) to decompress.)
Generic Data Formats
Note: Are you a programmer or interested in programming-related things? If not, you should probably skip this section.
Have you ever created your own file format from scratch? What happened when you needed to extend the design? It probably wasn't pretty. Please use either XML or some other reasonably extensible metaformat. An extensible metaformat lets you add new fields later without breaking existing files or the parsers that read them.
The general rule among Python programmers tends to be "Use XML if you need to send data to another program which may not be in Python. Otherwise, use something less bloated and more human-readable like the ConfigParser module."
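As a rough illustration of the "less bloated" option, here is a minimal sketch using `configparser`, the Python 3 standard-library module that the rule above presumably refers to. The section names, keys, and the `load_settings` helper are all made up for this example.

```python
import configparser

# Hypothetical settings for an imaginary program; the section and key
# names below are assumptions made for this example only.
SAMPLE = """\
[window]
width = 800
height = 600

[user]
name = Example User
"""

def load_settings(text):
    """Parse INI-style text into a plain dict of dicts (all values are strings)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

settings = load_settings(SAMPLE)
print(settings["window"]["width"])
print(settings["user"]["name"])
```

Adding a new key or section to the file later does not break a reader written this way, which is the kind of extensibility a home-grown format tends to lack.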
Images
Several years ago, there was a big furor over software patents related to the GIF image format. Even if that had never happened, people would still be sending "I can't open this" replies every day.
Please use JPEG files when you need lossy compression for photos and photo-like images, and PNG files when you need lossless compression or are compressing images with "hard edges" such as screenshots.
For Windows users, I recommend IrfanView as an image viewer and simple touch-up tool and The GIMP as a free image editor.
Though non-animated vector graphics are not yet very common, please use the open SVG format for sharing them. I recommend Inkscape for creating and editing vector graphics.
Animations
There is currently no widely supported alternative to animated GIF and, therefore, no widely supported free format for animations with more than 256 colors (Flash is not a free format). Please apply whatever pressure you can to get support for MNG images added to browsers such as Firefox and Microsoft Internet Explorer.
Also, while Firefox 3.x adds support for APNG (Animated PNG), the format directly violates the PNG standard: it breaks the promise that a PNG file will contain only one frame, which makes it difficult at best to tell whether a PNG image is static or animated. (That difficulty is the original reason the PNG specification explicitly rules out animation.)
Please only send me APNG animations if you need help converting them to a format which is not an explicit standards violation, such as MNG or animated GIF.
Print-ready Media
Do the world a favor: use PDF for things that have to look the same everywhere, and don't use it for things that won't be printed before being read.
PDF is the easiest and best way to share things that need to look the same everywhere, but anything that doesn't reflow to fit the user's screen will be a pain to read.
You can print to PDF files for free from any program by using PDFCreator. Some free tools, such as OpenOffice.org, don't even need it.
As a side-benefit, you will already have the necessary knowledge to generate PDFs for sites like Lulu.com.
Spreadsheets and Memos
Though they are currently the de facto standard, Microsoft Office formats are, by design and through continued effort, not interoperable with other programs. Despite a great deal of lobbying by Microsoft, a new standard known as OASIS OpenDocument has been introduced. With the potential exception of Microsoft Office, which may require a third-party plugin, every modern office suite now supports it, or will support it, without the need for plugins.
The following free office tools currently use OpenDocument as their native format:
- OpenOffice.org (Recommended)
- KOffice (Not currently available for Windows)
The following free tools have some degree of support for OpenDocument:
As a side note, Microsoft Word documents aren't even guaranteed to print the same way that they display. (1)
Also, as mentioned earlier, there are technical and legal reasons, beyond "I can't open such documents", to avoid Microsoft's new Office OpenXML format. (One could argue that the name is a misnomer.)
Miscellaneous
Character Encodings
Have you ever opened up a text file only to find unintelligible gibberish? Gets annoying, doesn't it?
Use UTF-8 encoded Unicode and encourage your friends to as well. Not only will you be building a world without encoding mismatches, you'll be able to combine different character sets within the same file. For example, you could write a dissertation on high-level math in a non-Latin alphabet.
Oh, and if you're a software developer, please use normalization form C unless forced otherwise. The Mac OS X filesystem requires normalization form D, but Windows, Linux, and the W3C web standards favor normalization form C.
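For developers, a minimal sketch of what that advice looks like in practice, using Python's standard `unicodedata` module. The helper name `to_nfc_utf8` is an assumption made for this example, not an established API.

```python
import unicodedata

def to_nfc_utf8(text):
    """Normalize text to form C (NFC), then encode it as UTF-8 bytes."""
    return unicodedata.normalize("NFC", text).encode("utf-8")

composed = "\u00e9"      # "é" stored as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent (form D style)

# The two strings look identical on screen but compare as different...
print(composed == decomposed)

# ...until both are normalized, after which they produce the same bytes.
print(to_nfc_utf8(composed) == to_nfc_utf8(decomposed))
```

Normalizing at the point where text enters or leaves a program (file names, user input, saved files) is usually enough to avoid such invisible mismatches.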
Dates and Times
If I tell you to meet me at 7:30 on 02/04/05, am I telling you to meet me in the morning, or the evening? ...and which of these dates would you visit me on?
- February 4th, 2005? (Or 1905, or 2105)
- April 2nd, 2005?
- April 5th, 2002?
Please use ISO 8601 (an international standard) dates and times. As an added benefit, if you prefix your filenames with ISO 8601 dates, alphabetical and lexicographical (case-sensitive alphabetical) sorting gain an automatic chronological pre-sort.
For quick reference, February 4th, 2006 at 3:00 PM (local time) in the long form of ISO 8601 time would be 2006-02-04 15:00
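A short sketch of producing such timestamps with Python's standard `datetime` module. The filenames are hypothetical. Note that ISO 8601 itself puts a "T" between the date and the time; the space-separated form is a common, more readable variant.

```python
from datetime import datetime

moment = datetime(2006, 2, 4, 15, 0)  # February 4th, 2006 at 3:00 PM

# ISO 8601 form with the "T" separator between date and time.
strict = moment.isoformat(timespec="minutes")

# The same instant with a space separator, which is easier on the eyes.
readable = moment.strftime("%Y-%m-%d %H:%M")

print(strict)
print(readable)

# Hypothetical filenames prefixed with ISO 8601 dates: a plain string
# sort also puts them in chronological order.
names = ["2006-02-04_notes.txt", "2005-12-31_notes.txt", "2006-01-15_notes.txt"]
sorted_names = sorted(names)
print(sorted_names)
```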
Also, you may like this sci-fi song which employs the ambiguity of two-digit dates to disguise itself as a war song: Queen - '39
Not only does HTML e-mail (the kind where you can set colors and font sizes) take a lot more space than its text-only counterpart, it can also trigger viruses and confirm your e-mail address to spammers when you merely view the mail in the preview window. On top of that, Microsoft Outlook Express' version of HTML differs from the standard, so people who read your mail may not see what you think they will.
Please use text-only e-mail, especially with me: if you send me HTML mail, the fancy colors and other junk will be silently and automatically stripped out by my mail client.
You want proof that my dire warnings are true? Well, I don't feel like digging up old security reports, so here's a summary of how the attacks work:
- An HTML mail can contain JavaScript or references to it, which can then trigger a virus when you preview it.
- An HTML mail can reference a remote image. When the image is loaded by your mail client, the spammer looks at their server logs, knows that your e-mail address is valid, and sends you a flood of junk.
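To make the two mechanisms above concrete, here is a small sketch that scans an HTML mail body for script tags and remotely hosted images, using Python's standard `html.parser` module. The class name, the sample message, and the `tracker.example` address are invented for illustration; a real mail client would need to check far more than this.

```python
from html.parser import HTMLParser

class RemoteContentFinder(HTMLParser):
    """Collect remote image URLs and note whether any <script> tag appears."""

    def __init__(self):
        super().__init__()
        self.remote_images = []
        self.has_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.has_script = True
        elif tag == "img":
            src = dict(attrs).get("src") or ""
            # Only images fetched over the network can report back to a server.
            if src.lower().startswith(("http://", "https://")):
                self.remote_images.append(src)

# Hypothetical mail body: the query string identifies the recipient, so
# simply loading the image tells the sender that the address is live.
SAMPLE_MAIL = (
    "<p>Hello!</p>"
    '<img src="http://tracker.example/pixel.gif?id=12345">'
    "<script>alert(1)</script>"
)

finder = RemoteContentFinder()
finder.feed(SAMPLE_MAIL)
finder.close()

print(finder.has_script)
print(finder.remote_images)
```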
Web Content Creation
Web pages are written in HTML, which is a semantic markup language. If you use the "make bigger" button in a graphical editor to make a header, then tools like screen readers for the blind will just treat it as plain text, rather than the header you intended. It also breaks the separation of content and presentation. Content goes in the HTML, and presentation goes in a CSS style file that the HTML references.
How else does non-graphical editing benefit you? How about sharing a single CSS file across your entire site? Switch in a new one, and your entire site changes its look. For an example of how powerful a technique this is, drop by the CSS Zen Garden.
Best of all, hand-written HTML is much more compact than automatically-generated WYSIWYG HTML. There are still a lot of dial-up users out there.
Also, HTML presentation with CSS styles uses a box model: everything is a box. In the case of this site, the sidebar is a box, the lists inside it are boxes, and the list items are boxes inside those. Keep this design in mind when writing your HTML. You'll end up with much cleaner markup.
Oh, one other thing... Don't muck up your nice clean HTML with JavaScript/ECMAScript behavioural hooks; use Behaviour.js or the equivalent functionality in jQuery instead. As with CSS, you'll get reduced bandwidth consumption as a side-effect.

