File: WHY_TXT.TXT
Guy Dunphy 8/6/95, Rev 16/12/96

The Perils of MS Word
or
Why Plain Text is Preferable to Word Processor Formats

So, you want the documents you type to look pretty, and show off your fine sensibilities and good taste? You feel you really _must_ use just the right font, and that italics, underline and bold are essential elements of your unfettered self-expression? You'd die without your embedded graphics?

Maybe so. But tell me: where is the oldest document you ever wrote on a computer now? If it's more than, say, five years old, if you used a word processor, and you didn't save it in 'text only' format, I'd say the chances that you can still print out a copy now are fairly slim.

Have you thought about times decades hence, when you (or others) want to re-read the documents you're writing today?

For anyone who is concerned with future accessibility of today's data, there are many reasons to forgo the deceptive benefits of complex word processors and stick to the simple basics. Here are some of those reasons:

Benefits of simple text editors
-------------------------------

* Typically, the editor fits on a single floppy, so you can easily transport it and back it up. It can easily be transferred to new storage media types as they are introduced. You will be able to keep the editor for as long as you wish.
* Even when removable storage capacities rise to levels where the size of any utility is insignificant, an editor's size will still matter. Because it is relatively small, the editor will load and run fast. There is a great satisfaction in being able to hit the 'E'(dit) key on a selected file, and have it flash up in an open edit window faster than you can blink. I suspect that no matter how fast processors get, complex word processors will always take a moment or two to get started, if only because their writers will always load them up with as many features as the processor power can bear.
* A text editor's small size and simple structure mean that it consists of only one or two files; you know what they are and what they do. What you must do to personalise an installation will be quite clear. Simplicity is an important point in maintaining control of tools critical to your life's activities.
* With such a simple structure, its setup and installation are simple, so for as long as a compatible operating system is in use, you will be able to use the editor. You'll be able to install it anywhere, anytime, simply and easily. Compare that with typical Windows editors, with their stacks of disks, owner-name branding, and automatic installers with questionable hidden media searches and 'intelligent' overwriting of 'previous versions'.
* Because it's not so intricately bound up with the OS details, you do not become a slave to Microsoft, forced to follow them up the path of perpetual upgrades. There will always be plain ASCII editors, but there will _not_ always be, say, a 'Word for Windows, V6' compatible editor. It is an unfortunate fact that large software companies have a tendency to deliberately change the internal format of 'document' files between revisions of their editors. They do this to discourage competitors from offering viewers and tools for 'their' file format.
* Keeping all your texts in plain, universal ASCII format means you are not committing yourself to regularly converting them all as you go through versions of word processors. Realistically, you wouldn't do this anyway (which box of floppies did you say your backups are in?), so you'd eventually _lose_ your archived texts (ie they'd become unreadable).
* Plain ASCII text files are much more compact for their information content than equivalent word processor document files. Size ratios of ten to one are not unusual. So for a given piece of prose, it is much more efficient in storage space and network bandwidth to keep it in plain text form.
* You can grep (search automatically) through plain text files in bulk, but _not_ through word processor files. Will you always be able to remember just which file had the phrase you want somewhere in it?
* Fancy (expensive) word processors tend to embed information such as their own serial numbers, who they are registered to, and possibly even details about the machine you are writing on, in the document files they create. Sometimes you can see this stuff if you 'dump' the file; other times it is encrypted, and only those in the know (not you) can extract and decipher it. This is not good. In a way, it implies that you are writing 'with their permission' (even if you actually do own the software). Certainly, if you want your writing to be anonymous, don't use a word processor. You want every byte to be visible, so you _know_ there is nothing being hidden in there. Only plain text gives this certainty.
* With ASCII text, you don't have the control over your documents' format that word processors give you, but neither does anyone else. What you type is what other people will see. Without parameters such as font, size, margins and bold, there's no opportunity for others to casually fiddle with them.
* If you want your thoughts to have a wide potential audience, don't use a word processor. No-one is going to spend hundreds of dollars buying an editor different to 'their favourite' just to read your document. Yet every text-manipulating tool on the planet can read or import plain ASCII text. With the public nets in their current state of development, you simply _can't_ post non-ASCII documents unless you first encode their binary files in an ASCII-only form. This often makes it all just too hard.
* Then there are the issues of privacy via encryption, and authentication of authorship using digital signatures. Here too, text files have a great advantage over word processor files.
  There are many utilities available to perform encryption and signature stamping, and all of them work with text files. If you use some word processor format instead, expect no end of trouble with such utilities, where they work at all.
  The importance of this aspect cannot be stressed too strongly. Public use of utilities such as PGP (Pretty Good Privacy) is an absolutely crucial weapon in the developing fight against repressive governments worldwide, and their intent to retain power (ie control of information). The retention of plain ASCII as a standard medium is vital to the defence of the right to free and private communication. Indeed, hardly any paranoid tendencies at all are needed to contemplate the idea that the enthusiasm of the major forces of the software world for non-ASCII formats might have something to do with this.

A related matter is another one of Muckroscum's many acts of bastardry: their appropriation of the .DOC file suffix for Word for Windows. Thanks, Microsux, for commandeering such a generic suffix for your proprietary application. You could have used .WWN, .WWD, etc., but oooh noooo..... it had to be something already in general use to indicate an informative ASCII text file. After all, nobody _owns_ the .DOC extension, right? So why shouldn't you use it, eh? Ever heard of showing consideration for others? How about common decency? I'm surprised you didn't copyright it. Did you?

So now, when we see a file like 'HINTS.DOC' and try to view it, what we often see is a heap of binary garbage. 'Ah! A Murkosunk file. Drat,' we think. 'Is it worth going into Winkooze and starting WforW just to see this file? Probably not,' we continue. So we end up chewing through it in 'dump' mode, scanning for the strings amongst the binary formatting trash, just to check it out.

Summary
-------

When the document is expected to be only transient, some of the points above may not apply.
However, there are many situations where permanence of the information is extremely important: commercial product development and documentation, for instance. In these cases, the creation of 'pretty layouts' is about as wise as the wearing of formal business suits and ties: a foolish slavery to fashion, a sign of detachment from practical reality, a symptom of meme infection.

It's said that the use of acidic paper for most books published this century will result in the greatest loss of information in the history of mankind. Most books published from the early 1900s until recently will age very rapidly compared to earlier works printed on acid-free paper. Within a hundred years, virtually all this century's books and documents will have crumbled to yellowing dust as residual acid destroys the cellulose fibres of the paper, while earlier books, printed on simpler but more robust paper, will still be sound. Unwise use of 'new technology' will have deprived all future generations of the cultural inheritance of this important and rich century, when so many real advances occurred.

Yet this loss pales into insignificance compared to the losses that will eventually be experienced as society shifts to predominantly digital information storage, _unless_ there is widespread adherence to universal and flexible data storage formats. So far, the very opposite is happening. Commercial interests are once more shortsightedly poisoning our cultural archives, this time with the 'acid' of proprietary file formats. The urge to use the 'fashionable file format of the day' must be strongly resisted where the format obviously has no potential for permanence.

That's why I try to ensure that all the key product documentation I produce gets archived as ASCII text, along with other files like schematics. Often people take this as a sign of technological backwardness; after all, it 'looks so crude' compared to a nicely typeset Word document.
Still, it will take a lot longer to revert to binary dust.