File: CPU_86.TXT
Guy Dunphy   (Oops. Lost date that I wrote this)

                        Intel 80x86 Processors

Computer programmers who code only in high-level languages usually give
little thought to the fundamental nature of the processors they use. The
assumption is that 'the compiler takes care of all that', and so they and
their code are unaffected by the details of processor operation.

Well, it isn't so. There are a number of very deep ways in which the CPU,
and the inherited basic development tools for a processor family, _do_
affect and limit what we can do at any level of software development.

Unfortunately, the Intel 80x86 family provides a particularly severe
example. This family seems to have taken just about every wrong turn
possible. Some of the mistakes are simply minor annoyances, like the
little-endian byte order, which makes dealing with data import and export
a pain, not to mention the inconvenience of 'jumbled' numbers in memory
dumps. More serious are the complex instruction set and non-regular
register functionality, which make compiler optimization a real pest.
Still, these things can be more or less ignored at a high level.

What has really sidetracked the whole world of software development,
though, are the 'fatal three':-

  - Non-relocatable code.
  - Segmented address registers/memory structure.
  - Word/paragraph alignment.

Together, these have had ghastly consequences, severely limiting the range
of what we can do as programmers. In fact, they have even limited the
scope of what most programmers seem to be able to _think_ of doing.

The first serious mistake these limitations led us into was the idea that
a module of code should be in one form on a disk, and a completely
different form when loaded into memory. This was slipped past us with the
introduction of EXE files, relocating loaders, and so on. The argument was
that if you have to patch a whole lot of absolute memory references
throughout a piece of code, or a data structure with internal/external
linkages, just so you can put it anywhere in memory, then what does it
matter if you change anything at all as you load the code?

'Not much', you say? OK, fine. Except we lost the fundamental option of
'code and data structure conservation'. A whole bunch of capabilities went
down the plughole when this happened:-

  * Verifiable in-memory data object integrity. We cannot:-

    - Reliably identify a block of code in memory just by looking at it.
      This would allow a 'refuse to run non-conforming code' system.

    - Verify that a module of code is uncorrupted. If the standard module
      form incorporated a checksum system, or even several different and
      interlocking checksum methods, then modules could be checked by
      background tasks while actually executing in memory. (A sketch of
      such a module form follows this list.)

      The two items above would make the survival of computer viruses
      vastly more difficult. This alone is a powerful reason for change.

    - Save it as a working whole to external storage. Now you can load a
      program into memory, but you can't save it back again. While this is
      a useful barrier to software cracking and piracy, it is still a
      conceptually limiting environment. It prevents people from thinking
      of programs as things that can be modified and evolve in their
      executable form: the sort of thing that is natural for interpreted
      code.

  * Inherent relocatability. We cannot:-

    - Move a code module as a unit to another memory area, hence:-

        - Dynamically defragment all memory.

      The two issues here are: whether the object will work at a different
      address, and what other things refer to it and hence must be
      informed of the move. With non-relocatable code the whole idea is
      completely out of the question.

    - Remove code modules from memory. Now, there is no way of telling
      what a loaded program has patched itself into. If all code modules
      in memory (including OS components) were 'structure conserving',
      i.e. never a single byte ever changed, then all module linkages
      would have to be done in dynamic (and hopefully fully documented)
      data structures. If these were properly designed, it would be
      possible to remove code modules at will. This opens up all sorts of
      possibilities for dynamically reconfigurable operating systems and
      software.
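To make the 'structure conserving' module idea concrete, here is a minimal
sketch in C. The format and the names (MODULE_MAGIC, module_header,
module_is_intact) are invented for illustration; no such standard module
form exists. Because the header describes the module with a length, a
checksum and an offset rather than any absolute address, nothing in the
image ever needs patching, so the same bytes are valid on disk, in memory,
and at any load address:-

    /* Hypothetical self-describing module header; an illustration only,
       not any existing executable format. */

    #include <stdint.h>

    #define MODULE_MAGIC 0x4D4F4455UL  /* arbitrary value marking a
                                          conforming module             */

    struct module_header {
        uint32_t magic;        /* lets a scanner recognize a module on sight */
        uint32_t length;       /* total size of the image, header included   */
        uint32_t entry_offset; /* entry point, relative to the header, not an
                                  absolute address, so no load-time patching */
        uint32_t checksum;     /* additive checksum over the module body     */
    };

    /* Recompute the checksum over the module body (everything after the
       header). A background task could run this periodically over every
       loaded module; a mismatch means corruption or tampering. */
    static int module_is_intact(const struct module_header *m)
    {
        const uint8_t *p = (const uint8_t *)m;
        uint32_t sum = 0;

        if (m->magic != MODULE_MAGIC)
            return 0;                 /* not a conforming module at all */

        for (uint32_t i = sizeof *m; i < m->length; i++)
            sum += p[i];

        return sum == m->checksum;
    }

Since a conforming module is never modified after loading, exactly the
same check applies to the copy on disk; the in-memory and on-disk forms
are one and the same.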
A second mistake was to develop the idea of 'multiple memory models', and
then compound the problem by implementing 'protected' vs 'real' mode to
try and patch up the mess. This vast and gothic edifice is probably
responsible for wasting more programmer-hours than any other machine
defect workaround in the history of mankind. Its secondary effect has been
to so thoroughly bury the grave marker of relocatable code that no-one
even seems to remember it.

The assumptions implicit in this state of affairs are even cemented into
our best tools. For instance, let's see you try to define and initialize a
C structure in which one of the records contains an offset to some
component of a variable-format tail to the structure. The second of two
variable-length strings, perhaps. Now that's an _offset_, mind you, not an
absolute pointer. What we want is something that can be saved in a binary
file, then loaded unmodified anywhere in memory and still be valid. Do you
see the problem? The C compiler is so removed from the idea of a memory
address, even a relative one, that it can't even do something a simple
assembler can do. This is an advance? Perhaps it would not have gotten
this way if there were not always the stages of linking and relocating
loaders between the compiler's output and a final program running in
memory.
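For what it's worth, this kind of self-relative record is easy enough to
build at run time; the complaint above is that C gives you no way to have
the compiler work such an offset out in a static initializer, the way an
assembler's location counter does. A run-time sketch, with invented names,
holding two variable-length strings:-

    /* A position-independent record: the header stores an OFFSET from the
       start of the record, never a pointer, so the whole block can be
       written to a binary file and later loaded at any address unchanged. */

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct rec_header {
        uint32_t total_len;    /* size of header plus variable-format tail */
        uint32_t str2_offset;  /* where the second string starts, relative
                                  to the start of the record               */
        /* two NUL-terminated strings follow, back to back */
    };

    /* Build a record containing two variable-length strings. */
    static struct rec_header *rec_build(const char *s1, const char *s2)
    {
        size_t l1 = strlen(s1) + 1, l2 = strlen(s2) + 1;
        struct rec_header *r = malloc(sizeof *r + l1 + l2);

        if (r == NULL)
            return NULL;
        r->total_len   = (uint32_t)(sizeof *r + l1 + l2);
        r->str2_offset = (uint32_t)(sizeof *r + l1); /* an offset, not a pointer */
        memcpy((char *)r + sizeof *r, s1, l1);
        memcpy((char *)r + r->str2_offset, s2, l2);
        return r;
    }

    /* Valid wherever the record happens to have been loaded. */
    static const char *rec_second_string(const struct rec_header *r)
    {
        return (const char *)r + r->str2_offset;
    }

Write total_len bytes to a file, read them back into a buffer at any
address, and rec_second_string() still works; that is exactly the property
a relocating loader throws away the moment it starts patching absolute
addresses into an image.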
You may be asking 'so why would I want to do that anyway?'. Well, the
answer is that data structures with internal self-references (i.e. address
references) are crucial to interpreter operation, and that interpreters
(of one form or another) are essential tools for achieving processor and
hardware independence.

'But C is portable! That gives us processor independence!' you say. Well,
only if you have a compiler for the target processor, it is. (And the
source code for the application.) If you want to write code that 'just
runs' on anything, as unmodified binary, the only way is to write in an
interpreted source form, and to make sure there is a conforming
interpreter kernel on the target machine. That is a much more worthy goal
to work for than any number of PC-based software utilities.

The upshot of all this is that so long as the majority of computers in
public use remain based on processors with architectures like the 80x86
series, we will have _no_ chance of achieving the true social benefits
that widespread ownership of processing power can bring. Machines with the
fatal flaws of today's PCs will never support advances like a universal
operating system. They will never encourage the greater mass of people,
the non-expert programmers, to take part in the development of the
software they use in daily life.

Such architectures are more than a simple unfortunate error of judgement;
they are now an obstruction in the path of social evolution.