With this article I intend to start a series on personal computer operating systems from Digital Research and Microsoft, basically those systems that use drive letters and run on the Intel 8080/x86 and Motorola 68k processor architectures.
You can find a list of all articles in the series on the Computing page.
This article and the series as a whole assume a basic understanding of assembly language and hexadecimal numbers, basically of everything that is too complicated for me to understand well enough to explain to people. I myself am somewhat familiar with PIC microcontroller machine language and some high-level programming languages, but certainly not with Intel or Zilog machine code.
To test and study CP/M I used Peter Schorn's excellent altairz80 simulator (based on the multi-system simulator SIMH). Except for some old memories of using CP/M 3.0 on a Commodore 128 some 20 years ago, my only experience with CP/M comes from using it on altairz80. Peter Schorn himself was a great help too. Altairz80 is really easy to use.
The original system of the large family is Digital Research's Control Program for Microcomputers (originally Control Program/Monitor), or CP/M, first released in 1974. It became the first standardised operating system to support the same applications on computers made by different companies.
CP/M computers of the time were based on a general architecture premiered by the Altair. The hardware consisted of an Intel 8080 (later usually a Zilog Z80) CPU and up to 64 KB of RAM. Both were 8 bit CPUs that could address 2^16 bytes (64 KB) of memory. The 8080 is the direct ancestor of the IBM PC's Intel 8088 CPU, and the Z80 is an extended, 8080-compatible design. CP/M quickly became the standard operating system for personal computers.
Due to the 16 bit addressing of the common CPUs, CP/M had to deal with a maximum of 64 KB of memory. However, a computer system could have less than 64 KB of memory, so CP/M had to be flexible.
As I understand it, the memory organisation provides a Transient Program Area (TPA) for applications below the area used by the operating system. The first 256 bytes of memory (addresses 0000h to 00FFh) form the so-called Zero Page (ZP); the TPA proper begins right above it at address 0100h and ends where the operating system area begins.
The operating system area ends at the top of memory (ideally address FFFFh, i.e. the 65536th byte), and its size determines where the operating system area begins and where the Transient Program Area ends. The operating system area is itself divided into four areas for (from highest to lowest address) Basic Input Output System (BIOS), Basic Disk Operating System (BDOS), Program Loader Module (probably PLM) and Resident System Extensions (RSX).
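The layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `memory_map` and the 7 KB operating system size are made up for the example, and the sketch lists the Zero Page separately from the rest of the program area.

```python
def memory_map(total_bytes, os_bytes, zero_page_bytes=0x100):
    """Return (start, end) address pairs for Zero Page, TPA and OS area."""
    if not zero_page_bytes < total_bytes - os_bytes <= total_bytes <= 0x10000:
        raise ValueError("sizes do not fit a 16 bit address space")
    os_start = total_bytes - os_bytes   # the OS sits at the top of memory
    return {
        "zero_page": (0x0000, zero_page_bytes - 1),
        "tpa": (zero_page_bytes, os_start - 1),
        "os": (os_start, total_bytes - 1),
    }

# Hypothetical machine: full 64 KB of RAM, operating system area of 7 KB
layout = memory_map(total_bytes=0x10000, os_bytes=0x1C00)
for name, (start, end) in layout.items():
    print(f"{name:9} {start:04X}h-{end:04X}h")
```

Calling `memory_map(0x8000, 0x1C00)` for a 32 KB machine moves the operating system area down accordingly, which is why a transient program cannot assume a fixed operating system address.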
The Basic Input Output System provides ways to communicate with input/output devices, usually a terminal (a keyboard with a monitor, or a teletype), a printer or serial ports.
The Basic Disk Operating System provides ways to communicate with disk drives. Back in the day those were 8" floppy disk drives of the type one can still see in the 1983 movie WarGames.
I don't really know what the Program Loader Module does, but I am guessing it handles loading programs. I am told that it doesn't exist in early CP/M versions but was only added with CP/M 3.0.
I have no idea what Resident System Extensions are.
I imagine that the stack grows down from the top of the Transient Program Area while the heap grows from the top of the text and data segments towards the stack. Text and data segments start on top of the Zero Page at address 100h (decimal 256, i.e. the 257th byte).
The Zero Page contains useful information for the transient program (the application), written into it by the operating system when a transient program is loaded. Since the transient program cannot know in advance where the operating system is located, one of the purposes of the Zero Page is to store a pointer to the operating system that the transient program can use when it wants to make a system call.
The higher 128 bytes (addresses 80h to FFh) contain the command line (command tail) used to load and start the transient program. The first byte (at address 80h) specifies how many characters the command line contained. The following 127 bytes contain the characters of the command line, presumably without a final zero and without the line feed and carriage return characters. (This 127 character limit still existed in MS-DOS and Windows 25 years later.)
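As a rough illustration of that layout, the following Python sketch reads the command tail out of a 256-byte Zero Page image. The function name and the sample tail are made up for the example. That the tail starts with a space and is in upper case is an assumption about how the command processor stores it.

```python
def parse_command_tail(zero_page: bytes) -> str:
    """Extract the command tail from a 256-byte Zero Page image."""
    if len(zero_page) < 0x100:
        raise ValueError("need the full 256-byte Zero Page")
    length = zero_page[0x80]            # count byte at address 80h
    if length > 127:
        raise ValueError("command tail cannot exceed 127 characters")
    # The characters follow the count byte, starting at address 81h
    return zero_page[0x81:0x81 + length].decode("ascii")

# Hypothetical Zero Page as it might look after typing "PROG hello.mac"
page = bytearray(0x100)
tail = b" HELLO.MAC"
page[0x80] = len(tail)
page[0x81:0x81 + len(tail)] = tail
print(repr(parse_command_tail(bytes(page))))
```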
The low 128 bytes (addresses 00h to 7Fh) contain two File Control Blocks (FCBs) at addresses 5Ch to 7Fh, in which the state of two open files is maintained (although I understand that they overwrite each other), and various other data below address 5Ch.
Address 00h | Length: 3 bytes (8 bit jump instruction, 16 bit address) | Exit program/Jumps to BIOS
Address 03h | Length: 1 byte | I/O byte (I have no idea what this does)
Address 04h | Length: 1 byte | Command drive and user number
Address 05h | Length: 3 bytes (8 bit jump instruction, 16 bit address) | System call/Jumps to the BDOS
Address 08h | Length: 51 bytes (if I counted right) | Interrupt vectors
Address 3Bh | Length: 21 bytes (again...) | Reserved (not used)
Address 50h | Length: 1 byte | Drive from which program was loaded
Address 51h | Length: 6 bytes (two 16 bit addresses, two 8 bit numbers) | Addresses and length of two FCB passwords
Address 57h | Length: 5 bytes | Reserved (not used)
Address 5Ch | Length: 36 bytes | File control blocks
Address 80h | Length: 128 bytes (to address FFh) | Command tail
Located at addresses 00h and 05h are two instructions that the transient program can use.
The instruction at address 00h jumps to the BIOS and effectively ends the transient program. Apparently it does a warm reboot and reloads the command processor. Regardless of where the BIOS is located, the operating system will have written the correct address into the second and third byte of the Zero Page, and hence a transient program can jump to the BIOS by jumping to address 00h.
The instruction at address 05h jumps to the BDOS and allows a transient program to make a system call without having to know where in memory the operating system actually is. System calls are made by loading the call number and parameters into registers and then calling address 05h; because the program gets there with a call instruction, the BDOS returns straight to the program when it is done. (In DOS jargon this is referred to as a "call 5" system call. It is supported by both CP/M and MS-DOS.)
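The two Zero Page entries can be modelled in Python as follows. This is a sketch under assumptions: both slots are taken to hold a jump instruction (opcode C3h, followed by the target address with the low byte first), and the BIOS and BDOS addresses are invented for the example.

```python
JP = 0xC3  # 8080 JMP / Z80 JP opcode: jump to a 16 bit address

def write_vector(memory: bytearray, at: int, target: int) -> None:
    """Store 'JP target' at address 'at' (opcode, low byte, high byte)."""
    memory[at] = JP
    memory[at + 1] = target & 0xFF
    memory[at + 2] = target >> 8

def read_vector(memory: bytes, at: int) -> int:
    """Return the 16 bit target of the jump stored at address 'at'."""
    if memory[at] != JP:
        raise ValueError(f"no jump instruction at {at:04X}h")
    return memory[at + 1] | (memory[at + 2] << 8)

memory = bytearray(0x10000)
write_vector(memory, 0x0000, 0xF203)  # hypothetical BIOS warm boot entry
write_vector(memory, 0x0005, 0xE406)  # hypothetical BDOS entry
print(f"BDOS entry: {read_vector(memory, 0x0005):04X}h")
```

The transient program never needs to call `read_vector` itself. It simply calls address 05h and lets the jump stored there do the redirection.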
CP/M provides several system calls. I have so far looked only at the character input/output calls and ignored the disk calls. The number of the system call is written into the C register. A "call 5" will then execute that system call.
System call name | Description | Registers | Return value
P_TERMCPM | System reset | C=0 | -
C_READ | Read character from input device | C=1 | A=character
C_WRITE | Write character to output device | C=2, E=character | -
A_READ | Read character from serial port | C=3 | A=character
A_WRITE | Write character to serial port | C=4, E=character | -
L_WRITE | Write character to printer | C=5, E=character | -
C_WRITESTR | Write string of characters to output device | C=9, DE=address | -
C_READSTR | Read string from input device until carriage return | C=10, DE=address | -
S_BDOSVER | Return version number | C=12 | B=system type, A=version number
several | BDOS functions (to access disks) | C=13 or higher | several
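To make the register convention more concrete, here is a toy Python emulation of two of the calls from the table. It is not how the BDOS is implemented; the function `bdos`, the data address 0200h and the use of a Python stream as the console are all invented for the example. As far as I know, C_WRITESTR expects the string to end with a "$" character, which the sketch relies on.

```python
import io

def bdos(c, de, memory, out):
    """Emulate two console calls: C selects the function, E/DE is the parameter."""
    if c == 2:                                  # C_WRITE: character in E
        out.write(chr(de & 0xFF))
    elif c == 9:                                # C_WRITESTR: DE = string address
        end = memory.index(b"$", de)            # string runs up to the '$'
        out.write(memory[de:end].decode("ascii"))
    else:
        raise NotImplementedError(f"BDOS function {c}")

memory = bytearray(0x10000)
message = b"Hello, world!$"
memory[0x0200:0x0200 + len(message)] = message  # hypothetical data address

console = io.StringIO()
bdos(9, 0x0200, memory, console)    # like: LD C,9 / LD DE,0200h / CALL 5
bdos(2, ord("!"), memory, console)  # like: LD C,2 / LD E,'!' / CALL 5
print(console.getvalue())
```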
Like MS-DOS and Windows NT, CP/M uses drive letters. Usually the boot drive is drive A, the first floppy drive. The altairz80 emulator also provides a hard disk as drive I.
Note that my CP/M version was actually built in 2009 and contains a BIOS written for the altairz80 emulator.
Some of the programs on drive A came with the original CP/M, some are original third-party programs, and some have been written much later to make CP/M usable in an emulator.
File names are up to 8 characters long and have an extension of up to 3 characters marking the type of the file. The extension is not a part of the file name, and programs usually assume that the files they are supposed to work with are of a certain type and add the extension to the name themselves.
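The split into name and type can be sketched in Python. The function below is hypothetical and assumes the way a File Control Block stores a name, namely as 8 name bytes plus 3 type bytes, in upper case and padded with spaces, with the dot itself not stored.

```python
def fcb_name(filename: str) -> bytes:
    """Return the 11 bytes (8 name + 3 type) for a name like 'hello.mac'."""
    name, _, ext = filename.upper().partition(".")
    if not 1 <= len(name) <= 8 or len(ext) > 3 or "." in ext:
        raise ValueError(f"{filename!r} does not fit the 8+3 scheme")
    # Pad both fields with spaces to their fixed widths
    return (name.ljust(8) + ext.ljust(3)).encode("ascii")

print(fcb_name("hello.mac"))
print(fcb_name("pip"))
```

A program that expects a certain file type can therefore take just the 8-character name from the user and fill in the 3 type bytes on its own.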
Note that the file name extensions and file types follow the same idea as they do in MS-DOS (or rather vice versa).
ASM | Assembler source code, ASCII (see MAC)
BAS | Microsoft BASIC source code, ASCII
COM | A transient program image (COM stands for "Command")
MAC | Assembler source code, ASCII (MAC stands for "Macro Assembler")
SUB | Batch file, ASCII (run with submit command)
Other file types include DAT and LIB but those extensions do not really mean anything specific to the system or other programs.
ccp | CP/M command processor
era | Delete files
pip | Copy files (pip target=source)
format | Format disks
submit | Run batch (sub) files
asm | Digital Research's assembler
load | Loads a hex file into a binary image
mbasic | Microsoft BASIC
m80 | Microsoft's assembler
l80 | Microsoft's linker and loader
Luckily there is a batch file that controls m80 and l80 because Microsoft's (presumably memory-saving) syntax for the commands is rather weird.
I could not get asm to work, but m80 produced a nice "hello.com" program for me. Note that the source file must be saved in DOS format (CRLF line endings), not UNIX format (LF line endings).
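If the editor on the host cannot save in DOS format, the line endings can be converted before copying the file into the simulator. The following Python sketch is one way to do that; the function names are made up for the example.

```python
def to_crlf(data: bytes) -> bytes:
    """Convert LF or mixed line endings to CRLF without doubling existing CRs."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write it to dst with CRLF line endings."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(to_crlf(data))

print(to_crlf(b"line one\nline two\n"))
```

For example, `convert_file("hello-unix.mac", "hello.mac")` would produce a file that m80 should accept. Both file names are only placeholders.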
Copy and paste the source from here: hello.mac
Note that this is a screenshot of Emacs. Add the .mac extension to Emacs' asm mode by including the line
(add-to-list 'auto-mode-alist '("[.]mac$" . asm-mode))
in your ~/.emacs file.
Note that the "r" command is not part of CP/M but of the simulator. It allows copying a file from the host into the emulated system.
You can see the weird syntax of Microsoft's assembler in the screenshot and also the syntax of starting a batch file under CP/M (submit).
Sources
Microsoft m80/l80: http://www.msxarchive.nl/pub/msx/programming/asm/m80l80.txt
CP/M BDOS calls: http://www.seasip.demon.co.uk/Cpm/bdos.html
z80 opcodes: http://www.z80.info/z80oplist.txt
Useful Software
Altair 8800 simulator: http://www.schorn.ch/altair.html
Pasmo z80 assembler for Unix: http://pasmo.speccy.org
Questions, comments, complaints, corrections go to ajbrehm@gmail.com.