CP/M

With this article I intend to start a series on personal computer operating systems from Digital Research and Microsoft, basically those systems that use drive letters and run on the Intel 8080/x86 and Motorola 68k processor architectures.

You can find a list of all articles in the series on the Computing page.

This article and the rest of the series assume a basic understanding of assembly language and hexadecimal numbers, basically of everything that is too complicated for me to understand well enough to explain to people. I myself am somewhat familiar with PIC Microcontroller machine language and some high-level programming languages, but certainly not Intel or Zilog machine code.

To test and study CP/M I used Peter Schorn's excellent altairz80 simulator (based on the multi-system simulator SIMH). Except for some old memories of using CP/M 3.0 on a Commodore 128 some 20 years ago, my only experience with CP/M comes from using it on altairz80. Peter Schorn himself was a great help too. Altairz80 is really easy to use.

The original system of this large family is Digital Research's Control Program for Microcomputers, or CP/M, first released in 1974. It became the first standardised operating system to support the same applications on computers made by different companies.

CP/M computers of the time were based on a general architecture premiered by the Altair. The hardware consisted of an Intel 8080 (later usually a Zilog z80) CPU and up to 64 KB of RAM. The 8080 and z80 were 8 bit CPUs that could address 2^16 bytes of memory (64 KB), and they were the direct ancestors of the IBM PC's Intel 8088 CPU. CP/M quickly became the standard operating system for personal computers.

Due to the 16 bit addressing of the common CPUs, CP/M had to deal with a maximum of 64 KB of memory. However, a computer system could have less than 64 KB of memory, so CP/M had to be flexible.

As I understand it, the memory organisation provides a Transient Program Area (TPA) for applications below the area used by the operating system. The lowest 256 bytes of memory (addresses 0000h to 00FFh) form the so-called Zero Page (ZP). The transient program itself is loaded directly above the Zero Page, and the TPA ends where the operating system area begins.

The operating system area ends at the top of memory (ideally address FFFFh, i.e. the 65536th byte), so its size determines where it begins and hence where the Transient Program Area ends. The operating system area is itself divided into four areas, from highest to lowest address: Basic Input Output System (BIOS), Basic Disk Operating System (BDOS), Program Loader Module (probably PLM) and Resident System Extensions (RSX).
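The arithmetic behind this layout can be sketched in a few lines of Python. This is only an illustration: the function name `memory_map` and the 2000h size of the operating system area are made up for the example and do not come from any real CP/M build.

```python
# Sketch of the memory layout described above. All names are hypothetical.
MEMORY_TOP = 0xFFFF      # highest address when a full 64 KB of RAM is fitted
ZERO_PAGE_SIZE = 0x100   # addresses 0000h to 00FFh


def memory_map(os_size, memory_top=MEMORY_TOP):
    """Return (name, first, last) regions for an OS area of os_size bytes."""
    os_start = memory_top - os_size + 1
    if os_start <= ZERO_PAGE_SIZE:
        raise ValueError("operating system area leaves no room for programs")
    return [
        ("Zero Page", 0x0000, ZERO_PAGE_SIZE - 1),
        ("program area above the Zero Page", ZERO_PAGE_SIZE, os_start - 1),
        ("operating system area", os_start, memory_top),
    ]


# Assumed example: an 8 KB (2000h) operating system area in a 64 KB machine.
for name, first, last in memory_map(os_size=0x2000):
    print(f"{first:04X}h-{last:04X}h  {name}")
```

Passing a smaller `memory_top` (for example 7FFFh for a 32 KB machine) moves the operating system area down and shrinks the space left for programs, which is the flexibility mentioned earlier.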

The Basic Input Output System provides ways to communicate with Input Output devices, usually a teletype made up of a keyboard and monitor or printer or serial ports.

The Basic Disk Operating System provides ways to communicate with disk drives. Back in the day those were 8" floppy disk drives of the type one can still see in the 1983 movie WarGames.

I don't really know what the Program Loader Module does, but I am guessing it handles loading programs. I am told that it doesn't exist in early CP/M versions but was only added with CP/M 3.0.

I have no idea what Resident System Extensions are.

I imagine that the stack grows down from the top of the Transient Program Area while the heap grows from the top of the text and data segments towards the stack. Text and data segments start on top of the Zero Page at address 100h (the 256th byte).

The Zero Page contains useful information for the transient program (the application), written into it by the operating system when a transient program is loaded. Since the operating system sits at a location that the transient program cannot know in advance, one of the purposes of the Zero Page is to store a pointer to the operating system which the transient program can call when it wants to make a system call.

The higher 128 bytes (addresses 80h to FFh) contain the command line (command tail) used to load and start the transient program. The first byte (at address 80h) specifies how many characters the command line contained. The following 127 bytes contain the characters of the command line, presumably without a final zero and without the line feed and carriage return characters. (This 127 character limit still existed in MS-DOS and Windows 25 years later.)
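To make the layout of the command tail concrete, here is a small Python sketch that reads it back out of a simulated 64 KB memory. The function name `read_command_tail` and the sample tail are invented for the example; only the addresses 80h and 81h come from the description above.

```python
def read_command_tail(memory):
    """Return the command tail stored at 80h as a string.

    memory[0x80] holds the character count, the characters follow at 81h.
    """
    length = memory[0x80]
    if length > 127:
        raise ValueError("command tail cannot be longer than 127 characters")
    return bytes(memory[0x81:0x81 + length]).decode("ascii")


# Build a fake Zero Page with a made-up command tail.
memory = bytearray(0x10000)
tail = b" HELLO.MAC"
memory[0x80] = len(tail)
memory[0x81:0x81 + len(tail)] = tail

print(repr(read_command_tail(memory)))
```

Because the length is stored in front of the text, the reader never has to look for a terminating character.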

The low 128 bytes (addresses 00h to 7Fh) contain two File Control Blocks (FCBs) at addresses 5Ch to 7Fh and various other data below address 5Ch. The FCBs maintain the state of two open files, although I understand that they overwrite each other.

Address  Length     Contents                                   Notes
00h      3 bytes    Exit program/Jumps to BIOS                 8 bit jump instruction, 16 bit address
03h      1 byte     I/O byte                                   I have no idea what this does
04h      1 byte     Command drive and user number
05h      3 bytes    System call/Jumps to the BDOS              8 bit jump instruction, 16 bit address
08h      51 bytes   Interrupt vectors                          if I counted right
3Bh      21 bytes   Reserved (not used)                        again...
50h      1 byte     Drive from which program was loaded
51h      6 bytes    Addresses and length of two FCB passwords  two 16 bit addresses, two 8 bit numbers
57h      5 bytes    Reserved (not used)
5Ch      36 bytes   File control blocks
80h      128 bytes  Command tail                               to address FFh
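The table can be turned into a small decoder. The following Python sketch reads a few of the fields from a simulated memory image. It rests on assumptions that go beyond the table: that the instructions at 00h and 05h are 8080 jumps (opcode C3h) followed by a little-endian address, and that the byte at 04h keeps the drive in its low four bits and the user number in its high four bits. The function names and all sample values are made up.

```python
JUMP_OPCODE = 0xC3  # 8080/z80 unconditional jump


def word(memory, address):
    """Read a 16 bit little-endian value."""
    return memory[address] | (memory[address + 1] << 8)


def decode_zero_page(memory):
    """Pick a few fields out of the Zero Page (hypothetical helper)."""
    return {
        "bios_target": word(memory, 0x01),  # operand of the jump at 00h
        "io_byte": memory[0x03],
        "drive": memory[0x04] & 0x0F,       # assumption: low nibble, 0 = A
        "user": memory[0x04] >> 4,          # assumption: high nibble
        "bdos_target": word(memory, 0x06),  # operand of the jump at 05h
        "load_drive": memory[0x50],
    }


# Fake Zero Page with invented addresses for BIOS and BDOS.
memory = bytearray(0x10000)
memory[0x00:0x03] = bytes([JUMP_OPCODE, 0x03, 0xF2])
memory[0x04] = 0x10
memory[0x05:0x08] = bytes([JUMP_OPCODE, 0x06, 0xE4])

info = decode_zero_page(memory)
print(f"BIOS entry {info['bios_target']:04X}h, BDOS entry {info['bdos_target']:04X}h")
print(f"drive {chr(ord('A') + info['drive'])}, user {info['user']}")
```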

Located at addresses 00h and 05h are two instructions that the transient program can use.

The instruction at address 00h jumps to the BIOS and effectively ends the transient program. Apparently it does a warm reboot and reloads the command processor. Regardless of where the BIOS is located, the operating system will have written the correct address into the second and third byte of the Zero Page, and hence a transient program can jump to the BIOS by jumping to address 00h.

The instruction at address 05h jumps to the BDOS and allows a transient program to make a system call without having to know where in memory the operating system actually is. System calls are made by loading registers with the call number and parameters and then calling address 05h. (In DOS jargon this is referred to as a "call 5" system call. It is supported by both CP/M and MS-DOS.)

CP/M provides several system calls. I have so far looked only at the character input/output calls and ignored the calls that actually access disks. The number of the system call is written into the C register. A "call 5" will then execute that system call.

System call  Description                                          Registers             Return value
P_TERMCPM    System reset                                         C=0                   -
C_READ       Read character from input device                     C=1                   A=character
C_WRITE      Write character to output device                     C=2, E=character      -
A_READ       Read character from serial port                      C=3                   A=character
A_WRITE      Write character to serial port                       C=4, E=character      -
L_WRITE      Write character to printer                           C=5, E=character      -
C_WRITESTR   Write string of characters to output device          C=9, DE=address       -
C_READSTR    Read string from input device until carriage return  C=10, DE=address      -
S_BDOSVER    Return version number                                C=12                  B=system type, A=version number
several      BDOS functions (to access disks)                     C=13 or higher        several
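The calling convention can be modelled in Python: the function number goes into C, a parameter into E or DE, and a result may come back in A. The class below is a toy model and not CP/M code. `ToyBdos`, `call5` and the buffers are invented names, only three of the calls are modelled, and the use of "$" as the end marker for C_WRITESTR is taken from the CP/M documentation rather than from the table above.

```python
class ToyBdos:
    """Toy model of a few console calls made through 'call 5'."""

    def __init__(self, memory, keyboard=""):
        self.memory = memory            # simulated 64 KB of RAM
        self.keyboard = list(keyboard)  # characters waiting to be read
        self.screen = []                # characters written so far

    def call5(self, c, e=0, de=0):
        """Dispatch on register C; the return value stands in for register A."""
        if c == 1:    # C_READ: next character from the input device
            return ord(self.keyboard.pop(0))
        if c == 2:    # C_WRITE: character in E to the output device
            self.screen.append(chr(e))
            return None
        if c == 9:    # C_WRITESTR: string at DE, ended by '$'
            address = de
            while self.memory[address] != ord("$"):
                self.screen.append(chr(self.memory[address]))
                address += 1
            return None
        raise NotImplementedError(f"function {c} is not modelled")


memory = bytearray(0x10000)
message = b"Hello from the toy BDOS$"
memory[0x0200:0x0200 + len(message)] = message

system = ToyBdos(memory, keyboard="y")
system.call5(9, de=0x0200)    # C_WRITESTR
system.call5(2, e=ord("!"))   # C_WRITE
key = system.call5(1)         # C_READ
print("".join(system.screen), chr(key))
```

A real transient program would do the same thing in three instructions per call: load C, load E or DE, and call address 05h.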

Like MS-DOS and Windows NT, CP/M uses drive letters. Usually the boot drive is drive A, the first floppy drive. The altairz80 emulator also provides a hard disk drive I.

Note that my CP/M version was actually built in 2009 and contains a BIOS written for the altairz80 emulator.

Some of the programs on drive A came with the original CP/M, some are original third-party programs, and some have been written much later to make CP/M usable in an emulator.

File names are up to 8 characters long and have an extension of up to 3 characters marking the type of the file. The extension is not a part of the file name, and programs usually assume that the files they are supposed to work with are of a certain type and add the extension to the name themselves.
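As an illustration of the 8 plus 3 scheme, the Python sketch below pads a name and a type to fixed widths and shows how a program might supply its own default type. Both function names are hypothetical, and the upper-casing and space padding are assumptions about how CP/M stores names internally, not something this article has verified.

```python
def to_8_3(name):
    """Split 'name.ext' into an 8 character name and a 3 character type."""
    base, _, ext = name.upper().partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        raise ValueError(f"{name!r} does not fit the 8+3 scheme")
    return base.ljust(8) + ext.ljust(3)


def with_default_type(name, default_ext):
    """Add the type a program expects when the user left it out."""
    return name if "." in name else f"{name}.{default_ext}"


print(repr(to_8_3("hello.com")))
print(repr(to_8_3(with_default_type("hello", "mac"))))
```

The second call mirrors what an assembler does when it is given just "hello" and goes looking for a file of its own type.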

Note that the file name extensions and file types follow the same idea as they do in MS-DOS (or rather vice versa).

Type  Meaning
ASM   Assembler source code, ASCII (see MAC)
BAS   Microsoft BASIC source code, ASCII
COM   A transient program image (COM stands for "Command")
MAC   Assembler source code, ASCII (MAC stands for "Macro Assembler")
SUB   Batch file, ASCII (run with submit command)

Other file types include DAT and LIB but those extensions do not really mean anything specific to the system or other programs.

Command  Purpose
ccp      CP/M command processor
era      Delete files
pip      Copy files (pip target=source)
format   Format disks
submit   Run batch (sub) files
asm      Digital Research's assembler
load     Loads a hex file into a binary image
mbasic   Microsoft BASIC
m80      Microsoft's assembler
l80      Microsoft's linker and loader

Luckily there is a batch file that controls m80 and l80 because Microsoft's (presumably memory-saving) syntax for the commands is rather weird.

I could not get asm to work, but m80 produced a nice "hello.com" program for me. Note that the source file must be saved in DOS format (CRLF), not UNIX format (LF).
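If your editor on the host only writes UNIX line endings, the conversion can be done before copying the file into the emulator. The Python sketch below shows one way to do it. The function name `to_crlf` is made up, and the sample source line is only a stand-in for a real file.

```python
def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CRLF without doubling existing CRLFs."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Made-up two-line source fragment with UNIX line endings.
unix_source = b"; hello.mac\n        end\n"
print(to_crlf(unix_source))
```

For a real file you would read its bytes, pass them through `to_crlf` and write the result back, for example with `pathlib.Path.read_bytes` and `write_bytes`.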

Copy and paste the source from here: hello.mac

Note that this is a screenshot of Emacs. Add the .mac extension to Emacs' asm mode by including the line

(add-to-list 'auto-mode-alist '("[.]mac$" . asm-mode))

in your ~/.emacs file.

Note that the "r" command is not part of CP/M but of the simulator. It allows copying a file from the host into the emulated system.

You can see the weird syntax of Microsoft's assembler in the screenshot and also the syntax of starting a batch file under CP/M (submit).

To be continued…


Sources

Microsoft m80/l80: http://www.msxarchive.nl/pub/msx/programming/asm/m80l80.txt

CP/M BDOS calls: http://www.seasip.demon.co.uk/Cpm/bdos.html

z80 opcodes: http://www.z80.info/z80oplist.txt


Useful Software

Altair 8800 simulator: http://www.schorn.ch/altair.html

Pasmo z80 assembler for Unix: http://pasmo.speccy.org


Questions, comments, complaints, corrections go to ajbrehm@gmail.com.


© Andrew Brehm 2013