Some Address Math for the 68000
After a break for multi-byte negation, because address math is so important, I think I should show you explicit 68000 corollaries for what I've shown you for the 6809, as well as the routines for the 6801 and for the 6800.
When instructions become more general, they often take more bytes to encode. This is especially clear for the 68000. And when you generalize an operation, it often takes more instructions to implement -- even with a more powerful instruction set CPU. And the more you repeat those multiple instructions, the more opportunity you have to make mistakes.
More than speed and byte count, this is why we define utility routines like the ones we just looked at for the 6800 and 6801. We don't want to give ourselves too many opportunities for mistakes. (Macros can help with this, but we won't talk about that just yet.)
Between the 6809 and the 68000, it can be kind of a wash -- when you're working on 16-bit numbers and small applications that fit in a 64K memory space. When you start working with 32-bit numbers, it's advantage 68000, ... except then you also tend to work with 32-bit addresses, and the addresses can make byte count swell.
I transliterated the fig implementation of Forth from 6800 to 68000, and the object image size increased by about 80% (real rough estimate). This is because I didn't want to restrict it to operating in the lower 32K of memory, minus the interrupt vector table, so the virtual machine i-codes (function addresses, really) swelled from 16-bit to 32-bit. And since the Forth is mostly a clot of i-codes, the overall image size swells.
I started a conversion to direct call, which I got lost in (partly motivating this tutorial), and the code size does seem to improve a bit, but not completely to the size of the 6809 image.
Do look at assembly listings when you try to compare code sizes for stuff.
In particular, the 68000 will often seem to take about twice the code bytes that the 6809 takes in these snippets. But when we move to concrete code where pieces come together, the code size comes down closer to the 6809 code size.
And I'll note again, being able to use single instructions instead of utility routines is nice, but it's actually more important that the 68000 has something of an optimal number of registers, so we don't have to worry about pseudo-registers in memory when switching processes.
As always, read the code and the comments in the code, and open up separate browser windows and compare side-by-side.
I'm showing the entire 68000 code in a single block because the abstract operations don't quite map the same, but I'm keeping the order roughly the same to keep it easy to find what to compare.
How registers are mapping when moving from 6809 to 68000 --
- I'm mapping the 6809's S to A7, of course;
- U to A6;
- DP will map to A5;
- X mostly to A0;
- Y to whatever.
- B is sort-of mapped to D7;
- A is sort-of mapped to D6 or the top bytes of D7 or D5 or something, depending on what I need it to do.
(And please don't just copy-and-paste code without thinking.)
* 68000 pointer math
ORG SOMETHING
* All of these work fine in-line, rather than called as subroutines.
* In fact, unless specifically specified otherwise, you should in-line.
* You can substitute any data register unless specified otherwise.
*
* Likewise, you can substitute any address register,
* except that A7 should always be in-lined --
* -- except for those routines which specifically handle the return address,
* but those routines are not really intended to be used anyway.
* Calling a subroutine and playing with the return stack
* without handling the return address
* just is not a good way to keep control of your program.
*
* And then there is alignment. 68000 needs 16- and 32-bit accesses
* to be 16-bit aligned, and will throw address errors if they are not.
* (Later CPUs are not so restricted.)
*
* Negate Dn in 8, 16, or 32 bits:
NEGLD7 NEG.L D7 ; .L => 32 bits, .W => 16 bits, .B => 8 bits
RTS
* On the 6800/6801/6809, you can negate (2's complement) a byte
* using a 1-byte instruction.
* On the 68000, it takes a 2-byte instruction.
* It takes 5 bytes of instruction to negate 16 bits on 6800/1/9,
* and 13 bytes to negate 32 bits.
* But on the 68000, it takes just two,
* the above 16-bit op-code with a couple of bits changed.
* This is a common pattern with 68000 instructions.
*
* And, for all the time I spend explaining NEG,
* since the 68000 can subtract registers in either order,
* we really don't need NEG here.
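*
* A minimal sketch of that point (SUBNEG and SUBDIR are hypothetical labels,
* used only for this comparison):
* to back A0 up by the 32-bit count in D7, either of these works,
* but the second needs no NEG and leaves D7 unchanged.
SUBNEG  NEG.L D7 ; D7 = -count
        ADD.L D7,A0 ; A0 = A0 + (-count)
        RTS
SUBDIR  SUB.L D7,A0 ; A0 = A0 - count, D7 untouched
        RTS
*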
* Unsigned byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDBX AND.W #$FF,D7 ; zero extend it
ADD.W D7,A0 ; 16-bit source sign extended to 32 bits
RTS
* Alternative
ADDBXalt
AND.W #$FF,D7 ; zero extend it
LEA (A0,D7.W),A0 ; takes more bytes
RTS
*
* Signed byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADSBX EXT.W D7
ADD.W D7,A0 ; 16-bit source sign extended to 32 bit An
RTS
* Alternative
ADSBXalt
EXT.W D7
LEA (A0,D7.W),A0 ; takes more bytes
RTS
*
* Unsigned byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBBX AND.W #$FF,D7 ; zero extend it
SUB.W D7,A0 ; 16-bit source sign extended to 32 bits
RTS
* Signed byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SBSBX EXT.W D7
SUB.W D7,A0 ; 16-bit source sign extended to 32 bit An
RTS
*
* Unsigned 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDWX AND.L #$FFFF,D7 ; zero extend it
ADD.L D7,A0
RTS
* Alternative
ADDWXalt
AND.L #$FFFF,D7 ; zero extend it
LEA (A0,D7.L),A0 ; takes more bytes
RTS
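*
* Why the AND.L and ADD.L matter here, as a worked sketch
* (the values are chosen only for illustration):
*   A0 = $00010000, D7.W = $9000 (36864 as an unsigned offset)
*   ADD.W D7,A0 alone sign-extends $9000 to $FFFF9000,
*     so A0 becomes $00009000 -- it moved backward by $7000.
*   AND.L #$FFFF,D7 followed by ADD.L D7,A0
*     gives the intended A0 = $00019000.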
*
* Signed 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADSWX ADD.W D7,A0
RTS
* Alternative
ADSWXalt
LEA (A0,D7.W),A0 ; takes more bytes
RTS
* Alternative
ADSWXalt2
LEA (A0,A1.W),A0 ; takes more bytes
RTS
*
* Unsigned 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBWX AND.L #$FFFF,D7 ; zero extend it
SUB.L D7,A0
RTS
* Signed 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SBSWX SUB.W D7,A0
RTS
*
* 32-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDLX ADD.L D7,A0
RTS
* Alternative
ADDLXalt
LEA (A0,D7.L),A0 ; takes more bytes
RTS
* Alternative
ADDLXalt2
LEA (A0,A1.L),A0 ; takes more bytes
RTS
*
* 32-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBLX SUB.L D7,A0
RTS
*
*************
* For the return stack
* As explained above, just in-line the LEA.
* These are provided as a solution to a puzzle,
* not as useful code.
*
* Signed byte offset
* Just in-line the EXT.W and ADD.W
ADSBS MOVE.L (A7)+,A0 ; get return address, restore stack address
EXT.W D7 ; sign extend it
ADD.W D7,A7 ; 16-bit source sign extended to 32 bits
JMP (A0) ; return via A0
* See above about LEA instead of ADD.
*
* Unsigned byte offset
* Just in-line the AND.W and ADD.W
ADDBS MOVE.L (A7)+,A0 ; get return address, restore stack address
AND.W #$FF,D7 ; zero extend it
ADD.W D7,A7 ; 16-bit source sign extended to 32 bits
JMP (A0) ; return via A0
* See above about LEA instead of ADD.
* Signed 16-bit offset
* Just in-line the ADD.W
ADSWS MOVE.L (A7)+,A0 ; get return address, restore stack address
ADD.W D7,A7 ; 16-bit source sign extended to 32 bits
JMP (A0) ; return via A0
* See above about LEA instead of ADD.
*
* Unsigned 16-bit offset
* Just in-line the AND.L and ADD.L
ADDWS MOVE.L (A7)+,A0 ; get return address, restore stack address
AND.L #$FFFF,D7 ; zero extend it
ADD.L D7,A7 ; 32 bits
JMP (A0) ; return via A0
* See above about LEA instead of ADD.
* 32-bit offset
* Just in-line the ADD.L
ADDLS MOVE.L (A7)+,A0 ; get return address, restore stack address
ADD.L D7,A7 ; 32 bits
JMP (A0) ; return via A0
*
* Unsigned byte offset
* Just in-line the AND.W and SUB.W
SUBBS MOVE.L (A7)+,A0 ; get return address, restore stack address
AND.W #$FF,D7 ; zero extend it
SUB.W D7,A7 ; 16-bit source sign extended to 32 bits
JMP (A0) ; return via A0
*
* Unsigned 16-bit offset
* Just in-line the AND.L and SUB.L
SUBWS MOVE.L (A7)+,A0 ; get return address, restore stack address
AND.L #$FFFF,D7 ; zero extend it
SUB.L D7,A7 ; 32 bits
JMP (A0) ; return via A0
*
* Signed byte offset
* Just in-line the EXT.W and SUB.W
SBSBS MOVE.L (A7)+,A0 ; get return address, restore stack address
EXT.W D7 ; sign extend it
SUB.W D7,A7 ; 16-bit source sign extended to 32 bits
JMP (A0) ; return via A0
*
* Signed 16-bit offset
* Just in-line the SUB.W
SBSWS MOVE.L (A7)+,A0 ; get return address, restore stack address
SUB.W D7,A7 ; 16-bit source sign extended to 32 bits
JMP (A0) ; return via A0
* 32-bit offset
* Just in-line the SUB.L
SUBLS MOVE.L (A7)+,A0 ; get return address, restore stack address
SUB.L D7,A7 ; 32 bits
JMP (A0) ; return via A0
*
* INX and DEX trains and INS and DES trains are meaningless.
* HOWEVER, just to remind ourselves:
* (And all of these work for the other address registers, too, but IN-LINE them!!)
* (They work for A7 if in-lined, as well.)
ADD16X LEA 16(A0),A0
RTS
ADD14X LEA 14(A0),A0
RTS
SUB16X LEA -16(A0),A0
RTS
* Etc. In-line these.
INX LEA 1(A0),A0 ; Sigh. In-line it. Do not make trains with it. Please.
RTS
DEX LEA -1(A0),A0 ; See INX. In-line it. Do not make trains with it. PLEASE.
RTS
* Note that we can also use ADDQ and SUBQ for offset less than 9
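*
* A sketch of those short forms (ADD2X and SUB8X are hypothetical labels;
* in-line these, too):
* ADDQ and SUBQ take an immediate from 1 to 8 and fit in a single 2-byte word.
* With an address register as the destination, the whole 32-bit register
* is adjusted, whether you write .W or .L.
ADD2X   ADDQ.L #2,A0 ; 2 bytes, versus 4 for LEA 2(A0),A0
        RTS
SUB8X   SUBQ.L #8,A0 ; 8 is the largest immediate ADDQ/SUBQ can encode
        RTS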
*
* More solutions to puzzles.
* If you called these, you would have to juggle the return address as shown.
* You don't want to do that.
* Just in-line the LEA instructions.
* Then there's no return address to juggle, no messing with A0.
* DO NOT USE THIS CODE other than for examples of silly walks.
ADD16S MOVE.L (A7)+,A0
LEA 16(A7),A7
JMP (A0)
* etc.
* Could all be replaced with just LEA 16(A7),A7 in-line!
* That's actually cheaper than just the instruction JSR!!!
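*
* A sketch of the in-line form, assuming a hypothetical routine SOMESUB
* whose caller pushed 16 bytes of parameters before the call:
        JSR SOMESUB
        LEA 16(A7),A7 ; drop the parameters, no return address to juggle
*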
* Synthetic stacks restricted within page boundaries make no sense at all
* on the 68000. Except, I suppose they could, sort-of.
*
* In the first place,
* we should be able to use an extra address register to make a third stack.
* If we do, addressing has already been covered, above.
*
* But if we want a software stack maintained by pointers in memory,
* for some reason,
* Given a pseudo-register somewhere in process local variable space
* accessed via A5:
ORG SOMEWHERE
...
QSP DS.L 1 ; a synthetic stack pointer Q
* QSP-LOCBAS has to be within +/-32K on 68000, 2-byte op-code, 2-byte offset, syntax: QSP-LOCBAS(A5)
* 68020 and above allows 32-bit range, 4-byte op-code, 4-byte offset, syntax: (QSP-LOCBAS,A5)
...
DS.L 2 ; buffer zone
QSTKLIM DS.L 32
QSTKBAS DS.L 2 ; buffer zone
...
* 32-bit Dn for synthetic stack (could/should be in-line):
ADDQSP ADD.L D7,QSP-LOCBAS(A5) ; 4 bytes in op-code (+/-32K)
RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 256-byte page boundary
* so that carries cannot be generated:
* unsigned byte D7
ADDQSPS ADD.B D7,QSP+3-LOCBAS(A5) ; 4 bytes in op-code (+/-32K)
RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 65536-byte page boundary
* so that carries cannot be generated:
* unsigned 16-bit D7
ADDQSPW ADD.W D7,QSP+2-LOCBAS(A5) ; low word is at QSP+2 (must be even); 4 bytes in op-code (+/-32K)
RTS
*
* 32-bit Dn for synthetic stack (could/should be in-line):
SUBQSP SUB.L D7,QSP-LOCBAS(A5) ; 4 bytes in op-code (+/-32K)
RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 256-byte page boundary
* so that carries cannot be generated:
* unsigned byte
SUBQSPS SUB.B D7,QSP+3-LOCBAS(A5) ; 4 bytes in op-code (+/-32K)
RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 65536-byte page boundary
* so that carries cannot be generated:
* unsigned 16-bit D7
SUBQSPW SUB.W D7,QSP+2-LOCBAS(A5) ; low word is at QSP+2 (must be even); 4 bytes in op-code (+/-32K)
RTS
* 68000 has no memory indirection
QPSHD7L MOVE.L QSP-LOCBAS(A5),A4 ; 4 bytes in op-code
MOVE.L D7,-(A4) ; 2 bytes in op-code
MOVE.L A4,QSP-LOCBAS(A5) ; 4 bytes in op-code
RTS
*
* 68020+ have memory indirection
QPSHD7LI
SUBQ.L #4,QSP-LOCBAS(A5) ; 4 bytes in op-code (SUBQ.W would be faster for medium stack)
MOVE.L D7,([QSP-LOCBAS,A5]) ; store through the pointer in QSP; 6 bytes in op-code with a 16-bit displacement
RTS
*
QPOPD7L MOVE.L QSP-LOCBAS(A5),A4 ; 4 bytes in op-code
MOVE.L (A4)+,D7 ; 2 bytes in op-code
MOVE.L A4,QSP-LOCBAS(A5) ; 4 bytes in op-code
RTS
*
* 68020+ have memory indirection
QPOPD7LI
MOVE.L ([QSP-LOCBAS,A5]),D7 ; load through the pointer in QSP; 6 bytes in op-code with a 16-bit displacement
ADDQ.L #4,QSP-LOCBAS(A5) ; 4 bytes in op-code (ADDQ.W would be faster for medium stack)
RTS
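*
* A sketch of setting up the synthetic stack before the first push,
* assuming A5 already points at LOCBAS and the stack grows down
* from QSTKBAS (QINIT is a hypothetical label, not one of the routines above):
QINIT   LEA QSTKBAS-LOCBAS(A5),A4 ; address just past the 32 longs
        MOVE.L A4,QSP-LOCBAS(A5) ; empty stack: Q points at the base
        RTS
*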
* Register offsets from A7 were dealt with above.
* Lest I forget --
* On the 6800 or 6801, this would be referenced by a process-local
* LOCALBASE or similar pseudo-register, which I almost forgot to talk about.
* On the 6809, it could be done by pseudo-register or (with some glue) by DP.
* On the 68000, we are going to use a spare address register,
* and I am going to pick A5.
* All the address math has been shown above,
* the only issue is being explicit about the assembly language idiom.
*
* Given
ORG Whatever
LOCBAS EQU *
* ...
VAR DS.B m ; or .W or .L, etc.
*
* With A5 known to be set to LOCBAS,
LEA LOCBAS(PC),A5
* or
MOVEA.L #LOCBAS,A5
*
* In-line snippets --
* For variable VAR within 256 bytes of LOCBAS:
...
LEA VAR-LOCBAS(A5),A0 ; that's all! (4-byte op-code)
...
*
* When VAR is 256 bytes or more away from LOCBAS, but less than 32768
* (or, even, below LOCBAS but within -32768), in other words, signed 16-bit offset:
...
LEA VAR-LOCBAS(A5),A0 ; same thing!
...
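*
* A sketch, assuming VAR is a word-aligned 16-bit variable:
* when you want the value rather than the address,
* the same displacement works directly, without the LEA.
        MOVE.W VAR-LOCBAS(A5),D7 ; load VAR
        ADDQ.W #1,VAR-LOCBAS(A5) ; or modify it in place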
*
* It's a little messier when the signed offset doesn't fit in 16 bits,
* less than -32768 below, or 32768 or greater above --
...
MOVE.L #VAR-LOCBAS,D7 ; Any Dn. An will also work, if it's not in use. 6 bytes.
LEA (A5,D7.L),A0 ; 4 bytes. total 10 bytes.
...
*
* From the 68020 on, 32-bit offsets are allowed, but the op-code is also 32-bits plus displacement:
...
LEA (VAR-LOCBAS,A5),A0 ; 8 byte total op-code
...
*
* Do I really need to show this as subroutines?
* signed 16-bit offset in D7:
LEALBWX LEA (A5,D7.W),A0 ; PLEASE just do this in-line!
RTS
*
* 32-bit offset in D7:
LEALBLX LEA (A5,D7.L),A0 ; PLEASE just do this in-line!
RTS
* ;-/
*
* I assume you're not going to be wanting to keep LOCBAS
* in a pseudo-register called LB_BASE.
* But you might want to maintain a separate allocation area
* with a pointer in AL_BASE, like this:
LOCBAS EQU *
...
AL_BASE DS.L 1
...
* for signed 16-bit offsets in D7:
ADDLBW MOVE.L AL_BASE-LOCBAS(A5),A0
ADD.W D7,A0 ; or LEA (A0,D7.W),A0
RTS
* for unsigned 16-bit offsets:
ADDLBU AND.L #$0000FFFF,D7 ; unsigned offset, then fall through to ADDLBL
* for 32-bit offsets
ADDLBL MOVE.L AL_BASE-LOCBAS(A5),A0
ADD.L D7,A0 ; or LEA (A0,D7.L),A0
RTS
*
* 68020 and above allow you to do weird things like this --
...
LEA ([AL_BASE-LOCBAS,A5],D7.L),A0
* ... ; 8-o
* ... quite literally letting you index directly off that pseudo-register
* out there in memory.
*
* As near as I can tell,
* memory indirect modes all require an address register,
* or the PC.
* But that's not so bad, other than some of the modes being overkill.
*
* And, in spite of my mugging, maybe this has been a good way
* to expand your grasp of the power of the 68000 addressing modes.
* Sorry about the mugging. Sort-of. ;-/
As you can see, the 68000 does almost all the address math you need without subroutines.
Including, to some extent, arrays, but let's not go there yet.
As with the previous three chapters, I have not tested the code. It should run, modulo typos.
The 68000 is hard to wrap your head around. I know. If the above doesn't make sense yet, it's okay. I'll point you back here from time to time when we are working with more concrete examples of using the above.
Look at how I've been avoiding this. I think it's time to build a concrete example of stack frames on the 6801.
Or you can jump ahead to getting numeric output in binary.