Sunday, November 3, 2024

ALPP 02-24 -- Some Address Math for the 68000

  Some Address Math
for the
68000

(Title Page/Index)

After a break for multi-byte negation, and because address math is so important, I think I should show you explicit 68000 counterparts to what I've shown you for the 6809, as well as to the routines for the 6801 and for the 6800.

When instructions become more general, they often take more bytes to encode. This is especially clear for the 68000. And when you generalize an operation, it often takes more instructions to implement -- even on a CPU with a more powerful instruction set. And the more you repeat those multiple instructions, the more opportunity you have to make mistakes.

More than speed and byte count, this is why we define utility routines like the ones we just looked at for the 6800 and 6801. We don't want to give ourselves too many opportunities for mistakes. (Macros can help with this, but we won't talk about that just yet.)

Between the 6809 and the 68000, it can be kind of a wash -- when you're working on 16-bit numbers and small applications that fit in a 64K memory space. When you start working with 32-bit numbers, it's advantage 68000, ... except then you also tend to work with 32-bit addresses, and the addresses can make byte count swell. 

I transliterated the fig implementation of Forth from 6800 to 68000, and the object image size increased by about 80% (real rough estimate). This is because I didn't want to restrict it to operating in the lower 32K of memory, minus the interrupt vector table, so the virtual machine i-codes (function addresses, really) swelled from 16-bit to 32-bit. And since the Forth is mostly a clot of i-codes, the overall image size swells. 

I started a conversion to direct call, which I got lost in (partly motivating this tutorial), and the code size does seem to improve a bit, but not completely to the size of the 6809 image.

Do look at assembly listings when you try to compare code sizes for stuff. In particular, the 68000 will often seem to take about twice the code bytes that the 6809 takes in these snippets. But when we move to concrete code where pieces come together, the code size comes down closer to the 6809 code size.

And I'll note again, being able to use single instructions instead of utility routines is nice, but it's actually more important that the 68000 has something of an optimal number of registers, so we don't have to worry about pseudo-registers in memory when switching processes.

As always, read the code and the comments in the code, and open up separate browser windows and compare side-by-side.

I'm showing the entire 68000 code in a single block because the abstract operations don't quite map the same, but I'm keeping the order roughly the same to keep it easy to find what to compare. 

[JMR202411070913 addendum:]

You may have missed the mention of the "here pointer" in the 6809 address math chapter:

LOCBAS	EQU	*

In Motorola assemblers, an asterisk where the assembler could parse an address means the location of the current instruction or directive, thus, "here". I'll need to explain more about it later, I'm sure.

[JMR202411070913 addendum end.]

How the registers map when moving from the 6809 to the 68000 --

  • I'm mapping the 6809's S to A7, of course;
  • U to A6;
  • DP will map to A5;
  • X mostly to A0;
  • Y to whatever.
  • B is sort-of mapped to D7;
  • A is sort-of mapped to D6 or the top bytes of D7 or D5 or something, depending on what I need it to do.

(And please don't just copy-and-paste code without thinking.)

* 68000 pointer math

	ORG	SOMETHING
* All of these work fine in-line, rather than called as subroutines.
* In fact, unless specified otherwise, you should in-line.
* You can substitute any data register unless specified otherwise.
*
* Likewise, you can substitute any address register,
* except that operations on A7 should always be in-lined,
* apart from those routines which specifically handle the return address --
* but those routines are not really intended to be used anyway.
* Calling a subroutine and playing with the return stack 
* without handling the return address
* just is not a good way to keep control of your program.
*
* And then there is alignment. 68000 needs 16- and 32-bit accesses 
* to be 16-bit aligned, and will throw address errors if they are not.
* (Later CPUs are not so restricted.)
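* For example (an untested sketch, assuming A0 holds an even address;
* these are in-line fragments, not a routine):
	MOVE.W	(A0),D7	; okay, word access at an even address
	MOVE.B	1(A0),D7	; okay, byte accesses can be at odd addresses
	MOVE.W	1(A0),D7	; odd address: address error on the 68000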
*
* Negate Dn in 8, 16, or 32 bits:
NEGLD7	NEG.L	D7	; .L => 32 bits, .W => 16 bits, .B => 8 bits
	RTS
* On the 6800/6801/6809, you can negate (2's complement) a byte 
* using a 1-byte instruction.
* On the 68000, it takes a 2-byte instruction.
* It takes 5 bytes of instruction to negate 16 bits on 6800/1/9,
* and 13 bytes to negate 32 bits.
* But on the 68000, it takes just two,
* the above 16-bit op-code with a couple of bits changed.
* This is a common pattern with 68000 instructions.
*
* And, for all the time I spend explaining NEG, 
* since the 68000 can subtract registers in either order, 
* we really don't need NEG here.
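* A sketch of what that means (untested, in-line fragments, not routines):
* Instead of negating the offset in D7 and then adding it,
	NEG.W	D7
	ADD.W	D7,A0	; A0 = A0 + (-D7)
* we can subtract it directly and leave D7 untouched:
	SUB.W	D7,A0	; A0 = A0 - D7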

* Unsigned byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDBX	AND.W	#$FF,D7	; zero extend it
	ADD.W	D7,A0	; 16-bit source sign extended to 32 bits
	RTS
* Alternative
ADDBXalt
	AND.W	#$FF,D7	; zero extend it
	LEA	(A0,D7.W),A0	; takes more bytes
	RTS
*
* Signed byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADSBX	EXT.W	D7
	ADD.W	D7,A0	; 16-bit source sign extended to 32 bit An
	RTS
* Alternative
ADSBXalt
	EXT.W	D7
	LEA	(A0,D7.W),A0	; takes more bytes
	RTS
*
* Subtract unsigned byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBBX	AND.W	#$FF,D7	; zero extend it
	SUB.W	D7,A0	; 16-bit source sign extended to 32 bits
	RTS

* Subtract signed byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SBSBX	EXT.W	D7
	SUB.W	D7,A0	; 16-bit source sign extended to 32 bit An
	RTS
*
* Unsigned 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDWX	AND.L	#$FFFF,D7	; zero extend it
	ADD.L	D7,A0	
	RTS
* Alternative
ADDWXalt
	AND.L	#$FFFF,D7	; zero extend it
	LEA	(A0,D7.L),A0	; takes more bytes
	RTS
*
* Signed 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADSWX	ADD.W	D7,A0	
	RTS
* Alternative
ADSWXalt
	LEA	(A0,D7.W),A0	; takes more bytes
	RTS
* Alternative
ADSWXalt2
	LEA	(A0,A1.W),A0	; takes more bytes
	RTS
*
* Subtract unsigned 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBWX	AND.L	#$FFFF,D7	; zero extend it
	SUB.L	D7,A0	
	RTS

* Subtract signed 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SBSWX	SUB.W	D7,A0	
	RTS
*
* 32-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDLX	ADD.L	D7,A0	
	RTS
* Alternative
ADDLXalt
	LEA	(A0,D7.L),A0	; takes more bytes
	RTS
* Alternative
ADDLXalt2
	LEA	(A0,A1.L),A0	; takes more bytes
	RTS
*
* Subtract 32-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBLX	SUB.L	D7,A0	
	RTS
*


*************
* For the return stack
* As explained above, just in-line the LEA.
* These are provided as a solution to a puzzle,
* not as useful code.
*
* Signed byte offset
* Just in-line the EXT.W and ADD.W
ADSBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	EXT.W	D7	; sign extend it
	ADD.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.
*
* Unsigned byte offset
* Just in-line the AND.W and ADD.W
ADDBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.W	#$FF,D7	; zero extend it
	ADD.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.

* Signed 16-bit offset
* Just in-line the ADD.W
ADSWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	ADD.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.
*
* Unsigned 16-bit offset
* Just in-line the AND.L and ADD.L
ADDWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.L	#$FFFF,D7	; zero extend it
	ADD.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.

* 32-bit offset
* Just in-line the ADD.L
ADDLS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	ADD.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
*
* Subtract unsigned byte offset
* Just in-line the AND.W and SUB.W
SUBBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.W	#$FF,D7	; zero extend it
	SUB.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
*
* Subtract unsigned 16-bit offset
* Just in-line the AND.L and SUB.L
SUBWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.L	#$FFFF,D7	; zero extend it
	SUB.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
*
* Subtract signed byte offset
* Just in-line the EXT.W and SUB.W
SBSBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	EXT.W	D7	; sign extend it
	SUB.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
*
* Subtract signed 16-bit offset
* Just in-line the SUB.W
SBSWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	SUB.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0

* Subtract 32-bit offset
* Just in-line the SUB.L
SUBLS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	SUB.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
*

* INX and DEX trains and INS and DES trains are meaningless.
* HOWEVER, just to remind ourselves:
* (And all of these work for the address registers standing in for Y and U, too, but IN-LINE them!!)
* (They work for A7, standing in for S, if in-lined, as well.)
ADD16X	LEA 	16(A0),A0
	RTS
ADD14X	LEA	14(A0),A0
	RTS
SUB16X	LEA	-16(A0),A0
	RTS
* Etc. In-line these.
INX	LEA	1(A0),A0	; Sigh. In-line it. Do not make trains with it. Please.
	RTS
DEX	LEA	-1(A0),A0	; See INX. In-line it. Do not make trains with it. PLEASE.
	RTS
* Note that we can also use ADDQ and SUBQ for offsets from 1 to 8.
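* For instance (an untested in-line sketch):
	ADDQ.L	#2,A0	; 2 bytes, where LEA 2(A0),A0 takes 4
	SUBQ.L	#8,A0	; 8 is the largest count ADDQ/SUBQ can encode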
*
* More solutions to puzzles.
* If you called these, you would have to juggle the return address as shown.
* You don't want to do that.
* Just in-line the LEA instructions.
* Then there's no return address to juggle, no messing with A0.
* DO NOT USE THIS CODE other than for examples of silly walks.
ADD16S	MOVE.L	(A7)+,A0
	LEA	16(A7),A7
	JMP	(A0)
* etc.
* Could all be replaced with just LEA	16(A7),A7 in-line!
* That's actually cheaper than just the instruction JSR!!!
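* In-line, allocating and releasing a 16-byte scratch area might look like this
* (an untested sketch; keep the count even so A7 stays aligned):
	LEA	-16(A7),A7	; allocate 16 bytes on the stack
*	... use 0(A7) through 15(A7) here ...
	LEA	16(A7),A7	; release them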


* Synthetic stacks restricted within page boundaries make no sense at all
* on the 68000. Except, I suppose they could, sort-of.
*
* In the first place,
* we should be able to use an extra address register to make a third stack.
* If we do, addressing has already been covered, above.
*
* But if we want a software stack maintained by pointers in memory,
* for some reason,
* Given a pseudo-register somewhere in process local variable space
* accessed via A5:
	ORG	SOMEWHERE
	...
QSP	DS.L	1	; a synthetic stack pointer Q
* QSP-LOCBAS has to be within +/-32K on 68000, 2-byte op-code, 2-byte offset, syntax: QSP-LOCBAS(A5)
* 68020 and above allows 32-bit range, 4-byte op-code, 4-byte offset, syntax: (QSP-LOCBAS,A5)
	...
	DS.L	2	; buffer zone
QSTKLIM	DS.L	32
QSTKBAS	DS.L	2	; buffer zone
	...

* 32-bit Dn for synthetic stack (could/should be in-line):
ADDQSP	ADD.L	D7,QSP-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 256-byte page boundary
* so that carries cannot be generated:
* unsigned byte D7
ADDQSPS	ADD.B	D7,QSP+3-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 65536-byte page boundary
* so that carries cannot be generated:
* unsigned 16-bit D7
ADDQSPW	ADD.W	D7,QSP+2-LOCBAS(A5)	; low word is at QSP+2; 4 bytes in op-code (+/-32K)
	RTS
*
* 32-bit Dn for synthetic stack (could/should be in-line):
SUBQSP	SUB.L	D7,QSP-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 256-byte page boundary
* so that carries cannot be generated:
* unsigned byte
SUBQSPS	SUB.B	D7,QSP+3-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 65536-byte page boundary
* so that carries cannot be generated:
* unsigned 16-bit D7
SUBQSPW	SUB.W	D7,QSP+2-LOCBAS(A5)	; low word is at QSP+2; 4 bytes in op-code (+/-32K)
	RTS

* 68000 has no memory indirection
QPSHD7L	MOVE.L	QSP-LOCBAS(A5),A4	; 4 bytes in op-code
	MOVE.L	D7,-(A4)		; 2 bytes in op-code
	MOVE.L	A4,QSP-LOCBAS(A5)	; 4 bytes in op-code
	RTS
*
* 68020+ have memory indirection
QPSHD7LI
	SUBQ.L	#4,QSP-LOCBAS(A5)	; 4 bytes in op-code (SUBQ.W on QSP+2 would be faster for medium stack)
	MOVE.L	D7,([QSP-LOCBAS,A5])	; indirect through QSP itself; 6 bytes with a 16-bit displacement
	RTS
*
QPOPD7L	MOVE.L	QSP-LOCBAS(A5),A4	; 4 bytes in op-code
	MOVE.L	(A4)+,D7		; 2 bytes in op-code
	MOVE.L	A4,QSP-LOCBAS(A5)	; 4 bytes in op-code
	RTS
*
* 68020+ have memory indirection
QPOPD7LI
	MOVE.L	([QSP-LOCBAS,A5]),D7	; indirect through QSP itself; 6 bytes with a 16-bit displacement
	ADDQ.L	#4,QSP-LOCBAS(A5)	; 4 bytes in op-code (ADDQ.W on QSP+2 would be faster for medium stack)
	RTS


* Register offsets from A7 were dealt with above.

* Lest I forget --
* On the 6800 or 6801, local variables would be referenced by a process-local
* LOCALBASE or similar pseudo-register, which I almost forgot to talk about.
* On the 6809, it could be done by pseudo-register or (with some glue) by DP.
* On the 68000, we are going to use a spare address register,
* and I am going to pick A5.
* All the address math has been shown above,
* the only issue is being explicit about the assembly language idiom.
*
* Given 
	ORG	Whatever
LOCBAS	EQU	*
*	...
VAR	DS.B	m	; or .W or .L, etc.
*
* With A5 known to be set to LOCBAS,
	LEA	LOCBAS(PC),A5
* or
	MOVEA.L	#LOCBAS,A5
*
* In-line snippets --
* For variable VAR within 256 bytes of LOCBAS:
	...
	LEA	VAR-LOCBAS(A5),A0	; that's all! (4-byte op-code)
	...
*
* When VAR is 256 bytes or more away from LOCBAS, but less than 32768
* (or, even, below LOCBAS but within -32768), in other words, signed 16-bit offset:
	...
	LEA	VAR-LOCBAS(A5),A0	; same thing!
	...
*
* It's a little messier when the signed offset doesn't fit in 16 bits, 
* less than -32768 below, or 32768 or greater above --
	...
	MOVE.L	#VAR-LOCBAS,D7		; Any Dn. An will also work, if it's not in use. 6 bytes.
	LEA	(A5,D7.L),A0		; 4 bytes. total 10 bytes. 
	...
*
* From the 68020 on, 32-bit offsets are allowed, but the op-code is also 32 bits plus displacement:
	...
	LEA	(VAR-LOCBAS,A5),A0	; 8 byte total op-code
	...
* 
* Do I really need to show this as subroutines?
* signed 16-bit offset in D7:
LEALBWX	LEA	(A5,D7.W),A0	; PLEASE just do this in-line!
	RTS
*
* 32-bit offset in D7:
LEALBLX	LEA	(A5,D7.L),A0	; PLEASE just do this in-line!
	RTS
*			;-/
* 
* I assume you're not going to be wanting to keep LOCBAS
* in a pseudo-register called LB_BASE.
* But you might want to maintain a separate allocation area
* with a pointer in AL_BASE, like this:
LOCBAS	EQU	*
	...
AL_BASE	DS.L	1
	...
* for signed 16-bit offsets in D7: 
ADDLBW	MOVE.L	AL_BASE-LOCBAS(A5),A0
	ADD.W	D7,A0	; or LEA (A0,D7.W),A0
	RTS
* for unsigned 16-bit offsets:
ADDLBU	AND.L	#$0000FFFF,D7	; zero extend the unsigned offset, then fall through to ADDLBL
* for 32-bit offsets
ADDLBL	MOVE.L	AL_BASE-LOCBAS(A5),A0
	ADD.L	D7,A0	; or LEA (A0,D7.L),A0
	RTS
*
* 68020 and above allow you to do weird things like this --
	...
	LEA	([AL_BASE-LOCBAS,A5],D7.L),A0
*	...					;  8-o
* ... quite literally letting you index directly off that pseudo-register
* out there in memory.
*
* As near as I can tell,
* memory indirect modes all require an address register,
* or the PC. 
* But that's not so bad, other than some of the modes being overkill.
*
* And, in spite of my mugging, maybe this has been a good way
* to expand your grasp of the power of the 68000 addressing modes.

* Sorry about the mugging. Sort-of. ;-/

As you can see, the 68000 does almost all the address math you need without subroutines.

Including, to some extent, arrays, but let's not go there yet.

As with the previous three chapters, I have not tested the code. It should run, modulo typos.

The 68000 is hard to wrap your head around. I know. If the above doesn't make sense yet, it's okay. I'll point you back here from time to time when we are working with more concrete examples of using the above.

Look at how I've been avoiding this. I think it's time to build a concrete example of stack frames on the 6801.

Or you can jump ahead to getting numeric output in binary.

