Sunday, October 6, 2024

ALPP 02-12 -- On the Beach with Parameters -- 16/32-bit Arithmetic on the 68000


(Title Page/Index)

 

So. 

Three different 8-bit processors, three different modes each for passing parameters at run-time. And the direct page for statically allocated variables. 

You thought I was showing you how to add and subtract? Well, yeah, that, too.

In all of this, the ancient 6809 would still look really impressive, if it weren't for the apparent simplicity and efficiency of static allocation on the 68HC11, that descendant of the 6801 that management wants to compare it with.

Apparent simplicity and efficiency. Danger! Danger! Will Robinson!

Heh. Okay, that's too dramatic, but the hidden dangers are there and show up when you have a project that suddenly outgrows the single-process-with-a-few-concurrent-subtasks (-threads) model that works reasonably well on the 6801 and even better on the 68HC11. 

And successful projects do grow. You want them to grow, don't you?

But, on the other hand, if you engineer every project for maximum growth, you have dozens of projects that die from over-engineering. So, ... 

Anyway, Motorola never extended the 6809 the way it should have been, and then Hitachi did some random extensions that they hid (see the 6309 CPU). 

So the next step up from the 68HC11 became the 68000, and this chapter is about where in the memory map parameters and variables on the 68000 should go.

(Two reasons I haven't been treating the 68HC11 in these tutorials -- 

  • (1) I haven't been able to find a good open source/libre simulator. And 
  • (2) The 68HC11 run-time model is going to be really close to the 6801 model. And 
  • (3) using the Y register well and appropriately requires careful analysis of the target application. In some cases, for instance, it could be an effective parameter stack pointer. In others, you would not want to do that.

What? Was that three reasons, not two? You may be right about that. :)

(The 6309? Similar. 

  • XRoar does the 6309, too, but Ciaran hasn't got single-step debugging in there, and I haven't been able to help him with that. It would take me a month to get properly into his source code, to feel confident I was doing it right, plus a week or four to get the results properly debugged. 
  • And the run-time model that the 6309 needs would either be identical to the 6809's or just enough different to confuse us. 
  • Using the 6309's extensions wisely is not a topic for tutorials. Some are no-brainers, some are not, and some look like no-brainers but aren't.

Sigh. Ancient industry wars and their fallout. 8-| )

I should, for completeness, show you how absolute addressing on the 68000 consumes a lot of code space (comparatively speaking) for those 32-bit addresses, by doing this twice (not quite what I did for the 6809).  But I won't. It should be obvious that absolute addressing is similar to absolute addressing on the 8-bit CPUs, but at double the address width.

So I'm just going to point out that 32-bit absolute addresses are big and leave it up to you to figure out. (Almost, I'll talk a little more about it when we're done here.)

Well, let's put up a small wall of text here.

Comparing the 68000 to the 6809, the 68000 has pretty much everything the 6809 has, only bigger and more (as if the 68000 were designed and laid out in Texas ;-).

(Pretty much. No memory indirection, and no 8-bit addressing. 16-bit, yes, but, ... oh, there is 8-bit addressing, but you end up using it with a second index, and it's not cheaper than 16-bit.)

If we use the DP as a process-local base pointer, we can just allocate one of the address registers for that. And the offsets will be 16-bit instead of 8-bit. <:-)

So, whatever use we intend for the 6809's DP register, it can be done by a spare address register on the 68000. Sort-of -- but with big offsets. 

More address space is good! -- especially now that memory is cheap. Lots of memory means you can keep a lot more useful stuff in memory.

Except we need to note that the offsets (displacements, Motorola calls them) are signed offsets. Not offset 0 ($0000) to 65535 ($FFFF) from our address register doing DP duty, rather, offset -32768 (-$8000) to +32767 ($7FFF). Sigh.
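To make that range concrete, here's a small sketch. BASE is a label I made up for this, and the code only shows what will and won't encode, so don't run it unless there really is memory on both sides of BASE:

```asm
BASE	DS.W	1		; hypothetical base of the variable area
*	...
	LEA	BASE(PC),A5	; A5 doing DP duty
	MOVE.W	$7FFE(A5),D0	; word at BASE+32766, the highest word we can reach
	MOVE.W	-$8000(A5),D1	; word at BASE-32768, the lowest we can reach
*	MOVE.W	$8000(A5),D2	; BASE+32768 does not fit in a signed 16-bit displacement
```

So half of the reach is behind the base address, not in front of it.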

And, as I parenthesized, the 68000 doesn't do memory indirection (which we haven't used yet). To indirect through a pointer in memory, the 68000 has to load the pointer into an (intermediary) address register (which it conveniently has enough of). It's not fatal, but it's sometimes inconvenient.
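A minimal sketch of what that looks like, assuming a pointer variable I'm calling PTR (hypothetical, not in the listings) that something else has already filled in:

```asm
* 6809, one instruction:
*	LDD	[PTR]		; indirect through the pointer in memory
*
* 68000, by way of an address register:
PTR	DS.L	1		; hypothetical pointer variable, filled in elsewhere
*	...
	MOVEA.L	PTR(PC),A0	; first fetch the pointer itself into A0
	MOVE.W	(A0),D7		; then fetch the word it points at
```

The upside is that the pointer is still sitting in A0 afterward if you want to use it again.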

But the 6809's DP register doesn't directly support memory indirection, either. So that's a wash relative to the process local static allocation area. 

So the LEA instruction will be available on the 68000 for process local static variables, where it isn't on the 6809's DP ( -- the real reason for my habit of complaining about not having the DP mode duplicated in the 6809's index mode post-byte. :-/).

Whatever address register you replace DP with, you can use the 68000's full range of indexing capabilities on it.
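For instance, something like this should work, borrowing the labels FINAL1, FINAL2, and LB_ADDR from the listings further down and assuming A5 has already been pointed at LB_ADDR:

```asm
	LEA	FINAL1-LB_ADDR(A5),A0	; A0 = address of FINAL1
	MOVE.L	#$12345678,(A0)+	; store to FINAL1, A0 now points at FINAL2
	CLR.L	(A0)			; clear FINAL2
```

There's no direct equivalent of that first line for a variable in the 6809's direct page.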

So I'm going to allocate a 68000 address register for use as a local base pointer, for roughly the equivalent of the use I have made of the 6809's DP in the last chapter. Keep that in mind when you compare the code -- similar, but not the same.

Other things to pay attention to -- 

  • MOVEM (MOVE Multiple) is, as you might remember, intended for saving and restoring register sets in a single instruction, like the 6809's PSHU/S and PULU/S. So it doesn't have any effect on the flags, which is very convenient. Differently from the 6809's push and pop instructions, MOVEM can be used without incrementing or decrementing an address register, which is also convenient. 
  • But you need to understand that MOVEM.W to a register sign-extends the 16-bit value loaded, even though it doesn't affect the flags.
  • MOVE (but not MOVEM) instructions can proceed memory-to-memory, without passing through a data or address register. 
  • ADDs and SUBtracts cannot operate memory-to-memory, but can operate register-to-memory or even immediate-to-memory, in addition to the usual memory to register. 
  • Be sure you check the number of bytes and kinds of object code produced by the various addressing modes.
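Here's a short sketch of those points in action. It assumes A6 is the parameter stack pointer and A5 the local base, as set up in the listings below, and the values are just ones I picked:

```asm
	MOVE.W	#$8765,-(A6)	; push a word with its high bit set
	MOVE.L	#$11111111,D7	; pre-fill D7 so the changes show
	MOVE.W	(A6),D7		; D7 = $11118765: high word untouched, N flag set
	MOVEM.W	(A6)+,D7	; D7 = $FFFF8765: sign-extended, flags untouched
*
	MOVE.W	FINAL16-LB_ADDR(A5),-(A6)	; memory to memory, no register in between
	ADDI.W	#$0100,(A6)	; immediate to memory
	ADD.W	D7,(A6)		; register to memory
	ADDQ.L	#2,A6		; drop the word, leaving the stack balanced
```

Step through it in a debugger and watch D7 and the flags after each line.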

With that much said, I think the comments -- along with comparing it to the 6809 code -- are sufficient, so here's the code for the parameter stack version:

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************
*
* 16-bit addition and subtraction for 68000 on parameter stack,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	4	; 4 bytes in the CPU's natural integer
*
*
	EVEN
LB_ADDR	EQU	*
ENTRY	BRA.W	START
	NOP		; A little buffer zone.
	NOP
A4SAVE	DS.L	1	; a place to keep A4 to A7 so we can return clean
A5SAVE	DS.L	1	; using it as pseudo-DP
A6SAVE	DS.L	1	; using it as PSP
A7SAVE	DS.L	1	; SP
FINAL1	DS.L	1	; 32-bit final result in process-local variable
FINAL2	DS.L	1	; another final result
FINAL3	DS.L	1	; yet another final result
	DS.W	1	; gap
FINAL16	DS.W	1	; 16-bit final result
GAP1	DS.L	54	; gap, make it an even 256 bytes.
*
*
	DS.L	1	; a little bumper space
SSTKLIM	DS.L	16	; 16 levels of call, max
* 			; 68000 is pre-dec (pre-store-decrement) push
SSTKBAS	DS.L	1	; a little bumper space
PSTKLIM	DS.L	32	; roughly 16 levels of call at two parameters per call
PSTKBAS	DS.L	1	; bumper space -- parameter stack is pre-dec
*
*
INISTKS	MOVE.L	(A7)+,A0	; get the return address
	LEA	A4SAVE(PC),A3
	MOVEM.L	A4-A7,(A3)	; Store away what the BIOS gives us.
	LEA	LB_ADDR(PC),A5	; set up our local base (pseudo-DP)
	LEA	SSTKBAS(PC),A7	; set up our return stack
	LEA	PSTKBAS(PC),A6	; set up our parameter stack
	JMP	(A0)		; return via A0
*
*
* PPOP and PPUSH are completely unnecessary, 
* but if we had to have them, here's one way to do it:
*PPOP16	MOVE.W	(A6)+,D7
*	RTS
*
*PPSH16	MOVE.W	D7,-(A6)
*	RTS
*
* Or, of course,
*PPOP16	MOVEM.W	(A6)+,D7	; movem to sign extend it.
*	RTS
*
*PPSH16	MOVEM.W	D7,-(A6)	; movem just because
*	RTS
*
*
* Don't need LD16I.
* If we needed it, it could look like this, but we don't.
*
* You could use it like this:
*	BSR.W	LD16I	; load D7 immediate
*	DC.W	$1234	; "immediate" 16-bit value to load
*	BSR	SOMEWHERE ; or some other executable code.
*
* LD16I	MOVE.L	(A7)+,A0	; point to the instruction stream
*	MOVE.W	(A0),D7	; from instruction stream
*	JMP	2(A0)	; return to the byte after the constant.
*
* But use
*	MOVE.W	#$1234,D7	; 16 bits!
* instead.
*
* And if we need to index ROMmed tables or such, 
* we have something much better for that, too:
*
* TABLE	DC.B	SOMETHING
*	...
*	EVEN
*	...
* 	LEA	TABLE(PC),A0
*
*
* We often will not need these, but we'll go ahead and define them:
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	MOVE.W	(A6)+,D7	; right (16-bit only)
	ADD.W	D7,(A6)		; add to left
	RTS			; *** all flags valid!! ***
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	MOVE.W	(A6)+,D7	; right (16-bit only)
	SUB.W	D7,(A6)		; subtract from left
	RTS			; *** all flags valid!! ***
*
* input parameters:
*   32-bit left, right
* output parameter:
*   32-bit sum
ADD32	MOVE.L	(A6)+,D7	; right 
	ADD.L	D7,(A6)		; add to left
	RTS			; *** all flags valid!! ***
*
* input parameters:
*   32-bit left, right
* output parameter:
*   32-bit difference
SUB32	MOVE.L	(A6)+,D7	; right 
	SUB.L	D7,(A6)		; subtract from left
	RTS			; *** all flags valid!! ***
*
* input parameters:
*   16-bit unsigned left, right
* output parameter:
*   32-bit sum
ADD16L	CLR.L	D7
	MOVE.W	2(A6),D7	; left (no sign extension)
	CLR.L	D6
	MOVE.W	(A6),D6		; right (no sign extension)
	ADD.L	D6,D7		; 32-bit sum
	MOVE.L	D7,(A6)		; 32-bit result on stack
	RTS			; *** X, N, Z valid ***
*
* input parameters:
*   16-bit left, right
* output parameter:
*   32-bit signed difference
SUB16L	CLR.L	D7
	MOVE.W	2(A6),D7	; left (no sign extension)
	CLR.L	D6
	MOVE.W	(A6),D6		; right (no sign extension)
	SUB.L	D6,D7		; 32-bit difference
	MOVE.L	D7,(A6)		; 32-bit result on stack
	RTS			; *** X, N, Z valid ***
*
*
* Let's use what we have:
START	BSR.W	INISTKS
*
	MOVE.W	#$1234,-(A6)
	MOVE.W	#$CDEF,-(A6)
	BSR.W	ADD16	; result should be $E023
	MOVE.W	#$8765,-(A6)
	BSR.W	SUB16	; result should be $58BE
	MOVE.W	(A6)+,FINAL16-LB_ADDR(A5)	; store the result
*
*	The 32-bit math and the unsigned 16-bit widened to 32 bit math 
*	are left as exercises.
*
DONE	MOVEM.L	A4SAVE-LB_ADDR(A5),A4-A7	; restore the monitor's A4-A7
	NOP
	NOP		; landing pad

And I am serious about the exercises for the reader, I think. I mean, you should have seen enough to be able to pick the numbers to use for testing and add the code yourself by now. (I hope.) Leave me a note in the comments if you have problems. 
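If you want something to check your own attempt against, here is one possible set of tests. The values are my own picks, and I'm assuming the lines go just before DONE in the parameter stack listing above:

```asm
	MOVE.L	#$12345678,-(A6)	; left
	MOVE.L	#$CDEF0123,-(A6)	; right
	BSR.W	ADD32	; result should be $E023579B
	MOVE.L	(A6)+,FINAL1-LB_ADDR(A5)
*
	MOVE.W	#$F234,-(A6)	; left
	MOVE.W	#$CDEF,-(A6)	; right
	BSR.W	ADD16L	; result should be $0001C023
	MOVE.L	(A6)+,FINAL2-LB_ADDR(A5)
*
	MOVE.W	#$1234,-(A6)	; left
	MOVE.W	#$CDEF,-(A6)	; right
	BSR.W	SUB16L	; result should be $FFFF4445
	MOVE.L	(A6)+,FINAL3-LB_ADDR(A5)
```

Note that the widening routines take two 16-bit parameters off the stack and leave one 32-bit result in the same four bytes, so the stack depth works out.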

Let's try that disparaged combined stack version now. Again, I think reading the comments and comparing the code with the 6809 code will be sufficient explanation:

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************
*
* 16-bit addition and subtraction for 68000 on return stack,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	4	; 4 bytes in the CPU's natural integer
*
*
	EVEN
LB_ADDR	EQU	*
ENTRY	BRA.W	START
	NOP		; A little buffer zone.
	NOP
A4SAVE	DS.L	1	; a place to keep A4 to A7 so we can return clean
A5SAVE	DS.L	1	; using it as pseudo-DP
A6SAVE	DS.L	1	; save A6 anyway.
A7SAVE	DS.L	1	; SP
FINAL1	DS.L	1	; 32-bit final result in process-local variable
FINAL2	DS.L	1	; another final result
FINAL3	DS.L	1	; yet another final result
	DS.W	1	; gap
FINAL16	DS.W	1	; 16-bit final result
GAP1	DS.L	54	; gap, make it an even 256 bytes.
*
	DS.L	1	; a little bumper space
SSTKLIM	DS.L	16	; 16 levels of call, max
* 			; 68000 is pre-dec (pre-store-decrement) push
SSTKBAS	DS.L	1	; a little bumper space
*
*
INISTKS	MOVEM.L	(A7)+,A0	; get the return address
	LEA	A4SAVE(PC),A3
	MOVEM.L	A4-A7,(A3)	; Store away what the BIOS gives us.
	LEA	LB_ADDR(PC),A5	; set up our local base (pseudo-DP)
	LEA	SSTKBAS(PC),A7	; set up our return stack
	JMP	(A0)		; return via A0
*
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	MOVE.L	(A7)+,A0	; Get the return address out of the way
	MOVE.W	(A7)+,D7	; right (16-bit only)
	ADD.W	D7,(A7)		; add to left
	JMP	(A0)		; return, *** all flags valid!! ***
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	MOVE.L	(A7)+,A0	; Get the return address out of the way
	MOVE.W	(A7)+,D7	; right 
	SUB.W	D7,(A7)		; subtract from left
	JMP	(A0)		; return, *** all flags valid!! ***
*
* input parameters:
*   32-bit left, right
* output parameter:
*   32-bit sum
ADD32	MOVE.L	(A7)+,A0	; Get the return address.
	MOVE.L	(A7)+,D7	; right
	ADD.L	D7,(A7)		; add to left
	JMP	(A0)		; return, *** all flags valid!! ***
*
* JFTR, something like this should also work:
* ADD32	MOVEM.L	(A7)+,D5-D7	; D5 = return address, D6 = right, D7 = left
*	MOVE.L	D5,A0
*	ADD.L	D6,D7
*	MOVE.L	D7,-(A7)
*	JMP	(A0)		; return, *** X, N, Z valid ***
*
* input parameters:
*   32-bit left, right
* output parameter:
*   32-bit difference
SUB32	MOVE.L	(A7)+,A0	; Get the return address.
	MOVE.L	(A7)+,D7	; right
	SUB.L	D7,(A7)		; subtract from left
	JMP	(A0)		; return, *** all flags valid!! ***
*
* input parameters:
*   16-bit unsigned left, right
* output parameter:
*   32-bit sum
ADD16L	MOVE.L	(A7)+,A0	; Get the return address.
	CLR.L	D7
	MOVE.W	2(A7),D7	; left (no sign extension)
	CLR.L	D6
	MOVE.W	(A7),D6		; right (no sign extension)
	ADD.L	D6,D7		; 32-bit sum
	MOVE.L	D7,(A7)		; 32-bit result on stack
	JMP	(A0)		; return, *** X, N, Z valid ***
*
* input parameters:
*   16-bit left, right
* output parameter:
*   32-bit signed difference
SUB16L	MOVE.L	(A7)+,A0	; Get the return address.
	CLR.L	D7
	MOVE.W	2(A7),D7	; left (no sign extension)
	CLR.L	D6
	MOVE.W	(A7),D6		; right (no sign extension)
	SUB.L	D6,D7		; 32-bit difference
	MOVE.L	D7,(A7)		; 32-bit result on stack
	JMP	(A0)		; return, *** X, N, Z valid ***
*
*
START	BSR.W	INISTKS
*
	MOVE.W	#$1234,-(A7)
	MOVE.W	#$CDEF,-(A7)
	BSR.W	ADD16	; result should be $E023
	MOVE.W	#$8765,-(A7)
	BSR.W	SUB16	; result should be $58BE
	MOVE.W	(A7)+,FINAL16-LB_ADDR(A5)	; store the result
*
*	The 32-bit math and the unsigned 16-bit widened to 32 bit math 
*	done as exercises in the parameter stack version
* 	should work here, too.
*
DONE	MOVEM.L	A4SAVE-LB_ADDR(A5),A4-A7	; restore the monitor's A4-A7
	NOP
	NOP		; landing pad

No surprises, no revelations. 

But do check that the 32-bit problems you worked out for the split-stack discipline don't break on the combined stack discipline, or, if they do, make sure you can fix them.

Up until now, we haven't really tried to do anything like direct page mode on the 68000, just used absolute/extended mode.

As I mentioned at the top of this chapter, the 68000 does have abbreviated addressing modes. Addresses in the first 64K (cough) ...

That's not right. Let's try that again. 

Addresses within 32K of address zero can use the short absolute form, which takes only 16 bits of address -- 2 bytes of address after the 2 bytes of op-code. 

Yeah, yeah, yeah, that's absolute addresses from -32768 to +32767, or -$8000 to +$7FFF. Written in 32 bits as signed integers, that's 

$FFFF8000 to $00007FFF

and they can be given in short absolute as 

$8000 to $7FFF

But BIOS and TOS use up the bottom 32 K of the address space (addresses $0000 to $7FFF), and more besides. And addresses at the top of address space aren't implemented in the Atari ST. So the short absolute address mode isn't going to be of much direct use to us.

In the 6809, we can move the DP past the range used in Disk I/O and MDOS on the EXORciser/EXORsim, and we did that. Moved it to $2000.

If we pick, arbitrarily, A5 as a substitute for the 6809 DP register -- or, more correctly, as a base for the per-process variable space -- we can use short 16-bit constant (signed) offsets to access that space.
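For comparing sizes, here's a sketch with the object code I'd expect in the comments. The addresses and the offset are arbitrary, and the .W and .L suffixes for forcing short and long absolute are an assumption about your assembler's syntax, so check its manual and its listing output:

```asm
	MOVE.W	$7FFE.W,D0	; 3038 7FFE      short absolute, 4 bytes
	MOVE.W	$00012344.L,D0	; 3039 0001 2344 long absolute, 6 bytes
	MOVE.W	$1234(A5),D0	; 302D 1234      16-bit displacement off A5, 4 bytes
```

So the base-plus-displacement form costs the same as short absolute, but it goes wherever we point A5.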

If we insist on (signed) 8-bit offsets, we can (again arbitrarily) designate A5+D3 as the base, loading D3 with zero and accessing offsets 0 to 127 (positive) with that address mode, but it still takes a full 16 bits to specify the addressing mode and the offset. (And remember that 128 to 255 are actually going to be negative offsets, -128 to -1.)
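A sketch of that, again with the object code I'd expect in the comments (the offset of 10 is arbitrary):

```asm
	MOVEQ	#0,D3		; D3 = 0, so A5+D3 is just A5
	MOVE.W	10(A5,D3.L),D0	; 3035 380A  8-bit displacement plus index, 4 bytes
	MOVE.W	10(A5),D0	; 302D 000A  16-bit displacement, also 4 bytes
```

Both forms come out the same size, so the 8-bit form buys us nothing here and ties up D3 besides.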

With full 16-bit offsets, the range

-$8000 to +$7FFF (-32768 to 32767)

from the base address can be accessed. But negative offsets become tricky to work with, so it's probably best to consider it 0 to 32767 except in certain special cases. 32767 is an awful lot of room anyway.

(Motorola calls signed offsets "displacements", to help us, I suppose, remember they are signed.) 

Full 32-bit constant offsets were not available until the 68020 and beyond. If you need them on the 68000, you load the offset constant into a data register or a second address register, as I mentioned above when talking about 8-bit offsets.
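Roughly like this, with an offset I picked just because it's too big for 16 bits (and assuming there's really memory out there to read):

```asm
	MOVE.L	#$00012344,D3	; an offset too big for a 16-bit displacement
	MOVE.W	0(A5,D3.L),D0	; fetch the word at A5 + $12344
```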

So, different from the 6809 DP in a number of ways, but it does allow us to set up a base for per-process variables.

Here's some code for addition and subtraction using statically allocated parameter variables based off A5 as a near-equivalent to DP in providing a base for per-process statically allocated variable space. 

Concerning my comments on consistency, yeah, it seems kind of ridiculous to bother with offsetting the 32-bit parameter variables by 2 so that the 16-bit parameters go into the low word portion of the variable in RAM, when there won't be any other code that accesses those parameters. But it doesn't cost anything at run-time, and it keeps the source consistent, and the hardest thing about statically-allocated parameters is keeping their use consistent.

It's worth the effort if you have to use statically allocated parameters.

Accessing a variable without being conscious of its size is way up there among ways to blow up your code silently.
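A quick sketch of how that goes wrong, using the NLFT variable from the listing below and remembering that the 68000 keeps the high word at the lower address:

```asm
	CLR.L	NLFT-LB_ADDR(A5)		; NLFT as a long is $00000000
	MOVE.W	#$1234,NLFT+2-LB_ADDR(A5)	; low word
	MOVE.L	NLFT-LB_ADDR(A5),D7		; D7 = $00001234, as hoped
*
	CLR.L	NLFT-LB_ADDR(A5)
	MOVE.W	#$1234,NLFT-LB_ADDR(A5)	; forgot the +2: that's the high word
	MOVE.L	NLFT-LB_ADDR(A5),D7		; D7 = $12340000, silently wrong
```

No error, no warning, just a number 65536 times too big.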

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************
*
* 16-bit addition and subtraction for 68000 via per-process area
* scratch pad,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	4	; 4 bytes in the CPU's natural integer
*
*
	EVEN
LB_ADDR	EQU	*
ENTRY	BRA.W	START
	NOP		; A little buffer zone.
	NOP
A4SAVE	DS.L	1	; a place to keep A4 to A7 so we can return clean
A5SAVE	DS.L	1	; using it as pseudo-DP
A6SAVE	DS.L	1	; save A6 anyway.
A7SAVE	DS.L	1	; SP
FINAL1	DS.L	1	; 32-bit final result in process-local variable
FINAL2	DS.L	1	; another final result
FINAL3	DS.L	1	; yet another final result
	DS.W	1	; gap
FINAL16	DS.W	1	; 16-bit final result
*
* parameter/scratch area for leaf functions only:
* ** When using statically allocated parameters,
* you want to reuse them.
* ** And when reusing statically allocated parameters,
* you absolutely want to use them consistently.
* ** The assembler may not handle implicit offsets 
* like 6809 assemblers handle DP, 
* so you need to calculate the offsets yourself.
NLFT	DS.L	1	; binary operator left side parameter
NRT	DS.L	1	; binary operator right side parameter
NRES	DS.L	1	; unary/binary operator result
NTEMP	DS.L	1	; general scratch register
NPAR	EQU	NLFT	; unary operator parameter
NSCRAT	EQU	NLFT	; 
*
GAP1	DS.L	50	; gap, make it an even 256 bytes.
*
	DS.L	1	; a little bumper space
SSTKLIM	DS.L	16	; roughly 16 levels of call, max
*			; 68000 is pre-dec (pre-store-decrement) push
SSTKBAS	DS.L	1	; a little bumper space
*
*
INISTKS	MOVEM.L	(A7)+,A0	; get the return address
	LEA	A4SAVE(PC),A3
	MOVEM.L	A4-A7,(A3)	; Store away what the BIOS gives us.
	LEA	LB_ADDR(PC),A5	; set up our local base (pseudo-DP)
	LEA	SSTKBAS(PC),A7	; set up our return stack
	JMP	(A0)		; return via A0
*
*
* Don't need PPOP and PPSH, but wait 'til we need SCRATCHPUSH!
*
*
* input parameters:
*   16-bit left in low word of NLFT,
*   16-bit right in low word of NRT
* output parameter:
*   16-bit sum in low word of NRES (high word untouched)
ADD16	CLR.L	D7	; for an entirely valid result
	MOVE.W	NLFT+2-LB_ADDR(A5),D7	; low word
	ADD.W	NRT+2-LB_ADDR(A5),D7	; low word
	MOVE.W	D7,NRES+2-LB_ADDR(A5)	; sum
	RTS
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	CLR.L	D7	; for an entirely valid result
	MOVE.W	NLFT+2-LB_ADDR(A5),D7	; low word
	SUB.W	NRT+2-LB_ADDR(A5),D7	; low word
	MOVE.W	D7,NRES+2-LB_ADDR(A5)	; difference
	RTS
*
*
START	BSR.W	INISTKS
*
	MOVE.W	#$1234,NLFT+2-LB_ADDR(A5)
	MOVE.W	#$CDEF,NRT+2-LB_ADDR(A5)
	BSR.W	ADD16	; result should be $E023
	MOVE.W	NRES+2-LB_ADDR(A5),NLFT+2-LB_ADDR(A5)
	MOVE.W	#$8765,NRT+2-LB_ADDR(A5)
	BSR.W	SUB16	; result should be $58BE
	MOVE.W	NRES+2-LB_ADDR(A5),FINAL16-LB_ADDR(A5)
*
* Repeat, with native instructions:
	MOVE.W	#$1234,D7
	ADD.W	#$CDEF,D7
	SUB.W	#$8765,D7
*
*	The 32-bit math and the unsigned 16-bit widened to 32 bit math 
*	are left as exercises.
*
DONE	MOVEM.L	A4SAVE-LB_ADDR(A5),A4-A7	; restore the monitor's A4-A7
	NOP
	NOP		; landing pad

I think it's time to start looking at getting numeric output -- probably before we look at multiplication and division, even though we'll need multiplication and division for decimal base output. Let's try binary output on the 6800 if you're ready to jump ahead.

Except, we are using the stack enough to start talking about balancing the stack and checking it, things we will need to know to debug our mistakes pretty soon.

