Wednesday, October 2, 2024

ALPP 02-10 -- On the Beach with Parameters -- 16-bit Arithmetic on the 6809

And now we've worked through three different ways to pass parameters at run-time on the 6801.

So what does the 6809 do for us?

The declarations from the 6800/6801 code we borrowed from the improved Hello World examples change in small ways, as does the initialization code.

PSP is now the U register, so we don't need a variable for it. We could actually get rid of everything in the DP, since SSAVE really doesn't need to be in the DP, but we'll keep it this way to be consistent.

The JMP at NOENTRY can be exchanged for a long branch, and I like that better. It allows us to make the code from NOENTRY up relocatable without load-time patching. So I'm going ahead and doing it. 

The return stack is now pre-decrement push, so the declarations for it change from the 6800/6801 code.

What the initialization code does really doesn't change, even though I am now using Load Effective Address instructions in PC-relative mode. That keeps the initialization code relocatable without patch-up.

Push and pop on both the U stack (which we are using for parameters) and the S stack (the return address stack) are part of the native instruction set and encode fully in two bytes, so using PPUSH and PPOP routines would actually be de-optimizing in terms of both code size and cycle counts. We do want to note that the load and store instructions (LDD/STD) affect the flags, whereas the push and pop instructions (PSHU/PSHS and PULU/PULS) do not (unless CC itself is in the register list of a pull).
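
A small sketch of that difference in flag behavior (WASZERO is a made-up label, just for illustration, and not part of the listings here):

* Native push: the flags from the load survive the push.
	LDD	#0	; LDD sets N and Z from the value loaded, so Z=1 here
	PSHU	A,B	; push D on the parameter stack, CC is left alone
	BEQ	WASZERO	; Z from the LDD is still there to branch on
*
* Store with pre-decrement: same stack cell, but the flags get rewritten.
	LDD	#0
	STD	,--U	; sets N and Z from D, clears V, leaves C alone

Which form you want depends on whether you need the flags from before the push or flags describing the value being pushed.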

	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	LBRA	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	32	; 16 levels of call, max
* 			; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
INISTKS	LEAU	PSTKBAS,PCR	; Set up the parameter stack
	PULS	X		; get return address
	STS	SSAVE		; Save what the monitor gave us.
	LEAS	SSTKBAS,PCR	; Move to our own stack
	JMP	,X	; return via X
*
* PPOP and PPUSH are completely unnecessary, 
* but if we had to have them, here's one way to do it:
*PPOP16	LDD	,U++
*	RTS
*
*PPSH16	STD	,--U
*	RTS
*
* Or, of course,
*PPOP16	PULU	A,B
*	RTS
*
*PPSH16	PSHU	A,B
*	RTS

Since the 6809, like the 6801, has LDD, we don't need an LD16I routine. Huzzah!

We can do similar things if necessary:

* Don't need LD16I.
* If we needed it, it could look like this, but we don't.
*
* You could use it like this:
*	LBSR	LD16I	; load D immediate
*	FDB	$1234	; "immediate" 16-bit value to load
*	BSR	SOMEWHERE ; or some other executable code.
*
* LD16I	PULS	X	; point to instruction stream
*	LDD	,X	; from instruction stream
*	JMP	2,X	; return to the byte after the constant.
*
* But use
*	LDD	#$1234	; 16 bits!
* instead.
*
* And if we need to index ROMmed tables or such, 
* we have something much better for that, too:
*
* TABLE	FCB	SOMETHING
*	...
* 	LEAX	TABLE,PCR

When we need to load addresses to work on them, we can now use the LEA instructions instead of loading the address as an immediate into D.
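
For instance, something like this, where BUFFER is a hypothetical label that does not appear in the listings here:

	LEAX	BUFFER,PCR	; X = address of BUFFER, computed relative to the PC
	LEAX	2,X	; step X to the next 16-bit cell
	LEAU	-4,U	; open two 16-bit cells on the parameter stack
	LEAU	4,U	; and drop them again

One thing to keep in mind: LEAX and LEAY set or clear Z according to the result, while LEAU and LEAS leave the flags alone.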

Cool stuff, huh?

And, if we refer back to Wozniak's Sweet 16 virtual machine, we find that the 6809 instruction set and addressing modes basically implement everything that Sweet 16 gave the 6502 (and more), as native, full speed instructions, with compact encodings.

Is that exciting? Or does it get boring? 

Boring can be good, sometimes.

Well, one caveat. Motorola did not include DP-relative in the index mode post-byte, so indirecting through direct-page pointers requires loading the pointer into an index register. And getting the effective address for variables in the direct page requires just a little computation:

* Indirecting through DP variables --
* instead of
*	LDD	[<DP_PTR]
* use an intermediate index register
	LDX	<DP_PTR
	LDD	,X
*
* Loading effective address of DP variables --
* instead of 
* 	LEAX	<DP_VAR
* calculate it something like
	TFR	DP,A
	LDB	#DP_VAR-DP_BASE
	TFR	D,X

Bummer! Right?

Okay, the world is not our perfect oyster yet. We're not taking a huge hit, and we can deal with it.

How do the addition and subtraction subroutines fare?

Oh, wow!

* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	LDD	2,U	; left 
	ADDD	,U++	; right
	STD	,U	; sum (N, Z, & C flags should be correct)
	RTS
* Flags: Specifically,
*        N and Z get set correctly by the final store double;
*        C is left over from the ADDD; the STD does not touch it.
*        V gets cleared.
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	LDD	2,U	; left
	SUBD	,U++	; right
	STD	,U	; difference (N, Z, & C flags should be correct)
	RTS
* Flags: Specifically,
*        N and Z get set correctly by the final store double;
*        C (borrow) is left over from the SUBD; the STD does not touch it.
*        V gets cleared.

Stack maintenance basically disappears into the meat of the function. In fact, we look at that and wonder if we really need to call those routines any more. It takes no more than six bytes to in-line either of them, as compared to three bytes for the LBSR to call it.

Sometimes we won't bother calling them.
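
Here is roughly what the addition looks like in-line at the point of use, borrowing the constants from the test frame further down:

	LDD	#$1234
	PSHU	A,B	; left
	LDD	#$CDEF
	PSHU	A,B	; right
	LDD	2,U	; left
	ADDD	,U++	; right, dropped from the stack as it is added
	STD	,U	; $E023 replaces left on the stack

Those last three instructions are the whole body of ADD16, two bytes each.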

AND THERE'S MORE in those comments!

Again, even without the  

	TFR	CC,A

which the 6809 replaces TPA with, and without any bit twiddling or even much care about code ordering, the Zero, Negative, and Carry flags are right there for the caller to use. oVerflow still gets cleared. If we need it, we'll probably just use the instructions in-line.
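
A sketch of what the caller can do with that (CARRIED, ISZERO, and ISNEG are hypothetical labels, not part of the test frame):

	LBSR	ADD16	; sum is now on top of the parameter stack
	BCS	CARRIED	; C from the ADDD: the unsigned sum did not fit in 16 bits
	BEQ	ISZERO	; Z from the STD: the sum is zero
	BMI	ISNEG	; N from the STD: bit 15 of the sum is set

RTS leaves the condition codes alone, so what the STD and the ADDD left behind is what the caller sees.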

Okay, putting the test frame for the 6809 together, with comments on what went away:

* 16-bit addition and subtraction for 6809 on parameter stack,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	LBRA	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	32	; 16 levels of call, max
* 			; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
INISTKS	LEAU	PSTKBAS,PCR	; Set up the parameter stack
	PULS	X		; get return address
	STS	SSAVE		; Save what the monitor gave us.
	LEAS	SSTKBAS,PCR	; Move to our own stack
	JMP	,X	; return via X
*
* PPOP and PPUSH are completely unnecessary, 
* but if we had to have them, here's one way to do it:
*PPOP16	LDD	,U++
*	RTS
*
*PPSH16	STD	,--U
*	RTS
*
* Or, of course,
*PPOP16	PULU	A,B
*	RTS
*
*PPSH16	PSHU	A,B
*	RTS
*
*
* Don't need LD16I.
* If we needed it, it could look like this, but we don't.
*
* You could use it like this:
*	LBSR	LD16I	; load D immediate
*	FDB	$1234	; "immediate" 16-bit value to load
*	BSR	SOMEWHERE ; or some other executable code.
*
* LD16I	PULS	X	; point to the instruction stream
*	LDD	,X	; from instruction stream
*	JMP	2,X	; return to the byte after the constant.
*
* But use
*	LDD	#$1234	; 16 bits!
* instead.
*
* And if we need to index ROMmed tables or such, 
* we have something much better for that, too:
*
* TABLE	FCB	SOMETHING
*	...
* 	LEAX	TABLE,PCR
*
*
* We often will not need these, but we'll go ahead and define them:
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	LDD	2,U	; left 
	ADDD	,U++	; right
	STD	,U	; sum (N, Z, & C flags should be correct)
	RTS
* Flags: Specifically,
*        N and Z get set correctly by the final store double;
*        C is left over from the ADDD; the STD does not touch it.
*        V gets cleared.
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	LDD	2,U	; left
	SUBD	,U++	; right
	STD	,U	; difference (N, Z, & C flags should be correct)
	RTS
* Flags: Specifically,
*        N and Z get set correctly by the final store double;
*        C (borrow) is left over from the SUBD; the STD does not touch it.
*        V gets cleared.
*
*
* Let's use what we have:
START	LBSR	INISTKS
*
	LDD	#$1234
	PSHU	A,B
	LDD	#$CDEF
	PSHU	A,B
	LBSR	ADD16	; result should be $E023
	LDD	#$8765
	PSHU	A,B
	LBSR	SUB16	; result should be $58BE
	LDD	,U++	; load the result into A:B
*
DONE	LDS	SSAVE,PCR	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

You know the drill. Step through it, try other constants. Convince yourself that you'd rather use the 6809 than even the 6801, when you're trying to get work done.

(Why didn't Motorola release the 6809 as an SOC core like it did the 6801? Grumble, grumble, grumble.)

And now we're going to see some revelations about the single interleaved stack discipline I keep disparaging:

* 16-bit addition and subtraction for 6809 on return stack,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	LBRA	START
	NOP		; bump to aligned
	RMB	2	; a little bumper space
SSTKLIM	RMB	96	; (64+32) roughly 16 levels of call, max
* 			; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS	RMB	2	; a little bumper space
*
*
INISTKS	PULS	X		; get return address
	STS	SSAVE		; Save what the monitor gave us.
	LEAS	SSTKBAS,PCR	; Move to our own stack
	JMP	,X	; return via X
*
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	PULS	X	; get return address out of the way
	LDD	2,S	; left 
	ADDD	,S++	; right
	STD	,S	; sum (N, Z, & C flags should be correct)
	JMP	,X	; return
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	PULS	X	; get return address out of the way
	LDD	2,S	; left 
	SUBD	,S++	; right
	STD	,S	; difference (N, Z, & C flags should be correct)
	JMP	,X	; return
*
*
START	LBSR	INISTKS
*
	LDD	#$1234
	PSHS	A,B
	LDD	#$CDEF
	PSHS	A,B
	LBSR	ADD16	; result should be $E023
	LDD	#$8765
	PSHS	A,B
	LBSR	SUB16	; result should be $58BE
	LDD	,S++	; load the result into A:B
*
DONE	LDS	SSAVE,PCR	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

You're looking at me and saying,

What revelations?????? That looks almost identical to the code for the split stack!!

Well, that should be a revelation. On the 6809, the only cost for using a separate parameter stack is the cost of declaring the stack space and initializing it, and then we don't have to fuss with the return address in the middle of our parameters any more.

In this example we don't really see how much we gain, but at least we can see that there's no real cost -- on a processor like the 6809.

No real cost except the allocation, and so many engineers have thought the allocation was the biggest hurdle. It seems to be a losing battle, doesn't it? Let's soldier on.

[EDIT JMR202410042358:]

Almost identical, indeed.

Case in point of how easy it is to mess up your code when you are dancing around the return address to get to your parameters and local variables.

While working on the equivalent code to the above for the 68000, I realized that I had failed to de-allocate the stack before or on return from the ADD16 and SUB16 routines here. Here's what I had written:

* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	LDD	4,S	; left 
	ADDD	2,S	; right
	STD	2,S	; sum (N, Z, & C flags should be correct)
	RTS
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	LDD	4,S	; left 
	SUBD	2,S	; right
	STD	2,S	; difference (N, Z, & C flags should be correct)
	RTS

I had the offsets correct, you see? No problem there. Or, I thought so. I had successfully avoided overwriting the return address, but now the result was out of place and in the way, and the stack had one of the input parameters still live on it after the return. This is a good way to overflow the stack and in various ways screw up the calculations.
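
Walked through by hand (this is a paper trace, not something captured from the simulator), the test frame's calls go like this with the mistaken routines:

* S stack as 16-bit cells, top of stack on the left:
*   before LBSR ADD16:      $CDEF $1234
*   inside ADD16:           return $CDEF $1234
*   after STD 2,S:          return $E023 $1234
*   after RTS:              $E023 $1234        ; left is still there
*   push $8765, LBSR SUB16, RTS:
*                           $58BE $E023 $1234  ; and now two stale cells
*
* With PULS X / ,S++ / JMP ,X the same two calls leave just:
*                           $58BE

The final LDD ,S++ still picks up $58BE either way, so the numbers in D look right while the stack quietly grows by a cell on every call.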

But so many engineers think that they won't do this. Or, rather, that they can write their compilers to keep them from doing it. 

And it would be nice if you would believe me for this, but I'm sure I'm going to have to present stronger evidence than my mistakes to really convince you.

[END EDIT JMR202410042358.]

How is the version that uses a scratch area in the DP going to look?

* 16-bit addition and subtraction for 6809 via DP scratch pad,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
	SETDP	0
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
SSAVE	RMB	2	; a place to keep S so we can return clean
* parameter/scratch area for leaf functions only:
NLFT	RMB	2	; binary operator left side parameter
NRT	RMB	2	; binary operator right side parameter
NRES	RMB	2	; unary/binary operator result
NTEMP	RMB	2	; general scratch register
NPAR	EQU	NLFT	; unary operator parameter
NSCRAT	EQU	NLFT	; 
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	LBRA	START
	NOP		; bump to aligned
	RMB	2	; a little bumper space
SSTKLIM	RMB	32	; roughly 16 levels of call, max
*			; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS	RMB	2	; a little bumper space
*
*
INISTKS	PULS	X		; get return address
	STS	SSAVE		; Save what the monitor gave us.
	LEAS	SSTKBAS,PCR	; Move to our own stack
	JMP	,X		; return via X
*
*
* Don't need PPOP and PPSH, but wait 'til we need SCRPSH!
*
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	LDD	NLFT
	ADDD	NRT
ADD16S	STD	NRES	; sum
	RTS
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	LDD	NLFT
	SUBD	NRT
	STD	NRES	; difference
	RTS
* Stealing code would only save 1 byte.
*
*
START	LBSR	INISTKS
*
	LDD	#$1234
	STD	NLFT
	LDD	#$CDEF
	STD	NRT
	LBSR	ADD16	; result should be $E023
	LDD	NRES
	STD	NLFT
	LDD	#$8765
	STD	NRT
	LBSR	SUB16	; result should be $58BE
	LDD	NRES
*
* Repeat, with native instructions:
	LDD	#$1234
	ADDD	#$CDEF
	SUBD	#$8765
*
DONE	LDS	SSAVE,PCR	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

Now, if it weren't for the LBSR calls instead of the JSR calls, that would look just like the 6801 code! (Almost.) Why do we even need any stack at all?

Yeah! Why not just write

	LDD	#$1234
	ADDD	#$CDEF
	SUBD	#$8765

??

Why not just use the 6801?

Patience. We will get there. 

You know, I could have shown extended mode addressing vs. direct-page mode on each of these processors. That would be four modes, which would have been maybe too many. 

And the only difference between the absolute/extended mode and direct page mode for the 6800 and 6801 would have been the number of bytes for addresses for the parameter stack pointer and scratch registers.

There's another difference on the 6809, however. The DP register lets us move the direct page away from page zero. But ... really, for this example, that would not have been meaningful. We could have deliberately moved DP, but unless you were watching really closely as you stepped through, you might not have noticed. 

If the concept intrigues you, give it a try. The SETDP directive will be useful.

Some assemblers expect the SETDP to be given just the high byte of the base address, but the EXORsim assembler expects the whole base address (and warns if it is not on an even 256-byte boundary).
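
A rough sketch of moving the direct page, using the whole-base-address form of SETDP just described. DPBASE and the address $2100 are assumptions for illustration only; pick a 256-byte-aligned page that is actually free on your system:

DPBASE	EQU	$2100	; hypothetical free page of RAM
	ORG	DPBASE
	SETDP	DPBASE	; tell the assembler where the direct page will be
NLFT	RMB	2
NRT	RMB	2
NRES	RMB	2
*
* ... and at run time, before touching those variables:
	LDA	#$21	; high byte of DPBASE
	TFR	A,DP	; now the CPU agrees with the assembler
	LDD	#$1234
	STD	NLFT	; assembles as a two-byte direct-page store

SETDP only informs the assembler; it is the TFR that actually changes the DP register. If the monitor expects DP to be zero, put it back before returning.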

I will show how to use DP later.

I changed my mind. I know you wanted to explore it yourself. You can, of course. 

But I'm going ahead and showing you how to use DP before we move on to the 68000. There are concepts there I want to reference when I show you the 68000 code.

 
