Sunday, September 29, 2024

ALPP 02-08 -- On the Beach with Parameters -- 16-bit Arithmetic on the 6800


So we pretty much snuck the meat of the 16-bit arithmetic in already, didn't we? In the last note and the previous two chapters we were passing byte parameters in and widening them, but the math we did on them was already 16-bit.

Parameters. Oh, yeah. Those.

I wrote a couple of walls of text about parameters, and then decided I should show you code instead, or at least first. (Yes. Again.)

Let's define some library-style functions to add and subtract on the 6800, using the split stack parameter passing paradigm I keep talking about. Then I can philosophize a wall of text and maybe not put everyone to sleep. 

We'll borrow this code from the improved Hello World examples, to declare the stack pointers and set the stacks up, and to push and pop both accumulators (the full listings further down call the push and pop routines PPSH16 and PPOP16):

	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOPD	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS

We want to be able to load immediate values to the A:B pair. Some assemblers would allow us to load them something like this:

VALUE	EQU	$1234
	...
	LDAA	#VALUE/256	; high byte
	LDAB	#VALUE-(VALUE/256)*256	; low byte

Some will even allow loading an address like this:

BUFFER	RMB	80	; text buffer
	...
	LDAA	#BUFFER/256	; high byte of the address
	LDAB	#BUFFER-(BUFFER/256)*256	; low byte of the address

But the one we are presently using in EXORsim will not do either, at least not at this time.

Even assemblers that allow the former may not allow the latter, under the assumption that addresses should never be divided or multiplied in legitimate code. Treating addresses like integers has traditionally been considered evidence of operator error on the programmer's part, and many assemblers will complain if you do.

We could go looking for an assembler that will do what we want, but for now we want a workaround. (And some people think the following run-time "syntactic sugar" makes code more "readable", anyway.)

*
* Load a constant from the instruction stream into A:B, 
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream 
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
*	JSR	LD16I	; load A:B immediate
*	FDB	$1234	; "immediate" 16-bit value to load
*	JSR	SOMEWHERE ; or some other executable code.
*
LD16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream
	INS		; drop the return address we don't need
	INS
	JMP	2,X	; return to the byte after the constant.

What are you looking at me like that for? Yeah, this little bit of code to enable some syntactic sugar looks really strange when the concept of a return address is still fuzzy in your mind. And it seems so unnecessary. It takes space to define, and each use (a three-byte JSR plus the two-byte constant) takes a byte more than the pair of two-byte immediate loads it replaces. WHY????

Well, if you study virtual machines like, for instance, the fig Forth run-time (the code for LIT), or Steve Wozniak's Sweet 16 VM that supplied 16-bit routines for some Apple II software, you'll recognize what it's doing. If you have a VM, it can be a way to save some bytes of object code, but what we're really doing here is spending runtime to save management time.

At a cost of a few cycles of (your) runtime, I can avoid the trouble of chasing down the bug in EXORsim, getting Joe H. Allen's attention, potentially discussing whether addresses should be allowed to have division done on them, etc., or, in the alternative, fixing it myself and forking the code like I did for the odd-ball EXORsim6801.

And you thought optimization was simple code size vs. speed. :)
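
If the return address is still fuzzy, a hand trace of LD16I may help before we move on. This is only a sketch to step through on paper or in the simulator. The address $3000 and the label TRACE are made up for the illustration, and it assumes LD16I is assembled somewhere else, exactly as shown above:

	ORG	$3000	; hypothetical address, chosen only for the trace
TRACE	JSR	LD16I	; $3000: JSR pushes $3003, the address of the FDB
	FDB	$1234	; $3003: data, never executed
	NOP		; $3005: LD16I's JMP 2,X lands here with A:B = $1234
*
* Inside LD16I, with the return address $3003 on the stack:
*	TSX		; X = S+1, pointing at the saved $3003
*	LDX	0,X	; X = $3003, pointing at the constant
*	LDAA	0,X	; A = $12
*	LDAB	1,X	; B = $34
*	INS		; two INS put S back where it was before the JSR
*	INS
*	JMP	2,X	; PC = $3003 + 2 = $3005

The "return address" that JSR saved is really just the address of the two bytes following the JSR. LD16I uses it once as a data pointer and once, plus two, as the place to resume.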

Now we will define our addition and subtraction subroutines:

* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	ADDB	1,X	; right low
	ADCA	0,X	; right high, with carry
	STAB	3,X	; sum low
	STAA	2,X	; sum high
	INX		; adjust parameter stack
	INX
	STX	PSP
	RTS
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	SUBB	1,X	; right low
	SBCA	0,X	; right high, with borrow
	STAB	3,X	; difference low
	STAA	2,X	; difference high
	INX		; adjust parameter stack
	INX
	STX	PSP
	RTS

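If you're wondering whether the processor flags are correct after all that, only the Carry flag comes through in a useful state. The STAB and STAA that store the result clear oVerflow, and the INX and STX that update the parameter stack pointer leave Negative and Zero describing the pointer, not the result. Moreover, if you're watching, you should notice that even before the stores the Zero flag does not show whether the entire 16 bits of the result are zero. It only shows one byte at a time, the high byte last here.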
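
To see the Zero flag problem by itself, here is a small fragment to step through. It is a sketch only, with made-up labels (ZDEMO, ZFOOLD, ZNOT) and constants chosen so that the high bytes cancel while the low byte does not:

ZDEMO	LDAB	#$34
	LDAA	#$12	; A:B = $1234
	SUBB	#$00	; B = $34, Z clear
	SBCA	#$12	; A = $00, Z set, but A:B = $0034 is not zero
	BEQ	ZFOOLD	; taken: Z only saw the high byte
	NOP		; never reached with these constants
ZFOOLD	TSTA		; honest test: is the high byte zero?
	BNE	ZNOT
	TSTB		; and is the low byte zero?
	BNE	ZNOT	; taken here: B is still $34
	NOP		; reached only for a true 16-bit zero
ZNOT	NOP		; landing pad

Testing both bytes this way gives the right answer about zero, but TSTA and TSTB clear Carry and oVerflow on the way, so it is not a way to keep the other flags.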

We can sort of fix the flags, something like this (untested):

SUB16F	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	SUBB	1,X	; right low
	SBCA	0,X	; right high, with borrow
	STAB	3,X	; difference low
	STAA	2,X	; difference high
* In this version, we will set the flags almost as if it were SUBD:
	TPA		; save the flags
	ANDA	#$FB	; clear the Z flag in the copy
	STAA	0,X	; re-use this byte to save the copied flags
	ORAB	2,X	; OR low byte with high to set the correct Z flag
	TPA
	ANDA	#$04	; clear all but Z
	ORAA	0,X	; combine corrected Z with copied flags
	PSHA		; which is worse? return stack or DP?
	INX		; adjust parameter stack before restoring the flags
	INX
	STX	PSP
	PULA		; get the flags back
	TAP		; replace the flags
	RTS

Wow, that's a lot of code! And we would want to test it thoroughly before using it for anything important. (It should work, but ...)

Pay particular attention to the order things are done: 

  1. We save the best copy of the flags.
  2. Before we update the stack pointer, we borrow some of the stack space that is no longer in use to calculate what the flags should have been.
  3. Then we save the corrected flags to the safest place we can think of. 
  4. Before updating the CPU flags, we update the stack pointer, so that updating the stack pointer will not thrash the flags we just calculated.
  5. Then we get the flags back and restore them in the CPU.
  6. RTS does not affect the flags. (This is a deliberate design decision by the CPU architects.)

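So you can see how it could be done -- but we usually don't need all the flags corrected. (And, in fact, we didn't clear the Half-carry flag! We also lost oVerflow: the STAB and STAA ahead of the first TPA had already cleared it, so V always comes back clear.)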
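
If you do want oVerflow to survive, the flags have to be copied before anything stores the high byte, because STAA and STAB clear V. Here is one way that might be arranged. Treat it as an untested sketch, not a replacement: SUB16V and SB16VZ are made-up labels, and it leans on PSHA and PULB leaving the flags alone and on loads and stores leaving Carry alone.

SUB16V	LDX	PSP
	LDAB	3,X	; left low
	SUBB	1,X	; minus right low, C = borrow
	STAB	3,X	; difference low (a store leaves C alone)
	LDAA	2,X	; left high (a load leaves C alone)
	SBCA	0,X	; minus right high and borrow: N, V, C are final
	PSHA		; park the high byte, PSHA touches no flags
	TPA		; copy the flags while V is still intact
	TSTB		; was the low byte zero?
	BEQ	SB16VZ	; yes: Z from the high byte is the 16-bit Z
	ANDA	#$FB	; no: clear Z in the copy
SB16VZ	PULB		; difference high, in B this time
	STAB	2,X	; difference high
	INX		; adjust parameter stack before restoring the flags
	INX
	STX	PSP
	TAP		; put the corrected flags in the CPU
	RTS

The order of operations is the same idea as in the list above: copy the flags at the moment they are best, patch the copy, finish everything that disturbs the flags, and only then hand the copy back to the CPU.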

I'll show some alternate approaches later.

Let's put this all together with some test code. You'll want to pay close attention to what happens in the CPU when it executes each of the new routines, but especially LD16I.

* 16-bit addition and subtraction for 6800 on parameter stack, with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOP16	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
PPSH16	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS
*
* Load a constant from the instruction stream into A:B, 
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream 
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
*	JSR	LD16I	; load D immediate
*	FDB	$1234	; "immediate" 16-bit value to load
*	JSR	SOMEWHERE ; or some other executable code.
*
LD16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream
	INS		; drop the return address we don't need
	INS
	JMP	2,X	; return to the byte after the constant.
*
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	ADDB	1,X	; right low
	ADCA	0,X	; right high, with carry
	STAB	3,X	; sum low
	STAA	2,X	; sum high
	INX		; adjust parameter stack
	INX
	STX	PSP
	RTS
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	SUBB	1,X	; right low
	SBCA	0,X	; right high, with borrow
	STAB	3,X	; difference low
	STAA	2,X	; difference high
	INX		; adjust parameter stack
	INX
	STX	PSP
	RTS
*
*
START	JSR	INISTKS
*
	JSR	LD16I
	FDB	$1234	; (FDB seems to want a comment.)
	JSR	PPSH16
	JSR	LD16I
	FDB	$CDEF	; (FDB seems to want a comment.)
	JSR	PPSH16
	JSR	ADD16	; result should be $E023
	JSR	LD16I
	FDB	$8765	; (FDB seems to want a comment.)
	JSR	PPSH16
	JSR	SUB16	; result should be $58BE
	LDX	PSP
	LDAB	1,X	; load the result into A:B
	LDAA	0,X
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

If something doesn't work, go back and make sure you've copied everything correctly.

Once you've stepped through it, you might want to try other constants. 

Before we move on to equivalent code for the 6801, let's compare how it would look with an interleaved (combined) parameter and return stack -- you know, the single stack discipline I keep disparaging. Here's a comparable test frame for the single stack:

* 16-bit addition and subtraction for 6800 on return stack,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	JMP	START
	NOP		; bump to aligned
	RMB	2	; a little bumper space
SSTKLIM	RMB	95	; (64+31) roughly 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
*
*
INISTKS	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
* Don't need PPOP and PPSH
*
* Load a constant from the instruction stream into A:B, 
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream 
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
*	JSR	LD16I	; load D immediate
*	FDB	$1234	; "immediate" 16-bit value to load
*	JSR	SOMEWHERE ; or some other executable code.
*
LD16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream
	INS		; drop the return address we don't need
	INS
	JMP	2,X	; return to the byte after the constant.
*
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	TSX
	LDAB	5,X	; left low
	LDAA	4,X	; left high
	ADDB	3,X	; right low
	ADCA	2,X	; right high, with carry
ADD16S	STAB	5,X	; sum low
	STAA	4,X	; sum high
	LDX	0,X	; before we deallocate it
	INS		; drop return address
	INS
	INS		; drop right-hand addend
	INS
	JMP	0,X	; return
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	TSX
	LDAB	5,X	; left low
	LDAA	4,X	; left high
	SUBB	3,X	; right low
	SBCA	2,X	; right high, with borrow
	BRA	ADD16S	; Steal code.
* Could steal code this way in the parameter stack example, as well.
*
*
START	JSR	INISTKS
*
	JSR	LD16I
	FDB	$1234	; (FDB seems to want a comment.)
	PSHB		; push in correct order
	PSHA
	JSR	LD16I
	FDB	$CDEF	; (FDB seems to want a comment.)
	PSHB
	PSHA
	JSR	ADD16	; result should be $E023
	JSR	LD16I
	FDB	$8765	; (FDB seems to want a comment.)
	PSHB
	PSHA
	JSR	SUB16	; result should be $58BE
	PULA
	PULB
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

On casual inspection and quick step-through, it looks simpler. It definitely runs faster. 

Being able to use the processor's native PSHA/PSHB and PULA/PULB instructions instead of the PPSH16/PPOP16 routines definitely seems to be a plus.

But deeper inspection reveals some tricky games dodging the return address, games that, if you get them wrong, crash the program in amusing ways just when you really didn't want to be amused.

I know you don't believe me, but hold on to your doubts for a moment.
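
While you hold on to them: the comment in the listing above says SUB16 could steal ADD16's tail in the parameter stack example as well. A sketch of how that might look, untested, borrowing the ADD16S label from the single-stack listing:

ADD16	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	ADDB	1,X	; right low
	ADCA	0,X	; right high, with carry
ADD16S	STAB	3,X	; result low
	STAA	2,X	; result high
	INX		; drop the right-hand operand
	INX
	STX	PSP
	RTS
*
SUB16	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	SUBB	1,X	; right low
	SBCA	0,X	; right high, with borrow
	BRA	ADD16S	; share the store-and-drop tail

The two-byte BRA replaces the whole tail in SUB16, at the cost of the few cycles the branch takes.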

For further reference, here's a comparable set of routines and test code that uses a scratch area in the direct page (DP) to pass values in and out. You could call this using direct page static globals as pseudo-registers:

* 16-bit addition and subtraction for 6800 via scratch pad,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
SSAVE	RMB	2	; a place to keep S so we can return clean
* parameter/scratch area for leaf functions only:
NLFT	RMB	2	; binary operator left side parameter
NRT	RMB	2	; binary operator right side parameter
NRES	RMB	2	; unary/binary operator result
NTEMP	RMB	2	; general scratch register
NPAR	EQU	NLFT	; unary operator parameter
NSCRAT	EQU	NLFT	; 
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	JMP	START
	NOP		; bump to aligned
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; roughly 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
*
*
INISTKS	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
* Don't need PPOP and PPSH
*
* Load a constant from the instruction stream into A:B, 
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream 
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
*	JSR	LD16I	; load D immediate
*	FDB	$1234	; "immediate" 16-bit value to load
*	JSR	SOMEWHERE ; or some other executable code.
*
LD16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream
	INS		; drop the return address we don't need
	INS
	JMP	2,X	; return to the byte after the constant.
*
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	LDAB	NLFT+1	; low
	LDAA	NLFT	; high
	ADDB	NRT+1	; low
	ADCA	NRT	; high, with carry
ADD16S	STAB	NRES+1	; sum low
	STAA	NRES	; sum high
	RTS
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	LDAB	NLFT+1	; low
	LDAA	NLFT	; high
	SUBB	NRT+1	; low
	SBCA	NRT	; high, with borrow
	BRA	ADD16S	; Steal code (5 bytes for 2)
* Could steal code this way in the parameter stack example, as well.
*
*
START	JSR	INISTKS
*
	JSR	LD16I
	FDB	$1234	; (FDB seems to want a comment.)
	STAB	NLFT+1
	STAA	NLFT
	JSR	LD16I
	FDB	$CDEF	; (FDB seems to want a comment.)
	STAB	NRT+1
	STAA	NRT
	JSR	ADD16	; result should be $E023
	LDAB	NRES+1
	LDAA	NRES
	STAB	NLFT+1
	STAA	NLFT
	JSR	LD16I
	FDB	$8765	; (FDB seems to want a comment.)
	STAB	NRT+1
	STAA	NRT
	JSR	SUB16	; result should be $58BE
	LDAB	NRES+1
	LDAA	NRES
	NOP
	NOP
* Repeat, without all the pushing and popping and jumping around:
	LDAB	#$34
	LDAA	#$12
	ADDB	#$EF
	ADCA	#$CD
	SUBB	#$65
	SBCA	#$87
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

This probably looks even simpler. 

With a small test program like this, it's not easy to see how complicated this becomes, or how quickly, but it will become plain shortly (for some definition of shortly and some definition of plain).

And you should be looking at those six lines of code right before the DONE label and scratching your head. 

All of that? Just to write the equivalent of the following?

	LDAB	#$34
	LDAA	#$12
	ADDB	#$EF
	ADCA	#$CD
	SUBB	#$65
	SBCA	#$87

That is essentially what the test frame does. But, of course, we didn't write all of that just to write the test frame. We wrote it to allow us to do things well beyond what the test frame does.

And, from the cynical point of view, to waste some of the application's run-time cycles to reduce the design-time burden.

But, no, not just that. There are things you cannot reduce to constants at design- or compile-time.
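
For instance, a value that some other code keeps updating only exists at run time, so it has to come from memory, not from the instruction stream. A minimal sketch, assuming two hypothetical variables COUNT and STEP in RAM and a made-up routine name BUMP (none of which appear in the listings above), using the split-stack routines from the first test frame:

COUNT	RMB	2	; hypothetical 16-bit variable, high byte first
STEP	RMB	2	; hypothetical 16-bit variable, high byte first
*
BUMP	LDAA	COUNT	; high byte
	LDAB	COUNT+1	; low byte
	JSR	PPSH16	; left operand
	LDAA	STEP
	LDAB	STEP+1
	JSR	PPSH16	; right operand
	JSR	ADD16	; sum replaces both on the parameter stack
	JSR	PPOP16	; sum into A:B
	STAA	COUNT
	STAB	COUNT+1
	RTS

Nothing in BUMP can be folded into immediate operands, because neither COUNT nor STEP is known when the code is assembled.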

Things will become a bit clearer after we get a look at this for the 6801, 6809, and 68000, after we get a look at reading keys from the keyboard, and maybe a bit of other prep so we can take up a simple project to prove that we can make something directly useful from assembly language.

In the meantime, let's see how this looks on the 6801.

