On the Beach with Parameters -- 16-bit Arithmetic on the 6800
So we pretty much snuck the meat of the 16-bit arithmetic in already, didn't
we? We were passing byte parameters in and widening them in the last note and
the previous two chapters, but we were doing 16-bit math.
Parameters. Oh, yeah. Those.
I wrote a couple of walls of text about parameters, and then decided I should
show you code instead, or at least first. (Yes. Again.)
Let's define some library-style functions to add and subtract on the 6800, using the split stack parameter passing paradigm I keep talking about. Then I can philosophize a wall of text and maybe not put everyone to sleep.
We'll borrow this code from the improved Hello World examples, to declare the
stack pointers and set the stacks up, and to push and pop both
accumulators:
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
PSP RMB 2 ; parameter stack pointer
SSAVE RMB 2 ; a place to keep S so we can return clean
*
ORG $2000 ; MDOS says this is a good place for usr stuff
NOENTRY JMP START
RMB 2 ; a little bumper space
SSTKLIM RMB 31 ; 16 levels of call, max
SSTKBAS RMB 1 ; 6800 is post-dec (post-store-decrement) push
RMB 2 ; a little bumper space
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKBAS RMB 2 ; bumper space -- parameter stack is pre-dec
*
*
INISTKS LDX #PSTKBAS ; Set up the parameter stack
STX PSP
TSX ; point to return address
LDX 0,X ; return address in X
INS ; drop the return pointer on stack
INS
STS SSAVE ; Save what the monitor gave us.
LDS #SSTKBAS ; Move to our own stack
JMP 0,X ; return via X
*
PPOPD LDX PSP
LDAA 0,X
LDAB 1,X
INX
INX
STX PSP
RTS
*
PPUSHD LDX PSP
DEX
DEX
STX PSP
STAA 0,X
STAB 1,X
RTS
We want to be able to load immediate values to the A:B pair. Some assemblers would allow us to load them something like this:
VALUE EQU $1234
...
LDAA #VALUE/256
LDAB #VALUE-(VALUE/256)*256 ; low byte: VALUE minus the high byte shifted back up
Some will even allow loading an address like this
BUFFER RMB 80 ; text buffer
...
LDAA #BUFFER/256
LDAB #BUFFER-(BUFFER/256)*256
But the one we are presently using in EXORsim will not do either, at least not
at this time.
Even assemblers that allow the former may not allow the latter, under the assumption that addresses should never be divided or multiplied in legitimate code. Treating addresses like integers has traditionally been considered evidence of operator error on the programmer's part, and many assemblers will complain if you do.
We could go looking for an assembler that will do what we want, but for now we want a workaround. (And some people think the following run-time "syntactic sugar" makes code more "readable", anyway.)
*
* Load a constant from the instruction stream into A:B,
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
* JSR LD16I ; load A:B immediate
* FDB $1234 ; "immediate" 16-bit value to load
* JSR SOMEWHERE ; or some other executable code.
*
LD16I TSX ; point to top of return address stack
LDX 0,X ; point into the instruction stream
LDAA 0,X ; high byte from instruction stream
LDAB 1,X ; low byte from instruction stream
INS ; drop the return address we don't need
INS
JMP 2,X ; return to the byte after the constant.
What are you looking at me like that for? Yeah, this little bit of code to enable some syntactic sugar looks really strange when the concept of a return address is still fuzzy in your mind. And it seems so unnecessary. It takes space to define, and each use (a three-byte JSR plus the two-byte constant) takes a byte more in code than the pair of two-byte immediate loads it replaces. WHY????
Well, if you study virtual machines like, for instance, the fig Forth run-time (the code for LIT), or Steve Wozniak's Sweet 16 VM that supplied 16-bit routines for some Apple II software, you recognize what it's doing. If you have a VM, it can be a way to save some bytes of object code, but what we're really trading is management time for runtime.
At a cost of a few cycles of (your) runtime, I can avoid the trouble of chasing down the bug in EXORsim, getting Joe H. Allen's attention, potentially discussing whether addresses should be allowed to have division done on them, etc., or, in the alternative, fixing it myself and forking the code like I did for the odd-ball EXORsim6801.
And you thought optimization was simple code size vs. speed. :)
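If the return address games in LD16I are still fuzzy, a trace may help. The sketch below walks one call through with made-up addresses ($2100 and friends are assumptions for illustration only, not where the assembler will actually put anything). The instructions themselves are the ones from LD16I above:

```
* Hypothetical addresses, for illustration only:
*
* $2100  JSR LD16I   ; 3 bytes; pushes the return address $2103
* $2103  FDB $1234   ; the "return address" points at data, not code
* $2105  ...         ; where execution should actually continue
*
LD16I   TSX             ; X = S+1: address of the saved return address
        LDX 0,X         ; X = $2103, pointing at the constant
        LDAA 0,X        ; A = $12
        LDAB 1,X        ; B = $34
        INS             ; S goes back up two bytes,
        INS             ; as if the JSR had never pushed anything
        JMP 2,X         ; continue at $2103+2 = $2105
```

An ordinary RTS here would pull $2103 back into the program counter and try to execute the constant as an instruction, which is why the routine leaves through JMP 2,X instead.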
Now we will define our addition and subtraction subroutines:
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit sum
ADD16 LDX PSP
LDAB 3,X ; left low
LDAA 2,X ; left high
ADDB 1,X ; right low
ADCA 0,X ; right high, with carry
STAB 3,X ; sum low
STAA 2,X ; sum high
INX ; adjust parameter stack
INX
STX PSP
RTS
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit difference
SUB16 LDX PSP
LDAB 3,X ; left low
LDAA 2,X ; left high
SUBB 1,X ; right low
SBCA 0,X ; right high, with borrow
STAB 3,X ; difference low
STAA 2,X ; difference high
INX ; adjust parameter stack
INX
STX PSP
RTS
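The offsets in ADD16 and SUB16 may be easier to follow with a picture of the parameter stack at entry. This sketch assumes the caller pushed the left operand first and the right operand second with PPUSHD, and uses $1234 and $CDEF purely as example values:

```
* After LDX PSP, on entry to ADD16 or SUB16:
*
*   0,X   $CD   right high   (pushed last, so PSP points here)
*   1,X   $EF   right low
*   2,X   $12   left high
*   3,X   $34   left low
*
* After ADD16 stores the sum and bumps PSP up by two:
*
*   0,X   $E0   sum high     (where left high used to be)
*   1,X   $23   sum low
```

The result overwrites the left operand, and dropping the right operand is just a matter of moving the pointer past it.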
If you're wondering whether the processor flags are correct after all that, only the Carry flag makes it through the stack pointer update unscathed. Moreover, if you're watching, you should notice that the Zero flag does not show whether the entire 16 bits of the result are zero, only one byte at a time, the high byte last here.
We can sort of fix the flags, something like this (untested):
SUB16F LDX PSP
LDAB 3,X ; left low
LDAA 2,X ; left high
SUBB 1,X ; right low
SBCA 0,X ; right high, with borrow
STAB 3,X ; difference low
STAA 2,X ; difference high
* In this version, we will set the flags almost as if it were SUBD:
TPA ; save the flags
ANDA #$FB ; clear the Z flag in the copy
STAA 0,X ; re-use this byte to save the copied flags
ORAB 2,X ; OR low byte with high to set the correct Z flag
TPA
ANDA #$04 ; clear all but Z
ORAA 0,X ; combine corrected Z with copied flags
PSHA ; which is worse? return stack or DP?
INX ; adjust parameter stack before restoring the flags
INX
STX PSP
PULA ; get the flags back
TAP ; replace the flags
RTS
Wow, that's a lot of code! And we would want to test it thoroughly before using it for anything important. (It should work, but ...)
Pay particular attention to the order things are done:
- We save the best copy of the flags.
- Before we update the stack pointer, we borrow some of the stack space that is no longer in use to calculate what the flags should have been.
- Then we save the corrected flags to the safest place we can think of.
- Before updating the CPU flags, we update the stack pointer, so that updating the stack pointer will not thrash the flags we just calculated.
- Then we get the flags back and restore them in the CPU.
- RTS does not affect the flags. (This is a deliberate design decision by the CPU architects.)
So you can see how it could be done -- but we usually don't need all the flags corrected. (And, in fact, we didn't clear the Half-carry flag!)
I'll show some alternate approaches later.
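In the meantime, when all a caller needs to know is whether the whole 16-bit result is zero, it could test the result where it sits on the parameter stack instead of relying on the flags that come back. This is an untested sketch, and ISZERO is a hypothetical label standing in for wherever the caller wants to go:

```
        JSR SUB16       ; difference is now on top of the parameter stack
        LDX PSP
        LDAA 0,X        ; difference high
        ORAA 1,X        ; OR in difference low: Z set only if all 16 bits are zero
        BEQ ISZERO      ; hypothetical label: taken when the difference is $0000
```

Note that this uses up A and X, and leaves the difference on the parameter stack for whatever comes next.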
Let's put this all together with some test code. You'll want to pay close
attention to what happens in the CPU when it executes each of the new
routines, but especially LD16I.
* 16-bit addition and subtraction for 6800 on parameter stack, with test code
* Joel Matthew Rees, October 2024
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
PSP RMB 2 ; parameter stack pointer
SSAVE RMB 2 ; a place to keep S so we can return clean
*
*
ORG $2000 ; MDOS says this is a good place for usr stuff
NOENTRY JMP START
RMB 2 ; a little bumper space
SSTKLIM RMB 31 ; 16 levels of call, max
SSTKBAS RMB 1 ; 6800 is post-dec (post-store-decrement) push
RMB 2 ; a little bumper space
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKBAS RMB 2 ; bumper space -- parameter stack is pre-dec
*
*
INISTKS LDX #PSTKBAS ; Set up the parameter stack
STX PSP
TSX ; point to return address
LDX 0,X ; return address in X
INS ; drop the return pointer on stack
INS
STS SSAVE ; Save what the monitor gave us.
LDS #SSTKBAS ; Move to our own stack
JMP 0,X ; return via X
*
PPOP16 LDX PSP
LDAA 0,X
LDAB 1,X
INX
INX
STX PSP
RTS
*
PPSH16 LDX PSP
DEX
DEX
STX PSP
STAA 0,X
STAB 1,X
RTS
*
* Load a constant from the instruction stream into A:B,
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
* JSR LD16I ; load A:B immediate
* FDB $1234 ; "immediate" 16-bit value to load
* JSR SOMEWHERE ; or some other executable code.
*
LD16I TSX ; point to top of return address stack
LDX 0,X ; point into the instruction stream
LDAA 0,X ; high byte from instruction stream
LDAB 1,X ; low byte from instruction stream
INS ; drop the return address we don't need
INS
JMP 2,X ; return to the byte after the constant.
*
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit sum
ADD16 LDX PSP
LDAB 3,X ; left low
LDAA 2,X ; left high
ADDB 1,X ; right low
ADCA 0,X ; right high, with carry
STAB 3,X ; sum low
STAA 2,X ; sum high
INX ; adjust parameter stack
INX
STX PSP
*
RTS
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit difference
SUB16 LDX PSP
LDAB 3,X ; left low
LDAA 2,X ; left high
SUBB 1,X ; right low
SBCA 0,X ; right high, with borrow
STAB 3,X ; difference low
STAA 2,X ; difference high
INX ; adjust parameter stack
INX
STX PSP
RTS
*
*
START JSR INISTKS
*
JSR LD16I
FDB $1234 ; (FDB seems to want a comment.)
JSR PPSH16
JSR LD16I
FDB $CDEF ; (FDB seems to want a comment.)
JSR PPSH16
JSR ADD16 ; result should be $E023
JSR LD16I
FDB $8765 ; (FDB seems to want a comment.)
JSR PPSH16
JSR SUB16 ; result should be $58BE
LDX PSP
LDAB 1,X ; load the result into A:B
LDAA 0,X
*
DONE LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad
If something doesn't work, go back and make sure you've copied everything correctly.
Once you've stepped through it, you might want to try other constants.
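Something else to try: ADD16 and SUB16 in the listing above differ only in the two arithmetic instructions, so SUB16 could branch into the tail of ADD16 instead of repeating it. A possible rearrangement, untested, with ADD16S as the label for the shared tail (the same name the later listings use):

```
ADD16   LDX PSP
        LDAB 3,X        ; left low
        LDAA 2,X        ; left high
        ADDB 1,X        ; right low
        ADCA 0,X        ; right high, with carry
ADD16S  STAB 3,X        ; result low (shared tail starts here)
        STAA 2,X        ; result high
        INX             ; drop the right-hand operand
        INX
        STX PSP
        RTS
*
SUB16   LDX PSP
        LDAB 3,X        ; left low
        LDAA 2,X        ; left high
        SUBB 1,X        ; right low
        SBCA 0,X        ; right high, with borrow
        BRA ADD16S      ; two bytes instead of repeating the tail
```

The trade is a few extra cycles for the BRA on every subtraction in exchange for the bytes saved.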
Before we move on to equivalent code for the 6801, let's compare how it would
look with an interleaved (combined) parameter and return stack -- you know,
the single stack discipline I keep disparaging. Here's a comparable test frame
for the single stack:
* 16-bit addition and subtraction for 6800 on return stack,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
SSAVE RMB 2 ; a place to keep S so we can return clean
*
*
ORG $2000 ; MDOS says this is a good place for usr stuff
NOENTRY JMP START
NOP ; bump to aligned
RMB 2 ; a little bumper space
SSTKLIM RMB 95 ; (64+31) roughly 16 levels of call, max
SSTKBAS RMB 1 ; 6800 is post-dec (post-store-decrement) push
RMB 2 ; a little bumper space
*
*
INISTKS TSX ; point to return address
LDX 0,X ; return address in X
INS ; drop the return pointer on stack
INS
STS SSAVE ; Save what the monitor gave us.
LDS #SSTKBAS ; Move to our own stack
JMP 0,X ; return via X
*
*
* Don't need PPOP and PPSH
*
* Load a constant from the instruction stream into A:B,
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
* JSR LD16I ; load A:B immediate
* FDB $1234 ; "immediate" 16-bit value to load
* JSR SOMEWHERE ; or some other executable code.
*
LD16I TSX ; point to top of return address stack
LDX 0,X ; point into the instruction stream
LDAA 0,X ; high byte from instruction stream
LDAB 1,X ; low byte from instruction stream
INS ; drop the return address we don't need
INS
JMP 2,X ; return to the byte after the constant.
*
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit sum
ADD16 TSX
LDAB 5,X ; left low
LDAA 4,X ; left high
ADDB 3,X ; right low
ADCA 2,X ; right high, with carry
ADD16S STAB 5,X ; sum low
STAA 4,X ; sum high
LDX 0,X ; before we deallocate it
INS ; drop return address
INS
INS ; drop right-hand addend
INS
JMP 0,X ; return
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit difference
SUB16 TSX
LDAB 5,X ; left low
LDAA 4,X ; left high
SUBB 3,X ; right low
SBCA 2,X ; right high, with borrow
BRA ADD16S ; Steal code.
* Could steal code this way in the parameter stack example, as well.
*
*
START JSR INISTKS
*
JSR LD16I
FDB $1234 ; (FDB seems to want a comment.)
PSHB ; push in correct order
PSHA
JSR LD16I
FDB $CDEF ; (FDB seems to want a comment.)
PSHB
PSHA
JSR ADD16 ; result should be $E023
JSR LD16I
FDB $8765 ; (FDB seems to want a comment.)
PSHB
PSHA
JSR SUB16 ; result should be $58BE
PULA
PULB
*
DONE LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad
On casual inspection and quick step-through, it looks simpler. It definitely runs faster.
Being able to use the processor's native PSHA/PSHB and PULA/PULB instructions instead of the PPSH16/PPOP16 routines definitely seems to be a plus.
But deeper inspection reveals some tricky games dodging the return address, games that, if you get them wrong, crash the program in amusing ways just when you really didn't want to be amused.
I know you don't believe me, but hold on to your doubts for a moment.
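To see what dodging the return address means, here is the return stack as the single-stack ADD16 sees it right after TSX. The values are just the ones from the test frame above:

```
* After TSX, on entry to the single-stack ADD16:
*
*   0,X   return address high   <- sitting on top of the parameters
*   1,X   return address low
*   2,X   $CD   right high
*   3,X   $EF   right low
*   4,X   $12   left high
*   5,X   $34   left low
*
* To return with only the sum left on the stack, the routine has to
* copy the return address into X (LDX 0,X), drop four bytes (INS x 4),
* and leave with JMP 0,X, because RTS would need the return address
* to still be on top of the stack.
```

Get one of those offsets or the count of INS instructions wrong, and the CPU returns to wherever the parameter bytes happen to point.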
For further reference, here's a comparable set of routines and test code that
uses a scratch area in the DP to pass values in and out. You could call this using direct page static globals as pseudo-registers:
* 16-bit addition and subtraction for 6800 via scratch pad,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
SSAVE RMB 2 ; a place to keep S so we can return clean
* parameter/scratch area for leaf functions only:
NLFT RMB 2 ; binary operator left side parameter
NRT RMB 2 ; binary operator right side parameter
NRES RMB 2 ; unary/binary operator result
NTEMP RMB 2 ; general scratch register
NPAR EQU NLFT ; unary operator parameter
NSCRAT EQU NLFT ;
*
*
ORG $2000 ; MDOS says this is a good place for usr stuff
NOENTRY JMP START
NOP ; bump to aligned
RMB 2 ; a little bumper space
SSTKLIM RMB 31 ; roughly 16 levels of call, max
SSTKBAS RMB 1 ; 6800 is post-dec (post-store-decrement) push
RMB 2 ; a little bumper space
*
*
INISTKS TSX ; point to return address
LDX 0,X ; return address in X
INS ; drop the return pointer on stack
INS
STS SSAVE ; Save what the monitor gave us.
LDS #SSTKBAS ; Move to our own stack
JMP 0,X ; return via X
*
*
* Don't need PPOP and PPSH
*
* Load a constant from the instruction stream into A:B,
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
* JSR LD16I ; load A:B immediate
* FDB $1234 ; "immediate" 16-bit value to load
* JSR SOMEWHERE ; or some other executable code.
*
LD16I TSX ; point to top of return address stack
LDX 0,X ; point into the instruction stream
LDAA 0,X ; high byte from instruction stream
LDAB 1,X ; low byte from instruction stream
INS ; drop the return address we don't need
INS
JMP 2,X ; return to the byte after the constant.
*
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit sum
ADD16 LDAB NLFT+1 ; low
LDAA NLFT ; high
ADDB NRT+1 ; low
ADCA NRT ; high, with carry
ADD16S STAB NRES+1 ; sum low
STAA NRES ; sum high
RTS
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit difference
SUB16 LDAB NLFT+1 ; low
LDAA NLFT ; high
SUBB NRT+1 ; low
SBCA NRT ; high, with borrow
BRA ADD16S ; Steal code (5 bytes for 2)
* Could steal code this way in the parameter stack example, as well.
*
*
START JSR INISTKS
*
JSR LD16I
FDB $1234 ; (FDB seems to want a comment.)
STAB NLFT+1
STAA NLFT
JSR LD16I
FDB $CDEF ; (FDB seems to want a comment.)
STAB NRT+1
STAA NRT
JSR ADD16 ; result should be $E023
LDAB NRES+1
LDAA NRES
STAB NLFT+1
STAA NLFT
JSR LD16I
FDB $8765 ; (FDB seems to want a comment.)
STAB NRT+1
STAA NRT
JSR SUB16 ; result should be $58BE
LDAB NRES+1
LDAA NRES
NOP
NOP
* Repeat, without all the pushing and popping and jumping around:
LDAB #$34
LDAA #$12
ADDB #$EF
ADCA #$CD
SUBB #$65
SBCA #$87
*
DONE LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad
This probably looks even simpler.
It's not easy to see how complicated this becomes, how quickly, with a small test program like this, but it will become plain shortly (for some definition of shortly and some definition of plain).
And you should be looking at those six lines of code right before the DONE label and scratching your head.
All of that? Just to write the equivalent of the following?
LDAB #$34
LDAA #$12
ADDB #$EF
ADCA #$CD
SUBB #$65
SBCA #$87
That is essentially what the test frame does. But, of course, we didn't write all of that just to write the test frame. We wrote it to allow us to do things well beyond what the test frame does.
And, from the cynical point of view, to waste some of the applications' run-time cycles to reduce the design-time burden.
But, no, not just that. There are things you cannot reduce to constants at design- or compile-time.
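For instance, when the operands are variables whose values are only known while the program runs, there is nothing for immediate operands to hold. Here is a sketch of adding one variable to another with the split-stack routines from the first test listing. COUNT and STEP are hypothetical variables, not part of the listings, and this is untested:

```
* With the other variable declarations (not in the path of execution):
COUNT   RMB 2           ; hypothetical 16-bit variable
STEP    RMB 2           ; hypothetical 16-bit variable
*
* Somewhere in the code:
        LDAA COUNT      ; high byte
        LDAB COUNT+1    ; low byte
        JSR PPSH16      ; push left operand
        LDAA STEP
        LDAB STEP+1
        JSR PPSH16      ; push right operand
        JSR ADD16       ; sum replaces both on the parameter stack
        JSR PPOP16      ; sum into A:B
        STAA COUNT      ; COUNT = COUNT + STEP
        STAB COUNT+1
```

Whatever COUNT and STEP hold when this runs is what gets added, which no assembler can work out ahead of time.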
Things will become a bit clearer after we get a look at this for the 6801, 6809, and 68000, after we get a look at reading keys from the keyboard, and maybe a bit of other prep so we can take up a simple project to prove that we can make something directly useful from assembly language.