On the Beach with Parameters -- 16-bit Arithmetic on the 6809
And now we've worked through three different ways to pass parameters at run-time on the 6801.
So what does the 6809 do for us?
The declarations from the 6800/6801 code we borrowed from the improved Hello World examples change in small ways, as does the initialization code.
PSP is now the U register, so we don't need a variable for it. We could actually get rid of everything in the DP, since SSAVE really doesn't need to be in the DP, but we'll keep it this way to be consistent.
The JMP at NOENTRY can be exchanged for a long branch, and I like that better. It allows us to make the code from NOENTRY up relocatable without load-time patching. So I'm going ahead and doing it.
The return stack is now pre-decrement push, so the declarations for it change from the 6800/6801 code.
The initialization code really doesn't change, even though I am now using Load Effective Address instructions in PC-relative mode, which keeps the initialization code relocatable without patch-up.
Push and pop on both the U stack (which we are using for parameters) and the S
stack (the return address stack) are part of the native instruction set and
fully encode in two bytes, so using PPUSH and PPOP routines would actually be
de-optimizing in terms of both code size and cycle counts. We do want to note
that load and store instructions (LDD/STD) affect the flags, whereas the push
and pop instructions (PSHU/S and PULU/S) do not.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
SSAVE RMB 2 ; a place to keep S so we can return clean
*
*
ORG $2000 ; MDOS says this is a good place for usr stuff
NOENTRY LBRA START
RMB 2 ; a little bumper space
SSTKLIM RMB 32 ; 16 levels of call, max
* ; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS RMB 2 ; a little bumper space
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKBAS RMB 2 ; bumper space -- parameter stack is pre-dec
*
*
INISTKS LEAU PSTKBAS,PCR ; Set up the parameter stack
PULS X ; get return address
STS SSAVE ; Save what the monitor gave us.
LEAS SSTKBAS,PCR ; Move to our own stack
JMP ,X ; return via X
*
* PPOP and PPUSH are completely unnecessary,
* but if we had to have them, here's one way to do it:
*PPOP16 LDD ,U++
* RTS
*
*PPSH16 STD ,--U
* RTS
*
* Or, of course,
*PPOP16 PULU A,B
* RTS
*
*PPSH16 PSHU A,B
* RTS
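To make that flag difference concrete, here is a small sketch that is not part of the test frame. The labels ISZERO and NOTLESS are hypothetical, and the fragment assumes U has already been set up by INISTKS with at least two parameters on the stack.

```asm
* Pop with a load: N and Z describe the value just popped.
 LDD ,U++ ; pop the top parameter into D
 BEQ ISZERO ; taken if the popped value was zero
* Pop with a pull: flags set earlier survive the pop.
 CMPX #$0100 ; set the flags from something else first
 PULU A,B ; pop into D, CC is left untouched
 BHS NOTLESS ; still answers the CMPX, not the popped value
ISZERO NOP ; hypothetical branch targets
NOTLESS NOP
```

Which form to use probably depends on whether the caller wants the flags to describe the popped value or to carry an earlier result across the pop.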
Since the 6809, like the 6801, has LDD, we don't need an LD16I routine. Huzzah!
We could still build something similar if we had to:
* Don't need LD16I.
* If we needed it, it could look like this, but we don't.
*
* You could use it like this:
* LBSR LD16I ; load D immediate
* FDB $1234 ; "immediate" 16-bit value to load
* BSR SOMEWHERE ; or some other executable code.
*
* LD16I PULS X ; point to instruction stream
* LDD ,X ; from instruction stream
* JMP 2,X ; return to the byte after the constant.
*
* But use
* LDD #$1234 ; 16 bits!
* instead.
*
* And if we need to index ROMmed tables or such,
* we have something much better for that, too:
*
* TABLE FCB SOMETHING
* ...
* LEAX TABLE,PCR
When we need to load addresses to work on them, we can now use the LEA instructions instead of loading the address as an immediate into D.
Cool stuff, huh?
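As a sketch of where that leads, here is a position-independent table lookup. The table contents and the index are made up for illustration, and the index is assumed to stay under 128, since the B offset is treated as signed.

```asm
TABLE FCB $10,$20,$30,$40 ; hypothetical four-entry byte table
* ... other code or data ...
 LEAX TABLE,PCR ; X points at TABLE wherever the code was loaded
 LDB #2 ; index of the entry we want
 LDA B,X ; A gets $30, the entry at TABLE+2
```

Nothing in that fragment holds an absolute address, so it should keep working if the whole block is moved.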
And, if we refer back to Wozniak's Sweet 16 virtual machine, we find that the 6809 instruction set and addressing modes basically implement everything that Sweet 16 gave the 6502 (and more), as native, full speed instructions, with compact encodings.
Is that exciting? Or does it get boring?
Boring can be good, sometimes.
Well, one caveat. Motorola did not include DP-relative in the index mode post-byte, so indirecting through direct-page pointers requires loading the pointer into an index register. And getting the effective address for variables in the direct page requires just a little computation:
* Indirecting through DP variables --
* instead of
* LDD [<DP_PTR]
* use an intermediate index register
LDX <DP_PTR
LDD ,X
*
* Loading effective address of DP variables --
* instead of
* LEAX <DP_VAR
* calculate it something like
TFR DP,A
LDB #DP_VAR-DP_BASE
TFR D,X
Bummer! Right?
Okay, the world is not our perfect oyster yet. We're not taking a huge hit. We
can deal with it.
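One partial escape, sketched here under the assumption that the direct page stays where the assembler was told it is (page zero in these listings), so that the absolute addresses of DP_PTR and DP_VAR are known at assembly time. In that case the 6809's extended indirect mode and a plain immediate load will do the job. DP_PTR and DP_VAR are the same placeholder names used above.

```asm
* Assumes DP is fixed at assembly time, so DP_PTR and DP_VAR
* have known absolute addresses.
 LDD [DP_PTR] ; extended indirect: fetch through the pointer
 LDX #DP_VAR ; absolute address of the variable, no arithmetic
```

This gives up moving the direct page at run time, which is much of the point of the DP register, so it is a trade-off rather than a fix.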
How do the addition and subtraction subroutines fare?
Oh, wow!
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit sum
ADD16 LDD 2,U ; left
ADDD ,U++ ; right
STD ,U ; sum (N, Z, & C flags should be correct)
RTS
* Flags: Specifically,
* N and Z get set correctly by the final store double;
* C should make it through the auto-increment of U and the store of D.
* V gets cleared.
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit difference
SUB16 LDD 2,U ; left
SUBD ,U++ ; right
STD ,U ; difference (N, Z, & C flags should be correct)
RTS
* Flags: Specifically,
* N and Z get set correctly by the final store double;
* C should make it through the auto-increment of U and the store of D.
* V gets cleared.
Stack maintenance basically disappears into the meat of the function. In fact,
we look at that and wonder if we really need to call those routines any more.
No more than six bytes to in-line them, as compared to three bytes to call
them.
Sometimes we won't bother calling them.
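For comparison, a sketch of the two choices side by side, assuming left and right are already on the U stack as in ADD16 above:

```asm
* ADD16 in-line:
 LDD 2,U ; left
 ADDD ,U++ ; add right and drop it
 STD ,U ; sum replaces left (six bytes in all)
* versus calling it:
 LBSR ADD16 ; three bytes, plus the call and return overhead
```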
AND THERE'S MORE in those comments! Again, even without the TFR CC,A which the 6809 replaces TPA with, and without any bit twiddling or even much care about code ordering, the Zero, Negative, and Carry flags are right there for the caller to use. oVerflow still gets cleared. If we need it, we'll probably just use the instructions in-line.
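A sketch of what using those flags might look like on the caller's side. RTS leaves the condition codes alone, so the branches see what ADDD and STD left behind. The labels CARRIED, SUMZERO, and SUMNEG are hypothetical.

```asm
 LBSR ADD16 ; sum is on the U stack, flags from ADDD and STD
 BCS CARRIED ; unsigned carry out of bit 15
 BEQ SUMZERO ; sum was exactly zero
 BMI SUMNEG ; bit 15 of the sum is set
CARRIED NOP ; hypothetical branch targets
SUMZERO NOP
SUMNEG NOP
```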
Okay, putting the test frame for the 6809 together, with comments on what went away:
* 16-bit addition and subtraction for 6809 on parameter stack,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
SSAVE RMB 2 ; a place to keep S so we can return clean
*
*
ORG $2000 ; MDOS says this is a good place for usr stuff
NOENTRY LBRA START
RMB 2 ; a little bumper space
SSTKLIM RMB 32 ; 16 levels of call, max
* ; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS RMB 2 ; a little bumper space
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKBAS RMB 2 ; bumper space -- parameter stack is pre-dec
*
*
INISTKS LEAU PSTKBAS,PCR ; Set up the parameter stack
PULS X ; get return address
STS SSAVE ; Save what the monitor gave us.
LEAS SSTKBAS,PCR ; Move to our own stack
JMP ,X ; return via X
*
* PPOP and PPUSH are completely unnecessary,
* but if we had to have them, here's one way to do it:
*PPOP16 LDD ,U++
* RTS
*
*PPSH16 STD ,--U
* RTS
*
* Or, of course,
*PPOP16 PULU A,B
* RTS
*
*PPSH16 PSHU A,B
* RTS
*
*
* Don't need LD16I.
* If we needed it, it could look like this, but we don't.
*
* You could use it like this:
* LBSR LD16I ; load D immediate
* FDB $1234 ; "immediate" 16-bit value to load
* BSR SOMEWHERE ; or some other executable code.
*
* LD16I PULS X ; point to the instruction stream
* LDD ,X ; from instruction stream
* JMP 2,X ; return to the byte after the constant.
*
* But use
* LDD #$1234 ; 16 bits!
* instead.
*
* And if we need to index ROMmed tables or such,
* we have something much better for that, too:
*
* TABLE FCB SOMETHING
* ...
* LEAX TABLE,PCR
*
*
* We often will not need these, but we'll go ahead and define them:
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit sum
ADD16 LDD 2,U ; left
ADDD ,U++ ; right
STD ,U ; sum (N, Z, & C flags should be correct)
RTS
* Flags: Specifically,
* N and Z get set correctly by the final store double;
* C should make it through the auto-increment of U and the store of D.
* V gets cleared.
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit difference
SUB16 LDD 2,U ; left
SUBD ,U++ ; right
STD ,U ; difference (N, Z, & C flags should be correct)
RTS
* Flags: Specifically,
* N and Z get set correctly by the final store double;
* C should make it through the auto-increment of U and the store of D.
* V gets cleared.
*
*
* Let's use what we have:
START LBSR INISTKS
*
LDD #$1234
PSHU A,B
LDD #$CDEF
PSHU A,B
LBSR ADD16 ; result should be $E023
LDD #$8765
PSHU A,B
LBSR SUB16 ; result should be $58BE
LDD ,U++ ; load the result into A:B
*
DONE LDS SSAVE,PCR ; restore the monitor stack pointer
NOP
NOP ; landing pad
You know the drill. Step through it, try other constants. Convince yourself that you'd rather use the 6809 than even the 6801, when you're trying to get work done.
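One set of constants that might be worth trying, as a sketch: subtract the larger value from the smaller one and watch the borrow show up in the carry flag. These lines would go between the LBSR INISTKS and DONE in the listing above.

```asm
 LDD #$1234
 PSHU A,B ; left
 LDD #$8765
 PSHU A,B ; right
 LBSR SUB16 ; $1234 - $8765: result should be $8ACF
* ; with C set (borrow), N set, and Z clear
 LDD ,U++ ; load the result into A:B
```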
(Why didn't Motorola release the 6809 as an SOC core like it did the 6801?
Grumble, grumble, grumble.)
And now we're going to see some revelations about the single interleaved stack
discipline I keep disparaging:
* 16-bit addition and subtraction for 6809 on return stack,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
SSAVE RMB 2 ; a place to keep S so we can return clean
*
*
ORG $2000 ; MDOS says this is a good place for usr stuff
NOENTRY LBRA START
NOP ; bump to aligned
RMB 2 ; a little bumper space
SSTKLIM RMB 96 ; (64+32) roughly 16 levels of call, max
* ; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS RMB 2 ; a little bumper space
*
*
INISTKS PULS X ; get return address
STS SSAVE ; Save what the monitor gave us.
LEAS SSTKBAS,PCR ; Move to our own stack
JMP ,X ; return via X
*
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit sum
ADD16 PULS X ; get return address out of the way
LDD 2,S ; left
ADDD ,S++ ; right
STD ,S ; sum (N, Z, & C flags should be correct)
JMP ,X ; return
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit difference
SUB16 PULS X ; get return address out of the way
LDD 2,S ; left
SUBD ,S++ ; right
STD ,S ; difference (N, Z, & C flags should be correct)
JMP ,X ; return
*
*
START LBSR INISTKS
*
LDD #$1234
PSHS A,B
LDD #$CDEF
PSHS A,B
LBSR ADD16 ; result should be $E023
LDD #$8765
PSHS A,B
LBSR SUB16 ; result should be $58BE
LDD ,S++ ; load the result into A:B
*
DONE LDS SSAVE,PCR ; restore the monitor stack pointer
NOP
NOP ; landing pad
You're looking at me and saying,
What revelations?????? That looks almost identical to the code for the split stack!!
Well, that should be a revelation. On the 6809, the only cost for using a
separate parameter stack is the cost of declaring the stack space and
initializing it, and then we don't have to fuss with the return address in the
middle of our parameters any more.
In this example we don't really see how much we gain, but at least we can see that there's no real cost -- on a processor like the 6809.
No real cost except the allocation, and so many engineers have thought the
allocation was the biggest hurdle. It seems to be a losing battle, doesn't it?
Let's soldier on.
[EDIT JMR202410042358:]
Almost identical, indeed.
Case in point of how easy it is to mess up your code when you are dancing
around the return address to get to your parameters and local variables.
While working on the equivalent code to the above for the 68000, I realized that I had failed to de-allocate the stack before or on return from the ADD16 and SUB16 routines here. Here's what I had written:
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit sum
ADD16 LDD 4,S ; left
ADDD 2,S ; right
STD 2,S ; sum (N, Z, & C flags should be correct)
RTS
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit difference
SUB16 LDD 4,S ; left
SUBD 2,S ; right
STD 2,S ; sum (N, Z, & C flags should be correct)
RTS
I had the offsets correct, you see? No problem there. Or, I thought so. I had
successfully avoided overwriting the return address, but now the result was
out of place and in the way, and the stack had one of the input parameters
still live on it after the return. This is a good way to overflow the stack
and in various ways screw up the calculations.
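For comparison, a sketch of one way the broken ADD16 could have been patched while keeping its original offsets. This is an alternative to the corrected listing above, which pulls the return address first, and it assumes the same layout: return address at 0,S, right at 2,S, left at 4,S.

```asm
ADD16 LDD 4,S ; left
 ADDD 2,S ; right
 STD 4,S ; sum goes where left was, the deepest slot
 PULS X ; take the return address off the stack
 LEAS 2,S ; drop the right operand, flags are left alone
 JMP ,X ; return, with the sum on top of the stack
```

Either way, the routine has to step around the return address somewhere, which is the fuss the separate parameter stack avoids.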
But so many engineers think that they won't do this. Or, rather, that they can write their compilers to keep them from doing it.
And it would be nice if you would believe me for this, but I'm sure I'm going
to have to present stronger evidence than my mistakes to really convince you.
[END EDIT JMR202410042358.]
How is the scratch-area-in-DP version going to look?
* 16-bit addition and subtraction for 6809 via DP scratch pad,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
SETDP 0
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
SSAVE RMB 2 ; a place to keep S so we can return clean
* parameter/scratch area for leaf functions only:
NLFT RMB 2 ; binary operator left side parameter
NRT RMB 2 ; binary operator right side parameter
NRES RMB 2 ; unary/binary operator result
NTEMP RMB 2 ; general scratch register for
NPAR EQU NLFT ; unary operator parameter
NSCRAT EQU NLFT ;
*
*
ORG $2000 ; MDOS says this is a good place for usr stuff
NOENTRY LBRA START
NOP ; bump to aligned
RMB 2 ; a little bumper space
SSTKLIM RMB 32 ; roughly 16 levels of call, max
* ; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS RMB 2 ; a little bumper space
*
*
INISTKS PULS X ; get return address
STS SSAVE ; Save what the monitor gave us.
LEAS SSTKBAS,PCR ; Move to our own stack
JMP ,X ; return via X
*
*
* Don't need PPOP and PPSH, but wait 'til we need SCRPSH!
*
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit sum
ADD16 LDD NLFT
ADDD NRT
ADD16S STD NRES ; sum
RTS
*
* input parameters:
* 16-bit left, right
* output parameter:
* 16-bit difference
SUB16 LDD NLFT
SUBD NRT
STD NRES ; difference
RTS
* Stealing code would only save 1 byte.
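* A sketch of what stealing the tail of ADD16 would look like
* (not assembled here; ADD16S is the shared store-and-return):
*SUB16 LDD NLFT
* SUBD NRT
* BRA ADD16S ; 2 bytes, replacing STD NRES (2) and RTS (1)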
*
*
START LBSR INISTKS
*
LDD #$1234
STD NLFT
LDD #$CDEF
STD NRT
LBSR ADD16 ; result should be $E023
LDD NRES
STD NLFT
LDD #$8765
STD NRT
LBSR SUB16 ; result should be $58BE
LDD NRES
*
* Repeat, with native instructions:
LDD #$1234
ADDD #$CDEF
SUBD #$8765
*
DONE LDS SSAVE,PCR ; restore the monitor stack pointer
NOP
NOP ; landing pad
Now, if it weren't for the LBSR calls instead of the JSR calls, that would
look just like the 6801 code! (Almost.) Why do we even need any stack at
all?
Yeah! Why not just write
LDD #$1234
ADDD #$CDEF
SUBD #$8765
??
Why not just use the 6801?
Patience. We will get there.
You know, I could have shown extended mode addressing vs. direct-page mode on each of these processors. That would be four modes, which would have been maybe too many.
And the only difference between the absolute/extended mode and direct page mode for the 6800 and 6801 would have been the number of bytes for addresses for the parameter stack pointer and scratch registers.
There's another difference on the 6809, however. The DP register lets us move the direct page away from page zero. But ... really, for this example, that would not have been meaningful. We could have deliberately moved DP, but unless you were watching really closely as you stepped through, you might not have noticed.
If the concept intrigues you, give it a try. The SETDP directive will be useful.
Some assemblers expect the SETDP to be given just the high byte of the base address, but the EXORsim assembler expects the whole base address (and warns if it is not on an even 256-byte boundary).
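A minimal sketch of what moving the direct page might look like, assuming a hypothetical base of $2100. In a real program SSAVE and the scratch variables would have to be ORGed into that page, and the page would have to be kept clear of the code and stacks.

```asm
* Hypothetical: put the direct page at $2100 instead of $0000.
 LDA #$21 ; high byte of the new direct page base
 TFR A,DP ; DP now selects $2100 through $21FF
 SETDP $2100 ; EXORsim style: the whole base address
* ; (assemblers that want just the page would take SETDP $21)
 LDD $2110 ; should now assemble as a direct-page access
```

The SETDP line only tells the assembler what to assume. The TFR is what actually changes the register, so the two have to be kept in agreement by hand.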
I will show how to use DP later.
I changed my mind. I know you wanted to explore it yourself. You can, of course.
But I'm going ahead and showing you how to use DP before we move on to the 68000. There are concepts there I want to reference when I show you the 68000 code.