And this one has been sitting at the bottom of the pool for a while, as predicted, even longer than the 6809 example.
Ascending the Wrong Island -- Single-stack Stack Frame Example: 6800
Now that we have seen how we can implement concrete examples of both single-stack and split-stack stack frames on the 6801, let's see if we can get a better feel for what the 6801's extensions buy us, by repeating those implementations using only the 6800's original instruction set.
The usual caveat -- I do not recommend stack frames, and I especially do not
recommend combining parameters and return addresses on a single stack. Part of
the reason we're doing this is to study addressing techniques, but the other
part is to convince ourselves that we don't want to do this.
I started by working out an implementation of PUSHX and POPX routines, since
the PSHX and PULX instructions featured so prominently in the 6801 code. Late
at night, when I had time to work on this, I typed without thinking, probably
in 6809 mode or something,
PUSHX STX XWORK
LDAA XWORK
LDAB XWORK+1
PSHB
PSHA
RTS
*
POPX PULA
PULB
STAA XWORK
STAB XWORK+1
LDX XWORK
RTS
As we know, the results of this code on the 6801 would be humorous. I laughed
at myself and went to bed.
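To make the joke concrete, here is a minimal Python model of the 6800 stack. It is only a sketch, not an emulator. The class and function names are hypothetical, and the model tracks nothing but memory and S. It uses the 6800's store-then-decrement push and JSR's low-byte-first push order.

```python
class Stack6800:
    """Hypothetical minimal model: 64K of memory plus the 6800's S register."""

    def __init__(self, s=0x00FF):
        self.mem = bytearray(0x10000)
        self.s = s  # S points at the next free byte (store, then decrement)

    def push(self, byte):          # PSHA / PSHB
        self.mem[self.s] = byte & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def pull(self):                # PULA / PULB
        self.s = (self.s + 1) & 0xFFFF
        return self.mem[self.s]

    def jsr(self, return_addr):    # JSR pushes the low byte first
        self.push(return_addr & 0xFF)
        self.push((return_addr >> 8) & 0xFF)

    def rts(self):                 # RTS pulls the high byte first
        hi = self.pull()
        lo = self.pull()
        return (hi << 8) | lo


def naive_pushx(stk, x):
    """Trace the late-night PUSHX body, from PSHB through RTS."""
    stk.push(x & 0xFF)           # PSHB (low byte of X, via XWORK+1)
    stk.push((x >> 8) & 0xFF)    # PSHA (high byte of X, via XWORK)
    return stk.rts()             # RTS takes the pushed value as the return address


stk = Stack6800()
stk.jsr(0x3010)                # the caller expects to resume at $3010
pc = naive_pushx(stk, 0x1234)  # try to push X = $1234
leftover = stk.rts()           # what is still sitting on the stack
print(hex(pc), hex(leftover))  # 0x1234 0x3010
```

The pushed value becomes the return address, and the real return address is left stranded underneath it. POPX fails the mirror-image way: its PULA/PULB grab the return address instead of the value.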
(If these were macros, or if we were doing it in-line, this would actually be exactly what we'd do -- leaving off the RTS, of course. And, of course, if we needed to do a software stack on the 6809, the push and pop routines would be even more straightforward.)
But we have to dance around the return address. So it ends up something like
this:
SPSHX STX XWORK
DES
DES
TSX
LDAA 2,X
LDAB 3,X
STAA 0,X
STAB 1,X
LDAA XWORK
LDAB XWORK+1
STAA 2,X
STAB 3,X
RTS
*
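SPULX TSX
LDX 2,X ; the value being pulled
STX XWORK
TSX ; X no longer points at the stack, so reload it
LDAA 0,X ; return address
LDAB 1,X
STAA 2,X ; move it up over the pulled value
STAB 3,X
LDX XWORK
INS
INS
RTS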
which is disgustingly long. But necessary. Because of the return address dance.
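The same kind of hypothetical Python model can trace SPSHX and SPULX step by step, which makes the return address dance easier to see. This is a sketch under the same assumptions: it models memory plus S only, and the caller of each helper performs the RTS. Once LDX 2,X has fetched the value, X no longer points into the stack. For that reason, the model reads S again before moving the return address.

```python
class Stack6800:
    """Hypothetical minimal 6800 stack model (memory plus S)."""

    def __init__(self, s=0x00FF):
        self.mem = bytearray(0x10000)
        self.s = s

    def push(self, byte):
        self.mem[self.s] = byte & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def pull(self):
        self.s = (self.s + 1) & 0xFFFF
        return self.mem[self.s]

    def jsr(self, ret):
        self.push(ret & 0xFF)
        self.push((ret >> 8) & 0xFF)

    def rts(self):
        hi = self.pull()
        return (hi << 8) | self.pull()

    def tsx(self):              # TSX: X = S + 1, the top byte in use
        return (self.s + 1) & 0xFFFF


def spshx(stk, x):
    """SPSHX after its JSR; the caller performs the final RTS."""
    m = stk.mem
    stk.s = (stk.s - 2) & 0xFFFF                     # DES / DES: make room
    ix = stk.tsx()                                   # TSX
    m[ix], m[ix + 1] = m[ix + 2], m[ix + 3]          # return address moves down
    m[ix + 2], m[ix + 3] = (x >> 8) & 0xFF, x & 0xFF  # X goes where it used to be


def spulx(stk):
    """SPULX after its JSR; returns the pulled value, the caller performs RTS."""
    m = stk.mem
    ix = stk.tsx()                             # TSX
    value = (m[ix + 2] << 8) | m[ix + 3]       # LDX 2,X (then STX XWORK)
    ix = stk.tsx()                             # X was overwritten: TSX again
    m[ix + 2], m[ix + 3] = m[ix], m[ix + 1]    # return address moves up
    stk.s = (stk.s + 2) & 0xFFFF               # INS / INS
    return value


stk = Stack6800()
stk.jsr(0x3000)          # JSR SPSHX, return address $3000
spshx(stk, 0x1234)
back1 = stk.rts()        # back at $3000 with $1234 on top of the stack
stk.jsr(0x3005)          # JSR SPULX, return address $3005
pulled = spulx(stk)
back2 = stk.rts()
print(hex(back1), hex(pulled), hex(back2))  # 0x3000 0x1234 0x3005
```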
With that written, I at least was confident (the next time I could work on it)
that the same stack frames we used on the 6801 would be workable. (If you
don't have the 6801 code open in another browser window for reference, go
ahead and open it up; you'll want it handy to compare.) And since the stack
frame would be the same, I could just convert the link and unlink routines
from the 6801 code, which looked like this:
LINKF DES
DES
DES
DES
TSX
LDD 4,X
STD 0,X
LDD VBP
STD 4,X
LDD FP
STD 2,X
INX
INX
STX FP
STX VBP
RTS
*
* No return value on stack
UNLKF TSX
LDD 2,X ; get old FP, dodge return address
STD FP
LDD 4,X ; old VBP
STD VBP
LDD 0,X ; return address
STD 4,X ; copy it so we can return
INS ; drop 4 bytes
INS
INS
INS
RTS
It was a little bit trickier, since the 6800 has no LDD and STD (or PSHX and PULX), so every 16-bit move goes through A and B a byte at a time, but it wasn't too bad.
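For comparison, here is a hedged Python model of the 6800 LINKF and UNLKF from the full listing. FP and VBP are plain attributes standing in for the direct-page pseudo-registers. The names Machine6800, linkf and unlkf are illustrative only, and the starting addresses are arbitrary. The model shows that the frame link points back at the caller's frame, and that the unlink restores the caller's FP and VBP even with temporaries on the stack.

```python
class Machine6800:
    """Hypothetical model: memory, S, and the FP/VBP pseudo-registers."""

    def __init__(self, s, fp, vbp):
        self.mem = bytearray(0x10000)
        self.s, self.fp, self.vbp = s, fp, vbp

    def push(self, byte):
        self.mem[self.s] = byte & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def pull(self):
        self.s = (self.s + 1) & 0xFFFF
        return self.mem[self.s]

    def jsr(self, ret):
        self.push(ret & 0xFF)
        self.push((ret >> 8) & 0xFF)

    def rts(self):
        hi = self.pull()
        return (hi << 8) | self.pull()

    def rd16(self, addr):
        return (self.mem[addr] << 8) | self.mem[addr + 1]

    def wr16(self, addr, value):
        self.mem[addr], self.mem[addr + 1] = (value >> 8) & 0xFF, value & 0xFF


def linkf(m):
    """6800 LINKF body; the caller performs its RTS."""
    m.s -= 4                            # DES x4
    x = m.s + 1                         # TSX
    m.wr16(x, m.rd16(x + 4))            # move return address down
    m.wr16(x + 4, m.vbp)                # save caller's VBP
    m.wr16(x + 2, m.fp)                 # save caller's FP (the frame link)
    m.fp = m.vbp = x + 2                # INX / INX, STX FP, STX VBP


def unlkf(m):
    """6800 UNLKF body; the caller performs its RTS."""
    x = m.fp                            # LDX FP
    m.vbp = m.rd16(x + 2)               # restore old VBP
    hi = m.pull()                       # PULA / PULB: UNLKF's own return address
    lo = m.pull()
    m.wr16(x + 2, (hi << 8) | lo)       # park it in the old VBP slot
    m.s = x - 1                         # TXS drops temporaries and locals
    m.fp = m.rd16(x)                    # LDX 0,X / STX FP
    m.s += 2                            # INS / INS drops the frame link


m = Machine6800(s=0x20C7, fp=0x20CA, vbp=0x20D0)
m.jsr(0x3100)                  # caller: JSR to some routine
m.jsr(0x3103)                  # routine: JSR LINKF
linkf(m)
back_from_link = m.rts()
frame = m.fp                   # FP points at the saved frame link
link_contents = (m.rd16(frame), m.rd16(frame + 2), m.rd16(frame + 4))
m.push(0xAA)                   # a couple of temporaries
m.push(0xBB)
m.jsr(0x3120)                  # JSR UNLKF
unlkf(m)
back_from_unlink = m.rts()
final_ret = m.rts()            # the routine's own RTS
print(hex(frame), hex(m.fp), hex(m.vbp), hex(final_ret), hex(m.s))
# 0x20c2 0x20ca 0x20d0 0x3100 0x20c7
```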
And then I proceeded to work on converting the addition routines. (And in the process realized I had misnamed SUB16whatever, but I've taken care of that now.)
And I discovered that moving the return value back into the X and A:B registers at the end of the routines was waste motion.
You have to use the accumulators and the index to simply get in and out of
your subroutines' stack frames, so you're just thrashing the stack at
procedure entrance and exit.
So I added two variables in the direct page for the return values, RETVHI and RETVLO.
And I paused to reflect for a moment on whether they would have also been useful
in the 6801 code. I think it would be a wash, really, because using these
direct page variables for the return values means that the caller has to load
them, and the 6801 does have PSHX and PULX.
Lots of stuff like that came up during the conversion.
Another issue that came up was that the EXORsim interactive assembler has a bug that makes the FDB (form double byte constant) directive unusable.
I brought in the PSH16I routine to load 16-bit literals out of the instruction stream:
PSH16I TSX ; point to top of return address stack
LDX 0,X ; point into the instruction stream
LDAA 0,X ; high byte from instruction stream
LDAB 1,X ; low byte from instruction stream
INS ; drop the return address we almost have in X
INS
PSHB ; replace it with the constant
PSHA
JMP 2,X ; return to the byte after the constant.
But you have to follow that with the two bytes you want to push onto the stack, and that's an FDB in the case of addresses and 16-bit offsets -- and EXORsim's interactive assembler doesn't help us split addresses up, so there were several places where I loaded the 16-bit address or offset into the index register and called SPSHX, or did similar things.
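As a sketch of how PSH16I walks the instruction stream, here is a Python model with hypothetical names. It assumes a three-byte JSR at $3000, so the return address $3003 points at the inline constant and execution resumes at $3005.

```python
class Stack6800:
    """Hypothetical minimal 6800 stack model (memory plus S)."""

    def __init__(self, s=0x00FF):
        self.mem = bytearray(0x10000)
        self.s = s

    def push(self, byte):
        self.mem[self.s] = byte & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def jsr(self, ret):
        self.push(ret & 0xFF)
        self.push((ret >> 8) & 0xFF)

    def tsx(self):
        return (self.s + 1) & 0xFFFF

    def rd16(self, addr):
        return (self.mem[addr] << 8) | self.mem[addr + 1]


def psh16i(stk):
    """PSH16I after its JSR; returns the address JMP 2,X continues at."""
    x = stk.rd16(stk.tsx())              # TSX / LDX 0,X: address of the constant
    a, b = stk.mem[x], stk.mem[x + 1]    # LDAA 0,X / LDAB 1,X
    stk.s = (stk.s + 2) & 0xFFFF         # INS / INS: drop the return address
    stk.push(b)                          # PSHB
    stk.push(a)                          # PSHA: constant on top, high byte first
    return (x + 2) & 0xFFFF              # JMP 2,X: skip the two constant bytes


stk = Stack6800()
stk.mem[0x3003:0x3005] = bytes([0x12, 0x34])  # FCB $12 / FCB $34 after the JSR
stk.jsr(0x3003)                               # JSR PSH16I at $3000
next_pc = psh16i(stk)
top = stk.rd16(stk.tsx())
print(hex(next_pc), hex(top))  # 0x3005 0x1234
```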
And the stack (and runtime) initialization (INISTK is a misnomer, isn't it?) needs these routines, so I moved things around in there so that I'd have the stack ready early. Some of the math had to be done by hand in the process of setting the stack up, and I ended up doing some things by hand anyway.
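Much of that by-hand math amounts to summing the RMB sizes into the ...X offset EQUs, since the EXORsim assembler won't take the symbols. Here is a short Python check of that arithmetic. It assumes the RMB sizes and the $2000 and $3000 ORGs from the listing. LAYOUT and offsets are illustrative names.

```python
# RMB sizes in the ORG $2000 process-local block, in listing order.
LAYOUT = [("LB_ADDR", 4), ("FINAL", 4), ("STKLIM", 192),
          ("STKBAS", 8), ("STKFAK", 2), ("STKBMP", 4), ("HBASE", 1)]
LB_ADDR = 0x2000
CDBASE = 0x3000


def offsets(layout):
    """Offset of each label from the start of the block."""
    result, offset = {}, 0
    for name, size in layout:
        result[name] = offset
        offset += size
    return result


off = offsets(LAYOUT)
# The hand-computed ...X EQUs from the listing.
hand_equs = {"FINAL": 4, "STKLIM": 8, "STKBAS": 200,
             "STKFAK": 208, "STKBMP": 210, "HBASE": 214}
mismatches = {k: (off[k], v) for k, v in hand_equs.items() if off[k] != v}

stack_x = LB_ADDR + off["STKBAS"]    # what INISTK loads into X
stack_s = stack_x - 1                # TXS leaves S one byte below X
heap_base = LB_ADDR + off["HBASE"]   # what UADD16 computes for HPPTR/HPALL
heap_limit = CDBASE - 4              # HPLIM
print(mismatches, hex(stack_x), hex(stack_s), hex(heap_base), hex(heap_limit))
# {} 0x20c8 0x20c7 0x20d6 0x2ffc
```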
But you'll see a premonition of why this is all so meaningless in a routine
that is just for the initialization code, UADD16.
* Utility 16-bit add, leave result in A:B
UADD16 TSX ; no frame
LDAB 5,X ; left
ADDB 3,X ; right ; because we can
LDAA 4,X ; left
ADCA 2,X ; right
LDX 0,X
UADROP INS ; drop return address and parameters
INS
INS
INS
INS
INS
JMP 0,X ; return via X
I didn't end up using USUB16, but I left it in for your enjoyment.
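UADD16 and USUB16 build a 16-bit result out of 8-bit halves. They handle the low bytes first with ADDB or SUBB, then the high bytes with ADCA or SBCA. This relies on LDAA leaving the carry flag alone in between. A Python sketch of that carry chain, using hypothetical function names, shows that it matches ordinary 16-bit arithmetic.

```python
def add16_bytes(left, right):
    """UADD16's order: ADDB low bytes, then ADCA high bytes plus carry."""
    low = (left & 0xFF) + (right & 0xFF)           # ADDB 3,X
    carry = low >> 8                               # C flag
    high = (left >> 8) + (right >> 8) + carry      # LDAA leaves C alone; ADCA 2,X
    return ((high & 0xFF) << 8) | (low & 0xFF), high >> 8


def sub16_bytes(left, right):
    """USUB16's order: SUBB low bytes, then SBCA high bytes minus borrow."""
    low = (left & 0xFF) - (right & 0xFF)           # SUBB 3,X
    borrow = 1 if low < 0 else 0                   # C flag acts as borrow
    high = (left >> 8) - (right >> 8) - borrow     # SBCA 2,X
    return ((high & 0xFF) << 8) | (low & 0xFF), 1 if high < 0 else 0


print([hex(v) for v, _ in (add16_bytes(0x1234, 0xCDEF), sub16_bytes(0x3000, 4))])
# ['0xe023', '0x2ffc']
```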
Anyway, it was by no means as straightforward as I had hoped (of course). I
ended up trying a number of things that didn't help, like defining PUSHD and
POPD routines, and an SBX routine.
Admittedly, some of the complexities could be avoided by simply restricting the stack from crossing a 256-byte boundary, and leaving LOUD notes in the comments about ALWAYS keeping its size and location such that it doesn't. You'll note that here the size and location actually would allow us to optimize out the carries for the stack pointer.
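To check that claim, here is a small Python sketch. It assumes the stack region runs from STKLIM at $2008 through the last byte of STKFAK at $20D1, per the layout in the listing. It compares a full 16-bit add against a hypothetical add that only ever updates the low byte.

```python
STACK_LOW = 0x2008    # STKLIM, lowest stack byte
STACK_HIGH = 0x20D1   # last byte of STKFAK, highest byte INISTK writes


def same_page(low, high):
    """True if every address from low to high shares one high byte."""
    return (low >> 8) == (high >> 8)


def add_low_byte_only(addr, n):
    """Add n to addr touching only the low byte (carry into the high byte dropped)."""
    return (addr & 0xFF00) | ((addr + n) & 0xFF)


# Inside the region, dropping the carry never changes the answer ...
safe = all(add_low_byte_only(a, n) == a + n
           for a in range(STACK_LOW, STACK_HIGH + 1)
           for n in range(STACK_HIGH - a + 1))
# ... but it breaks as soon as an address computation crosses a page.
print(same_page(STACK_LOW, STACK_HIGH), safe, hex(add_low_byte_only(0x20FF, 1)))
# True True 0x2000
```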
But, although shouting with capitals can be done in plain text, colored text cannot, so I don't think it's a wise example ...
No, that's not it. It just doesn't really solve enough of the problems to make stack frames a reasonable option. Try it yourself if you are not convinced.
As always, read the code and the comments. Don't assume I remembered to fix the comments after every edit. If the comments don't seem to match the code, they may not, or the code may be doing things you don't understand. Take time to work through it and be sure.
* 16-bit addition as example of single-stack stack frame discipline on 6800,
* with test code
* Joel Matthew Rees, October, November 2024
*
OPT 6800
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS says this is a good place for user stuff.
*
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
NOP ; bumper
NOP ; 6 bytes to this point.
SSAVE RMB 2 ; a place to keep S so we can return clean
RMB 4 ; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK RMB 2 ; For saving an index register temporarily in leaf functions only
DWORK RMB 2 ; For saving D temporarily in leaf functions only
RETVHI RMB 2 ; high half of 32-bit return values (because we can't push X easily)
RETVLO RMB 2 ; 16-bit return values and low half (because loading and saving is redundant)
FP RMB 2 ; frame pointer
VBP RMB 2 ; variable base pointer
LB_BASE RMB 2 ; For process local variables
HPPTR RMB 2 ; heap pointer (not yet managed)
HPALL RMB 2 ; heap allocation pointer
HPLIM RMB 2 ; heap limit
* End of pseudo-registers
RMB 4 ; bumper
GAP1 RMB 2 ; Mark the bottom of the gap
*
*
*
ORG $2000 ; Give the DP room.
LB_ADDR RMB 4 ; a little bumper space
FINAL RMB 4 ; 32-bit Final result in process local space, reached through LB_BASE
FINALX EQU 4
STKLIM RMB 192 ; roughly 16 to 20 levels of call
STKLIMX EQU FINALX+4
STKBAS RMB 8 ; for canary return
STKSZ EQU 192 ; for EXORsim assembler limits
STKBASX EQU STKLIMX+192 ; must be STKLIMX+STKSZ -- assembler won't take symbol
STKFAK RMB 2 ; fake frame pointer, self-link
STKFAKX EQU STKBASX+8 ; 6801 is post-dec (post-store-decrement) push
STKBMP RMB 4 ; a little bumper space
STKBMPX EQU STKFAKX+2 ; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE RMB 1 ; $1024 or something ; Not using or managing heap yet.
HBASEX EQU STKBMPX+4
*HLIM RMB 4 ; bumper
*HLIMX EQU HBASEX+$100 ; 1024
*
*
ORG $3000
CDBASE JMP ERROR ; more bumpers
NOP
*STKBASM FDB STKBASX ; Doesn't work within EXORsim assembler limits after all
*HBASEXM FDB HBASEX ; by avoiding splitting large constants up at assemble time
*
INISTK LDX #LB_ADDR ; set up process local space
STX LB_BASE ; local space functional
LDAA LB_BASE ; bootstrap own stack
LDAB LB_BASE+1
* ADDB STKBASM+1
* ADCA STKBASM
LDX #STKBASX ; Instead of FDB
STX XWORK
ADDB XWORK+1
ADCA XWORK
*
STAB XWORK+1 ; initial stack pointer
STAA XWORK
*
LDX #STKUNDR ; for fake return address
STX DWORK ; save it for a moment
*
PULA ; pop real return address
PULB
LDX XWORK ; ready own stack pointer
STS SSAVE ; save stack pointer from monitor ROM
TXS ; move to our own stack (let TXS convert it)
PSHB ; put return address on own stack
PSHA ; stack now ready for interrupts, utility routines
*
LDAA DWORK ; error handler for fake return
LDAB DWORK+1
STAA 0,X ; in the cell beyond empty stack pointer
STAB 1,X
STAA 6,X ; full fake frame
STAB 7,X
LDAA XWORK ; calculate final self-link
LDAB XWORK+1
ADDB #8
ADCA #0
STAA 4,X ; fake VBP
STAB 5,X
STAA 8,X ; final self-link
STAB 9,X
INX ; prepare first fake stack frame links
INX
STX FP ; get frame pointers ready
STX VBP
STX 0,X ; first self-link for list terminator
*
LDAA LB_BASE
LDAB LB_BASE+1
PSHB
PSHA
* JSR PSH16I
* FDB HBASEX ; EXORsim's interactive assembler doesn't like FDBs.
LDX #HBASEX
JSR SPSHX
*
JSR UADD16
STAA HPPTR ; as if we were ready to use heap
STAB HPPTR+1
STAA HPALL
STAB HPALL+1
* JSR PSH16I ; FDBs
* FDB CDBASE
* JSR PSH16I
* FDB (-4) ; extra bumper
* JSR UADD16
LDX #CDBASE
STX XWORK
LDAA XWORK
LDAB XWORK+1
SUBB #4
SBCA #0
*
STAA HPLIM
STAB HPLIM+1
RTS ; finally done, now can return
*
***
* Since negative index offsets are so expensive,
* we want to create a stack frame with only positive offsets.
* And we want the frame pointer to be pushed after the call,
* on entry to the local context.
* And the saved frame pointer needs to link to the previous one.
* And when we restore the previous frame,
* we need to be able to restore the previous frame base.
*
* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP} ] for calling routine
* [PARAM ] from calling routine
* [RETADR ] to calling routine
* [VARBP ] base of local variables in calling routine
* [FRMLNK ] at entry to calling routine
* [LOCVAR ] for called -- current -- routine
* [TEMP ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ]
* [VARBP1 ]
* [FRMLNK2 ] <= FRMLNK3
* [LOCVAR2 ] <= VARBP2
* [TEMP2 ]
* [PARAM3 ]
* [RETADR2 ]
* [VARBP2 ]
* [FRMLNK3 ] <= FP (frame pointer)
* [LOCVAR3 ] <= VBP (variable base pointer)
* [TEMP3 ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines
*
* Push low half of return value
PSHLH TSX
LDAA 0,X ; return address
LDAB 1,X
PSHB
PSHA
LDAA RETVLO
LDAB RETVLO+1
STAA 0,X
STAB 1,X
RTS
*
* Avoid the math to split 16-bit constants into two 8-bit loads,
* and push them while we are here.
* The constant follows the call in the instruction stream.
* Leaves constant in A:B, as well.
PSH16I TSX ; point to top of return address stack
LDX 0,X ; point into the instruction stream
LDAA 0,X ; high byte from instruction stream
LDAB 1,X ; low byte from instruction stream
INS ; drop the return address we almost have in X
INS
PSHB ; replace it with the constant
PSHA
JMP 2,X ; return to the byte after the constant.
*
* 8 bytes for the meat of this vs. 3 for the call.
* We end up using it a lot since EXORsim's interactive assembler doesn't do FDBs.
SPSHX STX XWORK
DES
DES
TSX
LDAA 2,X
LDAB 3,X
STAA 0,X
STAB 1,X
LDAA XWORK
LDAB XWORK+1
STAA 2,X
STAB 3,X
RTS
*
* 6 bytes for the meat of this vs. 3 for the call, instead of FDB
TXD STX XWORK
LDAA XWORK
LDAB XWORK+1
RTS
*
* Utility 16-bit add, leave result in A:B
UADD16 TSX ; no frame
LDAB 5,X ; left
ADDB 3,X ; right ; because we can
LDAA 4,X ; left
ADCA 2,X ; right
LDX 0,X
UADROP INS ; drop return address and parameters
INS
INS
INS
INS
INS
JMP 0,X ; return via X
*
* Utility 16-bit subtract, leave result in A:B
USUB16 TSX ; no frame
LDAB 5,X ; left
SUBB 3,X ; right ; because we can
LDAA 4,X ; left
SBCA 2,X ; right
LDX 0,X
BRA UADROP ; drop return address and parameters
*
* Let the caller do allocation after.
LINKF DES ; allocate room to push to
DES
DES
DES
TSX
LDAA 4,X ; return address
LDAB 5,X ; not sure of any reason to use or not use B
STAA 0,X ; move it down to new top of stack
STAB 1,X
LDAA VBP ; copy VBP and FP above return address
LDAB VBP+1
STAA 4,X
STAB 5,X
LDAA FP
LDAB FP+1
STAA 2,X
STAB 3,X
INX
INX
STX FP
STX VBP
RTS
*
* No return value on stack
UNLKF LDX FP
LDAA 2,X ; old VBP
LDAB 3,X
STAA VBP
STAB VBP+1
PULA ; get the return address
PULB
STAA 2,X ; put return address in place
STAB 3,X
TXS ; drop temporaries and locals
LDX 0,X ; get old FP
STX FP
INS
INS
RTS
*
* We really don't want to put S in a temp if we can avoid it
ALOCS8 PULA
PULB
ALOS8I DES
DES
ALOS6I DES
DES
ALOS4I DES
DES
ALOS2I DES
DES
PSHB
PSHA
RTS
*
ALOCS6 PULA
PULB
BRA ALOS6I
*
ALOCS4 PULA
PULB
BRA ALOS4I
*
ALOCS2 PULA
PULB
BRA ALOS2I
*
INI0_8 CLRA
CLRB
* call with initialization value in A:B
INIS8 TSX
INIT8 STAA 8,X
STAB 9,X
INIT6 STAA 6,X
STAB 7,X
INIT4 STAA 4,X
STAB 5,X
INIT2 STAA 2,X
STAB 3,X
RTS ; 0,X is return address!
*
INI0_6 CLRA
CLRB
* call with initialization value in A:B
INIS6 TSX
BRA INIT6
*
INI0_4 CLRA
CLRB
* call with initialization value in A:B
INIS4 TSX
BRA INIT4
*
INI0_2 CLRA
CLRB
* call with initialization value in A:B
INIS2 TSX
BRA INIT2
*
DROP8 PULA
PULB
INS
INS
DROP6I INS
INS
INS
INS
INS
INS
PSHB
PSHA
RTS
*
DROP6 PULA
PULB
BRA DROP6I
*
*
* Stack after LINK and allocation
* when functions are called by MAIN
* with two parameters
* We will return results in RETVHI:RETVLO in direct page
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ]
* [VARBP0 ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ]
* [VARBP1 ]
* [FRMLNK0 ] <= FP,SP,VBP
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
* 16-bit left 1st pushed, right 2nd
* output parameter:
* 17-bit sum in 32-bit RETVHI:RETVLO, RETVHI high, RETVLO low
* Does not alter the parameters.
ADD16S JSR LINKF
TSX ; no local allocations
*
LDAA #(-1) ; prepare for sign extension
TST 8,X ; the left-hand operand sign bit
BMI ADD16SR
CLRA ; zero extend
ADD16SR PSHA ; push left extension
PSHA ; left sign cell below X now
LDAA #(-1) ; reload
TST 6,X ; the right-hand operand sign bit
BMI ADD16SL
CLRA ; zero extend
ADD16SL PSHA ; push right extension
PSHA
TSX ; point to sign extensions
LDAA 12,X ; left-hand low cell
LDAB 13,X
ADDB 11,X ; right-hand low cell
ADCA 10,X
STAA RETVLO ; save low half of result
STAB RETVLO+1
LDAA 2,X ; left-hand extension
LDAB 3,X
ADCB 1,X ; right-hand extension
ADCA 0,X
STAA RETVHI ; Save high half of result
STAB RETVHI+1
*
JSR UNLKF ; drops temporaries
RTS ; result is in RETVLO:RETVHI
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
* 16-bit left, right
* output parameter:
* 17-bit sum in 32-bit RETVHI:RETVLO, RETVHI high
ADD16U JSR LINKF
TSX ; no local allocations
*
LDAA 8,X ; left
LDAB 9,X
ADDB 7,X ; right
ADCA 6,X
STAA RETVLO ; save low half
STAB RETVLO+1
LDAB #0
ADCB #0
STAB RETVHI+1 ; save carry bit in high half
CLR RETVHI ; will never carry beyond bit 17
*
JSR UNLKF ; drops temporaries
RTS ; result is in RETVLO:RETVHI
*
* Etc.
*
***
*
* Stack after LINK #0 when functions are called by MAIN
* with one input parameter
* (#0 means no local variables)
* [<SELF> ] <= <SELF>
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ]
* [VARBP0 ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [RETADR1 ]
* [VARBP1 ]
* [FRMLNK0 ] <= FP,SP,VBP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
* 16-bit addend
* target parameter in caller
* 2nd 32-bit variable at offset -2*NATWID
* no output parameter:
ADD16SI JSR LINKF
TSX ; no local variables
*
LDAA #(-1)
TST 6,X ; high byte of parameter
BMI ADD16SIP
CLRA
ADD16SIP PSHA ; save the sign extension half
PSHA
LDX 2,X ; get caller's VBP
LDAA 2,X ; caller's 2nd variable, low
LDAB 3,X
LDX FP
ADDB 7,X ; parameter
ADCA 6,X
LDX 2,X ; caller's VBP
STAA 2,X ; save result low half away
STAB 3,X
LDAA 0,X ; caller's 2nd variable, high
LDAB 1,X
TSX
ADCB 1,X ; sign extension half
ADCA 0,X
LDX FP
LDX 2,X
STAA 0,X ; save result high half away
STAB 1,X
*
JSR UNLKF ; drops temporaries
RTS ; no result to load
*
*
***
* Stack after LINK
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ]
* [VARBP0 ]
* [FRMLNKX ] <= FP
* [32:VAR1_1]
* [32:VAR1_2] <= SP,VBP
*
MAIN JSR LINKF
JSR ALOCS8 ; 2 calls, 6 bytes vs. 1 clr + 8 pushes, 9 bytes
JSR INI0_8
TSX
STX VBP ; link and allocate complete
*
JSR PSH16I
* FDB $1234 ; parameters
FCB $12
FCB $34
JSR PSH16I
* FDB $CDEF
FCB $CD
FCB $EF
JSR ADD16U ; result in RETVHI:RETVLO should be $0000E023
INS ; drop one parameter, reuse other
INS
TSX
LDAA RETVLO ; four extra bytes compared to calling PSHLH
LDAB RETVLO+1
STAA 0,X
STAB 1,X
JSR PSH16I
* FDB $8765
FCB $87
FCB $65
JSR ADD16S ; result in RETVHI:RETVLO should be $FFFF6788
INS ; drop one parameter, reuse other
INS
LDX VBP
LDAA RETVHI
LDAB RETVHI+1
STAA 0,X
STAB 1,X
LDAA RETVLO
LDAB RETVLO+1
STAA 2,X
STAB 3,X
TSX
LDAB #$A5
STAB 0,X ; $A5
STAB 1,X ; $A5A5
JSR ADD16SI ; result in 2nd variable should be $FFFF0D2D
LDX VBP ; get the result from our variable
LDAA 2,X ; low half
LDAB 3,X
LDX LB_BASE ; store it in FINAL, in process local space
STAA FINALX+2,X
STAB FINALX+3,X
LDX VBP
LDAA 0,X ; high half
LDAB 1,X
LDX LB_BASE
STAA FINALX,X
STAB FINALX+1,X
*
JSR UNLKF
RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
* (who knows?) <= VBP
***
*
* Stack after initialization:
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,FP,VBP
* [STKUNDR ]STKBAS <= SP
***
* Stack after LINK (at call to MAIN)
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= SP,FP,VBP
*
START NOP
JSR INISTK
NOP
*
JSR LINKF
*
JSR MAIN
*
JSR UNLKF
*
DONE NOP
ERROR NOP ; define error labels as something not DONE, anyway
STKUNDR NOP
LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad to set breakpoint at
NOP
NOP
LDX $FFFE ; alternatively, jmp through reset vector
JMP 0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor,
* but not necessarily to your program in a runnable state.
I had to use my own assembler to clean up some mistakes, but the code assembles and runs correctly in EXORsim. As always, I will make no guarantees that this code is appropriate to be generalized for compilers and such.
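The expected values in MAIN's comments can be checked independently of the 6800. This Python sketch uses hypothetical stand-ins for ADD16U, ADD16S and ADD16SI that model only their arithmetic, not their stack handling. MAIN hands ADD16S just the low half of the first result, $E023, which is negative as a signed 16-bit value.

```python
def sext16(value):
    """Sign-extend 16 bits to 32, like ADD16S's $FFFF / $0000 extension cells."""
    return (value | 0xFFFF0000) if value & 0x8000 else value


def add16u(left, right):          # ADD16U: zero-extended, 17-bit sum in 32 bits
    return (left + right) & 0xFFFFFFFF


def add16s(left, right):          # ADD16S: both operands sign-extended
    return (sext16(left) + sext16(right)) & 0xFFFFFFFF


def add16si(var32, addend):       # ADD16SI: signed 16-bit addend into a 32-bit variable
    return (var32 + sext16(addend)) & 0xFFFFFFFF


r1 = add16u(0x1234, 0xCDEF)       # MAIN's first call
r2 = add16s(r1 & 0xFFFF, 0x8765)  # only the low half, $E023, is passed on
r3 = add16si(r2, 0xA5A5)          # added into MAIN's variable, ends up in FINAL
print(f"{r1:08X} {r2:08X} {r3:08X}")  # 0000E023 FFFF6788 FFFF0D2D
```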
We've seen what this kind of code looks like without stack frames, but once I get the split-stack version of this code up, I'm planning to do the functionality without frames so you can really see it and compare.
Again, if you're getting worn out, go ahead and move on to getting numeric output in binary.