Well,
this should be the last of the rubber bricks at the bottom of the pool, taking, as predicted, a long time, much longer than
the 6809 example.
Walking the Pontoons --
Split-stack Stack Frame Example:
6800
Now that we have worked out the concrete single-(combined)-stack stack frame example for the 6800, let's get a look at how the split stack discipline/paradigm improves things on this processor.
The conversion of both the split-stack and single-stack stack frame code from the 68000 to the 6809 was straightforward.
From the 6809 to the
6801, the single-stack conversion was rather hairy. It surprised me a little that
the conversion from the 6801 to the
6800
was also rather hairy. The difference a few instructions makes can be rather
dramatic, and the less support you have for
address math, the more the combined single stack shows up as a bottleneck
By comparison, the conversion of the split-stack code was a little tricky from the 6809 to the 6801, but quite straightforward from the 6801 to the 6800. This really isn't a surprise, since the big jump from the 6809 to the 6801 is the code to support the software stack, and that code moved to the 6800 is primarily a matter of doing 16-bit additions and subtractions a byte at a time, which we now know is not that hard.
It's mostly just a matter of doing the less-significant byte before the more-significant, and remembering to include the carry on the more significant byte -- mostly just replacing one instruction with two.
Well, and some places where you use PSHX and PULX on the 6801, where on the
6800 you instead need to save X to a temporary and, as immediately as
possible, grab it back in one or both accumulators and push it or do math on
it. Or the reverse order, popping the address to accumulator(s), saving in the
proper order to adjacent temporary bytes, maybe working on it, and loading it
back into X as soon as possible. Oh, and make sure the stack pointer gets
updated in the appropriate places.
I used both accumulators to make it clear what I was doing, but if you need to
keep one accumulator available for something else, the process can be done
just as efficiently with one accumulator. Even the addition and subtraction
works in the same number of instructions with a single accumulator, you just
have to complete moving and/or working on one byte before starting on the
other.
This is, as I say, all straightforward -- because you separate the stacks.
When you're working on paramaters, the parameter pointer is in X. When you're
working on the return stack, you've pulled S into X. When you are pointing at
other things, the pointer for that is in X.
When you have the stacks combined, there are times when you find yourself needing to point at two things at once, and finding yourself with only one X. Sure, you can save X off to a temporary, but then you have to be careful how you use that temporary for other things, or the code balloons in your face (if it doesn't just blow up).
Yes, this is a problem in allocating temporary variables and in being
disciplined in how you use them, but it's also a problem in optimization
against the discipline. You want to make efficient use of the on-CPU resources
if you can.
I did hit one snag in the un-mark routine. With the 6801, the split stack made it unnecessary to actually have an un-mark routine, but the 6800 used enough instructions to make it worthwhile. But, in replacing the PULX instruction in that routine, I used INX instead of INS to update the return stack pointer, and lost a couple of hours staring at the code and tracing through it and wondering why the stack pointer ended up pointing to a frame boundary instead of a return address when the code was supposed to be returning from a subroutine.
And that was the hardest problem going from 6801 to 6800 in the split-stack
discipline.
I want to note that I realized, while debugging this, that the FINAL variable
really should have some bumper space after it. We know our test routines won't
get anywhere even close to overwriting it. And it's only used at the end, so,
even if a stack overflow overwrote it, it wouldn't matter. But it still
should have the buffer. Or, at least, I should be testing it to be still zero
before we store the final value. It's something to keep in mind.
For reference, here's what the two stacks look like, from the discussion of the two-stack discipline on the 6801, the return stack:
* [PRETADR ]
* [PCALLERFRM]
* [RETADR ]
* [CALLERFRM ] <= RSP
and the (conceptual) parameter stack:
* [VARIABLES ] <= CALLERFRM
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES ] <= FP
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
again, keeping offsets positive because we don't have the 6809's fancy negative offsets.
I've left some routines i didn't end up using in the code, don't let that
bother you.
* 16-bit addition as example of split-stack frame-less discipline on 6800
* using the direct page,
* with test code
* Joel Matthew Rees, October, November 2024
*
OPT 6800
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS says this is a good place for usr stuff.
*
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
NOP ; bumper
NOP ; 6 bytes to this point.
SSAVE RMB 2 ; a place to keep S so we can return clean
RMB 4 ; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK RMB 2 ; For saving an index register temporarily
DWORK RMB 2 ; For saving D temporarily
ALCOUNT RMB 1 ; for counting utility routines
RMB 1 ; reserved
PSP RMB 2 ; parameter stack pointer
PTR RMB 2 ; frame pointer
LB_BASE RMB 2 ; For process local variables
HPPTR RMB 2 ; heap pointer (not yet managed)
HPALL RMB 2 ; heap allocation pointer
HPLIM RMB 2 ; heap limit
* End of pseudo-registers
RMB 4 ; bumper
GAP1 RMB 2 ; Mark the bottom of the gap
*
*
*
ORG $2000 ; Give the DP room.
LB_ADDR RMB 4 ; a little bumper space
FINAL RMB 4 ; 32-bit Final result in DP variable (to show we can)
FINALX EQU 4 ; Should've had a bumper after this (but it's only used at the end?)
SSTKLIM RMB 64 ; 16 levels of call
SSTKLMX EQU FINALX+4
SSTKBAS RMB 6 ; for canary return
SSTKBSX EQU SSTKLMX+64
SSTKFAK RMB 2 ; fake frame pointer, self-link
SSTKFAX EQU SSTKBSX+6 ; 6801 is post-dec (post-store-decrement) push
SSTKBMP RMB 4 ; a little bumper space
SSTKBMX EQU SSTKFAX+2 ; But we are going to init S through X
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKLMX EQU SSTKBMX+4
PSTKBAS RMB 4 ; bumper space -- parameter stack is pre-dec
PSTKBSX EQU PSTKLMX+64
PSTKBMP RMB 4 ; a little bumper space
PSTKBMX EQU PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE RMB 1 ; $1024 or something ; Not using or managing heap yet.
HBASEX EQU PSTKBMX+4
*HLIM RMB 4 ; bumper
*HLIMX EQU HBASEX+$100 ; 1024
*
*
ORG $3000
CDBASE JMP ERROR ; more bumpers
NOP
*
INISTKS LDX #LB_ADDR ; set up process local space
STX LB_BASE
LDAA LB_BASE ; bootstrap own return stack
LDAB LB_BASE+1
LDX #SSTKBSX ; Instead of FDB
STX XWORK
ADDB XWORK+1
ADCA XWORK
STAB XWORK+1 ; initial return stack pointer
STAA XWORK
*
LDX #SSTKNDR ; for fake return address
STX DWORK ; save it for a moment
PULA ; pop real return address
PULB
LDX XWORK ; ready own return stack pointer
STS SSAVE ; save stack pointer from monitor ROM
TXS ; move to our own stack (let TXS convert it)
PSHB ; put return address on own stack
PSHA ; stack now ready for interrupts
*
LDAA DWORK ; error handler for fake return
LDAB DWORK+1
STAA 0,X ; in the cell beyond empty stack pointer
STAB 1,X ; prime the return stack with error handler
STAA 4,X ; second fake return to error handler
STAB 5,X
*
LDAA LB_BASE ; bootstrap parameter stack
LDAB LB_BASE+1
LDX #PSTKBSX ; Instead of FDB
STX XWORK
ADDB XWORK+1 ; initial parameter stack pointer
ADCA XWORK
STAA PSP ; parameter stack now ready
STAB PSP+1
STAA FP ; initial frame pointer
STAB FP+1
TSX ; pointing at return address below fake
STAA 4,X ; empty parameter stack frame pointer for fake frame,
STAB 5,X ; stacks primed
*
LDAA LB_BASE ; set up heap as if we actually had one
LDAB LB_BASE+1
LDX #HBASEX ; Instead of FDB
STX XWORK
ADDB XWORK+1 ; calculat EA
ADCA XWORK
STAA HPPTR
STAB HPPTR+1
STAA HPALL ; as if the heap were functional
STAB HPALL+1
LDX #CDBASE
STX XWORK
LDAA XWORK
LDAB XWORK+1
SUBB #4
SBCA #0
STAA HPLIM
STAB HPLIM+1
RTS
*
*
***
* General structure of the stacks,
*
* return stack is always in pairs:
* [PRETADR ]
* [PCALLERFRM]
* [RETADR ]
* [CALLERFRM ] <= RSP
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES ] <= CALLERFRM
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES ] <= FP
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
* Utility routines
PPOPD LDX PSP
LDAA 0,X
LDAB 1,X
INX
INX
STX PSP
RTS
*
PPSHD LDX PSP
DEX
DEX
STX PSP
STAA 0,X
STAB 1,X
RTS
*
* subroutine to make sure we don'T forget anything
MARK PULA ; move the return address down
PULB
DES
DES
TSX ; to stack the mark
PSHB
PSHA
LDAA FP
LDAB FP+1
STAA 0,X ; mark old FP, no allocate
STAB 1,X
LDX PSP
STX FP ; update frame pointer
RTS
*
UNMK PULA ; get the return address out of the way
PULB
TSX ; point to the return stack
LDX 0,X ; get the old frame pointer
STX FP ; and restore it
INS ; drop the old frame pointer
INS
PSHB ; put the return address back
PSHA
RTS
*
* Compromise between speed and reusability
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8 CLRA
CLRB
* Enter here with initial value in A:B
ALCLD8 LDX PSP
* Enter here with PSP loaded and initial value in D
ALCLI8 DEX
DEX
STAA 0,X
STAB 1,X
ALCLI6 DEX
DEX
STAA 0,X
STAB 1,X
ALCLI4 DEX
DEX
STAA 0,X
STAB 1,X
ALCLI2 DEX
DEX
STAA 0,X
STAB 1,X
STX PSP
RTS
*
* six bytes
ALCL6 CLRA
CLRB
ALCLD6 LDX PSP
BRA ALCLI6
*
* four bytes
ALCL4 CLRA
CLRB
ALCLD4 LDX PSP
BRA ALCLI4
*
* two bytes
ALCL2 CLRA
CLRB
LDX PSP
BRA ALCLI4
*
*
PDROP_8 LDAB #8 ; saves two bytes, 7 vs. 3
CLRA
* Add A:B to PSP -- negative for allocation, positive for deallocation
ADDPSP ADDB PSP+1
ADCA PSP
STAA PSP
STAB PSP+1
LDX PSP ; return with X ready
RTS
*
* Just use ADDPSP with a negative 8 or whatever
*PDROP_8 LDAB #8 ; saves two bytes, 7 vs. 3
* deallocate count in B
*PDROPB LDX PSP ; 5 bytes to deallocate in-line
* ABX ; vs. 3 bytes to call this.
* STX PSP ; ABX is useful for deallocation
* RTS ; 5 bytes vs. 7 total
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry, after link:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ]
* [FRMPTR0==<EMPTYP>]
* [RETADR1 ]
* [FRMPTR1 ] <= RSP
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after mark (no local allocation)
* [<unknown>] <= FRMPTR0,FRMPTR1
* [32:VAR1_1--]
* [32:VAR1_2--] <= FRMPTR1
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP,FP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
* 16-bit left, right
* output parameter:
* 17-bit sum in 32-bit
ADD16S JSR MARK ; mark, no allocate, X is PSP
*
LDAB #(-1) ; default negative
TBA
JSR ALCLI4 ; allocate 2 temporary cells and init
LDX FP ; (could have optimized that to 2 bytes.)
TST 2,X ; the left-hand operand sign bit
BMI ADD16SR
LDX PSP
CLR 2,X ; positive
CLR 3,X
ADD16SR LDX FP
TST 0,X ; the right-hand operand sign bit
BMI ADD16SL
LDX PSP
CLR 0,X ; positive
CLR 1,X
ADD16SL LDX FP
LDAA 2,X ; left hand
LDAB 3,X
ADDB 1,X ; right hand
ADCA 0,X
STAA 2,X ; store low half
STAB 3,X
LDX PSP
LDAA 2,X
LDAB 3,X
ADCB 1,X
ADCA 0,X
LDX FP ; wouldn't need to do this if we tracked PSP extras
STAA 0,X
STAB 1,X
*
STX PSP ; drop the temporaries
JSR UNMK
RTS
*
* The alternative without link, mark, or restore will be shown in the no-frame case.
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
* 16-bit left, right
* output parameter:
* 17-bit sum in 32-bit
ADD16U JSR MARK ; mark, no allocate, X is PSP
*
LDX FP
LDAA 2,X ; left
LDAB 3,X
ADDB 1,X ; add right
ADCA 0,X
STAA 2,X ; save low
STAB 3,X
LDAB #0 ; extend
ADCB #0 ; extend Carry unsigned (could ROL)
STAB 1,X ; re-use right side to store high half
CLR 0,X ; only bit 8 can be affected
*
JSR UNMK
RTS
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with one 16-bit parameter,
* after mark (no local allocation)
* [<unknown> ] <= FRMPTR0
* [32:VAR1_1 ]
* [32:VAR1_2 ] <= FRMPTR1
* [16:PARAM2_1] <= PSP,FP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
* 16-bit addend
* target parameter in caller
* 2nd 32-bit variable at offset -2*NATWID
* no output parameter:
ADD16SI JSR MARK
*
LDAB #(-1) ; make a temporary -1
TBA
JSR ALCLI2 ; (default to signed)
LDX FP
TST 0,X ; test high byte
BMI ADD16SP
LDX PSP
CLR 0,X ; zero extend
CLR 1,X
ADD16SP TSX
LDX 0,X ; caller's FP
LDAA 2,X ; caller's 2nd variable, low
LDAB 3,X
LDX FP
ADDB 1,X ; parameter
ADCA 0,X
TSX
LDX 0,X
STAA 2,X ; update low half with result
STAB 3,X
LDAA 0,X ; 2nd variable, high half
LDAB 1,X
LDX PSP
ADCB 1,X ; sign extension half
ADCA 0,X
TSX
LDX 0,X
STAA 0,X ; update high half
STAB 1,X
*
LDX FP
INX ; drop parameter
INX
STX PSP ; and sign temporary goes bye-bye, too
JSR UNMK
RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ] <= RSP
*
* Return stack after link:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ]
* [FRMPTR0==<EMPTYP>] <= RSP
*
* Parameter stack after mark and local allocation
* [<unknown>] <= FRMPTR0
* [VAR1_1--]
* [VAR1_2--] <= PSP,FP
*
MAIN JSR MARK
JSR ALCL8 ; allocate and clear 8 bytes
STX FP ; Point FP to base of local variables.
*
LDAA #$12
LDAB #$34
JSR PPSHD
LDAA #$CD
LDAB #$EF
JSR PPSHD
JSR ADD16U ; 32-bit result on parameter stack should be $0000E023
LDAA #$87
LDAB #$65
LDX PSP ; reuse parameter space, since order is okay
STAA 0,X
STAB 1,X
JSR ADD16S ; result on parameter stack should be $FFFF6788
LDX PSP
LDAA 2,X ; result low half
LDAB 3,X
LDX FP
STAA 2,X ; to 2nd local variable low half
STAB 3,X
LDX PSP
LDAA 0,X ; result high half
LDAB 1,X
LDX FP
STAA 0,X ; to 2nd local variable high half
STAB 1,X
LDAA #$A5
TAB
JSR PPSHD
JSR ADD16SI ; result in 2nd variable should be FFFF0D2D (Carry set)
LDX FP
LDAA 2,X ; 2nd variable low half
LDAB 3,X
LDX LB_BASE
STAA FINALX+2,X
STAB FINALX+3,X
LDX FP
LDAA 0,X
LDAB 1,X
LDX LB_BASE
STAA FINALX,X
STAB FINALX+1,X
*
JSR PDROP_8
JSR UNMK
RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
*
***
* Return stack will always be in pairs:
* [RETADRNN ]
* [CALLERFMNN]
*
* Return stack after initialization:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS <= RSP
*
* Return stack after saving previous mark:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>] <= RSP
*
* Parameter stack after initialization, mark:
* [<unknown]PSTKBAS <= PSP,FP==<EMPTYP>
*
START JSR INISTKS
JSR MARK
* LDAA PSP
* LDAB PSP+1
* JSR PPSHD ; empty previous mark
* STX FP ; empty new mark
*
JSR MAIN
*
JSR UNMK
*
*
DONE NOP
ERROR NOP ; define error labels as something not DONE, anyway
SSTKNDR NOP
LDS SSAVE ; restore the monitor stack pointer
NOP
NOP
NOP ; another landing pad to set breakpoint at
NOP
LDX $FFFE
JMP 0,X ; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor,
* but not necessarily to your program in a runnable state.
As always, I have tested this code. it handles the stack frames properly and gets the right result in the right place. And, as always, I do not guarantee that this code can be generalized or automatically produced, by hand or by compiler.
As I keep saying, we have seen what
this kind of code looks like without stack frames. I plan to come back and do these examples without stack frames, so we can get a clearer picture of how the disciplines affect the code. It may take several weeks, as I have to get back to some sort of day job to pay the rent.
I now (25 Nov.) have the functionality of this code done in single-stack discipline, without stack frames. Have a look at it.
At some point in the tutorial, where we've gone far enough to need to reference these concepts again, I will point you back here. If you haven't really followed what the stack frames business is all about, you will want to come back and re-read this then. You may want to, anyway.
For the time being, here's some simpler stuff --
getting numeric output in binary.
No comments:
Post a Comment