And this one has been sitting at the bottom of the pool for a while, as predicted, even longer than the 6809 example.
Ascending the Wrong Island -- Single-stack Stack Frame Example: 6801
This is a concrete example to demonstrate some approaches to the problems in single-stack stack frames on the 6801. I've taken the concrete example for the 68000 and transliterated it to the 6809, and we've taken a bit of a detour through addressing math that might be helpful, finishing with some examples of some of the fancy modes on the 68000.
Here I'll work on a more concrete and, hopefully, more understandable
translation of stack frames to the 6801. It can't be a transliteration,
because we can't reference local variables at a negative offset from the frame
pointer. But the linked list of frame pointers has to remain such that it can
be walked backwards to get the caller's context, and such that, when the called
routine ends, it can restore the caller's context.
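As a rough sketch of what walking that list backwards could look like on the 6801, the fragment below follows frame links until it reaches a frame that links to itself. It assumes the conventions of the listing further down: FP and XWORK are direct-page cells, each frame link holds the address of the caller's frame link, and the bottom frame is self-linked (as INISTK sets it up). WALKFP and WALKLP are hypothetical labels, and the RMB declarations and ORG addresses are there only so the fragment stands on its own.

```asm
* Sketch only, not part of the tested listing.
* FP and XWORK stand in for the direct-page cells of the same names.
         ORG $80 ; arbitrary direct-page address for the sketch
XWORK    RMB 2 ; scratch cell for comparing X
FP       RMB 2 ; frame pointer pseudo-register
*
         ORG $3000 ; arbitrary code address for the sketch
WALKFP   LDX FP ; start at the current frame link
WALKLP   STX XWORK ; remember which frame we are looking at
         LDX 0,X ; follow the link to the caller's frame
         CPX XWORK ; a frame that links to itself is the bottom
         BNE WALKLP ; otherwise keep walking
         RTS ; X = address of the bottom (self-linked) frame link
```

A debugger or error handler would do something useful at each frame on the way down instead of just walking past it.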
The advantage of this concrete example is that we won't have to push the 6801 past the workaround limits of the CPU. None of the routines require more stack pointer math than a few pushes and pops. And we'll rig something up to keep all offsets positive.
And, again, I want to emphasize that I do not recommend the single stack
discipline that most of the current "modern" software engineering
infrastructure is built on. This is just here for comparison. To that end, I
will provide examples of both the single-stack stack frame and the split-stack
stack frame for the 6801 here, the single stack version first.
Probably the biggest impediment to doing stack frames on the 6800 and 6801 is the lack of support for fast general address arithmetic. We can do the arithmetic, but it's slow enough to cause the programmer serious angst about using variables that require address math just to access.
There are possible faster work-arounds for some parts of it (such as if you arrange to keep the stack entirely within a 256-byte page), but they have specific ranges of applicability that take time to understand, and may not allow general use.
And address math requires temporary variables, preferably in the direct page, which themselves require consideration and support at interrupt time. And, because the 6801's indexed mode only has positive (unsigned 8-bit) constant offsets, we must, if possible, arrange to only need constant positive offsets.
Even if we had an SBX instruction (subtract B from X, the counterpart of ABX), we would prefer not to have to load B and use it, if we could avoid it.
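To make the cost concrete, here is a sketch of moving X down by a constant on the 6801. The CPU has no direct transfer between X and D, and ABX can only add, so the value has to go out through memory and back. XWORK is the direct-page scratch cell from the listing; XSUBN and the constant N are hypothetical names for illustration, and the ORG addresses are arbitrary.

```asm
* Sketch only, not part of the tested listing.
N        EQU 6 ; hypothetical distance to back X up by
         ORG $80 ; arbitrary direct-page address for the sketch
XWORK    RMB 2 ; scratch cell, needs saving if interrupt code uses it
*
         ORG $3000 ; arbitrary code address for the sketch
XSUBN    STX XWORK ; X cannot be copied to D directly,
         LDD XWORK ; so pass it through memory
         SUBD #N ; 16-bit subtract in D
         STD XWORK
         LDX XWORK ; X now points N bytes lower
         RTS
```

That is five instructions, with D clobbered along the way, just to form one address. The 6809 folds the same offset into the addressing mode.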
Now the reason we use negative offsets in the 6809 and 68000 code is that they can be kept constant, and the compiler (or programmer) doesn't need to specifically remember how many temporaries and parameters have been pushed/allocated while it/she/he is generating code -- only while setting up the frame.
And if the frame is built by pushing initial values for variables, we barely have to remember it then. (Which may be why Motorola figured SBX was not necessary.)
Here's a cross-section of what we think we want the stack frame list to look
like:
* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP} ] for calling routine
* [PARAM ] from calling routine
* [RETADR ] to calling routine
* [FRMLNK ] at entry to calling routine
* [LOCVAR ] for called -- current -- routine
* [TEMP ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
Let's take a bit broader section and show the connections that we used for the 6809 and 68000:
* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ]
* [FRMLNK2 ] <= FRMLNK3
* [LOCVAR2 ]
* [TEMP2 ]
* [PARAM3 ]
* [RETADR2 ]
* [FRMLNK3 ] <= FP (frame pointer)
* [LOCVAR3 ]
* [TEMP3 ]
* [(PARAM4)] <= SP (return stack pointer)
With the 6809 and 68000, we could index downward from the pointer to the frame link -- from the frame pointer. So that was the way I constructed those.
With the 6800/1, we can't use negative constant offsets in indexed mode, only positive.
If the compiler (or programmer) keeps track of how many bytes have been pushed and popped since entering the routine, it's actually no problem to add that many bytes to the offsets needed to reach the local variables, and to skip over the frame link and return address to the parameters.
But it does make the compiler more complex, and it adds a step or two for the programmers, which provides more opportunity for mistakes and bugs of the sort that like to hide themselves until they can really bite hard.
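A small sketch of what that bookkeeping looks like, using one 16-bit local and one 16-bit temporary. Nothing here comes from the listing; TRACK is a hypothetical label, and the point is only that the same local is found at a different offset from S after every push or pop.

```asm
* Sketch only, not part of the tested listing.
         ORG $3000 ; arbitrary code address for the sketch
TRACK    LDX #0
         PSHX ; allocate one 16-bit local, initialized to 0
         TSX ; X = S+1, pointing at the local
         LDD 0,X ; the local is at offset 0 ...
         PSHB ; ... until we push a 2-byte temporary
         PSHA
         TSX
         LDD 2,X ; same local, now at offset 2
         INS ; drop the temporary
         INS
         TSX
         LDD 0,X ; and the local is back at offset 0
         INS ; drop the local
         INS
         RTS
```

Every push or pop between accesses changes the constant that has to be written into the next indexed instruction, and whoever generates the code has to keep that count straight.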
It would be nice to have a second pseudo-register pointing to the local variables (call it the variable base pointer, VBP?), but how could we maintain that? Specifically, how could the calling routine restore its variable base pointer after the called routine completes and returns? All we have to help us so far is knowing at compile time how many bytes of local variables, temporaries, and parameters we have to adjust the previous frame pointer by. That is not available at run-time unless we save it somewhere.
We could stack the offset from SP to VBP and do the math at runtime to reproduce the VBP, but that's run-time math.
A better alternative would be to just push the VBP itself when we push the frame pointer.
But either way ends up further fattening a stack already well-fattened by the stack frame overhead.
What we want is some way to combine the function of the frame pointer with the function of the variable base pointer without having to calculate offsets at run time. Again, that's why we liked the negative constant offsets on the 6809. We could let the CPU handle the calculations for us, and hide the address calculation time in the overall access overhead.
(You can see those calculation times in the 68000 and 68020. In the 68030 and
beyond, they've added a lot of circuitry to do as much as possible of those
calculations in parallel with whatever else the CPU is doing, which makes
those processors significantly faster than the 68020 at the same CPU clock
rates, even.)
Late last night, I was thinking that staging the linkage, moving VBP through FP on its way to the stack, would do the trick. But that's actually what we are doing with SP.
I'm not seeing any other alternatives. Either
- make the compiler or programmer track the number of bytes between the current SP and the first byte of the local variables in the stack;
- or push the base address of the local variables along with the frame pointer.
If you're going to saddle the programmer with the burden of maintaining the changing offsets anyway, what's the purpose in the discipline of maintaining a frame? It's precisely the burden of remembering what's on the stack that stack frames are supposed to "solve".
So, stack the pointer to the base address of the routine's local (dynamic)
variables, too.
The single-stack example below relies on stacking the local variable base pointer along with the frame pointer. And you have to do it every time, or you have to remember that you didn't -- which is essentially stacking a flag, so why not just stack the VBP?
Or drag the entire compile-time analysis of the code with you to make it
possible to run the compiled code? (Kind of like having to bury a link table
in your object code just to run it.) Should every routine access its entry in
the table of sizes of variable allocations when it terminates or something?
Somehow, I suspect that's actually part of the code-bloat in modern code
support libraries. I am not going to touch that here.
So note, in the comments in the source code, the frame structure that we are actually going to use.
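Under that frame structure, the body of a routine might be sketched as follows. This is an illustration rather than tested code: EXAMPL is a hypothetical routine taking one 16-bit parameter, and it relies on LINKF, UNLKF, FP, and VBP exactly as they are defined in the listing. The offsets assume the layout after LINKF: frame link at 0, saved VARBP at 2, return address at 4, and the parameter at 6 from FP.

```asm
* Sketch only; assumes LINKF, UNLKF, FP, and VBP from the listing.
EXAMPL   JSR LINKF ; stack caller's VBP and FP, set up ours
         LDX #0
         PSHX ; one 16-bit local, initialized to 0
         TSX
         STX VBP ; VBP now points at the local
*
         LDX FP
         LDD 6,X ; parameter: past FRMLNK, VARBP, and RETADR
         PSHB ; push a temporary; S moves,
         PSHA ; but FP and VBP do not
         LDX VBP
         STD 0,X ; the local is still at 0 from VBP
*
         JSR UNLKF ; drop local and temporary, restore caller's FP and VBP
         RTS
```

Once FP and VBP are set, pushes and pops in the body leave both sets of offsets alone, which is the property the negative offsets gave us on the 6809.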
There are also a few places where I adjusted the code for the tools, such as my asm68c assembler not (presently) being able to declare blocks larger than 256 bytes with RMB.
[JMR202411141016 edit:]
Working through coding this for the 6800, I recognized I had left unnecessary code in a few places, and while I was checking that I hadn't screwed anything up removing it, I discovered I had not quite completed the example. The corrections are just meaningful enough that I'm leaving the old code down below the end of the chapter.
[JMR202411141016 end edit.]
Be careful to read the comments in the code along with the code. Again, I'm
giving the details of the discussion there. And watch out for when I forget to
update the comments to match the code! (Read the code.)
* 16-bit addition as example of single-stack stack frame discipline on 6801,
* with test code
* Joel Matthew Rees, October 2024
*
         OPT 6801
NATWID   EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
         ORG $80 ; MDOS says this is a good place for usr stuff.
*
ENTRY    JMP START
         NOP ; Just want even addressed pointers for no reason.
         NOP ; bumper
         NOP ; 6 bytes to this point.
SSAVE    RMB 2 ; a place to keep S so we can return clean
         RMB 4 ; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK    RMB 2 ; For saving an index register temporarily
DWORK    RMB 2 ; For saving D temporarily
FP       RMB 2 ; frame pointer
VBP      RMB 2 ; variable base pointer
LB_BASE  RMB 2 ; For process local variables
HPPTR    RMB 2 ; heap pointer (not yet managed)
HPALL    RMB 2 ; heap allocation pointer
HPLIM    RMB 2 ; heap limit
* End of pseudo-registers
         RMB 4 ; bumper
GAP1     RMB 2 ; Mark the bottom of the gap
*
*
*
         ORG $2000 ; Give the DP room.
LB_ADDR  RMB 4 ; a little bumper space
FINAL    RMB 4 ; 32-bit Final result in DP variable (to show we can)
FINALX   EQU 4
STKLIM   RMB 192 ; roughly 16 to 20 levels of call
STKLIMX  EQU FINALX+4
STKBAS   RMB 8 ; for canary return
STKBASX  EQU STKLIMX+192
STKFAK   RMB 2 ; fake frame pointer, self-link
STKFAKX  EQU STKBASX+8 ; 6801 is post-dec (post-store-decrement) push
STKBMP   RMB 4 ; a little bumper space
STKBMPX  EQU STKFAKX+2 ; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE    RMB 1 ; $1024 or something ; Not using or managing heap yet.
HBASEX   EQU STKBMPX+4
*HLIM RMB 4 ; bumper
*HLIMX EQU HBASEX+$100 ; 1024
*
*
         ORG $3000
CDBASE   JMP ERROR ; more bumpers
         NOP
INISTK   LDX #LB_ADDR ; set up process local space
         STX LB_BASE
         LDD LB_BASE
         ADDD #HBASEX ; calculate EA
         STD HPPTR ; as if we actually had a heap
         STD HPALL
         LDD #CDBASE
         SUBD #4 ; extra bumper
         STD HPLIM
         LDD LB_BASE
         ADDD #STKBASX+2
         STD FP ; initialize
         STD VBP ; initialize
         LDX FP
         STX 0,X ; self link
         ADDD #6
         STD 6,X ; last self link
         STD 2,X ; error VARBP
         LDX #STKUNDR ; error handler
         STX XWORK
         LDD XWORK
         LDX FP
         DEX
         DEX
         STD 0,X ; last fake return to error handler
         STD 6,X ; first fake return to error handler
         PULA ; get the return address
         PULB
         STS SSAVE ; Save what the monitor gave us.
         TXS ; move to our own stack
         PSHB
         PSHA
         RTS
*
***
* Since negative index offsets are so expensive,
* we want to create a stack frame with only positive offsets.
* And we want the frame pointer to be pushed after the call,
* on entry to the local context.
* And the saved frame pointer needs to link to the previous one.
* And when we restore the previous frame,
* we need to be able to restore the previous frame base.
*
* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP} ] for calling routine
* [PARAM ] from calling routine
* [RETADR ] to calling routine
* [VARBP ] base of local variables in calling routine
* [FRMLNK ] at entry to calling routine
* [LOCVAR ] for called -- current -- routine
* [TEMP ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ]
* [VARBP1 ]
* [FRMLNK2 ] <= FRMLNK3
* [LOCVAR2 ] <= VARBP2
* [TEMP2 ]
* [PARAM3 ]
* [RETADR2 ]
* [VARBP2 ]
* [FRMLNK3 ] <= FP (frame pointer)
* [LOCVAR3 ] <= VBP (variable base pointer)
* [TEMP3 ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines
*
* Let the caller do allocation after.
LINKF    PULA ; get return address
         PULB
         LDX VBP ; push frame base
         PSHX
         LDX FP ; and link the frame in
         PSHX
         TSX ; set up new frame pointers
         STX FP ; because we want to use the pointer at will
         STX VBP ; link and allocate 0 complete
         PSHB ; put return address back
         PSHA
         RTS
*
* No return value
UNLKF    PULA ; get return address
         PULB
         LDX FP ; deallocate
         TXS ; and unlink
         PULX ; restore previous
         STX FP
         PULX
         STX VBP
         PSHB ; restore return address
         PSHA
         RTS
*
*
* Stack after LINK and allocation
* when functions are called by MAIN
* with two parameters
* We will return result in D:X
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ]
* [VARBP0 ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ]
* [VARBP1 ]
* [FRMLNK0 ] <= FP,SP,VBP
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
* 16-bit left 1st pushed, right 2nd
* output parameter:
* 17-bit sum in 32-bit D:X D high, X low
* Does not alter the parameters.
ADD16S   JSR LINKF
         TSX ; no local allocations
*
         LDAA #(-1) ; prepare for sign extension
         TST 8,X ; the left-hand operand sign bit
         BMI ADD16SR
         CLRA ; zero extend
ADD16SR  PSHA ; push left extension
         PSHA ; left sign cell below X now
         LDAA #(-1) ; reload
         TST 6,X ; the right-hand operand sign bit
         BMI ADD16SL
         CLRA ; zero extend
ADD16SL  PSHA ; push right extension
         PSHA
         TSX ; point to sign extensions
         LDD 12,X ; left-hand low cell
         ADDD 10,X ; right-hand low cell
         STD XWORK ; save low half of result
         LDD 2,X ; left-hand extension
         ADCB 1,X ; right-hand extension
         ADCA 0,X
         STD DWORK ; Save high half of result
*
         JSR UNLKF ; drops temporaries
         LDX XWORK ; get low half of result
         LDD DWORK ; get high half of result
         RTS ; result is in D:X
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
* 16-bit left, right
* output parameter:
* 17-bit sum in 32-bit D:X D high
ADD16U   JSR LINKF
         TSX ; no local allocations
*
         LDD 8,X ; left
         ADDD 6,X ; right
         STD XWORK ; save low half
         LDD #0
         ADCB #0
         STD DWORK ; save carry bit in high half
*
         JSR UNLKF ; drops temporaries
         LDX XWORK ; get low half of result
         LDD DWORK ; get high half of result
         RTS ; result is in D:X
*
* Etc.
*
***
*
* Stack after LINK #0 when functions are called by MAIN
* with one input parameter
* (#0 means no local variables)
* [<SELF> ] <= <SELF>
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ]
* [VARBP0 ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [RETADR1 ]
* [VARBP1 ]
* [FRMLNK0 ] <= FP,SP,VBP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
* 16-bit addend
* target parameter in caller
* 2nd 32-bit variable at offset 0 from caller's VBP
* no output parameter:
ADD16SI  JSR LINKF
         TSX ; no local allocations
*
         LDAA #(-1)
         TST 6,X ; high byte of parameter
         BMI ADD16SIP
         CLRA
ADD16SIP PSHA ; save the sign extension half
         PSHA
         LDX 2,X ; get caller's VBP
         LDD 2,X ; caller's 2nd variable, low
         LDX FP
         ADDD 6,X ; parameter
         LDX 2,X ; caller's VBP
         STD 2,X ; save result low half away
         LDD 0,X ; caller's 2nd variable, high
         TSX
         ADCB 1,X ; sign extension half
         ADCA 0,X
         LDX FP
         LDX 2,X
         STD 0,X ; save result high half away
*
         JSR UNLKF ; drops temporaries
         RTS ; no result to load
*
*
***
* Stack after LINK
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ]
* [VARBP0 ]
* [FRMLNKX ] <= FP
* [32:VAR1_1]
* [32:VAR1_2] <= SP,VBP
*
MAIN     JSR LINKF
         LDX #0
         PSHX ; four pushes is only one byte more than a call.
         PSHX
         PSHX
         PSHX
         TSX
         STX VBP ; link and allocate complete
*
         LDX #$1234 ; parameters
         PSHX
         LDX #$CDEF
         PSHX
         JSR ADD16U ; result in D:X should be $E023
         INS ; could reuse instead of dropping
         INS
         INS
         INS
         PSHX
         LDX #$8765
         PSHX
         JSR ADD16S ; result in D:X should be $FFFF6788
         INS ; could reuse instead of dropping
         INS
         INS
         INS
         STX XWORK
         LDX VBP
         STD 0,X
         LDD XWORK
         STD 2,X
         LDX #$A5A5
         PSHX
         JSR ADD16SI ; result in 2nd variable should be $FFFF0D2D
         LDX VBP ; get the result from our variable
         LDD 2,X ; low half
         LDX LB_BASE ; store it in FINAL, in process local space
         STD FINALX+2,X
         LDX VBP
         LDD 0,X ; high half
         LDX LB_BASE
         STD FINALX,X
*
         JSR UNLKF
         RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
* (who knows?) <= VBP
***
*
* Stack after initialization:
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,FP,VBP
* [STKUNDR ]STKBAS <= SP
***
* Stack after LINK (at call to MAIN)
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= SP,FP,VBP
*
START    NOP
         JSR INISTK
         NOP
*
         JSR LINKF
*
         JSR MAIN
*
         JSR UNLKF
*
DONE     NOP
ERROR    NOP ; define error labels as something not DONE, anyway
STKUNDR  NOP
         LDS SSAVE ; restore the monitor stack pointer
         NOP
         NOP ; landing pad to set breakpoint at
         NOP
         NOP
         LDX $FFFE ; alternatively, jmp through reset vector
         JMP 0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor,
* but not necessarily to your program in a runnable state.
I think I'm going to continue using the fake return technique to keep things
better under control.
I have tested this code. It does run; it builds the stack frames and tears them down as advertised. And, as always, I will not guarantee that this code can be generalized. Nor will I guarantee that it can be generated by any real compiler.
[JMR202411182142 edit:]
Just realized, while working on this for the 6800, that the name for SUB16SI did not agree with what it is doing. So I'm fixing it, calling it ADD16SI, instead.
[JMR202411182142 end edit.]
I am going to post the split-stack stack frame version for comparison, but
this has gotten so long that it really needs to be in a separate post. Also,
I'm pretty sure you'll want to compare this with that, side-by-side, in
separate browser windows. The differences become that obvious.
As a reminder, we've already seen what this kind of code looks like without stack frames.
Once I get the split-stack version of this code up (It's up now.), I'll convert it to the 6800. If you're getting worn out, go ahead and move on to getting numeric output in binary.
[JMR202411141016 old code version:]
* 16-bit addition as example of single-stack stack frame discipline on 6801,
* with test code
* Joel Matthew Rees, October 2024
*
         OPT 6801
NATWID   EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
         ORG $80 ; MDOS says this is a good place for usr stuff.
*
ENTRY    JMP START
         NOP ; Just want even addressed pointers for no reason.
         NOP ; bumper
         NOP ; 6 bytes to this point.
SSAVE    RMB 2 ; a place to keep S so we can return clean
         RMB 4 ; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK    RMB 2 ; For saving an index register temporarily
DWORK    RMB 2 ; For saving D temporarily
FP       RMB 2 ; frame pointer
VBP      RMB 2 ; variable base pointer
LB_BASE  RMB 2 ; For process local variables
HPPTR    RMB 2 ; heap pointer (not yet managed)
HPALL    RMB 2 ; heap allocation pointer
HPLIM    RMB 2 ; heap limit
* End of pseudo-registers
         RMB 4 ; bumper
GAP1     RMB 2 ; Mark the bottom of the gap
*
*
*
         ORG $2000 ; Give the DP room.
LB_ADDR  RMB 4 ; a little bumper space
FINAL    RMB 4 ; 32-bit Final result in DP variable (to show we can)
FINALX   EQU 4
STKLIM   RMB 192 ; roughly 16 to 20 levels of call
STKLIMX  EQU FINALX+4
STKBAS   RMB 8 ; for canary return
STKBASX  EQU STKLIMX+192
STKFAK   RMB 2 ; fake frame pointer, self-link
STKFAKX  EQU STKBASX+8 ; 6801 is post-dec (post-store-decrement) push
STKBMP   RMB 4 ; a little bumper space
STKBMPX  EQU STKFAKX+2 ; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE    RMB 1 ; $1024 or something ; Not using or managing heap yet.
HBASEX   EQU STKBMPX+4
*HLIM RMB 4 ; bumper
*HLIMX EQU HBASEX+$100 ; 1024
*
*
         ORG $3000
CDBASE   JMP ERROR ; more bumpers
         NOP
INISTK   LDX #LB_ADDR ; set up process local space
         STX LB_BASE
         LDD LB_BASE
         ADDD #HBASEX ; calculate EA
         STD HPPTR ; as if we actually had a heap
         STD HPALL
         LDD #CDBASE
         SUBD #4 ; extra bumper
         STD HPLIM
         LDD LB_BASE
         ADDD #STKBASX+2
         STD FP ; initialize
         STD VBP ; initialize
         LDX FP
         STX 0,X ; self link
         ADDD #6
         STD 6,X ; last self link
         STD 2,X ; error VARBP
         LDX #STKUNDR ; error handler
         STX XWORK
         LDD XWORK
         LDX FP
         DEX
         DEX
         STD 0,X ; last fake return to error handler
         STD 6,X ; first fake return to error handler
         PULA ; get the return address
         PULB
         STS SSAVE ; Save what the monitor gave us.
         TXS ; move to our own stack
         PSHB
         PSHA
         RTS
*
***
* Since negative index offsets are so expensive,
* we want to create a stack frame with only positive offsets.
* And we want the frame pointer to be pushed after the call,
* on entry to the local context.
* And the saved frame pointer needs to link to the previous one.
* And when we restore the previous frame,
* we need to be able to restore the previous frame base.
*
* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP} ] for calling routine
* [PARAM ] from calling routine
* [RETADR ] to calling routine
* [VARBP ] base of local variables in calling routine
* [FRMLNK ] at entry to calling routine
* [LOCVAR ] for called -- current -- routine
* [TEMP ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ]
* [VARBP1 ]
* [FRMLNK2 ] <= FRMLNK3
* [LOCVAR2 ] <= VARBP2
* [TEMP2 ]
* [PARAM3 ]
* [RETADR2 ]
* [VARBP2 ]
* [FRMLNK3 ] <= FP (frame pointer)
* [LOCVAR3 ] <= VBP (variable base pointer)
* [TEMP3 ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines
*
* Let the caller do allocation after.
LINKF    PULA ; get return address
         PULB
         LDX VBP ; push frame base
         PSHX
         LDX FP ; and link the frame in
         PSHX
         TSX ; set up new frame pointers
         STX FP ; because we want to use the pointer at will
         STX VBP ; link and allocate 0 complete
         PSHB ; put return address back
         PSHA
         RTS
*
* No return value
UNLKF    PULA ; get return address
         PULB
         LDX FP ; deallocate
         TXS ; and unlink
         PULX ; restore previous
         STX FP
         PULX
         STX VBP
         PSHB ; restore return address
         PSHA
         RTS
*
*
* Stack after LINK and allocation
* when functions are called by MAIN
* with two parameters
* We will return result in D:X
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ]
* [VARBP0 ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ]
* [VARBP1 ]
* [FRMLNK0 ] <= FP,SP,VBP
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
* 16-bit left 1st pushed, right 2nd
* output parameter:
* 17-bit sum in 32-bit D:X D high, X low
* Does not alter the parameters.
ADD16S   LDX VBP
         JSR LINKF
         TSX ; no local allocations
*
         LDAA #(-1) ; prepare for sign extension
         TST 8,X ; the left-hand operand sign bit
         BMI ADD16SR
         CLRA ; zero extend
ADD16SR  PSHA ; push left extension
         PSHA ; left sign cell below X now
         LDAA #(-1) ; reload
         TST 6,X ; the right-hand operand sign bit
         BMI ADD16SL
         CLRA ; zero extend
ADD16SL  PSHA ; push right extension
         PSHA
         TSX ; point to sign extensions
         LDD 12,X ; left-hand low cell
         ADDD 10,X ; right-hand low cell
         STD XWORK ; save low half of result
         LDD 2,X ; left-hand extension
         ADCB 1,X ; right-hand extension
         ADCA 0,X
         STD DWORK ; Save high half of result
*
         JSR UNLKF ; drops temporaries
         LDX XWORK ; get low half of result
         LDD DWORK ; get high half of result
         RTS ; result is in D:X
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
* 16-bit left, right
* output parameter:
* 17-bit sum in 32-bit D:X D high
ADD16U   LDX VBP
         JSR LINKF
         TSX ; no local allocations
*
         LDD 8,X ; left
         ADDD 6,X ; right
         STD XWORK ; save low half
         LDD #0
         ADCB #0
         STD DWORK ; save carry bit in high half
*
         JSR UNLKF ; drops temporaries
         LDX XWORK ; get low half of result
         LDD DWORK ; get high half of result
         RTS ; result is in D:X
*
* Etc.
*
***
*
* Stack after LINK #0 when functions are called by MAIN
* with one input parameter
* (#0 means no local variables)
* [<SELF> ] <= <SELF>
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ]
* [VARBP0 ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [RETADR1 ]
* [VARBP1 ]
* [FRMLNK0 ] <= FP,SP,VBP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
* 16-bit addend
* target parameter in caller
* 2nd 32-bit variable at offset -2*NATWID
* no output parameter:
SUB16SI  LDX VBP
         JSR LINKF
         TSX ; no local allocations
*
         LDAA #(-1)
         TST 6,X ; high byte of parameter
         BMI SUB16SIP
         CLRA
SUB16SIP PSHA ; save the sign extension half
         PSHA
         LDX 2,X ; get caller's VBP
         LDD 2,X ; caller's 2nd variable, low
         LDX FP
         ADDD 6,X ; parameter
         LDX 2,X ; caller's VBP
         STD 2,X ; save result low half away
         LDD 0,X ; caller's 2nd variable, high
         TSX
         ADCB 1,X ; sign extension half
         ADCA 0,X
         LDX FP
         LDX 2,X
         STD 0,X ; save result high half away
*
         JSR UNLKF ; drops temporaries
         RTS ; no result to load
*
*
***
* Stack after LINK
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ]
* [VARBP0 ]
* [FRMLNKX ] <= FP
* [32:VAR1_1]
* [32:VAR1_2] <= SP,VBP
*
MAIN     LDX VBP
         JSR LINKF
         LDX #0
         PSHX ; four pushes is only one byte more than a call.
         PSHX
         PSHX
         PSHX
         TSX
         STX VBP ; link and allocate complete
*
         LDX #$1234 ; parameters
         PSHX
         LDX #$CDEF
         PSHX
         JSR ADD16U ; result in D:X should be $E023
         INS ; could reuse instead of dropping
         INS
         INS
         INS
         PSHX
         LDX #$8765
         PSHX
         JSR ADD16S ; result in D:X should be $FFFF6788
         INS ; could reuse instead of dropping
         INS
         INS
         INS
         STX XWORK
         LDX VBP
         STD 0,X
         LDD XWORK
         STD 2,X
         LDX #$A5A5
         PSHX
         JSR SUB16SI ; result in 2nd variable should be FFFF0D2D
         LDX VBP ; get the result from our variable
         LDD 2,X ; low half
         LDX LB_BASE ; store it in FINAL, in process local space
         STD FINALX+2,X
         LDX VBP
         LDD 0,X ; high half
         LDX LB_BASE
         STD FINALX,X
*
         JSR UNLKF
         RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
* (who knows?) <= VBP
***
*
* Stack after initialization:
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,FP,VBP
* [STKUNDR ]STKBAS <= SP
***
* Stack after LINK (at call to MAIN)
* [<SELF> ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY ]
* [<SELF> ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX ]
* [FRMLNKY=STKBAS+NATWID ] <= SP,FP,VBP
*
START    NOP
         JSR INISTK
         NOP
*
         LDX VBP ; mark
         PSHX
         LDX FP
         PSHX
         TSX ; link
         STX FP
         STX VBP
*
         JSR MAIN
*
DONE     NOP
ERROR    NOP ; define error labels as something not DONE, anyway
STKUNDR  NOP
         LDS SSAVE ; restore the monitor stack pointer
         NOP
         NOP ; landing pad to set breakpoint at
         NOP
         NOP
         LDX $FFFE ; alternatively, jmp through reset vector
         JMP 0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor,
* but not necessarily to your program in a runnable state.
[JMR202411141016 end old code version.]