Wednesday, November 6, 2024

ALPP 02-25 -- Ascending the Wrong Island -- Single-stack Stack Frame Example: 6801

And this one has been sitting at the bottom of the pool for a while, as predicted, even longer than the 6809 example.



This is a concrete example demonstrating some approaches to the problems of single-stack stack frames on the 6801. I've already taken the concrete example for the 68000 and transliterated it to the 6809, and we've taken a bit of a detour through addressing math that might be helpful, finishing with examples of some of the fancy modes on the 68000.

Here I'll work on a more concrete and, hopefully, more understandable translation of stack frames to the 6801. It can't be a transliteration, because we can't reference local variables at a negative offset from the frame pointer. But the linked list of frame pointers has to remain such that it can be walked backwards to get the caller's context, and such that, when the called routine ends, it can restore the caller's context.

The advantage of this concrete example is that we won't have to push the 6801 past the workaround limits of the CPU. None of the routines require more stack pointer math than a few pushes and pops. And we'll rig something up to keep all offsets positive.

And, again, I want to emphasize that I do not recommend the single stack discipline that most of the current "modern" software engineering infrastructure is built on. This is just here for comparison. To that end, I will provide examples of both the single-stack stack frame and the split-stack stack frame for the 6801 here, the single stack version first.

Probably the biggest impediment to doing stack frames on the 6800 and 6801 is the lack of support for fast general address arithmetic. We can do the arithmetic, but it's slow enough to cause the programmer serious angst about using variables that require address math just to access. 

There are possible faster work-arounds for some parts of it (such as if you arrange to keep the stack entirely within a 256-byte page), but they have specific ranges of applicability that take time to understand, and may not allow general use.

And address math requires temporary variables, preferably in the direct page, which themselves require consideration and support at interrupt time. And, because the 6801 only has positive constant offsets, we must, if possible, arrange to only need constant positive offsets.

Even if we had the SBX instruction, we would prefer not to have to load B and use it, if we could.
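To make the cost concrete, here is a rough sketch of reaching a 16-bit variable four bytes below the frame pointer on the 6801. The offset of 4 is made up for illustration, and FP and XWORK are assumed to be direct-page pseudo-registers like the ones declared in the listing further down. Treat it as a sketch of the general shape, not as code from the tested example.

```asm
* Hypothetical: fetch a 16-bit local stored 4 bytes below the address in FP.
* FP and XWORK are assumed to be direct-page pseudo-registers.
	LDD	FP	; frame pointer into D
	SUBD	#4	; the negative offset has to be applied by hand
	STD	XWORK	; the 6801 has no D-to-X transfer, so go through memory
	LDX	XWORK	; now X points at the variable
	LDD	0,X	; finally, the actual access
* With a positive offset and X already pointing at the frame,
* the same access would be a single instruction:
*	LDD	4,X
```

Note that XWORK is exactly the kind of temporary that has to be saved and restored if an interrupt routine wants to do the same sort of math.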

Now the reason we use negative offsets in the 6809 and 68000 code is that they can be kept constant, and the compiler (or programmer) doesn't need to specifically remember how many temporaries and parameters have been pushed/allocated while it/she/he is generating code -- only while setting up the frame. 

And if the frame is built by pushing initial values for variables, we barely have to remember it then. (Which may be why Motorola figured SBX was not necessary.)
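As a small illustration of building the locals by pushing their initial values, here is a minimal sketch that allocates two 16-bit local variables. The initial values are arbitrary and the "local 1" and "local 2" names are hypothetical.

```asm
* Hypothetical: allocate and initialize two 16-bit locals by pushing.
	LDX	#0
	PSHX		; local 1 = 0
	LDX	#$0100
	PSHX		; local 2 = $0100
	TSX		; X -> local 2 (last pushed, lowest address)
	LDD	0,X	; local 2
	LDD	2,X	; local 1, at a positive offset
```

No subtraction from the stack pointer is needed; the pushes do both the allocation and the initialization.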

Here's a cross-section of what we think we want the stack frame list to look like:

* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP}  ] for calling routine
* [PARAM   ] from calling routine
* [RETADR  ] to calling routine
* [FRMLNK  ] at entry to calling routine
* [LOCVAR  ] for called -- current -- routine
* [TEMP    ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call

Let's take a bit broader section and show the connections that we used for the 6809 and 68000:

* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ] 
* [FRMLNK2 ] <= FRMLNK3
* [LOCVAR2 ] 
* [TEMP2   ]
* [PARAM3  ]
* [RETADR2 ]
* [FRMLNK3 ] <= FP (frame pointer)
* [LOCVAR3 ]
* [TEMP3   ]
* [(PARAM4)] <= SP (return stack pointer)

With the 6809 and 68000, we could index downward from the pointer to the frame link -- from the frame pointer. So that was the way I constructed those.

With the 6800/1, we can't use negative constant offsets in indexed mode, only positive. 

If the compiler (or programmer) keeps track of how many bytes have been pushed and popped since entering the routine, it's actually no problem to add that many bytes to the offsets needed to reach the local variables, and to skip over the frame link and return address to the parameters. 

But it does make the compiler more complex, and it adds a step or two for the programmers, which provides more opportunity for mistakes and bugs of the sort that like to hide themselves until they can really bite hard.  
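Here is a rough sketch of that bookkeeping, using a hypothetical routine that takes one 16-bit parameter and keeps no frame link at all. The parameter's offset from X depends on how many bytes have been pushed since entry.

```asm
* Hypothetical routine entry: one 16-bit parameter, no frame link.
	TSX		; X -> return address; parameter is at 2,X
	LDD	2,X	; parameter
	PSHB		; push a 16-bit temporary (low byte first)
	PSHA
	TSX		; X is now 2 bytes lower
	LDD	4,X	; same parameter, but the offset is now 2+2
```

The ADD16S routine in the listing below shows the same effect: the operands are at 8,X and 6,X before the sign extensions are pushed, and at 12,X and 10,X afterward.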

It would be nice to have a second pseudo-register pointing to the local variables (Call it the variable base pointer, VBP?), but how could we maintain that? Specifically, how could the calling routine restore its variable base pointer after the called routine completes and returns? All we have to help us so far is knowing at compile time how many bytes of local variables, temporaries, and parameters we have to adjust the previous frame pointer by. That is not available at run-time unless we save it somewhere.

We could stack the offset from SP to VBP and do the math at runtime to reproduce the VBP, but that's run-time math. 

A better alternative would be to just push the VBP itself when we push the frame pointer. 

But either way ends up further fattening a stack already well-fattened by the stack frame overhead.

What we want is some way to combine the function of the frame pointer with the function of the variable base pointer without having to calculate offsets at run time. Again, that's why we liked the negative constant offsets on the 6809. We could let the CPU handle the calculations for us, and hide the address calculation time in the overall access overhead.

(You can see those calculation times in the 68000 and 68020. In the 68030 and beyond, they've added a lot of circuitry to do as much as possible of those calculations in parallel with whatever else the CPU is doing, which makes those processors significantly faster than the 68020 at the same CPU clock rates, even.)

Late last night, I was thinking that staging the linkage, moving VBP through FP on its way to the stack, would do the trick. But that's actually what we are doing with SP.

I'm not seeing any other alternatives. Either 

  • make the compiler or programmer track the number of bytes between the current SP and the first byte of the local variables in the stack; 
  • or push the base address of the local variables along with the frame pointer.

If you're going to saddle the programmer with the burden of maintaining the changing offsets anyway, what's the purpose in the discipline of maintaining a frame? It's precisely the burden of remembering what's on the stack that stack frames are supposed to "solve".

So, stack the pointer to the base address of the routine's local (dynamic) variables, too.

The single-stack example below relies on stacking the local variable base pointer along with the frame pointer. And you have to do it every time, or you have to remember that you didn't -- which is essentially stacking a flag, so why not just stack the VBP?

Or drag the entire compile-time analysis of the code with you to make it possible to run the compiled code? (Kind of like having to bury a link table in your object code just to run it.) Should every routine access its entry in the table of sizes of variable allocations when it terminates or something? Somehow, I suspect that's actually part of the code-bloat in modern code support libraries. I am not going to touch that here.

So note, in the comments in the source code, the frame structure that we are actually going to use. 

There are also a few places where I adjusted the code for the tools, such as my asm68c assembler not (presently) being able to declare blocks larger than 256 bytes with RMB.

[JMR202411141016 edit:]

Working through coding this for the 6800, I recognized I had left unnecessary code in a few places, and while I was checking that I hadn't screwed anything up removing it, discovered I had not quite completed the example. The corrections are just meaningful enough that I'm leaving the old code down below the end of the chapter.

[JMR202411141016 end edit.]

Be careful to read the comments in the code along with the code. Again, I'm giving the details of the discussion there. And watch out for when I forget to update the comments to match the code! (Read the code.)

* 16-bit addition as example of single-stack stack frame discipline on 6801,
* with test code
* Joel Matthew Rees, October 2024
*
	OPT	6801
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
FP	RMB	2	; frame pointer
VBP	RMB	2	; variable base pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
STKLIM	RMB	192	; roughly 16 to 20 levels of call
STKLIMX	EQU	FINALX+4
STKBAS	RMB	8	; for canary return
STKBASX	EQU	STKLIMX+192
STKFAK	RMB	2	; fake frame pointer, self-link
STKFAKX	EQU	STKBASX+8	; 6801 is post-dec (post-store-decrement) push
STKBMP	RMB	4	; a little bumper space
STKBMPX	EQU	STKFAKX+2	; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	STKBMPX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
INISTK	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE
	LDD	LB_BASE	
	ADDD	#HBASEX		; calculate EA
	STD	HPPTR		; as if we actually had a heap
	STD	HPALL
	LDD	#CDBASE
	SUBD	#4		; extra bumper
	STD	HPLIM
	LDD	LB_BASE
	ADDD	#STKBASX+2
	STD	FP	; initialize
	STD	VBP	; initialize
	LDX	FP
	STX	0,X	; self link
	ADDD	#6
	STD	6,X	; last self link
	STD	2,X	; error VARBP
	LDX	#STKUNDR	; error handler
	STX	XWORK
	LDD	XWORK
	LDX	FP
	DEX
	DEX
	STD	0,X	; last fake return to error handler
	STD	6,X	; first fake return to error handler
	PULA		; get the return address
	PULB
	STS	SSAVE	; Save what the monitor gave us.
	TXS		; move to our own stack
	PSHB
	PSHA
	RTS
*
***
* Since negative index offsets are so expensive,
* we want to create a stack frame with only positive offsets.
* And we want the frame pointer to be pushed after the call,
* on entry to the local context.
* And the saved frame pointer needs to link to the previous one.
* And when we restore the previous frame, 
* we need to be able to restore the previous frame base.
*
* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP}  ] for calling routine
* [PARAM   ] from calling routine
* [RETADR  ] to calling routine
* [VARBP   ] base of local variables in calling routine
* [FRMLNK  ] at entry to calling routine
* [LOCVAR  ] for called -- current -- routine
* [TEMP    ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ] 
* [VARBP1  ]
* [FRMLNK2 ] <= FRMLNK3
* [LOCVAR2 ] <= VARBP2
* [TEMP2   ]
* [PARAM3  ]
* [RETADR2 ]
* [VARBP2  ]
* [FRMLNK3 ] <= FP (frame pointer)
* [LOCVAR3 ] <= VBP (variable base pointer)
* [TEMP3   ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines
*
* Let the caller do allocation after.
LINKF	PULA		; get return address
	PULB
	LDX	VBP	; push frame base
	PSHX
	LDX	FP	; and link the frame in
	PSHX
	TSX		; set up new frame pointers
	STX	FP	; because we want to use the pointer at will
	STX	VBP	; link and allocate 0 complete
	PSHB		; put return address back
	PSHA
	RTS
*
* No return value
UNLKF	PULA		; get return address
	PULB
	LDX	FP	; deallocate
	TXS		; and unlink
	PULX		; restore previous
	STX	FP
	PULX
	STX	VBP
	PSHB		; restore return address
	PSHA
	RTS
*
*
* Stack after LINK and allocation
* when functions are called by MAIN
* with two parameters
* We will return result in D:X
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ]
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ] 
* [VARBP0  ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] 
* [VARBP1  ]
* [FRMLNK0 ] <= FP,SP,VBP
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left 1st pushed, right 2nd
* output parameter:
*   17-bit sum in 32-bit D:X D high, X low
* Does not alter the parameters.
ADD16S	JSR	LINKF
	TSX		; no local allocations
*
	LDAA	#(-1)	; prepare for sign extension
	TST	8,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLRA		; zero extend
ADD16SR	PSHA		; push left extension
	PSHA		; left sign cell below X now
	LDAA	#(-1)	; reload
	TST	6,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLRA		; zero extend
ADD16SL	PSHA		; push right extension
	PSHA
	TSX		; point to sign extensions
	LDD	12,X	; left-hand low cell
	ADDD	10,X	; right-hand low cell
	STD	XWORK	; save low half of result
	LDD	2,X	; left-hand extension
	ADCB	1,X	; right-hand extension
	ADCA	0,X
	STD	DWORK	; Save high half of result
*
	JSR	UNLKF	; drops temporaries
	LDX	XWORK	; get low half of result
	LDD	DWORK	; get high half of result
	RTS		; result is in D:X
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit D:X D high
ADD16U	JSR	LINKF
	TSX		; no local allocations
*
	LDD	8,X	; left
	ADDD	6,X	; right
	STD	XWORK	; save low half
	LDD	#0
	ADCB	#0
	STD	DWORK	; save carry bit in high half
*
	JSR	UNLKF	; drops temporaries
	LDX	XWORK	; get low half of result
	LDD	DWORK	; get high half of result
	RTS		; result is in D:X
*
* Etc.
*
***
*
* Stack after LINK #0 when functions are called by MAIN
* with one input parameter
* (#0 means no local variables)
* [<SELF>  ] <= <SELF>
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ]
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ] 
* [VARBP0  ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [RETADR1 ] 
* [VARBP1  ]
* [FRMLNK0 ] <= FP,SP,VBP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit addend
* target parameter in caller
*   2nd 32-bit variable at offset -2*NATWID
* no output parameter:
ADD16SI	JSR	LINKF
	TSX		; no local allocations
*
	LDAA	#(-1)
	TST	6,X	; high byte of parameter
	BMI	ADD16SIP
	CLRA
ADD16SIP	PSHA	; save the sign extension half
	PSHA
	LDX	2,X	; get caller's VBP
	LDD	2,X	; caller's 2nd variable, low
	LDX	FP
	ADDD	6,X	; parameter
	LDX	2,X	; caller's VBP
	STD	2,X	; save result low half away
	LDD	0,X	; caller's 2nd variable, high
	TSX
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	FP
	LDX	2,X
	STD	0,X	; save result high half away
*
	JSR	UNLKF	; drops temporaries 
	RTS		; no result to load
*
*
***
* Stack after LINK
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ] 
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ] 
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ] 
* [VARBP0  ]
* [FRMLNKX ] <= FP
* [32:VAR1_1]
* [32:VAR1_2] <= SP,VBP
*
MAIN	JSR	LINKF
	LDX	#0
	PSHX		; four pushes is only one byte more than a call. 
	PSHX
	PSHX
	PSHX
	TSX
	STX	VBP	; link and allocate complete
*
	LDX	#$1234	; parameters
	PSHX
	LDX	#$CDEF
	PSHX
	JSR	ADD16U	; result in D:X should be $E023
	INS	; could reuse instead of dropping
	INS
	INS
	INS
	PSHX
	LDX	#$8765
	PSHX
	JSR	ADD16S	; result in D:X should be $FFFF6788
	INS	; could reuse instead of dropping
	INS
	INS
	INS
	STX	XWORK
	LDX	VBP
	STD	0,X
	LDD	XWORK
	STD	2,X
	LDX	#$A5A5
	PSHX
	JSR	ADD16SI		; result in 2nd variable should be FFFF0D2D
	LDX	VBP		; get the result from our variable
	LDD	2,X		; low half
	LDX	LB_BASE		; store it in FINAL, in process local space
	STD	FINALX+2,X
	LDX	VBP
	LDD	0,X		; high half
	LDX	LB_BASE
	STD	FINALX,X
*
	JSR	UNLKF
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
* (who knows?) <= VBP
***
*
* Stack after initialization:
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ] 
* [<SELF>  ] <= <SELF>,FP,VBP
* [STKUNDR ]STKBAS <= SP
***
* Stack after LINK (at call to MAIN)
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ] 
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ] 
* [FRMLNKY=STKBAS+NATWID ] <= SP,FP,VBP
*
START	NOP
	JSR	INISTK
	NOP
*
	JSR	LINKF
*
	JSR	MAIN
*
	JSR	UNLKF
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	LDX	$FFFE	; alternatively, jmp through reset vector
	JMP	0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

I think I'm going to continue using the fake return technique to keep things better under control.

I have tested this code. It does run; it builds the stack frames and tears them down as advertised. And, as always, I will not guarantee that this code can be generalized. Nor will I guarantee that it can be generated by any real compiler.
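ADD16SI reaches one frame back, through the VBP saved in its own frame. Going one level further back would look something like the following sketch. It assumes the same frame layout, with the saved frame link at 0,X and the saved VBP at 2,X from FP, and a hypothetical routine called by a routine that was called by MAIN. It is not part of the tested code.

```asm
* Hypothetical: two calls deep below MAIN, fetch the low half
* of MAIN's 2nd 32-bit variable by walking the frame links.
	LDX	FP	; X -> our frame link cell
	LDX	0,X	; saved FP: X -> caller's frame link cell
	LDX	2,X	; VBP saved in caller's frame = MAIN's VBP
	LDD	2,X	; low half of the variable at that base
```

Each additional level is one more LDX 0,X, which is the point of keeping the frame links chained.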

[JMR202411182142 edit:] 

Just realized, while working on this for the 6800, that the name for SUB16SI did not agree with what it is doing. So I'm fixing it, calling it ADD16SI, instead.

[JMR202411182142 edit end.]

I am going to post the split-stack stack frame version for comparison, but this has gotten so long that it really needs to be in a separate post. Also, I'm pretty sure you'll want to compare this with that, side-by-side, in separate browser windows. The differences become that obvious.

As a reminder, we've already seen what this kind of code looks like without stack frames.

Once I get the split-stack version of this code up (It's up now.), I'll convert it to the 6800. If you're getting worn out, go ahead and move on to getting numeric output in binary.



 

[JMR202411141016 old code version:]

* 16-bit addition as example of single-stack stack frame discipline on 6801,
* with test code
* Joel Matthew Rees, October 2024
*
	OPT	6801
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
FP	RMB	2	; frame pointer
VBP	RMB	2	; variable base pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
STKLIM	RMB	192	; roughly 16 to 20 levels of call
STKLIMX	EQU	FINALX+4
STKBAS	RMB	8	; for canary return
STKBASX	EQU	STKLIMX+192
STKFAK	RMB	2	; fake frame pointer, self-link
STKFAKX	EQU	STKBASX+8	; 6801 is post-dec (post-store-decrement) push
STKBMP	RMB	4	; a little bumper space
STKBMPX	EQU	STKFAKX+2	; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	STKBMPX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
INISTK	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE
	LDD	LB_BASE	
	ADDD	#HBASEX		; calculate EA
	STD	HPPTR		; as if we actually had a heap
	STD	HPALL
	LDD	#CDBASE
	SUBD	#4		; extra bumper
	STD	HPLIM
	LDD	LB_BASE
	ADDD	#STKBASX+2
	STD	FP	; initialize
	STD	VBP	; initialize
	LDX	FP
	STX	0,X	; self link
	ADDD	#6
	STD	6,X	; last self link
	STD	2,X	; error VARBP
	LDX	#STKUNDR	; error handler
	STX	XWORK
	LDD	XWORK
	LDX	FP
	DEX
	DEX
	STD	0,X	; last fake return to error handler
	STD	6,X	; first fake return to error handler
	PULA		; get the return address
	PULB
	STS	SSAVE	; Save what the monitor gave us.
	TXS		; move to our own stack
	PSHB
	PSHA
	RTS
*
***
* Since negative index offsets are so expensive,
* we want to create a stack frame with only positive offsets.
* And we want the frame pointer to be pushed after the call,
* on entry to the local context.
* And the saved frame pointer needs to link to the previous one.
* And when we restore the previous frame, 
* we need to be able to restore the previous frame base.
*
* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP}  ] for calling routine
* [PARAM   ] from calling routine
* [RETADR  ] to calling routine
* [VARBP   ] base of local variables in calling routine
* [FRMLNK  ] at entry to calling routine
* [LOCVAR  ] for called -- current -- routine
* [TEMP    ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ] 
* [VARBP1  ]
* [FRMLNK2 ] <= FRMLNK3
* [LOCVAR2 ] <= VARBP2
* [TEMP2   ]
* [PARAM3  ]
* [RETADR2 ]
* [VARBP2  ]
* [FRMLNK3 ] <= FP (frame pointer)
* [LOCVAR3 ] <= VBP (variable base pointer)
* [TEMP3   ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines
*
* Let the caller do allocation after.
LINKF	PULA		; get return address
	PULB
	LDX	VBP	; push frame base
	PSHX
	LDX	FP	; and link the frame in
	PSHX
	TSX		; set up new frame pointers
	STX	FP	; because we want to use the pointer at will
	STX	VBP	; link and allocate 0 complete
	PSHB		; put return address back
	PSHA
	RTS
*
* No return value
UNLKF	PULA		; get return address
	PULB
	LDX	FP	; deallocate
	TXS		; and unlink
	PULX		; restore previous
	STX	FP
	PULX
	STX	VBP
	PSHB		; restore return address
	PSHA
	RTS
*
*
* Stack after LINK and allocation
* when functions are called by MAIN
* with two parameters
* We will return result in D:X
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ]
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ] 
* [VARBP0  ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] 
* [VARBP1  ]
* [FRMLNK0 ] <= FP,SP,VBP
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left 1st pushed, right 2nd
* output parameter:
*   17-bit sum in 32-bit D:X D high, X low
* Does not alter the parameters.
ADD16S	LDX	VBP
	JSR	LINKF
	TSX		; no local allocations
*
	LDAA	#(-1)	; prepare for sign extension
	TST	8,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLRA		; zero extend
ADD16SR	PSHA		; push left extension
	PSHA		; left sign cell below X now
	LDAA	#(-1)	; reload
	TST	6,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLRA		; zero extend
ADD16SL	PSHA		; push right extension
	PSHA
	TSX		; point to sign extensions
	LDD	12,X	; left-hand low cell
	ADDD	10,X	; right-hand low cell
	STD	XWORK	; save low half of result
	LDD	2,X	; left-hand extension
	ADCB	1,X	; right-hand extension
	ADCA	0,X
	STD	DWORK	; Save high half of result
*
	JSR	UNLKF	; drops temporaries
	LDX	XWORK	; get low half of result
	LDD	DWORK	; get high half of result
	RTS		; result is in D:X
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit D:X D high
ADD16U	LDX	VBP
	JSR	LINKF
	TSX		; no local allocations
*
	LDD	8,X	; left
	ADDD	6,X	; right
	STD	XWORK	; save low half
	LDD	#0
	ADCB	#0
	STD	DWORK	; save carry bit in high half
*
	JSR	UNLKF	; drops temporaries
	LDX	XWORK	; get low half of result
	LDD	DWORK	; get high half of result
	RTS		; result is in D:X
*
* Etc.
*
***
*
* Stack after LINK #0 when functions are called by MAIN
* with one input parameter
* (#0 means no local variables)
* [<SELF>  ] <= <SELF>
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ]
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ] 
* [VARBP0  ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [RETADR1 ] 
* [VARBP1  ]
* [FRMLNK0 ] <= FP,SP,VBP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit addend
* target parameter in caller
*   2nd 32-bit variable at offset -2*NATWID
* no output parameter:
SUB16SI	LDX	VBP
	JSR	LINKF
	TSX		; no local allocations
*
	LDAA	#(-1)
	TST	6,X	; high byte of parameter
	BMI	SUB16SIP
	CLRA
SUB16SIP	PSHA	; save the sign extension half
	PSHA
	LDX	2,X	; get caller's VBP
	LDD	2,X	; caller's 2nd variable, low
	LDX	FP
	ADDD	6,X	; parameter
	LDX	2,X	; caller's VBP
	STD	2,X	; save result low half away
	LDD	0,X	; caller's 2nd variable, high
	TSX
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	FP
	LDX	2,X
	STD	0,X	; save result high half away
*
	JSR	UNLKF	; drops temporaries 
	RTS		; no result to load
*
*
***
* Stack after LINK
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ] 
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ] 
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ] 
* [VARBP0  ]
* [FRMLNKX ] <= FP
* [32:VAR1_1]
* [32:VAR1_2] <= SP,VBP
*
MAIN	LDX	VBP
	JSR	LINKF
	LDX	#0
	PSHX		; four pushes is only one byte more than a call. 
	PSHX
	PSHX
	PSHX
	TSX
	STX	VBP	; link and allocate complete
*
	LDX	#$1234	; parameters
	PSHX
	LDX	#$CDEF
	PSHX
	JSR	ADD16U	; result in D:X should be $E023
	INS	; could reuse instead of dropping
	INS
	INS
	INS
	PSHX
	LDX	#$8765
	PSHX
	JSR	ADD16S	; result in D:X should be $FFFF6788
	INS	; could reuse instead of dropping
	INS
	INS
	INS
	STX	XWORK
	LDX	VBP
	STD	0,X
	LDD	XWORK
	STD	2,X
	LDX	#$A5A5
	PSHX
	JSR	SUB16SI		; result in 2nd variable should be FFFF0D2D
	LDX	VBP		; get the result from our variable
	LDD	2,X		; low half
	LDX	LB_BASE		; store it in FINAL, in process local space
	STD	FINALX+2,X
	LDX	VBP
	LDD	0,X		; high half
	LDX	LB_BASE
	STD	FINALX,X
*
	JSR	UNLKF
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
* (who knows?) <= VBP
***
*
* Stack after initialization:
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ] 
* [<SELF>  ] <= <SELF>,FP,VBP
* [STKUNDR ]STKBAS <= SP
***
* Stack after LINK (at call to MAIN)
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ] 
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ] 
* [FRMLNKY=STKBAS+NATWID ] <= SP,FP,VBP
*
START	NOP
	JSR	INISTK
	NOP
*
	LDX	VBP	; mark
	PSHX
	LDX	FP
	PSHX
	TSX	; link
	STX	FP
	STX	VBP
*
	JSR	MAIN
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	LDX	$FFFE	; alternatively, jmp through reset vector
	JMP	0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

[JMR202411141016 end old code version.]

 

 
