Tuesday, November 12, 2024

ALPP 02-26 -- Walking the Pontoons -- Split-stack Stack Frame Example: 6801

And this one has been sitting at the bottom of the pool for a while, as predicted, even longer than the 6809 example.

  Walking the Pontoons --
Split-stack Stack Frame Example:
6801

(Title Page/Index)

Having worked through a concrete single-(combined)-stack stack frame example for the 6801, let's look at whether the split stack discipline/paradigm improves things (as I think it should).

In the split stack discipline, we no longer need frame links. Each frame pointer is simply stacked up like the VBP was in the single stack example. Walking from frame to frame is simply moving back in the return stack -- since we've decided to store the frame pointers there in this split stack discipline.

We decided this?

Okay, I decided it.

Here are some of my reasons -- 

  • we want linkage on one stack and parameter on the other. 
  • And one of the side benefits is that it's no longer a linked list, it's a stack of pointers, really easy to walk.

We can now simply point to the actual bottom of the fixed part of each frame, instead of the frame link.

And that means that local references can now be positive offset from the frame pointer, thanks to the natural structure in logic, reason, and mathematics -- the structure in the universe, if you will allow me, that either God or the initial conditions provided us with.

We might still wish we had the SBX instruction. It would make allocation without initialization more straightforward in many cases. But the Double accumulator and adding negatives can help get us over that. Sort-of.

As with the 6809 and 68000,  the return stack will always be in pairs.

However, rather than pushing the previous frame pointer immediately, we'll have the called routine push the frame pointer after allocating the local variables.

And it will be pushed in order of return address, caller frame pointer:

* [PRETADR   ]
* [PCALLERFRM]
* [RETADR    ]
* [CALLERFRM ] <= RSP

The parameter stack is just whatever is out there, but the conceptual order would be 

* [VARIABLES  ] <= CALLERFRM
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES  ] <= FP
* [TEMPORARIES]
* [PARAMETERS ] <= PSP

The difference here with the 6809 and 68000 implementations is that, instead of relying on negative offsets to access the variables as is the case there, we will move the frame pointer to the bottom of the variables after allocating them. Variables and parameters passed in will be at known positive offsets from the frame pointer.

The frame pointer we pushed before allocation was the frame pointer of the calling routine, which is as it should be. This will not allow us reliable (easy) means of accessing the calling function's temporary variables, but we shouldn't want to access them anyway.

And we note again that we no longer need to dodge around the return pointer and the saved stack frame pointer because they are on the other stack (where they should be).

You may be wondering about the local variable allocation and initialization, since the 6801 provides no native push instruction for the parameter stack, and the clean method we used in-line in the combined frames example:

	LDX	#0	; 3:  3#	3:  3~
	PSHX		; 1:  4 	4:  7
	PSHX		; 1:  5 	4: 11
	PSHX		; 1:  6 	4: 15
	PSHX		; 1:  7 	4: 21

is no longer available. 

We might want to use this, instead:

	LDX	#0
	JSR	PPSHD
	JSR	PPSHD
	JSR	PPSHD
	JSR	PPSHD

but it comes at a time cost:

PPOPD	LDX	PSP	; 2:  2#	4:  4~
	LDD	0,X	; 2:  4 	5:  9
	INX		; 1:  5 	3: 12
	INX		; 1:  6 	3: 15
	STX	PSP	: 2:  8 	4: 19
	RTS		: 1:  9 	5: 24 (+3#, +6~ call)
*
PPSHD	LDX	PSP	; 2:  2 	4:  4
	DEX		; 1:  3 	3:  7
	DEX		; 1:  4 	3: 10
	STX	PSP	: 2:  6		4: 14
	STD	0,X	: 2:  8 	5: 19
	RTS		: 1:  9 	5: 24 (+3#, +6~ call)

Four calls to PPSHD is going to take 120 cycles (plus the load Double accumulator), six times what the PSHX costs. That hurts.

There is a lot of redundant loading and storing of PSP going on in those routines, not to mention the basic call overhead -- the JSR and RTS instructions. 

Would doing it in a loop help?

* count in low byte of DWORD
ALIND0	LDD	#0	; 3:  3#	3:  3~
* count in pseudo-register BCOUNT,
* initial value in D
ALINDC	LDX	PSP	; 2:  2#	4:  4~
ALINDL	STD	0,X	; 2:  4 	5:  9
	DEX		; 1:  5 	3: 12
	DEX		; 1:  6 	3: 15
	DEC	BCOUNT	; 3:  9 	6: 21  (pseudo-register, but extended mode)
	BNE	ALINDL	; 2: 11 	3: 24 (20 * count + 4 => count 4=>84)
	STX	PSP	; 2: 13 	4: (84)+4
	RTS		; 1: 14 	5: (84)+9 (+3#, +6~ call)

That's better than our four calls to PPSHD, but still over 4 times the 4 PSHXes. And, in order to clear 16 bits at a time, I'm using a pseudo-register -- BCOUNT. 

And we are reminded that the 6801 does not have direct-page addressing modes for the read-modify-write (unary operator) instructions, so BCOUNT will be addressed in extended (absolute) mode. 

(Someday, I want to use programmed logic to implement a 6801 source-code compatible CPU with the op-codes moved around a bit, add SBX, and add direct-page mode op-codes for the read-modify-write -- unary -- instructions. It wouldn't save all that much time in the above loop, but I would also add address space decoding, so that the direct page could be popped out of the absolute address space, and you could do fancy things like bank-switch internal dual-ported RAM in the direct page, speeding access further and allowing really fast process context switches. Heh. Dreams.)

If we are really clever, we can line it all up and provide routines for allocating and clearing up  to four 16-bit local variables like this:

ALCL8	LDD	#0	; 3:  3#	3:  3~
	LDX	PSP	; 2:  5 	4:  7 
	DEX		; 1:  6 	3: 10
	DEX		; 1:  7 	3: 13
	STD	0,X	; 2:  9 	5: 18
ALCLI6	DEX		; 1: 10 	3: 21
	DEX		; 1: 11 	3: 24
	STD	0,X	; 2: 13 	5: 29
ALCLI4	DEX		; 1: 14 	3: 32
	DEX		; 1: 15 	3: 35
	STD	0,X	; 2: 17 	5: 40
ALCLI2	DEX		; 1: 18 	3: 43
	DEX		; 1: 19 	3: 46
	STD	0,X	; 2: 21 	5: 51
	STX	PSP	; 2: 23 	4: 55
	RTS		; 1: 24 	5: 60
*
ALCL6	LDD	#0	; 3:  3#	 3:  3~
	LDX	PSP	; 2:  5 	 4:  7
	BRA	ALCLI6	; 2:  7 	 3: 10
*			; 0:  7 	42: 52
*
ALCL4	LDD	#0	; 3:  3#	 3:  3~
	LDX	PSP	; 2:  5 	 4:  7
	BRA	ALCLI4	; 2:  7 	 3: 10
*			; 0:  7 	31: 41
*
ALCL2	LDD	#0	; 3:  3#	 3:  3~
	LDX	PSP	; 2:  5 	 4:  7 
	BRA	ALCLI4	; 2:  7 	 3: 10
*			; 0:  7 	20: 30
That's half the cycle count (not including the call) of the four calls to PPSHD, but still around three times doing the 8 bytes with four PSHX instructions. It is an improvement.

What's the fastest possible way on the 6801 short of doing dangerous (because of interrupts) things like quickly swapping the return stack pointer in for the parameter stack pointer and swapping it out when done? 

(Never do that. If you swap out the stack pointer, you will not avoid interrupts, no matter how hard you try, and your interrupts will save registers all over things. I mean, it is possible to physically strap all interrupt inputs high, but then you have the devil of a time interfacing with the real world. Keep your return stack valid at all times.)

If we replace all those DEXs with direct calculation, we can trim it up a bit, like this:

ALCLD8	LDD	#-8	; 3:  3#	3:  3~
	ADDD	PSP	; 2:  5 	4:  7
	STD	PSP	; 2:  7 	4: 11
	LDX	PSP	; 2:  9 	4: 15
	CLRB		; 1: 10 	2: 17
	STD	0,X	; 2: 12 	5: 22
	STD	2,X	; 2: 14 	5: 27
	STD	4,X	; 2: 16 	5: 32
	STD	6,X	; 2: 18 	5: 37
	STX	PSP	; 2: 20 	4: 41
	RTS		; 1: 21 	5: 46

But there's no reusable code in there, other than that the whole routine can be used as a subroutine anywhere you need to allocate and clear 8 bytes on the parameter stack.

I'm not going to show all the approaches I've checked, but that's the best for time that I can figure out. The down side is that I can't figure out how to steal code from it, so it has to be mostly recreated for the 6 byte and 4 byte case. The 2 byte case you're just going to want to in-line. But it's always faster.

On the other hand, if you need to trim up the object code size, we do have a sort-of decent option in the DEX chain.

Doesn't this de-optimization of stack allocation have negative consequences for the split stack discipline?

Yes, but I've found, at least in my range of experience, that the negatives are not nearly as negative as they might seem, and the benefits outweigh the disadvantages.

So, the code --

As always, read the code, read the comments and be careful because sometimes the code and the comments are out-of-sync. (You can be sure the comments are closer to the code than any detailed explanation I could give in the blog.)

* 16-bit addition as example of split-stack stack frame discipline on 6801
* using the direct page,
* with test code
* Joel Matthew Rees, October 2024
*
	OPT	6801
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
ALCOUNT	RMB	1	; for allocation utility routines
	RMB	1	; reserved
PSP	RMB	2	; parameter stack pointer
FP	RMB	2	; frame pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
SSTKLIM	RMB	64	; 16 levels of call
SSTKLMX	EQU	FINALX+4
SSTKBAS	RMB	8	; for canary return
SSTKBSX	EQU	SSTKLMX+64
SSTKFAK	RMB	2	; fake frame pointer, self-link
SSTKFAX	EQU	SSTKBSX+8	; 6801 is post-dec (post-store-decrement) push
SSTKBMP	RMB	4	; a little bumper space
SSTKBMX	EQU	SSTKFAX+2	; But we are going to init S through X
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKLMX	EQU	SSTKBMX+4
PSTKBAS	RMB	4	; bumper space -- parameter stack is pre-dec
PSTKBSX	EQU	PSTKLMX+64
PSTKBMP	RMB	4	; a little bumper space
PSTKBMX	EQU	PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	PSTKBMX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*
INISTKS	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE
	LDD	LB_BASE	
	ADDD	#HBASEX		; calculat EA
	STD	HPPTR		; as if we actually had a heap
	STD	HPALL
	LDD	#CDBASE
	SUBD	#4		; extra bumper
	STD	HPLIM
	LDD	LB_BASE
	ADDD	#PSTKBSX	; calculate the address
	STD	PSP	; initialize parameter stack pointer empty
	STD	FP	; initialize frame pointer
	LDX	#SSTKNDR	; error handler
	STX	DWORK	; save it for a bit (not ready for stack)
	LDD	LB_BASE
	ADDD	#SSTKBSX	; calculate the address
	STD	XWORK	; move it to X (not ready fof stack)
	LDX	XWORK
	LDD	DWORK	; prime the stack with underflow error handler
	STD	0,X
	STD	4,X
	LDD	PSP	; and empty parameter stack frame
	STD	2,x	; stacks primed
	PULA		; get the return address
	PULB
	STS	SSAVE	; Save what the monitor gave us.
	TXS		; move to our own stack
	PSHB
	PSHA
	RTS
*
*
***
* General structure of the stacks, 
*
* return stack is always in pairs:
* [PRETADR   ]
* [PCALLERFRM]
* [RETADR    ]
* [CALLERFRM ] <= RSP
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES  ] <= CALLERFRM
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES  ] <= FP
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
* Utility routines
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
* subroutine to make sure we don'T forget anything
MARK	PULA
	PULB
	LDX	FP
	PSHX		; mark, no allocate,
	LDX	PSP
	STX	FP
	PSHB
	PSHA
	RTS
*
*UNMK	PULA
*	PULB
*	PULX
*	STX	FP
*	PSHB
*	PSHA
*	RTS
*
* Compromise between speed and reusability
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8	LDD	#0	; 3:  3#	3:  3~
* Enter here with initial value in D
ALCLD8	LDX	PSP	; 2:  5 	4:  7 
* Enter here with PSP loaded and initial value in D
ALCLI8	DEX		; 1:  6 	3: 10
	DEX		; 1:  7 	3: 13
	STD	0,X	; 2:  9 	5: 18
ALCLI6	DEX		; 1: 10 	3: 21
	DEX		; 1: 11 	3: 24
	STD	0,X	; 2: 13 	5: 29
ALCLI4	DEX		; 1: 14 	3: 32
	DEX		; 1: 15 	3: 35
	STD	0,X	; 2: 17 	5: 40
ALCLI2	DEX		; 1: 18 	3: 43
	DEX		; 1: 19 	3: 46
	STD	0,X	; 2: 21 	5: 51
	STX	PSP	; 2: 23 	4: 55
	RTS		; 1: 24 	5: 60
*
* six bytes
ALCL6	LDD	#0	; 3:  3#	 3:  3~
ALCLD6	LDX	PSP	; 2:  5 	 4:  7
	BRA	ALCLI6	; 2:  7 	 3: 10
*			; 0:  7 	42: 52
* four bytes
ALCL4	LDD	#0	; 3:  3#	 3:  3~
ALCLD4	LDX	PSP	; 2:  5 	 4:  7
	BRA	ALCLI4	; 2:  7 	 3: 10
*			; 0:  7 	31: 41
* two bytes
ALCL2	LDD	#0	; 3:  3#	 3:  3~
	LDX	PSP	; 2:  5 	 4:  7 
	BRA	ALCLI4	; 2:  7 	 3: 10
*			; 0:  7 	20: 30
*
*
* Add D to PSP -- negative for allocation, positive for deallocation
ADDPSP	ADDD	PSP	; plus 3 bytes for load: 6 bytes vs. 9 total
	STD	PSP
	LDX	PSP	; 3 bytes for call vs. 6 bytes in-line
	RTS
*
PDROP_8	LDAB	#8	; saves two bytes, 7 vs. 3
* deallocate count in B
PDROPB	LDX	PSP	; 5 bytes to deallocate in-line
	ABX		; vs. 3 bytes to call this.
	STX	PSP	; ABX is useful for deallocation
	RTS		; 5 bytes vs. 7 total
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry, after link:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ]
* [FRMPTR0==<EMPTYP>]
* [RETADR1 ]
* [FRMPTR1 ] <= RSP
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after mark (no local allocation)
* [<unknown>] <= FRMPTR0,FRMPTR1
* [32:VAR1_1--]
* [32:VAR1_2--] <= FRMPTR1
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP,FP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit
ADD16S	JSR	MARK	; mark, no allocate, X is PSP
*
	LDD	#(-1)	; default negative
	JSR	ALCLI4	; allocate 2 temporary cells and init
	LDX	FP	; 
	TST	2,X	; the left-hand operand sign bit
	BMI	ADD16SR
	LDX	PSP
	CLR	2,X	; positive
	CLR	3,X
ADD16SR	LDX	FP
	TST	0,X	; the right-hand operand sign bit
	BMI	ADD16SL
	LDX	PSP
	CLR	0,X	; positive
	CLR	1,X
ADD16SL	LDX	FP
	LDD	2,X	; left hand 
	ADDD	0,X	; right hand
	STD	2,X	; store low half
	LDX	PSP
	LDD	2,X
	ADCB	1,X
	ADCA	0,X
	LDX	FP	; wouldn't need to do this if we tracked PSP extras
	STD	0,X
*
	STX	PSP	; drop the temporaries
	PULX
	STX	FP
	RTS
*
* The alternative without link, mark, or restore will be shown in the no-frame case.
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right in 32-bit
* output parameter:
*   17-bit sum in 32-bit D1
ADD16U	JSR	MARK	; mark, no allocate, X is PSP
*
	LDX	FP
	LDD	2,X	; left
	ADDD	0,X	; add right
	STD	2,X	; save low
	LDD	#0	; extend
	ROLB		; extend Carry unsigned (could ADC #0)
	STD	0,X	; re-use right side to store high half
*
	PULX		; restore FP
	STX	FP
	RTS
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with one 16-bit parameters,
* after mark (no local allocation)
* [<unknown>  ] <= FRMPTR0
* [32:VAR1_1  ]
* [32:VAR1_2  ] <= FRMPTR1
* [16:PARAM2_1] <= PSP,FP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit addend in 32-bit
* target parameter in caller
*   2nd 32-bit variable at offset -2*NATWID
* no output parameter:
ADD16SI	JSR	MARK
*
	LDD	#(-1)	; make a temporary -1
	JSR	ALCLI2	; (default to signed)
	LDX	FP
	TST	0,X	; test high byte
	BMI	ADD16SP
	LDX	PSP
	CLR	0,X	; zero extend
	CLR	1,X
ADD16SP	TSX
	LDX	0,X	; caller's FP
	LDD	2,X	; caller's 2nd variable, low
	LDX	FP
	ADDD	0,X	; parameter
	TSX
	LDX	0,X
	STD	2,X	; update low half with result
	LDD	0,X	; 2nd variable, high half
	LDX	PSP
	ADCB	1,X	; sign extension half
	ADCA	0,X
	TSX
	LDX	0,X
	STD	0,X	; update high half
*
	LDX	FP
	INX		; drop parameter
	INX	
	STX	PSP	; and sign temporary goes bye-bye, too
	PULX
	STX	FP
	RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ] <= RSP
*
* Return stack after link:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ]
* [FRMPTR0==<EMPTYP>] <= RSP
*
* Parameter stack after mark and local allocation
* [<unknown>] <= FRMPTR0
* [VAR1_1--]
* [VAR1_2--] <= PSP,FP
*
MAIN	JSR	MARK
	JSR	ALCL8	; allocate and clear 8 bytes
	STX	FP	; Point FP to base of local variables.
*
	LDD	#$1234
	JSR	PPSHD
	LDD	#$CDEF
	JSR	PPSHD
	JSR	ADD16U	; 32-bit result on parameter stack should be $0000E023
	LDD	#$8765
	LDX	PSP	; reuse parameter space, since order is okay
	STD	0,X
	JSR	ADD16S	; result on parameter stack should be $FFFF6788
	LDX	PSP
	LDD	2,X	; result low half
	LDX	FP
	STD	2,X	; to 2nd local variable low half
	LDX	PSP
	LDD	0,X	; result high half
	LDX	FP
	STD	0,X	; to 2nd local variable high half
	LDD	#$A5A5	
	JSR	PPSHD
	JSR	ADD16SI	; result in 2nd variable should be FFFF0D2D (Carry set)
	LDX	FP
	LDD	2,X	; 2nd variable low half
	LDX	LB_BASE
	STD	FINALX+2,X
	LDX	FP
	LDD	0,X
	LDX	LB_BASE
	STD	FINALX,X
*
	JSR	PDROP_8
	PULX
	STX	FP	; restore FP	
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
*
***
* Return stack will always be in pairs:
* [RETADRNN  ]
* [CALLERFMNN]
*
* Return stack after initialization:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS <= RSP
*
* Return stack after saving previous mark:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>] <= RSP
*
* Parameter stack after initialization, mark:
* [<unknown]PSTKBAS <= PSP,FP==<EMPTYP>
*
START	JSR	INISTKS
	LDX	PSP
	PSHX	; empty previous mark
	STX	FP	; empty new mark
*
	JSR	MAIN
*
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
SSTKNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP
	NOP		; another landing pad to set breakpoint at
	NOP
	LDX	$FFFE
	JMP	0,X	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

Again, I have tested this code. It builds the stack frames and tears them down as advertised and gets the right result in the right place. Again, I will not guarantee that this code can be generalized, whether by hand or by automaton (compiler, etc.).

[JMR202411182142 edit:] 

Just realized, while working on this for the 6800, that the name for SUB16SI did not agree with what it is doing. So I'm fixing it, calling it ADD16SI, instead.

[JMR202411182142 edit end.] 

Again, I remind you that we have seen what this kind of code looks like without stack frames. I may come back and strip the stack frames from this specific example, just to be obnoxious, but we need to look at both kinds of stack frames on the 6800, single stack, first.

If you don't want to wait, move on to getting numeric output in binary.

[JMR202411290957 addendum:]

I've mentioned several times how I'd have liked a subtract B from X SBX instruction in the 6801. We might have put it to good use it here, as well.

Another thing that would be useful, if we must have stack frames, would be a version of the JSR call instruction(s) that pushes both the address of the next instruction and the 2-byte frame pointer variable from the direct page, with a corresponding return that pops both the PC and the frame pointer, something like 

	CALL	<address>:PSP

where PSP is the direct page variable. 

The return might look like

	RET	PSP

That would be asking for too much, really, but it would streamline the split-stack discipline. 

It could also help with single-stack stack frames. 

(Sigh. Daydreams.)

[JMR202412152250: I wasted a few brain cycles on more meanderings on extending the 6801, here.

[JMR202411290957 addendum end.]

(Title Page/Index)


 

 

 

 

No comments:

Post a Comment