Saturday, August 31, 2024

ALPP 02-02 -- Foothold! (Split Stacks ... barely on the Beach) -- 6801

The 6801 does not seem to have a lot to add for improving our foothold text output routines on the 6800, but it does offer a little, and what it offers is important. You might not think it enough to warrant a separate chapter, but the last chapter was a bit long and deep, and this will make a good review of the 6801. And ...

... as I have said, one major purpose of the split stack is to separate return addresses from parameters and local variables, producing somewhat more robust, less difficult to debug code. 

And, you probably won't believe me yet, but the split stack also simplifies and streamlines the call interface for the subroutines and procedures you write. Promise. It's not quite visible yet, but you shall see it soon.

Maybe not quite yet, but soon. 

The 6801 doesn't really give us anything to improve our 6800 version of OUTC that handles the stack directly:
OUTC	LDX	PSP	; get the parameter stack pointer
	LDAA	1,X	; get the low byte where the EXbug's 7-bit character should be.
	INX		; drop the passed character off the stack
	INX
	STX	PSP	; update the stack pointer
	JSR	XOUTCH	; output A via monitor ROM
	RTS

... unless you want to save X across the use of the routine, of course, but that slightly breaks our promise not to use the return address stack for other purposes:

OUTC	PSHX
	LDX	PSP	; get the parameter stack pointer
	LDAA	1,X	; get the low byte where the EXbug's 7-bit character should be.
	INX		; drop the passed character off the stack
	INX
	STX	PSP	; update the stack pointer
	JSR	XOUTCH	; output A via monitor ROM
	PULX
	RTS

It does, however, give us some minor improvements to the parameter push and pop:

PPOPD	LDX	PSP
	LDD	0,X	; We can do both A & B in one instruction now.
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X	; Double accumulator here, too -- both at once.
	RTS

Here, too, if we wanted to make the push and pop less intrusive to the use of X, we could use the 6801's PSHX and PULX, again slightly breaking our promise not to use the return stack for anything but return addresses:

PPOPD	PSHX
	LDX	PSP
	LDD	0,X	; We can do both A & B in one instruction now.
	INX
	INX
	STX	PSP
	PULX
	RTS
*
PPUSHD	PSHX
	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X	; Double accumulator here, too -- both at once.
	PULX
	RTS

No changes in OUTC using PPUSHD and PPOPD. We'll have it call our slightly improved push and pop, but you can't tell from OUTC.

OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	JSR	XOUTCH	; output via monitor ROM
	RTS

The code that calls PPUSHD might not change at all:

	CLRA	; 1 byte, 2 cycles
	LDAB	#'H	; 2 bytes, 2 cycles
	JSR	PPUSHD
	JSR	OUTC

Or it might, since LDD loads both halves of D in the same three bytes and one cycle less:

	LDD	#'H	; 3-bytes, 3 cycles
	JSR	PPUSHD
	JSR	OUTC

If you've got two monitors, you may want to open up the 6800 code on the other one so you can compare the code here with the code there. Even if you have only one monitor, you may want to use separate browser windows.

There are only minor changes in the overall program (here, not saving X across the parameter pushes and pops):
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
* (But this example only uses two levels.)
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	PULX		; 6801 lets us do this -- return address in X
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
*
OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	JSR	XOUTCH	; output via monitor ROM
	RTS
*
*
START	JSR	INISTKS
	LDD	#'H
	JSR	PPUSHD
	JSR	OUTC
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

This should run, but, as we discovered in the last set of examples, it's going to be hard to test on EXORsim6801. (It should be testable on, say, XRoar's MC-10 emulation, if you have already figured out how to use that with assembly language. I plan on talking about XRoar later.)

So, let's modularize the terminal initialization and the pause and add them to this code and see what happens with EXORsim6801:

* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
* (But this example only uses two levels.)
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	PULX		; 6801 lets us do this -- return address in X
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
*
OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	JSR	XOUTCH	; output via monitor ROM
	RTS
*
* Output a newline combination
OUTNEWL	LDD	#CR
	JSR	PPUSHD
	JSR	OUTC
	LDD	#LF
	JSR	PPUSHD
	JSR	OUTC
	RTS
*
* Puts out a screen full of newlines to get the terminal's attention.
* Using the simpler run-time model, lots of pushes and pops:
INITRM	LDD	#40	; screen full of lines
	JSR	PPUSHD
INITRL	BSR	OUTNEWL
	LDX	PSP
	DEC	1,X	; the count
	BNE	INITRL
	JSR	PPOPD	; drop the count
	RTS
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE	LDAA	#5	; adjust this for your workstation.
	CLRB
	LDX	#0
PLOOP	DEX		; 3~
* If not 0 after dec, go back and dec again
	BNE	PLOOP	; 3~, tot 6 : 65536 times => 393216~ 
	DECB		; 2~, tot 393218~
	BNE	PLOOP	; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
	DECA		; 100664578~
	BNE	PLOOP	; 100664581~ : times 5 => 503322905
	RTS		; total of about 10 seconds on my WS
*
START	JSR	INISTKS
	JSR	INITRM	; get the terminal's attention
*
	LDD	#'H
	JSR	PPUSHD
	JSR	OUTC
*
	JSR	OUTNEWL	; make a clear line for the breakpoint output
	JSR	PAUSE
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

This has been lightly tested and it works. Give it a try.

You'll note that, just before the pause, I've added a call to the new line routine. Without that, the line buffering for the terminal emulation in the older EXORsim seems to hold onto the output character until it gets swallowed in the traceback.

You may be thinking that this code should be optimizable by quite a bit. It really is not worth optimizing, in terms of the code, but walking through optimization to show what can be done while keeping it readable and clean is a useful exercise in understanding both the CPU and the optimization processes. Be careful to read the comments.

* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
* (But this example only uses two levels.)
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	PULX		; 6801 lets us do this -- return address in X
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
*
* Let's handle the parameter stack directly:
OUTC	LDX	PSP
	LDAA	1,X	; Get the character in A where XOUTCH wants it.
	INX		; Got it in A, now drop it from the stack.
	INX
	STX	PSP
OUTCRB	JSR	XOUTCH	; output via monitor ROM
	RTS
*
* Output a newline combination
OUTNEWL	LDAA	#CR
	BSR	OUTCRB
	LDAA	#LF
	BRA	OUTCRB	; Rob OUTC's RTS
*
* Puts out a screen full of newlines to get the terminal's attention.
* Since we can see OUTCRB does not walk on B,
* we'll put the count in B instead of on the parameter stack.
INITRM	LDAB	#40	; screen full of lines
INITRL	BSR	OUTNEWL
	DECB		; the count
	BNE	INITRL
	RTS
* Note that if the code OUTNEWL or INITRM get separated from OUTC, 
* it will be easy to forget what we are relying on --
* making it easy to introduce bugs down the road.
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE	LDAA	#5	; adjust this for your workstation.
	CLRB
	LDX	#0
PLOOP	DEX		; 3~
* If not 0 after dec, go back and dec again
	BNE	PLOOP	; 3~, tot 6 : 65536 times => 393216~ 
	DECB		; 2~, tot 393218~
	BNE	PLOOP	; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
	DECA		; 100664578~
	BNE	PLOOP	; 100664581~ : times 5 => 503322905
	RTS		; total of about 10 seconds on my WS
*
* Let's keep this part high-level.
START	JSR	INISTKS
	JSR	INITRM	; get the terminal's attention
*
	LDD	#'H
	JSR	PPUSHD
	JSR	OUTC
*
	JSR	OUTNEWL	; make a clear line break for the breakpoint output
	JSR	PAUSE
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

I noted in the comments that when code you're relying on in your optimizations gets moved away from where you are optimizing for it, it's really easy to forget what's going on and add bugs to your code. It's a concept really worth remembering, especially when trying to decide whether you want to optimize or not.

Now, let's turn our attention to the string output routines, and give them a similar treatment. We'll go ahead and put the new-line and pause code in, to avoid spending too much time going over the same ground.

First, let's look at handling the parameter stack via subroutine only:

	OPT	6801
* First 6800 Foothold on the split stack beach,
* by Joel Matthew Rees August 2024, Copyright 2024 -- All rights reserved.
*
* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
NUL	EQU	0
*
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
HELLO	FCB	CR,LF	; Put message at beginning of line
	FCC	"Ashi-ba ga dekita!"	; Whatever the user wants here.
	FCB	CR,LF,NUL	; Put the debugger's output on a new line.
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	PULX		; 6801 lets us do this -- return address in X
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPOPX	BSR	PPOPD
	PSHB		; keep the order straight
	PSHA		; high byte low in memory
	PULX		; Got X, return address is back on top.
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
PPUSHX	PSHX
	PULA		; keep the order straight!
	PULB
	BRA	PPUSHD	; Rob the return
*
OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	BSR	OUTCV	; output A via monitor ROM
	RTS
*
OUTCV	JMP	XOUTCH	; common hook
*
OUTS	JSR	PPOPX	; get the string pointer
OUTSL	LDAA	0,X	; get the byte out there
	BEQ	OUTDN	; if NUL, leave
	BSR	OUTCV	; use the same call OUTC uses.
	INX		; point to the next
	BRA	OUTSL	; next character
OUTDN	RTS
*
*
* Output a newline combination
OUTNEWL	LDD	#CR
	JSR	PPUSHD
	JSR	OUTC
	LDD	#LF
	JSR	PPUSHD
	JSR	OUTC
	RTS
*
* Using the simpler run-time model, lots of pushes and pops:
INITRM	LDD	#40	; screen full of lines
	JSR	PPUSHD
INITRL	BSR	OUTNEWL
	LDX	PSP
	DEC	1,X	; the count
	BNE	INITRL
	JSR	PPOPD	; drop the count
	RTS
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE	LDAA	#5	; adjust this for your workstation.
	CLRB
	LDX	#0
PLOOP	DEX		; 3~
* If not 0 after dec, go back and dec again
	BNE	PLOOP	; 3~, tot 6 : 65536 times => 393216~ 
	DECB		; 2~, tot 393218~
	BNE	PLOOP	; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
	DECA		; 100664578~
	BNE	PLOOP	; 100664581~ : times 5 => 503322905
	RTS		; total of about 10 seconds on my WS
*
*
START	JSR	INISTKS
	JSR	INITRM
*
	LDX	#HELLO	; There are other ways to push the address.
	JSR	PPUSHX
	JSR	OUTS	; Our string has its own CR/LF this time around.
*
	JSR	PAUSE
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

Now let's look at handling the stack a little more directly in places where it might make sense to do so:

	OPT	6801
* First 6800 Foothold on the split stack beach,
* by Joel Matthew Rees August 2024, Copyright 2024 -- All rights reserved.
*
* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
NUL	EQU	0
*
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
HELLO	FCB	CR,LF	; Put message at beginning of line
	FCC	"Ashi-ba ga dekita!"	; Whatever the user wants here.
	FCB	CR,LF,NUL	; Put the debugger's output on a new line.
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	PULX		; 6801 lets us do this -- return address in X
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPOPX	BSR	PPOPD
	PSHB		; keep the order straight
	PSHA		; high byte low in memory
	PULX		; Got X, return address is back on top.
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
PPUSHX	PSHX
	PULA		; keep the order straight!
	PULB
	BRA	PPUSHD	; Rob the return
*
* Let's handle the parameter stack directly:
OUTC	LDX	PSP
	LDAA	1,X	; Get the character in A where XOUTCH wants it.
	INX		; Got it in A, now drop it from the stack.
	INX
	STX	PSP
	BSR	OUTCV	; output via monitor ROM hook
	RTS
*
OUTCV	JMP	XOUTCH	; common hook
*
* Because of the conflict in using X for two purposes,
* Handling the stack directly will use quite a bit of code.
* So we won't do that here.
OUTS	JSR	PPOPX	; get the string pointer
OUTSL	LDAA	0,X	; get the byte out there
	BEQ	OUTDN	; if NUL, leave
	BSR	OUTCV	; use the same call OUTC uses.
	INX		; point to the next
	BRA	OUTSL	; next character
OUTDN	RTS
*
*
* Output a newline combination
OUTNEWL	LDAA	#CR
	BSR	OUTCV
	LDAA	#LF
	BSR	OUTCV	; Can't rob OUTC's RTS
	RTS		; Should be able to rob XOUTCH's RTS,
* but don't really want to rely on things I can't see nearby.
*
* Since we can see OUTCV does not walk on B,
* we'll put the count in B instead of on the parameter stack.
INITRM	LDAB	#40	; screen full of lines
INITRL	BSR	OUTNEWL
	DECB		; the count
	BNE	INITRL
	RTS
* Note that if the code OUTNEWL or INITRM get separated from OUTC, 
* it will be easy to forget what we are relying on --
* making it easy to introduce bugs down the road.
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE	LDAA	#5	; adjust this for your workstation.
	CLRB
	LDX	#0
PLOOP	DEX		; 3~
* If not 0 after dec, go back and dec again
	BNE	PLOOP	; 3~, tot 6 : 65536 times => 393216~ 
	DECB		; 2~, tot 393218~
	BNE	PLOOP	; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
	DECA		; 100664578~
	BNE	PLOOP	; 100664581~ : times 5 => 503322905
	RTS		; total of about 10 seconds on my WS
*
*
START	JSR	INISTKS
	JSR	INITRM
*
	LDX	#HELLO	; There are other ways to push the address.
	JSR	PPUSHX
	JSR	OUTS	; Our string has its own CR/LF this time around.
*
	JSR	PAUSE
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

Both of these are lightly tested and should work. 

Oddly, I seem to have forgotten to handle the stack more directly in the main routine, from START. Probably too focused on the INITRM and PAUSE routines or something. If you want to give it a try, it would look something like this:

START	JSR	INISTKS
	JSR	INITRM
*
	LDX	#HELLO	; There are other ways to push the address.
	PSHX
	PULA		; watch the order!
	PULB
	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	JSR	OUTS	; Our string has its own CR/LF this time around.
*
	JSR	PAUSE
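
The comment on the LDX #HELLO line mentions that there are other ways to push the address. Here is one possibility, as an untested sketch only: since the 6801's LDD takes a 16-bit immediate operand, the address can go straight into D, which skips the PSHX/PULA/PULB shuffle and keeps the return stack out of it. This assumes nothing after the push still needs the address in X.

START	JSR	INISTKS
	JSR	INITRM
*
	LDD	#HELLO	; address straight into D, high byte in A
	JSR	PPUSHD
	JSR	OUTS	; Our string has its own CR/LF this time around.
*
	JSR	PAUSE

The same idea works when handling the stack directly, again only as a sketch:

	LDX	PSP
	DEX
	DEX
	STX	PSP
	LDD	#HELLO	; X is busy pointing at the stack, so use D
	STD	0,X	; high byte low in memory, as PPOPX expects
	JSR	OUTS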

You may be thinking about how much these "minor" improvements will make the 6801 easier to keep responsive and stable than the 6800 at interrupt time and such. I hope so.

Not that you can't do it on the 6800, but it's trickier, especially when trying to keep it responsive. It's going to be slower in general on the 6800 to make sure it's done right.

I also hope you're thinking about having OUTS call OUTC instead of the shared OUTCV. We'll talk about that after we've looked at how we do this on the 6809.

And I hope you are thinking about how to check that the stacks have stayed in balance.
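
As one rough way to start thinking about it, here is a minimal sketch of a balance check that could replace the DONE line at the end of START. It is untested, and it is an assumption about what such a check might look like rather than part of the listings above. The label BADBAL and the '?' warning character are made up for the illustration, and it assumes your assembler accepts the expression SSTKBAS+1. It relies on INISTKS having started PSP at PSTKBAS and S at SSTKBAS, and on TSX giving S plus one.

* Hypothetical balance check, run at top level where no calls are pending.
	LDX	PSP	; where the parameter stack pointer is now
	CPX	#PSTKBAS	; INISTKS started it here
	BNE	BADBAL	; a push without a pop, or the other way around
	TSX		; X = S + 1
	CPX	#SSTKBAS+1	; INISTKS started S at SSTKBAS
	BEQ	DONE	; both stacks are back where they started
BADBAL	LDAA	#'?	; made-up warning character
	JSR	XOUTCH
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

Given the line buffering mentioned earlier, the warning character may need a newline after it and a pause before DONE to actually show up on the terminal.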

I think it's time to see a little bit of what gets people excited about the 6809.

