Foothold!
(Split Stacks ... barely on the Beach)
6801
The 6801 does not have a lot to add for improving our
foothold text output routines on the 6800, but it does offer a little, and what it offers is important. You might not
think it enough to warrant a separate chapter, but the last chapter was a bit
long and deep, and this one will make a good review of the 6801. And ...
And, you probably won't believe me yet, but the split stack also simplifies and streamlines the call interface for the subroutines and procedures you write. Promise. It's not quite visible yet, but you shall see it soon.
Maybe not quite yet, but soon.
The 6801 doesn't really give us anything to improve our 6800 version of OUTC that handles the stack directly:
OUTC LDX PSP ; get the parameter stack pointer
LDAA 1,X ; get the low byte where the EXbug's 7-bit character should be.
INX ; drop the passed character off the stack
INX
STX PSP ; update the stack pointer
JSR XOUTCH ; output A via monitor ROM
RTS
... unless you want to save X across the use of the routine, of course, but
that slightly breaks our promise not to use the return address stack for other
purposes:
OUTC PSHX
LDX PSP ; get the parameter stack pointer
LDAA 1,X ; get the low byte where the EXbug's 7-bit character should be.
INX ; drop the passed character off the stack
INX
STX PSP ; update the stack pointer
JSR XOUTCH ; output A via monitor ROM
PULX
RTS
It does, however, give us some minor improvements to the
parameter push and pop:
PPOPD LDX PSP
LDD 0,X ; We can do both A & B in one instruction now.
INX
INX
STX PSP
RTS
*
PPUSHD LDX PSP
DEX
DEX
STX PSP
STD 0,X ; Double accumulator here, too -- both at once.
RTS
Here, too, if we wanted to make the push and pop less intrusive to the use of X, we could use the 6801's PSHX and PULX, again slightly breaking our promise not to use the return stack for anything but return addresses:
PPOPD PSHX
LDX PSP
LDD 0,X ; We can do both A & B in one instruction now.
INX
INX
STX PSP
PULX
RTS
*
PPUSHD PSHX
LDX PSP
DEX
DEX
STX PSP
STD 0,X ; Double accumulator here, too -- both at once.
PULX
RTS
No changes in OUTC using PPUSHD and PPOPD. We'll have it call our slightly improved push and pop, but you can't tell from OUTC.
OUTC JSR PPOPD ; get the character in B
TBA ; put it where XOUTCH wants it.
JSR XOUTCH ; output via monitor ROM
RTS
The code that calls PPUSHD might not change at all:
CLRA ; 1 byte, 2 cycles
LDAB #'H ; 2 bytes, 2 cycles
JSR PPUSHD
JSR OUTC
Or it might:
LDD #'H ; 3-bytes, 3 cycles
JSR PPUSHD
JSR OUTC
If you've got two monitors, you may want to
open up the 6800 code
on the other one so you can see what the differences between the code here and
the code there are. Even if you have only one monitor, you may want to use
separate browser windows.
* Essential monitor ROM routines
XOUTCH EQU $F018
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
PSP RMB 2 ; parameter stack pointer
SSAVE RMB 2 ; a place to keep S so we can return clean
*
ORG $2000 ; MDOS says this is a good place for user stuff
NOENTRY JMP START
RMB 2 ; a little bumper space
SSTKLIM RMB 31 ; 16 levels of call, max
SSTKBAS RMB 1 ; 6800 is post-dec (post-store-decrement) push
RMB 2 ; a little bumper space
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKBAS RMB 2 ; bumper space -- parameter stack is pre-dec
* (But this example only uses two levels.)
*
*
INISTKS LDX #PSTKBAS ; Set up the parameter stack
STX PSP
PULX ; 6801 lets us do this -- return address in X
STS SSAVE ; Save what the monitor gave us.
LDS #SSTKBAS ; Move to our own stack
JMP 0,X ; return via X
*
PPOPD LDX PSP
LDD 0,X
INX
INX
STX PSP
RTS
*
PPUSHD LDX PSP
DEX
DEX
STX PSP
STD 0,X
RTS
*
*
OUTC JSR PPOPD ; get the character in B
TBA ; put it where XOUTCH wants it.
JSR XOUTCH ; output via monitor ROM
RTS
*
*
START JSR INISTKS
LDD #'H
JSR PPUSHD
JSR OUTC
*
DONE LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad
This should run, but, as we discovered in the last set of examples, it's going
to be hard to test on EXORsim6801. (It should be testable on, say,
XRoar's MC-10
emulation, if you have already figured out how to use that with assembly
language. I plan on talking about XRoar later.)
So, let's modularize the terminal initialization and the pause and add them to this code and see what happens with EXORsim6801:
* Essential control codes
LF EQU $0A ; line feed
CR EQU $0D ; carriage return
* Essential monitor ROM routines
XOUTCH EQU $F018
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
PSP RMB 2 ; parameter stack pointer
SSAVE RMB 2 ; a place to keep S so we can return clean
*
ORG $2000 ; MDOS says this is a good place for user stuff
NOENTRY JMP START
RMB 2 ; a little bumper space
SSTKLIM RMB 31 ; 16 levels of call, max
SSTKBAS RMB 1 ; 6800 is post-dec (post-store-decrement) push
RMB 2 ; a little bumper space
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKBAS RMB 2 ; bumper space -- parameter stack is pre-dec
* (But this example only uses two levels.)
*
*
INISTKS LDX #PSTKBAS ; Set up the parameter stack
STX PSP
PULX ; 6801 lets us do this -- return address in X
STS SSAVE ; Save what the monitor gave us.
LDS #SSTKBAS ; Move to our own stack
JMP 0,X ; return via X
*
PPOPD LDX PSP
LDD 0,X
INX
INX
STX PSP
RTS
*
PPUSHD LDX PSP
DEX
DEX
STX PSP
STD 0,X
RTS
*
*
OUTC JSR PPOPD ; get the character in B
TBA ; put it where XOUTCH wants it.
JSR XOUTCH ; output via monitor ROM
RTS
*
* Output a newline combination
OUTNEWL LDD #CR
JSR PPUSHD
JSR OUTC
LDD #LF
JSR PPUSHD
JSR OUTC
RTS
*
* Scrolls a screen full of blank lines. Takes no parameters.
* Using the simpler run-time model, lots of pushes and pops:
INITRM LDD #40 ; screen full of lines
JSR PPUSHD
INITRL BSR OUTNEWL
LDX PSP
DEC 1,X ; the count
BNE INITRL
JSR PPOPD ; drop the count
RTS
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE LDAA #5 ; adjust this for your workstation.
CLRB
LDX #0
PLOOP DEX ; 3~
* If not 0 after dec, go back and dec again
BNE PLOOP ; 3~, tot 6 : 65536 times => 393216~
DECB ; 2~, tot 393218~
BNE PLOOP ; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
DECA ; 100664578~
BNE PLOOP ; 100664581~ : times 5 => 503322905
RTS ; total of about 10 seconds on my WS
*
START JSR INISTKS
JSR INITRM ; get the terminal's attention
*
LDD #'H
JSR PPUSHD
JSR OUTC
*
JSR OUTNEWL ; make a clear line for the breakpoint output
JSR PAUSE
DONE LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad
This has been lightly tested and it works. Give it a try.
You'll note that, just before the pause, I've added a call to the new line
routine. Without that, the line buffering for the terminal emulation in the
older EXORsim seems to hold onto the output character until it gets swallowed
in the traceback.
You may be thinking that this code should be optimizable by quite a bit. It
really is not worth optimizing, in terms of the code, but walking through
optimization to show what can be done while keeping it readable and clean is a
useful exercise in understanding both the CPU and the optimization processes.
Be careful to read the comments.
* Essential control codes
LF EQU $0A ; line feed
CR EQU $0D ; carriage return
* Essential monitor ROM routines
XOUTCH EQU $F018
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
PSP RMB 2 ; parameter stack pointer
SSAVE RMB 2 ; a place to keep S so we can return clean
*
ORG $2000 ; MDOS says this is a good place for user stuff
NOENTRY JMP START
RMB 2 ; a little bumper space
SSTKLIM RMB 31 ; 16 levels of call, max
SSTKBAS RMB 1 ; 6800 is post-dec (post-store-decrement) push
RMB 2 ; a little bumper space
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKBAS RMB 2 ; bumper space -- parameter stack is pre-dec
* (But this example only uses two levels.)
*
*
INISTKS LDX #PSTKBAS ; Set up the parameter stack
STX PSP
PULX ; 6801 lets us do this -- return address in X
STS SSAVE ; Save what the monitor gave us.
LDS #SSTKBAS ; Move to our own stack
JMP 0,X ; return via X
*
PPOPD LDX PSP
LDD 0,X
INX
INX
STX PSP
RTS
*
PPUSHD LDX PSP
DEX
DEX
STX PSP
STD 0,X
RTS
*
*
* Let's handle the parameter stack directly:
OUTC LDX PSP
LDAA 1,X ; Get the character in A where XOUTCH wants it.
INX ; Got it in A, now drop it from the stack.
INX
STX PSP
OUTCRB JSR XOUTCH ; output via monitor ROM
RTS
*
* Output a newline combination
OUTNEWL LDAA #CR
BSR OUTCRB
LDAA #LF
BRA OUTCRB ; Rob OUTC's RTS
*
* Scrolls a screen full of blank lines. Takes no parameters.
* Since we can see OUTCRB does not walk on B,
* we'll put the count in B instead of on the parameter stack.
INITRM LDAB #40 ; screen full of lines
INITRL BSR OUTNEWL
DECB ; the count
BNE INITRL
RTS
* Note that if the code OUTNEWL or INITRM get separated from OUTC,
* it will be easy to forget what we are relying on --
* making it easy to introduce bugs down the road.
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE LDAA #5 ; adjust this for your workstation.
CLRB
LDX #0
PLOOP DEX ; 3~
* If not 0 after dec, go back and dec again
BNE PLOOP ; 3~, tot 6 : 65536 times => 393216~
DECB ; 2~, tot 393218~
BNE PLOOP ; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
DECA ; 100664578~
BNE PLOOP ; 100664581~ : times 5 => 503322905
RTS ; total of about 10 seconds on my WS
*
* Let's keep this part high-level.
START JSR INISTKS
JSR INITRM ; get the terminal's attention
*
LDD #'H
JSR PPUSHD
JSR OUTC
*
JSR OUTNEWL ; make a clear line break for the breakpoint output
JSR PAUSE
DONE LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad
I noted in the comments that when code you're relying on in your optimizations
gets moved away from where you are optimizing for it, it's really easy to
forget what's going on and add bugs to your code. It's a concept really worth
remembering, especially when trying to decide whether you want to optimize or
not.
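One way to reduce that risk, offered here only as a sketch and not as part of the tested listings, is to have INITRM protect its own count instead of trusting that nothing underneath it touches B. It costs a push and a pull per line, and it once again slightly breaks our promise about the return address stack. The routine below is a hypothetical drop-in for the optimized INITRM above and assumes OUTNEWL is still within BSR range:
* Hypothetical defensive INITRM:
* does not care whether OUTNEWL (or anything it calls) preserves B.
INITRM LDAB #40 ; screen full of lines
INITRL PSHB ; park the count on the return stack
 BSR OUTNEWL
 PULB ; count is back, whatever happened to B (PULB leaves the flags alone)
 DECB ; the count -- sets Z for the branch
 BNE INITRL
 RTS
Whether the extra cycles are worth it depends on how likely the routines are to drift apart. A comment alone, like the one in the listing, is the cheaper choice when they will stay together.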
Now, let's turn our attention to the string output routines, and give them a similar treatment. We'll go ahead and put the new-line and pause code in, to avoid spending too much time going over the same ground.
First, let's look at handling the parameter stack via subroutine only:
OPT 6801
* First 6800 Foothold on the split stack beach,
* by Joel Matthew Rees August 2024, Copyright 2024 -- All rights reserved.
*
* Essential control codes
LF EQU $0A ; line feed
CR EQU $0D ; carriage return
NUL EQU 0
*
* Essential monitor ROM routines
XOUTCH EQU $F018
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
PSP RMB 2 ; parameter stack pointer
SSAVE RMB 2 ; a place to keep S so we can return clean
*
*
ORG $2000 ; MDOS says this is a good place for user stuff
NOENTRY JMP START
RMB 2 ; a little bumper space
SSTKLIM RMB 31 ; 16 levels of call, max
SSTKBAS RMB 1 ; 6800 is post-dec (post-store-decrement) push
RMB 2 ; a little bumper space
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKBAS RMB 2 ; bumper space -- parameter stack is pre-dec
*
*
HELLO FCB CR,LF ; Put message at beginning of line
FCC "Ashi-ba ga dekita!" ; Whatever the user wants here.
FCB CR,LF,NUL ; Put the debugger's output on a new line.
*
*
INISTKS LDX #PSTKBAS ; Set up the parameter stack
STX PSP
PULX ; 6801 lets us do this -- return address in X
STS SSAVE ; Save what the monitor gave us.
LDS #SSTKBAS ; Move to our own stack
JMP 0,X ; return via X
*
*
PPOPD LDX PSP
LDD 0,X
INX
INX
STX PSP
RTS
*
PPOPX BSR PPOPD
PSHB ; keep the order straight
PSHA ; high byte low in memory
PULX ; Got X, return address is back on top.
RTS
*
PPUSHD LDX PSP
DEX
DEX
STX PSP
STD 0,X
RTS
*
PPUSHX PSHX
PULA ; keep the order straight!
PULB
BRA PPUSHD ; Rob the return
*
OUTC JSR PPOPD ; get the character in B
TBA ; put it where XOUTCH wants it.
BSR OUTCV ; output A via monitor ROM
RTS
*
OUTCV JMP XOUTCH ; common hook
*
OUTS JSR PPOPX ; get the string pointer
OUTSL LDAA 0,X ; get the byte out there
BEQ OUTDN ; if NUL, leave
BSR OUTCV ; use the same call OUTC uses.
INX ; point to the next
BRA OUTSL ; next character
OUTDN RTS
*
*
* Output a newline combination
OUTNEWL LDD #CR
JSR PPUSHD
JSR OUTC
LDD #LF
JSR PPUSHD
JSR OUTC
RTS
*
* Using the simpler run-time model, lots of pushes and pops:
INITRM LDD #40 ; screen full of lines
JSR PPUSHD
INITRL BSR OUTNEWL
LDX PSP
DEC 1,X ; the count
BNE INITRL
JSR PPOPD ; drop the count
RTS
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE LDAA #5 ; adjust this for your workstation.
CLRB
LDX #0
PLOOP DEX ; 3~
* If not 0 after dec, go back and dec again
BNE PLOOP ; 3~, tot 6 : 65536 times => 393216~
DECB ; 2~, tot 393218~
BNE PLOOP ; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
DECA ; 100664578~
BNE PLOOP ; 100664581~ : times 5 => 503322905
RTS ; total of about 10 seconds on my WS
*
*
START JSR INISTKS
JSR INITRM
*
LDX #HELLO ; There are other ways to push the address.
JSR PPUSHX
JSR OUTS ; Our string has its own CR/LF this time around.
*
JSR PAUSE
DONE LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad
Now let's look at handling the stack a little more directly in places where it might make sense to do so:
OPT 6801
* First 6800 Foothold on the split stack beach,
* by Joel Matthew Rees August 2024, Copyright 2024 -- All rights reserved.
*
* Essential control codes
LF EQU $0A ; line feed
CR EQU $0D ; carriage return
NUL EQU 0
*
* Essential monitor ROM routines
XOUTCH EQU $F018
*
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS and EXbug docs say it should be okay here.
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
PSP RMB 2 ; parameter stack pointer
SSAVE RMB 2 ; a place to keep S so we can return clean
*
*
ORG $2000 ; MDOS says this is a good place for user stuff
NOENTRY JMP START
RMB 2 ; a little bumper space
SSTKLIM RMB 31 ; 16 levels of call, max
SSTKBAS RMB 1 ; 6800 is post-dec (post-store-decrement) push
RMB 2 ; a little bumper space
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKBAS RMB 2 ; bumper space -- parameter stack is pre-dec
*
*
HELLO FCB CR,LF ; Put message at beginning of line
FCC "Ashi-ba ga dekita!" ; Whatever the user wants here.
FCB CR,LF,NUL ; Put the debugger's output on a new line.
*
*
INISTKS LDX #PSTKBAS ; Set up the parameter stack
STX PSP
PULX ; 6801 lets us do this -- return address in X
STS SSAVE ; Save what the monitor gave us.
LDS #SSTKBAS ; Move to our own stack
JMP 0,X ; return via X
*
*
PPOPD LDX PSP
LDD 0,X
INX
INX
STX PSP
RTS
*
PPOPX BSR PPOPD
PSHB ; keep the order straight
PSHA ; high byte low in memory
PULX ; Got X, return address is back on top.
RTS
*
PPUSHD LDX PSP
DEX
DEX
STX PSP
STD 0,X
RTS
*
PPUSHX PSHX
PULA ; keep the order straight!
PULB
BRA PPUSHD ; Rob the return
*
* Let's handle the parameter stack directly:
OUTC LDX PSP
LDAA 1,X ; Get the character in A where XOUTCH wants it.
INX ; Got it in A, now drop it from the stack.
INX
STX PSP
BSR OUTCV ; output via monitor ROM hook
RTS
*
OUTCV JMP XOUTCH ; common hook
*
* Because of the conflict in using X for two purposes,
* Handling the stack directly will use quite a bit of code.
* So we won't do that here.
OUTS JSR PPOPX ; get the string pointer
OUTSL LDAA 0,X ; get the byte out there
BEQ OUTDN ; if NUL, leave
BSR OUTCV ; use the same call OUTC uses.
INX ; point to the next
BRA OUTSL ; next character
OUTDN RTS
*
*
* Output a newline combination
OUTNEWL LDAA #CR
BSR OUTCV
LDAA #LF
BSR OUTCV ; Can't rob OUTC's RTS
RTS ; Should be able to rob XOUTCH's RTS,
* but don't really want to rely on things I can't see nearby.
*
* Since OUTCV (and, we trust, XOUTCH behind it) does not walk on B,
* we'll put the count in B instead of on the parameter stack.
INITRM LDAB #40 ; screen full of lines
INITRL BSR OUTNEWL
DECB ; the count
BNE INITRL
RTS
* Note that if the code OUTNEWL or INITRM get separated from OUTC,
* it will be easy to forget what we are relying on --
* making it easy to introduce bugs down the road.
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE LDAA #5 ; adjust this for your workstation.
CLRB
LDX #0
PLOOP DEX ; 3~
* If not 0 after dec, go back and dec again
BNE PLOOP ; 3~, tot 6 : 65536 times => 393216~
DECB ; 2~, tot 393218~
BNE PLOOP ; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
DECA ; 100664578~
BNE PLOOP ; 100664581~ : times 5 => 503322905
RTS ; total of about 10 seconds on my WS
*
*
START JSR INISTKS
JSR INITRM
*
LDX #HELLO ; There are other ways to push the address.
JSR PPUSHX
JSR OUTS ; Our string has its own CR/LF this time around.
*
JSR PAUSE
DONE LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad
Both of these are lightly tested and should work.
Oddly, I seem to have forgotten to handle the stack more directly in the main
routine, from START. Probably too focused on the INITRM and PAUSE routines or
something. If you want to give it a try, it would look something like this:
START JSR INISTKS
JSR INITRM
*
LDX #HELLO ; There are other ways to push the address.
PSHX
PULA ; watch the order!
PULB
LDX PSP
DEX
DEX
STX PSP
STD 0,X
JSR OUTS ; Our string has its own CR/LF this time around.
*
JSR PAUSE
You may be thinking about how much these "minor" improvements will make the 6801 easier to keep responsive and stable than the 6800 at interrupt time and such. I hope so.
Not that you can't do it on the 6800, but it's trickier, especially when trying to keep it responsive. It's going to be slower in general on the 6800 to make sure it's done right.
I also hope you're thinking about having OUTS call OUTC instead of the shared OUTCV. We'll talk about that after we've looked at how we do this on the 6809.
And I hope you are thinking about how to check that the stacks have stayed in
balance.
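As a starting point for that, here is a minimal sketch of a parameter stack balance check. It is an assumption of mine layered on the listings above, not something that has been run with them: CHKBAL and CHKBOK are hypothetical names, and it assumes PSP, PSTKBAS, and XOUTCH are defined as in those listings. The idea is simply that, once every routine has returned, PSP should be back where INISTKS put it:
* Hypothetical check, not part of the tested listings.
* Call with JSR CHKBAL from START, just before DONE.
CHKBAL LDX PSP
 CPX #PSTKBAS ; back where INISTKS put it?
 BEQ CHKBOK ; balanced, say nothing
 LDAA #'? ; parameters were left behind, or too many were popped
 JSR XOUTCH ; complain via monitor ROM
CHKBOK RTS
This only tells you that something is out of balance at the end, not where it went wrong. Checking the return stack would need a similar comparison of S against SSTKBAS, allowing for the return address CHKBAL itself is sitting on.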
I think it's time to see a little bit of what gets people excited about the 6809.