Saturday, August 31, 2024

ALPP 02-02 -- Foothold! (Split Stacks ... barely on the Beach) -- 6801

Foothold!
(Split Stacks ... barely on the Beach)
6801

(Title Page/Index)

 

The 6801 does not seem to have a lot to add for improving our foothold text output routines on the 6800, but it does offer a little, and what it offers is important. You might not think it enough to warrant a separate chapter, but the last chapter was a bit long and deep, and it will make good review of the 6801. And ...

... as I have said, one major purpose of the split stack is to separate return addresses from parameters and local variables, producing somewhat more robust, less difficult to debug code. 

And, you probably won't believe me yet, but the split stack also simplifies and streamlines the call interface for the subroutines and procedures you write. Promise. It's not quite visible yet, but you shall see it soon.

Maybe not quite yet, but soon. 

The 6801 doesn't really give us anything to improve our 6800 version of OUTC that handles the stack directly:
OUTC	LDX	PSP	; get the parameter stack pointer
	LDAA	1,X	; get the low byte where the EXbug's 7-bit character should be.
	INX		; drop the passed character off the stack
	INX
	STX	PSP	; update the stack pointer
	JSR	XOUTCH	; output A via monitor ROM
	RTS

... unless you want to save X across the use of the routine, of course, but that slightly breaks our promise not to use the return address stack for other purposes:

OUTC	PSHX
	LDX	PSP	; get the parameter stack pointer
	LDAA	1,X	; get the low byte where the EXbug's 7-bit character should be.
	INX		; drop the passed character off the stack
	INX
	STX	PSP	; update the stack pointer
	JSR	XOUTCH	; output A via monitor ROM
	PULX
	RTS

It does, however, give us something for some minor improvements to the parameter push and pop:

PPOPD	LDX	PSP
	LDD	0,X	; We can do both A & B in one instruction now.
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X	; Double accumulator here, too -- both at once.
	RTS

Here, too, if we wanted to make the push and pop less intrusive to the use of X, we could use the 6801's PSHX and PULX, again slightly breaking our promise not to use the return stack for anything but return addresses:

PPOPD	PSHX
	LDX	PSP
	LDD	0,X	; We can do both A & B in one instruction now.
	INX
	INX
	STX	PSP
	PULX
	RTS
*
PPUSHD	PSHX
	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X	; Double accumulator here, too -- both at once.
	PULX
	RTS

No changes in OUTC using PPUSHD and PPOPD. We'll have it call our slightly improved push and pop, but you can't tell from OUTC.

OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	JSR	XOUTCH	; output via monitor ROM
	RTS

The code that calls PPUSHD might not change at all:

	CLRA	; 1 byte, 2 cycles
	LDAB	#'H	; 2 bytes, 2 cycles
	JSR	PPUSHD
	JSR	OUTC

Or it might:

	LDD	#'H	; 3-bytes, 3 cycles
	JSR	PPUSHD
	JSR	OUTC

If you've got two monitors, you may want to open up the 6800 code on the other one so you can see what the differences between the code here and the code there are. Even if you have only one monitor, you may want to use separate browser windows.

There are only minor changes in the overall program (here, not saving X across the parameter pushes and pops):
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
* (But this example only uses two levels.)
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	PULX		; 6801 lets us do this -- return address in X
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
*
OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	JSR	XOUTCH	; output via monitor ROM
	RTS
*
*
START	JSR	INISTKS
	LDD	#'H
	JSR	PPUSHD
	JSR	OUTC
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

This should run, but, as we discovered in the last set of examples, it's going to be hard to test on EXORsim6801. (It should be testable on, say, XRoar's MC-10 emulation, if you have already figured out how to use that with assembly language. I plan on talking about XRoar later.)

So, let's modularize the terminal initialization and the pause and add them to this code and see what happens with EXORsim6801:

* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
* (But this example only uses two levels.)
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	PULX		; 6801 lets us do this -- return address in X
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
*
OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	JSR	XOUTCH	; output via monitor ROM
	RTS
*
* Output a newline combination
OUTNEWL	LDD	#CR
	JSR	PPUSHD
	JSR	OUTC
	LDD	#LF
	JSR	PPUSHD
	JSR	OUTC
	RTS
*
* Put out a screen full of new lines to get the terminal's attention.
* Using the simpler run-time model, lots of pushes and pops:
INITRM	LDD	#40	; screen full of lines
	JSR	PPUSHD
INITRL	BSR	OUTNEWL
	LDX	PSP
	DEC	1,X	; the count
	BNE	INITRL
	JSR	PPOPD	; drop the count
	RTS
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE	LDAA	#5	; adjust this for your workstation.
	CLRB
	LDX	#0
PLOOP	DEX		; 3~
* If not 0 after dec, go back and dec again
	BNE	PLOOP	; 3~, tot 6 : 65536 times => 393216~ 
	DECB		; 2~, tot 393218~
	BNE	PLOOP	; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
	DECA		; 100664578~
	BNE	PLOOP	; 100664581~ : times 5 => 503322905
	RTS		; total of about 10 seconds on my WS
*
START	JSR	INISTKS
	JSR	INITRM	; get the terminal's attention
*
	LDD	#'H
	JSR	PPUSHD
	JSR	OUTC
*
	JSR	OUTNEWL	; make a clear line for the breakpoint output
	JSR	PAUSE
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

This has been lightly tested and it works. Give it a try.

You'll note that, just before the pause, I've added a call to the new line routine. Without that, the line buffering for the terminal emulation in the older EXORsim seems to hold onto the output character until it gets swallowed in the traceback.

You may be thinking that this code should be optimizable by quite a bit. It really is not worth optimizing, in terms of the code, but walking through optimization to show what can be done while keeping it readable and clean is a useful exercise in understanding both the CPU and the optimization processes. Be careful to read the comments.

* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
* (But this example only uses two levels.)
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	PULX		; 6801 lets us do this -- return address in X
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
*
* Let's handle the parameter stack directly:
OUTC	LDX	PSP
	LDAA	1,X	; Get the character in A where XOUTCH wants it.
	INX		; Got it in A, now drop it from the stack.
	INX
	STX	PSP
OUTCRB	JSR	XOUTCH	; output via monitor ROM
	RTS
*
* Output a newline combination
OUTNEWL	LDAA	#CR
	BSR	OUTCRB
	LDAA	#LF
	BRA	OUTCRB	; Rob OUTC's RTS
*
* Put out a screen full of new lines to get the terminal's attention.
* Since we can see OUTCRB does not walk on B,
* we'll put the count in B instead of on the parameter stack.
INITRM	LDAB	#40	; screen full of lines
INITRL	BSR	OUTNEWL
	DECB		; the count
	BNE	INITRL
	RTS
* Note that if the code for OUTNEWL or INITRM gets separated from OUTC,
* it will be easy to forget what we are relying on --
* making it easy to introduce bugs down the road.
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE	LDAA	#5	; adjust this for your workstation.
	CLRB
	LDX	#0
PLOOP	DEX		; 3~
* If not 0 after dec, go back and dec again
	BNE	PLOOP	; 3~, tot 6 : 65536 times => 393216~ 
	DECB		; 2~, tot 393218~
	BNE	PLOOP	; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
	DECA		; 100664578~
	BNE	PLOOP	; 100664581~ : times 5 => 503322905
	RTS		; total of about 10 seconds on my WS
*
* Let's keep this part high-level.
START	JSR	INISTKS
	JSR	INITRM	; get the terminal's attention
*
	LDD	#'H
	JSR	PPUSHD
	JSR	OUTC
*
	JSR	OUTNEWL	; make a clear line break for the breakpoint output
	JSR	PAUSE
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

I noted in the comments that when code you're relying on in your optimizations gets moved away from where you are optimizing for it, it's really easy to forget what's going on and add bugs to your code. It's a concept really worth remembering, especially when trying to decide whether you want to optimize or not.

Now, let's turn our attention to the string output routines, and give them a similar treatment. We'll go ahead and put the new-line and pause code in, to avoid spending too much time going over the same ground.

First, let's look at handling the parameter stack via subroutine only:

	OPT	6801
* First 6801 Foothold on the split stack beach,
* by Joel Matthew Rees August 2024, Copyright 2024 -- All rights reserved.
*
* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
NUL	EQU	0
*
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
HELLO	FCB	CR,LF	; Put message at beginning of line
	FCC	"Ashi-ba ga dekita!"	; Whatever the user wants here.
	FCB	CR,LF,NUL	; Put the debugger's output on a new line.
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	PULX		; 6801 lets us do this -- return address in X
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPOPX	BSR	PPOPD
	PSHB		; keep the order straight
	PSHA		; high byte low in memory
	PULX		; Got X, return address is back on top.
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
PPUSHX	PSHX
	PULA		; keep the order straight!
	PULB
	BRA	PPUSHD	; Rob the return
*
OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	BSR	OUTCV	; output A via monitor ROM
	RTS
*
OUTCV	JMP	XOUTCH	; common hook
*
OUTS	JSR	PPOPX	; get the string pointer
OUTSL	LDAA	0,X	; get the byte out there
	BEQ	OUTDN	; if NUL, leave
	BSR	OUTCV	; use the same call OUTC uses.
	INX		; point to the next
	BRA	OUTSL	; next character
OUTDN	RTS
*
*
* Output a newline combination
OUTNEWL	LDD	#CR
	JSR	PPUSHD
	JSR	OUTC
	LDD	#LF
	JSR	PPUSHD
	JSR	OUTC
	RTS
*
* Using the simpler run-time model, lots of pushes and pops:
INITRM	LDD	#40	; screen full of lines
	JSR	PPUSHD
INITRL	BSR	OUTNEWL
	LDX	PSP
	DEC	1,X	; the count
	BNE	INITRL
	JSR	PPOPD	; drop the count
	RTS
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE	LDAA	#5	; adjust this for your workstation.
	CLRB
	LDX	#0
PLOOP	DEX		; 3~
* If not 0 after dec, go back and dec again
	BNE	PLOOP	; 3~, tot 6 : 65536 times => 393216~ 
	DECB		; 2~, tot 393218~
	BNE	PLOOP	; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
	DECA		; 100664578~
	BNE	PLOOP	; 100664581~ : times 5 => 503322905
	RTS		; total of about 10 seconds on my WS
*
*
START	JSR	INISTKS
	JSR	INITRM
*
	LDX	#HELLO	; There are other ways to push the address.
	JSR	PPUSHX
	JSR	OUTS	; Our string has its own CR/LF this time around.
*
	JSR	PAUSE
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

Now let's look at handling the stack a little more directly in places where it might make sense to do so:

	OPT	6801
* First 6801 Foothold on the split stack beach,
* by Joel Matthew Rees August 2024, Copyright 2024 -- All rights reserved.
*
* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
NUL	EQU	0
*
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
HELLO	FCB	CR,LF	; Put message at beginning of line
	FCC	"Ashi-ba ga dekita!"	; Whatever the user wants here.
	FCB	CR,LF,NUL	; Put the debugger's output on a new line.
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	PULX		; 6801 lets us do this -- return address in X
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
PPOPX	BSR	PPOPD
	PSHB		; keep the order straight
	PSHA		; high byte low in memory
	PULX		; Got X, return address is back on top.
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
PPUSHX	PSHX
	PULA		; keep the order straight!
	PULB
	BRA	PPUSHD	; Rob the return
*
* Let's handle the parameter stack directly:
OUTC	LDX	PSP
	LDAA	1,X	; Get the character in A where XOUTCH wants it.
	INX		; Got it in A, now drop it from the stack.
	INX
	STX	PSP
	BSR	OUTCV	; output via monitor ROM hook
	RTS
*
OUTCV	JMP	XOUTCH	; common hook
*
* Because of the conflict in using X for two purposes,
* handling the stack directly will use quite a bit of code.
* So we won't do that here.
OUTS	JSR	PPOPX	; get the string pointer
OUTSL	LDAA	0,X	; get the byte out there
	BEQ	OUTDN	; if NUL, leave
	BSR	OUTCV	; use the same call OUTC uses.
	INX		; point to the next
	BRA	OUTSL	; next character
OUTDN	RTS
*
*
* Output a newline combination
OUTNEWL	LDAA	#CR
	BSR	OUTCV
	LDAA	#LF
	BSR	OUTCV	; Can't rob OUTC's RTS
	RTS		; Should be able to rob XOUTCH's RTS,
* but don't really want to rely on things I can't see nearby.
*
* Since we can see OUTCV does not walk on B,
* we'll put the count in B instead of on the parameter stack.
INITRM	LDAB	#40	; screen full of lines
INITRL	BSR	OUTNEWL
	DECB		; the count
	BNE	INITRL
	RTS
* Note that if the code for OUTNEWL or INITRM gets separated from OUTC,
* it will be easy to forget what we are relying on --
* making it easy to introduce bugs down the road.
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE	LDAA	#5	; adjust this for your workstation.
	CLRB
	LDX	#0
PLOOP	DEX		; 3~
* If not 0 after dec, go back and dec again
	BNE	PLOOP	; 3~, tot 6 : 65536 times => 393216~ 
	DECB		; 2~, tot 393218~
	BNE	PLOOP	; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
	DECA		; 100664578~
	BNE	PLOOP	; 100664581~ : times 5 => 503322905
	RTS		; total of about 10 seconds on my WS
*
*
START	JSR	INISTKS
	JSR	INITRM
*
	LDX	#HELLO	; There are other ways to push the address.
	JSR	PPUSHX
	JSR	OUTS	; Our string has its own CR/LF this time around.
*
	JSR	PAUSE
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

Both of these are lightly tested and should work. 

Oddly, I seem to have forgotten to handle the stack more directly in the main routine, from START. Probably too focused on the INITRM and PAUSE routines or something. If you want to give it a try, it would look something like this:

START	JSR	INISTKS
	JSR	INITRM
*
	LDX	#HELLO	; There are other ways to push the address.
	PSHX
	PULA		; watch the order!
	PULB
	LDX	PSP
	DEX
	DEX
	STX	PSP
	STD	0,X
	JSR	OUTS	; Our string has its own CR/LF this time around.
*
	JSR	PAUSE
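
For comparison, here is a rough, untested sketch of what handling the parameter stack directly in OUTS might look like on the 6801. It assumes, as the listings above already do, that XOUTCH leaves X alone. It also does the pointer arithmetic in D so that X can hold on to the string pointer, which is a choice made for this sketch and not the only way to do it:

OUTS	LDX	PSP	; point to the top of the parameter stack
	LDX	0,X	; get the string pointer itself
	LDD	PSP	; drop the pointer from the parameter stack,
	ADDD	#NATWID	; doing the arithmetic in D to leave X alone
	STD	PSP
OUTSL	LDAA	0,X	; get the byte out there
	BEQ	OUTDN	; if NUL, leave
	BSR	OUTCV	; use the same call OUTC uses.
	INX		; point to the next
	BRA	OUTSL	; next character
OUTDN	RTS

With PSP in the direct page, that is about eleven bytes where the JSR PPOPX was three, so the subroutine call is probably still the better deal here.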

You may be thinking about how much these "minor" improvements will make the 6801 easier to keep responsive and stable than the 6800 at interrupt time and such. I hope so.

Not that you can't do it on the 6800, but it's trickier, especially when trying to keep it responsive. It's going to be slower in general on the 6800 to make sure it's done right.

I also hope you're thinking about having OUTS call OUTC instead of the shared OUTCV. We'll talk about that after we've looked at how we do this on the 6809.

And I hope you are thinking about how to check that the stacks have stayed in balance.
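
One simple check, as a rough sketch: INISTKS starts the parameter stack at PSTKBAS, so the parameter stack pointer should be back at PSTKBAS when the main routine is done. CHKPSP and CHKOK are hypothetical names that are not part of the listings above, and the '!' marker is an arbitrary choice:

CHKPSP	LDX	PSP	; where the parameter stack pointer is now
	CPX	#PSTKBAS	; where INISTKS put it at the start
	BEQ	CHKOK	; same place, so pushes and pops balanced
	LDAA	#'!	; hypothetical warning marker
	JSR	XOUTCH	; complain via monitor ROM
CHKOK	RTS

Calling it with JSR CHKPSP just before the JSR PAUSE should put a ! on the screen if a push or pop went missing somewhere.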

I think it's time to see a little bit of what gets people excited about the 6809.



Saturday, August 24, 2024

ALPP 01-12 -- Hello, World! (Not Yet on the Beach) -- 6801 (EXORsim6801)

Hello, World!
(Not Yet on the Beach)
6801 (EXORsim6801)


So I didn't actually check whether there would be any special problems with running the 6800 code in the 6801 simulator in the Hello World exercise using the EXbug monitor ROM routines before telling you the code would be exactly the same except for timing. Didn't check until I started testing the code on the foothold chapter for the 6801.

My bad. Sorry about that.

The code and how it runs is exactly the same except for instruction cycle counts (time), but the EXORsim6801 simulator is from an earlier version of EXORsim, and it does some stuff that needs explaining. And it needs a workaround that requires timing and initialization loops, which are topics I wanted to avoid until I had laid a better foundation.

But timing loops, as much as they really shouldn't be used if you can avoid them, are basically the simplest deterministically terminating loops. And the terminal initialization loop is pretty straightforward, as well. So this can be viewed as a soft introduction to branching and loops.

Whether you work through the 68000 code in the Hello World exercise on the Atari ST first or not is probably not that important. Just don't skip either that chapter or this.

Anyway, here's why this chapter is necessary --

I made the fork of EXORsim when JHAllen was working on getting the terminal emulation to work better, and I just grabbed it at a bad point in the code, I think. Anyway, this version needs about a screenful of carriage return/line feed combinations before it starts putting text on the screen. And it also always shows a traceback when it hits a breakpoint, which makes it hard to see what happened, especially if the terminal you're using doesn't have a scrollback (history) buffer.

The former leaves you unable to see output until there is a lot of it. So we need to put out more than a single character.

The latter means you have to scroll back up in your scroll (history) buffer, if you've enabled it. If you haven't enabled it, do so now. If your terminal emulator doesn't have such a thing, install one that does. 

If your workstation is a Linux OS workstation, XTerm or something similar should be available. Look for the scroll-back setting and set it for at least a couple of thousand lines, and you should be good. 

If you're working on Cygwin under MSWindows, I think you can install XTerm or something similar. If you're working straight under MSWindows, I think PowerShell should do this for you, even if the command-line shell for the OS version you're using doesn't.

For the problem of having to wade through the traceback, I suggest a timing loop. The emulation seems to be running fast, so the usual million cycles equals one second at 1 MHz clock calculation doesn't work out.

For the former problem, I suggest simply putting a new line character combination in front of the character and after, and putting the character and trailing new line combo in a loop, enough times to fill the screen with new lines. 

Here's a simple delay loop using the B accumulator:

SOFTMR	LDAB	#100	; 500 microseconds at 1 MHz clock.
SOFTLP	DECB		; 2~
	BNE	SOFTLP	; 3~ : tot. 5~
	...

I think it's fairly self-explanatory with the comments. But I'll be long-winded --

DECB, as you might guess, means decrement the B accumulator -- subtract 1 from it.

BNE means branch if not equal. Branches are like jumps, but shorter. (There is more to be said about that, but it'll do for now.) 

Branch if not equal to what? In this case, not equal to zero. (In all cases ≠ 0, really, but more on that later.)

So the code keeps subtracting 1 from the contents of B and branching back to do it again until B hits 0. If you start from a hundred, that's going to take a hundred times to count out. 

If the processor were a human, it would be complaining. But it's just doing what it was designed to do, so apparently it doesn't complain.

DECB takes 2 cycles, and BNE takes 3. That's a total of 5 cycles each time through.

5 times 100 is 500. 

A 1 MHz clock has a period of 1 microsecond, so 500 clocks should be 500 μs. (LDAB immediate takes another 2 cycles, so it's actually a total of 502 μs.)

You can use the X register as well. DEX takes 3 cycles, so it would be 6 cycles per loop. 
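
As a quick sketch of the X version (SOFTMX and SOFTLX are hypothetical labels, chosen only to keep it apart from the loop above, and the cycle counts are the 6801 counts used in the text):

SOFTMX	LDX	#1000	; about 6000 microseconds at 1 MHz clock.
SOFTLX	DEX		; 3~
	BNE	SOFTLX	; 3~ : tot. 6~
	...

X is 16 bits wide, so a single loop can count much higher than an 8-bit accumulator allows.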

Also, on the 6801 and 6809, you can use the double accumulator D, using SUBD subtract from D immediate, since there is no special decrement D instruction:

SOFTMR	LDD	#100	; 700 microseconds at 1 MHz clock.
SOFTLP	SUBD	#1	; 4~
	BNE	SOFTLP	; 3~ : tot. 7~
	...

You can nest loops to get longer times, and you can decrement a byte in memory, as well:

	ORG	$90
COUNT	RMB	1	; one byte counter
	...
	ORG	$2000
	...
SOFTMR	LDAB	#10	; a little bit more than a second
	STAB	COUNT
SOFLOUT	LDAA	#100	; a little bit more than 100 milliseconds
SOFLMID	LDAB	#200	; a little bit more than 1000 μs at 1 MHz
SOFLIN	DECB		; 2~
	BNE	SOFLIN	; 3~ : tot. 5~
	DECA		; 2~ : tot. 1004 μs at 1 MHz
	BNE	SOFLMID	; 3~ : tot. 1007 μs at 1 MHz
	DEC	COUNT	; 6~ : tot. 100706 μs at 1 MHz
	BNE	SOFLOUT	; 3~ : tot. 100709 μs at 1 MHz
	...

(The ORG directives change where the code is assembled. Ignore them for now; I'm putting them in here to show that the variables and code are in different places, and so you'll sort-of recognize them later. More explanation then.)

Of course, you can tune the initial counts to get closer to a true one-second total delay. But you really want to do this another way if you can.

Now, if you can branch back to do nothing but count, you can also branch back to output the same character over and over. And that's what we'll do to get enough output to get the terminal emulator code in this older version of EXORsim to respond: 

* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
*
XOUTCH	EQU	$F018
*
ENTRY	LDAA	#CR	; Put out a CR/LF to start.
	JSR	XOUTCH
	LDAA	#LF
	JSR	XOUTCH
*
* Now let's try putting an H on the screen.
* But let's put a bunch out, with end-of-line.
	LDAB	#40	; more than a screen full of lines
*
* Output the character,
CLOOP	LDAA	#'H	; the character to output
	JSR	XOUTCH	; Call the output routine in monitor ROM
*
* ... and the end of line.
	LDAA	#CR	; CR/LF to end the line
	JSR	XOUTCH
	LDAA	#LF
	JSR	XOUTCH
	DECB		; count down the number of lines
	BNE	CLOOP	; If not zero, go back and do it again.
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE	LDAA	#5	; adjust this for your workstation.
	CLRB
	LDX	#0
PLOOP	DEX		; 3~
* If not 0 after dec, go back and dec again
	BNE	PLOOP	; 3~, tot 6 : 65536 times => 393216~ 
	DECB		; 2~, tot 393218~
	BNE	PLOOP	; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 2 seconds on my workstation.
	DECA		; 100664578~
	BNE	PLOOP	; 100664581~ : times 5 => 503322905
* Should be about 10 seconds on my workstation.
DONE	NOP	; landing pad
	NOP

Yeah, it's no longer short. 

Step through the initialization code for a bit, set a (b)reakpoint and (c)ontinue when you get bored, and scroll back when you hit the breakpoint. The timing loop should give you enough time to see the output before the traceback listing hits the screen.

We'll use the same strategy for the full string, printing the full string with the CR/LF at the end enough times to fill the screen with lines, then delaying hopefully long enough to get a look at the output without having to scroll back:

* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
EOT	EQU	$04	; $04 is decimal 4
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
XOUTCH	EQU	$F018	; character output, one at a time
XPDAT1	EQU	$F027	; string output, terminated by EOT
*
ENTRY	JMP	START
* (EXORsim apparently doesn't want to calculate RMB arguments.)
*	RMB	16*NATWID-1
	RMB	31	; 16 levels of call minus any saved registers, max
STKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
*
SAVES	RMB	2	; a place to keep S so we can be clean
*
* Don't want an extra CR/LF in the loop.
*HELLO	FCB	CR,LF	; Put message at beginning of line
HELLO	FCC	"SEKAI YO, YAI!"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
*
START	STS	SAVES	; Save what the monitor gives us.
	LDS	#STKBAS	; Move to our own stack
*
	LDAA	#CR	; CR/LF to start
	JSR	XOUTCH
	LDAA	#LF
	JSR	XOUTCH
*
* Now let's try putting the string on the screen.
* But let's put a bunch out.
	LDAB	#40	; more than a screen full of lines
*
* Now, output the string
CLOOP	PSHB		; save count (not really necessary)
	LDX	#HELLO	; point to the string
	JSR	XPDAT1	; output it
*
* ... and the end of line CR/LF is in the string.
	PULB		; restore count
	DECB		; count down the number of lines
	BNE	CLOOP	; If not zero, go back and do it again.
*
* This is not the best timing LOOP for all purposes.
* Also, timing LOOPs are usually not the correct solution.
* Usually.
PAUSE	LDAA	#5	; adjust this for your workstation.
	CLRB
	LDX	#0
PLOOP	DEX		; 3~
* If not 0 after dec, go back and dec again
	BNE	PLOOP	; 3~, tot 6 : 65536 times => 393216~ 
	DECB		; 2~, tot 393218~
	BNE	PLOOP	; 3~, tot 393221~ : times 256 => 100664576~
* At 1 MHz, that should be about 100 seconds.
* But the emulator seems to be running a bit fast.
* It's about 3 seconds on my workstation.
	DECA		; 100664578~
	BNE	PLOOP	; 100664581~ : times 5 => 503322905
* Should be about 10 seconds on my workstation.
	NOP
	NOP

Remember, this chapter was necessary not because of differences between the 6801 and the 6800, but because I've been too lazy to redo my EXORsim6801 (probably with correct timings this time) on JHAllen's most recent version.

And it makes a good excuse for a soft introduction of branches and loops. 

While we're in the mood for debugging, let's take a detour through debugging on the 6800 and 6809.

Or, if you haven't gone through Hello World on the Atari ST (Hatari) in the previous chapter yet, I'd strongly encourage you to do so.

If you've done all that, it's time for getting a real foothold by doing the string output ourselves in a split stack discipline.


Saturday, August 17, 2024

ALPP 02-01 -- Foothold! (Split Stacks ... barely on the Beach) -- 6800

Foothold!
(Split Stacks ... barely on the Beach)
6800


We've had a look at getting characters out on the screen on the EXORciser and Atari ST emulators. And we've had a look at using the resources they provide to put out strings of text. But the EXbug monitor ROM and the Atari ST BIOS both use different disciplines than the discipline I promised to show you.

Now, we could pervert our simulators and go completely raw on them, replacing the monitor, BIOS, DOS, etc. with code that uses that discipline, but we want some cooked code to take with us when we do -- and a lot more experience. 

Yes, at some point we'll do a little bit-twiddling and BLiTting. But first we'll use the character I/O that the monitor and BIOS provide, and prepare some interesting code for when we get there. 

If we limit our use of the existing monitor and BIOS code to the character I/O routines, we make it simpler to move the code to other targets.

To that end, our project for this chapter is to write a bit of glue code for character and text/string terminal I/O on the EXORciser under EXbug, to connect between the split stack discipline I use and the single-stack discipline used in EXbug. Later, we'll extend to using persistent store -- disk I/O. There also, we'll limit our use of code from the other disciplines, to make it easier to port the code we produce to other targets.

Why split stack? 

What is split stack? 

It's going to be easier to show than to explain, but a little hand-waving at the start might help, so I'll give it a try. See (as I wave my hands), it's like this ...

Unless you went spelunking in the EXbug monitor ROM, you may not have noticed it in there. But the PSH (push) and PUL (pop) instructions on the 6800 and 6801 push and pop registers -- on the same stack as the return address. This is the case on most currently commercially successful CPUs. 

On most currently successful CPUs, the typical run-time models tend to support and use only a single stack. That holds even on the 6809 and 68000, which provide push and pop instructions for multiple stacks.

(I realize this is jumping in a little deep all of a sudden, but we need to get a look at where we are going. It will become clear after a few more topics.)

A single stack is all fine and good, better than none, anyway, but what happens when you forget what you pushed on and try to return to a parameter (in other words, to a parameter interpreted as an address)? Or when you pop one too many bytes of parameter off and try to return to an address that is part of a parameter and part of something else? 

Yes, things blow up. Or freeze. Or both. Or, worse, silently fail and leave you with hidden corruption in your data. Or, somebody with a far too clever and sneaky bent leaves too much data in your network buffers, overwrites a return address on your stack, and starts remote controlling your computer to do nefarious things like reveal your credit card numbers and send mail in your name to your friends to lure them into traps.

No, that's not being too alarmist, although the split stack is not a complete and perfect cure for all bad coding woes.

What the split stack can do is keep the return address separate from the parameters and local variables, so that if you screw up in popping, pushing, dropping, etc., at least your code eventually returns to where it was supposed to return instead of to somewhere completely irrelevant. It returns in a bad state, yes, but in a somewhat controlled state.

I'm getting too excited about this. Let's look at some ways of doing the glue code:

OUTC	LDX	PSP	; get the parameter stack pointer
	LDAA	1,X	; get the low byte where the EXbug's 7-bit character should be.
	INX		; drop the passed character off the stack
	INX
	STX	PSP	; update the stack pointer
	JSR	XOUTCH	; output A via monitor ROM
	RTS

You were wondering what we were going to use for a parameter stack pointer, weren't you? The 6800 has only one stack, after all, and it is the return address stack I was just getting excited about (not) using.

What we're going to do is implement a software stack. The variable PSP will be its stack pointer, and we can load it into X to use it.

Of course, if we have to maintain it every time we use it, we may find it a bit unwieldy to use. So we can borrow from the Forth playbook and define a couple of routines to do that for us: 

  • Parameter POP Double accumulator and 
  • Parameter PUSH Double accumulator

like this:

PPOPD	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS

Under the split-stack discipline, parameters are usually passed on the stack. These two routines will be a little different from that, in that they will use the accumulators to pass the value being pushed or popped. 
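
As a quick sanity check of that convention, here is a minimal round-trip sketch. It assumes PPUSHD and PPOPD exactly as listed above and a PSP that already points into a valid stack area; the value $1234 is arbitrary, chosen only for illustration.

```asm
	LDAA	#$12	; high byte of an arbitrary test value
	LDAB	#$34	; low byte
	JSR	PPUSHD	; PSP moves down 2; high byte at 0,X, low byte at 1,X
	CLRA		; wipe the accumulators so the pop has to reload them
	CLRB
	JSR	PPOPD	; A = $12, B = $34 again; PSP is back where it started
```

Note that both routines leave X pointing into the parameter stack, so whatever was in X before the calls is gone.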

We can use PPOPD in our new OUTC as follows. It uses a few more cycles, but the OUTC routine will take fewer bytes than the first version I showed above:

OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	JSR	XOUTCH	; output via monitor ROM
	RTS

And we can pass the character in using PPUSHD as follows:

	CLRA
	LDAB	#'H
	JSR	PPUSHD
	JSR	OUTC

You may be wondering what happens to X. And, if you think about it, A and B.

Under this approach, if you have something important in the index register or the accumulators, you have to save it before calling routines like these. We could use another global variable to save X while we are using it for something like this, but we would then need to worry (more) about interrupts and re-entrancy. (Recursion, we can explicitly disallow.) That is, the more such globals we use the more we have to worry about them.
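
To make the trade-off concrete, here is a sketch of the global-variable approach. XSAVE is a hypothetical variable name, not something defined in this chapter's listings, and the sketch assumes nothing else (an interrupt handler, say) touches XSAVE between the store and the reload.

```asm
* In the variable area, alongside PSP (hypothetical name):
XSAVE	RMB	2	; scratch variable for parking X
*
* In the code, around a call that clobbers X:
	STX	XSAVE	; park X; PPUSHD and OUTC both overwrite it
	CLRA
	LDAB	#'H
	JSR	PPUSHD
	JSR	OUTC
	LDX	XSAVE	; X is back; A and B are not restored
```

If the code between the STX and the LDX ever calls something that also parks X in XSAVE, the first value is silently lost, which is exactly the kind of worry mentioned above.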

On the 6801, we could save X by using PSHX to push it to the return address stack, which on the one hand brings us back toward those problems I mentioned about mixing return addresses with saved registers, but on the other hand avoids the interrupt-time issues of globally referenced variables. On the 6809 and 68000, we have enough index registers and addressing modes to not have to resort to this kind of game, and we'll look at those when we have got reasonable solutions to this for the 6800 and 6801.

Let's see what this looks like using PPUSHD and PPOPD as above:

* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
* (But this example only uses two levels.)
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOPD	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS
*
*
OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	JSR	XOUTCH	; output via monitor ROM
	RTS
*
*
START	JSR	INISTKS
	CLRA
	LDAB	#'H
	JSR	PPUSHD
	JSR	OUTC
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

I've lightly tested it, it should run. Go ahead and give it a try. Check previous chapters if you've forgotten something. 

And remember the (h)elp command if there's something you want to try but don't know how. No promises, but there are things in EXORsim I haven't explained yet that might be helpful.

Now, just for completeness, here's the same thing, but letting the code handle PSP directly, instead of by subroutines, per the first example:

* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
OUTC	LDX	PSP	; get the parameter stack pointer
	LDAA	1,X	; get the low byte where the EXbug's 7-bit character should be.
	INX		; drop the passed character off the stack
	INX
	STX	PSP	; update the stack pointer
	JSR	XOUTCH	; output A via monitor ROM
	RTS
*
*
START	JSR	INISTKS
	LDX	PSP
	DEX
	DEX
	STX	PSP
	CLR	0,X
	LDAB	#'H
	STAB	1,X
	JSR	OUTC
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

About the use of the addresses down at $80, that's in the 6800's direct page. (It's called page zero on some other CPUs.) 

The direct page is in the same overall address space as everything else in the 6800, but you can address it in either of two ways when using binary memory-to-register instructions, specifically, either extended addressing or direct page addressing. Direct page addressing is shorter and quicker.

To explain, if we have a 2-byte pointer variable at address $0080, we can load it into X with either 

FE 00 80            LDX >$0080

or

DE 80               LDX <$0080

Using the angle bracket symbols (greater-than and less-than) to force extended mode and direct mode addressing only works on some assemblers. It does not work on EXORsim's interactive mode assembler. Don't get hung up on that, just remember that EXORsim's assembler chooses direct page for you when it can.

Let's look closely at the object code for the two different op-codes:

  • $FE is the op-code for extended mode LDX.
    It has a 2-byte address field, so it takes 3 bytes to encode and runs in 5 cycles.
  • $DE is the op-code for direct-page mode addressing LDX.
    It has a 1-byte address field, so it takes 2 bytes to encode and runs in 4 cycles.

So if we keep a virtual stack pointer down there, we can save some bytes and cycles referencing it. 

In other words, with PSP allocated in the direct page, it's just a tad bit faster to load and store, and just a tad byte shorter code, as well. 
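
To put numbers on that, here are the two PSP accesses from OUTC encoded both ways. This assumes PSP lands at $0084, which is where the listings above put it (ORG $80, then a 3-byte JMP and a 1-byte NOP); the cycle counts are the 6800's.

```asm
* Direct page forms (what EXORsim's assembler picks for PSP at $84):
*	DE 84		LDX	PSP	; 2 bytes, 4 cycles
*	DF 84		STX	PSP	; 2 bytes, 5 cycles
* Extended forms of the same accesses (required if PSP were above $00FF):
*	FE 00 84	LDX	PSP	; 3 bytes, 5 cycles
*	FF 00 84	STX	PSP	; 3 bytes, 6 cycles
```

Each load-and-store pair on PSP therefore saves two bytes and two cycles in the direct page, and every push, pop, or drop contains one such pair.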

Of course, then you have to be careful to make sure that putting PSP down there won't conflict with other stuff down there, kind of like being careful that using U on the 6809 or A6 on the 68000 as a parameter stack pointer won't conflict with the way the OS uses those registers. 

Yeah, there is software that uses variables that will conflict with our variables at $80, but I assume we aren't using those at the same time we're doing this. If we are, we can move our variables a bit higher, maybe to $90.

For a quick detour, how about if we did like everyone else and passed the character on the return address stack? PSHB and PSHA are pretty quick.

NO_OUTC	PULA	; Get the high byte out of the way, we think.
	PULA	; get the low byte where the EXbug's 7-bit character should be, we think.
	JSR	XOUTCH	; output A via monitor ROM, we think
	RTS		; we think

Keeeewelll! Why didn't we just do this in the first place?!!?!

(Cough.)

Talk about trying to output the low byte of the return address instead of the character passed, and trying to return to the character that was supposed to be the parameter as if it were an address -- $0048.

Dang. Okay, what about this?

HMMOUTC	PULX	; get the return address out of the way.
	PULA	; Get the high byte out of the way.
	PULA	; get the low byte where the EXbug's 7-bit character should be.
	JSR	XOUTCH	; output A via monitor ROM
	JMP	0,X	; return through X (XOUTCH preserves X, doesn't it?)

That would actually work on the 6801, since the 6801 has a PULX. But the 6800 does not.

AAAARRRRGGGHHHH!!!!!!! NO FAIR!!

Heh. Okay, let's try something that would actually work on the 6800.

OUTC1S	TSX	; Point to the return address on the stack.
	LDAA	3,X	; Skip the return address and high byte.
	LDX	0,X	; get the return address
	INS		; get rid of what we no longer need
	INS		; bump S past the return address
	INS
	INS		; past the character passed in
	JSR	XOUTCH	; output A via monitor ROM
	JMP	0,X	; return through X

Now, unless you have already read up about how TSX works, you should be wondering why the offsets in that are not one off. 

Motorola, in their wisdom, made TSX and TXS adjust the address as it moves between X and S, so that the post-decrement push nature of the 6800 S is hidden away. Cool, and convenient, but it bites you when you forget and save S to memory and then try to load it to X for some reason.
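
The trap looks something like this sketch. STMP is a hypothetical 2-byte scratch variable (it is not SSAVE, which the listings above reserve for the monitor's stack pointer).

```asm
	TSX		; X = S+1: 0,X is the most recently pushed byte
	LDAA	0,X	; reads the top byte on the stack, as expected
*
	STS	STMP	; saves the raw S, which points at the next free byte
	LDX	STMP	; X = S, one lower than what TSX would have given us
	LDAA	0,X	; oops: reads the free byte below the top of the stack
	LDAA	1,X	; the top byte is here instead (or INX first to match TSX)
```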

But you still have to do stuff to maintain the stack if you put parameters on the return stack on the 6800.
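
For reference, a caller for OUTC1S might look like the following sketch. It assumes the same 16-bit parameter layout as the parameter stack versions, high byte at the lower address, which is why the low byte gets pushed first.

```asm
	CLRA		; high byte of the parameter
	LDAB	#'H	; low byte: the character
	PSHB		; low byte first, so it ends up at the higher address
	PSHA		; high byte lands just below it
	JSR	OUTC1S	; pushes the return address below the parameter
* On return, OUTC1S has already dropped all four bytes with its INS
* instructions, so S is back where it was before the PSHB.
```

X is clobbered here too, since OUTC1S returns through it.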

Okay, back from the detour. Let's look at getting strings output. I know, I know, there's already a lot to chew on in this chapter, but we're trying to build a beachhead, and we need a foothold. We aren't there yet. 

We want a routine to output a string with some kind of terminator. EXbug used EOT ($04) as a terminator. The Atari ST's DOS calls used a NUL ($00).

The programming language C uses a NUL as a terminator for many of its standard library functions. There are some problems in doing so, but it's an easy function to define. Point to the string, grab characters in sequence and send them to the output device until we grab a NUL.

This requires learning how to repeat sections of code in a loop, and we'll show how to do that with conditional and absolute branches. (We've already had a soft introduction in the workarounds for EXORsim6801.)

The instruction mnemonic BEQ means Branch if EQual to zero. After many instructions, that is literally what it tests: branch if the result of the instruction was zero.

After a CoMPare instruction or a SUBtract instruction, it means Branch if the two operands of the previous compare or subtract were EQual -- in other words, if the difference is zero.

BNE means Branch if Not Equal to zero. Or, after a compare or subtract, Branch if Not Equal.
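
Here is a small sketch showing both readings. TESTCH and its labels are hypothetical, as is the convention of returning a code in B; X is assumed to already point at a byte in a string.

```asm
* Classify the byte at 0,X: B = 0 for NUL, 2 for EOT, 1 for anything else.
TESTCH	LDAA	0,X	; loading sets Z when the byte loaded is zero
	BEQ	ISNUL	; so here BEQ means "branch if A is zero"
	CMPA	#$04	; computes A - $04 and sets the flags; A is unchanged
	BNE	OTHER	; so here BNE means "branch if A is not equal to $04"
ISEOT	LDAB	#2	; fell through: A equals EOT
	RTS
ISNUL	LDAB	#0
	RTS
OTHER	LDAB	#1
	RTS
```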

Mnemonics can be a little dicey.

Some instructions have more or less unfortunate mnemonics. At least, I might wish the engineers had chosen "BR" instead of "BRA" for BRanch Always. Oh, well. Motorola was not the first to use the mnemonic by any means. 

It's best not to get too hung up on linguistic infelicities.

In a sort of pseudo-code, the string output code might be written something like

Point to first character of the string.
Do in a loop:
	Get the character.
	If it is not NUL, 
		output the character.
until you get to a NUL.

Brainstorming --

If we load the string address into A and B and push it on the parameter stack, we know how to access it as a local variable now, don't we? 

No?

Yes. Have a look:

	LDX	#HELLO
	STX	XWORK
	LDAA	XWORK	; there are other ways to do this.
	LDAB	XWORK+1
	JSR	PPUSHD	; local variable to point into the string
	LDX	PSP	; point to the local variable

So we got the address of the string onto the parameter stack, and we grabbed the top of the parameter stack and that points to the local copy of the pointer to the string.

But then what? The moment we use X to point to the string, we've lost our pointer to the local variables. And if we load our pointer to the local variable, we've lost our pointer to the string. 

To get a little better view of what we want to do, let's see if we can handle the parameter stack directly and write a string output routine something like this:

OUTS	LDX	PSP	; get the parameter stack pointer
* Point to first character of the string:
	LDX	0,X	; point to the string
* Do in a loop:
* 	Get the character.
OUTSL	LDAA	0,X	; get the byte that's out there
* 	If it is not NUL, 
	BEQ	OUTDN	; if NUL, leave
* 		output the character.
	JSR	XOUTCH	; Call through EXbug
	INX		; point to the next
* until you get to a NUL.
	BRA	OUTSL	; do the next character
OUTDN	LDX	PSP	; drop pointer from parameter stack
	INX
	INX
	STX	PSP
	RTS

So, to recap, we

  • get the pointer to the top of the parameter stack;
  • get the pointer to the string, leave PSP as it was;
  • (The label OUTSL marks the beginning of the loop.)
  • load accumulator A via the pointer to the string,
    setting the Z flag if it's a NUL;
  • if the Z flag is set,
    branch out of the loop, to label OUTDN;
  • call XOUTCH to output the character;
  • increment the pointer to the string;
  • branch unconditionally to OUTSL,
    the beginning of the loop, to go again.
  • (The label OUTDN marks the first instruction location
    after the loop body.)

And after the loop is complete, at label OUTDN, we

  • get the pointer to the top of the parameter stack back;
  • increment it twice; and
  • update the top of the stack
    to drop the pointer from the stack.

It looks like it should work.

Here's the complete test program, which light testing shows does work:

* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
NUL	EQU	0	; ASCII NUL
*
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
* XWORK must not be used by any routine that calls another routine!
* It must also not be used for values with long duration.
XWORK	RMB	2	; a place to work on X for very short calculations.
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
HELLO	FCB	CR,LF	; Put message at beginning of line
	FCC	"Ashi-gakari ga dekita!"	; Whatever the user wants here.
	FCB	CR,LF,NUL	; Put the debugger's output on a new line.
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
OUTC	LDX	PSP	; get the parameter stack pointer
	LDAA	1,X	; get the low byte where the EXbug's 7-bit character should be.
	INX		; drop the passed character off the stack
	INX
	STX	PSP	; update the stack pointer
	BSR	OUTCV	; output A via monitor ROM
	RTS
*
OUTCV	JMP	XOUTCH	; Centralize the calls into the monitor.
*
OUTS	LDX	PSP	; get the parameter stack pointer
	LDX	0,X	; get the string pointer
OUTSL	LDAA	0,X	; get the byte out there
	BEQ	OUTDN	; if NUL, leave
	BSR	OUTCV	; use the same call OUTC uses.
	INX		; point to the next
	BRA	OUTSL	; next character
OUTDN	LDX	PSP	; drop pointer from parameter stack
	INX
	INX
	STX	PSP
	RTS
*
*
START	JSR	INISTKS
	LDX	#HELLO	; Other assemblers allow splitting addresses in half.
	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	JSR	OUTS
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

Note that I am centralizing the call into the monitor ROM so that, if you use this code on another platform, you only need to change the address that OUTCV jumps to. 

Also, the FCC directive, Form Constant Character, is used for making strings. Many assemblers will allow strings to be assembled with FCB, but some will require FCC instead.
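
If your assembler insists on one or the other, the same bytes can be produced either way. Here is a sketch with a shorter, hypothetical message (HI and HI2 are not part of the listing above); the FCB form spells out the ASCII codes by hand, which should work on any assembler that has FCB at all.

```asm
NUL	EQU	0	; ASCII NUL, as in the listing above
*
HI	FCC	"Hi!"	; three bytes: $48, $69, $21
	FCB	NUL	; terminator
*
HI2	FCB	$48,$69,$21	; the same three characters: 'H', 'i', '!'
	FCB	NUL	; terminator
```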

Now, if we want to cut this up into re-usable subroutines, like PPUSHD and PPOPD, we probably want to define PPUSHX and PPOPX. If we are careful, PPUSHX and PPOPX can re-use PPUSHD and PPOPD.

* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
NUL	EQU	0	; ASCII NUL
*
* Essential monitor ROM routines
XOUTCH	EQU	$F018
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
* XWORK must not be used by any routine that calls another routine!
* XWORK must also not be used for values with long duration.
XWORK	RMB	2	; a place to work on X for very short calculations.
* More accurately, it must not be in use when a routine that uses it 
* calls another routine.
*
* XSTKSV is strictly for PPUSHX and PPOPX
XSTKSV	RMB	2	; a place to keep X for pushing and popping.
*
	ORG	$2000	; MDOS says this is a good place for user stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
HELLO	FCB	CR,LF	; Put message at beginning of line
	FCC	"Ashi-gakari ga dekita!"	; Whatever the user wants here.
	FCB	CR,LF,NUL	; Put the debugger's output on a new line.
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
PPOPD	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
PPOPX	BSR	PPOPD
	STAA	XSTKSV
	STAB	XSTKSV+1
	LDX	XSTKSV
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS
*
PPUSHX	STX	XSTKSV
	LDAA	XSTKSV
	LDAB	XSTKSV+1
	BRA	PPUSHD	; rob RTS
*
OUTC	JSR	PPOPD	; get the character in B
	TBA		; put it where XOUTCH wants it.
	BSR	OUTCV	; output A via monitor ROM
	RTS
*
OUTCV	JMP	XOUTCH
*
OUTS	JSR	PPOPX	; get the string pointer
OUTSL	LDAA	0,X	; get the byte out there
	BEQ	OUTDN	; if NUL, leave
	BSR	OUTCV	; use the same call OUTC uses.
	INX		; point to the next
	BRA	OUTSL	; next character
OUTDN	RTS
*
*
START	JSR	INISTKS
	LDX	#HELLO	; There are other ways to push the address.
	JSR	PPUSHX
	JSR	OUTS
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

How does that work? 

We hide the resource conflicts in the pushes and pops of the glue subroutines.
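
Because the glue routines take care of X, A, and B for us, the calling code only has to think about what is on the parameter stack. One consequence is that parameters for several calls can be stacked up in advance, since each routine consumes only its own. A sketch, assuming the PPUSHX, PPUSHD, OUTC, and OUTS from the listing above (the '*' character is arbitrary):

```asm
	LDX	#HELLO
	JSR	PPUSHX	; string pointer goes on the parameter stack first
	CLRA
	LDAB	#'*
	JSR	PPUSHD	; character goes on top of it
	JSR	OUTC	; pops and prints only the '*'
	JSR	OUTS	; the string pointer is on top now; pops it and prints the string
```

This could stand in for the three lines after JSR INISTKS in START; the parameter stack ends up balanced either way.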

And, incidentally, here we can see that, when the code is close, we can use branches instead of jumps, including Branch to SubRoutine instead of Jump to SubRoutine.

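Branches on the 6800 and 6801 use byte offsets instead of 2-byte absolute addresses, and are limited to a range of -128 to +127 from the address after the offset. This saves a byte, and on the 6800 BSR also saves a cycle over an extended-mode JSR (although BRA actually costs a cycle more than an extended-mode JMP there). 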
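
To see where the offsets come from, here is the loop from OUTS above with the object code worked out by hand. Addresses are shown relative to OUTSL (call it N), since the absolute addresses depend on what precedes the routine. OUTCV sits 6 bytes before OUTSL in that listing (a 3-byte JMP, then the 3-byte JSR PPOPX).

```asm
* N+0	A6 00	OUTSL	LDAA	0,X
* N+2	27 05		BEQ	OUTDN	; offset = (N+9) - (N+4) = +5
* N+4	8D F4		BSR	OUTCV	; offset = (N-6) - (N+6) = -12 = $F4
* N+6	08		INX
* N+7	20 F7		BRA	OUTSL	; offset = (N+0) - (N+9) = -9 = $F7
* N+9	39	OUTDN	RTS
```

In each case the offset is the target address minus the address of the instruction following the branch, stored as a signed byte.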

Also, branch offsets are not absolute addresses, but that is not as important as we might wish it were -- not until the 6809 and 68000 (which needs a chapter or two of its own to explain, way down the road).

It's worth noting here that, if we intend to use the code above to make our own monitor or BIOS, we'll need to consider a number of things we're ignoring here, such as what to do during interrupts -- and how to make sure the stacks have remained in balance.

And, with that, I think we are ready to see how the 6801 improves things just a little.


(Title Page/Index)

 

 

 

Sunday, August 11, 2024

ALPP 01-11 -- Hello, World! -- (Not Yet on the Beach) -- 68000

Hello, World!
(Not Yet on the Beach)
68000

(Title Page/Index)

 

Now that we have waded through getting text on the terminal screen with the 6800/6801 and 6809 under the EXORciser's monitor ROM, EXbug, let's take a look at how to do it on the 68000 under the Atari/EmuTOS BIOS -- and DOS.

The larger address space of the 68000 allows working efficiently with larger character sets, but even the older Japanese Industrial Standards JIS codes (roughly 2000 characters at the time, now much more), containing just Japanese Kanji and Kana, US ASCII, Greek, and Cyrillic, were enough to bring a Macintosh with an early 68000 to its knees unless the application software engineers really knew what they were doing. (Parsing shift-JIS could become especially intransigent in certain situations.)

As far as I know, Atari never officially tried to deal with character sets as large as the JIS encodings for the ST, much less complete Unicode. 

But it has fairly complete US-ASCII plus a bit, and that will work fine for us for now.

Again, our target for this exercise is to get a string of characters, such as "On the beach!" or "Hello World!" output on the screen where we can read it. 

With the EXORciser, we used the EXbug monitor routines for outputting characters and strings. 

Character Output

With the Atari ST, we'll use the BIOS routines for outputting characters. 

The BIOS routines are documented in several sources. The one which I have seen most recommended so far is Abacus Software's Atari ST Internals, by Gerits, Englisch, and Bruckmann. You can find PDFs of it which can be downloaded, but I am not in a position to tell you whether Abacus Software or the authors or their successors in interest have any particular opinions about you downloading it.

I'll try to provide enough of the documentation of the routines we use for our purposes here.

Facts about BIOS calls on the Atari ST to (partially) absorb before we dive in -- 

  • BIOS routines are called via the 68000's TRAP instruction, which is similar to but different from the 8086's  INT instruction, and kind of similar to but more different from the 6800/6801/6809 SWI instruction. 
  • Specifically, TRAP 13 is used for BIOS calls on the Atari ST.
  • Doing BIOS calls via TRAP instructions takes more time than a direct call (JSR or BSR) into a monitor ROM would, significantly more than twice as long. 
  • But the TRAP call also provides the means of switching from USER state to SUPERVISOR state, supporting putting a wall between system data and user code, and such.
  • By Atari ST BIOS convention, call parameters for the BIOS routines are pushed on the A7 stack before the TRAP is issued. (Compare this to the EXbug convention of using registers and a simple call.)
  • The BIOS routine number will also be pushed onto the A7 stack, as the last parameter pushed before the TRAP is issued.
  • Registers D0-D2 and A0-A2 are volatile across the BIOS calls. Save them if you need them.
  • Register D0 will carry the return value on return. 

The routine we will use to output a character to the screen is called  bconout, BIOS routine number 3. It takes two parameters, the character to output and the device to output it to, in that order.

Device numbers are 

  • 0: (PRT) printer port (Centronics)
  • 1: (AUX) serial port (RS-232C)
  • 2: (CON) console (keyboard/screen)
  • 3: (MIDI) MIDI interface
  • 4: (IKBD) programmable keyboard interface

This snippet of code demonstrates how the above could be used to output the character 'H' to the screen:

	MOVE.W	#'H',-(A7)	; push character to be output
	MOVE.W	#2,-(A7)	; push the device number for the screen
	MOVE.W	#3,-(A7)	; push the bconout BIOS routine selector
	TRAP	#13		; call into the BIOS
	ADDQ.L	#6,A7		; deallocate the parameters when done

Explicit pushes on the 68000 are accomplished by predecrement MOVE instructions targeting the stack register. The order in which the arguments are pushed matters.

If, for example, at the point of the TRAP instruction,

  • user A7 has the address $00077FF2
  • supervisor A7 has the address $00007E64
  • PC has the address $00013d22, and
  • SR/CC has only the interrupt mask set to level 3 ($0300)

here's what the BIOS code will see on the user and system stacks at the start of the TRAP 13 service routine:

Address      Data     Description
User Stack:
$00077FF8:   (arbitrary)
$00077FF6:   $0048    'H' (ASCII)
$00077FF4:   $0002    Device #
$00077FF2:   $0003    bconout code
System Stack:
$00007E62:   $3d24    Return Address low word
$00007E60:   $0001    Return Address high word
$00007E5E:   $0300    SR/CC state before TRAP

From there, the BIOS code will proceed to figure out that the call is to put the H on the screen at the current cursor position, etc., and do it.

If you're thinking that's a lot more work than we were doing with EXbug, and a lot more than EXbug was doing, you're right. If you're wondering whether it's reasonable, well, ... it's a long story. We need some background. Anyway, that's what it looks like.

Here's some code to try out on Hatari:

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************

*
BIOSTRAP	EQU	13
bconout		EQU	3
devscrkbd	EQU	2

	EVEN	
ENTRY	JMP	START
*
START	MOVE.W	#'H',-(A7)		; push character to be output
	MOVE.W	#devscrkbd,-(A7)	; push the device number
	MOVE.W	#bconout,-(A7)	; push the BIOS routine selector
	TRAP	#BIOSTRAP		; call into the BIOS
	ADDQ.L	#6,A7		; deallocate the parameters when done
	NOP
DONE	NOP
* One way to return to the OS or other calling program
	clr.w	-(sp)	; there should be enough room on the caller's stack
	trap	#1		;	quick exit

You could save that as something like "out_H.s", and then assemble it with something like

vasmm68k_mot -Ftos -no-opt -o OUT_H.PRG -L out_H.lst out_H.s

You might want to make a new directory to keep this work separate from the earlier work, and save the file there before you assemble it.

Remember, after you start hatari, to 

  • use Ctrl-C to break out of the GUI shell into the command-line shell, 
  • change to your new working directory, 
  • use Alt-PAUSE to break into the debugger, 
  • set a breakpoint on entry to TEXT,
  • continue back to the command-line shell,
  • run the program by the program file name you gave to vasmm68k_mot,
  • and show the registers before and while you step through the code.

Both before you execute the TRAP and after, you can use the memory dump command to look at the stacks, to help see what's going on. 

Using the register display command, you will see the user stack pointer (USP) and system (interrupt) stack pointer (ISP) and you can use those addresses for the memory dump:

Skipping duplicate address & symbol name checks when autoload is enabled.
Loaded 6 symbols (4 TEXT) from '/home/nova/usr/share/hatari/C:/primer/s2_str/OUT_H.PRG'.

CPU=$13d10, VBL=7282, FrameCycles=62520, HBL=61, LineCycles=544, DSP=N/A
00013d10 4ef9 0001 3d16           jmp $00013d16
> s

CPU=$13d16, VBL=7282, FrameCycles=62532, HBL=61, LineCycles=556, DSP=N/A
00013d16 3f3c 0048                move.w #$0048,-(a7) [0000]
> s

CPU=$13d1a, VBL=7282, FrameCycles=62544, HBL=61, LineCycles=568, DSP=N/A
00013d1a 3f3c 0002                move.w #$0002,-(a7) [0000]
> s

CPU=$13d1e, VBL=7282, FrameCycles=62556, HBL=61, LineCycles=580, DSP=N/A
00013d1e 3f3c 0003                move.w #$0003,-(a7) [0000]
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00000000   A1 00000000   A2 00000000   A3 00000000 
  A4 00013D2E   A5 00013D2E   A6 00077FC6   A7 00077FF4 
USP  00077FF4 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 3f3c (MOVE) 0003 (OR) Chip latch 00000000
00013d1e 3f3c 0003                move.w #$0003,-(a7) [0000]
Next PC: 00013d22
> s

CPU=$13d22, VBL=7282, FrameCycles=62568, HBL=61, LineCycles=592, DSP=N/A
00013d22 4e4d                     trap #$0d
> r 
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00000000   A1 00000000   A2 00000000   A3 00000000 
  A4 00013D2E   A5 00013D2E   A6 00077FC6   A7 00077FF2 
USP  00077FF2 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 4e4d (TRAP) 5c8f (ADDA) Chip latch 00000000
00013d22 4e4d                     trap #$0d
Next PC: 00013d24
> m $00077FF2 32
00077FF2: 00 03 00 02 00 48 00 00 00 00 00 01 3c 10 c6 00   .....H......<...
00078002: c6 00 38 00 38 00 00 00 00 00 00 00 00 00 00 00   ..8.8...........
> m $0007E64 16
00007E64: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
> s

CPU=$e00c4a, VBL=7282, FrameCycles=62604, HBL=61, LineCycles=628, DSP=N/A
00e00c4a 3239 00e0 198c           move.w $00e0198c [000c],d1
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00000000   A1 00000000   A2 00000000   A3 00000000 
  A4 00013D2E   A5 00013D2E   A6 00077FC6   A7 00007E5E 
USP  00077FF2 ISP  00007E5E 
T=00 S=1 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 3239 (MOVE) 00e0 (ILLEGAL) Chip latch 00000000
00e00c4a 3239 00e0 198c           move.w $00e0198c [000c],d1
Next PC: 00e00c50
> m $77FF2 16
00077FF2: 00 03 00 02 00 48 00 00 00 00 00 01 3c 10 c6 00   .....H......<...
> m $7E5E 16
00007E5E: 03 00 00 01 3d 24 00 00 00 00 00 00 00 00 00 00   ....=$..........
> 

You can see the parameters and the BIOS call selector there on the USP stack. After the call, you can see the status register and return address there on the ISP stack.

While we are here, look back at the supervisor (S) bit in the status flags and at the address in A7 to see which stack is active when.

You may not want to trace all the way through the BIOS code. It does get a bit monotonous. Or you might find it interesting, after all. Whatever. Either way, go ahead and step through and/or continue to the end.

Check the screen after you let it run to completion. Look for the 'H' on the next line before the command-line shell prompt, something like

C:\primer\s2_str>OUT_H.PRG
HC:\primer\s2_str>

(If you don't let it run to completion and return to the TOS command-line prompt, you'll find that line buffering won't show you the character you output until it shows you the whole next line with the prompt. Ignore that for now, since it does ultimately show up.)

String Output

There doesn't seem to be a routine in the BIOS to output a string -- nor even in the extended XBIOS. But there is one in the DOS level calls, called PRINT LINE in some documentation, which prints an entire string to the screen at the current cursor position. Again, we won't be using the system library string output routines for much after this, but let's take a look at how it works.

Things to consider about this call:

  • The GEMDOS calls are called by TRAP 1. 
  • The PRINT LINE function is function number 9.
  • The parameter is the 32-bit address of the string.
  • The string is terminated by an ASCII NUL (0).

Before looking at the example code below, refer to the example above and see if you can construct an appropriate example on your own. Remember you need to push the parameters and the function code in the correct order, and then make the right TRAP call.

If you define your own stack area, you'll want to know that the 68000's standard equivalent of RMB is DS.B. Since the assembler also provides DS.L, you don't have to multiply the number of levels of call you want to provide room for by the byte width of return addresses.
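
As a quick sketch of that difference, here are two ways to reserve the same 128 bytes for 32 levels of 4-byte return addresses. The labels are only for illustration, and the second form is commented out so the label isn't defined twice:

```asm
* 32 levels of 4-byte return addresses, counted two ways.
STKLIM	DS.B	32*4	; count the bytes yourself: 32 levels times 4 bytes each
*STKLIM	DS.L	32	; or let DS.L do the multiplying: 32 long words
STKBAS	EQU	*	; label just past the high end, where the stack starts
```

Either way, the label at the high end is the one you would load into A7, since the 68000 decrements before it stores on a push.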

What do you think? Can you do it?

Compare your work with mine:

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there is a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************

*
GEMDOSTRAP	EQU	1
GEMprintstr	EQU	9	; PRINT LINE in some docs

LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
NUL	EQU	0

* Opinions may vary about the natural integer.
NATWID	EQU	4	; 4 bytes in the CPU's natural integer

	EVEN	
ENTRY	JMP	START
STKLIM	DS.L	32	; 32 levels minus parameters and locals
STKBAS	EQU	*	; 68000 is pre-dec (pre-store-decrement) push
*
HELLO	DC.B	CR,LF	; Put message at beginning of line
	DC.B	"SEKAI YO, YAI!"	; Whatever the user wants here.
	DC.B	CR,LF,NUL	; Put the debugger's output on a new line.
*
	EVEN
START	MOVE.L	A7,D7			; D7 is supposed to be preserved.
	MOVE.L	#STKBAS,A7		; load our own stack
	MOVE.L	#HELLO,-(A7)		; push address of string to be output
	MOVE.W	#GEMprintstr,-(A7)	; push the GEMDOS function number
	TRAP	#GEMDOSTRAP		; call into GEMDOS
	MOVE.L	D7,A7			; restore previous user stack pointer
*	ADDQ.L	#6,A7		; no need to deallocate after restore
	NOP
DONE	NOP
* One way to return to the OS or other calling program
	clr.w	-(sp)	; there should be enough room on the caller's stack
	trap	#1		;	quick exit

If you save the above as "out_str.s", you can assemble it with

vasmm68k_mot -Ftos -no-opt -o OUT_STR.PRG -L out_str.lst out_str.s

and you can run it under the debugger as you did with the character output routine. Go ahead and do that and watch the stack as you do:

Loaded 11 symbols (5 TEXT) from '/home/nova/usr/share/hatari/C:/primer/s2_str/OUT_STR.PRG'.

CPU=$13d10, VBL=6908, FrameCycles=97032, HBL=95, LineCycles=512, DSP=N/A
00013d10 4ef9 0001 3daa           jmp $00013daa
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00000000   A1 00000000   A2 00000000   A3 00000000 
  A4 00013DC8   A5 00013DC8   A6 00077FC6   A7 00077FF8 
USP  00077FF8 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 4ef9 (JMP) 0001 (OR) Chip latch 00000000
00013d10 4ef9 0001 3daa           jmp $00013daa
Next PC: 00013d16
> s

CPU=$13daa, VBL=6908, FrameCycles=97044, HBL=95, LineCycles=524, DSP=N/A
00013daa 2e0f                     move.l a7,d7
> s

CPU=$13dac, VBL=6908, FrameCycles=97048, HBL=95, LineCycles=528, DSP=N/A
00013dac 2e7c 0001 3d96           movea.l #$00013d96,a7
> s

CPU=$13db2, VBL=6908, FrameCycles=97060, HBL=95, LineCycles=540, DSP=N/A
00013db2 2f3c 0001 3d96           move.l #$00013d96,-(a7) [00000000]
> s

CPU=$13db8, VBL=6908, FrameCycles=97080, HBL=95, LineCycles=560, DSP=N/A
00013db8 3f3c 0009                move.w #$0009,-(a7) [0000]
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00077FF8 
  A0 00000000   A1 00000000   A2 00000000   A3 00000000 
  A4 00013DC8   A5 00013DC8   A6 00077FC6   A7 00013D92 
USP  00013D92 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 3f3c (MOVE) 0009 (ILLEGAL) Chip latch 00000000
00013db8 3f3c 0009                move.w #$0009,-(a7) [0000]
Next PC: 00013dbc
> s

CPU=$13dbc, VBL=6908, FrameCycles=97092, HBL=95, LineCycles=572, DSP=N/A
00013dbc 4e41                     trap #$01
> m $13d92 16
00013D92: 00 01 3d 96 0d 0a 53 45 4b 41 49 20 59 4f 2c 20   ..=...SEKAI YO, 
> m $7e64 16
00007E64: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00077FF8 
  A0 00000000   A1 00000000   A2 00000000   A3 00000000 
  A4 00013DC8   A5 00013DC8   A6 00077FC6   A7 00013D90 
USP  00013D90 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 4e41 (TRAP) 2e47 (MOVEA) Chip latch 00000000
00013dbc 4e41                     trap #$01
Next PC: 00013dbe
> s

CPU=$fa002a, VBL=6908, FrameCycles=97128, HBL=95, LineCycles=608, DSP=N/A
00fa002a 0008                     illegal 
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00077FF8 
  A0 00000000   A1 00000000   A2 00000000   A3 00000000 
  A4 00013DC8   A5 00013DC8   A6 00077FC6   A7 00007E5E 
USP  00013D90 ISP  00007E5E 
T=00 S=1 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 0008 (ILLEGAL) 690a (Bcc) Chip latch 00000000
00fa002a 0008                     illegal 
Next PC: 00fa002c
> m $13d90 16
00013D90: 00 09 00 01 3d 96 0d 0a 53 45 4b 41 49 20 59 4f   ....=...SEKAI YO
> m $7e5e 16
00007E5E: 03 00 00 01 3d be 00 00 00 00 00 00 00 00 00 00   ....=...........
> 

Go ahead and step through or continue, and make sure you can get the string of your choice output to the screen:

C:\primer\s2_str>OUT_STR.PRG

SEKAI YO, YAI!
C:\primer\s2_str>

I want to show you the PEA instruction, and I may not have another excuse for it, so let's look at that again using some of those other addressing modes the 68000 enables. Here's the code. I've stepped through it, and it works:

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there is a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************

*
GEMDOSTRAP	EQU	1
GEMprintstr	EQU	9	; PRINT LINE in some docs

LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
NUL	EQU	0

* Opinions may vary about the natural integer.
NATWID	EQU	4	; 4 bytes in the CPU's natural integer

	EVEN	
ENTRY	BRA.W	START
STKLIM	DS.L	32	; 32 call levels minus parameters and locals
STKBAS	EQU	*	; 68000 is pre-dec (pre-store-decrement) push
*
HELLO	DC.B	CR,LF	; Put message at beginning of line
	DC.B	"SEKAI YO, YAI!"	; Whatever the user wants here.
	DC.B	CR,LF,NUL	; Put the debugger's output on a new line.
*
	EVEN
START	MOVE.L	A7,D7			; D7 is supposed to be preserved.
	LEA	STKBAS(PC),A7		; load our own stack
	PEA	HELLO(PC)		; push address of string to be output
	MOVE.W	#GEMprintstr,-(A7)	; push the GEMDOS function number
	TRAP	#GEMDOSTRAP		; call into GEMDOS
	MOVE.L	D7,A7			; restore previous user stack pointer
	NOP
DONE	NOP
* One way to return to the OS or other calling program
	clr.w	-(sp)	; there should be enough room on the caller's stack
	trap	#1		;	quick exit

Go ahead and assemble it and run it through the debugger. After you step through the PEA instruction, take a look at the stack to make sure the right address is there. Use the (m)emory dump instruction to show appropriate pieces of memory, to make sure you believe it.

Now, the Push Effective Address (PEA) instruction is not going to be nearly as useful to us as we might wish. It only pushes effective addresses on the A7 stack, and we are not going to be doing that very much. But you need to know it exists.
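
For a rough picture of what PEA saves you, here is a sketch of the same push done both ways. The label MSG, the labels PUSH1 and PUSH2, and the choice of A0 as a scratch register are just for illustration. A real program would use one sequence or the other, not both:

```asm
MSG	DC.B	"HI",0	; hypothetical string, just something to point at
	EVEN
* With PEA, one instruction and no scratch register:
PUSH1	PEA	MSG(PC)		; push the 32-bit address of MSG on the A7 stack
* Without PEA, two instructions and a scratch register:
PUSH2	LEA	MSG(PC),A0	; compute the address of MSG in A0
	MOVE.L	A0,-(A7)	; then push it
```

If you step through both in the debugger, A7 should drop by 4 each time, and a memory dump at A7 should show the address of MSG.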

So, we have a sort-of Hello-world! program that isn't really as functional as we wish for a number of reasons.

If you haven't yet gone through the soft introduction to branching and loops that I used to work around issues in my EXORsim6801 branch of EXORsim, you should. 

And you might also want to take a detour (I strongly recommend it!) to practice debugging on the 6800 and 6809.

If you have already gone through that, I think we want to walk through using the character output functions to write our own string output functions. And I guess we'll want to understand why, first, or at least as we go.


(Title Page/Index)

Sunday, August 4, 2024

ALPP 01-10 -- Hello, World! (Not Yet on the Beach) -- 6800, 6801, 6809

Hello, World!
(Not Yet on the Beach)
6800, 6801, 6809

(Title Page/Index)

 

We've seen a little about how to work our way sequentially through a list of small integers.

Characters are essentially small integers. In the case of 7-bit US ASCII, they are -- or were -- 7-bit integers, with values ranging from 0 to 127. 

So a string of text is a list of small integers, serially accessed.
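
To make that concrete, here is a small sketch in 6800 assembler showing three ways to write the same small integer. In ASCII, 'H' is $48, which is 72 in decimal, so all three lines should assemble to the same two bytes:

```asm
	LDAA	#'H	; character constant: assembles to 86 48
	LDAA	#$48	; the same value in hexadecimal: 86 48
	LDAA	#72	; the same value in decimal: 86 48
```

The CPU never sees a "character" here, only the number $48.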

At least, it used to be so in the small computer world, back in the 1970s and early '80s. Then ligatures came to the PC world. That alone broke the 7-bit ASCII small integer paradigm. And then along came international character sets and mixed contexts and byte orders and assertions that 16 bits would be enough for all the useful characters of all modern languages together, ...

Don't get me wrong, the Unicode Consortium has been doing lots of useful work cataloging characters and their construction. And Unicode itself makes possible a lot more international communication than has been possible in the past.

But the existence of Unicode makes it clear that it is no longer possible to just consider characters to be small integers and be done with it. Nor can we consider text to be a list of small integers serially accessed and be done with it. That isn't really true, even if you count 32-bit integers as small integers.

But it never really was true. Ligatures were only part of the problems that broke the paradigm, even before we started trying to take on all the languages of the world.

On the other hand, you could do a lot of useful work under the assumption of a list of small integers. That assumption has driven the success of Google and is at the heart of the "progress" of AI. There is a very large subset of the characters that we use and the text that we generate and use that does follow the paradigms of a sequence of small integers. 

And we still can do a lot of useful work this way, particularly in the limited world of retro computing. We just need to recognize up front that there are a lot of Unicode characters beyond US ASCII that we won't be able to deal with.

We'll start here using the ASCII set that fits nicely in 7 bits, under the assumption that at least these characters are nice small integers. (And we may not actually get beyond that paradigm in any practical sense, in this primer.)

This subset usage is made possible by our choice of platforms, although most of the computers that are now considered retro had fairly simple, compact character sets.

(If we had chosen the Tandy Color Computer for the 6809, we would have a much more limited set of characters defined by the 6847 video display generator, not even true US ASCII.)

Our target for this exercise is to get a string of characters, such as 

  • "On the beach!"
  • "Hello World!" 
  • "My name is Mudd."
  • "Here we are now."

output where we can read it. 

Both the EXORciser and the Atari ST include low-level routines for outputting characters. If you can output characters and if you know how to construct a loop, you can output strings of text, as well.

The EXORciser also has low-level routines for outputting strings. The purpose of this tutorial is not to teach how to use the EXORciser monitor ROM, EXbug -- or the Atari BIOS, either -- but we will take a quick look at one of the string routines in EXbug, both to get a general idea of how to access such routines, and to get a general idea of what the routines we are going to put together for our purposes should do and should not do.

War story -- When I was first investigating Joe H. Allen's EXORsim (6800-only at the time), to see how useful it would be for testing my old fig-Forth source code transcriptions, I went looking for how to get character input and output. Couldn't find much, but I noted Joe's mention of the facts files being used by the simulator to figure out labels for disassembly. So I took a look inside that file.

From my previous experience (and the fig-FORTH source), I thought I should be looking for labels such as "INCH" and "OUTCH". And I found several suspicious labels in the facts file. 

In particular, these labels, part of what appears from the facts file to be a jump table that starts at $F000, caught my attention:

f012 code INBYTEV	Input byte with echo
f015 code INCHV		Input char with echo (strip bit 7)
f018 code OUTCH		Output character

If you disassemble the ROM code at $F000, you'll see that there is, indeed, a series of jump instructions starting at $F000: 

% u f000 
F000: 7E F5 58 COLDSTART  JMP $F558 [COLDBOOT Cold start (main reset)]			* Cold start
F003: 7E F7 89 ADDRPROMPT JMP $F789 [ADDRPRMT Prompt for addresses]			* Prompt for address
F006: 7E FA A7 HEXBIN     JMP $FAA7 [CVTHEX Convert ascii hex to binary]			* Convert hex to binary
F009: 7E F9 C0 CVTUPPER   JMP $F9C0 [CVTUPP Convert upper half to ascii hex]			* Convert upper half to ascii
F00C: 7E F9 C4 CVTLOWER   JMP $F9C4 [CVTLOW Convert lower half to ascii hex]			* Convert lower half to ascii
F00F: 7E FA 65 GETHEX     JMP $FA65 [GETHEX4 Get 4 hex digits from user into x]			* Get 4 hex digits from user into x
F012: 7E FA 8B INBYTEV    JMP $FA8B [INBYTE Input byte with echo unless AECHO is set]			* Input byte with echo
F015: 7E FA A0 INCHV      JMP $FAA0 [INCH Input character]			* Input char with echo (strip bit 7)
F018: 7E F9 DC OUTCH      JMP $F9DC [OUTCH Output character with NULs]			* Output character
F01B: 7E FA 24 OUTHEX1    JMP $FA24 [OUTHEX1 Output byte in hex ,x+]			* Output byte in hex ,x+
F01E: 7E FA 22 OUTHEX2    JMP $FA22 [OUTHEX2 Output 2 bytes in hex ,x+]			* Output 2 bytes in hex ,x++
F021: 7E FA 41 PCRLF      JMP $FA41 [PCRLF Print CR-LF]			* Print CR-LF
F024: 7E FA 33 PDATA      JMP $FA33 [PDATA Print CR-LF then string]			* Print CR-LF then string
F027: 7E FA 35 PDATA1     JMP $FA35 [PDATA1 Print string]			* Print string
F02A: 7E FA 26 PSPC       JMP $FA26 [PSPC Print space]			* Print space

Yes, this is a jump table. Its purpose is to present a reliable set of addresses to key functions.

Let's look at OUTCH, the entry at $F018, as an example. You see that it will jump from there to $F9DC. This seems like unnecessary jumping around, but the extra jump doesn't take much time compared to the character output function itself. 

If the version of the monitor ROM changes, the actual character output routine might move, say, to $F9D8. In that case, the jump at OUTCH will change to jump to $F9D8 instead of $F9DC, but any program that jumps to OUTCH will then jump from there to the right place without having to be updated. 

The ROM jump table provides a way for programs that use the functions inside the ROM to access those functions even when the insides of the ROM change -- as long as the jump table is maintained. (It can be maintained by re-assembling the monitor from the source code, or by patching it by hand. But the former is more reliable.)
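
Here is a minimal sketch of the idea, written for a conventional Motorola-syntax 6800 assembler. The addresses and all of the labels are made up for illustration and have nothing to do with the real EXbug layout:

```asm
* Hypothetical ROM layout, for illustration only.
	ORG	$E000
VOUTCH	JMP	DOOUT	; fixed entry point: callers always JSR $E000
VINCH	JMP	DOIN	; fixed entry point: callers always JSR $E003
*
* The real routines can move around when the ROM is revised.
* Only the operands of the JMPs above have to change.
DOOUT	RTS		; (character output code would go here)
DOIN	RTS		; (character input code would go here)
```

Each extended-mode JMP takes three bytes, which is why the entries in a table like this land three bytes apart.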

Outputting a single character, 6800/6801:

After I disassembled the code that the OUTCH entry jumps to and examined it, I decided to give it a try. I used something like the following bit of code:

XOUTCH	EQU	$F018
*
ENTRY	LDAA	#'H	; the character to output
	JSR	XOUTCH	; Call the output routine in monitor ROM
	NOP		; landing pad
	NOP

Yeah. It's really short. Let's use it now.

EQU is a directive to the assembler, to EQUate a label to a value. In this case the label XOUTCH is defined as the OUTCH entry in the EXbug monitor ROM jump table. 

EQU produces no actual object code, only an entry in a symbol table.
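
A small sketch of what that means in practice: the EQU line emits nothing, and the two JSR lines should assemble to the same three bytes, because the label is just another way of writing the number:

```asm
XOUTCH	EQU	$F018	; emits no bytes; XOUTCH simply means $F018 from here on
	JSR	XOUTCH	; assembles to BD F0 18
	JSR	$F018	; assembles to the same three bytes, BD F0 18
```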

The operand to the LDAA is a small integer, in this case the ASCII code for 'H'. You can put a trailing single quote ('H') on it to make it less disconcerting to human readers, but Motorola's assemblers don't care here. It's just a one-byte character; anything after the byte will be ignored.

JSR is Jump to SubRoutine, the general way to call subroutines in the 6800 and the 6801. It pushes the return address (the address of the first following NOP, in this case) on the return address stack pointed to by the stack pointer register, S. When the subroutine is done, it ends with a RTS -- ReTurn from Subroutine -- instruction, which pops the return address back off the stack and puts it back in the program counter, allowing the CPU to proceed with the instruction that follows the JSR.
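
If you want to watch that happen without going through the monitor ROM, here is a minimal sketch with a subroutine of our own. TWICE is a hypothetical routine invented for this example. Set a breakpoint on the second NOP so execution doesn't run on into the subroutine:

```asm
ENTRY	LDAA	#$21	; A = $21
	JSR	TWICE	; push the address of the NOP below, then jump to TWICE
	NOP		; the RTS brings us back here, with A = $42
	NOP
*
TWICE	ASLA		; hypothetical subroutine: shift A left one bit, doubling it
	RTS		; pop the return address back into the program counter
```

Stepping through it, you should see S drop by two at the JSR and come back up by two at the RTS.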

How did S get to point to valid memory, by the way? The monitor ROM does that for us, but we won't want to depend on that in the future. I know it will be okay this time. But I'll show you how we can set up our own stack as soon as we've watched this tiny program run on the 6800 and 6801.

So, break out an EXORciser 6800 session, as we've done before, and assemble this at $2000. Disassemble it to be sure.

$ ./exor --mon
Load facts file 'facts'
'exbug.bin' loaded.
  EXBUG-1.1 detected
'mdos.dsk' opened for drive 0 (double sided)

OSLOAD...

Hit Ctrl-C for simulator command line.  Starting simulation...

>         0 A=00 B=00 X=0000 SP=00FF ------          0020: B6 E8 00 LDA E800                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% a 2000
2000: XOUTCH	EQU	$F018
2000: *
2000: ENTRY	LDAA	#'H'	; the character to ouput
2002: 	JSR	XOUTCH	; output routine in monitor ROM
2005: 	NOP		; landing pad
2006: 	NOP
2007: 
% u 2000
2000: 86 48               LDA #$48
2002: BD F0 18            JSR $F018 [OUTCH Output character]
2005: 01                  NOP
2006: 01                  NOP
2007: 00                  ???
2008: 00                  ???
...
%

Set a breakpoint at the appropriate place, set tracing on, and, just for fun, step (s 2000) through the first instruction at $2000, to watch it load $48 (ASCII code for 'H') into accumulator A:

% b 2006
Breakpoint set at 2006
% t on
% s 2000

          0 A=00 B=00 X=0000 SP=00FF ------ ENTRY    2000: 86 48    LDA #48   EA=2001 D=48   
>         1 A=48 B=00 X=0000 SP=00FF ------          2002: BD F0 18 JSR F018                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
%

There it is in A.

Continue from there and watch it trace through the ROM routine and return. (Scroll to the right if you can't see what the CPU is doing in the listing.) --

% c

          1 A=48 B=00 X=0000 SP=00FF ------          2002: BD F0 18 JSR F018  EA=F018(XOUTCH) 

          2 A=48 B=00 X=0000 SP=00FD ------ XOUTCH   F018: 7E F9 DC JMP F9DC  EA=F9DC(OUTCH) 
          3 A=48 B=00 X=0000 SP=00FD ------ OUTCH    F9DC: 37       PSHB                     Output character with NULs
          4 A=48 B=00 X=0000 SP=00FC ------          F9DD: F6 FC F4 LDB FCF4  EA=FCF4(ACIA0) D=02 
          5 A=48 B=02 X=0000 SP=00FC ------          F9E0: C5 02    BITB #02  EA=F9E1 D=02   
          6 A=48 B=02 X=0000 SP=00FC ------          F9E2: 27 F9    BEQ F9DD  EA=F9DD        
          7 A=48 B=02 X=0000 SP=00FC ------          F9E4: B7 FC F5 STA FCF5  EA=FCF5(ACIA1) D=48 
          8 A=48 B=02 X=0000 SP=00FC ------          F9E7: 81 0D    CMPA #0D  EA=F9E8 D=0D   
          9 A=48 B=02 X=0000 SP=00FC ------          F9E9: 26 1B    BNE FA06  EA=FA06        
         10 A=48 B=02 X=0000 SP=00FC ------          FA06: 7D FF 02 TST FF02  EA=FF02(NULCTRL) 
         11 A=48 B=02 X=0000 SP=00FC ---Z--          FA09: 2A F9    BPL FA04  EA=FA04        
         12 A=48 B=02 X=0000 SP=00FC ---Z--          FA04: 33       PULB                     
         13 A=48 B=00 X=0000 SP=00FD ---Z--          FA05: 39       RTS                      

         14 A=48 B=00 X=0000 SP=00FF ---Z--          2005: 01       NOP                      

Breakpoint!
H>        15 A=48 B=00 X=0000 SP=00FF ---Z--          2006: 01       NOP                      

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% 

And we see the H output somewhere in there, in this case, right after the breakpoint is reported.

Just for grins, read through the code that it executed in the ROM routine and see if you can guess what it is doing.

Also just for fun and enlightenment, let's take a look at the memory pointed to by S. Type "m fc" and hit return until we've gone far enough, and then hit Ctrl-C once:

% m fc
00fc 00 
00fd 00 
00fe 20 
00ff 05 
0100 00 
0101 00 
%

You can see the return address, $2005, stored from $00fe to $00ff. Think about what address was in S when, and what that means for a minute.

I'll wait.

Got it, right? 

No?

Okay, I guess it isn't all that obvious. :-*

On the 6800 and 6801, S will be pointing to the next available byte on the stack any time you try to look at it. That's kind of opposite the usual approach in the industry, but it works, with a few caveats which I will return to later.
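
A short sketch of that behavior, assuming S already points to valid RAM. The value $55 is arbitrary:

```asm
	LDAA	#$55
	PSHA		; A is stored at the address in S, then S is decremented
	TSX		; TSX adds 1 as it copies, so X points at the byte just pushed
	LDAB	0,X	; B = $55
	PULA		; S is incremented first, then A is loaded from that address
```

The plus-one built into TSX is there precisely because S itself is always one byte below the last thing pushed.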

Again, the 6801 code will be exactly the same. And since no one has at this point attempted to optimize the monitor ROM for the 6801, the monitor ROM code should be exactly the same as well. The only difference would be that, if my emulation of the 6801 were accurate, it would run a bit faster. I'm not going to urge you to check the code on the 6801 for this one unless you want the practice.

[JMR202408181632 edit:]

If you do, you'll run into a problem I had forgotten about. If I get around to updating EXORsim6801, the problem should go away, but until then you'll need some workarounds. Don't forget to come back here if you go look at that now.

[JMR202408181632 edit-end.]

Continuing with the war story, now that Joe has added exor09 to his simulator, I have dug into the facts file for the 6809, and it is (currently) exactly the same as the 6800 facts file. Disassembling the ROM code for the 6809 shows that Motorola's engineers have made quite an effort to avoid changing entry points and effects of code, in spite of the 6809 providing much better native approaches for much of it.  

And the M6809 EXORciser's User's Guide now documents the jump table and some other entry points, where the M6800 EXORciser's User's Guide did not. So we have a more complete description of the routines.

If you download the M6809 EXORciser's User's Guide, you can find descriptions of the supported entry points in section 3-7, beginning on page 3-9, about p. 50 of the PDF I have.

[JMR202410141236 Note: I haven't explored all of the facts files yet, but I have found a parameter definition that should be different -- AECHO. Based on that, we should be a bit wary of assuming too much. I haven't checked many of the variables and parameters. The entry points I've checked so far do match.]

Outputting a single character, 6809:

The 6809 source code will use the 6809 LDA mnemonic, and otherwise will be the same as the 6800:

$ ./exor09 --mon
Load facts file 'facts09'
'exbug09.bin' loaded.
  EXBUG09-2.1 detected
'mdos09.dsk' opened for drive 0 (double sided)

OSLOAD...

Hit Ctrl-C for simulator command line.  Starting simulation...

>         0 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            0020: 86 10        LDA #$10                   

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% a 2000
2000: XOUTCH	EQU	$F018
2000: *
2000: ENTRY	LDA	#'H'	; the character to ouput
2002: 	JSR	XOUTCH	; output routine in monitor ROM
2005: 	NOP		; landing pad
2006: 	NOP
2007: 
% u 2000
2000: 86 48               LDA #$48
2002: BD F018             JSR $f018
2005: 12                  NOP 
2006: 12                  NOP 
2007: 00 00               NEG $00
2009: 00 00               NEG $00
...
% b 2006
Breakpoint set at 2006
% c 2000

Breakpoint!
H         28 A=48 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -----Z--            2005: 12           NOP
>        29 A=48 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -----Z--            2006: 12           NOP

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
%

Woops! Forgot to turn tracing on. (Deliberately? ;-) But you can see the 'H' output there. 

Turn tracing on and do it again (and scroll to the right in the listing below to watch what the CPU is doing): 

% t on
% c 2000

         29 A=48 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -----Z-- ENTRY      2000: 86 48        LDA #$48                   
         30 A=48 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            2002: BD F018      JSR $f018                  
         31 A=48 B=00 X=0000 Y=0000 U=0000 S=00FD P=00 -------- XOUTCH     F018: 16 0092      LBRA $f0ad                 
         32 A=48 B=00 X=0000 Y=0000 U=0000 S=00FD P=00 --------            F0AD: 7D FF67      TST $ff67   EA=FF67 D=00   
         33 A=48 B=00 X=0000 Y=0000 U=0000 S=00FD P=00 -----Z--            F0B0: 26 76        BNE $f128                  
         34 A=48 B=00 X=0000 Y=0000 U=0000 S=00FD P=00 -----Z--            F0B2: 8D 74        BSR $f128                  
         35 A=48 B=00 X=0000 Y=0000 U=0000 S=00FB P=00 -----Z--            F128: 17 00AC      LBSR $f1d7                 
         36 A=48 B=00 X=0000 Y=0000 U=0000 S=00F9 P=00 -----Z--            F1D7: 34           PSHS B                     
         37 A=48 B=00 X=0000 Y=0000 U=0000 S=00F8 P=00 -----Z--            F1D9: F6 FCF4      LDB $fcf4   EA=FCF4(ACIA0) D=02 
         38 A=48 B=02 X=0000 Y=0000 U=0000 S=00F8 P=00 --------            F1DC: C5 02        BITB #$02                  
         39 A=48 B=02 X=0000 Y=0000 U=0000 S=00F8 P=00 --------            F1DE: 27 F9        BEQ $f1d9                  
         40 A=48 B=02 X=0000 Y=0000 U=0000 S=00F8 P=00 --------            F1E0: B7 FCF5      STA $fcf5   EA=FCF5(ACIA1) D=48 
         41 A=48 B=02 X=0000 Y=0000 U=0000 S=00F8 P=00 --------            F1E3: 35           PULS PC,B                  
         42 A=48 B=00 X=0000 Y=0000 U=0000 S=00FB P=00 --------            F12B: 34           PSHS B,A                   
         43 A=48 B=00 X=0000 Y=0000 U=0000 S=00F9 P=00 --------            F12D: 84 7F        ANDA #$7f                  
         44 A=48 B=00 X=0000 Y=0000 U=0000 S=00F9 P=00 --------            F12F: 81 0D        CMPA #$0d                  
         45 A=48 B=00 X=0000 Y=0000 U=0000 S=00F9 P=00 --------            F131: 26 15        BNE $f148                  
         46 A=48 B=00 X=0000 Y=0000 U=0000 S=00F9 P=00 --------            F148: F6 FF02      LDB $ff02   EA=FF02(NULCTRL) D=00 
         47 A=48 B=00 X=0000 Y=0000 U=0000 S=00F9 P=00 -----Z--            F14B: 7D FF67      TST $ff67   EA=FF67 D=00   
         48 A=48 B=00 X=0000 Y=0000 U=0000 S=00F9 P=00 -----Z--            F14E: 27 ED        BEQ $f13d                  
         49 A=48 B=00 X=0000 Y=0000 U=0000 S=00F9 P=00 -----Z--            F13D: C4 7F        ANDB #$7f                  
         50 A=48 B=00 X=0000 Y=0000 U=0000 S=00F9 P=00 -----Z--            F13F: 5A           DECB                       
         51 A=48 B=FF X=0000 Y=0000 U=0000 S=00F9 P=00 ----N---            F140: 2B 0E        BMI $f150                  
         52 A=48 B=FF X=0000 Y=0000 U=0000 S=00F9 P=00 ----N---            F150: 4F           CLRA                       
         53 A=00 B=FF X=0000 Y=0000 U=0000 S=00F9 P=00 -----Z--            F151: 35           PULS PC,B,A                
         54 A=48 B=00 X=0000 Y=0000 U=0000 S=00FD P=00 -----Z--            F0B4: 7D FF37      TST $ff37   EA=FF37 D=00   
         55 A=48 B=00 X=0000 Y=0000 U=0000 S=00FD P=00 -----Z--            F0B7: 27 DA        BEQ $f093                  
         56 A=48 B=00 X=0000 Y=0000 U=0000 S=00FD P=00 -----Z--            F093: 39           RTS                        
         57 A=48 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -----Z--            2005: 12           NOP                        

Breakpoint!
H>        58 A=48 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -----Z--            2006: 12           NOP                        

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
%

We can see that the 6809's path through the monitor ROM is different, and contains instructions unique to the 6809. Just for curiosity, let's look at the 6809's stack after the code is run:

% m 00f8
00f8 00 
00f9 48 
00fa 00 
00fb f0 
00fc b4 
00fd 20 
00fe 05 
00ff 00 
0100 00 
0101 00 
% 

We can see that much more information has been stored on the stack during the run for some reason. (More registers to save? I'm guessing that the 6809 is doing a bit more than would be necessary, in order to maintain compatibility with the EXORciser's 6800-based code.)

We should note that the return address to our code has been stored from $00fd to $00fe. The 6809 varies from the 6800 in stack discipline, always pointing to the last byte pushed instead of to the next available byte. We'll discuss this later, as well.
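
For comparison with the 6800, here is a short 6809 sketch of the same idea, again assuming S already points to valid RAM and using an arbitrary value:

```asm
	LDA	#$55
	PSHS	A	; S is decremented first, then A is stored at the address in S
	LDB	,S	; S already points at the byte just pushed, so B = $55
	PULS	A	; A is loaded from the address in S, then S is incremented
```

No plus-one adjustment is needed here, because S points directly at the most recently pushed byte.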

Now let's try outputting a full string of text on the 6800/6801.

As I mentioned earlier, in the M6809 EXORciser's User's Guide, we find a more complete description of the EXbug routines for outputting strings. We don't intend to use these routines extensively, for reasons I'll explain later, but we will take a quick look at them.

XPDATA and XPDAT1 are described on p. 3-35, about p. 56 in the PDF I have. These correspond to PDATA and PDATA1 in Joe's facts file. 

Specific points of usage are that the string should be terminated by EOT (ASCII 4), and that the X index register should point to the beginning of the string when you call the routine.

Providing PDATA as an entry point that outputs a carriage return/line feed in front of a string is a bit of a byte-count-saving technique, which we will ignore, to keep our excursion into the ROMs short. We'll just use the more general PDATA1 and insert CR/LF characters as necessary to separate the strings from the debugger output.

And, as I said, this time we'll provide our own stack. The Reserve Memory Block directive, RMB, is useful for this.

The assembler keeps track of where the next instruction should be allocated in memory. This pointer is sometimes called the "here" counter or pointer, as in "You are here.".

In effect, the RMB directive adds its argument to the assembler's here pointer, without recording any instructions, constants, or data in the gap. 

A linker or loader may later fill the gap in with zeroes or some other constant, or may just leave it with whatever might have been there before, essentially "garbage" data. You can't rely on what might be in there.
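
A minimal sketch of how the here pointer moves, written for a conventional Motorola-syntax 6800 assembler. The labels and the starting address are made up for illustration:

```asm
	ORG	$2000
BUFFER	RMB	8	; BUFFER = $2000; here pointer moves to $2008, no bytes emitted
COUNT	RMB	1	; COUNT = $2008; here pointer moves to $2009
MARK	FCB	$FF	; MARK = $2009; this byte really is emitted
```

Only the FCB line puts anything into the object code. The nine bytes below it are reserved but left undefined.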

You'll note that the stack's label is at the end of the area allocated for the stack, not the beginning. This is because data is pushed down in memory, most recently pushed data going below what was there before. Also, you'll note that, for the 6800, the initial stack pointer is pointing to the first place available to push a byte to.

Another note: Joe's interactive assembler doesn't support having the assembler calculate the size of the stack, which is why I commented that line out and replaced it with a pre-calculated size.

I've braced the string in CR/LF pairs to make it more visible in the output. "Sekai yo, yai!" is a Latinization of (or, Rōmaji for) 「世界よ、ヤイ!」 -- a rough approximation of "Hey, World, Hello!" in Japanese. Choose your own phrase to put in here. 

Placement of the text after the stack was deliberate. The stack grows downward, away from the text, so the text won't be overwritten in the unlikely event of a stack overflow.

And don't forget that the monitor ROM routines expect the End Of Text character at the end of the string.

XPDAT1	EQU	$F027	; string output, terminated by EOT
EOT	EQU	$04	; $04 is decimal 4
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
ENTRY	JMP	START
* (EXORsim apparently doesn't want to calculate RMB arguments.)
*	RMB	16*NATWID-1
	RMB	31	; 16 levels of call minus any saved registers, max
STKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
SAVES	RMB	2	; a place to keep S so we can be clean
HELLO	FCB	CR,LF	; Put message at beginning of line
	FCB	"SEKAI YO, YAI!"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
*
START	STS	SAVES	; Save what the monitor gives us.
	LDS	#STKBAS	; Move to our own stack
	LDX	#HELLO	; point to the string
	JSR	XPDAT1	; output it
DONE	LDS	SAVES	; restore the stack pointer
	NOP
	NOP		; landing pad

Assemble at $2000. Set the breakpoint at the DONE label, just after the return from the call to XPDAT1 this time, but don't turn tracing on. Single step a few instructions to watch S and X get set. Then continue (c) from there to the breakpoint, and single step after the breakpoint to see S restored:

$ ./exor --mon
Load facts file 'facts'
'exbug.bin' loaded.
  EXBUG-1.1 detected
'mdos.dsk' opened for drive 0 (double sided)

OSLOAD...

Hit Ctrl-C for simulator command line.  Starting simulation...

>         0 A=00 B=00 X=0000 SP=00FF ------          0020: B6 E8 00 LDA E800                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% a 2000
2000: XPDAT1	EQU	$F027	; string output, terminated by EOT
2000: EOT	EQU	$04	; $04 is decimal 4
2000: LF	EQU	$0A	; line feed
2000: CR	EQU	$0D	; carriage return
2000: *
2000: NATWID	EQU	2	; 2 bytes in the CPU's natural integer
2000: *
2000: ENTRY	JMP	START
2003: * (EXORsim apparently doesn't want to calculate RMBs.)
2003: *	RMB	16*NATWID-1
2003: 	RMB	31	; 16 levels of call minus any saved registers, max
2022: STKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
2023: SAVES	RMB	2	; a place to keep S so we can be clean
2025: HELLO	FCB	CR,LF	; Put message at beginning of line
2027: 	FCB	"SEKAI YO, YAI!"	; Whatever the user wants here.
2035: 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
2038: *
2038: START	STS	SAVES	; Save what the monitor gives us.
Address at 2001 set to 2038
203b: 	LDS	#STKBAS	; Move to our own stack
203e: 	LDX	#HELLO	; point to the string
2041: 	JSR	XPDAT1	; output it
2044: DONE	LDS	SAVES	; restore the stack pointer
2047: 	NOP
2048: 	NOP		; landing pad
2049: 
% b 2044
Breakpoint set at 2044
% s 2000

          0 A=00 B=00 X=0000 SP=00FF ------ ENTRY    2000: 7E 20 38 JMP 2038  EA=2038(START) 
>         1 A=00 B=00 X=0000 SP=00FF ------ START    2038: BF 20 23 STS 2023                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          1 A=00 B=00 X=0000 SP=00FF ------ START    2038: BF 20 23 STS 2023  EA=2023(SAVES) D=00FF 
>         2 A=00 B=00 X=0000 SP=00FF ------          203B: 8E 20 22 LDS #$2022                

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          2 A=00 B=00 X=0000 SP=00FF ------          203B: 8E 20 22 LDS #$2022 EA=203C D=2022 
>         3 A=00 B=00 X=0000 SP=2022 ------          203E: CE 20 25 LDX #$2025                

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          3 A=00 B=00 X=0000 SP=2022 ------          203E: CE 20 25 LDX #$2025 EA=203F D=2025 
>         4 A=00 B=00 X=2025 SP=2022 ------          2041: BD F0 27 JSR F027                 


6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% c


Breakpoint!

SEKAI YO, YAI!
        321 A=04 B=00 X=2037 SP=2020 ---Z--          FA40: 39       RTS                      

>       322 A=04 B=00 X=2037 SP=2022 ---Z-- DONE     2044: BE 20 23 LDS 2023                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

        322 A=04 B=00 X=2037 SP=2022 ---Z-- DONE     2044: BE 20 23 LDS 2023  EA=2023(SAVES) D=00FF 
>       323 A=04 B=00 X=2037 SP=00FF ------          2047: 01       NOP                      

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% 

And we see the string output just after the breakpoint is reported.

While you're looking at the output string, play around a bit. Disassemble the code, look at the stack, check where the stack pointer was saved, that kind of stuff. Take notes of anything interesting. 

You might even try doing it again, setting trace mode on this time. But, if you do, be prepared to wade through a lot of trace output. And be prepared to keep a sharp lookout for the characters of the string mixed in with the trace.

Again, the 6801 will do just the same as the 6800. Do it for practice, anyway, unless you're really impatient. 

[JMR202408181636 edit:]

But, as I have now noted above, you'll run into a problem I had forgotten about. If I get around to updating EXORsim6801, the problem should go away, but until then you'll need some workarounds. And, again, don't forget to come back here when done, if you go there now.

[JMR202408181636 edit-end.]

And let's try outputting a full string of text on the 6809.

The same 6800 source will actually work on the 6809 as it is. As it turns out, 15 and a half levels minus the bytes of saved registers is still plenty and then some. Give it a try and note how the return address gets saved starting one byte lower on the stack. You were expecting that by now, right?
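
As a sketch of what to look for, here are the two relevant lines of the 6800 source again, with comments comparing where each CPU puts the return address (the comments are annotations, not output from either simulator):

	LDS	#STKBAS	; S = STKBAS on either CPU
	JSR	XPDAT1	; 6800: return address stored at STKBAS-1 and STKBAS
*			; 6809: return address stored at STKBAS-2 and STKBAS-1
* Either way, S = STKBAS-2 inside the routine.
* On the 6809 the byte at STKBAS itself is never written,
* which is where the half level goes: 31 usable bytes instead of 32.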

But let's modify the source for the way the 6809 does the stack, and, while we are at it, use the 6809's PC-relative addressing modes a little:

* 6809 special version
XPDAT1	EQU	$F027	; string output, terminated by EOT
EOT	EQU	$04	; $04 is decimal 4
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
ENTRY	BRA	START	; Close enough for the short branch.
	RMB	32	; 16 levels of call minus any saved registers, max
*STKBAS	EQU	*	; Didn't want to tie STKBAS to SAVES
* But the interactive assembler doesn't recognize * as the here pointer.
SAVES	RMB	2	; a place to keep S so we can return cleanly
* And the interactive assembler wants EQU arguments to be known when used.
STKBAS	EQU	SAVES 	; 6809 is pre-dec (pre-store-decrement) push
HELLO	FCB	CR,LF	; Put message at beginning of line
	FCB	"SEKAI YO, YAI!"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
*
START	STS	SAVES,PCR	; Save what the monitor gives us.
	LEAS	STKBAS,PCR	; Move to our own stack
	LEAX	HELLO,PCR	; point to the string
	LBSR	XPDAT1		; output it
DONE	LDS	SAVES,PCR	; restore the stack pointer
	NOP
	NOP		; landing pad

Later, I'll explain the asterisk and the here pointer, and why we can't use them in this code. For now, it should be enough to show that the initial stack pointer on the 6809 points one address beyond the actual stack allocation area. This is because the 6809 decrements the stack pointer before pushing a byte on the stack.
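
A minimal sketch of the pre-decrement push by itself, laid out like the listing above. DEMO and the value loaded into A are hypothetical placeholders; the comments describe the usual 6809 PSHS/PULS behavior (decrement, then store; load, then increment):

	RMB	32	; stack space
SAVES	RMB	2	; stands in for whatever follows the stack
STKBAS	EQU	SAVES	; one past the last stack byte
DEMO	LEAS	STKBAS,PCR	; S = STKBAS
	LDA	#$11
	PSHS	A	; S = STKBAS-1 first, then A stored there
	PULS	A	; A loaded from STKBAS-1, then S = STKBAS
	NOP		; landing pad

Single-step it from DEMO and compare how S moves with what the 6800 does.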

For a variety of reasons, we will be revisiting these concepts as we go. 

Let's take a look at this code running:

$ ./exor09 --mon
Load facts file 'facts09'
'exbug09.bin' loaded.
  EXBUG09-2.1 detected
'mdos09.dsk' opened for drive 0 (double sided)

OSLOAD...

Hit Ctrl-C for simulator command line.  Starting simulation...

>         0 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            0020: 86 10        LDA #$10                   

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% a 2000
2000: * 6809 special version
2000: XPDAT1	EQU	$F027	; string output, terminated by EOT
2000: EOT	EQU	$04	; $04 is decimal 4
2000: LF	EQU	$0A	; line feed
2000: CR	EQU	$0D	; carriage return
2000: *
2000: NATWID	EQU	2	; 2 bytes in the CPU's natural integer
2000: *
2000: ENTRY	BRA	START	; Close enough for the short branch.
later = 2001
2002: 	RMB	32	; 16 levels of call minus any saved registers, max
2022: *STKBAS	EQU	*	; Didn't want to tie STKBAS to SAVES
2022: * But the interactive assembler doesn't recognize * as the here pointer.
2022: SAVES	RMB	2	; a place to keep S so we can return cleanly
2024: * And the interactive assembler wants EQU arguments to be known when used.
2024: STKBAS	EQU	SAVES 	; 6809 is pre-dec (pre-store-decrement) push
2024: HELLO	FCB	CR,LF	; Put message at beginning of line
2026: 	FCB	"SEKAI YO, YAI!"	; Whatever the user wants here.
2034: 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
2037: *
2037: START	STS	SAVES,PCR	; Save what the monitor gives us.
Offset at 2001 set to 35
203b: 	LEAS	STKBAS,PCR	; Move to our own stack
203e: 	LEAX	HELLO,PCR	; point to the string
2041: 	LBSR	XPDAT1		; output it
2044: DONE	LDS	SAVES,PCR	; restore the stack pointer
2048: 	NOP
2049: 	NOP		; landing pad
204a: 
% u 2000
2000: 20 35               BRA $2037
2002: 00 00               NEG $00
...
2020: 00 00               NEG $00
2022: 00 00               NEG $00
2024: 0D 0A               TST $0a
2026: 53                  COMB 
2027: 45                  ???A 
2028: 4B                  ???A 
% u 2037
2037: 10EF 8C E7           STS $2022,PCR
203B: 32 8C E4            LEAS $2022,PCR
203E: 30 8C E3            LEAX $2024,PCR
2041: 17 CFE3             LBSR $f027 [XPDAT1 Print data string (Enter with X)]
2044: 10EE 8C DA           LDS $2022,PCR
2048: 12                  NOP 
2049: 12                  NOP 
204A: 00 00               NEG $00
204C: 00 00               NEG $00
...
% b 2044
Breakpoint set at 2044
% s 2000

          0 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -------- ENTRY      2000: 20 35        BRA $2037                  
>         1 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -------- START      2037: 10EF 8C E7   STS $2022,PCR                

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          1 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -------- START      2037: 10EF 8C E7   STS $2022,PCR EA=2022(STKBAS) D=00FF 
>         2 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            203B: 32 8C E4     LEAS $2022,PCR                

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          2 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            203B: 32 8C E4     LEAS $2022,PCR                
>         3 A=00 B=00 X=0000 Y=0000 U=0000 S=2022 P=00 --------            203E: 30 8C E3     LEAX $2024,PCR                

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          3 A=00 B=00 X=0000 Y=0000 U=0000 S=2022 P=00 --------            203E: 30 8C E3     LEAX $2024,PCR                
>         4 A=00 B=00 X=2024 Y=0000 U=0000 S=2022 P=00 --------            2041: 17 CFE3      LBSR $f027                 

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% c


Breakpoint!

SEKAI YO, YAI!
        587 A=04 B=00 X=2036 Y=0000 U=0000 S=2020 P=00 -----Z--            F069: 39           RTS                        
>       588 A=04 B=00 X=2036 Y=0000 U=0000 S=2022 P=00 -----Z-- DONE       2044: 10EE 8C DA   LDS $2022,PCR                

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

        588 A=04 B=00 X=2036 Y=0000 U=0000 S=2022 P=00 -----Z-- DONE       2044: 10EE 8C DA   LDS $2022,PCR EA=2022(STKBAS) D=00FF 
>       589 A=04 B=00 X=2036 Y=0000 U=0000 S=00FF P=00 --------            2048: 12           NOP                        

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% 

Now try it yourself:

  • Copy it, assemble it,
  • use the disassembler to make sure it assembled correctly;
  • set the breakpoint,
  • single-step a few to watch the stack pointer and the index register get set correctly,
  • continue execution to watch the string get output;
  • single-step to watch the S stack pointer get restored.

After that, check the stack memory with the (d)ump or (m)emory commands. (Remember, use Ctrl-C to get out of the memory change cycle.)
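
If you want to check the PC-relative offsets in the disassembly by hand, here is one way to work them out, assuming the addresses from the session above. Each offset is added to the address of the following instruction, and the 8-bit offsets are signed, so $E7 is -$19:

START	STS	SAVES,PCR	; 10EF 8C E7: $203B - $19 = $2022
	LEAS	STKBAS,PCR	; 32 8C E4: $203E - $1C = $2022
	LEAX	HELLO,PCR	; 30 8C E3: $2041 - $1D = $2024
	LBSR	XPDAT1		; 17 CFE3: $2044 + $CFE3 = $F027
DONE	LDS	SAVES,PCR	; 10EE 8C DA: $2048 - $26 = $2022

If you assemble at a different address, the offsets should stay the same for the references within our own code. The LBSR offset will change, because XPDAT1 is at a fixed address in the ROM.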

One more thing to play with -- Try leaving the EOT off the end of the string and see what happens. (I'll talk more about this later.)

This kind of playing is important. So important that I'll dedicate an entire chapter to it every now and then.

And, just for the record, in case you missed it above, we will be building our own string output routines.

This chapter has gotten kind of long, so we'll try to repeat this all on the Atari ST's 68000 in the next chapter.

Or you could check out the workarounds I recommend for EXORsim6801 now, and look at the 68000 code after.

