Thursday, October 17, 2024

ALPP 02-17 -- Unsteady Footing -- Stack Frame for Single Stack: 6801

This is going to sit here for a while, maybe a long while. Maybe I'll get up the energy and interest to work it through sometime in the future, and make it useful. For now, it's just going to sit here like a rubber brick on the bottom of the pool.

  Unsteady Footing --
Stack Frame for Single Stack:
6801

(Title Page/Index)

Single-stack stack frames on the 6809 and 68000 seem almost reasonable. Almost.

But with the conflicts between uses of the stack, index, accumulator, offset calculations, and so on, it's tempting to just punt on the 6800 and 6801. 

I'm sure it can be done. Sort-of.

But you'd be better off just moving ahead to numeric output in binary on the 6800, rather than getting lost down this rabbit hole. 

But sometimes it's okay to go down rabbit holes, with a bit of scratch paper and a pencil and a big eraser. If you proceed in this chapter, keep that in mind.

Negative index offsets are a problem with the 6800 and 6801. The instruction set provides for positive constant offsets, but not negative. 

This means you have to waste time calculating addresses at run time to get to things in the stack frame that are below the frame pointer. And below the frame pointer is exactly where the 68000's LINK and UNLK instructions put the local variables when they construct the stack frame.

It's possible to have the frame pointer point to the bottom of the frame, but then you need a frame size variable, and run-time address calculation, to get to the top. Or you need to track the top of the frame with a separate top-of-frame pointer, and that also gets tricky.

So your frame pointer points to the top of the frame, and to the link to the caller's frame. The stack pointer points to the bottom. 

And you usually don't offset the stack pointer. But you can.

That still leaves allocating the new stack frame as a problem.

But we did that with the ADDBX low-level utility subroutine for the split-stack case.

Including an SBX (subtract B from X) instruction to parallel the ABX (add B to X) in the 6801's extended instruction set would have helped, but Motorola didn't do that. And it still would have been awkward.

Signed ADDBX is too slow to use for indexing every variable.

So adjusting to indexing into the stack frame from the stack pointer seems it might be the best option. Maybe. It gets tricky when you are pushing parameters before a call, but we can figure something out for that.

Thinking about speeding up ADDBX: when the adjustment is just two bytes, two DEX or INX in a row cost two bytes and 8 cycles on the 6800 (6 cycles on the 6801), so direct increment or decrement is not bad for that. And a list of INX or DEX instructions that you can jump into the middle of could be a reasonable trade-off for up to 8 bytes of offset. For larger adjustments, adding D instead of just B is going to be faster on the 6801.
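The jump-into-the-middle idea can be sketched in C with a switch that falls through, which may be easier to play with than 6801 code. This is only a model under my own assumptions: inx_chain is a hypothetical name, and the entry points 8, 6, and 4 mirror the INX8, INX6, and INX4 labels used in the scratch code further down.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "JSR INXn": entering the chain at case n
 * runs the remaining n increments, like jumping into a run of INX. */
uint16_t inx_chain(uint16_t x, int n)
{
    assert(n == 4 || n == 6 || n == 8);
    switch (n) {
    case 8: x++; x++;   /* fall through */
    case 6: x++; x++;   /* fall through */
    case 4: x++; x++; x++; x++;
    }
    return x;           /* wraps at 16 bits, like the X register */
}
```

Calling inx_chain(0x1000, 6) steps the model index forward by six, the way TSX, JSR INX6, TXS would move the stack pointer.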

As I said, we really shouldn't be trying to tackle this problem this early in the tutorial, which is yet another reason I didn't want to show how to use stack frames.

So I really should punt for now, and pick this back up waaaaay down the road. 

But I'm feeling obstinate. Here goes nothing:

We are going to add a bottom-of-frame pointer, BFP. This will allow us to not worry about how many parameters have been pushed. Maybe. The calling routine will have to restore it after a call.

Now, with the way we are thrashing D and X, we are going to have to do something about the return value. The calling routine will allocate space for two or four bytes of return value just above the parameters, and it will grab what has been stored there and store it in D and X just after the return. I think that will work.

With this protocol, here's how the stack looks before the first routine pushes parameters for the first call:


nnnn


RA_0

FA_0 <= _FP_
T1_1

T1_2 <= _RSP_,_BFP_
????

And here's how it looks after it pushes parameters, including the return value parameters, V2_1 and V2_2:

nnnn


RA_0

FA_0 <= _FP_
T1_1

T1_2 <= _BFP_
V2_1

V2_2

P2_1

P2_2 <= _RSP_
????


????

And after the call, but before executing the protocol:

nnnn


RA_0

FA_0 <= _FP_
T1_1

T1_2 <= _BFP_
V2_1

V2_2

P2_1

P2_2

RA_1 <= _RSP_
????

And here's how it looks after it executes the protocol:

nnnn


RA_0

FA_0 <= _FA_1
T1_1

T1_2

V2_1

V2_2

P2_1

P2_2

RA_1

FA_1 <= _FP_
T2_1

T2_2

T2_3 <= _RSP_,_BFP_
????

 

So BFP will mostly be just tracking the return stack pointer, and basically just be used to access the current stack frame. 

Still easy to walk.

Still easy for a rogue routine to walk all over the frame links and return addresses.

BFP will be another direct page variable that we will have to save and restore on task switch, along with FP and XWORK. 
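Before the assembly scratch work, here is a small C model of the bookkeeping I have in mind, just to check the arithmetic. It assumes the 6801 convention that S points at the next free byte, and the names (struct m6801, frame_enter, frame_leave) are hypothetical. It does not restore BFP on exit, because in this protocol that is the caller's job.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine state; fp and bfp follow the text's FP and BFP. */
struct m6801 {
    uint8_t  mem[65536];
    uint16_t sp;   /* S: points at the next free byte */
    uint16_t fp;   /* FP, a direct-page variable in the text */
    uint16_t bfp;  /* BFP, likewise */
};

static void push16(struct m6801 *m, uint16_t v)
{
    m->mem[m->sp--] = (uint8_t)(v & 0xFF);   /* low byte first, like PSHB */
    m->mem[m->sp--] = (uint8_t)(v >> 8);     /* then high byte, like PSHA */
}

static uint16_t pull16(struct m6801 *m)
{
    uint16_t hi = m->mem[++m->sp];           /* high byte is at the lower address */
    uint16_t lo = m->mem[++m->sp];
    return (uint16_t)((hi << 8) | lo);
}

/* Entry: save the old FP, point FP at the saved copy, allocate size bytes. */
void frame_enter(struct m6801 *m, uint16_t size)
{
    assert(size < 0x8000);
    push16(m, m->fp);
    m->fp  = (uint16_t)(m->sp + 1);      /* what TSX would give */
    m->bfp = (uint16_t)(m->fp - size);   /* lowest byte of the frame */
    m->sp  = (uint16_t)(m->bfp - 1);     /* what TXS would leave in S */
}

/* Exit: drop the frame and restore the caller's FP. */
void frame_leave(struct m6801 *m)
{
    m->sp = (uint16_t)(m->fp - 1);
    m->fp = pull16(m);
}
```

With S at $7FFF and a 6-byte frame, the model leaves FP at $7FFE, BFP at $7FF8, and S at $7FF7, and frame_leave puts S and FP back where they were.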

The code below should be considered scratch work and a lot of hand-waving. It's too abstract and will not work as advertised. It might be useful as a guide to implementing stack frames on the 6801 if they absolutely must be implemented, but it might not.

* calling protocol for single-stack stack frames on 6801
* with FP (in the direct page) as the frame pointer (no PSP)
* BFP (in the direct page) to track the bottom of the stack,
* and routines saving the frame pointer
* and allocating the new frame on entry.
* The return value will be allocated above the parameters, 
* because we have to use D and X all over the place,
* and the calling routine will get the return values and restore BFP.
*
* More variables in DP which must be saved and restored on task switch
*		ORG	SOMEWHERE_IN_DP
* BFP	RMB	2
* RETADR	RMB	2
*
* LINK general, size in D, negative for allocation
* Uses RETADR in the direct page, which must be saved and restored on task switch
LINKG	PULX		; grab the return address
	STX	RETADR
	LDX	FP
	PSHX		; old frame pointer saved
	TSX		; do not save S directly
	STX	FP	; we want to use TOS as a pointer
	ADDD	FP	; allocate
	PSHB
	PSHA
	PULX
	TXS		; allocation complete
	STX	BFP
	LDX	RETADR
	JMP	0,X	; return
*
INX8	INX
	INX
INX6	INX
	INX
INX4	INX
	INX
	INX
	INX
	RTS
*
DEX8	DEX
	DEX
DEX6	DEX
	DEX
DEX4	DEX
	DEX
	DEX
	DEX
	RTS
*
ADDDX	STX	XWORK
	ADDD	XWORK
	STD	XWORK
	LDX	XWORK
	RTS
*
FRAMESIZE	SET	SOMETHING	; ROUTINE1's frame size
* Could use a faster entry if FRAMESIZE <= 8 bytes:
ROUTINE1
	LDD	#-FRAMESIZE
	JSR	LINKG
*	...
	LDX	PARAMETER1
	PSHX
	LDD	PARAMETER2
	PSHB
	PSHA
	JSR	ROUTINE2
* If 1 parameter:
*	INS
*	INS
*	TSX
* IF 2 to 4 parameters:
*	TSX
*	JSR	INXN	; 4, 6, or 8
*	TXS
* if greater:
	TSX
	LDD	#PARAMETER_SIZE
	JSR	ADDDX
	TXS	
*		; still has return values between TOS and BFP
* if 2 bytes:
*	PULA		; high byte is at the lower address
*	PULB
*	STD	RESULTVAR_OFFSET,X	; low part
* if 4 bytes:
	PULA		; high byte is at the lower address
	PULB
	STD	RESULTVAR_OFFSET,X	; in the case of 4 bytes, high part
	PULA
	PULB
	STD	RESULTVAR_OFFSET+2,X	; low part
*		; SP is what BFP should be
	TSX
	STX	BFP
*	...
	LDX	FP		; restore/deallocate
	TXS
	PULX
	STX	FP
	RTS
* and the caller moves results where they go in its own frame 
* immediately after dropping the parameters that are no longer in use.
*	...
*
*
* called routine entry protocol for single-stack stack frames
* with FP (in the direct page) as the frame pointer,
* and routines saving the frame pointer
* and allocating the new frame on entry.
* (Caller, of course, pushes call parameters before call,
* and pops return parameters to where they go on return.):
FRAMESIZE	SET	SOMETHING	; ROUTINE2's frame size
* Showing faster entry for FRAMESIZE <= 8
ROUTINE2
	LDX	FP
	PSHX		; old frame pointer saved
	TSX		; do not save S directly
	STX	FP	; we want to use TOS as a pointer
	JSR	DEXN	; 8, 6, or 4; DEX2 would be in-line
	TXS		; allocation complete
	STX	BFP
*	...
* No code for dealing with large return values. Can't do that.
* Large return values have to be dealt with outside of the stack frame.	
*	...	
	LDX	BFP
	LDD	PARAMETER2OFFSET+FRAMESIZE,X	; access the second parameter
*	...
	SUBD	PARAMETER1OFFSET+FRAMESIZE,X	; access the first parameter
*	...
	LDX	FP			; get the caller's frame pointer
	LDX	0,X
	ADDD	CALLER.PARAMETERNOFFSET+CALLER.FRAMESIZE,X	; or something -- positive offset
*	...
	LDX	FP		; restore/deallocate
	TXS
	PULX
	STX	FP
	RTS
*

No. I'm not going to claim this is good code, or even good theorizing. 

Maybe, after I let it sit for a year or two, I'll come back to it. For now, it's abandoned. We didn't need to go down this path anyway.

Go ahead and look at getting numbers output in binary.



Wednesday, October 16, 2024

ALPP 02-16 -- One Foot on One Beach, One Foot on Another -- Stack Frame for Single Stack: 68000 and 6809

  One foot on One Beach, One Foot on Another --
Stack Frame for Single Stack:
68000 and 6809

(Title Page/Index)

Now that I've shown you how you can do stack frames on split-stack run-times, both with and without frame pointers, at long last, I'm going to show you "normal" single interleaved stack stack frames -- one kind, anyway.

There are several ways to do this, and there is a particular reason (although not a very good reason) I've picked this one.

Let's get a look at the single stack being used for parameters, temporaries, and variables without frames during a routine. 

Below, nnnn is stuff we don't really know about, but it's not garbage. RA_0 is the return address to whatever called the routine we are in. T1_1 and T1_2 are either temporaries or variables, we don't care which:

nnnn


RA_0

T1_1

T1_2 <= _RSP_
????

And here's how it looks after the routine we are in pushes parameters P2_1 and P2_2 and enters a second routine:

nnnn


RA_0

T1_1

T1_2

P2_1

P2_2

RA_1 <= _RSP_
????

And after a third call:

nnnn


RA_0

T1_1

T1_2

P2_1

P2_2

RA_1

T2_1

T2_2

T2_3

P3_1

RA_2 <= _RSP_
????

And it's hard to look at the stack and tell what's what. If I hadn't been tracking the calls, labeling what was pushed as it was pushed, I wouldn't have this map. 

Each routine knows its own context (maybe), but heaven help it if it has to access a caller's context.

And when a debugger looks inside, well, a good debugger can read the context of each call from the source code and build its map backwards, but it has to be a really good debugger, the sort that you don't have until your platform has been on the market for several years.

The engineer doing the debugging can do what a good debugger can, but he takes much more time to do so, and that slows down debugging and creates opportunities for mistakes. 

For a variety of reasons, we want to impose some order on that stack.

Assume your routines on the 68000 have this entry and exit protocol:
	MOVE.L	A6,-(A7)
	MOVE.L	A7,A6
	LEA	-FRAMESIZE(A7),A7
*	...
	MOVE.L	A6,A7
	MOVE.L	(A7)+,A6

With this protocol, here's how the stack looks before the first routine pushes parameters for the first call:

nnnn


RA_0

FA_0 <= _FP_
T1_1

T1_2 <= _RSP_
????

There's something new in there. There's a frame pointer (FP), and it's pointing to frame address 0 (FA_0) -- maybe the saved frame address of whatever called the first routine, whatever that might be.
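If the pointer shuffling is hard to picture, the entry and exit protocol can be modelled in C. This is a sketch under assumptions of my own: the stack is an array of 32-bit cells, "addresses" are indexes into it, and the names struct cpu, link_frame, and unlink_frame are hypothetical. The comments show which instruction of the protocol each line stands for.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical model: the stack grows downward through mem[]. */
struct cpu {
    uint32_t mem[STACK_WORDS];
    uint32_t a7;   /* stack pointer: index of the last pushed cell */
    uint32_t a6;   /* frame pointer */
};

static void push(struct cpu *c, uint32_t v)
{
    assert(c->a7 > 0);
    c->mem[--c->a7] = v;
}

static uint32_t pop(struct cpu *c)
{
    assert(c->a7 < STACK_WORDS);
    return c->mem[c->a7++];
}

/* Entry protocol: save the caller's frame pointer, then allocate. */
void link_frame(struct cpu *c, uint32_t cells)
{
    push(c, c->a6);          /* MOVE.L A6,-(A7) */
    c->a6 = c->a7;           /* MOVE.L A7,A6 */
    assert(c->a7 >= cells);
    c->a7 -= cells;          /* LEA -FRAMESIZE(A7),A7 */
}

/* Exit protocol: deallocate, then restore the caller's frame pointer. */
void unlink_frame(struct cpu *c)
{
    c->a7 = c->a6;           /* MOVE.L A6,A7 */
    c->a6 = pop(c);          /* MOVE.L (A7)+,A6 */
}
```

After link_frame, the cell that a6 indexes holds the caller's frame pointer, which is the FA_0 of the diagram, and the locals sit at the indexes below it.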

And here's how it looks after it pushes parameters and calls a second routine, but before it executes the entry protocol:

nnnn


RA_0

FA_0 <= _FP_
T1_1

T1_2

P2_1

P2_2

RA_1 <= _RSP_
????

And here's how it looks after it executes the protocol:

nnnn


RA_0

FA_0 <= _FA_1
T1_1

T1_2

P2_1

P2_2

RA_1

FA_1 <= _FP_
T2_1

T2_2

T2_3 <= _RSP_
????

The variables and temporaries of the second routine won't be initialized yet, but the frame has been constructed, and it is relatively easy to walk it backwards to the previous frame. 

Here's what it looks like after the third call, with 1 parameter and 1 variable or temporary allocated:

nnnn


RA_0

FA_0 <= _FA_1
T1_1

T1_2

P2_1

P2_2

RA_1

FA_1 <= FA_2
T2_1

T2_2

T2_3

P3_1

RA_2

FA_2 <= _FP_
T3_1 <= _RSP_
????

Still easy to walk.

Of course it does require knowing what a context is supposed to look like, to access that context. That's not a problem. The engineer or debugger only needs to look at the context of the routine of interest.
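Walking the chain is simple enough to sketch in C. The assumptions here are mine, not part of the protocol: memory is an array of cells indexed like addresses, the cell at a frame pointer holds the caller's frame pointer, and a saved link of 0 marks the outermost frame. The name count_frames is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count frames by following saved frame pointers.
 * mem[fp] holds the caller's frame pointer (a cell index, not a byte
 * address). A saved link of 0 is assumed to mark the outermost frame. */
size_t count_frames(const uint32_t *mem, size_t ncells, uint32_t fp)
{
    size_t depth = 0;
    while (fp != 0) {
        assert(fp < ncells);
        /* On a downward-growing stack, each link must point upward. */
        assert(mem[fp] == 0 || mem[fp] > fp);
        depth++;
        fp = mem[fp];
    }
    return depth;
}
```

The second assertion is the kind of sanity check a debugger might apply. It catches some damaged links, but not a rogue routine that writes a plausible-looking value.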

But it's also relatively easy for a rogue routine to walk all over the stack, including frame links and return addresses, which is why I don't care for it.

Hmm. This indicates another approach I could show for doing stack frames in the split stack discipline. But I'm not going to do that, even though it seems interesting. I'll leave that as an exercise for the reader. I'm not a fan of stack frames, even well constructed ones where the parts that hold them together are kept separate from what each frame contains.

Why did I use this protocol? I said there was a reason.

The snippet above isn't really enough to get an idea of what the code will look like. Let's look at the actual code for all four CPUs.

I ought to save the 68000 for last, but I'll spoil the surprise and do it first. Otherwise, the 6800 and 6801 code are going to wear both of us out and we won't be able to appreciate it.

* calling protocol for single-stack stack frames on 68000
* with A6 as the frame pointer, 
* and routines saving the frame pointer
* and allocating the new frame on entry.
ROUTINE1
*	...
	MOVE.L	PARAMETER1,-(A7)
	MOVE.L	PARAMETER2,-(A7)
	BSR.W	ROUTINE2
	LEA	PARAMETER_SIZE(A7),A7    ; drop parameters
	MOVE.L	D0,RESULTVAR_OFFSET(A6)    ; negative offset
*	...

Wait. 

For the linked list of frames to be valid, every routine has to use the same protocol. Every routine that is a routine, anyway. Let's show that:

* calling protocol for single-stack stack frames on 68000
* with A6 as the frame pointer, 
* and routines saving the frame pointer
* and allocating the new frame on entry.
FRAME_SIZE	SET	SOMETHING	; ROUTINE1's frame size
ROUTINE1
	LINK	A6,#-FRAME_SIZE	; This context's frame size (negative displacement allocates)
*	...
	MOVE.L	PARAMETER1,-(A7)
	MOVE.L	PARAMETER2,-(A7)
	BSR.W	ROUTINE2
	LEA	PARAMETER_SIZE(A7),A7	; drop parameters
	MOVE.L	D0,RESULTVAR_OFFSET(A6)	; negative offset
*	...
	UNLK	A6
	RTS
* and the caller moves results where they go in its own frame 
* before using the return value register, which is usually immediately
* after dropping the parameters that are no longer in use.
*	...
*
*
* called routine entry protocol for single-stack stack frames
* with A6 as the frame pointer,
* and routines saving the frame pointer
* and allocating the new frame on entry.
* (Caller, of course, pushes call parameters before call,
* and pops return parameters to where they go on return.):
FRAME_SIZE	SET	SOMETHING	; ROUTINE2's frame size
ROUTINE2
	LINK	A6,#-FRAME_SIZE
*	...
* No code for dealing with large return values. Can't do that.
* Large return values have to be dealt with outside of the stack frame.	
*	...	
	MOVE.L	PARAMETER2OFFSET+8(A6),D1	; dodge frame link and return address
*	...
	SUB.L	PARAMETER1OFFSET+8(A6),D3	; dodge frame link and return address
*	...
	MOVE.L	(A6),A0	; get the caller's frame pointer
	ADD.L	CALLER.VARIABLENOFFSET(A0),D5	; or something -- negative offset
*	...
* routine return protocol in all cases:
	UNLK	A6
	RTS

That looks pretty, doesn't it?

WHAT is that LINK instruction? And the UNLK?

There's the reason for the protocol I chose. The 68000 does it for us. Trade five instructions for two, reduce the apparent cost of frames, help maintain their integrity. 

Get in the way of large return values.

How do we do large return values using this kind of stack frame?

Well, let me tell you. It involves something called static allocation from a memory pool, constructors and destructors, and garbage collection, and ...

Or you can do it the C way, and the caller passes, as a parameter, an explicit pointer to where the return value is ultimately supposed to go anyway. There is elegance to that solution, and it is the one I usually recommend for returning large values anyway. 
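Since that approach is the C way, a C sketch seems fair. The structure and function names here (struct big_result, divide_slowly) are made up for illustration. The point is only that the caller owns the storage and hands the called routine its address, so nothing large ever has to cross the stack frame on return.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "large" value: too big for a register or two. */
struct big_result {
    int32_t quotient;
    int32_t remainder;
    int32_t steps;
};

/* The caller owns the storage and passes its address as a parameter. */
void divide_slowly(int32_t n, int32_t d, struct big_result *out)
{
    assert(out != 0 && d > 0 && n >= 0);
    out->quotient = 0;
    out->steps = 0;
    while (n >= d) {        /* repeated subtraction, one step per pass */
        n -= d;
        out->quotient++;
        out->steps++;
    }
    out->remainder = n;
}
```

A caller would declare a struct big_result among its own variables and call divide_slowly(17, 5, &result). In stack frame terms, the pointer is just one more pushed parameter.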

I'll have more to say about what constitutes a large return value and what constitutes a small one later, when we have something (relatively) concrete to look at.

Unfortunately, we do not have the LINK and UNLK instructions in the 6809's repertoire. But it's no great misfortune:

* 6809
* calling protocol for single-stack stack frames on 6809
* with U as the frame pointer
* and routines saving the frame pointer
* and allocating the new frame on entry.
FRAMESIZE	SET	SOMETHING	; ROUTINE1's frame size
ROUTINE1
	PSHS	U
	TFR	S,U
	LEAS	-FRAMESIZE,S
*	...
	LDX	PARAMETER1
	LDD	PARAMETER2
	PSHS	D,X
	LBSR	ROUTINE2
	LEAS	PARAMETER_SIZE,S	; drop parameters
	STD	RESULTVAR_OFFSET,U	; store result -- negative offset
*	...
	TFR	U,S		; restore/deallocate
	PULS	U
	RTS
* and the caller moves results where they go in its own frame 
* before using the return value register, which is usually immediately
* after dropping the parameters that are no longer in use.
*	...
*
*
* called routine entry protocol for single-stack stack frames
* with U as the frame pointer,
* and routines saving the frame pointer
* and allocating the new frame on entry.
* (Caller, of course, pushes call parameters before call,
* and pops return parameters to where they go on return.):
FRAMESIZE	SET	SOMETHING	; ROUTINE2's frame size
ROUTINE2
	PSHS	U
	TFR	S,U
	LEAS	-FRAMESIZE,S
*	...
* No code for dealing with large return values. Can't do that.
* Large return values have to be dealt with outside of the stack frame.	
*	...	
	LDD	PARAMETER2OFFSET,U	; access the second parameter
*	...
	SUBD	PARAMETER1OFFSET,U	; access the first parameter
*	...
	LDX	,U			; get the caller's frame pointer
	ADDD	CALLER.VARIABLENOFFSET,X	; or something -- negative offset
*	...
	TFR	U,S		; restore/deallocate
	PULS	U
	RTS

Nicely enough done.

But the 6801 and the 6800 are another matter. With the conflicts between uses of the stack, index, accumulator, offset calculations, and so on, the lack of LINK and UNLK instructions is a great misfortune on the 6800 and 6801. (But how would adding those instructions be done?) 

It's tempting to just punt on the 6800 and 6801. The sort of code required to do it goes way beyond what we've looked at so far.

You can mix and match various stack frames and frameless in the split stack runtime, as long as you either don't reach into the caller's stack frame, or at least know what the caller's stack frame looks like so you know how to reach in. 

But you can't mix and match in a single-stack runtime -- not without losing the benefits of the stack frame.

Really, if we're being sensible, we're not usually going to want to bother with stack frames anyway. 

So I'm going to recommend that you move ahead to numeric binary output, rather than dig further into stack frames on the 6801 and 6800.



 

ALPP 02-15 -- Switching Feet on the Beach -- Ephemeral Frame Pointer for Split Stack

 Switching Feet on the Beach --
Ephemeral Frame Pointer for Split Stack

(Title Page/Index)

Talking about stack frames with a frame pointer on split stacks didn't put you to sleep?

Well, let's try it without a frame pointer register.

How would this work?

We're going to assume a runtime discipline in which every routine allocates all its parameters, local (dynamic) variables, and temporary variables on the parameter stack at entry, and never pushes anything else to it, never pops anything from it. Pascal can do this. Ada can do this. C can even do this, but it really doesn't need to.

Again, we have routine 1 that has received two parameters, P1_1 and P1_2. And it has two temporary values or local variables, we don't care which, on the parameter stack, T1_1 and T1_2. Reviewing the case without a stack frame, this is how we diagrammed it before:


nnnn
P1_1_
P1_2
T1_1
T1_2 <= _PSP_
????
rrrr
RA_0 <= _RSP_
????
????
????
????

And routine 1 calls another routine, routine 2, passing that routine two parameters:

nnnn
P1_1_
P1_2
T1_1
T1_2
P2_1_
P2_2 <= _PSP_
????
rrrr
RA_0
RA_1 <= _RSP_
????
????
????


 

This time, we're going to set up a frame without using a frame pointer register. 

Here's how we start:


nnnn
<= FP_0
P1_1_
P1_2
T1_1
T1_2 <= _PSP_
????
rrrr
FA_0
RA_0 <= _RSP_
????
????
????

Then the caller pushes its stack pointer as a frame pointer on the return stack, and pushes the called routine's parameters on the parameter stack:


nnnn
<= FP_0
P1_1_
P1_2
T1_1
T1_2 <= FP_1
P2_1_
P2_2 <= _PSP_
????
rrrr
FA_0
RA_0
FA_1 <= _RSP_
????
????
????

 

Then the call issues.

On entry, the called routine will allocate space for its return parameter(s), and then copy the call parameters down, leaving room for the return parameter(s) where they can be left behind on exit:

nnnn
<= FP_0
P1_1_
P1_2
T1_1
T1_2 <= FP_1
RP2_1_
P2_1_
P2_2 <= _PSP_
????
rrrr
FA_0
RA_0
FA_1
RA_1 <= _RSP_
????
????
????
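The copy-down step is easy to get backwards, so here is a C sketch of it. Assumptions of mine: the parameter stack is an array of cells growing toward index 0, psp is the index of the most recently pushed cell, and open_return_slots is a hypothetical name. Copying the lowest cell first matters, because the destination sits below the source and the two ranges overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Open `ret` cells for return values above `call` parameter cells.
 * mem is the parameter stack (growing toward index 0) and psp is the
 * index of the most recently pushed cell. Returns the new psp. */
size_t open_return_slots(uint32_t *mem, size_t psp, size_t call, size_t ret)
{
    size_t i;
    assert(psp >= ret);
    psp -= ret;                     /* make room below the parameters */
    for (i = 0; i < call; i++)      /* lowest cell first, so nothing is overwritten early */
        mem[psp + i] = mem[psp + ret + i];
    return psp;
}
```

After the call, the parameters occupy the cells from psp up, and the cells just above them, which used to hold the first-pushed parameters, are free for the return value.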

 

And then it will allocate room for the called function's local variables and temporary values, which will look like this:

nnnn
<= FP_0
P1_1_
P1_2
T1_1
T1_2 <= FP_1
RP2_1_
P2_1_
P2_2
V2_1
T2_1 <= _PSP_
????
rrrr
FA_0
RA_0
FA_1
RA_1 <= _RSP_
????
????
????

 

 

 

The called routine will reference all its own parameters, variables, and temporaries via the parameter stack pointer. If it needs to reference something in the caller's context and knows where it should be in the caller's context, it can load the caller's frame pointer from the return stack and do so.

At exit, it can simply deallocate everything but the return parameter(s) and issue a return.

And the calling routine can immediately move the return parameter(s) where it needs them, drop them, and go on its merry way.

Again this is not hard on the 68000:
* calling protocol for split stack frames on 68000
* with A6 as the parameter stack pointer, 
* and the frame pointer saved to the return stack only.
ROUTINE1
*	...
	MOVE.L	A6,-(A7)	; saved A6 is the caller's frame pointer
	MOVE.L	PARAMETER1,-(A6)
	MOVE.L	PARAMETER2,-(A6)
	BSR.W	ROUTINE2
	MOVE.L	FRAME2.RETVAL1_OFFSET(A6),CALLSIZE+VARIABLEN_OFFSET(A6)
	LEA	RETURNSIZE1(A6),A6	; deallocate parameters
	LEA	4(A7),A7	; drop frame pointer if it no longer needs it.
*	...
* and the caller must move results where they go in its own frame 
* immediately, to restore the parameter stack pointer offsets --
* or use some dynamic offset calculation that tracks what is on the stack.
*	...
*
*
* routine entry protocol for split stack frames
* and A6 as the parameter stack pointer
* and frame pointer on return stack only,
* but each routine allocates everything it needs on entry
* (Caller, of course, pushes call parameters before call,
* and pops return parameters to where they go on return.):
ROUTINE2
	LEA	-RETURNSIZE(A6),A6	; call and return parameters are part of local frame
	MOVE.L	A6,A0				; for copying
	MOVE.L	RETURNSIZE(A0),(A0)+
*	... repeat as necessary
	LEA	-FRAMESIZE+CALLSIZE+RETURNSIZE(A6),A6	; allocate frame
*	...	
	MOVE.L	PARAMETER2OFFSET(A6),D1	; access the second parameter
*	...
	SUB.L	PARAMETER1OFFSET(A6),D3	; access the first parameter
*	...
	MOVE.L	4(A7),A0	; get the caller's frame pointer
	ADD.L	CALLER.VARIABLENOFFSET(A0),D5	; or something
*	...
* routine return protocol in all cases:
	LEA	FRAMESIZE-RETURNSIZE(A6),A6
	RTS

I hope I wasn't too tired to get that right.

This also looks pretty reasonable.

On the 6809, we no longer need the frame pointer variable in DP:

* calling protocol for split stack frames on 6809
* with U as the parameter stack pointer, 
* and the frame pointer saved to the return stack only.
ROUTINE1
*	...
	PSHS	U	; the caller's frame pointer
	LDX	PARAMETER1
	LDD	PARAMETER2
	PSHU	D,X
	LBSR	ROUTINE2
	LDD	FRAME2.RETVAL1_OFFSET,U
	STD	CALLSIZE+VARIABLEN_OFFSET,U
	LEAU	RETURNSIZE1,U	; deallocate parameters
	LEAS	2,S	; drop frame pointer if it no longer needs it.
*	...
* and the caller must move results where they go in its own frame 
* immediately, to restore the parameter stack pointer offsets --
* or use some dynamic offset calculation that tracks what is on the stack.
*	...
*
*
* routine entry protocol for split stack frames
* and U as the parameter stack pointer
* and frame pointer on return stack only,
* but each routine allocates everything it needs on entry
* (Caller, of course, pushes call parameters before call,
* and pops return parameters to where they go on return.):
ROUTINE2
	LEAU	-RETURNSIZE,U	; call and return parameters are part of local frame
	TFR	U,X				; for copying
	LDD	RETURNSIZE,X
	STD	,X++
*	... repeat as necessary
	LEAU	-FRAMESIZE+CALLSIZE+RETURNSIZE,U	; allocate frame
*	...	
	LDD	PARAMETER2OFFSET,U	; access the second parameter
*	...
	SUBD	PARAMETER1OFFSET,U	; access the first parameter
*	...
	LDX	2,S			; get the caller's frame pointer
	ADDD	CALLER.PARAMETERNOFFSET,X	; or something
*	...
* routine return protocol in all cases:
	LEAU	FRAMESIZE-RETURNSIZE,U
	RTS

The 6801 has no surprises:

* 6801
* calling protocol for split stack frames on 6801
* with PSP as the parameter stack pointer, 
* and the frame pointer saved to the return stack only.
ROUTINE1
*	...
	LDX	PSP
	PSHX			; the caller's frame pointer
	LDD	PARAMETER1
	JSR	PPSHD
	LDX	PARAMETER2
	JSR	PPSHX
	JSR	ROUTINE2
	LDX	PSP
	LDD	FRAME2.RETVAL1_OFFSET,X
	STD	CALLSIZE+VARIABLEN_OFFSET,X
	LDX	PSP	; deallocate parameters
	INX
	INX
	STX	PSP
	INS		; drop frame pointer if it no longer needs it.
	INS
*	...
* and the caller must move results where they go in its own frame 
* immediately, to restore the parameter stack pointer offsets --
* or use some dynamic offset calculation that tracks what is on the stack.
*	...
*
*
* routine entry protocol for split stack frames
* and PSP as the parameter stack pointer
* and frame pointer on return stack only,
* but each routine allocates everything it needs on entry
* (Caller, of course, pushes call parameters before call,
* and pops return parameters to where they go on return.):
ROUTINE2
	LDD	PSP
	ADDD	#-RETURNSIZE	; call and return parameters are part of local frame
	STD	PSP
	LDX	PSP			; for copying
	LDD	RETURNSIZE,X
	STD	0,X
	INX
	INX
*	... repeat as necessary
	LDD	PSP
	ADDD	#-FRAMESIZE+CALLSIZE+RETURNSIZE
	STD	PSP			; allocate frame
*	...	
	LDX	PSP
	LDD	PARAMETER2OFFSET,X	; access the second parameter
*	...
	LDX	PSP
	SUBD	PARAMETER1OFFSET,X	; access the first parameter
*	...
	TSX
	LDX	2,X			; get the caller's frame pointer
	ADDD	CALLER.PARAMETERNOFFSET,X	; or something
*	...
* routine return protocol in all cases:
	LDD	PSP
	ADDD	#FRAMESIZE-RETURNSIZE
	STD	PSP
	RTS

and the 6800:

* calling protocol for split stack frames on 6800
* with PSP as the parameter stack pointer, 
* and the frame pointer saved to the return stack only.
ROUTINE1
*	...
	LDAA	PSP
	LDAB	PSP+1
	PSHB			; the caller's frame pointer
	PSHA
	LDAA	PARAMETER1
	LDAB	PARAMETER1+1
	JSR	PPSHD
	LDX	PARAMETER2
	JSR	PPSHX
	JSR	ROUTINE2
	LDX	PSP
	LDAA	FRAME2.RETVAL1_OFFSET,X
	LDAB	FRAME2.RETVAL1_OFFSET+1,X
	STAA	CALLSIZE+VARIABLEN_OFFSET,X
	STAB	CALLSIZE+VARIABLEN_OFFSET+1,X
	LDX	PSP	; deallocate parameters
	INX
	INX
	STX	PSP
	INS		; drop frame pointer if it no longer needs it.
	INS
*	...
* and the caller must move results where they go in its own frame 
* immediately, to restore the parameter stack pointer offsets --
* or use some dynamic offset calculation that tracks what is on the stack.
*	...
*
*
* routine entry protocol for split stack frames
* and PSP as the parameter stack pointer
* and frame pointer on return stack only,
* but each routine allocates everything it needs on entry
* (Caller, of course, pushes call parameters before call,
* and pops return parameters to where they go on return.):
ROUTINE2
	LDAB	#-RETURNSIZE	; call and return parameters are part of local frame
	JSR	ADDBPSP
	LDX	PSP			; for copying
	LDAA	RETURNSIZE,X
	LDAB	RETURNSIZE+1,X
	STAA	0,X
	STAB	1,X
	INX
	INX
*	... repeat as necessary
	LDAB	#-FRAMESIZE+CALLSIZE+RETURNSIZE
	JSR	ADDBPSP			; allocate frame
*	...	
	LDX	PSP
	LDAA	PARAMETER2OFFSET,X	; access the second parameter
	LDAB	PARAMETER2OFFSET+1,X
*	...
	LDX	PSP
	SUBB	PARAMETER1OFFSET+1,X	; access the first parameter
	SBCA	PARAMETER1OFFSET,X
*	...
	TSX
	LDX	2,X			; get the caller's frame pointer
	ADDB	CALLER.PARAMETERNOFFSET+1,X	; or something
	ADCA	CALLER.PARAMETERNOFFSET,X
*	...
* routine return protocol in all cases:
	LDAB	#FRAMESIZE-RETURNSIZE
	JSR	ADDBPSP
	RTS
*
ADDBPSP	CLRA
	TSTB
	BPL	ADDBPSM
	COMA
ADDBPSM	ADDB	PSP+1
	ADCA	PSP
	STAB	PSP+1
	STAA	PSP
	RTS

*
ADDBX	STX	XWORK
	CLRA
	TSTB
	BPL	ADDBXP
	COMA
ADDBXP	ADDB	XWORK+1
	ADCA	XWORK
	STAB	XWORK+1
	STAA	XWORK
	LDX	XWORK
	RTS

And, again, that doesn't look all that bad.
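For anyone who wants to check the sign extension and carry handling in ADDBX and ADDBPSP without an assembler, here is a C model of the same byte-at-a-time arithmetic. The function name addb16 is hypothetical, and the comments only indicate which instruction each line is standing in for.

```c
#include <assert.h>
#include <stdint.h>

/* Add a signed 8-bit offset in b to a 16-bit pointer, one byte at a time. */
uint16_t addb16(uint16_t x, uint8_t b)
{
    uint8_t  a     = (b & 0x80) ? 0xFF : 0x00;          /* CLRA / TSTB / BPL / COMA */
    uint16_t low   = (uint16_t)((x & 0xFF) + b);        /* ADDB XWORK+1 */
    uint8_t  carry = (uint8_t)(low >> 8);               /* carry out of the low byte */
    uint8_t  high  = (uint8_t)((x >> 8) + a + carry);   /* ADCA XWORK */
    assert(carry <= 1);
    return (uint16_t)((high << 8) | (low & 0xFF));
}
```

Passing 0xFE as b moves the pointer down by two, and passing 0x01 at a page boundary carries into the high byte, which are the two cases the sign extension and the carry exist for.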

A lot of that would be done on the 6800 and 6801 by low-level subroutines, of course, since we don't want the object code blowing up in size just to support high-level constructs at run-time.

Well, really, if we're being sensible, we're not usually going to want to bother with frames. 

But you can mix and match frames like this with frameless in the split stack runtime, as long as you either don't reach into the caller's stack frame, or at least know what the caller's stack frame looks like so you know how to reach in.

And we must not forget that multitasking and multiprocessing on the 6800 and 6801 require saving and restoring the virtual registers in the direct page: PSP, XWORK, and such.

And I guess it's time to look at stack frames with a single combined stack. Yuck.

Or skip out and move ahead to binary output.


Tuesday, October 15, 2024

ALPP 02-14 -- One Foot on the Beach, One in the Surf -- Frame Pointer for Split Stack

One Foot on the Beach,
One in the Surf --
Frame Pointer for Split Stack

(Title Page/Index)

Didn't I say we weren't going to look at this yet?

Talking about balancing the stack when we have just begun using it wasn't boring enough?

I'm not going to complain if you skip ahead to getting binary output.

Oh, well.

There's a philosophy that you should allocate all the memory a function will need when you start a function, and deallocate it all when you finish. 

It's not a meaningless philosophy.

And it's not really limited to the single-stack run-times, but it is the only way to keep a single stack runtime sane when parameter lists become longer and local variables increase.

I think we've seen enough of accessing variables on the parameter stack that I can introduce the concept of a stack frame in somewhat abstract terms. After this introduction, I'll show some actual implementations borrowed from the split stack examples of this chapter.

Say we have a routine, routine 1, that has received two parameters, P1_1 and P1_2. And it has two temporary values on the parameter stack, T1_1 and T1_2. The parameter stack (pointed to by PSP) and the return stack (pointed to by RSP) could be diagrammed as follows, using nnnn to show parameters and temporaries, and rrrr to show return addresses from further back in the call chain, and RA_0 to show the return address from the code that called routine 1:

nnnn
P1_1_
P1_2
T1_1
T1_2 <= _PSP_
????
rrrr
RA_0 <= _RSP_
????
????
????
????

And routine 1 calls another routine, routine 2, passing that routine two parameters:

nnnn
P1_1_
P1_2
T1_1
T1_2
P2_1_
P2_2 <= _PSP_
????
rrrr
RA_0
RA_1 <= _RSP_
????
????
????


 

Now, that isn't too hard to keep track of, but what if you have a call chain twenty calls deep? (The correct answer is that a called routine shouldn't want to know, but that answer doesn't work for languages like Pascal that want called routines to be able to access the caller routine's environment.)

We could keep a frame pointer in some spare register, to show where the parameters passed to the current routine are. The calling routine would push the frame pointer before it makes a call. The called routine would move PSP to the frame pointer on entry. And the stacks would look like this:

nnnn
P1_1_
P1_2 <= _FP_
T1_1
T1_2 <= _PSP_
????
rrrr
FA_0
RA_0 <= _RSP_
????
????
????

And just after routine 1 calls routine 2 and routine 2 updates the frame pointer, it looks like this:

nnnn
P1_1_
P1_2
T1_1
T1_2
P2_1_
P2_2 <= _PSP_,_FP_
????
rrrr
FA_0
RA_0
FA_1
RA_1 <= _RSP_
????


 

And routine 2, the called routine, can push temporaries on the parameter stack and reference its parameters via the frame pointer. 

And if it needs, for some reason, to reference the previous frame (and knows that the caller routine pushed the frame pointer), it can go look at the previous frame pointer on the return stack.
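Here is the same arrangement as a C model, in case two stacks and a frame pointer are easier to follow that way. Everything in it is an assumption for illustration: the stacks are arrays of cells growing toward index 0, indexes stand in for addresses, and the names struct split, call_with_two, and callers_fp are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CELLS 32

/* Hypothetical two-stack machine; indexes stand in for addresses. */
struct split {
    uint32_t pstack[CELLS];  /* parameter stack, grows toward 0 */
    uint32_t rstack[CELLS];  /* return stack, grows toward 0 */
    uint32_t psp, rsp, fp;   /* psp/rsp: index of the last pushed cell */
};

static void ppush(struct split *s, uint32_t v)
{
    assert(s->psp > 0);
    s->pstack[--s->psp] = v;
}

static void rpush(struct split *s, uint32_t v)
{
    assert(s->rsp > 0);
    s->rstack[--s->rsp] = v;
}

/* Caller pushes two parameters and its frame pointer, then "calls". */
void call_with_two(struct split *s, uint32_t p1, uint32_t p2, uint32_t ret_addr)
{
    ppush(s, p1);
    ppush(s, p2);
    rpush(s, s->fp);      /* caller's frame pointer goes on the return stack */
    rpush(s, ret_addr);   /* what the call instruction would push */
    s->fp = s->psp;       /* callee entry: frame pointer = parameter stack pointer */
}

/* Callee side: the caller's frame pointer sits just above the return address. */
uint32_t callers_fp(const struct split *s)
{
    assert(s->rsp + 1 < CELLS);
    return s->rstack[s->rsp + 1];
}
```

In the model, pstack[fp] is the last parameter pushed and pstack[fp + 1] is the one before it, while anything the called routine pushes afterward lands below fp and leaves those offsets alone.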

Now, that would not be so hard on the 68000, because we have lots of registers. We could do it like this:

* calling protocol for split stack frames
* with A6 as the parameter stack pointer 
* and A4 as the frame pointer:
ROUTINE1
*	...
	MOVE.L	PARAMETER1,-(A6)
	MOVE.L	PARAMETER2,-(A6)
	MOVE.L	A4,-(A7)	; save own frame pointer on the return stack
	BSR.W	ROUTINE2
	MOVE.L	(A7)+,A4	; pop own frame pointer
*	...
*
*
* routine entry protocol for split stack frames
* with A4 as the frame pointer
* and A6 as the parameter stack pointer:
ROUTINE2
	MOVE.L	A6,A4	; frame pointer <- parameter stack pointer
*	...	
	MOVE.L	0(A4),D1	; access the second parameter
*	...
	SUB.L	4(A4),D3	; access the first parameter
*	...
	MOVE.L	4(A7),A0	; get the caller's frame pointer
	ADD.L	0(A0),D5	; access the caller's last parameter pushed!
*	...
* routine return protocol in case of return-on-stack
	LEA	PARAMETER_SIZE-RETURN_SIZE(A4),A4
	MOVE.L	RETURN_VALUE0(A6),(A4)
*	...
	MOVE.L	RETURN_VALUEN(A6),N*4(A4) 
	MOVE.L	A4,A6
	RTS
* routine return protocol in case of return-in-register discipline:
	MOVE.L	A4,A6		; clear the local frame, except for own parameters
	LEA	8(A6),A6	; drop two parameters
	RTS

And we need to talk about the return protocol.

In many run-time disciplines, the return value is limited to what will fit in one or two, or maybe three, registers. In such a case, the only thing you need to do to return, once you have the return values in your registers, is deallocate all the stuff that you were using on the stack, deallocate your parameters, and return. Deallocating all the non-parameters is accomplished by moving the frame pointer in A4 back to the parameter stack pointer in A6.

But if you need to return something bigger than that, and the run-time supports doing so, the return protocol needs to calculate the difference and adjust A4, copy the return values into place, and then deallocate by moving the adjusted address in A4 to A6.

You have to wait to deallocate until after you copy the return values into place, of course. Otherwise, an interrupt might leave you calculating things that aren't there any more.

And the calling routine needs to restore its own frame pointer when the called routine returns.

Looks pretty reasonable, if that's what we want to do.

On the 6809, 6800 and 6801, of course we'd declare a frame pointer in the direct page, and that would require more code to maintain all that. It's an interesting exercise which I should ignore for now. 

Oh, why not? 6809 code:

* calling protocol for split stack frames on 6809
* with U as the parameter stack pointer 
* and direct page variable FP as the frame pointer:
ROUTINE1
*	...
	LDD	PARAMETER1
	PSHU	D
	LDX	PARAMETER2
	PSHU	X
	LDX	FP
	PSHS	X
	LBSR	ROUTINE2
	PULS	X
	STX	FP
*	...
*
*
* routine entry protocol for split stack frames on 6809
* with U as the parameter stack pointer 
* and direct page variable FP as the frame pointer:
ROUTINE2
	STU	FP
*	...
	LDX	FP	
	LDD	0,X	; access the second parameter
*	...
	LDY	FP
	SUBD	2,Y	; access the first parameter
*	...
	LDX	2,S	; get the caller's frame pointer
	ADDD	0,X	; access the caller's last parameter pushed!
*	...
* routine return protocol in case of return-in-register discipline:
	LDU	FP	; clear the local frame, except for own parameters
	LEAU	4,U	; drop two parameters
	RTS
* routine return protocol in case of return-on-stack
	LDX	FP
	LEAX	PARAMETER_SIZE-RETURN_SIZE,X
	LDD	RETURN_VALUE0,U
	STD	,X
*	...
	LDD	RETURN_VALUEN,U
	STD	N*NATWID,X	; Nth 16-bit cell
	TFR	X,U
	RTS

6801:

* calling protocol for split stack frames on 6801
* with direct page variable PSP as the parameter stack pointer 
* and direct page variable FP as the frame pointer:
ROUTINE1
*	...
	LDD	PARAMETER1
	JSR	PPSHD
	LDX	PARAMETER2
	JSR	PPSHX
	LDX	FP
	PSHX
	JSR	ROUTINE2
	PULX
	STX	FP
*	...
*
*
* routine entry protocol for split stack frames on 6801
* with direct page variable PSP as the parameter stack pointer 
* and direct page variable FP as the frame pointer:
ROUTINE2
	LDX	PSP
	STX	FP
*	...
	LDX	FP	
	LDD	0,X	; access the second parameter
*	...
	LDX	FP
	SUBD	2,X	; access the first parameter
*	...
	TSX
	LDX	2,X	; get the caller's frame pointer
	ADDD	0,X	; access the caller's last parameter pushed!
*	...
* routine return protocol in case of return-in-register discipline:
	LDX	FP	; clear the local frame, except for own parameters
	INX	; drop two parameters (might be done by subroutine).
	INX
	INX
	INX
	STX	PSP
	RTS
* routine return protocol in case of return-on-stack
	LDD	FP
	ADDD	#PARAMETER_SIZE-RETURN_SIZE
	STD	FP
	LDX	PSP
	LDD	RETURN_VALUE0,X
	LDX	FP
	STD	0,X
*	...
	LDX	PSP
	LDD	RETURN_VALUEN,X
	LDX	FP
	STD	N*NATWID,X	; Nth 16-bit cell
	STX	PSP
	RTS

and 6800:

* calling protocol for split stack frames on 6800
* with direct page variable PSP as the parameter stack pointer 
* and direct page variable FP as the frame pointer:
ROUTINE1
*	...
	LDAA	PARAMETER1	; no LDD on the 6800
	LDAB	PARAMETER1+1
	JSR	PPSHD
	LDX	PARAMETER2
	JSR	PPSHX
	LDAA	FP
	LDAB	FP+1
	PSHB
	PSHA
	JSR	ROUTINE2
	PULB
	PULA
	STAA	FP
	STAB	FP+1
*	...
*
*
* routine entry protocol for split stack frames on 6800
* with direct page variable PSP as the parameter stack pointer 
* and direct page variable FP as the frame pointer:
ROUTINE2
	LDX	PSP
	STX	FP
*	...
	LDX	FP	
	LDAA	0,X	; access the second parameter
	LDAB	1,X
*	...
	LDX	FP
	SUBB	3,X	; access the first parameter
	SBCA	2,X
*	...
	TSX
	LDX	2,X	; get the caller's frame pointer
	ADDB	1,X	; access the caller's last parameter pushed!
	ADCA	0,X
*	...
* routine return protocol in case of return-in-register discipline:
	LDX	FP	; clear the local frame, except for own parameters
	INX	; drop two parameters (might be done by subroutine).
	INX
	INX
	INX
	STX	PSP
	RTS
* routine return protocol in case of return-on-stack
	LDX	FP
	LDAB	#PARAMETER_SIZE-RETURN_SIZE	; -128 <= SZ < 128!
	JSR	ADDBX
	STX	FP
	LDX	PSP
	LDAA	RETURN_VALUE0,X	
	LDAB	RETURN_VALUE0+1,X	
	LDX	FP
	STAA	0,X
	STAB	1,X
*	...
	LDX	PSP
	LDAA	RETURN_VALUEN,X
	LDAB	RETURN_VALUEN+1,X
	LDX	FP
	STAA	N*NATWID,X	; Nth 16-bit cell
	STAB	N*NATWID+1,X
	STX	PSP
	RTS
*	...
ADDBX	STX	XWORK
	CLRA
	TSTB
	BPL	ADDBXP
	COMA
ADDBXP	ADDB	XWORK+1
	ADCA	XWORK
	STAB	XWORK+1
	STAA	XWORK
	LDX	XWORK
	RTS
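
The sign extension in ADDBX is easy to check away from the target. Here is a small C model of it, byte by byte, the way the 6800 has to do it. It is a sketch only: the function name addbx and the way the carry is modelled are mine, and it is no substitute for running the real routine in the simulator.

```c
#include <stdint.h>

/* Model of ADDBX: add the signed 8-bit value in B to the 16-bit X. */
uint16_t addbx(uint16_t x, uint8_t b) {
    uint8_t a = 0;                                    /* CLRA */
    if (b & 0x80) {
        a = 0xFF;                                     /* TSTB / BPL / COMA: sign extend */
    }
    uint16_t lo = (uint16_t)(b + (x & 0xFF));         /* ADDB XWORK+1 */
    uint8_t carry = (uint8_t)(lo >> 8);               /* carry out of the low byte */
    uint8_t hi = (uint8_t)(a + (x >> 8) + carry);     /* ADCA XWORK */
    return (uint16_t)((hi << 8) | (lo & 0xFF));       /* LDX XWORK */
}
```

Note that COMA leaves the carry flag set on the 6800, but that does no harm here, because ADDB ignores the incoming carry and sets a fresh one for ADCA to use.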

You know, that doesn't look all that bad.

It's all untested, but it should work. Uhm, for some definition of "should" and "work".

In particular, the protocol for copying return values back up the parameter stack in the case of return values on the parameter stack won't be that simple. You can't just copy from the bottom up without overwriting something before you can get it copied in some cases. And you'll have the same problem copying from top down. Which means you'll need to schedule the copying so that locations that will be used for return values will get copied into place first.

You can schedule that by hand, but it's going to be a pain to do when it's necessary. You can also have a compiler schedule it for you, but the compiler has to chase all the dependencies down before it starts spitting out code.
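
To give an idea of what that dependency chasing looks like, here is a rough C sketch of a scheduler for a set of cell-to-cell copies. Everything in it is an assumption for illustration (the Move type, the 64-move limit, the name schedule_moves); it is not what any particular compiler does. A copy is held back as long as some other pending copy still needs to read the cell it would overwrite.

```c
#include <stdbool.h>

/* Hypothetical description of one copy: cell src is copied to cell dst. */
typedef struct { int src; int dst; } Move;

/*
 * Fill order[] with indices into moves[] so that no move overwrites a cell
 * that a later move still has to read. Returns the number of moves
 * scheduled. A result less than n means the remaining moves form a cycle
 * and need a scratch cell to break it.
 */
int schedule_moves(const Move *moves, int n, int *order) {
    bool done[64] = { false };
    int scheduled = 0;
    bool progress = true;

    if (n > 64) {
        return 0;                       /* limit of this model */
    }
    while (scheduled < n && progress) {
        progress = false;
        for (int i = 0; i < n; i++) {
            if (done[i]) {
                continue;
            }
            bool blocked = false;
            for (int j = 0; j < n; j++) {
                /* Move j is still pending and reads what move i would overwrite. */
                if (j != i && !done[j] && moves[j].src == moves[i].dst) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) {
                order[scheduled++] = i;
                done[i] = true;
                progress = true;
            }
        }
    }
    return scheduled;
}
```

For a simple shift of three cells up by one, the scheduler ends up copying the top cell first and working down. For two cells that trade places, it schedules nothing, which is the signal that a temporary is needed.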

All of which leaves us to appreciate the sensibility of limiting return values to what fits in a register or two -- if you're going to go to the trouble of using stack frames.

Another approach to handling return values, which I describe in the next chapter, would be to consider that the return parameters and call parameters might exist in the called routine's context at the same time, and copy the call parameters below where the return parameters will be left, which is much more sensible. 

I describe it there because it seems to go well with a stack frame that doesn't substantially change in size over the life of the routine. But there's no reason not to use it with this kind of stack frame.

Oh, and one more thing about getting the above code to work in a multitasking, multiprocessor environment on the 6800 and 6801 -- think about what happens if the CPU is interrupted in the middle of modifying PSP or FP, or using XWORK.

Your OS will need its own PSP, FP, XWORK, etc., and saving and restoring the user process's PSP, FP, XWORK, etc. will be a required step in switching tasks. Some other processes besides the OS may want their own virtual registers in the direct page, as well. 
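
In C terms, the save and restore step might be sketched like this. The struct and function names are hypothetical, and the single global stands in for the direct-page variables. On the real CPU the switch itself would have to run with interrupts masked, which this model does not show.

```c
#include <stdint.h>

/* Hypothetical set of direct-page pseudo-registers used by the code above. */
typedef struct {
    uint16_t psp;    /* parameter stack pointer */
    uint16_t fp;     /* frame pointer */
    uint16_t xwork;  /* scratch used by ADDBX */
} VirtualRegs;

/* The one live copy, standing in for the direct-page variables. */
VirtualRegs direct_page;

/* Park the outgoing task's pseudo-registers, then load the incoming task's. */
void switch_task(VirtualRegs *outgoing, const VirtualRegs *incoming) {
    *outgoing = direct_page;
    direct_page = *incoming;
}
```

The point of the model is just that XWORK has to travel with the task along with PSP and FP. A task interrupted halfway through ADDBX needs to find its own half-finished sum in XWORK when it resumes.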

The 6809 can give each process its own direct page, of course.

Well, that's enough for one chapter, and, as I say, I want to show you another way to do frames with a split stack before I show you how to do the single stack.

(Title Page/Index)


 

 

 

 

ALPP 02-XX -- Backsliding -- Stack Frames

Botched start, keeping for notes

Here's where you want to go instead: https://joels-programming-fun.blogspot.com/2024/10/alpp-02-14-one-foot-on-beach-one-in-surf-frame-pointer-split-stack.html.

Balancing on Some Other Beach --
Stack Frames
and Debugging on the 6800

(Title Page/Index)

 [This was from the start to the split-stack frames.]

Didn't I say we weren't going to look at this yet?

Talking about balancing the stack when we have just begun using it wasn't boring enough?

I'm not going to complain if you skip ahead to getting binary output.

Oh, well.

There's a philosophy that you should allocate all the memory a function will need when you start a function, and deallocate it all when you finish. 

It's not a meaningless philosophy.

And it's not really limited to the single-stack run-times, but it is the only way to keep a single stack runtime sane when parameter lists become longer and local variables increase.

Consider the previous chapter, where we calculated a predicted value for the stack pointer at the end of the function. To do that, we had to know what parameters we were accepting, and what we were returning. If, in addition to that, we know what temporary variables we will be using, we can allocate it all at once and not worry about what is happening to the stack in odd corners of the code.

And if we can do that, we think we can dodge the return address.

There are a number of problems with this ideal. In practice, we have limited our return values to a single value that we can return in a register, or maybe in a few registers. For anything bigger than that, we tend to follow the practice of C, passing the function a pointer to the place where we want the value returned, and defining the function to work through the pointer.

It's not too hard to understand, conceptually, but it gets clumsy, and it creates a bottleneck.

If you're already thinking a stack frame sounds like a lot of work to set up by hand, you're right. But compilers can work out the details and hide them from us. 

And, to a certain extent, in languages like Pascal and Ada, they can prevent us from overwriting return addresses.

Enough of the sales job. Let's look at a "typical" stack frame on the 68000, since the 68000 is designed to handle them. 

There are several ways to set up stack frames; I'll pick two that will work okay on our four initial targets. 

Two?

Yeah. I'm going to show you a stack frame on a split stack runtime, as well.

 

[The following was from a false start on the single-stack stack frames on the 6801, which still hasn't gotten anywhere real.]

Single-stack stack frames on the 6809 and 68000 seem almost reasonable. Almost.

But with the conflicts between uses of the stack, index, accumulator, offset calculations, and so on, it's tempting to just punt on the 6800 and 6801. 

It can be done. Sort-of.

Negative index offsets are a problem with the 6800 and 6801. The instruction set provides for positive constant offsets, but not negative. 

This means you have to waste time calculating addresses at run time to get to things in the stack frame that are below the frame pointer, and that's the way the 68000's LINK and UNLK instructions work it. Your frame pointer points to the top of the frame, and the link to the caller's frame. The stack pointer points to the bottom.

 

Including an SBX (subtract B from X) instruction to parallel the ABX (add B to X) in the 6801's extended instruction set would have helped, but they didn't do that. And it still would have been awkward.

Adjusting the frame pointer to point to the bottom instead of the top would be workable, but then you either need to have an explicit size to the frame, leading to more run-time calculations, or you need to track both the bottom and top. That might work better than providing low-level routines to add and subtract offsets from pointers.

A lot of times, a small number of INX or DEX instructions will do the job. If we limit the size of a stack frame to 16 bytes, utility subroutines of nothing but INX or DEX instructions and multiple entry points could do the job. And an ADDDX routine will be faster on the 6801 than a signed ADDBX.

Using two frame pointers, a frame base pointer and a frame link pointer, should help a lot, but the stack allocation and deallocation will be weird. 

Oh, and using D and X all over the place to calculate offsets will make it impossible to put return values in D and X, so we're going to allocate 4 bytes of return value variable at the top of the stack.

We really shouldn't be trying to tackle this problem this early in the tutorial, which is yet another reason I didn't want to use stack frames.

So I really should punt for now, and pick this back up waaaaay down the road. 

But I'm feeling obstinate. Here goes nothing:

 


(Title Page/Index)


 

ALPP 02-13 -- Balancing on the Beach -- Comparing Pointers, Checking the Stack

Balancing on the Beach --
Comparing Pointers,
Checking the Stack

(Title Page/Index)

Four different processors, three different parameter passing modes. And natural width (address width) arithmetic -- 16-bit on the 8-bit processors, but 32-bit on the "16-bit" 68000.

And I really don't want to leave a certain set of questions begging.

* How do you make sure that your use of statically allocated parameters (and variables) has no internal conflicts? How do you make sure that you aren't using, say, a parameter/variable to allocate an I/O buffer, then, before you finish the allocation maintenance code, you go off and start a routine to collect de-allocated buffers and use the same variable for a conflicting purpose?

Discipline! 

There are several disciplines for static parameters and variables, and the disciplines allow the use of automated tools that can check for conflicting uses. 

If you don't have such tools, you have to analyze their use yourself. Discipline helps, but, ultimately, you want to avoid using more statically allocated parameters and variables than you can track and analyze.

I'll talk about various approaches to discipline as we go, variable naming and use, file and directory strategies and such -- because we really can't avoid some degree of static allocation.

* How do you make sure your return address stack is balanced?

Well, if you don't put temporaries and parameters and such on the return address stack, that's a tautology. If you come back from every subroutine you call, your return address stack is balanced. If you don't, it isn't.

Well, there are other reasons for not coming back, such as crashing the stack and infinite loops, topics which we will brush on here and there.

* What if you do use a combined stack? 

Actually, if you use a combined stack and you manage to return from every call, you can be pretty sure it's balanced -- given the exceptions mentioned above. But when you have a lot of parameters and variables on that stack, the probability of not coming back increases.

So you use a stack frame, which I have been waiting to explain until we come to some functions that are complicated enough to need a stack frame.

Otherwise, just make sure you have a pop for every push and never have too many pops, and watch for your code to fail to come back from a subroutine call. 

And if it doesn't return?

8-0

Yeah, seeing things blow up and figuring out which call it didn't return from can become something of a black art. So you tend to overuse the stack frame.

* If you use a separate parameter stack, how do you make sure you've kept it balanced? The return stack and the parameter stack might get out of sync, might they not? And you could return from every call, but still leave parameters out there to be processed, or end up having tried to process parameters that were never pushed out there.

Just make sure you have a pop for every push and never have too many pops. Simple. ...

And how do you do that?

AAAARRRRRGGGGGHHHHHH!!!!!

No magic. 

But we do have strategies, some of which we use and some of which we don't, depending on various things. A few of them are:

  • Testing pushes and pops before you do them.
  • Periodically checking that the parameter stack pointer is pointing at valid parameter stack space.
  • Providing bumpers, or buffer zones between the stack space and other spaces, and periodically checking that nothing has been written into those zones.
  • Saving the stack pointer when you enter a function and checking that it matches before you leave.
  • Using a stack frame on the parameter stack, in effect pushing everything at once, and popping it all at once.
  • Letting the hardware allocate the parameter stack in an area of memory isolated from the rest, so that overflows and underflows cause memory errors.

These strategies all work on the single combined stack, as well, by the way.

We really aren't prepared to talk about all of those, but we can talk about some of them.

For instance, here's code to check the stack pointer on exit and periodically check it mid-flight, on the 68000:

* This is not the best way for every use.
...
PSTKLIM	DS.L	32	; roughly 16 levels of call at two parameters per call
PSTKBAS	DS.L	1	; bumper space -- parameter stack is pre-dec
* ...
START	BSR.W	INISTKS
*	...
	CMP.L	#PSTKBAS,A6
	BHI.W	STKERR
	CMP.L	#PSTKLIM,A6
	BLO.W	STKERR
*	...
DONE	CMP.L	#PSTKBAS,A6
	BEQ.S	EXIT
STKERR *	...
*	...
EXIT	MOVEM.L	A4SAVE-LB_ADDR(A5),A4-A7	; restore the monitor's A4-A7
	NOP
	NOP		; landing pad

What is this CoMPare instruction?

It effectively subtracts the source operand from the target, but doesn't store the result. It only affects the flags, so you can keep your value but branch on the basis of the comparison. 

Many of the branch instructions Motorola implements on their CPUs test multiple flags, and the mnemonics they specify are supposed to help you remember what the combinations mean.

  • HI means that the target of the CoMPare is HIgher (unsigned compare) than the source. (Carry and Zero are both clear.) So if A6 is higher than PSTKBAS, it branches to STKERR. Remember, the stack pushes toward low memory, so it's "upside-down", so to speak.
  • LO means the target is LOwer (unsigned) than the source. (Carry is set.) So if A6 is lower than PSTKLIM, it branches to STKERR.
  • EQ means the source is EQual to the target. (Zero flag is set.) So if, after the body of the main function is done, A6 is equal to PSTKBAS -- nothing left on stack, and not pointing beyond the bottom of stack -- it branches around the STKERR function to EXIT.
  • NE, by the way, means Not Equal. (Zero flag is clear.)
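
If the flag combinations are hard to keep straight, a small C model may help. It is only a sketch: the Flags struct keeps just the two flags used here, and the names compare, bhi, blo, beq, bne, and stack_error are mine, chosen to echo the mnemonics. The stack_error function mirrors the mid-flight check in the 68000 listing above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool c; bool z; } Flags;   /* only the flags used here */

/* CMP source,target: computes target - source and keeps only the flags. */
Flags compare(uint32_t target, uint32_t source) {
    Flags f;
    f.c = target < source;     /* borrow out of the subtraction */
    f.z = target == source;
    return f;
}

bool bhi(Flags f) { return !f.c && !f.z; }  /* HIgher, unsigned */
bool blo(Flags f) { return f.c; }           /* LOwer, unsigned */
bool beq(Flags f) { return f.z; }           /* EQual */
bool bne(Flags f) { return !f.z; }          /* Not Equal */

/* The mid-flight check: true if the parameter stack pointer is out of bounds. */
bool stack_error(uint32_t psp, uint32_t pstklim, uint32_t pstkbas) {
    if (bhi(compare(psp, pstkbas))) {
        return true;           /* above the base: popped too much */
    }
    if (blo(compare(psp, pstklim))) {
        return true;           /* below the limit: pushed too much */
    }
    return false;
}
```

A pointer sitting exactly on PSTKBAS (empty stack) or exactly on PSTKLIM (full stack) passes the check; only going past either end is reported.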

This is essentially something we could do for all four processors -- For the 6800:

* This is not the best way for every use.
* 6800 CPX flags are incomplete, 
* but Z is valid.
* And N should be valid all by itself, 
* even though the manual says
* "not intended for conditional branching".
*	...
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*	...
START	JSR	INISTKS
*	...
	LDX	PSP
	CPX	#PSTKLIM	; Z is valid, N should be valid
	BEQ	TSTOVR		; have to test separately
	BPL	TSTOVR
	JMP	STKERR
TSTOVR	CPX	#PSTKBAS
	BEQ	CONT
	BPL	STKERR
CONT *	...
*	...
DONE	LDX	PSP
	CPX	#PSTKBAS
	BEQ	EXIT
STKERR *	... 
*	...
EXIT	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

For the 6801:

* This is not the best way for every use.
* 6801 CPX flags are all valid 
*	...
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*	...
START	JSR	INISTKS
*	...
	LDX	PSP
	CPX	#PSTKLIM	; All flags valid
	BLO	STKERR
	CPX	#PSTKBAS
	BHI	STKERR
CONT *	...
*	...
DONE	LDX	PSP
	CPX	#PSTKBAS
	BEQ	EXIT
STKERR *	... 
*	...
EXIT	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

For the 6809 using absolute addressing:

* This is not the best way for every use.
* 6809 CMPU flags are all valid 
*	...
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*	...
START	LBSR	INISTKS
*	...
	CMPU	#PSTKLIM
	BLO	STKERR
	CMPU	#PSTKBAS
	BHI	STKERR
CONT *	...
*	...
DONE	CMPU	#PSTKBAS
	BEQ	EXIT
STKERR *	... 
*	...
EXIT	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

But we had initialized U via PC relative addressing, so we should use LEA to calculate the addresses and do it this way:

* This is not the best way for every use.
* 6809 CMPU flags are all valid 
*	...
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*	...
START	LBSR	INISTKS
*	...
	LEAX	PSTKLIM,PCR
	PSHS	X
	CMPU	,S++
	BLO	STKERR
	LEAX	PSTKBAS,PCR
	PSHS	X
	CMPU	,S++
	BHI	STKERR
CONT *	...
*	...
DONE	LEAX	PSTKBAS,PCR
	PSHS	X
	CMPU	,S++
	BEQ	EXIT
STKERR *	... 
*	...
EXIT	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

And, come to think of it, our initialization code for the 68000 was also PC relative, so here's how to do it PC relative on the 68000:

* This is not the best way for every use.
*...
PSTKLIM	DS.L	32	; roughly 16 levels of call at two parameters per call
PSTKBAS	DS.L	1	; bumper space -- parameter stack is pre-dec
*...
START	BSR.W	INISTKS
*	...
	LEA	PSTKBAS(PC),A0
	CMP.L	A0,A6
	BHI.W	STKERR
	LEA	PSTKLIM(PC),A0
	CMP.L	A0,A6
	BLO.W	STKERR
*	...
DONE	LEA	PSTKBAS(PC),A0
	CMP.L	A0,A6
	BEQ.S	EXIT
STKERR *	...
*	...
EXIT	MOVEM.L	A4SAVE-LB_ADDR(A5),A4-A7	; restore the monitor's A4-A7
	NOP
	NOP		; landing pad

That's pretty heavy duty. If it's just on the main function, I suppose it isn't so bad. Particularly, it's something we might insert in the main function when we are debugging and need to watch what's happening to the stack.

But if checking the stack on function entry and exit is going to be that hard, do we really want to work the processor that hard?

(This is one of the places where current CPUs are all lacking. The CPU should define limit registers for the stacks and handle these checks at run-time transparently. Current practice is to isolate the single stack in its own MMU segment, frame function calls in rigid stack frames, and let the MMU hardware trap stack-out-of-bounds. But there are ways hostile code, or code running wild, can get around that in most modern MMU implementations. We are too focused on speed.)

Simply checking that the stack is balanced between entry and exit isn't quite as good as full bounds checking, but it does help. If all functions remain balanced, the only problem we might have is stack overflow.

(cough mumble functions that allocate or deallocate stack mutter mumble)

Let's look at balance checks on the 68000:

* This is not the best way for every use.
FENTRY	MOVE.L	A4,-(A7)	; Save A4 on entry to function
	MOVE.L	A6,A4	; Mark the current allocation depth.
	...
	CMP.L	A4,A6	; subtract A4 from A6, don't store result
	BNE.W	STKERR	; STKERR will have to deal with the saved A4
	MOVE.L	(A7)+,A4
	RTS

And you insert the meat of the function in place of the ellipsis.

How would it look on the 6809? We don't have enough registers to just use one, but we could push the marker to the return stack. Of course, then we have to be careful to keep that in balance, but if we let the return stack go out of balance we've got problems anyway.

* This is not the best way for every use.
FENTRY	PSHS	U
	...
	CMPU	,S++	; compare and pop
	BNE	STKERR
	RTS

And, if you're thinking that we could have done that on the 68000, yeah, we could have, but we don't seem to be using A4 for anything, and it was convenient.

How would it look on the 6801?

* This is not the best way for every use.
FENTRY	LDX	PSP
	PSHX
	...
	PULX
	CPX	PSP	; inverted compare, 
	BEQ	FEXIT	; but we're only looking at equal
	JMP	STKERR
FEXIT	RTS

And how about the 6800? Can it be done reasonably efficiently?

We're gonna need a subroutine, but that means we're going to be fighting with the return address. But, yeah it can be done. It's a lotta code.

PMARK	DES		; Need space.
	DES
	TSX
	LDAA	2,X	; return address
	LDAB	3,X
	STAA	0,X	; move it down
	STAB	1,X
	LDAA	PSP	; get the mark
	LDAB	PSP+1
	STAA	2,X	; move it in
	STAB	3,X
	RTS
* ...
FENTRY	JSR	PMARK
*	...    ; meat of function here.
	TSX
	LDX	0,X
	CPX	PSP
	BEQ	FEXIT
	JMP	STKERR
FEXIT	INS		; drop the mark so RTS sees the return address
	INS
	RTS

I guess that's not so bad. Maybe.

But, wait!

 (Told you so.)

So far, our functions have cleaned up their own stacks, and the parameter stack pointer on entrance is not the same as the stack pointer on exit. It changes. We can't use these.

Or can we, with some modifications?

Usually, we know how much the function is going to adjust the stack by on return. Could we save the mark pre-adjusted?

Hmm.

Let's look at pre-adjusted balance checks on the 68000.

68000, with adjusted A4:

FENTRY	MOVE.L	A4,-(A7)	; Save A4 on entry to function
	LEA	ADJVAL(A6),A4	; Mark the target allocation depth.
	...
	CMP.L	A4,A6
	BNE.W	STKERR	; STKERR will have to deal with the saved A4
	MOVE.L	(A7)+,A4
	RTS

68000, adjusted and pushed on return stack:

FENTRY	LEA	ADJVAL(A6),A0	; Use a volatile register
	MOVE.L	A0,-(A7)	; Save pre-adjusted on entry to function
*	...
	CMP.L	(A7),A6		; deliberately leave it on the stack for STKERR
	BNE.W	STKERR
	LEA	NATWID(A7),A7
	RTS 

And on the 6809, adjusted and pushed on the return stack:

FENTRY	LEAX	ADJVAL,U
	PSHS	X
*	...
	CMPU	,S	; leave it on stack for STKERR
	BNE	STKERR
	LEAS	NATWID,S
	RTS

On the 6801, again, adjusted and pushed on the return stack:

FENTRY	LDD	#ADJVAL    ; Unsigned ABX doesn't help.
	ADDD	PSP
	PSHB
	PSHA
*	...
	PULX
	CPX	PSP	; inverted compare, 
	BEQ	FEXIT	; but we're only looking at equal
	JMP	STKERR	; STKERR can see both mark in X and PSP
FEXIT	RTS

And, finally, on the 6800 (wow!):

* Signed offset in B, -128 to +127
PMARK	DES		; Need space.
	DES
	TSX
	LDAA	2,X	; return address
	STAA	0,X	; move it down
	LDAA	3,X	; leave B alone
	STAA	1,X
	CLRA
	TSTB
	BPL	PMARKA
	COMA		; sign extend
PMARKA	ADDB	PSP+1
	ADCA	PSP
	STAA	2,X	; move it in
	STAB	3,X
	RTS
* ...
FENTRY	LDAB	#ADJVAL
	JSR	PMARK
*	...
	TSX
	LDX	0,X
	CPX	PSP
	BEQ	FEXIT
	JMP	STKERR
FEXIT	INS		; drop the mark so RTS sees the return address
	INS
	RTS
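
Stripped of the register juggling, all of these do the same two things, which can be modelled in a few lines of C. This is a sketch under my own naming (mark_on_entry, balanced_on_exit), with the adjustment as a signed byte the way the 6800 version takes it in B.

```c
#include <stdbool.h>
#include <stdint.h>

/* Entry: compute the value PSP should have at exit. adjval is signed. */
uint16_t mark_on_entry(uint16_t psp, int8_t adjval) {
    return (uint16_t)(psp + (int16_t)adjval);   /* sign-extend, wrap at 16 bits */
}

/* Exit: the stack is balanced only if PSP landed exactly on the mark. */
bool balanced_on_exit(uint16_t psp, uint16_t mark) {
    return psp == mark;
}
```

As an example, assuming a function that consumes two 16-bit parameters and leaves one 16-bit result on a stack that grows downward, ADJVAL would be 4 - 2 = 2, and the pointer at exit should be two bytes above where it was at entry.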

Okay, so it's possible to mark the stack. But maybe we only want to do it for those functions that we aren't confident are going to stay balanced because different branches of the code do different things. 

Or when we are debugging.

Am I sure that using a combined stack, with stack frames, is not going to be easier, more efficient, and safer?

Is it time to look at stack frames? 

No! NO! Noooooooooooo!

I DON'T want to do that!

First I want to show you how to get numeric output, so that we don't have to depend on the debugger to see the effects of our code! We know enough to get binary output.

Sigh. 

Besides, we really need something more complex than what we have at this point to make it make sense. Don't we? 

Maybe not, especially if we start with frame pointers on a split stack.


(Title Page/Index)

 

Monday, October 14, 2024

ALPP 01-13 -- Hello, Bugs! Debugging Example, 6800 and 6809

Hello, Bugs!
Debugging Example
6800 and 6809

(Title Page/Index)

 

We've got simple text output using the monitor ROM routines under EXORsim, with the 6800, 6801, and 6809. And we have the same on the 68000, using the Atari BIOS under Hatari.

I've talked a little about debugging, and we used some debugging techniques getting EXORsim6801 to offer us something stable to look at.

But I need to make sure I give you a chance to figure out how to deal with problems, because assembler level code tends to be very sensitive to programmer error.

I may have gone over the top on this one. If so, forgive me.

Our theoretical target for this debugging exercise is to get all of the following strings of characters to display on the terminal: 

  • "To the beach!"
  • "Hello Bugs Bunny!" 
  • "I am a muggle."
  • "Are we there yet?"

Now let's try that on the 6800. 

(The same code can be used on the 6801, but you'll need to add the kickstarter code.)

Here's some code with a couple of common bugs:

XPDAT1	EQU	$F027	; string output, terminated by EOT
EOT	EQU	$04	; $04 is decimal 4
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
ENTRY	JMP	START
* (EXORsim apparently doesn't want to calculate RMB arguments.)
*	RMB	16*NATWID-1
	RMB	31	; 16 levels of call minus any saved registers, max
STKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
SAVES	RMB	2	; a place to keep S so we can be clean
STR1	FCB	CR,LF	; Put message at beginning of line
	FCB	"To the beach!"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
STR1	FCB	CR,LF	; Put message at beginning of line
	FCB	"Hello Bugs Bunny!"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
STR1	FCB	CR,LF	; Put message at beginning of line
	FCB	"I am a muggle."	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
STR1	FCB	CR,LF	; Put message at beginning of line
	FCB	"Are we there yet?"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
*
START	STS	SAVES	; Save what the monitor gives us.
	LDS	#STKBAS	; Move to our own stack
	LDX	#STR1	; point to the string
	JSR	XPDAT1	; output it
	LDX	#STR1	; point to the string
	JSR	XPDAT1	; output it
	LDX	#STR1	; point to the string
	JSR	XPDAT1	; output it
	LDX	#STR1	; point to the string
	JSR	XPDAT1	; output it
DONE	LDS	SAVES	; restore the stack pointer
	NOP
	NOP		; landing pad
"
"" 
""
""

Don't bother looking for the bugs yet, just copy and paste it into the assemble command of a running EXORsim 6800 session and watch the output:

(Assemble at $2000.)

$ ./exor --mon
Load facts file 'facts'
'exbug.bin' loaded.
  EXBUG-1.1 detected
'mdos.dsk' opened for drive 0 (double sided)

OSLOAD...

Hit Ctrl-C for simulator command line.  Starting simulation...

>         0 A=00 B=00 X=0000 SP=00FF ------          0020: B6 E8 00 LDA E800                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% a 2000
2000: XPDAT1	EQU	$F027	; string output, terminated by EOT
2000: EOT	EQU	$04	; $04 is decimal 4
2000: LF	EQU	$0A	; line feed
2000: CR	EQU	$0D	; carriage return
2000: *
2000: NATWID	EQU	2	; 2 bytes in the CPU's natural integer
2000: *
2000: ENTRY	JMP	START
2003: * (EXORsim apparently doesn't want to calculate RMB arguments.)
2003: *	RMB	16*NATWID-1
2003: 	RMB	31	; 16 levels of call minus any saved registers, max
2022: STKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
2023: SAVES	RMB	2	; a place to keep S so we can be clean
2025: STR1	FCB	CR,LF	; Put message at beginning of line
2027: 	FCB	"To the beach!"	; Whatever the user wants here.
2034: 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
2037: STR1	FCB	CR,LF	; Put message at beginning of line
Symbol 'STR1' already defined to 2025
2039: 	FCB	"Hello Bugs Bunny!"	; Whatever the user wants here.
204a: 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
204d: STR1	FCB	CR,LF	; Put message at beginning of line
Symbol 'STR1' already defined to 2025
204f: 	FCB	"I am a muggle."	; Whatever the user wants here.
205d: 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
2060: STR1	FCB	CR,LF	; Put message at beginning of line
Symbol 'STR1' already defined to 2025
2062: 	FCB	"Are we there yet?"	; Whatever the user wants here.
2073: 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
2076: *
2076: START	STS	SAVES	; Save what the monitor gives us.
Address at 2001 set to 2076
2079: 	LDS	#STKBAS	; Move to our own stack
207c: 	LDX	#STR1	; point to the string
207f: 	JSR	XPDAT1	; output it
2082: 	LDX	#STR1	; point to the string
2085: 	JSR	XPDAT1	; output it
2088: 	LDX	#STR1	; point to the string
208b: 	JSR	XPDAT1	; output it
208e: 	LDX	#STR1	; point to the string
2091: 	JSR	XPDAT1	; output it
2094: DONE	LDS	SAVES	; restore the stack pointer
2097: 	NOP
2098: 	NOP		; landing pad
2099: "
2099: "" 
2099: ""
Symbol '""' already defined to 2099
2099: ""
Symbol '""' already defined to 2099
2099: 
% 

Woops.

Symbol '""' already defined to 2099

The interactive assembler is telling us we have trash at the end, left over from my copying and pasting the strings in. Stray lines like those could cause problems, but won't in this case.

(If you've already found the real bugs, bear with us.)

(u)nassemble from $2000:

% u 2000
2000: 7E 20 76            JMP $2076
2003: 00                  ???
...

 (Lots of question marks after that.) 

Then from $2076:

% u 2076
2076: BF 20 23            STS $2023
2079: 8E 20 22            LDS #$2022
207C: CE 20 25            LDX #$2025
207F: BD F0 27            JSR $F027 [PDATA1 Print string]
2082: CE 20 25            LDX #$2025
2085: BD F0 27            JSR $F027 [PDATA1 Print string]
2088: CE 20 25            LDX #$2025
208B: BD F0 27            JSR $F027 [PDATA1 Print string]
208E: CE 20 25            LDX #$2025
2091: BD F0 27            JSR $F027 [PDATA1 Print string]
2094: BE 20 23            LDS $2023
2097: 01                  NOP
2098: 01                  NOP
2099: 00                  ???
...

 Set a breakpoint at $2097 and just (c)ontinue from $2000:

% b 2097
Breakpoint set at 2097
% c 2000


Breakpoint!

To the beach!

To the beach!

To the beach!

To the beach!
       1211 A=04 B=00 X=2036 SP=2022 ---Z-- DONE     2094: BE 20 23 LDS 2023  EA=2023(SAVES) D=00FF 
>      1212 A=04 B=00 X=2036 SP=00FF ------          2097: 01       NOP                      

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% 

Yeah, yeah, you had already seen that and you were expecting it. In fact, you were raising your hand to say,

Hey, there's other already defined label messages up there!

And you're right. Starting at address $2037, it says STR1 is already defined to $2025.

So, let's go in where you have that in your text editor ...

What, you just copied from here into EXORsim? Silly you. 

Silly me for not warning you you'd need to paste it into a text editor so you could fix it. 8-* 

(And if you're ahead of me on that, good for you.)

... and rename the 2nd through 4th STR1 labels to STR2, STR3, and STR4.

Will that fix it?

You're saying, of course not. We have to load the addresses of the latter three STRNs into X in their turn, or it just repeats the same output.

Okay, go fix the LDX lines, too. And clear out those trailing quotation marks, while you're at it.

Go ahead, paste it in and get it running.
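
If you want something to check your work against, here's a sketch of what the fixed 6800 source might look like, pieced together from the session above. Only the STR2 through STR4 labels and the LDX operands change, and the trailing quotation marks are gone. Everything else is assumed to stay as it was. Note that there are no blank lines in it.

```asm
XPDAT1	EQU	$F027	; string output, terminated by EOT
EOT	EQU	$04	; $04 is decimal 4
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
ENTRY	JMP	START
	RMB	31	; 16 levels of call minus any saved registers, max
STKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
SAVES	RMB	2	; a place to keep S so we can be clean
STR1	FCB	CR,LF	; Put message at beginning of line
	FCB	"To the beach!"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
STR2	FCB	CR,LF	; was a second STR1
	FCB	"Hello Bugs Bunny!"
	FCB	CR,LF,EOT
STR3	FCB	CR,LF	; was a third STR1
	FCB	"I am a muggle."
	FCB	CR,LF,EOT
STR4	FCB	CR,LF	; was a fourth STR1
	FCB	"Are we there yet?"
	FCB	CR,LF,EOT
*
START	STS	SAVES	; Save what the monitor gives us.
	LDS	#STKBAS	; Move to our own stack
	LDX	#STR1	; point to the first string
	JSR	XPDAT1	; output it
	LDX	#STR2	; each call now gets its own string
	JSR	XPDAT1
	LDX	#STR3
	JSR	XPDAT1
	LDX	#STR4
	JSR	XPDAT1
DONE	LDS	SAVES	; restore the stack pointer
	NOP
	NOP		; landing pad
```

Since nothing changes size, the addresses should come out the same as in the session above, so the breakpoint at $2097 should still land on the first NOP. Check the addresses the assembler prints anyway.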

You need to go ahead and play with EXORsim and Hatari like this a bit, to get used to the noise they make, and to figure out what it means and when it's useful.

I'm going to skip the 6801. Trying to run 6801 code on the 6800 would be more interesting, but let's not do that, either. Not yet.

Let's try a slightly more subtle puzzle on the 6809. 

Remember to start the session with 

$ ./exor09 --mon

And (a)ssemble it at $2000. 

Here's some code:

* 6809 very special version
XPDAT1	EQU	$F026	; string output, terminated by EOT
EOT	EQU	$00	; $04 is decimal 4
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
ENTRY	BRA	START	; Close enough for the short branch.
	RMB	32	; 16 levels of call minus any saved registers, max
SAVES	RMB	2	; a place to keep S so we can return cleanly
STKBAS	EQU	SAVES 	; 6809 is pre-dec (pre-store-decrement) push

STR1	FCB	CR,LF	; Put message at beginning of line
	FCB	"To the beach!"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
STR2	FCB	CR,LF	; Put message at beginning of line
	FCB	"Hello Bugs Bunny!"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
STR3	FCB	CR,LF	; Put message at beginning of line
	FCB	"I am a muggle."	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
STR4	FCB	CR,LF	; Put message at beginning of line
	FCB	"Are we there yet?"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
*
START	STS	SAVES,PCR	; Save what the monitor gives us.
	LEAS	STKBAS,PCR	; Move to our own stack
	LEAX	HELLO,PCR	; point to the string
	BSR	XPDAT1		; output it
	LEAS	STKBAS,PCR	; Move to our own stack
	LEAX	HELLO,PCR	; point to the string
	BSR	XPDAT2		; output it
	LEAS	STKBAS,PCR	; Move to our own stack
	LEAX	HELLO,PCR	; point to the string
	BSR	XPDAT3		; output it
	LEAS	STKBAS,PCR	; Move to our own stack
	LEAX	HELLO,PCR	; point to the string
	BSR	XPDAT4		; output it
DONE	LDS	SAVES,PCR	; restore the stack pointer
	NOP
	NOP		; landing pad

Again, you might see the bugs already. Let's see what they do.

$ ./exor09 --mon
Load facts file 'facts09'
'exbug09.bin' loaded.
  EXBUG09-2.1 detected
'mdos09.dsk' opened for drive 0 (double sided)

OSLOAD...

Hit Ctrl-C for simulator command line.  Starting simulation...

>         0 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            0020: 86 10        LDA #$10                   

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% a 2000
2000: * 6809 very special version
2000: XPDAT1	EQU	$F026	; string output, terminated by EOT
2000: EOT	EQU	$00	; $04 is decimal 4
2000: LF	EQU	$0A	; line feed
2000: CR	EQU	$0D	; carriage return
2000: *
2000: NATWID	EQU	2	; 2 bytes in the CPU's natural integer
2000: *
2000: ENTRY	BRA	START	; Close enough for the short branch.
later = 2001
2002: 	RMB	32	; 16 levels of call minus any saved registers, max
2022: SAVES	RMB	2	; a place to keep S so we can return cleanly
2024: STKBAS	EQU	SAVES 	; 6809 is pre-dec (pre-store-decrement) push
2024: 
% STR1	FCB	CR,LF	; Put message at beginning of line
Huh?
% 	FCB	"To the beach!"	; Whatever the user wants here.
Huh?
% 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
Huh?
...

What is that 

Huh?

thing? It starts after STKBAS is declared. If we look back up there, all we see is a blank line.

Ah. Oh, yeah. The interactive assembler thinks a blank line is meant to terminate the (a)ssemble command, and then it tries to interpret STR1 as a command and tells us it doesn't understand. 

(Cheeky assembler. ;-)

Okay, so let's put an asterisk in on that line so it's a comment and doesn't terminate the assembly job. And paste it in, (u)nassemble it from $2000 and then $2077 to see how it looks, set a (b)reakpoint at $20A1 where the NOP landing pad is, and (c)ontinue at $2000:

$ ./exor09 --mon
Load facts file 'facts09'
'exbug09.bin' loaded.
  EXBUG09-2.1 detected
'mdos09.dsk' opened for drive 0 (double sided)

OSLOAD...

Hit Ctrl-C for simulator command line.  Starting simulation...

>         0 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            0020: 86 10        LDA #$10                   

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% a 2000
2000: * 6809 very special version
2000: XPDAT1	EQU	$F026	; string output, terminated by EOT
2000: EOT	EQU	$00	; $04 is decimal 4
2000: LF	EQU	$0A	; line feed
2000: CR	EQU	$0D	; carriage return
2000: *
2000: NATWID	EQU	2	; 2 bytes in the CPU's natural integer
2000: *
2000: ENTRY	BRA	START	; Close enough for the short branch.
later = 2001
2002: 	RMB	32	; 16 levels of call minus any saved registers, max
2022: SAVES	RMB	2	; a place to keep S so we can return cleanly
2024: STKBAS	EQU	SAVES 	; 6809 is pre-dec (pre-store-decrement) push
2024: *
2024: STR1	FCB	CR,LF	; Put message at beginning of line
2026: 	FCB	"To the beach!"	; Whatever the user wants here.
2033: 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
2036: STR2	FCB	CR,LF	; Put message at beginning of line
2038: 	FCB	"Hello Bugs Bunny!"	; Whatever the user wants here.
2049: 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
204c: STR3	FCB	CR,LF	; Put message at beginning of line
204e: 	FCB	"I am a muggle."	; Whatever the user wants here.
205c: 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
205f: STR4	FCB	CR,LF	; Put message at beginning of line
2061: 	FCB	"Are we there yet?"	; Whatever the user wants here.
2072: 	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
2075: *
2075: 	NEG	$00
2077: START	STS	SAVES,PCR	; Save what the monitor gives us.
Offset at 2001 set to 75
207b: 	LEAS	STKBAS,PCR	; Move to our own stack
207e: 	LEAX	HELLO,PCR	; point to the string
later = 2080
2082: 	BSR	XPDAT1		; output it
Error: Offset for 2083 out of range.  It was 53154 but must be -128 to 127
2082: 	LEAS	STKBAS,PCR	; Move to our own stack
2085: 	LEAX	HELLO,PCR	; point to the string
later = 2087
2089: 	BSR	XPDAT2		; output it
later = 208a
208b: 	LEAS	STKBAS,PCR	; Move to our own stack
208e: 	LEAX	HELLO,PCR	; point to the string
later = 2090
2092: 	BSR	XPDAT3		; output it
later = 2093
2094: 	LEAS	STKBAS,PCR	; Move to our own stack
2097: 	LEAX	HELLO,PCR	; point to the string
later = 2099
209b: 	BSR	XPDAT4		; output it
later = 209c
209d: DONE	LDS	SAVES,PCR	; restore the stack pointer
20a1: 	NOP
20a2: 	NOP		; landing pad
20a3: 
% u 2000
2000: 20 75               BRA $2077
2002: 00 00               NEG $00
...
% u 2077
2077: 10EF 8C A7           STS $2022,PCR
207B: 32 8C A4            LEAS $2022,PCR
207E: 30 8D 0000           LEAX $2082,PCR
2082: 32 8C 9D            LEAS $2022,PCR
2085: 30 8D 0000           LEAX $2089,PCR
2089: 8D 00               BSR $208b
208B: 32 8C 94            LEAS $2022,PCR
208E: 30 8D 0000           LEAX $2092,PCR
2092: 8D 00               BSR $2094
2094: 32 8C 8B            LEAS $2022,PCR
2097: 30 8D 0000           LEAX $209b,PCR
209B: 8D 00               BSR $209d
209D: 10EE 8C 81           LDS $2022,PCR
20A1: 12                  NOP 
20A2: 12                  NOP 
20A3: 00 00               NEG $00
...
% b 20a1
Breakpoint set at 20A1
% c 2000


Breakpoint!
         13 A=00 B=00 X=209B Y=0000 U=0000 S=2020 P=00 -------- DONE       209D: 10EE 8C 81   LDS $2022,PCR EA=2022(STKBAS) D=00FF 
>        14 A=00 B=00 X=209B Y=0000 U=0000 S=00FF P=00 --------            20A1: 12           NOP                        

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% 

Erk. 

Nothing.

Why?

Look back at the output as it assembles, and you should see an error message. Offset out of range for branching to subroutine to XPDAT1.

Well, yeah. XPDAT1. It's way up there at $F026, which is 53154 bytes away from here, and that needs a long branch. 
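
For the record, that 53154 is the branch offset the assembler calculated, not an address. Here's a sketch of the arithmetic, written as comments, using the addresses from the assembly output above. The 6809 measures a relative branch from the address of the instruction that follows the branch.

```asm
* BSR XPDAT1 was assembled at $2082: opcode at $2082, offset byte at $2083.
*   offset = target - address of the next instruction
*          = $F026 - $2084
*          = $CFA2 = 53154 decimal
* BSR only has one offset byte, good for -128 to 127.  Not even close.
* LBSR has two offset bytes, which is enough to reach anywhere in 64K:
	LBSR	XPDAT1	; output it
```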

But XPDAT2 and XPDAT3 don't seem to cause problems?

(u)nassemble again from $2077.

% u 2077
2077: 10EF 8C A7           STS $2022,PCR
207B: 32 8C A4            LEAS $2022,PCR
207E: 30 8D 0000           LEAX $2082,PCR
2082: 32 8C 9D            LEAS $2022,PCR
2085: 30 8D 0000           LEAX $2089,PCR
2089: 8D 00               BSR $208b
208B: 32 8C 94            LEAS $2022,PCR
208E: 30 8D 0000           LEAX $2092,PCR
2092: 8D 00               BSR $2094
2094: 32 8C 8B            LEAS $2022,PCR
2097: 30 8D 0000           LEAX $209b,PCR
209B: 8D 00               BSR $209d
209D: 10EE 8C 81           LDS $2022,PCR
20A1: 12                  NOP 
20A2: 12                  NOP 
20A3: 00 00               NEG $00

Well, the branch offsets seem to be set to 00. And HELLO? The offsets to HELLO are ...

HELLO?

There's no HELLO in there any more. Not supposed to be.

Ah! Not defined. What did I do?

Gack. Somehow I seem to have thought I was pasting in STR1, STR2, etc., but what I actually renumbered was the output routine, changing XPDAT1 to XPDAT2 and so on, and I left HELLO where it was. That's why those are all undefined. There is no HELLO, and there is no XPDAT2, etc. Okay, so get those labels fixed, and don't forget the long branches:

START	STS	SAVES,PCR	; Save what the monitor gives us.
	LEAS	STKBAS,PCR	; Move to our own stack
	LEAX	STR1,PCR	; point to the string
	LBSR	XPDAT1		; output it
	LEAS	STKBAS,PCR	; Move to our own stack
	LEAX	STR2,PCR	; point to the string
	LBSR	XPDAT1		; output it
	LEAS	STKBAS,PCR	; Move to our own stack
	LEAX	STR3,PCR	; point to the string
	LBSR	XPDAT1		; output it
	LEAS	STKBAS,PCR	; Move to our own stack
	LEAX	STR4,PCR	; point to the string
	LBSR	XPDAT1		; output it

Okay, paste the whole thing (not just the fixed part) into a fresh EXORsim 6809 session and (u)nassemble from $2077 again, and the offsets look reasonable now. And no error messages. (c)ontinue it from $2000.

Well, that's interesting. Looks like it's looping or something. I'm going to have to break out with a Ctrl-C.

% c 2000


To the beach!

Hello Bugs Bunny!

I am a muggle.

Are we there yet?
2020202~ODjF K3 4
Hello Bugs Bunny!

I am a muggle.

Are we there yet?
2020202~ODjF K3 4
I am a muggle.

Are we there yet?
2020202~ODjF K3 4
Are we there yet?
2020202~Interrupt!
    5152051 A=00 B=00 X=3D53 Y=0000 U=0000 S=2019 P=00 -----Z-C            F1D9: F6 FCF4      LDB $fcf4   EA=FCF4(ACIA0) D=02 
>   5152052 A=00 B=02 X=3D53 Y=0000 U=0000 S=2019 P=00 -------C            F1DC: C5 02        BITB #$02                  

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% 

Let's try single stepping. Fresh session, freshly assembled, (s)tep from $2000.

% s 2000

          0 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -------- ENTRY      2000: 20 75        BRA $2077                  
>         1 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -------- START      2077: 10EF 8C A7   STS $2022,PCR                

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          1 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 -------- START      2077: 10EF 8C A7   STS $2022,PCR EA=2022(STKBAS) D=00FF 
>         2 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            207B: 32 8C A4     LEAS $2022,PCR                

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          2 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            207B: 32 8C A4     LEAS $2022,PCR                
>         3 A=00 B=00 X=0000 Y=0000 U=0000 S=2022 P=00 --------            207E: 30 8C A3     LEAX $2024,PCR                

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          3 A=00 B=00 X=0000 Y=0000 U=0000 S=2022 P=00 --------            207E: 30 8C A3     LEAX $2024,PCR                
>         4 A=00 B=00 X=2024 Y=0000 U=0000 S=2022 P=00 --------            2081: 17 CFA2      LBSR $f026                 

6809 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
%     

Long branch to subroutine to $F026. Hmm. Just in case, let's go look at the 6809 code we've used before.

$F027!!!

How'd I do that?

(Really, these kinds of things do happen. I promise.)

Okay, fix that line. Very first line. Well, first line of non-comment code:

XPDAT1	EQU	$F027	; string output, terminated by EOT

Fresh session, freshly assembled, run it and, ... it's doing the same thing? Leave it. Maybe it will stop by itself.

Output four times, with a long pause between each. And trash after. Hmm.

Look back up at that first line of code. Idly look at the second line of code.

* 6809 very special version
XPDAT1	EQU	$F027	; string output, terminated by EOT
EOT	EQU	$00	; $04 is decimal 4
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return

EOT EQUated to $00?

O O
<
()

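The comment is right, and the code is wrong. With EOT defined as $00, the strings end in a zero byte instead of the $04 the output routine is apparently looking for, so it runs on past the end of each string into the next one, and then on into whatever is lying around in memory. How'd that happen?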

:-/

Well, yeah, this kind of thing happens, too. 

Go ahead and fix it and assemble and run it. This time it should behave itself.
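
If you've lost track of what got fixed along the way, here's a sketch of the 6809 source with all of this session's fixes in one place: a comment line instead of the blank line, STR1 through STR4 instead of HELLO, LBSR to XPDAT1 instead of BSR to XPDAT2 and friends, XPDAT1 at $F027, and EOT as $04. I'm assuming nothing else needs to change.

```asm
* 6809 version, with the fixes from this session
XPDAT1	EQU	$F027	; string output, terminated by EOT (was $F026)
EOT	EQU	$04	; $04 is decimal 4 (was $00)
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
ENTRY	BRA	START	; Close enough for the short branch.
	RMB	32	; 16 levels of call minus any saved registers, max
SAVES	RMB	2	; a place to keep S so we can return cleanly
STKBAS	EQU	SAVES	; 6809 is pre-dec (pre-store-decrement) push
* (comment line here, not a blank line, so (a)ssemble keeps going)
STR1	FCB	CR,LF	; Put message at beginning of line
	FCB	"To the beach!"	; Whatever the user wants here.
	FCB	CR,LF,EOT	; Put the debugger's output on a new line.
STR2	FCB	CR,LF
	FCB	"Hello Bugs Bunny!"
	FCB	CR,LF,EOT
STR3	FCB	CR,LF
	FCB	"I am a muggle."
	FCB	CR,LF,EOT
STR4	FCB	CR,LF
	FCB	"Are we there yet?"
	FCB	CR,LF,EOT
*
START	STS	SAVES,PCR	; Save what the monitor gives us.
	LEAS	STKBAS,PCR	; Move to our own stack
	LEAX	STR1,PCR	; point to the string
	LBSR	XPDAT1		; output it (long branch, ROM is far away)
	LEAS	STKBAS,PCR	; (kept from the listing above)
	LEAX	STR2,PCR
	LBSR	XPDAT1
	LEAS	STKBAS,PCR
	LEAX	STR3,PCR
	LBSR	XPDAT1
	LEAS	STKBAS,PCR
	LEAX	STR4,PCR
	LBSR	XPDAT1
DONE	LDS	SAVES,PCR	; restore the stack pointer
	NOP
	NOP		; landing pad
```

The addresses may not match the earlier transcripts exactly, so read the address of the first NOP off the assembler's output before you set your breakpoint.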

Do you want to do a debug session like this with Hatari? Hmm. Gotta think about that.

Until I figure that out, you can go ahead and walk through the examples using the single character output functions to write your own string output functions.

