Tuesday, October 15, 2024

ALPP 02-13 -- Balancing on the Beach -- Comparing Pointers, Checking the Stack

Balancing on the Beach --
Comparing Pointers,
Checking the Stack

(Title Page/Index)

Four different processors, three different parameter passing modes. And natural width (address width) arithmetic -- 16-bit on the 8-bit processors, but 32-bit on the "16-bit" 68000.

And I really don't want to leave a certain set of questions begging.

* How do you make sure that your use of statically allocated parameters (and variables) has no internal conflicts? How do you make sure that you aren't using, say, a parameter/variable to allocate an I/O buffer, then, before you finish the allocation maintenance code, you go off and start a routine to collect de-allocated buffers and use the same variable for a conflicting purpose?

Discipline! 

There are several disciplines for static parameters and variables, and the disciplines allow the use of automated tools that can check for conflicting uses. 

If you don't have such tools, you have to analyze their use yourself. Discipline helps, but, ultimately, you want to avoid using more statically allocated parameters and variables than you can track and analyze.

I'll talk about various approaches to discipline as we go, variable naming and use, file and directory strategies and such -- because we really can't avoid some degree of static allocation.

* How do you make sure your return address stack is balanced?

Well, if you don't put temporaries and parameters and such on the return address stack, that's a tautology. If you come back from every subroutine you call, your return address stack is balanced. If you don't, it isn't.

Well, there are other reasons for not coming back, such as crashing the stack and infinite loops, topics which we will brush on here and there.

* What if you do use a combined stack? 

Actually, if you use a combined stack and you manage to return from every call, you can be pretty sure it's balanced -- given the exceptions mentioned above. But when you have a lot of parameters and variables on that stack, the probability of not coming back increases.

So you use a stack frame, which I have been waiting to explain until we come to some functions that are complicated enough to need a stack frame

Otherwise, just make sure you have a pop for every push and never have too many pops, and watch for your code to fail to come back from a subroutine call. 

And if it doesn't return?

8-0

Yeah, seeing things blow up and figuring out which call it didn't return from can become something of a black art. So you tend to overuse the stack frame.

* If you use a separate parameter stack, how do you make sure you've kept it balanced? The return stack and the parameter stack might get out of sync, might they not? And you could return from every call, but still leave parameters out there to be processed, or end up having tried to process parameters that were never pushed out there.

Just make sure you have a pop for every push and never have too many pops. Simple. ...

And how do you do that?

AAAARRRRRGGGGGHHHHHH!!!!!

No magic. 

But we do have strategies, some of which we use and some of which we don't, depending on various things. A few of them are

  • Testing pushes and pops before you do them.
  • Periodically checking that the parameter stack pointer is pointing at valid parameter stack space.
  • Providing bumpers, or buffer zones between the stack space and other spaces, and periodically checking that nothing has been written into those zones.
  • Saving the stack pointer when you enter a function and checking that it matches before you leave.
  • Using a stack frame on the parameter stack, in effect pushing everything at once, and popping it all at once.
  • Letting the hardware allocate the parameter stack in an area of memory isolated from the rest, so that overflows and underflows cause memory errors.

These strategies all work on the single combined stack, as well, by the way.

We really aren't prepared to talk about all of those, but we can talk about some of them.

For instance, here's code to check the stack pointer on exit and periodically check it mid-flight, on the 68000:

* This is not the best way for every use.
...
PSTKLIM	DS.L	32	; roughly 16 levels of call at two parameters per call
PSTKBAS	DS.L	1	; bumper space -- parameter stack is pre-dec
* ...
START	BSR.W	INISTKS
*	...
	CMP.L	#PSTKBAS,A6
	BHI.W	STKERR
	CMP.L	#PSTKLIM,A6
	BLO.W	STKERR
*	...
DONE	CMP.L	#PSTKBAS,A6
	BEQ.S	EXIT
STKERR *	...
*	...
EXIT	MOVEM.L	A4SAVE-LB_ADDR(A5),A4-A7	; restore the monitor's A4-A7
	NOP
	NOP		; landing pad

What is this CoMPare instruction?

It effectively subtracts the operands, but doesn't store the result. It only effects the flags, so you can keep your value but branch on the basis of the comparison. 

Many of the branch instructions Motorola implements on their CPUs test multiple flags, and the mnemonics they specify are supposed to help you remember what the combinations mean.

  • HI means that the target of the CoMPare is HIgher (unsigned compare) than the source. (Carry is set.) So if A6 is higher than PSTKBAS, it branches to STKERR. Remember, stack pushed to low memory, so it's "upside-down", so to speak.
  • LO means the target is LOwer (unsigned) than the source. (Carry is set and Zero is clear.) So if A6 is lower than PSTKLIM, it branches to STKERR.
  • EQ means the source is EQual to the target. (Zero flag is set.) So if, after the body of the main function is done, A6 is equal to PSTKBAS -- nothing left on stack, and not pointing beyond the bottom of stack -- it branches around the STKERR function to EXIT.
  • NE, by the way, means Not Equal. (Zero flag is clear.)

This is essentially something we could do for all four processors -- For the 6800:

* This is not the best way for every use.
* 6800 CPX flags are incomplete, 
* but Z is valid.
* And N should be valid all by itself, 
* even though the manual says
* "not intended for conditional branching".
*	...
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*	...
START	JSR	INISTKS
*	...
	LDX	PSP
	CPX	#PSTKLIM	; Z is valid, N should be valid
	BEQ	TSTOVR		; have to test separately
	BPL	TSTOVR
	JMP	STKERR
TSTOVR	CPX	#PSTKBAS
	BEQ	CONT
	BPL	STKERR
CONT *	...
*	...
DONE	LDX	PSP
	CPX	#PSTKBAS
	BEQ	EXIT
STKERR *	... 
*	...
EXIT	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

For the 6801:

* This is not the best way for every use.
* 6801 CPX flags are all valid 
*	...
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*	...
START	JSR	INISTKS
*	...
	LDX	PSP
	CPX	#PSTKLIM	; All flags valid
	BLO	STKERR
	CPX	#PSTKBAS
	BHI	STKERR
CONT *	...
*	...
DONE	LDX	PSP
	CPX	#PSTKBAS
	BEQ	EXIT
STKERR *	... 
*	...
EXIT	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

For the 6809 using absolute addressing:

* This is not the best way for every use.
* 6809 CMPU flags are all valid 
*	...
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*	...
START	LBSR	INISTKS
*	...
	CMPU	#PSTKLIM
	BLO	STKERR
	CMPU	#PSTKBAS
	BHI	STKERR
CONT *	...
*	...
DONE	CMPU	#PSTKBAS
	BEQ	EXIT
STKERR *	... 
*	...
EXIT	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

But we had initialized U via PC relative addressing. so we should use LEA to calculate the addresses and do it this way:

* This is not the best way for every use.
* 6809 CMPU flags are all valid 
*	...
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*	...
START	LBSR	INISTKS
*	...
	LEAX	PSTKLIM,PCR
	PSHS	X
	CMPU	,S++
	BLO	STKERR
	LEAX	PSTKBAS,PCR
	PSHS	X
	CMPU	,S++
	BHI	STKERR
CONT *	...
*	...
DONE	CMPU	#PSTKBAS
	BEQ	EXIT
STKERR *	... 
*	...
EXIT	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

And, come to think of it, our initialization code for the 68000 was also PC relative, so here's how to do it PC relative on the 68000:

* This is not the best way for every use.
*...
PSTKLIM	DS.L	32	; roughly 16 levels of call at two parameters per call
PSTKBAS	DS.L	1	; bumper space -- parameter stack is pre-dec
*...
START	BSR.W	INISTKS
*	...
	LEA	PSTKBAS(PC),A0
	CMP.L	A0,A6
	BHI.W	STKERR
	LEA	PSTKLIM(PC),A0
	CMP.L	A0,A6
	BLO.W	STKERR
*	...
DONE	LEA	PSTKBAS(PC),A0
	CMP.L	A0,A6
	BEQ.S	EXIT
STKERR *	...
*	...
EXIT	MOVEM.L	A4SAVE-LB_ADDR(A5),A4-A7	; restore the monitor's A4-A7
	NOP
	NOP		; landing pad

That's pretty heavy duty. If it's just on the main function, I suppose it isn't so bad. Particularly, it's something we might insert in the main function when we are debugging and need to watch what's happening to the stack.

But if  checking the stack on function entry and exit is going to be that hard, do we really want to work the processor that hard?

(This is one of the places where current CPUs are all lacking. The CPU should define limit registers for the stacks and handle these checks at run-time transparently. Current practice is to isolate the single stack in its own MMU segment, frame function calls in rigid stack frames, and let the MMU hardware trap stack-out-of-bounds. But there are ways hostile code, or code running wild, can get around that in most modern MMU implementations. We are too focused on speed.)

Simply checking that the stack is balanced between entry and exit isn't quite as good as full bounds checking, but it does help. If all functions remain balanced, the only problem we might have is stack overflow.

(cough mumble functions that allocate or deallocate stack mutter mumble)

Let's look at balance checks on the 68000:
* This is not the best way for every use.
FENTRY	MOVE.L	A4,-(A7)	; Save A4 on entry to function
	MOVE.L	A6,A4	; Mark the current allocation depth.
	...
	CMP.L	A4,A6	; subtract A4 from A6, don't store result
	BNE.W	STKERR	; STKERR will have to deal with the saved A4
	MOVE.L	(A7)+,A4
	RTS

And you insert the meat of the function in place of the ellipsis.

How would it look on the 6809? We don't have enough registers to just use one, but we could push the marker to the return stack. Of course, then we have to be careful to keep that in balance, but if we let the return stack go out of balance we've got problems anyway.
* This is not the best way for every use.
FENTRY	PSHS	U
	...
	CMPU	,S++	; compare and pop
	BNE	STKERR
	RTS

And, if you're thinking that we could have done that on the 68000, yeah, we could have, but we don't seem to be using A4 for anything, and it was convenient.

How would it look on the 6801?

* This is not the best way for every use.
FENTRY	LDX	PSP
	PSHX
	...
	PULX
	CPX	PSP	; inverted compare, 
	BEQ	FEXIT	; but we're only looking at equal
	JMP	STKERR
FEXIT	RTS

And how about the 6800? Can it be done reasonably effeciently?

We're gonna need a subroutine, but that means we're going to be fighting with the return address. But, yeah it can be done. It's a lotta code.

PMARK	DES		; Need space.
	DES
	TSX
	LDAA	2,X	; return address
	LDAB	3,X
	STAA	0,X	; move it down
	STAB	1,X
	LDAA	PSP	; get the mark
	LDAB	PSP+1
	STAA	2,X	; move it in
	STAB	3,X
	RTS
* ...
FENTRY	JSR	PMARK
*	...    ; meat of function here.
	TSX
	LDX	0,X
	CPX	PSP
	BEQ	FEXIT
	JMP	STKERR
FEXIT	RTS

I guess that's not so bad. Maybe.

But, wait!

 (Told you so.)

So far, our functions have cleaned up their own stacks, and the parameter stack pointer on entrance is not same as the stack pointer on exit. It changes. We can't use these.

Or can we, with some modifications?

Usually, we know how much the function is going adjust the stack by on return. Could we save the mark pre-adjusted?

Hmm.

Let's look at pre-adjusted balance checks on the 68000.

68000, with adjusted A4:

FENTRY	MOVE.L	A4,-(A7)	; Save A4 on entry to function
	LEA	ADJVAL(A6),A4	; Mark the target allocation depth.
	...
	CMP.L	A4,A6
	BNE.W	STKERR	; STKERR will have to deal with the saved A4
	MOVE.L	(A7)+,A4
	RTS

68000, adjusted and pushed on return stack:

FENTRY	LEA	ADJVAL(A6),A0	; Use a volatile register
	MOVE.L	A0,-(A7)	; Save pre-adjusted on entry to function
*	...
	CMP.L	(A7),A6		; deliberately leave it on the stack for STKERR
	BNE.W	STKERR
	LEA	NATWID(A7),A7
	RTS 

And on the 6809, adjusted and pushed on the return stack:

FENTRY	LEAX	ADJVAL,U
	PSHS	X
*	...
	CMPU	,S	; leave it on stack for STKERR
	BNE	STKERR
	LEAS	NATWID,S
	RTS

On the 6801, again, adjusted and pushed on the return stack:

FENTRY	LDD	#ADJVAL    ; Unsigned ABX doesn't help.
	ADDD	PSP
	PSHB
	PSHA
*	...
	PULX
	CPX	PSP	; inverted compare, 
	BEQ	FEXIT	; but we're only looking at equal
	JMP	STKERR	; STKERR can see both mark in X and PSP
FEXIT	RTS

And, finally, on the 6800 (wow!):

* Signed offset in B, -128 to +127
PMARK	DES		; Need space.
	DES
	TSX
	LDAA	2,X	; return address
	STAA	0,X	; move it down
	LDAA	3,X	; leave B alone
	STAA	1,X
	CLRA
	TSTB
	BPL	PMARKA
	COMA		; sign extend
PMARKA	ADDB	PSP+1
	ADCA	PSP
	STAA	2,X	; move it in
	STAB	3,X
	RTS
* ...
FENTRY	LDAB	#ADJVAL
	JSR	PMARK
*	...
	TSX
	LDX	0,X
	CPX	PSP
	BEQ	FEXIT
	JMP	STKERR
FEXIT	RTS

Okay, so it's possible to mark the stack. But maybe we only want to do it for those functions that we aren't confident are going to stay balanced because different branches of the code do different things. 

Or when we are debugging.

Am I sure that using a combined stack, with stack frames, is not going to be easier, more efficient, and safer?

Is it time to look at stack frames? 

No! NO! Noooooooooooo!

I DON'T want to do that!

First I want to show you how to get numeric output, so that we don't have to depend on the debugger to see the effects of our code! We know enough to get binary output.

Sigh. 

Besides, we really need something more complex than what we have at this point to make it make sense. Don't we? 

Maybe not, especially if we start with frame pointers on a split stack.


(Title Page/Index)

 

No comments:

Post a Comment