Wednesday, October 30, 2024

ALPP 02-21 -- Some Address Math for the 6801

Some Address Math for the 6801

(Title Page/Index)

I had thought I would not need to show this for the 6801, but the difference between addressing math on the 6800 and on the 6801, due to being able to add and subtract the double accumulator and being able to push and pop X, is dramatic enough that I guess I should.

This chapter, then, will be an extension of the handwaving and conceptualizing in the unsteady footing chapter.

Even if you aren't interested in stack frames, this discussion of addressing math should be useful, although I'm adding it a bit earlier than I had planned.

In the 6801, as I keep noting, we have ABX to help us with address math, but no corresponding SBX. 

But the D register math is wide enough to do addresses, the big problem being in moving addresses between D and X. Two pushes and a pop, or a push and two pops, is not bad, but going through a pseudo-register in the direct page is quicker, though it takes more bytes of object code. And sometimes you don't want to use the whole D accumulator.
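Why don't the pushes and pops scramble the bytes? Here is a rough Python model of the 6800/6801 hardware stack. It assumes, as on the real parts, that S points at the next free byte, that a push stores and then decrements S, that a pull increments S and then loads, and that PSHX and PULX keep the high byte at the lower address. The class and function names are made up for illustration. This is a sketch, not an emulator.

```python
class Stack6801:
    """Tiny model of the 6800/6801 hardware stack: S points at the next
    free byte; a push stores then decrements, a pull increments then loads."""

    def __init__(self, s=0x01FF):
        self.mem = bytearray(0x10000)
        self.s = s

    def push8(self, value):
        self.mem[self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def pull8(self):
        self.s = (self.s + 1) & 0xFFFF
        return self.mem[self.s]

    def pshx(self, x):
        # PSHX stores the low byte first, so the high byte ends up lower in memory
        self.push8(x & 0xFF)
        self.push8(x >> 8)

    def pulx(self):
        # PULX reads the high byte first, then the low byte
        high = self.pull8()
        low = self.pull8()
        return (high << 8) | low


def d_to_x(stack, a, b):
    """PSHB / PSHA / PULX: two pushes and a pop move A:B into X."""
    stack.push8(b)
    stack.push8(a)
    return stack.pulx()


def x_to_d(stack, x):
    """PSHX / PULA / PULB: a push and two pops move X into A:B."""
    stack.pshx(x)
    a = stack.pull8()
    b = stack.pull8()
    return a, b
```

Either way, S ends up where it started, and the bytes come out in the order the registers expect. For example, `d_to_x(Stack6801(), 0x12, 0x34)` gives `0x1234`.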

Now that I think of it, a sign-extend B into A instruction like the 6809's sign-extend instruction, SEX, might have been helpful in a few places. (cough.) Still, just using D is not an onerous burden.

We still have to use a pseudo-register for many/most of the calculations.

-- Which causes issues at interrupt time, unless we have separate pseudo-registers used by separate routines for interrupt-time, and copy the user tasks' pseudo-registers out and in on context switch.

Here are those NEGate D snippets, modified for 6801:

* For reference -- NEGate a 16-bit value in D (same as 6800) --
NEGD	COMA	; 2's complement NEGate is bit COMplement + 1
	NEGB
	BNE	NEGDX	; or BCS. but BNE works -- extends 0
	INCA
NEGDX	RTS
*
* Another way (use Double accumulator subtract):
NEGDS	PSHB
	PSHA
	CLRB	; 0 - D
	CLRA
	TSX
	SUBD	0,X
	INS
	INS
	RTS	
*
* Same thing using Double accumulator and a temporary
* somewhere in DP:
	...
SCRCHD	RMB	2
	...
* somewhere else
NEGDV	STD	SCRCHD
	LDD	#0	; 0 - D
	SUBD	SCRCHD
	RTS
	...

Remember to read the code and the comments in the code, and open up a separate browser window to compare side-by-side with the 6800. Read through my transliterations from the 6800, but don't jump to conclusions before you get to the very end.

Again, assume you have these declarations for the pseudo-registers:

	ORG	$80
	...
XOFFA	RMB	1
XOFFB	RMB	1
XOFFSV	RMB	2
	...

Using D is so much faster than either 8-bit accumulator that it really doesn't make much sense to provide anything but D offsets, but I've kept the 8-bit and subtract-by-negating entry points for reference. With no negate-D instruction, subtracting by negating de-optimizes subtraction. And since the D offset is 16-bit, it's quicker to just load a negative offset in D and call ADDDX than to bother with the SUBDX entry point.

ADDBX	CLRA
ADDDX	STX	XOFFSV
	ADDD	XOFFSV
	STD	XOFFSV
	LDX	XOFFSV
	RTS
SUBBX	CLRA	; B is unsigned
SUBDX	COMA
	NEGB
	BNE	ADDDX	; or BCS. but BNE works -- extends
	INCA
	BRA	ADDDX

If you want a SUBDX entry point for some reason, it may be worth keeping the logic separate and moving the operands. The Double accumulator math speeds this up significantly.

* Alternative, don't use ADDDX, use XOFFA and XOFFB instead 
SUBBX	CLRA	; B is unsigned
SUBDX	STD	XOFFA	; subtraction does not commute.
	STX	XOFFSV	; Handle operand order.
	LDD	XOFFSV
	SUBD	XOFFA
	STD	XOFFSV
	LDX	XOFFSV
	RTS

Just so I don't gloss over ABX, here's ADDBX as a subroutine. The 8-bit offset SUBBX remains much as it was for the 6800, except that using ABX for the add means there's no code sharing:

* Working in byte offsets just takes that much more code than D,
* these are all superfluous.
* Well, the ABX instruction can be useful in-line.
* Alternative unsigned byte only
* subtract needs to be checked again
* range 0 to 255
ADDBX	ABX
	STX	XOFFSV
	RTS
* No improvements here without just using D.
SUBBX	STX	XOFFSV
	NEGB		; low byte of -B
	BEQ	SUBBXZ	; B was 0: X is unchanged
	DEC	XOFFSV	; high byte of -B is $FF: add -1
	ADDB	XOFFSV+1
	BCC	SUBBXL	; still need to bring the carry in
	INC	XOFFSV
SUBBXL	STAB	XOFFSV+1
	LDX	XOFFSV
SUBBXZ	RTS
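The byte-level arithmetic of an unsigned byte subtract is easy to get wrong, so here is a small Python model of it. This is a sketch, and the function name is hypothetical. It assumes B is unsigned, 0 to 255. NEGB produces the low byte of -B, and unless B was zero, the high byte of -B is $FF. So the high byte of X takes a -1 before any carry out of the low-byte add is brought back in.

```python
def sub_unsigned_byte(x, b):
    """Model X - B for an unsigned byte B by adding the 16-bit value -B."""
    high, low = x >> 8, x & 0xFF
    neg_b = (-b) & 0xFF            # NEGB: low byte of -B
    if neg_b == 0:                 # B was 0: X is unchanged
        return x
    high = (high - 1) & 0xFF       # high byte of -B is $FF, i.e. add -1
    total = low + neg_b            # ADDB with the low byte of X
    if total > 0xFF:               # carry out of the low byte
        high = (high + 1) & 0xFF   # INC the high byte
    return (high << 8) | (total & 0xFF)
```

For example, `sub_unsigned_byte(0x1204, 0x10)` gives `0x11F4`. The low-byte add doesn't carry, so the -1 on the high byte stands.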

Using ABX for the positive half of the signed 8-bit routines also emphasizes the lack of SBX in the 6801:

* ABX partially improves the positive half of things here,
* but you really don't want to do this.
* Needs to be checked again.
ADDSBX	STX	XOFFSV
	TSTB	; sign extend B
*	BEQ	ADSBXD	; use only if we really want to optimize 0
	BPL	ADSBXU
	ADDB	XOFFSV+1	; negative: high byte of the offset is $FF
	STAB	XOFFSV+1	; the store leaves the carry alone
	BCS	ADSBXL	; carry + $FF = 0, high byte unchanged
	DEC	XOFFSV	; no carry: add $FF (-1) to the high byte
ADSBXL	LDX	XOFFSV
ADSBXD	RTS
ADSBXU	ABX
	STX	XOFFSV
	RTS

Return stack pointer math with byte offsets loses its meaning on the 6801. You really want the speed when doing math on S, so you're just going to use D.

PSHX and PULX help with handling the return address.

Again, you should recognize that the call writes the return address into the allocated space on allocation, so if you've stored before allocation, you'll be walking on what you stored.

The declarations (note that we are adding SOFFA for the double accumulator):

* For S stack
* Even though we really don't want to be bumping the return stack that far,
* Using D is just faster on the 6801
	ORG	$90
	...
SOFFA	RMB	1
SOFFB	RMB	1
SOFFSV	RMB	2

And the code (watch the return address handling):

* Here's what we can use the 6801 extensions for when doing unsigned byte offsets,
* but, really, use D instead:
	ORG	SOMETHING
ADDBS	PULX	; get return address, restore stack address
	STS	SOFFSV
ADDBSA	ADDB	SOFFSV+1	; can't use ABX because we need X for return
	BCC	ADDBSL
	INC	SOFFSV
ADDBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X
SUBBS	NEGB		; low byte of -B
	BEQ	SUBBSZ	; B was 0, nothing to do
	PULX		; get return address, restore stack address
	STS	SOFFSV
	DEC	SOFFSV	; high byte of -B is $FF
	BRA	ADDBSA
SUBBSZ	RTS

Doing it with D instead, adding negative offsets rather than using the SUBDS entry point:

* Do it with D, instead, but use negative offsets instead of SUBDS:
ADDDS	PULX	; get return address, restore stack address
	STS	SOFFSV
	ADDD	SOFFSV	; can't use ABX because we need X for return
ADDDSL	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return through X
SUBDS	COMA
	NEGB
	BNE	ADDDS	; or BCS. but BNE works -- extend
	INCA
	BRA	ADDDS

Moving the operands around, if we think we must subtract positive offsets instead of adding negative offsets, helps a lot. But again, just use D and call SUBDS instead of trying to optimize with the 8-bit B accumulator:

* use SOFFB instead of ADDBS
* range 0 to 255 (unsigned)
* Need to check again
SUBBS	PULX	; get return address, restore stack pointer
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1
	SUBB	SOFFB
	BCC	SUBBSL
	DEC	SOFFSV	; subtract the borrow
SUBBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X
* Do it with D, instead
* use SOFFA instead of ADDDS
SUBDS	PULX	; get return address, restore stack pointer
	STS	SOFFSV
	STD	SOFFA
	LDD	SOFFSV
	SUBD	SOFFA
	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return through X

At this point, I think it is obvious that long trains of INX are meaningless on the 6801: Two to four, in-line, sure. More, no.

Long trains for S also become questionable, but PULX can make an appearance. That's interesting, though it's mostly just something to think about:
* For small decrements, 8 <= decrement <= 16
* Uses X for return
* Note that this writes the return address in the upper portion of the allocation,
* which may not be desirable.
SUB14S	PULX
	BRA	ISB14S
SUB12S	PULX
	BRA	ISB12S
SUB10S	PULX
	BRA	ISB10S
SUB8S	PULX
	BRA	ISB8S
SUB16S	PULX
ISB16S	DES
	DES
ISB14S	DES
	DES
ISB12S	DES
	DES
ISB10S	DES
	DES
ISB8S	DES
	DES
	DES	; SUB7S and less are shorter in-line
	DES
	DES	
	DES
	DES
	DES
	JMP	0,X
* For small increments, 6 <= increment <= 16
* Uses X for return
ADD14S	PULX
	BRA	IAD14S
ADD12S	PULX
	BRA	IAD12S
ADD10S	PULX
	BRA	IAD10S
ADD8S	PULX
	BRA	IAD8S
ADD6S	PULX
	BRA	IAD6S
ADD16S	PULX
IAD16S	INS
	INS
IAD14S	INS
	INS
IAD12S	INS
	INS
IAD10S	INS
	INS
IAD8S	INS
	INS
IAD6S	INS	; ADD5S and less are shorter in-line
	INS
	INS	
	INS
	INS
	INS
	JMP	0,X

I guess, since I'm being noisy about SBX not being implemented on the 6801, I should also be noisy about ABS (add B to S) and SBS (subtract B from S) being missing.

But so much of the above really becomes irrelevant if we just liberate ourselves from the stack frame mentality/paradigm. Stack frames really ought to be classed among Monty Python's silly walks. 

Stacks allocated entirely within a single page

Concerning the optimization of allocating stacks entirely within a page and only doing math on the low byte, the 6801 offers no improvement there; it only makes the optimization less meaningful. I'll repeat the code, with the full address math alongside, to make that clear. 

Oh, but working directly on the parameter stack pointer becomes more interesting.

* And stacks restricted within page boundaries no longer make as much sense on the 6801.
* Pseudo-registers somewhere in DP:
PSP	RMB	2
XOFFSV	RMB	2
XOFFA	RMB	1
XOFFB	RMB	1
SOFFA	RMB	1
SOFFB	RMB	1
SOFFSV	RMB	2
	...
	ORG	$500	; or something
	RMB	4	; buffer zone
PSTKLIM	RMB	64
PSTKBAS	RMB	4	; buffer zone
SSTKLIM	RMB	32
SSTKBAS	RMB	4	; buffer zone
	...

* B for parameter stack:
ADBPSX	STX	PSP
ADBPSP	ADDB	PSP+1	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS
*
* D for parameter stack:
ADDPSX	STX	PSP
ADDPSP	ADDD	PSP
	STD	PSP	; does the whole pointer, negatives, too
	LDX	PSP
	RTS
*
* B for parameter stack:
SBBPSX	STX	PSP
SBBPSP	STAB	XOFFB
	LDAB	PSP+1
	SUBB	XOFFB	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS
*
* D for parameter stack:
SBDPSX	STX	PSP
SBDPSP	STD	XOFFA
	LDD	PSP
	SUBD	XOFFA	; does the whole pointer
	STD	PSP
	LDX	PSP
	RTS

* B for return stack:
ADBSP	PULX	; return address
	STS	SOFFSV
	ADDB	SOFFSV+1	; Stack allocated completely within page, never carries.
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X
*
* D for return stack (but we saw this above):
ADDSP	PULX	; return address
	STS	SOFFSV
	ADDD	SOFFSV	; does the whole pointer, negatives, too
	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return via X

* B for return stack:
SBBSP	PULX	; return address
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1
	SUBB	SOFFB		; Stack allocated completely within page, never carries
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X
*
* D for return stack (but we saw this above):
SBDSP	PULX	; return address
	STS	SOFFSV
	STD	SOFFA
	LDD	SOFFSV
	SUBD	SOFFA		; does the whole pointer
	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return via X

As with the last chapter, I have not tested the code. I do think it should run, modulo typos.

[JMR202411021012 addendum:]

This is not stack frame related, but it is address math. I discussed it in the 6800 address math chapter, and I want to show the 6801 version of the code. 

This is for accessing per-process global variables that don't need access fast enough to justify slowing down process switches by keeping them in pseudo-registers. That covers almost all per-process variables, except when the hardware application has only a few, very limited processes. See the discussion before the 6800 snippets.

* In the DP somewhere, we have a base pointer for the per-process
* variable space, and a working register, something like
	...
LOCBAS	RMB	2
LBXPTR	RMB	2
	...
*
* And, to get the address of variables in the per-process variable space,
* something like these functions --
ADDLBB	CLRA			; entry point for the byte offset in B
ADDLBD	ADDD	LOCBAS		; entry point for larger offsets in A:B
	STD	LBXPTR
	RTS
*
ADDLBX	BSR	ADDLBB	; and load X
	LDX	LBXPTR
	RTS
*
ADDLDX	BSR	ADDLBD	; and load X
	LDX	LBXPTR
	RTS

[JMR202411021012 addendum end.]

And with this in mind, too, while thinking about how the 6801's enhanced instruction set can make some of the above code much less intransigent, let's remind ourselves why the 6809 and 68000 don't need routines like these before we take a look at a concrete example of stack frames on the 6801.

Or you can jump ahead to getting numeric output in binary.



ALPP 02-20 -- Some Address Math for the 6800

Some Address Math for the 6800


Perhaps I would not have gotten so tangled up in the discussion of stack frames if I had simply written this chapter immediately after the demonstration of 16- and 32-bit arithmetic on the 68000. But sometimes you just need to see a reason for doing something before you see someone doing it, or it blows your mind.

What is the difference between address math and other math?

Not a lot. You still have to pay attention to signs and stuff, and watch what happens when you wrap around the limits of your registers. Rings are fun, but you have to get used to them. 

Ah, yes, right. One thing about general address math is that you need to be aware of the limits of your registers. You often don't know in advance where in memory the address you're working on is going to be.

Not to say you don't have to be aware of limits in non-address math -- rather, where the limits hit and how they hit can be different, so you have to watch a different way.

One other difference is that, for general math, you want your call and result parameters in places where they can be easily carried from one stage in calculations to the next. That's why I have been demonstrating the use of the parameter stack versus global variables (versus registers).

For address math, if possible, you absolutely want your parameters and the result in registers, specifically the result in a particular register that can be used in addressing.

In the earliest CPUs, the math itself was hard enough (unknown enough) that addressing seemed to be an afterthought -- or even outside the plans. You can't plan well without knowing what you're planning for -- and what you're planning.

We really didn't know what we were doing. 

Intel, for instance, almost killed themselves in the mid-1970s working on a CPU design that was supposed to be the be-all-and-end-all of CPUs, the iAPX 432. But there was too much theory without experience, and it was slow and fought with itself to get work done. When they saw deadlines pass without end in sight, especially when rumors of what Motorola was doing hit the backyard fence, they scrambled and used part of what they had learned and produced the 8086, and the 8086 was definitely an improvement on the 8080 -- and saved their bacon when the 432 that was delivered didn't live up to its promise. And the 8086 was a small enough step forward that it was easy for customers to adopt -- setting the stage for Intel to lead by adopting small improvements in steps that could be handled. But the 8086 also was, and its descendants still are, more than a little baroque.

Motorola, for their part, had figured out they needed to do something radical to stay competitive, and had started examining source code for the 6800 that they had access to, looking for ways to relieve computational bottlenecks. They used that research in the original design of the 68000, and there was a parallel team that had access to the research and put it to use in the design of the 6809.

And they hit a home run on the 6809 -- almost. Brought three runners in and left the DP register stranded on 3rd, so to speak. If you think of DP as the pinch runner or something. Okay, the metaphor doesn't quite work, unless you think of the DP register as the pinch runner for a wider address space, which it almost was.

The 68000 was another home run -- out of season and some overkill. And it has some warts, too.

Every real CPU is going to have warts. It's a mathematical requirement. 

I'm not kidding. There is an axiom in systems science:

Every model is insufficient to reality.

And that has some consequences:

  • Every system has vulnerabilities, and
  • every system contains the seeds of its own undoing, and
  • every market window is a sandpit.

Translated into general science, we know in advance that every theory and every law will eventually fail.

But that kind of cold water just is not popular in the sales department, so, instead of emblazoning it on the halls of all higher learning and in the chambers of legislatures, we hide it away. 

(Mostly -- there is some recognition at times -- POSIWID.) 

All of that to warn you:

Ugly code in here. 

I did some handwaving and conceptualizing for the 6801 in the unsteady footing chapter. I'm continuing with more handwaving and untested code in this chapter, but for the 6800.

First, in the 6800, we have nothing special to add a constant to the index register with anything but an ephemeral result. That's great for some things like constant offsets (thus, the 6800's indexed mode), but not so great for some other things. And it's always a positive constant, which makes some stack-related uses hard.

In the 6801, we have ABX to add a small offset -- unsigned, less than 256 -- but no SBX to subtract an offset, and no signed ASBX or whatever.

The way the instruction set is constructed, we end up having to use a variable in memory to do the math, and because we have to use X to index the stack(s), passing the offset in as a dynamically allocated parameter is a case of trying to resolve a cyclic dependency.

Thus, we simply have to use a pseudo-register -- preferably in the direct page. 

-- Which causes issues at interrupt time, unless we have separate pseudo-registers used by separate routines for interrupt-time, and copy the user tasks' pseudo-registers out and in on context switch.

We'll be using 16-bit negation rather frequently, so keep a couple or three snippets in mind:

* For reference -- NEGate a 16-bit value in A:B --
NEGAB	COMA	; 2's complement NEGate is bit COMplement + 1
	NEGB
	BNE	NEGABX	; or BCS. but BNE works -- extends 0
	INCA
NEGABX	RTS
*
* Another way, using stack for temporary:
NEGABS	PSHB
	PSHA
	CLRB	; 0 - A:B
	CLRA
	TSX
	SUBB	1,X
	SBCA	0,X
	INS
	INS
	RTS	
*
* Same thing using a temporary
* somewhere in DP:
	...
SCRCHA	RMB	1
SCRCHB	RMB	1
	...
* somewhere else
NEGABV	STAA	SCRCHA
	STAB	SCRCHB
	CLRA		; 0 - A:B
	CLRB
	SUBB	SCRCHB
	SBCA	SCRCHA
	RTS
	...
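A quick way to convince yourself that COMA / NEGB / BNE / INCA is a full 16-bit negate is to model it and try every possible A:B. This Python sketch does just that (the function name is hypothetical). The +1 only has to ripple into A when B is zero, which is exactly when NEGB leaves a zero result. NEGB also sets the carry exactly when its result is nonzero, which is why BCS would serve as well as BNE.

```python
def neg_ab(a, b):
    """Model COMA / NEGB / BNE skip / INCA on the 8-bit pair A:B."""
    a = (~a) & 0xFF          # COMA: one's complement of the high byte
    b = (-b) & 0xFF          # NEGB: two's complement of the low byte
    if b == 0:               # BNE not taken: the +1 carried out of B
        a = (a + 1) & 0xFF   # INCA
    return a, b
```

For example, $0100 comes out as $FF00. B is zero, so INCA turns the complemented $FE into $FF.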

I'm going to assume that you'll be reading the code and the comments closely enough to tell when you should doubt me.

Assume you have these declarations for the pseudo-registers:

	ORG	$80
	...
XOFFA	RMB	1
XOFFB	RMB	1
XOFFSV	RMB	2
	...

These entry points should add and subtract offsets in A:B. Note that the code negates A:B to do the subtraction, to avoid commutation issues. (Note carefully the INCA. I think I have this right for handling the NEGB when B is zero.)

ADDBX	CLRA
ADDDX	STX	XOFFSV
	ADDB	XOFFSV+1
	ADCA	XOFFSV
	STAB	XOFFSV+1
	STAA	XOFFSV
	LDX	XOFFSV
	RTS
SUBBX	CLRA	; B is unsigned
SUBDX	COMA
	NEGB
	BNE	ADDDX	; or BCS. but BNE works -- extends
	INCA
	BRA	ADDDX
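Here is a Python model of the byte-at-a-time add that ADDDX does. The low bytes are added with ADDB, then the high bytes with ADCA, so the carry comes along. The model also covers the operand-order subtraction in the alternative that follows, where SBCA pulls the borrow through. The function names are hypothetical. The test checks the point made in the text: adding the 16-bit negative of an offset gives the same result as subtracting the offset.

```python
def add_dx(x, a, b):
    """ADDB XOFFSV+1 / ADCA XOFFSV: add A:B to X a byte at a time."""
    low = (x & 0xFF) + b
    carry = low >> 8                           # carry out of the low byte
    high = ((x >> 8) + a + carry) & 0xFF       # ADCA brings the carry in
    return (high << 8) | (low & 0xFF)


def sub_dx(x, a, b):
    """SUBB XOFFB / SBCA XOFFA: subtract A:B from X, operands kept in order."""
    low = (x & 0xFF) - b
    borrow = 1 if low < 0 else 0               # borrow out of the low byte
    high = ((x >> 8) - a - borrow) & 0xFF      # SBCA takes the borrow
    return (high << 8) | (low & 0xFF)


def split(value):
    """Split a 16-bit value (negative values wrap) into its A and B bytes."""
    value &= 0xFFFF
    return value >> 8, value & 0xFF
```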

As an alternative, we could move the operands around using more pseudo-registers (and remembering the consequences). This code may be a little easier to believe in, but it does mean two more bytes to save away and restore on context switch.

* Alternative, don't use ADDDX, use XOFFA and XOFFB instead 
SUBBX	CLRA	; B is unsigned
SUBDX	STAA	XOFFA	; subtraction does not commute.
	STAB	XOFFB	; Handle operand order.
	STX	XOFFSV
	LDAA	XOFFSV
	LDAB	XOFFSV+1
	SUBB	XOFFB
	SBCA	XOFFA
	STAA	XOFFSV
	STAB	XOFFSV+1
	LDX	XOFFSV
	RTS

You can optimize the above a bit if you limit offsets to 0 to 255, which is a completely reasonable restriction for many applications. I won't show those. I don't want to wear you out with too much untested code.

Signed byte offset (-128 to 127) is also completely reasonable for many applications, and may offer some aesthetic satisfaction:

* this is faster than SUBDX and almost as fast as ADDDX.
* Range is -128 to 127, which should be enough for many purposes.
* But unsigned byte-only can be faster.
* Needs to be checked again.
ADDSBX	STX	XOFFSV
	TSTB	; sign extend B
*	BEQ	ADSBXD	; use only if we really want to optimize 0
	BPL	ADSBXU
	ADDB	XOFFSV+1	; negative: high byte of the offset is $FF
	STAB	XOFFSV+1	; the store leaves the carry alone
	BCS	ADSBXL	; carry + $FF = 0, high byte unchanged
	DEC	XOFFSV	; no carry: add $FF (-1) to the high byte
	BRA	ADSBXL
ADSBXU	ADDB	XOFFSV+1
	STAB	XOFFSV+1
	BCC	ADSBXL
	INC	XOFFSV
ADSBXL	LDX	XOFFSV
ADSBXD	RTS
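Signed byte offsets work because the sign extension of B is just a high byte of $00 or $FF. When the offset is negative, a carry out of the low-byte add cancels the $FF, and otherwise the high byte takes a -1. Here is a small Python sketch of that logic (the function name is hypothetical), checked against ordinary signed arithmetic.

```python
def add_signed_byte(x, b):
    """Add the signed byte B (-128..127, passed as its raw value 0..255) to X.
    The offset's high byte is $00 when B is positive and $FF when negative."""
    high, low = x >> 8, x & 0xFF
    total = low + b                   # ADDB XOFFSV+1
    carry = total >> 8
    low = total & 0xFF                # STAB XOFFSV+1 (stores don't change C)
    if b & 0x80:                      # negative: add $FF plus the carry
        if not carry:                 # $FF + 1 wraps to 0, so only DEC without a carry
            high = (high - 1) & 0xFF
    elif carry:                       # positive: just bring the carry in
        high = (high + 1) & 0xFF
    return (high << 8) | low
```

For example, `add_signed_byte(0x1234, 0xF0)` gives `0x1224`. $F0 is -16, and the carry out of $34 + $F0 cancels the $FF high byte.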

And we can do similar things with the return stack, S. S, in particular, should never need offsets larger than 255 on the 6800, so we'll focus on the unsigned byte options. 

The stack has the additional constraint of requiring some means of handling the return address.

One more thing, you should recognize that the call writes the return address into the allocated space on allocation. If there is something important there, it's toast.

The declarations:

* For S stack
* unsigned byte only,
* because we really don't want to be bumping the return stack that much
	ORG	$90
	...
SOFFB	RMB	1
SOFFSV	RMB	2

And the code (watch the return address handling):

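ADDBS	TSX
	LDX	0,X	; get return address
	INS
	INS		; restore stack address
	STS	SOFFSV
ADDBSA	ADDB	SOFFSV+1
	BCC	ADDBSL
	INC	SOFFSV
ADDBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X
SUBBS	NEGB		; low byte of -B
	BEQ	SUBBSZ	; B was 0, nothing to do
	TSX
	LDX	0,X	; get return address
	INS
	INS		; restore stack address
	STS	SOFFSV
	DEC	SOFFSV	; high byte of -B is $FF
	BRA	ADDBSA
SUBBSZ	RTS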
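The warning above about the return address landing in the allocation is easy to see with a little bookkeeping. This Python sketch follows the 6800 conventions: S points at the next free byte, and JSR stores the low byte of the return address at S and the high byte just below it. The function name and the return address value are made up. When a subroutine then moves S down by n bytes on the caller's behalf, those two bytes become the top two bytes of the newly allocated block.

```python
def allocate_via_call(mem, s, n, return_address=0xC123):
    """Model 'JSR SUBnS' (allocate n bytes) and return the new S.

    mem is a bytearray standing in for RAM, s is S before the JSR,
    and return_address is an arbitrary illustrative value."""
    mem[s] = return_address & 0xFF                   # JSR pushes the low byte...
    mem[(s - 1) & 0xFFFF] = return_address >> 8      # ...then the high byte
    # The routine fetches the return address, discards it, moves S down
    # by n relative to the caller's S, and jumps back through X.
    return (s - n) & 0xFFFF
```

With S at $01FF and 8 bytes requested, the new S is $01F7, and the allocation runs from $01F8 through $01FF. $01FE and $01FF now hold the return address rather than whatever was stored there before the call.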

Again, the subtraction can alternatively move the operands into the right order, at the cost of using another pseudo-register:

* use SOFFB instead of ADDBS
* range 0 to 255 (unsigned)
* Need to check again
SUBBS	TSX
	LDX	0,X	; get return address
	INS
	INS		; restore stack pointer
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1
	SUBB	SOFFB
	BCC	SUBBSL
	DEC	SOFFSV	; subtract the borrow
SUBBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X

You'll remember I made reference to long trains of INX and DEX as a substitute for direct math on X:

* For small increments <= 16
ADD16X	INX
	INX
ADD14X	INX
	INX
ADD12X	INX
	INX
ADD10X	INX
	INX
ADD8X	INX
	INX
ADD6X	INX
	INX
	INX	; ADD4X and less shorter in-line
	INX
	INX	
	INX
	RTS

* For small decrements <= 16
SUB16X	DEX
	DEX
SUB14X	DEX
	DEX
SUB12X	DEX
	DEX
SUB10X	DEX
	DEX
SUB8X	DEX
	DEX
SUB6X	DEX
	DEX
	DEX	; SUB4X and less shorter in-line
	DEX
	DEX	
	DEX
	RTS

Just jump to the label for the offset you need to add or subtract.
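The trick behind "just jump to the label" is that each entry point sits exactly n instructions before the shared RTS. Entering there and falling through executes n increments. Here is a Python sketch of that layout, built the same way as the ADD16X through ADD6X train above. The helper names are hypothetical.

```python
def build_train(opcode, prefix, largest=16, smallest=6):
    """Lay out a fall-through train: `largest` copies of opcode, then RTS.
    Entry label PREFIXnX is placed n instructions before the RTS."""
    code = [opcode] * largest + ["RTS"]
    labels = {}
    for n in range(smallest, largest + 1, 2):
        labels["%s%dX" % (prefix, n)] = largest - n
    return code, labels


def call(code, labels, label):
    """Model 'JSR label': run from the label up to the RTS and
    return how many instructions were executed before it."""
    executed = 0
    for op in code[labels[label]:]:
        if op == "RTS":
            return executed
        executed += 1
    raise ValueError("train has no RTS")
```

INX, DEX, and RTS are each a single byte, so the whole ADD train is 17 bytes of object code. A DEX train is laid out the same way.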

I know it looks ... ugly. But it works, and it avoids the use of pseudo-registers, and it's fast, and it actually doesn't use up more code space than the general routines we've looked at. These are worth considering.

And you're thinking, well, that's not going to work for the return stack? 

Hah!

* For small decrements, 8 <= decrement <= 16
* Uses X for return
* Note that this writes the return address in the upper portion of the allocation,
* which may not be desirable.
SUB14S	TSX
	LDX	0,X
	BRA	ISB14S
SUB12S	TSX
	LDX	0,X
	BRA	ISB12S
SUB10S	TSX
	LDX	0,X
	BRA	ISB10S
SUB8S	TSX
	LDX	0,X
	BRA	ISB8S
SUB16S	TSX
	LDX	0,X
ISB16S	DES
	DES
ISB14S	DES
	DES
ISB12S	DES
	DES
ISB10S	DES
	DES
ISB8S	DES
	DES
	DES	; SUB7S and less are shorter in-line
	DES
	DES	
	DES	; two less because of the return address
	JMP	0,X

* For small increments, 8 <= increment <= 16
* Uses X for return
ADD14S	TSX
	LDX	0,X
	BRA	IAD14S
ADD12S	TSX
	LDX	0,X
	BRA	IAD12S
ADD10S	TSX
	LDX	0,X
	BRA	IAD10S
ADD8S	TSX
	LDX	0,X
	BRA	IAD8S
ADD16S	TSX
	LDX	0,X
IAD16S	INS
	INS
IAD14S	INS
	INS
IAD12S	INS
	INS
IAD10S	INS
	INS
IAD8S	INS
	INS
	INS	; ADD7S and less are shorter in-line
	INS
	INS	
	INS
	INS
	INS
	INS	; two more to cover the return address
	INS
	JMP	0,X
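Why do the 6800 return-stack trains run two DES short (and two INS long) of what the label says? The 6800 routines read the return address with TSX / LDX 0,X and leave it on the stack. The 6801 versions in the previous chapter pop it with PULX, so their trains match the label. This Python sketch of the S bookkeeping assumes only that JSR moves S down by 2 and that JMP 0,X leaves S alone. The function name and parameters are hypothetical.

```python
def net_s_change(train_length, step, pops_return_address):
    """Net change in S across a JSR to a DES (step=-1) or INS (step=+1) train
    that returns with JMP 0,X."""
    start = s = 0x0200             # any starting value will do
    s -= 2                         # JSR pushes the 2-byte return address
    if pops_return_address:
        s += 2                     # 6801: PULX takes it back off
    s += step * train_length       # the DES or INS train itself
    return s - start               # JMP 0,X leaves S alone
```

Counting the listing above, the 6800 SUB8S falls through six DES and ADD8S falls through ten INS. Both still come out to a net 8.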

What's that? Do I hear complaints about the smell?

It's ugly, but it could be useful.

Stacks allocated entirely within a 256-byte page

Finally, if we are talking about stacks (and other largish things in memory), it may be possible to arrange them in memory so that the stacks lie completely within a single 256-byte page, such that the high byte of the address does not change. This particular trick was used to great effect on the 6502 and the 6805. 

We can use it on the 6800 in some cases, if we can be absolutely sure that everybody who ever touches the code is aware of the requirement to keep each stack entirely within a single page.
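The page trick works because adding to the low byte alone gives the same answer as a full 16-bit add exactly when the result stays in the same page. This Python sketch checks that, using the layout from the declarations that follow. The function names are hypothetical. The layout is ORG $500 plus 4 + 64 + 4 + 32 + 4 = 108 bytes, so everything lies between $0500 and $056B.

```python
def low_byte_add(address, offset):
    """Add to the low byte only (ADDB PSP+1 / STAB PSP+1): the high byte never changes."""
    return (address & 0xFF00) | ((address + offset) & 0xFF)


def full_add(address, offset):
    """Ordinary 16-bit address arithmetic, wrapping at 64K."""
    return (address + offset) & 0xFFFF


STACK_AREA = range(0x0500, 0x0500 + 4 + 64 + 4 + 32 + 4)   # the ORG $500 layout
```

When a stack does cross a page boundary, things go wrong. Adding $20 to $05F0 on the low byte alone wraps back to $0510 instead of reaching $0610.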

* Stacks within page boundaries:
* Pseudo-registers somewhere in DP:
PSP	RMB	2
XOFFSV	RMB	2
XOFFB	RMB	1
SOFFB	RMB	1
SOFFSV	RMB	2
	...
	ORG	$500	; or something
	RMB	4	; buffer zone
PSTKLIM	RMB	64
PSTKBAS	RMB	4	; buffer zone
SSTKLIM	RMB	32
SSTKBAS	RMB	4	; buffer zone
	...

* For parameter stack:
ADBPSX	STX	PSP
ADBPSP	ADDB	PSP+1	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS
*
SBBPSX	STX	PSP
SBBPSP	STAB	XOFFB
	LDAB	PSP+1
	SUBB	XOFFB	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS

* For return stack:
ADBSP	TSX
	LDX	0,X	; return address
	ADDB	#2		; faster, same byte count
	STS	SOFFSV
	ADDB	SOFFSV+1	; Stack allocated completely within page, never carries.
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X

SBBSP	TSX
	LDX	0,X	; return address
	SUBB	#2		; fold in the return address: S+2-B = S-(B-2)
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1
	SUBB	SOFFB		; Stack allocated completely within page, never carries
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X
Again, I have not tested the code. It should run. I think.
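Inside ADBSP and SBBSP, S is two bytes below the caller's S because of the JSR. The 2 for discarding the return address therefore has to be folded into B: as B + 2 when adding, and as B - 2 when subtracting, since S + 2 - B is S - (B - 2). Here is a Python sketch of the low-byte arithmetic for both, assuming the stack stays inside one page. The function names are hypothetical.

```python
def adbsp(s_inside, b):
    """Caller's S + B, computed from S as seen inside the routine (low byte only)."""
    b = (b + 2) & 0xFF                       # fold in the return-address pop
    return (s_inside & 0xFF00) | ((s_inside + b) & 0xFF)


def sbbsp(s_inside, b):
    """Caller's S - B, computed from S as seen inside the routine (low byte only)."""
    b = (b - 2) & 0xFF                       # S + 2 - B == S - (B - 2)
    return (s_inside & 0xFF00) | ((s_inside - b) & 0xFF)
```

Small B values wrap harmlessly. With B = 1, B - 2 becomes $FF, and subtracting $FF from the low byte is the same as adding 1, modulo 256.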

As a reminder, we've already seen what code looks like without stack frames. The only reason I'm showing you this stuff is so that you understand why stack frames may not be preferred for many applications (and, if you can understand that, maybe you can sometime see it for all applications).

Well, no, not the only reason. Maybe the only reason I'm showing it to you now rather than later.

[JMR202411020931 addendum:]

This is not stack frame related, but it's address math related, and I think it would be good to discuss it here, lest I forget --

There are two approaches to per-process variables. 
  • Pseudo-registers like PSP, XWORK, XOFFSV, SOFFSV, etc. will either be saved and restored on process switch or will have separate versions for each task, if there are not too many.
  • Most per-process variables with global allocation should be in a per-process address space. 

You'll usually use both: a few pseudo-registers for variables that need quick access (and there should be only a few, to keep the management overhead on task/process switch to a minimum), and the per-process space for everything else. Every pseudo-register must be saved and restored on process switch --

Except for a couple of special cases:

  • It's useful to keep system pseudo-registers separate from non-system pseudo-registers, complete with separate routines to manage them.
  • If there are just a few non-system processes in a small hardware application, it may be useful to give each process its own pseudo-registers, along with the routines to manage them.

What kinds of things need to be pseudo-registers? 

XWORK and other such temporaries, including SOFFSV and such above.

And PSP, as well. (Note that, if the system functions use a parameter stack, it should be a separate SPSP or something, which would have to have its own support routines.)

If there are a lot of per-process variables, you would need, separate from pseudo-registers, a process-local space. And you would need a pointer to that space, with routines to access the variables there:

* In the DP somewhere, we have a base pointer for the per-process
* variable space, and a working register, something like
	...
LOCBAS	RMB	2
LBXPTR	RMB	2
	...
*
* And, to get the address of variables in the per-process variable space,
* something like these --
ADDLBB	CLRA			; entry point for the byte offset in B
ADDLBD	ADDB	LOCBAS+1	; entry point for larger offsets in A:B
	ADCA	LOCBAS
	STAA	LBXPTR
	STAB	LBXPTR+1	; let other code load X
	RTS
*
ADDLBX	BSR	ADDLBB	; and load X
	LDX	LBXPTR
	RTS
*
ADDLDX	BSR	ADDLBD	; and load X
	LDX	LBXPTR
	RTS
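The point of LOCBAS is that a process switch only has to swap that one pointer, along with the other few pseudo-registers, rather than the per-process variables themselves. Here is a Python sketch of that idea. The base addresses, offsets, task names, and function names are all made up for illustration.

```python
def local_address(locbas, offset):
    """ADDLBD: LBXPTR = LOCBAS + A:B, wrapping at 64K like the CPU."""
    return (locbas + offset) & 0xFFFF


def switch_process(dp, saved, outgoing, incoming):
    """Save the outgoing process's pseudo-registers, load the incoming one's."""
    saved[outgoing] = dict(dp)
    dp.clear()
    dp.update(saved[incoming])
```

The same offset names a different instance of the variable in each process, and only the pseudo-registers had to move.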

[JMR202411020931 addendum end.]

With all this in mind, look at how the 6801's enhanced instruction set can make some of the above code much less intransigent before we take a look at a concrete example of stack frames on the 6800.

Or you can jump ahead to getting numeric output in binary.



Tuesday, October 29, 2024

Teaching Myself Python, part 1

Right in the middle of my assembly language tutorial, I decided to teach myself Python.

(A friend needed some help with his classwork.)

I'm running the interactive interpreter, and this is a partial record of the conversation.

account@computer:~$ python
Python 2.7.17 (default, Sep 30 2024, 12:35:16) 
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

That's pretty much not unexpected. 

What did I just do? 

I invoked Python with just 

python

at the bash shell command-line $ prompt. And Python tells me what version I'm running and some information about how it was built, followed by a prompt about things people might be particularly interested in. 

And you see I'm running a slightly old python -- version 2.7. (I actually have python v. 3 running, as well, but python v. 2 is my default right now.)

Where did I get Python?

Python gets installed by default now on a lot of Posix OSes. It's available in pretty much every distribution's package manager. I'm running an older version of Ubuntu Linux. (I need to upgrade, and I'm trying to decide between going back to Debian or over to Devuan, staying with Ubuntu, or heading back to one of the BSDs.) 

I think v. 2 was installed by default, and I installed v. 3 from the packages. Or maybe it was the other way around, or maybe I installed both from packages. I don't remember.

If you're running MacOS, you can get Python from python.org or through one of several 3rd party package managers. (Macs actually come with Python, and it should be good enough for what I'm doing here. But if you decide to use Python regularly, it's recommended that you install a separate version so that you don't disturb the system's version when you install Python stuff that Mac OS doesn't use.)

If you're running MSWindows, you can apparently get Python from the Microsoft Store as well as from python.org. Or you can install Cygwin and get Python from the Cygwin packages.

And Python is available from python.org and others in the Android and Apple iOS stores, from what I hear.

Anyway, Python has been installed, and I have been putting off actually learning it, for quite a while. 

And there are two fundamental ways of running Python --

  1. Run a program written in Python (whether by clicking an icon or by invoking it at the command line), and
  2. run the interactive Python interpreter itself (whether by clicking an icon or invoking it at the command line)

I'm doing the latter, invoking it at the command line. (See above.)

What I type once Python is running is on the lines that start

>>>

Let's see what happens when I type "help", like the prompt suggests:

>>> help
Type help() for interactive help, or help(object) for help about object.
>>> 

Huh? 

Hmm. (Head scratching ensues.) 

Maybe, type "help()"?

>>> help()

Welcome to Python 2.7!  This is the online help utility.

If this is your first time using Python, you should definitely check out
the tutorial on the Internet at http://docs.python.org/2.7/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To quit this help utility and
return to the interpreter, just type "quit".

To get a list of available modules, keywords, or topics, type "modules",
"keywords", or "topics".  Each module also comes with a one-line summary
of what it does; to list the modules whose summaries contain a given word
such as "spam", type "modules spam".

help>

And now it seems to be prompting me with "help>" instead of ">>>".

Okay, "spam" sounds interesting.

help> spam
no Python documentation found for 'spam'

help>

Oh. I guess I need to type "modules spam" for that.

help> modules spam

Here is a list of matching modules.  Enter any module name to get more help.

Gtk-Message: 07:40:20.836: Failed to load module "canberra-gtk-module"

help>

I guess that means that something is not installed. I don't feel like chasing that down right now, so let's forget about the module "spam".

Maybe I could read up on arrays in Python.

help> modules array

Here is a list of matching modules.  Enter any module name to get more help.

array - This module defines an object type which can efficiently represent
numpy.core._methods - Array methods which are called by both the C-code for the method
numpy.core.arrayprint - Array printing function
numpy.core.defchararray - This module contains a set of functions for vectorized string
numpy.core.info - Defines a multi-dimensional array and useful procedures for Numerical computation.
numpy.core.multiarray 
numpy.core.multiarray_tests 
numpy.core.records - Record Arrays
numpy.lib.arraypad - The arraypad module contains a group of functions to pad values onto the edges
numpy.lib.arraysetops - Set operations for arrays based on sorting.
numpy.lib.arrayterator - A buffered iterator for big arrays.
numpy.lib.format - Define a simple format for saving numpy arrays to disk with the full
numpy.lib.mixins - Mixin classes for custom array types that don't inherit from ndarray.
numpy.lib.recfunctions - Collection of utilities to manipulate structured arrays.
numpy.lib.twodim_base - Basic functions for manipulating 2d arrays
numpy.lib.ufunclike - Module of functions that are like ufuncs in acting on arrays and optionally
numpy.lib.user_array - Standard container-class for easy multiple-inheritance.
numpy.ma.extras - Masked arrays add-ons.
numpy.ma.testutils - Miscellaneous functions for testing masked arrays and subclasses

help> 

Well, maybe I'll chase down the arrays module later and just get back to trying out Python. It said I could type "quit" to go back:

help> quit

You are now leaving help and returning to the Python interpreter.
If you want to ask for help on a particular object directly from the
interpreter, you can type "help(object)".  Executing "help('string')"
has the same effect as typing a particular string at the help> prompt.
>>> 

Well, that's useful information about how to use the help function.

Let's see if I can get Python to say "hello".

>>> hello
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'hello' is not defined
>>> 

Not that way. Maybe tell it to print it out?

>>> print hello
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'hello' is not defined
>>>

That doesn't seem to be the way to do it, either. I guess it needs to be quoted:

>>> print "hello"
hello
>>> 

That seems to work. What next?

>>> name = "Joel"
>>> print "hello ", name
hello  Joel
>>> 

Let's see if I can stroke my ego some more:

>>> rank = 1
>>> print "hello", name, "you're number ", rank, "!"
hello Joel you're number  1 !
>>> 

Cheap thrills. Heh.

One more vanity one-liner, but it requires putting either the program or the name in a file. We'll put the program in a file. 

To do this, you open up a text editor. Some people like geany or kate. I generally use gedit or vim. All four, and others, are available in most POSIX OSes' package managers, and can be downloaded for Mac OS or installed via 3rd-party package managers. 

Of course, Mac OS has Xcode (vim, too!), and I have found Xcode quite useful on Mac, useful enough that I usually have not installed gedit. (Make sure you download the command-line tools when you download Xcode.)

And, if you must use MSWindows, all four can be installed from Cygwin packages if you install Cygwin. (Geany and Gedit are supported for MSWindows, and can be downloaded from their respective download sites, but Kate takes a bit more work if you don't get it through Cygwin.) Or you can use Microsoft's Visual Studio. (Bleaugh. Just kidding. Sort-of.)

Anyway, open up a text editor and type in the following one-liner and save it in a directory called something like "play/python" as "greet.py". You'll need to change directory in your command-line shell, as well. Oh. Yeah. You'll need to use a command-line shell for this one. I don't think it can be done easily when you start Python by clicking an icon.

Actually, this one-liner is a two-liner:

import sys
print "Hello", sys.argv[ 1 ]

As I say, save it as "greet.py". Then go to your command-line shell and type the command "python greet.py Joel":

account@computer:~/play/python$ python greet.py Joel
Hello Joel
account@computer:~/play/python$ 

By way of explanation --

(1) You can save your program in a file with a ".py" extension (ending on the name) and have python run the program for you. Interactive is fun, but there are some things that don't work well interactively (can't work well interactively).

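(2) Python makes whatever you type after the name of the program file available to your program in a list called "argv", in the system module called "sys". The first entry, sys.argv[ 0 ], is the name of the program file itself, so the first word you type after it is sys.argv[ 1 ].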

If you're not used to using these command-line parameters, that may not be much explanation, but once you start using them, they're not hard to understand.
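Here's a small sketch of the same idea, written so it runs under Python 2.7 or Python 3. As written, the two-liner stops with an IndexError if you forget to type a name, because sys.argv then holds only one entry, the program file name. The function name greeting and the fallback name "stranger" are my own inventions for illustration. They are not from the original:

```python
import sys

def greeting(argv):
    """Build the greeting from an argv-style list (hypothetical helper)."""
    # argv[ 0 ] is the program file name, e.g. "greet.py";
    # anything typed after it starts at argv[ 1 ].
    if len(argv) > 1:
        name = argv[1]
    else:
        name = "stranger"   # made-up fallback when no name was typed
    return "Hello " + name

if __name__ == "__main__":
    print(greeting(sys.argv))
```

Saved as greet.py, "python greet.py Joel" should still print Hello Joel. Plain "python greet.py" should print Hello stranger instead of a traceback.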

Well, let's see what else might be interesting. Can I make the thing count?

>>> mylist = [ 1, 2, 3, 4 ]
>>> print mylist, "I just plain adore ..."
[1, 2, 3, 4] I just plain adore ...
>>> 

Rim shot. Let's try an explicit loop:

>>> for i in range( 1, 4 )
  File "<stdin>", line 1
    for i in range( 1, 4 )
                         ^
SyntaxError: invalid syntax
>>> 

Ooooh, rejected!

Needs a colon on the end. Try again:

>>> for i in range( 1, 4 ):
...   print i
... 
1
2
3
>>> 

The "..." ellipsis prompt means it wants you to type more. So I typed the body of the loop.

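And it's still not quite there. I wanted 1 through 4, but the last number in the range is not included: range( 1, 4 ) counts 1, 2, 3 and stops before 4. To get 1 through 4, you'd write range( 1, 5 ).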
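Here's a quick check of that stopping rule. It also shows why range( 1, 10 ), in the programs further down, runs nine times rather than ten. It uses print with a single argument in parentheses, which works the same in Python 2.7 and Python 3:

```python
# range( start, stop ) starts at start and stops *before* stop.
print(list(range(1, 4)))    # the loop above: 1, 2, 3
print(list(range(1, 5)))    # stop one past the last number wanted: 1, 2, 3, 4
print(len(range(1, 10)))    # nine passes, not ten
```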

Point one: for people who like their code blocks to have beginning and end braces, or BEGIN and END keywords, Python uses indentation to demarcate blocks.

Yeah. The whitespace. The stuff you've trained yourself to not see. Or at least I have.

That's one of the reasons I got miffed at Python in the past. But I have a friend this time, so I'll just go with it.

I do see that the colon seems to be a sort of BEGIN marker. And after some playing around late last night, I figured out that, even though Python doesn't require it, you can put comment characters in place to show where the indentation hits. It's kind of like painting the fingers of your invisible robot hands so you can see what they're doing, but it seems to sort of do the job.

My friend needs help with working through a list, summing up a column, and taking an average. So let's look at two ways to sum a list in Python. This is kind of a sudden jump, but we've seen most of the essential elements of the language that I'm using, so, hang on to your hat, and I'll hang on to mine:

Grabbing input from the keyboard can get confusing, so we're going to save this as a program. Call it "totaloop.py":

total = 0
count = 0
for i in range( 1, 10):
  numstr = input()   # Python 2's input() evaluates what you type, so this is a number, not a string
  previous = total
  total = total + numstr
  count = count + 1
  print count, ": ", previous, " + ", numstr, " = ", total
  # end of loop

average = float( total ) / count
print "average of ", total, " / ", count, " is ", average

What this does is

  • read a number from the keyboard,
  • add it immediately to the total, and
  • print out the number, count, and running total
  • until 9 numbers have been read; 
  • at which point, the average is calculated and printed out.

Make sure you save it with the empty trailing line. Call it from the command line, like this:

$ python totaloop.py

And then type in a bunch of numbers:

$ python totaloop.py
2
1 :  0  +  2  =  2
6
2 :  2  +  6  =  8
3
3 :  8  +  3  =  11
2
4 :  11  +  2  =  13
3
5 :  13  +  3  =  16
6
6 :  16  +  6  =  22
4
7 :  22  +  4  =  26
9
8 :  26  +  9  =  35
3
9 :  35  +  3  =  38
average of  38  /  9  is  4.22222222222

You can see that I typed in the numbers 2 6 3 2 3 6 4 9 3.

Now, instead of immediately calculating the running total, we'll input the entire list first, and then use the built-in function sum() to add up the whole list. Save this one as "totalist.py":

numbers = []
for i in range( 1, 10 ):
  numstr = input()
  numbers.append( float( numstr ) )
  print i, ": ", numstr, numbers 
  # End of loop

total = sum( numbers )
average = total / len( numbers )

print "list: ", numbers
print "sum is ", total, " and average is ", average

Re-emphasizing, the program 

  • reads in a number,
  • appends it to the list,
  • prints the number read and the list
  • until 9 numbers are read,
  • and then, after the loop is finished, sums them all up and
  • prints the results 

Call it from the command line and type in the same numbers as before, so we can compare:

$ python totalist.py
2
1 :  2 [2.0]
6
2 :  6 [2.0, 6.0]
3
3 :  3 [2.0, 6.0, 3.0]
2
4 :  2 [2.0, 6.0, 3.0, 2.0]
3
5 :  3 [2.0, 6.0, 3.0, 2.0, 3.0]
6
6 :  6 [2.0, 6.0, 3.0, 2.0, 3.0, 6.0]
4
7 :  4 [2.0, 6.0, 3.0, 2.0, 3.0, 6.0, 4.0]
9
8 :  9 [2.0, 6.0, 3.0, 2.0, 3.0, 6.0, 4.0, 9.0]
3
9 :  3 [2.0, 6.0, 3.0, 2.0, 3.0, 6.0, 4.0, 9.0, 3.0]
list:  [2.0, 6.0, 3.0, 2.0, 3.0, 6.0, 4.0, 9.0, 3.0]
sum is  38.0  and average is  4.22222222222

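The first way uses less memory. It keeps only a few variables, such as the running total and the count, no matter how many numbers you type in.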

The second way is what you use if you need to do more than one thing with the list of numbers.
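To make the difference concrete, here's a sketch that packages both approaches as functions and runs them on the numbers I typed in above. The function names are hypothetical. The code avoids input() so it behaves the same under Python 2.7 or Python 3. Because running_average() walks through its argument only once, you could hand it numbers one at a time from an iterator, and it would never hold the whole list:

```python
def running_average(values):
    """Streaming: keep only a running total and a count, never the list."""
    total = 0
    count = 0
    for v in values:
        total = total + v
        count = count + 1
    return total, float(total) / count

def list_average(values):
    """Whole list: store every value, then let sum() and len() do the work."""
    numbers = [float(v) for v in values]
    total = sum(numbers)
    return total, total / len(numbers)

typed = [2, 6, 3, 2, 3, 6, 4, 9, 3]   # the numbers typed in above
print(running_average(typed))
print(list_average(typed))
print(running_average(iter(typed)))   # works on a one-pass iterator too
```

Both should report a sum of 38 and an average of 38/9, about 4.22, matching the two runs above.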

Let's say you want to remember what the 4th number entered was and print it out at the end. 

You could add code to the first one like this:

total = 0
count = 0
memo = 0
for i in range( 1, 10):
  numstr = input()
  previous = total
  total = total + numstr
  count = count + 1
  if count == 4:
    memo = numstr
    # end of conditional
  print count, ": ", previous, " + ", numstr, " = ", total
  # end of loop

average = float( total ) / count
print "average of ", total, " / ", count, " is ", average
print "4th number:", memo

Essentially, watch the count, and when it hits 4 put the input number in the memo variable.

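Note that, in a program file, Python doesn't actually need those empty lines or comments. The change in indentation by itself ends the loop and the conditional if. I'm marking the end of each block with a comment anyway, so I can see where the indentation stops. (Comments have no code, and can serve as empty lines.)

At the interactive >>> prompt, it's different. There, an empty line is what tells Python you've finished typing a block. So an empty line in the middle of a block ends it early, and the next indented line gets an "unexpected indent" complaint.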

Here's a way we can do it if we keep the whole list in memory:

numbers = []
for i in range( 1, 10 ):
  numstr = input()
  numbers.append( float( numstr ) )
  print i, ": ", numstr, numbers 
  # End of loop

total = sum( numbers )
average = total / len( numbers )

print "list: ", numbers
print "sum is ", total, " and average is ", average
# first is numbers[ 0 ], 4th is numbers[ 3 ]
print "4th number:", numbers[ 3 ]

Since we still have the whole list in memory, we can just index the 4th element. Remember, lists start with index 0, so the first one is numbers[0]. Therefore, the 4th one is at index 3, numbers[3].

Hopefully, this will be enough for my friend to work it through.

Monday, October 28, 2024

ALPP 03-XX -- Personalized "Hello World" on Four Processors

(False start, keeping for notes.)

Personalized "Hello World" on Four CPUs.

(Title Page/Index)

We can now get the ASCII code of a key input from the keyboard, as provided by the monitor/BIOS routines.

But I've been so focused on numbers since we got the "Hello World" string output (barely on the beach) working that I wasn't really thinking about showing how to get a string in. We don't really need to get a string in to do a simple calculator.

I think I had good reason. Numbers are pretty much what computers do, and getting information in and out of them is largely a matter of getting numbers in and out, so numbers are important.

But there's a cycle of understanding in this. Parsing numbers for real requires parsing a string, even if you're riding the keyboard. So I have two paths I can take you down, and it occurs to me that I possibly should have done the personalized Hello World in the 2nd unit.

Since I have the detour through stack frames in between, it's going to be a bit hard to just go back and patch that in, and then send back all the readers who have already gotten past this point.

So, maybe, before we go parsing numbers, we should first learn how to get a string from the keyboard? 

That's what we're going to do here.
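Before any assembly language, here's the logic I have in mind, modeled in Python because it's handy. This is only a sketch under my own assumptions:

- keys arrive one character code at a time from some getch() function, standing in for the monitor/BIOS input routine;
- carriage return ($0D) ends the line;
- backspace ($08) erases the last character;
- anything past the buffer length is dropped.

The names read_line and getch are hypothetical:

```python
CR = 0x0D   # carriage return ends the line (assumption)
BS = 0x08   # backspace erases the previous character (assumption)

def read_line(getch, maxlen):
    """Collect character codes from getch() until CR; return them as a string."""
    buf = []
    while True:
        code = getch()
        if code == CR:
            break
        if code == BS:
            if buf:             # nothing to erase on an empty line
                buf.pop()
            continue
        if len(buf) < maxlen:   # keep the buffer from overflowing
            buf.append(chr(code))
        # characters beyond maxlen are silently dropped (one possible policy)
    return "".join(buf)

# Usage: feed canned keystrokes instead of a real keyboard.
keys = iter([ord(c) for c in "Jox"] + [BS, ord("e"), ord("l"), CR])
name = read_line(lambda: next(keys), 16)
print("Hello " + name)
```

Feeding it canned keystrokes lets you test the logic without a keyboard. J, o, x, backspace, e, l, carriage return comes out as "Joel".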

Saturday, October 26, 2024

ALPP 03-14 -- Keyboard Input Routines and Character Code Output on the 68000 (Debug Session -- Dealloc Error)

Keyboard Input Routines
and Character Coded Output
on the 68000
(Debug Session -- Dealloc Error)

(Title Page/Index)

So we found one of the bugs in our test program that reads the keyboard and shows the character and its character code in binary and hexadecimal. 

And I told you we should use techniques that I have described to check that stack balance has been maintained.

But inserting test code into code is not just a great way to test code, it's also a great way to insert new bugs, mask old bugs, and increase opportunities to accidentally alter the code.

So we want to figure out which parts of the code we need to look at before we insert any code to look at them with.

Let's start another Hatari session and set some breakpoints. You'll want the assembly output listing from vasm in a text editor window for reference. In my case, I called it "inkey_68K.list" when I did the assembly the last time:

vasmm68k_mot -Ftos -no-opt -o INKEY_68K.PRG -L inkey_68K.list inkey_68K.s

Without either the listing or the source code open to look at, you'll be flying blind. Even with the listing, you'll be flying on instruments, so to speak.

Break into the debugger and set the TEXT breakpoint, of course. Then (c)ontinue:

----------------------------------------------------------------------
You have entered debug mode. Type c to continue emulation, h for help.

CPU=$e1d7e2, VBL=1366, FrameCycles=128, HBL=0, LineCycles=128, DSP=N/A
00e1d7e2 46c0                     move.w d0,sr
> b pc=TEXT
CPU condition breakpoint 1 with 1 condition(s) added:
	pc = TEXT
> c
Returning to emulation...

Back at the EMUCON console, invoke the program (INKEY_68K.PRG above) and when it tries to enter the TEXT segment code it will take you to the breakpoint. 

Step into the BRA START at ENTRY and take a disassembly from the PC at START:

1. CPU breakpoint condition(s) matched 1 times.
	pc = TEXT
Reading symbols from program '/home/nova/usr/share/hatari/C:/primer/char_io/stepinch/INKEY_68K.PRG' symbol table...
TOS executable, DRI / GST symbol table, reloc=0, program flags: PRIVATE (0x0)
Program section sizes:
  text: 0x350, data: 0x0, bss: 0x0, symtab: 0x32c
Trying to load DRI symbol table at offset 0x36c...
Offsetting BSS/DATA symbols from TEXT section.
Skipping duplicate address & symbol name checks when autoload is enabled.
Loaded 56 symbols (41 TEXT) from '/home/nova/usr/share/hatari/C:/primer/char_io/stepinch/INKEY_68K.PRG'.

CPU=$13d10, VBL=3801, FrameCycles=210184, HBL=206, LineCycles=888, DSP=N/A
00013d10 6000 02a0                bra.w #$02a0 == $00013fb2 (T)
> s

CPU=$13fb2, VBL=3801, FrameCycles=210196, HBL=206, LineCycles=900, DSP=N/A
00013fb2 6100 ff34                bsr.w #$ff34 == $00013ee8
> d
(PC)
START:
00013fb2 6100 ff34                bsr.w #$ff34 == $00013ee8
00013fb6 4e71                     nop 
00013fb8 6100 005a                bsr.w #$005a == $00014014
DONE:
00013fbc 4e71                     nop 
00013fbe 4ced f000 0008           movem.l (a5,$0008) == $00014068,a4-a7
00013fc4 4e71                     nop 
00013fc6 4e71                     nop 
00013fc8 4e71                     nop 
00013fca 4e71                     nop 
00013fcc 4267                     clr.w -(a7) [0000]
00013fce 4e41                     trap #$01
INCHNE:
00013fd0 610a                     bsr.b #$0a == $00013fdc
00013fd2 c0bc 0000 ffff           and.l #$0000ffff,d0
00013fd8 2d00                     move.l d0,-(a6) [00000000]
00013fda 4e75                     rts  == $00000000
INCHV:
00013fdc 3f3c 0002                move.w #$0002,-(a7) [0000]
00013fe0 3f3c 0002                move.w #$0002,-(a7) [0000]
00013fe4 4e4d                     trap #$0d
> 

Get a look at the registers and step through INITRT, watching the stack and run-time initializations. Show the registers again at the return, even if you don't need to see them before that.

Remember that INITRT returns through JMP A0, not RTS:

> r

  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00000000   A1 00000000   A2 00000000   A3 00000000 
  A4 00014060   A5 00014060   A6 00077FC6   A7 00077FF8 
USP  00077FF8 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff34 (ILLEGAL) Chip latch 00000000
00013fb2 6100 ff34                bsr.w #$ff34 == $00013ee8
Next PC: 00013fb6
> s

CPU=$13ee8, VBL=3801, FrameCycles=210216, HBL=206, LineCycles=920, DSP=N/A
00013ee8 205f                     movea.l (a7)+ [00013fb6],a0
> s

CPU=$13eea, VBL=3801, FrameCycles=210228, HBL=206, LineCycles=932, DSP=N/A
00013eea 47fa fe24                lea.l (pc,$fe24) == $00013d10,a3
> s

CPU=$13eee, VBL=3801, FrameCycles=210236, HBL=206, LineCycles=940, DSP=N/A
00013eee 48eb f000 0008           movem.l a4-a7,(a3,$0008) == $00013d18
> s

CPU=$13ef4, VBL=3801, FrameCycles=210280, HBL=206, LineCycles=984, DSP=N/A
00013ef4 2a4b                     movea.l a3,a5
> s

CPU=$13ef6, VBL=3801, FrameCycles=210284, HBL=206, LineCycles=988, DSP=N/A
00013ef6 4fed 0148                lea.l (a5,$0148) == $00013e58,a7
> s

CPU=$13efa, VBL=3801, FrameCycles=210292, HBL=206, LineCycles=996, DSP=N/A
00013efa 4ded 01d0                lea.l (a5,$01d0) == $00013ee0,a6
> s

CPU=$13efe, VBL=3801, FrameCycles=210300, HBL=206, LineCycles=1004, DSP=N/A
00013efe 4ed0                     jmp (a0)
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00013FB6   A1 00000000   A2 00000000   A3 00013D10 
  A4 00014060   A5 00013D10   A6 00013EE0   A7 00013E58 
USP  00013E58 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 4ed0 (JMP) 4286 (CLR) Chip latch 00000000
00013efe 4ed0                     jmp (a0)
Next PC: 00013f00
> 
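If you want to double-check what INITRT did without trusting the debugger's arithmetic, you can work the addresses out by hand. Here's a sketch in Python, used only as a calculator; the program itself needs nothing like this. It assumes the usual 68000 rule: a (d16,PC) displacement is sign-extended and added to the address of the extension word, which is the instruction's address plus 2:

```python
def sext16(word):
    """Sign-extend a 16-bit value (0..$FFFF) to a Python int."""
    return word - 0x10000 if word & 0x8000 else word

def pc_relative(insn_addr, disp16):
    # The displacement is added to the address of the extension word,
    # which sits 2 bytes past the start of the instruction.
    return insn_addr + 2 + sext16(disp16)

a3 = pc_relative(0x13EEA, 0xFE24)   # lea.l (pc,$fe24),a3
a5 = a3                             # movea.l a3,a5
a7 = a5 + 0x0148                    # lea.l (a5,$0148),a7 : return stack
a6 = a5 + 0x01D0                    # lea.l (a5,$01d0),a6 : parameter stack
print("A5=$%X A7=$%X A6=$%X" % (a5, a7, a6))
```

It should report A5=$13D10, the start of TEXT where the first breakpoint hit. It should also report A7=$13E58 and A6=$13EE0, matching the register dump.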

Step into the main routine, PGSTRT, and, before you step too far, show the registers and get a disassembly from the PC. Take particular note of the parameter stack pointer, A6.

Remember, when you step and when you show registers, it shows you the next op-code to perform, not the one just completed:

> s

CPU=$13fb6, VBL=3801, FrameCycles=210308, HBL=206, LineCycles=1012, DSP=N/A
00013fb6 4e71                     nop 
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00013FB6   A1 00000000   A2 00000000   A3 00013D10 
  A4 00014060   A5 00013D10   A6 00013EE0   A7 00013E58 
USP  00013E58 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 4e71 (NOP) 6100 (BSR) Chip latch 00000000
00013fb6 4e71                     nop 
Next PC: 00013fb8
> s

CPU=$13fb8, VBL=3801, FrameCycles=210312, HBL=207, LineCycles=0, DSP=N/A
00013fb8 6100 005a                bsr.w #$005a == $00014014
> s

CPU=$14014, VBL=3801, FrameCycles=210332, HBL=207, LineCycles=20, DSP=N/A
00014014 41fa ffd4                lea.l (pc,$ffd4) == $00013fea,a0
> s

CPU=$14018, VBL=3801, FrameCycles=210340, HBL=207, LineCycles=28, DSP=N/A
00014018 2d08                     move.l a0,-(a6) [00000000]
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00013FEA   A1 00000000   A2 00000000   A3 00013D10 
  A4 00014060   A5 00013D10   A6 00013EE0   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 2d08 (MOVE) 6100 (BSR) Chip latch 00000000
00014018 2d08                     move.l a0,-(a6) [00000000]
Next PC: 0001401a
> s

CPU=$1401a, VBL=3801, FrameCycles=210352, HBL=207, LineCycles=40, DSP=N/A
0001401a 6100 ff50                bsr.w #$ff50 == $00013f6c
> d
(PC)
0001401a 6100 ff50                bsr.w #$ff50 == $00013f6c
0001401e 6100 ffb0                bsr.w #$ffb0 == $00013fd0
00014022 2d16                     move.l (a6) [00013fea],-(a6) [00000000]
00014024 41fa ffe2                lea.l (pc,$ffe2) == $00014008,a0
00014028 2d08                     move.l a0,-(a6) [00000000]
0001402a 6100 ff40                bsr.w #$ff40 == $00013f6c
0001402e 6100 ff24                bsr.w #$ff24 == $00013f54
00014032 2d16                     move.l (a6) [00013fea],-(a6) [00000000]
00014034 41fa ffd7                lea.l (pc,$ffd7) == $0001400d,a0
00014038 2d08                     move.l a0,-(a6) [00000000]
0001403a 6100 ff30                bsr.w #$ff30 == $00013f6c
0001403e 6100 fef0                bsr.w #$fef0 == $00013f30
00014042 2d16                     move.l (a6) [00013fea],-(a6) [00000000]
00014044 41fa ffc9                lea.l (pc,$ffc9) == $0001400f,a0
00014048 2d08                     move.l a0,-(a6) [00000000]
0001404a 6100 ff20                bsr.w #$ff20 == $00013f6c
0001404e 6100 feb0                bsr.w #$feb0 == $00013f00
00014052 6100 fef6                bsr.w #$fef6 == $00013f4a
00014056 221e                     move.l (a6)+ [00013fea],d1
00014058 b23c 0051                cmp.b #$51,d1
0001405c 66b6                     bne.b #$b6 == $00014014 (T)
0001405e 4e75                     rts  == $00013fbc
>

Use the listing and the disassembly to work out the addresses for the breakpoints, and set a breakpoint after every call:

> b pc=$1401e
CPU condition breakpoint 2 with 1 condition(s) added:
	pc = $1401e
> b pc=$14022
CPU condition breakpoint 3 with 1 condition(s) added:
	pc = $14022
> b pc=$1402e
CPU condition breakpoint 4 with 1 condition(s) added:
	pc = $1402e
> b pc=$14032
CPU condition breakpoint 5 with 1 condition(s) added:
	pc = $14032
> b pc=$1403e
CPU condition breakpoint 6 with 1 condition(s) added:
	pc = $1403e
> b pc=$14042
CPU condition breakpoint 7 with 1 condition(s) added:
	pc = $14042
> b pc=$1404e
CPU condition breakpoint 8 with 1 condition(s) added:
	pc = $1404e
> b pc=$14052
CPU condition breakpoint 9 with 1 condition(s) added:
	pc = $14052
> b pc=$14056
CPU condition breakpoint 10 with 1 condition(s) added:
	pc = $14056
> 
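The breakpoint addresses are just the return addresses of the calls. Each bsr.w is four bytes long: the $6100 op-code word plus a 16-bit displacement word. So execution resumes four bytes past the bsr. Here's a quick sketch that generates the same b commands from the call addresses. I copied the list by hand from the disassembly above, and the Python is only a calculator:

```python
BSR_W_SIZE = 4   # op-code word + one 16-bit displacement word

# Addresses of the bsr.w instructions in PGSTRT, from the disassembly.
bsr_addrs = [0x1401A, 0x1401E, 0x1402A, 0x1402E, 0x1403A, 0x1403E,
             0x1404A, 0x1404E, 0x14052]

for addr in bsr_addrs:
    # Break on the instruction right after each call, i.e. the return address.
    print("b pc=$%x" % (addr + BSR_W_SIZE))
```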

Show the registers and continue to the first breakpoint, and repeat, watching the stacks, in particular. Check the listing for what should be on the parameter stack before and after each call.

The first call is to OUTS, and the address of the PROMPT string should be on the stack. From $13EE0 at empty stack to $13EDC is four bytes, so that's one address.

> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00013FEA   A1 00000000   A2 00000000   A3 00013D10 
  A4 00014060   A5 00013D10   A6 00013EDC   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff50 (ILLEGAL) Chip latch 00000000
0001401a 6100 ff50                bsr.w #$ff50 == $00013f6c
Next PC: 0001401e
> c
Returning to emulation...
2. CPU breakpoint condition(s) matched 1 times.
	pc = $1401e

CPU=$1401e, VBL=3802, FrameCycles=27092, HBL=26, LineCycles=676, DSP=N/A
0001401e 6100 ffb0                bsr.w #$ffb0 == $00013fd0
> r
  D0 00000001   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00000000   A1 00002F50   A2 00000000   A3 00014008 
  A4 00014060   A5 00013D10   A6 00013EE0   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=1 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ffb0 (ILLEGAL) Chip latch 00000000
0001401e 6100 ffb0                bsr.w #$ffb0 == $00013fd0
Next PC: 00014022
> 

When it returns, A6 = $13EE0 shows that the string address has been removed, and the stack is empty again.
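Every parameter-stack entry here is a 4-byte long, and A6 grows downward from $13EE0. So the stack depth can be read straight off A6. Here's a small helper as a sketch; the names EMPTY, CELL, and depth are mine:

```python
EMPTY = 0x13EE0   # A6 right after INITRT: empty parameter stack
CELL = 4          # each parameter-stack entry is a 32-bit long

def depth(a6):
    """Number of 4-byte cells on the parameter stack (A6 grows downward)."""
    return (EMPTY - a6) // CELL

for a6 in (0x13EE0, 0x13EDC, 0x13ED8, 0x13ED4):
    print("A6=$%X -> %d cell(s)" % (a6, depth(a6)))
```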

Now it calls INCHNE:

> c
Returning to emulation...
3. CPU breakpoint condition(s) matched 1 times.
	pc = $14022

CPU=$14022, VBL=3803, FrameCycles=232088, HBL=228, LineCycles=440, DSP=N/A
00014022 2d16                     move.l (a6) [0000000d],-(a6) [00000000]
> r
  D0 0000000D   D1 00002310   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00002F50   A1 00002F50   A2 00000000   A3 00014008 
  A4 00014060   A5 00013D10   A6 00013EDC   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 2d16 (MOVE) 41fa (LEA) Chip latch 00000000
00014022 2d16                     move.l (a6) [0000000d],-(a6) [00000000]
Next PC: 00014024
> 

When it returns from INCHNE, it has the character on the stack as a full 4-byte integer (A6 = $13EDC). 

I didn't look at the contents of the parameter stack here, but, if you need to, you can use the (m)emory dump command

> m a6 32

to show the top 32 bytes.

It should be noted that the comment "duplicate" in the source code somehow moved two lines below where it should be.

> s

CPU=$14024, VBL=3803, FrameCycles=232108, HBL=228, LineCycles=460, DSP=N/A
00014024 41fa ffe2                lea.l (pc,$ffe2) == $00014008,a0
> s

CPU=$14028, VBL=3803, FrameCycles=232116, HBL=228, LineCycles=468, DSP=N/A
00014028 2d08                     move.l a0,-(a6) [00000000]
> s

CPU=$1402a, VBL=3803, FrameCycles=232128, HBL=228, LineCycles=480, DSP=N/A
0001402a 6100 ff40                bsr.w #$ff40 == $00013f6c
> r
  D0 0000000D   D1 00002310   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00014008   A1 00002F50   A2 00000000   A3 00014008 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff40 (ILLEGAL) Chip latch 00000000
0001402a 6100 ff40                bsr.w #$ff40 == $00013f6c
Next PC: 0001402e
> 

At this point, A6 is $13ED4, which means three integers are on the stack: the character input, a copy (duplicate) of the character, and a pointer to a bit of leader text to demarcate it. And we're going to call OUTS and OUTC to show the character.

> c
Returning to emulation...
4. CPU breakpoint condition(s) matched 1 times.
	pc = $1402e

CPU=$1402e, VBL=3803, FrameCycles=243532, HBL=239, LineCycles=708, DSP=N/A
0001402e 6100 ff24                bsr.w #$ff24 == $00013f54
> r
  D0 00000001   D1 0007A309   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 000000A0   A1 00002F50   A2 00000000   A3 0001400D 
  A4 00014060   A5 00013D10   A6 00013ED8   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=1 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff24 (ILLEGAL) Chip latch 00000000
0001402e 6100 ff24                bsr.w #$ff24 == $00013f54
Next PC: 00014032
> c
Returning to emulation...
5. CPU breakpoint condition(s) matched 1 times.
	pc = $14032

CPU=$14032, VBL=3803, FrameCycles=245588, HBL=241, LineCycles=732, DSP=N/A
00014032 2d16                     move.l (a6) [0000000d],-(a6) [0000000d]
> 

I should have shown the registers again, to show that A6 was back to $13EDC after the call. But I stepped instead. It's okay; we can deduce where things were from the next register dump.

> s

CPU=$14034, VBL=3803, FrameCycles=245608, HBL=241, LineCycles=752, DSP=N/A
00014034 41fa ffd7                lea.l (pc,$ffd7) == $0001400d,a0
> s

CPU=$14038, VBL=3803, FrameCycles=245616, HBL=241, LineCycles=760, DSP=N/A
00014038 2d08                     move.l a0,-(a6) [00014008]
> s

CPU=$1403a, VBL=3803, FrameCycles=245628, HBL=241, LineCycles=772, DSP=N/A
0001403a 6100 ff30                bsr.w #$ff30 == $00013f6c
> r
  D0 00000001   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 0000000D 
  A0 0001400D   A1 00002F50   A2 00000000   A3 0001400D 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff30 (ILLEGAL) Chip latch 00000000
0001403a 6100 ff30                bsr.w #$ff30 == $00013f6c
Next PC: 0001403e
> 

Just before the call that puts out the colon ahead of the binary character code, the character, a copy, and the string address are on the stack: A6 = $13ED4.

> c
Returning to emulation...
6. CPU breakpoint condition(s) matched 1 times.
	pc = $1403e

CPU=$1403e, VBL=3803, FrameCycles=248512, HBL=244, LineCycles=608, DSP=N/A
0001403e 6100 fef0                bsr.w #$fef0 == $00013f30
> r
  D0 00000001   D1 0007A304   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 000000A0   A1 00002F50   A2 00000000   A3 0001400F 
  A4 00014060   A5 00013D10   A6 00013ED8   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=1 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) fef0 (ILLEGAL) Chip latch 00000000
0001403e 6100 fef0                bsr.w #$fef0 == $00013f30
Next PC: 00014042
> 

Now we're between the call that puts out the colon and the call that puts out the binary character code. The character and a copy are on the stack: A6 = $13ED8.

Next we let it call OUTB8:

> c
Returning to emulation...
7. CPU breakpoint condition(s) matched 1 times.
	pc = $14042

CPU=$14042, VBL=3804, FrameCycles=7176, HBL=7, LineCycles=64, DSP=N/A
00014042 2d16                     move.l (a6) [0001400d],-(a6) [00000000]
> r
  D0 00000001   D1 0007A314   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000031 
  A0 000000A0   A1 00002F50   A2 00000000   A3 0001400F 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=1 V=0 C=0 IMASK=3 STP=0
Prefetch 2d16 (MOVE) 41fa (LEA) Chip latch 00000000
00014042 2d16                     move.l (a6) [0001400d],-(a6) [00000000]
Next PC: 00014044
> 

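After the binary output, A6 is $13ED4. It should be back to just the character on the stack, $13EDC. That's eight bytes, or two cells, deeper than it should be. OUTB8 was supposed to consume one cell, and instead the stack grew by one.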
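This is the kind of stack-balance check I mentioned earlier, and it can be done with nothing but the A6 values from the register dumps. Here's a sketch. For each call where we have both before and after values, it compares the observed change in depth against what the routine is supposed to do. The expected effects are my reading of the discussion above (OUTS and OUTB8 each consume one cell, and INCHNE leaves one), so treat them as assumptions:

```python
CELL = 4   # parameter-stack entries are 4-byte longs

# (routine, A6 before the call, A6 after the call, expected change in cells)
# A6 values come from the register dumps in this session; the expected
# changes are assumptions based on what each routine is meant to do.
observed = [
    ("OUTS",   0x13EDC, 0x13EE0, -1),   # prompt string
    ("INCHNE", 0x13EE0, 0x13EDC, +1),   # pushes the character
    ("OUTS",   0x13ED4, 0x13ED8, -1),   # leader text
    ("OUTS",   0x13ED4, 0x13ED8, -1),   # colon
    ("OUTB8",  0x13ED8, 0x13ED4, -1),   # binary output of the copy
]

def net_cells(before, after):
    """Cells added (+) or removed (-) by a call; A6 grows downward."""
    return (before - after) // CELL

def mismatches(calls):
    """Names of the calls whose observed stack change differs from expected."""
    return [name for name, before, after, expected in calls
            if net_cells(before, after) != expected]

for name, before, after, expected in observed:
    got = net_cells(before, after)
    flag = "ok" if got == expected else "MISMATCH"
    print("%-6s expected %+d, got %+d  %s" % (name, expected, got, flag))
```

Only OUTB8 should come up as a mismatch. It grew the stack by one cell where it should have shrunk it by one.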

But let's trace through and see what else we can see.

> s

CPU=$14044, VBL=3804, FrameCycles=7196, HBL=7, LineCycles=84, DSP=N/A
00014044 41fa ffc9                lea.l (pc,$ffc9) == $0001400f,a0
> s

CPU=$14048, VBL=3804, FrameCycles=7204, HBL=7, LineCycles=92, DSP=N/A
00014048 2d08                     move.l a0,-(a6) [00000000]
> s

CPU=$1404a, VBL=3804, FrameCycles=7216, HBL=7, LineCycles=104, DSP=N/A
0001404a 6100 ff20                bsr.w #$ff20 == $00013f6c
> r
  D0 00000001   D1 0007A314   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000031 
  A0 0001400F   A1 00002F50   A2 00000000   A3 0001400F 
  A4 00014060   A5 00013D10   A6 00013ECC   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff20 (ILLEGAL) Chip latch 00000000
0001404a 6100 ff20                bsr.w #$ff20 == $00013f6c
Next PC: 0001404e
> c
Returning to emulation...
8. CPU breakpoint condition(s) matched 1 times.
	pc = $1404e

CPU=$1404e, VBL=3804, FrameCycles=15812, HBL=15, LineCycles=572, DSP=N/A
0001404e 6100 feb0                bsr.w #$feb0 == $00013f00
> r
  D0 00000001   D1 0007A319   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 000000A0   A1 00002F50   A2 00000000   A3 00014013 
  A4 00014060   A5 00013D10   A6 00013ED0   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=1 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) feb0 (ILLEGAL) Chip latch 00000000
0001404e 6100 feb0                bsr.w #$feb0 == $00013f00
Next PC: 00014052
> c
Returning to emulation...
9. CPU breakpoint condition(s) matched 1 times.
	pc = $14052

CPU=$14052, VBL=3804, FrameCycles=21640, HBL=21, LineCycles=304, DSP=N/A
00014052 6100 fef6                bsr.w #$fef6 == $00013f4a
> r
  D0 00000001   D1 0007A31D   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000044   D7 00000044 
  A0 000000A0   A1 00002F50   A2 00000000   A3 00014013 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) fef6 (ILLEGAL) Chip latch 00000000
00014052 6100 fef6                bsr.w #$fef6 == $00013f4a
Next PC: 00014056
> c
Returning to emulation...
10. CPU breakpoint condition(s) matched 1 times.
	pc = $14056

CPU=$14056, VBL=3804, FrameCycles=27624, HBL=27, LineCycles=192, DSP=N/A
00014056 221e                     move.l (a6)+ [0001400d],d1
> r
  D0 00000001   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000044   D7 0000000A 
  A0 00000000   A1 00002F50   A2 00000000   A3 00014013 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 221e (MOVE) b23c (CMP) Chip latch 00000000
00014056 221e                     move.l (a6)+ [0001400d],d1
Next PC: 00014058
> s

CPU=$14058, VBL=3804, FrameCycles=27636, HBL=27, LineCycles=204, DSP=N/A
00014058 b23c 0051                cmp.b #$51,d1
> s

CPU=$1405c, VBL=3804, FrameCycles=27644, HBL=27, LineCycles=212, DSP=N/A
0001405c 66b6                     bne.b #$b6 == $00014014 (T)
> r
  D0 00000001   D1 0001400D   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000044   D7 0000000A 
  A0 00000000   A1 00002F50   A2 00000000   A3 00014013 
  A4 00014060   A5 00013D10   A6 00013ED8   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=1 Z=0 V=0 C=1 IMASK=3 STP=0
Prefetch 66b6 (Bcc) 4e75 (RTS) Chip latch 00000000
0001405c 66b6                     bne.b #$b6 == $00014014 (T)
Next PC: 0001405e
> s

CPU=$14014, VBL=3804, FrameCycles=27656, HBL=27, LineCycles=224, DSP=N/A
00014014 41fa ffd4                lea.l (pc,$ffd4) == $00013fea,a0
> s

CPU=$14018, VBL=3804, FrameCycles=27664, HBL=27, LineCycles=232, DSP=N/A
00014018 2d08                     move.l a0,-(a6) [0001400d]
> s

CPU=$1401a, VBL=3804, FrameCycles=27676, HBL=27, LineCycles=244, DSP=N/A
0001401a 6100 ff50                bsr.w #$ff50 == $00013f6c
> r
  D0 00000001   D1 0001400D   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000044   D7 0000000A 
  A0 00013FEA   A1 00002F50   A2 00000000   A3 00014013 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff50 (ILLEGAL) Chip latch 00000000
0001401a 6100 ff50                bsr.w #$ff50 == $00013f6c
Next PC: 0001401e
> 

As we stepped through, OUTHX8 and OUTNWLN did what was expected on the stack, but the 8 extra bytes OUTB8 had left behind were still on the stack at the beginning of the next pass through the loop.

That explains why A6 kept growing until it ran over the return stack, and why, when the CPU tried to return through something that was not a return address, we ended up trying to execute who knows what, with A7 set to who knows what.

So we can look at OUTB8 in rt_rig03_68K.s and see that, yes, indeed, we do seem to be subtracting 4 from A6 on exit instead of adding 4 to drop the parameter and the temporary variable.

Maybe I got confused about which way the stack shrinks when I decided to use ADDQ/SUBQ?

* Output the 8-bit number on the stack in binary (base two).
* For consistency, we are passing the byte in the lowest-order byte
* of a 32-bit word.
* Uses D6, D7.
OUTB8	MOVE.L	(A6),D6	; shift on memory is 16-bit, use register
	MOVE.W	#8,(A6)	; 8 bits to output, borrow parameter high word.
OUTB8L	LSL.B	#1,D6	; Get the leftmost bit of the lowest byte.
	BCS.S	OUTB81
OUTB80	MOVEQ.L	#'0',D7
	BRA.S	OUTB8D
OUTB81	MOVEQ.L	#'1',D7
OUTB8D	BSR.S	OUTCV
	SUBQ.W	#1,(A6)
	BNE.S	OUTB8L	; loop if not Zero
	SUBQ.L	#NATWID,A6	; drop parameter character
	RTS

And subtracting 4 instead of adding 4 would, indeed, leave 8 bytes too many on exit.
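To see where the 8 bytes per pass come from, here is a small C model (a host-side sketch, not 68000 code and not part of the original project) that tallies how each step of one pass through PGSTRT moves A6. Negative numbers are pushes (pre-decrement), positive numbers are pops. The only assumption is the one stated in the listings: every parameter stack item is NATWID (4) bytes.

```c
#include <assert.h>
#include <stdbool.h>

#define NATWID 4  /* bytes per parameter stack item, as in rt_rig03_68K.s */

/* OUTB8 should pop its one parameter (+NATWID). The buggy version
 * ends with SUBQ.L #NATWID,A6, which pushes instead (-NATWID). */
static int outb8_effect(bool buggy)
{
    return buggy ? -NATWID : NATWID;
}

/* Net change to A6, in bytes, across one pass of the PGSTRT loop. */
int pgstrt_pass(bool buggy_outb8)
{
    int a6 = 0;
    a6 -= NATWID;  /* MOVE.L A0,-(A6)    push PROMPT address     */
    a6 += NATWID;  /* BSR.W OUTS         pops string pointer     */
    a6 -= NATWID;  /* BSR.W INCHNE       pushes the character    */
    a6 -= NATWID;  /* MOVE.L (A6),-(A6)  duplicate               */
    a6 -= NATWID;  /* push KEYCOL address                        */
    a6 += NATWID;  /* OUTS                                       */
    a6 += NATWID;  /* OUTC               pops the character      */
    a6 -= NATWID;  /* duplicate                                  */
    a6 -= NATWID;  /* push COLBIN address                        */
    a6 += NATWID;  /* OUTS                                       */
    a6 += outb8_effect(buggy_outb8);  /* OUTB8                   */
    a6 -= NATWID;  /* duplicate                                  */
    a6 -= NATWID;  /* push COLHEX address                        */
    a6 += NATWID;  /* OUTS                                       */
    a6 += NATWID;  /* OUTHX8             pops the byte           */
                   /* OUTNWLN            leaves A6 alone         */
    a6 += NATWID;  /* MOVE.L (A6)+,D1    balance stack           */
    return a6;
}
```

With a correct OUTB8, `pgstrt_pass(false)` nets 0; with the buggy one, `pgstrt_pass(true)` nets -8, so A6 creeps 8 bytes lower on every key press.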

But then wouldn't OUTHX8 output trash from the stack, probably something that was not the character code at all?

Clear the breakpoints and let it run and see if it does.

 

Oh. That's what's happening. And it won't even respond to Q for quit. The only way out is to crash it. Ouch.

Are we convinced?

Then why put the balance checks in? Just for practice?

Yeah. Practice is good.

A quick search through the code shows we aren't using D3 through D5 anywhere at this point. We only need to put stack checks in the main routine and in the OUTB8 routine, so two markers should be sufficient. No need for nesting markers, either.

If we use D3 to mark the stack in PGSTRT and D4 to mark the stack in OUTB8, we don't need to declare variables in memory, and that will help us limit the impact of the debugging code we insert.
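The idea behind the markers can be sketched in C (a model only; the struct and function names are hypothetical, and the `marker` field stands in for D4). On entry, compute what the parameter stack pointer should be on exit from the number of input and output parameters; just before returning, compare and count a "jump to ERROR" on mismatch.

```c
#include <assert.h>
#include <stdint.h>

#define NATWID 4  /* bytes per parameter stack item */

typedef struct {
    uint32_t a6;      /* simulated parameter stack pointer */
    uint32_t marker;  /* plays the role of D4 */
    int errors;       /* counts jumps to ERROR */
} Machine;

/* Entry: like LEA NATWID(A6),A0 / MOVE.L A0,D4 for 1 input, 0 outputs.
 * Unsigned arithmetic wraps, so a net push (negative) also works. */
static void mark_entry(Machine *m, int inputs, int outputs)
{
    m->marker = m->a6 + (uint32_t)((inputs - outputs) * NATWID);
}

/* Exit: like CMP.L D4,A6 / BNE.W ERROR */
static void check_exit(Machine *m)
{
    if (m->a6 != m->marker)
        m->errors++;
}

/* Models only OUTB8's stack handling, with or without the bug. */
void outb8_model(Machine *m, int buggy)
{
    mark_entry(m, 1, 0);
    /* ... output eight bits, borrowing the parameter slot as a counter ... */
    if (buggy)
        m->a6 -= NATWID;  /* SUBQ.L #NATWID,A6: the bug */
    else
        m->a6 += NATWID;  /* ADDQ.L #NATWID,A6: drop the parameter */
    check_exit(m);
}
```

The point of computing the expected exit value up front is that the check at the bottom stays a single compare, no matter how the routine shuffles the stack in between.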

For inserting the code, we could talk about conditional assembly, but we really need to start simple, so let's just put marker comments around the code we insert.

Here's what we'll do with OUTB8, with the bug still in place: 

* Output the 8-bit number on the stack in binary (base two).
* For consistency, we are passing the byte in the lowest-order byte
* of a 32-bit word.
* Uses D6, D7.
OUTB8	MOVE.L	(A6),D6	; shift on memory is 16-bit, use register
******
* DEBUG 1 input parameter, no output parameters
	LEA	NATWID(A6),A0	; this is what A6 should be when we leave.
	MOVE.L	A0,D4
* END DEBUG
******
	MOVE.W	#8,(A6)	; 8 bits to output, borrow parameter high word.
OUTB8L	LSL.B	#1,D6	; Get the leftmost bit of the lowest byte.
	BCS.S	OUTB81
OUTB80	MOVEQ.L	#'0',D7
	BRA.S	OUTB8D
OUTB81	MOVEQ.L	#'1',D7
OUTB8D	BSR.S	OUTCV
	SUBQ.W	#1,(A6)
	BNE.S	OUTB8L	; loop if not Zero
	SUBQ.L	#NATWID,A6	; drop parameter character
******
* DEBUG
	CMP.L	D4,A6
	BNE.W	ERROR
* END DEBUG
******
	RTS

And here's what we'll do with PGSTRT:
	EVEN
PGSTRT	LEA	PROMPT(PC),A0
******
* DEBUG no input or output parameters
	MOVE.L	A6,D3	; this is what A6 should be when we leave.
* END DEBUG
******
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	INCHNE	; Hold off echo
	MOVE.L	(A6),-(A6)	; duplicate
	LEA	KEYCOL(PC),A0
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	OUTC	; output character
	MOVE.L	(A6),-(A6)	; duplicate
	LEA	COLBIN(PC),A0
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	OUTB8
	MOVE.L	(A6),-(A6)	; duplicate
	LEA	COLHEX(PC),A0
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	OUTHX8
	BSR.W	OUTNWLN
	MOVE.L	(A6)+,D1	; balance stack
	CMP.B	#ASCQ,D1
	BNE.S	PGSTRT
******
* DEBUG
	CMP.L	D3,A6
	BNE.W	ERROR
* END DEBUG
******
	RTS

You should be able to just copy the debug lines and paste them into place. Don't forget the comment lines so you know what to remove.

Add a NOP immediately after DONE, with the label ERROR:

START	BSR.W	INITRT
	NOP		; place to set breakpoint
*
	BSR.W	PGSTRT
*
DONE	NOP		; place to set breakpoint
ERROR	NOP		; place to go on errors
	MOVEM.L	A4SAVE-LOCBAS(A5),A4-A7	; restore the monitor's A4-A7

Assemble it and start a Hatari session, breaking out and setting a breakpoint at TEXT, as usual:

$ vasmm68k_mot -Ftos -no-opt -o INKEY_68K.PRG -L inkey_68K.list inkey_68K.s
vasm 1.9f (c) in 2002-2023 Volker Barthelmann
vasm M68k/CPU32/ColdFire cpu backend 2.6c (c) 2002-2023 Frank Wille
vasm motorola syntax module 3.18 (c) 2002-2023 Frank Wille
vasm tos output module 2.3 (c) 2009-2016,2020,2021,2023 Frank Wille

text(acrx2):	         870 bytes
nova@she:~/usr/share/hatari/C:/primer/char_io/stepinch$ hatari
INFO : Hatari v2.4.0-devel (Dec 18 2021), compiled on:  Dec 18 2021, 11:41:51
INFO : Inserted disk '/home/nova/usr/share/hatari/fig68kwrk.st' to drive A:.
INFO : Inserted disk '/home/nova/usr/share/hatari/stuff.st' to drive B:.
MMU emulation requires 68030/040/060 and it is not JIT compatible.
INFO : Mounting IDE hard drive image /home/nova/work/emu/hatari/hd80mb.image
INFO : GEMDOS HDD emulation, C: <-> /home/nova/usr/share/hatari/C:.
WARN : GEMDOS HD drive C: (may) override ACSI/SCSI/IDE image partitions!
MMU emulation requires 68030/040/060 and it is not JIT compatible.
WARN : Bus Error reading at address $ffffa200, PC=$e00ce2 addr_e3=e00ce2 op_e3=4a10
WARN : No GEMDOS dir '/home/nova/usr/share/hatari/C:/AUTO'

----------------------------------------------------------------------
You have entered debug mode. Type c to continue emulation, h for help.

CPU=$e1d7e2, VBL=1395, FrameCycles=128, HBL=0, LineCycles=128, DSP=N/A
00e1d7e2 46c0                     move.w d0,sr
> b pc=TEXT
CPU condition breakpoint 1 with 1 condition(s) added:
	pc = TEXT
> c
Returning to emulation...

When you run INKEY_68K.PRG (or pretty much anything not built in) from the EmuTOS console, it will break. Step once to the START label and disassemble from the PC:

1. CPU breakpoint condition(s) matched 1 times.
	pc = TEXT
Reading symbols from program '/home/nova/usr/share/hatari/C:/primer/char_io/stepinch/INKEY_68K.PRG' symbol table...
TOS executable, DRI / GST symbol table, reloc=0, program flags: PRIVATE (0x0)
Program section sizes:
  text: 0x366, data: 0x0, bss: 0x0, symtab: 0x33a
Trying to load DRI symbol table at offset 0x382...
Offsetting BSS/DATA symbols from TEXT section.
Skipping duplicate address & symbol name checks when autoload is enabled.
Loaded 57 symbols (42 TEXT) from '/home/nova/usr/share/hatari/C:/primer/char_io/stepinch/INKEY_68K.PRG'.

CPU=$13d10, VBL=102109, FrameCycles=166200, HBL=163, LineCycles=592, DSP=N/A
00013d10 6000 02ac                bra.w #$02ac == $00013fbe (T)
> s

CPU=$13fbe, VBL=102109, FrameCycles=166212, HBL=163, LineCycles=604, DSP=N/A
00013fbe 6100 ff28                bsr.w #$ff28 == $00013ee8
> d
(PC)
START:
00013fbe 6100 ff28                bsr.w #$ff28 == $00013ee8
00013fc2 4e71                     nop 
00013fc4 6100 005c                bsr.w #$005c == $00014022
DONE:
00013fc8 4e71                     nop 
ERROR:
00013fca 4e71                     nop 
00013fcc 4ced f000 0008           movem.l (a5,$0008) == $0001407e,a4-a7
00013fd2 4e71                     nop 
00013fd4 4e71                     nop 
00013fd6 4e71                     nop 
00013fd8 4e71                     nop 
00013fda 4267                     clr.w -(a7) [0000]
00013fdc 4e41                     trap #$01
INCHNE:
00013fde 610a                     bsr.b #$0a == $00013fea
00013fe0 c0bc 0000 ffff           and.l #$0000ffff,d0
00013fe6 2d00                     move.l d0,-(a6) [00000000]
00013fe8 4e75                     rts  == $00000000
INCHV:
00013fea 3f3c 0002                move.w #$0002,-(a7) [0000]
> 

Set two breakpoints, one at the label DONE and one at ERROR (the NOP immediately after). Then continue:

> b pc=$13fc8
CPU condition breakpoint 2 with 1 condition(s) added:
	pc = $13fc8
> b pc=$13fca
CPU condition breakpoint 3 with 1 condition(s) added:
	pc = $13fca
> c
Returning to emulation...

Back in the EmuTOS console, it will be waiting for you to hit a key. When you hit the first key, it should break, and the console should stop responding. Return to the debugger and check which breakpoint it took. It should be the one at the ERROR label:

3. CPU breakpoint condition(s) matched 1 times.
	pc = $13fca

CPU=$13fca, VBL=140869, FrameCycles=94972, HBL=93, LineCycles=484, DSP=N/A
00013fca 4e71                     nop 
> 

In this case, you can see it was the one at $13FCA, which is where we set the second of those two breakpoints. Well, where I set it.

We probably should have set up two ERROR labels, one for OUTB8 to jump to, and one for the main routine, PGSTRT, to jump to. But you can check the return stack for clues:

> r
  D0 00000001   D1 0007A31D   D2 00000000   D3 00013EE0 
  D4 00013EDC   D5 00000000   D6 00000000   D7 00000030 
  A0 000000A0   A1 00002F50   A2 00000000   A3 0001401D 
  A4 00014076   A5 00013D10   A6 00013ED4   A7 00013E50 
USP  00013E50 ISP  00007E64 
T=00 S=0 M=0 X=0 N=1 Z=0 V=0 C=1 IMASK=3 STP=0
Prefetch 4e71 (NOP) 4ced (MVMEL) Chip latch 00000000
00013fca 4e71                     nop 
Next PC: 00013fcc
> m A7 32
00013E50: 00 01 40 52 00 01 3f c8 00 00 00 00 00 00 00 00   ..@R..?.........
00013E60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
>

If I had stepped through the initializations and done a register dump with the stacks set up, we would have an idea what A7 should be.

Those addresses on the return stack should be return addresses, pointing at code:
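Reading those addresses out of the `m A7 32` dump relies on the 68000 being big-endian: each 32-bit long is stored most significant byte first. A minimal C helper (hypothetical name, host-side only) makes the decoding explicit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index-th big-endian 32-bit long from a raw memory dump. */
uint32_t be_long(const uint8_t *bytes, size_t index)
{
    const uint8_t *p = bytes + index * 4;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Fed the first eight bytes of the dump, `00 01 40 52 00 01 3f c8`, it yields $00014052 and $00013FC8, the two addresses disassembled next.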

> d $14052
00014052 2d16                     move.l (a6) [0001401b],-(a6) [00000000]
00014054 41fa ffc7                lea.l (pc,$ffc7) == $0001401d,a0
00014058 2d08                     move.l a0,-(a6) [00000000]
0001405a 6100 ff1c                bsr.w #$ff1c == $00013f78
0001405e 6100 fea0                bsr.w #$fea0 == $00013f00
00014062 6100 fef2                bsr.w #$fef2 == $00013f56
00014066 221e                     move.l (a6)+ [0001401b],d1
00014068 b23c 0051                cmp.b #$51,d1
0001406c 66b4                     bne.b #$b4 == $00014022 (T)
0001406e bdc3                     cmpa.l d3,a6
00014070 6600 ff58                bne.w #$ff58 == $00013fca (T)
00014074 4e75                     rts  == $00014052
00014076 0000 0000                or.b #$00,d0
...
> d $13fc8
DONE:
00013fc8 4e71                     nop 
(PC)
ERROR:
00013fca 4e71                     nop 
00013fcc 4ced f000 0008           movem.l (a5,$0008) == $00013d18,a4-a7
00013fd2 4e71                     nop 
00013fd4 4e71                     nop 
00013fd6 4e71                     nop 
00013fd8 4e71                     nop 
00013fda 4267                     clr.w -(a7) [3f48]
00013fdc 4e41                     trap #$01
INCHNE:
00013fde 610a                     bsr.b #$0a == $00013fea
00013fe0 c0bc 0000 ffff           and.l #$0000ffff,d0
00013fe6 2d00                     move.l d0,-(a6) [00000000]
00013fe8 4e75                     rts  == $00014052
INCHV:
00013fea 3f3c 0002                move.w #$0002,-(a7) [3f48]
00013fee 3f3c 0002                move.w #$0002,-(a7) [3f48]
00013ff2 4e4d                     trap #$0d
00013ff4 588f                     addaq.l #$04,a7
00013ff6 4e75                     rts  == $00014052
> 

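The top return address, $14052, is the instruction right after the BSR.W OUTB8 in PGSTRT, and under it is $13FC8, PGSTRT's own return to DONE. OUTB8 had not returned yet, so that pretty much proves that the jump to ERROR was from OUTB8. 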

And we can also look at what's on the parameter stack:

> m a6 32
00013ED4: 00 01 40 1b 00 00 00 6a 00 00 00 6a 00 00 00 00   ..@....j...j....
00013EE4: 00 00 00 00 20 5f 47 fa fe 24 48 eb f0 00 00 08   .... _G..$H.....
> 

We can see two copies of the character of the key I hit, "j".

I wonder what's at $1401b?

> d $1401b
COLBIN:
0001401b 3a00                     move.w d0,d5
COLHEX:
0001401d 3a20                     move.w -(a0) [079c],d5
0001401f 2400                     move.l d0,d2
...

COLBIN and COLHEX. Okay.

> m $1401b 32
0001401B: 3a 00 3a 20 24 00 00 41 fa ff d4 26 0e 2d 08 61   :.: $..A...&.-.a
0001402B: 00 ff 4c 61 00 ff ae 2d 16 41 fa ff e0 2d 08 61   ..La...-.A...-.a
> m $14000 32
00014000: 6e 79 20 6b 65 79 2c 20 51 20 74 6f 20 71 75 69   ny key, Q to qui
00014010: 74 2e 20 0d 0a 00 4b 45 59 3a 00 3a 00 3a 20 24   t. ...KEY:.:.: $
> 

So that's the address of the colon that goes before the binary output, left over from the call in PGSTRT. Dead on. 

Let's fix the code in OUTB8 and run it with the stack checks in place. I've told you what to delete, and you know what to fix. The question is whether to use

	ADDQ	#NATWID,A6

or

	LEA	NATWID(A6),A6

Which I will leave up to you, along with actually stepping through the code and proving that it doesn't ERROR out. 

Don't assume that it will work as advertised. Go ahead and take the fifteen minutes or so to try it. Practice is essential. And you may think of something else interesting to try while you're at it.

I've been planning next to build a post-fix (RPN) integer calculator that only does addition and subtraction, in binary, octal, or hexadecimal. We know enough to do this, and, when we have it working, we can add multiplication and division and more interesting stuff.

But first, let's figure out how to output decimal numbers.


(Title Page/Index)


Thursday, October 24, 2024

ALPP 03-13 -- Keyboard Input Routines and Character Code Output on the 68000 (Debug Session -- Init Error)

Keyboard Input Routines
and Character Coded Output
on the 68000
(Debug Session -- Init Error)

(Title Page/Index)

 

After I had finished the code for keyboard input and character code output on the 6809, while I was preparing the code for this chapter, I became aware of a bug I introduced in the code for binary output on the 68000.

Became aware of?

The code went BOOM!

Well, it threw some sort of memory addressing fault or something.

It took me a few tens of minutes of stepping, setting breakpoints, continuing on to the breakpoints, checking registers, and so forth to find what I thought was the problem, but I could not convince myself it was the only problem. Something about the stacks did not seem right, and I was too tired to focus when I could work on it, and I kept getting lost in the code and thinking that, in fixing one problem, I must be causing another.

I'm getting old. Hard to program meaningfully at night.

Anyway, the upshot of all that is that I went back and wrote a chapter attempting to talk about a few debugging techniques using artificial bugs -- Artificial? I could call them puzzles or riddles, I guess.

And I wrote another chapter detailing how to set up mechanisms to prove that the stack is balanced, correct, and not out of bounds at certain points in a program. (And that chapter sent me down a detour discussion of stack frames, which I probably should have left for another day.) 

And I dug back to where the bug was introduced in the binary output for the 68000 chapter and left notes to reduce reader surprise, and now it's time to recap the experience.

You don't really notice these kinds of bugs unless you execute a piece of code several times. So it wouldn't show up in the binary output chapter where I introduced the bug, because we only used those functions once. 

Buuuuuuuuuuut -- now I'm trying to reproduce the bug and it won't.

Well, I remember vaguely fixing one problem and leaving it fixed and fixing another problem and not leaving it fixed and ... yeah, I think I remember something. Maybe I didn't leave the real buggy version source code behind. 

So while I'm scratching my head and nosing around, I do a listing of my working directory from my bash shell in Ubuntu:

$ ls -lart
total 96
-rw-rw-r-- 1 nova nova 25101 Oct 14 16:36 inkey_68K.lst
-rw-rw-r-- 1 nova nova  1692 Oct 14 16:36 inkey_68K.PRG
-rw-rw-r-- 1 nova nova  7250 Oct 14 16:50 rt_rig03_68K_ww.s
-rw-rw-r-- 1 nova nova  7250 Oct 14 16:50 rt_rig03_68K.s
drwxrwxr-x 3 nova nova  4096 Oct 24 14:05 ..
-rw-rw-r-- 1 nova nova  1629 Oct 24 14:07 inkey_68K_ww.s
-rw-rw-r-- 1 nova nova  1629 Oct 24 14:07 inkey_68K.s
-rw-rw-r-- 1 nova nova 25101 Oct 25 00:07 inkey_68K.list
-rw-rw-r-- 1 nova nova  1692 Oct 25 00:07 INKEY_68K.PRG
drwxrwxr-x 2 nova nova  4096 Oct 25 11:15 .

Well, I'm going to have a hard time telling which of inkey_68K.PRG and INKEY_68K.PRG I'm running when I try to run it from the Hatari EMUCON shell. They look the same to EmuTOS, and the shell will probably just grab the first one it sees.

So (in the bash shell, where I can do this) I rename inkey_68K.PRG to BINKY_68K.PRG. While I'm thinking about this, I pull inkey_68K.lst into a text editor -- note the difference between .lst and .list, and the difference in the dates! -- and, lo and behold, the one I just renamed is left over from my fixing things without leaving tracks in the middle of the night. That's why it wouldn't bomb.

And, now that I can, I run INKEY_68K.PRG, and it does bomb.

Hatari's configuration dialog, in the Hatari screen subdialog (not Atari screen), has a screenshot option. Way cool. Let's see if I can convince Blogger to let me upload a pic today. Foul language moment. Google seems to think it needs cookies to upload. I even tried giving it cookies, and it wouldn't take them. How did I get around this last time? Ah. Had to break Chrome out of the mothballs then. And it's giving me error messages and won't let me quit because I didn't enable cookies that Firefox enabled. The save message says it's saved, so kill Firefox and edit in Chrome. Yuck. Guess it's time to do it again.

Got it uploaded and now I'm back in Firefox. Yay.

Not an address violation, an illegal instruction. PC is someplace it shouldn't be and the CPU is trying to execute stuff that isn't code, which is often the result of mucking with the return stack in a single interleaved stack runtime, which this is not, so probably one stack overflowed into data or onto the other. Looking at A6 and A7, though, neither is where it should be -- $0003e4c and $00007e5e. A5, our DP substitute, doesn't look right, either. All three should be up in the $14000 area, like A3 and A4 are.

Time to step through. Nice that I managed to fail to overwrite both the listing and the object code when I assembled these late last night. It gives me something to work with. Man, I'm getting both old and lucky.

Restart Hatari, Ctrl-Z to break out to the EMUCON shell, cd to where I'm working, Alt-Break to break into the debugger, set the breakpoint pc=TEXT, (c)ontinue, run inkey_68K.PRG (since the saved one is now BINKY_68K.PRG), and when it breaks back to the debugger, (s)tep. Step, step, step ...

----------------------------------------------------------------------
You have entered debug mode. Type c to continue emulation, h for help.

CPU=$e1d7e2, VBL=1331, FrameCycles=128, HBL=0, LineCycles=128, DSP=N/A
00e1d7e2 46c0                     move.w d0,sr
> b pc=TEXT
CPU condition breakpoint 1 with 1 condition(s) added:
	pc = TEXT
> c
Returning to emulation...
1. CPU breakpoint condition(s) matched 1 times.
	pc = TEXT
Reading symbols from program '/home/nova/usr/share/hatari/C:/primer/char_io/stepinch/INKEY_68K.PRG' symbol table...
TOS executable, DRI / GST symbol table, reloc=0, program flags: PRIVATE (0x0)
Program section sizes:
  text: 0x350, data: 0x0, bss: 0x0, symtab: 0x32c
Trying to load DRI symbol table at offset 0x36c...
Offsetting BSS/DATA symbols from TEXT section.
Skipping duplicate address & symbol name checks when autoload is enabled.
Loaded 56 symbols (41 TEXT) from '/home/nova/usr/share/hatari/C:/primer/char_io/stepinch/INKEY_68K.PRG'.

CPU=$13d10, VBL=1911, FrameCycles=34524, HBL=33, LineCycles=996, DSP=N/A
00013d10 6000 02a0                bra.w #$02a0 == $00013fb2 (T)
> s 

CPU=$13fb2, VBL=1911, FrameCycles=34536, HBL=33, LineCycles=1008, DSP=N/A
00013fb2 6100 ff34                bsr.w #$ff34 == $00013ee8
> s

CPU=$13ee8, VBL=1911, FrameCycles=34556, HBL=34, LineCycles=12, DSP=N/A
00013ee8 205f                     movea.l (a7)+ [00013fb6],a0
> 

CPU=$13eea, VBL=1911, FrameCycles=34568, HBL=34, LineCycles=24, DSP=N/A
00013eea 47fa fe24                lea.l (pc,$fe24) == $00013d10,a3
> 

CPU=$13eee, VBL=1911, FrameCycles=34576, HBL=34, LineCycles=32, DSP=N/A
00013eee 48eb f000 0008           movem.l a4-a7,(a3,$0008) == $00013d18
> 

CPU=$13ef4, VBL=1911, FrameCycles=34620, HBL=34, LineCycles=76, DSP=N/A
00013ef4 3a4b                     movea.w a3,a5
> 

CPU=$13ef6, VBL=1911, FrameCycles=34624, HBL=34, LineCycles=80, DSP=N/A
00013ef6 4fed 0148                lea.l (a5,$0148) == $00003e58,a7
> 

CPU=$13efa, VBL=1911, FrameCycles=34632, HBL=34, LineCycles=88, DSP=N/A
00013efa 4ded 01d0                lea.l (a5,$01d0) == $00003ee0,a6
> 

What's this? A7 being set to $00003e58? and A6 to $00003ee0?

Ah. 

	movea.w a3,a5

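There it is. One of the bugs, anyway. That should not be a 16-bit MOVE, move address or otherwise. MOVEA.W takes only the low 16 bits of A3, $3D10, and sign-extends them, so A5 gets $00003D10 instead of $00013D10, and everything we address off A5, including both stacks, lands $10000 bytes too low.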

Here's the stack initialization code, with the culprit line:

SSTKLIM	DS.L	16	; 16 levels of call, max
* 			; 68000 is pre-dec (decrement-before-store) push
SSTKBAS	DS.L	2	; a little bumper space
PSTKLIM	DS.L	32	; roughly 16 levels of call at two parameters per call
PSTKBAS	DS.L	2	; bumper space -- parameter stack is pre-dec


******************************
* Maintain local static area:
INITRT	MOVE.L	(A7)+,A0	; get the return address
	LEA	LOCBAS(PC),A3	; temporary local base (pseudo-DP)
	MOVEM.L	A4-A7,A4SAVE-LOCBAS(A3)	; Store away what the BIOS gives us.
	MOVE	A3,A5			; set up our local base (pseudo-DP)
	LEA	SSTKBAS-LOCBAS(A5),A7	; set up our return stack
	LEA	PSTKBAS-LOCBAS(A5),A6	; set up our parameter stack
	JMP	(A0)		; return via A0

Look at the line with the comment "set up our local base (pseudo-DP)". With no size specified, the assembler defaults to 16 bits, and a 16-bit move to an address register is MOVEA.W, thus the "movea.w" in the disassembly.
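Here is a C model of the difference (host-side only; the function names are made up for illustration). MOVEA.W uses just the low word of the source and sign-extends it to 32 bits; MOVEA.L copies all 32 bits. The sign extension is written out with a mask rather than a signed cast, to keep the C well defined.

```c
#include <assert.h>
#include <stdint.h>

/* MOVEA.W: low 16 bits of the source, sign-extended to 32 bits. */
uint32_t movea_w(uint32_t src)
{
    uint32_t lo = src & 0xFFFFu;
    return (lo & 0x8000u) ? (lo | 0xFFFF0000u) : lo;
}

/* MOVEA.L: the whole 32-bit source. */
uint32_t movea_l(uint32_t src)
{
    return src;
}
```

With LOCBAS at $13D10, `movea_w` gives $3D10, and adding the SSTKBAS and PSTKBAS offsets ($148 and $1D0) gives $3E58 and $3EE0, exactly what the debugger showed for A7 and A6. Had the low word been $8000 or above, A5 would have been extended with ones instead, to $FFFFxxxx.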

This is the one that I couldn't figure out, that had me worried a week and a half ago and sent me off on that detour. Let's see whether fixing just this one, with the other bug still unfixed, gets us running code.  

And ... (drumroll)

Nope, it didn't. Illegal instruction again. This time, at least, A6 was sort-of where it was supposed to be. So, yeah, suspect a stack overflow, somehow. 

Looking back up at the stack declarations, the parameter stack is above the return stack (and they are both way too small, but that's good for catching things like this before they get out into production code). So if the parameter stack overflows, it will kill the return stack.
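To put numbers on "way too small", here is a C model of the local static area as byte offsets from LOCBAS, built from the DS.L declarations in rt_rig03_68K.s (BRA.W is 4 bytes, each NOP 2). It is only a sketch of the arithmetic; the enum and function names are hypothetical.

```c
#include <assert.h>

enum {
    NATWID     = 4,
    ENTRY_SZ   = 4 + 2 + 2,                /* BRA.W START, NOP, NOP   */
    A4SAVE_OFF = ENTRY_SZ,                 /* $08                     */
    GAP1_OFF   = A4SAVE_OFF + 4 * NATWID,  /* after A4SAVE..A7SAVE    */
    BUMPER1    = GAP1_OFF + 58 * NATWID,   /* $100                    */
    SSTKLIM    = BUMPER1 + 2 * NATWID,     /* $108                    */
    SSTKBAS    = SSTKLIM + 16 * NATWID,    /* $148, initial A7        */
    PSTKLIM    = SSTKBAS + 2 * NATWID,     /* $150                    */
    PSTKBAS    = PSTKLIM + 32 * NATWID     /* $1D0, initial A6        */
};

/* Rough count of passes leaking `leak` bytes each that fit in the
 * parameter stack area, ignoring the slots a pass itself uses. */
int leaks_to_fill(int leak)
{
    return (PSTKBAS - PSTKLIM) / leak;
}
```

The $148 and $1D0 offsets match the `lea.l (a5,$0148)` and `lea.l (a5,$01d0)` in the trace. At 8 bytes leaked per key press, the 128-byte parameter stack is used up in roughly 16 presses, fewer counting what PGSTRT itself keeps on the stack, after which A6 walks down through the bumper into the return stack.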

So now is it time to break out the stack balancing code that I wrote while out on that detour?

Let's get a look at our code without the stack balance checks. 

Our test "application" code:

***********************************************************************
* simple 8-bit keyboard input for 68000
* using parameter stack,
* with test frame
* Joel Matthew Rees, October 2024

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************

	INCLUDE	rt_rig03_68K.s

	MACHINE MC68000	; because there are a lot the assembler can do.


****************
* Program code:
*

* Essential BIOS/OS:
conin		EQU	2


* INCHAR	; Echo must be handled by user code?
INCHNE	BSR.S	INCHV
	AND.L	#$FFFF,D0	; clear out the scan code
	MOVE.L	D0,-(A6)	; return the character
	RTS

* Wait for input; BIOS returns character and scan code in D0
INCHV	MOVE.W	#devscrkbd,-(A7)	; push the device number
	MOVE.W	#conin,-(A7)		; push the BIOS routine selector
	TRAP	#BIOSTRAP		; call into the BIOS
	ADDQ.L	#4,A7			; deallocate the BIOS parameters
	RTS


PROMPT	DC.B	CR,LF	; Put message at beginning of line
	DC.B	"Type any key, Q to quit. "	; 
	DC.B	CR,LF	; Put the prompt  on a new line
	DC.B	NUL
KEYCOL	DC.B	"KEY:"
	DC.B	NUL
COLBIN	DC.B	ASCCOL,NUL
COLHEX	DC.B	": $"
	DC.B	NUL	
*
*
*
	EVEN
PGSTRT	LEA	PROMPT(PC),A0
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	INCHNE	; Hold off echo
	MOVE.L	(A6),-(A6)	; duplicate
	LEA	KEYCOL(PC),A0
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	OUTC	; output character
	MOVE.L	(A6),-(A6)	; duplicate
	LEA	COLBIN(PC),A0
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	OUTB8
	MOVE.L	(A6),-(A6)	; duplicate
	LEA	COLHEX(PC),A0
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	OUTHX8
	BSR.W	OUTNWLN
	MOVE.L	(A6)+,D1	; balance stack
	CMP.B	#ASCQ,D1
	BNE.S	PGSTRT
	RTS
*
	END	ENTRY

INCHNE and INCHV don't look wrong.
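For reference, here is what INCHNE's `AND.L #$FFFF,D0` does, as a small C model. The assumption (from the Atari TOS BIOS documentation, not from this post) is that Bconin returns the ASCII code in the low byte of D0 and the keyboard scan code in bits 16-23. The scan code $24 below is made up for illustration; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* INCHNE: AND.L #$FFFF,D0 keeps only the low word, dropping the scan code. */
uint32_t strip_scan_code(uint32_t d0)
{
    return d0 & 0xFFFFu;
}

/* OUTC and OUTHX8 read LOWBYTE(A6), offset 3 of a big-endian
 * NATWID item, which is the item's least significant byte. */
uint8_t low_byte(uint32_t item)
{
    return (uint8_t)(item & 0xFFu);
}
```

So a key press returned as $0024006A goes onto the parameter stack as $0000006A, and the later routines see 'j' in its low byte, matching the $6A items in the parameter stack dump above.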

Here's the framework rigging, with the stack initialization fixed:

***********************************************************************
* A simple run-time framework inclusion for 68000
* providing parameter stack and local base
* Version 00.00.03
* Joel Matthew Rees, October 2024

	MACHINE MC68000	; because there are a lot the assembler can do.
***********************************************************************
*
* Essential control codes
LF	EQU	$0A	; line feed
CR	EQU	$0D	; carriage return
NUL	EQU	0

* Other essential ASCII codes
ASC0	EQU	'0'	; Some assemblers won't handle 'c constants well.
ASC9	EQU	'9'
ASCA	EQU	'A'
ASCXGAP	EQU	ASCA-ASC9-1	; Gap between '9' and 'A' for hexadecimal
ASCCOL	EQU	':
ASCQ	EQU	'Q


* Essential BIOS/OS:
*
* We will not be using these:
*GEMDOSTRAP	EQU	1
*GEMprintstr	EQU	9	; PRINT LINE in some docs
*

NATWID	EQU	4	; 4 bytes in the CPU's natural integer
HAFNAT	EQU	(NATWID/2)	; 2 bytes in the CPU's half natural integer
LOWBYTE	EQU	(NATWID-1)	; offset of low byte

*********************************
* Per-process local static area:
*	ORG	$20000	; TOS just gives you what it gives you.
	EVEN
LOCBAS	EQU	*	; here pointer, local static base starts here.
ENTRY	BRA.W	START
	NOP		; buffers to 4 byte even
	NOP
A4SAVE	DS.L	1	; a place to keep A4 to A7 so we can return clean
A5SAVE	DS.L	1	; using it as pseudo-DP
A6SAVE	DS.L	1	; save A6 anyway.
A7SAVE	DS.L	1	; SP
* A7 will, of course, be our return address stack pointer
* A6 will be our parameter stack pointer
* A5 will be our local static base.
* A4 not yet dedicated, should be preserved if used.

* A0, A1, A2 are assumed volatile in BIOS.
* A3 is used here (volatile).

* D6, D7 are used here (volatile).
* D0, D1, D2 are assumed volatile in BIOS.
* D3, D4, and D5 should be preserved if used.
* 

* In this code, we handle the parameter stack directly.
* Items on the parameter stack
* are assumed to be full natural width (NATWID bytes),
* as a simplifying assumption.

*			  room for something
GAP1	DS.L	58	; to even 256 bytes, not much here yet
*
	DS.L	2	; a little bumper space
SSTKLIM	DS.L	16	; 16 levels of call, max
* 			; 68000 is pre-dec (decrement-before-store) push
SSTKBAS	DS.L	2	; a little bumper space
PSTKLIM	DS.L	32	; roughly 16 levels of call at two parameters per call
PSTKBAS	DS.L	2	; bumper space -- parameter stack is pre-dec


******************************
* Maintain local static area:
INITRT	MOVE.L	(A7)+,A0	; get the return address
	LEA	LOCBAS(PC),A3	; temporary local base (pseudo-DP)
	MOVEM.L	A4-A7,A4SAVE-LOCBAS(A3)	; Store away what the BIOS gives us.
	MOVE.L	A3,A5			; set up our local base (pseudo-DP)
	LEA	SSTKBAS-LOCBAS(A5),A7	; set up our return stack
	LEA	PSTKBAS-LOCBAS(A5),A6	; set up our parameter stack
	JMP	(A0)		; return via A0


*****************************************
* Low level I/O:
* This will be a focus point in porting, 
* Primarily OUTNWLN, OUTC, INCH


* No need for PPOPD, PPUSHD, PPOPX, or PPUSHX, etc. in 68000 code.


* Output an 8-bit byte in hexadecimal,
* byte in the low-order byte of a 32-bit (NATWID) parameter on A6.
OUTHX8	CLR.L	D6
	MOVE.B	LOWBYTE(A6),D6	; get the byte
	LSR.B	#4,D6
	BSR.S	OUTRAD
	MOVE.B	LOWBYTE(A6),D6	; Get it again.
	AND.W	#$0F,D6	; mask it off
	BSR.S	OUTRAD
	LEA	NATWID(A6),A6
	RTS

* Convert the value in D6 to ASCII numeric,
* including hexadecimals and up to base 36 ('Z'+1)
OUTRAD	ADD.W	#ASC0,D6	; Add the ASCII for '0'
	CMP.W	#ASC9,D6; Greater than '9'?
	BLS.S	OUTRADD	; no, output as is.
	ADD.W	#ASCXGAP,D6	; Adjust it to 'A' to 'Z'
OUTRADD	MOVE.L	D6,D7
	BSR.W	OUTCV
	RTS


* Output the 8-bit number on the stack in binary (base two).
* For consistency, we are passing the byte in the lowest-order byte
* of a 32-bit word.
* Uses D6, D7.
OUTB8	MOVE.L	(A6),D6	; shift on memory is 16-bit, use register
	MOVE.W	#8,(A6)	; 8 bits to output, borrow parameter high word.
OUTB8L	LSL.B	#1,D6	; Get the leftmost bit of the lowest byte.
	BCS.S	OUTB81
OUTB80	MOVEQ.L	#'0',D7
	BRA.S	OUTB8D
OUTB81	MOVEQ.L	#'1',D7
OUTB8D	BSR.S	OUTCV
	SUBQ.W	#1,(A6)
	BNE.S	OUTB8L	; loop if not Zero
	SUBQ.L	#NATWID,A6	; drop parameter character
	RTS

* driver level code to output a new line
OUTNWLN	MOVEQ.L	#CR,D7
	BSR.S	OUTCV
	MOVEQ.L	#LF,D7
	BSR.S	OUTCV
	RTS

* Essential BIOS/OS:
BIOSTRAP	EQU	13
bconout		EQU	3
devscrkbd	EQU	2


* OUTC and OUTCV are two entry points to the same routine.
OUTC	CLR.W	D7		; clear character high byte.
	MOVE.B	NATWID-1(A6),D7	; Get the character, high byte cleared
	ADDQ.L	#NATWID,A6	; drop it from parameter stack before falling through.
* Falls through!
* Common low-level hook: call with the character in D7. 
* This bit of glue calls the BIOS.
OUTCV	MOVE.W	D7,-(A7)	; Push the character on A7 where bconout wants it.
	MOVE.W	#devscrkbd,-(A7)	; push the device number
	MOVE.W	#bconout,-(A7)		; push the BIOS routine selector
	TRAP	#BIOSTRAP		; call into the BIOS
	ADDQ.L	#6,A7			; deallocate the BIOS parameters when done
	RTS
* Return from TRAP is RTE, not RTS.
* Can't rob TRAP's return.

* Output NUL-terminated string of bytes.
* Very low-level, no guard rails on length.
* A3 is preserved by Hatari BIOS.
OUTS	MOVE.L	(A6)+,A3	; get the string pointer
OUTSL	CLR.L	D7		; Prepare the high bytes.
	MOVE.B	(A3)+,D7	; get the byte, update the pointer
	BEQ.S	OUTDN		; if NUL, leave before outputting the character.
	BSR.S	OUTCV	; use OUTCV directly
	BRA.S	OUTSL	; next character
OUTDN	RTS


******************************
* intermediate-level library:
*
* We often will not need these, but we'll go ahead and define them:
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	MOVE.W	(A6)+,D7	; right (16-bit only)
	ADD.W	D7,(A6)		; add to left
	RTS			; *** all flags valid!! ***
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	MOVE.W	(A6)+,D7	; right (16-bit only)
	SUB.W	D7,(A6)		; subtract from left
	RTS			; *** all flags valid!! ***
*
* input parameters:
*   32-bit left, right
* output parameter:
*   32-bit sum
ADD32	MOVE.L	(A6)+,D7	; right 
	ADD.L	D7,(A6)		; add to left
	RTS			; *** all flags valid!! ***
*
* input parameters:
*   32-bit left, right
* output parameter:
*   32-bit difference
SUB32	MOVE.L	(A6)+,D7	; right 
	SUB.L	D7,(A6)		; subtract from left
	RTS			; *** all flags valid!! ***
*
* input parameters:
*   16-bit unsigned left, right
* output parameter:
*   32-bit sum
ADD16L	CLR.L	D7
	MOVE.W	2(A6),D7	; left (no sign extension)
	CLR.L	D6
	MOVE.W	(A6),D6		; right (no sign extension)
	ADD.L	D6,D7		; 32-bit sum
	MOVE.L	D7,(A6)		; 32-bit result on stack
	RTS			; *** X, N, Z valid ***
*
* input parameters:
*   16-bit unsigned left, right
* output parameter:
*   32-bit signed difference
SUB16L	CLR.L	D7
	MOVE.W	2(A6),D7	; left (no sign extension)
	CLR.L	D6
	MOVE.W	(A6),D6		; right (no sign extension)
	SUB.L	D6,D7		; 32-bit difference
	MOVE.L	D7,(A6)		; 32-bit result on stack
	RTS			; *** X, N, Z valid ***


************************************
* Start run-time, call program.
* Expects program to define PGSTRT:
*
START	BSR.W	INITRT
	NOP		; place to set breakpoint
*
	BSR.W	PGSTRT
*
DONE	NOP		; place to set breakpoint
	MOVEM.L	A4SAVE-LOCBAS(A5),A4-A7	; restore the monitor's A4-A7
	NOP
	NOP		; landing pad
	NOP
	NOP
* One way to return to the OS or other calling program
	CLR.W	-(A7)	; there should be enough room on the caller's stack
	TRAP	#1	;	quick exit

That's a lot to look through. But look at each routine and see whether you notice anything suspicious about the way it handles A6 on return.

I think you see it.
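To make the A6 bookkeeping concrete, here is a small C model of the parameter stack. It is not 68000 code and not part of the library above. It assumes big-endian memory like the 68000's and treats A6 as a byte index into an array. It also has the stack grow downward, which is what the listing's (A6)+ pops and ADDQ.L #NATWID,A6 drops imply. The helper names (push16, peek16, poke16, peek32, poke32) are hypothetical. The functions add16 and sub16l mirror ADD16 and SUB16L.

```c
/* C model (not 68000 code) of the listing's parameter stack.
 * Assumptions: big-endian memory like the 68000, A6 modeled as a byte
 * index, and a downward-growing stack, as the listing's (A6)+ pops and
 * ADDQ.L drops imply. Helper names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_BYTES 64

static uint8_t mem[STACK_BYTES];
static size_t a6 = STACK_BYTES;           /* empty: A6 just past the top */

static uint16_t peek16(size_t at) {
    return (uint16_t)((mem[at] << 8) | mem[at + 1]);
}

static void poke16(size_t at, uint16_t v) {
    mem[at] = (uint8_t)(v >> 8);           /* high byte first (big-endian) */
    mem[at + 1] = (uint8_t)(v & 0xFF);
}

static uint32_t peek32(size_t at) {
    return ((uint32_t)peek16(at) << 16) | peek16(at + 2);
}

static void poke32(size_t at, uint32_t v) {
    poke16(at, (uint16_t)(v >> 16));
    poke16(at + 2, (uint16_t)(v & 0xFFFF));
}

/* Push a 16-bit cell: like MOVE.W Dn,-(A6), A6 moves down first. */
static void push16(uint16_t v) {
    assert(a6 >= 2);                       /* the model's only overflow guard */
    a6 -= 2;
    poke16(a6, v);
}

/* ADD16: MOVE.W (A6)+,D7 then ADD.W D7,(A6).
 * Net effect: one 16-bit cell consumed, so A6 ends 2 bytes higher. */
static void add16(void) {
    uint16_t right = peek16(a6);
    a6 += 2;                               /* the (A6)+ pop moves A6 up */
    poke16(a6, (uint16_t)(peek16(a6) + right));
}

/* SUB16L: zero-extend both 16-bit cells, subtract in 32 bits, and store
 * the 32-bit result over both cells. A6 does not move. */
static void sub16l(void) {
    uint32_t left = peek16(a6 + 2);        /* 2(A6): left, zero-extended */
    uint32_t right = peek16(a6);           /* (A6): right, zero-extended */
    poke32(a6, left - right);              /* wraps mod 2^32, like SUB.L */
}
```

With 5 and 7 pushed, add16() leaves 12 and leaves A6 two bytes higher than before the call. With 0 and 1 pushed, sub16l() leaves 0xFFFFFFFF (-1 as a signed 32-bit value) and leaves A6 where it was. Tracking the net change in A6 for each routine this way is one way to check the listing by hand.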

This has become rather long, I'm getting tired, and I don't want to go to the trouble of writing the stack balance check code.

But if we don't, we'll lose a good opportunity. So, even if you've already spotted it, let's insert the stack balance checks in the next chapter.
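As a rough preview of the idea only, here is a C model of what a stack balance check measures. It saves the parameter-stack pointer before a call, then compares the pointer's net movement with what the routine is supposed to consume. This is not the 68000 code the next chapter develops. The names call_checked, psp, consume_param_ok, and consume_param_wrong are hypothetical, and the 4-byte parameter assumes NATWID is 4.

```c
/* Rough C sketch of a stack balance check: record the modeled parameter
 * stack pointer before a call and compare its net movement afterward with
 * the routine's declared effect. All names are hypothetical; the stack is
 * assumed to grow downward, so consuming parameters raises the pointer. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static size_t psp = 256;                   /* modeled A6 */

typedef void (*routine_fn)(void);

/* Returns true if fn moved psp by exactly `expected` bytes
 * (positive means bytes consumed from the parameter stack). */
static bool call_checked(const char *name, routine_fn fn, long expected) {
    assert(fn != NULL);
    size_t before = psp;
    fn();
    long moved = (long)psp - (long)before;
    if (moved != expected) {
        fprintf(stderr, "%s: stack moved %ld bytes, expected %ld\n",
                name, moved, expected);
        return false;
    }
    return true;
}

/* Two hypothetical routines, each supposed to drop one 4-byte parameter. */
static void consume_param_ok(void)    { psp += 4; }  /* like ADDQ.L #4,A6 */
static void consume_param_wrong(void) { psp -= 4; }  /* moves the wrong way */
```

The check does not care what a routine computes, only whether the pointer ends where the routine's stated stack effect says it should.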

