Sunday, November 3, 2024

ALPP 02-24 -- Some Address Math for the 68000

  Some Address Math
for the
68000

(Title Page/Index)

After a break for multi-byte negation, and because address math is so important, I think I should show you explicit 68000 counterparts for what I've shown you for the 6809, as well as for the routines for the 6801 and for the 6800.

When instructions become more general, they often take more bytes to encode. This is especially clear for the 68000. And when you generalize an operation, it often takes more instructions to implement -- even with a more powerful instruction set CPU. And the more you repeat those multiple instructions, the more opportunity you have to make mistakes. 

More than speed and byte count, this is why we define utility routines like we just looked at for the 6800 and 6801. We don't want to give ourselves too many opportunities for mistakes. (Macros can help with this, but we won't talk about that just yet.)

Between the 6809 and the 68000, it can be kind of a wash -- when you're working on 16-bit numbers and small applications that fit in a 64K memory space. When you start working with 32-bit numbers, it's advantage 68000, ... except then you also tend to work with 32-bit addresses, and the addresses can make byte count swell. 

I transliterated the fig implementation of Forth from 6800 to 68000, and the object image size increased by about 80% (real rough estimate). This is because I didn't want to restrict it to operating in the lower 32K of memory, minus the interrupt vector table, so the virtual machine i-codes (function addresses, really) swelled from 16-bit to 32-bit. And since the Forth is mostly a clot of i-codes, the overall image size swells. 

I started a conversion to direct call, which I got lost in (partly motivating this tutorial), and the code size does seem to improve a bit, but not completely to the size of the 6809 image.

Do look at assembly listings when you try to compare code sizes for stuff. In particular, the 68000 will often seem to take about twice the code bytes that the 6809 takes in these snippets. But when we move to concrete code where pieces come together, the code size comes down closer to the 6809 code size.

And I'll note again, being able to use single instructions instead of utility routines is nice, but it's actually more important that the 68000 has something of an optimal number of registers, so we don't have to worry about pseudo-registers in memory when switching processes.

As always, read the code and the comments in the code, and open up separate browser windows and compare side-by-side.

I'm showing the entire 68000 code in a single block because the abstract operations don't quite map the same, but I'm keeping the order roughly the same to keep it easy to find what to compare. 

How the registers map when moving from the 6809 to the 68000 --

  • I'm mapping the 6809's S to A7, of course;
  • U to A6;
  • DP will map to A5;
  • X mostly to A0;
  • Y to whatever.
  • B is sort-of mapped to D7;
  • A is sort-of mapped to D6 or the top bytes of D7 or D5 or something, depending on what I need it to do.

(And please don't just copy-and-paste code without thinking.)

* 68000 pointer math

	ORG	SOMETHING
* All of these work fine in-line, rather than called as subroutines.
* In fact, unless specified otherwise, you should in-line.
* You can substitute any data register unless specified otherwise.
*
* Likewise, you can substitute any address register,
* except that code for A7 should always be in-lined
* (other than those routines which specifically handle the return address,
* but those routines are not really intended to be used anyway).
* Calling a subroutine and playing with the return stack 
* without handling the return address
* just is not a good way to keep control of your program.
*
* And then there is alignment. 68000 needs 16- and 32-bit accesses 
* to be 16-bit aligned, and will throw address errors if they are not.
* (Later CPUs are not so restricted.)
*
* Negate Dn in 8, 16, or 32 bits:
NEGLD7	NEG.L	D7	; .L => 32 bits, .W => 16 bits, .B => 8 bits
	RTS
* On the 6800/6801/6809, you can negate (2's complement) a byte 
* using a 1-byte instruction.
* On the 68000, it takes a 2-byte instruction.
* It takes 5 bytes of instruction to negate 16 bits on 6800/1/9,
* and 13 bytes to negate 32 bits.
* But on the 68000, it takes just two,
* the above 16-bit op-code with a couple of bits changed.
* This is a common pattern with 68000 instructions.
*
* And, for all the time I spend explaining NEG, 
* since the 68000 can subtract registers in either order, 
* we really don't need NEG here.

* Unsigned byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDBX	AND.W	#$FF,D7	; zero extend it
	ADD.W	D7,A0	; 16-bit source sign extended to 32 bits
	RTS
* Alternative
ADDBXalt
	AND.W	#$FF,D7
	LEA	(A0,D7.W),A0	; takes more bytes
	RTS
*
* Signed byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADSBX	EXT.W	D7
	ADD.W	D7,A0	; 16-bit source sign extended to 32 bit An
	RTS
* Alternative
ADSBXalt
	EXT.W	D7
	LEA	(A0,D7.W),A0	; takes more bytes
	RTS
*
* Unsigned byte offset, subtracted
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBBX	AND.W	#$FF,D7	; zero extend it
	SUB.W	D7,A0	; 16-bit source sign extended to 32 bits
	RTS

* Signed byte offset, subtracted
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SBSBX	EXT.W	D7
	SUB.W	D7,A0	; 16-bit source sign extended to 32 bit An
	RTS
*
* Unsigned 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDWX	AND.L	#$FFFF,D7	; zero extend it
	ADD.L	D7,A0	
	RTS
* Alternative
ADDWXalt
	AND.L	#$FFFF,D7
	LEA	(A0,D7.L),A0	; takes more bytes
	RTS
*
* Signed 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADSWX	ADD.W	D7,A0	
	RTS
* Alternative
ADSWXalt
	LEA	(A0,D7.W),A0	; takes more bytes
	RTS
* Alternative
ADSWXalt2
	LEA	(A0,A1.W),A0	; takes more bytes
	RTS
*
* Unsigned 16-bit offset, subtracted
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBWX	AND.L	#$FFFF,D7	; zero extend it
	SUB.L	D7,A0	
	RTS

* Signed 16-bit offset, subtracted
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SBSWX	SUB.W	D7,A0	
	RTS
*
* 32-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDLX	ADD.L	D7,A0	
	RTS
* Alternative
ADDLXalt
	LEA	(A0,D7.L),A0	; takes more bytes
	RTS
* Alternative
ADDLXalt2
	LEA	(A0,A1.L),A0	; takes more bytes
	RTS
*
* 32-bit offset, subtracted
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBLX	SUB.L	D7,A0	
	RTS
*


*************
* For the return stack
* As explained above, just in-line the LEA.
* These are provided as a solution to a puzzle,
* not as useful code.
*
* Signed byte offset
* Just in-line the EXT.W and ADD.W
ADSBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	EXT.W	D7	; sign extend it
	ADD.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.
*
* Unsigned byte offset
* Just in-line the AND.W and ADD.W
ADDBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.W	#$FF,D7	; zero extend it
	ADD.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.

* Signed 16-bit offset
* Just in-line the ADD.W
ADSWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	ADD.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.
*
* Unsigned 16-bit offset
* Just in-line the AND.L and ADD.L
ADDWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.L	#$FFFF,D7	; zero extend it
	ADD.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.

* 32-bit offset
* Just in-line the ADD.L
ADDLS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	ADD.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
*
* Unsigned byte offset
* Just in-line the AND.W and SUB.W
SUBBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.W	#$FF,D7	; zero extend it
	SUB.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
*
* Unsigned 16-bit offset
* Just in-line the AND.L and SUB.L
SUBWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.L	#$FFFF,D7	; zero extend it
	SUB.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
*
* Signed byte offset
* Just in-line the EXT.W and SUB.W
SBSBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	EXT.W	D7	; sign extend it
	SUB.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
*
* Signed 16-bit offset
* Just in-line the SUB.W
SBSWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	SUB.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0

* 32-bit offset
* Just in-line the SUB.L
SUBLS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	SUB.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
*

* INX and DEX trains and INS and DES trains are meaningless.
* HOWEVER, just to remind ourselves:
* (And all of these work for any other address register, too, but IN-LINE them!!)
* (They work for A7 if in-lined, as well.)
ADD16X	LEA 	16(A0),A0
	RTS
ADD14X	LEA	14(A0),A0
	RTS
SUB16X	LEA	-16(A0),A0
	RTS
* Etc. In-line these.
INX	LEA	1(A0),A0	; Sigh. In-line it. Do not make trains with it. Please.
	RTS
DEX	LEA	-1(A0),A0	; See INX. In-line it. Do not make trains with it. PLEASE.
	RTS
* Note that we can also use ADDQ and SUBQ for offset less than 9
*
* More solutions to puzzles.
* If you called these, you would have to juggle the return address as shown.
* You don't want to do that.
* Just in-line the LEA instructions.
* Then there's no return address to juggle, no messing with A0.
* DO NOT USE THIS CODE other than for examples of silly walks.
ADD16S	MOVE.L	(A7)+,A0
	LEA	16(A7),A7
	JMP	(A0)
* etc.
* Could all be replaced with just LEA	16(A7),A7 in-line!
* That's actually cheaper than just the instruction JSR!!!


* Synthetic stacks restricted within page boundaries make no sense at all
* on the 68000. Except, I suppose they could, sort-of.
*
* In the first place,
* we should be able to use an extra address register to make a third stack.
* If we do, addressing has already been covered, above.
*
* But if we want a software stack maintained by pointers in memory,
* for some reason,
* Given a pseudo-register somewhere in process local variable space
* accessed via A5:
	ORG	SOMEWHERE
	...
QSP	DS.L	1	; a synthetic stack pointer Q
* QSP-LOCBAS has to be within +/-32K on 68000, 2-byte op-code, 2-byte offset, syntax: QSP-LOCBAS(A5)
* 68020 and above allows 32-bit range, 4-byte op-code, 4-byte offset, syntax: (QSP-LOCBAS,A5)
	...
	DS.L	2	; buffer zone
QSTKLIM	DS.L	32
QSTKBAS	DS.L	2	; buffer zone
	...

* 32-bit Dn for synthetic stack (could/should be in-line):
ADDQSP	ADD.L	D7,QSP-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 256-byte page boundary
* so that carries cannot be generated:
* unsigned byte D7
ADDQSPS	ADD.B	D7,QSP+3-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 65536-byte page boundary
* so that carries cannot be generated:
* unsigned 16-bit D7
ADDQSPW	ADD.W	D7,QSP+2-LOCBAS(A5)	; low word is at +2 (and must be even). 4 bytes in op-code (+/-32K)
	RTS
*
* 32-bit Dn for synthetic stack (could/should be in-line):
SUBQSP	SUB.L	D7,QSP-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 256-byte page boundary
* so that carries cannot be generated:
* unsigned byte
SUBQSPS	SUB.B	D7,QSP+3-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 65536-byte page boundary
* so that carries cannot be generated:
* unsigned 16-bit D7
SUBQSPW	SUB.W	D7,QSP+2-LOCBAS(A5)	; low word is at +2 (and must be even). 4 bytes in op-code (+/-32K)
	RTS

* 68000 has no memory indirection
QPSHD7L	MOVE.L	QSP-LOCBAS(A5),A4	; 4 bytes in op-code
	MOVE.L	D7,-(A4)		; 2 bytes in op-code
	MOVE.L	A4,QSP-LOCBAS(A5)	; 4 bytes in op-code
	RTS
*
* 68020+ have memory indirection
QPSHD7LI
	SUBQ.L	#4,QSP-LOCBAS(A5)	; 4 bytes in op-code (SUBQ.W would be faster for medium stack)
	MOVE.L	D7,([QSP-LOCBAS,A5])	; indirect through QSP itself, 6 bytes in op-code (word base displacement)
	RTS
*
QPOPD7L	MOVE.L	QSP-LOCBAS(A5),A4	; 4 bytes in op-code
	MOVE.L	(A4)+,D7		; 2 bytes in op-code
	MOVE.L	A4,QSP-LOCBAS(A5)	; 4 bytes in op-code
	RTS
*
* 68020+ have memory indirection
QPOPD7LI
	MOVE.L	([QSP-LOCBAS,A5]),D7	; indirect through QSP itself, 6 bytes in op-code (word base displacement)
	ADDQ.L	#4,QSP-LOCBAS(A5)	; 4 bytes in op-code (ADDQ.W would be faster for medium stack)
	RTS


* Register offsets from A7 were dealt with above.

* Lest I forget --
* On the 6800 or 6801, this would be referenced by a process-local
* LOCALBASE or similar pseudo-register, which I almost forgot to talk about.
* On the 6809, it could be done by pseudo-register or (with some glue) by DP.
* On the 68000, we are going to use a spare address register,
* and I am going to pick A5.
* All the address math has been shown above,
* the only issue is being explicit about the assembly language idiom.
* Lest I forget --
*
* Given 
	ORG	Whatever
LOCBAS	EQU	*
*	...
VAR	DS.B	m	; or .W or .L, etc.
*
* With A5 known to be set to LOCBAS,
	LEA	LOCBAS(PC),A5
* or
	MOVEA.L	#LOCBAS,A5
*
* In-line snippets --
* For variable VAR within 256 bytes of LOCBAS:
	...
	LEA	VAR-LOCBAS(A5),A0	; that's all! (4-byte op-code)
	...
*
* When VAR is 256 bytes or more away from LOCBAS, but less than 32768
* (or, even, below LOCBAS but within -32768), in other words, signed 16-bit offset:
	...
	LEA	VAR-LOCBAS(A5),A0	; same thing!
	...
*
* It's a little messier when the signed offset doesn't fit in 16 bits, 
* less than -32768 below, or 32768 or greater above --
	...
	MOVE.L	#VAR-LOCBAS,D7		; Any Dn. An will also work, if it's not in use. 6 bytes.
	LEA	(A5,D7.L),A0		; 4 bytes. total 10 bytes. 
	...
*
* From the 68020 on, 32-bit offsets are allowed, but the op-code is also 32-bits plus displacement:
	...
	LEA	(VAR-LOCBAS,A5),A0	; 8 byte total op-code
	...
* 
* Do I really need to show this as subroutines?
* signed 16-bit offset in D7:
LEALBWX	LEA	(A5,D7.W),A0	; PLEASE just do this in-line!
	RTS
*
* 32-bit offset in D7:
LEALBLX	LEA	(A5,D7.L),A0	; PLEASE just do this in-line!
	RTS
*			;-/
* 
* I assume you're not going to be wanting to keep LOCBAS
* in a pseudo-register called LB_BASE.
* But you might want to maintain a separate allocation area
* with a pointer in AL_BASE, like this:
LOCBAS	EQU	*
	...
AL_BASE	DS.L	1
	...
* for signed 16-bit offsets in D7: 
ADDLBW	MOVE.L	AL_BASE-LOCBAS(A5),A0
	ADD.W	D7,A0	; or LEA (A0,D7.W),A0
	RTS
* for unsigned 16-bit offsets:
ADDLBU	AND.L	#$0000FFFF,D7	; unsigned offset
* for 32-bit offsets
ADDLBL	MOVE.L	AL_BASE-LOCBAS(A5),A0
	ADD.L	D7,A0	; or LEA (A0,D7.L),A0
	RTS
*
* 68020 and above allow you to do weird things like this --
	...
	LEA	([AL_BASE-LOCBAS,A5],D7.L),A0
*	...					;  8-o
* ... quite literally letting you index directly off that pseudo-register
* out there in memory.
*
* As near as I can tell,
* memory indirect modes all require an address register,
* or the PC. 
* But that's not so bad, other than some of the modes being overkill.
*
* And, in spite of my mugging, maybe this has been a good way
* to expand your grasp of the power of the 68000 addressing modes.

* Sorry about the mugging. Sort-of. ;-/

As you can see, the 68000 just basically does almost all the address math you need without subroutines.

Including, to some extent, arrays, but let's not go there yet.
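
One detail in the listing that is easy to lose track of is that a word-sized add to an address register sign extends the source first, which is why the unsigned 16-bit case needs AND.L and ADD.L instead. Here is a minimal sketch of that arithmetic in Python. It is only a model of the register math, not 68000 code, and the function names are made up for illustration:

```python
MASK32 = 0xFFFFFFFF

def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def adda_w(an, dn):
    """Model of ADD.W Dn,An: low word of Dn is sign extended, then added to all of An."""
    return (an + sign_extend_16(dn)) & MASK32

def adda_l(an, dn):
    """Model of ADD.L Dn,An: plain 32-bit add."""
    return (an + dn) & MASK32

def add_unsigned_byte(an, dn):
    """Model of AND.W #$FF,Dn then ADD.W Dn,An: bit 15 is clear, so no surprises."""
    return adda_w(an, dn & 0x00FF)

def add_unsigned_word(an, dn):
    """Model of AND.L #$FFFF,Dn then ADD.L Dn,An."""
    return adda_l(an, dn & 0xFFFF)

base = 0x00012000
print(hex(adda_w(base, 0x0010)))             # small positive offset: 0x12010
print(hex(adda_w(base, 0xFFF0)))             # $FFF0 is taken as -16: 0x11ff0
print(hex(add_unsigned_word(base, 0xFFF0)))  # $FFF0 taken as 65520: 0x21ff0
```

The same bit pattern in D7 moves the pointer backward or forward depending only on which instruction pair you choose.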

As with the previous three chapters, I have not tested the code. It should run, modulo typos.

The 68000 is hard to wrap your head around. I know. If the above doesn't make sense yet, it's okay. I'll point you back here from time to time when we are working with more concrete examples of using the above.
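
If the part about updating only a piece of a pointer in memory is one of the things that doesn't make sense yet, it may help to poke at a model. The 68000 is big-endian, so the low-order word of a 32-bit pointer sits at offset +2 and the low-order byte at offset +3, and word accesses must be at even addresses. The following is a rough Python sketch under those assumptions. The address chosen for QSP and the helper names are hypothetical:

```python
import struct

def read_long(mem, addr):
    """Fetch a big-endian 32-bit value."""
    return struct.unpack_from(">I", mem, addr)[0]

def write_long(mem, addr, value):
    """Store a big-endian 32-bit value."""
    struct.pack_into(">I", mem, addr, value & 0xFFFFFFFF)

def add_b(mem, addr, value):
    """Model of ADD.B Dn,addr: 8-bit add, carry out is discarded."""
    mem[addr] = (mem[addr] + value) & 0xFF

def add_w(mem, addr, value):
    """Model of ADD.W Dn,addr: 16-bit add, carry out is discarded."""
    if addr & 1:
        raise ValueError("address error: word access at odd address")
    word = struct.unpack_from(">H", mem, addr)[0]
    struct.pack_into(">H", mem, addr, (word + value) & 0xFFFF)

QSP = 0x10                       # hypothetical location of the pseudo-register
mem = bytearray(64)

write_long(mem, QSP, 0x00012340)
add_b(mem, QSP + 3, 0x20)        # low byte only
add_w(mem, QSP + 2, 0x0100)      # low word only
print(hex(read_long(mem, QSP)))  # 0x12460, same as a full 32-bit add

write_long(mem, QSP, 0x000123F0)
add_b(mem, QSP + 3, 0x20)        # carry out of the low byte is lost
print(hex(read_long(mem, QSP)))  # 0x12310, not 0x12410
```

The second case is the reason for the restriction that the whole stack, buffer zones included, has to fit inside one page before the short forms are safe.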

Look at how I've been avoiding this. I think it's time to build a concrete example of stack frames on the 6801.

Or you can jump ahead to getting numeric output in binary.


Saturday, November 2, 2024

ALPP 02-23 -- Synthesizing Multibyte NEGate on 6809 (Applies to 6800 and 6801)

  Synthesizing Multibyte NEG
on 6809
(Applies to 6800 and 6801)

(and 6805, with modifications, but we won't talk about that)

(Title Page/Index)

The fact that the NEGD routine effectively does not change from the 6800 to the 6809 had me looking at the 68000's NEGX and NOT instructions, scratching my head as to why there was no NOTX instruction and what the reasons were for the rules for generating the X bit for the NEG, NEGX, and NOT instructions. And I started losing confidence in the NEGation sequence I have been using in my 6800/6801 and 6809 work:

NEGAB	COMA	; 2's complement NEGate is bit COMplement + 1
	NEGB
	BNE	NEGABX	; or BCS. but BNE works -- extends 0
	INCA
NEGABX	RTS

The theory is pretty straightforward. Radix complement negation is to subtract the number from the radix (or radix raised to the power of the number of columns) and remember your borrow (carry). 

Bit complement is 1's complement, or reduced radix complement.

So 2's complement is 1's complement plus 1. 

And then you just carry any borrow from the first column as far as it carries. 

No, there's something wrong with that description, which is what had me going.

I guess it's more accurate to say the lack of carry carries until you get a non-zero column? As soon as you get a carry, everything to the left becomes 1's complement. Because the carry means borrow. And 1 - 1 is zero, but 0 - 1 is 1. Or something.

[JMR202411050852 clarification:]

The carry generated on the NEG instruction in Motorola CPUs is the borrow from the (virtual) subtraction from zero. This is the inverse of the carry from adding one.

When you subtract from zero, there's going to be a borrow for any non-zero operand.

When you add 1 to the 1's complement (bit inverse), the carry is only going to generate when the result of the add is zero -- which is exactly the same as when the argument (operand) is zero.

So when there is no borrow from the NEG is when there is carry from the add, and you can stop when there is no carry from the add, which, for me anyway, confirms the reasoning behind stopping when you no longer get 0 as a result.

[JMR202411050852 clarification end.]
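
The relationship between the borrow from NEG and the carry from complement-plus-one can be checked for a single byte with a small model. This is a sketch in Python with made-up function names, modeling only the arithmetic and the carry flag, not any particular CPU's full flag behavior:

```python
def neg_byte(b):
    """Model of NEG on one byte: (result, carry), where carry is the borrow from 0 - b."""
    return (-b) & 0xFF, int(b != 0)

def com_then_inc_byte(b):
    """Bit complement, then add 1: (result, carry out of the add)."""
    total = (b ^ 0xFF) + 1
    return total & 0xFF, total >> 8

def check_all_bytes():
    """Compare the two views for every possible byte value."""
    for b in range(256):
        neg_result, borrow = neg_byte(b)
        add_result, carry = com_then_inc_byte(b)
        assert neg_result == add_result       # same answer either way
        assert borrow == 1 - carry            # borrow is the inverse of the add's carry
        assert (carry == 1) == (add_result == 0)  # the add carries only when the result is 0
    return True

print(check_all_bytes())
```

So testing for a zero result (BNE) and testing for no borrow (BCS) select the same cases, which is why either branch works in the negation sequence.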

Anyway, I was getting lost in something, so I came up with a little routine to test every possible result against straight-out subtracting from 0:

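	org $2000
start	ldu #$2100	; U is the parameter stack pointer
	ldd #$5a5a	; result flag, cleared only if every value passes
	std ,--u
	ldd #0
	std ,--U	; the value under test, starting at 0
	std ,--u	; scratch copy to negate in place
testl	ldd 2,u
	std ,u		; copy the value under test to scratch
	ldd #0
	subd 2,u	; reference result: 0 - value
	com ,u		; the sequence under test: COM the high byte,
	neg 1,u		; NEG the low byte,
	bne testni
	inc ,u		; and INC the high byte when the low byte is 0
testni	cmpd ,u		; does it match the reference?
	bne teste	; no -- quit with the flag still set
	ldd #1
	addd 2,u
	std 2,u		; next value
	bne testl	; loop until the counter wraps to 0
	std 4,u		; all 65536 values passed, clear the flag
	nop
	nop
teste	ldd 4,u		; flag in D: 0 means success
	nop
	leau 6,u	; drop everything
	nop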

Set a breakpoint at teste, step through the loop a couple of times, and let it rip, and if the value at the top of the U stack is cleared when you're done, every value worked correctly.

Now, I keep saying something about BCS, so I can test that, as well. Just change the

	bne	testni

to

	bcs	testni

and let'r rip again.

And it works.
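
For anyone without a 6809 emulator at hand, the same exhaustive check can be sketched in Python. This is only a model of the COMA / NEGB / branch / INCA sequence, with a flag to choose between the BNE and BCS forms. The function names are invented for the illustration:

```python
def negab(a, b, use_bcs=False):
    """Model of the NEGAB sequence on the A:B register pair."""
    a ^= 0xFF                   # COMA
    carry = int(b != 0)         # NEGB sets carry (borrow) when B was non-zero
    b = (-b) & 0xFF             # NEGB
    # BNE skips the INCA when the result is non-zero; BCS skips it when carry is set.
    skip_inc = (carry == 1) if use_bcs else (b != 0)
    if not skip_inc:
        a = (a + 1) & 0xFF      # INCA
    return a, b

def count_mismatches(use_bcs=False):
    """Compare against straight subtraction from 0 for all 65536 values."""
    bad = 0
    for value in range(0x10000):
        a, b = negab(value >> 8, value & 0xFF, use_bcs)
        if ((a << 8) | b) != (-value) & 0xFFFF:
            bad += 1
    return bad

# Both counts should come out 0.
print(count_mismatches(use_bcs=False), count_mismatches(use_bcs=True))
```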

And som'eres in there, it hit me like the proverbial ton of bricks going down again: bit complement (1's complement in a binary field) does not carry. So the NOT instruction is its own extending form. And, yes, you prime the NEGX loop on the 68000 with a straight NEG.

... yeah, and I guess I'm not having a smart brain day today or something ...

Well, so, here's a 4-byte negate on 6809:

* negate the 32-bit number on top of stack:
NEG32	COM	,U
	COM	1,U
	COM	2,U
	NEG	3,U
	BNE	NEG32X
	INC	2,U
	BNE	NEG32X
	INC	1,U
	BNE	NEG32X
	INC	,U
NEG32X	RTS

It ought to work. 
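
The pattern in NEG32 generalizes to any number of bytes: complement everything but the lowest byte, NEG the lowest byte, then increment upward for as long as the byte just produced is zero. A minimal Python sketch of that pattern follows, assuming a big-endian number of at least one byte. The names are hypothetical, and it models the arithmetic only:

```python
def negate_bytes(data):
    """Negate a big-endian multi-byte number in place, NEG32-style."""
    last = len(data) - 1
    for i in range(last):               # COM every byte except the lowest
        data[i] ^= 0xFF
    data[last] = (-data[last]) & 0xFF   # NEG the lowest byte
    i = last
    while data[i] == 0 and i > 0:       # BNE exits; otherwise INC the next byte up
        i -= 1
        data[i] = (data[i] + 1) & 0xFF
    return data

def neg32(value):
    """Run a 32-bit value through the byte-wise negate."""
    data = bytearray(value.to_bytes(4, "big"))
    return int.from_bytes(negate_bytes(data), "big")

for sample in (0, 1, 0x100, 0x10000, 0x12345600, 0x80000000):
    print(hex(sample), hex(neg32(sample)), neg32(sample) == (-sample) & 0xFFFFFFFF)
```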

Back to the regularly scheduled programming, as soon as I finish figuring out what instruction and addressing combinations on the 68000 are relevant to what I'm demonstrating.


Friday, November 1, 2024

ALPP 02-22 -- Some Address Math for the 6809

  Some Address Math
for the
6809

(Title Page/Index)

Maybe it feels like going around in circles, but address math is so important that I think I should show you explicit 6809 corollaries for the utility address math routines I've shown you for the 6801 and for the 6800.

When instructions become more general, they often take more bytes to encode.  And when you generalize an operation, it often takes more instructions to implement -- even with a more powerful instruction set CPU. And the more you repeat those multiple instructions, the more opportunity you have to make mistakes. 

This is why we define utility routines like we just looked at for the 6800 and 6801.

But in practice, with the 6809, the instructions themselves usually do the job, and the utility routines are seldom needed.

To get a sense of how the size is affected in real code, you will want to compare these examples I give to the concrete examples I have given -- and give later -- for the other processors.

As much as having actual instructions to do the work for you improves things, the more important improvement is eliminating almost all need for pseudo-registers that have to be managed when switching processes.

Remember to read the code and the comments in the code, and open up separate browser windows to compare side-by-side with the 6800 and 6801. Reading code is important.

Let me say it again: 

No need for pseudo-registers on either the 6809 or 68000!

Unless you really want to synthesize a third stack or something on the 6809. 

Almost -- That's modulo per-process global variables, depending on how you handle them. And modulo some use of stack as temporaries instead of pseudo-registers, because stack is just a better place for temporaries, and is so easily accessed on the 6809.

Let's look at the 6809 code. 

You'll (hopefully) notice that mapping the abstract operations to the 6809 works out somewhat different than for the 6800 and 6801. So I'm showing the 6809 code in a single block and relying more on comments in the code. The order of presentation is roughly the same, so it should be easy enough to find what to compare with what. 

One of the reasons I demonstrate an alternate way to NEGate the Double accumulator is to demonstrate a very useful way to use the stack to avoid using temporary variables in memory. (I guess I need to go back and make this explicit in the 6801 and 6800 address math chapters.)

Do not miss the fact that the 6809 has four indexable registers, and all the address math instructions work for all four indexable registers -- where the routines may not! Where I say in-line, that means just use the instructions rather than calling the routines. 

(If you're wondering, fix the mnemonics for the required register -- LEAX for X, LEAY for Y, LEAU for U, LEAS for S, etc. And don't forget the addressing mode index registers. And, no, don't include the RTS at the end when you're inserting the code in-line. 8-/ I know you caught all that, but some people just copy-and-paste without thinking.)

* 6809 pointer math
*	ORG	$80
*	...
*XOFFA	RMB	1 ; don't need these at all
*XOFFB	RMB	1
*XOFFSV	RMB	2
*	...

	ORG	SOMETHING
* All of these work fine in-line, rather than called as subroutines
*
* Two ways to negate D on the 6809:
NEGD	COMA		; 6800 version -- still no NEGD
	NEGB            ; and sign extending doesn't help.
	BNE	NEGDX	; or BCS. but BNE works -- extends 0
	INCA
NEGDX	RTS
*
NEGDS	PSHS	D	; slightly slower, uses stack
	LDD	#0
	SUBD	,S++
	RTS
*
* Unsigned byte offset
* Absolutely should in-line. X only.
ADDBX	ABX	; X only
	RTS
*
* For unsigned byte offset other than X, zero extend B into A
* Destroys A.
* Should in-line for Y or U. Should use ABX for X. Must in-line for S.
ADDBY	CLRA	; for Y/U/S, zero extend B for unsigned offset
	LEAY	D,Y
	RTS
*
* Signed byte offset
* Should in-line for X, Y or U. Must in-line for S.
ADSBX	LEAX	B,X	; sign extended B, Y/U/S also
	RTS
*
* Signed byte offset
* Should in-line for X, Y or U. Must in-line for S.
SBSBX	NEGB		; signed subtract B, Y/U/S also
	LEAX	B,X
	RTS
*
* Unsigned byte offset, zero extend A
* Destroys A
* Could in-line for X, Y or U. Must in-line for S.
SUBBX	CLRA	; B is unsigned, therefore positive
* 16-bit offset, must in-line for S.
SUBDX	COMA		; no NEGD
	NEGB
	BNE	ADDDX	; or BCS. but BNE works -- extends
	INCA
* 16-bit offset, must in-line for S
ADDDX	LEAX	D,X	; Y/U/S also
	RTS

* Alternatively, use D for explicit subtraction
* Here as an example of math that can be done,
* probably not as a useful subroutine.
SUBBXS	CLRA	; B is unsigned, destroys A
SUBDXS	PSHS	D	; for subtraction
	EXG	X,D	; X to subtract, save D
	SUBD	,S++	; do the subtraction
	EXG	X,D	; Offset result to X, restore D
	RTS

* No particular reason to try to use ABX in signed byte offset.
* This is a solution to a puzzle, not useful code.
* You don't really want to do this.
ADDSBX	TSTB
	BPL	ADDSBXA
	LEAX	B,X	; Absolutely no reason not to use this in the first place.
	RTS
ADDSBXA	ABX
	RTS

*************
* For S stack
* As mentioned above, just in-line the LEAS.
* These are also provided as a solution to a puzzle,
* not as useful code.
*
* Signed byte offset
ADSBS	PULS	X	; get return address, restore stack address
	LEAS	B,S	; you really could just in-line this.
	JMP	,X	; return via X
*
* Unsigned byte offset, zero extend A, destroys A, X
ADDBS	CLRA		; just in-line the CLRA and the LEAS D,S
* 16-bit offset
ADDDS	PULS	X	; get return address, restore stack address
	LEAS	D,S
	JMP	,X	; return
*
* Do you really want to do this?
* Unsigned byte offset, zero extend into A, destroys A
SUBBS	CLRA
SUBDS	COMA
	NEGB
	BNE	ADDDS	; or BCS. but BNE works -- extend
	INCA
	BRA	ADDDS	; let ADDDS handle the return address and the math
* Do the math in D for explicit subtraction
* No more useful than the rest of this for X.
* Here just as an example of math that can be done.
SUBBSS	CLRA	; B is unsigned, destroys A
SUBDSS	LDX	,S	; get return address
	STD	,S	; save D
	TFR	S,D	; get S without endangering the stack
	ADDD	#2	; adjust for having D on the stack
	SUBD	,S	; finally subtract the offset
* Alternative 1, leaves D destroyed
	TFR	D,S	; update stack pointer
	JMP	,X	; return via X
* Alternative 2, restores offset in D
	PSHS	D	; working realllllly hard not to destroy D.
	LDD	2,S	; got the offset
	LDS	,S	; update S
	JMP	,X

* INX and DEX trains and INS and DES trains are meaningless.
* HOWEVER, just to remind ourselves:
* (And all of these work for Y and U, too but IN-LINE them!!)
* (They work for S if in-lined, as well.)
ADD16X	LEAX	16,X
	RTS
ADD14X	LEAX	14,X
	RTS
SUB16X	LEAX	-16,X
	RTS
* Etc. In-line these.
INX	LEAX	1,X	; Sigh. In-line it. Do not make trains with it. Please.
	RTS
DEX	LEAX	-1,X	; See INX. In-line it. Do not make trains with it. PLEASE.
	RTS
*
* More solutions to puzzles.
* If you called these, you would have to juggle the return address as shown.
* You don't want to do that.
* Just in-line the LEAS instructions.
* Then there's no return address to juggle, no messing with X.
* DO NOT USE THIS CODE other than examples of silly walks.
ADD16S	PULS	X
	LEAS	16,S
	JMP	,X
* etc.
* Could all be replaced with just LEAS	16,S; in-line!
* That's actually cheaper than just the instruction JSR!!!


* And stacks restricted within page boundaries make no sense at all on the 6809.
* Pseudo-register somewhere in DP:
QSP	RMB	2	; a synthetic stack pointer Q
	...
	ORG	SOMETHING
	RMB	4	; buffer zone
QSTKLIM	RMB	64
QSTKBAS	RMB	4	; buffer zone
SSTKLIM	RMB	32
SSTKBAS	RMB	4	; buffer zone
	...

* signed B for synthetic stack:
ADBQSP	LDX	QSP
ADBQSX	LEAX	B,X	; does the whole pointer, negatives, too
	STX	QSP
	RTS
*
* unsigned B and D for synthetic stack:
ADUQSP	CLRA		; unsigned B entry point
ADDQSP	LDX	QSP
ADDQSX	LEAX	D,X	; does the whole pointer, negatives, too
	STX	QSP
	RTS
*
* Choose whether you want to negate D or move it around, and see above.
* Or just decide you can add a negative instead of subtracting
*
* Destroys A
SBSQSP	SEX	; sign extend B into A (Yes, that's the mnemonic.)
	BRA	SBDQSP
SBUQSP	CLRA	; B is unsigned, therefore positive
* 16-bit offset
SBDQSP	COMA		; no NEGD
	NEGB
	BNE	ADDQSX	; or BCS. but BNE works -- extends
	INCA
	BRA	ADDQSX

* Alternatively, use D for explicit subtraction
SBSQSPS	SEX	; sign extend B into A (Yes, that's the mnemonic.)
	BRA	SBDQSPS
SBUQSPS	CLRA	; B is unsigned, destroys A
SBDQSPS	PSHS	D	; for subtraction
	LDD	QSP	; Get things in the right place
	SUBD	,S++	; do the subtraction
	STD	QSP	; update
	RTS

* More stuff that there is no reason to do.
* Just in-line the LEAS B,S
ADBSP	PULS	X	; return address
	LEAS	B,S	; signed B, but full 16-bit address math.
	JMP	,X
*
* Just in-line the LEAS D,S
* D for return stack (but we saw this above):
ADDSP	PULS	X	; return address
	LEAS	D,S
	JMP	,X
*
* Just in-line the NEGB and LEAS B,S. Still cheaper than the call.
* signed B for return stack:
SBBSP	PULS	X	; return address
	NEGB
	LEAS	B,S	; full 16-bit address math
	JMP	,X	; return via X (the return address is no longer on the stack)
*
* This one might be worth a routine for,
* if you actually have to do it.
* D for return stack (but we saw this above):
SBDSP	PULS	X	; return address
	COMA
	NEGB
	BNE	SBDSPM
	INCA
SBDSPM	LEAS	D,S
	JMP	,X
* or
SBDSPS	LDX	,S	; return address	
	STD	,S	; offset
	TFR	S,D
	ADDD	#2	; adjust it
	SUBD	,S
	TFR	D,S
	JMP	,X

As you can see, the 6809 just basically does almost all the address math you need without subroutines.

Uhm, until we get to arrays, but let's not do that yet.
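
Two points from the listing are easy to check with a little arithmetic: ABX treats B as unsigned while LEAX B,X sign extends it, and subtracting an offset is the same as adding its 16-bit negation. Here is a small Python sketch of just that math. These are models of the instructions' results, not emulations, and the function names are made up:

```python
def abx(x, b):
    """ABX: B is added to X as an unsigned byte."""
    return (x + (b & 0xFF)) & 0xFFFF

def leax_b(x, b):
    """LEAX B,X: B is sign extended to 16 bits before the add."""
    b &= 0xFF
    return (x + (b - 0x100 if b & 0x80 else b)) & 0xFFFF

def leax_d(x, d):
    """LEAX D,X: full 16-bit add, wrapping modulo 65536."""
    return (x + d) & 0xFFFF

def negd(d):
    """Result of the COMA / NEGB / BNE / INCA sequence: 16-bit 2's complement."""
    return (-d) & 0xFFFF

print(hex(abx(0x1000, 0xF0)))           # 0x10f0, B taken as 240
print(hex(leax_b(0x1000, 0xF0)))        # 0xff0, B taken as -16
print(hex(leax_d(0x1000, negd(0x20))))  # 0xfe0, same as 0x1000 - 0x20
```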

[JMR202411031752 correction:]

In the comments to the code, I suggested (or asserted?) that there would be no reason on the 6809 to allocate a stack entirely within a single page so that the stack pointer math would never overflow, and the increment and decrement could be handled with the INC and DEC instructions only, ignoring overflow.

On my way to bed last night, I realized that would not entirely be true.

Pointer variables in the direct page cannot be indirected without loading the variable into an index register. So if your top of stack pointer is process local, there would be no point in not using the auto-inc/dec modes and LEA instructions to do the index updates.

But if the synthesized stack or queue is global to all processes (such as a system resource allocation stack or queue), it may be reasonable to use absolute (extended mode) addressing, in which case memory indirection is available. In that case, it may be completely sensible to use the optimization of no-overflow INC or DEC in a stack or queue allocated entirely within a single page:

* A synthetic stack contained entirely in a page,
* using absolute (extended mode) addressing:
	ORG	$400	; anywhere that ESPLOB to ESPHIB-1 are all within a page
ESPLOB	RMB	4	; bumper, lowest related address
ESPLIM	RMB	64	; 32 2-byte items possible on stack
ESPBAS	RMB	4	; bumper
ESPHIB	EQU	*	; highest related address (plus 1)
	...
ESP	RMB	2	; only the low byte will change
	...
EPSHD	DEC	ESP+1	; stack all within a page!
	DEC	ESP+1	; no carry
	STD	[ESP]	; indirection
	RTS
*
EPOPD	LDD	[ESP]	; indirection
	INC	ESP+1	; stack all within a page!
	INC	ESP+1	; no carry
	RTS
*
ADDBESP	ADDB	ESP+1	; signed
	STB	ESP+1	
	RTS
*
SUBBESP	PSHS	B	; unsigned
	LDB	ESP+1	; low byte of the pointer
	SUBB	,S+	; subtract the offset, no borrow out of the page
	STB	ESP+1
	RTS
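
The no-overflow shortcut depends entirely on the stack and its bumpers fitting in one page. A rough Python model of the pointer math shows where the low-byte-only update agrees with full 16-bit math and where it does not. The layout numbers follow the listing's ORG and RMB sizes, and the helper names are invented for this sketch:

```python
def step_low_byte(pointer, delta):
    """Model of INC/DEC/ADDB on ESP+1 only: the high byte never changes."""
    return (pointer & 0xFF00) | ((pointer + delta) & 0xFF)

def step_full(pointer, delta):
    """Model of a full 16-bit update such as LEAX or ADDD."""
    return (pointer + delta) & 0xFFFF

def same_page(lowest, highest_plus_1):
    """True when every address in [lowest, highest_plus_1) shares one high byte."""
    return (lowest >> 8) == ((highest_plus_1 - 1) >> 8)

ESPLOB = 0x0400                 # ORG from the listing
ESPHIB = ESPLOB + 4 + 64 + 4    # bumper + stack + bumper, so $0448

print(same_page(ESPLOB, ESPHIB))                                   # layout fits in one page
print(hex(step_low_byte(0x0444, -2)), hex(step_full(0x0444, -2)))  # both 0x442
print(hex(step_low_byte(0x0400, -2)), hex(step_full(0x0400, -2)))  # 0x4fe versus 0x3fe
```

The last line is what happens if the pointer is ever allowed to run off the bottom of the page: the low-byte update wraps around inside the page instead of borrowing.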

Hopefully, I can devote a chapter or three to giving this proper treatment somewhere down the road.

[JMR202411031752 correction end.]

Oh, and I have mentioned, I think, the DP register, and how it isn't as fully supported as I'd have liked.

The DP can be used as a base for per-process global variables (in other words, variables local to the process, but globally/statically allocated within the process). I discussed this to a certain extent in the 6800 addressing math chapter.

* On the 6800 or 6801, this would be referenced by a process-local
* LOCALBASE or similar pseudo-register, which I almost forgot to talk about.
* How to get the effective address of a variable in DP:
* Instead of 
*	LEAX	<VAR
* or
*	LEAX	VAR,DP
* or even 
*	LEAX	VAR-DPBAS,DP
* which we do not have in the 6809,
* we can do this --
*
* Given 
	ORG	$nn00		; even 256-byte page address
	SETDP	$nn
DPBASE	EQU	*
*	...
VAR	RMB	m
*
* In-line snippets --
* For variable VAR within 256 bytes of DPBASE:
	...
	LDB	#VAR-DPBASE	; put the offset in DP in B (unsigned)
	TFR	DP,A		; pull the base address high byte into A
	TFR	D,X		; move it to X
	...
*
* Using DP when VAR is 256 bytes or more away from DPBASE:
	...
	TFR	DP,A		; pull the base address high byte into A
	CLRB			; make the full base address
	ADDD	#VAR-DPBASE	; add the offset
	TFR	D,X		; move it to X
	...
*
* Or, if the assembler lets us split the offset up with advanced math:
	...
	TFR	DP,A
	LDB	#(VAR-DPBASE)&$FF	; bit-and mask -- no carry!
	ADDA	#(VAR-DPBASE)/$100	; add the high byte
	TFR	D,X
	...
* 
* As subroutines --
* unsigned offset in B:
LEADPUX	TFR	DP,A		; pull the base address high byte into A
	TFR	D,X		; move it to X
	RTS
*
* unsigned offset in D:
LEADPDX	PSHS	D		; save the offset
	TFR	DP,A		; pull the base address high byte into A
	CLRB			; make the full base address
	ADDD	,S++		; add the offset, dropping it from the stack
	TFR	D,X		; move it to X
	RTS
*
* Because DP is not in the index post-byte,
* in some applications, it may be better to keep 
* LOCBAS as a pseudo-register,
* in which case it would look like this --
* for small offsets < 128: 
ADDLBB	LDX	<LOCBAS	; but do this in-line!
	LEAX	B,X
	RTS
* for 127 < offset < 256, maybe, maybe not:
ADDLBU	CLRA		; unsigned offset
* for larger offsets
ADDLBD	LDX	<LOCBAS	; and definitely do this in-line, too!
	LEAX	D,X
	RTS	
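
The three in-line DP snippets above ought to produce the same address wherever their ranges overlap. Here is a quick way to convince yourself, as a Python model and not 6809 code. The function names are my own invention for this sketch, and dp stands for whatever is in the DP register.

```python
def ea_byte_offset(dp, offset):
    """LDB #offset / TFR DP,A / TFR D,X (offset must be 0..255)."""
    return dp << 8 | (offset & 0xFF)


def ea_addd(dp, offset):
    """TFR DP,A / CLRB / ADDD #offset / TFR D,X."""
    return ((dp << 8) + offset) & 0xFFFF


def ea_split(dp, offset):
    """TFR DP,A / LDB #offset&$FF / ADDA #offset/$100 / TFR D,X."""
    a = (dp + (offset >> 8)) & 0xFF    # ADDA: high bytes, no carry coming in
    b = offset & 0xFF                  # LDB: the base low byte is zero
    return a << 8 | b


def all_agree():
    """Check the snippets against each other where they should match."""
    for dp in range(0x100):
        for offset in range(0, 0x10000, 0x55):
            if ea_addd(dp, offset) != ea_split(dp, offset):
                return False
        for offset in range(0x100):
            if ea_byte_offset(dp, offset) != ea_addd(dp, offset):
                return False
    return True


print(all_agree())
```

The split version never needs a carry because the low byte of the base is always zero when the direct page starts on an even 256-byte boundary.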

As with the previous two chapters, I have not tested the code. It should run, modulo typos.

Even though I keep saying things like "in-line this", and "you don't need that", it may be hard to visualize the impact that 6809 addressing modes have on addressing math until we compare the stack frame code for the 6800 and 6801 to the stack frame code for the 6809.

Likewise the 68000. But let's get an overview of addressing math on the 68000 before we take a look at a concrete example of stack frames on the 6801. And on our way to addressing math on the 68000, let's take a detour for multi-byte negation on the 6809.

Or you can jump ahead to getting numeric output in binary.


(Title Page/Index)

Wednesday, October 30, 2024

ALPP 02-21 -- Some Address Math for the 6801

  Some Address Math
for the
6801

(Title Page/Index)

I had thought I would not need to show this for the 6801, but the difference between addressing math on the 6800 and on the 6801, due to being able to add and subtract the double accumulator and being able to push and pop X, is dramatic enough that I guess I should.

This chapter, then, will be an extension of the handwaving and conceptualizing in the unsteady footing chapter.

Even if you aren't interested in stack frames, this discussion of addressing math should be useful, although I'm adding it a bit earlier than I had planned.

In the 6801, as I keep noting, we have ABX to help us with address math, but no corollary SBX. 

But the D register math is wide enough to do addresses, the big problem being in moving addresses between D and X. Two pushes and a pop, or two pops and a push, is not bad, but going through a pseudo-register in the direct page works quicker, at the cost of more bytes of object code. And sometimes you don't want to use the whole D accumulator.

Now that I think of it, a sign-extend B into A instruction like the 6809's sign-extend instruction, SEX, might have been helpful in a few places. (cough.) Still, just using D is not an onerous burden.

We still have to use a pseudo-register for many/most of the calculations.

-- Which causes issues at interrupt time, unless we have separate pseudo-registers used by separate routines for interrupt-time, and copy the user tasks' pseudo-registers out and in on context switch.

Here are those NEGate D snippets, modified for 6801:

* For reference -- NEGate a 16-bit value in D (same as 6800) --
NEGD	COMA	; 2's complement NEGate is bit COMplement + 1
	NEGB
	BNE	NEGDX	; or BCS. but BNE works -- extends 0
	INCA
NEGDX	RTS
*
* Another way (use Double accumulator subtract):
NEGDS	PSHB
	PSHA
	CLRB	; 0 - D
	CLRA
	TSX
	SUBD	0,X
	INS
	INS
	RTS	
*
* Same thing using Double accumulator and a temporary
* somewhere in DP:
	...
SCRCHD	RMB	2
	...
* somewhere else
NEGDV	STD	SCRCHD
	LDD	#0	; 0 - D
	SUBD	SCRCHD
	RTS
	...

Remember to read the code and the comments in the code, and open up a separate browser window to compare side-by-side with the 6800. Read through my transliterations from the 6800, but don't jump to conclusions before you get to the very end.

Again, assume you have these declarations for the pseudo-registers:

	ORG	$80
	...
XOFFA	RMB	1
XOFFB	RMB	1
XOFFSV	RMB	2
	...

Using D is so much faster than either 8-bit accumulator that it really doesn't make much sense to provide anything but D-offset, but I've kept the 8-bit and subtract-by-negating entry points for reference. Lack of a negate D means this way to subtract de-optimizes subtraction, and, since the D offset is 16-bit, it's quicker to just load a negative offset in D and call ADDDX instead of bothering with using the SUBDX entry point.

ADDBX	CLRA
ADDDX	STX	XOFFSV
	ADDD	XOFFSV
	STD	XOFFSV
	LDX	XOFFSV
	RTS
SUBBX	CLRA	; B is unsigned
SUBDX	COMA
	NEGB
	BNE	ADDDX	; or BCS. but BNE works -- extends
	INCA
	BRA	ADDDX

If you want a SUBDX entry point for some reason, it may be worth keeping the logic separate and moving the operands. The Double accumulator math speeds this up significantly.

* Alternative, don't use ADDDX, use XOFFA and XOFFB instead 
SUBBX	CLRA	; B is unsigned
SUBDX	STD	XOFFA	; subtraction does not commute.
	STX	XOFFSV	; Handle operand order.
	LDD	XOFFSV
	SUBD	XOFFA
	STD	XOFFSV
	LDX	XOFFSV
	RTS

Just so I don't gloss over ABX, here's ADDBX as a subroutine. 8-bit offset SUBBX remains as it was for the 6800, except using ABX for the add means there's no code sharing:

* Working in byte offsets just takes that much more code than D,
* these are all superfluous.
* Well, the ABX instruction can be useful in-line.
* Alternative unsigned byte only
* subtract needs to be checked again
* range 0 to 255
ADDBX	ABX
	STX	XOFFSV
	RTS
* No improvements here without just using D.
SUBBX	STX	XOFFSV
	NEGB		; low byte of -B (B is unsigned)
	BEQ	SUBBXL	; subtracting 0 changes nothing
	DEC	XOFFSV	; high byte of -B is -1
	ADDB	XOFFSV+1
	BCC	SUBBXS
	INC	XOFFSV	; still need to bring the carry in
SUBBXS	STAB	XOFFSV+1
SUBBXL	LDX	XOFFSV
	RTS

Using ABX for the positive half of the signed 8-bit routines also emphasizes the lack of SBX in the 6801:

* ABX partially improves the positive half of things here,
* but you really don't want to do this.
* Needs to be checked again.
ADDSBX	STX	XOFFSV
	TSTB	; sign extend B
*	BEQ	ADSBXD	; use only if we really want to optimize 0
	BPL	ADSBXU
	ADDB	XOFFSV+1	; negative: high byte of the offset is -1
	BCS	ADSBXN	; a carry out of the low byte cancels the -1
	DEC	XOFFSV	; add -1 to the high byte
ADSBXN	STAB	XOFFSV+1
	LDX	XOFFSV
ADSBXD	RTS
ADSBXU	ABX
	STX	XOFFSV
	RTS

Return stack pointer math with byte offsets loses its meaning on the 6801. You really want the speed when doing math on S, so you're just going to use D.

PSHX and PULX help with handling the return address.

Again, you should recognize that the call writes the return address into the allocated space on allocation, so if you've stored before allocation, you'll be walking on what you stored.

The declarations -- note that we are adding SOFFA for the double accumulator:

* For S stack
* Even though we really don't want to be bumping the return stack that far,
* Using D is just faster on the 6801
	ORG	$90
	...
SOFFA	RMB	1
SOFFB	RMB	1
SOFFSV	RMB	2

And the code, watch the return address code:

* Here's what we can use the 6801 extensions for when doing unsigned byte offsets,
* but, really, use D instead:
	ORG	SOMETHING
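ADDBS	PULX	; get return address, restore stack address
	STS	SOFFSV
ADDBSA	ADDB	SOFFSV+1	; can't use ABX because we need X for return
	BCC	ADDBSL
	INC	SOFFSV	; bring the carry in
ADDBSL	STAB	SOFFSV+1	; store the new low byte
ADDBSX	LDS	SOFFSV
	JMP	0,X	; return through X
SUBBS	PULX	; get return address, restore stack address
	STS	SOFFSV
	NEGB		; low byte of -B (B is unsigned)
	BEQ	ADDBSX	; subtracting 0 changes nothing
	DEC	SOFFSV	; high byte of -B is -1
	BRA	ADDBSA	; and add the low byte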

Doing it with D instead, but use negative offsets instead of the SUBDS entry point:

* Do it with D, instead, but use negative offsets instead of SUBDS:
ADDDS	PULX	; get return address, restore stack address
	STS	SOFFSV
	ADDD	SOFFSV	; full 16-bit add, so negative offsets work, too
ADDDSL	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return through X
SUBDS	COMA
	NEGB
	BNE	ADDDS	; or BCS. but BNE works -- extend
	INCA
	BRA	ADDDS

Moving the operands around, if we think we must subtract positive offsets instead of adding negative offsets, improves things a lot. Again, just use D and call SUBDS instead of trying to optimize with the 8-bit B accumulator:

* use SOFFB instead of ADDBS
* range 0 to 255
* Need to check again
SUBBS	PULX	; get return address, restore stack pointer
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1	; B is unsigned, so the high byte of the offset is 0
	SUBB	SOFFB
	BCC	SUBBSL
	DEC	SOFFSV	; subtract the borrow
SUBBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X
* Do it with D, instead
* use SOFFA instead of ADDDS
SUBDS	PULX	; get return address, restore stack pointer
	STS	SOFFSV
	STD	SOFFA
	LDD	SOFFSV
	SUBD	SOFFA
	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return through X

At this point, I think it is obvious that long trains of INX are meaningless on the 6801: Two to four, in-line, sure. More, no.

Long trains for S also become questionable, but PULX can make an appearance. That's interesting, though more as something to think about than as something to use:

* For small decrements, 8 <= decrement <= 16
* Uses X for return
* Note that this writes the return address in the upper portion of the allocation,
* which may not be desirable.
SUB14S	PULX
	BRA	ISB14S
SUB12S	PULX
	BRA	ISB12S
SUB10S	PULX
	BRA	ISB10S
SUB8S	PULX
	BRA	ISB8S
SUB16S	PULX
ISB16S	DES
	DES
ISB14S	DES
	DES
ISB12S	DES
	DES
ISB10S	DES
	DES
ISB8S	DES
	DES
	DES	; SUB7S and less are shorter in-line
	DES
	DES	
	DES
	DES
	DES
	JMP	0,X
* For small increments, 6 <= increment <= 16
* Uses X for return
ADD14S	PULX
	BRA	IAD14S
ADD12S	PULX
	BRA	IAD12S
ADD10S	PULX
	BRA	IAD10S
ADD8S	PULX
	BRA	IAD8S
ADD6S	PULX
	BRA	IAD6S
ADD16S	PULX
IAD16S	INS
	INS
IAD14S	INS
	INS
IAD12S	INS
	INS
IAD10S	INS
	INS
IAD8S	INS
	INS
IAD6S	INS	; ADD5S and less are shorter in-line
	INS
	INS	
	INS
	INS
	INS
	JMP	0,X

I guess, since I'm being noisy about SBX not being implemented on the 6801, I should also be noisy about ABS (add B to S) and SBS (subtract B from S) being missing.

But so much of the above really becomes irrelevant if we just liberate ourselves from the stack frame mentality/paradigm. Stack frames really ought to be classed among Monty Python's silly walks. 

Stacks allocated entirely within a single page

Concerning the optimization of allocating stacks entirely within a page and only doing math on the low byte, the 6801 offers no improvement there. If anything, its 16-bit math makes the optimization less meaningful. I'll repeat the code, with the full address math alongside, to make that clear.

Oh, but working directly on the parameter stack pointer becomes more interesting.

* And stacks restricted within page boundaries no longer make as much sense on the 6801.
* Pseudo-registers somewhere in DP:
PSP	RMB	2
XOFFSV	RMB	2
XOFFA	RMB	1
XOFFB	RMB	1
SOFFA	RMB	1
SOFFB	RMB	1
SOFFSV	RMB	2
	...
	ORG	$500	; or something
	RMB	4	; buffer zone
PSTKLIM	RMB	64
PSTKBAS	RMB	4	; buffer zone
SSTKLIM	RMB	32
SSTKBAS	RMB	4	; buffer zone
	...

* B for parameter stack:
ADBPSX	STX	PSP
ADBPSP	ADDB	PSP+1	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS
*
* D for parameter stack:
ADDPSX	STX	PSP
ADDPSP	ADDD	PSP
	STD	PSP	; does the whole pointer, negatives, too
	LDX	PSP
	RTS
*
* B for parameter stack:
SBBPSX	STX	PSP
SBBPSP	STAB	XOFFB
	LDAB	PSP+1
	SUBB	XOFFB	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS
*
* D for parameter stack:
SBDPSX	STX	PSP
SBDPSP	STD	XOFFA
	LDD	PSP
	SUBD	XOFFA	; does the whole pointer
	STD	PSP
	LDX	PSP
	RTS

* B for return stack:
ADBSP	PULX	; return address
	STS	SOFFSV
	ADDB	SOFFSV+1	; Stack allocated completely within page, never carries.
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X
*
* D for return stack (but we saw this above):
ADDSP	PULX	; return address
	STS	SOFFSV
	ADDD	SOFFSV	; does the whole pointer, negatives, too
	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return via X

* B for return stack:
SBBSP	PULX	; return address
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1
	SUBB	SOFFB		; Stack allocated completely within page, never carries
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X
*
* D for return stack (but we saw this above):
SBDSP	PULX	; return address
	STS	SOFFSV
	STD	SOFFA
	LDD	SOFFSV
	SUBD	SOFFA		; does the whole pointer
	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return via X

As with the last chapter, I have not tested the code. I do think it should run, modulo typos.

[JMR202411021012 addendum:]

 Not stack frame related, but address math. I discussed it in the 6800 address math chapter, and I want to show the 6801 version of the code. 

This is for accessing per-process global variables that don't need such high-speed access that they are worth slowing process switches down with, which is almost all per-process variables except when the hardware application only has a few very limited processes. See the discussion before the 6800 snippets.

* In the DP somewhere, we have a base pointer for the per-process
* variable space, and a working register, something like
	...
LOCBAS	RMB	2
LBXPTR	RMB	2
	...
*
* And, to get the address of variables in the per-process variable space,
* something like these functions --
ADDLBB	CLRA			; entry point for the byte offset in B
ADDLBD	ADDD	LOCBAS		; entry point for larger offsets in A:B
	STD	LBXPTR
	RTS
*
ADDLBX	BSR	ADDLBB	; and load X
	LDX	LBXPTR
	RTS
*
ADDLDX	BSR	ADDLBD	; and load X
	LDX	LBXPTR
	RTS

[JMR202411021012 addendum end.]

And with this in mind, too, while thinking about how the 6801's enhanced instruction set can make some of the above code much less intransigent, let's remind ourselves why the 6809 and 68000 don't need routines like these before we take a look at a concrete example of stack frames on the 6801.

Or you can jump ahead to getting numeric output in binary.


(Title Page/Index)

ALPP 02-20 -- Some Address Math for the 6800

  Some Address Math
for the
6800

(Title Page/Index)

Perhaps I would not have gotten so tangled up in the discussion of stack frames if I had simply written this chapter immediately after the demonstration of 16- and 32-bit arithmetic on the 68000. But sometimes you just need to see a reason for doing something before you see someone doing it, or it blows your mind.

What is the difference between address math and other math?

Not a lot. You still have to pay attention to signs and stuff, and watch what happens when you wrap around the limits of your registers. Rings are fun, but you have to get used to them. 

Ah, yes, right. One thing about general address math is that you need to be aware of the limits of your registers. You often don't know in advance where in memory the address you're working on is going to be.

Not to say you don't have to be aware of limits in non-address math -- rather, where the limits hit and how they hit can be different, so you have to watch a different way.

One other difference is that, for general math, you want your call and result parameters in places where they can be easily carried from one stage in calculations to the next. That's why I have been demonstrating the use of the parameter stack versus global variables (versus registers).

For address math, if possible, you absolutely want your parameters and the result in registers, specifically the result in a particular register that can be used in addressing.

In the earliest CPUs, the math itself was hard enough (unknown enough) that addressing seemed to be an afterthought -- or even outside the plans. You can't plan well without knowing what you're planning for -- and what you're planning.

We really didn't know what we were doing. 

Intel, for instance, almost killed themselves in the mid-1970s working on a CPU design that was supposed to be the be-all-and-end-all of CPUs, the iAPX 432. But there was too much theory without experience, and it was slow and fought with itself to get work done. When they saw deadlines pass without end in sight, especially when rumors of what Motorola was doing hit the backyard fence, they scrambled and used part of what they had learned and produced the 8086, and the 8086 was definitely an improvement on the 8080 -- and saved their bacon when the 432 that was delivered didn't live up to promise. And the 8086 was a small enough step forward that it was easy for customers to adopt -- setting the stage for Intel to lead by adopting small improvements in steps that could be handled. But the 8086 also was, and its descendants still are, more than a little baroque.

Motorola, for their part, had figured out they needed to do something radical to stay competitive, and had started examining source code for the 6800 that they had access to, looking for ways to relieve computational bottlenecks. They used that research in the original design of the 68000, and there was a parallel team that had access to the research and put it to use in the design of the 6809.

And they hit a home run on the 6809 -- almost. Brought three runners in and left the DP register stranded on 3rd, so to speak. If you think of DP as the pinch runner or something. Okay, the metaphor doesn't quite work, unless you think of the DP register as the pinch runner for a wider address space, which it almost was.

The 68000 was another home run -- out of season and some overkill. And it has some warts, too.

Every real CPU is going to have warts. It's a mathematical requirement. 

I'm not kidding. There is an axiom in systems science:

Every model is insufficient to reality.

And that has some consequences:

  • Every system has vulnerabilities, and
  • every system contains the seeds of its own undoing, and
  • every market window is a sandpit.

Translated into general science, we know in advance that every theory and every law will eventually fail.

But that kind of cold water just is not popular in the sales department, so, instead of emblazoning it on the halls of all higher learning and in the chambers of legislatures, we hide it away. 

(Mostly -- there is some recognition at times -- POSIWID.) 

All of that to warn you:

Ugly code in here. 

I did some handwaving and conceptualizing for the 6801 in the unsteady footing chapter. I'm continuing with more handwaving and untested code in this chapter, but for the 6800.

First, the 6800 has nothing special for adding a constant to the index register, other than indexed mode, where the sum is ephemeral. That's great for some things like constant offsets (thus, the 6800's indexed mode), but not so great for some other things. And it's always a positive constant, which makes some stack-related uses hard.

In the 6801, we have ABX to add a small offset -- unsigned, less than 256 -- but no SBX to subtract an offset, and no signed ASBX or whatever.

The way the instruction set is constructed, we end up having to use a variable in memory to do the math, and because we have to use X to index the stack(s), passing the offset in as a dynamically allocated parameter is a case of trying to resolve a cyclic dependency.

Thus, we simply have to use a pseudo-register -- preferably in the direct page. 

-- Which causes issues at interrupt time, unless we have separate pseudo-registers used by separate routines for interrupt-time, and copy the user tasks' pseudo-registers out and in on context switch.

We'll be using 16-bit negation rather frequently, so keep a couple or three snippets in mind:

* For reference -- NEGate a 16-bit value in A:B --
NEGAB	COMA	; 2's complement NEGate is bit COMplement + 1
	NEGB
	BNE	NEGABX	; or BCS. but BNE works -- extends 0
	INCA
NEGABX	RTS
*
* Another way, using stack for temporary:
NEGABS	PSHB
	PSHA
	CLRB	; 0 - A:B
	CLRA
	TSX
	SUBB	1,X
	SBCA	0,X
	INS
	INS
	RTS	
*
* Same thing using a temporary
* somewhere in DP:
	...
SCRCHA	RMB	1
SCRCHB	RMB	1
	...
* somewhere else
NEGABV	STAA	SCRCHA
	STAB	SCRCHB
	CLRA		; 0 - A:B
	CLRB
	SUBB	SCRCHB
	SBCA	SCRCHA
	RTS
	...
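
The branch around the INCA is easy to get backwards, so here is a small model in Python (not 6800 code, and the names neg_ab and check_neg_ab are made up for this sketch). It mimics COMA, NEGB, and the conditional INCA on a pair of bytes, and compares the result with ordinary 16-bit negation for every possible value.

```python
def neg_ab(a, b):
    """Model of NEGAB: negate the 16-bit value held in A (high) and B (low)."""
    a = ~a & 0xFF              # COMA: complement the high byte
    b = -b & 0xFF              # NEGB: two's complement of the low byte
    if b == 0:                 # BNE skips the INCA when the low byte is non-zero
        a = (a + 1) & 0xFF     # INCA: the carry out of the low byte
    return a, b


def check_neg_ab():
    """Compare the byte-wise model with 16-bit negation, all 65536 cases."""
    for value in range(0x10000):
        a, b = neg_ab(value >> 8, value & 0xFF)
        if (a << 8 | b) != (-value & 0xFFFF):
            return False
    return True


print(check_neg_ab())
```

The only time the +1 carries out of the low byte is when the complemented low byte was $FF, which is to say, when B started out as zero. That is exactly the case where NEGB leaves zero behind.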

I'm going to assume that you'll be reading the code and the comments closely enough to tell when you should doubt me.

Assume you have these declarations for the pseudo-registers:

	ORG	$80
	...
XOFFA	RMB	1
XOFFB	RMB	1
XOFFSV	RMB	2
	...

These entry points should add and subtract offsets in A:B. Note that the code inverts A:B to do the subtraction, to avoid commutation issues. (Note carefully the INCA. I think I have this right for handling the NEGB when B is zero.)

ADDBX	CLRA
ADDDX	STX	XOFFSV
	ADDB	XOFFSV+1
	ADCA	XOFFSV
	STAB	XOFFSV+1
	STAA	XOFFSV
	LDX	XOFFSV
	RTS
SUBBX	CLRA	; B is unsigned
SUBDX	COMA
	NEGB
	BNE	ADDDX	; or BCS. but BNE works -- extends
	INCA
	BRA	ADDDX

As an alternative, we could move the operands around using more pseudo-registers (and remembering the consequences). This code may be a little easier to believe in, but it does mean two more bytes to save away and restore on context switch.

* Alternative, don't use ADDDX, use XOFFA and XOFFB instead 
SUBBX	CLRA	; B is unsigned
SUBDX	STAA	XOFFA	; subtraction does not commute.
	STAB	XOFFB	; Handle operand order.
	STX	XOFFSV
	LDAA	XOFFSV
	LDAB	XOFFSV+1
	SUBB	XOFFB
	SBCA	XOFFA
	STAA	XOFFSV
	STAB	XOFFSV+1
	LDX	XOFFSV
	RTS
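
If you want some reassurance that negating and then adding really gives the same answer as subtracting with the operands in the right order, a model like this can help. It is Python, not 6800 code. The function names and the list of test values are my own choices for the sketch, and it only samples edge cases instead of trying all four billion combinations.

```python
def add_dx(x, a, b):
    """Model of ADDDX: X + A:B, one byte at a time with a carry."""
    low = (x & 0xFF) + b                     # ADDB XOFFSV+1
    carry = low >> 8
    high = ((x >> 8) + a + carry) & 0xFF     # ADCA XOFFSV
    return high << 8 | (low & 0xFF)


def sub_dx_by_negating(x, a, b):
    """Model of SUBDX: negate A:B, then fall into the add."""
    a = ~a & 0xFF                            # COMA
    b = -b & 0xFF                            # NEGB
    if b == 0:                               # BNE ADDDX skips the INCA
        a = (a + 1) & 0xFF
    return add_dx(x, a, b)


EDGES = [0x0000, 0x0001, 0x00FF, 0x0100, 0x1234,
         0x7FFF, 0x8000, 0xFF00, 0xFFFF]


def check_sub_dx():
    """Compare against plain 16-bit subtraction over the edge cases."""
    for x in EDGES:
        for d in EDGES:
            if sub_dx_by_negating(x, d >> 8, d & 0xFF) != (x - d) & 0xFFFF:
                return False
    return True


print(check_sub_dx())
```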

You can optimize the above a bit if you limit offsets to 0 to 255, which is a completely reasonable restriction for many applications. I won't show those. I don't want to wear you out with too much untested code.

Signed byte offset (-128 to 127) is also completely reasonable for many applications, and may offer some aesthetic satisfaction:

* this is faster than SUBDX and almost as fast as ADDDX, 
* Range is -128 to 127 which should be enough for many purposes.
* But unsigned byte-only can be faster.
* Needs to be checked again.
ADDSBX	STX	XOFFSV
	TSTB	; sign extend B
*	BEQ	ADSBXD	; use only if we really want to optimize 0
	BPL	ADSBXU
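	ADDB	XOFFSV+1	; negative: high byte of the offset is -1
	BCS	ADSBXL	; a carry out of the low byte cancels the -1
	DEC	XOFFSV	; add -1 to the high byte
	BRA	ADSBXL
ADSBXU	ADDB	XOFFSV+1	; positive: high byte of the offset is 0
	BCC	ADSBXL
	INC	XOFFSV	; bring the carry in
ADSBXL	STAB	XOFFSV+1	; store the new low byte
	LDX	XOFFSV
ADSBXD	RTS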
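
Signed byte offsets are the place where I trust my own head the least, so here is one way to check the byte-wise reasoning before committing it to assembler. This is a Python model with made-up names, not 6800 code. The idea it tests: add the offset to the low byte; for a positive offset a carry bumps the high byte up, and for a negative offset the absence of a carry bumps the high byte down.

```python
def add_signed_byte(x, b):
    """Add signed byte b (passed as 0..255) to the 16-bit value x, byte-wise."""
    high, low = x >> 8, x & 0xFF
    total = low + b
    low, carry = total & 0xFF, total >> 8
    if b & 0x80:                      # negative: the sign extension is $FF
        if not carry:                 # no carry, so the -1 goes through
            high = (high - 1) & 0xFF
    else:                             # positive: the sign extension is $00
        if carry:
            high = (high + 1) & 0xFF
    return high << 8 | low


def check_add_signed_byte():
    """Compare with ordinary signed addition over a spread of pointers."""
    for x in range(0, 0x10000, 0x101):
        for b in range(0x100):
            signed = b - 0x100 if b & 0x80 else b
            if add_signed_byte(x, b) != (x + signed) & 0xFFFF:
                return False
    return True


print(check_add_signed_byte())
```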

And we can do similar things with the return stack, S. S, in particular, should never need offsets larger than 255 on the 6800, so we'll focus on the unsigned byte options. 

The stack has the additional constraints of requiring some means of handling the return address.

One more thing, you should recognize that the call writes the return address into the allocated space on allocation. If there is something important there, it's toast.

The declarations:

* For S stack
* unsigned byte only,
* because we really don't want to be bumping the return stack that much
	ORG	$90
	...
SOFFB	RMB	1
SOFFSV	RMB	2

And the code, watch the return address code:

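ADDBS	TSX
	LDX	0,X	; get return address
	INS
	INS		; restore stack address
	STS	SOFFSV
ADDBSA	ADDB	SOFFSV+1
	BCC	ADDBSL
	INC	SOFFSV	; bring the carry in
ADDBSL	STAB	SOFFSV+1	; store the new low byte
ADDBSX	LDS	SOFFSV
	JMP	0,X	; return through X
SUBBS	TSX
	LDX	0,X	; get return address
	INS
	INS		; restore stack address
	STS	SOFFSV
	NEGB		; low byte of -B (B is unsigned)
	BEQ	ADDBSX	; subtracting 0 changes nothing
	DEC	SOFFSV	; high byte of -B is -1
	BRA	ADDBSA	; and add the low byte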

Again, the subtraction can alternatively move the operands into the right order, at the cost of using another pseudo-register:

* use SOFFB instead of ADDBS
* range 0 to 255
* Need to check again
SUBBS	TSX
	LDX	0,X	; get return address
	INS
	INS		; restore stack pointer
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1	; B is unsigned, so the high byte of the offset is 0
	SUBB	SOFFB
	BCC	SUBBSL
	DEC	SOFFSV	; subtract the borrow
SUBBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X

You'll remember I made reference to long trains of INX and DEX as a substitute for direct math on X:

* For small increments <= 16
ADD16X	INX
	INX
ADD14X	INX
	INX
ADD12X	INX
	INX
ADD10X	INX
	INX
ADD8X	INX
	INX
ADD6X	INX
	INX
	INX	; ADD4X and less shorter in-line
	INX
	INX	
	INX
	RTS

* For small decrements <= 16
SUB16X	DEX
	DEX
SUB14X	DEX
	DEX
SUB12X	DEX
	DEX
SUB10X	DEX
	DEX
SUB8X	DEX
	DEX
SUB6X	DEX
	DEX
	DEX	; SUB4X and less shorter in-line
	DEX
	DEX	
	DEX
	RTS

Just jump to the label for the offset you need to add or subtract.

I know it looks ... ugly. But it works, and it avoids the use of pseudo-registers, and it's fast, and it actually doesn't use up more code space than the general routines we've looked at. These are worth considering.

And you're thinking, well, that's not going to work for the return stack? 

Hah!

* For small decrements, 8 <= decrement <= 16
* Uses X for return
* Note that this writes the return address in the upper portion of the allocation,
* which may not be desirable.
SUB14S	TSX
	LDX	0,X
	BRA	ISB14S
SUB12S	TSX
	LDX	0,X
	BRA	ISB12S
SUB10S	TSX
	LDX	0,X
	BRA	ISB10S
SUB8S	TSX
	LDX	0,X
	BRA	ISB8S
SUB16S	TSX
	LDX	0,X
ISB16S	DES
	DES
ISB14S	DES
	DES
ISB12S	DES
	DES
ISB10S	DES
	DES
ISB8S	DES
	DES
	DES	; SUB7S and less are shorter in-line
	DES
	DES	
	DES	; two less because of the return address
	JMP	0,X

* For small increments, 8 <= increment <= 16
* Uses X for return
ADD14S	TSX
	LDX	0,X
	BRA	IAD14S
ADD12S	TSX
	LDX	0,X
	BRA	IAD12S
ADD10S	TSX
	LDX	0,X
	BRA	IAD10S
ADD8S	TSX
	LDX	0,X
	BRA	IAD8S
ADD16S	TSX
	LDX	0,X
IAD16S	INS
	INS
IAD14S	INS
	INS
IAD12S	INS
	INS
IAD10S	INS
	INS
IAD8S	INS
	INS
	INS	; ADD7S and less are shorter in-line
	INS
	INS	
	INS
	INS
	INS
	INS	; two more to cover the return address
	INS
	JMP	0,X
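
The counts in these trains are easy to lose track of, so here is the bookkeeping as a tiny Python model. The function names are made up for this sketch. It only tracks how far S ends up from where it was before the call: negative steps stand for DES, positive steps for INS.

```python
def net_change_6800(steps):
    """JSR pushes 2 bytes; TSX / LDX 0,X only copies the return address."""
    return -2 + steps


def net_change_6801(steps):
    """JSR pushes 2 bytes; PULX pops them back off before the train runs."""
    return -2 + 2 + steps


# 6800: SUB16S runs 14 DES, ADD16S runs 18 INS.
print(net_change_6800(-14))
print(net_change_6800(+18))
# 6801, with PULX: a full 16 either way.
print(net_change_6801(-16))
print(net_change_6801(+16))
```

On the 6800 the return address is still sitting on the stack when the train starts, which is why the decrement train is two short and the increment train is two long.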

What's that? Do I hear complaints about the smell?

It's ugly, but it could be useful.

Stacks allocated entirely within a 256-byte page

Finally, if we are talking about stacks (and other largish things in memory), it may be possible to arrange them in memory so that the stacks lie completely within a single 256 byte page, such that the high byte of address does not change. This particular trick was used to great effect on the 6502 and 6805, in particular. 

We can use it on the 6800 in some cases, if we can be absolutely sure that everybody who ever touches the code is aware of the requirement to keep each stack entirely within a single page.

* Stacks within page boundaries:
* Pseudo-registers somewhere in DP:
PSP	RMB	2
XOFFSV	RMB	2
XOFFB	RMB	1
SOFFB	RMB	1
SOFFSV	RMB	2
	...
	ORG	$500	; or something
	RMB	4	; buffer zone
PSTKLIM	RMB	64
PSTKBAS	RMB	4	; buffer zone
SSTKLIM	RMB	32
SSTKBAS	RMB	4	; buffer zone
	...

* For parameter stack:
ADBPSX	STX	PSP
ADBPSP	ADDB	PSP+1	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS
*
SBBPSX	STX	PSP
SBBPSP	STAB	XOFFB
	LDAB	PSP+1
	SUBB	XOFFB	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS

* For return stack:
ADBSP	TSX
	LDX	0,X	; return address
	ADDB	#2		; faster, same byte count
	STS	SOFFSV
	ADDB	SOFFSV+1	; Stack allocated completely within page, never carries.
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X

SBBSP	TSX
	LDX	0,X	; return address
	SUBB	#2		; subtracting, so adjust for the return address the other way
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1
	SUBB	SOFFB		; Stack allocated completely within page, never carries
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X

Again, I have not tested the code. It should run. I think.
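
Since the whole trick depends on never crossing a page boundary, it may help to see what happens when that promise is broken. This is a Python model with made-up function names, not 6800 code, and the addresses are only borrowed from the $500 example above.

```python
def add_low_byte_only(pointer, offset):
    """ADDB PSP+1 / STAB PSP+1: update the low byte, drop any carry."""
    return (pointer & 0xFF00) | ((pointer + offset) & 0x00FF)


def stays_in_page(pointer, offset):
    """True when the full 16-bit result keeps the same high byte."""
    return ((pointer + offset) & 0xFF00) == (pointer & 0xFF00)


def shortcut_is_consistent(pointer, offset):
    """The shortcut matches full math exactly when no page is crossed."""
    full = (pointer + offset) & 0xFFFF
    matches = add_low_byte_only(pointer, offset) == full
    return matches == stays_in_page(pointer, offset)


print(hex(add_low_byte_only(0x0544, -2)))   # inside the page: same as full math
print(hex(add_low_byte_only(0x05FE, 4)))    # should be $0602, wraps to $0502 instead
print(all(shortcut_is_consistent(p, o)
          for p in range(0x0500, 0x0600)
          for o in range(-64, 65)))
```

Nothing in the shortcut itself will warn you when it wraps, which is why everybody who touches the code has to know about the restriction.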

As a reminder, we've already seen what code looks like without stack frames. The only reason I'm showing you this stuff is so that you understand why stack frames may not be preferred for many applications (and, if you can understand that, maybe you can sometime see it for all applications).

Well, no, not the only reason. Maybe the only reason I'm showing it to you now rather than later.

[JMR202411020931 addendum:]

This is not stack frame related, but it's address math related, and I think it would be good to discuss it here, lest I forget --

There are two approaches to per-process variables. 
  • Pseudo-registers like PSP, XWORK, XOFFSV, SOFFSV, etc. will either be saved and restored on process switch or will have separate versions for each task, if there are not too many.
  • Most per-process variables with global allocation should be in a per-process address space. 

You'll usually use both: a few pseudo-registers for variables that need quick access, and they need to be just a few, to keep the management overhead on task/process switch to a minimum. Every pseudo-register must be saved and restored on process switch --

Except for a couple of special cases, 

  • It's useful to keep system pseudo-registers separate from non-system pseudo-registers, complete with separate routines to manage them.
  • If there are just a few non-system processes in a small hardware application, it may be useful to give each process its own pseudo-registers, along with the routines to manage them.

What kinds of things need to be pseudo-registers? 

XWORK and other such temporaries, including SOFFSV and such above.

And PSP, as well. (Note that, if the system functions use a parameter stack, it should be a separate SPSP or something, which would have to have its own support routines.)

If there are a lot of per-process variables, you would need, separate from pseudo-registers, a process-local space. And you would need a pointer to that space, with routines to access the variables there:

* In the DP somewhere, we have a base pointer for the per-process
* variable space, and a working register, something like
	...
LOCBAS	RMB	2
LBXPTR	RMB	2
	...
*
* And, to get the address of variables in the per-process variable space,
* something like these --
ADDLBB	CLRA			; entry point for the byte offset in B
ADDLBD	ADDB	LOCBAS+1	; entry point for larger offsets in A:B
	ADCA	LOCBAS
	STAA	LBXPTR
	STAB	LBXPTR+1	; let other code load X
	RTS
*
ADDLBX	BSR	ADDLBB	; and load X
	LDX	LBXPTR
	RTS
*
ADDLDX	BSR	ADDLBD	; and load X
	LDX	LBXPTR
	RTS
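
Here is the same idea as a rough Python model, in case the point of LOCBAS gets lost in the byte shuffling. Everything in it is hypothetical: the process names, the base addresses, and the LINECOUNT variable are made up for the sketch. What it shows is that a variable is named by its offset alone, and a process switch only has to change the base pointer.

```python
memory = bytearray(0x10000)    # pretend 64K address space

# Hypothetical layout: each process gets its own variable space, and
# every variable is known by its offset from the start of that space.
LOCBAS_OF = {"editor": 0x2000, "printer": 0x2100}
LINECOUNT = 0x0004             # a made-up one-byte variable


def addlbd(locbas, offset):
    """Model of ADDLBD: base pointer plus offset, wrapped to 16 bits."""
    return (locbas + offset) & 0xFFFF


def store(process, offset, value):
    memory[addlbd(LOCBAS_OF[process], offset)] = value & 0xFF


def fetch(process, offset):
    return memory[addlbd(LOCBAS_OF[process], offset)]


store("editor", LINECOUNT, 7)
store("printer", LINECOUNT, 9)
print(fetch("editor", LINECOUNT))    # each process sees its own copy
print(fetch("printer", LINECOUNT))
```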

[JMR202411020931 addendum end.]

With all this in mind, look at how the 6801's enhanced instruction set can make some of the above code much less intransigent before we take a look at a concrete example of stack frames on the 6800.

Or you can jump ahead to getting numeric output in binary.


(Title Page/Index)

Tuesday, October 29, 2024

Teaching Myself Python, part 1

Right in the middle of my assembly language tutorial, I decided to teach myself Python.

(A friend needed some help with his classwork.)

I'm running the interactive interpreter, and this is a partial record of the conversation.

account@computer:~$ python
Python 2.7.17 (default, Sep 30 2024, 12:35:16) 
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

That's pretty much not unexpected. 

What did I just do? 

I invoked Python with just 

python

at the bash shell command-line $ prompt. And Python tells me what version I'm running and some information about how it was built, followed by a prompt about things people might be particularly interested in. 

And you see I'm running a slightly old python -- version 2.7. (I actually have python v. 3 running, as well, but python v. 2 is my default right now.)

Where did I get Python?

Python gets installed by default now on a lot of Posix OSes. It's available in pretty much every distribution's package manager. I'm running an older version of Ubuntu Linux. (I need to upgrade, trying to decide between going back to Debian or over to Devuan, or staying with Ubuntu, or heading back to one of the BSDs.) 

I think v. 2 was installed by default, and I installed v. 3 from the packages. Or maybe it was the other way around, or maybe I installed both from packages. I don't remember.

If you're running MacOS, you can get Python from python.org or through one of several 3rd party package managers. (Macs actually come with Python, and it should be good enough for what I'm doing here. But if you decide to use Python regularly, it's recommended that you install a separate version so that you don't disturb the system's version when you install Python stuff that Mac OS doesn't use.)

If you're running MSWindows, you can apparently get Python from the Microsoft Store as well as from python.org. Or you can install Cygwin and get Python from the Cygwin packages.

And Python is available from python.org and others in the Android and Apple iOS stores, from what I hear.

Anyway, Python has been installed, and I have been putting off actually learning it, for quite a while. 

And there are two fundamental ways of running Python --

  1. Run a program written in Python (whether by clicking an icon or by invoking it at the command line), and
  2. run the interactive Python interpreter itself (whether by clicking an icon or invoking it at the command line)

I'm doing the latter, invoking it at the command line. (See above.)

What I type once Python is running is on the lines that start

>>>

Let's see what happens when I type "help", like the prompt suggests:

>>> help
Type help() for interactive help, or help(object) for help about object.
>>> 

Huh? 

Hmm. (Head scratching ensues.) 

Maybe, type "help()"?

>>> help()

Welcome to Python 2.7!  This is the online help utility.

If this is your first time using Python, you should definitely check out
the tutorial on the Internet at http://docs.python.org/2.7/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To quit this help utility and
return to the interpreter, just type "quit".

To get a list of available modules, keywords, or topics, type "modules",
"keywords", or "topics".  Each module also comes with a one-line summary
of what it does; to list the modules whose summaries contain a given word
such as "spam", type "modules spam".

help>

And now it seems to be prompting me with "help>" instead of ">>>".

Okay, "spam" sounds interesting.

help> spam
no Python documentation found for 'spam'

help>

Oh. I guess I need to type "modules spam" for that.

help> modules spam

Here is a list of matching modules.  Enter any module name to get more help.

Gtk-Message: 07:40:20.836: Failed to load module "canberra-gtk-module"

help>

I guess that means that something is not installed. I don't feel like chasing that down right now, so let's forget about the module "spam".

Maybe I could read up on arrays in Python.

help> modules array

Here is a list of matching modules.  Enter any module name to get more help.

array - This module defines an object type which can efficiently represent
numpy.core._methods - Array methods which are called by both the C-code for the method
numpy.core.arrayprint - Array printing function
numpy.core.defchararray - This module contains a set of functions for vectorized string
numpy.core.info - Defines a multi-dimensional array and useful procedures for Numerical computation.
numpy.core.multiarray 
numpy.core.multiarray_tests 
numpy.core.records - Record Arrays
numpy.lib.arraypad - The arraypad module contains a group of functions to pad values onto the edges
numpy.lib.arraysetops - Set operations for arrays based on sorting.
numpy.lib.arrayterator - A buffered iterator for big arrays.
numpy.lib.format - Define a simple format for saving numpy arrays to disk with the full
numpy.lib.mixins - Mixin classes for custom array types that don't inherit from ndarray.
numpy.lib.recfunctions - Collection of utilities to manipulate structured arrays.
numpy.lib.twodim_base - Basic functions for manipulating 2d arrays
numpy.lib.ufunclike - Module of functions that are like ufuncs in acting on arrays and optionally
numpy.lib.user_array - Standard container-class for easy multiple-inheritance.
numpy.ma.extras - Masked arrays add-ons.
numpy.ma.testutils - Miscellaneous functions for testing masked arrays and subclasses

help> 

Well, maybe I'll chase down the arrays module later and just get back to trying out Python. It said I could type "quit" to go back:

help> quit

You are now leaving help and returning to the Python interpreter.
If you want to ask for help on a particular object directly from the
interpreter, you can type "help(object)".  Executing "help('string')"
has the same effect as typing a particular string at the help> prompt.
>>> 

Well, that's useful information about how to use the help function.

Let's see if I can get Python to say "hello".

>>> hello
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'hello' is not defined
>>> 

Not that way. Maybe tell it to print it out?

>>> print hello
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'hello' is not defined
>>>

That doesn't seem to be the way to do it, either. I guess it needs to be quoted:

>>> print "hello"
hello
>>> 

That seems to work. What next?

>>> name = "Joel"
>>> print "hello ", name
hello  Joel
>>> 

Let's see if I can stroke my ego some more:

>>> rank = 1
>>> print "hello", name, "you're number ", rank, "!"
hello Joel you're number  1 !
>>> 

Cheap thrills. Heh.
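
The doubled spaces in that output come from print itself: it puts one space between comma-separated items, on top of the space I typed inside the quotes. Here is a small sketch of one way around it, building the whole line first with the % string formatting operator. The variable names are the ones from the session above. The parentheses around the argument to print are only there so that the same lines should run under both Python 2.7 and Python 3.

```python
name = "Joel"
rank = 1

# Build the whole line as one string, so we control every space.
# %s substitutes a string, %d substitutes an integer.
line = "hello %s, you're number %d!" % (name, rank)
print(line)
```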

One more vanity one-liner, but it requires putting either the program or the name in a file. We'll put the program in a file. 

To do this, you open up a text editor. Some people like geany or kate. I generally use gedit or vim. All four and others are available in most Posix OSses' package managers, and can be downloaded for Mac OS or installed via 3rd party package managers. 

Of course, Mac OS has Xcode (vim, too!), and I have found Xcode quite useful on Mac, useful enough that I usually have not installed gedit. (Make sure you download the command-line tools when you download Xcode.)

And, if you must use MSWindows, all four can be installed from Cygwin packages if you install Cygwin. (Geany and Gedit are supported for MSWindows, and can be downloaded from their respective download sites, but Kate takes a bit more work if you don't get it through Cygwin.) Or you can use Microsoft's Visual Studio. (Bleaugh. Just kidding. Sort-of.)

Anyway, open up a text editor and type in the following one-liner and save it in a directory called something like "play/python" as "greet.py". You'll need to change directory in your command-line shell, as well. Oh. Yeah. You'll need to use a command-line shell for this one. I don't think it can be done easily when you start Python by clicking an icon.

Actually, this one-liner is a two-liner:

import sys
print "Hello", sys.argv[ 1 ]

As I say, save it as "greet.py". Then go to your command-line shell and type the command "python greet.py Joel":

account@computer:~/play/python$ python greet.py Joel
Hello Joel
account@computer:~/play/python$ 

By way of explanation --

(1) You can save your program in a file with a ".py" extension (ending on the name) and have python run the program for you. Interactive is fun, but there are some things that don't work well interactively (can't work well interactively).

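(2) Python makes whatever you type after the name of the program file available to your program in a list called "argv", in the system module called "sys". The first item in the list, sys.argv[ 0 ], is the name of the program file itself, which is why the name I typed shows up at sys.argv[ 1 ].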

If you're not used to using these command-line parameters, that may not be much explanation, but once you start using them, they're not hard to understand.
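
If you run greet.py without a name after it, sys.argv[ 1 ] doesn't exist and Python complains with an IndexError. Here is a rough sketch of a more forgiving version. The function name greeting and the usage message are made up for this illustration, and the code is written so that it should run under both Python 2.7 and Python 3.

```python
import sys

def greeting(argv):
    # argv[0] is the name of the program file itself;
    # the words typed after it start at argv[1].
    if len(argv) > 1:
        return "Hello " + argv[1]
    # No name given: say how to call the program instead of crashing.
    return "usage: python greet.py NAME"

print(greeting(sys.argv))
```

Passing the list in as a parameter, instead of reading sys.argv inside the function, makes it possible to try the function with a hand-made list.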

Well, let's see what else might be interesting. Can I make the thing count?

>>> mylist = [ 1, 2, 3, 4 ]
>>> print mylist, "I just plain adore ..."
[1, 2, 3, 4] I just plain adore ...
>>> 

Rim shot. Let's try an explicit loop:

>>> for i in range( 1, 4 )
  File "<stdin>", line 1
    for i in range( 1, 4 )
                         ^
SyntaxError: invalid syntax
>>> 

Ooooh, rejected!

Needs a colon on the end. Try again:

>>> for i in range( 1, 4 ):
...   print i
... 
1
2
3
>>> 

The "..." ellipsis prompt means it wants you to type more. So I typed the body of the loop.

And it's still not quite there. The last number in the range is not included.
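
A quick sketch to check that: range counts from the first number up to, but not including, the second, so to get 4 in the loop the second number has to be 5. The variable names are made up for the illustration. The call to list() is there so the numbers are visible under Python 3 as well, where range doesn't hand back a list directly.

```python
# range(start, stop) counts from start up to, but not including, stop.
one_to_three = list(range(1, 4))
one_to_four = list(range(1, 4 + 1))  # add 1 to the stop value to include 4

print(one_to_three)
print(one_to_four)
```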

Point one, for people who like their code blocks to have beginning braces and end braces or BEGIN and END keywords, Python uses indentation to demarcate blocks.

Yeah. The whitespace. The stuff you've trained yourself to not see. Or at least I have.

That's one of the reasons I got miffed at Python in the past. But I have a friend this time, so I'll just go with it.

I do see that the colon seems to be a sort of BEGIN marker. And after some playing around late last night, I figured out that, even though python doesn't require it, you can put comment characters in place to show where the indentation hits. It's kind of like painting the fingers of your invisible robot hands so you can see what they're doing, but it seems to sort of do the job.
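
As a sketch of what I mean by marking the indentation with comments, here is a loop with a conditional inside it. The end-of-block comments are my own convention, not something Python asks for; the indentation alone decides which lines belong to which block. The list name evens is made up for the illustration.

```python
evens = []
for i in range(1, 7):
    if i % 2 == 0:
        evens.append(i)   # inside the if, which is inside the loop
        # end of conditional
    # end of loop

# Back at the left margin: this line runs once, after the loop is done.
print(evens)
```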

My friend needs help with working through a list, summing up a column, and taking an average. So let's look at two ways to sum a list in Python. This is a kind of a sudden jump, but we've seen most of the essential elements of the language that I'm using, so, hang on to your hat, and I'll hang on to mine:

Grabbing input from the keyboard can get confusing, so we're going to save this as a program, call it "totaloop.py":

total = 0
count = 0
for i in range( 1, 10):
  numstr = input()
  previous = total
  total = total + numstr
  count = count + 1
  print count, ": ", previous, " + ", numstr, " = ", total ;
  # end of loop

average = float( total ) / count
print "average of ", total, " / ", count, " is ", average

What this does is

  • read a number from the keyboard,
  • add it immediately to the total, and
  • print out the number, count, and running total
  • until 9 numbers have been read; 
  • at which point, the average is calculated and printed out.

Make sure you save it with the empty trailing line. Call it from the command line, like this:

$ python totaloop.py

And then type in a bunch of numbers:

$ python totaloop.py
2
1 :  0  +  2  =  2
6
2 :  2  +  6  =  8
3
3 :  8  +  3  =  11
2
4 :  11  +  2  =  13
3
5 :  13  +  3  =  16
6
6 :  16  +  6  =  22
4
7 :  22  +  4  =  26
9
8 :  26  +  9  =  35
3
9 :  35  +  3  =  38
average of  38  /  9  is  4.22222222222

You can see that I typed in the numbers 2 6 3 2 3 6 4 9 3.
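
The same running-total idea can be sketched as a function that takes the numbers in a list instead of reading the keyboard, which makes it easy to try without typing. The function name running_average is made up for this illustration, and the code is written so that it should run under both Python 2.7 and Python 3.

```python
def running_average(values):
    total = 0
    count = 0
    for value in values:
        total = total + value   # add each number as soon as we see it
        count = count + 1
        # end of loop
    # float() keeps Python 2 from doing whole-number division here.
    return total, count, float(total) / count

total, count, average = running_average([2, 6, 3, 2, 3, 6, 4, 9, 3])
print("average of %d / %d is %f" % (total, count, average))
```

Note that an empty list would make this divide by zero. Also, one thing to watch if you try the keyboard version under Python 3: there, input() returns what you typed as a string, so it has to go through int() or float() before it can be added to the total.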

Now, instead of immediately calculating the running total, we'll input the entire list first, and then use a built-in function to sum up the whole list. Save this one as "totalist.py":

numbers = []
for i in range( 1, 10 ):
  numstr = input()
  numbers.append( float( numstr ) )
  print i, ": ", numstr, numbers 
  # End of loop

total = sum( numbers )
average = total / len( numbers )

print "list: ", numbers
print "sum is ", total, " and average is ", average

Re-emphasizing, the program 

  • reads in a number,
  • appends it to the list,
  • prints the number read and the list
  • until 9 numbers are read,
  • and then, after the loop is finished, sums them all up and
  • prints the results 

Call it from the command line and type in the list of numbers typed in before just so we can compare:

$ python totalist.py
2
1 :  2 [2.0]
6
2 :  6 [2.0, 6.0]
3
3 :  3 [2.0, 6.0, 3.0]
2
4 :  2 [2.0, 6.0, 3.0, 2.0]
3
5 :  3 [2.0, 6.0, 3.0, 2.0, 3.0]
6
6 :  6 [2.0, 6.0, 3.0, 2.0, 3.0, 6.0]
4
7 :  4 [2.0, 6.0, 3.0, 2.0, 3.0, 6.0, 4.0]
9
8 :  9 [2.0, 6.0, 3.0, 2.0, 3.0, 6.0, 4.0, 9.0]
3
9 :  3 [2.0, 6.0, 3.0, 2.0, 3.0, 6.0, 4.0, 9.0, 3.0]
list:  [2.0, 6.0, 3.0, 2.0, 3.0, 6.0, 4.0, 9.0, 3.0]
sum is  38.0  and average is  4.22222222222

The first way uses less memory.

The second way is what you use if you need to do more than one thing with the list of numbers.
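
As a sketch of "more than one thing", here is the same list of numbers with the sum, the average, the smallest, and the largest all taken from it. sum, len, min, and max are all built-in functions; the variable names are made up for the illustration.

```python
numbers = [2.0, 6.0, 3.0, 2.0, 3.0, 6.0, 4.0, 9.0, 3.0]

# Each of these walks the list again, which only works
# because the whole list is still in memory.
total = sum(numbers)
average = total / len(numbers)
smallest = min(numbers)
largest = max(numbers)

print("sum %s, average %s" % (total, average))
print("smallest %s, largest %s" % (smallest, largest))
```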

Let's say you want to remember what the 4th number entered was and print it out at the end. 

You could add code to the first one like this:

total = 0
count = 0
memo = 0
for i in range( 1, 10):
  numstr = input()
  previous = total
  total = total + numstr
  count = count + 1
  if count == 4:
    memo = numstr
    # end of conditional
  print count, ": ", previous, " + ", numstr, " = ", total ;
  # end of loop

average = float( total ) / count
print "average of ", total, " / ", count, " is ", average
print "4th number:", memo

Essentially, watch the count, and when it hits 4 put the input number in the memo variable.

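Note that, in a program file, the indentation alone ends a block: the loop and the conditional if end where the indentation drops back. Python doesn't require anything more there. I'm marking the end of each block with a comment at the block's indentation just so I can see it. (Comments have no code, so they don't change what the program does.)

The interactive interpreter is fussier. There, a completely empty line ends the whole loop, so if you leave an empty line in the middle of the body, the loop terminates early and Python makes a fuss about the indented lines that follow.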

Here's a way we can do it if we keep the whole list in memory:

numbers = []
for i in range( 1, 10 ):
  numstr = input()
  numbers.append( float( numstr ) )
  print i, ": ", numstr, numbers 
  # End of loop

total = sum( numbers )
average = total / len( numbers )

print "list: ", numbers
print "sum is ", total, " and average is ", average
# first is numbers[ 0 ], 4th is numbers[ 3 ]
print "4th number:", numbers[ 3 ]

Since we still have the whole list in memory, we can just index the 4th element. Remember, lists start with index 0, so the first one is numbers[0]. Therefore, the 4th one is at index 3, numbers[3].
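
If the off-by-one bothers you, the subtraction can be tucked into a small helper. This is only a sketch; the function name nth is made up for the illustration.

```python
numbers = [2.0, 6.0, 3.0, 2.0, 3.0, 6.0, 4.0, 9.0, 3.0]

def nth(items, n):
    # n counts from 1 the way people do; the list index counts from 0.
    return items[n - 1]

first = nth(numbers, 1)    # same as numbers[0]
fourth = nth(numbers, 4)   # same as numbers[3]
print("1st: %s, 4th: %s" % (first, fourth))
```

One caution: asking this helper for n = 0 would quietly give back the last item, because Python counts negative indexes from the end of the list.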

Hopefully, this will be enough for my friend to work it through.

Saturday, October 26, 2024

ALPP 03-14 -- Keyboard Input Routines and Character Code Output on the 68000 (Debug Session -- Dealloc Error)

Keyboard Input Routines
and Character Coded Output
on the 68000
(Debug Session -- Dealloc Error)

(Title Page/Index)

 

So we found one of the bugs in our test program, the one that reads the keyboard and shows the character and the character code in binary and hexadecimal. 

And I told you we should use techniques that I have described to check that stack balance has been maintained.

But inserting test code into code is not just a great way to test code, it's also a great way to insert new bugs, mask old bugs, and increase opportunities to accidentally alter the code.

So we want to figure out what parts of the code we want to look at before we insert the code to look at it with.

Let's start another Hatari session and set some breakpoints. You'll want the assembly output listing from vasm in a text editor window for reference. In my case, I called it "inkey_68K.list" when I did the assembly the last time:

vasmm68k_mot -Ftos -no-opt -o INKEY_68K.PRG -L inkey_68K.list inkey_68K.s

Without either the listing or the source code open to look at, you'll be flying blind. Even with the listing, you'll be flying instrument rules, so to speak.

Break into the debugger and set the TEXT breakpoint, of course. Then (c)ontinue:

----------------------------------------------------------------------
You have entered debug mode. Type c to continue emulation, h for help.

CPU=$e1d7e2, VBL=1366, FrameCycles=128, HBL=0, LineCycles=128, DSP=N/A
00e1d7e2 46c0                     move.w d0,sr
> b pc=TEXT
CPU condition breakpoint 1 with 1 condition(s) added:
	pc = TEXT
> c
Returning to emulation...

Back at the EMUCON console, invoke the program (INKEY_68K.PRG above) and when it tries to enter the TEXT segment code it will take you to the breakpoint. 

Step into the BRA START at ENTRY and take a disassembly from the PC at START:

1. CPU breakpoint condition(s) matched 1 times.
	pc = TEXT
Reading symbols from program '/home/nova/usr/share/hatari/C:/primer/char_io/stepinch/INKEY_68K.PRG' symbol table...
TOS executable, DRI / GST symbol table, reloc=0, program flags: PRIVATE (0x0)
Program section sizes:
  text: 0x350, data: 0x0, bss: 0x0, symtab: 0x32c
Trying to load DRI symbol table at offset 0x36c...
Offsetting BSS/DATA symbols from TEXT section.
Skipping duplicate address & symbol name checks when autoload is enabled.
Loaded 56 symbols (41 TEXT) from '/home/nova/usr/share/hatari/C:/primer/char_io/stepinch/INKEY_68K.PRG'.

CPU=$13d10, VBL=3801, FrameCycles=210184, HBL=206, LineCycles=888, DSP=N/A
00013d10 6000 02a0                bra.w #$02a0 == $00013fb2 (T)
> s

CPU=$13fb2, VBL=3801, FrameCycles=210196, HBL=206, LineCycles=900, DSP=N/A
00013fb2 6100 ff34                bsr.w #$ff34 == $00013ee8
> d
(PC)
START:
00013fb2 6100 ff34                bsr.w #$ff34 == $00013ee8
00013fb6 4e71                     nop 
00013fb8 6100 005a                bsr.w #$005a == $00014014
DONE:
00013fbc 4e71                     nop 
00013fbe 4ced f000 0008           movem.l (a5,$0008) == $00014068,a4-a7
00013fc4 4e71                     nop 
00013fc6 4e71                     nop 
00013fc8 4e71                     nop 
00013fca 4e71                     nop 
00013fcc 4267                     clr.w -(a7) [0000]
00013fce 4e41                     trap #$01
INCHNE:
00013fd0 610a                     bsr.b #$0a == $00013fdc
00013fd2 c0bc 0000 ffff           and.l #$0000ffff,d0
00013fd8 2d00                     move.l d0,-(a6) [00000000]
00013fda 4e75                     rts  == $00000000
INCHV:
00013fdc 3f3c 0002                move.w #$0002,-(a7) [0000]
00013fe0 3f3c 0002                move.w #$0002,-(a7) [0000]
00013fe4 4e4d                     trap #$0d
> 
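
The target addresses that the disassembler prints after "==" can be checked by hand. On the 68000, a branch displacement is a signed number that gets added to the address of the branch instruction plus 2. Here is a rough sketch of that arithmetic, in Python rather than assembler, with function names made up for the illustration and the addresses copied from the disassembly above:

```python
def sign_extend(value, bits):
    # Treat the low `bits` bits of value as a two's complement number.
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

def branch_target(address, displacement, bits=16):
    # The displacement is relative to the instruction address plus 2.
    return address + 2 + sign_extend(displacement, bits)

# Values copied from the disassembly above.
print(hex(branch_target(0x13d10, 0x02a0)))     # bra.w at ENTRY
print(hex(branch_target(0x13fb2, 0xff34)))     # bsr.w at START
print(hex(branch_target(0x13fd0, 0x0a, 8)))    # bsr.b at INCHNE
```

The results should match the addresses the disassembler shows for those three instructions: $13fb2, $13ee8, and $13fdc.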

Get a look at the registers and step through INITRT, watching the stack and run-time initializations. Show the registers again at return, even if you don't need to see them before that.

Remember that INITRT returns through JMP A0, not RTS:

> r

  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00000000   A1 00000000   A2 00000000   A3 00000000 
  A4 00014060   A5 00014060   A6 00077FC6   A7 00077FF8 
USP  00077FF8 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff34 (ILLEGAL) Chip latch 00000000
00013fb2 6100 ff34                bsr.w #$ff34 == $00013ee8
Next PC: 00013fb6
> s

CPU=$13ee8, VBL=3801, FrameCycles=210216, HBL=206, LineCycles=920, DSP=N/A
00013ee8 205f                     movea.l (a7)+ [00013fb6],a0
> s

CPU=$13eea, VBL=3801, FrameCycles=210228, HBL=206, LineCycles=932, DSP=N/A
00013eea 47fa fe24                lea.l (pc,$fe24) == $00013d10,a3
> s

CPU=$13eee, VBL=3801, FrameCycles=210236, HBL=206, LineCycles=940, DSP=N/A
00013eee 48eb f000 0008           movem.l a4-a7,(a3,$0008) == $00013d18
> s

CPU=$13ef4, VBL=3801, FrameCycles=210280, HBL=206, LineCycles=984, DSP=N/A
00013ef4 2a4b                     movea.l a3,a5
> s

CPU=$13ef6, VBL=3801, FrameCycles=210284, HBL=206, LineCycles=988, DSP=N/A
00013ef6 4fed 0148                lea.l (a5,$0148) == $00013e58,a7
> s

CPU=$13efa, VBL=3801, FrameCycles=210292, HBL=206, LineCycles=996, DSP=N/A
00013efa 4ded 01d0                lea.l (a5,$01d0) == $00013ee0,a6
> s

CPU=$13efe, VBL=3801, FrameCycles=210300, HBL=206, LineCycles=1004, DSP=N/A
00013efe 4ed0                     jmp (a0)
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00013FB6   A1 00000000   A2 00000000   A3 00013D10 
  A4 00014060   A5 00013D10   A6 00013EE0   A7 00013E58 
USP  00013E58 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 4ed0 (JMP) 4286 (CLR) Chip latch 00000000
00013efe 4ed0                     jmp (a0)
Next PC: 00013f00
> 

Step into the main routine, PGSTRT, and, before you step too far, show the registers and get a disassembly from the PC. Take particular note of the parameter stack pointer, A6.

Remember, when you step, and when you show registers, it shows you the next op-code to perform, not the one just completed:

> s

CPU=$13fb6, VBL=3801, FrameCycles=210308, HBL=206, LineCycles=1012, DSP=N/A
00013fb6 4e71                     nop 
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00013FB6   A1 00000000   A2 00000000   A3 00013D10 
  A4 00014060   A5 00013D10   A6 00013EE0   A7 00013E58 
USP  00013E58 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 4e71 (NOP) 6100 (BSR) Chip latch 00000000
00013fb6 4e71                     nop 
Next PC: 00013fb8
> s

CPU=$13fb8, VBL=3801, FrameCycles=210312, HBL=207, LineCycles=0, DSP=N/A
00013fb8 6100 005a                bsr.w #$005a == $00014014
> s

CPU=$14014, VBL=3801, FrameCycles=210332, HBL=207, LineCycles=20, DSP=N/A
00014014 41fa ffd4                lea.l (pc,$ffd4) == $00013fea,a0
> s

CPU=$14018, VBL=3801, FrameCycles=210340, HBL=207, LineCycles=28, DSP=N/A
00014018 2d08                     move.l a0,-(a6) [00000000]
> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00013FEA   A1 00000000   A2 00000000   A3 00013D10 
  A4 00014060   A5 00013D10   A6 00013EE0   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 2d08 (MOVE) 6100 (BSR) Chip latch 00000000
00014018 2d08                     move.l a0,-(a6) [00000000]
Next PC: 0001401a
> s

CPU=$1401a, VBL=3801, FrameCycles=210352, HBL=207, LineCycles=40, DSP=N/A
0001401a 6100 ff50                bsr.w #$ff50 == $00013f6c
> d
(PC)
0001401a 6100 ff50                bsr.w #$ff50 == $00013f6c
0001401e 6100 ffb0                bsr.w #$ffb0 == $00013fd0
00014022 2d16                     move.l (a6) [00013fea],-(a6) [00000000]
00014024 41fa ffe2                lea.l (pc,$ffe2) == $00014008,a0
00014028 2d08                     move.l a0,-(a6) [00000000]
0001402a 6100 ff40                bsr.w #$ff40 == $00013f6c
0001402e 6100 ff24                bsr.w #$ff24 == $00013f54
00014032 2d16                     move.l (a6) [00013fea],-(a6) [00000000]
00014034 41fa ffd7                lea.l (pc,$ffd7) == $0001400d,a0
00014038 2d08                     move.l a0,-(a6) [00000000]
0001403a 6100 ff30                bsr.w #$ff30 == $00013f6c
0001403e 6100 fef0                bsr.w #$fef0 == $00013f30
00014042 2d16                     move.l (a6) [00013fea],-(a6) [00000000]
00014044 41fa ffc9                lea.l (pc,$ffc9) == $0001400f,a0
00014048 2d08                     move.l a0,-(a6) [00000000]
0001404a 6100 ff20                bsr.w #$ff20 == $00013f6c
0001404e 6100 feb0                bsr.w #$feb0 == $00013f00
00014052 6100 fef6                bsr.w #$fef6 == $00013f4a
00014056 221e                     move.l (a6)+ [00013fea],d1
00014058 b23c 0051                cmp.b #$51,d1
0001405c 66b6                     bne.b #$b6 == $00014014 (T)
0001405e 4e75                     rts  == $00013fbc
>

Use the listing and the disassembly to work out the addresses for the breakpoints, and set a breakpoint after every call:

> b pc=$1401e
CPU condition breakpoint 2 with 1 condition(s) added:
	pc = $1401e
> b pc=$14022
CPU condition breakpoint 3 with 1 condition(s) added:
	pc = $14022
> b pc=$1402e
CPU condition breakpoint 4 with 1 condition(s) added:
	pc = $1402e
> b pc=$14032
CPU condition breakpoint 5 with 1 condition(s) added:
	pc = $14032
> b pc=$1403e
CPU condition breakpoint 6 with 1 condition(s) added:
	pc = $1403e
> b pc=$14042
CPU condition breakpoint 7 with 1 condition(s) added:
	pc = $14042
> b pc=$1404e
CPU condition breakpoint 8 with 1 condition(s) added:
	pc = $1404e
> b pc=$14052
CPU condition breakpoint 9 with 1 condition(s) added:
	pc = $14052
> b pc=$14056
CPU condition breakpoint 10 with 1 condition(s) added:
	pc = $14056
> 
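
Each of those breakpoint addresses is just the address of a bsr.w in the disassembly plus 4, since a bsr.w with a 16-bit displacement takes four bytes. Here is a small sketch that generates the debugger commands from the call addresses. The names call_sites and BSR_W_SIZE are made up for the illustration; the addresses are copied from the disassembly above.

```python
# Addresses of the bsr.w instructions in the main loop.
call_sites = [0x1401a, 0x1401e, 0x1402a, 0x1402e,
              0x1403a, 0x1403e, 0x1404a, 0x1404e, 0x14052]

BSR_W_SIZE = 4   # two bytes of op-code plus two bytes of displacement

# The instruction right after each call is where the call returns to.
return_addresses = [site + BSR_W_SIZE for site in call_sites]
for address in return_addresses:
    print("b pc=$%x" % address)
```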

Show the registers and continue to the first breakpoint, and repeat, watching the stacks, in particular. Check the listing for what should be on the parameter stack before and after each call.

The first call is to OUTS, and the address of the PROMPT string should be on the stack. From $13EE0 at empty stack to $13EDC is four bytes, so that's one address.

> r
  D0 00000000   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00013FEA   A1 00000000   A2 00000000   A3 00013D10 
  A4 00014060   A5 00013D10   A6 00013EDC   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff50 (ILLEGAL) Chip latch 00000000
0001401a 6100 ff50                bsr.w #$ff50 == $00013f6c
Next PC: 0001401e
> c
Returning to emulation...
2. CPU breakpoint condition(s) matched 1 times.
	pc = $1401e

CPU=$1401e, VBL=3802, FrameCycles=27092, HBL=26, LineCycles=676, DSP=N/A
0001401e 6100 ffb0                bsr.w #$ffb0 == $00013fd0
> r
  D0 00000001   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00000000   A1 00002F50   A2 00000000   A3 00014008 
  A4 00014060   A5 00013D10   A6 00013EE0   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=1 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ffb0 (ILLEGAL) Chip latch 00000000
0001401e 6100 ffb0                bsr.w #$ffb0 == $00013fd0
Next PC: 00014022
> 

When it returns, A6 = $13EE0 shows that the string address has been removed, and the stack is empty again.
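
Counting what is on the parameter stack from A6 is simple subtraction: the stack grows downward from $13EE0, four bytes per item. Here is a sketch of that count, with the names STACK_EMPTY, CELL_SIZE, and depth made up for the illustration and the values taken from the register dumps above:

```python
STACK_EMPTY = 0x13EE0   # A6 right after INITRT, nothing on the stack
CELL_SIZE = 4           # addresses and integers are pushed as long words

def depth(a6):
    # The stack grows toward lower addresses, so subtract A6 from the base.
    return (STACK_EMPTY - a6) // CELL_SIZE

print(depth(0x13EDC))   # before the call to OUTS
print(depth(0x13EE0))   # after OUTS returns
```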

Now it calls INCHNE:

> c
Returning to emulation...
3. CPU breakpoint condition(s) matched 1 times.
	pc = $14022

CPU=$14022, VBL=3803, FrameCycles=232088, HBL=228, LineCycles=440, DSP=N/A
00014022 2d16                     move.l (a6) [0000000d],-(a6) [00000000]
> r
  D0 0000000D   D1 00002310   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00002F50   A1 00002F50   A2 00000000   A3 00014008 
  A4 00014060   A5 00013D10   A6 00013EDC   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 2d16 (MOVE) 41fa (LEA) Chip latch 00000000
00014022 2d16                     move.l (a6) [0000000d],-(a6) [00000000]
Next PC: 00014024
> 

When it returns from INCHNE, it has the character on the stack in a full 4 byte integer ($13EDC). 

I didn't look at the contents of the parameter stack here, but, if you need to, you can use the (m)emory dump command

> m a6 32

to show the top 32 bytes.

It should be noted that the comment "duplicate" in the source code somehow moved two lines below where it should be.

> s

CPU=$14024, VBL=3803, FrameCycles=232108, HBL=228, LineCycles=460, DSP=N/A
00014024 41fa ffe2                lea.l (pc,$ffe2) == $00014008,a0
> s

CPU=$14028, VBL=3803, FrameCycles=232116, HBL=228, LineCycles=468, DSP=N/A
00014028 2d08                     move.l a0,-(a6) [00000000]
> s

CPU=$1402a, VBL=3803, FrameCycles=232128, HBL=228, LineCycles=480, DSP=N/A
0001402a 6100 ff40                bsr.w #$ff40 == $00013f6c
> r
  D0 0000000D   D1 00002310   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 00014008   A1 00002F50   A2 00000000   A3 00014008 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff40 (ILLEGAL) Chip latch 00000000
0001402a 6100 ff40                bsr.w #$ff40 == $00013f6c
Next PC: 0001402e
> 

At this point, A6 is $13ED4 -- three integers on the stack: the character input, a copy (duplicate) of the character, and a pointer to a bit of leader text to demarcate it. And we're going to call OUTS and OUTC, to show the character.

> c
Returning to emulation...
4. CPU breakpoint condition(s) matched 1 times.
	pc = $1402e

CPU=$1402e, VBL=3803, FrameCycles=243532, HBL=239, LineCycles=708, DSP=N/A
0001402e 6100 ff24                bsr.w #$ff24 == $00013f54
> r
  D0 00000001   D1 0007A309   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 000000A0   A1 00002F50   A2 00000000   A3 0001400D 
  A4 00014060   A5 00013D10   A6 00013ED8   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=1 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff24 (ILLEGAL) Chip latch 00000000
0001402e 6100 ff24                bsr.w #$ff24 == $00013f54
Next PC: 00014032
> c
Returning to emulation...
5. CPU breakpoint condition(s) matched 1 times.
	pc = $14032

CPU=$14032, VBL=3803, FrameCycles=245588, HBL=241, LineCycles=732, DSP=N/A
00014032 2d16                     move.l (a6) [0000000d],-(a6) [0000000d]
> 

I should have shown the registers again, to show that A6 was back to $13EDC after the call. But I stepped. It's okay, we can deduce where things were from the next register dump.

> s

CPU=$14034, VBL=3803, FrameCycles=245608, HBL=241, LineCycles=752, DSP=N/A
00014034 41fa ffd7                lea.l (pc,$ffd7) == $0001400d,a0
> s

CPU=$14038, VBL=3803, FrameCycles=245616, HBL=241, LineCycles=760, DSP=N/A
00014038 2d08                     move.l a0,-(a6) [00014008]
> s

CPU=$1403a, VBL=3803, FrameCycles=245628, HBL=241, LineCycles=772, DSP=N/A
0001403a 6100 ff30                bsr.w #$ff30 == $00013f6c
> r
  D0 00000001   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 0000000D 
  A0 0001400D   A1 00002F50   A2 00000000   A3 0001400D 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff30 (ILLEGAL) Chip latch 00000000
0001403a 6100 ff30                bsr.w #$ff30 == $00013f6c
Next PC: 0001403e
> 

This is just before the call to OUTS that puts out the colon ahead of the character code in binary. The character, a copy, and the string address are on the stack, so A6 is $13ED4.

> c
Returning to emulation...
6. CPU breakpoint condition(s) matched 1 times.
	pc = $1403e

CPU=$1403e, VBL=3803, FrameCycles=248512, HBL=244, LineCycles=608, DSP=N/A
0001403e 6100 fef0                bsr.w #$fef0 == $00013f30
> r
  D0 00000001   D1 0007A304   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 000000A0   A1 00002F50   A2 00000000   A3 0001400F 
  A4 00014060   A5 00013D10   A6 00013ED8   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=1 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) fef0 (ILLEGAL) Chip latch 00000000
0001403e 6100 fef0                bsr.w #$fef0 == $00013f30
Next PC: 00014042
> 

Now we're between the call that puts the colon out and the call that puts the binary character code out. The character and a copy are on the stack, so A6 is $13ED8.

Next we let it call OUTB8:

> c
Returning to emulation...
7. CPU breakpoint condition(s) matched 1 times.
	pc = $14042

CPU=$14042, VBL=3804, FrameCycles=7176, HBL=7, LineCycles=64, DSP=N/A
00014042 2d16                     move.l (a6) [0001400d],-(a6) [00000000]
> r
  D0 00000001   D1 0007A314   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000031 
  A0 000000A0   A1 00002F50   A2 00000000   A3 0001400F 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=1 V=0 C=0 IMASK=3 STP=0
Prefetch 2d16 (MOVE) 41fa (LEA) Chip latch 00000000
00014042 2d16                     move.l (a6) [0001400d],-(a6) [00000000]
Next PC: 00014044
> 

After the binary output, A6 is $13ED4. It should be back to just the character on the stack, $13EDC.
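
One way to see which call unbalanced the stack is to compare A6 just before and just after each call. Here is a sketch using values from the register dumps above. The routine names are the ones mentioned in the text; the table layout and the function name net_cells are made up for the illustration.

```python
CELL_SIZE = 4

# (routine, A6 before the call, A6 after the return)
observed = [
    ("OUTS",   0x13EDC, 0x13EE0),
    ("INCHNE", 0x13EE0, 0x13EDC),
    ("OUTS",   0x13ED4, 0x13ED8),
    ("OUTB8",  0x13ED8, 0x13ED4),
]

def net_cells(before, after):
    # Positive means the routine left more on the stack than it found.
    return (before - after) // CELL_SIZE

for routine, before, after in observed:
    print("%-6s %+d" % (routine, net_cells(before, after)))
```

OUTS takes one item off, and INCHNE adds one, as they should. OUTB8 comes out with one more item on the stack than it started with, where an output routine that uses up its argument ought to come out with one less. That is the two-item difference between $13ED4 and the $13EDC we expected.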

But let's trace through and see what else we can see.

> s

CPU=$14044, VBL=3804, FrameCycles=7196, HBL=7, LineCycles=84, DSP=N/A
00014044 41fa ffc9                lea.l (pc,$ffc9) == $0001400f,a0
> s

CPU=$14048, VBL=3804, FrameCycles=7204, HBL=7, LineCycles=92, DSP=N/A
00014048 2d08                     move.l a0,-(a6) [00000000]
> s

CPU=$1404a, VBL=3804, FrameCycles=7216, HBL=7, LineCycles=104, DSP=N/A
0001404a 6100 ff20                bsr.w #$ff20 == $00013f6c
> r
  D0 00000001   D1 0007A314   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000031 
  A0 0001400F   A1 00002F50   A2 00000000   A3 0001400F 
  A4 00014060   A5 00013D10   A6 00013ECC   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff20 (ILLEGAL) Chip latch 00000000
0001404a 6100 ff20                bsr.w #$ff20 == $00013f6c
Next PC: 0001404e
> c
Returning to emulation...
8. CPU breakpoint condition(s) matched 1 times.
	pc = $1404e

CPU=$1404e, VBL=3804, FrameCycles=15812, HBL=15, LineCycles=572, DSP=N/A
0001404e 6100 feb0                bsr.w #$feb0 == $00013f00
> r
  D0 00000001   D1 0007A319   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000000   D7 00000000 
  A0 000000A0   A1 00002F50   A2 00000000   A3 00014013 
  A4 00014060   A5 00013D10   A6 00013ED0   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=1 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) feb0 (ILLEGAL) Chip latch 00000000
0001404e 6100 feb0                bsr.w #$feb0 == $00013f00
Next PC: 00014052
> c
Returning to emulation...
9. CPU breakpoint condition(s) matched 1 times.
	pc = $14052

CPU=$14052, VBL=3804, FrameCycles=21640, HBL=21, LineCycles=304, DSP=N/A
00014052 6100 fef6                bsr.w #$fef6 == $00013f4a
> r
  D0 00000001   D1 0007A31D   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000044   D7 00000044 
  A0 000000A0   A1 00002F50   A2 00000000   A3 00014013 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) fef6 (ILLEGAL) Chip latch 00000000
00014052 6100 fef6                bsr.w #$fef6 == $00013f4a
Next PC: 00014056
> c
Returning to emulation...
10. CPU breakpoint condition(s) matched 1 times.
	pc = $14056

CPU=$14056, VBL=3804, FrameCycles=27624, HBL=27, LineCycles=192, DSP=N/A
00014056 221e                     move.l (a6)+ [0001400d],d1
> r
  D0 00000001   D1 00000000   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000044   D7 0000000A 
  A0 00000000   A1 00002F50   A2 00000000   A3 00014013 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 221e (MOVE) b23c (CMP) Chip latch 00000000
00014056 221e                     move.l (a6)+ [0001400d],d1
Next PC: 00014058
> s

CPU=$14058, VBL=3804, FrameCycles=27636, HBL=27, LineCycles=204, DSP=N/A
00014058 b23c 0051                cmp.b #$51,d1
> s

CPU=$1405c, VBL=3804, FrameCycles=27644, HBL=27, LineCycles=212, DSP=N/A
0001405c 66b6                     bne.b #$b6 == $00014014 (T)
> r
  D0 00000001   D1 0001400D   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000044   D7 0000000A 
  A0 00000000   A1 00002F50   A2 00000000   A3 00014013 
  A4 00014060   A5 00013D10   A6 00013ED8   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=1 Z=0 V=0 C=1 IMASK=3 STP=0
Prefetch 66b6 (Bcc) 4e75 (RTS) Chip latch 00000000
0001405c 66b6                     bne.b #$b6 == $00014014 (T)
Next PC: 0001405e
> s

CPU=$14014, VBL=3804, FrameCycles=27656, HBL=27, LineCycles=224, DSP=N/A
00014014 41fa ffd4                lea.l (pc,$ffd4) == $00013fea,a0
> s

CPU=$14018, VBL=3804, FrameCycles=27664, HBL=27, LineCycles=232, DSP=N/A
00014018 2d08                     move.l a0,-(a6) [0001400d]
> s

CPU=$1401a, VBL=3804, FrameCycles=27676, HBL=27, LineCycles=244, DSP=N/A
0001401a 6100 ff50                bsr.w #$ff50 == $00013f6c
> r
  D0 00000001   D1 0001400D   D2 00000000   D3 00000000 
  D4 00000000   D5 00000000   D6 00000044   D7 0000000A 
  A0 00013FEA   A1 00002F50   A2 00000000   A3 00014013 
  A4 00014060   A5 00013D10   A6 00013ED4   A7 00013E54 
USP  00013E54 ISP  00007E64 
T=00 S=0 M=0 X=0 N=0 Z=0 V=0 C=0 IMASK=3 STP=0
Prefetch 6100 (BSR) ff50 (ILLEGAL) Chip latch 00000000
0001401a 6100 ff50                bsr.w #$ff50 == $00013f6c
Next PC: 0001401e
> 

As we watched, OUTHX8 and OUTNWLN did what was expected with the stack, so the 8 extra bytes that OUTB8 left behind are still on the stack at the beginning of the next pass through the loop.

Which explains why the parameter stack kept growing (A6 kept moving down) until it walked on the return stack, and why, when the CPU tried to return to something that was not the return address, we ended up trying to execute who knows what, with A7 set to who knows what.

So we can look at OUTB8 in rt_rig03_68K.s and see that, yes, indeed, we do seem to be subtracting 4 from A6 on exit instead of adding 4 to drop the parameter and the temporary variable.

Maybe I got confused about which way the stack shrinks when I decided to use ADDQ/SUBQ?

* Output the 8-bit number on the stack in binary (base two).
* For consistency, we are passing the byte in the lowest-order byte
* of a 32-bit word.
* Uses D6, D7.
OUTB8	MOVE.L	(A6),D6	; shift on memory is 16-bit, use register
	MOVE.W	#8,(A6)	; 8 bits to output, borrow parameter high word.
OUTB8L	LSL.B	#1,D6	; Get the leftmost bit of the lowest byte.
	BCS.S	OUTB81
OUTB80	MOVEQ.L	#'0',D7
	BRA.S	OUTB8D
OUTB81	MOVEQ.L	#'1',D7
OUTB8D	BSR.S	OUTCV
	SUBQ.W	#1,(A6)
	BNE.S	OUTB8L	; loop if not Zero
	SUBQ.L	#NATWID,A6	; drop parameter character
	RTS

And subtracting 4 instead of adding 4 would, indeed, leave 8 bytes too many on exit.
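Spelled out as comment lines, the arithmetic looks like this. The entry value $13ED8 is inferred from the trace (the buggy exit value plus 4), not read directly from a register dump, and NATWID is assumed to be 4:

```asm
* A6 around the call to OUTB8 on the first pass (NATWID assumed to be 4):
*   on entry, parameter on top of stack     A6 = $13ED8  (inferred)
*   intended exit: drop the parameter       A6 = $13ED8 + 4 = $13EDC
*   actual exit:   SUBQ.L #NATWID,A6        A6 = $13ED8 - 4 = $13ED4
*   left over on the parameter stack   $13EDC - $13ED4 = 8 bytes
```

Four of those bytes are the parameter that never got dropped, and the other four come from moving the pointer the wrong way.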

But wouldn't OUTHX8 then output trash on the stack, probably something that was not the character code at all?

Clear the breakpoints and let it run and see if it does.

 

Oh. That's what's happening. And it won't even respond to Q for quit. The only way out is to crash it. Ouch.

Are we convinced?

Then why put the balance checks in? Just for practice?

Yeah. Practice is good.

A quick search through the code shows that we aren't using D3 through D5 anywhere at this point. We only need to put stack checks in the main routine and in the OUTB8 routine, so two markers should be sufficient. No need for nesting markers, either.

If we use D3 to mark the stack in PGSTRT and D4 to mark the stack in OUTB8, we don't need to declare variables in memory, and that will help us limit the impact of the debugging code we insert.

For inserting the code, we could talk about conditional assembly, but we really need to start simple, so let's just put marker comments around the code we insert.

Here's what we'll do with OUTB8, with the bug still in place: 

* Output the 8-bit number on the stack in binary (base two).
* For consistency, we are passing the byte in the lowest-order byte
* of a 32-bit word.
* Uses D6, D7.
OUTB8	MOVE.L	(A6),D6	; shift on memory is 16-bit, use register
******
* DEBUG 1 input parameter, no output parameters
	LEA	NATWID(A6),A0	; this is what A6 should be when we leave.
	MOVE.L	A0,D4
* END DEBUG
******
	MOVE.W	#8,(A6)	; 8 bits to output, borrow parameter high word.
OUTB8L	LSL.B	#1,D6	; Get the leftmost bit of the lowest byte.
	BCS.S	OUTB81
OUTB80	MOVEQ.L	#'0',D7
	BRA.S	OUTB8D
OUTB81	MOVEQ.L	#'1',D7
OUTB8D	BSR.S	OUTCV
	SUBQ.W	#1,(A6)
	BNE.S	OUTB8L	; loop if not Zero
	SUBQ.L	#NATWID,A6	; drop parameter character
******
* DEBUG
	CMP.L	D4,A6
	BNE.W	ERROR
* END DEBUG
******
	RTS

And here's what we'll do with PGSTRT:

	EVEN
PGSTRT	LEA	PROMPT(PC),A0
******
* DEBUG no input parameters, no output parameters
	MOVE.L	A6,D3	; this is what A6 should be when we leave.
* END DEBUG
******
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	INCHNE	; Hold off echo
	MOVE.L	(A6),-(A6)	; duplicate
	LEA	KEYCOL(PC),A0
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	OUTC	; output character
	MOVE.L	(A6),-(A6)	; duplicate
	LEA	COLBIN(PC),A0
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	OUTB8
	MOVE.L	(A6),-(A6)	; duplicate
	LEA	COLHEX(PC),A0
	MOVE.L	A0,-(A6)
	BSR.W	OUTS
	BSR.W	OUTHX8
	BSR.W	OUTNWLN
	MOVE.L	(A6)+,D1	; balance stack
	CMP.B	#ASCQ,D1
	BNE.S	PGSTRT
******
* DEBUG
	CMP.L	D3,A6
	BNE.W	ERROR
* END DEBUG
******
	RTS

You should be able to just copy the debug lines and paste them into place. Don't forget the comment lines so you know what to remove.

Add a NOP immediately after DONE, with the label ERROR:

START	BSR.W	INITRT
	NOP		; place to set breakpoint
*
	BSR.W	PGSTRT
*
DONE	NOP		; place to set breakpoint
ERROR	NOP		; place to go on errors
	MOVEM.L	A4SAVE-LOCBAS(A5),A4-A7	; restore the monitor's A4-A7

Assemble it and start a Hatari session, breaking out and setting a breakpoint at TEXT, as usual:

$ vasmm68k_mot -Ftos -no-opt -o INKEY_68K.PRG -L inkey_68K.list inkey_68K.s
vasm 1.9f (c) in 2002-2023 Volker Barthelmann
vasm M68k/CPU32/ColdFire cpu backend 2.6c (c) 2002-2023 Frank Wille
vasm motorola syntax module 3.18 (c) 2002-2023 Frank Wille
vasm tos output module 2.3 (c) 2009-2016,2020,2021,2023 Frank Wille

text(acrx2):	         870 bytes
nova@she:~/usr/share/hatari/C:/primer/char_io/stepinch$ hatari
INFO : Hatari v2.4.0-devel (Dec 18 2021), compiled on:  Dec 18 2021, 11:41:51
INFO : Inserted disk '/home/nova/usr/share/hatari/fig68kwrk.st' to drive A:.
INFO : Inserted disk '/home/nova/usr/share/hatari/stuff.st' to drive B:.
MMU emulation requires 68030/040/060 and it is not JIT compatible.
INFO : Mounting IDE hard drive image /home/nova/work/emu/hatari/hd80mb.image
INFO : GEMDOS HDD emulation, C: <-> /home/nova/usr/share/hatari/C:.
WARN : GEMDOS HD drive C: (may) override ACSI/SCSI/IDE image partitions!
MMU emulation requires 68030/040/060 and it is not JIT compatible.
WARN : Bus Error reading at address $ffffa200, PC=$e00ce2 addr_e3=e00ce2 op_e3=4a10
WARN : No GEMDOS dir '/home/nova/usr/share/hatari/C:/AUTO'

----------------------------------------------------------------------
You have entered debug mode. Type c to continue emulation, h for help.

CPU=$e1d7e2, VBL=1395, FrameCycles=128, HBL=0, LineCycles=128, DSP=N/A
00e1d7e2 46c0                     move.w d0,sr
> b pc=TEXT
CPU condition breakpoint 1 with 1 condition(s) added:
	pc = TEXT
> c
Returning to emulation...

When you run INKEY_68K.PRG (or pretty much anything not built-in) from the EmuTOS console, it will break. Step once to the START label and disassemble from the PC:

1. CPU breakpoint condition(s) matched 1 times.
	pc = TEXT
Reading symbols from program '/home/nova/usr/share/hatari/C:/primer/char_io/stepinch/INKEY_68K.PRG' symbol table...
TOS executable, DRI / GST symbol table, reloc=0, program flags: PRIVATE (0x0)
Program section sizes:
  text: 0x366, data: 0x0, bss: 0x0, symtab: 0x33a
Trying to load DRI symbol table at offset 0x382...
Offsetting BSS/DATA symbols from TEXT section.
Skipping duplicate address & symbol name checks when autoload is enabled.
Loaded 57 symbols (42 TEXT) from '/home/nova/usr/share/hatari/C:/primer/char_io/stepinch/INKEY_68K.PRG'.

CPU=$13d10, VBL=102109, FrameCycles=166200, HBL=163, LineCycles=592, DSP=N/A
00013d10 6000 02ac                bra.w #$02ac == $00013fbe (T)
> s

CPU=$13fbe, VBL=102109, FrameCycles=166212, HBL=163, LineCycles=604, DSP=N/A
00013fbe 6100 ff28                bsr.w #$ff28 == $00013ee8
> d
(PC)
START:
00013fbe 6100 ff28                bsr.w #$ff28 == $00013ee8
00013fc2 4e71                     nop 
00013fc4 6100 005c                bsr.w #$005c == $00014022
DONE:
00013fc8 4e71                     nop 
ERROR:
00013fca 4e71                     nop 
00013fcc 4ced f000 0008           movem.l (a5,$0008) == $0001407e,a4-a7
00013fd2 4e71                     nop 
00013fd4 4e71                     nop 
00013fd6 4e71                     nop 
00013fd8 4e71                     nop 
00013fda 4267                     clr.w -(a7) [0000]
00013fdc 4e41                     trap #$01
INCHNE:
00013fde 610a                     bsr.b #$0a == $00013fea
00013fe0 c0bc 0000 ffff           and.l #$0000ffff,d0
00013fe6 2d00                     move.l d0,-(a6) [00000000]
00013fe8 4e75                     rts  == $00000000
INCHV:
00013fea 3f3c 0002                move.w #$0002,-(a7) [0000]
> 

Set two breakpoints, one at the label DONE and one at ERROR (the NOP immediately after). Then continue:

> b pc=$13fc8
CPU condition breakpoint 2 with 1 condition(s) added:
	pc = $13fc8
> b pc=$13fca
CPU condition breakpoint 3 with 1 condition(s) added:
	pc = $13fca
> c
Returning to emulation...

Back in the EmuTOS console, it will be waiting for you to hit a key. On the first key you hit, it should break, and the console should stop responding. Return to the debugger and check which breakpoint it took. It should be the one at the ERROR label:

3. CPU breakpoint condition(s) matched 1 times.
	pc = $13fca

CPU=$13fca, VBL=140869, FrameCycles=94972, HBL=93, LineCycles=484, DSP=N/A
00013fca 4e71                     nop 
> 

In this case, you can see it was, at $13FCA, which is where we set the second breakpoint. Well, where I set it.

We probably should have set up two ERROR labels, one for OUTB8 to jump to, and one for the main routine, PGSTRT, to jump to. But you can check the return stack for clues:

> r
  D0 00000001   D1 0007A31D   D2 00000000   D3 00013EE0 
  D4 00013EDC   D5 00000000   D6 00000000   D7 00000030 
  A0 000000A0   A1 00002F50   A2 00000000   A3 0001401D 
  A4 00014076   A5 00013D10   A6 00013ED4   A7 00013E50 
USP  00013E50 ISP  00007E64 
T=00 S=0 M=0 X=0 N=1 Z=0 V=0 C=1 IMASK=3 STP=0
Prefetch 4e71 (NOP) 4ced (MVMEL) Chip latch 00000000
00013fca 4e71                     nop 
Next PC: 00013fcc
> m A7 32
00013E50: 00 01 40 52 00 01 3f c8 00 00 00 00 00 00 00 00   ..@R..?.........
00013E60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
>

If I had stepped through the initializations and done a register dump with the stacks set up, we would have an idea what A7 should be.

Those addresses on the return stack should be return addresses, pointing at code:

> d $14052
00014052 2d16                     move.l (a6) [0001401b],-(a6) [00000000]
00014054 41fa ffc7                lea.l (pc,$ffc7) == $0001401d,a0
00014058 2d08                     move.l a0,-(a6) [00000000]
0001405a 6100 ff1c                bsr.w #$ff1c == $00013f78
0001405e 6100 fea0                bsr.w #$fea0 == $00013f00
00014062 6100 fef2                bsr.w #$fef2 == $00013f56
00014066 221e                     move.l (a6)+ [0001401b],d1
00014068 b23c 0051                cmp.b #$51,d1
0001406c 66b4                     bne.b #$b4 == $00014022 (T)
0001406e bdc3                     cmpa.l d3,a6
00014070 6600 ff58                bne.w #$ff58 == $00013fca (T)
00014074 4e75                     rts  == $00014052
00014076 0000 0000                or.b #$00,d0
...
> d $13fc8
DONE:
00013fc8 4e71                     nop 
(PC)
ERROR:
00013fca 4e71                     nop 
00013fcc 4ced f000 0008           movem.l (a5,$0008) == $00013d18,a4-a7
00013fd2 4e71                     nop 
00013fd4 4e71                     nop 
00013fd6 4e71                     nop 
00013fd8 4e71                     nop 
00013fda 4267                     clr.w -(a7) [3f48]
00013fdc 4e41                     trap #$01
INCHNE:
00013fde 610a                     bsr.b #$0a == $00013fea
00013fe0 c0bc 0000 ffff           and.l #$0000ffff,d0
00013fe6 2d00                     move.l d0,-(a6) [00000000]
00013fe8 4e75                     rts  == $00014052
INCHV:
00013fea 3f3c 0002                move.w #$0002,-(a7) [3f48]
00013fee 3f3c 0002                move.w #$0002,-(a7) [3f48]
00013ff2 4e4d                     trap #$0d
00013ff4 588f                     addaq.l #$04,a7
00013ff6 4e75                     rts  == $00014052
> 

That pretty much proves that the jump to ERROR was from OUTB8. 
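For comparison, here is a rough sketch of the two-label idea, as a drop-in replacement for the START fragment shown earlier. The names ERRPG and ERRB8 are made up for this sketch and are not in the original source. The check in PGSTRT would then use BNE.W ERRPG, and the check in OUTB8 would use BNE.W ERRB8:

```asm
START	BSR.W	INITRT
	NOP		; place to set breakpoint
*
	BSR.W	PGSTRT
*
DONE	NOP		; place to set breakpoint
ERRPG	NOP		; hypothetical: PGSTRT's stack check branches here
ERRB8	NOP		; hypothetical: OUTB8's stack check branches here
	MOVEM.L	A4SAVE-LOCBAS(A5),A4-A7	; restore the monitor's A4-A7
```

With a breakpoint on each of the three NOPs, the first breakpoint reported tells you which check failed, without digging through the return stack. Since the labels fall through to each other, continuing from ERRPG will also trip the breakpoint at ERRB8 on the way out.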

And we can also look at what's on the parameter stack:

> m a6 32
00013ED4: 00 01 40 1b 00 00 00 6a 00 00 00 6a 00 00 00 00   ..@....j...j....
00013EE4: 00 00 00 00 20 5f 47 fa fe 24 48 eb f0 00 00 08   .... _G..$H.....
> 

We can see two copies of the character of the key I hit, "j".

I wonder what's at $1401b?

> d $1401b
COLBIN:
0001401b 3a00                     move.w d0,d5
COLHEX:
0001401d 3a20                     move.w -(a0) [079c],d5
0001401f 2400                     move.l d0,d2
...

COLBIN and COLHEX. Okay,

> m $1401b 32
0001401B: 3a 00 3a 20 24 00 00 41 fa ff d4 26 0e 2d 08 61   :.: $..A...&.-.a
0001402B: 00 ff 4c 61 00 ff ae 2d 16 41 fa ff e0 2d 08 61   ..La...-.A...-.a
> m $14000 32
00014000: 6e 79 20 6b 65 79 2c 20 51 20 74 6f 20 71 75 69   ny key, Q to qui
00014010: 74 2e 20 0d 0a 00 4b 45 59 3a 00 3a 00 3a 20 24   t. ...KEY:.:.: $
> 

So that's the address of the colon that goes before the binary output, left over from the call to OUTS in PGSTRT. Dead on.

Let's fix the code in OUTB8 and run it with the stack checks in place. I've told you what to delete, and you know what to fix. The question is whether to use

	ADDQ	#NATWID,A6

or

	LEA	NATWID(A6),A6

Which I will leave up to you, along with actually stepping through the code and proving that it doesn't ERROR out. 
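If it helps you decide, here are the two candidates side by side with their encodings, assuming NATWID is 4 (the EQU here is only to make the fragment stand alone). Use one or the other, not both:

```asm
NATWID	EQU	4	; assumed value for this sketch
* Alternative 1: quick add, immediate must be 1 through 8
	ADDQ.L	#NATWID,A6	; one word: $588E
* Alternative 2: load effective address, signed 16-bit displacement
	LEA	NATWID(A6),A6	; two words: $4DEE $0004
```

Neither form changes the condition codes when the destination is an address register, so either can sit in front of the RTS without disturbing flags.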

Don't assume that it will work as advertised. Go ahead and take the fifteen minutes or so to try it. Practice is essential. And you may think of something else interesting to try while you're at it.

Next, I've been planning to build a postfix (RPN) integer calculator that only does addition and subtraction, in binary, octal, or hexadecimal. We know enough to do this, and, when we have it working, we can add multiplication and division and more interesting stuff.

