Wednesday, December 18, 2024

ALPP 03-XX -- Radix Output

False start, kept for reference.

Radix  Output

(Title Page/Index)

 

Now that we've debugged getting a key from the ST's keyboard and outputting its ASCII code value in hexadecimal and binary on the 68000, a natural next step would be to learn how to parse numbers from the input. But that will require multiplying and dividing by ten.

Why? Because we usually interact with numbers in decimal base -- radix base ten.

While we can do that on the 68000, we haven't really talked about it, and we haven't looked at how to synthesize multiplication and division on 8-bit CPUs that don't have them.

So, instead of going directly to parsing numbers, I want to look at multiplication and division, at least enough to be able to multiply and divide by ten. 

But we've already been multiplying and dividing by two and sixteen, haven't we? 

Haven't we?

Let's look again at getting both binary and hexadecimal output. We need to understand what we are doing there.

When converting to binary from base ten by hand, the usual approach (ignoring fractions) is:

Set the radix point (fraction/decimal point) on the right.
Do until all digits (bits) converted (until quotient is zero):
  Divide the number by 2, keeping both quotient and remainder.
  Convert the remainder to a character and 
    write it down as the next digit,
    going left from the radix point.
  Repeat with the quotient.

Now, even taking into account that this algorithmic description is rather loose, looking at what we were doing in the 6800 chapter, it looks different, doesn't it? 

We were going left-to-right, and not even noticing the radix point until we were done, if then.

Let's look at the 6809 code again (since I think the 6809 code is easier to read):

* Output a 0
OUT0	LDB	#'0
OUT01	PSHU	D
	LBSR	OUTC
	RTS
*
* Output a 1 
OUT1	LDB	#'1
	BRA	OUT01
* Rob code, shave a couple of bytes, waste a few cycles.
*
* Output the 8-bit binary (base two) number on the stack.
* For consistency, we are passing the byte in the low-order byte
* of a 16-bit word.
OUTB8	LDB	#8	; 8 bits
	STB	0,U	; Borrow the upper byte of the parameter.
OUTB8L	LSL	1,U	; Get the leftmost bit of the lower byte.
	BCS	OUTB81
OUTB80	BSR	OUT0
	BRA	OUTB8D
OUTB81	BSR	OUT1
OUTB8D	DEC	,U
	BNE	OUTB8L	; loop if not Zero
	LEAU	2,U	; drop parameter bytes
	RTS

In human language, that's going to look like

Do:
  Shift the bits left, capturing the bit off the top.
  Convert the captured bit to a character and
    write it down as the next digit,
    going right.
  Repeat until no bits remain to be converted.
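The loop described above can be modeled outside of assembly language to check the digit order. This is only a sketch in Python; the function name `out_b8` is made up here to echo the OUTB8 routine, and the 8-bit register is imitated by masking with 0xFF.

```python
def out_b8(byte):
    """Model of the OUTB8 loop: shift left eight times, writing each captured bit."""
    digits = ""
    for _ in range(8):
        byte <<= 1                          # LSL: multiply by two
        carry = (byte >> 8) & 1             # the bit that fell off the top of the byte
        byte &= 0xFF                        # keep only the 8 bits that fit
        digits += "1" if carry else "0"     # write the digit, going right
    return digits


print(out_b8(0x41))    # the ASCII code for 'A'
```

The string grows to the right, which is the left-to-right order the assembly routine prints in.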

Yep. Going the opposite direction. And the radix point just ended up where we stopped.

Completely backwards!

What's going on here?

You'll remember that I mentioned that shifting digits to the left (shifting the radix point to the right and filling with zeroes) is the same as multiplying by the radix.

You don't remember that I said that?

What did I say? Ah, here it is, in the chapter on hexadecimal output on the 6800:

... shifting is division and multiplication by powers of two. ...

a little before talking about moving the radix point in decimal numbers, which is the same as shifting decimal digits.

So, shifting bits to the left is multiplying by two. And shifting bits to the right is dividing by two.

And when we grabbed the bit that came off the high end into the carry, we were just grabbing the bits as they came off, right?

Here's how I want to see that. On the one hand we were multiplying by two. On the other hand, the top bit came off into the carry, and we grabbed it. So we were shifting right by 7.

Which is dividing by 2^7, dividing by 128.

This is because the byte is 8 bits, and the 8 bits form something mathematicians call a ring, which we aren't going to describe in detail because I don't want to put everyone to sleep.

But it's mathematics. We can rely on it. Multiplication in a ring is division, and sometimes that is useful.
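The claim that the captured carry is the byte divided by 128 can be checked exhaustively. The following is a small Python sketch, not 6809 code; `shift_left_capturing_carry` is a hypothetical helper that imitates what an 8-bit LSL leaves in the carry and in the byte.

```python
def shift_left_capturing_carry(byte):
    """Return (carry, result) as an 8-bit shift left would leave them."""
    wide = byte << 1                    # multiply by two in a wider space
    return (wide >> 8) & 1, wide & 0xFF


# For every possible byte, the captured carry equals the byte shifted
# right by 7, which equals the byte divided by 128 (integer division).
for value in range(256):
    carry, _ = shift_left_capturing_carry(value)
    assert carry == value >> 7 == value // 128
```

The loop finishes without complaint, so for the purpose of picking off the top bit, one shift left and seven shifts right agree.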

Now, we could do this:

NUMBUF	RMB	34	; enough for 32 bits of output
*
CNVB8	TFR	DP,A	; point to the direct page
	CLRB
	TFR	D,Y
	LEAY	NUMBUF-LOCBAS,Y	; point to NUMBUF
	LEAY	9,Y	; start at the right
	CLR	,-Y	; NUL terminate it
	LDA	#8	; 8 bits
CNVB8L	LDB	#'0'	; ASCII '0'
	LSR	1,U	; Get the lowest bit into the carry
	ADCB	#0	; convert it to ASCII
	STB	,-Y	; build the string right-to-left
CNVB8D	DECA
	BNE	CNVB8L	; loop until counted out
	STY	,U	; return the address of the buffer
	RTS		; (this ought to work, anyway)

Now we can take the address that CNVB8 returns and pass it off to OUTS, and print the number as a string.

 

 

*********

 

 

With a little thought, we could figure out how to output the character code in base four or eight, as well. Any power of 2 would be just a matter of shifting the bits appropriately and adjusting the resultant value to a symbol that represents the value. 

For any base ten or less, the adjustment is really straightforward in ASCII-based characters -- just adding the value to the ASCII-based character code value of '0'. (This works for UNICODE, too, but we won't be going there.) 

For any base up to sixteen, if the resultant symbol exceeds the ASCII value of '9', we further add one less than the difference between the ASCII for 'A' and the ASCII for '9'. Or we can test the value first and add the appropriate adjustment in one step:

  • the ASCII for '0' if the value is less than or equal to 9,
  • or ten less than the ASCII for 'A' if the value is greater than 9.

We saw the former method in the 6800 code for hexadecimal output:

ASC0	EQU	'0	; Some assemblers won't handle 'c constants well.
ASC9	EQU	'9
ASCA	EQU	'A
ASCXGAP	EQU	ASCA-ASC9-1	; Gap between '9' and 'A' for hexadecimal
*
* Mask off and convert the nybble in B to ASCII numeric,
* including hexadecimals
OUTH4	ANDB	#$0F	; mask it off
	ADDB	#ASC0	; Add the ASCII for '0'
	CMPB	#ASC9	; Greater than '9'?
	BLS	OUTH4D	; no, output as is.
	ADDB	#ASCXGAP	; Adjust it to 'A' - 'F'
	...

This would also work as it is for a radix higher than sixteen, if we accept the approach usually taken in radix eleven through sixteen and continue with it, up to base 36 (highest valued digit 'Z').
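Both adjustment methods described above can be compared side by side. Here is a sketch in Python rather than assembly; the constant names follow the EQU definitions in the 6800 listing, while the two function names are invented for this illustration.

```python
ASC0 = ord("0")
ASC9 = ord("9")
ASCA = ord("A")
ASCXGAP = ASCA - ASC9 - 1       # 7: the punctuation codes between '9' and 'A'


def digit_add_then_adjust(value):
    """Add '0' first, then skip the gap if the result went past '9'."""
    code = value + ASC0
    if code > ASC9:
        code += ASCXGAP
    return chr(code)


def digit_test_then_add(value):
    """Test the value first, then add a single adjustment."""
    if value <= 9:
        return chr(value + ASC0)
    return chr(value + ASCA - 10)


# The two methods agree for every digit value up to base 36.
for value in range(36):
    assert digit_add_then_adjust(value) == digit_test_then_add(value)
```

Which one to prefer in assembly language depends mostly on which compare and branch instructions are cheapest on the CPU in question.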

There are reasons we may not want to do that, but it could be done. 

Anyway, we know we can handle the adjustment in the cases that interest us most. 

Now, let's look again at how we got each digit.

For binary, it was easy. Shift a bit off the left (high-order) end of the binary integer and convert it to ASCII '0' or '1':

* Output a 0
OUT0	LDAB	#'0
OUT01	JSR	PPSHD
	JSR	OUTC
	RTS
*
* Output a 1 
OUT1	LDAB	#'1
	BRA	OUT01
* :::
OUTB8L	LSL	1,X	; Get the leftmost bit.
	BCS	OUTB81
OUTB80	BSR	OUT0
	BRA	OUTB8D
OUTB81	BSR	OUT1

For hexadecimal, it may not be quite as clear that was what we were doing -- shifting a digit's worth of bits off the left, capturing them, and converting them to ASCII:

	LDAB	1,X	; get the byte
	LSRB		; move the hexadecimal digit into place
	LSRB
	LSRB
	LSRB
	BSR	OUTRAD	; convert to ASCII and output
	LDX	PSP
	LDAB	1,X
	ANDB	#$0F	; mask the high digit off
	BSR	OUTRAD	; convert to ASCII and output

Say WHAT?!?!? Those are right shifts! And then no shifts! just a bit-AND to mask off the  ...

Yeah, it would have been a little bit more plain like this:

	CLRB		; ready to capture high four bits
	LSL	1,X	; get high bit off top 
	ROLB		; capture it in the low end of B
	LSL	1,X	; get next bit off top 
	ROLB		; shift over and capture it
	LSL	1,X	; get next bit off top 
	ROLB		; shift over and capture it
	LSL	1,X	; get next bit off top 
	ROLB		; shift over and capture it
	BSR	OUTRAD
	CLRB		; ready to capture next four bits
	LSL	1,X	; get next bit off top 
	ROLB		; capture it in the low end of B
	LSL	1,X	; get next bit off top 
	ROLB		; shift over and capture it
	LSL	1,X	; get next bit off top 
	ROLB		; shift over and capture it
	LSL	1,X	; get next bit off top 
	ROLB		; shift over and capture it
	BSR	OUTRAD

If you can't see from just reading the code that the result in B and the output is the same, go ahead and substitute these lines of code into the code for chapter 03-05 and trace through it, watching the bits shift around. 

In either method, we shift the high four bits of the byte we're putting out in order, into the low four bits of B. 

Then, in the method above, we shift the low four bits of the byte up through the carry and into the low four bits of B, the same way we did the high four. But in the method of chapter 03-05, we just leave them where they are and mask the high bits off.

If you think of the byte register as a ring of 8 bits, you might be able to see the bits coming back around. 
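For anyone who would rather check this than trace it, here is a Python model of the two ways of getting the hexadecimal digits. It is a sketch only; both function names are hypothetical, and the carry flag is imitated with an ordinary variable.

```python
def nybbles_by_shift_and_mask(byte):
    """Shift right four for the high digit, mask for the low digit."""
    return (byte >> 4) & 0x0F, byte & 0x0F


def nybbles_by_rotating(byte):
    """Move bits one at a time off the top of the byte into the bottom of B."""
    result = []
    for _digit in range(2):                 # two hexadecimal digits per byte
        b = 0                               # clear B, ready to capture
        for _bit in range(4):
            carry = (byte >> 7) & 1         # top bit goes out to the carry
            byte = (byte << 1) & 0xFF       # the byte shifts left
            b = ((b << 1) | carry) & 0xFF   # carry comes in at the low end of B
        result.append(b)
    return tuple(result)


# Both approaches produce the same pair of digits for every byte value.
for value in range(256):
    assert nybbles_by_shift_and_mask(value) == nybbles_by_rotating(value)
```

The bit-at-a-time version does eight shifts and eight rotates where the other does four shifts and one mask, which is a fair reason to prefer the shorter way once it is understood.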

There's another way of looking at it. In chapter 03-05, we noted that shifting digits left one column is the same as multiplying by the radix. 

Shifting a decimal number left (by adding a zero to the right and moving the decimal fraction point to the right of the added zero) is the same as multiplying by ten. 

Shifting a binary number left one bit is the same as multiplying it by two.

Shifting a hexadecimal digit left by one digit is the same as multiplying by sixteen. Or, shifting a binary number left by four bits is the same as multiplying by sixteen.

How about shifting right? It's the same as dividing by the radix. We'll look at that in a bit.

So, back to thinking about output in base 4. 

If we want to output in base four, we can shift two bits left, capturing and outputting each pair as we go. Do it four times and we've got the byte output in quaternary base.

How about octal? If we shift three bits, then three more, we've only got two left, and that doesn't work. So what we should have done is recognize that we only had 8 bits to shift and only shifted two bits to start, then shifted three and three.

Why? 

It's helpful to note that FF_sixteen (all bits 1, 255_ten) is 377_eight. That's how you write the maximum value of an 8-bit byte in base eight. And the high digit of that is 3, which only takes two bits in binary. So it makes sense that you would only shift off two bits for the first digit.

Now, if you were doing two bytes at once, that would be one bit for the most significant digit and five sets of three bits. 177777_eight is how you write FFFF_sixteen (65535_ten), the maximum value of a sixteen-bit number, in octal. For binary, it's sixteen digits: 1111111111111111_two. For quaternary, it's eight digits: 33333333_four.

So we are getting some ideas how output in base four or base eight would work, and how to output sixteen bit values in any radix base that is a power of two. It's just shifting.
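The field-splitting idea, including the short field at the top for octal, might be sketched like this in Python. The function `to_power_of_two_radix` and its parameters are assumptions made up for this example, and it picks fields out with right shifts and masks instead of shifting bits off the left as the assembly code does.

```python
DIGITS = "0123456789ABCDEF"


def to_power_of_two_radix(value, bits_per_digit, width):
    """Split a width-bit value into fields of bits_per_digit bits, high field first."""
    digit_count = -(-width // bits_per_digit)       # ceiling division
    mask = (1 << bits_per_digit) - 1
    text = ""
    for position in range(digit_count - 1, -1, -1):
        field = (value >> (position * bits_per_digit)) & mask
        text += DIGITS[field]
    return text


print(to_power_of_two_radix(0xFF, 3, 8))       # one byte in octal
print(to_power_of_two_radix(0xFFFF, 3, 16))    # two bytes in octal
print(to_power_of_two_radix(0xFF, 2, 8))       # one byte in base four
```

When the width is not a multiple of the field size, the top field simply comes up short: two bits for a byte in octal, one bit for sixteen bits in octal.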

But base ten doesn't work like this.

Why?

Let me take you on a short detour through something called binary-coded decimal (BCD).

In hexadecimal, we can record a digit from 0_sixteen to F_sixteen in four bits, right?

Well, what if we decide to only record digit values 0 through 9? It's a little bit wasteful, but it's enough to encode a decimal digit in four bits.

Let's see it:

0: 0000
1: 0001
2: 0010
3: 0011
4: 0100
5: 0101
6: 0110
7: 0111
8: 1000
9: 1001

Yep, it can be done. 

But 10011001_two (99_sixteen) is (128 + 16 + 8 + 1), equal to 153_ten.

Whereas 10011001_BCD is 99_ten. Eaaaoooooohhh confusion!

But maybe we can see that shifting a BCD number four bits to the left is multiplying by ten? Maybe?

It is. We can play with that later. Let's set BCD aside for a moment.
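Before setting it aside, here is a quick check of both BCD observations: that 99 packs into the same bit pattern as 153, and that a four-bit left shift multiplies a BCD number by ten. This is a Python sketch; `to_bcd` and `from_bcd` are hypothetical helper names, not routines from this tutorial.

```python
def to_bcd(number):
    """Pack a non-negative decimal number into BCD, one digit per four bits."""
    bcd = 0
    shift = 0
    while True:
        number, digit = divmod(number, 10)
        bcd |= digit << shift           # place the digit in the next field up
        shift += 4
        if number == 0:
            return bcd


def from_bcd(bcd):
    """Unpack a BCD value back into an ordinary integer."""
    number = 0
    place = 1
    while bcd:
        number += (bcd & 0x0F) * place
        place *= 10
        bcd >>= 4
    return number


assert to_bcd(99) == 0b10011001 == 153      # same bits, two different readings
assert from_bcd(to_bcd(99) << 4) == 990     # four bits left is times ten in BCD
```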

The point is that, where we can divide binary numbers into fields of n bits for any radix base 2^n, and we can even do something like that for binary coded decimal, trying to divide a straight binary number into fields of radix base ten is going to have us trying to use fractions of bits.

And we don't know how to do that.

I don't think anyone knows a good, simple way to do it, other than repeatedly dividing by ten, which isn't very simple in binary (which is why this chapter is so long). 

Dividing is shifting right. Right? (Sorry.)

It is.

I pointed out that shifting left 1 bit is the same as shifting right 7? Well, if you capture the bits correctly, anyway.

I'm going to use 6809 code for this example instead of 6800, because we can focus a bit better on what we are doing without having a lot of DEX instructions getting in the way.

Here's what we did for the 6809 binary output in chapter 03-03:

OUTB8	LDB	#8	; 8 bits
	STB	0,U	; Borrow the upper byte of the parameter.
OUTB8L	LSL	1,U	; Get the leftmost bit of the lower byte.
	BCS	OUTB81
OUTB80	BSR	OUT0
	BRA	OUTB8D
OUTB81	BSR	OUT1
OUTB8D	DEC	,U
	BNE	OUTB8L	; loop if not Zero

Instead of shifting out the top and capturing the carry (multiplying by two and capturing the overflow), and writing the number left-to-right, let's divide by two and build the output string for the number right-to-left:

NUMBUF	RMB	34	; enough for 32 bits of output
*
CNVB8	TFR	DP,A	; point to the direct page
	CLRB
	TFR	D,Y
	LEAY	NUMBUF-LOCBAS,Y	; point to NUMBUF
	LEAY	9,Y	; start at the right
	CLR	,-Y	; NUL terminate it
	LDA	#8	; 8 bits
CNVB8L	LDB	#'0'	; ASCII '0'
	LSR	1,U	; Get the lowest bit into the carry
	ADCB	#0	; convert it to ASCII
	STB	,-Y	; build the string right-to-left
CNVB8D	DECA
	BNE	CNVB8L	; loop until counted out
	STY	,U	; return the address of the buffer
	RTS		; (this ought to work, anyway)

Now we can take the address that CNVB8 returns and pass it off to OUTS, and print the number as a string.
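To see the right-to-left fill at work without an assembler, here is a Python model of the conversion loop. It is only a sketch: the name `cnv_b8` echoes the routine above, a `bytearray` stands in for NUMBUF, and the carry flag is imitated with a variable.

```python
def cnv_b8(byte):
    """Model of CNVB8: fill an 8-character field from the right, one bit per pass."""
    buffer = bytearray(9)                   # 8 digits plus the NUL terminator
    index = 8                               # buffer[8] stays 0 as the terminator
    for _ in range(8):
        carry = byte & 1                    # LSR moves the lowest bit into the carry
        byte >>= 1                          # which is dividing by two
        index -= 1                          # pre-decrement, like STB ,-Y
        buffer[index] = ord("0") + carry    # ASCII '0' plus the carry
    return buffer[:8].decode("ascii")


print(cnv_b8(0x41))
```

Because the loop always runs eight times, leading zeros are kept, the same as in the assembly version.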

And we could take the same approach with the hexadecimal conversion, dividing by sixteen -- shifting four bits right and capturing them in order -- and converting and storing right-to-left.

(But we would actually make a copy, mask the high bits out, convert and store, then divide by sixteen for the next digit, because it's quicker that way. But we will ignore the optimization.)

If we think about it, when we convert from decimal to binary or hexadecimal by hand, that's the way we do it. We divide by the base we are converting to, capture the remainder and write that, writing from right-to-left. And it works from decimal to binary or hexadecimal. Or any radix base to any radix base.
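The general divide-and-take-the-remainder method works for any radix, and a few lines of Python show it. This is a sketch under the assumption that digits above 9 continue through the alphabet up to base 36; `to_radix` is a name invented for the illustration.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_radix(number, radix):
    """Divide by the radix, keep the remainder as the next digit, build right-to-left."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be from 2 to 36")
    text = ""
    while True:
        number, remainder = divmod(number, radix)
        text = DIGITS[remainder] + text     # newest digit goes on the left
        if number == 0:                     # stop when the quotient is zero
            return text


print(to_radix(153, 2), to_radix(153, 16), to_radix(153, 10))
```

Python's `divmod` hides the hard part, of course. On a CPU without a divide instruction, that one call is exactly the piece that has to be synthesized.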

Why not do that in the first place?

Several reasons. One is that it's useful to be able to get numbers in and out without sending them through a conversion string buffer. Another is that shifting bits in registers and memory is one of the more useful things you can learn about, especially for assembly language. Yet another is, well, ...

Now you know that dividing and multiplying by powers of two is easy, right?

On the 68000, we have general multiply and divide, at least for 16 bits.

On the 6809 and 6801, we have byte multiply to 16 bits. No divide. (Multiply is much easier than divide.)

On the 6800 we have no multiply and no divide.

We are going to have to synthesize some multiplication and division. Also, even on the 68000, multiplication and division cost more than shifts in CPU cycle counts.

It would be nice to have a quick way to multiply and divide by constants other than powers of 2, wouldn't it? Especially by ten?

Why, yes, it would. Let's do it. Multiplication is easier. Let's do some middle-school algebra:

10X == 2(5X)

5X == (4 + 1)X => 10X == 2((4+1)X)

(4 + 1)X == 4X + X => 10X == 2(4X + X)

4X == 2(2X) => 10X == 2(2(2X)+X)

Let's build that up from adds and shifts:

*
MUL10	LDD	,U	; X
	CLR	,-U	; for overflow (parameter 1 off)
	CLR	,-U	; 16 bits (parameter 2 off)
	ASLB		; 2X
	ROLA
	ROL	1,U
	ASLB		; 2(2X)
	ROLA
	ROL	1,U
	ADDD	2,U	; 2(2X)+X
	BCC	MUL10N
	INC	1,U
MUL10N	ASLB		; 2(2(2X)+X) == 10X
	ROLA
	ROL	1,U
	STD	2,U
	RTS
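The algebra behind the routine can be checked over the whole 16-bit range with a short Python model. It is a sketch, not a translation of the register usage; the name `mul10` is borrowed from the routine above, and the final mask assumes 24 bits of result, which is more than ten times 65535 needs.

```python
def mul10(x):
    """Multiply a 16-bit value by ten as 2(2(2X)+X), using shifts and one add."""
    assert 0 <= x <= 0xFFFF
    result = x << 1             # 2X
    result = result << 1        # 2(2X)
    result = result + x         # 2(2X) + X
    result = result << 1        # 2(2(2X) + X) == 10X
    return result & 0xFFFFFF    # 24 bits leaves room for the overflow


# Exhaustive check against ordinary multiplication.
assert all(mul10(x) == 10 * x for x in range(0x10000))
```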

 

 



Sunday, December 15, 2024

ALPP 02-35 -- Tentative Op-code Map of RK0801 CPU (Extension of M6801)

One final bit of treasure from the bottom of the pool.

  Tentative Op-code Map of
RK0801 CPU
(Extension of 6801)

(Title Page/Index)

 

This is a tentative op-code map of extensions to the 6801 CPU that I think would make it significantly more efficient without blowing the semiconductor real estate (gates) budget for an 8-bit CPU core. It draws on some older ideas I've had for a while (direct page unaries and SBX) and some new ideas suggested by the addressing math and stack frame examples.

New in this map:

  • SBX: Subtract B from X, the corollary to the existing ABX. This optimizes small-to-medium allocations where the size is not known at compile/assemble time, and also helps when following relative links around.
    (Adding an op-code to add D to X might be another possibility, but would require sign-extending into A.)
  • Add signed Immediate byte to X and S, ADIX/ADIS. This optimizes small-to-medium stack and other allocations where size is known at compile/assemble time.
    (Add and Subtract unsigned byte immediate is an option, but requires more op-codes in the very tight primary op-code table. Add 16-bit immediate is yet another option, but is less efficient with code size, enough so as to make the most common case, add plus or minus 2, meaningless.)
    (Considered dropping INX/DEX and INS/DES, but that de-optimizes byte string operations.)
  • Direct-page versions of unary/read-modify-write byte instructions,
    • NEG (NEGate byte),
    • COM (bit COMplement byte),
    • LSR (Logical Shift Right byte),
    • ROR (ROtate Right byte through carry)
    • ASR (Arithmetic Shift Right byte, copying sign),
    • ROL (ROtate Left byte through carry),
    • DEC (DECrement byte),
    • INC (INCrement byte),
    • TST (TeST byte),
    • CLR (CLeaR byte).

    (These are, really, more appropriate in direct-page mode than in extended mode, to provide effective pseudo-registers.)
    (Also, it might be useful to provide address function code outputs that distinguish between direct page and extended mode, providing an effective separate address space for pseudo-registers and I/O, with all addressing modes enabled on it.)

  • 16-bit read/modify/write  instructions:
    • DINC, (Double-byte INCrement)
      (including INCD, INCrement Double accumulator),
    • DDEC (Double-byte DECrement)
      (including DECD, DECrement Double accumulator),
    • DASL (Double-byte Arithmetic Shift Left)
      (including ASLD, Arithmetic Shift Left Double accumulator),
    • DLSR (Double-byte Logical Shift Right)
      (including LSRD, Logical Shift Right Double accumulator).

    (DASL and DLSR are moved from their position in the 6801 map to the corresponding position in the new 16-bit ranks.)
    (16-bit increment and decrement in the direct page will be especially helpful for software stacks.)

  • JMP to direct-page target (not in 6801 op-codes).
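The arithmetic of the proposed index register instructions can be modeled to show how they would differ from the existing ABX. This is a Python sketch under stated assumptions: SBX is taken to treat B as unsigned, mirroring ABX, and ADIX is taken to sign-extend its immediate byte as the list above says. Neither instruction exists on a real 6801, and flags are not modeled.

```python
def abx(x, b):
    """Existing 6801 ABX: add the unsigned byte in B to 16-bit X."""
    return (x + (b & 0xFF)) & 0xFFFF


def sbx(x, b):
    """Proposed SBX (assumed unsigned, like ABX): subtract B from 16-bit X."""
    return (x - (b & 0xFF)) & 0xFFFF


def adix(x, immediate):
    """Proposed ADIX: add a sign-extended immediate byte to 16-bit X."""
    immediate &= 0xFF
    if immediate & 0x80:            # high bit set means a negative offset
        immediate -= 0x100
    return (x + immediate) & 0xFFFF


print(hex(abx(0x1000, 0xFF)), hex(sbx(0x1000, 0x10)), hex(adix(0x1000, 0xFE)))
```

With a signed byte, a single ADIX reaches from 128 below to 127 above the current address, which covers the common plus or minus 2 case in two bytes of code.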

Adding the FDIV and IDIV instructions that the 68HC11 has would be fun, but would likely shoot the gates budget. Likewise, adding the 68HC11's bit testing and manipulation instructions or an additional stack register would require using pre-bytes, and I don't want to do that, either.

Instead of moving the op-codes around, the missing op-codes could be squeezed into empty codes in the 6801 map, but that would require gates that could be used for something else. 

Using a pre-byte and putting the direct page op-codes in a second op-code map would partially erase the advantage of direct-page op-codes.

Left half of the op-code table:

Mnemonic
        UNARY           BRANCH          UNARY
        **ACCA  **INH   REL     INH     **ACCB  *Dir/Ind/Ext
        0       1       2       3       4       5-7
0       NEG     ***CBA  BRA     TSX     NEG     NEG
1               NOP     BRN     [INS]   INCD    *DINC
2               ***SBA  BHI     PULA    DECD    *DDEC
3       COM     ***ABA  BLS     PULB    COM     COM
4       LSR     ***TAB  BCC     [DES]   LSR     LSR
5               ***TBA  BCS     TXS     **ASLD  *DASL
6       ROR     TAP     BNE     PSHA    ROR     ROR
7       ASR     TPA     BEQ     PSHB    ASR     ASR
8       ASL     [INX]   BVC     PULX    ASL     ASL
9       ROL     [DEX]   BVS     RTS     ROL     ROL
A       DEC     CLV     BPL     ABX     DEC     DEC
B               SEV     BMI     RTI     ***LSRD *DLSR
C       INC     CLC     BGE     PSHX    INC     INC
D       TST     SEC     BLT     MUL     TST     TST
E       ***DAA  CLI     BGT     WAI     *SBX    *JMP
F       CLR     SEI     BLE     SWI     CLR     CLR

*Not in 6801 *No JMP dp in 6801

**Moved in 2801

***Both row and column moved.

Right half of the op-code table:

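Mnemonic
        BINARY
        ACCA                    ACCB
        Imm     Dir/Ind/Ext     Imm     Dir/Ind/Ext
        8       9-B             C       D-F
0       SUB     SUB             SUB     SUB
1       CMP     CMP             CMP     CMP
2       SBC     SBC             SBC     SBC
3       SUBD    SUBD            ADDD    ADDD
4       AND     AND             AND     AND
5       BIT     BIT             BIT     BIT
6       LDA     LDA             LDA     LDA
7               STA                     STA
8       EOR     EOR             EOR     EOR
9       ADC     ADC             ADC     ADC
A       ORA     ORA             ORA     ORA
B       ADD     ADD             ADD     ADD
C       CPX     CPX             LDD     LDD
D       BSR     JSR                     STD
E       LDS     LDS             LDX     LDX
F       *~ADIS  STS             *~ADIX  STX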

*Not in 6801

*~ADIS and ADIX take a signed byte constant

Expanding the address map via segment registers or widened address registers is tempting, but I'm thinking to simply be satisfied with two additional address function outputs to allow distinction between

  • code space (PC relative),
  • return address stack space (S relative),
  • direct page space (DP mode),
  • general data (everything else).

Four address spaces won't really even double available address space because of issues in indexing and hard space separation, but it will make it possible to reach or somewhat exceed full 64 K  addressing.

On the other hand, it would not be hard to give the '0801 widened X and PC and maybe S, or segment registers for two, three or all four of the above address spaces or something similar. If segment registers, I would want to use either full-width segment registers, or have the segment registers offset a full byte. None of that 4-bit offset wamby-pamby.

Further extensions, such as a second stack and widened address registers, and the Y register, bit operators, and IDIV and FDIV from the 'HC11, would warrant another part number, say 2801, the "2" indicating two stacks. 

Or a second 16-bit accumulator, such as the 6309 has, would make it a 16-bit CPU, so maybe 21601. But borrowing from the 6309 tends to point to the idea that, beyond a certain point, we'd want to move up to an extended derivative of the 6809.

Well, I don't think I have anything more for this rabbit hole at this moment, so you can return to the irregularly scheduled assembly language tutorial, continuing with getting numbers output.




Sunday, December 8, 2024

ALPP 02-34 -- Ascending the Right Island -- Frameless Examples (Single- & Split-stack): 68000

This should be the final bits of treasure I have to drag up from the bottom of the pool before I get back to I/O.

  Ascending the Right Island --
 Frameless Examples (Single- & Split-stack):
68000

(Title Page/Index)

 

Having worked through the 6809 versions of the stack frame example code with the stack frame code stripped out, let's continue full circle and look at the 68000 version with the stack frame code stripped out. 

Again, there is not a whole lot to say here that can't be seen fairly easily in the code. Let's start with the single-stack no-frame code, comparing it with the single-stack stack frame code for the 68000 and the same for the 6809 in a new window. You'll want to assemble this and trace through it, stopping appropriately to look at the registers, stack, and memory.

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************
*
* 16-bit addition as example of single-stack no-frame discipline on 68000
* with test code
* Joel Matthew Rees, October, November 2024
*
NATWID	EQU	4	; 4 bytes in the CPU's natural integer
HLFNAT	EQU	2	; half natural integer
*
*
	EVEN
LB_ADDR	EQU	*
ENTRY	BRA.W	START
	NOP		; A little buffer zone.
	NOP
A4SAVE	DS.L	1	; a place to keep A4 to A7 so we can return clean
A5SAVE	DS.L	1	; using it as pseudo-DP
A6SAVE	DS.L	1	; 
A7SAVE	DS.L	1	; SP
	DS.L	2	; gap
HPPTR	DS.L	1	; heap pointer (not yet managed)
HPALL	DS.L	1	; heap allocation pointer
	DS.L	2	; gap
FINAL	DS.L	1	; unused statically allocated variable
GAP1	DS.L	51	; gap, make it an even 256 bytes.
*
	DS.L	2	; a little bumper space
SSTKLIM	DS.L	16*8	; 16 levels of call, with room for stack frames
* 			; 68000 is pre-dec (pre-store-decrement) push
SSTKBAS	DS.L	4	; for canary return
	DS.L	2	; bumper
HBASE	DS.L	$1000	; heap space (not yet managing it)
HLIM	DS.L	2	; bumper
*
*
	EVEN
INISTKS	MOVEM.L	(A7)+,A0	; get the return address from the BIOS-provided stack
	LEA	LB_ADDR(PC),A3	; point to our process-local area
	MOVEM.L	A4-A7,A4SAVE-LB_ADDR(A3)	; Store away what the BIOS gives us.
	MOVE.L	A3,A5	; set up our local base (pseudo-DP) in A5
	LEA	SSTKBAS+4*NATWID-LB_ADDR(A5),A7	; set up our return stack
	PEA	STKUNDR(PC)	; fake return to stack underflow handler
	PEA	STKUNDR(PC)	; fake return to stack underflow handler
	LEA	HBASE-LB_ADDR(A5),A4	; as if we actually had a heap
	MOVE.L	A4,HPPTR-LB_ADDR(A5)
	MOVE.L	A4,HPALL-LB_ADDR(A5)
	JMP	(A0)		; return via A0
*

***
* Stack after entry when functions are called by MAIN
* with two parameters
* We will return result in D0:D1
* [STKUNDR ]
* [STKUNDR ]SSTKBAS
* [RETADR0 ] 
* [--------]
* [--------] 
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] <= SP 
*

* Signed 16 bit add to 32 bit result
* Why do this? Stack cell is 32-bit, parameters are 16.
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right in 32-bit cell on stack
* output parameter:
*   17-bit sum in 32-bit D1
ADD16S	MOVE.W	NATWID+HLFNAT(A7),D0	; right (16-bit only)
	EXT.L	D0
	MOVE.W	2*NATWID+HLFNAT(A7),D1	; add to left (16-bit only)
	EXT.L	D1
	ADD.L	D0,D1			; 32-bit result
	RTS		; return, *** all flags valid!! ***
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right in 32-bit cell on stack
* output parameter:
*   17-bit sum in 32-bit D1
ADD16U	CLR.L	D0
	MOVE.W	NATWID+HLFNAT(A7),D0	; right (16-bit only)
	CLR.L	D1
	MOVE.W	2*NATWID+HLFNAT(A7),D1	; add to left (16-bit only)
	ADD.L	D0,D1			; 32-bit result
	RTS		; return, *** all flags valid!! ***
*
* Etc.
*

***
* Stack after entry when functions are called by MAIN
* with two parameters (pointer and addend)
* We will return result in D0:D1
* [STKUNDR ]
* [STKUNDR ]SSTKBAS
* [RETADR0 ] 
* [VAR1_1--]
* [VAR1_2--] <= PARAM2_1
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] <= SP
* To show how to access caller's local variables through pointer
* instead of walking stack --
* Add 16-bit signed parameter
* to caller's 32-bit internal variable.
* input parameter:
*   32-bit pointer to 32-bit integer
*   16-bit addend (in 32-bit cell)
* no output parameter:
ADD16SI	MOVE.W	NATWID+HLFNAT(A7),D1	; skip over return address
	EXT.L	D1
	MOVE.L	2*NATWID(A7),A0		; get pointer to (internal) variable
	ADD.L	D1,(A0)			; add to variable pointed to
	RTS		; return, *** all flags valid!! ***
*
*
***
* Stack on entry
* [STKUNDR ]
* [STKUNDR ]SSTKBAS
* [RETADR0 ] <= SP
*
MAIN	CLR.L	-(A7)	; 2 variables
	CLR.L	-(A7)
	MOVE.L	#$1234,-(A7)
	MOVE.L	#$CDEF,-(A7)
	BSR.W	ADD16U	; result in D1 should be $E023
	LEA	2*NATWID(A7),A7	; could reuse, instead
	MOVE.L	D1,-(A7)
	MOVE.L	#$8765,-(A7)
	BSR.W	ADD16S	; result in D1 should be $FFFF6788 (and carry set)
	LEA	2*NATWID(A7),A7	; drop the parameters
	MOVE.L	D1,(A7)	; store result in 2nd local variable
	PEA	(A7)
	MOVE.L	#$A5A5,-(a7)
	BSR.W	ADD16SI		; result in 2nd variable should be FFFF0D2D (Carry set)
	MOVE.L	2*NATWID(A7),FINAL-LB_ADDR(A5)	; store the result
	LEA	4*NATWID(A7),A7	; drop the parameters and locals
	RTS
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP (A7)
***
*
* Stack after initialization:
* [STKUNDR ]
* [STKUNDR ]SSTKBAS <= SP
***
*
START	BSR.W	INISTKS
*
	NOP
*
	BSR.W	MAIN
*
	NOP
*
DONE	NOP
ERROR	NOP		; stack underflow and ERROR skip DONE
STKUNDR	NOP
	MOVEM.L	A4SAVE-LB_ADDR(A5),A4-A7	; restore the monitor's A4-A7
	NOP
	NOP		; landing pad
	NOP
	NOP
* One way to return to the OS or other calling program
	CLR.W	-(A7)	; there should be enough room on the caller's stack
	TRAP	#1	;	quick exit
*

I have stepped through the code myself. It runs, puts the correct results where they are supposed to go, and restores the stack as it should.
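The expected values in the comments of MAIN can be cross-checked with a small Python model of the three routines. This is a sketch and not a 68000 simulation: the function names echo the labels in the listing, registers are plain integers masked to 32 bits, and the carry is taken as bit 32 of the unmasked sum.

```python
MASK32 = 0xFFFFFFFF


def ext_l(word):
    """Sign-extend a 16-bit word to a 32-bit pattern, as EXT.L does."""
    word &= 0xFFFF
    return (word | 0xFFFF0000) if word & 0x8000 else word


def add16u(left, right):
    """Zero-extend both 16-bit parameters and add; returns (result, carry)."""
    total = (left & 0xFFFF) + (right & 0xFFFF)
    return total & MASK32, total >> 32


def add16s(left, right):
    """Sign-extend both 16-bit parameters and add; returns (result, carry)."""
    total = ext_l(left) + ext_l(right)
    return total & MASK32, total >> 32


def add16si(variable, addend):
    """Add a sign-extended 16-bit addend to a 32-bit variable; returns (result, carry)."""
    total = (variable & MASK32) + ext_l(addend)
    return total & MASK32, total >> 32


first, _ = add16u(0x1234, 0xCDEF)
second, carry2 = add16s(first, 0x8765)      # only the low word of first is used
third, carry3 = add16si(second, 0xA5A5)
print(hex(first), hex(second), carry2, hex(third), carry3)
```

The three results line up with the comments in MAIN, which is a useful thing to have confirmed before stepping through the real code in a debugger.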

Now let's look at it with a split-stack frameless discipline, comparing it with both the above in a new browser window and the split-stack stack frame version for the 68000, and with the split-stack version for the 6809. Assemble this one, as well, and step through it, watching memory and registers.

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************
*
* 16-bit addition as example of split-stack, frameless discipline on 68000
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	4	; 4 bytes in the CPU's natural integer
HLFNAT	EQU	2	; half natural integer
*
*
	EVEN
LB_ADDR	EQU	*
ENTRY	BRA.W	START
	NOP		; A little buffer zone.
	NOP
A4SAVE	DS.L	1	; a place to keep A4 to A7 so we can return clean
A5SAVE	DS.L	1	; using it as pseudo-DP
A6SAVE	DS.L	1	; 
A7SAVE	DS.L	1	; SP
	DS.L	2	; gap
HPPTR	DS.L	1	; heap pointer (not yet managed)
HPALL	DS.L	1	; heap allocation pointer
	DS.L	2	; gap
FINAL	DS.L	1	; unused statically allocated variable
GAP1	DS.L	51	; gap, make it an even 256 bytes.
*
	DS.L	2	; a little bumper space
SSTKLIM	DS.L	16*2	; 16 levels of call, with room for frame pointers
* 			; 68000 is pre-dec (pre-store-decrement) push
SSTKBAS	DS.L	2	; for canary return
SSTKBMP	DS.L	2	; bumper
PSTKLIM	DS.L	16*4	; roughly 16 levels of call
PSTKBAS	DS.L	2	; bumper
HBASE	DS.L	$1000	; heap space (not yet managing it)
HLIM	DS.L	2	; bumper
*
*
	EVEN
INISTKS	MOVE.L	(A7)+,A0	; get the return address from the other (BIOS) stack
	LEA	LB_ADDR(PC),A3
	MOVEM.L	A4-A7,A4SAVE-LB_ADDR(A3)	; Store away what the BIOS gives us.
	MOVE.L	A3,A5	; set up our local base (pseudo-DP)
	LEA	SSTKBMP-LB_ADDR(A5),A7	; set up our return stack
	LEA	PSTKBAS-LB_ADDR(A5),A6	; set up our parameter stack
	PEA	STKUNDR(PC)	; fake return to stack underflow handler
	PEA	STKUNDR(PC)	; fake return to stack underflow handler
	LEA	HBASE-LB_ADDR(A5),A4	; as if we actually had a heap
	MOVE.L	A4,HPPTR-LB_ADDR(A5)
	MOVE.L	A4,HPALL-LB_ADDR(A5)
	JMP	(A0)		; return via A0
*

***
* Return stack when functions are called by MAIN
* Return stack on entry:
* [STKUNDR ]
* [STKUNDR ]SSTKBAS
* [RETADR0 ] 
* [RETADR1 ] <= RSP
*
* Parameter stack when called by MAIN
* with two 16-bit parameters,
* [<unknown>]
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*

* Signed 16 bit add with 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right in 16-bit cells
* output parameter:
*   17-bit sum in 32-bit cell
ADD16S	MOVEM.W	(A6)+,D0/D1	; D0 lowest, but 16-bit sign extends!
*	EXT.L	D0		; right
*	EXT.L	D1		; left
	ADD.L	D0,D1		; add right to left
	MOVE.L	D1,-(A6)
	RTS		; return, *** all flags valid!! ***
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right in 16-bit cells
* output parameter:
*   17-bit sum in 32-bit cell
ADD16U	CLR.L	D0
	CLR.L	D1
*	MOVEM.W	(A6)+,D0/D1	; D0 lowest, but 16-bit sign extends!
	MOVE.W	(A6)+,D0	; right
	MOVE.W	(A6)+,D1	; left
	ADD.L	D0,D1		; add right to left
	MOVE.L	D1,-(A6)
	RTS		; return, *** all flags valid!! ***
*
* Etc.
*

***
* Parameter stack when called by MAIN
* with two parameters, 32-bit pointer and 16-bit addend
* [32:VAR1_1--]
* [32:VAR1_2--] <= PARAM2_1 
* [32:PARAM2_1]
* [16:PARAM2_2] <= PSP

* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to caller's 2nd 32-bit internal variable.
* input parameter:
*   32-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	MOVE.W	(A6)+,D1		; addend
	EXT.L	D1
	MOVE.L	(A6)+,A0		; get caller's internal variable pointer
	ADD.L	D1,(A0)	; add to caller's 2nd variable
	RTS		; return, *** all flags valid!! ***
*
*
***
* Return stack on entry:
* [STKUNDR ]
* [STKUNDR ]SSTKBAS
* [RETADR0 ] <= RSP
*
* Parameter stack after local allocation
* [<unknown>]
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN	CLR.L	-(A6)		; allocate and initialize
	CLR.L	-(A6)		; allocate and initialize
	MOVE.W	#$1234,-(A6)
	MOVE.W	#$CDEF,-(A6)
	BSR.W	ADD16U	; result on parameter stack should be $E023
	LEA	HLFNAT(A6),A6	; adjust to 16 bit, could be optimized out
	MOVE.W	#$8765,-(A6)
	BSR.W	ADD16S	; result on parameter stack should be $FFFF6788 (and carry set)
	MOVE.L	(A6),NATWID(A6)	; save in local
	LEA	NATWID(A6),A0
	MOVE.L	A0,(A6)		; save the pointer.
	MOVE.W	#$A5A5,-(a6)	; push the addend.
	BSR.W	ADD16SI		; result in 2nd variable should be FFFF0D2D (Carry set)
	MOVE.L	(A6),FINAL-LB_ADDR(A5)	; store the result
	RTS
*
***
* Stack at START:
* (what BIOS/OS gave us) <= RSP (A7)
***
* (who knows?) <= PSP (A6)
***
*
***
* Return stack will always be in pairs:
* [RETADRNN  ]
* [CALLERFMNN]
*
* Return stack after initialization:
* [STKUNDR ]
* [STKUNDR ]SSTKBAS <= RSP
*
* Parameter stack after initialization, mark:
* [<unknown>] <= PSP,FP==<EMPTYP>
*
START	BSR.W	INISTKS
*
	NOP
*
	BSR.W	MAIN
*
	NOP
*
DONE	NOP
	NOP		; landing pad
ERROR	NOP
STKUNDR	NOP
	MOVE.L	(A7)+,A4
	MOVEM.L	A4SAVE-LB_ADDR(A5),A4-A7	; restore the monitor's A4-A7
	NOP
	NOP		; landing pad
	NOP
	NOP
* One way to return to the OS or other calling program
	CLR.W	-(A7)	; there should be enough room on the caller's stack
	TRAP	#1	;	quick exit
*

Man, I'm worn out. On the other hand, I think this detour will help me focus as we progress through the I/O examples, starting with binary numeric output.

(It definitely helped me figure out some daydreams about extending the 6801, if you're interested.)


(Title Page/Index)


Wednesday, December 4, 2024

ALPP 02-33 -- Ascending the Right Island -- Frameless Examples (Single- & Split-stack): 6809

Yet another couple of useful bits, from the bottom of the pool.

Ascending the Right Island --
 Frameless Examples (Single- & Split-stack):
6809

(Title Page/Index)

 

Now that we have worked through both the single-stack and split-stack frameless examples for the 6801, we can finally get back to the code that started this detour (6809 version) and strip out the code for maintaining the stack frames. 

On higher-level architectures like the 6809, the stack frame maintenance code can be so non-intrusive that it can be easy to fail to notice it. 

But it can still get in the way. So I'm going ahead and showing the code without it here, in single-stack no-frame and split-stack frameless discipline. 

Frameless does mean we have to keep track of what's on the stack(s).
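
As a rough illustration of that bookkeeping, here is a sketch in Python rather than 6809 assembly. The class and function names are hypothetical and do not appear in the listings below. It models a pre-decrement stack of named items and shows how the offset of a parameter from the stack pointer grows each time a temporary is pushed, which is what comments like "parameters 4 offset" in the code are tracking by hand.

```python
# Hypothetical model for tracking stack offsets by hand (not from the listings).
class OffsetTracker:
    def __init__(self):
        self.items = []          # (name, size in bytes); top of stack is last

    def push(self, name, size=2):
        self.items.append((name, size))

    def drop(self, count=1):
        for _ in range(count):
            self.items.pop()

    def offset(self, name):
        """Byte offset of the most recently pushed `name` from the stack pointer."""
        off = 0
        for item_name, size in reversed(self.items):
            if item_name == name:
                return off
            off += size
        raise KeyError(name)


def add16s_offsets():
    """Offsets of the right-hand operand as a single-stack ADD16S-style routine runs."""
    s = OffsetTracker()
    s.push("left")                    # caller pushes the left operand
    s.push("right")                   # caller pushes the right operand
    s.push("retadr")                  # the call pushes the return address
    seen = [s.offset("right")]        # offset on entry
    s.push("right_ext")               # first sign-extension temporary
    seen.append(s.offset("right"))
    s.push("left_ext")                # second sign-extension temporary
    seen.append(s.offset("right"))
    s.drop()                          # pull one temporary back off
    seen.append(s.offset("right"))
    return seen
```

Calling `add16s_offsets()` returns `[2, 4, 6, 4]`: the same operand is found at a different offset after every push or pull, and with no frame pointer nothing but the programmer's own notes records that.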

And there's really not much more left to talk about, although we want to remember that, because we are making specific use of the direct page, the entry address is $2000 instead of $80.

First the single-stack version. You'll want to compare with the single-stack stack frame version for the 6809 to get a better feel for what is happening with stack frames, as well as with the single-stack no frame version for the 6801 to see how the 6809's addressing modes make things easier.
* 16-bit addition as example of single-stack no frame discipline on 6809
* using the direct page,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$2000	; MDOS says this is a good place for usr stuff.
*	SETDP	$20	; for some other assemblers
	SETDP	$2000	; for EXORsim
*
ENTRY	LBRA	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP
SSAVE	RMB	2	; a place to keep S so we can return clean
SSAVEX	EQU	6	; manufacture offsets for assemblers that can't do SSAVE-ENTRY
USAVE	RMB	2	; just for kicks, save U, too
USAVEX	EQU	SSAVEX+2
DPSAVE	RMB	2	; a place to keep DP so we can return clean
DPSAVEX	EQU	USAVEX+2
	RMB	4	; bumper
XWORK	RMB	2	; For saving an index register temporarily
XWORKX	EQU	DPSAVEX+6
HPPTR	RMB	2	; heap pointer (not yet managed)
HPPTRX	EQU	XWORKX+2
HPALL	RMB	2	; heap allocation pointer
HPALLX	EQU	HPPTRX+2
	RMB	4	; bumper
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	HPALLX+6
GAP1	RMB	2	; Mark the bottom of the gap
GAP1X	EQU	FINALX+4
*
LB_ADDR	EQU	ENTRY
*
*
	SETDP	0	; Not yet set up
	ORG	$2100	; Give the DP room.
	RMB	4	; a little bumper space
SSTKLIM	RMB	96	; roughly 16 levels of call
SSTKLIMX	EQU	$104
* 			; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS	RMB	4	; for canary return
SSTKBASX	EQU	SSTKLIMX+96
SSTKBMP	RMB	4	; a little bumper space
SSTKBMPX	EQU	SSTKBASX+4
*
HBASE	RMB	$1024		; Not using or managing heap yet.
HBASEX	EQU	SSTKBMPX+4
HLIM	RMB	4	; bumper
HLIMX	EQU	HBASEX+$1024
*
*
INISTK	TFR	DP,A
	CLRB
	TFR	D,X		; save old DP base for a moment
	LEAY	ENTRY,PCR	; Set up new DP base
	TFR	Y,D
	TFR	A,DP		; Now we can access DP variables correctly.
*	SETDP	$20	; some other assemblers
	SETDP	$2000	; EXORsim
	STX	<DPSAVE		; technically only need to save high byte
	STU	<USAVE
	PULS	X		; get return address
	STS	<SSAVE		; Save what the monitor gave us.
	LEAS	SSTKBMPX,Y	; Move to our own stack
	LEAY	STKUNDR,PCR	; fake return to stack underflow handler
	PSHS	Y		; 
	PSHS	Y		; one more fake return to handler
	CLRB			; A still has run-time DP
	ADDD	#HBASEX		; calculate EA
	TFR	D,Y		; as if we actually had a heap
	STY	<HPPTR
	STY	<HPALL
	JMP	,X	; return via X
*
***
* Stack after call when functions are called by MAIN
* with two parameters
* (#0 means no local variables)
* We will return result in D:X
* [STKUNDR ]
* [STKUNDR ]SSTKBAS
* [RETADR0 ] 
* [--------]
* [--------] 
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] 
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left 1st pushed, right 2nd
* output parameter:
*   17-bit sum in 32-bit D:X D high, X low
ADD16S	LDX	#-1	; sign extend right
	TST	2,S	; sign bit, anyway
	BMI	ADD16SR
	LEAX	1,X	; 0
ADD16SR	PSHS	X	; push right extension (parameters 4 offset)
	LDX	#-1	; negative
	LDD	6,S	; left
	BMI	ADD16SL
	LEAX	1,X	; 0
ADD16SL	PSHS	X	; push left extension (parameters 6 offset)
	ADDD	6,S	; add right
	TFR	D,X	; save low
	PULS	D	; get left sign extension (parameters 4 offset)
	ADCB	1,S	; carry is still safe
	ADCA	,S	; high word complete
	LEAS	2,S	; drop temporary
	RTS		; C, N valid, Z not valid
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit D:X D high
ADD16U	LDD	4,S	; left
	ADDD	2,S	; add right
	TFR	D,X	; save low
	LDD	#0	; extend
	ADCB	#0	; extend Carry unsigned (could ROL in)
	RTS		; C, N valid, Z not valid
*
* Etc.
*
***
* Stack at entry when called by MAIN
* (#0 means no local variables)
* No result returned; caller's variable is updated through the pointer
* [STKUNDR ]
* [STKUNDR ]SSTKBAS
* [RETADR0 ] 
* [VAR1_1--]
* [VAR1_2--] <= PARAM2_1
* [PARAM2_1] (pointer to VAR1_2)
* [PARAM2_2]
* [RETADR1 ] 
*
* To show how to access caller's local variables through pointer
* instead of walking stack --
* Add 16-bit signed parameter
* to caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	LDX	#-1	; sign extend 1st parameter
	TST	2,S
	BMI	ADD16SIP
	LEAX	1,X
ADD16SIP	PSHS	X	; parameters now 4 offset
	LDX	6,S	; pointer -- LDD [6,S] gets the high half
	LDD	2,X	; caller's 2nd variable, low
	ADDD	4,S	; 1st parameter
	STD	2,X	; update low half
	LDD	,X	; caller's 2nd variable, high
	ADCB	1,S	; sign extension
	ADCA	,S	; high byte 
	STD	,X	; update
	LEAS	2,S	; drop temporary
	RTS		; C, N valid, Z not valid
*
*
***
* Stack after allocating local variables
* [STKUNDR ]
* [STKUNDR ]SSTKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] <= SP
*
MAIN	LDD	#0	; allocate and initialize
	TFR	D,X
	PSHS	D,X
	PSHS	D,X
*
	LDX	#$1234
	LDD	#$CDEF
	PSHS	D,X
	LBSR	ADD16U	; result in D:X should be $E023
	STX	2,S
	LDD	#$8765
	STD	0,S
	LBSR	ADD16S	; result in D:X should be $FFFF6788 (and carry set)
	STX	6,S	; result in 2nd local variable
	STD	4,S
	LEAX	4,S	; calculate address of 2nd variable to pass in
	STX	2,S
	LDD	#$A5A5
	STD	,S	
	LBSR	ADD16SI		; result in 2nd variable should be FFFF0D2D (Carry set)
	LDD	4,S
	STD	<FINAL
	LDD	6,S
	STD	<FINAL+2
	LEAS	12,S	; drop both the used parameters and the local variables together
	RTS		; C, N still valid, Z still not
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
*
* Stack after initialization:
* [STKUNDR ]
* [STKUNDR ]SSTKBAS <= SP
***
*
START	NOP
	LBSR	INISTK
	NOP
*
*
	LBSR	MAIN
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	<SSAVE	; restore the monitor stack pointer
	LDU	<USAVE	; restore U
	LDD	<DPSAVE	; restore the monitor DP last
	TFR	A,DP
	SETDP	0	; For lack of a better way to set it.
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	JMP	[$FFFE]	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

Again, not much to say about the split-stack code, other than that you'll want to compare it with the split-stack stack frame version for the 6809 and the split-stack stack frame version for the 6801, for the same reasons as mentioned above, to get a better feel for the differences.
* 16-bit addition as example of split-stack frame-free discipline on 6809
* using the direct page,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$2000	; MDOS says this is a good place for usr stuff.
*	SETDP	$20	; for lwasm and some other assemblers
	SETDP	$2000	; for EXORsim and some other assemblers
*
ENTRY	LBRA	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP
SSAVE	RMB	2	; a place to keep S so we can return clean
SSAVEX	EQU	6	; manufacture offsets for assemblers that can't do SSAVE-ENTRY
USAVE	RMB	2	; just for kicks, save U, too
USAVEX	EQU	SSAVEX+2
DPSAVE	RMB	2	; a place to keep DP so we can return clean
DPSAVEX	EQU	USAVEX+2
	RMB	4	; bumper
XWORK	RMB	2	; For saving an index register temporarily
XWORKX	EQU	DPSAVEX+6
HPPTR	RMB	2	; heap pointer (not yet managed)
HPPTRX	EQU	XWORKX+2
HPALL	RMB	2	; heap allocation pointer
HPALLX	EQU	HPPTRX+2
	RMB	4	; bumper
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	HPALLX+6
GAP1	RMB	2	; Mark the bottom of the gap
GAP1X	EQU	FINALX+4
*
LB_ADDR	EQU	ENTRY
*
*
	SETDP	0	; Not yet set up
	ORG	$2100	; Give the DP room.
	RMB	4	; a little bumper space
SSTKLIM	RMB	32	; 16 levels of call
SSTKLIMX	EQU	$104	; Skip over the DP page.
* 			; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS	RMB	4	; for canary return
SSTKBASX	EQU	SSTKLIMX+32
SSTKBMP	RMB	4	; a little bumper space
SSTKBMPX	EQU	SSTKBASX+4
PSTKLIM	RMB	64	; about 16 levels of call at two parameters per call
PSTKLIMX	EQU	SSTKBMPX+4
PSTKBAS	RMB	4	; bumper space -- parameter stack is pre-dec
PSTKBASX	EQU	PSTKLIMX+64
*
HBASE	RMB	$1024		; Not using or managing heap yet.
HBASEX	EQU	PSTKBASX+4
HLIM	RMB	4	; bumper
HLIMX	EQU	HBASEX+$1024
*
*
* Calculate DP because we don't have DP relative in index postbyte:
INISTKS	TFR	DP,A
	CLRB
	TFR	D,X		; save old DP base for a moment
	LEAY	ENTRY,PCR	; Set up new DP base
	TFR	Y,D
	TFR	A,DP		; Now we can access DP variables correctly.
*	SETDP	$20	; some other assemblers
	SETDP	$2000	; EXORsim
	STX	<DPSAVE		; technically only need to save high byte
	STU	<USAVE
	PULS	X		; get return address
	STS	<SSAVE		; Save what the monitor gave us.
	LEAS	SSTKBMPX,Y	; Move to our own return stack
	LEAU	PSTKBASX,Y	; and our own parameter stack
	LEAY	STKUNDR,PCR	; fake return to stack underflow handler
	PSHS	Y
	PSHS	Y		; one more fake return to stack underflow handler
	CLRB			; A still has run-time DP
	ADDD	#HBASEX		; calculate EA
	TFR	D,Y		; as if we actually had a heap
	STY	<HPPTR
	STY	<HPALL
	JMP	,X	; return via X
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry:
* [STKUNDR ]
* [STKUNDR ]SSTKBAS
* [RETADR0 ]
* [RETADR1 ]
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after mark (no local allocation)
* [<unknown>]
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit
ADD16S	LDX	#-1	; sign extend right
	TST	,U	; sign bit, anyway
	BMI	ADD16SR
	LEAX	1,X	; 0
ADD16SR	PSHU	X	; push right extension (parameters 2 offset)
	LDX	#-1	; negative
	LDD	4,U	; left
	BMI	ADD16SL
	LEAX	1,X	; 0
ADD16SL	PSHU	X	; push left extension (parameters 4 offset)
	ADDD	4,U	; add right
	STD	6,U	; save low
	PULU	D	; get left sign extension (parameters 2 offset)
	ADCB	1,U	; carry is still safe
	ADCA	,U++	; high word complete, tricky postinc (parameters 0 offset)
	STD	,U
	RTS		; C, N valid, Z not valid
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right in 32-bit
* output parameter:
*   17-bit sum in 32-bit 
ADD16U	LDD	2,U	; left
	ADDD	,U	; add right
	STD	2,U	; save low
	LDD	#0	; extend
	ROLB		; extend Carry unsigned (could ADC #0)
	STD	,U
	RTS		; C, N valid, Z not valid
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with two 16-bit parameters,
* [32:VAR1_1--]
* [32:VAR1_2--] <= PARAM2_1
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	LDD	#-1	; sign extend addend parameter
	TST	,U
	BMI	ADD16SIP
	LDD	#0
ADD16SIP	PSHU	D	; save sign extension (parameters 2 offset)
	LDX	4,U	; get pointer to variable
	LDD	2,X	; caller's 2nd variable, low
	ADDD	2,U	; addend parameter
	STD	2,X	; update low half
	LDD	,X	; caller's 2nd variable, high
	ADCB	1,U	; sign extension low byte
	ADCA	,U	; high byte
	STD	,X	; store result
	LEAU	6,U	; drop temporary and parameters -- no return parameter
	RTS		; C, N valid, Z not valid
*
*
***
* Return stack on entry:
* [STKUNDR ]
* [STKUNDR ]SSTKBAS
* [RETADR0 ] <= RSP
*
* Parameter stack after local allocation
* [<unknown>]
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN	LDD	#0	; allocate and initialize
	TFR	D,X
	PSHU	D,X
	PSHU	D,X
	LDX	#$1234
	LDD	#$CDEF
	PSHU	D,X	; 8 bytes local, 4 bytes parameter, 12 bytes offset
	LBSR	ADD16U	; 32-bit result on parameter stack should be $0000E023
	LEAU	2,U	; drop high part (could be optimized out).
	LDD	#$8765
	PSHU	D
	LBSR	ADD16S	; result on parameter stack should be $FFFF6788 (and carry set)
	PULU	D,X	; 4 bytes of used parameters removed from stack (local variables on top)
	STX	2,U	; low half, store in local variable
	STD	,U	; high half
	LEAX	,U	; point to 2nd variable
	LDD	#$A5A5
	PSHU	D,X	; X pushed first
	LBSR	ADD16SI		; result in 2nd variable should be FFFF0D2D (Carry set)
	LDD	2,U
	STD	<FINAL+2
	LDD	,U
	STD	<FINAL
	LEAU	8,U
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= RSP (S)
***
* (who knows?) <= PSP (U)
***
*
***
* Return stack will be just the return addresses:
* [RETADRNN  ]
*
* Return stack after initialization:
* [STKUNDR ]
* [STKUNDR ]SSTKBAS <= RSP
*
*
* Parameter stack after initialization, mark:
* [<unknown>] <= PSP
*
START	LBSR	INISTKS
*
	LBSR	MAIN
*
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	<SSAVE	; restore the monitor stack pointer
	LDU	<USAVE	; restore U
	LDD	<DPSAVE	; restore the monitor DP last
	TFR	A,DP
	SETDP	0	; For lack of a better way to set it.
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	JMP	[$FFFE]	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

As always, I have stepped through the code and made sure it does what I say it does. 
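
If you want to cross-check the expected values in the MAIN comments without an emulator, the arithmetic can be modeled in a few lines. This is a sketch in Python, not a translation of the listings. The function names mirror the routine names only for readability, and the half-by-half structure (add the low words, then add the sign-extension words with the carry) is my paraphrase of what the ADDD/ADCB/ADCA sequences do.

```python
MASK16 = 0xFFFF

def sign_ext_word(value16):
    """Extension word as built in the listings: $FFFF if bit 15 is set, else 0."""
    return 0xFFFF if value16 & 0x8000 else 0x0000

def add16u(left, right):
    """Unsigned 16-bit add, returning the 17-bit sum in a 32-bit value."""
    return (left & MASK16) + (right & MASK16)

def add16s(left, right):
    """Signed 16-bit add to 32 bits; returns (result, carry out of the high half)."""
    low = (left & MASK16) + (right & MASK16)
    high = sign_ext_word(left) + sign_ext_word(right) + (low >> 16)
    return ((high & MASK16) << 16) | (low & MASK16), high >> 16

def add16si(var32, addend):
    """Add a sign-extended 16-bit addend to a 32-bit variable; returns (result, carry)."""
    low = (var32 & MASK16) + (addend & MASK16)
    high = (var32 >> 16) + sign_ext_word(addend) + (low >> 16)
    return ((high & MASK16) << 16) | (low & MASK16), high >> 16

# The same chain of calls MAIN makes:
first = add16u(0x1234, 0xCDEF)                      # $0000E023
second, carry2 = add16s(first & MASK16, 0x8765)     # $FFFF6788, carry set
third, carry3 = add16si(second, 0xA5A5)             # $FFFF0D2D, carry set
```

The three intermediate values agree with the comments on the LBSR lines in both listings.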

If reading through it and comparing it with other versions brings up questions that stepping through the code doesn't answer, go ahead and leave me a comment.

From here, you can either go ahead to digging into outputting binary numbers, or (when I get it ready) you can look at one more set of examples for frameless discipline, on the 68000.

--

Ah, more squirrels to chase. I mean, more daydreams.

With the stack split up, we might be able to see how a simple hysteretic spill-fill cache could significantly optimize calls and returns.

Calls and returns cost, in addition to the code and cycles to load the new PC, cycles to save and restore the old. With the combined stack, they also tend to incur code and cycle costs in moving parameters into place and saving and restoring registers.

With a cache attached to the return stack pointer, saves and restores can happen in parallel with fetching, decoding, and executing instructions, effectively hiding the basic call/return overhead. 

Here's what I mean by hysteretic spill/fill:

Say the cache has sixteen entries (32 bytes).

When pushing a new return address crosses the boundary between the 12th and 13th entry -- 3/4 full -- the cache controller starts pushing saved addresses off the other end into main RAM, to make more room. It watches the bus so that it can do so when the bus is not busy with instruction fetches or data or DMA accesses, unless the cache completely fills, in which case it gets the bus at higher priority than instruction fetches. 

It will keep pushing addresses out until the cache is half-empty again, or until a return cancels the spill. 

Returns will work in reverse. As long as the program is nested more than four calls deep, the controller will try to keep at least four return addresses in the cache, scheduling reads to bring addresses back in from RAM when the boundary between the 5th and 4th entries is crossed.

It uses a cache base and limit register to maintain position in the stack address space, and a stack base and limit register to tell the controller when to come to a hard stop, and when to initiate stack overflow or underflow interrupt/exception processing.
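
To make the thresholds concrete, here is a toy model of the spill/fill scheme described above, sketched in Python. It is not cycle accurate and models no real hardware. The class, its method names, and the `idle_cycle` notion of "one free bus cycle" are all hypothetical. The text above does not say where a fill stops or whether a call cancels a fill, so `FILL_STOP = 8` and the cancel-on-call behavior are assumptions made by symmetry with the spill side.

```python
class ReturnStackCache:
    """Toy model of a spill/fill return-stack cache (not cycle accurate)."""
    SIZE = 16          # entries (from the text)
    SPILL_START = 12   # pushing the 13th entry starts a spill (from the text)
    SPILL_STOP = 8     # spill until the cache is half empty (from the text)
    FILL_START = 4     # dropping from 5 to 4 entries starts a fill (from the text)
    FILL_STOP = 8      # ASSUMPTION: fill back up to half full

    def __init__(self):
        self.cache = []        # cached return addresses, newest last
        self.ram = []          # addresses spilled to main RAM, newest last
        self.spilling = False
        self.filling = False

    def _spill_one(self):
        self.ram.append(self.cache.pop(0))     # oldest cached entry goes out

    def _fill_one(self):
        self.cache.insert(0, self.ram.pop())   # newest spilled entry comes back

    def call(self, return_address):
        self.filling = False                   # ASSUMPTION: a call cancels a fill
        if len(self.cache) == self.SIZE:
            self._spill_one()                  # completely full: forced spill
        self.cache.append(return_address)
        if len(self.cache) > self.SPILL_START:
            self.spilling = True

    def ret(self):
        self.spilling = False                  # a return cancels a spill
        if not self.cache and self.ram:
            self._fill_one()                   # cache ran dry: forced fill
        address = self.cache.pop()             # IndexError here models underflow
        if len(self.cache) <= self.FILL_START and self.ram:
            self.filling = True
        return address

    def idle_cycle(self):
        """One free bus cycle: move at most one entry between cache and RAM."""
        if self.spilling:
            self._spill_one()
            if len(self.cache) <= self.SPILL_STOP:
                self.spilling = False
        elif self.filling:
            if self.ram:
                self._fill_one()
            if not self.ram or len(self.cache) >= self.FILL_STOP:
                self.filling = False
```

For example, thirteen nested calls followed by ten idle cycles leave eight addresses in the cache and five in RAM. The gap between the start threshold (12) and the stop threshold (8) is the hysteresis: a program bouncing up and down by a call or two near the 3/4 mark does not trigger a spill on every call.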

I assume that you have noticed that splitting the stack helps relieve the costs of moving parameters into place, and can even be of some relief relative to saving and restoring registers.

The parameter stack is not as regularly structured as the return address stack, but it could profitably be cached in a similar manner, with a larger cache, either double or quadruple size.

Both of these caches should be paired, to enable fast context switching. Or maybe done in sets of four, but I'm not sure the 6809 would benefit from four of each. One for the current process and one to be writing back to RAM after a process switch should be enough.

And I guess, since I've commented after the 6801 examples about how the direct page should be a bank of memory to use as pseudo-registers, I should mention the concept of a cache for the direct page here. This would also be paired, with the switch activated when the DP is set. There would need to be several different strategies for filling the new cache and writing back the dirty entries from the old cache, plus a way of setting priority for different regions of the direct page.

Caching the direct page would conflict with using it for I/O devices, so I'm thinking the 6809 wants a second direct page (specifiable in the index post-byte) just for I/O. 

Heh. Daydreams, indeed. This is just an 8/16-bit processor with a 16-bit address space. Too greedy. Unless we had a true 16/32-bit descendant of the 6809.

Ah. Sorry for the further distractions. 

The 68000 frameless examples

Or outputting binary numbers

 

(Title Page/Index)

Friday, November 29, 2024

ALPP 02-32 -- More Ascending the Right Island -- Split-stack No Frame Example: 6801

Still digging into that treasure from the bottom of the pool.

  Ascending the Right Island --
Split-stack No Frame Example:
6801

(Title Page/Index)

 

About the only thing I want to point out here is that, with the support for 16-bit operations on the 6801, it becomes easier to see how splitting the return addresses off onto their own stack allows a more seamless approach to passing parameters than the single-stack no-frame example we just finished. 
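
A rough way to picture that seamlessness, sketched in Python rather than 6801 assembly: Python's own call stack stands in for the return stack, and a plain list stands in for the parameter stack. The names are hypothetical. Unlike MAIN in the listing below, this sketch chains two unsigned adds, just to keep it short.

```python
def add16u(pstack):
    """Top two 16-bit cells are right (top) and left; replaced by high (top) and low."""
    right = pstack[-1]
    left = pstack[-2]
    total = left + right
    pstack[-2] = total & 0xFFFF      # low half overwrites the left operand
    pstack[-1] = total >> 16         # high half (the carry) overwrites the right

def demo():
    pstack = []                      # parameter stack, top of stack is last
    pstack.append(0x1234)            # left operand
    pstack.append(0xCDEF)            # right operand
    add16u(pstack)                   # leaves low $E023, high $0000 in place
    pstack[-1] = 0x8765              # reuse the high cell as the next right operand
    add16u(pstack)                   # low half was already where the callee wants it
    return pstack
```

`demo()` returns `[0x6788, 0x0001]`. No return address ever sits between the caller's data and the callee's, so the result of one call can serve as the parameter of the next without being moved.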

Hopefully the code is mostly self-explanatory by now. (We've been looking at the meat of it for so long ...)

Compare with both the single-stack example for the 6801 and the split-stack example for the 6800 to help see what is and is not going on.

As always, read the code and step through it:

* 16-bit addition as example of split-stack frame-free discipline on 6801
* using the direct page,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6801
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
PSP	RMB	2	; parameter stack pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
	RMB	4	; Put a bumper after the process static variables
SSTKLIM	RMB	64	; 16 levels of call
SSTKLMX	EQU	FINALX+8
SSTKBAS	RMB	4	; for canary return
SSTKBSX	EQU	SSTKLMX+64
SSTKFAK	RMB	2	; fake frame pointer, self-link
SSTKFAX	EQU	SSTKBSX+4	; 6801 is post-dec (post-store-decrement) push
SSTKBMP	RMB	4	; a little bumper space
SSTKBMX	EQU	SSTKFAX+2	; But we are going to init S through X
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKLMX	EQU	SSTKBMX+4
PSTKBAS	RMB	4	; bumper space -- parameter stack is pre-dec
PSTKBSX	EQU	PSTKLMX+64
PSTKBMP	RMB	4	; a little bumper space
PSTKBMX	EQU	PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	PSTKBMX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*
INISTKS	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE
	LDD	LB_BASE		; bootstrap own return stack
	ADDD	#SSTKBSX
	STD	XWORK
	LDX	XWORK		; initial return stack pointer
*
	LDD	#SSTKNDR
	STD	0,X	; in the cell beyond empty stack pointer
	STD	2,X	; and the next cell, for good measure
	PULA		; pop real return address
	PULB
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts
*
	LDD	LB_BASE		; bootstrap parameter stack
	ADDD	#PSTKBSX
	STD	PSP		; parameter stack now ready
*
	LDAA	LB_BASE		; set up heap as if we actually had one
	LDAB	LB_BASE+1
	ADDD	#HBASEX
	STD	HPPTR
	STD	HPALL		; as if the heap were functional
	LDD	#CDBASE
	SUBD	#4
	STD	HPLIM
	RTS
*
*
***
* General structure of the stacks, 
*
* return stack is just the return address:
* [PRETADR   ]
* [RETADR    ] <= SP
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
*
* Utility routines
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
* This saves bytes:
ALCL2	CLRA
	CLRB	; fall through
* 
PPSHD	LDX	PSP
ALCLI2	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
* Compromise between speed and reusability
* Returns with PSP in X.
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8	CLRA	
	CLRB
* Enter here with initial value in A:B
ALCLD8	LDX	PSP
* Enter here with PSP loaded and initial value in D
ALCLI8	DEX
	DEX
	STD	0,X
ALCLI6	DEX
	DEX
	STD	0,X
ALCLI4	DEX
	DEX
	STD	0,X
	BRA	ALCLI2
*
* six bytes
ALCL6	CLRA
	CLRB
ALCLD6	LDX	PSP
	BRA	ALCLI6
*
* four bytes
ALCL4	CLRA
	CLRB
ALCLD4	LDX	PSP
	BRA	ALCLI4
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ]
* [RETADR1 ] <= SP
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after entry (before temporary allocation)
* [<unknown>]
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit
ADD16S	LDD	#(-1)	; default negative
	JSR	ALCLD4	; returns with PSP in X
	TST	6,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLR	2,X	; positive
	CLR	3,X
ADD16SR	TST	4,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLR	0,X	; positive
	CLR	1,X
ADD16SL	LDD	6,X	; left hand 
	ADDD	4,X	; right hand
	STD	6,X	; store low half
	LDD	2,X
	ADCB	1,X
	ADCA	0,X
	STD	4,X
*
	LDAB	#4	; shorter and faster than 4*INX, walks on B
	ABX
	STX	PSP	; drop the temporaries
	RTS
*
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right in 32-bit
* output parameter:
*   17-bit sum in 32-bit
ADD16U	LDX	PSP
	LDD	2,X	; left
	ADDD	0,X	; add right
	STD	2,X	; save low
	LDD	#0	; extend
	ROLB		; extend Carry unsigned (could ADC #0)
	STD	0,X	; re-use right side to store high half
*
	RTS		; PSP unchanged
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with two 16-bit parameters,
* after entry (before temporary allocation)
* [<unknown>  ]
* [32:VAR1_1  ]
* [32:VAR1_2  ]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	LDD	#(-1)	; make a temporary -1
	JSR	PPSHD	; (default to signed) returns with PSP in X, 2 bytes on stack
	TST	2,X	; test parameter high byte
	BMI	ADD16SP
	CLR	0,X	; zero extend
	CLR	1,X
ADD16SP	LDX	4,X	; pointer to caller's local
	LDD	2,X	; caller's 2nd variable, low
	LDX	PSP
	ADDD	2,X	; parameter
	LDX	4,X	; pointer
	STD	2,X	; update low half with result
	LDD	0,X	; 2nd variable, high half
	LDX	PSP
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	4,X	; pointer
	STD	0,X	; update high half
*
	LDX	PSP
	LDAB	#6	; drop sign temporary and two parameters
	ABX
	STX	PSP
	RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ] <= SP
*
* Parameter stack after local allocation
* [<unknown>]
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN	JSR	ALCL8	; allocate and clear 8 bytes
*
	LDD	#$1234
	JSR	PPSHD
	LDD	#$CDEF
	JSR	PPSHD
	JSR	ADD16U	; 32-bit result on parameter stack should be $0000E023
	LDX	PSP	; order is okay, low half where we want it (PSP returned in X anyway)
	LDD	#$8765	; reuse high half
	STD	0,X
	JSR	ADD16S	; result on parameter stack should be $FFFF6788
	LDX	PSP	; (PSP returned in X anyway)
	LDD	2,X	; result low half
	STD	6,X	; to 2nd local variable low half
	LDD	0,X	; result high half
	STD	4,X	; to 2nd local variable high half
	LDD	PSP	; address of 2nd local variable
	ADDD	#4
	STD	2,X	; pointer is 1st arg
	LDD	#$A5A5
	STD	0,X	; addend is 2nd arg
	JSR	ADD16SI	; result in 2nd variable should be FFFF0D2D (Carry set)
	LDX	PSP	; unnecessary, ...
	LDD	2,X	; 2nd variable low half
	LDX	LB_BASE
	STD	FINALX+2,X
	LDX	PSP
	LDD	0,X
	LDX	LB_BASE
	STD	FINALX,X
*
	LDD	PSP
	ADDD	#8	; deallocate the locals
	STD	PSP
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= PSP
***
*
***
* Return stack will be just the return address:
* [RETADRNN  ]
*
* Return stack after initialization:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS <= SP
*
*
* Parameter stack after initialization, mark:
* [<unknown>]PSTKBAS <= PSP
*
START	JSR	INISTKS
*
	JSR	MAIN
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
SSTKNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP
	NOP		; another landing pad to set breakpoint at
	NOP
	LDX	$FFFE
	JMP	0,X	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

Again, I have tested the code and it produces the correct results without stack frames.

If you think you've seen enough for now, go ahead and move on to getting numeric output in binary. Otherwise, I'll be cleaning the stack frame support code out of the 6809 examples next. 

[JMR202412050835 daydream addendum:]

Before we leave this topic behind, if you've been following what I've been talking about on this detour, I could mention a bit more of my daydreams.

I think I've mentioned it in passing, but I have often thought it was unfortunate that Motorola didn't push the opcodes around a bit and keep direct mode addressing for the unary operator -- INC, ROL, TST, etc. -- instructions. (They did so on the 6809.) 

In fact, I would have preferred that they had kept direct page and left out extended mode for unary instructions. (They did exactly that for the 6805.)

"Unary" operators on the 68XX CPUs are mostly read-modify-write instructions that would benefit greatly, in terms of timing and object code efficiency, from having short-addressed versions, and they would also help make the direct page area even more of a pseudo-register memory file.

But we didn't really understand principles of locality in coding back then, so we can, shifting ourselves back to the context of the 1960s and '70s, understand why they saw it as a reasonable tradeoff, and why they wanted to leave as many op-codes as possible available for "inherent" mode operators that didn't seem to fit the unary/binary operator partition they were using -- like Add B to A (ABA), et al.

If they had, or if, in producing the 6801 as an object-code compatible upgrade to the 6800, they had been willing to produce a mnemonic-level compatible object-code incompatible version of the 6801 with direct-page versions of the unary operators -- daydream warning! -- it should have been possible to shave at least two cycles off the timing, compared to the 6801's extended mode timing (6 cycles extended, vs. potentially 4 cycles direct-page), giving more meaning to the idea of pseudo-registers -- or making the direct page more of a static cache. 

And if the RAM were going to be built-in (as it pretty much always was in 6801 SOCs), it might even have been possible to shave off yet another cycle, bringing DP variables within a cycle of accumulator timing.

And ... well, the 6801 has 16-bit shifts of the double accumulator, so why not have 16-bit shifts and increments/decrements for direct page variables? Yeah, maybe that's just being greedy.

And, then, here's yet another step out into alternate reality -- a couple of extra address lines (48-pin DIP packages?) for address space, and it would be possible to distinguish between accessing code, data, stack, and the direct page, helping expand the address range beyond the tight squeeze of 64K.

Erk. Lost in my daydreams again. No wonder it takes me so long to get things done.

Okay, moving on to the 6809 examples, or skipping ahead to getting numeric output in binary.

[JMR202412050835 daydream addendum end.]


(Title Page/Index)

Thursday, November 28, 2024

ALPP 02-31 -- More Looking in the Rear-view Mirror -- Single-stack No Frame Example: 6801

More treasure from the bottom of the pool.

  More Looking in the Rear View Mirror --
Single-stack No Frame Example:
6801

(Title Page/Index)

 

Not much to say that I haven't already said. We've seen frameless for the 6800, both the single-stack frameless discipline of one chapter back and the split-stack frameless discipline that we just finished. I'm not sure but what I should leave the 6801, 6809, and 68000 versions as exercises for the interested reader, but I'm a sucker for easy puzzles, so I'll post them anyway. There are plenty of things an interested reader can think of to try for him- or herself.

One thing to pay attention to as you go through is the fact that I have left the utility routines out. Doing them in-line is not that much more code than a JSR, and I didn't want to hide what's going on. That's how much of an improvement the 6801 is over the 6800.

The down side of doing it in-line (by hand) is that there are more opportunities for mistakes.

Go ahead and read the code and compare, and if you are not sure you understand what's going on, single-step through the code.

* 16-bit addition as example of single-stack no frame discipline on 6801,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6801
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
	RMB	4	; buffer
STKLIM	RMB	192	; roughly 16 to 20 levels of call
STKLIMX	EQU	FINALX+8
STKBAS	RMB	4	; for canary return
STKBASX	EQU	STKLIMX+192
STKFAK	RMB	2	; fake frame pointer, self-link
STKFAKX	EQU	STKBASX+4	; 6801 is post-dec (post-store-decrement) push
STKBMP	RMB	4	; a little bumper space
STKBMPX	EQU	STKFAKX+2	; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	STKBMPX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
INISTK	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE		; local space functional
	LDD	LB_BASE		; bootstrap own stack
	ADDD	#STKBASX
	STD	XWORK	; avoid using BIOS stack
	LDX	XWORK	; ready own stack pointer
*
	PULA		; pop real return address
	PULB
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts, utility routines
*
	LDD	#STKUNDR
	STD	0,X	; in the cell beyond empty stack pointer
	STD	2,X	; and the next cell, for good measure
*
	LDD	LB_BASE	
	ADDD	#HBASEX
	STD	HPPTR		; as if we were ready to use heap
	STD	HPALL
	LDD	#CDBASE
	SUBD	#4
	STD	HPLIM
	RTS		; finally done, now can return
*
***
* Not generating a stack frame
*
* Cross-section of general stack structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP}  ] for calling routine
* [PARAM   ] from calling routine
* [RETADR  ] to calling routine
* [LOCVAR  ] for called -- current -- routine
* [TEMP    ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing nesting for routine 3, in-flight:
* [RETADR1 ] 
* [LOCVAR2 ]
* [TEMP2   ]
* [PARAM3  ]
* [RETADR2 ]
* [LOCVAR3 ]
* [TEMP3   ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines left out
*
* Let the caller do allocation after.
*
* Stack at entry, before allocation
* when functions are called by MAIN
* with two 32-bit parameters
* We will return result in D:X
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2]
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] <= SP (return stack pointer (6800 S is byte below))
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left 1st pushed, right 2nd
* output parameter:
*   17-bit sum in 32-bit D:X D high, X low
* Does not alter the parameters.
ADD16S	TSX		; no local allocations
	LDAA	#(-1)	; prepare for sign extension
	TST	4,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLRA		; zero extend
ADD16SR	PSHA		; push left extension
	PSHA		; left sign cell below X now
	LDAA	#(-1)	; reload
	TST	2,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLRA		; zero extend
ADD16SL	PSHA		; push right extension
	PSHA
	TSX		; point to sign extensions (4 temporary bytes on stack)
	LDD	8,X	; left-hand low cell
	ADDD	6,X	; right-hand low cell
	STD	XWORK	; save low half of result
	LDD	2,X	; left-hand extension
	ADCB	1,X	; right-hand extension
	ADCA	0,X	; high half done
*
	INS		; fastest to just drop the temporaries
	INS
	INS
	INS
	LDX	XWORK	; get low half of result
	RTS		; result is in D:X
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit D:X D high
ADD16U	TSX		; no local allocations
	LDD	4,X	; left
	ADDD	2,X	; right
	STD	XWORK	; save low half
	LDD	#0
	ADCB	#0
*
	LDX	XWORK	; get low half of result
	RTS		; result is in D:X
*
* Etc.
*
***
*
* Stack after LINK #0 when functions are called by MAIN
* with one input parameter
* (#0 means no local variables)
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] 
* [PARAM2_1] (pointer)
* [PARAM2_2] (addend)
* [RETADR1 ] <= SP (return stack pointer (6800 S is byte below))
*
* To show how to access caller's local through pointer
* instead of walking stack --
* Add 16-bit signed parameter
* to caller's 32-bit internal variable.
* input parameter:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	TSX		; no local allocations up front
	LDAA	#(-1)
	TST	2,X	; high byte of parameter
	BMI	ADD16SIP
	CLRA
ADD16SIP	PSHA	; save the sign extension half (2 temporary bytes on stack)
	PSHA
	LDX	4,X	; get caller's pointer
	LDD	2,X	; caller's 2nd variable, low
	TSX
	ADDD	4,X	; parameter
	LDX	6,X	; caller's pointer
	STD	2,X	; save result low half away
	LDD	0,X	; caller's 2nd variable, high
	TSX
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	6,X	; caller's pointer
	STD	0,X	; save result high half away
*
	INS		; drop temporary 
	INS
	RTS		; no result to load
*
*
***
* Stack after local allocation
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] <= SP
*
MAIN	LDX	#0
	PSHX		; four pushes is only one byte more than a call. 
	PSHX
	PSHX
	PSHX
*
	LDX	#$1234	; parameters
	PSHX
	LDX	#$CDEF
	PSHX
	JSR	ADD16U	; result in D:X should be $E023
	INS		; could reuse instead of dropping
	INS
	INS
	INS
	PSHX		; low half
	LDX	#$8765
	PSHX
	JSR	ADD16S	; result in D:X should be $FFFF6788
	STX	XWORK
	STD	DWORK
	INS		; could reuse instead of dropping
	INS
	INS
	INS
	TSX
	LDD	XWORK
	STD	2,X
	LDD	DWORK
	STD	0,X
*	LDAB	#0	; calculate pointer
*	ABX		; would use ABX here if there were an offset.
	PSHX
	LDX	#$A5A5
	PSHX
	JSR	ADD16SI		; result in 2nd variable should be FFFF0D2D
	INS		; drop parameters
	INS
	INS
	INS
	TSX
	LDD	2,X		; low half
	LDX	LB_BASE		; store it in FINAL, in process local space
	STD	FINALX+2,X
	TSX
	LDD	0,X		; high half
	LDX	LB_BASE
	STD	FINALX,X
*
	TSX
	LDAB	#8
	ABX
	TXS
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
* (who knows?) <= VBP
***
*
* Stack after initialization:
* [STKUNDR ]
* [STKUNDR ]STKBAS <= SP
***
*
START	NOP
	JSR	INISTK
	NOP
*
	JSR	MAIN
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	LDX	$FFFE	; alternatively, jmp through reset vector
	JMP	0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

If you've seen enough, binary output is still waiting. (And it will still be waiting in a few more hours or days, really.) 

If not, split stack with no stack frames is also great on the 6801, even a bit better than what we saw here.

 -- 

Maybe this would be a good place to bring up (again?) my regret that Motorola didn't include an SBX (subtract B from X) instruction in the 6801. It would have been useful in the stack allocation code, as you can see from where I used (and didn't use) ABX. It would also have been useful to have an add-immediate-to-index AIX op-code, possibly 16-bit to do both allocation and deallocation, or signed 8-bit, or unsigned and paired with a subtract-immediate-from-X (SIX?) instruction.

Yeah, more daydreams. Sorry. --
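To make the regret concrete, here is a minimal sketch of what standing in for the missing SBX might cost on the real 6801. The label SUBBX is my own invention for illustration, and the routine assumes the XWORK and DWORK pseudo-registers from the listing above, so it carries the same restriction about interrupt service.

```asm
* Hypothetical stand-in for the missing SBX: X <- X - B (B unsigned).
* Uses XWORK and DWORK, so not usable during interrupt service.
* Clobbers A and B.
SUBBX	STX	XWORK	; park X where D arithmetic can reach it
	STAB	DWORK+1	; B becomes the low byte of a 16-bit subtrahend
	CLR	DWORK	; zero-extend it
	LDD	XWORK
	SUBD	DWORK	; 16-bit subtract takes care of the borrow
	STD	XWORK
	LDX	XWORK	; result back in X
	RTS
```

Compare that with the single-byte ABX going the other direction, and you can see why deallocation with TSX, ABX, TXS is cheap while allocation usually ends up as a string of pushes.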


 (Title Page/Index)

Wednesday, November 27, 2024

ALPP 02-30 -- Ascending the Right Island -- Split-stack No Frame Example: 6800

Leaving those rubber bricks at the side of the pool, let's keep going down for more treasure.

  Ascending the Right Island --
Split-stack No Frame Example:
6800

(Title Page/Index)

At this point, from working through the single-stack example for the 6800 without stack frames, you might be seeing the reasoning behind stack frames. It can be really difficult figuring out where your data is and where it should be heading without some frame of reference, and stack frames do provide a frame of reference when you're deep in the arcane definitions of some routine. 

But building the code to support the stack frames tends to consume time and energy that you'd rather devote to the actual problem at hand, unless your CPU provides high-level support for the frames. It tends to end up a mixed blessing at best, with net costs usually, in my opinion, outweighing benefits, even when your CPU supports it.

Here on the 6800, we can see those costs most clearly by looking carefully at the code I present here, reading the source code in a text editor while stepping through it in the simulator, and comparing it with the split-stack stack frame version and the single-stack versions. 

Before you get to wondering why anyone wanted to use a stack frame in the first place, it's worth noting that stack frames' utility became especially apparent in very large procedures with complex logic. When your procedure extends to hundreds of lines of code (or more) with dozens of variables (or more), you use tools in the assembler to name your local variables by their offset from the frame base pointer, and it helps greatly to manage the complexity. 
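The naming trick itself does not strictly need a frame pointer, as long as the stack pointer you are indexing from stays put while the names are in use. As a rough sketch, here are the parameter stack offsets that the ADD16S routine later in this chapter uses after it allocates its temporaries, given names with EQU. The names are hypothetical, invented here for illustration; the offsets are the ones in the listing.

```asm
* Hypothetical offset names for ADD16S, valid after JSR ALCLI4,
* while X holds PSP. (Names invented for this sketch.)
A16RSX	EQU	0	; right-hand sign extension (temporary)
A16LSX	EQU	2	; left-hand sign extension (temporary)
A16RGT	EQU	4	; right-hand parameter
A16LFT	EQU	6	; left-hand parameter
*
* The low-half add, written with names instead of bare offsets:
	LDAA	A16LFT,X	; left hand, high byte
	LDAB	A16LFT+1,X	; left hand, low byte
	ADDB	A16RGT+1,X	; add right hand, low byte first
	ADCA	A16RGT,X	; then high byte with Carry
```

The catch is that every allocation or drop moves the base, so the names are only good between stack adjustments. That moving base is what a frame pointer holds still for you.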

And it helps in constructing compilers, especially in the initial "bootstrap stages" of development. The compiler may be able to manage constructing and tearing down the frames more easily than it could handle remembering changing offsets.

But.

The frames get in the way. 

Especially when return addresses are inside the stack frames, they get in the way.

All the benefits of stack frames can, in fact, be found in this simple example of split-stack frameless coding discipline. You might think it's just my opinion, but I'll explain further as we go.

I think the code explains itself, particularly when comparing it to the split-stack example with stack frames and the single-stack example without frames, that we just finished.

One thing that might be a point of interest: I had thought I would use an ADDDX (add double accumulator to X) routine in MAIN, 

* Could use this in the single-stack no frames example, too.
LEADPX	LDX	PSP	; Add D to PSP and load to X
* Add A:B to X through XWORK.
* Returned in both X and A:B.
* But we won't use it, after all.
ADDDX	STX	XWORK
	ADDB	XWORK+1
	ADCA	XWORK	; high byte must take the Carry from the low byte
	STAA	XWORK
	STAB	XWORK+1
	LDX	XWORK
	RTS	

to calculate the effective address of the variable that we are passing, but it worked out to be a wash. It took almost as much code to set it up as to just do it there in place.

Read the code, step through it, compare to what we've worked through so far. Note in particular how we are passing the return values back here, and how it is different from the way we pass them when working with various kinds of stack frames, and even different from the method of the frameless single-stack discipline:

* 16-bit addition as example of split-stack frame-free discipline on 6800
* using the direct page,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6800
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
PSP	RMB	2	; parameter stack pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
	RMB	4	; Put a bumper after the process static variables
SSTKLIM	RMB	64	; 16 levels of call
SSTKLMX	EQU	FINALX+8
SSTKBAS	RMB	6	; for canary return
SSTKBSX	EQU	SSTKLMX+64
SSTKFAK	RMB	2	; fake frame pointer, self-link
SSTKFAX	EQU	SSTKBSX+6	; 6800 is post-dec (post-store-decrement) push
SSTKBMP	RMB	4	; a little bumper space
SSTKBMX	EQU	SSTKFAX+2	; But we are going to init S through X
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKLMX	EQU	SSTKBMX+4
PSTKBAS	RMB	4	; bumper space -- parameter stack is pre-dec
PSTKBSX	EQU	PSTKLMX+64
PSTKBMP	RMB	4	; a little bumper space
PSTKBMX	EQU	PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	PSTKBMX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*
INISTKS	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE
	LDAA	LB_BASE		; bootstrap own return stack
	LDAB	LB_BASE+1
	LDX	#SSTKBSX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1
	ADCA	XWORK
	STAB	XWORK+1		; initial return stack pointer
	STAA	XWORK
*
	LDX	#SSTKNDR	; for fake return address
	STX	DWORK		; save it for a moment
	PULA		; pop real return address
	PULB
	LDX	XWORK	; ready own return stack pointer
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts
*
	LDAA	DWORK	; error handler for fake return
	LDAB	DWORK+1
	STAA	0,X	; in the cell beyond empty stack pointer
	STAB	1,X	; prime the return stack with error handler
	STAA	2,X	; second fake return to error handler
	STAB	3,X
* 
	LDAA	LB_BASE		; bootstrap parameter stack
	LDAB	LB_BASE+1
	LDX	#PSTKBSX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1		; initial parameter stack pointer
	ADCA	XWORK
	STAA	PSP		; parameter stack now ready
	STAB	PSP+1
*
	LDAA	LB_BASE		; set up heap as if we actually had one
	LDAB	LB_BASE+1
	LDX	#HBASEX		; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1		; calculate EA
	ADCA	XWORK
	STAA	HPPTR
	STAB	HPPTR+1
	STAA	HPALL		; as if the heap were functional
	STAB	HPALL+1
	LDX	#CDBASE
	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	SUBB	#4
	SBCA	#0
	STAA	HPLIM
	STAB	HPLIM+1
	RTS
*
*
***
* General structure of the stacks, 
*
* return stack is only the return address
* (and maybe extremely ephemeral temporaries):
* [PRETADR   ]
* [RETADR    ]
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
* Utility routines
*
* Could use this in the single-stack no frames example, too.
*LEADPX	LDX	PSP	; Add D to PSP and load to X
* Add A:B to X through XWORK.
* Returned in both X and A:B.
* But we won't use it, after all.
*ADDDX	STX	XWORK
*	ADDB	XWORK+1
*	ADCA	XWORK
*	STAA	XWORK
*	STAB	XWORK+1
*	LDX	XWORK
*	RTS	
*
PPOPD	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
* This saves bytes:
ALCL2	CLRA
	CLRB	; fall through
* 
PPSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS
*
* Compromise between speed and reusability
* Returns with PSP in X.
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8	CLRA	
	CLRB
* Enter here with initial value in A:B
ALCLD8	LDX	PSP
* Enter here with PSP loaded and initial value in D
ALCLI8	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI6	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI4	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI2	DEX		; PPSHD usually costs less.
	DEX
	STAA	0,X
	STAB	1,X
	STX	PSP
	RTS
*
* six bytes
ALCL6	CLRA
	CLRB
ALCLD6	LDX	PSP
	BRA	ALCLI6
*
* four bytes
ALCL4	CLRA
	CLRB
ALCLD4	LDX	PSP
	BRA	ALCLI4
*
* two bytes
*ALCL2	CLRA
*	CLRB
*	LDX	PSP
*	BRA	ALCLI2
*
*
PDROP8	LDAB	#8	; saves two bytes, 7 vs. 3
PDROP_B	CLRA
* Add A:B to PSP -- negative for allocation, positive for deallocation
ADDPSP	ADDB	PSP+1
	ADCA	PSP
	STAA	PSP
	STAB	PSP+1
	LDX	PSP	; return with X ready
	RTS
*
PDROP6	LDAB	#6
	BRA	PDROP_B	
*
PDROP4	LDAB	#4
	BRA	PDROP_B	
*
PDROP2	LDAB	#2	; JSR is 3 bytes, LDX PSP; INX; INX; STX PSP is 6
	BRA	PDROP_B	
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry, after link:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ]
* [RETADR1 ]
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after mark (no local allocation)
* [<unknown>]
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit
ADD16S	LDX	PSP
	LDAB	#(-1)	; default negative
	TBA
	JSR	ALCLI4	; allocate 2 temporary cells and init (leaves PSP in X)
	TST	6,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLR	2,X	; positive
	CLR	3,X
ADD16SR	TST	4,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLR	0,X	; positive
	CLR	1,X
ADD16SL	LDAA	6,X	; left hand 
	LDAB	7,X
	ADDB	5,X	; right hand
	ADCA	4,X
	STAA	6,X	; store low half
	STAB	7,X
	LDAA	2,X
	LDAB	3,X
	ADCB	1,X
	ADCA	0,X
	STAA	4,X	; store high half
	STAB	5,X
	JSR	PDROP4
	RTS
*
* The alternative, without link, mark, or restore?
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right in 2 16-bit
* output parameter:
*   17-bit sum in 32-bit
ADD16U	LDX	PSP
	LDAA	2,X	; left
	LDAB	3,X
	ADDB	1,X	; add right
	ADCA	0,X
	STAA	2,X	; save low in left side
	STAB	3,X
	LDAB	#0	; extend
	ADCB	#0	; extend Carry unsigned (could ROL)
	STAB	1,X	; re-use right side to store high half
	CLR	0,X	; only bit 8 can be affected
	RTS
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with two 16-bit parameters (pointer and addend),
* after mark (no local allocation)
* [<unknown>  ]
* [32:VAR1_1  ]
* [32:VAR1_2  ]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	LDAB	#(-1)	; make a temporary -1
	TBA
	JSR	PPSHD	; default to signed (leaves PSP in X)
	TST	2,X	; test high byte
	BMI	ADD16SP
	CLR	0,X	; zero extend
	CLR	1,X
ADD16SP	LDX	4,X	; get pointer to target
	LDAA	2,X	; target low
	LDAB	3,X
	LDX	PSP
	ADDB	3,X	; parameter
	ADCA	2,X
	LDX	4,X	; pointer to target
	STAA	2,X	; update low half with result
	STAB	3,X
	LDAA	0,X	; target, high half
	LDAB	1,X
	LDX	PSP
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	4,X	; target
	STAA	0,X	; update high half
	STAB	1,X
	JSR	PDROP6	; drop temporary and parameters
	RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ] <= RSP
*
*
* Parameter stack after mark and local allocation
* [<unknown>]
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN	JSR	ALCL8	; allocate and clear 8 bytes
	LDAA	#$12
	LDAB	#$34
	JSR	PPSHD
	LDAA	#$CD
	LDAB	#$EF
	JSR	PPSHD
	JSR	ADD16U	; 32-bit result on parameter stack should be $0000E023
	LDAA	#$87	; ADD16U leaves PSP in X
	LDAB	#$65
	STAA	0,X	; reuse low half of result space, overwrite high half
	STAB	1,X
	JSR	ADD16S	; result on parameter stack should be $FFFF6788
	LDAA	2,X	; result low half -- ADD16S leaves PSP in X
	LDAB	3,X	; put result away
	STAA	6,X	; to 2nd local variable low half
	STAB	7,X
	LDAA	0,X	; result high half
	LDAB	1,X
	STAA	4,X	; to 2nd local variable high half
	STAB	5,X
	STX	XWORK	; instead of JSR ADDDX: 
	LDAB	XWORK+1	; LDAB #4; CLRA; JSR ADDDX; LDX PSP; STAB 3,X; STAA 2,X
	LDAA	XWORK	; Moving results around takes a lot of code,
	ADDB	#4 	; So just do it here.
	ADCA	#0
	STAB	3,X
	STAA	2,X
	LDAA	#$A5
	TAB		; don't really need to use both, just making things clear.
	STAA	0,X
	STAB	1,X
	JSR	ADD16SI	; result in 2nd variable should be FFFF0D2D (Carry set)
	LDAA	2,X	; 2nd variable low half -- ADD16SI leaves PSP in X
	LDAB	3,X
	LDX	LB_BASE
	STAA	FINALX+2,X
	STAB	FINALX+3,X
	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	LDX	LB_BASE
	STAA	FINALX,X
	STAB	FINALX+1,X
	JSR	PDROP8	; ADD16SI also dropped its arguments for us, so only locals
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
*
***
* Return stack will only contain return addresses (and very ephemeral temporaries):
* [RETADRNN  ]
*
* Return stack after initialization:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS <= RSP
*
*
* Parameter stack after initialization:
* [<unknown>]PSTKBAS <= PSP
*
START	JSR	INISTKS
*
	JSR	MAIN
*
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
SSTKNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP
	NOP		; another landing pad to set breakpoint at
	NOP
	LDX	$FFFE
	JMP	0,X	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

As always, I have tested this code, and it produces the correct results without stack frames, passing both input and return parameters on the stack, except for utility routines which use lower level register protocols not available to higher-level routines. 

I will be pointing you back here later. If this talk about stack frames and parameter passing methods seems a little fuzzy at this point, it's okay to move ahead for now.

You may want to move ahead with getting numeric output in binary, or you might want to see how single-stack, no-frame parameter passing works on the 6801, next.


(Title Page/Index)