Still digging into that
treasure from the bottom of the pool.
Ascending the Right Island -- Split-stack No Frame Example: 6801
About the only thing I want to point out here is that, with the 6801's support for 16-bit operations, it becomes easier to see how splitting the return addresses off onto their own stack allows a more seamless approach to passing parameters than in the single-stack no-frame example we just finished.
Hopefully the code is mostly self-explanatory by now. (We've been looking at the meat of it for so long ...)
Compare with both the single-stack example for the 6801 and the split-stack example for the 6800 to help see what is and is not going on.
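To make the contrast concrete, here is a minimal sketch of the two calling patterns. It is not taken from either listing: SUMSGL and SUMSPL are hypothetical routine names, the operand offsets assume the right-hand operand was pushed last, and what to do with the sum is left out. PSP is the parameter stack pointer pseudo-register.

```asm
* Single stack, no frame: JSR leaves the return address
* on top of the operands, so the callee has to index past it.
SUMSGL  TSX             ; X points at the return address
        LDD     4,X     ; left operand, beyond return address and right operand
        ADDD    2,X     ; right operand, just beyond the return address
        RTS             ; (storing the result is left out of this sketch)
*
* Split stack: JSR puts the return address on the return stack,
* so the operands are still at the top of the parameter stack.
SUMSPL  LDX     PSP
        LDD     2,X     ; left operand
        ADDD    0,X     ; right operand
        RTS             ; (storing the result is left out of this sketch)
```

In the split-stack version, the callee can overwrite or drop its parameters in place without first having to move the return address out of the way.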
As always, read the code and step through it:
* 16-bit addition as example of split-stack frame-free discipline on 6801
* using the direct page,
* with test code
* Joel Matthew Rees, October, November 2024
*
OPT 6801
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS says this is a good place for usr stuff.
*
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
NOP ; bumper
NOP ; 6 bytes to this point.
SSAVE RMB 2 ; a place to keep S so we can return clean
RMB 4 ; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK RMB 2 ; For saving an index register temporarily
DWORK RMB 2 ; For saving D temporarily
PSP RMB 2 ; parameter stack pointer
LB_BASE RMB 2 ; For process local variables
HPPTR RMB 2 ; heap pointer (not yet managed)
HPALL RMB 2 ; heap allocation pointer
HPLIM RMB 2 ; heap limit
* End of pseudo-registers
RMB 4 ; bumper
GAP1 RMB 2 ; Mark the bottom of the gap
*
*
*
ORG $2000 ; Give the DP room.
LB_ADDR RMB 4 ; a little bumper space
FINAL RMB 4 ; 32-bit Final result in DP variable (to show we can)
FINALX EQU 4
RMB 4 ; Put a bumper after the process static variables
SSTKLIM RMB 64 ; 32 levels of call (2-byte return addresses only)
SSTKLMX EQU FINALX+8
SSTKBAS RMB 4 ; for canary return
SSTKBSX EQU SSTKLMX+64
SSTKFAK RMB 2 ; fake frame pointer, self-link
SSTKFAX EQU SSTKBSX+4 ; 6801 is post-dec (post-store-decrement) push
SSTKBMP RMB 4 ; a little bumper space
SSTKBMX EQU SSTKFAX+2 ; But we are going to init S through X
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKLMX EQU SSTKBMX+4
PSTKBAS RMB 4 ; bumper space -- parameter stack is pre-dec
PSTKBSX EQU PSTKLMX+64
PSTKBMP RMB 4 ; a little bumper space
PSTKBMX EQU PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE RMB 1 ; $1024 or something ; Not using or managing heap yet.
HBASEX EQU PSTKBMX+4
*HLIM RMB 4 ; bumper
*HLIMX EQU HBASEX+$100 ; 1024
*
*
ORG $3000
CDBASE JMP ERROR ; more bumpers
NOP
*
INISTKS LDX #LB_ADDR ; set up process local space
STX LB_BASE
LDD LB_BASE ; bootstrap own return stack
ADDD #SSTKBSX
STD XWORK
LDX XWORK ; initial return stack pointer
*
LDD #SSTKNDR
STD 0,X ; in the cell beyond empty stack pointer
STD 2,X ; and the next cell, for good measure
PULA ; pop real return address
PULB
STS SSAVE ; save stack pointer from monitor ROM
TXS ; move to our own stack (let TXS convert it)
PSHB ; put return address on own stack
PSHA ; stack now ready for interrupts
*
LDD LB_BASE ; bootstrap parameter stack
ADDD #PSTKBSX
STD PSP ; parameter stack now ready
*
LDD LB_BASE ; set up heap as if we actually had one
ADDD #HBASEX
STD HPPTR
STD HPALL ; as if the heap were functional
LDD #CDBASE
SUBD #4
STD HPLIM
RTS
*
*
***
* General structure of the stacks,
*
* return stack is just the return address:
* [PRETADR ]
* [RETADR ] <= SP
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES ]
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES ]
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
*
* Utility routines
*
PPOPD LDX PSP
LDD 0,X
INX
INX
STX PSP
RTS
*
* This saves bytes:
ALCL2 CLRA
CLRB ; fall through
*
PPSHD LDX PSP
ALCLI2 DEX
DEX
STX PSP
STD 0,X
RTS
*
* Compromise between speed and reusability
* Returns with PSP in X.
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8 CLRA
CLRB
* Enter here with initial value in A:B
ALCLD8 LDX PSP
* Enter here with PSP loaded and initial value in D
ALCLI8 DEX
DEX
STD 0,X
ALCLI6 DEX
DEX
STD 0,X
ALCLI4 DEX
DEX
STD 0,X
BRA ALCLI2
*
* six bytes
ALCL6 CLRA
CLRB
ALCLD6 LDX PSP
BRA ALCLI6
*
* four bytes
ALCL4 CLRA
CLRB
ALCLD4 LDX PSP
BRA ALCLI4
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ]
* [RETADR1 ] <= SP
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after entry (before temporary allocation)
* [<unknown>]
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
* 16-bit left, right
* output parameter:
* 17-bit sum in 32-bit
ADD16S LDD #(-1) ; default negative
JSR ALCLD4 ; returns with PSP in X
TST 6,X ; the left-hand operand sign bit
BMI ADD16SR
CLR 2,X ; positive
CLR 3,X
ADD16SR TST 4,X ; the right-hand operand sign bit
BMI ADD16SL
CLR 0,X ; positive
CLR 1,X
ADD16SL LDD 6,X ; left hand
ADDD 4,X ; right hand
STD 6,X ; store low half
LDD 2,X
ADCB 1,X
ADCA 0,X
STD 4,X
*
LDAB #4 ; shorter and faster than 4*INX, walks on B
ABX
STX PSP ; drop the temporaries
RTS
*
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
* 16-bit left, right in 32-bit
* output parameter:
* 17-bit sum in 32-bit
ADD16U LDX PSP
LDD 2,X ; left
ADDD 0,X ; add right
STD 2,X ; save low
LDD #0 ; extend
ROLB ; extend Carry unsigned (could ADCB #0)
STD 0,X ; re-use right side to store high half
*
RTS ; PSP unchanged
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with two 16-bit parameters,
* after entry (before temporary allocation)
* [<unknown> ]
* [32:VAR1_1 ]
* [32:VAR1_2 ]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to the caller's 2nd 32-bit internal variable.
* input parameter:
* 16-bit pointer to 32-bit integer
* 16-bit addend
* no output parameter:
ADD16SI LDD #(-1) ; make a temporary -1
JSR PPSHD ; (default to signed) returns with PSP in X, 2 bytes on stack
TST 2,X ; test parameter high byte
BMI ADD16SP
CLR 0,X ; zero extend
CLR 1,X
ADD16SP LDX 4,X ; pointer to caller's local
LDD 2,X ; caller's 2nd variable, low
LDX PSP
ADDD 2,X ; parameter
LDX 4,X ; pointer
STD 2,X ; update low half with result
LDD 0,X ; 2nd variable, high half
LDX PSP
ADCB 1,X ; sign extension half
ADCA 0,X
LDX 4,X ; pointer
STD 0,X ; update high half
*
LDX PSP
LDAB #6 ; drop sign temporary and two parameters
ABX
STX PSP
RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ] <= SP
*
* Parameter stack after local allocation
* [<unknown>]
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN JSR ALCL8 ; allocate and clear 8 bytes
*
LDD #$1234
JSR PPSHD
LDD #$CDEF
JSR PPSHD
JSR ADD16U ; 32-bit result on parameter stack should be $0000E023
LDX PSP ; order is okay, low half where we want it (PSP returned in X anyway)
LDD #$8765 ; reuse high half
STD 0,X
JSR ADD16S ; result on parameter stack should be $FFFF6788
LDX PSP ; (PSP returned in X anyway)
LDD 2,X ; result low half
STD 6,X ; to 2nd local variable low half
LDD 0,X ; result high half
STD 4,X ; to 2nd local variable high half
LDD PSP ; address of 2nd local variable
ADDD #4
STD 2,X ; pointer is 1st arg
LDD #$A5A5
STD 0,X ; addend is 2nd arg
JSR ADD16SI ; result in 2nd variable should be FFFF0D2D (Carry set)
LDX PSP ; unnecessary, ...
LDD 2,X ; 2nd variable low half
LDX LB_BASE
STD FINALX+2,X
LDX PSP
LDD 0,X
LDX LB_BASE
STD FINALX,X
*
LDD PSP
ADDD #8 ; deallocate the locals
STD PSP
RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
*
***
* Return stack will be just the return address:
* [RETADRNN ]
*
* Return stack after initialization:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS <= SP
*
*
* Parameter stack after initialization, mark:
* [<unknown>]PSTKBAS <= PSP
*
START JSR INISTKS
*
JSR MAIN
*
DONE NOP
ERROR NOP ; define error labels as something not DONE, anyway
SSTKNDR NOP
LDS SSAVE ; restore the monitor stack pointer
NOP
NOP
NOP ; another landing pad to set breakpoint at
NOP
LDX $FFFE
JMP 0,X ; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor,
* but not necessarily to your program in a runnable state.
Again, I have tested the code and it produces the correct results without stack frames.
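For reference, the three expected values noted in the comments in MAIN can be checked by hand. This is just the arithmetic written out as assembler comments, with the 16-bit operands sign-extended to 32 bits where the routine does so:

```asm
* ADD16U:  $1234 + $CDEF = $E023, no carry
*          -> $0000E023
* ADD16S:  $E023 and $8765 are both negative as 16-bit signed values,
*          so $FFFFE023 + $FFFF8765
*          -> $FFFF6788 (carry out of the top bit dropped)
* ADD16SI: $A5A5 is negative, so $FFFF6788 + $FFFFA5A5
*          -> $FFFF0D2D (Carry set)
```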
If you think you've seen enough for now, go ahead and move on to getting numeric output in binary. Otherwise, I'll be cleaning the stack frame support code out of the 6809 examples next.
[JMR202412050835 daydream addendum:]
Before we leave this topic behind, if you've been following what I've been talking about on this detour, I could mention a bit more of my daydreams.
I think I've mentioned it in passing, but I have often thought it was unfortunate that Motorola didn't push the opcodes around a bit and keep direct mode addressing for the unary operator -- INC, ROL, TST, etc. -- instructions. (They did so on the 6809.)
In fact, I would have preferred that they had kept direct page and left out extended mode for unary instructions. (They did exactly that for the 6805.)
"Unary" operators on the 68XX CPUs are mostly read-modify-write instructions that would benefit greatly, in terms of timing and object code efficiency, from having short-addressed versions, and they would also help make the direct page area even more of a pseudo-register memory file.
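As a rough illustration of what that costs on the stock 6800 and 6801, here is how incrementing a byte in the direct page looks. FLAG is a hypothetical one-byte direct-page variable at an arbitrary address, not something from the listing above; the byte counts are the standard instruction lengths.

```asm
FLAG    EQU     $F0     ; hypothetical direct-page address
*
        INC     FLAG    ; no direct mode for INC, so this assembles in extended mode: 3 bytes
*
* Going through an accumulator gets direct-mode addressing,
* but costs three instructions and walks on A:
        LDAA    FLAG    ; direct mode, 2 bytes
        INCA            ; inherent, 1 byte
        STAA    FLAG    ; direct mode, 2 bytes
```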
But we didn't really understand principles of locality in coding back then, so we can, shifting ourselves back to the context of the 1960s and '70s, understand why they saw it as a reasonable tradeoff, and why they wanted to leave as many op-codes as possible available for "inherent" mode operators that didn't seem to fit the unary/binary operator partition they were using -- like Add B to A (ABA), et al.
If they had, or if, in producing the 6801 as an object-code compatible upgrade to the 6800, they had been willing to produce a mnemonic-level compatible object-code incompatible version of the 6801 with direct-page versions of the unary operators -- daydream warning! -- it should have been possible to shave at least two cycles off the timing, compared to the 6801's extended mode timing (6 cycles extended, vs. potentially 4 cycles direct-page), giving more meaning to the idea of pseudo-registers -- or making the direct page more of a static cache.
And if the RAM were going to be built-in (as it pretty much always was in 6801 SOCs), it might even have been possible to shave off yet another cycle, bringing DP variables within a cycle of accumulator timing.
And ... well, the 6801 has 16-bit shifts of the double accumulator, so why not have 16-bit shifts and increments/decrements for direct page variables? Yeah, maybe that's just being greedy.
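For comparison, this is roughly what a 16-bit increment or shift of a direct-page variable takes on the 6801 as it is. COUNT is a hypothetical two-byte direct-page variable at an arbitrary address; the instructions themselves are ordinary 6801 ones.

```asm
COUNT   EQU     $F2     ; hypothetical direct-page address
*
* 16-bit increment, by way of D:
        LDD     COUNT
        ADDD    #1
        STD     COUNT
*
* 16-bit left shift, by way of D:
        LDD     COUNT
        ASLD
        STD     COUNT
*
* 16-bit left shift in place, a byte at a time (extended mode only):
        ASL     COUNT+1 ; low byte, high bit goes to Carry
        ROL     COUNT   ; Carry comes in at the bottom of the high byte
```

The versions that go through D walk on the accumulators, which is part of what a direct-page 16-bit unary instruction would have avoided.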
And, then, here's yet another step out into alternate reality -- a couple of extra address lines (48-pin DIP packages?) for address space, and it would be possible to distinguish between accessing code, data, stack, and the direct page, helping expand the address range beyond the tight squeeze of 64K.
Erk. Lost in my daydreams again. No wonder it takes me so long to get things done.
Okay, moving on to the 6809 examples, or skipping ahead to getting numeric output in binary.
[JMR202412050835 daydream addendum end.]