More treasure from the bottom of the pool.
More Looking in the Rear View Mirror --
Single-stack No Frame Example:
6801
Not much to say that I haven't already said. We've seen frameless for the 6800, both the single-stack frameless discipline of one chapter back and the split-stack frameless discipline that we just finished. I'm half inclined to leave the 6801, 6809, and 68000 versions as exercises for the interested reader, but I'm a sucker for easy puzzles, so I'll post them anyway. There are still plenty of things an interested reader can think of to try for him- or herself.
One thing to pay attention to as you go through is that I have left the utility routines out. On the 6801, doing that work in-line is not much more code than a JSR, and I didn't want to hide what's going on. That's how much of an improvement the 6801 is over the 6800.
The down side of doing it in-line (by hand) is that there are more opportunities for mistakes.
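To put a rough number on that improvement, here is one small case, sketched on the assumption that pushing X is typical of the work the utility routines did. XTEMP is a hypothetical scratch cell standing in for something like XWORK, and the labels PUSHX0 and PUSHX1 exist only for this illustration.

```asm
* Sketch only: pushing X by hand, 6800 versus 6801.
* XTEMP, PUSHX0, and PUSHX1 are hypothetical names.
 OPT 6801
 ORG $80
XTEMP RMB 2 ; scratch cell in the direct page
 ORG $3000
* 6800: no PSHX, so X goes through memory (8 bytes of code, A clobbered)
PUSHX0 STX XTEMP ; park X where A can reach it
 LDAA XTEMP+1 ; low byte goes on first,
 PSHA
 LDAA XTEMP ; high byte last, the same layout PSHX leaves
 PSHA
* 6801: one byte of code, and A is untouched
PUSHX1 PSHX
```

A JSR to a shared push routine costs three bytes per use, so on the 6800 the utility routine pays for itself, while on the 6801 the in-line PSHX is shorter than the call would be.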
* 16-bit addition as example of single-stack no frame discipline on 6801,
* with test code
* Joel Matthew Rees, October, November 2024
*
OPT 6801
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS says this is a good place for usr stuff.
*
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
NOP ; bumper
NOP ; 6 bytes to this point.
SSAVE RMB 2 ; a place to keep S so we can return clean
RMB 4 ; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK RMB 2 ; For saving an index register temporarily
DWORK RMB 2 ; For saving D temporarily
LB_BASE RMB 2 ; For process local variables
HPPTR RMB 2 ; heap pointer (not yet managed)
HPALL RMB 2 ; heap allocation pointer
HPLIM RMB 2 ; heap limit
* End of pseudo-registers
RMB 4 ; bumper
GAP1 RMB 2 ; Mark the bottom of the gap
*
*
*
ORG $2000 ; Give the DP room.
LB_ADDR RMB 4 ; a little bumper space
FINAL RMB 4 ; 32-bit Final result in DP variable (to show we can)
FINALX EQU 4
RMB 4 ; buffer
STKLIM RMB 192 ; roughly 16 to 20 levels of call
STKLIMX EQU FINALX+8
STKBAS RMB 4 ; for canary return
STKBASX EQU STKLIMX+192
STKFAK RMB 2 ; fake frame pointer, self-link
STKFAKX EQU STKBASX+4 ; 6801 is post-dec (post-store-decrement) push
STKBMP RMB 4 ; a little bumper space
STKBMPX EQU STKFAKX+2 ; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE RMB 1 ; $1024 or something ; Not using or managing heap yet.
HBASEX EQU STKBMPX+4
*HLIM RMB 4 ; bumper
*HLIMX EQU HBASEX+$100 ; 1024
*
*
ORG $3000
CDBASE JMP ERROR ; more bumpers
NOP
INISTK LDX #LB_ADDR ; set up process local space
STX LB_BASE ; local space functional
LDD LB_BASE ; bootstrap own stack
ADDD #STKBASX
STD XWORK ; avoid using BIOS stack
LDX XWORK ; ready own stack pointer
*
PULA ; pop real return address
PULB
STS SSAVE ; save stack pointer from monitor ROM
TXS ; move to our own stack (let TXS convert it)
PSHB ; put return address on own stack
PSHA ; stack now ready for interrupts, utility routines
*
LDD #STKUNDR
STD 0,X ; in the cell beyond empty stack pointer
STD 2,X ; and the next cell, for good measure
*
LDD LB_BASE
ADDD #HBASEX
STD HPPTR ; as if we were ready to use heap
STD HPALL
LDD #CDBASE
SUBD #4
STD HPLIM
RTS ; finally done, now can return
*
***
* Not generating a stack frame
*
* Cross-section of general stack structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP} ] for calling routine
* [PARAM ] from calling routine
* [RETADR ] to calling routine
* [LOCVAR ] for called -- current -- routine
* [TEMP ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing nesting for routine 3, in-flight:
* [RETADR1 ]
* [LOCVAR2 ]
* [TEMP2 ]
* [PARAM3 ]
* [RETADR2 ]
* [LOCVAR3 ]
* [TEMP3 ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines left out
*
* Let the caller do allocation after.
*
* Stack at entry, before allocation
* when functions are called by MAIN
* with two 32-bit parameters
* We will return result in D:X
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ]
* [32:VAR1_1]
* [32:VAR1_2]
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] <= SP (return stack pointer (6800 S is byte below))
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
* 16-bit left 1st pushed, right 2nd
* output parameter:
* 17-bit sum in 32-bit D:X D high, X low
* Does not alter the parameters.
ADD16S TSX ; no local allocations
LDAA #(-1) ; prepare for sign extension
TST 4,X ; the left-hand operand sign bit
BMI ADD16SR
CLRA ; zero extend
ADD16SR PSHA ; push left extension
PSHA ; left sign cell below X now
LDAA #(-1) ; reload
TST 2,X ; the right-hand operand sign bit
BMI ADD16SL
CLRA ; zero extend
ADD16SL PSHA ; push right extension
PSHA
TSX ; point to sign extensions (4 temporary bytes on stack)
LDD 8,X ; left-hand low cell
ADDD 6,X ; right-hand low cell
STD XWORK ; save low half of result
LDD 2,X ; left-hand extension
ADCB 1,X ; right-hand extension
ADCA 0,X ; high half done
*
INS ; fastest to just drop the temporaries
INS
INS
INS
LDX XWORK ; get low half of result
RTS ; result is in D:X
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
* 16-bit left, right
* output parameter:
* 17-bit sum in 32-bit D:X D high
ADD16U TSX ; no local allocations
LDD 4,X ; left
ADDD 2,X ; right
STD XWORK ; save low half
LDD #0
ADCB #0
*
LDX XWORK ; get low half of result
RTS ; result is in D:X
*
* Etc.
*
***
*
* Stack at entry when functions are called by MAIN
* with two input parameters, a pointer and an addend
* (no LINK, no local variables)
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ]
* [32:VAR1_1]
* [32:VAR1_2]
* [PARAM2_1] (pointer)
* [PARAM2_2] (addend)
* [RETADR1 ] <= SP (return stack pointer (6800 S is byte below))
*
* To show how to access caller's local through pointer
* instead of walking stack --
* Add 16-bit signed parameter
* to caller's 32-bit internal variable.
* input parameters:
* 16-bit pointer to 32-bit integer
* 16-bit addend
* no output parameter:
ADD16SI TSX ; no local allocations up front
LDAA #(-1)
TST 2,X ; high byte of addend parameter
BMI ADD16SIP
CLRA
ADD16SIP PSHA ; save the sign extension half (2 temporary bytes on stack)
PSHA
LDX 4,X ; get caller's pointer
LDD 2,X ; caller's 2nd variable, low
TSX
ADDD 4,X ; parameter
LDX 6,X ; caller's pointer
STD 2,X ; save result low half away
LDD 0,X ; caller's 2nd variable, high
TSX
ADCB 1,X ; sign extension half
ADCA 0,X
LDX 6,X ; caller's pointer
STD 0,X ; save result high half away
*
INS ; drop temporary
INS
RTS ; no result to load
*
*
***
* Stack after local allocation
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ]
* [32:VAR1_1]
* [32:VAR1_2] <= SP
*
MAIN LDX #0
PSHX ; four pushes is only one byte more than a call.
PSHX
PSHX
PSHX
*
LDX #$1234 ; parameters
PSHX
LDX #$CDEF
PSHX
JSR ADD16U ; result in D:X should be $0000E023
INS ; could reuse instead of dropping
INS
INS
INS
PSHX ; low half
LDX #$8765
PSHX
JSR ADD16S ; result in D:X should be $FFFF6788
STX XWORK
STD DWORK
INS ; could reuse instead of dropping
INS
INS
INS
TSX
LDD XWORK
STD 2,X
LDD DWORK
STD 0,X
* LDAB #0 ; calculate pointer
* ABX ; would use ABX here if there were an offset.
PSHX
LDX #$A5A5
PSHX
JSR ADD16SI ; result in 2nd variable should be $FFFF0D2D
INS ; drop parameters
INS
INS
INS
TSX
LDD 2,X ; low half
LDX LB_BASE ; store it in FINAL, in process local space
STD FINALX+2,X
TSX
LDD 0,X ; high half
LDX LB_BASE
STD FINALX,X
*
TSX
LDAB #8
ABX
TXS
RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
* (who knows?) <= VBP
***
*
* Stack after initialization:
* [STKUNDR ]
* [STKUNDR ]STKBAS <= SP
***
*
START NOP
JSR INISTK
NOP
*
JSR MAIN
*
DONE NOP
ERROR NOP ; define error labels as something not DONE, anyway
STKUNDR NOP
LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad to set breakpoint at
NOP
NOP
LDX $FFFE ; alternatively, jmp through reset vector
JMP 0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor,
* but not necessarily to your program in a runnable state.
If you've seen enough, binary output is still waiting. (And it will still be waiting in a few more hours or days, really.)
If not, split stack with no stack frames is also great on the 6801, even a bit better than what we saw here.
--
Maybe this would be a good place to bring up (again?) my regret that Motorola didn't include an SBX (subtract B from X) instruction in the 6801 to go with ABX. It would have been useful in the stack allocation code, as you can see from where I used (and didn't use) ABX. It would also have been useful to have an add-immediate-to-index (AIX) op-code, possibly 16-bit so that it could do both allocation and deallocation, or else signed 8-bit, or unsigned 8-bit paired with a subtract-immediate-from-X (SIX?) instruction.
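For comparison, here is a sketch of what the missing instruction costs when the allocation really has to be done by subtraction. It is one possible workaround, not the only one, and NLOCAL is a hypothetical name that does not appear in the listing above. The copy between X and D goes by way of the stack because the 6801 has no direct transfer between them.

```asm
* Sketch only: allocating locals on a 6801, which has ABX but no SBX.
* NLOCAL is a hypothetical name, not from the listing above.
 OPT 6801
NLOCAL EQU 8
 ORG $3000
 TSX ; X = S+1, address of the top byte on the stack
 PSHX ; copy X into D by way of the stack
 PULA ; high byte went on last, so it comes off first
 PULB
 SUBD #NLOCAL ; the subtraction SBX would have done in X
 PSHB ; copy D back into X the same way
 PSHA
 PULX
 TXS ; S = X-1: NLOCAL bytes allocated, 0,X is the first
* Deallocating is the easy direction, as in MAIN:
 TSX
 LDAB #NLOCAL
 ABX ; X = X + B, unsigned
 TXS
```

The allocation half takes nine bytes between TSX and TXS, against three for the LDAB and ABX of the deallocation half. For a small fixed size like the eight bytes in MAIN, four PSHX instructions are shorter still, so a sequence like this would only earn its place for larger or variable-sized allocations.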
Yeah, more daydreams. Sorry. --