I'm putting this here for reference. Eventually, I plan to do a chapter on shifts, and most of this will be demonstrated there. I've only tested part of the code.
Demonstrating Left Shift -- 6800
I've shown you some theoretical background on bit shifting left and multiplying by powers of 2, and I want to move ahead because we can't print out the results in decimal yet.
But I talked it over with God. Oh, some people will understand, some people won't. I could call it a hunch -- a strong hunch, strong enough to keep me from proceeding -- if you prefer.
And the result? There's a lot of code in these. Read through the 6809 and
68000 versions, scan the other two, and test one or more if you are inclined.
Come back for reference if things get murky when we talk about synthesizing
the multiplication and division routines.
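If you want a quick refresher on the idea before wading in, here is a minimal sketch, separate from the listing below and not run under EXORsim. Three left shifts multiply an 8-bit value by 8 (2^3), with A catching the bits that carry out of B. The label MUL8X and the value 37 are made up for illustration.

```asm
* Sketch only: 8-bit value times 8 (2^3), 16-bit result in A:B.
* MUL8X is a hypothetical label, not part of the rigging.
MUL8X	LDAB	#37	; $25
	CLRA		; high byte starts at zero
	LSLB		; x2: A:B == $004A
	ROLA		; catch the carry (none yet)
	LSLB		; x4: A:B == $0094
	ROLA
	LSLB		; x8: bit 7 of B goes out to carry
	ROLA		; A:B == $0128 == 296
	RTS
```

Each LSLB/ROLA pair is one 16-bit shift left, the same pairing the listing uses for its multiply-by-4 and multiply-by-16 examples.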
I didn't want to consume four posts for this, to show the rigging and the test code for each processor, but it's going to be four posts even without the rigging framework.
But you've already seen everything in the rigging framework anyway, in the
single character input chapters. I'm going to let you move the new stuff into
the rigging framework yourself this time.
Starting with the 6800 code for character input, open the file rt_rig03_6800.asm up in a text editor and save it as rt_rig04_6800.asm. Keep it open and open inkey_6800.asm (or whatever you saved it as) up and save it as shftst_6800.asm (or whatever). Change the inclusion (EXP) line to include rt_rig04_6800.asm instead of rt_rig03_6800.asm.
Now cut INCHAR and INCHNE out of shftst_6800.asm, from the comments on AECHO to the hook to INCHV, and move them into someplace appropriate in rt_rig04_6800.asm. You might want to move them in two or three separate pieces, or you might want to move those lines all together to the same place. Your choice.
Also grab PDUP for the 6800 and 6801.
Save the two files and make sure they still assemble and run as in chapter 3-10.
Now, if you want to do this part yourself, cut the test code out of
shftst_6800.asm and replace it with appropriate 6800 code from the last
chapter. Yes, that does mean you'll need to convert the 6809 code to 6800
code. It's not hard, just tedious, and instructive.
If you don't want to do the conversion yourself, or if you want to see how I'd
do it, the following is some demonstration code I produced. A little less than
two thirds of the way down, I realized I was heading the wrong direction and
quit testing it. Some of the remaining code is known not to do what I
intended, and the rest of the remaining code is not tested, but I'm leaving it
here for reference.
* test shifts and multiplies for 6800 (EXORsim)
* using parameter stack,
* with test frame
* Copyright Joel Matthew Rees, December 2024
*
EXP rt_rig04_6800.asm
****************
* Program code:
*
*
INX8 INX
INX7 INX
INX6 INX
INX
INX4 INX
INX
INX
INX
RTS
*
DEX8 DEX
DEX
DEX6 DEX
DEX
DEX4 DEX
DEX
DEX
DEX
RTS
*
* Unrolled 64-bit integer shift left 1 bit:
LSL64 LSL 7,X ; least significant byte (byte 7)
ROL 6,X ; next more significant byte (byte 6)
ROL 5,X ; next more significant byte (byte 5)
ROL 4,X ; next more significant byte (byte 4)
ROL 3,X ; next more significant byte (byte 3)
ROL 2,X ; next more significant byte (byte 2)
ROL 1,X ; next more significant byte (byte 1)
ROL 0,X ; most significant byte (byte 0)
RTS
*
* 64-bit integer shift left 1 bit in a loop:
LSL64LP LDX PSP ; but do not update!
JSR INX7 ; point to last byte
LDAA #7 ; bytes to ROL
LSL 0,X ; least significant byte starts with LSL
SHL64L DEX ; carry not affected
ROL 0,X ; next more significant byte
DECA ; count down, carry not affected
BNE SHL64L ; do next
RTS
* Ends with X pointing to most significant byte
*
PGSTRT LDAB #$5A ; a common bit pattern to watch move
ROLB ; $B4 9-bit rotate left (values assume carry clear on entry)
ROLB ; $68
ROLB ; $D1
ROLB ; $A2
ROLB ; $45
ROLB ; $8B
ROLB ; $16
ROLB ; $2D
ROLB ; $5A
NOP ; Pause for a look.
*
ROLB
ADCB #0 ; $B4 8-bit rotate left
ROLB
ADCB #0 ; $69
ROLB
ADCB #0 ; $D2
ROLB
ADCB #0 ; $A5
TBA ; $A5 is another common bit pattern to watch move
ROLB
ADCB #0 ; $4B
ROLB
ADCB #0 ; $96
ROLB
ADCB #0 ; $2D
ROLB
ADCB #0 ; $5A
NOP
*
LSLB ; 1st: $A55A 16-bit shift left
ROLA ; $4AB4
LSLB ; 2nd
ROLA ; $9568
LSLB ; 3rd
ROLA ; $2AD0
LSLB ; 4th
ROLA ; $55A0
NOP
LSLB ; 5th
ROLA ; $AB40
LSLB ; 6th
ROLA ; $5680
LSLB ; 7th
ROLA ; $AD00
LSLB ; 8th
ROLA ; $5A00
NOP
*
LDX PSP ; 32-bit shift left mixed stack/register
DEX ; allocate two bytes
DEX
STX PSP
LDAA #$87
LDAB #$65
STAB 1,X
STAA 0,X ; $8765 on stack
LDAA #$43
LDAB #$21 ; $4321 in D
LSLB ; least significant byte
ROLA ; next more significant byte
ROL 1,X ; next more significant byte on stack
ROL 0,X ; most significant byte on stack
LSLB ; 2nd time
ROLA
ROL 1,X
ROL 0,X
LSLB ; 3rd time
ROLA
ROL 1,X
ROL 0,X
LSLB ; 4th time
ROLA
ROL 1,X
ROL 0,X ; result -- 7654:3210
NOP
*
LDAB #$10 ; set up test data
STAB 1,X ; X still has PSP
ADDB #$22
LDAA #7
T64LP STX PSP ; allocate before store
STAB 0,X
ADDB #$22
DEX
DECA
BNE T64LP
NOP
* Check the contents of the parameter stack when done.
* Expect FE DC BA 98 76 54 32 10, reading up from PSP.
LDX PSP ; sync X and PSP
JSR LSL64 ; unrolled loop
JSR LSL64LP ; loop
JSR LSL64
JSR LSL64LP
NOP
* Should be shifted left one hexadecimal digit:
* ED CB A9 87 65 43 21 00
JSR LSL64
JSR LSL64LP
JSR LSL64
JSR LSL64LP
NOP
* Should be shifted left another hexadecimal digit,
* which is a full byte: DC BA 98 76 54 32 10 00
* But that's a hard way to shift left 8 bits.
* Let's try an easier, quicker way:
LDAA #7
LS8BITL LDAB 1,X ; copy each byte up one position
STAB 0,X
INX
DECA
BNE LS8BITL
CLR 0,X ; zero-fill the vacated least significant byte
* Wasn't that fast?
NOP
LDX PSP
JSR INX8 ; drop them all
STX PSP
NOP
* Multiply 8 bits by 4
LDAB #65 ; $41
LSLB ; multiply by 2
LSLB ; ignore carry and multiply by 2
NOP
* Multiply 16 bits by 4
LDAB #65
CLRA
LSLB ; multiply by 2
ROLA ; catch the carry
LSLB ; and again
ROLA ; catch the carry again: $0104 == 260
NOP
* Multiply 16 bits by 16
LDAB #65
CLRA
LSLB ; multiply by 2
ROLA ; catch the carry
LSLB ; and again
ROLA ; catch the carry again
LSLB ; and again
ROLA ; catch the carry again
LSLB ; and again
ROLA ; catch the carry again: $0410 == 1040
NOP
* Multiply 16 bits by 16, using loop
LDAB #65
CLRA
JSR PPSHD ; PSP in X on return
LDAB #4
JSR PPSHD
LDAA 2,X ; operand on parameter stack
LDAB 3,X
MUL16WL LSLB ; multiply by 2
ROLA ; catch the carry
DEC 1,X
BNE MUL16WL
NOP
INX ; drop count
INX
STX PSP
STAA 0,X ; save result
STAB 1,X
NOP ; re-use 2 bytes of allocation
* Other powers of 2: 2^7 == 128
LDAB #83 ; $53 X 128
CLRA ; for the high bits
LSLB ; 1st
ROLA
LSLB ; 2nd
ROLA
LSLB ; 3rd
ROLA
LSLB ; 4th
ROLA
LSLB ; 5th
ROLA
LSLB ; 6th
ROLA
LSLB ; 7th
ROLA
STAA 0,X ; $2980 == 10624
STAB 1,X
NOP
* compare going the other direction,
* ends with high byte in B, low byte in A:
LDAB #83 ; $53 X 128
CLRA ; for result
LSRB ; bit 0 to carry, B becomes high byte
RORA ; bit 0 of B now in bit 7 of A
NOP
* Saturation math:
LDAB #83 ; $53 X 128
LSRB ; bit 0 to carry
RORB ; now to bit 7
ANDB #$80 ; chop off the lost, double-shifted high bits
NOP
* Extend the saturation math with recovery (de-optimization):
LDAB #83 ; $53 X 128 (9 bits => 7 to left is 2 to right)
LSRB ; bit 0 to carry
RORB ; now to bit 7
TBA
ROLA ; bring the high bits back into position
ANDB #$80 ; chop off the high bits
ANDA #$7F ; chop off the low bits
NOP
* and again, more efficiently, but not most efficiently
LDAB #83 ; $53 X 128 (8 bits => 7 to left is 1 to right)
TBA ; make two halves
LSRA ; bit 0 to carry, high bits
RORB ; now to bit 7 (bit 0 to carry)
ANDB #$80 ; chop off the high bits
NOP
* 2^6 == 64
LDAB #83 ; $53 X 64
CLRA ; for the high bits
LSLB ; 1st
ROLA
LSLB ; 2nd
ROLA
LSLB ; 3rd
ROLA
LSLB ; 4th
ROLA
LSLB ; 5th
ROLA
LSLB ; 6th
ROLA
STAA 0,X ; $14C0 == 5312
STAB 1,X
NOP
* compare going the other direction,
* ends with high byte in B, low byte in A:
LDAB #83 ; $53 X 64
CLRA ; for result
LSRB ; bit 0 to carry
RORA ; old bit 0 of B now in bit 7 of A
LSRB ; old bit 1 of B to carry
RORA ; old bit 1,0 of B now in bit 7,6 of A
NOP
* Saturation math:
LDAB #83 ; $53 X 64
LSRB ; bit 0 to carry
RORB ; now to bit 7, old bit 1 to carry
RORB ; now to bit 7,6 in order
ANDB #$C0 ; chop off the remainder
NOP
* Extend the saturation math with recovery (de-optimization):
LDAB #83 ; $53 X 64 (9 bits => 6 to left is 3 to right)
LSRB ; bit 0 to carry
RORB ; now to bit 7, old bit 1 to carry
RORB ; now to bit 7,6 in order
TBA
ROLA ; recover high bits including last carry
ANDB #$C0 ; chop off the high bits
ANDA #$3F ; chop off the low bits
NOP
* and again, copying first
LDAB #83 ; $53 X 64 (8 bits => 6 to left is 2 to right)
TBA ; make two halves
LSRA ; bit 0 to carry, high bits
RORB ; now to bit 7 (bit 0 to carry)
LSRA ; bring the high bits into place (bit 1 to C)
RORB ; now to bit 7,6 in order
ANDB #$C0 ; chop off the high bits
NOP
* 2^5 == 32
LDAB #83 ; $53 X 32
CLRA ; for the high bits
LSLB ; 1st
ROLA
LSLB ; 2nd
ROLA
LSLB ; 3rd
ROLA
LSLB ; 4th
ROLA
LSLB ; 5th
ROLA
STAA 0,X ; $0A60 == 2656
STAB 1,X
NOP
* compare going the other direction,
* ends with high byte in B, low byte in A:
LDAB #83 ; $53 X 32
CLRA ; for the high bits
LSRB ; bit 0 to carry
RORA ; old bit 0 of B now in bit 7 of A
LSRB ; old bit 1 of B to carry
RORA ; old bit 1,0 of B now in bit 7,6 of A
LSRB ; old bit 2 of B to carry
RORA ; old bit 2,1,0 of B now in bit 7,6,5 of A
NOP
* Saturation math:
LDAB #83 ; $53 X 32
LSRB ; bit 0 to carry
RORB ; now to bit 7, old bit 1 to carry
RORB ; now to bit 7,6 in order, old bit 2 to carry
RORB ; now to bit 7,6,5 in order
ANDB #$E0 ; chop off the high bits
NOP
* Extend the saturation math with recovery:
LDAB #83 ; $53 X 32 (9 bits => 5 to left is 4 to right)
LSRB ; bit 0 to carry
RORB ; now to bit 7, old bit 1 to carry
RORB ; now to bit 7,6 in order, old bit 2 to carry
RORB ; now to bit 7,6,5 in order
TBA
ROLA ; recover last carry into high bits
ANDB #$E0 ; chop off the remainder
ANDA #$1F ; chop off the low bits
NOP
* and again, copying first
LDAB #83 ; $53 X 32 (8 bits => 5 to left is 3 to right)
TBA ; make two halves
LSRA ; bit 0 to carry, high bits
RORB ; now to bit 7 (bit 0 to carry)
LSRA ; bring the high bits into place (bit 1 to C)
RORB ; now to bit 7,6 in order
LSRA ; once more (old bit 2 to C)
RORB ; now to bit 7,6,5 in order
ANDB #$E0 ; chop off the high bits
NOP
* balance the stack
INX
INX
STX PSP ; clear stack
NOP
*
* From here down either hasn't been tested
* or doesn't function as I intended.
*
* shift left by 13 by right rotation => multiply by 8192, lose high bits
LDAA #$41 ; $4153 == 16723
LDAB #$53
LSRB ; bit 0 to carry
RORA ; now to bit 15, bit 8 to carry
RORB ; bit 8 to bit 7, old bit 1 to carry
RORA ; old bit 1 to bit 15 in order
RORB ; old bit 9 to bit 7, old bit 2 to carry
RORA ; old bit 2 to bit 15 in order
RORB ; old bit 10 to bit 7 (old bit 3 to carry)
ANDA #$E0 ; chop off the top bytes, ignore carry
CLRB
NOP
* shift left by 13 => multiply by 8192, capture all bits:
LDAA #$41 ; $4153 == 16723
LDAB #$53
JSR DEX4 ; pre-allocate
STX PSP
LSRB ; bit 0 to carry
RORA ; now to bit 15, bit 8 to carry
RORB ; bit 8 to bit 7, old bit 1 to carry
RORA ; old bit 1 to bit 15 in order
RORB ; old bit 9 to bit 7, old bit 2 to carry
RORA ; old bit 2 to bit 15 in order
RORB ; old bit 10 to bit 7 (old bit 3 to carry)
CLR 3,X ; least significant, ignore carry
CLR 2,X ; save a place for lower middle byte
STAB 1,X ; save next more significant byte
TAB ; copy to split out high and low
ANDB #$E0 ; chop off the high bits
STAB 2,X ; save the lower middle byte
ANDA #$1F ; chop off low bits
STAA 0,X ; save high bits
NOP
* Check the results before continuing.
NOP
* shift left directly by 13 => multiply by 8192, capture all bits:
DEX ; 2 placeholders, for middle upper byte
DEX ; and high byte
STX PSP
LDAA #$41 ; $4153 == 16723
LDAB #$53
LSLB
ROLA
ROL 1,X ; catch 1
LSLB
ROLA
ROL 1,X ; catch 2
LSLB
ROLA
ROL 1,X ; catch 3
LSLB
ROLA
ROL 1,X ; catch 4
LSLB
ROLA
ROL 1,X ; catch 5
LSLB
ROLA
ROL 1,X ; catch 6
LSLB
ROLA
ROL 1,X ; catch 7
LSLB
ROLA
ROL 1,X ; catch 8
DEX ; for high byte final resting place
CLR 0,X ; not completely filled
LSLB
ROLA
ROL 1,X ; catch 9
ROL 0,X ; catch 1
LSLB
ROLA
ROL 1,X ; catch 10
ROL 0,X ; catch 2
LSLB
ROLA
ROL 1,X ; catch 11
ROL 0,X ; catch 3
LSLB
ROLA
ROL 1,X ; catch 12
ROL 0,X ; catch 4
LSLB
ROLA
ROL 1,X ; catch 13
ROL 0,X ; catch 5
NOP
* stop to check
NOP
JSR INX6 ; drop all the above
STX PSP
NOP
* 8-bit-wide rotation
* accumulator-wide ROL by 3 / ROR by 5 using the stack:
LDX PSP ; just to be sure, and remind ourselves
DEX ; temp
STX PSP
LDAA #83 ; $53
STAA 0,X ; copy to stack
LSL 0,X ; shift left by 3
LSL 0,X
LSL 0,X
LSRA ; shift right by 5
LSRA ; (more shifts, use the faster shift accumulator)
LSRA
LSRA
LSRA
ORAA 0,X ; put results together
NOP
* accumulator-wide ROL by 3 / ROR by 5 using ABA:
LDAB #83 ; $53
TBA ; copy
LSLA ; shift left by 3
LSLA
LSLA
LSRB ; shift right by 5
LSRB
LSRB
LSRB
LSRB
ABA ; put the results together
NOP
* accumulator-wide ROL by 3 / ROR by 5 using ADC #0 trick:
LDAB #83 ; $53
LSLB
ADCB #0
LSLB
ADCB #0
LSLB
ADCB #0
NOP
* ugly accumulator-wide ROR by 5 / ROL by 3 using branch and set:
LDAB #83 ; $53
LSRB ; clears bit 7
BCC RR8BN1
ORAB #$80 ; set it for the carry
RR8BN1 LSRB
BCC RR8BN2
ORAB #$80
RR8BN2 LSRB
BCC RR8BN3
ORAB #$80
RR8BN3 LSRB
BCC RR8BN4
ORAB #$80
RR8BN4 LSRB
BCC RR8BN5
ORAB #$80
RR8BN5 NOP ; next instruction
* Not as ugly accumulator-wide ROR by 5 / ROL by 3,
* but uses both accumulators to avoid branches:
LDAB #83 ; $53
TBA
LSRA ; get lowest bit in carry first
RORB
LSRA ; get 2nd bit in carry first
RORB
LSRA ; get 3rd bit in carry first
RORB
LSRA ; get 4th bit in carry first
RORB
LSRA ; get 5th bit in carry first
RORB
NOP
* Compare result before dropping
INX ; drop temp
STX PSP
NOP
* 16-bit integer rotate left 3 / right 13 on 6800:
LDAA #$41 ; $4153 == 16723
LDAB #$53
LSLB ; clear bottom bit on shifting left
ROLA
ADCB #0 ; push the top carry in (16-bit rotation)
LSLB
ROLA
ADCB #0 ; push the top carry in (16-bit rotation)
LSLB
ROLA
ADCB #0 ; push the top carry in (16-bit rotation)
NOP
* 16-bit integer rotate right 3 / left 13 on 6800:
DEX ; temp to grab bit with
STX PSP
STAB 0,X ; copy
LSR 0,X ; get bottom bit
RORA ; rotate it into top byte
RORB ; 1 bit complete
LSR 0,X ; get next bottom bit
RORA
RORB ; 2nd bit complete
LSR 0,X ; get next bottom bit
RORA
RORB ; 3rd bit complete
NOP ; Should be back to $4153
INX
STX PSP
*
RTS
*
END ENTRY
Now let's look at multiplying by some small constants that aren't powers of two.