I'm putting this chapter here for reference. I discovered I was heading too deep, too early, but I want to keep this chapter handy. I haven't checked the code or the explanations carefully, be careful if you read or try to use what I have here.
Multiplying by Small Constants
(Shift Left and Add)
We've worked out multiplying by some constant powers of 2, hopefully enough to have some confidence in using bit shifts for multiplying.
(Note that we are not using Trachtenberg methods.
So, I said that multiplying by small constants that are not powers of ten would still be easy. Considering I said that multiplying by powers of two would be easy, you may be doubting me.
Fair enough.
But let's look at multiplying by constant 3.
We can look at it several different ways:
3X == X + X + X
Multiplying something by three is adding it to itself three times. That's what the algebra says, and when I say it in somewhat ordinary English, it makes sense. Sort of. But if we are very careful about how we say this, we have to say adding the number to zero three times.
Adding it to itself three times could actually mean something else, something we just talked about in the last chapter, and are still talking about here.
But multiplying something by two -- doubling -- is adding it to zero twice, or adding it to itself (once). Anyway,
3X == 2X + X
Multiplying something by three is multiplying by two and then adding it again -- or left-shifting in binary once and then adding it to the product again.
For a byte on the 6809, catching carries:
CLRA ; for carries
LDB ,U ; get the byte
LSLB ; 2X
ROLA ; grab any high bit
ADDB ,U ; 2X+X==3X
ADCA #0 ; get any carry
Let's do that to a 16-bit integer, as a subroutine. And capture
the carries.
MUL3 LDD ,U
CLR ,-U ; for carries
LSLB ; 2X
ROLA
ROL ,U ; get carries
ADDD 1,U ; + X makes 3X
BCC MUL3NC
INC ,U ; get carries
MUL3NC STD 1,U
RTS
If we weren't capturing carries and setting it up as a subroutine, it would be really short.
Speaking of capturing carries, you will note that this subroutine accepts a
16-bit integer as input, but returns a 24-bit integer with the carries in the
added most significant byte.
Let's not tell anyone, but , rather than using shifts, we might have simply loaded and then added twice, also:
MUL3ADD LDD ,U
CLR ,U ; for carries
LDD 1,U ; 1X
ADDD 1,U ; 2X
ROL ,U ; get first carry
ADDD 1,U ; 3X
BCC MUL3NC
INC ,U ; get 2nd carry
MUL3ANC STD 1,U
RTS
The first load is the same as adding it to zero, so that's really just multiplying by 3 the hard way, adding it up three times.
Adding a number to itself is another way to shift it left by one bit, by the way.
How about another way that works on the 6809 and 6801? Both have the MUL A × B
instruction that leaves the product in D:
MUL3MUL LDD ,U
PSHU A ; save high byte
LDA #3
MUL ; multiply low byte in B
STD 1,U ; save 16-bit result
LDB ,U ; get high byte back
CLR ,U ; for result addition
LDA #3
MUL ; multiply high byte in B
ADD ,U
STD ,U
RTS
MUL takes 10 cycles on the 6801 and 11 on the 6809, so it would take a little longer than the shift method, but not much, and with a similar byte count.
But this routine using MUL could be generalized into an 8-bit by 16-bit routine that could be fairly quick for both small variables and small constants:
* 16 by 8 to 24-bit multiply
* 16-bit multiplicand in 2nd and 3rd bytes,
* 8-bit multiplier in 8 bits on top:
MUL16X8 LDB 2,U ; low byte
LDA ,U ; multiplier
MUL
STB 2,U ; store result low byte
PSHU A ; save result middle byte temp
LDD 1,U ; multiplier in A, high byte in B
MUL
ADDB ,U+ ; add result middle byte, pop it
ADCA #0 ; add the carry into the high byte
STD ,U ; save middle and high bytes of result
RTS
*
MUL3CAL LDB #3
MUL3ROB PSHU B
BRA MUL16X8 ; rob the RTS
MUL5CAL LDB #5
BRA MUL3ROB ; rob the PSHU
MUL10CAL LDB #10
BRA MUL3ROB ; rob the PSHU
Cool enough?
How about multiplying by 5 via shift-and-add, just to compare? Five is four plus one:
MUL5 LDD ,U
CLR ,-U ; for carries
LSLB ; 2X
ROLA
ROL ,U ; carries
LSLB ; 2X again makes 4X
ROLA
ROL ,U ; carries
ADDD ,U ; + X makes 5X
BCC MUL5NC
INC ,U ; get carries
MUL5NC STD 1,U
RTS
Similar length to the MUL method, but just a little faster.
How about 10?
Ten is five times 2.
Shifting left and then calling MUL5 would lose us a carry. But we can call
MUL5 and then shift once more left. Or use the shift-and-add MUL5 code and
hang another shift on the end:
* Multiply a 16-bit integer by 10,
* return as a 24 bit integer, to keep the carries.
* For the 6809
MUL10 LDD ,U
CLR ,-U ; push a zero for overflow
LSLB ; 2X
ROLA
ROL ,U ; catch the overflow
LSLB ; 2X again makes 4X
ROLA
ROL ,U ; catch the overflow
ADDD 1,U ; + X makes 5X
BCC MUL10NC
INC ,U ; carry in the addition
MUL10NC LSLB ; 2X again makes 10X
ROLA
ROL ,U ; catch the overflow
STD 1,U
RTS
But if we have the MUL16X8 routine, it's probably going to be about as fast to call that.
That's the 6809. How about actual code on the other processors?The conversion is straightforward. I recommend trying it.
For the 6801, you will be loading X from PSP and updating PSP appropriately. Also, you will be able to replace LSLB ; ROLA pairs with LSLD.
For the 6800, in addition to loading and updating PSP, you will need to split
the double accumulator operators into individually working on A and B.
As a rough guess, given how fast the MUL is on 6801 and 6809, it would be
reasonable to just use MUL and ADD on both. Shift-and-add will be useful on
the 6800, however.
For the 68000, don't try to be too literal. Give 32-bit results instead of
24.
Oh, okay, let's look at code for the 68000:
MUL3_16: ; guessing it'll be about 48 processor cycles
CLR.W -(A6) ; for carries
MOVE.L (A6),D7
LSL.L #1,D7
ADD.L (A6)
MOVE.L D7,(A6) ; return in 32 bits
RTS
MUL3_16_ADD:
CLR.W -(A6) ; for carries
MOVE.L (A6),D7
ADD.L (A6) ; Or maybe do it in-register?
ADD.L (A6)
MOVE.L D7,(A6)
RTS
* No 8-bit MUL in 68000
* MULU takes 38 + 2 per bit processor clocks.
MUL3_16_MUL: ; MULU by #3 takes about 46 processor cycles, about 12 memory clocks.
MOVE.W (A6)+,D7 ; pop it
MULU.W #3,D7 ; ignore source high, 32-bit result in D7
MOVE.L D7,-(A6)
RTS
* 16 by 16 to 32 bit multiply
* 16-bit multiplicand in 3rd and 4th bytes,
* 16-bit multiplier in 16 bits on top:
MUL16X16:
MOVE.W 2(A6),D7
MULU.W (A6),D7 ; ignore source high, 32-bit result in D7
MOVE.L D7,(A6)
RTS
*
MUL3CAL_16:
MOVE.W #3,-(A6)
BRA.S MUL16X16 ; rob the code
MUL5CAL_16:
MOVE.W #5,-(A6)
BRA.S MUL16X16 ; rob the code
MUL10CAL_16:
MOVE.W #10,-(A6)
BRA.S MUL16X16 ; rob the code
MUL5_16:
CLR.W -(A6) ; for carries
MOVE.L (A6),D7
LSL.L #2,D7 ; 4X
ADD.L (A6) ; 4X+X
MOVE.L D7,(A6) ; return in 32 bits
RTS
MUL10_16:
CLR.W -(A6) ; for carries
MOVE.L (A6),D7
LSL.L #2,D7 ; 4X
ADD.L (A6) ; 4X+X
LSL.L #1,D7 ; 2(4X+X)
MOVE.L D7,(A6) ; return in 32 bits
RTS
And 16 bit integers ought to be big enough for anyone, right? I mean, who would ever do anything with an integer bigger than 30,000?
(Which is part of where the old "Who needs more than 64K of memory?" excuse for corner-cutting in the early-to-mid 1980s came from.)
However, we do often want to do this with 32-bit operands, and may want to
catch the carries, from that, even. So let's look at giving 32-bit results with carries in 40, I mean, 48 bits. (We shouldn't want to do math in odd numbers of bytes on the 68000.):
Draw your own conclusions about optimization.
Recap
And now we can do part of what is necessary for decimal input and output. That was quick.
Dividing from the top.
multiply byte by ten, remainder from divide by 100?
***********
No comments:
Post a Comment