Saturday, August 13, 2022

VTL-2 part 3, Optimizing for the 6801

The 6801 is one of my favorite hobby horses -- or one of my favorite axes to grind. Motorola kind of missed out on some opportunities with it. I've ranted about that elsewhere, here I'll just make use of it for what it is.

If you wonder why I would do a hand-optimization of VTL-2 for the 6801, John Linville got me digging into the getting the 6800 version running, and I enjoy working with the 6801 almost as much as I enjoy working with the 6809. Anyway, it's a kind of recreational activity for me.

Two notes before I dig in: 

One is a CPX trick someone used in the 6800 source that won't work on the 6801. There's an FCB $9C at line 541 of the source I posted with the variables moved out of the direct page, in the middle of the DIVide routine. (It's from the original that I'm working with.) That's the op-code for CPX, and CPX doesn't affect carry or overflow on the 6800. It is used as an effective no-op to skip the SEC instruction, instead of using a branch around the set carry instruction. It saves one precious byte (and a couple of processor cycles). 

I've replaced it with a branch around in the optimizations for the 6801, because the 6801 fixes the CPX instruction to fully implement all flags for the sixteen-bit comparison, which means it ain't gonna work on the 6801. And it won't work for the 6809, either. 

(That's the thing about such tricks. They fall apart under progress.)

The other is that the 6800 uses a post-decrement push, so that the stack pointer is always pointing to the next available byte, not the last byte pushed. That's important in the insert-line-by-stack-blast routine, which I'm going to replace just as a matter of principle before I'm done, but the first version for the 6801 keeps it. 

For anyone getting ahead of me and cribbing from my conversions to work on the 6809 transliteration, remember that, and position Y or U correctly before you copy down or up, as you may choose, in your line insertion routine.

At the present point, I have it running on my fake 6801 simulator that I added to Joe H. Allen's EXORsim. This first version does easy stuff, like converting two-byte load and store sequences using the accumulators to single double-accumulator sequences. 

It also moves the temporary/local variables out of where they were hiding among the VTL pre-declared variables (and saving RAM space). Two of the temporaries were easily replaced with PSHX and PULX sequences (since the 6801 can do that). The others have usage patterns that don't fit so easily to shoving them off on the stack, so I moved them to the direct page.

Why, you ask? Why use extra RAM in the direct page when I just moved things out of the direct page?

Well, twenty bytes is a lot different from the entire 256 bytes of the direct page. Twenty bytes can be moved around with an appropriate ORG to fit your hardware. And moving those temporaries back into the direct page gives us back some of our code byte count savings that came from having it all in the direct page in the first place.

But the big reason is that it is an excuse to give them more meaningful names than SAVE0, SAVE1, ... etc. labels, which will help when moving the code to the 6809.

See the code for the rest of the story:

https://osdn.net/users/reiisi/pastebin/8475 (replaced with following:)

https://osdn.net/users/reiisi/pastebin/8605 (See notes for 20220904 and 20220911 below.)

[202208140155: add (yeah, up way too early on a Sunday morning]

Here is source that clears out the stack blasts, for clarity and for playing nice with interrupts:

https://osdn.net/users/reiisi/pastebin/8476 (replaced with following:)

https://osdn.net/users/reiisi/pastebin/8606 (See notes for 20220904 and 20220911 below.)

[202208140155: add end]

I'll add at least one more when I get it running on the MC10 emulator, then I should be ready to tackle the 6809 transliteration -- if no one beats me to it.

[202209022213: add (Oh! how embarrassing!)]

I have discovered, on the road to getting the MC10 version up, that my emulation was faulty. For all the discussion on the difference between CPX on the 6800 and on the 6801, I forgot to set the carry flag in my emulation, and the RAM check routine I added to the 6801 version has its end test inverted. Eventually, I'll delete the pastebins above and replace them with pastebins that have the BHI on line 150/152 fixed. Until then, go in and edit that line in whichever source, change the BHI to BLO:

PROBET	CPX	#COLD
	BLO	PROBE	; CPX on 6801 works right.

And get the most recent revision to my 6801 version of EXORsim. Hopefully I'll get the fixed simulator code up sometime tomorrow (3 Sept.). 

[202209031747: Done. Corrected pastebuffers added and linked, buggy pastebuffers deleted.]

[202209022213: add end]

[202209041331: add (more embarrassment!)]

I made a bad optimization to the random function in both of the above, and in the MC-10 version. Don't know how I missed this one, either. Blame it on my age?

At the label AR2 in both the above, I falsely corrected, without thinking, the addition of the low byte to the high, and vice-versa. My false correction looks like this:

AR2	STD	0,X	; STORE NEW VALUE
	ADDD	QUITE	; RANDOMIZER
	STD	QUITE
	RTS

Looks reasonable, right? As long as you ignore the comment about randomizer?

Well, here's what it looked like in the original 6800 code:

AR2	STAA	0,X	; STORE NEW VALUE
	STAB	1,X
	ADDB	QUITE	; RANDOMIZER
	ADCA	QUITE+1
	STAA	QUITE
	STAB	QUITE+1
	RTS

 See what was going on? 

Well, I probably should fix the pastebuffer, but OSDN won't let me today. Maybe I have too many. But I don't want to publish this code in a repository without some explicit permission from the authors. Anyway, the fix I recommend is below, with a couple of lines of code to detect lack of initialization and semi-auto initialize it:

AR2	STD	0,X	; STORE NEW VALUE
	BNE	AR2RND	; Initialize/don't get stuck on zero.
	INCB		; Keep it known cheap.
*	ADDD	QUITE	; RANDOMIZER	; NO! Don't do this.
AR2RND	ADDB	QUITE	; RANDOMIZER	; Adding the low byte to the high byte
	ADCA	QUITE+1	;		; is cheap but intentional.
	STD	QUITE
	RTS
*

Just search the code for "RANDOMIZER", cut the bad code out, and paste in the fix. Or, better yet, edit it with your own, improved random function.

Initialization -- yeah, this is another place where the original code left initialization out to keep the code tiny.

[202209041331: add end]

[202209111229: add] 

While transliterating for the 6809, I discovered that I missed some more opportunities to optimize for the 6801.

If you look for the label 

SUBTR    SUBD    0,X

you'll notice that, in the 6800 source, it was a very short routine to subtract whatever X pointed to from D. Every call to SUBTR in the source for the 6801 can be replaced with the actual body of the subroutine, with no increase of code size. You can search for BSR SUBTR and JSR SUBTR and replace each with SUBD 0,X. (I don't think there will be any JSR SUBTR, but if there are, you can.)

[202209111229: add end]

[JMR202209231810: add]

I have written up a post on VTL expressions, here: https://joels-programming-fun.blogspot.com/2022/09/short-description-vtl-2-expressions-very-tiny-language-p1.html , which will help in testing and otherwise making use of the language. I should also shortly have a post up introducing programming in VTL-2.

[JMR202209231810: add end]

[JMR202210011737: add]

I now have my work on VTL-2 up in a private repository:

https://osdn.net/users/reiisi/pf/nsvtl/wiki/FrontPage

There is a downloadable version of VTL-2 for the Tandy MC-10 (6801) in the stock 4K RAM configuration in there, with source, executable as a .c10 file, and assembly listing for reference. Look for it in the directory mc10:

https://osdn.net/users/reiisi/pf/nsvtl/files/

[JMR202210011737: add end]

 

No comments:

Post a Comment