joel's programming fun: 2022

Thursday, October 13, 2022

Notes on building m6809-gdb for use with XRoar

Ciaran has a page up on his local build of an older version of gdb for working with 6809 and XRoar at

https://www.6809.org.uk/dragon/m6809-gdb.shtml

I am running Ubuntu, I think it's 18 (need to upgrade that sometime, I suppose, but no rush.)

I'm building XRoar from Ciaran's repository and installing it in my user-local executables directory. So the dependencies for XRoar were already installed when I started.

The first time I tried to clone Ciaran's repository for m6809-gdb, network traffic just kept git from being able to load anything. Had to wait and try later, meaning today.

It's about 300 or 400 MB, and it's not a terribly time-consuming build, in the range of thirty minutes with a reasonable workstation configuration and broadband connection.

Following his instructions, i did the configure with the recommended parameters, and it got stuck part-way through. Couldn't find my Ada or flex; and then other peculiarities of my OS configuration got it stuck without those. (Thought I had both installed, but a quick check showed they weren't.) So, ...

I logged out of my working user and logged in to my admin user and got the most recent updates with Ubuntu's apt tool. (Where it's apt-get on Debian, it's just apt on Ubuntu, but pretty much similar. apt update, then apt upgrade.)

Then I looked around for Ada and flex. They are a little hard to find, because apt search and man -k both throw a lot of false-positives up there.

Ada -- the package on Ubuntu is gnat. Gnat-8 was available for my OS, but I loaded the default gnat, which turned out to be gnat-7:

sudo apt install gnat

Flex -- no, this is not that Flex, the old Flex operating system for 6800 and 6809. It is also not Digital Research's FlexOS. (Weren't there trademark issues with that?) Nor was it ...

Flex in this case is the fast lexical analyser generator, a re-implementation of the lex low-level parser (lexer) generator that was used with Yet Another Compiler Compiler (yacc) in putting together the early C compilers during the birth of Unix and C. The package name in Ubuntu is flex, so

sudo apt install flex

I remembered there were also messages about bison, so while I was at it I got that, too:

sudo apt install bison

(Yeah, bison is a re-implementation of yacc. Puns all around.)

When I logged back into my workstation user, the build process got stuck with cached configuration that said I didn't have bison.

Make clean and several other of the usual methods didn't clear that, and I couldn't find config.cache to delete. So I just deleted the whole directory and started over with the git clone step.

This time, it successfully built.

Knowing that it built okay, I did a make clean and re-ran the configure step with a prefix option:

--prefix=/home/myuser/local/mybin

(Not my real file system structure.) My login name in the above would be

myuser

and my user-local place for keeping binaries in the above would be

~/local/mybin

Running the make and make install steps after this configuration option is set allows me to run m6809-dbg when I'm logged in as myuser, but keeps it out of the general system executables. (I also do this with my own asm68c assembler, FWIW.)

Notes on running gdb:

b *0x3971 to set breakpoint at 0x3971 without symbols
ni (next instruction) to single-step
info registers to show CPU registers
c to continue

Saturday, September 24, 2022

A Short Description of VTL-2 Programming (Very Tiny Language pt 2)

(Continuing from VTL-2 Expressions, because some people want to know:)

As I understand it, Very Tiny Language essentially owes its existence to the paucity of support from MITS for the Altair 680. RAM boards from the manufacturer were not just expensive, but they were really hard to come by. And even if you had the RAM, putting anything in the RAM was painful. No floppies, no cassette tape, only the serial port, so you had to load BASIC or the assembler or other tools from paper tape, and that took a long, noisy several hours according to what I have heard. And without tools, it's hard to develop tools in the first place.

So Gary Shannon and Frank McCoy at the famous Computer Store made this very tiny programming language that fit into three ROM sockets on a single ROM board. 760 bytes of fairly tight code. (When I say fairly tight, don't get me wrong. I might kibbitz their work, but they did a pretty good job.)

They took a bunch of short-cuts in the design of the language to keep it tight, which is something to keep in mind while we walk through using the language to write programs.

And it turned out to be useful enough that owners of the 8800 also wanted it, so they wrote a version for that. And then 3rd parties wrote versions for the 6502 and other processors.

Why?

Because it can turn your microcomputer into a programmable (integer) calculator even with very little ROM and RAM. (And it's also an example of a useful, minimal ad-hoc programming language.)

My post on expressions talked about using it without really programming in it. This post will be about programming in it.

Allocation variables!:

Before we can start putting VTL programs in memory, we have to make sure VTL knows where it can store the lines of program we feed it.

(Fasten your seat-belt and hang on, this detour-is-not-a-detour is a bit of a rough ride.)

We need to check the two allocation variables, the current allocation base -- & (ampersand) -- and the top of allocatable memory -- * (asterisk). (Yes, ampersand and asterisk are variables.)

?=&

will print the current allocation base. And

?=*

will print the top of allocatable memory -- except that they have to be correctly set first.

Yes, these are variables, and yes, that's what they are named.

If you want to know the total amount of memory available for programming (and the array, more later), type

?=*-&

-- if they happen to be correctly set. Which is what I should explain now.

To understand the problem, consider that Gary Shannon and Frank McCoy at one time were making a little money off the VTL ROMs they sold. Not much, though. They really didn't want to have lots of different versions of the ROM in their inventory.

Other than the CPU decision, they just wanted the user to be able to plug the ROMs into a ROM board, set the board's address, and turn on power. So they didn't want to program a RAM limit into the ROMs that would not apply to a large number of their customers, even if the customer could fix the limit. Better to make customers aware of what those variables are and why, and ask the customers to set them themselves.

Maybe.

Anyway, if you have the original ROMs or the original source (including the version I started with in bringing VTL-2 up on EXORsim), these variables are not set when VTL-2 starts up.

And, until they are set, you can't start programming.

If you are running VTL-2 inside an emulator, the emulator may do you the convenience of clearing memory before loading the interpreter -- in which case it will be obvious that they are not set.

Or it may not.

And if you are running real hardware, the variables will probably have garbage in them on boot-up, which may be obvious, or which may look valid to the interpreter (and may cause crashes if you try to edit or look at program lines).

On the other hand, if you are working with one of the executable images made from source in which I moved the VTL variables out of the direct page (all the rest of the versions I have posted), I went ahead and put start-up code in the interpreter that sets them -- specifically because I moved those variables and the initial allocation base is no longer a simple question of 264 on a 68XX CPU.

Similarly, C being what C is, VTL-C sets them for you. And others have done likewise, for similar reasons.

So, you need to take a look at them and figure out if they need to be set or not.

Top of allocatable RAM:

The top of allocatable RAM is fairly straightforward. Check it now:

?=*

It should be either equal to the amount of RAM you have (or have given it), if VTL resides (say, in a ROM) in an upper address range, or the amount of RAM minus 1K to 2K -- so that VTL and maybe system stuff like maybe stack can reside in RAM above the allocatable area.

Confusing? Maybe I should draw a map:

0000:	Possible system stuff
(low address):	VTL variables also, input buffer and usually VTL stack
(slightly higher address):	& (allocation base is here.) VTL allocatable space -- for program (and maybe an array) * (top of allocatable area is here.)
(still higher address):	possible system stuff (including, possibly, the host stack)
(even higher address, may be top half of memory):	VTL interpreter code image
(above that):	more system stuff (including, possibly, the host stack)

Hard numbers?

The top of the allocatable area should be something around 30000 or more for 32K RAM systems, 14000 or more for 16K, 6000 or more for 8K, etc. If the interpreter is up in the upper 32K of address space, where ROM often is, top of RAM may be exact powers of two.

However, in some systems, the system stuff at the bottom of RAM is huge, and general purpose RAM itself may not start until something like 16384. And general purpose RAM may not come in simple powers of 2. So you may need to figure out the total RAM size and add it to the physical beginning of general purpose RAM.

(For the MC-10, for example, you probably want to play around with it with my unedited sources in the XRoar emulator, then read the source and figure out how to adapt it to your hardware, to use your RAM configuration effectively. Your likely configurations are 4K, 8K, 20K, and ??, so top of allocatable space is likely to be 1 or 2 K less than one of those.)

(To be strictly accurate, in the MC-10 images and the CoCo images I've produced, I'm letting the VTL interpreter piggyback its stack on the stack the BASIC host uses. This is because the MC-10 interrupt routines, in particular, don't like the stack being somewhere else -- which means you really could eliminate the stack allocation in the source code, but I didn't want to pull too many rugs out from under things.

Without the necessity of maintaining BASIC's stuff, you would need less than 900 bytes at the top of RAM for the interpreter, not 2K. Top of RAM would be above 31000, 15000, 7000, etc.)

Allocation base:

If you are running an original ROM or an image from the original source, the initial value of the allocation base depends on the CPU:

For the 6800, the allocation base at startup is 264, eight bytes above the stack area at the top of the direct page.
For the 8080, they moved the variables up to make space for the CPU interrupt shims at the bottom of memory, and the initial allocation base is 320.

If you are running an image from my modified sources and have not altered the label ZERO or the ORG before where ZERO is declared, the allocation base depends on which source you are using. You can check the assembler listing for the value of the label PRGM.

(Cough. Let me correct myself.)

If you are assembling it yourself from whatever sources, you should check the assembler listing.

Go ahead and find the listing, or re-run the assembler and get it, and look for the label PRGM. If the interpreter is setting things for you, a little bit after the label COLD, or a little before START, you'll see something like


      LDX	#PRGM
    STX	AMPR

That's the allocation base. And you'll see the code setting the STAR variable shortly before the START label. That's the theoretical top of memory.

(If you're running a C language version, it will most likely be 264 on startup. Or, if it's a 32-bit version, 528 or something. Maybe. Look for clues in the C source. Or assume it's good unless it crashes.)

At the time I write this post,

For my 6800 version with just the variables moved, the initial allocation base is hexadecimal $308, which is 776 in base ten.
For my general 6801 enabled, slightly optimized version with the variables moved, it is the same.
For the MC-10 using the general 6801 image, because there is a huge gap where nothing exists below video RAM and because VTL and its variables have to reside above video RAM and memory in use by interrupt routines and such, the initial allocation base is $4400, or 17408 in base ten.
For the Color Computer using the transliteration image, while we don't have that gap of wasted memory map like on the MC-10, we do have memory used by BASIC and interrupt routines, and by the video RAM, which have to reside below the variables. The initial allocation base I'm using is $1708, or 5896 in base ten.

Whew. End of detour-is-not-a-detour.

If the values of the allocation variables don't look reasonable on startup, figure out what they should be and set them before you continue. Otherwise, you will by typing in code and VTL won't be remembering it.

One more point before we return to our originally scheduled programming:

New command:

Now that you have noted and remembered the initial value of the allocation base, consider what happens if you have a program in memory and you type

&=264

or whatever the initial allocation base label was on your system (say, 5896 on the Color Computer or 17408 on the MC-10).

Presto.

VTL has forgotten your program and is ready for a new one.

Programming a counted loop in VTL-2:

With the allocation variables properly set, we can now type program lines in and list them.

In my bring-up process, I used several variations of a simple counting program because it was simple. We'll start by looking at that. If you've been through my expressions post, much of this will be familiar, and you'll be able to see some differences from BASIC.

10 A=0
20 A=A+1
30 ?="A=";
40 ?=A
50 ?=""
60 #=(A<10)*20
70 ?="BYE-BYE NOW"

Lines 10 and 20 look just like BASIC.

If you make a mistake, _ (the underscore key) allows you to back up one space, but the terminal doesn't back up or erase it from the screen. If you have trouble keeping track of what's happening this way, type @ (the at-each key) and start the line over from scratch.

Line 30 has that print command that reminds one of the BASIC shortcut key, but isn't the same. It will just print the string

A=

on the output device. And the semicolon will stop the VTL interpreter from putting a new-line after it.

Line 40 prints the value of A.

Line 50 doesn't have a trailing ; (semicolon), so printing the empty string puts a new line on the output device.

Line 60:

You may remember from the expressions post that # is the current line number. And I mentioned that setting it forces a jump. But what is that weird business about multiplying a comparison by 20?

There is no line 0 in VTL. Trying to jump to line 0 is a no-op.

The value of the comparison is 0 (false) or 1 (true). Multiply that by 20 and you get either 0 or 20. If it is 0, there is no jump. If it is 20, setting the line number to 20 causes a jump to line 20.

Say what?

This is VTL's version of BASIC's IF -- GOTO statement.

VTL re-uses the assignment syntax -- and the semantics -- not just for jumps, but for conditional jumps, too.

Just for the record, the left-to-right parsing means that we don't really need the parentheses on the condition. I put them in there to make them more obvious. Whether you do or not is up to you. Remember that more complex conditions will likely need the parentheses.

So, that's how this counted loop works. If you run it.

Can you guess how to run it?

That's right. Type

#=1

on a line by itself.

OK
#=1
A=1
A=2
A=3
A=4
A=5
A=6
A=7
A=8
A=9
A=10
BYE-BYE NOW

Huh? Why not #=10?

#=10 would also work, but VTL does you the convenience of looking for the next higher existing line number and starting there, so #=1 will run any program.

What about line 0? Try it if you haven't yet.

Nothing? Remember, VTL forbids line 0.

That means that, when you type 0 by itself on a line, it does not try to edit, save, or delete a line 0. And the authors took advantage of that.

OK
0
10 A=0
20 A=A+1
30 ?="A=";
40 ?=A
50 ?=""
60 #=(A<10)*20
70 ?="BYE-BYE NOW"

Typing a 0 by itself on a line gives you a listing, instead -- if the allocation variables define memory to remember a program in, and if there is a program in there.

Some versions of VTL use an L instead of a 0 as a listing command, just for the record.

Programming Factorials in VTL-2

This one is from the PDF manual, but I've modified it just a little bit for no particular reason.

Since VTL wants us to be frugal, we'll take the cumulative approach.

LIne 10, N will be the index number.

Line 20, F will be the cumulative factorial.

Line 30, we want to print N,

then, in line 40 print some text to show what it is,

and, in line 50, print F.

In line 60, we'll terminate the output line.

In line 70, we'll increment remember F in G.

In line 80, we'll increment N,

and in line 90, we'll use the new value of N and the old value of the factorial to obtain the factorial for the new N.

In line 100, we'll see if the new factorial is so large that it wraps around. This only works once, and we only know it works because we know it works. We might as well check to see if N is less than 9.

And, in fact, the first time through, it doesn't work. We don't have a less than or equal comparison, so we'll add a line for the equality at 95.

10 N=0
20 F=1
30 ?=N
40 ?="! = ";
50 ?=F
60 ?=""
70 G=F
80 N=N+1
90 F=F*N
95 #=(G=F)*30
100 #=(G<F)*30

and that prints out the factorials of 0 through 8.

Subroutines in VTL:

Print the value of the (system) variable ! (exclamation mark).

?=!

If you just ran the program above, it should give you 101.

VTL does a little more than just set the line number when it sets the line number. (I told a little lie back there about the semantics. Or, at least, not the whole truth. It's doing a little more than re-using the semantics of assignment.)

Before VTL sets the # line number variable, it saves the current line number plus 1 in the ! (exclamation mark) variable.

So you get one level of call and return. Exclamation mark is the current return line number.

Don't get too excited, it's just one level. If you want to further call subroutines, you have to save the current return value first, yourself, somewhere.

There are other system variables, but I will stop here.

This should be enough to help you decide whether or not you want to go looking for the PDF manual with the rest of the system variables listed and some example programs to try.

One place the PDF manual for the Altair 680 can be found (with some other retro stuff) is at http://www.altair680kit.com/ .

The manual includes more sample code.

One downside of VTL-2 on the MC-10 or Color Computer is that we don't have a way to load the programs in other than typing them. If we had a true monitor, we could save and load the program area. Or someone could extend VTL-2 on these two, to create cassette tape save and load functions, perhaps -- someone besides me. I think I need to get back to some other projects.

Friday, September 23, 2022

A Short Description of VTL-2 Expressions (Very Tiny Language pt 1)

(Because some people want to know:)

The simplest way to describe VTL (Very Tiny Language) is to say it it is very small, and it turns your 8-bit microcomputer into a programmable integer calculator. Typical assembler implementations can fit into well less than 1 kilobyte of ROM.

Comparing this to the programming language BASIC, which can also be described as a way to turn your computer into a programmable desktop calculator. a typical small BASIC interpreter capable of handling floating point (fractional/scientific notation) numbers takes about 8 to 24 kilobytes of object code on a typical 8-bit microcomputer. BASIC on the original IBM PC took about 32 kilobytes. Bywater BASIC (something close to the original BASIC on the IBM 5150 PC) on a modern 64-bit computer takes about 180 kilobytes.

The Unix utility bc is another language that turns your computer into a calculator with arbitrary precision, and it has a typical object code image of about 90 kilobytes on a 64-bit computer.

Obviously, you won't get the functionality of a full BASIC or bc (or Forth) with VTL. But you can write small programs with it, so it's interesting.

Anyway, the available manuals all tend to be PDFs, and you may not want to go to the trouble of finding one and downloading it and hoping it's relevant to the implementation you have, so I'm writing down my impressions of VTL-2, with some examples.

This post focuses on expressions in VTL-2.

One word of warning, VTL hardly gives any feedback at all. In fact, my conversions to the MC-10 and the Color Computer don't even give you a cursor. (I should try to fix that, I suppose, but not today.)

But it does at least give you the

OK

prompt when it's ready for more commands.

No error messages. (Which I do not intend to try to fix.)

If you want see whether the VTL interpreter you've just gone to the trouble of downloading, assembling, and loading into your target machine is running, try this:

?=1+1

Don't forget the = after the ? . This is not a dialect of BASIC.

It should print out 2 for you. That means the expression grammar part is working at some level.

So, let's proceed with syntax and semantics, etc. --

Assignment:

Assignment is done with the usual = symbol:

A=A+1

and such.

Variables:

All variables are pre-declared and pre-allocated. Since the variable names are limited to one letter or character, there aren't that many, and it takes less memory to have them already exist than to write code to let you declare and allocate them.

The 26 variables A through Z are integer variables you can use as you want.

Output:

Printing re-uses the assignment syntax:

?=A

prints whatever integer value is in A to your output device.

Note that it sort of looks like the short-cut available in many BASICs, but is not.

? A

does something hard to explain just yet, but does not really do what you want, even though it may look like it does.

Printing strings is a logical extension to the print syntax:

?="APPLES"

puts the string

APPLES

on your output device.

You want to know why the = is in there? Re-using the syntax allows re-using the code and keeping it small, is the best explanation I can think of. (I assume. I am neither Gary Shannon nor Frank McCoy. Ask them if you get a chance.)

More variables, including system variables:

Variables other than A through Z?

All variable names are one character.

Take a look at your nearest ASCII chart. (It should be possible to produce an EBCDIC version of VTL, but we won't talk about that here.)

If you're running a real operating system, you can use

man ascii

at the command line to get an ASCII chart. Otherwise, a quick web search can find you one.

Stock VTL doesn't give you separate variables with lower-case names.

It does give you variables with punctuation marks for names, from exclamation mark to caret, but some, like ? and the digits 0 through 9 are not directly available, and several of the others are used by the interpreter.

The current line number, for instance, is #. Of particular note,

#=1200

is how you jump to line 1200. Note, again, the re-use of the assignment syntax. Or, actually, this is a re-use of assignment itself, but needs some more explanation, later.

(If you are having trouble with VTL-2 bombing or freezing on you, try setting the allocation base and top of RAM variables to zero for now:

&=0
*=0

But if VTL-2 is stable, don't do that. I'll explain later, in the programming introduction.)

Expressions:

First, let's work through the fundamental VTL expressions.

+ is addition,
- is subtraction,
* is multiplication,
/ is division (leaving the remainder in the variable %),
= in expressions (cough) is comparison for equality,
> in expressions is comparison for greater-than-or-equal,
< in expressions is comparison for less than, and
() nests expressions, overriding the normal left-to-right parse.

Comparisons give you 0 if false, 1 if true, which I will show how to use later.

All assume 16-bit integer math.

So, let's look at some examples.

The expression we used as a test up there

?=1+1

was just adding 1 and 1 and printing the result.

Typing in

Z=26

sets the variable Z to 26. After that, you can type

?=Z

and it prints out the value of Z,

26

unless you've changed it since then.

Let's look at comparisons for a moment.

?=Z=26

might cross your eyes, but it will print out 1 for true. It's comparing the value of Z with 26, which is what we just set it to, isn't it?

?=Z<26

will print out 0 for false, and

?=Z>26

will print out 1 for true - Remember that > in VTL is not just greater than. It's the opposite of <, so it's greater than or equal. (I don't think I'd have done it quite that way, but this is not my toy.)

Correcting input:

Continuing on,

?=31416/100000

Woops. Too many zeroes. If you tried backspace, you learned that doesn't work. (I think it could be made to work, but I haven't tried.) Unmodified VTL uses an underscore (back arrow on the MC-10 and Color Computer) to cancel single characters of input:

?=31416/100000_

but it does not erase them on the screen:

?=31416/100000_
3
OK

That's a little hard to get used to.

VTL also let's us cancel whole lines with at-each:

?=31416/100000@

Try it:

?=31416/100000@
OK
?=31416/10000
3
OK

So it divides 31,416 by 10,000 and prints the result. Following that with

?=%

prints the remainder

1416
OK

And that's helpful when you don't have a floating point or other fractional math built-in.

Some limits:

Again, this is integer only, so

?=3.1416

gives a result that is hard to explain just yet.

Also, just to warn you in advance,

31416 / 10000

without the ?= at the front defines line 31416 containing the invalid expression / 10000 . Which leads us to the fact that there is no line 0, which the authors have turned to advantage.

0

on a line by itself tells the interpreter to list the current program memory -- If you have valid allocation variables for your program memory.

Valid allocation variables? That's a detour we want to avoid just now.

Back to expressions.

Operator precedence:

The parse is strictly left-to-right, unless you use parentheses. (Again, they are keeping the interpreter simple and the code small.)

So, let's look at something a little complicated. Say we are calculating the area of a circle.

A = πr²

VTL-2 does not have (the last time I looked) exponentiation, so, in VTL-2, that would be

A=p*r*r

except that we don't have floating point or any other fractional math, so we're going to have to scale it. And p is not pi, so we'll have to get π in there ourselves somehow.

Suppose we can just throw the fractional part away, and a single fractional digit plus an extra is sufficient:

A=(314*R*R)/100

Try it. Set R to an integer radius and use the expression above.

R=4
A=(314*R*R)/100
?=A

Check it in bc if you have access to bc:

scale=40
pi=a(1)*4
r=4
a=pi*r^2
a
50.2654824574366918154022941324720461471488

(Heh. bc is fun.)

Or check it in a BASIC:

bwBASIC: 10 p=3.1415927
bwBASIC: 20 r=4
bwBASIC: 30 a=p*r^2
bwBASIC: 40 print "Area:", a
bwBASIC: list
     10: p=3.1415927
     20: r=4
     30: a=p*r^2
     40: print "Area:", a
bwBASIC: run
Area:         50.2654832

Okay? (I think bc is more fun.)

Okay. We're somewhat satisfied.

What happens if we leave the parentheses out in VTL? Try it.

In this case, it works okay going from left to right. In fact, that is exactly what we want -- In this case.

Use of parenthesis:

Let's look at something a little more involved. Say we want to calculate a markup of 50% on a lot of 50 apples and 80 nashi pears. Apples are ¥125 each and nashi are ¥189. (Cheap today.)

A=50
N=80
P=(A*125+N*189)*150/100
?=P

Will this work with just the one set of parentheses? If you guess no, you're right.

397

What happened? Work it left-to-right instead of using the algebraic precedence you worked so hard to remember in middle school or elementary:

50 * 125 is 6250.
6250 + 80 is 6330.
6330 * 189 is 1196370. Woops. That's way too big for 16 bits.

(You may be wondering, so I'll tell you here. All variables in VTL are unsigned. If you handle negative numbers, you have to handle them by magnitude and somewhere remember that they are negative.)

But go ahead and plug it into VTL anyway. Unless you've got VTL running on a 68000 or some other 32-bit or 64-bit CPU (or written to calculate 32-bit integers on a 16-bit or 8-bit CPU or something), it gives you 16722 to this point, right?

(Break out bc and subtract 2^16*18 and, oh, 16722 is exactly what it should have given us at this point.)

Okay, 50% is a half, so we should be able to just scale by 2 instead of 100. A and N should still be as we want them, so let's just repeat the expression setting P:

P=(A*125+N*189)*3/2

Oh, but there's something niggling at the back of your mind. It's going to blow up at 6330*189 again. It shouldn't. What's going wrong?

When parentheses don't say otherwise, it's working strictly left to right, right?

We have to have more parentheses because we have to calculate an intermediate value and VTL does not understand that algebraic precedence business that we spent so much time learning in middle school. (It's like some of the old desktop calculators from decades ago, or like some modern cheap calculators.)

I wonder if it will really work with more parentheses.

P=((A*125)+(N*189))*3/2

Does it give you 32055?

Okay, I guess it worked. bc says that's what it should give us. But it got close to the limits of 16-bit math in there. (Try each step by hand to watch it get close to blowing up.)

Now we know something about VTL-2 expressions, and have seen some of the limits, and this short description is getting long. Let's save programming for another post.

*** The post on programming is here:
https://joels-programming-fun.blogspot.com/2022/09/short-description-vtl-2-programming-very-tiny-language-pt2.html

Monday, September 19, 2022

VTL-2 part 5, Transliterating to 6809

Tandy/Radio Shack TRS-80 Color Computer 1

16K Color Computer 1
by Wikimedia contributor Bilby,
licensed under CC BY 3.0
via Wikimedia Commons

Well, the transliteration of the MC-10 version of VTL-2 in 6801 assembler to 6809 assembler on the Tandy Color Computer actually went pretty quickly once I made time, using the relationships between the 6800/6801 assembly language and runtime and the 6809 assembly language and runtime that I describe in my post on the software differences between the three CPUs.

But then I found myself using the assembly language equivalent of poking telltales to the screen to figure out where I'd fallen asleep at the wheel.

There was only one place, really, where I had inverted the transfer from B to A in the character output routine, or was it from A to B in the keyboard input? Something like that.

And it handled basic expressions, but wandered off in the ether any time I tried to type in a program. After going back over the transliteration with a fine-toothed comb and finding nothing (and falling asleep doing it) for several days in a row, then using more telltales to the screen to pinpoint where it was dying, I decided to look back in the commented disassembly of Color Computer extended BASIC.

It was only a matter of a half an hour to finding the problem.

Color Computer BASIC uses a number of variables in the direct page.

I had dodged a few of the variables down around $C0 by starting the direct page variables at $D0. But there is a variable at $E2 used by the IRQ handler routine as a counter, and by starting there I was using $E2 as the SAVLIN variable, which is essentially the most used variable during program editing.

But $00E2 is the place where the CoCo BASIC IRQ response return was counting screen refresh interrupts (60 times a second).

So I moved everything down to start at $C4, just after the PIA mask variable, and the variable list ended right before $E2.

So I didn't have to move any variables, and I didn't have to optimize the copy routines to be sensible and use X and Y together instead of X and the SRC and DEST variables in the DP.

And I can type in the test program I've been using:

10 A=0
20 A=A+1
30 ?=A
40 ?=""
50 #=(A<11)*20
60 ?="DONE"

and list it by typing

0

and hitting Enter. And I can run it by typing

#=1

and hitting Enter.

The source code is at https://osdn.net/users/reiisi/pastebin/8705. Copy or download it from there.

If you want to assemble it to run in 32K, fix the ORG before COLD. It needs less than 1K of RAM for the code. With another 1K to the end of RAM for the BASIC stack it borrows and as a buffer between BASIC and VTL-2, set the ORG at 2K before the end of RAM.

You'll need LWTools to assemble it -- either download it or use mercurial or git to clone it, then compile it and copy the executables into your preferred place for user-local executables.

Then use the following command line or something similar to assemble the source:

lwasm --list --symbols -f decb -o VTL_6809_coco_translit.bin VTL_6809_coco_translit.asm

This will give you a .bin file that XRoar can load from XRoar's File menu. (Sometime I'll put a screenshots here. Until then, refer to the MC-10 post.)

Run XRoar with the command line with

xroar -machine coco -ram 16k &

for a 16K RAM Color Computer configuration. It will look a lot like what I show of XRoar running the MC-10 emulation in the MC-10 post.

If you load it from the .bin file, you'll need to type

EXEC &h3800

at the BASIC prompt to get VTL-2 running after loading it.

Another way to load it is to convert it to a .cas format and load it from the Tape Control dialog, as I described for the MC-10. You can convert the .bin file to a .cas file with a command like

bin2cas.pl -o VTL-2.CAS -C -l 0x3800 -e 0x3800 VTL_6809_coco_translit.bin

To get the bin2cas.pl tool, look for it under CAS Tools on the the dragon page on 6809.org:

https://www.6809.org.uk/dragon/

If you specify the load and execute addresses in the bin2cas command line as above, you'll need to type

CLOAD
EXEC

at the Color Computer BASIC prompt.

You should be able to modify this source code to run on other 6809 computers with a little thought, especially if you walk through my posts on getting the 6800 and 6801 source running on the MC-10 and on EXORsim.

You should also find some interesting challenges left for the interested reader, such as modifying it to use the Y register in the edit routines, and to work with the DP moved out of BASIC's working area. Such things as that.

I think I'm off to other interesting projects now.

(Maybe. Not sure whether I want to go back and see if I can get fig-Forth running right on the 6809 first or whether I want to try my hand at writing my own VTL source, or a VTL-like language of my own design, etc. Or go back to trying to work on novels for a while. I probably need to get XRoar set up for GNU-debugging before anything else.)

[JMR202209231810: add]

I have written up a post on VTL expressions, here: https://joels-programming-fun.blogspot.com/2022/09/short-description-vtl-2-expressions-very-tiny-language-p1.html , which will help in testing and otherwise making use of the language. I should also shortly have a post up introducing programming in VTL-2.

JMR202209231810: add end]

[JMR202210011737: add]

I now have my work on VTL-2 up in a private repository:

https://osdn.net/users/reiisi/pf/nsvtl/wiki/FrontPage

There is a downloadable version of VTL-2 for the Tandy MC-10 (6801) in the stock 4K RAM configuration in there, with source, executable as a .c10 file, and assembly listing for reference. Look for it in the directory mc10:

https://osdn.net/users/reiisi/pf/nsvtl/files/

I haven't wrapped it up for the Color Computer yet, expect that later. But you can see the latest source I'm working on for the Color Computer in the source tree:

https://osdn.net/users/reiisi/pf/nsvtl/scm/tree/master/

[JMR202210011737: add end]

Software Differences between the 6800, the 6801/6803, and the 6809

Following on from my previous post about the software differences between the 6800 and the 6801/6803, I'm going to try to set out the software differences between those and the 6809. You may want to read that post first for background.

In terms of history, the 6801 came after the 6809, not, as some seem to expect, before.

Certain of Motorola's customers looked at the 6809 and said they wouldn't know what to do with all of that, and couldn't they have a chip that was just the 6800 with some ROM and RAM built-in instead?

Motorola already had the 6802/6808 out, which combined the 6800 and 128 bytes of RAM. The 6802 also had most of the CPU clock generation circuitry built in. The 6808 (not directly related to the 68HC08 and later derivatives of the 6805) was an odd chip, basically the 6802 with the RAM disabled (presumably because the RAM failed Q/A tests).

The 6802 was intended to be paired with custom ROM versions of the 6846, which was similar to MOS Technologies' RIOT chip, but definitely not the same. The 6846 had 2K of ROM, a parallel port, and a hardware timer device. (I had a Micro Chroma 68, which had the 6808 or 6802 paired with a 6846 with a monitor program in the ROM -- Fun little board for prototyping, and I should have done more than I did with it.)

As I said in the post linked above, Motorola borrowed a few ideas from the 6809, but designed the 6801 to be strictly compatible with the 6800 -- except for the new op-codes and the condition codes for the CPX compare X instruction. And they improved the instruction timing. But they failed to put in the missing direct-page versions of the unary operators like increment, decrement, and such.

And one of Motorola's big customers said the 6801 was still too much CPU, couldn't they strip it down more and make it cheaper?

And Motorola complied, producing the 6805, which is a true 8-bit CPU with only one accumulator and only an 8 bit wide index. But it has bit manipulation instructions for the direct page, and it has direct-page versions of all instructions. For unary instructions, it has 16-bit offset indexed versions of the instructions. On the 6805, instead of index+constant-offset, you (often) wanted to think constant-base+index. (Oh. And the initial 6805s had no push or pop, just branch and jump to subroutine. You had to do software stacks if you wanted them.) This was all in fewer transistors than the original 6800.

Motorola would evolve the 6805 in HCMOS versions, with more instructions and such. Some years later, Motorola brought out the 68HC08, which added a high byte of index to the 68HC05. (And then there was the 68HCS08 with push and pop, and the 'R08 and ....)

And later still, Motorola brought out the 68HC11, which added a Y index, bit manipulation instructions, and a little more to the 68HC01. The 68HC11 was strictly upward-compatible with the 6801. And we are roughly a decade ahead of the story now. But it's relevant.

As I say, the 6809 preceded the 6801 and 6805. The latter two benefited from lessons learned in the 6809, which the 6809 was never evolved to benefit from. (Nor was the 68000, unfortunately, until the CPU32, much later.)

Remember, the 6801 is almost (99.9% or so) perfectly object-code compatible with the 6800, but with faster instruction timings on a lot of instructions.

The 6805 is not object-code compatible, but inherits most of the overall instruction set and run-time model of the 6800 series. It also has better instruction times than the 6800, more effective use of memory space, and more flexible indexing.

The 6809 is not object-code compatible. Instructions are laid out a little different, and index encoding is significantly different. It inherits most of the instruction set of the 6800, and it extends the register and run-time model of the 6800 series, but it does not have all the improved instruction timings. The instructions it is missing can be synthesized, but care has to be exercised in the process in many cases.

And it departs from the 6800 in an important, non-trivial way --

Pushes on the 6800 and 6801 are post-decrement. Pops (PULs) are pre-increment. The stack pointer S is always pointing to the next available byte on the stack. That's a bit of a departure from most stack implementations, but the 6800 makes up for it in the TSX (transfer S to X) instruction by adding 1 before the transfer. And it makes up for it in the TXS instruction (transfer X to S) by subtracting 1 before the transfer. So, after a TXS, X is pointing to the last item pushed.

But if you store S in RAM and load it into X with the 6800 or 6801, this adjustment does not happen. You have to remember to use INX (increment X) after the LDX (load X), or adjust the offset to X. In other words, on the 6800,


      PSHA
    TSX
    ORAA   0,X

but


      PSHA
    STS	   TEMP
    LDX	   TEMP
    ORAA   1,X	; adjusting the offset instead of INXing.

On the 6809, pushes are pre-decrement, and pops (PULs) are post-increment. So the stack register is always pointing to the last item pushed, without adjustment. If you play games with the stack in 6800 or 6801 code, you have to be careful with this when moving the code to the 6809. (But the games usually played with the stack on the 6800 or 6801 are pretty much beside-the-point on the 6809. There are much better ways. More below.)

Using the above example, on the 6809,


      PSHS   A
    TFR	   S,X
    ORA    ,X

and


      PSHS   A
    STS	   TEMP
    LDX    TEMP
    ORA	   ,X	; no adjusting the offset or index.

or, better yet,


      PSHS   A
    ORA	   ,S+  ; but this is getting a little ahead of the story.

That's a long preamble, but I hope it will help you know what to look for as I compare the 6809 with the other two processors.

In broad overview, the differences in the 6809 are as follows, kind of in the order they tend to stand out:

Pairing A:B as double accumulator D, as in the 6801 (but not including 16-bit shifts).
Five indexable registers (X, Y, U, S, and PC) in the 6809, two of which (U and S) can be stack registers, as opposed to the 6800/6801 single X index register and single S stack register.
Movable direct page, with associated DP (upper 8 bits) direct page register. (This was not implemented as well as we might have hoped, more below.)
Extended interrupt model. (More registers means more time required to save them all, so an interrupt service that used only one or two registers could be assigned to the FIRQ fast interrupt request, and save time by only saving what it used.)
Two extra software interrupts (which allows the one-byte SWI to be used as a software breakpoint and the two-byte SWI2 and SWI3 to be used as system calls, if you want to do system calls with SWIs).
Stack model that always points directly at the top of stack, as mentioned above.
Compactly encoded extended indexing model:
- no offset,
- small (5 bit) signed constant offset,
- medium (8 bit) signed constant offset,
- long (16 bit) signed constant offset,
- A, B or D accumulator signed offset,
- post-increment and pre-decrement indexing (for efficient data moving -- goodbye stack blasting!),
- using the PC as an index register (as noted above),
- final memory indirect with any of the above,
- plus memory indirect with extended mode (but not, unfortunately, with direct page mode),
- and, for some reason, no way to use a load effective address (LEA) instruction to directly calculate the address of a variable in the direct page (further explained below).
Unary instructions all have direct-page mode, but are slower in direct-page mode than you might hope.
Effective address calculation instructions (LEA) to put the target addresses calculated in each of the indexed modes above into one of X, Y, U, or S. (JMP instruction allows indexed modes, so PC as a target is not needed.)
Note that LEAS and LEAU do not affect the condition codes, where LEAX and LEAY affect the Z bit.
No INX/DEX or INS/DES. Use LEAX 1,X/LEAX -1,X and LEA 1,S/LEA -1,S instead (in other words, for single increment/decrement. See below).
Full compare instructions for the four proper index registers, compatible with the 6801 CPX but not the 6800 CPX.
Multiple register push and pop for both S and U stacks.
- (And, hello, again, stack blasting. Hah.
  Be very careful if you do. Even with interrupts masked, this does not do the trick:
  PULU A,B,X,Y PSHS A,B,X,Y ; look at what happens when you repeat! Probably the best you should want to do is
  PULU A,B,X STD ,Y++ STX ,Y++ unless you like wee-hours-of-the-morning debugging sessions.)
Register-to-register transfer (TFR) instruction with source and register encoded in post-byte replaces the Txx transfer instructions in the 6800.
Register-with-register exchange (EXG) instruction that uses the same post-bytes as TFR.
Has ABX as in the 6801 (but not 6800), not affecting the condition codes.
LEAX B,X would be almost equivalent, except ABX treats B as unsigned and sets no flags by the result, where LEAX B,X sets the Z flag.
No ABA (add B to A) or SBA (subtract B from A). Instead, you can do it something like this: PSHS B ADDA ,S+ ; for ABA which is slower, but essentially equivalent if your S stack pointer is pointing to valid memory.
Instead of SEC, CLC, etc., for working with the condition codes, the 6809 has
- ORCC #FLAGBITS ; for setting flags
- ANDCC #~FLAGBITS ; for clearing flags
Neither of the 16-bit shifts that the 6801 has.
For LSRD use
LSRA RORB For LSLD use
LSLB ROLA
SYNC instruction can be used for fast synchronization to external input or to wait for an interrupt (slightly different from 6800/6801 WAI).
6809 has long branches, which aid in writing position-independent code. Conveniently, LBRA (long branch always) and LBSR are allocated single-byte op-codes to help motivate their use.
And the 6809 has BRN/LBRN as in the 6801.

I think that pretty much covers it, except the devil that is in the details.

In other words, you can map most single 6800 and 6801 instructions to single 6809 instructions. Those that don't map to single 6809 instructions either map to pairs of instructions or, especially in the case of the interrupt instructions, have to be fixed to account for the larger register set being pushed by the interrupt and for some technical differences implied by hardware in whatever design you're using.

But if you do that kind of unintelligent transliteration, you'll want to do at least a second pass. For instance,


      INX
    INX
    INX
    INX

which maps mechanically to,

    LEAX   1,X
    LEAX   1,X
    LEAX   1,X
    LEAX   1,X

really wants to be replaced with the single instruction


      LEAX    4,X

And


      LDAA    0,X
    LDAB    1,X
    INX
    INX

which maps mechanically to


      LDA    0,X
    LDB    1,X
    LEAX   1,X
    LEAX   1,X 
  really wants to be replaced with the single instruction


      LDD     ,X++

Unfortunately, the above kinds of optimizations tend to be hidden by the ways programmers tend to try to optimize 6800 code.

I think I've covered stack games pretty well. If you understand how to build a stack frame on the 6801 and understand what I've mentioned about the stacks and the TFR instruction above, you should probably be able to extrapolate how to do stack frames on the 6809.

Oh, parameter handling is a separate topic, but I guess I should at least give it a mention.

On the 6800 or 6801, if you are passing parameters on the call stack (which I don't prefer, myself), your calling routine on the 6800 might do something like this:


      PSHA    ; parameter in A
    JSR     SOMEFUN
    INS
    ...

and your called routine might do something like this:


  SOMEFUN
    ...
    TSX
    LDAB    2,X   ; skip over return address to parameter
    ...
    RTS

On the 6809 it would like like this:


      PSHS    A    ; parameter in A
    LBSR    SOMEFUN
    LEAS    1,S
    ...
    
    ...
SOMEFUN
    ...
    LDAB    2,S   ; skip over return address to parameter
    ...
    RTS

And since I've mentioned that I prefer splitting the stacks, on the 6800, a separate software parameter stack would look something like


  PMSTKP      ; preferably in the direct page
    RMB     2
    ...
PUSHA       ; Looks like a lot of bother, I know.
    LDX     PMSTKP
    DEX
    STX     PMSTKP
    STAA    0,X
    RTS
* (We need this, too:)
INCPS1      ; Looks like more bother, yes.
    LDX    PMSTKP
    INX
    STX    PMSTKP
    RTS
    ...

    ...
*** Woops! not this:    JMP     PUSHA   ; parameter in A
    JSR    PUSHA   ; parameter in A (this)
    JSR    SOMEFUN
*** Facepalm! not this:    INS
    JSR    INCPS1  ; increment parameter stack (this)
    ...
    
    ...
SOMEFUN
    ...
    LDX     PMSTKP
    LDAB    0,X   ; no return address to worry about
    ...
    RTS

On the 6809 the separate software stack pointer is in U:


      PSHU    A    ; parameter in A
    LBSR    SOMEFUN
    LEAU    1,S
    ...
    
    ...
SOMEFUN
    ...
    LDAB    ,U   ; no return address to worry about
    ...
    RTS

Which I think is very clean.

I need to talk a little about the direct page register.

The DP register allows a potential trap. If interrupt handler routines use variables in the direct page, they should load and set their own DP, or they should use the full extended address.

For example, if you have a timer variable at $00E2 that is incremented in the IRQ service routine, you might do this:


      SETDP   0
    ...
IRQ 
    ...
    INC   $00E2
    ...
    RTI

and the assembler would assemble that as a direct page reference.

If the user code that gets interrupted has DP set to $02, the address that will get incremented at interrupt time will be $02E2, not $00E2, which is not what you want at all.

So what you should do is


      ...
  SETDP   0
IRQ 
    LDA   #0
    TFR   A,DP
    ...
    INC   $00E2  ; now the address is right
    ...
    RTI   ; restores the interrupted DP

Or you might do


      ...
IRQ 
    LDA   #0
    TFR   A,DP
    ...
    INC   >$00E2  ; force extended mode
    ...
    RTI

(I think I'm not getting that backwards, > to force extended and < to force direct page.) If your assembler doesn't allow both of the above, it is not a decent 6809 assembler. Get another one.

~~Here's how to~~

I think I need to show how to calculate the effective address of a direct-page variable:


  DPBASE	EQU n*$100
    ORG   DPBASE
    ...
DPVAR RMB 1
    ...
    ORG   CODE
    SETDP   n
    LDA   #n
    TFR   A,DP
    ...
    LDB   #DPVAR-DPBASE
    TFR   DP,A
    TFR   D,X   ; X now contains the address of DPVAR
    ...

And I need to mention the PC relative addressing mode, even though you won't be considering it when moving 6800 or 6801 code to the 6809. This is just so you can get a real idea of why the 6809 is so interesting when it doesn't automatically run 6800 code faster.

Let's look at a simple 7 constant table buried in code that needs to be position independent:


  *
MFCONST:
    FCB   CONST0
    FCB   CONST1
    FCB   CONST2
    FCB   CONST3
    FCB   CONST4
    FCB   CONST5
    FCB   CONST6
*
MOREFUN
    ... 
    CMPB  #7
    BHS   MFERROR
*** ERK! Not this:    LDX   MFCONST,PCR  ; the assembler should calculate the offset for you.
    LEAX  MFCONST,PCR  ; the assembler should calculate the offset for you. (this)
    LDA   B,X
    ...

Without the PCR addressing, you'd need a loader to patch the address of MFCONST used in MOREFUN to be able to run MOREFUN at arbitrary addresses. With the PCR addressing, MOREFUN and the constant table can be in ROM.

[JMR202210011737: add]

As an example of source code that is being kept very parallel, I now have my work on VTL-2 up in a private repository:

https://osdn.net/users/reiisi/pf/nsvtl/wiki/FrontPage

You can peruse the source tree and compare files:

https://osdn.net/users/reiisi/pf/nsvtl/scm/tree/master/

[JMR202210011737: add end]

Saturday, September 3, 2022

Programming Tandy/Radio Shack's MC-10 in Assembly Language

What you were probably looking for is here:

https://joels-programming-fun.blogspot.com/2022/08/trs-mc-10-assembly-lang-pt1-vtl-2.html

This is where that rant originally resided, but the URL and the content didn't quite match, so I moved it.

I will probably add links for more assembly language information for the MC-10 here later.

Sunday, August 28, 2022

Tandy/Radio Shack MC-10 Assembly Language, pt 1 -- Getting VTL-2 (Very Tiny Language) Running

Tandy/Radio Shack MC-10 Microcomputer
photo by Simon South, from Wikimedia,
licensed under GNU Free Documentation License

(This is part 4 of the VTL series.)

The MC-10 is a very stripped-down computer based on the 6801 microprocessor and the 6847 video display generator that Tandy/Radio Shack produced in 1983, a couple of years too late and priced too high to compete in its target market, against the Commodore VIC-20 and the Timex/Sinclair ZX81.

Yes, if Radio Shack management had recognized the market , they could have released essentially the same computer in 1981, maybe with a non-Microsoft BASIC. I think it would have been very competitive in 1981.

Part of the reason I find it interesting is that it is similar to Motorola's Micro Chroma 68 prototyping kit, which was my first computer back in 1981.

Anyway, I have been working over the last several weeks, on adding functionality to my assembler for the 6800/6801, asm68c, so that it can produce the .c10 format file that the MC-10 can read. That is in itself worth a post, but not now. My purpose in doing so was to make it possible to install VTL-2 (Very Tiny Language) and fig-Forth on the MC-10.

In order to get either running on the MC-10, I need to arrange to get input from the keyboard and output to the display. In fact, pretty much anything you want to do beyond graphics is going to pretty much have you using the keyboard and the display.

Output to the raw display in assembler is actually not all that hard, and input from the raw keyboard is not particularly impossible, but especially the keyboard would require writing and debugging a lot of code. If the BASIC ROM provides routines I can use I might as well use them at this level of work.

Writing a simple set of BIOS class routines can be a project for a hypothetical 'nother day. Others have done something along this line, providing a more capable BASIC interpreter that doesn't get in the way of high resolution (for the 6847 controller) graphics and such.

I remembered that someone had posted information on using the BASIC ROM routines in the MC-10 group Facebook group, so I went hunting and found a disassembly file of the BASIC ROM that Simon Jonassen had posted, mc10.txt. And then Greg Dionne told me about a full assembly listing that is also available, MC10_ROM.txt.

In these files, I found a list of useful hooks into the ROM just under the interrupt vectors, starting at address $FFDC. The first two hooks are


    FFDC | F8 83    POLCAT  fdb  KEYIN   ; read keyboard
FFDE | F9 C6    CHROUT  fdb  PUTCHR  ; console out

But that's not enough information to be confident I can use them.

You can go to the labels KEYIN and PUTCHR in the listing and find more complete descriptions. For PUTCHR, you find this (plus a bit more):

* Send character in ACCA to the current output device.

Likewise, for the KEYIN routine, you find this:

* Poll the keyboard for a key-down transition.
* Return ASCII/control code in ACCA or 0 if no key-down transitions.

It looks straightforward, but I have found there are often a lot of hidden assumptions behind such routines. It's usually wisest to test them with simple code before you go trying to wrap whole parsers and interpreters around them.

And I was, for some reason, a little suspicious of the POLCAT routine -- too suspicious, maybe.

And that's the subject of this post.

My first attempt was pretty simple. Scan the keyboard, output the result, wait a while so I have time to see the results, repeat. Except that I put in code that adapted it to my source for VTL-2, which wants to use the B accumulator to get the characters in and out instead of the A accumulator:


    *	INPUT ONE CHAR INTO B ACCUMULATOR
INCH	PSHA
	PSHX
	LDX	INCHV
	JSR	0,X
	PULX
	TAB
	PULA
	RTS
*
*	OUTPUT ONE CHAR 
OUTCH	PSHA
	PSHX
	LDX	OUTCHV
	TBA
	JSR	0,X
	PULX
	PULA
	RTS

Then I ran out of time. (I'm working on this in the evenings and on the weekends).

And when I came back I had forgotten that the B accumulator shims were in there.

Wasted a whole evening "discovering" that the BASIC ROM was actually using the B accumulator, in contradiction to all claims and appearances! And started planning this post to explain how I discovered this pseudo-fact. (Bleaugh!)

I suppose I could retrace my steps in those "discoveries". Outside of the fact that I was ignoring the shims that were right in front of my nose, I was actually using useful debugging techniques. But since the whole process was based on false assumptions, that would be confusing. And somebody might end up thinking those pseudo-facts were facts. So I won't.

Instead, in this post, I'll walk through the tests as I should have done them. Or at least a few of the steps.

(And, again, I ran out of time. Too many distractions. I hope I can remember what I was doing next time I pick this up.)

Okay, so I tried this, somewhere along the line:


    	OPT	6801
	ORG	$4C00

* MC-10 BASIC ROM vectors
POLCATV	EQU	$FFDC	; Scan keyboard	
CHROUTV	EQU	$FFDE	; Write char to screen


TEST	LDX	POLCATV
	JSR	0,X	; scan
	TSTA		; 0 means no key pressed
	BEQ	TEST	; That's going to play Hobb with CTL-@.
	LDX	CHROUTV
	JSR	0,X	; print to screen and advance
*	CLRB
*	CLRA
*LOOOP	SUBD	#1	; wait a sizeable fraction of a second
*	BNE	LOOOP
	BRA	TEST	; get more
*
	ORG TEST

You'll notice that I also tried a wait loop, which is now commented out. Without the wait loop, it sits there and waits for me to hit a key and then outputs it at the current location on the screen.

With a little more code and testing, I was able to convince myself that it does what the programmer who wrote the comments considered polling (although it isn't what I consider polling) and I figured out how to work the interface:


    * MC-10 BASIC ROM vectors
INCHV	EQU	$FFDC	; Scan keyboard	
OUTCHV	EQU	$FFDE	; Write char to screen
*
*	RECEIVER POLLING
POLCAT	PSHA
	PSHX
	LDX	INCHV	; at any rate, don't wait.
	JSR	0,X	; 
	TAB		; MC-10 ROM says NUL is not input.
	SEC
	BNE	POLCATR	; Don't wait.
	CLC
POLCATR	PULX
	PULA
	RTS
*POLCAT	LDAB	ACIACS
*	ASRB
*	RTS
*
*	INPUT ONE CHAR INTO B ACCUMULATOR
INCH	BSR	POLCAT
	INC	$400F	; DBG
	BCC	INCH	; Wait here.
	INC	$4010	; DBG
	STAB	$4011	; DBG
	RTS
*
*	OUTPUT ONE CHAR 
OUTCH	PSHA
	PSHX
	LDX	OUTCHV
	TBA
	JSR	0,X
	PULX
	PULA
	RTS

But when I tried to load VTL-2 with this (CLOADM), BASIC gave me I/O errors. And I noticed that the output object was relatively huge.

And I remembered that the ORG and the RMBs that set up the direct page variables will cause the .c10 file to include object where there is no memory on the MC-10 -- in the range from $0100 to $4000. Trying to CLOADM binary code where there is no memory will cause I/O errors from BASIC.

So I decided, as a quick fix, to change the direct page declarations to EQUs instead of RMBs:


    * 	ORG	$C0	; Move this according to your environment's needs.
DPBASE	EQU	$C0	; Change this to move the registers.
* PARSET	RMB	2	; Instead of SAVE0 in TERM/NXTRM
PARSET	EQU	DPBASE+2
* CVTSUM	RMB	2	; Instead of SAVE1 in CBLOOP
CVTSUM	EQU	PARSET+2
* MLDVCT	EQU	CVTSUM	; Instead of SAVE1 in mul/div (1 byte only)
MLDVCT	EQU	CVTSUM
* DIVQUO	RMB	2	; Instead of SAVE2 in DIV
DIVQUO	EQU	MLDVCT+2
* MPLIER	EQU	DIVQUO	; Instead of SAVE2 in MULTIP
MPLIER	EQU	DIVQUO
* EVALPT	RMB	2	; Instead of SAVE3
EVALPT	EQU	MPLIER+2
* CNVPTR	RMB	2	; Instead of SAVE4
CNVPTR	EQU	EVALPT+2
* VARADR	RMB	2	; Instead of SAVE6
VARADR	EQU	CNVPTR+2
* OPRLIN	RMB	2	; Instead of SAVE7
OPRLIN	EQU	VARADR+2
* EDTLIN	RMB	2	; Instead of SAVE8
EDTLIN	EQU	OPRLIN+2
* INSPTR	RMB	2	; Instead of SAVE10 (maybe? Will some VTL programs want it back?)
INSPTR	EQU	EDTLIN+2
* SAVLIN	RMB	2	; Instead of SAVE11
SAVLIN	EQU	INSPTR+2
* SRC	RMB	2	; For copy routine
SRC	EQU	SAVLIN+2
* DST	RMB	2	; ditto
DST	EQU	SRC+2
STKMRK	EQU	DST+2	; to restore the stack on each pass.
DPALLOC	EQU	STKMRK+2	; total storage declared in the direct page

That's a little tiresome work, adding the previous label plus two in place of all the RMBs, but it does the job.

(I think I'm going to regret doing it this way, later.)

That loaded, but then when I tried to EXEC it, it just went away and didn't come back.

After some thought, I decided to use the BASIC stack instead of making a private stack for VTL-2. I'll need to check that it really is enough stack, but I thought that was the first thing to try, and that's what the second-to-last line there defining STKMRK is for. (Okay, that's two steps in one I'm showing.)

The interpreter wants to reset the stack each time through, so it needs something to reset the stack to, and that is why I'll save the BASIC stack position to STKMRK on entry from the COLD label.

And that worked well enough to give me a cursor. But what I typed at the keyboard did not show on the screen.

So I added some debugging assembly language equivalent of POKEs to the screen. With the right debugging output to screen

(see the source of a later step at https://osdn.net/users/reiisi/pastebin/8573, the lines with DBG in the comments),

I was able to see that the commands were actually going in, and that they were being recognized, because I was able to get good output from typing

?=*

even though the command itself did not show on the screen. And that quickly led to recognizing that the MC-10 does not echo what you type until after BASIC parses it. So I needed to decide where to add code to echo the input. The EXORsim code simply assumes that INCH will echo for me, so I decided to follow that approach and just add an output JSR in the INCH routine. Should work.

Let's publish this rant-in-progress to my blog while I see if it works.

And, it did.

Okay, now the echo works. And I added what I thought would store the stack pointer in the Z variable so I could examine it, but I did it like this:


    START
*	LDS	#STACK	; re-initialize at beginning of each evaluate
	LDS	STKMRK	; from mark instead of constant
	STS	VARS+25 ; DBG so we can get a look at the stack pointer BASIC gives us.

That should have been VARS+25*2. So, instead of in Z, it shows up spread across L and M, which is an eye-crosser to read. That's easy to fix. And I should remove the other DBG lines before I forget.

Now I can see where the stack is when BASIC passes control to VTL-2.

Why should I worry?

The last byte assembled by the above source is at $4F67. The end of memory for a 4K MC-10 is $4FFF. I think that should be enough space between the code and what BASIC is using at the end of RAM, but putting that stack pointer somewhere I can see it helps me figure if it really is enough.

But checking the memory bounds variables shows that my memory probe code is not working right. & is set right, but * ends up with the probe having not run at all. Still, if I overwrite the * variable with 19456 ($4C00), I can enter a simple program, list it, and run it.

Hmm. It looks like I have the branch upside down in the probe somehow. How did I have that working before? Did I?

[Sort of. But, yes, the test is upside down. I am embarrassed. More below when I get it fixed. It's lousy, still having to work a day job instead of devoting full time to playing games like this.]

Somewhat of success, but I think that's all I have time for tonight. (What day was this?)

Picking this back up on the 3rd of September, I'll give you the summary version of what happened with the probe test that was upside down. Yes, this is embarrassing.

You know how I got all enthusiastic about the improved CPX in my explanation of the software differences between the 6800 and the 6801? Well, when I was adapting Joe H. Allen's simulator, I forgot to make sure the carry flag got set in the CPX emulation routine when running as a 6801. Probably also failed to set the other flags right. So, in my initial successful code for EXORsim running a 6801 core, I had the test inverted.

So I had to go back and fix that in EXORsim6801, and test it a little, and move things around a bit in my blogs and OSDN stuff, and that's where I've been for three weeks or however long it took.

I should describe how to get this stuff running on the Xroar MC-10 emulator at this point, I think. Get the source from the pastebuffer in my OSDN pages:

https://osdn.net/users/reiisi/pastebin/8607 (But see notes for 20220904 and 20220911 below.)

Assemble with the switch to output the .c10 file. The command line

asm68c -l2 -c10 VTL_6801_mc10.asm

should give you the listing to the screen so you can check for errors. Or you can redirect the listing to a file

asm68c -l2 -c10 VTL_6801_mc10.asm > VTL_6801_mc10.list

and pull the listing file up in a text editor.

Run Xroar with the command line

xroar -machine mc10

You may need to add some to the configuration file to get that to work, I don't remember. (I put an ampersand on the end to tell the bash shell to run it concurrently with the terminal session, so I don't have to open another terminal.)

Selecting the tool menu will allow you to select keyboard translation, which I need (with my Japanese keyboard), and, more importantly, open up the tape control:

Click the insert button in the tape control dialogue and select the file VTL_6801_mc10.c10 (assuming you've given the assembler file the same name as I show above).

Now click the emulator window to make it active again, and type in the CLOADM command.

Go back to the tape control dialogue and hit the play button, and the MC-10 emulator should load the virtual tape for something like twenty virtual seconds (virtual seconds! Goes by very quickly, less than a real second.) and then tell you it got it. Below the file name, type in the EXEC command with no start address.

I'm not sure whether the CLOADM command in the stock MC-10 BASIC allows specifying a load address. Nor am I sure whether the EXEC command allows specifying an execute address. But the lowest ORG directive in the assembler file sets the load address for the C10 file. And the ORG directive at the end of the assembler file sets the starting address in the S-record output, with the C10 output setting the same starting address. So you don't need to tell BASIC where to load or where to execute.

Anyway, entering the EXEC command should give you an OK prompt ~~and a cursor~~ without a cursor, and you may think nothing happened until you try typing something.

[202209111413 add:]

No cursor in the source I have up. Adding a cursor is pretty simple if you look through the listing of BASIC A flashing cursor takes a little more thought and effort. I'll leave that as an exercise for the reader, since I want to move ahead to the 6809 transliteration.

[202209111413 add end.]

VTL doesn't give many error messages, it mostly just doesn't do anything it doesn't understand. So if you type "ZZ" and hit enter, rather than a cryptic BASIC error message, you just get another OK and more cursor:

Test it by typing in a math expression, like

?= 1+1*2

The ?= is like BASIC's PRINT comman. But you'll quickly notice that VTL's calculator parsing is not the algebraic infix you learned in school and get from BASIC. When you need to alter the left-to-right parsing, use parenthesis:

?= 1+(1*2)

The ampersand variable & tells you where in memory your program will get stored. It's kind of the bottom of currently available memory. (Kind of.)

The asterisk variable * tells you what the highest available address for VTL programs is. (The way I've set it up, it's the byte before the VTL interpreter itself starts.)

Both the & and the * are supposed to be set by hand in the original VTL-2, but I had to move so much around that it just made more sense to have the interpreter probe and set them for you.

I did check whether it should run in a 4K MC-10.

xroar -machine mc10 -ram 4K &

With that, VTL-2 returns these values for the beginning and end of the programming area, and I ended up putting the BASIC stack pointer marker in the semicolon variable:

?=&
17416
?=*
19455
?=;
20375

That's

$4408 begining of VTL programming area
$4BFF end of VTL programming area
$4F97 BASIC stack mark

The last address assembled in the code as it stands now is $4F67. So you have $4F98-$4F68 or $30 (decimal 48) bytes of stack space before the stack overwrites the return from OUTCH, and VTL crashes. That should be enough, especially for programs that fit in the very small programming area.

It's plenty to run a short test program:

10 A=0
20 A=A+1
30 ?=A
40 ?=""
50 #=(A<20)*20
60 ?="DONE"

If you want to give VTL-2 more room to work, you can set the ORG before COLD to the highest address your memory configuration allows, about one or two kilobytes less than your end of memory. And that should be the only thing you have to change, if I did my job right in the source.

Uh, yeah. The object image for VTL is less than a kilobyte. Just so you know. Just so you're prepared for the limits.

This is not thoroughly tested, and has some known problems -- like random number generation is not quite functional.

[202209041331: add (more embarrassment!)]

I made a bad optimization to the random function in the above, and in the EXORciser versions. Don't know how I missed this one, either. Blame it on my age?

At the label AR2 in the source code, I falsely corrected, without thinking, the addition of the low byte to the high, and vice-versa. My false correction looks like this:


    AR2	STD	0,X	; STORE NEW VALUE
	ADDD	QUITE	; RANDOMIZER
	STD	QUITE
	RTS

Looks reasonable, right? As long as you ignore the comment about randomizer?

Well, here's what it looked like in the original 6800 code:


    AR2	STAA	0,X	; STORE NEW VALUE
	STAB	1,X
	ADDB	QUITE	; RANDOMIZER
	ADCA	QUITE+1
	STAA	QUITE
	STAB	QUITE+1
	RTS

See what was going on?

Well, I probably should fix the pastebuffer, but OSDN won't let me today. Maybe I have too many. But I don't want to publish this code in a repository without some explicit permission from the authors. Anyway, the fix I recommend is below, with a couple of lines of code to detect lack of initialization and semi-auto initialize it:


    AR2	STD	0,X	; STORE NEW VALUE
	BNE	AR2RND	; Initialize/don't get stuck on zero.
	INCB		; Keep it known cheap.
*	ADDD	QUITE	; RANDOMIZER	; NO! Don't do this.
AR2RND	ADDB	QUITE	; RANDOMIZER	; Adding the low byte to the high byte
	ADCA	QUITE+1	;		; is cheap but intentional.
	STD	QUITE
	RTS

Just search the code for "RANDOMIZER", cut the bad code out, and paste in the fix. Or, better yet, edit it with your own, improved random function.

Initialization -- yeah, this is another place where the original code left initialization out to keep the code tiny.

[202209041331: add end]

[2022091112291333: add]

While transliterating for the 6809, I discovered that I missed some more opportunities to optimize for the 6801.

If you look for the label

SUBTR SUBD 0,X

you'll notice that, in the 6800 source, it was a very short routine to subtract whatever X pointed to from D. Every call to SUBTR can be replaced with the actual body of the subroutine in the 6801 source, with no increase of code size. You can search for BSR SUBTR and JSR SUBTR and replace each with SUBD 0,X. (I don't think there will be any JSR SUBTR, but if there are, you can replace that, two, for a byte of code saved.)

Really wish I had more time after work to focus on this stuff.

[2022091112291333: add end]

[JMR202209231810: add]

JMR202209231810: add end]

[JMR202210011737: add]

I now have my work on VTL-2 up in a private repository:

https://osdn.net/users/reiisi/pf/nsvtl/wiki/FrontPage

https://osdn.net/users/reiisi/pf/nsvtl/files/

[JMR202210011737: add end]

Thursday, August 25, 2022

Software Differences between the 6800 and the 6801/6803

For those not familiar with the 6801, the issues when trying to run 6800 code on the 6801 or 6803 would fall under the following categories:

(1) Differences in the condition code calculations for CPX;

(2) Differences in function of any op-codes undefined in the 6800 which may be used;

(3) Timing differences for software timing loops;

(4) Differences in the memory map, which depend on the mode the MPU is operating in.

(Remember that the 6803 is a 6801 with the ROM missing or disabled.)

Revisiting this a week later, I realize that I've left out something important by not mentioning the rest of the 6801's extensions to the 6800:

A and B accumulators are concatenated to form the D double accumulator for certain new instructions. Condition codes fully reflect the results in all 16 bits, which is more important than just getting things done in fewer instructions.
The new instructions that work with the D accumulator are
- LDD and STD, 16-bit load and store of the double accumulator;
- ADDD and SUBD, 16-bit add to and subtract from the double accumulator; and
- LSRD and ASLD/LSLD, 16-bit logical shifts right and left of the double accumulator.
The new MUL instruction also works with the double accumulator, sort-of. It's an 8-bit by 8-bit unsigned multiply of the contents of A and B, leaving the results in D.
You can synthesize a 16-bit multiply using four of these and appropriate ADD instructions, which uses a few more bytes than shifting and adding, but is about seven times as fast.
Improvements in the condition code calculations for CPX, the compare with X instruction which I already mentioned above.
The PSHX and PULX instructions push X to and pop it from the return address stack.
The ABX instruction adds B to X, which is very helpful in accessing record fields and such.
The JSR instruction has a new direct page addressing mode, which can be used to speed calling a few carefully selected short, heavily used routines.
There is now an official branch never instruction, BRN, effectively a two-byte NOP. This can be useful in debugging and in simplifying code generation in some high-level languages.
There were changes in Motorola's assembler and the manual itself which I kind of shrugged my shoulders at, but may be of interest:
- LSL is added as an alias of ASL. (Note that LSLD is an alias of ASLD.)
- BHS and BLO are added as aliases of BCC and BCS, respectively.
  (Branch High or Same == Branch Carry Clear)
  (Branch Low == Branch Carry Set)

Details of Code Porting Issues:

(I'm summarizing information from the MC6801RM (AD2) MC6801 8-bit Single-chip Microcomputer Reference Manual -- )

(1) On the 6800, the CPX instruction is not recommended for anything other than comparing X for equality -- in other words, you would generally want to follow CPX on the 6800 with either BEQ or BNE, but not with other branches.

On the 6801, CPX affects all the condition codes correctly for the 16-bit comparison, and it can be used in the full range of signed and unsigned comparisons. If the 6800 code confines itself to the recommended use of CPX, there should be no problem.

But much of the existing 6800 code does do tricky things with CPX. In particular, it is often used as a NO-OP in the assumption that CPX will not affect the carry flag. In particular, it is often used to hide another op-code in the operand field. For example, on the 6800, with

SKIPR CPX #$8601

executing through SKIPR would only see a probably meaningless comparison of X with hex 8601. But branching to SKIPR+1 would see, instead, LDAA #1. This is a fragile optimization, of course, and the effort to use it usually costs more than whatever it was supposed to gain.

What to do with this? It only costs 1 more byte and actually uses 1 clock cycle less to spell it out properly:


    SKIPR  BRA NOLOAD
LOAD1  LDAA #1
NOLOAD

If that 1 byte is fatal, you can probably find something nearby to clean up slightly and save a byte, perhaps using one of the 6801 extensions mentioned above. You really don't want such fragile optimizations, anyway.

(2) is similar to (1), in that the incompatibilities are the result of tricks engineers really should avoid. (You don't want to find yourself faced with an undefined op-code changing its behavior in future mask sets, not to mention possible architecture extensions like the 68HC11, among other things.) Again, you can usually find places in nearby code to clean up, or to take advantage of the new 6801 instructions in a way that allows avoiding undefined op-code abuse, even if using the undefined op-code actually did save a byte.

(3) is the downside of the improved timings for the 6801 instructions. There's nothing to do here but recalculate the timing constants, which should be enough. (But note the exception for CPX timings.)

Maybe I should try to give a summary of the improved timings:

Branches take 1 cycle less (3 vs. 4).
Branch to subroutine, BSR, takes 2 cycles less (6 vs. 8).
Indexed mode binary operand byte instructions (ADDA/B n,X; etc.) take 1 cycle less (4 vs. 5).
Indexed mode unary byte instructions (ASLA/B n,X; etc.) take 1 cycle less (6 vs. 7).
Indexed mode 16-bit load and store instructions (LDS n,X; etc.) take 1 cycle less (5 vs. 6).
CPX takes one cycle more except in indexed mode, I guess to get the flags right doing it 8 bits at a time:
- immediate is 4 for 6801 vs. 3 for 6800,
- direct page is 5 vs. 4,
- indexed is 6 for both processors,
- extended/absolute address is 6 vs. 5.
Inherent mode 16-bit instructions (DES, TSX, etc.) take 1 cycle less (3 vs. 4).
JMP in indexed mode takes 1 less (3 vs. 4).
JSR gets some nice improvements:
- 5 cycles vs. not available on the 6800 in direct page mode, as mentioned above,
- 6 vs. 8 in indexed mode,
- 6 vs. 9 in extended/absolute address mode.

(4) The differences in the memory map show up at the bottom, top, and middle of memory:

At the top of memory, the 6801/6803 define new interrupt vectors for the built-in peripheral devices. 6800 code probably puts something at those addresses that will need to be moved, and/or the 6801/3 code will need to include instructions that mask out the associated device interrupts.

In the middle of memory, you may find the internal ROM on the 6801, but not on the 6803.

The 6801 in several of its operating modes will have ROM somewhere in the middle of memory -- usually from $F800 to $FFFF, but not always. You can switch the ROM out of memory by the operating mode.

In particular, the 6803, which has no functional ROM, should only be operated in either mode 2 or mode 3, which are the (mostly) external bus modes. Mode 2 keeps the internal RAM (addresses $0080 to $00FF) in the memory map, and Mode 3 switches it out. These two modes can be used to avoid conflicts when existing 6800 code wants to use the addresses where the ROM would be.

At the bottom of memory, you have the devices themselves and the built-in RAM. The built-in RAM is less likely to get in the way, but you can use operation mode 3 to switch it out of the memory map.

The built-in peripheral interface registers cannot all be switched out. There will always be devices at address $0000 to $0003 and $0008 to $001F.

Since the direct page is special, 6800 code should really not be written to conflict with the addresses at the bottom of memory in a way that doesn't allow just moving some of the direct page variables up a bit, but quite a lot of code is written in a way that does conflict.

One example is Very Tiny Language. (VTL-2 is what you will probably find if you go hunting for it.), VTL-2 basically allocates the language's variables A, B, ... Z to addresses defined by their ASCII values, which are in the direct page. I have found a way to mostly work around this for VTL (see my recent posts on adopting VTL-2 to hardware other than the MITS/ALTAIR 680), but I'm not perfectly confident it's a perfect work-around.

Anyway, that probably covers the differences.

If you want another example of how these changes actually affect code, I have adapted the fig-Forth interpreter model for the 6800 to the 6801, optimizing it with the new instructions. (I may have missed some possible optimizations.) That source may be interesting to examine. You can find it either in my adaptation of Joe H. Allen's EXORsim:

https://osdn.net/users/reiisi/pf/exorsim6801/scm/tree/master/

or in the source tree of my 6800/6801 assembler:

https://sourceforge.net/p/asm68c/code/ci/master/tree/fig-forth/