A benchmark of three different floating point packages for the 6809

jcmeyrignac 8 months ago

Here are a few other 6809 FP libraries: https://gitlab.com/dfffffff/fpo9 https://github.com/spotlessmind1975/ugbasic/blob/main/ugbc/s...

If I remember correctly, the ROMs of the MO6/TO7 also contain a fast implementation, but I was not able to find a disassembly.

spc476 8 months ago

The first link is to the MC6839, which is already covered in the blog post. The second link is Lennart Benschop's floating point routines, which are what the post is about.
> ROM of MO6/TO7
Do you mean the MC6839 here? Or something else?
- jcmeyrignac 8 months ago
  
  The MO5/MO6/TO7 were based on the 6809: https://en.wikipedia.org/wiki/Thomson_MO6 https://en.wikipedia.org/wiki/Thomson_TO7 Since I met the engineers, I believed that they implemented their own Basic, but I might be wrong.

mannyv 8 months ago

If you can find a copy of Apple's SANE for the 6502, test it. It was faster than hardware FPUs. That was according to a friend who was an FP geek back when.

brucehoult 8 months ago

That is a completely ridiculous suggestion.
The hardware FPU on the 1954 IBM 704 took 60 µs for addition and 300 µs for multiplication.
That is likely to be the slowest hardware FPU ever. The Intel 8087, for example, was about six times faster. I expect the optional FPUs on things like early PDP-11s to also be faster.
AppleSoft BASIC has quick&dirty FP operations and uses about 1000 µs for an add and 3000 µs for a multiply.
SANE is IEEE-compliant and has to be careful to get 0.5 ULP, round correctly, set flags etc and therefore I would expect it to be slower than AppleSoft. But I don't know how much slower. I never used it on a 6502.
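To put the quoted figures side by side, here is a minimal sketch that only divides the timings given above (60/300 µs for the IBM 704, roughly 1000/3000 µs for AppleSoft). The dictionary and function names are made up for illustration.

```python
# Timings quoted in the comment above, in microseconds.
ibm704 = {"add": 60, "multiply": 300}
applesoft = {"add": 1000, "multiply": 3000}


def slowdown(software, hardware):
    """Return how many times slower the software routine is, per operation."""
    return {op: software[op] / hardware[op] for op in hardware}


ratios = slowdown(applesoft, ibm704)
for op, ratio in ratios.items():
    print(f"{op}: AppleSoft is about {ratio:.1f}x slower than the IBM 704")
```

So for SANE to beat even the 704, it would have to be at least ten times faster than AppleSoft while doing more work per operation.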
- adrian_b 8 months ago
  
  Moreover, the Intel 8080 and its derivatives, like the Z80, were frequently much faster for floating-point operations than 6502 derivatives, depending on their relative clock frequencies and on the speed of the memory.
  The main reason was that 8080/Z80 could do much faster FP multiplications (which were extremely slow on 8-bit microprocessors, typically requiring many milliseconds or even tens of milliseconds per FP64 multiplication), because those could be implemented by using 16-bit additions and 16-bit shifts and accumulating partial results in registers (i.e. by using the 16-bit index operations and registers, not the 8-bit accumulator operations). 6502 had only 8-bit operations and too few registers to keep partial results in them, so the partial results had to be stored in memory.
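For readers who have not seen it, the kind of multiply described here is the classic shift-and-add loop. The sketch below is an illustration only, not the reverse-engineered routines mentioned in this comment: Python integers stand in for 16-bit register pairs, and the function name and `bits` parameter are assumptions.

```python
def shift_add_multiply(a: int, b: int, bits: int = 16) -> int:
    """Multiply two unsigned `bits`-wide integers using only shifts and adds."""
    mask = (1 << (2 * bits)) - 1      # the product needs twice the operand width
    product = 0
    multiplicand = a
    for _ in range(bits):
        if b & 1:                     # low bit of the multiplier set: add a partial product
            product = (product + multiplicand) & mask
        multiplicand = (multiplicand << 1) & mask  # shift the multiplicand left one place
        b >>= 1                       # move on to the next multiplier bit
    return product


print(hex(shift_add_multiply(0xFFFF, 0xFFFF)))
```

On real hardware the cost comes from how many of those shifts and adds can stay in registers, which is the point being made about the 8080/Z80 versus the 6502.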
  Nevertheless, the great advantage of 8080/Z80 in computational speed was not always realized, because a lot of the 8080/Z80 programs used naive multiplication procedures that were an order of magnitude or more slower than the optimized multiplication.
  Even in Microsoft BASIC and in the run-time library of the Microsoft FORTRAN compiler for CP/M, the floating-point operations did not have an optimum implementation, so after reverse-engineering them and replacing the core algorithms I could speed up my programs in MS CP/M FORTRAN and BASIC a lot.
  The AMD floating-point peripherals for 8080/Z80, Am9511 and Am9512, which were second sourced by Intel as 8231 and 8232, were faster than achievable in software. They used a microprogrammed implementation, IIRC with a 16-bit ALU.
  Am9511/Intel 8231 was not much faster than optimized software, but Am9512/Intel 8232 was much faster. However the latter was launched only in 1980, not much before Intel 8087 and after the first discussions for the standardization of the Intel 8087 FP formats, so Am9512/Intel 8232 was actually the first hardware FPU to implement them (i.e. the future IEEE 754-1985 standard). Since 8087 was available only for 8086/8088, 9512/8232 remained the solution available for 8080/Z80, but I doubt that many have used it, because whoever had money to pay for an expensive FPU would have been likely to also pay for a better 16-bit CPU, instead of staying with Z80.
  
  brucehoult 8 months ago
  
  I'd like to see that z80 library.
  The 8088 has more registers than the z80, but BASICA FP (at 4.77 MHz) is only very slightly faster than AppleSoft FP (at 1 MHz).
  My experience is that 16 bit operations on the z80 save a lot of code size but very little time compared to doing multiple 8 bit operations. Adding a register pair to HL takes 11 cycles (15 to add to IX/IY), vs 4 cycles to add an 8 bit register to A. ADC to HL takes 15 cycles. Each byte copied from one register to another takes 4 cycles (7 for IX/IY H/L).
  So if you want to ADC BC,DE you're looking at 6*4 = 24 cycles for adding in A vs 27 cycles copying BC to HL, adding DE, and copying back. The code size is 6 bytes vs 5 bytes.
  On 6502 the same thing, using Zero Page, is 20 cycles and 13 bytes of code. So 6502 wins on cycles, but 6502 machines were 1-2 MHz and z80 3-4 MHz, so z80 wins on high clock speed.
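The arithmetic above can be turned into wall-clock time with a few lines. This is a sketch using only the cycle counts quoted in this comment. The clock speeds (4 MHz for the Z80, 1 and 2 MHz for the 6502) are picked from the ranges mentioned and are assumptions, as are all the names.

```python
# Cycle counts as quoted in the comment above.
Z80_8BIT = 6 * 4         # six 4-cycle instructions, adding through A
Z80_16BIT = 4 * 4 + 11   # four register copies plus one 11-cycle add to HL
M6502_ZP = 20            # zero-page version on the 6502


def microseconds(cycles: int, mhz: float) -> float:
    """Convert a cycle count to microseconds at the given clock speed."""
    return cycles / mhz


timings = {
    "Z80, 8-bit adds, 4 MHz": microseconds(Z80_8BIT, 4.0),
    "Z80, via HL, 4 MHz": microseconds(Z80_16BIT, 4.0),
    "6502, zero page, 1 MHz": microseconds(M6502_ZP, 1.0),
    "6502, zero page, 2 MHz": microseconds(M6502_ZP, 2.0),
}
for label, us in timings.items():
    print(f"{label}: {us:.2f} us")
```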
  BUT ... how the heck do you keep two 32 bit floats, plus some just as large temporary variables, in the 11 bytes of registers on a z80? You can get a bit more using EX and EXX but it's going to be incredibly fiddly -- and you remove any possibility of fast interrupt handling on the same machine.
  As soon as the z80 needs to use RAM it loses. Loading or storing a register pair to/from a fixed RAM location is 20 cycles. Indirecting via (IX/IY+n) is 19 cycles per byte. Indirecting via (HL) then INC/DEC HL is 13 cycles per byte.
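As a rough illustration of what those per-access costs mean for one 32-bit float, here is a sketch that multiplies the quoted cycle counts out to four bytes. It assumes a 4-byte operand moved in one direction only, and the labels are made up.

```python
FLOAT_BYTES = 4  # one 32-bit float

# Cycle counts as quoted in the comment above.
costs = {
    "register pair from fixed address": (FLOAT_BYTES // 2) * 20,  # 2 bytes per access
    "(IX/IY+n) per byte": FLOAT_BYTES * 19,
    "(HL) then INC/DEC HL per byte": FLOAT_BYTES * 13,
}
for method, cycles in costs.items():
    print(f"{method}: {cycles} cycles per float")
```

Every one of those totals is larger than the 24 to 27 cycles of the 16-bit add itself.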
  Really, I'd like to see that 8080/z80 FP library.
spc476 8 months ago

First off, the assembler is for a Motorola 6809, which is not a 6502. Second, one benchmark result I found [1] showed it being a bit slower than some alternatives.
[1] https://www.callapple.org/programming/sane-programming-on-th...
- adrian_b 8 months ago
  
  Motorola MC6809 was much faster for floating-point computations than any other 8-bit microprocessor, because it not only had 16-bit additions and shifts, but it also had 8 bit by 8 bit multiplications. No other 8-bit microprocessor had multiplication instructions.
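To show why an 8 bit by 8 bit multiply helps, here is a sketch of building a 16x16 multiply from four 8x8 partial products, which is how wider mantissa multiplies are typically assembled. `mul8` is a stand-in for a hardware instruction such as the 6809's MUL (A times B into D); both function names are hypothetical and this is not code from any of the libraries discussed.

```python
def mul8(a: int, b: int) -> int:
    """Stand-in for an unsigned 8x8 -> 16-bit hardware multiply."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b


def mul16(x: int, y: int) -> int:
    """Unsigned 16x16 -> 32-bit multiply built from four 8x8 partial products."""
    xh, xl = x >> 8, x & 0xFF
    yh, yl = y >> 8, y & 0xFF
    high = mul8(xh, yh) << 16                    # high byte times high byte
    middle = (mul8(xh, yl) + mul8(xl, yh)) << 8  # the two cross terms
    low = mul8(xl, yl)                           # low byte times low byte
    return high + middle + low


print(hex(mul16(0xFFFF, 0xFFFF)))
```

Four multiplies and a few adds replace the sixteen shift-and-test iterations a CPU without a multiply instruction would need.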
  MC6809 had a very beautiful ISA in comparison with all other 8-bit microprocessors, but it was launched too late, in 1979, when there already were the 16-bit microprocessors Intel 8086 and Motorola MC68000.
  Motorola had made the mistake of simultaneously developing two incompatible instruction sets, MC6809 intended for cheap CPUs and MC68000 intended for expensive CPUs. They should have developed a single architecture, with a stripped-down version of MC68000 for cheap CPUs, as they later did with the MC68008, but by then it was too late, because the cheap version of Intel 8086, i.e. Intel 8088, had already won the IBM PC.
  
  dboreham 8 months ago
  
  68K die was vastly too expensive for those 8-bit markets way into the late 80s.
  
  musicale 8 months ago
  
  The 68000 was ~$15 in 1984, the year the 128 KB Macintosh came out at $2495.
  In comparison, a 128KB Apple //e with a disk drive, monochrome monitor, and 80-column card (including 64KB of expansion memory) sold for $1995.
  
  anthk 8 months ago
  
  The 6809 works great in the Vectrex. It's dumb easy to generate a game with vectors there, even more than a raster game with a 6502.
- Someone 8 months ago
  
  From that article: “In comparing measurements remember that SANE calculates to 19 digits. BASIC to 9 and Pascal to 6”.
  SANE's claim to fame wasn’t that it was fast, it was that it produced results accurate to ½ ulp (https://en.wikipedia.org/wiki/Unit_in_the_last_place) in a large (for the time) floating point format.
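For anyone unsure what "accurate to ½ ulp" means in practice, here is a small sketch that measures the error of a single division in ULPs. It uses Python's 64-bit floats rather than SANE's wider extended format, so it only illustrates the idea. It needs Python 3.9 or later for `math.ulp`, and the helper name is made up.

```python
import math
from fractions import Fraction


def error_in_ulps(computed: float, exact: Fraction) -> Fraction:
    """Return |computed - exact| measured in ULPs of `computed`."""
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))


# 1/3 cannot be represented exactly in binary, so some rounding error is unavoidable.
err = error_in_ulps(1.0 / 3.0, Fraction(1, 3))
print(float(err))
```

A correctly rounded operation never lands more than half an ULP from the exact result, which is the guarantee SANE is credited with here.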

itishappy 8 months ago

Bit of a tangent, but I'm curious what people make of the obligatory picture.

https://www.conman.org/people/spc/about/2024/0802.html

numerosix 8 months ago

I second that.

musicale 8 months ago

I know HN loves the 6502, but the 6809 deserves some appreciation as well!