- The LaST Upgrade -
PART 11 -
16MHz 32MHz 38MHz STFM/STE Booster
Project started October 9, 2013 - Last updated January 31, 2020
One thing I always wanted to do was mod the STFM CPU to work at 16mhz. While there is a 16mhz hack out there, I found when I simulated the switching of the cycles that it "glitched" when switching from 16mhz to 8mhz. I have heard reports of the mod being unreliable, and that glitch would explain why.
It is worth noting that the mod only increases the CPU speed when it is not accessing the ST bus. This means ST RAM etc still has to run at 8mhz. After much pondering, I concluded it is near impossible to boost the ST RAM speed without creating video problems. I have run the MMU and 60ns SIMMs at 16mhz and it DOES work, but the video is totally messed up. The data is being clocked into the shifter at double speed, so the display simply runs out of data each scan line. Apart from that, the GLUE and MMU control the video syncs, so they become unstable also. The only way it could work is to re-build the video syncs and build a new shifter that can take increased data input speeds but still output at the correct speeds.
I did have the thought to only boost ST RAM access when the shifter is not enabled. This would solve some problems, but again, the video syncs go out of sync so the video still suffers. Overall, a new shifter would have to be designed, possibly one which takes the video data, stores it internally, then outputs it at normal speed. It could be possible, but it's not something I have time to do myself. The only other solution would be to capture the shifter RGB output, frame store it, and output at normal speeds. That way the MMU/RAM/SHIFTER could all run at 16mhz, but the output of the shifter would still need correcting.
Due to all the problems above, it is only possible to increase the CPU speed when it is not accessing anything on the bus. This can sound a bit pointless, but it is important to note that some instructions take many cycles internally in the CPU, so speeding up those cycles can give a boost in overall speed. Normally the MMU allocates 2 cycles for video and 2 cycles for bus access. However, we can still have 2 cycles for video, but now 4 cycles for bus access. Those 4 cycles cannot actually access the bus at 16mhz, though; the CPU only runs at 16mhz when it is not accessing the bus. So only some CPU instructions will run at double speed. The rest of the bus still runs at 8mhz so there are no other problems in that respect.
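As a rough illustration of why doubling only the off-bus cycles still helps, here is a minimal Python sketch. It assumes a simple model where a fixed share of run time is spent off the bus and only that share gets faster; the function name and the example fractions are made up for illustration and are not measurements from the ST.

```python
def overall_speedup(internal_fraction, clock_ratio=2.0):
    """Overall speedup when only the off-bus share of run time is accelerated.

    internal_fraction: share of the original run time spent off the bus (0..1).
    clock_ratio: how much faster the CPU clock runs during that share.
    """
    if not 0.0 <= internal_fraction <= 1.0:
        raise ValueError("internal_fraction must be between 0 and 1")
    bus_time = 1.0 - internal_fraction              # still paced by the 8mhz bus
    internal_time = internal_fraction / clock_ratio  # runs at the boosted clock
    return 1.0 / (bus_time + internal_time)


if __name__ == "__main__":
    # Hypothetical fractions, purely to show the shape of the curve.
    for fraction in (0.0, 0.25, 0.5, 1.0):
        print(f"off-bus share {fraction:.2f} -> {overall_speedup(fraction):.2f}x")
```

Under this model a program that never leaves the bus gains nothing, and only code that spends all its time inside the CPU would see the full doubling.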
Below is the "classic" 16mhz mod which is freely available. However, be warned: this does not actually work, at least not as far as I know.
GadgetGuy did some videos of the mod below which he also had problems with. Later he did my mod and got his ST working. Check out his channel for his blog of fitting the 16mhz mod among some other cool things too :)
THE BELOW CIRCUIT DOES NOT WORK - DO NOT BUILD IT!
The problem is the clock is not synchronised and glitches when switching between 16mhz and 8mhz modes. This produces the faults shown in the timing diagram. Those pulses actually work out at about 32mhz, which is enough to crash the system!
After many circuit designs I created one which gave glitch free switching. However at the time of writing this is actually a larger problem than it first seems.
The waveforms above are using the STFM's own 16mhz and 8mhz clocks, so they will always be synchronised to each other and the circuit can be relatively simple. When an external clock is used, even 16mhz, the 8mhz and 16mhz clocks are no longer in sync, so the circuit will glitch again.
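The kind of glitch described here can be reproduced with a toy model. The Python sketch below assumes a plain, ungated 2:1 multiplexer that swaps from an 8mhz clock to a 16mhz clock at an arbitrary moment. That multiplexer is an assumption made for illustration only; it is not the classic mod's circuit or the fixed design. Time is counted in 0.25ns ticks so both half-periods are whole numbers.

```python
TICK_NS = 0.25  # one simulation tick; 8mhz period = 500 ticks, 16mhz period = 250 ticks


def square_wave(t, period, phase=0):
    """Return 1 during the first (high) half of each period, else 0."""
    return 1 if (t + phase) % period < period // 2 else 0


def naive_mux_output(t_switch, phase_fast=0, n_ticks=3000,
                     slow_period=500, fast_period=250):
    """Ungated mux: slow clock before t_switch, fast clock from t_switch onwards."""
    return [
        square_wave(t, fast_period, phase_fast) if t >= t_switch
        else square_wave(t, slow_period)
        for t in range(n_ticks)
    ]


def run_lengths(samples):
    """Lengths of consecutive runs of identical samples."""
    runs = []
    count = 1
    for prev, cur in zip(samples, samples[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


def shortest_pulse_ns(t_switch, phase_fast=0):
    """Width of the narrowest high or low pulse the mux produces."""
    runs = run_lengths(naive_mux_output(t_switch, phase_fast))
    interior = runs[1:-1]  # ignore the runs clipped by the simulation window
    return min(interior) * TICK_NS


def equivalent_mhz(pulse_ns):
    """Clock frequency whose half-period equals the given pulse width."""
    return 1000.0 / (2 * pulse_ns)


if __name__ == "__main__":
    for t_switch in (1000, 1120, 1240):
        width = shortest_pulse_ns(t_switch)
        print(f"switch at tick {t_switch}: shortest pulse {width} ns, "
              f"like a {equivalent_mhz(width):.0f}mhz clock")
```

When the switch lands exactly on a shared clock edge, the shortest pulse is a normal 16mhz half-cycle. When it lands just before an edge, the output contains a sliver far narrower than anything the CPU is rated for, which is the failure mode a glitch-free switch has to avoid.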
The solution is to use edge-triggered flip-flops. However, in simulation they take 2 full cycles to switch over, which is a huge problem. If we need to switch from 16mhz to 8mhz, it has to happen immediately, not 2 cycles later. Otherwise the ST bus will be running at 16mhz instead of 8mhz and the system will crash.
According to some websites there is no clock switching delays, but the simulation says there is. However, I do plan to build the real circuit in due time as if it is possible to use a faster clock, then the CPU could be run at 20mhz or higher.
Above is my 16mhz mod in operation. Integer Division seems to be the one which is making full use of 16mhz speeds. A useful boost in speed in other areas it seems also.
Next up I am working on a 16mhz mod for TOS. Thankfully TOS is hooked into the main ST bus, so the CPU can talk to TOS at 16mhz without any problems. However, EPROMs seem to max out at 100ns, which is about 10mhz, so they do not work at 16mhz. Faster ROMs all seem to be CMOS types which do not have enough current to drive the ST's TTL based bus. So currently I am working on a new buffered TOS design.
One thing not mentioned above, is the "over ride" signal in the timing graph. This is actually a mod to switch TOS to 16mhz also. At the time of writing it is untested until I can build the faster TOS board. I have found some 45nS chips, so they are good for about 22mhz. I assume they would push to 24mhz if the CPU can take it. Though these are things to come later....
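The access-time figures above follow from a simple rule of thumb: treat the access time as one clock period and take the reciprocal. A minimal Python sketch of that conversion is below. The function name is made up for illustration, and the rule itself is a simplification, since a real 68000 ROM read spans several clock cycles.

```python
def max_clock_mhz(access_time_ns):
    """Clock rate whose period equals the given access time (rule of thumb)."""
    if access_time_ns <= 0:
        raise ValueError("access time must be positive")
    return 1000.0 / access_time_ns


if __name__ == "__main__":
    for t_ns in (100, 45):
        print(f"{t_ns}ns -> about {max_clock_mhz(t_ns):.1f}mhz")
```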
October 10, 2013 UPDATE
Etched out new TOS PCB.. :o)
Above is cleaned up and drilled.. Now off into the cooker ...
*several moments later*
October 11, 2013 UPDATE
Finally getting there. After testing the board I found it will boot games etc, but it crashes just before the GEM desktop comes up. After much investigation I am at a loss as to why this is. I programmed some normal 100ns EPROMs and those work fine. So it would seem there is some other problem with those ROMs that causes half of TOS not to function. Those chips are AMD brand, so I will try ATMEL to see if there is any difference there. I also found some 45ns EPROMs, so those are on order also. Hopefully they will not need the buffer chips.
While 16mhz is working, my target is 24mhz. The ideal clock generator chip is out of stock for a month (at the time of writing) so there is a big wait before I can try faster modes :-( The chip will allow various speeds to be programmed in. At an initial look, the ideal clock rates will be 16mhz, 20mhz, 24mhz, 26mhz and 32mhz. I suspect 24mhz will be the max speed that will work. The ROM chips will max out at 24mhz, and there is no way to tell what speed the CPU will max out at until it is tried.
October 15, 2013 UPDATE
Well, the ATMEL chips were tried and they did not work at all. As to why this is I am not sure. Both ROM makes look identical spec wise. They both have a logic 1 threshold of 2 volts, so no problems there. The CPU is more than capable of driving the CMOS address bus. I will just have to see if the fast EPROM chips work. If those do not work then there must be some really odd timing issue where the 68000 just will not work with fast chips.. time will tell.
October 28, 2013 UPDATE
The new clock chip arrived. First of all it was fed with an 8mhz clock to double up to 16mhz, replacing the ST's 16mhz clock on the CPU booster circuit. This worked. However, when boosting to 18mhz, it did not. At first I thought something was simply running too fast. I tried with a 4mhz clock multiplied back up to 16mhz, which worked. So I then went down to 12mhz, and that did not work either.
The IC is a PLL type so the clocks should all be in sync. In theory, the CPU has 2 clock cycles at 8mhz, so it should accept any number of clock cycles you can cram into that time; for example, 4 clocks at 16mhz work fine. The PLL driver does not seem to like the ST's 8mhz clock input. A x2 on 8mhz comes out at 40mhz for some strange reason. I suspect a multiple of 8, such as 24mhz, might work, though I am unable to get a PLL working at 24mhz. I will see if I can find a different chip to try, though I am not holding out much hope that any speed other than 16mhz is possible.
The faster Eproms I am still waiting for. When they come I will interface 16mhz into the ROM section to give a boost of speed with ROM access.
October 31, 2013 UPDATE
The fast ROMs finally arrived and these do not work either :-( After having so many issues with the DMA chip when trying to get hard drives to work (READ ABOUT IT HERE), I checked a few of the address bus lines and found these are totally bizarre also. The images below are the type of thing I was referring to on the Hard Drive pages.
The left image shows a normal HI-LO-HI pulse with a strange HI spike which I can only presume is noise. The right image is even worse, with pulses going from 5 volts to 2 volts, or 5 volts to 4 volts etc. I am going to assume it's the same problem which causes the DMA to not function correctly.
I presume these noise pulses (that is what I will call them) are so fast that the slow ROMs simply do not react to them; in essence, they simply get ignored. Faster ROMs, though, can actually react to these noise pulses and cause errors on the address bus, to the point that TOS cannot even initialize, since the start up routines are never called as the address ranges are simply wrong. The noise pulses are borderline between logic HI and LO in a lot of cases. It's a wonder the ST ever worked at all really! I guess it could explain why there are so many PCB revisions for the STFM alone!
February 9, 2014 UPDATE
The ROM work is on hold until I can get someone to help with a new decoding logic circuit. There are simply too many issues with speeding up ROM with the built-in decoding inside the GLUE chip, so new logic needs to be created to replace the GLUE decoding functions. I am currently asking around for assistance with this. Meanwhile, if anyone can design GAL code or a simple enough logic circuit to build, then please let me know.
Below is the current 16mhz mod which has proven to work. The "forced" 16mhz line is in place for the (hopefully) future expansion of the ROM speed mod.
The circuit is a simulation circuit, but was also used to build the real world circuit. The 2 inverters are only there as a buffer (simulator did not have a LS buffer ?!) for V4 pulse generator which simulates the CPU BG & AS functions. BG & AS should be linked to the CPU lines not together as shown, I had to do that for simulation.
The below was the breadboard test circuit, incidentally I used 74F series for the chips.
I plan to design a simple PCB for the mod in due time, though I am currently waiting on help to finish the ROM speed mod before I start looking towards a full PCB solution. Updates will be posted as they happen, though it may be some months before there is another update :-( But I will try and develop a simple 16mhz mod as I really want to get one added to one of my STFMs soon :-)
February 20, 2014 UPDATE
A more concrete BETA design. I noticed there was an incorrect gate on the output of the simulation circuit, which has been fixed in the above REV1 design.
The design has been built and tested as shown below. The good thing is that the circuit now uses 1 less IC :)
Next is to design a PCB and implement a sync clock so I can try 24mhz and 32mhz speeds.
Here are some useful utils..
February 25, 2014 UPDATE
Thanks to the help of rpineau from AF, I am able to start work on the 16mhz ROM section :)
The PCB is designed to solder into the empty blitter socket in the STFM. I chose to pick up the signals from there rather than the CPU, as there are other kits such as IDE addons which already fit over the CPU, so for me at least, it wasn't viable to add another project on top of the CPU. All the signals needed are available on the blitter socket so there is no need to add more on top of the CPU.
The circuit has the already tested 16mhz CPU mod. This has proven to give around 30% additional speed, which clearly shows in a lot of the games tested. As mentioned previously, the next step is the 16mhz ROM access. As the ROM address lines are being controlled by the GLUE, and it is not possible to run the GLUE at 16mhz, the GLUE logic had to be replaced (for the ROM decoding that is, not totally). rpineau has been busy working on the GAL code to allow faster ROM decoding. The GAL code, when decoding ROM address ranges, will output a signal to my 16mhz mod to drive it into 16mhz. This way, the CPU & ROM are then both running at 16mhz speeds.
As a future/further thought, I have added in a new clock generator chip which will generate a 16mhz signal, so the shifter clock wire is no longer needed. The clock generator is also capable of running at 24mhz. It is unclear at this time if 24mhz will even work, but the circuit will at least allow testing for this. The clock chip is capable of 32mhz also, though it is doubtful that speed is even plausible. If 24mhz works then CPU & ROM will have a 50% boost in speed; of course you never know until you try ;)
I am currently waiting on the PCBs to arrive. The chips are mostly "through hole" types so they are easy to solder in place, with the exception of the clock generator and a few SMT capacitors.
March 18, 2014 UPDATE
The 8/16mhz switch is working and also ROM is now working outside of GLUE logic.
The current problem is the 16mhz switch for the ROM. There seems to be an odd issue on power up which must be glitching the CPU clock, causing it to lock up totally. Pressing the reset button does nothing, even though after power up the lines are all as expected and it *should* then reset, but it does not :(
The blue line on the first 3 images shows the random state of reset during power up. The 4th image shows the 16mhz AUX line glitching on power up. It must be LO during power up to enable 8mhz bus access. Though mostly all the glitching happens before the CPU clock (yellow) starts running, so it shouldn't be a problem. Rodolphe updated the GAL code to compensate for the reset problems, so the GAL would not function until reset is stable and high, but it did not solve the problem. Looking at the last image, it looks like the clock glitches at 16mhz for the first cycle on power up. But power up measurements are a problem as a lot of random things can happen.
I am going to get my Logic Sniffer working later this week to monitor more lines and try to see the timing relationships between the signals better. I suspect that while all the logic is powering up, it is simply sending incorrect signals to the rest of the logic, causing it to glitch and crash during reset power up. This is a tough problem to solve :( The only method I can currently think of is to buffer the GAL's outputs with a 3-state buffer, and simply disconnect the GAL's outputs on power up. This should be tied to the CPU reset line, but as is now known, this seems unstable and not reliable. So another method of delay is needed during power up to ensure a stable state before the logic starts running.
March 27, 2014 UPDATE
After much hair pulling this past week, it has been decided to restart the entire project, this time not using the blitter area and locating the PCB over the CPU instead. I wanted to avoid using the CPU as the "add on" point as other projects use the CPU area, but unfortunately, due to recent "enlightenments", it's not going to be possible.
For starters, the 16mhz mode is being coded into a GAL chip. GAL logic may just be a fraction faster, but also allows the flexibility to make changes easily. We still have not got beyond 16mhz yet, but it is still yet to be investigated fully. So higher than 16mhz speeds have not been ruled out yet.
The reason for relocating the PCB to the CPU is that the GLUE chip issues a DTACK after 4 CPU cycles, about 500ns. The problem is, when running the ROM at 16mhz the GAL decoding does the 4 cycles in just 250ns. This results in the GAL logic issuing DTACK at 250ns, then the GLUE issuing DTACK at 500ns, when there is no valid data actually on the bus. This just causes the ST to get stuck in a constant resetting loop until it eventually locks up. So a new method to disable the GLUE decoding had to be devised.
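The two DTACK times quoted above are just 4 clock cycles expressed in nanoseconds at each clock rate. A short Python sketch of the arithmetic follows; the function and variable names are made up for illustration.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Duration of a number of clock cycles, in nanoseconds."""
    return cycles * 1000.0 / clock_mhz


glue_dtack_ns = cycles_to_ns(4, 8)   # GLUE counts 4 cycles of the 8mhz clock
gal_dtack_ns = cycles_to_ns(4, 16)   # GAL counts 4 cycles of the 16mhz clock
stray_dtack_delay_ns = glue_dtack_ns - gal_dtack_ns

if __name__ == "__main__":
    print(f"GLUE DTACK at {glue_dtack_ns:.0f}ns, GAL DTACK at {gal_dtack_ns:.0f}ns")
    print(f"second DTACK arrives {stray_dtack_delay_ns:.0f}ns after the first")
```

The second DTACK lands a full 16mhz bus cycle after the first one, which is why the GLUE's decoding has to be suppressed rather than just ignored.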
What we decided to do was isolate the CPU address bus from the entire motherboard bus when ROM decoding is done via GAL logic. This means that during ROM decoding, the ST bus will see all 1's on the address bus (4k7 pull-ups on the address bus by default) so the GLUE will not see a valid ROM address and will not do the decoding. The GAL decoding can then work without the GLUE also decoding the same address ranges. The result of this is that the CPU has to be isolated from the address bus totally, which can only be done at the CPU itself.
We have also found that the GAL chips seem to isolate their outputs during power up, which means the logic outputs of the GAL are unknown, which was also causing some issues. Thankfully it is an easy fix to add pull up/down resistors on such outputs to prevent glitching on power up.
I have also found the 8mhz clock isn't constant either. Starting from the shifter, the clock starts life at 32mhz. Output from the shifter is 16mhz, MMU downclocks to 8mhz etc. Though I have found during power up, the shifter actually outputs 32mhz for a few cycles. This means the entire 8mhz bus is clocked at 16mhz. As to why this does not cause the ST to crash is beyond me.
Looking at the diagram for the 32mhz clock generator, it seems to be linked to HSYNC, but that part of the circuit is totally missing on the STF. As a test I disabled the HSYNC line and then got a stable clock, which was constant speed. However, I did notice there were some odd video faults. For example, on my LCD monitor the GEM desktop colour (green) seems to have wavy vertical lines. This actually happens anyway, but with the mod to correct the clock problem, those wavy lines scroll to the left. So it looks like the green background is actually scrolling to the left. It also looks like the edges of the desktop screen judder slightly; it's hard to explain but it just looks strange somehow. I can only assume that there are some odd timing issues relating to the video syncs, and that Atari doubled the clock speed to correct the problem. It is strange to think the entire bus is being clocked at 16mhz, but nothing can be using the bus at that point, else it would simply crash. I can only assume Atari did it to speed up something relating to video timing, maybe to correct some "counter" issue with the GLUE chip or something..
In any case, the ST was left on the GEM desktop for a few moments and it reset. It did this for about half an hour, just resetting for no reason. I disabled my clock mod, and the odd resetting issue stopped. It is worth noting that you cannot depend on the 8mhz clock being constant, or the 16mhz clock for that matter. This may tally with why I had problems using the onboard 16mhz clock to feed the WD1772. I always had odd issues which were cured when I used a separate 16mhz clock. So I think when the clock went to 32mhz, it was causing the WD1772 to glitch. So when using the onboard clocks, care must be taken not to assume the clock speed is constant.
Above image is the RESET line (red) and the 8mhz clock (blue). Even before and after reset, the 8mhz clock is actually running at 16mhz!
Above is taken while the ST is in a resetting loop. The RESET line is always HI! Though you can see the clock is 8mhz, then jumps to 16mhz for about 10 cycles, then glitches back to 8mhz. After the glitch the 8mhz clock is actually about 10% out of phase with what it was before the 16mhz cycles. I can only assume that as the entire bus is using the 8mhz clock, then it going out of sync isn't a problem anyway.
March 31, 2014 UPDATE
Rodolphe has kindly designed this next PCB. This one fits into the CPU socket. As mentioned before, this had to be done as we needed to isolate the CPU from the address bus to prevent the GLUE from decoding the ROM addresses. We have opted for a 4 layer PCB as I did not want to risk grounding problems due to the high speed (in MHz range). The PCB is currently in production so hopefully it will be here within the next couple of weeks.
April 2, 2014 UPDATE
Rodolphe sent me the latest GAL code to try the new clock switching. My logic sniffer does not show the top signal correctly, but it is actually a 25mhz clock. Second line is the default 8mhz clock then the "clock switch" line and lastly the CPU clock line. The GAL seems to work flawlessly on my test setup!
The good thing is all 3 signals were unrelated (not synchronized). On the Clock Select line (line 2) I ran a signal of about 400khz to simulate bus access. The GAL includes code to allow switching of the TOS ROMs to 16mhz speeds as well as the CPU clock. Once the new PCBs arrive there should be no problem trying faster clock speeds :)
April 14, 2014 UPDATE
PCBs finally arrived :) Now to start building it up..
April 27, 2014 UPDATE
I tried the board at default of 8mhz to verify things are working. Though there was intermittent corruption and often the ST would reset or bomb out with all kinds of crazy things on screen.
After MUCH investigation I found the bus pull up resistors on the ST itself (10K) were not pulling the address bus HI fast enough. In fact it took over 200ns! I changed the bus resistors to 4k7 and got the delay down to about 60ns.
After some talking to Rodolphe, we started thinking about the /AS pin. It had a similar problem: its 10K resistor was taking over 100ns to pull up. I then went down to 1K and got about 10ns, and that is about as fast as it can be.
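As a rough sanity check on those pull-up figures, here is a minimal Python sketch. It assumes the line behaves like a fixed capacitance charged through the pull-up resistor, so the rise time scales in direct proportion to the resistance. That model and the function name are assumptions for illustration, not something measured on the board.

```python
def scaled_rise_time_ns(measured_ns, measured_r_ohms, new_r_ohms):
    """Predict rise time for a new pull-up value, assuming time is proportional to R."""
    if measured_r_ohms <= 0 or new_r_ohms <= 0:
        raise ValueError("resistances must be positive")
    return measured_ns * new_r_ohms / measured_r_ohms


if __name__ == "__main__":
    # /AS line: over 100ns with 10K, predicted value for 1K
    print(scaled_rise_time_ns(100, 10_000, 1_000))
    # Address bus: over 200ns with 10K, predicted value for 4k7
    print(scaled_rise_time_ns(200, 10_000, 4_700))
```

The /AS prediction of 10ns lines up with the measurement. For the address bus the model predicts about 94ns against the roughly 60ns observed, so the simple proportional model should only be treated as a ballpark guide.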
Line 0 shows the /AS line going LO, and line 2 is the GLUE decoding and selecting ROM CS in just 25ns!
Unfortunately the GAL logic and ABT logic take about 20ns to disable /AS, so the GLUE still manages to decode before the GAL+ABT can isolate the bus. It was an unforeseen problem that the GLUE was going to decode so fast, so we are currently looking into workarounds for this.
CURRENT MOD TO THE PCB
A wire link is to be placed on the bottom of the PCB on the 2 pins as shown.
Pin 22 of the FREQ_GAL goes to the CPU CLK pin. This trace needs to be cut and a 33R resistor placed in the CLK line. This resistor may not be an ideal value, but I will look into this more at a later date.
OCT 17, 2014 UPDATE
Some good news finally! We have got 16mhz working along with TOS206. We decided TOS206 would be an easier starting place as it's naturally outside of GLUE decoding. It's great news that it now boots, and it also verifies that the 16mhz mode is working.
It seems 7ns GALs are needed. Slower ones are just too slow and do not work. I think we had 25ns before.
Current tasks are to try speeds faster than 16mhz, try TOS104 and investigate the GLUE isolation issues. Also to work on speeding up TOS to 16mhz.
November 9, 2014
Finally some progress :) I posted a new thread about progress on the forum HERE where more regular updates are posted. I will only post key facts and figures on this page from now on.
We finally can use TOS206 or TOS104 in 16mhz mode and the results are shown in the image above. Overall 16mhz TOS gives around 20-40% speed boost across various functions. GEM runs a lot faster so we are pleased its finally possible.
I have been trying to speed up the CPU some more. I think the 16mhz CPU can run at 20mhz, but trying it resulted in lots of bombs across the screen. Investigation showed that there were 24mhz glitches in the clock, and as the CPU could not run that fast, it caused the ST to crash. This limits the speed gains to multiples of 8mhz. Unfortunately, there isn't a 24mhz 68000 CPU that I know of, so for this upgrade 16mhz is the maximum speed. We are looking into the 020 CPU next. It is a faster CPU and can operate up to 32mhz. We are also looking into speeding up the blitter.
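The "multiples of 8mhz" rule can be applied to the candidate clock rates listed in the earlier updates. The Python sketch below only checks that arithmetic rule; the names are made up for illustration, and passing the check says nothing about whether a given CPU or ROM can actually run at that speed.

```python
BUS_CLOCK_MHZ = 8


def is_bus_multiple(clock_mhz, bus_mhz=BUS_CLOCK_MHZ):
    """True when a whole number of CPU clocks fits into each bus clock period."""
    return clock_mhz % bus_mhz == 0


# Candidate rates mentioned earlier for the programmable clock chip.
candidates = [16, 20, 24, 26, 32]
usable = [f for f in candidates if is_bus_multiple(f)]

if __name__ == "__main__":
    print("whole multiples of the bus clock:", usable)
```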
November 18, 2014
I spent a lot of time working on the Blitter side of things on my STFM this past week. The blitter now runs at 16mhz when it is not accessing the ST bus. This gives a small boost in speed, typically about 2%. I have also tried running the CPU at 16mhz while it programs the blitter registers. Overall this gave about a 5% boost in graphics speed, though the figures vary from about 2% to 16%. "Blitting" showed a 16% speed boost.
I also found out that while the blitter has control of the bus, the CPU can still actually be running instructions. I found this out as there was a 35% drop in speed which took some time to figure out. At the time the GAL code was set to run the CPU at 8mhz while the blitter had control of the bus. After some code fixes, the CPU now always runs at 16mhz while the blitter has control of the bus, so the speed was back up again.
Overall there is a 12% loss in graphics speed when using the blitter (though this is relative to the 150% speed, so the overall speed still ends up at about 138%). This is tricky to explain. Without the blitter, the CPU has more time on the bus to do things and ultimately spends more time at 16mhz, so graphics speed comes out at 150%, for example. When using the blitter (with the GB4 blitter ref box ticked), graphics drops down to 138%. Really this shouldn't happen. I think that because the blitter spends some time accessing ST RAM, the CPU sometimes has to wait for RAM access, so the CPU gets a little less time on the bus. This results in a speed 12% lower than expected.
This is not a problem though, it was just something I did not expect. Overall the blitter actually runs faster, by 7% overall. This may sound unimpressive, but blitting power went from 681% to 720%, which is a gain of 39 percentage points. A few % in speed anywhere on the system results in a noticeable increase in speed. This actually does more than you think.
For example, I have found that when the CPU accesses the blitter at 16mhz, there is a small 5% speed increase overall. As the CPU is spending less time accessing the blitter, it also has more free time to start executing its next instructions sooner. This gave way to some interesting ideas.
For example, when the CPU reads the ACIA bus (keyboard etc) it does this at 8mhz. If we can boost the reading of the ACIA to use the CPU at 16mhz, then the read cycle will be done in half the time, and the CPU can start its next tasks a few cycles sooner. It's possible a few things on the bus could be adapted to read the data faster. All those small % speed increases will add up, though it is something to look into in the future.
So where is the project at now? Well, Rodolphe is designing a new PCB at the moment. The main problem is that I had to bodge 4 GAL chips to get the current setup working. The GALs simply do not have enough IO ports to do what we need. So we have opted to move to a smaller SMT part which has over 60 IO ports. This will allow us to port all the code over into 1 file and make life easier overall when making changes. We can also use the spare IO ports for other experiments later, such as speeding up read cycles on the ST bus.
The 2 large DIP ROMs are being replaced by a single smaller PLCC ROM. We are going for a 4MB chip, so we can program TOS104 and TOS206 on the same chip and make it switchable. We are also going to add 4MB of alt-ram to the board. This will actually be "Fast-alt-ram", where the CPU has access to the alt-ram at 16mhz. Programs which run in RAM at 16mhz will gain a huge boost in speed. It is possible at some point we could add an IDE connector to load from a CF card at 16mhz. This will make the IDE drive run several times faster than going via the slower DMA port. Though it is important to realise the limitations of what can and cannot be run in fast ram.
We are also going to continue to try and break the 16mhz barrier. I think this is possible, though the only 68000 CPU which may work faster is the HC variation. These CPUs really do not like operating in the ST. We are going to investigate why when the new PCBs are made. The main problem is that while we can have 8mhz, 16mhz and 20mhz clocks, switching between them results in glitches. When switching from 8mhz to 20mhz, it can result in a clock glitch of 28mhz, for example. If the CPU could operate at 28mhz, then this wouldn't be a problem, though it is near impossible to find a 68000 CPU which will run faster than 16mhz. Technically the 16mhz 68000 can probably run at 20mhz, though there is no easy solution to cure the clock glitches. It will probably need to cope with 24mhz to operate glitch free. Work on this will probably happen sometime in the new year.
As a side note, once the new design is finished and working, I hope to attack the ST-RAM bottleneck. Speeding up the MMU and RAM is possible, though the side effects of the video and countless other problems to overcome are going to take a lot of time to investigate.
November 28, 2014
Here I started trying 32mhz, and after a few hours I got it to work :)
I had to add some pull down resistors onto the CPU bus, as for some reason the 68HC000 CPU would not function correctly without them. It seems it's a noise problem again and adding the resistors helps smooth out the noise.
First I tried my simple 16mhz switcher which just uses 74Fxx series chips, but with a 32mhz clock the propagation delays were around 20ns, which is too much for 32mhz speeds. So I used a 7ns GAL and did the switching there.
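To put the 20ns and 7ns figures in context, the Python sketch below compares each delay against half a clock period. The half-period budget is an assumed rule of thumb for illustration (the switch has to settle before the next clock edge), not a figure taken from the actual design, and the names are made up.

```python
def half_period_ns(clock_mhz):
    """Time between successive clock edges, in nanoseconds."""
    return 500.0 / clock_mhz


def fits_in_half_period(delay_ns, clock_mhz):
    """True when the logic delay is shorter than half a clock period."""
    return delay_ns < half_period_ns(clock_mhz)


if __name__ == "__main__":
    for name, delay_ns in (("74Fxx switcher", 20.0), ("7ns GAL", 7.0)):
        for clock in (16, 32):
            verdict = "ok" if fits_in_half_period(delay_ns, clock) else "too slow"
            print(f"{name} at {clock}mhz: {delay_ns}ns vs "
                  f"{half_period_ns(clock)}ns budget -> {verdict}")
```

Under this assumption the 20ns logic fits comfortably at 16mhz but overruns the 32mhz budget, while the 7ns GAL fits at both.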
First I tried the normal 16mhz TTL CPUs, though these do not like 32mhz. I suspect they will only run up to around 20mhz. In one test I had the CPU clock glitching from 20mhz to 24mhz and it did not work, so I have pretty much concluded 24mhz won't work.
After some searching around the Internet, I found some people saying they got 50-60mhz out of the 68HC000 CPU. So I set to work investigating it. The results are as follows.
We can see Integer Division is now 370% which is good. Though from 16mhz speeds, nothing much else changes a great deal.
Below is a overlay of 16mhz vs 32mhz speeds with the difference at the end of each line.
32mhz does give a nice boost in speed overall, though the boost isn't as great as the jump from 8mhz to 16mhz. For GEM Dialog Box, for example, 16mhz gave a 22% speed gain, but 32mhz only a 7% additional gain.
But we should not be disappointed as this is a rare thing for the ST to run at 32mhz. Also most of the tests are GEM benchmarks, so once 32mhz TOS ROM is added, 32mhz will push the results up a lot more.
So can we push past 32mhz? Well, probably not. We had to use 100ns ROM for 16mhz for reliable operation. 32mhz would mean 50ns ROM, which is just about possible. Faster than that and ROM speeds become a huge problem. Even if we boost the CPU speed only, the gain from the 16mhz to 32mhz jump was around 3 times smaller than from the 8mhz to 16mhz jump. If we assume the same again, then we would go from 29% to 7% to 1%. We would have to run at 64mhz just to gain 1% extra speed. That is not really possible or doable, and even if it was, there would be near zero speed boost. 32mhz from 8mhz is still an awesome jump in speed and beats most boosters already :)
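The "same again" estimate can be written out as a small extrapolation. The Python sketch below assumes each further doubling of the clock shrinks the gain by the same ratio as the previous doubling did. That constant-ratio assumption and the function name are for illustration only; no 64mhz measurement exists.

```python
def extrapolate_next_gain(previous_gain_pct, latest_gain_pct):
    """Guess the next doubling's gain, assuming the same shrink ratio repeats."""
    if previous_gain_pct <= 0:
        raise ValueError("previous gain must be positive")
    shrink_ratio = latest_gain_pct / previous_gain_pct
    return latest_gain_pct * shrink_ratio


if __name__ == "__main__":
    # 29% gained going 8mhz -> 16mhz, 7% more going 16mhz -> 32mhz
    guess = extrapolate_next_gain(29.0, 7.0)
    print(f"estimated extra gain at 64mhz: about {guess:.1f}%")
```

With the 29% and 7% figures this lands between 1% and 2%, in the same ballpark as the rough 1% quoted above.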
Rodolphe has been working on the new booster card. The above isn't the final revision (still much to do). This will have an ATF1504 chip to replace the GALs used on the previous design. The CPU has also been changed to a PLCC type. As 32mhz speeds are now a reality, we can try higher speeds with a PLCC CPU. We also have a socket for the blitter on there. We had to remove the SRAM from this design as even with 4 layers, it wouldn't route :( But now there is an expansion header where an SRAM card can be plugged in (if required). It also opens up the project should anyone else wish to develop hardware to make use of a 32mhz bus.
December 9, 2014
I have been trying to run the HC CPU at 32mhz along with the ROM, though I have not yet had any luck with this. My current thoughts are that the CPU cannot sustain 32mhz for more than a couple of cycles, though it is still unknown why it fails to boot. Everything works fine at 16mhz. The CPU on its own seems happy at 32mhz, but accessing ROM at 32mhz isn't working. The ROMs are 45ns, and considering it takes the CPU a few cycles to access ROM, they should be more than capable of operating at 32mhz. That said, 45ns does equate to 22mhz, so it is still a possibility that the ROMs are not fast enough.
I have ordered in a TOSHIBA 68HC000 to compare results vs the Motorola one. From what people have mentioned about other boosters made in the past, there seem to be few which will run at 32mhz. A different brand of CPU may or may not be able to overclock better. I also have a PLCC type on order. These were in production for longer than the DIP version, so it's possible newer CPUs might overclock better. Currently I am still waiting for all my CPUs to arrive :(
Meanwhile I have been working on the problem of operating unrelated clocks. For example, I can run the CPU at 16mhz using the clock generated by the shifter, but if I use an external 16mhz clock, the machine will not boot. It's clear the clocks must be synchronized. It is also a problem as I am limited to the clocks on the motherboard, which are 8, 16 and 32mhz. Generating other frequencies is near impossible, especially when trying to keep them synchronized.
I have spent the past few days tweaking the 16mhz logic design to switch between 2 unrelated clocks.
The green line is /AS and the red line is the CPU clock. The switching starts at 8mhz and when /AS goes HI, the circuit switches into 16mhz mode. The first cycle after /AS goes HI glitches on the clock. However, this circuit was carefully designed so the glitches always happen below the maximum clock speed. Without such attention to the design, the clock can glitch at 60mhz or more, which is too much for the CPU. With this current design, the clock glitches at about 12mhz. Since the CPU does not do much on the first cycle, this shouldn't be a problem. The phase relationship between the 2 clocks sets how much of a glitch there is, but it is always lower than 16mhz.
My first attempts at this generated glitches below 16mhz until the clocks got to around 250deg out of phase, where the glitches became faster than 16mhz. They would then go back to below 16mhz after about 300deg out of phase. The delays throughout the circuit really have a huge impact on where the phase angle causes faster than 16mhz glitches. Currently this is only working in simulation, but I hope to try a real hardware test very soon.
Once I can verify unrelated clocks work, I can start off at 16mhz with the CPU & ROM and slowly increase the speed until I hit a point where the machine will no longer boot. Once I find this "tipping point" I can better investigate what is failing to run at the faster speeds. It will also be good as currently the design is stuck at 16mhz ROM access. Even if we cannot run totally at 32mhz, it's possible something like 24mhz could still work. The end user could possibly try his own overclocking frequencies with various CPUs to find the maximum speeds. So I think it will be a great feature to be able to adjust the CPU clock speed.
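To illustrate why the phase between the two clocks matters, here is a minimal sketch of a naive, unsynchronised switch from 8mhz to 16mhz. This is not the booster circuit. It is a hypothetical ideal multiplexer with no gate delays, and all names and the time step are assumptions. It reports the shortest pulse seen on the output and the clock frequency that pulse corresponds to.

```python
SLOW_PS = 125_000   # 8mhz period in picoseconds
FAST_PS = 62_500    # 16mhz period in picoseconds
STEP_PS = 250       # simulation time step


def square(t_ps, period_ps, phase_ps=0):
    """Ideal 50% duty square wave: 1 for the first half of each period."""
    return 1 if (t_ps - phase_ps) % period_ps < period_ps // 2 else 0


def shortest_pulse_ps(switch_ps, phase_ps):
    """Shortest gap between output edges for a naive clock switch.

    The output follows the slow clock before switch_ps and the fast
    clock (shifted by phase_ps) from switch_ps onwards.
    """
    prev_level = None
    last_edge = None
    shortest = None
    for t in range(0, 4 * SLOW_PS, STEP_PS):
        if t < switch_ps:
            level = square(t, SLOW_PS)
        else:
            level = square(t, FAST_PS, phase_ps)
        if prev_level is not None and level != prev_level:
            if last_edge is not None:
                width = t - last_edge
                shortest = width if shortest is None else min(shortest, width)
            last_edge = t
        prev_level = level
    return shortest


def equivalent_mhz(pulse_ps):
    """Clock frequency whose half period equals the given pulse width."""
    return 1_000_000 / (2 * pulse_ps)


if __name__ == "__main__":
    for switch_ps, phase_ps in ((0, 0), (10_000, 15_000)):
        pulse = shortest_pulse_ps(switch_ps, phase_ps)
        print(f"switch at {switch_ps}ps, phase {phase_ps}ps: "
              f"{pulse}ps pulse, about {equivalent_mhz(pulse):.0f}mhz")
```

With an unlucky switch point and phase, the naive switch produces a 5ns runt pulse, which the CPU would see as a 100mhz clock. The real design has to keep every pulse at or above the 16mhz half period whatever the phase.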
As a side note, I tried using flip flops to sync unrelated clocks. While this works, it takes 2 clock cycles to change clock speed, in which case the CPU will be running at 16mhz when it shouldn't be. So the flip flop idea simply cannot be used. The current circuit is a hybrid of the V1 booster circuit, and while it does glitch, it does not do so in a way that causes problems. Considering it may open the door to faster than 16mhz speed, the glitching is really a very small side effect. Overall the machine will run faster, and that is all I am interested in, without of course glitching to the point of a crash.
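The 2 cycle delay of the flip flop approach can be seen in a small model of a generic two flip flop synchroniser. This is an assumed textbook arrangement, not the exact circuit that was tried: the input is sampled on each clock edge and only reaches the output one edge later.

```python
def two_flop_sync(samples):
    """Model two flip flops in series, clocked together.

    samples holds the asynchronous input value seen at each rising clock
    edge. The returned list is the second flip flop's output after each
    of those edges.
    """
    ff1 = 0
    ff2 = 0
    outputs = []
    for value in samples:
        # Both flops update on the same edge: ff2 takes the old ff1.
        ff2, ff1 = ff1, value
        outputs.append(ff2)
    return outputs


if __name__ == "__main__":
    request = [0, 1, 1, 1, 1]   # speed change requested before edge 1
    print(two_flop_sync(request))
```

The request is captured on one edge but only acted on at the next, so a change arriving just after an edge waits almost 2 full cycles. During that time the CPU is still on the old clock.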
March 10, 2015
A new booster called the V2 was produced as shown below.
All the mods which were hacked onto the previous design have been implemented on this new board. This new V2 design holds a PLCC ROM which can hold 2 versions of TOS. These would be TOS1.04 and TOS2.06. The CPU has been changed for a newer PLCC type which I believe will be a more recent mask and support better overclocking. Currently I have noticed the 8mhz type in the STE can overclock to 16mhz. It is my hope that a later batch of the CPUs marked as 16 or 20mhz will overclock to 32mhz. To date I have overclocked the HC CPU to 32mhz so the speeds are more than possible.
Unfortunately I had an epic run of problems with this board. I later found that there were 2 issues. Firstly, the GALs used were not always of a good spec.
I have noticed various fonts and shades of text across GALs sourced from China. Mostly the printed text is not good, often not 100% straight etc. Some GALs quote 7ns, but in reality these can be 15ns or 25ns parts. As these GALs are discontinued by Lattice, it makes for hard work finding genuine devices. The booster will function with 15ns parts, though it may not function at 32mhz speeds. This is currently an ongoing problem.
As a side note, the V3 design uses an ATF1504 device which is rated around 5ns and is still in production. Though as the V2 is a lower cost design/version, GALs have to be used.
Later I found another problem with the output of the GAL. Mostly it showed up on the ST /AS isolation.
The problem was that when the ST /AS went HI while ROM was being accessed (not using GLUE to access ROM, but the onboard GAL logic), the output oscillated up to 6.2V. Looking at the supply rails and measuring some ground connections, everything was stable. So it was assumed for some time that the GAL had some crazy issue.
I sourced some genuine Atmel 22V10 GALs, but they did not work at all. It turns out, after a long epic investigation, that pin 4 is a power down pin on the Atmel chip. It seems you need a programmer which supports an EXT version of the device in order to disable it. Currently I have given up looking. There is one programmer, the G540 I think, which lists it, but it does not seem to work. So currently there is no way to actually disable that feature, making the Atmel GAL somewhat useless.
I did manage to swap some pins around to avoid using pin 4 (pin 1 was not actually used so a small re-wire was done to swap the pins). Though after all that, I had exactly the same problem.
So I went back and re-measured all the ground points on the board to the motherboard ground. To my amazement, there was a 2V spike on one of the GND pins on the CPU. Considering it was connected directly to the other GND pins (just 1cm away) this seemed impossible.
Though there it was. While ground spikes are not new to me, I did not expect this to actually happen on this PCB.
Looking closer into the PCB layout below shows the problem.
All the blue area is copper fill. The highlighted pins on the PLCC package are the GND points on the CPU. There are actually 2 GND pins at the top and 2 at the bottom of the CPU.
The top copper fill is a nice large area where there are no noise problems. However, the bottom 2 GND pins have a 2 volt spike on them. Below I plotted the GND path to show it more clearly.
While the route is only about 1cm long, it generates enough resistance to cause a 2V spike. I hardwired some cable on the back of the PCB and routed a lot of the GND points, and even some 5V rails direct to the CPU, but this did not help. Really if anything, the extra length of cable just makes the problem worse.
Looking at my PCB design files more closely, it looks like while I was having problems fitting the design into 4 layers, the auto router broke apart my ground polygons :( Mostly this is unavoidable as there are so many signals being routed in such a confined space that all 4 layers had to be used for routing. That design ended up with around 800 vias!
So after spending 9 hours straight trying to route the PCB better, I did have some limited success. Though it is questionable whether even that design would work. It will "probably" work, but that does not sound like a good investment.
So currently I have converted the design over to 6 layers. Now the entire bottom layer is a GND plane, which is the ideal way to do things. The extra signal layers make for better routing and reduced the number of vias to about 60.
The only problem here is cost. Even moving from 2 layers to 4 layers doubled the cost of the PCB. Moving to 6 layers doubles the cost again. Some quick workings out suggest the PCB would end up costing around $40-$60 each if based on a low quantity of about 10 PCBs. Moving to 20 or more, they do get a fraction cheaper, but not much.
If 32mhz is going to be an option, then the 6 layer PCB is the only viable PCB design to use. Without a solid ground plane, it would be simply asking for trouble. So this now poses a problem as to whether this booster project is going to be financially viable to produce.
March 11, 2015
I tweaked the GAL code for the 16mhz switch to see what happens with GLUE decoding TOS and results are pretty close!
Below is the GAL decoding of ROM at 16mhz. Followed by GLUE decoding at 16mhz.
Looking at things like GEM WINDOW, 149% vs 164%, that is a 15% difference, which is a fair drop in speed. Overall there is only about 2% speed difference.
The GLUE still runs at the same 8mhz speed. Though as found out in previous tests, GLUE issues DTACK reasonably fast. So we can run the CPU at 16mhz during GLUE ROM access and the CPU picks up the bus a lot faster. Technically TOS is still running at 8mhz speeds.
So I revisited my 32mhz tests and was shocked to find that the 32mhz CPU now gives a really nice speed boost. I have not yet got 32mhz TOS working. But the below image is 16mhz CPU vs 32mhz CPU. This is a really nice surprise.
April 2, 2015
While converting the V1 design over to GAL code, I found some oscillation problems.
The image on the left is the motherboard 8mhz clock, via a 100R resistor to the CPU. The ST boots fine.
The right image is going via a GAL (really just used as a buffer) and it generates an oscillation around 70mhz. Without the 100R resistor in series between the GAL output and the CPU, the ST will not boot. The resistor helps in reducing the ringing, which is still pretty bad, but the ST will boot. I noticed on the T25 accelerator that a 330R resistor was used in the CPU clock line. I am not fond of using higher than 100R in series with the clocks as it starts to reduce the voltage. So I am currently hoping the V2 design with a proper GND and 5V layer will help reduce or even cure the ringing problem.
Currently I am looking into a small snubber to clean up the waveform. The oscillation is around 70mhz so it shouldn't need much to clamp it down.
I did a quick test on using the READ cycle of the CPU to also switch to 16mhz speeds as shown below.
We have 16mhz CPU, 16mhz ROM in first image, then 16mhz WRITE cycle on second image.
Most figures get around 8% speed boost. Overall 5% speed boost.
That was an STFM running with the blitter, so this confirms the blitter is working well on this new design also.
I will be doing a proper PCB for this new booster GAL version. I am going to call this the V1.5 booster. The actual booster won't be much different in price to the V1 design, BUT please note, you will also need the ROM adapter kit, so the overall price of the V1.5 design will be higher than the V1 design. The V2 design will be much faster and will include blitter speed ups, faster ROM decoding, and TOS206. I am still working towards faster speeds with the V2 design so I hope V2 will see 32mhz speeds. Though of course as that PCB is 6 layers, the cost of that kit will be a lot higher. The V3 design is still in progress, but one step at a time :)
A sneak preview to the V2 PCB is shown below.
Compared with the previous V2 design, this board is "angled" to miss the MMU socket, so that my MMU 4MB kit does not get in the way of this new PCB layout. Also, as 4 layers turned into a disaster with over 1,000 vias, this new design uses 6 layers with only about 60 vias. The power rails have been turned into solid polygons to prevent the GND issues which happened on the previous V2 design. The layer stack has the bottom layer as 100% ground, and with only 60 vias, they hardly impact the GND layer at all. The top layer is 100% +5V. So this creates a solid supply rail right across the board, which rules out spikes & oscillations in the supply rails. The inner 4 layers are all signal routing.
April 20, 2015
These boards are the new V1.5 boosters. I designed a DIP & SMT version as I wanted to use the DIP one for debugging and such. The SMT version would be the final version. Saying that, there is no reason the DIP version can't be used on the MEGA's as they do not have as much of a height restriction as on the STFM.
Strangely, these boards work when going into 16mhz mode on WRITE cycles on the bus, but do not work in 16mhz mode on a READ cycle. Even so, as mentioned before, it does give around 5% speed boost so I can't complain. More testing needs to be done to make sure it's stable running like that. Though as it will run GemBench fine, that's normally a good indication things are fine.
GLUE of course is still doing the decoding at 8mhz. Though we can enter 16mhz on ROM access which gives GEM a really nice speed boost. On the V1 booster, Display got a 25% speed boost. On this new booster, Display gets a 56% speed boost. So a gain of 31%.
56% is of course just the average. We have an 83% speed boost on blitting and around 60% on VDI tests.
How the new ROM adapter and V1.5 booster looks installed :)
April 21, 2015
After testing out some floppy drives, I found out that the RW speed up mode caused 2 bombs when formatting a floppy :( This is a bit strange as the RW mod would work perfectly with GEMBENCH but not when formatting a floppy. So must be the DMA circuit or something related does not like that speed up.
So I have removed the RW mod from the V1.5 booster code as it is not stable. Though it's not a huge loss in speed, about 3%. So it's not too bad :)
May 7, 2015
Work on the V2 booster isn't going well :( It seems any delay in the /AS signal, even just 10ns, is enough to cause GLUE to go crazy. Attempts to solve this have not been successful. I have tried syncing /AS in relation to the 8mhz clock, the CPU clock and the 16mhz clock, and it just makes matters worse. Overall TOS will boot, but every time it accesses the floppy drive, it will crash. Well ok, sometimes it will boot and even run GemBench. But it is not stable at all.
Doing some more comparison tests, so far there seems to be some slow down on the CPU which I have so far been unable to resolve. Integer division drops to 192% and Average speed ends up at 142%. So currently the V1.5 booster is actually faster. The V2 should be giving 150% Average results as that is what I have found before. But I suspect that as it's not stable anyway, that is the cause of the slow down.
I have been asking around for help with a solution to the problem, so will see if anyone else can suggest something. Currently it seems GLUE expects very accurate timings, and 10ns either side of what it is expecting just won't work. The strange thing is the MSTE uses bus isolation the same way as the V2 booster does, and of course the MSTE works. Though possibly the MSTE uses an updated GLUE which may not suffer from such timing issues.
So is this the end ? Well hopefully not. I think rather than fighting against the GLUE, just use it to decode TOS104 as normal. It may make access to TOS a fraction slower, but in previous tests it was only a few % so its not a big deal. I can tweak the GAL chip to disable GAL DTACK when TOS104 is selected, and enable GAL DTACK when TOS206 is selected. This way it will / should, function correctly without having to isolate and delay the /AS signal. So unless someone comes up with some other solution, that is how it will have to be.
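The DTACK plan above boils down to a simple selection rule. The sketch below uses hypothetical signal names and plain Python booleans rather than real GAL equations, so it only shows the intended behaviour:

```python
def dtack_source(rom_access, tos206_selected):
    """Decide which chip should answer a bus cycle with DTACK.

    rom_access:      True while the CPU is addressing TOS ROM.
    tos206_selected: True when TOS206 is the active ROM image,
                     False when TOS104 is selected.
    Returns "GAL", "GLUE" or None (not a ROM access, so not covered here).
    """
    if not rom_access:
        return None
    # TOS206 is decoded on the booster, TOS104 is left to GLUE as normal.
    return "GAL" if tos206_selected else "GLUE"


if __name__ == "__main__":
    for tos206 in (False, True):
        name = "TOS206" if tos206 else "TOS104"
        print(f"{name}: DTACK from {dtack_source(True, tos206)}")
```

With this split the ST /AS line never needs to be isolated or delayed for TOS104, which is the part GLUE was unhappy about.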
The next phase of tests will be to get the board working on 32mhz CPU & TOS. TOS104 will probably not gain a huge boost in speed as the GLUE working at 8mhz will still issue DTACK. With TOS206, with GAL decoding, we can really make use of full 32mhz access and even 32mhz CPU alone really makes a difference!
Hopefully there will be something better to report in the coming weeks :)
So time to expand the html layout :)
July 14, 2015
Holger Zimmerman has been kind enough to help solve the /AS isolation problem. I was so close to the solution but never quite got there :( What Holger suggested was using /BG to actually isolate the ST /AS line, whereas before I was isolating it purely on changes to /AS itself. Using /BG isolates the signal sooner and solves no end of problems. So currently the V2 is actually working :)
The hold up on the V2 now is that I am trying to get 32mhz working. I had a simple GAL hardwired on top of a DIP CPU a few months ago and got 32mhz out of it. It ran for a few hours, then I broke something trying to add 32mhz access to ROM. Though at the time, it was using the iffy V2 code so random problems were expected.
Though 32mhz on the V2 itself causes a row of bombs on power up. As to why the V2 isn't working, when my mock test did, I have no idea. I have not had time to spend on this yet as I have been busy working on other things at the moment. It is looking likely that 32mhz just isn't going to be possible with simple hardware :(
My Veloce STE has a 020 CPU. So I did some basic benchmarks on it vs the V1.5 booster. The 020 is faster but the results are a little confusing. Overall the instruction cache on the 020 seems to greatly increase some functions. Anyway, screenshots below..
These are 020 CPU on a STE with Instruction Cache On (Left) and Instruction Cache off (right)
So we can see the 020 Instruction cache does give a huge boost in speed. Though for some reason, with cache off, a lot of tests seem to drop in speed. Visually the tests still look fast, so I am not sure if those results are accurate.
Below is same tests but with the blitter turned off.
Interestingly, integer division seems a lot faster on the 020 CPU.
Just for reference as a blind test between the 68000 16mhz and the 020 16mhz..
Again, Integer division is throwing out the average results somewhat. Overall, it seems the 020 CPU is around 35% more efficient than the 68000 running at 16mhz. When the Instruction cache is turned on, the 020 really gives another massive boost in speed.
THE 020 BOOSTER PROJECT NOW HAS ITS OWN PAGE HERE Maintained by Rodolphe.
August 9, 2015
Current V2 booster design. This version currently runs TOS104 & TOS206 at 16mhz. Also CPU runs at 16mhz while not accessing the bus. Speeds are about the same as the V1.5 booster.
August 13, 2015
Not much progress on this so far :( I have been unable to re-create any 32mhz tests currently, when I had it working a few months ago. I can only assume it is because I have since changed motherboards and every motherboard behaves differently. So it has resulted in further delays with this project :(
Myself and Rodolphe have been working on getting the CPU running at 16mhz all the time and using a "state machine" to emulate 8mhz bus cycles. I have been asking around on various sites & Amiga forums the past few weeks for help with this, and while most people did not even reply, the ones that did were not particularly helpful. So basically we are having to reinvent the wheel due to lack of support from various people and communities :( This has greatly delayed the progress of our booster projects :(
Due to all the hackery on V2 and V1.5 boards during the past couple of weeks, I have decided to create yet another booster PCB. Only this one is intended for use as a development system. This one will use an ATF1502 (or 1504 if it needs more design space) and a PLCC CPU, and that is it! I want to route every signal (other than the actual bus) through the ATF PLD so we can alter any signal or timing we want. Using Jtag programming will greatly speed up making changes to the physical hardware. Also every pin on the board will have its own header so we can easily connect a logic analyser. Currently I have several sockets soldered on top of each other for various mods to multiple signals, plus the LA in a socket on top of this huge mess, and I have to dismantle the thing to reprogram the GAL each time. It just isn't viable to keep working like that. So while it will take a couple of weeks to get this new design to real hardware, I think in the long run it will save a huge amount of time.
Rest assured that while there may not seem any progress for weeks at a time, that we are both constantly working on these booster projects :) We will not rest until we break the 16mhz barrier and push past the 32mhz barrier :) So I have been told the PLCC 68000 is capable of 80mhz. While I am sceptical of this, 32mhz is pushing the CPU somewhat already. On the Amiga forums people suggest 50-60mhz is possible. Though there is no way to currently confirm these claims and no way to know if they were stable.
Ultimately I think 32mhz is a reasonable goal. I also want to add fast-ram so RAM / ROM & CPU will run at 32mhz. This will make one ultra fast ST! Of course if the CPU can clock higher then we will of course push it to the limits :)
August 14, 2015
This is the proposed draft Dev-System booster. We have a huge ATF1508 with not 1 single IO free. Plenty of ceramic and bulk capacitance. Every IO signal (excluding the bus) goes via resistors to clamp down on bad signals and oscillations. PLCC CPU so a TTL or HC 68000 can be used. 4096 ROM to hold TOS1.04 and TOS206 (like V2). Jtag programming header. Every CPU pin has an optional pull up SIL resistor array. Some pins need pull ups, some CPU lines might need stiffer pull ups, so that's all now totally covered. Every CPU and IO pin has a header for easy connection of logic sniffers. There is an oscillator for testing out of sync clocks and finally the expansion header which Rodolphe proposed for the V3 booster. The expansion header is there with future upgrades such as IDE and Fast-ram in mind.
At the time of typing, I need to add a few more signals, and do PCB layout. I hope I can have this design finished by end of next week :)
August 16, 2015
Final PCB layout for the dev-system. Now to get a couple manufactured so we can start work properly :)
September 18, 2015
This board has everything in sockets, so it is easy to swap out/change pull ups. Also every IO pin (other than address & databus from CPU) has a 100R resistor to reduce the nasty ringing on signals. They are also in sockets (DIL ones) so they can be changed easily or swapped out for other values.
4 layer PCB, middle 2 layers are VCC and GND. It inherently has about 915pF of capacitance, so this will help with HF bypassing across the board. A couple of bulk capacitors and a lot of de-coupling caps all over the place. May be overkill, but I'm seriously not wanting to end up buggering about on this board.
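For a feel of where a figure like 915pF comes from, the VCC and GND planes can be treated as a parallel plate capacitor. The sketch below is only an approximation. The board area, plane spacing and FR4 dielectric constant are assumed example values, not measurements of this PCB.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def plane_capacitance_pf(area_mm2, gap_mm, er=4.5):
    """Parallel plate estimate of the capacitance between two planes.

    area_mm2: overlapping copper area in square millimetres (assumed).
    gap_mm:   dielectric thickness between the planes (assumed).
    er:       relative permittivity, 4.5 being a typical FR4 value.
    """
    area_m2 = area_mm2 * 1e-6
    gap_m = gap_mm * 1e-3
    return EPSILON_0 * er * area_m2 / gap_m * 1e12


if __name__ == "__main__":
    # Example only: a 100mm x 50mm plane pair spaced 0.2mm apart.
    print(f"{plane_capacitance_pf(100 * 50, 0.2):.0f}pF")
```

The example dimensions give a value just under 1000pF, the same order as the 915pF quoted for this board.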
4096 ROM like on V2, TOS104 & TOS206 on board. 68HC000 Freescale CPU & ATF1508 PLD (1504 will probably be enough though for now) Jtag programmable so I don't have to keep faffing about removing chips in and out the board every 5 seconds to reprogram them. The ATF has awesome amounts of IO capability and speed. So no more "it won't fit in the 22V10" type annoyances.
Across the front is access to every IO line on the board for easy connection to a Logic Analyser. I got tired of spending time bodging sockets on sockets to have the GALs removable and LA pins on top, so this is all as simple as it gets now. They are all labeled up as to what signal is on what pin, which removes another annoyance of having to keep referring to the schematic and working out what signals are on what pins. So all these improvements mean we mean business about making progress this time ;)
The unused header at the top of the board is for the expansion bus. That will come later and will offer things like fast-ram, which will run at whatever speed the CPU is running at (my aim is 32mhz but I hope to push higher eventually). So far I have been struggling to get past 16mhz, but I really think 32mhz is solvable in due time. At least on this board I don't have to worry about signal noise or regulation issues on the supply rails. Fighting signal noise is mostly what I seem to have been spending time on with past boards, rather than actually developing the code with Rodolphe.
Hopefully I will have some more news in the not too distant future. Updates may be slow, but rest assured these boosters are still being worked on :)
December 6, 2015
Finally got a chance to start working on this project again. Above is a small mod link which needs to be done to the dev-system board.
December 9, 2015
I backtracked a little to the V1.5 booster as I was hacking it in to fit in the STE. I also tried 32mhz input and it works! Well just for CPU boost only. Images above show STE with 16mhz CPU Vs 32mhz CPU only (no fast-TOS in this test) and blitter enabled.
ROM access does not work at 32mhz :-( From what I can tell, when the CPU sets /AS LO there are 3 states before the CPU checks for /DTACK. At 32mhz, this is 32ns per cycle and 16ns per "state". So 3 CPU states = 16 x 3 = 48ns. The ROM path has two delays: 18ns in the GLUE (from /AS LO to ROM_CE LO) plus 45ns for the ROM IC itself, so the best possible ROM speed works out at 45+18 = 63ns. As the CPU checks DTACK after 48ns, and we need to take away 18ns for the GLUE delay, the ROM must have its data ready in 48-18 = 30ns. Our total ROM delay is 63ns, so we are 63-30 = 33ns too slow :-( . 63ns is as fast as the ROM circuit can decode, so 63ns = 16MHz!
It is not totally true that the ROM top speed is 16mhz. If the CPU runs at 16mhz, then we have 3 states at 32ns = 96ns. So with the ROM needing 63ns, nothing is unhappy. In fact after some thinking, the maximum possible ROM speed would be about 25MHz! I hope to try a small circuit to delay DTACK by one 32mhz cycle (32ns) to see if this fixes the "slow" ROM access speed.
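The timing budget above can be written down as a small calculation. This is only a sketch of the reasoning: the 45ns and 18ns figures come from the text, the 3 state window is the assumption stated there, and the function names are made up. Exact clock periods are used (31.25ns at 32mhz), so the numbers land slightly under the rounded 48ns and 96ns figures.

```python
ROM_IC_NS = 45.0      # ROM access time quoted in the text
GLUE_DELAY_NS = 18.0  # /AS LO to ROM_CE LO delay quoted in the text
ROM_PATH_NS = ROM_IC_NS + GLUE_DELAY_NS  # 63ns total


def dtack_window_ns(cpu_mhz, states=3):
    """Time from /AS going LO until the CPU checks /DTACK.

    One state is half a clock cycle. The text assumes 3 states.
    """
    cycle_ns = 1000.0 / cpu_mhz
    return states * cycle_ns / 2


def max_cpu_mhz(path_ns=ROM_PATH_NS, states=3):
    """Fastest clock at which the window still covers the ROM path."""
    return states * 1000.0 / (2 * path_ns)


if __name__ == "__main__":
    for mhz in (16, 32):
        window = dtack_window_ns(mhz)
        verdict = "ok" if window >= ROM_PATH_NS else "too slow"
        print(f"{mhz}mhz: {window:.1f}ns window vs "
              f"{ROM_PATH_NS:.0f}ns ROM path -> {verdict}")
    print(f"limit without wait states: about {max_cpu_mhz():.1f}mhz")
```

On these assumptions the limit comes out just under 24mhz, in line with the "about 25MHz" estimate, and it shows why 32mhz ROM access needs wait states or a faster decode path.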
December 10, 2015
So more tests with 32mhz. First time ever I have 32mhz TOS :) Well, not exactly 32mhz, as we had to put some wait states in there. Though possibly we can speed it up further in the future. So the good news is TOS now runs faster alongside the 32mhz CPU and ROM ACCESS now has a whopping 43% speed boost! So overall, 11% boost on CPU, 21% on display. Overall 18%! GEM window sees a 28% speed boost, so while the results may not seem too impressive, there is a sure worthwhile speed boost there! I suspect ROM could be running at 16mhz here, but it's early days. The issue currently is the blitter seems unhappy to run with 32mhz TOS. So the above results are with the blitter disabled. So the next task is to figure out why the blitter is unhappy...
December 11, 2015
FINALLY!! Now I have 32mhz CPU with 32mhz ROM (with waitstates :( ) and now with the blitter working! The blitter issue is actually the same issue I had with the V1 booster and the blitter in the STFM. The V1.5 booster on its own doesn't have the issue. Though for some reason while the CPU clock outputs 8mhz with 8mhz on the blitter from the master clock, there is at least 7ns delay between the clocks and the blitter simply does not like it. So I linked the CPU clock to the blitter clock and now the blitter is a happy bunny again :) Most probably with a faster PLD this 8mhz problem won't be there.
So, with the blitter we gain some huge boosts in speed. Display is now 165% faster, CPU is the same of course. GEM Window sees 123% speed boost. All good.
.. and below is the current WIP STE :)
A nest of wires, but progress at long last :)
December 12, 2015
So after much thinking, about 30ns of WaitState will give full speed access to TOS (or at least as fast as can be). So on the left we have about 68ns delay on ROM access, and on the right about 36ns delay. My mathematical guesswork: 36ns delay + 18ns GLUE delay = 54ns total. So that's about 18mhz.. I think... In any case, with 45ns ROMs this is right on the upper limit of what speed is possible.
So now we have gained an additional 25% average, 5% CPU and 34% on the display. GEM window sees a 36% increase in speed. Awesome :)
Now more good news...
The ROMs are 45ns, and that was when using CE. Now, using OE only, I can run twice as fast. When I tried this before I got nothing, not even a white screen at power up. So most probably, now that I have linked the blitter clock to the CPU and solved that issue, it has also solved another ROM issue :) So on the left we have 32MHz CPU and approx 18MHz ROM. On the right, 32MHz CPU and 32MHz ROM :)
So now we gain another 18% on Display, another 13% on CPU, overall 17% boost! VDI text sees a 35% speed increase! So it's clear that every MHz counts! :)
December 17, 2015
I did notice some time ago that if the blitter is run at 16mhz when the CPU programs its registers, then it gives about 2% speed boost. Though while testing the blitter today, I noticed that while the register code is not in this GAL, I still had a 1% speed up.
On the left is the blitter on the 8mhz clock, and on the right the blitter linked to the CPU clock. The CPU only runs at 32mhz when accessing TOS or when CPU_AS = HI. But I assume here that when the blitter accesses TOS (CPU_AS=HI AND ROM_CS=LO, IE 32mhz mode), the blitter will actually read TOS a fraction faster.. I don't get why ROM speed goes up 1% as the blitter shouldn't have any effect on that test.. But clearly some tests gain 1% extra speed this way.
So with the CPU talking to the blitter at 32mhz (if the code switches to 32mhz when the CPU talks to the blitter), it is possibly going to gain 4% there (it was 2% at 16mhz with this method). Also as I now notice a 1% speed boost, that overall should give a 5% boost once the register addresses are added into the 32mhz switching code. Not a huge boost, but it is interesting that so far the blitter seems happy with a 32mhz input.. at least in this test.
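The switching rule described above can be sketched as a truth table. The names follow the signals mentioned in the text, but the function itself is hypothetical and ignores all timing, so it is only a picture of the intent rather than the GAL code:

```python
def cpu_clock_mhz(cpu_as_high, rom_cs_low, fast_mhz=32, slow_mhz=8):
    """Pick the CPU clock for the current bus state.

    cpu_as_high: True when CPU_AS = HI (CPU not driving a bus cycle).
    rom_cs_low:  True when ROM_CS = LO (TOS ROM is being accessed).
    The fast clock is used in either case, the slow one otherwise.
    """
    return fast_mhz if (cpu_as_high or rom_cs_low) else slow_mhz


if __name__ == "__main__":
    for as_high in (False, True):
        for rom_low in (False, True):
            print(f"CPU_AS HI={as_high!s:5} ROM_CS LO={rom_low!s:5} "
                  f"-> {cpu_clock_mhz(as_high, rom_low)}mhz")
```

With the blitter clock tied to the CPU clock, any state that selects 32mhz here also clocks the blitter at 32mhz, which may be where the extra 1% comes from.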
December 21, 2015
Falcon on the left, ST 32mhz on the right. GEM Window gets to 15 then crashes, so maybe that was a bad idea on my part to up the windows from 10 to 20 :-\
May 2, 2016
Some small progress. I have tweaked the STFM V1.5 booster design to be used in the STE. Hopefully this can run at 32mhz on all STE's. The board on the left is the main CPU booster. This solders into the STE CPU area and the new CPU plugs into the board. The PCB on the right is the new Fast-Tos board which is switchable between TOS1.62 and TOS2.06. Both boards have been carefully designed on 4 layers to work at 32mhz speeds.
In development next for the STE is the below design..
This is a prototype which will initially be tested on the STFM. The V2 STE BOOSTER will also be 32mhz like the V1, but it will also have IDE and Fast-Ram running at 32mhz.
I would imagine the Fast-Ram will be easy to get working. Though I have no experience in IDE circuits. I have adapted Putnik's IDE circuit which is tried and tested. Though I have made some changes to do-away with the monostable and other TTL chips. I have ported his code over to a GAL which I use and can fit the design in that I want. Though at this time it is untested. I don't know when the above design will be done as I am backlogged and swamped with several other hardware developments currently. The STE V1 booster will be my main area of development.
May 16, 2016
Prototype STE booster PCB's arrived :) I hope to do some testing over the next week.
Also I got side tracked into trying a 020 CPU mashed up with the V2 booster for ROM decoding as this requires TOS206.
Benchmarks are really interesting. As I was comparing TOS104 to TOS206, there will be a few % difference anyway. That in part explains some scores being below 100%. Though it's likely the 020 logic isn't working optimally (yet). While this is still at 8mhz, the Integer Division shows a clear boost in speed even with the cache disabled on the hardware. I think I read somewhere in the 020 datasheet that it had been optimised over the 68000. So this would make some 3D stuff run a lot faster.
On the right image, the cache was enabled and we can see we have a nice speed boost of around 20% as expected.
The blitter isn't working on this test, though I hope to fix that on a later build. I also hope to jump into high gear and get this working at 16mhz. It will be really interesting to see what the 020 CPU can deliver over the raw 16mhz power of the 68000. While the 020 cache gives around 20% boost at 8mhz, it can likely push 40% when running at 16mhz.
I have designed a new 020 prototype PCB. This is similar to the V2 booster, but with a 020 CPU. I don't know when work will continue on this project due to lack of funds lately :(
I either have a faulty 020 CPU which only sometimes works with the cache disabled, or a fake 020 CPU. The CPU on the left works well, the CPU on the right works really badly. Looking how some text is missing on the right image, plus the different print on the heatsink pad itself, I am going to assume its some sort of fake CPU. Or at the very least, faulty.
May 17, 2016
The new STE booster isn't stable at 32mhz. In fact it only boots at all when the scope probe is on the CPU clock line. Otherwise it's just a black screen. Considering the V1.5 booster was hacked into an STE and worked fine, and now a proper PCB does not work, this got me wondering. A closer look at the motherboards themselves shows I have used 2 revisions. The V1.5 booster was using an early STE where the TTL chip was on top of another chip in front of the RAM. The second board has the 3 chips soldered into the board, indicating it was a later revision.
So I looked more closely at the relationship between the 32mhz and 8mhz clocks and found the early STE uses a 74S257. The later board uses a 74F257.
F on the left (does not work), and S on the right (works)
The relationship between the 32MHz and 8MHz clocks is slightly different. With the F series IC, the 8MHz and 32MHz RISE are exactly in sync. With the S series, the FALL is exactly in sync. I suspect loading the CPU clock with the scope probe is delaying the clock just enough for it to start working again.
Looking at the datasheets, the F series has about 5ns delay, the S series about 10ns delay. Considering the skew is about 5ns, that is likely what the problem is. I will swap the ICs around and see what happens...
So changing the IC itself did not make any difference. The timings are exactly the same as before, so a big WTF there, as that chip is the 32MHz buffer IC.
So I found that the scope probe on the CPU clock pin makes it work. This is odd. In fact just 25pF in its place works.
There is a 1 volt glitch which seems the likely cause. While the capacitor would slow down the clock overall a fraction, it also affects the glitch pulse. In fact I also used some diodes and resistors to drop the voltage and that also worked. The glitch would seem too fast to cause any issues, though it's possible the CPU can run at speeds of 50MHz or over. So the CPU may well "see" the glitch as a clock pulse, which then stops the machine booting.
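To put those delay figures in proportion, here is a rough sketch in Python. It only uses the approximate 5ns and 10ns datasheet values quoted above plus the clock frequencies; it is plain arithmetic under those assumptions, not a measurement from the board.

```python
# Rough timing arithmetic for the 74F257 vs 74S257 clock buffer.
# The delay values are the approximate figures quoted in the text,
# not measurements taken from the board.

def period_ns(freq_mhz: float) -> float:
    """Length of one clock period in nanoseconds."""
    return 1000.0 / freq_mhz

F_DELAY_NS = 5.0    # approximate propagation delay, F series
S_DELAY_NS = 10.0   # approximate propagation delay, S series

skew_ns = S_DELAY_NS - F_DELAY_NS       # difference between the two parts
period_32 = period_ns(32)               # one 32MHz clock period
period_8 = period_ns(8)                 # one 8MHz clock period
skew_share_32 = skew_ns / period_32     # skew as a fraction of a 32MHz period

print(f"32MHz period: {period_32} ns, 8MHz period: {period_8} ns")
print(f"Skew of {skew_ns} ns is {skew_share_32:.0%} of a 32MHz period")
```

A 5ns shift is small against the 8MHz clock but a noticeable slice of a 32MHz period, which is why the buffer type looked like a plausible suspect.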
I have seen that pulse before and tried for a long time to cure it on the STFM boosters. Though with 16MHz switching the pulse isn't generally there, so it's not a problem. In fact even with the pulse, 16MHz generally works. I suspect the pulse is some odd problem internal to the GAL logic itself. While adding the capacitor works, I'm not happy with that solution. Next I will try to add a buffer IC which has some good hysteresis to clamp down on such false signals.
The only ICs I had to hand were the ones I used on the STM ROM boards, so I used one of those PCBs and a buffer chip (actually an AND gate) and hacked that into the CPU clock line. The image on the right shows the 1V spikes totally gone! Of course I do see some under/overshoot there which wasn't there before, though with the wiring not being ideal... etc.
I've also changed the data & address SIL arrays for 2.2K ones. Likely I will add those onto the next revision as I have had issues on the STFM with values under 2K. While the STE uses 4.7K and 10K, I'm not totally happy about leaving that to chance.
And not so finally..
Benchmark vs a stock STFM, with and without blitter.
May 30, 2016
Been driving myself nuts trying to work out how to run the CPU faster than 32MHz these past days. I finally got something working at 38MHz. It seems the CPU might not be able to operate faster than that. I have got some more CPUs arriving to see if they work any faster. Though currently the speed gains are not huge. Likely I need to run in multiples of 8MHz, so 40MHz or higher would likely be the next "sweet spot" for running faster. Though it's unknown at the moment.
Meanwhile, here is the result of the 38MHz benchmark vs the current 32MHz CPU & ROM.
Nothing terribly interesting other than Integer Division of course gets a huge boost in speed (well 20%). Note the above benchmark is in comparison to 32MHz speeds, not a stock machine! While it seems a waste of time, one thing to realise is that if this was the 020 or 030 CPU then the internal caches would be running faster as well. It's clear that every MHz counts, though after 32MHz, at least so far, there isn't much to "gain" in a basic 68000 system. With the caches running faster, likely those results would see slightly higher figures, maybe around 10%. Again, this is 10% above 32MHz speeds.
In relation to a stock machine, Integer Division jumps from 355% to 428% (a jump of 73 points). Overall we see a 15% speed boost, not bad! This is only a 6MHz jump, but 15% is nothing to sniff at! Also, since the 020 CPU can run on a 3 clock bus cycle, there is no reason why that can't be put to good use: 3 clocks gives a boost over 4 clocks, and with a few MHz on top it will all add up!
I really want to hit 50MHz, though if the CPU is going to max out at 38MHz then it doesn't seem likely :( I wonder if the PGA package would overclock better, though without building up another circuit to try, it's hard to know for sure. With time and funds hitting an all-time low I'm not sure what the best direction to go in is.
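The percentages here are easy to mix up, since some are relative to a stock machine and some are relative to the 32MHz booster. A small Python sketch of the arithmetic, using only the figures quoted above (the helper name `relative_gain` is just for illustration):

```python
# Relating the benchmark figures quoted in the text.
# 355% and 428% are Integer Division scores relative to a stock machine
# at 32MHz and 38MHz respectively.

def relative_gain(before: float, after: float) -> float:
    """Gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0

point_jump = 428 - 355                   # difference in stock-relative points
int_div_gain = relative_gain(355, 428)   # gain of 38MHz over 32MHz
clock_gain = relative_gain(32, 38)       # raw clock increase

print(f"Integer Division: +{point_jump} points vs stock")
print(f"Integer Division: +{int_div_gain:.1f}% vs 32MHz")
print(f"Clock speed:      +{clock_gain:.2f}% vs 32MHz")
```

So the 73 point jump against a stock machine and the roughly 20% boost against 32MHz are the same result seen from two baselines, and it sits close to the raw clock increase, as you would expect for a test that barely touches the bus.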
UPDATE July 12, 2016
I had a thought that maybe when the CPU hits speeds of over 38MHz it could be re-reading DTACK from the previous cycle, as DTACK is linked to the 8MHz clock. So the GAL was changed to include DTACK in the switching. Speeds dropped overall by about 50% and 38MHz was still the upper limit :( So it again looks like the CPU simply maxes out at 38MHz. I have also tried without the Fast-ROM board just in case it was the ROM maxing out, but it made no difference. I also tried 7ns and 20ns GALs and that made no difference. I have also tried upping the speed during integer division and it bombs also. As the CPU doesn't access anything on the bus during that test, it's pure CPU clocking. As it crashes, I think it's pretty conclusive that the CPU can't run any faster.
I did look into the SEC CPU, as people claim 80MHz speeds are possible. Though unfortunately it only has 2 wire bus arbitration and the ST uses 3. I don't see any easy way to fix that. It would be interesting to see what higher speeds do. Though I think moving to a 020 or 030 CPU is a better option. The 040 CPU is awesome, but unfortunately it doesn't support 8/16bit modes without more support logic. Jumping to the 020 and 030 shouldn't be too much of a logic change.
I think the STE booster is pretty much maxed out now. Nothing more can be done there. While I could possibly develop a small clock board for people to try their luck at overclocking, I'm not sure it's worth investing the time into it just to gain a few more %. Though if anyone wants to experiment with higher clock speeds, they can design their own board. A resistor-set oscillator (RSO) like the LTC1799 or LTC6905 is likely the only solution. Fixed oscillators are ideal, but they only come in a limited range of frequencies, so some sort of programmable oscillator is the only likely way to try arbitrary speeds.
The STFM and 32MHz still needs solving. The code for the DEV-System still needs fixing. This should be able to run the V2 booster code, but the new ATF PLD is causing some issues which I am yet to solve. This is also holding up progress in general. My aim is to run a 030 CPU at 32MHz, at least short term. Likely those CPUs are 50-60MHz capable, so the ultimate aim could maybe be 64MHz. Of course with my time so limited lately I am not sure when work on the boosters will continue.
Solving the ATF PLD coding issues and getting the STFM V2 booster working on the new ATF PLD is the next step. This could take some months, simply due to lack of time on my part. Hopefully 32MHz can be made to work on the STFM around the same time. While I did design and even get PCBs made for a new prototype V2 booster, it's physically too large to fit. The V2 uses 3 smaller GALs, so things can be squashed up to fit more easily. With a larger PLD, offering higher speeds and more IO power, the design becomes too large. Overall it's likely a 32MHz STFM booster will not get developed. It would be interesting to see if the STE V1 booster would work in a STFM, though with adapter boards hard to find, it's probably not something I will ever try out.
Rodolphe has been working on a 020 booster with a Xilinx IC. Though this needs VHDL programming which I know nothing about and simply don't have time to learn. Rodolphe doesn't know much about it either, so I am not sure how that project will turn out. He was working on the sync issues with syncing a higher speed clock back to the bus to run faster than 32MHz speeds. Though with our time being more and more limited of late, I am not sure if it will get to a more final prototype design anytime soon. I am still trying to solve the ATF PLD issues (which are the main reason Rodolphe abandoned the ATF devices), as I am more familiar with GAL PLD programming. The ATF PLD is more than capable of doing what I need, it's just a matter of working out why it doesn't behave. I hope over the next several months I will have time to work on this again some more.
July 19, 2016
A bit of a backwards update here, in relation to a topic which was talked about a couple of years ago about speeding up the ST-RAM: http://atari-forum.com/viewtopic.php?f=15&t=27088&start=275#p264418
The long and short of it all is basically cutting the 16MHz input to the MMU and feeding in 32MHz. Then everything "down the line" gets double clocked. This also needs faster ST-RAM! Then the GLUE needs some work to fix some video issues.
The MFP should get 4MHz, PSG gets 2MHz, CPU gets 16MHz, MMU gets 32MHz, Glue gets original 8MHz.
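As a quick way to see which chips actually change speed in this mod, here is a small Python sketch comparing the clock plan above with a stock machine. The modded figures are the ones listed above; the stock MFP and PSG values (4MHz and 2MHz) and the stock 8MHz CPU clock are assumed from the standard ST design rather than stated in this entry.

```python
# Clock plan for the double-clocked MMU mod versus a stock machine (MHz).
# MOD_MHZ comes from the list in the text. STOCK_MHZ is an assumption
# based on the standard ST clocking, apart from the 16MHz MMU input and
# 8MHz GLUE clock, which the text mentions directly.

STOCK_MHZ = {"MMU": 16, "CPU": 8, "GLUE": 8, "MFP": 4, "PSG": 2}
MOD_MHZ = {"MMU": 32, "CPU": 16, "GLUE": 8, "MFP": 4, "PSG": 2}

def speedup(stock: dict, mod: dict) -> dict:
    """Ratio of modded clock to stock clock for each chip."""
    return {chip: mod[chip] / stock[chip] for chip in stock}

ratios = speedup(STOCK_MHZ, MOD_MHZ)
doubled = sorted(chip for chip, ratio in ratios.items() if ratio == 2.0)
unchanged = sorted(chip for chip, ratio in ratios.items() if ratio == 1.0)

print("Doubled:  ", doubled)
print("Unchanged:", unchanged)
```

Under those assumptions only the MMU and CPU are doubled, while the GLUE, MFP and PSG have to keep their original clocks, which means their feeds must be taken from a point that still runs at the original rate.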
The GLUE still running at 8MHz means some delays on DTACK generation at higher speeds. As proven with the V2 booster, GAL decoding is a fraction faster. So it would be interesting to try this mod with a faster ROM speed. 32MHz access to ROM with 16MHz RAM speeds should be pretty awesome. Though with the STFM refusing to be reliable at 32MHz currently, that needs solving first to make this mod worthwhile.
Once the MMU part is speeded up, the MMU triggers the display twice as fast, so the screen area actually seems to become a 64K block! I guess the effective resolution would be 640x400. Though I'm not sure about that.
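The "64K block" figure can be sanity-checked with some simple arithmetic. This Python sketch assumes the standard ST screen modes (each 32000 bytes per frame) and simply doubles the amount of data fetched per frame; it says nothing about what the shifter would actually display.

```python
# Screen memory arithmetic for the double-clocked MMU.
# The three stock modes are the standard ST resolutions; the doubled
# figure just assumes twice as much data is fetched per frame.

def frame_bytes(width: int, height: int, bitplanes: int) -> int:
    """Bytes needed for one frame of planar video data."""
    return width * height * bitplanes // 8

stock_low = frame_bytes(320, 200, 4)    # low resolution, 16 colours
stock_med = frame_bytes(640, 200, 2)    # medium resolution, 4 colours
stock_high = frame_bytes(640, 400, 1)   # high resolution, mono

doubled = 2 * stock_low                 # data fetched per frame after the mod
guess_640x400 = frame_bytes(640, 400, 2)

print(f"Stock frame: {stock_low} bytes, doubled: {doubled} bytes")
print(f"640x400 with 2 bitplanes: {guess_640x400} bytes")
```

Doubling the stock 32000 bytes gives 64000 bytes, which matches the roughly 64K block seen. A 640x400 screen would use exactly that much only with 2 bitplanes, so the 640x400 guess is at least consistent with the numbers.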
Once the GLUE modifications are completed, the video is restored.
My plan currently is to try and resurrect that machine and try with GB6 to get a better idea what the speeds actually are. Likely I will see if I can add some buffering to the clocks on the board, or just replace them all with the MEGA buffer board as that has multiple clock outputs already available.
If faster ROM decoding was there, as offered by the V2 booster, then a small boost should appear in a lot of the tests. Ideally 32MHz would be the thing to go for. Considering 32MHz alone can push 363% overall with RAM running at stock speed, all the 32MHz benchmarks should almost double in speed, as ST-RAM would be running at double speed. I would expect to see something around 500%-600%. With integer division not really using RAM, such tests are unlikely to see much of a speed boost.
I will likely look back at the V2 booster design and see if I can work out why it doesn't work at 32MHz. It's actually similar to the STE booster, just the STE booster doesn't have ROM decoding. So the design itself should operate at 32MHz. So much work to be done as always... Likely the 020 and 030 booster would be pushed to compete with such a booster. Though faster than 32MHz speeds on the 030 are yet another problem which needs a lot of time to finish fixing.
November 13, 2016
Work is still ongoing on various aspects. Currently I am working on a next-gen STE booster which will include an IDE port that is compatible with the 32MHz speeds. It seems, for whatever reason, general IDE designs do not work with the 32MHz speeds. So I am trying to learn how IDE works to include it on the next design. Also there will be a Fast-RAM expansion port, where 32MHz access to Fast-RAM will be possible. Currently, due to lack of time, I am held up with the IDE side of things. I need to design a new PCB to try the design out, but it could be some months before I get time for that.
Again, no work on the STFM in solving the 32MHz issues, so mostly I am working on the STE. I think it's a problem of CPU signals all arriving at times that the CPU doesn't like when it's running at higher speeds. The STE seems to have better control and, as mentioned before, I have run up to about 40MHz variable without issues. So I need to look into why the STE behaves and the STFM does not. Likely this boils down to various timings with /AS and /DTACK. Solving this has turned into a huge epic of chaos and it's something I just don't have time to look into currently.
I have also pretty much given up with the ATF series of PLD. While it's awesome for simple stuff, when circuits get complicated I just have no idea how to convert my circuit ideas into code, so it has become an uphill battle fighting the circuit and the code. So I have started with an Altera device which is a LOT more expensive, but it allows me to "draw" a circuit and the Altera compiler works out how to program the code for the IC for me. So that will solve the coding issues I have had since day 1. Because I am starting out with a new IC, it also means I have to start over from scratch with the booster technology.
As my time is limited and there are very few people willing to help development, it's possible the next-gen STE booster will be the last booster I produce. I have had in mind to try the 68SEC000 CPU as people claim 50MHz-80MHz is possible, though it's yet another design to do, yet another PCB, even more debugging and work... Not only that, I was also interested in moving to the 020 and 030 CPU and moving to 32bit FAST-RAM and TOS. As mentioned before, I have dabbled in the 020 already, so the 030 shouldn't be too much of a problem. Though with so many design issues to solve, I am thinking it's just not realistic for one person to design this stuff from start to end. I am so busy with so many other Atari projects lately that I think I am just going to have to drop working on these booster projects in the near future. Work has been ongoing for over 3 years now and just not enough progress is being made. I think time would be better spent on projects which are more easily completable.
I do plan to finish the IDE and Fast-RAM addon for the next-gen booster. Though with not having much time to work on it, it's likely going to be several months before I even get time to produce a prototype. This of course depends on funds generated from the current V1 design. With myself funding so many projects, my webshop isn't producing enough funds lately to recover the costs of building those projects, never mind funding new developments. So time and funds have become a huge issue over the past year. My webshop currently has about £12,000 worth of stock I need to sell yet, with a projection of £20,000 once most current projects are completed. This is also excluding the huge piles of PCBs, parts and machines I have purchased for various projects which haven't seen the light of day yet. I'm not a rich person and just can't keep laying out huge sums of cash on every project. So I think some serious changes will have to happen over the next year.
December 23, 2016
Last update of 2016! A year gone already!
I was lucky to obtain a PAK020 board so I did some quick tests to see what it does. This board has a 020 CPU and an FPU. Though the FPU isn't really interesting to me as it's unlikely any software even uses it on the STFM. This PAK020 seems to use a clock doubler to double the 8MHz clock to 16MHz and some clever logic to keep it in sync with the 8MHz clock. What is interesting is the 020 can also run in 32bit mode, so ROM access is actually 32bit, not the normal 16bit. This means ROM speed doubles just going from 16bit to 32bit alone. More on that later though. The 020 CPU also has an internal instruction cache which gives a fair speed bump in most tests.
Results below are the PAK020 running with the cache on and off.
Left = Cache OFF, Right = Cache ON. (I disabled the cache on the PCB itself so GB6 still reports cache on but it actually isn't)
Display tests see a 33% speed boost so that's pretty sweet. Int-div gets a boost from 453% to 553% so the cache works well there also.
ROM speed seems a lot lower than I would have expected for 32bit. But still a really good speed jump.
Below I overlay the STE booster running at 32MHz vs the 020 (with cache off to make it a fair comparison) which is running at 16MHz.
What's interesting is the 16bit 32MHz STE ROM access can easily beat the 020 16MHz 32bit access. With the 020 having more instruction cycle efficiency we can see the PAK020 easily beats the STE 32MHz booster. Overall there isn't that much difference between the 2 benchmarks.
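For a rough feel of how these two ROM paths compare on paper, here is a Python sketch of theoretical peak bus throughput. It assumes the 4 clock 68000 bus cycle and the 3 clock 020 bus cycle mentioned earlier, with no wait states unless stated; the real PAK020 and STE booster timings are not known here, so these are upper bounds and not measured figures.

```python
# Theoretical peak bus throughput, ignoring wait states and refresh.
# Assumes a 4 clock bus cycle for the 68000 and a 3 clock bus cycle
# for the 020; actual board timings may well be slower.

def peak_mb_per_s(clock_mhz: float, bytes_per_cycle: int, clocks_per_cycle: int) -> float:
    """Peak transfer rate in millions of bytes per second."""
    return clock_mhz * bytes_per_cycle / clocks_per_cycle

ste_32mhz_16bit = peak_mb_per_s(32, 2, 4)     # 68000, 16bit bus, 4 clocks
pak_16mhz_32bit = peak_mb_per_s(16, 4, 3)     # 020, 32bit bus, 3 clocks
pak_one_wait = peak_mb_per_s(16, 4, 4)        # 020 with one extra clock

print(f"STE 32MHz 16bit:          {ste_32mhz_16bit:.2f} MB/s")
print(f"020 16MHz 32bit:          {pak_16mhz_32bit:.2f} MB/s")
print(f"020 16MHz 32bit + 1 wait: {pak_one_wait:.2f} MB/s")
```

On paper the 32bit path should be ahead, and a single extra clock per cycle is enough to bring it level with the 16bit 32MHz figure. That would fit with the measured ROM speed being lower than expected for 32bit, though whether wait states are the actual cause on the PAK020 is only a guess.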
The 020 only really pushes ahead with its instruction cache enabled, which is to be expected. The results vary from about 20-40% additional speed boost depending on test.
What caught my eye was that the 020 fell behind on RAM access by 5%, so it actually runs slower than a stock machine! This likely explains why the 020 struggles to match 32MHz 16bit speeds. I first thought it was down to some bug in the 020 logic or something slow with the GALs used. Though sanjyuubi said the following to me...
I actually looked on the few first cycles after reset on A600 with 68000 and 68020. 68020 performed more reads than 68000. Normally those additional reads ends up in the cache, but if you disable cache, you're wasting bus cycles for instruction fetch if the bus is 16-bit (doesn't matter on 32-bit bus since it's always one cycle), and even more if the bus is 8-bit. I noticed this first time when I ran 68020 on 7MHz in A600 without cache and fastram, sysinfo said that this CPU is a bit slower than 68000 and Jim Power game was stuttering. 68020 with cache enabled is always faster then without, even if the memory have no waitstates, nothing surprising since CPU doesn't have to waste 3 clocks to read from memory.
So really the 020 simply isn't designed to run with the cache disabled. While the 020 is more cycle efficient in some cases, it also wastes more cycles than the 68000 with the cache disabled. That to me really seems like a bug in the 020 design as the CPU shouldn't be wasting cycles in that manner, but oh well, that's just my opinion :)
At some point next year I want to try pushing the 68000 to 64MHz. I maxed out at 38MHz on the STE, but there is an odd version of the 68000 where the Amiga people claim 50-80MHz is possible. Though that CPU wires up almost like a 020 does, so it's not a simple swap by far. Currently I am trying not to get sidetracked into 020 designs until I have totally exhausted all the possible 68000 designs.
As with other aspects of the boosters and the problems yet to solve, it's taking up a lot of time, and over the past year I have had hardly any time to work on such issues. It's got even more complicated since I want to move over to an Altera device, but that means I am going to have to totally reinvent my previous booster designs and learn about a new device and programming software. I am not sure when I will get time to do all that, so I think it's unlikely any new designs will appear over the next year.