

Posted: Fri Dec 29, 2017 12:32 pm
by exxos
Tag RAM is impossible to find these days, it's why I look for other solutions.

SRAM is easy to find overall, it just tends to be 3.3V these days.

I wonder how the MSTE solves DMA type accesses in relation to the cache ?


Posted: Fri Dec 29, 2017 6:05 pm
by rpineau
Regarding caching and DMA/Blitter access, on the MSTE the cache gets invalidated every time another device takes the bus (/BR, /BG, /BGACK).


Posted: Fri Dec 29, 2017 6:48 pm
by Petari
Well, some people talked about testing that cache reset (invalidate) when the blitter activates. So far nothing.
The same can be done for disk DMA too. It seems that I will need to do it myself ...
Fill the cache by reading some area multiple times, then blit there, or load some other data pattern there from disk, then compare it using the CPU. If the cache is reset, it will read the real RAM content. If not, the CPU will see the old content, which is only in the cache.
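
The test described above can be modelled in a few lines before anyone writes the real 68000 code. This is only a toy simulation: the class `CachedRam`, its `invalidate_on_dma` flag and the 0xA5 pattern are made up for illustration and do not describe the actual MSTE hardware.

```python
class CachedRam:
    """Toy model: main RAM plus a read cache that may or may not be flushed on DMA."""

    def __init__(self, size, invalidate_on_dma):
        self.ram = [0] * size
        self.cache = {}                      # address -> cached value
        self.invalidate_on_dma = invalidate_on_dma

    def cpu_read(self, addr):
        if addr not in self.cache:           # miss: fetch from RAM and fill
            self.cache[addr] = self.ram[addr]
        return self.cache[addr]

    def dma_write(self, addr, value):
        """Blitter or disk DMA writes RAM directly, bypassing the CPU."""
        self.ram[addr] = value
        if self.invalidate_on_dma:
            self.cache.clear()               # whole cache flushed when the bus is taken


def cache_is_coherent(machine, start=0x100, length=16, pattern=0xA5):
    # 1. Fill the cache by reading the area several times.
    for _ in range(3):
        for addr in range(start, start + length):
            machine.cpu_read(addr)
    # 2. Let another bus master overwrite the same area with a new pattern.
    for addr in range(start, start + length):
        machine.dma_write(addr, pattern)
    # 3. Read back with the CPU and compare against the new pattern.
    return all(machine.cpu_read(a) == pattern for a in range(start, start + length))


print(cache_is_coherent(CachedRam(1024, invalidate_on_dma=True)))
print(cache_is_coherent(CachedRam(1024, invalidate_on_dma=False)))
```

On real hardware the same three steps apply, with the blitter or the disk DMA doing step 2.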


Posted: Fri Dec 29, 2017 9:01 pm
by Cyprian
Petari wrote:
Fri Dec 29, 2017 6:48 pm
some people talked about doing tests about that cache reset (invalidate) - when blitter activates. So far nothing
Me. Unfortunately my free time is very limited (you know, kids, full-time job...), but I shall do them.


Posted: Mon Jan 01, 2018 7:09 pm
by exxos
I have been looking at SRAM..

Basically falling back on the 1MB 5V 16bit chips.. ... 2bHpnt8%3D

Which would of course cache 1MB of RAM... I think as games will mostly be using ST-RAM, a cache of the first 1MB should be fine.. Anything using more than 1MB of RAM should really be some application, and really it should be running in alt-ram..

But in any case, even just for a test, 1MB is absolutely fine..

Problem then becomes I need a 17th bit to store cache "hit or miss" data.. So the cheapest one is a 1MB 8bit chip... ... ELx29nE%3D ... PeIdM5A%3D

Really it probably needs more like 10ns for the 17th bit, as we need to read this bit very fast, to intercept CPU_AS faster.. I am not really sure if 45ns will be fast enough... When the CPU sets address strobe low, the PLD logic will look at the 17th bit to see if it is set; if so, ST_AS will be isolated to prevent the MMU seeing the RAM request cycle.. But of course address strobe will be low for that 45ns time period.. I doubt this would cause a problem, but I would be happier with a much faster RAM..

It will be a long time before I get a chance to design this PCB.. But if anyone else can design it with these chips then it will save me a lot of time.. It will need a large PLD, so likely I will stick with the ATF1508..

As mentioned before, the logic is very simple, or should be.. Basically just write to the cache RAM during CPU write cycles. During CPU read cycles, ST_AS will be isolated from the motherboard when the 17th bit is set high, indicating a cache hit. So the CPU will read from the cache at higher speed and will not request ST-RAM.

Of course if the 17th bit is low, indicating a cache "miss" (aka no data loaded yet), then the CPU accesses ST-RAM as normal and the 17th bit is updated afterwards.
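
As a rough sketch of that logic, here is a small software model of a write-through cache with one valid ("17th") bit per 16-bit word. The class and method names are invented for illustration, and setting the valid bit on a CPU write is an assumption, since the post only says the bit is updated after a miss.

```python
class FullRangeCache:
    """Toy write-through cache covering the whole range, one valid bit per word."""

    def __init__(self, words):
        self.data = [0] * words       # the 16-bit data SRAM
        self.valid = [False] * words  # separate SRAM holding the hit/miss bit
        self.st_ram_cycles = 0        # how many cycles actually reached the MMU

    def write(self, st_ram, addr, value):
        # CPU write cycle: ST-RAM and the cache are written in parallel.
        st_ram[addr] = value
        self.st_ram_cycles += 1
        self.data[addr] = value
        self.valid[addr] = True       # assumption: a write also marks the word valid

    def read(self, st_ram, addr):
        if self.valid[addr]:
            # Hit: ST_AS is held off, so the MMU never sees this cycle.
            return self.data[addr]
        # Miss: normal ST-RAM cycle, then cache data and valid bit are updated.
        self.st_ram_cycles += 1
        self.data[addr] = st_ram[addr]
        self.valid[addr] = True
        return self.data[addr]


st_ram = [0] * 8
st_ram[3] = 0x1234
cache = FullRangeCache(8)
cache.read(st_ram, 3)        # miss, goes to ST-RAM
cache.read(st_ram, 3)        # hit, served from the cache
print(cache.st_ram_cycles)
```

Nothing in this model ever clears `valid`, which is exactly the gap raised in the following paragraphs.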


Actually just realised that the 17th bit would need a global reset, to reset every single address to clear the cache.. So this would actually be a bit of a problem I think :roll:

The only possible workaround is that when something else is accessing the bus, another PLD would go through every single address and clear the hit or miss bit.. But I fear this could take some time to do even at 32MHz :roll: ..

1 second = 1,000,000,000ns, and 1,000,000,000 / 45ns (SRAM speed) = 22,222,222 addresses per second. 1MB = 1,048,576 addresses, so around 0.05 seconds to clear the address range :roll:
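
The estimate can be checked quickly. This is just the arithmetic from the post, assuming one SRAM access of 45ns per valid bit; the per-word figure is an extra assumption (one valid bit per 16-bit word instead of per byte address).

```python
def clear_time_seconds(entries, sram_cycle_ns=45):
    """Time to visit every entry once, one SRAM cycle per entry."""
    return entries * sram_cycle_ns / 1_000_000_000


ONE_MB = 1024 * 1024                               # 1,048,576 byte addresses

per_byte = clear_time_seconds(ONE_MB)              # one valid bit per byte address
per_word = clear_time_seconds(ONE_MB // 2)         # one valid bit per 16-bit word

print(f"{per_byte:.4f} s per byte address, {per_word:.4f} s per word")
```

Either way the clear takes tens of milliseconds, which is far too long to hide inside a single bus grant.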

If only we had a PLD with a million single bit latches... I am not sure technology is quite there yet :lol: :roll:

So I think that there is no solution to this problem :( at least I looked into it, but I guess the only way to speed up is to overclock the MMU. I think this would ultimately be a lot faster, and is likely a lot easier as well anyway...


Posted: Tue Jan 02, 2018 6:32 am
by keli
I don't think you'll need a valid bit for every byte of cached memory. You could for instance have larger cache lines. That would impose a bigger delay on initial cache misses but then more would have been fetched on subsequent reads.
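
To put rough numbers on the cache line idea: with one valid bit per line, the count of bits to clear drops with the line size, while every miss has to fetch a whole line. The helper names and the line sizes below are only examples, not a proposal from the thread.

```python
def valid_bits_needed(cache_bytes, line_bytes):
    """Number of valid bits when one bit covers a whole cache line."""
    if cache_bytes % line_bytes:
        raise ValueError("cache size must be a multiple of the line size")
    return cache_bytes // line_bytes


def words_per_line_fill(line_bytes):
    """16-bit bus accesses needed to fill one line after a miss."""
    return line_bytes // 2


ONE_MB = 1 << 20
for line_bytes in (2, 16, 64):
    print(line_bytes, valid_bits_needed(ONE_MB, line_bytes), words_per_line_fill(line_bytes))
```

The trade-off is visible directly: a 16-byte line needs 8 times fewer valid bits than a 2-byte line, but costs 8 word fetches on each miss.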


Posted: Tue Jan 02, 2018 8:31 am
by Petari
I think that clearing the cache tags can be done smarter: when the reset signal happens, you just set a global "cache off" HW register - 1 bit. That will result in not accessing the cache at all, so the CPU works slower. At the same moment, start clearing the tag cache, and when it is finished, you can set the cache on again. So, no need to wait for the complete tag clear. Indeed, the problem is worse with a larger cache.
The question is whether there is any sense in using such a large cache, like 1MB, in some ST? I don't think that it will really benefit. We still need to solve copying of data written to the cache back to main (slow) RAM with some system. Actually, if you clear the cache, you need to ensure that what is written there, and not yet copied to main RAM, will be copied - and that may cause a much bigger delay than clearing the tags. So, you will need not only 1 bit for 1 byte, but probably 2 - the second will be on for data in the cache waiting to be written to main RAM.
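
The "switch the cache off, clear in the background, switch it back on" idea from this post can be sketched as a small state machine. Everything here (the class `TagClearer`, the `tick` method standing for one SRAM cycle) is a hypothetical model, not PLD code.

```python
class TagClearer:
    """Toy model of clearing valid bits in the background while the cache is off."""

    def __init__(self, entries):
        self.valid = [False] * entries
        self.cache_enabled = True
        self._next = None             # next entry to clear, None when idle

    def request_flush(self):
        # Another bus master touched RAM: disable the cache at once, start clearing.
        self.cache_enabled = False
        self._next = 0

    def tick(self):
        """One SRAM cycle of background clearing."""
        if self._next is None:
            return
        self.valid[self._next] = False
        self._next += 1
        if self._next == len(self.valid):
            self._next = None
            self.cache_enabled = True   # all bits cleared, cache usable again

    def is_hit(self, addr):
        # While the clear is running every access is treated as a miss.
        return self.cache_enabled and self.valid[addr]


tags = TagClearer(4)
tags.valid = [True] * 4
tags.request_flush()
for _ in range(4):
    tags.tick()
print(tags.cache_enabled, any(tags.valid))
```

The CPU never waits for the clear itself, it just runs at uncached speed until the last bit is done. A real design would also have to keep new valid bits from being set while the clear is still running.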

Doing a new MMU, designed for much faster main RAM - we can simply use some SDRAM module, or even DDR - there is a lot of it around, and it is very fast. Then there is no need for any cache for the CPU. The RAM is fast enough for video - and of course for some better video modes than the 32KB ST modes. And it would be accessible from the CPU bus without any slowdowns, while the CPU runs at 50 MHz for instance. Well, that would be the case with very fast DRAM. With synchronous DRAM there are some waits when accessing it out of order. All this is not simple.
The best solution would be using SRAM (static RAM). Then it can be really ultrafast, without waits, without cache. I remember using 32KB SRAM from an old hard disk instead of DRAM in a Spectrum. It was in 1992. Now we have much bigger SRAMs in the same place: hard disks. So, I will look at what I can get, at least for some tests - in the dead hard drives lying around here. Something like 4-16 MB would be great for a new ST design. Still, you need a new MMU, but it will be simpler with SRAM than DRAM.

P.S. On a 200 GB Maxtor there is a Samsung K4S281632 ... - 128Mbit SDRAM, which would be 16 MBytes. Not exactly SRAM. Datasheets available.
On a 3GB Seagate drive there is a Winbond W24 1024AJ - that's SRAM, but only 8-bit, 128KB, 15nS.
So, it seems that they went to SDRAM in newer drives.


Posted: Tue Jan 02, 2018 11:33 am
by exxos
The cache could be turned off, yes, but as address ranges are of course basically random, there would be no way to really clear the whole area.

The only possible workaround would be that each time an address is read from the cache, the hit bit is automatically cleared. But this would likely be a fair slowdown, as the same address would be read 50% from ST-RAM and 50% from cache. Whereas normally, if an address was read multiple times, it would always be read from the cache. So this really defeats the object of having a cache, as it isn't really giving a good speed up.
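
The 50% figure is easy to confirm with a tiny model of one address being read over and over. The function name and the `clear_on_hit` flag are made up for this sketch.

```python
def hit_ratio(reads, clear_on_hit):
    """Fraction of reads served from the cache for one repeatedly read address."""
    valid = False
    hits = 0
    for _ in range(reads):
        if valid:
            hits += 1
            if clear_on_hit:
                valid = False     # workaround: the hit bit is cleared after each cache read
        else:
            valid = True          # miss: data fetched from ST-RAM, hit bit set
    return hits / reads


print(hit_ratio(100, clear_on_hit=True))
print(hit_ratio(100, clear_on_hit=False))
```

With clear-on-read the accesses simply alternate miss, hit, miss, hit, so at best half of them are fast.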

The operation of it is very simple, as already explained. It is a lot easier to cache the entire address range, as it does not need any special logic for address matching etc.

Basically reads and writes happen in parallel to the processor. But if there is a cache hit, then it will be read from the cache and the CPU will not be allowed to set address strobe to the MMU.. So ST-RAM isn't accessed. That is basically all it needs to be.

There are no other problems other than clearing the cache hit/miss bits. But SRAM has no reset pin to do this :( CTRAM is small and hard to find these days.

I think time is better spent on overclocking the MMU initially. Of course if someone created a new faster MMU then we could have SRAM running at 32MHz with the CPU.


Posted: Tue Jan 02, 2018 12:39 pm
by Petari
I looked at SRAM prices. 4MB, 16 bit (so, good for some ST clone as main RAM), 55nS is about 20 Euros. The 10nS version is about 80 Euros, which is too much indeed.
Overclocking the MMU - but then it will break the whole video generation - it will increase the horizontal freq. of video and everything else. And of course, it is questionable how much OC is possible.
55nS SRAM instead of some 200nS DRAM means roughly 4x faster RAM. So, it will be good for a 32 MHz 68000 + video. No need for cache (and no sense in it) - there will be no wait states for the CPU when accessing RAM. 62.5 nS would be the RAM cycle then - 1 for CPU (bus), 1 for video. But only with some extended video mode. For the usual ST resolutions only every 4th video access cycle is needed - then it will read 2 bytes every 500nS - and only when DE is active.
This could allow, for instance, disk DMA to access RAM in ST video modes without slowing the CPU at all. Same for the blitter - but I don't think that someone will make a new, superfast blitter clone.
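
The slot figures above can be reproduced with a short calculation. It assumes the 62.5nS comes from splitting a minimum 4-clock 68000 bus cycle at 32 MHz into one CPU slot and one video slot; that reading, and the helper name `ram_slot_ns`, are assumptions for illustration.

```python
CLOCKS_PER_BUS_CYCLE = 4      # minimum length of a 68000 bus cycle, in CPU clocks


def ram_slot_ns(cpu_mhz, slots_per_bus_cycle=2):
    """Length of one RAM slot when each CPU bus cycle is split into equal slots."""
    clock_ns = 1000 / cpu_mhz
    return clock_ns * CLOCKS_PER_BUS_CYCLE / slots_per_bus_cycle


slot = ram_slot_ns(32)                  # one CPU slot + one video slot per bus cycle
video_slot_period = 2 * slot            # a video slot comes round once per bus cycle
st_fetch_interval_ns = 500              # from the post: 2 bytes every 500nS
video_slots_per_fetch = st_fetch_interval_ns / video_slot_period

print(slot, 55 <= slot, video_slots_per_fetch)
```

Under these assumptions a 55nS part fits inside each slot, and standard ST modes use only one video slot in four, which leaves the other three free for something like DMA.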
And we are again at changing a lot of custom chips. The ST design is just too well optimized - you can not just change one component to get some significant speed gain. A cache like the one in the MSTE is one of the exceptions, but it is not that simple - there are at least some 6 chips involved - counting PAL chips too, of course.
What is sure is that RAM speed was the main bottleneck for almost everything - video modes, CPU clock, blitter speed, and even partially for disk DMA max speed. That's best visible with the Falcon, where there is a slowdown of the CPU in higher video modes.
So, I say that faster RAM is the first thing in boosting.


Posted: Tue Jan 02, 2018 12:49 pm
by exxos
Have you seen the MMU speed up that troed and I have been working on?


You can retain normal video modes etc., but troed was looking more at using the extra resolution, which is also possible. But for now, you can simply speed up MMU RAM access without breaking the video.. It needs a little work, but it is possible.

Scroll down near bottom, July 19, 2016.

That is a 200% overclock of CPU & MMU. If we had a new MMU using SRAM, then we could easily get 400% speeds. I also talked with troed about bandwidth to the shifter; we could create a new shifter to support more colours and higher resolutions.. but there is a lot of work to do.. It is being worked on.. I just have little time to work on my shifter replacement.. but we need a new MMU first anyway.