HDD drivers. How does it work?


Moderator: Petari

fenarinarsa
Posts: 9
Joined: Mon Dec 18, 2017 4:03 pm

HDD drivers. How does it work?

Post by fenarinarsa » Wed Dec 20, 2017 10:47 pm

Hi,

While doing Bad Apple!! I did some research about the HDD/FDD DMA on the ST because I wondered why my disk accesses were so slow.

To be precise, I'm using an UltraSatan with Ppera's drivers (latest version, I think).

The first version loaded frames sequentially from a file. Frames were of variable size, from 2 bytes to 10 kB. This version was very slow.

When reading some recent documentation (very interesting, can't find the link right now), I learnt that DMA transfers data in blocks of 512 bytes, and that to read less than a full block, the block has to be read into a temp buffer and then copied to the final buffer.
I also read about different sector sizes, which make things more difficult.

I also found out that tools like Kobold copy data really fast by using a big buffer. That's why I thought I had to use a bigger buffer and read as much data as I can instead of a few bytes/kB each time.

So the final version reads data in blocks of at least 100 kB. They are not rounded to 512 bytes or to a sector size since I don't have this information, but the loading is WAY FASTER.
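
As a rough illustration of the "few large reads instead of many small ones" idea, here is a minimal Python sketch (not the demo's actual 68000 code). The frame layout is an assumption made only for this example: each frame is taken to be a 2-byte big-endian length followed by the payload. `iter_frames` and `CHUNK_SIZE` are hypothetical names.

```python
import io
import struct

CHUNK_SIZE = 100 * 1024  # large refill size, in the spirit of "at least 100 kB"


def iter_frames(stream, chunk_size=CHUNK_SIZE):
    """Yield frame payloads, refilling the buffer only in large chunks.

    Assumed (hypothetical) layout: 2-byte big-endian payload length,
    followed by that many payload bytes.
    """
    buf = b""
    while True:
        # Refill until the buffer holds a complete length header.
        while len(buf) < 2:
            chunk = stream.read(chunk_size)
            if not chunk:
                return  # end of stream
            buf += chunk
        (size,) = struct.unpack_from(">H", buf, 0)
        # Refill until the buffer holds the whole payload.
        while len(buf) < 2 + size:
            chunk = stream.read(chunk_size)
            if not chunk:
                return  # truncated final frame is dropped
            buf += chunk
        yield buf[2:2 + size]
        buf = buf[2 + size:]  # simple but copies; fine for a sketch


if __name__ == "__main__":
    demo = struct.pack(">H", 3) + b"abc" + struct.pack(">H", 2) + b"xy"
    for frame in iter_frames(io.BytesIO(demo)):
        print(len(frame), frame)
```

The disk is only touched when the buffer runs dry, so most frames are served from memory without a new read request.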

My conclusion is that the driver may do the following:
- transfer several 512-byte blocks in a single ACSI command, which is more effective if the sector size is bigger and the file is not fragmented (but this last requirement may not hold, depending on how TOS handles reading of unfragmented files)
- only use a temp buffer for the beginning and end of the transfer, and transfer the middle contents directly to the final buffer.

But I'm not sure about those either. Note that I did my tests with TOS 1.62 and 2.06.

In addition, I'm wondering whether the driver or TOS uses the blitter when doing memory copies. If so, it might explain some crashes.


Some other questions:

Before I had Ppera's latest driver I used BIGDOS.PRG. This led to a lot of crashes (hangs) during the FREAD call. I don't know why BIGDOS would work 90% of the time instead of 100%.

People using other drivers like HDDRIVER reported crashes on the loading screen. This is also a mystery to me, unless HDDRIVER has some requirements regarding MFP interrupts. I stop all timers for the demo and I use Timer A & B only. I was thinking that if a driver needed an interrupt, it could set it up at the beginning of the transfer.

In addition, Ppera's driver also hung on the loading screen when I tried to make an ACSI HDD image for Hatari. I don't know why, so I just stopped trying and used Hatari's internal HDD emulation instead, but it's not cycle exact :/

I also found out that on a Mega STE at 16 MHz with cache, the FREAD call sometimes crashes.

Petari
Posts: 528
Joined: Tue Nov 28, 2017 1:32 pm

Re: HDD drivers. How does it work?

Post by Petari » Thu Dec 21, 2017 9:02 am

Welcome to this little forum, hope we can have some nice and constructive talk here.

OK, first some very basic things concerning hard disks/mass storage:
They are often called 'block devices', because you need to read a whole sector even if you want only 1 byte. The usual sector size is 512 bytes, as on usual floppies.
The task of the hard disk driver SW is to convert disk access requests from the OS (its filesystem part) into 'physical' accesses to the hard disk (port). The OS does not give exactly that: it gives a sector address relative to the start of the partition. The driver needs to add the starting location of the partition to it. And of course the driver is responsible for timings, initialization of the drive and the like. This was the case with small FAT16 partitions, under 32 MB.
The number 16 is very important here: if you calculate a little, you will see that 32 MB is exactly 512 bytes x 2^16.
Right in the years when the Atari ST was launched, hard disk capacities started to go over 100-200 MB. Creating many 32 MB partitions on some 240 MB drive was not too practical. So BigFAT16 systems were introduced. I don't know exactly which firms did what and when. I will talk here about what DRI made, which is what is used with Atari STs.
Staying with FAT16 for partitions over 32 MB actually needed a simple thing that was already partially in use: so-called clusters. They were there even with floppies. The aim was to make FAT tables smaller. In short: each record in the FAT table is not for one 512-byte sector but for 2 adjacent ones, so 1 KB. That makes the FAT table 2x smaller. It also means more slack on disk, but that's not so relevant, especially with larger files.
So the idea was simply to make larger clusters - in the case of TOS 1.04 they go up to 16 KB. And here we are at a debatable decision of the DRI/Atari programmers.
Why did they need to solve it with so-called 'large sectors'? They stayed at 1 cluster = 2 sectors and enlarged the logical sector size (which was 512 bytes with small FAT16). I think it was simply because they used 16-bit code in the filesystem (maybe the C compiler was not that good?).
In practice that means that BigFAT16 partitions up to 64 MB have 1 KB logical sectors, up to 128 MB 2 KB ... up to 512 MB 8 KB. With that, the OS can access 512 MB partitions with 16-bit sector addressing. But here is the flaw: now the minimal block is 8 KB, not 512 bytes.
It's up to the hard disk driver to take care of proper logical sector sizes, and it must even take care of creating properly sized buffers, instead of TOS.
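
The size table above follows directly from 16-bit sector addressing. A minimal Python sketch of that arithmetic, assuming the logical sector size is simply doubled until 65536 sectors cover the partition (`logical_sector_size` is a hypothetical helper for illustration, not code from any driver or from TOS):

```python
MB = 1024 * 1024
PHYSICAL_SECTOR = 512      # bytes per physical sector
MAX_SECTORS = 1 << 16      # what a 16-bit sector number can address


def logical_sector_size(partition_bytes):
    """Smallest power-of-two logical sector size (>= 512 bytes) that lets
    16-bit sector numbers cover the whole partition."""
    size = PHYSICAL_SECTOR
    while size * MAX_SECTORS < partition_bytes:
        size *= 2
    return size


if __name__ == "__main__":
    for mb in (32, 64, 128, 256, 300, 512):
        print(mb, "MB ->", logical_sector_size(mb * MB), "byte logical sectors")
```

With these assumptions a 32 MB partition still fits 512-byte sectors, while anything above 256 MB and up to 512 MB needs 8 KB logical sectors.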

This is why I recommended block reads of 16 KB or a multiple of it - then on partitions of 512 MB (and smaller ones, of course) the data will always be read straight to the destination. Of course, the first read must be at pos 0 or at a pos that is a multiple of 16 KB.
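
A small Python sketch of how a loader could size and check its reads against that 16 KB rule. The helper names are hypothetical and the numbers come only from the recommendation above:

```python
BLOCK = 16 * 1024  # read granularity recommended above


def round_up_to_block(length, block=BLOCK):
    """Round a byte count up to the next multiple of `block`."""
    return (length + block - 1) // block * block


def is_block_aligned(offset, length, block=BLOCK):
    """True when a read starts on a block boundary and spans whole blocks."""
    return offset % block == 0 and length % block == 0


if __name__ == "__main__":
    wanted = 100 * 1024
    print(wanted, "->", round_up_to_block(wanted))
```

A 100 kB buffer, for instance, would be rounded up to 112 KB (7 x 16 KB), and every read would then start at a file position that is a multiple of 16 KB.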

It is not relevant whether it is a single or multiple sector ACSI read command - btw. with ACSI you can read max 255 sectors at once. The driver needs to read/write the correct number of physical sectors, and that is calculated by a simple multiply (actually a shift left) according to the logical sector size.
In fact it has nothing to do with fragmentation: fragmentation simply results in the OS performing many requests with smaller sector counts instead of one, or a few, with more sectors. The driver only ever gets a request for a contiguous block.
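
The translation described above (add the partition start, shift by the logical sector size, respect the 255-sector limit) can be sketched in Python as follows. This is an illustrative model only: `acsi_commands` is a hypothetical name, and splitting a long request into several commands is an assumption about how a driver might honour the limit, not something stated in this thread.

```python
PHYSICAL_SECTOR = 512
MAX_SECTORS_PER_COMMAND = 255  # limit quoted above


def acsi_commands(partition_start, logical_sector, count, logical_size):
    """Translate a request for `count` logical sectors into a list of
    (physical_start, physical_count) read commands."""
    # logical_size is assumed to be a power-of-two multiple of 512.
    shift = (logical_size // PHYSICAL_SECTOR).bit_length() - 1
    start = partition_start + (logical_sector << shift)
    remaining = count << shift
    commands = []
    while remaining > 0:
        n = min(remaining, MAX_SECTORS_PER_COMMAND)
        commands.append((start, n))
        start += n
        remaining -= n
    return commands


if __name__ == "__main__":
    print(acsi_commands(1000, 3, 1, 8192))
    print(acsi_commands(0, 0, 20, 8192))
```

For example, 20 logical sectors of 8 KB become 320 physical sectors, which this sketch issues as one 255-sector command followed by a 65-sector one.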
TOS 1.00-1.02 can handle max 256 MB partitions, btw.
Using the blitter for memory copy makes no sense - you can get practically the same speed using movem.l with many registers.

Concerning crashes: BIGDOS usually worked fine for me. But it was not very compatible with some SW, including many games. It is actually a filesystem add-on, and that was surely not a simple task.
Hddriver and the usual hard disk drivers rely on the default Timer-C, so 200 Hz freq. on it. So you better not stop it or change its freq. and mode.
Because many programs change Timer-C, I made mine independent of it.
I think it is not a cycle exactness problem of Hatari, but that it does not emulate hot-swap in the case of the ICD extended driver. With Hatari it is best to use the basic ACSI driver instead of the ICD one. I can send it to you for free, just mail me.

My driver always switches the cache off when there is a read on ACSI (DMA), and switches it back on after it's done. So the problem is somewhere else.
There is 2 kind of people: one thinking about moving to Mars after here becomes too bad, the others thinking about how to keep this planet habitable.

fenarinarsa
Posts: 9
Joined: Mon Dec 18, 2017 4:03 pm

Re: HDD drivers. How does it work?

Post by fenarinarsa » Thu Dec 21, 2017 10:37 pm

Thanks for the explanations.

I was aware of the cluster issue, it's the same on FAT32 MS-DOS if I'm right.
So indeed it's better to bound reads to at least 32kB, and the best should be 64kB. But easier said than done :)

Now in the following case:
clusters of 16k
file of 20.1k
If I do a FREAD, what would the driver/TOS do?
1- read two clusters into a temp buffer, then copy 16k+4.1k into the final buffer (pointed to by the FREAD call)
or
2- read the first cluster directly into the final buffer, then the second cluster into a temp buffer and copy 4.1k from it?

I guess if it copies back everything (answer 1) this is a TOS limitation.

It's very interesting that you disable cache on Mega STE. I got everything I could find on this cache and I'd like to see if I can use it in an optimized way. However if you disable the cache, all cached data is automatically trashed so every future read/write in RAM won't use the cache.

And you know what ? I found out that on Mega STE my render functions were a lot slower when a read access is in progress. I thought it was because of the DMA transfers, but now it makes sense, it's surely because you disable cache /o\

There's surely a way of keeping the cache on until the end of the file read but it's not trivial and may force use of the blitter to copy data by the TOS :/ however using the blitter means killing all interrupts during the blitter initialization ~_~

What's ICD driver? Your driver?

Petari
Posts: 528
Joined: Tue Nov 28, 2017 1:32 pm

Re: HDD drivers. How does it work?

Post by Petari » Fri Dec 22, 2017 7:58 am

Clusters are solved differently in DOS BigFAT16. The logical sector size is always 512 bytes, so 1 phys. sector. And the count of sectors in a cluster can be 1, 2, 4, 8 ... up to 256 or even more. That way is better, but it needs 32-bit sector addressing.

Here is an example of how it goes in TOS. The partition is 300 MB, which means that logical sectors are 8 KB. SW sends a request to load exactly 8 KB + 1 byte (at once, so 8193 bytes). TOS will send a request to the driver to load 1 logical sector to the final destination. Then TOS will send a request to the driver to load 1 logical sector to the sector buffer. Then TOS will copy 1 byte from the buffer to the final destination and return the count of loaded bytes to the SW.
The inefficiency is in the need to load 16 physical sectors for only 1 byte (or up to 512). DOS will in that case load only 1 sector for that 1 byte after the 8 KB. This makes the effective speed of disk access some 45% lower. Note that TOS does not need to load a whole cluster in the case of short requests, because it can access 1/2 of the cluster size, which is the logical sector size.
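
The example above can be turned into a small model. The Python sketch below only reproduces the behaviour as described in this post (whole logical sectors go straight to the destination, a trailing partial one goes through the sector buffer) for a read that starts on a logical sector boundary. `tos_read_plan` is a hypothetical name, and real TOS code is of course not structured like this.

```python
PHYSICAL_SECTOR = 512


def tos_read_plan(length, logical_size):
    """Model a read of `length` bytes starting on a logical sector boundary."""
    direct = length // logical_size       # logical sectors read straight to dest
    tail_bytes = length % logical_size    # bytes copied out of the sector buffer
    buffered = 1 if tail_bytes else 0     # logical sectors read into the buffer
    per_logical = logical_size // PHYSICAL_SECTOR
    return {
        "direct": direct,
        "buffered": buffered,
        "tail_bytes": tail_bytes,
        "physical_loaded": (direct + buffered) * per_logical,
        "physical_needed": -(-length // PHYSICAL_SECTOR),  # ceiling division
    }


if __name__ == "__main__":
    print(tos_read_plan(8193, 8192))
```

For the 8193-byte request this gives 32 physical sectors loaded where 17 would have been enough with 512-byte logical sectors, which is in the same range as the roughly 45% figure quoted above.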
Cache disabling on Mega STE: you somehow missed this: "And switches back on after it's done." The cache will be restored to the state it was in at the beginning of the driver code for read/write.

You are right, it goes slower when a disk read is in progress. The DMA chip has absolute priority on the CPU bus - when it transfers RAM content (read or write), the CPU will be stopped for some time. I have a function in my speed test program that shows even that:
http://atari.8bitchip.info/ahpt.html
The very first screenshot there shows that at 1172 KB/sec (the best I got with UltraSatan) CPU power/speed falls to 68%.
But not only the CPU: blitter speed falls too, probably even more.
Since you do a lot of mem. copy with the blitter, you should disable the cache while that happens too. The blitter is actually another DMA chip in the Atari.

'ICD driver' is a hard disk driver that works with the extended ACSI command set established by the firm ICD. I'm surprised that you are not familiar with the term ICD. That firm made the first ACSI-SCSI adapters that overcame the 1 GB capacity limit. UltraSatan and Gigafile follow the ICD protocol - that's why you can access multi-GB cards with them. But as said, Hatari does not emulate it completely.
And some advertising to end with. You should try Steem Debugger. I say that it is much better than Hatari for testing SW, especially some graphics.
I mostly use the old 3.2. It has no code of its own for ACSI emulation, but solves it via Pasti. That is basic ACSI emulation, so max 1 GB. But that's more than enough for testing purposes - except for speed, where it is way too fast compared to real ACSI speeds.

Cyprian
Posts: 59
Joined: Fri Dec 22, 2017 9:16 am

Re: HDD drivers. How does it work?

Post by Cyprian » Fri Dec 22, 2017 9:23 am

Petari wrote:
Fri Dec 22, 2017 7:58 am
Since you do lot of mem. copy with blitter, you should disabling cache while it happens too. Blitter is actually another DMA chip in Atari.

I read on Tobe's site http://www.gnagnaki.net/wiki/guide/atari/megaste/cache (unfortunately no longer available) that the BLiTTER invalidates the cache because it is on the same bus as the CPU. If that's true, there is no need to disable the cache during blitting.
That should be verified on the MSTE.
Jaugar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
SDrive / PAK68/3 / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Steem SSE / Aranym / Saint
http://260ste.appspot.com/

Petari
Posts: 528
Joined: Tue Nov 28, 2017 1:32 pm

Re: HDD drivers. How does it work?

Post by Petari » Fri Dec 22, 2017 10:10 am

The cache does not have to be switched off in all cases. It is just a simple measure to be sure. The problem appears when DMA writes to a RAM area that is cached - and the cache size is 16 KB on the MSTE. The cache is not aware of it, because that write did not come from the CPU, while cache control is tied only to the CPU. So after DMA writes to a cached area, there will be different content in RAM and in the cache. Now, I have not read any concrete docs about this, but since the blitter can work only at 8 MHz in the MSTE, there is no sense in caching blitter RAM accesses, so I think the blitter can have the same effect as the disk DMA chip.
Someone could look at the cache logic on the Mega STE schematic and check whether it has anything to do with the blitter. And of course tests can be made on real HW too.
The MSTE cache will never be emulated 100% accurately - that's just too hard, and not much SW is affected.

As for the driver: it only gets the destination address in RAM where the data from disk is to be written. It cannot know at all whether that area is in the cache or not, so it must always disable the cache.
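
The stale-data problem and the "switch off, transfer, restore" measure can be shown with a toy Python model. This is not a model of the real MSTE cache: `ToyMachine` is hypothetical, it caches single bytes, and it assumes (as mentioned earlier in this thread) that switching the cache off discards its contents.

```python
class ToyMachine:
    """Toy model: RAM plus a CPU-side cache that DMA writes bypass."""

    def __init__(self, size):
        self.ram = bytearray(size)
        self.cache = {}            # address -> cached byte
        self.cache_enabled = True

    def cpu_read(self, addr):
        if self.cache_enabled and addr in self.cache:
            return self.cache[addr]        # may be stale after DMA
        value = self.ram[addr]
        if self.cache_enabled:
            self.cache[addr] = value
        return value

    def dma_write(self, addr, data):
        # DMA goes straight to RAM; the cache is not told about it.
        self.ram[addr:addr + len(data)] = data

    def set_cache(self, enabled):
        # Assumption of this model: disabling the cache drops its contents.
        if not enabled:
            self.cache.clear()
        self.cache_enabled = enabled

    def driver_read(self, addr, data):
        # Cache off, DMA transfer, then restore the previous state.
        previous = self.cache_enabled
        self.set_cache(False)
        self.dma_write(addr, data)
        self.set_cache(previous)


if __name__ == "__main__":
    m = ToyMachine(16)
    m.cpu_read(0)
    m.dma_write(0, b"\x2a")
    print("after raw DMA:", m.cpu_read(0), "RAM holds", m.ram[0])
```

In the model, a raw DMA write leaves the CPU reading the old cached value, while going through `driver_read` makes the next CPU read come from RAM.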

Cyprian
Posts: 59
Joined: Fri Dec 22, 2017 9:16 am

Re: HDD drivers. How does it work?

Post by Cyprian » Fri Dec 22, 2017 10:34 am

Yep, Tobe claimed he checked the MSTE schematic and that the BLiTTER is on the same side of the cache as the CPU, therefore every blitter access goes through the cache.
Worth verifying.

keli
Posts: 62
Joined: Tue Aug 22, 2017 1:34 pm

Re: HDD drivers. How does it work?

Post by keli » Fri Dec 22, 2017 10:51 am

If the cache does some sort of snooping on the bus to automatically invalidate cached data when other bus masters write to memory, it makes sense that the Blitter would have that effect and the DMA not. The blitter drives the bus the same way the CPU would, putting valid addresses on the address wires, etc. The DMA chip does not, however, as it depends on the MMU to do the addressing. As the MMU is by necessity connected directly to the DRAM bus, it can cheat a bit and take a shortcut: it generates the decoded row and column addresses for the DMA transfer directly from its internal DMA address registers, without them ever showing up on the CPU side. This means that any cache watching the CPU bus will never see the accesses and cannot automatically evict any cached data.
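
To make the snooping hypothesis concrete, here is a toy Python model of it. Everything in it is an assumption for illustration: `SnoopingCacheModel` is hypothetical, and whether the real MSTE cache snoops blitter writes is exactly the open question in this thread.

```python
class SnoopingCacheModel:
    """Toy cache that invalidates entries for writes it sees on the CPU bus."""

    def __init__(self, size):
        self.ram = bytearray(size)
        self.cached = {}           # address -> cached byte

    def cpu_read(self, addr):
        if addr not in self.cached:
            self.cached[addr] = self.ram[addr]
        return self.cached[addr]

    def cpu_bus_write(self, addr, value):
        # A bus master that drives the CPU-side address bus
        # (the hypothesis above puts the blitter here): snooped.
        self.ram[addr] = value
        self.cached.pop(addr, None)

    def mmu_side_dma_write(self, addr, value):
        # Address generated inside the MMU, never visible on the
        # CPU side: the cache cannot react.
        self.ram[addr] = value


if __name__ == "__main__":
    c = SnoopingCacheModel(8)
    c.cpu_read(0)
    c.cpu_bus_write(0, 5)
    print("after snooped write:", c.cpu_read(0))
    c.mmu_side_dma_write(0, 9)
    print("after MMU-side DMA write:", c.cpu_read(0), "RAM holds", c.ram[0])
```

In this model a write seen on the CPU bus is followed by a fresh read, while an MMU-side DMA write leaves the old value in the cache.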

Petari
Posts: 528
Joined: Tue Nov 28, 2017 1:32 pm

Re: HDD drivers. How does it work?

Post by Petari » Fri Dec 22, 2017 11:14 am

Why is it that I must do everything myself? :D Looking at the schematic, the blitter CLK is connected to the 8 MHz clock. So it never works at 16 - that's the case only on the Falcon, which has 2x faster RAM. The CPU clock comes from one of the PALs and can be 8 or 16 MHz. Not to mention that there would then have to be a bit in the MSTE config. register to set the blitter to 16 MHz.
Ergo: if the blitter cannot do 16 MHz, there is no sense in caching its RAM accesses. Furthermore, even if it could do 16 MHz, there is no sense with slow RAM. The cache would not add anything. The purpose of the cache is to make the CPU work faster, and the CPU spends a lot of time reading opcodes, constants, addresses etc.
The blitter mostly just transfers from one location to another, so from RAM to RAM in most cases. Immediate cache writes would not speed it up at all. The blitter works more efficiently than the CPU, but with a limited number of operations. There is no opcode fetch (extra RAM access that transfers nothing) in the blitter: once it gets the command to start, it just accesses the source and target RAM areas.
The schematic is hard to follow, and we can't know the PAL logic ... If someone can still find proof that the cache is involved with the blitter, please come and present it. Best would be to back it with some tests on real HW.

Cyprian
Posts: 59
Joined: Fri Dec 22, 2017 9:16 am

Re: HDD drivers. How does it work?

Post by Cyprian » Fri Dec 22, 2017 11:37 am

Actually, I'm going to write some code and check the BLiTTER/cache interaction.
Tobe described very well how the cache works (rows/columns) in the MSTE; unfortunately his site isn't available anymore.
