- The LaST Upgrade -
PART 15 - "BAD DMA" INVESTIGATION
exxos 2014 - Last updated March 4, 2023
DMA fixes have been reorganised into a new page HERE.
This page shows my research into the DMA related problem on the STFM/STE/MEGA computers. Firstly, it is "common knowledge" that some STEs had a "buggy DMA", but so far I have seen no evidence of this. Some STEs were fitted with a -38 DMA, which is assumed to be the "buggy DMA". However, the -38 DMA was also in the STFMs from very early on, and I personally have more issues with the STFM DMA than with either DMA version in the STE.
I was sent THIS article from ST Format (submitted by SteveBagley, thanks!) which claims that the first 100 STEs had a bad DMA. Of course, this makes no sense these days, as pretty much everyone has some form of DMA issue. While it is possible there were indeed faulty machines, I think it is highly unlikely. This type of article seems to have done more harm than good.
I was later sent THIS snippet (thanks mikro!) which claims that fitting resistors on some DMA control lines will solve the issues. Unfortunately this did nothing in my tests. I tried resistors and capacitors on ALL signals, with both versions of the DMA, and it did not solve anything. As to how that information came to light, I have no idea.
So to get to the bottom of these issues I did my own tests and published the progress here on this page. My "fix" last year (2014 fix) solved some people's issues with intermittent booting or random corruption, but it did not solve everyone's issues. In some cases it made things worse for people. The "capacitor fix" as stated in 2014 might not be needed now, as I have basically superseded it with a possibly better fix using a resistor array on the DMA data lines.
I started my work due to random floppy issues which had me baffled for some time. Later I tried the Ultra Satan hard drive by Jookie and had progressively worse problems the longer the machine was in use. During my work with P.Putnik on our fast CF drive we found regular bit pattern errors on the same sectors on our CF drives. As our CF drive pushes the DMA to its limits, we soon found issues. So it is assumed the same bit pattern issues are what cause the floppy drive and Ultra Satan to misbehave.
Mostly this issue starts to show itself with random access problems on the floppy drive. While a bad floppy, bad drive or bad PSU all contribute to the problems, it has become clear that "bit issues" still happen which cause unexpected floppy issues. These issues seem to only appear after the machine has been turned on for about an hour and get progressively worse. It is similar with the Ultra Satan: it tends to start with random character changes in filenames, and pressing ESC to refresh will normally cause more corruption and may appear to corrupt the drive totally, even to the point that the drive no longer boots. But this does not mean your drive is actually trashed. It is more a "read error" while reading the FAT. But of course if you write to the drive during this unstable time, then it is possible to trash your drive.
After much work, I came up with the "SIL Resistor Fix" for the DMA. This simply involves adding a 1K pull up network onto the DMA datalines. I strongly urge everyone to do the resistor fix on the DMA datalines, regardless of whether you have an STFM/STE/MEGA and regardless of DMA version. This is all explained in my results below. After I did the fix all my floppy and Ultra Satan issues went away.
September 1, 2015 DMA FIX & TESTINGS
I noticed over the past year that I started to get a lot of trouble with the floppy drives on my STFMs. At first I was blaming the floppy drive itself, then the actual floppies. Then blaming the floppy drives again. I later tried the suspect floppies on an STE and they worked perfectly. That STE had a -38 DMA, in case anyone's wondering. I do not use my STE machines much, but there could be a similar floppy problem on them also.
So for those who are having these floppy issues (it may affect hard drives as well) I would firstly update your PSU capacitors with a kit from my STORE or buy a refurb PSU from me. The DMA circuit is the first to start failing if the PSU has gone bad. Swapping the PSU for another one, or buying a "new one" from elsewhere on the Internet will NOT work!
Secondly, please get a head cleaning kit like THIS ONE . Or get some IPA spray and a cotton bud and clean your drive heads that way. These drives are 20+ years old and NEED cleaning! Also make sure the floppy you are using is actually reliable first. Either try a known working ST or get someone else to verify the floppy is good. A lot of coverdisks are floating about being sold as "blanks", these generally are really cheap disks and probably do not work reliably any longer.
So once you have done those 3 things above, if you are still having issues then you can try the mod below. Trying this mod with a bad PSU or dirty floppy drive will undoubtedly fail. So please don't skip the steps above.
Above are the mods I tried. Let me explain what's what...
Above is the RDY signal on the DMA. On the left is the original signal, and on the right a modded version. Clearly there is 2V of undershoot, and it's amazing that this did not blow the DMA chip, as normally such an undershoot would kill most ICs. Anyway, I lifted the RDY signal pin 38 out of the socket and pushed a 100R resistor into the socket, then soldered the other end of the resistor to the now bent up DMA pin. I also added 50pF from the DMA pin to 0V to help clamp the signal better. In fact, judging by the right image, 50pF was probably too high a value to use, though 50pF was all I had to hand. Probably 10pF would be a better value. In any case this helped a lot with drive read problems, but it did not cure the problem, not by far.
So after much investigation, I found that the -38 DMA was acting like it had "open collector" outputs, where a pull up resistor is needed to correctly set a logic HI. It appeared that moving my hand near the DMA would cause it to behave differently. This suggests open collector outputs, or at least floating outputs. That did not seem to hold totally true though, as loading the signals with the scope probe should make the problem worse, but it did not seem to.
So I did some more investigation into the databus on the DMA. I compared the -38 DMA, the 001A DMA, and the -38 DMA with my resistor mods.
-38 DMA with mods
So first of all, there does not seem to be much difference between the -38 and the -001 DMA. Though on the 3rd image on the -38, there is a HI pulse and a slow "drain" back to LO level which takes 500uS or thereabouts. The 001 does seem to do the same thing, but it was hard to capture it. So I am going to class that as normal behavior.
Same with some logic HI voltages. Both -38 and -001 mostly are at 4V logic HI, but sometimes spike to 5V. So nothing seems to be much different between the 2 DMA versions. Though obviously something is wrong somewhere.
The final images show the -38 with pull up resistors on the databus. I think this is interesting, as now the signals are mostly spending time HI. On the stock -38 and -001, the signals are mostly pulsing HI/LO. But with the resistors, this does not happen.
I am not really sure what to really conclude yet. With the data being totally different with the pull ups, and it works, then a lot of the data shown with the stock -38 and -001 would mean most of the pulses are just random data which isn't being read. I can't say the bus is floating as I would expect to see something other than a firm logic HI or LO. But saying that, that 500uS "drain" from HI to LO as mentioned before, I do not see that with the pull up resistors. So this may suggest the bus is floating, but ultimately going LO with both versions of the DMA.
There are also some 1V "pulses" in both DMA versions but not on the pull up version. As both DMA versions have the same 1V pulses, I would have to assume they are not an issue.
Also between DMA versions there is a lot of noise from 4V to 5V. Not a solid logic 1 like on the pull up version. Generally we need 2V for a logic HI, so I wouldn't class that as a problem. The 3rd image on the -38 DMA with the slow drain to LO (500uS) might be a clue as it would be somewhat random if the bus was being read while that signal was in transition. Though as the 001 DMA seems to do the same, then I can't conclude that is the problem either.
So as to why the 001 works in my STFM and the -38 does not, I have not figured that out yet. It is however concluded that the DMA databus does need some pull ups. I will semi-conclude (actually again) that whatever the difference is between the DMA versions, it is so slight that it's more of a fluke it works than anything. I would not say the new DMA is better and I would not say the old DMA is "bad". The new DMA is probably just marginally less prone to problems on the motherboard itself. Updating to the "good 001 DMA" is probably just a short term fix for a much deeper set of problems.
September 20, 2015
I have been testing some floppy drives out for people today and found more floppy related issues..
The drives I am testing are EPSON 380s. What seems to happen is that they only work when putting a little pressure on the top head itself. So either the head ages and the voltage becomes too low to read the floppy, or the metal "spring" the head is mounted on warps slightly, skewing the head from the floppy surface. I have even raised and lowered the 5V rail just to see if that helps, but it didn't. I have spent a little time before trying to solve this head problem, but there do not seem to be any adjustments, either mechanical or circuit related. Up until recently I had been blaming the floppy drive itself, but it isn't always the drive. Of course cleaning the floppy heads helps a lot and should be mandatory these days.
Now I tried another EPSON 380 and that drive worked perfectly. So I tried the other one, failed to read reliably. I have been going back and forth for almost 2 hours with this, and each time I would generally conclude the floppy drive is faulty. BUT..
I just put 10K pull ups on the DMA databus, and that "faulty" drive just loaded ECOPY fine and formatted a floppy fine. Before I was lucky to get to the desktop and open up A: drive window. So DMA problems strike again :-(
So at this point, I am leaning towards actually recommending that everyone with an STFM with an IMP or -38 DMA put pull ups on the DMA data lines.
My next lot of tests is on the "to do list". On the STE, Atari used 74LS 244/245 buffers, but they only used them on the hard drive port. Obviously the STE has DMA issues (still), so it did not cure the "problem". Atari obviously thought buffering was needed, but as to why they didn't place the buffers so the WD1772 gets buffered as well is beyond me *shrug*
I have ordered a couple of 74ABT245 buffers and am going to try buffering the data lines directly on the DMA, so hard drive and floppy drive both get buffered. The ABT245s are easy to drive (CMOS inputs) but also have some mA of TTL output drive capability. This way, the DMA can easily drive the ABT245, and the floppy/hard drive can easily drive the ABT245, which will give good input signals to the DMA. win - win ;)
My best guess is that, as 10K pull ups are needed, the DMA doesn't have enough drive current, or transient drive current, to power the WD1772 AND a hard drive. In fact the DMA doesn't seem to be able to drive the WD1772 correctly. It could be that as the silicon has aged, it simply needs more current to drive the inputs of the WD1772 for example, which the DMA isn't able to deliver. A 10K pull up is only 0.5mA per data line anyway, but possibly that is enough to compensate for the ageing silicon.
Anyway, the ABT buffers being easy to drive shouldn't be a problem for the DMA, so in theory the pull ups shouldn't be needed once the buffers are added in, and there won't be any drive issues with the 1772 or a hard drive. So we will see if the buffers work or not.. :)
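For reference, the 0.5mA figure is just Ohm's law with the data line held LO against the 5V rail. The snippet below is only illustrative arithmetic: the function name and the ideal 0V LO level are assumptions, and the other resistor values are the ones mentioned elsewhere on this page.

```python
def pullup_current_ma(r_ohms, vcc=5.0, v_low=0.0):
    """Current (mA) a pull up resistor sources while the line is held LO.

    Assumes an ideal LO level of v_low volts; a real output sits a little
    above 0V, so the real current is slightly smaller.
    """
    return (vcc - v_low) / r_ohms * 1000.0


if __name__ == "__main__":
    # Resistor values mentioned on this page: 1K, 2.2K, 4.7K and 10K.
    for r in (1_000, 2_200, 4_700, 10_000):
        print(f"{r:>6} ohm pull up -> {pullup_current_ma(r):.2f} mA per data line")
```

So dropping from 10K to 1K asks each output to sink ten times the current when driving LO, which is worth keeping in mind when choosing the array value.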
September 22, 2015
Just tried it, and initial testing shows no change with the -38 DMA :( The 001A DMA works fine with this mod. So back to square 1 again :-(
I did notice on the STE diagram that they routed the OE pin through a flip-flop, so OE would be 1 cycle behind. I don't see why they did that. I would have thought the data should be available on the databus as soon as possible, not delayed by 1 cycle *shrug*. Maybe Atari thought the databus would "settle down" a cycle later, but that wouldn't work based on my research. It would just delay all the bad signals by 1 cycle instead.
I will do some more testing later see if I can find out anything else. I want to look at the actual rise and fall times on the DMA with and without resistors, I don't think I actually looked at that before.
So I did more testing and I am leaning towards the outputs of the DMA floating, but as mentioned before, it's not exactly true.. actually this happens...
So when the DMA outputs go HI, they do so in about 20ns which is pretty good, then they seem to float. The yellow line is what is on the ABT buffer output, So it sees the decaying HI from the DMA as HI for some time, until the DMA HI voltage reaches about 2V then the ABT buffer (yellow) turns off.
I placed a 1K resistor on a DMA dataline to test rise and fall times, and no real difference. I then noticed while holding the free end of the 1K resistor that the voltage went crazy on the pin. I tried the 001A DMA and that suffered the same, BUT, instead of the voltage varying, it actually seems to vary the pulse width of the signal.
So the 001A DMA - the data line under test is going via 1K resistor which I touch, and depending how hard I grip the resistor , it varies from like 100uS to 500uS+ on that signal.
Same test with the -38 DMA and the LO voltage seems to vary from 0 to 3 volts, but it becomes unstable overall. So the 2 DMA chips behave differently. But still have a similar "fault".
A dataline (right most image) is fitted with a 1K pull up (both show the same with either DMA)
The transition time from HI to LO is about 20ns, but the LO to HI transition really seems to be struggling. I did a quick calculation and 1K into 50pF will give a rise time similar to what is shown on the scope. So I assume that is the input capacitance of the WD1772 as there is nothing else connected to the bus (IE no hard drive).
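The quick calculation can be sketched as below. It treats the pull up and the bus capacitance as an ideal RC charging towards 5V, which is an assumption, and the function names are made up for illustration. The 50pF figure is the estimate from the text, not a measured value.

```python
import math


def rc_tau_ns(r_ohms, c_pf):
    """RC time constant in nanoseconds (ohms * pF = 1e-12 s = 1e-3 ns)."""
    return r_ohms * c_pf * 1e-3


def rise_time_10_90_ns(r_ohms, c_pf):
    """10% to 90% rise time of an ideal RC charge curve, about 2.2 * RC."""
    return math.log(9) * rc_tau_ns(r_ohms, c_pf)


def time_to_threshold_ns(r_ohms, c_pf, v_threshold=2.0, vcc=5.0):
    """Time for the line to climb from 0V to v_threshold when pulled to vcc."""
    return -rc_tau_ns(r_ohms, c_pf) * math.log(1.0 - v_threshold / vcc)


if __name__ == "__main__":
    for r in (1_000, 10_000):
        print(f"{r} ohm into 50pF: tau = {rc_tau_ns(r, 50):.0f} ns, "
              f"10-90% rise = {rise_time_10_90_ns(r, 50):.0f} ns, "
              f"reaches 2V after {time_to_threshold_ns(r, 50):.0f} ns")
```

With 1K and 50pF the time constant is 50ns and the 10-90% rise time is roughly 110ns, several times slower than the 20ns HI to LO edge. A 10K pull up into the same capacitance is ten times slower again.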
In any case, it's clear now what the difference is between the 2 DMA chips. As to why they behave like that, I have no idea. It also explains write issues to either the floppy drive or hard drive. For example, the hard drive controller (whatever micro/PLD it is) probably has a good drive voltage and current, easily 5mA at 5V, so the DMA can read such signals easily. But when the DMA writes to the hard drive, the bus is "floating" and simply gets screwed up. The only way it could ever work is if the hard drive latched the data on the DMA bus as soon as it appeared, because the HI voltage is in constant decay. I don't know much about how fast hard drives read and write, but if something took longer than 100uS then the data would have "faded" by that time. Also don't forget that I had no hard drive connected, so the extra "load" on the bus would greatly reduce the "data stable time". Hard drives may well process the data faster than 100uS, but even so, this "bus floating" issue is pretty much concluded I think.
10K pull ups on the 001 DMA seem to work the same as not having them. No apparent issues with read or write to the floppy drive this time. The LO to HI transition is slower, but that is understandable as I used 1K in the previous tests above. Even so, the 001A "slows down" the more the databus is loaded. So the 001A should have the pull ups to make sure it stays reliable.
Conclusion? More testing needs to be done by other people, though it simply looks like the DMA needs 1K pull ups on its datalines. While the 001A DMA behaves differently, it still suffers from pull up issues. I wouldn't say the 001A is "better" and I sure wouldn't say the -38 is buggy. Probably the issue with the STE was that the bus drivers Atari fitted simply loaded the DMA datalines too much and pushed it over the edge into not working on some machines. This would mean any STE with a -38 DMA is prone to this pull up issue, even more so than the STFMs using the -38 DMA. Changing to the newer 001A DMA, while it works for some people, is not really a cure for the root problem of no pull ups on the databus. The 001A simply "manages" the problem better. There shouldn't be any issue with either DMA version in an STE when using data line pull ups. I think the 001 DMA simply gets slower as the databus is loaded, while the -38 DMA goes a bit mental. The slower speed of the 001 DMA could possibly cause problems when trying to use modern SD card based drives, while it may not be a problem on older mechanical drives.
So before buying a "new 001 DMA" try this inexpensive resistor mod first!! It drives me nuts that almost every week people are referring to the -38 DMA as the "buggy one". It managed to work fine for many years in the STFM, so why is it all of a sudden buggy in the STE? That makes no sense at all. I suspect that the DMA was never tested by Atari at the higher speeds of today's drives, which is why it is only in recent years that such issues are starting to show. Plus the hardware is now around 30 years old, so likely that is an issue in recent years also. So PLEASE will everyone forget this "buggy DMA myth", it's pure nonsense.
Unfortunately everyone is hell-bent on this "buggy DMA" myth. I really blame that ST Format article for starting off that hype. As to whether there really was a batch of buggy DMAs we may never know, but it is clear from the volume of issues that those alleged 100 machines are not going to be the ones which everyone happens to still be using today. I must have heard "buggy DMA" 100 times this past year. Statistically I find it improbable that the only surviving STEs all just so happen to have a "buggy DMA". So PLEASE will people forget this "bad DMA" rubbish.
June 3, 2016 UPDATE - Possible "buggy" DMA found!!
I had an STE in for repair this week. It was suffering from DMA problems. The symptoms were random corruption. I first started out copying about 300 files and it seemed OK. Then on the second batch of files it corrupted the entire drive. I have seen this before many times, but my fixes have always solved it. After a while it got worse, to the point where I would save the desktop and desktop.inf would appear in the C: drive window, but on refreshing the window, desktop.inf vanished. I tried several times with the same result. I then tried to copy 300 files and it got as far as creating 2 folders and said something about not being able to find the files I was trying to access. Then the C: drive was totally trashed.
What seems to happen is that during the write cycle to the drive, the data isn't going where it's supposed to. It's like instead of writing to particular sectors on the drive, it starts where the FAT tables are stored (I don't recall where on the drive they are, I think they are at the start of the drive). So it's like the drive can read data, but on writing it always seems to write to the first sector on the drive and corrupts the FAT totally.
After many hours trying to find a cause for the corruption, I decided to basically give up. I couldn't find any cause for the corruption. I later tried the -38 DMA and even the 001A DMA out of my test machines and the STE had no issues reading and writing several thousand files of various sizes. So clearly this DMA was at the very least faulty. As to whether it came out of the Atari factory that way, or became damaged some time in its life, is unknown.
Below is an image of the suspect "buggy" DMA. Really it is to display the full numbers (batch number), so if anyone else has one of these then PLEASE!!!! send it to me so we can confirm once and for all if there is a buggy batch of the -38 DMAs!
As mentioned before. Please try my fixes before deciding if your DMA is actually faulty. This is the first one I have seen in over 20 years to behave this way!
UPDATE July 11, 2016
I have 3 STEs, all with the "good DMA", failing to access the floppy drive correctly. Reading data from the floppy drive works perfectly well, but all I did was save the desktop and now I have a 2GB newdesk.inf! Not bad for a 720K floppy! For those who think I have "made up" the above, see the video on YouTube: https://www.youtube.com/watch?v=czwdADV1ycA
Current mods include 2.2K resistors on the CPU address and databus and 1.2K on P100 resistor pack and 2.2K pullups on the DMA databus. It is a mixed bag of what resistor arrays are fitted on the STE it seems. Mostly they are 10K or 4.7K, though I have found 4.7K isn't stable enough and changed them for 2.2K.
Currently I have found suspect issues on the upper data bus. In particular D9-D14. It appears a logic low isn't always there and a lot of the time there is a 0.5V spike when the address strobe is low. So likely this is causing some databus corruption.
UPDATE November 4, 2016
Some really interesting developments this past month with some testing kindly done by Stefan Krug. Stefan has had several machines all suffering with DMA issues with Ultra Satan & GigaFile. Upon replacing the DMA with a 001A his problems seemed to be cured. He also tried an IMP DMA with the diode in the 5V line and that also worked. So I suggested he try the original -38 DMAs with the diode, and Stefan reports success and no hard drive issues.
I suggested also another fix, as mentioned previously where the 10K and 4.7K arrays be changed with 2.2K ones. Also resistor pack P100 should be 1.2K.. Stefan reports this also cured the DMA problems with the -38 DMA and worked now without the diode.
As for the SIL array posted originally, This seems to solve problems (at least for me) with various floppy drive issues. Though while I have not done much testing on the STFM, I would suggest the SIL array be also connected on the DMA databus.
One thing to bear in mind is that on my STE boosters, which my machines generally have now, I have no problems with DMA and almost never have. Because I use my STE machines for booster work, I changed the SIL arrays for 2.2K ones anyway. So likely this, together with the tests by Stefan, goes to further prove that the DMA is not to blame for hard drive issues.
So I still stand by my statement that the 001A is NOT better than the -38 (more on this in a moment) and the -38 is NOT faulty on every STE machine ever produced. The 001A is simply slightly more noise immune than the -38 DMA. That is all! Noise is why Atari had to place the diode on the IMP chips, not because the IMP chips are bad, but because IMP chips are actually faster and respond to noise faster. Slower chips work better as they are more noise immune. It is a similar situation with the DMA. The 001A is simply slower and doesn't respond to bus noise. So technically the -38 is the better IC speed wise. Overall speed doesn't matter anyway. So whereas people may call the IMP and -38 DMA evil, actually they are not. Actually they are showing up the crappy bus on the Atari because they are faster chips.
More people need to try the above out. Though myself never having DMA issues with the -38 and now with the testing of Stefan, this also confirms my thoughts. As mentioned previously on this page, I have seen issues with the 001A chip which can also cause corruption. So the 001A DMA is NOT a "sure fire fix" for DMA issues. In fact so I have heard over the years, the 001A has mixed views on what it "fixes", so as there is doubt on 001A solving such issues, then it just further concludes the DMA chip itself is not to blame.
The jury is still out on "bad DMA chips". But for all intents and purposes, I think this "bad DMA myth" can finally be closed :)
February 2, 2017
I had a DMA sent to me by Damian (updated IC list also). This DMA failed on WRITE. Then while I was scoping out my working -38 DMA it also started to fail on READS, showing corruption. This was on pin 38, the RDY signal. Considering it only took the load of the scope probe to cause problems, it got me interested in that pin a little more. Looking at the diagrams, it has just a 1K pull up on that pin. So I added another 1K and nothing changed. In fact it failed on floppy read! Remove the scope probe, and it was working again on reads. The RDY signal looks to have a lot of noise on the low, of about 1 volt, and about 2 volts of undershoot.
I then found I had a 1V offset on the RDY pin, so I removed the extra 1K resistor and it dropped to 0.8V. I added 100R in series with the RDY pin and the offset on the DMA side was about 0.1V. Loading the RDY pin on the DMA side resulted in reliable floppy operation, while loading it on the ST side gave random corruption again. Interestingly, with the extra 1K resistor in place, the floppy motor was going on/off a lot when normally it would be constantly on during floppy access. So Damian's DMA was placed back into the ST and that now showed reliable READ operation.
I started to probe various DMA pins while copying files from floppy to hard drive. It failed on every file saying write error, but on re-try it went on to the next file. None of those files ever appeared on the hard drive after the file copy finished. However, when I got to D7 (pin 29) it then did not show any write errors and some files appeared on the SD card.
Considering the DMA manages to write to a floppy just fine, but not a hard drive is confusing in itself. The DMA's data bus must be working fine, and there isn't much else to a hard drive other than a couple extra signals from the DMA.
That DMA issue is still under investigation..
February 3, 2017 - Playing with fire - IMP DMA :)
One thing which seems an impossible combination is the IMP DMA in an STE :) We know the IMP DMAs work with a diode in the 5V rail, at least on the MEGA ST. So I tried one in my STE and guess what, WRITE fail, exactly the same as the -38 DMA I was testing from Damian.
With my study of the IMP - Atari basically said a diode is needed in the 5V rail to slow it down. So I tried that in the STE to see what would happen... and it still failed on write.
I find it unlikely that 2 manufacturers would manufacture 2 faulty DMA chips. The only possible reason would be that Atari changed the DMA circuit and got 2 manufacturers to manufacture the bad design. Though as the IMP series can be made to work in the MEGA, that would seem unlikely.
I tried Damian's -38 in my MEGA ST and it worked perfectly. I did not bother to try the IMP DMA. The -38 was simply plugged in, no other mods done at all. I think it's again conclusive that these "faulty DMAs" are not faulty. If it was faulty, surely it would have failed in my MEGA ST?
I am starting to wonder if the STE buffers are causing the issue. For some reason the databus enable is clocked via a flip-flop. I will see if I can disable that, then it would be closer to how an ST would use the DMA and the buffers would simply act as buffers, not being delayed with the flip-flop nonsense. So I basically put an inverter in there and it failed on write. It took a while for me to realise I actually had the good -38 in there and that is now "broke". So speeding up the 7474 has caused a write fail on an otherwise working DMA. Interesting!
I did have a 200R resistor in series with the 8MHz clock on the 7474 as the clock was really bad, so I removed that, and the DMA still failed. So I removed the extra 1K in series with the RDY line I started out with, and the DMA had a good write. So obviously a higher than 1K value on RDY is bad. So now I have to re-visit my 7474 tests again, as that resistor could have been the sole cause of the current issues with the good -38 DMA.
So with a 32MHz clock input to the 7474 with bad DMA, no change, write fail, placing the inverter only in the /G line, no change. So it looks like speeding up the 7474 with good or bad DMA makes no difference.
February 6, 2017
Much testing and little results. I won't go into it all, but basically the 2 buffers have been removed and wire linked over so the circuit matches the STFM now, though still does not work. I also re-tested the -38 DMA in question in my MEGA ST and heated it up a lot, and it still worked perfectly.
I also tried the -38 out of my MEGA ST in the STE test machine and that DMA also failed. So for some reason this motherboard simply doesn't like any of the -38s I have tried so far, even though they work perfectly well in a MEGA ST or STFM. Similar with the IMP DMA, which works fine in any machine other than the STE. So I think it's pretty conclusive that the STE itself is somehow causing the issues and the suspect "buggy DMA" isn't actually buggy at all.
Enter Jookie.... He was kind enough to write a little app to constantly write to the hard drive and read back the data to verify it was correct. The first version wrote a single sector in a loop, and it passed just fine. I tried to create a folder on the desktop and it corrupted right away. I shorted out the DMA databus and Jookie's program reported errors, so the program seems to work fine. Jookie suggested multiple writes might be the issue, so version 2 of his tester allowed multiple sector writes. I tried 1, 5, 10 and 50 sectors and that also passed perfectly. So at this point we are both scratching our heads.
I suggested a 3rd program to not only verify written data, but to also verify the sector being written to, is actually being written to. As writes do not actually seem to fail, I can only assume the sector number is not correct. For example, DMA attempts to write to sector 0100, but may write to sector 1010 for example. I could only attribute that to a databus problem, though the wrong sector would have to be wrong twice, which would seem unlikely , assuming the databus is wrong with noise on the bus causing the sector issues.
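The idea behind such a third test can be sketched as follows. This is not Jookie's program: it is a hypothetical PC-side illustration in Python, and `FakeDisk`, `make_sector` and `verify` are made-up names. Each test sector is stamped with its own sector number, so a write that lands on the wrong sector shows up on read back even though the payload itself is intact.

```python
import struct

SECTOR_SIZE = 512


def make_sector(sector_no, fill=0xA5):
    """Build a test sector whose first 4 bytes hold its own sector number."""
    header = struct.pack(">I", sector_no)
    return header + bytes([fill]) * (SECTOR_SIZE - len(header))


def verify(read_sector, first, count):
    """Read back sectors and return a list of (sector_no, reason) failures."""
    failures = []
    for n in range(first, first + count):
        data = read_sector(n)
        if data == bytes(SECTOR_SIZE):
            failures.append((n, "never written"))
        elif data[:4] != struct.pack(">I", n):
            found = struct.unpack(">I", data[:4])[0]
            failures.append((n, f"holds data stamped for sector {found}"))
        elif data != make_sector(n):
            failures.append((n, "payload corrupted"))
    return failures


class FakeDisk:
    """In-memory stand-in for the drive (an assumption, not real hardware).

    `misdirect` maps an intended sector number to the sector that actually
    gets written, to imitate a write landing on the wrong sector.
    """

    def __init__(self, sectors, misdirect=None):
        self.data = [bytes(SECTOR_SIZE)] * sectors
        self.misdirect = misdirect or {}

    def write_sector(self, n, payload):
        self.data[self.misdirect.get(n, n)] = payload

    def read_sector(self, n):
        return self.data[n]


if __name__ == "__main__":
    # Hypothetical fault: the write meant for sector 10 lands on sector 4.
    disk = FakeDisk(16, misdirect={10: 4})
    for n in range(16):
        disk.write_sector(n, make_sector(n))
    for sector, reason in verify(disk.read_sector, 0, 16):
        print(f"sector {sector}: {reason}")
```

A plain "write then read the same sector" check would not catch this kind of fault, because the misdirected data reads back fine from wherever it ended up.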
I have documented before that the SIL resistors on the address and databus should be changed, as this seems to greatly improve DMA operation and has been verified as a working fix. Though it's clear there is still some other unknown problem causing the -38 (and IMP DMAs) to fail in the STE. I did find it interesting that Atari documented a "fix" to place a 33pF capacitor across D0 and D2 which should fix some DMA troubles. This suggests to me there is noise on the bus, but placing capacitors across 2 datalines like that seems a little odd. Normally to "snub" a signal, the capacitor is placed from the signal pin to GND. I have tried that on every pin on the DMA, and it's hard to say if it changes anything or not. Though if there is noise on more than 1 pin, then it wouldn't be a fair test anyway.
So far all the evidence is still noise on the bus causing the issues. I *hope* that Jookie's next program will show some faults so I have something more visible to work from. So far it always fails to even create a folder without corruption. So I hope the next program will show faults which will give me something to work with.
My current thoughts are to build an adapter PCB and run the 16bit data bus from the CPU to the DMA through Schmitt buffers, basically to clean up noise on the bus between the DMA and CPU. I will also run most other signals through something like a 33R resistor to clamp down on ringing on signals. The 8MHz clock is really bad on the DMA for example. It could be a lot of smaller faults all adding up to a larger problem, rather than there being simply 1 single problem somewhere.
The investigation continues...
February 9, 2017
Jookie wrote a nice update to write sector numbers from 0-255 to the SD card, and write some known data. See image HERE.
All sectors are written to correctly and in the right order. I can't see any failure in what was written. So at this point I am stumped. The SD card will not even partition on the STE. If I partition with a 001A DMA, it works fine, then when I write a simple folder with the -38 DMA, it corrupts the drive. As to why that fails and the sector write tests pass easily doesn't make sense.
My only option now is to create a new partition with a 001A DMA, write a folder, then do a sector dump on my PC to save a working setup. Then repeat with a -38 DMA and see where the data fails.
The next test is as follows. Format and partition a 100MB drive. Create newdesk.inf and save a sector dump on my PC. Then the same again, but only saving newdesk.inf with the -38 DMA. On the desktop and after a reboot, newdesk.inf doesn't seem corrupted at this point. Sector dumps HERE. I am no expert in FAT, though looking through the data it seems basically OK. There are a lot of differences between the two dumps. The one which stands out the most is that test1.bin seems to have junk data in random places.
I do not see any junk data on the ok.bin file (with 001A DMA)
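For anyone wanting to repeat this kind of comparison, the check can be automated rather than done by eye in a hex editor. The following is only a minimal Python sketch, not the method used for the dumps above. It assumes both dumps are raw images with 512-byte sectors, and the function names are made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: raw dumps with standard 512-byte sectors


def differing_sectors(good: bytes, suspect: bytes, sector_size: int = SECTOR_SIZE):
    """Return (sector_number, offset_of_first_difference) for every sector that differs."""
    diffs = []
    length = max(len(good), len(suspect))
    for start in range(0, length, sector_size):
        a = good[start:start + sector_size]
        b = suspect[start:start + sector_size]
        if a != b:
            # Offset of the first mismatching byte; if one sector is simply
            # shorter, report the length of the shorter one.
            offset = next(
                (i for i, (x, y) in enumerate(zip(a, b)) if x != y),
                min(len(a), len(b)),
            )
            diffs.append((start // sector_size, offset))
    return diffs


def compare_dump_files(good_path: str, suspect_path: str):
    """Load two dump files (hypothetical paths, e.g. ok.bin and test1.bin) and compare them."""
    with open(good_path, "rb") as f:
        good = f.read()
    with open(suspect_path, "rb") as f:
        suspect = f.read()
    return differing_sectors(good, suspect)
```

Calling compare_dump_files("ok.bin", "test1.bin") would list which sectors differ and where the first difference in each one sits. A known good dump and a suspect dump will legitimately differ in places (directory entries, timestamps), so the output only narrows down where to look.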
I tried writing a folder to the card and it vanished. I did a sector dump and I couldn't even see it in the FAT or anywhere on the drive. Jookie said his program was based on AHDI sources, so he had the idea to try the AHDI driver, but that failed in the same way.
The only odd thing is the first file on the drive doesn't get corrupted, but the second one does. Though I then deleted newdesk.inf and created a folder, and it worked that time; a second folder vanished again. So it seems it doesn't like more than 1 file/folder being created on the drive.
I took a look at the FAT table with the 001A DMA. I created folders 1-5 and 12345678. There could have been a couple of corrupted filenames in there, but that doesn't matter for the tests.
Then I formatted and swapped back over to the -38 and created foldernames 11111111.111 and 22222222.222. Again the first write worked, but the second filename vanished.
Only this time there is what looks like random corruption at the end of the sector. This end of sector corruption seemed to span across a few random sectors.
I dumped the whole 32GB SD card I was using and loaded it into a better hex editor to search for my "22222222" filename, and it did not find it anywhere.
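A search like this can also be scripted. The sketch below is an illustration only, not the hex editor search described above. It relies on the standard FAT directory entry layout, where a short name is stored as 11 bytes (8 for the name, 3 for the extension, space padded, upper case, no dot), and it reads the image in chunks so a 32GB dump does not have to fit in memory. The function names and the chunk size are hypothetical.

```python
import io


def fat_short_name(filename: str) -> bytes:
    """Convert 'NAME.EXT' into the 11-byte form stored in a FAT directory entry."""
    base, _, ext = filename.upper().partition(".")
    return base.ljust(8)[:8].encode("ascii") + ext.ljust(3)[:3].encode("ascii")


def find_pattern(stream, pattern: bytes, chunk_size: int = 1 << 20):
    """Yield the absolute byte offset of every occurrence of pattern in a binary stream."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1  # bytes carried over so matches spanning chunks are found
    tail = b""
    base = 0  # absolute offset of the first byte of tail + chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        pos = window.find(pattern)
        while pos != -1:
            yield base + pos
            pos = window.find(pattern, pos + 1)
        keep = window[-overlap:] if overlap else b""
        base += len(window) - len(keep)
        tail = keep


if __name__ == "__main__":
    # Tiny in-memory stand-in for a card image; a real dump would be opened with open(path, "rb").
    demo = io.BytesIO(bytes(100) + fat_short_name("22222222.222") + bytes(50))
    print(list(find_pattern(demo, fat_short_name("22222222.222"), chunk_size=16)))
```

Note that a deleted FAT entry has its first byte overwritten with 0xE5, so searching for the name without its first character would also catch an entry that was written and then marked as deleted.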
Jookie suggested the diagnostic tools for HD Driver. I had not seen (or taken notice of) them before, but it seems they can output commands to the serial port to monitor on a PC. So likely I will get a USB-serial adapter and cable so I can look into that soon.
February 10, 2017
I had an idea to change the CPU for a HC type (as this is used in my STE booster, where I never had any issues) and now I can write multiple folders without corruption. This is interesting. I never had any issues with the -38 DMA in general, but I always used the HC CPU, as my main test machine is my STE with the 32MHz booster. Though on that booster I also used 2.2k pull up resistors for the address and data bus, something I have already found to be a huge clue as to why I never had such issues.
Though my "new" test machine is a stock machine with no mods at all (other than TOS206), and simply swapping the CPU for a HC has seemed to be a huge push in the right direction. As the address and data bus I think can be ruled out, it could be /AS or /DTACK at fault here. So time to investigate...
I tried a 1K pull up on /AS and then on /DTACK, and the corruption is still there (with the original Motorola CPU). It again puts me back to thinking the address or data bus is at fault, which was my original conclusion. So time to swap out those pull-ups for 2.2k on this board... which did not solve the problem. I also tagged another 2.2K on the bottom of the data bus ones (so 1.1K now) and still no change. I guess this explains why this DMA failed also in my boosted STE. Next I tried lower address bus resistors... Same result, BUT oddly, the first folder write seemed to be OK, and it failed on additional writes. So something changed.
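The 1.1K figure comes from the usual parallel resistor rule: two equal resistors in parallel give half the value. As a quick check of the arithmetic, here is a small Python sketch (the helper name is made up for illustration):

```python
def parallel(*resistors_ohms: float) -> float:
    """Combined resistance in ohms of resistors wired in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohms)


if __name__ == "__main__":
    # A second 2.2K pull-up tagged across an existing 2.2K pull-up.
    print(round(parallel(2200, 2200)))
```

The same helper can be used to see how far a pull-up value drops for any other combination before trying it on the board.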
Considering the DMA only connects to the data bus and A1 (and the CPU clock), changing the CPU shouldn't really have any effect other than not loading the bus as much.
I tried 3 other MC CPUs and they all acted the same. I tried a ST CPU and that one worked, which in itself was interesting. Some voltages on the CPU seem to go over 6 volts. I have seen that issue on my boosters with bad ground planes, so it's possible the higher current draw of some CPUs is inherently causing the bus to become unstable during DMA operations. Clearly the HC CPU is lower power and has no issues, so I assume the ST CPU is lower power also.
This doesn't explain why my boosted STE didn't work, as that has the bus pull-ups and the HC CPU anyway. So time to re-connect that machine to see what's going on... Using the same -38 as in my other STE, the boosted STE seems to be working fine now. As to why this DMA failed originally, maybe I just had a bad connection somewhere. That said, it sparked off the DMA investigation once again, so in a way it was good it failed.
So currently, I am back to saying, my test machine never suffers from DMA issues, because it has the lower value bus pull-ups and the HC CPU. Basically because it has my V1 STE booster fitted and that is now my main test machine. So while the bus resistors solved almost all the issues, the CPU itself being a HC CPU also seems to have solved these current DMA issues.
As to why it only fails on the desktop and not with Jookie's test program... that's an odd one. It must be that on the desktop there is much more bus activity, maybe with the ROM, which causes more oscillations on the bus and then causes it to fail. So using a HC ROM IC (like my DUAL TOS board does) likely also helps with bus loading issues. I did swap the DUAL TOS board over to my first STE (not the one with the booster) to see what would happen. The ST CPU worked as before, and the MC CPU worked on the first write but not the second one. I saw that behavior before at some point, but I would lean towards my DUAL TOS board helping to solve the bus issues, as the MC CPU earlier today was failing all the time on writes.
What I may do at some point is design a small PCB for the DMA to fit into, so I can run the whole DMA through some 33R resistors to see if that makes it more stable. Overall, the "bad DMA myth" is yet again busted. While I can confirm some DMA chips have issues, the DMA chip itself isn't to blame. As suggested before, the likely cause is that some DMA chips are more susceptible to noise than others.
Here is the image from before showing writes to the first sector.
The sector number is also within the first bytes of the sector, so sector 1 will be 00 00 01, sector 2 will be 00 00 02, etc. So I can confirm that the first 100 sectors are written with the correct data and in the correct order. The ASCII pattern is correct for each sector also. So this confirms the "BAD DMA" is writing to UltraSatan perfectly well and reliably. So as suggested before, the DMA isn't faulty even though it appears to be.
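A check like this on a dumped image can be scripted instead of read off a hex view. The sketch below is not Jookie's program and does not know its exact layout. It assumes the sector number is stored as a 3-byte big-endian value at the very start of each 512-byte sector, and that the dump starts at the first test sector. The function name and parameters are hypothetical.

```python
def check_sector_numbers(image: bytes, first: int = 0, count: int = 256,
                         sector_size: int = 512, field_len: int = 3):
    """Return (expected, found) pairs for sectors whose embedded number is wrong.

    Assumption: each sector begins with its own number as a big-endian
    field of field_len bytes, e.g. 00 00 01 for sector 1.
    """
    bad = []
    for n in range(first, first + count):
        start = n * sector_size
        sector = image[start:start + sector_size]
        if len(sector) < field_len:
            bad.append((n, None))  # image ends before this sector
            continue
        found = int.from_bytes(sector[:field_len], "big")
        if found != n:
            bad.append((n, found))
    return bad
```

An empty result means every sector in the range carries its own number, which matches what the image shows. Any entries in the list point at sectors that were written out of order or not written at all.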
February 13, 2017
Designed and ordered the prototype PCB to run the DMA through resistors on every pin. This should help reduce the load on the bus and clamp down on oscillations. The thing to bear in mind here is that the Motorola CPU will load the bus a lot more than a HC CPU, so the same "idea" is to reduce the load on the bus at the DMA end to see if that is enough to get things working also.
While I am seeing 6V+ on some CPU pins, it's also possible, if the DMA has FET inputs, that a huge overshoot or undershoot can cause those inputs to latch up. So if the HC CPU reduces the "pull" on the 5V rail, then oscillations and under/overshoot generally become a lot less, and that could be why the DMA then works. The test PCB should also cure the same problem, but it's not exactly a fair test as the DMA only uses the data bus from the CPU, not the address bus. Read more about latch up HERE. Of course it is only a guess that the DMA has FET inputs. It is just one of the possible scenarios that can cause a circuit to fail.
I was doing some upgraded STE PSUs about the same time and noticed the floppy drive started to fail to read. After much confusion, I re-fitted the 244 & 245 buffers and the floppy started to behave again. So those buffers could be there for the floppy drive more than the hard drive. Even so, they do seem to actually solve some "new" problem which has only just surfaced. I have at some point removed the SIL resistor from the DMA data bus (which connects to the buffers), so that could have contributed to the problem. I tried both -38 and 001A DMAs and both did the same. Whether to fit the DMA bus resistor is still in question, though currently I tend to think it's a good idea to fit it.
April 13, 2017
I had a couple of emails from 'Oliver de Font' who has several STE's all of which suffer from DMA issues with ultrasatan. He did as I suggested previously in changing the CPU and he reported back it solved his DMA issues. He since emailed me saying the same fix also worked on another STE. His STE's all have the -38 DMA.
So why does the CPU "fix" DMA issues ? Well like I said from the start, noise on the bus. In fact after investigation the problems mostly seem to be generated from near the CPU. Poor grounding on the CPU is causing huge voltage spikes near the DMA IC. Such spikes can latch up logic internally in the DMA and that is likely what happens.
Changing the CPU to a HC type pushes less current through the ground connections on the motherboard and the voltage "bounce" gets greatly reduced. Once this problem is reduced, the DMA no longer "sees" these screwy voltages and it behaves perfectly.
I have also seen the DMA issues vary depending on the types of ROM ICs fitted. As proved some months before using Jookie's HDD tester program, DMA writes work perfectly using his software. However, even a simple "save desktop" write trashes the drive, as proven many times already. So how can it work and not work? Well, the answer is actually down to the ROMs used. The problem is, the ROM ICs are being changed for parts from various sources all around the Internet, and that is likely a contributing factor to DMA issues. For example, by design the ST ROM circuit powers down and powers up the ROMs as they are being accessed. While this is a "power save" type feature, it also causes the power rails to spike, and the ROMs will also spike the address and data bus, and guess what, the DMA IC is right next to the ROM sockets!!
This also makes sense as to why I hardly ever see these DMA issues, as my test machines mostly use my STE booster (which uses the HC CPU), so that basically solves the DMA issues already. I also use a CMOS ROM, which uses so little power that I just leave it enabled all the time. So this power up down up down up down type of problem simply isn't there anymore. The bus doesn't get spiked anywhere near as much, and neither do the power rails. So machines fitted with my booster (which also includes better pull up resistors) solve bus related noise a great deal.
I have also found that STEs fitted with a Motorola CPU are almost guaranteed to have issues with the DMA IC, while a SGS CPU (I only had 2 to test) did not suffer from DMA issues. So even the brand of CPU used in the STE can be the tipping point to a DMA failure or not.
We must remember these machines were "built to a price" and mostly all these issues I have seen are down to PCB layout problems creating noise on the bus. It is not easy to design high frequency PCBs on 2 layers. So Atari did a pretty decent job to get it working as well as they did. I myself only use a dedicated ground layer on my designs. If the Atari motherboard was re-done with a dedicated ground plane (extra PCB layer) then I suspect the machine would not have any of these odd DMA issues. At a guess, if Atari had the CPU where the ROMs are, and the ROMs where the CPU is, then it probably wouldn't have any issues. Though if they did move the CPU, they could have created issues relating to the RAM access (GLUE / Blitter logic) instead.
So yet again the "DMA myth" is busted and "bus noise" has been proven once again to be the issue. So I urge people who have DMA issues to change the CPU to a HC type, or at the very least, use a SGS CPU and not the Motorola ones. Then see how it goes and report back to me your findings!
A list of users who have tested fixes on machines listed on this page.
Oliver de Font - STE corruption with -38 DMA with UltraSatan - CPU fixed the issue.
The 68HC000 CPU can be found in my STORE. Please help fund future research by making a purchase from my store!
CAN I USE ANY HC CPU ?
Technically you can. But note that wherever you source them from, they could be faulty. The ones from the exxos store are tested before they are sold, which involves a lot of time and binning of faulty parts. I also personally think it is somewhat of a shitty thing to do, considering the amount of time I put into research and fixes for people, only for someone to go and buy the CPU elsewhere!
THE HC CPU MADE THINGS WORSE ?
IS THE CPU 100% COMPATIBLE WITH THE ORIGINAL ?
WAS THERE ANY BAD DMA CHIPS FOUND ?
TESTED DMA NUMBERS
THIS SECTION IS CONSIDERED OBSOLETE AS "FAILING DMA'S" HAVE BEEN PROVEN NOT TO BE A FAULT WITH THE DMA IC.
Below is a list of DMA numbers which have been tested by various people in a STE to see if they work reliably or not. So anyone who has a working or suspect buggy -38 DMA, please send me the IC! This list will only be of use if all Atari owners submit numbers to me. It is also possible "buggy" chips could have been damaged by a faulty hard drive, so while they may be listed as buggy, they could have just been damaged during their life (bad PSU etc). Names of who submitted the batch numbers are also listed.
Please note while I have found a bad DMA, this is by no means proof of any STEs shipping with "buggy" DMA chips. Generally such DMA problems show within an hour of use. But there can be many causes of hard drive problems, not just the DMA chip itself!
I keep hearing of problems with DMA ICs, but I am yet to find more than 1 IC which has issues. I have also had reports of motherboards, which I have personally tested as working fine, failing with suspected DMA problems. So it's clear the problems are also not related to the motherboard. The failure is either the hard drive itself, a bad PSU, or something not on the motherboard itself.
VERIFIED WRITE FAILURE
9L4 10 - exxos
9M2 16 - Damian (works on floppy write but fails on HDD write. MB IMAGE. Interestingly it is an early STE revision) - IC under further investigation, see the Feb 3rd 2017 update.
9JA 95 - Bart
6B1 59 - Bart
6J1 86 - Bart
6K1 91 - Bart
8M1 32 - Bart
9L1 03 - Stefan
NOTES ON IMP CHIPS
Thanks to Stefan for pointing me towards this document he found over at http://gossuin.be/index.php/520-et-1040-stx . Unfortunately it was in French, so the translation to English might be a little "rough".
Original Document HERE.
If anyone can help with a better translation of the below texts then please let me know!
New set of chips (IMP)
As you probably know, for about a year Atari has been partially using a new set of ICs on which the name IMP is marked. This concerns the GLUE, MMU, Shifter and DMA ICs. In the beginning, certain problems could arise when four IMP ICs were used together. For this reason, Atari installed mixed ICs on their printed circuit boards, because then these systems worked perfectly. It turned out that the IMP ICs were too fast and because of that were sensitive to noise or to minimal timing differences. The evolution of the IMP ICs now permits the use of all four IMP ICs together without any problem. The first systems that make use of that technology have already been delivered to you and work perfectly. Currently, we deliver two types of printed circuit boards: PCBs which can be modified and PCBs already adapted for the use of four IMP ICs.
The purpose of this modification is to decrease the supply [voltage] of the IMP IC to about 4.5 volts, in order to make it slower and to avoid any interference. This reduction of the [voltage] can be done in the following ways:
PCB C100167-001 Rev 5.0, Mega ST2/4
PCB C103277 Rev 2.2 Mega ST 1
Circuit board already adapted for IMP: PCB C100167 Rev B Mega ST 2/4
EXXOS NOTE: As suspected, the IMP chips are actually faster and so respond to noise on the bus. Technically faster is better, but I have also seen similar issues even with ROM speeds. ROMs faster than 100ns can fail to work as the noise on the bus causes false signals. The Ricoh brand are slower and do not suffer with noise problems, they are simply too slow to "see the noise". IMP chips are faster and *can* see the noise, so they fail to work correctly. It's technically incorrect to assume IMP chips are "bad", it depends on your perspective.
The diode on the 5V rail of IMP chips was there to lower the supply voltage to about 4.5V. Doing so "slowed" the IMP chip enough to function correctly. Again it's not really that IMP chips are "bad", just that they are faster and respond to all the noise on the Atari bus. If there was no noise on the bus, then IMP chips would function perfectly.
DMA FAIL AFTER "RESET" COMPUTER
With some hard drives of the old models (SH205 / Megafile 20) the HDD may stop functioning after a "Reset" of the computer. The only way to clear this failure is to power the whole system off and on again. The failure is caused by another signal appearing on the bus following the 25uS RESET signal. Some hard drives are very sensitive to this. The problem depends greatly on the combination of hardware.
The solution to the problem is to lengthen the response time of the U20 circuit (74LS13 one shot) from 10 uS to +/- 100 uS. This extension of time may be realized as follows: Replace R53 (75K Ohm) with 240K Ohm, 1/4 W, +/- 5%, carbon. Replace C35 (330 pF) with 1.0uF, >25 VDC, +/- 20%, ceramic. See diagram attached.
2014 STFM FIX (possibly obsolete)
For those who have not read my research into the "bad" DMA please see HERE for all the technicals. Work was done with the Gigafile and may not work with other hard drives.
Please note this capacitor fix has been superseded by another fix at the bottom of this page. So please try that fix first before trying the capacitor mods.
It is common knowledge that STE users must have a "good DMA" to solve hard drive problems, though you may not need one. Try my fix before spending out on expensive DMA replacements. This fix has been proven to solve DMA problems on STFM and STE with -38 or 001 DMA.
This fix has been tested on several STFM machines and a couple of STE machines. It fixes issues such as random non-booting and random file corruption when accessing the drive. So far GadgetGuy has verified the mod works on his STFM with UltraSatan. You can see his channel HERE. If anyone else can verify this working, and with what drive, DMA and motherboard revision, then please email me and I will post comments here.
The DMA and hard drives have been known to suffer on STFM and STE machines. "Buggy DMA" is not just on STE machines. For example, swapping a "new/good DMA" into a STFM did not fix hard drive issues. I have run the older DMA in a STE with this fix, and tested a "new" DMA with the same fix, and both DMAs work perfectly. I tried the old DMA in the STE without the capacitor mods and the old DMA worked fine each time. I have never personally seen any hard drive problems relating to STE machines at all, regardless of DMA used.
The exact cause of these problems is not yet known, but it would appear to relate to motherboard revisions rather than the DMA chip itself. For example, I tried several old DMAs in various STFM revisions with huge problems, and yet the same DMAs in the STE worked perfectly. This is contrary to the belief that the older DMA will not work in the STE, when clearly it does. It is known that the faults do not follow the DMA used, regardless of STFM/STE machine used. So the DMA is not to blame.
The general thought is that early STE machines were fitted with the old DMA, which did not work, and the later STE machines had an updated DMA which worked. During my tests I saw no difference between the 2 DMA numbers. I suspect Atari coincidentally changed the motherboard layout, or made some revisions to the STE motherboard, about the same time they started using the new DMA chip. As mentioned before, I had no issues taking a new DMA out of a STE and placing an old DMA in there, it worked perfectly. So something else is going on, as already suggested.
I urge people to try this fix and please let me know so we can get to the bottom of it once and for all! It would help if people even mention what hard drives work and don't work with various machines, along with motherboard revision number and DMA number. If we can get enough people to build a database up we might be able to trace exactly where things are going wrong.
However be warned that this fix is not a "fix all" solution. Hard drive and DMA problems can be down to bad caps on the motherboard and bad PSU caps. I have already documented problems of this kind elsewhere. Normally you can do a quick test by simply formatting a floppy several times. For some reason the floppy drive starts to randomly fail during formats, probably due to the drive motor pulling more current from the PSU during access. Poor video or video problems showing while formatting a floppy can normally indicate a bad PSU. Chances are, if you cannot reliably format a floppy several times without errors (please use Fcopy/Ecopy or some other formatter which shows bad sectors) then your hard drive will probably not work either.
The capacitors are soldered directly on top of the DMA chip exactly as shown, and it is assumed the mod can be done on any Atari machine using the DMA chip. During my tests I found values of 330pF to 390pF are the optimum values, though these were only tested on my STFM and STE machines. The mod has not been tried on any other machines, so the values may be different. It is also worth noting that on the STFM machine, values below 220pF or higher than 470pF do not work at all! A value of 330pF is suggested for STE and STFM machines.
Try this mod at your own risk! I do not guarantee it will work for everyone either, as there can be more than 1 cause for hard drive problems. Try this mod ONLY if you suspect the DMA is to blame. Please note I do not own any other Atari machines or know of any other DMA related problems either.
As a side note, while I was sorting through some old backups from the early 90s, I found a text file which I wrote stating the -38 would work in a STFM but not a STE. Mixed results, as the STEs I have now work fine with the -38. So again it suggests the DMA is not the problem.
I keep getting asked about this fix and relating to problems with "UltraSatan". One person said the mod fixed all problems, another said it made things worse. Another said one capacitor helped but the other did not. I hear many reports of UltraSatan not working correctly with the floppy drive on some machines.
The capacitor values I stated were what worked with GigaFile. I had no problems with this mod on any STFM or STE machines with either version of the DMA. Capacitor values may need to be lower or higher. I do not know as I do not own a UltraSatan. No research has been done with this mod with any other machines or hard drives by myself or others.