MyAtari magazine: Casting a light on DRAM

Issue 19: May 2002

Casting a light on DRAM

Text: Matthias Alles
English translation: Peter West

Memory under the microscope
Today's computers could not manage without DRAM (Dynamic Random Access Memory). Thanks to its consistently good price/performance ratio, this kind of memory has undoubtedly contributed to the massive spread of computers. But its disadvantages cannot be disputed: the low price is the result of a simple design, which comes at the cost of operating speed. Since the basic principle of DRAM has not changed in decades, it is rather surprising that it is still used in today's computers. To understand certain properties of DRAMs, we need to take a look inside the memory chips.

Figure 1: This small section of a memory matrix makes clear the very simple construction of DRAMs.

To store one bit of information, modern DRAMs require only a single transistor and a capacitor. The information is stored in a tiny capacitor of a few fF (1 femtofarad = 10^-15 farad), which is charged to 0 V for a logical 0 or to, for example, 5 V for a logical 1. This single-transistor cell permits a very high degree of integration on the silicon die, so the cost of such a memory chip can be kept very low. Static RAM (SRAM), on the other hand, which is used for cache memory among other things, has faster access times but needs six transistors to store one bit, which makes it more expensive because of the extra die area required.

Of course, a single bit on its own is not much use, which is why a large number of single-transistor cells are arranged in a matrix on the chip. One complete row of this matrix is called a page. To address one bit you need both a row address and a column address. These are passed to the memory chip over multiplexed address lines, accompanied by the /RAS and /CAS signals (more about this later).

To read out a bit, the following happens: first of all, a so-called pre-charge has to take place. During this, all data lines are brought to half the voltage of a logical 1, for instance 2.5 V if the chip works with 5 V. The time required for this is called the pre-charge time. Meanwhile, the row decoder in the memory chip evaluates the row address and, once the pre-charge is complete, activates the corresponding word line (WL) with a voltage pulse. All the transistors of this page now conduct. Each cell capacitor now shares its charge with its data line, so the voltage on the data line changes slightly depending on the stored charge: it becomes slightly lower or slightly higher than the previous 2.5 V.

At the end of the data line the sense amplifier comes into play. It uses the pre-charge voltage as a reference and converts voltages below it to 0 V and voltages above it to 5 V. Depending on the column address, the contents of the selected memory cell are then output externally. However, a problem now arises: all the capacitors of the page have given up their charge to the data lines, so their contents have been lost. Consequently, all these memory cells have to have their previous information written back to them. After the read-out, the sense amplifiers therefore drive the evaluated information back onto all data lines as either 0 V or 5 V. Finally, the capacitors are charged to this voltage and so hold the correct information once more.
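The read cycle described above can be sketched as a small simulation. This is a toy model, not a circuit description: it treats the "discharge" as charge sharing between cell and data line, and the capacitance values (30 fF cell, 300 fF data line) are assumptions chosen only for illustration.

```python
# Toy model of a single DRAM page read (illustration only).
# Assumed values, not taken from the article: 5 V supply, 30 fF cell
# capacitor, 300 fF data-line capacitance.
VDD = 5.0
C_CELL = 30e-15
C_LINE = 300e-15

def read_page(cells, column):
    """Read one bit from a page given as a list of cell voltages.

    Returns (bit, voltages after charge sharing, page after write-back).
    """
    v_ref = VDD / 2                      # pre-charge: every data line at VDD/2
    shared, sensed = [], []
    for v_cell in cells:                 # word line active: all cells of the page conduct
        # Charge sharing: cell and data line settle to one common voltage
        v_line = (C_CELL * v_cell + C_LINE * v_ref) / (C_CELL + C_LINE)
        shared.append(v_line)            # the cell is left holding this degraded level
        # Sense amplifier: below the reference -> 0 V, above it -> VDD
        sensed.append(VDD if v_line > v_ref else 0.0)
    bit = 1 if sensed[column] == VDD else 0
    # Write-back: the sense amplifiers recharge every cell to a full level
    return bit, shared, sensed

bit, shared, restored = read_page([5.0, 0.0, 3.0, 0.0], column=2)
print(bit, [round(v, 2) for v in shared], restored)
```

The third cell (3.0 V) stands for a partly leaked logical 1: after charge sharing every data line sits only a few tenths of a volt from 2.5 V, which is why the read is destructive and the full-level write-back is essential.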

Writing a bit proceeds in exactly the same way as reading, except that once the page has been read, no data are output; instead, data are taken from outside and driven onto the selected data line. If the whole page were not read out before writing a bit, all other information in the page apart from this one written bit would clearly be lost. So during every read or write access, all memory cells of a page are read out and then written back, even if they are not required.
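A write can be sketched as a read-modify-write of the whole page. In this simplified sketch the sense amplifiers are reduced to a threshold at the VDD/2 pre-charge level; the function names are hypothetical.

```python
VDD = 5.0

def sense(page):
    # Simplified sense amplifiers: resolve each (possibly leaky) cell voltage
    # to a full logic level using the VDD/2 pre-charge reference.
    return [VDD if v > VDD / 2 else 0.0 for v in page]

def write_bit(page, column, bit):
    """Write one bit; returns the page contents after the row is closed."""
    latched = sense(page)                  # 1. open the row: the whole page is read
    latched[column] = VDD if bit else 0.0  # 2. external data overrides one column
    return latched                         # 3. all cells are written back together

print(write_bit([5.0, 0.0, 4.1, 0.7], column=1, bit=1))
```

Step 1 is what preserves the neighbouring bits: without it, the unaddressed cells of the page would have nothing valid to be written back with.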

Memory chips that can output four bits simultaneously, for instance, simply contain four matrices working in parallel. The way the matrix is organized has a direct effect on the chip's current consumption: the more columns there are, the more capacitors have to be recharged with each read or write access. The result is higher current consumption (and so greater heat dissipation). Today many memory chips with differing matrix organizations are in use. A 16 Mbit chip, for instance, can be organized as a 4,096 x 4,096 or a 16,384 x 1,024 matrix. The first example is referred to as a 12/12 mapping (2¹² x 2¹² = 2²⁴ = 16,777,216), whereas the second corresponds to a 14/10 mapping (2¹⁴ x 2¹⁰ = 2²⁴ = 16,777,216). Finally, four such memory areas would make the chip a 64 Mbit chip, organized as 16 Mbit x 4.
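The arithmetic behind the two mappings can be checked with a few lines. The sketch assumes, as the text does, four identical arrays (x4 organization) and counts one full row per array as recharged on every access.

```python
def organization(row_bits, col_bits, arrays=1):
    """Return (rows, columns, bits per array, total bits) for a row/column mapping."""
    rows, cols = 2 ** row_bits, 2 ** col_bits
    per_array = rows * cols
    return rows, cols, per_array, per_array * arrays

for row_bits, col_bits in [(12, 12), (14, 10)]:
    rows, cols, per_array, total = organization(row_bits, col_bits, arrays=4)
    # Every row activation reads and recharges one full row in each of the 4 arrays
    recharged = cols * 4
    print(f"{row_bits}/{col_bits}: {rows:,} x {cols:,} = {per_array:,} bits per array, "
          f"{total // 2**20} Mbit as x4, {recharged:,} cells recharged per access")
```

Both mappings give the same capacity, but the 12/12 organization recharges four times as many cells per access as the 14/10 one (16,384 versus 4,096), which is the current-consumption effect described above.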

From the mapping you can determine how many bits are needed for the row address and how many for the column address. An asymmetric 14/10 mapping requires 14 address bits for the rows (2¹⁴ = 16,384) and 10 address bits for the columns (2¹⁰ = 1,024). To save on connections, the memory chip is given only as many address pins as are absolutely necessary, in this case 14. To address a memory cell, these address pins are therefore multiplexed, meaning the two parts of the address are transferred one after the other: first the page address is transmitted with the /RAS signal (Row Address Strobe) active, and then the column address with the /CAS signal (Column Address Strobe) active.
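Address multiplexing for the 14/10 mapping can be sketched as splitting a linear cell address into two values that share the same pins. Which address bits a memory controller assigns to the row is a design choice; taking the upper bits for the row, as here, is an assumption.

```python
ROW_BITS, COL_BITS = 14, 10              # the 14/10 mapping from the text
ADDRESS_PINS = max(ROW_BITS, COL_BITS)   # 14 pins serve both phases

def multiplex(cell_address):
    """Split a linear cell address into the values driven on the shared
    address pins: first the row (page) with /RAS, then the column with /CAS."""
    if not 0 <= cell_address < 2 ** (ROW_BITS + COL_BITS):
        raise ValueError("address out of range for a 16 Mbit array")
    row = cell_address >> COL_BITS               # upper 14 bits
    col = cell_address & ((1 << COL_BITS) - 1)   # lower 10 bits
    return [("/RAS", row), ("/CAS", col)]

print(multiplex(5 * 1024 + 7))   # [('/RAS', 5), ('/CAS', 7)]
```

During the /CAS phase only 10 of the 14 pins carry meaningful bits; the other four are simply ignored by the chip.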

These are the basic principles of dynamic memory. At first sight, throughput therefore appears rather low, as row and column addresses have to be sent to the memory chip for every read or write access. So quite early on, thought turned to how memory throughput could be increased so that the CPU (Central Processing Unit, which performs the actual calculations) would not be starved of data. Looking at CPU code in RAM, it quickly becomes apparent that it is usually stored sequentially. As a consequence, the data requested by the CPU often lie in one page, which makes it unnecessary to transmit the page address with the /RAS signal each time. Instead, it is transmitted only once and stored on the RAM chip, while the memory logic holds the /RAS signal active. The memory chip then knows that fast accesses will follow. Synchronously with the /CAS signal, the chip now receives only the column address; this saves a lot of time during these so-called burst accesses. DRAMs that work in this manner are called Page Mode DRAMs, as they can provide data faster following a page hit.
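The saving from page mode can be illustrated with a cycle count. The costs of the /RAS and /CAS phases (3 and 2 bus cycles) are hypothetical round numbers, and pre-charge is ignored; only the ratio between the two modes matters here.

```python
COL_BITS = 10          # 14/10 mapping: lower 10 address bits select the column
T_ROW, T_COL = 3, 2    # hypothetical costs in bus cycles for the /RAS and /CAS phases

def access_cycles(addresses, page_mode):
    """Total cycles for a sequence of accesses (toy model, no pre-charge)."""
    total, open_row = 0, None
    for addr in addresses:
        row = addr >> COL_BITS
        if not page_mode or row != open_row:
            total += T_ROW     # send the page address with /RAS
            open_row = row     # in page mode /RAS stays active, so the row stays open
        total += T_COL         # send the column address with /CAS
    return total

code = range(0x1000, 0x1008)   # eight sequential addresses, all in one page
print(access_cycles(code, page_mode=False), access_cycles(code, page_mode=True))  # 40 19
```

With sequential code all eight accesses are page hits, so page mode pays for the row address only once.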

The next logical consequence is that the read and write amplifiers do not have to write back the just-read data to the memory matrix after each access, but only after a page change. Thanks to this the RAS pre-charge time is done away with in burst mode, which reduces the CAS cycle time. The Page Mode DRAMs optimized in this manner are known as Fast Page Mode DRAMs (FPM-DRAM), which found their way into the TT and Falcon, for instance.

But FPM-DRAMs still have the disadvantage that the /CAS signal determines how long a read cycle takes. As a result, there is no way to pass the column address of the following access in the meantime; this can only happen once the data on the data bus have been accepted. It is precisely this point that EDO-RAMs (Extended Data Out) tackle. Here the /OE signal has the task of indicating the end of a read cycle. The next column address can therefore be transmitted with the /CAS signal while the previously read datum is still present on the data pins. Thanks to this slight overlap of consecutive accesses, the CAS cycle time can be reduced from 60 ns for conventional DRAMs to 40 ns for FPM-RAM and 25 ns for EDO-RAM. As a result, EDO-RAM can deliver one datum per clock cycle during burst accesses at a bus clock of 40 MHz, while FPM-RAM can only manage this at 25 MHz. It should be noted, however, that the speed gain of EDO over FPM only applies to reading; during writing, no overlap is possible.
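The bus clock figures follow directly from the CAS cycle times: delivering one datum per clock requires a clock period at least as long as the CAS cycle. A quick check using the values from the text:

```python
CAS_CYCLE_NS = {"conventional DRAM": 60, "FPM-RAM": 40, "EDO-RAM": 25}  # from the text

def max_clock_mhz(cas_cycle_ns):
    # One datum per bus clock needs a bus period no shorter than the CAS cycle time
    return 1000.0 / cas_cycle_ns

for name, t in CAS_CYCLE_NS.items():
    print(f"{name}: {t} ns -> one datum per clock up to {max_clock_mhz(t):.1f} MHz")
```

This reproduces the 25 MHz and 40 MHz figures for FPM and EDO, and shows that conventional DRAM would be limited to about 16.7 MHz.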

A further development of EDO-RAM is known as BEDO (Burst Extended Data Out). As the requested data usually lie not only in the same page but also at consecutive addresses within it, BEDO-DRAMs are left to work on their own once they have received the row and column addresses. The memory chip generates the following column addresses internally with an address generator and outputs the required data synchronously with the /CAS signal, which allows the CAS cycle time to be reduced further. Thanks to this, BEDOs can operate at a 66 MHz bus clock with a 5-1-1-1 burst and so are distinctly superior to normal EDOs. Despite the good concept, however, they never found wide acceptance, mainly because of fast SDRAM, and so were only available for a short time.
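The internal address generator and the 5-1-1-1 timing can be sketched as follows. The burst order is an assumption: the sketch uses a linear order that wraps within the aligned burst block (the "sequential" order later used by SDRAMs); actual BEDO parts may order bursts differently.

```python
def burst_columns(start_col, length=4):
    """Column addresses of a burst generated on-chip from one start column.

    Assumption: linear order wrapping within the aligned block of `length`
    columns; real BEDO chips may use a different burst order.
    """
    base = start_col - (start_col % length)
    return [base + (start_col + i) % length for i in range(length)]

def burst_time_ns(pattern=(5, 1, 1, 1), clock_mhz=66.0):
    # Total burst duration: sum of clocks per datum times the clock period
    return sum(pattern) * 1000.0 / clock_mhz

print(burst_columns(6), f"{burst_time_ns():.1f} ns")   # [6, 7, 4, 5] 121.2 ns
```

Only the start column is sent from outside; the remaining three addresses come from the chip itself, which is what allows the "1" clocks after the first datum.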

It is precisely these SDRAMs (Synchronous DRAM) that dominate today's memory market. In principle, this type of memory works exactly like the memory types described above. Nevertheless, SDRAM is appreciably more flexible and intelligent than its predecessors. All control signals are referenced to one clock, which, depending on the generation of the SDRAM, may be 66, 100, 133 or at present 166 MHz. The control signals themselves (/CS, /RAS, /CAS, /WE) should be understood as a kind of command code: their various bit patterns tell the memory exactly what it has to do. The SDRAM latches the current command on a rising clock edge. It is so advanced in its construction that after receiving a command it can carry out the assigned task by itself, without requiring further external control signals.
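How the four control signals act as a command code can be sketched as a lookup table. The encodings below follow the command truth table found in typical JEDEC-style SDR SDRAM datasheets (0 = signal low, i.e. active); treat them as illustrative and check the datasheet of a specific chip. Variants selected by address bits or CKE (for example precharge-all or self refresh) are left out.

```python
# (/CS, /RAS, /CAS, /WE) sampled on the rising clock edge; 0 = low (active).
COMMANDS = {
    (0, 1, 1, 1): "NOP",
    (0, 0, 1, 1): "ACTIVE (open row)",
    (0, 1, 0, 1): "READ",
    (0, 1, 0, 0): "WRITE",
    (0, 1, 1, 0): "BURST TERMINATE",
    (0, 0, 1, 0): "PRECHARGE",
    (0, 0, 0, 1): "AUTO REFRESH",
    (0, 0, 0, 0): "LOAD MODE REGISTER",
}

def decode(cs, ras, cas, we):
    if cs:                     # /CS high: chip not selected, other inputs ignored
        return "DESELECT"
    return COMMANDS[(cs, ras, cas, we)]

print(decode(0, 0, 1, 1), "->", decode(0, 1, 0, 1))   # ACTIVE (open row) -> READ
```

A typical read therefore consists of an ACTIVE command carrying the row address, followed some clocks later by a READ command carrying the column address; the chip does the rest on its own.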

Figure 2: Memory types compared: With refined processes higher memory throughput can be achieved.

To adapt the memory to the requirements of the system, some properties of the memory chip can even be set via the mode register. Among other things, this determines how many memory accesses a burst consists of: 1, 2, 4 or 8 accesses, or the whole page at once. The lead-off cycle (reading the first datum of a burst access), however, lasts 5 clock cycles with SDRAMs too, just as with the memory types described previously. After that, however, the data move to or from the outside world at system clock speed. If desired, a burst can also be terminated, or merely suspended if it is to be continued shortly afterwards. The CAS latency, which can also be set in the mode register, specifies the number of clock cycles after receipt of the column address before the memory outputs the first valid data. SDRAMs are sold as CL2 and CL3 parts. At the same clock rate, a CL2 part delivers its first datum one clock cycle earlier than a CL3 part, which is reflected in its higher price.
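Programming the mode register and the effect of the CAS latency can be sketched as below. The bit layout assumed here (burst length in A2..A0, burst type in A3, CAS latency in A6..A4, A8..A7 = 0 for normal operation) is the one commonly documented for SDR SDRAMs; verify it against the datasheet of the chip in question.

```python
# Assumed SDR SDRAM mode register layout (check the datasheet)
BURST_LENGTH_CODES = {1: 0b000, 2: 0b001, 4: 0b010, 8: 0b011, "page": 0b111}
CAS_LATENCY_CODES = {2: 0b010, 3: 0b011}

def mode_register(burst_length, cas_latency, interleaved=False):
    value = BURST_LENGTH_CODES[burst_length]       # A2..A0: burst length ("page": sequential only)
    value |= int(interleaved) << 3                 # A3: burst type
    value |= CAS_LATENCY_CODES[cas_latency] << 4   # A6..A4: CAS latency
    return value                                   # A8..A7 = 00: normal operation

def first_data_ns(cas_latency, clock_mhz):
    # Delay from the READ command to the first datum: CL clock periods
    return cas_latency * 1000.0 / clock_mhz

print(hex(mode_register(4, 2)))   # 0x22
print(f"CL2 @ 100 MHz: {first_data_ns(2, 100):.1f} ns, CL3 @ 133 MHz: {first_data_ns(3, 133):.1f} ns")
```

The second line also shows why CL alone does not decide speed: a CL3 part at a higher clock can come close to a CL2 part at a lower one, so CL2 only wins outright at the same clock rate.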

A further innovation, besides the clock signal, is that SDRAMs are internally made up of at least two banks, which can be addressed independently. This makes it possible to run certain actions in parallel so that they no longer slow the memory down. For instance, one bank can be read in burst mode while a pre-charge command for the next access is already being executed on the other bank. In this way the pre-charge time, or the 5 clock cycles of the lead-off cycle, can also be hidden by addressing one bank while the other is still delivering data.
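The benefit of two banks can be estimated with an idealised clock count. The model assumes 5-1-1-1 bursts that each need a new row, ignores pre-charge and command-bus conflicts, and assumes the next bank's lead-off fits entirely behind the current burst.

```python
def single_bank_clocks(n_bursts, lead_off=5, burst_len=4):
    # Each burst (5-1-1-1 by default) must finish before the next one starts
    return n_bursts * (lead_off + burst_len - 1)

def two_bank_clocks(n_bursts, lead_off=5, burst_len=4):
    # Idealised interleaving: while one bank streams data, the other bank's
    # command and lead-off already run, so only the first lead-off is visible
    if lead_off - 1 > burst_len:
        raise ValueError("lead-off cannot be hidden behind one burst in this model")
    return (lead_off - 1) + n_bursts * burst_len

print(single_bank_clocks(4), two_bank_clocks(4))   # 32 20
```

For four bursts the interleaved case needs 20 instead of 32 clocks, because after the first burst the data bus is busy on every clock.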

Besides the actual memory chips, the memory modules (DIMMs) also contain a small EEPROM, the SPD-EEPROM (Serial Presence Detect). This can be read out via the I²C bus and contains information about the memory chips on the module, for instance their organization or access times.
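Decoding a few SPD fields can be sketched as follows. The byte positions used (2 = memory type, 3 and 4 = row and column address bits, 5 = module banks, 9 = cycle time with whole nanoseconds in the upper nibble and tenths in the lower) reflect the JEDEC SPD layout for SDR SDRAM as commonly documented; verify them against the SPD specification. The example dump is invented, and reading the EEPROM over I²C is not shown.

```python
MEMORY_TYPES = {0x04: "SDRAM", 0x07: "DDR SDRAM"}

def decode_spd(spd):
    """Decode a few fields from the first bytes of an SDR SDRAM SPD dump."""
    t_ck = (spd[9] >> 4) + (spd[9] & 0x0F) / 10   # byte 9: ns.tenths
    return {
        "type": MEMORY_TYPES.get(spd[2], f"unknown (0x{spd[2]:02x})"),
        "row_address_bits": spd[3],
        "column_address_bits": spd[4],
        "module_banks": spd[5],
        "cycle_time_ns": t_ck,
        "max_clock_mhz": round(1000 / t_ck, 1),
    }

# Hypothetical first ten bytes of a PC133-style module
example = bytes([0x80, 0x08, 0x04, 0x0C, 0x09, 0x01, 0x40, 0x00, 0x01, 0x75])
print(decode_spd(example))
```

From the row and column address bits (12/9 in this invented example) the memory controller can derive the chip organization without any further information.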

After this close look at SDRAMs, we reach the peak of current memory development. Here we find the very expensive RAMBUS memory, which is apparently being used in fewer and fewer computers and will not be discussed further here. Another memory type is a further development of SDRAM, even though the changes are not that great. The much-discussed DDR-SDRAM (Double Data Rate) is attractive because it is said to deliver twice the amount of data of previous DRAMs. However, DDR-SDRAM does not, as one might suppose, simply double the clock rate. Rather, it performs two transfers during one clock cycle: while traditional SDRAMs always synchronize to the rising edge of the clock signal, this memory uses both the rising and the falling edges for data transfer, while commands are still taken over on the rising edge.

Because timing at these speeds is rather delicate, this memory has been given an additional bidirectional signal (DQS) for control purposes. When the memory outputs data, it drives DQS to indicate when the data are valid; when the computer's chipset outputs data, the chipset drives the DQS signal.

A further property of DRAMs, which we have ignored completely until now, is refresh. As we saw at the start of the article, the information of a memory cell is stored in an associated capacitor. Due to leakage currents, however, the capacitor discharges again quite quickly, and this cannot be prevented. This is simply the price paid for the high packing density of the memory, and the resulting need for refresh can noticeably affect speed.

During a refresh, the contents of a page are read out and immediately written back. Naturally, this refresh must occur before the voltage on the capacitor becomes too low for the sense amplifiers to determine the stored information. The maximum time that may pass between two refreshes of the same capacitor is called the refresh period. Since refreshes are always carried out for a complete row, the number of rows determines the refresh cycle count, i.e. how many refreshes have to be executed within one refresh period. Chips with 2¹¹ = 2,048 rows normally have a 2 K refresh, while chips with 4,096 rows have a 4 K refresh. The row refresh interval (the average time between two row refreshes) is fixed at 15.6 µs. With a 2 K refresh this results in a refresh period of 32 ms (2,048 x 15.6 µs ≈ 32 ms), while a 4 K refresh takes 64 ms (4,096 x 15.6 µs ≈ 64 ms).
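The refresh arithmetic can be reproduced as below. The 15.6 µs interval is itself rounded from 64 ms / 4,096 = 15.625 µs, which is why the products come out at 31.9 and 63.9 ms. The 70 ns duration of a single row refresh used for the overhead estimate is a hypothetical value.

```python
T_REFI_US = 15.6   # average interval between two row refreshes, from the text

def refresh_period_ms(rows, t_refi_us=T_REFI_US):
    # Every row must be refreshed once per period: rows x interval
    return rows * t_refi_us / 1000

def refresh_overhead(t_refresh_ns, t_refi_us=T_REFI_US):
    # Fraction of time spent refreshing, for a hypothetical per-row refresh duration
    return t_refresh_ns / (t_refi_us * 1000)

for rows in (2048, 4096):
    print(f"{rows} rows: refresh period {refresh_period_ms(rows):.1f} ms")
print(f"overhead with a 70 ns refresh: {refresh_overhead(70):.2%}")
```

On average, refresh takes only a fraction of a percent of the memory's time; its real cost is that an access arriving during a refresh has to wait.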

Figure 3: Due to leakage currents the capacitors lose charge. Therefore a refresh of the memory contents is necessary.

Figure 4: To perform a refresh one uses signal combinations that are not possible with read and write accesses.

Here too, there are several methods that reduce the impact of refresh delays to a greater or lesser extent. Broadly, three types of refresh can be distinguished:

RAS-only refresh: When addressing the DRAM one proceeds first as for a read access, but after the row address one does not transmit a column address, thus leaving the /CAS signal inactive. After this the DRAM executes a refresh in the specified row.
CAS before RAS refresh: During a normal read/write access the row address is always passed first via /RAS. However, if the /CAS signal is activated first, /RAS specifies the duration of the refresh cycle. Here the row address no longer has to be transmitted, as the DRAM contains an internal self-incrementing address counter for this refresh.
Hidden refresh: This type of refresh is hardly used any more these days, as high bus frequencies leave practically no room for it. If /CAS remains active after a read access and a further /RAS pulse follows, a hidden refresh is triggered. Here too, the DRAM uses its own row address counter. However, hidden refresh only works if the refresh can be completed before the next memory access, in other words if the bus frequency is low enough to hide the refresh between two memory accesses.
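The difference between the three refresh types lies mainly in who supplies the row address. The following toy model ignores timing entirely and only records which rows get refreshed; the class and method names are hypothetical.

```python
class ToyDRAM:
    """Records which rows get refreshed; timing is ignored."""

    def __init__(self, rows):
        self.rows = rows
        self.counter = 0        # internal row counter (CBR and hidden refresh)
        self.log = []

    def ras_only_refresh(self, row):
        # Controller drives the row address with /RAS; /CAS stays inactive
        self.log.append(row % self.rows)

    def cas_before_ras_refresh(self):
        # /CAS goes active before /RAS: no address is sent, the chip uses
        # its own counter and then increments it
        self.log.append(self.counter)
        self.counter = (self.counter + 1) % self.rows

    def hidden_refresh(self):
        # /CAS held active after a read, then another /RAS pulse:
        # internally the same counter-based refresh as CBR
        self.cas_before_ras_refresh()

chip = ToyDRAM(rows=4)
for _ in range(5):
    chip.cas_before_ras_refresh()
print(chip.log)   # [0, 1, 2, 3, 0]
```

With RAS-only refresh the memory controller has to keep its own row counter; CBR and hidden refresh move that bookkeeping into the chip.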

Things are simpler with SDRAMs or DDR-SDRAMs, as one only has to transmit the command for a refresh and one does not have to worry about the timing of the individual control signals.

And finally: for 37 years now, Moore's Law, which states that the number of transistors on a chip, and with it processor capability, doubles every 18 to 24 months, has remained valid, and at present no one can foresee when it will cease to apply. What applies to CPUs is equally true of the capacity of memory chips: both performance and miniaturization follow Moore's Law and are currently advancing rapidly. Improved production processes permitting higher packing densities and higher operating frequencies will ensure in future, too, that the bottleneck between CPU and memory does not become too narrow. But the principle behind DRAM is unlikely to change in the next few years.

A microphotograph of a 4 Mbit chip. The large areas represent the memory matrices.

This article was originally published in German by st-computer magazine, February 2002, and is reproduced in English with kind permission.

Useful links

st-computer magazine
www.st-computer.net

MyAtari magazine - Feature #7, May 2002