mrbombermillzys TT display enhancements

stephen_usher
Posts: 5580
Joined: Mon Nov 13, 2017 7:19 pm
Location: Oxford, UK.

Re: mrbombermillzys TT display enhancements

Post by stephen_usher »

Here are the details:
  • Sampling Opt.
    • Allow TVP HPLL2x On
    • Allow upsample2x On
    • 640x480 Timing
      • H.Sample Rate: 911
      • H S.Rate adj: 0
      • H.synclen: 96
      • H.Backporch: 58
      • H.Active: 640
      • V.Synclen: 2
      • V.Backporch: 29
      • V.Active: 480
  • Output proc
    • 480p/576p Proc: Line2x
Intro retro computers since before they were retro...
ZX81->Spectrum->Memotech MTX->Sinclair QL->520STM->BBC Micro->TT030->PCs & Sun Workstations.
Added code to the MiNT kernel (still there the last time I checked) + put together MiNTOS.
Collection now with added Macs, Amigas, Suns and Acorns.
Cyprian
Posts: 387
Joined: Fri Dec 22, 2017 9:16 am
Location: Poland

Re: mrbombermillzys TT display enhancements

Post by Cyprian »

Steve wrote: Sat Feb 13, 2021 1:00 pm @Cyprian So I assume we can't really create an 'easy fix' of injecting our own custom timing as it will effect the system too much? In that case I wonder if there is a slightly more complicated method we could use like the ossc, but not as expensive as the ossc :)
good question

As there is no easy method for changing the pixel clock, we could try to provide external HSYNC & VSYNC signals (from a small chip connected to the video port) that are better suited to VGA. I've created a new topic about that:
https://www.exxosforum.co.uk/forum/viewt ... 101&t=3939
Lynx I / Mega ST 1 / 7800 / Portfolio / Lynx II / Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
DDD HDD / AT Speed C16 / TF536 / SDrive / PAK68/3 / Lynx Multi Card / LDW Super 2000 / XCA12 / SkunkBoard / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
http://260ste.atari.org
mrbombermillzy
Posts: 1441
Joined: Sun Jun 03, 2018 7:37 pm

Re: mrbombermillzys TT display enhancements

Post by mrbombermillzy »

** Please guys, can we not go off on too much of a topic tangent, or I will forget what I have done here! **

Ok, back to the TT software display enhancements:

So, I can either do substantially more work on this and basically show you the finished product in 2061, or I can show you the already slow progress as it comes.

I've opted for the latter.

Right, so the next thing on the agenda, after walloping the Shifter with as many writes as possible to find its write throughput limit, is to work out what is actually required to get the desired result (i.e. displaying static images with more than 256 colours).

For that we need some information about the images we intend to display on the screen. Pertinent information would be the actual no. of colours displayed on any given raster line for any given picture, just so we have an idea what we are dealing with:

Here's an example picture (albeit a purposely low colour transition/count one used to make me feel good) in 24bpp:

nebula.PNG

To reduce the colour writing needed, we can look at different ways of optimising the display so as not to have to write so many colours to the shifter.

Currently I have found a few optimisations that can be to a greater or lesser extent effective in reducing writes to palette registers:


Optimisation 1: (Smear mode only optimisation) How many times is the palette value at the current position different to the value of the previous (x-1) position?


Note: the good old TT has something similar to a built in RLE compression system, which has its uses for this optimisation.

This counts the number of per-pixel palette changes per line. (This is NOT the same as the number of unique colours per line: you may have just 4 different palette values in the whole line, but if they repeat in sequence, e.g. 269,1082,621,12,269,1082,621,12,..., then that's more than 4 colour changes.)

The end result achieved: the more adjacent positions that need no palette change, the more palette writes of zero can be made (a zero write's cycle time should be less than that of any regular write). This optimisation has to be used in conjunction with smear mode for the speed gains to be possible at all, as writing a zero just repeats the currently set colour in smear mode. (Hence, for this optimisation, only the TT makes it possible! The only other computer that has this mode is the Apple IIGS, and maybe there is some sort of Blitter that can do this too.) Also, if the I/D caches can be harnessed, even bigger benefits could be achieved: writing the same data with the same instructions, if not to the same location, should cache and therefore avoid fetching both I and D, for a good speed boost. (This side of things can possibly be tested with the Hatari profiler.) Testing should be done to see whether CLR.W or MOVE.W #0 is the more effective when caches are involved and in a real world scenario.

opt1.PNG

The image shows horizontally, in groups of two, the no. of 'regular' palette writes, then next to that is the no. of zero writes needed. As you can see, there's quite a saving on this low colour transition image. Obviously there are harder images to deal with, but we will get there, I have no doubt!
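As a rough illustration of how the two columns could be tallied, here is a minimal Python sketch. It assumes a raster line is given as a plain list of colour values, one per palette position; the function name count_smear_writes is made up for this example and is not from the actual tool used to produce the figures.

```python
def count_smear_writes(line):
    """Return (regular_writes, zero_writes) for one raster line.

    `line` is a list of colour values, one per palette position.
    A position whose colour equals the one before it counts as a
    'zero write'; any other position counts as a regular write.
    """
    regular = 0
    zero = 0
    previous = None  # the first position always needs a regular write
    for colour in line:
        if colour == previous:
            zero += 1
        else:
            regular += 1
        previous = colour
    return regular, zero


if __name__ == "__main__":
    # Only 4 unique colours, but every position differs from its neighbour.
    print(count_smear_writes([269, 1082, 621, 12, 269, 1082, 621, 12]))
    # Long runs of the same colour are where the zero writes pay off.
    print(count_smear_writes([5, 5, 5, 7, 7, 5]))
```

The first call shows why the unique colour count is not the figure that matters here: four colours, yet no position can become a zero write.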

The above technique, whilst being quite effective, will definitely be improved by being augmented with the below one...


Optimisation 2: How many palette values on the current line are equal to the value on the same palette position on the previous line?


The end result achieved: This is how many palette writes we can avoid altogether per scanline. e.g.: If the previous line's equivalent palette value is equal to the current one, there is no need to rewrite it with the same data.

Note: the first displayed line (although not having a previous line to compare to) can also avoid writing the full 256 writes/palette, by sneakily comparing with the (previous) bottom line values, once the first frame of the whole screen has been displayed. (However, we have plenty of time before the initial line is displayed to set whatever palette values we require for the initial first line write, so it all works out well).
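A minimal Python sketch of that count, including the wrap-around comparison of the first line against the bottom line. It assumes the image is a list of raster lines, each a list of colour values of equal length, one per palette position; count_changed_slots is a hypothetical name used only for this example.

```python
def count_changed_slots(lines):
    """For each line, count the palette positions whose colour differs
    from the same position on the previous line.

    Line 0 is compared against the last line, which models the
    'previous frame's bottom line' trick for the first displayed line.
    """
    counts = []
    for y, line in enumerate(lines):
        previous = lines[y - 1]  # y == 0 wraps around to lines[-1]
        changed = sum(1 for a, b in zip(line, previous) if a != b)
        counts.append(changed)
    return counts


if __name__ == "__main__":
    image = [
        [1, 2, 3],
        [1, 2, 4],
        [1, 5, 4],
    ]
    # One figure per line: the palette writes still needed for that line.
    print(count_changed_slots(image))
```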

opt2.PNG

As you can see, the image is different for opt.2. We just have a 'regular' palette write figure per line and no zero write column.

The no. of writes starts off close to the 256 that any new image would need. However, as mentioned earlier: 1. The bottom line comparator has kicked in, taking the edge off the full 256 writes needed and 2. It doesn't really matter much anyway, as you should be able to write the whole 256 colour palette at your leisure before displaying the first line of the image (unlike the rest of them).

With this in mind, the lines below the first show a marked reduction in write amount. This is probably about as good as it gets (image/colour reduction wise) with optimisation 2. Therefore it will probably be rendered obsolete once I start testing optimisation 3 onwards.

Even so, let's now look at the combined force of optimisations 1 and 2:

opt1and2.PNG

Chained together, they are obviously more effective.


Optimisation 3: How many palette register colours are repeats of colours already used on the current line? (Or alternatively, how many non unique colour values are on this line?)


This optimisation is an enhanced version of optimisation 1: the enhancement is that non-adjacent repeated colours need not be written either, not just the adjacent ones (in fact, ANY repeated colour is not written). It relies on moving the palette locations around in the actual TT display RAM area.

So for example, instead of having the screen filled with palette 0----255, if palette value 62 is used very often throughout, then we can 'hardwire' the screen values like e.g. 0-62-2-3-4-62-62-62-8-9-10-11-62-13-62...etc-255, or however needed, on a per line basis. The significance of this, specifically on a static image, is that once set up, however many colours are reused like this, they require no extra writes to the colour palette (as they are already written to it and in the correct position!).

Another bonus is that we can use a full 320px image width, provided that the no. of individual colours per line on the 320px wide image does not exceed 256 (or, from another perspective, that the number of repeated palette values is above 63); then we know we have enough colours spare.

Every single repeated colour palette value is one less to be written AND this won't affect optimisation 2 above as the read palette order stays the same.
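To make the idea concrete, here is a minimal Python sketch of the remapping for one line. It is a simplification: it hands out palette slots in first-seen order, whereas the scheme described above leaves non-repeated pixels in their original slots. The name remap_line and the max_slots parameter are assumptions made for this example.

```python
def remap_line(line, max_slots=256):
    """Share one palette slot between all pixels of the same colour.

    `line` is a list of colour values, one per pixel (e.g. 320 of them).
    Returns (palette, indices): `palette` holds each unique colour once,
    and `indices` is what would go into screen memory for this line.
    Raises ValueError if the line needs more than `max_slots` colours.
    """
    slot_of = {}   # colour value -> palette slot
    palette = []   # slot -> colour value
    indices = []   # per-pixel slot numbers
    for colour in line:
        if colour not in slot_of:
            slot_of[colour] = len(palette)
            palette.append(colour)
        indices.append(slot_of[colour])
    if len(palette) > max_slots:
        raise ValueError(
            f"line needs {len(palette)} colours, only {max_slots} slots"
        )
    return palette, indices


if __name__ == "__main__":
    palette, indices = remap_line([62, 7, 62, 9, 62])
    print(palette)   # unique colours, i.e. the palette writes needed
    print(indices)   # screen values pointing at the shared slots
```

The length of the returned palette is the number of writes the line needs from scratch, and the ValueError case is the 320px check: a 320 pixel line fits only if at least 64 of its pixels repeat a colour already used on that line.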

It may not be completely plain sailing though: if the shifter has not yet reached a palette/pixel location whose colour is being rewritten for the next line, the next line's value (i.e. the wrong colour) will be displayed. A rewritten low palette number being repeated towards the end of the line, or a high palette position being written very early in the line write routine, are good examples of the problem:


(Effects of the shifter not reaching a pixel position in time when that pixel's palette value is changed to the next line's new data):

problem1.PNG

However, the problem can be completely eliminated by writing a write order sorting routine: (e.g. if palette position <xx and screen write position >xx OR if palette position >xx and screen write position <xx then write this value first, etc). Splitting the palette writes into two/three and writing the third section first, then the second, or writing the palettes backwards, may also work.
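One possible shape for such a sorting routine, as a minimal Python sketch. This is a different heuristic from the xx comparison above, not an implementation of it: it orders the rewrites by where each slot is last used on the line currently being displayed, so the slots the shifter is finished with earliest get rewritten first. It only addresses writing a slot too early, not writing it too late for the next line, and it ignores real cycle timing. The name order_writes is hypothetical.

```python
def order_writes(current_indices, slots_to_rewrite):
    """Order palette rewrites so slots freed earliest are written first.

    `current_indices` is the screen data (slot number per pixel) for the
    line being displayed. `slots_to_rewrite` lists the slots that need a
    new colour for the next line. Slots not used at all on the current
    line sort first, since they can be rewritten at any time.
    """
    last_use = {}
    for x, slot in enumerate(current_indices):
        last_use[slot] = x  # later positions overwrite earlier ones
    return sorted(slots_to_rewrite, key=lambda slot: last_use.get(slot, -1))


if __name__ == "__main__":
    # Slot 0 is a low palette number repeated late in the line, so
    # rewriting it first would corrupt the pixel at x = 3.
    print(order_writes([3, 0, 1, 0, 2], [0, 1, 2, 3, 9]))
```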


Optimisation 4: How many palette values are the same as used on the previous line? (NOT necessarily directly above in the same x axis palette position)


Similar to optimisation 2, but in this case, if there is a matching value that's not at the same palette no. as before, we can just swap the positions in the TT screen memory to compensate. So this is another static-graphic-only optimisation, which should further reduce the no. of palette changes needed per line.

This optimisation will also need the same write order sorting routine as optimisation 3.
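Since this optimisation is not finished yet, the following is only a minimal Python sketch of how the slot reuse might work, under the assumption that the previous line's palette is available as a list of colour values indexed by slot. The name assign_slots and the policy of taking the lowest free slot are made up for this example.

```python
def assign_slots(prev_palette, line):
    """Reuse palette slots whose colour is already set from the line above.

    `prev_palette` is the palette as left by the previous line (list of
    colour values indexed by slot). `line` is the new line's colour per
    pixel. Returns (palette, indices, writes), where `writes` is the
    number of slots that had to be given a new colour.
    """
    needed = list(dict.fromkeys(line))  # unique colours, in first-seen order
    needed_set = set(needed)
    palette = list(prev_palette)

    # Keep every slot that already holds a colour this line needs.
    slot_of = {}
    for slot, colour in enumerate(palette):
        if colour in needed_set and colour not in slot_of:
            slot_of[colour] = slot

    kept = set(slot_of.values())
    free = [slot for slot in range(len(palette)) if slot not in kept]

    # Only colours missing from the previous palette cost a write.
    writes = 0
    for colour in needed:
        if colour not in slot_of:
            if not free:
                raise ValueError("not enough palette slots for this line")
            slot = free.pop(0)
            palette[slot] = colour
            slot_of[colour] = slot
            writes += 1

    indices = [slot_of[colour] for colour in line]
    return palette, indices, writes


if __name__ == "__main__":
    # Colours 30 and 10 are already in the palette, only 50 is new.
    print(assign_slots([10, 20, 30, 40], [30, 10, 50, 30]))
```

The returned indices are the screen memory values for the line, which is where the position swap takes place.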

Unfortunately, optimisations 3 and 4 have not been finished yet, as can be seen:

noopt3.PNG


***************************************************************************************************


To make use of the above optimisations together, we must test each individual palette value for eligibility for all of the optimisations at the same palette location. Only then do we get a correct picture of how many writes can be avoided, or at least sped up, because an earlier optimisation may well negate the need for the next one in each optimisation chain 'set', and because each test individually accumulates its own palette position 'hits' for the writes that can be avoided.

This will precisely give the actual number of palette writes needed, after the above optimisations; faster writes or otherwise. We can then use this to dictate suitable image sizes or colour reductions needed. (Hopefully few and far between).

Of course, the above calculations do not factor in any speed gains from faster zero writes. They will require extra benchmarking, as I believe the speed increase can be substantially more, due to either of the 030 I/D caches kicking in.

Anyhow, that's about as far as I have reached with the display system.

The next post with more information on this will probably be once optimisation 3 (at least) has been wired in.