Spent most of today trying to remove some of the bottleneck from 'DOT' rendering. Previously it was OK for a few points here and there, but it would simply choke when trying to draw a full screen of dots. This is not acceptable!
It turns out that a hell of a lot of time was lost in clipping overhead, so after a little tinkering, hey presto, it's now able to draw 800*600 dots in about 400ms. Still not amazing, but that's about 10 times faster than it was previously for the same number of pixels.
One of the issues is that DOT is basically a safe render, which means it supports clipping, and handles the destination buffer and the various draw modes for you.
So obviously we can gain a little more back by removing its safeness and letting the user lock/unlock the buffers and handle clipping, which gives us FastDOT.
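To make the safe versus fast trade-off concrete, here is a minimal sketch in Python (not PlayBASIC, and not how DOT or FastDOT are actually implemented). The names `safe_dot` and `fast_dot` and the flat list standing in for a locked buffer are assumptions made purely for illustration.

```python
# Illustrative model only: a flat list stands in for the locked screen buffer.
WIDTH, HEIGHT = 8, 4

def safe_dot(buf, x, y, colour):
    """'Safe' plot: every pixel is clipped before it is written."""
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        buf[y * WIDTH + x] = colour

def fast_dot(buf, x, y, colour):
    """'Fast' plot: no clipping, the caller must keep x and y on screen."""
    buf[y * WIDTH + x] = colour

buffer = [0] * (WIDTH * HEIGHT)

# An off-screen plot is silently ignored by the safe version.
safe_dot(buffer, WIDTH, 0, 0xFF0000)
clipped_ok = all(pixel == 0 for pixel in buffer)

# A full-screen fill never leaves the buffer, so the per-pixel test is wasted work.
for y in range(HEIGHT):
    for x in range(WIDTH):
        fast_dot(buffer, x, y, 0xFF00FF)

# Without clipping, x = WIDTH on row 0 lands on the first pixel of row 1.
stray = [0] * (WIDTH * HEIGHT)
fast_dot(stray, WIDTH, 0, 1)
```

The per-pixel bounds test is the cost the safe version pays on every call. The last two lines show the price of dropping it: an out-of-range coordinate writes somewhere it shouldn't.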
FastDot on my machine is about 2.5 times faster than DotC. You can render a full screen of pixels (800*600) in 150 milliseconds on my Duron 800MHz. That's about 5 to 6 fps for a pixel-by-pixel screen fill (code below). Still not staggering, but a very healthy improvement.
Code for PlayBASIC V1.13
[pbcode]
w=getscreenwidth()
h=getscreenheight()
rendertoscreen
; Render the full Screen Using DOTC and record how long it takes
tim1=timer()
lockbuffer
; loop over exactly w*h pixels so both tests fill the same area
For ypoint=0 To h-1
For xpoint=0 To w-1
dotc xpoint,ypoint,rgb(255,0,0)
Next
Next
unlockbuffer
tim1=timer()-tim1
c=rndrgb()
sync
waitkey
basec=rgb(255,0,255)
Do
cls 0
tim2=timer()
c=basec
lockbuffer
For ypoint=0 To h-1
For xpoint=0 To w-1
fastdot xpoint,ypoint,c
Next
c=c+xpoint+1
Next
unlockbuffer
tim2=timer()-tim2
; show the time of DOTC and FASTDOT
print tim1
print tim2
sync
loop
[/pbcode]
UPDATE NOTES: (14th Nov 2022)
-Read PlayBASIC Help Files about FAST DOT 2 (http://playbasic.com/help.php?page=GRAPHICS.FASTDOT2)
-Read PlayBASIC Help Files about FAST DOT 3 (http://playbasic.com/help.php?page=GRAPHICS.FASTDOT3)
-Read PlayBASIC Help Files about FAST DOT 4 (http://playbasic.com/help.php?page=GRAPHICS.FASTDOT4)
Very nice, getting these full scale pixel based rendering functions working fast seems like a pain in the tailhole :)
Well, the main drama is keeping things generic. Generic and fast really don't go together. It doesn't matter how much fat I trim away from the edges, it's still generic. A more viable approach would be to implement a pointer data type, so the user can write their own custom dot filler. Although, this is really a situation where a concept like PB-Asm would shine.
Pointer example
[pbcode]
Dim Address as pointer
Dim FrameBufferAddress as pointer
FrameBufferAddress= GetSurfacePtr(0)
FrameBufferModulo=GetSurfaceModulo(0)
; assume a 32bit filler
; rows are numbered from 0, so row 0 starts at the buffer address
For Ylp=0 to height-1
Address=FrameBufferAddress+(Ylp*FrameBufferModulo)
For Xlp = 1 to width
*Address = Rgb(255,0,255)
inc Address,4
next
next
[/pbcode]
That would certainly be quicker in the long term, but the drawback is that the user has to support all video formats manually.
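To show what "supporting the video formats manually" means in practice, here is a rough Python sketch of a row-by-row fill over a raw byte buffer. The function name `fill_surface`, the `pitch` parameter (bytes per row, i.e. the modulo) and the two pixel layouts (XRGB8888 and RGB565, little endian) are assumptions for illustration. They are not taken from PlayBASIC.

```python
import struct

def fill_surface(width, height, pitch, depth, rgb):
    """Fill a raw surface row by row, packing the colour for the given depth.

    pitch is the number of bytes per row (the 'modulo'); it may be larger
    than width * bytes_per_pixel when rows are padded.
    """
    r, g, b = rgb
    if depth == 32:
        # Assumed layout: XRGB8888, little endian
        pixel = struct.pack("<I", (r << 16) | (g << 8) | b)
    elif depth == 16:
        # Assumed layout: RGB565, little endian
        pixel = struct.pack("<H", ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3))
    else:
        raise ValueError("unsupported depth: %d" % depth)

    surface = bytearray(pitch * height)
    row = pixel * width
    for y in range(height):
        start = y * pitch              # row address = base + y * modulo
        surface[start:start + len(row)] = row
    return surface

# 2x2 surface, 32bit, with 4 bytes of padding at the end of each row
padded = fill_surface(2, 2, 12, 32, (255, 0, 255))
# the same colour on a 16bit surface needs different packing
sixteen = fill_surface(2, 1, 4, 16, (255, 0, 255))
```

Every extra depth or channel order is another branch the user's filler has to carry, which is exactly the burden the generic commands hide.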
Conceptually, if we go ahead with PB-Asm, this would probably be the fastest way to generate time critical code, without it being totally platform dependent.
[pbcode]
Dim Address as pointer
Dim FrameBufferAddress as pointer
FrameBufferAddress= GetSurfacePtr(0)
FrameBufferModulo=GetSurfaceModulo(0)
; assume a 32bit filler
FillColour =Rgb(255,0,255)
; rows are numbered from 0, so row 0 starts at the buffer address
For Ylp=0 to height-1
Address=FrameBufferAddress+(Ylp*FrameBufferModulo)
Asm
; Seed registers (R0 through R3 32bit)
Mov.l R0, Width
Mov.l R1, FillColour
Mov.l R2, Address
; Fill loop
Loop:
Mov.l (R2), R1
Add.l R2,4
DecBne R0,Loop
EndAsm
next
[/pbcode]
The main appeal of implementing something like PB-Asm would be that it's a way to bypass the variables/pointers and manipulate memory directly. The Asm segments could be jitted to the host platform's native machine code. Given the simplicity of the potential instruction set, most, if not all, operations would translate 1 to 1. In cycle terms that's about 4/5 cycles per pixel for that fill loop, compared to the 100's of cycles it takes now per pixel.
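As a rough way to see how little work that fill loop does per pixel, here is a small Python model that steps through the three loop instructions by hand. It is only a sketch. The reading of `DecBne` as "decrement, then branch if not zero" and the word-addressed `memory` list are assumptions, and it counts instructions rather than real CPU cycles.

```python
def run_fill(width, fill_colour, memory, address):
    """Step through the three-instruction fill loop, counting instructions.

    memory is a list of 32bit words; address is a byte offset into it.
    As with the asm loop, width is assumed to be at least 1.
    """
    r0, r1, r2 = width, fill_colour, address  # the three Mov.l seeds
    executed = 3
    while True:
        memory[r2 // 4] = r1   # Mov.l (R2), R1  : store the colour
        r2 += 4                # Add.l R2,4      : step to the next pixel
        r0 -= 1                # DecBne R0,Loop  : count down, loop if not zero
        executed += 3
        if r0 == 0:
            break
    return executed

row = [0] * 8
instructions = run_fill(8, 0xFF00FF, row, 0)
```

For an 8 pixel row that is 3 seed instructions plus 3 per pixel, so the per-pixel cost stays fixed no matter how wide the row is.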
But anyway, I digress...
Fast Dot Revisited
I've been quietly optimizing some of the old VM baggage away from PB1.17. This is often necessary, as over time things get bloated and can often be streamlined. While I do have a pre-set group of standard benchmarks/results I use when testing for speed, these are mainly math and loop orientated. So I figured I'd use the raw DOT screen filler as a gfx one.
Results:
In the screenshot above, PB1.13 is filling a screen full of pixels (800*600*32bit) in 150ms.
PB1.17 now performs this task in 132ms
Test machine: Duron 800MHz, GF2 video, WinXP Pro
In frame rate terms that's about another full frame per second faster (as it saves close to 20 milliseconds per frame). That doesn't sound impressive, but effectively it means the brute loop crunching power of PB is about 12% better in this situation.
If you calculate the fill rate, you can get an idea of just how many pixels my test machine can fill at a reasonable rate. ( Fill rate = (Fill Width * Fill Height) / Milliseconds )
My machine (Duron 800MHz) fills about 3200 pixels per millisecond. So it's fast enough to do this in 320*240*32bit at 38/40fps, which is pretty staggering (to me at least), as it sure wasn't able to come close to that just a few weeks ago.
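The figures above can be checked with a few lines of arithmetic. This Python sketch just applies the fill rate formula to the numbers quoted in this post. The function name `fill_rate` is made up for the example.

```python
def fill_rate(width, height, ms):
    """Pixels filled per millisecond."""
    return (width * height) / ms

rate_113 = fill_rate(800, 600, 150)   # PB1.13: 3200.0 pixels per ms
rate_117 = fill_rate(800, 600, 132)   # PB1.17: a little over 3636 pixels per ms

# relative time saved going from 150ms to 132ms
gain = (150 - 132) / 150              # 0.12, i.e. about 12%

# best-case time and frame rate for a 320*240 fill at the PB1.13 rate
ms_320 = (320 * 240) / rate_113       # 24.0 ms
fps_320 = 1000 / ms_320               # roughly 41.7 fps

print(rate_113, round(rate_117), gain, ms_320, round(fps_320, 1))
```

The 38/40fps seen in practice sits a little under that best case, presumably because the cls, print and sync calls add their own time to each frame.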
[pbcode]
;makebitmapfont 1,$ffffff
w=getscreenwidth()
h=getscreenheight()
w=320
h=240
openscreen w,h,32,2
rendertoscreen
;ScreenVsync on
; Render the full Screen Using DOTC and record how long it takes
tim1=timer()
lockbuffer
; loop over exactly w*h pixels so both tests fill the same area
For ypoint=0 To h-1
For xpoint=0 To w-1
dotc xpoint,ypoint,rgb(255,0,0)
Next
Next
unlockbuffer
tim1=timer()-tim1
basec=rgb(255,0,255)
Do
cls 0
rendertoscreen
dot 0,ypoint
tim2=timer()
c=basec
lockbuffer
For ypoint=0 To h-1
For xpoint=0 To w-1
fastdot xpoint,ypoint,c
Next
c=c+xpoint+1
Next
unlockbuffer
tim2=timer()-tim2
basec=basec+w
; show the frame rate, the FASTDOT time and the fill rate
print fps()
print "MS"+str$(tim2)
print "Fill Rate:"+str$(Float(w*h)/tim2)
sync
loop
[/pbcode]
Are you going to throw PB-ASM in? Looks extremely useful...
Probably, although it's not like I can just throw it in. Effectively it's like producing a mini compiler, within a compiler.
To me PB-Asm sounds like a way to circumvent engine/interpreter problems.
Personally I would avoid something like PB-Asm - it will result in a second language to be supported/improved/bugfixed...
Maybe it makes more sense to enhance the interpreter. The issue you're trying to solve looks pretty much like "HotSpot" from Java.
With JIT the SUN people were able to transform method code into machine code but the call stack (the sequence of methods to be called) was still interpreted in Java. So SUN worked on HotSpot which means they transform whole call stack regions into native code.
This works pretty well for big loops for example. Maybe that gives you some more ideas, Kevin...
But that's just my 2 cents ;)
Cheers,
Tommy
Who would actually have a use for PB-Asm? As Thaaks says, it's like a 2nd language with all the problems involved with it. Also, PB is meant to be an easy way to program games. Personally I would like to see PB FX first, and perhaps other things that are more directly usable for game creation. PB FX would allow us to do great looking modern 2D games. When that's taken care of, there's nothing that stops including more complicated features.
I knew this would be misinterpreted. Implementing something like PB-Asm is a low priority; the concept is as old as PB itself. However, there is certainly a need for a way to streamline time critical loops without the compiler generated overheads getting in the road.
The plan has always been to compile the source down to one generic byte code instruction set (which is what it already does), then translate the byte code to native machine code where possible. The translation can occur either in the platform VM, or externally (i.e. as a module). Anyway, the issue (one of them) is that no matter how clever the code generator is, it's highly unlikely to be able to reach the speed of a manually laid out asm loop. But it'll certainly be a lot quicker either way :)
The idea for PB-ASM opens up all kinds of new possibilities for PB. It would certainly attract wanna-be ASM coders and the computer science crowd. Imagine :
* Learn the fundamentals of ASM -- using PlayBasic!
* Learn the fundamentals of making your own operating system -- using PlayBasic!
* Learn the fundamentals of making your own programming language -- using PlayBasic!
I began a 6502 emulator project for this exact reason; sadly, I lack the time to program it myself.
It will be a while before I'm able to get into computer science again, but I can definitely say that there is a market for this kind of thing.
The two requirements for attracting this audience would be
* Include PB-ASM in PB Source -- for people looking to write their own ASM routines.
* An option to develop using only the VM2.
If you ever want to continue with it, please post a brainstorming thread. :)
-- Shawn
See Kyruss (http://www.underwaredesign.com/forums/index.php?topic=529.0)