Economizing Image Blitting

Started by kevin, February 09, 2007, 10:02:06 AM


kevin




 Visit www.PlayBasic.com


NOTE: You'll find these articles and MANY more in the PlayBASIC HELP FILES under the ABOUT TAB



Economizing Image Blitting (drawing!)







I like Big Blits


If you've been programming for a while, you've perhaps noticed a strange anomaly when drawing (blitting) images: why does the blitter (the image drawer) seem to choke when drawing lots of small images, when it can throw around huge ones easily?

To answer this, we have to look at how Windows gives us control of the video hardware. Note that we're primarily discussing the rendering of Video Images (those stored in your graphics card's video memory) and not those in system memory.

Your PC has what's termed a blitter. You can think of this as the part of the graphics chip designed solely to draw/copy rectangular graphic data quickly. Every time you create an image (in video memory) and render it, PB uses the blitter device to transfer the pixel information for you. Now this is all lovely, but there's a catch! The blitter is a shared device and we're not the only ones using it (Windows and other programs are using it too).

Since the blitter is shared, when we want to draw (blit) something, we're forced to wait in line for it to become available (it may be busy with a previous draw call, or with some other program/task). Once it's free, we can send it our drawing job and continue on our way. The drawing will take place while our program continues running. This is called asynchronous rendering, which is a fancy way of saying the graphics chip is drawing while the main CPU continues on doing something else. (Note: Not all video cards support this!)

The issue is that if we call the blitter frequently, we're inevitably going to end up stalling our program in wait loops, while DX waits for the graphics card's blitter to become available to us. This is what occurs when we try to draw lots of small images with the blitter. While the drawing might be quick once the blitter is acquired, all this polling/waiting for it adds up to one sure thing: lots of lost time, which can really impact performance. You've probably noticed this when drawing Maps with really small blocks, for example.
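To put rough numbers on how many trips to the blitter that can mean, here is a small Python sketch (Python is used purely as a calculator here, it is not PlayBASIC code). It counts the tiles needed to cover an 800*600 screen at various block sizes. It assumes one blitter call per tile and ignores the extra row and column a scrolling offset would add; how the tile drawing is actually batched internally is not something this sketch knows.

```python
import math

SCREEN_W, SCREEN_H = 800, 600  # assumed screen size for the illustration


def blits_per_frame(block_size, width=SCREEN_W, height=SCREEN_H):
    """Tiles needed to cover the screen, assuming one blitter call per tile."""
    cols = math.ceil(width / block_size)
    rows = math.ceil(height / block_size)
    return cols * rows


for size in (8, 16, 32, 64, 128, 256):
    print(f"{size:>3} x {size:<3} tiles -> {blits_per_frame(size):>5} blits per frame")
```

Under those assumptions, covering the same screen takes 7500 calls with 8*8 blocks but only 12 with 256*256 blocks.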

Let's demonstrate. In the example below we're going to render a tiled image to the screen and monitor the FPS. Obviously newer GPUs with faster blitters will cope better with this than older ones. This doesn't mean the issue has gone away, just that it impacts those users less.

In this example we'll draw a tiled image using 8*8, 16*16, 32*32, 64*64, 128*128 and 256*256 blocks, then monitor the average FPS at each size.


PlayBASIC Code:
openscreen 800,600,32,2

Seconds=5      ; how long to run each test
Maxtests=6     ; number of block sizes to test

; Results recorded for each block size
Type tFrame
   Afps#
   Frames
   Size
endtype


dim Info(Maxtests) as tFrame

TestCount=0
BlockSize=8


repeat

   ; Build a BlockSize*BlockSize test image (image 1)
   Cls 0
   ShadeBox 0,0,BlockSize,BlockSize,rgb(255,0,0),rgb(255,255,255),255,rgb(0,255,0)
   getimage 1,0,0,BlockSize,BlockSize
   EndTime=timer()+Seconds*1000
   frames=0

   ; Tile the image over the screen until the time is up
   repeat
      inc frames
      TileImage 1,Xpos,Xpos,false
      Xpos=mod(Xpos+1,BlockSize)

      Info(testCount).frames =frames
      Info(testCount).Size =BlockSize
      text 0,0,str$(fps())+" Block Size:"+str$(BlockSize)
      Sync
   until Timer()>EndTime

   ; Double the block size for the next test
   BlockSize=BlockSize*2
   inc testCount
until TestCount>=Maxtests


; Show the results
cls 0
print "Results"


;writefile "C:\BlitResults.txt",1

For lp=0 to Maxtests-1
   f#=info(lp).frames
   f$=digits$(f#,5)
   fr$=str$(f#/seconds)
   m$=digits$(lp,2)+" Frames:"+F$+" Fps:"+fr$+" Size:"+str$(info(lp).Size)

   print m$
   ; writestring 1, m$
next
;closefile 1
Sync
Waitkey








Results (Duron 800 & GeForce 2 MX200):


00  Frames:00050  Fps:10.0  Size:8
01  Frames:00193  Fps:38.6  Size:16
02  Frames:00572  Fps:114.4  Size:32
03  Frames:00721  Fps:144.2  Size:64
04  Frames:00746  Fps:149.2  Size:128
05  Frames:00755  Fps:151.0  Size:256
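Combining these frame rates with a per-frame tile count gives a feel for where the time goes. The Python sketch below is only back-of-envelope arithmetic on the figures above. It assumes one blit per tile and a fully covered 800*600 screen, and the helper names are made up for the illustration.

```python
import math

# (block size, measured FPS) from the Duron 800 / GeForce 2 MX200 run above
RESULTS = [(8, 10.0), (16, 38.6), (32, 114.4), (64, 144.2), (128, 149.2), (256, 151.0)]


def tiles_on_screen(size, width=800, height=600):
    """Tiles needed to cover the screen (assumes one blit per tile)."""
    return math.ceil(width / size) * math.ceil(height / size)


def blits_per_second(size, fps):
    """Estimated blitter calls issued per second at a measured frame rate."""
    return tiles_on_screen(size) * fps


for size, fps in RESULTS:
    print(f"size {size:>3}: {blits_per_second(size, fps):>7.0f} blits/sec at {fps} fps")
```

On these numbers the 8 and 16 pixel runs both land at roughly 73,000 to 75,000 blits per second, which is consistent with the per-call wait being the limit there rather than the pixel count. The 256 pixel run needs fewer than 2,000 calls per second to reach its 151 fps.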










Big C.

These are my results for information based on AMD 3500+ (Socket 939) and GeForce 6600 GT...


00  Frames:00117  Fps:23.4  Size:8
01  Frames:00474  Fps:94.8  Size:16
02  Frames:01687  Fps:337.4  Size:32
03  Frames:04503  Fps:900.6  Size:64
04  Frames:06138  Fps:1227.6  Size:128
05  Frames:06696  Fps:1339.2  Size:256


Big C.

kevin

#2




Crash course in Image & Sprites Draw Modes



One of the most misunderstood aspects of PlayBasic (among other things) is how to design your program to take advantage of image & sprite draw modes. Draw modes give us control over how sprites are rendered. This is nice and all, but certain render modes can place additional stress on your computer if you're not careful, making them slower than need be. This most commonly occurs when using the Alpha Blended modes. In fact, it'll affect any draw mode that has to read from the destination surface.

Getting the best out of it comes down to how well you understand the different types of image buffers PlayBasic offers us. Today (PB1.63 and below) we have 2 primary image types (note: future editions of PlayBasic have more), which are:


Image Type #1   - Video Image

Video Images are images where the pixel data is stored in your computer's graphics card memory. These images can be copied/drawn to the screen (which is also in your video card's memory) very quickly, since they utilize your video card's blitter, which is the part of the GPU specially designed for this purpose.

While it's fast to transfer images around in video memory with the blitter (on most cards), the blitter has its fair share of limitations. Namely, it can basically only copy & fill rectangles of pixels, and that's about it.

So all other rendering is in the hands of the CPU. While the CPU can write data to images in video memory very quickly, it can't read from them quickly. In effect, reading from video memory is 20 to 30 times slower than writing! It's even worse on some systems. Most people no doubt assume this is a PlayBasic thing. It's not; it's a limitation of how the PC was designed. It doesn't matter what language you use, we just can't read from video memory at the same rate we can write to it.

Basically, Video images are our best option if we want to draw loads of solid (no alpha), non-rotated sprites around the screen.



Image Type #2   - FX Image

FX images are a variation of normal images. The primary difference is that the pixel data is stored in your computer's main memory. While there's no visible difference between the two, they do give us a different set of abilities.

Being stored in system memory grants us the freedom of fast access to the pixels in the image, so we can read & write pixels as fast as our CPU can manage. But there's a price: FX images are slower to draw to the screen, since the drawing has to be performed by the CPU. We can't use the video card's blitter, since it was designed to work with image data stored in video memory, not system memory.

Now since FX images are stored in system memory, this gives us the ability to draw them rotated/scaled in real time. This is possible as the software rotation code can read the pixels from the image as fast as the CPU / memory will allow. We couldn't do this as fast if the image was stored in video memory. You can try that yourself: LoadImage "myIMageNameHere",1, then try and draw it rotated. It'll be slow for the aforementioned reason. Then try it with an FX image.

   


Common Mistakes


OK, so we've established that if we want to rotate an image or read from it a lot, then FX images are probably our best bet at this time. But what about rendering translucency-style image effects such as Alpha Blending, Alpha Addition, Alpha Subtract, Logical operations etc.?

This is where the most common mistakes are made! E.g. a newbie loads up an FX image and tries to alpha blend it to the screen. The first thing they notice is that it's nice and slow! To explain why, we'll examine the basic process that's being performed when we blend images together.


  For each pixel we're doing the following.



 Step #1  Read the Src pixel from the image we're drawing

 Step #2  Read the corresponding Dest pixel from the destination image (the one we're going to overwrite/blend with)

 Step #3  Perform the blending operation.

 Step #4  Output the newly blended pixel to the destination image.
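The four steps above can be sketched in a few lines of Python. This is a generic illustration, not PlayBASIC's actual blending code: the plain weighted-average formula, the (r, g, b) tuples and the function names are all assumptions made for the example.

```python
def blend_pixel(src, dest, alpha):
    """Blend one (r, g, b) source pixel over a destination pixel.

    alpha runs from 0.0 (all destination) to 1.0 (all source).
    """
    return tuple(int(s * alpha + d * (1.0 - alpha)) for s, d in zip(src, dest))


def blend_image(src_img, dest_img, alpha):
    """Blend src_img onto dest_img in place; both are lists of rows of (r, g, b)."""
    for y, row in enumerate(src_img):
        for x, src in enumerate(row):              # Step 1: read the source pixel
            dest = dest_img[y][x]                  # Step 2: read the destination pixel
            mixed = blend_pixel(src, dest, alpha)  # Step 3: perform the blend
            dest_img[y][x] = mixed                 # Step 4: write the result back


src = [[(255, 0, 0), (255, 0, 0)]]
dest = [[(0, 0, 255), (0, 0, 0)]]
blend_image(src, dest, 0.5)
print(dest)
```

The line marked Step 2 is the one to watch: it runs once for every blended pixel, and it reads from wherever the destination image happens to live.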



  Now, that seems simple enough, so what are we missing ?  

Notice that in order to blend a pixel, we have to read pixels from the destination image? Hmmm, well what if that destination is in video memory? Wouldn't all that reading from video memory slow our drawing routine down? Yep, it certainly will!

While you can get away with blending small images directly to the screen or video images, the better approach, if you want heavy amounts of blending, is to draw your screen to a screen sized FX image, do all your blending on that, then transfer (draw) that FX image to the screen. This avoids reading video memory completely.

The upside of this approach is that we get much faster blending. However, we do lose the assistance of the video card's blitter while doing this, so commands like CLS/BOX and drawing solid images/maps will be slower. How slow depends on how fast your CPU/memory is, and don't forget we have to transfer the whole image to the screen. But even with this added burden, it's still way faster than attempting to render blend effects directly to video memory.
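A rough cost model shows why the extra full-screen copy can still pay off. The Python sketch below is an illustration under loud assumptions: every write and every system memory read costs 1 unit, and a video memory read costs 25 units (the middle of the 20 to 30 times figure mentioned earlier). Real hardware will not behave this neatly, and the model ignores clearing the buffer and the lost blitter help for solid drawing.

```python
VIDEO_READ = 25  # assumed: midpoint of the "20 to 30 times slower" figure
SYS_READ = 1     # assumed cost of reading a pixel from system memory
WRITE = 1        # assumed cost of writing a pixel anywhere


def cost_direct(blended_pixels):
    """Blend FX image sprites straight onto a surface in video memory."""
    return blended_pixels * (SYS_READ + VIDEO_READ + WRITE)


def cost_fx_buffer(blended_pixels, screen_pixels):
    """Blend into a screen sized FX image, then copy that image to the screen."""
    blend = blended_pixels * (SYS_READ + SYS_READ + WRITE)
    copy = screen_pixels * (SYS_READ + WRITE)
    return blend + copy


screen = 640 * 480
for sprites in (1, 10, 100):
    blended = sprites * 32 * 32
    print(sprites, cost_direct(blended), cost_fx_buffer(blended, screen))
```

With these assumed weights the two approaches break even at 25,600 blended pixels per frame (25 sprites of 32*32 on a 640*480 screen). Below that, blending directly is cheaper, which matches the point about getting away with small images. Above it, the FX buffer wins by a growing margin.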


Example

This example shows the process of rendering sprites to an FX image, then drawing the image to the screen so we can see it. Those with a keen eye will notice it's a variation of the Alpha sprite example that comes in the PB example pack. The main difference is the scrolling backdrop isn't present in this version.

Anyway, I've provided the example so you can get an idea of how much FX image blitting your system can push around. I recommend experimenting with the sprite particle size and screen depth. Every system will have a balancing point.


PlayBASIC Code:
global Use_FX_Buffer = true
Constant Particle_Size =32 ; try 16, 24, 32, 48, 64, 96, 128
Constant RequiredFrameRate =30


OpenScreen 640,480,32,2
; OpenScreen 640,480,16,2

#include "BlitIMage"

MakeBitmapFont 1,$ffffff

sw=GetScreenWidth()
sh=GetScreenHeight()

Dim ParticleImages(4)
Size=Particle_Size
ParticleImages(0)=MakeParticle(size,RGB(255,RndRange(100,200),Rnd(75)))
ParticleImages(1)=MakeParticle(size,RGB(255,RndRange(20,40),Rnd(15)))
ParticleImages(2)=MakeParticle(size,RndRGB())
ParticleImages(3)=MakeParticle(size,RndRGB())
ParticleImages(4)=MakeParticle(size,RndRGB())


Type tObject
   Status
   x#,y#
   xdir#,ydir#
   sprite
   rotspeed#
EndType

max=10
Dim Objects(max) As tobject

Gosub INit_Objects


Screen=NewFXImage(sw,sh)

; ------------------------------------------------------------------
; Start of Main Loop
; ------------------------------------------------------------------

Do
   Gosub Update_Logic
   Gosub Render_Scene
   Sync
Loop



` *=----------------------------------------------------------------------=*
` >> Update Sprites Rebound Logic <<
` *=----------------------------------------------------------------------=*

Update_logic:
   For lp=0 To max
      If Objects(lp).status
         spr=Objects(lp).sprite
         MoveSprite spr,objects(lp).xdir#,objects(lp).ydir#
         If SpriteInRegion(spr,0,0,sw,sh)=False
            If GetSpriteX(spr)<0
               objects(lp).xdir#=objects(lp).xdir#*-1
            EndIf
            If GetSpriteX(spr)>sw
               objects(lp).xdir#=objects(lp).xdir#*-1
            EndIf
            If GetSpriteY(spr)<0
               objects(lp).ydir#=objects(lp).ydir#*-1
            EndIf
            If GetSpriteY(spr)>sh
               objects(lp).ydir#=objects(lp).ydir#*-1
            EndIf
         EndIf
         TurnSprite spr,objects(lp).rotspeed#
      EndIf
   Next

   If UpKey()
      max=max+10
      Gosub INit_Objects
   EndIf


   If DownKey() And max>10
      max=max-10
      Gosub INit_Objects
   EndIf

Return



` *=----------------------------------------------------------------------=*
` >> Render The Current Scene <<
` *=----------------------------------------------------------------------=*


Render_Scene:
   ClsColour=rgb(110,140,170)

   if Use_FX_Buffer=true
      RenderToImage screen
   else
      Cls ClsColour
   endif

   ; Draw the Sprites
   DrawAllSprites

   if Use_FX_Buffer=true
      RenderToScreen
      ; render the FX screen to the real screen
      BlitImageClear(Screen,0,0,ClsColour)
   endif


   if Use_FX_buffer=false
      t$="[Video Render] "
   else
      t$="[FX Render] "
   endif

   Text 0,0,t$+Str$(max)+" Sprites @ "+str$(CurrentFPS)+"fps"

   if enterkey()
      Use_FX_Buffer=1-Use_FX_Buffer
      Flushkeys
   endif

   if SpaceKey()
      SetCursor 0,20
      Pixels=Max*(Particle_Size*Particle_Size)
      Print " Blended Pixels :"+Str$(pixels)
      Print "Dots Per Second :"+Str$(Pixels*CurrentFPS)
   EndIF


   ; Check the fps, if it's over our target, then
   ; add more sprites to the scene
   CurrentFps=fps()
   if Timer()>CheckFpsTime
      CheckFpsTime=timer()+250
      if (CurrentFps-1)=>RequiredFrameRate
; (the rest of this listing is not included here)





  Keys

   Enter = Toggle between rendering to an FX image and rendering directly to video memory.

   Space = See basic fill stats for your machine.

 


Results

All tests were conducted in 640*480 (full screen exclusive) display mode, in both 16 and 32bit modes. The test tries to work out how many sprites of size X can be on screen while holding 30fps on the machine. All sprites are rotating, and each has a randomly assigned alpha draw mode.



800MHz Duron & GF2 MX200

* 32bit
   Sprite Size [16]   =   764
   Sprite Size [32]   =   336
   Sprite Size [64]   =   118
   Sprite Size [128]  =    38

* 16bit
   Sprite Size [16]   =   908
   Sprite Size [32]   =   422
   Sprite Size [64]   =   166
   Sprite Size [128]  =    56



3gig AMD 64 & GF6600

* 32bit
   Sprite Size [16]   =  3200
   Sprite Size [32]   =  1558
   Sprite Size [64]   =   630
   Sprite Size [128]  =   200

* 16bit
   Sprite Size [16]   =  3418
   Sprite Size [32]   =  1690
   Sprite Size [64]   =   668
   Sprite Size [128]  =   244
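The sprite counts are easier to compare once converted to blended pixels, using the same sum the example program prints on the Space key (sprite count * size * size). The Python sketch below does that for the 800MHz Duron 32bit figures. It is plain arithmetic on the numbers above, and it treats every sprite as a full size*size square, which is an assumption.

```python
TARGET_FPS = 30  # RequiredFrameRate in the example program

# sprite size -> sprite count held at 30fps (800MHz Duron & GF2 MX200, 32bit)
DURON_32BIT = {16: 764, 32: 336, 64: 118, 128: 38}


def blended_pixels(size, count):
    """Blended pixels per frame, counting each sprite as size * size pixels."""
    return count * size * size


for size, count in DURON_32BIT.items():
    per_frame = blended_pixels(size, count)
    print(f"size {size:>3}: {per_frame:>7} pixels/frame, "
          f"{per_frame * TARGET_FPS:>9} pixels/sec")
```

On that machine the 128 pixel sprites account for roughly three times as many blended pixels per frame as the 16 pixel sprites (622,592 against 195,584). That hints at a fixed cost per sprite that matters more as sprites get smaller, though the figures alone can't say where that cost comes from.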







kevin

#3




Copying/Drawing Between Image Types



A good rule of thumb when working out how to arrange your image data in memory is to keep it in the same format all the way through the drawing process. This is not always possible, as how you render will also determine what format the image needs to be held in. While it will vary from system to system, here are some general tips on the subject.


Image Type Key

 VI  = Video Image
 FX = FX image


Performance Guide when Drawing fixed sized solid/transparent images

  Copying   VI to VI   =  FASTEST    -  These will generally be assisted by the Graphics Card Blitter.
  Copying   VI to FX   =  VERY SLOW  -  Requires reading from the video memory.  Avoid if possible
  Copying   FX to VI   =  GOOD       -  CPU driven.
  Copying   FX to FX   =  GOOD       -  CPU driven.


Performance Guide when Drawing Rotated/scaled images

  Copying   FX to FX   =  FASTEST    -  CPU driven.  (and no buffer lock stalls)
  Copying   FX to VI   =  FAST       -  CPU driven.
  Copying   VI to VI   =  VERY SLOW  -  Requires reading from the video memory.  Avoid if possible
  Copying   VI to FX   =  VERY SLOW  -  Requires reading from the video memory.  Avoid if possible


Performance Guide when Drawing Blended (alpha/add/sub etc.) images

  Copying   FX to FX   =  FASTEST         -  CPU driven.
  Copying   VI to VI   =  EXTREMELY SLOW  -  Requires reading from both video memory buffers.  Avoid at all costs!
  Copying   VI to FX   =  VERY SLOW       -  Requires reading from the source video memory.  Avoid if possible
  Copying   FX to VI   =  VERY SLOW       -  Requires reading from the destination video memory.  Avoid if possible
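One way to keep the three guides handy while planning a renderer is to encode them as a lookup table. The Python sketch below simply transcribes the ratings above. The names RATINGS, rate_copy and best_pairs are made up for the illustration and are not part of PlayBASIC.

```python
# (source type, destination type) -> rating, one table per kind of drawing
RATINGS = {
    "solid": {
        ("VI", "VI"): "FASTEST",
        ("VI", "FX"): "VERY SLOW",
        ("FX", "VI"): "GOOD",
        ("FX", "FX"): "GOOD",
    },
    "rotated": {
        ("FX", "FX"): "FASTEST",
        ("FX", "VI"): "FAST",
        ("VI", "VI"): "VERY SLOW",
        ("VI", "FX"): "VERY SLOW",
    },
    "blended": {
        ("FX", "FX"): "FASTEST",
        ("VI", "VI"): "EXTREMELY SLOW",
        ("VI", "FX"): "VERY SLOW",
        ("FX", "VI"): "VERY SLOW",
    },
}


def rate_copy(mode, src, dest):
    """Look up the rating for copying a src image type to a dest image type."""
    return RATINGS[mode][(src, dest)]


def best_pairs(mode):
    """Source/destination pairs rated FASTEST for a kind of drawing."""
    return [pair for pair, rating in RATINGS[mode].items() if rating == "FASTEST"]


print(rate_copy("blended", "FX", "VI"))
print(best_pairs("rotated"))
```

Laid out this way, the pattern in the tables stands out: video to video only wins for plain solid/transparent copies, and anything rotated or blended is best kept FX to FX.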








Ian Price

That's a really interesting piece there, and it may well be very useful for my Sorcery game - when you destroy an enemy it releases a scaling, alpha blending rainbow explosion. When one is on screen it's fine, but when there are two it can slow down dramatically if both are quite large. I presume you've noticed this ;)
I came. I saw. I played some Nintendo.

stef


Really interesting topic!

(I assume that Kevin was a teacher in a previous life  :))

Is there a connection to PBFX/3D?

is '3Dimage' similar to 'FXimage' ?

Draco9898

3d image = a texture in your gfx card waiting to be used on a polygon
DualCore Intel Core 2 processor @ 2.3 ghz, Geforce 8600 GT (latest forceware drivers), 2 gigs of ram, WIN XP home edition sp2, FireFox 2.

"You'll no doubt be horrified to discover that PlayBasic is a Programming Language." -Kevin

Adaz

Very edifying! I found this solution myself last week, and now know why it was so slow lately. Thanks!

Ádáz

Hungary

stef

#8
Hi Draco!

Thx for reply! :)

Atm things become a bit complicated in connection with the upcoming PBFX (at least it sounds a bit complicated).

And the examples in this thread don't work with PB1.65 (or on my comp)

Greetings
stef

leopardps

#9
I really appreciate your tutorials and information - Good job!

But I am having some problems in understanding.  I understand the 'concept' of having an image in either: (1) system memory (FXimage), or (2) in Video memory (on the video card).  BUT, I don't see any commands that say specifically that they load an image into the Video memory - at  least the help file doesn't make it clear.  I would like to test the speed difference between the two different methods.

Also, related to these Video Images... you first say that Video Images are drawn SOLID with no alpha - which I am guessing means 'no transparency':
Quote: Image Type #1 - Video Image

...Video images are our best option if we want to draw loads of solid (no alpha), none rotated sprites around the screen.

but then later you say that Video images can have transparency:
Quote: Image Type Key

 VI  = Video Image
 FX = FX image


 Performance Guide when Drawing fixed sized solid/transparent images

  Copying   VI to VI   =  FASTEST    -  These will generally be assisted by the Graphics Card Blitter.
  Copying   VI to FX   =  VERY SLOW  -  Requires reading from the video memory.  Avoid if possible
  Copying   FX to VI   =  GOOD       -  CPU driven.
  Copying   FX to FX   =  GOOD       -  CPU driven.

So, my questions are
(A) Can a Video Image have any sort of transparency, either as a mask (like how 'Magic Pink', RGB(255,0,255), is used in some languages), an Alpha mask (plot the pixel if the alpha = 255, otherwise don't plot it), or full alpha transparency (where the src pixel is blended with the destination)?

(B) What commands are used to load/unload an image directly into Video memory (not to the screen yet, just memory on the video card)? What are the commands to Draw (copy to...) an image on the video card to the video screen?

For some reason this has been very confusing for me...

(C) I assume that all the specialized sprite commands, collision abilities, etc are based off of FX images, NOT images on the video card, right?

I tend to only use images with transparency (either a mask, or, prefer 0-255 full alpha) - I am never rotating/scaling/other special effects that FX images can have applied, so my thought is that I can get away with solely using video images.  If Video images can actually be used with a version of transparency, then I have one more question:

(D) I assume that I can then make a screen sized Video Image, blit all my video images into it utilizing their transparency, and then blit that entire screen image onto the screen - without moving anything to system memory and utilizing the speedy video blitter - basically make a screen buffer (unless there already is such a thing on the video card...)?

UPDATE:
ok, re-read some of the help info on images and did some testing and I think I figured out a few of my questions:
Answer to A: yes, a video image can have transparency. It appears any 'black' pixel rgb(0,0,0) is considered transparent when drawn with the 'transparent flag' set to '1'. A speed test indicates that drawing an image from video memory is faster than drawing an FX image, but this seems to be so ONLY when the transparent flag is set; it seems the same speed with no transparency (I need to test this more, as it doesn't jibe with what you have said). (I also just read about the whole 'mask color' thing, which talks about the default transparent color being black... Pretty neat that you can specify the mask color! nice!)

Answer to B: looks like loadImage, loadNewImage, createImage all deal with Video Images.  DrawImage will draw both Video Images and FX images...

and answering those two questions actually answers my remaining two questions!

leopardps

silly me... I am using a low-end notebook.... video memory is apparently 'shared' system memory anyways... I guess video images probably don't offer much speed improvement, if any