A Crash Course In Optimization Part #3 - Smarter Drawing Techniques
In all facets of video game development, optimizing our program's drawing behavior is one of the most beneficial ways of improving our game's overall performance. Now, I know what the little voice in the back of your head is saying: but doesn't my computer's super fast video card make optimization redundant? - Sadly, no. Like everything in the computer, video cards have a sweet spot, as they're designed to work in a certain way. While we can simply ignore this and press on regardless, doing so will often place more strain upon the drawing devices than need be, making our programs slower than they could be.
How can I optimize my drawing? - Well, that's a good question. Rather than getting into anything that's game or genre specific, we'll just cover some simple, basic tips that can help us in all types of graphics programming. These tips can be categorized as,
Tips
* When Not To Clear The Screen
* Avoid Rendering Dead Pixels (Drawing nothing sure takes a long time)
* Changing Display depths can improve performance ?
* Drawing Everything Real Time ?
* Selectively Refresh The Screen (Dirty Rectangles)
Note: The objective of optimizing our program's rendering should be to improve the program's performance without changing its appearance. We can generally do this by removing redundant drawing operations. For example, if we equated drawing each pixel on the screen with a cost of $1 per pixel, then we'd want to make sure our program wasn't drawing the same pixels over and over again. If we draw over a pixel twice, the old one is gone (the user of our program never sees it), and the cost of drawing that pixel has now doubled.
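To put a number on the "$1 per pixel" idea, here's a minimal sketch that counts how often each pixel gets written during one frame. It's written in Python rather than PlayBASIC purely so the counting logic stands on its own; the function name `overdraw_report` and the rectangle format are assumptions made for this illustration, not part of any graphics library.

```python
# Hypothetical overdraw counter (not a PlayBASIC command).
# Each draw is described as a rectangle: (x, y, width, height) in pixels.
def overdraw_report(screen_w, screen_h, rects):
    """Return (total pixel writes, visible pixels, wasted writes)."""
    writes = [[0] * screen_w for _ in range(screen_h)]
    for x, y, w, h in rects:
        # Clip each rectangle to the screen before counting its pixels.
        for py in range(max(y, 0), min(y + h, screen_h)):
            for px in range(max(x, 0), min(x + w, screen_w)):
                writes[py][px] += 1
    total = sum(sum(row) for row in writes)
    visible = sum(1 for row in writes for n in row if n > 0)
    return total, visible, total - visible

# A full screen clear followed by a full screen backdrop on a tiny 8x6 "screen".
print(overdraw_report(8, 6, [(0, 0, 8, 6), (0, 0, 8, 6)]))  # (96, 48, 48)
```

Half of the writes in that frame are wasted: every pixel was paid for twice, but the user only ever sees the second one.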
When Not To Clear The Screen
Clearing the screen (aka CLS) is more often than not one of the first commands you'll find in a program's main rendering loop. But is it really necessary? - Sometimes yes, but not always. We only need to clear the screen if we're not going to completely fill it with some backdrop, such as a backdrop picture, a gradient or perhaps even a map. In those situations, the CLS is just wasting rendering time.
To demonstrate the effect, the following example simulates the situation where we're drawing a backdrop image and also doing the unnecessary CLS. Press Space to toggle the CLS on and off.
// Create a screen sized image. This is used to simulate a backdrop
// picture you might use.
GameBackDrop=NewImage(GetScreenWidth(),GetSCreenHeight())
//Fill the BackDrop image with a blue gradient
RendertoImage GameBackDrop
c1=rgb(50,70,200)
c2=rgb(250,70,200)
ShadeBox 0,0,GetScreenWidth(),GetSCreenHeight(),c1,c2,c2,c1
rendertoscreen
ClsEnabled=true
// start of programs main loop
Do
//
if ClsEnabled=true
// Clear the Screen to a bright pinky/orange colour
// We can't see it, since the backdrop is being completely covered
// So the CLS is here for no reason
cls Rgb(200,100,100)
endif
// Draw the backdrop image as we would in our game.
// Since we're drawing it full screen, and we don't want
// anything to show through it, we draw it solid
Drawimage GameBackDrop,0,0,false
// Show FPS
Text 10,10,"fps:"+str$(fps())
if ClsEnabled=true
t$="Cls Enabled"
else
t$="Cls Disabled"
endif
Text 10,30,t$
// Hit the spacekey to toggle CLS
if Spacekey()=true
ClsEnabled=1-ClsEnabled
FlushKeys
endif
Sync
loop
This example is more or less an extension of the same subject. It draws a gradient backdrop and some foreground mountains as two separate screen sized images. Visually, we get a sort of sunset effect sitting behind the scrolling mountain side (which is just some ellipses in this example). The thing is, since the backdrop gradient is nondescript (no distinctive markings), what we can do in this case is combine the gradient and the mountain picture together. This effectively halves the amount of drawing our program is doing and therefore gives us a free speed boost. I should point out that this isn't always possible, but it's well worth it if you can!
Press Space to toggle between the two methods in this demo.
// =======================================
// Create a Screen Sized Image. This is used to simulate a backdrop
// picture you might use.
// =======================================
GradientBackDropLayer=NewImage(GetScreenWidth(),GetSCreenHeight())
//Fill the BackDrop image with a gradient
RendertoImage GradientBackDropLayer
c1=rgb(50,70,200)
c2=rgb(250,70,200)
ShadeBox 0,0,GetScreenWidth(),GetSCreenHeight(),c1,c1,c2,c2
// =======================================
// Create a second foreground layer
// =======================================
ForegroundLayer =NewImage(GetScreenWidth(),GetSCreenHeight())
RendertoImage ForegroundLayer
// Draw some ellipses to this surface to form the mountains
Radius=GetSCreenWidth()/3
For Xlp=Radius/2 to GetScreenWidth()-1 step Radius
Ellipsec Xlp,GetScreenHeight(),Radius/2,Radius*1.5,true,rgb(0,255,0)
next
// =======================================
// Create version that is merged together
// =======================================
MergedBackDrop=NewImage(GetScreenWidth(),GetSCreenHeight())
RendertoImage MergedBackDrop
Drawimage GradientBackDropLayer,0,0,false
Drawimage ForegroundLayer,0,0,true
rendertoscreen
//
DRawMergedBackDrop=false
// start of programs main loop
Do
ScrollX=Mod(ScrollX-1,GetScreenWidth())
if DRawMergedBackDrop=false
// Draw the gradient backdrop as we would in our game.
// Since we're drawing it full screen, and we don't want
// anything to show through it, we draw it solid
Drawimage GradientBackDropLayer,0,0,false
// Draw the foreground layer over the gradient
Tileimage ForegroundLayer,ScrollX,0,true
else
// Draw the pre merged version of the back drop
// in place of the two separate images
TileImage MergedBackDrop,ScrollX,0,false
Endif
// Show FPS
Text 10,10,"fps:"+str$(fps())
if DRawMergedBackDrop=false
t$="Drawing Backdrops Layers Separately"
else
t$="Drawing Merged version"
endif
Text 10,30,t$
// Hit the space key to toggle between the separate and merged versions
if Spacekey()=true
DRawMergedBackDrop=1-DRawMergedBackDrop
FlushKeys
endif
Sync
loop
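The saving from merging the layers comes down to simple counting. The sketch below is Python, not PlayBASIC, and uses tiny lists of numbers to stand in for images; `TRANSPARENT`, `merge_layers` and `pixels_per_frame` are names made up for this illustration. It assumes the merge happens once, outside the main loop, just as in the listing above.

```python
TRANSPARENT = 0  # assumed mask colour, purely for this sketch

def merge_layers(back, front):
    """Composite front over back once; TRANSPARENT pixels let the back show through."""
    return [
        [b if f == TRANSPARENT else f for b, f in zip(back_row, front_row)]
        for back_row, front_row in zip(back, front)
    ]

def pixels_per_frame(layers):
    """Every pixel of every layer drawn costs one write, visible or not."""
    return sum(len(layer) * len(layer[0]) for layer in layers)

gradient = [[1, 1, 1, 1], [2, 2, 2, 2]]   # stand-in for the gradient backdrop
mountains = [[0, 0, 0, 0], [0, 9, 9, 0]]  # stand-in for the foreground layer

merged = merge_layers(gradient, mountains)  # done once, before the main loop
print(merged)                                  # [[1, 1, 1, 1], [2, 9, 9, 2]]
print(pixels_per_frame([gradient, mountains]))  # 16 writes per frame
print(pixels_per_frame([merged]))               # 8 writes per frame
```

The trick only works here because each row of the gradient is a single colour, so scrolling the merged picture sideways looks the same as scrolling the mountains over a fixed gradient.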
Avoid Rendering Dead Pixels
What's a dead pixel? - These are pixels in the image we're drawing that are not visible to the end user.
Solid VS Transparent rendering
// Create a blank image 100*100 pixels in size
BlankImage =NewIMage(100,100)
TransparentRender =false
// start of programs main loop
Do
// Clear the Screen
Cls rgb (100,150,200)
// Draw the Blank image to screen 25 times
For lp=1 to 25
Xpos=150+(lp*20)
Ypos=100+(lp*15)
Xpos=Xpos-GetIMageWidth(BlankImage)/2
Ypos=Ypos-GetIMageHeight(BlankImage)/2
DrawImage BlankImage,Xpos,Ypos,TransparentRender
next
// Show FPS
Text 10,10,"fps:"+str$(fps())
if TransparentRender=False
t$="Drawing Solid Images"
else
t$="Drawing Transparent Images"
endif
Text 10,30,t$
// Hit the space key to toggle between transparent and solid rendering
if Spacekey()=true
TransparentRender=1-TransparentRender
FlushKeys
endif
Sync
loop
If you test this (cut and paste the code into PlayBASIC), you'll possibly see something you didn't expect. Your intuition might be telling you that since we can't see anything being drawn when rendering transparent pixels, this isn't eating up any computing time. If that were the case, rendering an all transparent image would be much faster. However, the reality is quite different: it's actually a fraction slower. How much slower really depends on the system.
Armed with this little bit of knowledge, the next question is how can we best take advantage of it? - Well, there are a few situations where we can optimize our images to smooth out our game's rendering performance, such as trimming unnecessary transparent sections from backdrop images and full screen HUD overlays, for starters. But one place our games can eat up a lot of processing power is when we draw our characters to the screen, in particular character animations.
Why? - Well, animations are often drawn and laid out upon sprite sheets so that all frames are the same size. While this might make loading the animation relatively straightforward, it does mean that we're potentially going to have animation frames that are all the same size (width/height) regardless of what's in each frame.
If you imagine a running animation for a second, then hopefully it's easy to visualize that certain frames during the animation are going to be wider or higher than others, such as when the character is fully extended. Likewise, if we imagine an explosion animation, where the debris starts out centralized and then radiates out, the initial frames will generally be smaller than the later ones.
OK, you're probably falling asleep by now, wondering why this is an issue. Well, it relates back to our discovery above that rendering a transparent pixel costs the same as (or more than) drawing a visible pixel. So if all of our animation frames have extra unused (transparent) space around the visible graphics portion, then that space is not only costing us a little bit of render performance, it's also wasting memory.
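One way to trim a frame is to find the smallest rectangle that holds all of its visible pixels and keep only that. The following is a rough sketch of the idea in Python (not PlayBASIC), with a frame stored as a list of rows and 0 assumed to be the transparent colour. The names `trim_bounds` and `trim` are hypothetical helpers, not engine commands.

```python
TRANSPARENT = 0  # assumed mask colour for this sketch

def trim_bounds(frame):
    """Return (left, top, right, bottom) of the visible pixels.
    right/bottom are exclusive. Returns None for a fully transparent frame."""
    rows = [y for y, row in enumerate(frame) if any(p != TRANSPARENT for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(frame[0]))
            if any(row[x] != TRANSPARENT for row in frame)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1

def trim(frame):
    """Return the trimmed frame plus the (x, y) offset it was cut from."""
    bounds = trim_bounds(frame)
    if bounds is None:
        return [], (0, 0)
    left, top, right, bottom = bounds
    return [row[left:right] for row in frame[top:bottom]], (left, top)

frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 5, 5, 0, 0],
    [0, 0, 5, 5, 5, 0],
    [0, 0, 0, 0, 0, 0],
]
trimmed, offset = trim(frame)
print(trimmed, offset)  # [[5, 5, 0], [5, 5, 5]] (2, 1)
```

The frame drops from 24 pixels to 6. The offset matters: when drawing the trimmed frame we add it to the sprite's position, otherwise the animation would appear to jitter as the frame sizes change.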
Example
Let's have a look,
// Create two identical images. The only difference is that
// one is larger than the other. The larger (oversized) one
// is a bit slower to render.
OverSizedImage =MakeBallImage(256,90)
TrimmedImage =MakeBallImage(180,90)
CurrentImage=OverSizedImage
//
DrawTrimmedIMages=false
// start of programs main loop
Do
// Clear the Screen
Cls rgb (30,40,50)
// Draw the current image to screen 25 times
For lp=1 to 25
Xpos=200+(lp*15)
Ypos=200+(lp*10)
Xpos=Xpos-GetIMageWidth(CurrentImage)/2
Ypos=Ypos-GetIMageHeight(CurrentImage)/2
DRawimage CurrentImage,Xpos,Ypos,true
next
// Show FPS
Text 10,10,"fps:"+str$(fps())
if DrawTrimmedIMages=False
t$="Drawing Oversized Images"
CurrentImage=OverSizedImage
else
t$="Drawing Trimmed Images"
CurrentImage=trimmedImage
endif
Text 10,30,t$
// Hit the space key to toggle between trimmed and oversize images
if Spacekey()=true
DrawTrimmedIMages=1-DrawTrimmedIMages
FlushKeys
endif
Sync
loop
Function MakeBallImage(FrameSize,Radius)
Index=NewImage(FrameSize,FrameSize)
Rendertoimage Index
For lp=Radius to 1 step -1
Circlec frameSize/2,frameSize/2,lp,true,rgb(100,200-lp,255-(lp*2))
next
rendertoscreen
EndFunction index
Example #2 - Fixed Size Vs Trimmed Animations Size
To demonstrate, in this example we're creating two animation sequences. One is trimmed of any dead (transparent) pixels, the other isn't. We're testing the performance difference by rendering these animations on screen, each at a different stage of its animation sequence. You can probably surmise what's going to happen. The one where we're showing fixed size animation frames will get progressively slower (the more you have, the slower it gets) than the one with the trimmed animations. So trimming helps us balance the load better.
FrameCount=25
FrameSize=150
// Make two Expanding Ball Type Animations.
Dim FixedSizeImages(FrameCount*2)
Dim TrimmedImages(FrameCount*2)
MakeArray Animation()
Radius=70
For lp=0 to frameCount
Radius2=Radius-(lp*2.5)
// Make the Fixed size version
img=MakeBallImage(FrameSize,Radius2,rgb(00,255,lp))
FixedSizeImages(lp)=img
FixedSizeImages(FrameCount*2-lp)=img
// Make a 'rough' trimmed version anim (in a different colour)
img=MakeBallImage((Radius2*2)-2,Radius2,rgb(255,155,lp))
TrimmedImages(lp)=img
TrimmedImages(FrameCount*2-lp)=img
next
// Toggle that selects which animation is being viewed
DrawTrimmedImages=false
// start of programs main loop
Do
// Clear the Screen
Cls rgb (30,50,70)
if DrawTrimmedImages=False
t$="Fixed Size Animations"
SetArray Animation(),GetArray(FixedSizeImages())
else
t$="Trimmed Animations"
SetArray Animation(),GetArray(TrimmedImages())
endif
FrameIndeX=Mod(FrameIndex+1,FrameCount*2)
// Draw a screen full of animations
Xpos=100
Ypos=50
For lp=1 to 200
ThisFrame=Mod(FrameIndex+lp,FrameCount*2)
CurrentImage=Animation(Thisframe)
Xpos=Xpos+20
if Xpos>700
Xpos=100
Ypos=Ypos+100
endif
X=Xpos-GetIMageWidth(CurrentImage)/2
Y=Ypos-GetIMageHeight(CurrentImage)/2
DRawimage CurrentImage,X,Y,true
next
// DRaw the anim over the scene again.
X=100-GetIMageWidth(CurrentImage)/2
Y=200-GetIMageHeight(CurrentImage)/2
DRawimage CurrentImage,X,Y,false
box x,y,x+GetIMageWidth(CurrentImage),Y+GetIMageHeight(CurrentImage),false
// Show FPS
Text 10,10,"fps:"+str$(fps())
Text 10,30,t$
// Hit the space key to toggle between trimmed and fixed size versions
if Spacekey()=true
DrawTrimmedImages=1-DrawTrimmedImages
FlushKeys
endif
Sync
loop
Function MakeBallImage(FrameSize,Radius,Colour)
Index=NewImage(FrameSize,FrameSize)
Rendertoimage Index
For lp=Radius to 1 step -1
Scale#=(Radius-lp)/Float(Radius)
Circlec frameSize/2,frameSize/2,lp,true,RgbFade(Colour,Scale#*100)
next
rendertoscreen
EndFunction index
Changing Display depths can improve performance ?
While you might not think of it, the screen resolution (the game's display width and height) and the number of colours play a big part in the final performance of our game. This is particularly important for those wanting to design programs that still perform well on older computers. While most of us take for granted that modern machines are capable of throwing high resolution images around the display fairly easily, the same cannot be said for older systems. This is clearly evident when we compare games from 2008 with games from 2004, 2000 or 1980, for example.
The basic rule here is that the higher the resolution, the more pixels there are on screen, and the more pixels on screen, the more power is required to fill the screen with images at a reasonable speed.
For example,
If we use a screen size of 1024x * 768y with 32bit colour, that's (1024*768*4)=3,145,728 bytes for the screen. Yes, 3 meg!
Now let's say you CLS the screen and draw one layer of map to it. That means your GPU is shifting roughly 3 meg for the clear and another 3 meg for the map render. So drawing the image costs approximately 6 meg of bandwidth per frame.
Now, as we previously discovered, every single pixel that is rendered (solid, transparent (mask colour) or translucent) costs you performance! While modern GPUs have a much higher fill rate (they can draw more pixels per second), older ones simply can't cope with too much graphics data. So we'll need a way to reduce the work load if we hope to get reasonable performance on those systems.
We can reduce the work load in a number of ways.
1) Give the user the option of lowering the display depth from 24bit/32bit down to 16bit. Going from 32bit to 16bit halves the amount of graphics data that has to be shifted; the trade-off is that visually we lose some colour quality as well. It's also worth noting that many older GPUs were optimized for 16bit display modes over 24bit or 32bit modes. Really old systems often don't have a 32bit mode at all, so on those systems there's no option but 16bit.
2) Change to a smaller screen size. However, this is not always convenient when making 2D games, since most of the artwork is drawn specifically for a certain display resolution. It is possible to scale the graphics media proportionally from within your program, but this adds a lot of extra processing work and is unlikely to be very efficient or visually attractive. It's possible if you really want to, though. This approach is far easier in the 3D world.
3) Try to minimize the amount of wasted drawing the program might be doing. (The stuff mentioned in this very tutorial.)
4) If you're doing things like rotation or image scaling, then reducing the quality (the image's dimensions) of the image can be very beneficial on old systems. It might not look as good, but it can keep the frame rate higher on those older systems.
5) Don't worry about supporting old clunkers anymore. Optimizing is your choice, so you don't have to go out of your way to support old systems.
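The screen size arithmetic above, along with the effect of options 1 and 2, is easy to check with a few lines of code. This is a small Python sketch rather than PlayBASIC, and `frame_bytes` is just a helper name invented for this illustration.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Bytes needed to fill one full screen at the given colour depth."""
    return width * height * bits_per_pixel // 8

full_32 = frame_bytes(1024, 768, 32)   # the example from the text
full_16 = frame_bytes(1024, 768, 16)   # option 1: lower the display depth
small_32 = frame_bytes(640, 480, 32)   # option 2: smaller screen size

print(full_32)       # 3145728 bytes, roughly 3 meg
print(full_32 * 2)   # 6291456 bytes for a CLS plus one full screen map layer
print(full_16)       # 1572864 bytes, half of the 32bit figure
print(small_32)      # 1228800 bytes
```

These figures only count one write per pixel. Any overdraw on top of that multiplies the cost in the same way.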
Example
This example demonstrates how much the resolution can affect the performance of our program. It does this by rendering a set of boxes to a collection of 'screen' sized images. You can cycle through the images and monitor the program's performance. The boxes are drawn proportionally, so the scene looks the same regardless of the size. Moreover, it also displays the number of pixels drawn at each resolution. The bigger the display gets, the more pixels are being drawn, and therefore the slower the demo gets.
MaxCords=50
// Create an array to hold the box coordinates and colours
Dim BoxCords(MaxCords,5)
For lp=0 to MaxCords
BoxCords(lp,1)=rnd(100)
BoxCords(lp,2)=rnd(100)
BoxCords(lp,3)=rnd(100)
BoxCords(lp,4)=rnd(100)
BoxCords(lp,5)=rndrgb()
next
Dim Screens(5)
Screens(1)=NewIMage(320,240)
Screens(2)=NewIMage(400,300)
Screens(3)=NewIMage(640,480)
Screens(4)=NewIMage(800,600)
Screens(5)=NewIMage(1024,768)
CurrentScreen=1
// start of programs main loop
Do
// Clear the Screen
Cls rgb (30,50,70)
ThisImage=Screens(CurrentScreen)
PixelsDrawn=DrawScene(ThisImage)
Xpos=GetScreenWidth()/2-GetIMageWidth(ThisImage)/2
Ypos=GetSCreenHeight()/2-GetIMageHeight(ThisImage)/2
DRawimage ThisImage,Xpos,Ypos,false
w=GetimageWidth(ThisImage)
h=GetimageHeight(ThisImage)
t$="Current Screen Size:"+str$(w)+"*"+str$(h)
// Show FPS
Text 10,10,"fps:"+str$(fps())
Text 10,30,t$
Text 10,50,"Pixels Drawn:"+str$(PixelsDrawn)
// Hit the space key to cycle through the screen sizes
if Spacekey()=true
CurrentScreen =CurrentScreen+1
if CurrentScreen>GetArrayElements(Screens(),1) then CurrentScreen=1
FlushKeys
endif
Sync
loop
Function DrawScene(ThisImage)
rendertoimage Thisimage
w=GetIMageWidth(ThisImage)
h=GetIMageHeight(ThisImage)
ScaleX#=w/100.0
ScaleY#=h/100.0
c1=rgb(200,100,10)
c2=rgb(200,100,150)
ShadeBox 0,0,w,h,c1,c1,c2,c2
For lp=0 to GetArrayElements(BoxCords(),1)
x1=BoxCords(lp,1)*Scalex#
Y1=BoxCords(lp,2)*Scaley#
x2=BoxCords(lp,3)*Scalex#
y2=BoxCords(lp,4)*Scaley#
swapifhigher x1,x2
swapifhigher y1,y2
boxc x1,y1,x2,y2,true,BoxCords(lp,5)
PixelsDrawn=PixelsDrawn+((x2-x1)*(y2-y1))
next
rendertoscreen
EndFunction PixelsDrawn
Drawing Everything Real Time
When we first start getting into game programming, one of the most common misconceptions (well, mistakes) people make is to assume that 'everything' needs to be drawn every update. While this is sometimes true, often we can selectively update and the player won't really notice the difference. This does mean that it's not quite as cut and dried as just drawing everything all the time, but it's not rocket science either.
One of the most common opportunities for selective updating occurs in things like backdrop animations and foreground overlays.
Scores and Health Bars
In this example, all we're doing is drawing the simple score and health bar shown below. It's the sort of thing we'd just slap into our game and forget, but we can actually improve the performance by selectively updating it. In fact, the same approach will work for all things HUD related. So if it's not changing, don't render it!
LoadFont "Courier",1,24,0
ScoreOverLayImage=NewImage(204,50)
// start of programs main loop
Do
// Clear the Screen
Cls rgb (30,50,70)
scw=GetScreenWidth()/2
sch=GetScreenHeight()*0.4
if timer()>NextUpdate
NextUpdate=timer()+100
Score=Score+rnd(100)
Health=mod(health+1,100)
endif
if DrawMode=0
// Draw the score and health directly to the screen, each update
DrawScoreAndHealth(scw,sch,Score,Health)
endif
if DrawMode=1
// Selectively cache the score& health to an image (since image rendering is quicker)
RefreshScore=false
if OldScore<>Score or OldHealth<>Health
; either the score or health has changed, so we need to
; refresh the score image cache
OldScore=Score
OldHealth=Health
RefreshScore=true
endif
if RefreshScore=true
rendertoimage ScoreOverLayImage
cls 0
DrawScoreAndHealth(101,0,Score,Health)
rendertoscreen
endif
drawimage ScoreOverLayImage,scw-101,sch,true
endif
if Drawmode=0
t$="Drawing to the screen"
else
t$="Selectively updating"
endif
// Show FPS
Text 10,10,"fps:"+str$(fps())
Text 10,30,t$
// Hit the space key to toggle between draw modes
if Spacekey()=true
DrawMode =1-DrawMode
FlushKeys
endif
Sync
loop
Function DrawScoreAndHealth(Xpos,Ypos,Score,Health)
s$="Score:"+Digits$(Score,6)
CenterText Xpos,Ypos,s$
th=GetTextHeight(s$)
x1=xpos-102
x2=xpos+102
y1=ypos+th+5
y2=y1+th
c=rgb(20,30,40)
boxc x1,y1,x2,y2,true,c
c1=rgb(220,30,40)
c2=rgb(20,40,240)
ShadeBox x1+2,y1+2,x1+(Health*2),y2-2,c1,c2,c2,c1
EndFunction
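Stripped of the drawing commands, the selective update in the listing above is just a cache that is rebuilt only when its inputs change. Here's a minimal sketch of that logic in Python (not PlayBASIC). `HudCache` and `render_hud` are hypothetical names, and a string stands in for the cached overlay image.

```python
class HudCache:
    """Re-render the HUD only when the score or health actually changes."""

    def __init__(self, render):
        self._render = render   # the expensive drawing routine
        self._key = None        # the values the cached overlay was built from
        self._cached = None     # stand-in for the cached overlay image
        self.renders = 0        # how many times we really had to redraw

    def get(self, score, health):
        key = (score, health)
        if key != self._key:
            self._cached = self._render(score, health)
            self._key = key
            self.renders += 1
        return self._cached

def render_hud(score, health):
    # Stand-in for the real text and bar drawing.
    return f"Score:{score:06d} Health:{health}"

hud = HudCache(render_hud)
for frame in range(60):
    score = (frame // 20) * 100   # the score only changes every 20 frames
    hud.get(score, 100)
print(hud.renders)  # 3
```

Sixty frames went by, but the HUD was only rebuilt three times. On every other frame the program just reuses the cached copy, which is the same saving the PlayBASIC version gets from drawing the cached image.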
Selectively Refreshing The Screen (Dirty Rectangles)
Often when we start writing games, it's common to see programmers just throwing everything at the graphics engine/hardware and letting that deal with it. This may work fine for some games on some computers, but not on others. The reason is that graphics engines are generally designed to just do what you tell them to do. If you tell one to draw the same image/circle/pixel 100 times, then that's what it'll do. It doesn't know what you're trying to achieve, or that you might be drawing something that isn't even on screen.
Selective refreshing is conceptually simple. Rather than just shoveling everything on screen every frame, we're going to pay closer attention to what we're drawing and where. In this particular example, we're simply drawing some sprites (with alpha channel) moving around on a static backdrop. If we take the brute force approach, we'd have an FX image the size of the screen, copy the backdrop to this image, draw all the sprites over it, then copy the screen image to the actual screen. This will work, but do we really need to refresh the entire backdrop every frame if the backdrop is a static picture? Nope. We only need to refresh the parts of the backdrop that are being overwritten.
There are a few ways of doing this. In this example, we're simply going to treat the backdrop picture as if it were a Tile Map, using an array to signify when a tile (a rectangular portion of the backdrop image) needs to be refreshed to its original state. During our sprite movement routines, we not only move the sprite, but also work out which tiles it's going to overwrite when it's drawn. These backdrop tiles are then flagged as needing to be refreshed when the backdrop is being restored.
So each loop we're basically doing this:
Do
* Refresh Backdrop (redraws any tile that was flagged as being overdrawn last refresh. This restores the backdrop to its original state)
For TileY = 0 to TilesDown
For TileX = 0 to TilesAcross
if TileRefresh(TileX,TileY)
; redraw this portion of the backdrop, since it was previously covered by a sprite
; then clear the flag so the tile isn't redrawn again next update
endif
next
next
For each character in game
.. Move character (AI, physics etc)
.. Get the character's final screen coordinates and calc its bounding rectangle
.. Convert the sprite's screen coords to array coords and flag this rectangle of tiles in the array as needing a refresh
next
refresh the screen
Loop
What this lets us do is effectively remove the cost of redrawing the full backdrop each update. That's assuming we don't have a full screen of sprites covering the backdrop; in that case it'll all get redrawn regardless.
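The fiddly part of the loop above is converting a sprite's bounding rectangle into tile coordinates. Below is a rough sketch of that step in Python (not PlayBASIC), using a set of flagged tiles in place of the refresh array. The names `tiles_for_rect` and `run_frame`, the 32 pixel tile size and the 640*480 screen are all assumptions made for this illustration.

```python
def tiles_for_rect(x1, y1, x2, y2, tile_size, tiles_across, tiles_down):
    """Tile coordinates touched by a pixel rectangle (x2/y2 are exclusive)."""
    if x2 <= x1 or y2 <= y1:
        return []
    # Clamp to the tile grid so partly off screen sprites are handled safely.
    first_tx = max(x1 // tile_size, 0)
    first_ty = max(y1 // tile_size, 0)
    last_tx = min((x2 - 1) // tile_size, tiles_across - 1)
    last_ty = min((y2 - 1) // tile_size, tiles_down - 1)
    return [(tx, ty)
            for ty in range(first_ty, last_ty + 1)
            for tx in range(first_tx, last_tx + 1)]

def run_frame(dirty, sprite_rects, tile_size, tiles_across, tiles_down):
    """Restore last frame's dirty tiles, then flag the tiles covered this frame.
    Returns how many backdrop tiles had to be redrawn."""
    restored = len(dirty)   # each flagged tile is redrawn from the backdrop
    dirty.clear()
    for rect in sprite_rects:
        dirty.update(tiles_for_rect(*rect, tile_size, tiles_across, tiles_down))
    return restored

TILE, ACROSS, DOWN = 32, 20, 15   # a 640*480 screen split into 300 tiles
dirty = set()
print(run_frame(dirty, [(30, 30, 70, 70)], TILE, ACROSS, DOWN))  # 0
print(run_frame(dirty, [(34, 30, 74, 70)], TILE, ACROSS, DOWN))  # 9
```

With one 40*40 sprite on screen, the second frame restores 9 tiles instead of all 300. As more sprites are added, the count climbs toward the full backdrop.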
See example: Selective Tile Map Refreshing
See Example #2: Dirty Rectangles (Combined Video & Fx Rendering)