(http://www.underwaredesign.com/PlayBasicSig.png)
Visit www.PlayBasic.com (http://www.playbasic.com)
A Crash Course In Optimization Part#2 - Removing Redundant Calculations (Micro Opt's)
A redundant calculation is any piece of code that is executed over and over even though its result never changes. Removing them matters most inside high frequency loops (loops with lots of iterations).
Variables Faster Than Array/Type Accesses
In this example, we're looking at a situation that can sometimes occur when performing calculations upon structures (types/arrays). Every time you access an array/type, PB has to resolve that access to the location where the item sits in memory. While this isn't a massive overhead (considering what it does), it's still a far greater overhead than variables have. So it can often be beneficial to hold the results in temp variables when doing calculations from a type/array.
[pbcode]
Tests=500
// Variables have the lowest access overhead, so it's often quicker
// to pull values out of arrays or types if you're performing calculations
// on them.
Type tGameObject
x#
y#
SpeedX#
SpeedY#
EndType
Dim Object(10) as tGameObject
ScreenWidth=GetScreenWidth()
ScreenHeight=GetScreenHeight()
Do
cls 0
inc frames
// ===================================
// Variables VS Array/Type Accesses
// ===================================
X=300
// ==========
// Test #1
// ==========
T=timer()
For LP=0 to Tests
// Here we're constantly accessing the type fields, so PB has to resolve
// each access every time.
// You'll notice each of the X# and Y# fields is accessed at least 3 times
// per object, and 4 times when the position gets clamped. This adds up.
for Objectlp=1 to 10
Object(Objectlp).X#= Object(Objectlp).X#+Object(Objectlp).SpeedX#
Object(Objectlp).Y#= Object(Objectlp).Y#+Object(Objectlp).SpeedY#
if Object(Objectlp).X#>ScreenWidth
Object(Objectlp).X#=ScreenWidth
endif
if Object(Objectlp).Y#>ScreenHeight
Object(Objectlp).Y#=ScreenHeight
endif
next
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)
// ==========
// Test #2
// ==========
T=timer()
// The screen size was already read into ScreenWidth & ScreenHeight
// before the main loop, so there's no need to fetch it again here
For LP=0 to Tests
for Objectlp=1 to 10
// Since we're going to compare its new position, we'll store it in
// the variables X# & Y#, rather than back in the type.
// This way we reduce the type accesses of the X# & Y# fields to 2 each
// (one read and one write)
X#= Object(Objectlp).X#+Object(Objectlp).SpeedX#
Y#= Object(Objectlp).Y#+Object(Objectlp).SpeedY#
if X#>ScreenWidth
X#=ScreenWidth
endif
if Y#>ScreenHeight
Y#=ScreenHeight
endif
// Once we're done now we store it back in the type
Object(Objectlp).X#=X#
Object(Objectlp).Y#=Y#
next
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)
Sync
loop
[/pbcode]
Calling Functions In Loops
In this demonstration we've two simple loops. In the first loop we're constantly comparing a variable X with the screen width, so every iteration calls the "GetScreenWidth()" function. The question is why ? - Will the screen ever change size inside the loop ? - Nope. So in the second version of the loop we precalculate the screen width by calling GetScreenWidth() once and storing the value in a variable.
So if we call functions too much, they can eat up valuable time from our game. A millisecond improvement might not sound like much, but it can make all the difference to your game's performance.
[pbcode]
Tests=10000
Do
cls 0
inc frames
// ===================================
// Calling Functions In a Loop
// ===================================
X=300
// ==========
// Test #1
// ==========
T=timer()
For LP=0 to Tests
// Check if the X variable is larger than the Screen Width
if X>GetScreenWidth()
Print "X bigger than Screen edge"
endif
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)
// ==========
// Test #2
// ==========
T=timer()
// Since the screen width is never going to change
// during the loop, if we want speed we should move
// this calculation outside the loop
ScreenWidth=GetScreenWidth()
For LP=0 to Tests
// Check if the X variable is larger than the Screen Width
if X>ScreenWidth
Print "X bigger than Screen edge"
endif
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)
Sync
loop
[/pbcode]
Storing Calculation Results In Temp Variables
This (fairly bogus) example demonstrates how including unnecessary calculations inside heavy loops can cost us some performance.
[pbcode]
Tests=10000
Do
cls 0
inc frames
// ===================================
// Precalc Values Out Of Heavy Loops
// ===================================
A=100
B=200
// ==========
// Test #1
// ==========
T=timer()
For LP=0 to Tests
Result=A*B*lp
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)
// ==========
// Test #2
// ==========
T=timer()
// Since values of A and B never change inside the loop
// we can pre calc the value outside before we enter the loop
ATimesB =A*B
For LP=0 to Tests
Result=ATimesB*lp
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)
Sync
loop
[/pbcode]
Short Circuiting Comparisons
This example demonstrates how we can win back some performance by separating combined comparisons into nested IF statements. This will generally win us some performance, providing we place the comparison that's least likely to be true first, since the second comparison is then skipped most of the time. Otherwise, if the first comparison is frequently true, this might cost us more, since we're falling through both comparisons to get a complete match.
[pbcode]
Tests=10000
Do
cls 0
inc frames
// ===================================
// Short Circuiting Comparisons
// ===================================
A=100
B=200
// ==========
// Test #1
// ==========
T=timer()
For LP=0 to Tests
if A=12345 and B=12345
print "Both Match"
endif
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)
// ==========
// Test #2
// ==========
T=timer()
For LP=0 to Tests
if A=12345
if B=12345
print "Both Match"
endif
endif
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)
Sync
loop
[/pbcode]
Dropping the True from Comparisons
It's common to compare the return value from a function call with TRUE/FALSE within an IF statement. This extra comparison is normally not needed, since the IF statement treats the value/calculation following it as FALSE if it's zero. Anything else, and the IF will treat it as TRUE.
[pbcode]
Tests=10000
Do
cls 0
inc frames
// ===================================
// Dropping the True from Comparisons
// ===================================
// ==========
// Test #1
// ==========
T=timer()
For LP=0 to Tests
if GetSpriteStatus(1)=true
print "Sprite Exists"
endif
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)
// ==========
// Test #2
// ==========
T=timer()
// If we're checking for TRUE/FALSE return value from
// a function we can drop extra =TRUE
// in these situations. Since GetSpriteStatus(1) will
// return a true/false value anyway.
For LP=0 to Tests
if GetSpriteStatus(1)
print "Sprite Exists"
endif
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)
Sync
loop
[/pbcode]
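User Functions VS Protected Subroutines
This test times the same calculation wrapped up as a user Function and as a Psub, so we can compare the cost of calling each.
[pbcode]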
Tests=10000
Do
cls 0
inc frames
// ===================================
// User Functions VS Protected Subroutines
// ===================================
// ==========
// Test #1
// ==========
T=timer()
For LP=0 to Tests
result=SomeFunctionCalc(10,lp)
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)
// ==========
// Test #2
// ==========
T=timer()
// Call the Psub function
For LP=0 to Tests
result=SomesubCalc(10,lp)
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)
Sync
loop
Function SomeFunctionCalc(A,B)
A=A*B
EndFunction A
Psub SomeSubCalc(A,B)
A=A*B
EndPsub A
[/pbcode]
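String Thrashing
This test compares checking a single character with Mid$(), which returns a string, against Mid(), which returns the character's ASCII value as an integer.
[pbcode]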
Tests=10000
Do
cls 0
inc frames
// ===================================
// Using MID$() when comparing a Character within a String
// ===================================
TestString$="Hello World"
// ==========
// Test #1
// ==========
T=timer()
For LP=0 to Tests
if Mid$(TestString$,5,1)="a"
print "Character 5 is A"
endif
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)
// ==========
// Test #2
// ==========
T=timer()
// MID() is variation of the MID$(), except it
// returns the ASCII value as integer.
// Integers are faster to compare than strings
For LP=0 to Tests
if Mid(TestString$,5)=asc("a")
print "Character 5 is A"
endif
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)
Sync
loop
[/pbcode]
Disc Thrashing
One cause of performance loss often occurs when we have to read or write large files. Sadly disc drives are notoriously slow beasts, so we have to work with them and not against them. What you might not know is that disc devices (and CD/DVDs also) have to physically spin up and seek out the position of the information we're requesting or writing, so there's a latency each time you try to read or write. This latency is particularly magnified when we try to read/write small pieces of data. Each time we ask it to read/write a small packet of data, the device has to locate this area on the spinning disc. This really goes against how these devices are designed, which is to read/write big fat chunks of information in one hit.
So knowing this, we can improve our disc performance by working with the device's strengths. If we have a big file to work with, then rather than nibble away at it with BYTE/WORD/LONG accesses, it's generally more efficient to load (or save) a bigger chunk in memory and then work with that.
This example shows the worst possible case scenario, which is reading and writing files BYTE by BYTE.
[pbcode]
Filename$=CurrentDir$() +"TempFile.txt"
// Alloc 100 K sized bank
Size=100*1024
Bank=NewBank(Size)
// Fill the bank with some data to write out
For lp=0 to Size-1
pokeBankByte Bank,lp,lp
next
repeat
Cls rgb(10,20,40)
inc frames
// This test writes the Bank contents BYTE by BYTE
t=Timer()
WriteBankAsBytes(Filename$,Bank)
t1#=t1#+(timer()-t)
print "Write Bank As Bytes:"+Str$(t1#/frames)
// This test reads the file into a new bank, byte by byte
// this is where you see just how much this can choke
// your app's disc performance.
t=Timer()
ThisBank=ReadBankAsBytes(Filename$)
t2#=t2#+(timer()-t)
print "Read Bank As Bytes:"+Str$(t2#/frames)
DeleteBank ThisBank
// These versions read/write the same data, but they use
// batch writing. Making them much quicker
t=Timer()
WriteBankAsBlock(Filename$,Bank)
t3#=t3#+(timer()-t)
print "Write Bank As Data Block:"+Str$(t3#/frames)
t=Timer()
ThisBank=ReadBankAsBlock(Filename$)
t4#=t4#+(timer()-t)
print "Read Bank As Block:"+Str$(t4#/frames)
DeleteBank ThisBank
Sync
until Spacekey()
// remove the temp file
DeleteFIle Filename$
end
Function WriteBankAsBytes(Filename$,ThisBank)
If FileExist(filename$) then deletefile Filename$
Fh=WriteNewFile(Filename$)
if Fh
For lp=0 to getbanksize(Thisbank)-1
WriteByte Fh,PeekBankByte(ThisBank,lp)
next
CloseFile FH
endif
EndFunction
Function ReadBankAsBytes(Filename$)
Fh=ReadNewFile(Filename$)
if Fh
Size=FileSize(Filename$)
ThisBank=NewBank(size)
For lp=0 to size-1
POkeBankByte ThisBank,lp,ReadByte(Fh)
next
CloseFile FH
endif
EndFunction ThisBank
Function WriteBankAsBlock(Filename$,ThisBank)
If FileExist(filename$) then deletefile Filename$
Fh=WriteNewFile(Filename$)
if Fh
Address=GEtBankPtr(ThisBank)
WriteMemory fh,Address,Address+GetBankSize(ThisBank)
CloseFile FH
endif
EndFunction
Function ReadBankAsBlock(Filename$)
Fh=ReadNewFile(Filename$)
if Fh
Size=FileSize(Filename$)
ThisBank=NewBank(size)
readMemory Fh,GetBankPtr(ThisBank),Size
CloseFile FH
endif
EndFunction ThisBank
[/pbcode]
Examples,
Load Text Files To String Array (http://www.underwaredesign.com/forums/index.php?topic=1917.0)
Buffered file copy (copy data in blocks) (http://www.underwaredesign.com/forums/index.php?topic=3815.0)
A Crash Course In Optimization Part #3 - Smarter Drawing Techniques
In all facets of video game development, optimizing our program's drawing behavior is one of the most beneficial ways of improving our game's overall performance. Now, I know what the little voice in the back of your head is saying, but doesn't my computer's super fast video card make optimization redundant ? - Sadly no. Like everything in the computer, video cards have a sweet spot. They're designed to work in a certain way. While we can simply ignore this and press on regardless, this will often place more strain upon the drawing devices than need be, making our programs slower than they could be.
How can I optimize my drawing ? - Well that's a good question. Rather than getting into anything that's game or genre specific, we'll just try and cover some simple basic tips that can help us in all types of graphics programming. These tips could be categorized as,
Tips
* When Not To Clear The Screen
* Avoid Rendering Dead Pixels (Drawing nothing sure takes a long time)
* Changing Display depths can improve performance ?
* Drawing Everything Real Time ?
* Selectively Refresh The Screen (Dirty Rectangles)
Note: The objective of optimizing our program's rendering should be to improve the program's performance without changing its appearance. We can generally do this by removing redundant drawing operations. For example, imagine that drawing each pixel on the screen cost us $1. Then we'd want to make sure our program wasn't drawing the same pixels over and over again. If we draw over a pixel twice then the old one is gone (the user of our program never sees it), and the cost of drawing that pixel has now doubled.
When Not To Clear The Screen
Clearing the screen (aka CLS) is more often than not one of the first commands you'll find in a program's main rendering loop. But is it really necessary ? - Sometimes yes, but not always. We only need to clear the screen if we're not going to be completely filling it with some backdrop, such as a backdrop picture, a gradient or perhaps even a map. When the backdrop does cover the whole screen, the CLS is just wasting some rendering time.
To demonstrate the effect this has, the following example simulates the situation where we're drawing a backdrop image and also doing the unnecessary CLS. Press Space to toggle the CLS on/off.
[pbcode]
// Create a screen sized image. This is used to simulate a backdrop
// picture you might use.
GameBackDrop=NewImage(GetScreenWidth(),GetSCreenHeight())
//Fill the BackDrop image with a blue gradient
RendertoImage GameBackDrop
c1=rgb(50,70,200)
c2=rgb(250,70,200)
ShadeBox 0,0,GetScreenWidth(),GetSCreenHeight(),c1,c2,c2,c1
rendertoscreen
ClsEnabled=true
// start of programs main loop
Do
//
if ClsEnabled=true
// Clear the Screen to a bright pinky/orange colour
// We can't see it, since the backdrop is being completely covered
// So the CLS is here for no reason
cls Rgb(200,100,100)
endif
// Draw the game image as we would in our game.
// Since we're drawing it full screen, and we don't want
// anything to show through it, we draw it solid
Drawimage GameBackDrop,0,0,false
// SHow FPS
Text 10,10,"fps:"+str$(fps())
if ClsEnabled=true
t$="Cls Enabled"
else
t$="Cls Disabled"
endif
Text 10,30,t$
// Hit the spacekey to toggle CLS
if Spacekey()=true
ClsEnabled=1-ClsEnabled
FlushKeys
endif
Sync
loop
[/pbcode]
This example is an extension of the same subject, more or less. This one draws a gradient backdrop and some foreground mountains as two separate screen sized images. Visually, we get a sort of sunset effect sitting behind the scrolling mountain side (which is just some ellipses in this example). The thing is, since the backdrop gradient is nondescript (no distinctive markings), what we could do in this case is combine the gradient and the mountain picture together. This effectively halves the amount of drawing our program is doing and therefore gives us a free speed boost. I should point out this isn't always possible, but it's well worth it if you can!
Press Space to toggle between the two methods in this demo.
[pbcode]
// =======================================
// Create a Screen Sized Image. This is used to simulate a backdrop
// picture you might use.
// =======================================
GradientBackDropLayer=NewImage(GetScreenWidth(),GetSCreenHeight())
//Fill the BackDrop image with a gradient
RendertoImage GradientBackDropLayer
c1=rgb(50,70,200)
c2=rgb(250,70,200)
ShadeBox 0,0,GetScreenWidth(),GetSCreenHeight(),c1,c1,c2,c2
// =======================================
// Create a second foreground layer
// =======================================
ForegroundLayer =NewImage(GetScreenWidth(),GetSCreenHeight())
RendertoImage ForegroundLayer
// Draw some ellipses to this surface to act as the mountains
Radius=GetSCreenWidth()/3
For Xlp=Radius/2 to GetScreenWidth()-1 step Radius
Ellipsec Xlp,GetScreenHeight(),Radius/2,Radius*1.5,true,rgb(0,255,0)
next
// =======================================
// Create version that is merged together
// =======================================
MergedBackDrop=NewImage(GetScreenWidth(),GetSCreenHeight())
RendertoImage MergedBackDrop
Drawimage GradientBackDropLayer,0,0,false
Drawimage ForegroundLayer,0,0,true
rendertoscreen
//
DRawMergedBackDrop=false
// start of programs main loop
Do
ScrollX=Mod(ScrollX-1,GetScreenWidth())
if DRawMergedBackDrop=false
// Draw the game image as we would in our game.
// Since we're drawing it full screen, and we don't want
// anything to show through it, we draw it solid
Drawimage GradientBackDropLayer,0,0,false
// Draw the foreground layer over the gradient
Tileimage ForegroundLayer,ScrollX,0,true
else
// Draw the pre merged version of the back drop
// in place of the two separate images
TileImage MergedBackDrop,ScrollX,0,false
Endif
// SHow FPS
Text 10,10,"fps:"+str$(fps())
if DRawMergedBackDrop=false
t$="Drawing Backdrops Layers Separately"
else
t$="Drawing Merged version"
endif
Text 10,30,t$
// Hit the spacekey to toggle between the two methods
if Spacekey()=true
DRawMergedBackDrop=1-DRawMergedBackDrop
FlushKeys
endif
Sync
loop
[/pbcode]
Avoid Rendering Dead Pixels
What's a dead pixel ? - These are pixels in the image we're drawing that are not visible to the end user.
Solid VS Transparent rendering
[pbcode]
// Create a blank image 100*100 pixels in size
BlankImage =NewIMage(100,100)
TransparentRender =false
// start of programs main loop
Do
// Clear the Screen
Cls rgb (100,150,200)
// Draw the Blank image to screen 25 times
For lp=1 to 25
Xpos=150+(lp*20)
Ypos=100+(lp*15)
Xpos=Xpos-GetIMageWidth(BlankImage)/2
Ypos=Ypos-GetIMageHeight(BlankImage)/2
DrawImage BlankImage,Xpos,Ypos,TransparentRender
next
// SHow FPS
Text 10,10,"fps:"+str$(fps())
if TransparentRender=False
t$="Drawing Solid Images"
else
t$="Drawing Transparent Images"
endif
Text 10,30,t$
// Hit the space key to toggle between transparent and solid rendering
if Spacekey()=true
TransparentRender=1-TransparentRender
FlushKeys
endif
Sync
loop
[/pbcode]
If you test this (cut and paste the code into PlayBASIC) you'll possibly see something you didn't expect. Your intuition might be telling you that since we can't see anything being drawn when rendering transparent pixels, it isn't eating up our computing time. If that were the case, rendering an all transparent image would be much faster. However, the reality is quite different. It's actually a fraction slower. How much slower really depends on the system.
Armed with this little bit of knowledge, the next question is how can we best take advantage of it ? - Well, there's a few situations where we can optimize our images to smooth out our game's rendering performance, such as trimming unnecessary transparent sections from backdrop images and full screen HUD overlays for starters. But one place our games can eat up a lot of processing power is when we draw our characters to the screen, in particular character animations.
Why ? - Well, animations are often drawn and laid out upon sprite sheets so that all frames are the same size. While this makes loading the animation relatively straightforward, it does mean that every animation frame is the same size (width/height) regardless of what's in it.
If you imagine a running animation for a second, then hopefully it's easy to visualize that certain frames during the animation are going to be wider or higher than others, such as when the character is fully extended. Moreover, if we imagine an explosion animation, where the debris starts out centralized and then radiates out, the initial frames will generally be smaller than the later ones.
OK, you're probably falling asleep by now wondering why this is an issue. Well, it relates back to our discovery above about how rendering transparent pixels costs the same (or more) than drawing a visible pixel. So if all of our animation frames have extra unused (transparent) space around the visible graphics portion, then that space is not only costing us a little bit of render performance, it's also wasting memory.
Example
Let's have a look,
[pbcode]
// Create two identical images. The only difference is that
// one is larger than the other. The larger (oversized) one
// is a bit slower to render.
OverSizedImage =MakeBallImage(256,90)
TrimmedImage =MakeBallImage(180,90)
CurrentImage=OverSizedImage
//
DrawTrimmedIMages=false
// start of programs main loop
Do
// Clear the Screen
Cls rgb (30,40,50)
// Draw the current image to screen 25 times
For lp=1 to 25
Xpos=200+(lp*15)
Ypos=200+(lp*10)
Xpos=Xpos-GetIMageWidth(CurrentImage)/2
Ypos=Ypos-GetIMageHeight(CurrentImage)/2
DRawimage CurrentImage,Xpos,Ypos,true
next
// SHow FPS
Text 10,10,"fps:"+str$(fps())
if DrawTrimmedIMages=False
t$="Drawing Oversized Images"
CurrentImage=OverSizedImage
else
t$="Drawing Trimmed Images"
CurrentImage=trimmedImage
endif
Text 10,30,t$
// Hit the space key to toggle between trimmed and oversize images
if Spacekey()=true
DrawTrimmedIMages=1-DrawTrimmedIMages
FlushKeys
endif
Sync
loop
Function MakeBallImage(FrameSize,Radius)
Index=NewImage(FrameSize,FrameSize)
Rendertoimage Index
For lp=Radius to 1 step -1
Circlec frameSize/2,frameSize/2,lp,true,rgb(100,200-lp,255-(lp*2))
next
rendertoscreen
EndFunction index
[/pbcode]
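To put some rough numbers on the demo above: both images contain the same ball, but the 256*256 frame covers 65,536 pixels while the 180*180 frame only covers 32,400. The little sketch below just does that arithmetic for the 25 draws per frame. The variable names are made up for illustration, and it assumes every pixel in a frame costs roughly the same to process (visible or not), as discussed above. It also ignores any clipping at the screen edges.

[pbcode]
// Sketch only: compares how many pixels get processed per frame
// when drawing the oversized and trimmed images from the demo above.
// All variable names here are made up for illustration.
DrawsPerFrame=25
OverSizedPixels=256*256
TrimmedPixels=180*180
repeat
   Cls 0
   Print "Oversized pixels per frame:"+Str$(OverSizedPixels*DrawsPerFrame)
   Print "Trimmed pixels per frame:"+Str$(TrimmedPixels*DrawsPerFrame)
   Sync
until Spacekey()
[/pbcode]

That's 1,638,400 pixels per frame for the oversized version against 810,000 for the trimmed one, so roughly half the work for exactly the same picture on screen.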
Example #2 - Fixed Size Vs Trimmed Animation Sizes
To demonstrate, in this example we're creating two animation sequences. One is trimmed of any dead (transparent) pixels, the other isn't. We're testing the performance difference by rendering a screen full of these animations, each at a different stage of its animation sequence. You can probably surmise what's going to happen. The version showing the fixed size animation frames gets progressively slower than the trimmed version (the more you have the slower it gets). So trimming helps us balance the load better.
[pbcode]
FrameCount=25
FrameSize=150
// Make two Expanding Ball Type Animations.
Dim FixedSizeImages(FrameCount*2)
Dim TrimmedImages(FrameCount*2)
MakeArray Animation()
Radius=70
For lp=0 to frameCount
Radius2=Radius-(lp*2.5)
// Make the Fixed size version
img=MakeBallImage(FrameSize,Radius2,rgb(00,255,lp))
FixedSizeImages(lp)=img
FixedSizeImages(FrameCount*2-lp)=img
// Make a 'rough' trimmed version anim (in a different colour)
img=MakeBallImage((Radius2*2)-2,Radius2,rgb(255,155,lp))
TrimmedImages(lp)=img
TrimmedImages(FrameCount*2-lp)=img
next
// Set toggle which animation is being viewed
DrawTrimmedImages=false
// start of programs main loop
Do
// Clear the Screen
Cls rgb (30,50,70)
if DrawTrimmedImages=False
t$="Fixed Size Animations"
SetArray Animation(),GetArray(FixedSizeImages())
else
t$="Trimmed Animations"
SetArray Animation(),GetArray(TrimmedImages())
endif
FrameIndeX=Mod(FrameIndex+1,FrameCount*2)
// Draw a screen full of animations
Xpos=100
Ypos=50
For lp=1 to 200
ThisFrame=Mod(FrameIndex+lp,FrameCount*2)
CurrentImage=Animation(Thisframe)
Xpos=Xpos+20
if Xpos>700
Xpos=100
Ypos=Ypos+100
endif
X=Xpos-GetIMageWidth(CurrentImage)/2
Y=Ypos-GetIMageHeight(CurrentImage)/2
DRawimage CurrentImage,X,Y,true
next
// DRaw the anim over the scene again.
X=100-GetIMageWidth(CurrentImage)/2
Y=200-GetIMageHeight(CurrentImage)/2
DRawimage CurrentImage,X,Y,false
box x,y,x+GetIMageWidth(CurrentImage),Y+GetIMageHeight(CurrentImage),false
// SHow FPS
Text 10,10,"fps:"+str$(fps())
Text 10,30,t$
// Hit the space key to toggle between trimmed and fixed size versions
if Spacekey()=true
DrawTrimmedImages=1-DrawTrimmedImages
FlushKeys
endif
Sync
loop
Function MakeBallImage(FrameSize,Radius,Colour)
Index=NewImage(FrameSize,FrameSize)
Rendertoimage Index
For lp=Radius to 1 step -1
Scale#=(Radius-lp)/Float(Radius)
Circlec frameSize/2,frameSize/2,lp,true,RgbFade(Colour,Scale#*100)
next
rendertoscreen
EndFunction index
[/pbcode]
Changing Display depths can improve performance ?
While you might not think of it, the screen resolution (our game's display width & height) and the number of colours play a big part in the final performance of our game. This is particularly important for those wanting to design programs that still perform well on older computers. While most of us take for granted that modern machines are capable of throwing high resolution images around the display fairly easily, the same cannot be said for older systems. This is clearly evident when we compare games from 2008 with games from 2004, 2000 or 1980 for example.
The basic rule here is that the higher the resolution, the more pixels on screen, and the more pixels on screen, the more power required to fill the screen with images at a reasonable speed.
For example,
If we use a screen size of 1024x * 768y with 32bit colour, that's (1024*768*4)=3,145,728 bytes for the screen. Yes, 3 meg !
Now let's say you CLS the screen and draw one layer of map to it. That means your GPU is shifting roughly 3 meg for the clear, and another 3 meg for the map render. So drawing the frame costs approximately 6 meg of bandwidth.
Now as we previously discovered, every single pixel that is rendered (solid, transparent (mask colour) or translucent) costs you performance! While modern GPUs have a much higher fill rate (they can draw more pixels per second), older ones simply can't cope with too much graphics data. So we'll need a way to reduce the work load if we hope to get reasonable performance on those systems.
We can reduce the work load in a number of ways.
1) Give the user the option of lowering the display depth from 24bit/32bit down to 16bit. Going from 32bit to 16bit halves the amount of graphics data that has to be shifted. The trade-off is that visually we lose some colour quality as well. It's also worth noting that many older GPUs were optimized for 16bit display modes over 24bit or 32bit modes. Really old systems often don't have a 32bit mode at all, so on those systems there's no option but 16bit.
2) Change to a smaller screen size. However, this is not always convenient when making 2D games, since most of the artwork is drawn specifically for a certain display resolution. It is possible to scale the graphics media proportionally from within your program, but this adds a lot of extra processing work and is unlikely to be very efficient or visually attractive. This approach is far easier in the 3D world.
3) Try to minimize the amount of wasted drawing the program might be doing. (The stuff mentioned in this very tutorial)
4) If you're doing things like rotation/image scaling, then reducing the quality (the image's dimensions) of the image can be very beneficial on old systems. It might not look as good, but it can keep the frame rate higher on those older systems.
5) Don't worry about supporting old clunkers anymore. Optimizing is your choice, so you don't have to go out of your way to support old systems.
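To put some rough numbers on the first option, the little sketch below repeats the earlier 1024*768 sum for a CLS plus one full screen map layer, at 32bit (4 bytes per pixel) and at 16bit (2 bytes per pixel). The variable names are made up for illustration, and it ignores everything else that might be drawn in the frame.

[pbcode]
// Sketch only: rough bytes shifted per frame for a CLS plus one
// full screen map layer. Variable names are made up for illustration.
ScreenW=1024
ScreenH=768
// The CLS plus one full screen map layer
FullScreenPasses=2
Bytes32=ScreenW*ScreenH*4*FullScreenPasses
Bytes16=ScreenW*ScreenH*2*FullScreenPasses
repeat
   Cls 0
   Print "32bit bytes per frame:"+Str$(Bytes32)
   Print "16bit bytes per frame:"+Str$(Bytes16)
   Sync
until Spacekey()
[/pbcode]

That works out at 6,291,456 bytes per frame at 32bit against 3,145,728 at 16bit.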
Example
This example demonstrates how much the resolution can affect the performance of our program. It does this by rendering a set of boxes to a collection of 'screen' sized images. You can cycle through the images and monitor the program's performance. The boxes are drawn proportionally, so the scene looks the same regardless of the size. It also displays the number of pixels drawn at each resolution. The bigger the display gets, the more pixels are being drawn, and therefore the slower the demo gets.
[pbcode]
MaxCords=50
// Create an array to hold the box coordinates and colours
Dim BoxCords(MaxCords,5)
For lp=0 to MaxCords
BoxCords(lp,1)=rnd(100)
BoxCords(lp,2)=rnd(100)
BoxCords(lp,3)=rnd(100)
BoxCords(lp,4)=rnd(100)
BoxCords(lp,5)=rndrgb()
next
Dim Screens(5)
Screens(1)=NewIMage(320,240)
Screens(2)=NewIMage(400,300)
Screens(3)=NewIMage(640,480)
Screens(4)=NewIMage(800,600)
Screens(5)=NewIMage(1024,768)
CurrentScreen=1
// start of programs main loop
Do
// Clear the Screen
Cls rgb (30,50,70)
ThisImage=Screens(CurrentScreen)
PixelsDrawn=DrawScene(ThisImage)
Xpos=GetScreenWidth()/2-GetIMageWidth(ThisImage)/2
Ypos=GetSCreenHeight()/2-GetIMageHeight(ThisImage)/2
DRawimage ThisImage,Xpos,Ypos,false
w=GetimageWidth(ThisImage)
h=GetimageHeight(ThisImage)
t$="Current Screen Size:"+str$(w)+"*"+str$(h)
// SHow FPS
Text 10,10,"fps:"+str$(fps())
Text 10,30,t$
Text 10,50,"Pixels Drawn:"+str$(PixelsDrawn)
// Hit the space key to cycle through the screen sizes
if Spacekey()=true
CurrentScreen =CurrentScreen+1
if CurrentScreen>GetArrayElements(Screens(),1) then CurrentScreen=1
FlushKeys
endif
Sync
loop
Function DrawScene(ThisImage)
rendertoimage Thisimage
w=GetIMageWidth(ThisImage)
h=GetIMageHeight(ThisImage)
ScaleX#=w/100.0
ScaleY#=h/100.0
c1=rgb(200,100,10)
c2=rgb(200,100,150)
ShadeBox 0,0,w,h,c1,c1,c2,c2
For lp=0 to GetArrayElements(BoxCords(),1)
x1=BoxCords(lp,1)*Scalex#
Y1=BoxCords(lp,2)*Scaley#
x2=BoxCords(lp,3)*Scalex#
y2=BoxCords(lp,4)*Scaley#
swapifhigher x1,x2
swapifhigher y1,y2
boxc x1,y1,x2,y2,true,BoxCords(lp,5)
PixelsDrawn=PixelsDrawn+((x2-x1)*(y2-y1))
next
rendertoscreen
EndFunction PixelsDrawn
[/pbcode]
Drawing Everything Real Time
When we first start getting into game programming, one of the most common mistakes people make is assuming 'everything' needs to be drawn every update. While this is sometimes true, often we can selectively update and the player won't really notice the difference. This does mean that it's not quite as cut and dried as just drawing everything all the time, but it's not rocket science either.
One of the most common opportunities for selectively updating occurs in things like refreshing backdrop animations and foreground overlays.
Scores and Health Bars
In this example all we're doing is drawing a simple score with a health bar below it. It's the sort of thing we'd just slap into our game and forget, but we can actually improve the performance by selectively updating it. The same approach will work for all things HUD related in fact. So if it's not changing, don't render it!
[pbcode]
LoadFont "Courier",1,24,0
ScoreOverLayImage=NewImage(204,50)
// start of programs main loop
Do
// Clear the Screen
Cls rgb (30,50,70)
scw=GetScreenWidth()/2
sch=GetScreenHeight()*0.4
if timer()>NextUpdate
NextUpdate=timer()+100
Score=Score+rnd(100)
Health=mod(health+1,100)
endif
if DrawMode=0
// Draw the score and health directly to the screen, each update
DrawScoreAndHealth(scw,sch,Score,Health)
endif
if DrawMode=1
// Selectively cache the score& health to an image (since image rendering is quicker)
RefreshScore=false
if OldScore<>Score or OldHealth<>Health
; either the score or health has changed, so we need to
; refresh the score image cache
OldScore=Score
OldHealth=Health
RefreshScore=true
endif
if RefreshScore=true
rendertoimage ScoreOverLayImage
cls 0
DrawScoreAndHealth(101,0,Score,Health)
rendertoscreen
endif
drawimage ScoreOverLayImage,scw-101,sch,true
endif
if Drawmode=0
t$="Drawing to the screen"
else
t$="Selectively updating"
endif
// Show FPS
Text 10,10,"fps:"+str$(fps())
Text 10,30,t$
// Hit the space key to toggle between draw mode
if Spacekey()=true
DrawMode =1-DrawMode
FlushKeys
endif
Sync
loop
Function DrawScoreAndHealth(Xpos,Ypos,Score,Health)
s$="Score:"+Digits$(Score,6)
CenterText Xpos,Ypos,s$
th=GetTextHeight(s$)
x1=xpos-102
x2=xpos+102
y1=ypos+th+5
y2=y1+th
c=rgb(20,30,40)
boxc x1,y1,x2,y2,true,c
c1=rgb(220,30,40)
c2=rgb(20,40,240)
ShadeBox x1+2,y1+2,x1+(Health*2),y2-2,c1,c2,c2,c1
EndFunction
[/pbcode]
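The same cache-on-change idea, stripped of any drawing commands, can be sketched as follows. This is a minimal illustration in Python rather than PlayBasic so it runs standalone; the class CachedHud and its render method are hypothetical stand-ins for the image cache and the DrawScoreAndHealth call above, not part of any library.

```python
class CachedHud:
    """Re-renders the HUD only when the score or health actually change."""

    def __init__(self):
        self.old_state = None      # (score, health) the cache was built from
        self.cached_image = None   # stand-in for the off-screen image
        self.render_count = 0      # how many expensive redraws happened

    def render(self, score, health):
        # Stand-in for the expensive drawing work (text, boxes, shading).
        self.render_count += 1
        return f"Score:{score:06d} Health:{health}"

    def draw(self, score, health):
        state = (score, health)
        if state != self.old_state:
            # Something changed, so rebuild the cached image.
            self.cached_image = self.render(score, health)
            self.old_state = state
        # Every frame we only hand back (blit) the cached result.
        return self.cached_image


hud = CachedHud()
frames = [(0, 100), (0, 100), (0, 100), (50, 100), (50, 100), (50, 99)]
for score, health in frames:
    hud.draw(score, health)

print(hud.render_count, "renders for", len(frames), "frames")
```

Here the HUD state only changes on three of the six frames, so the expensive render step runs three times while the cheap draw runs every frame.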
Selectively Refreshing The Screen (Dirty Rectangles)
Often when we start writing games, it's common to see programmers just throw everything at the graphics engine/hardware and let that deal with it. This may work fine for some games on some computers, but not others. The reason is that graphics engines are generally designed to just do what you tell them to do. If you tell it to draw the same image/circle/pixel 100 times, then that's what it'll do. It doesn't know what you're trying to achieve, or that you might be drawing something that isn't even on screen.
Selective refreshing is conceptually simple. Rather than just shoveling everything on screen every frame, we're going to pay closer attention to what we're drawing and where. In this particular example, we're simply drawing some sprites (with alpha channel) moving around on a static backdrop. If we take the brute force approach, we'd have an FX image the size of the screen, copy the backdrop to this image, draw all the sprites over it, then copy the screen image to the actual screen. This will work, but do we really need to refresh the entire backdrop every frame if the backdrop is a static picture? Nope. We only need to refresh the parts of the backdrop that are being overwritten.
There are a few ways of doing this. In this example we're simply going to treat the backdrop picture as if it was a Tile Map, using an array to signify when a tile (a rectangular portion of the backdrop image) needs to be refreshed to its original state. During our sprite movement routines, we not only move the sprite, but we also work out what tiles it's going to overwrite when it's drawn. These backdrop tiles are then flagged as needing to be refreshed when the backdrop is being restored.
So each loop we're basically doing this:
Do
* Refresh Backdrop (redraws any tile that was flagged as being overdrawn last refresh; this restores the backdrop to its original state)
For TileY = 0 to TilesDown
For TileX = 0 to TilesAcross
if TileRefresh(TileX,TileY)
; redraw this portion of the backdrop, since it was previously covered by a sprite
endif
next
next
* Move the characters
For each character in game
.. Move character (AI, physics etc)
.. Get the character's final screen coordinates and calc its bounding rectangle
.. Convert the sprite's screen coords to array coords and flag this rectangle of tiles in the array as needing a refresh
next
* Refresh the screen
Loop
What this lets us do is effectively remove the cost of redrawing the full backdrop each update. That is, assuming we don't have a full screen of sprites covering the backdrop; in that case it'll all get redrawn regardless.
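The tile flagging step can be sketched like this. It is a minimal illustration in Python rather than PlayBasic, and the tile size, map dimensions and function names (new_refresh_map, flag_rect, restore_backdrop) are all assumptions made up for the sketch. It also clears each flag once the tile has been restored, which is one reasonable way to handle it.

```python
TILE_SIZE = 32          # assumed tile width/height in pixels
TILES_ACROSS = 20       # assumed 640 pixel wide backdrop
TILES_DOWN = 15         # assumed 480 pixel high backdrop


def new_refresh_map():
    """One flag per backdrop tile; True means 'redraw me'."""
    return [[False] * TILES_ACROSS for _ in range(TILES_DOWN)]


def flag_rect(refresh, x1, y1, x2, y2):
    """Flag every tile touched by the pixel rectangle (x1,y1)-(x2,y2), inclusive."""
    # Convert pixel coords to tile coords, clamped to the map edges.
    tx1 = max(x1 // TILE_SIZE, 0)
    ty1 = max(y1 // TILE_SIZE, 0)
    tx2 = min(x2 // TILE_SIZE, TILES_ACROSS - 1)
    ty2 = min(y2 // TILE_SIZE, TILES_DOWN - 1)
    for ty in range(ty1, ty2 + 1):
        for tx in range(tx1, tx2 + 1):
            refresh[ty][tx] = True


def restore_backdrop(refresh):
    """Visit only flagged tiles, clear their flags, return how many were redrawn."""
    redrawn = 0
    for ty in range(TILES_DOWN):
        for tx in range(TILES_ACROSS):
            if refresh[ty][tx]:
                # A real game would copy this tile from the backdrop image here.
                refresh[ty][tx] = False
                redrawn += 1
    return redrawn


refresh = new_refresh_map()
flag_rect(refresh, 40, 40, 103, 103)   # a 64x64 sprite at (40,40)
print(restore_backdrop(refresh), "of", TILES_ACROSS * TILES_DOWN, "tiles redrawn")
```

A 64x64 sprite that isn't aligned to the tile grid touches a 3x3 block of 32 pixel tiles, so only 9 of the 300 tiles need restoring for that sprite.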
See example:
Selective Tile Map Refreshing (http://www.underwaredesign.com/forums/index.php?topic=2677.0) See Example #2
Dirty Rectangles (Combined Video & Fx Rendering) (http://www.underwaredesign.com/forums/index.php?topic=3395.0)
A Crash Course In Optimization Part #4 - Smarter Program Logic
Becoming a good programmer (games or otherwise) goes far beyond the graphical mumbo jumbo of the day. What do I mean by that? Well, it doesn't matter what display technology/graphics libraries you use if your programming & problem solving abilities aren't up to it. You'll most likely end up with some pretty visuals on screen, but be unable to actually get the game working, which is incredibly common!
So here, we'll try and address some of the stumbling blocks new/old programmers are likely to run into. Hopefully this will help improve your algorithmic knowledge (methods) and code design principles, which might just help you shake the brass monkey!
These tips can be categorized as:
Design Tips
* Clearer Code means faster development
Algorithm Tips
* Spatial Partitioning (Collision)
Note: Remember, improving our program design and algorithmic knowledge (methods) has a twofold effect. While better methods/algorithms will certainly help optimize our program's performance, the bigger gain can come from improved code design/structure, which will allow us to create larger, more complex programs with fewer errors at a much faster rate.
Clearer Code means faster development
Optimization is the pursuit of efficient design. Conceptually this goes far beyond just improving the execution speed of a piece of program code. By learning to write clearer, more structured code to begin with, we can optimize our very development and bug testing processes, allowing us to develop larger, more complex programs with fewer faults in less time. How do we go about this?
The simple answer is that we need to apply some structure, some rules, to our own programming style, thus forcing ourselves to think before acting. This might not be the most intuitive thing to do, in particular for new programmers, but all too often bad/ugly code is the result of an impulsive blast at the keyboard.
Just what is bad code? While highly subjective, we'll classify a bad or ugly code fragment as being something that fails to efficiently convey its meaning to the reader in a well laid out manner.
The key point here is that even though it might 'execute correctly', if it's not written in a legible fashion, then the code can be difficult to debug, very difficult to change and almost impossible to reuse.
So in no particular order this means,
* Use meaningful Variable, Array, Type & Function names throughout our program. Using meaningful names (of variables/functions/arrays etc) helps the program's readability dramatically. As the author of a program, we can sometimes be forgiven for using names that don't really relate to what they hold or do. This might not be a huge issue while we're developing a section of program code, since what these variable/function names represent will be fresh in our memory. It's when we take a break from programming, or show the code to somebody else, that such decisions become more of a liability than anything else. Why? Well, aside from being error prone, it means that in order to understand what a section of code is doing, we have to re-learn it from scratch every time. The bigger the fragment, the more difficult and time consuming this becomes.
* Comment your code. Comments might be used to describe the approach a section of code is using to solve a particular problem, outline any known limitations of the code fragment, act as code separators, hold to do lists, anything really. So comments can help us future proof our own programs.
* Use the appropriate control structures for the algorithm/logic we're implementing.
* Use Functions & Subs to divide and conquer.
Related articles
Planning Your Project (http://www.underwaredesign.com/forums/index.php?topic=1018.msg8246#msg8246)
Thoughts on Planning / Code Design (http://www.underwaredesign.com/forums/index.php?topic=1018.msg13328#msg13328)
Choosing The Correct Variables (http://www.underwaredesign.com/forums/index.php?topic=2401.0)
Wiki articles
Given the size of this particular subject, we can't possibly cover it in a few paragraphs, so feel free to check some of these out.
http://en.wikipedia.org/wiki/Structured_programming (http://en.wikipedia.org/wiki/Structured_programming)
http://en.wikipedia.org/wiki/Procedural_programming (http://en.wikipedia.org/wiki/Procedural_programming)
Spatial Partitioning
One increasingly common problem in game programming relates to the efficient detection of collisions between the various characters in our game world. Effectively, the more characters we have, the more collision possibilities there are, and the more checks we make, the slower the program will operate.
Imagine this scenario: we have a room full of people all walking around randomly, trying not to bump into each other. Our program's AI will be required to predict collisions between the various people, so the characters can take evasive action. This just means that each time a person moves, we'll have to make sure the person isn't blindly moving into another.
One way of doing this is by checking each person against every other person in the room. So if there are 10 people, Person #1 is compared with persons 2,3,4,5,6,7,8,9,10. Person #2 is compared with persons 1,3,4,5,6,7,8,9,10 and so on for all 10 people. While this approach will work, it doesn't scale up very well, which is its main flaw. For 10 people, each person is checked against the other 9, so we're only making (10*9)=90 collision checks, which isn't too bad. But what if the room has 100 or 1000 people in it? It's here the cracks start appearing. For 100 characters we're performing (100*99)=9,900 checks, and for 1000 characters it's (1000*999)=999,000 checks. If we think about it, by comparing each person to every other person, we're going to end up making lots of redundant collision checks, meaning lots of wasted processing time and ultimately a slower running game.
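To see how quickly the brute force approach grows, the comparisons can simply be counted. The sketch below is in Python rather than PlayBasic and the function names are made up for illustration. Each of the N characters is tested against the N-1 others, giving N*(N-1) comparisons. The second function shows one easy saving as an aside: testing each pair only once (A against B, but never B against A again) halves the count, though it still grows with the square of the number of characters.

```python
def brute_force_checks(count):
    """Checks made when every character is compared with every other one."""
    checks = 0
    for a in range(count):
        for b in range(count):
            if a != b:          # a character is never tested against itself
                checks += 1
    return checks


def unique_pair_checks(count):
    """Checks made when each pair is only tested once (A vs B, never B vs A)."""
    checks = 0
    for a in range(count):
        for b in range(a + 1, count):
            checks += 1
    return checks


for people in (10, 100, 1000):
    print(people, brute_force_checks(people), unique_pair_checks(people))
```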
So what's the alternative if comparing everything to everything is too much work for our game? Well, we need to start thinking of ways to reduce the number of necessary collision checks. This can be tackled in a whole bunch of ways, but for now we'll look into some basic spatial partitioning.
Spatial partitioning (as the name suggests) is where our program keeps track of the region or zone each character is currently within. So when it comes to detecting collisions between the characters, we only need to compare our current character with the characters that share the same zone. This simple design change will dramatically reduce the collision overhead.
Let's use the example from above again, but this time imagine we have an overhead view of a room with 1000 people randomly positioned on screen. Rather than comparing each person with every other person, we'll first run through our list of people and assign each of them to one of two zones. Characters standing in the top half of the display will be in zone A, and characters standing in the bottom half of the display will be in zone B. Characters that are standing in the middle of the screen will be placed in both zones A & B. For example's sake, let's say that Zone A has 550 people in that half of the screen and Zone B has the remaining 450 people. This will give us an approximate collision test count of (550*550) + (450*450) = 505,000 checks. So we've roughly halved the work load by simply splitting up the screen into two zones. What would happen if we split it into four zones, or eight zones? Simple, we get an even lower collision check count.
This example uses this very approach (vertically).
See Partition (inter object) Shape Collision (http://www.underwaredesign.com/forums/index.php?topic=1237.0)
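The two zone split described above can also be sketched in a few lines. This is a minimal illustration in Python rather than PlayBasic, and it is not the code from the linked example; the screen size, collision radius and function names are assumptions chosen for the sketch. People whose circle straddles the middle line are placed in both zones, as described above.

```python
import random

SCREEN_WIDTH = 800      # assumed display width in pixels
SCREEN_HEIGHT = 600     # assumed display height in pixels
RADIUS = 8              # assumed collision radius of each person


def build_zones(people):
    """Split people into a top zone (A) and a bottom zone (B).

    Anyone whose circle straddles the middle line goes into both zones.
    Returns two lists of indexes into the people list.
    """
    middle = SCREEN_HEIGHT / 2
    zone_a, zone_b = [], []
    for index, (x, y) in enumerate(people):
        if y - RADIUS < middle:
            zone_a.append(index)
        if y + RADIUS >= middle:
            zone_b.append(index)
    return zone_a, zone_b


def count_checks(zones):
    """Checks needed when each person is only compared with zone mates."""
    return sum(len(zone) * (len(zone) - 1) for zone in zones)


random.seed(1)  # fixed seed so the run is repeatable
people = [(random.uniform(0, SCREEN_WIDTH), random.uniform(0, SCREEN_HEIGHT))
          for _ in range(1000)]

zone_a, zone_b = build_zones(people)
print("one zone :", count_checks([list(range(len(people)))]))
print("two zones:", count_checks([zone_a, zone_b]))
```

The exact two zone figure depends on where the people happen to be standing, but with an even spread it comes out at roughly half of the single zone count. Splitting into more zones follows the same pattern: build more lists and pass them all to count_checks.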