Tutorial - A Crash Course In BASIC program Optimization

Started by kevin, June 30, 2008, 01:46:38 PM

Previous topic - Next topic

kevin




A Crash Course In BASIC program Optimization


    Optimization is often seen as some type dirty word in programming circles.   So it's not that uncommon to run into the same old  "Computers are so fast today, nobody optimizes their code anymore." catch phrase.  If only that were true!

     While today's machines are indeed many times faster than they used to be (even just a few years ago), the programs we use today require much more media & data to be manipulated than before.  The more media and data programs need to man handle,  the more muscle (computing power) it takes.   So it's still a balancing act.  

     What is optimization ?  -   Optimization is the process where a programmer tries to improve their programs performance by restructuring the time critical parts (the sections of the program where computer spends the most time executing) into a more stream lined manner.    Unfortunately, this tends be something that you learn through experience / education.  So unfortunately, it's not uncommon for new programmers to make mistakes that can dramatically impact their programs performance, without even knowing it.

     There's two basic forms of optimizations.  We'll call these,  Micro optimizations and Method optimizations.   In this tutorial, we're going to kick around some ideas that may help give your programs a little more get up and go.  

      Micro optimizations are when we try and make subtle changes to the programs code,  looking for 'faster' commands and slight logic tweaks to improve the overall performance of the action.  So generally, we're not changing the entire approach taken, rather just stream lining key parts of the algorithms in our program.   While useful, there's only so far you can go take this before you need to change approaches.

     Method optimizations are when we completely change how we approach a key problem(s) in our program.    Often this will give far greater performance improvement than any micro optimization you'll ever make.  The trick is, coming up with alternative methods isn't always obvious.   In particular, if we're new to this programming stuff.    

     For example,  Imagine a person "Jim" wants to send a message to a friend in another city.   Immediately, your probably already thinking of a few ways you'd approach this problem.  Well, lets say  the first thing "Jim" thought of,  was sending his message via carrier pigeon.     However, when "Bill"  was confronted with the same situation, he decided to send an EMAIL.    While Bill's  email might find it's way into his friends inbox within a few seconds,   Jim's pigeon takes a week.    Now with more training,  Jim could probably make Micro Optimizations to the pigeons flight path that would no doubt see it reach it's target quicker.    But there's only so far he can go before the bird can't physically get there any faster.    So in order to compete, Jim would have to optimize his sending method by changing to an entirely difference approach.  Such as Email/SMS..

    While not best the analogy,  this hopefully demonstrates that the difference between Micro and Method optimizations.  




Don't waste time Optimizing the Input Code


          Be careful. Optimizing can sometimes be a trap, you can easily waste valuable time, effort and more importantly motivation optimizing aspects of your program, that are not that time critical.   As it turns out, in the majority of projects, only a tiny percentage (10% or less) of your game code is really time critical.    So if you're going to optimize, don't guess, analyze your programs behavior and see where it's really spends the bulk of it's time.   In most games,  the main time eaters will be the updating the character Logic (AI / Physics) ), Collisions and ultimately Drawing the stuff to the screen.


       

General Tips On Tweaking a Programs Performance


      All too frequently the war is lost in the design phase, by attempting to make our programming tools do something they weren't really intended to do.   No matter how many stripes you paint on an elephant, it'll never be a zebra!   In other words, by designing our programs (games in our case) to take advantage of the feature set of the language they're written in, then this will generally put us on the right track.

      So bellow, we're basically going to take a general overview of common issues and code optimizations.    While the following examples are written in PlayBasic code, they're fairly universal.    So you might just learn something useful,  you might not too :)


      * Removing Redundant Calculations

      * Smarter Drawing

      * Smarter Logic

       


When Should I Optimize My Program ?


         Generally, method optimizations are made during a programs design phase.  Which occurs before even a single line of program code is written.  Experienced programmers can do this as they're often familiar with the types of problems & solutions required within a particular game genre.  Unfortunately, It's going to be very difficult for new programmers to do this up front.   While this tutorial covers some of the basics, this is a huge subject...

         Anyway, assuming you've some idea of how your game is going to work,  then It's often best to prototype the design before making a serious commitment to this approach.   It's here we generally find more of the big issues that we missed at design time.  As such, you can refine whatever core methods the program is using and press onto the next phase.   I should point out, that sometimes we'll need to revise our original game design ideas to fit into with our programming skills / languages.

         Micro optimizations are generally best made at the very end of a programs development.  

         Now,  you can optimize while you work.  Which in some places makes perfect sense.    Like for example, tweaking a heavily used subroutine / function to work more efficiently.    However,   If by optimizing a programs code you're going to end up with messy and or impossible to follow/update code, or the optimization limits the programs functionality is some negative way in the future...  Then DON'T DO IT!

         We optimize to make our programs more efficient,  not to make unreadable spaghetti !



kevin

#1



A Crash Course In Optimization  Part#2 - Removing Redundant Calculations (Micro Opt's)


         A redundant calculation is any program code that is being performed where when the result never changes.   This is particularly  important inside high frequency loops (Loops with lots of iterations).



        Variables Faster Than Array/Types Accesses
       
         In this example, we're looking at a situation that sometimes can occur when performance calculations upon structures (types/arrays).    Every time you access an array/type PB has to resolve this access location to a where the item is in memory.  While this isn't a massive overhead (considering what it does), it's a still far greater overhead than variables have.    So it can often be beneficial to hold the calculation results in temp variables when doing calculations from a type/array.    


PlayBASIC Code: [Select]
  Tests=500

// Variables have the lowest access overhead, so it's often quicker
// to pull values out of arrays or type if your performing calculations
// on them.

Type tGameObject
x#
y#
SpeedX#
SpeedY#
EndType

Dim Object(10) as tGameObject


ScreenWidth=GetScreenWidth()
ScreenHeight=GetScreenHeight()


Do
cls 0
inc frames

// ===================================
// Calling Functions In a Loop
// ===================================
X=300

// ==========
// Test #1
// ==========

T=timer()
For LP=0 to Tests

// Here we're constantly accessing the type fields. So PB has to resolve
// this every time.

// You'll notice PB has to access the X# and y# fields at least 4 times, possible 5
// times. This adds up.
for object=1 to 10
Object(Objectlp).X#= Object(Objectlp).X#+Object(Objectlp).SpeedX#
Object(Objectlp).Y#= Object(Objectlp).Y#+Object(Objectlp).SpeedY#

if Object(Objectlp).X#>ScreenWidth
Object(Objectlp).X#=ScreenWidth
endif
if Object(Objectlp).Y#>ScreenHeight
Object(Objectlp).Y#=ScreenHeight
endif
next

next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)

// ==========
// Test #2
// ==========

T=timer()
// Since the screen width is never going to change
// during the loop, if we want speed we should move
// this calculate outside the loop
ScreenWidth=GetScreenWidth()
For LP=0 to Tests

// Here we're constantly accessing the type fields. So PB has to resolve
// this every time.
for object=1 to 10

// since we're going to compare it's new position, we'll store this in
// a variables X# & Y#, rather than back in the type.
// This way we're reduce the type accessof fields X# * y# to 2 accesses in total

X#= Object(Objectlp).X#+Object(Objectlp).SpeedX#
Y#= Object(Objectlp).Y#+Object(Objectlp).SpeedY#

if X#>ScreenWidth
X#=ScreenWidth
endif
if Y#>ScreenHeight
Y#=ScreenHeight
endif

// Once we're done now we store it back in the type
Object(Objectlp).X#=X#
Object(Objectlp).Y#=Y#

next

next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)


Sync
loop






        Calling Functions In Loops

         This demonstration we've two simple loops. The first loop we're constantly comparing a variable X with the screenwidth.  Each loop it calls the  "GetScreenWidth()" function.   The question is why ? - Will the screen ever change sizes inside the loop ? - Nope.  So in the second version of the loop we precalculated the screen width by calling the GetScreenWidth and storing the value in a variable.  

          So if we call functions too much, they can eat up valuable time from our game.  A millisecond improvement might not sound like much, but can make all the difference to your games performance.

PlayBASIC Code: [Select]
   Tests=10000

Do
cls 0
inc frames

// ===================================
// Calling Functions In a Loop
// ===================================
X=300

// ==========
// Test #1
// ==========

T=timer()
For LP=0 to Tests
// Check if the X variable is large the Screen Width
if X>GetScreenWidth()
Print "X bigger than Screen edge"
endif
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)

// ==========
// Test #2
// ==========

T=timer()
// Since the screen width is never going to change
// during the loop, if we want speed we should move
// this calculate outside the loop
ScreenWidth=GetScreenWidth()
For LP=0 to Tests
// Check if the X variable is large the Screen Width
if X>ScreenWidth
Print "X bigger than Screen edge"
endif
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)


Sync
loop






        Storing Calculation Results In Temp Variables

          This (fairly bogus) example demonstrates how including unnecessary calculations inside heavy loops can cost us some performance.

PlayBASIC Code: [Select]
   Tests=10000

Do
cls 0
inc frames

// ===================================
// Precalc Values Out Of Heavy Loops
// ===================================

A=100
B=200

// ==========
// Test #1
// ==========

T=timer()
For LP=0 to Tests
Result=A*B*lp
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)

// ==========
// Test #2
// ==========

T=timer()
// Since values of A and B never change inside the loop
// we can pre calc the value outside before we enter the loop
ATimesB =A*B
For LP=0 to Tests
Result=ATimesB*lp
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)
Sync
loop







        Short Circuiting Comparisons

          This example demonstrates how we can win back some performance by separating combined comparisons.   While will generally win us some performance, providing we place the more unlikely (to be true) compare first.    Other wise, if the first compare is true frequently, this might cost us more, since we're falling through both comparisons to get a complete match.


PlayBASIC Code: [Select]
   Tests=10000

Do
cls 0
inc frames

// ===================================
// Short Circuiting Comparisons
// ===================================

A=100
B=200

// ==========
// Test #1
// ==========


T=timer()
For LP=0 to Tests
if A=12345 and B=12345
print "Both Match"
endif
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)

// ==========
// Test #2
// ==========

T=timer()
For LP=0 to Tests
if A=12345
if B=12345
print "Both Match"
endif
endif
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)


Sync
loop






  Dropping the True from Comparisons

         Often it's common to compare the return value from a function call with a TRUE/FALSE within an IF statement.   This extra comparison normally not needed.   Since the IF statement assumes that value/calculation following it,  is FALSE if the calc return/variable is zero.  Anything else, and the IF will treat is as TRUE.



PlayBASIC Code: [Select]
   Tests=10000

Do
cls 0
inc frames

// ===================================
// Dropping the True from Comparisons
// ===================================


// ==========
// Test #1
// ==========


T=timer()
For LP=0 to Tests
if GetSpriteStatus(1)=true
print "Sprite Exists"
endif
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)

// ==========
// Test #2
// ==========

T=timer()
// If we're checking for TRUE/FALSE return value from
// a function we can drop extra =TRUE
// in these situations. Since GetSpriteStatus(1) will
// return a true/false value anyway.
For LP=0 to Tests
if GetSpriteStatus(1)
print "Sprite Exists"
endif
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)


Sync
loop






        User Functions VS Projected Subroutines





PlayBASIC Code: [Select]
   Tests=10000

Do
cls 0
inc frames

// ===================================
// USer Functions VS Projected Subroutines
// ===================================


// ==========
// Test #1
// ==========


T=timer()
For LP=0 to Tests
result=SomeFunctionCalc(10,lp)
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)



// ==========
// Test #2
// ==========

T=timer()
// Call the Psub function
For LP=0 to Tests
result=SomesubCalc(10,lp)
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)


Sync
loop


Function SomeFunctionCalc(A,B)
A=A*B
EndFunction A

Psub SomeSubCalc(A,B)
A=A*B
EndPsub A






        String Thrashing


PlayBASIC Code: [Select]
   Tests=10000

Do

cls 0
inc frames

// ===================================
// Using MID$() when comparing a Character within a String
// ===================================


TestString$="Hello World"


// ==========
// Test #1
// ==========


T=timer()
For LP=0 to Tests
if Mid$(TestString$,5,1)="a"
print "Character 5 is A"
endif
next
tt1#=tt1#+(timer()-t)
Print "Test #1 Average Time:"+Str$(tt1#/frames)

// ==========
// Test #2
// ==========

T=timer()
// MID() is variation of the MID$(), except it
// returns the ASCII value as integer.
// Integers are faster to compare than strings
For LP=0 to Tests
if Mid(TestString$,5)=asc("a")
print "Character 5 is A"
endif
next
tt2#=tt2#+(timer()-t)
Print "Test #2 Average Time:"+Str$(tt2#/frames)


Sync
loop









        Disc Thrashing

           One cause of performance loss often occurs when we have to read or write large files.  Sadly disc drives are notoriously slow beasts,  so we have to work with them and not against them.    What you might not know, is that because disc devices (and CD/DVD's also) have to physically spin up and seek out the position of the information we're requesting or writing.   There's a latency between each time you try and read or write.  This latency is particularly magnified when we try and read/write small pieces of data.     Each time we ask it to read/write a small packet of data, the device has to local this area on the spinning disc.  This really goes against how these devices are designed, which is they like to read/write big fat chunks of information in one hit.  So knowing this, we can improve our disc performance by working with the devices strengths.  If we have  big file to work with, then rather than nibble away it with BYTE/WORD/Long access,  it's generally more efficient  to load (or save) a bigger chunk in memory and then work with that.
 

           This example shows the worst possible case scenario,  which is reading and writing files BYTE by BYTE.    


PlayBASIC Code: [Select]
   Filename$=CurrentDir$()   +"TempFile.txt"   


// Alloc 100 K sized bank
Bank=NewBank(100*1024)
For lp=0 to size-1
pokeBankByte Bank,lp,lp
next


repeat
Cls rgb(10,20,40)


inc frames


// This test write the Bank contents BYTE by BYTE
t=Timer()
WriteBankAsBytes(Filename$,Bank)
t1#=t1#+(timer()-t)
print "Write Bank As Bytes:"+Str$(t1#/frames)

// This test reads the file into a new bank, byte by byte
// this is where you see just how much this can choke
// your app's disc performance.
t=Timer()
ThisBank=ReadBankAsBytes(Filename$)
t2#=t2#+(timer()-t)
print "Read Bank As Bytes:"+Str$(t2#/frames)
DeleteBank ThisBank


// These versions read/write the same data, but they use
// batch writing. Making them much quicker
t=Timer()
WriteBankAsBlock(Filename$,Bank)
t3#=t3#+(timer()-t)
print "Write Bank As Data Block:"+Str$(t3#/frames)

t=Timer()
ThisBank=ReadBankAsBlock(Filename$)
t4#=t4#+(timer()-t)
print "Read Bank As Block:"+Str$(t4#/frames)
DeleteBank ThisBank





Sync
until Spacekey()


// remove the temp file
DeleteFIle Filename$

end


Function WriteBankAsBytes(Filename$,ThisBank)

If FileExist(filename$) then deletefile Filename$

Fh=WriteNewFile(Filename$)
if Fh
For lp=0 to getbanksize(Thisbank)-1
WriteByte Fh,PeekBankByte(ThisBank,lp)
next
CloseFile FH
endif

EndFunction


Function ReadBankAsBytes(Filename$)

Fh=ReadNewFile(Filename$)
if Fh
Size=FileSize(Filename$)
ThisBank=NewBank(size)
For lp=0 to size-1
POkeBankByte ThisBank,lp,ReadByte(Fh)
next
CloseFile FH
endif

EndFunction ThisBank



Function WriteBankAsBlock(Filename$,ThisBank)
If FileExist(filename$) then deletefile Filename$
Fh=WriteNewFile(Filename$)
if Fh
Address=GEtBankPtr(ThisBank)
WriteMemory fh,Address,Address+GetBankSize(ThisBank)
CloseFile FH
endif
EndFunction


Function ReadBankAsBlock(Filename$)

Fh=ReadNewFile(Filename$)
if Fh
Size=FileSize(Filename$)
ThisBank=NewBank(size)
readMemory Fh,GetBankPtr(ThisBank),Size
CloseFile FH
endif

EndFunction ThisBank





      Examples,
         Load Text Files To String Array
          Buffered file copy (copy data in blocks)

kevin

#2



A Crash Course In Optimization Part #3 - Smarter Drawing Techniques


       In all facets of video game development,  optimizing our programs drawing behavior is one of the most beneficial ways of improving our games overall performance.    Now,  I know what the little voice in back your head is saying, but doesn't my computers super fast video card make optimization redundant ? - Sadly  No.  Like everything in the computer they have a sweet spot.  As such, they're designed to work in a certain way.    While we can simply ignore this and press on regardless,  this will often place more strain upon the drawing devices than need be.  Making our programs slower than they could be.  

       How can I optimized my drawing ?  -   Well that's a good question,  rather than getting into anything thats game or genre specific, we'll just try and cover some simple basic tips that can help us in all types of graphics programming.   There's tips could be categorized as,


      Tips

         * When Not To Clear The Screen
         * Avoid  Rendering Dead Pixels (Drawing nothing sure takes a long time)
         * Changing Display depths can improve performance ?
         * Drawing Everything Real Time ?  
         * Selectively Refresh The Screen (Dirty Rectangles)


        Note:  The objective of optimizing our programs rendering,  should be to improve the programs performance,  without changing the  appearance.    We can generally do this by removing redundant drawing operations.   For example,  if we equated drawing each pixel on the screen as costing 1$ dollar per pixel.    Then we'd want to make sure our program wasn't drawing the same pixels over and over again.  As if we draw over a pixel twice then old one is gone (the user of our program never sees it),  and cost of drawing that pixel has now doubled.    





When Not To Clear The Screen


        Clearing the screen (aka CLS)  is more often than not one of the first commands you'll find our a programs main rendering loop.   But is it really necessarily ?  -  Sometimes yes, but not always.   We only need to clear the screen, if we're not going to be completely filling the screen with some backdrop.    Such as a backdrop picture, gradient or perhaps a map even.   It's in those situations that CLS is just a wasting some rendering time.

        To demonstrate this has effect, the following example simulates the situation where we're drawing a backdrop image and also doing the unnecessary CLS.   Press Space to toggle the CLS on/off between modes.    


PlayBASIC Code: [Select]
   // Create a screen sized image.  This is used to simulate a backdrop
// picture you might use.
GameBackDrop=NewImage(GetScreenWidth(),GetSCreenHeight())

//Fill the BackDrop image with a blue gradient
RendertoImage GameBackDrop
c1=rgb(50,70,200)
c2=rgb(250,70,200)
ShadeBox 0,0,GetScreenWidth(),GetSCreenHeight(),c1,c2,c2,c1
rendertoscreen


ClsEnabled=true


// start of programs main loop
Do

//
if ClsEnabled=true
// Clear the Screen to a bright pinky/orange colour
// We can't see it, since the backdrop is being completely covered
// So the CLS is here for no reason
cls Rgb(200,100,100)
endif

// Draw the gane image as we would in our game.
// Since we're drawing it full screen, and we don't want
// anything to show through it, we draw it solid
Drawimage GameBackDrop,0,0,false

// SHow FPS
Text 10,10,"fps:"+str$(fps())

if ClsEnabled=true
t$="Cls Enabled"
else
t$="Cls Disabled"
endif

Text 10,30,t$

// Hit the spacekey to toggle CLS
if Spacekey()=true
ClsEnabled=1-ClsEnabled
FlushKeys
endif

Sync
loop





        This example,  is an extension of the same subject more or less.  This one draws a gradient backdrop and a some foreground mountains as two screen sized separate images.   Visually, we get a sort of sunset effect sitting behind the scrolling mountain side  (which is just some ellipses in this example).    The thing is,  since the backdrop gradient is none descriptive (no distinctive markings),  what we could do in this case, is combine the gradient and the mountain picture together.  This will effectively 1/2 the amount of  drawing our program is doing and therefore give us a free speed boost.    I should point out, this isn't always possible, but well worth it if you can!

        Press Space to toggle between the two method in this demo

PlayBASIC Code: [Select]
   // =======================================
// Create a Screen Sized Image. This is used to simulate a backdrop
// picture you might use.
// =======================================
GradientBackDropLayer=NewImage(GetScreenWidth(),GetSCreenHeight())

//Fill the BackDrop image with a gradient
RendertoImage GradientBackDropLayer
c1=rgb(50,70,200)
c2=rgb(250,70,200)
ShadeBox 0,0,GetScreenWidth(),GetSCreenHeight(),c1,c1,c2,c2


// =======================================
// Create a second foreground layer
// =======================================
ForegroundLayer =NewImage(GetScreenWidth(),GetSCreenHeight())
RendertoImage ForegroundLayer

// Draw some ellipses to this surface to
Radius=GetSCreenWidth()/3
For Xlp=Radius/2 to GetScreenWidth()-1 step Radius
Ellipsec Xlp,GetScreenHeight(),Radius/2,Radius*1.5,true,rgb(0,255,0)
next


// =======================================
// Create version that is merged together
// =======================================

MergedBackDrop=NewImage(GetScreenWidth(),GetSCreenHeight())
RendertoImage MergedBackDrop
Drawimage GradientBackDropLayer,0,0,false
Drawimage ForegroundLayer,0,0,true
rendertoscreen


//
DRawMergedBackDrop=false

// start of programs main loop
Do

ScrollX=Mod(ScrollX-1,GetScreenWidth())

if DRawMergedBackDrop=false

// Draw the gane image as we would in our game.
// Since we're drawing it full screen, and we don't want
// anything to show through it, we draw it solid
Drawimage GradientBackDropLayer,0,0,false

// Draw the forreground layer over the gradient
Tileimage ForegroundLayer,ScrollX,0,true

else

// Draw the pre merged version of the back drop
// in place of the two seperate images
TileImage MergedBackDrop,ScrollX,0,false

Endif


// SHow FPS
Text 10,10,"fps:"+str$(fps())

if DRawMergedBackDrop=false
t$="Drawing Backdrops Layers Separately"
else
t$="Drawing Merged version"
endif

Text 10,30,t$

// Hit the spacekey to toggle CLS
if Spacekey()=true
DRawMergedBackDrop=1-DRawMergedBackDrop
FlushKeys
endif

Sync
loop








Avoid  Rendering Dead Pixels


       What's a dead pixel ? - These are pixels in the image that we're drawing but are not visible to the end user.



     Solid VS Transparent rendering


PlayBASIC Code: [Select]
   // Create a blank image 100*100 pixels in size
BlankImage =NewIMage(100,100)

TransparentRender =false

// start of programs main loop
Do

// Clear the Screen
Cls rgb (100,150,200)

// Draw the Blank image to screen 25 times
For lp=1 to 25
Xpos=150+(lp*20)
Ypos=100+(lp*15)
Xpos=Xpos-GetIMageWidth(BlankImage)/2
Ypos=Ypos-GetIMageHeight(BlankImage)/2
DrawImage BlankImage,Xpos,Ypos,TransparentRender
next


// SHow FPS
Text 10,10,"fps:"+str$(fps())

if TransparentRender=False
t$="Drawing Solid Images"
else
t$="Drawing Transparent Images"
endif

Text 10,30,t$

// Hit the space key to toggle between transparent and solid rendering
if Spacekey()=true
TransparentRender=1-TransparentRender
FlushKeys
endif

Sync
loop





        If you test (cut and paste the code into PlayBASIC) you'll possibly see something you didn't expect.   While your intuition might be telling you that since we can't see anything being drawn, when rendering transparent pixels, then this isn't eating up our computing time.   If that was the case, then it'd be much faster when rendering an all transparent image.   However, the reality is quite different.  It's actually a fraction slower.  How much slower really depends on the system here.  

         Armed with little bit of knowledge, the next question is how can we best take advantage of it ? - Well, there's a few situations where we can optimize our images to smooth out our games rendering performance.   Such as trimming backdrop images of unnecessary transparent sections and full screen  HUD overlays for starters.   But one place our games can eat up a lot of processing power is when we draw our characters to the screen.    In particular character animations.    

         Why ? - Well animations are often drawn and laid out upon sprite sheets so that all frames are the same size.  While this might make loading the animation relatively straight forward,  it does mean that we're potentially going to have animation frames that are all the same size (width/height)  regardless of the what's in each frame.

         If you imagine a running animation for a second,  then hopefully it's easy to visualize that certain frames during the animation are going to wider or higher than others.  Such as when the character is fully extended for example.   Moreover if we imagine an explosion animation,  where the debris starts out centralized and then radiates out.   Then in such explosion animations, the initial frames will generally be smaller than the later ones.  

         OK,  you're probably falling asleep by now wondering why this is an issue.  Well, It relates back to our discovery above about how rendering transparent pixels costs the same (or more) than drawing a visible pixel.     So if all of our animation frames have extra unused (transparent) space around the visible graphics porttion, then that space is not only costing us a little bit of render performance, but it's also wasting memory.    


  Example

         Lets have a look,

PlayBASIC Code: [Select]
   // Create two identical images.  The only difference is that
// one is larger than the other. The larger (oversized) one
// is bit slower to render.
OverSizedImage =MakeBallImage(256,90)
TrimmedImage =MakeBallImage(180,90)

CurrentImage=OverSizedImage

//
DrawTrimmedIMages=false

// start of programs main loop
Do

// Clear the Screen
Cls rgb (30,40,50)

// Draw the current image to screen 25 times
For lp=1 to 25
Xpos=200+(lp*15)
Ypos=200+(lp*10)
Xpos=Xpos-GetIMageWidth(CurrentImage)/2
Ypos=Ypos-GetIMageHeight(CurrentImage)/2
DRawimage CurrentImage,Xpos,Ypos,true
next


// SHow FPS
Text 10,10,"fps:"+str$(fps())

if DrawTrimmedIMages=False
t$="Drawing Oversized Images"
CurrentImage=OverSizedImage
else
t$="Drawing Trimmed Images"
CurrentImage=trimmedImage
endif

Text 10,30,t$

// Hit the space key to toggle between trimmed and oversize images
if Spacekey()=true
DrawTrimmedIMages=1-DrawTrimmedIMages
FlushKeys
endif

Sync
loop


Function MakeBallImage(FrameSize,Radius)
Index=NewImage(FrameSize,FrameSize)
Rendertoimage Index
For lp=Radius to 1 step -1
Circlec frameSize/2,frameSize/2,lp,true,rgb(100,200-lp,255-(lp*2))
next
rendertoscreen
EndFunction index







  Example #2 - Fixed Size Vs Trimmed Animations Size


          To demonstrate, in this example we're creating two animation sequences. One is trimmed of any dead (transparent) pixels the other isn't.   We're testing the  performance difference by rendering these animations on screen, each in a different stage of it's animation sequence.   You can probably surmise what's going to happen.     The one where we're showing fixed size animation frames will get progressively  slower (the more you have the slower it gets) than the one with the trimmed animations.   So trimming helps us balance the load better.

         
PlayBASIC Code: [Select]
   FrameCount=25
FrameSize=150

// Make two Expanding Ball Type Animations.
Dim FixedSizeImages(FrameCount*2)
Dim TrimmedImages(FrameCount*2)
MakeArray Animation()

Radius=70
For lp=0 to frameCount
Radius2=Radius-(lp*2.5)

// Make the Fixed size version
img=MakeBallImage(FrameSize,Radius2,rgb(00,255,lp))
FixedSizeImages(lp)=img
FixedSizeImages(FrameCount*2-lp)=img

// Make a 'rough' trimmed version anim (in a different colour)
img=MakeBallImage((Radius2*2)-2,Radius2,rgb(255,155,lp))
TrimmedImages(lp)=img
TrimmedImages(FrameCount*2-lp)=img
next



// Set toggle which animation is being viewed
DrawTrimmedImages=false

// start of programs main loop
Do

// Clear the Screen
Cls rgb (30,50,70)


if DrawTrimmedImages=False
t$="Fixed Size Animations"
SetArray Animation(),GetArray(FixedSizeImages())
else
t$="Trimmed Animations"
SetArray Animation(),GetArray(TrimmedImages())
endif


FrameIndeX=Mod(FrameIndex+1,FrameCount*2)

// Draw a screen full of animations
Xpos=100
Ypos=50
For lp=1 to 200
ThisFrame=Mod(FrameIndex+lp,FrameCount*2)
CurrentImage=Animation(Thisframe)
Xpos=Xpos+20
if Xpos>700
Xpos=100
Ypos=Ypos+100
endif
X=Xpos-GetIMageWidth(CurrentImage)/2
Y=Ypos-GetIMageHeight(CurrentImage)/2
DRawimage CurrentImage,X,Y,true
next

// DRaw the anim over the scene again.
X=100-GetIMageWidth(CurrentImage)/2
Y=200-GetIMageHeight(CurrentImage)/2
DRawimage CurrentImage,X,Y,false
box x,y,x+GetIMageWidth(CurrentImage),Y+GetIMageHeight(CurrentImage),false

// SHow FPS
Text 10,10,"fps:"+str$(fps())
Text 10,30,t$

// Hit the space key to toggle between trimmed and fixed size versions
if Spacekey()=true
DrawTrimmedImages=1-DrawTrimmedImages
FlushKeys
endif

Sync
loop



Function MakeBallImage(FrameSize,Radius,Colour)
Index=NewImage(FrameSize,FrameSize)
Rendertoimage Index
For lp=Radius to 1 step -1
Scale#=(Radius-lp)/Float(Radius)
Circlec frameSize/2,frameSize/2,lp,true,RgbFade(Colour,Scale#*100)
next
rendertoscreen
EndFunction index








Changing Display depths can improve performance ?



       While you might not think of it, the screen resolution (games display dimensions width & height) and the number of colours play a big part in final performance of our game.   This is particularly important for those wanting to design programs that still perform well on older computers.   While most of us take for granted that modern machines are capable of throwing high resolution images around the display fairly easily,  the same can not be said for older systems.   Which is clearly evident when we compare games from 2008 with games from 2004, or back in 2000, or 1980 for example.


        The basic rule here is that the higher the resolution,  the more pixels on screen, the  more pixels on screen, the more power required to fill the screen with images at a reasonable speed.

        For example,

           If we use a screen size of  1048x * 768y with  32bit colour.   That's  (1024*768*4)=3,145,728 bytes for the screen.  Yes, 3 meg !
 
           Now lets say you CLS the screen and draw one layer of  map to it.   That means your GPU is roughly shifting 3 meg for the clear, and another 3 meg for the map render.   So to draw the image costs approximately 6 meg of bandwidth per frame.

          Now as we previously discovered,   Every single pixel that is rendered (solid,  transparent (mask colour) or translucent )  costs you performance! While Modern GPU's have a much higher fill rate (they can draw more pixels per second), older ones simply can't cope with too much graphics data.  So we'll need a way to reduce the work load, if we hope to get reasonable performance on those systems.


          We can reduce the work load in number of two ways.

            1) Give the user the option of lowering the display depth from 24bit/32bit  down to 16bit.   This halves the amount of graphics data that has to be shifted, the trade of is that visually we lose some colour quality as well.    It's also worth noting the many older GPU's were optimized for 16bit display modes over 24bit or 32bit modes.    Really old systems often don't have 32bit mode at all.   So on those systems there's no option but 16bit.


           2) Change to smaller screen size.  However, this is not always convenient when making 2D games. Since most of the artwork is drawn specifically for a certain display resolution.  It is possible to scale the graphics media proportionally from within your program.   However this will add a lot of extra processing work, and is unlikely to be very efficient or visually attractive.     But possible if you really wanted.   This approach is far easier in the 3D world.

          3)  Try to minimize the the amount of the wasted drawing the program might be doing.   (The stuff mentioned in this very tutorial)

          4)  If you're doing things like rotation/image scaling then reducing the quality (the images dimensions) of the image can be very beneficial on old systems.   Might not look as good, but this can keep the frame rate higher on those older systems.  

          5)  Don't worry about supporting old clunkers anymore..  Optimizing is your choice, so you don't have to go out of your way support old systems.  



   Example

       What example demonstrates how much the resolution can effect the performance of our program.   It does this by rendering a set of boxes to a collection of the 'screen' sized images. You can cycle through the images and monitor the programs performance().     The boxes are drawn proportionally, so the scene looks the same regardless of the size.  Moreover, It also displays the number of pixels drawn at each resolution.    The bigger the display gets the more pixels that are being drawn, and therefore the slower the demo gets.


PlayBASIC Code: [Select]
   MaxCords=50

// Create an array to hold
Dim BoxCords(MaxCords,5)

For lp=0 to MaxCords
BoxCords(lp,1)=rnd(100)
BoxCords(lp,2)=rnd(100)
BoxCords(lp,3)=rnd(100)
BoxCords(lp,4)=rnd(100)
BoxCords(lp,5)=rndrgb()
next

Dim Screens(5)
Screens(1)=NewIMage(320,240)
Screens(2)=NewIMage(400,300)
Screens(3)=NewIMage(640,480)
Screens(4)=NewIMage(800,600)
Screens(5)=NewIMage(1024,768)


CurrentScreen=1

// start of programs main loop
Do

// Clear the Screen
Cls rgb (30,50,70)

ThisImage=Screens(CurrentScreen)

PixelsDrawn=DrawScene(ThisImage)

Xpos=GetScreenWidth()/2-GetIMageWidth(ThisImage)/2
Ypos=GetSCreenHeight()/2-GetIMageHeight(ThisImage)/2
DRawimage ThisImage,Xpos,Ypos,false


w=GetimageWidth(ThisImage)
h=GetimageHeight(ThisImage)
t$="Current Screen Size:"+str$(w)+"*"+str$(h)


// SHow FPS
Text 10,10,"fps:"+str$(fps())
Text 10,30,t$
Text 10,50,"Pixels Drawn:"+str$(PixelsDrawn)

// Hit the space key to toggle between trimmed and fixed size versions
if Spacekey()=true
CurrentScreen =CurrentScreen+1
if CurrentScreen>GetArrayElements(Screens(),1) then CurrentScreen=1
FlushKeys
endif

Sync
loop




Function DrawScene(ThisImage)

rendertoimage Thisimage
w=GetIMageWidth(ThisImage)
h=GetIMageHeight(ThisImage)

ScaleX#=w/100.0
ScaleY#=h/100.0

c1=rgb(200,100,10)
c2=rgb(200,100,150)
ShadeBox 0,0,w,h,c1,c1,c2,c2
For lp=0 to GetArrayElements(BoxCords(),1)
x1=BoxCords(lp,1)*Scalex#
Y1=BoxCords(lp,2)*Scaley#
x2=BoxCords(lp,3)*Scalex#
y2=BoxCords(lp,4)*Scaley#
swapifhigher x1,x2
swapifhigher y1,y2
boxc x1,y1,x2,y2,true,BoxCords(lp,5)
PixelsDrawn=PixelsDrawn+((x2-x1)*(y2-y1))
next
rendertoscreen
EndFunction PixelsDrawn









Drawing Everything Real Time


      Initially when we first start getting into game programming,  one of the most common misconceptions, well, mistakes people make is they assume 'everything' needs to be drawn every update.  While this is sometimes true, often we can selectively update  and player won't really notice the difference.   However, this does mean that it's not quite as cut and dry as just drawing everything all the time.   But it's not rocket science either.


      One the most common opportunities for selectively updating occurs in things refresh backdrop animations and foreground overlaps.


      Scores and Health Bars

        In this example all were doing is drawing a simple score and health bar bellow.  It's the sort of thing we'd just slap into our game and forget, but we can actually improve the performance by selectively updating it.  The same approach will work for all things HUD related in fact.  So If  it's not changing, don't render it!


PlayBASIC Code: [Select]
   LoadFont "Courier",1,24,0

ScoreOverLayImage=NewImage(204,50)


// start of programs main loop
Do

// Clear the Screen
Cls rgb (30,50,70)

scw=GetScreenWidth()/2
sch=GetScreenHeight()*0.4

if timer()>NextUpdate
NextUpdate=timer()+100
Score=Score+rnd(100)
Health=mod(health+1,100)
endif

if DrawMode=0
// DRaw the score and health directive to the screen, each update
DrawScoreAndHealth(scw,sch,Score,Health)
endif


if DrawMode=1

// Selectively cache the score& health to an image (since image rendering is quicker)

RefreshScore=false
if OldScore<>Score or OldHealth<>Health
; either the score or health have been changed, so we neeed to
; the score image cache
OldScore=Score
OldHealth=Health
RefreshScore=true
endif


if RefreshScore=true
rendertoimage ScoreOverLayImage
cls 0
DrawScoreAndHealth(101,0,Score,Health)
rendertoscreen
endif
drawimage ScoreOverLayImage,scw-101,sch,true

endif


if Drawmode=0
t$="Drawing to the screen"
else
t$="Selectively updating"
endif

// SHow FPS
Text 10,10,"fps:"+str$(fps())
Text 10,30,t$

// Hit the space key to toggle between draw mode
if Spacekey()=true
DrawMode =1-DrawMode
FlushKeys
endif

Sync
loop



Function DrawScoreAndHealth(Xpos,Ypos,Score,Health)

s$="Score:"+Digits$(Score,6)
CenterText Xpos,Ypos,s$

th=GetTextHeight(s$)
x1=xpos-102
x2=xpos+102

y1=ypos+th+5
y2=y1+th

c=rgb(20,30,40)
boxc x1,y1,x2,y2,true,c
c1=rgb(220,30,40)
c2=rgb(20,40,240)
ShadeBox x1+2,y1+2,x1+(Health*2),y2-2,c1,c2,c2,c1
EndFunction









Selectively Refreshing The Screen (Dirty Rectangles)



      Often when we start writing  games,   it's common to see programmers just throwing everything at the graphics engine/hardware and let that deal with it.   While this may  work fine for some games on some computers,  but not others.   The reason for this is that graphics engines are generally designed to just do what you tell it to do.  if you tell it draw the same image/circle/pixel 100 times,  then that's what it'll do.   It doesn't know what you're trying to achieve, or that you might be drawing something that isn't even on screen.  

      Selective refreshing is conceptually simple.   Rather than just shoveling everything on screen every frame,  we going to pay closer attention to what we're drawing and where.    In this particular example, we're simply drawing some sprites (with alpha channel) moving around on a static backdrop.    If we take the brute force approach,  we've have a FX image the size of the screen.  Copy the backdrop to this image,  draw all the sprites over it, then copy the screen image to actual screen.     This will work,  but do we really need to refresh the entire backdrop every frame if the backdrop is static picture ?   Nope.  We only need to refresh the parts of the backdrop that are being overwritten.  

       There's a few ways of doing this,  in this example we're simply going to treat the backdrop picture as if it was Tile Map.  Using an array to signify when a tile (of rectangular portion of the backdrop image) needs to be refreshed to it's original state.   During our sprite movement routines,  we not only move the sprite, but we work out what tiles it's going to overwrite when it's drawn.  These backdrop tiles are then flagged as needing to be refreshed when the backdrop is being restored.    

      So each loop we're basically doing this


      Do
          * Refresh Backdrop (redraws any tile that was flagged as being overdrawn last refresh.  this restores the backdrop to it's original state)
          For TileY = 0 to TilesDown
            For TileX = 0 to TilesAcross
                  if TileRefresh(xlp,ylp)
                           ; redraw this portion of the backdrop,  since it was previously covered by a sprite
                  endif
            next
        next

          For each character in game
                  .. Move character (AI , physics etc)
                  Get characters final screen coordinates and calc bounding rectangle
                  convert the  sprites screen coords to array coords   and fill this rectangle of tiles in the array with
          next
          refresh the screen
      Loop


           
         What this lets us do, is we're effectively removing the cost of the redrawing the full backdrop each update.   That is assuming we dont have a full screen of sprites covering the backdrop, in that case it'll all get redrawn regardless.  

         See example: Selective Tile Map Refreshing


         See Example #2 Dirty Rectangles  (Combined Video & Fx Rendering)



kevin

#3



A Crash Course In Optimization Part #4 -  Smarter Program Logic


    Becoming a good programmer (games or otherwise), goes far beyond the graphical mumbo jumbo of the day.  What do I mean by that ? -  Well,  it doesn't matter what display technology/graphics libraries you use, if your programming & problem solving abilities aren't up to it.  Then you'll most likely end up with some pretty visuals on screen, but be unable to actually get the game working.  Which is incredibly common !

    So here, we'll try and address some  these stumbling blocks new/old programmers are likely to run into.  Hopefully this will help improve your algorithmic knowledge (methods) and code design principals.  Which  might just help you shake the brass monkey!


  There's tips could be categorized as,

       Design Tips

          * Clearer Code means faster development


       Algorithm Tips

          * Spacial Partitioning (Collision)



     Note: Remember, improving our programming design and algorithmic knowledge(methods)  has a two fold effect.  While better method/algorithms will certainly help optimize our programs performance, the bigger gain can come improved code design/structure.  Which will allow us to create larger more complex programs, with less errors at much faster rate.






Clearer Code means faster development


       Optimization is the pursuit of efficient design.   Conceptually this goes far beyond just improving the execution speed of a piece of program code.    By learning to write clearer more structured code to begin with, we can optimize our very development and bug test progresses.   Allowing us to develop larger more complex  programs with less faults in less time.    How do we go about this ?   

       The simple answer is that we need apply some structure,  some rules to our own programming style.   Thus forcing ourselves to think before acting.  While this might not be the most intuitive thing to do, in particular for new programmers,  all too often badly/ugly code is the result of an impulsive blast at the keyboard.

       Just what is bad code ? -  While highly subjective, we'll classify a bad or ugly code fragment as being something that fails to efficiently convey it's  meaning to the reader in a well laid out manner.  The key point here is that  even though it might 'execute correctly', if it's not written in legible fashion,  then  this can make the code difficult to debug, very difficult to change and almost impossible to reuse.   

        So in no particular order this means,

          * Use meaningful Variable, Array, Types & Function Names throughout our program.

                 Using meaningful names (of variables/functions/arrays etc) helps with the programs readability dramatically.   As the author of a program, we can sometimes be forgiven for using names that don't really related to what they hold or do.   This might not be a huge issue while we're developing a section of program code, as while we're writing it, what these variables/functions name represent will be fresh in our memory.    It's when we take a break from programming, or show the code to somebody else.    It's here that  such decisions become more of a liability than anything else.   Why ? - Well, beside from being error prone,  it means that in order to understand what a section of code is doing, we have to re-learn it from scratch every time.  The bigger the fragment, the more difficult and time consuming this becomes. 


          * Comment your code. 

                    Comments might be used to describe the approached  a section of code is using to solve a particular problem, outline any known limitations of the code fragment,  code separators,  to do lists anything really.    So comments can help us future proof our own programs. 


          * Use the appropriate control structures for the algorithm/logic we're  implementing.


          * Use Functions & Subs to divide and conquer.   





       Related articles

           Planning Your Project
           Thoughts on Planning / Code Design
           Choosing The Correct Variables

       Wiki articles

          Given the size of the this particular subject, we can't possibly cover it in a few paragraphs, so feel free to could out some out.

           http://en.wikipedia.org/wiki/Structured_programming
           http://en.wikipedia.org/wiki/Procedural_programming





Spacial Partitioning


       One increasingly common problem in game programming, relates to the efficient detection of the collisions between the various characters in our game world.    Effectively the more characters we have, the more collisions possibilities, and the more checks, the slower the program will operate.   

       Imagine this scenario,  imagine we have a room full of people all walking around the room randomly trying not to bump into each other.    So our programs AI  will be required to predict  collisions  between the various people, so the characters can take evasive action.  Which just means that each time a person moves, we'll have to make sure the person isn't blindly moving into another.

        One way of doing this, is by checking each person against every other person in the room.      So if there's 10 people.   Person #1 is compared with persons 2,3,4,5,6,7,8,9,10.   Person #2 is compared with persons 1,3,4,5,6,7,8,9,10  and so on for all 10 people.     While this approach will work, it's doesn't scale up very well, which is it's main flaw.   For 10 people we're only making (9*9)=81 collision checks.  Which isn't too bad.  But what if the room has 100 or 1000 people in it ?    It's here  the cracks start appearing..  For 100 characters we're performing (99*99)=9801 checks,  for 1000 characters it's (999*999)=998,001 checks.    If we think about, by comparing each person to every other person, we'll going to end up making lots of redundant collision checks.    Meaning lots of wasted processing time and ultimately a slower running game.

         So what's the alternative  if comparing everything to everything is too much work for our game ? - Well,   we need to start thinking of ways to reduce the number of necessary collision checks.   This can be tackled in a while bunch of  ways,   But for now we'll look into some basic Spatial Partitioning

         Spatial partitioning (as the name suggests) is where our program keeps track of the region or zone each character is currently within.   So when it comes to detecting collisions between the characters, we only need to the compare our current character,  with just the characters that share the same zone.     This simple design change, will dramatically reduce the collision overhead.   

          If we use the example from above again,  but this time lets imagine we have an overhead view of  room with 1000 people randomly positioned on screen.   This time,  rather than comparing each person, with every other person,  we'll first run though our list people are assign them one of two zones.    Characters standing in the top half of the display will be in zone A, characters standing in the bottom part of the display will be in zone B.    Characters that are standing in the middle of the screen, will be placed in both zones A & B.      So for example sake, lets say that Zone A has 550 people in that half of the screen and Zone B has the remaining 450 people.    This will give us an approximate collision test count of (550*550) + (450*450) = 505,000 checks.  So we've halved the work load by simply splitting up the screen into two zones.   What would happen if we split it into four zones,  eight zones ?  Simple, we get lower collision  check count. 

    This example uses this very approach (vertically).

    See Partition (inter object) Shape Collision