PlayBASIC V1.64N2 / V164N3 (Work In Progress) Gallery

kevin · April 22, 2012, 12:07:12 AM

PlayBASIC V1.64N2 & V1.64N3 (Work In Progress) Gallery

This thread will document the changes being made for the PlayBASIC 1.64 revision N2 and revision N3 (Last half of thread). Ideally this update will address a few issues in the V1.64N release, with probably some more tweaks thrown in for good measure.

For Newer upgrade work in progress see,

See PlayBASIC V1.64P (login required) - Convert PlayBASIC To Machine Code Dll's

See PlayBASIC V1.64O

For older upgrade work in progress see,

See PlayBASIC V1.64N

See PlayBASIC V1.64M

See PlayBASIC V1.64L Learning Edition History (Building learning Edition Thread)

See PlayBASIC V1.64L

PlayBASIC V1.64N2 Beta #1 - Sprites Collision and evils branch prediction

While roaming the hill sides of the PB engine a few weeks back, noticed something of an oddity in regards to sprite collision, in particular when used in scenes of say 500->1000 sprites. Yeah, I guess that's a fairly pie in the sky sprite count given your average PB has has less than 25 sprites on screen, but none the less, if we can make a dent on the bottle neck then any improvement will flow through to more real world situations.

In the test code, I'm creating a 1000 sprites within a 800 * 600 range of each other and comparing each sprite to all 1000 of them,making a 1,000,000 brute force potential impacts. What's interesting is that in this situation, it's actually quicker to not use collision classes at all, by about 3 times in fact.

When sprite hit runs through the set, it's just AND'ing the your desired class with each sprites class , if no bits are shared, we skip it.

Which might look a bit like this in PB code

Code Select



    MyMASK = %11110000

    For ThisSprite =1 to  NumberOfSprites
         if GetSpriteCollisionClass(ThisSprite)   and MyMASK

              ; Here we'd continue to do the actually impact
              
        endif
    next

So what we're saying is that when the IF / ENDIF statement is always true, then the loop runs much quicker (3 times faster) than when the we selective target impacts, when say only 10% of the cases were true. Doesn't make sense ?, yeah... had me stumped for a while, but this has all the hall marks of the branch predictor failing us. Which I'm not too sure we can avoid at this point. I've been trying to rearrange the logic to avoid potential stalls, but the moment there's a comparison in the loop it dies in such circumstances.

Now given the situation is pretty pie in the sky, it's not really something i'm worried about, but it does bother me :)

Attached is pic showing the scene i'm testing.. a bit over the top :)

ATLUS · April 22, 2012, 08:09:13 AM

one thousand sprite, good work, and how many fps?

kevin · April 22, 2012, 08:58:48 PM

about 30

kevin · April 22, 2012, 09:33:39 PM

PlayBASIC V1.64N2 Beta #2 - Compare Operators

A by product of the previous test has lead to some improvements to the compare operator opcodes in the PB VM. Compares aren't a heavily used operator throughout a program, but they generally appear in high repetition loops, so we can win back some free performance by customizing the opcodes to better handle the most general situation, which is INTEGER compares. It'll still handles mixed INT/FLOAT types, but the reshuffle means it's slightly biased towards integer operations.

In the shots bellow we're running a bunch of for/next loops each performing 10,000 iterations. Each test is comparing the various comparison types, so we've a loop for EQUALS, NOT EQUAL, LESS THAN, GREATER THAN, LESS THAN EQUALS and GREATER THAN EQUALS. Some loops are just doing one compare, others are comparing all 4 combination.. ie. INT=INT, INT=FLT, FLT=INT, FLT-FLT

The compiler can also detect some assignment redundancy in expressions, which can remove the odd bogus move operator that could appear in the outputed code. The combination makes the bench mark about twice as fast. May have also found a few legacy opcodes no longer in use. It's hard to tell really, since the output opcode is often calculated (merged) during code generation, rather than a nice an obvious write function. So if you get 'messages during BETA testing' then I wanna know about it.

PlayBASIC V1.64N2 Beta #3 - FOR / NEXT Statements

While looking through some user code recently, noticed a few usages of STEP 1 in FOR/NEXT loops. The run time actually has two looping methods built in, one for the default loop iteration of one, and another for a user defined step value. This is necessary as the user defined step requires another variable read from memory every iteration. Fetching from memory is unavoidable when the step is some custom value, but when it's one, it's just overhead for the sake of overhead. So the parser now automatically detects this situation and defaults back to the standard +1 looping form.
So this,

Code Select


      For lp =0 to Max Step 1
           Count++
      next

is actually compiled as this now.

Code Select


      For lp =0 to Max
           Count++
      next

kevin · April 24, 2012, 02:18:03 AM

PlayBASIC V1.64N2 Beta #4 - Squeeeeezzzzzing out some extra Loop performance

At this stage, optimizing any of the core operations in the PBV1.64 VM, is how you'd imagine those formula one engineers must feel squeezing that extra millisecond out of their latest F1 car. There was a point early on, where making the runtime faster was often pretty easy, but as time drags, the big impact gains are few and far between. More often than not now, new gains are made not through changing the runtime, but tweaking the compiler to detect and gracefully replace particular 'dodgy' bits of logic or functions. Another way gains can be made, is via tweaking the layout of the opcodes being produced based upon the most common circumstances users apply it, which is how the performance gains in the comparison opcodes are made above.

Hopefully by now, you'll appreciate that Looping and decisions go hand in hand, so any performance we win back has a positive impact upon end user programs. The comparison gains alone gave us some extra zip to our Repeat/Until While/EndWhile loops, in one test making a While loop 4 milliseconds faster. But the actual operation wasn't any different than in V1.64N and bellow. But last night, i've been looking over the looping opcodes (again) for alternative methods of structuring the opcode data, trying to avoid fetching anything that may or may not be required. It turns out, that by pre-processing some the opcodes data, some work can be done at compile time, rather than an runtime. With some interesting results..

Bellow we're looking yet another thrilling standard bench mark test between V1.64N and V1.64N2 Beta 4, the test is just timing a bunch of looping constructs from FOR/NEXT, REPEATS / WHILE and DECLOOP...same old same old.. So the V1.64N release can run the test and maintain a 48fps refresh, which is quickest of all of the classic versions of the PlayBASIC, well... until now.. So running the same test through V1.64N2 beta 4 and we're able to maintain a 60fps rate. Now we should point out that the real benefits of this test are recent changes are the conditional repeat/while statements. There's perhaps a tiny gain in FOR/NEXT opcode, but nothing to write home about.

Comparing today's runtime performance using the standard test, against older editions we get a slight speed advantage, where the V1.64N2 is running the FULL standard test in 3.7->3.8 seconds, about point 1->2 of a second quicker than V1.64N, which was one of the first editions to cross under the 4 second barrier. For those who might not know, the standard test is a program that measures the performance of the all the common low level operations in the PlayBASIC runtime. It was originally written way... back in 2004 and i've a collection of programs built from various Pb revisions across that time period. It's interesting looking back on how long it takes those older versions, like the V1.089 revision takes almost 9 seconds to run same code. It's kind of like looking back on fossils where you see large jumps in the performance to some structural change that was made.

Anyway, we'll keep chipping away.. I think there might be a way to cache some structures accesses across expressions and perhaps groups of lines.. But we'll see.

monkeybot · April 25, 2012, 04:36:47 PM

keep on chippin'

kevin · April 26, 2012, 12:05:55 AM

PlayBASIC V1.64N2 Beta #5 - Select / Case

Some days you win some performance back, some days you lose some. Yesterday was that latter, was trying some rearrangements of the opcodes that handle IF / THEN / IF ELSE ENDIF / SELECT statements. Prior to making the changes, based on the previous sessions results, I would happily bet that they'll be quicker also.. But.. nope, slower.. Spent most of the afternoon trying to get the win, but ended up reverting to the previous version. Which happens.. an awful lot..

Not a complete loss though as the implementation of Select / Case statement has a number of fatty bits wrapped around the edges. Surprisingly when brute forcing the Select/Case against IF/THEN blocks, a raw IF/THEN was quicker over 25,000 calls of 10 options.

This type of decision block thing

Code Select



     If A =0 Then DoSomething0
     If A =1 Then DoSomething1
     If A =2 Then DoSomething2
     If A =3 Then DoSomething3
     If A =4 Then DoSomething4
     If A =5 Then DoSomething5
     If A =6 Then DoSomething6
     If A =7 Then DoSomething7
     If A =8 Then DoSomething8
     If A =9 Then DoSomething9


 vs


     Select A

           case 0 : DoSomething0
           case 1 : DoSomething1
           case 2 : DoSomething2
           case 3 : DoSomething3
           case 4 : DoSomething4
           case 5 : DoSomething5
           case 6 : DoSomething6
           case 7 : DoSomething7
           case 8 : DoSomething8
           case 9 : DoSomething9

    endselect

In PB1.64N the test was showing a 2 millisecond bias towards the IF/THEN version. Which can attributed to that opcodes precedence in the runtime. In other words just calling one is faster than the other, let alone what they actually do. When looking through the select opode though there did seem to be a few places where the opcode could return earlier, so have plugged some leaks. Which resulted in the two sides performing almost evenly. Clearly the IF/THEN statement block above gives a FIXED execution time. Where SELECT can exit the structure when a case is met. The sooner that case is to the top the quicker.

So in the above example if you set A to 9, then they'd preform about the same, if A is 5, then the select block is approaching twice a quick and so on. Now you can shuffle the IF/THEN logic to make the an almost identical logical structure, but it's a just a little extra work like bellow.

Code Select



     If A =0 Then DoSomething0 : Goto Done
     If A =1 Then DoSomething1 : Goto Done
     If A =2 Then DoSomething2 : Goto Done
     If A =3 Then DoSomething3 : Goto Done
     If A =4 Then DoSomething4 : Goto Done
     If A =5 Then DoSomething5 : Goto Done
     If A =6 Then DoSomething6 : Goto Done
     If A =7 Then DoSomething7 : Goto Done
     If A =8 Then DoSomething8 : Goto Done
     If A =9 Then DoSomething9 
Done:

You can take type of tweak even further by moving to JUMP Table are per ON VARIABLE GOTO, but they don't suit every thing. Ideally it'd be best if the compiler could recognized the most optimal implementation that best fits the case data. But, unless your falling through this type of structure 1000's of times each frame, then it's not worth worrying about today.

Type Caching

Because ARRAYS are dynamic in BASIC, then every time you pull or write a data to an array, the runtime has a bunch of overhead to chew through to just work out the location of said data in memory. So the fetch cost is really a pretty significant drain upon performance, where the more accesses we have the more overhead we introduce. So obviously caching values in Variables where possible is a better approach. Variables are comparable to CPU registers in the VM. Everything internally is biased toward them, given they're the currency of the instruction set.

One idea i've been mulling over for a while now would approximate some long term redundancy style opt's you see in low level languages like C/C++, which tend to solve accesses to a pointer and carry that pointer forward until it has to be recalculated. Ideally if we pre-processed the source code, we could make a program reorder the user code into the more efficient sequences of instructions.

Ie.

Code Select



  '      If we had some code like this

      Dim Table(100)

      For lp =0 to 100
             Table(lp)=Table(lp)+Table(lp)
      next

  ' could be this..

      Dim Table(100)

      For lp =0 to 100
             Temp=Table(lp)
             Table(lp)=Temp+Temp
      next

Now you'll notice there's multiple accesses of the same cell in the array in the first snippet. So an easy opt would be pull the value once in an variable , double it there and then write it back. A pre/post processor pass could actually do this for us as it'd take a much broader view of the code, the compiler can't as it's not aware of how these accesses might stack up over time. So any time there's a query, it drops the code to get/set the value from the array.

But, what I think might possible would be introducing an alternative type of Get/SET process for subsequent accesses. It wouldn't be as optimal as rearranging the expression tends to be, but I suspect it'd make a happy middle ground reducing the overhead of heavy structure accesses, effectly giving back some free speed for what minimal effect all round.

kevin · April 27, 2012, 12:32:07 AM

PlayBASIC V1.64N2 Beta #6 - Type Caching In math Short cuts

Since yesterday I've been looking over how types are accessed, this is a very old part of the vm instruction set and one where's not much room to move, so I've had to add some secondary lower priority instructions to help out with the caching. Firstly, caching can only be applied to ordered accesses of the same typed structure. So it can't pre-compute part the cell addresses of a number types and hold them that for long periods of time, it's just the most recent access.

Starting simple, i've been testing the logic upon typed variable access inside the math short cuts operators rather than even attempting to let this run while on anything bigger. Depending upon the operation and the target, the short cuts generally output much the same code as the unrolled expression, not always, but generally they do. When used on arrays/types the code generator spits out code that basically reads the array, does the math operations and drops the result back into the array. Since the source and destination cells are known to be the same, we can short cut this by allowing the read as normal, then spiting out a cached write rather than a flow blown write.

Bellow we've some test code and results showing the proof of concept works. The code just runs batches of additions on 3 fields in a type, 25,000 times.

Code Select


		type Vector3D
				x#,y#,z#
		EndType

		type Stuff
		
				Name$
				X
				y
				z#
				Table(100)
				V3 as Vector3D

		EndType

	Dim Me as stuff
	Dim test(10) as stuff
	Dim cool as stuff list

	me = new stuff
	
	Test(1) = new stuff
	
	
	cool = new stuff
	
	me.v3 =0  

	max=25000
	
Do


		cls	
		frames++

		tests=0
		t=timer()
		for lp =0 to max	
			me.X+=1
			me.y+=2
			me.z+=3	
		next
		tt1#+=timer()-t	
		print tt1#/frames	
		print Me.x
		print Me.y
		print Me.z
		Me.x=0
		Me.y=0
		Me.z=0
		tests++


		t=timer()
		for lp =0 to max			
			me.X=me.X+1
			me.y=me.y+2
			me.z=me.z+3
		next
		tt2#+=timer()-t	
		print tt2#/frames	
		print Me.x
		print Me.y
		print Me.z
		Me.x=0
		Me.y=0
		Me.z=0
		tests++


		print "BandWidth:"+Str$(max*24*Tests)	
		print "FPS:"+Str$(fps())	
			
	
	Sync
loop

Ok.. so result wise, running the first loop in V1.64N2 beta6 gives us something like a 20-25% boot in performance compared to V1.64N. But there's a BUT.. Something that I'd not foreseen, is that when you cache subsequent accesses like this we can no long rely upon the PB's automatic on write type creation. Simply because the write function has no idea where the cache data type comes from.

What i mean is that in PB, when you write to types that don't exist, PB will allocate the type for you..

Code Select


        Eg.
       Type Stuff
             Xpos,Ypos
       EndType

       ; declares the type, doesn't allocate the types structure, only the container
        Dim Me as Stuff

       ; So this is actually writing to a type that doesn't exist 
        Me. Xpos = 45       ; writing to the type

Since PlayBASIC knows the type of array/type your writing to, it allocs the structure for you. Normally you'd do this with the NEW operator. If you use lists, then that's the only way. The thing is, if the code has to READ from the structure and then caches the read pointer, in an expression like Me.Xpos++, we're reading from a type that doesn't exist and caching this none existent structure, so when we run the cached write, BOOM it crashes.

So what this means is that if we enable caching universally across our programs we'll be a slightly changing the behavior of the runtime, even though it's fairly subtle difference, this could be the difference between some programs running and other programs not running without significant modification. In particular, if you rely upon the automatic allocation. If you don't, then there's no problem.

Been testing a bunch of examples, and at this point have only found one or two where there's any issue, even so, I'm not too comfortable setting caching as the default behavior as yet without a lot of 3rd party testing. I suspect the best way forward will be like when the first editions of the expression optimizer appeared, where originally you had to enable it, then eventually it became the default mode.

ATLUS · April 27, 2012, 09:14:28 AM

Kevin, do you planning add new operators in new version?

kevin · April 27, 2012, 11:42:20 AM

Nope, only things planned are what's listed here (login required)

kevin · April 27, 2012, 11:52:54 AM

PlayBASIC V1.64N2 BETA #6 (Retail Compiler Only Beta) - (Avail for Registered Users ONLY)

PlayBASIC V1.64 N2 Beta #6, this revision contains all the latest runtime optimizations and Language files..

The V1.64N revision 2 is meant to address a few issues found in the V1.64N upgrade, so far it's been more of an optimization based upgrade with various improvements being made to the PB VM. The main changes found in this beta would be the new Compare operators, some conditional looping changes (repeat / Until - While / EndWhile), improved select case and last but not least Type Caching in the Math Operators.

Type caching is a method where the runtime can preserve info about type accesses across an expression. So far, it's implementation is limited to only being available within short cut math operators such as ++, --, +=, -=, *=, /* and logic operations.

Ie.

Code Select


      Type Stuff
            X,Y,Z
      EndType
      DIm Me as Stuff
      Me = New Stuff
   
     ; the runtime can now remove some of the work from this type of operation
      Me.X++ 

     ;  It can't currently help if you do this
      Me.x = Me.x +1

Internal tests show that caching gives us about around 25% gain in speed over the long hand version. Be aware though, if you try and do this on types that don't exist you'll get a runtime error. To create a type within the container, we use the NEW operator as shown above.

I hope, caching can be rolled out language wide, but my gut feeling is that we'll struggle to get enough people testing it across their code, which will mean it's implementation will be abandoned.

Download

Old Link removed

ATLUS · April 27, 2012, 12:53:36 PM

great! i going testing some code! ^__^

OldNESJunkie · April 27, 2012, 06:15:20 PM

To quote Kevin: "I hope, caching can be rolled out language wide, but my gut feeling is that we'll struggle to get enough people testing it across their code, which will mean it's implementation will be abandoned."

Personally I say make it that way anyways especially if the performance gains are that large. People can adjust their projects if need be. d/l now to test myself.

ATLUS · April 27, 2012, 06:41:29 PM

i think work well, i tested my codes and some(~50) standard "Examples codes".

kevin · April 28, 2012, 12:40:01 AM

QuotePersonally I say make it that way anyways especially if the performance gains are that large. People can adjust their projects if need be. d/l now to test myself

Have already been testing ways to expand this project wide, but unfortunately it's not turning out to be as simple as i'd hoped. It can work nicely on serialized sequences of code, but issues frequently appear in blocks where there's a mixture of data structures being accessed. The issue isn't that people would have to reshuffle their code to take best advantage (that's a given), the issue is that their code just won't work in some situations. Finding those circumstances takes time and lots and lots and lots of testing.

News:

PlayBASIC V1.64N2 / V164N3 (Work In Progress) Gallery