Simple Formula Evaluation / Compilers

Started by kevin, January 27, 2004, 10:18:38 AM

Previous topic - Next topic

kevin

I've been asked a more than few times about writing a compiler.  So I threw together an example of token based formula evaluation.  Which can be found on the  PC code page.


http://www.underwaredesign.com/?page=programs.PlayBASIC-Source-Codes

kevin

Here's a crash course on the basics of turn the above into a compiler

Anyway, to get started writing your compiler, you only need a hand full of routines.   The most difficult of which,  logically, is the expression evaluation, but that's covered above.  The other parts are dead easy.

 Assuming the compiler is token based. Then the parser is entirely based around a central function that converts the next bit of source code into a token/offset, creatively called GET_TOKEN.   This means we have one central processing loop identifying the main tokens (commands).  All tokens are trapped here. These tokens represent either  the language core structural commands, like IF/THEN/  For/NExt declarations et, while, which are handled manually. While anything else falls through to be evaluated as a command.   If it's unknown after that, then the user typed something stupid :)
 

repeat

Token=GetNextToken()


if token=Token_if
    Call the tokenize expression function.  This function reads all the following tokens into the token buffer, then attempts to resolve the expression back to a single resulting token.  IF there's only one token left, after called the eval expression function, then dump out the code for this comparison.
    Next, Check for a THEN token..   If no then.  It's must bean IF/ENDIF  or IF/ELSE/ENDIF
    Continue    
endif


if token=token_goto
  grab the next token.     Check if it's a label token. If it is, then insert the code for the goto/label address.
  contiune
endif

 etc etc  

  If the token is not of the above, we'll check if it's a command token. All commands, like every else have unique token identifier.  The commands would be placed in a table.   Thus you either search/ or index the table to check what inputs and data types the command has.   The evaluate the following expression, you call the resolve expression function.

.cont:
until EndofSource

 
The main change occurs within  expression evaluation function.  Since the language is not always going to be dealing with known values/strings, operations between variables need be resolved as operational tokens such as ADD/SUB/MUTL/Div/Compares etc.   So rather than replace the resulting token with the known answer, a temp variable is generated and thus inserted in the place of the expunged operation and variables.  So thus the temp variable carries forward in the expression as normal.  Since the evaluation is done  order of precedence, we end with a list of instructions that solve this expression.    

The only other thing is how variables are handled.   In Play Basic all variable casting, and a numbers of other operations, occur with the get_token function. This make everything above it, so much easier !.    Thus when get token sees the next string characters are alphanumeric, first it checks those to the command table. If it's not found as a command  We can then assume it's a variable, and thus check if it's within the variable table.  This table is just stack of variables found to date.  The table has certain flags about each variable, like it's scope, it's type.   The check routine returns the Index of this variable within he table.  But, If it's unknown the routine adds it's to the end of the list, and returns the newly created index.

If you revisit the evaluate expression example, you'll notice that before, constant values were represented as a token called "Token_Constant" plus a secondary value, that being the actual value.   So to represent variables, we'd create a token called say "Token_Variable" and rather than store it's value (since it's unknown at compiler time), it's accompanying value is the variables index within the variable table.


Believe it or not, you now have enough basics to turn the above evaluation example into a simple compiler.. have fun :)

The Lone Programmer

I have read this post, and the Play Basic example, but I still do not understand.

I am not familiar with the Play Basic language, so the commands are foreign to me.  Is there any way you can give me a simple demonstration with say psuedocode or something?


I managed to do this:
8+4*2

With that equation I parsed it out to look like this:
8
+
4
*
2

My problem now is with sorting the equation so it follows the rules of BOMDAS or PEMDAS or whatever you would like to call it.  I am familiar with what stacks are, but would prefer to use another method if possible.  I don't know what kind of crazy recursive function I am going to have to expect to get this to work.

Any help is appreciated.

Thanks,
The Lone Programmer
"Is The Juice Worth The Squeeze"
- The Girl Next Door

kevin

#3

While the example above is PlayBASIC code, the package also contains a tutorial.  


MULT + DIV have a higher precedence than ADD + SUB.

First you scan the expression looking for * or / symbols (highest precedence)

If you find one, just grab the left and right values and mult or divide them, the result is then inserted back into the expression, replacing the left value the operation and right value.

Then you scan for ADD and SUB operations. If you find one, grab the left and right values and either ADD or SUBTRACT them. Then insert the result back into the expression.


Basic logic


repeat

; check for Mult + divde
Repeat  
 -Get Character from expression  
 -check if chr is a * symbol
 --------> Grab the previous character as a value
 --------> grab the next character as value
 --------> Mult them together
 --------> Put the result back into the expression, replacing the left + right values

 -check if chr is a / symbol
 --------> Grab the previous character as a value
 --------> grab the next character as value
 --------> Divide them
 --------> Put the result back into the expression, replacing the left + right vlaues

until end of expression



; check for Add + Sub
Repeat  
 -Get Character from expression  
 -check if chr is a + symbol
 --------> Grab the previous character as a value
 --------> grab the next character as value
 --------> Add them together
 --------> Put the result back into the expression, replacing the left + right values

 -check if chr is a - symbol
 --------> Grab the previous character as a value
 --------> grab the next character as value
 --------> Sub them
 --------> Put the result back into the expression, replacing the left + right vlaues

until end of expression



until  no more operations are found




So the Example resolves like this.
     
 "8+4*2"

 After the scan for mult + divs we get

 "8+8"

after the scan for add+sub we get

 "16"

The Lone Programmer

Man is there such thing as the perfect reply?

Your response is probably the best one I've seen in a forum board.  It put things into great perspective for me.

I must say, that your reply does not match the tutorial that was with the Play Basic code.  This one was much easier to understand.

I could do this in Delphi, but I will have to make up my own DeleteString and InsertString function for C++.

I appreciate the help

Thanks,
The Lone Programmer
"Is The Juice Worth The Squeeze"
- The Girl Next Door