Dtab Web Importer

Started by kevin, April 11, 2011, 03:00:45 AM

kevin

 Dtab Web Importer V0.01

  This is another one of the things I've been wanting to add to Dtab for a while: the ability to fetch a page and parse drum tab directly from the site.  This is currently only a prototype, but it's already able to fetch and format the content ready for import.  In the attached picture, you can see the parser is able to grab some general info about the track from the page, which is normally removed from the tab itself.  That makes it virtually a one click process.

  Dunno when it'll be built in at this time, since there are a few issues at the moment..

kevin

 Dtab Web Importer V0.02

  Tweaking the page parsing routine to work better with the differences between some of the main sites.  It'll pretty much pull the copy from any page now, but you'll no doubt find the odd missed tag or character on some pages.  Some sites post their tabs with all the information intact, so importing is a one click process.  Others strip information like the song name / artist / tabber details from the tab body and insert it into the html document, for obvious presentation reasons.

 The current parser has little snippets of code to deal with some of the main sites, so it'll grab this info from the page and insert it back into the tab.  The trouble is, if the page formatting changes some time in the future then this would break the importer.  A better solution would be to use some type of action script to do the searching and grabbing.  That way, changing the behavior doesn't require a recompile / product update each time, just a new text file..

 The thought did cross my mind that I could drop in a script version of PB (like Kyruss II for example), but that'd be overkill for something this simple.  It just needs a few control opcodes to do the fetching...   So we'll see how this goes :)


kevin


  Dtab Web Importer V0.03

      Haven't had much seat time for programming the last couple of days, just a few quick sessions here and there.  However the script engine is coming along nicely.  It's been through a couple of revisions already though.  The first was an action list based on a stack: you push the parameters to an internal stack, call the operation, then pop the result.  Seemed like about the simplest solution off the top of my head, but it was also a bit too limiting.  Ended up tossing the manual stack idea and figured an expressionless BASIC styled language would give a lot more flexibility.
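
      For illustration only, the stack-based first revision described above might look roughly like this in Python.  The opcode names (push, left, right, join) and the run_actions helper are made up for the sketch; the real engine is written in VB and its opcodes aren't shown in this thread.

```python
def run_actions(actions):
    """Run a list of (opcode, argument) pairs against a manual stack."""
    stack = []
    for op, arg in actions:
        if op == "push":
            stack.append(arg)
        elif op == "left":
            count = stack.pop()       # parameters come off in reverse order
            text = stack.pop()
            stack.append(text[:count])
        elif op == "right":
            count = stack.pop()       # assumes count is greater than zero
            text = stack.pop()
            stack.append(text[-count:])
        elif op == "join":
            second = stack.pop()
            first = stack.pop()
            stack.append(first + second)
        else:
            raise ValueError("unknown opcode: " + op)
    return stack.pop()                # the result is whatever is left on top


actions = [
    ("push", "This Is a literal string"), ("push", 4), ("left", None),
    ("push", "This Is a literal string"), ("push", 6), ("right", None),
    ("join", None),
]
print(run_actions(actions))
```

      Every operation needs its parameters pushed by hand first, which is where the "too limiting" feeling comes from once the logic gets longer.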

      The idea was to tokenize the script code and execute the logic from the input stream directly as tokens.  This just meant dropping in one of the various parsers to do the conversion pass.  Pretty straightforward really, the only hindrance is the nanny nature of VB's IDE.  Got that up and running a few hours ago, but it soon became apparent that if you make everything a single operation per line and remove expressions, then it's equally limiting.  The solution is to wedge an expression resolution pass into the scanning core.  Which means you can treat it more like an actual mini language..  At the moment the parser handles basic formatting, explicit variable declarations, and code generation for some very simple nested expressions.. ahh the wonders of cut and paste :)
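
      As a rough idea of what the conversion pass does, here is a minimal tokenizer sketch in Python.  The token kinds and the regular expression are assumptions based on the script samples in this thread (names, numbers, quoted strings, operators, and ; for comments), not the actual Dtab parser.

```python
import re

TOKEN_PATTERN = re.compile(r"""
    (?P<number>\d+(?:\.\d+)?)
  | (?P<string>"[^"]*")
  | (?P<name>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<op>==|!=|<>|\+\+|--|\+=|-=|\*=|/=|[-+*/=<>(),])
  | (?P<skip>\s+)
""", re.VERBOSE)


def tokenize(line):
    """Split one script line into (kind, text) tokens; ';' starts a comment."""
    tokens = []
    pos = 0
    while pos < len(line):
        if line[pos] == ";":
            break                     # rest of the line is a comment
        match = TOKEN_PATTERN.match(line, pos)
        if match is None:
            raise SyntaxError("unexpected character: " + line[pos])
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize('result=100-(200) ; comment'))
```

      Once a line is in this form, the runtime never has to look at raw text again, it just walks the token list.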
   

kevin


  Dtab Web Importer V0.04

   Hey presto it's up and running.  At first glance it looks like any BASIC variant really.  The main difference is that it's typeless and explicit.  There's only support for variables and built in functions though, so it needs some form of decision and perhaps a loop construct also.  That way the script can pick through the PAGE and make decisions from it.

  Below is the standard test snippet; you can see the parser now supports basic math operations with precedence in expressions.
  Yesterday, you'd have had to unroll everything, so the expression

      Print(Left(Result,10)+Right(Result,10))

  used to be 4 lines..

     Temp1=Left(Result,10)
     Temp2=Right(Result,10)
     Result=Join(Temp1,Temp2)
     Print(Result)



{

var result,DUde,Variable,Test,Temp

result=100-(200)
Temp=123

print(result+temp)

print(result)

print(temp)

result=123456
result="This Is a literal string"

print(result)

print(left(result,10)+right(result,10))


; Result = Test
; Result= (Variable)

; Result= (Variable)+Dude

; Result= (Variable)+(Variable+Test)

; Result= ((Variable)+(Variable+Test))*Test

; Result= Variable+(Variable / Dude) * Variable+Dude
; result=print(result)

}
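
  To make the "unrolling" idea concrete, here is a small Python sketch that flattens an infix expression into one-operation-per-line steps, honouring precedence and brackets.  It uses precedence climbing, which is just one common way to do it and not necessarily how the Dtab parser works; the unroll function and the Temp naming are assumptions for the example, and function calls like Left() are not handled.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def unroll(expression):
    """Flatten an infix expression into single-operation steps."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", expression)
    steps = []
    pos = 0

    def operand():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse(1)          # resolve the bracketed part first
            pos += 1                  # skip the closing ")"
            return value
        return token

    def parse(min_prec):
        nonlocal pos
        left = operand()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            right = parse(PRECEDENCE[op] + 1)
            temp = "Temp%d" % (len(steps) + 1)
            steps.append("%s=%s%s%s" % (temp, left, op, right))
            left = temp
        return left

    result = parse(1)
    return steps, result


steps, result = unroll("Variable+(Variable / Dude) * Variable+Dude")
for step in steps:
    print(step)
print("Result=" + result)
```

  Each emitted step is the kind of single-operation line the earlier version of the script had to be written in by hand.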


  The way it'll work is the importer loads the HTML page, then grabs various main details from the page and adds them as global variables for the script.  Here it'll load the info 'decoder' script.  This script queries the page and sets the return global variables depending upon the logic.  When the script finishes, the importer polls the values of the global variables.  The same process could be applied to cleaning the extracted tab chunk after import, doing things like detecting the author ids and fixing any common errors / formatting issues in the layout.

  Speed wise it seems pretty quick, certainly fast enough for the task at hand ..
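
  The load / push globals / run / poll cycle described above can be sketched like this in Python.  ScriptEngine, push_global and query_global are hypothetical names for the example, and the "script" is just a Python function standing in for a compiled Dtab script.

```python
class ScriptEngine:
    """Stand-in for the script engine: holds globals and runs one script."""

    def __init__(self):
        self.globals = {}
        self.script = None

    def load(self, script):
        # In this sketch a "script" is a function that receives the
        # globals dictionary and edits it in place.
        self.script = script

    def push_global(self, name, value):
        # Names are lowercased so lookups are case-insensitive.
        self.globals[name.lower()] = value

    def run(self):
        self.script(self.globals)

    def query_global(self, name, default=""):
        return self.globals.get(name.lower(), default)


def decoder(g):
    """Hypothetical info decoder: takes the text before ' by ' as the song."""
    title = g["pagetitle"]
    g["songname"] = title.split(" by ")[0].strip()


engine = ScriptEngine()
engine.load(decoder)
engine.push_global("PageTitle", "Some Song by Some Artist")
engine.run()
print(engine.query_global("SongName"))
```

  The host never needs to know what the script did internally, it only reads back the globals it cares about.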



kevin

  Dtab Web Importer V0.05

    Haven't had any time for this since Friday, so I'm quickly slapping in the remaining functionality, starting with an unconditional branch, better known as a GOTO in most languages.  It's necessary if I want to add some control structures such as loops or decisions (IF/ENDIF).  The current parser is now able to resolve forward and backward branches, allowing the code to jump forward or back.  The rules are much the same as PB (for obvious reasons :) ), even though it's using a slightly different method..

   So this code does absolutely nothing in the output..  



goto Cont

var result,Dude,Variable,Test,Temp

result=100-(200)
Temp=123

print(result+temp)
print(result)
print(temp)


Cont1:

result=123456

result="This Is a literal string"

print(result)

print(left(result,10)+right(result,10))


Cont:
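
   A minimal Python sketch of resolving forward and backward branches: one pass records where each label points, a second pass patches every goto with that address.  The function names and the (kind, value) instruction format are assumptions for the example; the thread doesn't show how the Dtab compiler stores its code.

```python
def resolve_branches(lines):
    """Turn 'goto Label' lines into ('goto', index) with labels stripped."""
    labels = {}
    program = []
    # Pass 1: record where each label points and keep the real statements.
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.endswith(":"):
            labels[text[:-1].lower()] = len(program)
        else:
            program.append(text)
    # Pass 2: patch every goto with the address of its label.
    resolved = []
    for text in program:
        parts = text.split()
        if parts[0].lower() == "goto":
            resolved.append(("goto", labels[parts[1].lower()]))
        else:
            resolved.append(("stmt", text))
    return resolved


def run(resolved):
    """Walk the patched program, returning the statements that were reached."""
    executed = []
    pc = 0
    while pc < len(resolved):
        kind, value = resolved[pc]
        if kind == "goto":
            pc = value
        else:
            executed.append(value)
            pc += 1
    return executed


script = ["goto Cont", "print(1)", "Cont1:", "print(2)", "Cont:", "print(3)"]
print(run(resolve_branches(script)))
```

   Because all labels are known before any goto is patched, it makes no difference whether the label sits before or after the jump.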




     Next we'll look at adding either a loop structure (FOR/NEXT, DO/LOOP) or some form of decision such as an IF/ENDIF pair.  They're not that challenging really, they just need a bit of support code to enable nesting.  The expression resolution does the bulk of the work.


   


kevin


    Dtab Web Importer V0.06-  DTab Script

     Been putting some effort into wrapping this up tonight, and ran into a couple of VB'isms that would break the script execution.  Ended up having to roll a bit of a length caster to handle the inputs and outputs better.  It's still possible to break it, but only when doing something out of the ordinary.

   The current version supports GOTO, IF/ELSE/ENDIF, as well as DO/LOOP and its conditional versions.  The conditional versions give the same control as the WHILE loop or the Repeat/Until structures found in BASIC/C etc.

  Comparisons and loops support nesting, so even though the controls are simple on the surface it can handle some reasonably complex logic.  It's not pretty, but it'll do the job at this point.

     Variables are typeless, so they can be assigned integers, floats or strings.  All the simple math operations can be performed upon them, as well as some limited short cuts such as ++, --, +=, -=, *= and /= during assignments.

     Comparison operators are a cross between BASIC and C again.  The main difference to BASIC is the use of the == operator to test for equality between two variables (or expressions) and != for inequality, but the <> operator is also supported.



var DUde,Count,S

Count=Dude+12345
dude++
print(dude)
dude--

print(dude)

dude+=1000
print(dude)

dude-=500
print(dude)


dude*=4
print(dude)


dude/=2
print(dude)


print("Testing String operators")
s="Testing"
print(s)
s++
print(s)

s+=999
print(s)

print("Testing the Do / Loop Structures")

count=1
do (count<10)
count+=1
print("Count="+count)
loop

print("End of Program")



outputs


>>>> ------------------------------
>>>> Execute Script
>>>> ------------------------------
Printing >>>1
Printing >>>0
Printing >>>1000
Printing >>>500
Printing >>>2000
Printing >>>1000
Printing >>>Testing String operators
Printing >>>Testing
Printing >>>TestingTesting
Printing >>>TestingTesting999
Printing >>>Testing the Do / Loop Structures
Printing >>>Count=2
Printing >>>Count=3
Printing >>>Count=4
Printing >>>Count=5
Printing >>>Count=6
Printing >>>Count=7
Printing >>>Count=8
Printing >>>Count=9
Printing >>>Count=10
Printing >>>End of Program




     You'll notice one interesting byproduct of all the variables being variants is that the addition operator can now automatically join numeric and string data together (at least that's how it appears to the user ;) ), as seen in the Print("Count="+Count) line.
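
     A tiny Python sketch of that behaviour, assuming the rule is "add when both sides are numeric, otherwise join them as text".  The variant_add name is made up, and how the real engine formats floats when joining is not something the thread shows.

```python
def variant_add(left, right):
    """Add two variant values: numeric addition, or string joining."""
    numeric = (int, float)
    if isinstance(left, numeric) and isinstance(right, numeric):
        return left + right
    # At least one side is text, so fall back to joining both as strings.
    return str(left) + str(right)


print(variant_add(1000, 1))
print(variant_add("Count=", 2))
print(variant_add("Testing", 999))
```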

     What's missing now is a FOR/NEXT control loop,  some binary operators such as AND/OR/XOR for better comparisons and perhaps an EXIT  command. 



kevin

 
  Dtab Web Importer -  Dtab Script V0.07

    Added support for FOR/NEXT loops earlier, which completes the core functionality of the language.  The For/Next construct is BASIC like.  Was tempted to drop in C styled formatting, but went with BASIC so I can drag and drop code between the dialects more easily.

    Since the core functionality is up and running, tonight I've been setting up the interface between the host application and the script engine.  To execute a script the host simply passes the script in string form to the compiler.  It's actually a 2.5 pass compiler, much like PlayBASIC.. no prizes for guessing why.  While much of the parser was lifted (and cleaned up) out of Kyruss II, all of the compiler and runtime were written on the fly (just under 5000 lines, which is about 15-20% of the size of Kyruss II btw..).

     This version of the parser is designed around a full pre-pass, a compile pass and relative address patching.  Having explicit variable declarations means detecting what something is, is generally easier than with implicit declarations.  Was getting worried that all the string thrashing in VB might choke the performance down too much.  But after some testing it's virtually instant; the code below loads, compiles and executes routinely in around 0.2 of a millisecond on a 6 year old system.

    The interface works really well: you load the script, push your "global variables", then run it.  Upon exit, you just query the values of the globals back.  In the code below, the script pulls apart the title string from the test page and extracts the basic info about the song, artist and tabber.  This is very specific to that site; obviously on other sites the layout and information on offer differ.  Previously, the prototype was using built in hardwired parsing.  Which is less work in the short term, but would mean the application needs to be rebuilt and published in order to make changes.  erm.. No thanks..  But this can be avoided if the app queries the server for 'script' updates.

     There are still a few things to suss out, mainly how the scripts will be packaged and delivered.  It's tempting to just have one large code chunk do all the work, but it's probably better to break them into an individual script for each source url.



; ------------------------------
;  Global variables
; ------------------------------
;    PageURL = The full URL of this page
;    RawPage = The entire document
;  PageTitle = The Title string from the page

;-----------------------------
;  Return Variables
;-----------------------------
;     SongName = The name of the song
;   ArtistName = The name of the artist
;   TabberName = The name of the tabber



  var StartPos,EndPos,TabbedByStart,TabbedByEnd,Tag
   
        ;---------------------------------------------------------------------------------------
        ;Bring Me To Life Drum Tab by Evanescence | Tabbed by Billabongpro |  MXTabs.net
        ;-------------------------------------------- -------------------------------------------
        ;-------------------------------------------- -------------------------------------------
        If (InStr(PageTitle, "MXTabs.net",1)>0)
        ;---------------------------------------------------------------------------------------
        ;-------------------------------------------- -------------------------------------------

    print("Max Tabs Page")
           
            Tag = "drum tab by"

            EndPos = InStr( PageTitle, Tag,1)

            If (EndPos>0)
                EndPos = EndPos - 1
                StartPos = 1
                SongName = Trim(Mid(PageTitle, StartPos, EndPos - StartPos))
                Print( "Song:" + SongName)
            EndIf

           
    if (Html_LocateTags(PageTitle, Tag, "|", 1,StartPos,EndPos))

                StartPos = StartPos + Len(Tag)
                ArtistName = Mid(PageTitle, StartPos, EndPos - StartPos)
                ArtistName = Trim(ArtistName)
                Print( "Artist:" + ArtistName)

    endif


            Tag = "Tabbed by"

            TabbedByStart = InStr(PageTitle, Tag,1 )

            If (TabbedByStart > 0)
                TabbedByEnd = InStr( PageTitle, "|",TabbedByStart)
                If (TabbedByEnd > TabbedByStart)
                    TabbedByEnd = TabbedByEnd - 1
                    TabbedByStart = TabbedByStart + Len(Tag)
                    TabberName = Mid(PageTitle, TabbedByStart, TabbedByEnd - TabbedByStart)
                    TabberName = Trim(TabberName)
                EndIf
            EndIf


    goto Done

endif




done:


print("End of Program")





   output

>>>> ------------------------------
>>>> Execute Script
>>>> ------------------------------
Printing >>>Max Tabs Page
Printing >>>Song:Eye Of The Tiger
Printing >>>Artist:Tiger
Printing >>>End of Program
Opcode=End of script
[ArtistName]=Tiger
[SongName]=Eye Of The Tiger
[TabberName]=bjork
Total Script Time In Ticks= .170999999999999
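
   For comparison, roughly the same title decoding written in Python.  This is only a sketch: decode_mxtabs_title is a made-up name, it assumes the tag matching is case-insensitive (the script searches for "drum tab by" in a title containing "Drum Tab by"), and it uses the example title from the script's comment rather than the page that produced the output above.

```python
def decode_mxtabs_title(page_title):
    """Pull song, artist and tabber out of an MXTabs style page title."""
    info = {"SongName": "", "ArtistName": "", "TabberName": ""}
    lowered = page_title.lower()      # assumes lowering keeps positions intact
    if "mxtabs.net" not in lowered:
        return info

    tag = "drum tab by"
    tag_pos = lowered.find(tag)
    if tag_pos >= 0:
        # Everything before the tag is the song name.
        info["SongName"] = page_title[:tag_pos].strip()
        bar_pos = page_title.find("|", tag_pos)
        if bar_pos > tag_pos:
            info["ArtistName"] = page_title[tag_pos + len(tag):bar_pos].strip()

    tag = "tabbed by"
    tag_pos = lowered.find(tag)
    if tag_pos >= 0:
        bar_pos = page_title.find("|", tag_pos)
        if bar_pos > tag_pos:
            info["TabberName"] = page_title[tag_pos + len(tag):bar_pos].strip()
    return info


title = "Bring Me To Life Drum Tab by Evanescence | Tabbed by Billabongpro |  MXTabs.net"
print(decode_mxtabs_title(title))
```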






kevin

  Dtab Script V0.08  - Typed Arrays
 
   After looking through some of the importer code that I'm looking to externalize in DTAB, there needed to be some way of creating typed array like structures.  Really don't want to physically embed arrays into the language.  I'm perfectly happy with the single data type support as is; it keeps everything nice and minimal to write, even if it makes the code a bit clunky.  So the array controls have been introduced as a set of functions that return indexes to the buffers.

  To create an array, first you initialize your typed structure, which supports just the regular Int/Float/String fields as well as an Array field.  Once the type is constructed, an array can be made using this structure.  When the array is created, you can read and write data in it, although at the moment it only supports writing (haven't written the read.. due to some lovely VB'isms).

  The reads and writes are done through a pair of functions, SET() / GET(), rather than through an assignment like you'd see in virtually any programming language.  It's just the easiest way to add what is a fairly complicated inclusion really.  Spent virtually all night on the array module.  It's not that it's much code, rather I just ended up chasing my tail trying to suss out some oddities with VB.  Which has been a common thorn in my side in getting it working really :( -  But it seems to work now..





var VectorType,Vector_X,Vector_Y,Vector_Z

; Allocate the Data Structure
VectorType=NewType("Vector3D")

; Append fields to the structure
Vector_X=AddField(VectorType,"X","INT")
Vector_Y=AddField(VectorType,"Y","INT")
Vector_Z=AddField(VectorType,"Z","INT")


Var MyArray,lp

MyArray=NewArray("CoolStuff",VectorType)

For lp=0 to 100
; turn this structure on
Set(MyArray,0,lp,1)

; fill in the fields  Set(Array, FieldIndex, DataIndex, Data)
Set(MyArray,Vector_X,lp,1000+lp)
Set(MyArray,Vector_Y,lp,2000+lp)
Set(MyArray,Vector_Z,lp,3000+lp)
next





 Which in PB would be something like this.

PlayBASIC Code:
  Type Vector3D
X as integer
Y as integer
Z as integer
EndType

Dim CoolStuff(100) as Vector3D

For lp=0 to 100
CoolStuff(lp).x =1000+lp
CoolStuff(lp).y =2000+lp
CoolStuff(lp).z =3000+lp
next
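
 As a rough sketch of the "functions that return indexes" approach, here is a Python version of the same idea.  The storage layout is an assumption, as is treating field 0 as a built in "in use" flag (guessed from the Set(MyArray,0,lp,1) line); get_value is included even though the post says reading wasn't written yet.

```python
_types = []    # each entry: {"name": ..., "fields": [(name, kind), ...]}
_arrays = []   # each entry: {"name": ..., "type": index, "rows": {}}


def new_type(name):
    # Field 0 is reserved here as the "in use" flag (an assumption).
    _types.append({"name": name, "fields": [("Status", "INT")]})
    return len(_types) - 1


def add_field(type_index, name, kind):
    fields = _types[type_index]["fields"]
    fields.append((name, kind))
    return len(fields) - 1            # the script keeps this field index


def new_array(name, type_index):
    _arrays.append({"name": name, "type": type_index, "rows": {}})
    return len(_arrays) - 1


def set_value(array_index, field_index, data_index, value):
    array = _arrays[array_index]
    field_count = len(_types[array["type"]]["fields"])
    if not 0 <= field_index < field_count:
        raise IndexError("no such field")
    # Rows are created on first write; fields must be added before that.
    row = array["rows"].setdefault(data_index, [0] * field_count)
    row[field_index] = value


def get_value(array_index, field_index, data_index):
    return _arrays[array_index]["rows"][data_index][field_index]


vector = new_type("Vector3D")
vx = add_field(vector, "X", "INT")
vy = add_field(vector, "Y", "INT")
vz = add_field(vector, "Z", "INT")
cool = new_array("CoolStuff", vector)
for lp in range(101):
    set_value(cool, 0, lp, 1)
    set_value(cool, vx, lp, 1000 + lp)
    set_value(cool, vy, lp, 2000 + lp)
    set_value(cool, vz, lp, 3000 + lp)
print(get_value(cool, vz, 100))
```

 The script only ever holds plain numbers (type, field and array indexes), so the language itself stays single-typed.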






kevin


  Dtab Script V0.09

   Haven't really had time for this over the weekend, so this is a bit of a retrospective post, as the scripting engine has been up and running for days.  It's also been set up now to react to posted links.  So if you post a link, it downloads the page, then calls the script engine to rip the tab out and grab any extra information.  The only problem I'm having really is that the downloading isn't asynchronous, so the app halts while the download is taking place.  This isn't a drama most of the time, but it stalls when the server takes a long time to respond or can't connect.  Which isn't really good enough, but getting async mode working isn't turning out to be as easy as it appeared :(.



kevin

  Dtab Web Importer V0.10

   The past few sessions have been focused on improving the download support.  After messing with the API, I just couldn't get async downloading to work in VB.  All is not lost: VB includes another native method, but it's custom to VB.. so it doesn't really have any future..  Not that it really matters now I guess.  Anyway after a bit of fiddling, the downloader runs in the background, making the fetch process a lot smoother.
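
   The general shape of a background download, sketched in Python rather than VB.  This uses a worker thread and a queue, which is a different mechanism from the VB method mentioned above; start_download and fetch_page are made-up names, and the usage swaps in a fake fetch function so the example runs without a network connection.

```python
import queue
import threading
import urllib.request


def fetch_page(url, timeout=10):
    """Blocking download; returns the page body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def start_download(url, results, fetch=fetch_page):
    """Fetch url on a worker thread and post (url, page, error) to results."""
    def worker():
        try:
            results.put((url, fetch(url), None))
        except Exception as error:    # slow or unreachable servers end up here
            results.put((url, None, error))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


results = queue.Queue()
start_download("http://example.invalid/tab", results,
               fetch=lambda url: "<html>fake page</html>")
url, page, error = results.get(timeout=5)
print(page if error is None else "download failed: %s" % error)
```

   In a real app the interface would poll the queue from a timer (results.get_nowait()) instead of blocking on it, so the window stays responsive while the server takes its time.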
     
   Since that's up and running, the next task has been patching the new download system into the processing system, so the newly downloaded page gets sent to the page decoder.  The built in decoder rips the actual tab chunks (if any), and after that it calls the script processing pass.  This has been broken into two passes.  There's a domain level script to pull key information from the raw page html, then a post processing pass.  The second pass doesn't exist as yet, but I imagine that some formatting can take place here prior to import.  I'm not ready to commit to a design yet though, it needs some real world testing..  So I think that'll get bumped down the release schedule.
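
   A small Python sketch of the two pass idea, with the first pass picked by the page's domain.  pick_script and process_page are hypothetical names, and the decoder here is a stand-in that only reports the page size, since the real domain scripts aren't shown.

```python
from urllib.parse import urlparse


def pick_script(url, scripts):
    """Return the decoder registered for the URL's domain, if any."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return scripts.get(host)


def process_page(url, raw_page, scripts, post_process=None):
    """Pass 1: domain script pulls details. Pass 2: optional clean-up."""
    details = {}
    script = pick_script(url, scripts)
    if script is not None:
        details = script(raw_page)
    if post_process is not None:
        details = post_process(details)
    return details


def mxtabs_decoder(raw_page):
    # Stand-in for a real domain script; it only reports the page size.
    return {"PageSize": len(raw_page)}


scripts = {"mxtabs.net": mxtabs_decoder}
url = "http://www.mxtabs.net/view/tab/69461/megadeth/a_tout_le_monde/"
print(process_page(url, "<html></html>", scripts))
```

   Pages from a domain with no registered script simply come back with no details, which matches the "providing it can get the song details" caveat.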

    Anyway the pic below shows the result of pasting the link into the text box, where it fires up the downloader etc etc.  The result here just shows the script tab with the song details ripped from the html, ready for easy import and saving.  The process from import to saving as a dtab file to disc is around 3 or 4 clicks..  Providing it can get the song details..

   

     http://www.mxtabs.net/view/tab/69461/megadeth/a_tout_le_monde/



kevin

 Dtab Web Importer

    Since the last post, I've mainly been doing real world testing of the import process.  Which means browsing around for as many tab sites as I can find, writing importers and tweaking any issues as they're discovered.  Turned into a bit of a marathon session, but there were already something like 15 importers written as of Friday morning (had to watch the wedding :) ).  The quality of what you can fetch back varies from site to site.  Some are excellent, where you grab not only the song/artist names, but the tabber/emails and even things like the album it comes from.  Others are much more basic.

    The biggest issue with importing hand written drum tabs is they're often full of custom and author specific notations.  Some sites even use completely custom formats, far removed from the general convention.  Importing those is pretty much out of the question without specific reformatting logic being written.  There's nothing really that new about that though, since this was one of the motivations for writing DTAB over a decade ago.  But anyway..

    At this point, there's no central hub where Dtabbers can share importer links.  Ideally, this would be a small dedicated site that the app can query.  Not too sure how this might be represented internally, but ideally there will be some form where you can pick the song and the app will just do the fetch for you.  A lot easier on the user.

    However, I think this functionality will have to wait until the core's been tested in the wild a bit more.  Definitely going to add a version query and probably a news query of some type to it.  They're a little easier to add :)