Tuesday, April 14, 2015

DSLs With Ruby Treetop

I have been playing with Ruby quite a bit lately.  One of the things I have been working on is a simple data query language to interface with our back-end databases.  At my job this must work on an old UniData database (running on a HP-UX 10.2 box) and SQL Server.  I wanted a language that looks much like the following:

Data Query Language Syntax Help

General Commands:
Comamand History:            list [NumberOfCommandsToReturn] commands
Exit Program:                exit
Reparse The Grammar:         reparse grammar

Macro Commands:
DQL Macros Are A Series Of Commands That You Want Executed
Begin Macro:                 begin macro
Clear All Macros:            clear all macros
Delete All Macros:           delete all macros
Delete Macro:                delete macro ["MacroName"] 
Edit Macro Save File:        edit macros [from Filename] in Notepad
End Macro:                   end macro [into "MacroName"]
List Current Macros:         list macros
List Macro Commands:         list macro "MacroName" commands
List Macro Files:            list macro files
Load Macro File:             load macros [from FileName] 
Run The Macro:               run macro ["MacroName"]
Save Macros To File:         save macros [into FileName]

Database Object Queries:
List Table Names:            list [avante| dw | datawarehouse] tables [into ResultSetName]
List Table Columns:          list [avante| dw | datawarehouse] table "ITMMST" columns [into ResultSetName]

Database Queries:
Run Query:                   run [avante| dw | datawarehouse] query "LIST ITMMST PART.NBR DESCRIPTION" [INTO ResultSetName]

Result Set Commands:
Copy Results:                copy result set SourceResultSetName to DestinationResultSetName
Delete Results:              delete ResultSetName
Filter Results:              filter ResultSetName by "FilterString1" [and "FilterString2" and ...]
List Result Sets:            list result sets
Match Results:               match ResultSetName on "MatchString1" [and "MatchString2" and ...]
Output To Excel:             output ResultSetName to excel file "ExcelFileName"
Revert Results:              revert ResultSetName
Show Results:                show [NumberOfLinesToReturn] [ResultSetName]

I wanted the ability to run queries on both of our main enterprise databases and do simple filtering/matching on the results (called Result Sets in this system).  I also wanted the ability to create a series of commands and save them in a "macro" that I could run later.  I wanted to be able to save these macros to a file to be run later as well.  In this way I could build up libraries of commonly used commands.  Finally, I wanted to be able to call up lists of database objects I am always searching for (table names and columns).

I knew that the "easiest" way to achieve this would be to create a DSL that encapsulated all these commands.  You could do it by manually parsing regexes but that would be no fun.  Luckily Ruby has many tools to do DSLs.  I chose Treetop to create my new language.  Treetop is a great tool.  The only real negative to using it is the complete lack of good documentation.  Hopefully this post may help someone avoid bashing their head into a wall like I did.

In Treetop you build you language using strings and regular expressions.  Let's take an example from the language we created above:

List Table Names:            list [avante| dw | datawarehouse] tables [into ResultSetName]

I represented this in a rule that looks like this:

rule db_command_list_tables
'list' space database_type space database_object space? optional_resultset:into_resultset?
{
def evaluation_code
eval_code = ""
my_result_set_name = 'default'

# Write Code To Get The Table Names
case database_type.text_value
when 'avante'
if database_object.text_value == 'tables'
eval_code = "result_set = avante_db.get_table_names"
end 
when 'dw','datawarehouse'
if database_object.text_value == 'tables'
eval_code = "result_set = datawarehouse_db.get_table_names"
end
end
# Write Code To Handle Optional Result Set
if defined?(optional_resultset.result_set_name)
my_result_set_name = optional_resultset.result_set_name.text_value
end 

# Add ResultSet To Result Set Dictionary
eval_code = eval_code + "; result_set_dict.add_existing_result_set(result_set,'#{my_result_set_name}')"

# Probably Want To List Result Set By Default
if my_result_set_name == 'default'
eval_code = eval_code + "; puts result_set.value; puts \"\\n\" + Rainbow(\"Results Stored In Result Set: default\").green"
end 

return eval_code
end 
}
end 

rule database_type
  'avante' / 'dw' / 'datawarehouse'

end

rule database_object
  'table' [s]* / 'query' / 'data' / 'info'

end

rule into_resultset
  'into' space result_set_name

end

rule space
  [\s]+

end 

The db_command_list_tables rule builds on the database_type, database_object, into_resultset and space rules to create a complete command.  One important thing to note is that when you make something optional (by appending a question mark to the end of it) it no longer is able to be referenced directly.  An example of this is the into_resultset? part of the db_command_list_tables rule.  We added a question mark to the end of it.  Notice that I prepended "optional_resultset:" to into_resultset?  This enables me to later use this code to look up its value:

# Write Code To Handle Optional Result Set
if defined?(optional_resultset.result_set_name)
  my_result_set_name = optional_resultset.result_set_name.text_value
end 

It would make sense if your were able to use a construct like into_resultset.result_set_name.text_value to reference the result set name.  This will fail.  Instead you must prepend optional_resultset: to into_resultset to look into the into_resultset rule.  Why?  I have no idea and it took me forever to figure this out.  I re-iterate ... it is completely necessary to prepend an identifier to any referenced rule you make optional (with a question mark).  This is nowhere (that I could find) in the "official" Treetop documentation and it will trip you up.

The final item of interest is the:

{
  def evaluation_code
  ....
  ....
}

construct inside the rule.  This enables you to use Ruby code to process the text data from rules that match your input.  How meta ;-).  What this enables you to do is create code that would be evaluated when a rule matches certain input.  To use these rules you have to compile the Treetop rules into a file and load them into a parser.  I used a parser class to accomplish this:

class DQL_Parser
  
  def initialize(grammar_path)
    @path_to_grammar = grammar_path
    
    Treetop.load(grammar_path)
    @parser = DQLParser.new 
  end
  
  def reload_grammar()
    puts Rainbow("Reparsing Grammar").green
    Treetop.load(@path_to_grammar)
    @parser = DQLParser.new
  end
  
  def parse(command)
    tree = nil
    if command !~/^\s*$/
      # Pass the data over to the parser instance
      tree = @parser.parse(command)
      # If the AST is nil then there was an error during parsing
      # we need to report a simple error message to help the user
      if(tree.nil?)
        puts Rainbow("ERROR: Cannot parse \"#{command}\".  Error is at offset: #{@parser.index}").red
        puts Rainbow((' ' * (@parser.index + 21)) + '^').red
        puts Rainbow((' ' * (@parser.index + 21)) + '|').red
      end
    end
    return tree
  end

end

$parser = DQL_Parser.new($base_path + '\Data_Query_Language.treetop')
.... some code to set up a REPL
parse_tree = $parser.parse(command)
if !parse_tree.nil?
  if parse_tree.evaluation_code !~ /^ERROR\:/i
    if parse_tree.evaluation_code != ''
      eval (parse_tree.evaluation_code)
  ....

These commands create a parser using all of the commands contained in the Data_Query_Langague.treetop file.  Then I parse the commands a user enters with $parser.parse.  If the parser tree is not null and if it returns something from the evaluation_code ruby code then we have a winner.  We now take the evaluation code returned from the parser and evaluate it with Ruby's eval method.  If we put this code in a loop then we have a REPL that we can use to run a series of DQL (Data Query Language) commands.

The key thing to remember is that the evaluation code is run in the context of the Ruby program that is creating the parser instance.  By using the Ruby readline and rainbow gems you can create such convenience features as auto command completion and colored error/information output.

The only things left for me to do were to create the classes that would enable me to interface with my databases and easily manipulate result sets that they returned.  This was accomplished using the DBI, net/telnet, net/ftp and good old fashioned Ruby classes.

My database classes had the following structure:

class DBInfo
  def initialize

  end
  
  def log_in
    
  end 
  
  def run_query
    
  end
  
  def get_table_names
    
  end
  
  def get_table_cols
    
  end
  
  def close
    
  end  
end

My SQL Server and UniData classes inherit from DBInfo.  Your classes would depend on the type of databases you were interfacing with.

The more interesting classes are the result set classes (excuse the mess, I haven't had time to properly re-factor these):

class ResultSet
  attr_reader :query, :query_type, :column_chars, :name
  
  def initialize(orig_result_temp_file, query, query_type, column_chars = '|', name = 'default')
    @orig_result_file = orig_result_temp_file
    @curr_result_file = @orig_result_file    
    @query = query
    @query_type = query_type
    @column_chars = column_chars
    @name = name
  end
  
def value
    # Read Results From (Possibly Filtered) Temporary Result Files
    if File.exist?(@curr_result_file.path)
      return File.read(@curr_result_file.path)
    else
      puts Rainbow("ERROR: Result File \"#{@curr_result_file.path}\"Does Not Exist").red
      return ""      
    end 
  end
  
  def original_value
   # Return Original Results (Before Filters And Matches)
    if File.exist?(@orig_result_file.path)
      return File.read(@orig_result_file.path)
    else
      puts Rainbow("ERROR: Result File \"#{@orig_result_file.path}\"Does Not Exist").red
      return ""      
    end     
  end
  
  def print_line_in_columns(lines_to_display = 0)
    line_counter = 1
    
    if File.exist?(@curr_result_file.path)
      # Even At One Byte Per Line It Will Never Have More Than This Many Lines
      if lines_to_display <= 0 then lines_to_display = File.size(@curr_result_file.path) end
      
      File.readlines(@curr_result_file.path).each do |line|
        break if line_counter > lines_to_display
        puts line.gsub(/#{@column_chars}/n,"\t")
        line_counter = line_counter + 1
      end
    else
      puts Rainbow("Results File \"#{@curr_result_file.path}\" Does Not Exist").red
      exit
    end
  end
  
  def write_to_excel(filename)
    # Write This Result Set To Excel (xlsx)
    excel_filename = "#{$base_path}/DQL/Excel/#{filename}.xlsx"
    p = Axlsx::Package.new
    p.use_autowidth = true
    wb = p.workbook
    wb.add_worksheet(:name => "Results") do |sheet|
      File.open(@curr_result_file.path,"r").each_line do |line|
        sheet.add_row(line.split(/#{@column_chars}/n))
      end
    end
    p.serialize(excel_filename)
  end
  
  def revert_to_original_value
    # Revert Result Set To Original Value (Before Filters And Matches)
    if !@orig_result_file == @curr_result_file
      @curr_result_file.close
      @curr_result_file.unlink
    end
    @curr_result_file = @orig_result_file
    puts Rainbow("Result Set \"" + name + "\" Reverted To Original Data").green
    self
  end
  
  def filter_out(filter_query = "")
    # Filter Things Out Of The Result Set
    if @orig_result_file == @curr_result_file
      if not(File.exist?(@orig_result_file.path))
        puts Rainbow("ERROR: Result File \"#{@orig_result_file.path}\"Does Not Exist").red
        exit        
      end
      source = @orig_result_file
      destination = Tempfile.new('OpenODBCECL_Dest')
    else
      if not(File.exist?(@curr_result_file.path))
        puts Rainbow("ERROR: Result File \"#{@curr_result_file.path}\"Does Not Exist").red
        exit        
      end      
      source = @curr_result_file
      destination = Tempfile.new('OpenODBCECL_Dest')
    end
    # Create New Result File Base On The Matching Query
    output = File.open( destination.path,"a" )    
    File.readlines(source.path).each do |line|
      if line !~/#{filter_query}/i
        output << line
      end   
    end      
    output.close
    if @orig_result_file != @curr_result_file
      # Remove Intermediate Result File If It Is Not Equal To Original Results
      @curr_result_file.close
      @curr_result_file.unlink
    end
    # Set Current Result File To New Destination
    @curr_result_file = destination
    # Give Status
    puts Rainbow("Filter \"#{filter_query}\" Applied To Result Set \"" + name + "\"").green
    # Return self So You Can Chain Filters
    self      
  end
  
  def matching(matching_query = "")
    # Return Items Matching matching_query
    if @orig_result_file == @curr_result_file
      if not(File.exist?(@orig_result_file.path))
        puts Rainbow("ERROR: Result File \"#{@orig_result_file.path}\"Does Not Exist").red
        exit        
      end
      source = @orig_result_file
      destination = Tempfile.new('OpenODBCECL_Dest')
    else
      if not(File.exist?(@curr_result_file.path))
        puts Rainbow("ERROR: Result File \"#{@curr_result_file.path}\"Does Not Exist").red
        exit        
      end      
      source = @curr_result_file
      destination = Tempfile.new('OpenODBCECL_Dest')
    end
    # Create New Result File Base On The Matching Query
    output = File.open( destination.path,"a" )    
    File.readlines(source.path).each do |line|
      if line =~/#{matching_query}/i
        output << line
      end   
    end      
    output.close
    if @orig_result_file != @curr_result_file
      # Remove Intermediate Result File If It Is Not Equal To Original Results
      @curr_result_file.close
      @curr_result_file.unlink
    end
    # Set Current Result File To New Destination
    @curr_result_file = destination
    # Give Status
    puts Rainbow("Match Query \"#{matching_query}\" Applied To Result Set \"" + name + "\"").green
    # Return self So You Can Chain Filters
    self
  end
    
  def clean_up_result_files
    # Clean Up Result Files
    if !@orig_result_file.nil?
      if File.exist?(@orig_result_file.path)
        @orig_result_file.close
        @orig_result_file.unlink
      end
    end
    # @curr_result_file.path.nil? Necessary If @orig_result_file = @curr_result_file
    if !@curr_result_file.nil? and !@curr_result_file.path.nil?
      if File.exist?(@curr_result_file.path)
        @curr_result_file.close
        @curr_result_file.unlink
      end 
    end
  end
  
  def close
    clean_up_result_files
  end  
end

class ResultSetDictionary
  
  def initialize
    @result_set_hash = Hash.new
  end
  
  def result_set_exists?(key)
     if @result_set_hash[key].nil?
       return false
     else
       return true 
     end
  end
  
  def list_result_sets()
    puts "Existing Result Sets:"
    @result_set_hash.keys.each {|key| puts "    #{key}"}
  end
  
  def get_result_set(result_set_name)
    #puts @result_set_hash[result_set_name].value
    if result_set_exists?(result_set_name)
      return @result_set_hash[result_set_name]
    else
      puts "Result Set \"#{result_set_name}\" Does Not Exist"
    end 
  end
  
  def delete_result_set(result_set_name)
    if result_set_exists?(result_set_name)
      get_result_set(result_set_name).close
      @result_set_hash.delete(result_set_name)
      puts Rainbow("Result Set \"#{result_set_name}\" Is Deleted.").green
    else
      puts Rainbow("ERROR: Result Set \"#{result_set_name}\" Does Not Exist.").red
      return
    end
  end
  
  def add_existing_result_set(existing_result_set, result_set_name = 'default')
    # Given A ResultSet Add It To My Dictionary
    if result_set_exists?(result_set_name)
      puts Rainbow("ERROR: Result Set #{result_set_name} For Database Type #{existing_result_set.query_type} All Ready Exists.  Deleting Existing Key").red
      # return nil
      delete_result_set(result_set_name)
    end
    @result_set_hash.store(result_set_name,existing_result_set)
    return @result_set_hash[result_set_name]
  end
  
  def add_new_result_set(orig_result_file, result_set_name = 'default', database_type = '', column_chars = '|')
    # Given Proper Items Add A New Result Set To The Result Set Dictionary
    if result_set_exists?(result_set_name)
      puts Rainbow("ERROR: Result Set #{result_set_name} For Database Type #{database_type} All Ready Exists.  Deleting Existing Key").color(255,102,0)
      #return nil
      delete_result_set(result_set_name)
    end
    @result_set_hash.store(result_set_name,ResultSet.new(orig_result_file, query, database_type, result_set_name))
    return @result_set_hash[result_set_name]
  end
  
  def cleanup
    puts "Cleaning Up Result Sets"
    # Close All Result Sets
    @result_set_hash.keys.each {|key| puts "    Result Set \"#{key}\" Cleaned"; @result_set_hash[key].close}
    puts "Result Sets Cleaned"
  end
end

Basically these two classes let me reference result sets by name in my evaluation_code Treetop definitions.  Result sets are basically just temporary text files with my results in them (one record per line).  Filtering and matching are done by reading these files on line at a time and dumping appropriate lines into new temporary result set files.  These temporary files are cleaned up on program exit or when they are deleted.

This is the process I used to create my DQL meta programming language.  I was able to create the whole language using 634 lines of Treetop definitions and a 734 line program that contains all the database, result set and REPL logic.

Overall I am impressed with Treetop's capabilities.  More documentation would be nice but answers can be found on the Internet (with a lot of effort).  I'm hoping this will become another resource for people curious about this outstanding library.