Multi-query PHP Parser for SQL files











up vote
0
down vote

favorite












I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).



PHP mysqli() does not allow multi-queries, and this problem occurs around quite often, when (for example) updating a site-module/widget to a higher version, and there is an update.sql file of some sorts. Idem for de-installations.



Given the following (1):



//DB connection values
$sHost = "localhost";
$sName = "test";
$sUser = "";
$sPass = "";
$sPort = 3307;


//The following could be retrieved using file_get_contents, or a file streamer
$sFileContents = <<<EOT
-- This is the first comment
SELECT * FROM dl_bookmarks WHERE iID=3;

/* This is the second comment */
SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";

# This is the third comment
SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';
EOT;


The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):



//Variant 1: Run a local SQL file. Since we stored our SQL contents in a 
//variable (could have been retrieved before using eg. file_get_contents),
//we need to temporarily create a file for this

$sTempFile = tempnam(sys_get_temp_dir(), 'Sql');

//Create the temp file
if(!file_put_contents($sTempFile, $sFileContents)) {
trigger_error("Failed to create temporary file", E_USER_ERROR);
}

//Assemble the command
$sCommand = 'mysql'
. ' --host=' . $sHost
. ' --port=' . $sPort
. ' --user=' . $sUser
. ' --password=' . $sPass
. ' --database=' . $sName
. ' --execute="SOURCE ' . $sTempFile . '"'
;
$sOutput = shell_exec($sCommand);

//Cleanup: remove the temp file
if(!unlink($sTempFile)) {
trigger_error("Failed to remove temporary file", E_USER_ERROR);
}


...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.



So, to avoid importing heavy external parsing libraries with heavy support to solve a small site-related problem, many projects loop through the SQL file, and split at every ; character into a new query, which is executed individually. Because SQL values can contain these characters too, this process fails in special cases, and the split is only done for ; characters at the end of a line (but this still fails for multi-line SQL values with this character appearing at the end.)



My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.



My code (with the above first block sample preceeding):



//Variant 2: Run a parser

//Connect to the database
$rMysqlI = new mysqli("localhost", "", "", "test", $sPort);
if ($rMysqlI->connect_errno) {
trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);
}



//START_OF_PARSER

$iCur = 0; //Current character pointer inside the SQL content
$iInside = 0; //The context, in which the pointer is currently located (is the pointer inside a
//comment, an SQL query, or deeper into an SQL query value?)
$sBuffer = ""; //The buffer of the next individual query
$aQueries = array(); //The list of queries
while($iCur < strlen($sFileContents)) {

switch ($iInside) {
case 0: //Inside query-context
//Change context: Comments beginning with --
if(substr($sFileContents, $iCur, 2) === "--") {
$iCur++;
$iInside = 2;

//Change context: Comments beginning with /*
} elseif(substr($sFileContents, $iCur, 2) === "/*") {
$iCur++;
$iInside = 3;

//Change context: Comments beginning with #
} elseif(substr($sFileContents, $iCur, 1) === "#") {
$iInside = 2;

//Separator for a new query
} elseif(substr($sFileContents, $iCur, 1) === ";") {
$aQueries = trim($sBuffer); //$sBuffer; //Add current buffer to a unique array query item
$sBuffer = ""; //Start a new buffer

//Change context: query values opened with '
} elseif(substr($sFileContents, $iCur, 1) === "'") {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 1;

//Change context: query values opened with "
} elseif(substr($sFileContents, $iCur, 1) === '"') {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 4;

//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;

case 1: //Inside value-context, ending with '

//Escaping character found within the query-value
if(substr($sFileContents, $iCur, 1) === "\") {
$sBuffer .= substr($sFileContents, $iCur, 2);
$iCur++; //Skip next char

//The ending character for the query-value is found
} elseif(substr($sFileContents, $iCur, 1) === "'") {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 0;

//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;

case 4: //Inside value-context, ending with "

//Escaping character found within the query-value
if(substr($sFileContents, $iCur, 1) === "\") {
$sBuffer .= substr($sFileContents, $iCur, 2);
$iCur = $iCur + 1; //Skip next char

//The ending character for the query-value is found
} elseif(substr($sFileContents, $iCur, 1) === '"') {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 0;

//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;

case 2: //Inside comment-context, ending with newline

//A two-character newline is found, signalling the end of the comment
if(substr($sFileContents, $iCur, 2) === "rn") {
$iCur++;
$iInside = 0;

//A single-character newline is found, signalling the end of the comment
} elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {
$iInside = 0;
}
break;

case 3: //Inside comment-context, ending with */

//A two-character */ is found, signalling the end of the comment
if(substr($sFileContents, $iCur, 2) === "*/") {
$iCur++;
$iInside = 0;
}
break;

default:
break;
}
$iCur++;
}

//END_OF_PARSER

//Preview our results
foreach($aQueries as $sQuery) {
if (!$rMysqlI->query($sQuery)) {
echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;
} else {
echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;
}
}


For my problem-area, I am currently only interested in what is between START_OF_PARSER and END_OF_PARSER, ie. the parser bit, or the block of code that converts SQL content to an array of individual queries, which can/may be executed individually later on (as above foreach does), producing the same result as if the Variant 1 code block was used.



The concerns that I have about my code:





  • Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?


  • Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?


  • Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?


  • DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?


Executing this code seems to work under Windows (and presumably Linux too).










share|improve this question














bumped to the homepage by Community 15 hours ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.



















    up vote
    0
    down vote

    favorite












    I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).



    PHP mysqli() does not allow multi-queries, and this problem occurs around quite often, when (for example) updating a site-module/widget to a higher version, and there is an update.sql file of some sorts. Idem for de-installations.



    Given the following (1):



    //DB connection values
    $sHost = "localhost";
    $sName = "test";
    $sUser = "";
    $sPass = "";
    $sPort = 3307;


    //The following could be retrieved using file_get_contents, or a file streamer
    $sFileContents = <<<EOT
    -- This is the first comment
    SELECT * FROM dl_bookmarks WHERE iID=3;

    /* This is the second comment */
    SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";

    # This is the third comment
    SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';
    EOT;


    The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):



    //Variant 1: Run a local SQL file. Since we stored our SQL contents in a 
    //variable (could have been retrieved before using eg. file_get_contents),
    //we need to temporarily create a file for this

    $sTempFile = tempnam(sys_get_temp_dir(), 'Sql');

    //Create the temp file
    if(!file_put_contents($sTempFile, $sFileContents)) {
    trigger_error("Failed to create temporary file", E_USER_ERROR);
    }

    //Assemble the command
    $sCommand = 'mysql'
    . ' --host=' . $sHost
    . ' --port=' . $sPort
    . ' --user=' . $sUser
    . ' --password=' . $sPass
    . ' --database=' . $sName
    . ' --execute="SOURCE ' . $sTempFile . '"'
    ;
    $sOutput = shell_exec($sCommand);

    //Cleanup: remove the temp file
    if(!unlink($sTempFile)) {
    trigger_error("Failed to remove temporary file", E_USER_ERROR);
    }


    ...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.



    So, to avoid importing heavy external parsing libraries with heavy support to solve a small site-related problem, many projects loop through the SQL file, and split at every ; character into a new query, which is executed individually. Because SQL values can contain these characters too, this process fails in special cases, and the split is only done for ; characters at the end of a line (but this still fails for multi-line SQL values with this character appearing at the end.)



    My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.



    My code (with the above first block sample preceeding):



    //Variant 2: Run a parser

    //Connect to the database
    $rMysqlI = new mysqli("localhost", "", "", "test", $sPort);
    if ($rMysqlI->connect_errno) {
    trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);
    }



    //START_OF_PARSER

    $iCur = 0; //Current character pointer inside the SQL content
    $iInside = 0; //The context, in which the pointer is currently located (is the pointer inside a
    //comment, an SQL query, or deeper into an SQL query value?)
    $sBuffer = ""; //The buffer of the next individual query
    $aQueries = array(); //The list of queries
    while($iCur < strlen($sFileContents)) {

    switch ($iInside) {
    case 0: //Inside query-context
    //Change context: Comments beginning with --
    if(substr($sFileContents, $iCur, 2) === "--") {
    $iCur++;
    $iInside = 2;

    //Change context: Comments beginning with /*
    } elseif(substr($sFileContents, $iCur, 2) === "/*") {
    $iCur++;
    $iInside = 3;

    //Change context: Comments beginning with #
    } elseif(substr($sFileContents, $iCur, 1) === "#") {
    $iInside = 2;

    //Separator for a new query
    } elseif(substr($sFileContents, $iCur, 1) === ";") {
    $aQueries = trim($sBuffer); //$sBuffer; //Add current buffer to a unique array query item
    $sBuffer = ""; //Start a new buffer

    //Change context: query values opened with '
    } elseif(substr($sFileContents, $iCur, 1) === "'") {
    $sBuffer .= substr($sFileContents, $iCur, 1);
    $iInside = 1;

    //Change context: query values opened with "
    } elseif(substr($sFileContents, $iCur, 1) === '"') {
    $sBuffer .= substr($sFileContents, $iCur, 1);
    $iInside = 4;

    //Not a special character
    } else {
    $sBuffer .= substr($sFileContents, $iCur, 1);
    }
    break;

    case 1: //Inside value-context, ending with '

    //Escaping character found within the query-value
    if(substr($sFileContents, $iCur, 1) === "\") {
    $sBuffer .= substr($sFileContents, $iCur, 2);
    $iCur++; //Skip next char

    //The ending character for the query-value is found
    } elseif(substr($sFileContents, $iCur, 1) === "'") {
    $sBuffer .= substr($sFileContents, $iCur, 1);
    $iInside = 0;

    //Not a special character
    } else {
    $sBuffer .= substr($sFileContents, $iCur, 1);
    }
    break;

    case 4: //Inside value-context, ending with "

    //Escaping character found within the query-value
    if(substr($sFileContents, $iCur, 1) === "\") {
    $sBuffer .= substr($sFileContents, $iCur, 2);
    $iCur = $iCur + 1; //Skip next char

    //The ending character for the query-value is found
    } elseif(substr($sFileContents, $iCur, 1) === '"') {
    $sBuffer .= substr($sFileContents, $iCur, 1);
    $iInside = 0;

    //Not a special character
    } else {
    $sBuffer .= substr($sFileContents, $iCur, 1);
    }
    break;

    case 2: //Inside comment-context, ending with newline

    //A two-character newline is found, signalling the end of the comment
    if(substr($sFileContents, $iCur, 2) === "rn") {
    $iCur++;
    $iInside = 0;

    //A single-character newline is found, signalling the end of the comment
    } elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {
    $iInside = 0;
    }
    break;

    case 3: //Inside comment-context, ending with */

    //A two-character */ is found, signalling the end of the comment
    if(substr($sFileContents, $iCur, 2) === "*/") {
    $iCur++;
    $iInside = 0;
    }
    break;

    default:
    break;
    }
    $iCur++;
    }

    //END_OF_PARSER

    //Preview our results
    foreach($aQueries as $sQuery) {
    if (!$rMysqlI->query($sQuery)) {
    echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;
    } else {
    echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;
    }
    }


    For my problem-area, I am currently only interested in what is between START_OF_PARSER and END_OF_PARSER, ie. the parser bit, or the block of code that converts SQL content to an array of individual queries, which can/may be executed individually later on (as above foreach does), producing the same result as if the Variant 1 code block was used.



    The concerns that I have about my code:





    • Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?


    • Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?


    • Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?


    • DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?


    Executing this code seems to work under Windows (and presumably Linux too).










    share|improve this question














    bumped to the homepage by Community 15 hours ago


    This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

















      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).



      PHP mysqli() does not allow multi-queries, and this problem occurs around quite often, when (for example) updating a site-module/widget to a higher version, and there is an update.sql file of some sorts. Idem for de-installations.



      Given the following (1):



      //DB connection values
      $sHost = "localhost";
      $sName = "test";
      $sUser = "";
      $sPass = "";
      $sPort = 3307;


      //The following could be retrieved using file_get_contents, or a file streamer
      $sFileContents = <<<EOT
      -- This is the first comment
      SELECT * FROM dl_bookmarks WHERE iID=3;

      /* This is the second comment */
      SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";

      # This is the third comment
      SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';
      EOT;


      The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):



      //Variant 1: Run a local SQL file. Since we stored our SQL contents in a 
      //variable (could have been retrieved before using eg. file_get_contents),
      //we need to temporarily create a file for this

      $sTempFile = tempnam(sys_get_temp_dir(), 'Sql');

      //Create the temp file
      if(!file_put_contents($sTempFile, $sFileContents)) {
      trigger_error("Failed to create temporary file", E_USER_ERROR);
      }

      //Assemble the command
      $sCommand = 'mysql'
      . ' --host=' . $sHost
      . ' --port=' . $sPort
      . ' --user=' . $sUser
      . ' --password=' . $sPass
      . ' --database=' . $sName
      . ' --execute="SOURCE ' . $sTempFile . '"'
      ;
      $sOutput = shell_exec($sCommand);

      //Cleanup: remove the temp file
      if(!unlink($sTempFile)) {
      trigger_error("Failed to remove temporary file", E_USER_ERROR);
      }


      ...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.



      So, to avoid importing heavy external parsing libraries with heavy support to solve a small site-related problem, many projects loop through the SQL file, and split at every ; character into a new query, which is executed individually. Because SQL values can contain these characters too, this process fails in special cases, and the split is only done for ; characters at the end of a line (but this still fails for multi-line SQL values with this character appearing at the end.)



      My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.



      My code (with the above first block sample preceeding):



      //Variant 2: Run a parser

      //Connect to the database
      $rMysqlI = new mysqli("localhost", "", "", "test", $sPort);
      if ($rMysqlI->connect_errno) {
      trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);
      }



      //START_OF_PARSER

      $iCur = 0; //Current character pointer inside the SQL content
      $iInside = 0; //The context, in which the pointer is currently located (is the pointer inside a
      //comment, an SQL query, or deeper into an SQL query value?)
      $sBuffer = ""; //The buffer of the next individual query
      $aQueries = array(); //The list of queries
      while($iCur < strlen($sFileContents)) {

      switch ($iInside) {
      case 0: //Inside query-context
      //Change context: Comments beginning with --
      if(substr($sFileContents, $iCur, 2) === "--") {
      $iCur++;
      $iInside = 2;

      //Change context: Comments beginning with /*
      } elseif(substr($sFileContents, $iCur, 2) === "/*") {
      $iCur++;
      $iInside = 3;

      //Change context: Comments beginning with #
      } elseif(substr($sFileContents, $iCur, 1) === "#") {
      $iInside = 2;

      //Separator for a new query
      } elseif(substr($sFileContents, $iCur, 1) === ";") {
      $aQueries = trim($sBuffer); //$sBuffer; //Add current buffer to a unique array query item
      $sBuffer = ""; //Start a new buffer

      //Change context: query values opened with '
      } elseif(substr($sFileContents, $iCur, 1) === "'") {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      $iInside = 1;

      //Change context: query values opened with "
      } elseif(substr($sFileContents, $iCur, 1) === '"') {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      $iInside = 4;

      //Not a special character
      } else {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      }
      break;

      case 1: //Inside value-context, ending with '

      //Escaping character found within the query-value
      if(substr($sFileContents, $iCur, 1) === "\") {
      $sBuffer .= substr($sFileContents, $iCur, 2);
      $iCur++; //Skip next char

      //The ending character for the query-value is found
      } elseif(substr($sFileContents, $iCur, 1) === "'") {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      $iInside = 0;

      //Not a special character
      } else {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      }
      break;

      case 4: //Inside value-context, ending with "

      //Escaping character found within the query-value
      if(substr($sFileContents, $iCur, 1) === "\") {
      $sBuffer .= substr($sFileContents, $iCur, 2);
      $iCur = $iCur + 1; //Skip next char

      //The ending character for the query-value is found
      } elseif(substr($sFileContents, $iCur, 1) === '"') {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      $iInside = 0;

      //Not a special character
      } else {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      }
      break;

      case 2: //Inside comment-context, ending with newline

      //A two-character newline is found, signalling the end of the comment
      if(substr($sFileContents, $iCur, 2) === "rn") {
      $iCur++;
      $iInside = 0;

      //A single-character newline is found, signalling the end of the comment
      } elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {
      $iInside = 0;
      }
      break;

      case 3: //Inside comment-context, ending with */

      //A two-character */ is found, signalling the end of the comment
      if(substr($sFileContents, $iCur, 2) === "*/") {
      $iCur++;
      $iInside = 0;
      }
      break;

      default:
      break;
      }
      $iCur++;
      }

      //END_OF_PARSER

      //Preview our results
      foreach($aQueries as $sQuery) {
      if (!$rMysqlI->query($sQuery)) {
      echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;
      } else {
      echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;
      }
      }


      For my problem-area, I am currently only interested in what is between START_OF_PARSER and END_OF_PARSER, ie. the parser bit, or the block of code that converts SQL content to an array of individual queries, which can/may be executed individually later on (as above foreach does), producing the same result as if the Variant 1 code block was used.



      The concerns that I have about my code:





      • Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?


      • Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?


      • Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?


      • DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?


      Executing this code seems to work under Windows (and presumably Linux too).










      share|improve this question













      I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).



      PHP mysqli() does not allow multi-queries, and this problem occurs around quite often, when (for example) updating a site-module/widget to a higher version, and there is an update.sql file of some sorts. Idem for de-installations.



      Given the following (1):



      //DB connection values
      $sHost = "localhost";
      $sName = "test";
      $sUser = "";
      $sPass = "";
      $sPort = 3307;


      //The following could be retrieved using file_get_contents, or a file streamer
      $sFileContents = <<<EOT
      -- This is the first comment
      SELECT * FROM dl_bookmarks WHERE iID=3;

      /* This is the second comment */
      SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";

      # This is the third comment
      SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';
      EOT;


      The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):



      //Variant 1: Run a local SQL file. Since we stored our SQL contents in a 
      //variable (could have been retrieved before using eg. file_get_contents),
      //we need to temporarily create a file for this

      $sTempFile = tempnam(sys_get_temp_dir(), 'Sql');

      //Create the temp file
      if(!file_put_contents($sTempFile, $sFileContents)) {
      trigger_error("Failed to create temporary file", E_USER_ERROR);
      }

      //Assemble the command
      $sCommand = 'mysql'
      . ' --host=' . $sHost
      . ' --port=' . $sPort
      . ' --user=' . $sUser
      . ' --password=' . $sPass
      . ' --database=' . $sName
      . ' --execute="SOURCE ' . $sTempFile . '"'
      ;
      $sOutput = shell_exec($sCommand);

      //Cleanup: remove the temp file
      if(!unlink($sTempFile)) {
      trigger_error("Failed to remove temporary file", E_USER_ERROR);
      }


      ...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.



      So, to avoid importing heavy external parsing libraries with heavy support to solve a small site-related problem, many projects loop through the SQL file, and split at every ; character into a new query, which is executed individually. Because SQL values can contain these characters too, this process fails in special cases, and the split is only done for ; characters at the end of a line (but this still fails for multi-line SQL values with this character appearing at the end.)



      My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.



      My code (with the above first block sample preceeding):



      //Variant 2: Run a parser

      //Connect to the database
      $rMysqlI = new mysqli("localhost", "", "", "test", $sPort);
      if ($rMysqlI->connect_errno) {
      trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);
      }



      //START_OF_PARSER

      $iCur = 0; //Current character pointer inside the SQL content
      $iInside = 0; //The context, in which the pointer is currently located (is the pointer inside a
      //comment, an SQL query, or deeper into an SQL query value?)
      $sBuffer = ""; //The buffer of the next individual query
      $aQueries = array(); //The list of queries
      while($iCur < strlen($sFileContents)) {

      switch ($iInside) {
      case 0: //Inside query-context
      //Change context: Comments beginning with --
      if(substr($sFileContents, $iCur, 2) === "--") {
      $iCur++;
      $iInside = 2;

      //Change context: Comments beginning with /*
      } elseif(substr($sFileContents, $iCur, 2) === "/*") {
      $iCur++;
      $iInside = 3;

      //Change context: Comments beginning with #
      } elseif(substr($sFileContents, $iCur, 1) === "#") {
      $iInside = 2;

      //Separator for a new query
      } elseif(substr($sFileContents, $iCur, 1) === ";") {
      $aQueries = trim($sBuffer); //$sBuffer; //Add current buffer to a unique array query item
      $sBuffer = ""; //Start a new buffer

      //Change context: query values opened with '
      } elseif(substr($sFileContents, $iCur, 1) === "'") {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      $iInside = 1;

      //Change context: query values opened with "
      } elseif(substr($sFileContents, $iCur, 1) === '"') {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      $iInside = 4;

      //Not a special character
      } else {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      }
      break;

      case 1: //Inside value-context, ending with '

      //Escaping character found within the query-value
      if(substr($sFileContents, $iCur, 1) === "\") {
      $sBuffer .= substr($sFileContents, $iCur, 2);
      $iCur++; //Skip next char

      //The ending character for the query-value is found
      } elseif(substr($sFileContents, $iCur, 1) === "'") {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      $iInside = 0;

      //Not a special character
      } else {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      }
      break;

      case 4: //Inside value-context, ending with "

      //Escaping character found within the query-value
      if(substr($sFileContents, $iCur, 1) === "\") {
      $sBuffer .= substr($sFileContents, $iCur, 2);
      $iCur = $iCur + 1; //Skip next char

      //The ending character for the query-value is found
      } elseif(substr($sFileContents, $iCur, 1) === '"') {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      $iInside = 0;

      //Not a special character
      } else {
      $sBuffer .= substr($sFileContents, $iCur, 1);
      }
      break;

      case 2: //Inside comment-context, ending with newline

      //A two-character newline is found, signalling the end of the comment
      if(substr($sFileContents, $iCur, 2) === "rn") {
      $iCur++;
      $iInside = 0;

      //A single-character newline is found, signalling the end of the comment
      } elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {
      $iInside = 0;
      }
      break;

      case 3: //Inside comment-context, ending with */

      //A two-character */ is found, signalling the end of the comment
      if(substr($sFileContents, $iCur, 2) === "*/") {
      $iCur++;
      $iInside = 0;
      }
      break;

      default:
      break;
      }
      $iCur++;
      }

      //END_OF_PARSER

      //Preview our results
      foreach($aQueries as $sQuery) {
      if (!$rMysqlI->query($sQuery)) {
      echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;
      } else {
      echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;
      }
      }


      For my problem-area, I am currently only interested in what is between START_OF_PARSER and END_OF_PARSER, ie. the parser bit, or the block of code that converts SQL content to an array of individual queries, which can/may be executed individually later on (as above foreach does), producing the same result as if the Variant 1 code block was used.



      The concerns that I have about my code:





      • Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?


      • Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?


      • Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?


      • DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?


      Executing this code seems to work under Windows (and presumably Linux too).







      php sql parsing file






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Feb 13 at 17:36









      Florian Mertens

      1011




      1011





      bumped to the homepage by Community 15 hours ago


      This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







      bumped to the homepage by Community 15 hours ago


      This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
























          1 Answer
          1






          active

          oldest

          votes

















          up vote
          0
          down vote













          Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.






          share|improve this answer





















          • You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
            – Florian Mertens
            Feb 14 at 11:28












          • On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
            – Florian Mertens
            Feb 14 at 11:31










          • @FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
            – Mike Brant
            Feb 15 at 22:59












          • The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
            – Florian Mertens
            Feb 16 at 3:26











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "196"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f187488%2fmulti-query-php-parser-for-sql-files%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          0
          down vote













          Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.






          share|improve this answer





















          • You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
            – Florian Mertens
            Feb 14 at 11:28












          • On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
            – Florian Mertens
            Feb 14 at 11:31










          • @FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
            – Mike Brant
            Feb 15 at 22:59












          • The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
            – Florian Mertens
            Feb 16 at 3:26















          up vote
          0
          down vote













          Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.






          share|improve this answer





















          • You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
            – Florian Mertens
            Feb 14 at 11:28












          • On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
            – Florian Mertens
            Feb 14 at 11:31










          • @FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
            – Mike Brant
            Feb 15 at 22:59












          • The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
            – Florian Mertens
            Feb 16 at 3:26













          up vote
          0
          down vote










          up vote
          0
          down vote









          Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.






          share|improve this answer












          Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Feb 14 at 3:49









          Mike Brant

          8,733619




          8,733619












          • You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
            – Florian Mertens
            Feb 14 at 11:28












          • On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
            – Florian Mertens
            Feb 14 at 11:31










          • @FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
            – Mike Brant
            Feb 15 at 22:59












          • The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
            – Florian Mertens
            Feb 16 at 3:26


















          • You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
            – Florian Mertens
            Feb 14 at 11:28












          • On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
            – Florian Mertens
            Feb 14 at 11:31










          • @FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
            – Mike Brant
            Feb 15 at 22:59












          • The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
            – Florian Mertens
            Feb 16 at 3:26
















          You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
          – Florian Mertens
          Feb 14 at 11:28






          You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
          – Florian Mertens
          Feb 14 at 11:28














          On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
          – Florian Mertens
          Feb 14 at 11:31




          On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
          – Florian Mertens
          Feb 14 at 11:31












          @FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
          – Mike Brant
          Feb 15 at 22:59






          @FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
          – Mike Brant
          Feb 15 at 22:59














          The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
          – Florian Mertens
          Feb 16 at 3:26




          The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
          – Florian Mertens
          Feb 16 at 3:26


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Code Review Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.





          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


          Please pay close attention to the following guidance:


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f187488%2fmulti-query-php-parser-for-sql-files%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Ellipse (mathématiques)

          Quarter-circle Tiles

          Mont Emei