Multi-query PHP Parser for SQL files

up vote
0
down vote

favorite

I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).

PHP mysqli() does not allow multi-queries, and this problem occurs around quite often, when (for example) updating a site-module/widget to a higher version, and there is an update.sql file of some sorts. Idem for de-installations.

Given the following (1):

//DB connection values

$sHost = "localhost";

$sName = "test";

$sUser = "";

$sPass = "";

$sPort = 3307;





//The following could be retrieved using file_get_contents, or a file streamer

$sFileContents = <<<EOT

 -- This is the first comment

SELECT * FROM dl_bookmarks WHERE iID=3;



 /* This is the second comment */

SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";



 # This is the third comment

SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';

EOT;

The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):

//Variant 1: Run a local SQL file. Since we stored our SQL contents in a 

//variable (could have been retrieved before using eg. file_get_contents), 

//we need to temporarily create a file for this



$sTempFile = tempnam(sys_get_temp_dir(), 'Sql');



//Create the temp file

if(!file_put_contents($sTempFile, $sFileContents)) {

    trigger_error("Failed to create temporary file", E_USER_ERROR);

}



//Assemble the command

$sCommand = 'mysql'

        . ' --host=' . $sHost

        . ' --port=' . $sPort

        . ' --user=' . $sUser

        . ' --password=' . $sPass

        . ' --database=' . $sName

        . ' --execute="SOURCE ' . $sTempFile . '"'

                ;

$sOutput = shell_exec($sCommand);



//Cleanup: remove the temp file

if(!unlink($sTempFile)) {

    trigger_error("Failed to remove temporary file", E_USER_ERROR);

}

...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.

So, to avoid importing heavy external parsing libraries with heavy support to solve a small site-related problem, many projects loop through the SQL file, and split at every ; character into a new query, which is executed individually. Because SQL values can contain these characters too, this process fails in special cases, and the split is only done for ; characters at the end of a line (but this still fails for multi-line SQL values with this character appearing at the end.)

My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.

My code (with the above first block sample preceeding):

//Variant 2: Run a parser



//Connect to the database

$rMysqlI = new mysqli("localhost", "", "", "test", $sPort);

if ($rMysqlI->connect_errno) {

    trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);

}







//START_OF_PARSER



$iCur = 0;            //Current character pointer inside the SQL content

$iInside = 0;         //The context, in which the pointer is currently located (is the pointer inside a 

                      //comment, an SQL query, or deeper into an SQL query value?)

$sBuffer = "";        //The buffer of the next individual query

$aQueries = array();  //The list of queries

while($iCur < strlen($sFileContents)) {



    switch ($iInside) {

        case 0: //Inside query-context

            //Change context: Comments beginning with --

            if(substr($sFileContents, $iCur, 2) === "--") {

                $iCur++;

                $iInside = 2;



                //Change context: Comments beginning with /*

            } elseif(substr($sFileContents, $iCur, 2) === "/*") {

                $iCur++;

                $iInside = 3;



                //Change context: Comments beginning with #

            } elseif(substr($sFileContents, $iCur, 1) === "#") {

                $iInside = 2;



                //Separator for a new query

            } elseif(substr($sFileContents, $iCur, 1) === ";") {

                $aQueries = trim($sBuffer); //$sBuffer;  //Add current buffer to a unique array query item

                $sBuffer = "";  //Start a new buffer



                //Change context: query values opened with '

            } elseif(substr($sFileContents, $iCur, 1) === "'") {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 1;



                //Change context: query values opened with "

            } elseif(substr($sFileContents, $iCur, 1) === '"') {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 4;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 1: //Inside value-context, ending with '



            //Escaping character found within the query-value

            if(substr($sFileContents, $iCur, 1) === "\") {

                $sBuffer .= substr($sFileContents, $iCur, 2);

                $iCur++;  //Skip next char



                //The ending character for the query-value is found

            } elseif(substr($sFileContents, $iCur, 1) === "'") {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 0;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 4: //Inside value-context, ending with "



            //Escaping character found within the query-value

            if(substr($sFileContents, $iCur, 1) === "\") {

                $sBuffer .= substr($sFileContents, $iCur, 2);

                $iCur = $iCur + 1;  //Skip next char



                //The ending character for the query-value is found

            } elseif(substr($sFileContents, $iCur, 1) === '"') {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 0;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 2: //Inside comment-context, ending with newline



            //A two-character newline is found, signalling the end of the comment

            if(substr($sFileContents, $iCur, 2) === "rn") {

                $iCur++;

                $iInside = 0;



                //A single-character newline is found, signalling the end of the comment

            } elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {

                $iInside = 0;

            }

            break;



        case 3: //Inside comment-context, ending with */



            //A two-character */ is found, signalling the end of the comment

            if(substr($sFileContents, $iCur, 2) === "*/") {

                $iCur++;

                $iInside = 0;

            }

            break;



        default:

            break;

    }

    $iCur++;

}



//END_OF_PARSER



//Preview our results

foreach($aQueries as $sQuery) {

    if (!$rMysqlI->query($sQuery)) {

        echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;

    } else {

        echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;

    }

}

For my problem-area, I am currently only interested in what is between START_OF_PARSER and END_OF_PARSER, ie. the parser bit, or the block of code that converts SQL content to an array of individual queries, which can/may be executed individually later on (as above foreach does), producing the same result as if the Variant 1 code block was used.

The concerns that I have about my code:

Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?

Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?

Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?

DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?

Executing this code seems to work under Windows (and presumably Linux too).

asked Feb 13 at 17:36

Florian Mertens

1011

bumped to the homepage by Community♦ 15 hours ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

add a comment |

up vote
0
down vote

favorite

I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).

Given the following (1):

//DB connection values

$sHost = "localhost";

$sName = "test";

$sUser = "";

$sPass = "";

$sPort = 3307;





//The following could be retrieved using file_get_contents, or a file streamer

$sFileContents = <<<EOT

 -- This is the first comment

SELECT * FROM dl_bookmarks WHERE iID=3;



 /* This is the second comment */

SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";



 # This is the third comment

SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';

EOT;

The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):

//Variant 1: Run a local SQL file. Since we stored our SQL contents in a 

//variable (could have been retrieved before using eg. file_get_contents), 

//we need to temporarily create a file for this



$sTempFile = tempnam(sys_get_temp_dir(), 'Sql');



//Create the temp file

if(!file_put_contents($sTempFile, $sFileContents)) {

    trigger_error("Failed to create temporary file", E_USER_ERROR);

}



//Assemble the command

$sCommand = 'mysql'

        . ' --host=' . $sHost

        . ' --port=' . $sPort

        . ' --user=' . $sUser

        . ' --password=' . $sPass

        . ' --database=' . $sName

        . ' --execute="SOURCE ' . $sTempFile . '"'

                ;

$sOutput = shell_exec($sCommand);



//Cleanup: remove the temp file

if(!unlink($sTempFile)) {

    trigger_error("Failed to remove temporary file", E_USER_ERROR);

}

...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.

My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.

My code (with the above first block sample preceeding):

//Variant 2: Run a parser



//Connect to the database

$rMysqlI = new mysqli("localhost", "", "", "test", $sPort);

if ($rMysqlI->connect_errno) {

    trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);

}







//START_OF_PARSER



$iCur = 0;            //Current character pointer inside the SQL content

$iInside = 0;         //The context, in which the pointer is currently located (is the pointer inside a 

                      //comment, an SQL query, or deeper into an SQL query value?)

$sBuffer = "";        //The buffer of the next individual query

$aQueries = array();  //The list of queries

while($iCur < strlen($sFileContents)) {



    switch ($iInside) {

        case 0: //Inside query-context

            //Change context: Comments beginning with --

            if(substr($sFileContents, $iCur, 2) === "--") {

                $iCur++;

                $iInside = 2;



                //Change context: Comments beginning with /*

            } elseif(substr($sFileContents, $iCur, 2) === "/*") {

                $iCur++;

                $iInside = 3;



                //Change context: Comments beginning with #

            } elseif(substr($sFileContents, $iCur, 1) === "#") {

                $iInside = 2;



                //Separator for a new query

            } elseif(substr($sFileContents, $iCur, 1) === ";") {

                $aQueries = trim($sBuffer); //$sBuffer;  //Add current buffer to a unique array query item

                $sBuffer = "";  //Start a new buffer



                //Change context: query values opened with '

            } elseif(substr($sFileContents, $iCur, 1) === "'") {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 1;



                //Change context: query values opened with "

            } elseif(substr($sFileContents, $iCur, 1) === '"') {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 4;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 1: //Inside value-context, ending with '



            //Escaping character found within the query-value

            if(substr($sFileContents, $iCur, 1) === "\") {

                $sBuffer .= substr($sFileContents, $iCur, 2);

                $iCur++;  //Skip next char



                //The ending character for the query-value is found

            } elseif(substr($sFileContents, $iCur, 1) === "'") {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 0;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 4: //Inside value-context, ending with "



            //Escaping character found within the query-value

            if(substr($sFileContents, $iCur, 1) === "\") {

                $sBuffer .= substr($sFileContents, $iCur, 2);

                $iCur = $iCur + 1;  //Skip next char



                //The ending character for the query-value is found

            } elseif(substr($sFileContents, $iCur, 1) === '"') {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 0;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 2: //Inside comment-context, ending with newline



            //A two-character newline is found, signalling the end of the comment

            if(substr($sFileContents, $iCur, 2) === "rn") {

                $iCur++;

                $iInside = 0;



                //A single-character newline is found, signalling the end of the comment

            } elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {

                $iInside = 0;

            }

            break;



        case 3: //Inside comment-context, ending with */



            //A two-character */ is found, signalling the end of the comment

            if(substr($sFileContents, $iCur, 2) === "*/") {

                $iCur++;

                $iInside = 0;

            }

            break;



        default:

            break;

    }

    $iCur++;

}



//END_OF_PARSER



//Preview our results

foreach($aQueries as $sQuery) {

    if (!$rMysqlI->query($sQuery)) {

        echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;

    } else {

        echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;

    }

}

The concerns that I have about my code:

Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?

Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?

Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?

DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?

Executing this code seems to work under Windows (and presumably Linux too).

asked Feb 13 at 17:36

Florian Mertens

1011

bumped to the homepage by Community♦ 15 hours ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

add a comment |

up vote
0
down vote

favorite

I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).

Given the following (1):

//DB connection values

$sHost = "localhost";

$sName = "test";

$sUser = "";

$sPass = "";

$sPort = 3307;





//The following could be retrieved using file_get_contents, or a file streamer

$sFileContents = <<<EOT

 -- This is the first comment

SELECT * FROM dl_bookmarks WHERE iID=3;



 /* This is the second comment */

SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";



 # This is the third comment

SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';

EOT;

The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):

//Variant 1: Run a local SQL file. Since we stored our SQL contents in a 

//variable (could have been retrieved before using eg. file_get_contents), 

//we need to temporarily create a file for this



$sTempFile = tempnam(sys_get_temp_dir(), 'Sql');



//Create the temp file

if(!file_put_contents($sTempFile, $sFileContents)) {

    trigger_error("Failed to create temporary file", E_USER_ERROR);

}



//Assemble the command

$sCommand = 'mysql'

        . ' --host=' . $sHost

        . ' --port=' . $sPort

        . ' --user=' . $sUser

        . ' --password=' . $sPass

        . ' --database=' . $sName

        . ' --execute="SOURCE ' . $sTempFile . '"'

                ;

$sOutput = shell_exec($sCommand);



//Cleanup: remove the temp file

if(!unlink($sTempFile)) {

    trigger_error("Failed to remove temporary file", E_USER_ERROR);

}

...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.

My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.

My code (with the above first block sample preceeding):

//Variant 2: Run a parser



//Connect to the database

$rMysqlI = new mysqli("localhost", "", "", "test", $sPort);

if ($rMysqlI->connect_errno) {

    trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);

}







//START_OF_PARSER



$iCur = 0;            //Current character pointer inside the SQL content

$iInside = 0;         //The context, in which the pointer is currently located (is the pointer inside a 

                      //comment, an SQL query, or deeper into an SQL query value?)

$sBuffer = "";        //The buffer of the next individual query

$aQueries = array();  //The list of queries

while($iCur < strlen($sFileContents)) {



    switch ($iInside) {

        case 0: //Inside query-context

            //Change context: Comments beginning with --

            if(substr($sFileContents, $iCur, 2) === "--") {

                $iCur++;

                $iInside = 2;



                //Change context: Comments beginning with /*

            } elseif(substr($sFileContents, $iCur, 2) === "/*") {

                $iCur++;

                $iInside = 3;



                //Change context: Comments beginning with #

            } elseif(substr($sFileContents, $iCur, 1) === "#") {

                $iInside = 2;



                //Separator for a new query

            } elseif(substr($sFileContents, $iCur, 1) === ";") {

                $aQueries = trim($sBuffer); //$sBuffer;  //Add current buffer to a unique array query item

                $sBuffer = "";  //Start a new buffer



                //Change context: query values opened with '

            } elseif(substr($sFileContents, $iCur, 1) === "'") {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 1;



                //Change context: query values opened with "

            } elseif(substr($sFileContents, $iCur, 1) === '"') {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 4;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 1: //Inside value-context, ending with '



            //Escaping character found within the query-value

            if(substr($sFileContents, $iCur, 1) === "\") {

                $sBuffer .= substr($sFileContents, $iCur, 2);

                $iCur++;  //Skip next char



                //The ending character for the query-value is found

            } elseif(substr($sFileContents, $iCur, 1) === "'") {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 0;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 4: //Inside value-context, ending with "



            //Escaping character found within the query-value

            if(substr($sFileContents, $iCur, 1) === "\") {

                $sBuffer .= substr($sFileContents, $iCur, 2);

                $iCur = $iCur + 1;  //Skip next char



                //The ending character for the query-value is found

            } elseif(substr($sFileContents, $iCur, 1) === '"') {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 0;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 2: //Inside comment-context, ending with newline



            //A two-character newline is found, signalling the end of the comment

            if(substr($sFileContents, $iCur, 2) === "rn") {

                $iCur++;

                $iInside = 0;



                //A single-character newline is found, signalling the end of the comment

            } elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {

                $iInside = 0;

            }

            break;



        case 3: //Inside comment-context, ending with */



            //A two-character */ is found, signalling the end of the comment

            if(substr($sFileContents, $iCur, 2) === "*/") {

                $iCur++;

                $iInside = 0;

            }

            break;



        default:

            break;

    }

    $iCur++;

}



//END_OF_PARSER



//Preview our results

foreach($aQueries as $sQuery) {

    if (!$rMysqlI->query($sQuery)) {

        echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;

    } else {

        echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;

    }

}

The concerns that I have about my code:

Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?

Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?

Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?

DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?

Executing this code seems to work under Windows (and presumably Linux too).

asked Feb 13 at 17:36

Florian Mertens

1011

I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).

Given the following (1):

//DB connection values

$sHost = "localhost";

$sName = "test";

$sUser = "";

$sPass = "";

$sPort = 3307;





//The following could be retrieved using file_get_contents, or a file streamer

$sFileContents = <<<EOT

 -- This is the first comment

SELECT * FROM dl_bookmarks WHERE iID=3;



 /* This is the second comment */

SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";



 # This is the third comment

SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';

EOT;

The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):

//Variant 1: Run a local SQL file. Since we stored our SQL contents in a 

//variable (could have been retrieved before using eg. file_get_contents), 

//we need to temporarily create a file for this



$sTempFile = tempnam(sys_get_temp_dir(), 'Sql');



//Create the temp file

if(!file_put_contents($sTempFile, $sFileContents)) {

    trigger_error("Failed to create temporary file", E_USER_ERROR);

}



//Assemble the command

$sCommand = 'mysql'

        . ' --host=' . $sHost

        . ' --port=' . $sPort

        . ' --user=' . $sUser

        . ' --password=' . $sPass

        . ' --database=' . $sName

        . ' --execute="SOURCE ' . $sTempFile . '"'

                ;

$sOutput = shell_exec($sCommand);



//Cleanup: remove the temp file

if(!unlink($sTempFile)) {

    trigger_error("Failed to remove temporary file", E_USER_ERROR);

}

...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.

My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.

My code (with the above first block sample preceeding):

//Variant 2: Run a parser



//Connect to the database

$rMysqlI = new mysqli("localhost", "", "", "test", $sPort);

if ($rMysqlI->connect_errno) {

    trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);

}







//START_OF_PARSER



$iCur = 0;            //Current character pointer inside the SQL content

$iInside = 0;         //The context, in which the pointer is currently located (is the pointer inside a 

                      //comment, an SQL query, or deeper into an SQL query value?)

$sBuffer = "";        //The buffer of the next individual query

$aQueries = array();  //The list of queries

while($iCur < strlen($sFileContents)) {



    switch ($iInside) {

        case 0: //Inside query-context

            //Change context: Comments beginning with --

            if(substr($sFileContents, $iCur, 2) === "--") {

                $iCur++;

                $iInside = 2;



                //Change context: Comments beginning with /*

            } elseif(substr($sFileContents, $iCur, 2) === "/*") {

                $iCur++;

                $iInside = 3;



                //Change context: Comments beginning with #

            } elseif(substr($sFileContents, $iCur, 1) === "#") {

                $iInside = 2;



                //Separator for a new query

            } elseif(substr($sFileContents, $iCur, 1) === ";") {

                $aQueries = trim($sBuffer); //$sBuffer;  //Add current buffer to a unique array query item

                $sBuffer = "";  //Start a new buffer



                //Change context: query values opened with '

            } elseif(substr($sFileContents, $iCur, 1) === "'") {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 1;



                //Change context: query values opened with "

            } elseif(substr($sFileContents, $iCur, 1) === '"') {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 4;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 1: //Inside value-context, ending with '



            //Escaping character found within the query-value

            if(substr($sFileContents, $iCur, 1) === "\") {

                $sBuffer .= substr($sFileContents, $iCur, 2);

                $iCur++;  //Skip next char



                //The ending character for the query-value is found

            } elseif(substr($sFileContents, $iCur, 1) === "'") {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 0;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 4: //Inside value-context, ending with "



            //Escaping character found within the query-value

            if(substr($sFileContents, $iCur, 1) === "\") {

                $sBuffer .= substr($sFileContents, $iCur, 2);

                $iCur = $iCur + 1;  //Skip next char



                //The ending character for the query-value is found

            } elseif(substr($sFileContents, $iCur, 1) === '"') {

                $sBuffer .= substr($sFileContents, $iCur, 1);

                $iInside = 0;



                //Not a special character

            } else {

                $sBuffer .= substr($sFileContents, $iCur, 1);

            }

            break;



        case 2: //Inside comment-context, ending with newline



            //A two-character newline is found, signalling the end of the comment

            if(substr($sFileContents, $iCur, 2) === "rn") {

                $iCur++;

                $iInside = 0;



                //A single-character newline is found, signalling the end of the comment

            } elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {

                $iInside = 0;

            }

            break;



        case 3: //Inside comment-context, ending with */



            //A two-character */ is found, signalling the end of the comment

            if(substr($sFileContents, $iCur, 2) === "*/") {

                $iCur++;

                $iInside = 0;

            }

            break;



        default:

            break;

    }

    $iCur++;

}



//END_OF_PARSER



//Preview our results

foreach($aQueries as $sQuery) {

    if (!$rMysqlI->query($sQuery)) {

        echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;

    } else {

        echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;

    }

}

The concerns that I have about my code:

Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?

Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?

Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?

DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?

Executing this code seems to work under Windows (and presumably Linux too).

php sql parsing file

asked Feb 13 at 17:36

Florian Mertens

1011

asked Feb 13 at 17:36

Florian Mertens

1011

asked Feb 13 at 17:36

Florian Mertens

1011

asked Feb 13 at 17:36

Florian Mertens

1011

asked Feb 13 at 17:36

Florian Mertens

1011

bumped to the homepage by Community♦ 15 hours ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ 15 hours ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

add a comment |

1 Answer
1

active

oldest

votes

up vote
0
down vote

Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.

answered Feb 14 at 3:49

Mike Brant

8,733619

You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28

On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31

@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59

The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f187488%2fmulti-query-php-parser-for-sql-files%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
0
down vote

answered Feb 14 at 3:49

Mike Brant

8,733619

You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28

On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31

@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59

The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26

add a comment |

up vote
0
down vote

answered Feb 14 at 3:49

Mike Brant

8,733619

You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28

On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31

@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59

The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26

add a comment |

up vote
0
down vote

answered Feb 14 at 3:49

Mike Brant

8,733619

answered Feb 14 at 3:49

Mike Brant

8,733619

answered Feb 14 at 3:49

Mike Brant

8,733619

answered Feb 14 at 3:49

Mike Brant

8,733619

answered Feb 14 at 3:49

Mike Brant

8,733619

You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28

On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31

@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59

The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26

add a comment |

You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28

On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31

@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59

The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26

You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28

On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31

@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59

The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Code Review Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Krdytkyu