Multi-query PHP Parser for SQL files
up vote
0
down vote
favorite
I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).
PHP mysqli() does not allow multi-queries, and this problem occurs around quite often, when (for example) updating a site-module/widget to a higher version, and there is an update.sql file of some sorts. Idem for de-installations.
Given the following (1):
//DB connection values
$sHost = "localhost";
$sName = "test";
$sUser = "";
$sPass = "";
$sPort = 3307;
//The following could be retrieved using file_get_contents, or a file streamer
$sFileContents = <<<EOT
-- This is the first comment
SELECT * FROM dl_bookmarks WHERE iID=3;
/* This is the second comment */
SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";
# This is the third comment
SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';
EOT;
The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):
//Variant 1: Run a local SQL file. Since we stored our SQL contents in a
//variable (could have been retrieved before using eg. file_get_contents),
//we need to temporarily create a file for this
$sTempFile = tempnam(sys_get_temp_dir(), 'Sql');
//Create the temp file
if(!file_put_contents($sTempFile, $sFileContents)) {
trigger_error("Failed to create temporary file", E_USER_ERROR);
}
//Assemble the command
$sCommand = 'mysql'
. ' --host=' . $sHost
. ' --port=' . $sPort
. ' --user=' . $sUser
. ' --password=' . $sPass
. ' --database=' . $sName
. ' --execute="SOURCE ' . $sTempFile . '"'
;
$sOutput = shell_exec($sCommand);
//Cleanup: remove the temp file
if(!unlink($sTempFile)) {
trigger_error("Failed to remove temporary file", E_USER_ERROR);
}
...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.
So, to avoid importing heavy external parsing libraries with heavy support to solve a small site-related problem, many projects loop through the SQL file, and split at every ; character into a new query, which is executed individually. Because SQL values can contain these characters too, this process fails in special cases, and the split is only done for ; characters at the end of a line (but this still fails for multi-line SQL values with this character appearing at the end.)
My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.
My code (with the above first block sample preceeding):
//Variant 2: Run a parser
//Connect to the database
$rMysqlI = new mysqli("localhost", "", "", "test", $sPort);
if ($rMysqlI->connect_errno) {
trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);
}
//START_OF_PARSER
$iCur = 0; //Current character pointer inside the SQL content
$iInside = 0; //The context, in which the pointer is currently located (is the pointer inside a
//comment, an SQL query, or deeper into an SQL query value?)
$sBuffer = ""; //The buffer of the next individual query
$aQueries = array(); //The list of queries
while($iCur < strlen($sFileContents)) {
switch ($iInside) {
case 0: //Inside query-context
//Change context: Comments beginning with --
if(substr($sFileContents, $iCur, 2) === "--") {
$iCur++;
$iInside = 2;
//Change context: Comments beginning with /*
} elseif(substr($sFileContents, $iCur, 2) === "/*") {
$iCur++;
$iInside = 3;
//Change context: Comments beginning with #
} elseif(substr($sFileContents, $iCur, 1) === "#") {
$iInside = 2;
//Separator for a new query
} elseif(substr($sFileContents, $iCur, 1) === ";") {
$aQueries = trim($sBuffer); //$sBuffer; //Add current buffer to a unique array query item
$sBuffer = ""; //Start a new buffer
//Change context: query values opened with '
} elseif(substr($sFileContents, $iCur, 1) === "'") {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 1;
//Change context: query values opened with "
} elseif(substr($sFileContents, $iCur, 1) === '"') {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 4;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 1: //Inside value-context, ending with '
//Escaping character found within the query-value
if(substr($sFileContents, $iCur, 1) === "\") {
$sBuffer .= substr($sFileContents, $iCur, 2);
$iCur++; //Skip next char
//The ending character for the query-value is found
} elseif(substr($sFileContents, $iCur, 1) === "'") {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 0;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 4: //Inside value-context, ending with "
//Escaping character found within the query-value
if(substr($sFileContents, $iCur, 1) === "\") {
$sBuffer .= substr($sFileContents, $iCur, 2);
$iCur = $iCur + 1; //Skip next char
//The ending character for the query-value is found
} elseif(substr($sFileContents, $iCur, 1) === '"') {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 0;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 2: //Inside comment-context, ending with newline
//A two-character newline is found, signalling the end of the comment
if(substr($sFileContents, $iCur, 2) === "rn") {
$iCur++;
$iInside = 0;
//A single-character newline is found, signalling the end of the comment
} elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {
$iInside = 0;
}
break;
case 3: //Inside comment-context, ending with */
//A two-character */ is found, signalling the end of the comment
if(substr($sFileContents, $iCur, 2) === "*/") {
$iCur++;
$iInside = 0;
}
break;
default:
break;
}
$iCur++;
}
//END_OF_PARSER
//Preview our results
foreach($aQueries as $sQuery) {
if (!$rMysqlI->query($sQuery)) {
echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;
} else {
echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;
}
}
For my problem-area, I am currently only interested in what is between START_OF_PARSER and END_OF_PARSER, ie. the parser bit, or the block of code that converts SQL content to an array of individual queries, which can/may be executed individually later on (as above foreach does), producing the same result as if the Variant 1 code block was used.
The concerns that I have about my code:
Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?
Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?
Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?
DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?
Executing this code seems to work under Windows (and presumably Linux too).
php sql parsing file
bumped to the homepage by Community♦ 15 hours ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
up vote
0
down vote
favorite
I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).
PHP mysqli() does not allow multi-queries, and this problem occurs around quite often, when (for example) updating a site-module/widget to a higher version, and there is an update.sql file of some sorts. Idem for de-installations.
Given the following (1):
//DB connection values
$sHost = "localhost";
$sName = "test";
$sUser = "";
$sPass = "";
$sPort = 3307;
//The following could be retrieved using file_get_contents, or a file streamer
$sFileContents = <<<EOT
-- This is the first comment
SELECT * FROM dl_bookmarks WHERE iID=3;
/* This is the second comment */
SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";
# This is the third comment
SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';
EOT;
The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):
//Variant 1: Run a local SQL file. Since we stored our SQL contents in a
//variable (could have been retrieved before using eg. file_get_contents),
//we need to temporarily create a file for this
$sTempFile = tempnam(sys_get_temp_dir(), 'Sql');
//Create the temp file
if(!file_put_contents($sTempFile, $sFileContents)) {
trigger_error("Failed to create temporary file", E_USER_ERROR);
}
//Assemble the command
$sCommand = 'mysql'
. ' --host=' . $sHost
. ' --port=' . $sPort
. ' --user=' . $sUser
. ' --password=' . $sPass
. ' --database=' . $sName
. ' --execute="SOURCE ' . $sTempFile . '"'
;
$sOutput = shell_exec($sCommand);
//Cleanup: remove the temp file
if(!unlink($sTempFile)) {
trigger_error("Failed to remove temporary file", E_USER_ERROR);
}
...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.
So, to avoid importing heavy external parsing libraries with heavy support to solve a small site-related problem, many projects loop through the SQL file, and split at every ; character into a new query, which is executed individually. Because SQL values can contain these characters too, this process fails in special cases, and the split is only done for ; characters at the end of a line (but this still fails for multi-line SQL values with this character appearing at the end.)
My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.
My code (with the above first block sample preceeding):
//Variant 2: Run a parser
//Connect to the database
$rMysqlI = new mysqli("localhost", "", "", "test", $sPort);
if ($rMysqlI->connect_errno) {
trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);
}
//START_OF_PARSER
$iCur = 0; //Current character pointer inside the SQL content
$iInside = 0; //The context, in which the pointer is currently located (is the pointer inside a
//comment, an SQL query, or deeper into an SQL query value?)
$sBuffer = ""; //The buffer of the next individual query
$aQueries = array(); //The list of queries
while($iCur < strlen($sFileContents)) {
switch ($iInside) {
case 0: //Inside query-context
//Change context: Comments beginning with --
if(substr($sFileContents, $iCur, 2) === "--") {
$iCur++;
$iInside = 2;
//Change context: Comments beginning with /*
} elseif(substr($sFileContents, $iCur, 2) === "/*") {
$iCur++;
$iInside = 3;
//Change context: Comments beginning with #
} elseif(substr($sFileContents, $iCur, 1) === "#") {
$iInside = 2;
//Separator for a new query
} elseif(substr($sFileContents, $iCur, 1) === ";") {
$aQueries = trim($sBuffer); //$sBuffer; //Add current buffer to a unique array query item
$sBuffer = ""; //Start a new buffer
//Change context: query values opened with '
} elseif(substr($sFileContents, $iCur, 1) === "'") {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 1;
//Change context: query values opened with "
} elseif(substr($sFileContents, $iCur, 1) === '"') {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 4;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 1: //Inside value-context, ending with '
//Escaping character found within the query-value
if(substr($sFileContents, $iCur, 1) === "\") {
$sBuffer .= substr($sFileContents, $iCur, 2);
$iCur++; //Skip next char
//The ending character for the query-value is found
} elseif(substr($sFileContents, $iCur, 1) === "'") {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 0;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 4: //Inside value-context, ending with "
//Escaping character found within the query-value
if(substr($sFileContents, $iCur, 1) === "\") {
$sBuffer .= substr($sFileContents, $iCur, 2);
$iCur = $iCur + 1; //Skip next char
//The ending character for the query-value is found
} elseif(substr($sFileContents, $iCur, 1) === '"') {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 0;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 2: //Inside comment-context, ending with newline
//A two-character newline is found, signalling the end of the comment
if(substr($sFileContents, $iCur, 2) === "rn") {
$iCur++;
$iInside = 0;
//A single-character newline is found, signalling the end of the comment
} elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {
$iInside = 0;
}
break;
case 3: //Inside comment-context, ending with */
//A two-character */ is found, signalling the end of the comment
if(substr($sFileContents, $iCur, 2) === "*/") {
$iCur++;
$iInside = 0;
}
break;
default:
break;
}
$iCur++;
}
//END_OF_PARSER
//Preview our results
foreach($aQueries as $sQuery) {
if (!$rMysqlI->query($sQuery)) {
echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;
} else {
echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;
}
}
For my problem-area, I am currently only interested in what is between START_OF_PARSER and END_OF_PARSER, ie. the parser bit, or the block of code that converts SQL content to an array of individual queries, which can/may be executed individually later on (as above foreach does), producing the same result as if the Variant 1 code block was used.
The concerns that I have about my code:
Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?
Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?
Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?
DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?
Executing this code seems to work under Windows (and presumably Linux too).
php sql parsing file
bumped to the homepage by Community♦ 15 hours ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).
PHP mysqli() does not allow multi-queries, and this problem occurs around quite often, when (for example) updating a site-module/widget to a higher version, and there is an update.sql file of some sorts. Idem for de-installations.
Given the following (1):
//DB connection values
$sHost = "localhost";
$sName = "test";
$sUser = "";
$sPass = "";
$sPort = 3307;
//The following could be retrieved using file_get_contents, or a file streamer
$sFileContents = <<<EOT
-- This is the first comment
SELECT * FROM dl_bookmarks WHERE iID=3;
/* This is the second comment */
SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";
# This is the third comment
SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';
EOT;
The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):
//Variant 1: Run a local SQL file. Since we stored our SQL contents in a
//variable (could have been retrieved before using eg. file_get_contents),
//we need to temporarily create a file for this
$sTempFile = tempnam(sys_get_temp_dir(), 'Sql');
//Create the temp file
if(!file_put_contents($sTempFile, $sFileContents)) {
trigger_error("Failed to create temporary file", E_USER_ERROR);
}
//Assemble the command
$sCommand = 'mysql'
. ' --host=' . $sHost
. ' --port=' . $sPort
. ' --user=' . $sUser
. ' --password=' . $sPass
. ' --database=' . $sName
. ' --execute="SOURCE ' . $sTempFile . '"'
;
$sOutput = shell_exec($sCommand);
//Cleanup: remove the temp file
if(!unlink($sTempFile)) {
trigger_error("Failed to remove temporary file", E_USER_ERROR);
}
...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.
So, to avoid importing heavy external parsing libraries with heavy support to solve a small site-related problem, many projects loop through the SQL file, and split at every ; character into a new query, which is executed individually. Because SQL values can contain these characters too, this process fails in special cases, and the split is only done for ; characters at the end of a line (but this still fails for multi-line SQL values with this character appearing at the end.)
My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.
My code (with the above first block sample preceeding):
//Variant 2: Run a parser
//Connect to the database
$rMysqlI = new mysqli("localhost", "", "", "test", $sPort);
if ($rMysqlI->connect_errno) {
trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);
}
//START_OF_PARSER
$iCur = 0; //Current character pointer inside the SQL content
$iInside = 0; //The context, in which the pointer is currently located (is the pointer inside a
//comment, an SQL query, or deeper into an SQL query value?)
$sBuffer = ""; //The buffer of the next individual query
$aQueries = array(); //The list of queries
while($iCur < strlen($sFileContents)) {
switch ($iInside) {
case 0: //Inside query-context
//Change context: Comments beginning with --
if(substr($sFileContents, $iCur, 2) === "--") {
$iCur++;
$iInside = 2;
//Change context: Comments beginning with /*
} elseif(substr($sFileContents, $iCur, 2) === "/*") {
$iCur++;
$iInside = 3;
//Change context: Comments beginning with #
} elseif(substr($sFileContents, $iCur, 1) === "#") {
$iInside = 2;
//Separator for a new query
} elseif(substr($sFileContents, $iCur, 1) === ";") {
$aQueries = trim($sBuffer); //$sBuffer; //Add current buffer to a unique array query item
$sBuffer = ""; //Start a new buffer
//Change context: query values opened with '
} elseif(substr($sFileContents, $iCur, 1) === "'") {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 1;
//Change context: query values opened with "
} elseif(substr($sFileContents, $iCur, 1) === '"') {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 4;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 1: //Inside value-context, ending with '
//Escaping character found within the query-value
if(substr($sFileContents, $iCur, 1) === "\") {
$sBuffer .= substr($sFileContents, $iCur, 2);
$iCur++; //Skip next char
//The ending character for the query-value is found
} elseif(substr($sFileContents, $iCur, 1) === "'") {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 0;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 4: //Inside value-context, ending with "
//Escaping character found within the query-value
if(substr($sFileContents, $iCur, 1) === "\") {
$sBuffer .= substr($sFileContents, $iCur, 2);
$iCur = $iCur + 1; //Skip next char
//The ending character for the query-value is found
} elseif(substr($sFileContents, $iCur, 1) === '"') {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 0;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 2: //Inside comment-context, ending with newline
//A two-character newline is found, signalling the end of the comment
if(substr($sFileContents, $iCur, 2) === "rn") {
$iCur++;
$iInside = 0;
//A single-character newline is found, signalling the end of the comment
} elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {
$iInside = 0;
}
break;
case 3: //Inside comment-context, ending with */
//A two-character */ is found, signalling the end of the comment
if(substr($sFileContents, $iCur, 2) === "*/") {
$iCur++;
$iInside = 0;
}
break;
default:
break;
}
$iCur++;
}
//END_OF_PARSER
//Preview our results
foreach($aQueries as $sQuery) {
if (!$rMysqlI->query($sQuery)) {
echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;
} else {
echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;
}
}
For my problem-area, I am currently only interested in what is between START_OF_PARSER and END_OF_PARSER, ie. the parser bit, or the block of code that converts SQL content to an array of individual queries, which can/may be executed individually later on (as above foreach does), producing the same result as if the Variant 1 code block was used.
The concerns that I have about my code:
Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?
Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?
Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?
DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?
Executing this code seems to work under Windows (and presumably Linux too).
php sql parsing file
I have an external project, which needs to run an SQL file containing multiple SQL queries (a bit like mysqldump, if you like, but is user-created, with any kind of SQL statement/definition).
PHP mysqli() does not allow multi-queries, and this problem occurs around quite often, when (for example) updating a site-module/widget to a higher version, and there is an update.sql file of some sorts. Idem for de-installations.
Given the following (1):
//DB connection values
$sHost = "localhost";
$sName = "test";
$sUser = "";
$sPass = "";
$sPort = 3307;
//The following could be retrieved using file_get_contents, or a file streamer
$sFileContents = <<<EOT
-- This is the first comment
SELECT * FROM dl_bookmarks WHERE iID=3;
/* This is the second comment */
SELECT * FROM dl_bookmarks WHERE sTitle=""Paragon" Initiative Enterprises Software consulting and web development for businesses \\ 'smes'";
# This is the third comment
SELECT * FROM dl_bookmarks WHERE sTitle LIKE '"xDEEP" Diving Equipment; Scuba Gear; Single tank BC; Side Mount; Double tank BCD; Diving computer 'equipment'';
EOT;
The ideal solution would be to execute the SQL file at command line, as the following code is demonstrating (with the above sample preceeding):
//Variant 1: Run a local SQL file. Since we stored our SQL contents in a
//variable (could have been retrieved before using eg. file_get_contents),
//we need to temporarily create a file for this
$sTempFile = tempnam(sys_get_temp_dir(), 'Sql');
//Create the temp file
if(!file_put_contents($sTempFile, $sFileContents)) {
trigger_error("Failed to create temporary file", E_USER_ERROR);
}
//Assemble the command
$sCommand = 'mysql'
. ' --host=' . $sHost
. ' --port=' . $sPort
. ' --user=' . $sUser
. ' --password=' . $sPass
. ' --database=' . $sName
. ' --execute="SOURCE ' . $sTempFile . '"'
;
$sOutput = shell_exec($sCommand);
//Cleanup: remove the temp file
if(!unlink($sTempFile)) {
trigger_error("Failed to remove temporary file", E_USER_ERROR);
}
...but some projects are on shared servers or other limited access servers where shell execution is not allowed, or the mysql command may not be available.
So, to avoid importing heavy external parsing libraries with heavy support to solve a small site-related problem, many projects loop through the SQL file, and split at every ; character into a new query, which is executed individually. Because SQL values can contain these characters too, this process fails in special cases, and the split is only done for ; characters at the end of a line (but this still fails for multi-line SQL values with this character appearing at the end.)
My proposed solution is to properly parse SQL files/content, and split the queries where the queries truly end, according to SQL standards. I am seeking help with this.
My code (with the above first block sample preceeding):
//Variant 2: Run a parser
//Connect to the database
$rMysqlI = new mysqli("localhost", "", "", "test", $sPort);
if ($rMysqlI->connect_errno) {
trigger_error("Failed to connect to MySQL: (" . $rMysqlI->connect_errno . ") " . $rMysqlI->connect_error, E_USER_ERROR);
}
//START_OF_PARSER
$iCur = 0; //Current character pointer inside the SQL content
$iInside = 0; //The context, in which the pointer is currently located (is the pointer inside a
//comment, an SQL query, or deeper into an SQL query value?)
$sBuffer = ""; //The buffer of the next individual query
$aQueries = array(); //The list of queries
while($iCur < strlen($sFileContents)) {
switch ($iInside) {
case 0: //Inside query-context
//Change context: Comments beginning with --
if(substr($sFileContents, $iCur, 2) === "--") {
$iCur++;
$iInside = 2;
//Change context: Comments beginning with /*
} elseif(substr($sFileContents, $iCur, 2) === "/*") {
$iCur++;
$iInside = 3;
//Change context: Comments beginning with #
} elseif(substr($sFileContents, $iCur, 1) === "#") {
$iInside = 2;
//Separator for a new query
} elseif(substr($sFileContents, $iCur, 1) === ";") {
$aQueries = trim($sBuffer); //$sBuffer; //Add current buffer to a unique array query item
$sBuffer = ""; //Start a new buffer
//Change context: query values opened with '
} elseif(substr($sFileContents, $iCur, 1) === "'") {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 1;
//Change context: query values opened with "
} elseif(substr($sFileContents, $iCur, 1) === '"') {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 4;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 1: //Inside value-context, ending with '
//Escaping character found within the query-value
if(substr($sFileContents, $iCur, 1) === "\") {
$sBuffer .= substr($sFileContents, $iCur, 2);
$iCur++; //Skip next char
//The ending character for the query-value is found
} elseif(substr($sFileContents, $iCur, 1) === "'") {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 0;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 4: //Inside value-context, ending with "
//Escaping character found within the query-value
if(substr($sFileContents, $iCur, 1) === "\") {
$sBuffer .= substr($sFileContents, $iCur, 2);
$iCur = $iCur + 1; //Skip next char
//The ending character for the query-value is found
} elseif(substr($sFileContents, $iCur, 1) === '"') {
$sBuffer .= substr($sFileContents, $iCur, 1);
$iInside = 0;
//Not a special character
} else {
$sBuffer .= substr($sFileContents, $iCur, 1);
}
break;
case 2: //Inside comment-context, ending with newline
//A two-character newline is found, signalling the end of the comment
if(substr($sFileContents, $iCur, 2) === "rn") {
$iCur++;
$iInside = 0;
//A single-character newline is found, signalling the end of the comment
} elseif(substr($sFileContents, $iCur, 1) === "n" || substr($sFileContents, $iCur, 1) === "r") {
$iInside = 0;
}
break;
case 3: //Inside comment-context, ending with */
//A two-character */ is found, signalling the end of the comment
if(substr($sFileContents, $iCur, 2) === "*/") {
$iCur++;
$iInside = 0;
}
break;
default:
break;
}
$iCur++;
}
//END_OF_PARSER
//Preview our results
foreach($aQueries as $sQuery) {
if (!$rMysqlI->query($sQuery)) {
echo "ERROR "{$sQuery}": (" . $rMysqlI->errno . ") " . $rMysqlI->error . "<br />", E_USER_ERROR;
} else {
echo "SUCCESS "{$sQuery}"<br />", E_USER_ERROR;
}
}
For my problem-area, I am currently only interested in what is between START_OF_PARSER and END_OF_PARSER, ie. the parser bit, or the block of code that converts SQL content to an array of individual queries, which can/may be executed individually later on (as above foreach does), producing the same result as if the Variant 1 code block was used.
The concerns that I have about my code:
Query-value encapsulation characters: Are they always ' or ", or are other query-value encapsulation characters used?
Query-value Escaping characters: I don't think its a good idea to assume your own escaping skills (security reasons..), but ignoring security concerns for now, I assume looking out for a single escaping character (backslash ) is enough, to discover which query-value character is truly the ending query-value character. But is the backslash the only escaping character (or method) in other SQL dialects?
Comment ignoring: To remove comments, I look for /* and */, -- and newline, # and newline characters. Is this enough?
DB Abstraction: I assumed MySQL, but I know that the problem I am addressing is for PearDB, which could be Oracle or PostGreSQL or LiteSQL or etc. Do those SQL dialects also have the same characters for comments, escaping, value-encapsulation, etc? Will this work for most of those DBs ?
Executing this code seems to work under Windows (and presumably Linux too).
php sql parsing file
php sql parsing file
asked Feb 13 at 17:36
Florian Mertens
1011
1011
bumped to the homepage by Community♦ 15 hours ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 15 hours ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
up vote
0
down vote
Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.
You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28
On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31
@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59
The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.
You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28
On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31
@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59
The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26
add a comment |
up vote
0
down vote
Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.
You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28
On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31
@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59
The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26
add a comment |
up vote
0
down vote
up vote
0
down vote
Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.
Why are you trying to build a SQL parser? This seems like a lot of work and a lot of potential fragility for a use case better suited for other approaches. You are dealing with a large SQL script. Execute it as such. Don’t parse it apart.
answered Feb 14 at 3:49
Mike Brant
8,733619
8,733619
You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28
On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31
@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59
The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26
add a comment |
You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28
On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31
@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59
The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26
You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28
You are invited to copy paste your answer to all those projects out there, that already have coded their own bad parser... I am only improving a terrible implementation (see the brief description above) for a terrible ongoing practice.
– Florian Mertens
Feb 14 at 11:28
On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31
On your proposal to 'Execute it as such.': This is not always possible (see my description on limitations), and projects are using this general method to work around.
– Florian Mertens
Feb 14 at 11:31
@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59
@FlorianMertens So what makes your terrible experience better? Is it building a complex and fragile SQL parser or it is getting your software into better environments? The are many free/cheap options for deploying your software in environments that aren't using the antiquated shared hosting model, and in which you can install software that does what you want and allows you to have full control over your environment. This is really a technical debt problem here. Do you continue to throw more investment into your bad technical debt or do you get rid of it? My review advice is to dump the debt
– Mike Brant
Feb 15 at 22:59
The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26
The current problem is related to opensource projects, installed on thousands of unknown hosts across the world. It is not my host, but the hosts of thousands of other administrators/developers. However, if you wish to reformulate the review circumstances for a different review, you can indeed copy/paste my review and create a new one. I am sticking to my review. Please reread it. The hosting environment is not part or considered part of the review.
– Florian Mertens
Feb 16 at 3:26
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f187488%2fmulti-query-php-parser-for-sql-files%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown