Simplest PHP Site Search-engine Using Unix Grep
Grep is a common Unix command for searching text. It searches one or
more input files for lines containing a match to a specified pattern
and, by default, prints the matching lines.
PHP can call external programs, including the Unix commands on your
Linux server. With grep, it is easy to build a simple search engine. We
will add a little complexity by putting the form that accepts the search
string and the code that displays the results in the same file.
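As a minimal sketch of the "call an external program" idea, the hypothetical helper grep_files() below runs grep from PHP with exec() and returns its output lines. It assumes a grep that accepts -H (GNU and BSD grep both do) and quotes every argument with escapeshellarg(), so a search string is never run as a shell command.

```php
<?php
// Hypothetical helper (not from the original article): run grep over the
// regular files in $dir and return its "filename:line" output lines.
// Assumes a grep that supports -H (GNU and BSD grep both do).
function grep_files(string $pattern, string $dir): array
{
    // Expand the file list in PHP instead of relying on the shell's "*".
    $files = array_filter(glob(rtrim($dir, '/') . '/*') ?: [], 'is_file');
    if ($pattern === '' || count($files) === 0) {
        return [];
    }
    // -i: ignore case, -F: literal string, -H: always print the file name,
    // --: end of options, so a pattern starting with "-" is not an option.
    $cmd = 'grep -i -F -H -- ' . escapeshellarg($pattern);
    foreach ($files as $file) {
        $cmd .= ' ' . escapeshellarg($file);
    }
    $output = [];
    $status = 0;
    exec($cmd, $output, $status);
    // grep exits with 0 if something matched, 1 if nothing matched, 2 on error.
    if ($status > 1) {
        throw new RuntimeException("grep failed with status $status");
    }
    return $output;
}
```

Passing the file names explicitly (rather than a shell glob) keeps every argument quoted; the trade-off is a longer command line on very large directories.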
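Here is the PHP script, with the PHP code and the HTML search form all
in one page. Save it in a file with a .php extension, in the same
directory as the pages you want to search, since the grep command
searches every file (*) in the current directory: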
<html><head><title>Grep Search-engine</title></head>
<body>
<h1>Grep Search-engine with PHP</h1>
<p>
Search <a href="http://programmabilities.com/"
title="programmabilities.com">Programmabilities.com</a>:
</p>
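<?php
// register_globals is gone from modern PHP, so read the form field from $_POST.
$searchstr = isset($_POST['searchstr']) ? trim($_POST['searchstr']) : '';
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<p>
<input type="text" name="searchstr"
value="<?php echo htmlspecialchars($searchstr); ?>" size="20"
maxlength="30"/>
<input type="submit" value="Search!"/>
</p>
</form>
<?php
if ($searchstr !== '') {
    // We have a search string, so call grep and display the results.
    echo '<hr/><br/>';
    // Case-insensitive (-i) search for the literal string (-F) in all files
    // in the current directory; -H prints the file name even if only one
    // file is searched. escapeshellarg() stops the input from being run as
    // shell code, and -- stops it from being read as a grep option.
    $cmdstr = 'grep -i -F -H -- ' . escapeshellarg($searchstr) . ' *';
    $fp = popen($cmdstr, 'r'); // open the output of the command as a pipe
    $myresult = array(); // to hold the search results
    while (($buffer = fgets($fp, 4096)) !== false) {
        // grep returns each hit in the format
        //   filename:line
        // so explode() with a limit of 2 splits at the first ':' only
        $parts = explode(':', $buffer, 2);
        if (count($parts) < 2) {
            continue; // not a filename:line hit
        }
        list($fname, $fline) = $parts;
        // take only the first hit per file (isset() checks the array key;
        // defined() only checks for constants)
        if (! isset($myresult[$fname])) {
            // strip_tags() removes HTML from the hit, as fgetss() used to
            $myresult[$fname] = strip_tags($fline);
        }
    }
    pclose($fp);
    // the results are in an associative array; walk through it and print them
    if (count($myresult)) {
        echo "<ol>\n";
        foreach ($myresult as $fname => $fline) {
            $name = htmlspecialchars($fname);
            echo "<li><a href=\"$name\">$name</a> : "
                . htmlspecialchars($fline) . "</li>\n";
        }
        echo "</ol>\n";
    } else {
        // no hits
        echo 'Sorry. Search on <strong>' . htmlspecialchars($searchstr)
            . "</strong> returned no results.<br/>\n";
    }
}
?>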
</body>
</html>
...And that's it! By using the standard Unix grep command on your Linux
server, you don't have to write reams of PHP code from scratch for the
search part of your search engine.
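The heart of the script is the loop that turns grep's "filename:line" output into one hit per file. As a sketch, that step can be pulled out into a hypothetical function, first_hit_per_file(), and tested without running grep at all. It uses explode() with a limit of 2 so colons inside the matching line survive, and isset() to keep only the first hit for each file.

```php
<?php
// Hypothetical function (not from the original article): turn grep output
// lines ("filename:matching line") into file name => first matching line.
function first_hit_per_file(array $grepLines): array
{
    $results = [];
    foreach ($grepLines as $line) {
        // limit 2: split at the first ':' only, so colons inside the
        // matching text stay part of the second piece
        $parts = explode(':', rtrim($line, "\r\n"), 2);
        if (count($parts) < 2) {
            continue; // not in filename:line form
        }
        [$fname, $fline] = $parts;
        if (!isset($results[$fname])) {
            // strip_tags() does what fgetss() used to do while reading
            $results[$fname] = trim(strip_tags($fline));
        }
    }
    return $results;
}

$hits = first_hit_per_file([
    "a.html:<p>Meeting at 10:30</p>\n",
    "a.html:second hit\n",
    "b.html:Hello\n",
]);
// $hits is ['a.html' => 'Meeting at 10:30', 'b.html' => 'Hello']
```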
Please note that this is not an efficient way to implement a search
engine, although it is a good way to learn about PHP. Every search
greps every document on the site, so the server repeats the full amount
of work each time a user searches, and that load grows with the size of
the site. That is why better search engines index all pages ahead of
time, for example in a database of keywords or a single index file built
from all pages, and search only that index. The index has to be rebuilt
every time the site is updated, but in the long run this puts far less
strain on the server.
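A rough sketch of the index-based approach follows. It assumes the pages are .html files in a single directory and that a JSON file is an acceptable index; both are assumptions, not part of the original article. The hypothetical build_index() would be re-run whenever the site is updated, and search_index() then reads only the index, not every page.

```php
<?php
// Hypothetical sketch: build an inverted index (word => list of files)
// from the .html files in $dir and save it as JSON in $indexFile.
function build_index(string $dir, string $indexFile): void
{
    $index = [];
    foreach (glob(rtrim($dir, '/') . '/*.html') ?: [] as $path) {
        $text = strtolower(strip_tags(file_get_contents($path)));
        // split on anything that is not an ASCII letter or digit
        $words = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($words) as $word) {
            $index[$word][] = basename($path);
        }
    }
    file_put_contents($indexFile, json_encode($index));
}

// Look up one word in the saved index instead of reading every page.
function search_index(string $indexFile, string $word): array
{
    $index = json_decode(file_get_contents($indexFile), true) ?? [];
    return $index[strtolower(trim($word))] ?? [];
}
```

This only matches single whole words made of ASCII letters and digits; a real search engine would also handle phrases, other alphabets and ranking.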
Notes:
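- PHP_SELF holds the path of the currently executing script; in current PHP it is read as $_SERVER['PHP_SELF'].
- fgets() reads one line, at most 4095 bytes here (one less than the length argument of 4096).
- fgetss() was like fgets(), but stripped HTML and PHP tags from what it read. It was removed in PHP 8.0; fgets() followed by strip_tags() does the same job.
- split() was called with a limit of 2 so the line is split only at the first ':'; any further ':' characters stay in the second piece. split() was removed in PHP 7.0; explode() takes the same limit argument.
- each() returned the current key/value pair of an array and advanced its pointer. It was removed in PHP 8.0; foreach is the usual way to walk through an array.
- popen() / pclose() work like fopen() / fclose(), but open and close a pipe to a process (here, grep) instead of a file.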