Home > Articles > Data

Tolerant Search Algorithms with PHP and PostgreSQL

  • Print
  • + Share This
Picture yourself looking for a certain product in a Web shop, but you don't remember how to spell the name of the product correctly. Imagine looking for a person in your phone book, but you don't know the correct address. If you have had to face these problems, this article on tolerant search algorithms with PHP and PostgreSQL may be just what you have been looking for.
From the author of

Using soundex

One way to implement tolerant search algorithms is to use the soundex algorithm for PostgreSQL, which can be found in the contributed directory of the PostgreSQL source tree. soundex is a method that matches similar sounding words. The algorithms substitute every word with a sequence of characters. This way of finding similar names was first used by the United States Census in 1880, 1900, and 1910, and has turned out to be an efficient way to perform error-tolerant searching.

In this section, we will present a simple example written in PHP, but before we get to PHP, we'll create a database called data:

[hs@athlon hs]$ createdb data

In the next step, we go to the contrib/soundex directory in our PostgreSQL source tree, and compile the soundex module using make and make install. The binaries are now installed in the right directories, and we can insert the soundex module into the database we have just created. Therefore, we can use an SQL script in the soundex directory:

 [hs@athlon soundex]$ psql data < soundex.sql

If no error has occurred, two functions have been inserted into the database.

After that, we write a small SQL script containing the data we want to insert into the database:

CREATE TABLE persons(id int4, name varchar(30));
COPY persons FROM stdin;
1    Paul
2    Alex
3    Epi
4    Eppi
5    Ebi
6    Everlast

To insert the data, we use the following command:

[hs@athlon html]$ psql data < data.sql

Now, we can start writing a simple PHP script, which displays a small form including a text field and a submit button:

    echo '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
        <HEAD> <TITLE>Searching ...</TITLE> </HEAD>
        <body> ';
    echo '<form name="searching" action="search.php">
        <INPUT NAME="searchstring" TYPE="text" SIZE="30"><br>
        <INPUT TYPE="submit" VALUE="Send request to the server">

We choose the name searchstring for our text field. When clicking on the button, the user will be redirected to search.php, which can be seen in the following listing:

    # displaying header
    echo '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
        <HEAD> <TITLE>Displaying result</TITLE> </HEAD>
        <body> ';
    # connecting to the database and performing query
    $dbh=pg_connect("dbname=data user=hs");
    $sql="SELECT * FROM persons WHERE soundex(name)=soundex('$searchstring')";
    $result=pg_exec($dbh, $sql);
        # do something bad ...
        # retrieving result
        $rows = pg_numrows($result);
        echo '<b>Result:</b><br>';
        for ($j=0; $j < $rows; $j++)
            $id=pg_result($result, $j, "id");
            $name=pg_result($result, $j, "name");
            echo "id: $id, name: $name<br>\n";
    echo '</body></html>';

search.php displays a header and connects to the database called data that we created before. In the next step, we define a string containing the SQL code, and we need to select the data from the database. Let's have a closer look at that SQL command.

We select all columns in the persons table, in which the soundex code of the value submitted to search.php is equal to the soundex code of the records stored in the table. The query will return all records that sound similar to the value passed to the script.

In the next step, we retrieve the result of the query, and display it on the screen.

Let's try the script by looking for Epi:

id: 3, name: Epi
id: 4, name: Eppi
id: 5, name: Ebi

The script has returned three records sounding like Epi. If we query for Alice, the result will look like this:

id: 2, name: Alex

Alex sounds similar to Alice, so the record has been retrieved from the query.

  • + Share This
  • 🔖 Save To Your Account