Home > Articles > Data

  • Print
  • + Share This
From the author of

Performance Issues

Up to now, we have discussed how to use the soundex module, but we haven't dealt with performance issues yet. Especially when the amount of data is high, it is necessary to have a closer look at the performance of your system.

In most cases, a higher level of performance can be used by defining an index on the columns we want to query. This can be done with the command like the following:

data=# CREATE INDEX idx_persons_name ON persons(name);

Normally, we expect the index to speed up the query we have shown in the previous example, but let's have a look at the execution plan of the SQL statement:

data=# EXPLAIN SELECT * FROM persons WHERE soundex(name)=soundex('Epi');
Seq Scan on persons (cost=0.00..27.50 rows=10 width=16)

The database performs a sequential scan on the persons table. This will lead to significant problems when the amount of data grows. No matter how much data we insert into the table, the system will still perform a sequential scan because PostgreSQL cannot use an index when an operation has to be performed with the values of the column. In our case, the soundex code has to be generated, and this will cause PostgreSQL to use a sequential scan instead of an index scan.

To get around the problem, we can simply add an additional column storing the soundex code of column name. Then, we define an index on the soundex code, so we do not have to use the soundex function in the query anymore—it has moved into the INSERT statement used to create the table.

The next example shows you what soundex codes look like. We select all data in the person table, and add a column containing the soundex code to it:

data=# SELECT *, soundex(name) as sndex INTO snd_persons FROM persons;

Let's see what is in the table:

data=# SELECT * FROM snd_persons;
 id |  name  | sndex
 1 | Paul   | P400
 2 | Alex   | A420
 3 | Epi   | E100
 4 | Eppi   | E100
 5 | Ebi   | E100
 6 | Everlast | E164
(6 rows)

If an index is defined on the sndex column, the performance will increase significantly.

  • + Share This
  • 🔖 Save To Your Account