Home > Articles > Data > SQL

  • Print
  • + Share This
This chapter is from the book

This chapter is from the book

Inserting Data

Inserting data into your table is usually done with the INSERT command. Let's assume that you want to insert an employee named Alan Cop with a salary of 20000 USD into the table called oldemp:

INSERT INTO oldemp (empname,salary,currency) VALUES ('Alan Cop',20000,'USD');

The first variable to be specified is the name of the table where you want to insert the data. The names of the columns where data is going to be added have to be in parentheses. In our example, we don't want to add a value in the column called id because this column will get the value from the serial used as the primary key. All fields that are not listed in brackets will be NULL or default values after the INSERT.

Importing Large Datafiles and Using Pipes on UNIX Systems

If large amounts of data have to be imported into a PostgreSQL server, INSERT statements are far too slow for that purpose.

In this case, other methods, such as files and pipes, are far more effective. You can use the COPY command to import data directly from files. PostgreSQL supports reading both binary and ASCII data, which can be quite useful because working with binary data can sometimes be much faster.

Suppose you want to import the following tab-separated file into table oldemp:

1    Paul Stallman    18000  USD
2    Steven Neundorf   4000  DM
3    Josef Russell   20000  USD
4    Alan Torvalds   22000  USD

Here is the required COPY command:

COPY oldemp FROM '/tmp/people.data' USING DELIMITERS '\t';

You import the file people.data in /tmp into the table called oldemp. You use tabs (\t) as delimiters; but let's try to import the file again:

name=# COPY oldemp FROM '/tmp/people.data' USING DELIMITERS '\t';
ERROR: copy: line 1, Cannot insert a duplicate key into unique index emp_id_key

It can't be done again because primary keys have to be unique. Importing the data a second time leads to double entries.

In this example, we manually inserted data into a column that uses a serial as its primary key. What happened to the sequence the serial uses to generate the row ids? You use a SELECT statement to retrieve the value (you learn about SELECT in Chapter 4):

name=# SELECT sequence_name, last_value FROM emp_id_seq;
 sequence_name | last_value
---------------+------------
 emp_id_seq  |     1
(1 row)  

You can see that the value of the sequence is still 1, although we have already inserted four values into the table. This can lead to real difficulties when you start to insert values into the database using INSERT statements:

INSERT INTO oldemp (empname, salary, currency) 
    VALUES ('John Blundell',19000,'USD');
ERROR: Cannot insert a duplicate key into unique index emp_id_key

Because we want the serial to generate the primary key, we don't add the name and the value of the field id to the INSERT statement. During the INSERT, the current value of the emp_id_seq sequence is selected and the value of the sequence is incremented by 1 if the insert succeeds. Accidentally, the sequence's value is still 1, and 1 is already used in the table. Because primary keys have to be unique, the INSERT fails. Temporary tables can be used to solve the problem. If you want to use sequences for your primary key, create a temporary table and import the data to that table. Then perform a SELECT INTO operation to copy the data to the final table. An advantage of that algorithm is that you don't have to remove trash when the import fails. Removing trash from tables can be tricky in production environments.

Pipes are a powerful feature on UNIX systems and can be used to redirect the standard output of a process directly to a PostgreSQL server. Using them is simple:

yourprocess.pl | psql yourdatabase

The output of yourprocess.pl must be formatted properly so that psql can use it as standard input. A good choice of a properly formatted standard output is an SQL statement.

Note

The standard error of the yourprocess.pl process is being omitted, because only standard output is redirected.

If you have processes that run for hours or produce a lot of standard output or standard errors, you might want to store the result of the process in a file. In this case, the UNIX command nohup might be useful. If you start a background process with the help of nohup, you can log out while the process is working. This is not possible when starting a command with & only. Here is an example of a nohup command:

nohup ./myprocess.pl &

myprocess.pl might be a Perl script that performs complex PostgreSQL commands. If you have logged in to a remote machine using your favorite terminal software and if your network is not a reliable one, nohup can save you a lot a time (imagine a process getting killed after 26 hours because of a network interruption).

  • + Share This
  • 🔖 Save To Your Account