Using PostgreSQL and Data Matrices
To see what can be done with the help of the EFEU package, we start with a simple example. Imagine a database storing information about a labor market. One table will be used to store the name, the birthday, and the gender of a person. The second table will contain information about the income a person had in a certain year.
Let's a look at the dump of the database:
CREATE TABLE "persons" ( "name" text, "birth" date, "gender" character(1) ); CREATE UNIQUE INDEX persons_name_key on persons ("name"); CREATE TABLE "income" ( "name" text, "year" integer, "income" integer ); CREATE INDEX income_name_key on income ("name"); COPY "persons" FROM stdin; Albert 1970-01-01 m John 1973-04-04 m Carla 1963-10-21 f Jenny 1982-09-21 f Hans 1978-08-09 m Epi 1976-06-12 m Etschi 1960-02-20 f Bill 1980-07-12 m \. COPY "income" FROM stdin; Albert 1997 32000 Albert 1998 28000 Albert 1999 30000 Albert 2000 35000 Jon 1998 20000 Jon 1999 40000 Jon 2000 50000 Carla 1998 30000 Carla 1999 32000 Jenny 1999 45000 Jenny 2000 45000 Hans 1999 47000 Hans 2000 59000 Epi 1999 25000 Epi 2000 21000 Etschi 1999 30000 Etschi 2000 34000 Bill 1994 41000 Bill 1995 43000 \. CREATE VIEW pincome AS SELECT p.name, p.gender, i.year, i.income FROM persons p, income i WHERE p.name = i.name;
After defining the table and inserting the data, we define a view, which will contain information about a person and the income that person had in that year.
In the next step, we create a database and add the data shown above to the database.
[hs@athlon test]$ createdb persons CREATE DATABASE [hs@athlon test]$ psql persons < data.sql CREATE CREATE CREATE CREATE CREATE
Now, we want to write a short esh script that counts the number of persons, where we can find information about the income in a certain year:
#!/usr/local/efeu-1.0-5/bin/esh loadlib("PG"); // creating a new pg object and selecting data PG pg = PG("persons"); pg.query(string ! SELECT gender, year, COUNT(*) FROM pincome GROUP BY year, gender !); mdmat md = pg.mdmat(int, "COUNT", "gender, year"); md.save("matrix.dat"); free(md); close(pg);
First, we load the PostgreSQL module, and connect to the database by creating a PG object. In the next step, we define an SQL string we want to send to the server. Now we can create an mdmat object, which is a multidimensional cube in which we will store the result.
The values stored in the cube will be integer values. The second parameter tells esh which column in the result of the query will contain the values. The third parameter defines a list of the axes our cube should have.
Finally, the result is saved in matrix.dat, and the memory allocated by the software is freed.
Let's see data can be found in matrix.dat. Therefore, we use mdprint:
[hs@athlon test]$ mdprint matrix.dat ##MDMAT 2.0 ##Titel COUNT by gender, year ##Type int ##Locale C ##Zeilen gender ##Spalten year 6 1994 1995 1997 1998 1999 2000 f 0 0 0 1 3 2 m 1 1 1 1 3 3
If we have included the binary directory of EFEU in $PATH, mdprint will display the result shown above. We can see that three males were found in the database. The important thing about data cubes is that even cells of the cube where the value is zero are displayed. That is the main advantage of multidimensional cubesZero values are not returned by the database.
But mdprint has some other interesting features. Now we want the gender to be displayed in the x-axis:
[hs@athlon test]$ mdprint matrix.dat -b -x gender 2 f m 1994 0 1 1995 0 1 1997 0 1 1998 1 1 1999 3 3 2000 2 3
-x tells mdprint which axis to use as the x-axis. -b tells the script that the header has to be omitted.
The next example shows how certain values can be selected from the data cube:
[hs@athlon test]$ mdprint matrix.dat -b -x gender gender=m year=1999,2000 1 m 1999 3 2000 3