Home > Articles > Data

📄 Contents

  1. About EFEU
  2. Installation
  3. Using PostgreSQL and Data Matrices
  4. Conclusion
  • Print
  • + Share This
This chapter is from the book

This chapter is from the book

Using PostgreSQL and Data Matrices

To see what can be done with the help of the EFEU package, we start with a simple example. Imagine a database storing information about a labor market. One table will be used to store the name, the birthday, and the gender of a person. The second table will contain information about the income a person had in a certain year.

Let's a look at the dump of the database:

CREATE TABLE "persons" (
    "name" text,
    "birth" date,
    "gender" character(1)
CREATE UNIQUE INDEX persons_name_key on persons ("name");
CREATE TABLE "income" (
    "name" text,
    "year" integer,
    "income" integer
CREATE INDEX income_name_key on income ("name");

COPY "persons" FROM stdin;
Albert 1970-01-01   m
John  1973-04-04   m
Carla 1963-10-21   f
Jenny 1982-09-21   f
Hans  1978-08-09   m
Epi  1976-06-12   m
Etschi 1960-02-20   f
Bill  1980-07-12   m

COPY "income" FROM stdin;
Albert 1997  32000
Albert 1998  28000
Albert 1999  30000
Albert 2000  35000
Jon  1998  20000
Jon  1999  40000
Jon  2000  50000
Carla 1998  30000
Carla 1999  32000
Jenny 1999  45000
Jenny 2000  45000
Hans  1999  47000
Hans  2000  59000
Epi  1999  25000
Epi  2000  21000
Etschi 1999  30000
Etschi 2000  34000
Bill  1994  41000
Bill  1995  43000

 AS SELECT p.name, p.gender, i.year, i.income
 FROM persons p, income i
 WHERE p.name = i.name;

After defining the table and inserting the data, we define a view, which will contain information about a person and the income that person had in that year.

In the next step, we create a database and add the data shown above to the database.

[hs@athlon test]$ createdb persons
[hs@athlon test]$ psql persons < data.sql

Now, we want to write a short esh script that counts the number of persons, where we can find information about the income in a certain year:

// creating a new pg object and selecting data
PG pg = PG("persons");
pg.query(string !
SELECT gender, year, COUNT(*)
FROM pincome
GROUP BY year, gender
mdmat md = pg.mdmat(int, "COUNT", "gender, year");

First, we load the PostgreSQL module, and connect to the database by creating a PG object. In the next step, we define an SQL string we want to send to the server. Now we can create an mdmat object, which is a multidimensional cube in which we will store the result.

The values stored in the cube will be integer values. The second parameter tells esh which column in the result of the query will contain the values. The third parameter defines a list of the axes our cube should have.

Finally, the result is saved in matrix.dat, and the memory allocated by the software is freed.

Let's see data can be found in matrix.dat. Therefore, we use mdprint:

[hs@athlon test]$ mdprint matrix.dat
##MDMAT  2.0
##Titel  COUNT by gender, year
##Type  int
##Locale C
##Zeilen gender
##Spalten year
6    1994  1995  1997  1998  1999  2000
f    0    0    0    1    3    2
m    1    1    1    1    3    3

If we have included the binary directory of EFEU in $PATH, mdprint will display the result shown above. We can see that three males were found in the database. The important thing about data cubes is that even cells of the cube where the value is zero are displayed. That is the main advantage of multidimensional cubes—Zero values are not returned by the database.

But mdprint has some other interesting features. Now we want the gender to be displayed in the x-axis:

[hs@athlon test]$ mdprint matrix.dat -b -x gender
2    f    m
1994  0    1
1995  0    1
1997  0    1
1998  1    1
1999  3    3
2000  2    3

-x tells mdprint which axis to use as the x-axis. -b tells the script that the header has to be omitted.

The next example shows how certain values can be selected from the data cube:

[hs@athlon test]$ mdprint matrix.dat -b -x gender gender=m year=1999,2000
1    m
1999  3
2000  3
  • + Share This
  • 🔖 Save To Your Account