PostgreSQL - An Overview

by Rick Moen

Revised: Wednesday, 2001-12-05

Master version will be at and I'll try to improve it there.

History and concepual overview of relational databases, SQL, and PostgreSQL:

1970: E.F. Codd of IBM defined the relational model, naming a number of theoretical characteristics a relational database must satisfy. Note that it immediately became a buzzword for heavy-duty databases, and no real database satisfies all of Dodd's criteria. Dodd defined a type of programming language for relational databases, Structured Query Language, used for both interactive queries and reports.

Relational database is organised into tables aka relations, each with a set number of columns representing data categories. Each row is a unique instance of the data represented (a record or tuple).

"View" is an abstract table containing selected data from other tables, a virtual report.

"Domain" of a column is the range of values it could in theory contain.

"Constraints" are rules for what a column may contain in the context of a specific table, from among its theoretical domain. "Metadata" (or "relation scheme") is the set of all formal descriptions of the tables, columns, domains, and constraints comprising a database.


Queries can use "relational calculus". Uses logical notation instead of arithmetic/albegraic ones.

ACID test:

1974: Donald Chamberlain and others at IBM Watson Research Center invented SEQUEL (Structured English QUEry Language), following Codd's rules, for a series of databases at IBM, with direct descendants DB2, Oracle, and Sybase.

SQL pioneered "transactional" databases: Changes to the database are implemented in an atomic operation (as with a journaling filesystem) -- helps prevent corruption & locking problems.

1977: Ingres database developed at UC Berkeley as an academic project to explore the relational model.

1986: POSTGRES DBMS, successor to Ingres, created by UCB Prof. Michael Stonebraker, as an object-relational database.

1992 (approx.): Illustra Information Technologies (now called Informix) forked off a proprietary variant, Informix.

1995: UC Berkeley graduate students Jolly Chen and Andrew Yu added SQL (Structured Query Language) to POSTGRES. Others recoded the entire project in ANSI C. It was then renamed Postgres95.

1996: Name changed to PostgreSQL: Adds subqueries. Table locking replaced with multi-version concurrency control. (Readers do not block writers. Writers do not block readers. Finer-grained than row-level locking, though such locks are also available.) Mostly SQL92- and SQL99-compliant.

What's Missing (compared to, e.g., Oracle):

The limitations on capacity are now so huge as to be hardly worth mentioning (e.g., 1GB size limit per field).

This list used to include rules, inheritance, and multi-version concurrency control (which reduces lock contention), but PostgreSQL now has them.

Middleware access:

Front ends:

Security (also addressed in Administration section, below):

Performance tips:

no-fsync mode (command option -F) speeds up disk access: Prevents flushing data to disk at the end of each transaction.

Batch your "inserts" into a COPY command (covered below).

Use indexes (covered below). Use LIMIT operator on SELECTs. Disadvantage is extra RAM and storage usage.

Run VACUUM (covered below).

Tune amount of shared buffer memory using -B startup flag.

Tune amount of sort memory using -S startup flag.

Spread databases among physical disk drives, specifying their pathnames using "initlocation" utility to create "locations" that will then be known to PostgreSQL, which you can subsequently specify when using createdb command-line tool or CREATEDB SQL statement. (Once database files, tables, and indexes already exist, you can nonetheless relocate them using symlinks.)


The following run-through of (much of) PostgreSQL's command set, operators, structures, and core concepts uses the included psql command-line tool for the shown examples. Return values are sometimes omitted for clarity.

The SELECT statement queries and returns rows. Spacing, newlines, and capitalisation are ignored. Statement (as usual) ends with semicolon. Lets you (or a process working with PostgreSQL) suck down and view data. Use "." as a delimiter to specify "tablename.columnname".

	(PRICE = 0 OR PRICE <= 15);
        FROM PART WHERE PRICE * 2 < 50;
	            P.PNO = SE.PNO;

(This is an example of a "join" inside a SELECT statement.)

      JOIN sells USING (sno)
      JOIN part USING (pno);
(Same example, but using the JOIN operator.)

JOIN statements without conditions are called CROSS JOINs or unqualified joins. Qualified joins can either be inner or outer joins, and can be specified using different types of operator (ON, USING, or NATURAL - use one only).

Only the most-basic (and most-needed) info on JOINs is addressed in this talk, as it is otherwise an advanced topic. Please see the references (at end) for more detail on them.

(In the following and most of what follows, we have used psql to open a database named "test".)

Specify column-header labels on SELECT output using the "AS" operator:

        test=> SELECT firstname AS buddy FROM friend ORDER BY buddy; 
        (5 rows)   

The "AS" trick can be used to label select results that otherwise would have had no appropriate column label, and would have been labelled "?column?" by default:

        test=> SELECT 1 + 3 AS total;
        (1 row)

The AND and OR operators can be used, with parentheses as needed:

test=> SELECT * FROM friend
        test-> WHERE firstname = 'Victor' AND (state = 'PA' OR state = 'NJ')
        test-> ORDER BY firstname;

Comparison operators: < <= = >= > <> != BETWEEN

Comparison operator LIKE:

        test=> SELECT * FROM friend
        test-> WHERE firstname LIKE 'D%'
        test-> ORDER BY firstname;

LIKE 'D%' means begins with D
LIKE '%D%' means contains a D
LIKE '_D%' means has D in second position ("_" matches a single character)
LIKE 'D%e%" means begins with D and contains e
LIKE 'D%e%f%' means begins with D, contains e, then f
NOT LIKE 'D%' means begins with non-D

Regular expression operators are also valid, and can be preceded by operators:
* means do case-insensitive regex
! means what doesn't match the regex
|* combines the two

Special characters to use in regexes:

^ means start
$ means end
. means any single character
[ccc] means a set of characters (c's being placeholders for actual chars)
^[ccc] means whatever doesn't match the set of characters
[c-c] means a range of characters
^[c-c] means whatever doesn't match the range of characters
? means zero or one of the previous character, repeated
* means zero or more of the previous characters
+ means one or more of the previous characters
| means OR operator

Because CHAR fields have fixed length, they are padded to end with trailing spaces, as needed. Thus, searches must often test for possible trailing spaces, e.g.,

        test=> -- firstname contains a non-S character 
        test=> SELECT * FROM friend  
        test-> WHERE firstname  ^'[S]'  

Any field not assigned a value gets the special value NULL. The only comparison operator that will match a NULL-value field is "WHERE foo IS NULL": Any other comparison will fail. (One can use "WHERE foo IS NOT NULL", if applicable.)

Aggregation operators:

AVG, COUNT, SUM, MIN, MAX... Return value based on using the operator against all rows meeting the WHERE clause. NULL field values are ignored, unless all values in a column are NULL, in which case, so is the aggregation value. Exception: The COUNT(*) operator counts NULLs, because it looks at entire rows. By contrast, COUNT(colname) does ignore NULLs.


Aggregation on groups:

Groups are partitions of tables (GROUP BY followed by attributes to satisfy). E.g.,

        test=> SELECT state, COUNT(*)
        test-> FROM friend
        test-> GROUP BY state;
         state | count
         MA    |     2
         MD    |     1
         NJ    |     2
         PA    |     1
        (4 rows)

HAVING clause is an aggregation operator, too. Specifies that a conditional test should be performed on aggregate values. E.g., which are the states that have more than one friend?

        test=> SELECT state, COUNT(*)
        test-> FROM friend
        test-> GROUP BY state
        test-> HAVING COUNT(*) > 1
        test-> ORDER BY state;
         state | count
         MA    |     2
         NJ    |     2
        (2 rows)

The \da command displays all aggregate operators.

CREATE statement establishes and declares properties of a new table. Example:

        test=> CREATE TABLE friend ( 
        test(>              firstname CHAR(15), 
        test(>              lastname  CHAR(20), 
        test(>              city      CHAR(15), 
        test(>              state     CHAR(2), 
        test(>              age       INTEGER 
        test(> );

There are many other column types, other than those two. Common ones are VARCHAR(length) = variable-length character string, FLOAT = +/- 2 x 10^9 floating-point number, NUMERIC(precision, decimal) = number with user-specified precision and number of decimal places, DATE, TIME, and TIMESTAMP = date + time.

Other more-complex types, not covered here, include TABLE (multidimensional arrays), and binary large objects ("BLOBs"). There's no distinct BLOB data type, but there are lo_import (load-import) and lo_export operators to handle them.

Filling tables with default values and timestamps: DEFAULT and TIMESTAMP (current date + time) are operators best illustrated by an example:

        test=> CREATE TABLE account (
        test(>         name     CHAR(20),
        test(>         balance  NUMERIC(16,2) DEFAULT 0,
        test(>         active   CHAR(1) DEFAULT 'Y',
        test(>         created  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        test(> );

Every table automatically gets a normally-not-show column called "oid" (object ID), storing a numeric row-counter value (type OID) that uniquely identifies a row. OIDs do not (normally) survive dump/restore, but you can include them with a command-line flag Nor can you reset them to desired values if (e.g.) you delete and replace a row. An OID will always be globally unique across a PostgreSQL installation. Thus, within a table, there will be gaps. Use a separate row-numbering field using a "sequence" operator (covered later), if you want to eliminate gaps.

INSERT statement adds a row (cf. DELETE, UPDATE). Example:

        test=> INSERT INTO friend VALUES (
        test(>                            'Mike',
        test(>                            'Nichols',
        test(>                            'Tampa', 
        test(>                            'FL', 
        test(>                            19 
        test(> ); 

Note carefully the use of single-quotation marks for character elements, never double. Spacing, newlines, and capitalisation outside of quotation marks are ignored. You must make sure you line up field order with the table definition.

To embed a single-quotation character in a string, either escape it with backslash or double the single-quotation character.


WHERE clause: Used in SELECT, UPDATE, DELETE to limit which rows you're interested in, by some condition.

         test=> SELECT city, state FROM friend WHERE age >= 21;

ORDER BY clause for a SELECT statement sorts its display output. Can specify multiple sort keys.

         test=> SELECT * FROM friend ORDER BY city,state;

The underlying table isn't changed at all.

DESC operator = descending order. Ascending is ORDER's default.

         test=> SELECT * FROM friend ORDER BY age DESC;

Put any ORDER clause after any WHERE clause.

CASE clause implements multiple IF-THENs, useful for inserting conditional strings on the fly into SELECT results:

        test=> SELECT firstname,
        test->        age,
        test->        CASE
        test->            WHEN age >= 21 THEN 'adult'
        test->            ELSE 'minor'
        test->        END
        test-> FROM friend
        test-> ORDER BY firstname;

DISTINCT clause eliminates duplicate rows prior to display:

        test=> SELECT DISTINCT state FROM friend ORDER BY state;

Functions take some arguments, return a single value. \df displays functions available (and their arguments). \dd command displays detail about an individual function or group of functions.

SET command changes PostgreSQL global environment variables for the current session. (E.g., SET DATESTYLE TO ISO; SET TIMEZONE TO PST8PDT;)

DROP statement destroys tables. (Caution: It really deletes them, not just puts them away. Cf DELETE statement, which deletes only specified rows, not entire tables.)

         test=> DROP friend

DELETE statement removes rows.

         test=> DELETE FROM friend WHERE lastname = 'Barnes';

Caution: The statement "DELETE FROM friend" with no WHERE clause will delete all rows.

UPDATE statement edits existing rows.

         test=> UPDATE friend SET age = 20 WHERE firstname = 'Mike';

Caution: Again, without the WHERE clause, all rows would be updated.

Comments: Comments in lines of (PostgreSQL's) SQL can take either of two forms:

        test=> -- a single line comment
        test=> /* a multiline
        test*>    comment */

Primary key means a table's unique identifier. Foreign key is a column containing data that is unique in a different table that is also of interest.

Sequences are autoincrementing counters, unique within a given table. You must use the CREATE SEQUENCE command to make them:

        test=> CREATE SEQUENCE functest_seq;

Related functions:

nextval('name') Returns the next available sequence number, and updates the counter
currval('name') Returns the sequence number from the previous nextval() call
setval('name', newval) Sets the sequence number counter to the specified value

Note that you're not dealing with table manipulation, in invoking those three functions. Nextval returns ever-increasing values. Curval repeats the prior return value, leaving the counter untouched. Setval sets the counter to a specific value.

        test=> CREATE SEQUENCE customer_seq; 
        test=> CREATE TABLE customer ( 
        test(> customer_id INTEGER DEFAULT nextval('customer_seq'), 
        test(> name CHAR(30) 
        test(> ); 
        test=> INSERT INTO customer VALUES (nextval('customer_seq'), 
               'Bread Makers');
        test=> INSERT INTO customer (name) VALUES ('Wax Carvers'); 
        test=> INSERT INTO customer (name) VALUES ('Pipe Fitters');
        test=> SELECT * FROM customer; 
         customer_id |              name               
                   1 | Bread Makers                   
                   2 | Wax Carvers                   
                   3 | Pipe Fitters                      
        (3 rows)   

(Default value is used in an INSERT wherever it's not supplied specifically.)

Support variables that are auto-provided by PostgreSQL:

   CURRENT_DATE       current date
   CURRENT_TIME       current time
   CURRENT_USER       username who's connected to database


Grouping statements into a batched transaction, between BEGIN WORK and COMMIT WORK statements, performs the group as a single operation. If the COMMIT WORK has not been run, it can be backed out using ROLLBACK WORK.

Only committed transactions are visible to users. Statements comprising a partially completed WORK block are invisible to them.

Conflict management:

Coder defines what happens when user A starts a transaction, and user B commits it while the first is underway: There are different isolation levels.

Default isolation level, SET TRANSACTION ISOLATION LEVEL READ COMMITTED, lets you see other commits while your transaction is open. Automatic default is exclusive lock (I think).

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE prevents the current transaction from seeing commits made by other transactions. Also, an uncompleted transaction on the same row as a committed one automatically rolls back.

Alternatively, use SELECT FOR UPDATE instead of SELECT: This causes the row to remain locked until the UPDATE statement following.

Alternatively, gain even greater programmatic control by using the LOCK command (not detailed here).


Default of reading entire tables to find relevant rows is very slow. If a column is frequently used in WHERE or ORDER BY clauses, you should create an index for it.

        test=> CREATE INDEX customer_custid_idx ON customer (customer_id);

It will be used automatically to find rows quickly.

Indexes can be sorted on multiple key columns:

        test=> CREATE INDEX customer_age_sex ON customer (age, sex)

Delete indexes using the DROP INDEX command.

LIMIT clause says how many rows to return. LIMIT...OFFSET says how far from the end to return that many rows.

        test=> SELECT customer_id FROM customer ORDER BY customer_id LIMIT 3;
        test=> SELECT customer_id FROM customer ORDER BY customer_id LIMIT 3 OFFSET 997;


Place where return values of a SELECT are stored, to be retrieved individually as needed. Not covered further here.

COPY...FROM does bulk import from external file.
COPY...TO does bulk export to external file

        test=> COPY copytest TO '/tmp/copytest.out'; 
        test=> COPY copytest FROM '/tmp/copytest.out'; 

File format is tab-delimited, one record per line. Tab can be changed to something else using USING DELIMITERS.

        test=> COPY copytest TO '/tmp/copytest.out' USING DELIMITERS '|';

Ditto for COPY...FROM.

Note that the file specified for COPY can be stdin or stdout. Otherwise, always use full pathnames.

A different character for NULL can be specified using WITH NULL AS.

Functions and Triggers = stored procedures.

You can create functions for PostgreSQL in C and other languages. Also, can create them in SQL and store them: CREATE FUNCTION, DROP FUNCTION. (Not further detailed, here.)

Trigger is a SQL query to be executed when a row is modified, to replace that row. For its internal purposes, keywords "new" and "old" stand for the rows to be written and replaced, respectively. Or a trigger may execute when a row is inserted. Or a trigger may be executed when a row is deleted.

        test=> CREATE TRIGGER trigger_statename
        test-> ON statename
        test-> FOR EACH ROW
        test-> EXECUTE PROCEDURE trigger_insert_update_statename();

(The function trigger_insert_update_statename would be declared separately.)


Upon table creation, by default only its creator has any read or write access. Table's creator/owner can change this using GRANT and REVOKE:

        test=> CREATE TABLE permtest (col INTEGER); 
        test=> -- now only the owner can use permtest 
        test=> GRANT SELECT ON permtest TO meyers; 
        test=> -- now user 'meyers' can do SELECTs on permtest 
        test=> GRANT ALL ON permtest TO PUBLIC;   
        test=> -- now all users can perform all operations on permtest 

Rights SELECT , UPDATE , DELETE , RULE , and ALL can be GRANTed to named users or to PUBLIC. (REVOKE syntax is similar, not covered here.)


Abstract objects (pseudo-tables) to present parts of one or more real table in selective, specified manner. Has its own access permissions. Implemented internally as stored SELECT statements.

        test=> CREATE VIEW customer_ohio AS
        test-> SELECT *
        test-> FROM customer
        test-> WHERE state = 'OH';
        CREATE 18908 1
        test=> -- let sanders see only Ohio customers
        test=> GRANT SELECT ON customer_ohio TO sanders;
        test=> -- create view to show only certain columns
        test=> CREATE VIEW customer_address AS
        test-> SELECT customer_id, name, street, city, state, zipcode, country
        test-> FROM customer;
        CREATE 18909 1
        test=> -- create view that combines fields from two tables
        test=> CREATE VIEW customer_finance AS
        test-> SELECT customer.customer_id,, finance.credit_limit
        test-> FROM customer, finance
        test-> WHERE customer.customer_id = finance.customer_id;
        CREATE 18910 1

Delete a view using DROP VIEW. You can't INSERT, UPDATE, or DELETE on a VIEW -- but you can control access using "rules".


Control what effects a SELECT, INSERT, UPDATE, or DELETE may have on specified tables, or on specified rows of specified tables. Managed using the CREATE_RULE and DROP_RULE commands. Watch out for infinite loops.

        test=> -- this prevents any INSERTs into new table ruletest:
        test=> CREATE TABLE ruletest (col INTEGER); 
        test=> CREATE RULE ruletest_insert AS     -- rule name 
        test-> ON INSERT TO ruletest              -- INSERT rule 
        test-> DO INSTEAD                         -- DO INSTEAD-type rule 
        test->     NOTHING;                       -- ACTION is NOTHING 
        CREATE 18932 1 
        test=> INSERT INTO ruletest VALUES (1); 
        test=> SELECT * FROM ruletest; 
        (0 rows) 


Parent-child relationship between tables, whereby the child includes all columns of its parent, plus others.

        test=> CREATE TABLE parent_test (col1 INTEGER);
        test=> CREATE TABLE child_test (col2 INTEGER) INHERITS (parent_test);

If you subsequently refer to the parent table with an asterix suffix on its name, results will be based on fields common to the parent and all children.

Starting v. 7.1, the asterisk can be omitted: Inheritance is used by default, and can be suppressed using the ONLY keyword tablename modifier.


VACUUM command purges "expired" rows (replaced, marked as unused). Run nightly. (Exclusive-locks the named table.) VACUUM with no parameter will work on all tables.


Only initial user is superuser, default name "postgres". Others must be created and GRANTed [/REVOKEd] access permissions. Usernames need not correspond to the system user database.

From command line, "createuser" can add users and optionally assign passwords, and "dropuser" can delete users. In SQL, use CREATE USER, ALTER USER, and DROP USER commands:

        test=> CREATE USER demouser2; 
        CREATE USER 
        test=> ALTER USER demouser2 CREATEDB; 
        ALTER USER 
        test=> CREATE GROUP demogroup WITH USER demouser1, demouser2; 
        test=> CREATE TABLE grouptest (col INTEGER); 
        test=> GRANT ALL on grouptest TO GROUP demogroup; 
        test=> \connect test demouser2  
        You are now connected to database test as user demouser2. 
        test=> \q   

Creating/deleting databases:

From command line, PostgreSQL's superuser and any users given adequate access can use "createdb" and "dropdb". In SQL, use CREATE DATABASE and DROP DATABASE commands.

Access rights:

Controllable based on databasee name, username, TCP/IP address -- via the data/pg_hba.conf file. Unless PostgreSQL starts with -i flag, access is open only to local users. User passwords go in table pg_shadow. Several types of entries in pg_hba.conf:


    local        all                                         trust
    host         all   trust

The above is default pg_hba.conf contents.

    host         all   crypt
    host         all   password finance

The latter line says this subnet's users password-authenticate against
table "finance" holding username/password info.

   host         sales   ident
   host         sales   ident support

Concerns database sales. Latter line says not only use identd authentication, but also look up a mapping of network usernames to PostgreSQ ones in table "support".


If PostgreSQL isn't running, can use system backup tools. (Database files aren't a moving target.) If it is running, must use pg_dumpall (all databases) or pg_dump (specified database) to make snapshot. The dump (snapshot) file has just the ASCII SQL commands that specify the table contents.

To restore, CREATE the table, then reload it directly using psql:

     $ psql newtest < /tmp/test.dump

List of all tables of system internals (data types, functions , operators , databases, users, and groups):

     pg_aggregate    aggregates
     pg_attribute    columns
     pg_class        tables
     pg_database     databases
     pg_description  comments
     pg_group        groups
     pg_index        indexes
     pg_log          transaction status
     pg_operator     operators
     pg_proc         functions
     pg_rewrite      rules (not covered here) and views
     pg_shadow       users
     pg_trigger      triggers
     pg_type         types
     pg_user         a view based on pg_shadow that omits the password field


psql special characters:

Backslash-r resets (erases) query buffer.
Backslash-? prints all backslash commands.
Backslash-h prints all SQL commands.
Backslash-q quits.
Backslash-d [tablename] gets table description. Default is info on all tables.
Backslash-x enables expanded-display mode, in which fields are displayed 90 degrees rotated from usual, to prevent too-wide lines.
Backslash-& means &, which by itself is reserved for LIKE pattern matches.
Backslash-$ means $, which by itself is reserved for regex matches (EOL). (Generally, backslash-character escapes character's reserved role.)
Backslash-N means NULL.

Notes from installation:

Database startup:

/usr/lib/postgresql/bin/postmaster -D /var/lib/postgres/data
/usr/lib/postgresql/bin/pg_ctl -D /var/lib/postgres/data -l logfile start

/usr/bin/createdb: create a database
/usr/bin/createuser: enable other users to connect to a database

Configuration files /etc/postgresql/postmaster.conf, postgresql.conf, postgresql.env. Latter is read into environment for user postgresql. pg_hba.conf file is parsed for database access rights.

"postmaster" is name of main binary, which handles all access. It spawns session instances of the database engine binary, "postgres". Optional launch/shutdown utility for postmaster: pg_ctl.

Postgresql defaults to fsync() on every record write operation. Lowers performance; increases data safety.



1. Create a 20-or-so item address book in a PostgreSQL table. Display it sorted on each field. Create INDEX object for one or more fields (e.g., ZIP or postal code). Create a view that includes just the first and last names.

2. Dump the address book database to a dump file, including OIDs. Remove all of PostgreSQL. Reinstall it. Reload your address book, including OIDs, and verify that it's exactly as before. Note the dumpfile's syntax.

Further Reading:

PostgreSQL: Introduction and Concepts, by Bruce Momjian

Grateful acknowledgement is hereby given to Mr. Momjian's text, from which I have directly taken almost all of my examples, above.

Practical PostgreSQL, by John Worsley and Joshua Drake (book in progress),

Silicon Tao's PostgreSQL Tutorial

Several books from Rich Morin at Prime Time Freeware

Techniques for upgrading from Oracle to PostgreSQL