PostgreSQL - An Overview
by Rick MoenRevised: Wednesday, 2001-12-05
Master version will be at http://linuxmafia.com/faq/Apps/postgresql.html and I'll try to improve it there.
History and concepual overview of relational databases, SQL, and PostgreSQL:
1970: E.F. Codd of IBM defined the relational model, naming a number of theoretical characteristics a relational database must satisfy. Note that it immediately became a buzzword for heavy-duty databases, and no real database satisfies all of Dodd's criteria. Dodd defined a type of programming language for relational databases, Structured Query Language, used for both interactive queries and reports.
Relational database is organised into tables aka relations, each with a set number of columns representing data categories. Each row is a unique instance of the data represented (a record or tuple).
"View" is an abstract table containing selected data from other tables, a virtual report.
"Domain" of a column is the range of values it could in theory contain.
"Constraints" are rules for what a column may contain in the context of a specific table, from among its theoretical domain. "Metadata" (or "relation scheme") is the set of all formal descriptions of the tables, columns, domains, and constraints comprising a database.
Queries can use "relational algebra": SELECT, PROJECT, PRODUCT, UNION, INTERSECT, DIFFERENCE, JOIN, DIVIDE.
Queries can use "relational calculus". Uses logical notation instead of arithmetic/albegraic ones.
ACID test:
Atomicity: Entire sequence of actions in a transaction must either be completed or aborted. No partials.
Consistency: Transation takes the database from one consistent state to another.
Isolation: Transaction's effect is not visible to other transactions until it is completed.
Durability: Once committed, a transaction's effects are permanent and survive the system going down.
1974: Donald Chamberlain and others at IBM Watson Research Center invented SEQUEL (Structured English QUEry Language), following Codd's rules, for a series of databases at IBM, with direct descendants DB2, Oracle, and Sybase.
SQL pioneered "transactional" databases: Changes to the database are implemented in an atomic operation (as with a journaling filesystem) -- helps prevent corruption & locking problems.
1977: Ingres database developed at UC Berkeley as an academic project to explore the relational model.
1986: POSTGRES DBMS, successor to Ingres, created by UCB Prof. Michael Stonebraker, as an object-relational database.
1992 (approx.): Illustra Information Technologies (now called Informix) forked off a proprietary variant, Informix.
1995: UC Berkeley graduate students Jolly Chen and Andrew Yu added SQL (Structured Query Language) to POSTGRES. Others recoded the entire project in ANSI C. It was then renamed Postgres95.
1996: Name changed to PostgreSQL: Adds subqueries. Table locking replaced with multi-version concurrency control. (Readers do not block writers. Writers do not block readers. Finer-grained than row-level locking, though such locks are also available.) Mostly SQL92- and SQL99-compliant.
What's Missing (compared to, e.g., Oracle):
- User-defined types
- Performance if you want filesystem safety (when in fsync
mode)
- Filesystem safety if you want performance (when in no-fsync
mode)
- Full text indexing on binary file objects, e.g., MS-Word,
MS-Excel, PDF.
- Built-in enhanced reporting and control over export/display of
reports
- Extreme reliability
- Highly developed commercial support and third-party
developers
- Price gouging
The limitations on capacity are now so huge as to be hardly worth mentioning (e.g., 1GB size limit per field).
This list used to include rules, inheritance, and multi-version concurrency control (which reduces lock contention), but PostgreSQL now has them.
Middleware access:
- PsqlODBC (provided): Easiest access for MS-Access, Star Office, Applixware clients
- Openlink ODBC
- PHP3, PHP4 (Web)
- Zope (Web)
- ecpg (Embedded C PostGreSQL) & C
- jdbc & Java
- Vibe Java/C++ is at http://www.LinuxMall.com/products/00487.html
- JccWarrior ftp://ftp.redhat.com/pub/contrib/i386/jcc*.rpm
- Kanchenjunga Java RAD Tool for PostgreSQL http://www.man.ac.uk/~whaley/kj/kanch.html
- Ruby
- Perl DBI, mod_perl, CGI.pm (Web)
- EmbPerl
- Python PyGreSQL (similar to Oracle's PL/SQL)
- Scheme/guile/elisp
- Mason (Web)
- ColdFusion (Web)
- OpenACS (Web) -- imitation of ArsDigita's ACS
- Lesstiff Motif tool ftp://ftp.redhat.com/pub/contrib/i386/lesstiff*.rpm
- Tcl/Tk http://www.scriptics.com
- Object oriented extension of Tcl called INCR at http://www.tcltk.com/
- Visual TCL site http://www.neuron.com/
- Visual TCL Redhat rpm at ftp://ftp.redhat.com/pub/contrib/i386/visualtcl*.rpm
http://sunscript.sun.com/
http://sunscript.sun.com/TclTkCore/
ftp://ftp.sunlabs.com/pub/tcl/tcl8.0a2.tar.Z - libpgtcl & TCL
- SpecTCL ftp://ftp.redhat.com/pub/contrib/i386/spec*.rpm
- FreeBuilder ftp://ftp.redhat.com/pub/contrib/i386/free*.rpm
- Applixware Tool http://www.redhat.com/
- XWPE X Windows Programming Environment
http://www.rpi.edu/~payned/xwpe/
ftp://ftp.redhat.com/pub/contrib/i386/xwpe*.rpm - XWB X Windows Work Bench ftp://ftp.redhat.com/pub/contrib/i386/xwb*.rpm
- NEdit ftp://ftp.redhat.com/pub/contrib/i386/nedit*.rpm
- You can also use Borland C++ Builder, Delphi, Borland JBuilder, PowerBuilder on Windows95 connecting to PostgreSQL on Unix box through ODBC/JDBC drivers.
- (many others)
Front ends:
- psql: simple command line using GNU readline. Used for
examples below.
- pgaccess (provided): GUI with report generator (Tcl/Tk)
- pgAdmin: Comprehensive admin utility for Win32
- phpPgAdmin: Web-based administration
- Grapical query tools furnished by KDE and GNOME.
- Twig Web/groupware application
- X-Designer supports C++, Java and MFC http://www.ist.co.uk/xd
- Qt for Windows95 and Unix at http://www.troll.no
- Code Crusader
- Code Warrior http://www.kaze.stetson.edu/cdevel/code_crusader/about.html
- Code Warrior from MetroWorks http://www.metrowerks.com
- GNU Prof C++ IDE from Redhat/Cygnus
- Borland C++ Builder for Linux
- Borland Java JBuilder for Linux
Security (also addressed in Administration section, below):
- Internal login mechanism: Login restrictable based on username, host, database.
- Optional Kerberos authentication.
- Optional SSH/SSL remote access.
- Supports virtual hosts.
- Absent "-i" startup flag, Postgresql defaults to local socket access only, no TCP/IP.
Performance tips:
no-fsync mode (command option -F) speeds up disk access: Prevents flushing data to disk at the end of each transaction.
Batch your "inserts" into a COPY command (covered below).
Use indexes (covered below). Use LIMIT operator on SELECTs. Disadvantage is extra RAM and storage usage.
Run VACUUM (covered below).
Tune amount of shared buffer memory using -B startup flag.
Tune amount of sort memory using -S startup flag.
Spread databases among physical disk drives, specifying their pathnames using "initlocation" utility to create "locations" that will then be known to PostgreSQL, which you can subsequently specify when using createdb command-line tool or CREATEDB SQL statement. (Once database files, tables, and indexes already exist, you can nonetheless relocate them using symlinks.)
Commands:
The following run-through of (much of) PostgreSQL's command set, operators, structures, and core concepts uses the included psql command-line tool for the shown examples. Return values are sometimes omitted for clarity.
The SELECT statement queries and returns rows. Spacing,
newlines, and capitalisation are ignored. Statement (as usual)
ends with semicolon. Lets you (or a process working with
PostgreSQL) suck down and view data. Use "." as a delimiter to
specify "tablename.columnname".
SELECT * FROM PART WHERE PRICE > 10; SELECT PNAME, PRICE FROM PART WHERE PRICE > 10; SELECT PNAME, PRICE FROM PART WHERE PNAME = 'Bolt' AND (PRICE = 0 OR PRICE <= 15); SELECT PNAME, PRICE * 2 AS DOUBLE FROM PART WHERE PRICE * 2 < 50; SELECT S.SNAME, P.PNAME FROM SUPPLIER S, PART P, SELLS SE WHERE S.SNO = SE.SNO AND P.PNO = SE.PNO;
(This is an example of a "join" inside a SELECT statement.)
SELECT SNAME, PNAME FROM SUPPLIER JOIN sells USING (sno) JOIN part USING (pno);(Same example, but using the JOIN operator.)
JOIN statements without conditions are called CROSS JOINs or unqualified joins. Qualified joins can either be inner or outer joins, and can be specified using different types of operator (ON, USING, or NATURAL - use one only).
- CROSS JOIN: e.g., T1 CROSS JOIN T2 -- all possible combinations
- ON: takes search condition (same as WHERE clause)
- USING: takes a comma-separated list of column names, which the tables must have in common, and makes a joined table of rows where those columns hold equal values.
- NATURAL: A shorthand for a USING clause that uses all column names that are common among the listed tables.
- INNER (default): Rows in join table created only when condition satisfied.
- OUTER: Nulls filled into condition-value columns of join table for when condition isn't satisfied.
Only the most-basic (and most-needed) info on JOINs is addressed in this talk, as it is otherwise an advanced topic. Please see the references (at end) for more detail on them.
(In the following and most of what follows, we have used psql
to open a database named "test".)
Specify column-header labels on SELECT output using the "AS" operator:
test=> SELECT firstname AS buddy FROM friend ORDER BY buddy; buddy ----------------- Cindy Jack Mark Mike Sam (5 rows)
The "AS" trick can be used to label select results that otherwise would have had no appropriate column label, and would have been labelled "?column?" by default:
test=> SELECT 1 + 3 AS total; total ------- 4 (1 row)
The AND and OR operators can be used, with parentheses as needed:
test=> SELECT * FROM friend test-> WHERE firstname = 'Victor' AND (state = 'PA' OR state = 'NJ') test-> ORDER BY firstname;
Comparison operators: < <= = >= > <> !=
BETWEEN
Comparison operator LIKE:
test=> SELECT * FROM friend test-> WHERE firstname LIKE 'D%' test-> ORDER BY firstname;
LIKE 'D%' means begins with D
LIKE '%D%' means contains a D
LIKE '_D%' means has D in second position ("_" matches a single
character)
LIKE 'D%e%" means begins with D and contains e
LIKE 'D%e%f%' means begins with D, contains e, then f
NOT LIKE 'D%' means begins with non-D
Regular expression operators are also valid, and can be
preceded by operators:
* means do case-insensitive regex
! means what doesn't match the regex
|* combines the two
Special characters to use in regexes:
^ means start
$ means end
. means any single character
[ccc] means a set of characters (c's being placeholders for
actual chars)
^[ccc] means whatever doesn't match the set of characters
[c-c] means a range of characters
^[c-c] means whatever doesn't match the range of characters
? means zero or one of the previous character, repeated
* means zero or more of the previous characters
+ means one or more of the previous characters
| means OR operator
Because CHAR fields have fixed length, they are padded to end with trailing spaces, as needed. Thus, searches must often test for possible trailing spaces, e.g.,
test=> -- firstname contains a non-S character test=> SELECT * FROM friend test-> WHERE firstname ^'[S]'
Any field not assigned a value gets the special value NULL. The only comparison operator that will match a NULL-value field is "WHERE foo IS NULL": Any other comparison will fail. (One can use "WHERE foo IS NOT NULL", if applicable.)
Aggregation operators:
AVG, COUNT, SUM, MIN, MAX... Return value based on using the operator against all rows meeting the WHERE clause. NULL field values are ignored, unless all values in a column are NULL, in which case, so is the aggregation value. Exception: The COUNT(*) operator counts NULLs, because it looks at entire rows. By contrast, COUNT(colname) does ignore NULLs.
test-> SELECT AVG(PRICE) AS AVG_PRICE FROM PART;
Aggregation on groups:
Groups are partitions of tables (GROUP BY followed by attributes to satisfy). E.g.,
test=> SELECT state, COUNT(*) test-> FROM friend test-> GROUP BY state; state | count -------+------- MA | 2 MD | 1 NJ | 2 PA | 1 (4 rows)
HAVING clause is an aggregation operator, too. Specifies that a conditional test should be performed on aggregate values. E.g., which are the states that have more than one friend?
test=> SELECT state, COUNT(*) test-> FROM friend test-> GROUP BY state test-> HAVING COUNT(*) > 1 test-> ORDER BY state; state | count -------+------- MA | 2 NJ | 2 (2 rows)
The \da command displays all aggregate operators.
CREATE statement establishes and declares properties of a new table. Example:
test=> CREATE TABLE friend ( test(> firstname CHAR(15), test(> lastname CHAR(20), test(> city CHAR(15), test(> state CHAR(2), test(> age INTEGER test(> );
There are many other column types, other than those two. Common ones are VARCHAR(length) = variable-length character string, FLOAT = +/- 2 x 10^9 floating-point number, NUMERIC(precision, decimal) = number with user-specified precision and number of decimal places, DATE, TIME, and TIMESTAMP = date + time.
Other more-complex types, not covered here, include TABLE (multidimensional arrays), and binary large objects ("BLOBs"). There's no distinct BLOB data type, but there are lo_import (load-import) and lo_export operators to handle them.
Filling tables with default values and timestamps: DEFAULT and
TIMESTAMP (current date + time) are operators best illustrated
by an example:
test=> CREATE TABLE account ( test(> name CHAR(20), test(> balance NUMERIC(16,2) DEFAULT 0, test(> active CHAR(1) DEFAULT 'Y', test(> created TIMESTAMP DEFAULT CURRENT_TIMESTAMP test(> );
Every table automatically gets a normally-not-show column called "oid" (object ID), storing a numeric row-counter value (type OID) that uniquely identifies a row. OIDs do not (normally) survive dump/restore, but you can include them with a command-line flag Nor can you reset them to desired values if (e.g.) you delete and replace a row. An OID will always be globally unique across a PostgreSQL installation. Thus, within a table, there will be gaps. Use a separate row-numbering field using a "sequence" operator (covered later), if you want to eliminate gaps.
INSERT statement adds a row (cf. DELETE, UPDATE). Example:
test=> INSERT INTO friend VALUES ( test(> 'Mike', test(> 'Nichols', test(> 'Tampa', test(> 'FL', test(> 19 test(> );
Note carefully the use of single-quotation marks for character elements, never double. Spacing, newlines, and capitalisation outside of quotation marks are ignored. You must make sure you line up field order with the table definition.
To embed a single-quotation character in a string, either escape it with backslash or double the single-quotation character.
'O\'Donnell' 'O''Donnell'
WHERE clause: Used in SELECT, UPDATE, DELETE to limit which
rows you're interested in, by some condition.
test=> SELECT city, state FROM friend WHERE age >= 21;
ORDER BY clause for a SELECT statement sorts its display output. Can specify multiple sort keys.
test=> SELECT * FROM friend ORDER BY city,state;
The underlying table isn't changed at all.
DESC operator = descending order. Ascending is ORDER's default.
test=> SELECT * FROM friend ORDER BY age DESC;
Put any ORDER clause after any WHERE clause.
CASE clause implements multiple IF-THENs, useful for inserting
conditional strings on the fly into SELECT results:
test=> SELECT firstname, test-> age, test-> CASE test-> WHEN age >= 21 THEN 'adult' test-> ELSE 'minor' test-> END test-> FROM friend test-> ORDER BY firstname;
DISTINCT clause eliminates duplicate rows prior to display:
test=> SELECT DISTINCT state FROM friend ORDER BY state;
Functions take some arguments, return a single value. \df displays functions available (and their arguments). \dd command displays detail about an individual function or group of functions.
SET command changes PostgreSQL global environment variables for the current session. (E.g., SET DATESTYLE TO ISO; SET TIMEZONE TO PST8PDT;)
DROP statement destroys tables. (Caution: It really deletes
them, not just puts them away. Cf DELETE statement, which
deletes only specified rows, not entire tables.)
test=> DROP friend
DELETE statement removes rows.
test=> DELETE FROM friend WHERE lastname = 'Barnes';
Caution: The statement "DELETE FROM friend" with no WHERE clause will delete all rows.
UPDATE statement edits existing rows.
test=> UPDATE friend SET age = 20 WHERE firstname = 'Mike';
Caution: Again, without the WHERE clause, all rows would be updated.
Comments: Comments in lines of (PostgreSQL's) SQL can take
either of two forms:
test=> -- a single line comment test=> /* a multiline test*> comment */
Primary key means a table's unique identifier. Foreign key is a column containing data that is unique in a different table that is also of interest.
Sequences are autoincrementing counters, unique within a given
table. You must use the CREATE SEQUENCE command to make
them:
test=> CREATE SEQUENCE functest_seq;
Related functions:
nextval('name') Returns the next available sequence number,
and updates the counter
currval('name') Returns the sequence number from the previous
nextval() call
setval('name', newval) Sets the sequence number counter to the
specified value
Note that you're not dealing with table manipulation, in invoking those three functions. Nextval returns ever-increasing values. Curval repeats the prior return value, leaving the counter untouched. Setval sets the counter to a specific value.
test=> CREATE SEQUENCE customer_seq; test=> CREATE TABLE customer ( test(> customer_id INTEGER DEFAULT nextval('customer_seq'), test(> name CHAR(30) test(> ); test=> INSERT INTO customer VALUES (nextval('customer_seq'), 'Bread Makers'); test=> INSERT INTO customer (name) VALUES ('Wax Carvers'); test=> INSERT INTO customer (name) VALUES ('Pipe Fitters'); test=> SELECT * FROM customer; customer_id | name -------------+-------------------------------- 1 | Bread Makers 2 | Wax Carvers 3 | Pipe Fitters (3 rows)
(Default value is used in an INSERT wherever it's not supplied specifically.)
Support variables that are auto-provided by PostgreSQL:
CURRENT_DATE current date CURRENT_TIME current time CURRENT_TIMESTAMP both CURRENT_USER username who's connected to database
Transactions:
Grouping statements into a batched transaction, between BEGIN WORK and COMMIT WORK statements, performs the group as a single operation. If the COMMIT WORK has not been run, it can be backed out using ROLLBACK WORK.
Only committed transactions are visible to users. Statements comprising a partially completed WORK block are invisible to them.
Conflict management:
Coder defines what happens when user A starts a transaction, and user B commits it while the first is underway: There are different isolation levels.
Default isolation level, SET TRANSACTION ISOLATION LEVEL READ COMMITTED, lets you see other commits while your transaction is open. Automatic default is exclusive lock (I think).
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE prevents the current transaction from seeing commits made by other transactions. Also, an uncompleted transaction on the same row as a committed one automatically rolls back.
Alternatively, use SELECT FOR UPDATE instead of SELECT: This causes the row to remain locked until the UPDATE statement following.
Alternatively, gain even greater programmatic control by using the LOCK command (not detailed here).
Indexes:
Default of reading entire tables to find relevant rows is very slow. If a column is frequently used in WHERE or ORDER BY clauses, you should create an index for it.
test=> CREATE INDEX customer_custid_idx ON customer (customer_id);
It will be used automatically to find rows quickly.
Indexes can be sorted on multiple key columns:
test=> CREATE INDEX customer_age_sex ON customer (age, sex)
Delete indexes using the DROP INDEX command.
LIMIT clause says how many rows to return. LIMIT...OFFSET says
how far from the end to return that many rows.
test=> SELECT customer_id FROM customer ORDER BY customer_id LIMIT 3; test=> SELECT customer_id FROM customer ORDER BY customer_id LIMIT 3 OFFSET 997;
Cursor:
Place where return values of a SELECT are stored, to be retrieved individually as needed. Not covered further here.
COPY...FROM does bulk import from external file.
COPY...TO does bulk export to external file
test=> COPY copytest TO '/tmp/copytest.out'; test=> COPY copytest FROM '/tmp/copytest.out';
File format is tab-delimited, one record per line. Tab can be changed to something else using USING DELIMITERS.
test=> COPY copytest TO '/tmp/copytest.out' USING DELIMITERS '|';
Ditto for COPY...FROM.
Note that the file specified for COPY can be stdin or stdout. Otherwise, always use full pathnames.
A different character for NULL can be specified using WITH NULL AS.
Functions and Triggers = stored procedures.
You can create functions for PostgreSQL in C and other languages. Also, can create them in SQL and store them: CREATE FUNCTION, DROP FUNCTION. (Not further detailed, here.)
Trigger is a SQL query to be executed when a row is modified, to replace that row. For its internal purposes, keywords "new" and "old" stand for the rows to be written and replaced, respectively. Or a trigger may execute when a row is inserted. Or a trigger may be executed when a row is deleted.
test=> CREATE TRIGGER trigger_statename test-> BEFORE INSERT OR UPDATE test-> ON statename test-> FOR EACH ROW test-> EXECUTE PROCEDURE trigger_insert_update_statename();
(The function trigger_insert_update_statename would be declared separately.)
GRANT and REVOKE
Upon table creation, by default only its creator has any read or write access. Table's creator/owner can change this using GRANT and REVOKE:
test=> CREATE TABLE permtest (col INTEGER); CREATE test=> -- now only the owner can use permtest test-> test=> GRANT SELECT ON permtest TO meyers; CHANGE test=> -- now user 'meyers' can do SELECTs on permtest test=> test=> GRANT ALL ON permtest TO PUBLIC; CHANGE test=> -- now all users can perform all operations on permtest test=>
Rights SELECT , UPDATE , DELETE , RULE , and ALL can be GRANTed to named users or to PUBLIC. (REVOKE syntax is similar, not covered here.)
VIEWs:
Abstract objects (pseudo-tables) to present parts of one or more real table in selective, specified manner. Has its own access permissions. Implemented internally as stored SELECT statements.
test=> CREATE VIEW customer_ohio AS test-> SELECT * test-> FROM customer test-> WHERE state = 'OH'; CREATE 18908 1 test=> test=> -- let sanders see only Ohio customers test=> GRANT SELECT ON customer_ohio TO sanders; CHANGE test=> test=> -- create view to show only certain columns test=> CREATE VIEW customer_address AS test-> SELECT customer_id, name, street, city, state, zipcode, country test-> FROM customer; CREATE 18909 1 test=> test=> -- create view that combines fields from two tables test=> CREATE VIEW customer_finance AS test-> SELECT customer.customer_id, customer.name, finance.credit_limit test-> FROM customer, finance test-> WHERE customer.customer_id = finance.customer_id; CREATE 18910 1
Delete a view using DROP VIEW. You can't INSERT, UPDATE, or DELETE on a VIEW -- but you can control access using "rules".
Rules:
Control what effects a SELECT, INSERT, UPDATE, or DELETE may have on specified tables, or on specified rows of specified tables. Managed using the CREATE_RULE and DROP_RULE commands. Watch out for infinite loops.
test=> -- this prevents any INSERTs into new table ruletest: test=> CREATE TABLE ruletest (col INTEGER); CREATE test=> CREATE RULE ruletest_insert AS -- rule name test-> ON INSERT TO ruletest -- INSERT rule test-> DO INSTEAD -- DO INSTEAD-type rule test-> NOTHING; -- ACTION is NOTHING CREATE 18932 1 test=> INSERT INTO ruletest VALUES (1); test=> SELECT * FROM ruletest; col ----- (0 rows)
Inheritance:
Parent-child relationship between tables, whereby the child includes all columns of its parent, plus others.
test=> CREATE TABLE parent_test (col1 INTEGER); CREATE test=> CREATE TABLE child_test (col2 INTEGER) INHERITS (parent_test); CREATE
If you subsequently refer to the parent table with an asterix suffix on its name, results will be based on fields common to the parent and all children.
Starting v. 7.1, the asterisk can be omitted: Inheritance is used by default, and can be suppressed using the ONLY keyword tablename modifier.
Administration:
VACUUM command purges "expired" rows (replaced, marked as unused). Run nightly. (Exclusive-locks the named table.) VACUUM with no parameter will work on all tables.
Users:
Only initial user is superuser, default name "postgres". Others must be created and GRANTed [/REVOKEd] access permissions. Usernames need not correspond to the system user database.
From command line, "createuser" can add users and optionally assign passwords, and "dropuser" can delete users. In SQL, use CREATE USER, ALTER USER, and DROP USER commands:
test=> CREATE USER demouser2; CREATE USER test=> ALTER USER demouser2 CREATEDB; ALTER USER test=> CREATE GROUP demogroup WITH USER demouser1, demouser2; CREATE GROUP test=> CREATE TABLE grouptest (col INTEGER); CREATE test=> GRANT ALL on grouptest TO GROUP demogroup; CHANGE test=> \connect test demouser2 You are now connected to database test as user demouser2. test=> \q
Creating/deleting databases:
From command line, PostgreSQL's superuser and any users given adequate access can use "createdb" and "dropdb". In SQL, use CREATE DATABASE and DROP DATABASE commands.
Access rights:
Controllable based on databasee name, username, TCP/IP address -- via the data/pg_hba.conf file. Unless PostgreSQL starts with -i flag, access is open only to local users. User passwords go in table pg_shadow. Several types of entries in pg_hba.conf:
- >Local: can be set to trust (default), password, crypt, or
reject.
- Host and Hostssl: Foregoing options, plus ident (identd AKA auth is weak network authentication), krb4, and krb5.
Examples:
local all trust host all 127.0.0.1 255.255.255.255 trust
The above is default pg_hba.conf contents.
host all 192.168.34.0 255.255.255.255 crypt host all 192.168.98.0 255.255.255.255 password finance
The latter line says this subnet's users
password-authenticate against
table "finance" holding username/password info.
host sales 192.168.7.12 255.255.255.255 ident host sales 192.168.7.64 255.255.255.255 ident support
Concerns database sales. Latter line says not only use identd authentication, but also look up a mapping of network usernames to PostgreSQ ones in table "support".
Backup/Restore:
If PostgreSQL isn't running, can use system backup tools. (Database files aren't a moving target.) If it is running, must use pg_dumpall (all databases) or pg_dump (specified database) to make snapshot. The dump (snapshot) file has just the ASCII SQL commands that specify the table contents.
To restore, CREATE the table, then reload it directly using psql:
$ psql newtest < /tmp/test.dump
List of all tables of system internals (data types, functions , operators , databases, users, and groups):
pg_aggregate aggregates pg_attribute columns pg_class tables pg_database databases pg_description comments pg_group groups pg_index indexes pg_log transaction status pg_operator operators pg_proc functions pg_rewrite rules (not covered here) and views pg_shadow users pg_trigger triggers pg_type types pg_user a view based on pg_shadow that omits the password field
Pitfalls:
- Letter case: PostgreSQL isn't case-sensitive except for strings in quotation marks.
- Statements must end with semicolon or backslash-g (go). Newlines aren't significant.
psql special characters:
Backslash-r resets (erases) query buffer.Backslash-? prints all backslash commands.
Backslash-h prints all SQL commands.
Backslash-q quits.
Backslash-d [tablename] gets table description. Default is info on all tables.
Backslash-x enables expanded-display mode, in which fields are displayed 90 degrees rotated from usual, to prevent too-wide lines.
Backslash-& means &, which by itself is reserved for LIKE pattern matches.
Backslash-$ means $, which by itself is reserved for regex matches (EOL). (Generally, backslash-character escapes character's reserved role.)
Backslash-N means NULL.
Notes from installation:
Database startup:
/usr/lib/postgresql/bin/postmaster -D
/var/lib/postgres/data
or
/usr/lib/postgresql/bin/pg_ctl -D /var/lib/postgres/data -l
logfile start
/usr/bin/createdb: create a database
/usr/bin/createuser: enable other users to connect to a
database
Configuration files /etc/postgresql/postmaster.conf, postgresql.conf, postgresql.env. Latter is read into environment for user postgresql. pg_hba.conf file is parsed for database access rights.
"postmaster" is name of main binary, which handles all access. It spawns session instances of the database engine binary, "postgres". Optional launch/shutdown utility for postmaster: pg_ctl.
Postgresql defaults to fsync() on every record write operation. Lowers performance; increases data safety.
Subdirectories:
- bin — Command-line programs, such as psql .
- data — Configuration files and tables shared by all
databases.
- data/base — A subdirectory for each database. Use du and ls
commands to
display disk space used by databases, tables, index.
- doc — Documentation.
- include — Header files for various programming languages.
- lib — Libraries for various programming languages. Also,
database init files, prototype sample configuration files usable in
/data.
- man —Manpages.
Exercises:
1. Create a 20-or-so item address book in a PostgreSQL table. Display it sorted on each field. Create INDEX object for one or more fields (e.g., ZIP or postal code). Create a view that includes just the first and last names.
2. Dump the address book database to a dump file, including OIDs. Remove all of PostgreSQL. Reinstall it. Reload your address book, including OIDs, and verify that it's exactly as before. Note the dumpfile's syntax.
Further Reading:
PostgreSQL: Introduction and Concepts, by Bruce Momjian
http://www.ca.postgresql.org/docs/aw_pgsql_book/
Grateful acknowledgement is hereby given to Mr. Momjian's text, from which I have directly taken almost all of my examples, above.
Practical PostgreSQL, by John Worsley and Joshua Drake (book in
progress),
http://stage.linuxports.com/projects/postgres/book1/htm
Silicon Tao's PostgreSQL Tutorial
http://www.silicontao.com/ProgrammingGuide/PostgreSQL/
Several books from Rich Morin at Prime Time Freeware
http://ptf.com/ptf/dossier/sets/Post.shtml
Techniques for upgrading from Oracle to PostgreSQL
http://openacs.org/doc/openacs/html/oracle-to-pg-porting.html