PostgreSQL
2019-02-17 (updated 2020-05-10)
PostgreSQL is a database engine. It has an excellent manual.
This page should be valid for PostgreSQL 12.
Setting up a private instance
Sometimes it's desirable to have a private PostgreSQL instance (for example run from the user's home directory) instead of a database in a centralized (administrator managed) instance. First, we let PostgreSQL create a directory for itself:
initdb <pgdata>
This could be anywhere the user has write permissions.
Before we start the daemon, we need to ensure that nobody else will be able to connect to it.
This is done by modifying <pgdata>/pg_hba.conf so that the only uncommented line is
local all <user> trust
At this point we may as well configure the network settings too.
In <pgdata>/postgresql.conf, set the following:
listen_addresses = '0.0.0.0'
port = <port>
Depending on your circumstances, you may want something more restrictive for listen_addresses, and port should be something that nobody else is using on this machine. If the default doesn't suit you, feel free to also set max_connections.
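For a small private instance the default of 100 is usually more than enough; as a purely illustrative value, a lower limit would look like this in <pgdata>/postgresql.conf:
max_connections = 20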
Then it should be possible to start it up:
pg_ctl -D <pgdata> start
Now we can configure it for secure access over the network. First, we connect locally without authenticating using
psql -p <port> postgres
Then
postgres=# SET password_encryption = 'scram-sha-256';
postgres=# ALTER USER <user> WITH PASSWORD '<password>';
To allow <user> to log in with <password>, we add
host <dbname> <user> 0.0.0.0/0 scram-sha-256
to <pgdata>/pg_hba.conf for each <dbname> we'll be using. You may also choose to use all instead of <dbname>, but that will grant access to the postgres database over the network as well, which might not be something you want.
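As an illustration (the user and database names here are hypothetical), a finished pg_hba.conf for a user alice with two databases might contain only these lines:
local all alice trust
host app_db alice 0.0.0.0/0 scram-sha-256
host test_db alice 0.0.0.0/0 scram-sha-256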
Now we can restart it with the new settings:
pg_ctl -D <pgdata> restart
You should then be able to create a database using
createdb -p <port> <dbname>
and access it with
psql -p <port> <dbname>
For remote access over the network, simply use
psql -h <host> -p <port> <dbname>
To run the daemon in the foreground, you can run
postgres -D <pgdata>
instead of pg_ctl.
Socket directory write permissions
In some cases, the user might not have write permissions in the default socket directory:
FATAL: could not create lock file "/var/run/postgresql/.s.PGSQL.<port>.lock": Permission denied
If that's the case, you can ask it to put the socket files in <pgdata> by setting
unix_socket_directories = '.'
in <pgdata>/postgresql.conf.
Then commands that require the port will also need you to add -h <pgdata> with a full path to <pgdata>, including a leading slash.
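For example, assuming <pgdata> is /home/<user>/pgdata (a hypothetical location), the earlier commands would become:
psql -h /home/<user>/pgdata -p <port> <dbname>
createdb -h /home/<user>/pgdata -p <port> <dbname>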
Tuning
To change a setting for a single session, use SET or set_config.
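As a sketch, both of the following raise work_mem for the current session only (the parameter and value are just examples); set_config takes the setting name, the new value, and a flag that restricts the change to the current transaction when true:
SET work_mem = '64MB';
SELECT set_config('work_mem', '64MB', false);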
Changes to postgresql.conf can be loaded globally into a running database server with pg_reload_conf:
SELECT pg_reload_conf();
Asynchronous commit
If some data loss is acceptable, then asynchronous commit can be used to speed up transactions.
When synchronous_commit is set to off, the database will not wait for each transaction to be written to disk.
In the event of a database crash, some transactions that were reported to have been committed will disappear, as if they were rolled back.
The application that's using the database could end up in an inconsistent state if other actions (external to the database) have been taken on the assumption that committed transactions were actually committed. However, when the database storage is very slow (e.g. network-mounted), this can make transactions faster by several orders of magnitude.
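A minimal sketch of doing this for a single session (it can also be set per transaction, per user, or globally in postgresql.conf):
-- only this session's transactions skip the wait for the WAL flush
SET synchronous_commit = off;
BEGIN;
-- ... the usual writes ...
COMMIT;  -- returns before the transaction is durably on disk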
Query planning fetch cost
To decide how to execute a query, the database engine first builds a plan using the query optimizer.
In order to do its job properly, the optimizer needs to know how expensive various operations are, so that it can choose the most efficient plan.
If the cost of fetching a page is very high, whether it's sequential or random access, then it could make sense to set seq_page_cost and random_page_cost to be equal and large.
This will encourage the optimizer to use indexes and avoid sequential scans.
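For example, in <pgdata>/postgresql.conf (the value 10 is only illustrative; the defaults are 1.0 for seq_page_cost and 4.0 for random_page_cost):
seq_page_cost = 10
random_page_cost = 10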
Resources
Diagnostics
Slow statements
Statements that take longer than log_min_duration_statement will be logged.
This can be useful to identify candidate queries for optimization.
It's best to start with a large value and gradually bring it down to avoid flooding the logs.
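For example, to log anything that takes longer than five seconds, set this in <pgdata>/postgresql.conf (the threshold is in milliseconds; -1, the default, disables it) and then reload the configuration with pg_reload_conf:
log_min_duration_statement = 5000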
Statistics collection
The pg_stat_activity view describes the current activity of each database process, including both client backend processes and background processes.
This makes it possible to view a snapshot of what the database is doing at any given time.
For example, this provides an overview:
SELECT datname, usename, client_addr, query_start, wait_event_type, wait_event,
       state, backend_type
FROM pg_stat_activity
ORDER BY backend_type, state;
It's also possible to view the longest-running queries:
SELECT NOW() - query_start AS time, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY time DESC
LIMIT 5;
Slow COMMIT
If you have a transaction that's slow to commit, many of the usual diagnostics won't be helpful, since they'll just say that COMMIT is the slow query. However, the culprit could be a deferred constraint trigger.
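One way to narrow this down is to force deferred constraints to be checked at statement time within the transaction, so the expensive check shows up on the statement that triggers it rather than on COMMIT:
BEGIN;
SET CONSTRAINTS ALL IMMEDIATE;
-- ... the statements that were slow to commit ...
COMMIT;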
Documentation
Some useful deep links into the documentation: