PostgreSQL
2019-02-17 (updated 2020-05-10)
PostgreSQL is a database engine. It has an excellent manual.
This page should be valid for PostgreSQL 12.
Setting up a private instance
Sometimes it's desirable to have a private PostgreSQL instance (for example run from the user's home directory) instead of a database in a centralized (administrator managed) instance. First, we let PostgreSQL create a directory for itself:
initdb <pgdata>
This could be anywhere the user has write permissions.
Before we start the daemon, we need to ensure that nobody else will be able to connect to it.
This is done by modifying <pgdata>/pg_hba.conf so that the only uncommented line is
local all <user> trust
At this point we may as well configure the network settings too.
In <pgdata>/postgresql.conf, set the following:
listen_addresses = '0.0.0.0'
port = <port>
Depending on your circumstances, you may want something more restrictive for listen_addresses, and port should be something that nobody else is using on this machine. If the default doesn't suit you, feel free to also set max_connections.
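For a small private instance the default of 100 is usually more than enough; as a purely illustrative value, a lower limit would look like this in <pgdata>/postgresql.conf:
max_connections = 20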
Then it should be possible to start it up:
pg_ctl -D <pgdata> start
Now we can configure it for secure access over the network. First, we connect locally without authenticating using
psql -p <port> postgres
Then
postgres=# SET password_encryption = 'scram-sha-256';
postgres=# ALTER USER <user> WITH PASSWORD '<password>';
To allow <user> to log in with <password>, we add
host <dbname> <user> 0.0.0.0/0 scram-sha-256
to <pgdata>/pg_hba.conf for each <dbname> we'll be using. You may also choose to use all instead of <dbname>, but that will grant access to the postgres database over the network as well, which might not be something you want.
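As an illustration (the user and database names here are hypothetical), a finished pg_hba.conf for a user alice with two databases might contain only these lines:
local all alice trust
host app_db alice 0.0.0.0/0 scram-sha-256
host test_db alice 0.0.0.0/0 scram-sha-256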
Now we can restart it with the new settings:
pg_ctl -D <pgdata> restart
You should then be able to create a database using
createdb -p <port> <dbname>
and access it with
psql -p <port> <dbname>
For remote access over the network, simply use
psql -h <host> -p <port> <dbname>
To run the daemon in the foreground, you can run
postgres -D <pgdata>
instead of pg_ctl.
Socket directory write permissions
In some cases, the user might not have write permissions in the default socket directory:
FATAL: could not create lock file "/var/run/postgresql/.s.PGSQL.<port>.lock": Permission denied
If that's the case, you can ask it to put the socket files in <pgdata> by setting
unix_socket_directories = '.'
in <pgdata>/postgresql.conf.
Then commands that require the port will also need you to add -h <pgdata> with a full path to <pgdata>, including a leading slash.
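For example, assuming <pgdata> is /home/<user>/pgdata (a hypothetical location), the earlier commands would become:
psql -h /home/<user>/pgdata -p <port> <dbname>
createdb -h /home/<user>/pgdata -p <port> <dbname>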
Tuning
To change a setting for a single session, use SET or set_config.
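As a sketch, both of the following raise work_mem for the current session only (the parameter and value are just examples); set_config takes the setting name, the new value, and a flag that restricts the change to the current transaction when true:
SET work_mem = '64MB';
SELECT set_config('work_mem', '64MB', false);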
Changes to postgresql.conf can be loaded globally into a running database server with pg_reload_conf:
SELECT pg_reload_conf();
Asynchronous commit
If some data loss is acceptable, then asynchronous commit can be used to speed up transactions.
When synchronous_commit is set to off, the database will not wait for each transaction to be written to disk.
In the event of a database crash, some transactions that were reported to have been committed will disappear, as if they were rolled back.
The application that's using the database could end up in an inconsistent state if other actions (external to the database) have been taken on the assumption that committed transactions were actually committed. However, when the database storage is very slow (e.g. network-mounted), this can make transactions faster by several orders of magnitude.
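A minimal sketch of doing this for a single session (it can also be set per transaction, per user, or globally in postgresql.conf):
-- only this session's transactions skip the wait for the WAL flush
SET synchronous_commit = off;
BEGIN;
-- ... the usual writes ...
COMMIT;  -- returns before the transaction is durably on disk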
Query planning fetch cost
To decide how to execute a query, the database engine first builds a plan using the query optimizer.
In order to do its job properly, the optimizer needs to know how expensive various operations are, so that it can choose the most efficient plan.
If the cost of fetching a page is very high, whether it's sequential or random access, then it could make sense to set seq_page_cost and random_page_cost to be equal and large.
This will encourage the optimizer to use indexes and avoid sequential scans.
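For example, in <pgdata>/postgresql.conf (the value 10 is only illustrative; the defaults are 1.0 for seq_page_cost and 4.0 for random_page_cost):
seq_page_cost = 10
random_page_cost = 10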
Resources
Diagnostics
Slow statements
Statements that take longer than log_min_duration_statement will be logged.
This can be useful to identify candidate queries for optimization.
It's best to start with a large value and gradually bring it down to avoid flooding the logs.
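For example, to log anything that takes longer than five seconds, set this in <pgdata>/postgresql.conf (the threshold is in milliseconds; -1, the default, disables it) and then reload the configuration with pg_reload_conf:
log_min_duration_statement = 5000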
Statistics collection
The pg_stat_activity view describes the current activity of each database process, including both client backend processes and background processes.
This makes it possible to view a snapshot of what the database is doing at any given time.
For example, this provides an overview:
SELECT datname, usename, client_addr, query_start, wait_event_type, wait_event,
       state, backend_type
FROM pg_stat_activity
ORDER BY backend_type, state;
It's also possible to view the longest-running queries:
SELECT NOW() - query_start AS time, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY time DESC
LIMIT 5;
Slow COMMIT
If you have a transaction that's slow to commit, many of the usual diagnostics won't be helpful, since they'll just say that COMMIT is the slow query. However, the culprit could be a deferred constraint trigger.
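One way to narrow this down is to force deferred constraints to be checked at statement time within the transaction, so the expensive check shows up on the statement that triggers it rather than on COMMIT:
BEGIN;
SET CONSTRAINTS ALL IMMEDIATE;
-- ... the statements that were slow to commit ...
COMMIT;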
Documentation
Some useful deep links into the documentation: