Updating a Postgres Schema
It is easy to edit the
.sql
files which specify the tables and other schema entries for an application. In development, it is even easy to drop and re-create the schema from these files. But when it comes time to apply the changes to an application in production, it can be a bit trickier. This is especially so if the application provides tables or other objects directly to other applications; in this case, dumping the data, dropping and re-creating the schema, and restoring the data may not work.
Instead, what we do is write an update script, which is just an SQL file which issues appropriate SQL commands to change the existing tables to match the schema creation file. I believe there are automatic tools for doing this but I haven’t investigated them. This is a job I want to understand how to do manually before I start trusting automation—the whole point is to update a production schema in place, so it is important that one understands
exactly what the automatic tool is going to do.
Starting Out
The first thing to do is to dump the schema of the production schema. So once we have set our environment variables to cause us to connect to the production database, we use
pg_dump
to get the existing schema and save it in a file:
pg_dump --schema=uwdata --schema-only >old_uwdata.dump
Now we set our environment variables back to our usual values so the rest of the work connects to the development database. The rest of the work is done using exclusively the development database until we have a migration script we are confident is correct.
Make sure you don’t set your connection information to connect to the production database while experimenting, otherwise you will need
PostgresBackupRecovery at some point.
Basic Idea
We want to find the differences between our updated schema and the schema actually in use on the production server. The
diff
tool is nice for finding differences between files, but it performs a fairly unsophisticated textual diff—as far as it is concerned, every file is an sequence of atoms (lines). We need to translate the schemas into a form which
diff
can handle.
pg_dump
almost does this, but it can put items out of order, which typically isn't a real difference but will certainly be reported as one by
diff
.
The solution (which is not quite complete) is to use
diff
only on the result of using
pg_dump
on a schema created by loading the output of
pg_dump
. We need a script to obtain an appropriate schema dump based on our version-controlled schema files, and another script to obtain an appropriate schema dump based on the production schema, adjusted by our proposed migration script.
Example Scripts
The schema names and file names will have to be adjusted for specific conditions. Make sure that environment variables are set for the development instance before running these, otherwise you will need
PostgresBackupRecovery.
Here is an example script to get the version-controlled target schema:
#!/bin/bash
psql <<EOF
set role dba;
drop schema uwdata cascade;
\i init_uwdata.sql
EOF
pg_dump --schema=uwdata --schema-only >fresh_uwdata.dump
psql <<EOF
set role dba;
drop schema uwdata cascade;
\i fresh_uwdata.dump
EOF
pg_dump --schema=uwdata --schema-only >fresh_uwdata_reloaded.dump
Here is a script to test the migration and check the results:
#!/bin/bash
psql <<EOF
set role dba;
drop schema uwdata cascade;
\i old_prod_uwdata_reloaded.dump
\i migrate_regdata.sql
EOF
pg_dump --schema=uwdata --schema-only >migrated_regdata.sql
diff migrated_regdata.sql fresh_uwdata_reloaded.dump
diff migrated_regdata.sql fresh_uwdata_reloaded.dump | wc
The
old_prod_uwdata_reloaded.dump
file is obtained by loading
old_uwdata.dump
, generated earlier, and dumping using
pg_dump
. This is the same as the technique used to obtain
fresh_uwdata_reloaded.dump
in the first script.
Deficiencies
I know of the following differences that cannot be resolved with this technique:
- Fields cannot be re-ordered. When an
ALTER TABLE
command adds a field, it does so as the last field in the table. Therefore, fields added to schema files need to be added as the last field also in order to match.
- Permissions seem to be dumped in an order that is related to the order in which the permissions were assigned. Typically this appears in the
diff
output as re-ordered lines, which are fairly easy to recognize as long as the total number of them is manageable.
--
IsaacMorland - 14 Jul 2009