It is easy to edit the .sql
files which specify the tables and other schema entries for an application. In development, it is even easy to drop and re-create the schema from these files. But when it comes time to apply the changes to an application in production, it can be a bit trickier. This is especially so if the application provides tables or other objects directly to other applications; in this case, dumping the data, dropping and re-creating the schema, and restoring the data may not work.
Instead, what we do is write an update script, which is just an SQL file which issues appropriate SQL commands to change the existing tables to match the schema creation file. I believe there are automatic tools for doing this but I haven‘t investigated them. This is a job I want to understand how to do manually before I start trusting automation—the whole point is to update a production schema in place, so it is important that one understands exactly what the automatic tool is going to do.
The first thing to do is to dump the schema of the production schema. So once we have set our environment variables to cause us to connect to the production database, we use pg_dump
to get the existing schema and save it in a file:
pg_dump --schema=uwdata --schema-only >old_uwdata.dump
Now we set our environment variables back to our usual values so the rest of the work connects to the development database. The rest of the work is done using exclusively the development database until we have a migration script we are confident is correct.
We want to find the differences between our updated schema and the schema actually in use on the production server. The diff
tool is nice for finding differences between files, but it performs a fairly unsophisticated textual diff—as far as it is concerned, every file is an sequence of atoms (lines). We need to translate the schemas into a form which diff
can handle. pg_dump
almost does this, but it can put items out of order, which typically isn't a real difference but will certainly be reported as one by diff
.
The solution (which is probably only partial) is to use diff
only on the result of using pg_dump
on a schema created by loading the output of pg_dump
.
-- IsaacMorland - 14 Jul 2009