PostgresSchemaUpdate < CF

CF Web>Database>PostgreSQL>PostgresSchemaUpdate (2009-07-20, IsaacMorland) (raw view)
---+ Updating a Postgres Schema

It is easy to edit the =.sql= files which specify the tables and other schema entries for an application.  In development, it is even easy to drop and re-create the schema from these files.  But when it comes time to apply the changes to an application in production, it can be a bit trickier.  This is especially so if the application provides tables or other objects directly to other applications; in this case, dumping the data, dropping and re-creating the schema, and restoring the data may not work.

Instead, what we do is write an update script, which is just an SQL file which issues appropriate SQL commands to change the existing tables to match the schema creation file.  I believe there are automatic tools for doing this but I haven’t investigated them.  This is a job I want to understand how to do manually before I start trusting automation—the whole point is to update a production schema in place, so it is important that one understands *exactly* what the automatic tool is going to do.

---++ Starting Out

The first thing to do is to dump the schema of the production schema.  So once we have set our environment variables to cause us to connect to the production database, we use =pg_dump= to get the existing schema and save it in a file:

<verbatim>
pg_dump --schema=uwdata --schema-only >old_uwdata.dump
</verbatim>

Now we set our environment variables back to our usual values so the rest of the work connects to the development database.  The rest of the work is done using exclusively the development database until we have a migration script we are confident is correct.

Make sure you don’t set your connection information to connect to the production database while experimenting, otherwise you will need PostgresBackupRecovery at some point.

---++ Basic Idea

We want to find the differences between our updated schema and the schema actually in use on the production server.  The =diff= tool is nice for finding differences between files, but it performs a fairly unsophisticated textual diff—as far as it is concerned, every file is an sequence of atoms (lines).  We need to translate the schemas into a form which =diff= can handle.  =pg_dump= almost does this, but it can put items out of order, which typically isn't a real difference but will certainly be reported as one by =diff=.

The solution (which is not quite complete) is to use =diff= only on the result of using =pg_dump= on a schema created by loading the output of =pg_dump=.  We need a script to obtain an appropriate schema dump based on our version-controlled schema files, and another script to obtain an appropriate schema dump based on the production schema, adjusted by our proposed migration script.

---++ Example Scripts

The schema names and file names will have to be adjusted for specific conditions.  Make sure that environment variables are set for the development instance before running these, otherwise you will need PostgresBackupRecovery.

Here is an example script to get the version-controlled target schema:

<verbatim>
#!/bin/bash

psql <<EOF
set role dba;
drop schema uwdata cascade;
\i init_uwdata.sql
EOF

pg_dump --schema=uwdata --schema-only >fresh_uwdata.dump

psql <<EOF
set role dba;
drop schema uwdata cascade;
\i fresh_uwdata.dump
EOF

pg_dump --schema=uwdata --schema-only >fresh_uwdata_reloaded.dump
</verbatim>

Here is a script to test the migration and check the results:

<verbatim>
#!/bin/bash

psql <<EOF
set role dba;
drop schema uwdata cascade;
\i old_prod_uwdata_reloaded.dump
\i migrate_regdata.sql
EOF

pg_dump --schema=uwdata --schema-only >migrated_regdata.sql

diff migrated_regdata.sql fresh_uwdata_reloaded.dump
diff migrated_regdata.sql fresh_uwdata_reloaded.dump | wc
</verbatim>

The =old_prod_uwdata_reloaded.dump= file is obtained by loading =old_uwdata.dump=, generated earlier, and dumping using =pg_dump=.  This is the same as the technique used to obtain =fresh_uwdata_reloaded.dump= in the first script.

---++ Deficiencies

I know of the following differences that cannot be resolved with this technique:

   * Fields cannot be re-ordered.  When an =ALTER TABLE= command adds a field, it does so as the last field in the table.  Therefore, fields added to schema files need to be added as the last field also in order to match.
   * Permissions seem to be dumped in an order that is related to the order in which the permissions were assigned.  Typically this appears in the =diff= output as re-ordered lines, which are fairly easy to recognize as long as the total number of them is manageable.

-- Main.IsaacMorland - 14 Jul 2009
Topic revision: r4 - 2009-07-20 - IsaacMorland
Information in this area is meant for use by CSCF staff and is not official documentation, but anybody who is interested is welcome to use it if they find it useful.
Other Webs
My links
- People
- CERAS
- WatForm
- Tetherless lab
- Ubuntu Main.HowTo
- eDocs
- RGG NE notes
- RGG
- CS infrastructure
- Grad images
Edit