I was recently working on a project where we had about half a dozen developers working on an established code base. All of the developers were new to the code base, and I knew that we were going to be making a fair number of database schema and data-seeding changes in a short period of time. Each developer had their own development environment with a dedicated database (PostgreSQL). The developers on the project had their hands full learning the code base, and I didn’t want to distract them by asking them to spend a lot of time managing their development database instances.
I decided to try using Alembic to manage the database schema migrations.
I wanted something where someone just needed to grab the latest code from source control and run a single command to apply any database migrations that went along with the version of the source code they had. In addition to schema changes we had a number of planned data migrations, often things as simple as updating content or templates stored in a database table. The application accessed the database from PHP via Propel and from Java via Hibernate. We thought about using either of those for schema and data migrations but ran into a few concerns:
- I didn’t see a good way to track or even specify ‘data’ changes with the built-in support in either of those tools (we were also on legacy versions of both ORMs). Changing the schema often isn’t enough; the data changes and associated seeding are just as important.
- The production DBAs were not comfortable running hbm2ddl or a Propel migrate in production because they couldn’t see and review what was actually being changed. Solving the problem for development was important, but eventually the changes need to be deployed.
We decided to try Alembic because it had a good reputation in the PostgreSQL community and I’ve met the author of Alembic, Mike Bayer, at a number of conferences. Alembic also allowed us to write arbitrary queries for data migrations. I initially did some demonstration migrations using the Alembic/SQLAlchemy ORM syntax.
A typical migration file in this style looked something like:
"""add_car

Revision ID: 9de4d14f
Revises: 43254534
Create Date: 2014-04-30 17:10:51

"""
revision = '9de4d14f'
down_revision = '43254534'

from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column('person', sa.Column('first_car', sa.String()))

def downgrade():
    op.drop_column('person', 'first_car')
Developers were able to use this type of migration to keep their environments up to date. The migration file would get checked into source control along with the code change associated with the migration. A developer would then run
alembic -c alembic.ini upgrade head
to upgrade their database. This worked well, but I found that lots of developers needed to add things to the database and didn’t really want to learn the SQLAlchemy syntax. Most of the developers hadn’t worked much with Python, let alone SQLAlchemy. Their SQL wasn’t great, but they found it much easier to write the database migrations directly in SQL instead of in the ORM. Database migrations with Alembic done this way look something like:
"""add_car

Revision ID: 9de4d14f
Revises: 43254534
Create Date: 2014-04-30 17:10:51

"""
revision = '9de4d14f'
down_revision = '43254534'

from alembic import op

def upgrade():
    op.execute('alter table person add column first_car text')

def downgrade():
    op.execute('alter table person drop column first_car')
Data-style migrations were similarly straightforward: they would just put an INSERT or UPDATE statement in op.execute() to perform the data migration.
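To make that concrete, here is a small runnable sketch of the kind of SQL a data migration would wrap in op.execute(). It uses an in-memory SQLite database standing in for PostgreSQL, and the table and values are made up for illustration:

```python
import sqlite3

# Stand-in database; in the real project this SQL went through
# Alembic's op.execute() against PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_template (name TEXT, body TEXT)")

# Seed-style data migration: INSERT the rows the application expects.
conn.execute("INSERT INTO email_template VALUES ('welcome', 'Hello!')")

# Content-update data migration: UPDATE rows already in place.
conn.execute(
    "UPDATE email_template SET body = 'Welcome aboard!' WHERE name = 'welcome'"
)

print(conn.execute(
    "SELECT body FROM email_template WHERE name = 'welcome'"
).fetchone()[0])
```

Because each statement lives in a migration file with its own revision ID, the data change is versioned and reviewable just like a schema change.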
The team wrote approximately 40 database migrations in a little over 2 months; that works out to a database migration every few days. Once we got past the initial setup (Puppet helped) we had very few problems. Occasionally two developers would add a migration around the same time, both revising the same base version and creating a split revision line. Alembic detects this when you try to run an upgrade and errors out.
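The split-revision situation is easy to picture as a graph problem: each migration names the revision it descends from, and a split means two migrations share the same parent, leaving more than one "head". This toy function (not Alembic's actual implementation) finds the heads from (revision, down_revision) pairs:

```python
def find_heads(migrations):
    """Given (revision, down_revision) pairs, return revisions that
    no other migration descends from -- the 'heads' of the history."""
    parents = {down for _, down in migrations if down is not None}
    return sorted(rev for rev, _ in migrations if rev not in parents)

# A linear history has exactly one head...
linear = [('a1', None), ('b2', 'a1'), ('c3', 'b2')]
print(find_heads(linear))   # ['c3']

# ...but two developers both revising 'b2' create two heads,
# which is the condition Alembic errors out on during an upgrade.
split = linear + [('d4', 'b2')]
print(find_heads(split))    # ['c3', 'd4']
```

Resolving the split just meant changing one of the two migrations to descend from the other, restoring a single head.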
When it came time to apply the upgrade to production, our DBA team was able to just run ‘alembic upgrade head’ in production to update the database from the known starting state, using the same migrations that were applied in the development environments. This worked fine for us but might not be optimal. Sometimes a column would be added in one revision and then renamed, or its data type changed, a few days later in a different migration script. The production migration could have just created the column with the final name and type, but that would have required rewriting some of the migrations into a more consolidated version. In many circumstances this is probably worth doing, but we didn’t.
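The consolidation trade-off can be illustrated with a short sketch. Here SQLite stands in for PostgreSQL (this assumes a SQLite build with RENAME COLUMN support, 3.25+), and the table and column names are hypothetical; the two-step history and the consolidated version arrive at the same final schema:

```python
import sqlite3

def column_names(conn):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt, pk).
    return [row[1] for row in conn.execute("PRAGMA table_info(person)")]

# History as it actually accumulated: add, then rename a few days later.
stepwise = sqlite3.connect(":memory:")
stepwise.execute("CREATE TABLE person (id INTEGER)")
stepwise.execute("ALTER TABLE person ADD COLUMN first_car TEXT")               # migration 1
stepwise.execute("ALTER TABLE person RENAME COLUMN first_car TO initial_car")  # migration 2

# Consolidated version: one migration creating the final shape directly.
consolidated = sqlite3.connect(":memory:")
consolidated.execute("CREATE TABLE person (id INTEGER)")
consolidated.execute("ALTER TABLE person ADD COLUMN initial_car TEXT")

print(column_names(stepwise) == column_names(consolidated))  # True
```

Replaying the stepwise history keeps production identical to what every development database actually ran, which is why we left the migrations alone; consolidation buys a cleaner history at the cost of running something production-only.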