Python DB & Data Warehouse Migrations
Arkalos lets you manage schema changes in your relational database or data warehouse with simple, versioned Python migration scripts. Built on top of SQLGlot and Ibis, Arkalos supports CREATE TABLE
, ALTER TABLE
, and other SQL operations using a Pythonic syntax.
This guide shows how to write, organize, and run database and data warehouse migrations in your FastAPI or data engineering project using Arkalos.
Versioning Migrations
Arkalos organizes migrations by version subfolders, instead of storing many files in a single flat directory. This helps your team track schema changes logically alongside your app or warehouse version.
Choose a versioning strategy and stick to it:
- Semantic versioning:
0.1.0
,0.2.0
, etc. - Minimal versioning:
0.1
,0.2
- Date-based:
2025.06
for monthly migrations - Year-based:
2025
Example path:
Creating a Database Migration
To generate a new database migration file:
This creates a migration template at app/schema/migrations/db/...
:
from arkalos import DB
from arkalos.schema.database_migration import DatabaseMigration
class Migration(DatabaseMigration):
def up(self):
with DB().createTable('users') as table:
table.col('id').id()
table.col('created_at').datetime().notNull().defaultNow()
table.col('updated_at').datetime().notNull().defaultNow()
def down(self):
DB().dropTable('users')
If you create migrations manually, name the file starting with v
and use underscores: _
.
Creating a Data Warehouse Migration
To create a data warehouse migration:
This generates a file in:
Example migration:
from arkalos import DWH
from arkalos.schema.data_warehouse_migration import DataWarehouseMigration
class Migration(DataWarehouseMigration):
def up(self):
with DWH().clean().createTable('users') as table:
table.col('id').id()
table.col('created_at').datetime().notNull().defaultNow()
table.col('updated_at').datetime().notNull().defaultNow()
def down(self):
DWH().clean().dropTable('users')
Data Warehouse Layers
Most data warehouses follow a three-layer architecture:
raw
(bronze): lightly structured ingestion layerclean
(silver): normalized, validated dataBI
(gold): business-ready reporting tables
You can target these layers using the appropriate facade:
Running and Rolling Back Migrations
Run all database migrations:
Run only a specific version:
Rollback the last database migration:
Run all data warehouse migrations:
Rollback data warehouse migrations:
Alter Tables with SQLGlot and Ibis
To alter a table instead of creating it, use alterTable()
:
Table Syntax and Column Types
Start by creating or altering a table with:
Column Types
Integer:
.id()
– Primary key, auto-increment bigint.integer()
/.tinyInt()
/.smallInt()
/.bigInt()
.uInteger()
/.uTinyInt()
/.uSmallInt()
/.uBigInt()
– Unsigned
Decimal:
.decimal(total=8, places=2)
Text:
.string(length=255)
– VARCHAR.text()
– TEXT
Boolean:
.boolean()
Date & Time:
.datetime()
.date()
Column Modifiers
.notNull()
– NOT NULL.nullable()
– Allow NULLs (removes NOT NULL on ALTER).default(value)
– Static default.defaultNow()
– Use current timestamp.defaultFunction(func_name, param)
– Custom default function
Indexes
with DWH().clean().alterTable('users') as table:
table.col('email').string().notNull()
table.indexUnique('email')
.indexUnique(column_name)
– Adds a UNIQUE INDEX
Foreign Keys
For relational databases:
with DB().alterTable('posts') as table:
table.col('user_id').uBigInt().notNull()
table.foreignKey('user_id', ref_table='users', on_ref_col='id')
For data warehouses:
table.foreignKey(
'user_id',
ref_table='users',
on_ref_col='id',
ref_table_group=DWH().clean().layerName()
)
Python Query Builder (with SQLGlot ORM-style syntax)
Use Arkalos' built-in query builder to interact with your DB or DWH from notebooks, scripts, or your FastAPI app.
from arkalos import DB, DWH
from pydantic import BaseModel
import polars as pl
import datetime
class UserType(BaseModel):
id: int
name: str
email: str
is_admin: bool = False
created_at: datetime.datetime
updated_at: datetime.datetime
class InsertUserType(BaseModel):
name: str
email: str
is_admin: bool = False
db = DB()
dwh = DWH()
# Connect before running queries
dwh.connect()
# Your queries here
# Disconnect after
dwh.disconnect()
Raw SQL
dwh.executeSql("INSERT INTO users ...")
df = dwh.executeSql("SELECT ...", select=True) # Returns Polars DataFrame
Ibis Tables
df = dwh.raw().selectAll('airtable__properties')
table = dwh.raw().table('notion__projects') # ibis Table
See Ibis Tutorials for more.
Insert Records
db.insert('users', user)
# or get inserted row
user = db.insertReturning('users', user)
# or get inserted row as a full model type
db.insertReturning('users', user, UserType)
# To insert multiple records:
db.insertMultiple('users', users) # DataFrame or list of dicts