Scaling Out Real Project Database with SQL Azure Federations. Part 3: Migration

by Pavlo Revenkov

Greetings to all the Internet dwellers out there, and a happy beginning of a new week! We continue migrating the database using SQL Azure Federations. As you may remember, previously we selected the table and the field we will be using to divide our database into shards. Let's do it!

Migration

We will perform the partitioning of our database using the Account table, because the data records stored in and logically connected to it do not overlap. Since we have the script for creating a database, let's try to adapt it to be used in SQL Azure Federations.

Let's assume that the database has been already created in the Windows Azure Management Portal or with the help of the SQL Server Management Studio.

Open the script for creating objects in the database.

USE xPenses
GO
IF EXISTS (SELECT name FROM sysobjects where name = N'Operation') DROP TABLE Operation
...

Pay attention that if you connect to the SQL Azure Server via SSMS to run a query, you must select the database from a drop-down list (see below).

This way you create a new federation, the data records of which are distributed by the Account ID value. Keep in mind that no tables exist in the database at this point, meaning the AccountId field is not yet bound to any data set from real tables. The field name may also differ from the name of the field in the table that will be used for the partition.

Here you can see another logical limitation of SQL Azure Federations. The field used for the distribution must have the INT, BIGINT, UNIQUEIDENTIFIER and VARBINARY type.

After creating the federation, you must select the first shard to begin inserting data, in other words, the shard that will store the data of the first account (AccountId = 1).

Now back to the script. You need to modify the creation of the Account table, so that SQL Azure knows that the data from this specific table will be distributed among the shards using the field Id.

CREATE TABLE Account (
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[EntityId] INTEGER NOT NULL FOREIGN KEY REFERENCES Entity(Id),
[Currency] NVARCHAR(3)
)

This way, a table creation script transforms into the following:

CREATE TABLE Account (
[Id] BIGINT NOT NULL,
[EntityId] INTEGER NOT NULL FOREIGN KEY REFERENCES Entity(Id),
[Currency] NVARCHAR(3),
CONSTRAINT [PK_Account] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
) FEDERATED ON (AccountId= Id)

So, what has changed? The ID field type is now BIGINT. In addition to that, you lose the ability to automatically generate a value for this field when inserting a new record. This is another SQL Azure Federations limitation. But you can still use the DEFAULT keyword. This can be useful if, for example, the ID field type is UNIQUEIDENTIFIER. In this case, you can declare the field as follows:

[Id] UNIQUEIDENTIFIERNOT NULL DEFAULT NEWID()

Now, when you insert new records into the table, you don't need to define the ID of the created record. When working with other field types, this logic must be implemented on the application level.

The next thing you should pay attention to is declaring the table's primary key. You must explicitly define that the created key will be a clustered one.

The last thing you must do is use the FEDERATED ON keyword to define that the current table will be federated. The data in it will be split by the ID field.

The Account table is created. Moving along! As you can see from the database scheme, the Account table is a parent for the Credit Card and Bank Account tables.

In other words, the Bank Account and Credit Card tables have the foreign key referencing the Account table. As the Account table is now federated, you cannot ensure referential integrity from one table to another.

This is yet another SQL Azure Federations limitation. Tables that are used for data partitioning (federated tables) cannot be referenced in other tables. You will need to remove all foreign keys from tables referencing the Account table.

Consequently, a script for creating, for example, a Credit Card table, will change from something like this:

CREATE TABLE CreditCard (
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[AccountId] INTEGER NOT NULL FOREIGN KEY REFERENCES Account(id),
[Type] NVARCHAR(MAX)
CONSTRAINT CreditCardType CHECK (
[Type] = 'Visa'
OR [Type] = 'MasterCard'
OR [Type] = 'JCB'
OR [Type] = 'AmericanExpress'),
[Number] NVARCHAR(MAX)
)

Into something like this:

CREATE TABLE CreditCard (
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[AccountId] BIGINT NOT NULL,
[Type] NVARCHAR(MAX)
CONSTRAINT CreditCardType CHECK (
[Type] = 'Visa'
OR [Type] = 'MasterCard'
OR [Type] = 'JCB'
OR [Type] = 'AmericanExpress'),
[Number] NVARCHAR(MAX)
)

The consistency of the references from the Credit Card table records to the Account table falls entirely on the application logic.

If you try running the database creation script at this point, all the tables will be created successfully, but in addition to the tables, the database also includes two procedures. The first procedure is for adding a new category and it does not require any changes, because its logic remains within the bounds of one shard.

However, the procedure for adding new accounts (AddAccount) requires some minor adjustments. See the source code of the procedure:

CREATE PROCEDURE AddAccount(
@Name NVARCHAR(MAX),
@Description NVARCHAR(MAX),
@Currency NVARCHAR(3),
@Instrument NVARCHAR(MAX),
@Type NVARCHAR(MAX),
@Number NVARCHAR(MAX)
)
AS
INSERT INTO Entity VALUES (@Name, @Description)
DECLARE @EntityId INTEGER = (SELECT Id FROM Entity WHERE Name = @Name AND Description = @Description)

INSERT INTO Account VALUES (@EntityId, @Currency)
IF (@Instrument = 'BankAccount')
BEGIN
INSERT INTO BankAccount VALUES (
(SELECT Id FROM Account WHERE Account.EntityId = @EntityId),
@Type,
@Number
) END

IF (@Instrument = 'CreditCard')
BEGIN
INSERT INTO CreditCard VALUES (
(SELECT Id FROM Account WHERE Account.EntityId = @EntityId),
@Type,
@Number
) END
GO

In reality, the changes you need to make are pretty obvious. Since you lost the ability to generate values for the ID field automatically, this logic becomes part of the application logic. You must change the procedure header:

CREATE PROCEDURE AddAccount(
@AccountId BIGINT,
@Name NVARCHAR(MAX),
@Description NVARCHAR(MAX),
@Currency NVARCHAR(3),
@Instrument NVARCHAR(MAX),
@Type NVARCHAR(MAX),
@Number NVARCHAR(MAX)
)
AS
INSERT INTO Entity VALUES (@Name, @Description)
DECLARE @EntityId INTEGER = (SELECT Id FROM Entity WHERE Name = @Name AND Description = @Description)

INSERT INTO Account VALUES (@AccountId, @EntityId, @Currency)
...
GO

Consequently, by adding records to the Account table, instead of the following code:

EXEC AddAccount 'Cash', 'Everyday cash account', 'USD', NULL, NULL, NULL

we will now receive one more parameter (ID of the created account):

EXEC AddAccount 1, 'Cash', 'Everyday cash account', 'USD', NULL, NULL, NULL

Possibly, the next logical question is whether it is possible to add a command to use the necessary federation (USE FEDERATION) to the body of the procedure. As we got the ID of the account to be created, we know which federation we are going to work with, and can directly proceed to use the necessary shard.

USE FEDERATION Accounts(AccountId = @AcccountId) WITH RESET, FILTERING = OFF
GO

Unfortunately, if you do this, SSMS will display an error message. The problem is that the AddAccount procedure is stored on a specific shard, and this means that you cannot use the USE FEDERATION operation. Apart from that, using the USE FEDERATION statement is just impossible in procedures. The code for switching federations must be placed at a "higher level".

This concludes the changes that must be made in the database creation script. You can apply them without errors. As a result, you create a single root database (federation root) and one shard (federation member).

Sharding

One last thing we have left to do is actually scaling out the database - separating the data of one account from the data of another account.

Create a separate script for this purpose:

-- Scaling out the federation
USE FEDERATION ROOT WITH RESET
GO

ALTER FEDERATION Accounts SPLIT AT (AccountId = 2)
GO

The first thing this script does is it starts using the federation root, meaning it starts working within the bounds of the xPenses database, since the information about federations (metadata) is stored there.

Next, define that you want to partition the Accounts federation, beginning with the AccountId field with the value "2". This way, the data of the first account remains on the first shard, and the data of all the other accounts is moved to the next. Keep in mind that you do not need to specify it anywhere that the table used for data partitioning is the Account table. You work only with the metadata of the xPenses database!

Well then, you run this command and... there is a possibility that when you refresh the Object Explorer window you'll see the following:

Instead of one new shard you got... three! There is actually nothing wrong with that. The reason for this is that the data stored within the bounds of the first shard is copied in accordance with the AccountId value. In other words, for the account with the ID value "1" you must copy the data to one shard, for one with the ID value "2" - to another one. Obviously, this takes time. After SQL Azure redistributes the data among the shards, you will see that you got a database with the federations metadata and two shards.

Now, if you need to split the data into three shards, for example, to move the data of the account with the ID value "3" to a separate shard, you just need to run the following command:

ALTER FEDERATION Accounts SPLIT AT (AccountId = 3)
GO

Conclusion

We have examined the process of migrating a database creation script for using it with SQL Azure Federations. As you can see, you can avoid most bottlenecks pretty easily. However, a significant part of the database logic must be moved to a "higher level". We stayed at the database level. In real projects, we strongly recommend conducting a detailed analysis of the application environment and the database architecture before commencing migration using SQL Azure Federations.