Scaling Out Real Project Database with SQL Azure Federations. Part 2: Master Data

by Pavlo Revenkov

Last time we've covered some theory about SQL Azure Federations, including what you should give a thought and what you should keep in mind when migrating. Importantly, it's not only about technology. The first thing to consider is always the database architecture, regardless of the scaling out method you choose - Federations, MySQL Cluster or anything else. The database you scale out must be always architecturally oriented.

Let's get back to our project. The subject domain of our database is personal budget management. The database structure is presented in the diagram below.

We can see that the database is pretty simple. Every system object is an entity with basic properties (Id, Name, and Description). The specific entities are Account (its legacy entities: Bank Account, Credit Card), Expense Category (its legacy entities: Budget and all child categories), and Account Transactions.

Apart from the tables, the database also contains some logic for adding new entities (implemented as stored procedures), and a couple of Views for displaying the results of typical database queries.

Below is an example of database structure:

Obviously, in a real project, the number of artifacts in a database can be significantly larger, but trying to migrate even such a small database can help identify major issues you may encounter while using SQL Azure Federations.

Analysis

Before you jump right in and migrate the database for using with SQL Azure Federations, you must define which data units in the database are logically independent and which data unit types can be distributed into different databases. Essentially, you must select the so-called federated table, which content will be divided into several databases.

If we look at the database structure, the first candidate that comes to mind is the basic Entity table. The script for creating this table looks like this:

CREATE TABLE Entity (
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[Name] NVARCHAR(MAX) NOT NULL,
[Description] NVARCHAR(MAX) NOT NULL,
)

At first glance, this table seems to be perfect for dividing, but it's not like this actually. Sure, it does not contain any foreign keys, has a pretty simple structure and the maximum number of stored records. However, according to the database logic, the data of the "legacy tables" are linked to this table. In other words, all entities created in the system have a record in the Entity table.

Let's have a look at the following example. Suppose we divide the data by the entity ID range (the ID field). Let's say all records with IDs 1-50 are stored in the first shard, 51-100 - in the second one, and so on.

A user tries to add a record to the Operations table, for example, information about buying a carton of milk. The account entity ID is, for example, 1, Expense Category ID is 6. Let's also assume that the first database already contains 50 records, so the ID of the new one must be 51, namely it must be stored in the second shard.

The query for adding new data to the table looks like this:

USE FEDERATION Entities(EntityId = 51) WITH RESET, FILTERING = OFF
GO
INSERT INTO Operation VALUES (
51, -- EntityId
1, -- AccountId
6, -- CategoryId
GETDATE(), -- Date
10, -- Amount
'USD' -- Currency
)

The query will function absolutely correctly. Let's now try to get a list of all operations for this account (ID = 1). There is a corresponding View for this in the database. Its source code looks like this:

SELECT
Account_Entity.Description AS 'Account',
Operation_Entity.Name AS 'Operation',
Operation_Entity.Description,
Operation.Amount,
Operation.Currency,
Operation.Date
FROM
Operation
INNER JOIN Entity AS Operation_Entity ON Operation.EntityId = Operation_Entity.Id
INNER JOIN Account ON Operation.AccountId = Account.Id
INNER JOIN Entity AS Account_Entity ON Account.EntityId = Account_Entity.Id
INNER JOIN Category ON Operation.CategoryId = Category.Id
INNER JOIN Entity AS Category_Entity ON Category.EntityId = Category_Entity.Id
WHERE
Account.Id = 1

We remember that the data from one account is stored in different shards. So, in order for this query to return correct results, it must be executed in every shard separately.

USE FEDERATION Entities(EntityId = 1) WITH RESET, FILTERING = OFF
GO
...
USE FEDERATION Entities(EntityId = 51) WITH RESET, FILTERING = OFF
GO
...

No need to say, this approach would have a destructive effect on the application performance. Even the smallest, simplest query must be executed twice! Therefore, the Entity table is not the right choice.

Considering the multi-tenant approach to database design, where every user works with a separate database and the data between databases does not intersect, this brings up a question. Can something similar be implemented within SQL Azure Federations? So that each shard would contain the data of one user Account? This approach would actually make a lot of sense. Speaking of business logic, it would look something like this:

Let's say several family members are using one application. Each member has a separate budget. The husband has his own bookkeeping, and the wife has her own. As a result, the husband's data (AccountId = 1) do not intersect in any way with the wife's data (AccountId = 2). In this case dividing the database into shards by Accounts table seems quite logical.

Adding a new account will result in adding another shard. Frequent operations, such as working with the list of categories, operations, etc. will not cause drop in the performance.

USE FEDERATION Accounts(AccountId = 1) WITH RESET, FILTERING = OFF
GO

After running this query, we immediately know which user data we are working with at the moment. It means, this operation is performed only once.

To summarize: we observed two methods of dividing an existing database. Now we know which field we will use to logically distribute our data into separate bases. Next time we will actually distribute data among different shards. Stay tuned and have a great week! Thank you!