Synchronizing the Database with Your Domain Model, a Magical Tale
What is the synchronization component?
One of the technically interesting features of Mendix is the synchronization component. Once you create a domain model for your application, this component of the Mendix Runtime Server maps it to a database schema in the underlying database and keeps it in sync after any modifications.
This means you can simply create entities, add attributes to them, draw associations between them and let entities inherit from others, and this component will generate all the required tables, columns, sequences and so on. And when you make changes to the domain model, these are automatically synced with the database, during which application data is kept. This saves valuable time for database administrators; however, they can still see and approve the DDL commands. The remainder of the time, they can focus on important things, rather than creating database structures and keeping them in sync with applications. In the Mendix Cloud you don’t even have to know about the database and the server it runs on.
A quick and dirty introduction to databases.
To understand this blog post, some basic knowledge of relational databases is helpful. To summarize, a database consists of tables, these tables contain columns and can contain records (also called rows) of data. Tables can also have relations to each other. There is far more to it, but for further reading, the Internet is your friend.
For this domain model the synchronization component will generate four tables: tables for the Document, Order entity and OrderLine entities and a table for the association between Order and OrderLine. For the database experts among us: A foreign key construction would have been sufficient as well for this association type; we’ll get to this later.
Additionally, a number of columns are created in the tables: one for each attribute of the entity, plus an ID column containing the unique identifier of an object. For the AutoNumber attribute in the OrderLine entity, a sequence is generated, which is an automatically increasing number for which the database guarantees uniqueness. Note that the Order entity inherits from the Document entity, meaning that if you create an Order in your application, it also has the Description attribute of Document. This is not actually a column in the Order table. We’ll explain how this works now.
How inheritance hierarchies are defined
First, some terms here. A subentity is an entity that inherits from another entity which is then called its superentity. In the example above, the Order is a subentity of Document, which is its superentity.
There are a few ways in which you can define inheritance in a database. They all have their benefits and downsides and it’s beyond the scope of this blog post to describe them all. For further reading, check out Single Table Inheritance, Concrete Table Inheritance and Class Table Inheritance. In our synchronization component we opted for the Class Table Inheritance architecture, meaning that for every Entity in your domain model a table is created, but attributes of superentities are not duplicated in all the tables of the subentities. We can usually query efficiently in Mendix, for example because of well-defined pages in the client that only need certain attributes. If you only want to see attributes of a superentity (like the Description attribute of Document in the example) then with Class Table Inheritance you do not have to query all the tables for the subentities of Document to gather the value of the Description attribute.
How associations are defined
We always render associations as junction tables (also called cross-reference tables or link tables; see http://en.wikipedia.org/wiki/Junction_table). These tables always contain two ID columns, one for each of the IDs of the objects of the entities forming both sides of the association. This is a common way of creating many-to-many associations but one-to-many associations can also be created using foreign keys. The reason we also use these junction tables in one-to-many associations is mostly simplicity, flexibility and preservation of data. When you change an association from one-to-many to many-to-many or vice versa no changes have to be made to the database structure and data is automatically kept. It also means that there are no long-running queries needed to transform a structure of foreign keys into cross tables. However, it is still possible that we will optimize by using a foreign key construction when needed in certain use cases. Using foreign keys you would not need an additional table but you could simply store the ‘foreign’ IDs (meaning the IDs of objects of another entity) within the table of an entity.
Administration of the database structure and preservation of data
Back in Mendix 2.5 and earlier we used to simply read the database structure before synchronization and compare it with the domain models in the application model. This was nice and easy but not as powerful as the system we have now. Data preservation was much harder because all you had to go on were names of database objects. So data preservation had to be done by maintaining a list of renamed entities, attributes and associations. If you then synchronized the database, it would look at the last synchronization date and check if any renames still had to be performed on that specific database. It also could not handle complex cases where changing an entity’s superentity came into play.
In Mendix 3 we completely rewrote the database synchronization algorithm. We decided to go with an administration of the domain models and how they are mapped to the underlying database structures. Of course this information then had to be kept in each database. You can spot them in the database because they have the prefix ‘mendixsystem’. For the actual mapping we have administration tables for entities, attributes, associations, indexes, index columns and sequences.
All model elements such as entities and associations have unique and immutable identifiers in the application model. What the aforementioned administration tables have in common is that they map this identifier to the actual name in the database. This means we always know which database element belongs to which model element. For example, the mendixsystem$entity table has an ID, entity_name, table_name and superentity_id column.
This works for all combinations of changing entity inheritance, renaming and deleting entities and whatever else. In addition, it also works in combination with adding/removing/changing associations, attributes and indices. Now you understand we need to have a meta administration of the database to make this all possible while preserving data.
How it works
That’s all nice, but how is this all actually done?
Note that for this piece of the article, you will need more understanding of databases than the previous short explanation. It’s safe to skip if you don’t care about the technical details.
The synchronization phase that is executed while starting up the runtime server consists of a few phases of its own:
We’ll get into these now.
Sometimes we make mistakes or have new insights that can improve things in the synchronization code and the resulting database structure. To be able to account for these, we have a few pieces of code built-in that allow us to repair mistakes of the past. The following part may sound very abstract but the example afterwards will hopefully clarify it.
The pre-analysis phase is one of those pieces of code and will figure out the changes to the database that must be executed before entering the main analysis phase. Another reason to have this phase is when we introduce new concepts and the administration tables must be updated to accommodate for these. Both the synchronization component and the pre-analysis subphase have their own versioning not directly connected to the framework version. This is one of the things that is stored in the mendixsystem$version table. Depending on the current version, the code will know what to do to achieve its goal.
An example of one of these changes is when we added indices to the actual administration tables themselves, allowing for faster querying of the administration tables and a faster execution of the synchronization component.
When there are queries coming out of this phase, control is returned to the user to ask if the queries should be performed. Then the synchronization process is restarted and this phase will not turn up any queries anymore and the analysis phase is entered.
This is the phase where the application model and the current database are compared against each other and where is decided what should be done to bring the database in sync with the domain models of your application.
This phase starts off with generating two instances of the so-called MappingModel class, which contains how entities, associations, etc. are mapped to the database. One instance is generated by analyzing the current application model (in the deployment package) and thus shows what the situation should become after synchronization. The other instance is generated by reading the previously mentioned system tables, resulting in a model of the CURRENT state in the database. These two class instances are then compared and all removals and updates are collected and saved into an object that is passed on to the next phase.
All the information the analysis phase came up with are now turned into actual SQL. SQL contains a subset of DDL commands that allow changing the structure of a database. The SQL rendered in this phase primarily consists of these DDL commands.
The main goal is to read the object that contains changes to be made to the database and render commands out of that information. Many database types do things slightly differently when it comes to altering the database structure. Because we support multiple types of databases, this is a complex setup that can, by inheritance, deal with different dialects of SQL. To name an example, there is a class called AlterColumnTypeRenderer and the PostgreSQL version is called PostgreSQLAlterColumnTypeRenderer. A DDL renderer for a certain database type can register handlers for all kinds of possible operations. However, the majority of the DDL rendering code is generic for all databases.
The result of all this is a list of commands and the user is given the option to execute them.
Another small thing this phase does is handle some additional steps for migration from previous Mendix versions. An example of such a command is adding a column to store the model version number and home.mendix.com project name to the version table, so the database can also store this information.
Unlike the pre-analysis phase, it doesn’t require execution before analysis so they will just be added to the list of synchronization commands that this phase comes up with. This is one of the reasons there are two different parts of code dealing with migration.
After the user has given the ok to execute the queries, the queries are actually executed on the database. This all happens within a transaction so that if something fails along the way, the whole set of changes are reverted. Still, it’s a good idea to backup your database before executing steps like these.
In this article I discussed the possibilities of the synchronization component and explained something of the underlying mechanism. I could go into much more detail on how all the administration tables work and for example, how identifiers are given out by the database, but I think this is enough detail for one blog. Let me know if you’re interested in more!