Introduction

Core Data is an iOS/OS X persistence framework for your model data. It allows data to be stored in XML, binary, in-memory or SQLite stores.

In the Garena iOS Team, we use Core Data extensively to persist data. It's our #1 choice for data persistence across all our apps. This blog post highlights the problems that we have encountered and some precautions we took along the way.

Let's take a look at what Core Data does internally and how it persists your data at the SQL layer.

Core Data Tables

In our stack, we use SQLite as our Core Data store type. Let us examine the SQLite file that Core Data creates.

our_db.sqlite is a production database that we use in one of our apps.

$ sqlite3 our_db.sqlite
SQLite version 3.8.5 2014-08-15 22:37:57
Enter ".help" for usage hints.
sqlite> .tables
ZBTAD                 ZBTMESSAGE           Z_PRIMARYKEY        
ZBTCHAT               ZBTUSER                             
ZBTGAMEINFO           ZBTUSERINFO  
ZBTINVITE             Z_METADATA
...

Tables in Core Data are prefixed with Z, followed by the entity name, e.g. for an entity named BTMessage, it will create a corresponding table named ZBTMessage.

If you look closely at the list, you will notice that there are two tables, Z_METADATA and Z_PRIMARYKEY. These two tables do not have a corresponding entity. They are used to store information about the Core Data stack.

1. Z_METADATA

sqlite> PRAGMA table_info(Z_METADATA);
0|Z_VERSION|INTEGER|0||1
1|Z_UUID|VARCHAR(255)|0||0
2|Z_PLIST|BLOB|0||0

PRAGMA table_info shows the columns in a table in SQLite.

sqlite> select * from Z_METADATA;
1|5B15A414-CC2D-4364-BFDD-D1234F9461B|bplist00?	
QRS_NSStoreModelVersionIdentifiers_NSPersistenceFrameworkVersion_NSStoreModelVersionHashes[NSStoreType__NSAutoVacuumLevel_ NSStoreModelVersionHashesVersionR45?#

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOP\
...

Z_UUID is the persistent store's UUID. Z_PLIST contains a serialized binary plist with lots more information (See! Who says it's bad to store a huge binary blob in a relational database row?).

Decoding the binary plist reveals, among other things, the NSStoreModelVersionHashes dictionary.

These values are hashes of the Core Data entities that we have in our database. If you change a property of an entity, the hash for that entity will change. That is also when a migration must be performed.
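If you would rather not poke at the SQLite file directly, the same metadata (including these version hashes) can be read through the public API. A minimal sketch, assuming storeURL points at the store on disk:

// Read the store metadata (the contents of Z_PLIST) without opening the store.
NSError *error = nil;
NSDictionary *metadata = [NSPersistentStoreCoordinator metadataForPersistentStoreOfType:NSSQLiteStoreType
                                                                                     URL:storeURL
                                                                                   error:&error];
NSLog(@"Version hashes: %@", metadata[NSStoreModelVersionHashesKey]);

// Compare the hashes against the current model to decide whether a migration is needed.
NSManagedObjectModel *model = [NSManagedObjectModel mergedModelFromBundles:nil];
BOOL compatible = [model isConfiguration:nil compatibleWithStoreMetadata:metadata];
NSLog(@"Needs migration: %@", compatible ? @"NO" : @"YES");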

2. Z_PRIMARYKEY

sqlite> SELECT * FROM Z_PRIMARYKEY;
Z_ENT       Z_NAME                Z_SUPER     Z_MAX     
----------  --------------------  ----------  ----------
1           BTAd                  0           0         
2           BTAdList              0           1         
3           BTCategory            0           0         
4           BTCellMetadata        0           1         
5           BTChat                0           7         
6           BTChatGroup           5           0         
7           BTChatStranger        5           0         
8           BTChatUser            5           0         
9           BTZChatClub           5           0         
10          BTGameInfo            0           0         
11          BTGameList            0           1         
12          BTGameRank            0           0         
13          BTGroup               0           8         
14          BTInvite              0           2         
15          BTInviteGroup         14          0         
16          BTInviteUser          14          0 
...

The Z_PRIMARYKEY table contains meta-information about all the entities in our Core Data model. Z_SUPER points to the parent entity's Z_ENT (0 if there is none), and Z_MAX tracks the largest primary key (Z_PK) handed out for that entity so far. The Z_ENT value is assigned in ascending order of the entity names, which has significant implications for migration that we will cover later. Note that sub-entities will also appear on this list and be assigned a corresponding Z_ENT.

Z_ENT Column

sqlite> PRAGMA table_info(ZBTMessage);
0|Z_PK|INTEGER|0||1
1|Z_ENT|INTEGER|0||0
2|Z_OPT|INTEGER|0||0
3|ZDISPLAYINDEX|INTEGER|0||0
4|ZISOUTGOING|INTEGER|0||0
5|ZISWHISPER|INTEGER|0||0
...

There is a Z_ENT column in every table. It identifies which entity a particular row in the table belongs to.
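Incidentally, this bookkeeping leaks into the public API in a small way: a saved object's NSManagedObjectID encodes the store's UUID (Z_UUID), the entity name and the row's primary key (Z_PK). A quick way to see it, assuming message is a saved BTMessage instance:

// Prints something like x-coredata://5B15A414-CC2D-4364-BFDD-D1234F9461B/BTMessage/p42,
// where the UUID matches Z_UUID and p42 corresponds to the row's Z_PK in ZBTMESSAGE.
NSLog(@"%@", [[message objectID] URIRepresentation]);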

Profiling

Before we proceed further to examine the tables in detail, we need to be familiar with the different profiling mechanisms that are available.

1. Application Argument

To profile or log Core Data fetches, you can pass the following as a launch argument to the application:

- com.apple.CoreData.SQLDebug 1

Higher debug levels produce more information, but the output can become too cluttered to be of any use.

The information that is logged can be useful, but it doesn't alert or inform the developer immediately when there is a slow query. It just throws everything into the console, and it's your job to stare at it and digest it.

2015-11-20 10:09:48.593 OurApp[3579:1652444] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZDATA, t0.ZDISPLAYINDEX, t0.ZMSGID, t0.ZREADCOUNT, t0.ZSTATUS, t0.ZTIMESTAMP, t0.ZTYPE, t0.ZUSER FROM ZBTMESSAGE t0 WHERE t0.ZTYPE = ?
 
2015-11-20 10:09:48.594 OurApp[3579:1652444] CoreData: annotation: sql connection fetch time: 0.0002s

2. Overriding executeFetchRequest:error:

We believe we can do more, so we came up with a solution to identify slow queries. Since all main queries (except for relationship faulting) go through the NSManagedObjectContext method executeFetchRequest:error:, we can swizzle it in a category on NSManagedObjectContext and perform some profiling.

+ (void)load
{
#if defined DEBUG
    // Implemented in a category on NSManagedObjectContext, so self here is the context class.
    // Requires <objc/runtime.h>.
    Method original, swizzled;

    original = class_getInstanceMethod(self, @selector(executeFetchRequest:error:));
    swizzled = class_getInstanceMethod(self, @selector(executeFetchRequest_Swizzled:error:));
    method_exchangeImplementations(original, swizzled);
#endif
}

- (NSArray *)executeFetchRequest_Swizzled:(NSFetchRequest *)request error:(NSError *__autoreleasing *)error
{
    NSString *logFormat = BTSerializeLog(request);
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    // Because the implementations are exchanged, this calls the original executeFetchRequest:error:.
    NSArray *results = [self executeFetchRequest_Swizzled:request error:error];
    CFAbsoluteTime elapsed = CFAbsoluteTimeGetCurrent() - start;

    // Flag anything slower than 100ms
    if (elapsed > 0.1) {
        BTRemoteLog(@"%@[%.0f ms]", logFormat, elapsed * 1000);
        NSParameterAssert(NO);
    }

    return results;
}

If an execution or fetch request takes longer than 100ms, it will be remotely logged. With these logs, we will be able to quickly identify slow queries.

By combining this data with our app responsiveness ping system (which actively pings the main thread and remotely logs a stack trace when the main thread freezes for more than 100ms), we are able to greatly improve user experience and ensure a lag-free experience for our users.

3. Explain Query Plan

Should we require more in-depth information about the queries that are being executed, we can use the SQLite command explain query plan.

Good query which uses the correct index:

sqlite> explain query plan select * from ZBTMessage where ZMSGID = 2;

0|0|0|SEARCH TABLE ZBTMessage USING INDEX ZBTMESSAGE_ZMSGID_INDEX (ZMSGID=?)

Slow query which scans the whole table:

sqlite> explain query plan select * from ZBTInvite where ZINVITEID = 2;

0|0|0|SCAN TABLE ZBTInvite

Like in any relational database, indexing is important. Setting up the correct indexes is even more important.
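For the record, an index like ZBTMESSAGE_ZMSGID_INDEX above comes from marking the attribute as indexed, either by ticking the Indexed checkbox in the Xcode model editor or in code when the model is built programmatically. A minimal sketch, assuming model is the NSManagedObjectModel and msgId is the attribute we want indexed:

// Programmatic equivalent of ticking "Indexed" in the model editor.
// The model must still be editable, i.e. not yet loaded by a persistent store coordinator.
NSEntityDescription *entity = model.entitiesByName[@"BTMessage"];
NSAttributeDescription *msgId = entity.attributesByName[@"msgId"]; // attribute name is illustrative
msgId.indexed = YES; // Core Data creates ZBTMESSAGE_ZMSGID_INDEX for this column when the store is built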

Setting up the correct indexes can be a problem when we have subclasses in Core Data.

Subclassing

In object-oriented programming, we are taught to use subclasses to represent inheritance. Similarly, Core Data allows you to create abstract entities and sub-entities. At Garena, we avoid using sub-entities to represent subclasses: each table should correspond to one and only one entity.

But why?

Let's look at how Core Data deals with sub-entities and subclasses.

Creating a sub-entity is straightforward in Xcode.

This is a screenshot of our BTInvite entity which has two other sub-entities, BTInviteGroup and BTInviteUser.

But if you remember the list of tables that we saw above:

$ sqlite3 our_db.sqlite
SQLite version 3.8.5 2014-08-15 22:37:57
Enter ".help" for usage hints.
sqlite> .tables
ZBTAD                 ZBTMESSAGE           Z_PRIMARYKEY        
ZBTCHAT               ZBTUSER                             
ZBTGAMEINFO           ZBTUSERINFO  
ZBTINVITE             Z_METADATA
...

This list contains only ZBTINVITE and none of the sub-entities. This is because entries of BTInviteGroup and BTInviteUser are inserted into the ZBTINVITE table, differentiated by the Z_ENT column.

If you look back at our Z_PRIMARYKEY table, you can see that Z_ENT for BTInvite, BTInviteGroup, BTInviteUser are 14, 15, 16 respectively.

sqlite> select * From Z_PRIMARYKEY where Z_NAME like 'BTInvite%';
14|BTInvite|0|2
15|BTInviteGroup|14|0
16|BTInviteUser|14|0

This is the list of indexes on ZBTINVITE, mostly generated by Core Data, except for ZBTINVITE_Z_ID_INDEX, which we created for fast retrieval via ID = ?.

sqlite> .indices 'ZBTINVITE'
ZBTINVITE_ZGROUP_INDEX
ZBTINVITE_ZUSER_INDEX
ZBTINVITE_Z_ENT_INDEX
ZBTINVITE_Z_ID_INDEX

In the ZBTInvite table:

sqlite> select * from ZBTInvite;
Z_PK        Z_ENT       Z_OPT       ZUSER       ZGROUP      Z_ID        ZMEMO
----------  ----------  ----------  ----------  ----------  ----------  ----------
1           16          4           65                      4412                                    
2           15          1                       12          2         

These two entries in ZBTInvite represent two different entities. The first row represents an entry of BTInviteUser (Z_ENT=16) while the second row represents BTInviteGroup (Z_ENT=15).

All seems reasonable and acceptable until we populate this table with 10,000 records of BTInviteUser and 10,000 records of BTInviteGroup.

Suppose we want to retrieve an entity of BTInviteUser where its ID = 4412.

This is a typical piece of code that we would write:

NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"BTInviteUser"];
fetchRequest.predicate = [NSPredicate predicateWithFormat:@"ID == 4412"];
NSArray *results = [context executeFetchRequest:fetchRequest error:nil];

Which generates the following SQL statement (simplified):

SELECT * from ZBTInvite where Z_ENT = 16 AND Z_ID = 4412;

As the table becomes populated with more and more BTInviteUser entries, we noticed that the query becomes slower and slower. This is surprising: Z_ID is unique and indexed, so the lookup should be reasonably fast.

Let us use explain query plan to check the query execution.

sqlite> explain query plan select * from ZBTInvite where Z_ENT = 16 AND Z_ID = 4412;

0|0|0|SEARCH TABLE ZBTInvite USING INDEX ZBTINVITE_Z_ENT_INDEX (Z_ENT=?)

Surprise! It is not using the ZBTINVITE_Z_ID_INDEX index. Instead, it uses ZBTINVITE_Z_ENT_INDEX. Since Z_ENT is not selective (half the rows share the same value), the query indeed becomes slower as the table grows, just like what we experienced.

Why? To be honest, we have no idea. The index is chosen internally by SQLite's query planner, and we have no way of intervening in that choice. What we can do is provide a more obvious index, i.e. a compound index of Z_ENT + Z_ID.

Setting up a compound index is straightforward (http://stackoverflow.com/a/8657785/2227541). However, it is impossible to use the Z_ENT column as part of a compound index; Core Data just throws an error at runtime.

The solution is to avoid using sub-entities in Core Data. You may choose to introduce your own type property to remove the dependence on the Z_ENT column. With type and your own unique ID, you can achieve something similar to BTInvite, BTInviteUser and BTInviteGroup, with the benefit of having a proper index, i.e. a ZTYPE + ZID compound index, as sketched below.
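Here is a minimal sketch of that approach, assuming an integer type attribute and an indexed ID attribute that are our own (the names are illustrative, not anything Core Data generates):

// Hypothetical invite types replacing the BTInviteUser / BTInviteGroup sub-entities.
typedef NS_ENUM(int16_t, BTInviteType) {
    BTInviteTypeUser  = 1,
    BTInviteTypeGroup = 2,
};

// Fetch the same record as before, now constrained on our own columns
// so SQLite can use the ZTYPE + ZID compound index.
NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"BTInvite"];
fetchRequest.predicate = [NSPredicate predicateWithFormat:@"type == %d AND ID == %d",
                          BTInviteTypeUser, 4412];
NSArray *results = [context executeFetchRequest:fetchRequest error:nil];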

Migration

And the Z_ENT haunting doesn't end here; it may cause other problems. In small applications the impact can be negligible, but in big applications it can cause significant delays.

Let us go back and look at the Z_PRIMARYKEY table.

sqlite> select * From Z_PRIMARYKEY;
Z_ENT       Z_NAME                Z_SUPER     Z_MAX     
----------  --------------------  ----------  ----------
1           BTAd                  0           0         
2           BTAdList              0           1         
3           BTCategory            0           0         
4           BTCellMetadata        0           1         
5           BTChat                0           7         
6           BTChatGroup           5           0         
7           BTChatStranger        5           0         
8           BTChatUser            5           0         
9           BTZChatClub           5           0         
10          BTGameInfo            0           0         
11          BTGameList            0           1         
12          BTGameRank            0           0         
13          BTGroup               0           8         
14          BTInvite              0           2         
15          BTInviteGroup         14          0         
16          BTInviteUser          14          0   

As mentioned earlier, this table is sorted alphabetically by entity name, with Z_ENT incrementing for each row.

Now, let's introduce a BTHome entity to represent the homes of our users (just an example, we are not stalking our users):

Z_ENT       Z_NAME                Z_SUPER     Z_MAX     
----------  --------------------  ----------  ----------
...
13          BTGroup               0           8         
14          BTHome                0           0
15          BTInvite              0           2 

This causes BTInvite and every entity below it to have a Z_ENT that is higher by one.

And this increase in Z_ENT across all those tables requires SQL updates to be executed, i.e. UPDATE ZBTINVITE SET Z_ENT=15 WHERE Z_ENT=14. While these SQL statements are generated for you automatically when you perform a migration, they can still cause a significant workload updating all the tables.

And now, imagine you have 100 tables below ZBTINVITE, each with a significant amount of data. Your migration is going to take a long, long time.
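For reference, these UPDATE statements run inside the migration that Core Data performs when the store is added with the usual migration options. A minimal sketch, assuming coordinator and storeURL are already set up:

NSDictionary *options = @{
    NSMigratePersistentStoresAutomaticallyOption: @YES,
    NSInferMappingModelAutomaticallyOption: @YES,
};
NSError *error = nil;
// This call blocks until the migration, including all the Z_ENT rewrites, has finished,
// so a large store with many affected tables keeps the caller waiting here.
[coordinator addPersistentStoreWithType:NSSQLiteStoreType
                          configuration:nil
                                    URL:storeURL
                                options:options
                                  error:&error];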

Thankfully, this can still be prevented. How? By adopting one of the most primitive methods: naming your entities intelligently.

In this example, we introduced BTHome. But to avoid causing a migration of the 1203912 tables that come after BTInvite, we can name it BTZHome instead. This forces the newly created table all the way to the bottom of the list, so the other tables are not affected when migration is performed.

Conclusion

Core Data is a framework that has become an integral part of our data persistence solution. Without in-depth documentation of the underlying implementation, we were often blindsided by various performance issues. To overcome these issues, we learnt different ways to profile our app's performance and adopted best practices to avoid unnecessary performance hits in our fetching and migration processes.

We think Apple has done a great job and there is no good substitute at the moment. Core Data allows us to iterate with ease and provides a good balance between speed and complexity in modelling our data persistence layer. Moving forward, we hope our findings here can help you avoid hours of headaches grappling with Core Data.