Sunday, June 24, 2012


Using Read/Write Through In Your Data Access Layer

When thinking about accessing data we usually think of one layer: a DB, a cache or even a file on the file system. However, when we want to scale our system, we need to think about using multiple layers. Read-through and write-through logic can help us balance the load and even sync data between layers as we go. In the next few paragraphs I will explain how we chose to handle this kind of data access.

First let's explain the logic of read through and write through.
Write through means that the application writes data to one layer and that layer is responsible for writing the data to another layer if necessary.
Read through means that the application reads from one layer of data (usually the cache) and that layer is responsible for getting the data from another layer if necessary. It is also responsible for updating the layers that were missing the data.
Note: There's another way of handling data access that should be mentioned in this context. It's called "Write Behind". It means the data is always saved to one layer, and once in a while (not on every write) this layer updates the other layers with the latest changes.

Getting Started

To get started, let's assume your data access code implements the following interface:

public interface IDataAccess<Key, Value>
{
    void Save(Key key, Value value);

    Value Load(Key key);

    void Delete(Key key);
}

It is very important to use this kind of interface when accessing data, because it is what will let us extend the code to multiple layers later.
Now let's implement this interface for each one of our layers. In this example I will use 2 layers: DB and Cache.

public class DBDataAccess<Key, Value> : IDataAccess<Key, Value>
{
    //Implement Save, Load and Delete for database
}

public class CacheDataAccess<Key, Value> : IDataAccess<Key, Value>
{
    //Implement Save, Load and Delete for Cache
}
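To make the empty classes above more concrete, here is a minimal sketch of what a cache layer could look like. The class name InMemoryCacheDataAccess and the use of a plain Dictionary are assumptions for illustration only; a real cache would also need expiry and thread safety, and the interface is repeated so the snippet stands on its own.

```csharp
using System.Collections.Generic;

public interface IDataAccess<Key, Value>
{
    void Save(Key key, Value value);

    Value Load(Key key);

    void Delete(Key key);
}

// Hypothetical in-memory cache layer (not from the original post).
public class InMemoryCacheDataAccess<Key, Value> : IDataAccess<Key, Value>
{
    private readonly Dictionary<Key, Value> items = new Dictionary<Key, Value>();

    public void Save(Key key, Value value)
    {
        // Insert the value, or overwrite an existing one.
        items[key] = value;
    }

    public Value Load(Key key)
    {
        Value value;
        // Return default(Value) when the key is not cached.
        return items.TryGetValue(key, out value) ? value : default(Value);
    }

    public void Delete(Key key)
    {
        // Removing a key that is not present is a no-op.
        items.Remove(key);
    }
}
```

Returning default(Value) on a miss is a design choice in this sketch: it lets the calling code tell "no data in this layer" apart from a hit without throwing.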

Note: In the examples above and throughout the post, I used generics. In your own implementation you will probably have to specify specific types.
    

Combining the Layers

Now that we have our implementation for each layer, let's start putting it all together.
To do that, let's create another implementation which uses the 2 layers above to perform read/write through. This implementation will be used to save, load and delete, and will be responsible for executing the read/write through logic.
We will store our layers in a list or an array, in the order we want the data to be read. In our example, if we want to read data first from the cache and then from the DB, the order will be:
1) CacheDataAccess
2) DBDataAccess

For example:
public class DataAccess<Key, Value> : IDataAccess<Key, Value>
{
    //the list of our data access implementations
    private List<IDataAccess<Key, Value>> dataAccessProviders;
    public DataAccess ()
    {
        //create the list with the desired order
        dataAccessProviders = new List<IDataAccess<Key, Value>>()
        {
            new CacheDataAccess<Key, Value>(),
            new DBDataAccess<Key, Value>()
        };
    }
    .
    .
}
Note: In the example I created the providers list myself. In your implementation you should create it through a better mechanism, such as dependency injection or a factory.
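Note: This example is related to the "Decorator" pattern, which adds functionality to an object behind the same interface. Since it wraps a list of providers rather than a single one, it is arguably closer to the "Composite" pattern.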
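As one possible way to avoid creating the providers inside the class, the list can be passed in through the constructor. This is only a sketch: the constructor signature and the ProviderCount property are assumptions for illustration, and Save, Load and Delete are left out here.

```csharp
using System.Collections.Generic;

public interface IDataAccess<Key, Value>
{
    void Save(Key key, Value value);

    Value Load(Key key);

    void Delete(Key key);
}

public class DataAccess<Key, Value>
{
    // The list of our data access implementations, in read order.
    private readonly List<IDataAccess<Key, Value>> dataAccessProviders;

    // The caller (a factory or a DI container) decides which layers
    // exist and in what order they are read.
    public DataAccess(IEnumerable<IDataAccess<Key, Value>> providers)
    {
        // Copy the sequence so later changes by the caller do not affect us.
        dataAccessProviders = new List<IDataAccess<Key, Value>>(providers);
    }

    // Hypothetical property, exposed only so the sketch can be inspected.
    public int ProviderCount
    {
        get { return dataAccessProviders.Count; }
    }
}
```

With this shape, swapping the cache for another implementation, or adding a third layer, only changes the code that builds the list.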

Add Some Logic

Now we just have to write the read and write through logic. It will be pretty easy since we used the same interface for each one of our layers. The Save method is very simple: it calls Save on every provider in dataAccessProviders, in order.

public void Save(Key key, Value value)
{
    dataAccessProviders.ForEach(provider => provider.Save(key, value));
}

The Load method is a bit more complicated. It needs to return the data from the first layer that has a result, but also update the layers that we went through which had no result. This is done in the following code:
public Value Load(Key key)
{
    Value value = LoadFromProvider(key);
    List<IDataAccess<Key, Value>> staleProviders = FindStaleProviders(key);
    UpdateStaleProviders(key, value, staleProviders);
    return value;
}

private Value LoadFromProvider(Key key)
{
    foreach (IDataAccess<Key, Value> provider in dataAccessProviders)
    {
        Value value = provider.Load(key);
        if (Exists(value))
        {
            return value;
        }
    }
    return default(Value);
}

private List<IDataAccess<Key, Value>> FindStaleProviders(Key key)
{
    List<IDataAccess<Key, Value>> staleProviders = new List<IDataAccess<Key, Value>>();
    foreach (IDataAccess<Key, Value> provider in dataAccessProviders)
    {
        Value value = provider.Load(key);
        if (Exists(value))
        {
            break;
        }
        staleProviders.Add(provider);
    }
    return staleProviders;
}

private void UpdateStaleProviders(Key key, Value value,
    List<IDataAccess<Key, Value>> staleProviders)
{
    if (Exists(value))
    {
        staleProviders.ForEach(provider => provider.Save(key, value));
    }
}

private bool Exists(Value value)
{
    //default(Value) means "no data": null for reference types, 0/false for value types
    return !Equals(value, default(Value));
}
Note: Pay attention, the code is optimized for readability and not for performance.

As you can see, we first try to load from our layers one by one, in the order specified earlier, until one of them returns a value. Then we build a sub-list of the layers that come before it and are missing the value. The last thing we do before returning is update those layers with the value we found.
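The Load code above calls Load on the early layers twice, once to find the value and once to find the stale providers. If that matters, the same steps can be folded into a single pass. The following is a sketch under assumptions, not the post's implementation: ReadThrough and DictionaryDataAccess (with its Loads counter) are hypothetical names used only to keep the example self-contained.

```csharp
using System.Collections.Generic;

public interface IDataAccess<Key, Value>
{
    void Save(Key key, Value value);
    Value Load(Key key);
    void Delete(Key key);
}

// Hypothetical stand-in for a real layer such as a cache or a DB.
public class DictionaryDataAccess<Key, Value> : IDataAccess<Key, Value>
{
    private readonly Dictionary<Key, Value> items = new Dictionary<Key, Value>();
    public int Loads; // counts Load calls, for illustration only

    public void Save(Key key, Value value) { items[key] = value; }

    public Value Load(Key key)
    {
        Loads++;
        Value value;
        return items.TryGetValue(key, out value) ? value : default(Value);
    }

    public void Delete(Key key) { items.Remove(key); }
}

public static class ReadThrough
{
    // Returns the value from the first provider that has it and saves it
    // to every provider that was tried before and had nothing.
    public static Value Load<Key, Value>(IList<IDataAccess<Key, Value>> providers, Key key)
    {
        var missed = new List<IDataAccess<Key, Value>>();
        foreach (IDataAccess<Key, Value> provider in providers)
        {
            Value value = provider.Load(key);
            if (!EqualityComparer<Value>.Default.Equals(value, default(Value)))
            {
                // Found it: back-fill the earlier layers, then stop.
                foreach (IDataAccess<Key, Value> stale in missed)
                {
                    stale.Save(key, value);
                }
                return value;
            }
            missed.Add(provider);
        }
        return default(Value); // no layer has the key
    }
}
```

With a cache and a DB in the list, the first Load of a key that only the DB holds calls Load once on each layer and then saves the value to the cache, so the next Load is served by the cache alone.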
The Delete method has the same logic as the Save method. I didn't implement it here; you can try it yourself…
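For reference, one possible shape for it, written as a standalone helper so it can be run on its own. WriteThrough and DictionaryDataAccess are hypothetical names; inside the DataAccess class the body would be the single ForEach line applied to dataAccessProviders.

```csharp
using System.Collections.Generic;

public interface IDataAccess<Key, Value>
{
    void Save(Key key, Value value);
    Value Load(Key key);
    void Delete(Key key);
}

// Hypothetical stand-in for a real layer such as a cache or a DB.
public class DictionaryDataAccess<Key, Value> : IDataAccess<Key, Value>
{
    private readonly Dictionary<Key, Value> items = new Dictionary<Key, Value>();

    public void Save(Key key, Value value) { items[key] = value; }

    public Value Load(Key key)
    {
        Value value;
        return items.TryGetValue(key, out value) ? value : default(Value);
    }

    public void Delete(Key key) { items.Remove(key); }
}

public static class WriteThrough
{
    // Same logic as Save: forward the call to every layer, in list order.
    public static void Delete<Key, Value>(List<IDataAccess<Key, Value>> providers, Key key)
    {
        providers.ForEach(provider => provider.Delete(key));
    }
}
```

Deleting in list order mirrors Save. Whether the cache or the DB should be cleared first under concurrent reads is a separate decision that this sketch does not address.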

That's it, we're done. We finally have a simple implementation for our read/write through problem. Coding against an interface has made it possible for us to work with any kind of provider, regardless of its actual implementation. We can easily change the behavior of the code by plugging in a different module which implements the same interface.

2 comments:

  1. in FindStaleProviders(Key key) inside the loop:
    if (Exists(value))
    {
    break;
    }

    Did you mean continue or break?

  2. I meant break, and I'll explain:
    Once we've found a value, we need to update just the providers that have missing data until that point. If we continue, we end up loading from all the providers, which is exactly what we wanted to avoid here.
