As part of our crawling infrastructure, we wanted to enhance our messaging framework.
The crawlers use a collection of dedicated “workers”; each worker implements a distinct piece of business logic, such as downloading, validating, or parsing the content.
At first, we used NServiceBus (which is based on MSMQ by default), and the system worked as expected.
Unfortunately, when we tried to speed up the crawling process by running more processes, we noticed that MSMQ hindered our ability to scale. In essence, there was a huge performance hit due to the heavy I/O operations that we had to bear because of MSMQ.
Based on Redis's reputation as a fast key-value store (more information can be found in Dvir Volk's presentations), we decided to give it a shot.
Redis works on Linux, Solaris, and most other POSIX systems. Although Windows builds are not officially supported, we had to try one because our system runs on Windows (it is written in C#).
Unfortunately, when we had a lot of open Redis connections, the system stopped working due to timeouts: operations timed out even though the messages had already been popped, which caused data loss.
Eventually, after we switched to the Ubuntu version, those issues went away and everything started to work.
To work with Redis from C#, we reviewed several of the available clients.
We tested Sider, Booksleeve, and ServiceStack.Redis, focusing on ease of use, functionality, and connection management. In the end, we chose ServiceStack.Redis.
ServiceStack.Redis provides typed clients which allow you to bind a client to a specific type, and a native client that allows you to work with byte arrays.
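To make the byte-array option more concrete, here is a minimal sketch of what hand-rolled serialization might look like. It is written in Python rather than C# purely for brevity, and the `CrawlMessage` fields and the wire layout are hypothetical, not the format we actually used. It packs a message into a compact byte string and compares its size with an equivalent XML encoding.

```python
import struct
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class CrawlMessage:
    """Hypothetical message passed between workers (fields are illustrative)."""
    job_id: int
    depth: int
    url: str


# Assumed wire layout: 8-byte job id, 2-byte depth, 2-byte URL length,
# followed by the UTF-8 encoded URL. Little-endian, no padding.
_HEADER = struct.Struct("<QHH")


def to_bytes(msg: CrawlMessage) -> bytes:
    """Pack the message into a compact byte string for a byte-array client."""
    url = msg.url.encode("utf-8")
    return _HEADER.pack(msg.job_id, msg.depth, len(url)) + url


def from_bytes(data: bytes) -> CrawlMessage:
    """Inverse of to_bytes."""
    job_id, depth, url_len = _HEADER.unpack_from(data)
    start = _HEADER.size
    url = data[start:start + url_len].decode("utf-8")
    return CrawlMessage(job_id, depth, url)


def to_xml(msg: CrawlMessage) -> bytes:
    """The same message as XML, used here only for a size comparison."""
    root = ET.Element("CrawlMessage")
    ET.SubElement(root, "JobId").text = str(msg.job_id)
    ET.SubElement(root, "Depth").text = str(msg.depth)
    ET.SubElement(root, "Url").text = msg.url
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    msg = CrawlMessage(job_id=42, depth=3, url="http://example.com/page")
    packed = to_bytes(msg)
    assert from_bytes(packed) == msg  # round trip preserves the message
    print(len(packed), "bytes packed vs", len(to_xml(msg)), "bytes as XML")
```

The fixed header costs 12 bytes per message regardless of content, whereas the XML form repeats element names in every message. Any compact binary format would make the same point; this particular layout is just one possible choice.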
For our purpose we worked with the native client due to:
- Problems deserializing complex types with the typed clients (had we been using only primitives or simple types, they would have worked without a problem).
- Control over serialization, to reduce the amount of data sent over the network and to improve performance (we did not want to use a wasteful serialization format such as XML).
- BLPop / BRPop block the client until a message is added to the queue (no queue polling is required, which means the system generates less network traffic and leaves Redis free to process other requests).
- For better performance (and if persistence is not required), Redis can be configured to keep its data in memory only.
- You can use a Redis Set to get a random message from a queue.
- You can use a Redis Hash to verify that a message is unique.
- We had some memory fragmentation issues on Linux that were fixed after moving to a Redis version that supports jemalloc (supported since Redis 2.4).
- You can run more than one Redis process on a machine (by specifying a different port for each process).
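
Several of the queue-related tips above map onto a handful of Redis commands. The sketch below (in Python, for brevity) assumes a client object that exposes redis-py style methods (`hsetnx`, `rpush`, `blpop`, `sadd`, `spop`); the key names and helper functions are hypothetical. `InMemoryClient` is only a tiny single-process stand-in so that the example runs without a server, and unlike a real BLPOP it never actually blocks.

```python
import random
from collections import defaultdict, deque


class InMemoryClient:
    """Single-process stand-in for the few Redis commands used below.

    Assumption: a real client (e.g. redis-py) offers methods with the same
    names; this fake exists only so the sketch runs without a server.
    """

    def __init__(self):
        self.lists = defaultdict(deque)
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)

    def hsetnx(self, name, key, value):
        # HSETNX: set the field only if it does not exist; 1 if set, else 0.
        if key in self.hashes[name]:
            return 0
        self.hashes[name][key] = value
        return 1

    def rpush(self, name, *values):
        # RPUSH: append to the tail of the list and return its new length.
        self.lists[name].extend(values)
        return len(self.lists[name])

    def blpop(self, keys, timeout=0):
        # Real BLPOP blocks until a message arrives; this fake returns at once.
        for key in ([keys] if isinstance(keys, str) else keys):
            if self.lists[key]:
                return key, self.lists[key].popleft()
        return None

    def sadd(self, name, *values):
        # SADD: add members and return how many were newly added.
        before = len(self.sets[name])
        self.sets[name].update(values)
        return len(self.sets[name]) - before

    def spop(self, name):
        # SPOP: remove and return a random member, or None if the set is empty.
        if not self.sets[name]:
            return None
        member = random.choice(list(self.sets[name]))
        self.sets[name].remove(member)
        return member


def enqueue_unique(client, queue, message_id, payload):
    """Push payload only if message_id was not seen before (Hash as a guard)."""
    if client.hsetnx(queue + ":seen", message_id, 1):
        client.rpush(queue, payload)
        return True
    return False


def next_message(client, queue, timeout=5):
    """Wait for the next message with BLPOP instead of polling the queue."""
    item = client.blpop([queue], timeout=timeout)
    return None if item is None else item[1]


if __name__ == "__main__":
    client = InMemoryClient()
    print(enqueue_unique(client, "download", "url-1", b"http://example.com/a"))
    print(enqueue_unique(client, "download", "url-1", b"http://example.com/a"))
    print(next_message(client, "download"))
    client.sadd("pending", b"m1", b"m2", b"m3")
    print(client.spop("pending") in {b"m1", b"m2", b"m3"})
```

Note that in this sketch the uniqueness check and the push are two separate calls, so they are not atomic. Against a real server, a worker that crashes between the two would leave an id marked as seen without a queued message.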
Information about the queues can be very useful during development and testing.
Therefore, we created a tool called "Redis Administration" (this time we used the Sider C# client).