Monday, April 28, 2014

How to split an array in C#?

Every now and then, we need to perform mundane operations that are very simple but don't have a built-in function in the language. So we write some ad-hoc code, maybe even copy something from StackOverflow and are done with it. What we sometimes fail to notice, however, is the affect this has on performance.
This entry will focus on splitting an array, but this is relevant for other operations as well.
The time it takes to perform an operation is only important if the code is in a time-critical section and/or is performed a large amount of times. If this is not the case, whatever implementation you choose will probably be OK.
Having said that, let's look at some of the ways to split an array. We'll use a medium sized byte array for this purpose (256), splitting it to a short array (16 bytes) and the rest.
Assuming you are using .Net 3.5+, the most natural way to go about this is to use LINQ:

private static void SplitArayUsingLinq(byte[] data)
{
         byte[] first = data.Take(16).ToArray();
         byte[] second = data.Skip(16).ToArray();
}

As you can see, the method is very elegant, only 1 line of code required to create each part. In addition, it seems like using Take and Skip on a small number of runs can't cause a performance issue.
However, running the above code a 1,000,000 times takes around 10 seconds, which is a considerable amount of time.
Let's compare the LINQ version to a good old for loop:

private static void SplitUsingForLoop(byte[] data)
{
 byte[] first = new byte[16];
 for (int i = 0; i < first.Length; i++)
 {
  first[i] = data[i];
 }
 byte[] second = new byte[data.Length - first.Length];
 for (int i = 0; i < second.Length; i++)
 {
  second[i] = data[i + first.Length];
 }
}

This yields a run time of less than 2 seconds – more than x5 improvement! The looping method seems to be much more efficient. Let's try to improve this some more.
If we do our Googling right, we find that copying arrays actually is a library function – Array.Copy. Let's test this:

private static void SplitArrayUsingArrayCopy(byte[] data)
{
         byte[] first = new byte[16];
         Array.Copy(data, first, first.Length);
         byte[] second = new byte[data.Length - first.Length];
         Array.Copy(data, first.Length, second, 0, second.Length);
}


We get a result of 250ms – another x8 improvement, a total of x40 compared to the LINQ version!
Digging even dipper and only in the case of byte[], we can use another method called Buffer.BlockCopy that actually performs a low level byte copy:


private static void SplitArrayUsingBlockCopy(byte[] data)
{
         byte[] first = new byte[16];
         Buffer.BlockCopy(data, 0, first, 0, first.Length);
         byte[] second = new byte[data.Length - first.Length];
         Buffer.BlockCopy(data, first.Length, second, 0, second.Length);
}


Now the results are 180ms, which is yet another improvement, albeit not as dramatic as the previous ones.
In conclusion:
Method
Time for 1,000,000 splits (s)
Improvement factor
LINQ
10.2
-
for loop
1.93
x5
Array.Copy
0.25
x40
Buffer.BlockCopy
0.18
x56

Kids, don't trust LINQ blindly for what really matters J

1 comment:

  1. .GetRange is quite fast, which is a LINQ extension.

    ReplyDelete