Download Example
(UPDATE 11/30/2011: I made some clarifications to this article based on some healthy criticism from my friend at http://ox.no/posts/linq-vs-loop-a-performance-test)
LINQ Deferred Execution (a very brief explanation)
Once understood,
LINQ offers a quick way to create
query language statements (like
SQL) for
object-oriented development. However, it is critical to understand
deferred execution for LINQ (i.e. the code is not run until the results of the query need to be evaluated). I highly recommend
http://blogs.msdn.com/b/charlie/archive/2007/12/09/deferred-execution.aspx as a good read.
Performance Study Explanation
The general approach to this study is to load up some objects with some random data. Then do a comparison of the resulting performance between the LINQ iterations verses the for loop iterations. Furthermore, I have included one example where iterating with LINQ is a good choice (Performance Test A) and another example where iterating through LINQ is a bad choice (Performance Test B).
Before showing the test examples, it should be noted that the working sets are preloaded with 10,000,000 dummy objects as is shown in the GenerateList() method below. The entire source code for the tests can be
downloaded here.
public static List<DummyModel>
GenerateList()
{
int i;
List<DummyModel> list = new
List<DummyModel>();
Random
rand = new Random(unchecked((int)DateTime.Now.Ticks));
for (i = 0;
i < 10000000; i++)
list.Add(new
DummyModel()
{
A = rand.NextDouble()
});
return
list;
}
Performance Test A (the Good LINQ)
Performance Test A Summary: On my machine set (1) performed
better than set (2) by about 200 milliseconds and sets (1) and (2) were
iterated the same number of times.
[TestMethod]
public void
PerformanceTestA()
{
IEnumerable<DummyModel> queryResults;
IEnumerable<DummyModel> staticResults;
List<DummyModel> list = PerformanceUtility.GenerateList();
DateTime
start = DateTime.Now;
// 1.
queryResults = list.Where(d => PerformanceUtility.PickDummy(d)); // Execution deferred
PerformanceUtility.WriteAccessCount("1. "); //
Should still be zero
// 1.a
foreach (DummyModel model in
queryResults) // Query Executed
model.A.ToString();
PerformanceUtility.WriteAccessCount("1.a"); //
Should be approx 15,000,000
// 1.b
Console.WriteLine(string.Format("1.b
Execution Time: {0}", DateTime.Now
- start));
//------------------------------------
Console.WriteLine();
start = DateTime.Now;
// 2.
DummyModel.AccessCount
= 0;
queryResults = list.Where(d => PerformanceUtility.PickDummy(d)); // Execution deferred
PerformanceUtility.WriteAccessCount("2. "); //
Should still be zero
// 2.a
staticResults = queryResults.ToArray(); // Query Executed Again
PerformanceUtility.WriteAccessCount("2.a"); //
Should be 10,000,000
// 2.b
foreach (DummyModel model in
staticResults) // Query NOT Executed
model.A.ToString();
PerformanceUtility.WriteAccessCount("2.b"); //
Should be approx 15,000,000
// 2.c
Console.WriteLine(string.Format("2.c
Execution Time: {0}", DateTime.Now
- start));
}
I should note that the access of property DummyModel.A property is accessed 15,000,000 times because the PerformanceUtility.PickDummy() method picks all A values >= 0.5. This means the initial iteration for choosing which DummyModels to be include in our set will be 10,000,000. Thus, the resulting list will contain 5,000,000 items. Therefore any subsequent iterations will only be 5,000,000.
But wait, why did LINQ execute faster than the for loop if they both did the same number of iterations? Well to put it bluntly, apparently one of the perks for using LINQ is some kind of compiler optimization that your average for loop will not have
(Pending Fact Check: I'd like to have some sort of confirmation from an official source about this assumption. If anyone has any info from a reliable source, let me know.).
Performance Test B (the Bad LINQ)
Performance Test B Summary: On my machine set (2) performed
better than set (1) by about 900 milliseconds and set (1) required 20,000,000 more iterations than set (2) for the same functionality.
[TestMethod]
public void
PerformanceTestB()
{
IEnumerable<DummyModel> queryResults;
IEnumerable<DummyModel> staticResults;
List<DummyModel> list = PerformanceUtility.GenerateList();
DateTime
start = DateTime.Now;
// 1.
queryResults = list.Where(d => PerformanceUtility.PickDummy(d)); // Execution deferred
PerformanceUtility.WriteAccessCount("1. "); //
Should still be zero
// 1.a
queryResults.ToArray(); // Query Executed
PerformanceUtility.WriteAccessCount("1.a"); //
Should be 10,000,000
// 1.b
foreach (DummyModel model in
queryResults) // Query Executed Again
model.A.ToString();
PerformanceUtility.WriteAccessCount("1.b"); //
Should be approx 25,000,000
// 1.c
foreach (DummyModel model in
queryResults) // Query Executed Again
model.A.ToString();
PerformanceUtility.WriteAccessCount("1.c"); //
Should be approx 40,000,000
// 1.d
Console.WriteLine(string.Format("1.d
Execution Time: {0}", DateTime.Now
- start));
//------------------------------------
Console.WriteLine();
start = DateTime.Now;
// 2.
DummyModel.AccessCount
= 0;
queryResults = list.Where(d => PerformanceUtility.PickDummy(d)); // Execution deferred
PerformanceUtility.WriteAccessCount("2. "); //
Should still be zero
// 2.a
staticResults = queryResults.ToArray(); // Query Executed Again
PerformanceUtility.WriteAccessCount("2.a"); //
Should be 10,000,000
// 2.b
foreach (DummyModel model in
staticResults) // Query NOT Executed
model.A.ToString();
PerformanceUtility.WriteAccessCount("2.b"); //
Should be approx 15,000,000
// 2.c
foreach (DummyModel model in
staticResults) // Query NOT Executed
model.A.ToString();
PerformanceUtility.WriteAccessCount("2.c"); //
Should be approx 20,000,000
// 2.d
Console.WriteLine(string.Format("2.d
Execution Time: {0}", DateTime.Now
- start));
}
Whoa! Why did LINQ take so many more iterations than the for loop? Well, this is definitely a direct result of deferred execution. If I was a developer who had no understanding of deferred execution (of which I did at first), I might make this performance mistake. Look at the comments in the code carefully. You'll notice that in each iteration LINQ does a query execution each and every time.
Conclusion
A simple knowledge and building some of your own tests for deferred execution can quickly get you up to speed for optimization. Ultimately, whether to load your LINQ results into a static collection or to allow deferred execution is going to be a case by case decision. Have fun and good luck!
Download Example