Thursday, March 10, 2011

Linq Performance

Download Example

(UPDATE 11/30/2011: I made some clarifications to this article based on some healthy criticism from my friend at http://ox.no/posts/linq-vs-loop-a-performance-test)

LINQ Deferred Execution (a very brief explanation)


Once understood, LINQ offers a quick way to create query language statements (like SQL) for object-oriented development. However, it is critical to understand deferred execution for LINQ (i.e. the code is not run until the results of the query need to be evaluated). I highly recommend http://blogs.msdn.com/b/charlie/archive/2007/12/09/deferred-execution.aspx as a good read.

Performance Study Explanation

The general approach to this study is to load up some objects with some random data. Then do a comparison of the resulting performance between the LINQ iterations verses the for loop iterations. Furthermore, I have included one example where iterating with LINQ is a good choice (Performance Test A) and another example where iterating through LINQ is a bad choice (Performance Test B).

Before showing the test examples, it should be noted that the working sets are preloaded with 10,000,000 dummy objects as is shown in the GenerateList() method below. The entire source code for the tests can be downloaded here.

public static List<DummyModel> GenerateList()
{
    int i;
    List<DummyModel> list = new List<DummyModel>();
    Random rand = new Random(unchecked((int)DateTime.Now.Ticks));

    for (i = 0; i < 10000000; i++)
        list.Add(new DummyModel()
        {
            A = rand.NextDouble()
        });

    return list;
}

 

Performance Test A (the Good LINQ)

Performance Test A Summary: On my machine set (1) performed better than set (2) by about 200 milliseconds and sets (1) and (2) were iterated the same number of times.

[TestMethod]
public void PerformanceTestA()
{
    IEnumerable<DummyModel> queryResults;
    IEnumerable<DummyModel> staticResults;
    List<DummyModel> list = PerformanceUtility.GenerateList();
    DateTime start = DateTime.Now;

    // 1.
    queryResults = list.Where(d => PerformanceUtility.PickDummy(d)); // Execution deferred
    PerformanceUtility.WriteAccessCount("1. "); // Should still be zero

    // 1.a
    foreach (DummyModel model in queryResults) // Query Executed
        model.A.ToString();

    PerformanceUtility.WriteAccessCount("1.a"); // Should be approx 15,000,000

    // 1.b
    Console.WriteLine(string.Format("1.b Execution Time: {0}", DateTime.Now - start));


    //------------------------------------
    Console.WriteLine();
    start = DateTime.Now;

    // 2.
    DummyModel.AccessCount = 0;
    queryResults = list.Where(d => PerformanceUtility.PickDummy(d)); // Execution deferred
    PerformanceUtility.WriteAccessCount("2. "); // Should still be zero

    // 2.a
    staticResults = queryResults.ToArray(); // Query Executed Again
    PerformanceUtility.WriteAccessCount("2.a"); // Should be 10,000,000

    // 2.b
    foreach (DummyModel model in staticResults) // Query NOT Executed
        model.A.ToString();

    PerformanceUtility.WriteAccessCount("2.b"); // Should be approx 15,000,000

    // 2.c
    Console.WriteLine(string.Format("2.c Execution Time: {0}", DateTime.Now - start));
}


I should note that the access of property DummyModel.A property is accessed 15,000,000 times because the PerformanceUtility.PickDummy() method picks all A values >= 0.5. This means the initial iteration for choosing which DummyModels to be include in our set will be 10,000,000. Thus, the resulting list will contain 5,000,000 items. Therefore any subsequent iterations will only be 5,000,000.

But wait, why did LINQ execute faster than the for loop if they both did the same number of iterations? Well to put it bluntly, apparently one of the perks for using LINQ is some kind of compiler optimization that your average for loop will not have (Pending Fact Check: I'd like to have some sort of confirmation from an official source about this assumption. If anyone has any info from a reliable source, let me know.).

Performance Test B (the Bad LINQ)

Performance Test B Summary: On my machine set (2) performed better than set (1) by about 900 milliseconds and set (1) required 20,000,000 more iterations than set (2) for the same functionality.
[TestMethod]
public void PerformanceTestB()
{
    IEnumerable<DummyModel> queryResults;
    IEnumerable<DummyModel> staticResults;
    List<DummyModel> list = PerformanceUtility.GenerateList();
    DateTime start = DateTime.Now;

    // 1.
    queryResults = list.Where(d => PerformanceUtility.PickDummy(d)); // Execution deferred
    PerformanceUtility.WriteAccessCount("1. "); // Should still be zero

    // 1.a
    queryResults.ToArray(); // Query Executed
    PerformanceUtility.WriteAccessCount("1.a"); // Should be 10,000,000

    // 1.b
    foreach (DummyModel model in queryResults) // Query Executed Again
        model.A.ToString();

    PerformanceUtility.WriteAccessCount("1.b"); // Should be approx 25,000,000

    // 1.c
    foreach (DummyModel model in queryResults) // Query Executed Again
        model.A.ToString();

    PerformanceUtility.WriteAccessCount("1.c"); // Should be approx 40,000,000

    // 1.d
    Console.WriteLine(string.Format("1.d Execution Time: {0}", DateTime.Now - start));


    //------------------------------------
    Console.WriteLine();
    start = DateTime.Now;

    // 2.
    DummyModel.AccessCount = 0;
    queryResults = list.Where(d => PerformanceUtility.PickDummy(d)); // Execution deferred
    PerformanceUtility.WriteAccessCount("2. "); // Should still be zero

    // 2.a
    staticResults = queryResults.ToArray(); // Query Executed Again
    PerformanceUtility.WriteAccessCount("2.a"); // Should be 10,000,000

    // 2.b
    foreach (DummyModel model in staticResults) // Query NOT Executed
        model.A.ToString();

    PerformanceUtility.WriteAccessCount("2.b"); // Should be approx 15,000,000

    // 2.c
    foreach (DummyModel model in staticResults) // Query NOT Executed
        model.A.ToString();

    PerformanceUtility.WriteAccessCount("2.c"); // Should be approx 20,000,000

    // 2.d
    Console.WriteLine(string.Format("2.d Execution Time: {0}", DateTime.Now - start));
}

Whoa! Why did LINQ take so many more iterations than the for loop? Well, this is definitely a direct result of deferred execution. If I was a developer who had no understanding of deferred execution (of which I did at first), I might make this performance mistake. Look at the comments in the code carefully. You'll notice that in each iteration LINQ does a query execution each and every time.


Conclusion

A simple knowledge and building some of your own tests for deferred execution can quickly get you up to speed for optimization. Ultimately, whether to load your LINQ results into a static collection or to allow deferred execution is going to be a case by case decision. Have fun and good luck!

Download Example

1 comment:

  1. Well, I think your blog is pretty good and have lots of useful information. Keep it up!
    Offshore Software Development Company

    ReplyDelete