Performance testing LINQ to Objects Except Method


When .NET introduced LINQ and lambdas, I was blown away by the efficiency improvements in writing code. But, after reading countless articles online about how horribly slow and inefficient LINQ (even to objects) was compared with just plain for-loops, I’ve always felt guilty about using it.

The other day, however, I wanted to see how much slower linq would be in comparing 2 lists of objects to find all of the items from list1 that were not in list2. LINQ has an extension method called “Except” just for this.

I coded the solution in LINQ, and then coded a few “solutions” using pre-LINQ techniques…spoiler alert: LINQ was the best for large sets of objects.

The objects in the lists look like this:

private class TempFileInfo: IEquatable<TempFileInfo>
        {
            public ulong Id { get; set; }
            public string Name { get; set; }
            public ulong ParentId { get; set; }
            public string VirtualPath { get; set; }
            public string FileName { get; set; }

            public bool Equals(TempFileInfo other)
            {
                if (other == null)
                    return false;
                if (!EqualityComparer<ulong>.Default.Equals(Id, other.Id))
                    return false;
                if (!EqualityComparer<string>.Default.Equals(Name, other.Name))
                    return false;
                if (!EqualityComparer<ulong>.Default.Equals(ParentId, other.ParentId))
                    return false;
                if (!EqualityComparer<string>.Default.Equals(VirtualPath, other.VirtualPath))
                    return false;
                if (!EqualityComparer<string>.Default.Equals(FileName, other.FileName))
                    return false;
                return true;
            }

            public override int GetHashCode()
            {
                int hash = 0;
                hash ^= EqualityComparer<ulong>.Default.GetHashCode(Id);
                hash ^= EqualityComparer<string>.Default.GetHashCode(Name);
                hash ^= EqualityComparer<ulong>.Default.GetHashCode(ParentId);
                hash ^= EqualityComparer<string>.Default.GetHashCode(VirtualPath);
                hash ^= EqualityComparer<string>.Default.GetHashCode(FileName);
                return hash;
            }
        }

And then I have a test method that does all of the comparisons:

        [TestMethod]
        public void perfTest()
        {
            var localList = new List<TempFileInfo>();
            var remoteList = new List<TempFileInfo>();
            var diffs = new Dictionary<TempFileInfo, bool>();

            var listLength = 1000;

            for (int counter = 0; counter < listLength; counter ++)
            {
                var guidText = Guid.NewGuid().ToString();
                localList.Add(new TempFileInfo() {Id = (ulong) counter, Name = guidText, ParentId = 1, VirtualPath = "Meh", FileName = "fake file path"});
                remoteList.Add(new TempFileInfo() {Id = (ulong) counter, Name = guidText, ParentId = 1, VirtualPath = "Meh", FileName = "fake file path"});
            }

            var uniqueName = "different";
            var differentFile = new TempFileInfo()
                {
                    Id = 0,
                    Name = uniqueName,
                    ParentId = 1,
                    VirtualPath = "Meh",
                    FileName = "fake file path"
                };

            localList.Add(differentFile);

            var watch = new Stopwatch();

            //linq way
            watch.Start();
            var linqDiffs= localList.Except(remoteList).ToList();
            watch.Stop();

            var linqDuration = watch.Elapsed.Duration();
            
            watch.Reset();

            //dictionary way
            watch.Start();
            
            foreach (var a in localList) diffs.Add(a, true);
            foreach (var b in remoteList) diffs.Remove(b);
            watch.Stop();

            var dictionaryDuration = watch.Elapsed.Duration();

            watch.Reset();
            diffs.Clear();

            //contains way
            watch.Start();
            foreach (var localInfo in localList)
            {
                if (!remoteList.Contains(localInfo))
                {
                    diffs.Add(localInfo, true);
                }
            }
            watch.Stop();

            var containsDuration = watch.Elapsed.Duration();

            watch.Reset();
            diffs.Clear();

            //double loop
            watch.Start();
            foreach (var localInfo in localList)
            {
                var contains = false;
                foreach (var remoteInfo in remoteList)
                {
                    contains = localInfo.Name == remoteInfo.Name;
                    if (contains) break;
                }
                if (!contains)
                    diffs.Add(localInfo, true);
            }
            watch.Stop();

            var foreachDuration = watch.Elapsed.Duration();

            watch.Reset();
            diffs.Clear();

            //remove way (if localList can be modified)
            watch.Start();
            foreach (var remoteFi in remoteList)
            {
                localList.Remove(remoteFi);
            }
            watch.Stop();
            var removeDuration = watch.Elapsed.Duration();
            
        }

For just 1K items (listLength variable), LINQ isn’t very impressive:

1K_Except_Durations

But let’s try for 10K items…hmm, LINQ is working it’s way up to #1:
10K_LINQ_Duration

And what about 100K items? (If you plan to run this yourself, prepare to wait a while… the nested-looping techniques are effectively O(N^2)… so the time it takes to run is exponential.)

LINQ wins hands down:
100K_LINQ_Duration

If you have time to kill, you can try it with higher numbers… but waiting 5+minutes for the .Contains() method was long enough for me!

Conclusion

No need to feel guilty about using LINQ… where it matters (huge sets), it’s the best choice anyway.

Advertisements

About Bogdan Varlamov

A .NET Software Engineer that strongly believes technology should simplify and improve the quality of our lives instead of making them more complicated. View all posts by Bogdan Varlamov

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: