[.NET] Learning LINQ the Happy Way: An Introduction to Zip()

2014-06-06

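Preface

I've noticed that many people who use the LINQ to Objects methods all the time rarely reach for Zip(). My guess is that the method name isn't very intuitive, but in practice there are plenty of scenarios where Zip() is a good fit for pairing up the elements of two collections.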

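Requirements and Example

Here is a simple invoice-number assignment example to illustrate the requirement:

There is a collection of Customer objects; assume it holds 3 customers.
There is a collection of invoice numbers; assume it holds 4 invoice numbers waiting to be assigned.
The expected result is 3 objects, each carrying a customer name and an invoice number.
With 3 customers and only 2 invoice numbers, the expected result is 2 objects, each carrying a customer name and an invoice number.

Let's look at the sample code: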


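    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]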
    public class UnitTest1
    {
        [TestMethod]
        public void TestMethod1()
        {
            var customers = new List<Customer>
            {
                new Customer{Name="Joey"},
                new Customer{Name="Kevin"},
                new Customer{Name="Ian"},
            };

            var invoiceNumbers = new List<string>
            {
                "AZ001","AZ007", "AZ101", "AZ999"
            };

            var expected = new List<Tuple<string, string>>
            {
                Tuple.Create("Joey", "AZ001"),
                Tuple.Create("Kevin", "AZ007"),
                Tuple.Create("Ian", "AZ101"),
            };

            List<Tuple<string, string>> actual = this.DispatchInvoice(customers, invoiceNumbers);

            Assert.IsTrue(expected.SequenceEqual(actual));
        }

        private List<Tuple<string, string>> DispatchInvoice(List<Customer> customers, List<string> invoiceNumbers)
        {
            var upbound = customers.Count < invoiceNumbers.Count ? customers.Count : invoiceNumbers.Count;
            var result = new List<Tuple<string, string>>();

            for (int i = 0; i < upbound; i++)
            {
                result.Add(Tuple.Create(customers[i].Name, invoiceNumbers[i]));
            }

            return result;
        }
    }

    public class Customer
    {
        public string Name { get; set; }
    }

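As you can see, without Zip() the usual approach is a for loop that uses an index to pick the item at the same position from each list and combine the two. The upper bound of the index is the length of the shorter list. (Index access here comes from List<T>, that is IList<T>; ICollection<T> by itself has no indexer.)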

A foreach loop is usually not an option, because a single foreach cannot walk two collections at the same time. But the for loop feels clumsy as well.

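Select() is not a natural fit for this requirement either, because Select() projects each element of a single source through the same selector delegate. The exception is the Select() overload that also passes the element's index, which simply takes the place of the for loop above, as shown here: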


        private List<Tuple<string, string>> DispatchInvoice(List<Customer> customers, List<string> invoiceNumbers)
        {
            // Using the indexed Select() overload
            if (customers.Count < invoiceNumbers.Count)
            {
                return customers.Select((c, index) => Tuple.Create(c.Name, invoiceNumbers[index])).ToList();
            }
            else
            {
                return invoiceNumbers.Select((n, index) => Tuple.Create(customers[index].Name, n)).ToList();
            }
        }

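Even with the Select() overload, we still have to work out which collection is shorter in order to decide which one to use as the source. This is more or less using Select() for the sake of using Select(): it looks clever, but it is harder to read than the plain for loop.

This requirement is the textbook scenario for Zip(). Rewritten with Zip(), it takes a single line of code: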


        private List<Tuple<string, string>> DispatchInvoice(List<Customer> customers, List<string> invoiceNumbers)
        {
            // Using Zip()
            return customers.Zip(invoiceNumbers, (c, n) => Tuple.Create(c.Name, n)).ToList();
        }
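
The second requirement (3 customers but only 2 invoice numbers) is not covered by the test above, so here is a minimal sketch of it. The class name ZipTruncationDemo and the Pair helper are made up for illustration, and plain name strings stand in for the Customer class to keep the sample self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the original sample.
public static class ZipTruncationDemo
{
    // Pairs names and invoice numbers by position.
    // Zip() stops as soon as the shorter sequence runs out.
    public static List<Tuple<string, string>> Pair(
        IEnumerable<string> customerNames,
        IEnumerable<string> invoiceNumbers)
    {
        return customerNames
            .Zip(invoiceNumbers, (name, number) => Tuple.Create(name, number))
            .ToList();
    }

    public static void Main()
    {
        var customerNames = new List<string> { "Joey", "Kevin", "Ian" };
        var invoiceNumbers = new List<string> { "AZ001", "AZ007" };

        List<Tuple<string, string>> pairs = Pair(customerNames, invoiceNumbers);

        Console.WriteLine(pairs.Count);               // 2
        Console.WriteLine(string.Join(", ", pairs));  // (Joey, AZ001), (Kevin, AZ007)
    }
}
```

"Ian" is silently left without an invoice number; no exception is thrown when the lengths differ, so any "ran out of numbers" check has to be done separately.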

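Simulating the Zip() Implementation

If you've followed the fundamentals covered earlier in this LINQ series, it's easy to imagine how simple Zip() is under the hood. The problems we were stuck on (foreach has no index, and fetching items by index in a for loop is ugly) simply don't exist at the IEnumerable<T> level. Index access belongs to IList<T> and arrays, not to IEnumerable<T>; with IEnumerable<T> all you have is GetEnumerator(), MoveNext(), and Current.

Just advance both enumerators in lockstep with MoveNext(), read Current from each, and pass the two items to the resultSelector delegate. The loop stops as soon as either sequence runs out, which also ends the yield iteration.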
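
As a reminder of what those three members look like in use, here is a rough sketch of what a foreach over a single sequence does by hand. The class name EnumeratorDemo and the ReadAll helper are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the original sample.
public static class EnumeratorDemo
{
    // Roughly what `foreach (string item in source) items.Add(item);` expands to.
    public static List<string> ReadAll(IEnumerable<string> source)
    {
        var items = new List<string>();

        // GetEnumerator() hands back a cursor positioned before the first item.
        using (IEnumerator<string> iterator = source.GetEnumerator())
        {
            // MoveNext() advances the cursor and returns false once the sequence ends.
            while (iterator.MoveNext())
            {
                // Current reads the item under the cursor; no index is involved.
                items.Add(iterator.Current);
            }
        }

        return items;
    }

    public static void Main()
    {
        var invoiceNumbers = new[] { "AZ001", "AZ007", "AZ101" };
        Console.WriteLine(string.Join(", ", ReadAll(invoiceNumbers))); // AZ001, AZ007, AZ101
    }
}
```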


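    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]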
    public class UnitTest1
    {
        [TestMethod]
        public void TestMethod1()
        {
            var customers = new List<Customer>
            {
                new Customer{Name="Joey"},
                new Customer{Name="Kevin"},
                new Customer{Name="Ian"},
            };

            var invoiceNumbers = new List<string>
            {
                "AZ001","AZ007", "AZ101", "AZ999"
            };

            var expected = new List<Tuple<string, string>>
            {
                Tuple.Create("Joey", "AZ001"),
                Tuple.Create("Kevin", "AZ007"),
                Tuple.Create("Ian", "AZ101"),
            };

            List<Tuple<string, string>> actual = this.DispatchInvoice(customers, invoiceNumbers);

            Assert.IsTrue(expected.SequenceEqual(actual));
        }

        private List<Tuple<string, string>> DispatchInvoice(List<Customer> customers, List<string> invoiceNumbers)
        {
            // Using MyZip()
            return customers.MyZip(invoiceNumbers, (c, n) => Tuple.Create(c.Name, n)).ToList();
        }
    }

    public static class MyLinqExtension
    {
        public static IEnumerable<TResult> MyZip<TFirst, TSecond, TResult>(
            this IEnumerable<TFirst> first,
            IEnumerable<TSecond> second,
            Func<TFirst, TSecond, TResult> resultSelector)
        {
            using (IEnumerator<TFirst> firstIterator = first.GetEnumerator())
            using (IEnumerator<TSecond> secondIterator = second.GetEnumerator())
            {
                while (firstIterator.MoveNext() && secondIterator.MoveNext())
                {
                    yield return resultSelector(firstIterator.Current, secondIterator.Current);
                }
            }
        }
    }

Sometimes going back to basics turns out to be the simplest way!

Conclusion
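
One detail the simulation above leaves out: because MyZip() is an iterator method, none of its body runs until the result is enumerated, so a null argument would only blow up later, at the first MoveNext(). The built-in Zip() is documented to throw ArgumentNullException for a null first or second. A common way to get the same eager behaviour is to split the method into a validating wrapper and a private iterator. The sketch below assumes that pattern; the names MyZipChecked, ZipIterator and MyZipCheckedDemo are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of MyZip() with eager argument validation.
public static class MyCheckedLinqExtension
{
    // Not an iterator method: the null checks run at call time.
    public static IEnumerable<TResult> MyZipChecked<TFirst, TSecond, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        if (first == null) throw new ArgumentNullException("first");
        if (second == null) throw new ArgumentNullException("second");
        if (resultSelector == null) throw new ArgumentNullException("resultSelector");

        return ZipIterator(first, second, resultSelector);
    }

    // The deferred part: same lockstep loop as MyZip().
    private static IEnumerable<TResult> ZipIterator<TFirst, TSecond, TResult>(
        IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        using (IEnumerator<TFirst> firstIterator = first.GetEnumerator())
        using (IEnumerator<TSecond> secondIterator = second.GetEnumerator())
        {
            while (firstIterator.MoveNext() && secondIterator.MoveNext())
            {
                yield return resultSelector(firstIterator.Current, secondIterator.Current);
            }
        }
    }
}

public static class MyZipCheckedDemo
{
    public static void Main()
    {
        List<string> missing = null;
        try
        {
            // Nothing is enumerated here, yet the call itself already throws.
            missing.MyZipChecked(new[] { 1, 2 }, (s, n) => s + n);
        }
        catch (ArgumentNullException ex)
        {
            Console.WriteLine(ex.ParamName); // first
        }
    }
}
```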

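If you see code that combines and processes the corresponding items of two collections with a for loop and an index, or even with a foreach loop and a hand-maintained index, consider whether Zip() with a resultSelector can satisfy the requirement, or be used to refactor the code, more cleanly.

As a side note, there are other common ways to combine two collections: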
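
To make that refactoring concrete, here is a small before-and-after sketch. The class name RefactorToZipDemo, both method names, and the " -> " label format are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the original sample.
public static class RefactorToZipDemo
{
    // Before: foreach over one list with a hand-maintained index into the other.
    public static List<string> LabelsWithManualIndex(List<string> names, List<string> invoiceNumbers)
    {
        var labels = new List<string>();
        int index = 0;

        foreach (string name in names)
        {
            // Stop when the second list has no item left at this position.
            if (index >= invoiceNumbers.Count)
            {
                break;
            }

            labels.Add(name + " -> " + invoiceNumbers[index]);
            index++;
        }

        return labels;
    }

    // After: Zip() does the pairing and the length handling.
    public static List<string> LabelsWithZip(IEnumerable<string> names, IEnumerable<string> invoiceNumbers)
    {
        return names.Zip(invoiceNumbers, (name, number) => name + " -> " + number).ToList();
    }

    public static void Main()
    {
        var names = new List<string> { "Joey", "Kevin", "Ian" };
        var invoiceNumbers = new List<string> { "AZ001", "AZ007" };

        List<string> before = LabelsWithManualIndex(names, invoiceNumbers);
        List<string> after = LabelsWithZip(names, invoiceNumbers);

        Console.WriteLine(before.SequenceEqual(after)); // True
        Console.WriteLine(string.Join("; ", after));    // Joey -> AZ001; Kevin -> AZ007
    }
}
```

The Zip() version also accepts any IEnumerable<T>, whereas the manual-index version needs an indexable list for the second argument.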

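Concat(): Concat() appends one sequence to another of the same element type. In this example one holds Customer and the other holds string, so it does not apply.
Join(): Join() needs a key shared by both collections to correlate their elements. In this example Customer and string have no key that relates them, so it does not apply.
Union(): Union() also works on two sequences of the same element type, but it produces a set union, so duplicate elements are removed. It does not apply here for the same reason as Concat().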
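
The difference between these operators and Zip() is easiest to see on two small lists of the same type. The sketch below is only an illustration; the class name CombineOperatorsDemo, the Summaries helper, and the sample data are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the original sample.
public static class CombineOperatorsDemo
{
    public static string[] Summaries(IEnumerable<string> first, IEnumerable<string> second)
    {
        return new[]
        {
            // Concat(): every element of first, then every element of second.
            "Concat: " + string.Join(", ", first.Concat(second)),
            // Union(): like Concat(), but duplicate elements are dropped.
            "Union:  " + string.Join(", ", first.Union(second)),
            // Join(): only pairs whose keys match (here the key is the string itself).
            "Join:   " + string.Join(", ", first.Join(second, x => x, y => y, (x, y) => x + "=" + y)),
            // Zip(): pairs elements purely by position.
            "Zip:    " + string.Join(", ", first.Zip(second, (x, y) => x + "-" + y)),
        };
    }

    public static void Main()
    {
        var first = new List<string> { "A", "B", "C" };
        var second = new List<string> { "B", "C", "D" };

        foreach (string line in Summaries(first, second))
        {
            Console.WriteLine(line);
        }
        // Concat: A, B, C, B, C, D
        // Union:  A, B, C, D
        // Join:   B=B, C=C
        // Zip:    A-B, B-C, C-D
    }
}
```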

I hope this post really did make learning LINQ feel quick and fun.

