Custom implementation of the linq Zip operator for different length lists











up vote
6
down vote

favorite












Based on my answer I have my implementation of linq Zip operator which operates on different length lists, and loops shortest list.



My implementation:



using System;
using System.Collections.Generic;
using System.Linq;

namespace SO
{
internal class Program
{
public static void Main(string args)
{
List<String> listA = new List<string> {"a", "b", "c", "d", "e", "f", "g"};
List<String> listB = new List<string> {"1", "2", "3"};

var mix = listA.ZipNew(listB, (l1, l2) => new {l1, l2}).SelectMany(x => x);

foreach (var m in mix)
{
Console.WriteLine(m);
}
}
}

public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;


while(true)
{
i1 = iterator1.MoveNext();
i2 = iterator2.MoveNext();

if (!i1 && (i1Shorter || firstRun))
{
iterator1.Reset();
i1 = iterator1.MoveNext();
i1Shorter = true;
firstRun = false;
}

if (!i2 && (i2Shorter || firstRun))
{
iterator2.Reset();
i2 = iterator2.MoveNext();
i2Shorter = true;
firstRun = false;
}

if (!(i1 && i2))
{
break;
}

yield return resultSelector(iterator1.Current, iterator2.Current);
}
}
}
}
}


And I wonder if this implementation could be improved somehow, what could be improved, for better readability or speed.










share|improve this question




















  • 2




    Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
    – VisualMelon
    2 days ago












  • @VisualMelon improved, full working code
    – BWA
    2 days ago






  • 4




    Never Reset an enumerator. Just get a new enumerator. Reset was a misfeature intended for COM interop scenarios.
    – Eric Lippert
    2 days ago















up vote
6
down vote

favorite












Based on my answer I have my implementation of linq Zip operator which operates on different length lists, and loops shortest list.



My implementation:



using System;
using System.Collections.Generic;
using System.Linq;

namespace SO
{
internal class Program
{
public static void Main(string args)
{
List<String> listA = new List<string> {"a", "b", "c", "d", "e", "f", "g"};
List<String> listB = new List<string> {"1", "2", "3"};

var mix = listA.ZipNew(listB, (l1, l2) => new {l1, l2}).SelectMany(x => x);

foreach (var m in mix)
{
Console.WriteLine(m);
}
}
}

public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;


while(true)
{
i1 = iterator1.MoveNext();
i2 = iterator2.MoveNext();

if (!i1 && (i1Shorter || firstRun))
{
iterator1.Reset();
i1 = iterator1.MoveNext();
i1Shorter = true;
firstRun = false;
}

if (!i2 && (i2Shorter || firstRun))
{
iterator2.Reset();
i2 = iterator2.MoveNext();
i2Shorter = true;
firstRun = false;
}

if (!(i1 && i2))
{
break;
}

yield return resultSelector(iterator1.Current, iterator2.Current);
}
}
}
}
}


And I wonder if this implementation could be improved somehow, what could be improved, for better readability or speed.










share|improve this question




















  • 2




    Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
    – VisualMelon
    2 days ago












  • @VisualMelon improved, full working code
    – BWA
    2 days ago






  • 4




    Never Reset an enumerator. Just get a new enumerator. Reset was a misfeature intended for COM interop scenarios.
    – Eric Lippert
    2 days ago













up vote
6
down vote

favorite









up vote
6
down vote

favorite











Based on my answer I have my implementation of linq Zip operator which operates on different length lists, and loops shortest list.



My implementation:



using System;
using System.Collections.Generic;
using System.Linq;

namespace SO
{
internal class Program
{
public static void Main(string args)
{
List<String> listA = new List<string> {"a", "b", "c", "d", "e", "f", "g"};
List<String> listB = new List<string> {"1", "2", "3"};

var mix = listA.ZipNew(listB, (l1, l2) => new {l1, l2}).SelectMany(x => x);

foreach (var m in mix)
{
Console.WriteLine(m);
}
}
}

public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;


while(true)
{
i1 = iterator1.MoveNext();
i2 = iterator2.MoveNext();

if (!i1 && (i1Shorter || firstRun))
{
iterator1.Reset();
i1 = iterator1.MoveNext();
i1Shorter = true;
firstRun = false;
}

if (!i2 && (i2Shorter || firstRun))
{
iterator2.Reset();
i2 = iterator2.MoveNext();
i2Shorter = true;
firstRun = false;
}

if (!(i1 && i2))
{
break;
}

yield return resultSelector(iterator1.Current, iterator2.Current);
}
}
}
}
}


And I wonder if this implementation could be improved somehow, what could be improved, for better readability or speed.










share|improve this question















Based on my answer I have my implementation of linq Zip operator which operates on different length lists, and loops shortest list.



My implementation:



using System;
using System.Collections.Generic;
using System.Linq;

namespace SO
{
internal class Program
{
public static void Main(string args)
{
List<String> listA = new List<string> {"a", "b", "c", "d", "e", "f", "g"};
List<String> listB = new List<string> {"1", "2", "3"};

var mix = listA.ZipNew(listB, (l1, l2) => new {l1, l2}).SelectMany(x => x);

foreach (var m in mix)
{
Console.WriteLine(m);
}
}
}

public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;


while(true)
{
i1 = iterator1.MoveNext();
i2 = iterator2.MoveNext();

if (!i1 && (i1Shorter || firstRun))
{
iterator1.Reset();
i1 = iterator1.MoveNext();
i1Shorter = true;
firstRun = false;
}

if (!i2 && (i2Shorter || firstRun))
{
iterator2.Reset();
i2 = iterator2.MoveNext();
i2Shorter = true;
firstRun = false;
}

if (!(i1 && i2))
{
break;
}

yield return resultSelector(iterator1.Current, iterator2.Current);
}
}
}
}
}


And I wonder if this implementation could be improved somehow, what could be improved, for better readability or speed.







c# linq






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 2 days ago

























asked 2 days ago









BWA

16627




16627








  • 2




    Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
    – VisualMelon
    2 days ago












  • @VisualMelon improved, full working code
    – BWA
    2 days ago






  • 4




    Never Reset an enumerator. Just get a new enumerator. Reset was a misfeature intended for COM interop scenarios.
    – Eric Lippert
    2 days ago














  • 2




    Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
    – VisualMelon
    2 days ago












  • @VisualMelon improved, full working code
    – BWA
    2 days ago






  • 4




    Never Reset an enumerator. Just get a new enumerator. Reset was a misfeature intended for COM interop scenarios.
    – Eric Lippert
    2 days ago








2




2




Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
– VisualMelon
2 days ago






Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
– VisualMelon
2 days ago














@VisualMelon improved, full working code
– BWA
2 days ago




@VisualMelon improved, full working code
– BWA
2 days ago




4




4




Never Reset an enumerator. Just get a new enumerator. Reset was a misfeature intended for COM interop scenarios.
– Eric Lippert
2 days ago




Never Reset an enumerator. Just get a new enumerator. Reset was a misfeature intended for COM interop scenarios.
– Eric Lippert
2 days ago










3 Answers
3






active

oldest

votes

















up vote
7
down vote













I noticed a few things that can be improved:




  • Not all enumerators support Reset. Generator methods don't, for example, so calling ZipNew on the result of a ZipNew call will fail with a NotSupportedException. Obtaining a new enumerator should work, at the cost of having to replace the convenient using statements with try/finally constructions. Edit: As Eric pointed out, Reset should not be used at all. It's been abandoned.

  • There's no need to call Reset (or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that.

  • Passing null causes either an unspecific NullReferenceException or an ArgumentNullException with parameter name source to be thrown. Throwing ArgumentNullExceptions with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.


  • i1 and i2 can be declared inside the while loop.




Addendum:



As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using. I would take a slightly different approach by creating a repeatable enumerator class:



class RepeatableEnumerator<T> : IDisposable
{
private IEnumerable<T> _enumerable;
private IEnumerator<T> _enumerator;


public bool IsRepeating { get; private set; }
public T Current => _enumerator.Current;


public RepeatableEnumerator(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
_enumerator = enumerable.GetEnumerator();
}

public void Dispose()
{
_enumerator.Dispose();
_enumerator = null;
}

public bool MoveNext() => _enumerator.MoveNext();

public bool Repeat()
{
IsRepeating = true;
_enumerator.Dispose();
_enumerator = _enumerable.GetEnumerator();
return _enumerator.MoveNext();
}
}


Which can then be used for both enumerables (and possibly in other extension methods as well):



public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
// Eager parameter validation:
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

// Local function for lazy zipping:
IEnumerable<TResult> ZipLongestImpl()
{
using (var enum1 = new RepeatableEnumerator<TFirst>(first))
using (var enum2 = new RepeatableEnumerator<TSecond>(second))
{
// Up-front check for empty collections:
if (!enum1.MoveNext() || !enum2.MoveNext())
yield break;

while (true)
{
yield return resultSelector(enum1.Current, enum2.Current);

var is1Empty = !enum1.MoveNext();
var is2Empty = !enum2.MoveNext();
if (is1Empty)
{
if (enum2.IsRepeating || is2Empty || !enum1.Repeat())
yield break;
}
else if (is2Empty)
{
if (enum1.IsRepeating || !enum2.Repeat())
yield break;
}
}
}
}
return ZipLongestImpl();
}


At this point it would be a good idea to add some documentation...






share|improve this answer



















  • 1




    Worth adding that if you're going to throw ArgumentNullExceptions, make sure they're thrown eagerly, not only when iteration has started.
    – JAD
    yesterday










  • That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
    – Pieter Witvoet
    yesterday










  • Ah, so that's what you meant with a local function. If you look at what System.Linq does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
    – JAD
    23 hours ago












  • Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
    – JAD
    23 hours ago










  • Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
    – Pieter Witvoet
    22 hours ago


















up vote
6
down vote














    public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(



Names? The class would be more descriptive as something like LinqExtensions; the method something like ZipLooped.






            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;



The iterators have useful names, but what does i1 mean? And why five variables to track the state of two iterators? IMO it would be simpler as



                var firstEnded = false;
var secondEnded = false;

while (true)
{
if (!iterator1.MoveNext())
{
if (secondEnded) yield break;
firstEnded = true;
iterator1.Reset();
if (!iterator1.MoveNext()) yield break;
}
if (!iterator2.MoveNext())
{
if (firstEnded) yield break;
secondEnded = true;
iterator2.Reset();
if (!iterator2.MoveNext()) yield break;
}

yield return resultSelector(iterator1.Current, iterator2.Current);
}


and the almost repeated code might be worth pulling out as an inner method:



                var firstEnded = false;
var secondEnded = false;

bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)
{
if (it.MoveNext()) return true;
// `it` has done a full cycle; if the other one has too, we've finished
if (otherEnded) return false;
thisEnded = true;
// Start again, although if `it` is empty we need to abort
it.Reset();
return it.MoveNext();
}

while (true)
{
if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;
if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;
yield return resultSelector(iterator1.Current, iterator2.Current);
}




I notice that you've decided to yield break if either of the enumerables is empty. Would an exception be a better choice?






share|improve this answer





















  • I'd rename advance in TryMoveNextOrLoop
    – t3chb0t
    2 days ago










  • I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
    – t3chb0t
    2 days ago












  • @t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty.
    – VisualMelon
    2 days ago










  • @VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection.
    – t3chb0t
    2 days ago












  • @t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
    – VisualMelon
    2 days ago




















up vote
4
down vote













If you should respect that not all Enumerators implement Reset() then it is not possible to use using statements for the two IEnumerators. But you could introduce an IEnumerator<TResult> for the zipped result and use it like this:



public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

return InnerZipNew(first, second, resultSelector);
}

private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))
{
while (zipEnumerator.MoveNext())
{
yield return zipEnumerator.Current;
}
}
}


As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.



In this way you're back on the using track.



The ZipEnumerator it self could be something like:



  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumerT;
IEnumerator<S> m_enumerS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_first = true;

public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));

}

public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);

object IEnumerator.Current => Current;

public void Dispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}

private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}

private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}

public bool MoveNext()
{
m_enumerT = m_enumerT ?? GetTEnumerator();
m_enumerS = m_enumerS ?? GetSEnumerator();

if (m_first)
{
if (m_enumerT.MoveNext())
{
if (!m_enumerS.MoveNext())
{
m_enumerS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumerS.MoveNext())
return false;
}
return true;
}
else
{
m_first = false;
}
}

if (!m_first && !m_secondReloaded)
{
if (m_enumerS.MoveNext())
{
if (!m_enumerT.MoveNext())
{
m_enumerT = GetTEnumerator();
if (!m_enumerT.MoveNext())
return false;
}

return true;
}
}

return false;
}

public void Reset()
{
m_secondReloaded = false;
m_first = true;
m_enumerT = null;
m_enumerS = null;
Dispose();
}
}


It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator itself is disposed off?



The MoveNext() method went a little more complicated than I like, so feel free to edit or suggest improvements.





Edit



A refactored version of ZipEnumerator:



  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumeratorT;
IEnumerator<S> m_enumeratorS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_isInitilized = false;

public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}

public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);
object IEnumerator.Current => Current;

public void Dispose()
{
DoDispose();
}

private void RegisterDisposable(IDisposable disposable)
{
m_disposables.Add(disposable);
if (m_disposables.Count > 10)
{
DoDispose();
}
}

private void DoDispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}

private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}

private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}

private Func<bool> CurrentMover = null;

private bool FirstMover()
{
if (m_enumeratorT.MoveNext())
{
if (!m_enumeratorS.MoveNext())
{
m_enumeratorS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumeratorS.MoveNext())
return false;
}
return true;
}
else if (!m_secondReloaded)
{
CurrentMover = SecondMover;
return CurrentMover();
}

return false;
}

private bool SecondMover()
{
if (m_enumeratorS.MoveNext())
{
if (!m_enumeratorT.MoveNext())
{
m_enumeratorT = GetTEnumerator();
if (!m_enumeratorT.MoveNext())
return false;
}

return true;
}

return false;
}

private void Initialize()
{
m_enumeratorT = GetTEnumerator();
m_enumeratorS = GetSEnumerator();
CurrentMover = FirstMover;
m_isInitilized = true;
}

public bool MoveNext()
{
if (!m_isInitilized)
{
Initialize();
}
return CurrentMover();
}

public void Reset()
{
m_isInitilized = false;
m_secondReloaded = false;
CurrentMover = null;
m_enumeratorT = null;
m_enumeratorS = null;
DoDispose();
}
}





share|improve this answer























  • I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
    – t3chb0t
    2 days ago










  • I think the word unconventional describes them pretty good... although ts and ss were below that level ;-]
    – t3chb0t
    2 days ago










  • I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public.
    – t3chb0t
    2 days ago










  • You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the yield return lazy evaluation. Example.
    – JAD
    yesterday








  • 1




    @HenrikHansen I saw that. The problem is that if a method is using the yield method of returning a IEnumerable, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
    – JAD
    yesterday











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208152%2fcustom-implementation-of-the-linq-zip-operator-for-different-length-lists%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
7
down vote













I noticed a few things that can be improved:




  • Not all enumerators support Reset. Generator methods don't, for example, so calling ZipNew on the result of a ZipNew call will fail with a NotSupportedException. Obtaining a new enumerator should work, at the cost of having to replace the convenient using statements with try/finally constructions. Edit: As Eric pointed out, Reset should not be used at all. It's been abandoned.

  • There's no need to call Reset (or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that.

  • Passing null causes either an unspecific NullReferenceException or an ArgumentNullException with parameter name source to be thrown. Throwing ArgumentNullExceptions with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.


  • i1 and i2 can be declared inside the while loop.




Addendum:



As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using. I would take a slightly different approach by creating a repeatable enumerator class:



class RepeatableEnumerator<T> : IDisposable
{
private IEnumerable<T> _enumerable;
private IEnumerator<T> _enumerator;


public bool IsRepeating { get; private set; }
public T Current => _enumerator.Current;


public RepeatableEnumerator(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
_enumerator = enumerable.GetEnumerator();
}

public void Dispose()
{
_enumerator.Dispose();
_enumerator = null;
}

public bool MoveNext() => _enumerator.MoveNext();

public bool Repeat()
{
IsRepeating = true;
_enumerator.Dispose();
_enumerator = _enumerable.GetEnumerator();
return _enumerator.MoveNext();
}
}


Which can then be used for both enumerables (and possibly in other extension methods as well):



public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
// Eager parameter validation:
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

// Local function for lazy zipping:
IEnumerable<TResult> ZipLongestImpl()
{
using (var enum1 = new RepeatableEnumerator<TFirst>(first))
using (var enum2 = new RepeatableEnumerator<TSecond>(second))
{
// Up-front check for empty collections:
if (!enum1.MoveNext() || !enum2.MoveNext())
yield break;

while (true)
{
yield return resultSelector(enum1.Current, enum2.Current);

var is1Empty = !enum1.MoveNext();
var is2Empty = !enum2.MoveNext();
if (is1Empty)
{
if (enum2.IsRepeating || is2Empty || !enum1.Repeat())
yield break;
}
else if (is2Empty)
{
if (enum1.IsRepeating || !enum2.Repeat())
yield break;
}
}
}
}
return ZipLongestImpl();
}


At this point it would be a good idea to add some documentation...






share|improve this answer



















  • 1




    Worth adding that if you're going to throw ArgumentNullExceptions, make sure they're thrown eagerly, not only when iteration has started.
    – JAD
    yesterday










  • That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
    – Pieter Witvoet
    yesterday










  • Ah, so that's what you meant with a local function. If you look at what System.Linq does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
    – JAD
    23 hours ago












  • Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
    – JAD
    23 hours ago










  • Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
    – Pieter Witvoet
    22 hours ago















up vote
7
down vote













I noticed a few things that can be improved:




  • Not all enumerators support Reset. Generator methods don't, for example, so calling ZipNew on the result of a ZipNew call will fail with a NotSupportedException. Obtaining a new enumerator should work, at the cost of having to replace the convenient using statements with try/finally constructions. Edit: As Eric pointed out, Reset should not be used at all. It's been abandoned.

  • There's no need to call Reset (or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that.

  • Passing null causes either an unspecific NullReferenceException or an ArgumentNullException with parameter name source to be thrown. Throwing ArgumentNullExceptions with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.


  • i1 and i2 can be declared inside the while loop.




Addendum:



As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using. I would take a slightly different approach by creating a repeatable enumerator class:



class RepeatableEnumerator<T> : IDisposable
{
private IEnumerable<T> _enumerable;
private IEnumerator<T> _enumerator;


public bool IsRepeating { get; private set; }
public T Current => _enumerator.Current;


public RepeatableEnumerator(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
_enumerator = enumerable.GetEnumerator();
}

public void Dispose()
{
_enumerator.Dispose();
_enumerator = null;
}

public bool MoveNext() => _enumerator.MoveNext();

public bool Repeat()
{
IsRepeating = true;
_enumerator.Dispose();
_enumerator = _enumerable.GetEnumerator();
return _enumerator.MoveNext();
}
}


Which can then be used for both enumerables (and possibly in other extension methods as well):



public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
// Eager parameter validation:
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

// Local function for lazy zipping:
IEnumerable<TResult> ZipLongestImpl()
{
using (var enum1 = new RepeatableEnumerator<TFirst>(first))
using (var enum2 = new RepeatableEnumerator<TSecond>(second))
{
// Up-front check for empty collections:
if (!enum1.MoveNext() || !enum2.MoveNext())
yield break;

while (true)
{
yield return resultSelector(enum1.Current, enum2.Current);

var is1Empty = !enum1.MoveNext();
var is2Empty = !enum2.MoveNext();
if (is1Empty)
{
if (enum2.IsRepeating || is2Empty || !enum1.Repeat())
yield break;
}
else if (is2Empty)
{
if (enum1.IsRepeating || !enum2.Repeat())
yield break;
}
}
}
}
return ZipLongestImpl();
}


At this point it would be a good idea to add some documentation...






share|improve this answer



















  • 1




    Worth adding that if you're going to throw ArgumentNullExceptions, make sure they're thrown eagerly, not only when iteration has started.
    – JAD
    yesterday










  • That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
    – Pieter Witvoet
    yesterday










  • Ah, so that's what you meant with a local function. If you look at what System.Linq does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
    – JAD
    23 hours ago












  • Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
    – JAD
    23 hours ago










  • Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
    – Pieter Witvoet
    22 hours ago













up vote
7
down vote










up vote
7
down vote









I noticed a few things that can be improved:




  • Not all enumerators support Reset. Generator methods don't, for example, so calling ZipNew on the result of a ZipNew call will fail with a NotSupportedException. Obtaining a new enumerator should work, at the cost of having to replace the convenient using statements with try/finally constructions. Edit: As Eric pointed out, Reset should not be used at all. It's been abandoned.

  • There's no need to call Reset (or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that.

  • Passing null causes either an unspecific NullReferenceException or an ArgumentNullException with parameter name source to be thrown. Throwing ArgumentNullExceptions with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.


  • i1 and i2 can be declared inside the while loop.




Addendum:



As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using. I would take a slightly different approach by creating a repeatable enumerator class:



class RepeatableEnumerator<T> : IDisposable
{
private IEnumerable<T> _enumerable;
private IEnumerator<T> _enumerator;


public bool IsRepeating { get; private set; }
public T Current => _enumerator.Current;


public RepeatableEnumerator(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
_enumerator = enumerable.GetEnumerator();
}

public void Dispose()
{
_enumerator.Dispose();
_enumerator = null;
}

public bool MoveNext() => _enumerator.MoveNext();

public bool Repeat()
{
IsRepeating = true;
_enumerator.Dispose();
_enumerator = _enumerable.GetEnumerator();
return _enumerator.MoveNext();
}
}


Which can then be used for both enumerables (and possibly in other extension methods as well):



public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
// Eager parameter validation:
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

// Local function for lazy zipping:
IEnumerable<TResult> ZipLongestImpl()
{
using (var enum1 = new RepeatableEnumerator<TFirst>(first))
using (var enum2 = new RepeatableEnumerator<TSecond>(second))
{
// Up-front check for empty collections:
if (!enum1.MoveNext() || !enum2.MoveNext())
yield break;

while (true)
{
yield return resultSelector(enum1.Current, enum2.Current);

var is1Empty = !enum1.MoveNext();
var is2Empty = !enum2.MoveNext();
if (is1Empty)
{
if (enum2.IsRepeating || is2Empty || !enum1.Repeat())
yield break;
}
else if (is2Empty)
{
if (enum1.IsRepeating || !enum2.Repeat())
yield break;
}
}
}
}
return ZipLongestImpl();
}


At this point it would be a good idea to add some documentation...






share|improve this answer














I noticed a few things that can be improved:




  • Not all enumerators support Reset. Generator methods don't, for example, so calling ZipNew on the result of a ZipNew call will fail with a NotSupportedException. Obtaining a new enumerator should work, at the cost of having to replace the convenient using statements with try/finally constructions. Edit: As Eric pointed out, Reset should not be used at all. It's been abandoned.

  • There's no need to call Reset (or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that.

  • Passing null causes either an unspecific NullReferenceException or an ArgumentNullException with parameter name source to be thrown. Throwing ArgumentNullExceptions with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.


  • i1 and i2 can be declared inside the while loop.




Addendum:



As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using. I would take a slightly different approach by creating a repeatable enumerator class:



class RepeatableEnumerator<T> : IDisposable
{
private IEnumerable<T> _enumerable;
private IEnumerator<T> _enumerator;


public bool IsRepeating { get; private set; }
public T Current => _enumerator.Current;


public RepeatableEnumerator(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
_enumerator = enumerable.GetEnumerator();
}

public void Dispose()
{
_enumerator.Dispose();
_enumerator = null;
}

public bool MoveNext() => _enumerator.MoveNext();

public bool Repeat()
{
IsRepeating = true;
_enumerator.Dispose();
_enumerator = _enumerable.GetEnumerator();
return _enumerator.MoveNext();
}
}


Which can then be used for both enumerables (and possibly in other extension methods as well):



public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
// Eager parameter validation:
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

// Local function for lazy zipping:
IEnumerable<TResult> ZipLongestImpl()
{
using (var enum1 = new RepeatableEnumerator<TFirst>(first))
using (var enum2 = new RepeatableEnumerator<TSecond>(second))
{
// Up-front check for empty collections:
if (!enum1.MoveNext() || !enum2.MoveNext())
yield break;

while (true)
{
yield return resultSelector(enum1.Current, enum2.Current);

var is1Empty = !enum1.MoveNext();
var is2Empty = !enum2.MoveNext();
if (is1Empty)
{
if (enum2.IsRepeating || is2Empty || !enum1.Repeat())
yield break;
}
else if (is2Empty)
{
if (enum1.IsRepeating || !enum2.Repeat())
yield break;
}
}
}
}
return ZipLongestImpl();
}


At this point it would be a good idea to add some documentation...







share|improve this answer














share|improve this answer



share|improve this answer








edited 22 hours ago

























answered 2 days ago









Pieter Witvoet

4,951723




4,951723








  • 1




    Worth adding that if you're going to throw ArgumentNullExceptions, make sure they're thrown eagerly, not only when iteration has started.
    – JAD
    yesterday










  • That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
    – Pieter Witvoet
    yesterday










  • Ah, so that's what you meant with a local function. If you look at what System.Linq does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
    – JAD
    23 hours ago












  • Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
    – JAD
    23 hours ago










  • Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
    – Pieter Witvoet
    22 hours ago














  • 1




    Worth adding that if you're going to throw ArgumentNullExceptions, make sure they're thrown eagerly, not only when iteration has started.
    – JAD
    yesterday










  • That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
    – Pieter Witvoet
    yesterday










  • Ah, so that's what you meant with a local function. If you look at what System.Linq does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
    – JAD
    23 hours ago












  • Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
    – JAD
    23 hours ago










  • Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
    – Pieter Witvoet
    22 hours ago








1




1




Worth adding that if you're going to throw ArgumentNullExceptions, make sure they're thrown eagerly, not only when iteration has started.
– JAD
yesterday




Worth adding that if you're going to throw ArgumentNullExceptions, make sure they're thrown eagerly, not only when iteration has started.
– JAD
yesterday












That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
– Pieter Witvoet
yesterday




That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
– Pieter Witvoet
yesterday












Ah, so that's what you meant with a local function. If you look at what System.Linq does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
– JAD
23 hours ago






Ah, so that's what you meant with a local function. If you look at what System.Linq does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
– JAD
23 hours ago














Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
– JAD
23 hours ago




Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
– JAD
23 hours ago












Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
– Pieter Witvoet
22 hours ago




Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
– Pieter Witvoet
22 hours ago












up vote
6
down vote














    public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(



Names? The class would be more descriptive as something like LinqExtensions; the method something like ZipLooped.






            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;



The iterators have useful names, but what does i1 mean? And why five variables to track the state of two iterators? IMO it would be simpler as



                var firstEnded = false;
var secondEnded = false;

while (true)
{
if (!iterator1.MoveNext())
{
if (secondEnded) yield break;
firstEnded = true;
iterator1.Reset();
if (!iterator1.MoveNext()) yield break;
}
if (!iterator2.MoveNext())
{
if (firstEnded) yield break;
secondEnded = true;
iterator2.Reset();
if (!iterator2.MoveNext()) yield break;
}

yield return resultSelector(iterator1.Current, iterator2.Current);
}


and the almost repeated code might be worth pulling out as an inner method:



                var firstEnded = false;
var secondEnded = false;

bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)
{
if (it.MoveNext()) return true;
// `it` has done a full cycle; if the other one has too, we've finished
if (otherEnded) return false;
thisEnded = true;
// Start again, although if `it` is empty we need to abort
it.Reset();
return it.MoveNext();
}

while (true)
{
if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;
if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;
yield return resultSelector(iterator1.Current, iterator2.Current);
}




I notice that you've decided to yield break if either of the enumerables is empty. Would an exception be a better choice?






share|improve this answer





















  • I'd rename advance in TryMoveNextOrLoop
    – t3chb0t
    2 days ago










  • I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
    – t3chb0t
    2 days ago












  • @t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty.
    – VisualMelon
    2 days ago










  • @VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection.
    – t3chb0t
    2 days ago












  • @t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
    – VisualMelon
    2 days ago

















up vote
6
down vote














    public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(



Names? The class would be more descriptive as something like LinqExtensions; the method something like ZipLooped.






            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;



The iterators have useful names, but what does i1 mean? And why five variables to track the state of two iterators? IMO it would be simpler as



                var firstEnded = false;
var secondEnded = false;

while (true)
{
if (!iterator1.MoveNext())
{
if (secondEnded) yield break;
firstEnded = true;
iterator1.Reset();
if (!iterator1.MoveNext()) yield break;
}
if (!iterator2.MoveNext())
{
if (firstEnded) yield break;
secondEnded = true;
iterator2.Reset();
if (!iterator2.MoveNext()) yield break;
}

yield return resultSelector(iterator1.Current, iterator2.Current);
}


and the almost repeated code might be worth pulling out as an inner method:



                var firstEnded = false;
var secondEnded = false;

bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)
{
if (it.MoveNext()) return true;
// `it` has done a full cycle; if the other one has too, we've finished
if (otherEnded) return false;
thisEnded = true;
// Start again, although if `it` is empty we need to abort
it.Reset();
return it.MoveNext();
}

while (true)
{
if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;
if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;
yield return resultSelector(iterator1.Current, iterator2.Current);
}




I notice that you've decided to yield break if either of the enumerables is empty. Would an exception be a better choice?






share|improve this answer





















  • I'd rename advance in TryMoveNextOrLoop
    – t3chb0t
    2 days ago










  • I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
    – t3chb0t
    2 days ago












  • @t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty.
    – VisualMelon
    2 days ago










  • @VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection.
    – t3chb0t
    2 days ago












  • @t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
    – VisualMelon
    2 days ago















up vote
6
down vote










up vote
6
down vote










    public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(



Names? The class would be more descriptive as something like LinqExtensions; the method something like ZipLooped.






            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;



The iterators have useful names, but what does i1 mean? And why five variables to track the state of two iterators? IMO it would be simpler as



                var firstEnded = false;
var secondEnded = false;

while (true)
{
if (!iterator1.MoveNext())
{
if (secondEnded) yield break;
firstEnded = true;
iterator1.Reset();
if (!iterator1.MoveNext()) yield break;
}
if (!iterator2.MoveNext())
{
if (firstEnded) yield break;
secondEnded = true;
iterator2.Reset();
if (!iterator2.MoveNext()) yield break;
}

yield return resultSelector(iterator1.Current, iterator2.Current);
}


and the almost repeated code might be worth pulling out as an inner method:



                var firstEnded = false;
var secondEnded = false;

bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)
{
if (it.MoveNext()) return true;
// `it` has done a full cycle; if the other one has too, we've finished
if (otherEnded) return false;
thisEnded = true;
// Start again, although if `it` is empty we need to abort
it.Reset();
return it.MoveNext();
}

while (true)
{
if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;
if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;
yield return resultSelector(iterator1.Current, iterator2.Current);
}




I notice that you've decided to yield break if either of the enumerables is empty. Would an exception be a better choice?






share|improve this answer













    public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(



Names? The class would be more descriptive as something like LinqExtensions; the method something like ZipLooped.






            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;



The iterators have useful names, but what does i1 mean? And why five variables to track the state of two iterators? IMO it would be simpler as



                var firstEnded = false;
var secondEnded = false;

while (true)
{
if (!iterator1.MoveNext())
{
if (secondEnded) yield break;
firstEnded = true;
iterator1.Reset();
if (!iterator1.MoveNext()) yield break;
}
if (!iterator2.MoveNext())
{
if (firstEnded) yield break;
secondEnded = true;
iterator2.Reset();
if (!iterator2.MoveNext()) yield break;
}

yield return resultSelector(iterator1.Current, iterator2.Current);
}


and the almost repeated code might be worth pulling out as an inner method:



                var firstEnded = false;
var secondEnded = false;

bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)
{
if (it.MoveNext()) return true;
// `it` has done a full cycle; if the other one has too, we've finished
if (otherEnded) return false;
thisEnded = true;
// Start again, although if `it` is empty we need to abort
it.Reset();
return it.MoveNext();
}

while (true)
{
if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;
if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;
yield return resultSelector(iterator1.Current, iterator2.Current);
}




I notice that you've decided to yield break if either of the enumerables is empty. Would an exception be a better choice?







share|improve this answer












share|improve this answer



share|improve this answer










answered 2 days ago









Peter Taylor

15.3k2657




15.3k2657












  • I'd rename advance in TryMoveNextOrLoop
    – t3chb0t
    2 days ago










  • I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
    – t3chb0t
    2 days ago












  • @t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty.
    – VisualMelon
    2 days ago










  • @VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection.
    – t3chb0t
    2 days ago












  • @t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
    – VisualMelon
    2 days ago




















  • I'd rename advance in TryMoveNextOrLoop
    – t3chb0t
    2 days ago










  • I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
    – t3chb0t
    2 days ago












  • @t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty.
    – VisualMelon
    2 days ago










  • @VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection.
    – t3chb0t
    2 days ago












  • @t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
    – VisualMelon
    2 days ago


















I'd rename advance in TryMoveNextOrLoop
– t3chb0t
2 days ago




I'd rename advance in TryMoveNextOrLoop
– t3chb0t
2 days ago












I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
– t3chb0t
2 days ago






I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
– t3chb0t
2 days ago














@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty.
– VisualMelon
2 days ago




@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty.
– VisualMelon
2 days ago












@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection.
– t3chb0t
2 days ago






@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection.
– t3chb0t
2 days ago














@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
– VisualMelon
2 days ago






@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
– VisualMelon
2 days ago












up vote
4
down vote













If you should respect that not all Enumerators implement Reset() then it is not possible to use using statements for the two IEnumerators. But you could introduce an IEnumerator<TResult> for the zipped result and use it like this:



public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

return InnerZipNew(first, second, resultSelector);
}

private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))
{
while (zipEnumerator.MoveNext())
{
yield return zipEnumerator.Current;
}
}
}


As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.



In this way you're back on the using track.



The ZipEnumerator it self could be something like:



  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumerT;
IEnumerator<S> m_enumerS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_first = true;

public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));

}

public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);

object IEnumerator.Current => Current;

public void Dispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}

private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}

private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}

public bool MoveNext()
{
m_enumerT = m_enumerT ?? GetTEnumerator();
m_enumerS = m_enumerS ?? GetSEnumerator();

if (m_first)
{
if (m_enumerT.MoveNext())
{
if (!m_enumerS.MoveNext())
{
m_enumerS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumerS.MoveNext())
return false;
}
return true;
}
else
{
m_first = false;
}
}

if (!m_first && !m_secondReloaded)
{
if (m_enumerS.MoveNext())
{
if (!m_enumerT.MoveNext())
{
m_enumerT = GetTEnumerator();
if (!m_enumerT.MoveNext())
return false;
}

return true;
}
}

return false;
}

public void Reset()
{
m_secondReloaded = false;
m_first = true;
m_enumerT = null;
m_enumerS = null;
Dispose();
}
}


It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator itself is disposed off?



The MoveNext() method went a little more complicated than I like, so feel free to edit or suggest improvements.





Edit



A refactored version of ZipEnumerator:



  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumeratorT;
IEnumerator<S> m_enumeratorS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_isInitilized = false;

public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}

public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);
object IEnumerator.Current => Current;

public void Dispose()
{
DoDispose();
}

private void RegisterDisposable(IDisposable disposable)
{
m_disposables.Add(disposable);
if (m_disposables.Count > 10)
{
DoDispose();
}
}

private void DoDispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}

private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}

private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}

private Func<bool> CurrentMover = null;

private bool FirstMover()
{
if (m_enumeratorT.MoveNext())
{
if (!m_enumeratorS.MoveNext())
{
m_enumeratorS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumeratorS.MoveNext())
return false;
}
return true;
}
else if (!m_secondReloaded)
{
CurrentMover = SecondMover;
return CurrentMover();
}

return false;
}

private bool SecondMover()
{
if (m_enumeratorS.MoveNext())
{
if (!m_enumeratorT.MoveNext())
{
m_enumeratorT = GetTEnumerator();
if (!m_enumeratorT.MoveNext())
return false;
}

return true;
}

return false;
}

private void Initialize()
{
m_enumeratorT = GetTEnumerator();
m_enumeratorS = GetSEnumerator();
CurrentMover = FirstMover;
m_isInitilized = true;
}

public bool MoveNext()
{
if (!m_isInitilized)
{
Initialize();
}
return CurrentMover();
}

public void Reset()
{
m_isInitilized = false;
m_secondReloaded = false;
CurrentMover = null;
m_enumeratorT = null;
m_enumeratorS = null;
DoDispose();
}
}





share|improve this answer























  • I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
    – t3chb0t
    2 days ago










  • I think the word unconventional describes them pretty good... although ts and ss were below that level ;-]
    – t3chb0t
    2 days ago










  • I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public.
    – t3chb0t
    2 days ago










  • You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the yield return lazy evaluation. Example.
    – JAD
    yesterday








  • 1




    @HenrikHansen I saw that. The problem is that if a method is using the yield method of returning a IEnumerable, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
    – JAD
    yesterday















up vote
4
down vote













If you should respect that not all Enumerators implement Reset() then it is not possible to use using statements for the two IEnumerators. But you could introduce an IEnumerator<TResult> for the zipped result and use it like this:



public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

return InnerZipNew(first, second, resultSelector);
}

private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))
{
while (zipEnumerator.MoveNext())
{
yield return zipEnumerator.Current;
}
}
}


As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.



In this way you're back on the using track.



The ZipEnumerator it self could be something like:



  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumerT;
IEnumerator<S> m_enumerS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_first = true;

public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));

}

public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);

object IEnumerator.Current => Current;

public void Dispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}

private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}

private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}

public bool MoveNext()
{
m_enumerT = m_enumerT ?? GetTEnumerator();
m_enumerS = m_enumerS ?? GetSEnumerator();

if (m_first)
{
if (m_enumerT.MoveNext())
{
if (!m_enumerS.MoveNext())
{
m_enumerS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumerS.MoveNext())
return false;
}
return true;
}
else
{
m_first = false;
}
}

if (!m_first && !m_secondReloaded)
{
if (m_enumerS.MoveNext())
{
if (!m_enumerT.MoveNext())
{
m_enumerT = GetTEnumerator();
if (!m_enumerT.MoveNext())
return false;
}

return true;
}
}

return false;
}

public void Reset()
{
m_secondReloaded = false;
m_first = true;
m_enumerT = null;
m_enumerS = null;
Dispose();
}
}


It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator itself is disposed off?



The MoveNext() method went a little more complicated than I like, so feel free to edit or suggest improvements.





Edit



A refactored version of ZipEnumerator:



  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumeratorT;
IEnumerator<S> m_enumeratorS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_isInitilized = false;

public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}

public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);
object IEnumerator.Current => Current;

public void Dispose()
{
DoDispose();
}

private void RegisterDisposable(IDisposable disposable)
{
m_disposables.Add(disposable);
if (m_disposables.Count > 10)
{
DoDispose();
}
}

private void DoDispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}

private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}

private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}

private Func<bool> CurrentMover = null;

private bool FirstMover()
{
if (m_enumeratorT.MoveNext())
{
if (!m_enumeratorS.MoveNext())
{
m_enumeratorS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumeratorS.MoveNext())
return false;
}
return true;
}
else if (!m_secondReloaded)
{
CurrentMover = SecondMover;
return CurrentMover();
}

return false;
}

private bool SecondMover()
{
if (m_enumeratorS.MoveNext())
{
if (!m_enumeratorT.MoveNext())
{
m_enumeratorT = GetTEnumerator();
if (!m_enumeratorT.MoveNext())
return false;
}

return true;
}

return false;
}

private void Initialize()
{
m_enumeratorT = GetTEnumerator();
m_enumeratorS = GetSEnumerator();
CurrentMover = FirstMover;
m_isInitilized = true;
}

public bool MoveNext()
{
if (!m_isInitilized)
{
Initialize();
}
return CurrentMover();
}

public void Reset()
{
m_isInitilized = false;
m_secondReloaded = false;
CurrentMover = null;
m_enumeratorT = null;
m_enumeratorS = null;
DoDispose();
}
}





share|improve this answer























  • I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
    – t3chb0t
    2 days ago










  • I think the word unconventional describes them pretty good... although ts and ss were below that level ;-]
    – t3chb0t
    2 days ago










  • I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public.
    – t3chb0t
    2 days ago










  • You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the yield return lazy evaluation. Example.
    – JAD
    yesterday








  • 1




    @HenrikHansen I saw that. The problem is that if a method is using the yield method of returning a IEnumerable, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
    – JAD
    yesterday













up vote
4
down vote










up vote
4
down vote









If you should respect that not all Enumerators implement Reset() then it is not possible to use using statements for the two IEnumerators. But you could introduce an IEnumerator<TResult> for the zipped result and use it like this:



public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

return InnerZipNew(first, second, resultSelector);
}

private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))
{
while (zipEnumerator.MoveNext())
{
yield return zipEnumerator.Current;
}
}
}


As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.



In this way you're back on the using track.



The ZipEnumerator it self could be something like:



  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumerT;
IEnumerator<S> m_enumerS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_first = true;

public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));

}

public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);

object IEnumerator.Current => Current;

public void Dispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}

private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}

private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}

public bool MoveNext()
{
m_enumerT = m_enumerT ?? GetTEnumerator();
m_enumerS = m_enumerS ?? GetSEnumerator();

if (m_first)
{
if (m_enumerT.MoveNext())
{
if (!m_enumerS.MoveNext())
{
m_enumerS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumerS.MoveNext())
return false;
}
return true;
}
else
{
m_first = false;
}
}

if (!m_first && !m_secondReloaded)
{
if (m_enumerS.MoveNext())
{
if (!m_enumerT.MoveNext())
{
m_enumerT = GetTEnumerator();
if (!m_enumerT.MoveNext())
return false;
}

return true;
}
}

return false;
}

public void Reset()
{
m_secondReloaded = false;
m_first = true;
m_enumerT = null;
m_enumerS = null;
Dispose();
}
}


It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator itself is disposed off?



The MoveNext() method went a little more complicated than I like, so feel free to edit or suggest improvements.





Edit



A refactored version of ZipEnumerator:



  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumeratorT;
IEnumerator<S> m_enumeratorS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_isInitilized = false;

public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}

public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);
object IEnumerator.Current => Current;

public void Dispose()
{
DoDispose();
}

private void RegisterDisposable(IDisposable disposable)
{
m_disposables.Add(disposable);
if (m_disposables.Count > 10)
{
DoDispose();
}
}

private void DoDispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}

private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}

private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}

private Func<bool> CurrentMover = null;

private bool FirstMover()
{
if (m_enumeratorT.MoveNext())
{
if (!m_enumeratorS.MoveNext())
{
m_enumeratorS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumeratorS.MoveNext())
return false;
}
return true;
}
else if (!m_secondReloaded)
{
CurrentMover = SecondMover;
return CurrentMover();
}

return false;
}

private bool SecondMover()
{
if (m_enumeratorS.MoveNext())
{
if (!m_enumeratorT.MoveNext())
{
m_enumeratorT = GetTEnumerator();
if (!m_enumeratorT.MoveNext())
return false;
}

return true;
}

return false;
}

private void Initialize()
{
m_enumeratorT = GetTEnumerator();
m_enumeratorS = GetSEnumerator();
CurrentMover = FirstMover;
m_isInitilized = true;
}

public bool MoveNext()
{
if (!m_isInitilized)
{
Initialize();
}
return CurrentMover();
}

public void Reset()
{
m_isInitilized = false;
m_secondReloaded = false;
CurrentMover = null;
m_enumeratorT = null;
m_enumeratorS = null;
DoDispose();
}
}





share|improve this answer














If you should respect that not all Enumerators implement Reset() then it is not possible to use using statements for the two IEnumerators. But you could introduce an IEnumerator<TResult> for the zipped result and use it like this:



public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

return InnerZipNew(first, second, resultSelector);
}

private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))
{
while (zipEnumerator.MoveNext())
{
yield return zipEnumerator.Current;
}
}
}


As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.



In this way you're back on the using track.



The ZipEnumerator it self could be something like:



  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumerT;
IEnumerator<S> m_enumerS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_first = true;

public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));

}

public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);

object IEnumerator.Current => Current;

public void Dispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}

private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}

private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}

public bool MoveNext()
{
m_enumerT = m_enumerT ?? GetTEnumerator();
m_enumerS = m_enumerS ?? GetSEnumerator();

if (m_first)
{
if (m_enumerT.MoveNext())
{
if (!m_enumerS.MoveNext())
{
m_enumerS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumerS.MoveNext())
return false;
}
return true;
}
else
{
m_first = false;
}
}

if (!m_first && !m_secondReloaded)
{
if (m_enumerS.MoveNext())
{
if (!m_enumerT.MoveNext())
{
m_enumerT = GetTEnumerator();
if (!m_enumerT.MoveNext())
return false;
}

return true;
}
}

return false;
}

public void Reset()
{
m_secondReloaded = false;
m_first = true;
m_enumerT = null;
m_enumerS = null;
Dispose();
}
}


It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator itself is disposed off?



The MoveNext() method went a little more complicated than I like, so feel free to edit or suggest improvements.





Edit



A refactored version of ZipEnumerator:



  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumeratorT;
IEnumerator<S> m_enumeratorS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_isInitilized = false;

public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}

public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);
object IEnumerator.Current => Current;

public void Dispose()
{
DoDispose();
}

private void RegisterDisposable(IDisposable disposable)
{
m_disposables.Add(disposable);
if (m_disposables.Count > 10)
{
DoDispose();
}
}

private void DoDispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}

private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}

private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}

private Func<bool> CurrentMover = null;

private bool FirstMover()
{
if (m_enumeratorT.MoveNext())
{
if (!m_enumeratorS.MoveNext())
{
m_enumeratorS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumeratorS.MoveNext())
return false;
}
return true;
}
else if (!m_secondReloaded)
{
CurrentMover = SecondMover;
return CurrentMover();
}

return false;
}

private bool SecondMover()
{
if (m_enumeratorS.MoveNext())
{
if (!m_enumeratorT.MoveNext())
{
m_enumeratorT = GetTEnumerator();
if (!m_enumeratorT.MoveNext())
return false;
}

return true;
}

return false;
}

private void Initialize()
{
m_enumeratorT = GetTEnumerator();
m_enumeratorS = GetSEnumerator();
CurrentMover = FirstMover;
m_isInitilized = true;
}

public bool MoveNext()
{
if (!m_isInitilized)
{
Initialize();
}
return CurrentMover();
}

public void Reset()
{
m_isInitilized = false;
m_secondReloaded = false;
CurrentMover = null;
m_enumeratorT = null;
m_enumeratorS = null;
DoDispose();
}
}






share|improve this answer














share|improve this answer



share|improve this answer








edited yesterday

























answered 2 days ago









Henrik Hansen

6,2531722




6,2531722












  • I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
    – t3chb0t
    2 days ago










  • I think the word unconventional describes them pretty good... although ts and ss were below that level ;-]
    – t3chb0t
    2 days ago










  • I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public.
    – t3chb0t
    2 days ago










  • You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the yield return lazy evaluation. Example.
    – JAD
    yesterday








  • 1




    @HenrikHansen I saw that. The problem is that if a method is using the yield method of returning a IEnumerable, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
    – JAD
    yesterday


















  • I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
    – t3chb0t
    2 days ago










  • I think the word unconventional describes them pretty good... although ts and ss were below that level ;-]
    – t3chb0t
    2 days ago










  • I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public.
    – t3chb0t
    2 days ago










  • You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the yield return lazy evaluation. Example.
    – JAD
    yesterday








  • 1




    @HenrikHansen I saw that. The problem is that if a method is using the yield method of returning a IEnumerable, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
    – JAD
    yesterday
















I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
– t3chb0t
2 days ago




I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
– t3chb0t
2 days ago












I think the word unconventional describes them pretty good... although ts and ss were below that level ;-]
– t3chb0t
2 days ago




I think the word unconventional describes them pretty good... although ts and ss were below that level ;-]
– t3chb0t
2 days ago












I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public.
– t3chb0t
2 days ago




I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public.
– t3chb0t
2 days ago












You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the yield return lazy evaluation. Example.
– JAD
yesterday






You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the yield return lazy evaluation. Example.
– JAD
yesterday






1




1




@HenrikHansen I saw that. The problem is that if a method is using the yield method of returning a IEnumerable, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
– JAD
yesterday




@HenrikHansen I saw that. The problem is that if a method is using the yield method of returning a IEnumerable, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
– JAD
yesterday


















 

draft saved


draft discarded



















































 


draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208152%2fcustom-implementation-of-the-linq-zip-operator-for-different-length-lists%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Quarter-circle Tiles

build a pushdown automaton that recognizes the reverse language of a given pushdown automaton?

Mont Emei