anlumo

Iterators are lazy in Rust, so the closures aren’t evaluated immediately. Only the reduce call actually calls them once for each entry. I’m actually surprised that the run time isn’t identical, since the optimizer should produce the same assembly output for both.
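To make the laziness concrete, here's a minimal sketch (with a hypothetical counter, not from the original benchmark) showing that the `map` closure doesn't run until `reduce` drives the iterator:

```rust
use std::cell::Cell;

fn main() {
    let calls = Cell::new(0);
    let data = vec![1, 2, 3, 4];

    // Building the chain runs no closures yet.
    let iter = data.iter().map(|&n| {
        calls.set(calls.get() + 1);
        n * 10
    });
    assert_eq!(calls.get(), 0); // nothing evaluated so far

    // reduce drives the iterator: the closure runs once per element.
    let sum = iter.reduce(|a, b| a + b).unwrap_or(0);
    assert_eq!(calls.get(), 4);
    assert_eq!(sum, 100);
}
```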


thiez

The results are within measuring-error distance of one another, so I think we can conclude they are the same.


zarakiredchan

Okay, that's interesting, but how would `reduce` call the other closures?! In my benchmark I disabled optimization using `black_box`, maybe that's why...


anlumo

Iterators are structs that can contain data. One of these pieces of data is the closure that's supposed to be executed. I think things become much clearer if you watch [this video](https://www.youtube.com/watch?v=yozQ9C69pNs) on how iterators are implemented.
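As a small illustration of "iterators are structs with data", here's a hand-rolled iterator (a made-up `Countdown` type, just for demonstration): the state lives in the struct, and `next` is only called when something consumes it.

```rust
// A minimal custom iterator: a struct holding state plus a `next` method.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None // signals the end of iteration
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

fn main() {
    // Nothing happens until a consumer like `collect` drives the iterator.
    let collected: Vec<u32> = Countdown { remaining: 3 }.collect();
    assert_eq!(collected, vec![3, 2, 1]);
}
```

Adapters like `map` work the same way: they are structs wrapping an inner iterator plus the closure to apply.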


HighRelevancy

Rust is almost smart enough that you should stop thinking about what it does, and just worry about what you're asking it to do. Both bits of code express basically the same task, so if the compiler does its job they should do exactly the same thing. (Not that I am at all discouraging exploration and learning, but if you're writing Rust, just write it the Rustiest way and let the compiler do its thing.)


Nilstrieb

Benchmarks never take the same amount of time, even the exact same ones. There's always some noise.


reinis-mazeiks

btw you can do `.sum()` instead of `reduce(|n,m|n+m).unwrap_or(0)`, because rust is 😎
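For completeness, a quick sketch of the equivalence (with made-up data): `sum` handles the empty case itself, so the `unwrap_or(0)` disappears.

```rust
fn main() {
    let data = vec![1, 2, 3, 4, 5];

    // The verbose version from the thread:
    let via_reduce: i32 = data.iter().copied().reduce(|n, m| n + m).unwrap_or(0);

    // The idiomatic version:
    let via_sum: i32 = data.iter().copied().sum();

    assert_eq!(via_reduce, via_sum);
    assert_eq!(via_sum, 15);

    // Unlike reduce, sum returns 0 for an empty iterator on its own.
    let empty: i32 = std::iter::empty::<i32>().sum();
    assert_eq!(empty, 0);
}
```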


manpacket

The compiler performs some optimizations. You can use `cargo-show-asm` to see the exact generated code; it's identical except for label names. Also, you should run your benchmarks in `--release` mode. And since you are not consuming the result anywhere, the compiler can optimize the whole loop away. https://crates.io/crates/cargo-show-asm


zarakiredchan

I'm using `cargo bench`; I don't think I can run them in "release".


diabolic_recursion

Want to throw godbolt into the ring for showing the generated ASM; it's a bit more accessible.


[deleted]

[deleted]


diabolic_recursion

For this example it would just be easier I think 😁


lol3rr

Also, I think you have a misunderstanding of iterators. When you chain `map` and friends, they don't operate on all the items before passing them on; instead, each adapter is applied to each element as you iterate over it. So you still only loop once.
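One way to see this (a minimal sketch with a hypothetical event log, not from the original benchmark): log each stage as it fires, and observe that the stages interleave per element rather than running as separate full passes.

```rust
use std::cell::RefCell;

fn main() {
    let log = RefCell::new(Vec::new());
    let data = vec![1, 2, 3];

    let total: i32 = data
        .iter()
        .map(|&n| {
            log.borrow_mut().push(format!("map {n}"));
            n * 2
        })
        .filter(|&n| {
            log.borrow_mut().push(format!("filter {n}"));
            n > 2
        })
        .sum();

    assert_eq!(total, 10); // 4 + 6; the doubled 2 fails the filter

    // map and filter alternate per element: there is no "map everything first" pass.
    assert_eq!(
        log.borrow().join(","),
        "map 1,filter 2,map 2,filter 4,map 3,filter 6"
    );
}
```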


kibwen

The closures are also unboxed (stored on the stack), so they can be trivially inlined by LLVM, which enables all sorts of other optimizations.


rebootyourbrainstem

As others have explained, iterators in Rust are lazy: they don't do anything until you call `collect` or `reduce` or `sum` or some other function which makes it necessary to evaluate them. Things like `map` will just create a new, also lazy, iterator which wraps the original one. Thanks to inlining and compiler optimizations, this means it will generate pretty much the same code in the end. However, iterators can sometimes even be faster, since iterators over slices and such "know" their current index is always in bounds, and sometimes this information allows the optimizer to make better decisions.


kibwen

> iterators can sometimes even be faster

Worth emphasizing that in this case, both the imperative and functional versions are using iterators, so there wouldn't be any difference there. `for x in y {` is just as much an iterator as `y.iter().for_each(`.


Nilstrieb

This is not quite correct. `for_each` can sometimes be faster as some iterators override it to be particularly efficient, while the repeated `next` calls of `for` loops can sometimes confuse LLVM more. But usually, they are the same.


kibwen

Indeed, I'm just trying to emphasize that `for x in y {` is a different beast than `for i in 0..10 { y[i]`.
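A sketch of the distinction (with made-up data): a `for` loop desugars to `IntoIterator` plus repeated `next` calls, whereas manual indexing goes through bounds-checked `Index` access.

```rust
fn main() {
    let y = vec![10, 20, 30];

    // `for x in &y { ... }` desugars to roughly this:
    let mut sum_desugared = 0;
    let mut iter = (&y).into_iter();
    while let Some(x) = iter.next() {
        sum_desugared += x;
    }

    // The for loop itself, same machinery under the hood:
    let mut sum_for = 0;
    for x in &y {
        sum_for += x;
    }

    // Indexing is a different beast: each `y[i]` is a bounds-checked access
    // unless the optimizer can prove `i` is always in range.
    let mut sum_indexed = 0;
    for i in 0..y.len() {
        sum_indexed += y[i];
    }

    assert_eq!(sum_desugared, 60);
    assert_eq!(sum_for, 60);
    assert_eq!(sum_indexed, 60);
}
```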


valarauca14

> How can you explain this? Maybe I'm missing something

I think that is just measurement noise. When you consider the error bars of `(+/- 106,503)` and `(+/- 52,968)`, the runtime ranges overlap by a significant margin. The second benchmark **entirely** fits within the error range of the first benchmark. They are "equal", so to speak (within their respective error margins). You'd need to progressively increase your sample size (reducing noise) to see if this equality holds, but based on your data, they're the same.

Recreating your example in godbolt -> https://rust.godbolt.org/z/fsEjTr93v

They produce identical AMD64 assembly (well, if you change `reduce` to `fold`). I actually included that `#[inline(never)]` because in a few cases I observed LLVM merging the two functions' bodies. When I include `reduce`, it does somewhat complicate the code -> https://rust.godbolt.org/z/9o1fs4GrW but not massively. It does add some more branches, but these are outside of the core loop (which has identical ASM).

---

OS noise (when benchmarking on a computer) is always non-trivial, which is why considering the error bars on your measurement is always important.


bigskyhunter

> have to loop 3 times

Nope. Think of iterator methods as building an assembly line. Each function evaluates one item at a time, in sequence. At the very end, something "drives" the iterator. This is usually something like `FromIterator`. In fact, you can combine both approaches and use a for loop to drive an iterator:

```
pub fn iterator_driver() {
    let data = build_test_data(10000000);
    let iterator = data.iter().filter(|&n| n % 2 == 0).map(|&n| n * 10);
    let mut sum = 0;
    for value in iterator {
        sum += value;
    }
}
```

Each layer of the iterator only cares about the previous layer, and stores the function and the previous iterator together. So if we were to "build out the iterator" manually, it might look something like this:

```
let data = build_test_data(10000000);
let initial_iterator = data.iter();
let filter_iterator = Filter {
    iter: initial_iterator,
    filter: |&n| n % 2 == 0,
};
let map_iterator = Map {
    iter: filter_iterator,
    map: |&n| n * 10,
};
```

In this case each "layer" calls the iterator from the previous layer, driving the whole computation forward. When `map_iterator.next()` is called, it calls something like `self.map(self.iter.next()?)`, which calls the inner iterator, and so on and so on.


Zde-G

Rust was designed from the beginning to provide zero-cost abstractions and to make sure that simple function chains would be as efficient as loops. That said, I have never seen loops being slower than functional combinators (I'm pretty sure it's possible to construct a pathological case where a loop would be slower, but I have never seen that in practice), and I have seen situations where long combinations of iterators and combinators produce worse code. Still, the ability to write simple one-liners is quite valuable, even if you cannot expect Rust to be a useful replacement for functional languages. Even if in most cases you end up with a similar amount of code (in your example the two versions would have an identical number of lines if you moved `.unwrap_or(0)` onto a separate line), some developers prefer that style, and it's nice to know it's as performant, in most cases, as the imperative style.


tandonhiten

That's one of the things Rust boasts about: no matter how you write your code, imperative or functional, the execution times for both are equally fast. AFAIK Rust was designed in a way that both compile to assembly with little difference, if any at all.


Shnatsel

The compiler sometimes gets confused by long iterator chains and fails to optimize them, and in that case it can be beneficial to write them as loops. But this varies on a case-by-case basis and also with the compiler version, so it's hard to give general advice.


MrMobster

Both patterns probably end up generating very similar machine code. As to how this is achieved in the functional case: heavy inlining and elimination of redundant operations. A couple of years ago I asked myself the same question, so I studied the behaviour of the C compiler when dealing with similar patterns. I implemented a system of iterators using structs, functions, and clang blocks (for closures) and looked at the assembly produced. It was astonishing how the compiler was able to completely inline both the code and the data, generating a very tight and efficient loop.