The Wayback Machine - https://web.archive.org/web/20230119203048/https://github.com/ruby/ruby/pull/4561
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document possible pitfall in each_with_object iteration #4561

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ianfixes
Copy link

@ianfixes ianfixes commented Jun 9, 2021

This patch only updates code comments (for documentation purposes), not code.

Consider the following code:

["a", "b", "c"].each_with_object([]) do |x, acc|
  if x == "b"
    acc = ["potatoes"]
  else
    acc << "#{x}-ok"
  end
end

This returns ["a-ok", "c-ok"], although ["potatoes", "c-ok"] might be expected. This is infuriating to debug, because all indications are that the assignment was successful.

The proper approach is to avoid assignment to the memo_obj at all costs.

["a", "b", "c"].each_with_object([]) do |x, acc|
  if x == "b"
    acc.replace(["potatoes"])
  else
    acc << "#{x}-ok"
  end
end

I've updated the documentation to mirror that of inject, which offers more clarity into the nature of the object being passed in:

ruby/enum.c

Lines 877 to 883 in 9210f8d

* If you specify a block, then for each element in <i>enum</i>
* the block is passed an accumulator value (<i>memo</i>) and the element.
* If you specify a symbol instead, then each element in the collection
* will be passed to the named method of <i>memo</i>.
* In either case, the result becomes the new value for <i>memo</i>.
* At the end of the iteration, the final value of <i>memo</i> is the
* return value for the method.

Consider the following code:

```ruby
["a", "b", "c"].each_with_object([]) do |x, acc|
  if x == "b"
    acc = ["potatoes"]
  else
    acc << "#{x}-ok"
  end
end
```

This returns `["a-ok", "c-ok"]`, although `["potatoes", "c-ok"]` might be expected.  This is infuriating to debug, because all indications are that the assignment was successful.

The proper approach is to avoid assignment to the `memo_obj` at all costs.
```ruby
["a", "b", "c"].each_with_object([]) do |x, acc|
  if x == "b"
    acc.replace(["potatoes"])
  else
    acc << "#{x}-ok"
  end
end
```
@ianfixes
Copy link
Author

ianfixes commented Jun 9, 2021

@jeremyevans I see you have reviewed other contributions to documentation. It looks like I can't request a review from you on this PR via the interface.

enum.c Outdated Show resolved Hide resolved
@duerst
Copy link
Member

duerst commented Jun 9, 2021

I'd actually support making it clearer that only changes to the object, not assignment to the variable, have lasting effects. The reason is that there are similar methods (the most prominent being inject/reduce) where assignments actually have an effect. So making the reader aware of the fact that there is a difference seems important.

@nobu
Copy link
Member

nobu commented Jun 10, 2021

The reason is that there are similar methods (the most prominent being inject/reduce) where assignments actually have an effect.

Assignments to argument(s) can never have effects outside the block.
You may have placed an assignment at the end of block?

In general, that assignment in a local scope has no effect outside is elementary and very common in modern programing languages, except for special cases such as assignment operators in C++.
It may be worth to be described somewhere (doc/syntax/assignment.rdoc ?) for beginners.

@ianfixes
Copy link
Author

ianfixes commented Jun 10, 2021

There are 2 reasons I refer to documentation:

  1. to gain an initial understanding of what an object is capable of
  2. because something is not working the way I expect and I'm having difficulty figuring out exactly where the problem might be

This was an example of that second case. I've been programming in Ruby for about a decade, and in this instance -- against the backdrop of the code in context -- the "beginner error" that I'm making wasn't obvious. (It is, of course, much easier to see in hindsight, especially when posted in minimal form.)

But the question is worth asking: is there a reason that Enumerable's documentation (of all things) shouldn't be more accessible to a beginner audience? It seems like something a beginner would run into fairly quickly.

There is every reason to close this bug with the comment "well, Ian is just an idiot sometimes and he shouldn't be". But I opened this PR to express the manner in which I found the documentation lacking.

@meinac
Copy link

meinac commented Jun 11, 2021

But the question is worth asking: is there a reason that Enumerable's documentation (of all things) shouldn't be more accessible to a beginner audience? It seems like something a beginner would run into fairly quickly.

@ianfixes - This is an elementary level information about the assignments which should be covered in another place if hasn't already been as this behavior is nothing different than just local variable assignment, like so;

def foo(var)
  var = :zoo
end

var = :bar
foo(var)

puts var # bar

@ianfixes
Copy link
Author

Respectfully, I think you are missing my point.

We agree that there is no need to put beginner-level instruction on the nature of assignment in ruby into the documentation of each_with_object.

What I'm suggesting is that the documentation cover a pitfall which is specific to this function: if you need to change the value of the "arbitrary object", be sure to use an object method (like .replace) and not an assignment operator.

@ianfixes ianfixes changed the title Document unexpected behavior in each_with_object iteration Document possible pitfall in each_with_object iteration Jun 11, 2021
Reword the function documentation to add clarity while avoiding giving away any hints about how assignment in ruby affects the use of this function in practice
@nobu
Copy link
Member

nobu commented Jun 13, 2021

What I'm suggesting is that the documentation cover a pitfall which is specific to this function: if you need to change the value of the "arbitrary object", be sure to use an object method (like .replace) and not an assignment operator.

I think the others consider that it is a part of “the nature of assignment” and not specific to this method.

@ianfixes
Copy link
Author

@jeremyevans what do you think of the updated text?

@jeremyevans
Copy link
Contributor

@jeremyevans what do you think of the updated text?

I think it's a bit long. A bigger issue I see is with the call_seq, which doesn't make it obvious that the argument passed to the method is the same as the second argument passed to the block.

Maybe:

 *  call-seq:
 *    enum.each_with_object(obj) { |element, obj| ... }  ->  obj
 *    enum.each_with_object(obj)                         ->  an_enumerator
 *
 *  For each element in <i>enum</i>, yield the element and the
 *  argument to the method. Returns the argument.

enum.c Outdated Show resolved Hide resolved
@ianfixes
Copy link
Author

ianfixes commented Jun 14, 2021

I think the others consider that it is a part of “the nature of assignment” and not specific to this method.

Again, respectfully, you are talking past me.

When I read the Enumerable documentation, all of the methods that take a block fall into one of 3 categories:

How the block affects the method's return value Methods
No effect cycle, each_cons, each_entry, each_slice, each_with_index, reverse_each, zip
The block's return value informs the method's behavior all?, any?, chunk, chunk_while, collect, collect_concat, count, detect, drop_while, find, find_all, find_index, flat_map, grep, grep_v, group_by, inject, map, max, max_by, min, min_by, minmax, minmax_by, none?, one?, partition, reduce, reject, select, slice_after, slice_before, slice_when, sort, sort_by, sum, take_while, uniq
A side effect on one of the block's arguments informs the method's behavior each_with_object

Would you agree that one of these things is not like the others?

Co-authored-by: Marivaldo Cavalheiro <[email protected]>
@ianfixes
Copy link
Author

 *    enum.each_with_object(obj) { |element, obj| ... }  ->  obj

I considered doing this, but thought that it would be confusing (in the sense that it looks like shadowing). Can you confirm if that's OK here?

Let me see about shortening the text.

@jeremyevans
Copy link
Contributor

 *    enum.each_with_object(obj) { |element, obj| ... }  ->  obj

I considered doing this, but thought that it would be confusing (in the sense that it looks like shadowing). Can you confirm if that's OK here?

I think in this case it is fine, and properly shows that the method parameter, second block argument, and return argument are all the same object. However, other committers may have different opinions.

I think @nobu's point is not about each_with_object versus other enumerable methods. It's about your issue being caused by the nature of assignment/scope in general:

def foo(bar)
  bar = 2
end
bar = 3
foo(bar)
bar # 3, not 2

You are stating this is a pitfall specific to each_with_object. I disagree with that. I agree with @nobu that it is not specific to each_with_object, it's just how assignment/scope work in Ruby, as shown in the above example.

@ianfixes
Copy link
Author

Might as well show all my work. I'm working from the existing documentation for inject:

inject(initial) { |memo, obj| block } → obj

If you specify a block, then for each element in enum the block is passed an accumulator value (memo) and the element. If you specify a symbol instead, then each element in the collection will be passed to the named method of memo. In either case, the result becomes the new value for memo. At the end of the iteration, the final value of memo is the return value for the method.

This documentation uses obj twice, but one is the element and one is the return value -- I'd rather not do the same thing here.
Removing the part about symbols (which doesn't apply to each_with_object) and tweaking the existing variable names, I get this:

each_with_object(obj) { |(*args), memo_obj| ... } -> obj
each_with_object(initial) { |(*args), initial_obj| ... } -> initial

For each element in enum, the block is passed the element (*args) and initial as an accumulator value (initial_obj). At the end of the iteration, the final value of initial is the return value for the method.

Is that better? I feel like inject and each_with_object have enough in common that the descriptions should contain similar language, is that the wrong starting point?

@jeremyevans
Copy link
Contributor

Might as well show all my work. I'm working from the existing documentation for inject:

inject(initial) { |memo, obj| block } → obj
If you specify a block, then for each element in enum the block is passed an accumulator value (memo) and the element. If you specify a symbol instead, then each element in the collection will be passed to the named method of memo. In either case, the result becomes the new value for memo. At the end of the iteration, the final value of memo is the return value for the method.

This documentation uses obj twice, but one is the element and one is the return value -- I'd rather not do the same thing here.

The reason this is bad in inject is that the block argument obj and the return value obj represent two different objects. I would change the block argument from obj to element, but that's out of scope for this discussion on each_with_object.

Removing the part about symbols (which doesn't apply to each_with_object) and tweaking the existing variable names, I get this:

each_with_object(obj) { |(*args), memo_obj| ... } -> obj
each_with_object(initial) { |(*args), initial_obj| ... } -> initial
For each element in enum, the block is passed the element (*args) and initial as an accumulator value (initial_obj). At the end of the iteration, the final value of initial is the return value for the method.

Is that better? I feel like inject and each_with_object have enough in common that the descriptions should contain similar language, is that the wrong starting point?

No, I don't think that is better. It misses the primary point of my recommendation, which is to reflect in the call-seq that the object passed to the method is yielded as the second argument to the block and also the return value.

I don't think inject and each_with_index are sufficiently similar such that the documentation of one is required to be based on the other.

* <i>obj</i>) and the element. At the end of the iteration, <i>obj</i>
* is returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of block parameters is reverse of inject.

Suggested change
* <i>obj</i>) and the element. At the end of the iteration, <i>obj</i>
* is returned.
* <i>obj</i>) in addition to the element. At the end of the iteration,
* <i>obj</i> is returned.

@ianfixes
Copy link
Author

ianfixes commented Jun 29, 2021

It sounds like 2 maintainer-acceptable options are being proposed for this text:

@jeremyevans version

enum.each_with_object(obj) { |(*args), obj| ... } -> obj
enum.each_with_object(obj) -> an_enumerator
For each element in enum, the block is passed the element (*args) and an accumulator value (obj). At the end of the iteration, the final value of obj is the return value for the method.

vs

@nobu version

enum.each_with_object(obj) { |(*args), memo_obj| ... } -> obj
enum.each_with_object(obj) -> an_enumerator
For each element in enum, the block is passed the element and an accumulator value (memo_obj), which is a reference to the initially given object (obj). At the end of the iteration, obj is returned

Which one should I go with?

@ianfixes
Copy link
Author

ianfixes commented Jul 6, 2021

Bumping this, please let me know if I should wait more than a week before bumping it again

@hsbt hsbt added the Documentation Improvements to documentation. label Jan 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Improvements to documentation.
7 participants