Skip to content

Adding indenttabchar, indentspacechar and spacechar options #3760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Neko-Box-Coder
Copy link
Contributor

@Neko-Box-Coder Neko-Box-Coder commented May 24, 2025

This PR added the options of indentspacechar and spacechar and moved indentchar to indenttabchar to avoid confusion.

indenttabchar will display the specified character when \t is encountered.

indentspacechar will display the specified character when ' ' is encountered at tabsize locations

spacechar will display the specified character when ' ' is encountered, but indentspacechar takes precedence.

Created this PR because I wanted the feature to view files with deeply nested blocks.
Gonna merge it to my own fork https://github.com/Neko-Box-Coder/micro-dev for whoever needs it while waiting for this to be merged.

This should address long-standing requests of #2539 , #2707 and #431

I am happy to revert the prompt message of telling user indentchar is deprecated, if people/maintainer finds it controversial.

@Neko-Box-Coder Neko-Box-Coder force-pushed the MoreCharOptions branch 4 times, most recently from 22c9b6f to f2c18b2 Compare May 25, 2025 00:10
default value: ` ` (space)

* `indentspacechar`: same as `indenttabchar` except for spaces that are at
locations that are divisible by `tabsize`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if tabsize doesn't match the actual indentation used in the given file (which happens more often than not)? If my detectindent plugin is used, it tries to detect the actual indentation width and adjusts tabsize accordingly, but this plugin is not even built-in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what the problem is here.

The user (or any plugins) can always change the tabsize and the indentspacechar will be displayed accordingly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the default behavior is confusing. For example, the default value of tabsize is 4, the user edits a file with 2-space indentation and doesn't bother changing the tabsize value (because it doesn't matter to him), then the user sets this indentspacechar option to highlight indentation, but only half of the indentation is highlighted.

I have no better suggestions right now, so I'm fine with it.

setting available in many other text editors. The color of this character is
determined by the `indent-char` field in the current theme rather than the
default text color.
* `indenttabchar`: sets the indentation character for tabs. This will not be
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "indentation character for tabs"? AFAICS indenttabchar does exactly what indentchar currently does, i.e. simply displays all tabs, so why not call it just tabchar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess technically they are the same? Since the tab character is always (displayed) aligned to tabsize?

I can change it to tabsize, I named it this ways just (trying) to be consistent to what it was called previously.

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 1, 2025

Do we need to deprecate indentchar? I'd rather fix it to do something closer to what its documentation says, instead of what it does now (i.e. simply displays all tab characters regardless of indentation).

It feels that by adding that many new options we are creating even more mess and confusion.

Why not just add tabchar which simply displays all tabs (i.e. what indentchar does now), spacechar which simply displays all spaces, and repurpose indentchar, for example, to display all tabs and all spaces at the position of tabs (i.e. divisible by tabsize)? And for example let tabchar (if set to any other value than ) take precedence over indentchar, so that if you want different characters for tabs and for spaces at the position of tabs, you can set tabchar for the former and indentchar for the latter?

BTW another request in #431 (comment) was to display other non-printable characters, which is quite reasonable, so we might add another option e.g. invisiblechar for that.

@Neko-Box-Coder
Copy link
Contributor Author

Do we need to deprecate indentchar? I'd rather fix it to do something closer to what its documentation says, instead of what it does now (i.e. simply displays all tab characters regardless of indentation).

It feels that by adding that many new options we are creating even more mess and confusion.

Why not just add tabchar which simply displays all tabs (i.e. what indentchar does now), spacechar which simply displays all spaces, and repurpose indentchar, for example, to display all tabs and all spaces at the position of tabs (i.e. divisible by tabsize)? And for example let tabchar (if set to any other value than ) take precedence over indentchar, so that if you want different characters for tabs and for spaces at the position of tabs, you can set tabchar for the former and indentchar for the latter?

Repurposing a setting would be confusing since the user is not expecting it unless they read the documentation.

I guess if we want to truly do what the name says, we can have a setting indenttabchar for all the tab characters before the first letter in a line and another setting tabchar for all the tabs after the first character, similar to how indentspacechar is handled.

BTW another request in #431 (comment) was to display other non-printable characters, which is quite reasonable, so we might add another option e.g. invisiblechar for that.

Okay, I will add that as well.

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 1, 2025

Repurposing a setting would be confusing since the user is not expecting it unless they read the documentation.

It would be "backward compatible" in a sense: tabs would be still highlighted.

Spaces at the position of tabs would become highlighted as well, but if the user changed the default value of indentchar in the first place, quite likely the user actually wants that?

I guess if we want to truly do what the name says, we can have a setting indenttabchar for all the tab characters before the first letter in a line and another setting tabchar for all the tabs after the first character, similar to how indentspacechar is handled.

Let's not overcomplicate without real need.

@Neko-Box-Coder
Copy link
Contributor Author

Repurposing a setting would be confusing since the user is not expecting it unless they read the documentation.

It would be "backward compatible" in a sense: tabs would be still highlighted.

Spaces at the position of tabs would become highlighted as well, but if the user changed the default value of indentchar in the first place, quite likely the user actually wants that?

Okay, I see. But if tabchar is "overriding" indentchar while spacechar isn't, wouldn't it make more sense to call it indenttabchar? And this hidden behavior (where tabchar is "overriding" indentchar) is no longer hidden?

I am fine to do it either way tbh. Just a question.

The reason I deprecated indentchar other than clarity is because I don't need to check multiple settings for the same character (tabchar vs indentchar if both are populated) each time, makes it a bit easier to follow the logic in code as well.

Another reason is what if the user actually wants space (the default) instead of indentchar, we wouldn't be able to tell as well.

I guess if we want to truly do what the name says, we can have a setting indenttabchar for all the tab characters before the first letter in a line and another setting tabchar for all the tabs after the first character, similar to how indentspacechar is handled.

Let's not overcomplicate without real need.

Okay.

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 1, 2025

But if tabchar is "overriding" indentchar while spacechar isn't, wouldn't it make more sense to call it indenttabchar?

Its primary role is to simply specify which character to use to visualize tabs. It is not primarily about indentation. What do we gain by making its name more verbose?

And this hidden behavior (where tabchar is "overriding" indentchar)

It doesn't need to be hidden, it can (and probably should) be documented as well.

@Neko-Box-Coder
Copy link
Contributor Author

Neko-Box-Coder commented Jun 1, 2025

But if tabchar is "overriding" indentchar while spacechar isn't, wouldn't it make more sense to call it indenttabchar?

Its primary role is to simply specify which character to use to visualize tabs. It is not primarily about indentation.

I know, but that's what it was called.

That's why I said maybe we could have 2 settings for tabs, 1 act like indentspacechar and the other one act like spacechar.

... What do we gain by making its name more verbose?

To be less confusing. The previous behavior of indentchar is only for tabs. The user updates to latest micro and found out indentchar is now applied to indented space as well. If they want to change or disable it, they will have to unintuitively change the tabchar option (remember, they want to change indented space now) or mistakely think spacechar is the option they are looking for.

And this hidden behavior (where tabchar is "overriding" indentchar)

It doesn't need to be hidden, it can (and probably should) be documented as well.

I would prefer the behavior to be mentioned in the name rather that requiring the user to read the documentation (although they should be to be fair).

This is gonna be my last attempt to see if we can stick to the current proposed names.
If you still disagree, I can change it.

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 1, 2025

Simple question: do we want an option that allows the user to simply display all tabs (no matter at the beginning, at the end or in the middle of a line)?

If the answer is yes, why not call this option tabchar? Wouldn't that be the most natural and intuitive name? If we name it indenttabchar instead, wouldn't that be confusing?

If what we want instead (or in addition) is an option that only displays tabs at the beginning of lines, then yes, it would make sense to name it indenttabchar. But that is not what your PR is implementing.

@Neko-Box-Coder
Copy link
Contributor Author

Neko-Box-Coder commented Jun 1, 2025

Simple question: do we want an option that allows the user to simply display all tabs (no matter at the beginning, at the end or in the middle of a line)?

If the answer is yes, why not call this option tabchar? Wouldn't that be the most natural and intuitive name? If we name it indenttabchar instead, wouldn't that be confusing?

That's what it does currently, both indentchar and this PR's indenttabchar. The reason I keep the indent name is to be consistent with previous name, even though technically it acts like spacechar

If what we want instead (or in addition) is an option that only displays tabs at the beginning of lines, then yes, it would make sense to name it indenttabchar. But that is not what your PR is implementing.

That's why I was asking maybe if I should have indenttabchar act like indentspacechar and tabchar that is used for in the middle of a sentence. I can implement this instead But as you said it introduces complexity.

Actually, maybe it is better if I rename the new options to the following:

  • tabchar which will replace (and act like) current indentchar
  • indentspacechar which highlights spaces that are in tabsize at the beginning of the sentence
  • and spacechar for the rest of spaces
  • and invisiblechar for non-displayable characters (todo)

Do you think this is better?

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 1, 2025

The reason I keep the indent name is to be consistent with previous name

What's the point of being consistent with it if you are deprecating it?

That's why I was asking maybe if I should have indenttabchar act like indentspacechar and tabchar that is used for in the middle of a sentence. I can implement this instead But as you said it introduces complexity.

I'm not suggesting to implement it, I'm saying that if we wanted to implement it, then indenttabchar would be a suitable name for it.

Actually, maybe it is better if I rename the new options to the following:

[...]

  • indentspacechar which highlights spaces that are in tabsize at the beginning of the sentence

Ok, I've just noticed that you already have bloc.X < leadingwsEnd for indentspacechar...

Question (not a rhetorical one): are you going to use tabchar and indentspacechar with different values, or with the same value?

@Neko-Box-Coder
Copy link
Contributor Author

Actually, maybe it is better if I rename the new options to the following:
[...]

  • indentspacechar which highlights spaces that are in tabsize at the beginning of the sentence

Ok, I've just noticed that you already have bloc.X < leadingwsEnd for indentspacechar...

Question (not a rhetorical one): are you going to use tabchar and indentspacechar with different values, or with the same value?

Yeah, I am, so that I can tell any hidden tabs in space-indented files.

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 1, 2025

so that I can tell any hidden tabs in space-indented files

For that purpose, wouldn't hltaberrors be more useful?

@Neko-Box-Coder
Copy link
Contributor Author

Neko-Box-Coder commented Jun 1, 2025

so that I can tell any hidden tabs in space-indented files

For that purpose, wouldn't hltaberrors be more useful?

True. I should probably turn that on as well. But they will still be different values, since I use indentchar to "see" tab characters (including ones that are in the middle of a sentence in tab indented file), and indentspacechar as indented scopes.

The main thing here is we are mixing the ability to see tab character and indented blocks into a single option (indentchar), which now we are trying to separate.

If we reuse indentchar, this will certainly be confusing for people who are using indentchar just to see tab character, which now becomes tabchar.

If we are reusing indentchar, I think we should at least have a new version/breaking changes message or something when the user upgraded micro, similar to how lazygit did it when they introduced a breaking change. I really think just mentioning it in the documentation is not enough.

I think it is good to have a new version/breaking changes message anyway for future breaking changes if any.

Also if we need to account for a special case where tabchar can take precedence over indentchar when it is ' ', why not spacechar as well?


Or we can avoid all of this headache by just having 4 options, spacechar, indentspacechar, tabchar and indenttabchar, and don't need to introduce breaking/confusing changes and special cases. idk ¯\_(ツ)_/¯

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 2, 2025

To be clear, when I suggested reusing indentchar, I thought your intention was to extend the definition of "indentation character" to all blank characters with X % tabsize == 0, not just those with X < leadingWsEnd (because I hadn't noticed your bloc.X < leadingwsEnd for indentspacechar, and because you confusingly didn't add it for indenttabchar).

I think it is good to have a new version/breaking changes message anyway for future breaking changes if any.

This is a UI change, not an API change. The user is gonna notice it anyway.

We've done such "breaking" UI changes before. It is less breaking than #3335, for example.

Also if we need to account for a special case where tabchar can take precedence over indentchar when it is ' ', why not spacechar as well?

Because with the "extended" definition of indentation character (see my confusion above), any tab is an indent char, but not any space is...

Anyway, screw that, yes, it would be messy. Now I think it's better to let indentchar take precedence over both spacechar and tabchar. So that, tabs at the end of a line are displayed as tabchar, but tabs (and spaces) at the beginning of a line are displayed as indentchar (unless it is set to ' '). That would mean we can't set different characters for indentation tabs vs indentation spaces, but it seems there are no use cases for that?

Or we can avoid all of this headache by just having 4 options, spacechar, indentspacechar, tabchar and indenttabchar

I'm almost fine with that, but... Think of what would be the most useful and pleasant behavior for the user. The user wants to see "indentation columns", displayed with the | character, for example. It seems friendlier to let the user simply set indentchar to | for that, instead of figuring out that it needs to be set separately for tabs and for spaces?

On the other hand... I've recalled once again that it's not that simple, since tabwidth may not match the actual indentation width with spaces. So yeah... for this reason it seems better to have separate options indenttabchar and indentspacechar, and in the documentation for indentspacechar emphasize that this option is tricky and the result will only be desirable if tabwidth is set appropriately...

So, right now I'm inclined to agree that having 4 options spacechar, indentspacechar, tabchar and indenttabchar would be best...

and don't need to introduce breaking/confusing changes and special cases

It still doesn't answer what to do with indentchar.

I don't think silently removing it from the documentation (what your PR seems to be doing) is an option. It seems we should just keep it, and say in its documentation that it is just an alias for tabchar, kept for backward compatibility.

@JoeKar
Copy link
Collaborator

JoeKar commented Jun 2, 2025

So, right now I'm inclined to agree that having 4 options spacechar, indentspacechar, tabchar and indenttabchar would be best...

Agree, but while we are at it...the eolchar (see: #431 (comment)), the already mentioned invchar and maybe even the wrapchar.

@Andriamanitra
Copy link
Contributor

So, right now I'm inclined to agree that having 4 options spacechar, indentspacechar, tabchar and indenttabchar would be best...

Agree, but while we are at it...the eolchar (see: #431 (comment)), the already mentioned invchar and maybe even the wrapchar.

If we're going to keep adding more of these maybe grouping them together under one setting would be better for discoverability? Similar to what vim does with listchars.

@JoeKar
Copy link
Collaborator

JoeKar commented Jun 2, 2025

Similar to what vim does with listchars.

Hm, good idea.
micro does something more "lightweight" with the divchars already, but this would be ugly to configure for more than two options.

@Neko-Box-Coder
Copy link
Contributor Author

Neko-Box-Coder commented Jun 2, 2025

Similar to what vim does with listchars.

Hm, good idea. micro does something more "lightweight" with the divchars already, but this would be ugly to configure for more than two options.

I will need an example in order to implement grouping for the options.

I am not following how divchars are grouped?.


I did say this PR addresses the issues but doesn't mean I am solving all of them completely (for instance #431 (comment) is asking for displaying a hex code, that's not gonna be done in this PR). I am open to ideas but don't want to get feature-creeped here 😅

I will try to implement the ideas mentioned here if relatively straightforward, if not it's better to have a dedicated PR for those features.

The main goal for this PR (to me) is to have configurable indent characters for both space and tab indented files.

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 2, 2025

If we're going to keep adding more of these maybe grouping them together under one setting would be better for discoverability? Similar to what vim does with listchars.

Good point. Originally I was mildly opposed to doing something similar to vim's listchars (we don't need to exactly follow what vim does), but given that it would help us to avoid this "option creep", it seems attractive.

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 2, 2025

I will need an example in order to implement grouping for the options.

Something like tab=>,space=.,itab=|,ispace=| ?

I'm not sure what would be the name of this generic option though. listchars is definitely not the most intuitive one.

@Neko-Box-Coder
Copy link
Contributor Author

I'm not sure what would be the name of this generic option though. listchars is definitely not the most intuitive one.

formatchars (welcome for any other suggestions)? Since space and tabs are used for formatting. But in this case newline (eolchar) would be in this option as well. invchar and wrapchar not so sure.

Something like tab=>,space=.,itab=|,ispace=| ?

I see. In this case, I will need to store these options as global variables (since we don't want to do string parsing in every display call) instead of getting it from the settings map.

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 3, 2025

since we don't want to do string parsing in every display call

Hah, you are sure it would be slower than the settings map lookup? Especially considering how many times we do the settings map lookup in every display call? :D

We are extremely wasteful there. See #3227 and #3228 (and #2970 for the background)

(Just in case, I'm not saying we should not store those parsed settings.)

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 3, 2025

specialchars maybe?

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 3, 2025

for instance #431 (comment) is asking for displaying a hex code, that's not gonna be done in this PR

Ah, I misread that comment, I thought it was just about displaying non-printable characters with some character... And I forgot that micro already does that, since a long time: 51ab8f9 + b793e9b

@JoeKar
Copy link
Collaborator

JoeKar commented Jun 3, 2025

specialchars maybe?

Maybe or symbolchars?

for instance #431 (comment) is asking for displaying a hex code, that's not gonna be done in this PR

Yes, that is ok, since we already show a replacement for non-displayable characters. Transforming them into their Unicode HEX would require more work unrelated to that.
So let us continue with the idea of grouping them into one string before we add four additional new options. The list within that one new option can be extended easily afterwards.
tab=>,space=.,itab=|,ispace=| is already a good option to start with. If the value for one isn't given then the default shall take place.

@Neko-Box-Coder
Copy link
Contributor Author

Neko-Box-Coder commented Jun 3, 2025

Hah, you are sure it would be slower than the settings map lookup? Especially considering how many times we do the settings map lookup in every display call? :D

No, I meant parsing the value of tab=>,space=.,itab=|,ispace=| into individual options everytime when we do displayBuffer() is inefficient compared to reading a non-grouped settings map like what I have now in the PR.

specialchars maybe?

Maybe or symbolchars?
...
tab=>,space=.,itab=|,ispace=| is already a good option to start with

Right, I thought we were only grouping the indent + tab/space chars, or are we?

If we are trying to group other characters as well like eolchar or wrapchar, I guess displaychars makes sense?
Since we are displaying a character to replace a specific character or condition and I think the name of the option should explain (kinda) what it does.

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 3, 2025

No, I meant parsing the value of tab=>,space=.,itab=|,ispace=| into individual options everytime when we do displayBuffer() is inefficient compared to reading a non-grouped settings map

Well, it may slower than a single lookup of the settings map, but my point what that it is a drop in the ocean compared to the rest of the crap we do in every displayBuffer() call.

Anyway, nevermind.

Right, I thought we were only grouping the indent + tab/space chars, or are we?

The idea is that this option can be extensible.

I guess displaychars makes sense?
Since we are displaying a character to replace a specific character or condition

These are not the only characters that we are displaying. We are displaying other, regular characters as well. It's not like we are hiding them, right?

@Neko-Box-Coder
Copy link
Contributor Author

Neko-Box-Coder commented Jun 3, 2025

These are not the only characters that we are displaying. We are displaying other, regular characters as well. It's not like we are hiding them, right?

I mean, we are hiding newlines (sort of) at the moment I guess?
Edit: I get what you meant now, misread your reply.

Anyway, my intent for displaychar is we are "displaying" these chars in places of these conditions.

This is one of the (few?) times where chatgpt is useful 😄 . Got a few more suggestions from it here:

From what you said, it seems to be mostly for making invisible characters visible in a meaningful way, similar to what listchars does in Vim. Is that accurate?

Assuming that's the case, your current suggestions — specialchars, symbolchars, displaychars — are on the right track, but might not fully convey the idea of replacing invisible/control characters with visible glyphs.

A few candidate names that might better reflect the function:

  • visiblechars — emphasizes visibility, which is likely the user's intent.
  • glyphmap — short and technical; useful if this is more for power users.
  • charreveal or revealchars — implies the option reveals normally invisible characters.
  • showchars — simple, clear, action-oriented.
  • charoverlay — if these characters sit on top of normal text visually.
  • invisiblereplace — explicit, though longer.

Funny how it gave listchars from vim when I was describing the problem (without mentioning vim).

Anyway, my favorites are revealchars and showchars (or displaychars), assuming chatgpt's assumption of only using these on invisible characters is correct (I assumed it is).

Edit2: Or maybe replacechars (Inspired by invisiblereplace) since we are "replacing" the characters that match the condition with the ones we set?

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 3, 2025

Or maybe replacechars (Inspired by invisiblereplace) since we are "replacing" the characters that match the condition with the ones we set?

That would be a pretty unconventional usage of the term "replace", especially in a text editor.

At the moment my favorite is showchars, it sounds very succinct and quite meaningful (meaning is: specify whether and how to show characters that are not shown otherwise).

@JoeKar
Copy link
Collaborator

JoeKar commented Jun 4, 2025

At the moment my favorite is showchars

👍

@Neko-Box-Coder
Copy link
Contributor Author

Yep. showchars it is then :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants