The Wayback Machine - https://web.archive.org/web/20221218083032/https://github.com/python/cpython/issues/100322
Inconsistent drive letter case on Windows #100322

Open
Kieran-Bacon opened this issue Dec 17, 2022 · 2 comments
Labels
OS-windows type-bug An unexpected behavior, bug, or error

Comments

Kieran-Bacon commented Dec 17, 2022

Bug report + suggested fix

On Windows, os.path.abspath is inconsistent about the case of the drive letter it prepends: it can be either uppercase or lowercase.

Microsoft's documentation for GetFullPathNameW (https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfullpathnamew) does not make clear exactly when this happens, but a simple way to reproduce the issue is to change directory using a drive letter of the opposite case:

C:\Users\kieran>python -c "import os; print(os.path.abspath('.'))"
C:\Users\kieran

C:\Users\kieran>cd c:\Users\kieran\Projects

c:\Users\kieran\Projects>python -c "import os; print(os.path.abspath('.'))"
c:\Users\kieran\Projects

This is very frustrating when comparing paths. Different tools in the same development environment sometimes run in a way that changes the drive letter case, which causes issues between those tools or differences inside tests.

I would like to suggest that we enforce lowercase drive letters by making a small change to

cpython/Modules/posixmodule.c

Lines 4236 to 4274 in 0fe61d0

_PyOS_getfullpathname(const wchar_t *path, wchar_t **abspath_p)
{
    wchar_t woutbuf[MAX_PATH], *woutbufp = woutbuf;
    DWORD result;
    result = GetFullPathNameW(path,
                              Py_ARRAY_LENGTH(woutbuf), woutbuf,
                              NULL);
    if (!result) {
        return -1;
    }
    if (result >= Py_ARRAY_LENGTH(woutbuf)) {
        if ((size_t)result <= (size_t)PY_SSIZE_T_MAX / sizeof(wchar_t)) {
            woutbufp = PyMem_RawMalloc((size_t)result * sizeof(wchar_t));
        }
        else {
            woutbufp = NULL;
        }
        if (!woutbufp) {
            *abspath_p = NULL;
            return 0;
        }
        result = GetFullPathNameW(path, result, woutbufp, NULL);
        if (!result) {
            PyMem_RawFree(woutbufp);
            return -1;
        }
    }
    if (woutbufp != woutbuf) {
        *abspath_p = woutbufp;
        return 0;
    }
    *abspath_p = _PyMem_RawWcsdup(woutbufp);
    return 0;
}

If we added a loop (at line 4266) over the buffered path that lowercased the characters up to the colon (guaranteed to be set), this would ensure consistent drive letters.
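
As a rough illustration of the idea, here is the same logic sketched in Python rather than C. The helper name `lowercase_drive_prefix` is hypothetical and is not part of CPython. It lowercases only the characters before the first colon, and it leaves the path untouched when no colon is present (for example, a UNC path), which is an assumption added here for safety.

```python
def lowercase_drive_prefix(abspath: str) -> str:
    """Lowercase everything before the first colon; hypothetical helper."""
    head, colon, tail = abspath.partition(":")
    if not colon:
        # No colon found (e.g. a UNC path): return the path unchanged.
        return abspath
    return head.lower() + colon + tail


lowered = lowercase_drive_prefix("C:\\Users\\kieran\\Projects")
unc = lowercase_drive_prefix("\\\\server\\share\\dir")
```

The rest of the path keeps its original case, so only the drive letter is affected.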

Your environment

Python 3.9.6
Microsoft Windows 10 Pro - 10.0.19044 Build 19044
x64-based PC

Kieran-Bacon added the type-bug (An unexpected behavior, bug, or error) label Dec 17, 2022
Contributor

barneygale commented Dec 17, 2022

I believe that Python deliberately preserves case in functions like abspath().

You could use os.path.normcase(a) == os.path.normcase(b) to compare paths, or use appropriate classes from pathlib. Would that work for you?
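
A minimal sketch of both comparison approaches, using the two paths from the report. It imports `ntpath` (the module that `os.path` resolves to on Windows) and `PureWindowsPath` so that the snippet applies Windows path rules on any platform; on Windows itself, `os.path` and `pathlib.Path` would be the natural choices.

```python
import ntpath
from pathlib import PureWindowsPath

a = r"C:\Users\kieran\Projects"
b = r"c:\Users\kieran\Projects"

# Plain string comparison is case sensitive, so the drive letter matters.
raw_equal = a == b

# normcase() folds case (and forward slashes) before comparing.
normcase_equal = ntpath.normcase(a) == ntpath.normcase(b)

# Pure Windows paths compare case-insensitively.
pathlib_equal = PureWindowsPath(a) == PureWindowsPath(b)

print(raw_equal, normcase_equal, pathlib_equal)
```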

Contributor

eryksun commented Dec 18, 2022

Device names and UNC server and share names are case insensitive. For example:

>>> os.chdir('//LoCaLhOsT/C$')
>>> os.getcwd()
'\\\\LoCaLhOsT\\C$'

>>> os.path.samefile('.', '//localhost/c$')
True

NTFS on Windows 10+ supports flagging a directory as case sensitive, which makes pure path comparisons unreliable. If that isn't a concern, then we can simply normalize the case of the entire path. Otherwise, we can at least split off the drive via os.path.splitdrive() and normalize the case of the drive component.
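
A minimal sketch of the drive-only approach. The helper name `normalize_drive_case` is hypothetical; it uses `ntpath` (what `os.path` is on Windows) so that it applies Windows rules on any platform. Only the drive component is case-normalized, whether that is a drive letter or a UNC server and share, and the rest of the path is left as-is.

```python
import ntpath


def normalize_drive_case(path: str) -> str:
    """Normalize the case of the drive component only; hypothetical helper."""
    drive, rest = ntpath.splitdrive(path)
    # The drive part is case insensitive, so it is safe to fold its case.
    return ntpath.normcase(drive) + rest


drive_letter = normalize_drive_case(r"C:\Users\Kieran")
unc_share = normalize_drive_case(r"\\LoCaLhOsT\C$\Temp")
```

Because the remainder of the path is untouched, this stays safe for directories flagged as case sensitive.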

It's best to use os.path.normcase() because it uses Windows filesystem rules. It's based on a table that maps one 16-bit ordinal to another 16-bit ordinal. No case conversion is supported for characters beyond the BMP, i.e. UTF-16 surrogate codes map to themselves. There are no linguistic or locale-dependent casing rules (e.g. "ß" <-> "SS") and no Unicode normalization.

eryksun changed the title from "Inconsistent driver letter case on Windows + suggested fix" to "Inconsistent drive letter case on Windows" Dec 18, 2022