Amg arpack workaround fix #14647
Amg arpack workaround fix #14647
Conversation
You may also want to completely remove the sign flip in the Laplacian |
The preceding if condition also looks dodgy. It is (eigen_solver == 'arpack' or eigen_solver != 'lobpcg' and
(not sparse.isspmatrix(laplacian) or n_nodes < 5 * n_components)) I think this is interpreted as
If this is the correct interpretation then the first clause can be simplified to |
For speed and simplicity, you may want just to call a dense solver if n_nodes < 5 * n_components no matter what the other conditions are. |
The conditions are really complicated to follow. Recalling and trusting my old self (probably I should not do that) #10715 (comment), we could simplify with something like: if not sparse.isparse(laplacian) and eigen_solver == 'amg':
# warns that we switch to arpack
eigen_solver = 'arpack'
if eigen_solver == 'arpack':
# solve the problem
try:
...
except RuntimeError:
eigen_solver = 'lobcpg'
# from that point eigen_solver will be either amg or lobcpg
# check if we have enough n_nodes
if n_nodes < 5 * n_components:
# use eigh
else:
# check that we should precondition
if eigen_solver == 'amg':
# preconditionne and get M
# call lobcpg with M if available otherwise go for default Anyway, we should probably merge the regression test and later refactor the code to be readable. |
@@ -301,7 +301,7 @@ def spectral_embedding(adjacency, n_components=8, eigen_solver=None, | |||
if embedding.shape[0] == 1: | |||
raise ValueError | |||
|
|||
elif eigen_solver == "lobpcg": | |||
if eigen_solver == "lobpcg": |
glemaitre
Aug 14, 2019
Contributor
As a side note, codecov
is reporting that lobpcg
is never used our tests:
As a side note, codecov
is reporting that lobpcg
is never used our tests:
@lobpcg dense solver meaning arpack? |
@glemaitre your code has the same issue I'm fixing here. There is no |
Ok I also really don't understand the sign flip tbh. |
dense solver meaning scipy.linalg.eigh or numpy.linalg.eigh (unsure about the difference), also turning a sparse Laplacian into dense if needed. The point being that if n_nodes < 5 * n_components it it likely cheaper to compute all eigenvectors and drop some. |
@lobpcg? Shouldn't the fallback for arpack just be the standard eigsh, not lobpcg? The benefits of shift-invert mode are not within my expertise, can you comment on the benefits of using that instead of using the standard eigsh? |
laplacian *= -1 in several places makes no sense whatsoever to me. |
@lobpcg did you see the comment above the arpack magic? That explains what's happening. I don't think it's related to lobpcg. |
There's a bunch of other issues here, but I think this PR fixes one obvious bug and adds one clear regression test and we should merge it. Then we can triage the other bugs. |
@glemaitre would love to increase coverage but calling lobpcg actually fails, see #13393 (comment) |
For "small" matrices, scipy.sparse.linalg.eigsh is typically the best in all respects. For large matrices:
|
(4 should be with amg I think ;) |
@jnothman if (eigen_solver == 'arpack' or (eigen_solver != 'lobpcg' and
((not sparse.isspmatrix(laplacian)) or n_nodes < 5 * n_components)): |
thanks, fixed
Do you mean
It's just a bad way to call arpack shift-and-invert in this case. A good way is just to change sigma, replacing the above with something like
to find the eigenvalues closest to zero. You want sigma to be a bit negative, to avoid LU factorization failures, so that arpack never fails. Above I use a safe choice sigma=-1e-5 that should work in single precision as well (if arpack supports it). If you make sigma even more negative, it should slow down the convergence a bit. |
I'd still like to see this merged before we go into rewriting the logic. |
+1 for introducing the regression test first |
Please
|
addes whatsnew, opened #14713 |
# Conflicts: # doc/whats_new/v0.22.rst
@@ -175,7 +175,22 @@ def test_spectral_embedding_amg_solver(seed=36): | |||
random_state=np.random.RandomState(seed)) | |||
embed_amg = se_amg.fit_transform(S) | |||
embed_arpack = se_arpack.fit_transform(S) | |||
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.05) | |||
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.1e-4) |
ogrisel
Aug 29, 2019
Member
Suggested change
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.1e-4)
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 1e-5)
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.1e-4) | |
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 1e-5) |
se_arpack.affinity = "precomputed" | ||
embed_amg = se_amg.fit_transform(affinity) | ||
embed_arpack = se_arpack.fit_transform(affinity) | ||
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.1e-4) |
ogrisel
Aug 29, 2019
Member
Suggested change
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.1e-4)
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 1e-5)
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.1e-4) | |
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 1e-5) |
LGTM. Will merge. |
c52b6e1
into
scikit-learn:master
Fixes #10715.
See discussion here:
#10720 (comment)
Further discussion by @glemaitre and @lesteve and @sky88088 and @lobpcg here:
#10715 (comment)
This logic can probably be simplified further but this is a fix for an obvious logic bug.
Right now, in the first case of the "try", if it succeeds and
eigen_solver=="amg"
, the result is discarded, a different (apparently less suitable, according to the heuristic) solver is used, and the laplacian was flipped.This is clearly not the intention of the code. If the
try
succeeds, these results should be used.FYI, the tolerances in the checks were way larger than any of the values.