Mantis - ALGLIB
Viewing Issue Advanced Details
329 Spec.functions major always 2010-05-07 22:52 2010-05-26 12:33
gighi  
SergeyB  
normal  
resolved 2.5.0  
fixed  
none    
none 2.6.0  
Unspecified
0000329: FIXED: K-Means++: 'multiple restarts' doesn't work correctly
Hi,
I have a problem regarding the K-Means++ implementation contained in ALGLIB.
Please have a look at kmeans.cpp:238:

if( ap::fp_less(e,ebest) )
{
    copymatrix(ct, 0, k-1, 0, nvars-1, ctbest, 0, k-1, 0, nvars-1);
}

Now, if "e" is less then "ebest", the current table containing centroids is saved as the "best centroids set", but, as you can easily see, the variable "ebest" is never updated, so its value will always be "ap::maxrealnumber". As a consequence of this, the "number of restarts" of the algorithm is completely useless, because only the last iteration will be considered, even if it's not the best one.
Am I wrong? This produces an output which is not the best one computed by the algorithm.
Issue History
2010-05-07 22:52 gighi New Issue
2010-05-07 22:52 gighi Programming language => C++
2010-05-08 14:55 gighi Issue Monitored: gighi
2010-05-10 23:00 SergeyB Status new => assigned
2010-05-10 23:00 SergeyB Assigned To => SergeyB
2010-05-10 23:01 SergeyB Note Added: 0000048
2010-05-10 23:02 SergeyB Summary K-Means++ wrong implementation => K-Means++: 'multiple restarts' doesn't work correctly
2010-05-14 21:42 gighi Note Added: 0000049
2010-05-26 12:31 SergeyB Summary K-Means++: 'multiple restarts' doesn't work correctly => FIXED: K-Means++: 'multiple restarts' doesn't work correctly
2010-05-26 12:31 SergeyB Programming language C++ => Unspecified
2010-05-26 12:33 SergeyB Note Added: 0000052
2010-05-26 12:33 SergeyB Status assigned => resolved
2010-05-26 12:33 SergeyB Fixed in Version => [NOT RELEASED YET] Next release
2010-05-26 12:33 SergeyB Resolution open => fixed

Notes
(0000048)
SergeyB   
2010-05-10 23:01   
Yes, you are perfectly right. Thanks for bug report.

This issue will be fixed in the next release, which is scheduled for June, 1st.
(0000049)
gighi   
2010-05-14 21:42   
Another thing: I think you should also add something like "xyc_best", because at this time the "xyc" vector is overwritten at every iteration, and so only the last one is reported as output, and that's not good for the same initial reason.
(0000052)
SergeyB   
2010-05-26 12:33   
* issue is fixed in 2.6.0
* added unit test which checks for statistically significant difference between Restarts=1 and Restarts>1 (solution qualities are compared).