This thread on StackExchange has been making the rounds on my Twitter timeline today and I couldn’t resist sharing it here:
Suppose we have data set (X_i, Y_i) with n points. We want to perform a linear regression, but first we sort the X_i values and the Y_i values independently of each other, forming data set (X_i, Y_j). Is there any meaningful interpretation of the regression on the new data set? Does this have a name?
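I don’t want to blame the author of the question. The procedure itself simply shows plain ignorance of basic statistical concepts: sorting the two columns independently throws away the pairing between each X_i and its Y_i, which is the only thing a regression is supposed to measure. At first sight this might be a beginner’s misunderstanding, but this part totally kills it: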
But my manager says he gets “better regressions most of the time” when he does this […]. I have a feeling he is deceiving himself.
This isn’t incompetence anymore – this is deliberate torture of statistics.
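To see why the manager gets “better regressions”, here is a minimal sketch in Python. It is my own illustration on synthetic data, not something taken from the thread: the helper `pearson_r`, the sample size and the seed are all assumptions I picked for the demo. It draws two samples that are independent by construction, then compares R² (the squared Pearson correlation, which is what a simple linear regression reports) before and after sorting each column on its own.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed (arbitrary choice) so the run is repeatable
n = 200                 # sample size, also an arbitrary choice

# Two samples that are independent of each other by construction.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]

# Original pairing: the i-th x still belongs to the i-th y.
r_paired = pearson_r(x, y)

# The manager's trick: sort each column separately, destroying the pairing.
r_sorted = pearson_r(sorted(x), sorted(y))

# For a simple linear regression, R^2 equals the squared correlation.
print(f"R^2 with the original pairing:        {r_paired ** 2:.3f}")
print(f"R^2 after sorting x and y separately: {r_sorted ** 2:.3f}")
```

After sorting, both columns are non-decreasing, so the smallest x is matched with the smallest y, the second smallest with the second smallest, and so on. The correlation can no longer be negative and will typically land close to 1, even though the original variables have nothing to do with each other. The fit looks “better” only because the sorting manufactured the relationship.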