touch-packages team mailing list archive
Message #22956
[Bug 1376606] [NEW] inappropriate statistical parameters in ping output
Public bug reported:
The output of the ping command gives the user several statistical parameters of the measured values (seen, in the statistical sense, as a sample from a statistical population), e.g.
> rtt min/avg/max/mdev = 423.152/728.492/1306.341/220.001 ms, pipe 2
One of them is certainly inappropriate for describing the data in this
case; another is probably inappropriate.
First, the mean of a sample is a good estimator (in the statistical sense) of the mean of the underlying statistical distribution.
But this is only true for statistical distributions that possess a so-called "first moment" (or "expectation value"), i.e. a mean.
Some do not. In those cases, giving the mean of the sample is misleading, because it is only an unreliable, fluctuating property of the random sample - and not of the statistical population!
The mean of the random sample does not converge (e.g. with increasing sample size) to a location parameter of the underlying population or distribution.
A user will interpret the given value as information about some kind of "middle" of the latencies that will occur on the data connection, and this interpretation is wrong. Therefore the statistical parameter "avg", the mean of the sample, is misleading and thus inappropriate.
Latency measurements are a standard case in which distributions occur that do not possess first moments or expectation values (or that at least contain a large number of outliers).
In such cases, the more robust (and simpler) measure of location, the median, should be used; see
http://en.wikipedia.org/wiki/Median
http://en.wikipedia.org/wiki/File:Comparison_mean_median_mode.svg
(As a second reason, the skew of latency measurements also indicates
that a sample mean is not a good choice of estimator for the location
of the distribution.)
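The convergence argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: real ping latencies are not Cauchy distributed, and the standard Cauchy distribution is used here purely as a textbook example of a distribution without a first moment. The function names and the seed are made up for illustration.

```python
import math
import random
import statistics


def cauchy_sample(n, rng):
    """Draw n values from a standard Cauchy distribution (it has no finite mean)."""
    # Inverse-CDF method: tan(pi * (u - 0.5)) for u uniform on [0, 1).
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def running_summary(values, checkpoints):
    """Return (n, mean, median) of the first n values for each checkpoint n."""
    return [
        (n, statistics.fmean(values[:n]), statistics.median(values[:n]))
        for n in checkpoints
    ]


rng = random.Random(1376606)  # fixed seed, chosen arbitrarily
values = cauchy_sample(100_000, rng)
summary = running_summary(values, [100, 1_000, 10_000, 100_000])
for n, mean, median in summary:
    print(f"n={n:>6}  mean={mean:10.3f}  median={median:8.3f}")
```

With growing n the median column should settle near 0, the true median of the standard Cauchy distribution, while the mean column is not guaranteed to settle at all.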
Second, a better measure of dispersion should be used. Wikipedia:
"When the median is used as a location parameter in descriptive statistics, there are several choices for a measure of variability: the range, the interquartile range, the mean absolute deviation, and the median absolute deviation."
I would argue for the median absolute deviation.
(I wrote "probably inappropriate" because "mdev" does not correspond to a specific statistical term, so I do not know what is calculated. If it is "(square root of) sample variance" or "estimator for standard deviation", then it is certainly inappropriate.)
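As a rough sketch of what the proposed statistics would look like, the following computes the median and the median absolute deviation for a handful of round-trip times. The RTT values are hypothetical and are not taken from a real ping run. The standard deviation is printed only for comparison, since this report does not establish what "mdev" actually is.

```python
import statistics


def median_abs_deviation(values):
    """Median of the absolute deviations from the sample median."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


# Hypothetical round-trip times in milliseconds, with one large outlier.
rtts_ms = [423.0, 510.0, 560.0, 600.0, 650.0, 700.0, 1306.0]

median = statistics.median(rtts_ms)
mad = median_abs_deviation(rtts_ms)
mean = statistics.fmean(rtts_ms)
stdev = statistics.pstdev(rtts_ms)  # population standard deviation, for comparison only

print(f"median={median:.3f} ms  MAD={mad:.3f} ms")
print(f"mean={mean:.3f} ms  stdev={stdev:.3f} ms")
```

In this sample the single outlier pulls the mean above the median and makes the standard deviation much larger than the median absolute deviation, which is the effect described above.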
Ubuntu release: 14.04.1 LTS
iputils-ping: 3:20121221-4ubuntu1.1
** Affects: iputils (Ubuntu)
Importance: Undecided
Status: New
** Summary changed:
- inappropriate statistical paramters in ping output
+ inappropriate statistical parameters in ping output
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to iputils in Ubuntu.
https://bugs.launchpad.net/bugs/1376606
Title:
inappropriate statistical parameters in ping output
Status in “iputils” package in Ubuntu:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/iputils/+bug/1376606/+subscriptions