touch-packages team mailing list archive
Message #38369
[Bug 75695] Re: huge performance hit for -i with UTF-8 locales
With 14.04:
time find -name '*.[ch]' | xargs grep -i 'volatile.*s3tc' # en_CA.utf8
real 0m44.320s
user 0m43.777s
sys 0m0.459s
time find -name '*.[ch]' | LANG=C xargs grep -i 'volatile.*s3tc'
real 0m2.078s
user 0m1.795s
sys 0m0.381s
time find -name '*.[ch]' | xargs grep 'volatile.*s3tc' # en_CA.utf8
real 0m1.876s
user 0m0.414s
sys 0m0.523s
The slowdown is still about a factor of 20.
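The factor can be checked from the 14.04 timings above. This is only a sketch: the `ratio` helper name is made up for illustration, and awk is used because plain sh arithmetic has no floating point.

```shell
# ratio SLOW FAST: print SLOW/FAST to one decimal place.
# (Hypothetical helper name, for illustration only.)
ratio() {
    # LC_ALL=C so that "." is always parsed as the decimal separator.
    LC_ALL=C awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

ratio 44.320 2.078   # wall clock: -i in en_CA.utf8 vs. -i with LANG=C
ratio 43.777 1.795   # user CPU time for the same pair
ratio 44.320 1.876   # wall clock: -i vs. no -i, both in en_CA.utf8
```

All three ratios land a little above 20, so "a factor of 20" is, if anything, slightly conservative.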
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to grep in Ubuntu.
https://bugs.launchpad.net/bugs/75695
Title:
huge performance hit for -i with UTF-8 locales
Status in grep:
Unknown
Status in grep package in Ubuntu:
Incomplete
Bug description:
On a source tree with 28MB of .c and .h files (Mesa), grep is slow with -i and fast without it with the default Ubuntu locale settings (LANG=en_US.UTF-8, no LC_ variables set). Actually, even some [Vv] style patterns are much faster with LANG=C, so this is even more like
https://bugs.launchpad.net/distros/ubuntu/+source/grep/+bug/47634
My box is a Core 2 Duo (2.4GHz), which makes a beast like GNOME feel
almost as snappy as fluxbox :) Everything is in the disk cache, so
I/O isn't a factor. Neither is memory bandwidth. The machine was
otherwise idle. I'm running AMD64 Edgy.
peter@tesla:/usr/local/src/g965/mesa$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
... (all the same)
(Times are measured for the second run in a row, so the CPU core it runs on is at full clock speed the whole time.)
time find -name '*.[ch]' | xargs grep -i 'volatile.*s3tc'
real 0m3.498s; user 0m3.483s; sys 0m0.023s
time find -name '*.[ch]' | xargs grep 'volatile.*s3tc'
real 0m0.076s; user 0m0.050s; sys 0m0.023s
Non-UTF-8 locales are just as fast with -i as without it:
time find -name '*.[ch]' | LANG=C xargs grep -i 'volatile.*s3tc'
real 0m0.083s; user 0m0.067s; sys 0m0.020s
time find -name '*.[ch]' | LANG=en_CA xargs grep -i 'volatile.*s3tc'
real 0m0.079s; user 0m0.050s; sys 0m0.027s
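To repeat the comparison without retyping each pipeline, something like the following might help. It is a rough sketch, not part of the original report: `bench_locales` is a hypothetical helper name, it assumes a find/xargs that support -print0/-0 (GNU and BSD do), and it uses LC_ALL rather than LANG so that no stray LC_ variable can override the locale under test. If a locale is not installed, grep typically falls back to C behaviour without complaint, so check `locale -a` first.

```shell
# bench_locales DIR PATTERN LOCALE...
# Run the same case-insensitive search once per locale and report the
# number of matching lines, so the results can be compared across locales.
# (Hypothetical helper, for illustration only.)
bench_locales() {
    dir=$1; pattern=$2; shift 2
    for loc in "$@"; do
        # /dev/null is passed as an extra file so grep never reads stdin
        # when find produces no file names.
        count=$(find "$dir" -name '*.[ch]' -print0 |
                LC_ALL=$loc xargs -0 grep -i -- "$pattern" /dev/null 2>/dev/null |
                wc -l | tr -d ' ')
        printf '%s: %s matching lines\n' "$loc" "$count"
    done
}
```

In bash, calling it once per locale under the `time` keyword, e.g. `time bench_locales . 'volatile.*s3tc' en_US.UTF-8` and then again with `C`, should give numbers comparable to the ones quoted here. The match counts double as a sanity check that both runs found the same lines.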
Spelling out a case-insensitive pattern by hand takes more time, but is not really slow. However, it probably doesn't match everything that grep -i would on input that isn't all 7-bit ASCII:
time find -name '*.[ch]' | xargs grep '[Vv][Oo][Ll][Aa][Tt][Ii][Ll][Ee].*[Ss]3[Tt][Cc]'
real 0m0.340s; user 0m0.313s; sys 0m0.027s
It is affected by locale settings, too.
time find -name '*.[ch]' | LANG=C xargs grep '[Vv][Oo][Ll][Aa][Tt][Ii][Ll][Ee].*[Ss]3[Tt][Cc]'
real 0m0.096s; user 0m0.080s; sys 0m0.027s
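The bracketed pattern used above can be generated rather than typed. A minimal sketch, assuming the input is a simple pattern with ASCII letters only: `ci_pattern` is a made-up name, and it does not understand letters that are already inside a bracket expression or part of a backslash escape, so it is a workaround for simple patterns and not a replacement for -i.

```shell
# ci_pattern PATTERN: wrap every ASCII letter in a [Xx] bracket pair.
# (Hypothetical helper; letters inside existing [...] or after a
# backslash are NOT handled specially.)
ci_pattern() {
    printf '%s\n' "$1" | LC_ALL=C awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[A-Za-z]/)
                out = out "[" toupper(c) tolower(c) "]"
            else
                out = out c      # digits, ".", "*" etc. pass through as-is
        }
        print out
    }'
}
```

With that in place, `grep "$(ci_pattern 'volatile.*s3tc')"` should run the same search as the hand-written pattern.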