touch-packages team mailing list archive

[Bug 75695] Re: huge performance hit for -i with UTF-8 locales

 

With Ubuntu 14.04:

time find -name '*.[ch]' | xargs grep -i 'volatile.*s3tc'  # en_CA.utf8
real    0m44.320s
user    0m43.777s
sys     0m0.459s

time find -name '*.[ch]' | LANG=C xargs grep -i 'volatile.*s3tc'
real    0m2.078s
user    0m1.795s
sys     0m0.381s

time find -name '*.[ch]' | xargs grep 'volatile.*s3tc'  # en_CA.utf8
real    0m1.876s
user    0m0.414s
sys     0m0.523s

The slowdown for -i under a UTF-8 locale is still roughly a factor of 20 (44.3s vs. 2.1s with LANG=C).

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to grep in Ubuntu.
https://bugs.launchpad.net/bugs/75695

Title:
  huge performance hit for -i with UTF-8 locales

Status in grep:
  Unknown
Status in grep package in Ubuntu:
  Incomplete

Bug description:
  On a source tree with 28MB of .c and .h files (Mesa), grep is slow with -i and fast without it under the default Ubuntu locale settings (LANG=en_US.UTF-8, no LC_ variables set).  Even some [Vv]-style patterns are much faster with LANG=C, so this looks closely related to
  https://bugs.launchpad.net/distros/ubuntu/+source/grep/+bug/47634

  My box is a Core 2 Duo (2.4GHz), which makes a beast like GNOME feel
  almost as snappy as fluxbox :)  Everything is in the disk cache, so
  I/O isn't a factor.  Neither is memory bandwidth.  The machine was
  otherwise idle.  I'm running AMD64 Edgy.

  peter@tesla:/usr/local/src/g965/mesa$ locale
  LANG=en_US.UTF-8
  LC_CTYPE="en_US.UTF-8"
  ... (all the same)

  (Times are measured for the second consecutive run, so the CPU core it runs on is at full clock speed the whole time.)
  time find -name '*.[ch]' | xargs grep -i 'volatile.*s3tc'
   real    0m3.498s; user    0m3.483s; sys     0m0.023s

  time find -name '*.[ch]' | xargs grep  'volatile.*s3tc'
   real    0m0.076s; user    0m0.050s; sys     0m0.023s

  
  With non-UTF-8 locales, -i is just as fast as a search without -i:
  time find -name '*.[ch]' | LANG=C xargs grep -i 'volatile.*s3tc'
   real    0m0.083s; user    0m0.067s; sys     0m0.020s

  time find -name '*.[ch]' | LANG=en_CA xargs grep -i 'volatile.*s3tc'
   real    0m0.079s; user    0m0.050s; sys     0m0.027s
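
The locale comparison above can be repeated with a small loop. This is only a rough sketch: the function name bench_grep_locales is made up for illustration, and it assumes the locales passed in are installed on the machine. It sets LC_ALL rather than LANG because LC_ALL overrides both LANG and the individual LC_ variables. It relies on `date +%s`, so elapsed times have whole-second resolution. Each locale gets one untimed warm-up run first, matching the "second run in a row" methodology.

```shell
# Hypothetical helper: run the same case-insensitive search once per locale.
# Prints one line per locale: "<locale> <elapsed seconds> <matching lines>".
bench_grep_locales() {
    dir=$1
    shift
    for loc in "$@"; do
        # Warm-up run so the files are cached and the CPU is clocked up.
        find "$dir" -name '*.[ch]' -print0 |
            LC_ALL=$loc xargs -0 grep -i 'volatile.*s3tc' >/dev/null
        start=$(date +%s)
        matches=$(find "$dir" -name '*.[ch]' -print0 |
            LC_ALL=$loc xargs -0 grep -i 'volatile.*s3tc' | wc -l | tr -d ' ')
        end=$(date +%s)
        printf '%s %s %s\n' "$loc" "$((end - start))" "$matches"
    done
}
```

For example, `bench_grep_locales . C en_US.UTF-8` run from the top of the source tree prints one line for each of the two locales. For sub-second differences like the ones in this report, wrapping the individual pipelines in `time` as shown above is more precise.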

  
   Spelling out case insensitivity with bracket expressions takes more time than the plain pattern, but is not really slow.  However, it probably doesn't match everything that grep -i would on input that isn't all 7-bit ASCII:
   time find -name '*.[ch]' | xargs grep  '[Vv][Oo][Ll][Aa][Tt][Ii][Ll][Ee].*[Ss]3[Tt][Cc]'
  real    0m0.340s; user    0m0.313s; sys     0m0.027s

  The bracket pattern is affected by locale settings, too:
  time find -name '*.[ch]' | LANG=C xargs grep  '[Vv][Oo][Ll][Aa][Tt][Ii][Ll][Ee].*[Ss]3[Tt][Cc]'
  real    0m0.096s; user    0m0.080s; sys     0m0.027s
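
For reference, the [Vv]-style pattern used above can be generated mechanically instead of being typed by hand. The helper below is a minimal sketch, not anything grep provides: the name ci_pattern is made up, and it only expands ASCII letters, so it has the same limitation noted above for non-ASCII input. It is only suitable for simple patterns, since it would also rewrite letters inside existing bracket expressions or backslash escapes.

```shell
# Hypothetical helper: expand every ASCII letter of a simple pattern into
# a two-character bracket expression, e.g. "v" becomes "[Vv]".
# awk runs under LC_ALL=C so that only ASCII letters are touched.
ci_pattern() {
    printf '%s\n' "$1" | LC_ALL=C awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[A-Za-z]/)
                out = out "[" toupper(c) tolower(c) "]"
            else
                out = out c
        }
        print out
    }'
}
```

With this, `find -name '*.[ch]' | xargs grep "$(ci_pattern 'volatile.*s3tc')"` runs the same search as the hand-written bracket pattern.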

To manage notifications about this bug go to:
https://bugs.launchpad.net/grep/+bug/75695/+subscriptions