randgen team mailing list archive

Re: combinations.pl: 1 hour+ instead of 10 minutes

 

Roel,

Bernt should comment on this particular problem, as he implemented those options to combinations.pl. I personally never use --parallel, and I have never had any hangs.

However, you are trying to run 30 mysqld servers under valgrind. What are the specs of the machine that is doing that? Even if it had 30 cores, does it have 30 hard drives and 30 separate memory channels to ensure that enough useful work happens within 600 seconds? My guess is that for some of the test runs, replication barely started before the 600 seconds were up.
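
To make the resource question concrete, here is a rough sketch in Python (not part of RQG) of how oversubscribed such a machine could be. It assumes each of the 15 parallel runs starts two mysqld processes (master and slave), which is how --parallel=15 would add up to the 30 servers mentioned above. The function name and the core count of 8 are hypothetical, chosen only for illustration.

```python
import os


def oversubscription(parallel_runs, servers_per_run, cores=None):
    """Return the number of planned server processes per CPU core."""
    # Fall back to the local core count; os.cpu_count() may return None.
    cores = cores or os.cpu_count() or 1
    return (parallel_runs * servers_per_run) / cores


# --parallel=15, assuming 2 mysqld processes (master + slave) per run.
# cores=8 describes a hypothetical machine, not the one in this thread.
ratio = oversubscription(15, 2, cores=8)
print(f"{ratio:.2f} valgrind'ed servers per core")
```

A ratio well above 1 would suggest the servers spend much of the 600 seconds waiting for CPU, before disk and memory contention are even considered.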

Philip Stoev

----- Original Message -----
From: "Roel Van de Paar" <roel.van.de.paar@xxxxxxxxxx>
To: <randgen@xxxxxxxxxxxxxxxxxxx>
Sent: Monday, December 26, 2011 5:18 AM
Subject: [Randgen] combinations.pl: 1 hour+ instead of 10 minutes


Hi All,

I am running into something odd: a combinations.pl run takes 6-7 times
as long as it should:

Relevant switches to combinations.pl:
============
  --run-all-combinations-once
  --parallel=15
  --force
============

Relevant settings in the .cc file:
============
 ['
  --mysqld=--log-output=none
  --mysqld=--sql_mode=ONLY_FULL_GROUP_BY
  --mysqld=--default-time-zone=UTC
  --duration=600
  --queries=100000000
  --querytimeout=5
  --reporters=Shutdown,Backtrace,QueryTimeout,ErrorLog,ErrorLogAlarm,ValgrindErrors
  --short_column_names
  --strict_fields
  --threads=1
  --valgrind
  --validators=MarkErrorLog
  --seed=132
  --mysqld=--binlog-format=MIXED
  --mysqld=--log-bin=binlog
  --mysqld=--log-bin-index=binlog.index
 '
 ],[
  '','','','','','','','','','','','','','',''
 ]
============

Result:
============
bash-4.1$ ./108.run
# 2011-12-26T03:56:30 /randgen Revno: 912
[...]
# 2011-12-26T03:56:30 Started thread [1] pid=25834
# 2011-12-26T03:56:30 [1] Running combination 1/15
# 2011-12-26T03:56:30 Started thread [2] pid=25835
# 2011-12-26T03:56:30 [2] Running combination 2/15
[... 15 threads, all at once in parallel, as expected ...]
# 2011-12-26T05:03:12 [15] runall.pl exited with exit status STATUS_OK(0), see /.../trial15.log
[...]
# 2011-12-26T05:05:14 [8] runall.pl exited with exit status STATUS_VALGRIND_FAILURE(108), see /.../trial8.log
[...]
# 2011-12-26T05:05:45 [14] runall.pl exited with exit status STATUS_OK(0), see /.../trial14.log
[...]
# 2011-12-26T05:05:45 ./combinations.pl will exit with exit status STATUS_VALGRIND_FAILURE(108)
============

It took more than one hour while --duration was set to 600!
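
For reference, the overrun can be worked out from the first and last timestamps in the log above. This is a minimal Python sketch, not part of RQG; the helper name is made up for illustration, and it assumes the log timestamps are wall-clock times in the same time zone.

```python
from datetime import datetime

# Timestamp layout used in the RQG log lines quoted above.
FMT = "%Y-%m-%dT%H:%M:%S"


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# First "Started thread" line and the final combinations.pl exit line.
wall = elapsed_seconds("2011-12-26T03:56:30", "2011-12-26T05:05:45")
duration = 600  # value passed as --duration

print(f"wall clock: {wall:.0f} s, ratio to --duration: {wall / duration:.3f}")
```

That gives 4155 seconds of wall-clock time, roughly 6.9 times the configured duration.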

I think this happens more often.

Some ideas:
- Is there some function which "delays" terminating RQG if not enough
"real" time has been processed or something?
- Could it be related to Valgrind runs?
- A 1-hour offset somewhere which causes runs to take 1 hour longer?

Any input/ideas?

--
Kind regards,
God Bless,

Roel Van de Paar | Senior QA Engineer
Oracle MySQL Server QA
Oracle Australia | NSW 2440
Oracle is committed to developing practices and products that help
protect the environment






_______________________________________________
Mailing list: https://launchpad.net/~randgen
Post to     : randgen@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~randgen
More help   : https://help.launchpad.net/ListHelp



