Hi Mike,
Thanks, I didn't know that the PUT operation also includes updating the
replica containers. That makes sense; I will check that.
In the meantime, I've added debug checkpoints to the PUT operation to
measure the different steps. The modified code is here:
http://paste.openstack.org/show/3899/ (original:
https://github.com/openstack/swift/blob/master/swift/obj/server.py#L530).
Basically, I added some self.logger.debug() calls with timestamps in a few places.
I'm not a Python developer and don't know Swift's internals, so it's quite
possible that I've got something wrong. In any case, here are a few
sample results: http://paste.openstack.org/show/3900/
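For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of timestamped checkpoints built on Python's standard `logging` and `time.monotonic()`. It is not the code in the paste above and not part of Swift; the `StepTimer` class and the step names are hypothetical. Each checkpoint logs the time since the previous one, so the slow step stands out directly instead of having to subtract raw timestamps.

```python
import logging
import time


class StepTimer:
    """Hypothetical helper: records named checkpoints and logs the time
    elapsed since the previous checkpoint at DEBUG level."""

    def __init__(self, logger, label):
        self.logger = logger
        self.label = label
        self.start = time.monotonic()
        self.last = self.start
        self.steps = []  # (step name, seconds since previous checkpoint)

    def checkpoint(self, name):
        now = time.monotonic()
        elapsed = now - self.last
        self.last = now
        self.steps.append((name, elapsed))
        self.logger.debug("%s: %s took %.3fs", self.label, name, elapsed)
        return elapsed

    def total(self):
        """Seconds from creation to the most recent checkpoint."""
        return self.last - self.start


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    log = logging.getLogger("object-server")
    timer = StepTimer(log, "PUT")
    time.sleep(0.05)  # stand-in for writing the object data
    timer.checkpoint("write data")
    time.sleep(0.02)  # stand-in for writing metadata
    timer.checkpoint("write metadata")
    print("total: %.3fs" % timer.total())
```

Using a monotonic clock rather than wall-clock time keeps the deltas meaningful even if the system clock is adjusted during a run.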
Basically, there are two steps that take too long:
1. Writing metadata (line #63 of the first paste, metadata.update()):
this takes 0.5-1.0 sec!
2. Updating the container (line #85, self.container_update()): this is
the second slowest, also in the range of ~0.5-1.0 sec.
I assume that self.container_update() includes the replica updates. But
why does metadata.update() take so long? Does it also involve replica
updates?
Overall, I have to say that troubleshooting Swift is impossible.
There's almost no difference between the INFO and DEBUG log levels. It
would be nice to have more information at DEBUG, and even a TRACE level,
for this kind of problem.
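Python's standard `logging` module has no TRACE level out of the box, so a finer-than-DEBUG level would have to be registered explicitly. A minimal sketch using `logging.addLevelName`; the level value 5 and the `trace` helper are assumptions for illustration, not something Swift provides.

```python
import logging

# Hypothetical TRACE level, one notch below DEBUG (10).
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


def trace(logger, msg, *args):
    """Log msg at the custom TRACE level.

    Like any level, it is dropped unless the logger's effective level
    is TRACE or lower.
    """
    logger.log(TRACE, msg, *args)


if __name__ == "__main__":
    logging.basicConfig(level=TRACE, format="%(levelname)s %(message)s")
    log = logging.getLogger("object-server")
    log.debug("PUT started")
    trace(log, "container update target %s", "10.0.1.3:6001/d01")
```

Keeping TRACE below DEBUG means existing DEBUG configurations stay exactly as noisy as before; the extra detail appears only when someone opts in.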
--
Rustam.
On 20/12/2011 04:41, Michael Barton wrote:
On Mon, Dec 19, 2011 at 6:21 AM, Rustam Aliyev <rustam@xxxxxxx> wrote:
The only things that look suspicious to me are these errors:
Dec 18 04:01:28 ec01 object-server ERROR container update failed with 10.0.1.3:6001/d01 (saving for async update later): Timeout (3s) (txn: txdf95ad5a10844ee0b74d70d8a7638082)
Dec 18 04:01:28 ec01 object-server ERROR container update failed with 10.0.1.2:6001/d01 (saving for async update later): Timeout (3s) (txn: txee2545ba4610430fa3a6a166ca50c574)
Dec 18 04:01:28 ec01 object-server ERROR container update failed with 10.0.1.8:6001/d01 (saving for async update later): Timeout (3s) (txn: tx2546b29b15c643ec90a122a753dfddd3)
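When a log contains many of these errors, tallying them per container server shows whether one node or all of them are timing out. A small sketch; the regex is written against the three sample lines above (IPv4 addresses only) and is an assumption that may need adjusting for other Swift versions' message formats. `tally_failures` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the object-server message quoted above, e.g.
# "... container update failed with 10.0.1.3:6001/d01 (saving for async
# update later): Timeout (3s) (txn: tx...)"
PATTERN = re.compile(
    r"container update failed with (?P<ip>[\d.]+):(?P<port>\d+)/(?P<device>\S+)"
    r" \(saving for async update later\): (?P<reason>.*?) \(txn: (?P<txn>\w+)\)"
)


def tally_failures(lines):
    """Count container-update failures per (ip, port, device)."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[(m["ip"], int(m["port"]), m["device"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Dec 18 04:01:28 ec01 object-server ERROR container update failed with "
        "10.0.1.3:6001/d01 (saving for async update later): Timeout (3s) "
        "(txn: txdf95ad5a10844ee0b74d70d8a7638082)",
        "Dec 18 04:01:28 ec01 object-server ERROR container update failed with "
        "10.0.1.2:6001/d01 (saving for async update later): Timeout (3s) "
        "(txn: txee2545ba4610430fa3a6a166ca50c574)",
    ]
    for (ip, port, dev), n in sorted(tally_failures(sample).items()):
        print(f"{ip}:{port}/{dev}: {n} failure(s)")
```

In practice the lines would come from the object server's syslog file rather than a hard-coded list.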
Yeah, that is likely to be the culprit. Each write is taking at least
3 seconds because it's timing out trying to update the container
servers.
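To see why an unreachable container server adds the full timeout to every write, here is a simplified, thread-based model: the caller waits up to `timeout` seconds for the container update and, if it has not succeeded by then, records it for a later async retry and moves on. This is an illustration only; Swift itself is built on eventlet and its real code differs, and `update_container_or_defer` and `async_pending` are hypothetical names.

```python
import threading
import time


def update_container_or_defer(update_fn, timeout, async_pending):
    """Run update_fn, waiting at most `timeout` seconds for it.

    Returns True if it finished successfully in time. Otherwise update_fn is
    appended to async_pending (to be retried later) and False is returned.
    When the target is hung, the caller pays the whole `timeout`.
    """
    done = threading.Event()
    result = {}

    def worker():
        try:
            update_fn()
            result["ok"] = True
        except Exception:
            result["ok"] = False  # e.g. connection refused
        finally:
            done.set()

    threading.Thread(target=worker, daemon=True).start()
    if done.wait(timeout) and result.get("ok"):
        return True
    async_pending.append(update_fn)
    return False


if __name__ == "__main__":
    pending = []
    start = time.monotonic()
    # A "hung" container server that never answers within the timeout.
    ok = update_container_or_defer(lambda: time.sleep(10), 0.3, pending)
    print(ok, len(pending), f"{time.monotonic() - start:.2f}s")
```

Note the asymmetry the model shows: a refused connection fails fast, while a silently dropped one (for example by a firewall) costs the full timeout on every request.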
So you need to debug connectivity from this object server to those IP
addresses on port 6001: check that the IP addresses and port are correct,
that everything's on the same network, that there aren't any firewall rules
blocking those connections, that the container servers are running and
accepting connections, etc. I'll read through your paste in a bit and
see if I notice anything.
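A first step in the checks listed above is simply confirming, from the object server, that a TCP connection to each container server's port can be opened at all. A minimal sketch using Python's `socket.create_connection`; the host list comes from the error log above, and `can_connect` is a hypothetical helper. A successful connect only proves something is listening on that port, not that it is a healthy container server.

```python
import socket
import time


def can_connect(host, port, timeout=3.0):
    """Try a TCP connection to host:port.

    Returns (True, "") if it opens within `timeout` seconds, otherwise
    (False, reason).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:  # covers refused, unreachable, and timed-out
        return False, str(exc) or type(exc).__name__


if __name__ == "__main__":
    # Container servers named in the error log (port 6001).
    for host in ("10.0.1.2", "10.0.1.3", "10.0.1.8"):
        start = time.monotonic()
        ok, reason = can_connect(host, 6001, timeout=1.0)
        took = time.monotonic() - start
        status = "OK" if ok else f"FAILED ({reason})"
        print(f"{host}:6001 {status} in {took:.2f}s")
```

The elapsed time is worth watching too: a quick "connection refused" points at the service not listening, whereas a failure that takes the full timeout suggests packets are being dropped, for example by a firewall.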
-- Mike
_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@xxxxxxxxxxxxxxxxxxx