group.of.nepali.translators team mailing list archive
Message #07949
[Bug 1628750] Re: Please backport fixes from 10.2.3 and tip for RadosGW
** Changed in: ceph (Ubuntu)
Status: New => Triaged
** Changed in: ceph (Ubuntu)
Importance: Undecided => Critical
** Also affects: ceph (Ubuntu Yakkety)
Importance: Critical
Status: Triaged
** Also affects: ceph (Ubuntu Xenial)
Importance: Undecided
Status: New
** Also affects: cloud-archive
Importance: Undecided
Status: New
** Also affects: cloud-archive/mitaka
Importance: Undecided
Status: New
** Changed in: cloud-archive
Status: New => Invalid
** Changed in: cloud-archive/mitaka
Status: New => Triaged
** Changed in: ceph (Ubuntu Xenial)
Status: New => Triaged
** Changed in: cloud-archive/mitaka
Importance: Undecided => Critical
** Changed in: ceph (Ubuntu Xenial)
Importance: Undecided => Critical
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1628750
Title:
Please backport fixes from 10.2.3 and tip for RadosGW
Status in Ubuntu Cloud Archive:
Invalid
Status in Ubuntu Cloud Archive mitaka series:
Triaged
Status in ceph package in Ubuntu:
Triaged
Status in ceph source package in Xenial:
Triaged
Status in ceph source package in Yakkety:
Triaged
Bug description:
We've run into significant issues with RadosGW at scale. We have a
customer with half a billion objects in ~20 TB of data, and whenever they
lost an OSD for whatever reason, even for a very short period of time,
ceph was taking hours to recover. Requests to RadosGW were hanging for
the whole time it was recovering.
I ended up cherry-picking three patches, two from 10.2.3 and one from master:
* d/p/fix-pg-temp.patch: cherry pick 56bbcb1aa11a2beb951de396b0de9e3373d91c57 from jewel.
* d/p/only-update-up_thru-if-newer.patch: cherry pick 6554d462059b68ab983c0c8355c465e98ca45440 from jewel.
* d/p/limit-omap-data-in-push-op.patch: cherry pick 38609de1ec5281602d925d20c392ba4094fdf9d3 from master.
I included the two from 10.2.3 because pg_temp was implicated in one of
the longer outages we had.
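For anyone wanting to reproduce the patch set, here is a minimal sketch of how the three commits listed above could be exported into debian/patches (the "d/p/" shorthand). It only builds the git commands and file paths and does not run them. The helper names, the use of `git format-patch`, and the series order are assumptions for illustration, not necessarily how the attached debdiff was produced.

```python
# Sketch only: helper names and workflow are assumptions, not the reporter's method.
from pathlib import PurePosixPath

# Patch file name -> (upstream commit, branch), copied from the list above.
CHERRY_PICKS = {
    "fix-pg-temp.patch":
        ("56bbcb1aa11a2beb951de396b0de9e3373d91c57", "jewel"),
    "only-update-up_thru-if-newer.patch":
        ("6554d462059b68ab983c0c8355c465e98ca45440", "jewel"),
    "limit-omap-data-in-push-op.patch":
        ("38609de1ec5281602d925d20c392ba4094fdf9d3", "master"),
}

PATCH_DIR = PurePosixPath("debian/patches")  # written as "d/p/" in the list


def format_patch_command(commit):
    """Return the git invocation that prints a single commit as a patch."""
    return ["git", "format-patch", "-1", "--stdout", commit]


def plan():
    """Yield (command, output path) pairs, one per cherry-picked commit."""
    for name, (commit, _branch) in CHERRY_PICKS.items():
        yield format_patch_command(commit), PATCH_DIR / name


def series_entries():
    """Patch names to append to debian/patches/series (order assumed)."""
    return list(CHERRY_PICKS)


if __name__ == "__main__":
    # Print shell-style lines to run from a checkout of the ceph repository.
    for cmd, out in plan():
        print(" ".join(cmd), ">", out)
```

Each printed line would be run from a ceph git checkout with the output redirected into the packaging tree, and the names from `series_entries()` appended to debian/patches/series.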
The last one is what I think actually got ceph to a stable state.
I found it via the following chain of links:
http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2016-June/010230.html
-> http://tracker.ceph.com/issues/16128
-> https://github.com/ceph/ceph/pull/9894
-> https://github.com/ceph/ceph/commit/38609de1ec5281602d925d20c392ba4094fdf9d3
With these 3 patches applied the customer has been stable for 4 days
now, but I've yet to restart the entire cluster (only the stuck OSDs),
so it's hard to be completely sure that all our issues are resolved,
or to know which of the patches fixed things.
I've attached the debdiff I used for reference.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1628750/+subscriptions