group.of.nepali.translators team mailing list archive
Message #08969
[Bug 1640518] Re: MongoDB Memory corruption
** Also affects: glibc (Ubuntu Xenial)
Importance: Undecided
Status: New
** Also affects: glibc (Ubuntu Yakkety)
Importance: Undecided
Status: New
** Changed in: glibc (Ubuntu)
Assignee: Taco Screen team (taco-screen-team) => Adam Conrad (adconrad)
** Changed in: glibc (Ubuntu Xenial)
Assignee: (unassigned) => Adam Conrad (adconrad)
** Changed in: glibc (Ubuntu Yakkety)
Assignee: (unassigned) => Adam Conrad (adconrad)
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1640518
Title:
MongoDB Memory corruption
Status in GLibC:
Unknown
Status in glibc package in Ubuntu:
New
Status in glibc source package in Xenial:
New
Status in glibc source package in Yakkety:
New
Bug description:
== Comment: #0 - Calvin L. Sze <calvins@xxxxxxxxxx> - 2016-11-01 23:09:10 ==
The team has moved to bare-metal Ubuntu 16.04. The problem still exists, so it is not related to virtualization.
Since the bug is complicated to reproduce, could we use a set of tools
to collect the data when it happens?
---Problem Description---
MongoDB has memory corruption issues that occur only on Ubuntu 16.04; they do not occur on Ubuntu 15.
Contact Information =Calvin Sze/Austin/IBM
---uname output---
Linux master 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:00:57 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = Model: 2.1 (pvr 004b 0201) Model name: POWER8E (raw), altivec supported
---System Hang---
the system is still alive
---Debugger---
A debugger is not configured
---Steps to Reproduce---
Unfortunately, not very easily. I had a test case that I was running on ubuntu1604-ppc-dev.pic.build.10gen.cc and xxxx-ppc-dev.pic.build.10gen.cc. I understand these to be two VMs running on the same physical host.
About 3.5% of the test runs on ubuntu1604-ppc-dev.pic.build.10gen.cc
would fail, but all of the runs on the other machine passed.
Originally, this failure manifested as the GCC stack protector (from
-fstack-protector-strong) claiming stack corruption.
Hoping to be able to see the data that was being written and
corrupting the stack, I manually injected a guard region into the
stack of the failing functions as follows:
+namespace {
+
+class Canary {
+public:
+
+ static constexpr size_t kSize = 1024;
+
+ explicit Canary(volatile unsigned char* const t) noexcept : _t(t) {
+ ::memset(const_cast<unsigned char*>(_t), kBits, kSize);
+ }
+
+ ~Canary() {
+ _verify();
+ }
+
+private:
+ static constexpr uint8_t kBits = 0xCD;
+ static constexpr size_t kChecksum = kSize * size_t(kBits);
+
+ void _verify() const noexcept {
+ invariant(std::accumulate(&_t[0], &_t[kSize], 0UL) == kChecksum);
+ }
+
+ const volatile unsigned char* const _t;
+};
+
+} // namespace
+
Status bsonExtractField(const BSONObj& object, StringData fieldName, BSONElement* outElement) {
+
+ volatile unsigned char* const cookie = static_cast<unsigned char *>(alloca(Canary::kSize));
+ const Canary c(cookie);
+
When running with this, the invariant would sometimes fire. Examining
the stack cookie under the debugger would show two consecutive bytes,
always at an offset ending 0x...e, written as either 0 0, or 0 1,
somewhere at random within the middle of the cookie.
This indicated that it was not a conventional stack smash, where we
were writing past the end of a contiguous buffer. Instead it appeared
that either the currently running thread had reached up some arbitrary
and random amount on the stack and done either two one-byte writes, or
an unaligned 2-byte write. Another possibility was that a local
variable had been transferred to another thread, which had written to
it.
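A checksum mismatch only says that something changed. To record which bytes changed and what they became, the check can be extended to scan the buffer. The following is a minimal sketch, not MongoDB code; the names `Mismatch` and `findMismatches` are hypothetical and introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of MongoDB: one altered byte in a canary.
struct Mismatch {
    std::size_t offset;  // byte offset from the start of the canary
    std::uint8_t value;  // value found instead of the fill pattern
};

// Scan `size` bytes at `buf` and collect every byte that differs from `fill`.
// Reads go through a volatile pointer so the compiler cannot elide them.
inline std::vector<Mismatch> findMismatches(const volatile unsigned char* buf,
                                            std::size_t size,
                                            std::uint8_t fill) {
    std::vector<Mismatch> out;
    for (std::size_t i = 0; i < size; ++i) {
        const std::uint8_t v = buf[i];
        if (v != fill) {
            out.push_back({i, v});
        }
    }
    return out;
}
```

Called as `findMismatches(cookie, Canary::kSize, 0xCD)` just before the invariant fires, this would let the offsets and values be logged without needing a debugger on the core file.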
However, while looking at the code to find such a thing, I realized
that there was another possibility, which was that the bytes had never
been written correctly in the first place. I changed the stack canary
constructor to be:
+ explicit Canary(volatile unsigned char* const t) noexcept : _t(t) {
+ ::memset(const_cast<unsigned char*>(_t), kBits, kSize);
+ _verify();
+ }
So that immediately after writing the byte pattern to the stack
buffer, we verified the contents we wrote. Amazingly, this *failed*,
with the same corruption as seen before. This means one of three
things: something overwrote the stack cookie region between the time
we called memset and the time we read the bytes back; memset never
wrote the bytes correctly; or memset wrote the bytes, but the
underlying physical memory never took the write.
Stack trace output:
no
Oops output:
no
Userspace tool common name: MongoDB
Userspace rpm: mongod
The userspace tool has the following bit modes: 64bit
System Dump Info:
The system is not configured to capture a system dump.
Userspace tool obtained from project website: na
*Additional Instructions for Lilian Romero/Austin/IBM:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach sysctl -a output to the bug.
-Attach ltrace and strace of userspace application.
== Comment: #1 - Luciano Chavez <chavez@xxxxxxxxxx> - 2016-11-02 08:41:47 ==
Normally, for userspace memory corruption problems I would recommend Valgrind's memcheck tool. However, if this works on other versions of Linux, one would want to compare the differences, such as whether you are using the same versions of MongoDB, gcc, glibc and the kernel.
Has a standalone testcase been produced that shows the issue without
mongodb?
== Comment: #2 - Steven J. Munroe <sjmunroe@xxxxxxxxxx> - 2016-11-02 10:27:40 ==
We really need that standalone test case.
Need to look at WHAT c++ is doing with memset. I suspect the compiler
is short circuiting the function and inlining. That is what you would
want for optimization, but we need to know so we can steer this to the
correct team.
== Comment: #3 - Calvin L. Sze <calvins@xxxxxxxxxx> - 2016-11-02 13:17:30 ==
Hi Luciano and Steve, thanks for the advice.
They don't have a standalone test case without MongoDB; I imagine it
would take a while and probably would not be easy to produce. I am
seeking your advice on how to approach this. The failure takes at
least 24 - 48 hours of running to reproduce. Steve, do you have what
you need for the C++ test, or is there something I need to ask the
Mongo development team?
Thanks
== Comment: #4 - William J. Schmidt <wschmidt@xxxxxxxxxx> - 2016-11-02 16:29:26 ==
(In reply to comment #3)
> Hi Luciano and Steve, thanks for the advice.
>
> They don't have a standalone test case without MongoDB; I imagine it
> would take a while and probably would not be easy to produce. I am seeking
> your advice on how to approach this. The failure takes at least 24 - 48
> hours of running to reproduce. Steve, do you have what you need for the C++
> test, or is there something I need to ask the Mongo development team?
>
> Thanks
It's unclear to me yet that we have evidence of this being a problem
in the toolchain. Does the last experiment (revised Canary
constructor) ALWAYS fail, or does it also fail only every 24 - 48
hours? If the latter, then all we know is that stack corruption
happens. There's no indication of where the wild pointer is coming
from (application problem, compiler problem, etc.). If it does always
fail, however, then I question the assertion that they can't provide a
standalone test case.
We need something more concrete to work with.
Bill
== Comment: #5 - Calvin L. Sze <calvins@xxxxxxxxxx> - 2016-11-03 18:08:33 ==
Could this ticket be viewed by an external customer/ISV?
I am thinking about how to establish direct communication between the MongoDB development team and the experts/owner of the ticket, to bypass the middleman, me :-)
Here are the answers from Andrew, MongoDB's development director, to
my 3 questions, along with his additional comments.
Basically, there are 3 questions,
> 1. Is the MongoDB binary built with the gcc that came with the Linux
distribution, or with the IBM Advance Toolchain gcc?
We build our own GCC, but we have reproduced the issue with both our custom GCC, and the builtin linux distribution GCC. We have also reproduced with clang 3.9 built from source on the Ubuntu 16.04 POWER machine, so we do not think that this is a compiler issue (could still be a std library issue).
> 2. Does the last experiment (revised Canary constructor) ALWAYS fail, or does it also fail only every 24 - 48 hours?
No, we have never been able to construct a deterministic repro. We are
only able to get it to fail after running the test a very large number
of times.
> 3. Is there any way we can have a standalone test case without
MongoDB?
We do not have such a repro at this time.
I do understand the position they are taking - it isn't a lot of
information to go on, and most of the time the correct response to a
mysterious software crash is to blame the software itself, not the
surrounding ecosystem. However, we have a lot of *indirect* evidence
that has made us skeptical that this is our bug. We would love to be
proved wrong!
- The stack corruption has not reproduced on any other systems. We are running these same tests on every commit across dozens of Linux variants, and across four cpu architectures (x86_64, POWER, zSeries, ARMv8).
- We don't see crashes on other POWER, but we do on Ubuntu POWER.
- We don't see crashes on Windows, Solaris, OS X
- We have run the tests under the clang address sanitizer, with no reports.
- We have enabled the clang address sanitizer use-after-return detector, and found no results.
If this were a wild pointer in the MongoDB server process that was writing to the stack of other threads, we would expect to see corruption show up elsewhere, but we simply do not.
However, let's assume that this is a bug in our code that, for whatever
reason, only reveals itself on POWER, and only on Ubuntu. We would
still be interested in learning from the kernel team if there are
additional POWER-specific debugging techniques that we might be able
to apply. In particular, the ability to programmatically set/unset
hardware watchpoints over the stack canary. Another possibility would
be to mprotect the stack canary, but it is not clear to us whether it
is valid to mprotect part of the stack, either in general, or on
POWER.
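One way to experiment with the mprotect idea without touching the thread stack is to put a canary on its own anonymous mapping and make it read-only. The sketch below is an illustration under that assumption, not MongoDB code, and `GuardPage` is a hypothetical name. It does not answer whether mprotect may be applied to part of a stack; it sidesteps the question. Since mprotect works on whole pages, the page size is queried at runtime rather than assumed.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstring>

// Hypothetical sketch, not MongoDB code: a canary that lives on its own
// anonymous mapping and is made read-only after being filled.
class GuardPage {
public:
    GuardPage() : _size(static_cast<std::size_t>(::sysconf(_SC_PAGESIZE))) {
        void* p = ::mmap(nullptr, _size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            return;  // leave _page null; ok() reports the failure
        }
        _page = static_cast<unsigned char*>(p);
        std::memset(_page, kFill, _size);
        // From here on, a user-space store to the page faults (SIGSEGV)
        // at the storing instruction instead of silently succeeding.
        _readOnly = (::mprotect(_page, _size, PROT_READ) == 0);
    }

    ~GuardPage() {
        if (_page != nullptr) {
            ::munmap(_page, _size);
        }
    }

    GuardPage(const GuardPage&) = delete;
    GuardPage& operator=(const GuardPage&) = delete;

    // True if the page was mapped and successfully made read-only.
    bool ok() const { return _page != nullptr && _readOnly; }

    // True if every byte still holds the fill pattern.
    bool intact() const {
        if (_page == nullptr) return false;
        for (std::size_t i = 0; i < _size; ++i) {
            if (_page[i] != kFill) return false;
        }
        return true;
    }

private:
    static constexpr unsigned char kFill = 0xCD;

    std::size_t _size;
    unsigned char* _page = nullptr;
    bool _readOnly = false;
};
```

The trade-off is that this only catches a stray store that happens to land on the guard mapping, so it is a complement to the on-stack canary rather than a replacement for it.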
We would be happy to hear any suggestions on how to proceed.
Thanks,
Andrew
== Comment: #6 - Steven J. Munroe <sjmunroe@xxxxxxxxxx> - 2016-11-03 18:34:30 ==
You could tell us which specific GCC version you are based on, and its configure options.
You could also provide the disassembly of the canary code.
== Comment: #7 - William J. Schmidt <wschmidt@xxxxxxxxxx> - 2016-11-03 23:01:55 ==
It would be useful to see what the Canary is compiled into, as Steve suggested. Let's make sure it's doing what we think it is.
Given we have multiple compilers producing the same results, we may
want to think more about the runtime environment -- are you using the
same glibc and libstdc++ in all cases? Clang at least would pick up
the distro versions, as it doesn't provide its own.
One reason you see this on Ubuntu 16.04 and not on another linux
distro is likely because of glibc level. The other linux's glibc is
quite old by comparison. glibc 2.23, which appears on Ubuntu 16.04,
is the first version to be compiled with -fstack-protector-strong by
default. So this doesn't necessarily mean that the bug doesn't exist
elsewhere; it just means that the stack protector code isn't enabled
to spot the problem. If the stack corruption is benign, then it
wouldn't be noticed otherwise.
I assume that glibc 2.23 was compiled with Ubuntu's version of gcc 5
that ships with the system, in case that becomes relevant.
I don't personally have a lot of experience with trying to debug
something of this nature, in case we don't see something obvious from
the disassembly of the canary. CCing Ulrich Weigand in case he has
some ideas of other approaches to try.
== Comment: #9 - Ulrich Weigand <Ulrich.Weigand@xxxxxxxxxx> - 2016-11-04 12:21:48 ==
I don't really have any other great ideas either. Just two comments:
- Even though the original reporter mentioned they already tried
clang's address sanitizer, I'd definitely still also try reproducing
the problem under valgrind -- the two are different in what exactly
they detect, and using both tools in a complex problem can only help.
- The Canary code sample above has strictly speaking undefined
behavior, I think: it is calling memset on a const *. (The const_cast
makes the warning go away, but doesn't actually cure the undefined
behavior.) I don't *think* this will cause codegen changes in this
example, but it cannot hurt to try to fix this and see if anything
changes.
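If the cast is the concern, one way to take it out of the picture is to write through the volatile pointer directly. The following is a minimal sketch with hypothetical helper names (`fillVolatile`, `sumVolatile`), not the fix that was actually applied. Note that it also removes the memset call, so it changes the experiment as well as the qualifiers.

```cpp
#include <cstddef>
#include <cstdint>

// Fill `size` bytes with `fill` using volatile stores, so no cast is needed
// and the compiler may not drop or merge the writes.
inline void fillVolatile(volatile unsigned char* buf,
                         std::size_t size,
                         std::uint8_t fill) {
    for (std::size_t i = 0; i < size; ++i) {
        buf[i] = fill;
    }
}

// Sum `size` bytes using volatile loads, mirroring the checksum in _verify().
inline std::size_t sumVolatile(const volatile unsigned char* buf,
                               std::size_t size) {
    std::size_t sum = 0;
    for (std::size_t i = 0; i < size; ++i) {
        sum += buf[i];
    }
    return sum;
}
```

With these, the constructor body would become `fillVolatile(_t, kSize, kBits)` followed by a comparison of `sumVolatile(_t, kSize)` against `kChecksum`.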
== Comment: #12 - Calvin L. Sze <calvins@xxxxxxxxxx> - 2016-11-06 10:32:25 ==
Hi Bill, Thanks
I have asked Andrew, waiting for his confirmation.
== Comment: #14 - Calvin L. Sze <calvins@xxxxxxxxxx> - 2016-11-06 10:56:49 ==
Hi Calvin -
I can provide the assembly of the function that contains the canary (the canary itself gets inlined), but I think it might just be easier if I uploaded a binary and an associated corefile? That way your engineers could disassemble the crashing function themselves in the debugger and see exactly what the state was at the time of the crash.
What is the best way for me to get that information to you?
Thanks,
Andrew
== Comment: #15 - Calvin L. Sze <calvins@xxxxxxxxxx> - 2016-11-06 10:58:54 ==
Provided the binary and core information.
Note from Mongo:
I've uploaded a sample core file and the associated binary to your ftp
server as detailed above. The binary is named `mongod.power` and the core is
named `mongod.power.core`.
You should expect to see a backtrace on the faulting thread which looks
like this (for the first few frames):
(gdb) bt
#0 0x00003fff997be5d0 in __libc_signal_restore_set (set=0x3fff5814c1f0)
at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
#1 __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:55
#2 0x00003fff997c0c00 in __GI_abort () at abort.c:89
#3 0x00000000223c33e8 in mongo::invariantFailed (expr=<optimized out>,
file=0x24131b38 "src/mongo/bson/util/bson_extract.cpp",
line=<optimized out>) at src/mongo/util/assert_util.cpp:154
#4 0x00000000224bbc48 in mongo::(anonymous namespace)::Canary::_verify (
this=<optimized out>) at src/mongo/bson/util/bson_extract.cpp:58
The "Canary::_verify" frame (number 4) has a local variable "_t" which is an
on-the-stack array and filled with "0xcd" for a span of 1024 bytes. Near the
end of this block we see two bytes of poisoned memory which were altered:
0x3fff5814c858: 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd
0x3fff5814c860: 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd
0x3fff5814c868: 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd 0x01 0x00
0x3fff5814c870: 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd
0x3fff5814c878: 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd 0xcd
Note the two bytes set to values "0x01" and "0x00".
At the time of core-dump all the other threads seemed to be paused on system
calls such as "recv" or "__pthread_cond_wait". The verify function is called
when setting up our software canary, and checks the memory immediately after
its setup. We do not run any other functions on this thread between the
memory poisoning and the verification of the poisoning. All other threads
appear to be paused at this time.
== Comment: #16 - Calvin L. Sze <calvins@xxxxxxxxxx> - 2016-11-06 10:59:40 ==
A follow-up message from Mongo:
The function calling the canary code, which you may want to
disassemble, is in frame 6:
#6 mongo::bsonExtractStringField (object=..., fieldName=...,
out=0x3fff5814caa8) at src/mongo/bson/util/bson_extract.cpp:138
The lower numbered frames deal with the canary code
itself.
== Comment: #17 - Calvin L. Sze <calvins@xxxxxxxxxx> - 2016-11-06 11:03:46 ==
From Andrew,
>Given we have multiple compilers producing the same results, we may want to
>think more about the runtime environment -- are you using the same glibc and
>libstdc++ in all cases? Clang at least would pick up the distro versions, as
>it doesn't provide its own.
We have repro'd with three compilers:
- The system GCC, using system libstdc++ and system glibc
- Our hand-rolled GCC, using its own libstdc++, and system glibc
- One off clang-3.9 build, using system libstdc++, and system glibc.
Coincidentally, both system and hand-rolled GCC are 5.4.0, so there may not be as much variation there as hoped. We could try building with clang and libc++ to at least rule out libstdc++ as a factor.
>One reason you see this on Ubuntu 16.04 and not on the other linux distro is likely because of
>glibc level. The other linux distro's glibc is quite old by comparison. glibc 2.23, which
>appears on Ubuntu 16.04, is the first version to be compiled with
>-fstack-protector-strong by default.
I'm not sure I follow. Our software has been built with -fstack-protector-strong on both platforms, whether or not glibc has been, and the invocation of the __stack_chk_fail function is always from our code, not from glibc, or libstdc++. So, I'd expect that if there were stack corruption taking place as a result of our code, we would see the stack protector trip on both platforms. Or are you saying that on platforms where glibc itself wasn't built with -fstack-protector-whatever that user code built with that same flag won't report errors?
>So this doesn't necessarily mean that the
>bug doesn't exist elsewhere; it just means that the stack protector code isn't
>enabled to spot the problem. If the stack corruption is benign, then it
>wouldn't be noticed otherwise.
Yeah, still confused. I can definitely make the other linux distro box
report a stack corruption:
[amorrow@xxxxxxxxxxxxxxxx.build ~]$ cat > boom.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct no_chars {
unsigned int len;
unsigned int data;
};
int main(int argc, char * argv[])
{
struct no_chars info = { };
if (argc < 3) {
fprintf(stderr, "Usage: %s LENGTH DATA...\n", argv[0]);
return 1;
}
info.len = atoi(argv[1]);
memcpy(&info.data, argv[2], info.len);
return 0;
}
[amorrow@xxxxxxxxxxxxxxxxxx.build ~]$ gcc -Wall -O2 -U_FORTIFY_SOURCE -fstack-protector-strong boom.c -o boom
[amorrow@xxxxxxxxxxxxxxxxxx.build ~]$ ./boom 64 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
*** stack smashing detected ***: ./boom terminated
Segmentation fault
>I assume that glibc 2.23 was compiled with Ubuntu's version of gcc 5 that ships
>with the system, in case that becomes relevant.
Correct, we have not made any changes to glibc - we are using the stock version that ships on the system.
== Comment: #18 - Calvin L. Sze <calvins@xxxxxxxxxx> - 2016-11-06 11:04:24 ==
From Andrew
Also, I want to re-iterate that while we have definitely observed
cases where the stack protector detects the stack corruption, we have
also observed stack corruption within our own hand-rolled stack
buffer, per the code posted earlier. The core dump that Adam provided
is of this latter sort. So to some extent, this is independent of
-fstack-protector-strong.
One thing that I have not yet ruled out is whether -fstack-protector-strong could itself be at fault, somehow, though I find that unlikely given that we have reproduced with clang as well.
Still, it sounds like a worthwhile experiment, so I will see if I can still detect corruption in our hand-rolled stack canary when building without any form of -fstack-protector enabled.
== Comment: #19 - Calvin L. Sze <calvins@xxxxxxxxxx> - 2016-11-06 11:05:58 ==
From Andrew,
I've performed this experiment, replacing our use of -fstack-protector-strong with -fno-stack-protector when building MongoDB, and I can confirm that we still observe stack corruption in our hand-rolled canary, per the code posted earlier.
I have a core file and executable. Let me know if you would be interested in my providing those in addition to the files provided yesterday by Adam.
== Comment: #21 - William J. Schmidt <wschmidt@xxxxxxxxxx> - 2016-11-07 11:10:54 ==
Andrew, thanks for all the details, and for the binary and core file! I'll start poking through them this morning. I've just been absorbing all the notes that Calvin dumped into our bug tracking system yesterday.
You can ignore what I was saying about -fstack-protector-strong. My
thought at the time was that *if* the flow of control entered glibc,
then whether or not the code *there* was compiled with
-fstack-protector-strong might make a difference. Reading back
through today, I see that was off base, so sorry for the distraction.
While I'm looking at the binary, there are a couple of other things you might want to try:
- Replace ::memset with __builtin_memset with GCC to see whether that makes any difference;
- Try Ulrich Weigand's suggestions from comment #9;
- As you suggested, try clang + libc++ to try to rule libstdc++ in or out.
A couple of questions that may or may not prove relevant:
- You've mentioned you don't get the crashes on the other linux distro. Have you tried your modified canary on the other linux distro anyway? If we're certain the two systems behave differently with the canary that may help us in narrowing things down.
- Which version of the C++ standard are you compiling against? Is it just the default on all systems, or are you forcing a specific -std=...?
== Comment: #22 - William J. Schmidt <wschmidt@xxxxxxxxxx> - 2016-11-07 12:18:41 ==
I'm having some difficulties with core file compatibility. I put your files on an Ubuntu 16.04.1 system, but I don't see quite the same results as you report under gdb, with libc and libgcc shared libs not at the correct address and a problem with the stack. There's a transcript below. I'm particularly concerned about the warning that the core file and executable may not match. Note also the report of stack corruption above frame #4, so I can't get to frame #6 to look at the register state. The library frames at #0-#3 are reporting the wrong information, which I assume to be because the libraries are at the wrong address.
For debug purposes it would probably be best to use the system
compiler, just in case that wasn't the case here.
$ ls -l
total 1950688
-rw-r--r-- 1 wschmidt wschmidt 700141992 Nov 7 14:37 mongod.power
-rw-r--r-- 1 wschmidt wschmidt 1297350656 Nov 7 14:39 mongod.power.core
$ gdb mongod.power mongod.power.core
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mongod.power...done.
warning: core file may not match specified executable file.
[New LWP 101461]
[New LWP 100045]
[New LWP 100062]
[New LWP 100056]
[New LWP 99983]
[New LWP 100052]
[New LWP 100054]
[New LWP 99892]
[New LWP 100051]
[New LWP 100048]
[New LWP 100007]
[New LWP 99868]
[New LWP 100059]
[New LWP 101459]
[New LWP 100001]
[New LWP 99986]
[New LWP 101403]
[New LWP 99980]
[New LWP 99882]
[New LWP 99893]
[New LWP 99877]
[New LWP 99872]
[New LWP 101462]
[New LWP 99874]
[New LWP 100058]
[New LWP 100231]
[New LWP 99994]
[New LWP 99873]
[New LWP 100003]
[New LWP 99993]
[New LWP 99879]
[New LWP 101398]
[New LWP 99891]
[New LWP 99880]
[New LWP 99910]
[New LWP 99895]
[New LWP 99901]
[New LWP 100011]
[New LWP 99974]
[New LWP 100049]
[New LWP 99898]
[New LWP 99875]
[New LWP 101460]
[New LWP 99878]
[New LWP 99871]
[New LWP 99896]
[New LWP 101954]
[New LWP 101406]
[New LWP 100015]
[New LWP 100068]
[New LWP 99984]
[New LWP 101519]
[New LWP 100053]
[New LWP 99996]
[New LWP 100050]
[New LWP 100055]
[New LWP 100057]
[New LWP 101807]
[New LWP 99890]
[New LWP 100004]
[New LWP 99884]
[New LWP 101437]
[New LWP 101455]
[New LWP 100013]
[New LWP 99894]
[New LWP 101411]
[New LWP 101457]
[New LWP 101431]
[New LWP 101458]
[New LWP 100443]
[New LWP 101438]
[New LWP 101414]
[New LWP 101433]
[New LWP 101784]
[New LWP 99979]
[New LWP 101397]
[New LWP 101402]
[New LWP 101401]
[New LWP 101435]
[New LWP 101405]
[New LWP 101423]
[New LWP 101425]
[New LWP 99897]
[New LWP 101419]
[New LWP 99989]
[New LWP 101409]
[New LWP 100008]
[New LWP 101410]
[New LWP 99998]
[New LWP 101413]
[New LWP 101469]
[New LWP 101418]
[New LWP 101427]
[New LWP 101399]
[New LWP 101235]
[New LWP 101396]
[New LWP 101421]
[New LWP 99990]
[New LWP 101407]
[New LWP 101480]
[New LWP 100060]
[New LWP 101499]
[New LWP 101506]
[New LWP 101395]
[New LWP 101415]
[New LWP 101400]
[New LWP 101412]
[New LWP 101408]
[New LWP 101420]
[New LWP 101416]
[New LWP 101492]
[New LWP 101513]
[New LWP 101782]
[New LWP 101404]
[New LWP 101481]
[New LWP 101417]
[New LWP 100067]
[New LWP 101429]
[New LWP 99883]
[New LWP 101430]
[New LWP 101436]
[New LWP 101454]
[New LWP 101428]
[New LWP 101422]
[New LWP 100108]
[New LWP 101434]
[New LWP 100064]
[New LWP 101453]
[New LWP 100061]
[New LWP 101426]
[New LWP 100066]
[New LWP 101452]
[New LWP 101439]
[New LWP 101456]
[New LWP 101451]
[New LWP 101450]
[New LWP 101432]
[New LWP 101449]
[New LWP 101424]
[New LWP 100065]
[New LWP 100063]
[New LWP 101448]
[New LWP 101447]
[New LWP 101446]
[New LWP 101445]
[New LWP 101444]
[New LWP 101443]
[New LWP 101442]
[New LWP 101441]
[New LWP 101440]
warning: .dynamic section for "/lib/powerpc64le-linux-gnu/libgcc_s.so.1" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/lib/powerpc64le-linux-gnu/libc.so.6" is not at the expected address (wrong library or version mismatch?)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
Core was generated by `/home/pic1user/proj/mongo-repro/mongod --oplogSize 1024 --port 30012 --nopreall'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00003fff997be5d0 in __copysign (y=<optimized out>, x=<optimized out>)
at ../sysdeps/generic/math_private.h:233
233 ../sysdeps/generic/math_private.h: No such file or directory.
[Current thread is 1 (Thread 0x3fff5814ec20 (LWP 101461))]
(gdb) bt
#0 0x00003fff997be5d0 in __copysign (y=<optimized out>, x=<optimized out>)
at ../sysdeps/generic/math_private.h:233
#1 __modf_power5plus (x=-6.2774385622041925e+66, iptr=0x3fff5814c1f0)
at ../sysdeps/powerpc/power5+/fpu/s_modf.c:44
#2 0x00003fff997be4f0 in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6
#3 0x00003fff997c0c00 in ?? () at ../signal/allocrtsig.c:45
from /lib/powerpc64le-linux-gnu/libc.so.6
#4 0x00000000223c33e8 in mongo::invariantFailed (expr=<optimized out>,
file=0x24131b38 "src/mongo/bson/util/bson_extract.cpp",
line=<optimized out>) at src/mongo/util/assert_util.cpp:154
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) quit
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/powerpc64le-linux-gnu/5/lto-wrapper
Target: powerpc64le-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-ppc64el/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-ppc64el --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-ppc64el --with-arch-directory=ppc64le --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-secureplt --with-cpu=power8 --enable-targets=powerpcle-linux --disable-multilib --enable-multiarch --disable-werror --with-long-double-128 --enable-checking=release --build=powerpc64le-linux-gnu --host=powerpc64le-linux-gnu --target=powerpc64le-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2)
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial
$
I'll disassemble the binary and see if I can spot anything without the state information.
Oh, still waiting on permission to mirror the bug.
== Comment: #23 - William J. Schmidt <wschmidt@xxxxxxxxxx> - 2016-11-07 13:39:45 ==
A little more information:
I've been looking at bsonExtractStringField's disassembly. It appears
that this binary inlines the call to the Canary constructor as well as
the call to _verify. As evidence, I see the PLT call to glibc's
memset:
8ebb3c: 71 c9 06 48 bl 9584ac <00000d72.plt_call.memset@@GLIBC_2.17>
And later I see the call to invariantFailed:
8ebc44: e9 75 f0 4b bl 7f322c <_ZN5mongo15invariantFailedEPKcS1_j+0x8>
So we've answered Steve's initial question about which memset we're
using. This isn't being inlined by the compiler, but does an
out-of-line dynamic call to the GLIBC_2.17 version.
I'm not sure whether GCC would inline a 1024-byte memset using
__builtin_memset, or just end up calling out the same way, but it
might be worth trying out that replacement, and disassembling
bsonExtractStringField again to see if the PLT call has gone away.
== Comment: #24 - William J. Schmidt <wschmidt@xxxxxxxxxx> - 2016-11-07 13:50:04 ==
I forgot to mention that the ensuing code generation to accumulate the checksum and test it is completely straightforward and looks correct. So this looks like pretty strong evidence that the problem is in the GLIBC memset implementation.
8ebb3c: 71 c9 06 48 bl 9584ac <00000d72.plt_call.memset@@GLIBC_2.17>
8ebb40: 18 00 41 e8 ld r2,24(r1)
8ebb44: 00 04 40 39 li r10,1024
8ebb48: 00 00 20 39 li r9,0
8ebb4c: a6 03 49 7d mtctr r10
8ebb50: 00 00 43 89 lbz r10,0(r3)
8ebb54: 01 00 63 38 addi r3,r3,1
8ebb58: 14 52 29 7d add r9,r9,r10
8ebb5c: f4 ff 00 42 bdnz 8ebb50 <_ZN5mongo22bsonExtractStringFieldERKNS_7BSONObjENS_10StringDataEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x80>
8ebb60: 03 00 40 3d lis r10,3
8ebb64: 00 34 4a 61 ori r10,r10,13312
8ebb68: 00 50 a9 7f cmpd cr7,r9,r10
8ebb6c: c4 00 9e 40 bne cr7,8ebc30 <_ZN5mongo22bsonExtractStringFieldERKNS_7BSONObjENS_10StringDataEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x160>
...
8ebc30: 44 ff 82 3c addis r4,r2,-188
8ebc34: 44 ff 62 3c addis r3,r2,-188
8ebc38: 3a 00 a0 38 li r5,58
8ebc3c: 38 aa 84 38 addi r4,r4,-21960
8ebc40: 60 aa 63 38 addi r3,r3,-21920
8ebc44: e9 75 f0 4b bl 7f322c <_ZN5mongo15invariantFailedEPKcS1_j+0x8>
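As a cross-check of the listing above, the constant built by `lis r10,3` and `ori r10,r10,13312` can be compared with the checksum the Canary class expects. `lis` loads its immediate shifted left by 16 bits and `ori` ORs in the low 16 bits. The sketch below redoes that arithmetic; `lisOri` is a hypothetical helper name, and the constants are copied from the original 1024-byte Canary listing. The loop count loaded by `li r10,1024` / `mtctr r10` matches `kSize`, and the `li r5,58` before the call appears to correspond to line 58 of bson_extract.cpp reported in the backtrace.

```cpp
#include <cstddef>
#include <cstdint>

// lis rD,hi loads (hi << 16); a following ori ORs in the low 16 bits.
// Valid as written for small positive `hi`, which is the case here.
constexpr std::uint64_t lisOri(std::uint64_t hi, std::uint64_t lo) {
    return (hi << 16) | lo;
}

// Values from the original Canary listing.
constexpr std::size_t kSize = 1024;
constexpr std::uint8_t kBits = 0xCD;
constexpr std::uint64_t kChecksum = kSize * std::uint64_t(kBits);

// 3 << 16 = 196608, and 196608 | 13312 = 209920 = 1024 * 205.
static_assert(lisOri(3, 13312) == 209920, "constant built by lis/ori");
static_assert(kChecksum == lisOri(3, 13312), "matches kSize * kBits");
```

Both assertions hold at compile time, which is consistent with the observation that the checksum code generation is straightforward.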
== Comment: #28 - William J. Schmidt <wschmidt@xxxxxxxxxx> - 2016-11-08 11:02:18 ==
Recording some information from email discussions.
(1) The customer is planning to attempt to use valgrind memcheck.
(2) The const cast problem with the canary has been fixed without changing the results.
(3) Prior to that fix, the canary was used on the RHEL system with no corruption detected, so this does seem to be Ubuntu-specific.
(4) -std=c++11 is used everywhere.
(5) The core and binary compatibility issues appear to be that they were generated on 16.10, not 16.04. New ones coming.
(6) The canary code now looks like:
+namespace {
+
+class Canary {
+public:
+
+ static constexpr size_t kSize = 2048;
+
+ explicit Canary(volatile unsigned char* const t) noexcept : _t(t) {
+ __builtin_memset(const_cast<unsigned char*>(t), kBits, kSize);
+ _verify();
+ }
+
+ ~Canary() {
+ _verify();
+ }
+
+private:
+ static constexpr uint8_t kBits = 0xCD;
+ static constexpr size_t kChecksum = kSize * size_t(kBits);
+
+ void _verify() const noexcept {
+ invariant(std::accumulate(&_t[0], &_t[kSize], 0UL) == kChecksum);
+ }
+
+ const volatile unsigned char* const _t;
+};
+
+} // namespace
+
And its application in bsonExtractTypedField looks like:
@@ -47,6 +82,10 @@ Status bsonExtractTypedField(const BSONObj& object,
StringData fieldName,
BSONType type,
BSONElement* outElement) {
+
+ volatile unsigned char* const cookie = static_cast<unsigned char *>(alloca(Canary::kSize));
+ const Canary c(cookie);
+
Status status = bsonExtractField(object, fieldName, outElement);
(7) Steve Munroe investigated memset and he and Andrew are in
agreement that we can rule it out:
I looked at the memset_power8 code (memset is just an IFUNC resolver
stub), and I don't see how this problem is caused by memset_power8.
First some observations:
The canary is allocated with alloca for a large power of 2 (1024 bytes).
Alloca returns quadword aligned memory as required to maintain quadword stack alignment.
For this case memset_power8 will quickly jump to the vector store loop (quadword x 8) all from the same register (a vector splat of the fill char).
With this code the failure modes could only be:
Overwrite by N*quadwords,
Underwrite by N*quadwords,
A repeated pattern every quadword.
But we are not seeing this. I also think we are back to a clobber by some
other code.
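One way to test the quadword argument against an actual failure would be to record the shape of the damage rather than only a checksum mismatch. The following is a rough sketch, not code from the MongoDB tree: Damage, findDamage and looksLikeSkippedQuadwords are hypothetical names, and the check covers only the "underwrite by N*quadwords" case (a contiguous, 16-byte-aligned run of unfilled bytes whose length is a multiple of 16).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical summary of how a canary buffer differs from its fill pattern.
struct Damage {
    std::size_t first = 0;  // offset of the first mismatching byte
    std::size_t last = 0;   // offset of the last mismatching byte
    std::size_t count = 0;  // total number of mismatching bytes
};

// Scan the buffer and record where it no longer holds the fill byte.
inline Damage findDamage(const volatile unsigned char* buf,
                         std::size_t size,
                         unsigned char fill) {
    Damage d;
    for (std::size_t i = 0; i < size; ++i) {
        if (buf[i] != fill) {
            if (d.count == 0) d.first = i;
            d.last = i;
            ++d.count;
        }
    }
    return d;
}

// True when the damage is one contiguous run that starts on a 16-byte
// boundary and spans a whole number of quadwords, i.e. the shape a skipped
// vector store would leave. Anything else points at some other writer.
inline bool looksLikeSkippedQuadwords(const volatile unsigned char* buf,
                                      const Damage& d) {
    if (d.count == 0) return false;
    const std::size_t span = d.last - d.first + 1;
    const auto addr = reinterpret_cast<std::uintptr_t>(buf + d.first);
    return d.count == span && addr % 16 == 0 && span % 16 == 0;
}
```

If something like findDamage were called from the canary's verification step before the invariant fires, the logged offsets would show directly whether the corruption is quadword-shaped or an irregular clobber.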
== Comment: #29 - William J. Schmidt <wschmidt@xxxxxxxxxx> - 2016-11-08 11:03:33 ==
From Andrew, difficulties with Valgrind:
I did try the valgrind repro. However, I'm not able to make valgrind
work:
The first try resulted in lots of "mismatched free/delete" reports,
which is sort of odd, because they all seem to be from within the
standard library:
> valgrind --soname-synonyms=somalloc=NONE --track-origins=yes --leak-check=no ./mongos
==17387== Memcheck, a memory error detector
==17387== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17387== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==17387== Command: ./mongos
==17387==
==17387== Mismatched free() / delete / delete []
==17387== at 0x4895888: free (in /usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
==17387== by 0x59514F: deallocate (new_allocator.h:110)
==17387== by 0x59514F: deallocate (alloc_traits.h:517)
==17387== by 0x59514F: _M_deallocate_buckets (hashtable_policy.h:2010)
==17387== by 0x59514F: _M_deallocate_buckets (hashtable.h:356)
==17387== by 0x59514F: _M_deallocate_buckets (hashtable.h:361)
==17387== by 0x59514F: _M_rehash_aux (hashtable.h:1999)
==17387== by 0x59514F: std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_rehash(unsigned long, unsigned long const&) (hashtable.h:1953)
==17387== by 0x595253: std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, true>*) (hashtable.h:1600)
==17387== by 0x5954D3: std::__detail::_Map_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (hashtable_policy.h:600)
==17387== by 0x593693: operator[] (unordered_map.h:668)
==17387== by 0x593693: mongo::InitializerDependencyGraph::addInitializer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<mongo::Status (mongo::InitializerContext*)> const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (initializer_dependency_graph.cpp:58)
==17387== by 0x591057: mongo::GlobalInitializerRegisterer::GlobalInitializerRegisterer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<mongo::Status (mongo::InitializerContext*)> const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (global_initializer_registerer.cpp:44)
==17387== by 0x52D46F: __static_initialization_and_destruction_0(int, int) [clone .constprop.34] (mongos_options_init.cpp:39)
==17387== by 0x137FED3: __libc_csu_init (in /home/acm/opt/src/mongo/mongos)
==17387== by 0x4F830A7: generic_start_main.isra.0 (libc-start.c:247)
==17387== by 0x4F83337: (below main) (libc-start.c:116)
==17387== Address 0x5151fb0 is 0 bytes inside a block of size 16 alloc'd
==17387== at 0x48951D4: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
==17387== by 0x59328F: allocate (new_allocator.h:104)
==17387== by 0x59328F: allocate (alloc_traits.h:491)
==17387== by 0x59328F: std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, true> > >::_M_allocate_buckets(unsigned long) [clone .isra.108] (hashtable_policy.h:1996)
==17387== by 0x595093: _M_allocate_buckets (hashtable.h:347)
==17387== by 0x595093: _M_rehash_aux (hashtable.h:1974)
==17387== by 0x595093: std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_rehash(unsigned long, unsigned long const&) (hashtable.h:1953)
==17387== by 0x595253: std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, true>*) (hashtable.h:1600)
==17387== by 0x5954D3: std::__detail::_Map_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (hashtable_policy.h:600)
==17387== by 0x59356B: operator[] (unordered_map.h:668)
==17387== by 0x59356B: mongo::InitializerDependencyGraph::addInitializer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<mongo::Status (mongo::InitializerContext*)> const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (initializer_dependency_graph.cpp:46)
==17387== by 0x591057: mongo::GlobalInitializerRegisterer::GlobalInitializerRegisterer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<mongo::Status (mongo::InitializerContext*)> const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (global_initializer_registerer.cpp:44)
==17387== by 0x52D46F: __static_initialization_and_destruction_0(int, int) [clone .constprop.34] (mongos_options_init.cpp:39)
==17387== by 0x137FED3: __libc_csu_init (in /home/acm/opt/src/mongo/mongos)
==17387== by 0x4F830A7: generic_start_main.isra.0 (libc-start.c:247)
==17387== by 0x4F83337: (below main) (libc-start.c:116)
So, that is a puzzle. However, I can instruct valgrind to ignore that. But it still fails to start, now with something more odd:
$ valgrind --show-mismatched-frees=no --soname-synonyms=somalloc=NONE --track-origins=yes --leak-check=no ./mongos
==19834== Memcheck, a memory error detector
==19834== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==19834== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==19834== Command: ./mongos
==19834==
MC_(get_otrack_shadow_offset)(ppc64)(off=1688,sz=8)
Memcheck: mc_machine.c:329 (get_otrack_shadow_offset_wrk): the 'impossible' happened.
host stacktrace:
==19834== at 0x3808D9B8: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834== by 0x3808DB5F: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834== by 0x3808DCDB: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834== by 0x38078CE3: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834== by 0x38076FAB: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834== by 0x380BAA2B: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834== by 0x381B9BB7: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834== by 0x380BE19F: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834== by 0x3810D04F: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834== by 0x3810FFEF: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834== by 0x3812BB97: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable (lwpid 19834)
==19834== at 0x4F3AC14: __lll_lock_elision (elision-lock.c:60)
==19834== by 0x4F2BBC7: pthread_mutex_lock (pthread_mutex_lock.c:92)
==19834== by 0x602753: mongo::DBConnectionPool::DBConnectionPool() (connpool.cpp:196)
==19834== by 0x5319EB: __static_initialization_and_destruction_0 (global_conn_pool.cpp:35)
==19834== by 0x5319EB: _GLOBAL__sub_I__ZN5mongo14globalConnPoolE (global_conn_pool.cpp:39)
==19834== by 0x137FED3: __libc_csu_init (in /home/acm/opt/src/mongo/mongos)
==19834== by 0x4F830A7: generic_start_main.isra.0 (libc-start.c:247)
==19834== by 0x4F83337: (below main) (libc-start.c:116)
Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.
If that doesn't help, please report this bug to: www.valgrind.org
In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.
I'm not really sure what to make of that, except that I did see something die in the same place (__lll_lock_elision) once or twice when running with clang ASAN with stack-use-after-return checking enabled. I wasn't sure what to make of it then either, but it is interesting that this has turned up twice. I presume this is related to hardware lock elision?
Anyway, it doesn't seem like I can get this running with valgrind.
Happy to try again if anyone is aware of a workaround.
== Comment: #30 - William J. Schmidt <wschmidt@xxxxxxxxxx> - 2016-11-08 11:06:00 ==
CCing Carl Love. Carl, have you seen this sort of interaction between valgrind and lock elision before? (Comment #29, you can ignore the rest of this bugzilla for now.)
To manage notifications about this bug go to:
https://bugs.launchpad.net/glibc/+bug/1640518/+subscriptions