Native mono crashes [kernel fix released]

“For every action there is an equal and opposite reaction.”

We’ve had quite a few reports about odd crashes, most of them known to be so-called native crashes, where mono cries out in terror and dies without telling us why.

Initially this thread was to collect important information about the systems on which the crashes occurred.

Since this thread was posted I’ve literally spent more than a hundred hours investigating this issue.
I debugged Sonarr and mono countless times, compiled mono half a dozen times, and even compiled the Linux kernel several times.
In late April I finally isolated the problem to a Linux kernel update.
From what I can see, the problem has to do with how the kernel detects whether a thread’s execution was moved from one physical CPU core to another. At first it looked specific to virtual machines running on multi-core hosts (which is pretty much all of them nowadays).
Contrary to those earlier reports, this issue likely affects both virtual and physical machines.
Please note that this was a regression in the kernel and it was fixed later on in Linux 4.0.

The first Ubuntu kernel version in which it occurred was 3.13.0-48.

Once I had all the information and a possible solution, I reported it to the Ubuntu Kernel team. They already committed those changes in the vNext branch.
Those updates have now been released to the main Ubuntu repository.
The kernel updates are currently in trusty-updates:
Trusty 3.13.0-54
Utopic 3.16.0-40 (which gets installed on new Trusty installs too via LTS backport)
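
To check where a Trusty box stands, the version numbers above can be compared against the running kernel. This is only a rough sketch: the function name is made up for illustration, it only understands the Trusty 3.13.0-NN series (first affected build 48, fixed build 54, both taken from this post), and it says nothing about Utopic or other distributions.

```shell
#!/bin/sh
# Hypothetical helper: classify an Ubuntu Trusty 3.13 kernel release string
# such as "3.13.0-48-generic". The bounds 48 and 54 come from this thread.
trusty_kernel_status() {
    release=$1
    case $release in
        3.13.0-*) ;;                        # only the Trusty 3.13 series is handled
        *) echo "unknown"; return 0 ;;
    esac
    abi=${release#3.13.0-}                  # "48-generic"
    abi=${abi%%-*}                          # "48"
    case $abi in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;   # not a plain number
    esac
    if [ "$abi" -lt 48 ]; then
        echo "before-regression"
    elif [ "$abi" -lt 54 ]; then
        echo "affected"
    else
        echo "fixed"
    fi
}
```

Run it as `trusty_kernel_status "$(uname -r)"`. If it prints "affected", installing the kernel from trusty-updates and rebooting should pick up the fix.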

Looks like linux kernel 4.1.x has a similar issue again

##References
Xamarin: https://bugzilla.xamarin.com/show_bug.cgi?id=29212 (stalled, pending kernel fixes)
Ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1450584 (committed, pending release)
Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=784960 tnx to @adhawkins
Emby: http://emby.media/community/index.php?/topic/19955-emby-crashing-ubuntu-server/?p=207271

####Crash Stacktrace
Both SIGSEGV and SIGABRT and related stacks observed by me on a user’s setup.

####Versions
Linux Distribution: Debian GNU/Linux 8.0 (jessie)
Sonarr: .3004
Mono: 3.10.0-0xamarin2
libmediainfo0: 0.7.70-1
libsqlite3-0: 3.8.7.1-1

####Additional info
Quite reproducible. It started about a month ago at roughly one crash a day; at the moment it can happen anywhere from immediately to a few hours after startup.

Linux Distro: Ubuntu 14.04.2
Sonarr: .3004
Mono: 3.12.1
libmediainfo0: 0.7.72
libsqlite: 3.8.2

error log from Sonarr: http://pastebin.com/C78UCUYp

if there are other log files I need to post or can locate… let me know and I will help out all I can.

Linux: Ubuntu 14.04.2 LTS trusty
Sonarr: 2.0.0.3037
Mono: 3.10.0 (tarball Wed Nov 5 12:50:04 UTC 2014)
MediaInfoLib - v0.7.72
SQLite version 3.8.2 2013-12-06 14:53:30

Quite a few crashes in the log below; Sonarr rarely runs for more than a few minutes.
http://pastebin.com/ZQxwrvsH

also
http://pastebin.com/YqmaqEMa

Ubuntu 14.04.2 LTS
Version 2.0.0.3037
Mono JIT compiler version 3.12.1 (mono-3.12.0-branch/0849ec7 Sat Apr 4 16:59:16 PDT 2015)
libmediainfo0 0.7.67-2ubuntu1
libsqlite3-0 3.8.2-1ubuntu2
sqlite3 3.8.2-1ubuntu2

I built mono and had gdb installed. I saw the following crash report:
http://pastebin.com/E1DPgAJE

I’ve seen the same crash more than once.

Mine is crashing too now.

Native stacktrace:

    mono() [0x4accac]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f5abedd0340]
    /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7f5abea31cc9]
    /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f5abea350d8]
    mono() [0x6232f9]
    mono() [0x623507]
    mono() [0x6235b2]
    mono() [0x4aebe7]
    mono() [0x4af3c3]
    [0x40d0dde6]

Debug info from gdb:

* Assertion: should not be reached at sgen-scan-object.h:107

Aborted (core dumped)

lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 14.04.2 LTS
Release: 14.04
Codename: trusty

mono --version
Mono JIT compiler version 3.12.1 (tarball Fri Mar 6 19:12:47 UTC 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
TLS: __thread
SIGSEGV: altstack
Notifications: epoll
Architecture: amd64
Disabled: none
Misc: softdebug
LLVM: supported, not enabled.
GC: sgen

mediainfo --version
MediaInfo Command line,
MediaInfoLib - v0.7.72

I’m getting the same issue:

Native stacktrace:

mono() [0x4accac]
mono() [0x50451f]
mono() [0x42a7c7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f1c13738340]
mono() [0x533653]
[0x4031ad5a]

and:

Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.2 LTS
Release:	14.04
Codename:	trusty

Sonarr version 2.0.0.3004
Mono version: 3.12.1 (but also present on 3.10, I upgraded that first when the error started)
libmediainfo0: libmediainfo0:amd64/trusty 0.7.67-2ubuntu1 uptodate
libsqlite3-0: libsqlite3-0:amd64/trusty 3.8.2-1ubuntu2 uptodate

Ubuntu 14.04.2 LTS
Mono JIT compiler version 3.10.0 (tarball Wed Nov 5 12:50:04 UTC 2014)
MediaInfoLib - v0.7.67
sqlite3 - 3.8.2-1ubuntu2
Sonarr version 2.0.0.3004

Native stacktrace:
mono() [0x4b3f7c]
mono() [0x50c30f]
mono() [0x423637]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f04f9314340]
[0x404cea11]

http://pastebin.com/2sQNi3eQ

Happens several times a day. This is a brand-new Sonarr install, and I am new to it.

[Info] RssSyncService: Starting RSS Sync
[Info] DownloadDecisionMaker: Processing 400 reports
[Info] RssSyncService: RSS Sync Completed. Reports found: 400, Reports grabbed: 0
Stacktrace:

Native stacktrace:

mono() [0x4b3f7c]
mono() [0x50c30f]
mono() [0x423637]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f86b1986340]

Debug info from gdb:

=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.2 LTS
Release: 14.04
Codename: trusty

$ mono --version
Mono JIT compiler version 3.10.0 (tarball Wed Nov 5 12:50:04 UTC 2014)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
TLS: __thread
SIGSEGV: altstack
Notifications: epoll
Architecture: amd64
Disabled: none
Misc: softdebug
LLVM: supported, not enabled.
GC: sgen

Sonarr 2.0.0.3004
libmediainfo 0.7.67-2ubuntu1
sqlite3 3.8.2-1ubuntu2

Guys, tnx for the reports so far, keep em coming!

My own setup on which the problem has NOT occurred:

Hardware: i5 quadcore home server
Linux Distribution: Ubuntu 14.04.2 LTS
Sonarr: 2.0.0.3034
Mono: 3.10.0-0xamarin2 (tarball Wed Nov 5 12:50:04 UTC 2014)
libmediainfo0: 0.7.67-2ubuntu1
libsqlite3-0: 3.8.2-1ubuntu2
libgdiplus: 3.8-0xamarin1

Based on the reports so far, I started making some comparisons and poring over some logs.
So far we’ve only had reports on Ubuntu 14.04, but I suspect that’s a highly biased statistic.
Other than that we’ve seen several versions of mono and mediainfo.

The log files are a bit ambiguous. But that’s expected. Problem is that it’s hard to dive into the issues without getting internal info.

@bradkollmyer
How often do you think it happens?
According to your log, gdb stopped dumping the backtraces due to some error, so I couldn’t see all of those traces.
If it happens relatively frequently you could attach the gdb debugger and use the mono-specific commands (mono_backtrace, mono_stack etc, google it), which only work on a running process, not a core dump (sadly, otherwise I would just have you zip the core dump).
Or just lemme know and I’ll help out with the gdb commands.

Two methods:
1)
MONO_DEBUG=suspend-on-sigsegv mono --debug /opt/NzbDrone/NzbDrone.exe
That will suspend the process when a SIGSEGV occurs, allowing you to attach gdb to it at your leisure.

2)
MONO_DEBUG=explicit-null-checks gdb --args mono --debug /opt/NzbDrone/NzbDrone.exe
Then type run at the gdb command prompt.
This option is a bit more risky: explicit-null-checks is needed to avoid false positives, but it might also prevent the issue from occurring.

Either must be run as the user normally running Sonarr. It might also be useful to run it inside of screen so you can safely detach/reattach to that screen virtual terminal from whatever ssh session you like.
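
For the screen part, something along these lines should work with method 1. Treat it as a sketch: the session name and the function names are made up for illustration, and it assumes screen, gdb and ps (procps) are installed and that Sonarr lives in /opt/NzbDrone as in the commands above.

```shell
#!/bin/sh
# Arbitrary session name; install path as used in the commands above.
SESSION=sonarr-debug
SONARR_EXE=/opt/NzbDrone/NzbDrone.exe

# Start method 1 inside a detached screen session, so it keeps running
# after the ssh session that launched it is closed.
start_debug_session() {
    screen -dmS "$SESSION" env MONO_DEBUG=suspend-on-sigsegv \
        mono --debug "$SONARR_EXE"
}

# List running processes named "mono" with their PIDs and command lines.
list_mono_processes() {
    ps -C mono -o pid,args
}

# Attach gdb to an already running (or suspended) process by PID.
attach_gdb() {
    gdb -p "$1"
}
```

Call `start_debug_session` as the Sonarr user, and use `screen -r sonarr-debug` to look at the console (Ctrl-a d detaches again). Once the process is suspended after a SIGSEGV, `list_mono_processes` shows the PID to pass to `attach_gdb`. On Ubuntu, attaching to a process that gdb did not start itself may need root, depending on the kernel.yama.ptrace_scope setting.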

Essentially what I need is ssh root access to a machine on which it occurs frequently enough to be able to debug hands-on. But for now I’ll settle for ‘more information’ :smiley:

Edit: Btw, if any of you dabble with dev stuff or gdb, feel free to jump in as well. Doing this all alone is painful.

For me it happens >30 times a day! I finally started using monit to relaunch the process. I have tried to use gdb to debug the problem, but gdb tends to core dump when this problem happens.

For me the problem seems to be in building an X509 cert. It does not happen all the time, which makes me think there are some bad servers in a load balancer out there.

Let me think on how to get you SSH access to my machine. Send me an email so we can coordinate this.

I’ll try the methods you have suggested.

Lemme know how it goes, then we’ll talk later about setting up a link. It’s a bit late today anyway (for me, at least).

Btw, you got any SSL-connected indexers? nzbgeek, oznzb, deepcave? Just following up on that X509.

Down to just using Omgwtfnzbs; I’ll try this for a few hours.

I know everyone has been looking at what they have running as far as library versions etc

Any chance our issue could be the tools the devs use to compile Sonarr?

So, I’ve never really used gdb but figured I’d try what you suggested. Using method 2, I get the following:

Starting program: /usr/bin/mono --debug /opt/NzbDrone/NzbDrone.exe
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff4ee9700 (LWP 31402)]
[Info] Bootstrap: Starting NzbDrone - /opt/NzbDrone/NzbDrone.exe - Version 2.0.0.3004

Program received signal SIGPWR, Power fail/restart.
[Switching to Thread 0x7ffff4ee9700 (LWP 31402)]
sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
85      ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S: No such file or directory.

And the backtrace:

(gdb) backtrace
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x0000000000619238 in mono_sem_wait ()
#2  0x000000000059d02d in ?? ()
#3  0x0000000000582484 in ?? ()
#4  0x000000000061e0b6 in ?? ()
#5  0x00007ffff74b2182 in start_thread (arg=0x7ffff4ee9700) at pthread_create.c:312
#6  0x00007ffff71df47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

At which point I realize I have no idea what I’m doing and so I’ll just wait for people who know a bit more to take over.

@threz_ SIGPWR = Power signal, not crash. system maybe wanted to go into standby or got unplugged from a hardline?
type c at the gdb prompt to continue.

In any case, unrelated. sry.

After a bit of googling, it seems like SIGPWR and SIGXCPU are common in Mono, so I used:

handle SIGPWR ignore noprint pass 
handle SIGXCPU ignore noprint pass 

After about 2 minutes, Sonarr then crashed with the following:

[Info] RefreshEpisodeService: Starting episode info refresh for: [94571][Community] 

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffef27b700 (LWP 6649)]
0x000000004023cdd4 in ?? ()
(gdb) backtrace
#0  0x000000004023cdd4 in ?? ()
#1  0x0000000001fe0e78 in ?? ()
#2  0x00007ffff692bcf8 in ?? ()
#3  0x000001f142a0fd1e in ?? ()
#4  0x0000000000000038 in ?? ()
#5  0x0000000000ebc988 in ?? ()
#6  0x0000000000b17d38 in ?? ()
#7  0x00007fffcb6ce7f0 in ?? ()
#8  0x00007fffcb6ce820 in ?? ()
#9  0x0000000000ebc968 in ?? ()
#10 0x0000000000b17d38 in ?? ()
#11 0x00007fffcb6ce7f0 in ?? ()
#12 0x0000000000002710 in ?? ()
#13 0x000001f13fa60c9e in ?? ()
#14 0x00007fffcb6ce7f0 in ?? ()
#15 0x0000000000000000 in ?? ()

Recent git branch of 3.12.1 is not crashing anymore for me (Arch) and Betrayed (Debian). We both have a stable 3.10.0 and the distro 3.12.1 is acting up unless we use the git branch.

I will try to replicate this on the laptop with faulty hardware that was affected heavily by SIGSEGV crashes.

If you could link to some instructions or a guide, I could try to get git and the new branch of Mono running on my setup once I get home tonight.