Native mono crash on linux kernel 4.1.2

I’m getting the native mono crash (or possibly a different one) again with Sonarr version 2.0.0.3328 (July 16, 2015 build) on a current kernel.

systemctl status sonarr -l
● sonarr.service - Sonarr Daemon
   Loaded: loaded (/usr/lib/systemd/system/sonarr.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2015-07-16 08:28:30 EDT; 3min 54s ago
  Process: 3120 ExecStart=/usr/bin/mono /opt/NzbDrone/NzbDrone.exe -nobrowser (code=exited, status=0/SUCCESS)
 Main PID: 3120 (code=exited, status=0/SUCCESS)

Jul 16 08:28:30 computer mono[3120]: EPIC FAIL: System.NullReferenceException: Object reference not set to an instance of an object
Jul 16 08:28:30 computer mono[3120]: at System.Threading.Timer+Scheduler.ShrinkIfNeeded (System.Collections.Generic.List`1 list, Int32 initial) [0x00000] in <filename unknown>:0
Jul 16 08:28:30 computer mono[3120]: at System.Threading.Timer+Scheduler.SchedulerThread () [0x00000] in <filename unknown>:0
Jul 16 08:28:30 computer mono[3120]: at System.Threading.Thread.StartInternal () [0x00000] in <filename unknown>:0
Jul 16 08:28:30 computer mono[3120]: [Fatal] GlobalExceptionHandlers: EPIC FAIL: Object reference not set to an instance of an object
Jul 16 08:28:30 computer mono[3120]: System.NullReferenceException: Object reference not set to an instance of an object
Jul 16 08:28:30 computer mono[3120]: at System.Threading.Timer+Scheduler.ShrinkIfNeeded (System.Collections.Generic.List`1 list, Int32 initial) [0x00000] in <filename unknown>:0
Jul 16 08:28:30 computer mono[3120]: at System.Threading.Timer+Scheduler.SchedulerThread () [0x00000] in <filename unknown>:0
Jul 16 08:28:30 computer mono[3120]: at System.Threading.Thread.StartInternal () [0x00000] in <filename unknown>:0
Jul 16 08:28:30 computer mono[3120]: [ERROR] FATAL UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object
Jul 16 08:28:30 computer mono[3120]: at System.Threading.Timer+Scheduler.ShrinkIfNeeded (System.Collections.Generic.List`1 list, Int32 initial) [0x00000] in <filename unknown>:0
Jul 16 08:28:30 computer mono[3120]: at System.Threading.Timer+Scheduler.SchedulerThread () [0x00000] in <filename unknown>:0
Jul 16 08:28:30 computer mono[3120]: at System.Threading.Thread.StartInternal () [0x00000] in <filename unknown>:0
Jul 16 08:28:30 computer mono[3120]: Stacktrace:
Jul 16 08:28:30 computer mono[3120]: Native stacktrace:
Jul 16 08:28:30 computer mono[3120]: /usr/bin/mono() [0x4b20bc]
Jul 16 08:28:30 computer mono[3120]: /usr/bin/mono() [0x5086ee]
Jul 16 08:28:30 computer mono[3120]: /usr/bin/mono() [0x428f7d]
Jul 16 08:28:30 computer mono[3120]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10d10) [0x7fb3c482cd10]
Jul 16 08:28:30 computer mono[3120]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_+0x1b) [0x7fb393994a4b]
Jul 16 08:28:30 computer mono[3120]: /usr/lib/x86_64-linux-gnu/libmediainfo.so.0(+0x5cc31) [0x7fb3b8682c31]

Here’s my mono info:

mono --version
Mono JIT compiler version 4.0.2 (Stable 4.0.2.5/c99aa0c Wed Jun 24 10:04:37 UTC 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           __thread
	SIGSEGV:       altstack
	Notifications: epoll
	Architecture:  amd64
	Disabled:      none
	Misc:          softdebug 
	LLVM:          supported, not enabled.
	GC:            sgen

Also, I’m running a newer kernel from the kernel-ppa archive:

uname -a
Linux computer 4.1.2-040102-generic #201507101335 SMP Fri Jul 10 17:36:40 UTC 2015 x86_64

I’m getting the native mono crash again with Sonarr version 2.0.0.3325 (July 14, 2015 build).

systemctl status sonarr -l
● sonarr.service - Sonarr Daemon
   Loaded: loaded (/usr/lib/systemd/system/sonarr.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Wed 2015-07-15 23:32:33 EDT; 5min ago
  Process: 7581 ExecStart=/usr/bin/mono /opt/NzbDrone/NzbDrone.exe -nobrowser (code=killed, signal=SEGV)
 Main PID: 7581 (code=killed, signal=SEGV)

Jul 15 23:32:32 computer mono[7581]: at NLog.Logger.WriteToTargets (NLog.LogLevel level, IFormatProvider formatProvider, System.String message, System.Object[] args) [0x00000] in <filename unknown>:0
Jul 15 23:32:32 computer mono[7581]: at NLog.Logger.WriteToTargets (NLog.LogLevel level, System.String message, System.Object[] args) [0x00000] in <filename unknown>:0
Jul 15 23:32:32 computer mono[7581]: at NLog.Logger.Debug (System.String message, System.String argument) [0x00000] in <filename unknown>:0
Jul 15 23:32:32 computer mono[7581]: at NzbDrone.Core.Parser.ParsingService.GetSeries (NzbDrone.Core.Parser.Model.ParsedEpisodeInfo parsedEpisodeInfo, Int32 tvRageId) [0x00000] in <filename unknown>:0
Jul 15 23:32:32 computer mono[7581]: at NzbDrone.Core.Parser.ParsingService.Map (NzbDrone.Core.Parser.Model.ParsedEpisodeInfo parsedEpisodeInfo, Int32 tvRageId, NzbDrone.Core.IndexerSearch.Definitions.SearchCriteriaBase searchCriteria) [0x00000] in <filename unknown>:0
Jul 15 23:32:32 computer mono[7581]: at NzbDrone.Core.DecisionEngine.DownloadDecisionMaker+<GetDecisions>d__0.MoveNext () [0x00000] in <filename unknown>:0
Jul 15 23:32:33 computer mono[7581]: Stacktrace:
Jul 15 23:32:33 computer systemd[1]: sonarr.service: main process exited, code=killed, status=11/SEGV
Jul 15 23:32:33 computer systemd[1]: Unit sonarr.service entered failed state.
Jul 15 23:32:33 computer systemd[1]: sonarr.service failed.

Here’s my mono info:

mono --version
Mono JIT compiler version 4.0.2 (Stable 4.0.2.5/c99aa0c Wed Jun 24 10:04:37 UTC 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           __thread
	SIGSEGV:       altstack
	Notifications: epoll
	Architecture:  amd64
	Disabled:      none
	Misc:          softdebug 
	LLVM:          supported, not enabled.
	GC:            sgen

Also, I’m running a newer kernel from the kernel-ppa archive:

uname -a
Linux computer 4.1.2-040102-generic #201507101335 SMP Fri Jul 10 17:36:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

That’s what you get with new kernels :slight_smile: It doesn’t look like Ubuntu though. What distro?

Tbh, I’m not really interested in analyzing this one. Do you know how long it takes to bisect the kernel tree and recompile the kernel over and over again? I’ll tell you: a very long time.

Not even sure if it’s the kernel. Do a memtest. Also run the testcase in the first post of this thread.

If the testcase doesn’t fail (after repeated runs), then it isn’t the same issue.
In either case you might wanna boot into an older kernel to verify whether it’s the kernel.
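
Since the failure is intermittent, one clean run proves little. Here is a minimal sketch of a repeat-run harness, assuming the testcase is an executable that exits non-zero (or gets killed by a signal) when it fails. `count_failures` is a hypothetical helper name, not something from Sonarr or mono.

```python
import subprocess


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs did not exit cleanly."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # returncode is negative when the process was killed by a signal
        # (for example -11 for SIGSEGV), positive for an error exit.
        if result.returncode != 0:
            failures += 1
    return failures
```

Something like `count_failures(["mono", "bug-18026.exe"], 50)` would be the intended use; that command line is an assumption based on the file name in the JIT output in this thread. Running the same count on an older kernel gives a baseline to compare against.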

It’s Ubuntu vivid, using the mono packages from mono-project and the 4.1.2 kernel from kernel.ubuntu.com (kernel-ppa/mainline).

I’ll try an older kernel… but if it is a kernel 4.1 issue… that’s going to be a world of pain, since 4.1 is the new LTS kernel. :frowning:

Tried the stress test on 4.1.2.

Occasionally there were no errors, but usually I got this:

Method (wrapper managed-to-managed) string:.ctor (char[],int,int) emitted at 0x40b5b1b0 to 0x40b5b1d9 (code length 41) [bug-18026.exe]
converting method (wrapper managed-to-native) object:__icall_wrapper_mono_gc_alloc_string (intptr,intptr,int)
Method (wrapper managed-to-native) object:__icall_wrapper_mono_gc_alloc_string (intptr,intptr,int) emitted at 0x40b5b1f0 to 0x40b5b284 (code length 148) [bug-18026.exe]

Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
  at Test.Main () [0x00000] in <filename unknown>:0 
[ERROR] FATAL UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object
  at Test.Main () [0x00000] in <filename unknown>:0 

However, there were no problems with kernel 4.0.5. (Now installing 4.0.8, so we’ll see what happens with that.)

Hmmm.

Yeah, 4.0.8 works, but 4.1.0, 4.1.1 and 4.1.2 are broken.
But wily is on 4.0.x, and 4.1.x is Ubuntu unstable, so there’s plenty of time.

Are you on a VM?

I’m not on a VM. So yes, plenty of time!

I noticed you posted on xamarin, but that bug is already closed, so you’ll have to file it separately. It isn’t necessarily the same problem. It’s just a neat stress test.

The likely trigger:
https://github.com/torvalds/linux/commit/73459e2a1ada09a68c02cc5b73f3116fc8194b3d

Yes, that commit reverts the exact fixes that were applied earlier. :anguished:
I suspect that the crashing in mono is some kind of side-effect, but without a mono expert that will be difficult to isolate.

Also happens on mono nightly 4.3.0.

Sigh. I guess at this point I’ll just back away quietly and wait for a kernel update that fixes it at some point. Since the 4.0.x series works fine, I guess that’s ok. I was just hoping to use a newer kernel since I’m a btrfs user. I feel like every new kernel series has bug fixes for that.

You’re not the only one affected by this. Other apps will run into problems too, btw, for example Emby (mediabrowser).

Btw, it still happens on kernel 4.2-rc2.

I’ve built 4.1.0 with that commit reverted to check if it’s indeed the cause, but nope, it still crashes. So my earlier guess was wrong. Moving on to the next idea.

A git bisect between 4.0 and 4.1 is 13 steps long, so it’s going to take days.
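
For anyone wondering where the step count comes from: bisecting is a binary search, so each build roughly halves the remaining range of commits. Here is a rough sketch of the arithmetic. The helper names are hypothetical, the hours-per-build figure is whatever your machine needs, and git’s own estimate can differ by a step.

```python
def bisect_steps(candidates):
    """Worst-case number of builds a binary search needs to single out
    one bad commit among `candidates` commits."""
    if candidates < 1:
        raise ValueError("need at least one candidate commit")
    # Integer form of ceil(log2(candidates)), avoiding float rounding.
    return (candidates - 1).bit_length()


def total_hours(candidates, hours_per_build):
    """Rough wall-clock cost: one kernel build plus test boot per step."""
    return bisect_steps(candidates) * hours_per_build
```

In this model, 13 steps corresponds to anywhere between 4,097 and 8,192 candidate commits. Every step also means rebooting into the new kernel and running the stress test several times, which is where the days go.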

Don’t know if my report helps or not, but I’m having this same issue. It usually occurs in the same place:

Jul 29 08:20:45 server sonarr[11521]: Stacktrace:
Jul 29 08:20:45 server sonarr[11521]: at <unknown> <0xffffffff>
Jul 29 08:20:45 server sonarr[11521]: at (wrapper managed-to-native) NzbDrone.Core.MediaFiles.MediaInfo.MediaInfo.MediaInfo_Open_Buffer_Continue (intptr,byte[],intptr) <0xffffffff>
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.MediaFiles.MediaInfo.MediaInfo.Open (System.IO.Stream) <0x000a7>
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.MediaFiles.MediaInfo.VideoFileInfoReader.GetMediaInfo (string) <0x001db>
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.MediaFiles.MediaInfo.UpdateMediaInfoService.UpdateMediaInfo (NzbDrone.Core.Tv.Series,System.Collections.Generic.List`1<NzbDro
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.MediaFiles.MediaInfo.UpdateMediaInfoService.Handle (NzbDrone.Core.MediaFiles.Events.SeriesScannedEvent) <0x00133>
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.Messaging.Events.EventAggregator.PublishEvent<TEvent> (TEvent) <0x00404>
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.MediaFiles.DiskScanService.Scan (NzbDrone.Core.Tv.Series) <0x004e1>
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.MediaFiles.DiskScanService.Handle (NzbDrone.Core.Tv.Events.SeriesUpdatedEvent) <0x00023>
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.Messaging.Events.EventAggregator.PublishEvent<TEvent> (TEvent) <0x00404>
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.Tv.RefreshSeriesService.RefreshSeriesInfo (NzbDrone.Core.Tv.Series) <0x00a3b>
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.Tv.RefreshSeriesService.Execute (NzbDrone.Core.Tv.Commands.RefreshSeriesCommand) <0x002b7>
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.Messaging.Commands.CommandExecutor.ExecuteCommand<TCommand> (TCommand,NzbDrone.Core.Messaging.Commands.CommandModel) <0x001c2
Jul 29 08:20:45 server sonarr[11521]: at (wrapper dynamic-method) object.CallSite.Target (System.Runtime.CompilerServices.Closure,System.Runtime.CompilerServices.CallSite,NzbDrone.
Jul 29 08:20:45 server sonarr[11521]: at System.Dynamic.UpdateDelegates.UpdateAndExecuteVoid3<T0, T1, T2> (System.Runtime.CompilerServices.CallSite,T0,T1,T2) <0x001b4>
Jul 29 08:20:45 server sonarr[11521]: at (wrapper dynamic-method) object.CallSite.Target (System.Runtime.CompilerServices.Closure,System.Runtime.CompilerServices.CallSite,NzbDrone.
Jul 29 08:20:45 server sonarr[11521]: at NzbDrone.Core.Messaging.Commands.CommandExecutor.ExecuteCommands () <0x0024b>
Jul 29 08:20:45 server sonarr[11521]: at System.Threading.Thread.StartInternal () <0x00066>
Jul 29 08:20:45 server sonarr[11521]: at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0xffffffff>
Jul 29 08:20:45 server sonarr[11521]: Native stacktrace:
Jul 29 08:20:45 server sonarr[11521]: /usr/lib/libmonosgen-2.0.so.1(+0xd2efa) [0x7f64ce2c2efa]
Jul 29 08:20:45 server sonarr[11521]: /usr/lib/libmonosgen-2.0.so.1(+0x48500) [0x7f64ce238500]
Jul 29 08:20:45 server sonarr[11521]: /usr/lib/libpthread.so.0(+0x10660) [0x7f64cdfe3660]
Jul 29 08:20:45 server sonarr[11521]: /usr/lib/libmediainfo.so.0(_ZN12MediaInfoLib11File_MpegTs24Read_Buffer_AfterParsingEv+0x2f6) [0x7f6493b1dad6]
Jul 29 08:20:45 server sonarr[11521]: /usr/lib/libmediainfo.so.0(_ZN12MediaInfoLib13File__Analyze25Open_Buffer_Continue_LoopEv+0x327) [0x7f6493846057]
Jul 29 08:20:45 server sonarr[11521]: /usr/lib/libmediainfo.so.0(_ZN12MediaInfoLib13File__Analyze20Open_Buffer_ContinueEPKhm+0x6b0) [0x7f6493847000]
Jul 29 08:20:45 server sonarr[11521]: /usr/lib/libmediainfo.so.0(_ZN12MediaInfoLib18MediaInfo_Internal20Open_Buffer_ContinueEPKhm+0x3a) [0x7f64938c6d3a]
Jul 29 08:20:45 server sonarr[11521]: /usr/lib/libmediainfo.so.0(_ZN12MediaInfoLib9MediaInfo20Open_Buffer_ContinueEPKhm+0xc) [0x7f649389574c]
Jul 29 08:20:45 server sonarr[11521]: /usr/lib/libmediainfo.so.0(MediaInfo_Open_Buffer_Continue+0x99) [0x7f6493c7fc19]
Jul 29 08:20:45 server sonarr[11521]: [0x40d79b55]
Jul 29 08:20:45 server sonarr[11521]: Debug info from gdb:
Jul 29 08:20:45 server sonarr[11521]: =================================================================
Jul 29 08:20:45 server sonarr[11521]: Got a SIGSEGV while executing native code. This usually indicates
Jul 29 08:20:45 server sonarr[11521]: a fatal error in the mono runtime or one of the native libraries
Jul 29 08:20:45 server sonarr[11521]: used by your application.
Jul 29 08:20:45 server sonarr[11521]: =================================================================

My mono info is identical to the original poster’s, and my computer info is very similar:

$ uname -a
Linux server 4.1.2-2-ARCH #1 SMP PREEMPT Wed Jul 15 08:30:32 UTC 2015 x86_64 GNU/Linux
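
To compare traces like this across reports, it helps to pull the library, symbol and offset out of each native frame. The frames appear to follow the `library(symbol+offset) [address]` layout. Below is a small sketch that parses that layout and decodes only the simplest C++ mangled names (plain nested names such as `_ZN3Foo3BarEv`). `parse_frame` and `nested_name` are hypothetical helpers, and a real demangler such as `c++filt` handles far more cases.

```python
import re

FRAME = re.compile(
    r"(?P<lib>/\S+?)\((?P<symbol>[^+)]*)\+0x(?P<offset>[0-9a-fA-F]+)\)"
    r"\s+\[0x(?P<address>[0-9a-fA-F]+)\]"
)


def nested_name(mangled):
    """Decode _ZN<len><name>...E into a::b::c; return None for anything
    more complex (templates, substitutions, non-nested names)."""
    if not mangled.startswith("_ZN"):
        return None
    pos, parts = 3, []
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    if parts and pos < len(mangled) and mangled[pos] == "E":
        return "::".join(parts)
    return None


def parse_frame(line):
    """Return lib/symbol/offset/address for one native frame, or None
    if the line carries no symbol+offset part."""
    match = FRAME.search(line)
    if match is None:
        return None
    symbol = match["symbol"]
    return {
        "lib": match["lib"],
        "symbol": nested_name(symbol) or symbol,
        "offset": int(match["offset"], 16),
        "address": int(match["address"], 16),
    }
```

Applied to the first libmediainfo frame above, this gives `MediaInfoLib::File_MpegTs::Read_Buffer_AfterParsing` at offset 0x2f6. Frames without an offset, like the bare `/usr/bin/mono() [...]` lines in the first post, are skipped.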

I know which commit in the linux kernel triggers it, but I don’t know HOW it affects mono. It’s causing problems indirectly.

To give you an idea of the level at which I’m investigating this:

Old & Working: http://hastebin.com/manobazige.mel
vs
New & Broken: http://hastebin.com/pajaquwotu.mel
(Yes, that is assembly language)

The difference: https://github.com/torvalds/linux/commit/c70e1b475f37f07ab7181ad28458666d59aae634

The change itself is minor, but it causes the gcc compiler to inline another method, and somehow that gets it all messed up.
The annoying thing is that this code’s purpose is to measure time… granted, on the nanosecond scale, but still merely measuring time. It shouldn’t be able to cause all these weird problems… yet it does.

I’ve got a couple of theories, but you can understand my frustration that I’m the one investigating this. I should be doing other stuff.
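
As an aside on the time-measuring code: the basic property userspace expects from a monotonic clock is that it never steps backwards. Below is a tiny sanity check of that property. To be clear, this is not a reproduction of the bug and nothing in this thread shows that it would trip on an affected kernel; `backward_steps` is a hypothetical helper.

```python
import time


def backward_steps(samples):
    """Read the monotonic clock `samples` times in a tight loop and
    count how often a reading is lower than the one before it."""
    backwards = 0
    previous = time.monotonic_ns()
    for _ in range(samples):
        current = time.monotonic_ns()
        if current < previous:
            backwards += 1
        previous = current
    return backwards
```

On a healthy system `backward_steps(1_000_000)` should return 0. A non-zero count would point at the clock itself, while a zero count says nothing about subtler side effects like the one suspected here.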

I agree, it doesn’t seem like your responsibility. It seems like whatever is causing libmonosgen-2.0.so to segfault at (+0xd2efa) is what is going out of bounds here, and causing the issue, and that’s likely where the fix should go.

Is there a better way to understand what this library does? I don’t know what language it’s written in or where in the mono sources it is built from, but if you’ve got a little context here I can see what I can do to try to make a fix.
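
On reading an entry like `(+0xd2efa)`: the value in parentheses is the offset inside the library, and the bracketed value is the absolute address in the process, so subtracting one from the other gives the address the library was loaded at. Here is a short sketch using the two libmonosgen frames from the trace in this thread; `load_base` and `same_mapping` are hypothetical helpers.

```python
def load_base(address, offset):
    """Address the library was mapped at, from one frame's absolute
    address and its offset within the library."""
    return address - offset


def same_mapping(frames):
    """True if every (address, offset) pair implies the same load base."""
    return len({load_base(address, offset) for address, offset in frames}) == 1


# (absolute address, offset) for the two libmonosgen-2.0.so.1 frames.
MONO_FRAMES = [(0x7F64CE2C2EFA, 0xD2EFA), (0x7F64CE238500, 0x48500)]

if __name__ == "__main__":
    print(hex(load_base(*MONO_FRAMES[0])), same_mapping(MONO_FRAMES))
```

Both frames work out to the same base, so the offsets are consistent. With debug symbols installed, an offset like this could be given to a tool such as `addr2line` to find the source line. One caveat, stated as an assumption: frames listed above the libpthread entry are often the runtime’s own crash handler printing the trace, in which case the actual fault would be in the frame just below it.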

What you’re seeing is merely a symptom of a problem elsewhere. That’s what makes these kinds of issues hard to fix.
The problem is also that this is an issue that exists on the boundary between mono and linux, so you can’t point to one party.

I can’t guarantee your crash is the same issue, but it’s likely.

Sorry to revive an older topic, but has there been any change on this issue? Do the mono guys know about this? I don’t think the developers here should have to worry about issues that are in mono, but unless they know about this they’re not going to be able to fix it.
