Sonarr Crashes on Windows (NzbDrone.exe error with clrjit.dll)

@markus101, I looked at the dump, but at 250 MB it’s way too small; the memory layout is exactly what you’d expect.
However, if he gets an OOM exception between 250 MB and 500 MB, then it’s possibly a specific alloc. I’m wondering what kind of allocation in RSS could be causing the problem. Nothing big enough would make it past httpclient.

I’m thinking we need to configure his system to break at OOM (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\GCBreakOnOOM=DWORD:2),
and use procdump -b -e -ma [PID] or procdump -b -ma [PID] (not sure which) to get a dump.
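For reference, a sketch of what that setup might look like from an elevated command prompt (the registry path is for the .NET Framework CLR; procdump is the Sysinternals tool, and [PID] stands for the NzbDrone.exe process ID — whether -e is needed alongside -b is exactly the open question above):

```bat
:: Tell the CLR to break into the debugger when an out-of-memory
:: condition is hit (DWORD value 2 = break on OOM).
reg add "HKLM\SOFTWARE\Microsoft\.NETFramework" /v GCBreakOnOOM /t REG_DWORD /d 2

:: Attach procdump and write a full memory dump (-ma) when the
:: process breaks (-b) or throws an unhandled exception (-e).
procdump -b -e -ma [PID]
```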

Thanks guys. If there is something new you want me to try, please let me know.

Hey guys,

Just wanted to let you know that the latest released version didn’t seem to affect my crash issue, positively or negatively.

Let me know if there are any more logs I can provide.

We’ll need the procdump that Taloth suggested above, after making the registry change. Probably best to try the first command he suggested, then the second if the first doesn’t work.

@Crenim It’s actually easier:

procdump -e 1 -f "OutOfMemory" {pid}

Stupid that I didn’t figure that out before.

Thank you. I just started a new procdump using that command. I’ll let you know as soon as I have something for you guys to review.

Thank you.

I just suffered another crash and generated a dump file.

A new set of logs is located here.
The dump file is located here.

Crap, I’m so sorry.

That command line should’ve included -ma, so: procdump -e 1 -f "OutOfMemory" -ma {pid} (the -ma tells procdump to write a full process memory dump instead of a partial one).

Semi-good news is that my debugger actually recognizes the OOM exception… it just won’t tell me anything else because it’s a partial dump.

Can you do it again with the -ma parameter included… pretty please? :slight_smile:

/me hides in a corner.

Just restarted the procdump with the corrected command. I will get back to you.

Thank you!


Just had another crash. Here is a new dump with the latest settings you gave me.

Thanks guys!

Thanks for the dump, I’m inspecting it now.

First thing I noticed was that you’ve probably got Bitdefender running. There are some reports on the internet that it could be related. It might be worth disabling Bitdefender for Sonarr (if it’s possible to do that for a single app). Not saying that’s the cause, but it’s worth checking.

Another update: the Visual Studio debugger didn’t give me any more insight, but when I moved on to low-level debugging with the native debugger, things started to get interesting.
There’s a huge amount of memory fragmentation (one of the things we knew could cause OOM exceptions, but didn’t expect to actually be the case here).

Nerd Warning: Technical content ahead

Here’s the memory allocation from a locally running Sonarr on my own machine:

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                    141          6ed9f000 (   1.732 GB)           86.61%
Image                                   640           a22b000 ( 162.168 MB)  59.12%    7.92%
<unknown>                               575           38f2000 (  56.945 MB)  20.76%    2.78%
MappedFile                               26           1de3000 (  29.887 MB)  10.89%    1.46%
Stack                                    60           1340000 (  19.250 MB)   7.02%    0.94%
Heap                                     20            598000 (   5.594 MB)   2.04%    0.27%
Other                                     7             44000 ( 272.000 kB)   0.10%    0.01%
TEB                                      20             32000 ( 200.000 kB)   0.07%    0.01%
PEB                                       1              3000 (  12.000 kB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free                                         f4aa000          41f76000 (   1.031 GB)
Image                                       51773000            e7c000 (  14.484 MB)
<unknown>                                    3c8d000            f63000 (  15.387 MB)
MappedFile                                   1814000           113c000 (  17.234 MB)
Stack                                         d10000             fd000 (1012.000 kB)
Heap                                         7b33000            13c000 (   1.234 MB)
Other                                       7ef40000             23000 ( 140.000 kB)
TEB                                           9c4000              3000 (  12.000 kB)
PEB                                           9c1000              3000 (  12.000 kB)

Two things: Usage Summary -> Free is 1.7 GB, so there’s a lot of free space (divided among 141 contiguous regions). And Largest Region by Usage -> Free is 1.031 GB, so the largest single piece of available free memory is over a gigabyte.

Now, this is from your memory dump:

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                  25784          5e16c000 (   1.470 GB)           73.51%
<unknown>                             26277           e0a6000 ( 224.648 MB)  41.41%   10.97%
Image                                   967           bc94000 ( 188.578 MB)  34.76%    9.21%
Heap                                    579           6f76000 ( 111.461 MB)  20.55%    5.44%
Stack                                    57           1180000 (  17.500 MB)   3.23%    0.85%
Other                                     9             40000 ( 256.000 kB)   0.05%    0.01%
TEB                                      19             13000 (  76.000 kB)   0.01%    0.00%
PEB                                       1              1000 (   4.000 kB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free                                               0             10000 (  64.000 kB)
<unknown>                                    2170000           1498000 (  20.594 MB)
Image                                       64d93000            e7b000 (  14.480 MB)
Heap                                        6a12d000            be3000 (  11.887 MB)
Stack                                        7520000             fc000 (1008.000 kB)
Other                                       7f770000             23000 ( 140.000 kB)
TEB                                         7f549000              1000 (   4.000 kB)
PEB                                         7f79f000              1000 (   4.000 kB)

Same two statistics: Usage Summary -> Free is 1.47 GB, so there’s a lot of free space, but it’s divided among a whopping 25784 regions. And Largest Region by Usage -> Free is 64 KB, so the largest single piece of available free memory is virtually nothing.

This is what’s causing the Out of Memory exception: it’s not really running out of memory… it just doesn’t have a big enough contiguous piece of it.

So let’s dive a little deeper. Here’s a small piece of the full memory allocation table:

+ 70f40000 70f41000     1000 MEM_PRIVATE MEM_COMMIT  PAGE_READWRITE                     <unknown>  [................]
+ 70f41000 70f50000     f000             MEM_FREE    PAGE_NOACCESS                      Free       
+ 70f50000 70f51000     1000 MEM_PRIVATE MEM_COMMIT  PAGE_READWRITE                     <unknown>  [................]
+ 70f51000 70f60000     f000             MEM_FREE    PAGE_NOACCESS                      Free       
+ 70f60000 70f61000     1000 MEM_PRIVATE MEM_COMMIT  PAGE_READWRITE                     <unknown>  [................]
+ 70f61000 70f70000     f000             MEM_FREE    PAGE_NOACCESS                      Free       
+ 70f70000 70f71000     1000 MEM_PRIVATE MEM_COMMIT  PAGE_READWRITE                     <unknown>  [................]
+ 70f71000 70f80000     f000             MEM_FREE    PAGE_NOACCESS                      Free       
+ 70f80000 70f81000     1000 MEM_PRIVATE MEM_COMMIT  PAGE_READWRITE                     <unknown>  [................]
+ 70f81000 70f90000     f000             MEM_FREE    PAGE_NOACCESS                      Free       
+ 70f90000 70f91000     1000 MEM_PRIVATE MEM_COMMIT  PAGE_READWRITE                     <unknown>  [................]
+ 70f91000 70fa0000     f000             MEM_FREE    PAGE_NOACCESS                      Free       

As you can see, at the start of every 0x10000 (64 KB) region there’s a 0x1000 (4 KB) block of memory committed (MEM_COMMIT) by ‘something’.

It’s kind of like throwing needles all over the floor: they won’t cover every inch of the floor, but no matter where you plant your foot, you’re going to step on one of them.
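To make the needle analogy concrete, here’s a small hypothetical Python sketch (not part of the dump tooling; the names largest_free_run, GRANULE, and BLOCK are made up for illustration) that models that allocation pattern — one 4 KB committed block at the start of every 64 KB granule — and computes the largest contiguous free run left over:

```python
# Hypothetical model of the pattern in the allocation table above:
# a 4 KiB committed block ("needle") at the start of every 64 KiB granule.
GRANULE = 0x10000   # 64 KiB Windows allocation granularity
BLOCK = 0x1000      # 4 KiB committed block

def largest_free_run(total_size, committed_starts):
    """Return the largest contiguous free gap, given sorted block start addresses."""
    largest, prev_end = 0, 0
    for start in committed_starts:
        largest = max(largest, start - prev_end)
        prev_end = start + BLOCK
    return max(largest, total_size - prev_end)

# Pepper a 256 MiB slice of address space with one needle per granule:
total = 0x10000000
needles = list(range(0, total, GRANULE))
print(hex(largest_free_run(total, needles)))  # prints 0xf000 (60 KiB)
```

Even though the needles only commit 1/16th of the space, the biggest contiguous free run anywhere in that 256 MiB slice is 0xf000 bytes — which lines up with the 64 KB largest free region reported in the dump above.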

Anyway, that ‘something’ could be Bitdefender, or something else. It’s possible to find out what allocated those regions of memory, but not from the dump I have at the moment (by default, memory allocations aren’t ‘tracked’).
By the way, those 4 KB blocks of memory consist entirely of zeroes.

It’s definitely not Sonarr itself, but it could be a native library Sonarr uses; my bet is still on Bitdefender, though.

Thanks for the information. I removed Bitdefender from the machine and replaced it with a copy of Symantec so I still have AV on that system. I will let you know what happens. I do have procdump running as well.

The only thing I find odd is that I had Bitdefender running on this machine alongside Sonarr for almost a year with no problems. Then the July patch cycle from Microsoft hit, and that’s when the issues started. I suppose it’s possible one of the patches did something, but I have no way to prove that.

That occurred back when the machine was running Windows 7. I rebuilt it with a 2012 R2 OS, reinstalled Sonarr, and migrated the library. That did not change the symptoms.

Definitely a pain to try to trace this one down.

Hopefully the Bitdefender theory ends up nailing it down.

Thanks again for the help.

Definitely a pain, but now it’s just a matter of ticking off a checklist.

Stuff will get increasingly technical further down the list. That’s why I’m starting with Bitdefender: no guarantee that’s the cause, but it’s the easiest to test.
Next would be to attach leaktrack to start tracing memory allocations, but I’ll explain that when we get to it.

So far it’s been 2 days with no crashes. I have seen it make it that far before, so I’m not saying with 100% certainty that this was it yet. I’ll keep monitoring and let you guys know.

Thanks again for all the time and effort.

Thanks for letting us know, I’m cautiously optimistic. We’ll know for sure in a few days, I guess.

Hey guys… just an FYI that I’m still running stable here after almost a week. This is the longest I’ve been without a crash in months.

It’s definitely looking like this one is figured out.


Two more days of stability, guys. That’s 9 days… no crashes.

Sounds like we got it. Anti-virus strikes again and breaks something that’s not a virus :slight_smile:

Yes indeed.

Another 5 days gone by and still completely stable.

Thanks again guys!
