Torrent download ends in hung mono and 225 extra processes on Ubuntu 14

Current develop b4c628
Ubuntu 14, fully updated, freshly installed in a VM (KVM/Proxmox)
mono 3.10.0 (the same thing happened with the earlier mono version that was supplied)

Snippet from the log (level set to Trace):

14-12-9 22:34:43.9|Debug|VideoFileInfoReader|Getting media info from /root/Downloads/zzz.S01E04.720p.HDTV.X264-DIMENSION[et]/zzz.S01E04.720p.HDTV.X264-DIMENSION.mkv
14-12-9 22:34:43.9|Debug|SampleService|Runtime is over 90 seconds
14-12-9 22:34:43.9|Trace|ConfigService|Unable to find config key 'downloadclientworkingfolders' defaultValue:'_UNPACK_|_FAILED_'
14-12-9 22:34:43.9|Debug|Parser|Parsing string 'zzz.S01E04.720p.HDTV.X264-DIMENSION[et]'

14-12-9 22:34:43.9|Debug|Parser|Episode Parsed. zzz- S01E04
14-12-9 22:34:43.9|Debug|Parser|Language parsed: English
14-12-9 22:34:43.9|Debug|NzbDrone.Core.Parser.QualityParser|Trying to parse quality for zzz.S01E04.720p.HDTV.X264-DIMENSION[et]
14-12-9 22:34:43.9|Debug|Parser|Quality parsed: HDTV-720p v1
14-12-9 22:34:43.9|Debug|Parser|Release Group parsed: DIMENSION
14-12-9 22:34:43.9|Debug|EpisodeFileMovingService|Copying episode file: [0]  to /mnt/mede8er/TV Series/zzz/Season 1/zzz.S01E04.720p.HDTV.X264-DIMENSION[et].mkv
14-12-9 22:34:43.9|Trace|ConfigService|Unable to find config key 'copyusinghardlinks' defaultValue:'False'
14-12-9 22:34:43.9|Debug|EpisodeFileMovingService|Copy [/root/Downloads/zzz.S01E04.720p.HDTV.X264-DIMENSION[et]/zzz.S01E04.720p.HDTV.X264-DIMENSION.mkv] > [/mnt/mede8er/TV Series/zzz/Season 1/zzz.S01E04.720p.HDTV.X264-DIMENSION[et].mkv]
14-12-9 22:35:00.0|Trace|EventAggregator|Publishing UpdateQueueEvent
14-12-9 22:35:00.0|Trace|EventAggregator|UpdateQueueEvent -> QueueModule
14-12-9 22:35:00.0|Trace|EventAggregator|UpdateQueueEvent <- QueueModule
Waiting for data... (interrupt to abort)

This is the end of the log
(had to remove the regex parser log statement that confused this editor)

The Health page, which was open at the time, displayed the following:

Unable to communicate with download client Object reference not set to an instance of an object

About 225 kworker processes were created during that minute:

root      1864  0.0  0.0      0     0 ?        S    22:35   0:00 [kworker/0:122]
root      1865  0.0  0.0      0     0 ?        S    22:35   0:00 [kworker/0:123]
root      1866  0.0  0.0      0     0 ?        S    22:35   0:00 [kworker/0:124]
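
For anyone who wants to put a number on the kworker count rather than eyeballing the listing, here is a minimal sketch in Python. It only assumes the `ps aux` layout shown above, with the command name in square brackets at the end of the line; the function name and the sample text are made up for illustration.

```python
import re

# Matches a `ps aux` line whose command column is a kworker kernel thread,
# e.g. "[kworker/0:122]" at the end of the line.
KWORKER_RE = re.compile(r"\[kworker/[^\]]+\]\s*$")


def count_kworkers(ps_output):
    """Return how many lines of `ps aux` output are kworker kernel threads."""
    return sum(1 for line in ps_output.splitlines() if KWORKER_RE.search(line))


# Sample lines copied from the listing in this post.
sample = """\
root      1864  0.0  0.0      0     0 ?        S    22:35   0:00 [kworker/0:122]
root      1865  0.0  0.0      0     0 ?        S    22:35   0:00 [kworker/0:123]
root      1866  0.0  0.0      0     0 ?        S    22:35   0:00 [kworker/0:124]
root       893  3.2  3.9 1078068 158692 ?      Ssl  22:26   1:02 mono --debug /opt/NzbDrone/NzbDrone.exe
"""

print(count_kworkers(sample))
```

On a live system the same function could be fed the output of `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`, taken before and after a download finishes, to see whether the count really jumps.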

The mono process looks healthy when checked with ps aux (yes, it is running as root to make sure that's not the problem):

root       893  3.2  3.9 1078068 158692 ?      Ssl  22:26   1:02 mono --debug /opt/NzbDrone/NzbDrone.exe

When I try to kill the mono process, I get a zombie:

root@tv3:~# kill -9 893
root@tv3:~# ps aux | grep mono
root       893  3.1  0.0      0     0 ?        Zsl  22:26   1:02 [mono] 

As with most zombies, nothing but a reboot can kill it.
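
One thing that might help narrow this down is checking the state of each thread inside the mono process. This is a rough sketch in Python, not something from Sonarr itself. It assumes a Linux `/proc` filesystem and works on the guess (not confirmed anywhere in this thread) that one thread is stuck in uninterruptible sleep, state `D`, while the rest of the process has already been killed. The function names are made up for illustration.

```python
from pathlib import Path
from typing import Dict


def parse_state(stat_text):
    """Extract the one-letter state from the contents of a /proc/<pid>/stat file.

    The format is "pid (comm) state ...". The command name may itself contain
    spaces or parentheses, so split after the LAST closing parenthesis.
    """
    after_comm = stat_text[stat_text.rfind(")") + 1:]
    return after_comm.split()[0]


def thread_states(pid: int) -> Dict[int, str]:
    """Map each thread id of `pid` to its state letter (R, S, D, Z, ...)."""
    states = {}
    try:
        tasks = list(Path(f"/proc/{pid}/task").iterdir())
    except OSError:
        return states  # process is gone or /proc is unavailable
    for task in tasks:
        try:
            states[int(task.name)] = parse_state((task / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # thread exited while we were reading it
    return states


if __name__ == "__main__":
    # 893 is the mono PID from the ps output in this post.
    for tid, state in sorted(thread_states(893).items()):
        print(tid, state)
```

If any thread shows up as `D`, that would point at blocked I/O rather than at mono itself, but that is only a hypothesis until someone checks.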

Is there anything else I can do to help trace this?

Kind Regards,
Thomas

Update: Tested both with transmission-daemon and deluged

Did you compile Sonarr yourself? The actual version number is much more meaningful for us (should be 2.0.0.2408 in this case).

3.2.8? Where did you get 3.10 from? We include 3.10 in our repo, so it’s all there when you apt-get install nzbdrone.

Pastebin is the best place for logs; it’s just plain text and the thread doesn’t fill up with logs.

I am curious why we’re not logging anything here, I’d expect more info in the logs but don’t see anything, I’ll have to take a look.

How many torrents are being downloaded/seeded?

Deluge’s Web API is pretty unreliable (though we have previously implemented some fixes), but nothing to the extent of it crashing.

Running mono with the --debug switch is good (I see that’s enabled). If we can narrow down where it’s hanging (does it seem to hang in the same spot?), then --trace with some filters would probably be a good way to track it down (@Taloth should be able to help there).

Nothing here jumps out as something we should be digging into to troubleshoot, so at this point trying to figure out if it fails in the same place is a good start.

Is /mnt/mede8er a local disk or a network drive?

Version 2.0.0.2407 (installed with apt)

Compiled mono myself after first crash.

I was only downloading 2 torrents at the time. The first one to finish had both a sample and a normal file, not packed.

/mnt/mede8er is a network drive connected to the media player's disk.

I added some additional logging to figure out the underlying error with the download client health check (it will log an error with more details).

We’ve noticed issues with SMB shares and mono. We’re working on a means to detect and prevent them, but as of right now they still exist.
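
In the meantime, a quick way to check whether a destination folder sits on a network share is to look it up in `/proc/mounts`. The Python sketch below is only an illustration and is not how Sonarr detects this. The function names, the `NETWORK_FS` set, and the sample mount line (including the `//mede8er/share` device name) are assumptions made for the example.

```python
import os

# Filesystem types treated as "network" for this sketch (an assumption).
NETWORK_FS = {"cifs", "smbfs", "smb3", "nfs", "nfs4"}


def fs_type_for_path(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the contents of /proc/mounts, where each line reads
    "device mount_point fs_type options dump pass". The longest matching
    mount point wins; on a tie, the later entry wins because it shadows
    the earlier one.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Spaces in mount points are escaped as \040 in /proc/mounts.
        mount_point = fields[1].replace("\\040", " ")
        contains = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point.rstrip("/") + "/")
        )
        if contains and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fields[2]
    return best_type


def is_network_path(path, mounts_text):
    """True if `path` lives on one of the filesystem types in NETWORK_FS."""
    return fs_type_for_path(path, mounts_text) in NETWORK_FS


# Hypothetical mount table for illustration only.
sample_mounts = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "//mede8er/share /mnt/mede8er cifs rw 0 0\n"
)

print(is_network_path("/mnt/mede8er/TV Series/zzz/Season 1", sample_mounts))
print(is_network_path("/root/Downloads", sample_mounts))
```

On a real machine, pass `open("/proc/mounts").read()` instead of the sample text. If the series folder comes back as `cifs`, copying to a local disk first would be one way to test whether the share is what triggers the hang.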