Alternative to hard linking and copying

bobbintb · June 9, 2016, 2:58am

I know this is a long shot but I was hoping for some ideas. I have a Linux server that Sonarr downloads to. The server provides a share that is composed of a number of disks and represented as one large disk. If I copy files directly to the share it puts them on one of those disks, depending on things like free space and such. So for technical reasons, I cannot use hard links on the share. I have a lot of files though so this is taking up a huge amount of unnecessary space because it is copying. I have 8 disks composing that share. I can however use hard links on the individual drives themselves. The problem with using an individual disk from that share is that it makes managing shows in Sonarr impossible because right now one show might have files spread across all disks so I would have to move data all around to make sure every file for that show is contained on one disk. Even then I would have multiple disks to manage and keep track of, especially if drives start filling up, which kind of defeats the purpose of having a virtual disk share in the first place. Ideally, Sonarr would just move the files instead of copying (I don’t do any renaming) and continue seeding, but I know that’s not possible. I am hoping someone can help me come up with a creative solution.

lordjynx · June 9, 2016, 4:28am

I’m kinda side-stepping your question and it’s not creative but are all the drives the same size? If so, I’d just use mdadm to RAID them or try an XFS file system… sounds easier to me than having them scattered all over the place!

loco88 · June 10, 2016, 12:45am

If you didn’t want to be the one manually managing and your disks are all different sizes, you could use greyhole (https://www.greyhole.net/) to do the managing of file distribution shares across disks easily enough. Should then make it easy enough to manage your seeding/symlinks assuming I understand your issue correctly.

bobbintb · June 10, 2016, 3:47pm

The drives aren’t the same size, and they are already XFS. It’s basically a software RAID, but I can still access the disks individually.

I think I may have come up with a solution but it depends on when the custom post processing script triggers and whether I can pass values back. My thinking is that before Sonarr copies the file, I can run a script that will find out which disk behind the share the file is actually located on and just change the source location to that. Then a hard link will work fine.

So then, at what point in the process is a custom post processing script run, and can a script pass values back to Sonarr?

markus101 · June 10, 2016, 5:07pm

It runs after the import of the file, and you cannot pass values back to Sonarr.

You could script something that does the importing and then tells Sonarr to update the series, and turn off Completed Download Handling in Sonarr.

bobbintb · June 10, 2016, 7:34pm

Well, I know it’s after it imports but is that before the copy/link process? I guess I’m just not clear on if copying/linking is part of the import process or not. It also looks like Sonarr uses environment variables to pass values to the script but does Sonarr use those variables itself during processing or does it simply populate those values from its own internal variables? My thinking is since the script can’t pass variables back to Sonarr, maybe the script can change the environment variable, assuming Sonarr uses them to do its processing, which I doubt is the case but I thought it was worth asking.

markus101 · June 10, 2016, 7:36pm

The import is the copy/hardlink/move operation.

It uses its own values, those environment variables are set for the process running the script only.
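
To illustrate the point about environment variables: a child process gets its own copy of the environment, so anything a script changes there never reaches the process that launched it. Below is a minimal Python sketch of that behaviour. `DEMO_EPISODE_PATH` is a made-up name for the demonstration, not a variable Sonarr is known to set, and the paths are placeholders.

```python
import os
import subprocess
import sys

# Hypothetical variable name, used only for this demonstration.
VAR = "DEMO_EPISODE_PATH"


def run_child_that_overwrites(value_for_child: str) -> str:
    """Start a child process that overwrites VAR in its own environment
    and prints the new value. Returns what the child printed."""
    child_code = (
        "import os\n"
        f"os.environ[{VAR!r}] = '/mnt/disk3/changed.mkv'\n"
        f"print(os.environ[{VAR!r}])\n"
    )
    # The child receives a copy of the parent's environment plus VAR.
    env = dict(os.environ, **{VAR: value_for_child})
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


os.environ[VAR] = "/mnt/users/downloads/original.mkv"
child_saw = run_child_that_overwrites(os.environ[VAR])
# The parent's value is untouched by whatever the child did.
parent_still_has = os.environ[VAR]
```

The child sees its own modified value, while `parent_still_has` keeps the original path, which is why a post-processing script cannot steer Sonarr by rewriting its environment.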

bobbintb · June 10, 2016, 7:42pm

Ok, I was afraid of that. Looks like there are no good options then. Thanks.

I don’t quite understand what you mean here:

[quote=“markus101, post:5, topic:11069, full:true”]You could script something that does the importing and then tells Sonarr to update the series, and turn off Completed Download Handling in Sonarr.
[/quote]

Could you elaborate? What would be the outcome of that?

markus101 · June 10, 2016, 7:57pm

Have something else do the sorting, maybe something custom because of the requirements, and have that tell Sonarr to update its library after the file is copied (so Sonarr knows the file is there and doesn’t need to wait up to 12 hours to find out).

Whatever does the sorting would need to hardlink the file to the correct location (or you could move the download file and re-link it in the download client if you prefer) and then tell Sonarr to rescan the series. Turning off CDH is important so Sonarr doesn’t try to import it.
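
The two steps described here (hard link the file into the library, then ask Sonarr to rescan) could be sketched in Python roughly as follows. This is only a sketch under assumptions: the `/api/command` endpoint, the `RescanSeries` command name and the `X-Api-Key` header reflect my understanding of Sonarr's API and should be checked against your version. The base URL, API key and series ID are placeholders, and both function names are invented for this example.

```python
import json
import os
import urllib.request


def hardlink_into_library(source: str, dest_dir: str) -> str:
    """Hard link `source` into `dest_dir`, keeping the file name.
    Both paths must live on the same filesystem (the same physical disk)."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(source))
    if not os.path.exists(dest):
        os.link(source, dest)
    return dest


def build_rescan_request(base_url: str, api_key: str,
                         series_id: int) -> urllib.request.Request:
    """Build (but do not send) a request asking Sonarr to rescan one series.
    Endpoint and command name are assumptions; verify them for your install."""
    body = json.dumps({"name": "RescanSeries", "seriesId": series_id})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/command",
        data=body.encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result of `build_rescan_request(...)` to `urllib.request.urlopen` would send it. Keeping the request construction separate from the sending makes the script easy to dry-run before pointing it at a live Sonarr instance.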

bobbintb · June 10, 2016, 8:47pm

Ah, ok, I think I get it. I was confusing CDH with the Completed Download Handling - Remove option. So just to make sure I’m understanding it correctly, if CDH is off, it will simply send the torrent to the torrent client and that’s it, right? But it will also run the post processing script?

If I understand that correctly, with a little work, I think that’s the option that I have been looking for. But when you say “script something that does the importing” are you talking about the actual script doing the importing or are you talking about the script triggering Sonarr to import?

What I understand you to be saying is turn off CDH, use a custom script to find the drive the show resides on and then trigger CDH from the script on the correct location.

I hope I’m making sense and not over explaining myself. It’s Friday and my words have not been the best today.

markus101 · June 10, 2016, 11:08pm

Correct.

No, because nothing will be imported.

The script will need to do the importing, if it tells Sonarr to import the file (which is possible) Sonarr will decide where to put it.

No, the script will need to do the moving/copying/hardlinking of the file. The benefit of that script talking to Sonarr at all is to tell Sonarr it was imported (instead of seeing the file as missing); that step is optional, but would be better overall. The approach is advanced and unique to your setup.

I’m curious about the share that you’re using though: if the torrents were downloaded to the share, would hardlinking work? I have a similar limitation due to Drive Bender not supporting hardlinks; since I rename, only symlinks would be viable for me though.

bobbintb · June 11, 2016, 1:27am

Ok, if post processing doesn’t happen unless I import, how would my script even be triggered? Are you talking about a cronjob or something?

Ah, hadn’t thought of that. My thinking was, since I know anything done in the webui can be done in the API, I could have the script trigger a manual import and point it to the file on the actual disk behind the share, without realizing I wouldn’t be able to have control over the destination. There doesn’t seem to be a way to dictate that, which is fine if that’s how it is. Just trying to find the least invasive option. In that case, could you tell me where in the source code the import function is? That way I can mimic the process instead of having to start from scratch.

Actually it’s the opposite. I’ll try not to over explain it. The software RAID system is called UnRAID if you’re curious. It provides parity and drive pooling like a RAID. I have 8 disks and 2 parity disks. The 8 disks are under /mnt/diskX/ with X, of course being the drive number. Those disks can be hard linked. Any folder on the root of those drives will create a virtual share (or I guess a drive pool is a better way to look at it) that will be mounted under /mnt/users/name_of_share. So say you have files /mnt/disk1/downloads/episode1.mkv and /mnt/disk2/downloads/episode2.mkv. Both of those files will show up in the downloads share mounted under /mnt/users/downloads/ even though they are on different drives. Anything under /mnt/users/ cannot be hard linked for obvious reasons. However any hard links made from the drives themselves will work just fine and still be valid under the user share since the hard linking doesn’t affect pooling.
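
For background on why this works per disk but not across the pool: a hard link is just a second directory entry pointing at the same inode, and inodes only have meaning within one filesystem. The short Python sketch below shows that on a single filesystem, using a temporary directory as a stand-in for one disk. The file names are illustrative only.

```python
import os
import tempfile


def make_hardlink_and_inspect(directory: str) -> dict:
    """Create a file and a hard link to it inside `directory`,
    then report whether both names refer to the same inode."""
    original = os.path.join(directory, "episode1.mkv")
    linked = os.path.join(directory, "episode1-linked.mkv")
    with open(original, "wb") as f:
        f.write(b"video data")
    os.link(original, linked)  # second name, same data, no extra space
    a, b = os.stat(original), os.stat(linked)
    return {
        "same_inode": a.st_ino == b.st_ino and a.st_dev == b.st_dev,
        "link_count": a.st_nlink,
        "same_file": os.path.samefile(original, linked),
    }


with tempfile.TemporaryDirectory() as tmp:
    info = make_hardlink_and_inspect(tmp)
```

If the source and destination were on different filesystems, `os.link` would raise `OSError` (errno `EXDEV`) instead of creating the link.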

The way I have it set up now, I download directly to the share because I have one drive to deal with and I can see everything on all 8 drives. This won’t allow me to hard link however. I could hard link if I downloaded to a specific disk instead of a share but this creates numerous management issues such as that disk filling up much faster and the others not being filled at all because everything is going to one disk instead of being spread out. Or if I had shows A-D go to disk 1, E-H to disk 2, etc, well this creates different management issues, as does copying, that I have to deal with when the whole point of my setup is to lessen management. The ideal solution would be to download directly to the user share but then have a process figure out which disk the file is actually on and hard link that.
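
The "figure out which disk the file is actually on" step could look something like the Python sketch below. It assumes the layout described in this thread, where a file visible under the share root exists at the same relative path under exactly one disk root. The function name and its parameters are invented for this example.

```python
import os
from typing import Iterable, Optional


def find_physical_path(share_path: str, share_root: str,
                       disk_roots: Iterable[str]) -> Optional[str]:
    """Map a path under the pooled share to the matching path on a
    physical disk, or return None if no disk holds that file."""
    rel = os.path.relpath(share_path, share_root)
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{share_path!r} is not under {share_root!r}")
    for disk_root in disk_roots:
        candidate = os.path.join(disk_root, rel)
        if os.path.exists(candidate):
            return candidate  # first disk that actually holds the file
    return None
```

With the paths from the post above, calling it with the share root `/mnt/users` and disk roots such as `/mnt/disk1`, `/mnt/disk2` and so on would turn `/mnt/users/downloads/episode2.mkv` into `/mnt/disk2/downloads/episode2.mkv`, which can then be hard linked to a destination on that same disk.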

Actually, this gives me another idea that might work. What does Sonarr use to hard link? I imagine it just uses whatever is baked into the OS, right? I’m on Linux so would it just use ln? If I cannot change the values Sonarr uses to hard link, maybe I can use some kind of man-in-the-middle between Sonarr and ln. Does that sound feasible?

markus101 · June 11, 2016, 1:50am

Yeah, something like a cronjob.

Which part of it? It’s in several places depending on what it’s doing, but I don’t think any of it will be all that useful.

Interesting, I’ve used unRAID before and was looking at using it again, but wasn’t sure if/how hardlinks worked, which for my use would need to work across the share, which won’t work as you described.

Doesn’t downloading straight to the share add a lot of overhead, since parity calculations are being performed as writes are happening (unless you’re using a cache drive of course)?

It’s all done through mono; we don’t explicitly call ln, so you can’t MITM the calls to ln with your own app.

bobbintb · June 11, 2016, 3:18am

[quote=“markus101, post:13, topic:11069”]
Which part of it? It’s in several places depending on what it’s doing, but I don’t think any of it will be all that useful.[/quote]

The part that does the regex match to the filename and then decides where to finally link/copy/move it to.

Yeah, it makes the writing process slower, but so does writing it to the cache drive first and then copying it to the share. The cache drive is really meant to speed up the write process and delay writing to the array until a time when it’s more convenient. The download speed is slower than writing to the user share anyway so it doesn’t really affect anything for me.

Well, mono then. Going back to my question about the source code, where would that be? I think if I can find what command Sonarr sends to mono to hard link, I could MITM it by having a script that forwards all commands that it receives from Sonarr to mono (in case Sonarr sends commands other than hard links, which I’m sure it does) but will also reformat any commands to hard link. Sonarr is in a docker container so there are no worries with it messing with another program.

markus101 · June 11, 2016, 9:18am

The parsing is done here: https://github.com/Sonarr/Sonarr/blob/develop/src/NzbDrone.Core/Parser/Parser.cs but there are other parts that determine which series and episode it is (ParsingService).

Mono is how Sonarr runs on non-Windows machines; it’s not a command that is called. Here is where the hardlink is created: https://github.com/Sonarr/Sonarr/blob/develop/src/NzbDrone.Mono/DiskProvider.cs#L119
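
For a custom sorting script, the filename matching does not need to mirror Sonarr's parser. A much simpler pattern may be enough for well-behaved release names. The Python sketch below is not taken from `Parser.cs`; it only handles the common `Title.S01E02` form, and the pattern and function name are made up for illustration.

```python
import re
from typing import Optional, Tuple

# Deliberately simple: title, then a separator, then SxxEyy.
EPISODE_PATTERN = re.compile(
    r"^(?P<title>.+?)[ ._-]+S(?P<season>\d{1,2})E(?P<episode>\d{1,3})",
    re.IGNORECASE,
)


def parse_episode(filename: str) -> Optional[Tuple[str, int, int]]:
    """Return (series title, season, episode) or None if nothing matches."""
    match = EPISODE_PATTERN.match(filename)
    if match is None:
        return None
    # Turn dots and underscores in the title back into spaces.
    title = re.sub(r"[._]+", " ", match.group("title")).strip()
    return title, int(match.group("season")), int(match.group("episode"))
```

Anything this pattern fails to match (daily shows, anime numbering, multi-episode files) would need extra handling or could simply be left for a manual import.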

bobbintb · June 11, 2016, 11:54pm

Oh, duh. I don’t know what I was thinking but I should have realized what you meant as soon as you said mono. I was just tired and didn’t catch it. I know you don’t call commands from mono/.NET like that.

Anyway, thanks for all the help/suggestions. I think I can come up with a workable solution. Also, I know my use case scenario is rather unique but I wonder if it might be worth a feature request to have the post-processing script run even if CDH is turned off. There aren’t many reasons one would choose to turn CDH off, but I would imagine most if not all of those reasons would be because something else is handling the file. In that case being able to trigger that process with a post-processing script would be useful, as opposed to a cron job or watch folder or something. I also wouldn’t mind the ability for scripts to pass information back to Sonarr.

markus101 · June 12, 2016, 5:24pm

It runs when Sonarr imports files from a place other than the series folder: manual import, CDH, drone factory, or when told to import through the API. I don’t see how an option would help here. If Sonarr isn’t doing the importing, what would the script do, and how would it know what parameters to use? At that point you might as well call the script yourself.

bobbintb · June 13, 2016, 3:53am

Not trying to beat a dead horse here but I had another idea that I wanted to run by you markus101. At what point does Sonarr know where the file is located? When the torrent client reports to it after the download is finished? My thought is I can run a script from rtorrent, assuming I can ever figure out rtorrent scripting, so that when the download completes it will find the correct disk and change the save location from the user share to the actual disk. Then Sonarr will be able to hard link. There are plenty of different events that I could potentially use to trigger it, I just need to make certain that it happens before the information gets sent to Sonarr. Does that sound like it could work?

markus101 · June 13, 2016, 9:00pm

It knows while it’s downloading as well, but if it changed before Sonarr knew it was completed then it wouldn’t matter; Sonarr would just see it at the new location.

Definitely important, but if it moved and Sonarr failed to find the file it would try again a few minutes later, the issue would be partial files getting imported. If you could figure it out earlier (when the torrent is added) then Sonarr would never be able to import a partial file.

Seems reasonable, no clue how powerful rtorrent scripting is, but it seems plausible.

bobbintb · June 14, 2016, 1:47am

Ok, great, that’s kind of what I thought. There are a lot of events rtorrent uses so I don’t think it will be an issue to find some kind of “on torrent add” trigger. The only issues are rtorrent is essentially downloading 2 files, the .torrent file from the magnet link and the video files. If I’m not careful, the magnet link might cause problems. The other issue is there is virtually no documentation on rtorrent scripting and the forum (for rutorrent actually) is not accessible to new users as the activation emails never get sent, at least for the half dozen email addresses and providers I’ve tried anyway.

Anyway, thanks again for all the help. The level of support you consistently provide is above and beyond.