Foreign language in episode title

It’s kinda funny, then again not funny at all. You guys should know how to create good regex.

This here “Madam.Secretary.S02E09.Russian.Roulette.1080p.WEB-DL.DD5.1.H264-NTb” flags as Russian and won’t download…

Release Rejected
English is wanted, but found Russian

It’s funny, that’s not the fault of regex at all.

This is a tough edge case to handle. For a human it’s easy: we know that’s the episode title (or at least that there’s a high chance it is). But Sonarr parses the release name before it knows whether the release matches a known series and episode, so at that point it doesn’t know.
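To illustrate the problem, here is a rough sketch of a context-free language check. It is not Sonarr’s actual parser; the word list and the `detect_language` name are made up for the example. Without knowing the series or episode title, the word “Russian” looks the same whether it is a language tag or part of a title.

```python
import re

# Hypothetical word list for illustration only; not Sonarr's real list.
LANGUAGE_WORDS = ("german", "russian", "flemish", "dutch", "french")
LANGUAGE_RE = re.compile(r"\b(" + "|".join(LANGUAGE_WORDS) + r")\b", re.IGNORECASE)


def detect_language(release_name: str) -> str:
    """Return the first language word found in the name, else 'english'."""
    # Treat dots and underscores as word separators.
    normalized = re.sub(r"[._]", " ", release_name)
    match = LANGUAGE_RE.search(normalized)
    return match.group(1).lower() if match else "english"


# The episode title "Russian Roulette" is indistinguishable from a language tag.
print(detect_language("Madam.Secretary.S02E09.Russian.Roulette.1080p.WEB-DL.DD5.1.H264-NTb"))
# A genuinely dubbed release is flagged by the same rule.
print(detect_language("Madam.Secretary.S01E16.Tamerlane.German.DL.Dubbed.WEBRiP.x264"))
```

The first call returns `russian` and the second `german`, which is the behaviour reported at the top of this thread.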

But it’s easy: if the series or episode title has a language word, ignore it, right?

Nope, not that easy. What if neither of those titles exists in the file/release name? For most releases the episode title isn’t included.

Can we make an educated guess if the episode title is part of the release name?

Probably, and we do that for specials, but it’s unreliable and we miss a lot because it’s all guesswork. Since we have to make a decision, we err on the side of caution and block a few more releases (the number is low, given how few releases have a language name in the episode title). If the release is truly wanted, manually searching for it is always an option, and that beats grabbing a bunch of wrong-language releases and replacing valid files because we guessed wrong.

We do have plans to improve parsing overall, and it’d be reasonable to consider this example, but it’s not a simple change; it’s something we’ve talked about at length over the course of months. Improvements will come, and there have been other posts on the same subject, so we’re aware.

Well, I don’t know what kind of release names you test against. But for the most part, normal filenames start with:

Name Season Episode
or
Name Season Episode Title
or
Name (year) Season Episode
or
Name (year) Season Episode Title
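The layouts above can be sketched as a positional regex. This is only an illustration of the idea, not anything Sonarr uses; the pattern, the `SxxEyy` episode format, and the `parse_release` name are assumptions for the example.

```python
import re
from typing import Optional

# Name, optional "(year)", then SxxEyy, then whatever follows.
RELEASE_RE = re.compile(
    r"^(?P<name>.+?)"
    r"(?:[. ]\((?P<year>\d{4})\))?"
    r"[. ]S(?P<season>\d{2})E(?P<episode>\d{2})"
    r"(?:[. ](?P<rest>.*))?$",
    re.IGNORECASE,
)


def parse_release(release_name: str) -> Optional[dict]:
    """Split a release name by position; return None if the layout doesn't match."""
    match = RELEASE_RE.match(release_name)
    return match.groupdict() if match else None


parsed = parse_release("Madam.Secretary.S02E09.Russian.Roulette.1080p.WEB-DL.DD5.1.H264-NTb")
print(parsed["name"], parsed["season"], parsed["episode"])
print(parsed["rest"])
```

Position alone gets the name, season and episode, but everything after the episode number comes back as one undifferentiated `rest` string. Here it starts with `Russian.Roulette`, while in “Madam.Secretary.S01E16.Tamerlane.German.DL.Dubbed.WEBRiP.x264” a real language tag sits in the same region, so the pattern by itself does not say where a title ends and the tags begin.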

I could give examples of how to write regex that looks for “matches” considering “placement”, or perhaps it’s called “look ahead”; I’m unfamiliar with the lingo. But the reason I wrote this post in the first place is that I can make way better regex than that, and I know shit about coding. Given that Sonarr does everything else very well, I was surprised you guys failed at this.

Sonarr does not have to match it against anything. Well, none other than preset rules, of course. Like what a filename is, in sections.

I could give examples, try to help out but then I would probably get banned here as well, so I stay out of it.

You’re oversimplifying it. It’s not that easy: release names are not that consistent, and there isn’t an easy way to tell whether the portion after the episode number is a title or codec/quality information.

Here is all the regex that we use for parsing:

And here are all the tests we use to validate it:

If you think you can solve this with regex, please contribute: https://github.com/Sonarr/Sonarr/blob/develop/CONTRIBUTING.md

I could give examples, try to help out but then I would probably get banned here as well, so I stay out of it.

You were banned on IRC because you were aggressive and combative. If you have something meaningful to contribute and can do so without insulting people and being negative, then please do.

That’s how you see it. I don’t. I just say what I think about things. And with meaningful, you just insulted me, think about that.

I will look this over tomorrow. And see if I can help.

Hi,

It’s very ambitious trying to support this many variations. Quite impressive, actually. Before I proceed, tell me this: why does Sonarr need to know what language it is?

Mostly to eliminate/prefer dubbed versions (for example German audio), partly to eliminate/prefer credits/titles/etc.

Can you give me some examples?

I guess this is what you are referring to: "Madam.Secretary.S01E16.Tamerlane.German.DL.Dubbed.WEBRiP.x264"
The thing with that is, I’m quite sure Sonarr also checks category. And on most (if not all) they have a foreign category such as TV-Foreign. However this does not matter. In the case of the above, “German” + “Dubbed” would be/should be flagged as foreign.

While, for example, “Madam.Secretary.S02E09.Russian.Roulette.1080p.WEB-DL.DD5.1.H264-NTb” should not. Not only does it not include “Dubbed”, it also does not belong to the TV-Foreign category.

@markus101 mentioned “guesswork”. It’s not really guessing when you have predetermined factors. Granted, this may not be true everywhere. But let’s focus on “Madam.Secretary.S02E09.Russian.Roulette.1080p.WEB-DL.DD5.1.H264-NTb”, which should never have been flagged as foreign in the first place.

Or cases where “Ray Donovan - S01E01.720p.HDtv.x264-Evolve (NLsub)” is flagged as foreign because of the “NLsub” in the filename. It’s not Dutch language, only Dutch subs, which can be turned off.

The examples below, where do they come from? They seem made up.

As for the other examples, such as “Salamander.S01E01.FLEMISH.HDTV.x264-BRiGAND”, which does not contain “Dubbed” in the filename: they seem extremely rare, and I’ve yet to find an actual release. And if one exists, does it belong to a Foreign category where it’s found?

Now, all this seems silly when a release being rejected for being “foreign” does not happen very often, especially because of a language word in the title. Nevertheless, these are my thoughts.

They are, they are there to ensure we’re parsing correctly.

The parser has zero knowledge about series/episode information. It’s a black box that only knows how to parse releases into basic information, so it doesn’t have anything to go on to make a smarter decision.

The TV-Foreign category contains items uploaded to specific groups (unless otherwise moved). If only that were used, releases uploaded to non-foreign groups would always be grabbed.

Dubbed is a reasonable marker, but it’s not always used and would miss things like “Csi Cyber - S01E07 - german sub Hardcoded - HDTV.x264”. It also transfers the issue to the word dubbed, which would catch this atrocity: “Kid.vs.Kat.S01E03.Do.Not.Fort.sake.MeCookie.D.Uh.TVRip.XviD.720x576.HeB.DubbeD-WwW.DDL-IL”
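As a rough sketch of what relying on the word alone would do (the `has_dubbed_marker` helper and its pattern are hypothetical, not part of Sonarr), here is a check run against the release names mentioned in this thread:

```python
import re

# Match "dubbed" as a standalone token, in any letter case.
DUBBED_RE = re.compile(r"(?<![a-z])dubbed(?![a-z])", re.IGNORECASE)


def has_dubbed_marker(release_name: str) -> bool:
    """True if the release name contains the word 'dubbed' as its own token."""
    return DUBBED_RE.search(release_name) is not None


examples = [
    "Madam.Secretary.S01E16.Tamerlane.German.DL.Dubbed.WEBRiP.x264",
    "Madam.Secretary.S02E09.Russian.Roulette.1080p.WEB-DL.DD5.1.H264-NTb",
    "Csi Cyber - S01E07 - german sub Hardcoded - HDTV.x264",
    "Kid.vs.Kat.S01E03.Do.Not.Fort.sake.MeCookie.D.Uh.TVRip.XviD.720x576.HeB.DubbeD-WwW.DDL-IL",
]
for name in examples:
    print(has_dubbed_marker(name), name)
```

The German hardcoded-sub release has no “dubbed” token, so a marker-only rule lets it through, while the Kid.vs.Kat name is matched on the strength of that one word.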

Being able to turn off the subs wasn’t always true. It might not be a problem any more, but it was previously. Maybe excluding all of them isn’t ideal, but the only other option is to draw a line and say anything released before a certain date is foreign when tagged with NLsub and anything after is fine, which causes issues if it’s reposted later on.

I’m not trying to give the impression that we’re not going to do anything, because we are. But it’s a big undertaking to improve the parser and continue to match things accurately. This issue is well known and needs improvement, but it’s not something we’re focused on at the moment.

Absolutely no idea what you mean by that.

The predetermined factors are when, for example, the word “Dubbed” exists, or the show is found in the “TV-Foreign” categories.

No idea what you really mean with this either.

Anyway, yes, there is no easy fix for this. Let’s leave it at that =)

I believe what @markus101 means is that episodes aren’t always uploaded to their corresponding categories, so there are always some strays, like episodes in foreign languages in the English TV category, which would be selected if the parser didn’t check according to the current criteria. Sometimes neither the word sub nor dubbed exists, yet the episode is in a language other than English that can’t be changed at runtime, and it still isn’t in the TV-Foreign category.

Please correct me if I’m wrong :slight_smile:
