Python regular expression to identify parentheses pairs in a string, unless the parentheses are inside square brackets?

I have strings that look like this:

[object]-ABGF-[A-BEC(2)]-LRPG-[object]
ABCDEFGHDGSASDASR-(typ1)-ASDHASDUASIUDHAS-[object]
[object]-RLC(1)-C(2)-GF-[obj]-KSASDASD-[obj3]-ASD-[object]
[object3]-RLC(1)-C(2)-GF-[Hyp]-KSCRSRQCK-[Hyp]-HRCC-[amide]
ABCDEFGHIJK(1)-GHGSHS(2)-ABCDE
ABCDD(1)-ASDASDASD(1)-ASBFIFD
ASDASDASD(1)-ASDASADJASJS(2)-ERASDASD

The aim: I want to find the strings that contain parentheses with numbers where the numbers aren't in pairs (i.e. there aren't two identical numbers in two sets of parentheses), EXCEPT if the parentheses are themselves inside square brackets.

For example, the sequences I want to identify from the list above are:

[object]-RLC(1)-C(2)-GF-[obj]-KSASDASD-[obj3]-ASD-[object]
[object3]-RLC(1)-C(2)-GF-[Hyp]-KSCRSRQCK-[Hyp]-HRCC-[amide]
ABCDEFGHIJK(1)-GHGSHS(2)-ABCDE
ASDASDASD(1)-ASDASADJASJS(2)-ERASDASD

This is because each number in parentheses should appear in the sequence as a pair: there should be two (1)s in this case, whereas these have one (1) and one (2).

But the following sequence should NOT be returned, because its only parenthesised number is inside square brackets:

[object]-ABGF-[A-BEC(2)]-LRPG-[object] (because it's in a square bracket)

I can't hard-code the numbers, because this should, in theory, work for any number of pairs, e.g. if the sequence was:

AVDD(1)-ASDAS(2)-ASDAFJF(3)-ASDAS(1)-ASGGG(2)

this should also be returned, because (3) is missing a pair, even though 1 and 2 are paired properly.

But if the sequence was:

ASDHD(9)-ASDJAS(9) 

This would NOT be returned, because the two (9)s form a pair (and the parentheses aren't in square brackets).
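To make the pairing rule concrete, here is a minimal sketch of it for sequences without square brackets. It assumes that "paired" means a number occurs an even number of times, and that only purely numeric labels such as (1) count; the function name `unpaired_numbers` is made up for illustration.

```python
import re
from collections import Counter

def unpaired_numbers(seq):
    """Return the parenthesised numbers that occur an odd number of times in seq."""
    # Assumption: only all-digit labels like (1) are considered, so (typ1) is ignored.
    counts = Counter(re.findall(r"\((\d+)\)", seq))
    return sorted(n for n, c in counts.items() if c % 2)

print(unpaired_numbers("AVDD(1)-ASDAS(2)-ASDAFJF(3)-ASDAS(1)-ASGGG(2)"))  # ['3']
print(unpaired_numbers("ASDHD(9)-ASDJAS(9)"))  # []
```

A sequence would be returned whenever this list is non-empty. This sketch does not yet treat square brackets specially.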

I wrote this code:

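import re
from collections import Counter

circle_pattern = re.compile(r'\(([a-z0-9]+)\)')
for row in rows:  # rows: the list of strings shown above (name assumed here)
    circle_regex = circle_pattern.findall(row)
    if circle_regex:
        x_list = ["(" + re.sub(r"\d", "x", i) + ")" for i in circle_regex]  # not used below
        check_if_even = dict(Counter(circle_regex))
        # print the row once if any label occurs an odd number of times
        if any(v % 2 != 0 for v in check_if_even.values()):
            print(row)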

Which returns:

[object]-ABGF-[A-BEC(2)]-LRPG-[object]
ABCDEFGHDGSASDASR-(typ1)-ASDHASDUASIUDHAS-[object]
[object]-RLC(1)-C(2)-GF-[obj]-KSASDASD-[obj3]-ASD-[object]
[object3]-RLC(1)-C(2)-GF-[Hyp]-KSCRSRQCK-[Hyp]-HRCC-[amide]
ABCDEFGHIJK(1)-GHGSHS(2)-ABCDE
ASDASDASD(1)-ASDASADJASJS(2)-ERASDASD

Could someone show me how to amend this code so that it does not return the first sequence, because in that case (2) is inside square brackets?
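One possible way to express the square-bracket exception, offered as a sketch and not a definitive answer: let the regular expression consume a whole [...] segment before it gets a chance to match a (number) inside it. The sketch assumes square brackets are never nested, that labels of interest are purely numeric, and that "paired" means an even count; `TOKEN` and `has_unpaired_label` are made-up names.

```python
import re
from collections import Counter

# Try to match a whole [...] segment first, otherwise a (digits) label.
# Only the second alternative has a capture group, so bracketed text yields "".
TOKEN = re.compile(r"\[[^\]]*\]|\((\d+)\)")

def has_unpaired_label(seq):
    """True if some (number) outside square brackets occurs an odd number of times."""
    labels = [n for n in TOKEN.findall(seq) if n]  # drop the "" from bracket matches
    return any(count % 2 for count in Counter(labels).values())

rows = [
    "[object]-ABGF-[A-BEC(2)]-LRPG-[object]",
    "ABCDEFGHDGSASDASR-(typ1)-ASDHASDUASIUDHAS-[object]",
    "[object]-RLC(1)-C(2)-GF-[obj]-KSASDASD-[obj3]-ASD-[object]",
    "[object3]-RLC(1)-C(2)-GF-[Hyp]-KSCRSRQCK-[Hyp]-HRCC-[amide]",
    "ABCDEFGHIJK(1)-GHGSHS(2)-ABCDE",
    "ABCDD(1)-ASDASDASD(1)-ASBFIFD",
    "ASDASDASD(1)-ASDASADJASJS(2)-ERASDASD",
]

flagged = [row for row in rows if has_unpaired_label(row)]
for row in flagged:
    print(row)
```

With the sample strings from the question, this prints the four sequences listed as wanted and skips the one whose (2) sits inside square brackets. Because the pattern only accepts digits, the (typ1) row is skipped as well.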



Read more here: https://stackoverflow.com/questions/66276708/python-regular-expression-to-identify-parentheses-pairs-in-a-string-unless-the

Content Attribution

This content was originally published by Slowat_Kela at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
