Ruby
I tried to match the actual syntax of Ruby's regex flavor as closely as possible, but there are a few quirks: it accepts some lookbehinds that are actually invalid (such as `(?<=(?<!))`), and it accepts empty character ranges like `D-A`, which Ruby itself rejects. The latter could be fixed for ASCII, but the regex is long enough as it is.
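To make the `D-A` quirk concrete, here is a small sketch that uses Ruby's own regex compiler as the ground truth: `Regexp.new` raises `RegexpError` for an invalid pattern. The helper name `ruby_accepts?` is hypothetical and not part of the answer; it only shows which patterns Ruby itself rejects, which the validating regex cannot fully replicate.

```ruby
# Ground truth: ask Ruby's own regex compiler whether a pattern is valid.
# `ruby_accepts?` is a hypothetical helper name, not part of the answer.
def ruby_accepts?(source)
  Regexp.new(source) # raises RegexpError on an invalid pattern
  true
rescue RegexpError
  false
end

p ruby_accepts?("[A-D]") # => true
p ruby_accepts?("[D-A]") # => false ("empty range in char class")
p ruby_accepts?("(abc")  # => false (unmatched parenthesis)
```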
\A(?<main>
(?!
\{(\d+)?,(\d+)?\} # do not match lone counted repetition
)
(?:
[^()\[\]\\*+?|<'] | # anything but metacharacters
(?<cclass>
\[ \^? (?: # character class
(?: # character class
[^\[\]\\-] | # anything but square brackets, backslashes or dashes
\g<esc> |
\[ : \^? (?: # POSIX char-class
alnum | alpha | word | blank | cntrl | x?digit | graph | lower | print | punct | space | upper
) : \] |
- (?!
\\[dwhsDWHS]
) # range / dash not followed by a shorthand class such as \d or \w
)+ |
\g<cclass> # more than one bracket as delimiter
) \]
) |
(?<esc>
\\[^cuxkg] | # any escaped character
\\x \h\h? | # hex escape
\\u \h{4} | # Unicode escape
\\c . # control escape
) |
\\[kg] (?:
< \w[^>]* (?: > | \Z) |
' \w[^']* (?: ' | \Z)
)? | # named backreferences (\k) and subexpression calls (\g)
(?<! (?<! \\) \\[kg]) [<'] | # don't match < or ' if preceded by \k or \g
\| (?! \g<rep> ) | # alternation
\( (?: # group
(?:
\?
(?:
[>:=!] | # atomic / non-capturing / lookahead
(?<namedg>
< [_a-zA-Z][^>]* > |
' [_a-zA-Z][^']* ' # named group
) |
[xmi-]+: # regex options
)
)?
\g<main>*
) \) |
\(\?<[!=] (?<lbpat> # lookbehind: no *, +, ?, {m,n}, backreferences, lookaheads, \z or \Z
(?! \{(\d+)?,(\d+)?\} )
[^()\[\]\\*+?] |
\g<esc> (?<! \\[zZ]) |
\g<cclass> |
\( (?: # group
(?:
\?: |
\? \g<namedg> |
\? <[!=]
)?
\g<lbpat>*
) \) |
\(\?\# [^)]* \)
)* \)
|
\(\? [xmi-]+ \) # option group
(?! \g<rep> )
|
\(\?\# [^)]*+ \) # comment
(?! \g<rep> )
)+
(?<rep>
(?:
[*+?] | # repetition
\{(\d+)?,(\d+)?\} # counted repetition
)
[+?]? # with a possessive/lazy modifier
)?
)*\Z
Unreadable version:
\A(?<main>(?!\{(\d+)?,(\d+)?\})(?:[^()\[\]\\*+?|<']|(?<cclass>\[\^?(?:(?:[^\[\]\\-]|\g<esc>|\[:\^?(?:alnum|alpha|word|blank|cntrl|x?digit|graph|lower|print|punct|space|upper):\]|-(?!\\[dwhsDWHS]))+|\g<cclass>)\])|(?<esc>\\[^cuxkg]|\\x\h\h?|\\u\h{4}|\\c.)|\\[kg](?:<\w[^>]*(?:>|\Z)|'\w[^']*(?:'|\Z))?|(?<!(?<!\\)\\[kg])[<']|\|(?!\g<rep>)|\((?:(?:\?(?:[>:=!]|(?<namedg><[_a-zA-Z][^>]*>|'[_a-zA-Z][^']*')|[xmi-]+:))?\g<main>*)\)|\(\?<[!=](?<lbpat>(?!\{(\d+)?,(\d+)?\})[^()\[\]\\*+?]|\g<esc>(?<!\\[zZ])|\g<cclass>|\((?:(?:\?:|\?\g<namedg>|\?<[!=])?\g<lbpat>*)\)|\(\?#[^)]*\))*\)|\(\?[xmi-]+\)(?!\g<rep>)|\(\?#[^)]*+\)(?!\g<rep>))+(?<rep>(?:[*+?]|\{(\d+)?,(\d+)?\})[+?]?)?)*\Z
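A minimal usage sketch, assuming the single-line version is pasted verbatim into a Ruby regex literal (it contains no `/`, so no extra escaping is needed). The names `VALIDATOR` and `looks_valid?` are hypothetical. The readable multi-line version would instead have to be compiled with `Regexp::EXTENDED` (the `x` flag), since its whitespace and `#` comments are only ignored in that mode.

```ruby
# Hypothetical constant name; the pattern is the single-line version, copied verbatim.
VALIDATOR = /\A(?<main>(?!\{(\d+)?,(\d+)?\})(?:[^()\[\]\\*+?|<']|(?<cclass>\[\^?(?:(?:[^\[\]\\-]|\g<esc>|\[:\^?(?:alnum|alpha|word|blank|cntrl|x?digit|graph|lower|print|punct|space|upper):\]|-(?!\\[dwhsDWHS]))+|\g<cclass>)\])|(?<esc>\\[^cuxkg]|\\x\h\h?|\\u\h{4}|\\c.)|\\[kg](?:<\w[^>]*(?:>|\Z)|'\w[^']*(?:'|\Z))?|(?<!(?<!\\)\\[kg])[<']|\|(?!\g<rep>)|\((?:(?:\?(?:[>:=!]|(?<namedg><[_a-zA-Z][^>]*>|'[_a-zA-Z][^']*')|[xmi-]+:))?\g<main>*)\)|\(\?<[!=](?<lbpat>(?!\{(\d+)?,(\d+)?\})[^()\[\]\\*+?]|\g<esc>(?<!\\[zZ])|\g<cclass>|\((?:(?:\?:|\?\g<namedg>|\?<[!=])?\g<lbpat>*)\)|\(\?#[^)]*\))*\)|\(\?[xmi-]+\)(?!\g<rep>)|\(\?#[^)]*+\)(?!\g<rep>))+(?<rep>(?:[*+?]|\{(\d+)?,(\d+)?\})[+?]?)?)*\Z/

# Hypothetical helper: true if the validator accepts `source` as a Ruby regex.
def looks_valid?(source)
  VALIDATOR.match?(source)
end

p looks_valid?("(ab)*") # => true
p looks_valid?("(abc")  # => false (unbalanced parenthesis)
p looks_valid?("{1,2}") # => false (lone counted repetition)
p looks_valid?("[D-A]") # => true, although Ruby itself rejects it
```

The last line is the empty-range quirk described at the top: the validator only checks the shape of a range, not the order of its endpoints.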