Regular expressions aren't


36

Ask anyone, even someone with a computer science background, what a regular expression is, and the answer is likely to go beyond the reach of a finite-state automaton.

For example, the "regular expression"

/^1?$|^(11+?)\1+$/

created by noted Perl personality Abigail (and part of Perl's test suite since 2002) describes a machine that accepts only unary strings whose length is composite (or the trivial lengths 0 and 1), yet exercise 4.5(b) in the third edition of Peter Linz's An Introduction to Formal Languages and Automata asks the reader to use the pumping lemma to prove that

$L = \{a^n : n \text{ is not a prime number}\}$

is not a regular language.
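As a quick illustration of what the expression above accepts, here is a minimal sketch using Python's `re` module, which supports the same backreference and lazy-quantifier syntax. The names `NON_PRIME` and `is_prime_by_regex` are made up for this example and are not from Abigail's code.

```python
import re

# Abigail's pattern, applied to unary strings of 1s.
# ^1?$         matches lengths 0 and 1
# ^(11+?)\1+$  matches a block of k >= 2 ones repeated at least twice,
#              i.e. any composite length
NON_PRIME = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime_by_regex(n: int) -> bool:
    """Return True when the unary encoding of n is rejected by the pattern."""
    return NON_PRIME.match("1" * n) is None

primes = [n for n in range(30) if is_prime_by_regex(n)]
print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The backreference `\1` is what does the work here: it forces the rest of the string to repeat whatever the group captured, which is exactly the kind of "memory" a finite-state automaton lacks.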

In contexts where the distinction is important, what should we call the strictly more powerful expressions?

Answers:


46

Larry Wall proposed that we use "regular expression" for the formalism Kleene proposed, and "regex" for expressions that use the widely used extensions. It's a fairly widely followed convention. If you want to make it clear that you are talking about regular expressions in the formal languages sense, it is usually not difficult to translate into talk of regular languages.

The power of regexes comes from backtracking, and there has been work done on automata for regular languages with backtracking. See, in particular, Becchi & Crowley, 2008, Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions.


5
I agree, something like "Perl regex" ("POSIX regex", etc.) vs. "regular language" should be clear enough to prevent any possibility of misinterpretation.
Jukka Suomela

Perl regexes have a lot more additional features than just backtracking.
reinierpost

@reinierpost True, but I think backtracking is the most important one from a formal languages perspective. Perl regexes have features like executing arbitrary Perl code, but I think regexes should be interpreted loosely as covering PCREs. PCREs contain such oddities as recursive patterns, but these are dark arts, taking you far outside the realm of regular languages. I could update my answer to cover these, though.
Charles Stewart

18

These expressions have been examined by Aho (Handbook of Theoretical Computer Science, Vol. A, Chp. 5) and Campeanu, Salomaa, Yu ("A formal study of practical regular expressions", International Journal of Foundations of Computer Science, 14:1007–1018, 2003), as well as in some follow-up papers.

Aho calls the more powerful expressions "rewbr" (regular expression with backreferences); Campeanu et al. use "extended regular expression" as well as "practical regular expression". It seems that "extended regular expression" is the term most commonly used in recent literature.
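To make concrete what a backreference adds, here is a small sketch in Python's `re` syntax. The pattern and the names `MIRRORED` and `in_language` are my own illustration, not taken from the cited papers. The expression describes the language of strings a^n b a^n, which a pumping-lemma argument shows is not regular.

```python
import re

# (a*) captures a run of a's; \1 demands exactly the same run again after b.
# Without the backreference, (a*)b(a*) would be an ordinary regular expression.
MIRRORED = re.compile(r"(a*)b\1")

def in_language(s: str) -> bool:
    """Return True when s has the form a^n b a^n for some n >= 0."""
    return MIRRORED.fullmatch(s) is not None

print(in_language("aabaa"))  # True
print(in_language("aaba"))   # False
```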

Building on the term "rational expression" from the French school, and considering the fact that those expressions are used in the real world, I myself like "real expression".

Addendum: A chapter in my PhD thesis deals with this class of formal languages (the corresponding paper is due to appear at STACS 2011). While writing that chapter and the paper, I experimented with various terms. Finally, I decided to use extended regular expressions for the model with backreferences, and proper regular expressions for the nice and normal regular expressions. As it is quite annoying to change the terminology in a paper that is already completely (or mostly) written, I think that some might be interested in the experiences that led to my choice:

First, regex and rewbr don't really roll off the tongue, and using them again and again in the course of a whole paper got really tiresome to write and read, in particular when using any of the possible plural forms. PERL-like regular expressions were also quite unwieldy. Of course, I am no native speaker, so YMMV.

Second, as soon as one wants to talk about both models, it is convenient to use terms that are a variation of regular expression, as this allows one to emphasize similarity or differences as needed (e.g., "a regular expression, be it proper or extended"). Furthermore, this allows one to easily emphasize the special case of "extended regular expressions without backreferences", when talking about special cases in the whole class, instead of comparing different models.

Third, I preferred to use a term that is already used in literature over a newly coined term, which left me the choice between extended regular expressions and practical regular expressions. The second choice implied (at least implicitly) that proper regular expressions are somehow impractical, which felt rather weird (especially as Google's RE2 does not use backrefs, and appears to be quite practical).

Of course, this choice is only my "personal local maximum", and depending on one's needs, other choices might be more appropriate.


7
Unfortunately, the term extended regular expression is already taken by POSIX, which distinguishes between basic regular expression (BRE) and extended regular expression (ERE), both of which are extended regular expressions according to your definition.
Jörg W Mittag

@Jörg: Actually according to this neither extended nor basic POSIX regular expressions are more powerful than regular regular expressions. And pure (non-GNU) BRE seem to actually be less powerful than regular expressions (missing an alternation operator).
sepp2k

See "On Extended Regular Expressions" by Carle and Narendran (2009) for more recent results about this "rewbr": portal.acm.org/citation.cfm?id=1533235
Jakob

Further recent results on this language class: "On the intersection of regex languages with regular languages" by Campeanu and Santean (TCS 410, 2009), "A Polynomial Time Match Test for Large Classes of Extended Regular Expressions" by Reidenbach and Schmid (CIAA 2010), and "Extended Regular Expressions: Succinctness and Decidability" (by me, due to appear at STACS 2011).
Dominik D. Freydenberger

6

It is known that Perl's so-called regexps are powerful enough to be Turing complete; there is even a compiler from general programs to Perl regexps.

So I doubt that it makes sense to search for a name for such "regexes".

See, for example, http://search.cpan.org/~asavige/Acme-EyeDrops-1.62/lib/Acme/EyeDrops.pm


Do you have some pointers?
András Salamon

5
@András: I think Arthur is talking about Perl's ?{CODE} directive, which allows pattern expressions to interleave program code in regular expressions. I understand that PCREs are usually defined as being the "declarative" part of the language, the whole language being called the pattern language. According to WP, Aho, 1990, "Algorithms for finding patterns in strings" shows that the membership problem for regular languages with backtracking is NP-complete. There are no other hard features to declarative PCREs.
Charles Stewart

I added the link; I didn't look at the source code, so I do not really know how it works or whether there is any proof that the compilation is really correct.
Arthur MILCHIOR

1
Sorry, but according to your argument, since lambda-calculus is Turing-complete, it would not have made sense to search for a name for it. Same for all other Turing-complete computation formalisms and languages. More to the point, Turing-completeness does not describe how expressive a language is, so it makes no sense to identify languages just because they are Turing-complete. My example about lambda-calculus was an extreme one, of course.
Blaisorblade


1

Given the other answers, I would suggest that "regular languages" is safe, and, after briefly remarking on the difference, to talk about "practical regular expressions" for regexes (with backtracking).

Also note that the same regexp, as a regular expression and as a practical one, can have different semantics, because in the latter case the semantics are defined in terms of backtracking, with different results. Details would be off-topic, but I will answer if you ask another question on that (maybe on SO rather than here, dunno) and notify me through a comment.
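One possible illustration of such a difference (my own example, not necessarily the case the answer has in mind) is ordered alternation. As a formal regular expression, `a|ab` simply denotes the set {a, ab}. A backtracking engine such as Python's `re` instead tries alternatives left to right and reports the first one that succeeds, so which substring gets matched depends on the order in which the alternatives are written. The variable names below are hypothetical.

```python
import re

PATTERN = r"a|ab"
TEXT = "ab"

# Backtracking, leftmost-first: the first alternative succeeds, so "a" is reported.
prefix = re.match(PATTERN, TEXT).group()

# Forcing a match of the whole string makes the engine backtrack into "ab".
whole = re.fullmatch(PATTERN, TEXT).group()

# Language-theoretic view: the longest prefix of TEXT that belongs to {a, ab}.
longest = max((alt for alt in ("a", "ab") if TEXT.startswith(alt)), key=len)

print(prefix, whole, longest)  # a ab ab
```

Membership of the whole string is the same under both readings here; what changes is which match the engine returns when it is allowed to stop early.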


0

We could call them pattern expressions. This might introduce confusions with pattern languages, but at least these are less common.


2
In principle, I agree with your reasoning, but Campeanu, Santean, and Yu have already used the term pattern expressions to denote a similar class of languages with a "cleaner" definition (see "Pattern expressions and pattern automata", IPL 92 (2004)).
Dominik D. Freydenberger
Licensed under cc by-sa 3.0 with attribution required.