Is it decidable whether a language described by numbers of occurrences is regular?


14

It is known that the language of words containing equal numbers of 0 and 1 is not regular, while the language of words containing equal numbers of 001 and 100 is regular (see here).

Given two words $w_1, w_2$, is it decidable whether the language of words containing equal numbers of $w_1$ and $w_2$ is regular?
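To make the language concrete, here is a minimal Python sketch of a membership test. It assumes that overlapping occurrences are counted separately and that both words are non-empty; the function names `count_occurrences` and `in_language` are illustrative and do not come from the question.

```python
def count_occurrences(s: str, w: str) -> int:
    """Count occurrences of w in s, counting overlapping ones separately.

    Assumption: w is non-empty.
    """
    return sum(1 for k in range(len(s) - len(w) + 1) if s.startswith(w, k))


def in_language(s: str, w1: str, w2: str) -> bool:
    """True iff s contains as many occurrences of w1 as of w2."""
    return count_occurrences(s, w1) == count_occurrences(s, w2)


if __name__ == "__main__":
    print(in_language("1001", "001", "100"))  # one occurrence of each word
    print(in_language("001", "001", "100"))   # one occurrence vs none
```

The test only decides membership of a single string; it says nothing by itself about whether the language is regular.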


Can you give other examples of regular languages defined this way, besides $1^i0$ and $01^i$, or $0^i1$ and $10^i$? What about an example over a 3-symbol alphabet?
babou

If $w_1$ is a strict subword of $w_2$, there is a good chance that the language is empty, hence regular. I do not know other examples.
sdcvvc

I strongly suspect that the examples above are the only ones, which would make the problem decidable. If you specify only two substrings, I think it is CF ... depending on what you can specify about the occurrences. You do not make precise what you mean by "number of occurrences".
babou

The question body is precise enough IMO.
sdcvvc

1
The solutions so far for special cases hinge on the idea that occurrences of the substring $w_1$ guarantee exactly one intervening occurrence of $w_2$. So if the current answers are correct [it is not clear to me yet], it seems that there is some relation between $w_1$ and $w_2$ which guarantees that, in the middle of scanning the string, the counts can be "equal" or "unequal", but in the "unequal" case only off by at most a finite number.
vzn

Answers:


3

Given two words $w_1, w_2$, is it decidable whether the language $L$ of words containing equal numbers of $w_1$ and $w_2$ is regular?

First some definitions:
They could be made more concise, and the notation could be improved if they are to be used in proofs. This is only a first draft.

Given two words $w_1$ and $w_2$, we say that:

  • $w_1$ always occurs with $w_2$, noted $w_1 \succ w_2$, iff

    1. for any string $s$ such that $s = x w_2 y$ with $|x|, |y| \geq |w_1| + |w_2|$ and $|x|_0, |x|_1, |y|_0, |y|_1 \geq 1$, there is another decomposition $s = x' w_1 y'$.
       Note: the condition that $x$ and $y$ each contain at least one 0 and one 1 is needed for a pathological case (found by @sdcvvc): $w_1 = 1^i 0$, $w_2 = v 1^{i+j}$ and $y \in 1^*$, and its symmetric variants.
    2. there is a string $s = x w_2 y$ with $|x|, |y| \geq |w_1| + |w_2|$ such that there is at most one decomposition $s = x' w_1 y'$.

  • $w_1$ always cooccurs with $w_2$, noted $w_1 \asymp w_2$, iff each always occurs with the other,

  • $w_1$ and $w_2$ occur independently, noted $w_1 \perp w_2$, iff neither one always occurs with the other,

  • $w_1$ always occurs $m$ times or more than $w_2$, noted $w_1 \succ^m w_2$, iff for any string $s$ such that $s = x w_2 y$ with $|x|, |y| \geq |w_1| + |w_2|$ there are $m$ other decompositions $s = x_i w_1 y_i$ for $i \in [1, m]$ such that $i \neq j$ implies $x_i \neq x_j$.

These definitions are constructed so that we can ignore what happens at the ends of the string where $w_1$ and $w_2$ are supposed to occur. Boundary effects at the end of the string have to be analyzed separately, but they represent a finite number of cases (actually I think I forgot one or two such boundary sub-cases in my first analysis below, but it does not really matter). The definitions are compatible with overlap of occurrences.

There are 4 main cases to consider (ignoring the symmetry between $w_1$ and $w_2$):

  1. $w_1 \asymp w_2$
    Both words necessarily come together, except possibly at the ends of the string. This concerns only pairs of the form $1^i0$ and $01^i$, or $0^i1$ and $10^i$. This is easily recognized by a finite automaton that only checks for lone occurrences at both ends of the string to be recognized, to make sure there is a lone occurrence at both ends or at neither end. There is also the degenerate case when $w_1 = w_2$: then the language $L$ is obviously regular.

  2. $w_1 \succ w_2$, but not $w_2 \succ w_1$
    One of the 2 words cannot occur without the other, but the converse is not true (except possibly at the ends of the string). This happens when:

    • $w_1$ is a substring of $w_2$: then a finite automaton can just check that $w_1$ does not occur outside an instance of $w_2$.

    • $w_1 = 1^i0$ and $w_2 = v1^j$ for some word $v \in \{0,1\}^*$, $v \neq 01^i$: then a finite automaton checks, as in the previous case, that $w_1$ does not occur separated from $w_2$. However, the automaton allows counting one extra instance of $w_1$ that will allow acceptance if $w_2$ is a suffix of the string. There are three other symmetrical cases (1-0 symmetry and left-right symmetry).

  3. $w_1 \succ^2 w_2$
    One of the 2 words occurs twice in the other. That can be recognized by a finite automaton that checks that the smaller word never occurs in the string. There is also a slightly more complex variant that combines the two variations of case 2. In this case the automaton checks that the smaller string $1^i0$ never occurs, except possibly as part of $v$ in the larger one $v1^j$ coming as a suffix of the string (and 3 other cases by symmetry).

  4. $w_1 \perp w_2$
    The 2 words can occur independently of each other. We build a generalized sequential machine (gsm) $G$ that outputs $a$ when it recognizes an occurrence of $w_1$ and $b$ when it recognizes an occurrence of $w_2$, and forgets everything else. The language $L$ is regular only if the language $G(L)$ is regular. But $G(L) = \{ w \in \{a,b\}^* \mid |w|_a = |w|_b \}$, which is clearly context-free and not regular. Hence $L$ is not regular.
    Actually we have $L = G^{-1}(G(L))$. Since regular languages and context-free languages are closed under gsm mapping and inverse gsm mapping, we also know that $L$ is context-free.
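The gsm of case 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `gsm_output` is hypothetical, the finite control is simulated by testing suffixes of the prefix read so far rather than by an explicit state table, and emitting `a` before `b` when both words end at the same position is an arbitrary convention.

```python
def gsm_output(s: str, w1: str, w2: str) -> str:
    """Emit 'a' for each occurrence of w1 and 'b' for each occurrence of w2.

    The output symbol is produced at the position where the occurrence ends;
    every other input symbol produces no output.
    """
    out = []
    for end in range(1, len(s) + 1):
        # s.endswith(w, 0, end) tests whether the prefix s[:end] ends with w.
        if s.endswith(w1, 0, end):
            out.append("a")
        if s.endswith(w2, 0, end):
            out.append("b")
    return "".join(out)


if __name__ == "__main__":
    image = gsm_output("1001", "0", "1")
    print(image)
    # s is in L exactly when its image has as many a's as b's.
    print(image.count("a") == image.count("b"))
```

A real gsm would keep only a bounded suffix of the input in its state (for instance the longest suffix that is a prefix of either word); rescanning the prefix here is a simplification that produces the same output.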

One way to organize a formal proof could be the following. First build a PDA that recognizes the language. Actually it can be done with a 1-counter machine, but it is easier to have two stack symbols to avoid duplicating the finite control. Then, for the cases where it should be an FA, show that the counter can be bounded by a constant that depends only on the two words. For the other cases show that the counter can reach any arbitrary value. Of course, the PDA should be organized so that the proofs are easy enough to carry out.
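The bounded versus unbounded counter can be explored empirically before attempting a proof. The sketch below enumerates all binary strings up to a given length and collects the differences between the two occurrence counts. Since every prefix of a string is itself a string, this set is also the set of values the counter can take during a run on inputs of that length. The names `count_occurrences` and `counter_values` are illustrative, the alphabet {0,1} is assumed, and a finite enumeration is of course evidence, not a proof.

```python
from itertools import product


def count_occurrences(s: str, w: str) -> int:
    """Count occurrences of the non-empty word w in s, overlaps included."""
    return sum(1 for k in range(len(s) - len(w) + 1) if s.startswith(w, k))


def counter_values(w1: str, w2: str, max_len: int) -> set:
    """Differences #w1 - #w2 over all binary strings of length <= max_len."""
    values = set()
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            s = "".join(letters)
            values.add(count_occurrences(s, w1) - count_occurrences(s, w2))
    return values


if __name__ == "__main__":
    # A pair of the form 1^i 0 and 0 1^i: the difference stays small.
    print(sorted(counter_values("110", "011", 10)))
    # The pair 0 and 1: the difference grows with the length bound.
    print(sorted(counter_values("0", "1", 5)))
```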

Representing the FA as a PDA with two stack symbols is probably the simplest representation for it. In the non-regular case, the finite control part of the PDA is the same as that of the GSM in the proof sketch above. Instead of outputting $a$'s and $b$'s like the GSM, the PDA counts the difference in number with the stack.
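A minimal Python sketch of such a PDA, under the following assumptions: the stack symbols are called `A` (surplus of $w_1$) and `B` (surplus of $w_2$), the finite control is simulated by suffix tests on the prefix read so far, and the function name `run_pda` is hypothetical. The stack only ever holds copies of a single symbol, so its height is the absolute value of the difference between the two counts.

```python
def run_pda(s: str, w1: str, w2: str):
    """Simulate the two-stack-symbol PDA on s.

    Returns (accepted, peak) where accepted is True iff the stack is empty
    at the end of the input, and peak is the largest stack height reached.
    """
    stack = []
    peak = 0
    for end in range(1, len(s) + 1):
        for word, symbol in ((w1, "A"), (w2, "B")):
            # An occurrence of `word` ends at position `end`.
            if s.endswith(word, 0, end):
                if stack and stack[-1] != symbol:
                    stack.pop()           # cancels one surplus occurrence
                else:
                    stack.append(symbol)  # records one more surplus occurrence
                peak = max(peak, len(stack))
    return (not stack, peak)


if __name__ == "__main__":
    print(run_pda("1100110011", "110", "011"))
    print(run_pda("000111", "0", "1"))
```

In the regular cases the peak stays below a constant depending only on the two words, so the stack could be folded into the finite control; in the non-regular case inputs such as $0^n1^n$ drive it arbitrarily high.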


I had a question about context-freeness in the case of three words. I deleted it when I realised it could be analyzed similarly. I had first thought that proving non-CFness would make an original exercise, but the GSM ruins it.
babou

2
It is not clear what you mean by "occur independently of each other", "come necessarily together" etc. Please write formal definitions instead, and prove that they cover all cases.
sdcvvc

1
I am not sure what you are asking, and what level of formalization you need, for what purpose. I realized that analyzing by hand the possible relations of the two words is not guaranteed to be correct, and does not matter anyway. What matters is whether an occurrence of one word can exist without creating at the same time an occurrence (or several) of the other word. The details do not matter as it will always be localized and thus manageable finitely. The two ends do not matter either as they are localized too. Even overlaps of occurrences do not matter since there can only be finitely many in one place.
babou

1
I asked you about precise definitions of the terms mentioned in the comment. Thank you for writing them. Was I supposed to guess them previously? Anyway, you seem to claim that $0^i1 \asymp 10^i$. This does not satisfy condition 1. of the definition of "$w_1$ always occurs with $w_2$", since there is no occurrence of $10^i$ in $s = 0^M 0^i 1 1^M$.
sdcvvc

Sorry, I did not mean to make you guess. It only took me time to understand what exactly you wanted. My failing only. Regarding your counterexample, you are correct. But for me it only means that I have to be a little bit more careful about telomeres in the definition of the relations. I defined them too quickly, but $0^M$ or $1^M$ do not convey much information in this context. This is really a pathological boundary example within a pathological case, which actually cannot occur when more than 2 symbols are used. I just do not believe it changes anything.
babou
Licensed under cc by-sa 3.0 with attribution required.