Comparison-based data structure for finding items


34

Is there a data structure that takes an unsorted array of n items, does preprocessing in O(n), and answers queries of the form "is some element x in the list?" in O(log n) worst-case time per query?

I strongly suspect there is not, so a proof that no such structure exists would be welcome.
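For reference, the classical baseline the question is trying to beat: with O(n log n) preprocessing you can sort once and then answer each query with O(log n) comparisons via binary search. A minimal sketch of that baseline (names are my own, not from the question):

    import bisect

    def preprocess(items):
        # Classical baseline: O(n log n) comparisons, paid once.
        return sorted(items)

    def query(table, x):
        # O(log n) comparisons per query via binary search.
        i = bisect.bisect_left(table, x)
        return i < len(table) and table[i] == x

The question asks whether the preprocessing can be cut to O(n) while keeping O(log n) queries.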


3
(1) I do not see how you can say "of course I consider expected time," since you do not mention "expected" anywhere in your question. Try to state your question more precisely before saying "of course." (2) Please define "non-hashable."
त्सुयोशी इतो

2
(1) I see. Thanks for the clarification. Instead of just saying "non-hashable," could you edit the question so that people do not have to read the comments to understand what "non-hashable" means?
त्सुयोशी इतो

3
By the way, if you cannot prove it, how do you know that it is impossible? If this is an exercise from a textbook or a class, you are asking on the wrong website.
त्सुयोशी इतो

6
Is this your question: is there a data structure that takes an unsorted array of n items, does O(n) preprocessing, and answers queries "is some element x in the list?" in O(log n) worst-case time per query?
sdcvvc

2
@Philip: Is that easy to see? If it is true, I agree that it settles the question.
त्सुयोशी इतो

Answers:


30

Here's a proof that it's impossible. Suppose you could build such a data structure. Build it. Then choose n/log n items at random from the list, add ϵ to each of them, where ϵ is smaller than the smallest difference between any two distinct items on the list, and perform the queries to check whether any of the resulting items is in the list. You have done only O(n) work so far: O(n) preprocessing plus n/log n queries at O(log n) comparisons each.

I would like to claim that the comparisons you have done are sufficient to tell whether an item a on the original list is smaller than or larger than any new item b. Suppose you couldn't tell. Then, because this is a comparison-based model, you wouldn't know whether a was equal to b or not, a contradiction of the assumption that your data structure works.

Now, since the n/log n items you chose were random, your comparisons have with high probability given enough information to divide the original list into n/log n lists, each of size O(log n). By sorting each of these lists, you get a randomized O(n log log n)-time sorting algorithm based solely on comparisons, a contradiction.
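To make the accounting concrete, here is a hedged sketch of the reduction, assuming a hypothetical black box Structure with O(n) preprocessing and O(log n) worst-case queries (which, by this very argument, cannot exist); eps is taken as given, smaller than any gap between two distinct items:

    import math
    import random

    def contradiction_sort_skeleton(items, Structure, eps):
        # Hypothetical: Structure(items) preprocesses with O(n) comparisons,
        # and Structure.query(v) answers membership with O(log n) comparisons.
        n = len(items)
        s = Structure(items)                      # O(n) comparisons
        k = max(1, n // max(1, int(math.log2(n))))
        for x in random.sample(items, k):         # n/log n random items
            s.query(x + eps)                      # O(log n) each: O(n) total
        # With high probability the comparisons made so far partition the
        # input into n/log n blocks of size O(log n); sorting every block
        # then costs O((n/log n) * log n * log log n) = O(n log log n)
        # comparisons, beating the Omega(n log n) sorting lower bound.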


A few hints to help understand the proof (assuming I understand it correctly myself, of course): the b items should be filled in by the items after ϵ has been added to them; the comparison model guarantees you know which of the cases a < b and a > b holds; the n/log n lists are in 'ascending order': every element in any higher list is larger than every element in any lower list; after the original queries you have enough information to build the lists around the items you randomly chose.
Alex ten Brink

(continued) note that you don't even have to explicitly be able to build the list in the provided time for the proof to hold.
Alex ten Brink

I don't quite understand this proof. The final contradiction rests on "algorithm based solely on comparisons," but in the first steps of our algorithm we added ϵ to each item (further, "where ϵ is smaller than the difference between any two items on the list"). Why are we still justified in calling our algorithm comparison-based, if we assumed our items have a non-discrete total order on them?
Artem Kaznatcheev

5
@Artem: Your original input consists of elements x ∈ X. Then you construct a new set X′ = X × {0,1}; you represent an original x ∈ X as (x,0) ∈ X′ and a modified x+ϵ as (x,1) ∈ X′. Now you use the black-box algorithm; the algorithm compares elements of X′ to each other; to answer such queries, you only need to compare a constant number of elements of X to each other. Hence everything should be doable in the comparison model, with a constant overhead.
Jukka Suomela
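A small sketch of that simulation (names my own): since ϵ is smaller than any gap between distinct elements, the order on X′ = X × {0,1} is lexicographic, so each comparison in X′ costs at most two comparisons of original elements:

    def compare_tagged(p, q):
        # p, q are pairs (x, tag): tag 0 encodes x itself, tag 1 encodes x + eps.
        (x, s), (y, t) = p, q
        if x < y:                     # first comparison of originals
            return -1
        if y < x:                     # second comparison of originals
            return 1
        return (s > t) - (s < t)      # originals equal: the tag breaks the tie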

1
@Aryabhata: it does. What is the O(log^2 n) algorithm?
Peter Shor

24

I believe the following is a different proof, showing the impossibility of a structure with O(log^k n) query time and O(n) preprocessing.

Suppose in the preprocessing you do O(n) comparisons, leading to a partial order.

Now consider the size A of the largest antichain in that partial order. Since the elements of the antichain are pairwise incomparable, a query for a value that could equal any element of the antichain must be compared against each of them; so for an O(log^k n) query algorithm we must have A = O(log^k n).

Now, by Dilworth's theorem, there is a partition of the elements into A chains.

Now we can extend the algorithm to determine the chains in the partition. We can determine whether two elements are comparable by creating a directed graph of comparisons and doing a reachability analysis; this requires no additional comparisons. Then we just brute-force over each possible partition of size A to check whether it is a partition into chains.
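As a sketch (assuming the preprocessing logs each observed outcome u < v as a directed edge u → v), comparability is just two reachability tests and costs no new comparisons:

    from collections import deque

    def reachable(edges, s, t):
        # edges: dict mapping u to the list of v with "u < v" observed.
        seen, frontier = {s}, deque([s])
        while frontier:
            u = frontier.popleft()
            if u == t:
                return True
            for w in edges.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    frontier.append(w)
        return False

    def comparable(edges, a, b):
        # a, b are comparable in the partial order iff one dominates the other.
        return reachable(edges, a, b) or reachable(edges, b, a)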

Once we have the chains, we can merge them to obtain an O(n log log n)-comparison sorting algorithm for the whole list, contradicting the Ω(n log n) lower bound for comparison sorting.
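The merge step can be pictured as a standard k-way heap merge: each of the n elements passes through a heap of size A, so the merge uses O(n log A) comparisons, and log A = O(log log n) when A = O(log^k n) for constant k. A minimal sketch:

    import heapq

    def merge_chains(chains):
        # chains: list of A already-sorted chains covering all n elements.
        # heapq.merge is a k-way merge: O(n log A) comparisons in total.
        return list(heapq.merge(*chains))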


1
This is a nice idea. And if you could show that the chain partition must be known to the algorithm then you could use mergesort to show that it would only take an additional O(n log log n) comparisons to sort the whole input, rather than using Jensen. But there is a problem: why must the preprocessing algorithm construct a chain partition? A chain partition must exist, yes, but that's very different from it being known to the algorithm.
David Eppstein

8
Ok, I now believe this proof. And it shows more strongly that to achieve polylog query time you have to use a number of comparisons that's within an additive O(n log log n) of sorting. Nice. By the way, the chain partition can be found in polynomial time from the set of comparisons already performed, rather than needing a brute-force search, but that doesn't make any difference for your argument.
David Eppstein

6
The proofs actually show that you must have either Ω(n log n) preprocessing or Ω(n) time for each query. Of course both are tight. This shows that binary search and linear search are the only "interesting" search algorithms (at least in the classical world).
Yuval Filmus

1
@Yuval: maybe you should write up this observation as an actual answer, because it seems to me that you have to do a moderate amount of work to get the above result from the proofs in the answers.
Peter Shor

1
@Yuval: thinking about the proofs, I only see that you must have either Ω(n log n) preprocessing or Ω(n^(1−ϵ)) query time for all ϵ > 0. It's possible to have o(n log n) preprocessing time and O(n/log n) query time. One can divide the list into log n lists of size n/log n each in time Θ(n log log n) using repeated median-finding.
Peter Shor
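A sketch of that splitting step, with statistics.median_low standing in for a true linear-time selection (median-of-medians), which the Θ(n log log n) bound requires: log log n levels of median splits yield about log n buckets of size n/log n.

    import statistics

    def median_split(items, depth):
        # depth ~ log2(log2 n) levels of median splitting produce
        # 2**depth ~ log n buckets of ~n/log n items each; every level costs
        # O(n) with linear-time selection, so O(n log log n) overall.
        if depth == 0 or len(items) <= 1:
            return [items]
        m = statistics.median_low(items)   # stand-in; not itself linear time
        lo = [x for x in items if x <= m]
        hi = [x for x in items if x > m]
        return median_split(lo, depth - 1) + median_split(hi, depth - 1)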

0

As Peter Shor's answer notes, to rule out membership in a comparison-based model, we need to know how the element compares with every member. Thus, using k < n random queries (the number of members smaller than the queried nonmember is random), we gain Θ(n log k) information relative to having n unsorted values. Therefore, for some constant c > 0, using c·n·log k preprocessing, we cannot have c·n·log k / k query cost. This is optimal up to a constant factor, since we can sort the data into k′ = k/log k ≤ n/log n approximately equal buckets (each bucket unsorted) in O(n log k′) = O(n log k) time, which allows O(n/k′) query cost.

In particular, using O(n) preprocessing, we cannot have o(n) query cost. Also, o(n log n) preprocessing corresponds to k in O(n^ε) for every ε > 0, and thus Ω(n^(1−ε)) query cost.
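A sketch of the bucket scheme described above (illustration only: sorting and slicing is O(n log n), whereas the claimed bound needs median splitting instead; the buckets need not be internally sorted):

    import bisect

    def build_buckets(items, k):
        srt = sorted(items)                    # stand-in for median splitting
        size = max(1, len(items) // k)
        buckets = [srt[i:i + size] for i in range(0, len(srt), size)]
        mins = [b[0] for b in buckets]         # sorted bucket boundaries
        return mins, buckets

    def bucket_query(mins, buckets, x):
        # O(log k) comparisons to locate the bucket, O(n/k) to scan it.
        i = bisect.bisect_right(mins, x) - 1
        return i >= 0 and x in buckets[i]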

Licensed under cc by-sa 3.0 with attribution required.