How do I find the duplicates in a list and create another list with them?

437

How can I find the duplicates in a Python list and create another list of the duplicates? The list only contains integers.

python list duplicates

— MFB

1

Possible duplicate of How do you remove duplicates from a list in Python whilst preserving order?

— Dhruvpathak

1

Do you want the duplicates once, or every time one is seen again?

— moooeeeep

I think this has been answered much more efficiently here: stackoverflow.com/a/642919/1748045. Intersection is a built-in method of set and should do exactly what is required.

— Tom Smith

545

To remove duplicates, use set(a). To print duplicates, something like:

a = [1,2,3,2,1,5,6,5,5,5]

import collections
print([item for item, count in collections.Counter(a).items() if count > 1])

## [1, 2, 5]

Note that Counter is not particularly efficient (timings) and is probably overkill here. set will perform better. This code computes a list of unique elements in the source order:

seen = set()
uniq = []
for x in a:
    if x not in seen:
        uniq.append(x)
        seen.add(x)

Or, more concisely:

seen = set()
uniq = [x for x in a if x not in seen and not seen.add(x)]

I don't recommend the latter style, because it is not obvious what not seen.add(x) is doing (the set add() method always returns None, hence the need for not).
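To make the `not seen.add(x)` idiom less opaque, here is a minimal sketch that checks the return value of `set.add` directly and then reruns the comprehension. The names `probe` and `add_returns_none` are only illustrative and do not come from the answer.

```python
# Illustrative only: shows why `not seen.add(x)` is always True.
a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]

probe = set()
result = probe.add(42)               # set.add mutates the set in place...
add_returns_none = result is None    # ...and returns None

seen = set()
# `x not in seen` filters out repeats; `not seen.add(x)` is always True and
# is evaluated only for its side effect of recording x in `seen`.
uniq = [x for x in a if x not in seen and not seen.add(x)]

print(add_returns_none)  # True
print(uniq)              # [1, 2, 3, 5, 6]
```

Because `and` short-circuits, `seen.add(x)` runs only for values that have not been seen yet.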

To compute the list of duplicated elements without libraries:

seen = {}
dupes = []

for x in a:
    if x not in seen:
        seen[x] = 1
    else:
        if seen[x] == 1:
            dupes.append(x)
        seen[x] += 1

If the list elements are not hashable, you cannot use sets/dicts and have to resort to a quadratic-time solution (compare each with each). For example:

a = [[1], [2], [3], [1], [5], [3]]

no_dupes = [x for n, x in enumerate(a) if x not in a[:n]]
print(no_dupes)  # [[1], [2], [3], [5]]

dupes = [x for n, x in enumerate(a) if x in a[:n]]
print(dupes)  # [[1], [3]]
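When the unhashable elements happen to be flat lists of hashable items, one possible workaround (not part of the answer above) is to use a tuple copy of each element as the set key, which keeps the scan linear. This is a sketch under that assumption; `duplicated_rows` is a hypothetical helper name.

```python
def duplicated_rows(rows):
    """Return each duplicated inner list once, in order of its first repeat."""
    seen = set()       # tuple keys encountered so far
    reported = set()   # tuple keys already added to the result
    dupes = []
    for row in rows:
        key = tuple(row)  # assumption: row is a flat list of hashable items
        if key in seen and key not in reported:
            dupes.append(row)
            reported.add(key)
        seen.add(key)
    return dupes


a = [[1], [2], [3], [1], [5], [3]]
print(duplicated_rows(a))  # [[1], [3]]
```

For nested or otherwise unhashable contents this conversion does not apply, and the quadratic comparison remains the fallback.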

— georg

2

@eric: I guess it's O(n), because it only iterates the list once and set lookups are O(1).

— georg

3

@Hugo, to see the list of duplicates, we just need to create a new list called dup and add an else statement. For example: dup = [] else: dup.append(x)

— Chris Nielsen

4

@oxeimon: you have probably figured this out already, but in Python 3 print is called with parentheses: print()

— Moberg

4

Converting your answer for set() to get only the duplicates: seen = set() then dupe = set(x for x in a if x in seen or seen.add(x))

— Ta946

2

For Python 3.x: print([item for item, count in collections.Counter(a).items() if count > 1])

— kibitzforu

327

>>> l = [1,2,3,4,4,5,5,6,1]
>>> set([x for x in l if l.count(x) > 1])
set([1, 4, 5])

— Ritesh Kumar

2

Is there a reason you use a list comprehension instead of a generator expression?

64

Indeed a simple solution, but the complexity is quadratic because each count() call scans the whole list again, so do not use it for large lists.

— danuker
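To make the quadratic cost in the comment above concrete, the sketch below counts how many equality checks the `list.count()`-based approach performs. `CountingInt` is a hypothetical helper written only for this illustration; the exact counts depend on the interpreter, so only the growth pattern matters.

```python
class CountingInt:
    """Hypothetical wrapper around an int that counts equality checks."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingInt.comparisons += 1
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


def comparisons_for_count_approach(n):
    """Run the list.count()-based approach on n distinct items and
    return the number of equality checks it triggered."""
    items = [CountingInt(i) for i in range(n)]
    CountingInt.comparisons = 0
    _ = [x for x in items if items.count(x) > 1]
    return CountingInt.comparisons


small = comparisons_for_count_approach(100)
large = comparisons_for_count_approach(200)
# Doubling the input roughly quadruples the work, the signature of O(n**2).
print(small, large)
```

A set-based scan, by contrast, touches each element once, which is why the other answers scale to large lists.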

4

@JohnJ, bubble sort is also simple and works. That doesn't mean we should use it!

— John La Rooy

@JohnLaRooy actually it does mean we shouldn't use it, because there is almost always a more efficient (and simpler) way to sort.

— Lost

1

@watsonic: Your "simple switch" fails to reduce the time complexity from quadratic in the general case. Replacing l with set(l) only reduces the worst-case time complexity and hence does nothing to address the larger-scale efficiency concerns with this answer. It probably wasn't so simple after all. In short, don't do this.

— Cecil Curry

82

You don't need the count, just whether or not the item was seen before. I adapted that answer to this problem:

def list_duplicates(seq):
  seen = set()
  seen_add = seen.add
  # adds all elements it doesn't know yet to seen and all other to seen_twice
  seen_twice = set( x for x in seq if x in seen or seen_add(x) )
  # turn the set into a list (as requested)
  return list( seen_twice )

a = [1,2,3,2,1,5,6,5,5,5]
list_duplicates(a) # yields [1, 2, 5]

Just in case speed matters, here are some timings:

# file: test.py
import collections

def thg435(l):
    return [x for x, y in collections.Counter(l).items() if y > 1]

def moooeeeep(l):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    seen_twice = set( x for x in l if x in seen or seen_add(x) )
    # turn the set into a list (as requested)
    return list( seen_twice )

def RiteshKumar(l):
    return list(set([x for x in l if l.count(x) > 1]))

def JohnLaRooy(L):
    seen = set()
    seen2 = set()
    seen_add = seen.add
    seen2_add = seen2.add
    for item in L:
        if item in seen:
            seen2_add(item)
        else:
            seen_add(item)
    return list(seen2)

l = [1,2,3,2,1,5,6,5,5,5]*100

Here are the results (well done, @JohnLaRooy!):

$ python -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
10000 loops, best of 3: 74.6 usec per loop
$ python -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 91.3 usec per loop
$ python -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 266 usec per loop
$ python -mtimeit -s 'import test' 'test.RiteshKumar(test.l)'
100 loops, best of 3: 8.35 msec per loop

Interestingly, besides the timings themselves, the ranking also changes slightly when pypy is used. Most interestingly, the Counter-based approach benefits hugely from pypy's optimizations, whereas the method-caching approach I suggested seems to have almost no effect.

$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
100000 loops, best of 3: 17.8 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
10000 loops, best of 3: 23 usec per loop
$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 39.3 usec per loop

Apparently this effect is related to the "duplicatedness" of the input data. I set l = [random.randrange(1000000) for i in xrange(10000)] and got these results:

$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
1000 loops, best of 3: 495 usec per loop
$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
1000 loops, best of 3: 499 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 1.68 msec per loop

— moooeeeep

6

Just curious - what is the purpose of seen_add = seen.add?

— Rob

3

@Rob This way you just call a function that you have already looked up once. Otherwise you would need to look up the member function add (a dictionary query) every time an insert is necessary.

— moooeeeep

Checked with my own data and IPython's %timeit; your method does look fastest in the test: "The slowest run took 4.34 times longer than the fastest. This could mean that an intermediate result is being cached"

— Joop

1

@mooeeeeeep, I added another version to your script for you to try :) Also try pypy if you have it handy and are going for speed.

— John La Rooy

@JohnLaRooy Nice improvement in performance! Interestingly, when I used pypy to evaluate the results, the Counter-based approach improved significantly.

— moooeeeep

42

You can use iteration_utilities.duplicates:

>>> from iteration_utilities import duplicates

>>> list(duplicates([1,1,2,1,2,3,4,2]))
[1, 1, 2, 2]

Or, if you only want one of each duplicate, it can be combined with iteration_utilities.unique_everseen:

>>> from iteration_utilities import unique_everseen

>>> list(unique_everseen(duplicates([1,1,2,1,2,3,4,2])))
[1, 2]

It can also handle unhashable elements (however, at the cost of performance):

>>> list(duplicates([[1], [2], [1], [3], [1]]))
[[1], [1]]

>>> list(unique_everseen(duplicates([[1], [2], [1], [3], [1]])))
[[1]]

That's something that only a few of the other approaches here can handle.

Benchmarks

I did a quick benchmark containing most (but not all) of the approaches mentioned here.

The first benchmark included only a small range of list lengths because some approaches have O(n**2) behavior.

In the graphs the y-axis represents the time, so a lower value means better. It is also plotted log-log so the wide range of values can be visualized better:

Removing the O(n**2) approaches, I did another benchmark with up to half a million elements in a list:

As you can see, the iteration_utilities.duplicates approach is faster than any of the other approaches, and even chaining unique_everseen(duplicates(...)) was faster than or equally fast as the other approaches.

One additional interesting thing to note here is that the pandas approaches are very slow for small lists but can easily compete for longer lists.

However, as these benchmarks show, most of the approaches perform roughly equally, so it doesn't matter much which one is used (except for the 3 that had O(n**2) runtime).

from iteration_utilities import duplicates, unique_everseen
from collections import Counter
import pandas as pd
import itertools

def georg_counter(it):
    return [item for item, count in Counter(it).items() if count > 1]

def georg_set(it):
    seen = set()
    uniq = []
    for x in it:
        if x not in seen:
            uniq.append(x)
            seen.add(x)

def georg_set2(it):
    seen = set()
    return [x for x in it if x not in seen and not seen.add(x)]   

def georg_set3(it):
    seen = {}
    dupes = []

    for x in it:
        if x not in seen:
            seen[x] = 1
        else:
            if seen[x] == 1:
                dupes.append(x)
            seen[x] += 1

def RiteshKumar_count(l):
    return set([x for x in l if l.count(x) > 1])

def moooeeeep(seq):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    seen_twice = set( x for x in seq if x in seen or seen_add(x) )
    # turn the set into a list (as requested)
    return list( seen_twice )

def F1Rumors_implementation(c):
    a, b = itertools.tee(sorted(c))
    next(b, None)
    r = None
    for k, g in zip(a, b):
        if k != g: continue
        if k != r:
            yield k
            r = k

def F1Rumors(c):
    return list(F1Rumors_implementation(c))

def Edward(a):
    d = {}
    for elem in a:
        if elem in d:
            d[elem] += 1
        else:
            d[elem] = 1
    return [x for x, y in d.items() if y > 1]

def wordsmith(a):
    return pd.Series(a)[pd.Series(a).duplicated()].values

def NikhilPrabhu(li):
    li = li.copy()
    for x in set(li):
        li.remove(x)

    return list(set(li))

def firelynx(a):
    vc = pd.Series(a).value_counts()
    return vc[vc > 1].index.tolist()

def HenryDev(myList):
    newList = set()

    for i in myList:
        if myList.count(i) >= 2:
            newList.add(i)

    return list(newList)

def yota(number_lst):
    seen_set = set()
    duplicate_set = set(x for x in number_lst if x in seen_set or seen_set.add(x))
    return seen_set - duplicate_set

def IgorVishnevskiy(l):
    s=set(l)
    d=[]
    for x in l:
        if x in s:
            s.remove(x)
        else:
            d.append(x)
    return d

def it_duplicates(l):
    return list(duplicates(l))

def it_unique_duplicates(l):
    return list(unique_everseen(duplicates(l)))

Benchmark 1

from simple_benchmark import benchmark
import random

funcs = [
    georg_counter, georg_set, georg_set2, georg_set3, RiteshKumar_count, moooeeeep, 
    F1Rumors, Edward, wordsmith, NikhilPrabhu, firelynx,
    HenryDev, yota, IgorVishnevskiy, it_duplicates, it_unique_duplicates
]

args = {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(2, 12)}

b = benchmark(funcs, args, 'list size')

b.plot()

Benchmark 2

funcs = [
    georg_counter, georg_set, georg_set2, georg_set3, moooeeeep, 
    F1Rumors, Edward, wordsmith, firelynx,
    yota, IgorVishnevskiy, it_duplicates, it_unique_duplicates
]

args = {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(2, 20)}

b = benchmark(funcs, args, 'list size')
b.plot()

Disclaimer

1 This is from a third-party library I have written: iteration_utilities.

— MSeifert

1

I'm going to stick my neck out here and suggest that writing a bespoke library to do the work in C instead of Python is probably not in the spirit of the answer that was being looked for - but it is a legitimate approach! I like the breadth of the answer and the graphical display of the results - it is great to see that they are converging, and it makes me wonder whether they ever cross as the inputs grow further. Question: what is the result with mostly sorted lists as opposed to fully random lists?

— F1Rumors

30

I came across this question while looking into something related, and I wonder why no one offered a generator-based solution. Solving this problem would be:

>>> print list(getDupes_9([1,2,3,2,1,5,6,5,5,5]))
[1, 2, 5]

I was concerned with scalability, so I tested several approaches, including naive ones that work well on small lists but scale horribly as lists get larger (note: it would have been better to use timeit, but this is illustrative).

I included @moooeeeep for comparison (it is impressively fast: fastest if the input list is completely random) and an itertools approach that is faster again for mostly sorted lists... Now includes the pandas approach from @firelynx - slow, but not horribly so, and simple. Note: the sort/tee/zip approach is consistently fastest on my machine for large, mostly ordered lists, and moooeeeep is fastest for shuffled lists, but your mileage may vary.

Advantages

very quick and simple to test for 'any' duplicates using the same code

Assumptions

Duplicates should be reported once only
Duplicate order does not need to be preserved
Duplicate might be anywhere in the list

Fastest solution, 1m entries:

def getDupes(c):
        '''sort/tee/izip'''
        a, b = itertools.tee(sorted(c))
        next(b, None)
        r = None
        for k, g in itertools.izip(a, b):
            if k != g: continue
            if k != r:
                yield k
                r = k
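The snippet above targets Python 2 (itertools.izip). A minimal Python 3 sketch of the same sort/tee/zip idea follows; the function name `get_dupes_sorted` and the sentinel object are choices made for this illustration and are not part of the original answer.

```python
import itertools


def get_dupes_sorted(values):
    """Yield each duplicated value once, in sorted order (sort/tee/zip)."""
    a, b = itertools.tee(sorted(values))
    next(b, None)              # advance b so zip pairs each item with its successor
    last_reported = object()   # sentinel that compares unequal to any real value
    for current, following in zip(a, b):
        if current == following and current != last_reported:
            yield current
            last_reported = current


print(list(get_dupes_sorted([1, 2, 3, 2, 1, 5, 6, 5, 5, 5])))  # [1, 2, 5]
```

Using a fresh object as the sentinel instead of None avoids skipping a duplicated None in the input; as in the original, the values must be mutually comparable so that sorted() works.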

Approaches tested

import itertools
import time
import random

def getDupes_1(c):
    '''naive'''
    for i in xrange(0, len(c)):
        if c[i] in c[:i]:
            yield c[i]

def getDupes_2(c):
    '''set len change'''
    s = set()
    for i in c:
        l = len(s)
        s.add(i)
        if len(s) == l:
            yield i

def getDupes_3(c):
    '''in dict'''
    d = {}
    for i in c:
        if i in d:
            if d[i]:
                yield i
                d[i] = False
        else:
            d[i] = True

def getDupes_4(c):
    '''in set'''
    s,r = set(),set()
    for i in c:
        if i not in s:
            s.add(i)
        elif i not in r:
            r.add(i)
            yield i

def getDupes_5(c):
    '''sort/adjacent'''
    c = sorted(c)
    r = None
    for i in xrange(1, len(c)):
        if c[i] == c[i - 1]:
            if c[i] != r:
                yield c[i]
                r = c[i]

def getDupes_6(c):
    '''sort/groupby'''
    def multiple(x):
        try:
            x.next()
            x.next()
            return True
        except:
            return False
    for k, g in itertools.ifilter(lambda x: multiple(x[1]), itertools.groupby(sorted(c))):
        yield k

def getDupes_7(c):
    '''sort/zip'''
    c = sorted(c)
    r = None
    for k, g in zip(c[:-1],c[1:]):
        if k == g:
            if k != r:
                yield k
                r = k

def getDupes_8(c):
    '''sort/izip'''
    c = sorted(c)
    r = None
    for k, g in itertools.izip(c[:-1],c[1:]):
        if k == g:
            if k != r:
                yield k
                r = k

def getDupes_9(c):
    '''sort/tee/izip'''
    a, b = itertools.tee(sorted(c))
    next(b, None)
    r = None
    for k, g in itertools.izip(a, b):
        if k != g: continue
        if k != r:
            yield k
            r = k

def getDupes_a(l):
    '''moooeeeep'''
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    for x in l:
        if x in seen or seen_add(x):
            yield x

def getDupes_b(x):
    '''iter*/sorted'''
    x = sorted(x)
    def _matches():
        for k,g in itertools.izip(x[:-1],x[1:]):
            if k == g:
                yield k
    for k, n in itertools.groupby(_matches()):
        yield k

def getDupes_c(a):
    '''pandas'''
    import pandas as pd
    vc = pd.Series(a).value_counts()
    i = vc[vc > 1].index
    for _ in i:
        yield _

def hasDupes(fn,c):
    try:
        if fn(c).next(): return True    # Found a dupe
    except StopIteration:
        pass
    return False

def getDupes(fn,c):
    return list(fn(c))

STABLE = True
if STABLE:
    print 'Finding FIRST then ALL duplicates, single dupe of "nth" placed element in 1m element array'
else:
    print 'Finding FIRST then ALL duplicates, single dupe of "n" included in randomised 1m element array'
for location in (50,250000,500000,750000,999999):
    for test in (getDupes_2, getDupes_3, getDupes_4, getDupes_5, getDupes_6,
                 getDupes_8, getDupes_9, getDupes_a, getDupes_b, getDupes_c):
        print 'Test %-15s:%10d - '%(test.__doc__ or test.__name__,location),
        deltas = []
        for FIRST in (True,False):
            for i in xrange(0, 5):
                c = range(0,1000000)
                if STABLE:
                    c[0] = location
                else:
                    c.append(location)
                    random.shuffle(c)
                start = time.time()
                if FIRST:
                    print '.' if location == test(c).next() else '!',
                else:
                    print '.' if [location] == list(test(c)) else '!',
                deltas.append(time.time()-start)
            print ' -- %0.3f  '%(sum(deltas)/len(deltas)),
        print
    print

The results for the 'all dupes' test were consistent, finding the "first" duplicate and then "all" duplicates in this array:

Finding FIRST then ALL duplicates, single dupe of "nth" placed element in 1m element array
Test set len change :    500000 -  . . . . .  -- 0.264   . . . . .  -- 0.402  
Test in dict        :    500000 -  . . . . .  -- 0.163   . . . . .  -- 0.250  
Test in set         :    500000 -  . . . . .  -- 0.163   . . . . .  -- 0.249  
Test sort/adjacent  :    500000 -  . . . . .  -- 0.159   . . . . .  -- 0.229  
Test sort/groupby   :    500000 -  . . . . .  -- 0.860   . . . . .  -- 1.286  
Test sort/izip      :    500000 -  . . . . .  -- 0.165   . . . . .  -- 0.229  
Test sort/tee/izip  :    500000 -  . . . . .  -- 0.145   . . . . .  -- 0.206  *
Test moooeeeep      :    500000 -  . . . . .  -- 0.149   . . . . .  -- 0.232  
Test iter*/sorted   :    500000 -  . . . . .  -- 0.160   . . . . .  -- 0.221  
Test pandas         :    500000 -  . . . . .  -- 0.493   . . . . .  -- 0.499

When the lists are shuffled first, the price of the sort becomes apparent - the efficiency drops noticeably and the @moooeeeep approach dominates, with the set and dict approaches being similar but lesser performers:

Finding FIRST then ALL duplicates, single dupe of "n" included in randomised 1m element array
Test set len change :    500000 -  . . . . .  -- 0.321   . . . . .  -- 0.473  
Test in dict        :    500000 -  . . . . .  -- 0.285   . . . . .  -- 0.360  
Test in set         :    500000 -  . . . . .  -- 0.309   . . . . .  -- 0.365  
Test sort/adjacent  :    500000 -  . . . . .  -- 0.756   . . . . .  -- 0.823  
Test sort/groupby   :    500000 -  . . . . .  -- 1.459   . . . . .  -- 1.896  
Test sort/izip      :    500000 -  . . . . .  -- 0.786   . . . . .  -- 0.845  
Test sort/tee/izip  :    500000 -  . . . . .  -- 0.743   . . . . .  -- 0.804  
Test moooeeeep      :    500000 -  . . . . .  -- 0.234   . . . . .  -- 0.311  *
Test iter*/sorted   :    500000 -  . . . . .  -- 0.776   . . . . .  -- 0.840  
Test pandas         :    500000 -  . . . . .  -- 0.539   . . . . .  -- 0.540

— F1Rumors

@mooeeeeeep - interested to see your views on the ifilter/izip/tee approach.

— F1Rumors

1

This answer is unbelievably good. I don't understand why it doesn't have more points for the explanations and tests, which are very useful for those who need them.

— dlewin

1

Python's sort is O(n) when only one item is out of order. You should random.shuffle(c) to account for that. Additionally, I can't replicate your results when running the unaltered script either (totally different ordering), so maybe it depends on the CPU as well.

— John La Rooy

Thanks @John-La-Rooy, astute observation that the CPU/local machine has an impact - so I should revise the item to YMMV. Using the O(n) sort was deliberate: the duplicated element is inserted at different locations specifically to see the impact on these approaches when there is a sole duplicate in a good (start of list) or bad (end of list) location. I considered a random list - e.g. random.shuffle - but decided that would only be sensible if I made many more runs! I will have to return and benchmark the equivalent with multiple runs/shuffles and see what the impact is.

— F1Rumors

Revised to include the @firelynx pandas approach and to run on a fully shuffled list as well as the ordered list. The reason is that the native timsort used by Python is wickedly fast on mostly sorted data (best case), while a shuffled list is its worst case - which shakes up the results.

— F1Rumors

13

Using pandas:

>>> import pandas as pd
>>> a = [1, 2, 1, 3, 3, 3, 0]
>>> pd.Series(a)[pd.Series(a).duplicated()].values
array([1, 3, 3])

— wordsmith

10

collections.Counter is new in Python 2.7:


Python 2.5.4 (r254:67916, May 31 2010, 15:03:39) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = [1,2,3,2,1,5,6,5,5,5]
>>> import collections
>>> print [x for x, y in collections.Counter(a).items() if y > 1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'Counter'
>>>

In an earlier version you can use a conventional dict instead:

a = [1,2,3,2,1,5,6,5,5,5]
d = {}
for elem in a:
    if elem in d:
        d[elem] += 1
    else:
        d[elem] = 1

print([x for x, y in d.items() if y > 1])
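The same plain-dict counting can be written a little more compactly with dict.get, which supplies a default of 0 for keys that are not yet present. This is a sketch in Python 3 syntax; `duplicates_via_dict` is a name chosen for the illustration.

```python
def duplicates_via_dict(items):
    """Count occurrences with a plain dict and return values seen more than once."""
    counts = {}
    for elem in items:
        # dict.get returns 0 for a key that has not been counted yet
        counts[elem] = counts.get(elem, 0) + 1
    return [x for x, n in counts.items() if n > 1]


a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
print(sorted(duplicates_via_dict(a)))  # [1, 2, 5]
```

The result is sorted before printing because dict ordering is only guaranteed to follow insertion order from Python 3.7 onward.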

— Edward

9

Here's a neat and concise solution -

for x in set(li):
    li.remove(x)

li = list(set(li))

— Nikhil Prabhu

However, the original list is lost. This can be fixed by copying the contents to another list first - temp = li[:]

— Nikhil Prabhu

3

This is pretty bad practice on large lists - removing elements from a list is quite expensive!

— F1Rumors

7

Without converting to a set, probably the simplest way would be something like the code below. This may be useful during an interview when you are asked not to use sets.

a=[1,2,3,3,3]
dup=[]
for each in a:
  if each not in dup:
    dup.append(each)
print(dup)

======= Alternatively, to get 2 separate lists of unique values and duplicate values

a=[1,2,3,3,3]
uniques=[]
dups=[]

for each in a:
  if each not in uniques:
    uniques.append(each)
  else:
    dups.append(each)
print("Unique values are below:")
print(uniques)
print("Duplicate values are below:")
print(dups)

— Chetan_Vasudevan

1

This does not result in a list of the duplicates of a (or the original list), though; it results in a list of all the unique elements of a (or the original list). What would one do after finishing the "dup" list?

— GameCrocker95

6

I would do this with pandas, because I use pandas a lot

import pandas as pd
a = [1,2,3,3,3,4,5,6,6,7]
vc = pd.Series(a).value_counts()
vc[vc > 1].index.tolist()

Gives

[3,6]

It probably isn't very efficient, but it sure is less code than a lot of the other answers, so I thought I would contribute

— firelynx

3

Also note that pandas has a built-in duplicated function: pda = pd.Series(a) print list(pda[pda.duplicated()])

— Len Blokken

6

The third example of the accepted answer gives an erroneous answer and does not attempt to give duplicates. Here is the correct version:

number_lst = [1, 1, 2, 3, 5, ...]

seen_set = set()
duplicate_set = set(x for x in number_lst if x in seen_set or seen_set.add(x))
unique_set = seen_set - duplicate_set

— yota

6

How about simply looping through each element in the list, checking the number of occurrences, and adding the repeated ones to a set, which will then print the duplicates? Hope this helps someone out there.

myList  = [2 ,4 , 6, 8, 4, 6, 12];
newList = set()

for i in myList:
    if myList.count(i) >= 2:
        newList.add(i)

print(list(newList))
## [4 , 6]

— HenryDev

5

We can use itertools.groupby to find all the items that have dups:

from itertools import groupby

myList  = [2, 4, 6, 8, 4, 6, 12]
# when the list is sorted, groupby groups by consecutive elements which are similar
for x, y in groupby(sorted(myList)):
    #  list(y) returns all the occurences of item x
    if len(list(y)) > 1:
        print(x)

The output will be:

4
6

— alfasin

1

Or more concisely: dupes = [x for x, y in groupby(sorted(myList)) if len(list(y)) > 1]

— frnhr

5

I guess the most effective way to find duplicates in a list is:

from collections import Counter

def duplicates(values):
    dups = Counter(values) - Counter(set(values))
    return list(dups.keys())

print(duplicates([1,2,3,6,5,2]))

This uses a Counter of all the elements and a Counter of all the unique elements. Subtracting the second from the first leaves only the duplicates.
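A short worked example may help show why the subtraction leaves only the duplicates: Counter subtraction discards every key whose resulting count is zero or negative. The intermediate variable names below are illustrative only.

```python
from collections import Counter

values = [1, 2, 3, 6, 5, 2]

all_counts = Counter(values)       # 2 is counted twice, everything else once
one_each = Counter(set(values))    # every distinct value counted exactly once
surplus = all_counts - one_each    # subtraction keeps only positive counts

print(dict(surplus))   # {2: 1}
print(list(surplus))   # [2]
```

The remaining count is the number of extra occurrences, so a value that appears four times would be left with a count of 3.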

— Anand Chitipothu

4

A bit late, but maybe helpful for some. For a largish list, I found this worked for me.

l=[1,2,3,5,4,1,3,1]
s=set(l)
d=[]
for x in l:
    if x in s:
        s.remove(x)
    else:
        d.append(x)
d
[1,3,1]

It shows only the duplicates, shows all of them, and preserves order.

— user3109122

3

A very simple and quick way of finding dupes with one iteration in Python is:

testList = ['red', 'blue', 'red', 'green', 'blue', 'blue']

testListDict = {}

for item in testList:
  try:
    testListDict[item] += 1
  except KeyError:
    testListDict[item] = 1

print testListDict

The output will be as follows:

>>> print testListDict
{'blue': 3, 'green': 1, 'red': 2}

This and more on my blog: http://www.howtoprogramwithpython.com

— Igor Vishnevskiy

3

I am entering this discussion very late. Even so, I would like to deal with this problem with one-liners, because that's the charm of Python. If we just want to get the duplicates into a separate list (or any collection), I would suggest doing it as below. Say we have a list with duplicates, which we can call 'target':

    target=[1,2,3,4,4,4,3,5,6,8,4,3]

Now if we want to get the duplicates, we can use the one-liner below:

    duplicates=dict(set((x,target.count(x)) for x in filter(lambda rec : target.count(rec)>1,target)))

This code puts the duplicated records as keys and their counts as values into the dictionary 'duplicates'. The 'duplicates' dictionary will look like this:

    {3: 3, 4: 4} #it saying 3 is repeated 3 times and 4 is 4 times

If you just want all the records that have duplicates in a list, the code is again much shorter:

    duplicates=filter(lambda rec : target.count(rec)>1,target)

The output will be:

    [3, 4, 4, 4, 3, 4, 3]

This works perfectly in Python 2.7.x+ versions
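On Python 3, filter returns a lazy iterator rather than a list, so the second one-liner above would yield a filter object instead of the values. A minimal sketch of equivalent comprehensions, using the same `target` list, might look like this; the name `all_dupes` is introduced here for illustration.

```python
target = [1, 2, 3, 4, 4, 4, 3, 5, 6, 8, 4, 3]

# Duplicated values mapped to how often they occur.
duplicates = {x: target.count(x) for x in set(target) if target.count(x) > 1}

# Every occurrence of a duplicated value, in the original order.
all_dupes = [x for x in target if target.count(x) > 1]

print(duplicates)  # {3: 3, 4: 4}
print(all_dupes)   # [3, 4, 4, 4, 3, 4, 3]
```

Like the original one-liners, this calls target.count() repeatedly and is therefore quadratic, so it is best kept for short lists.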

— akhil pathirippilly

3

A Python 3.8 one-liner, if you don't care to write your own algorithm or use libraries:

from itertools import groupby

l = [1,2,3,2,1,5,6,5,5,5]

res = [(x, count) for x, g in groupby(sorted(l)) if (count := len(list(g))) > 1]

print(res)

Prints each item and its count:

[(1, 2), (2, 2), (5, 4)]

groupby takes a grouping function, so you can define your groupings in different ways and return additional tuple fields as needed.

groupby should not be too slow.

— yǝsʞǝla

2

Some other tests. Of course, doing...

set([x for x in l if l.count(x) > 1])

... is too costly. It is about 500 times faster (a longer array gives an even better ratio) to use the following final method:

def dups_count_dict(l):
    d = {}

    for item in l:
        if item not in d:
            d[item] = 0

        d[item] += 1

    result_d = {key: val for key, val in d.items() if val > 1}

    return result_d.keys()

Only 2 loops, and no very costly l.count() operations.

Here is some code to compare the methods. The code is below; here is the output:

dups_count: 13.368s # this is a function which uses l.count()
dups_count_dict: 0.014s # this is a final best function (of the 3 functions)
dups_count_counter: 0.024s # collections.Counter

Testing code:

import numpy as np
from time import time
from collections import Counter

class TimerCounter(object):
    def __init__(self):
        self._time_sum = 0

    def start(self):
        self.time = time()

    def stop(self):
        self._time_sum += time() - self.time

    def get_time_sum(self):
        return self._time_sum


def dups_count(l):
    return set([x for x in l if l.count(x) > 1])


def dups_count_dict(l):
    d = {}

    for item in l:
        if item not in d:
            d[item] = 0

        d[item] += 1

    result_d = {key: val for key, val in d.items() if val > 1}

    return result_d.keys()


def dups_counter(l):
    counter = Counter(l)    

    result_d = {key: val for key, val in counter.items() if val > 1}

    return result_d.keys()



def gen_array():
    np.random.seed(17)
    return list(np.random.randint(0, 5000, 10000))


def assert_equal_results(*results):
    primary_result = results[0]
    other_results = results[1:]

    for other_result in other_results:
        assert set(primary_result) == set(other_result) and len(primary_result) == len(other_result)


if __name__ == '__main__':
    dups_count_time = TimerCounter()
    dups_count_dict_time = TimerCounter()
    dups_count_counter = TimerCounter()

    l = gen_array()

    for i in range(3):
        dups_count_time.start()
        result1 = dups_count(l)
        dups_count_time.stop()

        dups_count_dict_time.start()
        result2 = dups_count_dict(l)
        dups_count_dict_time.stop()

        dups_count_counter.start()
        result3 = dups_counter(l)
        dups_count_counter.stop()

        assert_equal_results(result1, result2, result3)

    print('dups_count: %.3f' % dups_count_time.get_time_sum())
    print('dups_count_dict: %.3f' % dups_count_dict_time.get_time_sum())
    print('dups_count_counter: %.3f' % dups_count_counter.get_time_sum())

— sergzach

2

Method 1:

list(set([val for idx, val in enumerate(input_list) if val in input_list[idx+1:]]))

Explanation: [val for idx, val in enumerate(input_list) if val in input_list[idx+1:]] is a list comprehension that returns an element if the same element occurs again later in the list, after its current index.

Example: input_list = [42,31,42,31,3,31,31,5,6,6,6,6,6,6,7,42]

Starting with the first element in the list, 42, at index 0, it checks whether 42 is present in input_list[1:] (i.e., from index 1 to the end of the list). Because 42 is present in input_list[1:], it returns 42.

Then it moves to the next element, 31, at index 1, and checks whether 31 is present in input_list[2:] (i.e., from index 2 to the end of the list). Because 31 is present in input_list[2:], it returns 31.

In the same way it goes through all the elements in the list and returns only the repeated/duplicate elements as a list.

Because that list can itself contain repeats, we need to pick one of each duplicate, i.e. remove the duplicates among the duplicates. To do so, we call the Python built-in set(), which removes them.

We are then left with a set rather than a list, so to convert it back we use list(), which turns the set of elements into a list.
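The three steps described above can be traced on the example list as follows. The intermediate names `repeated`, `unique_dupes` and `result` are introduced only for this illustration, and the final sort is added so the printed order is predictable.

```python
input_list = [42, 31, 42, 31, 3, 31, 31, 5, 6, 6, 6, 6, 6, 6, 7, 42]

# Step 1: keep a value whenever it occurs again later in the list.
repeated = [val for idx, val in enumerate(input_list)
            if val in input_list[idx + 1:]]

# Step 2: collapse the repeats among the duplicates.
unique_dupes = set(repeated)

# Step 3: convert back to a list (sets are unordered, so sort for display).
result = sorted(unique_dupes)

print(repeated)  # [42, 31, 42, 31, 31, 6, 6, 6, 6, 6]
print(result)    # [6, 31, 42]
```

Each slice input_list[idx + 1:] copies the tail of the list and the `in` test scans it, so this method is quadratic in the list length.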

Method 2:

def dupes(ilist):
    temp_list = [] # initially, empty temporary list
    dupe_list = [] # initially, empty duplicate list
    for each in ilist:
        if each in temp_list: # Found a Duplicate element
            if each not in dupe_list: # Avoid duplicate elements in dupe_list
                dupe_list.append(each) # Add duplicate element to dupe_list
        else:
            temp_list.append(each) # Add a new (non-duplicate) to temp_list

    return dupe_list

Explanation: Here we create two empty lists to start with. Then we keep traversing the elements of the input list, checking whether each one is already in temp_list (initially empty). If it is not in temp_list, we add it to temp_list using the append method.

If it is already in temp_list, the current element is a duplicate, so we add it to dupe_list using the append method, unless dupe_list already contains it.
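
Because Method 2 tests membership in lists, each check is a linear scan. As a hedged sketch, the same logic can be written with sets when the elements are hashable (true for the integers in the question). The function name dupes_with_sets and the helper set reported are assumptions made for this illustration and are not part of the answer above.

```python
def dupes_with_sets(ilist):
    seen = set()      # values encountered at least once
    reported = set()  # values already appended to dupe_list
    dupe_list = []    # duplicates, in the order they are first repeated
    for each in ilist:
        if each in seen:
            if each not in reported:
                reported.add(each)
                dupe_list.append(each)
        else:
            seen.add(each)
    return dupe_list

print(dupes_with_sets([42, 31, 42, 31, 3, 31, 31, 5, 6, 6, 7, 42]))
# [42, 31, 6]
```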

— S471
Source

2

raw_list = [1,2,3,3,4,5,6,6,7,2,3,4,2,3,4,1,3,4,]

clean_list = list(set(raw_list))
duplicated_items = []

for item in raw_list:
    try:
        clean_list.remove(item)
    except ValueError:
        duplicated_items.append(item)


print(duplicated_items)
# [3, 6, 2, 3, 4, 2, 3, 4, 1, 3, 4]

You basically remove the duplicates by converting to a set (clean_list), then iterate over raw_list, removing each item from clean_list as it is encountered. If the item is not found, the raised ValueError exception is caught and the item is appended to the duplicated_items list.

If the indices of the duplicated items are needed, just enumerate the list and work with the index (for index, item in enumerate(raw_list):), which is faster and optimised for large lists (like thousands+ of elements).
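
A minimal sketch of the enumerate variant mentioned above, recording the position of each repeated item. It assumes hashable items and, unlike the answer's code, keeps the not-yet-seen values in a set (named remaining here, a hypothetical name) so that the membership test does not scan a list.

```python
raw_list = [1, 2, 3, 3, 4, 5, 6, 6, 7, 2]

remaining = set(raw_list)  # values whose first occurrence is still ahead
duplicated = []            # (index, item) pairs for every repeat

for index, item in enumerate(raw_list):
    if item in remaining:
        remaining.remove(item)            # first occurrence: consume it
    else:
        duplicated.append((index, item))  # already consumed: a repeat

print(duplicated)
# [(3, 3), (7, 6), (9, 2)]
```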

— All Іѕ Vаиітy
Source

2

Using the list.count() method to find the duplicate elements of a given list:

arr=[]
dup =[]
for i in range(int(input("Enter range of list: "))):
    arr.append(int(input("Enter Element in a list: ")))
for i in arr:
    if arr.count(i)>1 and i not in dup:
        dup.append(i)
print(dup)
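
For experimenting without typing values at a prompt, the same count()-based logic can be wrapped in a function. This is only a sketch; the name dups_by_count is made up for the example. Keep in mind that list.count() rescans the whole list on every call, so the approach is quadratic.

```python
def dups_by_count(arr):
    dup = []
    for i in arr:
        # count() walks the entire list each time it is called
        if arr.count(i) > 1 and i not in dup:
            dup.append(i)
    return dup

print(dups_by_count([1, 2, 3, 2, 1, 5, 6, 5, 5, 5]))
# [1, 2, 5]
```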

— रविकिरन डी
Source

A simple way to find the duplicate elements in a list

— रविकिरण डी

2

A one-liner, for fun, and for cases where a single statement is required.

(lambda iterable: reduce(lambda (uniq, dup), item: (uniq, dup | {item}) if item in uniq else (uniq | {item}, dup), iterable, (set(), set())))(some_iterable)
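
The one-liner above relies on Python 2 features: reduce as a built-in and tuple parameters in a lambda, which Python 3 removed. A rough Python 3 equivalent, split over several lines for readability, might look like the sketch below. The function name split_unique_and_dupes is an assumption made for this example.

```python
from functools import reduce

def split_unique_and_dupes(iterable):
    # acc is a pair: (values seen so far, values seen more than once)
    return reduce(
        lambda acc, item: (acc[0], acc[1] | {item}) if item in acc[0]
        else (acc[0] | {item}, acc[1]),
        iterable,
        (set(), set()),
    )

uniq, dup = split_unique_and_dupes([1, 2, 3, 2, 1, 5])
print(sorted(uniq))  # [1, 2, 3, 5]
print(sorted(dup))   # [1, 2]
```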

— Wizr
Source

1

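list2 = [1, 2, 3, 4, 1, 2, 3]
lset = set()
# Note: each first occurrence is added to lset and also appended to list2,
# so list2 grows as a side effect of the comprehension.
[(lset.add(item), list2.append(item))
 for item in list2 if item not in lset]
# Prints the distinct values of the list, not the duplicates.
print(list(lset))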

— हरेश शायरा
Source

1

A one-line solution:

set([i for i in list if sum([1 for a in list if a == i]) > 1])

— ytpillai
Source

1

There are a lot of answers here, but I think this approach is relatively readable and easy to understand:

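def get_duplicates(sorted_list):
    # Expects an already sorted list, so that equal values are adjacent.
    if not sorted_list:
        return set()  # avoid an IndexError on empty input
    duplicates = []
    last = sorted_list[0]
    for x in sorted_list[1:]:
        if x == last:
            duplicates.append(x)
        last = x
    return set(duplicates)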

Notes:

If you wish to preserve the duplicate count, get rid of the cast to 'set' at the bottom to get the full list.
If you prefer to use generators, replace duplicates.append(x) with yield x and drop the return statement at the bottom (you can cast to set later).
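
The generator variant described in the second note could be sketched as follows. The name iter_duplicates is hypothetical, and the input is assumed to be sorted already, as in the function above.

```python
def iter_duplicates(sorted_list):
    it = iter(sorted_list)
    try:
        last = next(it)
    except StopIteration:
        return            # empty input: nothing to yield
    for x in it:
        if x == last:
            yield x       # x repeats the previous value
        last = x

data = sorted([3, 1, 2, 3, 3, 1])
print(list(iter_duplicates(data)))         # [1, 3, 3]
print(sorted(set(iter_duplicates(data))))  # [1, 3]
```

Casting to a list keeps one entry per extra occurrence, while casting to a set keeps each duplicated value once.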

— tvt173
Source

1

Here is a fast generator that uses a dict to store each element as a key, with a boolean value that records whether the duplicate item has already been yielded.

For lists in which all elements are hashable types:

def gen_dupes(array):
    unique = {}
    for value in array:
        if value in unique:
            # Yield only on the first repeat; later repeats stay suppressed.
            if unique[value]:
                unique[value] = False
                yield value
        else:
            unique[value] = True

array = [1, 2, 2, 3, 4, 1, 5, 2, 6, 6]
print(list(gen_dupes(array)))
# => [2, 1, 6]

For lists that might contain lists:

def gen_dupes(array):
    unique = {}
    for value in array:
        is_list = False
        if type(value) is list:
            value = tuple(value)
            is_list = True

        if value in unique:
            # Yield only on the first repeat; later repeats stay suppressed.
            if unique[value]:
                unique[value] = False
                if is_list:
                    value = list(value)

                yield value
        else:
            unique[value] = True

array = [1, 2, 2, [1, 2], 3, 4, [1, 2], 5, 2, 6, 6]
print(list(gen_dupes(array)))
# => [2, [1, 2], 6]

— जॉन बी
Source

1

def removeduplicates(a):
  seen = set()

  for i in a:
    if i not in seen:
      seen.add(i)
  return seen 

print(removeduplicates([1,1,2,2]))

— आशीष रंजन
Source

You return a set and not a list as requested. A set contains only unique elements, so the if statement is not really necessary. You should also explain what advantage your solution has over the other ones.

— क्लीमेन्स, 13:13

1

When using toolz:

>>> from toolz import frequencies, valfilter
>>> a = [1,2,2,3,4,5,4]
>>> list(valfilter(lambda count: count > 1, frequencies(a)).keys())
[2, 4]

— एंड्रियास प्रोफस
Source

0

This is the way I had to do it, because I challenged myself not to use other methods:

def dupList(oldlist):
    if type(oldlist)==type((2,2)):
        oldlist=[x for x in oldlist]
    newList=[]
    newList=newList+oldlist
    oldlist=oldlist
    forbidden=[]
    checkPoint=0
    for i in range(len(oldlist)):
        #print 'start i', i
        if i in forbidden:
            continue
        else:
            for j in range(len(oldlist)):
                #print 'start j', j
                if j in forbidden:
                    continue
                else:
                    #print 'after Else'
                    if i!=j: 
                        #print 'i,j', i,j
                        #print oldlist
                        #print newList
                        if oldlist[j]==oldlist[i]:
                            #print 'oldlist[i],oldlist[j]', oldlist[i],oldlist[j]
                            forbidden.append(j)
                            #print 'forbidden', forbidden
                            del newList[j-checkPoint]
                            #print newList
                            checkPoint=checkPoint+1
    return newList

So your sample works as follows:

>>> a = [1,2,3,3,3,4,5,6,6,7]
>>> dupList(a)
[1, 2, 3, 4, 5, 6, 7]

— मैट एस
Source

3

This isn't what the OP wanted. He wanted a list of the duplicates, not a list with the duplicates removed. To make a list with the duplicates removed, I would suggest duplist = list(set(a)).

— जोंडो