Shuffle two lists at once with the same order

Question 1

I am using the movie_reviews corpus from the nltk library, which contains a large number of documents. My task is to get the predictive performance of these reviews with and without pre-processing of the data. The problem is that the lists documents and documents2 contain the same documents, and I need to shuffle them so that both lists keep the same order. I cannot shuffle them separately, because every time I shuffle a list I get a different result. So I need to shuffle both at once with the same order, because I need to compare them at the end (the comparison depends on the order). I am using Python 2.7.

Example (in reality the strings are tokenized, but that is not relevant here):

documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

And I need to get this result after shuffling both lists:

documents = [(['one of the guys dies'], 'neg'),
             (['they get into an accident . '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['plot : two teen couples go to a church party , '], 'neg')]

documents2 = [(['one guys dies'], 'neg'),
              (['they get accident . '], 'neg'),
              (['drink then drive . '], 'pos'),
              (['plot two teen couples church party'], 'neg')]

I have this code:

import random

import nltk
from nltk.corpus import movie_reviews, stopwords

def cleanDoc(doc):
    stopset = set(stopwords.words('english'))
    stemmer = nltk.PorterStemmer()
    clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
    final = [stemmer.stem(word) for word in clean]
    return final

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

documents2 = [(list(cleanDoc(movie_reviews.words(fileid))), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

# TODO: shuffle documents and documents2 here, using the same order for both

Question 2

You can do it like this:

import random

a = ['a', 'b', 'c']
b = [1, 2, 3]

c = list(zip(a, b))

random.shuffle(c)

a, b = zip(*c)

print a
print b

[OUTPUT]
('a', 'c', 'b')
(1, 3, 2)

Of course, this was an example with simple lists, but the adaptation for your case is the same. Note that a and b come back as tuples, because zip(*c) yields tuples.
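As a minimal sketch of that adaptation, the same zip-and-shuffle idea can be applied to the two document lists from the question. The short lists below are hypothetical stand-ins for the real corpus data, and the conversion back to lists is an assumption about what the later code expects.

```python
import random

# Hypothetical stand-ins for the question's `documents` and `documents2`.
documents = [(['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]
documents2 = [(['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

# Pair each raw document with its cleaned counterpart.
paired = list(zip(documents, documents2))

# A single shuffle moves both members of every pair together.
random.shuffle(paired)

# zip(*paired) yields tuples, so convert them back to lists.
documents, documents2 = (list(t) for t in zip(*paired))
```

After this, documents[i] and documents2[i] still refer to the same review for every i.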

Hope this helps. Good luck.

Question 3

I found an easy way to do this:

import numpy as np
a = np.array([0,1,2,3,4])
b = np.array([5,6,7,8,9])

indices = np.arange(a.shape[0])
np.random.shuffle(indices)

a = a[indices]
b = b[indices]
# a, array([3, 4, 1, 2, 0])
# b, array([8, 9, 6, 7, 5])
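The same index-permutation idea can be sketched without NumPy for plain Python lists. This is only an illustration under the assumption that both lists have the same length; the names a_shuffled and b_shuffled are made up for the example.

```python
import random

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

# Draw one random permutation of the positions 0..len(a)-1.
indices = list(range(len(a)))
random.shuffle(indices)

# Apply the same permutation to both lists.
a_shuffled = [a[i] for i in indices]
b_shuffled = [b[i] for i in indices]
```

Because the originals are left untouched, the permutation in indices can be reused later on any other list of the same length.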

Question 4

import numpy as np
from sklearn.utils import shuffle

a = ['a', 'b', 'c','d','e']
b = [1, 2, 3, 4, 5]

a_shuffled, b_shuffled = shuffle(np.array(a), np.array(b))
print(a_shuffled, b_shuffled)

#random output
#['e' 'c' 'b' 'd' 'a'] [5 3 2 4 1]

Question 5

Shuffle an arbitrary number of lists simultaneously.

from random import shuffle

def shuffle_list(*ls):
    l = list(zip(*ls))
    shuffle(l)
    return zip(*l)

a = [0,1,2,3,4]
b = [5,6,7,8,9]

a1,b1 = shuffle_list(a,b)
print(a1,b1)

a = [0,1,2,3,4]
b = [5,6,7,8,9]
c = [10,11,12,13,14]
a1,b1,c1 = shuffle_list(a,b,c)
print(a1,b1,c1)

Output:

$ (0, 2, 4, 3, 1) (5, 7, 9, 8, 6)
$ (4, 3, 0, 2, 1) (9, 8, 5, 7, 6) (14, 13, 10, 12, 11)

Note:
The objects returned by shuffle_list() are tuples.

P.S. shuffle_list() can also be applied to numpy.array()

import numpy as np

a = np.array([1,2,3])
b = np.array([4,5,6])

a1,b1 = shuffle_list(a,b)
print(a1,b1)

Output:

$ (3, 1, 2) (6, 4, 5)
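Since shuffle_list() hands back tuples, a caller that needs mutable lists has to convert the results. One possible sketch is a variant that does the conversion itself; the name shuffle_lists is hypothetical and not part of the answer above.

```python
from random import shuffle

def shuffle_lists(*ls):
    """Hypothetical variant of shuffle_list() that returns lists, not tuples."""
    rows = list(zip(*ls))  # one row per position, holding one item from each list
    shuffle(rows)          # shuffle the rows so items at the same position stay together
    return [list(column) for column in zip(*rows)]

a = [0, 1, 2, 3, 4]
b = [5, 6, 7, 8, 9]
a1, b1 = shuffle_lists(a, b)
```

Keep in mind that zip() stops at the shortest input, so lists of unequal length would be silently truncated by either version.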

Question 6

An easy and fast way to do this is to use random.seed() together with random.shuffle(). It lets you generate the same random order as many times as you want. It looks like this:

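import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
seed = random.random()  # pick a seed once and reuse it
random.seed(seed)
random.shuffle(a)
random.seed(seed)       # reset to the same seed so b gets the same permutation
random.shuffle(b)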
print(a)
print(b)

>>[3, 1, 4, 2, 5]
>>[8, 6, 9, 7, 10]

This also works when you cannot work with both lists at the same time because of memory problems. It does require both lists to have the same length.
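A possible variation on the seeding approach, sketched here as an assumption rather than as part of the answer, is to use separate random.Random instances created with the same seed. That keeps the module-level generator untouched, which may matter if other code relies on it. The seed value is arbitrary.

```python
import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]

seed = 12345  # arbitrary value chosen for illustration

# Two generators built from the same seed produce the same sequence of
# random numbers, so equal-length lists receive the same permutation.
random.Random(seed).shuffle(a)
random.Random(seed).shuffle(b)
```

The two shuffle calls do not have to run at the same time, so each list can be loaded, shuffled, and stored separately as long as the seed is kept.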

Question 7

You can use the second argument of the shuffle function to fix the order of the shuffle.

Specifically, you can pass a zero-argument function that returns a value in [0, 1) as the second argument to the shuffle function. The return value of this function fixes the order of the shuffle. (By default, i.e. if you do not pass any function as the second argument, it uses the function random.random(). You can see this here, at line 277.)

This example shows what I described:

import random

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

r = random.random()            # randomly generating a real in [0,1)
random.shuffle(a, lambda : r)  # lambda : r is a zero-argument function which returns r
random.shuffle(b, lambda : r)  # using the same function as in the previous line so that the shuffling order is the same

print a
print b

Output:

['e', 'c', 'd', 'a', 'b']
[5, 3, 4, 1, 2]
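The second argument of random.shuffle() was deprecated in Python 3.9 and removed in Python 3.11, so the example above only runs on older interpreters such as the Python 2.7 used in the question. As a rough sketch of the same "share the random choices" idea on newer versions, a hand-written Fisher-Yates loop can apply every swap to both lists in place. The helper name shuffle_in_unison and its rng parameter are hypothetical.

```python
import random

def shuffle_in_unison(first, second, rng=random):
    """Hypothetical helper: Fisher-Yates shuffle that applies each swap to both lists."""
    if len(first) != len(second):
        raise ValueError("both lists must have the same length")
    for i in range(len(first) - 1, 0, -1):
        j = rng.randrange(i + 1)  # random position in 0..i
        first[i], first[j] = first[j], first[i]
        second[i], second[j] = second[j], second[i]

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]
shuffle_in_unison(a, b)
```

Unlike a constant lambda, this draws a fresh random index at every step, so every permutation remains possible while the two lists stay aligned.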