Python pandas: Keep selected column as DataFrame instead of Series

Question 1

When selecting a single column from a pandas DataFrame (say df.iloc[:, 0], df['A'], or df.A, etc.), the resulting vector is automatically converted to a Series instead of a single-column DataFrame. However, I am writing some functions that take a DataFrame as an input argument. I would therefore prefer to deal with a single-column DataFrame instead of a Series, so that the function can assume, say, that df.columns is accessible. Right now I have to explicitly convert the Series into a DataFrame with something like pd.DataFrame(df.iloc[:, 0]). This doesn't seem like the cleanest method. Is there a more elegant way to index a DataFrame directly so that the result is a single-column DataFrame instead of a Series?

Question 2

As @Jeff mentions, there are a few ways to do this, but I recommend using loc/iloc to be more explicit (and to raise errors early if you are trying something ambiguous):

In [10]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [11]: df
Out[11]:
   A  B
0  1  2
1  3  4

In [12]: df[['A']]

In [13]: df[[0]]  # worked positionally in older pandas; current versions raise KeyError here

In [14]: df.loc[:, ['A']]

In [15]: df.iloc[:, [0]]

Out[12-15]:  # they all return the same thing:
   A
0  1
1  3

The latter two choices remove ambiguity in the case of integer column names (which is precisely why loc/iloc were created). For example:

In [16]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

In [17]: df
Out[17]:
   A  0
0  1  2
1  3  4

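In [18]: df[[0]]  # ambiguous: position 0 or label 0? (output below is from an older pandas; current versions select the label 0)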
Out[18]:
   A
0  1
1  3
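To make the distinction concrete, here is a minimal sketch using the same mixed-label frame as In [16]. The variable names by_label and by_position are only illustrative; the point is that .loc always reads the 0 as a label and .iloc always reads it as a position.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

# .loc always interprets 0 as a column label
by_label = df.loc[:, [0]]

# .iloc always interprets 0 as a column position
by_position = df.iloc[:, [0]]

print(list(by_label.columns))     # the column labelled 0
print(list(by_position.columns))  # the first column, 'A'
```

Both results are single-column DataFrames because the selector is a list in each case.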

Question 3

As Andy Hayden recommends, using .loc/.iloc to index out a single-column DataFrame is the way to go. Another point to note is how the index labels or positions are expressed: pass them as a list to get a DataFrame back; a bare label or position returns a 'pandas.core.series.Series' instead.

Input:

    A_1 = train_data.loc[:,'Fraudster']
    print('A_1 is of type', type(A_1))
    A_2 = train_data.loc[:, ['Fraudster']]
    print('A_2 is of type', type(A_2))
    A_3 = train_data.iloc[:,12]
    print('A_3 is of type', type(A_3))
    A_4 = train_data.iloc[:,[12]]
    print('A_4 is of type', type(A_4))

Output:

    A_1 is of type <class 'pandas.core.series.Series'>
    A_2 is of type <class 'pandas.core.frame.DataFrame'>
    A_3 is of type <class 'pandas.core.series.Series'>
    A_4 is of type <class 'pandas.core.frame.DataFrame'>
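Since train_data is not shown here, the following is a self-contained sketch of the same comparison. The small frame and its 'Amount' column are hypothetical stand-ins; only the 'Fraudster' column name comes from the example above. It also looks up the column position with columns.get_loc rather than hard-coding it.

```python
import pandas as pd

# Hypothetical stand-in for train_data (assumed contents, for illustration only)
train_data = pd.DataFrame({'Amount': [10.0, 250.0, 32.5],
                           'Fraudster': [0, 1, 0]})

# Integer position of the column, so the iloc call does not depend on a magic number
pos = train_data.columns.get_loc('Fraudster')

as_series = train_data.loc[:, 'Fraudster']    # scalar label -> Series
as_frame = train_data.loc[:, ['Fraudster']]   # list of labels -> DataFrame
as_frame_pos = train_data.iloc[:, [pos]]      # list of positions -> DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
print(type(as_frame_pos).__name__, as_frame_pos.shape)
```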

Question 4

You can use df.iloc[:, 0:1]; because the column is selected with a slice rather than a single position, the result is a DataFrame and not a Series.
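A minimal sketch of the slice approach, reusing the two-column frame from the earlier answer (the variable names are only illustrative). It also includes the label-based equivalent, which works because .loc slices include both endpoints.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

first_col = df.iloc[:, 0:1]        # slice of width one -> DataFrame
first_col_lbl = df.loc[:, 'A':'A'] # label slice, endpoints inclusive -> DataFrame
as_series = df.iloc[:, 0]          # scalar position -> Series

print(type(first_col).__name__, list(first_col.columns))
print(type(as_series).__name__)
```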

Question 5

These three approaches have been mentioned:

pd.DataFrame(df.loc[:, 'A'])  # Approach of the original post
df.loc[:, ['A']]              # Approach 2 (note: use iloc for positional indexing)
df[['A']]                     # Approach 3

pd.Series.to_frame() is another approach.

Because it is a method, it can be used in situations where the second and third approaches above do not apply. In particular, it is useful when you apply some method to a column of your DataFrame and want the output as a DataFrame instead of a Series. For instance, in a Jupyter Notebook a Series is rendered as plain text, while a DataFrame is rendered as a formatted table.

# Basic use case: 
df['A'].to_frame()

# Use case 2 (this will give you pretty output in a Jupyter Notebook): 
df['A'].describe().to_frame()

# Use case 3: 
df['A'].str.strip().to_frame()

# Use case 4: 
def some_function(num): 
    ...

df['A'].apply(some_function).to_frame()
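As a runnable sketch of the use cases above: to_frame() keeps the Series name as the column label by default, and its name parameter sets a different label. The sample data and the column_names helper are hypothetical, added only to show a function that relies on .columns.

```python
import pandas as pd

df = pd.DataFrame({'A': [' x ', 'y ', ' z']})

def column_names(frame):
    # Hypothetical function that assumes its argument has .columns
    return list(frame.columns)

# Column label defaults to the Series name, 'A'
stripped = df['A'].str.strip().to_frame()

# name= overrides the column label of the resulting DataFrame
lengths = df['A'].str.len().to_frame(name='A_len')

print(column_names(stripped))
print(column_names(lengths))
```

Chaining to_frame() at the end keeps the whole expression in method-call style, which is handy when the Series only exists as an intermediate result.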