Data Science data

1

Why is xgboost so much faster than sklearn GradientBoostingClassifier?

I'm trying to train a gradient boosting model on 50k examples with 100 numeric features. XGBClassifier fits 500 trees within 43 seconds on my machine, while GradientBoostingClassifier manages only 10 trees (!) in 1 minute and 2 seconds :( I … 500 …

29 scikit-learn xgboost gbm

1

How does the validation_split parameter of Keras' fit function work?

The validation split in the Keras Sequential model fit function is documented at https://keras.io/models/fter// as follows: validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and each …

17 keras data cross-validation
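
The documented behaviour quoted in the excerpt above can be illustrated without Keras itself. The sketch below is a plain-Python imitation, not the library's code: the helper name `split_for_validation` is hypothetical, and it relies on the Keras documentation's statement that the validation data is taken from the last samples provided, before any shuffling.

```python
def split_for_validation(samples, validation_split):
    """Imitate validation_split: hold out the LAST fraction of the samples.

    Hypothetical helper for illustration only; it is not part of Keras.
    """
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    # Index where the training part ends and the held-out part begins.
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


# Eight samples in their original order, 25% held out for validation.
train, val = split_for_validation(list(range(8)), 0.25)
print(train)  # [0, 1, 2, 3, 4, 5]
print(val)    # [6, 7]
```

Because the held-out part is simply the tail of the arrays, data that is sorted (for example by class or by date) probably needs shuffling before it is passed to fit.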

5

Make seaborn heatmap bigger

I create a corr() df out of an original df. The corr() df came out 70 x 70 and it is impossible to visualize the heatmap... sns.heatmap(df). If I try to display corr = df.corr(), the table doesn't fit the screen and all the correlations …

17 visualization pandas plotting

4

Is pandas now faster than data.table?

https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping The data.table benchmarks have not been updated since 2014. I heard that Pandas is now much faster than data.table. Is this true? Has anyone done any benchmarks? I have never used Python before, but would consider switching if pandas can …

17 python r pandas data data.table

1

How is a splitting point chosen for continuous variables in decision trees?

I have two questions related to decision trees: If we have a continuous attribute, how do we choose the splitting value? Example: Age = (20,29,50,40....) Imagine that we have a continuous attribute that takes values. How can I write an algorithm that finds the split point …

15 classification data decision-trees
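
One common way to answer the question above is to sort the attribute, try the midpoint between each pair of adjacent distinct values as a threshold, and keep the threshold with the lowest weighted Gini impurity. The sketch below is a minimal illustration of that idea, not the method of any particular library. The ages come from the excerpt; the labels `bought` and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate equal values
        threshold = (low + high) / 2  # midpoint between neighbours
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


ages = [20, 29, 50, 40]   # values from the question
bought = [0, 0, 1, 1]     # hypothetical class labels
threshold, impurity = best_split(ages, bought)
print(threshold, impurity)  # 34.5 0.0
```

This version recomputes the impurity from scratch for every candidate, which is quadratic in the number of samples. Keeping running class counts while scanning the sorted values would avoid that, at the cost of slightly longer code.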

5

Do modern R and/or Python libraries make SQL obsolete?

I work in an office where SQL Server is the backbone of everything we do, from data processing to cleaning. My colleagues specialize in writing complex functions and stored procedures to methodically process incoming data so that it can be standardized …

14 python r data-cleaning data sql

3

Is there any good out-of-the-box language model for Python?

I'm prototyping an application and I need a language model to compute perplexity on some generated sentences. Is there any trained language model in Python that I can readily use? Something simple like model = LanguageModel('en') p1 …

11 python nlp language-model

2

How much data is enough to train my machine learning model?

I have been working on machine learning and bioinformatics for a while, and today I had a conversation with a colleague about the main issues of data mining. My colleague (who is a machine learning expert) said that, in his opinion, machine learning's most important …

11 machine-learning data-mining dataset data-cleaning data

2

How to perform logistic regression with a large number of features?

I have a dataset with 330 samples and 27 features for each sample, with a binary class problem for logistic regression. According to the "rule of ten", I need at least 10 events for each feature to be included. However, I have …

10 machine-learning python predictive-modeling logistic-regression data
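
The arithmetic behind the "rule of ten" in the excerpt above can be checked in a few lines. This is a rough sketch under one assumption that the excerpt does not state: "events" is taken to mean observations in the less frequent class. Both function names are hypothetical.

```python
def events_needed(n_features, events_per_feature=10):
    """Minimum number of minority-class events the rule asks for."""
    return n_features * events_per_feature


def max_features(n_events, n_non_events, events_per_feature=10):
    """Largest number of features the rule allows for the given class counts."""
    minority = min(n_events, n_non_events)
    return minority // events_per_feature


# 27 features, as in the question.
print(events_needed(27))       # 270

# Best case for 330 samples: a perfectly balanced 165/165 split.
print(max_features(165, 165))  # 16
```

Under that reading, 330 samples cannot satisfy the rule for all 27 features however the classes are balanced, which is why questions like this one usually turn to feature selection or regularization.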

4

Interpreting a decision tree in the context of feature importances

I'm trying to understand how to fully interpret the decision process of a decision tree classification model built with sklearn. The 2 main aspects I'm looking at are a graphical representation of the tree and the list of feature importances. I … understand …

9 machine-learning visualization scikit-learn data decision-trees

Answers tagged data