The Wayback Machine - https://web.archive.org/web/20200901001047/https://github.com/topics/scikit-learn

scikit-learn


scikit-learn is a widely used Python module for classic machine learning. It is built on top of SciPy.
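As a minimal sketch of scikit-learn's estimator workflow (construct, fit, predict), assuming scikit-learn is installed, the snippet below fits ordinary least squares to a few made-up points that follow y = 2x + 1 exactly. The data are illustrative only and do not come from this page.

```python
from sklearn.linear_model import LinearRegression

# Illustrative toy data that lie exactly on y = 2x + 1.
X = [[0.0], [1.0], [2.0], [3.0]]  # 2-D input: one row per sample, one column per feature
y = [1.0, 3.0, 5.0, 7.0]

# Typical estimator pattern: construct, fit, then predict.
model = LinearRegression()
model.fit(X, y)

prediction = model.predict([[4.0]])
print(model.coef_, model.intercept_, prediction)
```

Other estimators, such as classifiers and clustering models, follow the same fit/predict pattern. That shared interface is what lets them be swapped into the same code.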

Here are 3,598 public repositories matching this topic...

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
  • Updated Jul 24, 2020
  • Python
eugeneh101 commented Apr 24, 2020

If you join a Dask DataFrame on a categorical column, the resulting Dask DataFrame column still has category dtype. However, as soon as you call .compute() on the result, the column has the wrong dtype: it is no longer categorical.

Tested on Dask 2.14.0 and Pandas 1.0.3
In this example the categories look like floats, so after .compute() the dtype is float.

import dask.d
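The dtype check behind this report can be shown without Dask. Below is a pandas-only sketch. It assumes the expected categorical dtype is known in advance. The categories, the toy frame and the restore_categorical helper are hypothetical and are not part of Dask's API. The sketch detects a column that came back as plain float and casts it back to the expected categorical dtype.

```python
import pandas as pd

# Hypothetical expected dtype: categorical with float-looking categories.
expected = pd.CategoricalDtype(categories=[1.5, 2.5, 3.5])

# Hypothetical computed result in which the join key came back as float64.
computed = pd.DataFrame({"key": [1.5, 2.5, 1.5], "value": [10, 20, 30]})


def restore_categorical(df, column, dtype):
    """Return df with `column` cast to `dtype` if it is no longer categorical."""
    if not isinstance(df[column].dtype, pd.CategoricalDtype):
        # assign() returns a new frame, so the input frame is left unchanged.
        df = df.assign(**{column: df[column].astype(dtype)})
    return df


fixed = restore_categorical(computed, "key", expected)
print(computed["key"].dtype, "->", fixed["key"].dtype)
```

This only patches the symptom after the fact. It does not explain why the computed dtype differs from the declared one.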
CJStadler commented Jul 23, 2019

For example, suppose there is a relationship transactions.session_id -> sessions.id and we are calculating the feature transactions: sessions.SUM(transactions.value). Any transaction rows with no corresponding session should then be given the default value of 0 instead of NaN.

Of course, this should not normally occur, but when it does, it seems more reasonable to use the default_value.

`DirectF
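The requested behavior can be modeled without Featuretools itself. The sketch below is plain Python. The toy tables and the helper names session_sums and direct_feature are hypothetical, and none of it is the Featuretools API. It computes SUM(transactions.value) per session and then copies each parent total back onto its transactions. An orphan transaction gets default_value rather than NaN.

```python
import math

# Hypothetical toy tables: each transaction points at a session via session_id.
transactions = [
    {"id": 1, "session_id": 10, "value": 5.0},
    {"id": 2, "session_id": 10, "value": 7.0},
    {"id": 3, "session_id": 11, "value": 2.0},
    {"id": 4, "session_id": 99, "value": 4.0},  # orphan: session 99 does not exist
]
sessions = [{"id": 10}, {"id": 11}]


def session_sums(transactions, sessions):
    """Aggregation: SUM(transactions.value) for every existing session."""
    totals = {s["id"]: 0.0 for s in sessions}
    for t in transactions:
        if t["session_id"] in totals:
            totals[t["session_id"]] += t["value"]
    return totals


def direct_feature(transactions, parent_values, default_value=math.nan):
    """Copy each transaction's parent-session value onto the transaction.

    Rows whose session_id has no parent row receive default_value.
    """
    return [parent_values.get(t["session_id"], default_value) for t in transactions]


totals = session_sums(transactions, sessions)
with_nan = direct_feature(transactions, totals)                        # orphan gets NaN
with_default = direct_feature(transactions, totals, default_value=0.0)  # orphan gets 0.0
print(with_nan)
print(with_default)
```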

A comprehensive list of Deep Learning / Artificial Intelligence and Machine Learning tutorials, rapidly expanding into areas of AI/Deep Learning / Machine Vision / NLP and industry-specific areas such as Climate / Energy, Automotive, Retail, Pharma, Medicine, Healthcare, Policy, Ethics and more.
  • Updated Aug 31, 2020
  • Python

Created by David Cournapeau

Released January 05, 2010

Latest release 28 days ago

Repository
scikit-learn/scikit-learn
Website
scikit-learn.org
Wikipedia

Related Topics

python, scikit