
Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

Here are 21,495 public repositories matching this topic...

reshamas commented Aug 6, 2021

Describe the issue linked to the documentation

The "20 newsgroups text" dataset can be accessed within scikit-learn using defined functions. The dataset contains some text which is considered culturally insensitive.

Suggest a potential alternative/fix

Add a section called "Data Considerations" to the dataset documentation, possibly above the "Recommendation" section.
https://

superset
junlincc commented Sep 16, 2021

"As a user... I would like to target a specific value and not a range of values using a single value slider. For example, I would like to target a single value in a growth rate of 2x. By having a single value slider, this allows me to better target the growth rate."
this request is similar to [[native-filter][ux]hard to specify the min and max in slider when the range is huge](https://github.c

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
  • Updated May 13, 2021
  • Python
edoakes commented Sep 8, 2021

From a Slack message:

Hi, so I observed that if you deploy a deployment with more replicas than the available resources allow, Serve keeps trying to allocate them while waiting for the autoscaler.

(pid=125021) 2021-09-07 20:52:42,899    INFO http_state.py:75 -- Starting HTTP proxy with name 'pfaUeM:SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:192.168.1.13-0' on node 'node:192.168.1.13-0' listening on '12
pytorch-lightning
aprbw commented Sep 17, 2021

🚀 Feature

lr_find needs unique temporary checkpoint filenames.

Motivation

I'm running a number of experiments in parallel that save to the same folder, so they all have the same trainer.default_root_dir. Because the temporary checkpoints share the same directory and filename, the runs overwrite each other's files.

Pitch

The lr_find temporary checkpoint should have a unique filename for each run.
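
One way the pitch could be realised is sketched below. This is only an illustration using the standard library, not PyTorch Lightning's actual implementation: the helper name `unique_checkpoint_path` and the `lr_find_temp` prefix are hypothetical.

```python
import os
import uuid


def unique_checkpoint_path(root_dir: str, prefix: str = "lr_find_temp") -> str:
    """Hypothetical helper: build a temporary checkpoint path with a random suffix.

    uuid4 yields a random 128-bit identifier, so parallel runs that share
    root_dir are extremely unlikely to pick the same filename.
    """
    return os.path.join(root_dir, f"{prefix}_{uuid.uuid4().hex}.ckpt")
```

Each experiment would call the helper once, save its temporary state to the returned path, and delete that file when the learning-rate search finishes.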

dash
MrMino commented Sep 8, 2021

Minor, non-breaking issue found during review of #13094.

If the path of the active virtualenv is a substring of another virtualenv's path, IPython started from the second one will not emit any warning.

Example:

virtualenv aaa
virtualenv aaaa
. aaaa/bin/activate
python -m pip install ipython
. aaa/bin/activate
aaaa/bin/ipython
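
The aaa / aaaa layout above can be used to sketch how such a false negative might arise. The snippet assumes the check is a plain string-prefix test; the actual IPython code is not shown here, and both function names and the `/home/u` paths are hypothetical.

```python
from pathlib import PurePosixPath


def naive_inside(executable: str, venv: str) -> bool:
    # String-prefix test: "/home/u/aaaa/..." also starts with "/home/u/aaa".
    return executable.startswith(venv)


def component_inside(executable: str, venv: str) -> bool:
    # Compare whole path components: venv must be an actual ancestor directory.
    return PurePosixPath(venv) in PurePosixPath(executable).parents


# Interpreter from aaaa, while the active virtualenv is aaa.
exe = "/home/u/aaaa/bin/python"
active = "/home/u/aaa"
print(naive_inside(exe, active))      # True: mismatch goes unnoticed
print(component_inside(exe, active))  # False: mismatch is detected
```

Comparing path components rather than raw strings is one possible way to avoid the collision; other fixes, such as appending a path separator before the prefix test, would work as well.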

Expected behavior after executing aaaa/bin/ipython:

mwaskom commented Sep 17, 2021

Bug summary

The only way (that I am aware of) to control the linewidth of hatches is through an rc parameter. But temporarily modifying the parameter with plt.rc_context has no effect.

Code for reproduction

import matplotlib.pyplot as plt

plt.figure().subplots().bar([0, 1], [1, 2], hatch=["/", "."], fc="r")

with plt.rc_context({"hatch.linewidth": 5}):
    plt.
danieldeutsch commented Jun 2, 2021

Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file itself and reads lines for the Predictor, which fails when it tries to load data from my compressed files.
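
A line reader that accepts both plain and gzipped input could look roughly like the sketch below. It uses only the Python standard library and is not AllenNLP's API; the function name `read_lines` is hypothetical, and detecting gzip by its two magic bytes is an assumption about how the input would be recognised.

```python
import gzip
from typing import Iterator

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_lines(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Hypothetical helper: yield lines from a plain-text or gzipped file."""
    # Peek at the header instead of trusting the file extension.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")
```

A predict-style command could iterate over `read_lines(input_path)` and hand each line to the predictor, so compressed and uncompressed inputs follow the same code path.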

nni