The Wayback Machine - https://web.archive.org/web/20210828103122/https://github.com/topics/big-data

big-data

Here are 2,584 public repositories matching this topic...

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
  • Updated May 13, 2021
  • Python
Bluenix2 commented Aug 7, 2021

Is your feature request related to a problem? Please describe.
Many static type checkers have issues finding Cython's stubs.
Here is the output from running mypy on my current project:

error: Skipping analyzing "cython": found module but no type hints or library stubs

The same issue can be seen when using import Cython as cython:

error: Skipping analyzing "Cython": found module but 

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
  • Updated Aug 28, 2021
  • Jupyter Notebook
electrum commented Aug 5, 2021

If the --server option is used without a protocol, then it should use https when on port 443. For example, these invocations would be equivalent, with the first one having the new behavior:

trino --server example.net:443
trino --server https://example.net:443
trino --server https://example.net

This will make the CLI consistent with the JDBC driver in this regard. While it's t
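
The proposed behavior can be sketched as follows. This is a minimal illustration in Python, not Trino's actual implementation: the function name `normalize_server` is hypothetical, and falling back to `http` for any port other than 443 is an assumption, since the request above only states the port 443 case.

```python
from urllib.parse import urlsplit


def normalize_server(server: str) -> str:
    """Return the server address with an explicit scheme (hypothetical helper)."""
    # An address that already names a protocol is left untouched.
    if "://" in server:
        return server
    # Prefix "//" so urlsplit treats the value as a network location
    # and exposes the port (None when no port was given).
    port = urlsplit("//" + server).port
    # Assumption: plain http for every port other than 443.
    scheme = "https" if port == 443 else "http"
    return f"{scheme}://{server}"


print(normalize_server("example.net:443"))
print(normalize_server("https://example.net"))
```

With this sketch, `example.net:443` gains an `https://` prefix, while an address that already carries a protocol is returned unchanged.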

vespa
kkraune commented Apr 2, 2021

... to make it easier to read Vespa documentation on an e-reader / offline

Vespa documentation is generated using Jekyll from .md and .html files. Look into options for generating the artifact as part of site generation (there might be plugins we can use here).

jovanpop-msft commented Aug 18, 2021

Could we clarify that delta-log files are JSON line-delimited files in https://github.com/delta-io/delta/blob/master/PROTOCOL.md#delta-log-entries ?

In the PROTOCOL.md file, the format of the JSON is not clear. Every delta-log entry file is a "newline-delimited JSON file", but this is not specified in the file. The protocol does not explicitly specify that every action is stored as a single-lin
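
To make the distinction concrete, here is a minimal sketch of reading a newline-delimited JSON file, where each non-empty line is parsed as its own JSON document. The function name `parse_json_lines` and the sample records are illustrative assumptions, not taken from PROTOCOL.md.

```python
import json
from typing import Any, Dict, List


def parse_json_lines(text: str) -> List[Dict[str, Any]]:
    """Parse newline-delimited JSON: one JSON object per non-empty line."""
    records = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines, such as a trailing newline
            records.append(json.loads(line))
    return records


# Illustrative sample only; the keys and values are made up for this sketch.
sample = '{"first": {"id": 1}}\n{"second": {"id": 2}}\n'

records = parse_json_lines(sample)
print(len(records))
```

Passing the whole sample to `json.loads` in one call would raise `json.JSONDecodeError`, because two objects separated by a newline are not a single JSON document. That is why the line-delimited format is worth stating explicitly.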

seut commented Jun 22, 2021

Use case:

1.) A user may want to back up all tables but no metadata like users, privileges, etc. without explicitly defining each table inside the CREATE SNAPSHOT statement.

2.) A user may want to transfer users & privileges, custom analyzers or user-defined-functions from one cluster to another without backing up a complete cluster including all data (tables).

*Feature description
