The Wayback Machine - https://web.archive.org/web/20220709005742/https://github.com/topics/pyspark
#

pyspark

Here are 2,061 public repositories matching this topic...

SynapseML
brunocous commented Sep 2, 2020

I have a simple regression task (using a LightGBMRegressor) where I want to penalize negative predictions more than positive ones. Is there a way to achieve this with the default LightGBM regression objectives (see https://lightgbm.readthedocs.io/en/latest/Parameters.html)? If not, is it somehow possible to define a custom regression objective (there are many examples for the default LightGBM model) and pass it to the regressor?
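As a rough illustration of what such a custom objective could look like, here is a minimal sketch of an asymmetric squared-error loss. It assumes that "negative predictions" means predictions that fall below the target (a negative residual `pred - target`); if the intent is to penalize predictions whose value is below zero, the condition would be `pred < 0` instead. The function name and the `penalty` parameter are hypothetical, and nothing here is tied to the SynapseML API. Gradient-boosting libraries that accept custom objectives generally ask for the per-row gradient and Hessian of the loss, which is what the sketch returns. It is written in plain Python to keep the arithmetic visible; a real objective would typically operate on NumPy arrays.

```python
def asymmetric_squared_error(y_true, y_pred, penalty=3.0):
    """Per-row gradient and Hessian of L = 0.5 * w * (pred - target)**2.

    w is `penalty` when the prediction is below the target and 1.0 otherwise.
    `penalty` is a hypothetical tuning knob, not a LightGBM parameter.
    """
    grads, hessians = [], []
    for target, pred in zip(y_true, y_pred):
        residual = pred - target
        weight = penalty if residual < 0 else 1.0
        grads.append(weight * residual)  # dL/dpred
        hessians.append(weight)          # d2L/dpred2
    return grads, hessians


grads, hessians = asymmetric_squared_error([1.0, 1.0], [0.0, 2.0], penalty=3.0)
```

With the same absolute error of 1.0, the under-prediction gets a gradient three times as large as the over-prediction, so the booster is pushed harder to correct it.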

Jonathanpro commented Jan 2, 2019

Hello everyone,
Recently I tried to set up petastorm on my company's Hadoop cluster.
However, because the cluster uses Kerberos for authentication, using petastorm failed.
I figured out that petastorm relies on pyarrow, which does support Kerberos authentication.

I hacked "petastorm/petastorm/hdfs/namenode.py" line 250
and replaced it with

driver = 'libhdfs'
return pyarrow.hdfs.c
enhancement good first issue

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
  • Updated Jun 21, 2022
  • Python
kuwala
IritaSee commented Jan 3, 2022

In this PR, I wanted to solve issue #25 by creating a CSV file that lists brand and operator names and helps the brand property matching process. This PR includes:

  • Compile a list of all brand names and operator names from the name-suggestion-index as a CSV with the columns id, display_name, and wiki_data in tmp folder.
    Deadline: 05.01.2022

  • Create a PySpark UDF similar to the ones in the osm
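The CSV step in the list above could be sketched roughly as follows. This is only an illustration under assumptions: the input is taken to be an iterable of dictionaries that already carry the three column names mentioned in the PR, the function name `brands_to_csv` is hypothetical, and the sample record is a placeholder rather than real data. How the entries are read from the name-suggestion-index is not shown.

```python
import csv
import io

# Column names as listed in the PR description.
FIELDS = ["id", "display_name", "wiki_data"]


def brands_to_csv(records):
    """Serialize brand/operator records to CSV text with a header row.

    Keys outside FIELDS are ignored; missing keys become empty cells.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Placeholder record, for illustration only.
text = brands_to_csv(
    [{"id": "x-1", "display_name": "Example Brand", "wiki_data": "Q0"}]
)
```

Writing to a string buffer keeps the sketch independent of where the file ends up; the same writer could be pointed at a file opened in the tmp folder.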

enhancement good first issue pipeline/osm-poi

MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
  • Updated Oct 2, 2019
  • Python
Dee-Pac commented Apr 24, 2018

These files belong to the Gimel Discovery Service, which is still a work in progress at PayPal and not yet open sourced. In addition, the logic in these files is outdated, so it does not make sense to keep them in the repo.

https://github.com/paypal/gimel/search?l=Shell
Remove --> gimel-dataapi/gimel-core/src/main/scripts/tools/bin/hbase/hbase_ddl_creator.sh

https://github.com/paypa

good first issue
AlvaroMarquesAndrade commented Sep 17, 2020

Pivot missing categories breaks FeatureSet/AggregatedFeatureSet

Summary

When defining a feature set, it's expected that the pivot will produce all categories and, as a consequence, that the resulting Source dataframe will be suitable to be transformed. When that is not the case, FeatureSet and AggregatedFeatureSet break.
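The failure mode described in this summary can be illustrated with a small pure-Python pivot. This is a sketch, not the project's fix: the function `pivot_counts` and its parameters are hypothetical. It shows that when the pivot columns are inferred from the data, a category that never occurs produces no column, whereas passing the expected categories explicitly keeps the output schema stable. PySpark's own `pivot` on grouped data likewise accepts an explicit list of values, which is the usual way to pin the resulting columns.

```python
from collections import defaultdict


def pivot_counts(rows, key, category, expected=None, fill=0):
    """Count rows per (key, category) and spread categories into columns.

    If `expected` is given, every listed category becomes a column, with
    `fill` where no rows exist; otherwise only observed categories appear.
    """
    counts = defaultdict(lambda: defaultdict(int))
    seen = []
    for row in rows:
        counts[row[key]][row[category]] += 1
        if row[category] not in seen:
            seen.append(row[category])
    columns = list(expected) if expected is not None else seen
    return {
        k: {c: per_key.get(c, fill) for c in columns}
        for k, per_key in counts.items()
    }


rows = [{"id": 1, "kind": "a"}, {"id": 1, "kind": "a"}, {"id": 2, "kind": "a"}]
inferred = pivot_counts(rows, "id", "kind")                     # no "b" column
pinned = pivot_counts(rows, "id", "kind", expected=["a", "b"])  # "b" present
```

Any downstream step that looks up the "b" column works on `pinned` but fails on `inferred`, which mirrors the breakage reported here.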

Feature related:

Age: legacy

bug good first issue
