pyspark

I have a simple regression task (using a LightGBMRegressor) where I want to penalize negative predictions more than positive ones. Is there a way to achieve this with the default regression LightGBM objectives (see https://lightgbm.readthedocs.io/en/latest/Parameters.html)? If not, is it somehow possible to define and pass a custom regression objective (there are many examples for the default LightGBM model)?
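
As a rough illustration of what such a custom objective could look like, here is a minimal NumPy sketch. It assumes that "penalize negative predictions" means giving a larger weight to samples whose prediction is below zero; the function name `asymmetric_squared_error` and the `negative_penalty` parameter are hypothetical and do not come from LightGBM or SynapseML. The function returns a per-sample gradient and Hessian, which is the shape of result that gradient boosting objectives are generally expected to provide.

```python
import numpy as np


def asymmetric_squared_error(y_true, y_pred, negative_penalty=10.0):
    """Hypothetical objective: squared error weighted more heavily
    wherever the prediction is below zero.

    Returns the gradient and Hessian of 0.5 * w * (y_pred - y_true) ** 2
    with respect to y_pred, treating the weight w as piecewise constant.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumption: only the sign of the prediction decides the weight.
    weight = np.where(y_pred < 0.0, negative_penalty, 1.0)
    grad = weight * (y_pred - y_true)
    hess = weight
    return grad, hess


# Two samples with the same target: one positive, one negative prediction.
grad, hess = asymmetric_squared_error([1.0, 1.0], [2.0, -1.0])
```

The standalone `lightgbm` Python package accepts a callable of the form `(y_true, y_pred) -> (grad, hess)` as the `objective` of its scikit-learn estimators. Whether the Spark `LightGBMRegressor` accepts something equivalent is the open question here, so treat this only as a sketch of the math.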

Hello everyone,
Recently I tried to set up petastorm on my company's Hadoop cluster.
However, because the cluster uses Kerberos for authentication, petastorm failed.
I figured out that petastorm relies on pyarrow, which actually supports Kerberos authentication.

I hacked "petastorm/petastorm/hdfs/namenode.py" line 250
and replaced it with

driver = 'libhdfs'
return pyarrow.hdfs.c

If they are not class methods, the method is invoked for every test and a session is created for each of those tests.

class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp
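
The point about class methods can be checked without Spark. The sketch below is only an illustration: `FakeSession` is a hypothetical stand-in for an expensive session object, and the helper `count_sessions` is likewise invented for this example. It compares creating the resource in `setUpClass` (once per test class) with creating it in `setUp` (once per test).

```python
import io
import unittest


class FakeSession:
    """Hypothetical stand-in for an expensive resource such as a Spark session."""
    created = 0

    def __init__(self):
        FakeSession.created += 1


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, so both tests share one session.
        cls.session = FakeSession()

    def test_first(self):
        self.assertIsNotNone(self.session)

    def test_second(self):
        self.assertIsNotNone(self.session)


class PerTestSessionTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test, so each test gets its own session.
        self.session = FakeSession()

    def test_first(self):
        self.assertIsNotNone(self.session)

    def test_second(self):
        self.assertIsNotNone(self.session)


def count_sessions(test_case):
    """Run one test class quietly and report how many sessions it created."""
    FakeSession.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return FakeSession.created


shared_count = count_sessions(SharedSessionTest)
per_test_count = count_sessions(PerTestSessionTest)
```

With two tests per class, the shared variant creates one session and the per-test variant creates two, which is the overhead the comment above warns about.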

https://linkedin.github.io/feathr/how-to-guides/feathr-configuration-and-env.html

It looks good in GitHub:

In this PR, I wanted to solve issue #25 by creating a CSV file that lists brands and helps the brand property matching process. This PR includes:

Compile a list of all brand names and operator names from the name-suggestion-index as a CSV with the columns id, display_name, and wiki_data in the tmp folder.
Deadline: 05.01.2022
Create a PySpark UDF similar to the ones in the osm

These files belong to the Gimel Discovery Service, which is still a work in progress at PayPal and not yet open sourced. In addition, the logic in these files is outdated, so it does not make sense to keep them in the repo.

https://github.com/paypal/gimel/search?l=Shell
Remove --> gimel-dataapi/gimel-core/src/main/scripts/tools/bin/hbase/hbase_ddl_creator.sh

https://github.com/paypa

Pivot missing categories breaks FeatureSet/AggregatedFeatureSet

Summary

When defining a feature set, it's expected that pivot will have all categories and that, as a consequence, the resulting Source dataframe will be suitable to be transformed. When that is not the case, FeatureSet and AggregatedFeatureSet break.

Feature related:

Age: legacy

pyspark

Here are 2,061 public repositories matching this topic...

microsoft / SynapseML

JohnSnowLabs / spark-nlp

apache / incubator-linkis

ibis-project / ibis

uber / petastorm

jadianes / spark-py-notebooks

awesome-spark / awesome-spark

hi-primus / optimus

jupyter-incubator / sparkmagic

mahmoudparsian / data-algorithms-book

AlexIoannides / pyspark-example-project

linkedin / feathr

WeBankFinTech / Scriptis

HariSekhon / DevOps-Python-tools

kuwala-io / kuwala

ankurchavda / SparkLearning

ericxiao251 / spark-syntax

lyhue1991 / eat_pyspark_in_10_days

ekampf / PySpark-Boilerplate

huseinzol05 / Gather-Deployment

awesome-spark / spark-gotchas

MrPowers / quinn

CamDavidsonPilon / tdigest

Morphl-AI / MorphL-Community-Edition

XD-DENG / Spark-practice

cluster-apps-on-docker / spark-standalone-cluster-on-docker

paypal / gimel

commoncrawl / cc-pyspark

tirthajyoti / Spark-with-Python

quintoandar / butterfree
