pyspark
Here are 2,061 public repositories matching this topic...
Hello everyone,
Recently I tried to set up petastorm on my company's Hadoop cluster.
However, because the cluster uses Kerberos for authentication, petastorm failed.
I figured out that petastorm relies on pyarrow, which does support Kerberos authentication.
I hacked line 250 of "petastorm/petastorm/hdfs/namenode.py"
and replaced it with
driver = 'libhdfs'
return pyarrow.hdfs.c
If they are not class methods, the method is invoked for every test and a session is created for each of those tests.
class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp
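The point about class-level setup can be illustrated without Spark at all. The sketch below is only an illustration: `FakeSession` is a hypothetical stand-in (not a PySpark class), and the test classes are made up for the example. It counts how often the stand-in is constructed when it is created in `setUpClass` versus `setUp`; with a real session, the constructor call would presumably be a `SparkSession.builder` chain ending in `getOrCreate()`.

```python
import io
import unittest


class FakeSession:
    """Hypothetical stand-in for an expensive session object (not a PySpark class)."""

    created = 0  # how many instances have been constructed so far

    def __init__(self):
        FakeSession.created += 1


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, so every test shares one session.
        cls.session = FakeSession()

    def test_one(self):
        self.assertIsNotNone(self.session)

    def test_two(self):
        self.assertIsNotNone(self.session)


class PerTestSessionTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test builds its own session.
        self.session = FakeSession()

    def test_one(self):
        self.assertIsNotNone(self.session)

    def test_two(self):
        self.assertIsNotNone(self.session)


def count_sessions(test_case):
    """Run one TestCase class and return how many sessions it constructed."""
    FakeSession.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    # Send the runner's report to an in-memory buffer to keep output quiet.
    unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return FakeSession.created


shared_count = count_sessions(SharedSessionTest)
per_test_count = count_sessions(PerTestSessionTest)
```

Here `shared_count` reflects one construction for the whole class, while `per_test_count` grows with the number of test methods.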
In this PR, I wanted to solve issue #25 by creating a CSV file that lists brand names and also helps the brand property matching process. This PR includes:
- Compile a list of all brand names and operator names from the name-suggestion-index as a CSV with the columns id, display_name, and wiki_data in the tmp folder. Deadline: 05.01.2022
- Create a PySpark UDF similar to the ones in the osm
These files belong to the Gimel Discovery Service, which is still a work in progress at PayPal and not yet open sourced. In addition, the logic in these files is outdated, so it does not make sense to keep them in the repo.
https://github.com/paypal/gimel/search?l=Shell
Remove --> gimel-dataapi/gimel-core/src/main/scripts/tools/bin/hbase/hbase_ddl_creator.sh
Pivot missing categories breaks FeatureSet/AggregatedFeatureSet
Summary
When defining a feature set, it's expected that pivot will have all categories and, as a consequence, that the resulting Source dataframe will be suitable to be transformed. When this is not the case, FeatureSet and AggregatedFeatureSet break.
Feature related:
Age: legacy
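The failure mode described in the summary can be shown in miniature with plain Python. This is only an illustration, not the library's or PySpark's code: `pivot_rows` and the sample rows are hypothetical. A pivot that infers its columns from the data drops any category that never occurs, whereas passing the expected categories keeps the output columns stable. PySpark's `GroupedData.pivot` similarly accepts an optional explicit list of values.

```python
def pivot_rows(rows, key, category, value, categories=None):
    """Pivot (key, category, value) dict rows into one dict per key.

    If `categories` is None, columns are inferred from the data, so a
    category with no rows produces no column at all.
    """
    if categories is None:
        categories = sorted({row[category] for row in rows})
    result = {}
    for row in rows:
        # Start every key with all columns present and empty.
        out = result.setdefault(row[key], {c: None for c in categories})
        if row[category] in out:
            out[row[category]] = row[value]
    return result


# Hypothetical sample: category "c" is expected downstream but never occurs.
rows = [
    {"id": 1, "kind": "a", "amount": 10},
    {"id": 1, "kind": "b", "amount": 20},
    {"id": 2, "kind": "a", "amount": 30},
]

inferred = pivot_rows(rows, "id", "kind", "amount")
explicit = pivot_rows(rows, "id", "kind", "amount", categories=["a", "b", "c"])
```

With inferred columns, a later step that reads column "c" has nothing to read; with explicit categories the column exists and is simply empty.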
I have a simple regression task (using a LightGBMRegressor) where I want to penalize negative predictions more than positive ones. Is there a way to achieve this with the default regression LightGBM objectives (see https://lightgbm.readthedocs.io/en/latest/Parameters.html)? If not, is it somehow possible to define and pass a custom regression objective (there are many examples for the default LightGBM model)?
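As a rough sketch of what such a custom objective could look like, the code below computes the gradient and Hessian of a weighted squared error. It rests on assumptions: "negative predictions" is read as predicted values below zero, and the names `make_asymmetric_objective` and `negative_weight` are hypothetical. The `(y_true, y_pred) -> (grad, hess)` shape is the one the Python `lightgbm` package's scikit-learn wrapper accepts as a callable `objective`; whether the Spark `LightGBMRegressor` accepts a custom objective is not confirmed here.

```python
import numpy as np


def make_asymmetric_objective(negative_weight=5.0):
    """Build an objective for the loss w * (y_pred - y_true)**2.

    w is `negative_weight` where the prediction is below zero and 1 otherwise
    (an assumed reading of "negative predictions"); w is treated as constant
    on each side of zero when differentiating.
    """

    def objective(y_true, y_pred):
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        weight = np.where(y_pred < 0, negative_weight, 1.0)
        grad = 2.0 * weight * (y_pred - y_true)  # first derivative w.r.t. y_pred
        hess = 2.0 * weight                      # second derivative w.r.t. y_pred
        return grad, hess

    return objective


objective = make_asymmetric_objective(negative_weight=5.0)
grad, hess = objective([1.0, 1.0], [-1.0, 3.0])
```

The factory keeps the returned callable at exactly two positional parameters, so the penalty factor cannot be mistaken for an extra argument such as sample weights.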