data-engineering

The Mixed Time-Series chart type allows configuring titles for both the primary and the secondary y-axis.
However, while the primary axis title is shown next to the axis, the secondary axis title is placed at the upper end of the axis, where it gets hidden by bar values and zoom controls.

How to reproduce the bug

Create a mixed time-series chart
Configure axi

Hello!

I've found an issue here: [Bitbucket Storage](https://docs.prefect.io/api/latest/storage.html#github) is a storage option that uploads flows to a Bitbucket repository as .py files.

Page reference: https://docs.prefect.io/orchestration/flow_config/storage.html#bitbucket

First, the link is incorrect: it points to the #github anchor. Second, should the line read more like the GitHub entry, which references the repository?

The datatype -> name calculation in the front end (FE) doesn't currently work. Effectively, only the type is taken into account, not the format or airbyte_type. Our code to calculate the types (useTranslateDataType) would calculate it correctly, but we're actually never passing in the format a
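A minimal sketch of a datatype-to-name translation that uses all three fields, assuming airbyte_type takes precedence over format, which takes precedence over the plain JSON type. The function translate_data_type and the display names are hypothetical; this is not Airbyte's actual useTranslateDataType logic.

```python
from typing import Optional

# Hypothetical display names; the real UI labels may differ.
AIRBYTE_TYPE_NAMES = {
    "timestamp_with_timezone": "Timestamp with timezone",
    "timestamp_without_timezone": "Timestamp without timezone",
}
FORMAT_NAMES = {
    "date": "Date",
    "date-time": "Timestamp",
}
TYPE_NAMES = {
    "string": "String",
    "number": "Number",
    "integer": "Integer",
    "boolean": "Boolean",
    "object": "Object",
    "array": "Array",
}


def translate_data_type(
    json_type: str,
    fmt: Optional[str] = None,
    airbyte_type: Optional[str] = None,
) -> str:
    """Return a display name, preferring the most specific field that is set."""
    if airbyte_type in AIRBYTE_TYPE_NAMES:
        return AIRBYTE_TYPE_NAMES[airbyte_type]
    if fmt in FORMAT_NAMES:
        return FORMAT_NAMES[fmt]
    return TYPE_NAMES.get(json_type, "Unknown")
```

If format is never passed in, every string column falls through to "String", which matches the symptom described above.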

Describe the bug
Data Docs columns shrink to a 1-character width with a long query

To Reproduce
Steps to reproduce the behavior:

1. Make a batch from a long query string
2. Run validation
3. Render the result to Data Docs
4. See screenshot
(Screenshot: Data documentation compiled by Great Expectations)

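This should raise an error, because default_executor_def is passed as a parameter of the decorated function instead of to the @repository decorator: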

@repository
def repo(default_executor_def=in_process_executor):
    ...
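One way such misuse could be caught, sketched with a hypothetical repository_sketch decorator rather than Dagster's real @repository: a repository function is called with no arguments, so the decorator can inspect the function's signature and raise if it declares any parameters, pointing the user at decorator options instead.

```python
import inspect
from typing import Any, Callable, Optional


def repository_sketch(fn: Optional[Callable[..., Any]] = None, **decorator_kwargs: Any):
    """Hypothetical @repository-style decorator, usable bare or with keyword options."""

    def wrap(func: Callable[..., Any]) -> Callable[..., Any]:
        params = inspect.signature(func).parameters
        if params:
            # Options such as default_executor_def belong on the decorator.
            raise TypeError(
                f"repository function '{func.__name__}' must take no parameters, "
                f"got {list(params)}; pass options to the decorator instead"
            )
        func.repository_options = decorator_kwargs  # record decorator options
        return func

    if fn is not None:  # used bare: @repository_sketch
        return wrap(fn)
    return wrap  # used with options: @repository_sketch(...)
```

With this check, the snippet above fails at decoration time instead of silently ignoring the executor.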

Is your feature request related to a problem? Please describe.

Feast is often hard to install alongside other Python packages that use google-cloud-core. Specifically, Feast sets an upper bound of 2.0.0 on this library, but the latest version is 2.3.1, and many Python packages require 2.0.0 or later.
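Why the two constraints cannot be satisfied together can be checked with a small sketch: a version must be at least 2.0.0 (other packages) and below 2.0.0 (Feast's bound, assuming it is exclusive), and no version is both. The helpers below are hypothetical and only handle plain X.Y.Z versions; real tooling would use a full PEP 440 parser.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain X.Y.Z release such as '2.3.1' into (2, 3, 1); pre-releases are not handled."""
    return tuple(int(part) for part in text.split("."))


def satisfying(versions: List[str], lower_incl: str, upper_excl: str) -> List[str]:
    """Return the versions v with lower_incl <= v < upper_excl."""
    lo, hi = parse_version(lower_incl), parse_version(upper_excl)
    return [v for v in versions if lo <= parse_version(v) < hi]


# Other packages need >= 2.0.0, Feast allows < 2.0.0: the intersection is empty.
print(satisfying(["2.0.0", "2.3.1"], lower_incl="2.0.0", upper_excl="2.0.0"))  # []
```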

Describe the solution you'd like

Remove google-cloud-core fr

If you add multiple projects in GrowthBook, the dropdown list in the left navigation is sorted by creation date. We should instead sort them alphabetically. The dropdown is rendered in packages/front-end/components/Layout/ProjectSelector.tsx
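The requested ordering, sketched in Python for illustration even though the real component is TypeScript: sort case-insensitively by project name, using the creation date only as a tie-breaker. The field names name and dateCreated are assumptions about the project objects.

```python
from datetime import datetime
from typing import Any, Dict, List


def sort_projects(projects: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Alphabetical by name (case-insensitive), then by creation date."""
    # casefold() makes "beta" sort next to "Beta" rather than after all capitals.
    return sorted(projects, key=lambda p: (p["name"].casefold(), p["dateCreated"]))


projects = [
    {"name": "zeta", "dateCreated": datetime(2021, 1, 1)},
    {"name": "Alpha", "dateCreated": datetime(2022, 1, 1)},
    {"name": "beta", "dateCreated": datetime(2020, 1, 1)},
]
print([p["name"] for p in sort_projects(projects)])  # ['Alpha', 'beta', 'zeta']
```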

Errors should not be wrapped with key information. Instead, add logging to the module.
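A sketch of the pattern in Python, assuming the module currently catches an error and re-raises it with the key embedded in a new message. The function load_value and its dict-backed store are hypothetical: the key goes to a module-level logger, and the original exception propagates unchanged.

```python
import logging
from typing import Dict

logger = logging.getLogger(__name__)  # module-level logger


def load_value(store: Dict[str, str], key: str) -> str:
    try:
        return store[key]
    except KeyError:
        # Log the key instead of wrapping it into a new error message.
        logger.error("failed to load value for key %r", key)
        raise  # re-raise the original exception as-is
```

Callers can still match on the original exception type, while the key remains visible in the logs.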

A user asked me whether their pipeline.yaml looked right, and I noticed it contained something like this:

tasks:
   # ... a bunch of tasks
   - source: fit.py
     product:
       nb: path/to/report.html
       data: path/to/data.csv
       data1: path/to/data.csv
       data2: path/to/data.csv

The user was confused and thought that the naming should be name, name1
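In the snippet above, data, data1 and data2 all point to path/to/data.csv, so those products would write to the same file. A hypothetical lint helper (not part of Ploomber) could flag product keys that share a path; it takes a task's product mapping as an already-parsed dict, so loading the YAML is left out.

```python
from collections import defaultdict
from typing import Dict, List


def shared_product_paths(product: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each path used by more than one product key to those keys."""
    keys_by_path: Dict[str, List[str]] = defaultdict(list)
    for key, path in product.items():
        keys_by_path[path].append(key)
    return {path: keys for path, keys in keys_by_path.items() if len(keys) > 1}


product = {
    "nb": "path/to/report.html",
    "data": "path/to/data.csv",
    "data1": "path/to/data.csv",
    "data2": "path/to/data.csv",
}
print(shared_product_paths(product))  # {'path/to/data.csv': ['data', 'data1', 'data2']}
```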

Description

Add a new data source plugin for Bitbucket so DevLake can collect dev data from it.

Pre-requisites

Understand how a DevLake plugin works.
Make sure you have access to the Bitbucket API.

Describe the solution you'd like

Add a plugin for Bitbucket. Please refer to other data source plug
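Bitbucket Cloud's REST API 2.0 returns list endpoints as pages, each a JSON object with a values array and, when more pages exist, a next URL. A collector would follow those links until next is absent. A minimal sketch in Python (not DevLake's plugin API), with the HTTP call injected as a hypothetical fetch_json function so no real endpoint or credentials are assumed:

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_paginated(fetch_json: Callable[[str], Page], first_url: str) -> Iterator[Any]:
    """Yield every item across pages, following each page's 'next' link."""
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("values", [])
        url = page.get("next")  # absent on the last page
```

Injecting fetch_json keeps authentication, rate limiting and retries out of the pagination logic, which also makes it easy to test with canned pages.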

In some cases, you might want to checksum everything; instead of typing out all the columns yourself, we should be able to get them from the schema.

--all-mutual-tables could be a nice addition at some point too...
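A sketch of taking the column list from the schema instead of typing it out, using SQLite's PRAGMA table_info as a stand-in for whatever catalog query each database needs. The helpers table_columns and table_checksum are hypothetical, and the MD5-over-sorted-rows checksum is a simplification, not how data-diff computes its checksums.

```python
import hashlib
import sqlite3
from typing import List


def table_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def table_checksum(conn: sqlite3.Connection, table: str) -> str:
    """MD5 over all rows of all columns, ordered so row order does not matter."""
    cols = ", ".join(f'"{c}"' for c in table_columns(conn, table))
    digest = hashlib.md5()
    # Table names are interpolated directly, so they must come from a trusted source.
    for row in conn.execute(f"SELECT {cols} FROM {table} ORDER BY {cols}"):
        digest.update(repr(row).encode())
    return digest.hexdigest()
```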

Remove string types from metric computation (keep only numeric and date types).
Add date and timestamp types to the min/max metric types.
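The two changes above amount to filtering columns by type before computing metrics. A sketch with hypothetical type groupings (a real profiler would use its own type enum):

```python
from typing import Dict, FrozenSet, List

# Hypothetical type groupings.
NUMERIC_TYPES: FrozenSet[str] = frozenset({"INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL"})
DATE_TYPES: FrozenSet[str] = frozenset({"DATE", "TIMESTAMP"})

# String types are deliberately absent; DATE and TIMESTAMP are now eligible for min/max.
METRIC_TYPES: FrozenSet[str] = NUMERIC_TYPES | DATE_TYPES


def metric_columns(columns: Dict[str, str]) -> List[str]:
    """Return the names of columns whose type is eligible for metrics such as min/max."""
    return [name for name, dtype in columns.items() if dtype.upper() in METRIC_TYPES]
```

Dates and timestamps have a natural ordering, so min/max are meaningful for them in the same way as for numbers.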

(1) Add docstrings to methods
(2) Convert .format() calls to f-strings for readability
(3) Make sure we are using Python 3.8 throughout
(4) The zip extract_all() call in ingest_flights.py can be simplified with a Path parameter
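Items (1), (2) and (4) sketched together. zipfile.ZipFile and ZipFile.extractall() accept path-like objects, so a pathlib.Path can be passed straight through. The signature of extract_all here is an assumption, not the code in ingest_flights.py.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_all(zip_path: Path, dest: Path) -> List[Path]:
    """Extract every member of zip_path into dest and return the extracted paths."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(path=dest)  # creates dest if it does not exist
    extracted = [dest / name for name in names]
    # f-string instead of "{} files extracted to {}".format(...)
    print(f"{len(extracted)} files extracted to {dest}")
    return extracted
```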

Hi,

I am using some basic pyjanitor functions, such as clean_names() and collapse_levels(), in code that I want to productionise.
However, there are limitations on the size of the production code base.
Currently, the requirements.txt for pyjanitor alone is huge.
I don't think my code needs all of these dependencies.
How can I remove the unnecessary ones?
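A first step is seeing what an installed package declares as its direct, non-optional dependencies. A sketch using the standard library's importlib.metadata (Python 3.8+); the helper names are hypothetical, and this only reads declared metadata, not which dependencies your code actually exercises.

```python
import re
from importlib.metadata import PackageNotFoundError, requires
from typing import List, Optional


def requirement_names(reqs: List[str]) -> List[str]:
    """Bare names of requirement strings, skipping optional extras."""
    names = []
    for req in reqs:
        if re.search(r"extra\s*==", req):  # e.g. 'pytest; extra == "test"'
            continue
        names.append(re.match(r"[A-Za-z0-9._-]+", req).group(0))
    return names


def direct_requirements(dist_name: str) -> Optional[List[str]]:
    """Declared direct requirements of an installed distribution, or None if absent."""
    try:
        return requirement_names(requires(dist_name) or [])
    except PackageNotFoundError:
        return None


print(direct_requirements("pyjanitor"))  # None if pyjanitor is not installed
```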

If they are not class methods, the method would be invoked for every test, and a session would be created for each of those tests.

import logging
import unittest


class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        # Quiet the chatty py4j gateway logger during tests
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp



Here are 1,420 public repositories matching this topic...

apache / superset


eugeneyan / applied-ml

andkret / Cookbook

datastacktv / data-engineer-roadmap

PrefectHQ / prefect

airbytehq / airbyte

great-expectations / great_expectations

DataTalksClub / data-engineering-zoomcamp

dagster-io / dagster

benthosdev / benthos

feast-dev / feast

growthbook / growthbook

awslabs / aws-data-wrangler

treeverse / lakeFS

ploomber / ploomber

kestra-io / kestra

adilkhash / Data-Engineering-HowTo

apache / incubator-devlake


kantord / just-dashboard

metarank / metarank

datafold / data-diff

quiltdata / quilt

open-metadata / OpenMetadata

GoogleCloudPlatform / data-science-on-gcp

benthecoder / yt-channels-DS-AI-ML-CS

pyjanitor-devs / pyjanitor

san089 / goodreads_etl_pipeline

sodadata / soda-core

AlexIoannides / pyspark-example-project

abhishek-ch / around-dataengineering
