The Wayback Machine - https://web.archive.org/web/20220718082744/https://github.com/topics/data-engineering
#

data-engineering

Here are 1,420 public repositories matching this topic...

superset
rumbin commented Jan 31, 2022

The Mixed Time-Series chart type allows for configuring the title of the primary and the secondary y-axis.
However, while only the title of the primary axis is shown next to the axis, the title of the secondary one is placed at the upper end of the axis where it gets hidden by bar values and zoom controls.

How to reproduce the bug

  1. Create a mixed time-series chart
  2. Configure axi
good first issue #bug validation:validated preset:cares
RowdyHowell commented Feb 28, 2022

Hello!

I've found an issue here: [Bitbucket Storage](https://docs.prefect.io/api/latest/storage.html#github) is a storage option that uploads flows to a Bitbucket repository as .py files.

Page reference: https://docs.prefect.io/orchestration/flow_config/storage.html#bitbucket

First, the link is incorrect. Second, should the line read more like Github where it references the repository

good first issue docs
Aylr commented Dec 28, 2020

Describe the bug
data docs columns shrink to 1 character width with long query

To Reproduce
Steps to reproduce the behavior:

  1. make a batch from a long query string
  2. run validation
  3. render result to data docs
  4. See screenshot
enhancement help wanted good first issue core-team
chhabrakadabra commented Jun 30, 2022

Is your feature request related to a problem? Please describe.

Feast is often hard to install alongside other Python packages that use google-cloud-core. Specifically, Feast sets an upper bound on this library (2.0.0), but the latest version is 2.3.1 and many Python packages have a lower bound of 2.0.0 or above.

Describe the solution you'd like

Remove google-cloud-core fr

kind/feature good first issue Community Contribution Needed
growthbook
jdorn commented Jul 7, 2022

If you add multiple projects in GrowthBook, the dropdown list in the left navigation is sorted by creation date. We should instead sort them alphabetically. The dropdown is rendered in packages/front-end/components/Layout/ProjectSelector.tsx

good first issue
aws-data-wrangler

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
  • Updated Jul 18, 2022
  • Python
lakeFS
edublancas commented Jun 30, 2022

A user asked me whether their pipeline.yaml looked right, and I noticed they had something like this:

tasks:
   # ... a bunch of tasks
   - source: fit.py
     product:
       nb: path/to/report.html
       data: path/to/data.csv
       data1: path/to/data.csv
       data2: path/to/data.csv

The user was confused and thought that the naming should be name, name1

documentation good first issue low priority
hezyin commented Jun 6, 2022

Description

Add a new Bitbucket data source plugin so DevLake can collect dev data from it.

Pre-requisites

  1. Understand how a DevLake plugin works.
  2. Make sure you have access to Bitbucket API.

Describe the solution you'd like

Add a plugin for bitbucket. Please refer to other data source plug

good first issue type/feature-request
sirupsen commented Jun 21, 2022

In some cases, you might want to checksum everything, and instead of having to type out all the columns yourself, we should be able to get them from the schema.

--all-mutual-tables could be a nice addition at some point too...
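
The idea of reading the column list from the schema instead of typing it out can be sketched as follows. This is only an illustration, not the tool's implementation: it assumes SQLite as the database, and the `users` table plus the `table_columns`, `table_checksum`, and `make_db` helpers are hypothetical names introduced for the example.

```python
import hashlib
import sqlite3


def table_columns(conn, table):
    """Return the column names of `table` as listed in the SQLite schema."""
    # PRAGMA table_info yields one row per column; position 1 holds the name.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [row[1] for row in rows]


def table_checksum(conn, table):
    """Hash every column of every row, taking the column list from the schema."""
    columns = table_columns(conn, table)
    column_list = ", ".join(f'"{name}"' for name in columns)
    digest = hashlib.sha256()
    # ORDER BY 1 sorts on the first column so both sides hash rows in the same order.
    for row in conn.execute(f'SELECT {column_list} FROM "{table}" ORDER BY 1'):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def make_db(rows):
    """Build a throwaway in-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    return conn


# Two copies of the table that differ in a single email value.
source = make_db([(1, "Ada", "ada@example.com"), (2, "Lin", "lin@example.com")])
target = make_db([(1, "Ada", "ada@example.com"), (2, "Lin", "lin@old.example.com")])

print(table_columns(source, "users"))
print(table_checksum(source, "users") == table_checksum(target, "users"))
```

Because the column list comes from the schema, a change in any column is picked up without the caller naming it. Sorting on the first column assumes that column is a unique key.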

enhancement good first issue

A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
  • Updated Dec 31, 2021
anks7190 commented Jan 27, 2021

Hi,

I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, if I look at the requirements.txt for just "pyjanitor", it's huge.
I don't think I need all of those dependencies in my code.
How can I remove the unnecessary ones?
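
If only simple name cleaning is needed, one possible workaround is a small standard-library helper in place of the dependency. The sketch below is an assumption about what "basic" cleaning means (lowercase, runs of non-alphanumerics collapsed to underscores); it is not pyjanitor's implementation and may not match its output in every case.

```python
import re


def clean_name(name):
    """Lowercase a column name and collapse non-alphanumeric runs to underscores."""
    cleaned = name.strip().lower()
    cleaned = re.sub(r"[^0-9a-z]+", "_", cleaned)
    # Drop underscores left at the edges by leading or trailing punctuation.
    return cleaned.strip("_")


def clean_names(columns):
    """Apply clean_name to every entry of an iterable of column names."""
    return [clean_name(column) for column in columns]


print(clean_names(["First Name", " Total-Sales ($)", "already_clean"]))
```

With pandas, the result could be assigned back with `df.columns = clean_names(df.columns)`.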

help wanted good first issue available for hacking infrastructure
