add mbpp dataset #2893

Merged
lhoestq merged 5 commits into huggingface:master on Sep 16, 2021

@lvwerra (Member) commented Sep 10, 2021

This PR adds the mbpp dataset introduced by Google, as mentioned in #2816.

The dataset contains two versions: a full one and a sanitized one. They have slightly different schemas, and in its current state the loading script preserves the original schema of each. An open question is whether to harmonize the two schemas when loading the dataset or to keep the original ones. Since not all fields overlap, the schemas would not be exactly the same even after harmonizing.
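
To make the open question concrete, here is a minimal sketch of what harmonizing could look like. The field names, the example records, and the `harmonize` helper are all assumptions made for illustration; they are not taken from this PR and should be checked against the actual data files. The sketch renames one field so both versions share a name for the task description, then lists which fields remain exclusive to each version.

```python
# Hypothetical records: the field names below are assumptions about the two
# versions, not something stated in this PR.
full_example = {
    "task_id": 1,
    "text": "Write a function to add two numbers.",
    "code": "def add(a, b):\n    return a + b",
    "test_list": ["assert add(1, 2) == 3"],
    "test_setup_code": "",
    "challenge_test_list": [],
}
sanitized_example = {
    "source_file": "example.ipynb",
    "task_id": 1,
    "prompt": "Write a function to add two numbers.",
    "code": "def add(a, b):\n    return a + b",
    "test_imports": [],
    "test_list": ["assert add(1, 2) == 3"],
}

# Assumed rename so the task description has the same key in both versions.
FULL_TO_COMMON = {"text": "prompt"}


def harmonize(record, rename):
    """Return a copy of `record` with keys renamed per `rename`; other keys are kept."""
    return {rename.get(key, key): value for key, value in record.items()}


harmonized_full = harmonize(full_example, FULL_TO_COMMON)

full_keys = set(harmonized_full)
sanitized_keys = set(sanitized_example)

shared = sorted(full_keys & sanitized_keys)
only_full = sorted(full_keys - sanitized_keys)
only_sanitized = sorted(sanitized_keys - full_keys)

print("shared:", shared)
print("only in full:", only_full)
print("only in sanitized:", only_sanitized)
```

Under these assumed field names, a rename aligns the overlapping fields, but some fields still exist in only one version, which is why the harmonized schemas would not match exactly.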

@lhoestq (Member) left a comment

Cool ! Thanks for adding this dataset :)

[2 resolved review comments on datasets/mbpp/README.md, now outdated]

@lhoestq (Member) commented Sep 16, 2021

I think it's fine to have the original schema

@lhoestq merged commit 95037b7 into huggingface:master on Sep 16, 2021
2 of 6 checks passed
2 participants