The Wayback Machine - https://web.archive.org/web/20220320212258/https://github.com/huggingface/datasets/pull/2853
Add AMI dataset #2853

Merged
merged 8 commits into from Sep 29, 2021

Conversation

cahya-wirawan
Contributor

@cahya-wirawan cahya-wirawan commented Aug 31, 2021

This is an initial commit for the AMI dataset.

@patrickvonplaten patrickvonplaten self-requested a review Aug 31, 2021
Member

@lhoestq lhoestq left a comment

Hi! Thanks for adding this dataset.

Is there anything I can help you with?
It looks like the dataset script only tries to generate one example, "ES2002a.A".

Feel free to ping me if you have any questions or if I can help :)

datasets/ami/README.md (outdated, resolved)
@patrickvonplaten
Member

@patrickvonplaten patrickvonplaten commented Sep 24, 2021

Hey @cahya-wirawan,

I played around with the dataset a bit and it already looks very good to me! That's exactly how it should be constructed :-) I can help you a bit with defining the config, etc. on Monday!

@patrickvonplaten
Member

@patrickvonplaten patrickvonplaten commented Sep 27, 2021

@patrickvonplaten patrickvonplaten requested a review from lhoestq Sep 29, 2021
Member

@lhoestq lhoestq left a comment

Thanks for adding this dataset!

datasets/ami/README.md (outdated, resolved)
_id: [os.path.join(annotation_path, "words/{}.{}.words.xml".format(_id, speaker)) for speaker in _SPEAKERS]
for _id in ids
}
words_paths = {_id: list(filter(lambda path: os.path.isfile(path), words_paths[_id])) for _id in ids}
Member

@lhoestq lhoestq Sep 29, 2021

FYI, os.path.isfile is not supported yet for dataset streaming, but I think we can add it :)

(We just need to check whether the file exists in the remote zip file, instead of checking locally.)
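
To illustrate the idea in the comment above, here is a minimal sketch of checking for files inside a zip archive rather than on the local filesystem. It is not the streaming implementation in the datasets library: it uses only Python's standard zipfile module, a small in-memory archive stands in for the remote zip file, and the helper name existing_words_paths and the speaker list are hypothetical. The "words/{}.{}.words.xml" pattern and the "ES2002a" id are taken from the snippet and comments in this thread.

```python
import io
import zipfile


def existing_words_paths(archive, ids, speakers):
    """Return, per meeting id, the words files that exist inside the archive.

    Hypothetical helper: membership is tested against the archive's member
    names, which replaces the local os.path.isfile check.
    """
    names = set(archive.namelist())
    return {
        _id: [
            path
            for path in ("words/{}.{}.words.xml".format(_id, s) for s in speakers)
            if path in names
        ]
        for _id in ids
    }


# Build a tiny in-memory zip as a stand-in for the remote archive (assumption).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("words/ES2002a.A.words.xml", "<root/>")
    zf.writestr("words/ES2002a.B.words.xml", "<root/>")

# Speaker "C" has no file in the archive, so it is filtered out.
with zipfile.ZipFile(buffer) as zf:
    result = existing_words_paths(zf, ["ES2002a"], ["A", "B", "C"])

print(result)
```

Reading the member list once and testing membership in a set avoids one lookup per candidate path, which matters more when the archive is remote.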

@lhoestq lhoestq merged commit bbccb55 into huggingface:master Sep 29, 2021
0 of 6 checks passed
3 participants