Add AMI dataset #2853
Conversation
Hi! Thanks for adding this dataset.
Is there anything I can help you with?
It looks like the dataset script only tries to generate one example, "ES2002a.A".
Feel free to ping me if you have any questions or if I can help :)
Hey @cahya-wirawan, I played around with the dataset a bit and it already looks very good to me! That's exactly how it should be constructed :-) I can help you a bit with defining the config, etc. on Monday!
@lhoestq - I think the dataset is ready to be merged :-) At the moment, I don't really see how the failing tests relate to this PR: could you maybe give it a look? :-)
# Build the per-meeting list of word-annotation files (one per speaker),
# then keep only the files that actually exist on disk.
words_paths = {
    _id: [os.path.join(annotation_path, "words/{}.{}.words.xml".format(_id, speaker)) for speaker in _SPEAKERS]
    for _id in ids
}
words_paths = {_id: list(filter(lambda path: os.path.isfile(path), words_paths[_id])) for _id in ids}
FYI os.path.isfile is not supported yet for dataset streaming, but we can add it I think :) (we just need to check in the remote zip file if the file exists, instead of checking locally)
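A minimal sketch of that idea, assuming the word annotations ship inside a zip archive: list the archive members once and check membership there instead of calling os.path.isfile on local paths. The archive name "ami_annotations.zip", the ids and the _SPEAKERS list below are placeholders for illustration, not the actual values used by the dataset script; for a truly remote archive the file object passed to ZipFile could come from an HTTP/fsspec stream instead of a local path.

import zipfile

# Placeholder values for illustration only
_SPEAKERS = ["A", "B", "C", "D"]
ids = ["ES2002a", "ES2002b"]

# Open the (hypothetical) archive once and collect its member names.
with zipfile.ZipFile("ami_annotations.zip") as archive:
    members = set(archive.namelist())

# Keep only the word-annotation files that actually exist in the archive,
# replacing the os.path.isfile check used for local files.
words_paths = {
    _id: [
        "words/{}.{}.words.xml".format(_id, speaker)
        for speaker in _SPEAKERS
        if "words/{}.{}.words.xml".format(_id, speaker) in members
    ]
    for _id in ids
}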
This is an initial commit for the AMI dataset.