Add AMI dataset #2853
Conversation
Hi! Thanks for adding this dataset.
Is there anything I can help you with?
It looks like the dataset script only tries to generate one example, "ES2002a.A".
Feel free to ping me if you have any questions or if I can help :)
Hey @cahya-wirawan, I played around with the dataset a bit and it already looks very good to me! That's exactly how it should be constructed :-) I can help you a bit with defining the config, etc. on Monday!
@lhoestq - I think the dataset is ready to be merged :-) At the moment, I don't really see how the failing tests relate to this PR: could you maybe give it a look? :-)
# Build the per-meeting list of word-annotation files (one per speaker),
# then keep only the files that actually exist on disk.
words_paths = {
    _id: [os.path.join(annotation_path, "words/{}.{}.words.xml".format(_id, speaker)) for speaker in _SPEAKERS]
    for _id in ids
}
words_paths = {_id: list(filter(lambda path: os.path.isfile(path), words_paths[_id])) for _id in ids}
FYI os.path.isfile is not supported yet for dataset streaming, but we can add it I think :) (we just need to check in the remote zip file if the file exists, instead of checking locally)
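A minimal sketch of that idea, assuming the word annotations ship inside a zip archive: list the archive members once and check membership there instead of calling os.path.isfile on local paths. The archive name "ami_annotations.zip", the ids and the _SPEAKERS list below are placeholders for illustration, not the actual values used by the dataset script; for a truly remote archive the file object passed to ZipFile could come from an HTTP/fsspec stream instead of a local path.

import zipfile

# Placeholder values for illustration only
_SPEAKERS = ["A", "B", "C", "D"]
ids = ["ES2002a", "ES2002b"]

# Open the (hypothetical) archive once and collect its member names.
with zipfile.ZipFile("ami_annotations.zip") as archive:
    members = set(archive.namelist())

# Keep only the word-annotation files that actually exist in the archive,
# replacing the os.path.isfile check used for local files.
words_paths = {
    _id: [
        "words/{}.{}.words.xml".format(_id, speaker)
        for speaker in _SPEAKERS
        if "words/{}.{}.words.xml".format(_id, speaker) in members
    ]
    for _id in ids
}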
This is an initial commit for the AMI dataset.