nlp-datasets

Rather than the current system, in which each sub-corpus lives in its own folder with its own code, create a top-level downloads.sh that can re-assemble the sub-corpora.
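The proposal above asks for a single entry point that rebuilds every sub-corpus. The issue names a shell script, downloads.sh. The sketch below expresses the same logic in Python instead, so it can be exercised without network access. The manifest layout, the `SUBCORPORA` name, the `data/` target directory and the example URL are hypothetical placeholders, not part of the original project.

```python
"""Sketch of a single top-level download step (hypothetical layout).

Assumptions, not taken from the original issue: each sub-corpus is described by
a name and a list of source URLs, and everything is assembled under one
top-level directory, with one sub-folder per sub-corpus.
"""
from __future__ import annotations

import shutil
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical manifest: sub-corpus name -> source URLs. Replace with real sources.
SUBCORPORA: dict[str, list[str]] = {
    "example_corpus": ["https://example.org/example_corpus/part1.txt"],
}


def download(url: str, dest: Path) -> Path:
    """Fetch one URL into dest, skipping files that are already present."""
    if dest.exists():
        return dest  # already downloaded, so re-runs stay cheap
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)  # expose the file only once it is complete
    return dest


def assemble(subcorpora: dict[str, list[str]], root: Path) -> list[Path]:
    """Download every sub-corpus into root/<name>/ and return the file paths."""
    paths: list[Path] = []
    for name, urls in subcorpora.items():
        for url in urls:
            # Use the last path component of the URL as the local file name.
            filename = Path(urllib.parse.unquote(urllib.parse.urlparse(url).path)).name
            paths.append(download(url, root / name / filename))
    return paths
```

Calling `assemble(SUBCORPORA, Path("data"))` fetches each listed file into `data/<name>/`. Files that already exist are skipped, so re-running the step only fetches what is missing. Writing to a `.part` file first means an interrupted download is never mistaken for a complete one.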

Separately, make the downloaded and pre-processed sub-corpora available for the ADR and NMT repos to reference, for example as Git submodules.

Here are 99 public repositories matching this topic...

mihail911 / nlp-library

dkulagin / kartaslov

hellohaptik / multi-task-NLP

quincyliang / nlp-public-dataset

cjiang2 / VDCNN

INK-USC / TriggerNER

grammarly / ua-gec

INK-USC / CommonGen

irfnrdh / Awesome-Indonesia-NLP

chiphuyen / MetroTwitter

kelvin-jiang / FreebaseQA

Pzoom522 / HistSumm

xtea / chinese_medical_words

Niger-Volta-LTI / yoruba-text

Provide a script to cleanly download and normalize text

gcunhase / AMICorpusXML

bothub-it / bothub

secsilm / zi-dataset

selimfirat / bilkent-turkish-writings-dataset

uma-pi1 / OPIEC

cyrilou242 / RapLyrics-Back

maxent-ai / Datasets

INK-USC / XCSR

ElizaLo / Question-Answering-based-on-SQuAD

utahnlp / infotabs-code

uma-pi1 / OPIEC-pipeline

cybermatt / russian-names

SemiringInc / Mueller-Report-Corpus

MiniXC / opensubtitles-dataloader

jamesohortle / loanwords_gairaigo

aajanki / finnish-nlp-datasets
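The Niger-Volta-LTI / yoruba-text entry above asks for a script to cleanly download and normalize text. Yorùbá uses tone marks and dots below (ẹ, ọ, ṣ), so the same visible character can be stored as different code-point sequences. Unicode normalization is therefore a natural part of that step. The following is a minimal sketch. It assumes NFC is the target form and that collapsing whitespace and dropping empty lines are acceptable. Neither choice comes from the project itself, and NFD is the other common option.

```python
import unicodedata


def normalize_line(line: str) -> str:
    """NFC-normalize a line and collapse runs of whitespace (assumed policy)."""
    composed = unicodedata.normalize("NFC", line)
    return " ".join(composed.split())


def normalize_text(text: str) -> str:
    """Normalize every line and drop lines that end up empty."""
    lines = (normalize_line(line) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Under this policy, `o` followed by an acute and a dot-below mark normalizes to the same string in either order: `ọ` (U+1ECD) followed by a combining acute. NFC first puts combining marks in canonical order and then composes what it can.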
