The Wayback Machine - https://web.archive.org/web/20210821094953/https://github.com/topics/corpus
Here are 630 public repositories matching this topic...
Large Scale Chinese Corpus for NLP
A collection of small corpuses of interesting data for the creation of bots and similar stuff.
Updated Jul 5, 2021 · JavaScript
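Chinese personal-name corpus and name generator: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.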
Deep learning and deep reinforcement learning research papers and some code
Final Weibo Crawler: scrape anything from Weibo (comments, Weibo contents, followers, anything). The Terminator
Updated Oct 25, 2019 · Python
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Updated Aug 3, 2021 · Python
Updated Mar 1, 2020 · Python
Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:
Updated Feb 10, 2020 · Python
Datasets for training Chinese and English chatbot systems
Updated Sep 23, 2020 · Python
A multilingual dialog corpus
Updated Jun 29, 2021 · Python
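Company-name and organization-name corpus: company short names, abbreviations, brand terms, and enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.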
Updated Oct 11, 2020 · Python
Chatbot in 200 lines of code using TensorLayer
Updated Oct 6, 2019 · Python
An R package for the Quantitative Analysis of Textual Data
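A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and specialized similarity models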
Updated Jul 8, 2020 · Python
Collections of Chinese NLP corpora
Updated Dec 28, 2020 · Python
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Updated Jun 23, 2021 · Python
Some useful small Chinese corpus datasets
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Updated Aug 16, 2021 · Python
Fuzzing resources for feeding various fuzzers with input. 🔧
Updated May 25, 2021 · HTML
Large-scale Pre-training Corpus for Chinese (100G)
Updated Jan 28, 2021 · Python
A dataset of millions of news articles scraped from a curated list of data sources.
❤️ Emotional First Aid Dataset: a psychological counseling Q&A and chatbot corpus
Updated Oct 11, 2020 · Python
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark for Chinese medical information processing
Updated Jul 5, 2021 · Python
Data resources for Indonesian-language NLP