#language-recognition

Here are 30 public repositories matching this topic...

The European Parliament Proceedings Parallel Corpus (1996-2011) (https://www.statmt.org/europarl/) is a well-known dataset for Natural Language Processing tasks. It contains proceedings of the European Parliament in 21 European languages. This project uses only 6 of them (German, French, Spanish, Italian, Polish and English): the data is extracted, preprocessed, cleaned and normalized, and then used to train a few fairly simple classifiers that identify the language a sentence is written in. This was originally a project I did at university.
  • Updated Jan 26, 2022
  • Python
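
The description above names the steps (normalize the text, then train a simple classifier) without saying which classifier is used. As a rough illustration only, the following sketch assumes a character-trigram Naive Bayes model with add-one smoothing. It is not the repository's code: the function and class names (`normalize`, `char_ngrams`, `NgramNaiveBayes`) and the toy sentences in `TOY_DATA` are hypothetical and are not taken from the Europarl corpus.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase, apply Unicode NFC, and keep only letters separated by single spaces."""
    text = unicodedata.normalize("NFC", text).lower()
    text = "".join(ch if ch.isalpha() else " " for ch in text)
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the normalized text, padded with spaces."""
    padded = f" {normalize(text)} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (hypothetical helper class)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total number of n-grams
        self.docs = Counter()               # label -> number of training sentences
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            grams = char_ngrams(sentence, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, sentence):
        grams = char_ngrams(sentence, self.n)
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.docs:
            # Log prior plus add-one-smoothed log likelihood of each n-gram.
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + vocab_size
            for gram in grams:
                score += math.log((self.counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy sentences for illustration only (NOT taken from Europarl).
TOY_DATA = [
    ("de", "Ich erkläre die Sitzung für eröffnet."),
    ("de", "Das Parlament hat den Vorschlag angenommen."),
    ("fr", "Je déclare la séance ouverte."),
    ("fr", "Le Parlement a adopté la proposition."),
    ("es", "Declaro abierta la sesión."),
    ("es", "El Parlamento ha aprobado la propuesta."),
    ("it", "Dichiaro aperta la seduta."),
    ("it", "Il Parlamento ha approvato la proposta."),
    ("pl", "Otwieram posiedzenie."),
    ("pl", "Parlament przyjął wniosek."),
    ("en", "I declare the session open."),
    ("en", "The Parliament has adopted the proposal."),
]

model = NgramNaiveBayes(n=3).fit(
    [sentence for _, sentence in TOY_DATA],
    [label for label, _ in TOY_DATA],
)

if __name__ == "__main__":
    print(model.predict("Das Parlament hat den Vorschlag angenommen."))
```

Character n-grams are a common choice for this kind of task because they need no tokenizer or per-language vocabulary. With only a dozen training sentences the model is a toy; a real run would train on many thousands of sentences per language and evaluate on a held-out split.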
