The Wayback Machine - https://web.archive.org/web/20211002153454/https://github.com/topics/language-classification
# language-classification
Here are 19 public repositories matching this topic...
👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Updated Jul 19, 2021 · Kotlin
👄 The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike
👄 The most accurate natural language detection library in the Rust ecosystem, suitable for long and short text alike
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
A Language Classifier powered by Recurrent Neural Network implemented in Python without AI libraries. AI from scratch.
Updated Sep 7, 2021 · Python
Hyperdimensional computing explained and demonstrated
Classifier that identifies Greek text as Cypriot Greek or Standard Modern Greek
Updated Oct 4, 2019 · Jupyter Notebook
Basic data analysis methods for a language identification dataset, including the SVM algorithm.
Updated Apr 22, 2017 · Python
An ensemble of neural network models for toxic language classification
Updated Nov 27, 2019 · Python
Suite of Python modules to recognise the language of a file
Updated Sep 23, 2020 · Python
Classifying English, Slovak, and Czech text using Naive Bayes
Updated Sep 14, 2020 · Smalltalk
Detecting hate speech in tweets using bag-of-tricks models and bi-LSTM networks.
Updated Oct 13, 2017 · Python
Detecting the location and native language of a place from an image
Updated Oct 30, 2017 · Python
Using Decision Tree and AdaBoost to classify languages (English/Dutch)
Updated May 13, 2020 · Python
This program identifies the language of the input text.
Updated Apr 21, 2020 · Jupyter Notebook
Classified sentences as Slovak, Czech, or English. Implemented relevant preprocessing steps, addressed the class imbalance in the training set by applying Naive Bayes theory, and implemented subword units.
Updated Jun 4, 2020 · Smalltalk
Implementing a Naive Bayes classifier for multiclass classification to identify the language of a given text
Updated Aug 27, 2017 · Scala
Updated Jan 21, 2021 · Jupyter Notebook
Hi guys,
after downloading and extracting the Turkish part of the OSCAR 21.09 release, I've found some sentences with encoding errors:
I did a `grep -c "�" tr_part_*` over the complete corpus. Here are some stats:
| Filename | Affected number of lines
| --------------