An N-gram language model that learns n-gram probabilities from a given corpus and generates new sentences from it, choosing each word according to its conditional probability given the preceding words.
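For reference, a minimal sketch of that idea, assuming a bigram model over whitespace-tokenized sentences (the toy corpus, function names, and `<s>`/`</s>` sentence markers are illustrative assumptions, not taken from the repository):

```python
import random
from collections import defaultdict

def train_bigrams(corpus):
    """Count bigram transitions; normalised, these are P(w_i | w_{i-1})."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def generate(counts, max_len=20):
    """Sample a sentence word by word from the learned conditionals."""
    word, out = "<s>", []
    for _ in range(max_len):
        followers = counts[word]
        word = random.choices(list(followers), weights=list(followers.values()))[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
print(generate(train_bigrams(corpus)))
```

Each generated word is drawn from the empirical distribution of words that followed the previous word in training, which is the conditional probability the description refers to.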
Built a system from scratch in Python that detects spelling errors at the word level and grammatical errors at the sentence level, using an N-gram-based smoothed language model, Levenshtein distance, a Hidden Markov Model, and a Naive Bayes classifier.
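As a sketch of one of those components, Levenshtein distance can rank vocabulary words by edit distance from a misspelled token (the vocabulary list and function name here are illustrative, not from the project):

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

vocabulary = ["their", "there", "these", "the"]
typo = "thier"
print(min(vocabulary, key=lambda w: levenshtein(typo, w)))  # "their" (ties broken by list order)
```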
This repository contains the group projects undertaken during the course "Text Engineering and Analytics", taught by Prof. Ion Androutsopoulos as part of the MSc in Data Science at the Athens University of Economics and Business.
Part of a semester project, this grammar checker uses an n-gram language model to detect grammatical errors, and a BERT model to generate correction suggestions.
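One plausible way to wire the suggestion step, sketched with Hugging Face's `transformers` fill-mask pipeline (the model name and the flagged sentence are illustrative assumptions; the project's actual pipeline may differ):

```python
from transformers import pipeline

# A masked-language-model pipeline can propose replacements for a suspect token.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Suppose the n-gram model flagged "go" in "she go to school" as improbable;
# mask it and let BERT rank candidate corrections by score.
for candidate in fill_mask("she [MASK] to school"):
    print(candidate["token_str"], round(candidate["score"], 3))
```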
Predicts, using an N-gram language model, the probability that a given text is the work of a certain author. Also generates text similar to the work of a given author.
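A minimal sketch of the attribution side, assuming an add-one-smoothed bigram model scored per author (the author snippets, function names, and smoothing choice are illustrative assumptions, not taken from the project):

```python
import math
from collections import Counter

def train(text):
    """Bigram and unigram counts for one author's corpus."""
    tokens = text.split()
    return Counter(zip(tokens, tokens[1:])), Counter(tokens)

def log_prob(text, bigrams, unigrams, vocab_size):
    """Log-likelihood of text under an add-one-smoothed bigram model."""
    tokens = text.split()
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )

authors = {
    "austen": "it is a truth universally acknowledged that a single man "
              "in possession of a good fortune must be in want of a wife",
    "melville": "call me ishmael some years ago never mind how long precisely "
                "having little or no money in my purse",
}
models = {name: train(text) for name, text in authors.items()}
vocab_size = len({w for text in authors.values() for w in text.split()})

query = "a single man in possession of a good fortune"
best = max(models, key=lambda name: log_prob(query, *models[name], vocab_size))
print(best)  # expected: austen, whose counts make the query most probable
```

The author whose model assigns the query the highest log-likelihood is the prediction; running the same model generatively (as in the bigram sampler above) produces text in that author's style.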