A demo of the Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing, and model tuning are performed on the dataset.
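A minimal sketch of how such a demo might look, not the repo's actual code. It assumes the Kaggle IBM HR Analytics CSV with an "Attrition" target column; the filename and hyperparameters are illustrative.

```python
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

# Hypothetical filename for the Kaggle IBM HR attrition CSV.
df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
y = (df["Attrition"] == "Yes").astype(int)
X = df.drop(columns=["Attrition"])

# CatBoost handles categorical columns natively; list them by name.
cat_features = X.select_dtypes(include="object").columns.tolist()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = CatBoostClassifier(iterations=500, learning_rate=0.05, depth=6, verbose=100)
model.fit(X_train, y_train, cat_features=cat_features, eval_set=(X_test, y_test))
print("Test accuracy:", model.score(X_test, y_test))
```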
A command-line utility for automating trivial, frequently occurring data-preparation tasks: missing-value interpolation, outlier removal, and categorical-variable encoding.
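A minimal sketch of the three tasks the utility automates, using pandas; the function name and the specific rules (linear interpolation, the 1.5*IQR outlier filter) are illustrative assumptions, not the tool's actual interface.

```python
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Missing-value interpolation for numeric columns.
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].interpolate().bfill()

    # 2. Outlier removal via the 1.5*IQR rule.
    q1, q3 = df[num_cols].quantile(0.25), df[num_cols].quantile(0.75)
    iqr = q3 - q1
    keep = ~((df[num_cols] < q1 - 1.5 * iqr) | (df[num_cols] > q3 + 1.5 * iqr)).any(axis=1)
    df = df[keep]

    # 3. Encode categorical variables as one-hot dummies.
    return pd.get_dummies(df, drop_first=True)
```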
This repository contains Sentiment Classification, Word-Level Text Generation, Character-Level Text Generation, and other NLP code and notes. Python and Keras are used for the implementation.
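A minimal Keras sketch of one of the repo's topics, sentiment classification, assuming the built-in IMDB dataset; the architecture and hyperparameters are illustrative rather than the repository's own.

```python
from tensorflow import keras
from tensorflow.keras import layers

max_words, max_len = 10000, 200
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=max_words)
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)

model = keras.Sequential([
    layers.Embedding(max_words, 64),          # learn word embeddings
    layers.LSTM(64),                          # encode the review sequence
    layers.Dense(1, activation="sigmoid"),    # positive/negative probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=128, validation_data=(x_test, y_test))
```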
Neural network implementations, including a Single-Layer Perceptron and a Multi-Layer Perceptron, built with the TensorFlow library on datasets such as MNIST and Naval Mine for categorical classification. Includes saving and restoring TensorFlow variable weights for testing.
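A minimal TensorFlow sketch of an MLP on MNIST with weight saving and restoring. The description mentions TensorFlow "Variables", so the repo likely uses the older tf.train.Saver API; this sketch shows the equivalent Keras-style workflow under that assumption.

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer of the MLP
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2)

model.save_weights("mlp.weights.h5")   # save the trained weights
model.load_weights("mlp.weights.h5")   # restore them for testing
print(model.evaluate(x_test, y_test))
```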
This is my contribution to a competition on kaggle.com, whose dataset has 79 explanatory variables describing (almost) every aspect of roughly 1,500 residential homes in Ames, Iowa. The aim is to predict the final sale price of each home.
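A minimal sketch of the prediction task, not my competition solution. It assumes the competition's train.csv with SalePrice as the target and an Id column; the model choice and naive preprocessing are illustrative only.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

train = pd.read_csv("train.csv")
y = train["SalePrice"]
# Crude baseline preprocessing: one-hot encode categoricals, zero-fill missing values.
X = pd.get_dummies(train.drop(columns=["SalePrice", "Id"])).fillna(0)

model = GradientBoostingRegressor(random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
print("CV RMSE:", -scores.mean())
```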
Computing feature importance for categorical variables after converting them into dummy variables (one-hot encoding) can produce skewed or hard-to-interpret results. Here I present a method to get around this problem using H2O.
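A minimal sketch of the idea: H2O's tree models consume categorical columns natively, so variable importance is reported per original feature rather than spread across one-hot dummy columns. The file and column names below are hypothetical, and the random forest is just one possible model choice.

```python
import h2o
from h2o.estimators import H2ORandomForestEstimator

h2o.init()
frame = h2o.import_file("data.csv")                        # hypothetical file
frame["category_col"] = frame["category_col"].asfactor()   # mark column as categorical

model = H2ORandomForestEstimator(ntrees=100, seed=1)
model.train(x=[c for c in frame.columns if c != "target"], y="target", training_frame=frame)

# Importance is reported for "category_col" as a whole, not per one-hot level.
print(model.varimp(use_pandas=True))
```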
Calculate the per-pixel accuracy of a prediction.
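A minimal NumPy sketch of per-pixel accuracy, assuming the prediction and ground-truth label maps have the same shape (as in semantic segmentation); the function name is illustrative.

```python
import numpy as np

def pixel_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Fraction of pixels whose predicted label matches the ground truth."""
    assert pred.shape == target.shape
    return float((pred == target).mean())

# Example: two 2x2 label maps, three of four pixels correct.
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
print(pixel_accuracy(pred, target))  # 0.75
```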