The Wayback Machine - https://web.archive.org/web/20200812010757/https://github.com/topics/hadoop-framework
Here are 23 public repositories matching this topic...
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.
Updated
Oct 21, 2018
Python
Cloud-based SQL engine using Spark, where data is accessible as a JDBC/ODBC data source via the Spark Thrift Server.
Updated
Jul 12, 2017
Java
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Updated
Nov 16, 2017
Shell
A storage reference to a comprehensive guide on installing Hadoop on Windows
Updated
Jun 11, 2018
Shell
Updated
Jul 14, 2019
Java
A simple Hadoop-like distributed computing platform implemented in Java. [This is a course project at UIUC (awarded the best Java version implementation) and it's open-sourced for reference.]
Updated
Jan 30, 2020
Java
Code samples, summaries, cheatsheets and other study material for Hadoop MapReduce and Apache Spark
Updated
Aug 17, 2018
Java
This project focuses on creating a KNN MapReduce program for the Hadoop framework
Updated
May 21, 2020
Java
Set up a Hadoop cluster manually and automatically
Updated
Jul 17, 2017
Python
Twitter data analysis using Hadoop (HDFS), Flume, MapReduce, and Hive. Sentiment analysis is also performed on tweets related to the Indian election using the AFINN dictionary.
PageRank algorithm written in Java using the MapReduce framework
Updated
Jul 20, 2019
Java
EMR 5.25.0 cluster single node Hadoop docker image. With Amazon Linux, Hadoop 2.8.5 and Hive 2.3.5
Updated
Jan 6, 2020
Shell
The goal of this project is to identify flood-prone areas and estimate the probability of flooding in counties on a future date, using Spark MLlib.
Updated
Jan 20, 2020
Scala
Updated
Sep 23, 2017
Python
Distributed Hadoop and Spark based framework for in-memory GIS queries
Updated
Jul 11, 2018
Python
Product recommendation system on Amazon product dataset using Apache Spark framework
Updated
Jun 15, 2018
Jupyter Notebook
Updated
Apr 22, 2019
Java
Python Scripts for working with Big Data Files
Updated
Apr 6, 2018
Python
Basic Spark examples to scratch some ground
Titanic data analysis with Hadoop
Updated
Nov 13, 2018
Java
WQD7008 Parallel and Distributed Computing Project