Hadoop and MapReduce Cheat Sheet

Hadoop and MapReduce provide tools for processing large datasets across clusters of computers. The hdfs dfs commands allow users to interact with HDFS, including listing files, uploading/downloading, changing permissions, and checking file sizes. MapReduce is a programming model used for distributed computing on large datasets using Hadoop, with jobs submitted, tracked, and their completion/counters checked via commands like hadoop job -submit and hadoop job -status.


Hadoop HDFS List File Commands

• hdfs dfs -ls / : Lists all the files and directories for the given HDFS destination path
• hdfs dfs -ls -d /hadoop : Lists all the details of the Hadoop files
• hdfs dfs -ls -R /hadoop : Recursively lists all the files in the Hadoop directory and all its sub-directories
• hdfs dfs -ls hadoop/dat* : Lists all the files in the Hadoop directory starting with 'dat'
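For example, a short listing session might look like the following (the /user/hadoop path and the dat* pattern are only illustrative):

# List the HDFS root, then inspect an (illustrative) user directory
hdfs dfs -ls /
hdfs dfs -ls -R /user/hadoop       # recurse into every sub-directory
hdfs dfs -ls -d /user/hadoop       # show the directory entry itself, not its contents
hdfs dfs -ls /user/hadoop/dat*     # glob: only entries whose names start with 'dat'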

Hadoop & MapReduce


MapReduce Basics

MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of systems, referred to as clusters. Essentially, it is a processing technique and programming model for distributed computing based on Java.
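As a quick illustration, the WordCount example that ships with Hadoop can be run directly from the command line; the exact path and version of the examples jar vary between installations, so the lines below are only a sketch:

# Run the bundled WordCount example on data already stored in HDFS
# (the examples jar path and version differ between distributions)
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /data/input /data/output
hdfs dfs -cat /data/output/part-r-00000    # inspect the reducer output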
Hadoop

Hadoop is a framework designed to handle large volumes of data, both structured and unstructured.

HDFS

Hadoop Distributed File System (HDFS) is a framework designed to manage huge volumes of data in a simple and pragmatic way. It contains a vast number of servers, each storing a part of the file system.

In order to secure Hadoop, configure it with the following aspects:
• Authentication:
  • Define users
  • Enable Kerberos in Hadoop
  • Set up a Knox gateway to control access and authentication to the HDFS cluster
• Authorization:
  • Define groups
  • Define HDFS permissions
  • Define HDFS ACLs
• Audit:
  • Enable a process execution audit trail
• Data protection:
  • Enable wire encryption with Hadoop

Mahout

Apache Mahout is an open-source algebraic framework used for data mining, which works with distributed environments and simple programming languages.

HDFS basic commands

• hdfs dfs -put logs.csv /data/ : Uploads a file from the local file system to HDFS
• hdfs dfs -cat /data/logs.csv : Reads the content of the file
• hdfs dfs -chmod 744 /data/logs.csv : Changes the permission of the file
• hdfs dfs -chmod -R 744 /data/logs.csv : Changes the permissions of the files recursively
• hdfs dfs -setrep -w 5 /data/logs.csv : Sets the replication factor of the file to 5
• hdfs dfs -du -h /data/logs.csv : Checks the size of the file
• hdfs dfs -mv logs.csv logs/ : Moves the file to a newly created subdirectory
• hdfs dfs -rm -r logs : Removes the directory from HDFS
• stop-all.sh : Stops the cluster
• start-all.sh : Starts the cluster
• hadoop version : Checks the version of Hadoop
• hdfs fsck / : Checks the health of the files
• hdfs dfsadmin -safemode leave : Turns off the safemode of the NameNode
• hdfs namenode -format : Formats the NameNode
• hadoop [--config confdir] archive -archiveName NAME -p : Creates a Hadoop archive
• hadoop fs [generic options] -touchz <path> ... : Creates an empty file in an HDFS directory
• hdfs dfs [generic options] -getmerge [-nl] <src> <localdst> : Concatenates all the files in a directory into a single local file
• hdfs dfs -chown -R admin:hadoop /new-dir : Changes the owner and group of the directory recursively
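A typical session that chains several of these commands together might look like this (the file and directory names are illustrative):

# Upload a local file into HDFS and verify it (illustrative names)
hdfs dfs -mkdir -p /data
hdfs dfs -put logs.csv /data/
hdfs dfs -ls /data
# Tighten permissions and raise the replication factor
hdfs dfs -chmod 744 /data/logs.csv
hdfs dfs -setrep -w 5 /data/logs.csv
# Check the size and peek at the content
hdfs dfs -du -h /data/logs.csv
hdfs dfs -cat /data/logs.csv | head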
Components of MapReduce

• PayLoad: The applications that implement the Map and Reduce functions and form the core of the job
• MRUnit: Unit test framework for MapReduce
• Mapper: Maps the input key/value pairs to a set of intermediate key/value pairs
• NameNode: The node that manages HDFS
• DataNode: The node where the data is present before any processing takes place
• MasterNode: The node where the JobTracker runs and which accepts job requests from clients
• SlaveNode: The node where the Map and Reduce programs run
• JobTracker: Schedules jobs and tracks the assigned jobs to the TaskTracker
• TaskTracker: Tracks the task and updates the status to the JobTracker
• Job: A program which is an execution of a Mapper and Reducer across a dataset
• Task: An execution of a Mapper and Reducer on a piece of data
• Task Attempt: A particular instance of an attempt to execute a task on a SlaveNode
Commands used to interact with MapReduce

• hadoop job -submit <job-file> : Submits the job created
• hadoop job -status <job-id> : Shows the map and reduce completion status and all job counters
• hadoop job -counter <job-id> <group-name> <counter-name> : Prints the counter value
• hadoop job -kill <job-id> : Kills the job
• hadoop job -events <job-id> <from-event-#> <#-of-events> : Shows the event details received by the JobTracker for the given range
• hadoop job -history [all] <jobOutputDir> : Prints the job details, and the killed and failed tip details
• hadoop job -list [all] : Displays all the jobs
• hadoop job -kill-task <task-id> : Kills the task
• hadoop job -fail-task <task-id> : Fails the task
• hadoop job -set-priority <job-id> <priority> : Changes and sets the priority of the job
• HADOOP_HOME/bin/hadoop job -kill <JOB-ID> : Kills the job created
• HADOOP_HOME/bin/hadoop job -history <DIR-NAME> : Shows the history of the jobs
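For instance, a job packaged as a jar can be submitted with hadoop jar and then tracked with the commands above. The jar name, main class, and job ID below are illustrative, and recent Hadoop releases expose the same sub-commands under mapred job:

# Submit a packaged MapReduce job, then monitor it
# (jar, main class, and job ID are illustrative; newer releases prefer 'mapred job')
hadoop jar analytics.jar com.example.LogAnalyzer /data/logs.csv /data/out
hadoop job -list                              # find the ID of the running job
hadoop job -status job_1623345678901_0001     # map/reduce completion and counters
hadoop job -kill job_1623345678901_0001       # stop the job if something is wrong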
Important commands used in MapReduce

Usage: mapred [Generic commands] <parameters>

• -input directory/file-name : Input location for the mapper
• -output directory-name : Output location for the mapper
• -mapper executable or script or JavaClassName : Mapper executable
• -reducer executable or script or JavaClassName : Reducer executable
• -file file-name : Makes the mapper, reducer, or combiner executable available locally on the computing nodes
• -numReduceTasks : Specifies the number of reducers
• -mapdebug : Script to call when the map task fails
• -reducedebug : Script to call when the reduce task fails
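These parameters are the ones used by Hadoop Streaming jobs. A minimal invocation, assuming two hypothetical local scripts mapper.py and reducer.py, might look like this (the streaming jar path varies by installation):

# Hadoop Streaming job with hypothetical mapper.py and reducer.py scripts
# (the streaming jar location differs between distributions)
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /data/logs.csv \
    -output /data/stream-out \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py \
    -numReduceTasks 2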
YARN commands

• yarn : Shows the yarn help
• yarn [--config confdir] : Defines the configuration file
• yarn [--loglevel loglevel] : Defines the log level, which can be fatal, error, warn, info, debug or trace
• yarn classpath : Shows the Hadoop classpath
• yarn application : Shows and kills the Hadoop applications
• yarn applicationattempt : Shows the application attempt
• yarn container : Shows the container information
• yarn node : Shows the node information
• yarn queue : Shows the queue information
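Most of these commands take further sub-options. A common monitoring sequence, with an illustrative application ID, might be:

# Inspect running YARN applications (the application ID is illustrative)
yarn application -list
yarn application -status application_1623345678901_0001
yarn logs -applicationId application_1623345678901_0001   # collected container logs
yarn application -kill application_1623345678901_0001
yarn node -list                                            # cluster node report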
FURTHERMORE: Big Data Hadoop Certification Training Course
