Hadoop and MapReduce Cheat Sheet

Hadoop and MapReduce provide tools for processing large datasets across clusters of computers. The hdfs dfs commands allow users to interact with HDFS, including listing files, uploading/downloading, changing permissions, and checking file sizes. MapReduce is a programming model used for distributed computing on large datasets using Hadoop, with jobs submitted, tracked, and their completion/counters checked via commands like hadoop job -submit and hadoop job -status.


Hadoop HDFS List File Commands

• hdfs dfs -ls / : Lists all the files and directories for the given HDFS destination path
• hdfs dfs -ls -d /hadoop : Lists all the details of the Hadoop files
• hdfs dfs -ls -R /hadoop : Recursively lists all the files in the Hadoop directory and all its sub-directories
• hdfs dfs -ls hadoop/dat* : Lists all the files in the Hadoop directory starting with 'dat'
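For example, a short listing session might look like the following (the /user/hadoop path and the dat* pattern are only illustrative):

# List the HDFS root, then inspect an (illustrative) user directory
hdfs dfs -ls /
hdfs dfs -ls -R /user/hadoop       # recurse into every sub-directory
hdfs dfs -ls -d /user/hadoop       # show the directory entry itself, not its contents
hdfs dfs -ls /user/hadoop/dat*     # glob: only entries whose names start with 'dat'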

Hadoop & MapReduce


MapReduce Basics

MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of systems, referred to as clusters. Essentially, it is a processing technique and programming model for distributed computing based on Java.
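As a quick illustration, the WordCount example that ships with Hadoop can be run directly from the command line; the exact path and version of the examples jar vary between installations, so the lines below are only a sketch:

# Run the bundled WordCount example on data already stored in HDFS
# (the examples jar path and version differ between distributions)
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /data/input /data/output
hdfs dfs -cat /data/output/part-r-00000    # inspect the reducer output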
Hadoop

Hadoop is a framework designed to handle large volumes of data, both structured and unstructured.

HDFS

Hadoop Distributed File System (HDFS) is a framework designed to manage huge volumes of data in a simple and pragmatic way. It contains a vast number of servers, each storing a part of the file system.

In order to secure Hadoop, configure it with the following aspects:
• Authentication:
  • Define users
  • Enable Kerberos in Hadoop
  • Set up a Knox gateway to control access and authentication to the HDFS cluster
• Authorization:
  • Define groups
  • Define HDFS permissions
  • Define HDFS ACLs
• Audit:
  • Enable a process execution audit trail
• Data protection:
  • Enable wire encryption with Hadoop

Mahout

Apache Mahout is an open-source algebraic framework used for data mining, which works with distributed environments and simple programming languages.

HDFS basic commands

• hdfs dfs -put logs.csv /data/ : Uploads a file from the local file system to HDFS
• hdfs dfs -cat /data/logs.csv : Reads the content of the file
• hdfs dfs -chmod 744 /data/logs.csv : Changes the permission of the file
• hdfs dfs -chmod -R 744 /data/logs.csv : Changes the permissions of the files recursively
• hdfs dfs -setrep -w 5 /data/logs.csv : Sets the replication factor of the file to 5
• hdfs dfs -du -h /data/logs.csv : Checks the size of the file
• hdfs dfs -mv logs.csv logs/ : Moves the file to a newly created subdirectory
• hdfs dfs -rm -r logs : Removes the directory from HDFS
• stop-all.sh : Stops the cluster
• start-all.sh : Starts the cluster
• hadoop version : Checks the version of Hadoop
• hdfs fsck / : Checks the health of the files
• hdfs dfsadmin -safemode leave : Turns off the safemode of the NameNode
• hdfs namenode -format : Formats the NameNode
• hadoop [--config confdir] archive -archiveName NAME -p : Creates a Hadoop archive
• hadoop fs [generic options] -touchz <path> ... : Creates an empty file in an HDFS directory
• hdfs dfs [generic options] -getmerge [-nl] <src> <localdst> : Concatenates all the files in a directory into a single local file
• hdfs dfs -chown -R admin:hadoop /new-dir : Changes the owner and group of the directory recursively
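A typical session that chains several of these commands together might look like this (the file and directory names are illustrative):

# Upload a local file into HDFS and verify it (illustrative names)
hdfs dfs -mkdir -p /data
hdfs dfs -put logs.csv /data/
hdfs dfs -ls /data
# Tighten permissions and raise the replication factor
hdfs dfs -chmod 744 /data/logs.csv
hdfs dfs -setrep -w 5 /data/logs.csv
# Check the size and peek at the content
hdfs dfs -du -h /data/logs.csv
hdfs dfs -cat /data/logs.csv | head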
Components of MapReduce

• PayLoad: The applications that implement the Map and Reduce functions and form the core of the job
• MRUnit: Unit test framework for MapReduce
• Mapper: Maps the input key/value pairs to a set of intermediate key/value pairs
• NameNode: The node that manages HDFS
• DataNode: The node where the data is present before any processing takes place
• MasterNode: The node where the JobTracker runs and which accepts job requests from clients
• SlaveNode: The node where the Map and Reduce programs run
• JobTracker: Schedules jobs and tracks the assigned jobs to the TaskTracker
• TaskTracker: Tracks the task and updates the status to the JobTracker
• Job: A program which is an execution of a Mapper and Reducer across a dataset
• Task: An execution of a Mapper and Reducer on a piece of data
• Task Attempt: A particular instance of an attempt to execute a task on a SlaveNode
Commands used to interact with MapReduce

• hadoop job -submit <job-file> : Submits the job created
• hadoop job -status <job-id> : Shows the map and reduce completion status and all job counters
• hadoop job -counter <job-id> <group-name> <counter-name> : Prints the counter value
• hadoop job -kill <job-id> : Kills the job
• hadoop job -events <job-id> <from-event-#> <#-of-events> : Shows the event details received by the JobTracker for the given range
• hadoop job -history [all] <jobOutputDir> : Prints the job details, and the killed and failed tip details
• hadoop job -list [all] : Displays all the jobs
• hadoop job -kill-task <task-id> : Kills the task
• hadoop job -fail-task <task-id> : Fails the task
• hadoop job -set-priority <job-id> <priority> : Changes and sets the priority of the job
• HADOOP_HOME/bin/hadoop job -kill <JOB-ID> : Kills the job created
• HADOOP_HOME/bin/hadoop job -history <DIR-NAME> : Shows the history of the jobs
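For instance, a job packaged as a jar can be submitted with hadoop jar and then tracked with the commands above. The jar name, main class, and job ID below are illustrative, and recent Hadoop releases expose the same sub-commands under mapred job:

# Submit a packaged MapReduce job, then monitor it
# (jar, main class, and job ID are illustrative; newer releases prefer 'mapred job')
hadoop jar analytics.jar com.example.LogAnalyzer /data/logs.csv /data/out
hadoop job -list                              # find the ID of the running job
hadoop job -status job_1623345678901_0001     # map/reduce completion and counters
hadoop job -kill job_1623345678901_0001       # stop the job if something is wrong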
Important commands used in MapReduce

Usage: mapred [Generic commands] <parameters>

• -input directory/file-name : Input location for the mapper
• -output directory-name : Output location for the mapper
• -mapper executable or script or JavaClassName : Mapper executable
• -reducer executable or script or JavaClassName : Reducer executable
• -file file-name : Makes the mapper, reducer, or combiner executable available locally on the computing nodes
• -numReduceTasks : Specifies the number of reducers
• -mapdebug : Script to call when the map task fails
• -reducedebug : Script to call when the reduce task fails
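These parameters are the ones used by Hadoop Streaming jobs. A minimal invocation, assuming two hypothetical local scripts mapper.py and reducer.py, might look like this (the streaming jar path varies by installation):

# Hadoop Streaming job with hypothetical mapper.py and reducer.py scripts
# (the streaming jar location differs between distributions)
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /data/logs.csv \
    -output /data/stream-out \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py \
    -numReduceTasks 2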
YARN commands

• yarn : Shows the yarn help
• yarn [--config confdir] : Defines the configuration file
• yarn [--loglevel loglevel] : Defines the log level, which can be fatal, error, warn, info, debug or trace
• yarn classpath : Shows the Hadoop classpath
• yarn application : Shows and kills the Hadoop applications
• yarn applicationattempt : Shows the application attempt
• yarn container : Shows the container information
• yarn node : Shows the node information
• yarn queue : Shows the queue information
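Most of these commands take further sub-options. A common monitoring sequence, with an illustrative application ID, might be:

# Inspect running YARN applications (the application ID is illustrative)
yarn application -list
yarn application -status application_1623345678901_0001
yarn logs -applicationId application_1623345678901_0001   # collected container logs
yarn application -kill application_1623345678901_0001
yarn node -list                                            # cluster node report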
FURTHERMORE: Big Data Hadoop Certification Training Course
