Shots of HDFS.

Prakash Kumar
1 min read · Oct 13, 2019


What is Hadoop?

Hadoop has two core components:

  1. Distributed storage (i.e., HDFS, which stores files across the machines of a cluster and makes them accessible over the network)
  2. MapReduce (i.e., parallel processing of large data sets; a minimal word-count sketch follows this list)
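
To make the MapReduce side concrete, here is a minimal word-count sketch using the standard org.apache.hadoop.mapreduce Java API. The class name and the input/output paths (taken from the command line) are placeholders, not part of the original article.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // optional local aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input dir in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The map tasks run in parallel on the nodes holding the input blocks, and the framework shuffles each word to a reducer that totals its count.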

What are the features?

  1. Open source
  2. Distributed processing (i.e., parallel processing of data using MapReduce)
  3. Fault tolerance (i.e., it keeps working when a node fails, thanks to the replication factor; see the sketch after this list)
  4. Scalability (i.e., nodes can be added at any time)
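
As an illustration of the replication factor behind fault tolerance, here is a small sketch using the Hadoop FileSystem Java API; the file path is hypothetical and the cluster configuration is assumed to come from the usual core-site.xml/hdfs-site.xml on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();     // reads core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/user/demo/data.txt");  // hypothetical path
    FileStatus status = fs.getFileStatus(file);
    System.out.println("current replication: " + status.getReplication());

    // Ask the NameNode to keep 3 copies of every block of this file,
    // so the data survives the loss of individual DataNodes.
    fs.setReplication(file, (short) 3);
    fs.close();
  }
}
```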

Limitations:

Not suited to low-latency, interactive use by users; HDFS is optimized for high-throughput batch access.

Architecture

HDFS follows a master-slave architecture, where the master is called the “NameNode” and the slaves are called “DataNodes”.

Responsibilities of NameNode:

  1. Keeping track of the directory tree and files (i.e., maintaining metadata: file names, paths, number of data blocks, block IDs, block locations, number of replicas, and file permissions)
  2. Keeping track of where each file's data lives on the cluster (i.e., maintaining the edit log and the fsimage)
  3. Checking the heartbeats of the DataNodes registered with it
  4. Interacting with clients
  5. Executing filesystem namespace operations, i.e., reads, writes, and renaming of directories or files (see the sketch after this list)
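
The sketch below shows a few namespace operations through the Hadoop FileSystem Java API; each call is a metadata operation answered by the NameNode. The paths are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Create and rename a directory: pure metadata changes in the NameNode.
    fs.mkdirs(new Path("/user/demo/reports"));
    fs.rename(new Path("/user/demo/reports"), new Path("/user/demo/reports-2019"));

    // List a directory: the NameNode returns the metadata it tracks per file.
    for (FileStatus st : fs.listStatus(new Path("/user/demo"))) {
      System.out.printf("%s  replication=%d  len=%d%n",
          st.getPath(), st.getReplication(), st.getLen());
    }
    fs.close();
  }
}
```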

Responsibilities of DataNode :

  1. Responding to the NameNode for filesystem operations (block creation, deletion, and replication) and serving read/write requests from clients
  2. Storing the actual blocks of HDFS data on local disks

HDFS also manages replica placement using rack awareness (i.e., the NameNode knows which rack each DataNode belongs to and spreads block replicas across racks), as the sketch below illustrates.
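
Here is a small sketch that asks the NameNode which DataNodes (and racks) hold the blocks of a file; the rack shows up in each block's topology path. The file path is hypothetical, and the topology strings depend on how rack awareness is configured on the cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/user/demo/data.txt");   // hypothetical path

    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation block : blocks) {
      // getTopologyPaths() returns strings such as /rack1/host1:9866,
      // so you can see which racks hold each replica.
      System.out.println("offset " + block.getOffset()
          + "  hosts " + String.join(",", block.getHosts())
          + "  topology " + String.join(",", block.getTopologyPaths()));
    }
    fs.close();
  }
}
```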

Security

  1. User quotas (name quotas and space quotas can be set at the directory level; see the sketch after this list)
  2. Access permissions (read/write/execute permissions on directories and files, plus the sticky bit on directories)
  3. Hard links and soft links are not supported by HDFS.
  4. SafeMode (on startup the NameNode blocks write operations until enough blocks are reported; it can also stay in safe mode when DataNodes fail)
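
To close, a small sketch of permission- and quota-related calls in the FileSystem Java API. The directory, user, and group names are hypothetical; quotas themselves are normally set by an administrator (e.g., with hdfs dfsadmin -setQuota / -setSpaceQuota), and this code only reads them back.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class HdfsSecurityDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/user/demo/shared");   // hypothetical directory

    // rwxr-x---: owner full access, group read/execute, others none.
    fs.setPermission(dir,
        new FsPermission(FsAction.ALL, FsAction.READ_EXECUTE, FsAction.NONE));

    // Changing the owner requires superuser privileges on the cluster.
    fs.setOwner(dir, "demo", "analytics");      // hypothetical user and group

    // Read back the name quota and space quota set on the directory (if any).
    ContentSummary summary = fs.getContentSummary(dir);
    System.out.println("name quota:  " + summary.getQuota());
    System.out.println("space quota: " + summary.getSpaceQuota());
    fs.close();
  }
}
```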
