Posts

Showing posts from January, 2020

Hadoop Command

Syntax: hadoop fsck /

COMMAND_OPTION                  Description
path                            Start checking from this path.
-delete                         Delete corrupted files.
-files                          Print out files being checked.
-files -blocks                  Print out the block report.
-files -blocks -locations       Print out locations for every block.
-files -blocks -racks           Print out network topology for data-node locations.
-includeSnapshots               Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it.
-list-corruptfileblocks         Print out list of missing blocks and files they belong to.
-move                           Move corrupted files to /lost+found.
-openforwrite                   Print out files opened for write.

HDFS supports the fsck command to check for various inconsist
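As a rough illustration of how the options in the table combine, here is a minimal Python sketch that assembles an fsck argument list. The helper name `build_fsck_command` and the `KNOWN_OPTIONS` set are hypothetical, introduced only for this example; the sketch only builds the command and does not run it, so it makes no claim about what a real cluster would print.

```python
from typing import List

# Flags copied from the option table in the post; this set is an
# assumption of the sketch, not an authoritative list for every Hadoop version.
KNOWN_OPTIONS = {
    "-delete",
    "-files",
    "-blocks",
    "-locations",
    "-racks",
    "-includeSnapshots",
    "-list-corruptfileblocks",
    "-move",
    "-openforwrite",
}


def build_fsck_command(path: str, options: List[str]) -> List[str]:
    """Return the argument list for `hadoop fsck <path> <options...>`.

    Hypothetical helper: it validates flags against KNOWN_OPTIONS and
    returns the list; it never executes anything.
    """
    for opt in options:
        if opt not in KNOWN_OPTIONS:
            raise ValueError(f"unknown fsck option: {opt}")
    return ["hadoop", "fsck", path, *options]


if __name__ == "__main__":
    # Report files, their blocks, and block locations under the root path.
    print(" ".join(build_fsck_command("/", ["-files", "-blocks", "-locations"])))
```

The returned list could be passed to `subprocess.run` on a machine where the `hadoop` client is installed; that step is left out because it depends on the local setup.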

Hadoop Distributed File System

Edge Node:
- Edge nodes are the interface between the Hadoop cluster and the outside network.
- For this reason, they are sometimes referred to as gateway nodes.
- Most commonly, edge nodes are used to run client applications and cluster administration tools.
- An edge node is a machine that is part of the cluster and has the client applications installed.

NameNode:
- The NameNode is the centerpiece of an HDFS file system.
- It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept.
- It does not store the data of these files itself.
- Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file.
- The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives.
- The NameNode maintains two persistent files:
- A transaction log called a
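The division of labour described above (the NameNode holds only metadata, the DataNodes hold the data) can be sketched with a toy in-memory model. This is not Hadoop's API: the class `ToyNameNode`, its methods, and the block and host names are all hypothetical and exist only to show the lookup a client performs.

```python
from typing import Dict, List, Tuple


class ToyNameNode:
    """Toy model of NameNode metadata (hypothetical, not Hadoop's real API).

    It stores which blocks make up each file and which DataNodes hold
    each block. It never stores file contents.
    """

    def __init__(self) -> None:
        # file path -> ordered list of block IDs
        self.file_blocks: Dict[str, List[str]] = {}
        # block ID -> hostnames of DataNodes holding a replica
        self.block_locations: Dict[str, List[str]] = {}

    def add_file(self, path: str, blocks: Dict[str, List[str]]) -> None:
        """Register a file and the DataNodes holding each of its blocks."""
        self.file_blocks[path] = list(blocks)
        for block_id, nodes in blocks.items():
            self.block_locations[block_id] = list(nodes)

    def locate(self, path: str) -> List[Tuple[str, List[str]]]:
        """Return (block ID, DataNode list) pairs for a file, in block order."""
        if path not in self.file_blocks:
            raise FileNotFoundError(path)
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


if __name__ == "__main__":
    nn = ToyNameNode()
    nn.add_file("/logs/app.log", {
        "blk_1": ["datanode-a", "datanode-b"],
        "blk_2": ["datanode-b", "datanode-c"],
    })
    # A client asks where the file lives, then reads from those DataNodes.
    for block_id, nodes in nn.locate("/logs/app.log"):
        print(block_id, nodes)
```

In this sketch the client gets back only locations; reading the actual bytes would be a separate step against the listed DataNodes.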

Hadoop

“90% of the world’s data was generated in the last few years.” Due to the advent of new technologies, devices, and communication means like social networking sites, the amount of data produced by mankind is growing rapidly every year. The amount of data produced by us from the beginning of time till 2003 was 5 billion gigabytes. If you pile up the data in the form of disks it may fill an entire football field. The same amount was created every two days in 2011, and every ten minutes in 2013. This rate is still growing enormously. Though all this information is meaningful and can be useful when processed, it is being neglected.

What is Big Data?
Big data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or tool; rather, it has become a complete subject that involves various tools, techniques, and frameworks.

What Comes Under Big Data?
Big data involves the data produced by dif