Posts

Showing posts from February, 2020

Sqoop

Traditional application management systems, in which applications interact with relational databases through an RDBMS, are one of the sources that generate Big Data. This data is stored on relational database servers in relational (table) structures. When the Big Data storage and analysis tools of the Hadoop ecosystem, such as MapReduce, Hive, HBase, Cassandra, and Pig, came into the picture, they needed a tool that could talk to relational database servers to import and export the data stored in them. Sqoop fills this place in the Hadoop ecosystem, providing practical interaction between relational database servers and Hadoop's HDFS.

Sqoop: "SQL to Hadoop and Hadoop to SQL". Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system back to relational databases. It is provided by the Apache
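As a minimal sketch of what the two directions look like in practice, the Python helpers below only assemble `sqoop import` and `sqoop export` command lines; they do not run anything. The flags `--connect`, `--username`, `-P`, `--table`, `--target-dir`, `-m` and `--export-dir` are standard Sqoop options. The helper names, the MySQL JDBC URL, the user name, the table names and the HDFS paths are all hypothetical placeholders.

```python
import shlex
from typing import List


def sqoop_import_cmd(jdbc_url: str, username: str, table: str,
                     target_dir: str, mappers: int = 1) -> List[str]:
    """Build a `sqoop import` command line: RDBMS table -> HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC URI of the source database
        "--username", username,
        "-P",                         # prompt for the password on the console
        "--table", table,             # source table to copy
        "--target-dir", target_dir,   # HDFS directory that receives the files
        "-m", str(mappers),           # number of parallel map tasks
    ]


def sqoop_export_cmd(jdbc_url: str, username: str, table: str,
                     export_dir: str) -> List[str]:
    """Build a `sqoop export` command line: HDFS directory -> existing RDBMS table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,             # target table must already exist
        "--export-dir", export_dir,   # HDFS directory holding the data to export
    ]


if __name__ == "__main__":
    # Hypothetical connection details and paths, for illustration only.
    url = "jdbc:mysql://dbhost:3306/shop"
    print(shlex.join(sqoop_import_cmd(url, "etl", "customers", "/user/etl/customers", 4)))
    print(shlex.join(sqoop_export_cmd(url, "etl", "customer_summary", "/user/etl/summary")))
```

To actually run one of these commands, pass the list to `subprocess.run(cmd, check=True)` on a machine where the Sqoop client is installed and configured. Keeping the arguments as a list, rather than a single string, avoids shell-quoting problems.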

Hive vs. Pig

Differences between Hive and Pig:
1. Language: Hive is declarative and uses HiveQL (HQL), which is similar to SQL. Pig is procedural and uses Pig Latin.
2. Users: Hive is used mainly by data analysts. Pig is used by researchers and programmers.
3. Data: Hive works mainly on structured data. Pig works on structured, semi-structured, and unstructured data.
4. Where it runs: Hive operates on the server side of the cluster. Pig operates on the client side of the cluster.
5. Partitioning: Hive supports partitioning of data. Pig does not support partitioning.
6. Speed: Hive does not load data quickly, but it executes queries quickly. Pig loads data quickly and efficiently.
7. Metadata: Hive keeps table metadata in a separate metastore database (an RDBMS such as Derby or MySQL). Pig has no separate metadata database and works directly on files in HDFS.
8. Origin: Hive was first developed at Facebook. Pig was first developed at Yahoo.

Arithmetic Operations in Hive

Arithmetic operators:
· Arithmetic operators in Hive support various arithmetic operations on their operands.
· All of them return number types.
· If any operand is NULL, the result is also NULL.

The arithmetic operators are:
· A + B (number types): the result of adding A and B
· A - B (number types): the result of subtracting B from A
· A * B (number types): the product of A and B
· A / B (number types): the quotient of A and B
· A % B (number types): the remainder (modulus) of A divided by B
· A & B (number types): the bitwise AND of A and B
· A | B (number types): the bitwise OR of A and B
· A ^ B (number types): the bitwise XOR of A and B
· ~A
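The NULL rule above can be modelled in a few lines of Python, with `None` standing in for SQL NULL. This is an illustrative sketch, not Hive's implementation. The helper names are hypothetical. `/` uses true division because Hive's `A / B` usually returns a DOUBLE. `%` assumes Hive follows Java's remainder rule, where the sign follows the dividend. Returning NULL on division by zero is an assumption about Hive's behaviour.

```python
import operator
from typing import Callable, Dict, Optional, Union

Num = Union[int, float]


def java_mod(a: Num, b: Num) -> Num:
    """Remainder whose sign follows the dividend, as Java's % does (assumed to match Hive)."""
    r = abs(a) % abs(b)
    return r if a >= 0 else -r


# Operators from the list above; the keys mirror Hive's symbols.
BINARY_OPS: Dict[str, Callable[[Num, Num], Num]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,   # Hive's A / B usually yields a DOUBLE
    "%": java_mod,
    "&": operator.and_,      # bitwise operators expect integer operands
    "|": operator.or_,
    "^": operator.xor,
}


def hive_binary(op: str, a: Optional[Num], b: Optional[Num]) -> Optional[Num]:
    """Apply a Hive-style binary operator; None plays the role of NULL."""
    if a is None or b is None:
        return None          # any NULL operand makes the result NULL
    if op in ("/", "%") and b == 0:
        return None          # assumption: division by zero yields NULL rather than an error
    return BINARY_OPS[op](a, b)


if __name__ == "__main__":
    print(hive_binary("+", 2, 3))     # 5
    print(hive_binary("+", 2, None))  # None
    print(hive_binary("/", 7, 2))     # 3.5
    print(hive_binary("%", -7, 3))    # -1
    print(hive_binary("^", 6, 3))     # 5
```

Python's own `%` would give `2` for `-7 % 3`, which is why the sketch uses a separate `java_mod` helper.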

Partitioning and Bucketing in Hive

Partitioning and bucketing in Hive: both are query optimization techniques.

Partitioning:
· Partitioning organizes data by dividing a table into parts based on one or more partition keys (columns), so queries that filter on a partition key read only the matching partitions.
· It suits columns with many repeated values, because all rows that share a value are stored together in one partition.
· Each Hive table can have one or more partitions, based on the partition column(s).
· Settings to enable dynamic partitioning (set in the Hive session):
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

Bucketing:
· Bucketing segregates a Hive table (or each of its partitions) into multiple files.
· Rows are assigned to buckets by a hash function applied to a column present in the table.
· Setting to enforce bucketing:
SET hive.enforce.bucketing=true;

Partitioned table using bucketing:
create table txnByCat(txnno INT, txndate STRING, custno INT, amount DOUBLE, product
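To make the two layouts concrete, here is a Python sketch that groups rows the way a partitioned, bucketed table is stored. There is one `<column>=<value>` directory per partition value, and within each directory a bucket is chosen by hashing the bucketing column. Several assumptions apply. The bucketing column is an INT. The classic (pre-Hive 3) scheme is used, where an INT hashes to its own value and the bucket is `(hash & Integer.MAX_VALUE) % numBuckets`. Hive 3 introduced a different, Murmur3-based hash for new tables. The function names, the `category` partition column, the choice of `custno` as bucketing column, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]

JAVA_INT_MAX = 0x7FFFFFFF  # Integer.MAX_VALUE, used to clear the sign bit


def bucket_for_int(value: int, num_buckets: int) -> int:
    """Classic Hive bucket number for an INT column (its hash is assumed to be the value itself)."""
    return (value & JAVA_INT_MAX) % num_buckets


def layout(rows: List[Row], partition_col: str, bucket_col: str,
           num_buckets: int) -> Dict[Tuple[str, int], List[Row]]:
    """Group rows by (partition directory, bucket number), as a partitioned and bucketed table stores them."""
    files: Dict[Tuple[str, int], List[Row]] = defaultdict(list)
    for row in rows:
        part_dir = f"{partition_col}={row[partition_col]}"   # e.g. category=Gymnastics
        bucket = bucket_for_int(int(row[bucket_col]), num_buckets)
        # The partition value lives in the directory name, not in the data file.
        data = {k: v for k, v in row.items() if k != partition_col}
        files[(part_dir, bucket)].append(data)
    return dict(files)


if __name__ == "__main__":
    # Hypothetical transactions; custno is the assumed bucketing column.
    sample = [
        {"txnno": 1, "custno": 11, "amount": 40.5, "category": "Gymnastics"},
        {"txnno": 2, "custno": 21, "amount": 12.0, "category": "Gymnastics"},
        {"txnno": 3, "custno": 12, "amount": 99.9, "category": "Team Sports"},
    ]
    for (directory, bucket), contents in sorted(layout(sample, "category", "custno", 10).items()):
        print(directory, "bucket", bucket, [r["txnno"] for r in contents])
```

Because rows with the same bucketing value always land in the same bucket, joins and sampling on that column can work bucket by bucket instead of scanning the whole partition.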

Apache Hive

· Hive is one of the components of Hadoop.
· Hive's query language is called HiveQL (Hive Query Language).
· Hive was introduced at Facebook to meet its ETL (extract, transform, load) requirements.
· Hive comes with built-in connectors for comma- and tab-separated values (CSV/TSV) text files, Apache Parquet™, Apache ORC™, and other formats.
· The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets that reside in distributed storage and are queried using SQL syntax.
· Built on top of Apache Hadoop™, Hive provides the following features:
  · Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
  · A mechanism to impose structure on a variety of data formats.
  · Access to files stored either directly in Apache HDFS™ or in other data storage systems
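Imposing structure on data formats is often described as schema-on-read. The files stay as plain text, and the table schema is applied only when the data is read. The following hedged Python sketch shows that idea for comma-separated text. It mimics how Hive's text format treats imperfect input: a field that cannot be converted to its column type, or a missing trailing field, is read as NULL. The schema, function names, and sample lines are hypothetical. Hive's own default field delimiter for text tables is `\001` (Ctrl-A), not a comma.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical schema standing in for a DDL like: (id INT, name STRING, amount DOUBLE)
Schema = List[Tuple[str, Callable[[str], object]]]


def convert(conv: Callable[[str], object], raw: str) -> Optional[object]:
    """Convert one raw field; a value that does not fit the column type becomes NULL (None)."""
    try:
        return conv(raw)
    except ValueError:
        return None


def read_delimited(lines: Iterable[str], schema: Schema, sep: str = ",") -> List[Dict[str, object]]:
    """Apply the schema at read time; the underlying text is never rewritten."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        row = {}
        for i, (name, conv) in enumerate(schema):
            # Missing trailing fields read as NULL; extra fields are ignored.
            row[name] = convert(conv, fields[i]) if i < len(fields) else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    schema: Schema = [("id", int), ("name", str), ("amount", float)]
    for r in read_delimited(["1,alice,9.5", "x,bob", "3,carol,2.0,extra"], schema):
        print(r)
```

Because the schema lives outside the files, the same text can be re-read under a different schema without rewriting the data. This is the flexibility that lets Hive sit on top of files already stored in HDFS.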