Apache Hive

·        Hive is one of the components of the Hadoop ecosystem.

·        Hive's query language is called HiveQL (Hive Query Language).

·        Hive was introduced at Facebook to meet its ETL (extract/transform/load) requirements.

·        Hive comes with built-in connectors for comma- and tab-separated values (CSV/TSV) text files, Apache Parquet, Apache ORC, and other formats.

·        The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. 

 

·         Built on top of Apache Hadoop™, Hive provides the following features:

·         Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.

·         A mechanism to impose structure on a variety of data formats

·         Access to files stored either directly in Apache HDFS or in other data storage systems such as Apache HBase.

 

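Hive Limitations

·        Not all "standard" SQL is supported.

·        UPDATE and DELETE are not supported on ordinary tables; since Hive 0.14 they work only on transactional (ACID) tables.

·        Hive is not designed for inserting single rows; INSERT ... VALUES exists since Hive 0.14, but data is normally loaded in bulk.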

 

How to find the Hive version?

hive --version

Version shipped with Cloudera CDH 5.12.0:

Hive 1.1.0-cdh5.12.0

 

 

 

 

 

 

Delimiters  /        

File format: ORC (Optimized Row Columnar). Note that Hive's own default file format is TextFile unless hive.default.fileformat is changed.

Data Types

Primitive Types

Types are associated with the columns in the tables. The following Primitive types are supported:

Integers

·         TINYINT—1 byte integer

·         SMALLINT—2 byte integer

·         INT—4 byte integer

·         BIGINT—8 byte integer

 

Boolean type

·         BOOLEAN—TRUE/FALSE

 

Floating point numbers

·         FLOAT—single precision

·         DOUBLE—Double precision

 

Fixed point numbers

·         DECIMAL—a fixed point value of user defined scale and precision

 

String types

·         STRING—sequence of characters in a specified character set

·         VARCHAR—sequence of characters in a specified character set with a maximum length

·         CHAR—sequence of characters in a specified character set with a defined length

 

Date and time types

·         TIMESTAMP—a specific point in time, up to nanosecond precision

·         DATE—a date

 

Binary types

·         BINARY—a sequence of bytes

The primitive types form a hierarchy (not reproduced here) that defines how types are implicitly converted in the query language. Implicit conversion is allowed from a child type to an ancestor: when a query expression expects type1 and the data is of type2, type2 is implicitly converted to type1 if type1 is an ancestor of type2 in the hierarchy. For example, a TINYINT value can be used where a BIGINT is expected. Note that the type hierarchy also allows the implicit conversion of STRING to DOUBLE.

Explicit type conversion can be done using the cast operator, for example CAST('1' AS INT).
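The child-to-ancestor rule can be sketched in a few lines of Python. This is only a simplified model, not Hive code: the PARENT table and the function name can_convert_implicitly are assumptions made for illustration, and the table covers just the numeric types and STRING.

```python
# Simplified model of Hive's implicit-conversion rule: a value of type `src`
# may be used where `dst` is expected if `dst` is `src` itself or an ancestor.
# PARENT is an illustrative assumption and covers only a subset of types.
PARENT = {
    "TINYINT": "SMALLINT",
    "SMALLINT": "INT",
    "INT": "BIGINT",
    "BIGINT": "FLOAT",
    "FLOAT": "DOUBLE",
    "STRING": "DOUBLE",
}


def can_convert_implicitly(src, dst):
    """Return True if dst equals src or is an ancestor of src in PARENT."""
    current = src
    while current is not None:
        if current == dst:
            return True
        current = PARENT.get(current)  # None once the top of the chain is reached
    return False


if __name__ == "__main__":
    print(can_convert_implicitly("TINYINT", "BIGINT"))  # child to ancestor
    print(can_convert_implicitly("STRING", "DOUBLE"))   # the special case noted above
    print(can_convert_implicitly("DOUBLE", "INT"))      # ancestor to child needs CAST
```

Conversions that go the other way, from an ancestor down to a child, are the ones that need an explicit cast.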

Complex Types

Complex Types can be built up from primitive types and other composite types using:

·         Structs: the elements within the type can be accessed using the DOT (.) notation. For example, for a column c of type STRUCT {a INT; b INT}, the a field is accessed by the expression c.a

·         Maps (key-value tuples): The elements are accessed using ['element name'] notation. For example, in a map M comprising a mapping from 'group' -> gid, the gid value can be accessed using M['group']

·         Arrays (indexable lists): The elements in the array must all be of the same type. Elements can be accessed using the [n] notation where n is an index (zero-based) into the array. For example, for an array A having the elements ['a', 'b', 'c'], A[1] returns 'b'.
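The three access notations have close Python analogues, which may help if you are new to HiveQL. The sketch below uses Python stand-ins rather than Hive itself; the class name C and the values 10, 20 and 500 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class C:
    """Stand-in for a column c of type STRUCT {a INT; b INT}."""
    a: int
    b: int


c = C(a=10, b=20)        # hypothetical field values
M = {"group": 500}       # hypothetical gid value stored under 'group'
A = ["a", "b", "c"]      # the array used in the text

struct_field = c.a       # DOT notation, like c.a in HiveQL
map_value = M["group"]   # ['element name'] notation, like M['group']
array_item = A[1]        # zero-based index, so this is the second element

if __name__ == "__main__":
    print(struct_field, map_value, array_item)
```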

 

Apache Pig Vs Hive

Both Apache Pig and Hive are used to create MapReduce jobs, and in some cases Hive operates on HDFS in a similar way to Apache Pig. The following table lists a few significant points that set Apache Pig apart from Hive.

 

Apache Pig | Hive

Apache Pig uses a language called Pig Latin. It was originally created at Yahoo. | Hive uses a language called HiveQL. It was originally created at Facebook.

Pig Latin is a data flow language. | HiveQL is a query processing language.

Pig Latin is a procedural language and it fits in the pipeline paradigm. | HiveQL is a declarative language.

Apache Pig can handle structured, unstructured, and semi-structured data. | Hive is mostly for structured data.

 

 

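DESCRIBE <table name> lists a table's columns and types; DESCRIBE FORMATTED <table name> adds detailed metadata such as location, table type, and storage format.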

 

 

sudo service hive-metastore status
sudo service hive-server2 status


1.     hdfs dfsadmin -safemode leave   :         Leaves safe mode.

2.     sudo service hadoop-master stop   :         To stop

3.     sudo service hadoop-master start   :         To start

4.     hadoop dfsadmin -safemode enter :         Puts the NameNode into safe mode.

5.     hadoop dfsadmin -safemode get    :         Reports whether safe mode is on or off.

6.     hadoop dfsadmin -safemode leave :         Leaves safe mode.

 

 

Create Table:

CREATE EXTERNAL TABLE EXtxn909 (txnno INT, txndate STRING, custno INT, amount DOUBLE,
category STRING, product STRING, city STRING, state STRING, spendby STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
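To get a feel for what the table definition above means for a comma-delimited text file, here is a rough Python model of how one line maps onto the declared columns. It is a sketch, not Hive's actual reader: the names SCHEMA and parse_row and the sample line are hypothetical, and it assumes that a field which is missing or cannot be parsed as its declared type is read as NULL (None here).

```python
# Column names and types copied from the EXtxn909 definition.
SCHEMA = [
    ("txnno", int), ("txndate", str), ("custno", int), ("amount", float),
    ("category", str), ("product", str), ("city", str), ("state", str),
    ("spendby", str),
]


def parse_row(line, delimiter=","):
    """Split one text line on the delimiter and type each field by position."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (name, convert) in enumerate(SCHEMA):
        if i >= len(fields):
            row[name] = None          # missing trailing field -> NULL (assumption)
            continue
        try:
            row[name] = convert(fields[i])
        except ValueError:
            row[name] = None          # unparseable value -> NULL (assumption)
    return row


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real data file.
    sample = "1,06-26-2011,4007024,40.33,Exercise,Cardio,Clarksville,Tennessee,credit"
    print(parse_row(sample))
```

The point of the sketch is that the structure is applied when the file is read: the file itself stays plain text, and the column list only says how to interpret each position.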

 

LOAD DATA LOCAL INPATH '/home/cloudera/emp.txt' OVERWRITE INTO TABLE e1234;

LOAD DATA LOCAL INPATH '/home/cloudera/Pig123' OVERWRITE INTO TABLE student1;

 

