Apache Hive
· Hive is a component of the Hadoop ecosystem.
· Hive's query language is called HiveQL (Hive Query Language).
· Hive was introduced at Facebook to fulfill requirements with respect to ETL technology.
· Hive comes with built-in connectors for comma- and tab-separated values (CSV/TSV) text files, Apache Parquet™, Apache ORC™, and other formats.
· The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax.
· Built on top of Apache Hadoop™, Hive provides the following features:
· Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis (see the example query after this list).
· A mechanism to impose structure on a variety of data formats.
· Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™.
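For example, a reporting task that would otherwise need a hand-written MapReduce job can be expressed as a single SQL statement. The table sales(product STRING, amount DOUBLE) below is assumed purely for illustration:
-- total amount per product from a hypothetical sales table
SELECT product, SUM(amount) AS total_amount
FROM sales
GROUP BY product
ORDER BY total_amount DESC;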
Hive Limitations
· Not all "standard" SQL is supported.
· No support for UPDATE or DELETE.
· No support for inserting single rows (see the bulk-load sketch after this list).
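Because of these limitations, writes in this Hive version are bulk, set-based operations rather than row-level statements. A minimal sketch, assuming hypothetical tables txn and staging_txn with matching columns:
-- rewrite the whole target table from a staging table
-- instead of updating or inserting individual rows
INSERT OVERWRITE TABLE txn
SELECT *
FROM staging_txn
WHERE amount > 0;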
How to find the Hive version?
hive --version
Version shipped with Cloudera CDH 5.12.0:
Hive 1.1.0-cdh5.12.0
Delimiters / default file format: ORC (Optimized Row Columnar).
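A table can be stored in ORC explicitly with the STORED AS clause. A minimal sketch; the table name orc_txn and its columns are assumptions:
CREATE TABLE orc_txn (txnno INT, amount DOUBLE, category STRING)
STORED AS ORC;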
Data Types
Primitive Types
Types are associated with the columns in the tables. The following primitive types are supported:
Integers
· TINYINT—1-byte integer
· SMALLINT—2-byte integer
· INT—4-byte integer
· BIGINT—8-byte integer
Boolean type
· BOOLEAN—TRUE/FALSE
Floating point numbers
· FLOAT—single precision
· DOUBLE—double precision
Fixed point numbers
· DECIMAL—a fixed-point value of user-defined scale and precision
String types
· STRING—sequence of characters in a specified character set
· VARCHAR—sequence of characters in a specified character set with a maximum length
· CHAR—sequence of characters in a specified character set with a defined length
Date and time types
· TIMESTAMP—a specific point in time, up to nanosecond precision
· DATE—a date
Binary types
· BINARY—a sequence of bytes
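The sketch below shows how these primitive types appear in a table definition; the table employee_demo and its columns are assumed for illustration:
CREATE TABLE employee_demo (
  emp_id     BIGINT,
  name       STRING,
  is_active  BOOLEAN,
  salary     DECIMAL(10,2),
  join_date  DATE,
  updated_at TIMESTAMP,
  photo      BINARY
);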
This type hierarchy defines how the types are implicitly converted in the query language. Implicit conversion is allowed from a child type to an ancestor type. So when a query expression expects type1 and the data is of type2, type2 is implicitly converted to type1 if type1 is an ancestor of type2 in the type hierarchy. Note that the type hierarchy allows the implicit conversion of STRING to DOUBLE. Explicit type conversion can be done using the cast operator.
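As a sketch of both conversions, assuming a hypothetical table t with a STRING column price_str:
-- implicit conversion: the STRING value is promoted to DOUBLE for the arithmetic
SELECT price_str + 1.0 FROM t;
-- explicit conversion with the cast operator
SELECT CAST(price_str AS DOUBLE) FROM t;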
Complex Types
Complex types can be built up from primitive types and other composite types using:
· Structs: the elements within the type can be accessed using the DOT (.) notation. For example, for a column c of type STRUCT {a INT; b INT}, the a field is accessed by the expression c.a.
· Maps (key-value tuples): the elements are accessed using ['element name'] notation. For example, in a map M comprising a mapping from 'group' -> gid, the gid value can be accessed using M['group'].
· Arrays (indexable lists): the elements in the array have to be of the same type. Elements can be accessed using the [n] notation, where n is a zero-based index into the array. For example, for an array A having the elements ['a', 'b', 'c'], A[1] returns 'b'.
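The following sketch ties the three access notations together; the table complex_demo and its columns are assumptions:
CREATE TABLE complex_demo (
  c   STRUCT<a:INT, b:INT>,
  m   MAP<STRING, INT>,
  arr ARRAY<STRING>
);
-- dot notation for structs, ['key'] for maps, [n] for arrays
SELECT c.a, m['group'], arr[1] FROM complex_demo;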
Apache Pig vs. Hive
Both Apache Pig and Hive are used to create MapReduce jobs, and in some cases Hive operates on HDFS in a similar way to Apache Pig. In the following table, we have listed a few significant points that set Apache Pig apart from Hive.
Apache Pig | Hive
Apache Pig uses a language called Pig Latin. It was originally created at Yahoo. | Hive uses a language called HiveQL. It was originally created at Facebook.
Pig Latin is a data flow language. | HiveQL is a query processing language.
Pig Latin is a procedural language and fits the pipeline paradigm. | HiveQL is a declarative language.
Apache Pig can handle structured, unstructured, and semi-structured data. | Hive is mostly for structured data.
DESCRIBE <table name> and DESCRIBE FORMATTED <table name> show a table's columns and its detailed metadata, respectively.
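For example, against the EXtxn909 table created later in this post:
-- column names and types only
DESCRIBE EXtxn909;
-- adds table type, location, file format, and other metadata
DESCRIBE FORMATTED EXtxn909;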
sudo service hive-metastore status
sudo service hive-server2 status
Useful HDFS safe mode and service commands:
1. hdfs dfsadmin -safemode leave : takes the NameNode out of safe mode.
2. sudo service hadoop-master stop : stops the Hadoop master services.
3. sudo service hadoop-master start : starts the Hadoop master services.
4. hadoop dfsadmin -safemode enter : puts the NameNode into safe mode.
5. hadoop dfsadmin -safemode get : reports whether safe mode is currently on.
6. hadoop dfsadmin -safemode leave : takes the NameNode out of safe mode.
Create Table:
create EXTERNAL table EXtxn909 (txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, city STRING, state STRING, spendby STRING)
row format delimited
fields terminated by ','
stored as textfile;
LOAD DATA LOCAL INPATH '/home/cloudera/emp.txt' OVERWRITE INTO TABLE e1234;
LOAD DATA LOCAL INPATH '/home/cloudera/Pig123' OVERWRITE INTO TABLE student1;
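Once data has been loaded, a table can be queried like any other. For example, a summary query against the EXtxn909 table above (assuming its data file has been loaded in the same way):
SELECT category, SUM(amount) AS total_spend
FROM EXtxn909
GROUP BY category;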