Saturday, February 23, 2013

Hive+Xml Processing


First, understand xpath(): this Hive function parses XML data into a string array.
Example: a small XML data set
   <rec><name>Babu</name><age>25</age><sex>male</sex></rec>
   <rec><name>Radha</name><age>23</age><sex>female</sex></rec>
NOTE: XML data is converted into a Hive table in a two-step process
1.      Convert the XML data into array format.
2.      Convert the array data into Hive table format.
Process:
Step1: create the Hive table
     Ex: hive> create table hivexml(str string);
Step2: load the XML data into the Hive table
     Ex: hive> load data local inpath 'xmlfile' into table hivexml;
•  This step loads the local XML data into the Hive table as it is, one record per row. xpath() can then convert that data into string array format, and the array data can then be converted into normal Hive table data.
Step3: convert the XML data into array format
     Ex: hive> select xpath(str,'rec/*/text()') from hivexml;
•  Output:  ["Babu","25","male"]
            ["Radha","23","female"]

Explanation of 'rec/*/text()'
                     rec: the root node of each XML record (check the XML data above)
                       *: matches all the fields (child elements) of the record
                  text(): returns the text content of each matched field
If you want specific fields, simply mention them as below
Ex: hive> select xpath(str,'rec/name/text()') from hivexml;

•  Output:  ["Babu"]
            ["Radha"]
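
To see what the two queries above select, here is a minimal Python sketch that mimics them outside Hive. It is only an analogy: it uses Python's standard xml.etree.ElementTree module rather than Hive's xpath(), and the helper name child_texts is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The two sample records from the example above, one record per line.
rows = [
    "<rec><name>Babu</name><age>25</age><sex>male</sex></rec>",
    "<rec><name>Radha</name><age>23</age><sex>female</sex></rec>",
]

def child_texts(xml_line, field="*"):
    """Return the text of the matching child elements of <rec> as a list.

    field="*" plays the role of 'rec/*/text()',
    field="name" plays the role of 'rec/name/text()'.
    """
    root = ET.fromstring(xml_line)  # root is the <rec> element
    return [child.text for child in root.findall(field)]

all_fields = [child_texts(r) for r in rows]
names_only = [child_texts(r, "name") for r in rows]

for arr in all_fields:
    print(arr)
for arr in names_only:
    print(arr)
```

Each input line gives one list, in the same way that each Hive row gives one string array.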
Step4: create the Hive table with the required columns
     Ex: hive> create table newhivexml(name string, age int, sex string);
•  After creating the table, load the XML data into the newhivexml table as below.
Step5: fill the new table, one xpath_string() call per column
     Ex: hive> insert overwrite table newhivexml select xpath_string(str,'rec/name'), xpath_string(str,'rec/age'), xpath_string(str,'rec/sex') from hivexml;

     hive> select * from newhivexml;
This returns the data in table format, as below.
name    age    sex
Babu    25     male
Radha   23     female
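
The row-building step can also be sketched in Python, again only as an analogy to the Hive query and not as what Hive runs internally. The function name to_row is hypothetical; the standard library's findtext() stands in for xpath_string(), and the int() call mirrors the age column being declared as int.

```python
import xml.etree.ElementTree as ET

# The same two sample records, one per line.
rows = [
    "<rec><name>Babu</name><age>25</age><sex>male</sex></rec>",
    "<rec><name>Radha</name><age>23</age><sex>female</sex></rec>",
]

def to_row(xml_line):
    """Turn one <rec> line into a (name, age, sex) tuple."""
    root = ET.fromstring(xml_line)
    name = root.findtext("name")      # like xpath_string(str,'rec/name')
    age = int(root.findtext("age"))   # age column is int in newhivexml
    sex = root.findtext("sex")
    return (name, age, sex)

table = [to_row(r) for r in rows]

print("name\tage\tsex")
for name, age, sex in table:
    print(f"{name}\t{age}\t{sex}")
```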

Thank you.
This note is only meant to give a basic idea. Please give me your feedback.



Friday, February 8, 2013

Hadoop Ecosystem

 

The diagram above shows the Hadoop ecosystem. It is a combination of different technologies, each doing a different type of work, as the diagram makes clear.
The Hadoop ecosystem consists of:

      Name            Purpose

    1. Hive          (Data Warehouse)
    2. Pig           (Text Mining)
    3. HBase         (Random Operations)
    4. Sqoop         (Export and Import)
    5. Flume         (Streaming Data)
    6. Oozie         (Scheduler and Workflow Design)
    7. ZooKeeper     (State Maintenance)


1. Hive:
  •  Hive is a data warehouse in the Hadoop environment.
  •  It is used to process structured, semi-structured and unstructured data.
  •  Unstructured data can be processed by first converting it into structured data.


2. Pig:
  •  Pig is used for text analytics (mining).
  •  Pig can process XML and JSON data.
  •  Even structured data that cannot be handled in Hive (HQL) can be processed by Pig.
  •  Functionality that Pig does not provide by itself can be added with UDFs (User Defined Functions).
  •  Pig UDFs can be written in the following languages,
                       i.e.: Java, Ruby, Python, JavaScript, C++, etc.
  •  Hive also supports UDFs. Hive UDFs can be written in
                       i.e.: Java, Ruby, Python, C++, R, etc.
  •  When you run queries in Pig, the framework automatically builds the Java MapReduce code and submits it to run on the JVM.
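
As a rough illustration of the Python option mentioned above, one common way to use Python from Hive is a streaming script called through TRANSFORM, which receives rows as tab-separated lines and writes rows back in the same form. The sketch below shows only the per-line logic; the function name transform and the choice to upper-case the name column are assumptions made for this example.

```python
def transform(lines):
    """Upper-case the first column of each tab-separated (name, age, sex) line.

    In a real streaming script the lines would come from sys.stdin and the
    results would be printed to standard output; here a plain list is used
    so the logic can be tried on its own.
    """
    for line in lines:
        name, age, sex = line.rstrip("\n").split("\t")
        yield "\t".join([name.upper(), age, sex])

# Sample input shaped like rows of the newhivexml table from the earlier post.
sample = ["Babu\t25\tmale\n", "Radha\t23\tfemale\n"]
result = list(transform(sample))

for out in result:
    print(out)
```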







Tuesday, February 5, 2013

Hadoop Training

Hi, welcome to Programming Hadoop.