Hive+Xml Processing
First to
understand XPATH() : by using this
to parse XML data into String array.
Example:
small xml data
<rec><name>Babu</name><age>25</age><sex>male</sex></rec>
<rec><name>Radha</name><age>23</age><sex>female</sex></rec>
NOTE: xml data converted into hive table in two
steps process
1. Convert the xml data into array
format
2. Array data can be converted into hive
table format.
Process :
Step1:
create the hive table
Ex: Hive>create table hivexml(str
string);
Step2: load
the xmldata into hive table
EX:
Hive>load data local inpath ‘xmlfile’ into table hivexml;
Ø By this step load all local xml data
into your hive table astise, so we can convert that data into STRING ARRAY
format by using XPATH(), And then we
can convert the array data into normal hive table data,
Step3: convert the xml data into array format
EX:
Hive>select xpath(str,’rec/*/text()’) from xmlhive;
Ø OutPut: ["Babu”,”25”,”male”]
[“Radha”,”23”,”female”]
Explanation of ‘rec/*/text()’
rec: its define Node of
xml same as XML DATA (Check the xml
data)
*: its define all the
fields of xml data.
If you want specific fields simply mansion it like below
Ex: Hive>select
xpath(str,’rec/name/text()’)from xmlhive;
Ø OutPut: [“Babu”]
[“Radha”]
Step4: crate the HIVE table required columns
EX: Hive> create table newhivexml(name string,age int,sex
string);
Ø After creating the table to load the
xml array format data into newhivexml table like below
Step5:
Hive> insert
overwrite table newhivexml select
xpath_string(str,'rec/name'),xpath_string(str,'rec/age'),xpath_string(str,'rec/sex')from
hivexml;
Hive>select
* from newhivwxml ;
To get the
data in table format like below.
name age
sex
Babu 25
male
Radha 23
female
Thank you.
* This note only for to get some basic idea purpose give me
your feedback