Hive+Xml Processing
First, understand xpath(): this Hive function parses XML data into a string array.
Example: a small XML data set
<rec><name>Babu</name><age>25</age><sex>male</sex></rec>
<rec><name>Radha</name><age>23</age><sex>female</sex></rec>
NOTE: The XML data is converted into a Hive table in a two-step process:
1. Convert the XML data into array format.
2. Convert the array data into normal Hive table format.
Process:
Step1: Create the Hive table.
Ex: Hive> create table hivexml(str string);
Step2: Load the XML data into the Hive table.
Ex: Hive> load data local inpath 'xmlfile' into table hivexml;
This step loads all the local XML data into your Hive table as-is, one line per row. We can then convert that data into string-array format using xpath(), and afterwards convert the array data into normal Hive table data.
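To see what was loaded, you can first query the raw rows (a quick check, assuming the sample XML above with one <rec> element per line):
Ex: Hive> select str from hivexml;
Output: <rec><name>Babu</name><age>25</age><sex>male</sex></rec>
<rec><name>Radha</name><age>23</age><sex>female</sex></rec>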
Step3: Convert the XML data into array format.
Ex: Hive> select xpath(str,'rec/*/text()') from hivexml;
Output: ["Babu","25","male"]
["Radha","23","female"]
Explanation of 'rec/*/text()':
rec: the root node of each XML record (check the XML data above).
*: matches all the fields (child elements) of the XML data.
text(): returns the text value of each matched element.
If you want specific fields, simply mention them as below:
Ex: Hive> select xpath(str,'rec/name/text()') from hivexml;
Output: ["Babu"]
["Radha"]
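Besides xpath(), which returns an array, Hive also provides scalar variants such as xpath_string() and xpath_int() that return a single value per row; Step5 below relies on these. A minimal sketch against the same hivexml table:
Ex: Hive> select xpath_string(str,'rec/name'), xpath_int(str,'rec/age') from hivexml;
Output: Babu 25
Radha 23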
Step4: Create the Hive table with the required columns.
Ex: Hive> create table newhivexml(name string, age int, sex string);
After creating the table, load the extracted XML field values into the newhivexml table as below.
Step5: Insert the extracted values into the new table.
Hive> insert overwrite table newhivexml select xpath_string(str,'rec/name'), xpath_string(str,'rec/age'), xpath_string(str,'rec/sex') from hivexml;
Hive> select * from newhivexml;
This returns the data in table format, like below:
name     age    sex
Babu     25     male
Radha    23     female
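Because the age column in newhivexml is declared as int, the xpath_string() call for age relies on Hive's implicit cast from string to int. A variant of Step5 using the typed UDF xpath_int() (a sketch, not part of the original steps) would be:
Hive> insert overwrite table newhivexml select xpath_string(str,'rec/name'), xpath_int(str,'rec/age'), xpath_string(str,'rec/sex') from hivexml;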
Thank you.
* This note is only meant to give a basic idea; please share your feedback.
Hi kali,
I am following your blog post for Hive+Xml Processing on aws ec2 instance and running the commands on centos. All was working fine until i ran the following command:
"select xpath(data,'Food_Display_Table/*/text()') from food_display_table_xml;"
This command is giving me the following error message:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:225)
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1438)
at org.apache.hadoop.ipc.Client.call(Client.java:1118)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at $Proxy11.mkdirs(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1641)
... 26 more
Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
hi kaali,
thank you for the stuff. the code which you gave is only working in hive 0.7 version but not in 10.1 ..... may i know why?