Monday, September 9, 2013

Managed Tables and External Tables

When you create a table in Hive, Hive manages the data by default, which means that it moves the data into its warehouse directory.
Alternatively, you can create an external table, which tells Hive to refer to data at an existing location outside the warehouse directory.

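The difference between the two types of table shows up in the LOAD and DROP semantics. For a managed table, LOAD moves the file into the warehouse directory, and DROP deletes both the metadata and the data. For an external table, Hive does not manage the data, so DROP deletes only the metadata and leaves the files in place.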

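-- Managed table: LOAD moves data.txt into Hive's warehouse directory.
CREATE TABLE managed_table (dummy STRING);
LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE managed_table;

-- External table: Hive only records the location and does not own the data.
CREATE EXTERNAL TABLE external_table (dummy STRING)
  LOCATION '/user/tom/external_table';

-- LOAD moves the file into the table's LOCATION directory. The source file
-- must exist at this path; a previous LOAD of the same path has moved it away.
LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE external_table;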
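
The DROP side of the difference can be sketched with the two tables created above. This is a minimal illustration that assumes Hive's default configuration; the comments describe the expected effect rather than captured output.

```sql
-- Managed table: removes the metadata AND deletes the data files
-- that Hive moved into its warehouse directory.
DROP TABLE managed_table;

-- External table: removes only the metadata. The files under
-- /user/tom/external_table are left untouched (default configuration assumed).
DROP TABLE external_table;
```

After the second statement, the files can still be read by other tools, or exposed to Hive again by re-creating an external table over the same location.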


Which one to use?
As a rule of thumb, if you do all your processing with Hive, use managed tables; if you want to use Hive and other tools on the same dataset, use external tables. A common pattern is to use an external table to access an initial dataset stored in HDFS (created by another process), then use a Hive transform to move the data into a managed Hive table. This works the other way around, too: an external table (not necessarily on HDFS) can be used to export data from Hive for other applications to use.
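
The staging and export patterns described above might look like the following sketch. All table names, the column, and the paths are hypothetical and chosen only for illustration.

```sql
-- Hypothetical names and paths, for illustration only.

-- 1. External table over data written to HDFS by another process.
CREATE EXTERNAL TABLE staging_logs (line STRING)
  LOCATION '/user/tom/staging_logs';

-- 2. Managed table that Hive owns, filled from the external one.
CREATE TABLE logs (line STRING);

INSERT OVERWRITE TABLE logs
SELECT line FROM staging_logs;

-- 3. The other direction: export from Hive through an external table
--    whose location other applications can read.
CREATE EXTERNAL TABLE exported_logs (line STRING)
  LOCATION '/user/tom/exported_logs';

INSERT OVERWRITE TABLE exported_logs
SELECT line FROM logs;
```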
Another reason for using external tables is when you wish to associate multiple schemas with the same dataset.
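
One way to associate multiple schemas with the same dataset is to define two external tables over the same location. The sketch below assumes tab-separated files under a hypothetical path; the table and column names are made up for illustration.

```sql
-- Hypothetical path, table names, and columns.

-- Schema 1: each record as a single raw string.
CREATE EXTERNAL TABLE events_raw (line STRING)
  LOCATION '/user/tom/events';

-- Schema 2: the same files, split into tab-separated fields.
CREATE EXTERNAL TABLE events_parsed (event_time STRING, message STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/user/tom/events';
```

Because both tables are external, dropping either one leaves the shared files in place for the other.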

