Top Key Architecture Components in HIVE

Hadoop Hive has five main architectural components:
  • Shell: allows interactive queries, like a MySQL shell connected to a database; also supports web and JDBC clients
  • Driver: session handles, fetch, execute
  • Compiler: parse, plan, optimize
  • Execution engine: DAG of stages (M/R, HDFS, or metadata)
  • Metastore: schema, location in HDFS, SerDe
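
The idea of an execution engine running a DAG of stages can be sketched in a few lines of Python. This is a toy illustration only, not Hive's implementation: the stage names and the `plan` dictionary are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for whatever scheduling the real engine does.

```python
from graphlib import TopologicalSorter

# Hypothetical plan a compiler might hand to the execution engine:
# each stage maps to the set of stages that must finish before it starts.
plan = {
    "stage1_mapreduce_scan": set(),
    "stage2_mapreduce_join": {"stage1_mapreduce_scan"},
    "stage3_hdfs_move": {"stage2_mapreduce_join"},
    "stage4_metadata_update": {"stage3_hdfs_move"},
}


def run_stages(plan):
    """Return the stages in an order that respects their dependencies."""
    return list(TopologicalSorter(plan).static_order())


execution_order = run_stages(plan)
for stage in execution_order:
    print("running", stage)
```

The stage kinds (M/R, HDFS, metadata) mirror the list above; a real plan would usually branch rather than form a single chain.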

Data Model of Hive:
  • Tables
– Typed columns (int, float, string, date, boolean)
– Also collection types such as list (ARRAY in HiveQL) and map (for JSON-like data)
  • Partitions
– e.g., to range-partition tables by date
  • Buckets
– Hash partitions within ranges (useful for sampling, join optimization)
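
How partitions and buckets combine can be shown with a small Python sketch. It is a simplified model, assuming a date partition column and an integer `user_id` bucketing column; the rows, the bucket count, and the plain modulo hash are illustrative assumptions, not Hive's actual hash function.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # assumed bucket count for the illustration


def bucket_for(user_id, num_buckets=NUM_BUCKETS):
    """Pick a bucket from an integer key with a simple modulo hash."""
    return user_id % num_buckets


# Hypothetical rows: (date, user_id, page)
rows = [
    ("2024-01-01", 7, "/home"),
    ("2024-01-01", 12, "/cart"),
    ("2024-01-02", 7, "/search"),
    ("2024-01-02", 9, "/home"),
]

# Partition by date first, then hash the user_id into a bucket.
layout = defaultdict(list)
for date, user_id, page in rows:
    layout[(date, bucket_for(user_id))].append((user_id, page))

for (date, bucket), members in sorted(layout.items()):
    print(date, "bucket", bucket, members)
```

In this sketch the same `user_id` lands in the same bucket number in every partition, which is what makes reading a single bucket a usable sample and lets a join compare matching buckets only.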

HIVE Metastore
  • Database: namespace containing a set of tables
  • Holds table definitions (column types, physical layout)
  • Partition data 
  • Uses JPOX ORM for implementation; can be stored in Derby, MySQL, and many other relational databases
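
The kind of information the metastore holds can be modelled roughly as below. This is a minimal sketch for illustration: the `Database` and `Table` classes, the `page_views` table, and the `dt` partition key are hypothetical names, not Hive's actual metastore schema.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: dict                 # column name -> column type
    location: str                 # physical layout: directory in HDFS
    partition_keys: list = field(default_factory=list)
    partitions: list = field(default_factory=list)


@dataclass
class Database:
    """A namespace containing a set of tables."""
    name: str
    tables: dict = field(default_factory=dict)

    def add_table(self, table):
        self.tables[table.name] = table


default_db = Database("default")
default_db.add_table(Table(
    name="page_views",
    columns={"user_id": "int", "page": "string"},
    location="/home/hive/warehouse/page_views",
    partition_keys=["dt"],
    partitions=["dt=2024-01-01", "dt=2024-01-02"],
))

print(default_db.tables["page_views"].columns)
```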

Physical Layout of HIVE
  • Warehouse directory in HDFS
– e.g., /home/hive/warehouse
  • Tables stored in subdirectories of warehouse
– Partitions, buckets form subdirectories of tables
  • Actual data stored in flat files
– Control char-delimited text, or SequenceFiles
– With custom SerDe, can use arbitrary format
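
The layout above can be sketched in Python as follows. The table name `page_views`, the partition column `dt`, and the sample line are hypothetical; the `column=value` directory name and the Ctrl-A (`\x01`) field delimiter follow common Hive conventions for partitions and delimited text, and should be treated as assumptions for any particular table.

```python
import posixpath

WAREHOUSE_DIR = "/home/hive/warehouse"  # example warehouse path from the text
FIELD_DELIMITER = "\x01"                # Ctrl-A control character


def partition_dir(table, partition_col, partition_value):
    """Build the subdirectory that would hold one partition of a table."""
    return posixpath.join(
        WAREHOUSE_DIR, table, f"{partition_col}={partition_value}"
    )


def parse_line(line, column_names):
    """Split one control-char-delimited text line into a column dict."""
    values = line.rstrip("\n").split(FIELD_DELIMITER)
    return dict(zip(column_names, values))


path = partition_dir("page_views", "dt", "2024-01-01")
print(path)

record = parse_line("7\x01/home\n", ["user_id", "page"])
print(record)
```

Every value comes back as a string here; turning the raw text into typed columns is the part a SerDe would handle.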
