Monday, June 1, 2015

Hadoop Tough Interview Questions

MapReduce:


1. If you want to reuse your reducer class as a combiner class, how should the input and output key-value pairs of the reducer class be related?


2. How do you write a MapReduce program to find the longest word in a given document?
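
One possible approach to question 2, sketched in plain Java rather than against the Hadoop API: each mapper keeps only the longest word of its own split (for instance, emitting it once from cleanup()), and a single reducer picks the longest of those candidates. The class and method names below are hypothetical, and splitting words on whitespace is an assumption.

```java
import java.util.List;

// Plain-Java simulation of the map and reduce logic; no Hadoop classes are used.
class LongestWordSketch {

    // "Map" side: scan the lines of one input split and keep only its longest word.
    // On ties, the word seen first wins.
    static String longestInSplit(List<String> lines) {
        String best = "";
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.length() > best.length()) {
                    best = word;
                }
            }
        }
        return best;
    }

    // "Reduce" side: pick the longest word among the per-split candidates.
    static String longestOverall(List<String> candidates) {
        String best = "";
        for (String candidate : candidates) {
            if (candidate.length() > best.length()) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String split1 = longestInSplit(List.of("I am a good boy"));
        String split2 = longestInSplit(List.of("hadoop mapreduce hive"));
        System.out.println(longestOverall(List.of(split1, split2)));
    }
}
```

Emitting one candidate per mapper keeps the shuffle small; the trade-off is that the final selection runs in a single reducer.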


3. I have a file with the following columns: user ID, login time, and logout time. How do you write a MapReduce program to find the top 5 users in terms of time spent? (Assume each user has only one entry in the file.)
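
A minimal sketch of the selection logic for question 3, again in plain Java rather than the Hadoop API. It assumes comma-separated lines of the form userId,login,logout with both times given as numeric seconds; the question does not specify a file format, so that layout and all names below are hypothetical. In a real job, each mapper could apply this to its split and emit its local top 5, and a single reducer could apply the same selection to the candidates. That works here because each user appears only once.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Plain-Java simulation of a bounded top-N selection; no Hadoop classes are used.
class TopUsersSketch {

    // Returns the IDs of the n users with the largest (logout - login), longest first.
    static List<String> topUsers(List<String> lines, int n) {
        // Min-heap on time spent: the smallest of the current top n is evicted first.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (String line : lines) {
            String[] fields = line.split(",");
            long spent = Long.parseLong(fields[2].trim()) - Long.parseLong(fields[1].trim());
            heap.add(new AbstractMap.SimpleEntry<>(fields[0].trim(), spent));
            if (heap.size() > n) {
                heap.poll(); // drop the user with the least time spent
            }
        }
        // Polling yields ascending order, so reverse to get longest first.
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("u1,0,100", "u2,0,50", "u3,10,500");
        System.out.println(topUsers(lines, 2));
    }
}
```

The heap never holds more than n + 1 entries, so memory use stays constant regardless of the size of the split.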


4. Write a MapReduce program to create phrases such that each phrase contains 2 consecutive words of a line. E.g., if the input is "I am a good boy", then the expected output is {I am, am a, a good, good boy}.
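
The core of question 4 is the per-line logic that a map() method would run. A minimal sketch in plain Java, assuming words are separated by whitespace (the class and method names are hypothetical, and no Hadoop classes are used):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java version of the per-line logic a mapper could apply.
class PhraseSketch {

    // Builds every phrase made of 2 consecutive words in the given line.
    static List<String> phrases(String line) {
        String[] words = line.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        // Stop one word early so that words[i + 1] always exists.
        for (int i = 0; i + 1 < words.length; i++) {
            result.add(words[i] + " " + words[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(phrases("I am a good boy"));
    }
}
```

Because each line is processed independently, a job built on this logic could be map-only unless the phrases also need to be counted or deduplicated.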


5. How can you customize the name of the output file emitted by a mapper/reducer?


6. What are the various methods inside the Mapper class (org.apache.hadoop.mapreduce.Mapper)? Explain each method.



7. What will be the outcome if you comment out the following line of code in the main() of the driver class: 'job.setJarByClass()'?

8. In the driver class, the map output key class is set using 'job.setMapOutputKeyClass(Text.class)' while the signature of the Mapper class is 'Mapper<LongWritable, Text, LongWritable, Text>'. Will it cause any error? If yes, is that a compile-time error or a run-time error?

Hive:


1. What are some Hive query tuning techniques?

2. What is the .hiverc file?


3. How do you set the number of reducers in a Hive query?


4. What is the difference between CLUSTER BY and CLUSTERED BY?


5. What is the difference between a static partition and a dynamic partition?



(More to follow...)
