Download this data set containing the complete works of William Shakespeare, then write MapReduce programs to answer the following questions:

Download the file from here.

Q1: Which word has the highest frequency of occurrence in the document?

  • ‘the’ (25,513 times)
  • ‘a’ (25,513 times)
  • ‘the’ (34,512 times)
  • ‘a’ (34,512 times)
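
One way to approach Q1 is the classic word-count job followed by a maximum over the totals. The sketch below simulates the map, shuffle, and reduce phases in plain Python rather than on a real cluster. The function names are hypothetical, and splitting on whitespace and lowercasing every token are assumptions, not something the question specifies; a different tokenization will give different counts.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit (word, 1) for each whitespace-separated token.

    Assumption: tokens are lowercased so 'The' and 'the' count together.
    """
    for token in line.lower().split():
        yield token, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: sum the 1s emitted for a single word."""
    return word, sum(counts)


def most_frequent_word(lines):
    """Run the three phases and return the (word, count) pair with the top count."""
    pairs = (pair for line in lines for pair in map_words(line))
    totals = [reduce_counts(word, counts) for word, counts in shuffle(pairs).items()]
    return max(totals, key=lambda pair: pair[1])
```

To try it on the downloaded file, pass an open file object as `lines`, for example `most_frequent_word(open(path, encoding="utf-8"))`, where `path` is wherever you saved the data set.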


Q2: What is the frequency of occurrence of the word ‘Romeo’? (Ignore cases.)

  • 48
  • 49
  • 47
  • 50
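
For Q2, the mapper only needs to emit a count when it sees the target word, and the reducer sums those counts. The following is a minimal sketch with hypothetical function names. It assumes that "ignore cases" means a lowercase comparison, and it additionally assumes that punctuation attached to a token (as in "Romeo," or "Romeo!") should be stripped before comparing; the question does not say this, and the count you get depends on that choice.

```python
import string


def map_target(line, target="romeo"):
    """Map step: emit (target, 1) for each token that matches the target word.

    Assumptions: comparison is case-insensitive, and leading/trailing
    punctuation is stripped from each token before comparing.
    """
    for token in line.split():
        word = token.strip(string.punctuation).lower()
        if word == target:
            yield target, 1


def reduce_sum(key, values):
    """Reduce step: add up the counts emitted for the key."""
    return key, sum(values)


def count_word(lines, target="romeo"):
    """Return how many tokens in `lines` match `target` under the assumptions above."""
    values = [value for line in lines for _, value in map_target(line, target)]
    return reduce_sum(target, values)[1]
```

Note that a possessive such as "Romeo's" is not counted here, because only punctuation at the ends of a token is removed.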


Q3: Which word in the document is the longest? (You do not need to remove the punctuation marks from the words.) [Hint: The length of a single word should not exceed 14 characters.]

  • “circumference.”
  • “Shakespeare.”
  • “Hamlet”
  • “Julius.”
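
Q3 can be sketched as a job in which every mapper sends its longest token to a single key and the reducer keeps the longest of those. The function names and the constant `MAX_LEN` are hypothetical. Punctuation is left attached, as the question allows. The sketch reads the hint as meaning that any token longer than 14 characters should be discarded, which is an assumption about how the hint is meant to be applied.

```python
MAX_LEN = 14  # taken from the hint; treating it as a hard filter is an assumption


def map_longest(line, max_len=MAX_LEN):
    """Map step: emit the longest token on this line that fits within max_len.

    Punctuation is kept as part of the token.
    """
    candidates = [token for token in line.split() if len(token) <= max_len]
    if candidates:
        yield "longest", max(candidates, key=len)


def reduce_longest(key, words):
    """Reduce step: keep the longest word seen for the key (first one wins on ties)."""
    return key, max(words, key=len)


def longest_word(lines):
    """Return the longest qualifying token in `lines`, or None if there is none."""
    words = [word for line in lines for _, word in map_longest(line)]
    if not words:
        return None
    return reduce_longest("longest", words)[1]
```

Because ties are resolved in favor of the first word encountered, several words of the same maximum length may exist in the document; compare the result against the listed options rather than assuming it is unique.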