Question
Computer Science
Posted 10 months ago
Consider the classical MapReduce application that counts the total number of occurrences of words in a given text; for example, see the WordCount application in WordCount.java.
Assume the following classification of words depending on the length of each word.
very short: 1 <= length <= 3
short: 4 <= length <= 5
medium: 6 <= length <= 8
long: 9 <= length <= 12
X long: 13 <= length <= 15
XX long: 16 <= length
Extend the Java code of the application so that it counts the total number of words in each category in a given text. For example, the distribution of words in a text that consists of 90 words could be the following.
X short: 10 words
short: 15 words
medium: 35 words
long: 20 words
X long: 10 words
XX long: 0 words
Save your solution in a file solution3.java.
When ready, compile the code, create a jar file, and run your application. To test your application, you can use the included file grep.txt. Display the results created by the application. When finished, copy and paste the messages from the terminal screen into a file solution3.pdf.
Answer from Sia
Posted 10 months ago
Explanation
The provided Java code is an extension of the classical WordCount MapReduce application. It includes a Mapper that categorizes words based on their length and a Reducer that sums up the counts for each category. The main function sets up the MapReduce job configuration.
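As a concrete illustration of the categorization logic described here, a minimal sketch of the categorizeWord helper mentioned in the steps below might look as follows; the boundary checks simply follow the category definitions in the question, and everything else is illustrative.

```java
// Hypothetical helper: maps a word to one of the six categories
// defined in the question (length boundaries taken from the assignment text).
private static String categorizeWord(String word) {
    int len = word.length();
    if (len <= 3)  return "very short";  // 1 <= length <= 3
    if (len <= 5)  return "short";       // 4 <= length <= 5
    if (len <= 8)  return "medium";      // 6 <= length <= 8
    if (len <= 12) return "long";        // 9 <= length <= 12
    if (len <= 15) return "X long";      // 13 <= length <= 15
    return "XX long";                    // length >= 16
}
```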
Step-by-step Instruction
Define the WordCount class that will contain the Mapper, Reducer, and main method (a complete sketch follows this list)
Create the TokenizerMapper class to tokenize the input text and categorize words based on length
Implement the map method to output the word category as the key and the count '1' as the value
Define a helper method categorizeWord to determine the category of a word based on its length
Create the IntSumReducer class to sum up the counts of words in each category
Implement the reduce method to aggregate the counts for each category
In the main method, set up the job configuration, including setting the job's name, mapper, combiner, reducer, output key and value classes, input and output paths
Compile the code, create a jar file, and run the MapReduce job on the Hadoop cluster using the provided grep.txt file
Display the results and save the terminal messages to solution3.pdf
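The answer's code is not reproduced above, so the following is a sketch of how the described steps could be implemented, assuming the standard Hadoop MapReduce API (org.apache.hadoop.mapreduce); identifiers not named in the steps are illustrative only.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (category, 1) for every word instead of (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text category = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        category.set(categorizeWord(itr.nextToken()));
        context.write(category, one);
      }
    }

    // Length boundaries follow the category definitions in the question.
    private static String categorizeWord(String word) {
      int len = word.length();
      if (len <= 3)  return "very short";
      if (len <= 5)  return "short";
      if (len <= 8)  return "medium";
      if (len <= 12) return "long";
      if (len <= 15) return "X long";
      return "XX long";
    }
  }

  // Reducer: sums the 1s emitted for each category.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word length categories");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // safe: reduce is a pure sum
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that Java requires the public class name to match the file name, so saving this as solution3.java would mean renaming the class accordingly. The usual workflow on a Hadoop installation is to compile against the Hadoop classpath, package the classes into a jar, and submit the job with hadoop jar, pointing it at grep.txt as input; the exact commands depend on the cluster setup.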
Time Complexity
The time complexity of the map and reduce functions is O(n) where n is the number of words in the input text, as each word is processed once.
Space Complexity
The space complexity is O(k) where k is the number of unique word categories, as the reducer maintains a count for each category.
