Thursday, August 31, 2017

BigData Notes

--------->
Docker
We have a Virtual Machine which has the same environment as the server



The image has everything: a complete environment, all contained in one package

It uses the resources and configuration of its host system





It provides a lightweight environment to run your application code. Docker has an efficient workflow for moving your application from a developer's laptop, through a test environment, to production.

--------->

Why declare Mapper and Reducer classes as static?

"When declaring mapper and reducer classes as inner classes of another class, they have to be declared static so that they do not depend on the parent class.
Hadoop uses reflection to create an instance of the class for each map or reduce task that runs. The new instance is created through a zero-argument constructor (otherwise how would Hadoop know what to pass?).
If the inner mapper or reducer class is declared without the static keyword, the Java compiler actually creates a constructor which expects an instance of the parent class to be passed in at construction.
You should be able to see this by running the javap command against the generated class file.
Also, the static keyword is not valid on a top-level class declaration (which is why you never see it at the top level, but only on nested classes)."
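
To see the point about constructors without a Hadoop cluster, here is a minimal plain-Java sketch. It is not Hadoop code: the class names (NestedClassReflectionDemo, StaticMapper, InnerMapper) and the helper methods are made up for illustration, and the reflection call only approximates the kind of zero-argument instantiation the quote describes.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

// Hypothetical demo class; none of these names come from Hadoop.
public class NestedClassReflectionDemo {

    // Static nested class: carries no hidden reference to an enclosing instance.
    public static class StaticMapper {
    }

    // Inner (non-static) class: its constructor takes the enclosing instance
    // as a hidden first parameter.
    public class InnerMapper {
    }

    // Returns true if the class can be created through a zero-argument constructor.
    public static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            cls.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException lands here when no zero-argument constructor exists.
            return false;
        }
    }

    // Returns the parameter types of the first declared constructor of the class.
    public static Class<?>[] constructorParameterTypes(Class<?> cls) {
        Constructor<?> ctor = cls.getDeclaredConstructors()[0];
        return ctor.getParameterTypes();
    }

    public static void main(String[] args) {
        System.out.println("StaticMapper has no-arg constructor: "
                + hasNoArgConstructor(StaticMapper.class));
        System.out.println("InnerMapper has no-arg constructor: "
                + hasNoArgConstructor(InnerMapper.class));
        System.out.println("InnerMapper constructor parameters: "
                + Arrays.toString(constructorParameterTypes(InnerMapper.class)));
    }
}
```

Running javap -p against the compiled InnerMapper class file should show the same extra constructor parameter that the last line of main prints.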


----->

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>
"TextInputFormat’s keys, being simply the offset within the file, are not normally very useful. It is common for each line in a file to be a key-value pair, separated by a delimiter such as a tab character. For example, this is the output produced by TextOutputFormat, Hadoop’s default OutputFormat. To interpret such files correctly, KeyValueTextInputFormat is appropriate."

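The difference between the two key styles can be sketched in plain Java. This is only an illustration of the record shapes, not Hadoop's implementation: the class InputFormatKeysDemo and its methods are hypothetical, and the sketch assumes newline-terminated UTF-8 text and a tab separator. When a line has no separator, the whole line becomes the key and the value is empty, which matches the documented behaviour of KeyValueTextInputFormat.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it imitates the record shapes only.
public class InputFormatKeysDemo {

    // TextInputFormat style: key = byte offset where the line starts, value = the line.
    public static List<String[]> offsetKeyed(String fileContents) {
        List<String[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.add(new String[] { Long.toString(offset), line });
            // Advance past the line's bytes plus one byte for the newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat style: split the line at the first separator.
    public static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator: the entire line is the key and the value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String contents = "apple\t3\nbanana\t5\n";
        for (String[] record : offsetKeyed(contents)) {
            String[] kv = splitKeyValue(record[1], '\t');
            System.out.println("offset key=" + record[0]
                    + " | parsed key=" + kv[0] + ", value=" + kv[1]);
        }
    }
}
```

With offset keys the mapper signature starts with LongWritable, as in the Map class above; with KeyValueTextInputFormat both the input key and value arrive as Text.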