What is Hadoop Streaming?

Hadoop Streaming is an API that allows writing Mappers and Reducers in any language that can read from standard input and write to standard output. It uses Unix standard streams as the interface between Hadoop and the user application.

Streaming is naturally suited for text processing. The data view is line-oriented: each line is treated as a key-value pair, with the key and the value separated by a tab character. The Map function reads input lines from standard input and writes key-value lines to standard output. The Reduce function reads those lines from standard input, sorted by key, and writes its results to standard output.
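
As an illustration of the line-oriented, tab-separated contract described above, here is a minimal word-count sketch in Python. The function names (map_lines, reduce_lines, simulate) are hypothetical and not part of Hadoop; simulate only imitates the sort-by-key step that Hadoop performs between the Map and Reduce phases, so the logic can be tried locally without a cluster.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the values for each key. Input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def simulate(lines):
    """Local stand-in for a streaming job: map, sort by key, then reduce."""
    mapped = sorted(map_lines(lines), key=lambda l: l.split("\t", 1)[0])
    return list(reduce_lines(mapped))
```

In a real streaming job, the mapper and the reducer would typically be two separate scripts, each looping over sys.stdin and printing its output lines. Because the reducer receives its input sorted by key, it can process one key at a time with groupby instead of holding all counts in memory.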