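It is a means of chaining the different operations of Spark MLlib, i.e., imputer, transformer and model, into a single workflow that runs them in order. The imputer fills in missing (null) values in the data, for example with the column mean or median, rather than dropping them. A transformer converts the data points into a new representation, for example the TF-IDF vectorizer, which performs TF-IDF vectorization on a text column of the data set. A model is the predictor at the end of the chain, for example Logistic Regression. Pipelines and their stages belong to the DataFrame-based pyspark.ml package, not the older RDD-based pyspark.mllib.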
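To make the chaining concrete, here is a minimal sketch in plain Python of how stages can be combined. It does not use Spark: the classes `MeanImputer`, `Doubler`, `ThresholdModel` and `ToyPipeline` are hypothetical names invented for this illustration, and the toy data is made up. The sketch only assumes the general pattern that each stage is fitted (if it needs fitting) and then applied before the data is handed to the next stage.

```python
from statistics import mean


class MeanImputer:
    """Imputer-like stage: learns a column mean, then fills None with it."""

    def __init__(self, column):
        self.column = column
        self.fill_value = None

    def fit(self, rows):
        observed = [r[self.column] for r in rows if r[self.column] is not None]
        self.fill_value = mean(observed)
        return self

    def transform(self, rows):
        return [
            {**r, self.column: self.fill_value if r[self.column] is None else r[self.column]}
            for r in rows
        ]


class Doubler:
    """Transformer-like stage: stateless, derives a 'feature' column from 'x'."""

    def transform(self, rows):
        return [{**r, "feature": r["x"] * 2} for r in rows]


class ThresholdModel:
    """Model-like stage: learns a cut-off halfway between the two class means."""

    def __init__(self):
        self.threshold = None

    def fit(self, rows):
        class0 = [r["feature"] for r in rows if r["label"] == 0]
        class1 = [r["feature"] for r in rows if r["label"] == 1]
        self.threshold = (mean(class0) + mean(class1)) / 2
        return self

    def transform(self, rows):
        return [{**r, "prediction": int(r["feature"] >= self.threshold)} for r in rows]


class ToyPipeline:
    """Runs the stages in order; not the Spark Pipeline class."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        for stage in self.stages:
            if hasattr(stage, "fit"):      # only some stages need fitting
                stage.fit(rows)
            rows = stage.transform(rows)   # pass transformed data onward
        return self

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


# Made-up training data with one missing value.
train = [
    {"x": 1.0, "label": 0},
    {"x": None, "label": 0},
    {"x": 3.0, "label": 1},
    {"x": 5.0, "label": 1},
]

pipeline = ToyPipeline([MeanImputer("x"), Doubler(), ThresholdModel()]).fit(train)
predictions = pipeline.transform([{"x": None}, {"x": 0.5}, {"x": 4.0}])
```

The point of the design is that the caller deals with one object: fitting the pipeline fits every stage on the output of the previous one, and applying it to new data repeats the same steps with the values learned during fitting. In Spark the corresponding stages would be the library's own imputer, feature transformers and classifier rather than these toy classes.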