Spark MLlib Interview Questions

When do you use StringIndexer?

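StringIndexer is used when a label column contains strings rather than numbers. MLlib's algorithms expect the label to be numeric, so StringIndexer maps each distinct string to an index in [0, numLabels). By default the indices are ordered by label frequency, so the most frequent label gets index 0. It is similar to scikit-learn's LabelEncoder.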
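In PySpark, StringIndexer lives in `pyspark.ml.feature` and is used as `StringIndexer(inputCol=..., outputCol=...)`, fitted on a DataFrame. To show what the fitted model computes without needing a Spark cluster, here is a minimal pure-Python sketch of the default `frequencyDesc` ordering. The function `string_index` is hypothetical, and the alphabetical tie-breaking assumes the behavior documented for recent Spark versions.

```python
from collections import Counter

def string_index(labels):
    """Hypothetical sketch of StringIndexer's default 'frequencyDesc' ordering:
    the most frequent label gets index 0.0; ties are broken alphabetically."""
    counts = Counter(labels)
    # Sort by descending frequency, then alphabetically for equal counts.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    # StringIndexer emits indices as doubles, so the sketch does the same.
    mapping = {label: float(i) for i, label in enumerate(ordered)}
    return [mapping[label] for label in labels], mapping

indices, mapping = string_index(["cat", "dog", "cat", "bird", "dog", "cat"])
print(mapping)
print(indices)
```

Here "cat" appears three times, "dog" twice and "bird" once, so they receive indices 0.0, 1.0 and 2.0 respectively.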

When do you perform regression or use a regression-type model in Spark MLlib?

Regression is used when the value to predict is a number on a continuous scale rather than a category. For example, suppose you want to predict how many jumps a person makes given the number of steps they take. The number of steps is the feature and the number of jumps is the output (label), so this is a regression problem.
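In Spark MLlib this kind of model would typically be fitted with `LinearRegression` from `pyspark.ml.regression`. The sketch below is a hypothetical pure-Python helper, not Spark's implementation: it fits ordinary least squares for a single feature so the steps-to-jumps example can be run directly. The data points are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy data: steps taken (feature) vs. jumps made (label).
steps = [10, 20, 30, 40]
jumps = [3, 5, 7, 9]
slope, intercept = fit_simple_linear_regression(steps, jumps)
print(slope, intercept)
print(slope * 50 + intercept)  # predicted number of jumps for 50 steps
```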

What is a sparse vector?

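A local vector has integer-typed, 0-based indices and double-typed values, and is stored on a single machine. MLlib supports two types of local vectors: dense and sparse. A dense vector stores every entry in a single array of doubles, while a sparse vector stores only the non-zero entries, as two parallel arrays of indices and values plus the vector's size. Sparse vectors save memory when most of the entries are zero. For example, the vector (1.0, 0.0, 3.0) is [1.0, 0.0, 3.0] in dense format and (3, [0, 2], [1.0, 3.0]) in sparse format.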
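In PySpark, that example vector can be built with `Vectors.dense([1.0, 0.0, 3.0])` or `Vectors.sparse(3, [0, 2], [1.0, 3.0])` from `pyspark.ml.linalg`. The following pure-Python sketch uses a hypothetical `SimpleSparseVector` class (not Spark's) to illustrate the size/indices/values layout and why an operation such as a dot product only needs to touch the stored entries.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SimpleSparseVector:
    """Hypothetical stand-in for a sparse vector: the vector's size plus
    parallel arrays of 0-based indices and double values of non-zero entries."""
    size: int
    indices: List[int]
    values: List[float]

    def to_dense(self) -> List[float]:
        # Start from all zeros and fill in the stored entries.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense

    def dot(self, dense: List[float]) -> float:
        # Only the stored (non-zero) entries can contribute to the result.
        return sum(v * dense[i] for i, v in zip(self.indices, self.values))

def from_dense(dense: List[float]) -> SimpleSparseVector:
    """Keep only the non-zero entries of a dense list."""
    nonzero = [(i, v) for i, v in enumerate(dense) if v != 0.0]
    return SimpleSparseVector(len(dense), [i for i, _ in nonzero],
                              [float(v) for _, v in nonzero])

# (1.0, 0.0, 3.0) in sparse form: size 3, indices [0, 2], values [1.0, 3.0]
sv = SimpleSparseVector(3, [0, 2], [1.0, 3.0])
print(sv.to_dense())
print(from_dense([1.0, 0.0, 3.0]))
```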

What is SparkConf, which is used in Spark MLlib?

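SparkConf holds the configuration of a Spark application as key-value pairs and is used to set parameters when starting or submitting a Spark job. Typical settings include the master URL of the Spark cluster (its host address and port), the amount of memory per executor and the number of cores to use. SparkConf is part of Spark core, so MLlib applications use it in the same way as any other Spark application.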

Does Spark support SVM with SGD?

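Yes. The RDD-based spark.mllib API provides SVMWithSGD, which trains a linear support vector machine (SVM) using stochastic gradient descent (SGD). SGD is an iterative optimization method: at each step it updates the model's weights using the gradient of the loss computed on a sample (mini-batch) of the training data rather than on the whole data set. In SVMWithSGD, the size of that sample is controlled by the miniBatchFraction parameter.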
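The pure-Python sketch below shows the idea behind SVMWithSGD: a linear SVM whose hinge loss plus an L2 penalty is minimized by stochastic gradient descent, here with one sample per update. The function names, step-size schedule and toy data are assumptions for illustration, not Spark's implementation, which works on RDDs of `LabeledPoint` and samples mini-batches according to `miniBatchFraction`.

```python
import random

def train_linear_svm_sgd(data, epochs=100, step_size=0.1, reg=0.01, seed=0):
    """Hypothetical sketch: linear SVM (hinge loss + L2 penalty) trained with SGD.

    data: list of (features, label) pairs with labels in {0, 1}, as in MLlib.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    samples = list(data)              # copy so the caller's list is not shuffled
    w = [0.0] * len(samples[0][0])
    b = 0.0
    t = 0
    for _ in range(epochs):
        rng.shuffle(samples)          # visit samples in a random order each epoch
        for x, label in samples:
            t += 1
            y = 1.0 if label == 1 else -1.0      # hinge loss needs labels in {-1, +1}
            lr = step_size / t ** 0.5            # step size shrinks as 1/sqrt(t)
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Margin violated: subgradient includes the hinge term -y * x.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the L2 penalty contributes: shrink the weights.
                w = [wi * (1 - lr * reg) for wi in w]
    return w, b

def predict(w, b, x):
    """Return 1 if x lies on the positive side of the hyperplane, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical, linearly separable toy data.
train = [([2.0, 2.0], 1), ([3.0, 1.0], 1), ([1.5, 2.5], 1),
         ([-2.0, -1.0], 0), ([-1.0, -3.0], 0), ([-2.5, -2.0], 0)]
w, b = train_linear_svm_sgd(train)
print([predict(w, b, x) for x, _ in train])
```

Each update looks at a single sample, which is what makes the method stochastic; averaging the gradient over a larger random subset instead gives the mini-batch variant.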

At ProgramsBuzz, you can learn, share and grow with millions of techies around the world in domains such as Data Science, Software Development, QA and Digital Marketing. You can ask questions and get answers to your queries from our experts.