Spark MLlib MCQ
Analysis using PySpark Exercise
Download the following notebook and data and answer the following questions:
Q1: As mentioned in the notebook, what is the r2 score of the model built on 'MPG-Out' after imputing and scaling the features? [Mark the nearest value]
- 0.78
- 0.66
- 0.69
- None of the above
# Use the transformed dataframe as the input data
input_data = Final_output
# Build a linear regression model
from pyspark.ml.regression import LinearRegression
lr = LinearRegression(featuresCol='scaledFeatures', labelCol='MPG-Out')
model1 = lr.fit(input_data)
# Find the r2 score of the model
summary = model1.evaluate(input_data)
print(summary.r2)
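For reference, the r2 value printed above is the coefficient of determination. The following is a minimal plain-Python sketch of that metric, assuming the usual definition 1 - SS_res / SS_tot. The function name `r2_score` and the sample values are illustrative only and do not come from the notebook or its data.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up labels and predictions, purely for illustration
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

score = r2_score(actual, predicted)
print(round(score, 3))
```

A value close to 1 means the predictions explain most of the variance in the label; a value near 0 means the model does little better than predicting the mean.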
Q2: As mentioned in the notebook, what is the r2 score of the model built on 'MPG-Out' after only imputing the features, without scaling them? [Mark the nearest value]
- 0.78
- 0.66
- 0.69
- None of the above
# Initialize a new dataframe
input_data = df1
# Import Pipeline and set the stages of the pipeline
from pyspark.ml import Pipeline
pipeline = Pipeline(stages=[imputer, assembler])
# Use .fit() and .transform() on the pipeline, fitting on the same dataframe that is transformed
model = pipeline.fit(input_data)
input_data = model.transform(input_data)
# Build a new linear regression model
from pyspark.ml.regression import LinearRegression
lr = LinearRegression(featuresCol='features', labelCol='MPG-Out')
model2 = lr.fit(input_data)
# Find the r2 score of the model
summary = model2.evaluate(input_data)
print(summary.r2)
Q3: The following is a code extract that is used to create a pipeline:
# Note: OneHotEncoderEstimator is the Spark 2.3/2.4 class name; Spark 3.0 and later call it OneHotEncoder
OHE = OneHotEncoderEstimator(inputCols=['C1', 'C15', 'C16', 'C18'], outputCols=['C1_encoded', 'C15_encoded', 'C16_encoded', 'C18_encoded'])
vec_assembler = VectorAssembler(inputCols=['C1_encoded', 'C15_encoded', 'C16_encoded', 'C18_encoded'], outputCol='features')
lr = LogisticRegression(featuresCol='features', labelCol='label')
final_pipe = Pipeline(stages=[OHE, vec_assembler, lr])
What will be the output of this section of code?
- A logistic regression model
- A pipeline object
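To show the difference between constructing a pipeline and fitting it, here is a plain-Python sketch of the pattern. `ToyPipeline`, `ToyPipelineModel`, `double` and `add_one` are hypothetical stand-ins written for this illustration, not PySpark classes; in PySpark the analogous types are `Pipeline` and, after calling `.fit()`, `PipelineModel`.

```python
class ToyPipelineModel:
    """Stand-in for a fitted pipeline: it can transform data."""

    def __init__(self, fitted_stages):
        self.stages = fitted_stages

    def transform(self, rows):
        # Apply every stage in order
        for stage in self.stages:
            rows = stage(rows)
        return rows


class ToyPipeline:
    """Stand-in for an unfitted pipeline: it only records its stages."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        # A real pipeline would fit each estimator stage on the data here;
        # this toy treats every stage as an already-usable function.
        return ToyPipelineModel(list(self.stages))


def double(rows):
    return [r * 2 for r in rows]


def add_one(rows):
    return [r + 1 for r in rows]


final_pipe = ToyPipeline(stages=[double, add_one])  # nothing has run yet
fitted = final_pipe.fit([1, 2, 3])                  # fitting returns a model

print(type(final_pipe).__name__)
print(type(fitted).__name__)
print(fitted.transform([1, 2, 3]))
```

In this sketch only the object returned by `fit` has a `transform` method, which mirrors why building the pipeline alone does not yet train anything.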
Q4: Which of the following code snippets correctly retrieves the evaluation metrics from the model and will execute without error?
A.
result = model.evaluate(df)
model.accuracy
B. Correct
result = model.evaluate(df)
result.accuracy
C.
model = lr.fit(df)
model.accuracy
D.
model = lr.fit(df)
lr.accuracy
Q5: Arrange the elements of a Pipeline object in the proper sequence:
- Estimator
- Imputer
- Transformer
- Model
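The relationship between these elements can be sketched in plain Python. In PySpark, `Imputer` is an Estimator whose `fit` method returns an `ImputerModel`, which is a Transformer. The classes `MeanImputer` and `MeanImputerModel` below are hypothetical toy versions written for this illustration; they operate on a Python list instead of a dataframe.

```python
class MeanImputerModel:
    """Transformer: fills missing values with the mean learned during fit."""

    def __init__(self, fill_value):
        self.fill_value = fill_value

    def transform(self, values):
        return [self.fill_value if v is None else v for v in values]


class MeanImputer:
    """Estimator: learns a fill value from the data and returns a model."""

    def fit(self, values):
        present = [v for v in values if v is not None]
        return MeanImputerModel(sum(present) / len(present))


column = [2.0, None, 4.0, None]        # made-up column with missing entries

imputer = MeanImputer()                # the estimator: knows nothing yet
imputer_model = imputer.fit(column)    # fit produces the model
filled = imputer_model.transform(column)  # the model transforms the data

print(filled)
```

The estimator holds only configuration, the model holds what was learned from the data, and the transformation is applied last.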