Spark MLlib MCQ

Analysis using PySpark Exercise

Download the notebook and data, then answer the following questions:

Q1: As mentioned in the notebook, what is the r2 score of the model built on 'MPG-Out' after imputing and scaling the features? [Mark the nearest value]

  • 0.78
  • 0.66
  • 0.69
  • None of the above

# Use the transformed dataframe to create our input data
input_data = Final_output

# Build a linear regression model
from pyspark.ml.regression import LinearRegression
lr = LinearRegression(featuresCol='scaledFeatures', labelCol='MPG-Out')
model1 = lr.fit(input_data)

# Find the r2 score of the model
summary = model1.evaluate(input_data)
print(summary.r2)
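
For reference, r2 (the coefficient of determination) is commonly defined as 1 - SS_res / SS_tot. The following is a minimal plain-Python sketch of that formula, not the notebook's code; the function name `r2_score` and the sample values are made up for illustration and do not come from the exercise data.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared error of the predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared deviation from the mean of the labels
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only (not from the exercise data)
mpg_actual = [18.0, 22.0, 30.0, 26.0]
mpg_predicted = [20.0, 21.0, 28.0, 27.0]
print(r2_score(mpg_actual, mpg_predicted))  # 0.875
```

A value of 1.0 means the predictions match the labels exactly, while always predicting the mean of the labels gives 0.0.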

Q2: As mentioned in the notebook, what is the r2 score of the model built on 'MPG-Out' after imputing the features but not scaling them? [Mark the nearest value]

  • 0.78
  • 0.66
  • 0.69
  • None of the above

# Initialize a new dataframe
input_data = df1

# Import Pipeline and set the stages of the pipeline (imputation only, no scaling)
from pyspark.ml import Pipeline
pipeline = Pipeline(stages=[imputer, assembler])

# Use .fit() and .transform() on the pipeline
model = pipeline.fit(input_data)
input_data = model.transform(input_data)

# Build a new linear regression model on the unscaled features
from pyspark.ml.regression import LinearRegression
lr = LinearRegression(featuresCol='features', labelCol='MPG-Out')
model2 = lr.fit(input_data)

# Find the r2 score of the model
summary = model2.evaluate(input_data)
print(summary.r2)

Q3: The following is a code extract that is used to create a pipeline:

# Note: OneHotEncoderEstimator is the Spark 2.3/2.4 name; Spark 3.0 renamed it to OneHotEncoder.
OHE = OneHotEncoderEstimator(inputCols=['C1', 'C15', 'C16', 'C18'], outputCols=['C1_encoded', 'C15_encoded', 'C16_encoded', 'C18_encoded'])

vec_assembler = VectorAssembler(inputCols=['C1_encoded', 'C15_encoded', 'C16_encoded', 'C18_encoded'], outputCol='features')

lr = LogisticRegression(featuresCol='features', labelCol='label')

final_pipe = Pipeline(stages=[OHE, vec_assembler, lr])

What will be the output of this section of code? 

  • A logistic regression model
  • A pipeline object
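
The question turns on the difference between declaring a pipeline and fitting it. That difference can be sketched in plain Python. The classes below (`ToyEstimator`, `ToyModel`, `ToyPipeline`, `ToyPipelineModel`) are hypothetical stand-ins written for this illustration, not Spark classes. They only mimic the pattern in which a constructor stores the stages and a later `fit()` call produces a separate fitted object.

```python
class ToyModel:
    """Hypothetical fitted stage: subtracts a mean learned during fit()."""
    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [r - self.mean for r in rows]


class ToyEstimator:
    """Hypothetical unfitted stage: learns the mean of the data in fit()."""
    def fit(self, rows):
        return ToyModel(sum(rows) / len(rows))


class ToyPipelineModel:
    """Hypothetical fitted pipeline: applies each fitted stage in order."""
    def __init__(self, models):
        self.models = models

    def transform(self, rows):
        for model in self.models:
            rows = model.transform(rows)
        return rows


class ToyPipeline:
    """Hypothetical unfitted pipeline: the constructor only stores the stages."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted, current = [], rows
        for stage in self.stages:
            model = stage.fit(current)
            current = model.transform(current)
            fitted.append(model)
        return ToyPipelineModel(fitted)


pipe = ToyPipeline(stages=[ToyEstimator()])
print(type(pipe).__name__)       # ToyPipeline: nothing has been fitted yet

fitted = pipe.fit([1.0, 2.0, 3.0])
print(type(fitted).__name__)     # ToyPipelineModel: produced only by fit()
print(fitted.transform([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

In PySpark the analogous call, `Pipeline.fit()`, returns a `PipelineModel`. Inspecting `type(final_pipe)` in the notebook is one way to check which object the extract produces.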

Q4: Which of the following code snippets for retrieving evaluation metrics from the model will execute without error?

A.

result = model.evaluate(df)
model.accuracy

B. Correct

result = model.evaluate(df)
result.accuracy

C.

model = lr.fit(df)
model.accuracy

D.

model = lr.fit(df)
lr.accuracy