Python Scikit-Learn Models Quiz

Python

A 50-question quiz covering the Scikit-Learn machine learning library, from estimator basics and pipelines to model selection and evaluation metrics.

50 Questions
~100 minutes

Question 1

What is the core method used to train a model in Scikit-Learn?

A
train()
B
fit()
C
learn()
D
execute()

Question 2

Which method is used to generate predictions from a trained model?

A
score()
B
forecast()
C
predict()
D
infer()

Question 3

What does `estimator.score(X, y)` typically return for a classifier?

A
The mean squared error.
B
The accuracy of the predictions.
C
The predicted class labels.
D
The confusion matrix.
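
For context, a minimal sketch of the fit/predict/score workflow these first questions describe; the synthetic dataset and the choice of LogisticRegression are illustrative assumptions, not part of the quiz:

python

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic data: 200 samples, 5 features
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)              # train the estimator
labels = model.predict(X_test)           # predicted class labels
accuracy = model.score(X_test, y_test)   # mean accuracy for a classifier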

Question 4

What is a 'Transformer' in Scikit-Learn?

A
A deep learning model.
B
An estimator that modifies the data (e.g., scaling, encoding).
C
A visualization tool.
D
A metric calculator.

Question 5

Which method combines `fit()` and `transform()` into one step?

python

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.____(X_train)
A
fit_transform()
B
transform_fit()
C
apply()
D
run()
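
A short sketch, using made-up toy data, comparing the one-step call with the equivalent separate fit and transform calls:

python

import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])   # toy data, for illustration only

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)    # learn mean/std and scale in one step

# Equivalent two-step version
scaler2 = StandardScaler().fit(X_train)
X_scaled_2 = scaler2.transform(X_train)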

Question 6

What is the shape of the input feature matrix `X` typically expected by `fit()`?

A
(n_features, n_samples)
B
(n_samples, n_features)
C
(n_samples,)
D
A list of lists.

Question 7

What happens if you call `fit()` on an estimator that has already been trained?

A
It throws an error.
B
It incrementally updates the model.
C
It re-initializes the model and trains from scratch (forgetting previous training).
D
It does nothing.

Question 8

Which attribute usually stores the learned parameters after training (e.g., coefficients)?

A
model.weights
B
model.params
C
model.coef_
D
model.learned

Question 9

How do you instantiate a Linear Regression model?

python

from sklearn.linear_model import LinearRegression
model = ____
A
LinearRegression.new()
B
LinearRegression()
C
create_model('linear')
D
fit(LinearRegression)
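
A minimal sketch, on invented toy data, of instantiating a LinearRegression and reading its learned attributes after fitting:

python

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy regression data following y = 2*x + 1 (values chosen only for illustration)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()   # instantiate with default hyperparameters
model.fit(X, y)

print(model.coef_)           # learned slope(s), available only after fit()
print(model.intercept_)      # learned intercept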

Question 10

What is the target vector `y` usually expected to be?

A
A 2D array (n_samples, 1).
B
A 1D array (n_samples,).
C
A dictionary.
D
A string.

Question 11

For a classifier, what does `predict_proba(X)` return?

A
The predicted class labels.
B
The probability estimates for each class.
C
The confidence score.
D
The log probabilities.

Question 12

If `predict_proba` returns `[0.2, 0.8]` for a binary classifier, what will `predict` return?

A
0
B
1
C
0.8
D
True
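
An illustrative sketch of how probability estimates relate to the hard labels returned by predict; the synthetic dataset is an assumption:

python

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X[:1])   # e.g. array([[0.2, 0.8]]): one column per class
label = clf.predict(X[:1])         # the class whose probability is highest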

Question 13

What input does `predict()` require?

A
The target labels `y`.
B
The feature matrix `X` with the same number of features as training data.
C
A single sample only.
D
The training data.

Question 14

Which method is used for unsupervised clustering models to assign labels?

A
cluster()
B
group()
C
predict()
D
assign()

Question 15

Can you use `predict()` on a model before calling `fit()`?

A
Yes, it returns random predictions.
B
No, it raises a `NotFittedError`.
C
Yes, it uses default weights.
D
It depends on the model.

Question 16

Which class handles missing values by replacing them with the mean or median?

A
SimpleImputer
B
MissingValueHandler
C
FillNA
D
ImputeScaler
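
A minimal sketch of mean imputation on an invented matrix with one missing value:

python

import numpy as np
from sklearn.impute import SimpleImputer

# Toy matrix with a missing value (illustrative only)
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0]])

imputer = SimpleImputer(strategy='mean')   # or strategy='median'
X_filled = imputer.fit_transform(X)        # NaN in column 0 becomes (1 + 7) / 2 = 4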

Question 17

How do you convert categorical string variables (e.g., 'red', 'blue') into integers?

A
OneHotEncoder
B
LabelEncoder
C
StringIndexer
D
CategoryEncoder

Question 18

What does `OneHotEncoder` do?

A
Converts text to numbers.
B
Creates a binary column for each category value.
C
Scales data to 0-1.
D
Removes duplicates.
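
A short illustration, on made-up category strings, of integer encoding versus one-hot encoding:

python

import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

colors = np.array(['red', 'blue', 'red', 'green'])

# LabelEncoder: strings -> integers (intended mainly for target labels)
le = LabelEncoder()
print(le.fit_transform(colors))    # [2 0 2 1] with alphabetically sorted categories

# OneHotEncoder: one binary column per category value (expects a 2D input)
ohe = OneHotEncoder()
print(ohe.fit_transform(colors.reshape(-1, 1)).toarray())  # sparse by default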

Question 19

Why should you fit a scaler ONLY on the training set?

A
To save time.
B
To prevent data leakage.
C
Because the test set is too small.
D
It doesn't matter.

Question 20

Which preprocessing step is often required for Support Vector Machines (SVM) and KNN?

A
Feature Scaling
B
Imputation
C
Polynomial Features
D
Binning
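
A sketch combining the two ideas above: fit the scaler on the training data only, then reuse its statistics for the test data before training a distance-sensitive model. The synthetic dataset and SVC are illustrative assumptions:

python

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit ONLY on training data
X_test_scaled = scaler.transform(X_test)        # reuse training statistics: no leakage

clf = SVC().fit(X_train_scaled, y_train)        # distance-based models benefit from scaling
print(clf.score(X_test_scaled, y_test))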

Question 21

What is the main purpose of `sklearn.pipeline.Pipeline`?

A
To visualize data.
B
To chain multiple processing steps and an estimator into a single object.
C
To download data.
D
To parallelize training.

Question 22

How do you create a pipeline with a scaler and a classifier?

python

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = ____(StandardScaler(), LogisticRegression())
A
Pipeline
B
make_pipeline
C
create_pipeline
D
chain
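
For context, a minimal pipeline sketch on an assumed synthetic dataset, showing what fit and predict do end to end:

python

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)       # fit_transform on the scaler, then fit on the classifier
print(pipe.predict(X_test[:5]))  # transform with the scaler, then predict with the classifier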

Question 23

When you call `pipe.fit(X, y)`, what happens to the intermediate steps?

A
They are skipped.
B
They call `fit_transform()` sequentially, passing output to the next step.
C
They only call `fit()`.
D
They only call `transform()`.

Question 24

When you call `pipe.predict(X)`, what happens?

A
It predicts using the first step.
B
It transforms X using all intermediate steps, then calls `predict` on the final estimator.
C
It errors.
D
It refits the model.

Question 25

What is `ColumnTransformer` used for?

A
Transforming the target column.
B
Applying different transformations to different columns (e.g., scaling numeric, encoding categorical).
C
Removing columns.
D
Renaming columns.
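
A minimal sketch with a hypothetical two-column DataFrame, scaling the numeric column and one-hot encoding the categorical one:

python

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical frame with one numeric and one categorical column
df = pd.DataFrame({'age': [25, 32, 47], 'city': ['Paris', 'Lyon', 'Paris']})

preprocess = ColumnTransformer([
    ('num', StandardScaler(), ['age']),    # scale numeric columns
    ('cat', OneHotEncoder(), ['city']),    # one-hot encode categorical columns
])
X_ready = preprocess.fit_transform(df)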

Question 26

What does `StandardScaler` do?

A
Scales features to the range [0, 1].
B
Standardizes features by removing the mean and scaling to unit variance.
C
Normalizes samples to unit norm.
D
Takes the logarithm.

Question 27

What does `MinMaxScaler` do?

A
Scales features to a given range, usually [0, 1].
B
Centers the data.
C
Divides by the maximum value.
D
Removes outliers.

Question 28

Which scaler is robust to outliers?

A
StandardScaler
B
MinMaxScaler
C
RobustScaler
D
Normalizer

Question 29

What is the difference between `Normalizer` and `StandardScaler`?

A
They are the same.
B
`Normalizer` scales individual samples (rows) to have unit norm; `StandardScaler` scales features (columns).
C
`Normalizer` is for regression.
D
`StandardScaler` is deprecated.
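
A side-by-side sketch of the four scalers on invented toy data (one column contains an outlier to show why RobustScaler differs):

python

import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, RobustScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 10000.0]])   # toy data with an outlier in the second column

X_std = StandardScaler().fit_transform(X)   # per column: subtract mean, divide by std
X_mm = MinMaxScaler().fit_transform(X)      # per column: rescale to [0, 1]
X_rob = RobustScaler().fit_transform(X)     # per column: median/IQR, less sensitive to outliers
X_norm = Normalizer().fit_transform(X)      # per ROW: scale each sample to unit norm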

Question 30

If you use `fit_transform` on the test set with `StandardScaler`, what happens?

A
It works perfectly.
B
You introduce data leakage.
C
It errors.
D
It improves accuracy legitimately.

Question 31

Which function is used to split data into training and testing sets?

A
train_test_split
B
split_data
C
cross_val_split
D
sample_split

Question 32

What is the purpose of the `random_state` parameter?

A
To improve accuracy.
B
To ensure reproducibility of the split.
C
To randomize the model weights.
D
To speed up processing.

Question 33

What is 'Stratified Sampling' (via `stratify=y`)?

A
Splitting data based on time.
B
Ensuring the proportion of class labels is the same in train and test sets.
C
Random sampling.
D
Sampling with replacement.
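
A sketch of a reproducible, stratified split on an assumed imbalanced synthetic dataset:

python

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, weights=[0.8, 0.2], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,     # 25% of samples go to the test set
    random_state=42,    # fixed seed -> the same split every run
    stratify=y,         # preserve the 80/20 class ratio in both splits
)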

Question 34

Which model is a good baseline for classification tasks?

A
Neural Network
B
DummyClassifier
C
XGBoost
D
SVM
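
A minimal baseline sketch on an assumed imbalanced synthetic dataset; strategy='most_frequent' is one common choice:

python

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy='most_frequent')  # always predicts the majority class
baseline.fit(X_train, y_train)
print(baseline.score(X_test, y_test))  # any real model should beat this score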

Question 35

How do you perform K-Fold Cross-Validation?

A
cross_val_score(model, X, y, cv=5)
B
model.cross_validate(5)
C
kfold(model)
D
validate(model)
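
A minimal 5-fold cross-validation sketch on an assumed synthetic dataset:

python

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

scores = cross_val_score(LogisticRegression(), X, y, cv=5)  # 5-fold CV
print(scores)          # one score per fold; no fitted model is returned
print(scores.mean())   # average performance estimate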

Question 36

What does `GridSearchCV` do?

A
Searches for data on the grid.
B
Exhaustively searches over a specified parameter grid to find the best combination.
C
Randomly samples parameters.
D
Visualizes the grid.

Question 37

How do you define the parameter grid for `GridSearchCV`?

python

param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf']
}
A
A list of values.
B
A dictionary where keys are parameter names and values are lists of settings.
C
A function.
D
A tuple.

Question 38

What is the advantage of `RandomizedSearchCV` over `GridSearchCV`?

A
It is more accurate.
B
It is faster and more efficient for large parameter spaces.
C
It checks every combination.
D
It works without data.

Question 39

After fitting `GridSearchCV`, how do you access the best model?

A
grid.best_estimator_
B
grid.model
C
grid.winner
D
grid.top

Question 40

Can you tune pipeline parameters with GridSearchCV?

A
No, only single models.
B
Yes, by using the step name followed by double underscore (e.g., `stepname__param`).
C
Yes, but only the last step.
D
Yes, using a special class.
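
A sketch of tuning a step inside a pipeline via the stepname__param convention; the step names and dataset here are illustrative:

python

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

pipe = Pipeline([('scaler', StandardScaler()), ('clf', LogisticRegression())])

# '<stepname>__<param>' routes the value to the named step
param_grid = {'clf__C': [0.1, 1, 10]}
grid = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(grid.best_params_)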

Question 41

Why is Cross-Validation preferred over a single Train/Test split?

A
It is faster.
B
It provides a more robust estimate of model performance by using all data for both training and validation.
C
It requires less data.
D
It is easier to code.

Question 42

What is `LeaveOneOut` cross-validation?

A
K-Fold where K equals the number of samples.
B
Leaving one feature out.
C
Training on one sample.
D
A deprecated method.

Question 43

What does `cross_val_predict` return?

A
The scores.
B
The predictions for each sample when it was in the test set.
C
The trained models.
D
The parameters.

Question 44

When should you use `TimeSeriesSplit`?

A
Always.
B
When data is ordered by time (e.g., stock prices).
C
When data is random.
D
For image data.
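
A small sketch of how TimeSeriesSplit keeps every test fold later in time than its training fold; the ordered toy data is invented:

python

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)   # samples assumed to be in chronological order
y = np.arange(10)

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    # each test fold comes strictly AFTER its training fold, so the model never sees the future
    print(train_idx, test_idx)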

Question 45

Does `cross_val_score` return a fitted model?

A
Yes.
B
No, it returns a list of scores.
C
It returns the best model.
D
It returns the data.

Question 46

Which metric is appropriate for a classification problem with imbalanced classes?

A
Accuracy
B
F1-Score or ROC-AUC
C
Mean Squared Error
D
R-Squared

Question 47

What does the Confusion Matrix show?

A
The correlation between features.
B
The counts of True Positives, True Negatives, False Positives, and False Negatives.
C
The training time.
D
The probability distribution.

Question 48

What is the ROC Curve?

A
A plot of True Positive Rate vs. False Positive Rate at various thresholds.
B
A plot of Precision vs. Recall.
C
A plot of Loss vs. Epochs.
D
A plot of Accuracy vs. Data Size.

Question 49

Which function calculates the Mean Squared Error?

A
accuracy_score
B
mean_squared_error
C
r2_score
D
confusion_matrix
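
A short sketch of the metrics from this group of questions, evaluated on invented toy labels and scores:

python

from sklearn.metrics import (confusion_matrix, f1_score, mean_squared_error,
                             roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0]                 # toy labels, illustration only
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.1, 0.6, 0.9, 0.8, 0.4, 0.2]    # predicted probabilities for class 1

print(confusion_matrix(y_true, y_pred))   # rows: true class, columns: predicted class
print(f1_score(y_true, y_pred))           # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))     # area under the ROC curve

# Regression example for mean_squared_error
print(mean_squared_error([3.0, 2.0], [2.5, 2.0]))   # ((0.5)**2 + 0) / 2 = 0.125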

Question 50

What does an R^2 score of 1.0 indicate?

A
The model explains none of the variance.
B
The model makes perfect predictions.
C
The model is overfitting.
D
The model is random.
