Python Scikit-Learn Models Quiz

Python

A 50-question quiz covering the Scikit-Learn machine learning library, from estimator basics and pipelines to model selection and evaluation metrics.

50 Questions
~100 minutes

Question 1

What is the core method used to train a model in Scikit-Learn?

A
train()
B
fit()
C
learn()
D
execute()

Question 2

Which method is used to generate predictions from a trained model?

A
score()
B
forecast()
C
predict()
D
infer()

Question 3

What does `estimator.score(X, y)` typically return for a classifier?

A
The mean squared error.
B
The accuracy of the predictions.
C
The predicted class labels.
D
The confusion matrix.
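
For context, a minimal sketch of the fit/predict/score workflow these first questions describe; the synthetic dataset and the choice of LogisticRegression are illustrative assumptions, not part of the quiz:

python

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic data: 200 samples, 5 features
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)              # train the estimator
labels = model.predict(X_test)           # predicted class labels
accuracy = model.score(X_test, y_test)   # mean accuracy for a classifier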

Question 4

What is a 'Transformer' in Scikit-Learn?

A
A deep learning model.
B
An estimator that modifies the data (e.g., scaling, encoding).
C
A visualization tool.
D
A metric calculator.

Question 5

Which method combines `fit()` and `transform()` into one step?

python

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.____(X_train)
A
fit_transform()
B
transform_fit()
C
apply()
D
run()
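
A short sketch, using made-up toy data, comparing the one-step call with the equivalent separate fit and transform calls:

python

import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])   # toy data, for illustration only

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)    # learn mean/std and scale in one step

# Equivalent two-step version
scaler2 = StandardScaler().fit(X_train)
X_scaled_2 = scaler2.transform(X_train)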

Question 6

What is the shape of the input feature matrix `X` typically expected by `fit()`?

A
(n_features, n_samples)
B
(n_samples, n_features)
C
(n_samples,)
D
A list of lists.

Question 7

What happens if you call `fit()` on an estimator that has already been trained?

A
It throws an error.
B
It incrementally updates the model.
C
It re-initializes the model and trains from scratch (forgetting previous training).
D
It does nothing.

Question 8

Which attribute usually stores the learned parameters after training (e.g., coefficients)?

A
model.weights
B
model.params
C
model.coef_
D
model.learned

Question 9

How do you instantiate a Linear Regression model?

python

from sklearn.linear_model import LinearRegression
model = ____
A
LinearRegression.new()
B
LinearRegression()
C
create_model('linear')
D
fit(LinearRegression)
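
A minimal sketch, on invented toy data, of instantiating a LinearRegression and reading its learned attributes after fitting:

python

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy regression data following y = 2*x + 1 (values chosen only for illustration)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()   # instantiate with default hyperparameters
model.fit(X, y)

print(model.coef_)           # learned slope(s), available only after fit()
print(model.intercept_)      # learned intercept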

Question 10

What is the target vector `y` usually expected to be?

A
A 2D array (n_samples, 1).
B
A 1D array (n_samples,).
C
A dictionary.
D
A string.

Question 11

For a classifier, what does `predict_proba(X)` return?

A
The predicted class labels.
B
The probability estimates for each class.
C
The confidence score.
D
The log probabilities.

Question 12

If `predict_proba` returns `[0.2, 0.8]` for a binary classifier, what will `predict` return?

A
0
B
1
C
0.8
D
True
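
An illustrative sketch of how probability estimates relate to the hard labels returned by predict; the synthetic dataset is an assumption:

python

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X[:1])   # e.g. array([[0.2, 0.8]]): one column per class
label = clf.predict(X[:1])         # the class whose probability is highest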

Question 13

What input does `predict()` require?

A
The target labels `y`.
B
The feature matrix `X` with the same number of features as training data.
C
A single sample only.
D
The training data.

Question 14

Which method is used for unsupervised clustering models to assign labels?

A
cluster()
B
group()
C
predict()
D
assign()

Question 15

Can you use `predict()` on a model before calling `fit()`?

A
Yes, it returns random predictions.
B
No, it raises a `NotFittedError`.
C
Yes, it uses default weights.
D
It depends on the model.

Question 16

Which class handles missing values by replacing them with the mean or median?

A
SimpleImputer
B
MissingValueHandler
C
FillNA
D
ImputeScaler
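
A minimal sketch of mean imputation on an invented matrix with one missing value:

python

import numpy as np
from sklearn.impute import SimpleImputer

# Toy matrix with a missing value (illustrative only)
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0]])

imputer = SimpleImputer(strategy='mean')   # or strategy='median'
X_filled = imputer.fit_transform(X)        # NaN in column 0 becomes (1 + 7) / 2 = 4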

Question 17

How do you convert categorical string variables (e.g., 'red', 'blue') into integers?

A
OneHotEncoder
B
LabelEncoder
C
StringIndexer
D
CategoryEncoder

Question 18

What does `OneHotEncoder` do?

A
Converts text to numbers.
B
Creates a binary column for each category value.
C
Scales data to 0-1.
D
Removes duplicates.
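
A short illustration, on made-up category strings, of integer encoding versus one-hot encoding:

python

import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

colors = np.array(['red', 'blue', 'red', 'green'])

# LabelEncoder: strings -> integers (intended mainly for target labels)
le = LabelEncoder()
print(le.fit_transform(colors))    # [2 0 2 1] with alphabetically sorted categories

# OneHotEncoder: one binary column per category value (expects a 2D input)
ohe = OneHotEncoder()
print(ohe.fit_transform(colors.reshape(-1, 1)).toarray())  # sparse by default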

Question 19

Why should you fit a scaler ONLY on the training set?

A
To save time.
B
To prevent data leakage.
C
Because the test set is too small.
D
It doesn't matter.

Question 20

Which preprocessing step is often required for Support Vector Machines (SVM) and KNN?

A
Feature Scaling
B
Imputation
C
Polynomial Features
D
Binning
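
A sketch combining the two ideas above: fit the scaler on the training data only, then reuse its statistics for the test data before training a distance-sensitive model. The synthetic dataset and SVC are illustrative assumptions:

python

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit ONLY on training data
X_test_scaled = scaler.transform(X_test)        # reuse training statistics: no leakage

clf = SVC().fit(X_train_scaled, y_train)        # distance-based models benefit from scaling
print(clf.score(X_test_scaled, y_test))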

Question 21

What is the main purpose of `sklearn.pipeline.Pipeline`?

A
To visualize data.
B
To chain multiple processing steps and an estimator into a single object.
C
To download data.
D
To parallelize training.

Question 22

How do you create a pipeline with a scaler and a classifier?

python

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = ____(StandardScaler(), LogisticRegression())
A
Pipeline
B
make_pipeline
C
create_pipeline
D
chain
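
For context, a minimal pipeline sketch on an assumed synthetic dataset, showing what fit and predict do end to end:

python

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)       # fit_transform on the scaler, then fit on the classifier
print(pipe.predict(X_test[:5]))  # transform with the scaler, then predict with the classifier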

Question 23

When you call `pipe.fit(X, y)`, what happens to the intermediate steps?

A
They are skipped.
B
They call `fit_transform()` sequentially, passing output to the next step.
C
They only call `fit()`.
D
They only call `transform()`.

Question 24

When you call `pipe.predict(X)`, what happens?

A
It predicts using the first step.
B
It transforms X using all intermediate steps, then calls `predict` on the final estimator.
C
It errors.
D
It refits the model.

Question 25

What is `ColumnTransformer` used for?

A
Transforming the target column.
B
Applying different transformations to different columns (e.g., scaling numeric, encoding categorical).
C
Removing columns.
D
Renaming columns.
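
A minimal sketch with a hypothetical two-column DataFrame, scaling the numeric column and one-hot encoding the categorical one:

python

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical frame with one numeric and one categorical column
df = pd.DataFrame({'age': [25, 32, 47], 'city': ['Paris', 'Lyon', 'Paris']})

preprocess = ColumnTransformer([
    ('num', StandardScaler(), ['age']),    # scale numeric columns
    ('cat', OneHotEncoder(), ['city']),    # one-hot encode categorical columns
])
X_ready = preprocess.fit_transform(df)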

Question 26

What does `StandardScaler` do?

A
Scales features to the range [0, 1].
B
Standardizes features by removing the mean and scaling to unit variance.
C
Normalizes samples to unit norm.
D
Takes the logarithm.

Question 27

What does `MinMaxScaler` do?

A
Scales features to a given range, usually [0, 1].
B
Centers the data.
C
Divides by the maximum value.
D
Removes outliers.

Question 28

Which scaler is robust to outliers?

A
StandardScaler
B
MinMaxScaler
C
RobustScaler
D
Normalizer

Question 29

What is the difference between `Normalizer` and `StandardScaler`?

A
They are the same.
B
`Normalizer` scales individual samples (rows) to have unit norm; `StandardScaler` scales features (columns).
C
`Normalizer` is for regression.
D
`StandardScaler` is deprecated.
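
A side-by-side sketch of the four scalers on invented toy data (one column contains an outlier to show why RobustScaler differs):

python

import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, RobustScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 10000.0]])   # toy data with an outlier in the second column

X_std = StandardScaler().fit_transform(X)   # per column: subtract mean, divide by std
X_mm = MinMaxScaler().fit_transform(X)      # per column: rescale to [0, 1]
X_rob = RobustScaler().fit_transform(X)     # per column: median/IQR, less sensitive to outliers
X_norm = Normalizer().fit_transform(X)      # per ROW: scale each sample to unit norm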

Question 30

If you use `fit_transform` on the test set with `StandardScaler`, what happens?

A
It works perfectly.
B
You introduce data leakage.
C
It errors.
D
It improves accuracy legitimately.

Question 31

Which function is used to split data into training and testing sets?

A
train_test_split
B
split_data
C
cross_val_split
D
sample_split

Question 32

What is the purpose of the `random_state` parameter?

A
To improve accuracy.
B
To ensure reproducibility of the split.
C
To randomize the model weights.
D
To speed up processing.

Question 33

What is 'Stratified Sampling' (via `stratify=y`)?

A
Splitting data based on time.
B
Ensuring the proportion of class labels is the same in train and test sets.
C
Random sampling.
D
Sampling with replacement.
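
A sketch of a reproducible, stratified split on an assumed imbalanced synthetic dataset:

python

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, weights=[0.8, 0.2], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,     # 25% of samples go to the test set
    random_state=42,    # fixed seed -> the same split every run
    stratify=y,         # preserve the 80/20 class ratio in both splits
)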

Question 34

Which model is a good baseline for classification tasks?

A
Neural Network
B
DummyClassifier
C
XGBoost
D
SVM
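
A minimal baseline sketch on an assumed imbalanced synthetic dataset; strategy='most_frequent' is one common choice:

python

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy='most_frequent')  # always predicts the majority class
baseline.fit(X_train, y_train)
print(baseline.score(X_test, y_test))  # any real model should beat this score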

Question 35

How do you perform K-Fold Cross-Validation?

A
cross_val_score(model, X, y, cv=5)
B
model.cross_validate(5)
C
kfold(model)
D
validate(model)
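
A minimal 5-fold cross-validation sketch on an assumed synthetic dataset:

python

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

scores = cross_val_score(LogisticRegression(), X, y, cv=5)  # 5-fold CV
print(scores)          # one score per fold; no fitted model is returned
print(scores.mean())   # average performance estimate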

Question 36

What does `GridSearchCV` do?

A
Searches for data on the grid.
B
Exhaustively searches over a specified parameter grid to find the best combination.
C
Randomly samples parameters.
D
Visualizes the grid.

Question 37

How do you define the parameter grid for `GridSearchCV`?

python

param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf']
}
A
A list of values.
B
A dictionary where keys are parameter names and values are lists of settings.
C
A function.
D
A tuple.

Question 38

What is the advantage of `RandomizedSearchCV` over `GridSearchCV`?

A
It is more accurate.
B
It is faster and more efficient for large parameter spaces.
C
It checks every combination.
D
It works without data.

Question 39

After fitting `GridSearchCV`, how do you access the best model?

A
grid.best_estimator_
B
grid.model
C
grid.winner
D
grid.top

Question 40

Can you tune pipeline parameters with GridSearchCV?

A
No, only single models.
B
Yes, by using the step name followed by double underscore (e.g., `stepname__param`).
C
Yes, but only the last step.
D
Yes, using a special class.
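
A sketch of tuning a step inside a pipeline via the stepname__param convention; the step names and dataset here are illustrative:

python

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

pipe = Pipeline([('scaler', StandardScaler()), ('clf', LogisticRegression())])

# '<stepname>__<param>' routes the value to the named step
param_grid = {'clf__C': [0.1, 1, 10]}
grid = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(grid.best_params_)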

Question 41

Why is Cross-Validation preferred over a single Train/Test split?

A
It is faster.
B
It provides a more robust estimate of model performance by using all data for both training and validation.
C
It requires less data.
D
It is easier to code.

Question 42

What is `LeaveOneOut` cross-validation?

A
K-Fold where K equals the number of samples.
B
Leaving one feature out.
C
Training on one sample.
D
A deprecated method.

Question 43

What does `cross_val_predict` return?

A
The scores.
B
The predictions for each sample when it was in the test set.
C
The trained models.
D
The parameters.

Question 44

When should you use `TimeSeriesSplit`?

A
Always.
B
When data is ordered by time (e.g., stock prices).
C
When data is random.
D
For image data.
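
A small sketch of how TimeSeriesSplit keeps every test fold later in time than its training fold; the ordered toy data is invented:

python

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)   # samples assumed to be in chronological order
y = np.arange(10)

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    # each test fold comes strictly AFTER its training fold, so the model never sees the future
    print(train_idx, test_idx)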

Question 45

Does `cross_val_score` return a fitted model?

A
Yes.
B
No, it returns a list of scores.
C
It returns the best model.
D
It returns the data.

Question 46

Which metric is appropriate for a classification problem with imbalanced classes?

A
Accuracy
B
F1-Score or ROC-AUC
C
Mean Squared Error
D
R-Squared

Question 47

What does the Confusion Matrix show?

A
The correlation between features.
B
The counts of True Positives, True Negatives, False Positives, and False Negatives.
C
The training time.
D
The probability distribution.

Question 48

What is the ROC Curve?

A
A plot of True Positive Rate vs. False Positive Rate at various thresholds.
B
A plot of Precision vs. Recall.
C
A plot of Loss vs. Epochs.
D
A plot of Accuracy vs. Data Size.

Question 49

Which function calculates the Mean Squared Error?

A
accuracy_score
B
mean_squared_error
C
r2_score
D
confusion_matrix
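
A short sketch of the metrics from this group of questions, evaluated on invented toy labels and scores:

python

from sklearn.metrics import (confusion_matrix, f1_score, mean_squared_error,
                             roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0]                 # toy labels, illustration only
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.1, 0.6, 0.9, 0.8, 0.4, 0.2]    # predicted probabilities for class 1

print(confusion_matrix(y_true, y_pred))   # rows: true class, columns: predicted class
print(f1_score(y_true, y_pred))           # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))     # area under the ROC curve

# Regression example for mean_squared_error
print(mean_squared_error([3.0, 2.0], [2.5, 2.0]))   # ((0.5)**2 + 0) / 2 = 0.125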

Question 50

What does an R^2 score of 1.0 indicate?

A
The model explains none of the variance.
B
The model makes perfect predictions.
C
The model is overfitting.
D
The model is random.
