Machine Learning MCQs with Answers

Regression trees are typically applied to model what kind of relationships?

  • A. linear
  • B. nonlinear
  • C. categorical
  • D. symmetrical

Support Vector Machine is fundamentally which type of model?

  • A. logical model
  • B. probabilistic model
  • C. geometric model
  • D. none of the above

Given typical weather data and a playing decision, what’s the probability of not playing?

  • A. 0.4
  • B. 0.64
  • C. 0.36
  • D. 0.5
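
The question does not reproduce its data table. As an illustration only, the sketch below assumes the widely used 14-row "play tennis" weather dataset with 9 "yes" and 5 "no" decisions (an assumption, not stated in the question). Under that assumption, the probability of not playing is simply the relative frequency of "no":

```python
from collections import Counter

# Assumed label column: 9 "yes" and 5 "no" (the table is not given in the question).
play = ["yes"] * 9 + ["no"] * 5

counts = Counter(play)
p_not_play = counts["no"] / len(play)  # 5 / 14
print(round(p_not_play, 2))  # prints 0.36
```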

In PCA, the number of principal components can be at most the same as the original input dimensions.

  • A. TRUE
  • B. FALSE

Which two techniques are commonly used for probability calibration in supervised learning?

  • A. Platt Calibration and Isotonic Regression
  • B. Statistics and Informal Retrieval
  • C. Both A and B
  • D. None of these

The Manhattan distance is applicable to which kind of variables?

  • A. applicable to continuous variables
  • B. applicable to categorical variables
  • C. applicable to both continuous and categorical
  • D. applicable only to constants

When fitting a degree-2 polynomial, what happens to bias and variance?

  • A. high bias, high variance
  • B. low bias, high variance
  • C. high bias, low variance
  • D. low bias, low variance

Both supervised learning and unsupervised clustering require at least one of which kind of attribute?

  • A. hidden attribute
  • B. output attribute
  • C. input attribute
  • D. categorical attribute

Which learning type relies on feedback from the environment even without explicit labels?

  • A. Supervised
  • B. Reinforcement
  • C. Unsupervised
  • D. None of the above

Allowing the model to generalize and avoid over-specialization prevents which issue?

  • A. Overfitting
  • B. Overlearning
  • C. Classification
  • D. Regression

Which statement is true about k-Nearest Neighbors?

  • A. used for classification
  • B. used for regression
  • C. used for both classification and regression
  • D. not useful in ML
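
A minimal sketch of how the same neighbour search can serve two tasks, using toy data made up for illustration. The helper names (`k_nearest`, `knn_classify`, `knn_regress`) are hypothetical and not taken from any library:

```python
from collections import Counter
from math import dist

def k_nearest(train, query, k):
    """Return the targets of the k training points closest to query."""
    ranked = sorted(train, key=lambda pair: dist(pair[0], query))
    return [target for _, target in ranked[:k]]

def knn_classify(train, query, k=3):
    # Classification: majority vote among the neighbours' labels.
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    # Regression: average of the neighbours' numeric targets.
    neighbours = k_nearest(train, query, k)
    return sum(neighbours) / len(neighbours)

# Toy data (made up for illustration).
labelled = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
            ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_classify(labelled, (1.1, 1.0)))  # the three nearest points are all "a"
print(knn_regress(numeric, (2.1,)))        # mean of 20, 30 and 10 -> 20.0
```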

A feature in a decision tree can act as what?

  • A. binary split
  • B. predictor
  • C. both a and b
  • D. none of the above

Which metric measures the average absolute difference between predicted and actual values?

  • A. root mean squared error
  • B. mean squared error
  • C. mean absolute error
  • D. mean positive error

Which is the most general form of distance metric?

  • A. Manhattan
  • B. Euclidean
  • C. mean
  • D. Minkowski
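
To see how the listed distances relate, here is a small sketch of the Minkowski distance of order p. Setting p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance. The function name and sample points are chosen for illustration:

```python
def minkowski(u, v, p):
    """Minkowski distance of order p (p >= 1) between equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

u, v = (0.0, 0.0), (3.0, 4.0)
print(minkowski(u, v, 1))  # Manhattan: 3 + 4 = 7.0
print(minkowski(u, v, 2))  # Euclidean: sqrt(9 + 16) = 5.0
```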

Which scikit-learn class fills missing values based on mean, median, or frequency?

  • A. LabelEncoder
  • B. LabelBinarizer
  • C. DictVectorizer
  • D. Imputer (replaced by SimpleImputer in recent scikit-learn releases)
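
The three fill strategies named in the question can be sketched in plain Python. This is not scikit-learn's implementation; the `impute` helper is hypothetical, and its strategy names merely mirror the "mean", "median" and "most_frequent" options of scikit-learn's `SimpleImputer`:

```python
import statistics

def impute(values, strategy="mean"):
    """Replace None entries with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {
        "mean": statistics.mean,
        "median": statistics.median,
        "most_frequent": statistics.mode,
    }[strategy](observed)
    return [fill if v is None else v for v in values]

# Toy column with two missing entries (made up for illustration).
column = [1.0, None, 2.0, 2.0, None, 7.0]
print(impute(column, "mean"))           # missing entries become 3.0
print(impute(column, "median"))         # missing entries become 2.0
print(impute(column, "most_frequent"))  # missing entries become 2.0
```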

Which function offers unsupervised prediction in this list?

  • A. cl_forecast
  • B. cl_nowcast
  • C. cl_precast
  • D. none of the mentioned

Training increasingly deep, multi-layer models over the last decade is known as what?

  • A. Deep learning
  • B. Machine learning
  • C. Reinforcement learning
  • D. Unsupervised learning

Besides being large enough, a good test dataset should also be what?

  • A. representative of the whole dataset
  • B. collected quickly
  • C. overfit the model
  • D. none of the above

In Naïve Bayes classification, the features are assumed to be what relative to each other?

  • A. independent
  • B. dependent
  • C. partially dependent
  • D. none

In the R call `lm(formula, data, …)`, the `data` argument should be of which type?

  • A. matrix
  • B. vector
  • C. array
  • D. list

Which parameter allows you to specify a fixed random seed?

  • A. regression
  • B. classification
  • C. random_state
  • D. missing values

Which of these is an example of categorical data?

  • A. branch of bank
  • B. expenditure in rupees
  • C. price of house
  • D. weight of a person

Linear regression is a supervised machine learning algorithm.

  • A. TRUE
  • B. FALSE

What’s a key difference between stacking and blending?

  • A. stacking has less stable CV
  • B. in blending you use out-of-fold predictions
  • C. stacking is simpler than blending
  • D. none of these

Logistic regression is a __________ regression method for modeling data with a __________ outcome.

  • A. linear, numeric
  • B. linear, binary
  • C. nonlinear, numeric
  • D. nonlinear, binary

Sorting new Netflix reviews as positive, negative or neutral is an example of what?

  • A. supervised learning
  • B. unsupervised learning
  • C. semi-supervised learning
  • D. reinforcement learning

Which feature selection starts from an empty set and adds features one by one?

  • A. forward feature selection
  • B. backward feature selection
  • C. both
  • D. none of the above

Neural networks are composed of which type of functions?

  • A. linear functions
  • B. nonlinear functions
  • C. discrete functions
  • D. exponential functions

If a linear regression model fits the training data perfectly (zero train error), what can you say about test error?

  • A. always zero
  • B. non-zero
  • C. cannot say
  • D. equal to train error

When there are many classes, Gini index is an unsuitable impurity measure.

  • A. TRUE
  • B. FALSE
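
For reference, the Gini impurity of a node is 1 minus the sum of squared class proportions, and the formula is defined for any number of classes. A minimal sketch with made-up labels:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["a", "a", "b", "b"]))  # two balanced classes: 0.5
print(gini_impurity(["a", "b", "c", "d"]))  # four balanced classes: 0.75
print(gini_impurity(["a", "a", "a", "a"]))  # pure node: 0.0
```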

Which dataset is used to build a data-mining model?

  • A. training data
  • B. validation data
  • C. test data
  • D. hidden data

Which method assigns a conditional probability to each instance?

  • A. linear regression
  • B. logistic regression
  • C. simple regression
  • D. multiple linear regression
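
As a rough illustration of a model that outputs a conditional probability per instance, the sketch below passes a linear score through the logistic (sigmoid) function to estimate P(y = 1 | x). The weights and bias are hypothetical, chosen by hand rather than fitted:

```python
import math

def predict_proba(x, weights, bias):
    """Estimated P(y = 1 | x) for a logistic model with the given parameters."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, not learned from data.
weights, bias = [0.8, -0.4], 0.0

print(predict_proba([0.0, 0.0], weights, bias))  # z = 0, so 0.5
print(predict_proba([2.0, 1.0], weights, bias))  # z = 1.2, so above 0.5
```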

To match the PCA projection produced by SVD, you must first do what?

  • A. center the data to zero mean
  • B. center the data to zero median
  • C. not possible
  • D. none of these
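
The effect of preprocessing on the SVD result can be checked numerically. The sketch below assumes NumPy is available and uses synthetic data with a deliberately non-zero mean. It compares the first principal direction from the covariance matrix with the first right singular vector of the centred and of the raw data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with a non-zero mean (made up for illustration).
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2]) + np.array([10.0, -5.0, 2.0])

# PCA via eigendecomposition of the covariance matrix (np.cov centres internally).
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1_cov = eigvecs[:, np.argmax(eigvals)]

def first_right_singular_vector(A):
    """Direction of largest singular value of A."""
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return vt[0]

pc1_svd_centred = first_right_singular_vector(X - X.mean(axis=0))
pc1_svd_raw = first_right_singular_vector(X)

# Compare directions up to sign using the absolute cosine similarity.
print(abs(pc1_cov @ pc1_svd_centred))  # close to 1: same direction
print(abs(pc1_cov @ pc1_svd_raw))      # below 1: pulled towards the mean vector
```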

The __________ of the separating hyperplane is determined by the number of features.

  • A. dimension
  • B. classification
  • C. reduction
  • D. none of the above

Decision tree induction typically uses which search strategy?

  • A. greedy
  • B. top-down
  • C. procedural
  • D. step-by-step

Is feature selection mandatory to extract useful patterns?

  • A. Yes
  • B. No

Computers learn best by focusing on which?

  • A. facts
  • B. concepts
  • C. procedures
  • D. principles

KDD refers to extraction of what?

  • A. data
  • B. knowledge
  • C. rules
  • D. model

Linear regression is classified as which type of ML algorithm?

  • A. supervised
  • B. unsupervised
  • C. semi-supervised
  • D. can't say

Which method requires data to be linearly separable?

  • A. hard-margin linear SVM
  • B. linear logistic regression
  • C. soft-margin linear SVM
  • D. centroid method

Which metric is the average squared difference between predicted and actual?

  • A. mean squared error
  • B. root mean squared error
  • C. mean absolute error
  • D. mean relative error
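
The "average squared difference" can be computed directly, and taking its square root gives the related root mean squared error. The values are made up for illustration:

```python
import math

def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2 over all paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mse = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse)  # same units as the target variable
print(mse)   # (1 + 0 + 4 + 1) / 4 = 1.5
print(rmse)
```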

To fit the best line in linear regression, we commonly use which method?

  • A. Least Squares Error
  • B. Maximum Likelihood
  • C. Logarithmic Loss
  • D. Both A and B

Which are descriptive modeling techniques?

  • A. clustering
  • B. classification
  • C. association rule
  • D. both a and c

Common feature selection methods in regression include what?

  • A. correlation coefficient
  • B. greedy algorithms
  • C. all of the above
  • D. none of these

A hallmark of an excellent ML method is which combination?

  • A. fast
  • B. accurate
  • C. scalable
  • D. all of the above

To avoid implying ML is magic, some prefer the term ____ instead of prediction.

  • A. inference
  • B. interference
  • C. accuracy
  • D. none of the above

Unlabeled examples in ML are characterized by having what?

  • A. no prior labels
  • B. no confusing information
  • C. prior labels
  • D. a lot of noise

Which categories of ML techniques exist?

  • A. supervised and semi-supervised learning
  • B. unsupervised learning
  • C. both A & B
  • D. none of the mentioned

Which of these is not considered a machine learning method?

  • A. artificial intelligence
  • B. rule-based inference
  • C. both a and b
  • D. none of the mentioned

If an ML model’s output doesn’t include a target variable, it’s called a ____ model.

  • A. descriptive model
  • B. predictive model
  • C. reinforcement learning
  • D. all of the above

Which of the following pairs names two techniques used in machine learning?

  • A. Genetic Programming and Inductive Learning
  • B. Speech recognition and Regression
  • C. Both A and B
  • D. None of the mentioned

In simple terms, machine learning involves which two activities?

  • A. training on historical data
  • B. making predictions
  • C. both A and B
  • D. automation of complex tasks

Which characteristic is considered a mark of the best machine learning methods?

  • A. scalable
  • B. accurate
  • C. fast
  • D. all of the above

What is produced at the end of the ML training process?

  • A. machine learning model
  • B. machine learning algorithm
  • C. null
  • D. accuracy

Applying ML methods to very large databases is commonly called what?

  • A. data mining
  • B. artificial intelligence
  • C. big data computing
  • D. Internet of Things

If an ML model’s output does include a target variable, it’s referred to as a ____ model.

  • A. descriptive model
  • B. predictive model
  • C. reinforcement learning
  • D. all of the above