Machine Learning MCQs with Answers
Regression trees are typically applied to model what kind of relationships?
- A. linear
- B. nonlinear
- C. categorical
- D. symmetrical
Support Vector Machine is fundamentally which type of model?
- A. logical model
- B. probabilistic model
- C. geometric model
- D. none of the above
In the classic 14-day weather (play tennis) dataset, where the decision is to play on 9 days and not to play on 5, what’s the probability of not playing?
- A. 0.4
- B. 0.64
- C. 0.36
- D. 0.5
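The arithmetic behind this question can be sketched as follows. This assumes the question refers to the classic 14-row weather dataset (9 "yes" and 5 "no" decisions); the label list below is that assumed column, not data taken from this quiz.

```python
# Assumed labels: the "play" column of the classic 14-row weather dataset.
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

# Probability of not playing = fraction of rows labeled "no" (5 of 14).
p_not_play = play.count("no") / len(play)
print(round(p_not_play, 2))
```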
In PCA, the number of principal components is at most equal to the number of original input dimensions.
- A. TRUE
- B. FALSE
Which two techniques are commonly used for probability calibration in supervised learning?
- A. Platt Calibration and Isotonic Regression
- B. Statistics and Informal Retrieval
- C. Both A and B
- D. None of these
Manhattan distance is applicable to which kind of variables?
- A. applicable to continuous variables
- B. applicable to categorical variables
- C. applicable to both continuous and categorical
- D. applicable only to constants
When fitting a degree-2 polynomial, what happens to bias and variance?
- A. high bias, high variance
- B. low bias, high variance
- C. high bias, low variance
- D. low bias, low variance
Both supervised learning and unsupervised clustering require at least one of which kind of attribute?
- A. hidden attribute
- B. output attribute
- C. input attribute
- D. categorical attribute
Which learning type relies on feedback from the environment even without explicit labels?
- A. Supervised
- B. Reinforcement
- C. Unsupervised
- D. None of the above
Allowing the model to generalize and avoid over-specialization prevents which issue?
- A. Overfitting
- B. Overlearning
- C. Classification
- D. Regression
Which statement is true about k-Nearest Neighbors?
- A. used for classification
- B. used for regression
- C. used for both classification and regression
- D. not useful in ML
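As a rough illustration of how one neighbor search can serve two tasks, here is a minimal sketch. The function `knn_predict`, its `task` argument, and the toy data are hypothetical and chosen only for this example.

```python
from collections import Counter

def knn_predict(train, query, k, task):
    """train: list of (feature_tuple, target); task: 'classify' or 'regress'."""
    # Keep the k training points closest to the query (squared Euclidean distance).
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    targets = [target for _, target in nearest]
    if task == "classify":
        return Counter(targets).most_common(1)[0][0]  # majority vote
    return sum(targets) / len(targets)  # average of neighbor targets

labeled = [((1.0,), "a"), ((1.2,), "a"), ((5.0,), "b"), ((5.1,), "b")]
numeric = [((1.0,), 10.0), ((1.2,), 12.0), ((5.0,), 50.0), ((5.1,), 52.0)]
print(knn_predict(labeled, (1.1,), k=3, task="classify"))
print(knn_predict(numeric, (1.1,), k=2, task="regress"))
```

The only difference between the two modes is how the neighbors' targets are combined: a vote for labels, a mean for numbers.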
A feature in a decision tree can act as what?
- A. binary split
- B. predictor
- C. both a and b
- D. none of the above
Which metric measures the average absolute difference between predicted and actual values?
- A. root mean squared error
- B. mean squared error
- C. mean absolute error
- D. mean positive error
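For reference, the average absolute difference described in this question can be computed as in the sketch below. The function name and sample values are illustrative only.

```python
def mean_absolute_error(actual, predicted):
    # Average of |actual - predicted| over all paired values.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Absolute errors are 1, 0 and 2, so the average is 3 / 3.
print(mean_absolute_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```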
Which is the most general form of distance metric?
- A. Manhattan
- B. Euclidean
- C. mean
- D. Minkowski
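The relationship between the metrics listed here can be sketched in a few lines: the Minkowski distance takes an order parameter `p`, and setting `p = 1` or `p = 2` gives the Manhattan and Euclidean distances respectively. The function below is a minimal illustration, not a library API.

```python
def minkowski(x, y, p):
    # (sum of |x_i - y_i|^p) raised to the power 1/p
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

origin, point = (0, 0), (3, 4)
print(minkowski(origin, point, p=1))  # Manhattan: |3| + |4|
print(minkowski(origin, point, p=2))  # Euclidean: sqrt(9 + 16)
```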
Which scikit-learn class fills missing values based on mean, median, or frequency?
- A. LabelEncoder
- B. LabelBinarizer
- C. DictVectorizer
- D. Imputer (`SimpleImputer` in scikit-learn 0.20 and later)
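To show what filling missing values by mean, median, or frequency means in practice, here is a minimal pure-Python sketch. The `impute` function is hypothetical and is not scikit-learn's implementation; its strategy names simply mirror the three options the question mentions.

```python
from collections import Counter
from statistics import mean, median

def impute(values, strategy="mean"):
    """Replace None entries with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "most_frequent":
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

column = [1, 2, None, 2, 7]  # illustrative data with one missing entry
for strategy in ("mean", "median", "most_frequent"):
    print(strategy, impute(column, strategy))
```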
Which function offers unsupervised prediction in this list?
- A. cl_forecast
- B. cl_nowcast
- C. cl_precast
- D. none of the mentioned
Training increasingly deep, multi-layer models over the last decade is known as what?
- A. Deep learning
- B. Machine learning
- C. Reinforcement learning
- D. Unsupervised learning
A good test dataset should be large enough and also what?
- A. representative of the whole dataset
- B. collected quickly
- C. overfit the model
- D. none of the above
In Naïve Bayes classification, the features are assumed to be what relative to each other?
- A. independent
- B. dependent
- C. partially dependent
- D. none
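The assumption this question asks about is what lets a Naïve Bayes classifier multiply per-feature likelihoods together. The sketch below uses made-up priors and likelihoods (all names and numbers are hypothetical) to show that product in action.

```python
from math import prod

# Hypothetical values of P(feature present | class) and P(class).
likelihoods = {
    "spam": {"has_link": 0.8, "has_greeting": 0.1},
    "ham": {"has_link": 0.2, "has_greeting": 0.7},
}
priors = {"spam": 0.4, "ham": 0.6}

def posterior(features):
    # Treating features as conditionally independent given the class,
    # the joint likelihood is the product of the per-feature likelihoods.
    scores = {
        c: priors[c] * prod(likelihoods[c][f] for f in features)
        for c in priors
    }
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(posterior(["has_link", "has_greeting"]))
```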
In the R call `lm(formula, data,…)`, the `data` argument should be which type?
- A. matrix
- B. vector
- C. array
- D. list
Which parameter allows you to specify a fixed random seed?
- A. regression
- B. classification
- C. random_state
- D. missing values
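The effect of fixing a seed can be sketched with Python's standard `random` module. The helper `shuffled_split` and its `seed` argument are hypothetical stand-ins that play the same role as a fixed seed parameter in a library call: the same seed yields the same shuffle.

```python
import random

def shuffled_split(items, test_fraction, seed):
    rng = random.Random(seed)  # private generator, global state untouched
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

data = list(range(10))
first = shuffled_split(data, 0.3, seed=42)
second = shuffled_split(data, 0.3, seed=42)
print(first == second)  # same seed, same split
```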
Which of these is an example of categorical data?
- A. branch of bank
- B. expenditure in rupees
- C. price of house
- D. weight of a person
Linear regression is a supervised machine learning algorithm.
- A. TRUE
- B. FALSE
What’s a key difference between stacking and blending?
- A. stacking has less stable CV
- B. in blending you use out-of-fold predictions
- C. stacking is simpler than blending
- D. none of these
Logistic regression is a __________ regression method for modeling data with a __________ outcome.
- A. linear, numeric
- B. linear, binary
- C. nonlinear, numeric
- D. nonlinear, binary
Sorting new Netflix reviews as positive, negative or neutral is an example of what?
- A. supervised learning
- B. unsupervised learning
- C. semi-supervised learning
- D. reinforcement learning
Which feature selection starts from an empty set and adds features one by one?
- A. forward feature selection
- B. backward feature selection
- C. both
- D. none of the above
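The procedure described in this question (start with no features, then add one at a time) can be sketched as a greedy loop. Everything here is illustrative: `forward_selection`, `toy_score`, and the `usefulness` table are hypothetical, and a real scoring function would typically be a cross-validated model metric.

```python
def forward_selection(features, score, max_features):
    selected = []  # start from the empty set
    best_score = score(selected)
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Try adding each remaining feature and keep the best one.
        top_score, top_feature = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best_score:
            break  # no candidate improves the score
        selected.append(top_feature)
        best_score = top_score
    return selected

# Toy score: fixed usefulness per feature minus a small per-feature penalty.
usefulness = {"age": 0.30, "income": 0.50, "noise": 0.01}

def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.05 * len(subset)

print(forward_selection(list(usefulness), toy_score, max_features=3))
```

Backward selection would run the same loop in reverse, starting from all features and removing the least useful one at each step.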
Neural networks are composed of which type of functions?
- A. linear functions
- B. nonlinear functions
- C. discrete functions
- D. exponential functions
If a linear regression model fits the training data perfectly (zero train error), what can you say about test error?
- A. always zero
- B. non-zero
- C. cannot say
- D. equal to train error
When there are many classes, Gini index is an unsuitable impurity measure.
- A. TRUE
- B. FALSE
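For context, the Gini impurity of a set of labels is one minus the sum of squared class proportions. The sketch below computes it for an arbitrary number of classes; it illustrates the definition only and does not settle the question above.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    # 1 - sum over classes of (class proportion)^2
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["a", "a", "a", "a"]))  # pure node
print(gini(["a", "a", "b", "b"]))  # two balanced classes
print(gini(["a", "b", "c", "d"]))  # four balanced classes
```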
Which dataset is used to build a data-mining model?
- A. training data
- B. validation data
- C. test data
- D. hidden data
Which method assigns a conditional probability to each instance?
- A. linear regression
- B. logistic regression
- C. simple regression
- D. multiple linear regression
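A conditional probability per instance can be sketched with the logistic (sigmoid) function applied to a linear score. The weights and bias below are arbitrary illustrative values, not fitted parameters.

```python
import math

def predict_proba(x, weights, bias):
    # Linear score passed through the sigmoid, giving a value in (0, 1)
    # that is read as P(y = 1 | x).
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

weights, bias = [1.0, -2.0], 0.0  # hypothetical parameters
print(predict_proba([0.0, 0.0], weights, bias))
print(predict_proba([3.0, 0.5], weights, bias))
```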
To match the PCA projection produced by SVD, you must first do what?
- A. center the data to zero mean
- B. center the data to zero median
- C. not possible
- D. none of these
The __________ of the separating hyperplane is determined by the number of features.
- A. dimension
- B. classification
- C. reduction
- D. none of the above
Decision tree induction typically uses which search strategy?
- A. greedy
- B. top-down
- C. procedural
- D. step-by-step
Is feature selection mandatory to extract useful patterns?
- A. Yes
- B. No
Computers learn best by focusing on which?
- A. facts
- B. concepts
- C. procedures
- D. principles
KDD refers to extraction of what?
- A. data
- B. knowledge
- C. rules
- D. model
Linear regression is classified as which type of ML algorithm?
- A. supervised
- B. unsupervised
- C. semi-supervised
- D. can't say
Which method requires data to be linearly separable?
- A. hard-margin linear SVM
- B. linear logistic regression
- C. soft-margin linear SVM
- D. centroid method
Which metric is the average squared difference between predicted and actual?
- A. mean squared error
- B. root mean squared error
- C. mean absolute error
- D. mean relative error
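The average squared difference, and its square root, can be computed as in this minimal sketch. The function name and sample values are illustrative.

```python
import math

def mean_squared_error(actual, predicted):
    # Average of (actual - predicted)^2 over all paired values.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])  # (1 + 0 + 4) / 3
rmse = math.sqrt(mse)  # root mean squared error, in the target's own units
print(mse, rmse)
```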
To fit the best line in linear regression, we commonly use which method?
- A. Least Squares Error
- B. Maximum Likelihood
- C. Logarithmic Loss
- D. Both A and B
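For a single input variable, the least-squares line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below uses that formula; `fit_line` and the sample points are illustrative.

```python
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```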
Which are descriptive modeling techniques?
- A. clustering
- B. classification
- C. association rule
- D. both a and c
Common feature selection methods in regression include what?
- A. correlation coefficient
- B. greedy algorithms
- C. all of the above
- D. none of these
A hallmark of an excellent ML method is which combination?
- A. fast
- B. accurate
- C. scalable
- D. all of the above
To avoid implying ML is magic, some prefer the term ____ instead of prediction.
- A. inference
- B. interference
- C. accuracy
- D. none of the above
Unlabeled examples in ML are characterized by having what?
- A. no prior labels
- B. no confusing information
- C. prior labels
- D. a lot of noise
Which categories of ML techniques exist?
- A. supervised and semi-supervised learning
- B. unsupervised learning
- C. both A & B
- D. none of the mentioned
Which of these is not considered a machine learning method?
- A. artificial intelligence
- B. rule-based inference
- C. both a and b
- D. none of the mentioned
If an ML model’s output doesn’t include a target variable, it’s called a ____ model.
- A. descriptive model
- B. predictive model
- C. reinforcement learning
- D. all of the above
Which of the following are two techniques of machine learning?
- A. Genetic Programming and Inductive Learning
- B. Speech recognition and Regression
- C. Both A and B
- D. None of the mentioned
In simple terms, machine learning involves which two activities?
- A. training on historical data
- B. making predictions
- C. both A and B
- D. automation of complex tasks
Which characteristic marks the best machine learning methods?
- A. scalable
- B. accurate
- C. fast
- D. all of the above
What is produced at the end of the ML training process?
- A. machine learning model
- B. machine learning algorithm
- C. null
- D. accuracy
Applying ML methods to very large databases is commonly called what?
- A. data mining
- B. artificial intelligence
- C. big data computing
- D. Internet of Things
If an ML model’s output does include a target variable, it’s referred to as a ____ model.
- A. descriptive model
- B. predictive model
- C. reinforcement learning
- D. all of the above