CS 4407: DATA MINING AND MACHINE LEARNING
Question 1
Which of the following is an example of a NoSQL analytics database?
Select one:
The correct answer is: Cassandra
Question 2
What does ETL stand for?
The correct answer is: Extract transform load
Question 3
True or False: In a data warehouse, unidimensional data is stored in a star schema format.
Select one:
True
False
The correct answer is 'False'.
Question 4
What does the term OLAP stand for?
Select one:
The correct answer is: Online Analytical Processing
Question 5
What is a database called when all of the values for a particular column are stored contiguously?
Select one:
The correct answer is: Column-oriented storage
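As a rough illustration of the idea, the sketch below holds the same records row-wise and then column-wise. The records and the helper name to_columns are hypothetical and exist only for this example; real column stores add compression and on-disk layout that are not modeled here.

```python
# Hypothetical records, used only for illustration.
rows = [
    {"id": 1, "city": "Austin", "sales": 120},
    {"id": 2, "city": "Boston", "sales": 95},
    {"id": 3, "city": "Denver", "sales": 143},
]

def to_columns(records):
    """Pivot a list of row dicts into one contiguous list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# An aggregate over one column only has to read that column's list.
total_sales = sum(columns["sales"])
print(columns["sales"], total_sales)
```

In the column layout an analytic query such as a sum over sales never touches the id or city values, which is the property the question describes.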
Question 6
True or False: The snowflake schema differs from the star schema in that the tables holding the dimensional data are normalized.
Select one:
True
False
The correct answer is 'True'.
Question 7
True or False: Map/Reduce refers to an optimized approach to process SQL queries.
Select one:
True
False
The correct answer is 'False'.
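Map/Reduce is a general programming model for processing data in parallel, not a SQL optimization. A minimal single-process sketch of the model, using word counting as the task, might look like the following. The function names map_phase, shuffle, and reduce_phase are assumptions made for this example and do not belong to any particular framework.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input documents.
documents = ["big data big ideas", "data mining"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster the map and reduce calls would run on different machines; here they run sequentially to keep the sketch self-contained.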
Question 8
True or False: Information Retrieval or text analytics is NOT a form of data mining.
Select one:
True
False
The correct answer is 'False'.
Question 9
Which of the following is NOT a statistical processing software package?
Select one:
The correct answer is: Vertica
Question 10
True or False: NoSQL databases provide greater performance at the expense of availability.
Select one:
True
False
The correct answer is 'False'.
Question 11
True or False: Residual plots are a useful tool for identifying non-linearity.
Select one:
True
False
The correct answer is 'True'.
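To see why, consider fitting a straight line to data that is actually quadratic. The sketch below uses a small hypothetical data set (y = x squared) and the standard least-squares formulas; the helper name fit_line is an assumption for this example.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical non-linear data: y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(residuals)
```

The residuals are positive at both ends and negative in the middle. Plotted against x, that U shape is the kind of systematic pattern that signals non-linearity; residuals from a well-specified linear model would scatter around zero with no visible trend.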
Question 12
Which command will provide descriptive statistics for the Boston data frame?
Select one:
The correct answer is: summary(Boston)
Question 13
Which of the following functions is used to generate a linear regression model within R?
Select one:
The correct answer is: lm()
Question 14
True or False: Collinearity refers to a situation in which two or more predictor variables are closely related to each other.
Select one:
True
False
The correct answer is 'True'.
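One simple way to spot collinearity between two predictors is to compute their correlation. The sketch below does this by hand for two hypothetical predictors, x1 and x2, where x2 is roughly twice x1. The data and the helper name pearson_r are assumptions for illustration only.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical predictors: x2 is approximately 2 * x1.
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 8.0, 9.9]

r = pearson_r(x1, x2)
print(round(r, 3))
```

A correlation this close to 1 suggests the two predictors carry nearly the same information, which makes their individual regression coefficients hard to estimate reliably.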
Question 15
True or False: In the KNN algorithm, a small value for K provides the most flexible fit (low bias/high variance).
Select one:
True
False
The correct answer is 'True'.
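The effect of K can be seen in a small sketch. The one-dimensional training set below is hypothetical, with one deliberately noisy label at x = 3, and knn_predict is a name chosen for this example rather than a library function.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D training set; the point at x = 3 carries a noisy label.
train = [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "A"), (8, "B"), (9, "B")]

k1 = [knn_predict(train, x, k=1) for x, _ in train]
k5 = [knn_predict(train, x, k=5) for x, _ in train]
print(k1)
print(k5)
```

With K = 1 every training point is its own nearest neighbor, so the model reproduces the training labels exactly, noise included (low bias, high variance). With K = 5 the noisy point at x = 3 is outvoted by its neighbors, giving a smoother and less flexible decision rule.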
Question 16
The names() function within R:
Select one:
The correct answer is: Lists all of the column names in the data frame provided as an argument to the function.
Question 17
You have a dataset which produces the following plot and you need to create a predictive model. Which of the following techniques are you most likely to use?
Select one:
The correct answer is: Linear Regression
Question 18
True or False: The library() function lists all of the libraries that are loaded into memory within R.
Select one:
True
False
The correct answer is 'False'.
Question 19
Residual plots are a useful tool for identifying:
Select one:
The correct answer is: Non-linearity
Question 20
Which of the following is an example of a parametric approach?
Select one:
The correct answer is: Linear Regression
Question 21
A linear regression model is expressed as y ≈ β0 + β1x, where β0 is the intercept and β1 is the slope of the line. The following equations can be used to compute the values of the coefficients β0 and β1. Using the following set of data, find the coefficients β0 and β1 rounded to the nearest thousandth, and the predicted value of y when x is 10.
{(-1 , 0), (0 , 2), (1 , 4), (2 , 5)}
Select one:
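As a hedged check of the arithmetic, the sketch below assumes the standard least-squares formulas, β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and β0 = ȳ − β1·x̄, and applies them to the four data points given in the question. The variable names are chosen for this example.

```python
# Data points from the question.
data = [(-1, 0), (0, 2), (1, 4), (2, 5)]
xs = [x for x, _ in data]
ys = [y for _, y in data]

n = len(data)
x_bar = sum(xs) / n   # 0.5
y_bar = sum(ys) / n   # 2.75

# Least-squares slope and intercept (assumed standard formulas).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)   # 8.5
sxx = sum((x - x_bar) ** 2 for x in xs)                 # 5.0
beta1 = sxy / sxx
beta0 = y_bar - beta1 * x_bar

prediction = beta0 + beta1 * 10
print(round(beta0, 3), round(beta1, 3), round(prediction, 3))  # 1.9 1.7 18.9
```

Under these assumptions the fitted line is y ≈ 1.9 + 1.7x, which gives a predicted value of about 18.9 at x = 10.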
Question 22
The values of x and their corresponding values of y are shown in the table below. Identify the linear regression model in the form y = mx + b and report the values of m (slope) and b (intercept), as well as the estimated value of y when the value of x is 10.
Select one:
Question 23
What R command could we use to generate a scatterplot diagram of our data to determine if it forms a linear pattern that would be suitable for linear regression or a non-linear pattern that would require some other technique?
Select one:
The correct answer is: plot()
Question 24
The values of x and their corresponding values of y are shown in the table below. Identify the linear regression model in the form y = mx + b and report the values of m (slope) and b (intercept), as well as the estimated value of y when the value of x is 3. Round to the nearest hundredths place.
Select one:
Question 25
The income of a company that produces disaster equipment has been expressed as a linear regression model based upon the input variable, which is the number of hurricanes projected for the upcoming hurricane season. The model is expressed as Y = mX + b, where Y is the estimated sales in millions of dollars, m = 0.76, and b = 5. Assuming that the weather service is predicting 6 hurricanes during the season, what are the sales in millions of dollars expected to be?
The correct answer is: 9.56
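The answer follows from substituting X = 6 into the model: Y = 0.76 × 6 + 5 = 4.56 + 5 = 9.56. A minimal sketch of the same calculation is shown below; the function name predict_sales is hypothetical and chosen only for this example.

```python
def predict_sales(hurricanes, slope=0.76, intercept=5.0):
    """Evaluate Y = mX + b with the slope and intercept stated in the question."""
    return slope * hurricanes + intercept

expected_sales = round(predict_sales(6), 2)
print(expected_sales)  # 9.56 (millions of dollars)
```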
Question 26
True or False: The following data plot represents data that is linearly separable.
Select one:
True
False
The correct answer is 'False'.
Question 27
True or False: Linear regression is considered a non-parametric approach.
Select one:
True
False
The correct answer is 'False'.
Question 28
True or False: The fix() function identifies values that contain data within a data frame that are inconsistent and automatically corrects these values.
Select one:
True
False
The correct answer is 'False'.
Question 29
A farmer’s yield of corn is expressed as a linear regression model based upon the input variable, which is the number of days of sunlight during the growing season. The model is expressed as Y = mX + b, where Y is the estimated corn yield in bushels per acre, m = 1.38, and b = 42. Assuming that during the growing season it is predicted that there will be 67 days of sun, what will the corn yield be in bushels per acre?
The correct answer is: 134.46
Question 30
True or False: Logistic regression can be used to predict a continuous variable.
Select one:
True
False
The correct answer is 'False'.
Question 31
True or False: Data Mining can be said to be a process designed to detect patterns in data sets.
Select one:
True
False
The correct answer is 'True'.
Question 31
True or False: In unsupervised learning, the learning algorithm must be trained using data attributes that have been paired with an outcome variable.
Select one:
True
False
The correct answer is 'False'.
Question 32
True or False: Unsupervised learning involves building a statistical model for predicting, or estimating an output based upon one or more inputs.
Select one:
True
False
The correct answer is 'False'.
Question 33
Regression analysis involves developing a model where one or more inputs are used to predict an output variable. Regression, in this context, represents what kind of learning?
Select one:
The correct answer is: Supervised learning
Question 34
Assuming that we have a data set that includes sales data for every customer over the course of several years and we want to use this data to predict future sales, which would be the most appropriate technique to investigate?
Select one:
The correct answer is: Regression
Question 35
Assume that you had a variety of data including medical history, diet, heredity factors on individuals who developed cancer and you wanted to use this data to determine whether a person is likely to develop cancer. Which technique would be the most promising to start with?
Select one:
The correct answer is: Classification
Question 36
Which of the following is an example of an unsupervised learning algorithm?
Select one:
The correct answer is: K-Means
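K-Means is unsupervised because it groups observations using only their attribute values, with no outcome labels. A minimal one-dimensional sketch of the usual assign-then-update loop (Lloyd's algorithm) is shown below. The data, the starting centroids, and the function name kmeans_1d are assumptions for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

No labels are supplied at any point; the two groups emerge from the distances between the values alone.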
Question 37
True or False: A prediction outcome variable must be categorical.
Select one:
True
False
The correct answer is 'False'.
Question 38
Which of the following is NOT a machine learning technique?
Select one:
The correct answer is: Linear Components Analytics
Question 39
True or False: In a supervised learning model, Bias refers to the error that is introduced from the assumptions of the data analyst.
Select one:
True
False
The correct answer is 'False'.
Question 40
The objective of ______ is to identify valid, novel, potentially useful, and understandable correlations and patterns in existing data.
The correct answer is: data mining