Data Mining Quiz

CS 4407: DATA MINING AND MACHINE LEARNING

Question 1

Which of the following is an example of a NoSQL analytics database?

Select one:

  1. IBM DB2
  2. Oracle
  3. Cassandra
  4. Greenplum

 

The correct answer is: Cassandra


Question 2

What does ETL stand for?

The correct answer is: Extract, Transform, Load
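The three ETL stages can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production pipeline. The CSV text, the column names `region` and `amount`, and the in-memory SQLite table `sales` are all hypothetical stand-ins for a real source system and warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (input to the extract step).
RAW_CSV = """region,amount
north, 120.50
south,80
north,  30.25
"""

def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalize text, convert amounts to numbers."""
    return [(r["region"].strip().upper(), float(r["amount"])) for r in rows]

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Query the loaded table to confirm the pipeline produced clean, typed data.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)
```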


Question 3

True or False: In a data warehouse, unidimensional data is stored in a star schema format.

Select one:

True 

False

 

The correct answer is 'False'.


Question 4

What does the term OLAP stand for?

Select one:

  1. Online Applications Processing
  2. Online Analytical Processing
  3. Online Transactional Processing
  4. Online Limited Analytics Processing

 

The correct answer is: Online Analytical Processing


Question 5

What is the term for a database in which all of the values for a particular column are stored contiguously?

Select one:

  1. Column-oriented storage
  2. In-memory database
  3. Partitioning
  4. Data Compression

 

The correct answer is: Column-oriented storage
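The sketch below holds the same small, hypothetical table in two layouts. The row-oriented layout keeps one tuple per record. The column-oriented layout keeps one contiguous `array` per numeric column. Python objects only approximate real on-disk storage, so treat this as an illustration of the idea rather than of how a columnar database is actually implemented.

```python
from array import array

# Hypothetical table with three columns: id, region, amount.
rows = [
    (1, "north", 120.0),
    (2, "south", 80.0),
    (3, "north", 30.0),
]

# Row-oriented layout: each record's values are stored together.
row_store = list(rows)

# Column-oriented layout: all values of one column are stored together.
# array("d") keeps the floats in a single contiguous buffer.
column_store = {
    "id": array("q", (r[0] for r in rows)),
    "region": [r[1] for r in rows],
    "amount": array("d", (r[2] for r in rows)),
}

# An aggregate such as SUM(amount) must step through every full record
# in the row store, but scans only the one "amount" buffer in the column store.
row_total = sum(r[2] for r in row_store)
col_total = sum(column_store["amount"])
print(row_total, col_total)
```

Analytic queries typically touch a few columns of many rows. This access pattern is why columnar storage, often combined with per-column compression, suits data warehouses.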


Question 6

True or False: The snowflake schema differs from the star schema in that the tables holding the dimensional data are normalized.

Select one:

True 

False

 

The correct answer is 'True'.
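This is a minimal sketch of a snowflake schema using Python's built-in `sqlite3` module, and all table and column names are hypothetical. The product dimension is normalized, so category names live in a separate `dim_category` table. A star schema would instead store the category name directly in `dim_product`, accepting some redundancy in exchange for one fewer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake: the product dimension is normalized, so the category
-- name lives in its own table instead of being repeated per product.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES dim_category);
CREATE TABLE fact_sales   (product_id INTEGER REFERENCES dim_product,
                           amount REAL);

INSERT INTO dim_category VALUES (1, 'Tools'), (2, 'Garden');
INSERT INTO dim_product  VALUES (10, 'Hammer', 1), (11, 'Saw', 1), (20, 'Hose', 2);
INSERT INTO fact_sales   VALUES (10, 15.0), (11, 25.0), (20, 40.0), (10, 5.0);
""")

# Reaching the category name takes two joins: fact -> product -> category.
query = """
SELECT c.name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p  ON f.product_id = p.product_id
JOIN dim_category c ON p.category_id = c.category_id
GROUP BY c.name
ORDER BY c.name
"""
totals = conn.execute(query).fetchall()
print(totals)
```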


Question 7

True or False: Map/Reduce refers to an optimized approach to process SQL queries.

Select one:

True 

False

 

The correct answer is 'False'.
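MapReduce is a programming model for processing large data sets in parallel, not a technique for optimizing SQL queries. The single-process Python sketch below mimics its phases (map, shuffle, reduce) on the classic word-count example. A real framework such as Hadoop distributes each phase across many machines. The function names here are illustrative only.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (key, value) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key (here, by summing them)."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["the quick fox", "The lazy dog", "the fox"]
pairs = chain.from_iterable(map(map_phase, documents))
counts = reduce_phase(shuffle(pairs))
print(counts)
```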


Question 8

True or False: Information Retrieval or text analytics is NOT a form of data mining.

Select one:

True 

False

 

The correct answer is 'False'.


Question 9

Which of the following is NOT a statistical processing software package?

Select one:

  1. SAS
  2. Minitab
  3. Vertica
  4. Mahout

 

The correct answer is: Vertica


Question 10

True or False: NoSQL databases provide greater performance at the expense of availability.

Select one:

True 

False

 

The correct answer is 'False'.


Question 11

True or False: Residual plots are a useful tool for identifying non-linearity.

Select one:

True 

False

 

The correct answer is 'True'.
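This NumPy sketch shows why residual plots expose non-linearity. It assumes hypothetical data generated from y = x². A straight line is fitted with `np.polyfit`, and the residuals (observed minus fitted values) are computed. Plotting `residuals` against `x` would show the pattern described in the comments.

```python
import numpy as np

# Hypothetical data with a curved relationship: y = x**2 on 13 evenly spaced points.
x = np.linspace(-3, 3, 13)
y = x ** 2

# Fit a straight line by least squares; for deg=1 polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# A well-specified linear model leaves residuals scattered randomly around 0.
# Here they are positive at both ends and negative in the middle: the U shape
# that a residual plot (residuals vs. x or vs. fitted values) reveals.
print(np.round(residuals, 2))
```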


Question 12

Which command will provide descriptive statistics for the Boston data frame?

Select one:

  1. summary(Boston)
  2. eval(Boston)
  3. coef(Boston)
  4. stats(Boston)

 

The correct answer is: summary(Boston)


Question 13

Which of the following functions is used to generate a linear regression model within R?

Select one:

  1. predict()
  2. lm()
  3. lstat()
  4. glm()

 

The correct answer is: lm()


Question 14

True or False: Collinearity refers to a situation in which two or more predictor variables are closely related to each other.

Select one:

True 

False

 

The correct answer is 'True'.
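One quick way to spot collinearity is to inspect the correlation matrix of the predictors. In this NumPy sketch the predictors are simulated and hypothetical. `x2` is built as `x1` plus a little noise, so the two are nearly collinear, while `x3` is generated independently for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictors: x2 is x1 plus small noise, so x1 and x2 are collinear.
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)  # independent predictor for comparison

# Rows are variables; entry [i, j] is the correlation between predictors i and j.
corr = np.corrcoef([x1, x2, x3])
print(np.round(corr, 2))
```

A correlation near 1 between two predictors makes it hard for a regression to separate their individual effects on the response.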


Question 15

True or False: In the KNN algorithm, a small value for K provides the most flexible fit (low bias/high variance).

Select one:

True 

False

 

The correct answer is 'True'.
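The pure-Python sketch below illustrates the flexibility trade-off using hypothetical one-dimensional data that includes one noisy label. With k = 1 every training point is its own nearest neighbor, so the fit reproduces the training labels exactly, noise included (low bias, high variance). With k equal to the training-set size, every prediction collapses to the majority class, giving a rigid, high-bias fit.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Predict a label for `query` by majority vote of its k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def training_accuracy(train, k):
    """Fraction of training points whose own label is reproduced by k-NN."""
    hits = sum(knn_predict(train, x, k) == y for x, y in train)
    return hits / len(train)

# Hypothetical 1-D data as (feature, label); the "B" at 2.5 acts as noise.
train = [(1.0, "A"), (2.0, "A"), (2.5, "B"), (3.0, "A"),
         (6.0, "B"), (7.0, "B"), (8.0, "B")]

for k in (1, 3, len(train)):
    print(f"k={k}: training accuracy = {training_accuracy(train, k):.2f}")
```

Training accuracy falls from 1.00 (k = 1) to 0.86 (k = 3) to 0.57 (k = 7). A perfect training fit at k = 1 is a symptom of high variance, not evidence of a good model. Choosing k is usually done with held-out data.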


Question 16

The names() function within R:

Select one:

  1. Lists all of the column names in the data frame provided as an argument to the function.
  2. Attaches the names to make the variables in the data frame available by name.
  3. Displays the names of the classes identified by the K means clustering algorithm.
  4. None of these answers

 

The correct answer is: Lists all of the column names in the data frame provided as an argument to the function.


Question 17

You have a dataset which produces the following plot and you need to create a predictive model. Which of the following techniques are you most likely to use?

Select one:

  1. Linear Regression
  2. Curvilinear Regression
  3. K-Nearest Neighbors
  4. Logistic Regression

 

The correct answer is: Linear Regression


Question 18

True or False: The library() function lists all of the libraries that are loaded into memory within R.

Select one:

True 

False

 

The correct answer is 'False'.


Question 19

Residual plots are a useful tool for identifying:

Select one:

  1. Non-linearity
  2. Linearity
  3. Polynomial relationships
  4. Non-parametric relationships

 

The correct answer is: Non-linearity


Question 20

Which of the following is an example of a parametric approach?

Select one:

  1. KNN Classifier
  2. Bayes Classifier
  3. Linear Regression
  4. Principal Component Analysis

 

The correct answer is: Linear Regression


Question 21

A linear regression model is expressed as y ≈ β0 + β1x, where β0 is the intercept and β1 is the slope of the line. The least squares coefficients can be computed with the following equations, where x̄ and ȳ are the means of x and y:

  β1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
  β0 = ȳ - β1x̄

Using the following set of data, find the coefficients β0 and β1, rounded to the nearest thousandth, and the predicted value of y when x is 10.

{(-1, 0), (0, 2), (1, 4), (2, 5)}

  1. a = Answer_
  2. b = Answer_
  3. y = Answer_ when x is 10
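The coefficients for this question can be checked with a short Python sketch that implements the standard closed-form ordinary least squares formulas. β1 is the sum of centered cross-products of x and y divided by the centered sum of squares of x, and β0 = ȳ - β1x̄. The question does not state which of the blanks `a` and `b` corresponds to β0 and β1, so the sketch reports the intercept and slope by name.

```python
def least_squares(points):
    """Return (beta0, beta1) for y ≈ beta0 + beta1*x using closed-form OLS."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Centered cross-products and centered sum of squares of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

data = [(-1, 0), (0, 2), (1, 4), (2, 5)]
beta0, beta1 = least_squares(data)
y_at_10 = beta0 + beta1 * 10
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}, y(10) = {y_at_10:.3f}")
```

For this data x̄ = 0.5 and ȳ = 2.75, the centered sums are 8.5 and 5. The sketch therefore prints β0 = 1.900, β1 = 1.700, and a predicted y of 18.900 at x = 10.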


Question 22

The values of x and their corresponding values of y are shown in the table below. Identify the linear regression model in the form y = mx + b, and report the values of m (slope) and b (intercept) as well as the estimated value of y when x is 10.

  1. b = Answer_
  2. m = Answer_
  3. y = Answer_

 


Question 23

What R command could we use to generate a scatterplot diagram of our data to determine if it forms a linear pattern that would be suitable for linear regression or a non-linear pattern that would require some other technique?

Select one:

  1. plot()
  2. hist()
  3. matrix()
  4. summary()

 

The correct answer is: plot()


Question 24

The values of x and their corresponding values of y are shown in the table below. Identify the linear regression model in the form y = mx + b, and report the values of m (slope) and b (intercept) as well as the estimated value of y when x is 3. Round to the nearest hundredth.

  1. b = Answer_
  2. m = Answer_
  3. y = Answer_

 


Question 25

The income of a company that produces disaster equipment has been expressed as a linear regression model based on a single input variable: the number of hurricanes projected for the upcoming hurricane season. The model is expressed as Y = mX + b, where Y is the estimated sales in millions of dollars, m = 0.76, and b = 5. If the weather service is predicting 6 hurricanes during the season, what sales, in millions of dollars, should the company expect?

  1. Answer: _ million dollars

 

The correct answer is: 9.56
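Substituting X = 6, m = 0.76, and b = 5 into the model is a single multiplication and addition:

```latex
Y = mX + b = 0.76 \times 6 + 5 = 4.56 + 5 = 9.56
```

The expected sales are therefore 9.56 million dollars.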


Question 26

True or False: The following data plot represents data that is linearly separable.

Select one:

True 

False

 

The correct answer is 'False'.


Question 27

True or False: Linear regression is considered a non-parametric approach.

Select one:

True 

False

 

The correct answer is 'False'.


Question 28

True or False: The fix() function identifies inconsistent values within a data frame and automatically corrects them.

Select one:

True 

False

 

The correct answer is 'False'.


Question 29

A farmer’s corn yield is expressed as a linear regression model based on a single input variable: the number of days of sunlight during the growing season. The model is expressed as Y = mX + b, where Y is the estimated corn yield in bushels per acre, m = 1.38, and b = 42. If 67 days of sun are predicted for the growing season, what will the corn yield be in bushels per acre?

  1. Answer: _ bushels per acre

 

The correct answer is: 134.46
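Substituting X = 67 days, m = 1.38, and b = 42 gives the worked arithmetic:

```latex
Y = mX + b = 1.38 \times 67 + 42 = 92.46 + 42 = 134.46
```

The predicted yield is therefore 134.46 bushels per acre.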


Question 30

True or False: Logistic regression can be used to predict a continuous variable.

Select one:

True 

False

 

The correct answer is 'False'.


Question 31

True or False: Data Mining can be said to be a process designed to detect patterns in data sets.

Select one:

True 

False

 

The correct answer is 'True'.


Question 31

True or False: In unsupervised learning, the learning algorithm must be trained using data attributes that have been paired with an outcome variable.

Select one:

True 

False

 

The correct answer is 'False'.


Question 32

True or False: Unsupervised learning involves building a statistical model for predicting or estimating an output based on one or more inputs.

Select one:

True 

False

 

The correct answer is 'False'.


Question 33

Regression analysis involves developing a model in which one or more inputs are used to predict an output variable. In this context, regression represents what kind of learning?

Select one:

  1. Reinforcement learning
  2. Supervised learning
  3. Unsupervised learning
  4. Hybrid Learning

 

The correct answer is: Supervised learning


Question 34

Suppose we have a data set that includes sales data for every customer over several years, and we want to use this data to predict future sales. Which technique would be the most appropriate to investigate?

Select one:

  1. Classification
  2. Regression
  3. Clustering
  4. Decision Trees

 

The correct answer is: Regression


Question 35

Assume that you have a variety of data, including medical history, diet, and hereditary factors, on individuals who developed cancer, and you want to use this data to determine whether a person is likely to develop cancer. Which technique would be the most promising to start with?

Select one:

  1. Classification
  2. Regression
  3. Clustering
  4. Estimation

 

The correct answer is: Classification


Question 36

Which of the following is an example of an unsupervised learning algorithm?

Select one:

  1. Linear Regression
  2. ID3 Decision Tree
  3. K-Means
  4. K-Nearest Neighbors

 

The correct answer is: K-Means
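This is a minimal NumPy sketch of k-means (Lloyd's algorithm) on hypothetical, unlabeled two-dimensional data. The algorithm only ever sees the feature values and no outcome variable is supplied, which is what makes it unsupervised. For a deterministic demo the sketch uses a simple farthest-point initialization. That choice is an assumption of this sketch; library implementations such as scikit-learn's `KMeans` default to k-means++ initialization instead.

```python
import numpy as np

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[dist.argmax()])
    return np.array(centroids)

def k_means(points, k, n_iter=20):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    centroids = farthest_point_init(points, k)
    for _ in range(n_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# Hypothetical unlabeled data: two well-separated blobs, no outcome variable.
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(20, 2)),
])
labels, centroids = k_means(points, k=2)
print(labels)
print(np.round(centroids, 2))
```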


Question 37

True or False: A prediction outcome variable must be categorical.

Select one:

True 

False

 

The correct answer is 'False'.


Question 38

Which of the following is NOT a machine learning technique?

Select one:

  1. Regression
  2. Clustering
  3. Linear Components Analytics
  4. Neural Networks

 

The correct answer is: Linear Components Analytics


Question 39

True or False: In a supervised learning model, Bias refers to the error that is introduced from the assumptions of the data analyst.

Select one:

True 

False

 

The correct answer is 'False'.


Question 40

The objective of Answer _ is to identify valid, novel, potentially useful, and understandable correlations and patterns in existing data.

 

The correct answer is: data mining
