
Instructions

• Please answer both questions.

• For both questions, Assignment4.Rmd fits the models and generates posterior samples. The seed number is fixed, so everyone has the same simulation output to answer the questions. Do not change the seed number in Assignment4.Rmd.

• You will need to use R for some questions, but there is no need to include R code. However, you must show (in plain language or mathematical expressions) how you obtained the values.

• Make sure you include the cover sheet.


1. The aim is to study how the occurrences of particular words or expressions are related to spam email. Assignment4.Rmd fits a logistic regression and displays the summary statistics of the simulation. Answer (a), (b), (c) and (f) using the output of this simulation.

(a) Express the probability of each binary outcome of yesno in terms of the covariates, the intercept a and the coefficients (b1, b2, b3). [3 marks]

(b) Express the prior densities. [2 marks] 

(c) Among the three words/symbols ('money', $ and !), which one is the most important predictor in identifying spam email? Explain the reason for your answer. [3 marks]

(d) For each of 'money', $ and !, find the mean proportion of occurrences in spam emails. [3 marks]

(e) In (d), which word or symbol appears most frequently on average in spam emails? [1 mark]

(f) Your friend claims that the word/symbol that appears most frequently in spam emails is the most important indicator for recognizing a spam email. Is this claim supported by the dataset? Explain the reason for your answer. [3 marks]
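
For reference, the general form of a logistic regression with three covariates, as referred to in (a), can be sketched as follows. This is only the standard textbook form, not necessarily the exact parameterization in Assignment4.Rmd. The covariate symbols x_{i1}, x_{i2}, x_{i3} (standing for the occurrences of 'money', $ and ! in email i, in that order) and the coding yesno = 1 for spam are assumptions made here for illustration.

```latex
% Assumed notation: x_{i1}, x_{i2}, x_{i3} are the covariates for email i;
% yesno_i = 1 is assumed to denote spam.
\begin{align*}
\Pr(\mathrm{yesno}_i = 1 \mid x_i) &= p_i
  = \frac{\exp(a + b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i3})}
         {1 + \exp(a + b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i3})}, \\
\Pr(\mathrm{yesno}_i = 0 \mid x_i) &= 1 - p_i
  = \frac{1}{1 + \exp(a + b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i3})}.
\end{align*}
```

Equivalently, the log-odds log(p_i / (1 - p_i)) are linear in the covariates, which is why each coefficient is read as a change in log-odds per unit change in its covariate.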

