R Language - RStudio
I need help with one of the assignments for my Data Science and Big Data course.
I have a dataset of VPN-nonVPN traffic and need help with the following in R:
Preprocessing activities, Feature Selection / Engineering:
1. Plot a variable importance plot showing the 10-20 most important features in R
2. Plot partial dependence plots for the 3-5 most important features in R
3. How did you select features in R?
4. Did you make any important feature transformations in R?
5. Did you find any interesting interactions between features?
6. Did you use external data? (if permitted)
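For items 1 and 2, one possible starting point is the randomForest package, which provides `varImpPlot()` and `partialPlot()`. The sketch below is only an illustration: the assignment's dataset is not shown here, so it uses synthetic stand-in data, and the column names (`feat_1`, ..., `label`) and class labels are hypothetical. Choosing a random forest is also an assumption, not a requirement of the assignment.

```r
# Sketch only: synthetic stand-in data, NOT the real VPN-nonVPN dataset.
library(randomForest)

set.seed(42)
n <- 500
p <- 25
X <- as.data.frame(matrix(rnorm(n * p), nrow = n))
names(X) <- paste0("feat_", seq_len(p))   # hypothetical feature names

# Hypothetical label that depends mostly on feat_1, feat_2 and feat_3
score <- 2 * X$feat_1 - 1.5 * X$feat_2 + X$feat_3 + rnorm(n, sd = 0.5)
traffic <- data.frame(X, label = factor(ifelse(score > 0, "VPN", "nonVPN")))

# importance = TRUE is needed for the permutation-based importance (type = 1)
rf <- randomForest(label ~ ., data = traffic, ntree = 200, importance = TRUE)

# 1. Variable importance plot restricted to the top 15 features
varImpPlot(rf, n.var = 15, type = 1,
           main = "Top 15 features (mean decrease in accuracy)")

# 2. Partial dependence plots for the 3 most important features
imp <- importance(rf, type = 1)
top_vars <- rownames(imp)[order(imp[, 1], decreasing = TRUE)][1:3]
for (v in top_vars) {
  # do.call passes the column name as a value; partialPlot() would otherwise
  # capture the unevaluated loop variable name instead of its contents
  do.call(partialPlot, list(x = rf, pred.data = traffic, x.var = v,
                            which.class = "VPN", xlab = v,
                            main = paste("Partial dependence on", v)))
}
```

With the real data, `traffic` would be replaced by the loaded data frame and `label` by the actual class column; `n.var` can be set anywhere in the 10-20 range the assignment asks for.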
Training methods:
1. What training methods did you use?
2. Did you ensemble the models?
3. If you did ensemble, how did you weight the different models?
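If ensembling is used, one simple way to weight the models is to blend their predicted class probabilities and choose the weight on a validation split. The sketch below assumes two models (a logistic regression and a random forest) and synthetic stand-in data; the column names, class labels, and the 0.1 weight grid are all illustrative assumptions.

```r
# Sketch only: synthetic stand-in data and an assumed two-model ensemble.
library(randomForest)

set.seed(1)
n <- 600
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$label <- factor(ifelse(d$x1 + d$x2^2 + rnorm(n, sd = 0.5) > 1, "VPN", "nonVPN"),
                  levels = c("nonVPN", "VPN"))

idx   <- sample(n, 400)
train <- d[idx, ]
valid <- d[-idx, ]

glm_fit <- glm(label ~ ., data = train, family = binomial)
rf_fit  <- randomForest(label ~ ., data = train, ntree = 200)

# Probability of the "VPN" class from each model on the validation split
p_glm <- predict(glm_fit, newdata = valid, type = "response")
p_rf  <- predict(rf_fit, newdata = valid, type = "prob")[, "VPN"]

# Grid search over the blend weight w given to the random forest
weights <- seq(0, 1, by = 0.1)
acc <- sapply(weights, function(w) {
  p_blend <- w * p_rf + (1 - w) * p_glm
  mean(ifelse(p_blend > 0.5, "VPN", "nonVPN") == valid$label)
})
best_w <- weights[which.max(acc)]
cat("Best weight for the random forest:", best_w, "\n")
```

Because the weight is tuned on the validation split, the accuracy at `best_w` is optimistic; a separate test split would be needed to report an honest score.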
A6. Interesting findings
4. What was the most important trick you used?
5. What do you think set you apart from others in the competition?
6. Did you find any interesting relationships in the data that don't fit in the sections above?
Many customers are happy to trade off model performance for simplicity. With this in mind:
1. Is there a subset of features that would get 90-95% of your final performance? Which features? *
2. Which model was most important? *
3. What would the simplified model score?
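One way to approach these three questions is to refit the model on only the top-ranked features and compare its held-out accuracy with the full model. This is a sketch under assumptions: synthetic stand-in data, hypothetical column names, a random forest as the model, and an arbitrary choice of 5 features for the simplified version.

```r
# Sketch only: synthetic stand-in data, NOT the real VPN-nonVPN dataset.
library(randomForest)

set.seed(7)
n <- 600
p <- 20
X <- as.data.frame(matrix(rnorm(n * p), nrow = n))
names(X) <- paste0("feat_", seq_len(p))   # hypothetical feature names
score <- 2 * X$feat_1 - 1.5 * X$feat_2 + X$feat_3 + rnorm(n, sd = 0.5)
d <- data.frame(X, label = factor(ifelse(score > 0, "VPN", "nonVPN")))

idx   <- sample(n, 400)
train <- d[idx, ]
test  <- d[-idx, ]

# Full model on all features, with permutation importance enabled
full <- randomForest(label ~ ., data = train, ntree = 200, importance = TRUE)

# Keep the 5 features with the highest mean decrease in accuracy
imp   <- importance(full, type = 1)
top_k <- rownames(imp)[order(imp[, 1], decreasing = TRUE)][1:5]

# Simplified model trained on the selected subset only
simple <- randomForest(reformulate(top_k, response = "label"),
                       data = train, ntree = 200)

accuracy <- function(model, data) mean(predict(model, newdata = data) == data$label)
acc_full   <- accuracy(full, test)
acc_simple <- accuracy(simple, test)
ratio      <- acc_simple / acc_full

cat(sprintf("Full: %.3f  Simplified: %.3f  Ratio: %.1f%%\n",
            acc_full, acc_simple, 100 * ratio))
```

Varying the number of retained features and watching `ratio` gives a direct answer to whether a small subset reaches 90-95% of the final performance.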
Accuracy metrics reporting, charts, Model Execution Time:
1. How long does it take to train your model?
2. How long does it take to generate predictions using your model?
3. How long does it take to train the simplified model (referenced in section A6)?
4. How long does it take to generate predictions from the simplified model?
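Training and prediction times can be measured with base R's `system.time()`. The sketch below uses synthetic stand-in data and a random forest purely as an example model; the same pattern would apply to the simplified model by swapping in its training call.

```r
# Sketch only: synthetic stand-in data and an assumed random forest model.
library(randomForest)

set.seed(3)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$label <- factor(ifelse(d$x1 - d$x2 + rnorm(n, sd = 0.5) > 0, "VPN", "nonVPN"))

idx   <- sample(n, 700)
train <- d[idx, ]
test  <- d[-idx, ]

# Time the training step; the assignment inside system.time() keeps the model
train_time <- system.time(
  rf <- randomForest(label ~ ., data = train, ntree = 200)
)

# Time the prediction step on the held-out rows
pred_time <- system.time(
  preds <- predict(rf, newdata = test)
)

cat(sprintf("Training:   %.3f s elapsed\n", train_time[["elapsed"]]))
cat(sprintf("Prediction: %.3f s elapsed\n", pred_time[["elapsed"]]))
```

The `elapsed` entry is wall-clock time; for short runs it can vary noticeably, so repeating the measurement a few times and reporting the average is usually more reliable.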