K Neighbours Essay Sample

a. How would this customer be classified?
A. This customer would be classified as not accepting the personal loan offer. The KNN_Output suggests overfitting: the training classification matrix shows no errors (Class 0 = 0%, Class 1 = 0%, Overall = 0%), while the validation matrix shows substantial error (Class 0 = 4.2%, Class 1 = 55.85%, Overall = 9.1%).
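The per-class and overall error percentages quoted above are read off a classification matrix. As a rough sketch of that arithmetic, the Python below computes them from a matrix of counts. The function name `error_rates` and the counts are hypothetical illustrations; the essay reports only percentages, so these numbers are not the essay's data.

```python
def error_rates(matrix):
    """Return (per-class, overall) error rates in percent.

    `matrix` is a classification matrix given as
    {actual_class: {predicted_class: count}}.
    """
    per_class = {}
    total = wrong = 0
    for actual, row in matrix.items():
        n = sum(row.values())
        misses = n - row.get(actual, 0)  # records predicted as another class
        per_class[actual] = 100.0 * misses / n if n else 0.0
        total += n
        wrong += misses
    overall = 100.0 * wrong / total if total else 0.0
    return per_class, overall


# Hypothetical validation counts (assumption, not taken from the essay).
validation = {0: {0: 950, 1: 50}, 1: {0: 60, 1: 40}}
per_class, overall = error_rates(validation)
print(per_class, overall)
```

A large gap between these figures on the training set and on the validation set is the symptom of overfitting that the answer describes.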

b. What is a choice of k that balances between overfitting and ignoring the predictor information?
A. A choice of k that balances between overfitting and ignoring the predictor information is k = 6. After testing various values of k, this value is chosen because it minimizes the validation error: according to the validation error log for different k, the training error at k = 6 is 7.4% and the validation error is 8.75%.
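Choosing k by minimizing validation error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny one-dimensional data set, the candidate values, and the helper names (`knn_predict`, `error_rate`, `best_k`) are all hypothetical, and the essay's own results came from a separate tool's KNN_Output rather than from this code.

```python
import math
from collections import Counter


def knn_predict(train, point, k):
    """Majority vote among the k training records nearest to `point`.

    Ties in the vote go to the class of the nearer neighbour.
    """
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rate(train, data, k):
    """Percentage of records in `data` misclassified by k-NN built on `train`."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return 100.0 * wrong / len(data)


def best_k(train, valid, candidates):
    """Return the candidate k with the lowest validation error, plus all errors."""
    errors = {k: error_rate(train, valid, k) for k in candidates}
    return min(errors, key=errors.get), errors


# Hypothetical toy data: class 0 on the left, class 1 on the right,
# with one noisy training record at x = 3 labelled 1.
train = [((1,), 0), ((2,), 0), ((3,), 1), ((4,), 0), ((5,), 0),
         ((6,), 1), ((7,), 1), ((8,), 1), ((9,), 1), ((10,), 1)]
valid = [((2.9,), 0), ((3.2,), 0), ((1.5,), 0),
         ((6.5,), 1), ((8.5,), 1), ((4.6,), 0)]

k, errors = best_k(train, valid, [1, 3, 5])
print("training error at k=1:", error_rate(train, train, 1))
print("validation errors:", errors, "best k:", k)
```

With k = 1 each training record is its own nearest neighbour, so the training error is zero while the noisy record hurts the validation error; a very large k would instead drift toward predicting the majority class regardless of the predictors.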

c. Show the classification matrix for the validation data that results from using the best k.

d. Classify the customer using the best k.
A. Using the best k, the customer would be classified as not accepting the personal loan.

e. Re-partition the data, this time into training, validation, and test sets (50%: 30%: 20%). Apply the k-NN method with the k chosen above, and compare the classification matrix of the test set with that of the training and validation sets. Comment on the differences and their reason.
A. The training, validation, and test matrices show a steady increase in percentage error. There does not appear to be overfitting, given the minimal error discrepancies among the three matrices: there is a 5.69% difference between the training and validation errors, and a 14.05% difference between the validation and test errors. Based on the lift chart, the model still appears to add value, even though loan acceptance has an 82% error rate in the test classification matrix.

9.3

i. Compare the tree generated by the CT with the one generated by the RT. Are they different? (Look at structure, the top predictors, size of tree, etc.) Why?
A. According to the Regression Tree and Classification Tree Output, both trees use similar predictors, with age, kilometers, and horsepower as the most important car specifications. The regression tree is structurally larger than the classification tree. In the classification tree's training error report, the percentage error is 0% for all 20 bins. In the validation error report, approximately 1 bin has a 100% error rate, and the overall error is 74.88%. In the test error report, 4 bins have 100% error rates, and the overall error is 75.98%. The overall error increases slightly between the validation and test error reports, but there is clearly overfitting, given the large gap between the training and validation confusion matrices.

ii. Predict the price, using the RT and the CT, of a used Toyota Corolla with the specifications listed in Table 9.3.
A. After running both models, the predicted price was the same.
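One way to see how the two trees can agree on a price is to look at what each one returns at a leaf: a regression tree predicts the mean price of the training records in the leaf, while a classification tree predicts the most common price bin there. The Python sketch below illustrates this under loud assumptions: the leaf prices, the bin origin and width, and the use of equal-width bins are all hypothetical, since the essay does not state how its 20 bins were formed.

```python
from collections import Counter
from statistics import mean


def to_bin(price, low, width):
    """Index of the equal-width bin containing `price` (assumed binning scheme)."""
    return int((price - low) // width)


def rt_leaf_prediction(prices):
    """Regression tree: predict the mean price of the records in the leaf."""
    return mean(prices)


def ct_leaf_prediction(prices, low, width):
    """Classification tree: predict the most frequent price bin in the leaf."""
    bins = Counter(to_bin(p, low, width) for p in prices)
    return bins.most_common(1)[0][0]


# Hypothetical training prices that fall into one leaf (not the essay's data).
leaf_prices = [9500, 9900, 10250, 10400, 10800]
low, width = 4000, 1500  # hypothetical bin origin and bin width

rt_price = rt_leaf_prediction(leaf_prices)
ct_bin = ct_leaf_prediction(leaf_prices, low, width)
print("RT price:", rt_price)
print("CT bin:", ct_bin, "covering", low + ct_bin * width, "to", low + (ct_bin + 1) * width)
print("RT price falls in CT bin:", to_bin(rt_price, low, width) == ct_bin)
```

When the regression tree's mean lands inside the bin chosen by the classification tree, the two predictions are consistent, even though one is a single number and the other is a price range.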
