The aim of this investigation is to test the hypotheses that I shall develop, by handling data and managing it effectively to produce realistic results. The hypotheses are based on test results from a driving school. I intend to explore the success of males and females at the driving school by examining the number of one-hour lessons and the number of minor errors in a sample of 80 pupils. I will use this data to test several hypotheses.
The data I will use in this investigation was gathered from an unnamed driving school, and I am going to investigate the performance of its pupils. The raw data I have been given is in list form and records the performance of 240 pupils of both sexes. It includes:
* The number of one hour lessons before successful test.
* The total number of minor errors in test(s) taken.
* The name of the instructor (4 different teachers).
* The day of the test.
* The time of the day of the test.
The data is in no particular order and reflects a large range and variety of results and performances. Here is an example of the data:
Gender of Student | Number of One-Hour Lessons | Number of Minor Mistakes | Day of Test | Time of Day of Test
I have chosen to explore whether males are more successful than females or vice versa. I will break this line of enquiry down into three manageable hypotheses. I will analyse whether the gender of the candidate affects the number of errors and whether it affects the number of lessons required. I feel that by investigating these factors I will have enough evidence to give a clear answer to my investigation.
I am going to use the data given to carry out my analysis and test the following hypotheses:
* The more lessons you take, the fewer mistakes you will make
* Males perform better than females in the test
* One gender performs better because it has better instructors
In order to investigate my hypotheses and notice patterns within this large amount of data, it is necessary to summarise the 240 pupils’ data in diagrams and charts. To create these graphs I had to sort the data using Excel. I used Excel’s AutoFilter feature to pick out specific values within a field. For my first graph, I filtered the data on the first field, gender. This made it easy to count the number of males and females so that I could create the graph.
By looking at the graph we notice that there is an almost fifty-fifty split between the number of males and females taking the test. There are slightly more girls than boys, which is consistent with the male/female population ratio today.
In this graph we notice that instructor B taught the most pupils (100), while instructor A taught 60 and instructors C and D taught only 40 each.
In this graph we notice that Friday is the most popular day, with over sixty pupils attending on that day. The least popular day of the week is Tuesday, with over thirty pupils.
Below are the means and ranges of ‘Number of Lessons’ and ‘Number of Minor Mistakes’. I have summarised these fields in this way because they are too complex to put into a graph. I did not include ‘Time of day’ as its interpretation would become muddled.
[Table: mean and range of Number of Lessons and Number of Minor Mistakes for all 240 pupils]
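The mean and range described above can be worked out as in the minimal sketch below. The function name `mean_and_range` and the lesson counts are made up for illustration; they are not taken from the driving school data.

```python
from statistics import mean

def mean_and_range(values):
    """Return the mean and the range (largest minus smallest) of a list."""
    return mean(values), max(values) - min(values)

# Hypothetical lesson counts, NOT the real driving school figures.
lessons = [12, 20, 25, 31, 18]
average, spread = mean_and_range(lessons)
print(f"mean = {average:.2f}, range = {spread}")
```

The same function could be applied to the number of minor mistakes, and to the sample as well as the full data, so that the two sets of figures can be compared.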
I will take a stratified sample of 80 pupils. Sampling is necessary because there is too much data to deal with, so I need a proportional section that reflects the whole and gives a general picture of all the data. By using a stratified sample, and comparing its mean and range with those of the full data, I can reduce the danger of bias and give a fairly detailed answer to my hypotheses.
To select a proportional sample of 80 from the 240 pupils, I must find out how many pupils attended on each day, divide that number by 240 and multiply by 80. I will then round the result to the nearest whole number.
I will then total these results to check that I have a sample of exactly 80.
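The calculation for the stratified sample can be sketched as follows. The daily pupil counts here are hypothetical (chosen only so that they total 240), and the function name `stratified_allocation` is my own. Halves are rounded up, which is an assumption about how "nearest whole number" is applied.

```python
import math

def stratified_allocation(stratum_sizes, sample_size):
    """Share sample_size between the strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())
    allocation = {}
    for name, size in stratum_sizes.items():
        exact = size / total * sample_size
        # Round to the nearest whole number (halves rounded up: an assumption).
        allocation[name] = math.floor(exact + 0.5)
    return allocation

# Hypothetical counts per day, NOT the real driving school figures.
pupils_per_day = {"Monday": 45, "Tuesday": 33, "Wednesday": 48,
                  "Thursday": 51, "Friday": 63}
allocation = stratified_allocation(pupils_per_day, 80)
print(allocation)
print("total:", sum(allocation.values()))
```

Because each day is rounded separately, the total will not always come to exactly 80, which is why it is worth adding the results up as a check.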
I will now select the pupils from each day at random, using the numbers calculated above. To do this I will select every third person from the list of pupils for each day. I will make the selection random by first drawing a starting number from one to three out of a hat, and I will repeat this procedure for each day. If I come across missing data I will move on to the next available record and continue. Below is a screenshot for Tuesday showing the candidates I am using for the sample.
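Picking every third pupil from a random starting point can be sketched as below. This is only an illustration: the function name `systematic_sample` is hypothetical, missing records are represented by `None`, and I have assumed that after skipping a missing record the counting carries on from the record actually used.

```python
import random

def systematic_sample(records, step=3, start=None):
    """Pick every `step`-th record, beginning at a random position."""
    if start is None:
        start = random.randrange(step)  # plays the role of the number from the hat
    chosen = []
    i = start
    while i < len(records):
        # If the record is missing, move on to the next available one.
        while i < len(records) and records[i] is None:
            i += 1
        if i < len(records):
            chosen.append(records[i])
        i += step  # assumption: keep counting from the record actually used
    return chosen

# Hypothetical pupil numbers for one day; None marks a missing record.
tuesday = [0, 1, 2, None, 4, 5, 6, 7, 8]
print(systematic_sample(tuesday, step=3, start=0))
```

The same function would be run once for each day, with a new random start each time.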
Verification of Sample:
To check that my sample is proportional to the original data, I will create the same graphs and diagrams and compare them with those in the preliminary analysis. If they are fairly consistent with one another then my sample is representative and ready to use for my hypotheses.
Looking at this graph, there is, as before, an almost 50/50 split between males and females, with slightly more girls than boys. So the sample is proportional.
In this graph we notice that instructor B again taught the most pupils, instructor A again taught the second most, and instructors C and D taught similarly low numbers.
In this graph we notice that Friday is again the most popular day. The least popular day of the week is, as in the preliminary analysis, Tuesday, with just over 10 pupils.
The mean ‘Number of Lessons’ is almost the same as in the preliminary analysis, where it was 23.02917. The same goes for the range, which is only two off the previous value of 36. For ‘Number of Minor Mistakes’ the mean and range are also close to those in the preliminary analysis, which were 16.77093 and 36 respectively, although the range is a little further off than the other parts of the sample.
[Table: mean and range of Number of Minor Mistakes and Number of Lessons for the sample]
With these graphs and diagrams I can see the consistencies and similarities between the sample and the original data in the preliminary analysis, and with these results I am fairly confident that this sample is proportional and unbiased. I am now ready to use this sample to test my hypotheses.
Hypothesis 1: The more lessons you take, the fewer mistakes you will make.
This is the common assumption that with practice you improve. I will plot my sample’s data on a scatter graph, which will allow me to see whether there is the strong negative correlation I would expect.
From the graph above depicting my sample, you can see there is a very weak negative correlation, almost invisible without a line of best fit. Very few points lie on the line. This is a surprise to me, and there must be some reason why the correlation is so weak. One explanation is that some of the pupils who had the most lessons are those who are not as good at driving, so they need more lessons and still make a lot of mistakes. Perhaps females are obscuring the pattern and males show a strong negative correlation; this will be my next hypothesis.
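The strength of a correlation can also be measured with a number instead of being judged by eye. The sketch below works out Pearson’s correlation coefficient and the least-squares line of best fit. This is not the method used for the graphs in this report, the function names are my own, and the data is made up so that the points lie exactly on a line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient: -1 is perfect negative, +1 is perfect positive."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of best fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical (lessons, mistakes) pairs, NOT the real sample.
lessons = [1, 2, 3, 4]
mistakes = [8, 6, 4, 2]
print(pearson_r(lessons, mistakes))
print(best_fit_line(lessons, mistakes))
```

A value close to 0 would match a very weak correlation, while a value close to -1 would match the strong negative correlation that was expected.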
Hypothesis 2: Males perform better in tests than females
Here I will split the graph from the first hypothesis to show the male and female correlations separately, which may show whether males perform better.
Here the correlation is a little stronger than it was but is still a weak correlation. Only about 4 points lie on the line of best fit.
The female graph has little or no correlation and is weaker than the male graph. Only 2 points lie on the line of best fit.
From the graphs I found that it was a female who made the highest number of mistakes, but also a female who made the lowest. I also found that more females than males made a high number of mistakes. The male graph has a slightly steeper line than the female graph, indicating better performance. However, the two graphs remain similar, so I will use box plots, which may let me compare the performance of each gender more clearly. To make a box plot I first need to find the quartiles of my data, and to do this I will make a stem and leaf diagram.
We can see from the box plot above that the males performed better than the females, as the females had a higher median and higher upper and lower quartiles for mistakes. From the evidence shown above it is clear to me that it supports my hypothesis that females make more mistakes than males. There could be a range of reasons to explain why men perform better than women. One of these reasons could be that one or more of the instructors are better at teaching pupils how to drive than the others. Maybe the females appeared to do worse in the test than the males because an instructor who is not particularly good at teaching had the greater part of the female candidates. I will investigate whether the instructor can influence results in my next hypothesis.
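The five values needed for a box plot can be found as in the sketch below. It assumes the "median of each half" method for the quartiles, where the middle value is left out of both halves when there is an odd number of values. Other textbooks use slightly different rules, so the quartiles may differ a little from ones read off a stem and leaf diagram. The function name and the data are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Skip the middle value when the number of values is odd.
    upper_half = data[half + (n % 2):]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])

# Hypothetical numbers of minor mistakes, NOT the real sample.
mistakes = [7, 1, 4, 2, 6, 3, 5]
print(five_number_summary(mistakes))
```

Running this once for the male pupils and once for the female pupils would give the two sets of values to draw side by side.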
Hypothesis 3: Females performed poorly because of some instructor/s
To find out which instructor or instructors are influencing these results I must draw three scatter graphs for each instructor: one showing the performance of all their pupils, then one each for their male and female pupils so that they can be compared. There will be twelve scatter graphs overall. External factors are discussed after these graphs.
Above we see a weak negative correlation, but it is stronger than any other correlation we have seen so far. Instructor A is fairly consistent: the more lessons pupils have taken, the better they perform. I will separate this into male and female graphs and see what they show.
From this graph we see a stronger negative correlation; around 2 points lie on the line of best fit and the line is a little steeper, showing better performance than in the graph before and reflecting the instructor’s teaching ability. There are only around three anomalies, one of which has a high number of lessons. This could be due to the tiredness of the candidate or other external factors mentioned later.
In the graph above we see a weak negative correlation with a gentler slope than in the previous graphs. There are a lot of anomalies and only one point lies on the line. This supports my hypothesis that males perform better than females with Instructor A; however, this could be attributed to any of the external factors below.
In this graph for instructor B we see a weak negative correlation but a steep line of best fit. This shows good performance and teaching, so I will separate it by gender to show more. There are a lot of anomalies, but this could be due to the fact that Instructor B has more pupils than any other instructor.
From this graph you can see a strong negative correlation between the number of lessons with instructor B and the number of mistakes made by the male pupils. This is a fairly respectable result even if there are some strange anomalies towards the end, and I would say instructor B is a good teacher of males.
Above we see a very weak negative correlation with no points lying on the line of best fit. Again I could use this graph as further evidence for my previous hypothesis that men perform better in the test than women, but it could also show that instructor B teaches men better than women. After looking at instructor B’s graphs, I believe that instructor B is not as good an instructor as instructor A, as A taught females better. However, this could be due to any of the external factors listed later.
Here we see a very strong negative correlation with only about one or two anomalies; around four points lie on the line of best fit. The line is steep, indicating that instructor C is a good teacher.
Here we see a strong negative correlation with a very steep line. I think Instructor C is good at teaching males, who had taken a lot of lessons.
In the graph above we see a perfect negative correlation. This line is better than the male one and does not support my hypothesis; more males had more minor mistakes, but this could be explained by the time of day and other external factors. However, I think Instructor C is better than A or B so far at teaching both males and females, as both groups made the fewest mistakes.
From this graph we can see a fairly strong negative correlation; however, the line has a gentle slope and sits very high up despite the high number of lessons. This shows the instructor is not making much progress. No points lie on the line and there are numerous anomalies.
In this graph we see a very weak positive correlation for males. I did not expect this, as it shows that these pupils are actually getting worse; however, this could be due to the age and physical attributes of the candidates or some other external factor. This suggests that instructor D is not effective at teaching male candidates.
From the graph above we see a big difference compared to the males. Here we see a strong negative correlation. Even though the females had fewer lessons than the males they still performed better. This does not support my hypothesis that males perform better than females in the test. This instructor is better at teaching females than males.
There are several factors that could affect the performance of drivers in general. These include:
* The teacher. The experience and teaching methods of the instructor.
* The weather conditions. Rain will restrict the performance while sunny weather will increase the ease of driving.
* The time of day. The time of day may affect the level of light and hence the visibility.
* The level of alcohol the candidate may have consumed before his or her test and lessons.
* The tiredness of the candidate.
* The age and physical attributes of the candidate. Pupils with better vision may find driving easier than others.
* The experience of drivers. One driver may have had more tests than others and therefore know the procedure. Also, the number of lessons will affect the ability of the candidate.
The difference in performance between males and females may be affected by the following factors:
* Females may perform better than males because it is widely believed they have a longer concentration span.
* Males may be more confident at the wheel and therefore be less prone to make mistakes.
* Females could have come from a learning deficiency school.
* Males could have come from a rural background and might have had more experience driving using tractors or other machinery.
* Perhaps an instructor is sexist and marks males or females more leniently.
After completing all the graphs I have come to the conclusion that Instructor C was the best instructor for teaching both males and females, because that instructor’s pupils made few mistakes and improved greatly with more lessons. The worst instructor was Instructor D, who was the only instructor to produce a positive correlation. Instructor B was the worst instructor for teaching females. In summary, for my first hypothesis I found that more lessons means fewer mistakes, although the correlation was weak, and for my second hypothesis I found that males do perform better than females. However, having studied the graphs for Instructors A and B, who taught the majority of the female sample less effectively than they taught males, I believe that the reason females perform worse is the teaching of Instructors A and B.