One interesting application of statistics is identifying the authors of historical works that were published under pseudonyms. A classic paper on this topic is “On Sentence-Length as a Statistical Characteristic of Style in Prose: With Application to Two Cases of Disputed Authorship” by G. Udny Yule, published in the journal Biometrika (Vol. 30, No. 3/4, Jan. 1939, pp. 363–390). In that paper, Yule identified sentence length as a feature that tends to remain consistent across written works by the same author.
For this project, you are going to figure out how to estimate the distribution of sentence lengths for a book of your choosing.
Find a book that is mainly text and relatively uncluttered with pictures, graphs, etc.
Choose a random sample of 20 sentences from throughout the book and count the number of words in each sentence.
Use two different sampling methods chosen from the following: simple random sampling, stratified sampling, cluster sampling, or systematic sampling.
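As an illustration, two of these methods can be sketched in Python. This is only one possible approach, and it assumes you have numbered every sentence in the book from 0 to N - 1. The function names and the `population_size` parameter are made up for this example; they are not part of the assignment.

```python
import random


def simple_random_sample(population_size, n, seed=None):
    """Pick n distinct sentence numbers, each equally likely to be chosen."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(population_size), n))


def systematic_sample(population_size, n, seed=None):
    """Pick a random starting sentence, then take every step-th sentence."""
    rng = random.Random(seed)
    step = population_size // n   # spacing between selected sentences
    start = rng.randrange(step)   # random start within the first interval
    return [start + i * step for i in range(n)]


# Hypothetical book with 5000 numbered sentences (an assumed figure).
srs_picks = simple_random_sample(5000, 20, seed=1)
sys_picks = systematic_sample(5000, 20, seed=1)
```

Both functions return sentence numbers rather than the sentences themselves, so you still need to look up each selected sentence and count its words. If numbering every sentence is impractical, the same idea could be applied to page numbers and line numbers instead.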
For each of your two sampling methods, make a table showing how many sentences there were of each length (two words, three words, etc.) and answer the following questions:
- Explain exactly how you chose your samples.
- Explain which of your two methods was easier to use.
- Of the methods you did not use, explain which of them would have been the most difficult to use.
- Do you think that either of your methods produced biased results? Explain.
- Report your results for each of the two methods. Are they generally in agreement with each other?
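
If you record your sampled sentences electronically, the table of sentence lengths could be built along the following lines. This is a minimal sketch: it assumes a "word" is anything separated by whitespace, and the four sentences are invented placeholders, not data from any real book. The names `word_count` and `length_table` are hypothetical.

```python
from collections import Counter


def word_count(sentence):
    """Count whitespace-separated words in one sentence."""
    return len(sentence.split())


def length_table(sentences):
    """Return (length, number of sentences) pairs, sorted by length."""
    counts = Counter(word_count(s) for s in sentences)
    return sorted(counts.items())


# Invented placeholder sentences, used only to show the table layout.
sample = [
    "It was late.",
    "Nobody spoke for a while.",
    "The rain had stopped.",
    "She closed the book.",
]

for length, freq in length_table(sample):
    print(f"{length:>3} words: {freq}")
```

Running the same function on each of your two samples gives two tables in the same format, which makes it easier to judge whether the methods agree.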