Upon completion of the week 3 lesson, you will be able to:
- Explain the concept of a clustering algorithm
- Discuss different types of clustering algorithms
- Understand the pros and cons of clustering algorithms
This week's reading assignment is Chapter 4 of the course textbook (EMC Education Services (Eds.). (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing, and Presenting Data. Indianapolis, IN: John Wiley & Sons, Inc.).
Clustering is the task of dividing a population or set of data points into groups such that the data points in the same group are more similar to one another than to those in other groups. In simple words, the aim is to find data points with similar traits and assign them to the same cluster.
Let's understand this with an example. Suppose you are the head of a rental store and wish to understand the preferences of your customers in order to scale up your business. Is it possible for you to look at the details of each customer and devise a unique business strategy for each one of them? Definitely not. What you can do, however, is cluster all of your customers into, say, 10 groups based on their purchasing habits and use a separate strategy for the customers in each of these 10 groups. This is what we call clustering.
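To make the rental store example concrete, here is a minimal sketch of k-means, one common clustering algorithm chosen here only for illustration. The customer data, the two features (rentals per month and average spend per rental), and the choice of 2 groups instead of 10 are all hypothetical and kept small so the result is easy to check by hand.

```python
import math

def kmeans(points, k, iterations=20):
    """Minimal k-means sketch: returns (centroids, labels)."""
    # Assumption: start from the first k points so the run is deterministic.
    # Practical implementations usually pick starting centroids more carefully.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels

# Hypothetical customers: (rentals per month, average spend per rental)
customers = [(1, 5.0), (2, 6.0), (1, 4.5), (9, 20.0), (10, 22.0), (8, 19.0)]
centroids, labels = kmeans(customers, k=2)

for customer, label in zip(customers, labels):
    print(customer, "-> group", label)
```

With this data, the three occasional renters end up in one group and the three frequent renters in the other, so the store could design one strategy per group rather than one per customer.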
Dimensionality reduction is the process of converting data of very high dimensionality (many attributes) into data of much lower dimensionality such that each of the remaining dimensions conveys much more information. Dimensionality reduction can improve the performance of data analysis algorithms, both in speed and in the quality of the results.
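As a small illustration of this idea, the sketch below reduces two attributes to one by projecting each data point onto the direction of greatest variance. This is the idea behind principal component analysis (PCA), which is only one of several dimensionality reduction techniques and is used here as an assumption for the example. The data points are hypothetical, and the closed-form angle formula applies only to the two-attribute case.

```python
import math

def first_principal_component(points):
    """Project 2-D points onto their direction of greatest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # One coordinate per point: its position along that direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores

# Hypothetical data in which the second attribute is twice the first.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
direction, scores = first_principal_component(points)

print("direction:", direction)
print("1-D coordinates:", scores)
```

Because the two attributes in this toy data are perfectly correlated, the single new coordinate keeps all of the variation in the original two, which is the sense in which a lower dimension can convey more information.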
Choose four data preparation and transformation methods and write a comparative study of them. Refer to the assigned readings. In addition, you are required to conduct your own research to learn about data normalization techniques, data type conversion, and attribute and instance selection.
APA style is required for the body of this assignment. Solid academic writing is expected, and sources should be documented using APA formatting guidelines, which can be found in the APA Style Guide, located in the Student Success Center.