Visual Analytics - Cluster Analysis (2)
In the second part of this lecture, we elaborate on visualization techniques for assessing clustering results. For example, we explain refinements of parallel coordinates that are designed to better convey multidimensional data, e.g., the distributions of values of instances belonging to different clusters.

Moreover, we discuss outlier detection and visualization. Outliers may be erroneous values that can significantly affect the results of analytical methods, such as the determination of correlations. Thus, they are often removed in a pre-processing step. However, care is necessary to avoid removing valid and interesting data: what actually counts as an outlier may depend strongly on the context. Whether a data record is an outlier can also be answered in a more nuanced way than with a plain yes or no. Some methods distinguish between mild and strong outliers; others compute an outlierness value. Outliers may be detected as a side effect of some clustering methods, e.g., DBSCAN treats all elements that do not belong to a cluster as outliers. However, if the goal is to identify outliers, there are better methods that search for them directly. These methods are based on a model, i.e., on assumptions about what characterizes an outlier. Accordingly, there are distance-based and density-based methods, some of which have parameters that need to be chosen carefully. Visual representations are again essential to assess whether the "right" outliers were identified. A related problem, discussed only briefly, is rare subgroup detection, an analytic technique that is used, for example, in online learning to identify small subgroups that would benefit from special support.

Chapters:
00:00 - Visualization of Clustering Results
12:16 - Outlier Detection
29:43 - Methods
40:40 - Results, Validation and Discussion
46:10 - HD Data
47:25 - Summary, Outlook and References
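
The distinction between mild and strong outliers mentioned in the description can be illustrated with a small sketch. The description does not say which method the lecture uses; the example below assumes the common convention of Tukey's fences, where a value more than 1.5 interquartile ranges (IQR) outside the quartiles counts as mild and more than 3 IQR as strong. The function name `tukey_labels`, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
import statistics


def tukey_labels(values, mild_k=1.5, strong_k=3.0):
    """Label each value as 'inlier', 'mild', or 'strong' using IQR fences.

    Assumption: Tukey's fences with the conventional factors 1.5 and 3.0;
    the lecture may use a different rule.
    """
    # First and third quartile; the data are sorted internally.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for v in values:
        # Distance of v outside the interquartile box (0 if inside it).
        excess = max(q1 - v, v - q3, 0.0)
        if excess > strong_k * iqr:
            labels.append("strong")
        elif excess > mild_k * iqr:
            labels.append("mild")
        else:
            labels.append("inlier")
    return labels


# Made-up sample data for illustration only.
sample = [10, 11, 12, 12, 13, 13, 14, 15, 20, 40]
labels = tukey_labels(sample)
for value, label in zip(sample, labels):
    print(value, label)
```

With this sample, the value 20 falls between the two fences and is labeled mild, while 40 lies beyond the outer fence and is labeled strong. A rule like this looks at one dimension at a time, so it cannot replace the distance-based and density-based methods for multidimensional data that the lecture covers.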