The scatter plot above is from an ongoing project of mine called “qspace”; it provides insight into the “query space” represented in a set of queries (~6k in this run).
In the scatter plot, each dot represents a query (or, more accurately, a stemmed/pre-processed query), and dots of the same color are search queries that have been assigned to the same cluster. The terms in the legend are the top two bi-grams for each cluster.
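As a rough illustration of how legend labels like these could be derived, here is a minimal sketch that counts bi-grams within each cluster and keeps the two most frequent. The function names, the sample queries, and the cluster labels are all made up for the example, and it assumes the queries are already stemmed and whitespace-tokenized; it is not the actual qspace code.

```python
from collections import Counter, defaultdict


def bigrams(query):
    """Return adjacent word pairs from a whitespace-tokenized query."""
    tokens = query.split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def top_bigrams_by_cluster(queries, labels, n=2):
    """Map each cluster label to its n most frequent bi-grams."""
    counts = defaultdict(Counter)
    for query, label in zip(queries, labels):
        counts[label].update(bigrams(query))
    return {
        label: [bigram for bigram, _ in counter.most_common(n)]
        for label, counter in counts.items()
    }


# Hypothetical, already-stemmed queries and their cluster assignments.
queries = [
    "machine learn algorithm",
    "machine learn tutorial",
    "best machine learn algorithm",
    "learn algorithm example",
    "data visual tool",
    "data visual python",
    "python data visual tool",
]
labels = [3, 3, 3, 3, 7, 7, 7]

legend = top_bigrams_by_cluster(queries, labels)
for label, terms in sorted(legend.items()):
    print(label, ", ".join(terms))
```

Single-word queries simply contribute no bi-grams, so a cluster made up only of one-word queries would end up without a label and would need a fallback (e.g. top unigrams).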
The other two plots show 1) how “tight” the clusters are, as measured by the silhouette coefficient (silhouette plot), and 2) the number of queries in each cluster (cluster histogram). Large clusters (e.g. #3, aka “machine learn, learn algorithm”) can be broken down further by feeding only the queries from that cluster back through the same clustering pipeline used on the larger set.
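For readers unfamiliar with the silhouette coefficient, the sketch below computes it from scratch for a toy dataset, along with the per-cluster counts that a cluster histogram would show. For each point, `a` is the mean distance to the other points in its own cluster, `b` is the mean distance to the nearest other cluster, and the score is `(b - a) / max(a, b)`. The points, labels, and the choice of Euclidean distance are assumptions made for illustration; the post does not say which metric or library the real pipeline uses.

```python
import math
from collections import Counter, defaultdict


def silhouette_samples(points, labels):
    """Per-point silhouette scores; expects at least two distinct clusters."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


# Hypothetical 2D points in two well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]

scores = silhouette_samples(points, labels)
sizes = Counter(labels)  # the data behind a cluster histogram

for label in sorted(sizes):
    members = [s for s, l in zip(scores, labels) if l == label]
    print(label, sizes[label], round(sum(members) / len(members), 3))
```

Scores near 1 indicate tight, well-separated clusters; scores near 0 or below suggest overlapping clusters or misassigned points. This version is quadratic in the number of points, which is fine for a sketch but slow for large query sets.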
I initially created the components of the visualization to help evaluate the results of my cluster analysis. There are better plots for actually evaluating the “health” of the clusters (e.g. the “elbow” method), but the 3D scatter plot is a good final output: it provides concrete context, and some entertainment value, for conversations relating to SEM/SEO (as you might imagine, the spreadsheet view is relatively boring). For example, imagine exploring the queries/keywords that a site ranks well for, competitors’ positioning, etc.
The underlying dataset consists of queries from site visits plus additional queries cultivated from various seed sets.