I started playing around with a new visualization tool the other day and thought I would document it here.
If you are into getting hands-on with high-dimensional datasets, you should check out http://projector.tensorflow.org/. It hosts the Embedding Projector, which is part of TensorBoard.
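If you want to try the projector with your own data, it can load tab-separated files: one with the vectors and, optionally, one with metadata. The snippet below is a minimal sketch of producing those two files as strings. The function name, the column names and the toy data are all made up for illustration, and the header convention (no header on the vectors, a header row on multi-column metadata) is my understanding of what the projector expects, so check it against the load dialog.

```python
import csv
import io


def to_projector_tsv(vectors, labels):
    """Return (vectors_tsv, metadata_tsv) strings for a list of points.

    vectors: list of equal-length lists of floats, one per point.
    labels:  list of (query, cluster_id) tuples, in the same order.
    """
    if len(vectors) != len(labels):
        raise ValueError("need exactly one metadata row per vector")

    vec_buf = io.StringIO()
    vec_writer = csv.writer(vec_buf, delimiter="\t", lineterminator="\n")
    for vec in vectors:
        vec_writer.writerow(vec)  # vectors file: one point per line, no header

    meta_buf = io.StringIO()
    meta_writer = csv.writer(meta_buf, delimiter="\t", lineterminator="\n")
    # Assumption: metadata with more than one column starts with a header row.
    meta_writer.writerow(["query", "cluster"])
    meta_writer.writerows(labels)

    return vec_buf.getvalue(), meta_buf.getvalue()


# Hypothetical toy data, not taken from the qspace dataset.
vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
labels = [("machine learn", 3), ("learn algorithm", 3)]
vectors_tsv, metadata_tsv = to_projector_tsv(vectors, labels)
```

Writing the two strings to disk (for example as `vectors.tsv` and `metadata.tsv`) gives you files you can pick in the projector's load dialog.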
“diagnostic” view used when evaluating query clusters
The scatter plot above is from an ongoing project of mine called “qspace”; it provides insight into the “query space” represented in a set of queries (~6k in this run).
In the scatter plot, each dot represents a query (or, more accurately, a stemmed/pre-processed query), and dots of the same color are search queries that have been assigned to the same cluster. The terms in the legend are the top two bi-grams for each cluster.
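For a sense of how legend labels like these could be derived, here is a small sketch that counts adjacent word pairs per cluster and keeps the two most frequent. It assumes "top" means most frequent and that queries are already stemmed and whitespace-tokenized; the function name and sample queries are hypothetical, and this is not the actual qspace code.

```python
from collections import Counter, defaultdict


def top_bigrams(queries, cluster_ids, n=2):
    """Map each cluster id to its n most frequent bi-grams."""
    counts = defaultdict(Counter)
    for query, cluster in zip(queries, cluster_ids):
        tokens = query.split()
        # Adjacent token pairs form the bi-grams of a query.
        for first, second in zip(tokens, tokens[1:]):
            counts[cluster][f"{first} {second}"] += 1
    return {
        cluster: [bigram for bigram, _ in counter.most_common(n)]
        for cluster, counter in counts.items()
    }


# Hypothetical stemmed queries and cluster assignments.
queries = [
    "machine learn algorithm",
    "machine learn tutorial",
    "learn algorithm machine learn",
    "deep neural network",
]
cluster_ids = [3, 3, 3, 7]
legend = top_bigrams(queries, cluster_ids)
```

With this toy input, `legend[3]` holds "machine learn" followed by "learn algorithm", which mirrors the kind of label shown in the legend.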
The other two plots show 1) how “tight” the clusters are in terms of the silhouette coefficient (silhouette plot), and 2) the count of queries in each cluster (cluster histogram). Large clusters (e.g. #3, aka “machine learn, learn algorithm”) can be broken down further by using the queries from that cluster as the sole input to the same clustering pipeline used on the larger set.
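To make those two diagnostics concrete, here is a rough sketch of the numbers behind them, written in plain Python. It assumes Euclidean distance and the standard silhouette definition (a is the mean distance to the other points in the same cluster, b is the mean distance to the nearest other cluster, and the score is (b - a) / max(a, b)). The points and cluster ids are toy values, and the actual pipeline may well use a library and a different distance.

```python
import math
from collections import Counter, defaultdict


def silhouette_scores(points, cluster_ids):
    """Return one silhouette coefficient per point (Euclidean distance)."""
    members = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        members[cluster].append(idx)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, own in enumerate(cluster_ids):
        if len(members[own]) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other points in the same cluster
        a = sum(math.dist(points[i], points[j]) for j in members[own] if j != i)
        a /= len(members[own]) - 1
        # b: mean distance to the points of the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in idxs) / len(idxs)
            for cluster, idxs in members.items()
            if cluster != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


# Hypothetical 2D points forming two well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
cluster_ids = [0, 0, 1, 1]

scores = silhouette_scores(points, cluster_ids)  # input for a silhouette plot
cluster_sizes = Counter(cluster_ids)             # input for the cluster histogram
# Points of one cluster, ready to be fed back into the same pipeline.
subset = [p for p, c in zip(points, cluster_ids) if c == 0]
```

Scores close to 1 indicate tight, well-separated clusters, while scores near 0 or below suggest overlap. The pairwise loop is quadratic, so for thousands of queries a vectorized library implementation would be the more practical choice.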
I initially created the components of the visualization to help evaluate the results of my cluster analysis. There are better plots for actually evaluating the “health” of the clusters (e.g. the “elbow” method), but the 3D scatter plot is a good final output. It provides some concrete context, as well as entertainment value (as you might imagine, the spreadsheet view is relatively boring), for conversations relating to SEM/SEO (e.g. exploring the queries/keywords that a site ranks well for, competitors' positioning, etc.).
The underlying dataset consists of queries from site visits, plus additional queries cultivated from various seed sets.
Re-reading this is interesting. Makes me feel like my mouth is full of marbles…funny how I evaluated each word and sentence. I still stand behind it though.
Information forms the foundation of all human knowledge. People use it to define reality in terms that are comprehensible to the human mind. The media that we use to communicate these definitions of reality fundamentally alter the meaning of the message encoded within the information, thereby influencing our understanding of truth and reality. It is with this mindset that I have chosen to pursue my Master's in Media Studies at The New School.