Discovery of Latent Patterns with Hierarchical Bayesian Mixed-Membership Models and the Issue of Model Choice

Event Date: Wednesday, January 23, 2008, 3:15 PM

Event Date Details: Refreshments served at 3:00 PM

Event Location: South Hall 5607F

Tanzy Love

Model choice is a major methodological issue amid the explosive growth of data-mining models involving latent structure for clustering and classification, especially because competing models often differ in their parameterizations, specifications, and constraints. Here, we work from a general formulation of hierarchical Bayesian mixed-membership models and present several model specifications and variations, both parametric and nonparametric, in the context of learning the number of latent groups and associated patterns for clustering units. We elucidate strategies for comparing models and specifications by producing novel analyses of two data sets: (1) a corpus of scientific publications from the Proceedings of the National Academy of Sciences, and (2) data on functionally disabled American seniors from the National Long Term Care Survey.
Our specifications make use of both text and references to narrow the choice of the number of latent topics in our publications data, in both parametric and nonparametric settings. Our findings also bring new insights regarding latent topics compared with earlier analyses.