Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality

Authors

Lau, Jey Han and Newman, David and Baldwin, Timothy

Conference

Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Figures & Tables

Table 2: Pearson correlation of OC-Human and the automatic methods (OC-Auto-PMI, OC-Auto-NPMI, OC-Auto-LCP and OC-Auto-DS) at the model level.

Table 6: Word intrusion vs. observed coherence: Pearson correlation results at the topic level.

Table 7: A list of WIKI topics to illustrate the impact of NPMI.

Table 8: A list of WIKI topics to illustrate the difference between observed coherence and word intrusion. Boxes denote human-chosen intruder words, and boldface denotes true intruder words.

Table 3: Word intrusion vs. observed coherence: Pearson correlation coefficient at the model level.

Table 4: Pearson correlation coefficient of WI-Human and WI-Auto-PMI/WI-Auto-NPMI at the topic level.

Table 1: Pearson correlation of WI-Human and WI-Auto-PMI/WI-Auto-NPMI at the model level.

Table 5: Pearson correlation of OC-Human and the automated methods at the topic level.

Table of Contents

  • 1 Introduction
  • 2 Related Work
  • 3 Dataset
  • 4 Human-Interpretability at the Model Level
    • 4.1 Indirect Approach: Word Intrusion
    • 4.2 Direct Approach: Observed Coherence
    • 4.3 Word Intrusion vs. Observed Coherence
  • 5 Human-Interpretability at the Topic Level
    • 5.1 Indirect Approach: Word Intrusion
    • 5.2 Direct Approach: Observed Coherence
    • 5.3 Word Intrusion vs. Observed Coherence
  • 6 Discussion
  • 7 Conclusion
  • Acknowledgements
  • References

Source: http://sidenoter.nii.ac.jp/acl_anthology/E14-1056/