Topic Modeling: How and Why to Use in Management Research
DOI:
https://doi.org/10.5585/ijsm.v18i3.14561
Keywords:
Topic modeling, Latent Dirichlet allocation, Computer-aided text analysis, Machine learning, Big data
Abstract
Objective: This article shows how topic modeling can be used in management research. My objectives are twofold. First, I introduce topic modeling as a social sciences research tool and map key published studies in management and other social sciences that applied topic modeling appropriately. Second, I illustrate how to do topic modeling by applying it to the last five years of research published in this journal, the Iberoamerican Journal of Strategic Management (IJSM).
Methodology: I analyze the last five years (2014 to 2018) of articles published in the IJSM. The sample comprises 164 articles. Their abstracts were subjected to a standard topic modeling text pre-processing routine, which generated 1,252 unique tokens.
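A pre-processing routine of this kind can be sketched as follows. This is a minimal illustration using only the Python standard library, not the routine used in the study: the stop-word list, the minimum token length, and the two sample abstracts are hypothetical, and no stemming is applied (a stemmer would normally run after the stop-word filter).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a full
# list for the language of the corpus.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "is"}

# Runs of Unicode letters only, so accented characters are kept and digits dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def preprocess(text, stopwords=STOPWORDS, min_len=3):
    """Lowercase, tokenize, and drop stop words and very short tokens."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in stopwords and len(t) >= min_len]


def build_vocabulary(documents):
    """Count how often each surviving token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts


# Hypothetical sample abstracts, for illustration only.
abstracts = [
    "Strategy and competitive advantage in emerging markets.",
    "Dynamic capabilities and the strategy of small firms.",
]
vocabulary = build_vocabulary(abstracts)
print(len(vocabulary), "unique tokens")
```

The number of keys in `vocabulary` corresponds to the "unique tokens" figure that a routine like this reports for a corpus.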
Originality/Relevance: Proposing topic modeling as a valid and opportune methodology for analyzing textual data can shift the old paradigm that textual data belongs only to the qualitative realm. Furthermore, it allows textual data to be labeled and quantified in a reproducible manner that mitigates (or nearly eliminates) researcher bias.
Main Results: Six topics were generated through Latent Dirichlet Allocation (LDA): Topic 1 – Strategy and Competitive Advantage; Topic 2 – International Business and Top Management Team; Topic 3 – Entrepreneurship; Topic 4 – Learning and Cooperation; Topic 5 – Finance and Strategy; and Topic 6 – Dynamic Capabilities.
Theoretical/methodological Contributions: I present the state of the art of the literature published in IJSM and also show how readers can perform their own topic modeling. The full data and code used are available in free open science repositories on the Open Science Framework (OSF) and GitHub.
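For readers who want to see the mechanics behind an LDA fit, the sketch below implements collapsed Gibbs sampling, one common inference method for LDA, in plain Python. It is an assumption-laden illustration and not the code in the OSF or GitHub repositories: the function name `fit_lda_gibbs`, the hyperparameter values, the toy vocabulary, and the toy documents are all hypothetical, and the study itself may have used a different inference method.

```python
import random


def fit_lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                  iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                          # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-topic word distributions; each row sums to 1.
    topic_word_probs = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return topic_word_probs, doc_topic


# Hypothetical toy corpus: word ids index into this vocabulary.
vocab = ["strategy", "advantage", "market", "innovation", "capability", "performance"]
docs = [[0, 1, 2, 0, 1], [0, 2, 1, 2], [3, 4, 5, 3], [4, 5, 3, 4, 5]]

topics, doc_topic = fit_lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
for k, probs in enumerate(topics):
    top = sorted(range(len(vocab)), key=lambda w: probs[w], reverse=True)[:3]
    print(f"Topic {k + 1}:", ", ".join(vocab[w] for w in top))
```

On a real corpus, a maintained library would be the practical choice; the point here is only to show what "fitting" topics means: each token is repeatedly reassigned to a topic in proportion to how common that topic is in its document and how common the word is in that topic.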
References
Baumer, E. P. S., Mimno, D., Guha, S., Quan, E., & Gay, G. K. (2017). Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence? Journal of the Association for Information Science and Technology, 68(6), 1397–1410. https://doi.org/10.1002/asi.23786
, N. T., & Wang, X. (2016). Uncovering the message from the mess of big data. Business Horizons, 59(1), 115–124. https://doi.org/10.1016/j.bushor.2015.10.001
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(1), 993–1022.
Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of Science. The Annals of Applied Statistics, 1(1), 17–35. https://doi.org/10.1214/07-aoas136
Bordag, S. (2008). A comparison of co-occurrence and similarity measures as simulations of context. Proceedings of the 9th international conference on computational linguistics and intelligent text processing, 52–63. https://doi.org/10.1007/978-3-540-78135-6_5
Chang, J. (2011). lda: Collapsed Gibbs sampling methods for topic models. R.
Debortoli, S., Müller, O., Junglas, I., & vom Brocke, J. (2016). Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial. Communications of the Association for Information Systems, 39(1), 110–135. https://doi.org/10.17705/1CAIS.03907
Denny, M. J., & Spirling, A. (2018). Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It. Political Analysis, 26(2), 168–189. https://doi.org/10.1017/pan.2017.44
DiMaggio, P., Nag, M., & Blei, D. M. (2013). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics, 41(6), 570–606. https://doi.org/10.1016/j.poetic.2013.08.004
DiMaggio, P. (2015). Adapting computational text analysis to social science (and vice versa). Big Data & Society, 2(2), 205395171560290. https://doi.org/10.1177/2053951715602908
Glaser, B., & Strauss, A. (1967). Grounded theory: The discovery of grounded theory. Sociology the journal of the British Sociological Association, 12(1), 27-49.
Hannigan, T., Haans, R. F. J., Vakili, K., Tchalian, H., Glaser, V., Wang, M. & Jennings, P. D. (2019). Topic modeling in management research: Rendering new theory from textual data. Academy of Management Annals. https://doi.org/10.5465/annals.2017.0099
Honnibal, M., & Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.
Hornik, K., & Grün, B. (2011). topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software, 40(13), 1–30.
Lucas, C., Nielsen, R. A., Roberts, M. E., Stewart, B. M., Storer, A., & Tingley, D. (2015). Computer-Assisted Text Analysis for Comparative Politics. Political Analysis, 23(2), 254–277. https://doi.org/10.1093/pan/mpu019
Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A. & Adam, S. (2018). Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology. Communication Methods and Measures, 12(2–3), 93–118. https://doi.org/10.1080/19312458.2018.1430754
McCallum, A. K. (2002). MALLET: A Machine Learning for Language Toolkit. Retrieved from http://mallet.cs.umass.edu/index.php
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence in topic models. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 262–272.
Mimno, D. (2013). mallet: A wrapper around the Java machine learning tool MALLET. Retrieved from https://cran.r-project.org/package=mallet
Mohr, J. W., & Bogdanov, P. (2013). Introduction—Topic models: What they are and why they matter. Poetics, 41(6), 545–569. https://doi.org/10.1016/j.poetic.2013.10.001
Nelson, L. K. (2017). Computational Grounded Theory. Sociological Methods & Research, 1–40. https://doi.org/10.1177/0049124117729703
Nelson, L. K., Burk, D., Knudsen, M., & McCall, L. (2018). The Future of Coding. Sociological Methods & Research, 1–36. https://doi.org/10.1177/0049124118769114
Nikolenko, S. I., Koltcov, S., & Koltsova, O. (2017). Topic modelling for qualitative studies. Journal of Information Science, 43(1), 88–102. https://doi.org/10.1177/0165551515617393
Ottolinger, P. (2019). bib2df: Parse a BibTeX File to a Data Frame. Retrieved from https://cran.r-project.org/package=bib2df
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137. https://doi.org/10.1108/eb046814
Rehurek, R., & Sojka, P. (2010). Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (pp. 45–50). Valletta, Malta: ELRA. https://doi.org/10.13140/2.1.2393.1847
Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K. & Rand, D. G. (2014). Structural Topic Models for Open-Ended Survey Responses. American Journal of Political Science, 58(4), 1064–1082. https://doi.org/10.1111/ajps.12103
Roberts, M. E., Stewart, B. M., & Tingley, D. (2014). stm: R package for structural topic models. Journal of Statistical Software, 10(2), 1–40.
Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. In Proceedings of the workshop on interactive language learning, visualization, and interfaces (pp. 63–70).
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. Handbook of Latent Semantic Analysis. Mahwah, NJ, US: Lawrence Erlbaum Associates Publishers.
Storopoli, J. (2019, July 22). Topic Modeling IJSM-RIAE. Retrieved from osf.io/97w6z
Torgerson, W. S. (1958). Theory and methods of scaling. New York: J. Wiley.
APPENDIX I - Topic term weights
First Topic - 0.212*"estratég" + 0.049*"organiz" + 0.042*"competi" + 0.040*"prát" + 0.036*"merc" + 0.036*"conceit" + 0.035*"empr" + 0.022*"vantag" + 0.018*"gerenc" + 0.016*"comport"
Second Topic - 0.055*"gest" + 0.047*"teor" + 0.042*"negóci" + 0.027*"internac" + 0.024*"caracterís" + 0.022*"futur" + 0.021*"decis" + 0.020*"abord" + 0.018*"país" + 0.017*"relacion"
Third Topic - 0.080*"desenvolv" + 0.043*"ambi" + 0.032*"inform" + 0.030*"empreend" + 0.029*"públic" + 0.026*"sustent" + 0.025*"instituc" + 0.023*"institu" + 0.022*"internacion" + 0.022*"empreendedor"
Fourth Topic - 0.090*"process" + 0.037*"conhec" + 0.030*"entrev" + 0.030*"context" + 0.028*"qualit" + 0.023*"perspec" + 0.023*"form" + 0.023*"particip" + 0.020*"envolv" + 0.018*"mudanç"
Fifth Topic - 0.125*"empr" + 0.043*"fat" + 0.042*"recurs" + 0.040*"brasil" + 0.032*"ativ" + 0.026*"estrut" + 0.023*"corpor" + 0.022*"financ" + 0.019*"efici" + 0.019*"capit"
Sixth Topic - 0.084*"inov" + 0.082*"desempenh" + 0.067*"capac" + 0.062*"organizac" + 0.059*"model" + 0.027*"dimens" + 0.024*"dinâm" + 0.024*"produt" + 0.022*"ges" + 0.018*"pequen"
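The weight strings above follow a regular `weight*"stem"` pattern joined by `+`, so they can be turned into structured data for further analysis. The sketch below is one way to do that with a regular expression; the helper name `parse_topic_terms` is hypothetical, and the sample string is the first three terms of the First Topic line.

```python
import re

# Matches one term such as 0.212*"estratég": a decimal weight, an asterisk,
# and a quoted stem.
TERM_RE = re.compile(r'(\d+(?:\.\d+)?)\*"([^"]+)"')


def parse_topic_terms(line):
    """Return (stem, weight) pairs in the order they appear in the line."""
    return [(term, float(weight)) for weight, term in TERM_RE.findall(line)]


first_topic = '0.212*"estratég" + 0.049*"organiz" + 0.042*"competi"'
terms = parse_topic_terms(first_topic)
for stem, weight in terms:
    print(f"{stem}\t{weight:.3f}")
```

Because the pattern ignores everything outside the `weight*"stem"` pairs, the same function can be applied to a full appendix line, including its "First Topic -" prefix.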
Copyright (c) 2019 Iberoamerican Journal of Strategic Management
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.