Topic Modeling: How and Why to Use in Management Research
Keywords: Topic modeling, Latent Dirichlet allocation, Computer-aided text analysis, Machine learning, Big data
Abstract
Objective: My objectives are two-fold. First, I introduce topic modeling as a social sciences research tool and map key published studies in management and other social sciences that applied topic modeling appropriately. Second, I illustrate how to perform topic modeling by applying it to the last five years of research published in this journal, the Iberoamerican Journal of Strategic Management (IJSM).
Methodology: I analyze the 164 articles published in the IJSM over the last five years (2014 to 2018). Their abstracts were subjected to a standard topic modeling text pre-processing routine, which generated 1,252 unique tokens.
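To make the pre-processing step concrete, the following is a minimal sketch of such a routine, not the exact pipeline used in the study. It assumes lowercasing, letter-only tokenization, stopword removal, and a minimum token length. The stopword list, the `min_len` threshold, and the two sample abstracts are hypothetical, and stemming is deliberately left out.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stopword list; a real run would use a full,
# language-specific list.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "we"}


def preprocess(text, stopwords=STOPWORDS, min_len=3):
    """Lowercase, keep runs of letters, drop stopwords and very short tokens."""
    # [^\W\d_]+ matches Unicode letters only, so accented characters survive.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in stopwords and len(t) >= min_len]


def build_vocabulary(docs):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts


# Two made-up abstracts standing in for the real corpus.
abstracts = [
    "We analyze the strategy of firms in emerging markets.",
    "Dynamic capabilities and strategy: an analysis of innovation in firms.",
]

vocab = build_vocabulary(abstracts)
print(len(vocab), "unique tokens")
print(vocab.most_common(3))
```

Without stemming, related word forms such as "analyze" and "analysis" are counted as separate tokens. A stemmer in the pipeline would merge them and shrink the vocabulary.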
Originality/Relevance: Proposing topic modeling as a valid and opportune methodology for analyzing textual data can shift the old paradigm that textual data belongs only to the qualitative realm. It also allows textual data to be labeled and quantified in a reproducible manner that mitigates (or almost fully eliminates) researcher bias.
Main Results: Six topics were generated through Latent Dirichlet Allocation (LDA): Topic 1 – Strategy and Competitive Advantage; Topic 2 – International Business and Top Management Team; Topic 3 – Entrepreneurship; Topic 4 – Learning and Cooperation; Topic 5 – Finance and Strategy; and Topic 6 – Dynamic Capabilities.
Theoretical/methodological Contributions: I present the state of the art of the literature published in IJSM and show how readers can perform their own topic modeling. The full data and code used are available in free open science repositories on the Open Science Framework (OSF) and GitHub.
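For readers unfamiliar with how LDA produces topics such as the six reported under Main Results, the generative model can be sketched as follows. This is the standard smoothed formulation from Blei, Ng, and Jordan (2003), written with K = 6 topics, D = 164 documents, and a vocabulary of V = 1,252 tokens taken from the abstract. The hyperparameters \alpha and \eta are symbols only; their values are not reported here.

```latex
% Smoothed LDA generative process; beta_k is a distribution over the V tokens,
% theta_d is a distribution over the K topics.
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Categorical}(\beta_{z_{d,n}})
\end{align*}
```

Fitting the model means inferring the topic-term distributions \beta_k and the document-topic proportions \theta_d from the observed words w_{d,n}. The topic labels themselves are assigned by the researcher after inspecting the highest-weighted terms of each \beta_k.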
APPENDIX I - Topic term weights
First Topic - 0.212*"estratég" + 0.049*"organiz" + 0.042*"competi" + 0.040*"prát" + 0.036*"merc" + 0.036*"conceit" + 0.035*"empr" + 0.022*"vantag" + 0.018*"gerenc" + 0.016*"comport"
Second Topic - 0.055*"gest" + 0.047*"teor" + 0.042*"negóci" + 0.027*"internac" + 0.024*"caracterís" + 0.022*"futur" + 0.021*"decis" + 0.020*"abord" + 0.018*"país" + 0.017*"relacion"
Third Topic - 0.080*"desenvolv" + 0.043*"ambi" + 0.032*"inform" + 0.030*"empreend" + 0.029*"públic" + 0.026*"sustent" + 0.025*"instituc" + 0.023*"institu" + 0.022*"internacion" + 0.022*"empreendedor"
Fourth Topic - 0.090*"process" + 0.037*"conhec" + 0.030*"entrev" + 0.030*"context" + 0.028*"qualit" + 0.023*"perspec" + 0.023*"form" + 0.023*"particip" + 0.020*"envolv" + 0.018*"mudanç"
Fifth Topic - 0.125*"empr" + 0.043*"fat" + 0.042*"recurs" + 0.040*"brasil" + 0.032*"ativ" + 0.026*"estrut" + 0.023*"corpor" + 0.022*"financ" + 0.019*"efici" + 0.019*"capit"
Sixth Topic - 0.084*"inov" + 0.082*"desempenh" + 0.067*"capac" + 0.062*"organizac" + 0.059*"model" + 0.027*"dimens" + 0.024*"dinâm" + 0.024*"produt" + 0.022*"ges" + 0.018*"pequen"
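The weight strings above can be turned into (term, weight) pairs for further analysis. The sketch below is one way to do this, assuming every entry follows the `weight*"term"` pattern shown and that the weights are per-topic term probabilities, as the format suggests. The helper name `parse_topic` is hypothetical.

```python
import re

# One entry looks like 0.212*"estratég": a decimal weight, an asterisk,
# and a quoted term.
TERM_PATTERN = re.compile(r'([0-9]*\.?[0-9]+)\*"([^"]+)"')


def parse_topic(line):
    """Return [(term, weight), ...] from a '0.212*"term" + ...' string."""
    return [(term, float(weight)) for weight, term in TERM_PATTERN.findall(line)]


# First Topic, copied from the appendix.
first_topic = (
    '0.212*"estratég" + 0.049*"organiz" + 0.042*"competi" + 0.040*"prát" + '
    '0.036*"merc" + 0.036*"conceit" + 0.035*"empr" + 0.022*"vantag" + '
    '0.018*"gerenc" + 0.016*"comport"'
)

terms = parse_topic(first_topic)
top_term, top_weight = max(terms, key=lambda pair: pair[1])
listed_mass = sum(weight for _, weight in terms)

print(top_term, top_weight)
print(round(listed_mass, 3))
```

For the first topic, the ten listed weights sum to 0.506, so under the probability assumption these terms cover about half of the topic's mass, with the remainder spread over the rest of the vocabulary.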
Copyright (c) 2019 Iberoamerican Journal of Strategic Management

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.