TEXT ANALYSIS TECHNIQUES USING MACHINE LEARNING ALGORITHMS
Abstract
Text analysis, a core area of natural language processing (NLP), applies machine learning algorithms to extract meaningful insights from textual data. This thesis surveys the primary techniques used in text analysis and details each stage of the workflow: data collection, preprocessing, model training, and evaluation. It emphasizes the practical applications of different machine learning algorithms and compares their effectiveness. A comparison table summarizing the key characteristics of these algorithms is included.