STAGES AND METHODS OF TEXT DATA PREPROCESSING
Keywords:
text data, NLP, tokenization, stemming, lemmatization, stop words, vectorization, Bag-of-Words, TF-IDF, Word2Vec, SpaCy, NLTK, Python
Abstract
This paper describes the stages and methods of text data preprocessing. In particular, it covers tokenization, stemming, lemmatization, stop-word removal, and text normalization. It also analyzes vectorization methods for converting text data into numerical form: Bag-of-Words, TF-IDF, and modern word embedding models. The paper shows how these steps can be carried out with Python libraries such as NLTK, SpaCy, and Scikit-learn, and it argues that correct preprocessing of text data affects the accuracy and reliability of analysis results.
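The pipeline summarized in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library rather than NLTK, SpaCy, or Scikit-learn, so it is not the paper's own implementation. The stop-word list, the sample documents, and the function names `preprocess`, `bag_of_words`, and `tf_idf` are hypothetical and chosen for illustration. The TF-IDF weighting shown is the textbook unsmoothed form, tf * log(N / df); libraries typically apply smoothing and normalization, so their values will differ.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}


def preprocess(text):
    """Normalize to lowercase, tokenize on runs of letters, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Weight each count as (count / doc length) * log(N / document frequency)."""
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Every vocabulary term occurs in at least one document, so df[j] >= 1.
    df = [sum(1 for v in vectors if v[j] > 0) for j in range(n_terms)]
    weighted = []
    for v in vectors:
        total = sum(v) or 1  # guard against an empty document
        weighted.append(
            [(v[j] / total) * math.log(n_docs / df[j]) for j in range(n_terms)]
        )
    return weighted


docs = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "Cats and dogs are animals",
]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
```

In this sketch `cat` and `cats` remain separate vocabulary entries because no stemming or lemmatization is applied, which is the kind of gap those steps are meant to close. A term that appears in only one document, such as `cat`, also receives a higher weight than `sat`, which appears in two.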
References
1. Jurafsky D., Martin J.H. – Speech and Language Processing (3rd edition draft), 2023.
2. Bird S., Klein E., Loper E. – Natural Language Processing with Python, O’Reilly Media, 2009.
3. Manning C.D., Raghavan P., Schütze H. – Introduction to Information Retrieval, Cambridge University Press, 2008.
4. Mikolov T. et al. – Efficient Estimation of Word Representations in Vector Space, 2013.
5. Pedregosa F. et al. – Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 2011.
6. Honnibal M., Montani I. – spaCy 2: Natural Language Understanding with Bloom Embeddings, 2017.

