CS 260: Seminar in Text Mining
Fall 2017
Instructor: Vagelis Hristidis (aka Evangelos Christidis)
Seminar time: MWF 3:10-4:00 pm
Location: Watkins Hall | Room 1117
Main Topics:
Grading
Presentations Schedule
https://docs.google.com/spreadsheets/d/1Hzpz94D9XxmDnBxfx4ePXNpgGLxb3lCPvhI_aGC9AxQ/edit?usp=sharing
Several papers are chapters from:
C. C. Aggarwal and C. X. Zhai. Mining Text Data. Kluwer Academic Publishers, 2012 (to download it for free you must be inside the UCR network)
Date | Presenter | Paper | Topic |
9/29 | Vagelis | Intro and presentation assignments; intro to text mining (clustering, classification, information extraction), review analysis (extraction, sentiment), chatbots | |
10/2 | cancelled | ||
10/4 | 1. Chapter 2 (Information Extraction from Text, Jing Jiang, 11-35) | Information Extraction, Summarization | |
10/6 | |||
10/9 | 2. Hu, M., & Liu, B. (2004, August). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 168-177). ACM | Reviews Analysis | |
10/11 | |||
10/13 | |||
10/16 | 4. Chapter 6 (A Survey of Text Classification Algorithms, Charu C. Aggarwal and ChengXiang Zhai, 163-213, double) | Classification | |
10/18 | 5. cont'd | ||
10/20 | |||
10/23 | 6. Tutorial on how to use WEKA for text classification (use resources from https://www.youtube.com/watch?v=IY29uC4uem8, https://weka.wikispaces.com/Text+categorization+with+WEKA, and so on) | ||
10/25 | 7. Chapter 12 (Text Analytics in Social Media, Xia Hu and Huan Liu, 385-408) | Social media | |
10/27 | |||
10/30 | 8. | Sentiment analysis | |
11/1 | 9. cont'd | ||
11/3 | |||
11/6 | | Spam reviews | |
11/8 (11/10 is holiday) | 11. Mukherjee, A., Kumar, A., Liu, B., Wang, J., Hsu, M., Castellanos, M., & Ghosh, R. (2013, August). Spotting opinion spammers using behavioral footprints. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 632-640). ACM | ||
11/13 | 12. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013. (include short demo and material from https://code.google.com/p/word2vec/, double) | word2vec | |
11/15 | 13. cont'd (demo) | ||
11/17 | |||
11/20 (Thanksgiving week) | 14. Intro to LSTM Neural Networks (http://colah.github.io/posts/2015-08-Understanding-LSTMs/, https://deeplearning4j.org/lstm.html) + demo (https://www.tensorflow.org/tutorials/recurrent) (double) | Deep Learning, Application to email auto-reply | |
11/27 | 15. cont'd (demo) | ||
11/29 | 16. Anjuli Kannan, Karol Kurach, Sujith Ravi, Tobias Kaufmann, Andrew Tomkins, Balint Miklos, Greg Corrado, Laszlo Lukacs, Marina Ganea, Peter Young, and Vivek Ramavajjala. Smart Reply: Automated Response Suggestion for Email. In Proc. of KDD, 2016, 955-964 | ||
12/1 | |||
12/4 | 17. Liu, Chia-Wei, Ryan Lowe, Iulian V. Serban, Michael Noseworthy, Laurent Charlin, and Joelle Pineau. "How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation." arXiv preprint arXiv:1603.08023 (2016). | chatbots | |
12/6 | 18. Xu, A., Liu, Z., Guo, Y., Sinha, V., & Akkiraju, R. (2017, May). A New Chatbot for Customer Service on Social Media. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 3506-3510). ACM. | ||
12/8 | |||
If we have time we can also cover:
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
Tutorial on what the Stanford POS Tagger is and how to use it (use material from http://nlp.stanford.edu/software/tagger.shtml)
Kiritchenko, S., Zhu, X., & Mohammad, S. M. (2014). Sentiment analysis of short informal texts. Journal of Artificial Intelligence Research, 723-762. http://www.jair.org/media/4272/live-4272-8102-jair.pdf
Ritter, Alan, Colin Cherry, and William B. Dolan. "Data-driven response generation in social media." In Proceedings of the conference on empirical methods in natural language processing, pp. 583-593. Association for Computational Linguistics, 2011.
Other interesting papers:
Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013, October). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1631-1642).
Presentation tips:
There are many sources, but here is just one: