Thursday, 2 February 2012

Rethinking Sentiment Analysis



    Given the boldness of their claims, I believe they ought to either publish their methods and their code, or withdraw these claims. (Kuleshov, 2011)



Yesterday, I received an email from Google Scholar Alerts (you can subscribe with any keyword you like). The email linked to several final projects from Stanford's Machine Learning course (CS229), Autumn 2011. The course is taught by Prof. Andrew Ng, and it also has an online version: http://www.ml-class.org/. I then read all 7 reports on tweets and stock prediction, which replicated and implemented Twitter Mood Predicts the Stock Market by Bollen, Mao & Zeng (2010) (http://arxiv.org/abs/1010.3003).

You can find my tabular comparison here: 7 Stanford Final Project Comparison

In a nutshell, these 7 projects have some similarities:

1. None of the 7 reports reaches Bollen et al.'s high accuracy. The accuracy of the 7 projects is around 60% to 70%.

2. 5 reports use the dataset from the Stanford SNAP lab, which contains about 476 million tweets (the other two do not make their source clear). Although they did not use the entire dataset, their data size should be larger than Bollen et al.'s 10 million tweets.

3. Most projects use SVMs and neural networks as their machine learning methods, and also use Granger causality and several regressions to detect correlation.

4. All projects use pre-designed or pre-generated word lists, such as Alex Davis's Twitter Sentiment Analysis Word List and some lists developed from POMS. Only one used a hybrid approach, extracting the 1000 most frequent words (Debbini, Estin & Goutagny, 2011).


5. Most projects focus on unigram or bigram features, which is similar to the bag-of-words model.


6. Four of them concentrate on the DJIA, one on the NASDAQ, one on DIA, and one does not say.


7. Four of them follow Bollen et al.'s experiment, using a multi-dimensional polarity (from 4 to 6 dimensions) for sentiment analysis.
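To make points 4, 5 and 7 concrete, here is a minimal sketch of word-list scoring over unigram and bigram features with several mood dimensions. It is not taken from any of the reports: the lexicon entries, the dimension names and the function names (`ngrams`, `mood_scores`, `daily_mood`) are all made up for illustration, and a real system would use a proper tokenizer and a full list such as one derived from POMS.

```python
from collections import Counter

# Hypothetical toy lexicon: mood dimension -> unigrams and bigrams.
# These entries are invented for illustration, not taken from POMS or any report.
MOOD_LEXICON = {
    "calm": {"calm", "relaxed", "at ease"},
    "anxious": {"worried", "nervous", "on edge"},
    "happy": {"happy", "glad", "feel great"},
}

def ngrams(tokens, n):
    """Return all n-grams of a token list, each joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def mood_scores(tweet):
    """Count lexicon hits per mood dimension using unigram and bigram features."""
    # Naive whitespace tokenization; punctuation is NOT stripped in this sketch.
    tokens = tweet.lower().split()
    features = Counter(ngrams(tokens, 1) + ngrams(tokens, 2))
    return {
        mood: sum(features[term] for term in terms)
        for mood, terms in MOOD_LEXICON.items()
    }

def daily_mood(tweets):
    """Average each mood dimension over one day's tweets."""
    totals = Counter()
    for tweet in tweets:
        totals.update(mood_scores(tweet))
    n = max(len(tweets), 1)  # avoid division by zero on an empty day
    return {mood: totals[mood] / n for mood in MOOD_LEXICON}
```

Calling `daily_mood` once per day gives one value per mood dimension per day, which is the kind of time series the projects then compare with the stock index. Because the score only counts listed terms, anything outside the word list is invisible to it.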


From these comparisons, I would like to say:


a. Even when projects work on the same data (regardless of size), the results of sentiment analysis can vary a great deal. I think the main problem is the features: these cross-validating results are based on different machine learning methods, yet the results are quite similar, so the learning method is unlikely to be the bottleneck. Thus, the present word-level approaches should be improved. At EMNLP 2011, many presenters mentioned the shortcomings of the bag-of-words approach. So, a more stable level of features should be considered (I do not feel that the phrase level is the best candidate).


b. I cannot see any advantage of the multi-dimensional approach in stock prediction. Bollen et al. claim that the calm curve is the one most related to the stock trend, but several Stanford reports could not replicate this result. We therefore cannot rule out that Bollen et al.'s result was a coincidence.


c. Several Stanford projects suggest that the daily amount of tweet data does not change the accuracy very much, but that longer data coverage may be needed. I feel that one reason is that they ignored many potential features, e.g. ellipsis. Some projects (following Bollen et al.) only consider tweets with specific patterns. What is the purpose of this? If you think about John Sinclair's internal and external criteria for corpus design, this kind of selection should be made very carefully.


d. So far, we cannot deny Bollen, Mao & Zeng's contribution, but we should evaluate their experiment very carefully. In any case, they opened a brand new topic in sentiment analysis. Moreover, I strongly recommend Tweets and Trades: The Information Content of Stock Microblogs by Sprenger & Welpe (2010), which is more straightforward and detailed.
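The question in point b, whether a mood dimension such as calm is related to the later stock trend, can be explored with a simple lagged correlation. The sketch below is a much simpler stand-in for the Granger causality tests used in the papers, not a replacement for them; the function names and the idea of passing two equal-length daily series are my own assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def lagged_correlation(mood, changes, lag):
    """Correlate mood on day t with the stock change on day t + lag."""
    if len(mood) != len(changes):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(mood) - 1:
        raise ValueError("lag must leave at least two overlapping days")
    if lag == 0:
        return pearson(mood, changes)
    # Drop the last `lag` mood values and the first `lag` stock changes
    # so that mood[t] lines up with changes[t + lag].
    return pearson(mood[:-lag], changes[lag:])
```

Comparing the values for lags of, say, 0 to 6 days across all mood dimensions shows whether one dimension stands out. With only a few weeks of data, a high value at a single lag can easily be chance, which is exactly the worry about coincidence raised above.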


At first, I was really pessimistic when I read the Stanford reports, but later I felt that it was really good to see these cross-validating results, because they clearly show the shortcomings of current sentiment analysis approaches. Now I feel more confident about developing some new implementations.



References




Bollen, J., Mao, H., & Zeng, X. (2010). Twitter mood predicts the stock market. Computer, 1010(3003v1), 1-8. Retrieved from http://arxiv.org/abs/1010.3003
Chakoumakos, R., Trusheim, S., & Yendluri, V. (2011). Automated Market Sentiment Analysis of Twitter for Options Trading. CS229. Retrieved from http://cs229.stanford.edu/proj2011/TrusheimChakoumakosYendluri-Automated_Market_Sentiment_analysis_of_Twitter_for_Options_Trading.pdf
Chen, R., & Lazer, M. (2011). Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement (pp. 1-5).
Chyan, A., Hsieh, T., & Lengerich, C. (2011). A Stock-Purchasing Agent from Sentiment Analysis of Twitter.
Debbini, D., Estin, P., & Goutagny, M. (2011). Modeling the Stock Market Using Twitter Sentiment Analysis. CS229.
Hsu, E., Shiu, S., & Torczynski, D. (2011). Predicting Dow Jones Movement with Twitter. CS229, 1-5. Retrieved from http://cs229.stanford.edu/proj2011/HsuShiuTorczynski-PredictingDowJonesMovementWithTwitter.pdf
Kuleshov, V. (2011). Can Twitter predict the stock market? CS229 (pp. 1-5).
Lee, H. (2011). Using Twitter to Estimate and Predict the Trends and Opinions. CS229 (Vol. 0, pp. 4-8).
Mittal, A., & Goel, A. (2011). Stock Prediction Using Twitter Sentiment Analysis. CS229, (June). Retrieved from http://cs229.stanford.edu/proj2011/GoelMittal-StockMarketPredictionUsingTwitterSentimentAnalysis.pdf
Sprenger & Welpe. (2010). Tweets and Trades: The Information Content of Stock Microblogs (November 1, 2010). Available at SSRN: http://ssrn.com/abstract=1702854


