Digital Market -

Content: ВКР.rar (9.62 MB)
Uploaded: 14.08.2015

Positive responses: 0
Negative responses: 0

Sold: 1
Refunds: 0

Final qualifying work (thesis) on the topic "Development of software for sentiment analysis based on machine learning methods." Completed in 2015 and defended with a grade of "excellent" without any remarks. The work was done carefully and thoughtfully. Originality: 84%. According to the head of the department, the scope and depth of the work make it suitable for a thesis defense at the bachelor's and specialist levels, and even at the master's level.

Detailed introduction:
With the development of Internet services, every user has gained, among other things, the opportunity to express an opinion. This may be an opinion about a product or service, a film or a book, a company or a politician. As a result, huge amounts of information must be processed to determine users' attitudes toward a particular object.
Obviously, the number of reviews published, for example, on social networks runs into the tens of thousands, and having experts process them manually is impossible. For this reason, areas of computer science such as opinion mining and sentiment analysis have become widespread. Sentiment analysis makes it possible to obtain, or "extract", the opinion expressed in a text automatically.
Sentiment analysis is the area of computational linguistics concerned with identifying emotionally colored vocabulary in texts or the author's emotional evaluation.
Sentiment analysis is a difficult task because it requires deep knowledge of the implicit and explicit, common and rare, syntactic and semantic rules of natural language. Researchers in this area face unresolved problems of natural language processing, such as resolving various ambiguities and handling negation in texts.
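To illustrate why negation is hard, here is a minimal sketch of one common preprocessing trick: tokens that follow a negation word are given a NOT_ prefix until the next punctuation mark, so that "like" and "not like" become different features. This is not taken from the thesis; the function name mark_negation and the tiny English negation list are assumptions made purely for illustration.

```python
import re

# Hypothetical, deliberately tiny list of negation words (illustration only).
NEGATIONS = {"not", "no", "never"}
PUNCTUATION = set(".,!?;")

def mark_negation(text):
    """Prefix tokens after a negation word with NOT_ until the next punctuation mark."""
    tokens = re.findall(r"\w+|[.,!?;]", text.lower())
    result = []
    negated = False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False          # punctuation ends the negation scope
            result.append(token)
        elif token in NEGATIONS:
            negated = True           # start a negation scope
            result.append(token)
        else:
            result.append("NOT_" + token if negated else token)
    return result

print(mark_negation("I did not like this film, but the music was good."))
```

A rule this simple misses many cases (double negation, negation that applies to only one word, irony), which is part of what makes the problem unresolved.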
The main tasks of sentiment analysis are:
• classification of documents by opinion;
• classification of sentences by opinion;
• analysis of opinions by features of the object;
• creation of an opinion dictionary;
• search for comparisons;
• detection of spam in reviews;
• analysis of the usefulness of reviews.
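The first task, classifying documents by opinion, can be sketched with a naive Bayes classifier, one of the methods named in the contents of the explanatory note. The code below is a minimal illustration and not the thesis implementation (which ships as Visual Studio projects in the archive): the function names, the dictionary-based model and the four-review English toy corpus are all assumptions.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents):
    """documents: list of (tokens, label) pairs. Returns a simple model dict."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in documents:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return {"class_counts": class_counts,
            "word_counts": word_counts,
            "vocabulary": vocabulary}

def classify(model, tokens):
    """Return the label with the highest log-probability (add-one smoothing)."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["class_counts"].items():
        score = math.log(doc_count / total_docs)              # class prior
        total_words = sum(model["word_counts"][label].values())
        for token in tokens:
            if token not in model["vocabulary"]:
                continue                                      # skip unseen words
            count = model["word_counts"][label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training corpus (invented for illustration).
training = [
    ("great film wonderful acting".split(), "positive"),
    ("good plot great music".split(), "positive"),
    ("boring film terrible acting".split(), "negative"),
    ("bad plot boring music".split(), "negative"),
]
model = train_naive_bayes(training)
print(classify(model, "great music".split()))
print(classify(model, "boring acting".split()))
```

Here each review is reduced to a bag of single words; a real system would add preprocessing and a richer feature vector, such as the N-grams and weighting schemes listed in the contents.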
The problem of analyzing the opinions of Internet users is becoming increasingly important from both theoretical and applied perspectives. Sentiment analysis attracts large companies because of the wide spread of social media: marketers constantly need to monitor media for mentions of their brands.
Different subject areas, types and categories of texts require special attention. There is no universal sentiment analysis algorithm that achieves sufficient classification accuracy in every subject area and category of texts. This work considers not only the task of analyzing the sentiment of texts, but also analyzes and compares different methods and approaches to classifying text sentiment.
The topic is relevant because the task of sentiment analysis emerged relatively recently and no optimal solution exists at the moment. It should also be taken into account that all problems connected in one way or another with natural language processing are complex and ambiguous. This applies primarily to machine translation, speech recognition, and sentiment analysis. For this reason, there is little research in this area, and almost none in Russian.
The aim of this work is to implement software for sentiment analysis of texts in Russian and to analyze the effectiveness of different methods and approaches to sentiment analysis.
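The effectiveness of a classifier is commonly measured with precision, recall and the F1-measure (the F1-measure is listed in section 2.5.3 of the contents). The sketch below shows how these values can be computed from true and predicted labels; the function name, the label strings and the sample data are assumptions for illustration, not results from the thesis.

```python
def precision_recall_f1(true_labels, predicted_labels, positive="positive"):
    """Compute precision, recall and F1 for the class named by `positive`."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == positive and truth == positive:
            tp += 1                  # true positive
        elif predicted == positive:
            fp += 1                  # false positive
        elif truth == positive:
            fn += 1                  # false negative
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Invented example: four reviews, one positive review is missed.
truth = ["positive", "positive", "negative", "negative"]
predicted = ["positive", "negative", "negative", "negative"]
print(precision_recall_f1(truth, predicted))
```

In this example the classifier never labels a negative review as positive (precision 1.0) but finds only one of the two positive reviews (recall 0.5), so F1 comes out at about 0.67. Comparing such figures across classifiers and feature weightings is one way to carry out the kind of comparison the work describes.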
In the archive:
1. Explanatory note (116 pages)
2. Abstract
3. Review by the scientific supervisor
4. Antiplagiat report
5. Presentation (20 slides)
6. Speech for the presentation
7. Source code of the programs (Visual Studio projects)
8. The programs themselves

Contents of the explanatory note:
Introduction
1 Research section
1.1 Goals and objectives of sentiment analysis
1.1.1 Definition
1.1.2 Tasks and applications of sentiment analysis
1.1.3 Problems in determining sentiment
1.2 Overview of text classification
1.2.1 Binary classification
1.2.2 Multi-class classification
1.2.3 Regression
1.2.4 Subjectivity / objectivity
1.3 Overview of approaches to sentiment analysis
1.3.1 Rule-based approaches
1.3.2 Approaches based on sentiment dictionaries
1.3.3 Machine learning: supervised learning, unsupervised learning
1.4 Overview of mathematical classification methods in supervised machine learning
1.4.1 Nearest neighbors method
1.4.2 Naive Bayes classifier
1.4.3 Support vector machines
1.4.4 Decision tree method
1.5 Overview of methods for constructing the feature vector
1.5.1 Obtaining a corpus
1.5.2 Text preprocessing
1.5.3 Feature vectors: N-grams, character N-grams
1.5.4 Weighting the vector: binary weight, TF-IDF, dTF-IDF
1.6 Overview of software solutions for sentiment analysis in Russian
1.6.1 «SentiStrength»
1.6.2 Sentiment analysis module in "Analytical Courier"
1.6.3 Sentiment analysis module of the RCO Fact Extractor system
Conclusions
2 Special section
2.1 Formal description of the subject area
2.1.1 The classification and machine learning problem
2.1.2 The sentiment analysis task
2.2 Development of the sentiment analysis system architecture
2.2.1 Sentiment analysis algorithm
2.2.2 Algorithm for analyzing the effectiveness of different classification methods
2.3 Constructing the feature vector
2.3.1 Feature description of the text
2.3.2 Weighting the vector: binary weight, dTF-IDF, modified dTF-IDF
2.4 Supervised machine learning methods for classification
2.4.1 Naive Bayes classifier (NB)
2.4.2 Support vector machines (SVM)
2.5 Method for estimating classifier accuracy
2.5.1 Correct and incorrect classification metrics
2.5.2 Precision and recall
2.5.3 F1-measure
Conclusions
3 Technology section
3.1 Choice of software tools
3.2 Development of the software architecture
3.3 Development of the Internet data parser
3.3.1 User selection of review sentiment
3.3.2 Selecting the number of reviews
3.3.3 Selecting the starting position
3.3.4 Parsing and preprocessing of reviews
3.3.5 Saving the collection of texts
3.4 Implementation of the feature vector construction algorithms
3.5 Implementation of the classifier training algorithms
3.5.1 Training the naive Bayes classifier
3.5.2 Training the support vector machine classifier: with the binary weighting function, with the dTF-IDF function, with the Mod dTF-IDF function
3.5.3 Saving the classifier model
3.6 Implementation of the text classification algorithms
3.6.1 NB classification algorithm
3.6.2 SVM classification algorithm
3.7 Analysis of text classification accuracy
Conclusions
4 Other sections of the project
4.1 User guide
4.1.1 Review parser ParserKinopoisk.exe
4.1.2 Main program Sentiment.exe
4.2 Programmer's guide
4.3 System administrator's guide
5 Conclusions
List of sources used
Appendix A. Terms of reference
Appendix B. Program listing
Appendix C. Graphic material
Appendix D. Specification sheet
No feedback yet