ABSTRACT

In this research project we propose a retrieval-based, data-driven, artificially intelligent chatbot intended to serve as a student assistant. The idea of an AI assistant can also be applied in other sectors such as business, health, and e-commerce. The chatbot, named ‘Bella’, interacts with users through a chat interface by drawing on its knowledge base, and it can also learn from user interactions. We tested two algorithms on our training data and compared their performance. The chatbot learns from the training data and pre-calculates a confidence score for each conversation. When a user asks a query or converses with the chatbot, it matches the input against its knowledge base, compares the confidence scores, and replies in real time with the response that has the highest matching confidence score. Our hypothesis is that the learning process of our chatbot can be used to build an AI assistant for students. Our main focus was to compare the algorithms and their performance on different kinds of datasets and to learn about the impact of an AI assistant on the digitization process of an institution.

CHAPTER 1
INTRODUCTION

1.1 Introduction

We live in a competitive world, and technology is evolving rapidly. Nowadays people want to get their jobs done while spending as little time as possible. Software giants are therefore improving human-computer interaction by moving towards virtual assistants. A virtual assistant is a software agent that can complete tasks as the user desires and can also keep a record of the user's daily necessities. The most basic form of a virtual assistant is a chatbot: an assistant that communicates with the user in the form of text. In essence, a chatbot is a computer program that can understand the user's query or response and act accordingly.

The applications of chatbots are extensive. A chatbot may be an expert in a particular domain.
Different types of chatbots can be effective for different sorts of jobs. Our chatbot, named “Bella”, is a student assistant. More specifically, it has been built to be useful and effective for the students of Daffodil International University and to perform human-like interactions.

1.2 Motivation

Education has begun to enjoy the benefits of technology, and many institutions are digitizing their systems. This has reduced difficulties and wasted time for students, guardians, and teachers. Still, this is not enough: a student may yet experience difficulties in daily activities. For example, s/he may have to check the routine and schedules, or check the notices. S/he may have forgotten to check email or missed some digital classroom announcements. Some may find it difficult to go to certain websites, browse them, and find the information they want.

These difficulties can be eased by improving students' interactions with their computing devices. It would be helpful if students had an assistant that could communicate with them and answer their queries.

So we built a chatbot to see how the majority responded to the concept. We found that a large share of the students we surveyed found the idea interesting and would like to have a mobile application for this chatbot.

1.3 Rationale of the Study

The idea of chatbots is not new in the field of artificial intelligence, but specialized chatbots for students are fairly new, and not much work has been done yet. To be more precise, we have not found any dedicated chatbot for this job. We therefore conducted a survey, to which about 57 students responded with their individual views. The survey yielded some interesting findings. The students are not very attentive to their emails and online classroom activities.
They are more likely to stick to offline educational activities because of the extreme diversity of content on the internet. Their social activity, however, shows another interesting pattern: they check their online social media more frequently, and they would very much like to have an agent similar to that.

Table 1.3.1 Survey on rationality of chatbot assistant

Question | Response (on a scale of 5: 1 2 3 4 5) and Mean (in %)
Do you regularly check your emails? | 109134334.7
How often do you check your social messenger? | 2315231287.8
Would you like if you had an assistant for your varsity tasks for free? | 21213122693.6
Would you like to have a mobile chatbot app for assistance? | 014231795.3

1.4 Research Questions

Educational institutes are investing a lot of money and effort to digitize and automate their systems. Some institutions are even making their classrooms available online, and some tasks must sometimes be done online. But is it worth it?

Users might be confused by having too many interfaces and may give up on the online materials and features.

1.5 Expected Output

From the survey we found that the main barrier between students and online services is the complexity and diversity of the human-computer interactions involved. If we could reduce that complexity and diversity and bring the content together in one place, students might enjoy all the features of digitalization in an institution's education system. Our main focus is therefore to provide a GUI that is as simple as possible, together with a machine learning chatbot interface, to offer full-featured assistance. The chatbot should be able to answer frequently asked queries about Daffodil International University and also perform intellectual tasks. It should also be able to learn over time and conduct human-like interactions with the students.
1.6 Report Layout

The following format is used in this report on the AI-activated Student Assistant system:

- All the topics are covered with the related information.
- Topics are divided into paragraphs so that they are easy to understand.
- Figures and images are used to make the content easier to follow.
- Tables are used to compare data.
- The required font and size are used to organize the contents of the report.
- Specific margins and spacing are used to format the contents.
- Bullet points are used to describe the content more precisely.
- The required format and menus are used to make the contents easier to access.

CHAPTER 2
BACKGROUND

2.1 Introduction

Chatbots are meant to perform human-like interactions and communicate with users in natural language. NLP (Natural Language Processing) is used for this job. The chatbot should also be able to evolve over time, which is why machine learning algorithms are used: the more users interact with the chatbot, the more it learns.

2.2 Related Works

Chatbots are not new in the field of artificial intelligence and machine learning, but chatbots in the educational field are quite new, and we have not found much related work. The basic chatbot structure should nevertheless be the same and can be adapted for various tasks as needed. A basic chat client, whether mobile, desktop, or web, typically relies on a messaging protocol such as XMPP, some web services, and a centralized server.

Fig 2.2.1 Basic Chat Application Architecture

2.2.1 IBM Watson

When we think of chatbots and AI, the IBM Watson chatbot is perhaps the first that comes to mind. The basic workflow of the IBM Watson chatbot is illustrated below. It uses a KB (Knowledge Base) to converse and interact with the user.

Fig 2.2.2 IBM Watson Workflow

The IBM Watson chatbot API can be integrated into different cross-platform chat clients.
Fig 2.2.3 IBM Watson Platforms and API

2.3 Research Summary

In this research project we dealt mainly with machine learning, natural language processing, and artificial intelligence.

2.3.1 Machine Learning

Machine learning algorithms can be classified into three categories:

- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning

2.3.2 Decision Tree

In machine learning and data mining, decision trees are used for classification and regression analysis. A decision tree uses a tree-like model of decisions. A sample decision tree is shown below.

Fig x.x.x Decision Tree

2.3.3 Naive Bayes Algorithm

The naive Bayes classifier is another popular method in machine learning and data mining. It is a classification technique based on the well-known Bayes' theorem.

The naive Bayes algorithm can be implemented easily using the sklearn toolkit. It works well with categorical input variables compared with numerical variables, and it also works well for multi-class prediction. Sample code for a Gaussian naive Bayes classifier is shown below.

    # Naive Bayes model (Gaussian)
    from sklearn.naive_bayes import GaussianNB
    import numpy as np

    # Predictor variables: twelve samples with two features each
    x = np.array([[-3, 7], [1, 5], [1, 2], [-2, 0], [2, 3], [-4, 0],
                  [-1, 1], [1, 1], [-2, 2], [2, 7], [-4, 1], [-2, 7]])
    # Target variable: one class label per sample
    y = np.array([3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4])

    # Gaussian classifier
    model = GaussianNB()

    # Train the model
    model.fit(x, y)

    # Predict the class of two new samples
    predicted = model.predict([[1, 2], [3, 4]])
    print(predicted)

Output: [3 4]

2.3.4 Natural Language Toolkit

For natural language processing we use NLTK, a toolkit well suited to working with human language data in Python programs.

Fig 2.3.1 Parts of Speech Tagging

2.4 Scope of the Problem

2.5 Challenges

Database collection and management is one of the major challenges in building a chatbot. Functional dependency is also a problem when answering the same question repeatedly. Identifying garbage sentences is a further challenge.
The record of a conversation with a particular person is difficult to keep in memory, and recognizing that person again after a while is also a challenge for a chatbot. For example, if I select an event via a chatbot, I would like the conversation to remember that context and remind me later that I have an upcoming event. More advanced chatbots might know my name, email, address, and so on. Loops, splits, and recursions in the conversation flow are often where most of the complexity of developing a chatbot lies. Once a developer has integrated the NLP, the real challenge of building a truly productive chatbot begins. After overcoming these challenges, we can build a real-life chatbot that can interact with humans and aim to pass the Turing test.

CHAPTER 3
RESEARCH METHODOLOGY

3.1 Introduction

The comparison between the k-means and Bayesian classifiers is based on subsets of features, which are established using the sequential feature selection method. Four categories of subsets are used, namely life and medical transcripts, arts and humanities transcripts, social science transcripts, and physical science transcripts, to present the experimental classification results and to show that the K-NN classifier is competitive with the naïve Bayesian classifier. The classification performance of the K-NN classifier is far better than that of the naïve Bayesian classifier when the learning parameters and the number of samples are small. As the number of samples increases, however, the naïve Bayesian classifier performs better than the K-NN classifier. The naïve Bayesian classifier is also much better than the K-NN classifier when computational demand and memory requirements are considered. This report demonstrates the strength of the naïve Bayesian classifier for classification and summarizes some of the most important developments in naïve Bayesian and K-nearest neighbor classification research.
Specifically, the issues of posterior probability estimation, the link between naïve Bayesian and K-NN classifiers, the tradeoff between learning and generalization in classification, feature variable selection, and the effect of misclassification costs are examined. The purpose is to provide a synthesis of the published research in this area and to stimulate further research interest and effort in the identified topics.

3.2 Research Subject and Instrumentation

Fig 3.2.1 Data Prediction Architecture

Fig 3.2.2 MVT Architecture

3.3 Data Collection Procedure

The data collection procedure is the core part of the research methodology, because data is the most important element of any research. Data should therefore be organized so that it can be accessed as the model requires while remaining memory efficient. Organized data is important for performing fast queries and making the machine more efficient.

Fig 3.3.1 Data Modeling Process

This is why we should organize the data during collection and store it in the database so that it can represent information and knowledge.

3.4 Statistical Analysis

The collected data should be well organized, and statistical analysis is used to retrieve information from it. Several classic, well-known formulas are used to calculate statistical values, which can then be used for further analysis and as input to other mining algorithms.

Fig 3.3.2 Data exploration process

Table 3.1 Statistical formula

Statistical Term | Visualization | Equation | Description
Count | Histogram | N | The number of values of the variable.
Mean | Boxplot | \bar{x} = (\sum x_i) / n | Sum of the values divided by the count.
Range | Boxplot | max - min | Difference between the maximum and the minimum.
Variance | Histogram | | A measure of data dispersion.
Standard deviation | Histogram | | The square root of the variance.
Coefficient of Deviation | Histogram | | Measure of data dispersion divided by mean.
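The measures listed in Table 3.1 can be computed in a few lines of Python. The sketch below is illustrative only: the sample values are made up, the function name `describe` is hypothetical, and it assumes the population form of the variance (dividing by n), since the table does not state which form is intended.

```python
def describe(values):
    """Return the descriptive statistics named in Table 3.1 for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population variance (divide by n, not n - 1)
    variance = sum((v - mean) ** 2 for v in values) / n
    std_dev = variance ** 0.5
    return {
        "count": n,
        "mean": mean,
        "range": max(values) - min(values),
        "variance": variance,
        "std_dev": std_dev,
        # Dispersion divided by the mean; undefined when the mean is 0
        "coefficient_of_deviation": std_dev / mean,
    }

# Hypothetical sample, e.g. the number of words in eight user messages
sample = [2, 4, 4, 4, 5, 5, 7, 9]
stats = describe(sample)
print(stats)
```

Values such as these could then serve as inputs to later analysis steps, as the section describes.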
3.4.1 Confusion Matrix

A confusion matrix is a practical and efficient way to assess the accuracy of a classifier for each class of a dataset. It shows the number of correct and incorrect predictions made by the classification model compared with the actual outcomes in the data. The matrix is N×N, where N is the number of target values or classes. The performance of such models is commonly evaluated using the data in the matrix. For example, the following table displays a 2×2 confusion matrix for two classes (Positive and Negative).

Table 3.2 Confusion Matrix

Confusion Matrix | Target (Actual): Positive | Target (Actual): Negative |
Model (Predict): Positive | a | b | Positive Predictive Value: a/(a+b)
Model (Predict): Negative | c | d | Negative Predictive Value: d/(c+d)
| Sensitivity: a/(a+c) | Specificity: d/(b+d) | Accuracy: (a+d)/(a+b+c+d)

3.5 Implementation Requirements

CHAPTER 4
EXPERIMENTAL RESULTS AND DISCUSSION

4.1 Introduction

To make efficient use of a massive amount of text data, we sometimes face the question of which of two techniques to use. K-means and Naive Bayes are two popular algorithms in the machine learning world. Along with their advantages, we also observed some disadvantages in our experiment.

With large amounts of data, differences between the two algorithms emerge. We therefore also compare their speed and their reliance on conditional independence. More importantly, before using either of them we need to understand what type of performance we want. If we want speed or output based on labeled data, we should use Naive Bayes; for unlabeled data, or data in which conditional independence does not hold, we should use K-means.

4.2 Experimental Result

Comparing the two algorithms revealed strengths and weaknesses on both sides. The K-means algorithm is good for medium-sized data, but its performance is very poor on huge datasets. K-means works on unlabeled data and does not require any fixed condition.

The Naïve Bayes algorithm performs much better for our student assistant. It performs well on huge datasets, and its speed is good.
However, Naïve Bayes performs poorly when the dataset lacks conditional independence.

When training starts with a small number of documents, both K-means and the Bayesian classifier perform well. As the number of documents increases, the differences between the algorithms begin to show. For larger training sets the Naïve Bayes algorithm performs much better than the K-means classifier.

Processing time depends entirely on the size of the test set. As the test set grows, processing time increases, and it remains the same for the two classifiers. Differences in processing time can be observed when different numbers of documents are used.

After analyzing the data, we selected the Naïve Bayes classifier for our student assistant and built a GUI for Bella.

Figure 4.1: GUI of “BELLA”

Here is a sample conversation screenshot.

Figure 4.2: Conversation with “BELLA”

4.3 Descriptive Analysis

The Naive Bayes classifier is better than K-means, so Bella conducts its conversations using the Bayesian classifier. The main assumption of the naïve Bayesian (NB) decision rule is that the attributes are conditionally independent of each other given the class label. According to the literature, the naïve Bayesian classifier performs surprisingly well even though this condition does not hold for most data sets. The independence property means that any relation that exists between attributes of the data set is ignored entirely by the naïve Bayesian classifier. If the class label of a node is c and it has two attribute values x_i and x_j (where x_i ≠ x_j) that are conditionally independent, then x_i is conditionally independent of x_j given the class c.

International Journal of Computer Applications (0975 – 8887), Volume 65, No. 23, March 2013
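As a rough illustration of the independence assumption described above, the following sketch scores each class by adding one log-probability term per word, so every word contributes independently of the others. It is not the implementation used for Bella: the training sentences, the labels, the function names `train` and `predict`, and the use of add-one (Laplace) smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count classes and per-class words from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest naive Bayes log-score."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # log P(c)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                # Independence assumption: one separate factor P(word | c) per word,
                # with add-one smoothing (an assumption of this sketch)
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data for two kinds of student query
examples = [
    ("when is the exam", "schedule"),
    ("what time is the class", "schedule"),
    ("exam routine for monday", "schedule"),
    ("how do i pay the fee", "payment"),
    ("where can i pay tuition", "payment"),
    ("tuition fee amount", "payment"),
]
model = train(examples)
label = predict(model, "when is the class")
print(label)
```

Because the score is a plain sum over words, any relationship between words in the same sentence is ignored, which is exactly the simplification the section describes.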
4.4 Summary

Based on results reported in the literature, the Naïve Bayesian classifier is found to be the best document classifier. When the two methods are applied to the same data sets to find the optimal result, the K-means classification method gives much higher accuracy than the Naïve Bayesian classification method. Classification can be improved by increasing the number of cases used for training and testing.

Though far from perfect, our chatbot produces viable results in some contexts and provides a solid foundation for future research in this direction. Our original purpose was to handle repetitive questions from current students of this university and from those who wish to be admitted here. Given that mission, the chatbot performs reasonably well. If a similar question has already been asked, the chatbot is generally able to retrieve it. If no similar question has been asked, the bot is able to recognize that and respond, “I don’t know.” It also performs well on policy questions, the other type of question that consumes the time of teaching assistants but does not necessarily require their skills. The limited number of categories makes it easy to determine exactly what a policy question is asking and to return the relevant information.

The main issue we faced was responding to conceptual questions. When we began this project we recognized that this was an inherently hard problem, and it is one we believe still requires a human teaching assistant. The process of understanding a concept, understanding a student’s question, and then synthesizing the appropriate information to respond directly to that question is one that even humans struggle with. We did not expect to be successful on this part of the project.
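The retrieval behaviour summarized in this chapter (answer when a similar question is known, otherwise reply “I don’t know.”) can be sketched as follows. This is a minimal sketch under stated assumptions rather than Bella's actual code: the knowledge base entries and answers are placeholders, the names `confidence` and `respond` are hypothetical, the 0.6 threshold is arbitrary, and the standard-library `difflib.SequenceMatcher` ratio stands in for whatever confidence measure the chatbot really uses.

```python
from difflib import SequenceMatcher

FALLBACK = "I don't know."

# Hypothetical knowledge base: stored question -> placeholder answer
KNOWLEDGE_BASE = {
    "when does the semester start": "(placeholder answer about the semester start)",
    "where can i find my class routine": "(placeholder answer about the class routine)",
}

def confidence(query, question):
    """Similarity in [0, 1] between the user query and a stored question."""
    return SequenceMatcher(None, query.lower(), question.lower()).ratio()

def respond(query, threshold=0.6):
    """Return the answer of the best-matching stored question, or the fallback."""
    best_question = max(KNOWLEDGE_BASE, key=lambda q: confidence(query, q))
    if confidence(query, best_question) < threshold:
        return FALLBACK
    return KNOWLEDGE_BASE[best_question]

print(respond("When does the semester start?"))
print(respond("xyz qqq"))
```

A higher threshold makes the bot more likely to fall back on “I don’t know.”, while a lower one makes it more likely to return a loosely related answer; the right value would have to be tuned on real conversations.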
CHAPTER 5
SUMMARY, CONCLUSION, RECOMMENDATION, AND IMPLICATION FOR FUTURE RESEARCH

5.1 Summary of the Study

In this research project we tried to reduce the complexity and diversity of online interfaces and to enhance human-computer interaction so that a large number of students can enjoy the benefits of digitization. The AI chatbot “Bella”, which is capable of understanding natural language, can perform human-like interactions. The chatbot can currently answer frequently asked questions and conduct basic conversations. We found that the naive Bayes algorithm performs best for this particular task and for our conversational dataset. NLTK is used for natural language processing, and the Python Django framework is used for the web-based interface.

5.2 Conclusions

We conducted another survey of the students to learn whether they like the chatbot.

Table 5.2.1 Survey on students

Question | Response (on a scale of 5: 1 2 3 4 5) and Mean (in %)
Do you like the chatbot “Bella”? | 21213122693.6
How well is Bella performing? | 1323159527.8
Would you like to have a mobile app? | 014231795.3

From that survey we learned that the students like the chatbot, but there is still much to improve.

5.3 Recommendations

5.4 Implication for Further Study