Posts

Unit 12 Reading Notes

Text classification and Naive Bayes: To capture the generality and scope of the problem space to which standing queries belong, we now introduce the general notion of a classification problem. Apart from manual classification and hand-crafted rules, there is a third approach to text classification, namely, machine learning-based text classification.

Flat clustering: Clustering algorithms group a set of documents into subsets or clusters. The algorithms’ goal is to create clusters that are coherent internally, but clearly different from each other. Clustering is the most common form of unsupervised learning. No supervision means that there is no human expert who has assigned documents to classes. In clustering, it is the distribution and makeup of the data that will determine cluster membership. Flat clustering creates a flat set of clusters without any explicit structure that would relate clusters to each other.

Hierarchical clustering: Hierarchical clustering (or hierarchic...
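To make the flat clustering idea in these notes concrete, here is a minimal sketch. It assumes K-means as the clustering algorithm (the excerpt does not name one) and uses made-up term-count vectors over a hypothetical four-word vocabulary. The function name `kmeans` and the toy data are illustrative only.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-means on small dense vectors; returns (assignments, centroids)."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        assignments = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        centroids = new_centroids
    return assignments, centroids


# Hypothetical term counts over the vocabulary (goal, match, stock, market).
docs = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 4, 2], [0, 1, 3, 3]]

# Seed the two clusters with the first and third documents (an arbitrary choice).
assignments, centroids = kmeans(docs, [docs[0], docs[2]])
print(assignments)
```

No labels are supplied anywhere: the grouping comes only from how the vectors are distributed, which is what "unsupervised" means in the notes. The result is a flat partition with no structure relating one cluster to another.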

Unit 11 Muddiest Points

How efficient is an adaptive system that uses log mining? We have to store a lot of log data to calculate the score, so do we still need to build a separate system to collect that data?

Unit 11 Reading Notes

1. User profiling refers to the use of popular techniques for collecting information about users and for representing and building user profiles. 2. Collecting information about users: the information may be explicitly input by the user or implicitly gathered by a software agent, and it may be collected on the user's client machine or gathered by the application server itself. 3. User profile representations: keyword profiles and semantic network profiles. 4. User profile construction: building keyword profiles and semantic network profiles, then building concept profiles.

Unit 10 Muddiest Points

1. In web search, links are a very important signal for answering a query. If a search engine company inserts advertisements between the page links, does that violate information integrity? 2. In link analysis, you said that some people create fake linked pages (static pages) to boost their website's ranking. Can a web search engine detect such pages?

Unit 7 Muddiest Points

Is it possible to train a model on relevance feedback to tell whether users are giving genuine feedback? In other words, can we use this method to detect whether the feedback users give us is real? I think feedback behavior may differ across cultures. For example, people in the U.S. may be more or less inclined to give feedback. I'm just wondering whether that could be a criterion to measure.

Unit 6 Muddiest Points

When smoothing, can we simply skip the stop words? Why is it hard for the language model to handle relevance feedback?

Unit 5 Muddiest Points

1. If we use the query likelihood model, how do we deal with the zero probability problem? 2. If we use the document likelihood model, documents differ in length as well as in content. How do we deal with this problem?
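As a rough illustration of the zero probability problem in the first question, the sketch below scores a query under a unigram query likelihood model. It assumes Jelinek-Mercer (linear interpolation) smoothing, which is only one of several smoothing methods. The function name `query_likelihood`, the weight `lam`, and the toy documents are all hypothetical.

```python
from collections import Counter


def query_likelihood(query, doc, collection, lam=0.5):
    """Return P(query | doc) under a unigram model with linear interpolation.

    lam is the (assumed) weight on the document model; 1 - lam goes to the
    collection model. lam=1.0 disables smoothing entirely.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    score = 1.0
    for term in query:
        p_doc = doc_counts[term] / len(doc)            # maximum-likelihood estimate
        p_coll = coll_counts[term] / len(collection)   # background estimate
        score *= lam * p_doc + (1 - lam) * p_coll
    return score


# Toy, made-up documents (already tokenized).
docs = [["cheap", "flights", "to", "rome"], ["rome", "hotel", "deals"]]
collection = [t for d in docs for t in d]
query = ["rome", "hotel"]

# "hotel" never occurs in docs[0], so the unsmoothed score collapses to zero.
unsmoothed = query_likelihood(query, docs[0], collection, lam=1.0)
smoothed = query_likelihood(query, docs[0], collection, lam=0.5)
print(unsmoothed, smoothed)
```

Without smoothing, a single missing query term wipes out the whole product. Mixing in the collection model gives every term that appears anywhere in the collection a small nonzero probability, so the document can still be ranked.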