Unit 12 Reading Notes

Text classification and Naive Bayes

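To capture the generality and scope of the problem space to which standing queries belong, we now introduce the general notion of a classification problem. (A standing query is an ordinary query that is run periodically against a collection to which new documents keep being added.)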
Apart from manual classification and hand-crafted rules, there is a third approach to text classification, namely, machine learning-based text classification.
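
As a rough illustration of the machine learning approach, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. The function names (train_nb, classify_nb), the class labels, and the tiny training set are assumptions chosen for illustration, not something these notes define.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_docs):
    """Estimate log priors and smoothed log term probabilities.

    labeled_docs is a list of (tokens, label) pairs.
    """
    class_doc_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_doc_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(labeled_docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_doc_counts.items()}

    log_cond = {}
    for c in class_doc_counts:
        # Add-one (Laplace) smoothing: every vocabulary term gets one extra count.
        denom = sum(term_counts[c].values()) + len(vocab)
        log_cond[c] = {t: math.log((term_counts[c][t] + 1) / denom) for t in vocab}
    return log_prior, log_cond, vocab


def classify_nb(tokens, log_prior, log_cond, vocab):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c in log_prior:
        # Terms never seen in training are ignored.
        scores[c] = log_prior[c] + sum(log_cond[c][t] for t in tokens if t in vocab)
    return max(scores, key=scores.get)


# Toy labeled corpus (illustrative only).
training = [
    ("Chinese Beijing Chinese".split(), "china"),
    ("Chinese Chinese Shanghai".split(), "china"),
    ("Chinese Macao".split(), "china"),
    ("Tokyo Japan Chinese".split(), "other"),
]
model = train_nb(training)
print(classify_nb("Chinese Chinese Chinese Tokyo Japan".split(), *model))
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied together.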

Flat clustering
Clustering algorithms group a set of documents into subsets or clusters. The algorithms’ goal is to create clusters that are coherent internally, but clearly different from each other.

Clustering is the most common form of unsupervised learning. No supervision means that there is no human expert who has assigned documents to classes. In clustering, it is the distribution and makeup of the data that will determine cluster membership.

Flat clustering creates a flat set of clusters without any explicit structure that would relate clusters to each other.
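
The notes do not name a specific flat clustering algorithm. K-means is one widely used example, so the sketch below uses it as an assumption. The function name kmeans, the naive seeding (first k points), and the sample points are all hypothetical choices made to keep the example short and repeatable.

```python
import math


def kmeans(points, k, max_iterations=20):
    """Cluster points (tuples of floats) into k flat clusters.

    Returns (assignment, centroids), where assignment[i] is the
    cluster index of points[i].
    """
    # Naive seeding for illustration: use the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
    return assignment, centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Note that the result is just a list of cluster indices: nothing in the output relates one cluster to another, which is what "flat" means here. The number of clusters k has to be supplied up front.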

Hierarchical clustering

Hierarchical clustering (or hierarchic clustering) outputs a hierarchy, a structure that is more informative than the unstructured set of clusters returned by flat clustering. Hierarchical clustering does not require us to prespecify the number of clusters, and most hierarchical algorithms that have been used in IR are deterministic.
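
To make the idea of a hierarchy concrete, here is a minimal sketch of bottom-up (agglomerative) clustering using the single-link criterion. Single-link is only one possible merge criterion and is picked here as an assumption. The function name single_link_hac and the sample points are hypothetical.

```python
import math


def single_link_hac(points):
    """Merge the two closest clusters repeatedly until one remains.

    Returns the merge history as (kept_id, absorbed_id, distance) tuples,
    which together describe the hierarchy.
    """
    # Every point starts as its own cluster, identified by its index.
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single-link: cluster distance is the closest pair of members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((a, b, d))
    return merges


points = [(0.0,), (1.0,), (5.0,), (5.5,)]
merges = single_link_hac(points)
for kept, absorbed, dist in merges:
    print(kept, absorbed, dist)
```

The sketch takes no cluster count as input and involves no random choices, which mirrors the two properties mentioned above. A flat clustering can still be obtained afterwards by cutting the merge history at a chosen distance.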
