Unit 12 Reading Notes
Text classification and Naive Bayes To capture the generality and scope of the problem space to which standing queries belong, we now introduce the general notion of a classification problem. Apart from manual classification and hand-crafted rules, there is a third approach to text classification, namely, machine learning-based text classification. Flat clustering Clustering algorithms group a set of documents into subsets or clusters. The algorithms’ goal is to create clusters that are coherent internally, but clearly different from each other. Clustering is the most common form of unsupervised learning. No supervision means that there is no human expert who has assigned documents to classes. In clustering, it is the distribution and makeup of the data that will determine cluster membership. Flat clustering creates a flat set of clusters without any explicit structure that would relate clusters to each other. Hierarchical clustering Hierarchical clustering (or hierarchic