Evaluation of the Method
While the method described above works well for handling large quantities of world news feeds, it fails in a number of other situations. These limitations are inherent in the approach, and overcoming them would require significant changes to the method.
- All sources must discuss the same topics using similar terms. For example, it would be difficult to group an article about the U.S. presidential race if it described the race using terms from Shakespeare. Because no other stories would be using the same poetic approach, the true nature of the topic would be obscured in the analysis.
- All or nearly all of the sources analyzed must be written in the same language. In the absence of robust machine translation, sources in different languages won't use the same terms to describe a topic.
- Some threshold must be crossed for a grouping to form. The number of sources required to get reasonable results scales with the diversity of topics. This is evidenced by the success of the clustering for world news based on approximately 65 feeds, while similar groupings for memes require thousands of blogs and other sites to be aggregated and analyzed.
- Because it's a second-order analysis, this approach doesn't work well for immediately detecting breaking news events. New, breaking topics (fast-moving weather systems, terrorist attacks, or the like) are first reported in the press and must gather enough momentum or magnitude to compete with other stories before they become visible. Such a topic would become most visible when topics are ranked by momentum (i.e., by how quickly they are rising in visibility) rather than by a static ranking of visibility.
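
The difference between a static ranking and a momentum ranking in the last point can be illustrated with a minimal sketch. It is not the method's actual implementation: the function names, the use of per-window mention counts as a stand-in for "visibility", the add-one smoothing, and the sample figures are all assumptions made for illustration.

```python
from typing import Dict, List


def rank_by_visibility(current: Dict[str, int]) -> List[str]:
    """Static ranking: topics ordered by their current mention count."""
    return sorted(current, key=lambda topic: current[topic], reverse=True)


def rank_by_momentum(previous: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Momentum ranking: topics ordered by relative growth between two windows."""

    def growth(topic: str) -> float:
        before = previous.get(topic, 0)
        # Add-one smoothing (an assumption) keeps a brand-new topic,
        # which has no earlier mentions, from dividing by zero.
        return (current[topic] - before) / (before + 1)

    return sorted(current, key=growth, reverse=True)


# Hypothetical mention counts in two consecutive time windows.
previous = {"election": 40, "trade talks": 25, "storm": 0}
current = {"election": 42, "trade talks": 24, "storm": 9}

print(rank_by_visibility(current))          # storm is last
print(rank_by_momentum(previous, current))  # storm is first
```

In this made-up data the storm has the fewest mentions, so it sits at the bottom of the static ranking, yet it has grown fastest and therefore tops the momentum ranking.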