Creating an Index

The first step in implementing full-text searching with Lucene is to build an index. To do so, you specify a directory to hold the index and an analyzer class. The analyzer, a core part of Lucene, breaks text fields into indexable tokens.

Several types of analyzers are provided out of the box. Table 1 shows some of the more interesting ones.

Table 1 Lucene analyzers.

Analyzer            Description
StandardAnalyzer    A sophisticated general-purpose analyzer.
WhitespaceAnalyzer  A very simple analyzer that just separates tokens using white space.
StopAnalyzer        Removes common English words that are not usually useful for indexing.
SnowballAnalyzer    An interesting experimental analyzer that works on word roots (a search on rain should also return entries with raining, rained, and so on).

There are even a number of language-specific analyzers, including analyzers for German, Russian, French, Dutch, and others.
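
To make the idea of an analyzer more concrete, here is a minimal sketch in plain Java of the two simplest behaviors described above: splitting on white space and dropping common words. It does not use Lucene and is not Lucene's implementation. The class name AnalyzerSketch, its method names, the small stop-word list, and the choice to lowercase tokens are all assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: plain Java, not Lucene's analyzer code.
class AnalyzerSketch {

    // Hypothetical stop-word list for this example; real analyzers ship their own.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "in", "of", "the");

    // Split on runs of white space, roughly what a whitespace-only analyzer does.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Lowercase each token (an assumption of this sketch) and drop stop words.
    static List<String> withoutStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = whitespaceTokens("The rain in Spain");
        System.out.println(tokens);                    // [The, rain, in, Spain]
        System.out.println(withoutStopWords(tokens));  // [rain, spain]
    }
}
```

Only the tokens that survive this kind of processing end up in the index, which is why the choice of analyzer affects what a later search can match.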

It isn’t difficult to implement your own analyzer, though the standard ones often do the job well enough. For the sake of simplicity, we’ll use the StandardAnalyzer in this tutorial.

Next, we need to create an IndexWriter object, which creates the index and adds new entries to it. You can create an IndexWriter that uses the StandardAnalyzer as follows:

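// Arguments: the index directory path, the analyzer, and true to create a new index (overwriting any existing one).
IndexWriter indexWriter = new IndexWriter("index", new StandardAnalyzer(), true);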
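
Lucene's index is an inverted index: it maps each token to the entries that contain it. As a rough mental model of what adding index entries means, here is a toy in-memory version in plain Java. It is a sketch only and says nothing about how Lucene stores its index on disk. The class TinyIndexSketch and its add and lookup methods are hypothetical names invented for this example, and the white-space tokenization with lowercasing is an assumption standing in for a real analyzer.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory inverted index; an illustration, not Lucene's IndexWriter.
class TinyIndexSketch {

    // Maps each token to the ids of the entries that contain it.
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    private int nextId = 0;

    // Add one entry: split on white space, lowercase, and record each token.
    int add(String text) {
        int id = nextId++;
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Return the ids of all entries containing the token (empty if none).
    Set<Integer> lookup(String token) {
        return postings.getOrDefault(token.toLowerCase(Locale.ROOT), Set.of());
    }

    public static void main(String[] args) {
        TinyIndexSketch index = new TinyIndexSketch();
        index.add("rain in Spain");
        index.add("Rain and snow");
        System.out.println(index.lookup("rain"));  // [0, 1]
        System.out.println(index.lookup("snow"));  // [1]
    }
}
```

A lookup touches only the map entry for the token instead of scanning every entry's text, which is the main reason to build an index before searching.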