Summary: Building a numeric index with Lucene.Net
The official API documentation describes NumericField as follows:
This class provides a Field that enables indexing of numeric values for efficient range filtering and sorting.
In other words, the NumericField class provides a field dedicated to storing numeric values, and it makes searching over a numeric range more efficient.
A minimal usage example:
document.Add(new NumericField(name).SetIntValue(value));
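NumericField is not limited to int values. The following is a minimal sketch, assuming Lucene.Net 3.0.x, of the setters for other numeric types. The class name NumericFieldSketch and the fields PROD_Created and PROD_Weight are hypothetical and exist only for illustration; they are not part of the sample index built later.

```csharp
using System;
using Lucene.Net.Documents;

static class NumericFieldSketch
{
    // Hypothetical helper: field names and values are illustrative only.
    public static Document BuildSample()
    {
        var doc = new Document();

        // 32-bit integer, stored and indexed.
        doc.Add(new NumericField("PROD_Price", Field.Store.YES, true).SetIntValue(42));

        // 64-bit integer; a timestamp expressed in ticks is a common use.
        doc.Add(new NumericField("PROD_Created", Field.Store.YES, true)
            .SetLongValue(DateTime.UtcNow.Ticks));

        // Double-precision floating point value.
        doc.Add(new NumericField("PROD_Weight", Field.Store.YES, true).SetDoubleValue(1.25));

        return doc;
    }
}
```

Each setter returns the NumericField itself, which is why the calls can be chained inside doc.Add. A field written with SetLongValue or SetDoubleValue should be queried with the matching NumericRangeQuery.NewLongRange or NewDoubleRange factory.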
Step 1 : Build RAMDirectory
To keep the code short and the focus on NumericField, this example uses a RAMDirectory, which holds the index in memory.
public static RAMDirectory dir = new RAMDirectory();
Step 2 : Build Index
The code below creates 1000 products, each with PROD_ID [product ID], PROD_Name [product name], and PROD_Price [product price].
PROD_Price is a NumericField; we will search it for a price range in the next step.
private void BuildIndex() {
    // The third argument (true) creates a new index, overwriting any existing one in dir.
    IndexWriter iw = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);
    for (int i = 1; i <= 1000; i++)
    {
        Document doc = new Document();
        Field field = new Field("PROD_ID", Guid.NewGuid().ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
        Field field2 = new Field("PROD_Name", "Lucene.Net" + i.ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
        // Store the price and index it as a numeric value so it can be range-queried.
        NumericField field3 = new NumericField("PROD_Price", Field.Store.YES, true);
        field3.SetIntValue(i);
        doc.Add(field);
        doc.Add(field2);
        doc.Add(field3);
        iw.AddDocument(doc);
    }
    iw.Optimize();
    iw.Commit();
    iw.Close();
}
Step 3 : Search
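Next we search for the products whose price is between 11 and 20. The bounds 11 and 20 are passed to the NumericRangeQuery.NewIntRange call, and the two trailing true arguments make both bounds inclusive.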
private void SearchIndex() {
    // Open the index read-only.
    IndexSearcher search = new IndexSearcher(dir, true);
    // Match documents whose PROD_Price is between 11 and 20, both bounds inclusive.
    // No QueryParser is needed: the query is built directly from the numeric bounds.
    var nq = NumericRangeQuery.NewIntRange("PROD_Price", 11, 20, true, true);
    var hits = search.Search(nq, null, search.MaxDoc).ScoreDocs;
    foreach (var res in hits)
    {
        // Load the stored fields once per hit.
        Document doc = search.Doc(res.Doc);
        Response.Write(string.Format("PROD_ID:{0} / PROD_Name{1} / PROD_Price{2}<br />",
            doc.Get("PROD_ID"), doc.Get("PROD_Name"), doc.Get("PROD_Price")));
    }
}
Result :
PROD_ID:563a2d5a-e6ba-4dbd-9276-b7a0a6c81015 / PROD_NameLucene.Net11 / PROD_Price11
PROD_ID:6d5e2353-0b25-4d5b-834e-c243453659f7 / PROD_NameLucene.Net12 / PROD_Price12
PROD_ID:85ab8b54-f779-4f6f-86fb-a1e5f2f1eae6 / PROD_NameLucene.Net13 / PROD_Price13
PROD_ID:652b573e-f7ee-487e-87a1-2aa50882e832 / PROD_NameLucene.Net14 / PROD_Price14
PROD_ID:11d8ba45-db63-4a4f-9454-3f49f8082dcd / PROD_NameLucene.Net15 / PROD_Price15
PROD_ID:19ee84a1-56ec-4754-bd8e-b643fa875f75 / PROD_NameLucene.Net16 / PROD_Price16
PROD_ID:eab7a441-fac8-4020-8eea-62a7c271816e / PROD_NameLucene.Net17 / PROD_Price17
PROD_ID:2fae37b1-3c97-4425-a206-0f29788781b3 / PROD_NameLucene.Net18 / PROD_Price18
PROD_ID:f65da88f-fba9-4e95-a203-2764f2c6125e / PROD_NameLucene.Net19 / PROD_Price19
PROD_ID:9c9f1bf1-07f9-4111-b025-7b06d417826b / PROD_NameLucene.Net20 / PROD_Price20
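The API description quoted at the top also mentions filtering and sorting. The following is a minimal sketch, assuming Lucene.Net 3.0.x and an index built as in Step 2, that applies the same 11 to 20 range as a NumericRangeFilter and sorts the hits by price in descending order. The class PriceSearchSketch and its method are hypothetical names, and Console.WriteLine stands in for Response.Write so the sketch does not depend on ASP.NET.

```csharp
using System;
using Lucene.Net.Documents;
using Lucene.Net.Search;

static class PriceSearchSketch
{
    // Hypothetical helper: pass in the directory that holds the index from Step 2.
    public static void PrintPricesDescending(Lucene.Net.Store.Directory dir)
    {
        using (var searcher = new IndexSearcher(dir, true))
        {
            // Same inclusive range as the query above, expressed as a filter.
            var filter = NumericRangeFilter.NewIntRange("PROD_Price", 11, 20, true, true);

            // Sort on the numeric field as an int; 'true' reverses the order (highest first).
            var sort = new Sort(new SortField("PROD_Price", SortField.INT, true));

            // Match every document, then let the filter narrow the result set.
            var hits = searcher.Search(new MatchAllDocsQuery(), filter, 10, sort).ScoreDocs;
            foreach (var hit in hits)
            {
                Document doc = searcher.Doc(hit.Doc);
                Console.WriteLine("{0} / {1}", doc.Get("PROD_Name"), doc.Get("PROD_Price"));
            }
        }
    }
}
```

A filter does not contribute to scoring, so it is a reasonable fit when the price range only narrows the results and relevance comes from another query, for example a text query on PROD_Name.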
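Note : In the Build Index step above, we created a new Document and a new NumericField on every iteration of the loop. That works, but it is not the most efficient approach.
The Lucene.Net API documentation says: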
For optimal performance, re-use the NumericField and Document instance for more than one document:
We can therefore change the Build Index step as follows:
private void BuildIndex() {
    IndexWriter iw = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);
    // Create the Document and its fields once, then reuse them for every product.
    Document doc = new Document();
    Field idField = new Field("PROD_ID", "", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field nameField = new Field("PROD_Name", "", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    NumericField priceField = new NumericField("PROD_Price", Field.Store.YES, true);
    doc.Add(idField);
    doc.Add(nameField);
    // Add the NumericField once, outside the loop. Adding it on every iteration
    // would pile up duplicate PROD_Price fields on the shared Document.
    doc.Add(priceField);
    for (int i = 1; i <= 1000; i++)
    {
        // Only the values change between documents.
        idField.SetValue(Guid.NewGuid().ToString());
        nameField.SetValue("Lucene.Net" + i.ToString());
        priceField.SetIntValue(i);
        iw.AddDocument(doc);
    }
    iw.Optimize();
    iw.Commit();
    iw.Close();
}