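Summary: How to do a LIKE-style wildcard search in Lucene.net
Lucene.net supports trailing-wildcard searches such as book* out of the box, but many people want to search with *book* instead.
Running that kind of query may produce the following message: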
'*' or '?' not allowed as first character in WildcardQuery
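This happens because Lucene.net's QueryParser does not allow leading wildcards by default.
To use this kind of search, you have to set the following property on the parser before parsing the query: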
parser.AllowLeadingWildcard = true;
With that set, the search goes through. However, as the Java Lucene documentation on wildcards points out,
this kind of query may have an impact on performance:
Leading wildcards (e.g. *ook) are not supported by the QueryParser by default. As of Lucene 2.1, they can be enabled by calling QueryParser.setAllowLeadingWildcard( true ). Note that this can be an expensive operation: it requires scanning the list of tokens in the index in its entirety to look for those that match the pattern.
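To make the cost described in the quote a bit more concrete, here is a rough sketch in plain C# (no Lucene involved). It assumes the index terms can be pictured as a sorted list of strings, which is a simplification of a real term dictionary; `TermScanSketch` and its methods are names I made up for illustration. A prefix pattern can jump to its first candidate and stop early, while a pattern with a leading wildcard has to look at every term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: a sorted List<string> stands in for the term dictionary.
public static class TermScanSketch
{
    // Pattern like "book*": binary-search to the first candidate, then stop
    // as soon as a term no longer starts with the prefix.
    public static List<string> PrefixMatch(List<string> sortedTerms, string prefix, out int visited)
    {
        visited = 0;
        var matches = new List<string>();
        int start = sortedTerms.BinarySearch(prefix, StringComparer.Ordinal);
        if (start < 0) start = ~start; // index of the first term >= prefix
        for (int i = start; i < sortedTerms.Count; i++)
        {
            visited++;
            if (!sortedTerms[i].StartsWith(prefix, StringComparison.Ordinal)) break;
            matches.Add(sortedTerms[i]);
        }
        return matches;
    }

    // Pattern like "*book*": there is no starting point to seek to,
    // so every term in the list is examined.
    public static List<string> ContainsMatch(List<string> sortedTerms, string infix, out int visited)
    {
        visited = sortedTerms.Count;
        return sortedTerms.Where(t => t.Contains(infix)).ToList();
    }
}
```

With a handful of terms the difference is negligible, but the number of terms visited by `ContainsMatch` grows with the size of the whole list, whereas `PrefixMatch` only visits the matching range plus one term.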
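In some cases it is worth considering a similar feature instead: fuzzy search (Fuzzy Query).
See my earlier post, "Using fuzzy search in Lucene.Net".
With it, even if you search for a misspelled string such as Luce.Net, you can still find the correct Lucene.net data.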
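Fuzzy matching in Lucene is based on edit distance, so a small sketch of plain Levenshtein distance shows why Luce.Net can still reach Lucene.net. This is only the textbook distance, not Lucene's own scoring formula, and `EditDistanceSketch` is a name made up for this example. It also assumes both strings are compared as single lowercased terms, which depends on how the analyzer tokenizes the text. In the query parser syntax, a fuzzy term is written with a trailing tilde, for example luce~.

```csharp
using System;

// Hypothetical helper: textbook Levenshtein distance, not Lucene's similarity score.
public static class EditDistanceSketch
{
    public static int Levenshtein(string a, string b)
    {
        // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                    prev[j - 1] + cost);                    // substitution or match
            }
            var tmp = prev; prev = curr; curr = tmp;        // reuse the two rows
        }
        return prev[b.Length];
    }
}
```

For example, `EditDistanceSketch.Levenshtein("luce.net", "lucene.net")` returns 2, because two characters have to be inserted. A fuzzy query accepts terms whose distance from the search term is small relative to their length.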
Here is the sample code:
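// Namespaces needed by the sample (these go at the top of the code-behind file)
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

/// <summary>
/// Builds the index under the App_Data folder.
/// </summary>
public void BuildIndex()
{
    // Open the index directory under App_Data. Passing true for the create
    // argument below builds a new index there, replacing any existing one.
    DirectoryInfo dirInfo = new DirectoryInfo(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "App_Data"));
    FSDirectory dir = FSDirectory.Open(dirInfo);
    IndexWriter iw = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);

    // Index two fields, ID and DESC, with the values "holmes2136" and "但是Holmes是專業PG".
    Document doc = new Document();
    Field field = new Field("ID", "holmes2136", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field field2 = new Field("DESC", "但是Holmes是專業PG", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    doc.Add(field);
    doc.Add(field2);
    iw.AddDocument(doc);

    // Merge segments, flush the changes to disk, and release the writer.
    iw.Optimize();
    iw.Commit();
    iw.Close();
}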
public void Search(string keyWord)
{
    string indexPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "App_Data");
    DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
    FSDirectory dir = FSDirectory.Open(dirInfo);
    IndexSearcher search = new IndexSearcher(dir, true);

    // Search against the DESC field.
    QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "DESC", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30));

    // Enable leading wildcards.
    parser.AllowLeadingWildcard = true;

    // Parse the search keyword into a query.
    Query query = parser.Parse(keyWord);

    // Run the search and write out the ID and DESC of every hit.
    var hits = search.Search(query, null, search.MaxDoc).ScoreDocs;
    foreach (var res in hits)
    {
        Document doc = search.Doc(res.Doc);
        Response.Write(string.Format("ID:{0} / DESC:{1}<br />", doc.Get("ID"), doc.Get("DESC")));
    }
}
protected void Page_Load(object sender, EventArgs e)
{
    BuildIndex();
    Search("*holm*");
}
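As a side note, the leading-wildcard check lives in QueryParser, so another option is to build the WildcardQuery yourself and skip the parser. The sketch below assumes Lucene.Net 3.0.x and the same DESC field as above; `WildcardQueryFactory` and `Contains` are names I made up for illustration. Because a hand-built query is not run through the analyzer, the text is lowercased manually to line up with the terms that StandardAnalyzer lowercased at index time. The performance caveat from the Lucene documentation applies here as well.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, assuming Lucene.Net 3.0.x.
public static class WildcardQueryFactory
{
    // Builds a "*text*" query on the given field without going through QueryParser.
    public static Query Contains(string field, string text)
    {
        // Lowercase by hand: StandardAnalyzer lowercased the indexed terms,
        // and a hand-built WildcardQuery is not analyzed.
        string pattern = "*" + text.ToLowerInvariant() + "*";
        return new WildcardQuery(new Term(field, pattern));
    }
}
```

The resulting query can be passed to the same `search.Search(...)` call used in the sample, for example `WildcardQueryFactory.Contains("DESC", "holm")`.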
References:
How to query lucene with “like” operator?
Related articles:
Using Lucene.net with leading wildcards enabled to search 20,000 names