[Office Development Series] Reading the Document Summary Information of Office Files (Binary Office and Open XML Formats)
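Most document management systems and enterprise search engines (Openfind, for example) index a special area inside Office files that ordinary users rarely pay attention to: the Document Summary Information area. It stores summary details about the document, author information, the creation time, and so on. Some projects (government tenders, for example) require extra information to be embedded in Office files, such as metadata for full-text search, and in those cases we need to access the data stored in the Document Summary Information.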
Reading this data from an Open XML file is simple: use the members of the PackageProperties property exposed by OpenXmlPackage, for example:
// Open the workbook read-only; the using block disposes the package when done.
using (SpreadsheetDocument openXmlDoc = SpreadsheetDocument.Open(args[0], false))
{
    var props = openXmlDoc.PackageProperties;
    Console.WriteLine("Title: {0}", props.Title);
    Console.WriteLine("Subject: {0}", props.Subject);
    Console.WriteLine("Revision: {0}", props.Revision);
    Console.WriteLine("Modified: {0}", props.Modified);
    Console.WriteLine("LastModifiedBy: {0}", props.LastModifiedBy);
    Console.WriteLine("Keywords: {0}", props.Keywords);
    Console.WriteLine("Version: {0}", props.Version);
}
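To see where these values come from, it can help to look inside the package directly. The following is a minimal sketch, not the SDK's implementation: it treats the .xlsx file as a ZIP archive and reads the core properties part with only the base class library. It assumes the part is stored at docProps/core.xml, which is where Office normally writes it; strictly speaking the location is defined by the package relationships, so other producers may place it elsewhere. The class name CorePropertiesPeek and the method ReadAll are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class CorePropertiesPeek
{
    // Returns the core properties keyed by element local name
    // (e.g. "title", "creator", "lastModifiedBy", "modified").
    // Returns an empty dictionary when the part is not found.
    public static Dictionary<string, string> ReadAll(Stream package)
    {
        var result = new Dictionary<string, string>();
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the core properties part lives at the usual path.
            ZipArchiveEntry entry = zip.GetEntry("docProps/core.xml");
            if (entry == null)
            {
                return result;
            }

            using (Stream part = entry.Open())
            {
                XDocument doc = XDocument.Load(part);
                foreach (XElement element in doc.Root.Elements())
                {
                    result[element.Name.LocalName] = element.Value;
                }
            }
        }
        return result;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: CorePropertiesPeek <file.xlsx>");
            return;
        }

        using (FileStream fs = File.OpenRead(args[0]))
        {
            foreach (KeyValuePair<string, string> pair in ReadAll(fs))
            {
                Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
            }
        }
    }
}
```

On .NET Framework this needs a reference to the System.IO.Compression assembly. For real work the PackageProperties API is the better choice, since it resolves the part through the package relationships and returns typed values.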
With the Office 97-2003 binary formats, however, the same task is considerably harder, so we rely on NPOI to do it:
// Open the legacy .xls file read-only and let NPOI parse the OLE2 container.
using (FileStream fs = new FileStream(args[0], FileMode.Open, FileAccess.Read, FileShare.Read))
{
    HSSFWorkbook excelDoc = new HSSFWorkbook(fs);
    // DocumentSummaryInformation holds the extended properties; it is not used below.
    DocumentSummaryInformation docSummaryInfo = excelDoc.DocumentSummaryInformation;
    SummaryInformation summaryInfo = excelDoc.SummaryInformation;
    Console.WriteLine("Title: {0}", summaryInfo.Title);
    Console.WriteLine("Subject: {0}", summaryInfo.Subject);
    Console.WriteLine("Revision: {0}", summaryInfo.RevNumber);
    // The last save time is the counterpart of Modified; the last author is LastModifiedBy.
    Console.WriteLine("Modified: {0}", summaryInfo.LastSaveDateTime);
    Console.WriteLine("LastModifiedBy: {0}", summaryInfo.LastAuthor);
    Console.WriteLine("Keywords: {0}", summaryInfo.Keywords);
    // OSVersion is the operating-system version recorded in the property set header.
    Console.WriteLine("Version: {0}", summaryInfo.OSVersion);
}
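The snippet above fetches DocumentSummaryInformation but never reads from it, and it assumes both property sets are present. As a hedged sketch of how the two could be read a little more defensively: the class LegacySummaryDump and its Print method are hypothetical names, and the null checks rest on my understanding that NPOI returns null when a file carries no such stream, which is worth verifying against the NPOI version you use.

```csharp
using System;
using System.IO;
using NPOI.HPSF;
using NPOI.HSSF.UserModel;

public static class LegacySummaryDump
{
    // Prints a few fields from both property sets of a .xls file.
    public static void Print(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            var workbook = new HSSFWorkbook(fs);

            // Basic properties: title, author, timestamps, and so on.
            SummaryInformation summary = workbook.SummaryInformation;
            if (summary == null)
            {
                Console.WriteLine("No SummaryInformation stream in this file.");
            }
            else
            {
                Console.WriteLine("Author: {0}", summary.Author);
                Console.WriteLine("Created: {0}", summary.CreateDateTime);
                Console.WriteLine("Application: {0}", summary.ApplicationName);
            }

            // Extended properties: company, category, manager, and so on.
            DocumentSummaryInformation docSummary = workbook.DocumentSummaryInformation;
            if (docSummary == null)
            {
                Console.WriteLine("No DocumentSummaryInformation stream in this file.");
            }
            else
            {
                Console.WriteLine("Company: {0}", docSummary.Company);
                Console.WriteLine("Category: {0}", docSummary.Category);
                Console.WriteLine("Manager: {0}", docSummary.Manager);
            }
        }
    }
}
```

Calling LegacySummaryDump.Print(args[0]) from Main would print whichever of the two property sets the file contains.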
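Running the two snippets above produces the following output (the screenshot is not reproduced here).
The complete code is as follows: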
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using NPOI;
using NPOI.HPSF;
using NPOI.HSSF.UserModel;
using NPOI.HSSF;
using NPOI.POIFS;
using NPOI.POIFS.FileSystem;
using NPOI.Util;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Office;
using DocumentFormat.OpenXml.Office.Excel;
using DocumentFormat.OpenXml.Spreadsheet;
namespace ReadExcelDocumentSummary
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                Console.WriteLine("Please specify an Excel spreadsheet file as the first argument.");
                return;
            }

            // Compare the extension case-insensitively so that ".XLSX" and ".XLS" also match.
            string extension = Path.GetExtension(args[0]);

            if (string.Equals(extension, ".xlsx", StringComparison.OrdinalIgnoreCase))
            {
                // Open XML workbook: read the package-level core properties.
                using (SpreadsheetDocument openXmlDoc = SpreadsheetDocument.Open(args[0], false))
                {
                    var props = openXmlDoc.PackageProperties;
                    Console.WriteLine("Title: {0}", props.Title);
                    Console.WriteLine("Subject: {0}", props.Subject);
                    Console.WriteLine("Revision: {0}", props.Revision);
                    Console.WriteLine("Modified: {0}", props.Modified);
                    Console.WriteLine("LastModifiedBy: {0}", props.LastModifiedBy);
                    Console.WriteLine("Keywords: {0}", props.Keywords);
                    Console.WriteLine("Version: {0}", props.Version);
                }
            }
            else if (string.Equals(extension, ".xls", StringComparison.OrdinalIgnoreCase))
            {
                // Office 97-2003 workbook: read the summary property sets through NPOI.
                using (FileStream fs = new FileStream(args[0], FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    HSSFWorkbook excelDoc = new HSSFWorkbook(fs);
                    DocumentSummaryInformation docSummaryInfo = excelDoc.DocumentSummaryInformation;
                    SummaryInformation summaryInfo = excelDoc.SummaryInformation;
                    Console.WriteLine("Title: {0}", summaryInfo.Title);
                    Console.WriteLine("Subject: {0}", summaryInfo.Subject);
                    Console.WriteLine("Revision: {0}", summaryInfo.RevNumber);
                    // The last save time is the counterpart of Modified; the last author is LastModifiedBy.
                    Console.WriteLine("Modified: {0}", summaryInfo.LastSaveDateTime);
                    Console.WriteLine("LastModifiedBy: {0}", summaryInfo.LastAuthor);
                    Console.WriteLine("Keywords: {0}", summaryInfo.Keywords);
                    // OSVersion is the operating-system version recorded in the property set header.
                    Console.WriteLine("Version: {0}", summaryInfo.OSVersion);
                }
            }
            else
            {
                Console.WriteLine("Unknown file type.");
            }
        }
    }
}
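The program above picks the reader from the file extension, which breaks when a file has been renamed. One possible alternative, sketched here under my own naming (OfficeFormatSniffer, Kind, and Detect are all hypothetical), is to look at the first bytes of the file instead: OLE2 compound files such as .xls start with D0 CF 11 E0 A1 B1 1A E1, and ZIP-based packages such as .xlsx start with 50 4B 03 04. Note that this only identifies the container; a ZIP file is not necessarily a workbook, and an OLE2 file could just as well be a .doc or .ppt.

```csharp
using System;
using System.IO;

public static class OfficeFormatSniffer
{
    public enum Kind { Unknown, Ole2Binary, ZipPackage }

    // Signature of an OLE2 compound file (Office 97-2003 binary formats).
    private static readonly byte[] Ole2Magic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Local file header signature of a ZIP archive (Open XML packages).
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Reads up to 8 bytes from the current position and classifies the container.
    public static Kind Detect(Stream stream)
    {
        var header = new byte[8];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop until
        // the buffer is full or the stream ends.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0)
            {
                break;
            }
            read += n;
        }

        if (StartsWith(header, read, Ole2Magic)) return Kind.Ole2Binary;
        if (StartsWith(header, read, ZipMagic)) return Kind.ZipPackage;
        return Kind.Unknown;
    }

    private static bool StartsWith(byte[] buffer, int length, byte[] magic)
    {
        if (length < magic.Length)
        {
            return false;
        }
        for (int i = 0; i < magic.Length; i++)
        {
            if (buffer[i] != magic[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

In Main, the extension checks could then be replaced by opening the file once, calling OfficeFormatSniffer.Detect on the stream, and branching on the returned Kind before handing the path to the Open XML SDK or NPOI.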
References:
Reading Document Summary Information by NPOI: http://www.cnblogs.com/tonyqus/archive/2009/03/23/1419364.html