[Office Development Series] Reading the Document Summary Information of Office Documents (Legacy Office and Open XML Files)

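Most document management systems and enterprise search engines (Openfind, for example) index a special region of Office files that ordinary users rarely look at: the Document Summary Information area. It stores summary metadata about the document, such as the author and the creation time. Some projects (government contracts, for instance) require specific information, such as metadata for full-text search, to be written into Office documents, and in those cases we need to access the Document Summary Information.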

Reading this data from an Open XML file is simple: use the members of the PackageProperties property on OpenXmlPackage. For example:

// Dispose via using instead of calling Close() manually, so the file is released even if a read throws.
using (SpreadsheetDocument openXmlDoc = SpreadsheetDocument.Open(args[0], false))
{
    Console.WriteLine("Title: {0}", openXmlDoc.PackageProperties.Title);
    Console.WriteLine("Subject: {0}", openXmlDoc.PackageProperties.Subject);
    Console.WriteLine("Revision: {0}", openXmlDoc.PackageProperties.Revision);
    Console.WriteLine("Modified: {0}", openXmlDoc.PackageProperties.Modified);
    Console.WriteLine("LastModifiedBy: {0}", openXmlDoc.PackageProperties.LastModifiedBy);
    Console.WriteLine("Keywords: {0}", openXmlDoc.PackageProperties.Keywords);
    Console.WriteLine("Version: {0}", openXmlDoc.PackageProperties.Version);
}
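For context on where those values come from: an Open XML file is a ZIP package, and the core properties are stored in an XML part. The sketch below reads that part directly using only the .NET base class library. It assumes the part sits at the conventional docProps/core.xml path that Office writes; strictly speaking the package locates it through a relationship, which PackageProperties resolves for you. The class name CorePropertiesReader and its Read method are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class CorePropertiesReader
{
    // XML namespaces used by the core properties part.
    static readonly XNamespace Cp = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    static readonly XNamespace DcTerms = "http://purl.org/dc/terms/";

    // Reads the core properties from an Open XML package stream.
    // Assumption: the part is at docProps/core.xml. Returns an empty
    // dictionary when that entry does not exist.
    public static Dictionary<string, string> Read(Stream package)
    {
        var result = new Dictionary<string, string>();
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("docProps/core.xml");
            if (entry == null)
            {
                return result;
            }

            XDocument xml;
            using (Stream s = entry.Open())
            {
                xml = XDocument.Load(s);
            }

            XElement root = xml.Root;
            Add(result, "Title", root.Element(Dc + "title"));
            Add(result, "Subject", root.Element(Dc + "subject"));
            Add(result, "Revision", root.Element(Cp + "revision"));
            Add(result, "Modified", root.Element(DcTerms + "modified"));
            Add(result, "LastModifiedBy", root.Element(Cp + "lastModifiedBy"));
            Add(result, "Keywords", root.Element(Cp + "keywords"));
            Add(result, "Version", root.Element(Cp + "version"));
        }
        return result;
    }

    // Stores the element text only when the element is present.
    static void Add(Dictionary<string, string> target, string key, XElement element)
    {
        if (element != null)
        {
            target[key] = element.Value;
        }
    }
}
```

Calling CorePropertiesReader.Read(File.OpenRead(path)) on an .xlsx file should return the same fields the SDK snippet prints, as raw strings. For real code the SDK's PackageProperties is the better choice, since it follows the package relationships and parses the dates for you.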

With the Office 97-2003 binary formats, however, the same task is much harder, so we rely on NPOI to do it:

// The using block closes the file even if NPOI throws while parsing.
using (FileStream fs = new FileStream(args[0], FileMode.Open, FileAccess.Read, FileShare.Read))
{
    HSSFWorkbook excelDoc = new HSSFWorkbook(fs);
    // Holds Company, Category, Manager, etc.; retrieved for reference but not printed below.
    DocumentSummaryInformation docSummaryInfo = excelDoc.DocumentSummaryInformation;
    SummaryInformation summaryInfo = excelDoc.SummaryInformation;

    Console.WriteLine("Title: {0}", summaryInfo.Title);
    Console.WriteLine("Subject: {0}", summaryInfo.Subject);
    Console.WriteLine("Revision: {0}", summaryInfo.RevNumber);
    Console.WriteLine("Modified: {0}", summaryInfo.LastSaveDateTime);
    Console.WriteLine("LastModifiedBy: {0}", summaryInfo.LastAuthor);
    Console.WriteLine("Keywords: {0}", summaryInfo.Keywords);
    // OSVersion identifies the operating system that wrote the property set, not a document version.
    Console.WriteLine("Version: {0}", summaryInfo.OSVersion);
}

The Open XML SDK snippet and the NPOI snippet above produce the following output:

image
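The NPOI snippet retrieves DocumentSummaryInformation but only prints values from SummaryInformation. The following is a minimal sketch of how the second property set might be read, and of guarding against files in which either property set is missing (NPOI returns null in that case). The class name XlsSummaryDump and the helpers Line and Print are invented for this example; Author, CreateDateTime, Company, Category, and Manager are the NPOI property names as I understand them, so check them against the NPOI version you use.

```csharp
using System;
using System.IO;
using NPOI.HPSF;
using NPOI.HSSF.UserModel;

public static class XlsSummaryDump
{
    // Formats one labelled value; null or empty values print as "(not set)".
    public static string Line(string label, object value)
    {
        string text = value?.ToString() ?? "";
        return label + ": " + (text.Length == 0 ? "(not set)" : text);
    }

    // Prints a few fields from both property sets of an .xls file.
    public static void Print(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            var workbook = new HSSFWorkbook(fs);

            // Either property set can be null when the file has no such stream.
            SummaryInformation summary = workbook.SummaryInformation;
            if (summary != null)
            {
                Console.WriteLine(Line("Author", summary.Author));
                Console.WriteLine(Line("Created", summary.CreateDateTime));
            }

            DocumentSummaryInformation docSummary = workbook.DocumentSummaryInformation;
            if (docSummary != null)
            {
                Console.WriteLine(Line("Company", docSummary.Company));
                Console.WriteLine(Line("Category", docSummary.Category));
                Console.WriteLine(Line("Manager", docSummary.Manager));
            }
        }
    }
}
```

XlsSummaryDump.Print(args[0]) could stand in for the .xls branch when a file may lack summary data. The null checks matter mostly for files produced by tools other than Office, which do not always write both streams.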

The complete code is as follows:

using System;
using System.IO;
using NPOI.HPSF;                         // SummaryInformation, DocumentSummaryInformation
using NPOI.HSSF.UserModel;               // HSSFWorkbook
using DocumentFormat.OpenXml.Packaging;  // SpreadsheetDocument

namespace ReadExcelDocumentSummary
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                Console.WriteLine("Please specify an Excel spreadsheet file as the first argument.");
                return;
            }

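            // Compare case-insensitively so that names such as REPORT.XLSX are recognized too.
            if (string.Equals(Path.GetExtension(args[0]), ".xlsx", StringComparison.OrdinalIgnoreCase))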
            {
                // Dispose via using instead of calling Close() manually, so the file is released even if a read throws.
                using (SpreadsheetDocument openXmlDoc = SpreadsheetDocument.Open(args[0], false))
                {
                    Console.WriteLine("Title: {0}", openXmlDoc.PackageProperties.Title);
                    Console.WriteLine("Subject: {0}", openXmlDoc.PackageProperties.Subject);
                    Console.WriteLine("Revision: {0}", openXmlDoc.PackageProperties.Revision);
                    Console.WriteLine("Modified: {0}", openXmlDoc.PackageProperties.Modified);
                    Console.WriteLine("LastModifiedBy: {0}", openXmlDoc.PackageProperties.LastModifiedBy);
                    Console.WriteLine("Keywords: {0}", openXmlDoc.PackageProperties.Keywords);
                    Console.WriteLine("Version: {0}", openXmlDoc.PackageProperties.Version);
                }
            }
            else if (string.Equals(Path.GetExtension(args[0]), ".xls", StringComparison.OrdinalIgnoreCase))
            {
                // The using block closes the file even if NPOI throws while parsing.
                using (FileStream fs = new FileStream(args[0], FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    HSSFWorkbook excelDoc = new HSSFWorkbook(fs);
                    // Holds Company, Category, Manager, etc.; retrieved for reference but not printed below.
                    DocumentSummaryInformation docSummaryInfo = excelDoc.DocumentSummaryInformation;
                    SummaryInformation summaryInfo = excelDoc.SummaryInformation;

                    Console.WriteLine("Title: {0}", summaryInfo.Title);
                    Console.WriteLine("Subject: {0}", summaryInfo.Subject);
                    Console.WriteLine("Revision: {0}", summaryInfo.RevNumber);
                    Console.WriteLine("Modified: {0}", summaryInfo.LastSaveDateTime);
                    Console.WriteLine("LastModifiedBy: {0}", summaryInfo.LastAuthor);
                    Console.WriteLine("Keywords: {0}", summaryInfo.Keywords);
                    // OSVersion identifies the operating system that wrote the property set, not a document version.
                    Console.WriteLine("Version: {0}", summaryInfo.OSVersion);
                }
            }
            else
            {
                Console.WriteLine("Unknown file type.");
            }
        }
    }
}

 

References:

Reading Document Summary Information by NPOI: http://www.cnblogs.com/tonyqus/archive/2009/03/23/1419364.html