WP8 - SpeechRecognizerUI | Pou's IT Life - 點部落

2013-08-11

Windows Phone
2017-03-06

Windows Phone 8 – Speech Recognition - SpeechRecognizerUI

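Earlier posts on the Windows Phone 8 Speech API covered the Voice Commands and Text-to-Speech (TTS) features, with usage examples.

For details, see <voice command –1>, <voice command –2>, and <Windows Phone 8 – Text-to-Speech (TTS): letting your app read content aloud>.

This post introduces another API, Speech Recognition, as shown in the figure below:

The biggest differences between Voice Commands and Speech Recognition can be summarized as follows: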

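‧Voice Commands: launch the app through the Speech System to carry out a task;

‧Speech Recognition: supports speech recognition inside the app;

‧Text-to-Speech: reads specified content aloud, optionally with a chosen voice or language, or with a separately defined SSML document;

Each of the three has its own usage scenarios and characteristics. The rest of this post looks at how Speech Recognition works and at its key components.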

〉Speech Recognition：

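Windows Phone 8 supports Speech Recognition, which lets developers add speech recognition to their apps for a better user experience.

For an app to support speech recognition, several main elements are involved:

(1) the speech runtime;

(2) the Recognition APIs;

(3) grammars for speech recognition that support dictation;

(4) web search;

(5) a GUI that helps users discover and use the speech recognition features;

The biggest difference between it and Voice Commands:

‧speech recognition: used inside the app;

‧voice command: used from outside the app;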

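WP8 ships with predefined grammars for free-text dictation and web search, and also supports custom grammars written to the industry-standard

(Speech Recognition Grammar Specification (SRGS) Version 1.0).

You can also build your own GUI for speech recognition to give users visual feedback.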
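
As a rough illustration of the two kinds of grammar mentioned above, the sketch below adds either a predefined web search grammar or a custom SRGS grammar to the recognizer behind a SpeechRecognizerUI. The class name `GrammarSetup`, the grammar keys, and the file `Colors.grxml` are hypothetical names used only for this example; the SRGS file is assumed to be packaged with the app.

```csharp
using System;
using Windows.Phone.Speech.Recognition;

// Hypothetical helper for illustration only.
public static class GrammarSetup
{
    // Use the predefined web search grammar (recognition happens online).
    public static void UseWebSearchGrammar(SpeechRecognizerUI recoWithUI)
    {
        recoWithUI.Recognizer.Grammars.AddGrammarFromPredefinedType(
            "webSearch", SpeechPredefinedGrammar.WebSearch);
    }

    // Use a custom SRGS grammar. "Colors.grxml" is an assumed file name;
    // the file must exist in the app package for this to load.
    public static void UseSrgsGrammar(SpeechRecognizerUI recoWithUI)
    {
        Uri grammarUri = new Uri("ms-appx:///Colors.grxml", UriKind.Absolute);
        recoWithUI.Recognizer.Grammars.AddGrammarFromUri("colors", grammarUri);
    }
}
```

Each method is meant to be called on its own SpeechRecognizerUI instance before RecognizeWithUIAsync(); the sketch keeps predefined and custom grammars in separate grammar sets.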

Before discussing the Speech Recognition APIs and grammars, here is an example based on <Speech recognition for Windows Phone 8>

that uses the SpeechRecognizerUI class:

private async void ButtonSR_Click(object sender, RoutedEventArgs e)
{
  // Create an instance of SpeechRecognizerUI.
  this.recoWithUI = new SpeechRecognizerUI();

  // Start recognition (load the dictation grammar by default).
  SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

  // Do something with the recognition result.
  MessageBox.Show(string.Format("You said {0}.", recoResult.RecognitionResult.Text));
}

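The code above is short, so surely it just works? Not quite: if the speech language selected on the device is Chinese rather than English, this code throws an exception. Why?

By default, speech language files other than English are not downloaded, so those languages are not supported out of the box. See the description of

error code "0x800455BC" in <Handling errors in speech apps for Windows Phone>.

If you want to support a specific language, you can use SpeechRecognizer instead, or preload grammars that are already set up for it.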
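
One way to guard against this failure is sketched below: pick an installed recognizer for the wanted language when one exists, and catch the 0x800455BC error otherwise. The class name `LanguageAwareRecognition`, the method name, and the message texts are hypothetical. Whether a given language is installed depends on the device, so treat this as a sketch under those assumptions rather than a guaranteed fix.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Windows;
using Windows.Phone.Speech.Recognition;

// Hypothetical helper for illustration only.
public static class LanguageAwareRecognition
{
    // Error code cited in "Handling errors in speech apps for Windows Phone".
    private const int UnsupportedLanguageHResult = unchecked((int)0x800455BC);

    public static async Task RecognizeAsync(string languageTag)
    {
        var recoWithUI = new SpeechRecognizerUI();

        // Look for an installed recognizer that matches the requested language, e.g. "en-US".
        SpeechRecognizerInformation info = InstalledSpeechRecognizers.All
            .FirstOrDefault(r => string.Equals(r.Language, languageTag,
                                               StringComparison.OrdinalIgnoreCase));
        if (info != null)
        {
            recoWithUI.Recognizer.SetRecognizer(info);
        }

        try
        {
            SpeechRecognitionUIResult result = await recoWithUI.RecognizeWithUIAsync();
            if (result.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
            {
                MessageBox.Show(string.Format("You said {0}.", result.RecognitionResult.Text));
            }
        }
        catch (Exception ex)
        {
            // Only handle the unsupported-language case here; rethrow anything else.
            if (ex.HResult != UnsupportedLanguageHResult)
            {
                throw;
            }
            MessageBox.Show("Speech recognition is not available for the current language.");
        }
    }
}
```

A button handler could call `await LanguageAwareRecognition.RecognizeAsync("en-US");` to force the English recognizer regardless of the phone's speech language.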

The sections below describe the APIs.

[Note]

〉To use Speech Recognition, declare three capabilities in WMAppManifest.xml:

ID_CAP_SPEECH_RECOGNITION, ID_CAP_MICROPHONE, and ID_CAP_NETWORKING:

<Capabilities>
  <Capability Name="ID_CAP_NETWORKING" />
  <Capability Name="ID_CAP_MICROPHONE" />
  <Capability Name="ID_CAP_SPEECH_RECOGNITION" />
</Capabilities>

〉Windows.Phone.Speech.Recognition：

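This namespace defines the classes and enumerations for working with Speech Recognition. They are listed below:

Type	Name	Description
Class	InstalledSpeechRecognizers	Returns the speech recognizers that are installed and available on the phone.
	SemanticProperty	Provides the properties of a semantic item.
	SpeechAudioProblemOccurredEventArgs	Event arguments for the AudioProblemOccurred event.
	SpeechGrammar	A runtime object that references a speech recognition grammar.
	SpeechGrammarSet	Represents the collection of grammars associated with a SpeechRecognizer or SpeechRecognizerUI instance.
	SpeechRecognitionResult	The result of a speech recognition session.
	SpeechRecognitionResultDetail	Provides detailed information about the result of a speech recognition session run by the speech recognizer.
	SpeechRecognitionUIResult	The result of a speech recognition session run with SpeechRecognizerUI.
	SpeechRecognizer	Enables speech recognition with a custom GUI.
	SpeechRecognizerAudioCaptureStateChangedEventArgs	Event arguments for the AudioCaptureStateChanged event.
	SpeechRecognizerInformation	Contains information about a SpeechRecognizer. This class gives developers access to the properties of a speech recognizer, for example its language.
	SpeechRecognizerSettings	Adjusts the timeout settings of a SpeechRecognizer object.
	SpeechRecognizerUI	Enables speech recognition with the system's default GUI.
	SpeechRecognizerUISettings	Gets or sets the settings of a SpeechRecognizerUI object.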

Enumerations	SpeechGrammarProbability	An enumeration that indicates the weighted value of a grammar for speech recognition.
	SpeechPredefinedGrammar	Indicates the predefined grammar type.
	SpeechRecognitionAudioProblem	Represents the type of audio problem that occurred.
	SpeechRecognitionConfidence	Represents the confidence level that describes how accurately a spoken phrase was matched to a phrase in an active grammar.
	SpeechRecognitionUIStatus	Indicates the status of the speech recognition session that was initiated by the SpeechRecognizerUI object.
	SpeechRecognizerAudioCaptureState	An enumeration that contains all audio capture states.

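From the classes and enumerations listed above, you can see that a Speech Recognition session is carried out through two main APIs:

(1) SpeechRecognizer; (2) SpeechRecognizerUI;

The two differ somewhat in use. SpeechRecognizer is discussed in <Windows Phone 8 – Speech Recognition - SpeechRecognizer>;

the SpeechRecognizerUI type is described below: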

〉SpeechRecognizerUI：

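Performs speech recognition using the system's default Speech Recognizer GUI. Note that only one SpeechRecognizerUI instance can be active at a time.

In other words, if the app creates two SpeechRecognizerUI instances, the second has to wait until the first has finished its RecognizeWithUIAsync() call before it can be used.

It has two methods:

‧Close: releases or resets resources. If the app uses more than one SpeechRecognizerUI object, remember to call this method when you are done with each one;

‧RecognizeWithUIAsync: starts a speech recognition session. All SpeechRecognizerUI settings must be in place before this method is called;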

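Two properties matter most for SpeechRecognizerUI; how they are set determines how SpeechRecognizerUI behaves:

‧Recognizer:

Read-only. Gets the speech recognizer object associated with the SpeechRecognizerUI object. Use it to set grammars and change settings.

If you call this object's RecognizeAsync() directly, no GUI is shown.

‧Settings:

Read-only. Gets the settings of the speech recognizer GUI.

See <Presenting prompts, confirmations, and disambiguation choices for Windows Phone 8> for how to adjust them.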
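
The point about the Recognizer property can be illustrated with a short sketch: it configures the underlying SpeechRecognizer and calls RecognizeAsync() on it, so no system GUI appears. The class and method names are hypothetical, the phrase list is borrowed from the sample later in this post, and the 6-second initial-silence timeout is an arbitrary value chosen for the example.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Phone.Speech.Recognition;

// Hypothetical helper for illustration only.
public static class RecognizerWithoutGui
{
    public static async Task<string> ListenForCategoryAsync()
    {
        var recoWithUI = new SpeechRecognizerUI();

        // The Recognizer property is read-only, but the object it returns can be configured.
        SpeechRecognizer recognizer = recoWithUI.Recognizer;
        recognizer.Grammars.AddGrammarFromList(
            "categories", new[] { "geography", "movies", "food" });

        // Assumed value: how long to wait for the user to start speaking.
        recognizer.Settings.InitialSilenceTimeout = TimeSpan.FromSeconds(6);

        // Calling RecognizeAsync() on the recognizer bypasses the built-in GUI,
        // so the app has to provide its own visual feedback.
        SpeechRecognitionResult result = await recognizer.RecognizeAsync();
        return result == null ? null : result.Text;
    }
}
```

If the built-in screens are wanted, call `recoWithUI.RecognizeWithUIAsync()` instead after the same configuration.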

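SpeechRecognizerUI mainly helps users know what to say to the app and see what was recognized, which improves the recognition experience.

WP8 provides built-in speech recognition GUI screens that let developers set the expected content and give users an interface for input.

Based on <Presenting prompts, confirmations, and disambiguation choices for Windows Phone 8>,

the following describes the types of GUI screens and the properties that can be set: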

a. Screen types:

There are two cases, depending on how SpeechRecognizerUI is used:

(1) The speech recognizer uses the predefined dictation or web search grammar:

GUI sequence: 1. Listening screen;

2. Thinking screen (no text or prompt is shown);

3. Heard you say or error screen;

(2) The speech recognizer uses a custom grammar:

GUI sequence: 1. Listening screen;

2. Did you say screen (shown when what the user said matches more than one possible phrase, so the user can choose);

3. Heard you say or error screen;

See the screens below:

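a. Listening screen:

Shows text indicating that the system is waiting for the user's input. You can optionally provide text or example phrases as a hint. This screen appears when the app calls RecognizeWithUIAsync().

Two properties can be changed:

‧SpeechRecognizerUISettings.ListenText:

Sets the custom text shown on the GUI to tell the user what to say. Keep it short, because there is a two-line limit.

If this property is not set, the default is "Listening...", which follows the phone's language automatically, for example "正在聆聽…" in Taiwan;

so if you change this text, you need to localize it yourself.

‧SpeechRecognizerUISettings.ExampleText:

Sets example phrases or grammar items that the user can say, for example " 'blue', 'orange' " or " 'plain list', 'checklist', 'reminder' ".

Likewise, if this property is not set, the default is "Listening...".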

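b. Did you say screen:

If the word or phrase the user said produces several candidate recognition results, this screen appears so the user can pick the intended text.

The list can hold up to 20 items; if five or fewer items are shown, setting the ReadoutEnabled property moves on to the Heard you say screen automatically.

Also note that this screen does not appear when the Recognizer uses the dictation or web search grammars.

Property that can be changed:

‧SpeechRecognizerUISettings.ReadoutEnabled:

Defaults to true. The value controls whether the recognized result is read aloud automatically through the phone speaker

when the Heard you say screen is shown after a successful recognition.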

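c. Heard you say screen:

Shown when a call to RecognizeWithUIAsync() produces a successful recognition; the recognized text is displayed on the screen.

You can control whether this screen is shown when recognition completes, and whether to return to the app the user was in.

Properties that can be changed:

‧SpeechRecognizerUISettings.ReadoutEnabled:

Works together with the Did you say screen: when set to true, both screens are shown. Note that when this property is true,

TTS starts automatically after recognition completes and reads the recognized text aloud.

‧SpeechRecognizerUISettings.ShowConfirmation:

When set to false, the Heard you say screen is skipped after the user picks an item on the Did you say screen.

For example, with a grammar that contains the similar-sounding「喔」and「哦」, the Did you say screen is likely to appear when the user speaks;

after the user chooses an item, the Heard you say screen is not shown.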
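
The ShowConfirmation behaviour described above could be set up roughly as follows. The class name and the English homophones "flower" and "flour" are hypothetical stand-ins for the「喔」/「哦」example; the sketch only configures the object and leaves the recognition call to the caller.

```csharp
using Windows.Phone.Speech.Recognition;

// Hypothetical helper for illustration only.
public static class ConfirmationSetup
{
    public static SpeechRecognizerUI CreateWithoutConfirmation()
    {
        var recoWithUI = new SpeechRecognizerUI();

        // Similar-sounding phrases make the Did you say screen more likely to appear.
        recoWithUI.Recognizer.Grammars.AddGrammarFromList(
            "homophones", new[] { "flower", "flour" });

        // Skip the Heard you say screen once the user has picked an item.
        recoWithUI.Settings.ShowConfirmation = false;

        // Do not read the recognized text back through TTS.
        recoWithUI.Settings.ReadoutEnabled = false;

        return recoWithUI;
    }
}
```

The caller would then run `await recoWithUI.RecognizeWithUIAsync()` on the returned object; all settings have to be in place before that call.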

d. Error screen:

Shown when speech recognition fails.

With the key methods and properties of SpeechRecognizerUI covered, the code below puts them together:

private async void StartSpeechRecognitionUI()
{
    recoWithUI = new SpeechRecognizerUI();

    // Build a string array of phrases and add it to the recognizer's grammar set as a list grammar.
    string[] triviaCategories = { "geography", "movies", "food" };
    recoWithUI.Recognizer.Grammars.AddGrammarFromList("categories", triviaCategories);

    // Text shown on the Listening screen.
    recoWithUI.Settings.ListenText = "Select a trivia category";

    // Example phrases shown on the Listening screen as a hint.
    recoWithUI.Settings.ExampleText = @"Ex. 'geography', 'movies', 'food'";

    // Whether to read the recognized text back to the user (disabled here).
    recoWithUI.Settings.ReadoutEnabled = false;

    // Load the grammar set, start recognition, and await the result.
    SpeechRecognitionUIResult result = await recoWithUI.RecognizeWithUIAsync();

    // Only read the text when the session succeeded (for example, the user did not cancel).
    if (result.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
    {
        MessageBox.Show(string.Format("You said {0}.", result.RecognitionResult.Text));
    }
}
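
The SpeechRecognitionConfidence enumeration listed earlier can be used to decide how much to trust a result. The sketch below is one possible way to turn a SpeechRecognitionResult into a message; the class name `ConfidenceHelper` and the message texts are hypothetical, and the thresholds are a design choice for this example, not something the API prescribes.

```csharp
using Windows.Phone.Speech.Recognition;

// Hypothetical helper for illustration only.
public static class ConfidenceHelper
{
    // Builds a user-facing message based on how confident the recognizer was.
    public static string Describe(SpeechRecognitionResult recognition)
    {
        switch (recognition.TextConfidence)
        {
            case SpeechRecognitionConfidence.High:
            case SpeechRecognitionConfidence.Medium:
                // Confident enough to accept the text as-is.
                return string.Format("You said {0}.", recognition.Text);

            case SpeechRecognitionConfidence.Low:
                // Plausible match: ask the user to confirm.
                return string.Format("Did you mean {0}?", recognition.Text);

            default:
                // Rejected or anything else: treat as not understood.
                return "Sorry, I didn't catch that.";
        }
    }
}
```

In a handler like the one above, `MessageBox.Show(ConfidenceHelper.Describe(result.RecognitionResult));` would replace the fixed "You said" message once the session has succeeded.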