[Chinese Translation] Visual Studio's Roslyn Project: Exposing the C# and VB Compiler's Code Analysis #2


 

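By this point in the translation, I realized the text had become gibberish to me.

I do not understand the terminology at all; I never studied compilers or the related vocabulary as a student.

I did not study operating systems (OS) in much depth either, so I got completely stuck here. People with a computer engineering or computer science background are probably far more familiar with this area.

For about half a month I could not make any progress, because I simply could not follow the text.

Still, every bit that gets done counts. Perhaps someone more capable can take over, or translate (polish) it better.

Some of the more difficult terms I could not translate at all.

So I ran the text through translation software first and then polished the result.

I apologize in advance if the wording misses the meaning. Please treat the original English text as authoritative.

You are welcome to use this Chinese translation. When quoting it, please include the URL (hyperlink) of this post and credit the translator: MIS2000 Lab. Thank you.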

 

================================================================================
 
Roslyn Project: Exposing the C# and VB Compiler's Code Analysis

September 2012

Previous post: [Chinese Translation] Visual Studio's Roslyn Project: Exposing the C# and VB Compiler's Code Analysis #1

Download: Microsoft "Roslyn" CTP (Community Technology Preview): http://msdn.microsoft.com/en-us/vstudio/roslyn.aspx

 
 
3 Working with Syntax

The most fundamental data structure exposed by the Compiler APIs is the syntax tree. These trees represent the lexical and syntactic structure of source code. They serve two important purposes:

  1. To allow tools (such as an IDE, add-ins, code analysis tools, and refactorings) to see and process the syntactic structure of source code in a user's project.
  2. To enable tools (such as refactorings and an IDE) to create, modify, and rearrange source code in a natural manner without having to use direct text edits. By creating and manipulating trees, tools can easily create and rearrange source code.
 
3.1 Key Concepts
This section discusses the key concepts of the Syntax APIs.
 
3.1.1 Syntax Trees

Syntax trees are the primary structure used for compilation, code analysis, binding, refactoring, IDE features, and code generation. No part of the source code is understood without first being identified and categorized into one of many well-known structural language elements.

Syntax trees have three key attributes. The first attribute is that syntax trees hold all the source information in full fidelity. This means that the syntax tree contains every piece of information found in the source text: every grammatical construct, every lexical token, and everything else in between, including whitespace, comments, and preprocessor directives. For example, each literal mentioned in the source is represented exactly as it was typed. Syntax trees also represent errors in source code when the program is incomplete or malformed, by representing skipped or missing tokens in the syntax tree.

This enables the second attribute of syntax trees. A syntax tree obtained from the parser is completely round-trippable back to the text it was parsed from. From any syntax node, it is possible to get the text representation of the sub-tree rooted at that node. This means that syntax trees can be used as a way to construct and edit source text. By creating a tree you have by implication created the equivalent text, and by editing a syntax tree (that is, making a new tree out of changes to an existing tree) you have effectively edited the text.

The third attribute of syntax trees is that they are immutable and thread-safe. This means that after a tree is obtained, it is a snapshot of the current state of the code and never changes. This allows multiple users to interact with the same syntax tree at the same time in different threads without locking or duplication. Because the trees are immutable and no modifications can be made directly to a tree, factory methods help create and modify syntax trees by creating additional snapshots of the tree. The trees are efficient in the way they reuse underlying nodes, so a new version can be rebuilt quickly and with little extra memory.

A syntax tree is literally a tree data structure, where non-terminal structural elements parent other elements. Each syntax tree is made up of nodes, tokens, and trivia.
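
The three attributes can be seen in a few lines of code. The following is only a minimal sketch, and it assumes the later, released Microsoft.CodeAnalysis.CSharp NuGet package rather than the 2012 CTP this article describes; entry points such as CSharpSyntaxTree.ParseText, SyntaxFactory.Identifier, and ReplaceNode come from that released API and may not match the CTP names. The sample source string is made up for illustration.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Irregular spacing and a comment are deliberate: both must survive parsing.
string source = "class  C { /* note */ int x = 0x10 ; }";

SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
SyntaxNode root = tree.GetRoot();

// Full fidelity and round-tripping: the tree reproduces the text exactly.
bool roundTrips = root.ToFullString() == source;

// Immutability: an "edit" builds a new tree; the original snapshot is untouched.
ClassDeclarationSyntax oldClass =
    root.DescendantNodes().OfType<ClassDeclarationSyntax>().First();
SyntaxToken newName =
    SyntaxFactory.Identifier("D").WithTriviaFrom(oldClass.Identifier);
SyntaxNode newRoot = root.ReplaceNode(oldClass, oldClass.WithIdentifier(newName));

Console.WriteLine(roundTrips);
Console.WriteLine(root.ToFullString());     // still the original text
Console.WriteLine(newRoot.ToFullString());  // same text with the class renamed
```

Copying the trivia from the old identifier onto the new one is what keeps the spacing around the name intact; without it the new token would carry no whitespace of its own.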

 
3.1.2 Syntax Nodes

Syntax nodes are one of the primary elements of syntax trees. These nodes represent syntactic constructs such as declarations, statements, clauses, and expressions. Each category of syntax nodes is represented by a separate class derived from SyntaxNode. The set of node classes is not extensible.

All syntax nodes are non-terminal nodes in the syntax tree, which means they always have other nodes and tokens as children. As a child of another node, each node has a parent node that can be accessed through the Parent property. Because nodes and trees are immutable, the parent of a node never changes. The root of the tree has a null parent.

Each node has a ChildNodes method, which returns a list of child nodes in sequential order based on their position in the source text. This list does not contain tokens. Each node also has a collection of Descendant* methods (such as DescendantNodes, DescendantTokens, or DescendantTrivia) that represent a list of all the nodes, tokens, or trivia that exist in the sub-tree rooted at that node.

In addition, each syntax node subclass exposes all the same children through strongly typed properties. For example, a BinaryExpressionSyntax node class has three additional properties specific to binary operators: Left, OperatorToken, and Right. The type of Left and Right is ExpressionSyntax, and the type of OperatorToken is SyntaxToken.

Some syntax nodes have optional children. For example, an IfStatementSyntax has an optional ElseClauseSyntax. If the child is not present, the property returns null.
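
To make the navigation members concrete, here is a small sketch. It assumes the released Microsoft.CodeAnalysis.CSharp package, not the CTP; in that release the optional else clause is exposed as the Else property and IsKind is an extension method, which may differ from the CTP. The method M and its body are invented for the example.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

string source =
    "class C { int M(int a, int b) { if (a > b) return a + b * 2; return 0; } }";
SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

// The root of the tree has no parent.
bool rootHasNoParent = root.Parent == null;

// Find the addition "a + b * 2" among all descendant nodes.
BinaryExpressionSyntax sum = root.DescendantNodes()
    .OfType<BinaryExpressionSyntax>()
    .First(b => b.IsKind(SyntaxKind.AddExpression));

// Strongly typed children of a binary expression.
string left = sum.Left.ToString();
string op = sum.OperatorToken.Text;
string right = sum.Right.ToString();

// ChildNodes() yields nodes only; the operator token is not in that list.
int childNodes = sum.ChildNodes().Count();
int childNodesAndTokens = sum.ChildNodesAndTokens().Count;

// Optional child: this if statement has no else clause, so the property is null.
IfStatementSyntax ifStatement =
    root.DescendantNodes().OfType<IfStatementSyntax>().First();
bool hasElse = ifStatement.Else != null;

Console.WriteLine($"{left} | {op} | {right}");
Console.WriteLine($"{childNodes} child nodes, {childNodesAndTokens} nodes and tokens");
Console.WriteLine($"root has no parent: {rootHasNoParent}, has else: {hasElse}");
```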

 
3.1.3 Syntax Tokens

Syntax tokens are the terminals of the language grammar, representing the smallest syntactic fragments of the code. They are never parents of other nodes or tokens. Syntax tokens consist of keywords, identifiers, literals, and punctuation.

For efficiency purposes, the SyntaxToken type is a CLR value type. Therefore, unlike syntax nodes, there is only one structure for all kinds of tokens, with a mix of properties that have meaning depending on the kind of token being represented.

For example, an integer literal token represents a numeric value. In addition to the raw source text the token spans, the literal token has a Value property that tells you the exact decoded integer value. This property is typed as Object because it may be one of many primitive types.

The ValueText property tells you the same information as the Value property; however, this property is always typed as String. An identifier in C# source text may include Unicode escape characters, yet the syntax of the escape sequence itself is not considered part of the identifier name. So although the raw text spanned by the token does include the escape sequence, the ValueText property does not. Instead, it includes the Unicode characters identified by the escape.
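
The difference between a token's raw text and its decoded value can be sketched as follows. This assumes the released Microsoft.CodeAnalysis.CSharp package; the Text property used for the raw source text and the IsKind extension come from that release and are not taken from this article. The field name written with a Unicode escape is made up for the example.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// The source text seen by the parser is: class C { int \u0061bc = 0x10; }
string source = "class C { int \\u0061bc = 0x10; }";
SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

// The hexadecimal literal: raw text versus decoded value.
SyntaxToken literal = root.DescendantTokens()
    .First(t => t.IsKind(SyntaxKind.NumericLiteralToken));

// The field name, written with a Unicode escape for the letter 'a'.
SyntaxToken identifier = root.DescendantTokens()
    .Last(t => t.IsKind(SyntaxKind.IdentifierToken));

Console.WriteLine($"literal text:      {literal.Text}");
Console.WriteLine($"literal Value:     {literal.Value} ({literal.Value?.GetType().Name})");
Console.WriteLine($"literal ValueText: {literal.ValueText}");
Console.WriteLine($"identifier text:      {identifier.Text}");
Console.WriteLine($"identifier ValueText: {identifier.ValueText}");
```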

 
3.1.4 Syntax Trivia

Syntax trivia represent the parts of the source text that are largely insignificant for normal understanding of the code, such as whitespace, comments, and preprocessor directives.

Because trivia are not part of the normal language syntax and can appear anywhere between any two tokens, they are not included in the syntax tree as a child of a node. Yet, because they are important when implementing a feature like refactoring and for maintaining full fidelity with the source text, they do exist as part of the syntax tree.

You can access trivia by inspecting a token's LeadingTrivia or TrailingTrivia collections. When source text is parsed, sequences of trivia are associated with tokens. In general, a token owns any trivia after it on the same line up to the next token. Any trivia after that line is associated with the following token. The first token in the source file gets all the initial trivia, and the last sequence of trivia in the file is tacked onto the end-of-file token, which otherwise has zero width.

Unlike syntax nodes and tokens, syntax trivia do not have parents. Yet, because they are part of the tree and each is associated with a single token, you may access the token a piece of trivia is associated with by using its Token property.

Like syntax tokens, trivia are value types. The single SyntaxTrivia type is used to describe all kinds of trivia.
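
The ownership rules for trivia are easier to follow with a concrete file. The sketch below assumes the released Microsoft.CodeAnalysis.CSharp package (GetFirstToken, EndOfFileToken, and the IsKind extension are from that release); the two comments in the sample source are invented for illustration.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

string source = "// header\nclass C\n{\n    int x; // trailing\n}\n";
SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

// The first token in the file ("class") owns all of the initial trivia.
SyntaxToken firstToken = root.GetFirstToken();
SyntaxTrivia header = firstToken.LeadingTrivia.First();

// A comment after a token on the same line is trailing trivia of that token.
SyntaxToken semicolon = root.DescendantTokens()
    .First(t => t.IsKind(SyntaxKind.SemicolonToken));
SyntaxTrivia trailingComment = semicolon.TrailingTrivia
    .First(t => t.IsKind(SyntaxKind.SingleLineCommentTrivia));

// Trivia have no parent node, but each one knows the token it belongs to.
SyntaxToken owner = trailingComment.Token;

// The end-of-file token has zero width.
SyntaxToken endOfFile = ((CompilationUnitSyntax)root).EndOfFileToken;

Console.WriteLine($"first token: {firstToken.Text}, leading trivia: {header}");
Console.WriteLine($"trailing comment: {trailingComment}, owned by: {owner.Text}");
Console.WriteLine($"end-of-file width: {endOfFile.Span.Length}");
```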

 
3.1.5 Spans

Each node, token, or trivia knows its position within the source text and the number of characters it consists of. A text position is represented as a 32-bit integer, which is a zero-based Unicode character index. A TextSpan object is the beginning position and a count of characters, both represented as integers. If a TextSpan has a zero length, it refers to a location between two characters.

Each node has two TextSpan properties: Span and FullSpan.

The Span property is the text span from the start of the first token in the node's sub-tree to the end of the last token. This span does not include any leading or trailing trivia.

The FullSpan property is the text span that includes the node's normal span, plus the span of any leading or trailing trivia.

For example:

[Figure not preserved in this copy: a code block containing the statement throw new Exception("Not right."); with its span and full span underlined.]

The statement node inside the block has a span indicated by the purple underline. It includes the characters throw new Exception("Not right.");. The full span is indicated by the orange underline. It includes the same characters as the span and the characters associated with the leading and trailing trivia.
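
Since the figure did not survive, the same comparison can be reproduced in code. This is a sketch under the assumption that the released Microsoft.CodeAnalysis.CSharp package is used; the surrounding class, method, and comments are made up so that the statement has both leading and trailing trivia.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

string source =
    "class C { void M() {\n" +
    "    // why\n" +
    "    throw new Exception(\"Not right.\"); // bad\n" +
    "} }";
SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

ThrowStatementSyntax statement =
    root.DescendantNodes().OfType<ThrowStatementSyntax>().First();

// Span: first token to last token, no surrounding trivia.
TextSpan span = statement.Span;
// FullSpan: the span plus leading and trailing trivia.
TextSpan fullSpan = statement.FullSpan;

// Positions are zero-based character indexes into the source text.
string spanText = source.Substring(span.Start, span.Length);
string fullSpanText = source.Substring(fullSpan.Start, fullSpan.Length);

Console.WriteLine($"Span     {span}: [{spanText}]");
Console.WriteLine($"FullSpan {fullSpan}: [{fullSpanText}]");
```

Here the leading comment and indentation and the trailing comment all fall inside FullSpan but outside Span.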

 
3.1.6 Kinds

Each node, token, or trivia has a Kind property, of type SyntaxKind, that identifies the exact syntax element represented. Each language, C# or VB, has a single SyntaxKind enumeration that lists all the possible nodes, tokens, and trivia elements in the grammar.

The Kind property allows for easy disambiguation of syntax node types that share the same node class. For tokens and trivia, this property is the only way to distinguish one type of element from another.

For example, a single BinaryExpressionSyntax class has Left, OperatorToken, and Right as children. The Kind property distinguishes whether it is an AddExpression, SubtractExpression, or MultiplyExpression kind of syntax node.
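
A short sketch of this disambiguation follows. It assumes the released Microsoft.CodeAnalysis.CSharp package, where the kind is read through a Kind() extension method rather than the Kind property described above, and where SyntaxFactory.ParseExpression parses a standalone expression.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

string[] inputs = { "a + b", "a - b", "a * b" };
var nodeKinds = new List<SyntaxKind>();
var operatorKinds = new List<SyntaxKind>();

foreach (string input in inputs)
{
    // All three parse to the same node class...
    var expression = (BinaryExpressionSyntax)SyntaxFactory.ParseExpression(input);

    // ...and only the kind tells them apart.
    nodeKinds.Add(expression.Kind());
    operatorKinds.Add(expression.OperatorToken.Kind());

    Console.WriteLine(
        $"{input}: {expression.GetType().Name}, node kind {expression.Kind()}, " +
        $"operator kind {expression.OperatorToken.Kind()}");
}
```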

 
3.1.7 Errors

Even when the source text contains syntax errors, a full syntax tree that is round-trippable to the source is exposed. When the parser encounters code that does not conform to the defined syntax of the language, it uses one of two techniques to create a syntax tree.

First, if the parser expects a particular kind of token but does not find it, it may insert a missing token into the syntax tree at the location where the token was expected. A missing token represents the actual token that was expected, but it has an empty span, and its IsMissing property returns true.

Second, the parser may skip tokens until it finds one where it can continue parsing. In this case, the skipped tokens are attached as a trivia node with the kind SkippedTokens.
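
Both recovery techniques can be probed with deliberately broken input. The sketch below assumes the released Microsoft.CodeAnalysis.CSharp package, in which the skipped-token kind is named SyntaxKind.SkippedTokensTrivia. Which technique the parser picks for a given input is an implementation detail, so the second half only counts the skipped-token trivia it finds instead of assuming a result.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Technique 1: the semicolon after the field is absent.
string missingSemicolon = "class C { int x = 1 }";
SyntaxTree tree1 = CSharpSyntaxTree.ParseText(missingSemicolon);
FieldDeclarationSyntax field =
    tree1.GetRoot().DescendantNodes().OfType<FieldDeclarationSyntax>().First();

SyntaxToken semicolon = field.SemicolonToken;
bool isMissing = semicolon.IsMissing;   // placeholder for the expected token
int width = semicolon.Span.Length;      // a missing token has an empty span
bool hasErrors = tree1.GetDiagnostics().Any();
bool roundTrips1 = tree1.GetRoot().ToFullString() == missingSemicolon;

// Technique 2: a stray closing brace that the parser cannot place anywhere.
string strayBrace = "class C { } }";
SyntaxNode root2 = CSharpSyntaxTree.ParseText(strayBrace).GetRoot();
int skippedTriviaCount = root2.DescendantTrivia()
    .Count(t => t.IsKind(SyntaxKind.SkippedTokensTrivia));
bool roundTrips2 = root2.ToFullString() == strayBrace;

Console.WriteLine($"missing: {isMissing}, width: {width}, errors: {hasErrors}");
Console.WriteLine($"skipped-token trivia found: {skippedTriviaCount}");
Console.WriteLine($"round-trips: {roundTrips1}, {roundTrips2}");
```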

 

 

 

 

 

 

To be continued...

 

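P.S. If I am unable to finish this translation, I will attach the Word file I have prepared so far to this post so that someone more capable can complete it. Thank you.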

 

 

Update, 2014/5/15: Ray Linn, a helpful reader from mainland China, has completed the translation. See:

http://blogs.ejb.cc/archives/7604/dotnet-compile-platform-roslyn-overview

 

 

 

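He who receives an idea from me, receives instruction himself without lessening mine;

as he who lights his taper at mine, receives light without darkening me. ----Thomas Jefferson

Please email me rather than sending a private message: mis2000lab (at) yahoo.com.台灣 or school (at) mis2000lab.net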
