HtmlAgility, HTML
HtmlAgility: a handy package for parsing HTML
https://html-agility-pack.net/?z=codeplex
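In Visual Studio, install it through NuGet by searching for HtmlAgility (the package ID is HtmlAgilityPack). From the command line, dotnet add package HtmlAgilityPack does the same.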
To read the value of an element on a web page:
Example: get the value attribute of this element
<input type="hidden" name="dse_processorId" value="AKISDOHAERGVEVABBLFBDFDFEFE">
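using HtmlAgilityPack;
HtmlWeb webClient = new HtmlWeb();
HtmlDocument doc = webClient.Load("http://www.w3.org/");
// SelectNodes takes an XPath expression and returns null (not an empty collection) if nothing matches
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//input[@name='dse_processorId']");
// Read the value attribute of the first match, falling back to an empty string
string process = nodes?[0].GetAttributeValue("value", string.Empty) ?? string.Empty;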
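Because the snippet above downloads a live page, the following minimal sketch runs the same lookup offline: it feeds the input element from the example to LoadHtml and reads it with SelectSingleNode, which returns the first match or null. It assumes the HtmlAgilityPack NuGet package is referenced; the HiddenFieldExample class and GetHiddenValue method are hypothetical names chosen for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class HiddenFieldExample
{
    // Hypothetical helper: returns the value attribute of the <input> whose name
    // matches fieldName, or an empty string if no such element exists.
    // Assumes fieldName contains no single quote, since it is spliced into the XPath.
    public static string GetHiddenValue(string html, string fieldName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns the first matching node, or null when nothing matches
        HtmlNode node = doc.DocumentNode.SelectSingleNode($"//input[@name='{fieldName}']");
        return node?.GetAttributeValue("value", string.Empty) ?? string.Empty;
    }

    public static void Main()
    {
        // The element from the example above, wrapped in a minimal page
        const string html =
            "<html><body><input type=\"hidden\" name=\"dse_processorId\" value=\"AKISDOHAERGVEVABBLFBDFDFEFE\"></body></html>";
        Console.WriteLine(GetHiddenValue(html, "dse_processorId"));
    }
}
```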
The string passed to SelectNodes is an XPath expression (https://zh.wikipedia.org/wiki/XPath).
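The URL in the example is arbitrary; substitute the page you actually want to parse and adjust the XPath to match its markup.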
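For reference, a few XPath patterns that often come up with SelectNodes: matching an attribute value, restricting the search to descendants of a particular element, and partial matching with contains(). The sample markup and the XPathExamples class are made up for illustration, and the sketch assumes the HtmlAgilityPack package is referenced. Query wraps SelectNodes so that "no match" becomes an empty array instead of null.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class XPathExamples
{
    // Small made-up page used only to demonstrate the expressions below
    public const string SampleHtml = @"
<html><body>
  <div id=""login"">
    <input type=""hidden"" name=""token"" value=""abc"">
    <input type=""text"" name=""user"" value="""">
  </div>
  <a href=""/news/1"">First</a>
  <a href=""/about"">About</a>
</body></html>";

    // Runs an XPath query; SelectNodes yields null rather than an empty
    // collection when nothing matches, so normalise that to an empty array.
    public static HtmlNode[] Query(HtmlDocument doc, string xpath) =>
        doc.DocumentNode.SelectNodes(xpath)?.ToArray() ?? Array.Empty<HtmlNode>();

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        // Every hidden <input> anywhere in the document
        Console.WriteLine(Query(doc, "//input[@type='hidden']").Length);

        // <input> elements anywhere inside the element with id="login"
        Console.WriteLine(Query(doc, "//div[@id='login']//input").Length);

        // Links whose href contains "/news/"
        foreach (HtmlNode a in Query(doc, "//a[contains(@href, '/news/')]"))
            Console.WriteLine(a.InnerText);
    }
}
```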
If you fetch the HTML yourself, for example with a WebClient GET request, load the string with LoadHtml and parse it the same way:
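using System.Net;
using HtmlAgilityPack;
WebClient wc = new WebClient();
string htmlCode = wc.DownloadString("http://example.com");
HtmlDocument doc = new HtmlDocument();
// Parse the raw HTML; decode entities on extracted text instead (e.g. WebUtility.HtmlDecode(node.InnerText)),
// because decoding the whole page first can turn escaped text such as &lt;b&gt; into real tags
doc.LoadHtml(htmlCode);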
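On .NET 6 and later, WebClient is marked obsolete (warning SYSLIB0014) and HttpClient is the recommended replacement. The sketch below is one way to do the same GET-then-LoadHtml flow with HttpClient; the class and method names are hypothetical, and example.com is only a placeholder URL. With no command-line argument it parses a built-in sample string so it can run offline.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class HttpClientExample
{
    // One shared HttpClient is the usual pattern; creating a new client per
    // request can exhaust sockets under load.
    private static readonly HttpClient Http = new HttpClient();

    // Parses an HTML string that has already been downloaded
    public static HtmlDocument ParseHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Downloads a page with a GET request and parses it
    public static async Task<HtmlDocument> LoadFromUrlAsync(string url)
    {
        string html = await Http.GetStringAsync(url);
        return ParseHtml(html);
    }

    public static async Task Main(string[] args)
    {
        HtmlDocument doc;
        if (args.Length > 0)
            doc = await LoadFromUrlAsync(args[0]); // e.g. dotnet run -- http://example.com
        else
            doc = ParseHtml("<html><body><h1>Hello</h1></body></html>"); // offline sample

        HtmlNode heading = doc.DocumentNode.SelectSingleNode("//h1");
        Console.WriteLine(heading?.InnerText.Trim() ?? "(no <h1> found)");
    }
}
```

Keeping the download and the parse in separate methods also makes the parsing logic easy to try against saved HTML without touching the network.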
Addendum: parsing a table
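using System;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<html><body><p><table id=""foo""><tr><th>hello</th></tr><tr><td>world</td></tr></table></body></html>");
// SelectNodes returns null when nothing matches, so fall back to an empty sequence
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table") ?? Enumerable.Empty<HtmlNode>()) {
    Console.WriteLine("Found: " + table.Id);
    // "tr" matches only rows that are direct children of <table>, not rows inside <tbody>
    foreach (HtmlNode row in table.SelectNodes("tr") ?? Enumerable.Empty<HtmlNode>()) {
        Console.WriteLine("row");
        foreach (HtmlNode cell in row.SelectNodes("th|td") ?? Enumerable.Empty<HtmlNode>()) {
            Console.WriteLine("cell: " + cell.InnerText);
        }
    }
}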
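One caveat with the table loop: table.SelectNodes("tr") only matches rows that are direct children of the table element. Real pages often wrap rows in tbody or thead, and then SelectNodes returns null instead of an empty collection, so iterating over it directly throws. A minimal sketch of a more tolerant version follows. It searches for rows with the relative path .//tr and collects each table into a list of rows of cell text. ReadTables is a hypothetical helper name, and the sketch assumes tables are not nested, since .//tr would also pick up the rows of an inner table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableExample
{
    // Hypothetical helper: converts every <table> in the HTML into a list of rows,
    // each row being the trimmed text of its <th>/<td> cells.
    // Assumes tables are not nested (".//tr" would also match rows of an inner table).
    public static List<List<List<string>>> ReadTables(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<List<string>>>();

        // SelectNodes returns null when nothing matches, so fall back to an empty sequence
        IEnumerable<HtmlNode> tables =
            doc.DocumentNode.SelectNodes("//table") ?? Enumerable.Empty<HtmlNode>();

        foreach (HtmlNode table in tables)
        {
            var rows = new List<List<string>>();

            // ".//tr" finds rows at any depth, including inside <thead>/<tbody>/<tfoot>
            IEnumerable<HtmlNode> trs = table.SelectNodes(".//tr") ?? Enumerable.Empty<HtmlNode>();
            foreach (HtmlNode tr in trs)
            {
                // Walk the row's direct children in order, keeping header and data cells
                List<string> cells = tr.ChildNodes
                    .Where(n => n.Name == "th" || n.Name == "td")
                    .Select(n => n.InnerText.Trim())
                    .ToList();
                rows.Add(cells);
            }
            result.Add(rows);
        }
        return result;
    }

    public static void Main()
    {
        // Same content as the table example, but with the rows wrapped in <tbody>
        const string html =
            "<table id=\"foo\"><tbody><tr><th>hello</th></tr><tr><td>world</td></tr></tbody></table>";

        foreach (List<List<string>> table in ReadTables(html))
            foreach (List<string> row in table)
                Console.WriteLine(string.Join(" | ", row));
    }
}
```

Filtering row.ChildNodes by name selects the same cells as the th|td XPath; walking the children simply makes the column order explicit.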
Reference:
https://stackoverflow.com/questions/655603/html-agility-pack-parsing-tables