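I wanted to write a crawler-like program in C# that fetches a web page's HTML.
Most search results suggest using WebClient or HttpClient to download the page's HTML.
However, now that separating the front end from the back end is common, WebClient and HttpClient can only retrieve what the server sends back, that is, the server-side rendering (SSR) result.
They cannot retrieve content that is rendered in the browser by front-end JavaScript (React, Angular, Vue, and so on).
Puppeteer Sharp solves this problem by loading the page in a headless browser, so the page's scripts run before the HTML is read.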
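To make the limitation concrete, here is a minimal sketch of the HttpClient approach mentioned above. The class name RawHtmlFetcher is hypothetical and only serves the illustration. For a page built as a single-page application, the string it returns is typically just the initial shell (for example an empty root element plus script tags), because no JavaScript is executed.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class RawHtmlFetcher
{
    // Returns the response body exactly as the server sent it.
    // HttpClient does not run JavaScript, so client-side rendered
    // content will be missing from the returned string.
    public static async Task<string> GetAsync(HttpClient client, string url)
    {
        using var response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode(); // throws on non-2xx status codes
        return await response.Content.ReadAsStringAsync();
    }
}
```

A call such as `string html = await RawHtmlFetcher.GetAsync(new HttpClient(), url);` is enough to compare its output with what the browser's developer tools show for the same page.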
Package used: PuppeteerSharp (from NuGet)
Code example:
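using System;
using PuppeteerSharp;

string url = "https://example.com"; // placeholder: replace with the target URL
Uri uri = new Uri(url); // throws UriFormatException early if the URL is malformed
string content = string.Empty;
// Download a Chromium build on first run (reused if it is already present).
// Newer PuppeteerSharp releases may no longer expose DefaultRevision;
// there, DownloadAsync() is called without an argument.
await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
using (var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true // run the browser without a visible window
}))
{
    using (var page = await browser.NewPageAsync())
    {
        await page.GoToAsync(url);
        // The complete HTML, including what the page's JavaScript rendered
        content = await page.GetContentAsync();
    }
}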
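One thing the example above does not cover: GoToAsync(url) returns once the page has loaded, which may be before a front-end framework has finished fetching data and rendering. The following is a hedged sketch of one way to wait longer, using PuppeteerSharp's WaitUntilNavigation.Networkidle0 option and, optionally, WaitForSelectorAsync. The class name RenderedHtmlFetcher and the readySelector parameter are hypothetical, and the right selector depends entirely on the target page.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

// Hypothetical helper wrapping the steps from the example above.
public static class RenderedHtmlFetcher
{
    public static async Task<string> GetAsync(string url, string readySelector = null)
    {
        // Same download call as in the example above; on newer PuppeteerSharp
        // releases this may need to be DownloadAsync() with no argument.
        await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);

        using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        using var page = await browser.NewPageAsync();

        // Wait until there have been no network connections for a short while,
        // which usually means the front-end has finished its initial requests.
        await page.GoToAsync(url, WaitUntilNavigation.Networkidle0);

        // Optionally wait for an element that only exists after rendering.
        if (!string.IsNullOrEmpty(readySelector))
        {
            await page.WaitForSelectorAsync(readySelector);
        }

        return await page.GetContentAsync();
    }
}
```

Waiting for network idle is slower than the default and can time out on pages that keep a connection open, so waiting for a specific selector is often the more reliable signal when one is known.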
References:
https://blog.darkthread.net/blog/puppeteer-sharp/
https://www.kiltandcode.com/puppeteer-sharp-crawl-the-web-using-csharp-and-headless-chrome/