C#: Using Puppeteer Sharp to get a page's full front-end-rendered HTML, not just the Server-Side Rendering (SSR) result

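I wanted to write something like a crawler in C# that fetches a web page's HTML.

Most search results suggest fetching the HTML with WebClient or HttpClient.

But now that separating front end and back end is common, WebClient and HttpClient can only retrieve the Server-Side Rendering (SSR) result, that is, the markup exactly as the server sent it.

They cannot retrieve what is rendered in the browser by front-end JavaScript (React, Angular, Vue, and so on), because they never execute the page's scripts.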
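
For comparison, the HttpClient approach described above can be sketched roughly as follows. This is only a minimal illustration; the class name RawHtmlFetcher and the method name FetchAsync are hypothetical and chosen for this example.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class RawHtmlFetcher
{
    // A single shared HttpClient, reused across requests.
    private static readonly HttpClient Client = new HttpClient();

    // Returns the response body exactly as the server sent it.
    // No JavaScript is executed, so content rendered on the client is missing.
    public static async Task<string> FetchAsync(string url)
    {
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Calling it looks like: string html = await RawHtmlFetcher.FetchAsync("https://URL"); For a single-page application the returned string often contains little more than an empty mount element and some script tags, which is the limitation described above.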

Puppeteer Sharp solves this problem: it drives a headless browser that actually runs the page's JavaScript.

Package used: PuppeteerSharp (available on NuGet)

Code example

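using PuppeteerSharp;

string url = "https://URL"; // replace with the address of the page to fetch

string content = string.Empty;

// Download a compatible browser build if one is not already cached locally.
// Note: recent PuppeteerSharp releases take no revision argument here: DownloadAsync().
await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);

// "var" is used so the snippet compiles whether LaunchAsync returns Browser or IBrowser.
using (var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true
}))
{
    using (var page = await browser.NewPageAsync())
    {
        // Load the page in the headless browser so its scripts run.
        await page.GoToAsync(url);

        // The complete HTML after front-end rendering
        content = await page.GetContentAsync();
    }
}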
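
One caveat worth sketching: by default GoToAsync returns once the page's load event fires, which can happen before a front-end framework has finished fetching data and rendering. The variant below is a sketch under assumptions, not part of the original example. It waits for the network to go idle and, optionally, for a specific element to appear. The class name RenderedHtmlFetcher and the readySelector parameter are hypothetical; the selector must be chosen to match the target page.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

public static class RenderedHtmlFetcher
{
    // readySelector (hypothetical): a CSS selector for an element that only
    // exists once the front end has rendered, for example "#app .item".
    public static async Task<string> FetchAsync(string url, string readySelector = null)
    {
        // Assumes a compatible browser was already downloaded with BrowserFetcher.
        using (var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true }))
        using (var page = await browser.NewPageAsync())
        {
            // Networkidle0: treat navigation as finished only after there have
            // been no network connections for at least 500 ms.
            await page.GoToAsync(url, WaitUntilNavigation.Networkidle0);

            if (!string.IsNullOrEmpty(readySelector))
            {
                // Wait until the element that signals "rendering done" is in the DOM.
                await page.WaitForSelectorAsync(readySelector);
            }

            // Serialized DOM after the scripts have run.
            return await page.GetContentAsync();
        }
    }
}
```

Waiting for network idle is a general heuristic, while waiting for a known selector is more precise when the page structure is known in advance. Pages that keep a connection open (polling, WebSocket) may never go idle, in which case the selector wait alone is likely the better choice.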

References:

https://blog.darkthread.net/blog/puppeteer-sharp/

https://www.kiltandcode.com/puppeteer-sharp-crawl-the-web-using-csharp-and-headless-chrome/