Extracting HTML Data with PHP DOMXPath

hsuan-ming Yang
3 min read · Sep 13, 2020

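Today's topic: I have been building a content site and needed to take the data shown on a page of another website, tidy it up, and write it to a database.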

Example
<?php
# Use the visitor's User-Agent when there is one, otherwise fall back to a desktop Chrome string
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36";

# Fetch the page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.banpresto.jp/cn/kuji/9027060.html"); # content site
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); # return the body instead of printing it
curl_setopt($ch, CURLOPT_HEADER, 0);         # leave response headers out of the body
$html = curl_exec($ch);
curl_close($ch);
if ($html === false || $html === '') {
    exit("Failed to fetch the page\n");
}
//var_dump($html);

# Parse
$dom = new DOMDocument();
@$dom->loadHTML($html); # @ hides the warnings libxml raises on real-world markup
$xpath = new DOMXPath($dom);
# Get the images
$imgList = $xpath->query('//section[@class="product-content"]/div/figure/a/img');
# Get the captions
$captionList = $xpath->query('//section[@class="product-content"]/div/figure/a/figcaption/p');
# XPath expressions used above
// '//section[@class="product-content"]/div/figure/a/img'
// '//section[@class="product-content"]/div/figure/a/figcaption/p'

# Convert to an array
$srcList = [];
foreach ($imgList as $ix => $node) {
    $caption = $captionList->item($ix); # null when there is no caption at this index
    $srcList[] = [
        'caption' => $caption ? $caption->nodeValue : '',
        'img' => $node->getAttribute('src'),
    ];
}
print_r($srcList);
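
To try the same idea without touching the network, here is a minimal sketch that runs against an inline HTML fragment. The markup, image paths, and captions are invented for illustration and only mirror the structure the XPath expressions above assume. It also looks up each caption relative to its own figure, so a figure without a caption cannot shift the image/caption pairing.

```php
<?php
// Hypothetical markup mirroring the structure the XPath expressions expect.
// The second figure deliberately has no caption.
$html = <<<HTML
<html><body>
<section class="product-content">
  <div>
    <figure><a href="#"><img src="/img/a.jpg"><figcaption><p>Prize A</p></figcaption></a></figure>
    <figure><a href="#"><img src="/img/b.jpg"></a></figure>
    <figure><a href="#"><img src="/img/c.jpg"><figcaption><p>Prize C</p></figcaption></a></figure>
  </div>
</section>
</body></html>
HTML;

// Collect parser warnings (for example about HTML5 tags) instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$items = [];
foreach ($xpath->query('//section[@class="product-content"]//figure') as $figure) {
    // Passing $figure as the second argument makes the query relative to that node.
    $img     = $xpath->query('.//img', $figure)->item(0);
    $caption = $xpath->query('.//figcaption/p', $figure)->item(0);
    $items[] = [
        'caption' => $caption ? trim($caption->textContent) : null,
        'img'     => $img ? $img->getAttribute('src') : null,
    ];
}
print_r($items);
```

Whether a missing caption should become null, an empty string, or cause the row to be skipped depends on what the database column expects; null is only one possible choice.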

How to get the XPath

Use the Elements panel in Chrome DevTools: right-click the node you want and choose Copy > Copy XPath.
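
An XPath copied from the browser often needs adjusting before it is reused in PHP. The sketch below, on made-up markup, shows two things that may help: matching a class as a token, since `@class="product-content"` only matches when the attribute is exactly that string, and `DOMNode::getNodePath()`, which prints the path of a node PHP has already found so it can be compared with what Chrome reports.

```php
<?php
// Hypothetical markup: the section carries two classes.
$html = '<html><body><section class="product-content wide"><div><figure>'
      . '<a href="#"><img src="/img/a.jpg"></a></figure></div></section></body></html>';

libxml_use_internal_errors(true); // keep parser warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Exact comparison: finds nothing, because the attribute value is "product-content wide".
$exact = $xpath->query('//section[@class="product-content"]//img');

// Token match: pads the class list with spaces and looks for " product-content ".
$token = $xpath->query(
    '//section[contains(concat(" ", normalize-space(@class), " "), " product-content ")]//img'
);

echo "exact matches: ", $exact->length, "\n";
echo "token matches: ", $token->length, "\n";

// Print the path PHP sees for the matched node.
$path = $token->item(0)->getNodePath();
echo $path, "\n";
```

Chrome's DevTools console also accepts `$x('...')` to evaluate an XPath against the live page, which is a quick way to check an expression before putting it in PHP. Keep in mind that the browser's DOM can differ from what libxml builds from the raw HTML, for instance when scripts add elements after load.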

Done.
