Extracting HTML Data with PHP DOMXPath
Sep 13, 2020
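Today's topic: I've recently been building a content site and needed to take the data on another site's page, tidy it up, and write it into a database.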
$USERAGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $USERAGENT);
curl_setopt($ch, CURLOPT_URL, "https://www.banpresto.jp/cn/kuji/9027060.html"); // target content page
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response body as a string
curl_setopt($ch, CURLOPT_HEADER, false);        // leave response headers out of the body
// When running under a web server, forward the visitor's own User-Agent instead
if (isset($_SERVER['HTTP_USER_AGENT'])) {
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
}
$html = curl_exec($ch);
if ($html === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);
// var_dump($html); // debug: dump the raw HTML
// Parse the HTML; internal errors keep libxml quiet about tags it does not recognize
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
// Get the images
$imgList = $xpath->query('//section[@class="product-content"]/div/figure/a/img');
// Get the captions
$captionList = $xpath->query('//section[@class="product-content"]/div/figure/a/figcaption/p');
// Convert to an array, pairing each image with the caption at the same index
$srcList = [];
foreach ($imgList as $ix => $node) {
    $caption = $captionList->item($ix); // null if there are fewer captions than images
    $srcList[] = [
        'caption' => $caption ? trim($caption->nodeValue) : '',
        'img'     => $node->getAttribute('src'),
    ];
}
print_r($srcList);
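Pairing images and captions by index assumes every figure has exactly one caption; if one is missing, every later caption shifts onto the wrong image. Also, `@class="product-content"` only matches when the class attribute is exactly that string. A minimal sketch of an alternative, assuming the same figure/a/img and figure/a/figcaption/p structure that the post's XPath targets: select each `<figure>` once, then run relative queries using the context-node argument of `DOMXPath::query()`. The function name `extractFigures` and the sample HTML are hypothetical.

```php
<?php
// Sketch: pair each image with its caption by querying relative to each
// <figure>, instead of matching two separate node lists by index.
// extractFigures() and $sample are hypothetical illustrations.

function extractFigures(string $html): array
{
    libxml_use_internal_errors(true); // tolerate HTML5 tags libxml does not know
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Matches "product-content" even when the element has other classes too
    $figures = $xpath->query(
        '//section[contains(concat(" ", normalize-space(@class), " "), " product-content ")]/div/figure'
    );

    $result = [];
    foreach ($figures as $figure) {
        // Paths without a leading slash are evaluated from $figure
        $img = $xpath->query('a/img', $figure)->item(0);
        $caption = $xpath->query('a/figcaption/p', $figure)->item(0);
        if ($img === null) {
            continue; // skip figures that contain no image
        }
        $result[] = [
            'caption' => $caption ? trim($caption->textContent) : '',
            'img'     => $img->getAttribute('src'),
        ];
    }
    return $result;
}

// Hypothetical markup: the second figure has no caption
$sample = '<section class="product-content wide"><div>'
    . '<figure><a href="#"><img src="a.jpg"><figcaption><p>Prize A</p></figcaption></a></figure>'
    . '<figure><a href="#"><img src="b.jpg"></a></figure>'
    . '<figure><a href="#"><img src="c.jpg"><figcaption><p>Prize C</p></figcaption></a></figure>'
    . '</div></section>';

print_r(extractFigures($sample));
```

Had the captions been paired by index instead, `b.jpg` would have received "Prize C"; with relative queries it gets an empty caption and `c.jpg` keeps its own.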
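The script ends at `print_r($srcList)`, while the goal stated at the start is to write the data into a database. A minimal sketch of that step, assuming PDO with an in-memory SQLite database (the `pdo_sqlite` extension); the `products` table, its columns, and the `saveItems()` helper are hypothetical stand-ins for your own schema and DSN.

```php
<?php
// Sketch: store scraped rows with PDO prepared statements.
// Assumes pdo_sqlite; the products table and saveItems() are hypothetical.

function saveItems(PDO $pdo, array $items): int
{
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            caption TEXT NOT NULL,
            img TEXT NOT NULL
        )'
    );

    // Values are bound as parameters, never concatenated into the SQL string
    $stmt = $pdo->prepare('INSERT INTO products (caption, img) VALUES (:caption, :img)');

    $pdo->beginTransaction(); // commit the whole batch at once
    try {
        foreach ($items as $item) {
            $stmt->execute([':caption' => $item['caption'], ':img' => $item['img']]);
        }
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack(); // undo the partial batch, then re-throw
        throw $e;
    }
    return count($items);
}

// Throw PDOException on SQL errors (already the default since PHP 8.0)
$pdo = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

// Same shape as the $srcList built by the scraping script (values are made up)
$srcList = [
    ['caption' => 'Prize A', 'img' => 'a.jpg'],
    ['caption' => 'Prize B', 'img' => 'b.jpg'],
];
echo saveItems($pdo, $srcList), " rows inserted", PHP_EOL;

foreach ($pdo->query('SELECT caption, img FROM products ORDER BY id') as $row) {
    echo $row['caption'], ' => ', $row['img'], PHP_EOL;
}
```

Binding values through a prepared statement keeps captions that contain quotes from breaking the SQL, and a single transaction avoids a separate commit for every row.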
How to Get the XPath
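Browser developer tools can copy an element's XPath (in Chrome: right-click the element in the Elements panel, then Copy > Copy XPath), but the browser's live DOM can differ from the tree `DOMDocument::loadHTML()` builds from the raw cURL response, for example when JavaScript inserts elements after the page loads. One way to see paths in the tree PHP actually parsed is `DOMNode::getNodePath()`, which returns an XPath for any node. A minimal sketch; `imagePaths()` and the sample HTML are hypothetical.

```php
<?php
// Sketch: list the XPath that libxml reports for every <img> it parsed.
// imagePaths() and $sample are hypothetical illustrations.

function imagePaths(string $html): array
{
    libxml_use_internal_errors(true); // ignore warnings about HTML5 tags
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $paths = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        // Keyed by src so each path can be traced back to its image
        $paths[$img->getAttribute('src')] = $img->getNodePath();
    }
    return $paths;
}

$sample = '<section class="product-content"><div>'
    . '<figure><a href="#"><img src="a.jpg"></a></figure>'
    . '<figure><a href="#"><img src="b.jpg"></a></figure>'
    . '</div></section>';

print_r(imagePaths($sample));
```

The returned paths are positional: when sibling elements share a tag name they carry indexes such as `figure[1]`, so they break as soon as the page layout shifts. They work best as a starting point for a hand-written, attribute-based expression like the `@class="product-content"` one used above.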
Done.