Introduction to scraping with nokogiri
This post is about web scraping with nokogiri.
What is scraping?
Web scraping is a computer software technique for extracting information from websites.
From "Web scraping" on Wikipedia
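If you do any web programming, you will need this sooner or later. In short, when you want to pull a large amount of information from a website in one sweep, you write a program to automate it. Once you can do this, the internet becomes a lot more fun.
I used Tanoshii Ruby (たのしいRuby), 3rd edition, as a reference, since it explains the topic clearly.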
How to do it
For simple tasks, it feels much like basic DOM manipulation in jQuery.
What we'll use
Language
- Ruby
Libraries
- open-uri (part of Ruby's standard library, no install needed)
- nokogiri (gem)
This time I'll do the scraping in Ruby. I don't know whether it is easier than in PHP, but my guess is that Ruby is the easier of the two.
So, first run
gem install nokogiri
and then load both libraries at the top of your script:
require 'open-uri'
require 'nokogiri'
Let's extract something
Print everything
First, let's fetch the HTML.
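uri = "http://shgam.hatenadiary.jp"
# URI.open is provided by open-uri; a bare open(uri) no longer accepts URLs on Ruby 3.0 and later
doc = Nokogiri::HTML(URI.open(uri), nil, "utf-8")
puts doc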
This is all it takes to fetch the entire HTML and print it as-is.
You should see a long stream of HTML in your terminal. Fetching HTML yourself makes you feel like a browser, doesn't it?
Grab what you want with CSS selectors
doc.css("h1").each do |h1|
  puts h1
end
This prints every h1 element. But suppose you only want the article titles. In that case you can narrow the selector with a class name.
doc.css("h1.entry-title").each do |title|
  puts title
end
It's plain CSS. Anyone who can use jQuery should feel right at home.
By the way, you will probably want just the text inside the tag rather than the whole tag.
doc.css("h1.entry-title").each do |title|
  puts title.text
end
.text gets you that. Handy, and hardly in need of explanation. You will probably want the article links too.
You can select them by class name,
doc.css(".entry-title-link").each do |link|
  puts link["href"]
end
or you can describe them as "the link directly under the article title":
doc.css("h1.entry-title > a").each do |link|
  puts link["href"]
end
And that's about it.
Why I needed scraping
Since this spring I have been helping run a website for a certain sport, and one of the tasks is to compile the month's blog post titles, update dates, and links into an Excel file.
Going to the monthly archive page in a browser shows only a limited number of posts at a time, and copying and pasting them by hand is painful. Pulling the matching DOM nodes from Chrome's JavaScript console is still plenty tedious, because the page shows only so many posts and you have to move to the next page every time.
So my personal goal for scraping with nokogiri was to write a small Ruby script that fetches all of this from the terminal in one go.
The code itself is nothing special, but I am genuinely glad I took up programming, because it lets me think about problems this way. Next I would like to look into writing to Excel from Ruby and write that up.