[나도코딩 Web Scraping] Summary


Posted by 콘솔워크, 2021. 1. 17. 23:05

Summary of what I learned

To get more out of Selenium, see the site below.

selenium-python.readthedocs.io/


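What is XPath?

An XPath is a path expression that identifies an element in an HTML document.

With an element's XPath you can easily fetch exactly the element you want.

To get it, right-click the element in the browser's developer tools >> Copy >> Copy XPath.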
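
As a rough illustration of how an XPath selects an element, here is a minimal sketch that uses the standard library's `xml.etree.ElementTree` in place of a real browser. The HTML fragment, the ids, and the variable names are all hypothetical, and `ElementTree` supports only a subset of XPath, so this shows the idea rather than everything a copied XPath can express.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment used only for illustration.
HTML = """
<html>
  <body>
    <div id="header"><a href="/home">Home</a></div>
    <div id="content">
      <ul>
        <li class="item">first</li>
        <li class="item">second</li>
      </ul>
    </div>
  </body>
</html>
"""

root = ET.fromstring(HTML)  # root is the <html> element

# Positional path from the root, similar in spirit to what "Copy XPath" can produce.
second_item = root.find("./body/div[2]/ul/li[2]")

# Attribute-based path, usually more robust than positional indexes.
header_link = root.find(".//div[@id='header']/a")

print(second_item.text)
print(header_link.get("href"))
```

With Selenium, an XPath copied from the developer tools would typically be passed to `browser.find_element(By.XPATH, "...")`, with `By` imported from `selenium.webdriver.common.by`.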

Regular expressions

User-Agent

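Information sent along with a request that tells the server what environment the requesting client is running in (for example, which browser and operating system).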
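
The sketch below shows one way to attach a User-Agent header to a request. To stay free of third-party packages it uses the standard library's `urllib.request` instead of the `requests` library; the URL, the User-Agent string, and the `build_request` helper are hypothetical placeholders, and the real string should be copied from your own browser.

```python
import urllib.request

# Hypothetical values for illustration only.
URL = "https://example.com"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36"
)


def build_request(url, user_agent):
    """Return a Request object that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request(URL, USER_AGENT)

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))

# Sending the request needs network access, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     html = response.read().decode("utf-8")
```

With the `requests` library, the same dictionary would be passed as `requests.get(url, headers={"User-Agent": ...})`.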

Requests And Selenium

Selenium

Check your Chrome version, then download the driver that matches it from the site below.

☞ Check the Chrome version: chrome://version

☞ Download ChromeDriver: chromedriver.chromium.org/downloads


Function for waiting until the page has loaded
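
The idea behind waiting for a page to load is to poll a condition until it holds, instead of sleeping for a fixed time. Below is a minimal standard-library sketch of that idea. `wait_until` and `page_loaded` are hypothetical names, and the "page" is simulated by a counter, so this is a stand-in for a real browser wait and not Selenium's own implementation.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page that finishes loading on the third check.
checks = {"count": 0}


def page_loaded():
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(page_loaded, timeout=5.0, interval=0.1)
print(checks["count"])
```

In Selenium itself, the equivalent is usually written with `WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, "...")))`, where `EC` is `selenium.webdriver.support.expected_conditions`.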

Scrolling down

from bs4 import BeautifulSoup
from selenium import webdriver

import time
interval = 2  # scroll down once every 2 seconds

browser = webdriver.Chrome()
browser.maximize_window()
url = "https://play.google.com/store/movies/top"
browser.get(url)

# Get the current document height and store it
prev_height = browser.execute_script("return document.body.scrollHeight")

# Repeat until the page stops growing
while True:
    # Scroll to the bottom of the page
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")

    # Wait for the page to load
    time.sleep(interval)

    # Get the document height again; if it did not change, no more content was loaded
    curr_height = browser.execute_script("return document.body.scrollHeight")
    if curr_height == prev_height:
        break

    prev_height = curr_height

print("Scrolling complete")

BeautifulSoup

Downloading Google images

Web scraping: saving to Excel (.csv extension)
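
A minimal sketch of writing scraped rows to a CSV file that Excel can open, using the standard library's `csv` module. The column names, the rows, and the file name are hypothetical stand-ins for real scraped data, and the file goes to a temporary directory so the example leaves nothing behind in the working directory.

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for scraped data.
header = ["rank", "title", "price"]
rows = [
    [1, "영화 A", "3,000"],
    [2, "Movie B", "4,500"],
]

path = os.path.join(tempfile.mkdtemp(), "movies.csv")

# utf-8-sig writes a BOM so Excel detects UTF-8 and shows Korean text correctly;
# newline="" prevents blank lines between rows on Windows.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)

# Read the file back to confirm what was saved (every value comes back as a string).
with open(path, encoding="utf-8-sig", newline="") as f:
    saved = list(csv.reader(f))

print(saved)
```

Values that contain commas, such as "3,000", are quoted automatically by `csv.writer`, so they stay in a single cell when the file is opened in Excel.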

Scraping a page's HTML with Chrome without opening a browser window

☞ Headless Chrome source code: uipath.tistory.com/59


Cautions when using scraped data

Lecture link:

www.youtube.com/watch?v=yQ20jZwDjTE

License: Attribution, NonCommercial, NoDerivatives
