[나도코딩 (Nado Coding) Web Scraping] Naver Weather, Headline and IT News, Today's English Conversation: Final Source


콘솔워크 2021. 1. 18. 00:21

Compared with the previous version, this one adds a part that fetches the User-Agent through Selenium.

import re
import requests
from bs4 import BeautifulSoup
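from selenium import webdriver
from selenium.webdriver.common.by import By


def agent_text():
    """Return the User-Agent string that Chrome reports, read via Selenium."""
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run Chrome without a visible window
    options.add_argument("--window-size=1920,1080")
    # Headless Chrome may report "HeadlessChrome" in its User-Agent;
    # uncomment and fill in the line below to override it.
    # options.add_argument("user-agent=")

    browser = webdriver.Chrome(options=options)
    try:
        browser.maximize_window()
        url = "https://www.whatismybrowser.com/detect/what-is-my-user-agent"
        browser.get(url)
        # find_element_by_id was removed in Selenium 4; use By.ID instead
        detected_value = browser.find_element(By.ID, "detected_value").text
    finally:
        browser.quit()  # always close the browser, even if the lookup fails
    return detected_value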


def create_soup(url):
    headers = {"User-Agent": agent_text()}
    print(headers)
    res = requests.get(url, headers=headers)
    res.raise_for_status()
    soup = BeautifulSoup(res.text, "lxml")

    return soup


def print_news(index, title, link):
    print("{}. {}".format(index+1, title))
    print("  (Link : {})".format(link))


def scrape_weather():
    print("[Today's Weather]")
    # Search query in the URL below: "서울 날씨" (Seoul weather)
    url = "https://search.naver.com/search.naver?sm=top_hty&fbm=1&ie=utf8&query=%EC%84%9C%EC%9A%B8+%EB%82%A0%EC%94%A8"
    soup = create_soup(url)
    # Note: the class names below match Naver's markup as of this post (January 2021)
    # e.g. Cloudy, OO˚ higher than yesterday
    cast = soup.find("p", attrs={"class": "cast_txt"}).get_text()
    # Now OO℃  (low OO˚ / high OO˚)
    # "도씨" is the "degrees Celsius" suffix in the page text, so it stays in Korean
    curr_temp = soup.find(
        "p", attrs={"class": "info_temperature"}).get_text().replace("도씨", "")  # current temperature
    min_temp = soup.find("span", attrs={"class": "min"}).get_text()  # lowest temperature
    max_temp = soup.find("span", attrs={"class": "max"}).get_text()  # highest temperature
    # Morning chance of rain OO% / afternoon chance of rain OO%
    morning_rain_rate = soup.find(
        "span", attrs={"class": "point_time morning"}).get_text().strip()  # morning chance of rain
    afternoon_rain_rate = soup.find(
        "span", attrs={"class": "point_time afternoon"}).get_text().strip()  # afternoon chance of rain

    # Fine dust (PM10) OO㎍/㎥ Good
    # Ultrafine dust (PM2.5) OO㎍/㎥ Good
    dust = soup.find("dl", attrs={"class": "indicator"})
    dust_values = dust.find_all("dd")
    pm10 = dust_values[0].get_text()  # fine dust (PM10)
    pm25 = dust_values[1].get_text()  # ultrafine dust (PM2.5)

    # Output
    print(cast)
    print("Now {} (low {} / high {})".format(curr_temp, min_temp, max_temp))
    print("Morning {} / Afternoon {}".format(morning_rain_rate, afternoon_rain_rate))
    print()
    print("Fine dust (PM10) {}".format(pm10))
    print("Ultrafine dust (PM2.5) {}".format(pm25))
    print()


def scrape_headline_news():
    print("[Headline News]")
    url = "https://news.naver.com"
    soup = create_soup(url)
    news_list = soup.find(
        "ul", attrs={"class": "hdline_article_list"}).find_all("li", limit=3)
    for index, news in enumerate(news_list):
        title = news.find("a").get_text().strip()
        link = url + news.find("a")["href"]
        print_news(index, title, link)
    print()


def scrape_it_news():
    print("[IT News]")
    url = "https://news.naver.com/main/list.nhn?mode=LS2D&mid=shm&sid1=105&sid2=230"
    soup = create_soup(url)
    news_list = soup.find("ul", attrs={"class": "type06_headline"}).find_all(
        "li", limit=3)  # take only the first 3 items
    for index, news in enumerate(news_list):
        a_idx = 0
        img = news.find("img")
        if img:
            a_idx = 1  # if there is an img tag, use the a tag at index 1 instead

        a_tag = news.find_all("a")[a_idx]
        title = a_tag.get_text().strip()
        link = a_tag["href"]
        print_news(index, title, link)
    print()

# [Today's English Conversation]
# (English text)
# Jason : How do you think bla bla..?
# Kim : Well, I think ...

# (Korean text)
# Jason : What do you think about such-and-such?
# Kim : Well, I think such-and-such


def scrape_english():
    print("[Today's English Conversation]")
    url = "https://www.hackers.co.kr/?c=s_eng/eng_contents/I_others_english&keywd=haceng_submain_lnb_eng_I_others_english&logger_kw=haceng_submain_lnb_eng_I_others_english"
    soup = create_soup(url)
    sentences = soup.find_all("div", attrs={"id": re.compile("^conv_kor_t")})
    print("(English text)")
    # Assuming there are 8 sentences, this slice takes indices 4 to 7
    for sentence in sentences[len(sentences)//2:]:
        print(sentence.get_text().strip())

    print()
    print("(Korean text)")
    # Assuming there are 8 sentences, this slice takes indices 0 to 3
    for sentence in sentences[:len(sentences)//2]:
        print(sentence.get_text().strip())
    print()


if __name__ == "__main__":

    scrape_weather()  # fetch today's weather
    scrape_headline_news()  # fetch headline news
    scrape_it_news()  # fetch IT news
    scrape_english()  # fetch today's English conversation

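One caveat with reading the User-Agent from a headless browser: headless Chrome typically identifies itself as HeadlessChrome in that string. The commented-out user-agent= option in agent_text() is one place an override could go. Another option is to rewrite the detected string before it goes into the request headers. A minimal sketch, where normalize_user_agent is a hypothetical helper and the sample string is made up for illustration:

```python
def normalize_user_agent(user_agent):
    """Replace the headless marker so the string looks like regular Chrome."""
    return user_agent.replace("HeadlessChrome/", "Chrome/")


# Illustrative value only; a real string would come from agent_text().
sample = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/87.0.4280.141 Safari/537.36"
)
headers = {"User-Agent": normalize_user_agent(sample)}
print(headers["User-Agent"])
```

Strings without the headless marker pass through unchanged, so the helper would be safe to apply whether or not the browser ran headless.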
License: Attribution, NonCommercial, NoDerivatives
