Python: how to extract (and remove) only the comments from HTML code using beautifulsoup4

콘솔워크 2022. 4. 9. 10:32

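from bs4 import BeautifulSoup, Comment
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup("""1<!--The loneliest number-->
                        <a>2<!--Can be as bad as one--><b>3""", "html.parser")
# Find every text node that is a Comment
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
# extract() removes each comment from the tree and returns it
comments_tags = [comment.extract() for comment in comments]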

Running the code above pulls out only the comments contained in the given HTML.
Reference: https://code-examples.net/ko/q/358453
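
To make the result concrete, here is a minimal sketch that runs the same snippet and prints what ends up in the list and what is left in the tree. The "html.parser" argument and the names html, comment_texts and remaining_comments are my own choices for illustration, not part of the referenced answer.

```python
from bs4 import BeautifulSoup, Comment

html = """1<!--The loneliest number-->
<a>2<!--Can be as bad as one--><b>3"""

soup = BeautifulSoup(html, "html.parser")

# Collect every Comment node in the tree
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

# extract() detaches each comment from the tree and returns it;
# str() turns the Comment object into a plain string
comment_texts = [str(comment.extract()) for comment in comments]
print(comment_texts)  # ['The loneliest number', 'Can be as bad as one']

# After extraction no comment nodes remain in the tree
remaining_comments = soup.find_all(string=lambda text: isinstance(text, Comment))
print(len(remaining_comments))  # 0
```

Note that the returned strings hold only the comment body; the <!-- and --> delimiters are not included.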

If beautifulsoup4 is not installed yet, install it first.

pip install beautifulsoup4

Once the installation finishes, import it and use it.

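from bs4 import BeautifulSoup, Comment
from selenium.webdriver.common.by import By


def get_html_without_comment(self):
    driver = self.driver

    # Fetch the detail description and drop newlines and tabs
    p_detail_html = driver.find_element(By.CSS_SELECTOR, 'div.prpv_gosiwrap').get_attribute(
        "outerHTML").replace("\n", "").replace("\t", "")

    # Name the parser explicitly so the result does not depend on what is installed
    soup_p_detail_html = BeautifulSoup(p_detail_html, "html.parser")
    comments = soup_p_detail_html.find_all(
        string=lambda text: isinstance(text, Comment))

    for comment in comments:
        # extract() returns the comment body without the <!-- --> delimiters
        replace_text = comment.extract()
        # Remove the whole comment, delimiters included, so identical text
        # that appears outside a comment is left alone
        p_detail_html = p_detail_html.replace("<!--" + replace_text + "-->", "")
    # Remove any empty comments that are still left
    p_detail_html = p_detail_html.replace("<!---->", "")
    return p_detail_html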

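Building on that, if you fetch some HTML and want to strip every comment (markup such as <!-- ... -->) from it, you can do it as in the get_html_without_comment function above.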
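
When the HTML is already available as a string, the same idea can be tried without Selenium. The following is only a minimal sketch, assuming the built-in "html.parser"; strip_html_comments and the sample markup are hypothetical names made up for illustration. Unlike the string replacement above, it returns the re-serialized tree, so details such as attribute quoting may differ from the input.

```python
from bs4 import BeautifulSoup, Comment


def strip_html_comments(html: str) -> str:
    """Return html with every comment node removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list built up front, so removing nodes while looping is safe
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()
    return str(soup)


# Made-up sample input, reusing the class name from the selector above
sample = '<div class="prpv_gosiwrap"><!-- banner --><p>Detail<!--note--> text</p></div>'
cleaned = strip_html_comments(sample)
print(cleaned)
```

Letting the parser decide what counts as a comment avoids accidentally deleting text that merely looks like a comment body.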

Attribution-NonCommercial-NoDerivatives
