[백준/Baekjoon] 5076번: Web Pages

3 minute read

1. 문제

You probably know about HTML, the mark up language used to create Web pages. HTML code contains a number of tags which are expected to follow certain rules.

In this problem we will be concerned with two of these rules:

Every opening tag has to have a corresponding closing tag
All tags must be properly nested.

Tags are marked by angled brackets which contain a keyword, such as <body> or . These are opening tags, the corresponding closing tags having / before the keyword, ie </body> and . It is possible for a tag to be both opening and closing, such as , which complies with rule 1.

To be properly nested, if a tag is opened inside another tag, it must be closed before the other tag closes. For example

<body> </body> is properly nested <body> </body> is not, and breaks rule 2. If there are no tags present, the text complies with both rules.

Attributes may be present within an opening tag, such as

The closing tag has only to match the keyword, not the attributes.

2. 입력

Input will consist of a number of lines of HTML code, each line containing from 0 to 255 characters. The last line will contain a single # character – do not process this line. Within the text of each line will be zero or more tags. No angle bracket will be present unless it is part of a properly formed tag.

Determine whether or not the HTML meets the rules specified above.

3. 출력

Output will consist of a single line for each line of input. The line will contain either the word legal, or the word illegal.

4. 예제 입력 1

<body> Oops, this is</body> naughty 
<body> Hell< </body>
Just text, no tags.
 Oh dear, we are missing something.
<a href=”http://www.nzprogcontest.org.nz”>This is a link</a>
#

5. 예제 출력 1

illegal
legal
legal
illegal
legal

6. 풀이

좋은 단어 문제와 비슷한 방법으로 풀 수 있다.

다만 문자열 처리 과정에서 예외 처리를 해야 하는 내용이 많아 약간 복잡하다. 문자열 처리 방법은 다음과 같다.

기본적으로 문자열을 한 글자씩 읽어들인다. 문자열은 태그를 기준으로 구분한다. ‘<‘로 시작하면 태그 내에 들어가 있는 것이고, ‘>‘로 끝나면 태그가 끝난 것이다. 태그가 활성화된 동안 태그 내의 글자를 임시로 저장한다. 이를 코드로 나타내면 다음과 같다.

tag = False  # 태그를 구별함
temp = ''  # 태그 내 문자열을 임시로 저장
for i in range(len(data)):
    if data[i] == '<':  # 태그 시작
        tag = True
    elif data[i] == '>':  # 태그 끝
        tag = False

이 때, ‘ ‘과 같은 경우를 구분해야 하므로, 태그가 끝났을 때 태그 내의 값이 ‘/’로 끝나면 스택에 저장하지 않는다.

또한 ‘<a href=”http://www.nzprogcontest.org.nz”>‘와 같이 태그 내에 속성이 있는 경우를 구분해야 하므로, 태그 내에 띄어쓰기 다음에 ‘/’가 오지 않는 경우에는 뒤 내용을 저장하지 않는다. 즉, ‘ ‘과 같은 경우가 아니라면 속성을 제거한 내용만 스택에 저장한다. 이를 코드로 나타내면 다음과 같다.

# 띄어쓰기 다음에 '/'가 오지 않는 경우
if tag and data[i] == ' ' and data[i + 1] != '/':
    tag = False  # 속성은 저장하지 않음

이후 스택을 이용하여 ‘legal’과 ‘illegal’을 구분한다.

7. 코드

def check(data):
    s = []
    tag = False
    temp = ''
    for i in range(len(data)):
        if data[i] == '<':
            tag = True
        elif data[i] == '>':
            tag = False
            if temp and temp[-1] == '/':
                temp = ''
                continue
            elif s and s[-1] == temp[1:]:
                s.pop()
            else:
                s.append(temp)
            temp = ''

        if tag and data[i] == ' ' and data[i + 1] != '/':
            tag = False

        if tag and data[i] != '<':
            temp += data[i]

    if s:
        print("illegal")
    else:
        print("legal")


while True:
    data = input()
    if data == '#':
        break
    else:
        check(data)

Share on

Twitter Facebook LinkedIn

derekahndev

[백준/Baekjoon] 5076번: Web Pages

1. 문제

2. 입력

3. 출력

4. 예제 입력 1

5. 예제 출력 1

6. 풀이

7. 코드

Share on

Leave a comment

You may also enjoy

[백준/Baekjoon] 2696번: 중앙값 구하기

[백준/Baekjoon] 1655번: 가운데를 말해요

[처음 배우는 딥러닝 챗봇] Python을 이용한 딥러닝 챗봇 제작기 (7)

[처음 배우는 딥러닝 챗봇] Python을 이용한 딥러닝 챗봇 제작기 (6)