Feat: Staff scraping for the school homepage renewal, plus position added during scraping #223
Merged
48 commits
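cdab7ff fix: change the base URL of the pages scraped for staff (department homepages) (rlagkswn00)
48a182e remove: delete the Living Design and Communication Design parsers (unused) (rlagkswn00)
39d0b7c fix: update the department staff page parser (excluding the Real Estate department) (rlagkswn00)
7d1e6db fix: change the parsing logic for the Real Estate department (rlagkswn00)
8db54c6 feat: change the information required when scraping staff (rlagkswn00)
9c64b73 feat: update siteId and siteName for all departments (rlagkswn00)
0fd0dc8 feat: add a method to verify whether staff information is supported (rlagkswn00)
28a7557 feat: change getters following the addition of the siteId and siteName fields (rlagkswn00)
961b896 remove: delete classes left unused after consolidating the staff scrape API clients (rlagkswn00)
fd4025e feat: change the department staff scrape API client logic (rlagkswn00)
cca4bd4 feat: add position information during staff scraping (rlagkswn00)
c8c22e1 feat: update the StaffUpdater scrape logic (rlagkswn00)
a20e5de feat: apply distinct to remove duplicate staff entries between Pre-Veterinary Medicine and Veterinary Medicine (rlagkswn00)
5c915ee feat: add HTML files for tests (Computer Science and Engineering, Real Estate) and move legacy files (rlagkswn00)
f6bc03e test: write test code for the department staff scrape logic (rlagkswn00)
49ca89b fix: revert to the logic from before position was added to the entity (rlagkswn00)
3ae1d1a remove: delete the legacy StaffScraperTest (rlagkswn00)
38654a6 remove: remove unnecessary comments (rlagkswn00)
8e921da test: create the MockServerSupport object (rlagkswn00)
2a84b06 remove: remove an unnecessary DTO (rlagkswn00)
4a5f28a test: add the new StaffScraperTest (rlagkswn00)
708f4bf remove: delete unnecessary import statements (rlagkswn00)
9ac2010 feat: add the StaffDTO identifier() method (rlagkswn00)
2b36510 feat: scrape only Pre-Veterinary Medicine when scraping College of Veterinary Medicine staff (rlagkswn00)
30c6675 feat: extract a phone number utility class (rlagkswn00)
8ce7b42 feat: extract an email utility class (rlagkswn00)
3727f63 test: add tests for the email utility class (rlagkswn00)
ffd86bd test: add tests for the phone number utility class (rlagkswn00)
49c684b feat: add the position column to the Staff DB table (rlagkswn00)
eb9cded feat: add position to Staff & StaffDTO (rlagkswn00)
5d3a715 feat: add position comparison logic to StaffDTO (rlagkswn00)
666db2c feat: update the email validation policy (allow blank values) (rlagkswn00)
b607552 feat: add position when updating Staff (rlagkswn00)
980fe0f feat: enable staff scrape scheduling once a month (rlagkswn00)
143a5e5 feat: separate validation and conversion logic in EmailSupporter & PhoneNumberSupporter (rlagkswn00)
3049c30 feat: add tests for the EmailSupporter & PhoneNumberSupporter validation methods (rlagkswn00)
47b08f9 feat: validate and convert email and phone when creating a StaffDTO object (rlagkswn00)
05bdb5d refactor: remove unnecessary import statements (rlagkswn00)
c8f848e refactor: remove the public keyword from test classes and methods (rlagkswn00)
48b48c1 refactor: remove TODO keywords from comments and change replaceAll() -> replace() (rlagkswn00)
ec2fafb feat: change the StaffUpdate logic to account for position (rlagkswn00)
5cf1377 refactor: remove unnecessary parentheses in lambda expressions (rlagkswn00)
518ac33 feat: change the default value stored when there is no phone number ("-" -> "") (rlagkswn00)
bcadaba fix: update the Staff domain tests for the StaffUpdate logic change (identifier(), phone number) (rlagkswn00)
4216676 test: update test code for the StaffUpdate logic change (identifier(), phone number, position) (rlagkswn00)
90f5958 refactor: fix SonarQube issues (variable naming convention) (rlagkswn00)
3954e23 remove: remove unnecessary print statements (rlagkswn00)
5afe41f feat: change the staff scrape scheduling time to 1 minute (to be reverted to 30 minutes after testing) (rlagkswn00)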
68 changes: 68 additions & 0 deletions
src/main/java/com/kustacks/kuring/common/utils/converter/EmailSupporter.java
@@ -0,0 +1,68 @@
package com.kustacks.kuring.common.utils.converter;

import java.util.Arrays;
import java.util.regex.Pattern;

public class EmailSupporter {
    private static final Pattern AT_PATTERN = Pattern.compile("\\s+at\\s+");
    private static final Pattern DOT_PATTERN = Pattern.compile("\\s+dot\\s+");
    private static final Pattern EMAIL_PATTERN = Pattern.compile("^[a-zA-Z0-9_!#$%&'\\*+/=?{|}~^.-]+@[a-zA-Z0-9.-]+$");

    private static final String KONKUK_DOMAIN = "@konkuk.ac.kr";
    private static final String EMPTY_EMAIL = "";

    public static boolean isNullOrBlank(String email) {
        return email == null || email.isBlank();
    }

    public static String convertValidEmail(String email) {
        if (isNullOrBlank(email)) {
            return EMPTY_EMAIL;
        }

        String[] emailGroups = splitEmails(email);
        String[] normalizedEmails = normalizeEmails(emailGroups);

        // Among multiple emails, prefer the konkuk address; otherwise use the first entry
        return selectPreferredEmail(normalizedEmails);
    }

    private static String[] splitEmails(String email) {
        return email.split("[/,]");
    }

    private static String[] normalizeEmails(String[] emailGroups) {
        return Arrays.stream(emailGroups)
                .map(EmailSupporter::normalizeEmail)
                .toArray(String[]::new);
    }

    private static String normalizeEmail(String email) {
        if (EMAIL_PATTERN.matcher(email).matches()) {
            return email;
        }

        if (containsSubstitutePatterns(email)) {
            return replaceSubstitutePatterns(email);
        }

        return EMPTY_EMAIL;
    }

    private static String replaceSubstitutePatterns(String email) {
        return email.replaceAll(DOT_PATTERN.pattern(), ".")
                .replaceAll(AT_PATTERN.pattern(), "@");
    }

    private static boolean containsSubstitutePatterns(String email) {
        return DOT_PATTERN.matcher(email).find() && AT_PATTERN.matcher(email).find();
    }

    // Prefer the Konkuk domain
    private static String selectPreferredEmail(String[] emails) {
        return Arrays.stream(emails)
                .filter(email -> email.endsWith(KONKUK_DOMAIN))
                .findFirst()
                .orElseGet(() -> emails.length > 0 ? emails[0] : EMPTY_EMAIL);
    }
}
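For review purposes, it may help to trace a few inputs through the logic above. The following is a minimal standalone sketch that mirrors the same steps (split on `/` or `,`, normalize each candidate, prefer the Konkuk domain). The class name `EmailSupporterSketch` and every sample address are hypothetical and chosen only for illustration; this is not the merged class.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical standalone sketch mirroring the steps of EmailSupporter.
class EmailSupporterSketch {
    private static final Pattern AT = Pattern.compile("\\s+at\\s+");
    private static final Pattern DOT = Pattern.compile("\\s+dot\\s+");
    private static final Pattern EMAIL =
            Pattern.compile("^[a-zA-Z0-9_!#$%&'\\*+/=?{|}~^.-]+@[a-zA-Z0-9.-]+$");
    private static final String KONKUK_DOMAIN = "@konkuk.ac.kr";

    static String convertValidEmail(String email) {
        if (email == null || email.isBlank()) {
            return "";
        }
        // Split on '/' or ',' and normalize each candidate independently.
        String[] normalized = Arrays.stream(email.split("[/,]"))
                .map(EmailSupporterSketch::normalize)
                .toArray(String[]::new);
        // Prefer a Konkuk address; otherwise fall back to the first candidate.
        return Arrays.stream(normalized)
                .filter(e -> e.endsWith(KONKUK_DOMAIN))
                .findFirst()
                .orElseGet(() -> normalized.length > 0 ? normalized[0] : "");
    }

    private static String normalize(String email) {
        if (EMAIL.matcher(email).matches()) {
            return email;
        }
        // Obfuscated form such as "id at host dot tld": both markers must be present.
        if (DOT.matcher(email).find() && AT.matcher(email).find()) {
            String withDots = DOT.matcher(email).replaceAll(".");
            return AT.matcher(withDots).replaceAll("@");
        }
        return "";
    }

    public static void main(String[] args) {
        // Obfuscated address is rewritten: hong@konkuk.ac.kr
        System.out.println(convertValidEmail("hong at konkuk dot ac dot kr"));
        // Konkuk address wins over the first entry: hong@konkuk.ac.kr
        System.out.println(convertValidEmail("hong@gmail.com/hong@konkuk.ac.kr"));
        // The space after the comma makes the second entry invalid: hong@gmail.com
        System.out.println(convertValidEmail("hong@gmail.com, hong@konkuk.ac.kr"));
        // Nothing recognizable yields an empty string.
        System.out.println(convertValidEmail("no email").isEmpty());
    }
}
```

The third call suggests that whitespace around a separator is not trimmed before validation, so a Konkuk address written after `, ` would be dropped. Whether that is intended is not stated in this PR.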
42 changes: 42 additions & 0 deletions
src/main/java/com/kustacks/kuring/common/utils/converter/PhoneNumberSupporter.java
@@ -0,0 +1,42 @@
package com.kustacks.kuring.common.utils.converter;

import java.util.regex.Pattern;

public class PhoneNumberSupporter {

    private static final Pattern LAST_FOUR_NUMBER_PATTERN = Pattern.compile("\\d{4}");
    private static final Pattern FULL_NUMBER_PATTERN = Pattern.compile("02-\\d{3,4}-\\d{4}");
    private static final Pattern FULL_NUMBER_WITH_PARENTHESES_PATTERN = Pattern.compile("02[)]\\d{3,4}-\\d{4}");

    private static final String EMPTY_PHONE = "";

    public static boolean isNullOrBlank(String number) {
        return number == null || number.isBlank();
    }

    public static String convertFullExtensionNumber(String number) {
        if (isNullOrBlank(number)) {
            return EMPTY_PHONE;
        }

        if (FULL_NUMBER_PATTERN.matcher(number).matches()) {
            return number;
        }
        if (containsLastFourNumber(number)) {
            return "02-450-" + number;
        }
        if (containsParenthesesPattern(number)) {
            return number.replace(")", "-");
        }

        return EMPTY_PHONE;
    }

    private static boolean containsLastFourNumber(String number) {
        return LAST_FOUR_NUMBER_PATTERN.matcher(number).matches();
    }

    private static boolean containsParenthesesPattern(String number) {
        return FULL_NUMBER_WITH_PARENTHESES_PATTERN.matcher(number).matches();
    }
}
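The conversion rules above can be checked with a few sample inputs. This is a minimal standalone sketch of the same branch order (full number, four-digit extension, parenthesized area code, otherwise empty). The class name `PhoneNumberSupporterSketch` and the sample numbers are hypothetical; the patterns and the `02-450-` prefix are copied from the diff.

```java
import java.util.regex.Pattern;

// Hypothetical standalone sketch mirroring the branches of PhoneNumberSupporter.
class PhoneNumberSupporterSketch {
    private static final Pattern LAST_FOUR = Pattern.compile("\\d{4}");
    private static final Pattern FULL = Pattern.compile("02-\\d{3,4}-\\d{4}");
    private static final Pattern FULL_PAREN = Pattern.compile("02[)]\\d{3,4}-\\d{4}");

    static String convertFullExtensionNumber(String number) {
        if (number == null || number.isBlank()) {
            return "";
        }
        // Already in the target format: keep as is.
        if (FULL.matcher(number).matches()) {
            return number;
        }
        // Exactly four digits: treat as an extension and add the prefix.
        if (LAST_FOUR.matcher(number).matches()) {
            return "02-450-" + number;
        }
        // "02)450-1234" style: swap the parenthesis for a hyphen.
        if (FULL_PAREN.matcher(number).matches()) {
            return number.replace(")", "-");
        }
        // Anything else is treated as unusable.
        return "";
    }

    public static void main(String[] args) {
        System.out.println(convertFullExtensionNumber("02-450-1234")); // 02-450-1234
        System.out.println(convertFullExtensionNumber("1234"));        // 02-450-1234
        System.out.println(convertFullExtensionNumber("02)450-1234")); // 02-450-1234
        // matches() requires the whole string to fit, so extra text yields "".
        System.out.println(convertFullExtensionNumber("ext. 1234").isEmpty()); // true
    }
}
```

Because every check uses `matches()` rather than `find()`, an input is only converted when the entire string fits one of the three shapes, so the caller is expected to trim the scraped text first.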
37 changes: 0 additions & 37 deletions
.../com/kustacks/kuring/worker/parser/staff/LivingAndCommunicationDesignStaffHtmlParser.java
This file was deleted.
@@ -14,23 +14,20 @@ public class RealEstateStaffHtmlParser extends StaffHtmlParserTemplate {
     public boolean support(DeptInfo deptInfo) {
         return deptInfo instanceof RealEstateDept;
     }

     protected Elements selectStaffInfoRows(Document document) {
-        Element table = document.select(".sub0201_list").get(0).getElementsByTag("ul").get(0);
-        return table.getElementsByTag("li");
+        return document.select(".row");
     }

     protected String[] extractStaffInfoFromRow(Element row) {
-        Element content = row.select(".con").get(0);
-
-        String name = content.select("dl > dt > a > strong").get(0).text();
-        String major = String.valueOf(content.select("dl > dd").get(0).childNode(4)).replaceFirst("\\s", "").trim();
-
-        Element textMore = content.select(".text_more").get(0);
-
-        String lab = String.valueOf(textMore.childNode(4)).split(":")[1].replaceFirst("\\s", "").trim();
-        String phone = String.valueOf(textMore.childNode(6)).split(":")[1].replaceFirst("\\s", "").trim();
-        String email = textMore.getElementsByTag("a").get(0).text();
-        return new String[]{name, major, lab, phone, email};
+        String name = row.select(".info .title .name").text();
+
+        Elements detalTagElement = row.select(".detail");
+        String jobPosition = detalTagElement.select("dt:contains(직위) + dd").text();
+        String major = detalTagElement.select("dt:contains(연구분야) + dd").text().trim();
+        String lab = detalTagElement.select("dt:contains(연구실) + dd").text().trim();
+        String extensionNumber = detalTagElement.select("dt:contains(연락처) + dd").text().trim();
+        String email = detalTagElement.select("dt:contains(이메일) + dd").text().trim();
+        return new String[]{name, jobPosition, major, lab, extensionNumber, email};
     }
 }

Review comment on `String name = row.select(".info .title .name").text();`:
This function could also throw an NPE, so try
Are all of the HTML elements here always present?
It looks like some professors' entries may occasionally be missing certain information.
There seems to be room for a NullPointerException here.

I looked into the point you raised, and it seems Jsoup is built so that null values do not come back!
For example, when running
etailElement.select(".ico1 dd")
I tested both cases myself, and in each one the array ends up holding an empty string "" rather than null.
In practice, I confirmed that a department with no data (e.g. Pre-Veterinary Medicine) falls under case 1.
Just to be sure, I skimmed the select method in the Jsoup library, and it appears to return an empty Elements object when the requested element is not found.
Likewise, the text() method creates and uses an empty StringBuilder, so when there is no value it simply produces an empty string.
To be honest, I did wonder about this briefly, but it worked, so I left it as it was, haha...😂