D_ontory : Dev Story

[OCR] Comparing Korean Recognition in EasyOCR vs PaddleOCR (1 - PaddleOCR)

D_on 2024. 6. 4. 14:45

What is OCR?

To state the conclusion first: when I compared EasyOCR and PaddleOCR, PaddleOCR had the better recognition rate.

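Before getting into that, a brief explanation of OCR: an OCR system broadly uses two models.

1. Text Detection (finds where in the image text is located)
2. Text Recognition (identifies which characters appear in each detected region)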
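
The two stages above can be sketched as a small pipeline. This is only an illustration of the detect-then-recognize flow, not PaddleOCR's actual implementation: `run_ocr`, `OcrLine`, `toy_detect`, and `toy_recognize` are hypothetical names, and the boxes are simplified to axis-aligned rectangles.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Simplified axis-aligned box: (left, top, right, bottom). An assumption for this sketch.
Box = Tuple[int, int, int, int]


@dataclass
class OcrLine:
    box: Box   # where the text was found (stage 1 output)
    text: str  # what the text says (stage 2 output)


def run_ocr(image,
            detect: Callable[[object], List[Box]],
            recognize: Callable[[object, Box], str]) -> List[OcrLine]:
    """Stage 1 finds text regions; stage 2 reads the text inside each region."""
    lines = []
    for box in detect(image):
        lines.append(OcrLine(box=box, text=recognize(image, box)))
    return lines


# Toy stand-ins for real models (hypothetical, for illustration only).
def toy_detect(image):
    return [(0, 0, 50, 10), (0, 20, 80, 30)]


def toy_recognize(image, box):
    return f"text@{box[1]}"


lines = run_ocr(image=None, detect=toy_detect, recognize=toy_recognize)
for line in lines:
    print(line.box, line.text)
```

Keeping the two stages behind separate functions mirrors how either model could be swapped or run on its own.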

PaddleOCR experiment results

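PaddleOCR is an OCR library developed by Baidu's PaddlePaddle team in China.

Its models have been improved for accuracy and also made lightweight, so it is easy to test even in a CPU-only environment.

The test data is the book cover dataset provided by AIHUB.

It looks like it could also be used as training data later.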

https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/multi_languages_en.md

With a few small modifications, the code taken from that repository can be used to run a test right away.

from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

# Use lang="korean" for Korean, lang="en" for English
ocr = PaddleOCR(lang="korean")

# Image path
img_path = '/workspace/test_tiny/책표지_총류_000014.jpg'

result = ocr.ocr(img_path)

# Added: take the result for the first (and only) input image
result = result[0]

# Recognition and detection can be performed separately through parameter control
# result = ocr.ocr(img_path, det=False)  # only perform recognition
# result = ocr.ocr(img_path, rec=False)  # only perform detection

# Print the detection box and recognition result for each text line
for line in result:
    print(line)

# Visualization
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]      # detected box coordinates
txts = [line[1][0] for line in result]    # recognized text
scores = [line[1][1] for line in result]  # recognition confidence

im_show = draw_ocr(image, boxes, txts, scores, font_path='/workspace/PaddleOCR/doc/fonts/korean.ttf')
im_show = Image.fromarray(im_show)
im_show.save('/workspace/test_result.jpg')

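Running the repository's code unchanged raises

TypeError: '<' not supported between instances of 'tuple' and 'float'

so I added the line

result = result[0]

ocr.ocr() appears to return one result list per input image, so the per-line results for this image are in result[0].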
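
As a slightly more defensive variant of that fix, the unwrapping can be put in a small helper. This is a sketch under assumptions: `first_image_lines` is a hypothetical name, the nested-list shape is inferred from the code above, and the idea that an image with no detected text may come back as `None` is an assumption that may depend on the PaddleOCR version.

```python
def first_image_lines(result):
    """Return the per-line results for the first image, or [] if there are none.

    Assumes `result` is shaped like [[line, line, ...]] with one inner list
    per input image, and that an image without detected text may be None.
    """
    if not result or result[0] is None:
        return []
    return result[0]


# Sample data shaped like the output used in the article (values are made up).
sample = [[
    [[[10, 10], [100, 10], [100, 30], [10, 30]], ("책표지", 0.95)],
]]

lines = first_image_lines(sample)
print(len(lines))                  # number of text lines found in the first image
print(first_image_lines([None]))   # empty list instead of a crash in the loop
```

With this helper the later `for line in result:` loop simply runs zero times when nothing was detected.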

This makes it easy to visualize both which regions were detected as text and which characters were recognized inside each detected bbox.
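
Besides drawing the boxes, the same result structure can be used to pull out just the recognized text. The sketch below assumes each line has the shape `[box, (text, score)]`, as implied by the `line[0]`, `line[1][0]`, and `line[1][1]` accesses in the code above; `extract_text` and the `min_score` threshold are hypothetical and not part of PaddleOCR, and the sample values are made up.

```python
def extract_text(lines, min_score=0.5):
    """Keep the recognized text of every line whose confidence is at least min_score."""
    kept = []
    for box, (text, score) in lines:
        if score >= min_score:
            kept.append(text)
    return kept


# Made-up sample in the [box, (text, score)] shape.
sample = [
    [[[10, 10], [100, 10], [100, 30], [10, 30]], ("안녕하세요", 0.97)],
    [[[10, 40], [60, 40], [60, 60], [10, 60]], ("??", 0.21)],
]

texts = extract_text(sample, min_score=0.5)
print(texts)
```

Filtering on the score is one simple way to drop low-confidence lines before comparing the output of different OCR libraries.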
