Stefan  
#1 Posted : Tuesday, April 5, 2016 5:11:35 AM(UTC)
Stefan

Rank: Newbie

Groups: Registered
Joined: 4/4/2016(UTC)
Posts: 4
Germany
Location: Munich

My test Image:
UserPostedImage

My code:

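var api = OcrApi.Create();
api.Init();
// Automatic page segmentation with orientation and script detection (OSD)
api.PageSegmentationMode = PageSegMode.PSM_AUTO_OSD;
// Write the OCR result to a PDF; _tessData is the path to the language data folder
var renderer = OcrPdfRenderer.Create(OutputPdf, _tessData);
api.ProcessPages(InputPdf, null, 0, renderer);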


OCR Result:

EEE ...........................

The Problem:
When I rotate the image 90 degrees clockwise, OSD seems to work fine, but the OCR step seems to ignore the detected rotation, so my result is now:

mmm ...........................

As you can see, I get one line of text, but the characters are read as if they were rotated. What am I doing wrong?
Paul Rayman  
#2 Posted : Tuesday, April 5, 2016 6:16:30 AM(UTC)
Paul Rayman

Rank: Administration

Groups: Administrators
Joined: 1/5/2016(UTC)
Posts: 1,011

Thanks: 5 times
Was thanked: 121 time(s) in 118 post(s)
Did you download the language data for Orientation and Script Detection from the page below?
https://tesseract.patagames.com/langs/
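
One way to rule this out up front is to check that the OSD data file is actually present in the tessdata folder before starting OCR. This is only a minimal sketch: it assumes the file is named osd.traineddata (the usual Tesseract name; check it against the download page), and TessDataCheck and HasOsdData are hypothetical names, not part of the SDK.

```csharp
using System;
using System.IO;

static class TessDataCheck
{
    // Assumption: the OSD language data is stored as "osd.traineddata"
    // directly inside the tessdata directory passed to the OCR engine.
    public static bool HasOsdData(string tessDataDir)
    {
        return File.Exists(Path.Combine(tessDataDir, "osd.traineddata"));
    }
}
```

Calling TessDataCheck.HasOsdData(_tessData) before api.Init() and failing early with a clear message might be easier to diagnose than rotated output.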
Stefan  
#3 Posted : Tuesday, April 5, 2016 7:39:27 AM(UTC)

I had not, thanks. I recommend adding this package to the default set, or at least making it more visible in the language selection :)

Now I get the following result:

........................... mum.

That is probably because the dictionary kicks in. I'll try an actual text and report back if I find any problems.
Stefan  
#4 Posted : Tuesday, April 5, 2016 8:03:04 AM(UTC)

So, I ran some tests on a few documents, with the following results:

Document rotated by 0°:
Works very well, takes 5 sec.

Document rotated by 90°:
Works nearly as well as the unrotated one, also takes 5 sec.

Document rotated by 180°:
Produces complete rubbish. It tries to parse the document as is, which of course does not work.
This also takes 14 sec, so nearly 3 times as long.

Document rotated by 270°:
This does not return any result after 5 min, so I think it's stuck, or at least not usable.

So I guess I will have to implement my own preprocessing to determine the rotation of a document, then rotate it accordingly and feed the rotated bitmap to Tesseract?
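
The "rotate it accordingly" half of that idea could look roughly like the sketch below. It uses System.Drawing's Image.RotateFlip, which rotates clockwise. How the rotation is detected is left open: detectedClockwise is an assumed input (0, 90, 180 or 270, meaning how far the page content is turned clockwise from upright), and RotationFix, CorrectionFor and Upright are hypothetical names.

```csharp
using System;
using System.Drawing;

static class RotationFix
{
    // Maps a detected clockwise page rotation to the clockwise rotation
    // that undoes it (for example, a page turned 90 degrees clockwise
    // needs another 270 degrees clockwise to be upright again).
    public static RotateFlipType CorrectionFor(int detectedClockwise)
    {
        switch (((detectedClockwise % 360) + 360) % 360)
        {
            case 0: return RotateFlipType.RotateNoneFlipNone;
            case 90: return RotateFlipType.Rotate270FlipNone;
            case 180: return RotateFlipType.Rotate180FlipNone;
            case 270: return RotateFlipType.Rotate90FlipNone;
            default:
                throw new ArgumentOutOfRangeException(
                    nameof(detectedClockwise), "Expected a multiple of 90 degrees.");
        }
    }

    // Returns an upright copy of the page; the input bitmap is left untouched.
    public static Bitmap Upright(Bitmap page, int detectedClockwise)
    {
        var copy = (Bitmap)page.Clone();
        copy.RotateFlip(CorrectionFor(detectedClockwise));
        return copy;
    }
}
```

The upright copy would then be passed to the OCR step in place of the original page. Working on a copy keeps the source bitmap usable if detection turns out to be wrong and has to be retried.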
Paul Rayman  
#5 Posted : Tuesday, April 5, 2016 10:30:46 PM(UTC)
Please take a look at this post: http://forum.patagames.com/posts/m288-Is-there-a-way-to-detect-page-rotation#post288
