arun
#1 Posted: Tuesday, October 22, 2019 2:23:19 AM (UTC)

How can I get the font and color properties for each character, as well as for each word, in a PDF?
Paul Rayman
#2 Posted: Tuesday, October 22, 2019 10:30:59 PM (UTC)
Hi

PDF documents have the concept of text objects. Properties and attributes such as font and color belong to text objects; at this level, there is no distinction between words and characters. You can access text objects through the page.PageObjects collection.
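To illustrate what this means in practice: per-character and per-word styling has to be derived from the text objects themselves. The Python sketch below models text objects with a hypothetical `TextObject` class (not the Pdfium.Net API) and assumes words are separated by space characters in the extracted text; in real PDFs, word boundaries are often inferred from glyph positions instead. Each character inherits its object's font and color, and a word that spans two objects can carry two styles.

```python
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[int, int, int]  # RGB
Style = Tuple[str, Color]     # (font name, color)


# Hypothetical stand-in for a PDF text object; NOT the Pdfium.Net class.
@dataclass(frozen=True)
class TextObject:
    text: str
    font: str
    color: Color


def char_styles(objects: List[TextObject]) -> List[Tuple[str, Style]]:
    """Each character inherits the font and color of its text object."""
    return [(ch, (obj.font, obj.color)) for obj in objects for ch in obj.text]


def word_styles(objects: List[TextObject]) -> List[Tuple[str, List[Style]]]:
    """Group characters into space-separated words (assumption: spaces are
    present in the text). A word can span several text objects, so it may
    carry more than one style; styles are listed in first-seen order."""
    words: List[Tuple[str, List[Style]]] = []
    chars: List[str] = []
    styles: List[Style] = []
    for ch, style in char_styles(objects):
        if ch.isspace():
            if chars:
                words.append(("".join(chars), styles))
                chars, styles = [], []
            continue
        chars.append(ch)
        if style not in styles:
            styles.append(style)
    if chars:
        words.append(("".join(chars), styles))
    return words


if __name__ == "__main__":
    objs = [
        TextObject("Hello wo", "Helvetica", (0, 0, 0)),
        TextObject("rld!", "Helvetica-Bold", (255, 0, 0)),
    ]
    for word, styles in word_styles(objs):
        print(word, styles)
    # Hello [('Helvetica', (0, 0, 0))]
    # world! [('Helvetica', (0, 0, 0)), ('Helvetica-Bold', (255, 0, 0))]
```

Keeping a list of styles per word, rather than a single style, avoids silently picking one font when a word is split across objects.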

The hierarchy of page objects is shown here (in the upper-left corner of the diagram):
https://pdfium.patagames...um-Net-SDK-Reference.htm

Class diagram
arun
#3 Posted: Wednesday, October 23, 2019 1:42:48 AM (UTC)

Hi Paul, thank you very much for your reply.

My requirement is to extract the text within a bounding rectangle, along with the properties of the corresponding PdfTextObject objects.

Currently I'm using your methods "GetBoundedTextInfo" and "AnalyzeCharBox" (which I found on this forum); they return a list of FS_RECTF and the text.

After that, I check whether each FS_RECTF overlaps any of the PdfTextObject objects on the current page.

Could you suggest a better way to get the text objects within a bounding rectangle?
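One way to avoid testing every character box: compare each text object's bounding box against the selection rectangle once. The Python sketch below uses hypothetical `Rect` and `TextObject` types; `Rect` borrows FS_RECTF's field names (left, top, right, bottom) and assumes PDF user space, where y grows upward, so top >= bottom. It also assumes you can obtain a bounding box for each text object from the SDK, which is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    # Field names mirror FS_RECTF; assumes y grows upward (top >= bottom).
    left: float
    top: float
    right: float
    bottom: float


def intersects(a: Rect, b: Rect) -> bool:
    """Axis-aligned overlap test; rectangles that only touch count as overlapping."""
    return (a.left <= b.right and b.left <= a.right
            and a.bottom <= b.top and b.bottom <= a.top)


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer.left <= inner.left and inner.right <= outer.right
            and outer.bottom <= inner.bottom and inner.top <= outer.top)


# Hypothetical stand-in for a text object with a known bounding box;
# NOT the Pdfium.Net PdfTextObject class.
@dataclass(frozen=True)
class TextObject:
    text: str
    font: str
    bounds: Rect


def text_objects_in_rect(objects: List[TextObject], selection: Rect,
                         fully_inside: bool = False) -> List[TextObject]:
    """Return objects overlapping (or, if fully_inside, contained in) the selection."""
    if fully_inside:
        return [o for o in objects if contains(selection, o.bounds)]
    return [o for o in objects if intersects(selection, o.bounds)]


if __name__ == "__main__":
    sel = Rect(0, 100, 200, 0)
    objs = [
        TextObject("Inside", "F1", Rect(10, 50, 60, 40)),
        TextObject("Partly", "F1", Rect(180, 50, 260, 40)),
        TextObject("Outside", "F1", Rect(300, 50, 400, 40)),
    ]
    print([o.text for o in text_objects_in_rect(objs, sel)])        # ['Inside', 'Partly']
    print([o.text for o in text_objects_in_rect(objs, sel, True)])  # ['Inside']
```

An object's bounding box is coarser than its character boxes, so an object that only partly overlaps the selection may contain text outside it. For those objects, a per-character check like the one described above is still needed; the object-level test just narrows down which objects need it.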
