arun  
#1 Posted : Tuesday, October 22, 2019 2:23:19 AM(UTC)

Rank: Newbie

Groups: Registered
Joined: 10/12/2019(UTC)
Posts: 3
Location: India

How can I get the font and color properties for each character, as well as for each word, in a PDF?
Paul Rayman  
#2 Posted : Tuesday, October 22, 2019 10:30:59 PM(UTC)

Rank: Administration

Groups: Administrators
Joined: 1/5/2016(UTC)
Posts: 866

Thanks: 3 times
Was thanked: 103 time(s) in 101 post(s)
Hi,

In PDF documents, text is stored as text objects. Properties and attributes such as font or color belong to the text object as a whole, so at this level there is no distinction between words and characters. You can access text objects through the page.PageObjects collection.
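To illustrate the idea, here is a minimal, library-independent sketch in Python. It does not use the SDK: `TextObject`, `char_properties`, and `word_properties` are hypothetical names invented for this example. It assumes that each character simply takes the font and color of the text object that contains it, and that words are separated by whitespace.

```python
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[int, int, int]  # RGB


@dataclass(frozen=True)
class TextObject:
    """Hypothetical stand-in for a PDF text object (not the SDK's class)."""
    text: str
    font: str
    color: Color


def char_properties(objects: List[TextObject]) -> List[Tuple[str, str, Color]]:
    """Give every character the font and color of its containing text object."""
    return [(ch, obj.font, obj.color) for obj in objects for ch in obj.text]


def word_properties(objects: List[TextObject]) -> List[Tuple[str, List[Tuple[str, Color]]]]:
    """Split on whitespace; a word that spans several text objects
    collects every distinct (font, color) pair it touches."""
    words = []
    current, styles = "", []
    for ch, font, color in char_properties(objects):
        if ch.isspace():
            if current:
                words.append((current, styles))
            current, styles = "", []
        else:
            current += ch
            if (font, color) not in styles:
                styles.append((font, color))
    if current:
        words.append((current, styles))
    return words
```

Note that a single word may carry more than one font or color when it is split across text objects, which is why the sketch returns a list of styles per word rather than a single value.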

The hierarchy of page objects is here (upper-left corner):
https://pdfium.patagames...um-Net-SDK-Reference.htm

Class diagram
arun  
#3 Posted : Wednesday, October 23, 2019 1:42:48 AM(UTC)

Rank: Newbie

Groups: Registered
Joined: 10/12/2019(UTC)
Posts: 3
Location: India

Hi Paul, thank you very much for your reply.

My requirement is to extract the text inside a bounding rectangle, along with the properties of the PDF text objects it belongs to.

Currently I am using your methods "GetBoundedTextInfo" and "AnalyzeCharBox" (found on this forum), which return a list of FS_RECTF values and the text.

After that, I check whether each FS_RECTF overlaps any of the PDF text objects on the current page.

Is there a better way to get the PDF text objects that fall within a bounding rectangle?
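For reference, one possible simplification is to test each text object's bounding box once against the target rectangle, rather than comparing every character rectangle with every text object. The sketch below is plain Python, not SDK code. `Rect` is a hypothetical type that mirrors the four FS_RECTF fields (left, top, right, bottom), and `objects_in_bound` is a hypothetical helper. It assumes each text object exposes a bounding box in page coordinates, where the origin is at the bottom-left and top is greater than bottom.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in PDF page coordinates (top > bottom).
    Hypothetical type with the same four fields as FS_RECTF."""
    left: float
    top: float
    right: float
    bottom: float

    def intersects(self, other: "Rect") -> bool:
        # Strict comparisons: rectangles that only touch at an edge do not count.
        return (self.left < other.right and other.left < self.right
                and self.bottom < other.top and other.bottom < self.top)


def objects_in_bound(objects: Iterable[Tuple[str, Rect]], bound: Rect) -> List[Tuple[str, Rect]]:
    """Return the (name, box) pairs whose bounding box overlaps `bound`."""
    return [obj for obj in objects if obj[1].intersects(bound)]
```

A text object that only partly overlaps the rectangle is still returned by this filter, so a per-character check would remain necessary if only the characters strictly inside the rectangle are wanted.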
