ju1989  
#1 Posted : Thursday, November 7, 2019 2:13:47 AM(UTC)

Hi,

Is there a way to read the logical structure of a PDF document from .NET code?
For example, if a PDF document is generated by Microsoft Word, the logical structure contains information about headings, links, headers, footers, and so on.

Thanks and best regards
Julian
Paul Rayman  
#2 Posted : Thursday, November 14, 2019 11:04:18 AM(UTC)

Hi,

If this information is contained in the PDF file, then it is possible to get it with the SDK, although I have not seen such a feature in the PDF specification and I don't know where it would be stored.
In any case, you can access all the contents of the PDF. Start your investigation at Document.Root.
In addition, the qpdf utility is very helpful: if you run it with the following command line, it writes a copy of the PDF document whose contents are readable as text, which makes the file much easier to investigate.

qpdf.exe --stream-data=uncompress --normalize-content=y --object-streams=disable %1 %1_decoded.pdf

You can download the qpdf utility here:
http://qpdf.sourceforge.net
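
As a rough way to check whether a decoded file carries logical structure at all, the text output of the qpdf command can be scanned with a short script. Tagged PDFs record their logical structure in a structure tree that the document catalog references through a /StructTreeRoot entry (ISO 32000-1, section 14.7), and each node of that tree names its role (H1, P, Link and so on) in an /S entry. The Python sketch below is not part of the SDK discussed in this thread. The function name summarize_structure is made up for illustration, and the script assumes that every structure element carries the optional /Type /StructElem entry, so treat it as an inspection aid, not a PDF parser.

```python
import re
from collections import Counter

# One indirect object in the decoded file: "12 0 obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# The catalog entry pointing at the structure tree: "/StructTreeRoot 2 0 R".
ROOT_RE = re.compile(rb"/StructTreeRoot\s+(\d+)\s+(\d+)\s+R")
# Assumption: structure elements include the optional /Type /StructElem entry.
ELEM_RE = re.compile(rb"/Type\s*/StructElem\b")
# The structure type of an element, e.g. "/S /H1".
TAG_RE = re.compile(rb"/S\s*/([A-Za-z0-9_#.\-]+)")


def summarize_structure(pdf_bytes):
    """Return (object number of the structure tree root or None, tag counts)."""
    root = ROOT_RE.search(pdf_bytes)
    tags = Counter()
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(3)
        if ELEM_RE.search(body):
            tag = TAG_RE.search(body)
            if tag:
                tags[tag.group(1).decode("ascii")] += 1
    root_obj = int(root.group(1)) if root else None
    return root_obj, tags
```

To try it, read the decoded file in binary mode and pass the bytes to summarize_structure. If the first value is None, the file most likely has no structure tree, and the structure information would have to be reconstructed from the page content instead.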