#1 Posted : Thursday, November 7, 2019 2:13:47 AM(UTC)

Rank: Member

Groups: Registered
Joined: 6/1/2016(UTC)
Posts: 25
Location: Hessen


Is there a way to get the logical structure of a PDF document with .NET code?
For example, if a PDF document is generated by Microsoft Word, the logical structure contains information about headings, links, headers, footers, and so on.

Thanks and best regards
Paul Rayman  
#2 Posted : Thursday, November 14, 2019 11:04:18 AM(UTC)
Paul Rayman

Rank: Administration

Groups: Administrators
Joined: 1/5/2016(UTC)
Posts: 856

Thanks: 2 times
Was thanked: 103 time(s) in 101 post(s)

If this information is contained in the PDF file, then it is possible to get it with the SDK, although I have not seen such things in the PDF specification and I don't know where they would be stored.
In any case, you can access all the contents of the PDF. Start your research with Document.Root.
In addition, the qpdf utility is very helpful: if you run it with the following command line, it writes a copy of the PDF whose contents are readable as text, which will greatly facilitate your investigation.

qpdf.exe --stream-data=uncompress --normalize-content=y --object-streams=disable %1 %1_decoded.pdf

You can download the qpdf utility here
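
As a possible next step once you have the decoded file: as far as I know, the PDF specification (ISO 32000-1, section 14.7 "Logical Structure") keeps this information in a structure tree that the document catalog references through the /StructTreeRoot key, and each node of that tree stores its structure type (for example /H1 or /P) under /S. The Python sketch below scans the decoded output for those entries. It is only a rough text scan, not a PDF parser and not part of the SDK; the helper name `summarize_structure` is made up for this example, and it assumes the writer emitted the optional /Type /StructElem entry on every structure element.

```python
import re
from collections import Counter

# Matches the structure type entry of a structure element, e.g. "/S /H1".
_STRUCT_TYPE = re.compile(rb"/S\s*/([^\s/<>\[\]()]+)")


def summarize_structure(pdf_bytes: bytes) -> dict:
    """Report whether a qpdf-decoded PDF references a structure tree and
    which structure types its elements use.

    Hypothetical helper for illustration only; not an SDK or qpdf API.
    """
    element_types = Counter()
    # Indirect objects end with the "endobj" keyword, so splitting on it
    # yields roughly one chunk per object in the uncompressed file.
    for chunk in pdf_bytes.split(b"endobj"):
        # Assumption: structure elements carry the optional
        # /Type /StructElem entry; elements without it are skipped.
        if b"/StructElem" not in chunk:
            continue
        match = _STRUCT_TYPE.search(chunk)
        if match:
            element_types[match.group(1).decode("latin-1")] += 1
    return {
        # The document catalog points at the structure tree with this key.
        "has_struct_tree_root": b"/StructTreeRoot" in pdf_bytes,
        "element_types": element_types,
    }
```

To try it, read the file produced by the qpdf command in binary mode (for example `open("input.pdf_decoded.pdf", "rb").read()`, where the file name is just a placeholder) and pass the bytes to `summarize_structure`. If `has_struct_tree_root` comes back false, the document most likely was not exported as a tagged PDF, and there is no logical structure to read.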