lxman  
#1 Posted : Tuesday, May 7, 2019 12:43:44 PM(UTC)
lxman

Rank: Newbie

Groups: Registered
Joined: 10/19/2017(UTC)
Posts: 7
United States
Location: North Carolina

Hello,

I'm trying to change the fonts in a pdf (particularly the ones used in the form fields to display text that the end-user enters). I am using the following code:

Code:
using Patagames.Pdf.Enums;
using Patagames.Pdf.Net;
using Patagames.Pdf.Net.BasicTypes;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

namespace EmbeddedFonts
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            PdfCommon.Initialize();
            using (PdfDocument doc = PdfDocument.Load(@"document.pdf", new PdfForms()))
            {
                List<PdfField> fields = doc.FormFill.InterForm.Fields.ToList();
                doc.Pages.ToList().ForEach(p =>
                {
                    List<PdfControl> ctls = doc.FormFill.InterForm.GetPageControls(p).ToList();
                    ctls.ForEach(c =>
                    {
                        if (c.Dictionary.Keys.Contains("DA"))
                        {
                            PdfTypeString font = c.Dictionary["DA"] as PdfTypeString;
                            string s = font.AnsiString.Substring(1);
                            if (s.Contains("CourierNew")) c.Dictionary["DA"] = PdfTypeString.Create(s.Replace("CourierNew", "Courier"));
                            Console.WriteLine(s);
                        }
                    });
                    p.GenerateContent();
                });
                doc.Save(@"new.pdf", Patagames.Pdf.Enums.SaveFlags.NoIncremental);
            }
        }
    }
}


At runtime I can look at the dictionary entries via the debugger and see that they are, in fact, changing.

The difficulty is that once I open the saved PDF, all of my fields now have a font of Microsoft Sans Serif.

Any idea what I could be missing here?
Paul Rayman  
#2 Posted : Friday, May 10, 2019 7:41:16 AM(UTC)
Paul Rayman

Rank: Administration

Groups: Administrators
Joined: 1/5/2016(UTC)
Posts: 789

Thanks: 1 times
Was thanked: 98 time(s) in 96 post(s)
Hi,

Can you provide your PDF document?
I'll see what happens there.

Meanwhile, I assume that the AP or DS entries take effect here, or else the DA entry becomes broken.
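One way the DA entry could end up broken, judging only from the code in post #1: `Substring(1)` drops the first character of the string, and the shortened string is what gets written back. A default appearance string normally looks something like `/CourierNew 10 Tf 0 g`, so if the first character is the `/` of the font name, the rewritten value no longer names a font at all. The sketch below is plain string handling, not part of the SDK; the class and method names (`DaString`, `ReplaceFont`) are hypothetical, and the sample DA value is an assumption about what the document contains.

```csharp
using System;
using System.Text.RegularExpressions;

static class DaString
{
    // Replaces the font resource name in a default appearance (DA) string,
    // e.g. "/CourierNew 10 Tf 0 g", and leaves every other character intact.
    public static string ReplaceFont(string da, string oldName, string newName)
    {
        // Match "/oldName" only when it is a complete name token, so that
        // "/CourierNewBold" is not touched when oldName is "CourierNew".
        string pattern = "/" + Regex.Escape(oldName) + @"(?=\s|$)";
        // A MatchEvaluator avoids "$" in newName being read as a substitution.
        return Regex.Replace(da, pattern, m => "/" + newName);
    }
}
```

In the code from post #1 this would be applied to `font.AnsiString` itself rather than to `font.AnsiString.Substring(1)`, with the result passed to `PdfTypeString.Create` as before. Note also that the PDF specification requires the font name in DA to match a font resource name in the form's default resources (DR), so a name such as `Courier` may only work if a resource with that name exists in the document.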
lxman  
#3 Posted : Friday, May 10, 2019 4:29:11 PM(UTC)

Hmm, thank you for the reply, but rather than tackle an individual PDF, I think I need to learn more about the structure of said documents. In that vein, I offer the following:

Code:

using Patagames.Pdf.Net;
using Patagames.Pdf.Net.BasicTypes;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

namespace DictionaryExplore
{
    public partial class Form1 : Form
    {
        private readonly List<PdfTypeDictionary> dicts = new List<PdfTypeDictionary>();
        PdfDocument currDoc;

        public Form1()
        {
            InitializeComponent();
            PdfCommon.Initialize();
        }

        private void ParseDictionary(PdfTypeDictionary dict, TreeNode tn)
        {
            bool contains = false;
            dicts.ForEach(d =>
            {
                if (CompareDicts(dict, d)) contains = true;
            });
            if (contains) return;
            dicts.Add(dict);
            dict.ToList().ForEach(kvp => ParseObject(kvp.Value, tn.Nodes.Add(kvp.Key)));
        }

        private void ParseObject(PdfTypeBase item, TreeNode tn)
        {
            switch (item.ObjectType)
            {
                case Patagames.Pdf.Enums.IndirectObjectTypes.Invalid:
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Boolean:
                    tn.Nodes.Add("Boolean").Nodes.Add(item.As<PdfTypeBoolean>().Value.ToString());
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Number:
                    tn.Nodes.Add("Number").Nodes.Add(item.As<PdfTypeNumber>().FloatValue.ToString());
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.String:
                    tn.Nodes.Add("String").Nodes.Add(item.As<PdfTypeString>().AnsiString);
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Name:
                    tn.Nodes.Add("Name").Nodes.Add(item.As<PdfTypeName>().Value);
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Array:
                    TreeNode new_tn = tn.Nodes.Add("Array of " + (item as PdfTypeArray)?.GetAt(0).ObjectType);
                    (item as PdfTypeArray)?.ToList().ForEach(subitem =>
                    {
                        ParseObject(subitem, new_tn);
                    });
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Dictionary:
                    ParseDictionary(item as PdfTypeDictionary, tn);
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Stream:
                    tn.Nodes.Add("Stream").Nodes.Add(item.As<PdfTypeStream>().DecodedText);
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Null:
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Reference:
                    ParseObject((item as PdfTypeIndirect)?.Direct, tn);
                    break;
                default:
                    break;
            }
            item.Dispose();
        }

        private bool CompareDicts(PdfTypeDictionary d1, PdfTypeDictionary d2)
        {
            if (d1.Keys.Count != d2.Keys.Count) return false;
            List<string> tmp = d1.Keys.ToList();
            List<string> tmp2 = d2.Keys.ToList();
            tmp2.ForEach(t => tmp.Remove(t));
            if (tmp.Count != 0) return false;
            tmp = d1.Keys.ToList();
            tmp.ForEach(t => tmp2.Remove(t));
            if (tmp2.Count != 0) return false;
            bool isEqual = true;
            d1.ToList().ForEach(kvp =>
            {
                if (!CompareObjs(kvp, d2)) isEqual = false;
            });
            return isEqual;
        }

        private bool CompareObjs(KeyValuePair<string, PdfTypeBase> kvp, PdfTypeDictionary d2)
        {
            bool isEqual = true;
            switch (kvp.Value.ObjectType)
            {
                case Patagames.Pdf.Enums.IndirectObjectTypes.Invalid:
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Boolean:
                    isEqual = (kvp.Value.As<PdfTypeBoolean>().Value == d2[kvp.Key].As<PdfTypeBoolean>().Value);
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Number:
                    isEqual = (kvp.Value.As<PdfTypeNumber>().FloatValue == d2[kvp.Key].As<PdfTypeNumber>().FloatValue);
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.String:
                    isEqual = (kvp.Value.As<PdfTypeString>().AnsiString == d2[kvp.Key].As<PdfTypeString>().AnsiString);
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Name:
                    isEqual = (kvp.Value.As<PdfTypeName>().Value == d2[kvp.Key].As<PdfTypeName>().Value);
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Array:
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Dictionary:
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Stream:
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Null:
                    break;
                case Patagames.Pdf.Enums.IndirectObjectTypes.Reference:
                    break;
                default:
                    break;
            }
            return isEqual;
        }

        private void BtnLoad_Click(object sender, EventArgs e)
        {
            using (OpenFileDialog ofd = new OpenFileDialog())
            {
                ofd.Title = "Choose a pdf file to examine.";
                ofd.CheckPathExists = true;
                ofd.CheckFileExists = true;
                ofd.ReadOnlyChecked = true;
                ofd.ShowReadOnly = false;
                ofd.Multiselect = false;
                ofd.AutoUpgradeEnabled = true;
                ofd.DefaultExt = "pdf";
                ofd.SupportMultiDottedExtensions = true;
                DialogResult dr;
                dr = ofd.ShowDialog();
                if ((dr != DialogResult.OK) && (dr != DialogResult.Yes)) return;
                tvMap.Nodes.Clear();
                currDoc = PdfDocument.Load(ofd.FileName, new PdfForms());
                currDoc.Pages.ToList().ForEach(p =>
                {
                    dicts.Clear();
                    TreeNode tn = new TreeNode("Root");
                    ParseDictionary(p.Dictionary, tn);
                    tvMap.Nodes.Add(tn);
                });
            }
        }

        private void Form1_Resize(object sender, EventArgs e)
        {
            tvMap.Width = ClientSize.Width - 24;
            tvMap.Height = ClientSize.Height - 57;
        }
    }
}


I'm sure you can figure this out, and in fact you probably already have something similar, but in the interest of contributing something of value, here's how to set this up.

  1. Create a new VS C# WinForms project called DictionaryExplore (or call it what you like and just change the namespace entry)
  2. Get the nuget package for Pdfium.Net.SDK
  3. Go into the designer for Form1 and drop a TreeView control on the form (name it tvMap)
  4. Add Form1_Resize as the form's Resize event handler
  5. Add a button (call it BtnLoad and add BtnLoad_Click as the Click event handler) - the button needs to be placed above the treeview control for the resize logic to work properly
  6. Drop this code in place of Form1's code
  7. Fire it up


I'm not quite 100% sure that I have parsed ALL of the information in the dictionaries, yet. But the initial results appear promising. For me, I find that seeing things in this form makes it clearer (assuming that I am not misrepresenting the structure in the process).
Paul Rayman  
#4 Posted : Friday, May 10, 2019 9:46:07 PM(UTC)
I do not quite understand what the question is, but just in case: the structure of PDF dictionaries is described in this document
https://www.adobe.com/co...ve/pdf_reference_1-7.pdf
starting on page 137.
The class diagram for the SDK is here: https://pdfium.patagames...um-Net-SDK-Reference.htm

If you want to iterate recursively through the structure of the document, keep in mind that objects can refer to themselves. I see you already have protection against such loops, but I would recommend changing the equality check for dictionaries: you can simply compare their Handle properties. If two dictionaries are the same object, they have the same Handle.

Something like the following:

Code:

private Dictionary<IntPtr, byte> processedObjects = new Dictionary<IntPtr, byte>();

private void ProcessObjects(PdfTypeDictionary dict)
{
    foreach (var item in dict)
    {
        if (processedObjects.ContainsKey(item.Value.Handle))
            continue;
        var obj = item.Value;
        processedObjects.Add(obj.Handle, 1);

        CustomProcessing(obj);

        if (obj.Is<PdfTypeDictionary>())
            ProcessObjects(obj.As<PdfTypeDictionary>());
        else if (obj.Is<PdfTypeArray>())
            ProcessObjects(obj.As<PdfTypeArray>());
    }
}

private void ProcessObjects(PdfTypeArray array)
{
    for (int i = 0; i < array.Count; i++)
    {
        if (processedObjects.ContainsKey(array[i].Handle))
            continue;
        var obj = array[i];
        processedObjects.Add(obj.Handle, 1);

        CustomProcessing(obj);

        if (obj.Is<PdfTypeDictionary>())
            ProcessObjects(obj.As<PdfTypeDictionary>());
        else if (obj.Is<PdfTypeArray>())
            ProcessObjects(obj.As<PdfTypeArray>());
    }
}

private void CustomProcessing(PdfTypeBase item)
{
    //your code here
}

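// Usage: start the traversal from either the document root...
ProcessObjects(document.Root);
// ...or from a single page's dictionary.
ProcessObjects(page.Dictionary);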

Edited by user Friday, May 10, 2019 10:04:49 PM(UTC)  | Reason: Not specified

lxman  
#5 Posted : Saturday, May 11, 2019 6:11:59 AM(UTC)

Ah, yes, thank you. It was the infinite recursion issue that I was stumbling on. After about 2500 or so recursions, VS likes to throw a stack overflow exception. And I didn't realize that the test could be that simple.

As for the purpose of the question, the goal is to be able to modify the font used in the fields on a page. To do so, I assume I need to understand that multiple dictionary entries can contribute to the final choice of font for a given field. To understand this, I wanted a tool that would let me visualize the structure and see where those definitions are in relation to each other.

It may be the long way around to understanding this issue, but it's the way my brain works.

Again, thank you for the insight. I will play with this and see where it goes.

One question, if I may:

What is the purpose of using a Dictionary<IntPtr, byte> instead of simply a List<IntPtr>? It appears that the byte value is just being set to a constant (1) and never changes. Is it for performance reasons, or is there something else I am not seeing?

Edited by user Saturday, May 11, 2019 6:18:19 AM(UTC)  | Reason: Not specified

Paul Rayman  
#6 Posted : Saturday, May 11, 2019 8:00:11 AM(UTC)
Just for performance reasons.
A Dictionary lookup by key is O(1) on average, while a lookup in a List is an O(n) operation.
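As a side note, when only membership matters and the stored value is never read, `HashSet<IntPtr>` gives the same average O(1) lookup without the dummy byte. The sketch below is a minimal illustration using made-up handle values, not SDK objects; `VisitedDemo` and `CountFirstVisits` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class VisitedDemo
{
    // Counts how many handles in the sequence are seen for the first time.
    // HashSet<T>.Add returns false when the item is already present, so the
    // "already processed?" check and the "mark as processed" step are one call.
    public static int CountFirstVisits(IEnumerable<IntPtr> handles)
    {
        var seen = new HashSet<IntPtr>();
        int firstVisits = 0;
        foreach (IntPtr handle in handles)
        {
            if (seen.Add(handle))
                firstVisits++;
        }
        return firstVisits;
    }
}
```

In the traversal from post #4, the same idea would replace the `ContainsKey` check followed by `Add` with a single `if (!seen.Add(obj.Handle)) continue;`.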