PowerPython… or PythonPoint… or something

I’ve been meaning to update this for a fair while now as an uncharacteristically large amount of stuff has happened. Since exams finished I’ve managed to get a job at ECS working for two of my lecturers on two separate projects, which is pretty good because it means my work is varied. Both are IC design projects though, so there is a similar vein running through them.

One of my minor duties on this dual-job is to assemble slides from about twelve people into a large presentation, with cover slides for each speaker, every Friday for a progress meeting we all have. Naturally the first Friday I just did it by hand, importing each one in turn into PowerPoint. However, it is a fairly tedious job, and to paraphrase a certain member of staff: why do something by hand when I have a powerful computer under the desk?

So I began to investigate automating the process.

Turns out that Python has an IMAP module in its standard library, which isn’t too surprising I suppose as the Python standard library is enormous. After some playing I managed to write a program that logged into my university email account and downloaded the appropriate PowerPoint attachments.

from imaplib import IMAP4_SSL
from email import message_from_string
from datetime import datetime, timedelta
import re

M = IMAP4_SSL('imapserver')
M.login('user', 'pass')
M.select()

since = (datetime.today() - timedelta(days=5)).strftime("%d-%b-%Y")

# users is the list of senders' email addresses, defined elsewhere
for user in users:
    typ, data = M.search(None, '(HEADER FROM "%s" SINCE %s)' % (user, since))

    for num in data[0].split():
        # Fetch the whole message; a separate name keeps the search result intact
        typ, msg_data = M.fetch(num, 'RFC822')
        msg = message_from_string(msg_data[0][1])
        if not msg.is_multipart():
            continue

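        for part in msg.walk():
            fn = part.get_filename()
            if not fn:
                continue  # parts without a filename are not attachments

            ext = re.match(r'.*\.(pptx?)$', fn)
            if not ext:
                continue

            # Binary mode, so Windows does not rewrite the line endings
            fp = open('%s.%s.temp' % (user, ext.group(1)), 'wb')
            fp.write(part.get_payload())
            fp.close()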

M.close()
M.logout()
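One detail in the search above is worth a sketch: strftime("%b") follows the current locale, whereas IMAP expects English month abbreviations. Below is a minimal Python 3 version that builds the same criterion string from a fixed month table. The helper names imap_date and build_search and the example address are made up for illustration; they are not part of the original program.

```python
from datetime import date, timedelta

# IMAP dates use fixed English month abbreviations, whatever the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(day):
    """Format a date as DD-Mon-YYYY for an IMAP SEARCH criterion."""
    return "%02d-%s-%04d" % (day.day, MONTHS[day.month - 1], day.year)


def build_search(sender, today, days=5):
    """Build the criterion string used in the loop above (hypothetical helper)."""
    since = today - timedelta(days=days)
    return '(HEADER FROM "%s" SINCE %s)' % (sender, imap_date(since))


print(build_search("alice@example.org", date(2008, 7, 11)))
```

Passing today in as an argument, rather than calling datetime.today() inside the helper, makes the five-day window easy to check by hand.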

This code downloads any .ppt or .pptx attachment from anyone with an email address in the users list and saves the file as <user>.ppt.temp or <user>.pptx.temp. It should be possible to decode the message part with part.get_payload(decode=True), but that seemed to introduce a fair number of errors into the file. I think this is because it converts line-by-line instead of as one large block. So I used the following code to fix this:

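from base64 import b64decode
import quopri

raw      = part.get_payload()
encoding = (part['Content-Transfer-Encoding'] or '').lower()

if encoding == 'base64':
    # Strip the line breaks so the payload is decoded as one block
    raw = b64decode(re.sub(r'[\r\n]+', '', raw))
elif encoding == 'quoted-printable':
    # Line breaks are significant in quoted-printable, so leave them alone
    raw = quopri.decodestring(raw)
else:
    print "ERROR: Unknown coding strategy - ignoring."
    continue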

Not particularly pretty, but it does the job.
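For comparison, here is a rough sketch of the same extraction in Python 3, where get_payload(decode=True) returns the attachment as decoded bytes. It is not the code I ran. The function name powerpoint_attachments and the test message are invented for illustration, and the pattern ignores case, which the original did not.

```python
import re
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

PPT_NAME = re.compile(r'.*\.(pptx?)$', re.IGNORECASE)


def powerpoint_attachments(raw_message):
    """Yield (filename, decoded bytes) for each .ppt/.pptx attachment."""
    msg = message_from_bytes(raw_message)
    for part in msg.walk():
        name = part.get_filename()
        if not name or not PPT_NAME.match(name):
            continue
        # decode=True undoes base64 or quoted-printable and returns bytes
        yield name, part.get_payload(decode=True)


# Build a throwaway message with fake contents to exercise the function.
outer = MIMEMultipart()
payload = bytes(range(256)) * 4  # stand-in for real PowerPoint data
attachment = MIMEApplication(payload)  # base64-encoded by default
attachment.add_header('Content-Disposition', 'attachment', filename='slides.pptx')
outer.attach(attachment)

found = list(powerpoint_attachments(outer.as_bytes()))
print(found[0][0], len(found[0][1]))
```

The bytes that come back can be written straight to a file opened in 'wb' mode.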

So now I had all the attachments downloaded, but I still needed to merge them into a single file. At first I was thinking of making a C++ program to exploit the PowerPoint COM interface, but then I found Python for Windows Extensions (pywin32), which fully supports COM!

import os
import pythoncom
from win32com.client import Dispatch, gencache

pythoncom.CoInitializeEx(pythoncom.COINIT_APARTMENTTHREADED)
gencache.EnsureModule('{2DF8D04C-5BFA-101B-BDE5-00AA0044DE52}', 0, 2, 4)
gencache.EnsureDispatch("PowerPoint.Application.12")

# Create an instance of PowerPoint and presentation
pp = Dispatch("PowerPoint.Application.12")
pp.Activate()
pres = pp.Presentations.Add()

# Insert all the downloaded presentations
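# InsertFromFile inserts after the given slide index, so start from the current count
count = pres.Slides.Count
for filename in os.listdir('presentations'):
    # listdir() returns bare names; PowerPoint needs the full path
    full = os.path.abspath(os.path.join('presentations', filename))
    pres.Slides.InsertFromFile(full, count)
    count = pres.Slides.Count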

# Save and exit
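# Absolute path, so the file lands next to the script rather than in PowerPoint's default folder
pres.SaveAs(os.path.abspath('compiled.pptx'))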
pres.Close()
pp.Quit()
pythoncom.CoUninitialize()
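PowerPoint runs as a separate process, so as far as I can tell it resolves relative file names against its own working directory rather than the script's. A small helper that hands back absolute paths avoids that. This is a Python 3 sketch: the name presentation_paths is made up, and sorting by file name to fix the speaker order is my assumption.

```python
import os
import tempfile


def presentation_paths(folder):
    """Return absolute paths of the .ppt/.pptx files in folder, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(('.ppt', '.pptx')):
            paths.append(os.path.abspath(os.path.join(folder, name)))
    return paths


# Try it on a temporary folder holding empty stand-in files.
with tempfile.TemporaryDirectory() as folder:
    for name in ('bob.pptx', 'alice.ppt', 'notes.txt'):
        open(os.path.join(folder, name), 'wb').close()
    found = presentation_paths(folder)
    print([os.path.basename(p) for p in found])
```

Each path in the returned list can be passed directly to InsertFromFile or Presentations.Open.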

I was quite shocked at how straightforward it was to automate PowerPoint from Python, but this solution wasn’t quite good enough yet. Using InsertFromFile means that the imported presentation acquires the formatting of pres, which is not what I wanted. Also, there appears to be a bug in PowerPoint 2007 which causes image references to be broken when importing from a .pptx into a .pptx with the COM interface.

Searching for a solution to the import with formatting issue led me to this awesome VBA function which has been referenced many, many times. I ported this to Python and it worked perfectly! There was still the weird image problem, but I used a really crude fix for that:

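for filename in [f for f in os.listdir('presentations') if f.endswith('.pptx')]:
    # listdir() returns bare names, so build the full path before opening
    full = os.path.abspath(os.path.join('presentations', filename))
    pres = pp.Presentations.Open(full)
    pres.SaveAs(full[:-1])  # same name, but with a .ppt extension
    pres.Close()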

Yes. I converted all the .pptx files into .ppt files. Nothing intelligent here. Finally, I wrote a function to replace all the fonts added by the import with Arial. Job done.

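# Collect the font names first, so the list is not changing while we replace
fonts = set()
for i in range(1, pres.Fonts.Count + 1):
    fonts.add(pres.Fonts.Item(i).Name)

fonts.discard(u'Arial')  # no error if Arial is not in the presentation

for font in fonts:
    pres.Fonts.Replace(font, u'Arial')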

So after all that I have a Python program that logs into my email, downloads a load of PowerPoint files, converts them all to .ppt format, inserts them into a blank presentation, and then normalises the font to Arial. Not bad for a few hundred lines of code!

I did change the program a bit so that it would copy a template with title slides in and add the presentations to that instead of a blank file, and then set the date appropriately on the main title slide. But the key point here is that I’ve replaced my tedious Friday-morning activity with a single command: makepres.
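As for the date on the title slide, the only real logic is working out which Friday the meeting falls on. Here is a minimal Python 3 sketch, assuming the slide should show the coming Friday, or today if it is already Friday. The function name next_friday is hypothetical, and writing the text into the slide itself is left out.

```python
from datetime import date, timedelta


def next_friday(today):
    """Return today if it is a Friday, otherwise the following Friday."""
    days_ahead = (4 - today.weekday()) % 7  # Monday is 0, Friday is 4
    return today + timedelta(days=days_ahead)


print(next_friday(date(2008, 7, 9)).isoformat())
```

This way the command gives the same date whether it is run on the Friday morning or a day or two early.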

Victory.