I’ve been meaning to update this for a fair while now as an uncharacteristically large amount of stuff has happened. Since exams finished I’ve managed to get a job at ECS working for two of my lecturers on two separate projects, which is pretty good because it means my work is varied. Both are IC design projects though, so there is a similar vein running through them.
One of my minor duties in this dual job is to assemble slides from about twelve people into one large presentation, with a cover slide for each speaker, every Friday for a progress meeting we all have. Naturally, the first Friday I just did it by hand, importing each one in turn into PowerPoint. However, it is a fairly tedious job, and to paraphrase a certain member of staff: why do something by hand when I have a powerful computer under the desk?
So I began to investigate automating the process.
Turns out that Python has an IMAP module in its standard library, which isn’t too surprising I suppose as the Python standard library is enormous. After some playing I managed to write a program that logged into my university email account and downloaded the appropriate PowerPoint attachments.
from imaplib import IMAP4_SSL
from email import message_from_string
from datetime import datetime, timedelta
import re

# users is the list of sender email addresses to collect slides from
M = IMAP4_SSL('imapserver')
M.login('user', 'pass')
M.select()
# Only look at mail from the last five days
since = (datetime.today() - timedelta(days=5)).strftime("%d-%b-%Y")
for user in users:
    typ, data = M.search(None, '(HEADER FROM "%s" SINCE %s)' % (user, since))
    for msg_id in data[0].split():
        typ, msg_data = M.fetch(msg_id, 'RFC822')
        msg = message_from_string(msg_data[0][1])
        if not msg.is_multipart():
            continue
        for part in msg.walk():
            # Parts without a filename are not attachments
            fn = part.get_filename()
            ext = re.match(r'.*\.(pptx?)$', fn) if fn else None
            if not ext:
                continue
            # Binary mode avoids newline translation on Windows
            fp = open('%s.%s.temp' % (user, ext.group(1)), 'wb')
            fp.write(part.get_payload())
            fp.close()
M.close()
M.logout()
This code downloads any .ppt or .pptx attachment from anyone with an email address in the users list and saves the file as <user>.ppt.temp or <user>.pptx.temp. It should be possible to decode the message part with part.get_payload(decode=True), but it seemed to introduce a fair number of errors into the file. I think this is because it seems to convert line-by-line instead of as one large block. So I used the following code to fix this:
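from base64 import b64decode
import quopri

# Inside the "for part in msg.walk()" loop, before writing the file
raw = part.get_payload()
encoding = (part['Content-Transfer-Encoding'] or '').lower()
if encoding == 'base64':
    # Join the base64 lines so the payload is decoded as one block
    raw = b64decode(re.sub(r'[\r\n]+', '', raw))
elif encoding == 'quoted-printable':
    # Line breaks are significant in quoted-printable, so leave them alone
    raw = quopri.decodestring(raw)
else:
    print("ERROR: Unknown coding strategy - ignoring.")
    continue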
Not particularly pretty, but it does the job.
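For comparison, here is a minimal sketch of the same attachment-extraction step in Python 3, where part.get_payload(decode=True) hands back the decoded bytes in a single call. It is not the script from this post: the helper names extract_presentations and build_test_message, the example address, and the synthetic message are all made up for illustration, and it works on raw message bytes (the kind an IMAP fetch returns) rather than talking to a server.

```python
import re
from email import message_from_bytes
from email.message import EmailMessage

# Same pattern as the script above: capture "ppt" or "pptx"
PPT_PATTERN = re.compile(r'.*\.(pptx?)$')


def extract_presentations(raw_message):
    """Return a list of (filename, payload_bytes) for .ppt/.pptx attachments."""
    msg = message_from_bytes(raw_message)
    found = []
    for part in msg.walk():
        filename = part.get_filename()
        # The multipart container and the message body have no filename
        if not filename or not PPT_PATTERN.match(filename):
            continue
        # decode=True undoes base64 or quoted-printable and returns bytes
        found.append((filename, part.get_payload(decode=True)))
    return found


def build_test_message(filename, payload):
    """Build a synthetic email (hypothetical sender) with one binary attachment."""
    msg = EmailMessage()
    msg['From'] = 'someone@example.org'
    msg['Subject'] = 'Slides'
    msg.set_content('Slides attached.')
    msg.add_attachment(payload, maintype='application',
                       subtype='octet-stream', filename=filename)
    return msg.as_bytes()


# Round-trip every possible byte value through a base64-encoded attachment
payload = bytes(range(256)) * 4
attachments = extract_presentations(build_test_message('slides.pptx', payload))
```

Each payload in attachments is a bytes object, so it should be written with open(name, 'wb'). That matters on Windows, where text mode rewrites line endings and would corrupt a binary file.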
So now I had all the attachments downloaded, but I still needed to merge them into a single file. At first I was thinking of writing a C++ program to exploit the PowerPoint COM interface, but then I found Python for Windows Extensions (pywin32), which fully supports COM!
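import os
import pythoncom
from win32com.client import Dispatch, gencache

pythoncom.CoInitializeEx(pythoncom.COINIT_APARTMENTTHREADED)
gencache.EnsureModule('{2DF8D04C-5BFA-101B-BDE5-00AA0044DE52}', 0, 2, 4)
gencache.EnsureDispatch("PowerPoint.Application.12")
# Create an instance of PowerPoint and presentation
pp = Dispatch("PowerPoint.Application.12")
pp.Activate()
pres = pp.Presentations.Add()
# Insert all the downloaded presentations
for filename in os.listdir('presentations'):
    # listdir returns bare names, so join them to the directory first
    full_path = os.path.realpath(os.path.join('presentations', filename))
    # Insert after the current last slide (0 while the presentation is empty)
    pres.Slides.InsertFromFile(full_path, pres.Slides.Count)
# Save and exit (full path, so the file lands next to the script)
pres.SaveAs(os.path.realpath('compiled.pptx'))
pres.Close()
pp.Quit()
pythoncom.CoUninitialize()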
I was quite shocked at how straightforward it was to automate PowerPoint from Python, but this solution wasn’t quite good enough yet. Using InsertFromFile means that the imported presentation acquires the formatting of pres, which is not what I wanted. Also, there appears to be a bug in PowerPoint 2007 which causes image references to be broken when importing from a .pptx into a .pptx with the COM interface.
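One detail in the merge loop above is easy to trip over: os.listdir returns bare file names, and PowerPoint runs as a separate process, so it may not resolve a relative name against the script's working directory. A small Python 3 sketch of building full paths first; the helper name presentation_paths, the extension filter, and the sorted order are choices made for this example rather than anything from the original script.

```python
import os


def presentation_paths(directory, extensions=('.ppt', '.pptx')):
    """Return absolute paths of presentation files in directory, sorted by name."""
    paths = []
    for name in sorted(os.listdir(directory)):
        # str.endswith accepts a tuple, so both extensions are checked at once
        if name.lower().endswith(extensions):
            paths.append(os.path.abspath(os.path.join(directory, name)))
    return paths
```

Each returned path could then be handed to pres.Slides.InsertFromFile in place of os.path.realpath(filename).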
Searching for a solution to the import-with-formatting issue led me to this awesome VBA function which has been referenced many, many times. I ported this to Python and it worked perfectly! There was still the weird image problem, but I used a really crude fix for that:
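for filename in [f for f in os.listdir('presentations') if f[-4:] == 'pptx']:
    full_path = os.path.realpath(os.path.join('presentations', filename))
    pres = pp.Presentations.Open(full_path)
    # Dropping the trailing "x" gives the .ppt name; 1 is ppSaveAsPresentation
    pres.SaveAs(full_path[:-1], 1)
    pres.Close()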
Yes. I converted all the .pptx files into .ppt files. Nothing intelligent here.
Finally, I wrote a function to replace all the fonts added by the import with
Arial. Job done.
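# Collect the distinct font names used in the presentation
fonts = set()
for i in range(1, pres.Fonts.Count + 1):
    fonts.add(pres.Fonts.Item(i).Name)
# Arial itself needs no replacing (discard is safe if it is absent)
fonts.discard(u'Arial')
for font in fonts:
    pres.Fonts.Replace(font, u'Arial')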
So after all that I have a Python program that logs into my email, downloads a
load of PowerPoint files, converts them all to .ppt
format, inserts them into
a blank presentation, and then normalises the font to Arial. Not bad for a few
hundred lines of code!
I did change the program a bit so that it would copy a template with title
slides in and add the presentations to that instead of a blank file, and then
set the date appropriately on the main title slide. But the key point here is
that I’ve replaced my tedious Friday-morning activity with a single command: makepres.
Victory.