So I have an ancient archive of emails and I want to combine them into one big happy text file. Can that be done? I notice that when I view an email and cut and paste the content, I get exactly what I want: the formatting, the metadata and all the HTML stuff are gone, and I’m left with readable text. I’ve looked at ‘html2text’, but it leaves all the tags in, whereas I’d have thought it would strip them out. I could view, cut and paste each email by hand, but that would take a long time.

BTW, I use Thunderbird, if that matters. I don’t think I can do this within the program; I expect it’s going to take some command-line utility.

Asked By: Ray Andrews


Thunderbird has an add-on that does it: ImportExportTools. First you use it to import the loose .eml files, then you export them to a single text file. Works well.

Answered By: Ray Andrews

A scripted solution using:

  • formail (comes with procmail) to convert the emails to mailbox format
  • mutt to decode the mime stuff
  • elinks to convert HTML to text for those emails that don’t have a text/plain alternative
  • zsh to orchestrate the lot and find the eml files in the current working directory and below.

#! /bin/zsh -
# Temporary mailcap file: render text/html parts with elinks.
MAILCAPS==(
  print 'text/html; /usr/bin/elinks -force-html -dump %s; copiousoutput'
) mutt -F =(<<'EOF'
# Temporary muttrc: tag every message, decode it and pipe it to formail.
set pipe_decode
set pipe_split
alternative_order text/plain text
unset wait_key
auto_view text/html
push "<tag-pattern>~A<enter><tag-prefix-cond><pipe-message>formail>>out.mbox<enter><exit>"
EOF
) -f =(for f (**/*.eml(n.)) formail < $f) > /dev/null

The text is appended to the out.mbox file. That file can still be opened as a mail folder by most mail user agents (don’t know about Thunderbird) and should be plain text with most headers weeded out.

If you don’t care about the file being in mailbox format, you can replace the formail>>out.mbox with cat>>out.txt, and if you don’t care about headers, with sed '1,/^$/d'>>out.txt.
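
To see what the sed variant does to a single decoded message, here is a minimal sketch. The sample message and the msg and body variable names are made up for illustration; the sed expression is the one quoted above. It deletes everything from the first line through the first empty line, which is where the header block ends.

```sh
# Hypothetical sample message: two header lines, an empty line, then the body.
msg='From: alice@example.com
Subject: hello

First body line.
Second body line.'

# Drop line 1 through the first empty line, so only the body is left.
body=$(printf '%s\n' "$msg" | sed '1,/^$/d')
printf '%s\n' "$body"
```

Only the two body lines are printed. Note that the empty line separating the headers from the body is removed as well, so consecutive messages in out.txt end up back to back.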

Check the elinks man page for how to tune the HTML to text rendering if needed.

Answered By: Stéphane Chazelas
