Time for a geek entry, even though few if any current readers care about this stuff. My apologies.
Email must be stored somewhere. First, while it's waiting on a server to be downloaded with a POP or IMAP client; second, after it's been downloaded to a user's computer; third, when a user saves it for posterity. (Yes, I'm deliberately ignoring lots of earlier steps that aren't relevant to my thoughts here, as well as some other ways of accessing email that generally just skip one or more of these steps.) It's fairly safe to assume that this storage is on a disk, but in what format?
Email is normally organized into mailboxes holding multiple messages. Sometimes each user has a single mailbox, and sometimes a single user can have multiple mailboxes for different purposes. Multiple mailboxes are especially useful when the end user saves mail, since that person often wants to organize what they're saving. Most people don't think about the format used for saving the mail, but that format becomes important if more than one program will be used to access the mail.
There are three major types of email storage formats. One is for each mailbox to be a single file containing all its messages, with some internal organization to determine which message is which. Another is for each mailbox to be a directory (or directory structure), with each message being a separate file within that directory. The third is to keep the mail in a database outside the usual filesystem, organized by whatever method the flexibility of a general-purpose database allows.
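To make the first two layouts concrete, here is a minimal sketch using Python's standard `mailbox` module. It assumes the classic mbox format as the example of a mailbox file and Maildir as the example of a mailbox directory; neither format is named above, so treat them as illustrative stand-ins. The addresses, file names, and helper functions are all hypothetical.

```python
import mailbox
import os
from email.message import EmailMessage


def make_message(subject):
    """Build a small throwaway message (addresses are made up)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = subject
    msg.set_content("Body of " + subject)
    return msg


def store_both_ways(root, subjects):
    """Save the same messages as one mbox file and as one Maildir directory."""
    mbox_path = os.path.join(root, "saved.mbox")        # mailbox as a single file
    maildir_path = os.path.join(root, "saved-maildir")  # mailbox as a directory

    mbox = mailbox.mbox(mbox_path)
    maildir = mailbox.Maildir(maildir_path, create=True)
    try:
        for subject in subjects:
            mbox.add(make_message(subject))     # appended to the one file
            maildir.add(make_message(subject))  # written as its own file
        mbox.flush()
        maildir.flush()
    finally:
        mbox.close()
        maildir.close()
    return mbox_path, maildir_path


def count_message_files(maildir_path):
    """Count the per-message files in the Maildir's 'new' and 'cur' subdirectories."""
    return sum(
        len(os.listdir(os.path.join(maildir_path, sub)))
        for sub in ("new", "cur")
    )
```

Calling `store_both_ways` on an empty directory with three subjects should leave one `saved.mbox` file holding all three messages, and a `saved-maildir` directory holding three separate message files. Any other program that understands the same format can then read either mailbox, which is the interoperability point at stake here.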
There is not yet a commonly accepted schema for storing email in databases; each program that does it has its own method, making interoperability impractical in the general case. This may change in the future, but for now, database storage is not useful if more than one program needs to access the mail.
That leaves mailbox files and mailbox directories, each of which has its advantages and disadvantages...