Regarding map formats, I think you'll like what I came up with. IIRC Medo and I already talked about it way back when.
The .metadata file would be a JSON file (or YAML, something simple). It would define everything about the map: entities, names, timers, game mode, and which images to use for the background & walkmask. Everything would then be packed into a zip.
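To make that a bit more concrete, here is a rough Python sketch of packing and reading such a zip. The file name map.metadata and every key name in the example (name, game_mode, background, walkmask, timers, entities) are placeholders made up for illustration; nothing about the actual layout is decided.

```python
import io
import json
import zipfile

# Hypothetical metadata layout; every key name here is an assumption.
EXAMPLE_METADATA = {
    "name": "Example Map",
    "game_mode": "ctf",
    "background": "background.png",
    "walkmask": "walkmask.png",
    "timers": {"round_seconds": 600},
    "entities": [
        {"type": "spawn", "team": "red", "x": 100, "y": 200},
        {"type": "intel", "team": "red", "x": 150, "y": 220},
    ],
}


def pack_map(metadata, images):
    """Pack the metadata and the image bytes into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("map.metadata", json.dumps(metadata, indent=2))
        for filename, data in images.items():
            archive.writestr(filename, data)
    return buffer.getvalue()


def load_map(zip_bytes):
    """Read the metadata, then the two images it points to."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        metadata = json.loads(archive.read("map.metadata"))
        background = archive.read(metadata["background"])
        walkmask = archive.read(metadata["walkmask"])
    return metadata, background, walkmask
```

The point is that the metadata names the images, so the loader never has to guess which file in the zip is the walkmask.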
I disagree about using a third image for metadata, for the same reason Medo does. Additionally, it would be difficult to have entities with custom properties, or complex entity-to-entity connections.
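As an illustration of what a text format makes easy, here is a small Python sketch with two entities where one refers to the other by id and one carries a custom property. The field names (id, opens, delay_seconds) and the button/door entity types are invented for the example, not part of any agreed format.

```python
# Hypothetical entities; "id", "opens" and "delay_seconds" are made-up fields.
ENTITIES = [
    {"id": "button1", "type": "button", "x": 40, "y": 80, "opens": "door1"},
    {"id": "door1", "type": "door", "x": 300, "y": 80, "delay_seconds": 2.5},
]


def resolve_links(entities, link_field):
    """Return (source, target) pairs for every entity that sets link_field."""
    by_id = {entity["id"]: entity for entity in entities}
    links = []
    for entity in entities:
        target_id = entity.get(link_field)
        if target_id is not None:
            # A KeyError here means the file references a missing entity.
            links.append((entity, by_id[target_id]))
    return links
```

Encoding the same link and the delay value as pixels in an image would need some ad hoc color scheme, which is the difficulty described above.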
Regarding placing multiple game modes in a single file, I think it would only be worth it if those modes could somehow share resources. For example, there might be a base.metadata file that defines the placement of entities common to all game modes, while separate ctf.metadata and cp.metadata files would contain the mode-specific content. The implementation details (multiple metadata files vs. a single file with an appropriate data structure) aren't set in stone.
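One possible way the base + mode split could work, sketched in Python. The merge rule here is only an assumption: entity lists from both files are concatenated, and for any other key the mode-specific file overrides the base. The key names are again placeholders.

```python
import copy


def merge_metadata(base, mode):
    """Combine base metadata with mode-specific metadata.

    Assumed rule: "entities" lists are concatenated, every other key
    in the mode file replaces the value from the base file.
    """
    merged = copy.deepcopy(base)
    for key, value in mode.items():
        if key == "entities":
            merged.setdefault("entities", []).extend(copy.deepcopy(value))
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical contents of base.metadata and ctf.metadata after parsing.
base = {
    "background": "background.png",
    "walkmask": "walkmask.png",
    "entities": [{"type": "spawn", "team": "red", "x": 100, "y": 200}],
}
ctf = {
    "game_mode": "ctf",
    "entities": [{"type": "intel", "team": "red", "x": 150, "y": 220}],
}

ctf_map = merge_metadata(base, ctf)
```

The same function would work for the single-file variant if the file held a "base" section plus one section per mode.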