So, raster collision checking is slow and rect collision checking is fast. As a result, your own code for raster collision, built from thousands of rect collision checks, must be faster than the raster collision code in a popular library?
Implementing mask collision in terms of rectangles might be much more feasible if you had even a simple broadphase, but you don't. Your algorithm doggedly looks at every single rect. Thinning out the walls reduces the number of rects dramatically, but did you consider the problem of tunnelling this creates? And there are still thousands of rects left.
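To illustrate what "even a simple broadphase" could look like, here is a minimal sketch of a uniform grid that buckets the wall rects so a query only tests the rects sharing a cell with it. Everything in it is an assumption for illustration, not code from the project: rects are plain (x, y, w, h) integer tuples, and CELL, cells_for, build_grid, rects_overlap and collides are hypothetical names.

```python
from collections import defaultdict

CELL = 64  # hypothetical cell size in map pixels; tune to the typical rect size


def cells_for(rect, cell=CELL):
    """Yield every grid cell (cx, cy) touched by rect = (x, y, w, h)."""
    x, y, w, h = rect
    for cx in range(x // cell, (x + w - 1) // cell + 1):
        for cy in range(y // cell, (y + h - 1) // cell + 1):
            yield cx, cy


def build_grid(wall_rects, cell=CELL):
    """Bucket each wall rect into every cell it overlaps (done once per map)."""
    grid = defaultdict(list)
    for rect in wall_rects:
        for key in cells_for(rect, cell):
            grid[key].append(rect)
    return grid


def rects_overlap(a, b):
    """Axis-aligned overlap test; rects that merely touch do not overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def collides(grid, rect, cell=CELL):
    """Test rect only against the walls stored in the cells it touches."""
    for key in cells_for(rect, cell):
        for wall in grid.get(key, ()):
            if rects_overlap(rect, wall):
                return True
    return False
```

With something like this, a character-sized query touches one to four cells instead of the whole rect list. It does nothing about tunnelling through thinned walls, though.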
The most annoying thing about this is that proper rect-mask collision code would be much simpler than what you wrote, and much faster too, at least if you assume that the rect is always axis-aligned, which is always the case in GG2. My recommendation: look at the first function in
http://pygame.org/wiki/FastPixelPerfect for a template, but leave out the check against the second mask, since every pixel inside your rect counts as solid anyway.
Yes, you loop over x AND y now. So what? Our characters are just a few map pixels high and wide, so it is still much less checking than with your code.
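A rough sketch of that loop, to show how little code is involved. It is not the wiki function itself, just the same idea reduced to an axis-aligned rect. It assumes the map mask offers get_size() and get_at((x, y)) the way pygame.mask.Mask does; rect_hits_mask is a hypothetical name, and GridMask is a hypothetical stand-in that exists only so the sketch runs without pygame.

```python
def rect_hits_mask(mask, rect):
    """Return True if any set bit of mask lies inside rect = (x, y, w, h)."""
    x, y, w, h = rect
    mask_w, mask_h = mask.get_size()
    # Clip the rect to the mask so get_at() is never called out of bounds.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, mask_w), min(y + h, mask_h)
    for py in range(y0, y1):
        for px in range(x0, x1):
            if mask.get_at((px, py)):
                return True  # first solid pixel is enough
    return False


class GridMask:
    """Hypothetical stand-in with the two mask methods used above."""

    def __init__(self, rows):
        self.rows = rows  # list of equal-length strings, '#' marks a solid pixel

    def get_size(self):
        return len(self.rows[0]), len(self.rows)

    def get_at(self, pos):
        px, py = pos
        return 1 if self.rows[py][px] == "#" else 0


wall_map = GridMask([
    "......",
    "......",
    "..##..",
    "######",
])
print(rect_hits_mask(wall_map, (0, 0, 2, 2)))  # open air
print(rect_hits_mask(wall_map, (1, 1, 2, 2)))  # overlaps the bump at (2, 2)
```

The cost is width times height of the character rect per check, no matter how many walls the map has.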
All in all, this looks very much like an example of premature (mis)optimization to me. Congratulations - you identified a suspected problem and, before ever checking that it is a real one, poured work into a solution that is probably slower than just using the simple solution: the built-in mask collision checking with a filled rectangular mask.