optimization for extracting emails and URLs

When emails and URLs are extracted, the (expensive) regular expression can often be skipped: a much simpler and cheaper check filters out a large share of files first (over 50% in my experience).

The check looks for '@' and '://' first. If a file contains no '@', the email regular expression cannot match and does not need to run; likewise, if it contains no '://', the URL regular expression can be skipped.
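
A minimal sketch of the idea in Python, for illustration only. The patterns `EMAIL_RE` and `URL_RE` and the function names are hypothetical and deliberately simplified; they are not the expressions used by the actual extractor. The sketch also assumes that only URLs with an explicit scheme (containing '://') are of interest.

```python
import re

# Hypothetical, simplified patterns; the real expressions are not shown in this note.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s<>\"']+")


def extract_emails(text):
    # Every email address contains '@', so its absence rules out a match.
    if "@" not in text:
        return []
    return EMAIL_RE.findall(text)


def extract_urls(text):
    # Assumption: only URLs with an explicit scheme are wanted, so '://' must be present.
    if "://" not in text:
        return []
    return URL_RE.findall(text)
```

The pre-check is a plain substring search, which is typically far cheaper than running a backtracking regular expression over the whole file. Files that do contain the marker still go through the full expression, so the results are unchanged.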