GmailReadBlock can’t read message bodies #9863

@Torantulino

📝 Summary

GmailReadBlock._get_email_body() only inspects the top‑level payload and a single text/plain part.
Because most Gmail messages are wrapped in one or more multipart/* containers, the readable text sits in a nested part that the method never visits, so it falls through and returns the fallback string “This email does not contain a text body.” for every such message.

🔍 How to reproduce

  1. Authorize GmailReadBlock with a normal Gmail account.
  2. Make sure at least one ordinary email exists in the inbox (almost any message created in Gmail itself is multipart/alternative).
  3. Run the block (query="in:inbox", max_results=1).
  4. Inspect email.body in the output → value is the fallback string.

🤔 Expected

email.body contains the real plain‑text (or HTML) content of the message.

😢 Actual

email.body == "This email does not contain a text body." for every message.

📓 Root cause (per API docs)

| Doc excerpt | What it means |
| --- | --- |
| “The message part body for this part may be empty for container MIME message parts.” (field description of payload.body) (Google for Developers) | multipart/* parts never have body data; you must look deeper. |
| “The child MIME message parts of this part. This only applies to container MIME message parts, for example multipart/*.” (field description of payload.parts) (Google for Developers) | The payload is a tree; walk every parts[] array recursively. |
| “When present, attachmentId points to data you must fetch separately; when not present, the content is in body.data.” (definition of MessagePartBody) (Google for Developers) | Even leaf parts can omit data; code has to handle both storage modes. |
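To make the tree shape concrete, here is a hand-written payload shaped like the MessagePart resource described in the excerpts. The values are invented, and `top_level_only_body` is a hypothetical stand-in that only approximates the lookup described in the summary; it is not the block's actual code.

```python
import base64
from typing import Optional


def b64url(text: str) -> str:
    """Encode text as URL-safe Base64, the encoding used for body.data."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


# Illustrative payload: the readable text sits two levels below the root.
payload = {
    "mimeType": "multipart/mixed",
    "body": {"size": 0},
    "parts": [
        {
            "mimeType": "multipart/alternative",
            "body": {"size": 0},
            "parts": [
                {"mimeType": "text/plain", "body": {"data": b64url("Hello")}},
                {"mimeType": "text/html", "body": {"data": b64url("<p>Hello</p>")}},
            ],
        }
    ],
}


def top_level_only_body(payload: dict) -> Optional[str]:
    """Check the payload and its direct children only (assumed shallow lookup)."""
    candidates = [payload] + payload.get("parts", [])
    for part in candidates:
        data = part.get("body", {}).get("data")
        if part.get("mimeType") == "text/plain" and data:
            return base64.urlsafe_b64decode(data).decode("utf-8")
    return None


# The text/plain leaf exists, but it is nested too deep to be seen.
result = top_level_only_body(payload)
```

With this payload the shallow lookup finds nothing, which is the situation in which the caller would substitute the fallback string.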

✅ Acceptance criteria

  • For any email that contains either a text/plain or text/html part, email.body returns non‑empty text.
  • HTML‑only messages are returned as raw HTML.
  • Messages whose body is stored via attachmentId are correctly fetched and decoded.
  • Unchanged behaviour for messages that genuinely lack any readable body (still return fallback string).
  • Unit tests cover:
    • single‑part text/plain,
    • multipart/alternative with plain+html,
    • html‑only,
    • body delivered through attachmentId.
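The four test cases listed above could start from fixture payloads along these lines. This is a sketch only: the helper names (`leaf`, `container`, `leaf_mime_types`), the fixture keys, and the placeholder attachment ID are all made up for illustration, and the real tests would feed these dicts to the block's body extraction.

```python
import base64


def b64url(text):
    """Encode text as URL-safe Base64, like body.data in a MessagePart."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def leaf(mime, text=None, attachment_id=None):
    """Build a leaf part holding inline data, an attachmentId, or neither."""
    body = {}
    if text is not None:
        body["data"] = b64url(text)
    if attachment_id is not None:
        body["attachmentId"] = attachment_id
    return {"mimeType": mime, "body": body}


def container(mime, *parts):
    """Build a multipart/* part: empty body, children under 'parts'."""
    return {"mimeType": mime, "body": {"size": 0}, "parts": list(parts)}


# One fixture per case named in the acceptance criteria.
FIXTURES = {
    "single_plain": leaf("text/plain", "plain body"),
    "alternative": container(
        "multipart/alternative",
        leaf("text/plain", "plain body"),
        leaf("text/html", "<p>html body</p>"),
    ),
    "html_only": container("multipart/mixed", leaf("text/html", "<p>html body</p>")),
    "via_attachment": container(
        "multipart/mixed",
        leaf("text/plain", attachment_id="PLACEHOLDER_ATTACHMENT_ID"),
    ),
}


def leaf_mime_types(part):
    """Return the MIME types of all leaf parts, depth first (sanity check)."""
    children = part.get("parts")
    if not children:
        return [part["mimeType"]]
    found = []
    for child in children:
        found.extend(leaf_mime_types(child))
    return found
```

For the attachmentId case, the test would also need a stubbed attachments call that returns encoded data for the placeholder ID.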

🛠 Recommended fix (high‑level)

def _get_email_body(self, msg):
    # Walk the whole MIME tree; keep the existing fallback when nothing is readable.
    text = self._walk_for_body(msg["payload"], msg_id=msg["id"])
    return text or "This email does not contain a text body."

def _walk_for_body(self, part, msg_id, depth=0):
    if depth > 10:  # guard against pathologically deep nesting
        return None

    mime = part.get("mimeType", "")
    body = part.get("body", {})
    is_text = mime in ("text/plain", "text/html")

    # Inline text: text/html is returned as raw HTML, per the acceptance criteria.
    if is_text and body.get("data"):
        return _decode(body["data"])

    # Text stored out of line: fetch it separately. Non-text attachments are skipped.
    if is_text and body.get("attachmentId"):
        data = self._download_attachment_body(body["attachmentId"], msg_id=msg_id)
        return _decode(data)

    # Container part (multipart/*): try each child in order.
    for sub in part.get("parts", []):
        text = self._walk_for_body(sub, msg_id, depth + 1)
        if text:
            return text

    return None

  • _decode() safely adds missing padding before Base64‑URL decode.
  • _download_attachment_body() wraps users().messages().attachments().get(...). (Google for Developers)

📈 Effort / risk

  • Effort ≈ 2 h: implement helper + tests.
  • Risk low‑medium: new recursion; guard against very deep nesting (use a max depth or system recursion limit).
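If the recursion itself is a concern, one alternative is to traverse with an explicit stack, which sidesteps the interpreter's recursion limit and makes the depth cap explicit. The function name and the default cap of 20 below are arbitrary choices for illustration.

```python
from typing import Iterator


def iter_leaf_parts(payload: dict, max_depth: int = 20) -> Iterator[dict]:
    """Yield leaf MIME parts depth first, in document order.

    Containers at max_depth or deeper are skipped, not descended into.
    """
    stack = [(payload, 0)]
    while stack:
        part, depth = stack.pop()
        children = part.get("parts") or []
        if not children:
            yield part
            continue
        if depth >= max_depth:
            continue  # too deep: drop this subtree
        # Push in reverse so the first child is popped, and visited, first.
        for child in reversed(children):
            stack.append((child, depth + 1))
```

A caller could then scan the yielded leaves for the first text/plain or text/html part, keeping the selection logic separate from the traversal.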

Please prioritise—currently every Gmail read flow returns unusable body content.
