wrong detection of Docx file as Zip if first entry is not word/ #197

@piranna

Description

I have a Docx file whose first entry, for some reason, is under `[trash]/`, so the check that looks for `word/` (or any of the other OOXML entries) at the start of the archive fails.

Entries of the culprit file:

[trash]/0000.dat
[trash]/0002.dat
[trash]/0001.dat
word/document.xml
word/footnotes.xml
word/endnotes.xml
word/theme/theme1.xml
word/settings.xml
word/webSettings.xml
word/fontTable.xml
word/styles.xml
[trash]/0003.dat
docProps/app.xml
customXml/item1.xml
customXml/itemProps1.xml
customXml/item2.xml
customXml/itemProps2.xml
customXml/_rels/item1.xml.rels
[trash]/0004.dat
customXml/_rels/item2.xml.rels
word/_rels/document.xml.rels
customXml/item3.xml
customXml/itemProps3.xml
docProps/custom.xml
customXml/_rels/item3.xml.rels
_rels/.rels
docProps/core.xml
[Content_Types].xml

Actual values checked at ZippedDocumentBase.compare_bytes() for offset 0x1E:

bytearray(b'[tras')
bytearray(b'[tra')
bytearray(b'[tr')
bytearray(b'[trash]/0000.dat\xff\xff\xff')
bytearray(b'[trash]/000')
bytearray(b'[trash]/')
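
For context, offset 0x1E (30) is where the first entry's filename begins in a ZIP local file header, which is why every value above starts with `[tr...`. Below is a minimal sketch of reading that name, assuming the archive starts with a standard local file header at byte 0. `first_entry_name` and `make_zip` are hypothetical helpers for illustration, not part of this library.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # 0x1E: size of the fixed part of a ZIP local file header


def first_entry_name(data: bytes) -> str:
    """Return the filename stored in the first local file header."""
    if data[:4] != b"PK\x03\x04":
        raise ValueError("data does not start with a ZIP local file header")
    # The filename length is a little-endian uint16 at offset 26.
    (name_len,) = struct.unpack_from("<H", data, 26)
    raw = data[LOCAL_HEADER_SIZE:LOCAL_HEADER_SIZE + name_len]
    return raw.decode("utf-8", "replace")


def make_zip(names):
    """Build an in-memory ZIP with empty entries in the given order (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()


sample = make_zip(["[trash]/0000.dat", "word/document.xml"])
print(first_entry_name(sample))  # [trash]/0000.dat
```

Any check anchored at this offset only ever sees the first entry, so it cannot match `word/` when another entry is stored first.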

The `file` command detects the file as DocX correctly. Not sure whether we should open the zip file and iterate over its entries, or whether a direct approach without opening it would be possible.
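
As a rough illustration of the "open the zip and iterate over its entries" option, here is a sketch using Python's standard `zipfile` module. The function name `detect_ooxml`, the prefix table, and the requirement that `[Content_Types].xml` be present are assumptions made for this example, not the library's actual logic.

```python
import io
import zipfile

# Assumed mapping from top-level OOXML directory to document kind.
OOXML_DIRS = {"word/": "docx", "xl/": "xlsx", "ppt/": "pptx"}


def detect_ooxml(data: bytes):
    """Return 'docx', 'xlsx', 'pptx', or None by scanning all entry names."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
    except zipfile.BadZipFile:
        return None
    # Treat the archive as OOXML only if the content-types part exists.
    if "[Content_Types].xml" not in names:
        return None
    # Entry order does not matter: check every name, not just the first.
    for name in names:
        for prefix, kind in OOXML_DIRS.items():
            if name.startswith(prefix):
                return kind
    return None


def make_zip(names):
    """Build an in-memory ZIP with empty entries in the given order (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()


culprit = make_zip(["[trash]/0000.dat", "word/document.xml", "[Content_Types].xml"])
print(detect_ooxml(culprit))  # docx
```

One trade-off: `zipfile` reads the central directory at the end of the archive, so this approach needs the whole file (or a seekable stream), which may not suit a detector that only inspects a fixed-size header buffer.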
