-
Notifications
You must be signed in to change notification settings - Fork 78
Integrate a small (130 line) implementation of MD5 #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Creates a ql-md5 package with exported function "md5-file" that returns a hex string that can be compared with "string-equal" to verify against. New condition "corrupt-local-archive", potentially signaled from "check-local-archive-file"
This looks good, thank you! Can you tell me some benchmarks on various implementations? Like, how many MB/sec relative to SBCL on ABCL, CLISP, and others? |
I will benchmark further. MD5's use a lot of 32-bit ops which on non-64bit archs unfortunately won't be fixnums. I don't expect this to be super fast, as I didn't even declare optimize speed, but to test it I did locally download every release .tgz (~400MByte) and have it md5 the lot of them in about 7 seconds on sbcl on a 6 year old iMac. (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
Evaluation took:
7.260 seconds of real time
6.345531 seconds of total run time (5.771987 user, 0.573544 system)
[ Run times consist of 0.041 seconds GC time, and 6.305 seconds non-GC time. ]
87.41% CPU
28,975,291,831 processor cycles
522,484,592 bytes consed One of the functions |
This is running ql-dist:check-local-archive-file on 1899 package archives, totalling 376M. CCL v1.12: 20.5 secondsClozure Common Lisp Version 1.12 DarwinX8664
For more information about CCL, please see http://ccl.clozure.com.
CCL is free software. It is distributed under the terms of the Apache
Licence, Version 2.0.
? (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
(DOLIST (X (QL-DIST:PROVIDED-RELEASES (CAR (QL-DIST:ENABLED-DISTS)))) (QL-DIST:CHECK-LOCAL-ARCHIVE-FILE X))
took 20,527,187 microseconds (20.527187 seconds) to run.
50,954 microseconds ( 0.050954 seconds, 0.25%) of which was spent in GC.
During that period, and with 8 available CPU cores,
18,785,078 microseconds (18.785078 seconds) were spent in user mode
908,797 microseconds ( 0.908797 seconds) were spent in system mode
526,313,584 bytes of memory allocated.
622 minor page faults, 12 major page faults, 0 swaps. CLISP v2.49: 487 seconds[1]> (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
Real time: 487.09634 sec.
Run time: 485.27423 sec.
Space: 2867582144 Bytes
GC: 1538, GC time: 8.831983 sec.
NIL ABCL v1.6.1 : 493 secondsArmed Bear Common Lisp 1.6.1
Java 13.0.2 N/A
OpenJDK 64-Bit Server VM
Low-level initialization completed in 0.199 seconds.
Startup completed in 1.09 seconds.
Type ":help" for a list of available commands.
CL-USER(1): (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
493.581 seconds real time
706103756 cons cells
NIL ECL v20.4.24 : 269 seconds;;; Loading "/Users/joff/quicklisp/setup.lisp"
;;; Loading #P"/usr/local/Cellar/ecl/20.4.24/lib/ecl-20.4.24/asdf.fas"
ECL (Embeddable Common-Lisp) 20.4.24 (git:UNKNOWN)
Copyright (C) 1984 Taiichi Yuasa and Masami Hagiya
Copyright (C) 1993 Giuseppe Attardi
Copyright (C) 2013 Juan J. Garcia-Ripoll
Copyright (C) 2018 Daniel Kochmanski
Copyright (C) 2020 Daniel Kochmanski and Marius Gerbershagen
ECL is free software, and you are welcome to redistribute it
under certain conditions; see file 'Copyright' for details.
Type :h for Help.
Top level in: #<process TOP-LEVEL 0x10baedf80>.
> (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
real time : 269.974 secs
run time : 267.677 secs
gc count : 123 times
consed : 592999584 bytes |
Since adding some declares and setting speed optimization, the time it takes to md5 check the entire quicklisp archive directory in sbcl has gone from 7.2 seconds to 2.7 seconds. On my iMac, running a (time (dolist (x (ql-dist:provided-releases (car (ql-dist:enabled-dists)))) (ql-dist:check-local-archive-file x)))
Evaluation took:
2.732 seconds of real time
2.733517 seconds of total run time (2.454918 user, 0.278599 system)
[ Run times consist of 0.035 seconds GC time, and 2.699 seconds non-GC time. ]
100.07% CPU
10,901,781,530 processor cycles
522,330,080 bytes consed |
Just my two bits: I don't really like that quicklisp still uses md5 as it is really insecure. |
MD5 is better than nothing for detecting accidental file corruption, but I agree that it is not suitable for detecting malicious tampering. For detecting malicious tampering, I have a branch that uses OpenPGP-signed manifests of sha256 digests. Unfortunately it remains a work in progress. |
The hash works just fine if the primary purpose is not security, but something a little better than a file size and name check defense against corrupt downloads. For security, I'd recommend using ed25519 signatures. It hashes with SHA512 then uses elliptic curve crypto to sign the hash. Quicklisp could have a defconstant public key to verify against. With a little massaging, the ironclad ed25519 implementation could be pulled out in isolation into a standalone .lisp. |
Avoid needless loadtime (re)calculation and storage of MD5 T table. Keeps the fasl load time minimal. T table is only needed for macroexpansion.
Wanted to note that in the performance optimizations, I declared a 32-bit modulo add operation inline (mod32+). This results in 256 inline expansions of it in one function, which causes a warning on old SBCL versions and seems to trigger miscompilation on abcl. As a result, there is a abcl read-time conditional around the inline decl. |
Creates a
ql-md5
package with exported functionmd5-file
thatreturns a hex string that can be compared with
string-equal
toverify against. New condition
corrupt-local-archive
, potentiallysignaled from
check-local-archive-file