Crafting a JSON parser with SIMD: Hidden costs
When I originally designed this parser, I decided that all JSON objects would be parsed into hash-maps. They seem like the right tool for the job and, in some sense, that's exactly what JSON objects are, mappings of keys to values.
Time complexity
Hash-maps lure you in with their appealing time complexity for the average case, let's have a look at what they promise:
Operation | Average Case | Worst Case |
---|---|---|
Insertion | O(1) | O(n) |
Lookup | O(1) | O(n) |
Deletion | O(1) | O(n) |
This table may differ slightly depending on the actual implementation but it will have to do. Hashmaps are sick, right? Most of the time you are dealing with constant time complexity. It's not a lie but this distracts from what is actually going on.
First, let's compare the hashmap to a humble list:
Operation | Average/Worst Case |
---|---|
Insertion (front) | O(1) |
Insertion (back) | O(n) |
Lookup | O(n) |
Deletion | O(n) |
Looks horrible, and yet, parsing objects into associative lists (alists) instead of hashmaps has sped up my parser by a significant amount...
Bringing lists up to speed
Here is the object parsing code as was previously:
(defun %parse-object (string index)
(declare (type simple-string string)
(type fixnum index))
(let ((current-index (skip-to-next-character string (1+ index)))
(parsed-object (make-hash-table :test 'equal
:size 10)))
(declare (type fixnum current-index))
;; empty object check
(when (char= (char string current-index) #\})
(return-from %parse-object (values parsed-object current-index)))
(loop with raw-string-length = (length string)
while (< current-index raw-string-length)
when (char/= +double-quote+ (char string current-index))
do (error "Key not of type string at position ~a!" current-index)
do (multiple-value-bind (parsed-key new-index)
(%parse-string string current-index)
(setf current-index (skip-to-next-character string new-index))
(if (char= #\: (char string current-index))
(incf current-index)
(error "Missing ':' after key '~a' at position ~a!" parsed-key current-index))
(multiple-value-bind (parsed-value new-index)
(parse string current-index)
(setf current-index (skip-to-next-character string new-index)
(gethash parsed-key parsed-object) parsed-value)
(let ((character (char string current-index)))
(cond
((char= #\} character)
(loop-finish))
((char/= #\, character)
(error "Expected ',' after object value. Instead found ~a at position ~a!"
character current-index))
(t
(setf current-index (skip-to-next-character string (incf current-index)))))))))
(values parsed-object (incf current-index))))
and here is the sped up version:
(defun %parse-object (string index)
(declare (type simple-string string)
(type fixnum index))
(let ((current-index (skip-to-next-character string (1+ index))))
(declare (type fixnum current-index))
;; empty object check
(when (char= (char string current-index) #\})
(return-from %parse-object (values nil current-index)))
(values
(loop with raw-string-length = (length string)
while (< current-index raw-string-length)
when (char/= +double-quote+ (char string current-index))
do (error "Key not of type string at position ~a!" current-index)
collect (multiple-value-bind (parsed-key new-index)
(%parse-string string current-index)
(setf current-index (skip-to-next-character string new-index))
(if (char= #\: (char string current-index))
(incf current-index)
(error "Missing ':' after key '~a' at position ~a!" parsed-key current-index))
(multiple-value-bind (parsed-value new-index)
(parse string current-index)
(setf current-index (skip-to-next-character string new-index))
(cons parsed-key parsed-value)))
do (let ((character (char string current-index)))
(cond
((char= #\} character)
(loop-finish))
((char/= #\, character)
(error "Expected ',' after object value. Instead found ~a at position ~a!"
character current-index))
(t
(setf current-index (skip-to-next-character string (incf current-index)))))))
(incf current-index))))
So we went from your standard hashmap to creating alists:
(("key1" . "value1")
("key2" . (("key 3" . "value3"))))
but why is it faster?
Cheap appends
Given a random list, if you were asked to append an element to it then yes, it would cost O(n)
. When you are actually creating it though, it's a different story. loop
hides all the magic so let's expand it:
(BLOCK NIL
(LET ((RAW-STRING-LENGTH (LENGTH STRING)))
(DECLARE (IGNORABLE RAW-STRING-LENGTH))
(LET* ((#:LOOP-LIST-HEAD-286 (LIST NIL))
(#:LOOP-LIST-TAIL-287 #:LOOP-LIST-HEAD-286))
(DECLARE (DYNAMIC-EXTENT #:LOOP-LIST-HEAD-286))
(TAGBODY
SB-LOOP::NEXT-LOOP
(IF (< CURRENT-INDEX RAW-STRING-LENGTH)
NIL
(GO SB-LOOP::END-LOOP))
(IF (CHAR/= +DOUBLE-QUOTE+ (CHAR STRING CURRENT-INDEX))
(ERROR "Key not of type string at position ~a!" CURRENT-INDEX))
(RPLACD #:LOOP-LIST-TAIL-287
(SETQ #:LOOP-LIST-TAIL-287
(LIST
(MULTIPLE-VALUE-CALL
#'(LAMBDA
(
&OPTIONAL PARSED-KEY NEW-INDEX
&REST #:IGNORE)
(DECLARE (IGNORE #:IGNORE))
(SETQ CURRENT-INDEX
(SKIP-TO-NEXT-CHARACTER STRING
NEW-INDEX))
(IF (CHAR= #\: (CHAR STRING CURRENT-INDEX))
(SETQ CURRENT-INDEX (+ 1 CURRENT-INDEX))
(ERROR
"Missing ':' after key '~a' at position ~a!"
PARSED-KEY CURRENT-INDEX))
(MULTIPLE-VALUE-CALL
#'(LAMBDA
(
&OPTIONAL PARSED-VALUE NEW-INDEX
&REST #:IGNORE)
(DECLARE (IGNORE #:IGNORE))
(SETQ CURRENT-INDEX
(SKIP-TO-NEXT-CHARACTER STRING
NEW-INDEX))
(CONS PARSED-KEY PARSED-VALUE))
(PARSE STRING CURRENT-INDEX)))
(%PARSE-STRING STRING CURRENT-INDEX)))))
(LET ((CHARACTER (CHAR STRING CURRENT-INDEX)))
(DECLARE (TYPE CHARACTER CHARACTER))
(IF (CHAR= #\} CHARACTER)
(GO SB-LOOP::END-LOOP)
(IF (CHAR/= #\, CHARACTER)
(ERROR
"Expected ',' after object value. Instead found ~a at position ~a!"
CHARACTER CURRENT-INDEX)
(THE T
(SETQ CURRENT-INDEX
(SKIP-TO-NEXT-CHARACTER STRING
(SETQ CURRENT-INDEX (+ 1 CURRENT-INDEX))))))))
(GO SB-LOOP::NEXT-LOOP)
SB-LOOP::END-LOOP
(RETURN-FROM NIL (TRULY-THE LIST (CDR #:LOOP-LIST-HEAD-286)))))))
It does exactly what you would expect. It keeps a reference to the last cons cell in the list #:LOOP-LIST-TAIL-287
and uses that to append at O(1)
time. But didn't we already have O(1)
insertion with the hashmap?
O(1) is relative
Just because 2 data structures define one operation as O(1)
it doesn't mean they cost the same, all it means is that their cost remains constant as the number of elements in the data structure increases.
Insertion in a hashmap is much more complicated than in a list. First of all, each insertion also bears the cost of the hashing algorithm, sure we are talking about nanoseconds but it all matters. Notice the size set on the hashmap initialization :size 10
. When you exceed that size you have to pay the hashing tax again in order to rehash all the keys so they can be moved into a larger structure while setting the too high to begin with just wastes memory. It's hard to find a balance when you don't really know much about the data that will be moved into it.
Then there is the testing function, the hashmap actually cares if 2 keys are identical which I initially thought was a benefit but I am just parsing the JSON, if 2 keys are the same then assoc
will consistently retrieve the first one, sure the second one will still be in the structure but at this point we have been given an "invalid" JSON and all bets are off the table.
So in general, creating the alist is going to be cheaper because we save on overhead and have ways to bridge the gap in time complexity.
Lots of little hidden costs to a hashmap. Wouldn't lookups be faster though? As always, it depends, after a certain number of keys the hashmap will be faster, below that the alist should win because of the reduced overhead.
Benchmarks
Before
(time (loop for i from 0 to 100 do (parse *data*)))
Evaluation took:
6.362 seconds of real time
2.015625 seconds of total run time (1.640625 user, 0.375000 system)
[ Real times consist of 2.084 seconds GC time, and 4.278 seconds non-GC time. ]
[ Run times consist of 0.625 seconds GC time, and 1.391 seconds non-GC time. ]
31.69% CPU
13,438,524,761 processor cycles
4,875,842,528 bytes consed
After
(time (loop for i from 0 to 100 do (parse *data*)))
Evaluation took:
4.857 seconds of real time
0.750000 seconds of total run time (0.671875 user, 0.078125 system)
[ Real times consist of 0.993 seconds GC time, and 3.864 seconds non-GC time. ]
[ Run times consist of 0.140 seconds GC time, and 0.610 seconds non-GC time. ]
15.44% CPU
10,258,086,921 processor cycles
3,837,231,136 bytes consed
Just about 1.5s cut off though the real win is looking at how much less work the GC is doing as well as the reduction in bytes consed.
What's next?
Probably looking into making string parsing a little better. The idea is to extract index ranges instead of substrings in hopes that this reduces memory and saves me some more time.