node-low-level.hl.rkt (18168B)
1 #lang hyper-literate typed/racket/base #:no-require-lang #:no-auto-require 2 @(require scribble-enhanced/doc 3 racket/require 4 hyper-literate 5 (for-label racket/format 6 racket/promise 7 racket/list 8 type-expander 9 (except-in (subtract-in typed/racket/base type-expander) 10 values) 11 (only-in racket/base values) 12 (subtract-in racket/contract typed/racket/base) 13 phc-toolkit 14 phc-toolkit/untyped-only 15 remember)) 16 @(unless-preexpanding 17 (require (for-label (submod "..")))) 18 @doc-lib-setup 19 20 @title[#:style manual-doc-style 21 #:tag "node-low-level" 22 #:tag-prefix "phc-adt/node-low-level" 23 ]{Implementation of nodes: printing and equality} 24 25 @(chunks-toc-prefix 26 '("(lib phc-adt/scribblings/phc-adt-implementation.scrbl)" 27 "phc-adt/node-low-level")) 28 29 This section discusses the implementation of @tc[prop:custom-write] and 30 @tc[prop:equal+hash] for nodes. 31 32 @(table-of-contents) 33 34 @section{Printing nodes} 35 36 To avoid printing large and confusing swathes of data when a node is displayed, 37 we only print its constituents up to a certain depth. The 38 @tech[#:doc '(lib "scribblings/guide/guide.scrbl")]{parameter} 39 @tc[write-node-depth] controls the depth for printing nested nodes. 40 41 @CHUNK[<write-node-depth> 42 (define write-node-depth (make-parameter 1))] 43 44 The @tc[make-node-writer] macro expands to a procedure which prints a node with 45 the given name and fields. If the @racket[write-node-depth] is @racket[0], then 46 the contents of the node are elided, and only its name is printed, so that the 47 resulting printed representation is @racket["(node name …)"] with an actual 48 ellipsis character. 49 50 @CHUNK[<node-custom-write> 51 (define-syntax/parse (make-node-writer pid name fieldᵢ …) 52 #'(λ (self out mode) 53 (if (> (write-node-depth) 0) 54 (parameterize ([write-node-depth (sub1 (write-node-depth))]) 55 (fprintf out 56 "(node ~a ~a)" 57 'name 58 (string-join (list <format-field> …) " "))) 59 (fprintf out "(node ~a …)" 'name))))] 60 61 Each field is formatted as @tc[[fieldᵢ valueᵢ]]. Copy-pasting the whole printed 62 form will not form a valid expression which would be @tc[equal?] to the 63 original. This limitation is deliberate: a node will often refer to many other 64 nodes, and a stand-alone representation of such a node would result in a very 65 large printed form. Instead, the user should call the @tc[serialize-graph] 66 macro, which will produce a complete, canonical @note{The representation is 67 canonical so long as unordered sets or hash tables are not used as part of the 68 node's contents. In that case, the printed form is canonical modulo the order 69 of elements within the set or hash table. Once executed, it will nevertheless 70 produce a node which is @racket[equal?] to the original.} and self-contained 71 representation of the node. 72 73 @chunk[<format-field> 74 (format "[~a ~a]" 'fieldᵢ (force ((struct-accessor pid fieldᵢ) self)))] 75 76 @section{Comparing and hashing nodes} 77 78 Nodes are represented like tagged structures, but contain an extra @tc[raw] 79 field. The @tc[raw] field contains a low-level representation of the node, which 80 is used to implement node equality. The low-level representation uses the 81 @tc[raw-node] Racket @racket[struct]. It contains two fields, @tc[database] and 82 @tc[index]. The first is the database of nodes, as created by the graph 83 construction macro. It contains one vector of nodes per node type. The second is 84 a logical pointer into that database, consisting of the node's type's name, 85 represented as a symbol, and an offset within the corresponding vector, 86 represented as an @tc[Index]. 87 88 @chunk[<raw-node> 89 (struct/props (D I) raw-node ([database : D] [index : I]) #:transparent 90 <raw-node-equality>)] 91 92 A regular with-promises node can have several in-memory representations which 93 are not pointer-equal. This is due to the fact that the contents of node fields 94 are wrapped with promises, and accessing the node via two distinct paths will 95 yield two copies, each with fresh promises. We therefore use the @tc[raw-node] 96 as a proxy for pointer equality: we know for sure that two nodes are exactly the 97 same if the @tc[database] and @tc[index] is the same for both nodes. 98 99 @chunk[<raw-node-equality> 100 #:property prop:equal+hash 101 (list (λ (a b r) 102 (and (raw-node? a) 103 (raw-node? b) 104 (eq? (raw-node-database a) (raw-node-database b)) 105 (equal? (raw-node-index a) (raw-node-index b)))) 106 (λ (a r) 107 (bitwise-xor (eq-hash-code (raw-node-database a)) 108 (r (raw-node-index a)))) 109 (λ (a r) 110 (bitwise-xor (eq-hash-code (raw-node-database a)) 111 (r (raw-node-index a)))))] 112 113 The following function can then be used to test if two nodes are the same, based 114 on the contents of their @tc[raw] field: 115 116 @chunk[<same-node?> 117 (define (same-node? a b) 118 (and ((struct-predicate node-id) a) 119 ((struct-predicate node-id) b) 120 (equal? ((struct-accessor node-id raw) a) 121 ((struct-accessor node-id raw) b))))] 122 123 To detect cycles within the graph while implementing node equality, we use a 124 global hash table tracking which nodes have already been visited. 125 126 @chunk[<seen-hash-table> 127 (define seen-nodes 128 : (Parameterof (U #f (HashTable (raw-node Any Any) Any))) 129 (make-parameter #f))] 130 131 The current implementation uses a mutable hash table. It is only initialised 132 when @tc[equal?] starts comparing two nodes, so that references to nodes are not 133 kept once @tc[equal?] finished running. However, in theory, an immutable hash 134 table could be threaded through all the recursive calls to @tc[equal?]. 135 Unfortunately, the recursive equality function supplied by Racket when 136 implementing @tc[prop:equal+hash] does not accept an extra parameter to thread 137 state throughout the recursion. It would therefore be necessary to re-implement 138 the algorithm used by Racket's @tc[equal?] as described by 139 @cite[adams2008scheme-equality] tailored to the comparison of data structures 140 with high-level logical cycles. To be correct, such a re-implementation would 141 however need to access the @tc[prop:equal+hash] property of other structs, but 142 Racket provides no public predicate or accessor for that property. Therefore, 143 although it would, in theory, be possible to implement node equality without 144 mutable state, Racket's library does not offer the primitives needed to build 145 it. We therefore settle on using a global, mutable hash table, which will exist 146 only during the execution of @tc[equal?]. 147 148 @chunk[<node-equal+hash> 149 (define-syntax/parse 150 (make-node-comparer common-id node-id name fieldᵢ …) 151 (define-temp-ids "~a/τ" (fieldᵢ …)) 152 #'(let () 153 <same-node?> 154 <find-in-table> 155 <node-hash> 156 (list <node-equal?> 157 <node-equal-hash-code> 158 <node-equal-secondary-hash-code>)))] 159 160 @subsection{Hashing nodes} 161 162 @tc[equal-hash-code] and @tc[equal-secondary-hash-code] are implemented via 163 a single function @tc[node-hash], the only difference being the function used to 164 recursively compute the hash of sub-elements. 165 166 @chunk[<node-equal-hash-code> 167 (λ (a rec-equal-hash-code) 168 (node-hash a rec-equal-hash-code))] 169 170 @chunk[<node-equal-secondary-hash-code> 171 (λ (a rec-equal-secondary-hash-code) 172 (node-hash a rec-equal-secondary-hash-code))] 173 174 It would be desirable to implement hashing in the following way: if the current 175 node is already present in a hash table of seen nodes, but is not @tc[eq?] to 176 that copy, then the racket hash function is called on the already-seen node. 177 Otherwise, if the node has never been seen, or if it is @tc[eq?] to the seen 178 node, the hash code is computed. 179 180 The problem with this approach is that it introduces an intermediate recursive 181 call to Racket's hashing function. When Racket's hashing function is applied to 182 a structure with the @tc[prop:equal+hash] property, it does @emph{not} 183 return the result of the struct's hash implementation unmodified. 184 185 For example, the code below implements a struct @tc[s] with no fields, which 186 computes its hash code by recursively calling Racket's hashing function on other 187 (unique) instances of @tc[s], and returns the constant @tc[1] at a certain 188 depth. The custom hashing function does not alter in any way the result returned 189 by Racket's hashing function, however we can see that the hash for the same 190 instance of @tc[s] depends on the number of recursive calls to Racket's hashing 191 function @tc[r]. This simple experiment seems to suggest that Racket adds 192 @tc[50] at each step, but this is not something that can be relied upon. 193 194 @(require scribble/eval) 195 @defs+int[ 196 {(define recursion-depth (make-parameter #f)) 197 (struct s (x) #:transparent 198 #:property prop:equal+hash 199 (list (λ (a b r) (error "Not implemented")) 200 (λ (a r) 201 (if (= 0 (recursion-depth)) 202 1 203 (parameterize ([recursion-depth (sub1 (recursion-depth))]) 204 (r (s (gensym)))))) 205 (λ (a r) (error "Not implemented")))) 206 (define s-instance (s 'x))} 207 (parameterize ([recursion-depth 0]) 208 (equal-hash-code s-instance)) 209 (parameterize ([recursion-depth 1]) 210 (equal-hash-code s-instance)) 211 (parameterize ([recursion-depth 2]) 212 (equal-hash-code s-instance))] 213 214 Since the order of traversal of the nodes is not fixed in the presence of sets 215 and hash tables, we need to make sure that the recursion depth at which a node's 216 hash is computed is constant. We achieve this by @emph{always} calling Racket's 217 hash function on the already-seen node from the hash table, even if was inserted 218 just now. To distinguish between the current node and the already-seen node from 219 the hash table, we remove the contents of the node's @tc[raw] field, and replace 220 them with a special marker. 221 222 @chunk[<node-hash> 223 (: node-hash (∀ (fieldᵢ/τ …) 224 (→ (node-id fieldᵢ/τ … Any Any) (→ Any Integer) Integer))) 225 (define (node-hash nd racket-recur-hash) 226 (if (eq? (raw-node-database ((struct-accessor node-id raw) nd)) 227 'unique-copy) 228 <compute-hash> 229 <hash-init-table-and-recur>))] 230 231 When the node's @tc[raw] field does not indicate @tc['unique-copy], we first 232 initialise the hash table if needed, then recursively call 233 @tc[racket-recur-hash] on the unique copy of the node: 234 235 @chunk[<hash-init-table-and-recur> 236 (let ([seen-table (or (seen-nodes) 237 ((inst make-hash (raw-node Any Any) Any)))]) 238 (parameterize ([seen-nodes seen-table]) 239 (racket-recur-hash (find-in-table seen-table nd))))] 240 241 To obtain the unique copy of the node, we look it up in the hash table, creating 242 it and adding it to the hash table if the current node is not already present 243 there: 244 245 @chunk[<find-in-table> 246 (: find-in-table (∀ (fieldᵢ/τ …) 247 (→ (HashTable (raw-node Any Any) Any) 248 (node-id fieldᵢ/τ … Any Any) 249 Any))) 250 (define (find-in-table seen-table nd) 251 (hash-ref! seen-table 252 ((struct-accessor node-id raw) nd) 253 (λ () <make-unique-copy-node>)))] 254 255 To create a unique copy of the node, we create a new instance of the node's 256 struct, and copy over all the fields except for the @tc[raw] field, whose value 257 becomes @tc['unique-copy]. 258 259 @chunk[<make-unique-copy-node> 260 ((struct-constructor node-id) ((struct-accessor common-id fieldᵢ) nd) 261 … 262 (raw-node 'unique-copy 'unique-copy))] 263 264 The hash code is finally computed by combining the hash code for each field's 265 contents (after forcing it). The node's tag name is also hashed, and added to 266 the mix. 267 268 @chunk[<compute-hash> 269 (combine-hash-codes 270 (racket-recur-hash 'name) 271 (racket-recur-hash (force ((struct-accessor common-id fieldᵢ) nd))) 272 …)] 273 274 To combine hash codes, we simply compute their @elem[#:style 'tt]{xor}. Later 275 versions of this library may use more sophisticated mechanisms. 276 277 @chunk[<combine-hash-codes> 278 (: combine-hash-codes (→ Integer * Integer)) 279 (define (combine-hash-codes . hashes) 280 (apply bitwise-xor hashes))] 281 282 @subsection{Caching node equality} 283 284 We provide a mechanism at run-time to cache the result of equality tests 285 within a limited dynamic scope. The graph generation procedure can coalesce 286 nodes which are @racket[equal?], which means that it needs to perform a 287 significant number of equality comparisons, and can therefore benefit from 288 caching the result of inner equality tests during the execution of the 289 coalescing operation. 290 291 @chunk[<equality-cache> 292 (define equality-cache 293 : (Parameterof (U #f (HashTable (Pairof (raw-node Any Any) 294 (raw-node Any Any)) 295 Boolean))) 296 (make-parameter #f))] 297 298 The @racket[with-node-equality-cache] form executes its body while enabling 299 caching of the result of direct and recursive calls to @racket[equal?] on 300 nodes. 301 302 @chunk[<with-node-equality-cache> 303 (define-syntax-rule (with-node-equality-cache . body) 304 (parameterize ([equality-cache (or (equality-cache) 305 <make-equality-cache>)]) 306 . body))] 307 308 If necessary, a new equality cache is created, unless 309 @racket[with-node-equality-cache] is used within the dynamic extent of another 310 use of itself. 311 312 @chunk[<make-equality-cache> 313 ((inst make-hash (Pairof (raw-node Any Any) (raw-node Any Any)) Any))] 314 315 When comparing two nodes, we first check whether an equality cache is 316 installed. If so, we attempt to query the cache, and memoize the result of the 317 comparison when the pair of values is not already in the cache. 318 319 @chunk[<memoize-equality> 320 (λ (result-thunk) 321 (let ([e-cache (equality-cache)]) 322 (if e-cache 323 (cond 324 [(hash-has-key? e-cache (cons a-raw b-raw)) 325 (hash-ref e-cache (cons a-raw b-raw))] 326 [(hash-has-key? e-cache (cons b-raw a-raw)) 327 (hash-ref e-cache (cons b-raw a-raw))] 328 [else 329 (let ([result (result-thunk)]) 330 (hash-set! e-cache (cons a-raw b-raw) result) 331 result)]) 332 (result-thunk))))] 333 334 @subsection{Comparing nodes for equality} 335 336 We implement equality following the same architecture as for hash codes, but 337 check that both nodes are already unique copies. In addition, the implementation 338 of @tc[equal?] checks that both values are of the node's type. 339 340 @chunk[<node-equal?> 341 (λ (a b racket-recur-equal?) 342 (and ((struct-predicate node-id) a) 343 ((struct-predicate node-id) b) 344 (let ([a-raw ((struct-accessor node-id raw) a)] 345 [b-raw ((struct-accessor node-id raw) b)]) 346 (if (and (eq? (raw-node-database a-raw) 'unique-copy) 347 (eq? (raw-node-database b-raw) 'unique-copy)) 348 <compare> 349 (or (same-node? a b) 350 (<memoize-equality> 351 (λ () <equality-init-table-and-recur>)))))))] 352 353 When either or both of the node's @tc[raw] field do not indicate 354 @tc['unique-copy], we first initialise the hash table if needed, then 355 recursively call @tc[racket-recur-hash] on the unique copy of each node: 356 357 @chunk[<equality-init-table-and-recur> 358 (let ([seen-table (or (seen-nodes) 359 ((inst make-hash (raw-node Any Any) Any)))]) 360 (parameterize ([seen-nodes seen-table]) 361 (racket-recur-equal? (find-in-table seen-table a) 362 (find-in-table seen-table b))))] 363 364 The nodes are compared pointwise, checking each pair of fields for equality, 365 after forcing both: 366 367 @chunk[<compare> 368 (and (racket-recur-equal? (force ((struct-accessor common-id fieldᵢ) a)) 369 (force ((struct-accessor common-id fieldᵢ) b))) 370 …)] 371 372 @chunk[<*> 373 (require racket/promise 374 racket/string 375 racket/require 376 phc-toolkit 377 remember 378 typed-struct-props 379 (for-syntax racket/base 380 racket/syntax 381 racket/list 382 racket/set 383 racket/format 384 (subtract-in syntax/stx phc-toolkit/untyped) 385 syntax/parse 386 phc-toolkit/untyped)) 387 388 (provide make-node-comparer 389 make-node-writer 390 raw-node 391 write-node-depth 392 with-node-equality-cache) 393 394 <equality-cache> 395 <with-node-equality-cache> 396 <seen-hash-table> 397 <write-node-depth> 398 <node-custom-write> 399 <raw-node> 400 <combine-hash-codes> 401 <node-equal+hash>] 402 403 @define[adams2008scheme-equality 404 (string-append "Efficient nondestructive equality checking for trees" 405 " and graphs, Adams and Dybvig, 2008")] 406 @bibliography[ 407 @bib-entry[#:key adams2008scheme-equality 408 #:title @list{Efficient nondestructive equality checking for trees 409 and graphs in @emph{ACM Sigplan Notices} (Vol. 43, No. 9) 410 pp. 179–188} 411 #:date "2008" 412 #:author "Michael D. Adams and R. Kent Dybvig" 413 #:url "http://www.cs.indiana.edu/~dyb/pubs/equal.pdf"]]