node-low-level.hl.rkt - www - Unnamed repository; edit this file 'description' to name the repository.

node-low-level.hl.rkt (18168B)
      1 #lang hyper-literate typed/racket/base #:no-require-lang #:no-auto-require
      2 @(require scribble-enhanced/doc
      3           racket/require
      4           hyper-literate
      5           (for-label racket/format
      6                      racket/promise
      7                      racket/list
      8                      type-expander
      9                      (except-in (subtract-in typed/racket/base type-expander)
     10                                 values)
     11                      (only-in racket/base values)
     12                      (subtract-in racket/contract typed/racket/base)
     13                      phc-toolkit
     14                      phc-toolkit/untyped-only
     15                      remember))
     16 @(unless-preexpanding
     17   (require (for-label (submod ".."))))
     18 @doc-lib-setup
     19 
     20 @title[#:style manual-doc-style
     21        #:tag "node-low-level"
     22        #:tag-prefix "phc-adt/node-low-level"
     23        ]{Implementation of nodes: printing and equality}
     24 
     25 @(chunks-toc-prefix
     26   '("(lib phc-adt/scribblings/phc-adt-implementation.scrbl)"
     27     "phc-adt/node-low-level"))
     28 
     29 This section discusses the implementation of @tc[prop:custom-write] and
     30 @tc[prop:equal+hash] for nodes.
     31 
     32 @(table-of-contents)
     33 
     34 @section{Printing nodes}
     35 
     36 To avoid printing large and confusing swathes of data when a node is displayed,
     37 we only print its constituents up to a certain depth. The
     38 @tech[#:doc '(lib "scribblings/guide/guide.scrbl")]{parameter}
     39 @tc[write-node-depth] controls the depth for printing nested nodes.
     40 
     41 @CHUNK[<write-node-depth>
     42        (define write-node-depth (make-parameter 1))]
     43 
     44 The @tc[make-node-writer] macro expands to a procedure which prints a node with
     45 the given name and fields. If the @racket[write-node-depth] is @racket[0], then
     46 the contents of the node are elided, and only its name is printed, so that the
     47 resulting printed representation is @racket["(node name …)"] with an actual
     48 ellipsis character.
     49 
     50 @CHUNK[<node-custom-write>
     51        (define-syntax/parse (make-node-writer pid name fieldᵢ …)
     52          #'(λ (self out mode)
     53              (if (> (write-node-depth) 0)
     54                  (parameterize ([write-node-depth (sub1 (write-node-depth))])
     55                    (fprintf out
     56                             "(node ~a ~a)"
     57                             'name
     58                             (string-join (list <format-field> …) " ")))
     59                  (fprintf out "(node ~a …)" 'name))))]
     60 
     61 Each field is formatted as @tc[[fieldᵢ valueᵢ]]. Copy-pasting the whole printed
     62 form will not form a valid expression which would be @tc[equal?] to the
     63 original. This limitation is deliberate: a node will often refer to many other
     64 nodes, and a stand-alone representation of such a node would result in a very
     65 large printed form. Instead, the user should call the @tc[serialize-graph]
     66 macro, which will produce a complete, canonical @note{The representation is
     67  canonical so long as unordered sets or hash tables are not used as part of the
     68  node's contents. In that case, the printed form is canonical modulo the order
     69  of elements within the set or hash table. Once executed, it will nevertheless
     70  produce a node which is @racket[equal?] to the original.} and self-contained
     71 representation of the node.
     72 
     73 @chunk[<format-field>
     74        (format "[~a ~a]" 'fieldᵢ (force ((struct-accessor pid fieldᵢ) self)))]
     75 
     76 @section{Comparing and hashing nodes}
     77 
     78 Nodes are represented like tagged structures, but contain an extra @tc[raw]
     79 field. The @tc[raw] field contains a low-level representation of the node, which
     80 is used to implement node equality. The low-level representation uses the
     81 @tc[raw-node] Racket @racket[struct]. It contains two fields, @tc[database] and
     82 @tc[index]. The first is the database of nodes, as created by the graph
     83 construction macro. It contains one vector of nodes per node type. The second is
     84 a logical pointer into that database, consisting of the node's type's name,
     85 represented as a symbol, and an offset within the corresponding vector,
     86 represented as an @tc[Index].
     87 
     88 @chunk[<raw-node>
     89        (struct/props (D I) raw-node ([database : D] [index : I]) #:transparent
     90                      <raw-node-equality>)]
     91 
     92 A regular with-promises node can have several in-memory representations which
     93 are not pointer-equal. This is due to the fact that the contents of node fields
     94 are wrapped with promises, and accessing the node via two distinct paths will
     95 yield two copies, each with fresh promises. We therefore use the @tc[raw-node]
     96 as a proxy for pointer equality: we know for sure that two nodes are exactly the
     97 same if the @tc[database] and @tc[index] is the same for both nodes.
     98 
     99 @chunk[<raw-node-equality>
    100        #:property prop:equal+hash
    101        (list (λ (a b r)
    102                (and (raw-node? a)
    103                     (raw-node? b)
    104                     (eq? (raw-node-database a) (raw-node-database b))
    105                     (equal? (raw-node-index a) (raw-node-index b))))
    106              (λ (a r)
    107                (bitwise-xor (eq-hash-code (raw-node-database a))
    108                             (r (raw-node-index a))))
    109              (λ (a r)
    110                (bitwise-xor (eq-hash-code (raw-node-database a))
    111                             (r (raw-node-index a)))))]
    112 
    113 The following function can then be used to test if two nodes are the same, based
    114 on the contents of their @tc[raw] field:
    115 
    116 @chunk[<same-node?>
    117        (define (same-node? a b)
    118          (and ((struct-predicate node-id) a)
    119               ((struct-predicate node-id) b)
    120               (equal? ((struct-accessor node-id raw) a)
    121                       ((struct-accessor node-id raw) b))))]
    122 
    123 To detect cycles within the graph while implementing node equality, we use a
    124 global hash table tracking which nodes have already been visited.
    125 
    126 @chunk[<seen-hash-table>
    127        (define seen-nodes
    128          : (Parameterof (U #f (HashTable (raw-node Any Any) Any)))
    129          (make-parameter #f))]
    130 
    131 The current implementation uses a mutable hash table. It is only initialised
    132 when @tc[equal?] starts comparing two nodes, so that references to nodes are not
    133 kept once @tc[equal?] finished running. However, in theory, an immutable hash
    134 table could be threaded through all the recursive calls to @tc[equal?].
    135 Unfortunately, the recursive equality function supplied by Racket when
    136 implementing @tc[prop:equal+hash] does not accept an extra parameter to thread
    137 state throughout the recursion. It would therefore be necessary to re-implement
    138 the algorithm used by Racket's @tc[equal?] as described by
    139 @cite[adams2008scheme-equality] tailored to the comparison of data structures
    140 with high-level logical cycles. To be correct, such a re-implementation would
    141 however need to access the @tc[prop:equal+hash] property of other structs, but
    142 Racket provides no public predicate or accessor for that property. Therefore,
    143 although it would, in theory, be possible to implement node equality without
    144 mutable state, Racket's library does not offer the primitives needed to build
    145 it. We therefore settle on using a global, mutable hash table, which will exist
    146 only during the execution of @tc[equal?].
    147 
    148 @chunk[<node-equal+hash>
    149        (define-syntax/parse
    150            (make-node-comparer common-id node-id name fieldᵢ …)
    151          (define-temp-ids "~a/τ" (fieldᵢ …))
    152          #'(let ()
    153              <same-node?>
    154              <find-in-table>
    155              <node-hash>
    156              (list <node-equal?>
    157                    <node-equal-hash-code>
    158                    <node-equal-secondary-hash-code>)))]
    159 
    160 @subsection{Hashing nodes}
    161 
    162 @tc[equal-hash-code] and @tc[equal-secondary-hash-code] are implemented via
    163 a single function @tc[node-hash], the only difference being the function used to
    164 recursively compute the hash of sub-elements.
    165 
    166 @chunk[<node-equal-hash-code>
    167        (λ (a rec-equal-hash-code)
    168          (node-hash a rec-equal-hash-code))]
    169 
    170 @chunk[<node-equal-secondary-hash-code>
    171        (λ (a rec-equal-secondary-hash-code)
    172          (node-hash a rec-equal-secondary-hash-code))]
    173 
    174 It would be desirable to implement hashing in the following way: if the current
    175 node is already present in a hash table of seen nodes, but is not @tc[eq?] to
    176 that copy, then the racket hash function is called on the already-seen node.
    177 Otherwise, if the node has never been seen, or if it is @tc[eq?] to the seen
    178 node, the hash code is computed.
    179 
    180 The problem with this approach is that it introduces an intermediate recursive
    181 call to Racket's hashing function. When Racket's hashing function is applied to
    182 a structure with the @tc[prop:equal+hash] property, it does @emph{not}
    183 return the result of the struct's hash implementation unmodified.
    184 
    185 For example, the code below implements a struct @tc[s] with no fields, which
    186 computes its hash code by recursively calling Racket's hashing function on other
    187 (unique) instances of @tc[s], and returns the constant @tc[1] at a certain
    188 depth. The custom hashing function does not alter in any way the result returned
    189 by Racket's hashing function, however we can see that the hash for the same
    190 instance of @tc[s] depends on the number of recursive calls to Racket's hashing
    191 function @tc[r]. This simple experiment seems to suggest that Racket adds
    192 @tc[50] at each step, but this is not something that can be relied upon.
    193 
    194 @(require scribble/eval)
    195 @defs+int[
    196  {(define recursion-depth (make-parameter #f))
    197   (struct s (x) #:transparent
    198     #:property prop:equal+hash
    199     (list (λ (a b r) (error "Not implemented"))
    200           (λ (a r)
    201             (if (= 0 (recursion-depth))
    202                 1
    203                 (parameterize ([recursion-depth (sub1 (recursion-depth))])
    204                   (r (s (gensym))))))
    205           (λ (a r) (error "Not implemented"))))
    206   (define s-instance (s 'x))}
    207  (parameterize ([recursion-depth 0])
    208    (equal-hash-code s-instance))
    209  (parameterize ([recursion-depth 1])
    210    (equal-hash-code s-instance))
    211  (parameterize ([recursion-depth 2])
    212    (equal-hash-code s-instance))]
    213 
    214 Since the order of traversal of the nodes is not fixed in the presence of sets
    215 and hash tables, we need to make sure that the recursion depth at which a node's
    216 hash is computed is constant. We achieve this by @emph{always} calling Racket's
    217 hash function on the already-seen node from the hash table, even if was inserted
    218 just now. To distinguish between the current node and the already-seen node from
    219 the hash table, we remove the contents of the node's @tc[raw] field, and replace
    220 them with a special marker.
    221 
    222 @chunk[<node-hash>
    223        (: node-hash (∀ (fieldᵢ/τ …)
    224                        (→ (node-id fieldᵢ/τ … Any Any) (→ Any Integer) Integer)))
    225        (define (node-hash nd racket-recur-hash)
    226          (if (eq? (raw-node-database ((struct-accessor node-id raw) nd))
    227                   'unique-copy)
    228              <compute-hash>
    229              <hash-init-table-and-recur>))]
    230 
    231 When the node's @tc[raw] field does not indicate @tc['unique-copy], we first
    232 initialise the hash table if needed, then recursively call
    233 @tc[racket-recur-hash] on the unique copy of the node:
    234 
    235 @chunk[<hash-init-table-and-recur>
    236        (let ([seen-table (or (seen-nodes)
    237                              ((inst make-hash (raw-node Any Any) Any)))])
    238          (parameterize ([seen-nodes seen-table])
    239            (racket-recur-hash (find-in-table seen-table nd))))]
    240 
    241 To obtain the unique copy of the node, we look it up in the hash table, creating
    242 it and adding it to the hash table if the current node is not already present
    243 there:
    244 
    245 @chunk[<find-in-table>
    246        (: find-in-table (∀ (fieldᵢ/τ …)
    247                            (→ (HashTable (raw-node Any Any) Any)
    248                               (node-id fieldᵢ/τ … Any Any)
    249                               Any)))
    250        (define (find-in-table seen-table nd)
    251          (hash-ref! seen-table
    252                     ((struct-accessor node-id raw) nd)
    253                     (λ () <make-unique-copy-node>)))]
    254 
    255 To create a unique copy of the node, we create a new instance of the node's
    256 struct, and copy over all the fields except for the @tc[raw] field, whose value
    257 becomes @tc['unique-copy].
    258 
    259 @chunk[<make-unique-copy-node>
    260        ((struct-constructor node-id) ((struct-accessor common-id fieldᵢ) nd)
    261                                      …
    262                                      (raw-node 'unique-copy 'unique-copy))]
    263 
    264 The hash code is finally computed by combining the hash code for each field's
    265 contents (after forcing it). The node's tag name is also hashed, and added to
    266 the mix.
    267 
    268 @chunk[<compute-hash>
    269        (combine-hash-codes
    270         (racket-recur-hash 'name)
    271         (racket-recur-hash (force ((struct-accessor common-id fieldᵢ) nd)))
    272         …)]
    273 
    274 To combine hash codes, we simply compute their @elem[#:style 'tt]{xor}. Later
    275 versions of this library may use more sophisticated mechanisms.
    276 
    277 @chunk[<combine-hash-codes>
    278        (: combine-hash-codes (→ Integer * Integer))
    279        (define (combine-hash-codes . hashes)
    280          (apply bitwise-xor hashes))]
    281 
    282 @subsection{Caching node equality}
    283 
    284 We provide a mechanism at run-time to cache the result of equality tests
    285 within a limited dynamic scope. The graph generation procedure can coalesce
    286 nodes which are @racket[equal?], which means that it needs to perform a
    287 significant number of equality comparisons, and can therefore benefit from
    288 caching the result of inner equality tests during the execution of the
    289 coalescing operation.
    290 
    291 @chunk[<equality-cache>
    292        (define equality-cache
    293          : (Parameterof (U #f (HashTable (Pairof (raw-node Any Any)
    294                                                  (raw-node Any Any))
    295                                          Boolean)))
    296          (make-parameter #f))]
    297 
    298 The @racket[with-node-equality-cache] form executes its body while enabling
    299 caching of the result of direct and recursive calls to @racket[equal?] on
    300 nodes.
    301 
    302 @chunk[<with-node-equality-cache>
    303        (define-syntax-rule (with-node-equality-cache . body)
    304          (parameterize ([equality-cache (or (equality-cache)
    305                                             <make-equality-cache>)])
    306            . body))]
    307 
    308 If necessary, a new equality cache is created, unless
    309 @racket[with-node-equality-cache] is used within the dynamic extent of another
    310 use of itself.
    311 
    312 @chunk[<make-equality-cache>
    313        ((inst make-hash (Pairof (raw-node Any Any) (raw-node Any Any)) Any))]
    314 
    315 When comparing two nodes, we first check whether an equality cache is
    316 installed. If so, we attempt to query the cache, and memoize the result of the
    317 comparison when the pair of values is not already in the cache.
    318 
    319 @chunk[<memoize-equality>
    320        (λ (result-thunk)
    321          (let ([e-cache (equality-cache)])
    322            (if e-cache
    323                (cond
    324                  [(hash-has-key? e-cache (cons a-raw b-raw))
    325                   (hash-ref e-cache (cons a-raw b-raw))]
    326                  [(hash-has-key? e-cache (cons b-raw a-raw))
    327                   (hash-ref e-cache (cons b-raw a-raw))]
    328                  [else
    329                   (let ([result (result-thunk)])
    330                     (hash-set! e-cache (cons a-raw b-raw) result)
    331                     result)])
    332                (result-thunk))))]
    333 
    334 @subsection{Comparing nodes for equality}
    335 
    336 We implement equality following the same architecture as for hash codes, but
    337 check that both nodes are already unique copies. In addition, the implementation
    338 of @tc[equal?] checks that both values are of the node's type.
    339 
    340 @chunk[<node-equal?>
    341        (λ (a b racket-recur-equal?)
    342          (and ((struct-predicate node-id) a)
    343               ((struct-predicate node-id) b)
    344               (let ([a-raw ((struct-accessor node-id raw) a)]
    345                     [b-raw ((struct-accessor node-id raw) b)])
    346                 (if (and (eq? (raw-node-database a-raw) 'unique-copy)
    347                          (eq? (raw-node-database b-raw) 'unique-copy))
    348                     <compare>
    349                     (or (same-node? a b)
    350                         (<memoize-equality>
    351                          (λ () <equality-init-table-and-recur>)))))))]
    352 
    353 When either or both of the node's @tc[raw] field do not indicate
    354 @tc['unique-copy], we first initialise the hash table if needed, then
    355 recursively call @tc[racket-recur-hash] on the unique copy of each node:
    356 
    357 @chunk[<equality-init-table-and-recur>
    358        (let ([seen-table (or (seen-nodes)
    359                              ((inst make-hash (raw-node Any Any) Any)))])
    360          (parameterize ([seen-nodes seen-table])
    361            (racket-recur-equal? (find-in-table seen-table a)
    362                                 (find-in-table seen-table b))))]
    363 
    364 The nodes are compared pointwise, checking each pair of fields for equality,
    365 after forcing both:
    366 
    367 @chunk[<compare>
    368        (and (racket-recur-equal? (force ((struct-accessor common-id fieldᵢ) a))
    369                                  (force ((struct-accessor common-id fieldᵢ) b)))
    370             …)]
    371 
    372 @chunk[<*>
    373        (require racket/promise
    374                 racket/string
    375                 racket/require
    376                 phc-toolkit
    377                 remember
    378                 typed-struct-props
    379                 (for-syntax racket/base
    380                             racket/syntax
    381                             racket/list
    382                             racket/set
    383                             racket/format
    384                             (subtract-in syntax/stx phc-toolkit/untyped)
    385                             syntax/parse
    386                             phc-toolkit/untyped))
    387        
    388        (provide make-node-comparer
    389                 make-node-writer
    390                 raw-node
    391                 write-node-depth
    392                 with-node-equality-cache)
    393 
    394        <equality-cache>
    395        <with-node-equality-cache>
    396        <seen-hash-table>
    397        <write-node-depth>
    398        <node-custom-write>
    399        <raw-node>
    400        <combine-hash-codes>
    401        <node-equal+hash>]
    402 
    403 @define[adams2008scheme-equality
    404         (string-append "Efficient nondestructive equality checking for trees"
    405                        " and graphs, Adams and Dybvig, 2008")]
    406 @bibliography[
    407  @bib-entry[#:key adams2008scheme-equality
    408             #:title @list{Efficient nondestructive equality checking for trees
    409               and graphs in @emph{ACM Sigplan Notices} (Vol. 43, No. 9)
    410               pp. 179–188}
    411             #:date "2008"
    412             #:author "Michael D. Adams and R. Kent Dybvig"
    413             #:url "http://www.cs.indiana.edu/~dyb/pubs/equal.pdf"]]
	www Unnamed repository; edit this file 'description' to name the repository.
	Log \| Files \| Refs \| README