Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax for Dictionary types #3355

Open
2 tasks done
andrevidela opened this issue Jul 26, 2024 · 7 comments
Open
2 tasks done

Syntax for Dictionary types #3355

andrevidela opened this issue Jul 26, 2024 · 7 comments

Comments

@andrevidela
Copy link
Collaborator

andrevidela commented Jul 26, 2024

  • I have read CONTRIBUTING.md.
  • I have checked that there is no existing PR/issue about my proposal.

Summary

Develop new syntax to write dictionaries inspired from the list syntax sugar:

  • empty dictionary: [:=]
  • 1 key-value dictionary: [k := v]
  • multiple key-values Dictionary: [ k1 := v1, k2 := v2, ...]
  • At least one key-value in the dictionary: DCons k v ds

Motivation

Contemporary programming involves manipulation a common set of
data structures, like Lists, Strings, and numbers. One of them is dictionaries.

I will call Dictionary the general category of data structure that involve a key-value pair.
In industry, dictionaries are seen as an efficient key-value storage that indexes the value
in O(Log(n)) complexity. In idris, this datastructure is SortedMap.

In reality, there are many forms of key-value pair datastructures that
do not fulfill this performance promise but we would still like to use key-value pair syntax to represent them
those use-cases are:

  • List (a, b), lists of pairs
  • JSON objects
  • Generally deeply nested structures indexed by names, like XML/HTML
  • Configuration files formats, like toml and yaml

Because of their ubiquity, all those dictionary types should enjoy an easily accessible and
familiar syntax. This is further evidenced by other languages providing syntax sugar
specifically for dictionary types. Like we find in python {a : b}, Swift [a: b] and
Ruby {a => b}

The proposal

The dictionary syntax is developed as a desugaring step similar to the one used for lists. To
recap the list desugaring: Lists [a, b, c] are desugared into successive application of the
constructor :: (pronounced "cons") and Nil. Such that [a, b, c] desugars to
a :: (b :: (c :: Nil)).

The proposal is to allow the parser to accept syntax of the form [ a := b , c := d ].
This is meant to look like a list where terms on the left of := as assigned values to the right of
it. Another way to understand this syntax is to thing of it as record update but using square brackets
as enclosing symbols, rather than curly brackets.

Scope

The syntax sugar does not help with indexing dictionaries, dictionary update, or performance.
This proposal is only about syntax sugar to write terms that match key-value pair semantics.
They should allow pattern matching but not develop new pattern matching semantics to avoid
introducing complexity in case trees or in learning curve.

Examples

Empty dictionary: [:=]
Desugars to: DNil

Singleton dictionary: [ key := val ]
Desugars to: DCons key val DNil

Multiple key-values dictionary: [ k1 := v1 , k2 := v2 ]
Desugars to: DCons k1 v1 (DCons k2 v2 DNil)

Pattern matching:

case dict of
    -- match on empty dictionary
    [:=] => ...
    -- match on singleton dictionary 
    [ key := val ] => ...
    -- match on dictionary with two keys
    [ key1 := val1, key2 := val2 ] => ...
    -- match on at least one key
    DCons k v ds => ...

How to use it

Currently the constructor for JSON is Object : List (String, JSON) -> JSON and so writing JSON values looks like this:

value : JSON
value = Object [ ("key1", "value1") 
               , ("key2", Array ["value2", 3, "value4"])]
               ]

There are two problem, we always need to nest Object and Array and the second is that we use the full pair syntax
for key-value pairs in objects.
We can fix the Arrayand Object nesting by providing a suitable :: function, but we cannot remove the parenthesis
nesting without introducing a bespoke operator for pairing.

Instead, the above can be written in direct style by designing a DSL for objects-as-dictionaries:

value : JSON
value = [ "key1" := "value1"
        , "Key2" := ["value3", 3, "value4"]
        ]

The JSON DSL is implemented in the tests

Complex record libraries using a schema as index can be update to support the new dictionary syntax as demonstrated
here: https://github.com/andrevidela/idris-records/blob/dictionary-syntax/src/Records.idr#L78-L79

Technical implementation

This feature is implemented as a desugaring step where the parser is extended to parse lists of pairs of expressions separated by := in the same way as record updates but using square brackets instead.

A prototype implementation is here where you can see that the diff is fairly small and the changes to the syntax are also small since all the components already exist. It can be found in the draft #3356 .

Alternatives considered

Use [] for empty dictionary

This does work but it creates unwelcome ambiguities when mixing lists and dictionaries in the same data.

For example given a value of type JSON we cannot know if we are building an empty array or an empty object
if we write

empty : JSON
empty = []

and Nil stands both for empty dictionary and empty array.

To remove this ambiguity, I propose to use [:=] as a reserved token to refer to empty dictionaries.

We could use [:] or [=] which better mimic [>] and [<] but they do not reflect reflect the
syntax used for dictionaries.

Same notation, old constructors

Another option would be to desugar [k := v] into MkPair k v :: [] instead of DCons k v [].

This would work for the most common case but falls apart whenever there is a type dependency between
the key and the value, or the key-value pair and the rest of the data. For example, we could not implement
the indexed data :

-- A map guaranteed to contain the keys in the index.
data InMap : List String ->Type where
  Nil : InMap []
  Cons : (idx : String ** Elem idx ls) -> InMap xs -> InMap ls

This is because the constructor for the key/value pair is dependent, and MkPair is not.

Use a pairing operator

This problem already has a solution: Use list syntax with a custom operator to pair-up
key-values. If the pairing operator is called ::= the json example can be written:

user : JSON
user = Object [
  "name" ::= "Annie",
  "age" ::= 8,
  "power" ::= "fire"
  ]

Scala uses -> as a pairing operator:

val d = Map("name" -> "Annie", 
			"age" -> 8, 
			"power" -> "fire")

Haskell's aeson uses .= for pairing:

user : JSON
user = [ "name" .= "Annie"
       , "age" .= 8
       , "power" .= "fire"]

Other Idris libraries currently use ::=, :>, =:, :=: :->:and more.

The shortcoming of this approach is obvious by now: there is absolutely
no uniformity when building key-value pairs in Idris. However, in the absence
of a standard pairing operator, using := would render all uses of dictionaries uniform,
decreasing the cognitive overhead of learning a new operator for each bespoke dictionary
type.

Out of this, stems two counter points:

  • We already have a standard pairing operator ,
  • We actually do want to have multiple operators for different dictionary types

The current standard pairing operator is the Pair type, constructed in its type, and term,
with the syntax (_, _). One can use tuple sections to use it as a function (,). We can
also use the name Pair to build the type or MkPair to build terms of the aforementioned type.

The problem becomes apparent when attempting to use the standard constructor for pair
to build dictionaries: It is verbose, repetitive, and noisy:

user : JSON
user = [ ("name", "Annie")
       , ("age", 8)
       , ("power", "fire")
       ]

While only makes use of prelude constructors, it is not
a good user experience to type that much boilerplate. This argument is supported by
the fact that library designers have been deploying their own pairing operators to make
constructing dictionaries easier.

The second argument is strongest when applied with domain-specific knowledge. There
are uses of dictionaries where the use of := would be misleading and generally worse than
bespoke operators. For example a typing context is itself a form of dictionary, holding variables
as keys and types as values, but it is very much expected that we write them with a bespoke
operator to represent the fact that a variable is assigned a type, for example [ x :- t ].

The presence of this feature does not forbid the use of custom operators for such purposes,
and in fact, is completely backward compatible.

Do nothing

One last thing to consider is to do nothing. Other successful languages such as Java
Haskell or Rust do not have a bespoke dictionary syntax. This is clear evidence that such feature
is not necessary for wider adoption and its maintenance cost might exceed the overall
usage benefits we get from it. However, I personally think that the above argument about
uniformity are enough to justify the additional feature. As the prototype shows, the diff
for it is quite small, rendering the maintenance cost minimal, see #3356.

Conclusion

Dictionary syntax solidifies Idris as a modern programming language, enjoying the same ergonomics
as other programming environments that focus on user-friendliness and accessibility. What's more,
enabling a uniform syntax that accommodates for dictionaries, JSON, XML, dependent types all at once
represents an unprecedented opportunity to lead the way in programming language experience.

@mattpolzin
Copy link
Collaborator

mattpolzin commented Jul 26, 2024

I think the only bit of this that could provide opposing intuitions is pattern matching. One intuition is that the data structure is built inductively like everything else and therefore matching on one element is (arbitrarily) order-dependent. The other intuition is that dictionaries are not (generally) order dependent so pattern matching on one element means “this element is in the dictionary.” The latter is incorrect, of course.

This probably isn’t a problem that outweighs the benefits of pattern matching with the new syntax I don’t think, but it certainly is a consideration not shared by Lists.

@joelberkeley
Copy link
Contributor

joelberkeley commented Jul 26, 2024

Thanks for starting this in a well-thought out way. My thoughts:

  1. It might be confusing to have two uses of := which mean different things. One being assignment, the other being pairs. Particularly from the perspective of readability.
  2. A dictionary will typically be unique in its "keys". I can't think how that would be possible with a recursive data type and pattern matching on data constructors. For example, what would this be ["1" := 1, "1" := 2]? Would you get access to both "1"s for DCons k v (DCons k' v' _)? Or would one just use DCons as a function rather than a constructor here, and not allow pattern matching? Like
    export  -- not public
    data Map k v = ?whatever
    
    export
    DCons : k -> v -> Map k v -> Map k v
    DCons = -- overwrite or add k in Map
    I see no problem with this, as long as it's intentional.
  3. Does this allow for O(1) access/update hashmaps and in-place mutable maps, like stefan's linear arrays?

@andrevidela
Copy link
Collaborator Author

andrevidela commented Jul 26, 2024

@joelberkeley Thanks for your comments. For your concerns about 2 and 3, please refer to the "Scope" section:

The syntax sugar does not help with indexing dictionaries, dictionary update, or performance.
This proposal is only about syntax sugar to write terms that match key-value pair semantics.

Uniqueness of keys and dictionary indexing remains unchanged and is left as an implementation choice of the library writer.

@gallais
Copy link
Member

gallais commented Jul 26, 2024

@joelberkeley You can use induction-recursion to define a dictionary
together with a notion of key freshness and insist that all new keys
must be fresh.

import Data.So

record Distinct (x, y : String) where
  constructor MkDistinct
  runDistinct : So (x /= y)

data FreshDict : Type -> Type
isFresh : String -> FreshDict a -> Type

data FreshDict where
  DNil : FreshDict a
  DCons : (key : String) -> (value : a) ->
          (dict : FreshDict a) ->
          {auto 0 fresh : isFresh key dict} ->
          FreshDict a

isFresh x DNil = ()
isFresh x (DCons y _ dict) = (Distinct x y, isFresh x dict)

failing #"Can't find an implementation for Distinct "1" "1"."#

  test : FreshDict Nat
  test = DCons "1" 1
       $ DCons "1" 2
       $ DNil

@joelberkeley
Copy link
Contributor

joelberkeley commented Jul 26, 2024

Uniqueness of keys and dictionary indexing remains unchanged and is left as an implementation choice of the library writer.

Indeed, I mainly want to check if this functionality will lend itself well to these situations, esp when they strike me as a particularly common use-case.

@ohad
Copy link
Collaborator

ohad commented Jul 26, 2024

I like the alternative considered (using a pairing operator), I'm not so sure about the argument to reject it.

Would defining this syntax (::=, say, for concreteness) in a standard library implementation for dictionaries achieve the same goal without complicating the parser/desugaring/elaborator further?

Including it in a stdlib would signal it is the preferred/conventional syntax.
OTOH, implementing a specialised desugaring/elaboration will not help us avoid the proliferation of user-defined syntax for dictionaries.

Maybe there are other arguments against the pairing operator approach that would strengthen this proposal?

@andrevidela
Copy link
Collaborator Author

andrevidela commented Jul 26, 2024

I think the pairing operator is a very strong alternative, and it has two great benefits over this proposal:

  • It does not require changing the language
  • It already works

Beside the subjective aesthetic difference, dictionary syntax has two benefits over operators + lists:

  • The overload between [] as an empty list and [] as an empty dictionary introduces ambiguity when mixed as constructors for the same type like JSON
  • Because := is a reserved symbol, it cannot be misused. It also gives us the opportunity to design bespoke error messages related to dictionaries usage.

On this last point, providing a preferred operator in the Prelude will not prevent users from using it for non-dictionary purposes or redefining it entirely, which I could qualify as "misuse".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants