Split Overview into the two specific use cases #1370

Open
wants to merge 1 commit into
base: main
from

Conversation

awwright
Member

This is an alternative to #1244, and like that issue, this one should be incorporated prior to #1365. This PR makes a clearer distinction between validation uses and annotation uses that 1365 will draw on (it relies on classifying which use each keyword is being used for, since validation use is different from annotation use).

This PR moves vocabulary-related discussion deeper into the specification, and replaces Overview with a specific description of what a JSON Schema can do, especially in terms of its output.

Once its capabilities are described, the rest of the specification can describe how you write a schema that can do these things.

This replaces the Overview section, which is somewhat redundant with the Abstract and Introduction sections above it.

CC @handrews: Could you provide your take on whether this touches Vocabularies too much?

@awwright awwright force-pushed the two-broad-cases branch 2 times, most recently from 3c321dd to 8a76bea Compare December 17, 2022 07:11
@awwright awwright mentioned this pull request Dec 26, 2022
@handrews
Contributor

@awwright I generally like the split. I'd overall be more comfortable working out high-level organization and terminology in a few issues. I find it hard to think about terminology changes in PRs like this because they have systemic implications. Manageable-sized PRs make it hard to think about the systemic impact, and PRs that change terminology everywhere to show consistency are too large to work with in general.

Some of this also comes from recent experiences of having wording I've written questioned. The questions were understandable, but we were only able to work out what the spec really meant because there was a solid record of discussion in issues that we could research in addition to the text.

I've also had too many experiences (both as a writer and reader) where people thought they agreed on terminology, but actually had different meanings in mind. Discussions in issues, where work can focus on the concepts and fit words to those concepts once they are thoroughly understood and agreed to, give me much more confidence. They would also help us avoid some of the terminology inconsistencies that have accumulated because the terminology was developed piecemeal instead of worked out as a system.

There's a balance between working out conceptual pieces and working out the whole system that I'm not sure how to manage, but in both cases I find working from concrete wording in PRs difficult. It makes me nervous for reasons that have little to do with the PR and a lot to do with how things have ended up misunderstood in the past.

@handrews
Contributor

As far as vocabularies, I see the point of moving things down. I also feel like it's important to get the concept established early (although it doesn't need as much detail as is currently present in the overview). This is another thing where working at an outline level (or slightly more detailed) would help a lot more than looking at text changes and movements.

This better describes the specific uses that JSON Schema supports.
The vocabulary-related prose is moved down into the relevant section.
@awwright awwright requested review from gregsdennis and Relequestual and removed request for handrews March 28, 2023 00:29
@awwright
Member Author

If this looks reasonable, I'd like to move this through next.

Then I'd like to focus mostly on resolving outstanding PRs to accommodate a conversion to Markdown, but I'd also like to squeeze in #1390, which would help organize the spec to accommodate writing #1365/Discussion #329.

@awwright awwright marked this pull request as ready for review March 28, 2023 01:49
Member

@gregsdennis gregsdennis left a comment

I'm still on the fence about needing to split validation and annotation use cases like this, although this does give a deeper dive into each. I think these sections may need to be children of a "base use cases" section to ease the reader into it a bit more.

Extracting the vocab bit is fine with me.

JSON Schemas are themselves JSON documents.
This, and related specifications, define keywords allowing authors to describe JSON
data in several ways.
A JSON Schema document describes a validator (also known as a "recognizer" or "acceptor") which classifies a provided JSON document as "accepted" or "rejected."
Member

The "accept"/"reject" terminology is new. I see you use it later in the PR as well, but it's not used throughout the document.

Member Author

It's new to this spec, but it is used widely outside JSON Schema and may help new readers understand what is going on. I'm going to suggest we should use accept/reject more often (it greatly simplifies the phrasing of many sentences), but that'll be an issue for later.

Member

Can you remove that language from this PR and open an issue for that change, please?

I'm not opposed to it, but I think vernacular should be an agreed-upon change, not something that's just snuck in.

Member Author

Well my point is there's a certain segment who may see our language as new, and "accepts" is the existing term they're familiar with. I think we should use a variety of language to introduce and define the concepts, and then we can use our choice of term for the rest of the document. Is there a problem with this line of thinking?

Member

I don't have a problem with introducing them, but this PR doesn't seem the place for it. I'd like to get the opinions of the other maintainers.

Member Author

@awwright awwright Apr 3, 2023

I know what a finite state machine is, I still don't find the references you're adding helpful

significantly fewer people have a real understanding of them or how a JSON Schema can be mapped into one

Ok, though my argument is that not every part of the intro has to be helpful to everyone; it has to be written so that the widest possible audience will understand what JSON Schema accomplishes for them.

The two biggest audiences, I think, will be application developers ("I want a DSL for checking JSON, instead of doing it in code") and formal grammars ("I know what ABNF and DTDs are, I want this for JSON").

I think you'll find that other similar technology uses technical terms much more heavily than I'm suggesting we do.

I looked at the introduction for ABNF, which I found far too technical for most people to understand. It talks in technical terms that it's a formal syntax, but doesn't really describe why you'd want to use it at all, or use it over other languages.

XML DTD also talks about formal grammars, validators, and uses the accepts/rejects terminology; but it too is somewhat technical and it's not immediately obvious to me who the target audience is.

So what I'm looking for is (1) should the formal grammar audience be accommodated in the introduction? (Since ABNF and DTDs both seem to be written exclusively for this audience, I would suggest this is important.)

And (2) if we should accommodate the formal grammar audience, is there a better way to write it so that it's more helpful for them, and less confusing to others?

Member

Ok, though my argument is that not every part of the intro has to be helpful to everyone; it has to be written so that the widest possible audience will understand what JSON Schema accomplishes for them.

This is a code review, where saying "I don't find it helpful" is to say "I believe you should not add this, it isn't helpful to a wider audience", not simply offering my own anecdote about my personal reading.

XML DTD also talks about formal grammars, validators, and uses the accepts/rejects terminology;

Section 2.8 of a document is wildly different from being literally the first paragraph of the actual content of the document. I also don't see the "accepts/rejects" terminology in the section you linked. It uses "valid", as we already do.

So what I'm looking for is (1) should the formal grammar audience be accommodated in the introduction?

You already have my own opinion, now three times: no, we should not.

Member Author

not simply offering my own anecdote about my personal reading

Ok, I ask because saying "I don't find it helpful" is suggestive of a personal opinion without projecting what others will think; saying "I don't believe this will be helpful" is a general observation of the sort I'm looking for.

I'm going to have to think about what else to say, if it's not immediately obvious that formal grammars are related here, as that's the formal study of what JSON Schema is fundamentally doing.

I also don't see the "accepts/rejects" terminology in the section you linked. It uses "valid", as we already do.

XML does not use the term "validates" (in the third person singular) to refer to an outcome (and actually it doesn't use it in that form at all). It uses "validate" to describe a process, "accept"/"matches", and "reject" to describe outcomes of that process, and "valid" to describe documents that have been accepted by the process, but nothing like "validates successfully" as we do.

Member

Ok, I ask because saying "I don't find it helpful" is suggestive of a personal opinion without projecting what others will think; saying "I don't believe this will be helpful" is a general observation of the sort I'm looking for.

At the risk of quoting myself, the comment I left before that was quite clear on which I was intending, please don't ignore it:

All in all I find the first few paragraphs here to be a step back

I don't see them as adding understanding to someone reading the spec

what's here in this whole PR does too much

Member

I'm going to bow out of this PR as well, as I believe I've communicated that I'm -1 on the changes in their current form. There might be smaller changes that I'm more positive on, but they're sufficiently far from this PR in its current state that it's not a matter of rewording a small bit here and there. It bears repeating, I suppose, that that's just my vote, and others may of course disagree, though I landed on this PR after Greg, who sounds like he was expressing similar doubts.

A condition for accepting a document is called an "assertion".
Assertions impose constraints that instances must conform to.
Given a schema and an instance, the schema "accepts" an input whenever all the assertions are met,
and the schema "rejects" when any of the assertions fail.
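As an illustration only, the accept/reject rule in the proposed text above could be sketched in Python like this. The names `Assertion`, `accepts`, `rejects`, and the two example checks are hypothetical and do not come from the specification; the sketch assumes an assertion can be modelled as a predicate over an instance.

```python
from typing import Any, Callable, Sequence

# Hypothetical modelling choice: an assertion is a predicate over an instance.
Assertion = Callable[[Any], bool]


def accepts(assertions: Sequence[Assertion], instance: Any) -> bool:
    """The schema accepts the instance when every assertion is met."""
    return all(check(instance) for check in assertions)


def rejects(assertions: Sequence[Assertion], instance: Any) -> bool:
    """The schema rejects the instance when any assertion fails."""
    return any(not check(instance) for check in assertions)


def is_string(instance: Any) -> bool:
    return isinstance(instance, str)


def is_non_empty(instance: Any) -> bool:
    # Only constrains strings; other types pass this check untouched.
    return not isinstance(instance, str) or len(instance) > 0


checks = [is_string, is_non_empty]
```

Under this reading the two outcomes are exact complements: `rejects(checks, x)` is always `not accepts(checks, x)`, and both take the instance as their object.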
Member

"rejects" needs an object, i.e. what is being rejected?

Member Author

The input JSON document, as was mentioned in 'the schema "accepts" an input whenever...'

Member

Yes, but grammatically, you need to repeat the object.

JSON Schemas are themselves JSON documents.
This, and related specifications, define keywords allowing authors to describe JSON
data in several ways.
A JSON Schema document describes a validator (also known as a "recognizer" or "acceptor") which classifies a provided JSON document as "accepted" or "rejected."
Member

Does the schema describe a validator? I would expect people think of the "validator" as the implementation, not the document.

Member Author

Yeah that makes sense... There's a sense in which these two uses are actually the same: the "validator implementation" is just a generic form of validator that is configurable. If I have a schema, it makes no difference whether the program is written or compiled to work only with that schema, or is generic and configured at runtime.

Is there a better name for "the program that tests an input against some specific schema"?

Member

@gregsdennis gregsdennis Mar 30, 2023

I don't think you understand my point. Colloquially, the "validator" is the implementation, not the schema. I think we need to stick with this.

Saying the schema itself is the validator will be confusing. A validator evaluates JSON against a schema. The schema is no more than configuration.

Member Author

Colloquially, the "validator" is the implementation, not the schema.

I believe I see the point you're making, but I'd add that this is similar to how we discuss compilers and interpreters. You're pointing out a definition of "validator" that functions like an interpreter: there's a library that reads the schema (the source code), then uses this interpretation to validate JSON.

But you can also compile source code to a program, and run the program directly. In this paradigm, there is no interpreter (what is usually called the validator), but the compiled program is still a "validator" (a thing that performs validation). It just has no concept of a schema (any more than a compiled C program can parse C).

So with JSON Schema, the schema is not the validator (as such), but I think you can say it describes a validator.
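A rough Python sketch of the interpreter/compiler comparison above, offered as an illustration rather than as how any real implementation works. Everything here is hypothetical: `validate`, `compile_schema`, and `_JSON_TYPES` are made-up names, and the toy checker only understands a single-name "type" (four types) and "minimum".

```python
from functools import partial
from typing import Any, Callable

# Hypothetical toy mapping; unsupported type names raise KeyError.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "object": dict,
    "array": list,
}


def validate(schema: dict, instance: Any) -> bool:
    """Interpreter style: the schema is read on every call."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(instance, bool) or not isinstance(instance, _JSON_TYPES[expected]):
            return False
    minimum = schema.get("minimum")
    if (
        minimum is not None
        and isinstance(instance, (int, float))
        and not isinstance(instance, bool)
        and instance < minimum
    ):
        return False
    return True


def compile_schema(schema: dict) -> Callable[[Any], bool]:
    """Compiler style: fix the schema once, return a function of the instance only."""
    return partial(validate, schema)


# The returned function has no schema parameter left.
is_non_negative_number = compile_schema({"type": "number", "minimum": 0})
```

Both forms give the same answer for the same schema and instance; the only difference is whether the schema is still an argument at the point where the instance is checked.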

Member

@gregsdennis gregsdennis Mar 30, 2023

I see where you're coming from, but never in my experience with this project have we used "validator" that way. It has always been used to mean the implementation.

At best, this reads weird.

Member Author

@Julian If I compile a schema or curry away the schema argument, leaving an executable that only reads an instance, what terminology should we use for the compiler, and the program/function it outputs?

I ask because, in my opinion, the function that accepts the instance would be the "validator", not the compiler. And I argue this usage is entirely consistent with most "validator" libraries that are more like interpreters (they both parse the schema, and validate instances, in a single package).

Member

I don't believe we need terminology for such a concept in the spec at all (and certainly not at this point in time). What we use today is fine, "implementation", which refers to the executable program capable of doing things with schemas.

Member

the function that accepts the instance would be the "validator"

This I agree with, but it doesn't follow from this that the schema is a validator. The schema is still just "configuration" (if you want to call it that). It still goes through a library/application, and you get an output. It's just that your example also produces an intermediate output of an executable function that represents a specific schema. The system is inputting the JSON Schema (most likely as JSON or YAML text) and an instance and getting out whether the instance is valid according to that schema. That "compile" step is an intermediate implementation detail that doesn't need to be covered in the spec.

The spec needs to concern itself with one thing:

  • inputs: a schema and an instance
  • output: validation results and/or annotations

Anything an implementation does to get from input to output is necessarily beyond the scope of the spec.
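The input/output contract described above might be sketched as a single function signature. This is a toy under stated assumptions: `Result` and `evaluate` are hypothetical names, only "maxLength" is treated as an assertion and only "title" and "description" as annotations, and annotations are dropped when validation fails.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Result:
    """Hypothetical output shape: a validation result plus collected annotations."""
    valid: bool
    annotations: dict = field(default_factory=dict)


def evaluate(schema: dict, instance: Any) -> Result:
    """Inputs: a schema and an instance. Output: validity and/or annotations."""
    # Assertion: "maxLength" constrains strings only (len counts code points).
    max_length = schema.get("maxLength")
    if isinstance(instance, str) and max_length is not None and len(instance) > max_length:
        return Result(valid=False)
    # Annotations: copied from the schema only when the instance is valid.
    annotations = {key: schema[key] for key in ("title", "description") if key in schema}
    return Result(valid=True, annotations=annotations)
```

How `evaluate` gets from its arguments to a `Result` (interpreting, compiling, caching) is invisible to the caller, which is the sense in which it is out of scope.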

Member Author

but it doesn't follow from this that the schema is a validator

I see, this isn't what I intended to convey. I thought that saying "the schema describes a validator" would disconnect the schema (the description) from the validator (the actual process). Is a different word in order here, or some additional explanation ("the schema describes the behavior of a validator")?

Member

I don't think it's necessary to say that at all.

A schema describes a set of constraints and annotations that can be applied to an instance. That's it. There's no need to bring in implementations of any form.

with a "$" character to emphasize their required nature. This vocabulary
is essential to the functioning of the "application/schema+json" media
type, and is used to bootstrap the loading of other vocabularies.
A schema may also describe an "annotator," a way to read an instance and output a set of "annotations."
Member

Is the schema describing an annotator? (same as "validator" above)

Member Author

Yeah, similar situation, I have a schema, and I want to use it to compile a program that takes a JSON input and returns an output format. It's not otherwise configurable, maybe this is an HTTP service. What do I call that program?

Comment on lines +170 to +172
However, not all valid input is meaningful or true to a given application.
That is, if you process an arbitrary instance with nonsense data,
the resulting annotations may not necessarily be true, even though the input is valid.
Member

The use of "true" here is odd. What does it mean for an input to be "true" to an application?

Member Author

Yeah, I struggled a bit with how to phrase this. I'm trying to explain the phenomenon of "garbage in garbage out" and that the assertions don't have to be 100% completely defined.

Member

I think dropping "true" and sticking with "meaningful" is the right way here.

@gregsdennis
Member

I do want to address some of this with the upcoming release, but I fear this particular PR may be too far gone at this point and will need to be resubmitted.

I'm going to leave this open for now as a reminder to address it.

@gregsdennis gregsdennis added this to the stable-release milestone Jun 18, 2024
Labels
None yet
Projects
Status: In Discussion
4 participants