A 10-minute read covering some YAML edge cases that you should keep in mind when writing complex YAML files

  • tyler@programming.dev
    16 days ago

    You shouldn’t write complex yaml files. Keep it simple and yaml is great. Do complex stuff and you’ll hate your life.

    • ruk_n_rul@monyet.cc
      16 days ago

      If you write your own tooling then it’s great. The vast majority of us are using other people’s tooling and have to deal with their imposed complexity. I for one hate GitHub actions with a passion.

      • tyler@programming.dev
        16 days ago

        None of the complexity of GitHub actions would be solved with any other configuration language. It needs to be a full scripting language at minimum. The problems with GHA have nothing to do with yaml.

      • atzanteol@sh.itjust.works
        16 days ago

        I’m convinced everybody who told me that “GitHub actions are great!” was just part of one big prank.

      • tyler@programming.dev
        14 days ago

        JSON is not easier for most strings. Anything multiline for example.

        But yaml is a superset of JSON so you literally can use JSON and it’s still valid YAML.

  • Ephera@lemmy.ml
    16 days ago

    Man, even knowing that YAML document was going to be laden with bullshit, I only spotted the unquoted version strings looking fishy.

    I also really dislike how often YAML is abused to take keys as list items. Something like this, for example:

    hosts:
      debian-vm:
        user: root
      database-server:
        user: sql
    

    “debian-vm” and “database-server” are hostnames, and as such they are values, not keys. So, this should be written as:

    hosts:
      - name: debian-vm
        user: root
      - name: database-server
        user: sql
    

    And I’m not just nitpicking here. If we change the example a bit:

    hosts:
      database:
        user: sql
    

    …then suddenly, you don’t know whether “database” is just one of many possible hosts or whether all hosts always have a shared database with this technology.
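
To make the difference concrete, here is a rough Python sketch. The two dictionaries are hand-written stand-ins for what a typical loader would return for the two layouts above (mappings as dicts, sequences as lists); no YAML library is involved and the function names are made up for illustration.

```python
# Hypothetical Python values standing in for what a typical YAML loader
# would return for the two layouts (mappings -> dict, sequences -> list).

hosts_as_keys = {
    "hosts": {
        "debian-vm": {"user": "root"},
        "database-server": {"user": "sql"},
    }
}

hosts_as_list = {
    "hosts": [
        {"name": "debian-vm", "user": "root"},
        {"name": "database-server", "user": "sql"},
    ]
}


def hostnames_from_keys(config):
    # The reader has to know that these keys are data, not schema fields.
    return list(config["hosts"].keys())


def hostnames_from_list(config):
    # Here every key ("name", "user") is schema and every value is data.
    return [host["name"] for host in config["hosts"]]


# In the list layout, the set of keys is fixed and tells you the schema.
keys_of_list_items = {key for host in hosts_as_list["hosts"] for key in host}
```

With the list layout, the keys never change when you add a host, so a reader (or a schema validator) can tell field names from hostnames without outside knowledge.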

  • Sickday@kbin.earth
    16 days ago

    Interesting read. Wish I would’ve found it years ago when I started my first DevOps gig. The company used AWS and CloudFormation (YAML, not JSON) quite a bit along with Ansible. The things I saw in that hellscape were brutal.

  • Eager Eagle@lemmy.world
    16 days ago

    Writing YAML is only better than writing XML. I’d rather read and write JSON, which is allegedly not “human-friendly” for some reason.

    If you get to choose a format, please pick something else. There are plenty of better options these days.

  • moonpiedumplings@programming.dev
    15 days ago

    See also: noyaml.com

    I personally like yaml though. Although I won’t deny it can be hellish to write without a linter, it’s just like any other language with tab autocomplete and warning for sus things if you have the right software set up.

    I used the ansible and kubernetes VSCode extensions, and I really like them both. With the kubernetes one, you can just start typing the name of the resources you want to create, and then press tab, and boom, a template is created.

    I would much rather see something like Nix be the norm, but I find Nix very frustrating to edit because the language servers for it are nowhere near as developed.

    • GTG3000@programming.dev
      14 days ago

      This is kinda my experience. If there’s an extension keeping track of schema and linting, it’s alright.

      If you’re doing it by hand, well, good luck.

      My personal favourite way to make configs is lua. But that’s neither here nor there.

  • Trigger2_2000@sh.itjust.works
    14 days ago

    I read part of it; it was too painful to read more.

    I kept finding myself saying “Well that’s stupid” over and over again.

    Edit: To clarify, it’s yaml parsing that is “stupid”; the article was great.

  • AnyOldName3@lemmy.world
    16 days ago

    Most of the problems can be totally avoided by telling the YAML loader what type you’re expecting instead of forcing it to guess (e.g. provide a schema or use typed getter functions). If it has to guess, it’s no surprise that some things don’t survive the string to inferred type to desired type journey, and this is something that isn’t seen as a dealbreaker in other contexts, e.g. the multitude of languages where the string "false" evaluates to true when converted to a boolean because it’s non-empty.
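
A minimal sketch of the "typed getter" idea, in Python. The helper names (`get_str`, `get_bool`) and the word lists are hypothetical and not the API of any real YAML library; the point is only that the caller states the type, so nothing has to be guessed.

```python
# Hypothetical helper names; not the API of any real YAML library.
TRUE_WORDS = {"true", "yes", "on"}
FALSE_WORDS = {"false", "no", "off"}


def get_str(raw: dict, key: str) -> str:
    # The caller asked for a string, so the raw text is returned untouched.
    return raw[key]


def get_bool(raw: dict, key: str) -> bool:
    # The caller asked for a boolean, so only boolean words are accepted.
    text = raw[key].strip().lower()
    if text in TRUE_WORDS:
        return True
    if text in FALSE_WORDS:
        return False
    raise ValueError(f"{key!r} is not a boolean: {raw[key]!r}")


# Raw scalars kept as text until a type is requested.
raw_config = {"country": "no", "debug": "no", "version": "1.10"}

country = get_str(raw_config, "country")   # stays the string "no"
debug = get_bool(raw_config, "debug")      # becomes False
version = get_str(raw_config, "version")   # stays "1.10", not the float 1.1

# For comparison: plain truthiness treats any non-empty string as true.
naive = bool("false")
```

The same text `no` ends up as a string for `country` and as a boolean for `debug`, because the program said what it wanted each time.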

    • pcouy@lemmy.pierre-couy.fr (OP)
      16 days ago

      In almost any other context (where boolean values exist), strings must be delimited by quotes, which eliminates the ambiguity between false as string contents and the false boolean value.

      • AnyOldName3@lemmy.world
        16 days ago

        Putting "false" in a YAML file gives you a string, and just false on its own gives you a boolean, unless you tell the YAML library that it’s a string. Part of the point of YAML is that you don’t have to specify lots of stuff that’s redundant except when it would otherwise be ambiguous, and people misinterpret that as never having to specify anything ever.
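
As a rough illustration of that rule, here is a deliberately tiny Python stand-in for implicit typing. It is not how any particular YAML library is implemented, `resolve_scalar` is a made-up name, and the word table is a simplified version of the YAML 1.1 boolean words.

```python
# A deliberately tiny stand-in for implicit typing; real YAML loaders
# handle far more cases (numbers, null, dates, ...).
YAML_1_1_STYLE_BOOLS = {
    "true": True, "yes": True, "on": True,
    "false": False, "no": False, "off": False,
}


def resolve_scalar(token: str):
    """Return the value a plain or quoted scalar token would resolve to."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        # Quoted scalars are always strings; only the quotes are dropped.
        return token[1:-1]
    # Plain scalars are checked against the boolean words (simplified:
    # this sketch ignores case entirely, which is looser than the spec).
    return YAML_1_1_STYLE_BOOLS.get(token.lower(), token)
```

Under these assumptions `resolve_scalar("false")` gives a boolean, `resolve_scalar('"false"')` gives a string, and a plain `no` also lands in the boolean table.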

        • pcouy@lemmy.pierre-couy.fr (OP)
          16 days ago

          The problem is specifically that it’s not exactly clear what’s considered ambiguous. For instance, no is the same thing as false, but as evidenced in the linked post, in the context of country codes, it means “Norway” and it’s not obvious that it might get interpreted as a boolean value.

          It’s the same thing as this famous meme about implicit type conversions in JS:

          • atzanteol@sh.itjust.works
            16 days ago

            TBF (I don’t often defend JS) - one of those is just “standard floating point issues” that every developer should be aware of. Computers cannot represent an infinite array of numbers between 0 and 1.

            • pixelscript@lemm.ee
              16 days ago

              The first four of them are “just how floats work”, yeah. Has nothing to do with JavaScript.

              typeof NaN
              // "number"
              

              Classic, yes, very funny. “NaN stands for ‘not a number’ but it says it’s a number”. But for real though. It’s still a variable that’s the Number type, but its contents happen to be invalid. It’s Not a (Valid) Number.

              The next three are just classic floating point precision moments.
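
Assuming the meme's float examples are the usual ones (a large integer literal and 0.1 + 0.2), the same results show up in Python, which supports the point that this is IEEE 754 behaviour rather than a JavaScript quirk. A small sketch:

```python
import math

nan = float("nan")

# NaN is a value of the float type, just as JS reports "number" for it.
nan_is_float = isinstance(nan, float)

# IEEE 754: NaN compares unequal to everything, including itself.
nan_equals_itself = nan == nan

# 0.1 and 0.2 have no exact binary representation, so the sum is slightly off.
sum_is_exact = (0.1 + 0.2) == 0.3
sum_is_close = math.isclose(0.1 + 0.2, 0.3)

# Above 2**53 not every integer fits in a double, so this literal rounds up.
big_rounds_up = float(9999999999999999) == 1e16
```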

              The Math.max() and Math.min() ones are interesting. It seems that under the hood, each method implicitly has a fallback “number” that it compares against whatever argument list you give it, one that will auto-lose (or at closest, tie) against any other valid number you could possibly pass. So when you give it nothing at all, the fallback leaks out: -Infinity for Math.max() and Infinity for Math.min(). Honestly, makes sense. Kinda ludicrous it needs to have defined behavior for a zero-argument call in the first place. But JS is one of those silly languages that lets you stuff in or omit as many arguments as you want with no consequences, function signature be damned. So as long as that paradigm exists, the zero-argument case probably ought to do something, and IMO this isn’t the worst choice.
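
The fallback idea can be sketched in a few lines of Python. This is only an illustration of the "identity element" reasoning, not a claim about how any JS engine implements these methods; `max_of` and `min_of` are made-up names.

```python
import math


def max_of(*values: float) -> float:
    # Start from the identity element for max: nothing is smaller than -inf.
    result = -math.inf
    for value in values:
        if value > result:
            result = value
    return result


def min_of(*values: float) -> float:
    # The identity element for min is +inf, for the mirrored reason.
    result = math.inf
    for value in values:
        if value < result:
            result = value
    return result
```

With no arguments the loop never runs, so the starting value is what comes back.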

              Every other one is bog standard truthy/type coercion shitlery. A demonstration of why implicit type coercion as a language feature is stupid.

  • FizzyOrange@programming.dev
    15 days ago

    The problem is there aren’t really any good alternatives that have as widespread support. I’ve looked at lots and always found some annoying flaw that JSON or YAML don’t have. I mainly want good support in Python, Rust and VSCode.

    • JSON5: This is my ideal alternative but it has surprisingly poor support. No good VSCode extension. There’s a Serde crate but it’s not very popular.
    • Jsonnet: This has great VSCode support and support for lots of languages including Rust, but for some inexplicable reason they won’t let you use it with Serde just to load documents.
    • TOML: This is just not a good format. It’s ok for very basic things but any level of nesting and it becomes even worse than YAML.
    • Cue: Only supported language is Go.

    There isn’t really a perfect option at the moment IMO.

    If I’m using Rust I tend to go with RON at the moment. Sometimes I do use YAML but I write it as JSON (since YAML is a superset of JSON) with # comments.

    Also never output YAML from your programs. You can always output JSON instead which is better.
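
A small sketch of what "output JSON instead" can look like, using only Python's standard `json` module. The config contents and the header comment are made up for illustration, and the result is meant to be read by a YAML parser (the `#` line is not valid JSON).

```python
import json

# Example data; the keys and values are purely illustrative.
config = {
    "hosts": [
        {"name": "debian-vm", "user": "root"},
        {"name": "database-server", "user": "sql"},
    ],
    "country": "no",
    "version": "1.10",
}

# json.dumps always quotes strings, so "no" and "1.10" cannot be re-read
# as a boolean or a float by a YAML consumer.
body = json.dumps(config, indent=2)

# "#" comments are YAML syntax, not JSON: a file with this header is meant
# for YAML parsers only.
document = "# Generated file, do not edit by hand\n" + body + "\n"
```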

    • umbraroze@lemmy.world
      14 days ago

      My hierarchy goes something like this:

      • A relatively trivial configuration file? TOML
      • A configuration file that needs a bit of complexity and nesting? YAML
      • Is it getting so complicated and longwinded that you’re actually unlikely to touch it by hand anyway? JSON
      • Have we become downright enterprisey? XML
  • Thinker@lemmy.world
    16 days ago

    YAML is truly an untenable format. I’m personally excited for KDL to stabilize and hopefully see wider adoption, but in the meantime I’m fine sticking with JSON most of the time.

    • FizzyOrange@programming.dev
      15 days ago

      That has XML semantics, which isn’t what people want in the vast majority of cases. They want JSON semantics because it matches programming language object models.

      XML semantics are good for documents.

    • moonpiedumplings@programming.dev
      15 days ago

      I don’t see anything about turing completeness or programmatic capabilities in their github. Any language that doesn’t have the programmatic abilities will inevitably get them hacked on when someone needs them, like what happened to yaml a bunch of times for a bunch of different software. This is one of people’s many frustrations with yaml, the fact that doing a loop, an if statement, or templating, is different for every single software that uses yaml. Even within Kubernetes, there exists different ways to do templates.

      I would much rather see the language consider those things first than see it repeat one of the biggest mistakes of yaml. This is why I am more eager for things like nickel, or even Nix as a configuration language, and am skeptical of any new standard that doesn’t have those features.

      • Magiilaro@feddit.org
        14 days ago

        Yaml is a data storage format, why should it have any kind of programmability or even Turing completeness?

        Those things should be done in the program that uses the data not inside of the data itself.
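
A rough sketch of that split, in Python: the config stays plain data and the loop plus defaulting logic live in the program. The config shape and the `expand_hosts` name are invented for the example.

```python
# Plain data, as it might come out of any config loader (hypothetical shape).
config = {
    "defaults": {"user": "root", "port": 22},
    "hosts": [
        {"name": "debian-vm"},
        {"name": "database-server", "user": "sql"},
    ],
}


def expand_hosts(cfg: dict) -> list[dict]:
    # The loop and the "if missing, use the default" logic live in the program,
    # so the data file needs no templating or control flow of its own.
    return [{**cfg["defaults"], **host} for host in cfg["hosts"]]


expanded = expand_hosts(config)
```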

        • moonpiedumplings@programming.dev
          13 days ago

          Yaml is a data storage format

          I have literally never seen yaml used as a data storage format, only as a configuration language. Ansible, Kubernetes, Home manager, netplan, and many, many other examples of yaml as a configuration language, but I cannot think of an example of yaml as a data storage format off the top of my head.

          Given the:

          package {
            name my-pkg
            version "1.2.3"
          
            dependencies {
              // Nodes can have standalone values as well as
              // key/value pairs.
              lodash "^3.2.1" optional=#true alias=underscore
            }
          }
          

          On the README of the KDL GitHub, it looks like KDL has a similar goal: to be a configuration language rather than a data storage format.

          • Magiilaro@feddit.org
            13 days ago

            Configuration is a type of stored data.

            Configuration is data that is read and parsed on program startup.

            But limiting it to configuration storage only makes it even more absurd to implement Turing completeness in the language.

  • lurklurk@lemmy.world
    15 days ago

    Just don’t do yaml.

    yq can translate yaml to json and in most cases json is still valid yaml

  • gencha@lemm.ee
    15 days ago

    If you’re comparing YAML with JSON, it displays that you understand neither.

    JSON is designed for data exchange between systems. YAML is designed to describe data for a single system, and is always subject to individual implementations.

    They are not interchangeable concepts.

    • vrighter@discuss.tchncs.de
      14 days ago

      all json is valid yaml and can be parsed with a yaml parser. Yaml is literally a superset of json. In what world are they not comparable?

    • pcouy@lemmy.pierre-couy.fr (OP)
      15 days ago

      They are both serialization formats that are supposed to be able to represent the same thing. Converting between these 2 formats is used in the article as a way to highlight yaml’s parsing quirks (since JSON only has a single way to represent the false boolean value, it makes it clear that the no value in yaml is interpreted as a boolean false and not as the "no" string)
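
Why the JSON round trip is a handy diagnostic can be shown with the standard `json` module alone. The two dictionaries below are hand-written stand-ins for what a loader following YAML 1.1 rules would return; no YAML parser is run here.

```python
import json

# Stand-ins for loader output; no YAML parser is involved here.
parsed_plain = {"country": False}   # what `country: no` becomes under YAML 1.1 rules
parsed_quoted = {"country": "no"}   # what `country: "no"` becomes

# JSON has exactly one spelling for each, so the difference is visible at a glance.
as_json_plain = json.dumps(parsed_plain)
as_json_quoted = json.dumps(parsed_quoted)
```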

      Anyway, I disagree with your point about YAML and JSON not being interchangeable