No relation to the sports channel.

  • 0 Posts
  • 3 Comments
Joined 1 year ago
cake
Cake day: June 9th, 2023

help-circle

  • The answer given in the spoiler tag is not quite correct!

    Test case

    According to the spoiler, this shouldn’t match “abab”, but it does.

    Corrected regex

    This will match what the spoiler says: ^.?$|^((.)\2+?)\1+$

    Full workup

    Any Perl-compatible regex can be parsed into a syntax tree using the Common Lisp package CL-PPCRE. So if you already know Common Lisp, you don’t need to learn regex syntax too!

    So let’s put the original regex into CL-PPCRE’s parser. (Note, we have to add a backslash to escape the backslash in the string.) The parser will turn the regex notation into a nice pretty S-expression.

    > (cl-ppcre:parse-string "^.?$|^(..+?)\\1+$")
    (:ALTERNATION
     (:SEQUENCE :START-ANCHOR (:GREEDY-REPETITION 0 1 :EVERYTHING) :END-ANCHOR)
     (:SEQUENCE :START-ANCHOR
      (:REGISTER
       (:SEQUENCE :EVERYTHING (:NON-GREEDY-REPETITION 1 NIL :EVERYTHING)))
      (:GREEDY-REPETITION 1 NIL (:BACK-REFERENCE 1)) :END-ANCHOR))
    

    At which point we can tell it’s tricky because there’s a capturing register using a non-greedy repetition. (That’s the \1 and the +? in the original.)

    The top level is an alternation (the | in the original) and the first branch is pretty simple: it’s just zero or one of any character.

    The second branch is the fun one. It’s looking for two or more repetitions of the captured group, which is itself two or more characters. So, for instance, “aaaa”, or “abcabc”, or “abbaabba”, but not “aaaaa” or “abba”.

    So strings that this matches will be of non-prime length: zero, one, or a multiple of two numbers 2 or greater.

    But it is not true that it matches only “any character repeated a non-prime number of times” because it also matches composite-length sequences formed by repeating a string of different characters, like “abcabc”.

    If we actually want what the spoiler says — only non-prime repetitions of a single character — then we need to use a second capturing register inside the first. This gives us:

    ^.?$|^((.)\2+?)\1+$.

    Specifically, this replaces (..+?) with ((.)\2+?). The \2 matches the character captured by (.), so the whole regex now needs to see the same character throughout.


  • Until you know a few very different languages, you don’t know what a good language is, so just relax on having opinions about which languages are better. You don’t need those opinions. They just get in your way.

    Don’t even worry about what your first language is. The CS snobs used to say BASIC causes brain damage and that us '80s microcomputer kids were permanently ruined … but that was wrong. JavaScript is fine, C# is fine … as long as you don’t stop there.

    (One of my first programming languages after BASIC was ZZT-OOP, the scripting language for Tim Sweeney’s first published game, back when Epic Games was called Potomac Computer Systems. It doesn’t have numbers. If you want to count something, you can move objects around on the game board to count it. If ZZT-OOP doesn’t cause brain damage, no language will.)


    Please don’t say the new language you’re being asked to learn is “unintuitive”. That’s just a rude word for “not yet familiar to me”. So what if the first language you used required curly braces, and the next one you learn doesn’t? So what if type inference means that you don’t have to write int on your ints? You’ll get used to it.

    You learned how to use curly braces, and you’ll learn how to use something else too. You’re smart. You can cope with indentation rules or significant capitalization or funny punctuation. The idea that some features are “unintuitive” rather than merely temporarily unfamiliar is just getting in your way.