Luc Gommans - jq for dummies

jq for dummies

Written on 2021-08-01

jq is a neat little tool that I initially used for just pretty-printing and syntax-highlighting JSON data, but it can do more. The syntax looks like an amalgamation of various languages and the documentation leaves some things as an exercise to the reader, but it's not hard, especially to get started, so let's do just that!

Simple usage

The simplest jq program is . (just a dot): it takes the current object and outputs it again unchanged. Let's use it:

$ echo '{"member":{"name":"Joe Sample", "money":512, "memberTime": 86400}}' |
    jq .

{
  "member": {
    "name": "Joe Sample",
    "money": 512,
    "memberTime": 86400
  }
}

The next-simplest program is .fieldName.subField, taking one field from the data:

$ echo '{"member":{"name":"Joe Sample", "money":512, "memberTime": 86400}}' |
    jq .member.name

"Joe Sample"

What if we have an array of members? You can use [] to iterate over an array:

$ echo '{"members": [
         {"name":"Joe Sample", "money":512, "memberTime": 86400},
         {"name":"Jane Ampel", "money":9001, "memberTime": 1800}
        ] }' | jq .members[].name

"Joe Sample"
"Jane Ampel"

The array we are iterating over is .members and we apply the .name filter to each iteration.

To get rid of those quotes for every line of output, there is the -r option. Let's use this and also introduce quotes around the filter so we don't accidentally trigger special shell characters.

$ echo '{"members": [
         {"name":"Joe Sample", "money":512, "memberTime": 86400},
         {"name":"Jane Ampel", "money":9001, "memberTime": 1800}
        ] }' | jq -r '.members[].name'

Joe Sample
Jane Ampel

The final basic usage example shows how to combine multiple fields on one line, which I personally use most of the time when I use jq:

$ echo '{"members": [
         {"name":"Joe Sample", "money":512, "memberTime": 86400},
         {"name":"Jane Ampel", "money":9001, "memberTime": 1800}
        ] }' | jq -r '.members[] | "\(.name) got \(.money) bucks"'

Joe Sample got 512 bucks
Jane Ampel got 9001 bucks

There are a few new things here: the pipe symbol (vertical bar) and the string with fields. Some more info about what's going on:

We could have used a pipe in the previous example as well (.members[] | .name); it works either way. In this case it is required: you want to "pipe" each member from the first part into the second, similar to regular shell pipes.
The string fields should also be relatively clear: you start a string and inside you can use the \(...) syntax to specify what data should go there.
No "print" command is needed: everything in jq sends its output forward (it "emits" the output, in jq terminology). If the string is the last thing, it will be sent to stdout.
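The first point is easy to check for yourself. Here is a small sketch that keeps the sample data in a shell variable and runs both spellings of the filter; the variable names (data, without_pipe, with_pipe) are just ones I made up for this illustration.

```shell
# Sample data from the examples above, kept in a variable for reuse.
data='{"members": [
  {"name":"Joe Sample", "money":512, "memberTime": 86400},
  {"name":"Jane Ampel", "money":9001, "memberTime": 1800}
]}'

# Path form: iterate over the array and take .name in one expression.
without_pipe=$(echo "$data" | jq -r '.members[].name')

# Pipe form: emit each member, then feed it into the .name filter.
with_pipe=$(echo "$data" | jq -r '.members[] | .name')

echo "$without_pipe"
echo "$with_pipe"
```

Both commands should print the same two names, once per member.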

This might be a good break if everything so far was new to you.
Next time you need to parse some JSON, try the above and revisit the blog post when you want more!

More formatting

We already started formatting with the string, but much more is possible. I'll now dive into a few examples that I used earlier today (also since they aren't obvious from the jq documentation or examples).

The first thing is that I wanted to turn the time, in seconds, into something more useful. One might think that "\(.memberTime/3600)" would do the trick and the documentation says that / is indeed division, but that would have been too obvious. Instead, the operation needs to be wrapped in parentheses (()):

$ echo '{"members": [
         {"name":"Joe Sample", "money":512, "memberTime": 86400},
         {"name":"Jane Ampel", "money":9001, "memberTime": 1800}
        ] }' | jq -r '.members[] | "\(.name) joined \((.memberTime/3600)) hours ago"'

Joe Sample joined 24 hours ago
Jane Ampel joined 0.5 hours ago

What if we want only the person's first name? jq has a split function that works with data from the input pipe and "emits" the result. We can also use the [N] operator to select an index, [0] being the first one. If we split on space, we get ["Joe","Sample"] and so [0] would give us "Joe".
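If you want to see the intermediate result, the following sketch prints the array that split produces before [0] picks the first element. It uses the -c option (compact output) so the array stays on one line; the variable names parts and first exist only for this illustration.

```shell
# Without [0] we get the whole array; -c keeps it on a single line.
parts=$(echo '{"name":"Joe Sample"}' | jq -c '.name | split(" ")')

# With [0] we select only the first element; -r drops the quotes.
first=$(echo '{"name":"Joe Sample"}' | jq -r '.name | split(" ")[0]')

echo "$parts"   # ["Joe","Sample"]
echo "$first"   # Joe
```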

I will also start adding newlines for clarity, but note that newlines have no meaning in jq.

$ echo '{"members": [
         {"name":"Joe Sample", "money":512, "memberTime": 86400},
         {"name":"Jane Ampel", "money":9001, "memberTime": 1800}
        ] }' | jq -r '.members[]
        | "\((.name | split(" ")[0])) joined \((.memberTime/3600)) hours ago"'

Joe joined 24 hours ago
Jane joined 0.5 hours ago

Eww, see that misalignment? It might not be so bad for 2 sentences, but in a table that would look terrible. There does not seem to be a native padding function, but we can use a tiny loop to add spaces. The next example will overwrite the name field with only the first name and demonstrate the loop:

$ echo '{"members": [
         {"name":"Joe Sample", "money":512, "memberTime": 86400},
         {"name":"Jane Ampel", "money":9001, "memberTime": 1800}
        ] }' | jq -r '.members[]
      | .name = (.name | split(" "))[0]
      | until((.name | length) >= 4; .name += " ")
      | "\(.name) joined \((.memberTime/3600)) hours ago"'

Joe  joined 24 hours ago
Jane joined 0.5 hours ago
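As an alternative to the loop, the jq manual describes multiplying a string by a number as repeating that string, which can be used for padding. The sketch below is only one way to do it and rests on a few assumptions: the width of 4 is hard-coded, repeating a string zero times may yield null rather than an empty string depending on the jq version, and adding null to a string leaves the string unchanged (which is why the same expression works for names that are already long enough).

```shell
data='{"members": [
  {"name":"Joe Sample", "money":512, "memberTime": 86400},
  {"name":"Jane Ampel", "money":9001, "memberTime": 1800}
]}'

padded=$(echo "$data" | jq -r '.members[]
  | .name = (.name | split(" "))[0]
  # Append as many spaces as are missing to reach 4 characters.
  | .name = .name + (" " * (4 - (.name | length)))
  | "\(.name) joined \((.memberTime/3600)) hours ago"')

echo "$padded"
```

This should print the same two aligned lines as the until version. The loop is arguably easier to read, so pick whichever you prefer.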

The same string padding trick would not work for memberTime because it is a number: you can't add a string to a number. For that we have the tostring function, after which we can do the same:

$ echo '{"members": [
         {"name":"Joe Sample", "money":512, "memberTime": 86400},
         {"name":"Jane Ampel", "money":9001, "memberTime": 1800}
        ] }' | jq -r '.members[]
      | .memberTime = ((.memberTime / 3600) | tostring)
      | .name = (.name | split(" "))[0]
      | until((.name | length) >= 4; .name += " ")
      | until((.memberTime | length) >= 3; .memberTime += " ")
      | "\(.name) joined \(.memberTime) hours ago"'

Joe  joined 24  hours ago
Jane joined 0.5 hours ago

This is starting to look like a decent program! If we want to go much beyond this, it might be good to just move to Python, but so far it's still maintainable and I think understandable for regular programmers even if they don't know jq syntax specifically, so they could maintain and tweak it.

I think you are now ready to read the manual by yourself: most of its quirks that I ran into I have already covered and from here I'd mostly just rattle off more random functions that you can also simply find in the manual as you go.

More random functions

Figured I might as well mention the ones I used anyway :-)

The function sub, the way to do string replacement (substitution), works similarly to the until function (ahem, "loop"), with its arguments separated by a semicolon:

$ echo '{"A": "Floppy Disk"}' | jq -r '.A | sub("sk"; "ks")'

Floppy Diks

Substring (see why I don't like the sub abbreviation for a replace function?) works by giving a range in square brackets, similar to Python:

$ echo '{"A": "Floppy Disk"}' | jq -r '.A[0:4]'

Flop

Another thing I noticed can be useful is just defining a completely new JSON object and using the join function as a final step, but rather than cooking up another minimal example, I think it should be clear from the worked example if you made it this far, so let's move on to that.
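For anyone who would still like a minimal version first, here is a rough sketch of the idea using the member data from earlier. The field names first and money in the new object are my own choice for this illustration; join then glues the object's values together with a space.

```shell
data='{"members": [
  {"name":"Joe Sample", "money":512, "memberTime": 86400},
  {"name":"Jane Ampel", "money":9001, "memberTime": 1800}
]}'

summary=$(echo "$data" | jq -r '.members[]
  # Build a fresh object that holds only the (string) values we want.
  | {
      "first": (.name | split(" ")[0]),
      "money": (.money | tostring)
    }
  # Concatenate the values of the object, separated by a space.
  | join(" ")')

echo "$summary"
```

The values are converted to strings before joining so the example does not depend on how a particular jq version treats numbers in join.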

Extracting Jira work logs

So this is the culmination of figuring all this out: parsing Jira's JSON response into a nice table. We can now avoid using the web interface and it was a nice way to learn jq better, which I have been meaning to do for some time.

This is an excerpt of my bash script; let's call it getworklog:

issue="$1"
username=luc

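# Prompt on stderr so the prompt does not end up in the piped output.
read -r -s -p 'Password: ' password
echo >&2

# An array keeps each argument intact, even if the password contains spaces.
args=(-sS -H 'Content-Type: application/json' -u "$username:$password")

curl "${args[@]}" "https://jira.example.org/rest/api/2/issue/$issue/worklog" | jq -r '.worklogs[]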
    | {
        "firstname": .author.displayName | split(" ")[0],
        "started": (.started[0:16] | sub("T"; " ")),
        "timespent": ((.timeSpentSeconds/3600) | tostring[0:4]),
        "comment": ("h " + .comment)
    }
    | until((.firstname | length) >= 4; .firstname += " ")
    | until((.timespent | length) >= 4; .timespent += " ")
    | join(" ")'

Let's use it!

$ ./getworklog JRA-37

Jane 2020-05-11 14:30 1.66 h customer meeting
Joe  2020-05-12 09:00 2    h internal meeting
Luc  2020-05-12 09:00 2    h internal meeting
Jane 2020-05-12 09:00 2    h internal meeting
Luc  2020-05-12 13:00 4    h worked on ticket
Jane 2020-05-12 13:00 0.5  h reviewed work
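If you don't have a Jira server at hand, you can try the filter against a hand-written response. The JSON below is hypothetical: it contains only the fields the script reads (author.displayName, started, timeSpentSeconds, comment), and the timestamp format is an assumption on my part. The variable names response and table are also just for this sketch.

```shell
# Hypothetical, heavily trimmed worklog response with two entries.
response='{"worklogs": [
  {"author": {"displayName": "Jane Ampel"},
   "started": "2020-05-11T14:30:00.000+0000",
   "timeSpentSeconds": 6000, "comment": "customer meeting"},
  {"author": {"displayName": "Joe Sample"},
   "started": "2020-05-12T09:00:00.000+0000",
   "timeSpentSeconds": 7200, "comment": "internal meeting"}
]}'

# Same filter as in the script, fed from the variable instead of curl.
table=$(echo "$response" | jq -r '.worklogs[]
    | {
        "firstname": .author.displayName | split(" ")[0],
        "started": (.started[0:16] | sub("T"; " ")),
        "timespent": ((.timeSpentSeconds/3600) | tostring[0:4]),
        "comment": ("h " + .comment)
    }
    | until((.firstname | length) >= 4; .firstname += " ")
    | until((.timespent | length) >= 4; .timespent += " ")
    | join(" ")')

echo "$table"
```

With this input, the two lines should look like the first two rows of the table above.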

This output works well with other tools that we already know and love. You could paste this into LibreOffice Calc and it would recognize the fields so you can sum the hours, for example. Or just use awk. (You know and love both of those, right? Right?)

$ ./getworklog JRA-37 | awk '{sum+=$4} END{print sum/8}'

1.52

$ ./getworklog JRA-37 | grep -v meeting | awk '{sum+=$4} END{print sum/8}'

0.5625

1.5 person-days were spent on this ticket, only 0.6 of which were actual work getting done. Yikes! I wonder how much time we spend on meetings in total:

$ for i in {1..37}; do ./getworklog JRA-$i; done | awk '
    { total+=$4 }
    /meeting/ { meetingtime+=$4 }
    END { print meetingtime"/"total"="(meetingtime/total*100)"%" }'

1423/2174=65.4554%

Note that this is a fictional example and not a representation of the situation at my current employer :-)

Conclusion

jq is a versatile tool that fits right into your tool belt on the *nix command line. Finally, I'd like to mention an interactive jq tool that a friend mentioned, which might come in very handy when you're trying out jq syntax: https://sr.ht/~gpanders/ijq/.