(except I have it aliased to "jq-structure" locally of course. also, if there's a new fancy way to do this, I'm all ears; I've been using this alias for like... almost a decade now :/)
In the spirit of trying out jqfmt, let's see how it formats that one-liner...
Not bad! Shame that jqfmt doesn't output a newline at the end, though. The errant `%` is zsh's partial line marker. Also, `-ob -ar -op pipe` seems like a pretty good set of defaults to me - I would prefer that over it (seemingly?) not doing anything with no flags. (At least for this sample snippet.)
For small problem sizes, you can get a nontrivial improvement by moving the unique up ahead of all the string manipulation:
jq -r '[path(..)|map(if type=="number" then "[]" end)]|unique[]|join(".")/".[]"|"."+join("[]")'
For larger problem sizes, you might enjoy this approach to avoid generating the array of all paths as an intermediate, instead producing a deduped shadow structure as you go along:
jq -rn --stream 'reduce (inputs|select(.[1])[0]|map(if type=="number" then "[]" end)) as $_ (.; setpath($_; 1))|path(..)|join(".")/".[]"|"."+join("[]")'
(Note that in either case, you still run yourself into a bit of trouble with fields named "[]", as well as field names with "." in them. I assume this is not a serious issue, since you're only ever looking at this interactively.)
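For anyone who'd rather read the idea than the jq, here is a rough Python sketch of what these one-liners compute: every path in the document, with array indices collapsed to `[]`, de-duplicated as it goes. The name `structure_paths` and the sample document are made up for illustration, and the result is sorted as plain strings, which may not match jq's `unique` ordering. It has the same blind spot as noted above for keys named "[]" or containing ".".

```python
import json

ARRAY = object()  # sentinel standing in for "any array index"


def structure_paths(doc):
    """Return the de-duplicated path shapes of a parsed JSON value."""
    shapes = set()

    def visit(node, path):
        # Record the shape first, then descend; the set does the de-duplication.
        shapes.add(path)
        if isinstance(node, dict):
            for key, value in node.items():
                visit(value, path + (key,))
        elif isinstance(node, list):
            for item in node:
                visit(item, path + (ARRAY,))

    def render(path):
        text = "".join("[]" if part is ARRAY else "." + part for part in path)
        # The root (and paths starting at a top-level array) still get a leading dot.
        return text if text.startswith(".") else "." + text

    visit(doc, ())
    return sorted({render(path) for path in shapes})


if __name__ == "__main__":
    sample = json.loads('{"items": [{"id": 1, "tags": ["a", "b"]}, {"id": 2}]}')
    print("\n".join(structure_paths(sample)))
```

On the sample this prints `.`, `.items`, `.items[]`, `.items[].id`, `.items[].tags` and `.items[].tags[]`, one per line.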
Not anywhere near as sophisticated as yours, but I have something vaguely similar for simplifying JSON documents (while also maintaining what the data looks like) for feeding to LLMs to help them code against:
jq 'walk(if type == "array" then (if length > 0 then [.[0]] else . end) else . end)'
So that 70,000+ line Amazon example of yours would boil down to:
Oh wow, that's fantastic. I love that it includes real values while still summarizing the doc's structure. I'm going to steal that. I'll probably keep jq-structure around because it's so easy to copy/paste paths I'm looking for, but yours is definitely better for understanding what the JSON doc actually contains.
Got a bit nerd-sniped here, but first of all we can reduce `if A then B else . end` === `if A then B end` since jq 1.7:
jq 'walk(if type == "array" then (if length > 0 then [.[0]] end) end)'
Now we could contract those conditionals:
jq 'walk(if type == "array" and length > 0 then [.[0]] end)'
but it turns out we can even more usefully express `if length > 0 then [.[0]] end` === `[limit(1; .[])]` == `.[:1]`:
jq 'walk(if type == "array" then .[:1] end)'
From here, we can golf it a little further (this is kind of a generic type-matching pattern):
jq 'walk(arrays[:1] // .)'
although this does incur a bit more overhead than checking type directly.
Speaking of overhead, though, it turns out that the implementation of walk/1 (https://github.com/jqlang/jq/blob/master/src/builtin.jq#L212) will actually run the filter on every element of an array, even though we're about to throw most of them out, which we can eliminate by writing the recursion explicitly:
jq 'def w: if type=="array" then [limit(1; .[]|w)] elif type=="object" then .[] |= w end; w'
which gets the operation down from ~200 ms on my machine (not long enough to really get distracted, but enough to feel the wait) to a perceptually instant ~40 ms (which is mostly just the cost of reading the input). Now we can golf it down a little more:
jq 'def w: if type=="array" then [limit(1; .[]|w)] else objects[] |= w end; w'
jq 'def w: (arrays[:1]|map(w)) // (objects[] |= w); w'
(the precedence actually allows us to eliminate the parens here...)
jq 'def w: arrays |= .[:1]|iterables[] |= w; w'
And, inaccessibility of the syntax aside, I think this does an incredible job of expressing the essence of what we're trying to do: we trim any array down to its first element, and then recursively apply the same transformation throughout the structure. jq is a very expressive language, it just looks like line noise...
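A rough Python sketch of the same point, for readers who find the jq hard to parse: trimming each array before recursing touches far fewer nodes than recursing first and trimming afterwards, which is the behaviour described above for `walk`. Both function names and the visit counter are illustrative assumptions, not anything from jq itself.

```python
def trim_then_recurse(node, counter):
    """Keep only the first element of every array, then recurse into what is left."""
    counter[0] += 1
    if isinstance(node, list):
        return [trim_then_recurse(item, counter) for item in node[:1]]
    if isinstance(node, dict):
        return {key: trim_then_recurse(value, counter) for key, value in node.items()}
    return node


def recurse_then_trim(node, counter):
    """Bottom-up variant: every child is processed before the trim is applied."""
    counter[0] += 1
    if isinstance(node, list):
        return [recurse_then_trim(item, counter) for item in node][:1]
    if isinstance(node, dict):
        return {key: recurse_then_trim(value, counter) for key, value in node.items()}
    return node


if __name__ == "__main__":
    # Hypothetical document: 1000 rows, each with a small nested array.
    doc = {"rows": [{"id": i, "tags": ["x", "y", "z"]} for i in range(1000)]}
    fast, slow = [0], [0]
    assert trim_then_recurse(doc, fast) == recurse_then_trim(doc, slow)
    print("nodes visited:", fast[0], "vs", slow[0])
```

Both functions return the same trimmed document; only the amount of work differs.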
Oh, fantastic. jq has become an integral part of work for me.
I'll use this opportunity to plug the one-liner I use all the time, which summarizes the "structure" of a doc in a jq-able way: https://github.com/stedolan/jq/issues/243#issuecomment-48470... (I didn't write it, I'm just a happy user)
For example:
I'm a long time user of this snippet as well. I discovered fastgron [0] last year and found it convenient for some situations!
[0] https://github.com/adamritter/fastgron
So that 70,000+ line Amazon example of yours would boil down to: .. which is easier/cheaper to feed to an LLM for getting it to write code to process, etc. than the multi-megabyte original.
Hat off.-
PS. Also, if I may, thanks for the walkthrough - I'd be clapping with just the short form at the end, but the reasoning is appreciated.-
This is an incredibly useful one-liner. Thank you for sharing!
I'm a big fan of jq, having written my own jq wrapper that supports multiple formats (github.com/jzelinskie/faq), but these days I find myself more quickly reaching for Python when I get any amount of complexity. Being able to use uv scripts in Python has considerably lowered the bar for me to use it for scripting.
Where are you drawing the line?
Hmm. I stick to jq for basically any JSON -> JSON transformation or summarization (field extraction, renaming, etc.). Perhaps I should switch to scripts more. uv is... such a game changer for Python, I don't think I've internalized it yet!
But as an example of about where I'd stop using jq/shell scripting and switch to an actual program... we have a service that has task queues. The number of queues for an endpoint is variable, but enumerable via `GET /queues` (I'm simplifying here of course), which returns e.g. `[0, 1, 2]`. There was a bug where certain tasks would get stuck in a non-terminal state, blocking one of those queues. So, I wanted a simple little snippet to find, for each queue, (1) which task is currently executing and (2) how many tasks are enqueued. It ended up vaguely looking like:
which ends up producing output like (assuming queue 0 was blocked)
I think this is roughly where I'd start to consider "hmm, maybe a proper script would do this better". I bet the equivalent Python is much easier to read and probably not much longer.
Although, I think this example demonstrates how I typically use jq, which is like a little multitool. I don't usually write really complicated jq.
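For comparison, a rough Python sketch of the per-queue summary described above. This is not the original snippet: the task records, their `id` and `state` fields, the state names, and the `fetch_tasks` callable are all hypothetical stand-ins, since the real API isn't shown here.

```python
def summarize_queues(queue_ids, fetch_tasks):
    """For each queue, report the executing task (if any) and the enqueued count."""
    summary = {}
    for queue_id in queue_ids:
        tasks = fetch_tasks(queue_id)
        # First task in the (assumed) "executing" state, or None if the queue is idle.
        executing = next((t["id"] for t in tasks if t["state"] == "executing"), None)
        enqueued = sum(1 for t in tasks if t["state"] == "enqueued")
        summary[queue_id] = {"executing": executing, "enqueued": enqueued}
    return summary


# Hypothetical in-memory stand-in for the service; queue 0 plays the blocked one.
FAKE_TASKS = {
    0: [
        {"id": "t-17", "state": "executing"},
        {"id": "t-18", "state": "enqueued"},
        {"id": "t-19", "state": "enqueued"},
    ],
    1: [{"id": "t-20", "state": "executing"}],
    2: [],
}

if __name__ == "__main__":
    for queue_id, info in summarize_queues([0, 1, 2], FAKE_TASKS.__getitem__).items():
        print(queue_id, info)
```

In a real script `fetch_tasks` would wrap whatever HTTP call lists a queue's tasks; passing it in as a callable keeps the summarizing logic testable without the service.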
I could Google it, but tell me a bit more about uv scripts. Isn't uv a package manager like pip?
uv has a feature where you can put a magic comment at the top of a script and it will pull all the dependencies into its central store when you do “uv run …”. And then it makes a special venv too I think? That part’s cloudier.
https://docs.astral.sh/uv/guides/scripts/
Makes it a snap to have a one file python script without having to explicitly pip install requests or whatever into a venv.
Example usage for those who haven't seen it yet:
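A minimal sketch of what such a script can look like, assuming the inline metadata format from the uv guide linked above. The file name `example.py`, the `requests` dependency, and the helper `top_level_keys` are just illustrative choices.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "requests",
# ]
# ///
"""Hypothetical example: fetch a JSON document and print its top-level keys.

Run with: uv run example.py https://example.com/data.json
"""
import sys


def top_level_keys(doc):
    """Return the sorted keys of a JSON object, or an empty list for anything else."""
    return sorted(doc) if isinstance(doc, dict) else []


def main(url):
    # Imported here so the helper above stays usable without the dependency;
    # `uv run` provides it based on the metadata block at the top.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    print("\n".join(top_level_keys(response.json())))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The comment block at the top is the "magic comment": `uv run` reads it and sets up an environment with those dependencies before running the file.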
May I also add this ain't a mere one liner. It's a masterclass!
this is a super useful oneliner, immediately saved to my bash profile as `jqstructure`
> Side note: Ever tried Googling for "jq formatter"? Reading search results is a nightmare since jq itself is, among other things, a formatter.
That’s what I thought too, when I read the title. To clarify: This tool formats jq commands, not JSON itself.
Which makes sense because jq, with no options, acts as a formatter by default. (it's about 50% of my jq usage).
While it doesn’t help much for search in this case, the more specific term is “pretty-printer”.
If you need to format your one-liner, maybe it shouldn't be a one liner?
Anyway, whether or not this tool is advisable, it's definitely cool, nice work!
My prototype one-liners usually turn into Go programs :)
Sic semper :)
> If you need to format your one-liner, maybe it shouldn't be a one liner?
Entirely correct, this point.-
PS. May I also appreciate your comment, as far as form? You made both, valid, points.-
Instead of making users enable every formatting rule explicitly e.g.
It would be better if the tool enabled a common set of rules by default, so that `echo ... | jqfmt` actually did something useful :)
Stop naming products after falling silverware.
jq is sed for JSON data. gofmt is a Go source formatter. jqfmt is like gofmt, a Go source formatter, but for JSON. So jqfmt is really a JSON beautifier...
Anyone with an ASR-33 for sale? rq?
God I really abhor jq and it seems it's becoming a standard. I dislike it cause I'm too dumb to correctly dredge up its incantations, and once a year I have to go reading their arcane docs. I suppose it's another fertile ground for LLM use.
The bad news is that much like how "I'm just going to DSL this ..." inevitably morphs into a full-blown programming language[1], so too is the ubiquitous "gah, your language is too complex, I'm going to just use this other tool that implements my favorite 10% of the cases"
which is a long way of saying: or else what? There's 100% no way that I'm going to ever, ever use <<python3 -c "import json, sys; print(json.load(sys.stdin)[...ohgawd...]">> and if you are, then more power to ya and jq apparently doesn't solve a problem you have
1: https://www.laws-of-software.com/laws/zawinski/
It's a pretty good on/off-ramp into better tools. Going from arbitrary slop to something that's a reasonable input to `nixlang` or Dhall is pure win IMHO.
I get a lot of use out of `jq` even though I prefer sounder systems to JSON.
What would "non-arcane" jq docs look like? I'm kind of in the same boat, being an infrequent jq user, but I've generally found the docs pretty easy to navigate.
Hey. Don't hate on jq too much. It's a backdoor way to get functional programming past people's mental perceived complexity forcefields.
A standard for what? It just makes JSON look nicer and more query-able. You don't have to use it.
A standard as in there is a cottage industry of tools and websites built around it now, like this one.
Given the choice between a hypothetical standard that nobody wrote (or implemented) and a tool that organically grew complex enough to benefit from a standard, I'd rather have the latter.
Users (i.e. not implementors) usually also don't read the standard – they read the docs (ideally containing lots of examples on top of a dry enumeration of options), or today indeed ask an LLM.
Been using fx (fx.wtf) as an alternative to jq recently.
Gives you a nice JavaScript interface to do similar types of processing to what I would do with jq.
if you like fx, then you'll love https://jless.io/
I thought this title was rot13 at first. :D
Gubhtug V jnf gur bayl bar :)
PS. Honestly, it's pretty close.-
jq is convenient, but I don't see the draw in building data processing pipelines on it. It's like writing complex software in shell.
Recently, I found myself wanting to do a join by filename on two sets of about 300,000 files. Tried bashing my head against jq with INDEX and various tricks and couldn't get the runtime below minutes.
Then I just gave up, fired up Python, loaded the dataset into Pandas, and did a join. Completed too fast to notice.
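A minimal sketch of the kind of join described above, using a plain dict as a hash index instead of Pandas (a swap made here so the example needs only the standard library). The record layout and the `filename` key are assumptions for illustration. The index is built once and each left-hand record then costs one lookup, so the work grows roughly linearly with the number of records.

```python
def join_by_filename(left, right):
    """Inner-join two lists of records on their 'filename' key."""
    # Build the lookup table once: filename -> all right-hand records with that name.
    index = {}
    for record in right:
        index.setdefault(record["filename"], []).append(record)

    # One dictionary lookup per left-hand record; unmatched records are dropped.
    return [
        (left_record, right_record)
        for left_record in left
        for right_record in index.get(left_record["filename"], [])
    ]


if __name__ == "__main__":
    # Hypothetical records standing in for the two sets of files.
    sizes = [{"filename": "a.txt", "size": 1}, {"filename": "b.txt", "size": 2}]
    hashes = [{"filename": "b.txt", "sha": "x"}, {"filename": "c.txt", "sha": "y"}]
    for pair in join_by_filename(sizes, hashes):
        print(pair)
```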
Hey, author here. see also, sol: a de-minifier (formatter, exploder, beautifier) for shell one-liners
https://github.com/noperator/sol
I actually wrote jqfmt because I needed it for sol :)
Your explanation of the *many* "meanings" of "sol" is gold :)
PS. Happily, featured here:
- https://news.ycombinator.com/item?id=41556088