How to combine strings from JSON values, keeping only part of the string?











I have this sample:

"name": "The title of website",
"sync_transaction_version": "1",
"type": "url",
"url": "https://url_of_website"

I want to get the following output:

"The title of website"    url_of_website

I need to remove the protocol prefix from the URL, so that only url_of_website is left (with no http in front).
The problem is that I'm not very familiar with making sed read multiple lines. Some research led me to https://unix.stackexchange.com/a/337399/256195, but I still can't produce the result.

The valid JSON object that I'm trying to parse is the Google Chrome Bookmarks file. Sample:



{
"checksum": "9e44bb7b76d8c39c45420dd2158a4521",
"roots": {
"bookmark_bar": {
"children": [ {
"children": [ {
"date_added": "13161269379464568",
"id": "2046",
"name": "The title is here",
"sync_transaction_version": "1",
"type": "url",
"url": "https://the_url_is_here"
}, {
"date_added": "13161324436994183",
"id": "2047",
"meta_info": {
"last_visited_desktop": "13176472235950821"
},
"name": "The title here",
"sync_transaction_version": "1",
"type": "url",
"url": "https://url_here"
} ]
} ]
}
}
}






text-processing sed json filter






edited Nov 29 at 20:20 by MatthewRock

asked Nov 29 at 14:48 by Tuyen Pham








  • 3




    Can you post a valid json object? Also jq or json is the proper tool for this, not sed.
    – Jesse_b
    Nov 29 at 14:49








  • 4




    You don't parse JSON with sed. JSON is a structured document format unsuitable for parsing by anything other than a JSON parser. Doing it with sed would require you to implement a JSON parser in sed that could handle the different entity encoding etc. that could be present in the data (especially in URLs).
    – Kusalananda
    Nov 29 at 14:51










  • @Jesse_b: Thanks, I've just added the JSON object. A solution using jq or json would also be fine if it solves the issue.
    – Tuyen Pham
    Nov 29 at 14:52










  • @Kusalananda: Thanks, I'll edit the title and change content to suit the context.
    – Tuyen Pham
    Nov 29 at 14:53














1 Answer
accepted






This works on the JSON document given in the question:

$ jq -r '.roots.bookmark_bar.children[]|.children[]|["\"\(.name)\"",.url]|@tsv' file.json
"The title is here"    https://the_url_is_here
"The title here"    https://url_here

This iterates over the .children array of each entry in the .roots.bookmark_bar.children array and, for each bookmark, creates a string formatted as shown in the question (with a tab character between the two pieces of data).

If the double quotes are not necessary, you could change the cumbersome ["\"\(.name)\"",.url] to just [.name,.url].

To trim the https:// off the URLs, use

.url|ltrimstr("https://")

instead of just .url.
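
Putting the pieces above together, a minimal sketch of the complete command could look like the following. The temporary file and the trimmed-down copy of the question's sample are assumptions made only so that the snippet is self-contained; in practice you would point jq at your own Bookmarks file.

```shell
# Assumption: a cut-down copy of the sample from the question, written to a
# temporary file so that the example can be run on its own.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{
  "roots": {
    "bookmark_bar": {
      "children": [ {
        "children": [
          { "name": "The title is here", "type": "url", "url": "https://the_url_is_here" },
          { "name": "The title here", "type": "url", "url": "https://url_here" }
        ]
      } ]
    }
  }
}
EOF

# Quote the name, strip a leading "https://" from the URL, and join the two
# fields with a tab via @tsv.
out=$(jq -r '.roots.bookmark_bar.children[] | .children[]
             | ["\"\(.name)\"", (.url | ltrimstr("https://"))] | @tsv' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

Note that ltrimstr only removes the exact prefix it is given, so a URL starting with http:// would pass through unchanged here.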






edited Nov 29 at 15:22

answered Nov 29 at 15:03 by Kusalananda












  • Thanks. At the end of the file I get this error: jq: error (at Bookmarks:23397): Cannot iterate over null (null). 23397 is the last line of the file.
    – Tuyen Pham
    Nov 29 at 15:08










  • So I've just modified your command; the correct one should be: jq -r '.roots.bookmark_bar.children[]|.children[]?|["\"\(.name)\"",.url]|@tsv', which eliminates the above error. One more question: is that a space or a tab between the title and the URL? What if I need to insert a tab between them?
    – Tuyen Pham
    Nov 29 at 15:17






  • 1




    @TuyenPham, it's a tab. "@tsv" is a jq formatter for tab-separated values. You could also use @csv to get output like "The title here","https://url_here"
    – glenn jackman
    Nov 29 at 15:20












  • @TuyenPham I only had the partial document that you provided to look at, so no wonder there were errors. Good work sorting them out! The @tsv command formats the array that it gets as a tab-delimited string.
    – Kusalananda
    Nov 29 at 15:20












  • How to trim both http:// and https://?
    – Tuyen Pham
    Nov 29 at 15:26
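
The last question in the comments can probably be handled by chaining two ltrimstr calls, combined with the .children[]? form that avoids the "Cannot iterate over null" error. This is only a sketch: the sample data below (a bookmark sitting directly on the bar, plus a folder holding one http:// and one https:// entry) is hypothetical and just stands in for a real Bookmarks file.

```shell
# Hypothetical sample: the first entry has no "children" key, which is the
# situation that triggers "Cannot iterate over null" without the "?".
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{
  "roots": {
    "bookmark_bar": {
      "children": [
        { "name": "Directly on the bar", "type": "url", "url": "https://top_level_url" },
        { "name": "A folder", "type": "folder", "children": [
          { "name": "Plain HTTP", "type": "url", "url": "http://plain_url" },
          { "name": "The title here", "type": "url", "url": "https://url_here" }
        ] }
      ]
    }
  }
}
EOF

# .children[]? silently skips entries that have no .children array.
# The two ltrimstr calls remove either "https://" or "http://" if present.
out=$(jq -r '.roots.bookmark_bar.children[] | .children[]?
             | ["\"\(.name)\"", (.url | ltrimstr("https://") | ltrimstr("http://"))]
             | @tsv' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

As written, this only descends one folder level, like the command in the answer, so the bookmark that sits directly on the bar is not printed. With a jq build that includes regular-expression support, sub("^https?://"; "") could replace the two ltrimstr calls.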



















