Is it safe to use standard input & output with binary data?
I need to split a binary file into two. I was wondering whether head and/or tail could be used, but then I wondered: is it safe to use redirection, piping, etc. with binary data? Do newlines get messed about with, are nulls ignored, or do backspace or delete do something special? (bash, Kubuntu 18.04 LTS)
command-line 18.04 bash kubuntu
edited Dec 29 '18 at 12:58 by Ketan Patel
asked Dec 29 '18 at 12:55 by B.Tanner
1
Take a look at the split command.
– egmont
Dec 29 '18 at 22:11
2 Answers
Yes, it's safe if you pipe it to another process or save it to a file: pipes and redirection pass the bytes through unchanged. There is potential "weirdness" if you let binary output print to a terminal, since it can contain bytes that happen to form escape sequences, which can temporarily mess up the terminal display.
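A quick way to check this for yourself is sketched below. It builds a small file containing every possible byte value (including NUL, newline, backspace and DEL), sends it through a pipe and a redirection, and compares the result with the original. The directory and file names are placeholders chosen for the illustration.

```shell
#!/usr/bin/env bash
set -eu

tmpdir=$(mktemp -d)   # scratch directory; the names below are arbitrary

# Write one byte for each value 0..255.
# printf '%b' expands "\0ooo" (octal) into the corresponding byte.
i=0
while [ "$i" -le 255 ]; do
    printf '%b' "\\0$(printf '%o' "$i")"
    i=$((i + 1))
done > "$tmpdir/all-bytes.bin"

# Send the data through a pipe and then an output redirection.
cat "$tmpdir/all-bytes.bin" | cat > "$tmpdir/copy.bin"

# cmp exits with status 0 only if both files are byte-for-byte identical.
if cmp "$tmpdir/all-bytes.bin" "$tmpdir/copy.bin"; then
    echo "identical"
fi
```

If the shell or the pipe altered newlines, nulls or control characters, cmp would report the first differing byte instead.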
answered Dec 29 '18 at 13:05 by Eric Mintz
5
In which case you can type reset and press Enter to fix it.
– Baard Kopperud
Dec 29 '18 at 16:42
3
@BaardKopperud I thought I read somewhere about some corner cases where tset/reset wouldn't work.
– Xen2050
Dec 30 '18 at 0:04
@Xen2050 I don't know. The only case where that would happen is if some escape sequence changes the keyboard layout/encoding, so that typing reset<enter> does not actually send that sequence of characters as seen by the terminal...
– Bakuriu
Dec 30 '18 at 10:02
2
See also "Fix terminal after displaying a binary file" and "Why does the console need sometimes a reset after CTRL+C". As suggested in the first link, the stty sane; tput rs1 sequence of commands will do the trick in corner cases where reset does not work. Such cases, in addition to the one mentioned by Bakuriu, could include the width of the terminal in lines/columns or, I'm guessing, the settings related to serial communication (baud rate/parity).
– Sergiy Kolodyazhnyy
Dec 30 '18 at 10:34
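One way to avoid needing reset at all is never to send raw bytes to the terminal. The sketch below defines a hypothetical helper, show_bytes (not a standard command), which prints a hex dump through a pager when standard output is a terminal and passes the raw bytes through unchanged when it is a pipe or a file.

```shell
# show_bytes FILE
# Hypothetical helper: hex dump on a terminal, raw bytes everywhere else.
show_bytes() {
    if [ -t 1 ]; then
        # stdout is a terminal: show offsets (hex) and bytes (hex), never raw data
        od -A x -t x1 -v "$1" | ${PAGER:-less}
    else
        # stdout is a pipe or a file: binary data is safe here
        cat "$1"
    fi
}
```

If the display does get garbled anyway, the comments above suggest reset, or stty sane; tput rs1 when reset is not enough.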
The main problem with using commands like head or tail is that they are line-oriented by default and binary files are not. If binary files do have newlines in them, those bytes often do not represent the end of a line, and if they do, they may just be part of strings such as program messages or data fields. The GNU versions of both commands accept a -c option that counts bytes instead of lines.
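As a sketch of the byte-counting approach, the following splits a file in two at a fixed byte offset with head -c and tail -c, assuming the GNU coreutils versions that ship with Ubuntu. The sample file, the output names and the value of split_at are placeholders for the illustration.

```shell
#!/usr/bin/env bash
set -eu

tmpdir=$(mktemp -d)
input="$tmpdir/input.bin"
head -c 1000 /dev/urandom > "$input"   # 1000 bytes of sample binary data

split_at=400   # number of bytes that go into the first part (made-up value)

# First part: the first $split_at bytes.
head -c "$split_at" "$input" > "$tmpdir/part1.bin"

# Second part: everything from byte $split_at + 1 onwards
# ("tail -c +N" starts output at byte N, counting from 1).
tail -c "+$((split_at + 1))" "$input" > "$tmpdir/part2.bin"

# Joining the parts again must reproduce the original exactly.
cat "$tmpdir/part1.bin" "$tmpdir/part2.bin" | cmp - "$input" \
    && echo "parts rejoin to the original"
```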
If the data is structured in any way, then you have to take that into account when choosing split points so you don't break structures in the middle.
If you know the structure of the file, you can use a command such as
dd if=input-file of=output-file ...
with options (bs=, count=, skip=) to copy only a given number of blocks of a specific size, starting at a particular (incremented) offset into the file.
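A minimal sketch of the dd approach, assuming a file made of fixed-size records. The 512-byte block size and the three-block split point are made-up values; status=none is a GNU dd option that suppresses the transfer statistics.

```shell
#!/usr/bin/env bash
set -eu

tmpdir=$(mktemp -d)
input="$tmpdir/input.bin"
head -c 4096 /dev/urandom > "$input"   # sample data: 8 blocks of 512 bytes

block_size=512    # size of one record in bytes (assumed for this example)
first_blocks=3    # how many blocks go into the first output file

# Copy the first 3 blocks.
dd if="$input" of="$tmpdir/first.bin" bs="$block_size" count="$first_blocks" status=none

# Skip those 3 blocks and copy everything after them.
dd if="$input" of="$tmpdir/rest.bin" bs="$block_size" skip="$first_blocks" status=none

# The two outputs together must equal the input.
cat "$tmpdir/first.bin" "$tmpdir/rest.bin" | cmp - "$input" \
    && echo "split on a block boundary"
```

Because count= and skip= are expressed in blocks of bs= bytes, the split point always falls on a record boundary.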
The split command mentioned by @egmont will automate this process for you, but it is line-oriented by default, so you'll have to specify additional options such as --bytes count to tell it how large each piece of the file should be.
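For illustration, here is a sketch using GNU split with --bytes. The 1000-byte piece size and the piece. output prefix are arbitrary choices for the example.

```shell
#!/usr/bin/env bash
set -eu

tmpdir=$(mktemp -d)
input="$tmpdir/input.bin"
head -c 2500 /dev/urandom > "$input"   # 2500 bytes of sample binary data

# Cut the input into pieces of at most 1000 bytes each.
# split appends the suffixes aa, ab, ac, ... to the given prefix.
split --bytes=1000 "$input" "$tmpdir/piece."

ls "$tmpdir"/piece.*

# The shell expands the glob in sorted order, so cat restores the original.
cat "$tmpdir"/piece.* | cmp - "$input" && echo "pieces rejoin to the original"
```

With these sizes the last piece holds the remaining 500 bytes.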
As a side note, if you don't know what's in a file but suspect it contains at least some meaningful textual data, the strings command is a great way of taking a first look to see what you're dealing with.
strings -n 6 file | less
will find all runs of printable characters at least six characters in length and display them in a pager so they don't fly by on the terminal. Using a number a bit larger than the default of 4 characters helps eliminate tiny snippets of data that just happen to be printable but are not being used that way in the file.
If you later have to explore the file in more detail with a binary editor such as hexedit, you'll have some landmarks that point out where something interesting might be found. strings has an option, -t x, that precedes each printed string with its offset into the file in hexadecimal (o for octal, d for decimal) so you know where to find it later. Even very short files are a lot to deal with when you have to look at them character by character.
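A small sketch of the landmark idea, assuming strings from GNU binutils is installed. The sample file and its contents are invented for the example.

```shell
#!/usr/bin/env bash
set -eu

tmpdir=$(mktemp -d)
sample="$tmpdir/sample.bin"

# Invented sample: a 3-character run, NUL padding, and two longer strings
# separated by non-printable bytes.
printf 'abc\0\0hello world\0\1\2landmark-string\0' > "$sample"

# Print runs of at least 6 printable characters, each preceded by its
# offset in hexadecimal. "abc" is too short and is not listed; the other
# two strings start at offsets 5 and 13 (hex).
strings -n 6 -t x "$sample"
```

The reported offsets can then be used as jump targets in a hex editor.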
answered Jan 3 at 13:40 by Joe