\immediate\write with plain text
I have read the related question on TeX.SE, but I don't want the user to have to add ^^J
manually. That is, I want the writer to output the content exactly as it appears in the source.
\documentclass{article}
\begin{document}
\newwrite\file
\immediate\openout\file=tmp.txt
\immediate\write\file{
To be or not to be,
that is % the question
}
\closeout\file
\end{document}
It should output
To be or not to be,
that is % the question
Here's my source code. At first I used Python to extract the content from the .tex
file. I am now refactoring this into something simpler that lets LaTeX itself write out the code verbatim, which is how I ran into this question.
Sorry about my poor expression :P
Thanks a lot!
macros external-files write
Why don't you use the filecontents environment?
– Phelype Oleinik
2 days ago
Well, I don't know this environment, I will try after dinner. Thanks a lot :)
– Iydon
2 days ago
You seem to wish to have linebreaks at the beginning and at the end of the argument removed instead of having them written to file. What behavior do you wish in case the argument of the \write-command is empty or does contain only a single line-break, i.e., \immediate\write\file{} or \immediate\write\file{<line-break>}?
– Ulrich Diez
yesterday
edited 2 days ago by Phelype Oleinik
asked 2 days ago by Iydon
2 Answers
Accepted answer (7 votes):
The LaTeX kernel provides the filecontents environment to write to external files without having to worry about catcodes and such. The filecontents package makes minimal changes to this environment: it allows it to be used anywhere in the document (LaTeX's version can only be used in the preamble, for some reason) and allows it to overwrite existing files, which is also disabled in LaTeX's version.
To produce
To be or not to be,
that is % the question
you use:
\documentclass{article}
\usepackage{filecontents}
\begin{document}
\begin{filecontents*}{tmp.txt}
To be or not to be,
that is % the question
\end{filecontents*}
\end{document}
The starred version (filecontents*) omits the heading that is printed in the standard version of the environment:
%% LaTeX2e file `tmp.txt'
%% generated by the `filecontents' environment
%% from source `test' on 2018/11/20.
%%
To be or not to be,
that is % the question
An addendum on my (admittedly lazy) answer:
If you want to persist in reinventing the wheel (which is much more fun, I must admit), then you can create a command to take care of the \catcode-ing for you. Here I provide an ad hoc implementation of a \verbwrite command which does the job for you.
The command syntax is somewhat like LaTeX's \verb: you can use either \verbwrite\file{<stuff>} or \verbwrite\file|<stuff>|. For the latter syntax, any character other than { can be used to delimit the contents. This character, obviously, can't appear in <stuff>. The advantage of the second syntax is that you don't have any restriction on balancing { and } inside the contents of the command.
\documentclass{article}
\makeatletter
\long\def\@ifnextchar@other@space#1#2#3{%
  \let\reserved@d=#1%
  \def\reserved@a{#2}%
  \def\reserved@b{#3}%
  \futurelet\@let@token\@ifnch@other}
\let\kernel@ifnextchar\@ifnextchar
\def\@ifnch@other{%
  \ifx\@let@token\other@sptoken
    \let\reserved@c\@xifnch@other
  \else
    \ifx\@let@token\reserved@d
      \let\reserved@c\reserved@a
    \else
      \let\reserved@c\reserved@b
    \fi
  \fi
  \reserved@c}
{\catcode`\ =12
{\global\let\other@sptoken= }%
\gdef\@xifnch@other {\futurelet\@let@token\@ifnch@other}}%
\def\verbwrite{\@ifstar
  {\let\@ifnextchar\@ifnextchar@other@space\verbwrite@grab}%
  \verbwrite@grab}
\def\verbwrite@grab#1{%
  \begingroup
    \catcode`\^^M=13
    \newlinechar`\^^M
    \let\do\@makeother \dospecials
    \catcode`\{=1
    \@ifnextchar\bgroup
      {%
        \catcode`\}=2
        \verbwrite@brace{#1}%
      }%
      {%
        \catcode`\{=12
        \verbwrite@other{#1}%
      }%
  }
\def\verbwrite@brace#1#2{%
    \immediate\write#1{\unexpanded{#2}}%
  \endgroup
}
\def\verbwrite@other#1#2{%
    \def\verbwrite@delim##1##2#2{%
      \immediate\write##1{\unexpanded{##2}}%
    \endgroup
    }%
    \verbwrite@delim#1%
}
\makeatother
\begin{document}
\newwrite\file
\immediate\openout\file=tmp.txt
\tracingall
\verbwrite*\file{To be or not to be,
that is % the question}
\verbwrite\file|To be or not to be,
that is } the {question|
\verbwrite\file$être ou ne pas être,
вот в чем вопрос$
\verbwrite\file}être ou ne pas être,
вот в чем вопрос}
\closeout\file
\end{document}
Please beware that I took 63 minutes to write this command, so it is certainly not what you can call robust. Proceed with care :)
Fix 1: Prevent expansion of the text using ε-TeX's \unexpanded (thanks to jfbu :)
Fix 2: Prevent premature tokenization of the delimiter (thanks again to jfbu :)
Feature 1: Added a starred version that ignores spaces before the delimiter of the verbatim content.
Fix 3: Actually allow } as an "other" delimiter (\verbwrite\file}stuff}) (thanks to Ulrich Diez :)
Thank you very much :)
– Iydon
2 days ago
Try this with être ou ne pas être. You will need to add \usepackage[T1]{fontenc} (assuming pdflatex here). And this will only fix those benign letters; then add a Unicode letter, from Cyrillic for example. On the other hand, the filecontents* environment does not have that problem... even for быть или не быть
– jfbu
2 days ago
@jfbu It was creating content! The code has developed consciousness! Thanks for the warning, I hadn't realised that :-)
– Phelype Oleinik
2 days ago
Keep in mind, it could be worse if @egreg was around, and allow me another remark: try it with & or $ as delimiters...
– jfbu
2 days ago
@jfbu Oops, that one was bad. Hopefully fixed now. Thanks :) And yes, egreg would most certainly spot a missing % ;)
– Phelype Oleinik
2 days ago
Answer (2 votes):
I suggest using the filecontents*-environment.
Be aware that there is also a LaTeX 2ε-package filecontents which does remove some of the limitations that come along with the filecontents*-environment from the LaTeX 2ε-kernel.
If you are in the mood for reinventing the wheel, you can write a macro which does
- switch to verbatim-catcode-régime,
- switch the catcode of the endlinechar (usually ^^M/ASCII-Return) to 12 so that ASCII-return is treated like digits and punctuation-marks,
- read and tokenize under that catcode-régime the argument containing the text that is to be written to file,
- trim leading and trailing endline-chars from that text,
- write the text to file while having \endlinechar also as \newlinechar.
In (La)TeX there are several stages of processing input.
(La)TeX does read TeX-input, e.g., a .tex-input-file, line by line.
In the pre-processing-stage, the single characters that form the line will be converted to (La)TeX's internal character-encoding. (With old-school (La)TeX engines, the internal character-encoding is ASCII. With engines based on XeTeX or LuaTeX, the internal character-encoding is utf-8, of which ASCII is a subset.) Then all space-characters (code-point-number 32 both in ASCII and in utf-8, i.e., in all encodings that can serve as the internal character-encoding of a (La)TeX engine) that occur at the right end of the line will be removed. Then a character will be inserted at the right end of the line whose code-point-number in (La)TeX's internal character-encoding corresponds to the value of the integer-parameter \endlinechar. Usually the value of \endlinechar is 13, and code-point-number 13 denotes the ⟨RETURN⟩-character both in ASCII and in utf-8. This means: Usually a ⟨RETURN⟩-character gets inserted at the right end of the line.
When this is done, the tokenizing-stage begins: In this stage (La)TeX takes the characters that form the line as instructions for placing tokens into the token-stream. This is the stage when things start to be about so-called tokens, e.g., control-sequence-tokens (which come in two flavors: control-word-tokens and control-symbol-tokens) and character-tokens. A character-token consists of a character-code, denoting the code-point-number in (La)TeX's internal character-encoding, and a category-code. Category-codes make it possible for characters to have special meanings for the (La)TeX-engine. E.g., the category-code of the backslash-character usually is 0 (escape). A character whose category-code is 0 at tokenizing-time causes (La)TeX to gather the name of a control-sequence-token and afterwards place that control-sequence-token into the token-stream. Likewise, the category-code of the opening curly brace usually is 1 (begin grouping) and the category-code of the closing curly brace usually is 2 (end grouping). Character-tokens of category-code 1 (begin grouping) are used for opening groups (i.e., macro arguments consisting of several tokens, local scopes for assignments like macro-definitions, or the ⟨balanced text⟩ of things like \scantokens), and character-tokens of category-code 2 (end grouping) are used for closing them. More information about category-codes can be found at https://en.wikibooks.org/wiki/TeX/catcode.
After tokenizing, there is a "stream of tokens". Processing the stream of tokens includes things like expansion of expandable tokens (e.g., macro-tokens or expandable primitives like \string or \csname...\endcsname) and (later) carrying out assignments, creating boxes etc.
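The values mentioned above can be inspected directly. The following is only a minimal sketch, assuming a standard LaTeX format with its default settings; it prints a few category codes and the two integer parameters to the terminal and the log file. The "usually" comments repeat the defaults described above and are not guaranteed for every format or package combination.

```latex
\documentclass{article}
\begin{document}
% Print a few category codes and integer parameters to the terminal and log.
\typeout{catcode of backslash: \the\catcode`\\}% usually 0 (escape)
\typeout{catcode of opening brace: \the\catcode`\{}% usually 1 (begin grouping)
\typeout{catcode of closing brace: \the\catcode`\}}% usually 2 (end grouping)
\typeout{catcode of RETURN: \the\catcode`\^^M}% usually 5 (end of line)
\typeout{catcode of percent: \the\catcode`\%}% usually 14 (comment)
\typeout{endlinechar: \the\endlinechar}% usually 13
\typeout{newlinechar: \the\newlinechar}% depends on the format
Nothing to see here; look at the terminal or the log file.
\end{document}
```

Changing a \catcode before the relevant characters are tokenized is what the verbatim-style approaches in this thread rely on.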
When reading and tokenizing a .tex-input-file, (La)TeX will, during the pre-processing-stage, remove spaces at every line-ending and insert an endline-character at every line-ending.
Therefore the input-sequence
\immediate\write\file{
To be or not to be,
that is % the question
}
will by (La)TeX at tokenizing-time, i.e., after pre-processing, be treated as
\immediate\write\file{⟨character due to endline-char-insertion⟩
To be or not to be,⟨character due to endline-char-insertion⟩
that is % the question⟨character due to endline-char-insertion⟩
}⟨character due to endline-char-insertion⟩
Usually the endline-character is ^^M, i.e., ⟨RETURN⟩.
Thus the above input-sequence usually will by (La)TeX at tokenizing-time be treated as
\immediate\write\file{⟨^^M/RETURN-character⟩
To be or not to be,⟨^^M/RETURN-character⟩
that is % the question⟨^^M/RETURN-character⟩
}⟨^^M/RETURN-character⟩
(Which tokens (La)TeX will insert into the token-stream when encountering a ⟨^^M/RETURN-character⟩ depends on the category-code which at the time of tokenizing is assigned to the ⟨^^M/RETURN-character⟩.
Usually the category-code of the ⟨^^M/RETURN-character⟩ is 5 (end of line). In that case, depending on the state of (La)TeX's reading apparatus, either no token at all (in state S = skipping blanks), or a space-token (in state M = in the middle of a line; a space-token is a character-token of category-code 10 (space) and character-code 32, the number of the space-character in (La)TeX's internal character-encoding), or a \par-token (in state N = about to begin a new line) will be inserted.
In case category-code 12 (other) is assigned to the ⟨^^M/RETURN-character⟩, (La)TeX will insert a character-token of category-code 12 (other) and character-code 13 (the number of the ⟨RETURN⟩-character in (La)TeX's internal character-encoding) into the token-stream. Such a token can be processed like any other character-token.)
Besides this, (La)TeX will, at writing-time, in any case append to the argument of a \write-command the sequence of characters/bytes that on the platform in use serves for ending lines within plain text files.
Thus, assuming that we managed to have LaTeX accept the percent-char as an ordinary character, the \write-command will get something like:
⟨token due to ^^M/RETURN-character⟩To be or not to be,⟨token due to ^^M/RETURN-character⟩that is % the question⟨token due to ^^M/RETURN-character⟩
At writing-time, a ⟨platform-dependent sequence for ending the line⟩ will be appended.
If the category-code of the endline-character/of the ⟨^^M/RETURN-character⟩ was 5 (end of line) at the time of tokenizing the input, the sequence
⟨space⟩To be or not to be,⟨space⟩that is % the question⟨space⟩⟨platform-dependent sequence for ending the line⟩
will be written to the external file.
If the category-code of the endline-character/of the ⟨^^M/RETURN-character⟩ was 12 (other) at the time of tokenizing the input, the sequence
^^MTo be or not to be,^^Mthat is % the question^^M⟨platform-dependent sequence for ending the line⟩
will be written to the external file.
You can ensure that at writing-time a ⟨^^M/RETURN-character⟩ also yields the ⟨platform-dependent sequence for ending the line⟩ by assigning the integer-parameter \newlinechar the value of the integer-parameter \endlinechar.
If you do this as well, the sequence
⟨platform-dependent sequence for ending the line⟩To be or not to be,⟨platform-dependent sequence for ending the line⟩that is % the question⟨platform-dependent sequence for ending the line⟩⟨platform-dependent sequence for ending the line⟩
will be written to the external file.
But this way you might get undesired empty lines.
Therefore you may wish to apply a routine for removing leading and trailing ⟨characters due to endline-char-insertion⟩ from the entire argument before letting \write do the writing-job.
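The interplay of \endlinechar and \newlinechar described above can be tried in isolation. What follows is a minimal sketch, not the full solution: the stream name \demofile and the file name demo.txt are made up for illustration, the percent-char is left alone (so the text must not contain one), and no trimming of leading or trailing line ends takes place. The text therefore starts right after the opening brace and ends right before the closing one.

```latex
\documentclass{article}
\begin{document}
\newwrite\demofile % hypothetical stream name
\immediate\openout\demofile=demo.txt % hypothetical file name
\begingroup
  \catcode`\^^M=12 % line ends are now tokenized as ordinary "other" characters
  \newlinechar=\endlinechar % at writing-time, such a character ends the line
  \immediate\write\demofile{To be or not to be,
that is the question}%
\endgroup % restores the catcode of ^^M and the value of \newlinechar
\immediate\closeout\demofile
Done.
\end{document}
```

Under these assumptions demo.txt should contain the two lines as typed. The comment characters at the line ends inside the group matter: without them, the line ends would themselves become catcode-12 characters in the token-stream.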
A coding-example could look like this:
\documentclass{article}
\makeatletter
\begingroup
\catcode`\^^M=12\relax%
\@firstofone{%
\endgroup%
\newcommand*\gobbleendl{}\def\gobbleendl ^^M{}%
\newcommand\trimendls[2]{\innertrimleadendl{#2}#1^^M\relax{#1}}%
\newcommand*\innertrimleadendl{}%
\def\innertrimleadendl#1#2^^M#3\relax#4{%
\ifx\relax#2\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi%
{%
\ifx\relax#4\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi%
{\trimtrailendl{}{#1}}%
{\expandafter\trimtrailendl\expandafter{\gobbleendl#4}{#1}}%
}%
{\trimtrailendl{#4}{#1}}%
}%
\newcommand*\trimtrailendl[2]{%
\innertrimtrailendl{#2}.#1\relax.^^M\relax.\relax\relax{#1}%
}%
\newcommand*\innertrimtrailendl{}%
\def\innertrimtrailendl#1#2^^M\relax.#3\relax\relax#4{%
\ifx\relax#3\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi%
{\def\@tempa{#4}}%
{\expandafter\def\expandafter\@tempa\expandafter{\@gobble#2}}%
\@onelevel@sanitize\@tempa%
\newlinechar=\endlinechar%
\immediate\write#1{\@tempa}%
}%
}%
\newcommand\immediateverbatimwrite[1]{%
\begingroup
\let\do=\@makeother
\dospecials
\catcode`\ =10 %We don't want to allow space as verb-arg-delimiter.
%Thus let's remove spaces when grabbing undelimited arguments.
%\endlinechar=`\^^M%
%\catcode`\endlinechar=5 %
\bracefork{#1}%
}%
\begingroup
\catcode`\(=1 %
\catcode`\{=12 %
\@firstofone(%
\endgroup
\newcommand\bracefork[2](%
\catcode`\ =12\relax
\catcode\endlinechar=12 %
\ifx{#2\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
(%
\catcode`\{=1 %
\catcode`\}=2 %
\internalfilewritercaller(#1}(}%
}(%
\internalfilewritercaller(#1}(#2}%
}%
}%
}%
\newcommand\internalfilewritercaller[2]{%
\def\@tempa##1#2{\internalfilewriter{#1}{##1}}%
\ifx\relax#2\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
{\expandafter\expandafter
\expandafter\@tempa
\expandafter\expandafter
\expandafter{%
\expandafter\@gobble\string}}%
{\@tempa}%
}
\newcommand\internalfilewriter[2]{%
\trimendls{#2}{#1}%
\endgroup
}%
\makeatother
\begin{document}
\newwrite\file
\immediate\openout\file=tmp.txt\relax
A\immediateverbatimwrite{\file}
{
être ou ne pas être.
That is % the question.
}B%
C%
%
D\immediateverbatimwrite{\file} |
}être ou ne pas être.
That is % the question.
|E%
F
\immediate\closeout\file
\end{document}
With this example you get
- a pdf-file with the sequence ABCDEF. (This shows that no spurious spaces/whatsoever characters get introduced/inserted.)
- a text-file whose name is tmp.txt and whose content is:
être ou ne pas être.⟨linebreak⟩
That is % the question.⟨linebreak⟩
}être ou ne pas être.⟨linebreak⟩
That is % the question.⟨linebreak⟩
Due to the linebreaks, editors which also show line-numbers might display that file as
1 être ou ne pas être.
2 That is % the question.
3 }être ou ne pas être.
4 That is % the question.
5
By the way: With (La)TeX it is not possible to keep spaces at the ends of lines.
The reason is that (La)TeX does read and tokenize input line by line and one of the first things it does (in the pre-processing-stage) to every line of input (even before adding the endline-character and starting tokenizing the line) is removing all spaces that occur at the ends of lines.
Thus (La)TeX input like
code⟨space⟩⟨space⟩
more code⟨space⟩⟨space⟩⟨space⟩⟨space⟩⟨space⟩
even more code⟨space⟩⟨space⟩
will in any case be pre-processed to
code⟨character due to endline-char-insertion⟩more code⟨character due to endline-char-insertion⟩even more code⟨character due to endline-char-insertion⟩
before any further processing, tokenization etc. takes place.
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
7
down vote
accepted
The LaTeX kernel provides the filecontents
environment to write to external files without having to worry about catcodes and such. The filecontents
package does minimal changes to this environment allowing it to be used anywhere in the document (LaTeX's version can only be used in the preamble, for some reason; and allowing it to overwrite existing files, which is also disabled in LaTeX's version.
To produce
To be or not to be,
that is % the question
you use:
documentclass{article}
usepackage{filecontents}
begin{document}
begin{filecontents*}{tmp.txt}
To be or not to be,
that is % the question
end{filecontents*}
end{document}
The starred version (filecontents*
) omits the heading that is printed in the standard version of the environment:
%% LaTeX2e file `tmp.txt'
%% generated by the `filecontents' environment
%% from source `test' on 2018/11/20.
%%
To be or not to be,
that is % the question
An addendum on my (admittedly lazy) answer:
If you should want to persist on reinventing the wheel (which is much more fun, I must admit), then you can create a command to take care of the catcode
ing for you. Here I provide an ad hoc implementation of a verbwrite
command which does the job for you.
The command syntax is somewhat like LaTeX's verb
: you can use either as verbwritefile{<stuff>}
or verbwritefile|<stuff>|
. For the latter syntax, any character other than {
can be used to delimit the contents. This character, obviously, can't appear in <stuff>
. The advantage of the second syntax is that you don't have any restriction in balancing {
and }
inside the contents of the command.
documentclass{article}
makeatletter
longdef@ifnextchar@other@space#1#2#3{%
letreserved@d=#1%
defreserved@a{#2}%
defreserved@b{#3}%
futurelet@let@token@ifnch@other}
letkernel@ifnextchar@ifnextchar
def@ifnch@other{%
ifx@let@tokenother@sptoken
letreserved@c@xifnch@other
else
ifx@let@tokenreserved@d
letreserved@creserved@a
else
letreserved@creserved@b
fi
fi
reserved@c}
{catcode` =12
{globalletother@sptoken= }%
gdef@xifnch@other {futurelet@let@token@ifnch@other}}%
defverbwrite{@ifstar
{let@ifnextchar@ifnextchar@other@spaceverbwrite@grab}%
verbwrite@grab}
defverbwrite@grab#1{%
begingroup
catcode`^^M=13
newlinechar`^^M
letdo@makeother dospecials
catcode`{=1
@ifnextcharbgroup
{%
catcode`}=2
verbwrite@brace{#1}%
}%
{%
catcode`{=12
verbwrite@other{#1}%
}%
}
defverbwrite@brace#1#2{%
immediatewrite#1{unexpanded{#2}}%
endgroup
}
defverbwrite@other#1#2{%
defverbwrite@delim##1##2#2{%
immediatewrite##1{unexpanded{##2}}%
endgroup
}%
verbwrite@delim#1%
}
makeatother
begin{document}
newwritefile
immediateopenoutfile=tmp.txt
tracingall
verbwrite*file{To be or not to be,
that is % the question}
verbwritefile|To be or not to be,
that is } the {question|
verbwritefile$être ou ne pas être,
вот в чем вопрос$
verbwritefile}être ou ne pas être,
вот в чем вопрос}
closeoutfile
end{document}
Please beware that I took 63 minutes to write this command, so it is certainly not what you can call robust. Proceed with care :)
Fix 1: Prevent expansion of the text using ε-TeX's unexpanded
(thanks to jfbu :)
Fix 2: Prevent premature tokenization of the delimiter (thanks again to jfbu :)
Feature 1: Added a starred version that ignores spaces before the delimiter of the verbatim content.
Fix 3: Actually allow }
as a "other" delimiter (verbwritefile}stuff}
) (thanks to Ulrich Diez :)
1
Thank you very much :)
– Iydon
2 days ago
Try this withêtre ou ne pas être
. You will need to addusepackage[T1](fontenc}
(assuming pdflatex here). And this will only fix those benign letters, add then a Unicode letter for Cyrillic for example. On the other hand thefilecontents*
environment does not have that problem...even for быть или не быть
– jfbu
2 days ago
@jfbu It was creating content! The code has developed consciousness! Thanks for the warning, I hadn't realised that :-)
– Phelype Oleinik
2 days ago
Keep in mind, it could be worse if @egreg was around, and allow me anothe remark: try it with&
or$
as delimiters...
– jfbu
2 days ago
@jfbu Oops, that one was bad. Hopefully fixed now. Thanks :) And yes, egreg would most certainly spot a missing%
;)
– Phelype Oleinik
2 days ago
|
show 9 more comments
up vote
7
down vote
accepted
The LaTeX kernel provides the filecontents
environment to write to external files without having to worry about catcodes and such. The filecontents
package does minimal changes to this environment allowing it to be used anywhere in the document (LaTeX's version can only be used in the preamble, for some reason; and allowing it to overwrite existing files, which is also disabled in LaTeX's version.
To produce
To be or not to be,
that is % the question
you use:
documentclass{article}
usepackage{filecontents}
begin{document}
begin{filecontents*}{tmp.txt}
To be or not to be,
that is % the question
end{filecontents*}
end{document}
The starred version (filecontents*
) omits the heading that is printed in the standard version of the environment:
%% LaTeX2e file `tmp.txt'
%% generated by the `filecontents' environment
%% from source `test' on 2018/11/20.
%%
To be or not to be,
that is % the question
An addendum on my (admittedly lazy) answer:
If you should want to persist on reinventing the wheel (which is much more fun, I must admit), then you can create a command to take care of the catcode
ing for you. Here I provide an ad hoc implementation of a verbwrite
command which does the job for you.
The command syntax is somewhat like LaTeX's verb
: you can use either as verbwritefile{<stuff>}
or verbwritefile|<stuff>|
. For the latter syntax, any character other than {
can be used to delimit the contents. This character, obviously, can't appear in <stuff>
. The advantage of the second syntax is that you don't have any restriction in balancing {
and }
inside the contents of the command.
documentclass{article}
makeatletter
longdef@ifnextchar@other@space#1#2#3{%
letreserved@d=#1%
defreserved@a{#2}%
defreserved@b{#3}%
futurelet@let@token@ifnch@other}
letkernel@ifnextchar@ifnextchar
def@ifnch@other{%
ifx@let@tokenother@sptoken
letreserved@c@xifnch@other
else
ifx@let@tokenreserved@d
letreserved@creserved@a
else
letreserved@creserved@b
fi
fi
reserved@c}
{catcode` =12
{globalletother@sptoken= }%
gdef@xifnch@other {futurelet@let@token@ifnch@other}}%
defverbwrite{@ifstar
{let@ifnextchar@ifnextchar@other@spaceverbwrite@grab}%
verbwrite@grab}
defverbwrite@grab#1{%
begingroup
catcode`^^M=13
newlinechar`^^M
letdo@makeother dospecials
catcode`{=1
@ifnextcharbgroup
{%
catcode`}=2
verbwrite@brace{#1}%
}%
{%
catcode`{=12
verbwrite@other{#1}%
}%
}
defverbwrite@brace#1#2{%
immediatewrite#1{unexpanded{#2}}%
endgroup
}
defverbwrite@other#1#2{%
defverbwrite@delim##1##2#2{%
immediatewrite##1{unexpanded{##2}}%
endgroup
}%
verbwrite@delim#1%
}
makeatother
begin{document}
newwritefile
immediateopenoutfile=tmp.txt
tracingall
verbwrite*file{To be or not to be,
that is % the question}
verbwritefile|To be or not to be,
that is } the {question|
verbwritefile$être ou ne pas être,
вот в чем вопрос$
verbwritefile}être ou ne pas être,
вот в чем вопрос}
closeoutfile
end{document}
Please beware that I took 63 minutes to write this command, so it is certainly not what you can call robust. Proceed with care :)
Fix 1: Prevent expansion of the text using ε-TeX's unexpanded
(thanks to jfbu :)
Fix 2: Prevent premature tokenization of the delimiter (thanks again to jfbu :)
Feature 1: Added a starred version that ignores spaces before the delimiter of the verbatim content.
Fix 3: Actually allow }
as a "other" delimiter (verbwritefile}stuff}
) (thanks to Ulrich Diez :)
1
Thank you very much :)
– Iydon
2 days ago
Try this withêtre ou ne pas être
. You will need to addusepackage[T1](fontenc}
(assuming pdflatex here). And this will only fix those benign letters, add then a Unicode letter for Cyrillic for example. On the other hand thefilecontents*
environment does not have that problem...even for быть или не быть
– jfbu
2 days ago
@jfbu It was creating content! The code has developed consciousness! Thanks for the warning, I hadn't realised that :-)
– Phelype Oleinik
2 days ago
Keep in mind, it could be worse if @egreg was around, and allow me anothe remark: try it with&
or$
as delimiters...
– jfbu
2 days ago
@jfbu Oops, that one was bad. Hopefully fixed now. Thanks :) And yes, egreg would most certainly spot a missing%
;)
– Phelype Oleinik
2 days ago
|
show 9 more comments
up vote
7
down vote
accepted
up vote
7
down vote
accepted
The LaTeX kernel provides the filecontents
environment to write to external files without having to worry about catcodes and such. The filecontents
package does minimal changes to this environment allowing it to be used anywhere in the document (LaTeX's version can only be used in the preamble, for some reason; and allowing it to overwrite existing files, which is also disabled in LaTeX's version.
To produce
To be or not to be,
that is % the question
you use:
documentclass{article}
usepackage{filecontents}
begin{document}
begin{filecontents*}{tmp.txt}
To be or not to be,
that is % the question
end{filecontents*}
end{document}
The starred version (filecontents*
) omits the heading that is printed in the standard version of the environment:
%% LaTeX2e file `tmp.txt'
%% generated by the `filecontents' environment
%% from source `test' on 2018/11/20.
%%
To be or not to be,
that is % the question
An addendum on my (admittedly lazy) answer:
If you should want to persist on reinventing the wheel (which is much more fun, I must admit), then you can create a command to take care of the catcode
ing for you. Here I provide an ad hoc implementation of a verbwrite
command which does the job for you.
The command syntax is somewhat like LaTeX's verb
: you can use either as verbwritefile{<stuff>}
or verbwritefile|<stuff>|
. For the latter syntax, any character other than {
can be used to delimit the contents. This character, obviously, can't appear in <stuff>
. The advantage of the second syntax is that you don't have any restriction in balancing {
and }
inside the contents of the command.
\documentclass{article}
\makeatletter
\long\def\@ifnextchar@other@space#1#2#3{%
\let\reserved@d=#1%
\def\reserved@a{#2}%
\def\reserved@b{#3}%
\futurelet\@let@token\@ifnch@other}
\let\kernel@ifnextchar\@ifnextchar
\def\@ifnch@other{%
\ifx\@let@token\other@sptoken
\let\reserved@c\@xifnch@other
\else
\ifx\@let@token\reserved@d
\let\reserved@c\reserved@a
\else
\let\reserved@c\reserved@b
\fi
\fi
\reserved@c}
{\catcode`\ =12
{\global\let\other@sptoken= }%
\gdef\@xifnch@other {\futurelet\@let@token\@ifnch@other}}%
\def\verbwrite{\@ifstar
{\let\@ifnextchar\@ifnextchar@other@space\verbwrite@grab}%
\verbwrite@grab}
\def\verbwrite@grab#1{%
\begingroup
\catcode`\^^M=13
\newlinechar`\^^M
\let\do\@makeother \dospecials
\catcode`\{=1
\@ifnextchar\bgroup
{%
\catcode`\}=2
\verbwrite@brace{#1}%
}%
{%
\catcode`\{=12
\verbwrite@other{#1}%
}%
}
\def\verbwrite@brace#1#2{%
\immediate\write#1{\unexpanded{#2}}%
\endgroup
}
\def\verbwrite@other#1#2{%
\def\verbwrite@delim##1##2#2{%
\immediate\write##1{\unexpanded{##2}}%
\endgroup
}%
\verbwrite@delim#1%
}
\makeatother
\begin{document}
\newwrite\file
\immediate\openout\file=tmp.txt
\tracingall
\verbwrite*\file{To be or not to be,
that is % the question}
\verbwrite\file|To be or not to be,
that is } the {question|
\verbwrite\file$être ou ne pas être,
вот в чем вопрос$
\verbwrite\file}être ou ne pas être,
вот в чем вопрос}
\closeout\file
\end{document}
Please beware that I took 63 minutes to write this command, so it is certainly not what you can call robust. Proceed with care :)
Fix 1: Prevent expansion of the text using ε-TeX's \unexpanded (thanks to jfbu :)
Fix 2: Prevent premature tokenization of the delimiter (thanks again to jfbu :)
Feature 1: Added a starred version that ignores spaces before the delimiter of the verbatim content.
Fix 3: Actually allow } as an "other" delimiter (\verbwrite\file}stuff}) (thanks to Ulrich Diez :)
edited yesterday
answered 2 days ago
Phelype Oleinik
Thank you very much :)
– Iydon
2 days ago
Try this with être ou ne pas être. You will need to add \usepackage[T1]{fontenc} (assuming pdflatex here). And this will only fix those benign letters; then add a Unicode letter, a Cyrillic one for example. On the other hand, the filecontents* environment does not have that problem... even for быть или не быть
– jfbu
2 days ago
@jfbu It was creating content! The code has developed consciousness! Thanks for the warning, I hadn't realised that :-)
– Phelype Oleinik
2 days ago
Keep in mind, it could be worse if @egreg was around, and allow me another remark: try it with & or $ as delimiters...
– jfbu
2 days ago
@jfbu Oops, that one was bad. Hopefully fixed now. Thanks :) And yes, egreg would most certainly spot a missing % ;)
– Phelype Oleinik
2 days ago
I suggest using the filecontents* environment.
Be aware that there is also a LaTeX 2ε package filecontents which removes some of the limitations of the filecontents* environment from the LaTeX 2ε kernel.
If you are in the mood for reinventing the wheel, you can write a macro which does the following:
- switch to a verbatim catcode régime,
- switch the catcode of the endlinechar (usually ^^M/ASCII return) to 12, so that ASCII return is treated like digits and punctuation marks,
- read and tokenize, under that catcode régime, the argument containing the text that is to be written to file,
- trim leading and trailing endline characters from that text,
- write the text to file while \newlinechar has the same value as \endlinechar.
In (La)TeX there are several stages of processing input.
(La)TeX reads TeX input, e.g., a .tex input file, line by line.
In the pre-processing stage, the characters that form the line are converted to (La)TeX's internal character encoding. (With traditional (La)TeX engines, the internal character encoding is ASCII. With engines based on XeTeX or LuaTeX, it is UTF-8, of which ASCII is a subset.) Then all space characters (code point 32 in both ASCII and UTF-8) at the right end of the line are removed. Then a character is appended to the line whose code point in the internal character encoding equals the value of the integer parameter \endlinechar. Usually the value of \endlinechar is 13, and code point 13 denotes the ⟨RETURN⟩ character in both ASCII and UTF-8. This means: usually a ⟨RETURN⟩ character gets inserted at the right end of the line.
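The role of \endlinechar can be observed directly. The following is only a minimal sketch, assuming a standard LaTeX run with default settings; it is not part of the solution further down. It prints the current value to the terminal and then switches the insertion off locally, so that the three one-letter lines are read without any endline character between them.

```latex
\documentclass{article}
\begin{document}
% Show the current value; with default settings this is 13 (RETURN).
\typeout{endlinechar = \the\endlinechar}
\begingroup
\endlinechar=-1 % negative value: nothing is appended to the lines read from now on
a
b
c
\endgroup
\end{document}
```

With default settings the three letters should be typeset as "abc", without the inter-word spaces that the endline characters would normally produce.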
When this is done, the tokenizing stage begins. In this stage (La)TeX takes the characters that form the line as instructions for placing tokens into the token stream. From here on, things are about tokens: control sequence tokens (which come in two flavors, control word tokens and control symbol tokens) and character tokens. A character token consists of a character code, denoting the code point in (La)TeX's internal character encoding, and a category code. Category codes make it possible for characters to have special meanings for the (La)TeX engine. E.g., the category code of the backslash character usually is 0 (escape). A character whose category code is 0 at tokenizing time causes (La)TeX to gather the name of a control sequence token and afterwards place that control sequence token into the token stream. Likewise, the category code of the opening curly brace usually is 1 (begin grouping) and that of the closing curly brace usually is 2 (end grouping). Character tokens of category code 1 open a group (i.e., a macro argument consisting of several tokens, a local scope for assignments like macro definitions, or the ⟨balanced text⟩ of things like \scantokens), and character tokens of category code 2 close it. More information about category codes can be found at https://en.wikibooks.org/wiki/TeX/catcode.
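To see which category codes are currently in force, you can ask TeX for them. The following is a small sketch, assuming a standard LaTeX document body where nothing has changed the defaults; the values in the comments are the usual LaTeX defaults, not something specific to the code in this answer.

```latex
\documentclass{article}
\begin{document}
% Each line writes the current category code of one character to the terminal/log.
\typeout{backslash: \the\catcode`\\}% 0 (escape)
\typeout{left brace: \the\catcode`\{}% 1 (begin grouping)
\typeout{right brace: \the\catcode`\}}% 2 (end grouping)
\typeout{return: \the\catcode`\^^M}% 5 (end of line)
\typeout{space: \the\catcode`\ }% 10 (space)
\typeout{percent: \the\catcode`\%}% 14 (comment)
\end{document}
```

Writing the character as a control symbol (e.g. \{ or \%) after the backquote keeps the character from acting according to its current category code while the line is tokenized.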
After tokenizing, there is a "stream of tokens". Processing the stream of tokens includes things like expansion of expandable tokens (e.g., macro tokens or expandable primitives like \string or \csname...\endcsname) and (later) carrying out assignments, creating boxes etc.
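As a small illustration of expansion working on tokens rather than on input characters, here is a sketch that uses the two expandable primitives just mentioned. The macro names \demostring and "demo name" are made up for this example.

```latex
\documentclass{article}
\begin{document}
% \edef expands its replacement text at definition time;
% \string turns the control sequence token \foo into character tokens.
\edef\demostring{\string\foo}
\typeout{\meaning\demostring}% macro:->\foo
% \csname...\endcsname builds a control sequence token from character tokens,
% here one whose name contains a space.
\expandafter\def\csname demo name\endcsname{built via csname}
\typeout{\csname demo name\endcsname}% built via csname
\end{document}
```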
When reading and tokenizing a .tex input file, (La)TeX will, during the pre-processing stage, remove the spaces at the end of every line and append an endline character to every line.
Therefore the input sequence
\immediate\write\file{
To be or not to be,
that is % the question
}
is treated by (La)TeX at tokenizing time, i.e., after pre-processing, as
\immediate\write\file{⟨character due to endline-char-insertion⟩
To be or not to be,⟨character due to endline-char-insertion⟩
that is % the question⟨character due to endline-char-insertion⟩
}⟨character due to endline-char-insertion⟩
Usually the endline character is ^^M, i.e., ⟨RETURN⟩.
Thus the above input sequence is usually treated by (La)TeX at tokenizing time as
\immediate\write\file{⟨^^M/RETURN-character⟩
To be or not to be,⟨^^M/RETURN-character⟩
that is % the question⟨^^M/RETURN-character⟩
}⟨^^M/RETURN-character⟩
(Which tokens (La)TeX inserts into the token stream when it encounters a ⟨^^M/RETURN-character⟩ depends on the category code assigned to the ⟨^^M/RETURN-character⟩ at the time of tokenizing.
Usually the category code of the ⟨^^M/RETURN-character⟩ is 5 (end of line). Then, depending on the state of (La)TeX's reading apparatus, either no token at all (state S, skipping blanks), or a space token (state M, middle of a line), i.e., a character token of category code 10 (space) and character code 32, or a \par token (state N, beginning of a new line) is inserted.
If category code 12 (other) is assigned to the ⟨^^M/RETURN-character⟩, (La)TeX inserts a character token of category code 12 (other) and character code 13 (the number of the ⟨RETURN-character⟩ in (La)TeX's internal character encoding) into the token stream. Such a token can be processed like any other character token.)
Besides this, at writing time (La)TeX always appends to the argument of a \write command the sequence of characters/bytes that serves for ending lines in plain text files on the platform in use.
Thus, assuming that we managed to have LaTeX accept the percent character as an ordinary character, the \write command gets something like:
⟨token due to ^^M/RETURN-character⟩To be or not to be,⟨token due to ^^M/RETURN-character⟩that is % the question⟨token due to ^^M/RETURN-character⟩
At writing time, a
⟨platform-dependent sequence for ending the line⟩
will be appended.
If the category code of the endline character, i.e., of the ⟨^^M/RETURN-character⟩, was 5 (end of line) at the time of tokenizing the input, the sequence
⟨space⟩To be or not to be,⟨space⟩that is % the question⟨space⟩⟨platform-dependent sequence for ending the line⟩
will be written to the external file.
If the category code of the endline character, i.e., of the ⟨^^M/RETURN-character⟩, was 12 (other) at the time of tokenizing the input, the sequence
^^MTo be or not to be,^^Mthat is % the question^^M⟨platform-dependent sequence for ending the line⟩
will be written to the external file.
You can ensure that at writing time a ⟨^^M/RETURN-character⟩ also yields the ⟨platform-dependent sequence for ending the line⟩ by assigning the integer parameter \newlinechar the value of the integer parameter \endlinechar.
If you do this as well, the sequence
⟨platform-dependent sequence for ending the line⟩To be or not to be,⟨platform-dependent sequence for ending the line⟩that is % the question⟨platform-dependent sequence for ending the line⟩⟨platform-dependent sequence for ending the line⟩
will be written to the external file.
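The combination of a category code 12 endline character with a matching \newlinechar can be tried in isolation. This is a minimal sketch, not the full solution: the names \demofile and demo.txt are made up for the example, and the percent character is left out of the text because only the category code of ^^M is changed here.

```latex
\documentclass{article}
\begin{document}
\newwrite\demofile
\immediate\openout\demofile=demo.txt
\begingroup
\newlinechar=`\^^M
\catcode`\^^M=12 % from the next line on, each line end yields a catcode-12 token
\immediate\write\demofile{
To be or not to be,
that is the question
}%
\endgroup
\immediate\closeout\demofile
\end{document}
```

With default settings, demo.txt should then contain four lines: an empty line, the two lines of text, and another empty line.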
But this way you might get undesired empty lines.
Therefore you may wish to apply a routine for removing leading and trailing ⟨characters due to endline-char-insertion⟩ from the entire argument before letting \write do the writing job.
A coding example could look like this:
\documentclass{article}
\makeatletter
\begingroup
\catcode`\^^M=12\relax%
\@firstofone{%
\endgroup%
\newcommand*\gobbleendl{}\def\gobbleendl ^^M{}%
\newcommand\trimendls[2]{\innertrimleadendl{#2}#1^^M\relax{#1}}%
\newcommand*\innertrimleadendl{}%
\def\innertrimleadendl#1#2^^M#3\relax#4{%
\ifx\relax#2\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi%
{%
\ifx\relax#4\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi%
{\trimtrailendl{}{#1}}%
{\expandafter\trimtrailendl\expandafter{\gobbleendl#4}{#1}}%
}%
{\trimtrailendl{#4}{#1}}%
}%
\newcommand*\trimtrailendl[2]{%
\innertrimtrailendl{#2}.#1\relax.^^M\relax.\relax\relax{#1}%
}%
\newcommand*\innertrimtrailendl{}%
\def\innertrimtrailendl#1#2^^M\relax.#3\relax\relax#4{%
\ifx\relax#3\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi%
{\def\@tempa{#4}}%
{\expandafter\def\expandafter\@tempa\expandafter{\@gobble#2}}%
\@onelevel@sanitize\@tempa%
\newlinechar=\endlinechar%
\immediate\write#1{\@tempa}%
}%
}%
\newcommand\immediateverbatimwrite[1]{%
\begingroup
\let\do=\@makeother
\dospecials
\catcode`\ =10 %We don't want to allow space as verb-arg-delimiter.
%Thus let's remove spaces when grabbing undelimited arguments.
%\endlinechar=`\^^M%
%\catcode\endlinechar=5 %
\bracefork{#1}%
}%
\begingroup
\catcode`\(=1 %
\catcode`\{=12 %
\@firstofone(%
\endgroup
\newcommand\bracefork[2](%
\catcode`\ =12\relax
\catcode\endlinechar=12 %
\ifx{#2\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
(%
\catcode`\{=1 %
\catcode`\}=2 %
\internalfilewritercaller(#1}(}%
}(%
\internalfilewritercaller(#1}(#2}%
}%
}%
}%
\newcommand\internalfilewritercaller[2]{%
\def\@tempa##1#2{\internalfilewriter{#1}{##1}}%
\ifx\relax#2\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
{\expandafter\expandafter
\expandafter\@tempa
\expandafter\expandafter
\expandafter{%
\expandafter\@gobble\string}}%
{\@tempa}%
}
\newcommand\internalfilewriter[2]{%
\trimendls{#2}{#1}%
\endgroup
}%
\makeatother
\begin{document}
\newwrite\file
\immediate\openout\file=tmp.txt\relax
A\immediateverbatimwrite{\file}
{
être ou ne pas être.
That is % the question.
}B%
C%
%
D\immediateverbatimwrite{\file} |
}être ou ne pas être.
That is % the question.
|E%
F
\immediate\closeout\file
\end{document}
With this example you get
- a PDF file with the sequence ABCDEF. (This shows that no spurious spaces or other characters get inserted.)
- a text file named tmp.txt whose content is:
être ou ne pas être.⟨linebreak⟩
That is % the question.⟨linebreak⟩
}être ou ne pas être.⟨linebreak⟩
That is % the question.⟨linebreak⟩
Due to the linebreaks, editors which also show line numbers might display that file as
1 être ou ne pas être.
2 That is % the question.
3 }être ou ne pas être.
4 That is % the question.
5
By the way: with (La)TeX it is not possible to keep spaces at the ends of lines.
The reason is that (La)TeX reads and tokenizes input line by line, and one of the first things it does to every line of input in the pre-processing stage (even before appending the endline character and starting to tokenize the line) is to remove all spaces at the end of the line.
Thus (La)TeX input like
code⟨space⟩⟨space⟩
more code⟨space⟩⟨space⟩⟨space⟩⟨space⟩⟨space⟩
even more code⟨space⟩⟨space⟩
will in any case be pre-processed to
code⟨character due to endline-char-insertion⟩more code⟨character due to endline-char-insertion⟩even more code⟨character due to endline-char-insertion⟩
before any further processing/tokenization etc takes place.
I suggest using the filecontents*
-environment.
Be aware that there is also a LaTeX 2ε-package filecontents which does remove some of the limitations that come along with the filecontents*
-environment from the LaTeX 2ε-kernel.
If you are in the mood for reinventing the wheel, you can write a macro which does
- switch to verbatim-catcode-régime,
- switch the catcode of the endlinechar (usually
^^M
/ASCII-Return) to 12 so that ASCII-return is treated like digits and punctuation-marks, - read and tokenize under that catcode-régime the argument containing the text that is to be written to file
- trim leading and trailing endline-chars from that text
- write the text to file while having
endlinechar
also asnewlinechar
.
In (La)TeX there are several stages of processing input.
(La)TeX does read TeX-input, e.g., a .tex-input-file, line by line.
In the pre-processing-stage, the single characters that form the line will be converted to (La)TeX' internal character encoding. (With old-school (La)TeX engines, the internal character-encoding is ASCII. With engines based on XeTeX or LuaTeX, the internal character-encoding is utf-8 whereof ASCII is a subset.) Then all space-characters (code-point-number 32 both in ASCII and in utf-8, i.e., in all encodings that come into question as internal-character encoding of a (La)TeX engine) that occur at the right end of the line will be removed. Then a character will be inserted at the right end of the line whose code-point-number in (La)TeX' internal character-encoding (i.e. ASCII or utf-8) corresponds to the number of the integer-parameter endlinechar
. Usually the value of the integer-parameter endlinechar
is 13 while code-point-number 13 both in ASCII and in utf-8, i.e., in all encodings that come into question as internal-character encoding of a (La)TeX engine, denotes the ⟨RETURN⟩-character. This means: Usually a ⟨RETURN⟩-character gets inserted at the right end of the line.
When this is done, the tokenizing stage begins: in this stage (La)TeX takes the characters that form the line as instructions for placing tokens into the token stream. This is where so-called tokens come into play, e.g., control-sequence tokens (which come in two flavors: control-word tokens and control-symbol tokens) and character tokens. A character token consists of a character code, which denotes the code point in (La)TeX's internal character encoding, and a category code. Category codes make it possible for characters to have special meanings for the (La)TeX engine.
E.g., the category code of the backslash character usually is 0 (escape). A character whose category code is 0 at tokenizing time causes (La)TeX to gather the name of a control-sequence token and then place that control-sequence token into the token stream.
E.g., the category code of the opening curly brace usually is 1 (begin grouping) and the category code of the closing curly brace usually is 2 (end grouping). Character tokens of category code 1 open a group (i.e., a macro argument consisting of several tokens, a local scope for assignments like macro definitions, or the ⟨balanced text⟩ of things like \scantokens), and character tokens of category code 2 close that group. More information about category codes can be found at https://en.wikibooks.org/wiki/TeX/catcode.
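The category codes mentioned above can be inspected directly. The following is a minimal sketch, assuming the default category-code setup of a standard LaTeX format; it only queries the codes and changes nothing.

```latex
\documentclass{article}
\begin{document}
% Each \the\catcode`\<char> typesets the current category code of <char>.
backslash: \the\catcode`\\,
opening brace: \the\catcode`\{,
closing brace: \the\catcode`\},
percent: \the\catcode`\%,
return: \the\catcode`\^^M.
\end{document}
```

Under those default settings the numbers should be 0, 1, 2, 14 and 5, matching the escape, begin-grouping, end-grouping, comment and end-of-line categories.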
After tokenizing, there is a "stream of tokens". Processing the stream of tokens includes things like expansion of expandable tokens (e.g., macro tokens or expandable primitives like \string or \csname...\endcsname) and (later) carrying out assignments, creating boxes etc.
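The two expandable primitives named above can be tried out as follows. This is only a sketch for illustration; the choice of \textbf as the token to build and to stringify is arbitrary.

```latex
\documentclass{article}
\begin{document}
% \csname...\endcsname builds the control-sequence token \textbf
% from the characters between the two delimiters.
\csname textbf\endcsname{bold text}

% \string turns the token \textbf into the characters of its name.
\texttt{\string\textbf}
\end{document}
```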
When reading and tokenizing a .tex input file, (La)TeX will, during the pre-processing stage, remove the spaces at the end of every line and append an endline character to every line.
Therefore the input sequence
\immediate\write\file{
To be or not to be,
that is % the question
}
will at tokenizing time, i.e., after pre-processing, be treated by (La)TeX as
\immediate\write\file{⟨character due to endline-char-insertion⟩
To be or not to be,⟨character due to endline-char-insertion⟩
that is % the question⟨character due to endline-char-insertion⟩
}⟨character due to endline-char-insertion⟩
Usually the endline character is ^^M, i.e., ⟨RETURN⟩.
Thus the above input sequence will usually be treated by (La)TeX at tokenizing time as
\immediate\write\file{⟨^^M/RETURN-character⟩
To be or not to be,⟨^^M/RETURN-character⟩
that is % the question⟨^^M/RETURN-character⟩
}⟨^^M/RETURN-character⟩
(Which tokens (La)TeX inserts into the token stream when encountering a ⟨^^M/RETURN-character⟩ depends on the category code that is assigned to the ⟨^^M/RETURN-character⟩ at the time of tokenizing.
Usually the category code of the ⟨^^M/RETURN-character⟩ is 5 (end of line). This means that, depending on the state of (La)TeX's reading apparatus, either no token at all (in state S = skipping blanks), or a space token (in state M = in the middle of a line), or a \par token (in state N = about to begin a new line) will be inserted. A space token is a character token of category code 10 (space) and character code 32, where 32 is the number of the space character in (La)TeX's internal character encoding.
In case category code 12 (other) is assigned to the ⟨^^M/RETURN-character⟩, (La)TeX will insert a character token of category code 12 (other) and character code 13 (13 is the number of the ⟨RETURN-character⟩ in (La)TeX's internal character encoding) into the token stream. Such a token can be processed like any other character token.)
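The behaviour of a category-code-5 line end in the states M and N can be observed in ordinary text. A minimal sketch, assuming default category codes:

```latex
\documentclass{article}
\begin{document}
% State M: the end of the line "one" becomes a space token,
% so the output reads "one two".
one
two

% State N: the empty line above is tokenized as \par.
% A comment character hides the end of the line, so "three" and
% "four" are typeset as the single word "threefour".
three%
four
\end{document}
```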
Besides this, (La)TeX will in any case, at writing time, append to the argument of a \write command the sequence of characters/bytes that serves for ending lines within plain text files on the platform in use.
Thus, assuming that we managed to have LaTeX accept the percent character as an ordinary character, the \write command will get something like:
⟨token due to ^^M/RETURN-character⟩To be or not to be,⟨token due to ^^M/RETURN-character⟩that is % the question⟨token due to ^^M/RETURN-character⟩
At writing time, a ⟨platform-dependent sequence for ending the line⟩ will be appended.
If the category code of the endline character/of the ⟨^^M/RETURN-character⟩ was 5 (end of line) at the time of tokenizing the input, the sequence
⟨space⟩To be or not to be,⟨space⟩that is % the question⟨space⟩⟨platform-dependent sequence for ending the line⟩
will be written to the external file.
If the category code of the endline character/of the ⟨^^M/RETURN-character⟩ was 12 (other) at the time of tokenizing the input, the sequence
^^MTo be or not to be,^^Mthat is % the question^^M⟨platform-dependent sequence for ending the line⟩
will be written to the external file.
You can ensure that at writing time a ⟨^^M/RETURN-character⟩ also yields the ⟨platform-dependent sequence for ending the line⟩ by assigning the integer parameter \newlinechar the value of the integer parameter \endlinechar.
If you do this as well, the sequence
⟨platform-dependent sequence for ending the line⟩To be or not to be,⟨platform-dependent sequence for ending the line⟩that is % the question⟨platform-dependent sequence for ending the line⟩⟨platform-dependent sequence for ending the line⟩
will be written to the external file.
But this way you get empty lines at the beginning and at the end, which are probably undesired.
Therefore you may wish to apply a routine for removing leading and trailing ⟨characters due to endline-char-insertion⟩ from the entire argument before letting \write do the writing job.
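The interplay of a category-code-12 ⟨RETURN⟩ character and \newlinechar can be tried in isolation before looking at a complete routine. The following is a minimal sketch, not a full solution: the names \demofile and demo.txt are made up for this illustration, and the text is chosen so that it neither starts nor ends with a line break, which means no trimming is needed.

```latex
\documentclass{article}
\begin{document}
\newwrite\demofile % hypothetical write handle, for this sketch only
\immediate\openout\demofile=demo.txt\relax
\begingroup
% Let the <RETURN> character be tokenized as an ordinary character ...
\catcode`\^^M=12 %
% ... and let \write turn that same character into a line break.
\newlinechar=\endlinechar %
\immediate\write\demofile{first line
second line}%
\endgroup%
\immediate\closeout\demofile
\end{document}
```

After the run, demo.txt should contain the two lines "first line" and "second line". Every line inside the group ends with a comment character so that no stray category-code-12 ⟨RETURN⟩ character ends up in the document.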
A coding-example could look like this:
\documentclass{article}
\makeatletter
% Define the trimming macros while ^^M has category code 12 (other),
% so that ^^M can be used as a macro-argument delimiter.
\begingroup
\catcode`\^^M=12\relax%
\@firstofone{%
  \endgroup%
  \newcommand*\gobbleendl{}\def\gobbleendl ^^M{}%
  \newcommand\trimendls[2]{\innertrimleadendl{#2}#1^^M\relax{#1}}%
  \newcommand*\innertrimleadendl{}%
  \def\innertrimleadendl#1#2^^M#3\relax#4{%
    \ifx\relax#2\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi%
    {%
      \ifx\relax#4\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi%
      {\trimtrailendl{}{#1}}%
      {\expandafter\trimtrailendl\expandafter{\gobbleendl#4}{#1}}%
    }%
    {\trimtrailendl{#4}{#1}}%
  }%
  \newcommand*\trimtrailendl[2]{%
    \innertrimtrailendl{#2}.#1\relax.^^M\relax.\relax\relax{#1}%
  }%
  \newcommand*\innertrimtrailendl{}%
  \def\innertrimtrailendl#1#2^^M\relax.#3\relax\relax#4{%
    \ifx\relax#3\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi%
    {\def\@tempa{#4}}%
    {\expandafter\def\expandafter\@tempa\expandafter{\@gobble#2}}%
    \@onelevel@sanitize\@tempa%
    \newlinechar=\endlinechar%
    \immediate\write#1{\@tempa}%
  }%
}%
\newcommand\immediateverbatimwrite[1]{%
  \begingroup
  \let\do=\@makeother
  \dospecials
  \catcode`\ =10 %We don't want to allow space as verb-arg-delimiter.
                 %Thus let's remove spaces when grabbing undelimited arguments.
  %\endlinechar=`\^^M%
  %\catcode\endlinechar=5 %
  \bracefork{#1}%
}%
% Define \bracefork while ( has category code 1 and { has category code 12,
% so that an opening curly brace can be tested for as an ordinary character.
\begingroup
\catcode`\(=1 %
\catcode`\{=12 %
\@firstofone(%
  \endgroup
  \newcommand\bracefork[2](%
    \catcode`\ =12\relax
    \catcode\endlinechar=12 %
    \ifx{#2\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
    (%
      \catcode`\{=1 %
      \catcode`\}=2 %
      \internalfilewritercaller(#1}(}%
    }(%
      \internalfilewritercaller(#1}(#2}%
    }%
  }%
}%
\newcommand\internalfilewritercaller[2]{%
  \def\@tempa##1#2{\internalfilewriter{#1}{##1}}%
  \ifx\relax#2\relax\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi
  {\expandafter\expandafter
   \expandafter\@tempa
   \expandafter\expandafter
   \expandafter{%
   \expandafter\@gobble\string}}%
  {\@tempa}%
}
\newcommand\internalfilewriter[2]{%
  \trimendls{#2}{#1}%
  \endgroup
}%
\makeatother
\begin{document}
\newwrite\file
\immediate\openout\file=tmp.txt\relax
A\immediateverbatimwrite{\file}
{
être ou ne pas être.
That is % the question.
}B%
C%
%
D\immediateverbatimwrite{\file} |
}être ou ne pas être.
That is % the question.
|E%
F
\immediate\closeout\file
\end{document}
With this example you get
- a pdf-file with the sequence ABCDEF. (This shows that no spurious spaces or other characters get inserted.)
- a text-file whose name is tmp.txt and whose content is:
être ou ne pas être.⟨linebreak⟩
That is % the question.⟨linebreak⟩
}être ou ne pas être.⟨linebreak⟩
That is % the question.⟨linebreak⟩
Due to the linebreaks, editors which also show line-numbers might display that file as
1 être ou ne pas être.
2 That is % the question.
3 }être ou ne pas être.
4 That is % the question.
5
By the way: With (La)TeX it is not possible to keep spaces at the ends of lines.
The reason is that (La)TeX reads and tokenizes input line by line, and one of the first things it does to every line of input in the pre-processing stage (even before appending the endline character and starting to tokenize the line) is removing all spaces that occur at the end of the line.
Thus (La)TeX input like
code⟨space⟩⟨space⟩
more code⟨space⟩⟨space⟩⟨space⟩⟨space⟩⟨space⟩
even more code⟨space⟩⟨space⟩
will in any case be pre-processed to
code⟨character due to endline-char-insertion⟩more code⟨character due to endline-char-insertion⟩even more code⟨character due to endline-char-insertion⟩
before any further processing, tokenization etc. takes place.
edited yesterday
answered 2 days ago
Ulrich Diez