Bash file naming conventions are very rich, and it is easy to write a script or one-liner that parses file names incorrectly. Learn to parse file names correctly and thereby make sure your scripts work as intended.
The problem of correctly parsing file names in Bash
If you have been using Bash for a while and have been writing scripts in its rich language, you have probably run into some file name parsing problems. Let's take a look at a simple example of what can go wrong:
touch 'a
b'
Here we create a file whose name contains an actual newline, entered by pressing Enter after the a. Bash file naming conventions are very rich, and even though it is rather cool that we can use special characters like this in a file name, let's see how this file fares when we try to perform some actions on it:
ls | xargs rm
This does not work. xargs takes the output of ls (through the | pipe) and passes it to rm, but something goes wrong along the way!
What went wrong is that the output of ls is taken literally by xargs, and the newline inside the file name is seen by xargs as an actual terminating character, not as a newline that should be passed on to rm as part of the name.
Let's exemplify this in another way:
ls | xargs -I{} echo '{}|'
It's clear: xargs is processing the input as two individual lines, splitting the original file name in two. Even if we were to fix this whitespace problem with some fancy parsing using sed, we would soon run into other problems once we started using other special characters such as spaces, backslashes, quotation marks and more.
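The split is easy to confirm in a scratch directory. The following is a minimal sketch, assuming a typical ls and xargs as found on GNU/Linux; demo_dir, name, real_count and parsed_count are names made up for this illustration. It compares the number of files on disk with the number of items that xargs receives from ls:

```shell
# Work in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)

# Build a name made of "a", a newline and "b" (one single file name).
name=$(printf 'a\nb')
touch "$demo_dir/$name"

# Count the real directory entries with a glob; the shell does not
# re-parse the expanded names, so the newline is harmless here.
set -- "$demo_dir"/*
real_count=$#

# Count the items produced by line-based parsing of the ls output.
parsed_count=$(ls "$demo_dir" | xargs -I{} echo '{}|' | wc -l | tr -d ' ')

echo "files on disk: $real_count, items seen by xargs: $parsed_count"

rm -rf -- "$demo_dir"
```

If the two numbers differ, the pipeline has split one file name into several items, which is exactly the failure described above.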
touch 'a
b'
touch 'a b'
touch 'a\b'
touch 'a"b'
touch "a'b"
ls
Even if you are an experienced Bash developer, you may shudder at seeing file names like these, since it would be very complex for most common Bash tools to parse them correctly. You would have to do all sorts of string modifications to make this work. That is, unless you have the secret recipe.
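To see how the other characters misbehave, here is a small sketch under the same assumptions as before (a typical ls and xargs; demo_dir, space_items and quote_result are hypothetical names). It checks one file with a space in its name and one with a double quote:

```shell
demo_dir=$(mktemp -d)

# One file whose name contains a space.
touch "$demo_dir/a b"

# By default xargs splits its input on blanks as well as on newlines,
# so with -n 1 the single name is handed to echo in two pieces.
space_items=$(ls "$demo_dir" | xargs -n 1 echo | wc -l | tr -d ' ')

rm -f -- "$demo_dir/a b"

# One file whose name contains a double quote.
touch "$demo_dir/a\"b"

# Quotes are special to xargs by default, so an unpaired quote makes
# xargs stop with an error instead of running the command.
if ls "$demo_dir" | xargs echo >/dev/null 2>&1; then
  quote_result="accepted"
else
  quote_result="rejected"
fi

echo "pieces for 'a b': $space_items, name with a quote: $quote_result"

rm -rf -- "$demo_dir"
```

Each special character fails in its own way, which is why patching them one at a time with string manipulation quickly becomes unmanageable.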
Before we dive into that, there is one more thing you should know about, which you may come across when parsing ls output. If you use color coding for directory listings, which is enabled by default in Ubuntu, it is easy to run into another set of ls parsing problems.
These are not really related to how the files are named, but rather to how the files are presented in the output of ls. The ls output will contain escape codes representing the color to be used in your terminal.
To avoid running into these, simply use --color=never as an option to ls:
ls --color=never
In Mint 20 (a great operating system derived from Ubuntu) this problem seems to be solved, though it may still be present in many other or older versions of Ubuntu and similar systems. I saw this problem as recently as mid-August 2020 on Ubuntu.
Even if you don't use color coding for your directory listings, your script is likely to run on other systems that are not owned or managed by you. In that case, you will also want to use this option to prevent users of that machine from running into the problem described.
Going back to our secret recipe, let's see how we can make sure that we will not have problems with special characters in Bash file names. The solution provided avoids all use of ls, which you would do well to avoid in general, so the color coding problems do not apply either.
There are still times when parsing ls output is quick and convenient, but it will always be tricky and probably 'dirty' as soon as special characters come into play, not to mention insecure (special characters can be used to introduce all kinds of problems).
The secret recipe: NULL termination
The developers of Bash tools recognized this same problem many years ago and have provided us with a solution: NULL termination!
What is NULL termination, you ask? Consider how, in the examples above, the newline (literally the Enter key) was the main terminating character.
We also saw how special characters such as quotes, blanks and backslashes can be used in file names, even though they have special functions when it comes to other Bash text parsing and modification tools like sed. Now compare this with the -0 option to xargs, from man xargs:
-0, --null Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally). Disables the end of file string, which is treated like any other argument. Useful when input items might contain white space, quote marks, or backslashes. The GNU find -print0 option produces input suitable for this mode.
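The behavior described in that excerpt can be tried without creating any files. This is a minimal sketch, assuming an xargs that supports -0 (GNU and BSD versions do); the three sample items and the item_with_newline and arg_count names are invented for the illustration:

```shell
# A sample item with an embedded newline (hypothetical data, not a file).
item_with_newline=$(printf 'c\nd')

# printf '%s\0' writes every argument followed by a NUL byte. xargs -0
# splits on those bytes only, so blanks, newlines, quotes and backslashes
# inside an item are passed through untouched. The inline script simply
# reports how many arguments it received.
arg_count=$(printf '%s\0' 'a b' "$item_with_newline" 'e"f\g' |
  xargs -0 sh -c 'echo "$#"' sh)

echo "items sent: 3, arguments received: $arg_count"
```

Because a NUL byte cannot appear inside a file name, it is the one delimiter that can never be confused with part of a name.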
And the -print0 option to find, from man find:
-print0 True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
The True; here means that this action always evaluates to true when find applies it to a file. Also interesting are the two clear warnings given elsewhere in the same manual page:
- If you are piping the output of find into another program and there is the faintest possibility that the files which you are searching for might contain a newline, then you should seriously consider using the -print0 option instead of -print. See the UNUSUAL FILENAMES section for information about how unusual characters in filenames are handled.
- If you are using find in a script or in a situation where the matched files might have arbitrary names, you should consider using -print0 instead of -print.
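Following the second warning, a script that needs to run several steps per file can keep the NULL-terminated handoff and loop inside a small inline script. This is a minimal sketch, assuming a find and an xargs that support -print0 and -0; demo_dir, nl_name and processed are names invented for the example, and the loop body is only a placeholder:

```shell
demo_dir=$(mktemp -d)

# Four awkward names: a space, a newline, a double quote, a single quote.
nl_name=$(printf 'a\nb')
touch "$demo_dir/a b" "$demo_dir/$nl_name" "$demo_dir/a\"b" "$demo_dir/a'b"

# find ends every path with a NUL byte, xargs -0 splits on that byte and
# passes the paths as arguments to the inline script, which loops over
# them. The script prints one fixed line per path so that counting lines
# counts paths, even when a path itself contains a newline.
processed=$(find "$demo_dir" -type f -name 'a*' -print0 |
  xargs -0 sh -c 'for path in "$@"; do echo "handled one path"; done' sh |
  wc -l | tr -d ' ')

echo "files created: 4, paths handled: $processed"

rm -rf -- "$demo_dir"
```

Inside the loop, "$path" always holds one complete file name, so any per-file commands can be placed there as long as the variable stays quoted.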
These clear warnings remind us that parsing file names in Bash can be, and is, a tricky business. However, with the right options for find, namely -print0, and for xargs, namely -0, all our file names containing special characters can be parsed correctly:
ls
find . -name 'a*' -print0
find . -name 'a*' -print0 | xargs -0 ls
find . -name 'a*' -print0 | xargs -0 rm
First we check our directory listing. All our file names containing special characters are there. Next we do a simple find ... -print0 to see the output. We observe that the strings are NULL terminated (with the NULL
O