String Manipulation
The Linux ecosystem is packed with fantastic tools for working with text and strings. These include awk, grep, sed, and cut. For any heavyweight text wrangling, these should be your go-to choices.
Sometimes though, it’s convenient to use the shell’s built-in capabilities, especially when you’re writing a short and simple script. If your script is going to be shared with other people and it is going to run on their computers, using the standard Bash functionality means you don’t have to wonder about the presence or version of any of the other utilities.
If you need the power of the dedicated utilities, then by all means use them. That’s what they’re there for. But often your script and Bash can get the job done on their own.
Because these string-handling features are built into Bash, you can use them in scripts or on the command line. Using them in a terminal window is a fast and convenient way to prototype your commands and perfect the syntax. It avoids the edit, save, run, and debug cycle.
Creating and Working With String Variables
To declare a variable and assign a string to it, name the variable, follow it with the equals sign =, and provide the string. If there are spaces in your string, wrap it in single or double quotes. Make sure there is no whitespace on either side of the equals sign.
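As a minimal sketch of the assignment rules above, the following creates three string variables. The variable names and values are made up for illustration.

```bash
# No whitespace on either side of the equals sign.
first_word=Hello                      # no spaces in the value, so quotes are optional
greeting="Hello from Bash"            # double quotes keep the spaces together
literal='Costs $5, no expansion'      # single quotes keep the text exactly as typed

echo "$first_word"
echo "$greeting"
echo "$literal"
```

Writing `greeting = "Hello from Bash"` with spaces around the equals sign would make Bash try to run a command called greeting instead of assigning a value.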
Once you’ve created a variable, its name is added to the shell’s list of tab completion words. For example, if a variable’s name begins with “my_”, typing “$my_” and hitting the “Tab” key enters the full name on the command line.
Read-Only Variables
There is a declare command that we can use for declaring variables. In simple cases, you don’t really need it, but using it allows you to use some of the command’s options. Probably the one you’d use most is the -r (read-only) option. This creates a read-only variable that can’t be changed.
If we try to assign a new value to it, it’ll fail.
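Here is a short sketch of a read-only variable and a failed attempt to change it. The name site_name and its value are hypothetical. The reassignment runs inside a subshell, which is our own choice to keep the error contained so the rest of the script carries on.

```bash
# -r marks the variable as read-only when it is declared.
declare -r site_name="example"

# Try to overwrite it in a subshell. The assignment fails with a
# "readonly variable" error, which we hide by redirecting stderr.
if ( site_name="changed" ) 2>/dev/null; then
    result="overwritten"
else
    result="rejected"
fi

echo "site_name is still: $site_name ($result)"
```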
Writing to the Terminal Window
We can write several strings to the terminal window using echo or printf so that they appear as though they’re one string. And we’re not limited to our own string variables. We can incorporate environment variables into our commands, too.
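The sketch below prints the same line with echo and with printf, mixing a variable of our own with the HOME environment variable. The name user_name and its value are assumptions for illustration, and the HOME portion of the output will differ from machine to machine.

```bash
user_name="Dave"   # hypothetical value

# echo joins its arguments with single spaces and adds a newline.
echo "User:" "$user_name" "Home:" "$HOME"

# printf fills each %s placeholder in order; the newline must be explicit.
printf 'User: %s Home: %s\n' "$user_name" "$HOME"
```

printf takes a little more typing, but it gives you precise control over spacing and line endings.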
Concatenating Strings
The plus-equals operator, +=, lets you “add” two strings together. It’s called concatenating.
Note that you don’t get a space added automatically between concatenated strings. If you need a space, explicitly put one at the end of the first string or at the start of the second.
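A quick sketch of concatenation with and without a separating space. The variable names are invented for the example.

```bash
phrase="Bash"
phrase+="Scripting"       # nothing is inserted between the two strings
echo "$phrase"            # BashScripting

spaced="Bash"
spaced+=" Scripting"      # the space is part of the second string
echo "$spaced"            # Bash Scripting
```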
Reading User Input
As well as creating string variables that have their contents defined as part of their declaration, we can read user input into a string variable.
The read command reads user input. The -p (prompt) option writes a prompt to the terminal window. The user’s input is stored in the string variable. In this example, the variable is called user_file.
If you don’t provide a string variable to capture the input, it will still work. The user input will be stored in a variable called REPLY.
It’s usually more convenient to provide your own variable and give it a meaningful name.
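The following sketch shows both forms of read. So that it runs unattended, a here-string (<<<) stands in for what the user would type, and the sample inputs are made up. In an interactive script you would drop the here-string and type the answer at the prompt. The -r option, which stops read from treating backslashes as escape characters, is our addition and not something the examples above depend on.

```bash
# Store the input in a named variable.
read -r -p "Enter a filename: " user_file <<< "notes.txt"
echo "You entered: $user_file"

# With no variable name given, the input is stored in REPLY.
read -r -p "Enter anything: " <<< "some text"
echo "REPLY holds: $REPLY"
```

Because the input here doesn’t come from a terminal, Bash doesn’t display the prompts. They appear when a person runs the commands interactively.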
Manipulating Strings
Now that we have our strings, whether defined at creation time, read from user input, or created by concatenating strings, we can start to do things with them.
Finding the String Length
If it is important or useful to know the length of a string, we can get it by preceding the variable name with a hash “#” symbol inside the curly braces, as in ${#variable}.
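A minimal example, using a made-up string:

```bash
my_string="Hello, world"

# The hash goes inside the braces, directly before the variable name.
echo "${#my_string}"      # 12

# The result is a number, so it can be used in tests.
if [ "${#my_string}" -gt 10 ]; then
    echo "That string is longer than 10 characters."
fi
```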
Extracting Substrings by Character Offsets
We can extract a substring from a string variable by providing a start point within the string, and an optional length. If we don’t provide a length, the substring will contain everything from the start point up to the last character.
The start point and length follow the variable name, with a colon “:” between them. Note that the characters in a string variable are numbered starting at zero.
Another variation lets you discard a number of characters from the tail end of the string. Effectively, it lets you set a start point and use a negative number as the length. The substring will contain the characters from the start point up to the end of the string, minus the number of characters you specified in the negative number.
In all cases the original string variable is untouched. The “extracted” substring is not actually removed from the contents of the variable.
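The offset forms described above can be sketched like this, using a sample string of our own choosing. As far as we know, the negative-length form needs Bash 4.2 or newer, so treat that as an assumption about your environment.

```bash
long_string="frankenstein"

echo "${long_string:6}"       # nstein   (from offset 6 to the end)
echo "${long_string:0:6}"     # franke   (6 characters from offset 0)
echo "${long_string:6:3}"     # nst      (3 characters from offset 6)
echo "${long_string:0:-5}"    # franken  (from offset 0, dropping the last 5)

# The original variable is untouched.
echo "$long_string"           # frankenstein
```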
Extracting Substrings by Delimiter
The disadvantage of using character offsets is you need to know in advance where the substrings you want to extract are located within the string.
If your string is delimited by a repeating character, you can extract substrings without knowing where they are in the string, nor how long they are.
To extract the first substring from the front of the string, follow the variable name with double percent signs, %%, the delimiting character, and an asterisk, *. For a string whose words are delimited by spaces, that is ${variable%% *}. Bash deletes the longest run at the end of the string that matches “delimiter followed by anything.”
This returns the first substring from the front of the string that doesn’t contain the delimiter character. This is called the short substring option.
The long substring option returns the front part of the string up to the last delimited substring. In other words, it omits the last delimited substring. Syntactically, the only difference is it uses a single percent sign “%” in the command.
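Here is a small sketch of both percent-sign forms on a space-delimited string. The sample string is invented for the example.

```bash
long_string="first second third fourth fifth"

# Short substring option: double percent sign.
echo "${long_string%% *}"     # first

# Long substring option: single percent sign.
echo "${long_string% *}"      # first second third fourth
```

The delimiter doesn’t have to be a space. The same pattern should work with a comma, a colon, or any other single character that separates your fields.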
As you’d expect, you can search in the same way from the end of the string. Instead of percent signs, use hash “#” signs, and move the delimiter to come after the asterisk “*” in the command. With a double hash, ##, and space-delimited words, that is ${variable##* }.
This is the short substring option. It returns the first substring it finds from the rear of the string that doesn’t contain the delimiter.
The long substring option uses a single hash, #. It returns the rear part of the string up to the first delimiter from the front of the string. In other words, it omits the first delimited substring.
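The hash-sign forms can be sketched in the same way, again with a made-up, space-delimited string.

```bash
long_string="first second third fourth fifth"

# Short substring option: double hash, delimiter after the asterisk.
echo "${long_string##* }"     # fifth

# Long substring option: single hash.
echo "${long_string#* }"      # second third fourth fifth
```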
Substring Substitution
Swapping substrings out for other substrings is easy. The format is the name of the string, the substring that will be replaced, and the substring that will be inserted, separated by forward slash “/” characters and wrapped in curly braces: ${string/old/new}. Only the first match is replaced.
To limit the search to the end of the string, precede the search string with a percent sign “%” character.
To limit the search to the start of the string, precede the search string with a hash “#” character.
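The three substitution forms can be sketched as follows. The string and the replacement word are hypothetical.

```bash
pets="cat dog cat dog"

echo "${pets/dog/bird}"       # cat bird cat dog   (first match only)
echo "${pets/%dog/bird}"      # cat dog cat bird   (match must be at the end)
echo "${pets/#cat/bird}"      # bird dog cat dog   (match must be at the start)

# An anchored search that doesn't match leaves the string as it was.
echo "${pets/#dog/bird}"      # cat dog cat dog
```

As with the other expansions, the variable itself is not modified. To keep the result, assign it to a variable.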
A String Is a Flexible Thing
If a string isn’t just how you’d like it, or need it, these tools will help you reformat it so that it suits your needs. For complicated transformations, use the dedicated utilities, but for the minor tweaks use the shell built-ins and avoid the overhead of loading and running an external tool.