Last updated on March 7, 2021 by Dan Nanni
Without explicit support for variable types, all bash variables are by default treated as character strings. Therefore more often than not, you need to manipulate string variables in various fashions while working on your bash script. Unless you are well-versed in this department, you may end up constantly coming back to Google and searching for tips and examples to handle your specific use case.
In the spirit of saving your time and thus boosting your productivity in shell scripting, I compile in this tutorial a comprehensive list of useful string manipulation tips for bash scripting. Where possible I will try to use bash's built-in mechanisms (e.g., parameter expansion) to manipulate strings instead of invoking external tools such as awk, sed or grep.
If you find that any tips are missing, feel free to suggest them in the comments. I will be happy to incorporate them in the article.
Foremost, it is worth noting that when you are working with string variables, it is good practice to wrap double quotes around them (e.g., "$var1"). That is because bash can apply word splitting while expanding a variable if the variable is not quoted. If the string stored in an unquoted variable contains whitespaces, the string may be split by whitespaces and treated as multiple strings, depending on contexts (e.g., when the string variable is used as an argument to a function).
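To make the word-splitting behavior concrete, here is a minimal sketch. The helper function count_args is hypothetical (it is not part of bash) and simply reports how many arguments it received.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the number of arguments it was called with.
count_args() {
    echo "$#"
}

var1="hello   big world"

unquoted=$(count_args $var1)     # word splitting turns the string into three arguments
quoted=$(count_args "$var1")     # quoting passes the string as a single argument

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

With the default IFS, the unquoted call sees three arguments while the quoted call sees one.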
In bash, there is no dedicated operator that concatenates two strings. To combine two string variables in bash, you can simply put one variable after another without any special operator in between. If you want to concatenate a string variable with a string literal, enclose the variable name in curly braces {} so that bash can tell where the name ends and the literal begins. The braces are strictly required only when the literal starts with a character that is valid in a variable name (a letter, digit or underscore), but using them consistently does no harm. See the following example for string concatenation in bash.
base=http://www.abc.com
api="/devices/"
deviceid=1024
url="$base$api$deviceid"             # concatenate string variables
url2="$base$api${deviceid}/ports"    # concatenate a string variable with a string literal
echo "URL: $url"
echo "URL2: $url2"
URL: http://www.abc.com/devices/1024
URL2: http://www.abc.com/devices/1024/ports
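The curly braces matter most when the literal that follows could be read as part of the variable name. The sketch below reuses deviceid from the example above; the "_backup" suffix is made up for illustration.

```bash
#!/usr/bin/env bash
deviceid=1024

# Without braces, bash looks up a variable named "deviceid_backup".
# It is unset here, so the expansion is empty.
wrong="dev_$deviceid_backup"

# With braces, the variable name ends at the closing brace.
right="dev_${deviceid}_backup"

echo "wrong: $wrong"
echo "right: $right"
```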
Appending a string to an existing variable is similar to string concatenation, so you can use the same method described above. Another (easier) way is to use the built-in += operator. When used with string operands, the += operator appends a string to a variable, as illustrated below.
var1="Hello"
var2=" !"
var1+=" World"    # append a string literal
var1+="$var2"     # append a string variable
echo "$var1"
Hello World !
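One place where += is convenient is building a string piece by piece in a loop. The following is a small sketch under that assumption; the array contents and variable names are made up for illustration.

```bash
#!/usr/bin/env bash
items=("red" "green" "blue")
csv=""

for item in "${items[@]}"; do
    # add a comma before every element except the first
    if [[ -n $csv ]]; then
        csv+=","
    fi
    csv+="$item"
done

echo "$csv"
```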
You can use the '==' or '!=' operators to check equality or inequality of two strings (or string variables) in bash. Inside single brackets [ ], '=' is the POSIX-standard equality operator, and bash accepts '==' as a synonym. Double square brackets [[ ]] accept both as well. Do not use double round brackets (( )) for string comparison: they perform arithmetic evaluation, so a string such as "apple" is treated as the name of a variable rather than as text.
# The following formats are all valid.
if [ "$var1" == "apple" ]; then
    echo "This is good"
fi
if [ "$var1" = "apple" ]; then
    echo "This is good"
fi
if [ "$var1" != "$var2" ]; then
    echo "This is bad"
fi
if [[ "$var1" == "apple" ]]; then
    echo "This is okay"
fi
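The sketch below illustrates why arithmetic evaluation with (( )) is unsuitable for comparing strings. It assumes that no variables named apple or orange are set in the shell.

```bash
#!/usr/bin/env bash
var1="apple"
var2="orange"

# [[ ]] compares the two values as strings.
if [[ "$var1" == "$var2" ]]; then
    string_result="equal"
else
    string_result="different"
fi

# (( )) evaluates arithmetic. "apple" and "orange" are read as names of
# unset variables, both evaluate to 0, and the test succeeds by accident.
if (( var1 == var2 )); then
    arith_result="equal"
else
    arith_result="different"
fi

echo "string comparison:     $string_result"
echo "arithmetic comparison: $arith_result"
```

The string comparison reports "different", while the arithmetic one reports "equal" even though the two strings clearly differ.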
There are several ways to count the length of a string in bash. Of course you can use wc or awk to get string length information, but you don't need an external tool for a simple task like this. The following example shows how to find string length using bash's built-in mechanism.
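my_var="This is my example string"
len=${#my_var}                  # bash's built-in parameter expansion
len=$(expr length "$my_var")    # same result via the external expr tool ('length' is a GNU extension)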
If you want to remove a trailing newline or carriage return character from a string, you can use bash's parameter expansion in the following form.
${string%$var}
This expression means that if "string" ends with the character stored in "var", the result is "string" without that character; otherwise the result is "string" unchanged. For example:
# input string with a trailing newline character
input_line=$'This is my example line\n'
# define a trailing character. For carriage return, replace it with $'\r'
character=$'\n'
echo -e "($input_line)"
# remove a trailing newline character
input_line=${input_line%$character}
echo -e "($input_line)"
(This is my example line
)
(This is my example line)
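The same expansion is handy for lines that come from files with Windows-style (CRLF) line endings. The following is a minimal sketch; the sample content "status=ok" is made up for illustration.

```bash
#!/usr/bin/env bash
# A line as it might look after being read from a CRLF file:
# the newline is already gone, but the carriage return is still attached.
line=$'status=ok\r'

cr=$'\r'
clean=${line%"$cr"}    # drop one trailing carriage return, if present

echo "before: ${#line} characters"
echo "after:  ${#clean} characters"
```

Quoting "$cr" inside the expansion makes bash treat its value as literal text rather than as a pattern.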
If you want to remove whitespaces at the beginning or at the end of a string (also known as leading/trailing whitespaces), you can use the sed command.
my_str=" This is my example string "
# original string with leading/trailing whitespaces
echo -e "($my_str)"
# trim leading whitespaces in a string
my_str=$(echo "$my_str" | sed -e "s/^[[:space:]]*//")
echo -e "($my_str)"
# trim trailing whitespaces in a string
my_str=$(echo "$my_str" | sed -e "s/[[:space:]]*$//")
echo -e "($my_str)"
( This is my example string )
(This is my example string )   ← leading whitespaces removed
(This is my example string)    ← trailing whitespaces removed
If you want to stick with bash's built-in mechanisms, the following bash function can get the job done.
trim() {
    local var="$*"
    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"
    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"
    echo "$var"
}
my_str=" This is my example string "
echo "($my_str)"
my_str=$(trim "$my_str")    # quote the argument so that inner whitespace is preserved
echo "($my_str)"
( This is my example string ) (This is my example string)
Removing an arbitrary substring from a string is a generalization of the previous whitespace/newline character removal. Again, you can use the sed command for this. The following example illustrates how you can remove a pre-defined prefix or suffix, or remove the first occurrence of a substring, from a string variable (append the g flag to the sed expression to remove all occurrences). One thing to note is that if the substring contains any special character (e.g., '[' and ']' in this example), the character needs to be escaped with '\' in sed.
my_str="[DEBUG] Device0 is not a valid input [EOL]"
prefix="\[DEBUG\]"
suffix="\[EOL\]"
substring="valid"
echo "$my_str"
# remove a prefix from a string
my_str=$(echo "$my_str" | sed -e "s/^$prefix//")
echo "$my_str"
# remove a suffix from a string
my_str=$(echo "$my_str" | sed -e "s/$suffix$//")
echo "$my_str"
# remove a substring from a string (first occurrence only; use s/$substring//g for all)
my_str=$(echo "$my_str" | sed -e "s/$substring//")
echo "$my_str"
[DEBUG] Device0 is not a valid input [EOL]
 Device0 is not a valid input [EOL]
 Device0 is not a valid input
 Device0 is not a  input
Another way to remove a prefix or a suffix from a string is to use bash's built-in pattern matching mechanism. In this case, special characters do not need to be escaped, as long as the variable holding the prefix or suffix is double-quoted inside the expansion.
my_str="[DEBUG] Device0 is not a valid input [EOL]"
prefix="[DEBUG]"
suffix="[EOL]"
# remove a prefix string
my_str=${my_str#"$prefix"}
echo "$my_str"
# remove a suffix string
my_str=${my_str%"$suffix"}
echo "$my_str"
 Device0 is not a valid input [EOL]
 Device0 is not a valid input
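The double quotes around the variable inside the expansion are what make the match literal. The sketch below, which reuses the prefix from the example above, shows what happens when they are left out.

```bash
#!/usr/bin/env bash
my_str="[DEBUG] Device0 is not a valid input"
prefix="[DEBUG]"

# Quoted: the value of $prefix is matched literally, so "[DEBUG]" is removed.
quoted=${my_str#"$prefix"}

# Unquoted: [DEBUG] is read as a bracket expression matching ONE character
# out of D, E, B, U, G. The string starts with "[", so nothing is removed.
unquoted=${my_str#$prefix}

echo "quoted:   ($quoted)"
echo "unquoted: ($unquoted)"
```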
If you want to check whether or not a given string variable starts with a prefix, there are multiple ways to do it, as illustrated below.
var1="This is my text"
prefix="This"
case $var1 in
    $prefix*)
        echo "1. \"$var1\" starts with \"$prefix\""
        ;;
esac
if [[ $var1 =~ ^$prefix ]]; then
    echo "2. \"$var1\" starts with \"$prefix\""
fi
if [[ $var1 == $prefix* ]]; then
    echo "3. \"$var1\" starts with \"$prefix\""
fi
if [[ $var1 == This* ]]; then
    echo "4. \"$var1\" starts with \"This\""
fi
1. "This is my text" starts with "This"
2. "This is my text" starts with "This"
3. "This is my text" starts with "This"
4. "This is my text" starts with "This"
Note that the first approach is the most portable, POSIX-compliant one (which works not just for bash, but also for other shells).
Similarly, if you want to check whether or not a string ends with a specific suffix, you can try one of these methods shown below.
var1="This is my text"
suffix="text"
case $var1 in
    *$suffix)
        echo "1. \"$var1\" ends with \"$suffix\""
        ;;
esac
if [[ $var1 =~ $suffix$ ]]; then
    echo "2. \"$var1\" ends with \"$suffix\""
fi
if [[ $var1 == *$suffix ]]; then
    echo "3. \"$var1\" ends with \"$suffix\""
fi
if [[ $var1 == *text ]]; then
    echo "4. \"$var1\" ends with \"text\""
fi
1. "This is my text" ends with "text"
2. "This is my text" ends with "text"
3. "This is my text" ends with "text"
4. "This is my text" ends with "text"
In bash, you can use the =~ operator inside [[ ]] to check if a string contains a substring that is matched by a regular expression. As a special case, it's even easier to check if a string contains a fixed substring, since a glob pattern of the form *substring* is enough.
# regular expression for a substring
# ([[:space:]] is the portable POSIX character class; \s is an extension that not every platform supports)
pattern="length[[:space:]]+[0-9]+"
var1="This data has length 1000"
var2="This data is not valid"
if [[ $var1 =~ $pattern ]]; then
    echo "$var1: length found"
else
    echo "$var1: length not found"
fi
if [[ $var2 =~ $pattern ]]; then
    echo "$var2: length found"
else
    echo "$var2: length not found"
fi
This data has length 1000: length found
This data is not valid: length not found
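For the fixed-substring case, a glob pattern inside [[ ]] is usually sufficient. Here is a minimal sketch; the variable names substring, missing, found and found_missing are made up for illustration.

```bash
#!/usr/bin/env bash
var1="This data has length 1000"
substring="length"
missing="width"

# The quotes around "$substring" make its value match literally,
# while the surrounding * wildcards allow any text before and after it.
if [[ $var1 == *"$substring"* ]]; then
    found="yes"
else
    found="no"
fi

if [[ $var1 == *"$missing"* ]]; then
    found_missing="yes"
else
    found_missing="no"
fi

echo "contains \"$substring\": $found"
echo "contains \"$missing\": $found_missing"
```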
When you need to split a string in bash, you can use bash's built-in read command. This command reads a single line from stdin and splits it on a delimiter. The split elements are then stored in either an array or separate variables supplied to the read command. By default the line is split on the characters in the IFS variable, which are space, tab and newline (' ', '\t', '\n'). If you want to split a string on a custom delimiter, you can set the IFS variable to that delimiter for the read call.
# strings to split
var1="Harry Samantha Bart Amy"
var2="green:orange:black:purple"
# split a string by one or more whitespaces, and store the result in an array
# (-r keeps backslashes in the input from being interpreted)
read -r -a my_array <<< "$var1"
# iterate the array to access individual split words
for elem in "${my_array[@]}"; do
    echo "$elem"
done
echo "----------"
# split a string by a custom delimiter
IFS=':' read -r -a my_array2 <<< "$var2"
for elem in "${my_array2[@]}"; do
    echo "$elem"
done
Harry
Samantha
Bart
Amy
----------
green
orange
black
purple
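The read command can also store the split fields in separate variables instead of an array. The following is a small sketch; the colon-separated record and the variable names user, uid and home are made up for illustration.

```bash
#!/usr/bin/env bash
record="alice:1001:/home/alice"    # hypothetical colon-separated record

# IFS is changed only for this single read call.
IFS=':' read -r user uid home <<< "$record"

echo "user=$user uid=$uid home=$home"
```

If the input has more fields than there are variables, the last variable receives the remainder of the line.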
If you want to replace a substring with another string in bash, you can use bash's parameter expansion feature. The single-slash form shown here replaces only the first occurrence of the substring.
var1="This is a very bad guide"
substring="bad"          # string to be replaced
replacement="useful"     # substitute string
var2="${var1/$substring/$replacement}"
echo "$var2"
This is a very useful guide
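To replace every occurrence rather than just the first, double the first slash in the expansion. A minimal sketch with a made-up input string:

```bash
#!/usr/bin/env bash
var1="bad input, bad output"

first_only=${var1/bad/good}    # single slash: replace the first match only
all=${var1//bad/good}          # double slash: replace every match

echo "$first_only"
echo "$all"
```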
Let's say you want to remove from a string all text that appears after a specific character (e.g., a delimiter character), along with the character itself. In this case you can use bash's parameter expansion in the following format.
${string%word}
The above expression means that if the pattern "word" matches a trailing portion of "string", the result of this expression is "string" without the shortest such match. For example:
url="http://www.mysite.com:50001"
delimiter=":"
# remove all text starting from the last delimiter
result=${url%"$delimiter"*}
echo "$result"
http://www.mysite.com
Let's say you want to delete everything preceding and including a specific character (e.g., a delimiter character). The following form of bash's parameter expansion can get it done.
${string##*word}
The above expression means that if "string" begins with text that ends in "word", the result of this expression is "string" without the longest such match. Longest means that if "string" contains multiple instances of "word", everything up to and including the last instance is removed. For example:
# remove all text preceding and including the delimiter
url="http://www.mysite.com:50001"
delimiter=":"
result=${url##*"$delimiter"}
echo "$result"
50001
In the above example, the original string contains two instances of the delimiter ':'. Since we use the longest matching pattern, the matched content is "http://www.mysite.com:", not "http:", and hence the result ("50001") is what remains after the matched content is removed.
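The shortest and longest variants of both operators can be compared side by side. The sketch below reuses the URL from the examples above; the result variable names are made up for illustration.

```bash
#!/usr/bin/env bash
url="http://www.mysite.com:50001"

shortest_suffix=${url%:*}     # remove the shortest suffix matching ":*"
longest_suffix=${url%%:*}     # remove the longest suffix matching ":*"
shortest_prefix=${url#*:}     # remove the shortest prefix matching "*:"
longest_prefix=${url##*:}     # remove the longest prefix matching "*:"

echo "%  : $shortest_suffix"
echo "%% : $longest_suffix"
echo "#  : $shortest_prefix"
echo "## : $longest_prefix"
```

The four results are "http://www.mysite.com", "http", "//www.mysite.com:50001" and "50001" respectively.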
Please note that this article is published by Xmodulo.com under a Creative Commons Attribution-ShareAlike 3.0 Unported License. If you would like to use the whole or any part of this article, you need to cite this web page at Xmodulo.com as the original source.