# Bash in_array madness

So you want to check whether a value is in an array in Bash? Well, then you have more than one problem, or in other words: the fun begins. :)

The following is partly a script that can be executed and partly a post. I should probably use something like R Markdown for Bash scripts or post the snippets as Gists, but I am going to leave it like this for now.

```bash
#!/bin/bash

# ----------------------------------------------------------
# These are switches you want to set anyway
# ----------------------------------------------------------

# Exit on error
set -e
# No uninitialized variables
set -u

# ----------------------------------------------------------
# The problem: check if a value is in an array
# or: find the needle in the haystack
#
# ----------------------------------------------------------

# The haystack
HAYSTACK=(one two three)
# The needle
NEEDLE=one

# ----------------------------------------------------------
# Way 1: case in solution
#
# ----------------------------------------------------------

# Given the fact that case is used to implement complex conditionals
# this top stackoverflow answer seems weird:
case "${HAYSTACK[@]}" in *"$NEEDLE"*) echo "1a) found";; esac

# Why? Because of the way case works.
# Remember that case matches the "EXPR" against the conditions CASE1) to
# CASEN) and executes the commands CMD specified for that condition, but only
# for the first match, so
# - the order of CASE1 … CASEN is very important.
# - we are doing pattern matching
# Syntax:
#   case EXPR in
#     CASE1) CMD;;
#     …
#     CASEN) CMD;;
#   esac
# Concrete example:
#   case "f" in
#     "abc") echo "abc";;
#     "f") echo "f";;
#   esac

# The top answer turns this around: here the HAYSTACK is the thing to look for
# and we check if it is found in NEEDLE, where NEEDLE is made into a globbing
# pattern that matches any string containing NEEDLE as a substring.
# That means it will match this:
NEEDLE=o
case "${HAYSTACK[@]}" in *"$NEEDLE"*) echo "1b) falsely found";; esac
# which is certainly not what you want.

# ----------------------------------------------------------
# Way 2: regex matching
#
# ----------------------------------------------------------

# Let us consider the following two additional arrays:
NEEDLE=one
SINGLETON=(one)
SPACES=(one "two two" three "four")
# Spaces have been a long-standing problem for bash;
# in fact, if I have the time I would like to write about bash quirks more :)

# The first thing is: we are again using regular expression matching in a weird
# way: STRING =~ REGEX
# The HAYSTACK is the STRING and the NEEDLE is the REGEX,
# and it is not good practice to quote the rhs of an ERE because it is then
# interpreted as a string and not as a regex. (Thank you, shellcheck.)
if [[ " ${HAYSTACK[@]} " =~ " ${NEEDLE} " ]]; then echo "2a) found"; fi

# Which looks good. But let's check for substrings:
NEEDLE=o
if [[ " ${HAYSTACK[@]} " =~ " ${NEEDLE} " ]]; then echo "2a1) falsely found"; fi

# Nothing is printed, which looks good as well.
# But let's check for substrings of values that contain the input field
# separator (IFS):
NEEDLE=two
if [[ " ${SPACES[@]} " =~ " ${NEEDLE} " ]]; then echo "2b) falsely found"; fi

# This should not be the case,
# but the other properties look fine (nothing is printed):
NEEDLE=o
if [[ " ${SPACES[@]} " =~ " ${NEEDLE} " ]]; then echo "2c) falsely found"; fi
if [[ " ${SINGLETON[@]} " =~ " ${NEEDLE} " ]]; then echo "2d) falsely found"; fi

# So this solution requires changing the IFS (which breaks if a value contains
# the new IFS).
# Note: an unquoted \t is just the letter t and array words are split on
# whitespace when the line is parsed, so this actually creates the two
# elements "onettwo" and "twotthree".
IFS=$'\t'
HAYSTACK=(one\ttwo two\tthree)
unset IFS

NEEDLE=two
# so this is the old behaviour
if [[ " ${SPACES[@]} " =~ " ${NEEDLE} " ]]; then echo "2e) falsely found"; fi
# so this is the new behaviour
if [[ " ${HAYSTACK[@]} " =~ " ${NEEDLE} " ]]; then echo "2f) falsely found"; fi

# buuuuut this does not work:
NEEDLE="two two"
if [[ " ${HAYSTACK[@]} " =~ " ${NEEDLE} " ]]; then echo "2g) found"; fi
NEEDLE=two\ two
if [[ " ${HAYSTACK[@]} " =~ " ${NEEDLE} " ]]; then echo "2h) found"; fi

# at this point I am giving up on this path

# ----------------------------------------------------------
# Way 3: for loop in a function (lots of sources, since this is the obvious
# way in procedural programming)
#
# ----------------------------------------------------------

# The sane thing should be a loop.
# But then we would need to know what we are comparing.
# -eq is for numbers and == for strings. A version for strings could be:

function failing_in_array(){
    # Deliberately broken: $1 is a single string, not an array (see below)
    local THIS_ARRAY=$1
    local THIS_VALUE=$2
    local i
    #printf "%s" "$THIS_ARRAY"
    for i in "${THIS_ARRAY[@]}"
    do
        #printf "%s" "$i"
        if [ "$i" == "$THIS_VALUE" ] ; then
            printf "y\n"
            return 0
        fi
    done
    printf "n\n"
    return 0
    #return 1
}

# INPUT
HAYSTACK=(one two three)
NEEDLE=one
SINGLETON=(one)
SPACES=(one "two two" three "four")

if [[ $(failing_in_array "$HAYSTACK" "$NEEDLE") == "y" ]]; then
    printf "3a) found\n";
fi
NEEDLE=two
if [[ $(failing_in_array "$HAYSTACK" "$NEEDLE") == "y" ]]; then
    printf "3b) found\n";
else
    printf "3b) falsely not found\n";
fi

# So why does this not work for the last case?
# There are a lot of things going wrong here:
# - first: you cannot pass an array as an argument in bash
# - second: THIS_ARRAY is not an array. It is a string.
# - third: $HAYSTACK is not the array, but only the first element of the array.
# (at this point I feel like a carpenter who is only allowed to use broken tools)

# What about the first point?
# There are two ways of doing this: pass the values (the expanded array) or
# pass by name. Passing the values has the disadvantage that the function is
# unable to distinguish the last element of the array from the second parameter.
# For the function:
#   in_array (a, b, c) d
# is equivalent to
#   in_array (a, b, c, d)
# because both lists of arguments evaluate to
#   a, b, c, d
# So the function only gets a list of values but does not know how it was
# called. So, can we live with that? I think yes, I can, and it is better
# than messing around with the IFS.

# And lastly, why do we have to use a weird output value like y and n?
# Because if the function exited with 1 when called as a plain command, the
# script would stop because we are using set -e (and we won't unset this
# because a function that requires that would be … crappy). That is the case
# with -e for any bare call that returns something other than 0; only
# commands tested directly by if, while, && or || are exempt.
# So, let's accept that too.

# So in the end our in_array function would be like this:
function in_array(){
    local ARGS=("$@")
    local i
    # printf "ALL ARGS: %s\n" "${ARGS[@]}"
    # Everything except the last argument is the haystack
    local THIS_ARRAY=("${ARGS[@]:0:${#ARGS[@]}-1}")
    # printf "ARRAY: %s\n" "${THIS_ARRAY[@]}"
    # The last argument is the needle
    local THIS_VALUE="${ARGS[*]:${#ARGS[@]}-1:1}"
    # printf "VALUE: %s\n" "${THIS_VALUE[@]}"
    for i in "${THIS_ARRAY[@]}"
    do
        # printf "i: %s\n" "$i"
        if [ "$i" == "$THIS_VALUE" ] ; then
            printf "y\n"
            return 0
        fi
    done
    printf "n\n"
    return 0
    #return 1
}

# INPUT
HAYSTACK=(one two three)
NEEDLE=one
SINGLETON=(one)
SPACES=(one "two two" three "four")

if [[ $(in_array "${HAYSTACK[@]}" "$NEEDLE") == "y" ]]; then
    printf "4a) found\n";
else
    printf "4a) falsely not found\n";
fi
# in_array "${HAYSTACK[@]}" "$NEEDLE"

NEEDLE=two
if [[ $(in_array "${HAYSTACK[@]}" "$NEEDLE") == "y" ]]; then
    printf "4b) found\n";
else
    printf "4b) falsely not found\n";
fi
# in_array "${HAYSTACK[@]}" "$NEEDLE"

NEEDLE=two
if [[ $(in_array "${SPACES[@]}" "$NEEDLE") == "y" ]]; then
    printf "4c) falsely found\n";
else
    printf "4c) not found\n";
fi

# So this looks good. Does it break on spaces?
NEEDLE="two two"
if [[ $(in_array "${SPACES[@]}" "$NEEDLE") == "y" ]]; then
    printf "4d) found\n";
else
    printf "4d) falsely not found\n";
fi
if [[ $(in_array "${HAYSTACK[@]}" "$NEEDLE") == "y" ]]; then
    printf "4e) falsely found\n";
else
    printf "4e) not found\n";
fi

# No. Good. And on numbers?
NEEDLE=2
HAYSTACK=(one 2 three)
if [[ $(in_array "${HAYSTACK[@]}" "$NEEDLE") == "y" ]]; then
    printf "4f) found\n";
else
    printf "4f) falsely not found\n";
fi

# Looks like the perfect thing :)

# This function does not differentiate between not-string and string. So
# "2" == 2 is True, but that is the default behaviour of bash and zsh.
NEEDLE=2
HAYSTACK=(one "2" three)
if [[ $(in_array "${HAYSTACK[@]}" "$NEEDLE") == "y" ]]; then
    printf "4g) falsely found\n";
else
    printf "4g) not found\n";
fi

# Optimizations
# Note that you could check beforehand whether it is worth traversing the
# array by using grep at the beginning, and you could also time the different
# solutions.
# Ugh, let's leave it like this for today.

# Other solutions I did not look at:
# - use an associative array (declare -A)
# - use grep:
#   inarray=$(echo "${haystack[@]}" | grep -o "needle" | wc -w)
```
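
The y/n output works around `set -e`, but Bash does not apply `set -e` to a command that is tested directly by `if`, so a function can also report its result through the exit status. The following is a minimal sketch of that variant, not one of the solutions from the sources discussed here. Taking the needle as the first argument (so that `shift` leaves only the haystack) and the variable names `needle` and `element` are assumptions made for this example.

```bash
#!/bin/bash
set -e
set -u

# Hypothetical variant of in_array: needle first, haystack values after it.
# Returns 0 if the needle equals one of the values, 1 otherwise.
in_array() {
    local needle=$1
    shift
    local element
    for element in "$@"; do
        # Quoting the right-hand side forces a literal string comparison
        if [[ "$element" == "$needle" ]]; then
            return 0
        fi
    done
    return 1
}

SPACES=(one "two two" three "four")

if in_array "two two" "${SPACES[@]}"; then
    printf "5a) found\n"
fi

if in_array two "${SPACES[@]}"; then
    printf "5b) falsely found\n"
else
    printf "5b) not found\n"
fi

# The non-zero exit status above did not abort the script
printf "5c) still running\n"
```

This prints `5a) found`, `5b) not found` and `5c) still running`. Calling `in_array` as a bare command outside of `if`, `&&` or `||` would still stop the script under `set -e` whenever the needle is missing, so this form only makes sense when the call is always tested.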

# Summary

Due to the way Bash handles arrays, we went on a bigger and longer journey than we should have. Is this really productive? I like to look at programming and scripting languages in depth, so we'll see, maybe there will be similar posts about other languages. If you know a language I should write about, write to me and I'll consider it. I like to complain too. :)