The Substring Finding Algorithm

Written by Raza on. Posted in C++

We have been playing with strings and doing some basic operations on string for the last two posts. Another such interesting operation with strings is finding one string in another string. There are a number of things to be checked while searching for a string as a substring of second one. We shall be explaining these steps and other details in the proceeding lines.

A string may exist as it is in the second string if and only if the second one is greater than or equal to its length. If this condition fails, first string is not a substring of the second one. Naturally, this should be checked first of all. Then if the first string is null, it is surely a substring of the second as every string has a null value in it. We can safely assume that the strings passed to us are null terminated.

As soon as we have checked these boundary conditions and implemented some checks, we start comparing the main string for our substring. We have to iterate both the strings with different index variables. The reason being we have to check for the substring matching till the length of the substring. Suppose we have the string “marry went home” and the substring we are finding is “went”. Length of our substring is 4 so our comparison iterations should be 4 in the main string each time. If the matching continues till the length of the substring, it means that it is found in the main string.

It is very important to remember that if a mismatch occurs during the comparison of main string and substring in our string matching algorithm, we have to start matching from the next index from where we started off before this comparison in the main string. If we have main string “ababaa” and substring “abaa” to search, we would stop our comparison at the second ‘b’ with mismatch indicated. But if we start the next comparison from the second last‘a’, we have lost our substring even though it is there. So we should start comparing onwards from the first ‘b’ of the string now, the next element after the first comparison.

Our simple string matching algorithm returns the starting index of the first occurrence of the substring in the main string if it is present at all. If substring is not present, it returns -1. If the sub string is null, it can return 0. Below is the code for our simple string matching algorithm.

```
int stringcompare(char str[], char substr[])
{
.  bool isequal ;
.  for (int i = 0 ; str[i] != '\0' ; ++i)
.  {
.   isequal = true ;
.   int prev = i ;
.   for(int j = 0 ; (substr[j] != '\0' && str[i] != '\0') ; ++j, ++i)
.   {
.    if (str[i] != substr[j])
.     isequal = false ;
.   }
.   if (isequal == true)
.    return prev ;
.   else
.    i = prev + 1 ;
.   }
.  return -1 ;
}

.
int main ()
{
.  char str[] = "marry has a little lamb" ;
.  char substr[] = "" ;
.  cout << stringcompare(str, substr) ;

}

```
Hope you enjoyed and learned that. Feel free to point out any improvement in the code. Good Bye!