Commit 67d4658

Merge pull request #199 from Jatin86400/Z-algorithm
added Z-algorithm
2 parents 685b485 + 5247111 commit 67d4658

2 files changed

+121
-0
lines changed

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# Z Algorithm

This algorithm finds all occurrences of a pattern in a text in linear time. If the text has length n and the pattern has length m, the total running time is O(m + n), with linear space. These are the same time and space bounds as the KMP algorithm, but the Z algorithm is often considered simpler to understand.

In this algorithm, we construct a Z array.

# What is Z array?

For a string str[0..n-1], the Z array has the same length as the string. An element Z[i] of the Z array stores the length of the longest substring starting at str[i] that is also a prefix of str[0..n-1]. The first entry of the Z array is meaningless, because the complete string is always a prefix of itself.

Example:

| Index    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|----------|---|---|---|---|---|---|---|---|---|---|----|----|
| Text     | a | a | b | c | a | a | b | x | a | a | a  | z  |
| Z values | X | 1 | 0 | 0 | 3 | 1 | 0 | 0 | 2 | 2 | 1  | 0  |

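
The definition above can be checked directly with a quadratic-time sketch that compares each suffix against the prefix character by character. The function name `naive_z` is made up for this illustration, and storing 0 in the meaningless first entry (shown as X in the table) is an assumption of the sketch.

```cpp
#include <string>
#include <vector>

// Naive O(n^2) Z array, computed straight from the definition:
// z[i] = length of the longest substring starting at str[i]
//        that is also a prefix of str.
std::vector<int> naive_z(const std::string& str) {
    const int n = static_cast<int>(str.size());
    std::vector<int> z(n, 0);  // z[0] stays 0 here; it carries no information
    for (int i = 1; i < n; ++i) {
        // Extend the match while the characters agree and we stay in bounds.
        while (i + z[i] < n && str[z[i]] == str[i + z[i]]) {
            ++z[i];
        }
    }
    return z;
}
```

For the example string, `naive_z("aabcaabxaaaz")` returns 0 1 0 0 3 1 0 0 2 2 1 0, which agrees with the table apart from the placeholder in position 0.
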
# How to construct Z array?

The idea is to maintain an interval [L, R], which is the interval with the maximum R
such that str[L..R] is a prefix substring (a substring which is also a prefix).

Steps for maintaining this interval are as follows:

1) If i > R, then there is no prefix substring that starts before i and
ends at or after i, so we reset L and R and compute a new [L,R] by comparing
str[0..] with str[i..], which gives Z[i] (= R-L+1).

2) If i <= R, then let K = i-L. Now Z[i] >= min(Z[K], R-i+1), because
str[i..] matches str[K..] for at least R-i+1 characters (they are in the
[L,R] interval, which we know is a prefix substring).
Now two sub-cases arise:
a) If Z[K] < R-i+1, then the match starting at str[i] cannot be longer
than Z[K] (otherwise Z[K] itself would be larger), so Z[i] = Z[K] and the
interval [L,R] remains the same.
b) If Z[K] >= R-i+1, then it may be possible to extend the [L,R] interval,
so we set L to i, continue matching from str[R+1] onwards (against
str[R-i+1] onwards) to get the new R, and then calculate Z[i] (= R-L+1).

The algorithm runs in linear time because we never compare a character at or before position R, and every successful comparison increases R by one, so there are at most n successful comparisons, where n is the length of str. A mismatch happens at most once for each i (it is what stops R from growing), which adds at most n more comparisons, so the overall complexity is linear.
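
The interval-based construction and the counting argument can be sketched together as follows. This is an illustrative version, not the code from this commit: the name `z_array` and the optional `comparisons` out-parameter are assumptions added so that the linear bound can be observed. One simplification is that in case 2a the loop still performs a single comparison, which is guaranteed to fail, so the count can be slightly higher than in the description but stays below 2n.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Linear-time Z array using the [L, R] interval (R inclusive).
// If `comparisons` is non-null, it receives the number of character
// comparisons performed (hypothetical instrumentation for this sketch).
std::vector<int> z_array(const std::string& str, long long* comparisons = nullptr) {
    const int n = static_cast<int>(str.size());
    std::vector<int> z(n, 0);  // z[0] stays 0; it carries no information
    long long count = 0;
    int L = 0, R = -1;         // empty interval: nothing has been matched yet
    for (int i = 1; i < n; ++i) {
        if (i <= R) {
            // Case 2: start from the value at the mirrored position K = i - L,
            // capped at the part of the interval that is still ahead of i.
            z[i] = std::min(z[i - L], R - i + 1);
        }
        // Cases 1 and 2b: extend the match by explicit comparison.
        // In case 2a the first comparison fails immediately.
        while (i + z[i] < n) {
            ++count;
            if (str[z[i]] != str[i + z[i]]) break;
            ++z[i];
        }
        if (i + z[i] - 1 > R) {  // the match reached past R: move the interval
            L = i;
            R = i + z[i] - 1;
        }
    }
    if (comparisons != nullptr) *comparisons = count;
    return z;
}
```

To search for a pattern, one option is to run `z_array` on the pattern, a separator character, and the text joined together, and report every position whose Z value reaches the pattern length.
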
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(0);
    string txt;
    string pat;
    getline(cin, txt); // getline() reads the complete line, whereas cin >> stops at the first whitespace
    getline(cin, pat);
    int n = txt.length();
    int pat_len = pat.length();
    if (pat_len == 0 || pat_len > n)
        return 0; // nothing to search for

    // New string formed by joining the pattern, '$' and the text.
    // Any other symbol can be used instead of '$'; it is chosen because it rarely occurs in the text.
    string str = pat + "$" + txt;
    int len = n + pat_len + 1; // length of the joined string
    vector<int> z_val(len, 0);
    int left = 0;  // left index of the z box
    int right = 0; // right index of the z box (inclusive)
    for (int i = 1; i < len; i++)
    {
        if (i <= right)
        {
            // i lies inside the z box, so reuse the value at the mirrored
            // position i-left, capped so that it does not run past right
            z_val[i] = min(z_val[i - left], right - i + 1);
        }
        // extend the match character by character
        while (i + z_val[i] < len && str[z_val[i]] == str[i + z_val[i]])
            z_val[i]++;
        if (i + z_val[i] - 1 > right)
        {
            // the match ends beyond the current z box, so it becomes the new box
            left = i;
            right = i + z_val[i] - 1;
        }
    }
    // Only positions after the separator belong to the text.
    for (int i = pat_len + 1; i < len; i++)
    {
        if (z_val[i] >= pat_len) // the whole pattern matches at this position
            cout << i - pat_len - 1 << "\n"; // 0-based start index of the occurrence in txt
    }
    return 0;
}
