On 8/16/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Wed, 16 Aug 2006, Shawn Pearce wrote:
>
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > > Hi,
> > >
> > > On Wed, 16 Aug 2006, Jon Smirl wrote:
> > >
> > > > On 8/16/06, John Rigby <jcrigby@gmail.com> wrote:
> > > > > Sorry if this is off topic, but could the dictionary be used to make
> > > > > git-grep a lot faster?
> > > >
> > > > It would be almost instant.
> > >
> > > But only if you are not using a regular expression, but a single word.
> >
> > Yes and no.  If the inverted index contains terms broken by some
> > known pattern (e.g. break on word-type boundaries) and the regex
> > in question has constant sections (it should, otherwise it might
> > as well just be '.') then you can reduce your search space to a
> > fraction of the overall data by looking at the inverted index to
> > select likely terms, select the related revisions containing those
> > possible terms, then run the regex only on those revisions.
> >
> > Sure you would be possibly pulling out a number of false positives
> > but if the constant sequence(s) in the regex reduce your search
> > space to below 1/2 of the overall data that's probably a lot less
> > I/O and CPU required to complete the query, even if you have to
> > read the entire dictionary and apply each term in the dictionary
> > to the regex to look for those possible matches.
>
> So it would speed up the search, but no, in case of regular expressions,
> particularly any interesting one, the result would not be instantaneous.

Instant is a relative term. Google is instant compared to running
grep over 10TB of data. How long would that take, a month?

Shawn is correct: the inverted indexes are used to eliminate as many
files as possible. So the response time is more a function of how
many hits you have than of how big the data set is. Of course, if you
give it a pattern that matches everything, it will be just as slow as grep.
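As a rough sketch of the filtering Shawn describes (not how git-grep or any existing tool works), the index maps each word to the files containing it, a constant piece of the regex picks out candidate files, and the regex runs only on those. Every name here (`build_index`, `candidates`, `indexed_grep`) is made up for illustration. The sketch assumes the caller supplies a `literal` that every match must contain and that does not cross a word boundary, and it indexes files rather than revisions to keep things short.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")  # assumption: index terms are broken on word boundaries


def build_index(files):
    """Map each word to the set of file names that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in WORD.findall(text):
            index[word].add(name)
    return index


def candidates(index, literal):
    """Scan the whole dictionary for terms containing `literal` and
    collect the files those terms occur in."""
    hits = set()
    for term, names in index.items():
        if literal in term:
            hits |= names
    return hits


def indexed_grep(files, index, pattern, literal):
    """Run `pattern` only on files the index cannot rule out.

    Returns the matching file names and how many files were scanned.
    """
    rx = re.compile(pattern)
    todo = candidates(index, literal)
    matches = sorted(name for name in todo if rx.search(files[name]))
    return matches, len(todo)


files = {
    "a.c": "int foo_bar(void);",
    "b.c": "return 0;",
    "c.c": "foobar = foo_baz + 1;",
}
index = build_index(files)
print(indexed_grep(files, index, r"foo_\w+\(", "foo_"))  # (['a.c'], 2)
```

Here `c.c` is a false positive from the index: it contains a term with `foo_` in it, so it gets scanned, but the regex rejects it. `b.c` is never read at all, which is where the savings come from.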
Give it a pattern that is only in one file and detectable by the index
and it will be very fast. If you are going to give it a bunch of
patterns that aren't in the index, then we need to adjust how the
index is built.

-- 
Jon Smirl
jonsmirl@gmail.com

Received on Thu Aug 17 00:44:57 2006