[lug] Regex Help

Chris McDermott csmcdermott at gmail.com
Fri Jul 8 15:40:21 MDT 2011


You could try this:

/https?:\/\/[^\/]+\/specificpath/


https?:\/\/ - this will match either "http://" or "https://"
[^\/]+ - this will match one or more characters that are *not* "/" (i.e. the host name, whether it's a name or an IP address)
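
For anyone who wants to poke at it outside Perl, here is a minimal sketch of the same pattern in Python. The choice of Python and the function name matches_specific_path are only for illustration; nothing in this thread says which regex dialect the application actually uses. It uses re.search, which looks for the pattern anywhere in the string, much like an unanchored Perl match.

```python
import re

# The pattern from above, spelled out piece by piece. Python has no /.../ regex
# delimiters, so the slashes need no backslash escapes.
PATTERN = re.compile(
    r"""
    https?          # "http", optionally followed by "s"
    ://             # the literal "://"
    [^/]+           # the host: one or more characters that are not "/"
    /specificpath   # the wanted path, directly after the host
    """,
    re.VERBOSE,
)


def matches_specific_path(url):
    """Return True if the pattern is found anywhere in url."""
    return PATTERN.search(url) is not None


print(matches_specific_path("http://www.example.com/specificpath"))         # True
print(matches_specific_path("https://www.cnn.com/blah/balh/specificpath"))  # False
```

Because [^/]+ cannot cross a "/", the host part has to be followed immediately by /specificpath, which is what rules out the /blah/balh/specificpath case.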



It worked for at least some preliminary testing:

[chris at bull sandbox]$ ./test.pl http://www.example.com/specificpath
Yay!
[chris at bull sandbox]$ ./test.pl https://www.example.com/specificpath
Yay!
[chris at bull sandbox]$ ./test.pl https://192.158.282.12/specificpath
Yay!
[chris at bull sandbox]$ ./test.pl https://192.158.282.12/blah/specificpath
Boo!
[chris at bull sandbox]$ ./test.pl https://192.158.282.12/blah/balh/specificpath
Boo!
[chris at bull sandbox]$ ./test.pl https://www.cnn.com/blah/balh/specificpath
Boo!
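
Since that testing was only preliminary, here is a rough sketch of a few edge cases it did not cover, written in Python as a stand-in for test.pl (whose contents are not shown here). The STRICT pattern, the verdict helper, and the example URLs are all hypothetical additions, not something from the thread. STRICT assumes the path segment must end at a "/" or at the end of the string; a URL like /specificpath?x=1 would need further handling.

```python
import re

# The pattern as posted: unanchored, and "specificpath" may be a mere prefix.
LOOSE = re.compile(r"https?://[^/]+/specificpath")

# Hypothetical stricter variant: anchored at the start of the string, and the
# segment must be followed by "/" or by the end of the string.
STRICT = re.compile(r"^https?://[^/]+/specificpath(?:/|$)")


def verdict(pattern, url):
    """Mimic the Yay!/Boo! output of the test script."""
    return "Yay!" if pattern.search(url) else "Boo!"


EDGE_CASES = [
    # Trailing slash, as in the original request.
    "http://www.example.com/specificpath/",
    # A longer first segment that merely starts with "specificpath".
    "http://www.example.com/specificpathology",
    # A second URL embedded in the query string of the first.
    "http://a.example/redirect?u=http://b.example/specificpath",
]

for url in EDGE_CASES:
    print("loose:", verdict(LOOSE, url), "strict:", verdict(STRICT, url), url)
```

The loose pattern says Yay! to all three, while the stricter one accepts only the first. Whether the last two count as problems depends on what the application feeds the regex.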


Chris

On Fri, Jul 8, 2011 at 3:13 PM, Davide Del Vento <davide.del.vento at gmail.com> wrote:

> I think your requirements are not clear yet.
>
> Let's start with English first, regex later.
>
> You have a string, which is an (already validated) URL.
>
> You want to match ANY site (not a specific one), right?
>
> Then you want a specific path, which you state explicitly. Does
> this path have a fixed or an arbitrary number of slashes?
>
> Then you have "whatever" for the rest of the URL, *including* other
> path, possibly with many other slashes, as well as file names.
>
> Is this correct or not?
>
> Last, but not least, which language do you need this in? Perl, Python,
> Unix grep, GNU grep, you name it: they all have regexps, and they are
> *not* 100% compatible with each other.
>
> Dav
>
> On Fri, Jul 8, 2011 at 15:03, George Sexton <georges at mhsoftware.com> wrote:
> > Not to be stupid or anything, but if I understood regular expressions well
> > enough to use this, I wouldn't have asked for help.
> >
> > I'm using an application that matches regular expressions in URLs.
> >
> > I'd like it to match
> >
> > /somepath/*
> >
> > But not
> >
> > /somethingelse/somepath/*
> >
> > I can write an expression to match /somepath/*. The problem is it's matching
> > the second thing, which I don't want.
> >
> > I don't get to write a lot of code.
> >
> > I don't know what the host name will be. It might be an FQDN, might be an
> > IP address.
> >
> > The input has the full URL syntax:
> >
> > Scheme:hostname/path/
> >
> >
> >
> > George Sexton
> > MH Software, Inc.
> > 303 438-9585
> > www.mhsoftware.com
> >
> >
> >> -----Original Message-----
> >> From: lug-bounces at lug.boulder.co.us [mailto:lug-bounces at lug.boulder.co.us] On Behalf Of Chip Atkinson
> >> Sent: Friday, July 08, 2011 2:44 PM
> >> To: Boulder (Colorado) Linux Users Group -- General Mailing List
> >> Subject: Re: [lug] Regex Help
> >>
> >> How about this:
> >>
> >> http://txt2re.com/
> >>
> >>
> >>
> >> On Fri, 8 Jul 2011, George Sexton wrote:
> >>
> >> > I'm just dying on a regular expression here. I'm always rotten. If someone
> >> > could help me out I would appreciate it.
> >> >
> >> > I'm looking for a regex that will match:
> >> >
> >> > http://some.host/specificpath/
> >> >
> >> > but not
> >> >
> >> > http://some.host/otherjunk/specificpath/
> >> >
> >> > I'd really appreciate any help I can get.
> >> >
> >> > George Sexton
> >> > MH Software, Inc.
> >> > 303 438-9585
> >> > www.mhsoftware.com
> >> >
> >>
> >> _______________________________________________
> >> Web Page:  http://lug.boulder.co.us
> >> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> >> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
> >