Regular Expressions in Ruby

  • published on January 17th, 2008

    PHP implements both POSIX and Perl-compatible regular expressions. The Perl-compatible regexp functions (which include all the preg_* functions) are the preferred library for most developers, since they offer many features not available in POSIX and are binary safe.

    Ruby uses Perl-compatible regular expressions, so if you’re familiar with the preg_* functions in PHP, you’re already well on your way to learning regular expressions in Ruby. Regular expressions are a complex topic, so we won’t be covering regular expression basics, but will instead focus on translating existing knowledge of Perl-compatible PHP functions to Ruby.

    Regular Expressions in Ruby

    We use regular expression patterns in PHP by passing a string argument to various functions. Ruby treats regular expressions differently. Instead of being specified within a string, patterns are objects, just like everything else in Ruby.

    PHP

    $myRegexp = '/[a-z0-9]+\s/mi';
    print gettype($myRegexp); 
    // => string

    Ruby

    my_regexp = /[a-z0-9]+\s/mi
    p my_regexp.class
    # => Regexp

    We can create regular expressions in Ruby using two different literal syntaxes. The most common is to enclose the pattern in forward slashes, but we can also use the alternate %r{} syntax. We usually use %r{} when the pattern contains a lot of forward slashes (such as a file path). Regular expressions can also be explicitly instantiated using the Regexp class. When doing so, pass the pattern as a single-quoted string, because "\s" in a double-quoted Ruby string is a space character rather than the \s escape.

    Ruby

    /[a-z0-9]+\s/mi
    %r{/path/to/gif\.gif}mi
    Regexp.new('[a-z0-9]+\s', Regexp::IGNORECASE | Regexp::MULTILINE)
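
    The three forms can be compared directly. The following is a minimal sketch, assuming Ruby 2.4 or later (for match?); the sample input "abc\t" is made up for illustration. It also shows what happens when the pattern for Regexp.new is written in double quotes.

```ruby
# Three ways to build the same pattern.
literal     = /[a-z0-9]+\s/mi
percent_r   = %r{[a-z0-9]+\s}mi
constructed = Regexp.new('[a-z0-9]+\s', Regexp::IGNORECASE | Regexp::MULTILINE)

# Two literals with the same source and options compare equal.
p literal == percent_r                     # => true

# The pattern text and the option bits can be read back from the object.
p literal.source                           # => "[a-z0-9]+\\s"
p constructed.source == literal.source     # => true
p constructed.options == literal.options   # => true

# In a double-quoted string "\s" is a space character, not the \s escape,
# so this pattern requires a literal space after the word.
double_quoted = Regexp.new("[a-z0-9]+\s", Regexp::IGNORECASE)
p double_quoted.source                     # => "[a-z0-9]+ "
p double_quoted.match?("abc\t")            # => false
p literal.match?("abc\t")                  # => true
```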

    Comparing Functions/Methods

    Now that we’ve seen some basic syntax for regular expression objects in Ruby, let’s take a look at PHP’s PCRE functions, and their closest equivalents in Ruby.

    preg_match vs. String#match

    We match a pattern against a Ruby string using the match method. Ruby’s match method differs from preg_match in how it returns matches. We usually want to know two things when we match data: whether the pattern matched, and which specific sections of the string were matched.

    PHP returns an integer to tell us if the data matched (either 0 or 1) and populates a matches array by reference. Ruby returns a MatchData object when the pattern matches, and nil when it doesn’t. We can inspect the MatchData object to find the actual matched strings.

    In this example, we try to match the different components of a list of email addresses. Both preg_match and String#match only match the first occurrence of the pattern.

    PHP

    $string = 'joe@example.com; walter@example.org';
    $result = preg_match('/([a-z0-9_.-]+)@([a-z0-9-]+)\.([a-z.]+)/i', 
              $string, $matches);
    var_export($result); 
    // => 1
     
    var_export($matches); 
    // => array('joe@example.com', 'joe', 'example', 'com')

    Ruby

    string = 'joe@example.com; walter@example.org'
    matches = string.match(/([a-z0-9_.-]+)@([a-z0-9-]+)\.([a-z.]+)/i)
    p !matches.nil?
    # => true
     
    p matches
    # => #<MatchData:0x1ed138>
     
    p matches[1]
    # => "joe"
     
    p matches.to_a
    # => ["joe@example.com", "joe", "example", "com"]
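
    Because match returns nil on failure, code that reads the groups usually checks for nil first. Here is a small sketch of that pattern using named groups; EMAIL_PATTERN and parse_email are illustrative names chosen for this example, not part of Ruby.

```ruby
# Named groups make the MatchData easier to read than numeric indexes.
# EMAIL_PATTERN and parse_email are hypothetical names for this sketch.
EMAIL_PATTERN = /(?<user>[a-z0-9_.-]+)@(?<domain>[a-z0-9-]+)\.(?<tld>[a-z.]+)/i

# Returns a hash of the named parts of the first address found,
# or nil when the string contains no match.
def parse_email(string)
  matches = string.match(EMAIL_PATTERN)
  return nil if matches.nil?

  { user: matches[:user], domain: matches[:domain], tld: matches[:tld] }
end

parts = parse_email('joe@example.com; walter@example.org')
p parts[:user]    # => "joe"
p parts[:domain]  # => "example"
p parts[:tld]     # => "com"

p parse_email('no address here')  # => nil
```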

    preg_match_all vs. String#scan

    PHP returns an integer with the number of matches for preg_match_all and populates a matches array by reference. Ruby performs multiple matches against a string using the scan method. When the pattern contains groups, this method returns a nested array with one inner array of captured groups per match (the full match is not included), or an empty array when no matches are found. Be aware that the nesting of values in this array differs from how preg_match_all orders matches by default.

    In this example, we match components of the email address, and both preg_match_all and String#scan give us an array of matches that are found.

    PHP

    $string = 'joe@example.com; walter@example.org';
    $result = preg_match_all('/([a-z0-9_.-]+)@([a-z0-9-]+)\.[a-z.]+/i', 
              $string, $matches);
    var_export($result);
    // => 2
     
    var_export($matches);
    // => array(array('joe@example.com', 'walter@example.org'),
    //          array('joe',             'walter'),
    //          array('example',         'example'))

    Ruby

    string = 'joe@example.com; walter@example.org'
    result = string.scan(/([a-z0-9_.-]+)@([a-z0-9-]+)\.[a-z.]+/i)
    p result.size
    # => 2
     
    p result
    # => [["joe", "example"], ["walter", "example"]]
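
    If we need the full match as well, or the same ordering that preg_match_all produces, one possible approach is to call scan with a block and read Regexp.last_match on each iteration. This is only a sketch; match_all_set_order is a hypothetical helper name.

```ruby
# Collects the full match plus its groups for every occurrence,
# similar to preg_match_all with PREG_SET_ORDER.
# match_all_set_order is a hypothetical helper, not a built-in method.
def match_all_set_order(string, pattern)
  rows = []
  # Inside the block, Regexp.last_match is the MatchData of the current match.
  string.scan(pattern) { rows << Regexp.last_match.to_a }
  rows
end

string  = 'joe@example.com; walter@example.org'
pattern = /([a-z0-9_.-]+)@([a-z0-9-]+)\.[a-z.]+/i

set_order = match_all_set_order(string, pattern)
p set_order
# => [["joe@example.com", "joe", "example"], ["walter@example.org", "walter", "example"]]

# transpose regroups the rows by capture group, which is the default
# ordering of preg_match_all (PREG_PATTERN_ORDER).
p set_order.transpose
# => [["joe@example.com", "walter@example.org"], ["joe", "walter"], ["example", "example"]]
```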

    preg_replace vs. String#gsub

    We perform pattern-based substitution in Ruby using gsub, which is equivalent to PHP’s preg_replace function. A notable difference is that gsub is also used for plain string substitution in Ruby: we simply provide a string instead of a regular expression pattern. This is like using the str_replace function in PHP.

    In this example, we want to replace the domain in all emails with foo.

    PHP

    $string = 'joe@example.com; walter@example.org';
    $result = preg_replace('/@([a-z0-9-]+)/', '@foo', $string);
     
    var_export($result);
    // => 'joe@foo.com; walter@foo.org'

    Ruby

    string = 'joe@example.com; walter@example.org'
    result = string.gsub(/@([a-z0-9-]+)/, '@foo')
     
    p result
    # => "joe@foo.com; walter@foo.org"

    We can use backreferences in our gsub replacements just as we would with preg_replace by using \1, \2, etc in our replacement string.

    In this example, we prefix the existing domain with mail. (including the trailing dot). Remember to escape the backslash used for the backreference.

    PHP

    // Replace domain with mail.domain
    $string = 'joe@example.com; walter@example.org';
    $result = preg_replace('/@([a-z0-9-]+)/', '@mail.\\1', $string);
     
    var_export($result);
    // => 'joe@mail.example.com; walter@mail.example.org'

    Ruby

    string = 'joe@example.com; walter@example.org'
    result = string.gsub(/@([a-z0-9-]+)/, '@mail.\\1')
     
    p result
    # => "joe@mail.example.com; walter@mail.example.org"
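
    gsub also accepts a block instead of a replacement string, which is roughly what preg_replace_callback does in PHP, and sub replaces only the first occurrence. The sketch below illustrates both; upcasing the domain is just an arbitrary transformation chosen for the example.

```ruby
string = 'joe@example.com; walter@example.org'

# Block form: the block is called once per match and its return value
# becomes the replacement. Groups are read through Regexp.last_match.
upcased = string.gsub(/@([a-z0-9-]+)/) { "@#{Regexp.last_match(1).upcase}" }
p upcased
# => "joe@EXAMPLE.com; walter@EXAMPLE.org"

# sub replaces only the first occurrence, like preg_replace with a limit of 1.
first_only = string.sub(/@([a-z0-9-]+)/, '@mail.\1')
p first_only
# => "joe@mail.example.com; walter@example.org"
```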

    preg_split vs. String#split

    We split strings by a pattern in Ruby using the split method. This is pretty much the same as the preg_split function in PHP. As with gsub, we can also use this same method to split using a string instead of a regular expression. This means that split also performs the equivalent of the explode function in PHP.

    In this example, we create an array from the list of emails by splitting the string on a semicolon followed by an optional space.

    PHP

    $string = 'joe@example.com; walter@example.org';
    $result = preg_split('/;\s?/', $string);
     
    var_export($result);
    // => array('joe@example.com', 'walter@example.org')

    Ruby

    string = 'joe@example.com; walter@example.org'
    result = string.split(/;\s?/)
     
    p result
    # => ["joe@example.com", "walter@example.org"]
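
    A few variations on split are sketched here: a plain string delimiter, a limit, and a delimiter pattern containing a group. The third address (amy@example.net) is made up for the example.

```ruby
string = 'joe@example.com; walter@example.org; amy@example.net'

# A plain string delimiter behaves like PHP's explode.
p string.split('; ')
# => ["joe@example.com", "walter@example.org", "amy@example.net"]

# The optional second argument caps the number of pieces,
# much like the limit argument of preg_split.
p string.split(/;\s?/, 2)
# => ["joe@example.com", "walter@example.org; amy@example.net"]

# When the pattern contains a group, the captured delimiters are kept in
# the result (PHP needs the PREG_SPLIT_DELIM_CAPTURE flag for this).
p 'joe@example.com; walter@example.org'.split(/(;\s?)/)
# => ["joe@example.com", "; ", "walter@example.org"]
```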

    preg_grep vs. Array#grep

    The preg_grep function in PHP is a useful function to find entries in an array that match a given pattern. Ruby does this same operation with the grep method.

    In this example, we’ll build a new array that only consists of email addresses that end in .com.

    PHP

    $myArray = array('joe@example.com', 'walter@example.org');
    $result = preg_grep('/\.com$/', $myArray);
     
    var_export($result);
    // => array('joe@example.com')

    Ruby

    my_array = ['joe@example.com', 'walter@example.org']
    result = my_array.grep(/\.com$/)
     
    p result
    # => ["joe@example.com"]
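
    grep can also take a block, and grep_v (Ruby 2.3 or later) returns the entries that do not match, similar to preg_grep with the PREG_GREP_INVERT flag. A short sketch, with a third made-up address added to the list:

```ruby
my_array = ['joe@example.com', 'walter@example.org', 'amy@example.com']

# With a block, each matching element is passed through the block
# and the block's results are collected.
p my_array.grep(/\.com$/) { |email| email.upcase }
# => ["JOE@EXAMPLE.COM", "AMY@EXAMPLE.COM"]

# grep_v returns the elements that do NOT match the pattern.
p my_array.grep_v(/\.com$/)
# => ["walter@example.org"]
```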

    preg_quote vs. Regexp.escape

    When we use a string as a regular expression, we want to escape the characters that could be interpreted as regexp special characters. PHP does this using preg_quote, and Ruby has an equivalent Regexp.escape method.

    In this example, we’ll escape any regular expression special character in the given string.

    PHP

    $string = '[my_file.gif]';
    $result = preg_quote($string);
     
    var_export($result);
    // => '\\[my_file\\.gif\\]'

    Ruby

    string = '[my_file.gif]'
    result = Regexp.escape(string)
     
    p result
    # => "\\[my_file\\.gif\\]"
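
    Escaping matters most when a string is interpolated into a larger pattern. The sketch below builds a pattern that matches one exact filename; filename_pattern is a hypothetical helper name, and match? assumes Ruby 2.4 or later.

```ruby
# Builds a pattern that matches exactly the given filename and nothing else.
# filename_pattern is a hypothetical helper for this sketch.
def filename_pattern(filename)
  /\A#{Regexp.escape(filename)}\z/
end

pattern = filename_pattern('[my_file.gif]')
p pattern.source                    # => "\\A\\[my_file\\.gif\\]\\z"
p pattern.match?('[my_file.gif]')   # => true
p pattern.match?('[my_fileXgif]')   # => false

# Without escaping, the brackets form a character class, so the pattern
# matches a single character from that set instead of the filename.
raw = '[my_file.gif]'
unescaped = /\A#{raw}\z/
p unescaped.match?('m')             # => true
p unescaped.match?('[my_file.gif]') # => false
```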

    Regular Expressions in Rails

    Rails uses regular expressions in various places to specify patterns. When we are matching a route in Rails, we can use them to assign a requirement that a route component must match:

    Ruby

    ActionController::Routing::Routes.draw do |map|
      map.connect 'teams/:team_id/players/:action/:id', :team_id => /\d+/
    end
    

    We can also use regular expressions in our models when we validate the format of data. We pass a regexp to the :with option of validates_format_of:

    Ruby

    class Image < ActiveRecord::Base
      validates_format_of :url, :with => /\.(gif|jpg)\z/i,
                          :message => "must be a GIF or JPG" 
    end
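
    A format pattern without an end anchor accepts any value that merely contains the extension somewhere. The plain-Ruby sketch below (no Rails required) compares an unanchored and an anchored pattern against a few made-up values; the names loose and strict are only labels for this example.

```ruby
# Compare an unanchored pattern with one anchored to the end of the string.
loose  = /\.(gif|jpg)/i
strict = /\.(gif|jpg)\z/i

# Made-up sample values for illustration.
urls = ['photo.gif', 'photo.gif.exe', 'PHOTO.JPG']

# The unanchored pattern accepts every value that contains ".gif" or ".jpg".
p urls.grep(loose)
# => ["photo.gif", "photo.gif.exe", "PHOTO.JPG"]

# \z only matches at the very end of the string, so the .exe entry is rejected.
p urls.grep(strict)
# => ["photo.gif", "PHOTO.JPG"]
```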

    When we are testing controller code, the assert_select method will accept a regular expression to match response data according to the given pattern.

    Ruby

    class HomepageControllerTest < ActionController::TestCase
      def test_greeting
        get :index
        assert_select 'div.greeting', /Welcome [a-z0-9_-]+/
      end
    end

8 comments

  • comment by Php Developer 18 Jan 08

    Excellent rails regexp tutorial. You can find more about regexp here

    http://www.regular-expressions.info/

  • comment by Markus 23 Jan 08

    You said “by enclosing the pattern in backslashes”, but I think you meant the forward slash in this case?

  • comment by Derek 23 Jan 08

    Markus: Thanks, and fixed

  • comment by junaid 18 Mar 08

    Nice article. What is the alternative of preg_replace_callback in ruby?

  • comment by junaid 18 Mar 08

    We can achieve preg_replace_callback functionality in ruby this way
    def my_test_method(match)
      return "_pk"
    end
    my_string = 'joe@example@example.com; walter@example.org'
    str = my_string.gsub(/@([a-z0-9-]+)/) { |match| my_test_method(match) }
    puts str

    Thanks for nice article. Its really very helpful to me.
    Regards
    Junaid malik.

  • comment by Ryan 31 Mar 08

    It is worth adding that you will need to escape a different set of characters on ruby than in php. For instance the curly braces ‘{’, ‘}’, and ‘#’ need to be escaped in ruby because they have special meaning in strings.

  • comment by Ramzi Ferchichi 11 Jun 08

    On the subject of “preg_match_all vs. String#scan”, to achieve the same nesting of values in the $matches array of preg_match_all, add the PREG_SET_ORDER flag.

    preg_match_all('/([a-z0-9_.-]+)@([a-z0-9-]+)\.[a-z.]+/i', $string, $matches, PREG_SET_ORDER);

    var_export($matches);
    // => array(
    //   array('joe@example.com', 'joe', 'example'),
    //   array('walter@example.org', 'walter', 'example')
    // )

  • comment by adheflygale 2 Sep 08

    hey :)
    its very unconventional point of view.
    Nice post.
    realy good post

    thx :-)
