Reg Expr in Java



ZthDimension
26-10-2009, 02:13 PM
Basically I'm not particularly sure what I'm doing in Reg Expr (so please excuse the mess, I'm sure there is a much more efficient way of writing them). Just a quick bit of code as an example:



private String NameCountryData;
private String[] title;
private String[] surname;
private String[] name;

NameCountryData = "England (2006)/Blogg, Fred/Davey, Jen/Doe, Jane";

// Split the data in half at the year, removing "(2006)", so there are two
// array positions: one with "England" in and the other containing the rest.
title = NameCountryData.split("\\(\\d\\d\\d\\d\\)/");

// These are the two lines I'm struggling with. I'm not sure what Reg Expr I
// need to split this on the ", " after Blogg, and for the reg expr to match
// ", " only ONCE so I end up with only two array positions, similar to above.
surname = title[1].split("(, )+");
name = surname[1].split(); // Same as above: not sure what Reg Expr goes here



I hope I've explained what I'm trying to do reasonably clearly. The end result I'm trying to get looks like this:

title[0] = England
title[1] = Blogg, Fred/Davey, Jen/Doe, Jane
surname[0] = Blogg
surname[1] = Fred/Davey, Jen/Doe, Jane
name[0] = Fred
name[1] = Davey, Jen/Doe, Jane

I assume it's possible, but I've spent the past two hours banging my head over this. I know it's quite simple but I can't work it out!

Cheers if you can help

Marc

PS This is sample code, so it most probably won't/shouldn't compile
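
One way to get the "match only ONCE" behaviour asked about above is the two-argument form of String.split, where a limit of 2 stops splitting after the first match. The sketch below is only an illustration under that assumption: the class name SplitOnceSketch and the helpers splitTitle and splitOnce are made up for this example, and the title regex also swallows the space before the year so that title[0] comes out as "England" without a trailing space.

```java
class SplitOnceSketch {
    // Splits "Title (YYYY)/rest" into {title, rest}. The year, and any
    // whitespace in front of it, is discarded. Limit 2 keeps "rest" whole.
    static String[] splitTitle(String line) {
        return line.split("\\s*\\(\\d{4}\\)/", 2);
    }

    // Splits at the first match of the regex only: with a limit of 2 the
    // result has at most two elements, and the second holds everything else.
    static String[] splitOnce(String text, String regex) {
        return text.split(regex, 2);
    }

    public static void main(String[] args) {
        String nameCountryData = "England (2006)/Blogg, Fred/Davey, Jen/Doe, Jane";

        String[] title = splitTitle(nameCountryData);
        String[] surname = splitOnce(title[1], ", ");
        String[] name = splitOnce(surname[1], "/");

        System.out.println("title[0] = " + title[0]);     // England
        System.out.println("title[1] = " + title[1]);     // Blogg, Fred/Davey, Jen/Doe, Jane
        System.out.println("surname[0] = " + surname[0]); // Blogg
        System.out.println("surname[1] = " + surname[1]); // Fred/Davey, Jen/Doe, Jane
        System.out.println("name[0] = " + name[0]);       // Fred
        System.out.println("name[1] = " + name[1]);       // Davey, Jen/Doe, Jane
    }
}
```

Note that the pattern "(, )+" from the sample code matches every ", " in the string; it is the limit argument, not a quantifier, that restricts split to the first occurrence.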

codemonkey
26-10-2009, 03:01 PM
Sorry, I'm not entirely sure what you are trying to achieve, but maybe a StringTokenizer would help.

Something like this...



import java.util.StringTokenizer;

String nameCountryData = "England (2006)/Blogg, Fred/Davey, Jen/Doe, Jane";
StringTokenizer st = new StringTokenizer(nameCountryData, ",");
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}


That will break your main string down by commas, allowing you to strip out the unwanted characters.

codemonkey
26-10-2009, 03:05 PM
This link is worth a read too

http://java.sun.com/docs/books/tutorial/essential/regex/

ZthDimension
26-10-2009, 04:08 PM
Hi Cheers,

Basically I'm doing my uni assignment, which is based on "6 Degrees of Kevin Bacon" (effectively just rewriting it). The data is coming from a huge IMDB txt file which contains data in the form Film Title (Year of Film)/Last Name of Actor, First Name of Actor. So an example of the data would look like this:


'Tis Autumn: The Search for Jackie Paris (2006)/Paris, Jackie/Moody, James (IV)/Bogdanovich, Peter/Vera, Billy/Ellison, Harlan/Newman, Barry


I'm reading this in via a BufferedReader, line by line, so my code looks something along the lines of this (at the moment):


String FileName = "SmallFile.txt";
BufferedReader in = new BufferedReader(new FileReader(FileName));

String[] title = in.readLine().split("\\(\\d\\d\\d\\d\\)/");

String[] Surname = title[1].split("(, )+");
String[] FirstName = Surname[1].split("/");


So the general point of the code is that it reads the line in, then splits it at the year (I don't need this information), so title[0] would contain 'Tis Autumn: The Search for Jackie Paris whilst title[1] would contain the rest of the line I read in. I then split title[1] so Surname[0] = Paris and the rest of the data goes in Surname[1], and then split Surname[1] at Jackie so FirstName[0] = Jackie and FirstName[1] = the rest of the data.

From here I intend to loop it back round so title[1] = FirstName[1] and then feed Surname[1] again etc. So effectively I'm picking out one word at a time from the data until I run out of data on that line.

Thus the end result would be something along the lines of this:

title[0] = 'Tis Autumn: The Search for Jackie Paris
Surname[0] = Paris
FirstName[0] = Jackie
Surname[1] = Moody
FirstName[1] = James
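
The "peel one name off, then loop on the remainder" idea described above can be sketched with split and a limit of 2. This is only a rough illustration: the class PeelNamesSketch and the method peelNames are hypothetical names, and it assumes every entry has the form "Surname, FirstName" with entries separated by "/". Suffixes such as "(IV)" are left attached to the first name, so the second actor comes out as "James (IV)" rather than "James".

```java
import java.util.ArrayList;
import java.util.List;

class PeelNamesSketch {
    // Returns {surname, firstName} pairs in order of appearance.
    // Assumes each entry looks like "Surname, FirstName" and that
    // entries are separated by "/".
    static List<String[]> peelNames(String cast) {
        List<String[]> names = new ArrayList<>();
        String rest = cast;
        while (!rest.isEmpty()) {
            // Surname is everything before the first ", ".
            String[] surnameAndRest = rest.split(", ", 2);
            if (surnameAndRest.length < 2) {
                break; // no ", " left, so the remainder is not a full entry
            }
            // First name is everything before the next "/".
            String[] firstAndRest = surnameAndRest[1].split("/", 2);
            names.add(new String[] { surnameAndRest[0], firstAndRest[0] });
            // Loop back round on whatever follows the "/", if anything.
            rest = firstAndRest.length == 2 ? firstAndRest[1] : "";
        }
        return names;
    }

    public static void main(String[] args) {
        String cast = "Paris, Jackie/Moody, James (IV)/Bogdanovich, Peter";
        for (String[] n : peelNames(cast)) {
            System.out.println("Surname = " + n[0] + ", FirstName = " + n[1]);
        }
    }
}
```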

However the issue I'm having is that split() takes a reg exp. I'm perfectly fine with splitting at the title, but when I then pass it to Surname I don't know how to get my reg exp to go through my data, split it at Paris just before the ", " and then stop (i.e. only match the pattern once, the first time it comes across it).

Ta for the link to http://java.sun.com/docs/books/tutorial/essential/regex/ I'd been using it and thought I had found the answer in regards to Quantifiers but I can't seem to get it to work!

Sorry if I appear condescending, I don't mean to be; I sort of babble as I type, and as I'm writing this I begin to understand what I'm writing better! As a result I suspect that the approach I'm trying to take won't really work, as I'm going to struggle to control the looping construct. Either way, this reg exp is starting to bug me!!

Cheers

Marc

Ian.H
26-10-2009, 04:40 PM
If the array you've posted is the end result (as in, data required), the following regex appears to work:


^(.+) \([0-9]{4}\)\/(.+?),\s*(.+?)\/(.+?),\s*(.+?)\/

This will give you:


$1 == 'Tis Autumn: The Search for Jackie Paris
$2 == Paris
$3 == Jackie
$4 == Moody
$5 == James (IV)
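
For anyone wanting to try that regex from Java, a minimal sketch using java.util.regex might look like the following. The class and method names (FirstTwoActorsSketch, firstTwoActors) are made up for illustration; the pattern is the one above with backslashes doubled for a Java string literal, and the "\/" escapes dropped since "/" needs no escaping in Java regexes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstTwoActorsSketch {
    // Same pattern as above, written as a Java string literal.
    private static final Pattern LINE = Pattern.compile(
            "^(.+) \\([0-9]{4}\\)/(.+?),\\s*(.+?)/(.+?),\\s*(.+?)/");

    // Returns capture groups $1..$5 as an array (index 0 holds $1),
    // or null when the line does not match.
    static String[] firstTwoActors(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) { // find(), because the pattern does not cover the whole line
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "'Tis Autumn: The Search for Jackie Paris (2006)/Paris, Jackie"
                + "/Moody, James (IV)/Bogdanovich, Peter/Vera, Billy/Ellison, Harlan/Newman, Barry";
        String[] g = firstTwoActors(line);
        for (int i = 0; i < g.length; i++) {
            System.out.println("$" + (i + 1) + " == " + g[i]);
        }
    }
}
```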

A great, free app for working with RegEx is The Regex Coach (http://weitz.de/regex-coach/) :)

I know nothing of Java itself, so you'll have to implement the RegEx code into your Java code, but do enjoy playing with RegEx myself [insert geeky smiley here] :D


Regards,

Ian


EDIT: Just realised array code you gave was literally an example.. and you'd want the rest of the actors names too I'd guess. The above code would only grab the first 2 (as per your example).. doh! I'll come up with a fix.


EDIT (fix):

^(.+) \([0-9]{4}\)\/(.+)

$1 == title
$2 == list of actors

Pseudo code:

$actorNames = array();
$actorList = $2;

/* RegEx split at '/' char to get array of actor names */
$actors = $actorList.split('\/');

loop through ($actors as $actor) {
    /* Split name at comma to get first / last name */
    $actor.regex('/^(.+?),\s*(.+)/');
    $firstName = $2;
    $lastName = $1;

    /* Add results to array to do with as you please later in the code */
    addToArray($actorNames, array('firstName' => $firstName, 'lastName' => $lastName));
}

Result:
$actorNames[0][firstName] == Jackie
$actorNames[0][lastName] == Paris
$actorNames[1][firstName] == James (IV)
$actorNames[1][lastName] == Moody

etc etc
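
A rough Java translation of the pseudo code above, for reference. It is a sketch only: the class ActorParserSketch, the method parseActors and the {firstName, lastName} array layout are assumptions made for this example, and an entry with no comma is kept as a last name with an empty first name, which is a choice the pseudo code does not cover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ActorParserSketch {
    // $1 == title, $2 == list of actors
    private static final Pattern LINE = Pattern.compile("^(.+) \\([0-9]{4}\\)/(.+)$");
    // $1 == last name, $2 == first name
    private static final Pattern ACTOR = Pattern.compile("^(.+?),\\s*(.+)$");

    // Returns one {firstName, lastName} pair per actor on the line,
    // or an empty list when the line has no "Title (YYYY)/" prefix.
    static List<String[]> parseActors(String line) {
        List<String[]> actorNames = new ArrayList<>();
        Matcher lineMatcher = LINE.matcher(line);
        if (!lineMatcher.matches()) {
            return actorNames;
        }
        // Split the actor list at '/' to get one entry per actor.
        for (String actor : lineMatcher.group(2).split("/")) {
            Matcher m = ACTOR.matcher(actor);
            if (m.matches()) {
                actorNames.add(new String[] { m.group(2), m.group(1) });
            } else {
                // Assumption: no comma means a single-word name.
                actorNames.add(new String[] { "", actor });
            }
        }
        return actorNames;
    }

    public static void main(String[] args) {
        String line = "'Tis Autumn: The Search for Jackie Paris (2006)/Paris, Jackie"
                + "/Moody, James (IV)/Bogdanovich, Peter/Vera, Billy/Ellison, Harlan/Newman, Barry";
        for (String[] a : parseActors(line)) {
            System.out.println("firstName == " + a[0] + ", lastName == " + a[1]);
        }
    }
}
```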