Archive for category Java

Blackberry Networking Helper Class

I came across this “life-saver” helper class. It helps auto-detect the appropriate connection type to use when making connections with the BlackBerry networking classes. I’ve spent a ton of time debugging connection issues in applications, and this class has really helped. I didn’t write it, but I want to point other BB developers towards it.

http://www.versatilemonkey.com/blog/index.php/2009/06/24/networking-helper-class/

Using URLRewrite to rename a website

In the process of renaming our website from MyFriendSuggests.com to theSUGGESTR.com, I found that most articles on this refer to using mod_rewrite to accomplish the redirect.  However, for most of us Java guys this may not be the way to go.  I used URLRewrite to accomplish this name change and set up 301 (permanent) redirects from my old URL to my new one. 301s work best with search engines and will help preserve your page rank.

Here is the URL Rewrite configuration I used:

<rule>
  <name>Domain Change</name>
  <condition name="host" operator="notequal">www.newdomain.com</condition>
  <condition name="host" operator="notequal">localhost</condition>
  <from>^/(.*)</from>
  <to type="permanent-redirect">http://www.newdomain.com/$1</to>
</rule>
<rule>
  <name>Domain Change</name>
  <condition name="host" operator="notequal">www.newdomain.com</condition>
  <condition name="host" operator="notequal">localhost</condition>
  <from>^/$</from>
  <to type="permanent-redirect">http://www.newdomain.com/</to>
</rule>

Note: for this to work, your old domain must still point at your server (via its DNS entry).
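
To make the decision those rules encode a bit more explicit, here is a rough plain-Java sketch of the same logic. The class and method names are made up for illustration, and this is not how URLRewrite works internally; it only mirrors the two host conditions and the `$1` substitution from the configuration above.

```java
import java.util.Optional;

final class DomainChangeSketch {
    // Stands in for www.newdomain.com from the configuration above.
    static final String NEW_HOST = "www.newdomain.com";

    /**
     * Returns the URL to send a 301 to, or empty when the request
     * should be served normally (already on the new host, or localhost).
     */
    static Optional<String> redirectTarget(String host, String path) {
        if (host == null) {
            return Optional.empty();
        }
        // Host names are case-insensitive, so compare without regard to case.
        if (host.equalsIgnoreCase(NEW_HOST) || host.equalsIgnoreCase("localhost")) {
            return Optional.empty();
        }
        // Keep the original path so deep links land on the same page.
        String p = (path == null || path.isEmpty()) ? "/" : path;
        return Optional.of("http://" + NEW_HOST + p);
    }

    public static void main(String[] args) {
        System.out.println(redirectTarget("myfriendsuggests.com", "/about"));
        System.out.println(redirectTarget("localhost", "/about"));
    }
}
```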

301 Redirection Rename Website URLRewrite


Introducing APTags a Java API for Tagging

Tagging pages or items is a big part of the Web 2.0 movement.  After some light searching I didn’t find a prebuilt API for working with tags in Java (I found some PHP stuff).  So I created one to be used in the next release of our site (www.myfriendsuggests.com).  As we’ve done in the past, we decided to make the API available for anyone who wants to use it.  We don’t have a ton of working examples or documentation, but the API handles all the DB communication and allows for:

  • Adding tags to items
  • Finding items with a tag (or tags)
  • Finding items tagged by a certain user
  • Generating a tag cloud for a user
  • Generating a tag cloud for an item
  • Other tag queries.

So far this has only been tested with MySQL but if there is interest out there please leave us a comment and we’ll work to verify it for other databases.

 To learn more and download the APTags API click here.

api java tagging tags Web 2.0

Creating a custom recommender using taste

Taste is a great framework for collaborative filtering.  We are going to be launching a new recommendation algorithm on our site (MyFriendSuggests.com) in the coming weeks (Stay Tuned!) based on the Taste framework.  Taste provides a User-based and an Item-based recommender.  User-based recommenders find users that have similar tastes to you and then use their ratings to predict how you might rate a given item.  Item-based recommenders find items that are similar to each other and use those similar items to predict how you might rate a given item.  In our testing we found that a recommender that combines both types would be most effective.  Basically we use the following formula to predict user u’s rating of object x.

P(u,x) = alpha*uRec(u,x) + (1-alpha) * iRec(u,x)

Where alpha is a constant between 0 and 1 (basically weighting the two recommenders) and uRec and iRec are the Taste User and Item based recommenders.
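
As a rough illustration of that formula, here is a minimal Java sketch. It deliberately does not use the Taste classes: the two recommenders are passed in as plain functions, and the class name and the NaN fallback are my own assumptions rather than anything Taste prescribes.

```java
import java.util.function.ToDoubleBiFunction;

final class BlendedRecommenderSketch {
    private final ToDoubleBiFunction<Long, Long> userBased; // plays the role of uRec(u, x)
    private final ToDoubleBiFunction<Long, Long> itemBased; // plays the role of iRec(u, x)
    private final double alpha;

    BlendedRecommenderSketch(ToDoubleBiFunction<Long, Long> userBased,
                             ToDoubleBiFunction<Long, Long> itemBased,
                             double alpha) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be between 0 and 1");
        }
        this.userBased = userBased;
        this.itemBased = itemBased;
        this.alpha = alpha;
    }

    /** P(u,x) = alpha * uRec(u,x) + (1 - alpha) * iRec(u,x) */
    double predict(long userId, long itemId) {
        double u = userBased.applyAsDouble(userId, itemId);
        double i = itemBased.applyAsDouble(userId, itemId);
        // Assumption: if one side cannot produce an estimate (NaN), use the other alone.
        if (Double.isNaN(u)) return i;
        if (Double.isNaN(i)) return u;
        return alpha * u + (1.0 - alpha) * i;
    }

    public static void main(String[] args) {
        // Stub recommenders that always answer 4.0 and 2.0.
        BlendedRecommenderSketch rec =
            new BlendedRecommenderSketch((u, x) -> 4.0, (u, x) -> 2.0, 0.5);
        System.out.println(rec.predict(1L, 10L));
    }
}
```

In a real setup the two lambdas would delegate to whatever the User-based and Item-based recommenders expose for estimating a single rating.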

Using the Taste evaluators you can build a simple program to find the best value of alpha for your application.  Since we still have very sparse data, we are leaving the value at 0.50 until we have more data to work with.  In the next few days I’ll be posting some more on how I used Taste to build our recommender.
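
The search for alpha can be as simple as trying a handful of values and keeping the one with the lowest error on held-out ratings. The sketch below is a stand-in for that idea and does not use the Taste evaluators themselves; it assumes you already have, for each held-out rating, the User-based estimate, the Item-based estimate, and the actual rating, and it uses mean absolute error as the score (my choice, not something the formula requires).

```java
final class AlphaSearchSketch {

    /** Mean absolute error of the blended prediction against the actual ratings. */
    static double meanAbsoluteError(double alpha, double[] uRec, double[] iRec, double[] actual) {
        if (actual.length == 0 || uRec.length != actual.length || iRec.length != actual.length) {
            throw new IllegalArgumentException("arrays must be non-empty and the same length");
        }
        double sum = 0.0;
        for (int k = 0; k < actual.length; k++) {
            double predicted = alpha * uRec[k] + (1.0 - alpha) * iRec[k];
            sum += Math.abs(predicted - actual[k]);
        }
        return sum / actual.length;
    }

    /** Tries alpha = 0, 1/steps, 2/steps, ..., 1 and returns the value with the lowest error. */
    static double bestAlpha(double[] uRec, double[] iRec, double[] actual, int steps) {
        double best = 0.0;
        double bestError = Double.POSITIVE_INFINITY;
        for (int s = 0; s <= steps; s++) {
            double alpha = (double) s / steps;
            double error = meanAbsoluteError(alpha, uRec, iRec, actual);
            if (error < bestError) {
                bestError = error;
                best = alpha;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data in which the User-based estimates happen to match the actual ratings.
        double[] uRec = {4.0, 2.0};
        double[] iRec = {2.0, 4.0};
        double[] actual = {4.0, 2.0};
        System.out.println(bestAlpha(uRec, iRec, actual, 10));
    }
}
```

With sparse data the winning alpha can swing a lot between runs, which is one reason to sit at 0.50 until there is more to evaluate against.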

collaborative filtering java Programming recommender taste Web 2.0

Scraping Hotmail for Contacts using JScrape

As we’ve seen in my posts for scraping AOL, GMail and Yahoo, each site has its own “tricks” that make it challenging to scrape contact information from.  The final site in this series of posts is for Hotmail.  Hotmail is one of the trickier ones.  As I did with the previous posts I’m going to outline some of the trickier parts of scraping the site.

After posting to Hotmail.com you need to parse all the hidden parameters on the form and repost them along with the login and passwd for the user.  You also need to pass a parameter PwdPad, which is generated by removing X characters from the end of the string “IfYouAreReadingThisYouHaveTooMuchFreeTime”, where X is the length of the user’s password.  To determine the URL to post to, you need to parse the value of the JS variable g_DO["hotmail.com"] out of the JavaScript.
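
Here is a small sketch of those two pieces: computing PwdPad and collecting the hidden form parameters. The class and method names are made up, the regex-based parsing is only an approximation that handles quoted attributes, and what to do when the password is longer than the pad string is my assumption (the pad simply becomes empty).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HotmailFormSketch {
    private static final String PAD_SOURCE = "IfYouAreReadingThisYouHaveTooMuchFreeTime";
    private static final Pattern INPUT_TAG =
        Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Drops as many characters from the end of the pad string as the password is long. */
    static String pwdPad(String password) {
        int keep = Math.max(0, PAD_SOURCE.length() - password.length());
        return PAD_SOURCE.substring(0, keep);
    }

    /** Collects name/value pairs of every hidden input tag found in the page. */
    static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher tag = INPUT_TAG.matcher(html);
        while (tag.find()) {
            String t = tag.group();
            if (!"hidden".equalsIgnoreCase(attr(t, "type"))) {
                continue;
            }
            String name = attr(t, "name");
            if (name != null) {
                String value = attr(t, "value");
                fields.put(name, value == null ? "" : value);
            }
        }
        return fields;
    }

    // Reads a single- or double-quoted attribute from one tag; returns null if absent.
    private static String attr(String tag, String name) {
        Matcher m = Pattern.compile(
            "\\b" + Pattern.quote(name) + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
            Pattern.CASE_INSENSITIVE).matcher(tag);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        // The field name below is a placeholder, not a real Hotmail parameter.
        String html = "<form><input type=\"hidden\" name=\"sampleToken\" value=\"abc123\">"
                    + "<input type=\"text\" name=\"login\"></form>";
        System.out.println(hiddenFields(html));
        System.out.println(pwdPad("secret"));
    }
}
```

The resulting map can be copied onto the next post together with login, passwd and PwdPad.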

After posting to the URL you will need to parse some more JS: find the window.location.replace call and use the URL passed to it as the target of your next post.  In the response you will find a mailbox ID, which you can get by looking for ‘_UM=’ in the response and parsing out the value.  From there you are home free… simply post to:  "http://"+host+"/cgi-bin/addresses?"+mbox  (you can get the host from the request using the following code:  String host = get.getRequestHeader("Host").getValue(); ).
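
Pulling the URL out of the window.location.replace call is a small regex job. The sketch below shows one way to do it and also builds the final address book URL from the host and mailbox value; the class and method names are hypothetical, and it assumes the URL is passed as a plain quoted string literal.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JsRedirectSketch {
    // Matches window.location.replace("...") or window.location.replace('...').
    private static final Pattern REPLACE_CALL = Pattern.compile(
        "window\\.location\\.replace\\(\\s*([\"'])(.*?)\\1\\s*\\)");

    /** Returns the URL passed to the first window.location.replace call, if there is one. */
    static Optional<String> redirectUrl(String responseBody) {
        Matcher m = REPLACE_CALL.matcher(responseBody);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    /** Builds the address book URL from the host and the mailbox value parsed from the response. */
    static String addressBookUrl(String host, String mbox) {
        return "http://" + host + "/cgi-bin/addresses?" + mbox;
    }

    public static void main(String[] args) {
        // example.com is a placeholder; the real response will carry its own URL.
        String body = "<script>window.location.replace(\"http://example.com/next?a=1\");</script>";
        System.out.println(redirectUrl(body).orElse("no redirect found"));
    }
}
```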

Well, that’s about it.  Hopefully that helps some people out.  If you want to see this in action, sign up for an account at MyFriendSuggests.com and use my version of the contact importer (and while you’re there, try our site out and let us know what you think).

java MyFriendSuggests Scraping Social Marketing Web 2.0

Scraping GMAIL for contacts

In our previous posts we’ve looked at how to scrape both Yahoo! and AOL webmail for a list of contacts given a username and password.  This technique can be critical in growing your user base by allowing your users to invite many friends in one quick and easy step.  Our next site that we supported is GMail. 

However, for GMail we did not use our JScrape API but rather just used the G4J API.  It was extremely easy to use and to incorporate into our framework.  I recommend downloading it and testing it; it should only take a few short lines of code.  Here is what I did:

GMConnector gm = new GMConnector(userID, passwd, 1);
gm.connect();
GMContact[] data = gm.getContact(1, "");

The last site we will cover is Hotmail which was probably the most challenging of the 4 sites. 

Scraping AOL WebMail for contacts

This is the 3rd post in a short series discussing how I built an API to grab contact list information from Yahoo!, AOL, GMail and Hotmail.  In our first post we reviewed the high level approach to scraping sites.  In our second post we went over how to scrape Yahoo! – which is by far the easiest of the 4 sites to scrape.  This post will discuss how to scrape AOL which is much more challenging as it requires some cookie manipulation and some javascript emulation.  The tips below aren’t necessarily the best way to do this but it worked for me.

For working with AOL you need to work with the HttpClient and PostMethod objects, from the Apache Commons HttpClient API, directly.  For all URLs you post to make sure to set User-Agent and set the cookie policy:

post.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
post.setRequestHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; .NET CLR 2.0.50727)");
 

Also, for each post I set the Referer header to the previous URL. After you post to the first URL you’ll need to process all the hidden variables that are returned and add them to the next post.  There was also a cookie that I seemed to need to add manually; to do so I used the following snippet of code:

Cookie[] c = client.getState().getCookies();
String cStr = "";
// Re-send every cookie the client has collected so far
for (int i = 0; i < c.length; i++)
   cStr += c[i].getName() + "=" + c[i].getValue() + "; ";
// Append the cookies that had to be added by hand
cStr += "s_cc=true; s_sq=aolsnssignin%2Caolsvc%3D%2526pid%253Dsso%252520%25253A%252520login%2526pidt%253D1%2526oid%253DSign%252520In%2526oidt%253D3%2526ot%253DSUBMIT%2526oi%253D97";
post.setRequestHeader("Cookie", cStr);

This second post should also contain the user name and password.  This is the first part of the login. In the response you’ll find JavaScript that forwards to a new, request-specific URL, so you need to get it dynamically.  I used the following code:

int onLoad = data.indexOf("<body onLoad");
int http = data.indexOf("http:", onLoad);
// The URL runs up to the next single quote
int endPos = data.indexOf('\'', http);
String newURL = data.substring(http, endPos);

The resulting page ALSO has some JavaScript that you will be required to emulate.  I used the following code to find the new URL:

http = data.indexOf("gInitBasePath ");
int startPos = data.indexOf('\"', http);
endPos = data.indexOf('\"', startPos + 1);
newURL = "http://webmail.aol.com" + data.substring(startPos + 1, endPos);
newURL = newURL.replaceAll(" ", "%20");

You’re almost there!!  In the response to that last request you need to find the uid returned in one of the cookies.  Just grab all the cookies and parse out the “uid:”.   Last but not least, just post to the address book URL (you can find this by using Fiddler) and pass in the uid value for the user attribute.  At that point you can use JScrape to process the resulting page and parse out all the email addresses.

Hopefully these tips help you in creating your own contact importer.

 

java Scraping Social Marketing Web 2.0

Scraping Yahoo! for contacts using JScrape

This post builds on my previous post, in which we discussed how to scrape webmail sites for contacts.  Yahoo! is by far the easiest of the major sites to scrape.  After you’ve sniffed the URLs used for the login you just need to replace the username and password for the login.  Yahoo! currently does not use any JavaScript tricks or special cookies during the login.  Using JScrape as-is should be sufficient.  The one trick to Yahoo! is that it breaks up the address book into separate pages.  In my solution I dynamically grab these URLs using the following snippet of code:

public String[] getURLs()
{
  String q = "declare namespace xhtml=\"http://www.w3.org/1999/xhtml\"; \n" +
    "for $d in //xhtml:ol[@id='abcnav']/xhtml:li/xhtml:a \n" +
    " return <li> { $d/@href/string() } </li> ";

  // pScrape is a com.apsquared.jscrape.PageScraper object that has already logged in to the site.
  List l = pScrape.scrapePageForList("http://address.yahoo.com/yab/us", q);
  if (l == null)
    return null;

  String[] ret = new String[l.size()];
  for (int i = 0; i < l.size(); i++)
  {
    TinyNodeImpl ti = (TinyNodeImpl) l.get(i);
    ret[i] = ti.getStringValue();
  }
  return ret;
}

Note: this may return null if the user account only has a small number of contacts.

For each url returned you need to scrape the page looking for the contacts.  I used the following XQuery for that scrape:

declare namespace xhtml="http://www.w3.org/1999/xhtml";
for $d in //xhtml:td[@class='contactnumbers']/xhtml:span/xhtml:a
return <li> { data($d) } </li>

That’s about it.  As we’ll see in the next few days, this is much simpler than many other sites (GMail, Hotmail, AOL), which require many more tricks to log in.

java Programming Scraping Social Marketing Web 2.0 XQuery Yahoo!

Scraping WebMail sites for contacts using JScrape

Many new websites, especially those that depend on social networks, are now offering ways to import contacts from various WebMail sites.  I’m not going to go into the ethics of asking a user for their user name and password to a webmail site and scraping the site, but I will touch on the technical challenges.  I started by building JScrape, a Java API that makes scraping websites easier.  I then decided to try to scrape contact lists from Yahoo!, GMail, Hotmail and AOL.  I found that each of these sites had its own challenges.  The easiest by far was Yahoo!, so that is what I’ll start with.  I’m not going to provide the exact code but will give you tips that will definitely get you going.

The basic process for all of these sites is:

1) Use a tool (such as Fiddler or Ethereal) to capture the network traffic that occurs when you log in to the site.
2) Each site will use different cookies and JS to make logging in more challenging (this is the hard part). 
3) Use the same session and post to the address book page for that site.
4) Use JScrape to parse out the email addresses that you want.  You may need to page through different pages depending on the number of email addresses (and how the site displays the addresses).
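
Steps 1 to 3 boil down to replaying the captured login posts through one client that keeps its cookies. Here is a minimal sketch of that idea using the JDK’s built-in java.net.http client (Java 11+), which is my substitution for illustration; the posts in this series use Apache Commons HttpClient and JScrape instead. The URL, field names and User-Agent value are placeholders, and nothing is actually sent over the network here.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class SessionSketch {

    /** One client with a cookie store, so later posts share the login session. */
    static HttpClient newSessionClient() {
        return HttpClient.newBuilder()
            .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
    }

    /** Encodes form fields as application/x-www-form-urlencoded. */
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    /** Builds a form post that mimics a browser submitting the login form. */
    static HttpRequest formPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .header("User-Agent", "Mozilla/4.0 (compatible)") // placeholder value
            .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
            .build();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("login", "someone@example.com"); // placeholder field names and values
        fields.put("passwd", "not a real password");
        HttpRequest login = formPost("https://login.example.com/signin", fields);
        System.out.println(login.method() + " " + login.uri());
        System.out.println(formEncode(fields));
        // client.send(login, HttpResponse.BodyHandlers.ofString()) would run the post;
        // the same client is then reused for the address book request in step 3.
    }
}
```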

Sounds simple, eh?  Well, step #2 can be quite challenging and frustrating.  I will add a new blog entry for each of the different sites and how to “log in” to them, so check back soon.

java Scraping Social Marketing Technorati Web 2.0 Yahoo!

Improving performance of Taste using DBCP

For the past few weeks I’ve been playing with Taste, a Java-based framework for collaborative filtering (basically the recommendation feature found on sites like Amazon and Netflix).  Hopefully in the near future this tool will be incorporated into our site, MyFriendSuggests.com, to improve our suggestion algorithms.

What I found was that the initial description of using a MySQL DataSource sounded fine, but due to the heavy database access, performance was bad.  In fact it would stop being able to open new connections, since connections were being grabbed faster than Windows was cleaning up open sockets.  The simple solution was to use Apache DBCP for database connection pooling.  All I needed to do was add commons-dbcp and commons-pool to my classpath and then create a simple function:

public static DataSource getDataSource()
{
  BasicDataSource md = new BasicDataSource();
  md.setDriverClassName("com.mysql.jdbc.Driver");
  md.setUrl("jdbc:mysql://localhost:3306/dbname");
  md.setUsername("user");
  md.setPassword("pass");
  return md;
}

I call this method in the constructor of the MySQLJDBCDataModel class.  After doing that things started performing much better.

java MyFriendSuggests taste Technorati Web 2.0