JScrape - Java Scraping API
JScrape is a simple yet powerful Java API for scraping (also known as screen scraping) data from a web page using XQuery. It makes it easy to pull data from other sources and to keep the extraction logic maintainable. Best of all, it is free; all we ask is that you visit our new social networking site, myfriendsuggests.com, and invite some friends.
Please note that this software is in an alpha stage and has had limited testing and documentation. We hope to provide more examples and better documentation soon. If you have any questions or comments, you can either post them to our blog or send us an email.
Why JScrape?
In the past I wrote scraping methods using simple string parsing. The problem was that this code was nearly impossible to maintain: whenever the structure of the web page changed, my code had to change with it, and those changes were often difficult to make. JScrape makes this easier by letting the XQuery language do most of the hard work, so that finding information on a web page is much simpler and is contained in a single query.
How it works
Instead of reinventing the wheel, JScrape relies on several third-party libraries to make the process as robust as possible.
The libraries used are:
- Xalan
- Commons-Codec
- Commons-Logging
- Commons-HttpClient
- Log4j
- Saxon8
- TagSoup
JScrape uses the HttpClient API to get an input stream to a web page, then uses the TagSoup API to turn the HTML into an acceptable DOM object. From there, Saxon is used to apply the XQuery.
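To illustrate the parse-then-query idea, here is a minimal sketch. It is not JScrape's own API and does not use its libraries: it assumes markup that is already well formed (the job TagSoup does for real-world HTML) and uses the JDK's built-in XPath engine as a stand-in for Saxon's XQuery. The class name ScrapeSketch and the method scrape are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: JDK-only stand-in for the fetch -> tidy -> query pipeline.
public class ScrapeSketch {

    /**
     * Parses already well-formed markup into a DOM and returns the trimmed
     * text of every node matched by the XPath query.
     */
    public static List<String> scrape(String markup, String query) throws Exception {
        // Step 1 (stand-in for HttpClient + TagSoup): build a DOM from a string.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(markup)));

        // Step 2 (stand-in for Saxon/XQuery): run one query against the DOM.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(query, dom, XPathConstants.NODESET);

        List<String> results = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            results.add(hits.item(i).getTextContent().trim());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><ul>"
                + "<li class=\"price\">10.50</li>"
                + "<li class=\"name\">Widget</li>"
                + "<li class=\"price\">3.25</li>"
                + "</ul></body></html>";
        // The whole extraction rule lives in this one query string.
        System.out.println(scrape(page, "//li[@class='price']"));
    }
}
```

If the page layout changes, only the query string needs updating; the surrounding Java code stays the same, which is the maintainability benefit described above.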
Changes / Updates to JScrape
Download JScrape
Before downloading please be sure to read our Terms of Use.
JScrape is available in two versions:
- JScrape Lite - This is just the precompiled JScrape JAR file, sample code, and associated documentation. The prerequisite JARs are not included and must be downloaded separately. (This download is much smaller.)
- JScrape Full - This is the JScrape JAR file, sample code, documentation, and the prerequisite JARs. This download will let you hit the ground running faster; however, if you already have the prerequisite JAR files, then there is no need for this download.
JScrape Documentation & Support
See the README and API docs distributed with both versions of JScrape.
If you have any questions or comments, you can either post them to our blog or send us an email.

