Android HTML Parser Using JSOUP Tutorial

In this tutorial we are going to learn how to parse an HTML file in Android using the Jsoup library. This comes in handy when you want to extract a node or HTML element from a web page for use in your Android application.

Jsoup is an open source Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. Jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers like Chrome and Firefox do.

If you want to display an HTML file or web page in your Android application rather than parse and manipulate it, I suggest you read my tutorial, How to load external web page inside Android WebView.

We are going to use two examples in this tutorial. The first parses an HTML file stored in the assets folder of our project. The second parses the HTML of a live web page.

We will create a Button in our main layout so that when the button is clicked, the HTML file is parsed and the title of the page is retrieved.

Because the second example makes a network call, we are going to add the internet permission to our project's AndroidManifest.xml file.

Before we start, here are the environment and tools I used in this Android tutorial. Feel free to use whatever environment or tools you are familiar with.

Windows 7

Android Studio

Samsung Galaxy Fame Lite

Min SDK 14

Target SDK 19

To create a new Android application project, follow the steps below.

Go to File menu

Click on New menu

Click on Android Application

Enter Project name: AndroidJsoupParser

Package: com.inducesmile.androidjsoupparser

Keep other default selections.

Keep clicking the Next button until the Finish button is active, then click Finish.

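Once you are done creating your project, note that the code listings in this tutorial use the package inducesmile.com.androidjsouphtmlparser. If your project uses a different package name, change the package declarations to match.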

Update AndroidManifest.xml file

The project's AndroidManifest.xml file has been updated to add the internet permission and to declare the second activity, LivepageActivity. The code is shown below.

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="inducesmile.com.androidjsouphtmlparser" >

    <uses-permission android:name="android.permission.INTERNET" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:theme="@style/AppTheme" >
        <activity
            android:name=".MainActivity"
            android:label="@string/app_name" >
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
        <activity
            android:name=".LivepageActivity"
            android:label="@string/title_activity_livepage" >
        </activity>
    </application>

</manifest>

Adding Jsoup Java Library

Since we are going to use the Jsoup Java library to parse HTML in our Android application, we need to import it into our project.

The first thing to do is to download the Jsoup jar file from the Jsoup website (jsoup.org).

Add the Jsoup jar file to the libs folder of your Android project.

Then right-click on the jar file and click Add as Library in the context menu that appears in Android Studio.
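
As an alternative to copying the jar by hand, Jsoup is also published on Maven Central, so it can be declared as a Gradle dependency. This is only a sketch and not the method used in this tutorial: it assumes a module-level build.gradle written in the Groovy DSL, a project that already lists Maven Central among its repositories, and a Gradle plugin recent enough to support the implementation configuration (older versions use compile instead). The version number is an example, so check the Jsoup site for the current release.

```groovy
dependencies {
    // Assumed version for illustration; replace it with the current Jsoup release.
    implementation 'org.jsoup:jsoup:1.15.3'
}
```

With the dependency declared this way, the libs folder step above is not needed.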

Now that we are done importing the Jsoup library into our project, we will continue with the design of our main layout file.

activity_main.xml

Open the activity_main.xml file, then drag and drop two View controls.

1. TextView, which will be used to display the title of the parsed HTML file.

2. Button: when a user clicks on this View, the title of the HTML file or page is displayed.

The layout code is shown below.

<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:paddingLeft="@dimen/activity_horizontal_margin"
    android:paddingRight="@dimen/activity_horizontal_margin"
    android:paddingTop="@dimen/activity_vertical_margin"
    android:paddingBottom="@dimen/activity_vertical_margin"
    tools:context=".MainActivity">

    <TextView
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="@string/parse_html"
        android:id="@+id/html_content"
        android:textSize="16sp"
        android:layout_alignParentTop="true"
        android:layout_centerHorizontal="true"
        android:layout_marginTop="119dp" />

    <Button
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:id="@+id/button"
        android:text="@string/html_button"
        android:layout_marginTop="30dp"
        android:layout_alignParentTop="true"
        android:layout_centerHorizontal="true" />
</RelativeLayout>

We will add some string resources to the strings.xml file, which are used in our main layout. The updated strings.xml is shown below.

<resources>
    <string name="app_name">Android JSOUP HTML Parser</string>
    <string name="hello_world">Hello world!</string>
    <string name="action_settings">Settings</string>
    <string name="parse_html"> </string>
    <string name="html_button">Click to get HTML Page Title</string>
    <string name="title_activity_livepage">LivepageActivity</string>
</resources>

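Parsing Locally Stored HTML File in Assets Folder Using Jsoup

The MainActivity class reads the HTML file from the assets folder when the activity is created, then parses it with Jsoup when the button is clicked. The name filename.html is a placeholder for the HTML file you place in your assets folder.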

package inducesmile.com.androidjsouphtmlparser;

import android.content.res.AssetManager;
import android.os.Bundle;
import android.support.v7.app.ActionBarActivity;
import android.view.Menu;
import android.view.MenuItem;
import android.view.View;
import android.widget.Button;
import android.widget.TextView;
import android.widget.Toast;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;

public class MainActivity extends ActionBarActivity {

    private Document htmlDocument;
    // Starts empty so the click handler never sees null if reading the asset fails.
    private String htmlContentInStringFormat = "";
    private TextView parsedHtmlNode;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        parsedHtmlNode = (TextView)findViewById(R.id.html_content);

        String htmlFilename = "filename.html";
        AssetManager mgr = getBaseContext().getAssets();
        try {
            InputStream in = mgr.open(htmlFilename, AssetManager.ACCESS_BUFFER);
            htmlContentInStringFormat = StreamToString(in);
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Button htmlTitleButton = (Button)findViewById(R.id.button);
        htmlTitleButton.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
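                // The content is missing or empty if the asset could not be read.
                if(htmlContentInStringFormat == null || htmlContentInStringFormat.isEmpty()){
                    Toast.makeText(MainActivity.this, "There is no HTML file to parse", Toast.LENGTH_LONG).show();
                    return;
                }
                // Parse the HTML string into a DOM and show the text of its <title> element.
                htmlDocument = Jsoup.parse(htmlContentInStringFormat);
                String pageTitle = htmlDocument.title();
                if(pageTitle != null){
                    parsedHtmlNode.setText(pageTitle);
                }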
            }
        });
    }
    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        // Inflate the menu; this adds items to the action bar if it is present.
        getMenuInflater().inflate(R.menu.menu_main, menu);
        return true;
    }

    @Override
    public boolean onOptionsItemSelected(MenuItem item) {
        // Handle action bar item clicks here. The action bar will
        // automatically handle clicks on the Home/Up button, so long
        // as you specify a parent activity in AndroidManifest.xml.
        int id = item.getItemId();

        //noinspection SimplifiableIfStatement
        if (id == R.id.action_settings) {
            return true;
        }
        return super.onOptionsItemSelected(item);
    }

    // Reads the whole stream as UTF-8 text. The caller remains responsible for closing the stream.
    public static String StreamToString(InputStream in) throws IOException {
        if(in == null) {
            return "";
        }
        Writer writer = new StringWriter();
        char[] buffer = new char[1024];
        Reader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        return writer.toString();
    }
}
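
The stream-reading helper in MainActivity has no Android dependencies, so it can be tried on its own. The following is a minimal sketch in plain Java, assuming UTF-8 input. The class name StreamText and the method name readAll are hypothetical names used only for this illustration. Unlike the helper above, it uses try-with-resources so that the reader, and the stream it wraps, is closed even if reading fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the tutorial project.
final class StreamText {

    private StreamText() {
    }

    // Reads the whole stream as UTF-8 text; a null stream yields an empty string.
    static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return "";
        }
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        // try-with-resources closes the reader and the wrapped stream on exit.
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }
}
```

The returned string could then be passed to Jsoup.parse in the same way as htmlContentInStringFormat is in MainActivity.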

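Parsing Live HTML Page Using Jsoup

Network calls must not run on the main (UI) thread, so LivepageActivity fetches the page inside an AsyncTask and updates the TextView in onPostExecute. It uses a layout named activity_livepage, which is not listed in this tutorial and must contain a TextView with the id html_content and a Button with the id button.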

package inducesmile.com.androidjsouphtmlparser;

import android.os.AsyncTask;
import android.os.Bundle;
import android.support.v7.app.ActionBarActivity;
import android.view.Menu;
import android.view.MenuItem;
import android.view.View;
import android.widget.Button;
import android.widget.TextView;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;


public class LivepageActivity extends ActionBarActivity {

    private Document htmlDocument;
    private String htmlPageUrl = "https://inducesmile.com/";
    private TextView parsedHtmlNode;
    private String htmlContentInStringFormat;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_livepage);

        parsedHtmlNode = (TextView)findViewById(R.id.html_content);
        Button htmlTitleButton = (Button)findViewById(R.id.button);
        htmlTitleButton.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                JsoupAsyncTask jsoupAsyncTask = new JsoupAsyncTask();
                jsoupAsyncTask.execute();
            }
        });
    }
    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        // Inflate the menu; this adds items to the action bar if it is present.
        getMenuInflater().inflate(R.menu.menu_livepage, menu);
        return true;
    }

    @Override
    public boolean onOptionsItemSelected(MenuItem item) {
        // Handle action bar item clicks here. The action bar will
        // automatically handle clicks on the Home/Up button, so long
        // as you specify a parent activity in AndroidManifest.xml.
        int id = item.getItemId();

        //noinspection SimplifiableIfStatement
        if (id == R.id.action_settings) {
            return true;
        }
        return super.onOptionsItemSelected(item);
    }

    private class JsoupAsyncTask extends AsyncTask<Void, Void, Void> {

        @Override
        protected void onPreExecute() {
            super.onPreExecute();
        }

        @Override
        protected Void doInBackground(Void... params) {
            // Runs on a background thread: fetch the page and read its <title>.
            try {
                htmlDocument = Jsoup.connect(htmlPageUrl).get();
                htmlContentInStringFormat = htmlDocument.title();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }

        @Override
        protected void onPostExecute(Void result) {
            // Runs on the UI thread once the background work is done.
            parsedHtmlNode.setText(htmlContentInStringFormat);
        }
    }
}
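
AsyncTask has been deprecated since Android API level 30. As a hedged alternative, the same pattern (do the slow work off the main thread, then hand back a result) can be sketched with java.util.concurrent. The sketch below is plain Java and is not the tutorial's code: TitleLoader, loadTitle and the fallback parameter are hypothetical names, and the Callable stands in for a call such as Jsoup.connect(htmlPageUrl).get().title().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of the tutorial project.
final class TitleLoader {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Runs the fetcher on a background thread. If the fetcher throws
    // (for example an IOException from a network call) or returns null,
    // the fallback text is returned instead.
    Future<String> loadTitle(final Callable<String> fetcher, final String fallback) {
        return executor.submit(new Callable<String>() {
            @Override
            public String call() {
                try {
                    String title = fetcher.call();
                    return title != null ? title : fallback;
                } catch (Exception e) {
                    return fallback;
                }
            }
        });
    }

    // Stops the background thread once no more work is expected.
    void shutdown() {
        executor.shutdown();
    }
}
```

In an Activity, the result would still have to be delivered on the UI thread, for example with runOnUiThread, before calling setText on the TextView.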

Save the file and run your project. If everything works, the app will look like this on your device.

[Screenshot: Jsoup example]

You can download the code for this tutorial below. If you have a hard time downloading it, kindly contact me.
