How to Read a PDF File with Apache PDFBox in Android

If you have worked with PDF files in Android, you know that the options for handling them are limited.

Most of the PDF SDKs available for Android are either commercial or (at least partly) licensed under the GPL, which limits their use in closed-source apps.

We will not focus on commercial PDF SDKs such as PDFTron, Kdan Mobile and Foxit.

Instead, we will use Apache PDFBox, an open-source Java PDF library from the Apache Software Foundation. You can read more about PDFBox on the official Apache PDFBox website.

However, the standard PDFBox .jar files cannot be used in an Android project, because PDFBox relies on Java APIs (such as AWT and Swing) that Android does not support.

The Apache PDFBox library is an open source Java tool for working with PDF documents.

This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents.

Apache PDFBox also includes several command-line utilities. Apache PDFBox is published under the Apache License v2.0.

The good news is that Apache PDFBox has been ported to Android.

The Android port we will use in this application is PDFBox-Android.

Below is a screenshot of the app we will create.

1. CREATE A NEW ANDROID PROJECT

  • Open Android Studio
  • Go to the File menu
  • Select New > New Project
  • Enter the project name
  • Enter the activity name
  • Keep the other default settings
  • Click the Finish button to create the new Android project

2. ADD PERMISSION

Because accessing files in Android storage requires permission, add the following lines to your project's AndroidManifest.xml file, inside the <manifest> element.

<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />

3. ADD ANDROID THIRD-PARTY LIBRARIES

We are going to add a few Android libraries so that we don't have to reinvent the wheel.

Open your app module's build.gradle file and add the libraries below.

apply plugin: 'com.android.application'

android {
    compileSdkVersion 30
    buildToolsVersion "30.0.0"

    defaultConfig {
        applicationId "com.inducesmile.apachepdfbox"
        minSdkVersion 21
        targetSdkVersion 30
        versionCode 1
        versionName "1.0"

        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
        }
    }

    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
}

dependencies {
    implementation fileTree(dir: "libs", include: ["*.jar"])
    implementation 'androidx.appcompat:appcompat:1.1.0'
    implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
    implementation 'com.google.android.material:material:1.3.0-alpha02'

    //PDFBox android
    implementation 'com.tom_roush:pdfbox-android:1.8.10.1'
    //File picker
    implementation 'com.github.jaiselrahman:FilePicker:1.3.2'

    implementation 'com.jakewharton:butterknife:10.2.1'
    annotationProcessor "com.jakewharton:butterknife-compiler:10.2.1"

    implementation 'com.karumi:dexter:5.0.0'

    //Lombok
    compileOnly 'org.projectlombok:lombok:1.18.8'
    annotationProcessor 'org.projectlombok:lombok:1.18.8'

    testImplementation 'junit:junit:4.12'
    androidTestImplementation 'androidx.test.ext:junit:1.1.1'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.2.0'
}

4. Implement a PDF File Picker

In build.gradle, we added the com.github.jaiselrahman:FilePicker:1.3.2 library, which we will use to pick a PDF file from the device.

Create a new Java class and name it PdfFilePicker.java. Open the file and paste the code below into it.

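import android.content.Context;
import android.content.Intent;

import com.jaiselrahman.filepicker.activity.FilePickerActivity;
import com.jaiselrahman.filepicker.config.Configurations;

public class PdfFilePicker {

    /**
     * Builds an Intent that opens FilePicker and lists only PDF files.
     *
     * @param context   the calling Activity
     * @param numOfFile maximum number of files the user may select
     */
    public static Intent openPdf(Context context, int numOfFile){
        String[] suffix = {"pdf", "Pdf", "PDF"};
        Intent intent = new Intent(context, FilePickerActivity.class);
        return intent.putExtra(FilePickerActivity.CONFIGS, new Configurations.Builder()
                .setCheckPermission(true)   // let the picker check for storage permission
                .setShowImages(false)
                .setShowAudios(false)
                .setShowVideos(false)
                .setShowFiles(true)         // list documents only
                .enableImageCapture(false)
                .setMaxSelection(numOfFile)
                .setSkipZeroSizeFiles(true)
                .setSuffixes(suffix)        // restrict the list to .pdf files
                .build());
    }
}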
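The picker configuration above lists the case variants of the extension explicitly. If you also want to validate the selected file in your own code before passing it to PDFBox, a single case-insensitive check covers every spelling. The sketch below is only an illustration: PdfFiles and isPdf() are hypothetical names, not part of FilePicker or PDFBox.

```java
import java.util.Locale;

/** Hypothetical helper (not part of FilePicker or PDFBox) for validating a picked file name. */
final class PdfFiles {

    private PdfFiles() {
        // Static utility class; no instances.
    }

    /**
     * Returns true when the file name ends with ".pdf" in any letter case,
     * e.g. "report.pdf", "report.PDF" or "report.pDf".
     */
    static boolean isPdf(String fileName) {
        if (fileName == null) {
            return false;
        }
        // Locale.ROOT avoids locale-specific case rules (such as the Turkish dotless i).
        return fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }
}
```

In the activity you could call PdfFiles.isPdf(file.getName()) right after building the File object and skip loading when it returns false.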

5. Create a new Activity class

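Create a new activity and name it ReadPDFActivity, or any name of your choice.

Open the activity's XML layout file and paste the code below. The activity code later in this post inflates R.layout.activity_main, so either name the layout file activity_main.xml or change setContentView() to match. The layout also uses two drawables, pdf and ic_baseline_add_24, which you need to add to res/drawable.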

<?xml version="1.0" encoding="utf-8"?>
<FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".ReadPDFActivity">

    <androidx.appcompat.widget.LinearLayoutCompat
        android:id="@+id/wrapper"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:padding="12dp"
        android:gravity="center"
        android:visibility="gone"
        android:layout_marginTop="40dp"
        android:orientation="vertical">

        <androidx.appcompat.widget.AppCompatImageView
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:scaleType="centerCrop"
            android:src="@drawable/pdf"/>

        <androidx.appcompat.widget.AppCompatTextView
            android:id="@+id/file_name"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_marginTop="8dp"
            android:text=""/>

    </androidx.appcompat.widget.LinearLayoutCompat>


    <com.google.android.material.floatingactionbutton.FloatingActionButton
        android:id="@+id/fab"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="end|bottom"
        android:layout_margin="16dp"
        android:backgroundTint="@color/colorAccent"
        android:src="@drawable/ic_baseline_add_24" />

</FrameLayout>

In the activity class, we use ButterKnife to bind the View references from the layout file.

When the floating action button is clicked, the file picker opens with a list of PDF files. Once the user selects a file and taps Done, the app gets the file's details and path.

Finally, we load the file into a PDDocument object with PDDocument.load() and extract its text with PDFTextStripper.

public class ReadPDFActivity extends AppCompatActivity {

    private static final String TAG = ReadPDFActivity.class.getSimpleName();

    private static final int PDF_REQUEST_CODE = 2121;

    private Document document;

    @BindView(R.id.wrapper)
    LinearLayoutCompat wrapper;

    @BindView(R.id.file_name)
    AppCompatTextView filename;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        ButterKnife.bind(this);

        // PDFBox-Android needs its resource loader initialised before any PDFBox call
        PDFBoxResourceLoader.init(getApplicationContext());

        ActionBar actionBar = getSupportActionBar();
        if (actionBar != null){
            actionBar.setTitle("Read PDF File");
        }
    }

    @OnClick(R.id.fab)
    public void onOpenFilePicker(View view){
        // Let the user pick exactly one PDF file
        Intent intent = PdfFilePicker.openPdf(this, 1);
        startActivityForResult(intent, PDF_REQUEST_CODE);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, @Nullable Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (resultCode == RESULT_OK && requestCode == PDF_REQUEST_CODE) {
            getFileInStorage(data);
        }
    }

    private void getFileInStorage(@Nullable Intent data) {
        if (data != null) {
            ArrayList<MediaFile> mediaFileArrayList = data.getParcelableArrayListExtra(FilePickerActivity.MEDIA_FILES);
            if (mediaFileArrayList != null && !mediaFileArrayList.isEmpty()) {
                MediaFile mediaFile = mediaFileArrayList.get(0);
                String filePath = Helper.getPath(this, mediaFile.getUri());

                // TextUtils.isEmpty() is true for both null and "", so one check is enough
                if (!TextUtils.isEmpty(filePath)){
                    File file = new File(filePath);
                    document = new Document(file.getName(), Helper.formatSize(file.length()),
                            Helper.formatDate(file.lastModified()), file.getAbsolutePath());
                    Log.d(TAG, "Selected file name = " + document.getDocumentName());
                    //show or hide
                    showOrHidePDFIcon();
                }
            }
        }
    }

    private void showOrHidePDFIcon(){
        if (document != null){
            wrapper.setVisibility(View.VISIBLE);
            filename.setText(document.getDocumentName());

            // Parsing a PDF is slow I/O work, so run it off the main (UI) thread
            new Thread(() -> {
                try {
                    readPDFContent();
                } catch (IOException e) {
                    Log.e(TAG, "Could not read PDF file", e);
                }
            }).start();
        }
    }

    private void readPDFContent() throws IOException {
        if (document != null){
            File file = new File(document.getDocumentPath());
            if (file.exists()){
                PDDocument pdDocument = PDDocument.load(file);
                try {
                    PDFTextStripper pdfTextStripper = new PDFTextStripper();
                    String textContent = pdfTextStripper.getText(pdDocument);
                    if (!TextUtils.isEmpty(textContent)){
                        //Log the text content of the whole PDF file
                        Log.d(TAG, textContent);
                    }
                } finally {
                    // Always release the file handle and memory held by the document
                    pdDocument.close();
                }
            }
        }
    }

    private void readPDFContentPageByPage() throws IOException {
        if (document != null){
            File file = new File(document.getDocumentPath());
            if (file.exists()){
                PDDocument pdDocument = PDDocument.load(file);
                try {
                    PDFTextStripper pdfTextStripper = new PDFTextStripper();
                    // PDFTextStripper page numbers are 1-based and inclusive
                    for (int page = 1; page <= pdDocument.getNumberOfPages(); page++){
                        pdfTextStripper.setStartPage(page);
                        pdfTextStripper.setEndPage(page);
                        String pageTextContent = pdfTextStripper.getText(pdDocument);
                        if (!TextUtils.isEmpty(pageTextContent)){
                            //Log the text content of this page
                            Log.d(TAG, "Page " + page + ": " + pageTextContent);
                        }
                    }
                } finally {
                    pdDocument.close();
                }
            }
        }
    }
}
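ReadPDFActivity uses a Document model class that the post does not show (the build file pulls in Lombok, so the original may have generated its getters). Below is a minimal plain-Java sketch that matches how the activity uses it: a four-argument constructor (name, formatted size, formatted date, absolute path) plus getDocumentName() and getDocumentPath(). The field names and the getDocumentSize()/getDocumentDate() getters are assumptions.

```java
/**
 * Minimal sketch of the Document model used by ReadPDFActivity.
 * Constructor order follows the call in getFileInStorage():
 * name, formatted size, formatted date, absolute path.
 * getDocumentSize() and getDocumentDate() are assumed names.
 */
final class Document {

    private final String documentName;
    private final String documentSize;
    private final String documentDate;
    private final String documentPath;

    public Document(String documentName, String documentSize, String documentDate, String documentPath) {
        this.documentName = documentName;
        this.documentSize = documentSize;
        this.documentDate = documentDate;
        this.documentPath = documentPath;
    }

    public String getDocumentName() {
        return documentName;
    }

    public String getDocumentSize() {
        return documentSize;
    }

    public String getDocumentDate() {
        return documentDate;
    }

    public String getDocumentPath() {
        return documentPath;
    }
}
```

The fields are final because the activity creates a new Document for every selected file instead of mutating an existing one.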

6. Create a Helper Java Class

Create a new Java class and name it Helper.java. This class contains the utility methods the activity needs: formatting a file's date and size, and resolving the picked file's content Uri to a file path.

Open the class and add the code below to it.

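import android.app.Activity;
import android.database.Cursor;
import android.net.Uri;
import android.provider.MediaStore;

import java.text.DecimalFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class Helper {

    // Formats a timestamp (e.g. File.lastModified()) as MM/dd/yyyy in the device's time zone
    public static String formatDate(long milliseconds) {
        SimpleDateFormat dateFormat = new SimpleDateFormat("MM/dd/yyyy", Locale.US);
        dateFormat.setTimeZone(TimeZone.getDefault());
        return dateFormat.format(new Date(milliseconds));
    }

    // Formats a byte count in megabytes with two decimals
    public static String formatSize(long bytes) {
        DecimalFormat sizeFormat = new DecimalFormat("0.00");
        return sizeFormat.format(bytes / 1048576.0).concat(" MB");
    }

    // Resolves a MediaStore content Uri to a file path via the _data column.
    // Returns null when the Uri cannot be resolved.
    public static String getPath(Activity activity, Uri uri) {
        String[] projection = {MediaStore.MediaColumns.DATA};
        Cursor cursor = activity.getContentResolver().query(uri, projection, null, null, null);
        if (cursor == null) return null;
        try {
            if (!cursor.moveToFirst()) return null;
            int columnIndex = cursor.getColumnIndexOrThrow(MediaStore.MediaColumns.DATA);
            return cursor.getString(columnIndex);
        } finally {
            cursor.close();
        }
    }
}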
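formatDate() and formatSize() do not touch any Android API, so their logic can be checked with a plain JVM unit test. The sketch below is an alternative, not the post's original code: it takes the time zone as a parameter instead of fixing one, and it picks B, KB or MB depending on the size, so a 3 KB file is shown as "3.00 KB" rather than "0.00 MB". The class name FileInfoFormat is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical JVM-only variant of Helper's formatting methods. */
final class FileInfoFormat {

    private static final long KB = 1024L;
    private static final long MB = KB * 1024L;

    private FileInfoFormat() {
    }

    /** Formats a timestamp as MM/dd/yyyy in the given time zone. */
    static String formatDate(long milliseconds, TimeZone timeZone) {
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat dateFormat = new SimpleDateFormat("MM/dd/yyyy", Locale.US);
        dateFormat.setTimeZone(timeZone);
        return dateFormat.format(new Date(milliseconds));
    }

    /** Formats a byte count as B, KB or MB (binary units, two decimals). */
    static String formatSize(long bytes) {
        if (bytes < KB) {
            return bytes + " B";
        }
        if (bytes < MB) {
            // Locale.US keeps "." as the decimal separator on every device.
            return String.format(Locale.US, "%.2f KB", bytes / (double) KB);
        }
        return String.format(Locale.US, "%.2f MB", bytes / (double) MB);
    }
}
```

On a device you would call FileInfoFormat.formatDate(file.lastModified(), TimeZone.getDefault()); passing the zone explicitly is what makes the method easy to test.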

7. Extra Bonus

I have included two methods in ReadPDFActivity: readPDFContent() extracts all the text of a PDF file in one call, while readPDFContentPageByPage() extracts the text one page at a time. Here is the page-by-page method:

    private void readPDFContentPageByPage() throws IOException {
        if (document != null){
            File file = new File(document.getDocumentPath());
            if (file.exists()){
                PDDocument pdDocument = PDDocument.load(file);
                try {
                    PDFTextStripper pdfTextStripper = new PDFTextStripper();
                    // PDFTextStripper page numbers are 1-based and inclusive
                    for (int page = 1; page <= pdDocument.getNumberOfPages(); page++){
                        pdfTextStripper.setStartPage(page);
                        pdfTextStripper.setEndPage(page);
                        String pageTextContent = pdfTextStripper.getText(pdDocument);
                        if (!TextUtils.isEmpty(pageTextContent)){
                            //Log the text content of this page
                            Log.d(TAG, "Page " + page + ": " + pageTextContent);
                        }
                    }
                } finally {
                    // Always release the file handle and memory held by the document
                    pdDocument.close();
                }
            }
        }
    }
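PDFTextStripper's setStartPage() and setEndPage() take 1-based, inclusive page numbers, so a page loop should run from 1 to getNumberOfPages(). The same pair can also extract several pages per call, for example to log a long document in batches. The plain-Java sketch below only computes the (start, end) pairs; the class name PageRanges and the batching idea are illustrative, not from the original post.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits pages 1..pageCount into 1-based, inclusive batches. */
final class PageRanges {

    private PageRanges() {
    }

    /** Returns {start, end} pairs, each covering at most pagesPerBatch pages. */
    static List<int[]> batches(int pageCount, int pagesPerBatch) {
        if (pagesPerBatch < 1) {
            throw new IllegalArgumentException("pagesPerBatch must be >= 1");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 1; start <= pageCount; start += pagesPerBatch) {
            // The last batch may be shorter than pagesPerBatch.
            int end = Math.min(start + pagesPerBatch - 1, pageCount);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }
}
```

Inside a method like readPDFContentPageByPage() you would then call setStartPage(range[0]) and setEndPage(range[1]) for each pair before calling getText().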

In the next post, we will learn how to create a new PDF file in Android using Apache PDFBox.

If you have any questions about working with PDF files in Android, please drop them in the comment box below.
