Action that extracts content from PDF files. It can extract text, metadata, or specific pages.

Example

// Extract all text from PDF
await runAction({
  ActionName: 'PDF Extractor',
  Params: [{
    Name: 'FileURL',
    Value: 'https://example.com/document.pdf'
  }, {
    Name: 'ExtractType',
    Value: 'text'
  }]
});

// Extract metadata from PDF
// (base64PdfData is a string holding the Base64-encoded PDF contents)
await runAction({
  ActionName: 'PDF Extractor',
  Params: [{
    Name: 'PDFData',
    Value: base64PdfData
  }, {
    Name: 'ExtractType',
    Value: 'metadata'
  }]
});

// Extract specific pages
await runAction({
  ActionName: 'PDF Extractor',
  Params: [{
    Name: 'FileID',
    Value: 'uuid-of-pdf-file'
  }, {
    Name: 'ExtractType',
    Value: 'pages'
  }, {
    Name: 'PageNumbers',
    Value: [1, 3, 5]
  }]
});

Hierarchy

  • BaseFileHandlerAction
    • PDFExtractorAction

Constructors

Methods

  • Extracts content from PDF files

    Parameters

    • params: RunActionParams<any>

      The action parameters containing:

      • FileID: UUID of MJ Storage file (optional)
      • FileURL: URL of PDF file (optional)
      • PDFData: Base64 encoded PDF data (optional)
      • ExtractType: "text" | "metadata" | "pages" (default: "text")
      • PageNumbers: Array of page numbers to extract (for pages extraction)
      • MergePages: Boolean - merge text from all pages (default: true)
      • IncludePageBreaks: Boolean - add page break markers (default: false)

    Returns Promise<ActionResultSimple>

    Extracted content based on extraction type
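
The parameter list above can be assembled into a `Params` array with a small helper. The following is a minimal sketch, not part of the library: `buildPdfExtractorParams`, `PdfExtractorOptions`, and `ActionParam` are hypothetical names introduced here for illustration. It assumes that at least one of `FileID`, `FileURL`, or `PDFData` must be supplied (the reference marks each as optional but does not say what happens when all are missing), and it writes the documented defaults out explicitly.

```typescript
// Hypothetical types mirroring the documented parameters; not the library's own definitions.
type ExtractType = 'text' | 'metadata' | 'pages';

interface ActionParam {
    Name: string;
    Value: unknown;
}

interface PdfExtractorOptions {
    FileID?: string;
    FileURL?: string;
    PDFData?: string;
    ExtractType?: ExtractType;
    PageNumbers?: number[];
    MergePages?: boolean;
    IncludePageBreaks?: boolean;
}

// Builds a Params array in the { Name, Value } shape used by the examples.
function buildPdfExtractorParams(options: PdfExtractorOptions): ActionParam[] {
    // Assumption: the action needs at least one source to read a PDF from.
    if (!options.FileID && !options.FileURL && !options.PDFData) {
        throw new Error('Provide one of FileID, FileURL, or PDFData');
    }

    const params: ActionParam[] = [];
    const add = (Name: string, Value: unknown): void => {
        // Skip parameters the caller did not set.
        if (Value !== undefined) params.push({ Name, Value });
    };

    add('FileID', options.FileID);
    add('FileURL', options.FileURL);
    add('PDFData', options.PDFData);
    // Documented defaults: "text", true, false.
    add('ExtractType', options.ExtractType ?? 'text');
    add('PageNumbers', options.PageNumbers);
    add('MergePages', options.MergePages ?? true);
    add('IncludePageBreaks', options.IncludePageBreaks ?? false);
    return params;
}
```

The result can be passed as the `Params` value of a `runAction` call like those in the example section. Spelling out the defaults is a design choice for readability; omitting them should be equivalent if the action applies the defaults listed above.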

  • Executes the action with the provided parameters.

    Parameters

    • params: RunActionParams<any>

      The action execution parameters including context

    Returns Promise<ActionResultSimple>

    Promise resolving to the action result

  • Extracts specific pages from a PDF

    Parameters

    • pdfData: any
    • pageNumbersParam: any
    • mergePages: boolean
    • includePageBreaks: boolean

    Returns Promise<any>

  • Gets file content from one of several sources, based on the parameters. Priority: FileID > FileURL > data parameter

    Parameters

    • params: RunActionParams<any>

      Action parameters

    • dataParamName: string

      Name of the parameter containing direct data

    • fileParamName: string = 'FileID'

      Name of the parameter containing file ID (default: 'FileID')

    • urlParamName: string = 'FileURL'

      Name of the parameter containing file URL (default: 'FileURL')

    Returns Promise<{
        content: string | Buffer;
        fileName?: string;
        mimeType?: string;
        source: "url" | "storage" | "direct";
    }>

    Object with content and metadata
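
The priority rule (FileID > FileURL > data parameter) can be illustrated with a small selection function. This is a sketch under assumptions, not the library's implementation: `pickFileSource` and `ParamLike` are hypothetical names, parameter names are matched case-sensitively, empty strings are treated as "not provided", and the mapping of FileID to `"storage"`, FileURL to `"url"`, and the data parameter to `"direct"` is inferred from the return type above. The sketch only decides which source wins; it does not fetch or decode anything.

```typescript
// Hypothetical shape for a single action parameter.
interface ParamLike {
    Name: string;
    Value: unknown;
}

type FileSource = 'storage' | 'url' | 'direct';

// Returns the winning source and its raw value, or undefined if none is set.
function pickFileSource(
    params: ParamLike[],
    dataParamName: string,
    fileParamName: string = 'FileID',
    urlParamName: string = 'FileURL'
): { source: FileSource; value: unknown } | undefined {
    const lookup = (name: string): unknown =>
        params.find(p => p.Name === name)?.Value;
    // Assumption: undefined, null, and '' all count as "not provided".
    const hasValue = (v: unknown): boolean =>
        v !== undefined && v !== null && v !== '';

    // Checked in the documented priority order.
    const fileId = lookup(fileParamName);
    if (hasValue(fileId)) return { source: 'storage', value: fileId };

    const url = lookup(urlParamName);
    if (hasValue(url)) return { source: 'url', value: url };

    const data = lookup(dataParamName);
    if (hasValue(data)) return { source: 'direct', value: data };

    return undefined;
}
```

For the PDF extractor, `dataParamName` would presumably be `'PDFData'`. With this rule, a call that supplies both `FileURL` and `PDFData` resolves to the URL, and the inline data is ignored.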