Hierarchy

Constructors

Properties

CrawlOtherSitesInTopLevelDomain: boolean
CrawlSitesInLowerLevelDomain: boolean
MaxDepth: number
RootURL: string
URLPattern: string
contentSourceTypeID: string
contextUser: UserInfo
visitedURLs: Set<string>

Methods

  • Implements the abstract method from the AutotagBase class that runs the entire autotagging process. This method is the entry point: it initializes the connection, retrieves the content sources corresponding to the content source type, selects the content items to process, extracts and processes their text, and saves the results to the database.

    Parameters

    Returns Promise<void>

  • Given a list of content item links, checks whether each content item already exists in the database. If the content item exists, checks whether it has been modified since the last time it was processed. If the content item does not exist, creates a new content item and adds it to the list of content items to process.

    Parameters

    Returns Promise<ContentItemEntity[]>
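
The new-versus-modified decision described above can be sketched as follows. This is a minimal illustration, not the class's implementation: the `FetchedItem` and `StoredItem` shapes, the `selectItemsToProcess` name, the in-memory `Map` standing in for the database, and the use of a checksum to detect modification are all assumptions. The sketch also assumes that modified items are queued for reprocessing, which the description implies but does not state.

```typescript
// Hypothetical shapes; the real method works with ContentItemEntity records.
interface StoredItem {
  url: string;
  checksum: string; // assumed change marker recorded at last processing
}

interface FetchedItem {
  url: string;
  checksum: string; // assumed change marker computed from the current content
}

// Returns the items that are new or whose checksum differs from the stored one.
// The Map is a stand-in for the database and is updated in place.
function selectItemsToProcess(
  fetched: FetchedItem[],
  stored: Map<string, StoredItem>
): FetchedItem[] {
  const toProcess: FetchedItem[] = [];
  for (const item of fetched) {
    const existing = stored.get(item.url);
    if (existing === undefined) {
      // Not seen before: create a record and queue the item.
      stored.set(item.url, { url: item.url, checksum: item.checksum });
      toProcess.push(item);
    } else if (existing.checksum !== item.checksum) {
      // Seen before but changed: refresh the record and queue the item (assumption).
      existing.checksum = item.checksum;
      toProcess.push(item);
    }
    // Unchanged items are skipped.
  }
  return toProcess;
}
```

Keeping the decision in a pure function like this makes it easy to test without a database connection.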

  • Given a root URL that corresponds to a content source, retrieves all links in accordance with the crawl settings. If the crawl settings are set to crawl other sites in the top level domain, all links in the top level domain are retrieved. If the crawl settings are set to crawl sites in lower level domains, the function is called recursively to retrieve all links in the lower level domains.

    Parameters

    • url: string
    • rootURL: string
    • regex: RegExp

    Returns Promise<string[]>
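
As a rough illustration of link retrieval under crawl settings, the sketch below extracts links from an HTML string, resolves them against the page URL, and filters them by the supplied pattern and host. The helper names `extractLinks` and `filterLinks` and the `allowOtherHosts` flag are hypothetical; the flag is only a simplified stand-in for the `CrawlOtherSitesInTopLevelDomain` setting, and the real class may define domain membership differently. A regular expression is used for `href` extraction to keep the example dependency-free; an HTML parser would be more robust.

```typescript
// Hypothetical helper: collects the absolute http(s) links found in an HTML string.
function extractLinks(html: string, pageURL: string): string[] {
  const found = new Set<string>();
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      const resolved = new URL(match[1], pageURL); // resolves relative hrefs
      if (resolved.protocol !== "http:" && resolved.protocol !== "https:") {
        continue; // skip mailto:, javascript:, and similar schemes
      }
      resolved.hash = ""; // drop fragments so one page is not counted twice
      found.add(resolved.href);
    } catch {
      // Ignore hrefs that cannot be parsed as URLs.
    }
  }
  return Array.from(found);
}

// Hypothetical helper: keeps links that match the pattern and, unless
// allowOtherHosts is true, share the root URL's hostname.
function filterLinks(
  links: string[],
  rootURL: string,
  regex: RegExp,
  allowOtherHosts: boolean
): string[] {
  const rootHost = new URL(rootURL).hostname;
  return links.filter((link) => {
    regex.lastIndex = 0; // guard against stateful /g or /y patterns
    if (!regex.test(link)) {
      return false;
    }
    return allowOtherHosts || new URL(link).hostname === rootHost;
  });
}
```

For example, `filterLinks(extractLinks(html, pageURL), rootURL, /\/docs\//, false)` would keep only same-host links whose URL contains `/docs/`.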

  • For a given URL, retrieves all links at lower level domains up to the specified crawl depth.

    Parameters

    • url: string
    • rootURL: string
    • crawlDepth: number
    • scrapedURLs: Set<string>
    • regex: RegExp

    Returns Promise<Set<string>>
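
A depth-limited crawl of this kind can be sketched as below. This is an illustration under stated assumptions, not the class's code: the function name `crawlToDepth` and the injected `fetchLinks` callback are hypothetical (the callback lets the sketch run without network access), a "lower level" link is assumed to be one whose URL starts with `rootURL`, and an iterative breadth-first traversal is used in place of recursion so that each URL is reached at its shallowest depth.

```typescript
// Hypothetical callback type: returns the absolute links found on a page.
type LinkFetcher = (url: string) => Promise<string[]>;

// Collects every URL reachable from `url` within `crawlDepth` link hops,
// restricted to URLs under `rootURL` that match `regex`.
async function crawlToDepth(
  url: string,
  rootURL: string,
  crawlDepth: number,
  scrapedURLs: Set<string>,
  regex: RegExp,
  fetchLinks: LinkFetcher
): Promise<Set<string>> {
  let frontier: string[] = [url];
  scrapedURLs.add(url);
  for (let depth = 0; depth < crawlDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const page of frontier) {
      const links = await fetchLinks(page);
      for (const link of links) {
        regex.lastIndex = 0; // guard against stateful /g or /y patterns
        if (scrapedURLs.has(link) || !link.startsWith(rootURL) || !regex.test(link)) {
          continue; // already seen, outside the root, or not matching the pattern
        }
        scrapedURLs.add(link);
        next.push(link);
      }
    }
    frontier = next; // pages discovered at this depth are expanded next
  }
  return scrapedURLs;
}
```

Passing the shared `scrapedURLs` set in and returning it mirrors the documented signature and prevents the same page from being fetched twice.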