Protected Crawl, Protected Crawl, Protected Max, Protected RootURL, Protected URLPattern, Protected content, Private context, Private engine, Protected visitedURLs
Given a content source, retrieve all content items associated with that content source. Each content item is then checked to determine whether it is new or has been modified since it was last processed.
Protected Set
Given a list of content item links, check whether each content item already exists in the database. If it exists, check whether it has been modified since it was last processed. If it does not exist, create a new content item and add it to the list of content items to process.
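The new-or-modified check described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the `ContentItem` shape, the `checksum` change marker, the `selectItemsToProcess` name, and the in-memory `Map` standing in for the database are all assumptions.

```typescript
// Hypothetical shape; the real content item entity is not shown in the source.
interface ContentItem {
  url: string;
  checksum: string; // assumed change marker; a last-modified timestamp would work too
}

interface ItemsToProcess {
  created: ContentItem[];  // links with no stored counterpart
  modified: ContentItem[]; // links whose stored counterpart differs
}

// `existing` stands in for a database lookup keyed by URL (an assumption).
function selectItemsToProcess(
  links: ContentItem[],
  existing: Map<string, ContentItem>,
): ItemsToProcess {
  const result: ItemsToProcess = { created: [], modified: [] };
  for (const link of links) {
    const stored = existing.get(link.url);
    if (stored === undefined) {
      result.created.push(link);
    } else if (stored.checksum !== link.checksum) {
      result.modified.push(link);
    }
    // Items that exist and are unchanged are skipped.
  }
  return result;
}
```

Returning new and modified items separately keeps the decision visible to the caller; a real implementation might instead merge them into a single processing queue.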
Protected delay, Protected get
Given a root URL that corresponds to a content source, retrieve all links in accordance with the crawl settings. If the crawl settings allow crawling other sites in the top-level domain, all links in the top-level domain are retrieved. If the crawl settings allow crawling sites in lower-level domains, the function is called recursively to retrieve all links in those lower-level domains.
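A rough sketch of the recursive link collection follows. It reads "lower-level domains" as subdomains of the root host, which is only one interpretation of the description. The `CrawlSettings` fields, the `collectLinks` name, and the injected `LinkExtractor` are hypothetical, and the sketch is kept synchronous and network-free for brevity. Links are assumed to be absolute URLs.

```typescript
// Hypothetical settings; the real crawl settings are not shown in the source.
interface CrawlSettings {
  includeSubdomains: boolean; // assumed flag: follow links on lower-level domains
  maxDepth: number;           // assumed recursion limit
}

// Returns the absolute links found on a page. Injected so no network access is needed.
type LinkExtractor = (url: string) => string[];

function inScope(rootHost: string, host: string, settings: CrawlSettings): boolean {
  if (host === rootHost) return true;
  return settings.includeSubdomains && host.endsWith("." + rootHost);
}

function collectLinks(
  rootUrl: string,
  settings: CrawlSettings,
  extract: LinkExtractor,
): Set<string> {
  const rootHost = new URL(rootUrl).hostname;
  const visited = new Set<string>(); // guards against cycles between pages

  const visit = (url: string, depth: number): void => {
    if (visited.has(url) || depth > settings.maxDepth) return;
    visited.add(url);
    for (const link of extract(url)) {
      if (inScope(rootHost, new URL(link).hostname, settings)) {
        visit(link, depth + 1); // recurse into every in-scope link
      }
    }
  };

  visit(rootUrl, 0);
  return visited;
}
```

The visited set plays the same role the `visitedURLs` member presumably plays in the class: without it, two pages that link to each other would recurse forever.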
Protected get, Protected get, Protected get, Protected get, Protected get, Protected is, Protected url
Implements the abstract method from the AutotagBase class that runs the entire autotagging process. This method is the entry point for autotagging: it initializes the connection, retrieves the content sources corresponding to the content source type, sets the content items to process, extracts and processes the text, and saves the results in the database.
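The order of these steps can be illustrated with a small sketch. Everything named here is a stand-in: `AutotagBaseSketch` is not the real AutotagBase, and the method names, the string-based items, and the in-memory `savedResults` array (in place of the database) are assumptions made for illustration only.

```typescript
// Stand-in for AutotagBase; the real class and its abstract method name are not shown.
abstract class AutotagBaseSketch {
  abstract autotag(): void;
}

class WebsiteAutotagSketch extends AutotagBaseSketch {
  readonly steps: string[] = [];        // records step order for illustration
  readonly savedResults: string[] = []; // stands in for the database

  constructor(private readonly contentSourceType: string) {
    super();
  }

  // Entry point: runs each stage of the pipeline in order.
  autotag(): void {
    this.initConnection();
    for (const source of this.getContentSources()) {
      for (const item of this.getItemsToProcess(source)) {
        this.savedResults.push(this.extractAndProcessText(item));
      }
    }
    this.steps.push("results saved");
  }

  protected initConnection(): void {
    this.steps.push("connection initialized");
  }

  protected getContentSources(): string[] {
    this.steps.push(`sources loaded for ${this.contentSourceType}`);
    return ["https://example.com/"]; // placeholder content source
  }

  protected getItemsToProcess(source: string): string[] {
    this.steps.push("items selected");
    return [source + "page-1", source + "page-2"]; // placeholder content items
  }

  protected extractAndProcessText(item: string): string {
    return `processed:${item}`; // placeholder for text extraction and tagging
  }
}
```

Keeping each stage in its own protected method mirrors the template-method shape the description suggests, where a subclass supplies source-specific behavior and the base class defines the contract.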