A MemberJunction package for identifying and managing duplicate records using AI-powered vector similarity search. This package generates vector representations of records and uses similarity scoring to detect potential duplicates, with options for automatic merging.
The AI Vector Dupe package provides sophisticated duplicate detection capabilities by:
npm install @memberjunction/ai-vector-dupe
The main class that handles duplicate detection operations.
import { DuplicateRecordDetector } from '@memberjunction/ai-vector-dupe';
import { PotentialDuplicateRequest, UserInfo } from '@memberjunction/core';
const detector = new DuplicateRecordDetector();
Abstract base class providing utilities for vector synchronization operations.
import { VectorSyncBase } from '@memberjunction/ai-vector-dupe';
Type definition for entity synchronization configuration.
import { EntitySyncConfig } from '@memberjunction/ai-vector-dupe';
const config: EntitySyncConfig = {
EntityDocumentID: 'entity-doc-id',
Interval: 3600,
RunViewParams: { /* RunView parameters */ },
IncludeInSync: true,
LastRunDate: 'January 1, 2024 00:00:00',
VectorIndexID: 1,
VectorID: 1
};
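As a rough illustration of how `Interval` and `LastRunDate` could drive scheduling, the sketch below checks whether a sync is due. It is not taken from the package: the `SyncSchedule` interface and `isSyncDue` function are hypothetical, and it assumes `Interval` is expressed in seconds, which the configuration above does not state.

```typescript
// Hypothetical mirror of the two scheduling fields; not the package's EntitySyncConfig type.
interface SyncSchedule {
  Interval: number;    // ASSUMPTION: seconds between sync runs
  LastRunDate: string; // any date string that Date.parse accepts
}

// Returns true when at least `Interval` seconds have elapsed since LastRunDate.
function isSyncDue(config: SyncSchedule, now: Date = new Date()): boolean {
  const lastRun = Date.parse(config.LastRunDate);
  if (Number.isNaN(lastRun)) {
    // An unparseable date is treated as "never run", so a sync is due.
    return true;
  }
  const elapsedSeconds = (now.getTime() - lastRun) / 1000;
  return elapsedSeconds >= config.Interval;
}
```

Passing `now` explicitly keeps the check deterministic, which makes it easier to test than a function that always reads the system clock.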
import { DuplicateRecordDetector } from '@memberjunction/ai-vector-dupe';
import { PotentialDuplicateRequest, UserInfo } from '@memberjunction/core';
// Initialize the detector
const detector = new DuplicateRecordDetector();
// Define the request parameters
const request: PotentialDuplicateRequest = {
ListID: 'your-list-id', // ID of the list containing records to check
EntityID: 'your-entity-id', // ID of the entity type
EntityDocumentID: 'doc-id', // ID of the entity document with template
Options: {
DuplicateRunID: 'run-id' // Optional: existing duplicate run to continue
}
};
// Execute duplicate detection; currentUser is the UserInfo of the calling user, obtained elsewhere
const response = await detector.getDuplicateRecords(request, currentUser);
if (response.Status === 'Success') {
console.log(`Found ${response.PotentialDuplicateResult.length} records with potential duplicates`);
for (const result of response.PotentialDuplicateResult) {
console.log(`Record ${result.RecordCompositeKey.ToString()}:`);
for (const duplicate of result.Duplicates) {
console.log(` - Potential duplicate: ${duplicate.ToString()} (${(duplicate.ProbabilityScore * 100).toFixed(1)}% match)`);
}
}
}
// Configure thresholds via Entity Document settings
// PotentialMatchThreshold: Minimum score to consider as potential duplicate (e.g., 0.8)
// AbsoluteMatchThreshold: Score at which automatic merging occurs (e.g., 0.95)
// Assumes `vectorizer` (an entity vectorizer instance) and `entityDocumentID` are already defined
const entityDocument = await vectorizer.GetEntityDocument(entityDocumentID);
entityDocument.PotentialMatchThreshold = 0.8; // 80% similarity
entityDocument.AbsoluteMatchThreshold = 0.95; // 95% for auto-merge
await entityDocument.Save();
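The two thresholds split similarity scores into three bands. The sketch below shows one way to read them; `classifyMatch` and `MatchAction` are hypothetical names rather than part of the package, and treating both comparisons as inclusive is an assumption.

```typescript
// Hypothetical outcome labels for a similarity score.
type MatchAction = 'ignore' | 'review' | 'merge';

// Maps a similarity score (0 to 1) onto the three bands defined by the thresholds.
// ASSUMPTION: a score equal to a threshold counts as meeting it.
function classifyMatch(
  score: number,
  potentialMatchThreshold: number,
  absoluteMatchThreshold: number
): MatchAction {
  if (score >= absoluteMatchThreshold) {
    return 'merge'; // high enough for automatic merging
  }
  if (score >= potentialMatchThreshold) {
    return 'review'; // reported as a potential duplicate
  }
  return 'ignore'; // below the potential-match cutoff
}
```

With the values above (0.8 and 0.95), a score of 0.85 would be flagged for review and a score of 0.97 would qualify for merging.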
getDuplicateRecords(params: PotentialDuplicateRequest, contextUser?: UserInfo): Promise<PotentialDuplicateResponse>

Performs duplicate detection on records in a list.

Parameters:

- params: Request parameters including:
  - ListID: ID of the list containing records to analyze
  - EntityID: ID of the entity type
  - EntityDocumentID: ID of the entity document configuration
  - Options: Optional configuration including DuplicateRunID
- contextUser: Optional user context for permissions

Returns: PotentialDuplicateResponse containing:

- Status: 'Success' or 'Error'
- ErrorMessage: Error details if failed
- PotentialDuplicateResult[]: Array of results for each analyzed record

Base class providing utility methods:

- parseStringTemplate(str: string, obj: any): string - Parse template strings
- timer(ms: number): Promise<unknown> - Async delay utility
- start() / end() / timeDiff() - Timing utilities
- saveJSONData(data: any, path: string) - JSON file operations

The duplicate detection process follows these steps:
The package integrates with these MemberJunction entities:
Create a .env file with:
# AI Model Configuration
OPENAI_API_KEY=your-openai-key
MISTRAL_API_KEY=your-mistral-key
# Vector Database
PINECONE_API_KEY=your-pinecone-key
PINECONE_HOST=your-pinecone-host
PINECONE_DEFAULT_INDEX=your-index-name
# Database Connection
DB_HOST=your-sql-server
DB_PORT=1433
DB_USERNAME=your-username
DB_PASSWORD=your-password
DB_DATABASE=your-database
# User Context
CURRENT_USER_EMAIL=user@example.com
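A missing variable tends to surface as an obscure failure later on, so it can help to check the environment at startup. The helper below is a minimal sketch, not part of the package; `findMissingEnvVars` is a hypothetical name, and the `REQUIRED_VARS` list is only illustrative, since which variables are mandatory depends on the AI model and vector database in use.

```typescript
// Illustrative list only; adjust to the providers actually configured.
const REQUIRED_VARS: readonly string[] = [
  'PINECONE_API_KEY',
  'PINECONE_HOST',
  'PINECONE_DEFAULT_INDEX',
  'DB_HOST',
  'DB_DATABASE',
];

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[]
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Typical call in a Node.js process once the .env file has been loaded:
//   const missing = findMissingEnvVars(process.env, REQUIRED_VARS);
//   if (missing.length > 0) throw new Error(`Missing: ${missing.join(', ')}`);
```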
Entity documents use template syntax to define how records are converted to text for vectorization:
// Example template
const template = "${FirstName} ${LastName} works at ${Company} as ${Title}";
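To make the substitution concrete, here is a standalone sketch of how a `${Field}` template might be expanded against a record. It does not reproduce the package's own template handling; `renderTemplate` is a hypothetical helper, and replacing missing fields with an empty string is an assumption.

```typescript
// Replaces each ${FieldName} placeholder with the matching value from the record.
// ASSUMPTION: fields that are missing or null become an empty string.
function renderTemplate(template: string, record: Record<string, unknown>): string {
  return template.replace(/\$\{(\w+)\}/g, (_match: string, field: string) => {
    const value = record[field];
    return value === undefined || value === null ? '' : String(value);
  });
}

// Single quotes keep ${...} as literal text rather than a JavaScript template literal.
const exampleTemplate = '${FirstName} ${LastName} works at ${Company} as ${Title}';
const exampleText = renderTemplate(exampleTemplate, {
  FirstName: 'Ada',
  LastName: 'Lovelace',
  Company: 'Acme',
  Title: 'Engineer',
});
console.log(exampleText); // prints "Ada Lovelace works at Acme as Engineer"
```

The rendered text is what would then be passed to the embedding model, so fields that distinguish one record from another are the ones worth including in the template.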
- @memberjunction/ai: AI model abstractions
- @memberjunction/ai-vectordb: Vector database interfaces
- @memberjunction/ai-vectors: Vector operations
- @memberjunction/ai-vectors-pinecone: Pinecone implementation
- @memberjunction/ai-vector-sync: Entity vectorization
- @memberjunction/core: Core MJ functionality
- @memberjunction/core-entities: Entity definitions

The package provides detailed error messages for common issues:
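try {
  const response = await detector.getDuplicateRecords(request, currentUser);
  if (response.Status === 'Error') {
    // The call completed but reported a failure in the response
    console.error('Duplicate detection failed:', response.ErrorMessage);
  }
} catch (error) {
  // Caught values are `unknown` under strict TypeScript, so narrow before reading .message
  const message = error instanceof Error ? error.message : String(error);
  console.error('Unexpected error:', message);
}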
For issues, questions, or contributions, please refer to the MemberJunction documentation or contact the development team.