For several months, our data science and development team has been building advanced matching logic, based on Artificial Intelligence (AI) and Machine Learning (ML), for identifying items in competitor shops. Assigning articles that lack a unique primary key, such as an EAN or GTIN, to our customers' articles, especially private labels, extends the scope to scenarios such as assortment planning and product development. In Dynamic Pricing, correct, verified article matching is also a key technology, because data with unique identifiers cannot be collected on every platform.
Our databases currently hold more than 15 million products from 100,000 online shops in 25 countries, and these numbers keep growing as the technology expands. In this article and over the coming weeks, we would like to tell you more about matching with AI and ML.
Matching quality for better insights
One of the biggest challenges in price monitoring is assigning products from different platforms and data sources to unique products. Crawling product data is only the first step of data collection; it becomes a valuable insight for online retailers only through intelligent product matching and verification of the collected article data. The resulting price and product data are made available daily or even hourly to enable rule-based dynamic pricing, optimize sales or identify trends.
Higher identity rates through Machine Learning
The range of products and categories keeps growing, and often no unique primary key is available. To cover this variety of offers, we are expanding our solution and now offer automated assignment through a multi-stage machine learning process.
When no primary key is available, the process first determines the largest possible set of available data points, then automatically checks and assigns matches at the level of categories, characteristics and attributes. Text mining tools and image recognition algorithms search for similarities in how the products are presented.
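To make the attribute-level comparison more concrete, here is a minimal sketch in Python. It is not our production logic: the record layout (`title`, `category`, `attributes`), the function names and the use of token-based Jaccard similarity are assumptions chosen for illustration, and image recognition is left out entirely.

```python
import re
from typing import Dict, Set


def tokens(text: str) -> Set[str]:
    """Lowercase a string and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tokens two sets have in common (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def compare_products(own: Dict, other: Dict) -> Dict:
    """Compare two product records on category, title and shared attributes.

    The record layout is hypothetical: each record has a 'title', a
    'category' and an 'attributes' dictionary.
    """
    shared_keys = set(own["attributes"]) & set(other["attributes"])
    equal_keys = [
        key for key in shared_keys
        if own["attributes"][key] == other["attributes"][key]
    ]
    return {
        "same_category": own["category"] == other["category"],
        "title_similarity": jaccard(tokens(own["title"]), tokens(other["title"])),
        "shared_attributes": len(shared_keys),
        "matching_attributes": len(equal_keys),
    }
```

The returned dictionary is simply a collection of data points for one pair of articles; how these points are weighted and scored is a separate step.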
AI-based evaluation with the Smart Score
Our Smart Scoring is based on Machine Learning and checks the matched data for inconsistencies at the text and image level.
The evaluation is separated by data type. Numerical data such as length, width and size, as well as membership in product bundles, are normalized in an automated process and evaluated with a learning score value. Combined product information with additional data, such as ingredients, is likewise normalized per data point.
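As an illustration of what normalizing numerical data can look like, the following Python sketch converts raw measurement strings to a common base unit and compares two normalized values. The unit table, the function names and the closeness formula are assumptions made for this example; the learning score itself is not modeled here.

```python
import re
from typing import Optional, Tuple

# Hypothetical conversion table: unit -> (base unit, factor to base unit).
UNIT_FACTORS = {
    "mm": ("mm", 1.0), "cm": ("mm", 10.0), "m": ("mm", 1000.0),
    "g": ("g", 1.0), "kg": ("g", 1000.0),
    "ml": ("ml", 1.0), "l": ("ml", 1000.0),
}

# A number (with "." or "," as decimal separator) followed by a unit.
_MEASURE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*([a-zA-Z]+)\s*$")


def normalize_measure(raw: str) -> Optional[Tuple[float, str]]:
    """Convert a string such as '1,5 l' to (1500.0, 'ml').

    Returns None if the string cannot be parsed or the unit is unknown.
    """
    match = _MEASURE.match(raw)
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))
    unit = match.group(2).lower()
    if unit not in UNIT_FACTORS:
        return None
    base_unit, factor = UNIT_FACTORS[unit]
    return value * factor, base_unit


def closeness(a: float, b: float) -> float:
    """Relative closeness of two normalized values: 1.0 means identical."""
    if a == b:
        return 1.0
    return 1.0 - abs(a - b) / max(abs(a), abs(b))
```

Once both articles are expressed in the same base unit, values such as "150 cm" and "1.5 m" become directly comparable.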
An ML-based evaluation model analyzes the assignment at article level, both semantically and syntactically. A match is used only once the matching data points reach a defined threshold value. Before publication, the final assignment is re-verified by a multi-layered decision logic and, if desired, can additionally be checked by our data service team in a hybrid procedure.
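The interplay of threshold, decision logic and optional manual check could be sketched as follows. This is a simplified illustration in Python, not the actual decision logic: the class `MatchCandidate`, the function `decide` and all threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MatchCandidate:
    matching_points: int   # number of data points that agree
    compared_points: int   # number of data points that were compared
    model_score: float     # assumed output of the evaluation model, 0.0 to 1.0


def decide(candidate: MatchCandidate,
           min_points: int = 3,
           accept_score: float = 0.9,
           review_score: float = 0.6) -> str:
    """Return 'accept', 'manual_review' or 'no_match' for a candidate.

    All thresholds are illustrative defaults, not real settings.
    """
    # Layer 1: too few matching data points, so the data is not used.
    if candidate.matching_points < min_points:
        return "no_match"
    # Layer 2: a high model score is accepted automatically.
    if candidate.model_score >= accept_score:
        return "accept"
    # Layer 3: borderline scores go to the optional manual check.
    if candidate.model_score >= review_score:
        return "manual_review"
    return "no_match"
```

For example, `decide(MatchCandidate(4, 5, 0.7))` returns `"manual_review"`, while a candidate with only two matching data points is not used regardless of its score.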
Non-hits are additionally evaluated with a deep learning neural network. Using probabilities learned from millions of historical matches, products are assigned at this level to further increase the identity rate.
The AI-based Smart Score rates each product match, and the result can be delivered to online retailers or manufacturers.
We already see many positive effects in the matching results for assortments without a primary key. For the first time, for example, private labels can be compared with other private labels or even with brands in the same category.
Beyond pure product matching for significantly improved automated repricing, aggregated assortment comparisons with competitors become possible, and trend barometers can be produced that track how assortments develop across different online channels. The insights gained can feed directly into product development or assortment planning.