The KMX platform has an intuitive user interface that makes it easy for users to search, analyze and visualize large volumes of unstructured data (text, images, documents). The KMX platform has benefits at the system, user and business level.
Treparel KMX gives information professionals tools that not only provide access to information but also make it possible to recognize patterns, grasp context and infer meaning by creating and applying models.
The KMX platform includes the following Big Data Text Analytics functionalities:
- Data source Importer
- Text Preprocessing
- Information Extraction
- Stop-lists for words / stemmers
- Multi language word lists
- Developing automated Classifiers
- Assisted Classifier Performance tuning
- Landscape Visualization
- Semantic Analytics to build Taxonomies & Ontologies
- Automated backups
- User management
- Collaboration & sharing
The technical benefits of the KMX platform are found in:
The Treparel KMX platform enables users to automatically find patterns in structured and unstructured data. It handles tables and numbers as well as text (web pages, e-mails, blogs, websites, patents, documents).
Data Mining done today…
Data mining is the science of extracting and determining patterns and trends from data. In most cases (>90%) this comes down to:
(A) classification, segmentation, clustering of the data or
(B) regression (forecasting) of the data.
Note that assigning a value to a group can be seen as a binary classification: does this value belong to the group, yes or no? Regression, by contrast, can be seen as a continuous classification: it estimates the likelihood (chance) of a certain value.
This is traditionally done in the following manner and steps:
- define a business problem (e.g. fraud handling)
- selection of data (transaction data of a bank)
- selection of the variables from the available data (e.g. name, address, transaction dates, number of transactions etc)
- cleaning of the data
- determine the model that needs to be built, e.g. y=f(x1, x2, x3, …, xn), where y is the likelihood of fraud and x1, …, xn are the variables from the available data
- selection of the algorithm that best fits this problem and model
- selection of part of the data to build the model y=f(x); usually one takes about 20% of the data to fine-tune the model
- test the model, on values where the answers are known (check if the model works)
- if it does not work, go back to step 5 (determining the model)…
- deployment of the model to detect fraud, and re-iteration of steps 2-9 to improve the model.
Conclusion: a lot of consultancy combined with a long and laborious process.
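The manual steps above can be sketched in a few lines. This is only an illustration of the traditional workflow, not of KMX itself: the data is randomly generated, the single variable x1 (number of transactions) and the simple threshold model are assumptions made for the example, and the 20% split follows the figure mentioned in the list.

```python
import random

# Hypothetical toy data, generated for illustration only (not real banking data).
# Each record is ((x1,), y) with x1 = number of transactions and y = 1 for fraud.
random.seed(0)

def make_record():
    is_fraud = random.random() < 0.3
    n_tx = random.gauss(40, 5) if is_fraud else random.gauss(10, 5)
    return (n_tx,), int(is_fraud)

data = [make_record() for _ in range(200)]

# Select part of the data to build the model (about 20%, as in the list above).
random.shuffle(data)
split = int(0.2 * len(data))
train, test = data[:split], data[split:]

def fit_threshold(rows):
    """Build y = f(x1): pick the threshold on x1 with the best training accuracy."""
    best_t, best_acc = None, -1.0
    for (candidate,), _ in rows:
        acc = sum((x > candidate) == bool(y) for (x,), y in rows) / len(rows)
        if acc > best_acc:
            best_t, best_acc = candidate, acc
    return best_t

def predict(threshold, x):
    """Return 1 (fraud) if x1 exceeds the threshold, else 0."""
    return int(x > threshold)

# Test the model on records where the answers are known.
threshold = fit_threshold(train)
accuracy = sum(predict(threshold, x) == y for (x,), y in test) / len(test)
print(f"threshold={threshold:.1f}, test accuracy={accuracy:.2f}")
```

If the accuracy on the held-out records is too low, the analyst goes back, changes the model or the variables, and repeats the loop, which is where most of the manual effort goes.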
KMX, through its patented software, automates the process described above. It will look into all possible models that may exist, leaving no stone unturned. The result is in fact a complete scan of all patterns in the data, revealing more than you ever imagined was hidden in your data.
Profiling and personalization are meaningless words to most people, because most software applications attempt them with inaccurate keyword-based or linguistic technologies, in most cases with disappointing results.
In contrast, KMX provides highly accurate analysis of a user’s information requirements, using implicit and, where required, explicit techniques to produce dynamic, real-time results that go far beyond users’ experiences and, in many cases, their expectations.
KMX offers industry-leading precision and recall. The platform uses Oracle technology (Oracle Database and Oracle Data Mining), which ensures high performance and scalability. The system automatically classifies documents into a given set of classes and indicates how well a given document matches each of a number of classes. This is useful for documents that are relevant to more than one class.
Treparel’s software is deployed to enhance knowledge discovery under a wide variety of capacity and performance requirements. Ever-increasing volumes of users and data, round-the-clock operations, the need for immediate results, and rapid, unpredictable growth in content and usage all impose rigorous requirements on Treparel’s underlying technology.
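For readers unfamiliar with the terms, the sketch below shows how precision and recall are commonly computed, and how a document can be matched to more than one class. The document IDs, class names, scores and the 0.5 threshold are all invented for illustration and do not come from KMX.

```python
def precision_recall(predicted, relevant):
    """Precision: share of returned documents that are relevant.
    Recall: share of relevant documents that were returned."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

def matching_classes(scores, threshold=0.5):
    """Return every class whose match score reaches the threshold, best first."""
    hits = [name for name, value in scores.items() if value >= threshold]
    return sorted(hits, key=lambda name: -scores[name])

# Hypothetical example: four documents returned, three actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)

# Hypothetical match scores of one document against three classes.
doc_scores = {"chemistry": 0.82, "pharma": 0.64, "legal": 0.05}
print(matching_classes(doc_scores))
```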
Treparel builds on industry standards and uses the latest and proven state-of-the-art technology and hardware, to optimize each system for the problem it needs to solve.
KMX is designed to be modular, multi-user, multi-process and multi-threaded, and to support a multiplexed communication model. This provides a demonstrably high-performance, high-capacity platform for data and text mining and visualization, both today and in the future.
Treparel packages a scalable and platform-independent core technology within modular servers that can be built into a technical solution that scales to the needs of the application – whether the target system is a single user or a large, heterogeneous corporate network.
Cloud/SaaS (Software as a Service)
Treparel KMX can be used as a client/server solution as well as hosted in the cloud. With the hosted option there is no need for your IT department to look after it. You don’t have to invest in expensive hardware or software, so you can be up and running with the swipe of a credit card (or a paid invoice). KMX can be implemented quickly. All users have access to the platform immediately and simultaneously, and can share models and results.
Depending on the method of implementation, KMX is delivered as a secured platform:
On Premise (for secured operations). KMX is installed in the client’s data center, behind the company’s firewall, where most security issues are covered within the network and handled by the company’s ICT department.
Off Premise (hosted in the cloud). Treparel deploys a personalized, encrypted log-in procedure for KMX (similar to what banks deploy for internet banking).
Treparel’s KMX™ enables organizations to seamlessly integrate with other systems. Founded on a technology that is modular by design and facilitates global distribution, Treparel has developed a flexible infrastructure that allows optional use of the latest Web Service standards to enable organizations to build innovative solutions. Built on Oracle technology, KMX can be rolled out in any environment.
More and more software developers and application or service providers are building KMX into their own solutions to enhance their Text Analytics functionality (as demanded by their clients).
Adding Intelligence to XML
The KMX core infrastructure enables all modules to communicate with one another through the KMX API. This API enables servers to interact. Communication with KMX is implemented over HTTP using XML.
The KMX API uses HTTP to allow custom-built applications (for example C++, Java, Python applications) to communicate with KMX core functionality.
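As a rough idea of what such an integration could look like from Python, the sketch below builds an XML request and parses an XML response. Only the use of HTTP and XML comes from the text above. The host name, the /classify path, the element and attribute names and the sample response are hypothetical and are not taken from the KMX API documentation.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_classify_request(base_url, classifier_id, text):
    """Build (but do not send) an HTTP POST carrying an XML body.
    The endpoint and XML layout are assumptions made for this sketch."""
    root = ET.Element("request")
    ET.SubElement(root, "classifier").text = classifier_id
    ET.SubElement(root, "document").text = text
    body = ET.tostring(root, encoding="utf-8")
    return urllib.request.Request(
        url=f"{base_url}/classify",
        data=body,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )

def parse_scores(xml_bytes):
    """Read class names and scores from a (hypothetical) XML response."""
    root = ET.fromstring(xml_bytes)
    return {el.get("name"): float(el.get("score")) for el in root.iter("class")}

# Hypothetical server address and classifier name.
req = build_classify_request(
    "http://kmx.example.invalid/api", "patents-v1", "A method for coating steel."
)
print(req.get_method(), req.full_url)

# Hypothetical response; a real client would obtain it with urllib.request.urlopen(req).
sample_response = (
    b'<response><class name="relevant" score="0.91"/>'
    b'<class name="irrelevant" score="0.09"/></response>'
)
print(parse_scores(sample_response))
```

The same pattern applies in C++ or Java: any language with an HTTP client and an XML parser can be used.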
Automated Classification of large document sets
Searching in large volumes of text always produces too many results, which are time-consuming to review manually. KMX provides a simple user interface and workflow for interactively training domain-specific text Classifiers that can be applied to search results. The classifier analyzes and ranks the results so that relevant documents are displayed first. Classifiers can be saved, shared and improved without the need for special technical expertise.
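The idea of training a classifier on a few labeled examples and using it to rank search results can be illustrated with a generic technique. The sketch below uses smoothed word log-odds (in the style of Naive Bayes); it is a stand-in chosen for brevity, not the patented KMX algorithm, and the training sentences are invented.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def train(labeled_docs):
    """labeled_docs: list of (text, is_relevant) pairs marked by the user.
    Returns a weight per word: positive means 'typical of relevant documents'."""
    pos, neg = Counter(), Counter()
    for text, is_relevant in labeled_docs:
        (pos if is_relevant else neg).update(tokens(text))
    vocab = set(pos) | set(neg)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    # Add-one (Laplace) smoothing avoids log(0) for words seen in one class only.
    return {
        w: math.log((pos[w] + 1) / (n_pos + len(vocab)))
        - math.log((neg[w] + 1) / (n_neg + len(vocab)))
        for w in vocab
    }

def score(weights, text):
    """Sum the weights of known words; unknown words contribute nothing."""
    return sum(weights.get(w, 0.0) for w in tokens(text))

def rank(weights, docs):
    """Order documents so that the most relevant ones come first."""
    return sorted(docs, key=lambda d: score(weights, d), reverse=True)

training = [
    ("solar cell efficiency improved by new coating", True),
    ("photovoltaic panel coating patent", True),
    ("football match results and league table", False),
    ("celebrity news and match gossip", False),
]
weights = train(training)

results = [
    "league table updated after the match",
    "new coating for solar panel efficiency",
    "weather forecast for the weekend",
]
ranked = rank(weights, results)
for doc in ranked:
    print(doc)
```

Adding more labeled examples and retraining corresponds to the interactive improvement of a Classifier described above.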
Treparel partners offer their users easy access to their full text datasources or line-of-business applications that are bundled with pre-developed Classifiers. Through this pre-classification users can use the Classifiers for quick access to large data sources like newsfeeds, blogs, media libraries, or scientific databases.
KMX is to text and content what Pandora (US) or Spotify (Europe) is to music.
Visualization of results
KMX enables users to visually explore sets of records drawn from an unlimited amount of data and text sources.
Applications of the analytical visualizations generated in KMX include:
- Getting better insights into the portfolio of documents and unstructured data
- Creating high-quality automated Classifiers, enabling detection of (hidden or unexpected) trends, patterns and relationships
- Finding concepts
- Using landscaping and alerting.
The core text mining capabilities of KMX do not depend on an intimate knowledge of English grammatical structure or that of any particular language. KMX treats words as abstract symbols of meaning, deriving its understanding through the context of their occurrence rather than a rigid definition of grammar.
Even Arabic and Asian scripts can be handled by KMX. KMX™ currently handles more than 30 international languages, and support for more is being added.
KMX Approach to Language
KMX is based on an advanced pattern-matching technology that exploits high-performance machine learning techniques to extract a document’s digital information content and determine the characteristics that give the text meaning. KMX technology is based on machine learning algorithms and does not use any form of language-dependent parsing or dictionaries. KMX treats words as abstract symbols of meaning, deriving its understanding through the context of their occurrence rather than a rigid definition of the language grammar.
KMX develops a statistical understanding of the patterns that occur in the content that it sees over time. The more information KMX has about a particular type of information (e.g. legal terms, pharmaceutical developments, technology, legislation, media etc.) the more understanding it will have of those topics. A new language can be thought of as simply another ‘type’ of information that KMX needs enough material to learn from. Therefore, it is possible to mix more than one language as long as the amounts for each language are sufficient to build its understanding.
The choice of language does not compromise the accuracy of the concepts and output results extracted by the KMX server. The underlying algorithm is the same regardless of the language used.
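A simple way to see how text can be compared without grammar rules or dictionaries is to treat it purely as a sequence of symbols. The sketch below uses character trigram counts and cosine similarity. This is a generic, language-independent technique chosen for illustration, and it is an assumption that it resembles what KMX does internally. The Dutch sample sentences are invented.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams; no parsing or word list is needed."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count profiles (0 = nothing shared)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up Dutch examples: two about patent applications, one about football.
patent_1 = char_ngrams("octrooi aanvraag voor zonnecellen")
patent_2 = char_ngrams("aanvraag octrooi zonnecel coating")
football = char_ngrams("voetbal uitslagen van het weekend")

same_topic = cosine(patent_1, patent_2)
other_topic = cosine(patent_1, football)
print(same_topic > other_topic)
```

The code never needs to know that the text is Dutch. The same functions work unchanged on any language or script, provided enough material is available.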
Semantic computing (in particular semantic search) is ‘search made smarter’ as it boosts accuracy by taming ambiguity via an understanding of context. It delivers the searcher a better match to searched-for content and information.
Semantic computing systems (including semantic search) require the development of knowledge-bases including complex ontologies, annotation tools and data curation processes. Because of the complexity of semantic systems, the results from working with specialized semantic tools are usually black boxes and as such are very inflexible. Training semantic systems is a costly process requiring specialized expertise and very disciplined management, causing long and expensive development processes as they require involvement from subject-matter experts.
KMX Approach to Semantic search
KMX provides a descriptive route to semantics, making sense of information in-the-wild, as generated online, on social networks, and in everyday communications like email, content management systems or application notes. Artefacts of semantic computing like taxonomies, ontologies, thesauruses, controlled vocabularies and metadata can be generated automatically using a statistical approach based on term frequency.
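One widely used statistical approach based on term frequency is TF-IDF weighting, which highlights terms that are frequent in one document but rare across the collection. The sketch below uses it to propose candidate vocabulary terms per document. TF-IDF is named here as an example of the general idea, and it is an assumption that KMX works along similar lines. The sample documents are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def top_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents does each term occur?
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {w: (c / len(toks)) * math.log(n / df[w]) for w, c in tf.items()}
        # Sort by descending score, then alphabetically to keep ties stable.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:k])
    return result

docs = [
    "the battery stores energy and the battery charges fast",
    "the enzyme binds the protein and the enzyme folds",
    "the court ruled and the court adjourned",
]
candidates = top_terms(docs, k=1)
print(candidates)
```

Words such as "the" and "and" occur in every document and therefore score zero, which is why no stop-list is needed in this small example. Candidate terms found this way could then be grouped into a controlled vocabulary or taxonomy.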