Treparel

  • News

    • Predictive Coding: a Multi-Billion-Dollar Saving Opportunity in Litigation

      Posted by Marketing on 22 May 2013

      In the US, an estimated US$20 billion is spent each year on attorneys reviewing documents linked to litigation or compliance. One of the fastest growing sub-categories in eDiscovery (and legal services in general) is the use of Predictive Coding technologies and techniques to replace linear human review processes.

      Talk to almost any organization about legal issues and invariably the subject of eDiscovery comes up as a thorny pain point. These discussions commonly focus on the high costs of eDiscovery related to document review. In this blog we provide an overview of the role of Predictive Coding (at Treparel we use the term Classification or Auto-Categorization) as part of eDiscovery.

      Predictive Coding = eDiscovery 2.0

      Predictive coding technology is a new approach to attorney document review that can be used to help legal teams significantly reduce the time and cost of eDiscovery. The predictive coding technology is relatively new to the legal field, and significant confusion about the proper use of these tools is pervasive.

      Predictive Coding is a software tool or service that takes a large set of documents and, with relatively minimal human input, codes or ranks them for you. Commonly, a predictive coding engine works by having a top-level reviewer/attorney look over a seed selection of documents and determine whether they are relevant or apply to a certain issue.

      Note: Other terms or concepts used in a similar way as Predictive Coding are: Classification, Suggestive Coding, Technology-assisted review (TAR), Automated Review, Auto-Categorization or Relevance Ranking.

      Use Cases and Business Benefits

      Predictive coding has slowly gained momentum in the legal community because many believe the technology can be more accurate than traditional review methodologies while simultaneously reducing review time and costs. Some of the key benefits and use cases of predictive coding technology are:

      • Reduced cost and time: The main reason predictive coding technology costs less and takes less time is that the technology requires fewer documents to be reviewed by humans. Instead of requiring humans to painstakingly review each document for responsiveness, the technology relies on human input to help prioritize important documents for review and eliminate the need to review other documents altogether. Review costs can be substantially decreased if the predictive coding software costs are less than the costs of manual review. A general rule to remember is that the fewer documents requiring manual review and the lower the cost of using predictive coding software, the more money saved.
      • Strategic negotiations: The ability to rank a large group of documents by estimated degree of responsiveness is also a valuable method for reducing costs and time. Using KMX, if only documents with a prediction score in the 80 percent range and higher appear to be responsive, a legitimate argument can be made that only the top 20 percent of documents should be produced. Reviewing and producing only the top 20 percent of documents, compared to reviewing all the documents, could result in significant time and cost savings for the producing party. The prioritization or filtering and ranking of documents can be used to eliminate the need to manually review documents with the lowest rankings. The more documents that can be eliminated without requiring human review, the faster the review process and the more money saved.
      • Early case assessment: Ranking documents by responsiveness also helps you find important documents quickly without requiring every document to be manually reviewed. The ability to identify the most important documents without first spending significant time and money sorting through other less important documents enables attorneys to assess the strength of their cases earlier. If key documents reveal a weak position, settling the case may be preferable to going to trial. On the other hand, if key documents are strong, then you gain leverage to help secure a better outcome through settlement negotiations or at trial. The ability to assess case strength early by ranking documents with predictive coding tools saves time and money.
      • Increased accuracy and reduced risk: Many believe predictive coding technology can determine document responsiveness better than humans. Accuracy is important because the risk of overlooking important documents could have severe consequences.
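
      The cost argument in the list above can be illustrated with a minimal Python sketch. The document IDs, prediction scores, the 0.80 cut-off and the flat per-document review cost are made-up illustration values, not KMX output or real pricing, and the function names are hypothetical.

```python
# Hypothetical prediction scores (0.0-1.0) per document ID; not real KMX output.
scores = {
    "doc-01": 0.95, "doc-02": 0.88, "doc-03": 0.81, "doc-04": 0.64,
    "doc-05": 0.52, "doc-06": 0.37, "doc-07": 0.22, "doc-08": 0.15,
    "doc-09": 0.09, "doc-10": 0.03,
}


def select_for_review(scores, threshold):
    """Return document IDs scoring at or above the threshold, highest first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, score in ranked if score >= threshold]


def review_cost(n_documents, cost_per_document):
    """Cost of manually reviewing n_documents at a flat per-document rate."""
    return n_documents * cost_per_document


COST_PER_DOCUMENT = 2  # assumed flat rate, arbitrary currency units

to_review = select_for_review(scores, threshold=0.80)
full_cost = review_cost(len(scores), COST_PER_DOCUMENT)
targeted_cost = review_cost(len(to_review), COST_PER_DOCUMENT)
savings = full_cost - targeted_cost

print(to_review)
print(full_cost, targeted_cost, savings)
```

      The sketch leaves out the cost of the predictive coding software itself; as the first bullet notes, a net saving only remains if that cost is lower than the manual review cost it replaces.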

      Human-selected training data

      The predictive coding system analyzes the initial seed examples and identifies references in the text such as people, concepts, places, products and materials to generate rules that will find further concepts of the same type. The system then uses mathematical algorithms (in KMX this is machine learning using a Support Vector Machine, or SVM) to apply these rules across the entire universe of documents (this number is unlimited) and rank or code them correspondingly. A law firm may then use the auto-coded document set as-is for production (meaning no more human eyes have to view the documents) or treat the machine rankings as a guideline, still performing a human review on the ranked documents, but in a much more targeted way. For example, a firm would place the best reviewers on the most highly ranked documents and lower-level reviewers on less relevant or applicable documents. Either way, the system saves law firms a lot of time and money.

      We read in an eDiscovery blog on a similar topic that “usually a human review of smaller sets somewhere between 1,800 and 2,500 documents will be enough to teach the system to auto-rank an unlimited number of remaining documents”. KMX, however, requires significantly less training (more about KMX’s approach to training: Optimize precision & recall of documents: How much training data do you need?). See the note at the bottom of the page.

      Is Predictive Coding defensible in court?

      A recent e-Discovery Journal poll showed what many critics of predictive coding have been espousing, namely that lawyers will fail to adopt the technology for fear of inadequate defensibility. However, these notions are outdated and simply need some explanation in order to be assuaged. First and foremost, the current standard of defensibility in document review includes human review and keyword searching, both of which have been shown to be highly unreliable. A 2008 TREC study analyzing the success of keyword searching indicated that on average, “Boolean keyword search found only 24% of the total number of responsive documents in the target data set.” Since this is the current court-accepted standard, it’s the only one you have to beat when defending a predictive coding solution. Because most predictive coding products, including KMX, offer ample ability to sample, as well as transparency and a full auditing capability, this shouldn’t be hard to do. Moreover, more official studies of predictive coding are currently being performed. The ongoing 2010 TREC Legal Track study, which aims to measure the effectiveness of predictive coding tools, showed numbers well above the 24% level, even with the least effective products, and concluded that “the assumption that manual review is more effective than technology-assisted review is not necessarily valid.” Previous information retrieval studies have shown again and again how inconsistent and flawed human review is. Since predictive coding only has to beat that standard to come out ahead, the bar is not actually that high.

      The Next Big Thing in eDiscovery

      Predictive Coding has topped most “next big trend” prediction lists in e-Discovery in recent years, and with just cause. The rising tide of eDiscovery in law and litigation is in no way abating, and cost and time savings are becoming of the utmost importance to litigants. With promptings from the bench to use technology above and beyond keyword search, the natural next step is to consider a type of automated review. Different technology companies, including Treparel, are offering ready-to-deploy integrations to embed best-in-class predictive coding functionality into your current eDiscovery solution.

      Predictive Coding in your Solution? Leverage the KMX API

      More on embedding the predictive coding KMX API: KMX for OEM

      Note:

      ✓ Recall: Refers to the proportion or percentage of the truly responsive documents within a defined document population that are identified as responsive. In other words, recall is a measure of completeness.

      ✓ Precision: Refers to the proportion or percentage of documents identified within a defined document population that are truly responsive. In other words, precision is a measure of exactness.
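
      As a small worked example of the two definitions in this note, the following Python sketch computes precision and recall from two sets of document IDs. The IDs and the function name are hypothetical and only serve the illustration.

```python
def precision_recall(predicted_responsive, truly_responsive):
    """Compute (precision, recall) from two collections of document IDs."""
    predicted = set(predicted_responsive)
    actual = set(truly_responsive)
    true_positives = len(predicted & actual)
    # Precision: share of the documents flagged as responsive that truly are.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the truly responsive documents that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical review outcome: 4 documents flagged, 5 truly responsive, 2 in common.
flagged = {"d1", "d2", "d3", "d4"}
responsive = {"d1", "d2", "d5", "d6", "d7"}

precision, recall = precision_recall(flagged, responsive)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

      Here two of the four flagged documents are truly responsive (precision 0.50), and two of the five truly responsive documents were found (recall 0.40).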

    • Meeting between HRH Princess Máxima and the Treparel CEO at the Week van de Ondernemer

      Posted by Marketing on 17 April 2013

      Treparel CEO at the table with HRH Princess Máxima at the Week van de Ondernemer

      HRH Princess Máxima at the Week van de Ondernemer 2013

      Utrecht, 11 April 2013. For 16 years the ‘Week van de Ondernemer’ (Week of the Entrepreneur) has been a large and highly regarded event for entrepreneurs in the Netherlands. It offers a programme with inspiring guests and top speakers from business, politics, science and the media. New this year is the introduction of the sub-theme Ondernemend Financieren (Entrepreneurial Financing).

      This year HRH Princess Máxima, in her role as a member of the Committee for Entrepreneurship and Finance, paid a special working visit to the WvdO and in particular took part in a round-table discussion on the possibilities of alternative financing for small and medium-sized enterprises (SMEs).

      The round-table discussion addressed the opportunities that alternative financing offers now that SMEs in particular are finding it increasingly difficult to attract financing. The Financieringsmonitor (Financing Monitor) that the Ministry of Economic Affairs sent to the House of Representatives at the end of last year shows that 52% of applications for debt capital by small companies are rejected (was 34% in the monitor covering the second half of 2011), and 4% for large companies (was 10%). Frequent contacts with (potential) providers of alternative forms of financing show that one of the bottlenecks they encounter is that they are little known among the categories of companies they target. The coalition agreement includes the following passage: “New alternative forms of financing such as credit unions, crowdfunding and SME bonds will be supported through promotion, the removal of obstacles in regulation, and the use of knowledge and existing instruments.” The round-table discussion between several providers of alternative financing and SME entrepreneurs, hosted by the Ministry of Economic Affairs and attended by HRH Princess Máxima, is a means of promoting alternative forms of financing.

      During the round table, chaired by Mr Rinke Zonneveld, Director of Entrepreneurship at the Ministry of Economic Affairs, four entrepreneurs and four providers of alternative financing spoke. TIIN Capital was represented by Hans Biesheuvel, Chairman of MKB Nederland and supervisory board member of and informal investor in TIIN Capital. Hans made an enthusiastic plea for entrepreneurs to invest informally with capital, knowledge and network.

      Treparel CEO Jeroen Kleinhoven in conversation with HRH Princess Máxima

      Jeroen Kleinhoven, recently appointed as managing director, introduced Treparel as ‘a little gem of big data innovation in the Dutch business landscape’ and shared experiences from the company’s search for growth financing. HRH Princess Máxima listened with interest to the various options and showed herself engaged with the advantages and disadvantages of the different forms of financing that complement the traditional role of banks. Jeroen gave examples of his positive experience with TIIN in the ‘hands-on and eyes-on collaboration’, in which the company can rely on fellow entrepreneurs at TIIN as sounding boards when acquiring new customers and partnerships. HRH Princess Máxima thanked the participants afterwards and said she had found the discussion enjoyable and interesting. This means that she heard many new things, in a pleasant setting. On leaving, she personally wished Treparel every success.

      More about the Week van de Ondernemer and this round-table discussion at: Nu Zakelijk

    • Analyzing the Graphene patent landscape reveals a global race

      Posted by Marketing on 10 April 2013

      Guest blog by: Elicet Cruz PhD. of IFI Claims partner IALE Tecnologia (Spain)

      A recent BBC News article reported that the surge in research into the novel material graphene reveals an intensifying global contest to lead a potential industrial revolution. According to the BBC: “Latest figures show a sharp rise in patents filed to claim rights over different aspects of graphene since 2007, with a further spike last year. China leads the field as the country with the most patents. The South Korean electronics giant Samsung stands out as the company with most to its name.”

      IALE used the data from Treparel’s partner IFI Claims and the KMX technology to provide more insight into the recent development of the graphene patent landscape.

      Figure 1: Graphene could find uses in computing, energy, medicine and other fields

      Graphene is a material which through its extraordinary properties has attracted the attention of both scientists and industry worldwide.

      It is an extremely thin sheet, one carbon atom thick, with a hexagonal honeycomb lattice structure containing 50 million atoms per centimeter. In this regard it is considered a two-dimensional material. When graphene layers are stacked one above the other, we obtain graphite. When a layer is rolled into spheres we obtain fullerenes, and when it is rolled into tubes, carbon nanotubes are obtained. All of these three-dimensional shapes are materials from the same family.

      In addition to its thinness, graphene stands out for its high transparency, flexibility, strength, impermeability and high electrical conductivity. Its conductivity is superior to any known metal. Furthermore, it is considered an environmentally friendly material and is relatively cheap to produce.
      Due to these characteristics, graphene is considered a material with great future market potential, with applications in telecommunications (mobile telephony …), electronics (chip manufacturing …), medical – pharmaceutical, energy (solar panels), etc.

      Patents on Graphene

      Figure 2. Graphene, graphite, carbon nanotubes and fullerene

      Through a general search on graphene in the IFI CLAIMS Global Database, we obtain 12,878 granted patents and applications worldwide through December 2012. Figure 3 shows a patent landscape produced by the KMX patent analytics tool. The figure shows graphene patents, clustered according to main areas and technological development lines. As a next step, we create a KMX free classifier. We do this by labeling a few patents based on their location in the landscape and a review by the analyst. Then we train the classifier using the KMX Support Vector Machine algorithm developed by Treparel Information Solutions. The training process takes the labeled patents and uses them as a training set. Based on the full text of the patents, KMX applies labels to the entire collection based on their similarity to the training set. This interactive process of labeling, training and classifying can be repeated over and over again until we obtain the best classification.
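
      The label, train and classify loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it swaps in a simple nearest-centroid classifier over bag-of-words vectors with cosine similarity instead of the KMX Support Vector Machine, whose implementation is not shown in this article, and the example texts and cluster labels are invented.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lower-cased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled):
    """Build one centroid per label by summing the term counts of its texts."""
    centroids = {}
    for text, label in labeled:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def classify(centroids, text):
    """Assign the label whose centroid is most similar to the text."""
    vector = vectorize(text)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Round 1: the analyst labels a few (invented) patents and trains.
labeled = [
    ("graphene film deposited on substrate", "film"),
    ("carbon nanotubes and nanostructures", "nanotubes"),
]
model = train(labeled)
first_pass = classify(model, "carbon coating layer")

# Round 2: the analyst reviews the result, adds one more label and retrains.
labeled.append(("carbon film coating", "film"))
model = train(labeled)
second_pass = classify(model, "carbon coating layer")

print(first_pass, second_pass)
```

      In this toy run the unlabeled text is first assigned to the nanotubes cluster because it shares only the word "carbon" with the training set; after one more labeled example it moves to the film cluster, which mirrors the repeated refinement the analyst performs.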

      Figure 3. Graphene patents used to train the classifier (above), and the entire collection after training and applying the KMX classifier (below).

      The top part of figure 3 shows the patents used as the training set; the lower part shows the classification results after training and applying the classifier.

      The two biggest clusters (“film, graphene, substrate” containing 1,519 patents and “nanotubes, carbon, nanostructures” containing 2,096 patents) were extracted to create two new data sets. These were further classified in order to explore the applications and developments specifically related to them (Figure 4).

      Figure 4. Data sets for "film, graphene, substrate" and "nanotubes, carbon, nanostructures". Produced by KMX with data provided by IFI CLAIMS Patent Services.

      The number of graphene-related patents has grown rapidly over the last 10 years, with more than 57% of the patents published in 2011 and 2012 (Figure 5).

      Figure 5. Evolution of patents on graphene. Source: KMX and IFI CLAIMS Global Database.

      Looking at International Patent Classification (IPC) codes, there are 4,935 codes covering graphene patents.  The most common are shown in Figure 6.

      Figure 6. Main IPC codes related to Graphene. Source: IFI CLAIMS Global Database.

      The main patented subject matter relates to carbon preparation (C01B 31/02), graphite, including modified graphite (C01B 31/04), manufacture of carbon filaments (D01F 9/12), and nanotechnologies for materials or surface science (B82Y 30/00). All codes show a growing trend over the last three years (2010-2012).

      Patents have been published in more than 30 priority countries. 96% of the patents during the period 1994 through 2012 were filed in only 8 countries. The United States (US), China (CN) and Japan (JP) are the most prominent countries in the period (Fig. 7).

      Figure 7. Main countries during the period 1994-2012. Source: IFI CLAIMS Global Database.

      The covered patents belong to 6,831 families, 55% of which are single-patent families. Seven large families stand out due to their size (13-33 patents). Figure 8 shows the largest families, the subject areas claimed and the organizations which are the assignees of these families.

      Figure 8. Largest patent families. Source: IFI Claims Global Database. Created with KMX.

      Geim and Novoselov – The Original Researchers

      Graphene was discovered in 2004 by two Russian-born researchers, Andre Geim (Sochi, 1958) and Konstantin Novoselov (Nizhny Tagil, 1974), professors at the University of Manchester (UK). They were awarded the Nobel Prize in Physics in 2010.

      Until 2010, Geim and Novoselov had not applied for any patents on this material, according to the article “Andre Geim: in praise of graphene” published in Nature News in October 2010.

      In this interview Geim explained the reasons:

      “We considered patenting; we prepared a patent and it was nearly filed. Then I had an interaction with a big, multinational electronics company. I approached a guy at a conference and said, “We’ve got this patent coming up, would you be interested in sponsoring it over the years?” It’s quite expensive to keep a patent alive for 20 years. The guy told me, “We are looking at graphene, and it might have a future in the long term. If after ten years we find it’s really as good as it promises, we will put a hundred patent lawyers on it to write a hundred patents a day, and you will spend the rest of your life, and the gross domestic product of your little island, suing us.” … I considered this arrogant comment, and I realized how useful it was. There was no point in patenting graphene at that stage. You need to be specific: you need to have a specific application and an industrial partner.”

      However, after this interview, Geim and Novoselov decided to patent graphene innovations associated with specific applications, as shown in Figure 9.

      Figure 9. Patents on graphene with Geim and Novoselov as inventors (INV) or applicants (PA). Source: IFI CLAIMS Global Database. Created with KMX.

      Geim (5 patents, 3 families) and Novoselov (7 patents, 4 families) have patented both together (5 patents, 3 families) and separately (3 patents, 1 family). The jointly filed patents are presented in the top part of Figure 9.  Novoselov’s patents appear in dark blue in the bottom part of Figure 9. These patents are associated with the cluster “fiber, polymer, composite.”

      There is no doubt that graphene has great potential within multiple industries.  This potential was validated by a Nobel Prize for the researchers who first synthesized it.  The high level of patent activity, especially in the years 2010-2012, reinforces this view.

      This article was originally published by IFI CLAIMS.

    • 2013: Convergence of Big Data and Cloud Computing

      Posted by Anton Heijs on 31 December 2012

      2013: Convergence of 2 major technological trends

      "Big Data" in Google Search Trends

      Delft, December 31, 2012: At the very end of 2012, only hours away from the new year, while politicians of the Republican and Democratic parties are still working on a solution for the fiscal cliff in the US, it is good to look back and important to look forward. In 2012 the term “Big Data” certainly gained a lot of attention, and although it may already mean many things to many people, it is clear that in 2013 we will hear more about analyzing very large, complex data sets. Cloud computing has also become very important as an approach to processing large amounts of data when needed. Now that we have scalable approaches for very large data sets and compute jobs, one would expect that we have made big steps forward in being able to learn more from more data.

      Demand for Quantitative Decision Making

      Since there is a strong demand for quantitative decision making, supported by evidence that one is choosing the best solution (most accurate, with minimal risk), one expects to see a convergence of big data technologies and cloud computing technology. There is much activity, and every week we read about well-funded start-ups exploring, developing and bringing to market new technologies in both domains. But a large number of technology development activities does not guarantee convergence towards technology that enables large-scale knowledge discovery. What is needed is a focus on the essentials that enable this. Of course many aspects are important, such as technology standards, economies of scale and proper approaches to data privacy.

      "Cloud Computing" in Google Search Trends

      One essential component is the ability to take into account all relevant data needed for analyzing all knowledge on a certain topic. This means that we need technology that lets us analyze data from tables, text, images and graphs (network data) of whatever size (tera- to petabytes) and number of observations (variables), and that can combine static data with dynamic (temporal) data. We also need technology to detect and extract all patterns (regularities) in a data set and automatically build accurate and reliable mathematical models describing such patterns. Even if we have this and are able to run it in a large-scale setting, we still have another “essential nut to crack”: determining the meaning of the model that describes a pattern or a trend in a data set. In other words, we need large-scale support for providing the semantic meaning of patterns and trends in data.

      Automatic Model Generation in Data Patterns

      Given the strong progress in research and development in big data analytics and cloud computing, and the market demand for it, it is logical to expect that in 2013 we will see much progress in the automatic generation of models of patterns in data sets, with support for the semantics of these models. This will enable intelligent decision making that is also based on understanding the context of multiple patterns in a data set, which is a step further than decision making based only on an accurate model of a single pattern or trend in a complex data set.

    • Open Call for Users: Data for Business and Product development

      Posted by Marketing on 28 November 2012

      Receive up to 7000 euros as an end user of the flagship applications developed for one of Europe’s leading data pooling platforms. If you are a professional or an SME and interested in data tools for patents, tenders, partner matching or customer feedback, then the Fusepool team is interested in your help to deliver usable and effective applications that match your user needs as closely as possible.

      Building on the existing applications of Fusepool partners in the patent and tender business, flagship applications are developed to provide clear added value of linked data demonstrated not only in principle but also in practice. Select one or more of the four applications described in the right column and apply today.

      Put Fusepool to the test, apply today: www.fusepool.eu/join/calls

      The Fusepool Data Pooling Platform

      PatentExplorer

      Designed to help boost your innovation potential, the Fusepool PatentExplorer gives you the tools to analyze patents related to your products and business processes, and to assess the value of your technology. Use it for finding and visualizing the relevant patents around your core technologies, and perform a SWOT analysis of the innovation position.

      FundingFinder

      The Fusepool FundingFinder provides an innovative recommender system for consolidating, managing, matching and distributing information about tenders and funding opportunities. It is the single point of access where relevant details are available, structured and interlinked.

      PartnerMatch

      The Fusepool PartnerMatch helps find partners with similar or complementary capabilities for product research and development, by offering fine-grained partner matching capabilities. The partner matching capabilities are enriched through the other Fusepool use cases so that selecting a partner also shows related tenders, patents, or publications.

      CustomerFeedback

      Fusepool CustomerFeedback features algorithms for opinion mining to provide feedback to the chain’s components such that their accuracy can improve over time, learning from past iterations. With a small amount of user feedback, the components become “learnable” to improve accuracy on a wide variety of inputs that the system has not encountered previously.
      Download the Fusepool Open Call PDF flyer

    • How Big Data News Analytics leads to Data Driven Journalism

      Posted by Anton Heijs on 23 November 2012

      In the old days reading news was easy. We read what journalists wrote and trusted that they had analyzed the news and selected the most important developments, assuming proper fact-finding before they published what was presented as the truth. The internet has changed this forever. The internet is a ‘free’ source of information for journalists as well as their readers. The journalist now needs to demonstrate that he did his fact-finding using up-to-date sources of information and important news.

      A new Era in News finding and creation

      Data driven journalism takes news a level further with insight from data. The journalist is becoming a data scientist who analyzes available data sources to support a story with in-depth analysis of relevant data, and thus adds much more value to the story compared to the free information on the internet. Additionally, data driven journalism gives readers an inside look at important trends and developments in society, based on the analysis and interpretation of public data often provided by government agencies.

      The Guardian (UK) is one of the best-known quality newspapers and strongly supports data driven journalism by providing access to its data.

      The quality of news is determined by the quality of the data, which involves many aspects including the accuracy and completeness of the data. Data Driven Journalism is moving in the direction of analyzing Big Data; to derive signal from noise a journalist needs access to professional, reliable data, analysis methods and software tools.

      Analysing Big News from The Guardian

      To demonstrate how data journalism could work using advanced search tools, we at Treparel used The Guardian API to extract data sets on specific topics and analyze them. As an example, we searched The Guardian API for all documents related to “big data”, which resulted in a total of 10,521 documents. We had to analyze this set in more detail because, as you can imagine, it contains a considerable amount of noise.
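
      The article does not show how the API was queried. As a rough sketch, a search request for the phrase “big data” could be assembled as below. The endpoint and the `q`, `page`, `page-size` and `api-key` parameters follow the public Guardian Open Platform content search as we understand it; treat them as assumptions and check the current API documentation. The key is a placeholder, the helper name is hypothetical, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed content search endpoint of the Guardian Open Platform.
SEARCH_ENDPOINT = "https://content.guardianapis.com/search"


def build_search_url(query, api_key, page=1, page_size=50):
    """Build a paged search URL; parameter names are assumptions (see lead-in)."""
    params = {
        "q": query,
        "page": page,
        "page-size": page_size,
        "api-key": api_key,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# Quoting the phrase asks for the exact term rather than the two separate words.
url = build_search_url('"big data"', api_key="YOUR-API-KEY")
print(url)

# Round-trip check: the query string decodes back to the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```

      Collecting a full result set of this size would mean repeating the request for successive `page` values and storing the returned articles for analysis.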

      Visualization of all 10,521 news articles in which big data is mentioned.

      The annotation terms on top of each cluster provide an overview of the different topic areas where big data is mentioned in the news articles provided by the Guardian.

      From this landscape visualization we immediately determine important topic areas (called ‘clusters of text’). These clusters with the most important words (‘annotations’) help us to easily identify the most addressed topics.

      A closer look at Google vs Microsoft

      Based on this visualization we notice that Google, Apple and Microsoft are mentioned often in ‘big data’ articles. We decide to filter where in these clusters these companies are mentioned to understand the relationship between the company and the topics of the clusters.

      All articles on ‘big data’ where Google is mentioned (shown as green dots)

      Google is mentioned in 1,274 of the 10,521 news articles.

      All articles on big data where Microsoft is mentioned (shown as red dots)

      Microsoft is mentioned 928 times, as shown by the red dots in the visualization. These documents are concentrated around games and Facebook, video and search, and mobile internet. The articles in which Google is predominantly mentioned focus more on search/video and mobile internet.

      All articles on big data where Apple is mentioned (shown as blue dots)

      Apple is mentioned 1,051 times, as shown by the blue dots, which is about 10% of the full set of 10,521 articles. Apple is mentioned across a much broader range of topics, with a focus on mobile phones and internet.
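      The shares behind these counts are easy to verify. The short sketch below recomputes them from the figures quoted in the text; it assumes that each count refers to a number of articles, which the text states for Google but not explicitly for Microsoft and Apple.

```python
# Counts quoted in the text; treating all three as article counts is an assumption.
TOTAL_ARTICLES = 10521
MENTIONS = {"Google": 1274, "Microsoft": 928, "Apple": 1051}


def mention_shares(counts, total):
    """Return each company's share of the full article set, in percent."""
    return {name: round(100 * count / total, 1) for name, count in counts.items()}


shares = mention_shares(MENTIONS, TOTAL_ARTICLES)
for company, share in shares.items():
    print(f"{company}: {share}% of {TOTAL_ARTICLES} articles")
```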

      To exclude the least relevant documents, we select all news articles that have a calculated relevance ranking above 80%. This helps us limit the set to the most important (or relevant) articles on ‘big data’ about Google, Apple (white/yellow dots) and Microsoft (red dots), 215 articles in total.
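      The selection step amounts to a threshold filter on the relevance score. The sketch below shows the idea on made-up data; how KMX calculates the score is not described here, so the scores, titles and function name are hypothetical.

```python
def select_relevant(scored_articles, threshold=0.80):
    """Keep articles scoring strictly above the threshold, highest score first."""
    kept = [(title, score) for title, score in scored_articles if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical (title, relevance score) pairs for illustration only.
articles = [
    ("Tablet launch review", 0.93),
    ("Football transfer news", 0.12),
    ("Search privacy debate", 0.85),
    ("Cloud pricing update", 0.80),
]

relevant = select_relevant(articles)
```

      Note that an article scoring exactly 80% is excluded, because the text selects rankings above 80%.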

      News articles ranked by relevance on internet technology.

      Now that we have excluded irrelevant articles, we can much better analyze which articles are the most important with respect to internet technologies.

      Is there a story?

      Through this analysis we find some insightful, relevant articles within a general theme like ‘big data’.

      There is a cluster on ‘cloud computing’ and one on ‘videos of YouTube’, but dominant in the centre are the articles about Apple’s tablet and mobile phone technologies (iPad and iPhone). Related to this are the articles on patents, where Nokia is important because it owns many basic patents. When we look for the topics that are important in relation to Microsoft, we see its Windows OS but also games and the Xbox (where Sony pops up as well). If we then look at the important articles related to Google, we find ‘search technologies’ and ‘privacy’ related topics.

      We could now ask ourselves how this evolved over the years from 2000 to 2012. Since the most discussed topics are related to Apple, we select Apple and visualize all articles over time, using blue rings for Apple and a color mapping from red (2000) to white (2012). This gives the final visualization shown below.
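      A red-to-white color mapping of this kind can be sketched as a linear interpolation over the publication year. The version below is a minimal illustration, not the KMX implementation; it assumes pure red (255, 0, 0) for 2000, pure white (255, 255, 255) for 2012, and clamps years outside that range.

```python
def year_to_rgb(year, first_year=2000, last_year=2012):
    """Interpolate linearly from red (first_year) to white (last_year)."""
    clamped = min(max(year, first_year), last_year)
    fraction = (clamped - first_year) / (last_year - first_year)
    # Red stays at full intensity; green and blue rise together toward white.
    level = round(255 * fraction)
    return (255, level, level)


palette = {year: year_to_rgb(year) for year in range(2000, 2013)}
```

      Older articles then appear in saturated red and recent ones fade toward white, so a cluster's dominant color gives a quick impression of its age.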

      Trend of all articles from 2000 (red) to 2012 (white) related to Apple (blue rings) and all other news articles.

      Conclusion? Microsoft is entering the market late ….

      Looking at both visualizations, we notice that ‘big data’ is rapidly getting more media attention. They also show that since 2010 Apple has been gaining more interest from The Guardian than Google and Microsoft. Given that the articles are about technology, this demonstrates that competition is gearing up in ‘big data’ as well.

      (Disclaimer: for this analysis we used KMX on a simple Windows desktop PC. It took us less than 10 minutes to analyze over 10,000 documents.)

      Continue Reading

    • CEPIUG President: New ways to access patent information

      Posted by Marketing on 15 October 2012 0 Comments

      Guest blog by: Aalt van de Kuilen, President CEPIUG.

      Patent discovery: making specialist tools available for business users

      For many years, patent searchers retrieved patent information by using classical hosts (STN, Questel Dialog and ORBIT in the old days). A specific command language was essential for performing searches with these tools, and a good understanding of this language was a basic requirement.

      This was very useful for retrieving accurate legal information and finding relevant information for patentability, patent clearance as well as material for oppositions and due diligence. This information was mainly intended to be used by Patent Attorneys, to give a legal opinion on certain projects and/or products.

      Information overload leads to growing need for business user friendly tools

      Over the last years, a new group of clients coming from the “business” has become more and more interested in patent information (as well as trends in patenting). As a result, requests for patent information are no longer made only for legal opinions but also for more business-related decisions. These “new clients” also want to receive a different kind of information out of patent data.

      Growing need for patent information

      We see a growing need for information relevant for more strategic/marketing decisions.

      To retrieve this information, new tools are needed that create more landscape-like output. In addition, the ability to handle bigger sets of data and perform more statistical analysis is a requirement.

      New initiatives have been undertaken to satisfy the needs of searchers, and several tools have been developed, such as tools for semantic searching.

      Over 90% relevancy is promising

      Treparel has also come up with the KMX tool to handle larger sets of patent data. The basic principle is that feeding the system with some documents that are particularly relevant to the topic makes it possible to retrieve “similar” documents almost immediately and with high precision.
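      To make the idea of retrieving “similar” documents from a few relevant examples concrete, here is a minimal sketch that ranks candidate texts by TF-IDF cosine similarity to a handful of seed texts. This is a generic technique swapped in for illustration; the method KMX actually uses is not described in this post, and all function names and sample texts below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (term -> weight) per text."""
    token_lists = [tokenize(text) for text in texts]
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n_docs = len(texts)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency, so no weight is zero.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(seed_texts, candidate_texts):
    """Rank candidates by similarity to the summed vector of the seed texts."""
    vectors = tfidf_vectors(seed_texts + candidate_texts)
    seed_vectors = vectors[:len(seed_texts)]
    candidate_vectors = vectors[len(seed_texts):]
    profile = Counter()
    for vector in seed_vectors:
        for term, weight in vector.items():
            profile[term] += weight
    scored = [(cosine(profile, vector), text)
              for vector, text in zip(candidate_vectors, candidate_texts)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical seed and candidate texts for illustration only.
seeds = ["graphene transistor with high electron mobility",
         "graphene sheet electrode for batteries"]
candidates = ["coffee machine heating element",
              "transistor using a graphene channel layer"]
ranked = rank_by_similarity(seeds, candidates)
```

      A relevancy figure such as the one mentioned below would then be the fraction of top-ranked documents that a searcher confirms as relevant.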

      First trials with the software look very promising, with a relevancy of over 90% for certain topics. Although we are not there yet, we think KMX is a product with future potential that is worth developing and improving further.

      We are looking forward to seeing the next steps in the development of this product. As president of the CEPIUG I will strongly support these initiatives and all those who are looking for new ways of accessing bulk sets of patent data.

      Aalt van de Kuilen, President CEPIUG

      About CEPIUG

      The CEPIUG (Confederacy of European Patent Information User Groups) was founded in 2008 and it aims to be a platform for cooperation for the Patent Information User Groups in Europe.

      Continue Reading

    • IP Diversification at Fujifilm: from photofilms to cosmetics

      Posted by Marketing on 15 October 2012 0 Comments

      Guest blog by: Enric Escorsa O’Callaghan of IFI Claims partner IALE Tecnologia (Spain).

      IP diversification by Fujifilm: from photofilm to cosmetics

      Fujifilm’s Astalift Anti Ageing Skin Care

      Antiaging at Fujifilm

      In 2006 Fuji Photo Film Co., Ltd. entered the cosmetics sector by launching a series of new anti-aging products. For some it was surely shocking to see a company that had operated for decades in the photography industry move into a completely different sector such as healthcare.

      In this article we try to understand the diversification process followed by the Japanese company, using the IFI CLAIMS Direct patent database and Treparel KMX Patent Analytics software.

      Fuji Photo Film has many years of experience studying the properties of collagen. As explained by Andrzej Brylak, Fujifilm’s European Director, “Collagen is a key ingredient in the emulsion film and a material widely used by the cosmetics industry. It prevents oxidation from exposure to light and this is a major problem for protecting rolls of film as well as for preventing skin damage.”

      Apart from Collagen, Fuji also focuses its research on other healthcare related areas such as the control of free radicals or the improvement of absorption and penetration processes.

      Patent portfolio diversification

      Let’s study Fuji’s patent portfolio over the last few years to track this diversification process.

      We search by applicant/assignee in the IFI CLAIMS Global Patent Database and obtain all patents from Fujifilm. We import the text fields of each patent (title, abstract, claims, description) into the KMX Text Analytics tool. Within KMX, we can search and highlight documents containing specific terms such as “collagen”, “free radicals”, “reducing agent”, “skin”, etc. A representative landscape visualization is shown below.
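      Outside KMX, the term search over the imported text fields can be approximated with a few lines of code. The sketch below is only an illustration of the idea: the record layout (a dictionary with id, title, abstract, claims and description keys), the function names and the sample patents are all hypothetical.

```python
TEXT_FIELDS = ("title", "abstract", "claims", "description")


def matched_terms(patent, terms, fields=TEXT_FIELDS):
    """Return the search terms that occur in any of the patent's text fields."""
    text = " ".join(patent.get(field, "") for field in fields).lower()
    # Plain substring matching: "skin" would also match a word such as "skinny".
    return [term for term in terms if term.lower() in text]


def highlight(patents, terms):
    """Map each patent id to the terms it matches, skipping patents without a match."""
    hits = {}
    for patent in patents:
        found = matched_terms(patent, terms)
        if found:
            hits[patent["id"]] = found
    return hits


# Hypothetical patent records for illustration only.
patents = [
    {"id": "P1", "title": "Photographic emulsion", "abstract": "Uses collagen as binder."},
    {"id": "P2", "title": "Skin care composition", "claims": "A reducing agent against free radicals."},
    {"id": "P3", "title": "Lens mount", "abstract": "Mechanical coupling."},
]
terms = ["collagen", "free radicals", "reducing agent", "skin"]
hits = highlight(patents, terms)
```

      The matched terms per document are what a tool can use to color the documents in a landscape view.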

      1. Create an IP Landscape

      Figure 1: Brushing and coloring patents according to textual matches with KMX.

      As expected, Fujifilm had been working on collagen for several years. But when did Fujifilm begin developing other potential applications for collagen? When did the shift to a radically different market begin? The landscape map is a start for answering those questions, but it is perhaps not the most accurate approach, as some related processes can be described with synonyms or more abstract terms.

      Continue Reading

    • Trends in large document analysis

      Posted by Anton Heijs on 1 October 2012 0 Comments

      Analysing large patent portfolios

      Today’s economy is more and more a global economy in which information becomes available very fast. Technological innovation has become an important driver for many economies and global companies. Increased competition from more companies leads to lower margins on many products and shorter product life cycles. Many companies outsource their manufacturing processes to low-wage countries and invest in R&D to stay competitive. Companies look to obtain a faster return on their R&D investment, which directly drives the importance of a solid IP strategy. The irreversible change that has taken place is that many global companies have shifted from production-based to knowledge-based competitiveness.

      Revenues from R&D investments used to come from selling products, but now there is an additional value extraction point: IP licensing. This requires an attractive IP portfolio and provides the opportunity to obtain revenues directly after the R&D process. Creating value from IP and maximizing ROI from R&D investments requires a strategy, which in turn requires a thorough SWOT analysis of all IP around the relevant technologies.

      Past and Present processes to fund R&D and IP development

      The traditional IP development process (revenue from product sales generates money to invest in R&D)

      The new process where revenues from IP generate revenue directly after the innovation (to be invested directly in R&D).

      The number of patent filings is still growing, which leads to a growing need for analysis of large patent collections to optimize revenue from IP licensing.

      Continue Reading

    • Fueling growth of SME companies in Europe

      Posted by Anton Heijs on 10 September 2012 0 Comments
      FP7 is Europe's largest funding programme for Research

      Fusepool to develop online SME services on the use of public open data

      In a global economy, stimulating innovation is essential for today’s economies, and the European Union 7th Framework Programme (FP7) plays a very important role in this. Treparel is part of a consortium called Fusepool.net that was granted a two-year project. The Fusepool project will deliver a software platform to help Small and Medium Enterprises (SMEs) in Europe analyze their innovative capabilities and opportunities, and also find possible partners and funding opportunities. The analysis of large text document sets is a key component of the project.

      Fusepool fuels business growth of SME in Europe

      Treparel is contributing to the project with text analytics and visualization technology that will enable SMEs to perform patent landscaping analysis on their technology area, to learn where their strengths and weaknesses are, and to identify opportunities and threats. In such a SWOT analysis, the opportunities and threats are also important for partner matching and for finding possible research funding opportunities, which likewise rely on large-scale text analytics techniques. The other participating partners are the Bern University (BUAS), which is the coordinator, Xerox Research, Searchbox, Geox and the European Network of Living Labs.

      You can watch the project evolving over time on the special website >> Fusepool.net

      Continue Reading

  • About Treparel

    Treparel is a leading global software provider in Big Data Text Analytics and Visualization.

    The KMX platform allows organizations to enhance innovation processes, improve competitive advantage, mitigate litigation risk and cost, and manage interactions with customers by gaining insights from numerous sources of unstructured data (text, application notes, images, blogs, email and patents).

    Global companies, government agencies, software vendors and data publishers use Treparel KMX text analysis software to gain faster, reliable and precise insights into large, complex unstructured data sets, allowing them to make better informed decisions.

Copyright 2006-2013. Treparel, Delftechpark 26, 2628XH Delft (The Netherlands) - info@treparel.com - Tel: +31 15 2600455