Who’s Big in Big Data?

05 November, 2013

Uncovering the real Big Data Innovators using Big Data technology

As often happens with new concepts that disrupt established ways of dealing with today’s problems, Big Data has become one of those fashionable terms that everybody loves to talk about, yet few agree on a single, concise definition of what it actually means. Many companies claim they are into Big Data, but what are they actually doing? In this post we use Treparel KMX text analytics software to analyze and visualize the activities of Big Data-related companies based on information from the CrunchBase and IFI Claims databases.

This guest blog was written by Enric Escorsa O’Callaghan, PhD, CEO of IALE Tecnologia (Spain).

Key Findings

  • About 11,000 CrunchBase companies have a core business related to managing big volumes of data, with software and enterprise as the main areas of focus.
  • The main Big Data applications for enterprises are market intelligence and analytics.
  • Over 400 CrunchBase companies have patents on Big Data, in particular on software, mobile, analytics and security.
  • The top five Big Data patent assignees are IBM, Microsoft, SAP, Huawei, and ZTE Corporation.
  • IBM, Microsoft, and SAP focus on Big Data business management and analytics, whereas the Chinese companies Huawei and ZTE Corporation focus on infrastructure.
  • Big Data application areas are expanding into the health, energy and aircraft industries.
  • Chinese universities overwhelmingly dominate Big Data patent activity among research centers.

Who is working in Big Data?

Many diverse actors play a role in the Big Data ecosystem: not only big companies used to handling big volumes of data, but also many new entrants, such as web and software developers, that have been able to identify new opportunities underlying Big Data.

Fig 1: Landscape of Big Data-related companies from CrunchBase, grouped by their activities. (Colors represent the CrunchBase category. The number of companies in each category is shown in the legend.)

Considering the levels of the Big Data value chain, the list of actors ranges from infrastructure developers to search and data providers, analysts, and those actually using Big Data solutions for specific applications. To find out what these companies are up to, we first queried CrunchBase, a reference database of technology-based start-up companies, people and investors developed by TechCrunch, a leading technology media provider.

From a general search, we found more than 11,000 companies whose main activity is related to managing big volumes of data. Ironically enough, the best way to analyze this large data set is to use Big Data text analytics software, in our case Treparel’s KMX. Figure 1 shows a landscape plot of companies (colored dots) grouped by topics (labels) as described in their own product and service descriptions. A more detailed analysis of Big Data companies active in the largest CrunchBase categories, Software and Enterprise, can be found in the appendix below.

Who owns Big Data technology?

It is one thing to analyze what these companies say they do; it is another to look at the type of technology they actually patent. If we search CrunchBase for companies that mention owning patents, we end up with 449 hits (230 with patented technologies and 219 with patent-pending technologies). The main CrunchBase categories to which these companies belong are Software, Mobile, Analytics and Security. This can also be seen in a landscape plot of their activities as listed in CrunchBase (Figure 2). A complete list of these companies can be found in the appendix section (at the end of this blog).
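
A search of this kind can be approximated with a simple keyword filter over company descriptions. The sketch below is only an illustration and not the query actually run against CrunchBase: the sample records, the field names (`name`, `category`, `description`) and the two regular expressions are all hypothetical assumptions.

```python
import re
from collections import Counter

# Hypothetical sample records; real CrunchBase entries have many more fields.
COMPANIES = [
    {"name": "Alpha Analytics", "category": "analytics",
     "description": "Predictive analytics built on our patented indexing engine."},
    {"name": "Beta Mobile", "category": "mobile",
     "description": "Patent-pending compression for mobile data streams."},
    {"name": "Gamma Hosting", "category": "network_hosting",
     "description": "Managed hosting for large data sets."},
    {"name": "Delta Secure", "category": "security",
     "description": "Holds three patents on encrypted storage; one more is patent pending."},
]

# Assumed patterns: "patent pending" / "patent-pending" versus any other patent mention.
PENDING = re.compile(r"\bpatents?[- ]pending\b", re.IGNORECASE)
PATENT = re.compile(r"\bpatent(?:s|ed)?\b", re.IGNORECASE)


def classify(description):
    """Return 'patented', 'patent-pending' or None for a description."""
    # Strip pending mentions first so they are not counted as owned patents.
    without_pending = PENDING.sub(" ", description)
    if PATENT.search(without_pending):
        return "patented"
    if PENDING.search(description):
        return "patent-pending"
    return None


def summarize(companies):
    """Count patent status and categories among companies mentioning patents."""
    status = Counter()
    categories = Counter()
    for company in companies:
        label = classify(company["description"])
        if label is not None:
            status[label] += 1
            categories[company["category"]] += 1
    return status, categories


status_counts, category_counts = summarize(COMPANIES)
print(dict(status_counts))
print(category_counts.most_common())
```

A company that mentions both granted and pending patents is counted as "patented" here; that tie-breaking rule is a design choice of the sketch, not something the analysis above specifies.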

Fig 2: Landscape of Big Data topics covered by patenting and patent-pending companies.

However, CrunchBase does not offer a complete picture of Big Data patents. Therefore, we also used KMX to analyze data-related patents from the IFI Claims global patent database.

Figure 3 shows a landscape plot of the areas covered by Big Data patents published worldwide in the last 10 years. Each dot represents a single patent, and the colored dots are patents owned by the most prominent assignees. This casts an interesting light on the technical vocabulary of the technologies for which protection has been sought.

Fig 3: Landscape of areas covered by Big Data patents (publication years 2003-2013).

Top 5 Big Data patent assignees

IBM tops the list of patent leaders in the field. Its top inventors (chief engineering manager Charles J Archer, now working at Intel; master inventor Michael Blocksome; and software engineer Joe Ratterman, now working at Microsoft) account for outstanding activity in data communication and parallel computing. Microsoft also holds more than 300 patents on Big Data, some of them covering important applications in online advertising and business intelligence, such as those held by its leading engineer Rajeev Prasad (now founder of Picube Labs). SAP is also among the top technology players, with Chinese multinationals ZTE and Huawei following closely behind these historic leaders.

Table C in the appendix shows the main topics patented by the top 5 leaders. We note strong, continuous and diverse R&D activity by all leaders, although the Western companies seem to focus primarily on analytical and business management processes, whereas the Chinese multinationals focus above all on infrastructure technologies.

More and more players in the Big Data ecosystem

Oracle is another clear leader in Big Data-related technology development. It has filed patents on the integration of business applications with enterprise management systems, and on subjects as diverse as efficient path-based storage of XML documents, determining the need for an electronic signature in database transactions, guided navigation systems, backup storage methods, dynamic record methods for blocking data duplicates, prediction of customer behavior, changes in ranking algorithms based on customer settings, and self-learning data lenses, amongst many others.

We identified many other companies active in the field of Big Data. Apple, for example, owns patents on integrating structured and unstructured databases, on synchronization methods and systems, on dynamic data association, and on protocols for transaction-based communications, amongst others. CA Technologies has applied for protection of technologies for the identification and generation of business events, processing mixed numeric and non-numeric data, data backup and storage, and dynamic query models. In fact, Google recently pledged patents from IBM and CA Technologies, as announced in TechCrunch.

The list of Big Data technology players continues with companies such as HP (patents on analytical data processing, continuous querying of a data stream, resource assignments for jobs in processing pipelines, etc.) and Japanese multinationals such as Fujitsu (patents on distributed data processing systems, memory management programs, database and search engine query systems, speech recognition methods, etc.), Matsushita (digital filters, image coding methods, assistive call center interfaces, etc.), Hitachi (storage server management), Toshiba (electronic document recording) and NEC Corporation (methods for estimating the load characteristics of a program and for categorizing unstructured data).

Some of the companies we found in CrunchBase are indeed patent assignees with several registered patents to their name (e.g. NTT, KLA Tencor, Mastech Group, Exalead, Hansen Technologies, McAfee, Dasan Networks, Qwest, Intermec, Eka Systems, Itron, Aster Data Systems Inc.). But we also found CrunchBase companies that do not mention patents in their activity description while owning many; examples patenting Hadoop-based technologies are Maana KK, Cloudera Inc., Ravel, Evident Software Inc, Rainstor Ltd, Platfora Inc., MapR Technologies, Innovolt Inc., Zettaset Inc. and Data Treasure Corp.

A further analysis of recent patents can also be found in the appendix below.

Surprising big data applications

These patenting activities mostly fall within the areas we also found in CrunchBase. However, we would like to mention a few notable but less obvious cases of specific Big Data applications. In health, for example, Amanda Peters, a Lawrence Fellow at Lawrence Livermore National Laboratory, has filed several patents on large-scale parallel applications for studying research problems in areas ranging from cardiovascular disease to wireless networks to drug development. The Canadian company Wellpoint Systems has registered methods for computer-readable media management of healthcare records.

Meanwhile, the multinational Celartem, headquartered in Japan and with a strong presence in China, develops management software solutions for the analysis and maximization of energy and electrical power use, a field in which China Electric Power is also active.

The application of Big Data to the aircraft industry is well represented by Boeing, which has recently registered several patents on intelligence analysis of unstructured data based on associative memory technology.

Analysis of Big Data patents filed in 2012 and 2013

Focusing now only on recent developments, that is, patents registered in the last two years (2012-2013), we obtain yet another revealing representation (Figure C).

Fig C: Big Data patents filed in 2012-2013

Image processing is a prominent and growing area of interest; Sony, Canon, Ricoh and Mitsubishi are companies actively patenting image processing methods and media transfer protocols.

Virtualization software and services company VMware has recently protected several inventions on processing and analyzing unstructured data and on distributed computing processes. Operating in a closely related field, Nutanix has patents on the maintenance of virtual storage systems.

Other specific technologies have been developed by US companies. Clarabridge, a provider of customer experience management and voice-of-the-client integration solutions, has applied for patents on the transformation of structured and unstructured data and on theme detection in unstructured data. Cloudera, a provider of a Hadoop cluster-based platform for enterprises, has registered patents on distributed computing cluster architectures. Exegy Inc, the US hardware acceleration technology company commercializing technology developed in the School of Engineering at Washington University, protects associative methods of data retrieval and pipeline processing from distributed databases. Applied Materials files patents on dynamic content context querying for equipment. The enterprise analytics service provider IXreveal (Ureveal) develops concept-based analysis and discovery technologies for large datasets. Social media marketing provider Socialware develops methodologies for tagging content on uncontrolled web applications. Infrastructure software provider TIBCO deals with collaborative, contextual enterprise networking systems and methods, and with database query optimization and cost estimation.

In China, the e-commerce company Alibaba Group Holding works on ranking and transmitting product information and on methods for contextual recommendation of candidate products, while UFIDA, an enterprise management software company, works on data query services. Prominent providers of cloud computing and information system outsourcing services include Centrin Data Systems Co, with patents on data encapsulation; Guangdong Electronics Ind Inst Ltd, protecting data storage and query methods; and Opzoon, with patents on failure-solving methods for distributed file systems.

As for universities and research centers, Chinese institutions clearly and overwhelmingly dominate recent Big Data patent activity: Univ Huazhong Science Tech, Univ Tsinghua, Univ Zhejiang, National Univ Defense Technology, Univ Beihang, Univ Xidian, Univ Wuhan, Univ Nanjing, State Grid Electric Power Research, Univ Southeast, Chinese Academy Institute of Automation and Univ Electronic Science&tech all have a significant number of patents filed, as does South Korea’s Nat University Chungnam.

SK Planet, a subsidiary of South Korea’s SK Telecom, protects methods for online and mobile data cluster analysis based on the Hadoop MapReduce framework.

In India, tech business consultancy Infosys has patents on eliminating duplicates in dynamic data. Arista Networks is another company to keep an eye on: its engineer Banekar Vishal has filed several patents on software-defined cloud networks.


Detailed view of software- and enterprise-related activities in CrunchBase

Software is the CrunchBase category with the highest number of companies (2,763). A close-up map of this category alone (Figure A) shows distinct topic clusters related to the companies’ activities: technology management software development, online tracking and analytics, social media engagement, trading/accounting/investment, mobile applications, cloud storage and virtual desktop software, security networks and encryption, recovery backup protection, file format conversion, and patient clinical care.

Fig A: Close-up map of Big Data-related SOFTWARE companies (Crunchbase category: software)

We also identified a significant number of companies (802) in the Enterprise category (Figure B). The main solutions here are clearly those related to marketing and intelligence analytics; virtual cloud storage and online backup recovery systems are also included. Hadoop cluster analytics technology, the standard open-source Big Data processing platform, is unsurprisingly featured. Other relevant subjects in solutions for enterprises include B2B social sales, social media branding, video streams and media content management.

Fig B: Close-up map of Big Data-related ENTERPRISE companies (Crunchbase category: Enterprise)

Patent-owning companies in CrunchBase

Table A: List of Big Data-related CrunchBase companies mentioning patents or patent-pending technologies in their activity description.



Advertising and ecommerce

Algebraix Data Corp., NTT Data, Cirrus Data Solutions, Treparel.com, Hyperfine, Massive Analytic, Coversant, Inc., SimpleRelevance, Zettics, BuzzNumbers, Brevity, mon.ki, Tag This Car, SkyRank, InMage Systems, Kextil, Exobox Technologies, CompIQ Corp., SAND, sambasafety, Boardwalktech, Kickfire, Secure Islands Technologies, Emcien, SwiftKnowledge, Discovery Logic, Hopedot Electronics, Flimp Media, Wolf Frameworks, Proofpoint, Varonis Systems, Skytide, Infineta Systems, PriceMetrix, Zeebric, Virtual Power Systems, Are You Watching This?!, Bazelevs Innovations, Sapience Analytics Pvt Ltd, Realtime Applications, Embotics, KLA Tencor, X1 Technologies, Vericept, Protexx, Hansen Technologies, GenieDB, Cryptzone, Sky-trax, Patentula, Zoral Labs, Clinithink, Local-Insights, SocialGlimpz, GadgetTrak, TOA Technologies, CoKinetic Systems Corp, RingCube Technologies, FastScale Technology, C2BII, Appnomic Systems, OpenAmplify, Liquiverse, Authentidate Holding, SalesVu, InteliWISE USA, The Neat Company, Shazzle, Third Solutions, Electron Database, Markzware, Infinetics Technologies, Jumio, AmegoWorld, MATRIXX Software, RF-iT Solutions, Venalytica, Kneebone, FALCONEER Technologies, ODIN, Visual Purple, Mitek Systems, Agelio Networks, Litera, clearCi, Gravity Jack, gestigon, FormVerse, iolo technologies, AppFolio SecureDocs, EmailAge, OnMyWay, Profoundis Labs, Navisens, Mobiley, AlgoFast, BuildersCloud, whiteCryption, PlaceIQ, BestBuzz, ValidasLet, Zympi, AlfredNFC.com, Ground Truth, EveryWare Technologies, iMobiTrax, Trustlook, ChartsNow, Narian Technologies, iGenApps, Hoopz Planet Info, Qwest, biNu, Altobridge, milog, Eon Corporation, InfoScout, Texas Energy Network, FaceOn Mobile, Shozu, Plusmo, Bitstream, MobileTrainer, ABSOLU TELECOM, Root Wireless, Geolenz, Recursion Software, Luxul Wireless, Saguna Networks, Vaayoo, eVOTZ, SignalSet, 5BARz International, MoBeam, Mobiplex, inmobly, Sprylogics International, WiseSec, Calypso Wireless, Sparkfly Social Rewards, VisualDNA,
JovianDATA, CitizenNet, Aggregate Knowledge, TRA, TruSignal, Peer39, Sendori, iMediaStreams, SEO Genie, Ace Metrix, Augme, Solariat, Webshoz, Be Spotted, BlisMedia, WikiSeer, Alicanto, Technologyuserlists, Augme Technologies, Trueffect, Screen?, 0-RA, ADOMIC (formerly YieldMetrics), Deal Linker, DeliveryEdge, InComm, TinyMassive, The Retail Zip, Company, KoalaDeal, i-Cue Design, U*tique, Secret Sauce Partners, WishClouds, ThirdLove


Analytics and Consulting


Rage Frameworks, Ilesfay Technology Group, BehaviorMatrix, Fidelis Security Systems, KiteTale, CipherCloud, Catavolt, nCrypted Cloud, Cisco Accelerator for Entrepreneurs, EcoFactor, NationalField, AnyPresence, Fileblaze, [x+1], Finjan, Agito Networks, SandForce, Dynadec, CloudPrime, Tokutek, Chronotek, Jericho Systems, AppCentral, Inc., ConcernTrak, Alacritech, IT Analyzer, Firebind Inc., zaahah, GoLIMS, Aster Data Systems, FusionOps, Xeround, Evolv, Sumo Logic, DrivingBuddy, xDayta, HStreaming, Via Science, Quantivo, Synesis, Amiato, CROSSROADS SYSTEMS, Minetta Brook, Connotate, Bottlenose, AccelOps, Naiscorp Information Technology Services, Kinetic Social, Subsidence, Vehcon, FirstRain, Kyield, Amplidata, RiskIQ, Deltasight, Umbel, Hireology, Correlor, Sureline Systems, Cirrascale, Mapegy, TARGUSinfo, Teleshuttle, Corporation Service Company, Aaron Clarke Data Locker Inc., Ampex, Vendscreen, Drobo, Valencell, Anteryon, Intermec, JVC, Noitavonne, GSense Inc., Asetek, Zarlink, GreenGoose!, Pentasonics, GlassUp, BuQu Tech, Revolights, mySkin

Network hosting


Games video

Data Deposit Box is now KineticD, Asankya, Compass Datacenters, Infima Technologies, InteliCloud, ANTlabs (Advanced Network Technology Laboratories Pte. Ltd.), NTT Communications, DASAN Networks, CENTRI Technology, Huawei Technologies, As It Is, Gigamon, Skyfiber, Damballa, Narus, Seculert, Security First, Vaultive, Protegrity, Voltage Security, Access Forensics, SurfCanister, Forum Systems, Bangcle Security, McAfee, SenSage, QSecure, Socure, CRAM Worldwide, NativeFlow, mSIGNIA, BreakingPoint Systems, NitroSecurity, FireEye, Marble Security, MarkMonitor, WebLookOn, Perceptive Pixel, SecureRF Corporation, Laconic Security, NATION Technologies, GoldKey Security Corporation, DB Networks, Secure Access Technologies Inc., TopPatch Tag Networks, vushaper, Layerstream, Sportronix, iOpener, PPLive, Masstech Group, Trivid, Last 2 Left, Mobile Video Date



Public relations

PayPerks, ByAllAccounts, Mint.com, AirCell, tawkon, Whisper Communications, CelebCalls Taylor, AirNet Communications, PlaceTemplate, Clarite Research, VascoDe Technologies, Azurn Networks, Qualcomm, Network Innovations, RENTi, eFinancial Communications

Health and Medical



Newtopia, QPID Health, Eliza Corporation, SweetWater Health, iFormulary, CardioCare, ConforMIS, RxAnte, The SEO Engine, Q-Sensei, Back Azimuth Consulting, Exalead, Traffic Smart – Adthena, Pikimal, Quantance, Symwave, eDesignTime, InvenSense, Visualant, Magnolia Broadband, Vativ Technologies, SenseHere Technology, Kopin Corporation

Biotech and Cleantech



Cyclica, NeuroVigil, ADispell, ViroPharma, Arizona Instrument, Entelos, Biolauncher, Sway Balance, AdhereTech, Spectrum-Dynamics, Isowalk, Uscom, Hypertension Diagnostics, TaKaDu, Blue Water Satellite, SensorWare Systems, Eka Systems, Itron, Locust Storage, Diomede Storage, Avantium Technologies, Green Charge Networks, Optemo, myLykes, MakeCloud, TrendSpottr, Nifo, Shindig, ElephantDrive, SanDisk, CogniFit, WiseWindow, Agent Ace, CRITICAL TECHNOLOGIES, Vobile, Krillion, LoanKrunch, eCourier.co.uk, ClearFit, Rec.fm, Triangulate, Second Half Playbook, Gild, Whotever, Rock My World, SumoBrain, Scour Prevention, Foodem, i365 A Seagate Company, Olixir Technologies, Idea Design Studio Group, Inc., Acacia Research, SPOC Medical, MarketsandMarkets, H2Tran, 54 Freedom, Butler Motorcycle Maps, Streamworks International SA, Veechi Corp, Spectraseis, Hiptype, Accredited Transcription Corp, Devolo, Solid Design Solutions, Inc., MarketChorus, GoodChime!, LogicTree, FlightCaster, GetGoing, SlipStream Data, DataWare Ventures, RagingWire, 1eq, NewRiver, Arcametrics Systems, Inc., Authentise, AgaMatrix Inc., Genscape, Achex, Wegora, Mongoose Metrics, FatPipe, Anue Systems, DR Systems, Dacuda AG, Ayata, Zuli, Fashinating, Metail, G2 Collective, GoMoto

CrunchBase companies focused on Mapreduce and Hadoop technologies

Table B: List of CrunchBase companies mentioning Mapreduce or Hadoop technology in their activity description.

Big Data Elephants, ClearStory Data, Big Data Partnership, Bright Computing, Mortar Data, MapR Technologies, Treasure Data, Clogeny Technologies, AMAX Information Technologies Inc, Aster Data Systems, Magnific Training, DataMine Lab, Nessos – {m}brace the cloud, Splice Machine, Social Synaptics, Skyhigh Networks, iauro Systems Pvt Ltd, CIGNEX Datamatics, Evident Software, Cetas Software, Drawn to Scale, Perpetro Technologies Private Ltd, Mobius Knowledge Services, Pythian, GrepData, Visionist, Inc., Newsie Co.


Detailed patent description of top Big Data Assignees

Table C: detailed description of patents owned by the top Big Data assignees.


Integration: integrating search and alert between structured and unstructured data sources, data integration in service-oriented platforms, data protocols for integrating workflows executing in parallel processing platforms, efficient retrieval using metadata, enterprise architectures, simplifying complex data stream problems involving feature extraction from noisy data…

Natural language processing such as semantic indexing systems, question-answering systems, automatic glossary creation, use of hypothesis pruning, semantic graphs relating information assets, etc.

Data analytics solutions for the cloud, intelligent event-based data mining of unstructured information, knowledge-based data mining systems such as systems for discovering business processes from activity logs, graphical user interfaces, etc.

Micro or multiprocessors for efficient communication systems

On human resources: methods and analytics tools for locating experts with specific sets of expertise.


Structured and unstructured data models, Knowledge discovery in unstructured sources

Distributed analytics platforms, Enterprise search Systems and security, integration of accounting data, Exploitation of organizational knowledge

Integration of crowdsourcing data

Query results estimation, contextual queries

Communication Systems and message data management, synchronization of conversation structures in web-based email systems

Efficient data filtering methods

Information rules based classification methods

Semantic advertising and provision of knowledge content to users

Storing and retrieving XML data encapsulated as objects in a database store, aggregation trees in data centers


Providing or setting access of a user to resources in a computer system, involving historic events and tasks

Generic extraction of business object data

Document clustering based on cohesive terms

Security sensitive data flow analysis

OLAP and business intelligence architectures and multidimensional data analytics and visualization

Context object interface integrating structured and unstructured databases


Systems, terminals and servers for transferring video calls between access networks

Multimedia Broadcast Multicast Services (MBMS)

Data network systems and nodes for transmitting data

Parallel processing methods


Multimedia Broadcast Multicast Services (MBMS)

WAPI (WLAN Authentication and Privacy Infrastructure) wireless network terminals

Methods for controlling and for updating data structure

Storage of data packets

Synchronization processing methods

Alarm processing methods

Response methods of large data quantity classification and web-page search.

