Making big data a little smaller

While this result is nice, it also seems to mean that, theoretically, we have already reached the limit of dimensionality reduction for data compression.

Source: Science Daily

Harvard computer scientist demonstrates 30-year-old theorem still best to reduce data and speed up algorithms

Date: October 19, 2017
Source: Harvard John A. Paulson School of Engineering and Applied Sciences
Summary: Computer scientists have found that the Johnson-Lindenstrauss lemma, a 30-year-old theorem, is the best approach to pre-process large data into a manageably low dimension for algorithmic processing.

When we think about digital information, we often think about size. A daily email newsletter, for example, may be 75 to 100 kilobytes in size. But data also has dimensions, based on the number of variables in a piece of data. An email, for example, can be viewed as a high-dimensional vector where there’s one coordinate for each word in the dictionary and the value in that coordinate is the number of times that word is used in the email. So, a 75-kilobyte email that is 1,000 words long would be represented by a vector with millions of coordinates.
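To make this concrete, here is a minimal Python sketch of that "one coordinate per dictionary word" representation. The tiny vocabulary and sample email are purely illustrative; a real dictionary would have hundreds of thousands of entries, which is what makes the vector so high-dimensional.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Return a vector with one coordinate per vocabulary word,
    holding the number of times that word appears in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical five-word vocabulary, just to show the idea.
vocabulary = ["meeting", "free", "offer", "project", "deadline"]
email = "free offer join the free meeting about our new offer"
print(bag_of_words(email, vocabulary))  # -> [1, 2, 2, 0, 0]
```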

This geometric view of data is useful in some applications, such as learning spam classifiers, but the more dimensions the data has, the longer an algorithm can take to run and the more memory it uses.

As data processing got more and more complex in the mid-to-late 1990s, computer scientists turned to pure mathematics to help speed up the algorithmic processing of data. In particular, researchers found a solution in a theorem proved in the 1980s by mathematicians William B. Johnson and Joram Lindenstrauss, who were working in the area of functional analysis.

Known as the Johnson-Lindenstrauss lemma (JL lemma), the theorem has been used by computer scientists to reduce the dimensionality of data and speed up all types of algorithms across many different fields, from streaming and search algorithms, to fast approximation algorithms for statistical and linear algebra problems, and even algorithms for computational biology.
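The standard way to apply the JL lemma in practice is to multiply the data by a random matrix; the lemma guarantees that pairwise distances are approximately preserved when the target dimension is on the order of log(number of points) / ε². Below is a minimal sketch of such a random Gaussian projection, assuming NumPy; the specific dimensions (10,000 down to 1,000 for 500 points) are illustrative, not from the article.

```python
import numpy as np

def jl_project(X, k, seed=0):
    """Project the rows of X (shape n_points x d) down to k dimensions
    using a random Gaussian matrix, in the spirit of the JL lemma."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Entries drawn from N(0, 1/k) so that squared lengths are
    # preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R

# Illustrative numbers: 500 points in 10,000 dimensions mapped to 1,000.
X = np.random.default_rng(1).normal(size=(500, 10_000))
Y = jl_project(X, k=1_000)

# Compare one pairwise distance before and after the projection;
# the two values should be close.
before = np.linalg.norm(X[0] - X[1])
after = np.linalg.norm(Y[0] - Y[1])
print(f"distance before: {before:.1f}, after: {after:.1f}")
```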

Source:

Harvard John A. Paulson School of Engineering and Applied Sciences. “Making big data a little smaller: Harvard computer scientist demonstrates 30-year-old theorem still best to reduce data and speed up algorithms.” ScienceDaily. ScienceDaily, 19 October 2017. <www.sciencedaily.com/releases/2017/10/171019101026.htm>.
