with first element 00 = f/Ilf]2! This is, in a ..... Proceedings of SPIE conference July 1993, San Diego. ... Academic Press, San Diego, 1994, 233270. 10. Donoho ...
UNCLASSIFIED
Defense Technical Information Center Compilation Part Notice ADPO 11978 TITLE: Curvelets: A Surprisingly Effective Nonadaptive Representation for Objects with Edges DISTRIBUTION: Approved for public release, distribution unlimited
This paper is part of the following report: TITLE: International Conference on Curves and Surfaces [4th], SaintMalo, France, 17 July 1999. Proceedings, Volume 2. Curve and Surface Fitting To order the complete compilation report, use: ADA399401 The component part is provided here to allow users access to individually authored sections f proceedings, annals, symposia, etc. However, the component should be considered within
[he context of the overall compilation report and not as a standalone technical report. The following component part numbers comprise the compilation report: ADP011967 thru ADPO12009
UNCLASSIFIED
Curvelets: A Surprisingly Effective Nonadaptive Representation for Objects with Edges Emmanuel J. Cand~s and David L. Donoho Abstract. It is widely believed that to efficiently represent an otherwise smooth object with discontinuities along edges, one must use an adaptive representation that in some sense 'tracks' the shape of the discontinuity set. This folkbelief  some would say folktheorem  is incorrect. At the very least, the possible quantitative advantage of such adaptation is vastly smaller than commonly believed. We have recently constructed a tight frame of curvelets which provides stable, efficient, and nearoptimal representation of otherwise smooth objects having discontinuities along smooth curves. By applying naive thresholding to the curvelet transform2 of such an object, one can form mterm approximations with rate of L approximation rivaling the rate obtainable by complex adaptive schemes which attempt to 'track' the discontinuity set. In this article we explain the basic issues of efficient mterm approximation, the construction of efficient adaptive representation, the construction of the curvelet frame, and a crude analysis of the performance of curvelet schemes.
§1. Introduction In many important imaging applications, images exhibit edges  discontinuities across curves. In traditional photographic imaging, for example, this occurs whenever one object occludes another, causing the luminance to undergo step discontinuities at boundaries. In biological imagery, this occurs whenever two different organs or tissue structures meet. In image synthesis applications, such as CAD, there is no problem in dealing with such discontinuities, because one knows where they are and builds the discontinuities into the representation by specially adapting the representation  for example, inserting free knots, or adaptive refinement rules. In image analysis applications, the situation is different. When working with real rather than synthetic data, one of course doesn't 'know' where these edges are; one only has a digitized pixel array, with potential imperfections caused by noise, by blurring, and of course by the unnatural pixelization of the underlying continuous scene. Hence the typical image analyst only Curve and Surface Fitting: SaintMalo 1999 Albert Cohen, Christophe Rabut, and Larry L. Schumnaker (eds.), pp. 105120. Copyright 02000 by Vanderbilt University Press, Nashville, TN. ISBN 0826513573. All rights of reproduction in any form reserved.
105
106
E. J. Cand~s and D. L. Donoho
has recourse to representations which don't 'know' about the existence and geometry of the discontinuities in the image. The success of discontinuityadapting methods in CAD and related image synthesis fields creates a temptation for an image analyst  a temptation to spend a great deal of time and effort importing such ideas into image analysis. Almost everyone we know has yielded to this temptation in some form, which creates a possibility for surprise. Oracles and ideallyadapted representation One could imagine an ideallyprivileged image analyst who has recourse to an oracle able to reveal the positions of all the discontinuities underlying the image formation. It seems natural that this ideallyprivileged analyst could do far better than the normallyendowed analyst who knows nothing about the position of the discontinuities in the image. To elaborate this distinction, we introduce terminology borrowed from fluid dynamics, where 'edges' arise in the form of fronts or shock fronts. A Lagrangian representation is constructed using full knowledge of the intrinsic structure of the object and adapting perfectly to that structure.
"* In
fluid dynamics this means that the fluid flow pattern is known, and one constructs a coordinate system which 'flows along with the particles', with coordinates mimicking the shape of the flow streamlines. "* In image representation this could mean that the edge curves are known, and one constructs an image representation adapted to the structure of the edge curves. For example, one might construct a basis with discontinuities exactly where the underlying object has discontinuities. An Eulerian representation is fixed, constructed once and for all. It is nonadaptive  having nothing to do with the known or hypothesized details of the underlying object.
"* In
fluid dynamics, this would mean a usual euclidean coordinate system, one that does not depend in any way on the fluid motion. "* In image representation, this could mean that the representation is some fixed coordinate representation, such as wavelets or sinusoids, which does not change depending on the positions of edges in the image. It is quite natural to suppose that the Lagrangian perspective, when it is available, is much more powerful that the Eulerian one. Having the privilege of 'inside information' about the position of important geometric characteristics of the solution seems a priori rather valuable. In fact, this position has rather a large following. Much recent work in computational harmonic analysis (CHA) attempts to find bases which are optimally adapted to the specific object in question [7,10,11]; in this sense much of the ongoing work in CHA is based on the presumption that the Lagrangian viewpoint is best. In the setting of edges in images, there has, in fact, been considerable interest in the problem of developing representations which are adapted to the structure of discontinuities in the object being studied. The (equivalent)
107
Curvelets
concepts of probing and minimum entropy segmentation are old examples of this: wavelet systems which are specifically constructed to allow discontinuities in the basis elements at specific locations [8,9]. More recently, we are aware of much informal unpublished or preliminary work attempting to build 2D edgeadapted schemes; we give two examples. e Adaptive triangulation aims to represent a smooth function by partitioning the plane into a sequence of triangular meshes, refining the meshes at one stage to create finer meshes at the next stage. One represents the underlying object using piecewise linear functions supported on individual triangles. It is easy to see how, in an image synthesis setting, one can in principle develop a triangulation where the triangles are arranged to track a discontinuity very faithfully, with the bulk of refinement steps allocated to refinements near the discontinuity, and one obtains very effective representation of the object. It is not easy to see how to do this in an image analysis setting, but one can easily be persuaded that the development of adaptive triangulation schemes for noisy, blurred data is an important and interesting project. e In an adaptively warped wavelet representation, one deforms the underlying image so that the object being analyzed has all its discontinuities aligned purely horizontal or vertical. Then one analyzes the warped object in a basis of tensorproduct wavelets where elements take the form O'J,k(Xl) • •i',k'(X 2 ). This is very effective for objects which are smooth apart from purely horizontal and purely vertical discontinuities. Hence, the warping deforms the singularities to render the the tensor product scheme very effective. It is again not easy to see how adaptive warping could work in an image analysis setting, but one is easily persuaded that development of adaptively warped representations for noisy, blurred data is an important and interesting project. Activity to build such adaptive representations is based on an article of faith: namely, that Eulerian approaches are inferior, that oracledriven Lagrangian approaches are ideal, and that one should, in an image analysis setting, mimic Lagrangian approaches, attempting empirically to estimate from noisy, blurred data the information that an oracle would supply, and build an adaptive representation based on that information. Quantifying rates of approximation In order to get away from articles of faith, we now quantify performance, using an asymptotic viewpoint. Suppose we have an object supported in [0, 1]2 which has a discontinuity across a nice curve F, and which is otherwise smooth. Then using a standard Fourier representation, and approximating with Fmbuilt from the best m nonzero Fourier terms, we have
If 

1oo mF1
m
(1)
108
E. J. Cand~s and D. L. Donoho
This rather slow rate of approximation is improved upon by wavelets. The approximant fw built from the best m nonzero wavelet terms satisfies
1IIrWnm
Itff
m21
(2)
This is better than the rate of Fourier approximation, and, until now, is the best published rate for a fixed nonadaptive method (i.e. best published result for an 'Eulerian viewpoint'). On the other hand, we will discuss below a method which is adapted to the object at hand, and which achieves a much better approximation rate than previously known 'nonadaptive' or 'Eulerian' approaches. This adaptive method selects terms from an overcomplete dictionary and is able to achieve 1If  f
LJI×
m2,
m
(3)
00.
Roughly speaking, the terms in this dictionary amount to triangular wedges, ideally fitted to approximate the shape of the discontinuity. Owing to the apparent trend indicated by (1)(3) and the prevalence of the puritanical belief that 'you can't get something for nothing', one might suppose that inevitably would follow the FolkConjecture/[FolkTheorem]. The result (3) for adaptive representations far exceeds the rate of mterm approximation achievable by fixed nonadaptive representations. This conjecture appeals to a number of widespread beliefs:
"* the belief that "* the belief that
adaptation is very powerful, the way to represent discontinuities in image analysis is to mimic the approach in image synthesis, "•the belief that wavelets give the best fixed nonadaptive representation. In private discussions with many respected researchers we have many times heard expressed views equivalent to the purported FolkTheorem. The surprise It turns out that performance almost equivalent to (3) can be achieved by a nonadaptive scheme. In other words, the FolkTheorem is effectively false. There is a tight frame, fixed once and for all nonadaptively, which we call a frame of curvelets, which competes surprisingly well with the ideal adaptive rate (3). A very simple mterm approximation  summing the m biggest terms in the curvelet frame expansion  can achieve 1if
fC 112 < c.
m
2
3 (logm) ,
m
* 00,
(4)
which is nearly as good as (3) as regards asymptotic order. In short, in a problem of considerable applied relevance, where one would have thought that adaptive representationwas essentially more powerful than fixed nonadaptive representation, it turns out that a new fixed nonadaptive representation is essentially as good as adaptive representation, from the point of view of asymptotic mterm approximation errors. As one might expect, the new nonadaptive representation has several very subtle and distinctive features.
109
Curvelets Contents
In this article, we would like to give the reader an idea of why (3) represents the ideal behavior of an adaptive representation, of how the curvelet frame is constructed, and of the key elements responsible for (4). We will also attempt to indicate why curvelets perform for singularities along curves the task that wavelets perform for singularities at points.
§2. A Precedent: Wavelets and Point Singularities We mention an important precedent  a case where a nonadaptive scheme is roughly competitive with an ideal adaptive scheme. Suppose we have a piecewise polynomial function f on the interval [0, 1], with jump discontinuities at several points. An obvious adaptive representation is to fit a piecewise polynomial with breakpoints at the discontinuities. If there are P pieces and each polynomial is of degree < D, then we need only keep P. (D + 1) coefficients and P  1 breakpoints to exactly represent this function. Common sense tells us that this is the natural, and even, the ideal representation for such a function. To build this representation, we need to know locations of the discontinuities. If the measurements are noisy or blurred, and if we don't have recourse to an oracle, then we can't necessarily build this representation. A less obvious but much more robust representation is to take a nice wavelet transform of the object, and keep the few resulting nonzero wavelet coefficients. If we have an Npoint digital signal f(i/N), 1 < i < N, and we use Daubechies wavelets of compact support, then there are no more than C. log 2 (N) . P . (D + 1) nonzero wavelet coefficients for the digital signal. In short, the nonadaptive representation needs only to keep a factor C log2 (N) more data to give an equally faithful representation. We claim that this phenomenon is at least partially responsible for the widespread success of wavelet methods in data compression settings. One can build a single fast transform and deal with a wide range of different f, with different discontinuity sets, without recourse to an oracle. In particular, since one almost never has access to an oracle, the natural first impulse of one committed to the adaptive viewpoint would be to 'estimate' the break points  i.e. to perform some sort of edge detection. Unfortunately this is problematic when one is dealing with noisy blurred data. Edge detection is a whole topic in itself which has thousands of proposed solutions and (evidently, as one can see from the continuing rate of publication in this area) no convincing solution. In using wavelets, one does not need edge detectors or any other problematic schemes, one simply extracts the big coefficients from the transform domain, and records their values and positions in an organized fashion. We can lend a useful perspective to this phenomenon by noticing that the discontinuities in the underlying f are point singularities, and we are saying that wavelets need in some sense at most log(n) coefficients to represent a point singularity out to scale 1/n.
110
E. J. Cand~s and D. L. Donoho
It turns out that even in higher dimensions, wavelets have a nearideal ability to represent objects with point singularities. The twodimensional object f,(Xl, x 2 ) = 1/((xi  1/2)2 + (x2  1/2)2), has, for /3 < 1/2, a squareintegrable singularity at the point (1/2, 1/2) and is otherwise smooth. At each level of the 2D wavelet pyramid, there are effectively only a few wavelets which 'feel' the point singularity, other coefficients being effectively negligible. In approximation out to scale 1/n, only about O(log(n)) coefficients are required. Another approach to understanding the representation of singularities, which is not limited by scale, is to consider rates of decay of the countable coefficient sequence. Analysis of wavelet coefficients of f, shows that for any desired rate p, the Nth largest coefficient can be bounded by CpNP for all N. In short, the wavelet coefficients of such an object are very sparse. Thus we have a slogan: wavelets perform very well for objects with point singularities in dimensions 1 and 2. §3. Failure of Wavelets on Edges We now briefly sketch why wavelets, which worked surprisingly well in representing point discontinuities in dimension 1, are less successful dealing with 'edge' discontinuities in dimension 2. Suppose we have an object f on the square [0, 1]2 and that f is smooth away from a discontinuity along a C2 curve F. Let's look at the number of substantial wavelet coefficients. A grid of squares of side 2 i by 2i has order 2i squares intersecting F. At level j of the twodimensional wavelet pyramid, each wavelet is localized near a corresponding square of side 2i by 23. There are therefore 0(2i) wavelets which 'feel' the discontinuity along F. Such a wavelet coefficient is controlled by
I(f,¢ j,ki,k 2 )l
mo(7r).
Then it seems natural to say that p* is the optimal rate of mterm approximation from any dictionary when polynomial depth search delimited by 7r(.).
Curvelets
113
Starshaped objects with C2 boundaries We define StarSet 2 (C), a class of starshaped sets with C 2 smooth boundaries, by imposing regularity on the boundaries using a kind of polar coordinate system. Let p(O) : [0, 27r) 4 [0, 1] be a radius function and bo = (x 1 ,0 , x 2 ,0 ) be an origin with respect to which the set of interest is starshaped. With 6i(x) = xi  xi,o, i = 1, 2, define functions O(xl, x2) and r(xl, x 2 ) by
0 = arctan(62/6 1 );
r = ((61)2 + (62)2)1/2.
For a starshaped set, we have (x 1 , x 2 ) E B iff 0 < r < p(O). Define the class StarSet 2 (C) of sets by
11{:B BC [110'10 9]12, 10 < P(O)
0E [0,27r),
:5 1
1
pEC2,jIgo)~ I•! C1,
and consider the corresponding functional class Star 2 (C) =
{f
= 1B:
B
E StarSet 2 (C)}.
The following lower rate bound should be compared with (3). Lemma. Let the polynomial wr(.) be given. There is a constant c so that, for every dictionary D, em(Star2 (C); D, wr)
c
2
log(m)
m*+0.
This is proved in [10] by the technique of hypercube embedding. Inside the class Star 2 (C) one can embed very highdimensional hypercubes, and the ability of a dictionary to represent all members of a hypercube of dimension n by selecting m < n terms from a subdictionary of size r(m) is highly limited if wr(m) grows only polynomially. For each f, a corresponding orthobasis is adaptively constructed in [13] which achieves the rate (3). It tracks the boundary of B at increasing accuracy using a sequence of polygons; in fact these are ngons connecting equispaced points along the boundary of B, for n = 2j. The difference between ngons for n = 2' and n = 2 j+l is a collection of thin triangular regions obeying width z length 2 ; taking the indicators of each region as 'a term in a basis, one gets an orthonormal basis whose terms at fine scales are thin triangular pieces. Estimating the coefficient sizes by simple geometric analysis leads to the result (3). In fact, [13] shows how to do this under the constraint of polynomialdepth selection, with polynomial Cm 7 . Although space constraints prohibit a full explanation, our polynomialdepth search formalism also makes perfect sense in discussing the warped wavelet representations of the Introduction. Consider the noncountable 'dictionary' of all wavelets in a given basis, with all continuum warpings applied. Notice that for wavelets at a given fixed scale, warpings can be quantized with a certain finite accuracy. Carefully specifying the quantization of the warping, one obtains a countable collection of warped wavelets, for which polynomial depth search constraints make sense, and which is as effective as adaptive triangulation, but not more so. Hence (3) applies to (properly interpreted) deformation methods as well.
114
E. J. Cand~s and D. L. Donoho §5. Curvelet Construction
We now briefly describe the curvelet construction. It is based on combining several ideas, which we briefly review:
"* Ridgelets,
a method of analysis suitable for objects with discontinuities across straight lines. "* Multiscale Ridgelets, a pyramid of windowed ridgelets, renormalized and transported to a wide range of scales and locations. "* Bandpass Filtering, a method of separating an object into a series of disjoint scales. We briefly describe each idea in turn, and then their combination. Ridgelets The theory of ridgelets was developed in the Ph.D. Thesis of Emmanuel Cand~s (1998). In that work, Cand~s showed that one could develop a system of analysis based on ridge functions .,b,O (xI, x2) = a1/ 2 0((xl cos(O) + X2 sin(O)  b)/a).
(5)
He introduced a continuous ridgelet transform Rf(a, b, 0) = (0a,b,6(x), f) with a reproducing formula and a Parseval relation. He also constructed frames, giving stable series expansions in terms of a special discrete collection of ridge functions. The approach was general, and gave ridgelet frames for functions in L 2 [0, 1 ]d in all dimensions d > 2  For further developments, see [3,5]. Donoho [12] showed that in two dimensions, by heeding the sampling pattern underlying the ridgelet frame, one could develop an orthonormal set for L 2 (1R2 ) having the same applications as the original ridgelets. The orthonormal ridgelets are convenient to use for the curvelet construction, although it seems clear that the original ridgelet frames could also be used. The orthoridgelets are indexed using A = (j, k, i, e, E), where j indexes the ridge scale, k the ridge location, i the angular scale, and f the angular location; f is a gender token. Roughly speaking, the orthoridgelets look like pieces of ridgelets (5) which are windowed to lie in discs of radius about 2i; 9i,j = f/ 2 i is roughly the orientation parameter, and 2 i is roughly the thickness. A formula for orthoridgelets can be given in the frequency domain =
IfIi 6
Iwt(9) + Ojk(J6J)W'(O + 7r))/2
Here the Oj,k are Meyer wavelets for IR, w',, are periodic wavelets for [7r, 7r), and indices run as follows: j,k C 2Z, e = 0,....,2i1  1; i > 1, and, if E = 0, i= max(1,j), while if c = 1, i > max(1,j). We let A be the set of such A. The formula is an operationalization of the ridgelet sampling principle:
"* Divide the frequency domain in dyadic coronae 161 c [2 i, 2i+1]. "* In the angular direction, sample the jth corona at least 2i times. "* In the radial frequency direction, sample behavior using local cosines.
Curvelets
115
The sampling principle can be motivated by the behavior of Fourier transforms of functions with singularities along lines. Such functions have Fourier transforms which decay slowly along associated lines through the origin in the frequency domain. As one traverses a constant radius arc in Fourier space, one encounters a 'Fourier ridge' when crossing the line of slow decay. The ridgelet sampling scheme tries to represent such Fourier transforms by using wavelets in the angular direction, so that the 'Fourier ridge' is captured neatly by one or a few wavelets. In the radial direction, the Fourier ridge is actually oscillatory, and this is captured by local cosines. A precise quantitative treatment is given in [4]. Multiscale ridgelets Think of orthoridgelets as objects which have a "length" of about 1 and a "width" which can be arbitrarily fine. The multiscale ridgelet system renormalizes and transports such objects, so that one has a system of elements at all lengths and all finer widths. In a light mood, we may describe the system impressionistically as "brush strokes" with a variety of lengths, thicknesses, orientations and locations. The construction employs a nonnegative, smooth partition of energy function w, obeying kl,k 2 W 2 (XI  k 1 , X 2  k2) =_ 1. Define a transport operator, so that with index Q indicating a dyadic square Q = (s, ki, k 2 ) of the form [k 1/2 8 , (k1 + 1)/2s) x [k2 /2 8, (k 2 + 1)/2s), by (TQf)(xl, x 2 ) = f(2sxl  k 1, 2sx2  k 2 ). The Multiscale Ridgelet with index [t = (Q, A) is then 01 = 2'• TQ(w . p)3. In short, one transports the normalized, windowed orthoridgelet. Letting Q 8 denote the dyadic squares of side 2s, we can define the subcollection of Monoscale Ridgelets at scale s: M48 = {(Q,A): Q E Q ,A E A}. Orthonormality of the ridgelets implies that each system of monoscale ridgelets makes a tight frame, in particular obeying the Parseval relation
S
f 2 = If11L2.
/1EM.
It follows that the dictionary of multiscale ridgelets at all scales, indexed by
M = U.>iM, is not frameable, as we have energy blowup: E AE M
(0,
f>2 =
.(6)
116
E. J. Cand~s and D. L. Donoho
The Multiscale Ridgelets dictionary is simply too massive to form a good analyzing set. It lacks interscale orthogonality  P(Q,•) is not typically orthogonal to 0(QX') if Q and Q' are squares at different scales and overlapping locations. In analyzing a function using this dictionary, the repeated interactions with all different scales causes energy blowup (6). Our construction of curvelets solves (6) by disallowing the full richness of the Multiscale Ridgelets dictionary. Instead of allowing 'brushstrokes' of all different 'lengths' and 'widths', we allow only those where width .• length2 . Subband filtering Our solution to the 'energy blowup' (6) is to decompose f into subbands using standard filterbank ideas. Then we assign one specific monoscale dictionary M, to analyze one specific (and specially chosen) subband. We define coronae of frequencies 11 C[22s, 22s+2], and subband filters A, extracting components of f in the indicated subbands; a filter P 0 deals with frequencies 1ý] < 1. The filters decompose the energy exactly into subbands:
I1f112 = IIPofIl + E IIAsfI1. S
The construction of such operators is standard [15]; the coronization oriented around powers 2 2s is nonstandard  and essential for us. Explicitly, we build a sequence of filters o%and 'P2s = 24sT(22s.), s = 0, 1, 2,... with the following properties: 4DO is a lowpass filter concentrated near frequencies 1ý1 < 1; T'•2s is bandpass, concentrated near 1j1 E [228, 22,+2]; and we have
1ý0(C)12 +
S 1I,(2
2
s.)1 2 = 1,
VC.
s>0
Hence, A, is simply the convolution operator AJf =
1'2s *
f.
Definition of curvelet transform Assembling the above ingredients, we are able to sketch the definition of the Curvelet transform. We let M' consist of M merged with the collection of integral pairs (kl, k 2 ) indexing unitside squares in the plane. The curvelet transform is a map L 2 (1R2 ) * f 2 (M'), yielding Curvelet coefficients (aP : p E M'). These come in two types. At coarse scale we have wavelet scaling function coefficients
ap = (Okk,k 2 ,Pof),
P
=
(k1 , k 2 )
C
M'\M,
where Ok1 ,k 2 is the Lemari4 scaling function of the Meyer basis, while at fine scale we have Multiscale Ridgelets coefficients of the bandpass filtered object: a, = (Af,
0,),
p E M 8 ,s = 1,2,....
Curvelets
117
Note well that each coefficient associated to scale 2' derives from the subband filtered version off  Af  and not from f. Several properties are immediate: "*Tight Frame: JIl12. = If 112 1c M'
"*Existence of Coefficient Representers (Frame Elements): a
 (f, M)"
"* L2 Reconstruction Formula: I cM'
"*Formula for Frame Elements:
In short, the curvelets are obtained by bandpass filtering of Multiscale Ridgelets with passband is rigidly linked to the scale of spatial localization
"*Anisotropy Scaling Law:
Linking the filter passband ICI ý 22, to the spatial scale 2' imposes that (1) most curvelets are negligible in norm (most multiscale ridgelets do not survive the bandpass filtering A,); (2) the nonnegligible curvelets obey length z 21 while width z: 22s. So the system obeys approximately the scaling relationship width P length 2 .
It is here that the
22,
coronization scheme comes into play. §6. Why Should This Work?
The curvelet decomposition can be equivalently stated in the following form: "* Subband Decomposition. The object f is filtered into subbands: f
(Pof, Alf, A 2f,..).
"* Smooth Partitioning. Each subband is smoothly windowed into "squares" of an appropriate scale: Af 4 (WQAsJ)QEQý.
"• Renormalization. Each resulting square is renormalized to unit scale gQ = 2s(TQ)'(wQAsf),
Q E Q8 .
118
E. J. Cand~s and D. L. Donoho
* Ridgelet Analysis. Each square is analyzed in the orthoridgelet system a. = (9Q, PA),
y = (Q, A).
We can now give a crude explanation of why the main result (4) holds. Effectively, the bandpass images Ajf are almost vanishing at x far from the edges in f. Along edges, the bandpass images exhibit ridges of width : 22,  the width of the underlying bandpass filter. The partitioned bandpass images are broken into squares of side 2 S x2'. The squares which do not intersect edges have effectively no energy, and we ignore them. The squares which do intersect edges result from taking a nearlystraight ridge and windowing it. Thus the squares which 'matter' exhibit tiny ridge fragments of aspect ratio 21 by 22,. After renormalization, the resulting gQ exhibits a ridge fragment of about unit length and of width 2s. The ridge fragment is then analyzed in the orthoridgelet system, which should (we hope) yield only a few significant coefficients. In fact, simple arguments of size and order give an idea how the curvelet coefficients roughly behave. We give an extremely loose description. First, at scale 2s, there are only about 0(2s) squares Q G Q, that interact with the edges. Calculating the energy in such a square using the size of the support and the height of the ridge leads to (length . width) 1/ 2 • height ; (2s x 22s)1/2 X 1. Indeed, the height of the ridge is bounded by
IIAsf11. = 1
* f11
• H1'P2s Ill1fI11.
= II'lIlIfII .
Since we are interested in uniformly bounded functions f, the height is thus bounded by a constant C. The calculation of the norm IgQ I12 ; 23/2 follows immediately (because of the renormalization, the height of the ridge gQ is now 2') . Now temporarily suppose that for some fixed K not depending on Q, each ridge fragment gQ is a sum of at most K orthoridgelets.
(7)
This would imply that at level s we have a total number of coefficients 0( 2 s) squares which 'matter' x Kcoefficients/square, while the norm estimate for gQ and the orthonormality of ridgelets give coefficient amplitude < C 23s/2. The above assumptions imply that the Nth largest curvelet coefficient is of size < C . N 3/ 2 . Letting Ial(N) denote the Nth coefficient amplitude, the tail sum of squares would obey
Z
N>m
ICYIN) < C.M
2
.
()
119
Curvelets
This coefficient decay leads to (4) as follows. Let /p,,... , ILm enumerate indices of the m largest curvelet coefficients. Build the mterm approximation m
By the tight frame property,
N=m+l
where the last step used (8). This of course would establish (4)  in fact something even stronger, something fully as good as (3). However, we have temporarily assumed (7)  which is not true. Each ridge fragment generates a countable number of nonzero ridgelet coefficients in general. The paper [6] gets (4) using much more subtle estimates. §7. Discussion Why call these things curvelets? The visual appearance of curvelets does not match the name we have given them. The curvelets waveforms look like brushstrokes; brushlets would have been an appropriate name, but it was taken already, by F. Meyer and R. Coifman, for an unrelated scheme (essentially, Gabor Analysis). Our point of view is simply that curvelets exemplify a certain curve scaling law, width = length 2 , which is naturally associated to curves. A deeper connection between curves and curvelets was alluded to in our talk at Curves and Surfaces '99. Think of a curve in the plane as a distribution supported on the curve, in the same way that a point in the plane can be thought of as a Dirac distribution supported on that point. The curvelets scheme can be used to represent that distribution as a superposition of functions of various lengths and widths obeying the scaling law width = length 2 . In a certain sense this is a nearoptimal representation of the distribution. The analogy and the surprise Sections 2 and 3 showed that wavelets do surprisingly well in representing point singularities. Without attempting an explicit representation of 'where the bad points are', wavelets do essentially as well as ideal adaptive schemes in representing the point singularities. Sections 46 showed that the nonadaptive curvelets representation can do nearly as well in representing objects with discontinuities along curves as adaptive methods that explicitly track the shape of the discontinuity and use a special adaptive representation dependent on that tracking. We find it surprising and stimulating that the curvelet representation can work so well despite the fact that it never constructs or depends on the existence of any 'map' of the discontinuity set. We also find it interesting that there is a system of analysis which plays the role for curvilinear singularities that wavelets play for point singularities.
120
E. J. Candes and D. L. Donoho
Acknowledgments. This research was supported by National Science Foundation grants DMS 9872890 (KDI), DMS 9505151, and by AFOSR MURI 95P496209610028. References 1. Cand~s, E. J., Harmonic analysis of neural networks, Appl. Comput. Harmon. Anal. 6 (1999), 197218. 2. Cand~s, E. J., Ridgelets: theory and applications, Ph.D. Thesis, Statistics, Stanford, 1998. 3. Cand~s, E. J., Monoscale ridgelets for the representation of images with edges, Technical Report, Statistics, Stanford, 1999. 4. Cand~s, E. J., On the representation of mutilated Sobolev functions, Technical Report, Statistics, Stanford, 1999. 5. Candbs, E. J., and D. L. Donoho, Ridgelets: The key to highdimensional intermittency? Phil. Trans. R. Soc. Lond. A. 357 (1999), 24952509. 6. Cand~s, E. J., and D. L. Donoho, Curvelets, Manuscript, 1999. 7. Coifman, R. R., and M. V. Wickerhauser, Entropybased algorithms for best basis selection, IEEE Trans. Inform. Theory 38 (1992), 17131716. 8. Deng, B., B. Jawerth, G. Peters, and W. Sweldens, Wavelet probing for compressionbased segmentation, in Proc. SPIE Symp. Math. Imaging: Wavelet Applications in Signal and Image Processing, 1993. Proceedings of SPIE conference July 1993, San Diego. 9. Donoho, D. L., Minimum entropy segmentation, in Wavelets: Theory, Algorithms and Applications, C. K. Chui, L. Montefusco and L. Puccio (eds.), Academic Press, San Diego, 1994, 233270. 10. Donoho, D. L., and I. M. Johnstone, Empirical Atomic Decomposition, Manuscript, 1995. 11. Donoho, D. L., Wedgelets: nearly minimax estimation of edges, Ann. Statist. 27 (1999), 859897. 12. Donoho, D. L., Orthonormal ridgelets and linear singularities, Technical Report, Statistics, Stanford, 1998, SIAM J. Math. Anal., to appear. 13. Donoho, D. L., Sparse components analysis and optimal atomic decomposition, Technical Report, Statistics, Stanford, 1998. 14. Meyer, Y., Wavelets and Operators,Cambridge University Press, 1992. 15. Vetterli, M., and J. Kovacevic, Wavelets and Subband Coding, Prentice Hall, 1995. Department of Statistics Stanford University Stanford, CA 943054065, USA {emmanuel,donoho}lstat. stanford. edu http: //wwwstat. stanford. edu/{1emmanuel, donoho}/