1) Why can’t recursive queries contain • GROUP BY, • LEFT/RIGHT JOINS • SELECT DISTINCT • AGGREGATE FUNCTIONS • SUBQUERIES • hints on the recursive CTE reference
BTW - What is meant “by hints on the recursive CTE reference”?
2) Anchor queries can be unioned together with UNION operator, so why can’t recursive queries also be unioned together with UNION operator ( instead they must use UNION ALL )?
On Fri, 6 Nov 2009 12:39:40 -0800 (PST), klem s wrote: >hi
>I hope I didn’t put too many questions >1) Why can’t recursive queries contain >• GROUP BY, >• LEFT/RIGHT JOINS >• SELECT DISTINCT >• AGGREGATE FUNCTIONS >• SUBQUERIES >• hints on the recursive CTE reference
Hi Klem,
Simple - because that are the limitation the development team chose to put in place.
<speculation mode on>
Allowing these options would probably allow users to write queries that were too hard to figure out how to do, or that would require more development time than was necessary. Evey option allowed adds to the complexion of the final product. Remember that if an option is allowed, customers will expect it to work in EVERY situation, even the most wacky ones!
</speculation mode off>
>BTW - What is meant “by hints on the recursive CTE reference”?
You can put in "hints" in several places in your query. And the name "hints" is quite bad, since they are actually directives. You can use them to force the optimizer to use a specific index, force it to use some specific join strategy, or force it to follow a certain locking strategy.
In the very simple recursive CTE below, the lines marked "-- Here!" are the lines that containt references to the recursive CTE. So you can not use any hints there (though you still can use hints in the rest of the query).
WITH Recursive AS (SELECT 1 AS Num UNION ALL SELECT Num + 1 FROM Recursive -- Here! WHERE Num < 10) SELECT Num FROM Recursive; -- Here!
>2) Anchor queries can be unioned together with UNION operator, so why >can’t recursive queries also be unioned together with UNION operator >( instead they must use UNION ALL )?
Probably the same reason as above. Plus, there's some ambiguity if you allow this. Normally, rows produced in the third iteration are input for the fourth. But what if one of the rows in the third iteration is a duplicate of a row in the first or second iteration - the UNION (without ALL) causes it to be removed from the final results, but is it also removed from the input for the fourth iteration or is it only removed at the end? If I were on the dev team, I'd figure that an unanswerable question such as this is already a strong case to simply disallow the option.
MS SQL Server is a minimal subset of the ANSI/ISO Standards. In Standard SQL, the WITH clause is "WITH [RECURSIVE]<cte name>" so that the compiler gets to change modes when a CTE is recursive. There are other differences. I am at speaking at PASS now and then going to SQL Connections, so I do not have my Standards with me; Google around for it.
I think the Standard might allow for any set operator between the fixpoint table expression that does not cause infinite recursion.
1. You can use subqueries but they have to be expressed as CROSS (like JOIN) or OUTER (like LEFT JOIN) APPLYs
WITH bla AS ( SELECT FROM ancor
UNION ALL
SELECT FROM bla r CROSS APPLY -- using OUTER APPLY will be like LEFT JOIN ( SELECT FROM someotherplace t WHERE t.key = r.key )v
)
SELECT DISTINCT COUNT(*) FROM bla GROUP BY
2. You can define one CTE that uses LEFT JOIN (think of views) but in recursive CTE you can then JOIN to that "LEFT JOIN" view. For exampe, see code posted for this bug. Look past bug and notice how vw is defined and how it is used in CTE. that vw can be any derived table source, and can use LEFT JOINs. RIGHT JOIN can be re-expressed as LEFT JOIN, easily. https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?Fe...
WITH vw AS ( SELECT FROM here JOIN there LEFT JOIN overthere ) , bla AS ( SELECT FROM ancor
UNION ALL
SELECT FROM bla r JOIN vw v ON v.key = r.key )v
)
SELECT DISTINCT COUNT(*) FROM bla GROUP BY
3. You can apply DISTINCT or aggregates on the final SELECT
4. You can have query hints at the end of a query but not at the recursive reference.
1) Why can’t recursive queries contain • GROUP BY, • LEFT/RIGHT JOINS • SELECT DISTINCT • AGGREGATE FUNCTIONS • SUBQUERIES • hints on the recursive CTE reference
BTW - What is meant “by hints on the recursive CTE reference”?
2) Anchor queries can be unioned together with UNION operator, so why can’t recursive queries also be unioned together with UNION operator ( instead they must use UNION ALL )?
> 1. You can use subqueries but they have to be expressed as CROSS (like > JOIN) or OUTER (like LEFT JOIN) APPLYs
> WITH bla AS > ( > SELECT > FROM ancor
> UNION ALL
> SELECT > FROM bla r > CROSS APPLY -- using OUTER APPLY will be like LEFT JOIN > ( > SELECT > FROM someotherplace t > WHERE t.key = r.key > )v
> )
> SELECT DISTINCT > COUNT(*) > FROM bla > GROUP BY
> 2. You can define one CTE that uses LEFT JOIN (think of views) but in > recursive CTE you can then JOIN to that "LEFT JOIN" view. > For exampe, see code posted for this bug. Look past bug and notice how vw > is defined and how it is used in CTE. that vw can be any derived table > source, and can use LEFT JOINs. RIGHT JOIN can be re-expressed as LEFT > JOIN, easily. > https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?Fe...
> WITH vw AS > ( > SELECT > FROM here > JOIN there > LEFT JOIN overthere > ) > , bla AS > ( > SELECT > FROM ancor
> UNION ALL
> SELECT > FROM bla r > JOIN vw v ON v.key = r.key > )v
> )
> SELECT DISTINCT > COUNT(*) > FROM bla > GROUP BY
> 3. You can apply DISTINCT or aggregates on the final SELECT
> 4. You can have query hints at the end of a query but not at the recursive > reference.
> BTW - What is meant “by hints on the recursive CTE reference”?
> 2) Anchor queries can be unioned together with UNION operator, so why > can’t recursive queries also be unioned together with UNION operator > ( instead they must use UNION ALL )?
At each recursive level you can have some success with windowing functions. Therefore, at each recursive level you can use aggregates with OVER clause.
Why some success? well check that MS Connect link. Some don't work as expected. If they did, you could row_number() at each level, get max( from derived table where row_number (level data set)) OVER () at each level and then start the next ireration level with r.max_prev_iteration + max(from derived table, row_number(level data set)).
WITH bla AS ( SELECT FROM ancor
UNION ALL
SELECT MAX(somevalue)OVER () -- and other aggregate values FROM bla r OUTER APPLY ( SELECT FROM someplace t WHERE t.key = r.key )v
UNION ALL
SELECT MAX(somevalue)OVER () -- and other aggregate values FROM bla r CROSS APPLY ( SELECT FROM someotherplace t WHERE t.key = r.key )v
>> 1. You can use subqueries but they have to be expressed as CROSS (like >> JOIN) or OUTER (like LEFT JOIN) APPLYs
>> WITH bla AS >> ( >> SELECT >> FROM ancor
>> UNION ALL
>> SELECT >> FROM bla r >> CROSS APPLY -- using OUTER APPLY will be like LEFT JOIN >> ( >> SELECT >> FROM someotherplace t >> WHERE t.key = r.key >> )v
>> )
>> SELECT DISTINCT >> COUNT(*) >> FROM bla >> GROUP BY
>> 2. You can define one CTE that uses LEFT JOIN (think of views) but in >> recursive CTE you can then JOIN to that "LEFT JOIN" view. >> For exampe, see code posted for this bug. Look past bug and notice how vw >> is defined and how it is used in CTE. that vw can be any derived table >> source, and can use LEFT JOINs. RIGHT JOIN can be re-expressed as LEFT >> JOIN, easily. >> https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?Fe...
>> WITH vw AS >> ( >> SELECT >> FROM here >> JOIN there >> LEFT JOIN overthere >> ) >> , bla AS >> ( >> SELECT >> FROM ancor
>> UNION ALL
>> SELECT >> FROM bla r >> JOIN vw v ON v.key = r.key >> )v
>> )
>> SELECT DISTINCT >> COUNT(*) >> FROM bla >> GROUP BY
>> 3. You can apply DISTINCT or aggregates on the final SELECT
>> 4. You can have query hints at the end of a query but not at the >> recursive reference.
>> BTW - What is meant “by hints on the recursive CTE reference”?
>> 2) Anchor queries can be unioned together with UNION operator, so why >> can’t recursive queries also be unioned together with UNION operator >> ( instead they must use UNION ALL )?
SQL server is full of limitations that don't make sense. We've only recently got the ability to do something as basic as define a variable of infinite length string. When we got CLR integration we were given a limit of 8K of memory to work with (E I G H T K!!!!!!). When we got the ability to define our own aggregates we couldn't pass any parameters into that aggregate.
I think the cause for all this is backwards compatibility. While for many this is a good thing, for others like me, it is just holding sqlserver back in the dark ages. Hopefully soon we get a brand new product based on LINQ.
2) Anchor queries can be unioned together with UNION operator, so why can’t recursive queries also be unioned together with UNION operator ( instead they must use UNION ALL )?
I guess it's just assuming anything with a UNION is recursive. You can just use a second CTE to do a DISTINCT on the first.
Michael C (m...@nospam.com) writes: > SQL server is full of limitations that don't make sense. We've only > recently got the ability to do something as basic as define a variable > of infinite length string. When we got CLR integration we were given a > limit of 8K of memory to work with (E I G H T K!!!!!!). When we got > the ability to define our own aggregates we couldn't pass any parameters > into that aggregate.
> I think the cause for all this is backwards compatibility.
Some limitations or funky behaviour is certainly due to backwards compatibility issue. But a more common reason is that even Microsoft has limited resources and at some point they have to ship.
As for the limitation to 8K in the CLR, this restriction has been lifted in SQL 2008, so that you can define CLR UDTs and CLR aggregates that can produce more than 8000 bytes.
> Hopefully soon we get a brand new product based on LINQ.
That is very unlikely. After all, SQL is the standard for relational databases, not LINQ. And LINQ also buys you lots of limitations. For instance, apparently you cannot use table-valued parameters with LINQ.
-- Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
> Some limitations or funky behaviour is certainly due to backwards > compatibility issue. But a more common reason is that even Microsoft has > limited resources and at some point they have to ship.
That's true, I think there is also a mindset in the sqlserver team. The dot net team seems to have a policy of giving users as much flexibility as possible where sqlserver team seems to restrict users. That's obviously not the only reason but it's got to be a contributing factor.
> As for the limitation to 8K in the CLR, this restriction has been lifted > in SQL 2008, so that you can define CLR UDTs and CLR aggregates that can > produce more than 8000 bytes.
That I know but to place an 8K restriction on this is a perfect of example of the sqlserver team mindset I was talking about. This sort of thing is unheard of in anything else modern I have used.
>> Hopefully soon we get a brand new product based on LINQ.
> That is very unlikely. After all, SQL is the standard for relational > databases, not LINQ.
So what you're saying is we can't sell cars because we are a motorcycle shop. :-) I think all the elements are in place for a LINQ database. Obviously we have linq, we have the .net framework as a language to replace store procs etc and we have an IDE in visual studio that could be modified to suit a linq DB. I think this could sit along side sqlserver as a seperate product.
> And LINQ also buys you lots of limitations. For > instance, apparently you cannot use table-valued parameters with LINQ.
I don't think that is true. The equivelant of a table in Linq is an IEnumerable and you can pass an IEnumerable parameter into and out of any LINQ query in all sorts of ways with far more flexibility than sql. For example, an IEnumerable can be a simple list of integers or it can be an object which is the eqivelant of a row. It can even be an object with a grouping of subobjects (eg customer with many invoices). The way a result set from a query in sql can only be a certainly number of rows and columns (no sub collections etc) seems very limiting.
Michael C (m...@nospam.com) writes: > That's true, I think there is also a mindset in the sqlserver team. The > dot net team seems to have a policy of giving users as much flexibility > as possible where sqlserver team seems to restrict users. That's > obviously not the only reason but it's got to be a contributing factor.
That must explain ahy in VB .Net can write:
Using cn As SqlConnection = setup_connection(), _ cmd As SqlCommand = cn.CreateCommand()
But I in C# has to write:
using (SqlConnection cn = setup_connection()) { using (SqlCommand cmd = cn.CreateCommand()) {
> That I know but to place an 8K restriction on this is a perfect of example > of the sqlserver team mindset I was talking about. This sort of thing is > unheard of in anything else modern I have used.
So in any other product you work with there is a perfect functionality matrix? Everything works with everything else?
>> That is very unlikely. After all, SQL is the standard for relational >> databases, not LINQ.
> So what you're saying is we can't sell cars because we are a motorcycle > shop. :-)
No, I'm saying that selling cars with a joystick instead of a steering wheel is dead in the water.
>> And LINQ also buys you lots of limitations. For >> instance, apparently you cannot use table-valued parameters with LINQ.
> I don't think that is true. The equivelant of a table in Linq is an > IEnumerable and you can pass an IEnumerable parameter into and out of any > LINQ query in all sorts of ways with far more flexibility than sql. For > example, an IEnumerable can be a simple list of integers or it can be an > object which is the eqivelant of a row. It can even be an object with a > grouping of subobjects (eg customer with many invoices). The way a result > set from a query in sql can only be a certainly number of rows and columns > (no sub collections etc) seems very limiting.
OK, can you then provide a sample in a C# program using LINQ that calls a stored procedure with a table-valued parameter?
Or write in LINQ a query that in Northwind database (or AdventureWorks) retrieves this information from the first order of each customer: CustomerID, CustomerName, order date, number of articles, and the total order order amount.
-- Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
> Using cn As SqlConnection = setup_connection(), _ > cmd As SqlCommand = cn.CreateCommand()
> But I in C# has to write:
> using (SqlConnection cn = setup_connection()) { > using (SqlCommand cmd = cn.CreateCommand()) {
That's is about as minor an issue as you can get. Doing what you've described would be handy in C# but it doesn't stop you doing something. While you picked on something completely trivial you've missed all the massive amounts of extensibility they give you in c#. As an example, I can write my own connection object which connects to "Mikes Propriotry File(c)" and do anything I like with it. In sqlserver if you've got an object, eg a cursor, then you're pretty much stuck with what they give you. What extensibility they give you is usually extremely restrictive.
> So in any other product you work with there is a perfect functionality > matrix? Everything works with everything else?
I never said there was. What I meant is that I have never worked with any product (except sqlserver of course) that does something as silly as give the programmer only 8k to work with. I don't think you understand what a HUGE step back into the 80s that that is. This stands out as pretty much the silliest restriction I have ever encountered in my time as a programmer.
> No, I'm saying that selling cars with a joystick instead of a steering > wheel is dead in the water.
There was a time when people thought selling cars with headlights was a bad idea. Maybe we'll never sell cars with a joystick but I don't think that linq is comparable to a joystick. LINQ has really taken off and could easily become a real competitor to sql. You've got to take into account that you can pretty much write standard sql but back to front (select statement last) but have the huge advantage of writing the less friendly but more functional version eg:
From i in SomeCollect Where i.STATE = 'VIC' select i.Postcode or SomeCollection.Where(i => i.State == Vic).Select(i => i.POstcode)
> OK, can you then provide a sample in a C# program using LINQ that calls > a stored procedure with a table-valued parameter?
But that's not linq, that's calling a stored proc in sqlserver. In a pure linq environment you just pass in an IEnumerable instance.
> Or write in LINQ a query that in Northwind database (or AdventureWorks) > retrieves this information from the first order of each customer: > CustomerID, CustomerName, order date, number of articles, and the total > order order amount.
I think I see what you're friend was talking about, you can't use a table variable when working with linq that is used as a wrapper for sqlserver? Linq to sql I think they call it. This is just crappy auto generated code, the limitation is in what generates the code, not linq itself. IMO, linq to sql isn't going to go anywhere.
If you're going to ignore linq-to-sql and write code yourself to call a stored proc then surely you can pass in a table variable. I haven't used sql2008 so don't know about this feature. If this is a limitation then the limitation is in ado.net, not linq itself. If ado.net can pass a table variable to sqlserver then linq can.
The LINQ provider for SQL actually generates SQL code behind the scenes. To be fair, it does a fair job of basic optimization (properly parameterizing variables, etc.), but it doesn't really have an "industrial" optimizer built into its code generator. Not sure how you feel LINQ will replace SQL--who's to say EF won't replace LINQ next year?
> As an example, I can write my own connection object which connects to > "Mikes Propriotry File(c)" and do anything I like with it. In sqlserver if > you've got an object, eg a cursor, then you're pretty much stuck with what > they give you. What extensibility they give you is usually extremely > restrictive.
I was following your conversation intently, but you completely lost me on this comparison... What problem are you trying to solve with a cursor here, and what's the comparison between a cursor and "an object" that you're making? If you really want to compare apples to apples (though possibly different varieties) then you'd be better off comparing a "SqlDataReader" "object" to a cursor. I guess you don't see a lot of comparisons of "Connection" objects (like "SqlConnection") to "cursors", in much the same way that you don't see people comparing grizzly bears to drywall.
>> Using cn As SqlConnection = setup_connection(), _ >> cmd As SqlCommand = cn.CreateCommand()
>> But I in C# has to write:
>> using (SqlConnection cn = setup_connection()) { >> using (SqlCommand cmd = cn.CreateCommand()) {
> That's is about as minor an issue as you can get. Doing what you've > described would be handy in C# but it doesn't stop you doing something. > While you picked on something completely trivial you've missed all the > massive amounts of extensibility they give you in c#. As an example, I can > write my own connection object which connects to "Mikes Propriotry > File(c)" and do anything I like with it. In sqlserver if you've got an > object, eg a cursor, then you're pretty much stuck with what they give > you. What extensibility they give you is usually extremely restrictive.
>> So in any other product you work with there is a perfect functionality >> matrix? Everything works with everything else?
> I never said there was. What I meant is that I have never worked with any > product (except sqlserver of course) that does something as silly as give > the programmer only 8k to work with. I don't think you understand what a > HUGE step back into the 80s that that is. This stands out as pretty much > the silliest restriction I have ever encountered in my time as a > programmer.
>> No, I'm saying that selling cars with a joystick instead of a steering >> wheel is dead in the water.
> There was a time when people thought selling cars with headlights was a > bad idea. Maybe we'll never sell cars with a joystick but I don't think > that linq is comparable to a joystick. LINQ has really taken off and could > easily become a real competitor to sql. You've got to take into account > that you can pretty much write standard sql but back to front (select > statement last) but have the huge advantage of writing the less friendly > but more functional version eg:
> From i in SomeCollect Where i.STATE = 'VIC' select i.Postcode > or > SomeCollection.Where(i => i.State == Vic).Select(i => i.POstcode)
>> OK, can you then provide a sample in a C# program using LINQ that calls >> a stored procedure with a table-valued parameter?
> But that's not linq, that's calling a stored proc in sqlserver. In a pure > linq environment you just pass in an IEnumerable instance.
>> Or write in LINQ a query that in Northwind database (or AdventureWorks) >> retrieves this information from the first order of each customer: >> CustomerID, CustomerName, order date, number of articles, and the total >> order order amount.
> I think I see what you're friend was talking about, you can't use a table > variable when working with linq that is used as a wrapper for sqlserver? > Linq to sql I think they call it. This is just crappy auto generated code, > the limitation is in what generates the code, not linq itself. IMO, linq > to sql isn't going to go anywhere.
> If you're going to ignore linq-to-sql and write code yourself to call a > stored proc then surely you can pass in a table variable. I haven't used > sql2008 so don't know about this feature. If this is a limitation then the > limitation is in ado.net, not linq itself. If ado.net can pass a table > variable to sqlserver then linq can.
Michael C (m...@nospam.com) writes: > "Erland Sommarskog" <esq...@sommarskog.se> wrote in message >> But I in C# has to write:
>> using (SqlConnection cn = setup_connection()) { >> using (SqlCommand cmd = cn.CreateCommand()) {
> That's is about as minor an issue as you can get. Doing what you've > described would be handy in C# but it doesn't stop you doing something.
Certaintly it's a minor issue. But I write less than 50 lines of C# code per month an average, so I don't run into that many limitations. My point was just that any language and environment has its limitations.
> While you picked on something completely trivial you've missed all the > massive amounts of extensibility they give you in c#. As an example, I > can write my own connection object which connects to "Mikes Propriotry > File(c)" and do anything I like with it. In sqlserver if you've got an > object, eg a cursor, then you're pretty much stuck with what they give > you. What extensibility they give you is usually extremely restrictive.
On the other hand, imagine that you have a gross amount of data spread on disk in a couple of flat files, and you need to compute some sort of report from this data. Furthermore, there is a desire that execution time is as quick as possible. All you have is C#. Your code may be extensible, but unless you first implement a relational engine, your code will use the same access path every time, even if the data distribution changes over time. And the amount of code you write will be considerable, possible several thousands lines of code.
In an RDBMS you may be able to write a query in less of 50 lines of SQL, and the optimizer will make sure that you get a new query plan as data changes.
Or put it another ways: boats are extremely crappy on motorways. Even if you put wheels on them, they are overtaken by the slowest cars. Now, put the cars into the water...
> I never said there was. What I meant is that I have never worked with > any product (except sqlserver of course) that does something as silly as > give the programmer only 8k to work with. I don't think you understand > what a HUGE step back into the 80s that that is. This stands out as > pretty much the silliest restriction I have ever encountered in my time > as a programmer.
Reality check here. When did you last see a C# program that on its own was able to deal with terabytes of data?
A C# program exists under the assumption that it will never deal with that much data. Never more than the amount of main memory in the box. So there is no problem with having an unbounded string. There will not be terribly many of them anyway.
SQL Server on the other hand works in a realm of potentially unbounded amount of data. There is a real cost for handling volumes. The reason for the border between a limited and unlimited (well 2GB) string at 8000 bytes relates to the page size on disk. There is nothing that says that this size has to be 8000, or the page size has to be fixed. I've worked with a product where you could set the page size yourself. But it's also easy to see that a uniform page size, permits quite a few simplifications which are likely to be good for performance. And an RDBMS is lot about performance.
If you really think the limitation of 8000 chars gets in your way, you can always declare you string variables and string columns as (n)varchar(MAX), but I advice against it. I've found that there is a performance penalty, even if the actual length is below 8000.
>> No, I'm saying that selling cars with a joystick instead of a steering >> wheel is dead in the water.
> There was a time when people thought selling cars with headlights was a > bad idea. Maybe we'll never sell cars with a joystick but I don't think > that linq is comparable to a joystick.
I think you missed the point. The analogy was not meant to denigrate LINQ as such. Rather, even if joysticks would prove to be a better way to manoever a car, it would be a hard sell, since people are used to steering wheels.
>> OK, can you then provide a sample in a C# program using LINQ that calls >> a stored procedure with a table-valued parameter?
> But that's not linq, that's calling a stored proc in sqlserver. In a pure > linq environment you just pass in an IEnumerable instance.
Well, you made a number of that T-SQL had so many limitations. You will have to admit that if I cannot call a stored procedure with a table-variable from LINQ, be that LINQ-to-SQL or LINQ-to-EF, that is quite a limitation, since it deprives you of a very useful feature in SQL 2008.
(To be fair, VS 2008 was released before SQL 2008 came out. Maybe they are adressing it in VS 2010?)
>> Or write in LINQ a query that in Northwind database (or AdventureWorks) >> retrieves this information from the first order of each customer: >> CustomerID, CustomerName, order date, number of articles, and the total >> order order amount.
It seems that you missed this one. But I'm still interested in seeing the query.
> If you're going to ignore linq-to-sql and write code yourself to call a > stored proc then surely you can pass in a table variable. I haven't used > sql2008 so don't know about this feature. If this is a limitation then the > limitation is in ado.net, not linq itself. If ado.net can pass a table > variable to sqlserver then linq can.
You can pass data to TVPs from ADO .Net, using either a DataSet, a List<SqlDataRecord> or an IDataReader. But how to do it in LINQ? Or by using EF?
-- Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
> Certaintly it's a minor issue. But I write less than 50 lines of C# > code per month an average, so I don't run into that many limitations. > My point was just that any language and environment has its limitations.
I never said otherwise. What I did say is that the limitations in sqlserver are simply bigger and sillier than the limitations in other products I have encountered. Sometimes there are valid reasons for this, eg indexes would place a large number of limitations on things but in other cases there is no good reason except maybe compatibility with stuff in the past.
> On the other hand, imagine that you have a gross amount of data spread > on disk in a couple of flat files, and you need to compute some sort of > report from this data. Furthermore, there is a desire that execution time > is as quick as possible. All you have is C#. Your code may be extensible, > but unless you first implement a relational engine, your code will use > the same access path every time, even if the data distribution changes > over time. And the amount of code you write will be considerable, > possible several thousands lines of code.
> In an RDBMS you may be able to write a query in less of 50 lines of > SQL, and the optimizer will make sure that you get a new query plan as > data changes.
> Or put it another ways: boats are extremely crappy on motorways. Even > if you put wheels on them, they are overtaken by the slowest cars. Now, > put the cars into the water...
C# with linq would be more than capable of being the language for a database (not sure of the technical term here). It is certainly not a boat out of water.
> Reality check here. When did you last see a C# program that on its own > was able to deal with terabytes of data?
Currently it does not but we don't have that ability yet. MS would need to write a database that uses linq instead of sql for this to be a reality.
> A C# program exists under the assumption that it will never deal with > that much data. Never more than the amount of main memory in the box.
That's just not true. C# can easily handle more data than there is memory available. I have a C# program that handles 80GB files. SQLServer is simply C code, there is no reason C# can't replicate anything sqlserver does. Besides, I'm not suggesting this "Linq database" be actually written in C#, just that C# is the front end for it.
> SQL Server on the other hand works in a realm of potentially unbounded > amount of data. There is a real cost for handling volumes. The reason > for the border between a limited and unlimited (well 2GB) string at > 8000 bytes relates to the page size on disk. There is nothing that > says that this size has to be 8000, or the page size has to be fixed. > I've worked with a product where you could set the page size yourself. > But it's also easy to see that a uniform page size, permits quite a few > simplifications which are likely to be good for performance. And an > RDBMS is lot about performance.
There is absolutely no reason sqlserver would have an 8K limit today, except for hangovers from the past. Maybe if we go past 8K we get a performance hit but that would be fine.
> If you really think the limitation of 8000 chars gets in your way, > you can always declare you string variables and string columns as > (n)varchar(MAX), but I advice against it. I've found that there is a > performance penalty, even if the actual length is below 8000.
It's not varchar that's the problem for me, it's calling CLR code for a custom aggregate that is the issue.
> I think you missed the point. The analogy was not meant to denigrate LINQ > as such. Rather, even if joysticks would prove to be a better way to > manoever a car, it would be a hard sell, since people are used to steering > wheels.
Ah, ok, that might be true but then again it might not. Certainly the industry hasn't had any problem picking up .net and C# or vb.net.
> Well, you made a number of that T-SQL had so many limitations. You > will have to admit that if I cannot call a stored procedure with a > table-variable from LINQ, be that LINQ-to-SQL or LINQ-to-EF, that is > quite a limitation, since it deprives you of a very useful feature in > SQL 2008.
> (To be fair, VS 2008 was released before SQL 2008 came out. Maybe they > are adressing it in VS 2010?)
But my point was this is not a limitation of linq. If it is not possible then this is because ado.net does not allow it. If ado.net allows it then linq can do it.
> It seems that you missed this one. But I'm still interested in seeing > the query.
I don't have sql2008 here and it would take me several hours to get it installed and write a sample so I cannot do this. Even if this is a limitation of linq then it's really just a limitation interacting with sql.
> You can pass data to TVPs from ADO .Net, using either a DataSet, a > List<SqlDataRecord> or an IDataReader. But how to do it in LINQ? Or > by using EF?
In Linq I would just cast the data I have to a List<SqlDataRecord>. Something like this:
> The LINQ provider for SQL actually generates SQL code behind the scenes. > To be fair, it does a fair job of basic optimization (properly > parameterizing variables, etc.), but it doesn't really have an > "industrial" optimizer built into its code generator. Not sure how you > feel LINQ will replace SQL--who's to say EF won't replace LINQ next year?
Linq to sql is rubbish imo. Why bother writing in a similar language with a translation when you can write in sql directly. Writing native linq against a linq database would be much better :-))
> I was following your conversation intently, but you completely lost me on > this comparison... What problem are you trying to solve with a cursor > here, and what's the comparison between a cursor and "an object" that > you're making? If you really want to compare apples to apples (though > possibly different varieties) then you'd be better off comparing a > "SqlDataReader" "object" to a cursor. I guess you don't see a lot of > comparisons of "Connection" objects (like "SqlConnection") to "cursors", > in much the same way that you don't see people comparing grizzly bears to > drywall.
I'm not comparing the 2 because they have the same functionality, just using 2 random objects as an example. In sqlserver you have a cursor. You can not extend this object in any way at all. You cannot write your own cursor object in C++ and then reference that in sqlserver, you just have a cursor object as is. In C# when we have an object it can usually be extended in some sort of way. As an example, people or companies can write their own data providers for C#, as I'm sure Oracle have done for their database. If we had a cursor object in C# we would be able to extend it using extension methods, most likely inherit from it, we could possible implement its interface and pretend to be a cursor object or we could write our own cursor object from scratch.
> "Michael Coles" <ad...@geocodenet.com> wrote in message > news:eI7wW6ZaKHA.4668@TK2MSFTNGP06.phx.gbl... >> The LINQ provider for SQL actually generates SQL code behind the scenes. >> To be fair, it does a fair job of basic optimization (properly >> parameterizing variables, etc.), but it doesn't really have an >> "industrial" optimizer built into its code generator. Not sure how you >> feel LINQ will replace SQL--who's to say EF won't replace LINQ next year?
> Linq to sql is rubbish imo. Why bother writing in a similar language with > a translation when you can write in sql directly. Writing native linq > against a linq database would be much better :-))
I didn't realize there was such an animal as a "linq database". LINQ generally provides a common interface to query a variety of different datasources--SQL databases, Objects, websites, or anything else that you care to write a provider for. LINQ's strength is its ability to expose disparate data sources via a common query interface, so I'm not sure about a couple of things: 1) what would a "linq database" even look like, and 2) how it would be "better", given LINQ's primary purpose.
>> I was following your conversation intently, but you completely lost me on >> this comparison... What problem are you trying to solve with a cursor >> here, and what's the comparison between a cursor and "an object" that >> you're making? If you really want to compare apples to apples (though >> possibly different varieties) then you'd be better off comparing a >> "SqlDataReader" "object" to a cursor. I guess you don't see a lot of >> comparisons of "Connection" objects (like "SqlConnection") to "cursors", >> in much the same way that you don't see people comparing grizzly bears to >> drywall.
> I'm not comparing the 2 because they have the same functionality, just > using 2 random objects as an example. In sqlserver you have a cursor. You > can not extend this object in any way at all. You cannot write your own > cursor object in C++ and then reference that in sqlserver, you just have a > cursor object as is. In C# when we have an object it can usually be > extended in some sort of way. As an example, people or companies can write > their own data providers for C#, as I'm sure Oracle have done for their > database. If we had a cursor object in C# we would be able to extend it > using extension methods, most likely inherit from it, we could possible > implement its interface and pretend to be a cursor object or we could > write our own cursor object from scratch.
As Erland pointed out, different languages have different strengths and weaknesses. SQL's strength is it's ability to efficiently store and retrieve data in sets and to abstract away a lot of the details so you can focus on your end results instead of focusing on the steps needed to get there. C#'s strengths lie in other areas that are more procedural, such as bit manipulation, looping, string manipulation, etc. While it is entirely possible to write your own data providers for C# and create your own databases using C++ and generate your own query engine using VB, you could potentially run into a couple of issues.
The first question is how much time and money is it worth to write your own database engine using C# and create your own query language built to your own specifications? How much of your resources do you want to throw at a project of this scale?
Another issue is do you want to make your database engine general enough that it can be reused, or is it a one-off thing that will only ever be used for one dedicated project? Obviously you'll get better performance out of a dedicated one-off solution, but you won't be able to use it for any other database projects down the road.
Finally, who are you going to hire to run it and administer it for you? Since it's a homemade project that's not standards-compliant you'll have to teach someone from scratch and won't really be able to leverage your admin's and developer's experience on SQL-based DBMS's. This could be a fairly expensive proposition.
"Michael Coles" <michaelcoAToptonlineDOTnet> wrote in message news:% > I didn't realize there was such an animal as a "linq database".
Currently there is not but the main point I am trying to make is that MS should create a native linq database.
> LINQ generally provides a common interface to query a variety of different > datasources--SQL databases, Objects, websites, or anything else that you > care to write a provider for. LINQ's strength is its ability to expose > disparate data sources via a common query interface, so I'm not sure about > a couple of things: 1) what would a "linq database" even look like, and
A linq database would be similar to sqlserver except you would write in C# code instead of stored procs and linq instead of sql. There would be a collection of tables and each table would have a collection of rows.
2) how
> it would be "better", given LINQ's primary purpose.
It would be more extendable, giving you the ability to write your own sql keywords. Linq would make it more flexible and it would be more consistant. There would be less of the restrictions that are common in sql. It would be more modern as it wouldn't need to be compatible with sql server from 1956.
> As Erland pointed out, different languages have different strengths and > weaknesses. SQL's strength is it's ability to efficiently store and > retrieve data in sets and to abstract away a lot of the details so you can > focus on your end results instead of focusing on the steps needed to get > there. C#'s strengths lie in other areas that are more procedural, such as > bit manipulation, looping, string manipulation, etc.
But now we have linq C# potentially has all the strengths of sql, with not so many of it's weaknesses.
> While it is entirely possible to write your own data providers for C# and > create your own databases using C++ and generate your own query engine > using VB, you could potentially run into a couple of issues.
You've missed my point there. I'm not suggesting writing my own database. I'm just giving examples of how C# is extensible. I might not want to write my own database but I might want to add an extra function to the cursor object. In C# I could inherit from the cursor object and add my own functions. This would be a few minutes work potentially and add some very useful functions to the cursor object. My point is that these sort of things are possible in C# while in sqlserver there is much less facility for these sort of things. When they do give extensibility it's usually very difficult to use and restrictive.
Here's another example. In SQL Server if we want to include some C++ code we have created we can create an extended stored proc. This works but is pretty painful and very restricted. In C# we have a whole array of methods available for adding C++ code.
You do realize that extended stored procs (unmanaged C++ code) are deprecated and will be going away at some point? They have been replaced by SQL CLR, which supports .NET code written in C#, with which you can roll your own custom "cursor-like" objects all day long. I don't quite understand your infatuation with inheriting from and extending cursors to create your own custom cursors -- cursors tend to be one of the worst possible way to get things done in SQL. What functionality would you like to add to cursors anyway?
BTW, your dates seem to be a teeny weeny little bit off. The direct descendants of SQL weren't implemented until the 1970s, and SQL did not become a standard until 1986. Dr. Codd didn't even publish his seminal paper on the relational model until 1970. Perhaps you're confusing SQL (circa 1986) with COBOL (circa 1959)?
> "Michael Coles" <michaelcoAToptonlineDOTnet> wrote in message news:% >> I didn't realize there was such an animal as a "linq database".
> Currently there is not but the main point I am trying to make is that MS > should create a native linq database.
>> LINQ generally provides a common interface to query a variety of >> different datasources--SQL databases, Objects, websites, or anything else >> that you care to write a provider for. LINQ's strength is its ability to >> expose disparate data sources via a common query interface, so I'm not >> sure about a couple of things: 1) what would a "linq database" even look >> like, and
> A linq database would be similar to sqlserver except you would write in C# > code instead of stored procs and linq instead of sql. There would be a > collection of tables and each table would have a collection of rows.
> 2) how >> it would be "better", given LINQ's primary purpose.
> It would be more extendable, giving you the ability to write your own sql > keywords. Linq would make it more flexible and it would be more > consistant. There would be less of the restrictions that are common in > sql. It would be more modern as it wouldn't need to be compatible with sql > server from 1956.
>> As Erland pointed out, different languages have different strengths and >> weaknesses. SQL's strength is it's ability to efficiently store and >> retrieve data in sets and to abstract away a lot of the details so you >> can focus on your end results instead of focusing on the steps needed to >> get there. C#'s strengths lie in other areas that are more procedural, >> such as bit manipulation, looping, string manipulation, etc.
> But now we have linq C# potentially has all the strengths of sql, with not > so many of it's weaknesses.
>> While it is entirely possible to write your own data providers for C# and >> create your own databases using C++ and generate your own query engine >> using VB, you could potentially run into a couple of issues.
> You've missed my point there. I'm not suggesting writing my own database. > I'm just giving examples of how C# is extensible. I might not want to > write my own database but I might want to add an extra function to the > cursor object. In C# I could inherit from the cursor object and add my own > functions. This would be a few minutes work potentially and add some very > useful functions to the cursor object. My point is that these sort of > things are possible in C# while in sqlserver there is much less facility > for these sort of things. When they do give extensibility it's usually > very difficult to use and restrictive.
> Here's another example. In SQL Server if we want to include some C++ code > we have created we can create an extended stored proc. This works but is > pretty painful and very restricted. In C# we have a whole array of methods > available for adding C++ code.
Michael C (m...@nospam.com) writes: > C# with linq would be more than capable of being the language for a > database (not sure of the technical term here). It is certainly not a > boat out of water.
Today it is. Of course, you could implement a number of classes, and also add language constructs for the querying part. Well, it is already there. But you would also need the memory management, the storage management, the process management etc that are core parts of a database engine. SQL Server is *lot* more than just T-SQL.
> That's just not true. C# can easily handle more data than there is > memory available. I have a C# program that handles 80GB files. SQLServer > is simply C code, there is no reason C# can't replicate anything > sqlserver does. Besides, I'm not suggesting this "Linq database" be > actually written in C#, just that C# is the front end for it.
Again, you can of course write classes etc. But C# is just a language. SQL Server is far more than just a language.
Can you put all those 80 GB in one single string variable? Or even 2GB, even if machine only has a virtual address space of 1GB?
> There is absolutely no reason sqlserver would have an 8K limit today, > except for hangovers from the past. Maybe if we go past 8K we get a > performance hit but that would be fine.
And how do you know? The page size in SQL Server is 8192. It could be made larger or variable, but that would be a major change, particularly the latter. And the page size sets the limit between small objects and big objects.
>> If you really think the limitation of 8000 chars gets in your way, >> you can always declare you string variables and string columns as >> (n)varchar(MAX), but I advice against it. I've found that there is a >> performance penalty, even if the actual length is below 8000.
> It's not varchar that's the problem for me, it's calling CLR code for a > custom aggregate that is the issue.
Eh, that limit has been lifted in SQL 2008. If you want to play that game I can add complaints on that C# 1.0 did not have generics. After all, that is a quite essential feature in an object-oriented language. Quite a silly restriction. I mean, if you want to play that game.
> But my point was this is not a limitation of linq. If it is not possible > then this is because ado.net does not allow it. If ado.net allows it then > linq can do it.
ADO .Net can do it. LINQ can't.
> I don't have sql2008 here and it would take me several hours to get it > installed and write a sample so I cannot do this. Even if this is a > limitation of linq then it's really just a limitation interacting with > sql.
Well, since I'm SQL person and think TVPs are a great feature, I think that is quite a limitation.
-- Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
> You do realize that extended stored procs (unmanaged C++ code) are > deprecated and will be going away at some point? They have been replaced > by SQL CLR, which supports .NET code written in C#, with which you can > roll your own custom "cursor-like" objects all day long. I don't quite > understand your infatuation with inheriting from and extending cursors to > create your own custom cursors -- cursors tend to be one of the worst > possible way to get things done in SQL. What functionality would you like > to add to cursors anyway?
Mike, it's just AN EXAMPLE :-)))
> BTW, your dates seem to be a teeny weeny little bit off. The direct > descendants of SQL weren't implemented until the 1970s, and SQL did not > become a standard until 1986. Dr. Codd didn't even publish his seminal > paper on the relational model until 1970. Perhaps you're confusing SQL > (circa 1986) with COBOL (circa 1959)?
I was just joking of course :-) My point was sqlserver is a very backwards compatible product. This has advantages if you need backwards compatbility but if you don't it's just one HUGE disadvantage, holding large amounts of the product back in the past.
>> What functionality would you like to add to cursors anyway?
> Mike, it's just AN EXAMPLE :-)))
That's OK; do you have a more useful example? Like I said, you have SQL CLR where you can code with C# to interact with the database to your heart's content.
> I was just joking of course :-) My point was sqlserver is a very backwards > compatible product. This has advantages if you need backwards compatbility > but if you don't it's just one HUGE disadvantage, holding large amounts of > the product back in the past.
Hard to tell on here sometimes. Backwards compatibility is seen by some as a positive, especially companies that have invested hundreds of thousands, or millions, of dollars in developing for a specific platform.
> "Michael Coles" <michaelcoAToptonlineDOTnet> wrote in message > news:erbBpcRbKHA.5156@TK2MSFTNGP05.phx.gbl... >> You do realize that extended stored procs (unmanaged C++ code) are >> deprecated and will be going away at some point? They have been replaced >> by SQL CLR, which supports .NET code written in C#, with which you can >> roll your own custom "cursor-like" objects all day long. I don't quite >> understand your infatuation with inheriting from and extending cursors to >> create your own custom cursors -- cursors tend to be one of the worst >> possible way to get things done in SQL.
>> BTW, your dates seem to be a teeny weeny little bit off. The direct >> descendants of SQL weren't implemented until the 1970s, and SQL did not >> become a standard until 1986. Dr. Codd didn't even publish his seminal >> paper on the relational model until 1970. Perhaps you're confusing SQL >> (circa 1986) with COBOL (circa 1959)?
> I was just joking of course :-) My point was sqlserver is a very backwards > compatible product. This has advantages if you need backwards compatbility > but if you don't it's just one HUGE disadvantage, holding large amounts of > the product back in the past.
> Today it is. Of course, you could implement a number of classes, and also > add language constructs for the querying part. Well, it is already there. > But you would also need the memory management, the storage management, > the process management etc that are core parts of a database engine. SQL > Server is *lot* more than just T-SQL.
That's true, it would be a lot of work for MS (or whoever). Nothing they couldn't achieve however.
> Again, you can of course write classes etc. But C# is just a language. > SQL Server is far more than just a language.
Aren't all the elements there is C# to be the front end for a database? Triggers would be events, stored procs would be functions, views would be linq statements, tables would be collections etc. Imagine writing your application in C# and stepping through code and straight into a C# stored proc (insert sound of Homer Simpson drooling here :-)
> Can you put all those 80 GB in one single string variable? Or even 2GB, > even if machine only has a virtual address space of 1GB?
No, but you can use a filestream to parse the data sequentially from the disk. Strings are an in memory object but there are plenty of classes for dealing with stuff that is not.
>> There is absolutely no reason sqlserver would have an 8K limit today, >> except for hangovers from the past. Maybe if we go past 8K we get a >> performance hit but that would be fine.
> And how do you know? The page size in SQL Server is 8192. It could > be made larger or variable, but that would be a major change, particularly > the latter. And the page size sets the limit between small objects > and big objects.
I just said "There is absolutely no reason sqlserver would have an 8K limit today, except for hangovers from the past." and you replied that the reason it's 8K is because that's the way it is and changing it would be a major change. Didn't you just say exactly what I said in different words? :-)
> Eh, that limit has been lifted in SQL 2008. If you want to play that > game I can add complaints on that C# 1.0 did not have generics. After all, > that is a quite essential feature in an object-oriented language. Quite > a silly restriction. I mean, if you want to play that game.
Ok, fair point. But C# 1.0 was a brand new language and the restriction wasn't really that silly. They could have held out to put it in but decided to release without it. On the other hand sqlserver is 20+ years old and shouldn't have 8k restrictions on anything.
> ADO .Net can do it. LINQ can't.
If ADO.Net can do it then linq can do, you would just cast whatever collection you have into whatever ADO.net requires.
> Well, since I'm SQL person and think TVPs are a great feature, I think > that is quite a limitation.
It's not a limitation. It can be done very easily with the most minor of code.
> Aren't all the elements there is C# to be the front end for a database? > Triggers would be events, stored procs would be functions, views would be > linq statements, tables would be collections etc. Imagine writing your > application in C# and stepping through code and straight into a C# stored > proc (insert sound of Homer Simpson drooling here :-)
There have actually been several front-end languages for database access invented over the years. SQL is just one (albeit a very widely adopted) possibility. You can definitely use C# as a front end for a database; in fact, people do exactly this every day when they write .NET Windows and Web client applications.
If you want C# as the native language for a database you're going to need to rearchitect the DBMS. C# would probably make a better native language for an object-oriented database as opposed to a database derived from the relational model though. If you want to create a DBMS whose native language is C#, I would say go for it--you may strike gold. Usually these things are created to solve a problem--is there a particular problem you're trying to solve with this idea (other than your dislike of SQL, that is)?
> "Erland Sommarskog" <esq...@sommarskog.se> wrote in message > news:Xns9CCDE487CBBCBYazorman@127.0.0.1... >> Today it is. Of course, you could implement a number of classes, and also >> add language constructs for the querying part. Well, it is already there. >> But you would also need the memory management, the storage management, >> the process management etc that are core parts of a database engine. SQL >> Server is *lot* more than just T-SQL.
> That's true, it would be a lot of work for MS (or whoever). Nothing they > couldn't achieve however.
>> Again, you can of course write classes etc. But C# is just a language. >> SQL Server is far more than just a language.
> Aren't all the elements there is C# to be the front end for a database? > Triggers would be events, stored procs would be functions, views would be > linq statements, tables would be collections etc. Imagine writing your > application in C# and stepping through code and straight into a C# stored > proc (insert sound of Homer Simpson drooling here :-)
>> Can you put all those 80 GB in one single string variable? Or even 2GB, >> even if machine only has a virtual address space of 1GB?
> No, but you can use a filestream to parse the data sequentially from the > disk. Strings are an in memory object but there are plenty of classes for > dealing with stuff that is not.
>>> There is absolutely no reason sqlserver would have an 8K limit today, >>> except for hangovers from the past. Maybe if we go past 8K we get a >>> performance hit but that would be fine.
>> And how do you know? The page size in SQL Server is 8192. It could >> be made larger or variable, but that would be a major change, >> particularly >> the latter. And the page size sets the limit between small objects >> and big objects.
> I just said "There is absolutely no reason sqlserver would have an 8K > limit today, except for hangovers from the past." and you replied that the > reason it's 8K is because that's the way it is and changing it would be a > major change. Didn't you just say exactly what I said in different words? > :-)
>> Eh, that limit has been lifted in SQL 2008. If you want to play that >> game I can add complaints on that C# 1.0 did not have generics. After >> all, >> that is a quite essential feature in an object-oriented language. Quite >> a silly restriction. I mean, if you want to play that game.
> Ok, fair point. But C# 1.0 was a brand new language and the restriction > wasn't really that silly. They could have held out to put it in but > decided to release without it. On the other hand sqlserver is 20+ years > old and shouldn't have 8k restrictions on anything.
>> ADO .Net can do it. LINQ can't.
> If ADO.Net can do it then linq can do, you would just cast whatever > collection you have into whatever ADO.net requires.
>> Well, since I'm SQL person and think TVPs are a great feature, I think >> that is quite a limitation.
> It's not a limitation. It can be done very easily with the most minor of > code.
"Michael Coles" <ad...@geocodenet.com> wrote in message news:OW%23bv% > That's OK; do you have a more useful example? Like I said, you have SQL > CLR where you can code with C# to interact with the database to your > heart's content.
Custom aggregates is a good example. While sqlserver can do it, it is kludgy and difficult. You need to recompile and add your assembly back into sqlserver. You can't pass parameters into the custom aggregates which means you need to write different custom aggregates where normally you would pass in a parameter. As an example, I wrote a custom aggregate to comma seperate varchar values when grouping. This worked well and was use by all the programmers here. But then someone wanted it seperated by a space so I had to write a whole new aggregate. Then someone wanted it sorted, then sorted descending, then one to remove duplicates, and the remove duplicates sorted descding seperate by a semi colon.
In comparison with C# and linq this would be very straight forward. Its really just calling a function which is something very basic.
It's interesting that you're selling me the CLR as a good feature (which it is), yet having a C#/Linq database is only 1 step away where we directly write C# code in Management Studio. Wouldn't that be a good feaure also?
> Hard to tell on here sometimes. Backwards compatibility is seen by some > as a positive, especially companies that have invested hundreds of > thousands, or millions, of dollars in developing for a specific platform.
That is true of course and sqlserver is certainly the area where you get crusified if you don't continue to support it. Image if they did to sqlserver what they did to VB6? :-) That brings up an interesting point though. They dumped VB6 for C#/vb.net and while this did upset those who used VB6 it did produce a far far superior product. Obviously they couldn't do this with sqlserver without being lynched but they could bring out a new product to sit along side sqlserver.