• Home >>  SQL >> LINQ
  • sqllinq: taking linq to sql in the other direction


  • Rating : Excellent[0]   Very Good[0]   Average[0]   Below Average[0]   Poor[0]
  • Pub Date: 10/9/2008
  • admin [ View Profile ]

  • Abstract
  • I've wanted to write an application that would allow me to execute queries against my music collection for some time now. Having been burning my CDs to MP3 and WMA for some years now, it contains thousands of songs, and I've been wanting something a bit more robust than what most music players I've used (like the Windows Media Player) allow. With the introduction of LINQ in .NET 3.5, this seemed like a good opportunity to create an app ......

  • Introduction
  •  Sponsored Links
  • introduction

    i've wanted to write an application that would allow me to execute queries against my music collection for some time now. having been burning my cds to mp3 and wma for some years now, it contains thousands of songs, and i've been wanting something a bit more robust than what most music players i've used (like the windows media player) allow. with the introduction of linq in .net 3.5, this seemed like a good opportunity to create an app i've been thinking about for a while, and at the same time dig into a new technology. this set of code is the result.

    my ultimate goal is to create a set of libraries that can be used to allow for executing sql queries against any data source that can represent itself in a relational, semi-relational, and tabular format. a future article will explore iqueryable<t>, but this article is the first step in that direction.

    background

    the first thing i came up against with linq and the vast majority of examples for its usage was they all assume implicit knowledge of the structure of the underlying data and compile time knowledge of the structure of the desired results. i'm sure this holds true for the vast majority of "line of business" applications where linq will really pay dividends. creating a linq query to aggregate and manipulate a sales report or a product catalog page entails detailed knowledge of the data and the requirements of the resulting aggregates. looking at linq to sql, it generates sql statements to match the runtime linq expression execution, and provides a very nice object relational mapping tool that, in the past, required extensive infrastructure coding or third party tools.

    the compiled code wouldn't know the structure of the underlying data (other than that it consists of name value pairs) nor would it know what fields or aggregations were needed, as the exact data manipulation and result structure is defined at runtime, not compile time. building up a linq statement like:

    enumerablerowcollection<datarow> query =
        from order in orders.asenumerable()
        where order.field<bool>("onlineorderflag") == true
        orderby order.field<decimal>("totaldue")
        select order;

    requires knowing, at a minimum, the order and number of clauses in the where statement as well as the order by statement. so, after some initial reading and investigation into linq, i came to the conclusion that i needed to go the other way: sql to linq rather than linq to sql. that lead to some other interesting areas like needing a sql parser and dynamic construction of complex expressions, but it did make for a good crash course in learning linq.

    using the code

    the structure of the code consists of three layers:

    • musdataprovider - this layer contains an idataadapter and associated types like a derived dbconnection and dbcommand. it is the layer that the example user interface interacts with.
    • musiclinq - this layer knows how to access the tags from media files, and uses the sqllinq layer to retrieve and evaluate sql statements passed in from the data provider layer.
    • sqllinq - this layer contains the sql parser, and exposes the resulting parse tree along with an api to allow the evaluation of arbitrary sets of data.

    the ui interacts with the data provider layer much like it would with any other provider:

    dataset data = new dataset();
    
    musdbconnection connection = new musdbconnection("root='c:\\music'");
    connection.open();
    idataadapter adapter = new musdataadapter("select * from soundtrack", connection);
    adapter.fill(data);

    the dataset can then be used just like an application would with one retrieved from sql server, oracle, jet, or other data source.

    a note about the unit tests

    most of the unit tests will fail when you run them from the attached code because they are written against a set of media files that, for obvious reasons, i haven't included with the download. they do provide basic examples of the sql syntax that works, and should be pretty easy to point at some other set of media and made to pass again.

    the musdataprovider layer

    the connection string is of the format name=value, with a semicolon separator like most other database connection formats.

    supported connection string parameters are:

    • root - the rooted path from which queries will search. this parameter is required. queries will be evaluated relative to this path. in the select statement above, the query will search "c:\music\soundtrack\".
    • recurse - true | false - indicates whether to recurse directory structures during query evaluation. defaults to true if not provided.
    • mediaextensions - the list of file extensions to include in the search. defaults to mp3,wma if not provided.

    aside from using the musiclinq layer to process sql statements, the only other thing that is done in this assembly is providing an implementation of idataadapter and dbdatareader so that the results can be returned to an application as a system.dataset. that is all pretty straightforward, and there are plenty of examples out there, so i won't delve into the details.

    the musiclinq layer

    this assembly has two main responsibilities:

    1. retrieve any or all of the media tags from a set of files, using the windows media format sdk.
    2. pass the sql command from the data provider layer to the sqllinq assembly, and use the resulting parse tree to retrieve and evaluate the correct media tags.

    as such, it is the glue between the sql as text, the sql as a parse tree, and the physical data store represented by a collection of media files. i started with some simple code that could read id3 v1.1 tags, but wanted something a bit more robust (asf container formats, other id3 tag versions etc.). a bit of searching lead to the windows media format sdk, and now the retrieval of the actual tag values is all done using wmvcore.dll and the interfaces iwmmetadataeditor, iwmmetadataeditor2, and iwmheaderinfo3. the code that performs this retrieval started its life as part of one of the sdk samples. it's now modified to fit the needs of this application and my particular coding style. as such, i think it may be a bit more generically useful than the original sample as it doesn't do the same tag of formatting as it deserializes the tags (except for byte arrays which are only returned as string representations indicating their length). if you are looking for some media tag access code, check out the tagloader class in the attached code. it should be pretty easy to yank out and bend to your needs.

    the meat of retrieving the actual tags has a basic win32 flavor to it; calling methods twice to get buffer sizes, and then passing them back in to be filled, null terminated strings etc. if you've worked with windows apis at all, nothing too surprising with this piece.

    private static iwmheaderinfo3 getheaderinfo(string path){
        iwmmetadataeditor editor;
        wmfsdkfunctions.wmcreateeditor(out editor);
    
        iwmmetadataeditor2 editor2 = (iwmmetadataeditor2)editor;
        editor2.openex(path, file_access.generic_read, file_share.file_share_read);
        return (iwmheaderinfo3)editor2;
    }
    
    private static void getallmediatags(idictionary<string, object> tags, string path)
    {
        iwmheaderinfo3 headerinfo3 = getheaderinfo(path);
        try
        {
            ushort wattributecount;
            headerinfo3.getattributecountex(0, out wattributecount);
    
            for (ushort windex = 0; windex < wattributecount; windex++)
            {
                wmt_attr_datatype wattribtype;
                string pwszname = null;
                ushort wattribnamelen = 0;
                ushort pwlangindex = 0;
                uint pdwdatalength = 0;
    
                // get the length of this attribute name
                // and value in order to alloc the buffers
                headerinfo3.getattributebyindexex(0,
                    windex,
                    pwszname,
                    ref wattribnamelen,
                    out wattribtype,
                    out pwlangindex,
                    null,
                    ref pdwdatalength);
    
                pwszname = new string('\0', wattribnamelen);
    
                readandaddattribue(tags, headerinfo3, windex, pwszname, pdwdatalength);
            }
        }
        finally
        {
            ((iwmmetadataeditor)headerinfo3).close();
            marshal.finalreleasecomobject(headerinfo3);
            headerinfo3 = null;
        }
    }

    the tags from each media file are returned from the loader in an idictionary<string, object>. the dictionary keys are case insensitive to match the case insensitivity of the sql parsing. as the code iterates all of the files in the specified directory structure, it builds up an ilist<idictionary<string, object>>. this ilist represents the result set from the query, with each idictionary being analogous to a single row. the list of rows is how data is returned to the provider layer.

    the sqllinq layer

    this is where all the good linq-y stuff happens!

    sql parsing

    sql parsing started with a very simplistic approach using what was little more than a rudimentary tokenizer. that got me as far as the simple select field from source part, but not much farther. a day or so of searching came up with nothing usable, until i came upon the grammar oriented language developer or gold application. this thing started as devin cook's master's thesis, and is really a remarkable tool.

    what it does is take in the rules for an arbitrary language in bnf format and spit out a set of parse tables to be used later by a language specific state machine parse engine. there are even a number of engines available in multiple languages including c#. this code uses a c# engine by vladimir morozova, which is included in the download. there was even a specification for a simple sql syntax available (extended slightly to add column aliases, boolean literals, and to allow having clauses to contain aggregate expressions). there is also a nice article here on codeproject that explains the gold application and its use in more depth. basically, this solved my problem perfectly, and kudos to the authors of the gold application.

    parsing proceeds from the leaves to the trunk, building up the parse tree along the way. as the parser traverses tokens and rules, it yields up a reduction rule which corresponds to the various parts of the language. a custom attribute is used to map c# classes to sql rules and instantiate them as the parser does its work.

    [syntaxnode(ruleconstants.rule_whereclause_where)]
    public class whereclause : nonterminalnode
    {
        public whereclause()
        {
        }
    
        public ienumerable<idictionary<string, t>> 
           evaluate<t>(ienumerable<idictionary<string, t>> source)
        {
            return source.where(createevaluationfunction<t>());
        }
    
        private func<idictionary<string, t>, bool> 
                 createevaluationfunction<t>()
        {
            predicatenode predicate = findchild<predicatenode>();
            if (predicate != null)
                return predicate.createevaluationfunction<t>();
    
            literalnode node = findchild<literalnode>();
            if (node != null)
                return node.createevaluationfunction<t>();
    
            debug.assert(false);
            return null;
        }
    }
    public bool parse(textreader sourcereader)
    {
        debug.assert(sourcereader != null);
    
        m_parser = parserfactory.createparser(sourcereader);
        m_parser.trimreductions = true;
    
        while (true)
        {
            switch (m_parser.parse())
            {
            
            ...
            
            case parsemessage.reduction:
                nonterminalnode nonterminal = 
                    syntaxrulefactory.createnode(m_parser.reductionrule);
                m_parser.tokensyntaxnode = nonterminal;
    
                for (int i = 0; i < m_parser.reductioncount; i++)
                    nonterminal.appendchildnode(m_parser.getreductionsyntaxnode(i) 
                                                as syntaxnode);
    
                // post parsing syntax check (used to segregate the difference 
                // between having and where 
                // expressions in terms of the validtity of aggregate expressions)
                nonterminal.checksyntax();
                break;
            ...
            
            }       
        }
    }
                
    static class syntaxrulefactory
    {
        private static idictionary<int, type> _nodeimpltypemap = loadimpltypes();
    
        public static nonterminalnode createnode(rule rule)
        {
            debug.assert(rule != null);
    
            if (_nodeimpltypemap.containskey(rule.index))
            {
                nonterminalnode node = activator.createinstance(
                           _nodeimpltypemap[rule.index]) as nonterminalnode;
                debug.assert(node != null);
                node.rule = rule;
                return node;
            }
    
            // if no type is bound to the rule then
            // just create a base non-terminal node
            nonterminalnode genericnode = new nonterminalnode();
            genericnode.rule = rule;
            return genericnode;
        }
    
        private static idictionary<int, type> loadimpltypes()
        {
            idictionary<int, type> dict = new dictionary<int, type>();
    
            foreach (type t in assembly.getexecutingassembly().gettypes())
            {
                if (t.isclass && t.isabstract == false && 
                    t.issubclassof(typeof(nonterminalnode)))
                {
                    foreach (syntaxnodeattribute attr in 
                             t.getcustomattributes(typeof(syntaxnodeattribute), false))
                        dict.add((int)attr.ruleconstant, t);
                }
            }
    
            return dict;
        }
    }

    a sql statement that looks like select * from soundtrack where bitrate = 128000 results in a parse tree like:

        <select stm> [rule id=rule_selectstm_select class=selectstatement]
            select
            <columns> [rule id=rule_columns_times class=columns]
                <restriction> [rule id=rule_restriction class=nonterminalnode]
                *
            <into clause> [rule id=rule_intoclause class=nonterminalnode]
            <from clause> [rule id=rule_fromclause_from class=fromclause]
                from
                <id member> [rule id=rule_idmember_id class=nodewithid]
                    soundtrack
                <join chain> [rule id=rule_joinchain2 class=nonterminalnode]
            <where clause> [rule id=rule_whereclause_where class=whereclause]
                where
                <pred exp> [rule id=rule_predexp_eq class=equalitynode]
                    <value> [rule id=rule_value_id class=nodewithid]
                        bitrate
                    =
                    <value> [rule id=rule_value_integerliteral class=integerliteral]
                        128000
            <group clause> [rule id=rule_groupclause class=nonterminalnode]
            <having clause> [rule id=rule_havingclause class=nonterminalnode]
            <order clause> [rule id=rule_orderclause class=nonterminalnode]

    once the parse tree is built, then it is just a matter of traversing it in order to generate the appropriate linq constructs or expressions appropriate to each node. for instance, in the selectstatement (the root node for a select statement), the evaluation function uses the various sub clauses with each child node, further modifying the data passed in according to its sql semantics and its own branch of the parse tree:

    public ienumerable<idictionary<string, object>> 
            evaluate(ienumerable<idictionary<string, object>> source) 
    {
        ienumerable<idictionary<string, object>> results = source;
    
        if (whereclause != null)
            results = whereclause.evaluate<object>(results);
    
        results = evaluateaggregates(results);
    
        if (havingclause != null)
            results = havingclause.evaluate<object>(results);
    
        if (orderbyclause != null)
            results = orderbyclause.evaluate<object>(results);
    
        return postprocessresults(results);
    }

    the orderbyclause uses the orderby and thenby extensions to do ordering. similarly, the groupbyclause uses linq's grouping constructs to roll up the source data:

    public ienumerable<idictionary<string, object>>
        evaluate(ienumerable<idictionary<string, object>> source, 
             ienumerable<aggregatenode> aggregates)
    {
        ilist<idictionary<string, object>> list = new list<idictionary<string, object>>();
    
        foreach (nodewithid item in groupbyitems)
        {
            var groupby = from d in source
                          group d by d.containskey(item.lookupid) ? 
                                d[item.lookupid] : dbnull.value 
                              into g 
                              select g;
    
            foreach (var g in groupby)
            {
                idictionary<string, object> dict = 
                  new dictionary<string, object>(stringcomparer.invariantcultureignorecase);
                dict.add(item.lookupid, g.key.tostring());
    
                foreach (aggregatenode aggregate in aggregates)
                    dict.add(aggregate.lookupid, 
                         aggregate.evaluate<object>(g.asenumerable()));
                
                list.add(dict);
            }
        }
    
        return list;
    }
    
    public ienumerable<nodewithid> groupbyitems
    {
        get
        {
            return finddescendants<nodewithid>();
        }
    }

    building expressions

    in order to implement the where and having clauses, heavy use is made of system.linq.expressions. this made implementing that logic pretty straightforward as most sql operators have a direct analog in the expression namespace. the one glaring exception is like, which i haven't got working yet (never been good at regex's...). the main task for expression evaluation is to generate a func<idictionary<string, object>, bool> that can then be used with the linq where<t> extension method. all of this work is done with a class hierarchy that starts with the predicatenode. this abstract base class implements basic functionality like building expressions to index the input idictionary, type coercion, and navigation to sub predicates and operands.

    public func<idictionary<string, t>, bool> 
                 createevaluationfunction<t>()
    {
        m_param = expression.parameter(typeof(idictionary<string, t>), "arg");
        m_indexermethod = typeof(idictionary<string, t>).getmethod("get_item");
        m_containskeymethod = typeof(idictionary<string, t>).getmethod("containskey");
    
        // traverse the tree to generate a lambda expression and 
        // then compiile into a function
        return expression.lambda<func<idictionary<string, t>, 
               bool>>(createoperatorexpression(left), m_param).compile();
    }
    
    protected abstract expression createoperatorexpression(expression left);
    type coercion

    because the data type of each field in the input data is not known when the expression is built, type coercion is somewhat lax. if an expression is being compared to an integer literal, for example, a expression.convert will be used to coerce the input to an integer (ultimately using system.convert). if the type cannot be determined from a literal, there is a simple system of fall backs. arithmetic expressions will default to the real domain for instance, boolean operations defaulting to bool and most other predicates falling back to strings.

    private literalnode findcoerciontype(int index)
    {
        if (index != 0 && index != 2)
            return null;
    
        literalnode node = findchild<literalnode>(oppositeside(index));
        // look at what the child operand is being compared to
    
        if (node == null && (this.index == 0 || this.index == 2))
            node = parent.findchild<literalnode>(oppositeside(this.index)); 
              // look at what the whole expression is being compared to
    
        // if we don't find any literals in the area, look for a predicate
        // expression that can drive the type coercion
        if (node == null)
        {
            predicatenode predicate = findchild<predicatenode>(oppositeside(index)); 
            // look at what the child operand is being compared to
            if (predicate == null &&
               (this.index == 0 || this.index == 2))
                predicate = parent.findchild<predicatenode>(oppositeside(this.index)); 
            // look at what the whole expression is being compared to
    
            if (predicate != null)
                node = predicate.getexpressiontype();
        }
    
        return node;
    }
    indexing the dictionary

    in this example usage of dynamic expression building (media file tag collections), not every row in the input data will have the same set of fields. this is because not every media file has the same set of tags embedded within it. for this reason, an extra step is taken when indexing each dictionary, first checking containskey and returning dbnull if that returns false, basically allowing the same set of fields to be evaluated for every row, and allowing every row to be conceptually nullable. this pattern of first checking containskey shows up in a number of places in the code.

    private conditionalexpression createdereferenceexpression(int index)
    {
        nodewithid idnode = findchild<nodewithid>(index);
        if (idnode != null)
        {                
            // in order to avoid keynotfoundexceptions this will create 
            // an expression of the general form:
            // if(dictionary.containskey(key))
            //      return dictionary[key];
            // else 
            //      return null;
            literalnode node = gettypecoercionnode(index); 
            // this is used to coerce the value in the dictionary to the correct 
            // type for comparison
    
            expression key = expression.constant(idnode.lookupid);
            expression containskey = expression.call(m_param, 
                                        m_containskeymethod, key);
            expression returnvalue = expression.convert(expression.call(m_param,
                                       m_indexermethod, key), node.valuetype, 
                                       node.coerciondelegate.method);
            expression returnnull = expression.constant(node.nullrepresentation,
                                       node.valuetype);
    
            return expression.condition(containskey, returnvalue, returnnull);
        }
    
        return null;
    }
    putting them all together

    with all of the heavy lifting done in the predicatenode base class, each derived class just needs to provide the correct expression in its override of createoperatorexpression:

    [syntaxnode(ruleconstants.rule_predexp_eq)]
    public class equalitynode : predicatenode
    {
        public equalitynode()
        {
        }
    
        protected override expression createoperatorexpression(expression left)
        {
            return expression.equal(left, right);
        }
    }

    retrieving and evaluating data

    with the current implementation, evaluating a set of data is a two step process. first, the client code has to retrieve the larger set of data to be evaluated. it can use the parse tree to guide the retrieval, but the actual de-serialization out of the source data store exists entirely outside of the sqllinq assembly. second, it uses the sqllinq code to evaluate the data which applies the logic for everything but the from clause. eventually, i'd like to explore iqueryable<t> in order to eliminate the two part sequence. the good thing with the current two step implementation is that the sqllinq code knows nothing about the nature of the data other than that it can be represented as an ilist<idictionary<string, object>>. the one place where this is compromised slightly is that since this test case searches folder hierarchies, the fromclause does not expose the join chain as one would find with a truly relational data source. below is the method in the musiclinq assembly that ties the selectnode to the retrieval of the tag data and then evaluates and returns the results:

    public result process(selectnode selectstatement, string path)
    {
        if (path.ispathrooted(selectstatement.fromclause.root))
            path = selectstatement.fromclause.root;
        else
            path = path.combine(path, selectstatement.fromclause.root);
    
        trackdata data = new trackdata(path);
        data.recurse = recurse;
        data.extensionlist = extensionlist;
    
        if (selectstatement.columns.allcolumns)
            data.load();
    
        else
            data.load(selectstatement.getfieldlist());
    
        result result = new result();
        result.tracktags = selectstatement.evaluate(data.tracks).tolist();
    
        // set the alias values for those column that have them
        foreach (columnsource column in selectstatement.columns.columnsources)
        {
            if (column.hasalias && result.fields.contains(column.lookupid))
                result.fields[column.lookupid].alias = column.alias;
        }
    
        return result;
    }

    points of interest

    i learned a lot in writing this article. from how to get tags out of a media file, to the details parsing and parse trees, and then on to linq itself. for a procedural/oo guy like myself, linq took some time to wrap my head around. i hope to keep extending this as time allows and see what other sorts of data stores can be evaluated with textual sql statements...

    things to do

    • add support for iqueryable<t> and provide an implementation that can go directly at the data rather than requiring the client code to retrieve it and then evaluate the results.
    • implement update and delete statements (delete from soundtrack where [wm/shareduserrating] < 25 etc.)

    history

    • july 27, 2008 - first posting.
    • july 28, 2008 - added like, not like, in, and not in operators.
  • RATE THIS ARTICLE :
  •  
    • Latest Comments:
      • Add a comment
      • Title:
      • Comment
      •  
    Other popular LINQ articles:
    • LINQ to SQL Serialization

      I am a huge fan of LINQ to SQL feature of the .NET Framework 3.5. If you don't know what LINQ to SQL is, please read the white paper here.I like the way in which it makes database coding simple and easy. Developers do not have to use different programming models (CLR functions and SQL Stored Procedu

    • Implementing a Left Join with LINQ

      Oddly enough, LINQ doesn't define keywords for cross join, left join, or right join. As part of the LINQ grammar, you get join and group join. Joins can be equijoins or non-equijoins. An equijoin uses the join keyword and non-equal joins are contrived using where clauses. However, left, right, and c

    • LINQ in Multi-tier Applications

      If you had the chance to work with LINQ, the latest Object Relational Mapping tool (ORM) added to the .NET 3.5 platform, then you understand how LINQ maps your database structure into several classes in your application. It creates for each table an entity class that has its properties mapped to the

    • SqlLinq: Taking LINQ to SQL in the Other Direction

      I've wanted to write an application that would allow me to execute queries against my music collection for some time now. Having been burning my CDs to MP3 and WMA for some years now, it contains thousands of songs, and I've been wanting something a bit more robust than what most music players I've

    • LINQ Challenges and SQL Server Compact Edition

      ContentsLINQ to SQLGrounds for Determination (The Sample)LINQ and SQL Server Compact EditionWorking with Enumerations in LINQ to SQLField Subset Challenges with LINQ to SQLChange Tracking Challenges with LINQ to SQLDisconnected Challenges with LINQ to SQLPre-fetching Challenges with LINQ to SQLOther

    • how to: linq to sql transformation

      The v3.5 release of the .NET framework includes a significant number of new and enhanced technologies. LINQ (Language Integrated Query) is, in my opinion, the most significant new technology in the v3.5 release. Microsoft has implemented a set of libraries to transform LINQ expression trees to SQL s

    • linq to csv library

      ContentsRequirementsInstallationQuick StartWrite OverloadsRead OverloadsDeferred ReadingCsvFileDescriptionCsvColumn AttributeError HandlingThis library makes it easy to use CSV files with LINQ queries. Its features include:Follows the most common rules for CSV files. Correctly handles data fields th

    • Working with Range Variables and Let Statements in LINQ

      Complicated things seem intuitively simple when complexity is cleverly hidden in the open. LINQ is a query language that at its basic is pretty simple to learn to use, but there is a lot of meaning in all of the aspects of a query. Understanding these cleverly hidden meanings will help you get beyon

    • Introduction to LINQ, Part 1: LINQ to Objects

      Perhaps the most important new feature to the next version of Visual Studio, for now code-named 'Orcas,' is the release of LINQ, which stands for Language INtegrated Queries. LINQ is actually a set of operators, called standard query operators, that allow the querying of any .NET sequence. LINQ come

    • How To LINQ To SQL: Part III

      This article is the third in a series outlining how to translate LINQ expression trees to SQL statements that can be executed against multiple RDBMS systems and not just Microsoft's SQL Server offerings. The articles will also illustrate how to:Correctly and comprehensively translate binary and unar

    About Us |Contact us |Site Map |Csharp |Visual C / C++ |Visual basic |Java |SQL |Linux / Unix |Ajax
    ©2007-2018 CodeCoolest.com. Ptolive.cn Asp.net source code All Rights Reserved.