Using BINARY in a MySQL IN Expression

With the default collations in MySQL (e.g. utf8_general_ci), strings will be matched case-insensitively:

WHERE col IN ('IT', 'Ruby') will match “it” and “ruby”.

For case-sensitive matches, each string must be preceded by the keyword BINARY:

WHERE col IN (BINARY 'IT', BINARY 'Ruby')

Caveat: I would guess that BINARY affects the matching of strings with combining characters (same character, different byte representation).

Elgg Core Proposal: New Table “entity_list_items”

[This proposal has been superseded with ElggCollection.]

As of Elgg 1.7.4 there’s no way to specify a specific list or ordering of entities for use in an elgg_get_entities query. E.g. one might want to implement:

  • A “featured items” list for a group or user widget. On a group page this could be implemented in “left” and “right” lists for better display control
  • “Sticky” entities in discussion lists, or any other object lists
  • A “favorites” list for each user

Continue reading  

SQL Server Mgmt Studio Can’t Export CSV

I’m trying to export large (e.g. 16M rows) SQL Server 2008 views for eventual import into MySQL. The underlying setup for these views is bizarre: A SELECT for a single row can take over two minutes, a SELECT * FROM view with no WHERE clause is much faster, but still too slow to finish an export in any client we’ve connected with. The only reasonable solution I’ve found is to use SQL Server Management Studio, right-click the DB > Tasks > Export Data… and target a flat file. This seriously seems about 100x as fast as a direct query in SSMS and 200x as fast as an ODBC connection.

There’s only one (major) gotcha: The flat file writer in SQL Server Management Studio (ver 10.0.2531.0) is broken for CSV output. There’s no configuration I can find that will correctly escape characters that need to be. By default it doesn’t enclose any values in quotes, but even if you specify the double quote as “Text Qualifier” (yay for obscure proprietary descriptions), double quotes within values won’t be escaped in any way. Not with \" nor "" (the CSV standard). This means you can either export fixed-width columns, or try to find chars that never exist in your data to use as field/line delimiters. The escape char \ is also not escaped, so you could end up with a line feed character in your import when the original data was literally “foo\road”. Stupid.

Again, I can’t just select all rows in the tool and then save the resulting recordset as CSV (which isn’t broken) because the views are so big that the queries time out (the resultsets would probably crash the tool anyway) and because there’s no way to partition the query that doesn’t destroy performance. I’m going to have to create a local SQL Server DB and do a DB to DB transfer, then use a client connection to export.

Update: You can at least use obscure Unicode chars as delimiters, though it outputs invalid UTF-8 chars after the first row. You have to export as “Unicode” (UTF-16 little-endian. 2-byte chars with a leading \xFF\xFE BOM). If you pick delimiters like ʖ and ʁ and go this path, you won’t be able to use traditional line-by-line file reading functions. You’ll have to stream a fixed number of bytes at a time, building up columns (and validating that the number of columns/row stays constant) and converting to rational CSV that can be handled by MySQL. Another vote for setting up a local SQL Server instance.

Distance and bearings between GPS coordinates in SQL

Another post from my sql abominations series. I’m running this on MySQL 5 particularly.

Assume you have a table of locations with Latitude and Longitude for each one. In my case the table is “station”, primary key being “LocID”. With help from this article, first we create a view to get 3D coordinates (6378 = Earth’s radius in km):

CREATE VIEW gpsGlb AS
    SELECT
        LocID
        ,6378 * COS(RADIANS(Latitude)) * COS(RADIANS(Longitude)) AS x
        ,6378 * COS(RADIANS(Latitude)) * SIN(RADIANS(Longitude)) AS y
        ,6378 * SIN(RADIANS(Latitude)) AS z
    FROM station;

Now we can query for great-circle distance (I want rounded miles) to all locations from, say, LocID 405:

SELECT
    LocID
    ,ROUND((2 * 6378 * ASIN(d / 2 / 6378)) * 0.621371192) AS dist_mi
FROM
    (SELECT
        SQRT(dx * dx + dy * dy + dz * dz) AS d
        ,LocID
     FROM
        (SELECT
            p1.x - p2.x AS dx
            ,p1.y - p2.y AS dy
            ,p1.z - p2.z AS dz
            ,p2.LocID
        FROM gpsGlb p1
        JOIN gpsGlb p2 ON (p1.LocID = 405 AND p2.LocID != 405)
       ) t1
    ) t2
ORDER BY dist_mi

With help from this article, we can query for the initial bearing to each location. The “boxed” calculation will come in handy later.

SELECT
    LocID
    ,(360 + DEGREES(ATAN2(y, x))) % 360 AS initBearing_deg
    ,ROUND(((360 + DEGREES(ATAN2(y, x))) % 360) / 22.5) * 22.5
     AS initBearingBoxed_deg
FROM
    (SELECT
        SIN(RADIANS(s2.Longitude - s1.Longitude)) * COS(RADIANS(s2.Latitude))
        AS y
        ,COS(RADIANS(s1.Latitude)) * SIN(RADIANS(s2.Latitude))
            - SIN(RADIANS(s1.Latitude)) * COS(RADIANS(s2.Latitude))
               * COS(RADIANS(s2.Longitude - s1.Longitude))
        AS x
        ,s2.LocID
    FROM station s1
    JOIN station s2 ON (s1.LocID = 405 AND s2.LocID != 405)
    ) q1

What you’ve all been waiting for! The combined query plus boxed compass directions (like ‘NNE’), etc. I’ve also added a limit for the distance in the qq1 subquery since I only want close locations.

SELECT
    qq2.LocID
    ,dist_mi
    ,CASE initBearingBoxed_deg
        WHEN 22.5 THEN 'NNE'   WHEN 45 THEN 'NE'
        WHEN 67.5 THEN 'ENE'   WHEN 90 THEN 'E'
        WHEN 112.5 THEN 'ESE'  WHEN 135 THEN 'SE'
        WHEN 157.5 THEN 'SSE'  WHEN 180 THEN 'S'
        WHEN 202.5 THEN 'SSW'  WHEN 225 THEN 'SW'
        WHEN 247.5 THEN 'WSW'  WHEN 270 THEN 'W'
        WHEN 292.5 THEN 'WNW'  WHEN 315 THEN 'NW'
        WHEN 337.5 THEN 'NNW'  ELSE 'N'
     END AS bearing
FROM (
    SELECT
        LocID
        ,ROUND((2 * 6378 * ASIN(d / 2 / 6378)) * 0.621371192) AS dist_mi
    FROM
        (SELECT
            SQRT(dx * dx + dy * dy + dz * dz) AS d
            ,LocID
         FROM
            (SELECT
                p1.x - p2.x AS dx
                ,p1.y - p2.y AS dy
                ,p1.z - p2.z AS dz
                ,p2.LocID
            FROM gpsGlb p1
            JOIN gpsGlb p2 ON (p1.LocID = 405 AND p2.LocID != 405)
           ) t1
        ) t2
    ) qq1
JOIN (
    SELECT
        LocID
        ,(360 + DEGREES(ATAN2(y, x))) % 360 AS initBearing_deg
        ,(360 + ROUND((DEGREES(ATAN2(y, x))) / 22.5) * 22.5) % 360
         AS initBearingBoxed_deg
    FROM
        (SELECT
            SIN(RADIANS(s2.Longitude - s1.Longitude)) * COS(RADIANS(s2.Latitude))
             AS y
            ,COS(RADIANS(s1.Latitude)) * SIN(RADIANS(s2.Latitude))
                - SIN(RADIANS(s1.Latitude)) * COS(RADIANS(s2.Latitude))
                   * COS(RADIANS(s2.Longitude - s1.Longitude))
             AS x
            ,s2.LocID
        FROM station s1
        JOIN station s2 ON (s1.LocID = 405 AND s2.LocID != 405)
        ) q1
    ) qq2 ON (qq1.LocID = qq2.LocID
              AND qq1.dist_mi <= 60)
ORDER BY dist_mi

Result set is something like:

LocID dist_mi bearing
250 25 E
260 30 NNE
240 42 ENE

Hope this is useful to someone.

Average direction in SQL

Given a column of polar directions in degrees, this is a single SQL expression to compute their average, for use in a GROUP BY query. Some functions may be MySQL-specific.

IF(DEGREES(ATAN2(
        -AVG(SIN(RADIANS(direction_deg)))
        ,-AVG(COS(RADIANS(direction_deg))))
    ) < 180
    ,DEGREES(ATAN2(
        -AVG(SIN(RADIANS(direction_deg)))
        ,-AVG(COS(RADIANS(direction_deg))))
    ) + 180
    ,DEGREES(ATAN2(
        -AVG(SIN(RADIANS(direction_deg)))
        ,-AVG(COS(RADIANS(direction_deg))))
    ) - 180
)

Thought someone might like to run across this.