Q: I've read through the documentation but I am having trouble grasping what exactly I should be doing.
Here's a typical scenario for how clients use our XML that may give you a clearer idea of the process:
1. Request new feeds from Atom (e.g. http://xml.sportsdirectinc.com/Atom?feed=/baseball/boxscores&newerThan=2018-08-07T12:42:13). This will return Atom XML listing all of the baseball boxscores updated since 12:42:13 PM on Aug 7, 2018.
2. Request the URLs for each updated baseball boxscore XML feed (available from the Atom feed).
3. Parse each XML feed and save the data to a database.
4. The client then pulls the data from the database as needed.
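Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration, not a reference implementation. It assumes the Atom response is standard Atom 1.0 and that each entry carries the URL of the updated feed in a link element's href attribute. The function name extract_feed_urls, the sample document, and the example.invalid URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard Atom 1.0 namespace (assumed to be what the Atom response uses).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def extract_feed_urls(atom_xml):
    """Return the href of the first <link> found in each Atom <entry>."""
    root = ET.fromstring(atom_xml)
    urls = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        if link is not None and link.get("href"):
            urls.append(link.get("href"))
    return urls


# Hypothetical Atom document standing in for a real response (step 1).
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>baseball boxscores</title>
  <entry>
    <title>boxscore 1</title>
    <link href="http://example.invalid/baseball/boxscores/1.xml"/>
    <updated>2018-08-07T12:45:00Z</updated>
  </entry>
  <entry>
    <title>boxscore 2</title>
    <link href="http://example.invalid/baseball/boxscores/2.xml"/>
    <updated>2018-08-07T12:50:00Z</updated>
  </entry>
</feed>"""

# Step 2: these are the URLs to request next.
feed_urls = extract_feed_urls(SAMPLE_ATOM)
```

In a real client the Atom XML would come from an HTTP request (for example with urllib.request.urlopen), and each returned URL would then be fetched, parsed, and saved to the database as in steps 3 and 4.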
Q: How long does it take to get the feeds integrated in our systems?
This one is hard to answer as it depends on so many specific things at the customer's end. Why and how is the XML data used? What systems are involved (databases, other repositories)? Is there any data mapping to be done (mapping SDI's players to the customer's players, for example)? What is the experience and knowledge level of the customer's IT staff?
We provide general support by answering any questions about how to retrieve our data, what the feeds contain, and so forth. The rest is up to the customer to evaluate. We will also most likely provide reference implementations showing how to deal with our feeds, but in general terms we do not expect to know any specifics about a customer's systems and integration.
Q: Your XML schema is quite relaxed. Why?
Balancing between a relaxed schema and a strict one is not easy. Strict schemas tell you explicitly what data elements will be there, what the numeric values are, and so forth. Relaxed schemas have more optional data elements and fewer enumerations and other fixed value sets, so knowing exactly which data elements are present is left to the application.
As an XML provider, you do not want to change your schema very often. Changing the schema means versioning, and migrating customers to newer schemas. This is a very slow and cumbersome process.
The problem with strict schemas is that they do not allow variation in the content. It is likely that strict schemas will have to change far more often than relaxed ones. Special care has to be taken to choose which data elements are relaxed, and which ones are strict. This is also dictated by the data, and availability of various data items across the whole database, for example. It might be necessary to allow variance, just for this reason.
The main motivation:
- We have varying levels of information available in our databases, within one sport, or between leagues of a sport. Accommodating these with a strict schema adds to complexity and the decisions are hard to change once made. But with a fairly relaxed schema we can just let things evolve, and we can make the schema stricter as we improve our data consistency.
- Simplicity. Only one schema to deal with, not ten, or one per feed. The schema is relatively simple and straightforward to understand.
- Flexibility. More relaxed schemas do not have to change all the time. We have taken care to allow change in parts where it is expected.
- We cannot relax a strict schema without breaking customer applications. For example, you cannot drop a data element that was expected to be there.
- We can, however, make the schema stricter over time without breaking customer applications. Since the applications already expect varying data elements, making them always available will not break the application.
Q: Are the feeds incremental or are they dumps?
The answer depends on your definition of 'incremental'. Most of our feeds are definitely 'dumps' that contain the whole state of the schedule or the results for that day, week, or season. However, some feeds could be seen as incremental; boxscores and live scores receive updates by individual game, but even these contain the full state of the game at a given time.
Incremental feeds are trickier to handle because you run the risk of losing incremental updates. For example, if there is a system crash, a lost connection, or anything of the sort, you will need a 'dump' update or a replay of the events that were missed. Making incremental updates reliable is possible with message queues, but this is still quite complicated to achieve. The happy medium is to choose the granularity of the 'dumps' so that they act more like incremental updates where it makes sense, while keeping the benefit of retaining the full state at all times. For example, live scores use one file per game; as the game progresses, the full state of the game is written to this same game file. We provide an Atom syndication feed as a way to follow updates (HTTP delivery) to all of the files so that the customer can retrieve only those files that have recently been updated. In other words, we cannot guarantee delivery of all individual changes, but we can provide the latest state at all times.
In the future, we may offer more fine-grained updates to allow more flexibility. We can also support a more event-driven incremental model this way, but it is easier to use a push-style delivery method, like FTP (not currently available), with this model.
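One way a client might take advantage of the full-state model described above is to keep only the newest snapshot per game, so that a missed intermediate update does no harm. The sketch below is an assumption-laden illustration: the game ID, the shape of the state dictionary, and the idea of using a publish timestamp (for example, taken from the Atom entry) are all hypothetical.

```python
from datetime import datetime


def apply_snapshot(store, game_id, published, state):
    """Store a full-state snapshot unless a newer or equal one is already held.

    Returns True if the snapshot was stored, False if it was ignored.
    """
    current = store.get(game_id)
    if current is not None and current["published"] >= published:
        return False  # stale or duplicate snapshot; safe to ignore
    store[game_id] = {"published": published, "state": state}
    return True


store = {}

# First snapshot of a hypothetical game.
apply_snapshot(store, "game-1", datetime(2008, 1, 5, 20, 0), {"home": 0, "away": 0})

# Suppose an intermediate snapshot was missed (e.g. a lost connection).
# The next one still carries the full state, so nothing needs replaying.
apply_snapshot(store, "game-1", datetime(2008, 1, 5, 20, 10), {"home": 7, "away": 3})

# A late-arriving older snapshot is ignored.
stored_again = apply_snapshot(
    store, "game-1", datetime(2008, 1, 5, 20, 0), {"home": 0, "away": 0}
)
```

Because every file holds the whole state, applying a snapshot is a simple overwrite; no ordering or replay logic is needed beyond discarding older data.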
Q: How often is each feed updated? When I retrieve the feed, how far back will the data go? Will it have only the last day/week/month worth of data or will it contain the entire season?
The feeds update at different schedules based on what is happening. Basically if something is updated on a hosted page that we provide for a client (http://columbusdispatch.sportsdirectinc.com/sports-scores/Baseball-Scores-Matchups.aspx), then it will be available in the XML as well. You will know when updates occur by using the Atom syndication feed.
Boxscores - boxscores are set up individually; all completed games that have a boxscore are available; the client can tell the Atom listener how far back they want to go for boxscore information
Injuries - all current injuries
Live Scores - full status of a game continually updated throughout the game
Odds - current odds available
Players - listing of all of the current players
Player Stats - year-to-date stats for players
Teams - listing of all of the current teams
Team Stats - year-to-date stats for teams
Results - all past (completed) games are available (note: for MLB suspended games are also included)
Schedules - all remaining games for the season are included (past games removed)
Q: Looking at the Results feed, what does it mean when there is no results stat for a team? Also, I only ever see stat-groups with keys of "results", what are the other allowed values there?
In the sample below, Kansas City did not score. The final was 17-0 for the NY Giants. We don't show 0 stats, so the stat does not appear. Also, since the stat-group would be empty, it also does not appear.
<stat num="17" type="points"/>
At this time, "results" is the only stat-group key that will appear in the Results feed. Since this could change down the road, clients should check the stat-group key rather than assume "results".
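A client reading the Results feed could handle both points (missing zero stats and not assuming the stat-group key) along these lines. This is a sketch only: the stat element is taken from the sample above, but the surrounding competition and team elements, the name attribute, and the key attribute on stat-group are assumptions made for illustration and may not match the real feed layout.

```python
import xml.etree.ElementTree as ET


def team_points(team_elem):
    """Return the team's points, treating an absent stat as 0."""
    for group in team_elem.findall("stat-group"):
        # Check the key explicitly instead of assuming every group is "results".
        if group.get("key") != "results":
            continue
        for stat in group.findall("stat"):
            if stat.get("type") == "points":
                return int(stat.get("num"))
    # No stat-group or no points stat: zero stats are not shown in the feed.
    return 0


# Hypothetical document layout wrapped around the real <stat> sample.
SAMPLE_RESULTS = """<competition>
  <team name="NY Giants">
    <stat-group key="results"><stat num="17" type="points"/></stat-group>
  </team>
  <team name="Kansas City"/>
</competition>"""

root = ET.fromstring(SAMPLE_RESULTS)
scores = {team.get("name"): team_points(team) for team in root.findall("team")}
```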
Q: On your website you say you offer full customized solutions to suit our needs. We realize that the box scores you emailed contain absolutely everything, but we're having a few startup issues with the nested approach that you provided. Is it possible that we provide you with a template which better suits our need for both NBA and NCAA boxscores?
You are perhaps referring to our hosted web content (HTML). The hosted content could be a solution that you want instead of XML. Currently we do not provide customized XML feeds. We can, however, tailor the package that you pay for if you choose to go with our XML feeds. For example, you might only choose one sport, one league, one type of feed, etc.
Some of the XML is a little complex at first, but we feel it is necessary to describe the data. We feel that the nested XML approach is closer to the nature and purpose of XML markup. There are some exceptions to this rule in our XML where it is reasonable to take another approach (for example, look at the Golf XML). Nested XML structures are easy to verify, and most tools are specifically geared towards this type of XML. If, for example, you rely heavily on your own id references, you cannot verify the document with standard tools because doing so requires business logic. If you choose to use the XML standard id references, they are very hard to create for complex documents, and we find that they are not worth the trouble.
Q: How often are changes made to the schema?
We will avoid changing the schema as much as possible, but we will make changes when we have to. Right now we expect few changes. For example, our live scores feeds will probably have more information in the future, but those changes should be 'compatible'. We may, over time, tighten the restrictions on elements (instead of saying an element may or may not appear, we may change it to say that the element will always appear). Again, these types of changes shouldn't break anything on the client side since the client already has to account for the possibility of the element being present.
There are two types of changes:
- 'compatible': these are additions or changes that do not break anything existing. These include additions of new elements, new stats, without breaking the semantics of anything that already existed before.
- 'incompatible': these changes break the semantics of the XML and will require work at the client's end in most cases. For any incompatible changes, we will provide the old version for a suitable period of time.
Q: What is the communication mechanism used to inform customers of pending changes?
We have a customer mailing list that we will use for things like this, and we also plan to provide a Wiki page where we will be able to post news items. Clients should provide us with one or more contacts that they would like included on the mailing list.
Q: What is the lead time given to customers prior to the changes being made effective?
For small changes that are backwards compatible we post notes on the main page of this documentation site. If the changes are still small but might affect customers, we usually email and inquire whether our change will inconvenience the customers or not. If not, we go ahead with the change. If yes, we will give some time for work at the customer's end. When bigger changes occur, we will provide an estimated cut-off time (e.g. 'x' amount of months) when we will remove feeds supporting the old schema to allow clients enough time to switch over.
Q: What is the typical nature of such changes based on past experience?
The changes have been both small and big, but so far we have been able to handle this without any disruptions to our customers. For example, we have added entirely new sports to the mix. We have also adjusted small details on a few occasions.
To make the changes as transparent as possible, we recommend that clients do not validate received documents at their end, because anything that changes the schema will then cause the reading application to break. We validate all documents before they are made available. If you still wish to validate the XML data at your end, you must fetch the latest schema frequently.
Q: How are the XML files transferred to us? Do you need ftp access to one of our machines, or do we fetch the data from you?
The current delivery method is HTTP Pull, where clients fetch the data from us. You must supply us with your IP address or addresses.
Q: Does SDI have a database format in table-format that they already use which would save us re-inventing the wheel and making our own based on their data?
We don't have this at this time. It is something we have discussed.
Q: Do you have a master list of Team IDs?
Yes, there is a Team feed with Team IDs for each sport/league. You can find the URLs for each in the Feed Index documentation here:
For example, here's the Team feed for NBA:
Q: Are all Team IDs unique?
The ids are completely unique across the sports, leagues, etc. The number part of the ID is not, however. The numbers are only unique within the sport. For example:
These ids are unique if they are handled as full IDs, but the number part can be the same across sports.
Q: I was wondering if there was a way that I could get every competition-id?
- The competition IDs are available in the Schedule feeds for all games yet to be played. You can store the competition-ids from here so that you know what to do with them when live scores, boxscores, etc. are available. You should be using the Atom listener (XML Delivery Documentation) to notify you when updates have occurred for different feeds, including when particular live scores, boxscores, etc. are available.
- You can use the Results feeds to find a particular competition-id for a game that is finished and the boxscore is available.
Q: When I attempt to access the feed for the upcoming game I get an error, what's up?
The game you are trying to hit will become available shortly before game time. Until we are ready to enter the actual live scores for the game, there is no live scores feed available for it.
The Atom Listener is a great tool that we STRONGLY recommend using. It is extremely useful for feeds like live scores and boxscores after the game. Here is some documentation about the Atom Listener setup and instructions on how to use it.
The Schedule feeds show the upcoming games.
Q: When I try to access the various XML feeds, I get prompted to enter a username and password. What should I enter?
Chances are you are trying to access the XML feeds from a system whose IP address we do not have set up. Please send us the IP address(es) that you will be using to access the feeds. The other possibility is that you are trying to access the feeds with a wrong URL.
Q: The NCAAB premium odds feeds do not seem to include an "opening line". Is the "opening line" not included with the feeds?
Opening lines are included. In the feeds, Opening = "True" or "False" indicates whether a line is an opening line.
Q: What time zone are your times representing in your various feeds?
We often have times in the Eastern time zone (e.g. <start-date>2008-01-05T20:00:00-05:00</start-date>) where the "-05:00" indicates Eastern time at this time. However, this isn't always guaranteed to be the case (see the AHL game in Central Time below as an example) so we always show the time zone. Clients are then able to convert to any time zone they desire.
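Because the offset is always present, converting a feed time is straightforward. The following is a minimal sketch using the start-date value shown above; the helper name parse_start_date and the choice of target zones are illustrative assumptions, and the fixed -06:00 offset stands in for Central Standard Time without handling daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def parse_start_date(text):
    """Parse a start-date value that includes a UTC offset such as -05:00."""
    return datetime.fromisoformat(text)


start = parse_start_date("2008-01-05T20:00:00-05:00")

# Convert to UTC.
start_utc = start.astimezone(timezone.utc)

# Convert to a fixed -06:00 offset (Central Standard Time, no DST handling).
central = timezone(timedelta(hours=-6))
start_central = start.astimezone(central)
```

For named zones with daylight saving rules, the standard library zoneinfo module (Python 3.9 and later) could be used in place of the fixed offset.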
Q: Which time zone should we be sending to your server in the "newerThan" parameter when making feed requests?
It doesn't really matter; it gets converted to the server time when we do the date comparison on the files (e.g. 19:00:00-0400 gives the same results as 18:00:00-0500).
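The equivalence in the example above can be checked with a few lines. The times and offsets come from the example; the date 2008-01-05 is an arbitrary assumption added only so that the values parse as full timestamps.

```python
from datetime import datetime

# %z accepts offsets written without a colon, such as -0400.
FMT = "%Y-%m-%dT%H:%M:%S%z"

a = datetime.strptime("2008-01-05T19:00:00-0400", FMT)
b = datetime.strptime("2008-01-05T18:00:00-0500", FMT)

# Offset-aware datetimes compare by the instant they represent.
same_instant = a == b
```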
Q: We have noticed that some of the NCAA data is incomplete. From one side of a box score being missing to entire teams' worth of stats, some of the data appears to be missing. Is this going to correct itself, or is it something we can expect to continue permanently?
The information that you see as missing from different boxscores is probably for non-Division 1 schools, such as Central Missouri State. We only include full team/player stats for Division 1 teams. We do include all of the games that Division 1 teams play so sometimes non-Division 1 teams are included in feeds such as schedules, live scores, results, etc. but we do not include specific information for these teams. So for example, Central Missouri State would appear in the schedule, live scores and results feeds but there wouldn't be any specific player stats for them in the boxscore for that game and we don't track team stats for them in the Team Stats feed either.
Q: How does the Live-Scores-Feed work? I look at XML Live Scores Feed and see that I need a game ID. Do you have any advice on the best way to get live scores? Would I just look in the schedule xml file and look for a status of "in progress" or something and use that gameid when I fetch live scores?
Live Scores are arranged so that there is one file per game. A game file will then update every time there are changes. The game file contains the full state of the game at the time of publishing. The old game files remain on the server for two days before removal.
You can find out when games are scheduled using the Schedule feed and then look for those games in the Live Scores feed. The Game IDs come from the Schedule feed as well.
Use the Atom feeder (XML Delivery Documentation) to know about updates and set the polling interval to something fairly small, for example 5 seconds. It is important to use the 'newerThan' parameter with the Atom feeder when polling is frequent like this.
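A polling loop along these lines might look like the sketch below. It is a rough illustration under several assumptions: the base URL and feed path are taken from the Atom example earlier in this FAQ, the fetch callable is supplied by the caller and returns (updated, href) pairs, and the timestamps are assumed to share one format and time zone so that they can be compared as strings. The names build_atom_url and poll are hypothetical.

```python
import time
from urllib.parse import urlencode

# Base URL as it appears in the Atom example earlier in this FAQ.
ATOM_BASE = "http://xml.sportsdirectinc.com/Atom"


def build_atom_url(feed, newer_than):
    """Build an Atom request URL with the feed path and newerThan parameter."""
    query = urlencode({"feed": feed, "newerThan": newer_than}, safe="/:")
    return f"{ATOM_BASE}?{query}"


def poll(feed, fetch, newer_than, interval=5.0, max_polls=None, sleep=time.sleep):
    """Yield the href of each updated file, advancing newerThan as entries arrive.

    fetch(url) is a caller-supplied function returning (updated, href) pairs.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        entries = fetch(build_atom_url(feed, newer_than))
        for updated, href in entries:
            yield href
            # Assumes same-format timestamps, so string comparison is enough.
            newer_than = max(newer_than, updated)
        polls += 1
        sleep(interval)
```

Advancing newerThan after each poll keeps every response small, which is what makes a short interval such as 5 seconds practical.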